SnpSift Annotate

Annotate using fields from another VCF file (e.g. dbSnp, the 1000 Genomes Project, ClinVar, ExAC, etc.).

Typical usage

This is typically used to annotate IDs and INFO fields from a 'database' VCF file (e.g. dbSnp). Here is an example:

java -jar SnpSift.jar annotate dbSnp132.vcf variants.vcf > variants_annotated.vcf

Important: The SnpSift annotate command uses different strategies depending on the input VCF file:

  • Uncompressed VCF: If the file is not compressed, it creates an index in memory to optimize search. This assumes that both the database and the input VCF files are sorted by position, as required by the VCF standard (chromosome sort order can differ between files).
  • Compressed, Tabix indexed: It uses the tabix index to speed up annotations.
  • Compressed, NOT Tabix indexed: It loads the entire 'database' VCF file into memory, which may be slow or even impractical for large 'database' VCF files. This makes it possible to annotate using unsorted VCF files.
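
Since the uncompressed strategy relies on both files being sorted by position, it can be worth checking this before annotating. The following is a minimal sketch, not part of SnpSift: the helper name vcf_is_sorted is hypothetical, and it assumes a tab-separated VCF in which each chromosome forms one contiguous block (the order of the chromosomes themselves is not checked, matching the note above).

```shell
# Hypothetical helper (not a SnpSift command): prints "sorted" if positions
# never decrease within a chromosome and no chromosome reappears after
# another one has started; prints "unsorted" otherwise.
vcf_is_sorted() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }                 # skip header and blank lines
    {
      if ($1 != chrom) {                     # a new chromosome block starts
        if ($1 in seen) { bad = 1; exit }    # chromosome seen before: not contiguous
        seen[$1] = 1; chrom = $1; last = 0
      }
      if ($2 + 0 < last) { bad = 1; exit }   # position went backwards
      last = $2 + 0
    }
    END { print (bad ? "unsorted" : "sorted") }
  ' "$1"
}

# Usage (file names are placeholders):
#   vcf_is_sorted variants.vcf
#   vcf_is_sorted dbSnp132.vcf
```

If either file is reported as unsorted, it would need to be sorted first, or the 'database' file would need to be provided in a form that does not require sorted input.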

Note:

  • By default it adds ALL database INFO fields.
  • You can use the -info command line option if you want to add only a subset of fields from the db.vcf file.
  • You can use the -id command line option if you only want to add ID fields (no INFO fields will be added).
  • Using the -exists command line option, you can annotate entries that exist in the 'database' file.
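
As an illustration of the -info and -id options, here is a small sketch. The file names are placeholders, the annotate wrapper function is hypothetical, and passing the fields to -info as a comma-separated list is an assumption that should be checked against the help text printed by your SnpSift version. GMAF and VC are dbSnp INFO fields that appear in Example 2 below. The wrapper prints each command and only runs it when the jar and both VCF files are present.

```shell
# Placeholder names; adjust to your own files.
DB=db.vcf
IN=variants.vcf
SNPSIFT_JAR=${SNPSIFT_JAR:-SnpSift.jar}

# Hypothetical wrapper: annotate OUT [options...]
# Prints the command line, then runs it only if all inputs exist.
annotate() {
  out=$1; shift
  echo "java -jar $SNPSIFT_JAR annotate $* $DB $IN > $out"
  if [ -f "$SNPSIFT_JAR" ] && [ -f "$DB" ] && [ -f "$IN" ]; then
    java -jar "$SNPSIFT_JAR" annotate "$@" "$DB" "$IN" > "$out"
  fi
}

# Add only the GMAF and VC INFO fields (assumed comma-separated syntax).
annotate variants.info.vcf -info GMAF,VC

# Add only the ID column, no INFO fields.
annotate variants.id.vcf -id
```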

Info

DbSnp in VCF format can be downloaded here (GRCh38 coordinates). For other versions, check this link.

Example 1: Annotating ID from dbSnp

$ cat test.chr22.vcf
#CHROM  POS         ID           REF  ALT  QUAL   FILTER  INFO
22      16157571    .            T    G    0.0    FAIL    NS=53
22      16346045    .            T    C    0.0    FAIL    NS=244
22      16350245    .            C    A    0.0    FAIL    NS=192
22      17054103    .            G    A    0.0    PASS    NS=404
22      17071906    .            A    T    0.0    PASS    NS=464
22      17072347    .            C    T    0.0    PASS    NS=464
22      17072394    .            C    G    0.0    PASS    NS=463
22      17072411    .            G    T    0.0    PASS    NS=464

$ java -jar SnpSift.jar annotate -id db/dbSnp/dbSnp137.20120616.vcf test.chr22.vcf
#CHROM  POS         ID           REF  ALT  QUAL   FILTER  INFO
22      16157571    .            T    G    0.0    FAIL    NS=53
22      16346045    rs56234788   T    C    0.0    FAIL    NS=244
22      16350245    rs2905295    C    A    0.0    FAIL    NS=192
22      17054103    rs4008588    G    A    0.0    PASS    NS=404
22      17071906    .            A    T    0.0    PASS    NS=464
22      17072347    rs139948519  C    T    0.0    PASS    NS=464
22      17072394    .            C    G    0.0    PASS    NS=463
22      17072411    rs41277596   G    T    0.0    PASS    NS=464
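
To get a quick summary of how many records received an ID, the ID column of the annotated output can be counted. This is a minimal sketch, not a SnpSift feature: count_ids is a hypothetical helper, and it assumes a tab-separated VCF in which a missing ID is written as "." in the third column.

```shell
# Hypothetical helper: count data records with and without an ID.
count_ids() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }                    # skip header and blank lines
    $3 == "." || $3 == "" { missing++; next }   # no ID assigned
    { annotated++ }
    END { printf "annotated=%d missing=%d\n", annotated, missing }
  ' "$1"
}

# Usage (file names are placeholders):
#   java -jar SnpSift.jar annotate -id db/dbSnp/dbSnp137.20120616.vcf test.chr22.vcf > test.chr22.id.vcf
#   count_ids test.chr22.id.vcf
```

For the eight records shown above, five carry an rs ID and three remain ".".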

Example 2: Annotating ID and all INFO fields from dbSnp

(VCF headers not shown for brevity):

$ cat test.chr22.vcf
#CHROM  POS         ID           REF  ALT  QUAL   FILTER  INFO
22      16157571    .            T    G    0.0    FAIL    NS=53
22      16346045    .            T    C    0.0    FAIL    NS=244
22      16350245    .            C    A    0.0    FAIL    NS=192
22      17054103    .            G    A    0.0    PASS    NS=404
22      17071906    .            A    T    0.0    PASS    NS=464
22      17072347    .            C    T    0.0    PASS    NS=464
22      17072394    .            C    G    0.0    PASS    NS=463
22      17072411    .            G    T    0.0    PASS    NS=464

$ java -jar SnpSift.jar annotate db/dbSnp/dbSnp137.20120616.vcf test.chr22.vcf
#CHROM  POS         ID           REF  ALT  QUAL   FILTER  INFO
22      16157571    .            T    G    0.0    FAIL    NS=53
22      16346045    rs56234788   T    C    0.0    FAIL    NS=244;RSPOS=16346045;GMAF=0.162248628884826;dbSNPBuildID=129;SSR=0;SAO=0;VP=050100000000000100000100;WGT=0;VC=SNV;SLO;GNO
22      16350245    rs2905295    C    A    0.0    FAIL    NS=192;RSPOS=16350245;GMAF=0.230804387568556;dbSNPBuildID=101;SSR=1;SAO=0;VP=050000000000000100000140;WGT=0;VC=SNV;GNO
22      17054103    rs4008588    G    A    0.0    PASS    NS=404;RSPOS=17054103;GMAF=0.123400365630713;dbSNPBuildID=108;SSR=0;SAO=0;VP=050100000000070010000100;WGT=0;VC=SNV;SLO;VLD;G5A;G5;KGPilot123
22      17071906    .            A    T    0.0    PASS    NS=464
22      17072347    rs139948519  C    T    0.0    PASS    NS=464;RSPOS=17072347;dbSNPBuildID=134;SSR=0;SAO=0;VP=050200000004040010000100;WGT=0;VC=SNV;S3D;ASP;VLD;KGPilot123
22      17072394    .            C    G    0.0    PASS    NS=463
22      17072411    rs41277596   G    T    0.0    PASS    NS=464;RSPOS=17072411;GMAF=0.00411334552102377;dbSNPBuildID=127;SSR=0;SAO=0;VP=050200000008040010000100;GENEINFO=CCT8L2:150160;WGT=0;VC=SNV;S3D;CFL;VLD;KGPilot123
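
Once INFO fields have been added, a single field can be pulled out of the annotated file for inspection. The sketch below is not part of SnpSift: info_field is a hypothetical helper that assumes a tab-separated VCF with the INFO column in position 8, and it prints "." when the requested key is absent and "true" when the key is present as a flag without a value.

```shell
# Hypothetical helper: info_field KEY FILE
# Prints CHROM, POS, ID and the value of one INFO key for every record.
info_field() {
  key=$1; file=$2
  awk -F'\t' -v key="$key" '
    BEGIN { OFS = "\t" }
    /^#/ || NF < 8 { next }                      # skip headers and short lines
    {
      value = "."                                # default: key not present
      n = split($8, kv, ";")                     # INFO entries are ";"-separated
      for (i = 1; i <= n; i++) {
        if (kv[i] == key) {
          value = "true"                         # flag entry such as VLD
        } else if (index(kv[i], key "=") == 1) {
          value = substr(kv[i], length(key) + 2) # text after "KEY="
        }
      }
      print $1, $2, $3, value
    }
  ' "$file"
}

# Usage (file name is a placeholder):
#   info_field GMAF test.chr22.annotated.vcf
```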