
SnpSift

SnpSift is a toolbox that allows you to filter and manipulate annotated variant files.

Once your genomic variants have been annotated, you need to filter them to find the "interesting" or "relevant" variants. Given the size of the data files, this is not a trivial task (e.g. you cannot load all the variants into an XLS spreadsheet). SnpSift performs the VCF manipulation and filtering required at this stage of data processing pipelines.
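Because the files are too large to load whole, VCF filtering is naturally done as a stream, one line at a time. The minimal Python sketch below illustrates that idea; it is not SnpSift's implementation. It keeps all header lines and passes through only records whose QUAL column exceeds a threshold. The function name `filter_vcf_by_qual` and the choice to drop records with a missing QUAL ('.') are illustrative assumptions.

```python
from typing import Iterable, Iterator


def filter_vcf_by_qual(lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Yield header lines unchanged and data lines whose QUAL exceeds min_qual.

    Lines are consumed one at a time, so memory use does not grow with file size.
    (Hypothetical helper for illustration, not part of SnpSift.)
    """
    for line in lines:
        if line.startswith("#"):
            # Keep '##' meta-information lines and the '#CHROM' column header.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # VCF column 6 is QUAL; '.' means missing
        if qual != "." and float(qual) > min_qual:
            yield line
```

For example, with `fin` and `fout` opened on hypothetical `input.vcf` and `output.vcf` files, `fout.writelines(filter_vcf_by_qual(fin, 30))` streams the filtered records without holding the whole file in memory.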

Download and install

SnpSift is part of the main SnpEff distribution, so please follow the SnpEff download and installation instructions.

SnpSift utilities

SnpSift is a collection of tools to manipulate VCF (variant call format) files.

Some examples of what you can do:

Operation        Meaning
Annotate         Add 'ID' and INFO fields from another VCF database (e.g. dbSNP). Assumes entries are sorted.
Annotate (mem)   Annotate from a database created from a VCF file, loaded into memory.
CaseControl      Compare how many variants are in 'case' and in 'control' groups; calculate p-values (Fisher exact test).
Concordance      Concordance metrics between two VCF files.
DbNSFP           Annotate using dbNSFP, an integrated database of functional predictions from multiple algorithms (SIFT, Polyphen2, LRT, MutationTaster, PhyloP, GERP++, etc.).
Extract fields   Extract fields from a VCF file into tab-separated format.
Filter           Filter using arbitrary expressions, e.g. "(QUAL > 30) | (exists INDEL) | ( countHet() < 2 )".
GeneSets         Annotate using MSigDb gene sets (GO, KEGG, Reactome, BioCarta, etc.).
GT               Compress genotype fields to reduce VCF file size in large sequencing projects.
GWAS Catalog     Annotate using the GWAS Catalog.
Intersect        Intersect intervals from multiple files to find consensus regions (e.g. ChIP-Seq peaks).
Intervals        Filter variants that intersect with intervals defined in BED files.
Intervals Index  Filter variants that intersect with intervals. Uses file indexing for fast random access; intended for huge VCF files and a small number of intervals.
Join             Join files by genomic region (intersecting or closest).
PhastCons        Annotate using conservation scores (phastCons).
Private          Annotate whether a variant is private to a family or group.
RmInfo           Remove INFO fields from a VCF file.
RmRefGen         Remove reference genotypes (replace '0/0' genotypes with '.').
Split            Split a VCF file by chromosome.
TsTv             Calculate the transition/transversion (Ts/Tv) ratio.
Variant type     Annotate variant type (SNP, MNP, INS, DEL, or MIXED). Also adds HOM/HET if there is only one sample.
VcfCheck         Check that a VCF file is well formed.
Vcf2Tped         Convert VCF to TPED format.
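The Filter row's example expression combines three tests with a logical OR. To show what it checks on a single VCF data line, here is a hand-written Python equivalent. It is not SnpSift's expression engine, and it rests on assumptions: `INDEL` is treated as an INFO flag, `countHet()` is taken to count samples whose GT has two distinct called alleles, and a missing QUAL ('.') fails `QUAL > 30`. All helper names are hypothetical.

```python
import re
from typing import Dict, List, Optional


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Parse an INFO column into a dict; flag entries (no '=') map to None."""
    result: Dict[str, Optional[str]] = {}
    if info == ".":
        return result
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        result[key] = value if sep else None
    return result


def is_het(gt: str) -> bool:
    """True if a GT value has two or more distinct called alleles, e.g. '0/1' or '1|2'."""
    alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
    return len(alleles) >= 2 and len(set(alleles)) > 1


def count_het(fields: List[str]) -> int:
    """Count samples whose GT sub-field is heterozygous (assumed countHet() semantics)."""
    if len(fields) <= 9:
        return 0  # no FORMAT/sample columns
    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys:
        return 0
    gt_idx = fmt_keys.index("GT")
    count = 0
    for sample in fields[9:]:
        values = sample.split(":")
        if gt_idx < len(values) and is_het(values[gt_idx]):
            count += 1
    return count


def example_filter(line: str) -> bool:
    """Hand-coded equivalent of (QUAL > 30) | (exists INDEL) | (countHet() < 2)."""
    fields = line.rstrip("\n").split("\t")
    qual = fields[5]
    high_qual = qual != "." and float(qual) > 30
    has_indel = "INDEL" in parse_info(fields[7])
    return high_qual or has_indel or count_het(fields) < 2
```

Writing the predicate out this way makes it clear why such expressions are evaluated per record: each line can be accepted or rejected without looking at any other line.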
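The TsTv row reports a transition-to-transversion ratio. A transition exchanges a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T); every other single-base substitution is a transversion. The Python sketch below computes one overall ratio from (REF, ALT) pairs. It illustrates the definition only and does not reproduce SnpSift's output; it skips indels, MNPs, non-ACGT bases and multi-allelic ALT fields rather than splitting them. The function names are hypothetical.

```python
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


def ts_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Return transitions / transversions over (REF, ALT) pairs.

    Only single A/C/G/T bases are members of BASES, so indels, MNPs,
    multi-allelic ALTs such as 'G,T', and other symbols are skipped.
    """
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        # Undefined ratio: infinite if there were transitions, NaN if no SNVs at all.
        return float("inf") if ts > 0 else float("nan")
    return ts / tv
```

For instance, the pairs (A, G), (C, T), (A, C), (G, T) give two transitions and two transversions, so the ratio is 1.0. Returning `inf`/`nan` for an empty denominator is a design choice of this sketch; a report could print the raw counts instead.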
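The CaseControl row mentions p-values from Fisher's exact test. For a 2x2 table with fixed row and column totals, the top-left count follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of all tables that are no more likely than the observed one. The Python sketch below implements that test for an assumed layout (rows: cases / controls; columns: carriers / non-carriers). How SnpSift actually builds its table, and which genetic model or sidedness it uses, is not described on this page, so treat this only as an illustration of the statistic.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """P(top-left cell == a) for a 2x2 table with fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the table [[a, b], [c, d]].

    Assumed layout: rows are cases / controls, columns are carriers / non-carriers.
    Sums the probabilities of all same-margin tables no more likely than the observed one.
    """
    row1 = a + b  # e.g. number of cases
    col1 = a + c  # e.g. number of carriers
    n = a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo = max(0, row1 + col1 - n)  # smallest feasible top-left count
    hi = min(row1, col1)          # largest feasible top-left count
    p = 0.0
    for x in range(lo, hi + 1):
        px = hypergeom_pmf(x, row1, col1, n)
        if px <= p_obs * (1 + 1e-7):  # small relative tolerance for floating-point ties
            p += px
    return min(p, 1.0)
```

As a check, the classic "lady tasting tea" table `fisher_exact_two_sided(3, 1, 1, 3)` gives 34/70, about 0.486.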
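The Variant type row lists five classes. One simple rule that yields those labels compares REF and ALT lengths: equal length 1 is a SNP, equal length greater than 1 is an MNP, a longer ALT is an insertion, a shorter ALT is a deletion, and ALT alleles of different classes make the record MIXED. This is an assumed, simplified rule for illustration only; SnpSift's own logic may treat padding bases, symbolic alleles such as `<DEL>` and the HOM/HET annotation differently, and none of those are handled here. The function names are hypothetical.

```python
def allele_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths (simplified rule)."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INS" if len(alt) > len(ref) else "DEL"


def variant_type(ref: str, alt_field: str) -> str:
    """Classify a VCF record; comma-separated ALT alleles of different types give MIXED."""
    types = {allele_type(ref, alt) for alt in alt_field.split(",")}
    return types.pop() if len(types) == 1 else "MIXED"
```

For example, `variant_type("A", "G,AT")` returns "MIXED" because one ALT allele is a SNP and the other an insertion.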
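The RmRefGen row replaces '0/0' genotypes with '.'. A minimal Python sketch for a single VCF line follows. It assumes the whole sample column is replaced (SnpSift might instead blank only the GT sub-field), it relies on the VCF specification's rule that GT, when present, is the first FORMAT sub-field, and it leaves phased '0|0' calls untouched to match the literal '0/0' in the description. The function name `rm_ref_genotypes` is hypothetical.

```python
def rm_ref_genotypes(line: str) -> str:
    """Replace sample columns whose GT is '0/0' with '.' in one VCF line.

    Header lines (starting with '#') are returned unchanged. Per the VCF spec,
    GT is the first FORMAT sub-field when present.
    """
    if line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    # Columns 0-7 are fixed fields, 8 is FORMAT, 9+ are samples.
    if len(fields) > 9 and fields[8].split(":")[0] == "GT":
        for i in range(9, len(fields)):
            if fields[i].split(":")[0] == "0/0":
                fields[i] = "."
    return "\t".join(fields) + newline
```

Applied line by line, as in a streaming loop, this shrinks files where most samples are homozygous reference at most sites.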

Citing SnpSift

In order to cite SnpSift, please use the following example.

Source code

The project is hosted at GitHub.