SnpSift
SnpSift is a toolbox that allows you to filter and manipulate annotated files.
Once your genomic variants have been annotated, you need to filter them to find the interesting or relevant ones. Because the data files are large, this is not a trivial task (e.g. you cannot load all the variants into an XLS spreadsheet). SnpSift performs the VCF file manipulation and filtering required at this stage of a data processing pipeline.
Download and install
SnpSift is part of the main SnpEff distribution, so please follow the SnpEff instructions on how to download and install it.
SnpSift utilities
SnpSift is a collection of tools to manipulate VCF (Variant Call Format) files.
Some examples of what you can do:
| Operation | Meaning |
|---|---|
| Annotate | Add 'ID' and INFO fields from another VCF database (e.g. dbSNP). Assumes entries are sorted. |
| Annotate (mem) | Annotate from a database created from a VCF file, loaded into memory. |
| CaseControl | Compare how many variants are in 'case' and in 'control' groups; calculate p-values (Fisher exact test). |
| Concordance | Concordance metrics between two VCF files. |
| DbNSFP | Annotate using dbNSFP, an integrated database of functional predictions from multiple algorithms (SIFT, PolyPhen2, LRT, MutationTaster, PhyloP, GERP++, etc.). |
| Extract fields | Extract fields from a VCF file into tab-separated format. |
| Filter | Filter using arbitrary expressions, e.g. "(QUAL > 30) | (exists INDEL) | ( countHet() < 2 )". |
| GeneSets | Annotate using MSigDB gene sets (GO, KEGG, Reactome, BioCarta, etc.). |
| GT | Compress genotype fields to reduce VCF file size in large sequencing projects. |
| GWAS Catalog | Annotate using GWAS Catalog. |
| Intersect | Intersect intervals from multiple files to find consensus regions (e.g. ChIP-Seq peaks). |
| Intervals | Filter variants that intersect with intervals defined in BED files. |
| Intervals Index | Filter variants that intersect with intervals. Uses file indexing for fast random access; intended for huge VCF files and a small number of intervals. |
| Join | Join files by genomic region (intersecting or closest). |
| PhastCons | Annotate using conservation scores (phastCons). |
| Private | Annotate if a variant is private to a family or group. |
| RmInfo | Remove INFO fields from a VCF file. |
| RmRefGen | Remove reference genotypes (replace '0/0' genotypes by '.'). |
| Split | Split a VCF file by chromosome. |
| TsTv | Calculate transition to transversion ratio. |
| Variant type | Annotate variant type (SNP, MNP, INS, DEL, or MIXED). Also adds HOM/HET if there is only one sample. |
| VcfCheck | Check that a VCF file is well formed. |
| Vcf2Tped | Convert VCF to TPED format. |
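To make a few of the operations in the table more concrete, here is a minimal Python sketch of the ideas behind Variant type, TsTv and RmRefGen. It is only an illustration and not SnpSift's implementation: the function names are hypothetical, the variant classification is a simplified length-based rule that leaves out MIXED and multi-allelic records, and the genotype replacement handles only the unphased '0/0' form mentioned in the table.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT part of SnpSift.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def variant_type(ref, alt):
    """Classify one REF/ALT pair by allele length (simplified; MIXED is not handled)."""
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INS" if len(alt) > len(ref) else "DEL"


def is_transition(ref, alt):
    """True for purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Transition/transversion ratio over a list of (ref, alt) single-base changes."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


def remove_ref_genotypes(genotypes):
    """Replace homozygous-reference genotypes ('0/0') with '.', as RmRefGen is described."""
    return ["." if gt == "0/0" else gt for gt in genotypes]


# Example usage with made-up data
print(variant_type("A", "ATG"))                       # one inserted allele
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))  # 2 transitions, 1 transversion
print(remove_ref_genotypes(["0/0", "0/1", "1/1"]))
```

For real data, the SnpSift tools themselves should be used; they work on full VCF records (headers, INFO fields, multiple samples), which this sketch ignores.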
Citing SnpSift
In order to cite SnpSift, please use the following example.
Source code
The project is hosted on GitHub.