Skip to content

SnpSift GWAS Catalog

Annotate variants with information from the GWAS Catalog.

Usage

java -jar SnpSift.jar gwasCat -genome GRCh37 [-db path/to/gwascatalog.txt] [file.vcf] > output.vcf

The command requires a genome version (e.g. -genome GRCh37) and a GWAS Catalog database file. The database can be specified via -db or configured in snpEff.config. Default VCF input is STDIN.

Database

You need the GWAS Catalog file in the "alternative" tab-delimited format, which can be downloaded from EBI.

If the database path is configured in snpEff.config (e.g. database.gwascatalog.GRCh37), SnpSift will find it automatically. Otherwise, specify the path with -db.

How it works

The GWAS Catalog is loaded into memory and indexed by chromosomal position using an interval tree. For each variant in the VCF file, the command matches by exact genomic position (chromosome and position), not by rsID.

Chromosome names are normalized (e.g. 'chr1' and '1' are treated the same). Large structural variants are skipped during annotation.

Output fields

The command adds up to five INFO fields to matching VCF entries:

Field Description
GWASCAT_TRAIT Associated trait (e.g. "Body_mass_index")
GWASCAT_P_VALUE p-value from the association study
GWASCAT_OR_BETA Odds ratio or Beta coefficient (only added when > 0)
GWASCAT_REPORTED_GENE Gene reported in the original study
GWASCAT_PUBMED_ID PubMed ID of the original paper

When multiple GWAS Catalog entries match a variant, values are comma-separated. All VCF entries are output (both annotated and non-annotated).

Example

$ java -jar SnpSift.jar gwasCat -genome GRCh37 -db gwascatalog.txt test.vcf > test.gwas.vcf

$ grep GWASCAT test.gwas.vcf | head -5
1   1005806 rs3934834   C   T   .   PASS    AF=0.091;GWASCAT_TRAIT=Body_mass_index;GWASCAT_P_VALUE=1.0E-6;GWASCAT_REPORTED_GENE=AGRN;GWASCAT_PUBMED_ID=25673413
1   2069172 rs425277    C   T   .   PASS    AF=0.400;GWASCAT_TRAIT=Height;GWASCAT_P_VALUE=2.0E-8;GWASCAT_REPORTED_GENE=PRKCZ;GWASCAT_PUBMED_ID=20881960
1   2392648 rs2477686   G   C   .   PASS    AF=0.745;GWASCAT_TRAIT=Non_obstructive_azoospermia;GWASCAT_P_VALUE=3.0E-7;GWASCAT_REPORTED_GENE=MORN1;GWASCAT_PUBMED_ID=24532628
1   2513216 rs734999    C   T   .   PASS    AF=0.547;GWASCAT_TRAIT=Ulcerative_colitis;GWASCAT_P_VALUE=5.0E-13;GWASCAT_REPORTED_GENE=TNFRSF14;GWASCAT_PUBMED_ID=21297633
1   3083712 rs2651899   T   C   .   PASS    AF=0.467;GWASCAT_TRAIT=Migraine;GWASCAT_P_VALUE=1.0E-11;GWASCAT_REPORTED_GENE=PRDM16;GWASCAT_PUBMED_ID=23793025