SnpSift Split
Splits (or joins) VCF files. It can create one file per chromosome or one file every N lines.
A typical usage for this command is to:
- Split very large VCF files
SnpSift split huge.vcf
- Perform some CPU intensive processing in parallel using several computers or cores
- Join the resulting VCF files
SnpSift split -j huge.000.vcf huge.001.vcf huge.002.vcf ... > huge.out.vcf
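The split, process, join workflow above can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not part of SnpSift: `process_chunk` is a hypothetical placeholder for the CPU-intensive step, `SNPSIFT_JAR` and the `processed/` directory are names chosen for illustration, and the input is assumed to be an uncompressed file such as `huge.vcf`, so that the split files match `huge.*.vcf`.

```shell
#!/bin/sh
# Sketch: split a VCF, process the pieces in parallel, then join the results.
# Assumption: SnpSift.jar is in the current directory (override with SNPSIFT_JAR).
SNPSIFT_JAR="${SNPSIFT_JAR:-SnpSift.jar}"

# Hypothetical per-chunk step: replace 'cat' with the real CPU-intensive command.
# Reads the chunk "$1" and writes the result to the file "$2".
process_chunk() {
  cat "$1" > "$2"
}

# Usage: split_process_join huge.vcf joined.vcf
# Choose an output name that does not match <base>.*.vcf, so a rerun
# does not pick up the joined file as if it were a chunk.
split_process_join() {
  input="$1"
  output="$2"
  base="${input%.vcf}"   # huge.vcf -> huge
  outdir="processed"     # illustrative directory for the processed chunks

  mkdir -p "$outdir"

  # 1. Split: one file per chromosome (huge.1.vcf, huge.2.vcf, ...)
  java -jar "$SNPSIFT_JAR" split "$input"

  # 2. Process each chunk as a background job, then wait for all of them
  for chunk in "$base".*.vcf; do
    process_chunk "$chunk" "$outdir/$(basename "$chunk")" &
  done
  wait

  # 3. Join the processed chunks (files are passed in shell glob order)
  java -jar "$SNPSIFT_JAR" split -j "$outdir"/*.vcf > "$output"
}
```

On a cluster, the loop body would submit each chunk as a separate job instead of starting a background process on the local machine.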
E.g.: Splitting a VCF file containing human variants:
java -jar SnpSift.jar split myHugeVcf.vcf.gz
Will create files myHugeVcf.1.vcf, myHugeVcf.2.vcf, ... , myHugeVcf.22.vcf, myHugeVcf.X.vcf, myHugeVcf.Y.vcf
You can also specify the '-l' command line option to split the file every N lines.
E.g.: Split a VCF file every 10,000 lines:
java -jar SnpSift.jar split -l 10000 myHugeVcf.vcf.gz
Will create files myHugeVcf.001.vcf, myHugeVcf.002.vcf, ...
Info
The VCF header is added to each file, so the resulting files will have more than 10,000 lines.
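To estimate how long each split file will be, count the header lines of the input and add the value given to '-l'. The following is a minimal sketch, assuming an uncompressed VCF file whose header lines start with '#', and assuming each split file holds up to N variant lines plus the header. The function names are hypothetical helpers, not SnpSift commands.

```shell
# Count the header lines (lines starting with '#') of an uncompressed VCF file.
count_header_lines() {
  grep -c '^#' "$1"
}

# Expected maximum number of lines in a split file: header lines plus N data lines.
# Usage: expected_chunk_lines myHugeVcf.vcf 10000
expected_chunk_lines() {
  echo $(( $(count_header_lines "$1") + $2 ))
}
```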
You can use the -j (join) command line option to join a set of VCF files.
java -jar SnpSift.jar split -j huge.000.vcf huge.001.vcf huge.002.vcf ... > huge.out.vcf
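As a sanity check after joining, the number of variant records in the joined file can be compared with the original. This is a minimal sketch, assuming uncompressed files and assuming the processing step between split and join does not add or remove variants. `count_records` and `same_record_count` are hypothetical helper names.

```shell
# Count variant records (non-header lines) in an uncompressed VCF file.
count_records() {
  grep -c -v '^#' "$1"
}

# Succeeds (exit status 0) if both files hold the same number of records.
# Usage: same_record_count huge.vcf huge.out.vcf && echo "record counts match"
same_record_count() {
  [ "$(count_records "$1")" -eq "$(count_records "$2")" ]
}
```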