The Variant Call Format (VCF) is a text file format for storing genetic variants. VCF files are generally stored compressed and indexed so that variants can be retrieved quickly. Redundant data is not stored: only the variations relative to a reference genome are recorded. VCF files can hold all variant types, including single nucleotide polymorphisms (SNPs) at specific positions in the genome, short insertions and deletions (INDELs), and structural variants (SVs). VCF-validator includes various checks to ensure that a VCF file is consistent. It is based on a formal grammar and performs lexical, syntactic, and semantic analysis of the file. It also includes a tool called VCF-debugulator, which automatically fixes errors such as duplicate variants. VCF-validator fully supports SNPs and INDELs, but its support for SVs is still limited. The aim of this project is to improve support for structural variants in both the validator and the debugulator.
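To make the variant types and the duplicate-variant check concrete, here is a minimal Python sketch. It is not the VCF-validator or VCF-debugulator implementation; the function names (`parse_variants`, `classify`, `find_duplicates`) are hypothetical, and the classification rules are deliberately simplified. It assumes tab-separated data lines with a single ALT allele, and it treats only symbolic alleles (such as `<DEL>`) and breakend notation as structural variants.

```python
from collections import Counter


def parse_variants(lines):
    """Yield (chrom, pos, ref, alt) for each data line, skipping header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # meta-information (##) and column header (#CHROM) lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        yield chrom, int(pos), ref, alt


def classify(ref, alt):
    """Roughly classify one REF/ALT pair (simplified, single ALT allele)."""
    # Symbolic alleles (<DEL>, <DUP>, ...) and breakends use '<', '[' or ']'.
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions


def find_duplicates(variants):
    """Return variants that appear more than once, in first-seen order."""
    counts = Counter(variants)
    return [variant for variant, n in counts.items() if n > 1]


# Hypothetical example data, not taken from a real file.
sample = [
    "##fileformat=VCFv4.3",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\t.\t.",
    "1\t200\t.\tAT\tA\t.\t.\t.",
    "1\t100\t.\tA\tG\t.\t.\t.",
    "2\t300\t.\tC\t<DEL>\t.\t.\tSVTYPE=DEL;END=900",
]

for chrom, pos, ref, alt in parse_variants(sample):
    print(chrom, pos, classify(ref, alt))

print("duplicates:", find_duplicates(parse_variants(sample)))
```

A real validator works from the full grammar of the specification and checks far more than this, for example the consistency of SV-specific INFO fields, which is the kind of support this project aims to extend.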


Anishka Gupta


  • Cristina Yenyxe Gonzalez
  • Jose Miguel Mut Lopez