vcf-validator is a suite of command-line tools that validates and fixes VCF files. The goal of my project is to overcome the limitations that make the suite less suitable for users with a more biological, less technical background. I would work on the following tasks:
The suite is hard to compile on non-Linux operating systems. As the number of Windows and macOS users grows, they need a simple, concise way of building the tools. To accomplish this, I aim to simplify the build process for Windows and macOS.
Currently, the suite is entirely terminal-based: it must be installed and run on the user's machine, and it can only read input from, and write reports to, local files. To address this, I aim to provide a network interface that runs the suite as a service, allowing users to validate their own remote files or a dynamically generated VCF stream.
If the input VCF is compressed, the user currently has to decompress it before validation. I aim to remove this extra step by making the validator itself capable of reading compressed files.
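As a rough illustration of the idea, and not the validator's actual implementation, the compression format can be detected from the first bytes of the file rather than from its extension. The sketch below is in Python for brevity. The function name `open_vcf` is hypothetical, and the choice of gzip, bzip2 and xz as supported formats is an assumption made for the example.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes of each format handled in this sketch (an assumed set).
MAGIC_OPENERS = (
    (bytes.fromhex("1f8b"), gzip.open),          # gzip, which also covers bgzip output
    (b"BZh", bz2.open),                          # bzip2
    (bytes.fromhex("fd377a585a00"), lzma.open),  # xz
)


def open_vcf(path):
    """Return a text-mode handle on `path`, decompressing transparently.

    Hypothetical helper: the format is chosen by sniffing the first bytes
    of the file, so a misleading file extension does not matter.
    """
    with open(path, "rb") as raw:
        head = raw.read(6)
    for magic, opener in MAGIC_OPENERS:
        if head.startswith(magic):
            return opener(path, "rt", encoding="utf-8")
    # No known signature: treat the input as plain, uncompressed text.
    return open(path, "r", encoding="utf-8")
```

With a helper like this, the rest of the validation code could read lines from the returned handle without knowing whether the input was compressed.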
I would also investigate how to insert checksums of the reference genome sequences into the VCF header.
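One possible starting point for this investigation is the optional `md5` key that the VCF specification allows in `##contig` header lines. The Python sketch below shows how such a line could be produced from a reference sequence. The function names are hypothetical, and the normalization (whitespace removed, bases upper-cased before hashing) is an assumption borrowed from the convention used for the `M5` tag in SAM headers. Whether this is the right checksum scheme for the suite is exactly what remains to be investigated.

```python
import hashlib


def normalize_sequence(sequence):
    """Strip all whitespace (including newlines) and upper-case the bases."""
    return "".join(sequence.split()).upper()


def sequence_md5(sequence):
    """MD5 hex digest of the normalized sequence (assumed convention)."""
    return hashlib.md5(normalize_sequence(sequence).encode("ascii")).hexdigest()


def contig_header_line(contig_id, sequence):
    """Build a ##contig header line carrying the sequence length and checksum."""
    length = len(normalize_sequence(sequence))
    return f"##contig=<ID={contig_id},length={length},md5={sequence_md5(sequence)}>"
```

Because the sequence is normalized first, FASTA line wrapping and soft-masked (lower-case) bases would not change the resulting checksum.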