Contributor: Rohan K

Data pipeline for exchange of genomic variation between public repositories

Mentors: Sundar Venkataraman, Cristina Yenyxe Gonzalez
Organization: Global Alliance for Genomics and Health

The main goal of this project is to implement a mechanism that keeps the EVA in sync with the latest human variant data submitted to dbSNP. Once imported, this information can be distributed through the EVA website and through the EVA implementations of the GA4GH htsget and Beacon API specifications.

Acceptance criteria

Given a dbSNP FTP directory containing the human variant information, the pipeline should parse the JSON file for each chromosome and write the variants it contains to the EVA archive.
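
As a rough illustration of this criterion, the sketch below walks a local copy of the directory, streams each per-chromosome file, and hands every parsed record to a writer. It assumes the files are line-delimited JSON (one record per line), optionally bzip2-compressed, and named like refsnp-chr1.json.bz2. The function names and the write_variant callback are hypothetical placeholders, not part of any EVA library.

```python
import bz2
import json
from pathlib import Path
from typing import Callable, Dict, Iterator


def iter_refsnp_records(path: Path) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of a per-chromosome file."""
    # Assumption: files are either plain text or bzip2-compressed.
    opener = bz2.open if path.suffix == ".bz2" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def run_pipeline(directory: Path, write_variant: Callable[[Dict], None]) -> int:
    """Parse every per-chromosome file in `directory` and pass each record on.

    `write_variant` is a hypothetical callback standing in for the step that
    converts a record and writes it to the archive. Returns the record count.
    """
    written = 0
    # Assumption: file names follow the pattern refsnp-chr<NAME>.json[.bz2].
    for path in sorted(directory.glob("refsnp-chr*.json*")):
        for record in iter_refsnp_records(path):
            write_variant(record)
            written += 1
    return written
```

Streaming line by line keeps memory use flat, which matters because the per-chromosome files can be very large.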

Needed tasks

Construct an object model for dbSNP and parse the JSON files into objects in that model
Convert the objects in the dbSNP model to the EVA variant object model
Assemble each variant object from the component objects produced above, then use the Variant writer of variation-commons to write the variants to the EVA archive
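
The tasks above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the JSON field names (refsnp_id, primary_snapshot_data, placements_with_allele, spdi) reflect the dbSNP RefSNP JSON layout as assumed here and should be checked against the dbSNP schema, and EvaVariant is a hypothetical, simplified stand-in for the variation-commons Variant class, whose real API is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DbsnpAllele:
    """One allele of a placement, in SPDI form (0-based position)."""
    seq_id: str
    position: int
    deleted: str
    inserted: str


@dataclass(frozen=True)
class DbsnpRefSnp:
    """Minimal dbSNP-side model: an rs identifier plus its placed alleles."""
    refsnp_id: str
    alleles: List[DbsnpAllele]


@dataclass(frozen=True)
class EvaVariant:
    """Hypothetical stand-in for the EVA variant model (not the real class)."""
    chromosome: str
    start: int  # 1-based
    reference: str
    alternate: str
    ids: Tuple[str, ...]


def parse_refsnp(record: Dict) -> DbsnpRefSnp:
    """Task 1: map one parsed JSON record onto the dbSNP model."""
    alleles = []
    snapshot = record.get("primary_snapshot_data", {})
    # Note: this collects alleles from every placement; a real pipeline
    # would likely keep only placements on the target assembly.
    for placement in snapshot.get("placements_with_allele", []):
        for entry in placement.get("alleles", []):
            spdi = entry.get("allele", {}).get("spdi")
            if spdi is None:
                continue  # skip alleles without an SPDI description
            alleles.append(DbsnpAllele(
                seq_id=spdi["seq_id"],
                position=int(spdi["position"]),
                deleted=spdi["deleted_sequence"],
                inserted=spdi["inserted_sequence"],
            ))
    return DbsnpRefSnp(refsnp_id=str(record["refsnp_id"]), alleles=alleles)


def to_eva_variants(refsnp: DbsnpRefSnp) -> List[EvaVariant]:
    """Tasks 2 and 3: convert dbSNP objects into assembled variant objects."""
    variants = []
    for allele in refsnp.alleles:
        if allele.deleted == allele.inserted:
            continue  # the reference allele describes no change
        variants.append(EvaVariant(
            chromosome=allele.seq_id,
            start=allele.position + 1,  # SPDI is 0-based; shift to 1-based
            reference=allele.deleted,
            alternate=allele.inserted,
            ids=("rs" + refsnp.refsnp_id,),
        ))
    return variants
```

The resulting list would then be handed to the Variant writer of variation-commons. Keeping the dbSNP model separate from the EVA model means changes to the dbSNP JSON layout only affect parse_refsnp.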