Contributor
Rohit Shrivastava

Accessing Ensembl data with Presto and AWS Athena


Mentors
Andy Yates
Organization
Genome Assembly and Annotation
Technologies
python, javascript, html, sql, aws, SQL and database querying, Web API design, HTML & JavaScript, AWS Athena, Presto
Topics
web, bioinformatics, genome, cloud, data, AWS, Genomic Data, Ensembl, BioMart, Gene, Genome database
The goal of this project is to build a next-generation replacement for the BioMart tool, which provides a way to download custom reports of genes, transcripts, proteins and other data types. Given the very large volumes of data involved in genomic studies, the current tool supports only a limited range of use cases because of scalability issues. The new tool will use newer technologies such as AWS Athena (built on Presto) and the columnar file formats Parquet and ORC to build a scalable solution.
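
As a rough illustration of the idea, a BioMart-style custom report can be thought of as a SQL query over columnar files, submitted to Athena. The sketch below is not the project's implementation. The table name `gene`, the column names, the helper names `build_report_query` and `submit_to_athena`, and the database and output location arguments are all hypothetical. Only the boto3 `start_query_execution` call is an existing API.

```python
import re

# Hypothetical schema: the table and column names used below are assumptions
# for illustration and may not match any real Ensembl export.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_report_query(table, columns, filters=None, limit=None):
    """Build a SELECT statement for a custom report (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column is required")
    # Reject anything that is not a plain identifier, since identifiers
    # are interpolated directly into the SQL text.
    for name in [table, *columns, *(filters or {})]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if filters:
        clauses = []
        for column, value in filters.items():
            # Simplification: every filter value is treated as a string literal.
            # Single quotes are escaped by doubling them.
            escaped = str(value).replace("'", "''")
            clauses.append(f"{column} = '{escaped}'")
        sql += " WHERE " + " AND ".join(clauses)
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def submit_to_athena(sql, database, output_location):
    """Submit the query to Athena and return its query execution ID."""
    import boto3  # imported lazily so the query builder runs without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


query = build_report_query(
    table="gene",
    columns=["stable_id", "biotype"],
    filters={"seq_region_name": "17"},
    limit=100,
)
print(query)
```

Because Parquet and ORC store data by column, a query like this that selects only a few columns would let the engine skip reading the rest, which is one reason such formats suit report-style workloads.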