Accessing Ensembl data with Presto and AWS Athena
- Mentors
- Andy Yates
- Organization
- Genome Assembly and Annotation
- Technologies
- python, javascript, html, sql, aws, SQL and database querying, Web API design, HTML & JavaScript, AWS Athena, Presto
- Topics
- web, bioinformatics, genome, cloud, data, AWS, Genomic Data, Ensembl, BioMart, Gene, Genome database
The goal of this project is to build a next-generation replacement for the BioMart tool, which lets users download custom reports of genes, transcripts, proteins and other data types. Given the volume of data involved in genomic studies, the current tool scales poorly, and this limits the use cases it can serve. The new tool will use technologies such as AWS Athena (built on Presto) and the columnar file formats Parquet and ORC to build a scalable solution.
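To make the idea of a custom report concrete, here is a minimal sketch of how a report request might be turned into a SQL query of the kind Athena and Presto accept. The table name `gene`, the column names, and the helper `build_report_query` are hypothetical and chosen only for illustration; the project description does not specify a schema or an API.

```python
import re

# Only plain identifiers are accepted for table and column names.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_identifier(name):
    """Return name unchanged if it is a plain SQL identifier, else raise."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def _quote_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_report_query(table, columns, filters=None, limit=None):
    """Build a SELECT statement for a custom report (hypothetical helper).

    table   -- name of the table to read (assumed schema)
    columns -- list of column names to include in the report
    filters -- optional dict of column -> required value, combined with AND
    limit   -- optional maximum number of rows
    """
    cols = ", ".join(_check_identifier(c) for c in columns)
    sql = f"SELECT {cols} FROM {_check_identifier(table)}"
    if filters:
        conditions = [
            f"{_check_identifier(col)} = {_quote_literal(val)}"
            for col, val in filters.items()
        ]
        sql += " WHERE " + " AND ".join(conditions)
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


# Example: protein-coding human genes, using an assumed "gene" table.
query = build_report_query(
    "gene",
    ["stable_id", "name", "biotype"],
    {"species": "homo_sapiens", "biotype": "protein_coding"},
    limit=100,
)
print(query)
```

A string built this way could then be submitted to Athena, for example through the boto3 Athena client's `start_query_execution` call. Because the report selects only the columns it needs, a columnar format such as Parquet or ORC allows the engine to read just those columns rather than whole rows.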