Genes, Genomes and Variation

Authoritative databases of genome sequences and their function

Technologies
python, mysql, javascript, perl, rest
Topics
databases, genomics, scientific visualization, data analysis
Ensembl was created alongside the publication of the first draft of the human genome in 2001 to distribute this goldmine of information to scientists across the world. It quickly became, and remains, one of the most important reference databases in genomics, following the rapid development of the field. Its initial mission included finding all of the genes in the human genome. A year later, the mouse genome was published and we developed tools to directly compare genomes across species. Over the following decade, sequencing capacity increased exponentially (faster than Moore's Law, in fact) and large surveys started examining more species and more individuals within each species. Our mission therefore expanded to store these datasets and statistics efficiently. Finally, in recent years, sequencing has been used to study the biochemical activity of the DNA molecule within the different tissues of an individual, prompting us to extend our remit yet again.

At the same time, Ensembl is an evolving software development project. Over 15 years, we moved from a central relational MySQL database with a Perl API and static web pages, to an array of storage technologies with a RESTful interface and an interactive front-end. We have dedicated portals for the large clades on the tree of life (known as Ensembl Genomes). Our annotations are produced through centuries of CPU time, coordinated by our powerful eHive analysis workflow manager.

Today, we are a team of nearly 90 full-time staff, housed at the European Bioinformatics Institute, and we collaborate with many external contributors around the world, in particular via our GitHub repositories, where you can see us work day-to-day. We are at the intersection of two exciting and rapidly expanding fields, and there is no lack of interesting directions in which to push the project.

2019 Program

Successful Projects

Contributor
Srijan Verma
Mentor
Daniel Zerbino
Organization
Genes, Genomes and Variation
Applying machine learning techniques to characterising and naming lncRNA genes
Advances in RNA sequencing technologies have revealed the complexity of our genome. Long non-coding RNAs (lncRNAs) make up the majority of the...
Contributor
Praduman Goyal
Mentor
Fergal Martin, Osagie Izuogu
Organization
Genes, Genomes and Variation
Circular RNA analytics frontend
The project aims to deliver a responsive web-based analytics dashboard, integrated to an in-house catalogue of circRNAs identified from multiple...
Contributor
Harshit Gupta-1
Mentor
Mateus Patricio, Matthieu Muffato
Organization
Genes, Genomes and Variation
Using Deep Learning Techniques To Enhance Orthology Calls
Homology refers to the shared ancestry between a pair of structures, organisms or genes, in different taxa. Currently, homology types are decided on...