We focus on annotating features of genomic sequences and deploying these annotations to users.

The Genome Assembly and Annotation section of EMBL-EBI brings together key reference resources in the field of genomics:

  • Ensembl (http://www.ensembl.org) was created alongside the publication of the first draft of the human genome in 2001 to distribute this goldmine of information to scientists across the world. It now annotates this very long sequence with functional information, such as the genomic location of genes, evolutionarily conserved elements, known variants and gene regulatory elements, across a broad range of species.
  • MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past two years, MGnify (formerly EBI Metagenomics) has more than doubled the number of publicly available analysed datasets held within the resource.
  • The HUGO Gene Nomenclature Committee (HGNC) and its sibling project, the Vertebrate Gene Nomenclature Committee (VGNC), are jointly responsible for defining the official names of genes in human, chimpanzee, dog, cow, horse, macaque, cat and pig. This official nomenclature ensures that studies and results on the same gene can easily be aggregated.
  • The Genome-Wide Association Study (GWAS) Catalog collects all available common disease studies across cohorts of patients. These datasets allow researchers to correlate genotype with phenotype and to better understand the mechanisms underlying disease as well as the role of individual human genes.

Given the rapid pace at which genomics and sequencing data are generated, we support a fast-evolving software stack and are constantly investigating new solutions for data storage, processing, distribution and display. We have a large compute cluster and deep knowledge of HPC, as well as access to both our own Embassy Cloud and other cloud providers.



  • python
  • rust
  • pytorch
  • javascript
  • mysql



Genome Assembly and Annotation 2021 Projects

  • Ank
    An orchestration system for MGnify running on distributed heterogeneous compute clusters
    MGnify is a freely available online service hosted by the European Bioinformatics Institute (EMBL-EBI). It helps researchers to do exploration and...
  • Aidan Marshall
    Deep learning homology inference
    Many genes both within and across species share a common origin. Homology inference is concerned with disentangling the precise nature of this...
  • Rishab Mallick
    Extract important information from scientific papers
    A current limitation of variant detection using wbtools (and of entity extraction in WormBase’s AFP pipeline) is that it relies on regular...