Genome Assembly and Annotation
Providing freely accessible genomic data
The Genome Assembly and Annotation section of EMBL-EBI brings together key reference resources in the field of genomics:
- Ensembl (http://www.ensembl.org) was created in 1999 in preparation for the publication of the first draft of the human genome, to allow researchers and clinicians to start translating the secrets hidden within the human genome into real-world applications. Ensembl has grown into a champion of biodiversity, providing data for tens of thousands of species across our vertebrate, metazoa, plant, fungi and bacterial divisions.
- MGnify (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the assembly, analysis and archiving of microbiome data derived from sequencing microbial populations that are present in particular environments. Over the past two years, MGnify has more than doubled the number of publicly available analysed datasets held within the resource.
- WormBase (https://wormbase.org/) is one of the world's oldest active bioinformatics resources, at more than 20 years old. We scan all published literature and datasets on the model organism C. elegans to create a comprehensive resource covering genomics, strains, experiments, papers and people, with the aim of accelerating research and discovery in fundamental biology as well as human health.
- The HUGO Gene Nomenclature Committee (HGNC) and its sibling project, the Vertebrate Gene Nomenclature Committee (VGNC), are jointly responsible for defining the official names of genes in human and key vertebrate species. This official nomenclature ensures that studies and results on the same gene can easily be aggregated.
Given the rapid pace of generation of genomics and sequencing data, we support a fast-evolving software stack, and are constantly investigating new solutions for data storage, processing, distribution and display.
Please visit our projects page for ideas on potential GSoC projects:
https://www.ensembl.info/about/projects/
2023 Program
Successful Projects
Contributor
Satya.Adda
Mentor
Jorge Alvarez, Sarah Dyer
Organization
Genome Assembly and Annotation
Expand the species search functionality for the Ensembl beta website (Metazoa)
The objective of this project is to create a standalone Elasticsearch tool that can handle taxonomic-related requests. This tool helps to expand the...
Contributor
Amartya Nambiar
Mentor
Martin Beracochea, sandyr
Organization
Genome Assembly and Annotation
Interactive Visualization for Comparative Metagenomics in MGnify
The project aims to improve the visualisation tools for metagenomics data in the MGnify platform by identifying and using new technologies that can...
Contributor
Kenny Lam
Mentor
Jose Gonzalez, Adam Frankish
Organization
Genome Assembly and Annotation
Differentiating Real and Misaligned Introns with Machine Learning
The advancement in the accuracy of long-read sequencing technology has allowed us to explore novel transcript variants of known genes. Preventing...
Contributor
Purav Biyani
Mentor
Fergal Martin, Leanne Haggerty, Thiago Genez
Organization
Genome Assembly and Annotation
A Nextflow Pipeline for Repeat Annotation
My proposal is to develop a Nextflow pipeline that will efficiently and accurately perform repeat annotation and masking on large genome sequences...
Contributor
Friederike Biermann
Mentor
Fergal Martin, Leanne Haggerty
Organization
Genome Assembly and Annotation
Using Deep Learning to Identify Features of Protein-Coding Genes
Accurate gene annotation in eukaryotes solely based on genomic data has been a significant obstacle in biology since the introduction of...