Using Machine Learning to Identify and Classify Repeat Features
- Mentors
- Jose Perez-Silva, William Stark, Francesca Tricomi, Leanne Haggerty
- Organization
- Genome Assembly and Annotation
- Technologies
- Python, PyTorch
- Topics
- machine learning, bioinformatics, repeat sequences
A number of tools exist for identifying repeat features, but a persistent problem is that the DNA sequence of some genes can be misidentified as a repeat sequence. If such sequences are used to mask the genome, those genes may be missed in downstream annotation. Assuming that gene sequences carry signatures related to their function, and that repeats carry different signatures, including the repetitive nature of the signal itself, we want to train a classifier to separate repeat sequences from gene sequences.
This proposal is inspired by DETR, an object detection model. It will use a transformer architecture to identify repeat sequences, and the model will handle segmentation and classification in a single step, as object detection models do.
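To make the idea more concrete, here is a minimal sketch of how a DETR-style detector might be adapted to one-dimensional DNA input in PyTorch. It is not the project's actual design. The names `one_hot` and `RepeatDetector`, the two target classes (repeat, gene) plus a "no feature" class, the convolutional backbone, and every hyperparameter are assumptions made for illustration. Each learned query predicts one candidate feature as a class plus a span (centre and width, normalised to the sequence length). Training, which in DETR relies on a bipartite matching loss, is omitted.

```python
import torch
from torch import nn

BASES = "ACGT"


def one_hot(seq: str) -> torch.Tensor:
    """Encode a DNA string as a (4, length) tensor; unknown bases such as N stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = torch.zeros(len(BASES), len(seq))
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[index[base], pos] = 1.0
    return encoded


class RepeatDetector(nn.Module):
    """Hypothetical DETR-like model: each query predicts one (class, span) pair."""

    def __init__(self, num_classes=2, num_queries=10, d_model=64, max_len=512):
        super().__init__()
        # Assumed backbone: a single convolution that turns bases into local features.
        self.backbone = nn.Conv1d(len(BASES), d_model, kernel_size=9, padding=4)
        # Learned positional embedding, one vector per sequence position.
        self.pos_embed = nn.Parameter(torch.zeros(max_len, d_model))
        # Learned queries, playing the role of DETR's object queries.
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=4,
            num_encoder_layers=2,
            num_decoder_layers=2,
            dim_feedforward=128,
            batch_first=True,
        )
        # The extra class stands for "no feature", as in DETR's "no object" class.
        self.class_head = nn.Linear(d_model, num_classes + 1)
        # Span is predicted as (centre, width), squashed into [0, 1].
        self.span_head = nn.Linear(d_model, 2)

    def forward(self, x: torch.Tensor):
        # x has shape (batch, 4, length); length must not exceed max_len.
        feats = self.backbone(x).transpose(1, 2)  # (batch, length, d_model)
        feats = feats + self.pos_embed[: feats.size(1)]
        queries = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        decoded = self.transformer(feats, queries)  # (batch, num_queries, d_model)
        return self.class_head(decoded), self.span_head(decoded).sigmoid()


# Usage: two toy sequences of equal length (24 bases each).
model = RepeatDetector()
batch = torch.stack([
    one_hot("ACGT" * 4 + "NNNN" + "ACGT"),
    one_hot("T" * 24),
])
logits, spans = model(batch)
print(logits.shape, spans.shape)
```

With the default settings the model returns class logits of shape (batch, 10, 3) and spans of shape (batch, 10, 2). Predicting a span and a class from the same query is what lets segmentation and classification be learned jointly rather than in separate stages.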