Contributor
Yantong

Using Machine Learning to Identify and Classify Repeat Features


Mentors
Jose Perez-Silva, William Stark, Francesca Tricomi, Leanne Haggerty
Organization
Genome Assembly and Annotation
Technologies
python, pytorch
Topics
machine learning, bioinformatics, repeat sequences
A number of tools exist for identifying repeat features, but a persistent problem is that the DNA sequence of some genes can be misidentified as a repeat sequence. If such sequences are used to mask the genome, genes may be missed in downstream annotation. Assuming that gene sequences carry signatures related to their function, and that repeats carry different signatures, including the repetitive nature of the signal itself, we want to train a classifier to separate repeat sequences from gene sequences. Inspired by DETR, an object detection model, this proposal uses a transformer architecture to identify repeat sequences. As in object detection, the model unifies segmentation and classification into a single task.
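To make the "segmentation and classification in one" idea concrete, here is a minimal sketch of how DETR-style output could be consumed for a DNA window. It is not the project's model. It assumes each prediction is a normalised (center, width) span plus a class label and score, by analogy with DETR's bounding boxes. The label set, the `SpanPrediction` type, the `min_score` threshold and both helper functions are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass

# Hypothetical label set; the proposal does not fix one.
LABELS = ("no_object", "repeat", "gene")


@dataclass
class SpanPrediction:
    center: float  # span centre as a fraction of the window length, in [0, 1]
    width: float   # span width as a fraction of the window length, in [0, 1]
    label: str     # one of LABELS
    score: float   # classifier confidence, in [0, 1]


def to_interval(pred, window_len):
    """Convert a normalised (center, width) span to a half-open [start, end)."""
    start = round((pred.center - pred.width / 2) * window_len)
    end = round((pred.center + pred.width / 2) * window_len)
    # Clamp to the window so out-of-range predictions cannot index past it.
    return max(0, start), min(window_len, end)


def soft_mask_repeats(sequence, predictions, min_score=0.5):
    """Lower-case only the bases covered by confident 'repeat' spans.

    Spans labelled 'gene' or 'no_object' are left untouched, so a gene
    is not masked just because it resembles a repeat.
    """
    bases = list(sequence.upper())
    for pred in predictions:
        if pred.label != "repeat" or pred.score < min_score:
            continue
        start, end = to_interval(pred, len(bases))
        for i in range(start, end):
            bases[i] = bases[i].lower()
    return "".join(bases)


# Made-up example: a 20-base window with three hand-written predictions.
window = "ACGTACGTACGTACGTACGT"
predictions = [
    SpanPrediction(center=0.2, width=0.4, label="repeat", score=0.9),
    SpanPrediction(center=0.75, width=0.3, label="gene", score=0.8),
    SpanPrediction(center=0.5, width=0.2, label="no_object", score=0.99),
]
masked = soft_mask_repeats(window, predictions)
print(masked)
```

Because each span carries its own label, locating a feature and deciding whether it is a repeat or a gene happen in the same prediction, which is the property the proposal borrows from object detection.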