Contributor
Konstantinos Kyriakidis

Development of a Long-read Nanopore RNA-seq pipeline in GenPipes


Mentors
Jose Hector Galvez
Organization
Canadian Centre for Computational Genomics

The majority of contemporary genomics and transcriptomics research is carried out using short-read technology, such as the reads produced by Illumina sequencers. However, newer long-read technologies such as PacBio and Oxford Nanopore (ONT) are becoming more prevalent because of the advantages they offer over short reads. The main advantage of long reads is that each read spans a much larger portion of the genome or transcriptome, which makes it easier to detect events such as structural variants or isoforms. As more researchers begin to take advantage of long reads, GenPipes needs to evolve and support long-read technology in its main pipelines. The objective of this project is to create a new RNA-seq pipeline (rnaseq_longreads) that supports long-read inputs. The new pipeline will be based on the current versions of the RNA-seq pipeline, with the addition of the minimap2 aligner as well as StringTie v2 and Ballgown for generating transcript counts. In addition to implementing a long-read RNA-seq pipeline within the GenPipes framework, the project will update the short-read RNA-seq pipeline to use StringTie v2, and both pipelines will be tested to ensure that they work properly.
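As a rough illustration of the tool chain described above, the following sketch builds the command lines for a minimap2 alignment followed by a StringTie v2 run. It is not GenPipes code: the function names, file names, and default thread count are hypothetical, and the choice of flags (minimap2's `-ax splice` preset, StringTie's `-L` long-read mode, and `-e -B` to write Ballgown input tables) is an assumption about how such a pipeline might be wired, not a description of the final implementation.

```python
from typing import List


def minimap2_cmd(reference: str, reads: str, threads: int = 4) -> List[str]:
    """Build a spliced long-read alignment command (SAM goes to stdout)."""
    # "-ax splice" selects minimap2's preset for spliced alignment with SAM output.
    return ["minimap2", "-ax", "splice", "-t", str(threads), reference, reads]


def stringtie_cmd(bam: str, annotation: str, out_gtf: str,
                  long_reads: bool = True, ballgown: bool = True) -> List[str]:
    """Build a StringTie v2 command for a sorted BAM file."""
    cmd = ["stringtie", bam]
    if long_reads:
        # "-L" switches StringTie v2 to long-read mode (assumed for ONT input).
        cmd.append("-L")
    cmd += ["-G", annotation, "-o", out_gtf]
    if ballgown:
        # "-e -B" restricts estimation to the reference transcripts and writes
        # the table files that Ballgown reads.
        cmd += ["-e", "-B"]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(minimap2_cmd("genome.fa", "sample.fastq")))
    print(" ".join(stringtie_cmd("sample.sorted.bam", "genes.gtf", "sample.gtf")))
```

Calling `stringtie_cmd` with `long_reads=False` gives the corresponding short-read invocation, which mirrors the plan to move the existing short-read pipeline to StringTie v2 as well.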