Advantages of long-read sequencing for mRNA profiling of biological systems

By Nucleati Team

mRNA sequencing (mRNA-Seq) profiles the transcriptome of disease states and biological processes and is a precise method for measuring gene expression. Unlike the short reads typically produced by next-generation transcriptome sequencing, long reads help detect full-length transcripts, gene fusions, and allele-specific expression. In this blog post, we describe several advantages of long-read over short-read sequencing for transcriptome analysis.

PacBio's SMRT sequencing uses DNA polymerase to perform uninterrupted template-directed synthesis. Because it requires no PCR amplification step, it avoids amplification bias and provides more uniform coverage across the transcriptome. It produces long reads with average lengths of 4,200 to 8,500 bp, which improves the detection of novel transcript structures.

The transcriptome of a species is highly complex and contains multiple types of coding and noncoding RNAs. Long-read technologies such as Pacific Biosciences (PacBio) single-molecule real-time (SMRT) sequencing and Oxford Nanopore Technologies (ONT) nanopore sequencing are steadily improving in accuracy and throughput. Long reads provide an accurate, high-resolution view of the transcriptome and support isoform identification. They also help quantify overlapping transcripts and increase the percentage of alignable reads. Beyond gene expression quantification, long-read sequencing provides full-length transcript resolution and enables the identification of previously unknown genes and alternatively spliced transcripts.

Compared to short reads, long reads produced by third-generation sequencers tend to share longer overlaps with one another. As a result, reassembling the sequenced fragments in their proper order is more straightforward. Long reads are also more likely to span repetitive regions, enabling whole-transcript assemblies with fewer gaps.
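To make the overlap idea concrete, here is a minimal Python sketch of the suffix-prefix overlap check that overlap-based assemblers build on. It is an illustration only: the function names, the `min_len` threshold, and the toy sequences are our own assumptions, and production assemblers use error-tolerant alignment rather than exact string matching.

```python
from typing import Optional


def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` bases exists.
    (Hypothetical helper for illustration; exact matching only.)
    """
    max_len = min(len(a), len(b))
    # Try the longest candidate overlap first, then shrink.
    for length in range(max_len, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def merge_reads(a: str, b: str, min_len: int = 3) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not overlap."""
    overlap = suffix_prefix_overlap(a, b, min_len)
    if overlap == 0:
        return None
    # Keep all of `a`, then append the part of `b` beyond the shared bases.
    return a + b[overlap:]


if __name__ == "__main__":
    # Toy reads (made-up sequences): the end of the first matches the start of the second.
    read_1 = "ATGGCCATTG"
    read_2 = "CATTGTAATG"
    print(suffix_prefix_overlap(read_1, read_2))
    print(merge_reads(read_1, read_2))
```

The longer the reads, the longer the overlaps they can share, which makes it less likely that a short repeated motif produces a spurious join.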

In summary, although short-read and long-read sequencing perform comparably for gene expression quantification, the latter is more effective at identifying transcript isoforms and provides better transcript resolution with a higher mean contig length. Longer contigs enable full-length de novo assembly of the transcriptome.
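Mean contig length is simple to compute from an assembly. The sketch below is a minimal illustration, assuming the contigs are already loaded as plain Python strings. The function name `contig_stats` is hypothetical, and N50 is a commonly reported companion metric that we add here for context; it is not discussed in the post itself.

```python
from statistics import mean
from typing import Dict, Iterable


def contig_stats(contigs: Iterable[str]) -> Dict[str, float]:
    """Summarise an assembly by contig count, mean contig length, and N50.

    N50 is the length L such that contigs of length >= L together cover
    at least half of the total assembled bases.
    """
    lengths = sorted((len(c) for c in contigs), reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")

    total = sum(lengths)
    running = 0
    n50 = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    return {"count": len(lengths), "mean_length": mean(lengths), "n50": n50}


if __name__ == "__main__":
    # Toy assembly with made-up contigs of 8, 4, 4, 2, and 2 bases.
    toy_contigs = ["ACGTACGT", "ACGT", "TTGA", "AC", "GT"]
    print(contig_stats(toy_contigs))
```

Comparing these numbers between a short-read and a long-read assembly of the same sample is one simple way to see the difference in contig length described above.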
