A Global Assessment of the Transcription-Dependent Single Nucleotide Variants Relies on the Characteristics of RNA-Sequencing Technologies
Abstract
Single nucleotide variants (SNVs) are crucial for cancer occurrence and development. SNVs at the transcriptomic level generally arise from genomic variants (g-tSNVs) or from RNA editing (e-tSNVs). The types and quantities of e-tSNVs remain widely debated because RNA editing processes are still relatively poorly understood. Herein, we developed TSCS (Transcript SNVs Classifier relied on complementary Sequencings), a machine learning classifier that integrates short-read (MGI) and long-read (PacBio) RNA-seq data to accurately distinguish true transcript SNVs using stringent criteria. Applied to five colorectal cancer cell lines (HCT15, LoVo, SW480, SW620, and HCT116), TSCS demonstrated superior accuracy and sensitivity, especially for low-frequency variants, outperforming established tools (GATK, BCFtools, Longshot, RED_ML). It increased the total number of detected transcript SNVs by 31.83% on average, with g-tSNV and e-tSNV counts exceeding those of conventional methods by >1-fold and >2-fold, respectively. TSCS achieved mean recall rates of 75.3% for g-tSNVs and 77.2% for e-tSNVs. Notably, for the first time, e-tSNVs were found to account for a relatively large proportion, approximately 40%, of total transcript SNVs in cancer cell lines. Of the identified e-tSNVs, 80% were attributed to known RNA editing types, whereas the remainder did not fall into any known category. Importantly, the e-tSNVs uniquely detected in this study showed distinct patterns in SNV types and genomic locations. Additionally, the transcript SNVs called by TSCS were partially confirmed by experimental approaches, such as Sanger sequencing, RNC-seq, and mass spectrometry. This study lays the foundation for surveying and appraising cancer-related e-tSNVs.
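The g-tSNV/e-tSNV distinction and the recall figures quoted above can be illustrated with a minimal sketch. This is not the TSCS classifier, whose features and model are not described in the abstract. It only assumes that a transcript SNV matching a DNA-level variant counts as a g-tSNV, that the remainder are candidate e-tSNVs, and that recall is the fraction of a reference set that was called. The names `SNV`, `partition_tsnvs`, and `recall`, and all coordinates, are hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class SNV(NamedTuple):
    """A single nucleotide variant keyed by position and alleles (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def partition_tsnvs(transcript_snvs: Set[SNV],
                    genomic_snvs: Set[SNV]) -> Tuple[Set[SNV], Set[SNV]]:
    """Split transcript SNVs into g-tSNVs and candidate e-tSNVs.

    Assumption for illustration only: a transcript SNV that also appears
    in the genomic call set is treated as a g-tSNV; all others are treated
    as candidate e-tSNVs.
    """
    g_tsnvs = transcript_snvs & genomic_snvs
    e_tsnvs = transcript_snvs - genomic_snvs
    return g_tsnvs, e_tsnvs


def recall(called: Set[SNV], truth: Set[SNV]) -> float:
    """Fraction of the reference (truth) variants that were called."""
    if not truth:
        return 0.0
    return len(called & truth) / len(truth)


# Toy data, invented for this example (not taken from the study).
a = SNV("chr1", 100, "A", "G")
b = SNV("chr1", 250, "C", "T")
c = SNV("chr2", 42, "A", "G")
d = SNV("chr3", 7, "G", "A")

g_set, e_set = partition_tsnvs({a, b, c}, {a, d})
print(sorted(g_set))            # transcript SNVs with genomic support
print(sorted(e_set))            # candidate RNA-editing SNVs
print(recall({a, b}, {a, b, c, d}))
```

In this toy case only `a` has genomic support, so `b` and `c` are left as candidate e-tSNVs. Any real pipeline would also need to handle sequencing errors, coverage, and allele frequency, which this sketch ignores.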