Reference protein-coding transcripts of human genes annotated using long-read transcriptome datasets

Kuo-Feng Tung
Wen-chang Lin

Abstract

Accumulating NGS expression datasets suggest that protein-coding genes produce numerous alternatively spliced transcripts. However, this number may be overestimated in short-read sequencing data, which often cannot accurately resolve distinct spliced isoforms and therefore introduce ambiguity. Resolving tissue-specific expression profiles is crucial for identifying bona fide translated peptide products. In this study, we used long-read NGS datasets to identify the most highly expressed protein-coding transcripts, with the aim of better understanding the biochemical and biological functions of human protein-coding genes. Using nanopore sequencing data from 30 normal human tissues in the GSE192955 dataset, we identified 18,094 dominantly expressed representative protein-coding transcripts (Ref-Tx) from 18,557 human genes. Comparison with MANE Select transcripts revealed that 14,546 Ref-Tx (about 80%) matched the corresponding MANE Select transcripts. This result indicates improved agreement between long-read transcriptome data and MANE Select transcripts. A higher proportion of Rank1 transcripts was identified as Ref-Tx in the long-read dataset. Similar patterns were observed when Ref-Tx were compared with APPRIS functional annotations. Given the importance of tissue-specific expression profiles for protein-coding transcripts, we developed eCPG, a bioinformatic tool for visualizing expression. Designed to interrogate dominant protein-coding transcripts, this webtool integrates extensive expression data from the 30 normal human tissues as well as from the GTEx project.
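The abstract does not spell out how a "dominantly expressed" transcript is chosen or how a MANE Select match is counted, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes that "dominant" means the protein-coding transcript with the highest mean TPM across tissues, that genes whose best transcript falls below a hypothetical `min_tpm` threshold of 1.0 receive no representative (one possible reason why fewer Ref-Tx than genes are reported), and that transcript IDs are compared without their version suffix. The function names `select_ref_tx`, `mane_concordance` and `strip_version`, the input layout and the toy values are all hypothetical.

```python
"""Hypothetical sketch: pick one dominant transcript per gene, then compare it to MANE Select."""
from collections import defaultdict
from statistics import mean


def strip_version(transcript_id):
    """Drop a version suffix, e.g. 'TX_A1.3' -> 'TX_A1' (assumed ID convention)."""
    return transcript_id.split(".")[0]


def select_ref_tx(expression, min_tpm=1.0):
    """Return {gene_id: transcript_id} for the transcript with the highest mean TPM.

    expression maps (gene_id, transcript_id) -> list of per-tissue TPM values
    (assumed to contain protein-coding transcripts only). Genes whose best
    transcript has a mean TPM below min_tpm (hypothetical cutoff) are skipped.
    """
    candidates = defaultdict(list)
    for (gene_id, transcript_id), tpms in expression.items():
        candidates[gene_id].append((mean(tpms), transcript_id))
    ref_tx = {}
    for gene_id, scored in candidates.items():
        # max() compares tuples: the highest mean TPM wins; exact ties fall back
        # to the lexicographically largest transcript ID, so output is deterministic.
        best_tpm, best_tx = max(scored)
        if best_tpm >= min_tpm:
            ref_tx[gene_id] = best_tx
    return ref_tx


def mane_concordance(ref_tx, mane_select):
    """Count genes whose representative transcript equals the MANE Select transcript.

    Returns (matched, compared), where compared is the number of genes present
    in both mappings. Version suffixes are ignored when comparing IDs.
    """
    shared = [gene for gene in ref_tx if gene in mane_select]
    matched = sum(
        strip_version(ref_tx[gene]) == strip_version(mane_select[gene])
        for gene in shared
    )
    return matched, len(shared)


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    expression = {
        ("GENE_A", "TX_A1.1"): [10.0, 12.0, 8.0],
        ("GENE_A", "TX_A2.1"): [2.0, 3.0, 1.0],
        ("GENE_B", "TX_B1.1"): [0.1, 0.2, 0.0],
        ("GENE_C", "TX_C1.1"): [5.0, 5.0, 5.0],
        ("GENE_C", "TX_C2.1"): [6.0, 7.0, 8.0],
    }
    mane = {"GENE_A": "TX_A1.3", "GENE_B": "TX_B1.1", "GENE_C": "TX_C1.1"}
    ref = select_ref_tx(expression)
    print(ref)  # {'GENE_A': 'TX_A1.1', 'GENE_C': 'TX_C2.1'}
    print(mane_concordance(ref, mane))  # (1, 2)
```

Ranking by mean TPM is only one plausible choice; a per-tissue criterion (for example, the transcript that is top-ranked in the largest number of tissues) would fit the tissue-specific emphasis of the study equally well and could select a different transcript for some genes.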

Version published as 10.21203/rs.3.rs-8974155/v1 on Research Square, Mar 16, 2026

Related articles

Long-read transcriptomics corrects Trichomonas vaginalis intron annotations and maps poly(A)/UTR landscapes

This article has 10 authors:
1. Yuan-Ming Yeh
2. Wei-Hung Cheng
3. Kuo-Yang Huang
4. Chi-Ching Lee
5. Seow-Chin Ong
6. Hong-Wei Luo
7. Jhen-Wei Syu
8. Cheng-Hsun Chiu
9. Po-Jung Huang
10. Petrus Tang
This article has no evaluations. Latest version: Mar 20, 2026.

Mapping the genetic regulation of long non-coding RNAs across cellular contexts advances understanding of complex brain diseases

This article has 7 authors:
1. Tianyi Zhao
2. Yuran Jia
3. Li Chen
4. Liyang Song
5. Tianyi Zheng
6. Jian Yang
7. Yadong Wang
This article has no evaluations. Latest version: Apr 14, 2026.

Chromosome 20 RNA-seq Analysis: TDP-43 Knockout vs Wild-Type Rescue

This article has 4 authors:
1. Almokhtar Akeel S. Aljarodi
2. Ammar Yasir Babasit
3. Abdullah Eyad Abdullah
4. Ahmad Abdullah A. Bukhamsin
This article has no evaluations. Latest version: Mar 11, 2026.
