Sample-specific haplotype-resolved protein isoform characterization via long-read RNA-seq-based proteogenomics

David Wissel
Gloria M. Sheynkman
Mark D. Robinson

Abstract

Protein isoform inference from bottom-up mass spectrometry (MS) relies on database search strategies that assume the reference protein database accurately reflects the full repertoire of genetic and transcriptomic states present in the sample being analyzed. Long-read RNA sequencing (lrRNA-seq) now enables simultaneous recovery of complete transcript structures and the genetic variants present on each molecule, offering a direct route to allele-specific isoforms, yet this capability has not been fully leveraged to improve MS-based proteogenomics workflows. Here, we develop an end-to-end workflow for constructing and searching haplotype-resolved, sample-specific proteomes using matched lrRNA-seq and MS data. We benchmark multiple phasing algorithms on PacBio lrRNA-seq from Genome-in-a-Bottle samples and identify methods that achieve high phasing accuracy and completeness on transcriptomic reads. Our open-source, modular Snakemake pipeline performs variant calling, read-based phasing, isoform discovery, haplotype-resolved proteome construction, MS search, and downstream annotation. To demonstrate its utility, we apply the workflow to an induced pluripotent stem cell line (WTC11) and to an osteoblast differentiation time course, showing that haplotype-resolved databases enable detection of variant and splice peptides, allele-specific protein isoforms, and linked variants not detectable with reference-only proteomes. Together, our results demonstrate that lrRNA-seq-based phasing is feasible and effective for proteogenomics and provide a practical framework for allele-resolved proteome characterization in dynamic or disease-relevant settings.
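The haplotype-resolved proteome construction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy coding sequence, and the variant tuple format (0-based position, reference base, alternate base, phased genotype) are all assumptions made for illustration, and only single-nucleotide variants on an already-spliced coding sequence are handled.

```python
from itertools import product

# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def apply_phased_snvs(cds, snvs, haplotype):
    """Return the CDS carried by one haplotype (0 or 1).

    Each variant is a hypothetical tuple (pos, ref, alt, genotype), where
    pos is 0-based within the CDS and genotype is a phased pair such as
    (0, 1), meaning haplotype 0 carries ref and haplotype 1 carries alt.
    """
    seq = list(cds)
    for pos, ref, alt, genotype in snvs:
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        if genotype[haplotype] == 1:
            seq[pos] = alt
    return "".join(seq)


def haplotype_proteins(cds, snvs):
    """Translate both haplotypes of one transcript into protein sequences."""
    return {
        f"hap{h + 1}": translate(apply_phased_snvs(cds, snvs, h))
        for h in (0, 1)
    }


# Toy transcript with two heterozygous SNVs phased onto the same haplotype.
cds = "ATGGCTGAAAAATGA"
snvs = [(4, "C", "T", (0, 1)), (7, "A", "G", (0, 1))]
proteins = haplotype_proteins(cds, snvs)
print(proteins)
```

Because both variants are phased onto the second haplotype, the resulting database entry carries the two substitutions together, which is the kind of linked-variant sequence a reference-only proteome would not contain.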

Version published to 10.1101/2025.11.21.689818 on bioRxiv
Nov 24, 2025

