LRP2: A proteogenomics pipeline for long-read informed protein isoform analysis and discovery

Megan D. Schertzer
Julia T. Lewandowski
Emily F. Watts
Will Rosenow
Madison M. Mehlferber
Erin D. Jeffery
Scott I. Adamson
Jocelyne Bruand
Elizabeth Tseng
Yaseswini Neelamraju
Francine E. Garrett-Bakelman
Egor Dolzhenko
David A. Knowles
Gloria Sheynkman


Abstract

Most human genes produce multiple RNA isoforms, yet it remains unclear which isoforms are translated into stable, functional proteins. Long-read RNA-sequencing resolves full-length transcript structures and, when paired with mass spectrometry, can provide empirical evidence of isoform translation. Despite this opportunity, comprehensive workflows integrating isoform discovery, open reading frame prediction, peptide identification, and protein inference remain limited, leaving users to handle these steps piecemeal. Here, we present LRP2, a modular, end-to-end long-read proteogenomics pipeline built in Nextflow. LRP2 scales transcript discovery to hundreds of samples via PacBio’s latest Isocall tool, removes technical artifacts with SQANTI QC, generates and classifies predicted proteomes via CPAT and SQANTI Protein, performs multi-group differential expression and usage analysis via edgeR, DRIMSeq, and a long-read adaptation of LeafCutter, and integrates protein-level evidence from DDA and DIA MS data through FragPipe. For cross-dataset comparison of novel isoforms, LRP2 employs deterministic, splice-junction coordinate-based isoform identifiers.
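
The abstract does not spell out how the deterministic identifiers are built, so the following is only a minimal sketch of the general idea, not LRP2's actual scheme. It assumes an isoform can be keyed by chromosome, strand, and its ordered splice-junction coordinates; the function name `isoform_id`, the `ISO` prefix, the 12-character SHA-256 digest, and the coordinate convention are all hypothetical choices made for illustration.

```python
import hashlib
from typing import Iterable, Tuple

# Hypothetical representation: one junction = (intron_start, intron_end).
Junction = Tuple[int, int]


def isoform_id(chrom: str, strand: str, junctions: Iterable[Junction],
               prefix: str = "ISO") -> str:
    """Return an identifier that depends only on the splice-junction chain.

    Illustrative only: the same junction chain always yields the same ID,
    regardless of sample, run, or the order in which junctions are supplied.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")

    # Sort so the ID does not depend on input order.
    chain = sorted(junctions)

    # Canonical text key, e.g. "chr1:+:100-200,300-400".
    # Note: mono-exonic transcripts have no junctions, so a real scheme
    # would need extra information (such as transcript ends) to separate them.
    key = f"{chrom}:{strand}:" + ",".join(f"{s}-{e}" for s, e in chain)

    # Hash the key and keep a short, fixed-length digest (an assumption).
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{digest}"


if __name__ == "__main__":
    print(isoform_id("chr1", "+", [(100, 200), (300, 400)]))
```

Because the identifier is a pure function of the coordinates, two datasets processed independently would assign the same ID to a novel isoform with the same junction chain, which is the property that makes cross-dataset comparison possible without a shared transcript registry.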

Availability and implementation

LRP2 is freely available as a modular Nextflow pipeline at https://github.com/sheynkman-lab/LRP2. LRP2 supports Docker, Apptainer, and Conda environments with GENCODE references.

Contact

Megan Schertzer, cwp5au@virginia.edu

Gloria Sheynkman, gs9yr@virginia.edu

Version published to 10.64898/2026.05.27.728216 on bioRxiv
May 31, 2026

Related articles

TransXplorer: an automated translational discovery platform for RNA-seq data

This article has 8 authors:
1. Varinder Madhav Verma
2. Eponine Oler
3. Hussain Syed
4. Scott Han
5. Mark Berjanskii
6. Andrew L. Mason
7. David Wishart
8. Gane Ka-Shu Wong
This article has no evaluations. Latest version: May 16, 2026

The Structural Code of Breast Cancer Proteoform: Alternative Splicing-driven Protein Isoform Variation and Functional Diversification

This article has 19 authors:
1. Felicia T. Jiang
2. Dengwang Chen
3. Runhao Zhao
4. Xiangeng Wang
5. Hao Hao
6. Feng Liang
7. Yinghan Zhang
8. Taoyong Cui
9. Zhenchao Tang
10. Tianli Luo
11. Yi Shuai
12. Hualiang Yao
13. Minghao Xu
14. Chenchen Xu
15. Ziwei Wang
16. Jia Xu
17. Wentao Zhang
18. Jun Tan
19. Xin Wang
This article has no evaluations. Latest version: May 5, 2026

PEXMap: A proteogenomic method for exon and isoform level mapping of mass spectrometry derived peptides

This article has 3 authors:
1. Deepanshi Awasthi
2. Paras Verma
3. Shashi Bhushan Pandit
This article has no evaluations. Latest version: May 4, 2026
