Limitations of de novo sequencing in resolving sequence ambiguity

Sam van Puyenbroeck
Denis Beslic
Tomi Suomi
Tanja Holstein
Thilo Muth
Laura L. Elo
Lennart Martens
Robbin Bouwmeester
Tim Van Den Bossche
Tine Claeys


Abstract

De novo peptide sequencing enables peptide identification from fragmentation spectra without relying on sequence databases. However, incomplete spectra create ambiguity, making unambiguous identification challenging. Recent deep learning advances have produced numerous de novo models that predict sequences and refine peptide–spectrum matches under such conditions. Yet, their relative strengths, weaknesses, and ability to handle spectrum ambiguity remain unclear. Here, we benchmark eight state-of-the-art models on three publicly available proteomics datasets, comparing performance using established metrics and quantifying inter-model agreement. We assess post-processing approaches, including iterative refinement, rescoring, and reranking, for their ability to improve identification accuracy, and perform an error analysis to identify common mispredictions and their causes. Model performance varied, with considerable overlap of correct identifications. Post-processing yielded no or only modest improvements. Most sequencing errors were model-independent and driven by limited fragment ion coverage, a limitation also observed in database searches with large search spaces.
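The abstract names limited fragment ion coverage as the main driver of sequencing errors. The sketch below makes that quantity concrete. It is an illustration only, not the benchmark code used in the study. It assumes singly charged b- and y-ions, unmodified residues with standard monoisotopic masses, and a fixed absolute tolerance in Da. It defines coverage as the fraction of backbone cleavage sites explained by at least one observed b- or y-ion; the study may define coverage differently. The names `RESIDUE_MASS`, `fragment_mzs`, `has_peak`, and `fragment_coverage` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
# Assumption: no modifications (e.g. cysteine is not carbamidomethylated).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mzs(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b- and y-ion m/z values, one pair per cleavage site.

    Index k (0-based) is the cleavage after residue k + 1, i.e. the pair
    (b_{k+1}, y_{n-k-1}) for a peptide of length n.
    """
    if len(peptide) < 2:
        raise ValueError("need at least two residues to have a cleavage site")
    prefix = list(accumulate(RESIDUE_MASS[aa] for aa in peptide))
    total = prefix[-1]
    sites = range(len(peptide) - 1)
    b_ions = [prefix[i] + PROTON for i in sites]
    y_ions = [total - prefix[i] + WATER + PROTON for i in sites]
    return b_ions, y_ions


def has_peak(sorted_mzs: list[float], target: float, tol: float) -> bool:
    """True if any peak lies within +/- tol Da of target (binary search)."""
    idx = bisect_left(sorted_mzs, target - tol)
    return idx < len(sorted_mzs) and sorted_mzs[idx] <= target + tol


def fragment_coverage(peptide: str, observed_mzs: list[float], tol: float = 0.02):
    """Return (coverage, missing): the fraction of cleavage sites supported by a
    b- or y-ion, and the 1-based sites (cleavage after residue i) with no peak."""
    peaks = sorted(observed_mzs)
    b_ions, y_ions = fragment_mzs(peptide)
    missing = [
        site
        for site, (b, y) in enumerate(zip(b_ions, y_ions), start=1)
        if not (has_peak(peaks, b, tol) or has_peak(peaks, y, tol))
    ]
    n_sites = len(b_ions)
    return (n_sites - len(missing)) / n_sites, missing
```

For example, `fragment_coverage("PEPTIDE", [98.0600, 148.0604])` matches only b1 and y1. It reports coverage 2/6 with sites 2 to 5 unsupported. A run of consecutive unsupported sites marks a stretch whose residue order the spectrum cannot fix, which is one way sequence ambiguity arises. Even complete coverage cannot separate isoleucine from leucine, because I and L have identical masses.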

Version published to 10.1101/2025.08.19.671052 on bioRxiv, Aug 23, 2025

Related articles

META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing

This article has 8 authors:
1. Louis-Maël Guéguen
2. Alban Mathieu
3. Simon Pelletier
4. Anthony Woo
5. Namita Misra
6. Magali Moreau
7. Olivier Perin
8. Arnaud Droit
This article has no evaluations. Latest version Jan 29, 2026

Evaluation of Dorado v5.2.0 de novo basecalling models for the detection of tRNA modifications using RNA004 chemistry

This article has 4 authors:
1. Bhargesh Indravadan Patel
2. Franziskus N.M. Rübsam
3. Yu Sun
4. Ann E. Ehrenhofer-Murray
This article has no evaluations. Latest version Dec 23, 2025

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluations. Latest version Dec 11, 2025
