Transforming de novo peptide sequencing by explainable AI

Yu Wang
Zhendong Liang
Tianze Ling
Cheng Chang
Tingpeng Yang
Linhai Xie
Yonghong He

Abstract

De novo peptide sequencing is crucial for identifying novel proteins, yet its broader application is constrained by the lack of a robust quality control system. In response, we developed a transformer-based model, π-xNovo, that accurately predicts peptides. By analyzing the model's attention matrix, we elucidated the contribution of spectral peaks to amino acid predictions, thus making de novo sequencing results explainable. Leveraging these insights, we designed a quality control system, π-xNovo-QC, which distinguishes peptide predictions with an accuracy exceeding 80% and a sensitivity above 90%. Applying this system to a large-scale deep human proteome dataset identified 1,931,761 additional peptides, a 137% increase over traditional database search results. These newly identified high-confidence peptides facilitated a 17.9% increase in protein identification, a 23.59% increase in the detection of single amino acid polymorphism events, and a 20.02% increase in exon-skipping splicing events. The deployment of this explainable AI system holds significant potential for expanding the application of de novo peptide sequencing, particularly in exploring the darker matter of the entire proteome universe.
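
The accuracy and sensitivity figures quoted for π-xNovo-QC are standard binary classification metrics. The sketch below shows how such figures are typically computed, assuming the quality control step labels each predicted peptide as either correct or incorrect. The confusion counts are made up for illustration and are not taken from the paper, and the function names are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (tp + tn) / total


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly correct peptides that the QC step accepts (recall)."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive cases to evaluate")
    return tp / positives


# Hypothetical counts for illustration only (not from the paper):
# tp = correct peptides accepted, fn = correct peptides rejected,
# tn = incorrect peptides rejected, fp = incorrect peptides accepted.
tp, tn, fp, fn = 920, 700, 300, 80

acc = accuracy(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
print(f"accuracy = {acc:.2%}, sensitivity = {sens:.2%}")
```

With these illustrative counts the accuracy is 81% and the sensitivity is 92%, which shows how a classifier can sit above 90% sensitivity while its overall accuracy is closer to 80%: it accepts most correct peptides but also lets through some incorrect ones.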

Version published to 10.21203/rs.3.rs-4716013/v1 on Research Square
Aug 5, 2024

Related articles

Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

This article has 13 authors:
1. Peilin Xie
2. Xingchen Liu
3. Lantian Yao
4. Zhihao Zhao
5. Anming Yang
6. Jiahui Guan
7. Zijun Jiao
8. Zhihong Liu
9. Junwen Wang
10. Tzong-Yi Lee
11. Zigang Li
12. Bingyu Cui
13. Ying-Chih Chiang
This article has no evaluations. Latest version: Dec 11, 2025

The Evolution of the AlphaFold Architecture

This article has 1 author:
1. Y.C.B.J. Dissanayaka
This article has no evaluations. Latest version: Jan 9, 2026

Bayesian Optimization for Biochemical Discovery with LLMs

This article has 6 authors:
1. Rafael Gómez-Bombarelli
2. Mattias Akke
3. Soojung Yang
4. Jurgis Ruza
5. Jinyeop Song
6. Elton Pan
This article has no evaluations. Latest version: Jan 22, 2026
