Complete Simulation of timsTOF PASEF Raw Datasets with Timsim Enables Precise Evaluation of False Discovery and Phosphosite Localization Error Rates

Stefan Tenzer
David Teschner
Zixuan Xiao
Tim Maier
David Gomez-Zepeda
Mateusz Łącki
Michal Startek
Ute Distler
Tanja Ziesmann
Mathias Wilhelm
Andreas Hildebrand

Abstract

Accurate control of false discovery rates (FDR) and false localization rates (FLR) is central to quantitative proteomics and phosphoproteomics, yet rigorous validation is limited by the absence of high-complexity ground-truth data. Here we introduce timsim, a simulation framework that uses machine-learning and first-principles-driven prediction of peptide properties to generate native Bruker-format timsTOF dda-PASEF and dia-PASEF acquisition data with complete ground-truth annotation. Using timsim benchmarks, we show that several dia-PASEF workflows control FDR near the nominal 1% threshold at the stripped-sequence level but exhibit inflated true FDR (3–5%) when modified peptidoforms are considered, driven by systematic misassignment of common modifications. In dda-PASEF analyses, match-between-runs produced peak-matching errors of up to 30% under high-density conditions. Simulated phosphoproteomics datasets enabled calibration of site localization scores, identifying a 0.65 site-probability cutoff as an optimal tradeoff between sensitivity and false localization. Timsim provides a scalable resource for rigorous benchmarking and development of proteomics software.
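To illustrate how the same result list can pass FDR control at the stripped-sequence level and still fail at the peptidoform level, here is a minimal Python sketch. It is not timsim's API or the authors' evaluation code: the function names, the bracketed modification notation, and the toy peptide lists are all hypothetical, chosen only to show the two levels of comparison against a known ground truth.

```python
import re


def strip_modifications(peptidoform: str) -> str:
    """Drop bracketed modification tags, e.g. 'PEPS[+80]TIDE' -> 'PEPSTIDE'.

    The bracket notation is an assumption made for this example.
    """
    return re.sub(r"\[[^\]]*\]", "", peptidoform)


def empirical_fdr(reported, ground_truth, level="peptidoform"):
    """Fraction of reported identifications that are absent from the ground truth."""
    if level == "stripped":
        # Compare bare amino-acid sequences only.
        reported = {strip_modifications(p) for p in reported}
        ground_truth = {strip_modifications(p) for p in ground_truth}
    else:
        # Compare full peptidoforms, modifications included.
        reported, ground_truth = set(reported), set(ground_truth)
    if not reported:
        return 0.0
    false_hits = reported - ground_truth
    return len(false_hits) / len(reported)


# Hypothetical toy data: the search engine reports 'MASSK' without its oxidation.
truth = ["PEPTIDEK", "M[+16]ASSK", "S[+80]AMPLER", "ANOTHERK"]
reported = ["PEPTIDEK", "MASSK", "S[+80]AMPLER", "ANOTHERK"]

fdr_stripped = empirical_fdr(reported, truth, level="stripped")
fdr_peptidoform = empirical_fdr(reported, truth, level="peptidoform")
print(fdr_stripped, fdr_peptidoform)
```

In this toy case the misassigned modification is invisible once sequences are stripped, but counts as a false identification when peptidoforms are compared, which is the kind of discrepancy the abstract describes.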

Version published to 10.21203/rs.3.rs-9032301/v1 on Research Square
Mar 12, 2026

