Using Artificial Intelligence (AI) to Model Clinical Variant Reporting for Next Generation Sequencing (NGS) Oncology Assays

Kenneth D Doig
Rashindrie Perera
Yamuna Kankanige
Andrew Fellowes
Jason Li
Richard Lupat
Ella R Thompson
Piers Blombery
Stephen B Fox

Abstract

Background

Targeted next-generation sequencing (NGS) of somatic DNA is now routinely used for diagnostic and predictive reporting in the oncology clinic. The expert genomic analysis that NGS assays require remains a bottleneck to increasing the number of patients who can be assessed. This study uses data from targeted clinical sequencing to build machine learning models that predict whether a patient's variants should be reported.

Methods

Data from three somatic assays were used to build machine learning prediction models (Stochastic Gradient Descent, Random Forest, XGBoost and Neural Networks). Using manual expert curation of reportable variants as the ground truth, we built models to classify variants as clinically reportable. The assays were performed between 2020 and 2023, yielded 1,350,018 variants and were used to report on 10,116 patients. All variants, together with 211 annotation and sequencing features, were used by the models to predict the likelihood of a variant being reported.
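Of the model families listed above, stochastic gradient descent is the simplest to illustrate. The sketch below is a minimal, standard-library-only example that assumes a logistic-regression (log-loss) objective optimised by SGD; the abstract does not state the loss, features or hyperparameters actually used. It learns weights that turn a variant's feature vector into an estimated probability of being reported. The three features (`vaf`, `is_hotspot`, `inhouse_freq`) and the toy labels are hypothetical stand-ins for the 211 annotation and sequencing features and the expert curation labels.

```python
import math
import random

# Hypothetical toy data: each row is [vaf, is_hotspot, inhouse_freq],
# label 1 = variant was reported by the curator, 0 = not reported.
TOY_X = [
    [0.45, 1.0, 0.00],
    [0.30, 1.0, 0.01],
    [0.25, 0.0, 0.02],
    [0.05, 0.0, 0.60],
    [0.50, 0.0, 0.90],
    [0.02, 0.0, 0.10],
]
TOY_Y = [1, 1, 1, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgd_logistic(X, y, epochs=200, lr=0.1, seed=0):
    """Fit logistic-regression weights by stochastic gradient descent on log loss."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the variants in a new random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log loss w.r.t. the linear score
            for j in range(n_features):
                w[j] -= lr * err * X[i][j]
            b -= lr * err
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a variant with features x is reported."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    w, b = train_sgd_logistic(TOY_X, TOY_Y)
    # A rare hotspot variant versus one seen often in in-house samples.
    print(predict_proba(w, b, [0.40, 1.0, 0.00]))
    print(predict_proba(w, b, [0.40, 0.0, 0.80]))
```

The linear model is shown only because its update rule fits in a few lines; the Results indicate that the tree-based ensembles were the stronger performers.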

Results

The tree-based ensemble models (Random Forest and XGBoost) performed consistently well, achieving an area under the precision-recall curve (PRC AUC) of between 0.891 and 0.995 when predicting whether a variant should be reported. To aid explainability, individual model predictions were presented to users within a tertiary analysis platform as a waterfall plot showing each feature's contribution and value for the variant. Over 30% of model performance was attributable to features derived from in-house statistics of the sequencing assay, which precludes easy generalization of the models to other assays or other laboratories.
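The PRC AUC figures above summarise how well each model ranks reportable variants ahead of non-reportable ones. Below is a minimal, standard-library-only sketch of one common estimator, average precision, following the definition used by scikit-learn's `average_precision_score` (the sum over score thresholds of the recall increase times the precision at that threshold). The abstract does not say which estimator the authors used, so treat this choice as an assumption. Unlike ROC AUC, the no-skill baseline of this metric equals the fraction of positive examples, which makes it informative when reportable variants are a small minority of all variants called.

```python
def average_precision(y_true, scores):
    """Area under the precision-recall curve, estimated as average precision.

    AP = sum_k (R_k - R_{k-1}) * P_k, where k runs over the distinct scores
    from highest to lowest and R_k, P_k are recall and precision when every
    variant scoring >= that threshold is predicted "reported".
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one reported (positive) variant")
    pairs = sorted(zip(scores, y_true), key=lambda t: -t[0])
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        # All variants tied at this score enter together (one threshold).
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Perfect ranking: both reported variants score highest.
    print(average_precision([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # → 1.0
    # Uninformative model: every variant gets the same score.
    print(average_precision([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # → 0.5
```

The second example shows the baseline: a model that scores every variant identically obtains an average precision equal to the proportion of reported variants, so a PRC AUC is best read against that prevalence rather than against 0.5.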

Conclusions

Longitudinally acquired NGS assay data provide a strong basis for machine learning decision-support models that select variants for clinical oncology reports. The models provide a framework for consistent reporting practices and for reducing inter-reviewer variability. To improve model transparency, individual variant predictions can be presented as part of reviewer workflows.
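The per-variant waterfall plots described in the Results rest on an additive breakdown of each prediction: a baseline value plus one signed contribution per feature. The sketch below assumes such an additive attribution (for example SHAP values, which the abstract does not name) and computes where each bar starts and ends, largest contributions first. The feature names and numbers are hypothetical, and for tree ensembles such contributions are often expressed in log-odds rather than probability units.

```python
from typing import Dict, List, Tuple


def waterfall_steps(
    base_value: float, contributions: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Return (feature, start, end) for each bar of a waterfall plot.

    Assumes an additive explanation in which
    prediction = base_value + sum(contributions.values()).
    Bars are ordered by decreasing absolute contribution.
    """
    steps: List[Tuple[str, float, float]] = []
    running = base_value
    for name, delta in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
        steps.append((name, running, running + delta))
        running += delta
    return steps


def format_steps(steps: List[Tuple[str, float, float]]) -> str:
    """Render the bars as plain text, one line per feature."""
    lines = []
    for name, start, end in steps:
        sign = "+" if end >= start else "-"
        lines.append(
            f"{name:<14} {sign}{abs(end - start):.2f}  ({start:.2f} -> {end:.2f})"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical contributions (log-odds units) for a single variant.
    base = -2.0
    contrib = {"is_hotspot": 1.8, "vaf": 0.9, "inhouse_freq": -0.4, "depth": 0.1}
    steps = waterfall_steps(base, contrib)
    print(format_steps(steps))
    print("final score:", steps[-1][2])
```

Ordering bars by absolute contribution places the features that moved a prediction most at the top of the plot, which is likely where a reviewer looks first.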

Version published to 10.1101/2025.05.14.25327648 on medRxiv
May 15, 2025