Factors Influencing Accuracy, Interpretability and Reproducibility in the use of Machine Learning in Biology

Kaitlyn M. Martinez
Kristen Wilding
Trent R. Llewellyn
Daniel E. Jacobsen
Makaela M. Montoya
Jessica Z. Kubicek-Sutherland
Sweta Batni
Carrie Manore
Harshini Mukundan

Abstract

The complexity and variability of biological data have promoted the increased use of machine learning methods to understand processes and predict outcomes. These same features complicate the reliable, reproducible, interpretable, and responsible use of such methods, resulting in questionable relevance of the derived outcomes. Here we systematically explore challenges associated with applying machine learning to predict and understand biological processes using a well-characterized in vitro experimental system. We evaluated three factors that vary when applying machine learning classifiers: 1) type of biochemical signature (transcripts vs. proteins), 2) data curation methods (pre- and post-processing), and 3) choice of machine learning classifier. Using accuracy, generalizability, interpretability, and reproducibility as metrics, we found that the above factors significantly modulate outcomes even within a simple model system. Our results caution against the unregulated use of machine learning methods in the biological sciences, and strongly advocate the need for data standards and validation tool-kits for such studies.
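
As a purely illustrative sketch of this kind of factor comparison, and not the authors' pipeline, the Python example below crosses two pre-processing choices with two toy classifiers and repeats each combination over several random seeds, reporting the mean accuracy and its spread. Everything in it is an assumption made for illustration: the synthetic data generator (`make_data`), the two pre-processing functions, the nearest-centroid and 1-nearest-neighbour classifiers, and the 70/30 train/test split. None of these are taken from the article.

```python
import random
import statistics
from itertools import product

def make_data(seed, n=60, d=5):
    """Hypothetical synthetic two-class data (not from the article).
    Feature 0 carries the class signal; feature 1 is noise on a much larger scale."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 50.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(d - 2)]
        X.append(row)
        y.append(label)
    return X, y

def identity(train, test):
    """Pre-processing option 1: leave the features untouched."""
    return train, test

def standardize(train, test):
    """Pre-processing option 2: z-score each feature using training statistics only."""
    cols = list(zip(*train))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) or 1.0 for c in cols]
    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]
    return scale(train), scale(test)

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_centroid(X_train, y_train, X_test):
    """Toy classifier 1: assign each test point to the closest class mean."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, l in zip(X_train, y_train) if l == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*rows)]
    return [min(centroids, key=lambda l: dist2(x, centroids[l])) for x in X_test]

def one_nn(X_train, y_train, X_test):
    """Toy classifier 2: copy the label of the single nearest training point."""
    return [y_train[min(range(len(X_train)), key=lambda i: dist2(x, X_train[i]))]
            for x in X_test]

def evaluate(prep, clf, seed):
    """Accuracy of one (pre-processing, classifier) pair for one seed.
    The seed fixes both the simulated data and the 70/30 split."""
    X, y = make_data(seed)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    train_idx, test_idx = idx[:cut], idx[cut:]
    X_train, X_test = prep([X[i] for i in train_idx], [X[i] for i in test_idx])
    pred = clf(X_train, [y[i] for i in train_idx], X_test)
    return sum(p == y[i] for p, i in zip(pred, test_idx)) / len(test_idx)

def run_grid(seeds=range(10)):
    """Cross every pre-processing choice with every classifier over all seeds."""
    preps = {"raw": identity, "standardized": standardize}
    clfs = {"nearest_centroid": nearest_centroid, "1-NN": one_nn}
    results = {}
    for (p_name, prep), (c_name, clf) in product(preps.items(), clfs.items()):
        accs = [evaluate(prep, clf, s) for s in seeds]
        results[(p_name, c_name)] = (statistics.fmean(accs), statistics.pstdev(accs))
    return results

if __name__ == "__main__":
    for (p_name, c_name), (mean, sd) in run_grid().items():
        print(f"{p_name:>12} + {c_name:<16} accuracy = {mean:.2f} +/- {sd:.2f}")
```

Fixing the seeds makes each run repeatable, while the per-combination spread gives a rough sense of how much the reported accuracy depends on the particular data draw and split rather than on the method itself.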

Version published to 10.21203/rs.3.rs-4171489/v1 on Research Square
Mar 27, 2024

Related articles

QAFI: A Novel Method for Quantitative Estimation of Missense Variant Impact Using Protein-Specific Predictors and Ensemble Learning

This article has 3 authors:
1. Selen Ozkan
2. Natàlia Padilla
3. Xavier de la Cruz
This article has no evaluations. Latest version: May 8, 2024

A novel method to guide biomarker combinations to optimize the sensitivity

This article has 6 authors:
1. Seyyed Mahmood Ghasem
2. Johannes F. Fahrmann
3. Samir Hanash
4. Kim-Anh Do
5. James P. Long
6. Ehsan Irajizad
This article has no evaluations. Latest version: Apr 15, 2024

Prospective and External Validation of Prognostic Machine Learning Models for Short- and Long-Term Mortality Among Acutely Admitted Patients Based on Blood Tests.

This article has 12 authors:
1. Baker Nawfal Jawad
2. Izzet Altintas
3. Jesper Eugen-Olsen
4. Siar Niazi
5. Abdullah Mansouri
6. Line Jee Hartmann Rasmussen
7. Martin Schultz
8. Kasper Iversen
9. Nikolaj Normann Holm
10. Thomas Kallemose
11. Ove Andersen
12. Jan Nehlin
This article has no evaluations. Latest version: Apr 26, 2024
