Evaluating Limits of Machine Learning-Assisted Raman Spectroscopy in Classification of Biological Samples

Aman Yadav
Arlin Birkby
Noah Armstrong
Assame Arnob
Ming-Hsun Chou
Alma Fernandez
Aart J. Verhoef
Zhenhuan Yi
Siddhant Gulati
Siddhi Kotnis
Qing Sun
Katy C. Kao
Hung-Jen Wu

Listed in

Evaluated articles (Arcadia Science)

Abstract

Machine learning (ML)-assisted Raman spectroscopy has become a powerful analytical tool for the classification and identification of analytes; however, the technical challenges that affect its detection accuracy have not been investigated. This study explores experimental factors affecting classification performance. Among the evaluated ML models, the choice of algorithm has minimal impact on classification accuracy. Instead, experimental factors, including spectral similarity between tested samples and data quality, dominate detection performance. Increases in spectral noise and spectral similarity significantly reduce classification accuracy. In well-controlled samples with low experimental noise, ML-assisted Raman spectroscopy can discriminate lipid mixtures with a composition difference of 1.85 mol%. To assess the effect of biological heterogeneity, we analyzed single-cell Raman spectra from Saccharomyces cerevisiae strains carrying single, double, or triple gene mutations. Intrinsic cell-to-cell variability introduced substantial spectral differences, severely reducing the accuracy of multiclass classification of these genetically similar strains at the single-cell level. Averaging Raman spectra across multiple cells improved classification accuracy by reducing this spectral variability. We also assess the effectiveness of transfer learning across different Raman spectrometers, specifically by applying an ML model trained on one instrument to data from another Raman spectrometer. Transfer learning can be improved with proper instrument calibration, highlighting the importance of instrument standardization. Overall, our results demonstrate that data quality and spectral similarity are the primary bottlenecks in ML-assisted Raman spectroscopy. Careful attention to sample preparation, data acquisition, measurement conditions, and instrument calibration is critical to achieving robust and reliable classification performance.
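The abstract's point that averaging spectra across cells reduces spectral variability can be illustrated with a small synthetic sketch. This is not the authors' pipeline: the single Gaussian band, the noise level, the group size of 10, and the helper name `average_groups` are all assumptions made for illustration. It also models only independent additive noise, which is a simplification of real cell-to-cell variability.

```python
import numpy as np

def average_groups(spectra: np.ndarray, group_size: int) -> np.ndarray:
    """Average consecutive groups of `group_size` spectra (one spectrum per row)."""
    n_groups = spectra.shape[0] // group_size
    trimmed = spectra[: n_groups * group_size]  # drop any incomplete last group
    return trimmed.reshape(n_groups, group_size, -1).mean(axis=1)

rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 1800, 500)  # cm^-1, illustrative axis only

# Hypothetical noise-free spectrum: a single Gaussian band near 1000 cm^-1.
clean = np.exp(-0.5 * ((wavenumbers - 1000) / 15) ** 2)

# 1000 simulated "single-cell" spectra with independent Gaussian noise (assumed).
single_cell = clean + rng.normal(0.0, 0.2, size=(1000, wavenumbers.size))

# Average every 10 cells into one spectrum.
averaged = average_groups(single_cell, group_size=10)

noise_single = (single_cell - clean).std()
noise_averaged = (averaged - clean).std()
print("residual std, single cells:", noise_single)
print("residual std, 10-cell averages:", noise_averaged)
```

Under the independence assumption, the residual standard deviation should drop by roughly the square root of the group size. Variability that is correlated across cells, such as a shared batch or instrument effect, would not average out in the same way.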

Arcadia Science
Mar 18, 2026

First, principal component analysis (PCA) was applied to reduce data complexity.

Have you explored using NMF (non-negative matrix factorization) for analyzing Raman spectra? There's some recent work comparing MCR and NMF; because both enforce non-negative component vectors, they might be better aligned with Raman spectroscopy and possibly more interpretable. (https://doi.org/10.1016/j.aca.2025.344755)
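To make the distinction in this comment concrete, here is a minimal sketch comparing PCA loadings with NMF components on synthetic two-component mixtures. Everything in it is an assumption for illustration: the band positions, the noise level, the rank of 2, and the use of plain multiplicative updates for NMF. It does not reproduce the preprint's analysis or the method in the linked paper.

```python
import numpy as np

rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 1800, 300)  # cm^-1, illustrative axis only

def band(center, width=30.0):
    """Gaussian band used as a stand-in for a Raman peak."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)

# Two hypothetical pure-component spectra and random two-component mixtures.
pure = np.vstack([band(800) + 0.5 * band(1450), band(1100) + 0.8 * band(1600)])
fractions = rng.uniform(0.0, 1.0, size=(200, 1))
mixing = np.hstack([fractions, 1.0 - fractions])
noise = rng.normal(0.0, 0.01, size=(200, wavenumbers.size))
spectra = np.clip(mixing @ pure + noise, 0.0, None)  # NMF needs non-negative input

# PCA via SVD of the mean-centered data; rows of vt are the loadings.
centered = spectra - spectra.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
pca_components = vt[:2]

def nmf(v, rank, n_iter=500, eps=1e-9):
    """NMF with multiplicative updates for the Frobenius-norm loss."""
    w = rng.uniform(size=(v.shape[0], rank))
    h = rng.uniform(size=(rank, v.shape[1]))
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

weights, nmf_components = nmf(spectra, rank=2)
rel_error = np.linalg.norm(spectra - weights @ nmf_components) / np.linalg.norm(spectra)

print("PCA loadings min:", pca_components.min())
print("NMF components min:", nmf_components.min())
print("NMF relative reconstruction error:", rel_error)
```

PCA loadings are orthogonal directions of variance and generally take both signs, so they can show negative "peaks" with no physical counterpart. NMF components stay non-negative by construction, which is why they can be easier to read as spectrum-like profiles, although NMF solutions are not unique and depend on initialization.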

Version published to 10.64898/2026.02.26.708284 on bioRxiv
Mar 1, 2026

Related articles

Raman Spectroscopy Based Non-Destructive Biomedical Diagnosis

This article has 3 authors:
1. Aishwarya Shirke
2. Aditi Sahu
3. Piyush Kumar
This article has no evaluations. Latest version: Mar 13, 2026

Raman-guided Sample Subset Selection for Cost-efficient Offline Calibration in Bioprocesses

This article has 4 authors:
1. Terrance Wilms
2. Fabian Schwenke
3. Rudibert King
4. Steffi Knorn
This article has no evaluations. Latest version: Mar 9, 2026

The Evaluation of Machine Learning Models using Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS) Spectra for the Prediction of Antibiotic Resistance in Klebsiella pneumoniae.

This article has 1 author:
1. Stephen Mark Edward Fordham
This article has no evaluations. Latest version: Jan 27, 2026
