A Concentration-Invariant FTIR Chemometric Workflow with Peak-Sparse Representation and Machine-Learning Classification

Abstract

Fourier-transform infrared (FTIR) spectroscopy is a widely utilized analytical technique for qualitative identification in chemical, environmental, and industrial contexts. Variability in sample concentration and operator-dependent preprocessing can compromise the reproducibility of chemometric workflows. This research presents a concentration-invariant FTIR preprocessing and classification framework that incorporates Savitzky–Golay smoothing, asymmetric least-squares baseline correction, area normalization, and a percentile-based peak-sparse representation. Principal component analysis (PCA) is applied to the sparse spectra to generate a compact vibrational feature space, which is then used to train four supervised classifiers: PLS-DA, Random Forest, XGBoost, and Support Vector Machines. With a library of 89 pure organic compounds measured at four concentration levels, all models achieve macro-F1 scores between 0.97 and 1.00 under replicate-stratified evaluation, indicating strong robustness to concentration-driven spectral variation. The workflow is implemented in a lightweight Python/PyQt5 tool that enables real-time prediction and supports deployment in analytical laboratories and industrial quality-control settings. This study offers a transparent and reproducible chemometric framework that may serve as a basis for future extensions to complex mixtures and real-world sample matrices.
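
The preprocessing chain named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `als_baseline` and `preprocess` are hypothetical, the baseline step assumes the common Eilers-Boelens formulation of asymmetric least squares, and every parameter value (`window`, `polyorder`, `lam`, `p`, `percentile`) is a placeholder, since the abstract does not report the settings used.

```python
import numpy as np
from scipy import sparse
from scipy.signal import savgol_filter
from scipy.sparse.linalg import spsolve


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Estimate a smooth baseline by asymmetric least squares.

    Assumed formulation (Eilers-Boelens style); lam and p are illustrative.
    """
    n = y.size
    # Second-difference operator, shape (n - 2, n); penalizes baseline curvature.
    d = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (d.T @ d)
    w = np.ones(n)
    z = y
    for _ in range(n_iter):
        w_mat = sparse.diags(w, 0)
        z = spsolve((w_mat + penalty).tocsc(), w * y)
        # Points above the baseline (likely peaks) get the small weight p.
        w = np.where(y > z, p, 1.0 - p)
    return z


def preprocess(spectrum, wavenumbers, window=11, polyorder=3, percentile=90.0):
    """Smooth, baseline-correct, area-normalize, and sparsify one spectrum."""
    # 1. Savitzky-Golay smoothing.
    smoothed = savgol_filter(spectrum, window_length=window, polyorder=polyorder)
    # 2. Baseline removal; negative residuals are clipped to zero.
    corrected = np.clip(smoothed - als_baseline(smoothed), 0.0, None)
    # 3. Area normalization (trapezoidal rule), which removes overall scale.
    widths = np.abs(np.diff(wavenumbers))
    area = np.sum(0.5 * (corrected[1:] + corrected[:-1]) * widths)
    normalized = corrected / area if area > 0 else corrected
    # 4. Peak-sparse representation: zero everything below the chosen percentile.
    threshold = np.percentile(normalized, percentile)
    return np.where(normalized >= threshold, normalized, 0.0)
```

Because every step before normalization is linear in the intensity, multiplying a spectrum by a constant factor leaves the output of `preprocess` unchanged in this sketch, which is the sense in which the representation is concentration-invariant. The resulting sparse vectors could then be stacked into a matrix and passed to PCA and a classifier.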
