Correction of Chromatography-Mass Spectrometry Long-term Instrumental Drift using Quality Control Samples over 155 days
Abstract
A rigorous protocol for correcting long-term instrumental data drift is vital for ensuring process reliability and product stability. Over a span of 155 days, we performed 20 repeated tests in 7 batches on a set of six commercial tobacco products using a chromatography-mass spectrometry instrument. By measuring pooled quality control (QC) samples 20 times to establish the correction data set, we achieved reliable correction even for data with large fluctuations. Three algorithms, spline interpolation (SC), support vector regression (SVR), and Random Forest (RF), were used to normalise 178 target substances in the six samples. For chemical components present in the test samples but absent from the QC samples, normalisation was still achievable, either by using an adjacent chromatographic peak for correction or by applying the average correction coefficients derived from all QC data. Results show that the Random Forest algorithm provides the most stable correction model for long-term, highly variable data. Both principal component analysis (PCA) and standard deviation analysis confirm satisfactory correction performance. In contrast, correction models based on the SC and SVR algorithms produced less stable outcomes; for data with large variation, SVR tends to over-fit and over-correct. Our study shows that for long-term chromatography-mass spectrometry measurements, periodic QC sample measurements combined with an appropriate correction algorithm can compensate for measurement variability, enabling reliable data tracking and quantitative comparison over extended periods.
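The general idea of QC-based drift correction described above can be sketched as follows. This is a minimal illustration with simulated data, not the authors' implementation: a Random Forest regression model (here scikit-learn's `RandomForestRegressor`, an assumed choice of library) is fitted to QC intensities as a function of injection order, and sample intensities are then divided by the modelled drift. All variable names and numeric values are hypothetical.

```python
# Minimal sketch of QC-based drift correction on simulated data.
# Assumes scikit-learn as the modelling library; not the authors' code.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Injection order for 20 QC runs and simulated QC intensities
# with a gradual upward instrumental drift plus noise.
qc_order = np.arange(20)
qc_signal = 100.0 + 0.8 * qc_order + rng.normal(0.0, 2.0, size=20)

# Fit the drift model: predicted QC intensity vs. injection order.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(qc_order.reshape(-1, 1), qc_signal)

# Correct sample intensities measured at arbitrary injection orders:
# divide by the modelled drift, then rescale to the QC median so the
# corrected values stay on the original intensity scale.
sample_order = np.array([3, 9, 15])
sample_signal = np.array([104.0, 109.5, 114.2])
drift = model.predict(sample_order.reshape(-1, 1))
corrected = sample_signal / drift * np.median(qc_signal)
```

In practice one correction model would be fitted per target substance; for substances absent from the QC samples, the abstract notes that the drift curve of an adjacent chromatographic peak, or the average correction coefficients over all QC data, can stand in for the missing QC signal.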