Correction of gas chromatography–mass spectrometry long-term instrumental drift using quality control samples over 155 days
Abstract
Long-term instrumental data drift is a critical challenge for ensuring process reliability and product stability. In this study, we conducted 20 repeated tests on the smoke of six commercial tobacco products using a gas chromatography–mass spectrometry (GC–MS) instrument over 155 days. We propose a simple, cost-effective, and reliable peak-area correction approach to address long-term data drift, particularly in GC–MS data. Using 20 pooled quality control (QC) samples, we established a correction data set and achieved reliable peak correction even for components exhibiting large fluctuations. Three algorithms, spline interpolation (SC), support vector regression (SVR), and Random Forest (RF), were applied to normalise 178 target chemicals across 20 repeated measurements of the six samples. Two key innovations were introduced. First, we constructed a “virtual QC sample” by incorporating chromatographic peaks from all 20 QC results, verified by retention time and mass spectrum. This virtual QC served as the master reference for analysing and normalising the test samples. Second, we translated batch effects and injection-order effects into two numerical indices within the algorithms, minimising artificial parameterisation of the experiments. For chemical components found in the test samples but absent from the QC samples, normalisation was achieved either by using an adjacent chromatographic peak for correction or by applying average correction coefficients derived from all QC data. Our results show that the Random Forest algorithm provided the most stable and reliable correction model for long-term, highly variable data. Principal component analysis (PCA) and standard deviation analysis confirmed the robustness of the correction procedure. In contrast, models based on the SC and SVR algorithms were less stable, with SC being the least stable, and for data with large variation SVR tended to over-fit and over-correct. This study shows that for long-term GC–MS measurements, periodic QC sample measurements combined with an appropriate correction algorithm can effectively compensate for long-term measurement variability, enabling reliable data tracking and quantitative comparison over extended periods.
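As a rough illustration of the QC-based correction idea described above (a minimal sketch, not the authors' implementation), the following Python example fits a Random Forest to the QC peak areas of a single compound using injection order and batch index as the two numerical predictors, then rescales the sample peak areas toward the median QC response. All function and variable names here are illustrative assumptions; the handling of compounds absent from the QC profile (adjacent-peak or averaged-coefficient correction) is not shown.

```python
# Minimal sketch of QC-based drift correction for one target compound,
# assuming per-injection peak areas, injection order, and batch index
# are available as NumPy arrays. Names are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def correct_drift(qc_order, qc_batch, qc_area,
                  sample_order, sample_batch, sample_area):
    """Normalise sample peak areas for one compound using pooled QC runs."""
    # Drift model: QC peak area as a function of injection order and
    # batch index (the two numerical indices mentioned in the abstract).
    X_qc = np.column_stack([qc_order, qc_batch])
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    model.fit(X_qc, qc_area)

    # Reference level: median QC response over the whole study period.
    reference = np.median(qc_area)

    # Predict the drift level at each sample injection and rescale the
    # measured sample areas by the reference-to-predicted ratio.
    X_sample = np.column_stack([sample_order, sample_batch])
    predicted_drift = model.predict(X_sample)
    return sample_area * reference / predicted_drift
```

In practice such a routine would be applied compound by compound across all 178 target chemicals, with SC or SVR substituted for the Random Forest model when comparing algorithms.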