Generalizable Cysteine Quantification in Pea Cultivars from SERS Spectra Using AI

Abstract

Rapid quantification of sulfur-containing amino acids, particularly cysteine, in legumes is critical for assessing nutritional quality, screening in breeding programs, and ensuring consistency in quality control. However, conventional methods such as high-performance liquid chromatography (HPLC) are too time-consuming and resource-intensive for high-throughput applications. This study evaluated artificial intelligence models for predicting cysteine concentration from surface-enhanced Raman spectroscopy (SERS) spectra of pea extracts. SERS spectra were acquired from 20 cultivars grown at three geographically distinct locations, with HPLC-measured cysteine concentrations serving as the ground-truth reference. Linear regression, partial least squares regression, support vector regression, random forest regression, and a one-dimensional convolutional neural network (1D-CNN) were compared using within-cultivar splits and leave-one-cultivar-out (LOCO) evaluation. The 1D-CNN achieved an RMSE of 0.008 g/100 g within cultivars and maintained performance under LOCO, while the other models showed limited generalization. Shapley Additive Explanations (SHAP) highlighted informative bands in the 630–760 cm⁻¹ range, and noise modeling optimized scan-count selection.
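
The LOCO protocol mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names (`loco_splits`, `rmse`, `loco_rmse`) and the `fit_predict` callback are hypothetical, and the toy values in the usage example are made up for illustration. The idea is only that every spectrum from one cultivar is withheld from training in each fold, so the reported error reflects cultivars the model has never seen.

```python
import math
from collections import defaultdict


def loco_splits(cultivars):
    """Yield (held_out, train_idx, test_idx), holding out one cultivar per fold.

    `cultivars` is a list with one cultivar label per spectrum.
    """
    by_cultivar = defaultdict(list)
    for i, label in enumerate(cultivars):
        by_cultivar[label].append(i)
    for held_out in sorted(by_cultivar):
        test_idx = by_cultivar[held_out]
        train_idx = sorted(
            i for label, idx in by_cultivar.items() if label != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def loco_rmse(cultivars, y, fit_predict):
    """Return a per-cultivar RMSE dict.

    `fit_predict(train_idx, test_idx)` is a hypothetical callback that trains
    any regressor on train_idx and returns predictions for test_idx.
    """
    scores = {}
    for held_out, train_idx, test_idx in loco_splits(cultivars):
        preds = fit_predict(train_idx, test_idx)
        scores[held_out] = rmse([y[i] for i in test_idx], preds)
    return scores


if __name__ == "__main__":
    # Made-up labels and concentrations (g/100 g), for illustration only.
    cultivars = ["A", "A", "B", "B", "C", "C"]
    y = [0.20, 0.22, 0.30, 0.28, 0.25, 0.25]

    def mean_baseline(train_idx, test_idx):
        # Predict the training-set mean for every held-out spectrum.
        mean = sum(y[i] for i in train_idx) / len(train_idx)
        return [mean] * len(test_idx)

    for cultivar, score in loco_rmse(cultivars, y, mean_baseline).items():
        print(cultivar, round(score, 4))
```

A within-cultivar split, by contrast, lets spectra from the same cultivar appear in both training and test sets, which is why it tends to give more optimistic error estimates than LOCO.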

Article activity feed

  1. Figure 2.

    Are there wavenumbers that are consistent across the two? If there is not a significant degradation in the performance of the 1D-CNN when doing LOCO, why is the SHAP value 10x smaller?