Generalizable Cysteine Quantification in Pea Cultivars from SERS Spectra Using AI

This article has been reviewed.

Abstract

Rapid quantification of sulfur-containing amino acids, particularly cysteine, in legumes is critical for assessing nutritional quality, screening material in breeding programs, and keeping quality control consistent. However, conventional methods such as high-performance liquid chromatography (HPLC) are time-consuming and resource-intensive for high-throughput applications. This study evaluated artificial intelligence models for predicting cysteine concentration from surface-enhanced Raman spectroscopy (SERS) spectra of pea extracts. SERS spectra were acquired from 20 cultivars grown at three geographically distinct locations, with HPLC-measured cysteine concentrations serving as the ground-truth reference. Linear regression, partial least squares regression, support vector regression, random forest regression, and a one-dimensional convolutional neural network (1D-CNN) were compared using within-cultivar splits and leave-one-cultivar-out (LOCO) evaluation. The 1D-CNN achieved an RMSE of 0.008 g/100 g within cultivars and maintained its performance under LOCO, whereas the other models showed limited generalization. Shapley Additive Explanations (SHAP) highlighted informative bands in the 630–760 cm⁻¹ range, and noise modeling was used to optimize the choice of scan count.
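The difference between the two evaluation schemes is where the generalization claim comes from. In a within-cultivar split, spectra from every cultivar appear in both the training and test sets. Under LOCO, all spectra from one cultivar are held out, the model is fit on the remaining cultivars, and the procedure repeats once per cultivar. The sketch below illustrates LOCO scoring in plain Python. It is not the authors' pipeline: the mean-predictor model, the function names, and the toy data are hypothetical placeholders, and details such as preprocessing or how growing locations are handled are not described in the abstract. In practice, scikit-learn's `LeaveOneGroupOut` produces the same kind of splits when given cultivar labels as `groups`.

```python
import math
from typing import Callable, Dict, Hashable, Iterator, List, Sequence, Tuple


def loco_splits(groups: Sequence[Hashable]) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_cultivar, train_indices, test_indices), one split per cultivar.

    Every spectrum of the held-out cultivar goes to the test set, so the model
    never sees that cultivar during training.
    """
    for held_out in sorted(set(groups), key=str):
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def loco_rmse(X, y, groups, fit: Callable, predict: Callable) -> Dict[Hashable, float]:
    """Fit on all other cultivars, score on the held-out one; return RMSE per cultivar."""
    scores = {}
    for held_out, train_idx, test_idx in loco_splits(groups):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores[held_out] = rmse([y[i] for i in test_idx], preds)
    return scores


# Hypothetical placeholder model: predicts the training-set mean cysteine
# concentration and ignores the spectrum. Swap in PLSR, SVR, RF, or a 1D-CNN.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, X_test):
    return [model] * len(X_test)


# Hypothetical toy data: two spectra (3 intensities each) per cultivar A, B, C,
# with cysteine in g/100 g. These are not values from the study.
X = [[0.1, 0.5, 0.2], [0.2, 0.4, 0.3], [0.3, 0.6, 0.1],
     [0.2, 0.5, 0.2], [0.4, 0.7, 0.3], [0.5, 0.6, 0.2]]
y = [0.30, 0.32, 0.28, 0.30, 0.34, 0.36]
groups = ["A", "A", "B", "B", "C", "C"]

print(loco_rmse(X, y, groups, fit_mean, predict_mean))
```

Because the `fit`/`predict` callables are passed in, the same loop can score each of the compared regressors under identical LOCO splits, so per-cultivar RMSE values are directly comparable across models.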

Article activity feed

  1. Figure 2.

    Are there wavenumbers that are consistently important across the two SHAP profiles (within-cultivar and LOCO)? If the 1D-CNN's performance does not degrade significantly under LOCO, why are the SHAP values about 10 times smaller?
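Both questions can be checked separately: whether the same bands rank highly, and whether the overall attribution scale differs. SHAP attributions for a sample sum to the difference between the model's prediction and its expected prediction over the background data, so their magnitude depends on how far predictions spread from that baseline and on the background set chosen, not on accuracy alone. The sketch below compares two profiles by top-k band overlap (Jaccard index) and by total-magnitude ratio, after normalizing each profile to sum to 1. It assumes each profile is a mapping from wavenumber to mean |SHAP value|; the function names and all numbers are hypothetical and are not read from Figure 2.

```python
from typing import Dict, Set

# Assumed format: a SHAP profile maps wavenumber (cm^-1) -> mean |SHAP value| at that band.
Profile = Dict[float, float]


def normalize(profile: Profile) -> Profile:
    """Rescale a profile to sum to 1 so only its shape, not its scale, is compared."""
    total = sum(profile.values())
    return {band: value / total for band, value in profile.items()}


def top_k_bands(profile: Profile, k: int) -> Set[float]:
    """Return the k wavenumbers with the largest attribution."""
    return set(sorted(profile, key=profile.get, reverse=True)[:k])


def band_overlap(p1: Profile, p2: Profile, k: int) -> float:
    """Jaccard overlap of the top-k bands of two profiles (1.0 = identical sets)."""
    a, b = top_k_bands(p1, k), top_k_bands(p2, k)
    return len(a & b) / len(a | b)


def scale_ratio(p1: Profile, p2: Profile) -> float:
    """Ratio of total attribution magnitude between two profiles."""
    return sum(p1.values()) / sum(p2.values())


# Hypothetical profiles for illustration only, not taken from Figure 2.
within = {640.0: 0.020, 700.0: 0.015, 750.0: 0.010, 1000.0: 0.002, 1200.0: 0.001}
loco = {640.0: 0.0021, 700.0: 0.0012, 750.0: 0.0014, 1000.0: 0.0003, 1200.0: 0.0002}

print("top-3 band overlap:", band_overlap(within, loco, k=3))
print("scale ratio (within / LOCO):", round(scale_ratio(within, loco), 1))
print("normalized within:", normalize(within))
print("normalized LOCO:", normalize(loco))
```

A high band overlap combined with a large scale ratio would suggest that the model relies on the same spectral regions in both settings and that the difference is mainly one of attribution magnitude.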