Leakage-safe and repeat-stable reduced-band selection for VIS–NIR spectral regression with stability-gated explainability: mango TSS
Abstract
Mango sweetness grading requires non-destructive estimation of total soluble solids (TSS, °Brix) despite substantial fruit-to-fruit variability. However, hyperspectral TSS studies often use non-group-aware data splits and single-split reporting, which can inflate generalisation estimates and yield reduced-band recommendations with unclear repeatability. We propose a leakage-safe, repeat-stable reduced-band framework for mango TSS prediction from VIS–NIR hyperspectral imaging (415–954 nm; 60 bands). After spatial conditioning (tissue masking and specular-glare correction), ROI spectra (n = 1,360) were extracted from 340 fruits organised into 68 FruitGroups tracked across five ripening days and three cultivars; reference TSS was obtained destructively from FruitGroup-matched fruits. Wavelength selection was evaluated using a two-stage, stratified group-aware protocol that enforces FruitGroup-level independence while preserving the TSS distribution across folds, with Optuna-based inner-CV hyperparameter optimisation and SVR as the final predictor. Stage 1 screened preprocessing × selection pipelines on a single hold-out split; Stage 2 re-evaluated shortlisted candidates over 10 repeated FruitGroup-level resamples and selected the final pipeline using a 1-SE acceptance rule with a parsimony/competitiveness tie-break. The selected MSC–Random Frog pipeline achieved outer prediction-set performance of RMSE 2.417 ± 0.267 °Brix, RPD 1.89 ± 0.22, and R² 0.711 ± 0.059 using k = 28 ± 3 bands. Repeat-wise stability analysis and stability-gated explainability (permutation importance and SHAP), computed conditional on each repeat’s selected subset, showed consistent importance rankings and supported a candidate 11-band multispectral recommendation spanning 508–945 nm, anchored at 554 nm (VIS), 701 nm (red-edge), and 909 nm (NIR). The framework yields repeat-stable, interpretable wavelength recommendations to support reduced-band translation for mango sweetness grading and sorting.
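The evaluation logic described in the abstract (FruitGroup-level independence, TSS-preserving resamples, RPD reporting, and a 1-SE acceptance rule with a parsimony tie-break) can be illustrated with a minimal standard-library Python sketch. It is not the authors' implementation. The function names, the hold-out fraction, the number of TSS strata, and the exact form of the 1-SE rule (accept every pipeline whose mean RMSE lies within one standard error of the best, then prefer fewer bands, then lower RMSE) are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random
import statistics
from collections import defaultdict


def stratified_group_split(group_ids, y, test_frac=0.25, n_bins=4, seed=0):
    """Hold out whole groups so that no group appears in both train and test.

    Groups are ranked by mean target value and cut into n_bins strata; test
    groups are drawn from every stratum to roughly preserve the target
    distribution. test_frac and n_bins are illustrative assumptions.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(group_ids):
        by_group[g].append(idx)
    ranked = sorted(by_group, key=lambda g: statistics.fmean([y[i] for i in by_group[g]]))
    rng = random.Random(seed)
    test_groups = set()
    for b in range(n_bins):
        stratum = ranked[b * len(ranked) // n_bins:(b + 1) * len(ranked) // n_bins]
        if stratum:
            n_test = max(1, round(test_frac * len(stratum)))
            test_groups.update(rng.sample(stratum, n_test))
    train = [i for g in ranked if g not in test_groups for i in by_group[g]]
    test = [i for g in ranked if g in test_groups for i in by_group[g]]
    return train, test


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: sample SD of the reference / RMSE."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


def one_se_select(results):
    """Pick a pipeline from {name: (per_repeat_rmse_list, n_bands)}.

    Assumed rule: accept pipelines whose mean RMSE is within one standard
    error of the best mean, then prefer fewer bands, then lower mean RMSE.
    """
    means = {name: statistics.fmean(scores) for name, (scores, _) in results.items()}
    best = min(means, key=means.get)
    best_scores = results[best][0]
    threshold = means[best] + statistics.stdev(best_scores) / math.sqrt(len(best_scores))
    accepted = [name for name in results if means[name] <= threshold]
    return min(accepted, key=lambda name: (results[name][1], means[name]))
```

Calling `stratified_group_split` with a different `seed` for each repeat gives repeated group-level resamples; the per-repeat RMSE values of each candidate pipeline can then be passed to `one_se_select` together with its band count.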