Leakage-safe and repeat-stable reduced-band selection for VIS–NIR spectral regression with stability-gated explainability: mango TSS

Abstract

Mango sweetness grading requires non-destructive estimation of total soluble solids (TSS, °Brix) despite substantial fruit-to-fruit variability. However, hyperspectral TSS studies often use non-group-aware data splits and single-split reporting, which can inflate generalisation estimates and yield reduced-band recommendations with unclear repeatability. We propose a leakage-safe, repeat-stable reduced-band framework for mango TSS prediction from VIS–NIR hyperspectral imaging (415–954 nm; 60 bands). After spatial conditioning (tissue masking and specular-glare correction), ROI spectra (n = 1,360) were extracted from 340 fruits organised into 68 FruitGroups tracked across five ripening days and three cultivars; reference TSS was obtained destructively from FruitGroup-matched fruits. Wavelength selection was evaluated using a two-stage, stratified group-aware protocol that enforces FruitGroup-level independence while preserving the TSS distribution across folds, with Optuna-based inner-CV hyperparameter optimisation and SVR as the final predictor. Stage 1 screened preprocessing × selection pipelines on a single hold-out split; Stage 2 re-evaluated shortlisted candidates over 10 repeated FruitGroup-level resamples and selected the final pipeline using a 1-SE acceptance rule with a parsimony/competitiveness tie-break. The selected MSC–Random Frog pipeline achieved outer prediction-set performance of RMSE 2.417 ± 0.267 °Brix, RPD 1.89 ± 0.22, and R² 0.711 ± 0.059 using k = 28 ± 3 bands. Repeat-wise stability analysis and stability-gated explainability (permutation importance and SHAP), computed conditional on each repeat’s selected subset, showed consistent importance rankings and supported a candidate 11-band multispectral recommendation spanning 508–945 nm, anchored at 554 nm (VIS), 701 nm (red-edge), and 909 nm (NIR). The framework yields repeat-stable, interpretable wavelength recommendations to support reduced-band translation for mango sweetness grading and sorting.
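
The Stage 2 selection step can be illustrated with a minimal sketch. The abstract does not give the exact form of the 1-SE acceptance rule or the tie-break, so the code below assumes one common reading: a pipeline is accepted if its mean RMSE over repeats lies within one standard error of the best mean, and among accepted pipelines the one with the fewest bands is preferred, with lower mean RMSE breaking any remaining tie. RPD is taken as the sample standard deviation of the reference values divided by the RMSE. The function names, the dictionary layout, and the numbers are hypothetical and are not the study's code or results.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rpd(y_true, y_pred):
    """Ratio of performance to deviation (assumed: sample SD of reference / RMSE)."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


def select_one_se(results):
    """Pick a pipeline with an assumed 1-SE rule and a parsimony tie-break.

    results maps a pipeline name to {"rmse": [...], "k": [...]}, holding the
    per-repeat outer RMSE and the per-repeat number of selected bands.
    """
    summary = {}
    for name, r in results.items():
        mean_rmse = statistics.mean(r["rmse"])
        # Standard error of the mean RMSE across repeats.
        se = statistics.stdev(r["rmse"]) / math.sqrt(len(r["rmse"]))
        summary[name] = (mean_rmse, se, statistics.mean(r["k"]))

    # Best pipeline by mean RMSE defines the acceptance threshold.
    best = min(summary, key=lambda n: summary[n][0])
    threshold = summary[best][0] + summary[best][1]
    accepted = [n for n in summary if summary[n][0] <= threshold]

    # Tie-break (assumption): fewest bands first, then lowest mean RMSE.
    chosen = min(accepted, key=lambda n: (summary[n][2], summary[n][0]))
    return chosen, threshold, accepted


if __name__ == "__main__":
    # Hypothetical per-repeat results for three candidate pipelines.
    demo = {
        "pipeline_A": {"rmse": [2.0, 2.2, 2.4], "k": [40, 40, 40]},
        "pipeline_B": {"rmse": [2.2, 2.3, 2.4], "k": [28, 28, 28]},
        "pipeline_C": {"rmse": [2.9, 3.0, 3.1], "k": [10, 10, 10]},
    }
    chosen, threshold, accepted = select_one_se(demo)
    print(chosen, round(threshold, 3), accepted)
```

In this toy input, pipeline_A has the lowest mean RMSE, but pipeline_B falls inside the one-standard-error band and uses fewer wavelengths, so the rule prefers it. pipeline_C uses the fewest bands but is rejected because its error lies outside the band.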
