Machine Learning Enhanced Prediction of TDS for Strengthening Aquatic Disease Early Warning Systems
Abstract
Total Dissolved Solids (TDS) is a key determinant of aquatic ecosystem stability and a predictor of stress-induced disease outbreaks in fish and other aquatic organisms. Accurate forecasting of TDS is therefore essential for early warning systems and for mitigating aquatic disease burdens. This study developed a hybrid stacked ensemble model integrating XGBoost, LightGBM, and a Multi-Layer Perceptron (MLP), with ridge regression serving as the meta-learner. Interquartile Range (IQR) clipping was applied to cap outliers, followed by min–max scaling for normalization. Model performance was assessed using R², RMSE, and MSE on both raw and scaled targets. Among individual models, XGBoost demonstrated the highest predictive accuracy (R² = 0.954, RMSE = 46.73 mg/L), outperforming MLP (R² = 0.565) and LightGBM (R² = 0.458). The stacked ensemble achieved R² = 0.814 and RMSE = 93.89 mg/L; it improved calibration and reduced variance, although its accuracy remained below that of XGBoost alone. SHAP analysis identified electrical conductivity (EC) as the dominant positive driver of TDS, with high EC values producing strongly positive SHAP contributions. The findings highlight the role of advanced ensemble learning in enhancing water quality forecasting and supporting proactive disease risk management. Improved TDS prediction can strengthen monitoring frameworks and help reduce the burden of waterborne and aquaculture-related diseases. Future research should incorporate cross-validation and optimized hyperparameter tuning to further enhance model robustness.
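The preprocessing and evaluation steps named in the abstract (IQR clipping, min–max scaling, and the MSE, RMSE, and R² metrics) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the clipping multiplier k = 1.5, and the sample TDS readings are assumptions made for illustration, and the abstract does not state whether clipping was applied to the features, the target, or both. The sketch uses only the Python standard library.

```python
import math
import statistics


def iqr_clip(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier, assumed here; the abstract
    does not report the value used in the study.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def min_max_scale(values):
    """Scale values to [0, 1]; also return (min, max) so the scaling can be inverted."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant input: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values], (lo, hi)
    return [(v - lo) / span for v in values], (lo, hi)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (e.g. mg/L)."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative TDS readings in mg/L (made up, not data from the study).
tds = [310.0, 295.0, 330.0, 305.0, 320.0, 2500.0]
clipped = iqr_clip(tds)                  # the 2500 mg/L reading is capped, not dropped
scaled, (tds_min, tds_max) = min_max_scale(clipped)
```

Because clipping caps extreme readings rather than deleting them, the sample size is unchanged. Metrics computed on the scaled target are not comparable to those on the raw target unless predictions are first mapped back using the stored minimum and maximum.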