An Application of ML-assisted Techniques for Water Quality Testing Systems

Abstract

Accurate assessment of water quality requires analyzing numerous chemical and physical parameters, which often produces high-dimensional datasets that increase computational complexity and testing costs. This study presents a machine learning (ML)-assisted framework that integrates multiple dimensionality reduction and feature selection techniques to improve the efficiency and cost-effectiveness of water quality monitoring systems. The framework combines Recursive Feature Elimination with Cross-Validation (RFECV), Permutation Importance (PI), Random Forest (RF), Principal Component Analysis (PCA), and autoencoders to identify and rank the most influential parameters. To enhance practical relevance, domain-specific health hazard weights are incorporated to refine feature prioritization, so that critical indicators are retained while redundant or low-impact variables are eliminated. The approach is validated on three publicly available Kaggle water quality datasets with varying feature dimensions and sample sizes. Seven ML classifiers are employed to assess predictive performance: RF, Decision Tree (DT), Logistic Regression (LR), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM), and Deep Neural Network (DNN). Results show that the proposed method reduces input dimensionality by up to 23% while keeping prediction accuracy within 1% of the baseline. It also achieves up to 17% lower computational cost, 48% shorter training time, and 36% faster inference, which translates into potential laboratory testing cost savings of up to 32%. Overall, integrating ML-based dimensionality reduction with domain-informed feature weighting offers a practical and scalable pathway toward intelligent, resource-efficient water quality assessment systems.
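The abstract does not state how the health hazard weights are combined with the importance scores, so the following is only a minimal sketch of one plausible scheme: each method's scores are min-max normalised, averaged across methods, and then multiplied by a per-parameter hazard weight before ranking. The function names, the parameter names (`ph`, `turbidity`, `lead`, `hardness`), the scores, and the weights are all hypothetical placeholders, not values from the study.

```python
from statistics import mean


def min_max(scores):
    """Rescale a {feature: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so no feature is preferred by this method.
        return {f: 1.0 for f in scores}
    return {f: (s - lo) / (hi - lo) for f, s in scores.items()}


def rank_features(method_scores, hazard_weights, keep):
    """Rank features by mean normalised importance times hazard weight.

    Multiplicative weighting is an assumption made for illustration;
    the source does not specify the combination rule.
    """
    normalised = [min_max(s) for s in method_scores.values()]
    combined = {
        f: mean(n[f] for n in normalised) * w
        for f, w in hazard_weights.items()
    }
    ranked = sorted(combined, key=combined.get, reverse=True)
    return ranked[:keep], combined


# Hypothetical importance scores from two selection methods.
method_scores = {
    "random_forest": {"ph": 0.30, "turbidity": 0.25, "lead": 0.20, "hardness": 0.25},
    "permutation": {"ph": 0.10, "turbidity": 0.04, "lead": 0.06, "hardness": 0.08},
}

# Hypothetical hazard weights in [0, 1]; higher means a greater health risk.
hazard_weights = {"ph": 0.5, "turbidity": 0.6, "lead": 1.0, "hardness": 0.2}

selected, combined = rank_features(method_scores, hazard_weights, keep=3)
print(selected)
```

With these placeholder numbers, `lead` has the lowest averaged importance but is still retained because of its high hazard weight, while `hardness` is dropped. This mirrors the stated goal of keeping critical indicators while removing low-impact variables.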
