Look Out Below: Predicting Wastewater Infrastructure Service Type at the Land Parcel Scale

Abstract

Comprehensive national data on the wastewater infrastructure serving the US population have not been collected for 30 years, creating a data gap with implications for public health and asset management. We developed a model that uses geospatial data and machine learning (Random Forest) to predict wastewater infrastructure service type in places where it is unknown. The model takes a two-stage approach: Stage 1 identifies whether a parcel needs wastewater infrastructure, and Stage 2 identifies whether it is served by an onsite wastewater treatment system or by a centralized sewer connection. We tested this approach using data from Florida, evaluating its applicability within Florida and running an out-of-sample test in Virginia. The model achieved 91.8% accuracy across Florida with a 96.4% median Stage 2 confidence, suggesting that confidence could serve as a proxy for accuracy where ground-truth data are limited. When trained only on data from Florida, the model achieved 81.9% accuracy in Virginia, suggesting strong transferability to new geographies. Variations in performance highlight opportunities for improvement, both in resolving the underestimation of sewer service boundaries and in testing that accounts for a range of local and historical circumstances. Our approach represents a scalable and transferable framework for predicting wastewater infrastructure service type at the land parcel scale.
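The two-stage decision logic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Random Forest models are not shown, and the function name, the 0.5 threshold, the label strings, and the example probabilities are all hypothetical. The sketch also assumes that "Stage 2 confidence" means the predicted probability of the class Stage 2 selects, which the abstract does not define.

```python
from statistics import median

def classify_parcel(p_needs_service, p_sewer, threshold=0.5):
    """Combine two stage outputs into one label for a parcel (illustrative only).

    p_needs_service: assumed Stage 1 probability that the parcel needs
        wastewater infrastructure.
    p_sewer: assumed Stage 2 probability that a parcel needing service is
        on a centralized sewer rather than an onsite system.
    Returns (label, stage2_confidence); confidence is None when Stage 2
    is never reached.
    """
    # Stage 1: parcels judged not to need service skip Stage 2 entirely.
    if p_needs_service < threshold:
        return ("no_service_needed", None)
    # Stage 2: choose the more probable service type and report the
    # probability of the chosen class as the confidence.
    if p_sewer >= threshold:
        return ("sewer", p_sewer)
    return ("onsite", 1.0 - p_sewer)

# Made-up (Stage 1, Stage 2) probabilities for four hypothetical parcels.
parcels = [(0.10, 0.50), (0.95, 0.97), (0.88, 0.08), (0.70, 0.55)]
results = [classify_parcel(p1, p2) for p1, p2 in parcels]

# Median Stage 2 confidence over the parcels that reached Stage 2.
confidences = [conf for _, conf in results if conf is not None]
median_confidence = median(confidences)
```

Under these assumptions, the median confidence is a summary that can be computed without ground-truth labels, which is the property the abstract points to when it suggests confidence as a proxy for accuracy.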
