Self-Supervised Graph Attention Networks for Community-Engaged Lead Contamination Risk Assessment
Abstract
Lead contamination in residential water supplies is a nationwide public-health emergency that burdens communities with chronic neurological, cardiovascular, and infrastructure-related costs. Current Environmental Protection Agency (EPA) Lead and Copper Rule protocols, which hinge on homeowner sampling and laboratory analysis, are prohibitively expensive, slow, and too sparsely deployed to flag emerging contamination clusters in time for preventive action. To bridge this gap, we propose a scalable graph machine learning framework for high-resolution lead risk assessment that uses publicly available housing datasets (parcel and infrastructure records) and archives of historical lead testing data. At the heart of the approach is a Self-Supervised Graph Attention Network (SSGAT) that uses graph attention layers to model spatial dependencies between properties, coupled with self-supervised pretraining to improve generalizability. An adaptive human-in-the-loop module refines the attention weights through rapid expert review, so that locality-specific nuances are captured without retraining from scratch. We pre-trained the model on the publicly available Flint, Michigan dataset and fine-tuned it using IRB-approved parcel-linked samples and stakeholder feedback collected in Andover, Massachusetts. The resulting system attains 90% classification accuracy and an AUC of 83.6%, surpassing state-of-the-art models by as much as 12% while cutting per-parcel screening costs by a factor of five. By uniting self-supervised graph learning, transferability, and participatory validation, this work advances both computational methodology and environmental-engineering practice toward scalable, proactive surveillance of spatially correlated drinking-water hazards.
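To make the phrase "graph attention layers to model spatial dependencies between properties" concrete, the following is a minimal sketch of a generic single-head graph attention layer in NumPy. It is not the authors' SSGAT: the function name `gat_layer`, the toy four-parcel adjacency matrix, the random features, and the weight shapes are all assumptions made for illustration, and the self-supervised pretraining and human-in-the-loop refinement described in the abstract are not shown.

```python
import numpy as np

def gat_layer(X, A, W, a, slope=0.2):
    """Generic single-head graph attention layer (illustrative sketch only).

    X: (n, d_in) node features, one row per parcel (hypothetical features).
    A: (n, n) binary adjacency matrix between parcels.
    W: (d_in, d_out) projection matrix.
    a: (2 * d_out,) attention vector, split into source and target halves.
    """
    H = X @ W                                   # project features: (n, d_out)
    d = H.shape[1]
    # Raw score e_ij = LeakyReLU(a_src . h_i + a_dst . h_j)
    e = (H @ a[:d])[:, None] + (H @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    # Attend only over neighbours, plus a self-loop for every node
    mask = (A + np.eye(A.shape[0])) > 0
    e = np.where(mask, e, -np.inf)
    # Row-wise softmax, shifted by the row maximum for numerical stability
    e = e - e.max(axis=1, keepdims=True)
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)
    return alpha @ H, alpha                     # new embeddings, attention weights

# Toy example: four parcels along a street, each adjacent to its neighbours
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))                     # 3 made-up features per parcel
W = rng.normal(size=(3, 2))
a = rng.normal(size=(4,))
out, alpha = gat_layer(X, A, W, a)
```

Each row of `alpha` sums to one and is zero for non-adjacent parcels, so a parcel's embedding is a learned weighted average over itself and its spatial neighbours. In a setting like the one described, it is these attention weights that an expert-review step could plausibly inspect or adjust.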