FEPO: a machine learning ensemble approach for predicting extreme phlebotomine sand fly abundance and leishmaniasis-risk hotspots across Europe
Abstract
Climate change significantly influences the spread of infectious diseases, including leishmaniasis, which is transmitted by phlebotomine sand flies. The geographical distribution of sand flies has expanded northward from the Mediterranean region, increasing the risk of leishmaniasis in areas that previously lacked systematic vector surveillance. This study presents FEPO (SandFlies Extreme POpulation prediction), a machine learning ensemble model designed as a core component of early warning systems for vector-borne diseases. FEPO uses more than one thousand field trap records collected between 2011 and 2022, together with 1 km resolution meteorological, hydrological, and morphological grids, to produce daily maps of sand fly density across 26 European countries. The model stacks gradient-boosted decision trees built with CatBoost and applies a tailored under- and oversampling strategy to address the scarcity and skewness of the observational data, in which occasional population surges are buried among many zero- and low-abundance counts. In tenfold cross-validation, FEPO achieves an 11% lower mean absolute error than baseline regression models. The model reveals persistent hotspots along the Mediterranean and Balkan coasts, as well as in parts of Central and Northern Europe, where environmental conditions favor vector proliferation. By delivering high-resolution outputs, FEPO enables public health agencies to target trapping and mitigate outbreaks, and it offers a transferable blueprint for early warning systems that address other climate-sensitive disease vectors.
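The abstract does not specify FEPO's tailored sampling rule, so the following is only a minimal sketch of how under- and oversampling of skewed trap counts can be combined with tenfold cross-validation and mean absolute error. Everything in it is an assumption for illustration: the function names (`rebalance`, `kfold_indices`, `cross_validated_mae`, `mean_baseline`), the surge threshold, the fraction of zero counts kept, the oversampling factor, and the toy data. A trivial mean predictor stands in for the stacked CatBoost models, and only the Python standard library is used.

```python
import random
from collections import Counter


def rebalance(records, extreme_threshold, zero_keep_frac=0.3, extreme_factor=3, seed=0):
    """Hypothetical under/oversampling for skewed trap counts (not FEPO's actual scheme).

    records: list of (features, count) pairs.
    Zero counts are undersampled to about zero_keep_frac of their number;
    counts >= extreme_threshold are repeated extreme_factor times;
    all other counts are kept once.
    """
    if extreme_threshold <= 0:
        raise ValueError("extreme_threshold must be positive")
    rng = random.Random(seed)
    zeros = [r for r in records if r[1] == 0]
    middle = [r for r in records if 0 < r[1] < extreme_threshold]
    extremes = [r for r in records if r[1] >= extreme_threshold]
    n_keep = max(1, round(len(zeros) * zero_keep_frac)) if zeros else 0
    out = rng.sample(zeros, n_keep) + middle + extremes * extreme_factor
    rng.shuffle(out)
    return out


def kfold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def cross_validated_mae(records, fit_predict, k=10, **rebalance_kwargs):
    """Resample only the training folds; score on the untouched test fold."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(records), k):
        train = rebalance([records[i] for i in train_idx], **rebalance_kwargs)
        test = [records[i] for i in test_idx]
        preds = fit_predict(train, [x for x, _ in test])
        scores.append(mae([y for _, y in test], preds))
    return sum(scores) / len(scores)


def mean_baseline(train, test_features):
    """Stand-in regressor; FEPO itself uses stacked CatBoost models."""
    m = sum(y for _, y in train) / len(train)
    return [m] * len(test_features)


# Toy data (not real trap records): 80 zero counts, 15 low counts, 5 surges.
records = ([((i,), 0) for i in range(80)]
           + [((i,), 1 + i % 5) for i in range(15)]
           + [((i,), 50 + i) for i in range(5)])
balanced = rebalance(records, extreme_threshold=50)
sizes = Counter("zero" if y == 0 else "extreme" if y >= 50 else "low" for _, y in balanced)
print(sizes["zero"], sizes["low"], sizes["extreme"])  # 24 15 15
print(cross_validated_mae(records, mean_baseline, extreme_threshold=50))
```

The resampling is applied inside the cross-validation loop, to the training folds only, so that the reported error is measured on the original skewed distribution of counts rather than on artificially balanced data.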