A scutum-focused deep learning pipeline for species-level identification of Aedes aegypti and Aedes albopictus from citizen-science images
Abstract
Background
Control of mosquito-borne diseases transmitted by Aedes aegypti and Aedes albopictus, including dengue, Zika, chikungunya, and yellow fever, depends critically on rapid and accurate vector identification. Although deep learning has achieved high accuracy on curated laboratory images, performance degrades substantially when models are applied to community-submitted photographs that vary widely in quality, framing, and background. We sought to develop a robust pipeline for distinguishing these two morphologically similar vectors in real-world citizen-science images.
Methods
We compiled 2,112 mosquito images from the Global Mosquito Observation Database (GMOD) and assembled a multi-stage pipeline comprising: (i) a binary classifier to screen for mosquito presence; (ii) a YOLO-based object detector to localize specimens; (iii) an image-quality assessment module evaluating brightness, sharpness (Laplacian variance), contrast, and bounding-box ratio; (iv) Segment Anything Model (SAM) segmentation to isolate specimens from background clutter; and (v) a YOLO classifier trained on binary segmentation masks. To target the diagnostic characters used in conventional morphological taxonomy, we refined the pipeline to focus detection on the thoracic scutum, the region bearing the lyre-shaped pale-scale pattern of Ae. aegypti and the median white stripe of Ae. albopictus.
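The image-quality module in step (iii) names four metrics but the abstract gives no thresholds or implementation details. The following is a minimal, dependency-free sketch of how such a check might look, assuming a grayscale image stored as a list of pixel rows (at least 3×3) and a detector box in pixel coordinates. The function names and every threshold value are hypothetical illustrations, not the authors' settings. In practice the Laplacian would more likely be computed with an image library such as OpenCV.

```python
from statistics import fmean, pstdev, pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Higher values indicate more edge detail, i.e. a sharper image.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def quality_report(
    gray,
    box,
    brightness_range=(40.0, 220.0),  # hypothetical threshold
    min_contrast=20.0,               # hypothetical threshold
    min_sharpness=100.0,             # hypothetical threshold
    min_box_ratio=0.05,              # hypothetical threshold
):
    """Compute the four metrics and a pass/fail flag for one image.

    `box` is (x0, y0, x1, y1) in pixels, as a detector might return it.
    """
    height, width = len(gray), len(gray[0])
    pixels = [p for row in gray for p in row]
    x0, y0, x1, y1 = box

    metrics = {
        "brightness": fmean(pixels),    # mean intensity
        "contrast": pstdev(pixels),     # spread of intensities
        "sharpness": laplacian_variance(gray),
        "box_ratio": ((x1 - x0) * (y1 - y0)) / (width * height),
    }
    metrics["passes"] = bool(
        brightness_range[0] <= metrics["brightness"] <= brightness_range[1]
        and metrics["contrast"] >= min_contrast
        and metrics["sharpness"] >= min_sharpness
        and metrics["box_ratio"] >= min_box_ratio
    )
    return metrics
```

A uniformly grey image has zero Laplacian variance and zero contrast, so it fails this check, whereas a high-contrast pattern with a reasonably large bounding box passes.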
Results
Baseline YOLO classification on raw images achieved 30.95% accuracy for Ae. aegypti and 78.4% for Ae. albopictus, reflecting strong class imbalance and background noise. Augmentation alone provided only modest gains. The presence/absence classifier reached 90.52% accuracy, and the object detector localized mosquitoes with near-perfect precision. Whole-body SAM-mask classification improved overall accuracy to 68.21%. Refining the pipeline to scutum-focused classification yielded preliminary accuracies of 87.5% and 83.3% for Ae. albopictus and Ae. aegypti, respectively.
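Because the results mix per-species and overall accuracies, it may help to see how the two relate under class imbalance. The sketch below computes per-class accuracy, overall accuracy, and their unweighted mean (balanced accuracy) from a confusion table. The function name and the counts in the usage example are invented purely for illustration and are not taken from the study.

```python
def accuracy_summary(confusion):
    """Summarise a confusion table of the form confusion[true][predicted] = count.

    Returns per-class accuracy, overall accuracy, and balanced accuracy
    (the unweighted mean of the per-class accuracies).
    """
    per_class = {}
    correct = 0
    total = 0
    for true_label, row in confusion.items():
        n = sum(row.values())
        hits = row.get(true_label, 0)
        if n == 0:
            raise ValueError(f"no samples for class {true_label!r}")
        per_class[true_label] = hits / n
        correct += hits
        total += n
    return {
        "per_class": per_class,
        "overall": correct / total,
        "balanced": sum(per_class.values()) / len(per_class),
    }


# Hypothetical counts, chosen only to illustrate the effect of imbalance.
example = {
    "aegypti": {"aegypti": 6, "albopictus": 14},
    "albopictus": {"aegypti": 16, "albopictus": 64},
}
summary = accuracy_summary(example)
```

With these made-up counts the overall accuracy sits much closer to the majority class's accuracy than to the minority class's, which is why per-species figures are reported alongside the overall one.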
Conclusions
Community-sourced mosquito images, despite substantial noise and inconsistency, can support automated species-level vector surveillance when paired with a domain-informed, multi-stage deep-learning pipeline. Aligning machine attention with the morphological characters used by entomologists — via scutum-focused detection — delivers meaningful accuracy gains. This framework supports scalable citizen-science vector monitoring and lays the groundwork for integrating high-fidelity three-dimensional reference libraries to further strengthen real-world classifier performance.