Sentinel-2 Land Cover Classification: State-of-the-Art Methods and the Reality of Operational Deployment—A Systematic Review
Abstract
This systematic review examines recent advances and persistent challenges in Land Use and Land Cover (LULC) classification using Sentinel-2 imagery, with emphasis on the gap between benchmark results and operational performance. Following PRISMA guidelines, we analyzed 89 peer-reviewed studies published between 2020 and 2025. While models evaluated on benchmark datasets such as EuroSAT routinely achieve accuracies above 98%, operational systems deployed at regional or global scales typically reach only 75–85%. Through systematic analysis and meta-analysis of reported results, we identify three main factors behind this gap: (i) methodological issues, particularly the inflation of reported accuracies caused by spatial autocorrelation between training and test samples; (ii) domain adaptation limitations, where limited geographic and temporal transferability reduces accuracy by 15–25%; and (iii) training data constraints, where geographic diversity proves more important than sample size. Multi-spectral approaches provide modest gains of 5–8% over RGB at significantly higher computational cost. Foundation models (e.g., Prithvi, SkySense) and self-supervised learning show promise for reducing data requirements while maintaining performance. Comparisons with operational products such as ESA WorldCover and Google Dynamic World confirm the more modest performance achievable under real-world conditions. The findings emphasize the need for rigorous spatial validation protocols, standardized evaluation frameworks, and closer integration between research and operational development.
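To illustrate what a spatial validation protocol can look like in practice, the following is a minimal sketch of spatially blocked cross-validation, one common way to limit the accuracy inflation that spatial autocorrelation causes under random train/test splits. It is not the protocol of any study covered by the review. The function names (`block_id`, `spatial_block_folds`), the square-grid blocking scheme, and the assumption that sample coordinates are given in metres in a projected CRS are all illustrative choices.

```python
import random
from collections import defaultdict


def block_id(x, y, block_size):
    """Map a coordinate to the id of the square grid cell that contains it.

    Assumes x and y are in the same linear unit as block_size
    (e.g. metres in a projected CRS); this is an illustrative assumption.
    """
    return (int(x // block_size), int(y // block_size))


def spatial_block_folds(coords, block_size, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no block is split.

    Samples are grouped by grid cell, and whole cells are assigned to
    folds, so nearby samples never end up on both sides of a split.
    """
    # Group sample indices by the spatial block they fall into.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[block_id(x, y, block_size)].append(i)

    # Shuffle the blocks (not the samples) reproducibly.
    block_keys = sorted(blocks)
    random.Random(seed).shuffle(block_keys)

    for k in range(n_folds):
        # Every n_folds-th block, starting at offset k, forms the test set.
        test_blocks = set(block_keys[k::n_folds])
        test_idx = [i for b in block_keys if b in test_blocks for i in blocks[b]]
        train_idx = [i for b in block_keys if b not in test_blocks for i in blocks[b]]
        yield train_idx, test_idx
```

The block size would normally be chosen to be at least as large as the range over which labels and spectra remain correlated. Blocking alone does not remove leakage between samples that sit on opposite sides of a block boundary, so a buffer zone between training and test blocks may still be needed.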