Mapping 25,000 Cultural Heritage Sites with GIS and NLP: A Data-Driven Framework for Spatiotemporal Pattern Recognition


Abstract

Background: Preservation of cultural heritage in the digital era confronts unprecedented complexity, demanding methods that can reliably document, analyze, and interpret ever-growing corpora of heritage sites. Conventional workflows, built for smaller and more uniform collections, struggle with today’s large-scale, heterogeneous datasets and often fail to surface robust, generalizable patterns. Methods: We designed an integrated computational workflow that fuses Geographic Information Systems (GIS), Natural Language Processing (NLP), and machine learning to interrogate 25,705 cultural heritage sites across Zhejiang Province, China. The pipeline combines kernel density estimation (bandwidth = 15.7 km), hierarchical spatial clustering, topic modeling via Latent Dirichlet Allocation (50 topics), and an ensemble classification stage incorporating Support Vector Machines, Random Forest, and Gradient Boosting. Results: The global spatial structure exhibits strong positive autocorrelation (Moran’s I = 0.847, p < 0.001). We further resolve five well-defined clusters that differ systematically in both heritage-site density and cultural attribute profiles. The ensemble classifier achieves 89.3% accuracy for heritage-type prediction (evaluated on a held-out set, n = 750), while the spatial prediction models prioritize 1,247 high-probability candidate locations for undocumented sites, with a validation accuracy of 78.4%. Conclusions: Taken together, these results demonstrate the utility of a computational pipeline for province-scale cultural-heritage analysis, delivering decision-ready evidence to inform preservation planning and to target archaeological survey. The methodology is readily transferable to other regions and cultural settings, enabling rigorously evidence-based heritage management.
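
The abstract reports a global Moran's I without stating how the spatial weights were built, so the following is only a minimal sketch of the statistic itself, not the authors' implementation. It assumes binary k-nearest-neighbour weights (k = 4) and planar coordinates; the function name `morans_i`, the parameter `k`, and the synthetic grid data are all hypothetical and chosen purely for illustration. It computes I = (n / S0) * (sum_ij w_ij z_i z_j) / (sum_i z_i^2), where z is the mean-centred attribute and S0 is the sum of all weights.

```python
import numpy as np


def morans_i(values, coords, k=4):
    """Global Moran's I with binary k-nearest-neighbour weights.

    Assumption: kNN weights with k=4 are an illustrative choice only;
    the source does not say which weighting scheme was used.
    """
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)

    # Pairwise Euclidean distances; a point is never its own neighbour.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Binary weight matrix: w[i, j] = 1 if j is among the k nearest to i.
    neighbours = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0

    z = values - values.mean()   # mean-centred attribute
    s0 = w.sum()                 # total weight
    return (n / s0) * (z @ w @ z) / (z @ z)


# Synthetic 10 x 10 grid (not data from the study).
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
grid = np.column_stack([xs.ravel(), ys.ravel()])

smooth = xs.ravel().astype(float)                   # west-east gradient
checker = np.where((xs + ys).ravel() % 2 == 0, 1.0, -1.0)

print("gradient:", morans_i(smooth, grid))
print("checkerboard:", morans_i(checker, grid))
```

On this synthetic grid the smooth gradient yields a clearly positive value and the checkerboard a clearly negative one, which is the contrast the statistic is meant to capture. A significance level such as the reported p < 0.001 would additionally require a permutation or analytical test, which this sketch omits.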
