Generating Landslide Archive Inventories Using Web Scraping and NLP Techniques for Türkiye
Abstract
Landslides are among the most frequent natural hazards, causing significant loss of life and serious economic damage worldwide. Although many inventories have been created using different approaches to understand landslide events, these are rarely updated automatically or in real time. Traditional approaches are laborious, requiring substantial time and manual effort, and their timeliness is limited by reporting delays. To address these challenges, we developed an automated approach that integrates web scraping, natural language processing (NLP), and geocoding techniques, using digital media news sources in Türkiye to create a landslide archive inventory. Our algorithm verified 1727 of the 3051 news articles captured between 1997 and 2024 as landslide reports and identified a total of 478 fatalities in 212 deadly incidents. Of the landslides captured from the web, 66.5% were geolocated at the neighborhood/village level, providing substantial spatial accuracy and enabling risk estimation at that scale. Comparison with the manual national inventory shows moderate agreement, with F1 scores ranging from 0.434 to 0.552 for ±1 and ±7 day matching windows. The automated method not only captures spatial and temporal patterns of landslides but also extracts key attributes such as location, number of fatalities, and triggering factors (i.e., natural or anthropogenic). Our study demonstrates the potential of web-based automated approaches to complement traditional landslide inventories by providing near-real-time and verifiable data. Finally, we suggest adopting common reporting standards for natural hazard news in digital media so that this approach can be applied globally.
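The abstract does not spell out how automated detections were matched against the manual national inventory, so the following is only a minimal Python sketch of one plausible procedure: one-to-one matching of events by date within a ±1 or ±7 day tolerance, followed by precision, recall, and F1 computation. The function name `f1_with_day_window` and the date-only matching (ignoring location) are assumptions for illustration, not the authors' actual method.

```python
from datetime import date

def f1_with_day_window(auto_dates, manual_dates, window_days=1):
    """Match each automated detection to at most one manual record whose date
    lies within +/- window_days, then compute the F1 score of the matching.
    Dates are datetime.date objects; location matching is omitted for brevity."""
    unmatched_manual = list(manual_dates)
    true_positives = 0
    for auto_date in auto_dates:
        # An automated event counts as a hit if any unmatched manual record
        # falls inside the day window.
        hit = next((m for m in unmatched_manual
                    if abs((auto_date - m).days) <= window_days), None)
        if hit is not None:
            true_positives += 1
            unmatched_manual.remove(hit)  # enforce one-to-one matching
    precision = true_positives / len(auto_dates) if auto_dates else 0.0
    recall = true_positives / len(manual_dates) if manual_dates else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy usage with hypothetical dates: widening the window raises agreement.
auto = [date(2023, 7, 14), date(2023, 11, 2)]
manual = [date(2023, 7, 13), date(2023, 11, 9), date(2024, 1, 5)]
print(f1_with_day_window(auto, manual, window_days=1))  # 0.4
print(f1_with_day_window(auto, manual, window_days=7))  # 0.8
```

As the toy example shows, a wider temporal tolerance increases the number of matched events and hence the F1 score, which is consistent with the range of scores reported across the ±1 and ±7 day windows.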