Generating Landslide Archive Inventories Using Web Scraping and NLP Techniques for Türkiye
Abstract
Landslides are among the most frequent natural hazards, causing significant loss of life and serious economic damage worldwide. Although many inventories have been created using different approaches to understand landslide events, these are rarely updated automatically or in real time. Traditional approaches are laborious, requiring substantial time and manual effort, and their timeliness is limited by reporting delays. To address these challenges, we developed an automated approach that integrates web scraping, natural language processing (NLP), and geocoding techniques, using digital media news sources in Türkiye to create a landslide archive inventory. Our algorithm verified 1727 of the 3051 news articles captured between 1997 and 2024 as landslide reports and identified a total of 478 fatalities in 212 deadly incidents. Of the landslides captured from the web, 66.5% were geolocated at the neighborhood/village level, providing substantial spatial accuracy and enabling risk estimation at that scale. Comparison with the manual national inventory shows moderate agreement, with F1 scores ranging from 0.434 to 0.552 for ±1 and ±7 day matching windows. The automated method not only captures spatial and temporal patterns of landslides but also extracts key attributes such as location, number of fatalities, and triggering factors (i.e., natural or anthropogenic). Our study demonstrates the potential of web-based automated approaches to complement traditional landslide inventories by providing near-real-time and verifiable data. Finally, we suggest adopting common reporting standards for natural hazard news in digital media so that this approach can be applied globally.
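The abstract does not spell out how automated detections were matched against the manual national inventory, so the following is only a minimal Python sketch of one plausible procedure: one-to-one matching of events by date within a ±1 or ±7 day tolerance, followed by precision, recall, and F1 computation. The function name `f1_with_day_window` and the date-only matching (ignoring location) are assumptions for illustration, not the authors' actual method.

```python
from datetime import date

def f1_with_day_window(auto_dates, manual_dates, window_days=1):
    """Match each automated detection to at most one manual record whose date
    lies within +/- window_days, then compute the F1 score of the matching.
    Dates are datetime.date objects; location matching is omitted for brevity."""
    unmatched_manual = list(manual_dates)
    true_positives = 0
    for auto_date in auto_dates:
        # An automated event counts as a hit if any unmatched manual record
        # falls inside the day window.
        hit = next((m for m in unmatched_manual
                    if abs((auto_date - m).days) <= window_days), None)
        if hit is not None:
            true_positives += 1
            unmatched_manual.remove(hit)  # enforce one-to-one matching
    precision = true_positives / len(auto_dates) if auto_dates else 0.0
    recall = true_positives / len(manual_dates) if manual_dates else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy usage with hypothetical dates: widening the window raises agreement.
auto = [date(2023, 7, 14), date(2023, 11, 2)]
manual = [date(2023, 7, 13), date(2023, 11, 9), date(2024, 1, 5)]
print(f1_with_day_window(auto, manual, window_days=1))  # 0.4
print(f1_with_day_window(auto, manual, window_days=7))  # 0.8
```

As the toy example shows, a wider temporal tolerance increases the number of matched events and hence the F1 score, which is consistent with the range of scores reported across the ±1 and ±7 day windows.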