Child Sexual Abuse Datasets: A Systematic Review

João Macedo
Camila Laranjeira
Leo S. F. Ribeiro
Carlos Caetano
Fabricio Benevenuto
Sandra Avila
Jefersson A. dos Santos


Abstract

The rapid growth of Child Sexual Abuse Imagery (CSAI) online demands automated tools to support timely and effective investigations. Machine learning models are essential for triage tasks such as CSAI classification. However, strict legal restrictions on data access force researchers to rely on datasets curated by law enforcement or proxy datasets from related domains. In this systematic review, we examine datasets — containing both real CSAI and CSAI-like content (e.g., synthetic or approximate) — used for training, evaluation, or statistical analysis in CSAI-related machine learning research. We distinguish between main datasets, used directly for tasks such as CSAI classification, and proxy datasets, used for related tasks like age estimation. Our analysis reveals a prevailing model-centric paradigm that prioritizes algorithmic performance while neglecting critical dataset properties, such as diversity, documentation, and fairness. This tendency risks introducing harmful biases and unintended effects when models are deployed in real-world contexts. To address these concerns, we evaluate the strengths and limitations of existing datasets, highlight key CSAI-specific data attributes, and advocate for a shift toward data-centric practices. We emphasize the urgent need for transparent dataset creation and standardized documentation to improve AI systems' ethical integrity and reliability in this high-stakes domain.

Version published to 10.21203/rs.3.rs-7963252/v1 on Research Square
Oct 28, 2025

Related articles

Tools for Helping Identify Behavior Disorders: Comparing Bayesian Evidence-Based and Machine Learning Approaches
Authors: Yinuo Liu, Eric Arden Youngstrom, Caroline Bodary, Zhuoyu Shi, Jennifer Youngstrom, Ekaterina Stepanova, Robert L. Findling
No evaluations. Latest version: Dec 12, 2025

Child Online Sexual Exploitation and Abuse: Understanding Adversarial Tactics Techniques and Procedures
Authors: Abel Yeboah-ofori, Awo Aidam Amenyah
No evaluations. Latest version: Jan 13, 2026

Child Sexual Exploitation Material: Investigative and Legal Challenges with Generative Artificial Intelligence
Author: Chad M.S. Steel
No evaluations. Latest version: Jan 6, 2026
