Data Challenges in AI Systems and their Solutions: A Requirements and AI Engineering Systematic Literature Review and Comparison

Abstract

The performance and reliability of AI-based software systems depend heavily on data, but managing this data throughout the system lifecycle presents significant challenges. Since data characteristics fundamentally define AI system behavior, specifying and managing these data characteristics should be a critical part of the system's requirements. Requirements Engineering (RE) is therefore essential. While both RE and AI Engineering communities address data-related issues for AI systems, their approaches and the gaps between them remain unclear. To investigate this, we conducted a systematic literature review of 227 primary studies to map data-related challenges for AI-based systems and compare the solutions emerging from both communities. We present two primary contributions: (1) a taxonomy of 28 data challenges for AI systems, addressed by the RE and AI Engineering communities, and structured around a data-centric lifecycle; and (2) a mapping of 108 existing solutions (50 from RE and 58 from other AI Engineering disciplines) to these challenges. This provides an overview of challenges and solutions from both perspectives, highlighting problems that are often overlooked, such as unused "dark data" and data selection explainability, while also showing challenges that are often the focus of existing work, such as "lack of domain knowledge" and "data quality concerns". Based on these findings, we outline a research agenda and discuss potential synergy between RE and AI Engineering. Practitioners and researchers can use these results to direct their future efforts in addressing data challenges in AI-based systems, leading to the development of more reliable, robust and trustworthy AI systems.
