Dataset Documentation for Responsible AI: Analysis of Suitability and Usage for Health Datasets

Abstract

Artificial Intelligence (AI) is rapidly transforming healthcare, but it is also raising concerns about algorithmic biases, most of which stem from the training data. Transparent dataset documentation is widely regarded as key to enabling responsible AI development. Several standardized dataset documentation approaches have been established, such as Datasheet, Dataset Nutrition Label, Accountability Documentation, Healthsheet, and Data Card. However, their suitability and usage for health datasets remain unclear. In this work, we compared all five approaches and evaluated their alignment with the STANDING Together Recommendations for Documentation of Health Datasets. We also investigated their real-world usage and gathered insights from generators and consumers of health datasets. Our findings reveal that none of these documentation approaches is widely used or fully suited to health datasets. We recommend developing a standard documentation approach for health datasets, along with clear guidelines and automation tools to support adoption.