Integration of AI and ETL Tools for Enhanced Healthcare Data Management

Abstract

The rapid proliferation of healthcare data from electronic health records (EHRs), medical imaging systems, laboratory devices, and IoT-enabled patient monitoring devices has created unprecedented challenges for healthcare data management. Traditional Extract, Transform, Load (ETL) tools have long been used to collect and integrate data and load it into centralized repositories such as data warehouses and data lakes. However, conventional ETL processes are often limited by rigid rule-based transformations, inefficient handling of unstructured or semi-structured data, and a lack of automation in data quality assurance. This study investigates the integration of Artificial Intelligence (AI) techniques into ETL pipelines to enhance healthcare data management. AI methods, including machine learning, deep learning, and natural language processing (NLP), are incorporated to automate anomaly detection, optimize transformation rules, and extract insights from unstructured clinical text. A conceptual framework is proposed for an AI-augmented ETL system that ingests heterogeneous healthcare data, applies intelligent transformations, and loads high-quality, enriched datasets into a secure data warehouse. The system architecture supports real-time and batch processing, anomaly detection, and adaptive learning to improve ETL efficiency over time. Evaluation metrics include data quality improvement, processing speed, anomaly detection accuracy, and scalability. The findings demonstrate that AI-enhanced ETL significantly reduces data errors, accelerates processing, and provides enriched datasets suitable for downstream analytics, predictive modeling, and decision-making in healthcare operations. By integrating AI into ETL workflows, healthcare organizations can achieve more reliable, timely, and actionable data management, supporting clinical decision-making, operational efficiency, and regulatory compliance. This study contributes to the literature on intelligent data engineering in healthcare, presenting a scalable framework for future research and practical implementation in complex healthcare IT ecosystems.
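
The extract, transform, anomaly-detection, and load stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the models or schema used, so everything here is an assumption made for illustration: the record fields, the `vitals` table, the function names, and the use of a median-based modified z-score as a simple statistical stand-in for the machine learning anomaly detector.

```python
import sqlite3
import statistics

# Hypothetical raw readings; field names and values are illustrative only.
RAW_READINGS = [
    {"patient_id": "p1", "heart_rate": "72"},
    {"patient_id": "p2", "heart_rate": "68"},
    {"patient_id": "p3", "heart_rate": "75"},
    {"patient_id": "p4", "heart_rate": "70"},
    {"patient_id": "p5", "heart_rate": "710"},  # likely a keying error
    {"patient_id": "p6", "heart_rate": ""},     # missing value
]


def extract(rows):
    """Extract: yield raw rows from a source (here, an in-memory list)."""
    yield from rows


def transform(rows):
    """Transform: cast values to float and drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            value = float(row["heart_rate"])
        except ValueError:
            continue
        cleaned.append({"patient_id": row["patient_id"], "heart_rate": value})
    return cleaned


def flag_anomalies(rows, threshold=3.5):
    """Flag rows whose modified z-score (median and MAD based) exceeds threshold.

    Assumption: this statistical rule stands in for a learned detector.
    """
    if not rows:
        return rows
    values = [r["heart_rate"] for r in rows]
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    for r in rows:
        score = 0.0 if mad == 0 else 0.6745 * abs(r["heart_rate"] - median) / mad
        r["is_anomaly"] = score > threshold
    return rows


def load(rows, conn):
    """Load: write the enriched rows into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vitals "
        "(patient_id TEXT, heart_rate REAL, is_anomaly INTEGER)"
    )
    conn.executemany(
        "INSERT INTO vitals VALUES (:patient_id, :heart_rate, :is_anomaly)",
        rows,
    )
    conn.commit()


def run_pipeline(raw_rows):
    """Run extract -> transform -> anomaly flagging -> load on an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    load(flag_anomalies(transform(extract(raw_rows))), conn)
    return conn
```

Calling `run_pipeline(RAW_READINGS)` returns a connection whose `vitals` table holds the five parseable readings, with the implausible heart rate flagged rather than silently dropped. Keeping anomalous rows with a flag, instead of discarding them, is one possible design choice that leaves the decision to downstream reviewers.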