INTEGRATION OF DATA LAKES AND DATA WAREHOUSES FOR AI-DRIVEN HEALTHCARE ANALYTICS
Abstract
The rapid digitalisation of healthcare has led to unprecedented growth in heterogeneous data generated by Electronic Health Records (EHRs), Internet of Medical Things (IoMT) devices, medical imaging systems, and emerging AI-driven applications. While this data explosion presents significant opportunities for advanced analytics and intelligent decision-making, it also exposes critical challenges related to data silos, interoperability, governance, and architectural rigidity.
This study investigates the integration of data lakes and data warehouses within AI-driven healthcare analytics, using the Medilink OneHealth Data Cloud Initiative as a case study. The research addresses the fundamental “schema chasm” between the flexibility of schema-on-read (data lakes) and the reliability of schema-on-write (data warehouses), a gap that constrains the effective use of large-scale healthcare data.
Through a structured case study methodology informed by an extensive literature review, the study analyses Medilink’s existing dual-architecture environment, identifying technical, operational, and organisational limitations that hinder unified analytics, regulatory compliance, and AI adoption. The findings demonstrate that maintaining separate data lakes and data warehouses introduces fragmentation, governance complexity, and increased operational overhead. The study evaluates hybrid data architectures, particularly the data lakehouse paradigm, as a strategic solution to reconcile flexibility and reliability. It highlights how modern lakehouse technologies can enable unified data management, improve data quality and governance, support diverse analytical workloads, and enhance scalability within regulated healthcare environments.
The research concludes that adopting a lakehouse-oriented architecture, supported by robust data governance and organisational readiness, provides a viable pathway for healthcare organisations seeking to balance innovation with trust, compliance, and operational efficiency in AI-driven analytics.