Analytical Centralization of Health Expenditure at the National Administrator of Health System Resources: Architecture, Data Quality, and Operational Performance of the ADRES Health System Analytics Platform, Colombia

Daniel Alfonso Garavito Jiménez
David E. Bello Angulo
Lady Tatiana Mejía Lemus
Diana Chipatecua
Daniel Darío Fula
Santiago Perez-Rubiano
Félix León Martínez
Juan Camilo Bohórquez Pinzón

Abstract

Background

Between 2024 and 2025, Colombia universalized the Electronic Health Invoice with embedded Individual Health Services Delivery Records (FEV-RIPS; RIPS stands for Registro Individual de Prestación de Servicios de Salud) as the standard for financial and clinical data exchange across the health system. ADRES, the entity responsible for administering the resources of the General Social Security Health System (SGSSS), faced the challenge of processing information from multiple heterogeneous sources generated by more than 55,000 healthcare providers of varying complexity. Health systems in high-income countries consolidate clinical and financial data in platforms that have operated for years; Colombia started from a fragmented architecture with incompatible historical sources, no cross-database standardization, and a resource administrator that lacked centralized analytical infrastructure until 2023.

Objective

We describe the design, the technical challenges of integrating heterogeneous data, and the operational performance of the analytical infrastructure built by ADRES to centralize large-scale processing of Colombian health system information, and we derive transferable lessons for health system resource administrators in Latin America facing equivalent digitalization mandates.

Methods

This is a technical-descriptive report based on operational metrics from the ADRES Azure/Databricks environment for January–November 2025. We report indicators of data volume managed, processing speed, deployed computational capacity, concurrent use by functional group, and the governance structure implemented. The architecture integrates secure data transfer with MinSalud via VPN, OneLake (Microsoft Fabric) connectivity, automated processing of multiple formats (XML, relational tables, flat files), and a data lake organized in a medallion pattern (Bronze/Silver/Gold) with automated pipelines. We characterize data quality challenges in terms of structural inconsistencies across system sources, coding incompatibilities (municipalities, dates, diagnoses), format heterogeneity in unstructured data, and the absence of complete technical documentation.
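To make the medallion pattern concrete, the sketch below shows one way records might move from Bronze (raw, as received) to Silver (cleaned and homologated) to Gold (aggregated for analysis). It is a minimal illustration in standard-library Python, not the ADRES pipeline, which runs on Azure/Databricks. The record fields, source names, and equivalence-table entries are hypothetical; only the two sentinel dates are the ones reported in the Results.

```python
from collections import defaultdict
from datetime import date

# Sentinel dates some sources use to mean "no value" (as reported in the Results).
SENTINEL_DATES = {"1900-01-01", "9999-12-31"}

# HYPOTHETICAL equivalence table: source municipality code -> homologated code.
# The real ADRES institutional tables are not described in the abstract.
MUNICIPALITY_EQUIVALENCE = {
    "5001": "05001",  # illustrative case: a source that drops the leading zero
    "05001": "05001",
    "11001": "11001",
}


def to_bronze(raw_rows):
    """Bronze: keep each record as received, only tagging its origin."""
    return [dict(row, _source=source) for source, row in raw_rows]


def clean_date(value):
    """Map missing values and sentinel dates to None; parse the rest."""
    if value is None or value in SENTINEL_DATES:
        return None
    return date.fromisoformat(value)


def to_silver(bronze_rows):
    """Silver: homologate codes, clean dates, and cast amounts."""
    silver = []
    for row in bronze_rows:
        silver.append({
            # None marks a code missing from the equivalence table (kept for review).
            "municipality": MUNICIPALITY_EQUIVALENCE.get(row["municipality"]),
            "service_date": clean_date(row["service_date"]),
            "amount": float(row["amount"]),
            "source": row["_source"],
        })
    return silver


def to_gold(silver_rows):
    """Gold: total amount per homologated municipality."""
    totals = defaultdict(float)
    for row in silver_rows:
        if row["municipality"] is not None:
            totals[row["municipality"]] += row["amount"]
    return dict(totals)


raw = [
    ("source_a", {"municipality": "5001", "service_date": "2025-03-14", "amount": "120000"}),
    ("source_b", {"municipality": "05001", "service_date": "1900-01-01", "amount": "80000"}),
    ("source_b", {"municipality": "99999", "service_date": "2025-04-02", "amount": "5000"}),
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'05001': 200000.0}
```

Keeping unmapped municipality codes as None in Silver, rather than dropping them, leaves them visible for review against the equivalence table, while Gold aggregates only homologated records. The abstract does not say how ADRES treats unmapped codes; this is one reasonable choice.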

Results

The platform manages 21 catalogs, 1,183 tables, and more than 110,645 million (about 110.6 billion) stored records, with cumulative production exceeding 1 trillion processed records. Queries over 100 billion records run in ten seconds on clusters of up to 32 TB of RAM and 4,096 vCPUs. During September–October 2025, monthly query volume peaked at 78,028 queries, distributed across eleven institutional functional groups. Integrating heterogeneous sources required specific technical capabilities: Python/PySpark parsers for XML with variable node depth; institutional equivalence tables to homologate incompatible municipality codes between BDUA and service delivery records; cleaning routines for extreme dates used to represent nulls (1900-01-01, 9999-12-31); and transformation logic to build coherent longitudinal series bridging classic RIPS and FEV-RIPS. During 2024–2025, the platform supported econometric expenditure analyses, multi-source information comparisons, responses to Constitutional Court judicial mandates, and interactive dashboards published on the ADRES institutional website. Integration of conversational AI agents (Genie, Copilot) gives users without SQL knowledge analytical access, expanding the platform’s institutional reach.
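Parsing XML whose node depth varies from file to file is easier when the parser does not assume a fixed structure. The sketch below flattens an element tree of any depth into path/value pairs using only the standard library (`xml.etree.ElementTree`). It illustrates the general technique, not the ADRES Python/PySpark parsers, and the tag names in the sample document are hypothetical rather than taken from the FEV-RIPS schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop an XML namespace prefix: '{urn:x}service' -> 'service'."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, path=None):
    """Flatten an element tree of any depth into {path: value} pairs.

    Attributes become '<path>/@<name>'; repeated sibling tags receive a
    positional index ('service[0]', 'service[1]') so values are not overwritten.
    """
    if path is None:
        path = local_name(element.tag)
    out = {}
    for key, value in element.attrib.items():
        out[f"{path}/@{local_name(key)}"] = value
    text = (element.text or "").strip()
    if text:
        out[path] = text
    children = list(element)
    totals = Counter(local_name(child.tag) for child in children)
    seen = Counter()
    for child in children:
        name = local_name(child.tag)
        if totals[name] > 1:
            child_path = f"{path}/{name}[{seen[name]}]"
            seen[name] += 1
        else:
            child_path = f"{path}/{name}"
        # Recurse with no fixed depth, so nesting may differ between files.
        out.update(flatten(child, child_path))
    return out


# HYPOTHETICAL document: tag names are illustrative, not the FEV-RIPS schema.
SAMPLE = """<invoice>
  <id>FE-001</id>
  <services>
    <service><code>X1</code><value>45000</value></service>
    <service><code>X2</code><value>12000</value>
      <detail><lot>A1</lot></detail>
    </service>
  </services>
</invoice>"""

for key, value in flatten(ET.fromstring(SAMPLE)).items():
    print(key, "=", value)
```

Each flattened dictionary could be stored as rows of path/value pairs in a Bronze table, deferring schema decisions to Silver. This is one possible design; the abstract does not specify how the ADRES parsers represent their output.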

Conclusions

In one year, ADRES built an analytical infrastructure; this report provides, to our knowledge, the first published documentation of the systemic technical challenges of integrating heterogeneous data sources in a middle-income social security health system. The case demonstrates that centralizing health system information at national scale is technically feasible under the institutional constraints of a public entity, but that it requires solving a set of cross-source data standardization problems that the literature on health information system implementation in middle-income countries has not documented with quantitative precision. The derived lessons are transferable to health system resource administrators in Latin America facing equivalent challenges of heterogeneous information integration.

Version published to 10.64898/2026.06.08.26355159 on medRxiv
Jun 10, 2026