Automating Epidemiology Report Generation from the MIMIC-IV Clinical Database using SNOMED CT and SQL
Abstract
Objective: To present a unified, modular framework that automates the epidemiological research process, from cohort definition through analysis and visualization, using the MIMIC-IV dataset.

Materials and Methods: We combined SNOMED CT ontologies, prompt-engineered SQL generation, and the integration of structured and unstructured electronic health record (EHR) data. The pipeline generated statistical summaries, logistic regression models, and network-based co-word analyses.

Results: The system automated cohort selection, ontology mapping, entity recognition, statistical analysis, and visualization. Applied to MIMIC-IV, the framework produced reproducible and interpretable epidemiological insights within hours, an efficiency gain over manual workflows.

Discussion: Our approach integrates knowledge engineering, NLP, and network analysis into a single reproducible pipeline. The framework enables scalable, transparent, and efficient epidemiological research, but it remains limited by computational demands and by variability in large language model–based SQL generation.

Conclusion: This modular pipeline illustrates a pathway toward automated, semantically grounded epidemiology reporting from EHRs, with potential applications in clinical and public health informatics.
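
To make the cohort-selection step named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical mapping table `snomed_icd_map` that links a SNOMED CT concept identifier to ICD codes, and joins it against a toy table shaped like MIMIC-IV's `diagnoses_icd`. The function names, the mapping table, and every row of data are illustrative assumptions; an in-memory SQLite database stands in for the real database.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with toy rows (not real MIMIC-IV data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Shaped like a subset of MIMIC-IV's diagnoses_icd table.
        CREATE TABLE diagnoses_icd (
            subject_id  INTEGER,
            hadm_id     INTEGER,
            icd_code    TEXT,
            icd_version INTEGER
        );
        -- Hypothetical SNOMED CT -> ICD mapping table (an assumption).
        CREATE TABLE snomed_icd_map (
            snomed_id   TEXT,
            icd_code    TEXT,
            icd_version INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO diagnoses_icd VALUES (?, ?, ?, ?)",
        [
            (1, 100, "E119", 10),
            (1, 101, "I10", 10),
            (2, 200, "25000", 9),
            (3, 300, "I10", 10),
        ],
    )
    # Illustrative mapping for one concept; not an official map release.
    conn.executemany(
        "INSERT INTO snomed_icd_map VALUES (?, ?, ?)",
        [
            ("44054006", "E119", 10),
            ("44054006", "25000", 9),
        ],
    )
    return conn


def select_cohort(conn: sqlite3.Connection, snomed_id: str) -> list[int]:
    """Return sorted subject_ids with any diagnosis mapped to snomed_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.subject_id
        FROM diagnoses_icd AS d
        JOIN snomed_icd_map AS m
          ON d.icd_code = m.icd_code
         AND d.icd_version = m.icd_version
        WHERE m.snomed_id = ?
        ORDER BY d.subject_id
        """,
        (snomed_id,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_toy_db()
    print(select_cohort(conn, "44054006"))
```

Joining on both `icd_code` and `icd_version` matters in this sketch because MIMIC-IV records ICD-9 and ICD-10 codes side by side, so a code string alone is not a safe key. Passing the concept identifier as a bound parameter also keeps the query template fixed, which may help when the SQL text itself comes from a language model.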