AI-Supported Extraction of Functional Tissue Unit Properties for Human Reference Atlas Construction
Abstract
The Human Reference Atlas (HRA) effort brings together experts from more than 25 international consortia to capture the multiscale organization of the human body—from large anatomical organ systems (macro) to the single-cell level (micro). Functional tissue units (FTUs; meso) in 10 organs have been detailed, and experts have created 2D illustrations for them. Comprehensive FTU property characterization is essential for the HRA, but manual review of the vast number of scholarly publications is impractical. Here, we introduce Large-Model Retrieval-Augmented Generation for HRA FTUs (HRAftu-LM-RAG), an AI-driven framework for scalable, automated extraction of FTU-relevant properties from scholarly publications. This validated framework integrates Large Language Models for textual reasoning, Large Vision Models for visual interpretation, and Retrieval-Augmented Generation for knowledge grounding, offering a balanced trade-off between accuracy and processing efficiency. We retrieved 244,640 PubMed Central publications containing 1,389,168 figures for 22 FTUs and identified 617,237 figures with microscopy and schematic images. From these images and associated text, we automatically extracted 331,189 scale bars and 1,719,138 biological entity mentions, along with donor metadata such as sex and age. The resulting data facilitates the design, review, and approval of future FTU illustrations during HRA construction.
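To make the extraction step concrete, the sketch below shows a minimal, hypothetical version of caption-level scale-bar harvesting. It is not the HRAftu-LM-RAG implementation—the framework described above relies on large language and vision models with retrieval grounding—but it illustrates the kind of structured (value, unit) output such a pipeline produces from figure captions; the regex, function name, and example caption are illustrative assumptions.

```python
import re

# Illustrative sketch only: matches scale-bar mentions such as "Scale bar: 50 µm"
# in figure captions; an LLM/LVM pipeline would handle far more variation.
SCALE_BAR_RE = re.compile(
    r"(?:scale\s+bars?\s*[:,=]?\s*)?(\d+(?:\.\d+)?)\s*(nm|µm|um|mm)\b",
    re.IGNORECASE,
)

def extract_scale_bars(caption: str) -> list[tuple[float, str]]:
    """Return (value, unit) pairs for scale-bar mentions in a figure caption."""
    return [
        (float(value), unit.lower().replace("um", "µm"))
        for value, unit in SCALE_BAR_RE.findall(caption)
    ]

# Hypothetical caption for illustration:
caption = "Renal corpuscle, H&E staining. Scale bar: 50 µm; inset, 5 um."
print(extract_scale_bars(caption))  # → [(50.0, 'µm'), (5.0, 'µm')]
```

Normalizing unit spellings (here, "um" to "µm") is one small example of the cleanup needed before per-FTU scale-bar statistics can be aggregated across hundreds of thousands of figures.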