ForVA and GCM-CLIP: A Million-Scale Multimodal Dataset and Representation Learning Framework for Virtual Autopsy
Abstract
Intelligent virtual autopsy faces profound semantic misalignment driven by scarce multimodal data and insufficient fine-grained cognitive mapping, leaving models vulnerable to complex post-mortem noise and catastrophic "shortcut learning". To bridge this misalignment, we curate ForVA, a standardized multimodal virtual autopsy dataset of 1.2 million image-text pairs spanning nine cause-of-death categories, and propose GCM-CLIP, a semantics-enhanced contrastive learning framework whose adaptive semantic decoupling module acts as a high-precision "semantic filter". Mechanistic analysis shows that GCM-CLIP sharpens semantic discrimination, reduces intra-/inter-class pathological feature overlap (from 0.830/0.709 to 0.566/0.452), and delivers a 25% relative gain in zero-shot classification accuracy alongside 6-8% improvements in cross-modal retrieval. Clinically, it enables junior practitioners to achieve senior-level diagnostic precision and serves as an unbiased "second reader" that captures lesions overlooked through cognitive anchoring. This work provides a reproducible paradigm for foundation models in high-stakes, data-scarce fields, with transformative implications for forensic objectivity and judicial justice worldwide.
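As a rough illustration of the framework described above, the sketch below pairs a standard CLIP-style image-text contrastive objective with a hypothetical gating module standing in for the adaptive semantic decoupling ("semantic filter"). The module design, names (SemanticGate, clip_contrastive_loss), embedding dimension, and temperature are illustrative assumptions, not the authors' implementation.

```python
# Minimal, illustrative sketch (not the authors' implementation): a CLIP-style
# contrastive objective with a hypothetical gating module standing in for the
# adaptive semantic decoupling ("semantic filter") described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGate(nn.Module):
    """Hypothetical 'semantic filter': re-weights embedding dimensions to
    suppress noise-dominated features before contrastive alignment."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)  # element-wise soft selection of semantic dims

def clip_contrastive_loss(img_emb, txt_emb, temperature: float = 0.07):
    """Symmetric InfoNCE loss over matched image-text pairs in a batch."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature        # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)          # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets)      # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

# Usage sketch: filter both modalities, then align them contrastively.
dim = 512
gate_img, gate_txt = SemanticGate(dim), SemanticGate(dim)
img_features = torch.randn(8, dim)   # stand-ins for image-encoder outputs
txt_features = torch.randn(8, dim)   # stand-ins for text-encoder outputs
loss = clip_contrastive_loss(gate_img(img_features), gate_txt(txt_features))
```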