ForVA and GCM-CLIP: A Million-Scale Multimodal Dataset and Representation Learning Framework for Virtual Autopsy

Abstract

Intelligent virtual autopsy faces a profound semantic misalignment driven by scarce multimodal data and insufficient fine-grained cognitive mapping, leaving models vulnerable to complex post-mortem noise and catastrophic "shortcut learning". To bridge this misalignment, we curate ForVA, a standardized multimodal virtual autopsy dataset of 1.2 million image-text pairs spanning 9 cause-of-death categories, and propose GCM-CLIP, a semantics-enhanced contrastive learning framework whose adaptive semantic decoupling module acts as a high-precision "semantic filter". Mechanistic analysis shows that GCM-CLIP sharpens semantic discrimination, reduces intra-/inter-class pathological feature overlap (from 0.830/0.709 to 0.566/0.452), and delivers a 25% relative gain in zero-shot classification accuracy alongside 6-8% improvements in cross-modal retrieval. Clinically, it enables junior practitioners to reach senior-level diagnostic precision and serves as an unbiased "second reader" that captures lesions overlooked due to cognitive anchoring. This work provides a reproducible paradigm for foundation models in high-stakes, data-scarce fields, with transformative implications for forensic objectivity and judicial justice worldwide.
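The abstract does not detail GCM-CLIP's architecture or training objective, so the sketch below is a minimal illustration under stated assumptions: a standard CLIP-style symmetric contrastive (InfoNCE) loss, with a simple learned gate standing in for the "semantic filter" role of the adaptive semantic decoupling module. `SemanticDecouplingGate`, its linear-plus-sigmoid design, and the temperature value are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticDecouplingGate(nn.Module):
    """Hypothetical stand-in for the paper's adaptive semantic decoupling
    module: a learned per-dimension soft gate that can down-weight embedding
    dimensions dominated by post-mortem noise. The actual module design is
    not given in the abstract."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise gating: values near 0 suppress a dimension,
        # values near 1 pass it through unchanged.
        return x * self.gate(x)

def clip_contrastive_loss(img_emb: torch.Tensor,
                          txt_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss used by CLIP-style models: matched
    image-text pairs sit on the diagonal of the similarity matrix."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage: filter image embeddings before the contrastive objective.
if __name__ == "__main__":
    batch, dim = 8, 512
    gate = SemanticDecouplingGate(dim)
    img = gate(torch.randn(batch, dim))   # gated image features
    txt = torch.randn(batch, dim)         # paired report embeddings
    print(clip_contrastive_loss(img, txt).item())
```

In a real pipeline such a gate would be trained jointly with the image and text encoders, so the contrastive objective itself can learn which embedding dimensions carry pathology-relevant signal versus post-mortem artifact.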
