AutoPM3: enhancing variant interpretation via LLM-driven PM3 evidence extraction from scientific literature

Abstract

Motivation

Rare diseases affect over 300 million people worldwide and are often caused by genetic variants. Although variant detection has become cost-effective, interpreting these variants remains complex and time-consuming, particularly the collection of literature-based evidence such as ACMG/AMP PM3.

Results

We present AutoPM3, a method that uses open-source large language models (LLMs) to automate the extraction of PM3 evidence from the scientific literature. AutoPM3 processes tables and text separately, combining a Text2SQL-based variant extractor with a retrieval-augmented generation (RAG) module that is enhanced by a variant-specific retriever and a fine-tuned LLM. We curated PM3-Bench, a dataset of 1027 variant-publication evidence pairs from ClinGen. On the openly accessible pairs, AutoPM3 achieved 86.1% accuracy for variant hits and 72.5% recall for in trans variants, outperforming other methods, including those that use larger models. A sequential ablation study showed the contribution of AutoPM3's key modules, especially the variant-specific retriever and the Text2SQL extractor. AutoPM3 located evidence in 76 s, demonstrating that open-source LLMs can offer an efficient, cost-effective solution for rare disease diagnosis.
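The abstract names a variant-specific retriever but does not describe how it works. As a rough illustration only, here is a minimal sketch assuming a retriever that keeps the text chunks mentioning the query variant after simple notation normalization, so that spellings such as "c.1234 G>A" and "C.1234G>A" match. The function names, the paragraph-based chunking, and the `window` parameter are hypothetical and are not AutoPM3's actual API.

```python
import re
from typing import List


def normalize_variant(text: str) -> str:
    # Hypothetical normalization: drop whitespace and lowercase,
    # so "c.1234 G>A" and "C.1234G>A" compare equal.
    return re.sub(r"\s+", "", text).lower()


def split_chunks(text: str) -> List[str]:
    # Hypothetical chunking: one chunk per paragraph (blank-line separated).
    return [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]


def variant_specific_retrieve(text: str, variant: str, window: int = 0) -> List[str]:
    """Return chunks that mention `variant`, plus `window` neighbouring chunks
    on each side, in document order (a sketch, not AutoPM3's retriever)."""
    target = normalize_variant(variant)
    chunks = split_chunks(text)
    keep = set()
    for i, chunk in enumerate(chunks):
        if target in normalize_variant(chunk):
            # Keep the matching chunk and its neighbours for context.
            keep.update(range(max(0, i - window), min(len(chunks), i + window + 1)))
    return [chunks[i] for i in sorted(keep)]


if __name__ == "__main__":
    doc = (
        "Patient 1 carried c.1234G>A.\n\n"
        "Unrelated methods text.\n\n"
        "In trans with C.1234 G>A we found c.55del."
    )
    for hit in variant_specific_retrieve(doc, "c.1234G>A"):
        print(hit)
```

Filtering by the variant string before any LLM call keeps the context passed to the generator short and focused on the variant of interest. A real system would also need to handle equivalent notations (for example protein-level or legacy nomenclature), which this sketch ignores.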

Availability and implementation

AutoPM3 is implemented and freely available under the MIT license at https://github.com/HKU-BAL/AutoPM3.
