From Text to Translation: Using Language Models to Prioritize Variants for Clinical Review

Abstract

Background

Despite rapid advances in genomic sequencing, most rare genetic variants remain insufficiently characterized for clinical use, limiting the potential of personalized medicine. When classifying whether a variant is pathogenic, clinical labs adhere to diagnostic guidelines that comprehensively evaluate many forms of evidence, including case data, computational predictions, and functional screening. Although a substantial amount of clinical evidence has accumulated for many of these variants, the majority cannot be definitively classified as ‘pathogenic’ or ‘benign’ and thus persist as ‘Variants of Uncertain Significance’ (VUS).

Methods

We processed over 2.4 million plaintext variant summaries from ClinVar, employing sentence-level classification to remove content that does not contain evidence and removing uninformative or highly similar summaries. We then trained ClinVar-BERT to discern clinical evidence within these summaries by fine-tuning a BioBERT-based model with labeled records.
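The filtering of uninformative or highly similar summaries could look roughly like the sketch below. It is illustrative only: the abstract does not say which similarity measure or thresholds were used, so the Jaccard token overlap, the `threshold` and `min_tokens` values, the function names, and the example summaries are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase a summary and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(summaries, threshold=0.8, min_tokens=5):
    """Keep the first summary of each group of near-duplicates.

    threshold and min_tokens are hypothetical values, not taken from the paper.
    Summaries with fewer than min_tokens distinct tokens are treated as
    uninformative and dropped.
    """
    kept = []
    kept_tokens = []
    for summary in summaries:
        tokens = tokenize(summary)
        if len(tokens) < min_tokens:
            continue  # too short to carry evidence
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # highly similar to a summary already kept
        kept.append(summary)
        kept_tokens.append(tokens)
    return kept


# Made-up example summaries, not real ClinVar records.
example_summaries = [
    "This variant has been observed in multiple individuals with breast cancer.",
    "This variant has been observed in multiple individuals with breast cancer!",
    "not provided",
    "Functional studies show that this missense change disrupts protein function.",
]
filtered = drop_near_duplicates(example_summaries)
```

The pairwise comparison here is quadratic in the number of summaries; at the scale of millions of records, an approximate method such as MinHash would likely be needed instead.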

Results

We validated ClinVar-BERT model predictions for variant summaries that are classified as uncertain (VUS) using orthogonal functional screening data. ClinVar-BERT significantly separated estimates of functional impact in clinically actionable genes, including BRCA1 (p = 1.90 × 10⁻²⁰), TP53 (p = 1.14 × 10⁻⁴⁷), and PTEN (p = 3.82 × 10⁻⁷), and achieved an AUROC of 0.927 when classifying whether variants result in loss of function or have uncertain effects.
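For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition on made-up labels and scores; it is not the authors' evaluation code, and the data are purely illustrative.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 (1 = positive class, e.g. loss of function).
    scores: model scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two positives, two negatives.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs correctly ordered
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.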

Conclusion

These findings suggest that ClinVar-BERT is capable of discerning evidence from diagnostic reports and can be useful for prioritizing variants for re-assessment by diagnostic laboratories and expert curation panels.
