Ensemble AI Screening of 1,211 TP53 Variants of Uncertain Significance
Abstract
Background. TP53 is the most frequently mutated gene in human cancer, yet over 1,200 TP53 missense variants in ClinVar remain classified as variants of uncertain significance (VUS), which limits their clinical utility in precision oncology.

Methods. We applied two orthogonal deep-learning models, Meta's ESM-2 (esm2_t33_650M_UR50D; 650M parameters) and DeepMind's AlphaMissense, to screen 1,211 TP53 ClinVar VUS. Each variant was scored with ESM-2 and cross-referenced against AlphaMissense predictions (UniProt P04637), yielding 1,199 successfully matched variants. Structural validation was performed against PDB 1TUP (Cho et al., 1994).

Results. The two models' scores were strongly anti-correlated (Pearson r = -0.706; Spearman rho = -0.714), the expected direction given that more negative ESM-2 log-likelihood ratios (LLR) and higher AlphaMissense scores both indicate greater predicted damage, supporting complementary predictive capacity. A total of 349 VUS were flagged as high-risk by both models (ESM-2 LLR ≤ -4.0 and AlphaMissense pathogenicity > 0.564), while 438 were concordantly classified as low-risk. We highlight five variants in the DNA-binding domain (L257R, V157D, R248P, C176R, R280I) with extreme concordant scores (ESM-2 LLR ≤ -12.5; AlphaMissense ≥ 0.99) and structurally validated damage mechanisms, including loss of DNA contact, abolished zinc coordination, and hydrophobic core disruption.

Conclusions. Ensemble AI scoring can systematically reclassify TP53 VUS at scale. The 349 high-confidence pathogenic variants identified here warrant prioritized functional validation and may inform germline testing guidelines in Li-Fraumeni syndrome and somatic profiling in clinical oncology.
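The concordance rule described in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The high-risk thresholds (ESM-2 LLR ≤ -4.0, AlphaMissense > 0.564) are taken from the abstract; the low-risk cutoffs are not stated there, so `ESM2_LLR_LOW_RISK` and `AM_LOW_RISK` are hypothetical placeholders, as are the names `VariantScore`, `classify`, and `pearson_r` and the toy scores used below.

```python
from dataclasses import dataclass
from math import sqrt

# High-risk thresholds as stated in the abstract.
ESM2_LLR_HIGH_RISK = -4.0   # flagged if LLR <= -4.0
AM_HIGH_RISK = 0.564        # flagged if AlphaMissense score > 0.564

# ASSUMPTION: the abstract does not give the low-risk cutoffs.
# These two values are hypothetical placeholders for illustration only.
ESM2_LLR_LOW_RISK = -2.0
AM_LOW_RISK = 0.34


@dataclass(frozen=True)
class VariantScore:
    name: str             # e.g. a protein-level change; toy labels here
    esm2_llr: float       # log-likelihood ratio, mutant vs. wild type
    alphamissense: float  # pathogenicity score in [0, 1]


def classify(v: VariantScore) -> str:
    """Label a variant by whether both models agree on its risk."""
    if v.esm2_llr <= ESM2_LLR_HIGH_RISK and v.alphamissense > AM_HIGH_RISK:
        return "concordant_high_risk"
    if v.esm2_llr > ESM2_LLR_LOW_RISK and v.alphamissense < AM_LOW_RISK:
        return "concordant_low_risk"
    return "discordant_or_intermediate"


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy scores, invented for illustration; not values from the study.
toy = [
    VariantScore("toy_A", -12.5, 0.99),
    VariantScore("toy_B", -0.5, 0.10),
    VariantScore("toy_C", -5.0, 0.40),
]
labels = {v.name: classify(v) for v in toy}
r = pearson_r([v.esm2_llr for v in toy], [v.alphamissense for v in toy])
```

Because the two scores point in opposite directions (lower LLR and higher AlphaMissense both mean more damaging), agreement between the models shows up as a negative `r`, which is why the reported anti-correlation is read as concordance rather than conflict.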