Structure-Aware Mapping of Disease-Relevant Missense Variation: A Case Study in Three Nuclear Pore Complex Genes

Fatemeh Yekeh Yazdandoost
Mohammad Parsa

Abstract

Missense variation in the nuclear pore complex (NPC) remains difficult to interpret because sequence change, structural context, and sparse clinical labels all interact in nontrivial ways. We study three functionally distinct nucleoporins (GLE1, NUP214, and NUP62) and build a reproducible pipeline that binds variants to canonical UniProt coordinates, overlays AlphaFold2 per-residue confidence, and assigns domain/feature labels from UniProtKB/Pfam. Primary inferences rely strictly on curated ClinVar assertions, while a separate high-confidence pseudo-labeled cohort is created for sensitivity analyses using a guarded weak-supervision scheme: a centroid-cosine scorer over handcrafted sequence-structural features is ensembled with a positive-unlabeled classifier, and only variants passing conservative probability gates are promoted. Across genes, curated data reveal coherent structure-function signals: pathogenic substitutions concentrate in specific domains and structurally ordered regions, while the pseudo-labeled cohort preserves these trends under expanded sample size without entering into hypothesis tests. The result is a transparent workflow that cleanly separates ground truth from weak supervision, avoids leakage, and produces interpretable, domain-level effect estimates. We argue that this combination of principled labeling, structural context, and simple, auditable models offers a practical path for variant interpretation in nucleoporins and, more broadly, in proteins rich in intrinsically disordered and repeat-containing regions.
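To make the guarded weak-supervision idea concrete, the following is a minimal sketch of a centroid-cosine scorer whose output is averaged with a positive-unlabeled (PU) probability and then passed through conservative gates. It is not the authors' implementation. The feature vectors, the mapping from cosine margin to a score, the equal-weight average, the 0.9/0.1 thresholds, and all function and label names are illustrative assumptions, and the PU classifier is represented only by a probability supplied by the caller.

```python
import math

def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid_score(x, pos_centroid, neg_centroid):
    """Map the cosine margin between the two centroids onto [0, 1].

    The linear mapping and clipping are assumptions made for this sketch.
    """
    margin = cosine(x, pos_centroid) - cosine(x, neg_centroid)
    return min(1.0, max(0.0, 0.5 + margin / 2.0))

def promote(x, pu_prob, pos_centroid, neg_centroid, hi=0.9, lo=0.1):
    """Return a pseudo-label, or None when neither gate is passed.

    pu_prob stands in for the output of a PU classifier trained elsewhere;
    hi and lo are hypothetical gate thresholds.
    """
    p = 0.5 * (centroid_score(x, pos_centroid, neg_centroid) + pu_prob)
    if p >= hi:
        return "pathogenic-like"
    if p <= lo:
        return "benign-like"
    return None  # ambiguous variants stay unlabeled

# Toy two-dimensional features standing in for curated variants (made up).
pos_c = centroid([[0.9, 0.1], [1.0, 0.0]])
neg_c = centroid([[0.1, 0.9], [0.0, 1.0]])

confident = promote([1.0, 0.0], 0.95, pos_c, neg_c)
ambiguous = promote([0.5, 0.5], 0.60, pos_c, neg_c)
```

In this toy setup the first variant clears the upper gate and the second stays unlabeled. Promoted variants would go only into a sensitivity cohort and be kept out of hypothesis tests, in line with the separation between curated labels and weak supervision that the abstract describes.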

Version published to 10.1101/2025.10.27.684907 on bioRxiv
Oct 29, 2025

Related articles

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods
This article has 1 author:
1. Hayden Farquhar
This article has no evaluations. Latest version: Feb 4, 2026

A Computational Atlas of Mutational Vulnerability Highlights Convergent Prion-Like and Aggregation-Associated Features in Neurodegenerative Proteins
This article has 1 author:
1. Yathu Krishna Y K
This article has no evaluations. Latest version: Jan 13, 2026

Path-Probability Models Outperform Point-Estimate Scores for Noncoding GWAS Gene Prioritization
This article has 1 author:
1. Abduxoliq Ashuraliyev
This article has no evaluations. Latest version: Dec 22, 2025
