Protein Composition, Not Mutation Identity, Determines Disease Manifestation

Abstract

Current variant interpretation predicts pathogenicity but cannot predict which organ system will be affected. We show that three numbers, namely the fractional composition of flexible (G, A, S, P, T), catalytic (D, E, R, K, H, C, N, Q), and structural (L, I, V, F, Y, W, M) amino acids in a protein, predict disease category with 91.2% accuracy across 13 organ systems and 68,573 ClinVar pathogenic variants, a task no existing tool addresses. Remarkably, protein composition alone accounted for 97.1% of predictive power; the specific mutation contributed only 2.9%. This three-group classification derives from the Panchamahabhuta-Tridosha framework of Ayurveda, which maps matter to three functional principles: Vata (mobility), Pitta (transformation), and Kapha (stability). These correspond precisely to the flexible, catalytic, and structural amino acid groups validated here. The framework correctly predicted tissue vulnerability hierarchies across 7,635 germline and 16,757 somatic mutations, cancer progression trajectories in TCGA transcriptomes (4,578 samples), and psychiatric disorder signatures validated in patient-derived brain organoids. That a classification system developed through millennia of clinical observation maximizes disease prediction accuracy from protein sequence suggests that it captured fundamental biochemical organizing principles that modern systems overlook.
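
The three composition features named in the abstract can be illustrated with a minimal sketch. The group memberships come directly from the abstract; the function name `composition`, the handling of non-standard residues (ignored here), and the example sequence are assumptions made for illustration and are not taken from the authors' pipeline.

```python
from collections import Counter

# Residue groups exactly as listed in the abstract (one-letter codes).
FLEXIBLE = frozenset("GASPT")
CATALYTIC = frozenset("DERKHCNQ")
STRUCTURAL = frozenset("LIVFYWM")


def composition(sequence: str) -> dict[str, float]:
    """Return the fraction of flexible, catalytic, and structural residues.

    Assumption: characters outside the 20 standard amino acids (e.g. X, *, -)
    are ignored, so the three fractions always sum to 1.
    """
    counts = Counter(sequence.upper())
    groups = {
        "flexible": FLEXIBLE,
        "catalytic": CATALYTIC,
        "structural": STRUCTURAL,
    }
    # Count residues per group, then normalise by the number of standard residues.
    totals = {name: sum(counts[aa] for aa in members) for name, members in groups.items()}
    n_standard = sum(totals.values())
    if n_standard == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {name: count / n_standard for name, count in totals.items()}


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the call.
    print(composition("MKTAYIAKQR"))
```

The resulting three-number vector is the kind of input the abstract describes; how the authors feed it into a classifier is not specified there, so no model is sketched here.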
