A Pāṇinian Grammar of the Human Genome: The Genomic Periodicity Index Encodes Functional Architecture and Evolutionary Innovation

Amit Pande
Rahul Sharma
Christian Garbe

Abstract

We describe a three-level formal grammar of the human genome that emerges directly from DNA sequence without prior biological annotation. At the first level, the genome maintains a universal structural periodicity of approximately 400 base pairs, an alternation of repeat and unique sequence that we term the Genomic Periodicity Index (GPI). Local deviations from this periodicity, termed the GPI Deviation (GPID), identify positions where functional requirements override structural packaging: 81.4% of GPID breaks across all 23 human chromosomes overlap known functional elements (p = 3×10⁻¹²⁷ versus a 30% genome-wide baseline), a principle conserved across human, chimpanzee, and gorilla (89%, 75%, and 68%, respectively).

At the second level, dinucleotide composition resolves into five universal sequence classes across all chromosomes tested (mean similarity 0.96, p = 4.65×10⁻⁸), each mapping to a distinct biological function without annotation: the CpG-rich, promoter-associated class, enriched 3.72-fold at GPID breaks, marks sites of transcription initiation; the LINE-rich, AT-rich class marks structurally rigid scaffold sequence. At the third level, transition rules between sequence classes define forbidden adjacencies and reveal systematic compositional changes at transcription start sites, transcription end sites, and splice junctions.

Positions combining the structurally rigid sequence class with GPID breaks are enriched for pathogenic variants (OR = 2.64, p = 10⁻³²), consistent with a model in which local sequence entropy predicts variant intolerance.
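As a rough illustration of the second level, the sketch below computes a 16-element dinucleotide frequency vector for each fixed-size window of a sequence and compares two vectors. It is a minimal sketch and not the authors' pipeline: the abstract does not say how windows are defined or which similarity measure underlies the reported mean similarity, so the non-overlapping windows, the cosine similarity, and all function names here are assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def windows(sequence: str, size: int) -> list[str]:
    """Split a sequence into non-overlapping windows, dropping a short tail.

    Non-overlapping windows are an assumption made for this sketch.
    """
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def dinucleotide_frequencies(window: str) -> list[float]:
    """Return overlapping-dinucleotide frequencies in DINUCLEOTIDES order.

    Pairs containing characters other than A, C, G, T (for example N)
    are ignored when normalising.
    """
    seq = window.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (assumed metric, see lead-in)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Grouping the resulting vectors into classes would need a separate clustering step, which is left out here because the abstract does not name the method used.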
A genome-wide grammar model trained on 1,116,212 windows across all 23 chromosomes independently recovers the Tridosha biochemical hierarchy from dinucleotide composition alone: the Pitta class (CpG-rich, GC-content 0.307) concentrates on gene-dense chromosomes (chr19, chr22) and is associated with metabolic and cardiovascular disease; the Kapha class (AT-rich, repeat-dense) is associated with structural tumour suppressor and DNA repair disorders; and the Vāta class is associated with neural and movement disorders. This correspondence with Āyurvedic clinical taxonomy was derived without prior annotation. Across primate evolution, the GPI at three loci of human-specific disease burden (APOE, HBB, and CFTR) is 10–20-fold longer in human than in other primates, reflecting lineage-specific reorganization of regulatory architecture; nine sites of human-specific regulatory deletion coincide with loss of GPID breaks, marking sequence grammar changes associated with human trait divergence. Derived from raw DNA sequence alone, without biochemical assay, evolutionary alignment, or prior annotation, the GPI framework reveals the human genome as a formal positional grammar encoding regulatory identity, evolutionary constraint, and human-specific biology.
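The third-level idea in the abstract, transition rules between sequence classes, can be pictured with a small counting sketch. It assumes each window has already been assigned a class label, counts how often one class is followed by another, and lists the ordered pairs that never occur. The labels "A", "B", and "C" and both function names are hypothetical, and an adjacency that is merely unobserved in a sample is not necessarily forbidden in the preprint's sense; the abstract does not state the criterion it uses.

```python
from collections import Counter
from itertools import product


def transition_counts(labels: list[str]) -> Counter:
    """Count ordered pairs of class labels on consecutive windows."""
    return Counter(zip(labels, labels[1:]))


def unobserved_adjacencies(labels: list[str], classes: list[str]) -> list[tuple[str, str]]:
    """List ordered class pairs that never appear on consecutive windows.

    This is only a candidate set: a real analysis would compare observed
    counts with an expectation before calling an adjacency forbidden.
    """
    seen = transition_counts(labels)
    return sorted(pair for pair in product(classes, repeat=2) if seen[pair] == 0)


if __name__ == "__main__":
    # Hypothetical class labels for six consecutive windows.
    example = ["A", "B", "A", "C", "A", "B"]
    print(transition_counts(example))
    print(unobserved_adjacencies(example, ["A", "B", "C"]))
```

Keeping the counts as ordered pairs preserves direction, so a class that may precede another but never follow it shows up as a one-sided gap.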

Version published to 10.20944/preprints202604.0713.v1
Apr 10, 2026

Related articles

Organ-System Disease Identity Is Encoded in the Physical Grammar of Regulatory DNA

This article has 3 authors:
1. Amit Pande
2. Rahul Sharma
3. Christian Garbe
This article has no evaluations. Latest version: Mar 23, 2026

Genomic codon usage is structurally consistent with First-Classness across the tree of life

This article has 1 author:
1. Douglas JosephyHuntington Moore
This article has no evaluations. Latest version: Apr 14, 2026

Systematic inference of mutation rates and spectra across the tree of life via a scalable read-based framework

This article has 3 authors:
1. Yosef Maruvka
2. Asaf Pinhasi
3. Keren Yizhak
This article has no evaluations. Latest version: Mar 17, 2026
