The Chromosomal-Boundary Paradox of Processed Pseudogene Annotation in the T2T Era
Abstract
Background: Processed pseudogenes arise through LINE-1–mediated reverse transcription and reintegration of mRNA, a mechanism increasingly recognized as a major force shaping gene evolution. Far from being inert genomic debris, these retrocopies have contributed to gene diversification and regulatory innovation across mammalian lineages. With the completion of telomere-to-telomere (T2T) human genome assemblies, many such loci have been found within duplication-dense pericentromeric and subtelomeric regions. These dynamic chromosomal boundaries frequently undergo recombination and segmental duplication, raising the possibility that an initial retrocopy insertion can act as a structural seed for subsequent DNA-level propagation. Notably, the insertion and duplication of the SEPTIN14 3′ terminal exon produced the composite CICP-SEPTIN14P co-mobilized gene unit underlying the CICP pseudogene family. Under current morphology-based annotation systems, such loci are automatically classified as processed pseudogenes, highlighting the limitation of appearance-driven annotation that overlooks mechanistic origin and duplication history.
Results: The CICP-SEPTIN14P pair constitutes a co-mobilized gene unit that exemplifies how a processed pseudogene and its parental gene can propagate together as a single duplication block across chromosomal boundaries. Comparative inspection of the GRCh38.p14 and T2T-CHM13v2.0 assemblies identified 28 CICP loci in the human genome, most of which share sequence similarity with SEPTIN14. Approximately two-thirds of these loci are positioned within or adjacent to telomeric, subtelomeric, centromeric, or pericentromeric regions. The putative ancestral copy, CICP12, appears to be a processed pseudogene embedded within the final intron of SEPTIN14, forming the original co-mobilized gene unit that was subsequently propagated to multiple chromosomes through segmental duplication.
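The positional statement above (loci "within or adjacent to" boundary regions) comes down to an interval-overlap test. The following is a minimal sketch of such a test, not the procedure used in the preprint. The `Interval` type, the function names, and the adjacency window `pad` are hypothetical, and coordinates are assumed to be 0-based and half-open.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def near_boundary(locus: Interval, regions: list[Interval], pad: int = 0) -> bool:
    """Return True if the locus overlaps any region widened by `pad` bases.

    `pad` is a hypothetical adjacency window; the source does not state one.
    """
    for region in regions:
        if region.chrom != locus.chrom:
            continue
        # Standard half-open overlap test against the padded region.
        if locus.start < region.end + pad and locus.end > region.start - pad:
            return True
    return False


def boundary_fraction(loci: list[Interval], regions: list[Interval], pad: int = 0) -> float:
    """Fraction of loci that lie within or adjacent to any boundary region."""
    if not loci:
        return 0.0
    hits = sum(near_boundary(locus, regions, pad) for locus in loci)
    return hits / len(loci)
```

In practice the region list would come from assembly annotation tracks (for example centromere and subtelomere tracks), and the choice of `pad` determines what counts as "adjacent".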
Sequence alignments revealed extended tracts of >90% identity among pericentromeric and subtelomeric members, supporting a model in which an integrated CICP-SEPTIN14P block was duplicated as a whole rather than generated by independent retrotransposition events. Expression profiling based on GTEx data showed that CICP16 and SEPTIN14P4 display strikingly similar expression patterns across multiple human tissues, suggesting that this co-mobilized duplication unit retained coordinated regulatory behavior after relocation. Comparable tendencies across other CICP-SEPTIN14P pairs reinforce the view that segmentally duplicated, co-mobilized gene units can preserve joint transcriptional control under shared chromatin environments, demonstrating that boundary-linked duplication can maintain regulatory synchrony even after integration into distinct chromosomal contexts.
Conclusions: The CICP-SEPTIN14P co-mobilized gene unit illustrates how a processed pseudogene can transform into a duplication-driven expansion module once integrated near a chromosomal boundary. To avoid systematic misclassification of such loci as independent retrocopies, I propose the Chromosomal Boundary-Associated Processed Pseudogene framework. This model flags any parental gene, pseudogene, or related fragment located within telomeric, subtelomeric, centromeric, or pericentromeric regions and automatically extends the flag to all members of the same gene family for manual review. In addition, I recommend formally recognizing segmentally duplicated processed pseudogenes, defined as processed pseudogenes that later underwent DNA-level block duplication. Together, these measures establish an evolution-aware annotation strategy that integrates chromosomal context into pseudogene classification, offering a framework for improving genome annotation and interpretation in the post-T2T era.
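The flagging rule described in the Conclusions can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's implementation: the `Locus` record, the `region_class` labels, and the way loci are assigned to a `family` are all assumptions, and the locus names in the usage note are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass

# Region classes treated as chromosomal boundaries, as listed in the abstract.
BOUNDARY_CLASSES = {"telomeric", "subtelomeric", "centromeric", "pericentromeric"}


@dataclass(frozen=True)
class Locus:
    name: str
    family: str        # assumed to be assigned upstream, e.g. by sequence similarity
    region_class: str  # e.g. "subtelomeric", "pericentromeric", "interstitial"


def flag_for_review(loci: list[Locus]) -> dict[str, str]:
    """Return {locus name: reason} for every locus that needs manual review.

    A locus is flagged "boundary" if it lies in a boundary region, and
    "family" if it does not but another member of its family does.
    """
    by_family: dict[str, list[Locus]] = defaultdict(list)
    for locus in loci:
        by_family[locus.family].append(locus)

    flags: dict[str, str] = {}
    for members in by_family.values():
        # A family is flagged as soon as one member sits in a boundary region.
        if not any(m.region_class in BOUNDARY_CLASSES for m in members):
            continue
        for m in members:
            flags[m.name] = "boundary" if m.region_class in BOUNDARY_CLASSES else "family"
    return flags
```

For example, given placeholder loci `X1` (subtelomeric) and `X2` (interstitial) in one family and `Y1` (interstitial) in another, the function flags `X1` as "boundary" and `X2` as "family" and leaves `Y1` unflagged. Keeping the reason with each flag lets a curator distinguish direct boundary hits from family-level extensions.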