Signatures of Micropeptides Encoded by lncRNAs in Cancer Progression and Metastasis
Abstract
Long non-coding RNAs (lncRNAs) are key regulators of gene expression, chromatin remodeling, and signaling. Recent estimates suggest that the human genome contains more than 35,000 lncRNA genes, with roughly 20% predicted to encode micropeptides (MPs) of unknown function. In this study, we focused on the subset of lncRNAs with strong statistical evidence for MP-encoding potential, which accounts for approximately 8% of the unfiltered MP collection. Our analysis centered on 1,782 high-confidence lncRNA-MPs derived from 478 genes expressed across 17 cancer types from The Cancer Genome Atlas (TCGA). We show that lncRNA-MPs display distinct amino acid compositions and unique 4-mer patterns compared to the human coding proteome. Nine genes with exceptionally long transcripts encode ≥20 MPs each. Functional inference confirmed that most of the lncRNA-MPs are unstructured. Only a third of the genes display some phylogenetic conservation, and only 4 genes encode MPs with canonical N-terminal signal peptides characteristic of secreted proteins. We focused on cancer progression-associated lncRNAs that show differential expression (|z-score| > 3) across consecutive tumor stages and metastatic states (transitional lncRNAs, Tr-lncRNAs). A collection of 72 genes encoding 314 MPs (Tr-lncRNA-MPs) was detected, with 76% of the MPs being ≥30 amino acids long. Prediction by AlphaFold 2.0 and homology modeling tools revealed dozens of MPs with well-defined secondary structures and recognizable 3D motifs. Among the longer Tr-lncRNA-MPs (>60 amino acids), we confirmed the presence of ubiquitin-like, RNase H-related, and other conserved foldable motifs. Known cancer lncRNAs containing high-confidence MPs (XIST, UCA1, HOXA11-AS, LINC01234, and HAND-AS1) overlap with 50 pan-cancer lncRNAs associated with tumor stage or metastasis transitions.
Together, these findings demonstrate that integrating sequence motifs (e.g., signal peptides, k-mers) with structural foldability offers a multifaceted view of lncRNA-MPs in cancer. We argue that the capacity to produce MPs may reinforce the oncogenic impact dominated by the lncRNA entity. We propose that Tr-lncRNA-MPs represent a promising new class of biomarkers and therapeutic targets in oncology.
Key points
- 478 lncRNA genes with strong evidence for micropeptide (MP) production generated 1,782 distinct lncRNA-MPs.
- 72 lncRNAs and 314 MPs are associated with transitional lncRNAs from 17 cancer types and stages of tumor progression and metastasis.
- Sequence and structural analyses reveal many MPs with reliable 3D folding potential.
- Dozens of previously overlooked MPs may serve as novel biomarkers and therapeutic targets in cancer.
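
The abstract reports that lncRNA-MPs carry 4-mer patterns not seen in the human coding proteome. As a rough illustration of what such a comparison involves, the sketch below counts overlapping 4-mers in two sets of peptide sequences and lists those found only in the first set. It is not the authors' pipeline: the function names and the toy sequences are hypothetical, and a real analysis would use the full MP and proteome collections together with an appropriate statistical test.

```python
from collections import Counter


def kmer_counts(sequences, k=4):
    """Count overlapping k-mers across a collection of peptide sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence, one residue at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_frequencies(sequences, k=4):
    """Convert raw k-mer counts into relative frequencies (summing to 1)."""
    counts = kmer_counts(sequences, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def unique_kmers(foreground, background, k=4):
    """Return k-mers present in the foreground set but absent from the background."""
    return set(kmer_counts(foreground, k)) - set(kmer_counts(background, k))


# Toy, made-up sequences for illustration only (not data from the study).
toy_mps = ["MKRRLLPA", "MKRRGGSA"]
toy_proteome = ["MKRALLPAGG", "MSSRLLPA"]

if __name__ == "__main__":
    for kmer in sorted(unique_kmers(toy_mps, toy_proteome)):
        print(kmer)
```

Counting overlapping windows keeps every 4-mer position in a short peptide, which matters when most sequences are only a few dozen residues long.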