To be, or not to be an intron: evidence from entropy-based machine learning


Abstract

Alternative splicing (AS) of introns is a key mechanism contributing to proteomic diversity. It enables the generation of multiple mRNA variants from a single gene sequence, which are subsequently translated into distinct protein isoforms. Intron retention (IR) is a specific type of AS in which introns remain unspliced in the mature mRNA. The process of intron splicing is regulated by cis-regulatory elements that recruit small non-coding RNAs and heterogeneous nuclear ribonucleoproteins (hnRNPs), collectively forming the spliceosome. However, the precise mechanism or “code” governing the splicing pattern of any primary transcript is still incompletely understood. In this study, we use explainable Machine Learning (xML) models to investigate the mechanism underlying intron retention in mature mRNA. We analyzed intronic sequences from species within the ciliate genus Tetrahymena, providing unique insights into the IR process that are not biased by tissue-specific factors. Various features of the intronic sequences were examined, including the absence of repetitive nucleotide motifs (quantified as “entropy”), the GC content, and the complexity of the secondary structures as estimated by the Lempel-Ziv (LZ) measure. Our findings indicate that the key features distinguishing retained introns (RIs) from constitutively spliced introns are the reduced presence of repetitive nucleotide motifs within the intronic sequences and the compactness of secondary structures near the 3’ splice sites. These features appear to weaken splicing signals, impairing the recognition of intronic sequences and resulting in IR. In conclusion, our work offers insights into the regulatory code underlying intron retention in other organisms and highlights its potential role in modulating phenotypic plasticity. This supports a framework for understanding the epigenetic mechanisms of stress response and environmental adaptation, aligning with the “Lamarckian” perspective of evolutionary biology.
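
The three sequence features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions used in the study, so everything below is an assumption made for illustration: entropy is taken as the Shannon entropy of overlapping k-mers, GC content as the fraction of G and C bases, and the LZ measure as the phrase count of a simple LZ78-style parsing. The function names `gc_content`, `kmer_entropy`, and `lz_phrase_count` are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer distribution.

    Assumption: this is one common way to quantify the absence of
    repetitive motifs; the study may define entropy differently.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lz_phrase_count(s: str) -> int:
    """Number of phrases in a simple LZ78-style parsing of the string.

    Assumption: a stand-in for the Lempel-Ziv measure. It accepts any
    string, for example a nucleotide sequence or a dot-bracket
    secondary-structure annotation.
    """
    seen = set()
    phrase = ""
    count = 0
    for ch in s:
        phrase += ch
        if phrase not in seen:
            # First time this phrase appears: record it and start a new one.
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:
        # Trailing phrase that was already seen still counts once.
        count += 1
    return count


if __name__ == "__main__":
    for name, seq in [("repetitive", "ATATATATATATATAT"),
                      ("mixed", "ATGCGTTAGCCATGAC")]:
        print(name, gc_content(seq), kmer_entropy(seq), lz_phrase_count(seq))
```

Under these definitions, a highly repetitive sequence yields a lower k-mer entropy and fewer LZ phrases than a mixed sequence of the same length, which is the sense in which low repetitiveness corresponds to high entropy.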
