Decoding RNA Triple Helices: Identification from Sequence and Secondary Structure
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
The discovery of long non-coding RNAs has revealed additional layers of gene‑expression control. Specific interactions of lncRNAs with DNA, RNAs, and RNA-binding proteins enable regulation in both cytoplasmic and nuclear compartments; for example, a conserved triple‑helix motif is essential for MALAT1 stability and oncogenic activity. Here we present a secondary-structure-based framework to annotate and detect RNA triple helices. First, we extend the dot-bracket formalism with a third annotation line that encodes Hoogsteen contacts. Second, we introduce TripleMatcher, which searches for a tripl-helix pattern, filters candidates by C1'-C1' distance thresholds, and merges overlaps into region-level zones. Using telomerase RNAs and RNA-stability elements with experimentally established triple helices (8 RNAs), TripleMatcher localized all annotated regions (structure-wise detection 8/8); geometric filtering removed most spurious candidates and improved precision (PPV from 0.42 to 0.81) and overall accuracy (F_1 from 0.42 to 0.62) while maintaining sensitivity. Benchmarking eight predictors showed that pseudoknot-aware methods most reliably reproduce the local architecture required for detection, aligning secondary-structure quality with downstream triple-helix recovery. Applied prospectively, the framework identified candidate regions directly from predicted secondary structures and scaled to a screen of 4,147 RNAs, where distance filtering reduced 150,948 raw candidates to 90 geometrically feasible regions across seven molecules, including human telomerase complexes. Together, the notation and TripleMatcher provide a concise route from secondary structure to a small, interpretable set of triple-helix candidates suitable for targeted experimental validation.