Data-driven design of LNA-blockers for efficient contaminant removal in Ribo-seq libraries
Abstract
Ribo-Seq libraries often contain a large proportion of non-coding RNA fragments, which can substantially reduce the information yield of these experiments. Contaminants can make up as much as 90% of a Ribo-Seq library, and their high sequence variability and diverse fragmentation limit the effectiveness of rRNA depletion kits with fixed target sequences. We developed a workflow to identify experiment-specific contaminants from a small-scale, preliminary sequencing run. This enables the design of locked nucleic acid (LNA) oligonucleotides that target the contaminating fragments, thereby preventing their amplification during library preparation. This process requires only a single pipetting step and no additional purification. In a proof-of-concept experiment, just five LNAs reduced contaminating fragments by over 30%, doubling the amount of useful sequencing data from Ribo-Seq experiments.
We provide a script that identifies and visualizes contaminants and optimized target sequences, guidelines for designing custom LNA sets, and a collection of predesigned LNAs for Arabidopsis thaliana under various common growth conditions, intended as the foundation for a public LNA repository.
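The contaminant-identification step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' script, and their method may differ. It assumes that reads from the preliminary run have already been classified as non-coding, and it simply ranks identical read sequences by abundance. The function name `rank_contaminants` and its parameters are hypothetical.

```python
from collections import Counter


def rank_contaminants(contaminant_reads, total_reads, top_n=5):
    """Rank identical contaminant sequences by abundance.

    contaminant_reads: iterable of read sequences already flagged as
        non-coding RNA (assumption: classification happened upstream).
    total_reads: number of reads in the whole preliminary library.
    top_n: number of candidate target sequences to report.

    Returns a list of (sequence, count, fraction, cumulative_fraction),
    where both fractions are relative to total_reads.
    """
    counts = Counter(contaminant_reads)
    ranked = []
    cumulative = 0
    for seq, n in counts.most_common(top_n):
        cumulative += n
        ranked.append((seq, n, n / total_reads, cumulative / total_reads))
    return ranked


if __name__ == "__main__":
    # Toy data: 9 contaminant reads in a library of 10 reads.
    reads = ["AAA"] * 5 + ["CCC"] * 3 + ["GGG"]
    for row in rank_contaminants(reads, total_reads=10, top_n=2):
        print(row)
```

The cumulative fraction indicates how much of the library a small set of targets could cover at most. Exact-sequence counting is a simplification: because contaminant fragments vary in their ends, a practical tool would likely need to group overlapping fragments before choosing target sequences.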
Significance Statement
Ribo-Seq libraries often contain abundant non-coding RNA contaminants, which, because of their high sequence variability and diverse fragmentation, are challenging to remove. We present a computational pipeline that identifies experiment-specific target sequences and allows for their efficient depletion using custom LNA probes in a single pipetting step, thereby increasing sequencing yield and reducing costs. A public LNA repository will support sharing validated targets within the research community.