Long-read transcriptomics corrects Trichomonas vaginalis intron annotations and maps poly(A)/UTR landscapes

Abstract

Background

Trichomonas vaginalis is a prevalent protozoan parasite responsible for the most common non-viral sexually transmitted infection worldwide. Despite its expansive genome (181.5 Mb; 36,310 protein-coding genes in the NYU_TvagG3_2 reference), intron annotations remain sparse and inconsistently validated. While a recent short-read RNA-seq study catalogued 63 putative "active" introns, inherent read-length limitations frequently misplace boundaries and inflate false positives, leaving true intron coordinates and transcript architectures unsettled.

Methods

To resolve these structural ambiguities, we integrated Oxford Nanopore direct RNA sequencing (DRS), ONT cDNA long reads, and Illumina short-read RNA-seq. This multi-platform approach enabled full-length intron re-annotation, precise delineation of 5′ and 3′ untranslated regions (UTRs), and single-molecule quantification of poly(A) tail lengths. Motif-guided candidate selection was orthogonally validated using targeted PCR and Sanger sequencing, with key events further replicated through the re-analysis of public SRA datasets.

Results

Our long-read evidence successfully reconciled and refined the existing short-read catalogue. We corrected five previously reported loci (two false positives, two coordinate mis-annotations, and one underlying gene sequence error) and identified three novel introns, all rigorously supported by multiple full-length reads and Sanger validation. Additionally, DRS allowed for the precise mapping of transcript end sites and AAUAAA positions relative to poly(A) addition, yielding comprehensive poly(A) length distributions and UTR boundaries for both intron-bearing and intron-free transcripts.

Conclusions

We deliver a rigorously validated, long-read–corrected intron, UTR, and poly(A) resource alongside a reproducible bioinformatics pipeline for non-model protists. This substantially improves the T. vaginalis reference annotation, establishing a critical foundation for future functional genomics, pathogenesis research, and diagnostic applications.
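As an illustration of what "AAUAAA positions relative to poly(A) addition" could mean in practice, the sketch below measures the gap between the last AAUAAA hexamer in a 3′ UTR and the poly(A) addition site. It is not the authors' pipeline: the function name `signal_to_polya_distance`, the convention that the input sequence ends exactly at the poly(A) addition site, and the choice to report the gap after the hexamer are all assumptions made for this example.

```python
from typing import Optional


def signal_to_polya_distance(utr3: str, signal: str = "AATAAA") -> Optional[int]:
    """Number of bases between the end of the last `signal` hexamer and the
    poly(A) addition site.

    Assumption: `utr3` is the 3' UTR sequence (RNA or DNA alphabet) and its
    final base is the last templated base before the poly(A) tail.
    Returns None when the signal is absent.
    """
    # Normalise to an upper-case DNA alphabet so AAUAAA and AATAAA both match.
    seq = utr3.upper().replace("U", "T")
    idx = seq.rfind(signal)  # start of the signal copy closest to the 3' end
    if idx == -1:
        return None
    return len(seq) - (idx + len(signal))


# Hypothetical sequence: 3 bases, the hexamer, then 10 bases before the tail.
example = "GGG" + "AAUAAA" + "C" * 10
print(signal_to_polya_distance(example))
```

Applied across all transcripts with a long-read-supported end site, the returned values would give a distance distribution of the kind the abstract describes; transcripts returning `None` would be those lacking the hexamer in the annotated UTR.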
