TransStop, a genomic language model for the pan-drug prediction of translational readthrough efficacy

Abstract

Motivation

Premature termination codons (PTCs) are a major cause of genetic diseases, but the efficacy of therapeutic readthrough agents is highly context-dependent. While linear models have shown promise in predicting readthrough efficiency, they may not fully capture the complex, non-linear interactions between sequence context and drug activity.

Methods

We developed TransStop, a transformer-based pan-drug model trained on a dataset of ∼5,400 PTCs and eight readthrough compounds. The model learns representations of the sequence context around each PTC and incorporates a learnable embedding for each drug to capture drug-specific effects, allowing a single model to predict efficacy for multiple drugs.
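
The abstract does not specify how the sequence representation and the drug embeddings are combined. The sketch below shows one simple way a shared model could serve several drugs, assuming the transformer output is mean-pooled and concatenated with a per-drug learnable vector before a linear regression head. The drug names, dimensions, pooling and fusion scheme are all hypothetical placeholders, and the encoder itself is stubbed with fixed vectors.

```python
import random

# Hypothetical pan-drug head: names, sizes, mean pooling and concatenation
# are illustrative assumptions, not TransStop's published design.
DRUGS = ["drug_A", "drug_B", "drug_C"]  # placeholder compound names
SEQ_DIM = 4   # assumed width of the sequence representation
DRUG_DIM = 2  # assumed width of each drug embedding

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# One learnable vector per drug; training would update these alongside the encoder.
drug_embeddings = {d: [rng.gauss(0.0, 0.1) for _ in range(DRUG_DIM)] for d in DRUGS}

# Shared linear regression head over the concatenated [sequence ; drug] vector.
head_weights = [rng.gauss(0.0, 0.1) for _ in range(SEQ_DIM + DRUG_DIM)]
head_bias = 0.0


def mean_pool(token_vectors):
    """Average per-token vectors into a single sequence vector."""
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(len(token_vectors[0]))]


def predict(token_vectors, drug):
    """Score one PTC context for one drug using the shared weights."""
    features = mean_pool(token_vectors) + drug_embeddings[drug]  # list concatenation
    return head_bias + sum(w * x for w, x in zip(head_weights, features))


# Stand-in for transformer encoder output: two tokens, SEQ_DIM values each.
tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
scores = {d: predict(tokens, d) for d in DRUGS}
```

Because every drug shares the encoder and head, predictions for the same sequence differ only through the drug embedding, which is what lets one model cover all compounds.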

Results

Our model achieved a global R² of 0.94 on a held-out test set. Visualizations of the learned embeddings reflected known biological principles, including distinct clustering of stop codon types and grouping of drugs by mechanism of action. In silico saturation mutagenesis and epistasis analyses uncovered complex, non-additive sequence determinants of readthrough. We generated 32.7 million predictions across the human genome, covering all possible PTCs. Analysis of these genome-wide predictions revealed strong drug specializations for specific stop codon contexts and identified key areas of disagreement with previous models, particularly for UGA codons, where our model predicts a different, more effective drug in thousands of cases. The TransStop model represents a significant advance in the prediction of translational readthrough efficiency. Its superior accuracy and the biological insights derived from its applications make it a powerful tool for guiding clinical trial design, drug development, and personalized patient treatment.
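
The abstract does not detail how the mutagenesis and epistasis analyses were computed. The sketch below shows one common setup, assuming a model that maps a nucleotide context to a readthrough score: score every single-nucleotide substitution against the wild type, then measure pairwise epistasis with the double-mutant-cycle score eps = f(ab) - f(a) - f(b) + f(wt), which is zero when two mutational effects simply add. `toy_score` is a hypothetical placeholder with one built-in interaction, not TransStop, and `fixed` is an assumed parameter for holding positions (for example the stop codon) constant.

```python
BASES = "ACGT"


def toy_score(seq):
    """Hypothetical stand-in for a trained readthrough model."""
    # Additive part: each 'C' adds a position-dependent amount.
    score = sum(0.1 * (i + 1) for i, base in enumerate(seq) if base == "C")
    # Non-additive part: a bonus only when positions 0 and 2 are both 'C'.
    if len(seq) > 2 and seq[0] == "C" and seq[2] == "C":
        score += 1.0
    return score


def mutate(seq, pos, base):
    """Return seq with a single substitution at pos."""
    return seq[:pos] + base + seq[pos + 1:]


def saturation_mutagenesis(seq, model, fixed=()):
    """Score change for every single-nucleotide substitution, skipping fixed positions."""
    wt = model(seq)
    return {
        (i, b): model(mutate(seq, i, b)) - wt
        for i in range(len(seq))
        if i not in fixed
        for b in BASES
        if b != seq[i]
    }


def pairwise_epistasis(seq, model, mut_a, mut_b):
    """Double-mutant-cycle epistasis: f(ab) - f(a) - f(b) + f(wt)."""
    (i, a), (j, b) = mut_a, mut_b
    if i == j:
        raise ValueError("mutations must be at different positions")
    f_wt = model(seq)
    f_a = model(mutate(seq, i, a))
    f_b = model(mutate(seq, j, b))
    f_ab = model(mutate(mutate(seq, i, a), j, b))
    return f_ab - f_a - f_b + f_wt


context = "AAAA"  # toy context; a real analysis would use the sequence around a PTC
effects = saturation_mutagenesis(context, toy_score)
eps_interacting = pairwise_epistasis(context, toy_score, (0, "C"), (2, "C"))
eps_additive = pairwise_epistasis(context, toy_score, (0, "C"), (1, "C"))
```

Here `eps_interacting` comes out near 1.0 because of the built-in interaction term, while `eps_additive` is near zero; a non-zero score is the kind of non-additive signal the analysis is designed to surface.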

Availability and Implementation

Source code: https://github.com/Dichopsis/TransStop
Model: https://huggingface.co/Dichopsis/TransStop
Genome-wide predictions: https://doi.org/10.5281/zenodo.16918476