TETGen: A Transcriptome-Guided Transformer for Targeted Molecule Generation
Abstract
Artificial intelligence has advanced de novo molecule generation, but the use of transcriptomic data to guide biologically meaningful molecular design remains underexplored. Here we present TETGen (Transcriptome-Enhanced Transformer Generator), a Transformer-based model that integrates transcriptomic profiles to generate molecules with targeted biological activities. TETGen incorporates biological context through gene vectors derived from random walks on a protein-protein interaction network; these vectors serve as functional positional encodings. During training, the model further supervises its cross-attention mechanism using transcriptomic profiles to ensure biological relevance. TETGen efficiently generates compounds with favorable drug-like properties and synthetic accessibility. Evaluation demonstrates that molecules with transcriptomic matching scores exceeding 0.6 exhibit significantly higher structural similarity to known bioactive compounds than those with lower scores. In a case study using NEK7-knockout transcriptomic data, TETGen generated candidate molecular glues whose stable binding modes were corroborated by docking and 100 ns molecular dynamics simulations, with the lead candidate maintaining persistent hydrogen bonding with NEK7 Glu37 and π-π stacking with CRBN Trp386. This work presents an interpretable generative framework that bridges chemical and biological spaces, offering a computational tool for phenotype-inspired drug design.
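To make the idea of gene vectors derived from network random walks more concrete, the following is a minimal sketch and not TETGen's actual procedure. The abstract does not state the walk strategy, the walk length, or how walks are converted into vectors, so everything here is an assumption: the toy graph `TOY_PPI` with placeholder gene names, the uniform unbiased walk, and the window-based co-occurrence counts (a simple stand-in for a learned embedding) are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy PPI network as an undirected adjacency list.
# Gene names are placeholders, not real interactions.
TOY_PPI = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A", "GENE_D"],
    "GENE_C": ["GENE_A", "GENE_E"],
    "GENE_D": ["GENE_B"],
    "GENE_E": ["GENE_C"],
}


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node (assumed strategy)."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # isolated node: stop the walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def cooccurrence_vectors(graph, walks, window=2):
    """Build one vector per gene from co-occurrence counts within a sliding window.

    Each vector is L1-normalized, so its entries sum to 1 whenever the gene
    co-occurs with anything at all.
    """
    genes = sorted(graph)
    index = {gene: i for i, gene in enumerate(genes)}
    counts = {gene: [0.0] * len(genes) for gene in genes}
    for walk in walks:
        for i, gene in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[gene][index[walk[j]]] += 1.0
    vectors = {}
    for gene, row in counts.items():
        total = sum(row)
        vectors[gene] = [x / total if total else 0.0 for x in row]
    return vectors


if __name__ == "__main__":
    walks = random_walks(TOY_PPI, walk_length=6, walks_per_node=4, seed=1)
    vectors = cooccurrence_vectors(TOY_PPI, walks, window=2)
    for gene in sorted(vectors):
        print(gene, [round(x, 3) for x in vectors[gene]])
```

In this sketch, genes that sit close together in the network tend to receive similar vectors, which is the property that would let such vectors act as a functional rather than sequential positional signal. A real pipeline would more likely train an embedding model on the walks instead of using raw counts.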