TSProm: Deciphering the Genomic Context of Tissue Specificity

Abstract

Characterizing tissue-specific (TSp) gene expression is crucial for understanding development and disease; however, traditional expression-based methods often overlook the latent “regulatory grammar” embedded in non-coding DNA, particularly in distal promoter regions. Here, we introduce TSProm, a framework that specializes a DNA foundation model (DNABERT2) to decipher the long-range regulatory logic of TSp promoters at the gene isoform level. Our work makes two contributions. First, we propose a comparative design that trains two distinct models: model A for general promoter biology and model B for TSp regulation. Comparing these models isolates the sequence motifs around the transcription start site that uniquely define tissue identity. Second, we introduce an explainable AI (xAI) module that integrates attention-based discovery with model-agnostic SHAP analysis, so that each learned feature is interpreted by two independent methods that can be checked against each other. Applying this framework to human brain, liver, and testis promoters, we identified and validated clinically relevant transcription factors (TFs) in the brain, including SP1, MYC, and HES6, and confirmed their known roles in diseases such as gliomas and neuroblastomas. Our analysis further revealed that C2H2 zinc finger proteins are a dominant feature of the global landscape of TSp gene regulation. TSProm thus provides an interpretable framework for identifying TSp gene regulatory elements and a computational tool for studying tissue-specific gene regulation in normal and disease conditions.
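The comparative logic described in the abstract can be illustrated with a minimal Python sketch. This is not the TSProm implementation: the function names (`top_k`, `consensus`, `tissue_specific`), the k-mer tokens, and all scores are hypothetical, and intersecting top-ranked features is only one assumed way of combining attention and SHAP. The sketch keeps features that both attribution methods rank highly within each model, then retains those supported by model B (TSp) but not by model A (general promoter).

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k features with the largest absolute score."""
    ranked = sorted(scores, key=lambda f: abs(scores[f]), reverse=True)
    return set(ranked[:k])


def consensus(attention: Dict[str, float], shap: Dict[str, float], k: int) -> Set[str]:
    """Features that rank in the top k under both attention and SHAP."""
    return top_k(attention, k) & top_k(shap, k)


def tissue_specific(general: Set[str], tsp: Set[str]) -> List[str]:
    """Features supported by model B (TSp) but not by model A (general)."""
    return sorted(tsp - general)


# Invented toy scores per k-mer; real values would come from the trained models.
attn_a = {"TATAAA": 0.9, "GGGCGG": 0.7, "CACGTG": 0.2, "AAAAAA": 0.1}
shap_a = {"TATAAA": 0.8, "GGGCGG": -0.6, "CACGTG": 0.1, "AAAAAA": 0.05}
attn_b = {"TATAAA": 0.3, "GGGCGG": 0.8, "CACGTG": 0.9, "AAAAAA": 0.1}
shap_b = {"TATAAA": 0.1, "GGGCGG": 0.5, "CACGTG": 0.7, "AAAAAA": 0.2}

general_set = consensus(attn_a, shap_a, k=2)   # hypothetical model A features
tsp_set = consensus(attn_b, shap_b, k=2)       # hypothetical model B features
candidates = tissue_specific(general_set, tsp_set)
print(candidates)
```

Ranking by absolute value lets negative SHAP contributions count as important; a real analysis would likely work on motif matches rather than raw k-mers and would choose `k` by a statistical criterion.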
