TSProm: Deciphering the Genomic Context of Tissue Specificity
Abstract
Characterizing tissue-specific (TSp) gene expression is crucial for understanding development and disease; however, traditional expression-based methods often overlook the latent “regulatory grammar” embedded in non-coding DNA, particularly in distal promoter regions. Here, we introduce TSProm, a framework that specializes a DNA foundation model (DNABERT2) to decipher the long-range regulatory logic of TSp promoters at the gene isoform level. The contributions of our work are twofold. First, we propose a novel comparative design that trains two distinct models: model A for general promoter biology and model B for TSp regulation. Together, these models enable the precise isolation of sequence motifs around the transcription start site that uniquely define tissue identity. Second, we introduce a comprehensive explainable AI (xAI) module that integrates attention-based discovery with model-agnostic SHAP analysis to provide robust, cross-validated interpretations of learned features. Applying this framework to human brain, liver, and testis promoters, we identified and validated clinically relevant transcription factors (TFs) in the brain, including SP1, MYC, and HES6, and confirmed their known roles in diseases such as gliomas and neuroblastomas. Our analysis further revealed that C2H2 zinc finger proteins are a dominant feature of the global landscape of TSp gene regulation. TSProm provides a novel and interpretable framework for identifying TSp gene regulatory elements, offering powerful computational tools for the study of tissue-specific gene regulation in normal and disease conditions.