A Robust Framework for Predicting Mutation Effects on Transcription Factor Binding: Insights from Mutational Signatures in 560 Breast Cancer Genomes

Abstract

Background: The vast majority of somatic mutations in cancer reside in noncoding regions, yet systematically predicting their functional impact on gene regulation remains a significant challenge. These variants often exert their effects by altering the binding affinity of transcription factors (TFs) to cis-regulatory elements. However, a critical gap remains in linking specific mutational processes to the disruption of gene regulatory networks at a systems level.
Results: In this study, we present a comprehensive in silico pipeline centered on k-mer-based linear regression models that quantify TF binding affinity. Our framework produced 403 high-confidence TF models trained on high-throughput ChIP-seq and protein-binding microarray (PBM) datasets. We applied this pipeline to 3.5 million somatic mutations from 560 breast cancer whole genomes to predict gain- or loss-of-function (GOF/LOF) binding perturbations. These predictions were integrated with mutational signature analysis and curated gene sets, using enhancer-gene maps from the Activity-by-Contact (ABC) model to link variants to their target genes. Our analysis revealed that distinct mutational processes exert non-random, directional effects on specific TF families. The APOBEC-associated signatures (SBS2 and SBS13) were strongly enriched for GOF events in the Myb/SANT and FOX families, whereas the aging-associated signature SBS1 was enriched for LOF events in Ets family members. Furthermore, predicted perturbations at putative enhancers were significantly linked to key oncogenes through GOF events (e.g., FOXA1) and to tumor suppressor genes through LOF events (e.g., BRCA1/2). In the basal-like triple-negative breast cancer (TNBC) subtype, SBS3-driven GOF enrichments for the CXXC family converged on MYC target gene programs, whereas SBS39-driven LOF events for the same family converged on DNA repair pathways.
Conclusions: Our framework provides a robust and scalable approach for prioritizing and interpreting the functional consequences of somatic mutations in terms of TF perturbations. We demonstrate that specific mutational processes systematically rewire the gene regulatory landscape in a subtype-specific manner, offering novel mechanisms for transcriptional deregulation in breast cancer.
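To illustrate how a k-mer-based linear model can turn a point mutation into a GOF/LOF call, the sketch below shows a minimal version of the idea. It is not the authors' implementation. Several choices here are assumptions made for illustration: the additive scoring form (intercept plus weighted k-mer counts), counting k-mers on both strands, the fixed sequence window around the variant, and the symmetric `threshold` on the change in predicted affinity. The helper names (`kmer_counts`, `affinity`, `mutation_effect`) are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on both strands (assumption: the model is strand-agnostic)."""
    counts = Counter()
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - k + 1):
            counts[strand[i:i + k]] += 1
    return counts


def affinity(seq: str, weights: dict, k: int, intercept: float = 0.0) -> float:
    """Hypothetical linear model: intercept + sum of (k-mer weight * k-mer count)."""
    counts = kmer_counts(seq, k)
    return intercept + sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


def mutation_effect(ref_window: str, pos: int, alt: str, weights: dict,
                    k: int, threshold: float):
    """Score ref vs. alt windows and call GOF/LOF from the affinity change.

    The intercept cancels in the difference, so it is omitted here.
    """
    if alt not in COMPLEMENT or alt == ref_window[pos]:
        raise ValueError("alt must be a base different from the reference base")
    alt_window = ref_window[:pos] + alt + ref_window[pos + 1:]
    delta = affinity(alt_window, weights, k) - affinity(ref_window, weights, k)
    if delta > threshold:
        return "GOF", delta
    if delta < -threshold:
        return "LOF", delta
    return "neutral", delta


if __name__ == "__main__":
    # Toy model with a single informative 3-mer; real models are learned from data.
    toy_weights = {"GAT": 2.0}
    print(mutation_effect("TTGCTTT", 3, "A", toy_weights, k=3, threshold=1.0))
```

In this toy example, a C>A change creates a `GAT` site, so the call is `("GOF", 2.0)`. Reversing the same change would destroy the site and give a LOF call. Under these assumptions, both-strand counting scores palindromic k-mers twice, and a learned model might instead canonicalize each k-mer to a single strand.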
