ShapeME: A tool and web front-end for de novo discovery of structural motifs underpinning protein-DNA interactions

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Determining where transcriptional regulators bind within a genome is paramount to understanding how gene expression is regulated. Historically, position weight matrices (PWMs) have been used to define the binding preferences of DNA binding proteins 1 . However, PWMs treat the identity of each base in a sequence as an independent and additive measure of binding preference, which can limit their utility 2 . Models that consider higher order interactions between nearby bases yield greater success in predicting proteins’ binding to DNA, but for many proteins there is still substantial room for improvement in predicting and understanding the determinants of proteins’ binding to DNA 3 . In addition to DNA sequence motifs, structural motifs (e.g., a narrow minor groove width) are important determinants of binding for some DNA-binding proteins 4 . Despite the initial success of algorithms using structural features of DNA to predict binding properties of proteins from either ChIP-seq or SELEX data 5–8 , there remains a need for a de novo structural motif discovery framework which can be applied to data from a variety of experimental designs. Here, we present a unified workflow, capable of utilizing virtually any type of data representing sequence coverage or enrichment (e.g. ChIP-seq, RNA-seq, SELEX, etc.), to discover short structural motifs with explanatory power for a protein’s DNA binding preference. We couple the DNAshapeR algorithm 9 with our own information-theoretic approach to de novo motif discovery, and wrap shape and sequence motif inference and model selection into a single tool called ShapeME. Application of our structural motif discovery algorithm to proteins with ChIP-seq data in ENCODE datasets reveals a subset of proteins where short structural motifs outperform the best PWM for that protein as determined from the JASPAR database, or as identified by the sequence motif elicitation tool STREME. Our approach offers a powerful and versatile framework for inferring structural DNA binding motifs, and will complement current sequence-based motif elicitation tools in discovery of protein-DNA interaction principles. A web-based interface to ShapeME is available at https://seq2fun.dcmb.med.umich.edu/shapeme , with full source code available at https://github.com/freddolino-lab/ShapeME .

Article activity feed