TFBSFootprinter: a multiomics tool for prediction of transcription factor binding sites in vertebrate species

Abstract

Background

Transcription factor (TF) proteins play a critical role in the regulation of eukaryotic gene expression via sequence-specific binding to genomic locations known as transcription factor binding sites (TFBSs). Accurate prediction of TFBSs is essential for understanding gene regulation, disease mechanisms, and drug discovery, and is therefore relevant not only in humans but also in model organisms and in domesticated and wild animals. However, current tools for the automatic analysis of TFBSs in gene promoter regions are limited in their usability across multiple species, and to our knowledge none supports such analysis for many species.

Methodology and Findings

The TFBSFootprinter tool combines multiomic transcription-relevant data for more accurate prediction of functional TFBSs in 124 vertebrate species. In humans, this includes vertebrate sequence conservation (GERP), proximity to transcription start sites (FANTOM5), correlation of expression between target genes and TFs predicted to bind promoters (FANTOM5), overlap with ChIP-Seq TF metaclusters (GTRD), overlap with ATAC-Seq peaks (ENCODE), eQTLs (GTEx), and the observed/expected CpG ratio (Ensembl).
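As a rough illustration of what combining several transcription-relevant scores for one candidate site can look like, the sketch below adds per-dataset scores into a single value and lets a subset of datasets be selected. The feature names, the numbers, and the plain summation are all assumptions made for illustration; they are not taken from TFBSFootprinter and do not represent its API or its scoring scheme.

```python
from typing import Dict, Iterable, Optional


def combined_score(scores: Dict[str, float],
                   use: Optional[Iterable[str]] = None) -> float:
    """Sum per-dataset scores for one candidate binding site.

    `scores` maps a (hypothetical) dataset name to its score for the site.
    `use` optionally restricts the sum to a chosen subset of datasets.
    Plain summation is an illustrative assumption, not the tool's method.
    """
    keys = list(scores.keys()) if use is None else list(use)
    return sum(scores[k] for k in keys)


# Hypothetical scores for a single candidate site.
site = {"pwm": 4.0, "conservation": 1.5, "tss_proximity": 0.5, "atac": 2.0}

all_data = combined_score(site)                      # every dataset
subset = combined_score(site, use=["pwm", "atac"])   # a chosen combination
print(all_data, subset)
```

Restricting the sum with `use` mirrors the idea of evaluating different combinations of data types, which is how a best-performing combination could be searched for under these assumptions.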

TFBSFootprinter analyses are based on the Ensembl transcript ID for simplicity of use and require minimal setup steps. When benchmarked on a manually curated and experimentally verified dataset of TFBSs, TFBSFootprinter using all multiomic data produced superior results (average area under the receiver operating characteristic curve, 0.881) compared with DeepBind (0.798), FIMO (0.817), and a traditional PWM (0.854). The results were further improved by selecting the best overall combination of multiomic data (0.910). Additionally, we determined the combinations of multiomic data that provide the best model of binding for each TF. TFBSFootprinter is available as Conda and Python packages.
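For readers unfamiliar with the benchmark metric, the following minimal sketch computes the area under the receiver operating characteristic curve (AUROC) as the fraction of positive/negative pairs in which the true site scores higher, counting ties as half. The labels and scores are invented for illustration and are unrelated to the benchmark dataset; the function name `auroc` is hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: 1 for a true binding site, 0 for a non-site.
    scores: predicted score for each candidate, higher means more site-like.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs where the positive outranks the negative; ties count 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: three of the four positive/negative pairs are ordered correctly.
example = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(example)  # 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported averages.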
