PromoterPredict: sequence-based modelling of Escherichia coli σ⁷⁰ promoter strength yields logarithmic dependence between promoter strength and sequence

Ramit Bharanikumar
Keshav Aditya R. Premkumar
Ashok Palaniappan

Listed in

Evaluated articles (PeerJ)

Abstract

We present PromoterPredict, a dynamic multiple regression approach to predict the strength of Escherichia coli promoters binding the σ⁷⁰ factor of RNA polymerase. σ⁷⁰ promoters are ubiquitously used in recombinant DNA technology, but characterizing their strength is demanding in terms of both time and money. We parsed a comprehensive database of bacterial promoters for the −35 and −10 hexamer regions of σ⁷⁰-binding promoters and used these sequences to construct the respective position weight matrices (PWMs). Next, we used a well-characterized set of promoters to train a multivariate linear regression model and learn the mapping between the PWM scores of the −35 and −10 hexamers and the promoter strength. We found that the log of the promoter strength has a significant linear association with a weighted sum of the −10 and −35 sequence profile scores. We applied our model to 100 sets of 100 randomly generated promoter sequences to generate a sampling distribution of the mean strength of random promoter sequences and obtained a mean of 6E-4 ± 1E-7. Our model was further validated by cross-validation and on independent datasets of characterized promoters. PromoterPredict accepts −10 and −35 hexamer sequences and returns the predicted promoter strength. It is capable of dynamic learning from user-supplied data to refine the model construction and yield more robust estimates of promoter strength. PromoterPredict is available as both a web service (https://promoterpredict.com) and a standalone tool (https://github.com/PromoterPredict). Our work presents an intuitive generalization applicable to modelling the strength of other promoter classes.
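The pipeline described in the abstract (build PWMs for the −35 and −10 hexamers, score each promoter, then regress the log of the strength on the two scores) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The pseudocount, the uniform background, the use of the natural log, the toy hexamer sets, and the coefficients in `TRUE_COEFS` are all assumptions chosen for illustration, and the strengths are synthetic values generated from those coefficients rather than measured data.

```python
import math

BASES = "ACGT"

def build_pwm(hexamers, pseudocount=1.0, background=0.25):
    """Log-odds PWM from aligned, equal-length sequences (assumed scheme)."""
    pwm = []
    for pos in range(len(hexamers[0])):
        counts = {b: pseudocount for b in BASES}
        for seq in hexamers:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def pwm_score(pwm, seq):
    """Sum of the per-position log-odds for one sequence."""
    return sum(column[base] for column, base in zip(pwm, seq))

def solve_linear(a, b):
    """Solve a small dense system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_log_linear(scores35, scores10, strengths):
    """Least-squares fit of ln(strength) = b0 + b1*s35 + b2*s10."""
    y = [math.log(s) for s in strengths]
    rows = [(1.0, s35, s10) for s35, s10 in zip(scores35, scores10)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)

def predict_strength(coefs, s35, s10):
    """Map the two PWM scores back to a strength on the original scale."""
    b0, b1, b2 = coefs
    return math.exp(b0 + b1 * s35 + b2 * s10)

# Toy hexamer sets near the well-known consensus motifs (illustrative only).
pwm35 = build_pwm(["TTGACA", "TTGACT", "TTTACA", "CTGACA", "TTGCCA"])
pwm10 = build_pwm(["TATAAT", "TATACT", "TAAAAT", "GATAAT", "TATGAT"])

# Hypothetical promoters given as (-35 hexamer, -10 hexamer) pairs.
promoters = [("TTGACA", "TATAAT"), ("TTGACT", "TATAAT"), ("TTGACA", "GATAAT"),
             ("CTGACT", "TATACT"), ("TTTACA", "GATGAT")]
scores35 = [pwm_score(pwm35, p35) for p35, _ in promoters]
scores10 = [pwm_score(pwm10, p10) for _, p10 in promoters]

# Synthetic strengths from assumed coefficients, so the fit can be checked.
TRUE_COEFS = (-8.0, 0.4, 0.5)
strengths = [predict_strength(TRUE_COEFS, a, b) for a, b in zip(scores35, scores10)]

fitted = fit_log_linear(scores35, scores10, strengths)
print("fitted coefficients:", fitted)
```

Because the synthetic strengths are noise-free, the fit should recover the assumed coefficients to within floating-point error. With real measurements the residuals would be non-zero, and the cross-validation mentioned in the abstract would be the way to judge how well such a model generalizes.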

PeerJ
Nov 7, 2018

Read the original source
Version published to 10.7717/peerj.5862
Nov 7, 2018
Version published to 10.1101/287607 on bioRxiv
Mar 23, 2018

Related articles

DNABERT2-CAMP: A Hybrid Transformer-CNN Model for E. coli Promoter Recognition

This article has 4 authors:
1. Hua-Lin Xu
2. Xiu-Jun Gong
3. Hua Yu
4. Ying-Kai Wang
This article has no evaluations. Latest version: Dec 28, 2025

Protein Language Models Rescue Variant Pathogenicity Prediction in Intrinsically Disordered Regions Through Synergistic Integration with Structure-Based Methods

This article has 1 author:
1. Hayden Farquhar
This article has no evaluations. Latest version: Feb 4, 2026

High-resolution binding data of TFIID and cofactors show promoter-specific differences in vivo

This article has 4 authors:
1. Julia Zeitlinger
2. Sergio García-Moreno Alcántara
3. Simon Bourdareau
4. Melanie Weilert
This article has no evaluations. Latest version: Jan 30, 2026
