Unintended Creation or Insertion of Antisense Promoter Motifs During Codon Optimization: A Cyber-Biosecurity Risk
Abstract
Codon optimization is a cornerstone technique in synthetic biology and biotechnological production, aimed at enhancing heterologous protein expression through synonymous codon substitutions. While optimization traditionally focuses on forward-strand translation efficiency, its impact on the complementary DNA strand is not always examined carefully. In this study, we investigate whether codon optimization inadvertently introduces antisense motifs, specifically bacterial antisense promoter motifs (e.g., “TATAAT”), and whether such motifs can be deliberately and silently inserted into coding sequences without altering the encoded protein. We developed a computational pipeline that (i) scans optimized sequences for antisense motifs, whether naturally occurring or unintentionally introduced during synthetic design; (ii) implements a silent insertion algorithm that preserves the amino acid sequence; and (iii) evaluates insertion feasibility across a large genomic dataset. The same components can be used to screen synthetic sequences before they are ordered or synthesized, which has the potential to save considerable time and money otherwise spent on wet-lab experiments that fail for predictable, sequence-level reasons. The software we developed is released as open source for academic and industrial use. In a dataset of 484,741 protein-coding sequences, only 4.8% naturally contained the motif, yet 77.28% of motif-free sequences permitted silent insertions. We extend these findings with codon bias analysis, derive analytical bounds for insertion complexity, and propose computational defense strategies. These results uncover a novel cyber-biosecurity vulnerability in DNA design pipelines and emphasize the need for bi-directional screening in codon optimization tools.
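To make pipeline steps (i) and (ii) concrete, the following is a minimal sketch in Python. It is not the authors' released software. It assumes the standard genetic code, an exact match to the single motif "TATAAT", and a brute-force search over synonymous codons. The function names (`antisense_hits`, `silent_insert`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Map each amino acid (or stop, "*") to all codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons in frame 0; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def antisense_hits(seq, motif="TATAAT"):
    """Forward-strand start positions where the antisense strand carries the motif.

    A motif on the antisense strand appears on the forward strand as its
    reverse complement, so only the forward string needs to be searched.
    """
    target = reverse_complement(motif)
    n = len(target)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == target]


def silent_insert(seq, motif="TATAAT"):
    """Return a synonymous variant of seq with the antisense motif, or None.

    Illustrative brute force: for each window position, try every combination
    of synonymous codons covering that window and keep the first match.
    """
    target = reverse_complement(motif)
    n = len(target)
    for start in range(len(seq) - n + 1):
        first, last = start // 3, (start + n - 1) // 3
        if 3 * (last + 1) > len(seq):
            break  # window would run into an incomplete trailing codon
        codons = [seq[3 * c:3 * c + 3] for c in range(first, last + 1)]
        options = [SYNONYMS[CODON_TABLE[c]] for c in codons]
        offset = start - 3 * first
        for combo in product(*options):
            window = "".join(combo)
            if window[offset:offset + n] == target:
                return seq[:3 * first] + window + seq[3 * (last + 1):]
    return None
```

For example, the toy coding sequence `ATGATCATCTAA` (Met-Ile-Ile-stop) contains no antisense hit, but `silent_insert` can rewrite its two isoleucine codons so that the reverse strand reads "TATAAT" while `translate` returns the same protein. A sequence built only from single-codon amino acids, such as `ATGTGGTGG`, admits no such insertion and yields `None`. A bi-directional screen of the kind the abstract calls for would run a check like `antisense_hits` on every optimized sequence before synthesis.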