SamplingDesign: RNA Design via Continuous Optimization with Coupled Variables and Monte-Carlo Sampling
Abstract
RNA design aims to find an RNA sequence that can fold into a given target structure. This enables the creation of artificial RNA molecules with specific functions and has numerous applications in medicine. Computationally, RNA design is particularly challenging due to two levels of combinatorial explosion: the exponentially large design space and the exponentially many competing structures for each design. As a result, heuristic methods such as local search have been popular for this task, but they cannot keep up with the combinatorial explosion. We instead borrow two techniques from machine learning, continuous relaxation and Monte-Carlo sampling, and apply them to the RNA design problem. We formulate RNA design as continuous optimization: we start with a distribution over all valid candidate sequences and use gradient descent to improve the expectation of an arbitrary objective function. We define novel sequence distributions using coupled variables to model the correlation between nucleotides. To make the approach applicable to any objective function, we use sampling to approximate the expected objective function, to estimate the gradient, and to select the final candidate. Our method consistently outperforms state-of-the-art methods in key metrics such as Boltzmann probability, ensemble defect, and energy gap, especially on long and hard-to-design puzzles in the Eterna100 benchmark. Our code is at http://github.com/weiyutang1010/ncrna_design.
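The overall loop (a distribution over valid sequences, sampled estimates of the expected objective and its gradient, gradient descent, and a sampled final candidate) can be illustrated with a small sketch. This is not the authors' implementation. It assumes one categorical variable per unpaired position over A/C/G/U and one coupled categorical per base pair over the six valid pairs. It also swaps in a score-function (REINFORCE-style) gradient estimator with a mean-cost baseline and uses a hypothetical `toy_objective` in place of a real folding-based cost such as negative log Boltzmann probability. All helper names (`parse_pairs`, `sample`, `optimize`, `toy_objective`) are illustrative.

```python
import math
import random

NUCS = "ACGU"
# Coupled variable per base pair: one categorical over the six valid pairs,
# so every sampled sequence is guaranteed to be able to form the target pairs.
PAIRS = ["CG", "GC", "AU", "UA", "GU", "UG"]


def parse_pairs(structure):
    """Split a dot-bracket string into base pairs and unpaired positions."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    paired = {k for pair in pairs for k in pair}
    unpaired = [i for i in range(len(structure)) if i not in paired]
    return pairs, unpaired


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_objective(seq, pairs, unpaired):
    """Hypothetical stand-in for a real folding-based cost (lower is better)."""
    pair_cost = {"CG": 0.0, "GC": 0.0, "AU": 1.0, "UA": 1.0, "GU": 2.0, "UG": 2.0}
    cost = sum(pair_cost[seq[i] + seq[j]] for i, j in pairs)
    return cost + sum(0.0 if seq[i] == "A" else 1.0 for i in unpaired)


def sample(n, pairs, unpaired, probs, rng):
    """Draw one valid sequence and record the category chosen per variable."""
    seq, choices = [""] * n, {}
    for i in unpaired:
        k = rng.choices(range(4), weights=probs[("u", i)])[0]
        seq[i], choices[("u", i)] = NUCS[k], k
    for i, j in pairs:
        k = rng.choices(range(6), weights=probs[("p", i, j)])[0]
        seq[i], seq[j] = PAIRS[k]
        choices[("p", i, j)] = k
    return "".join(seq), choices


def optimize(structure, objective, steps=200, batch=32, lr=0.5, seed=0):
    rng = random.Random(seed)
    pairs, unpaired = parse_pairs(structure)
    # All-zero logits = uniform distribution over every valid sequence.
    theta = {("u", i): [0.0] * 4 for i in unpaired}
    theta.update({("p", i, j): [0.0] * 6 for i, j in pairs})
    best_seq, best_cost = None, math.inf
    for _ in range(steps):
        probs = {key: softmax(v) for key, v in theta.items()}
        draws = [sample(len(structure), pairs, unpaired, probs, rng)
                 for _ in range(batch)]
        costs = [objective(s, pairs, unpaired) for s, _ in draws]
        baseline = sum(costs) / batch  # mean-cost baseline reduces variance
        grads = {key: [0.0] * len(v) for key, v in theta.items()}
        for (s, choices), c in zip(draws, costs):
            if c < best_cost:  # final candidate: best sample seen so far
                best_seq, best_cost = s, c
            for key, k in choices.items():
                for m, p in enumerate(probs[key]):
                    # Score function: d log p(k) / d theta_m = [m == k] - p_m.
                    indicator = 1.0 if m == k else 0.0
                    grads[key][m] += (c - baseline) * (indicator - p) / batch
        for key in theta:  # gradient descent on the sampled expected cost
            theta[key] = [t - lr * g for t, g in zip(theta[key], grads[key])]
    return best_seq, best_cost, theta
```

For example, `optimize("((..))", toy_objective)` returns the lowest-cost sequence sampled during optimization, its cost, and the learned logits. Because the objective is only ever evaluated on samples, `toy_objective` could in principle be replaced by any black-box scoring function without changing the optimizer.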