SEEDS: Simulating Emergence of Errors in DNA Storage

Abstract

Background: DNA storage is a nonvolatile memory technology that stores data as synthetic DNA strings and offers unprecedented storage density and durability. Yet the application of DNA as a practical digital information storage medium remains elusive: it is extremely expensive, and encoding data to and decoding data from synthetic DNA takes a substantial amount of time. More importantly, various phases of the DNA storage pipeline (e.g., synthesis and sequencing) are error-prone. Furthermore, DNA decays over time, and the reliability of synthetic DNA depends on various factors, including preservation medium and temperature. Advanced error protection schemes are therefore necessary to allow perfect storage and recovery of information and thereby make DNA storage competitive with existing flash- or tape-based technologies. However, evaluating and comparing DNA storage technologies and error-correcting codes under realistic conditions, spanning a wide array of synthesis media, sequencing technologies, temperatures and storage durations, is prohibitively time-consuming and expensive. Results: In this study, we present SEEDS, an error-model-based simulator that mimics the accumulation of errors at different phases of DNA storage. SEEDS is the first known simulator that incorporates various empirically derived statistical error models, mimicking the generation and propagation of different types of errors at various phases of DNA storage. Its validity was assessed against data from a number of published wet-lab experiments. Conclusions: SEEDS is easy to use and offers flexible and comprehensive parameter settings to mimic the error models in DNA storage. Validation against in vitro experimental results suggests its promise for emulating the stochastic models of error generation and propagation in DNA storage.
SEEDS is available as a web interface with a server side application, along with portable cross-platform native applications (available at givethelink).
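
To make the idea of phase-wise error accumulation concrete, the following is a minimal Python sketch, not the SEEDS implementation. It assumes a simple independent per-base model with substitution, insertion and deletion rates for each phase. The function names (`apply_errors`, `simulate_pipeline`), the phase names and all rate values are hypothetical placeholders chosen for illustration, not parameters taken from the article.

```python
import random

BASES = "ACGT"


def apply_errors(seq, sub_rate, ins_rate, del_rate, rng):
    """Apply independent per-base errors to a DNA string (illustrative model)."""
    out = []
    for base in seq:
        r = rng.random()
        if r < del_rate:
            # Deletion: drop this base entirely.
            continue
        if r < del_rate + sub_rate:
            # Substitution: replace with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < ins_rate:
            # Insertion: add a random base after the current position.
            out.append(rng.choice(BASES))
    return "".join(out)


def simulate_pipeline(seq, phases, seed=0):
    """Pass a sequence through each phase in order, so errors accumulate."""
    rng = random.Random(seed)  # seeded for reproducible runs
    for _name, rates in phases:
        seq = apply_errors(seq, rng=rng, **rates)
    return seq


# Hypothetical phases and rates, for illustration only.
EXAMPLE_PHASES = [
    ("synthesis", {"sub_rate": 0.002, "ins_rate": 0.001, "del_rate": 0.003}),
    ("storage", {"sub_rate": 0.001, "ins_rate": 0.0, "del_rate": 0.0}),
    ("sequencing", {"sub_rate": 0.005, "ins_rate": 0.001, "del_rate": 0.001}),
]

if __name__ == "__main__":
    original = "ACGT" * 25
    noisy = simulate_pipeline(original, EXAMPLE_PHASES, seed=42)
    print(len(original), len(noisy))
```

Because each phase consumes the output of the previous one, errors introduced early (for example during synthesis) are carried into later phases. A realistic simulator would replace the fixed rates with empirically derived models that depend on factors such as preservation medium, temperature and duration.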
