S3Mirror: Making Genomic Data Transfers Fast, Reliable, and Observable with DBOS
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
To meet the needs of a large pharmaceutical organization, we set out to create S3Mirror - an application for transferring large genomic sequencing datasets between S3 buckets quickly, reliably, and observably. We used the DBOS-Transact durable execution framework to achieve these goals and benchmarked the performance and cost of the application. S3Mirror is an open source DBOS Python application that can run in a variety of environments, including DBOS Cloud Pro where it runs as much as 40x faster than AWS DataSync at a fraction of the cost. Moreover, S3Mirror is resilient to failures and allows for real-time file-wise observability of ongoing and past transfers.