Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection

Tazro Ohta
Tomoya Tanjo
Osamu Ogasawara

This article has been reviewed by the following groups

Listed in

Evaluated articles (GigaScience)

Abstract

Background

Container virtualization technologies such as Docker are popular in the bioinformatics domain because they improve the portability and reproducibility of software deployment. Along with software packaged in containers, the standardized workflow description language Common Workflow Language (CWL) enables data to be easily analyzed in multiple computing environments. These technologies accelerate the use of on-demand cloud computing platforms, which can be scaled according to the quantity of data. However, to work within the time and budget constraints of cloud usage, users must select an instance type that matches the resource requirements of their workflows.

Results

We developed CWL-metrics, a utility tool for cwltool (the reference implementation of CWL), to collect runtime metrics of Docker containers and workflow metadata for analyzing workflow resource requirements. To demonstrate the use of this tool, we analyzed 7 transcriptome quantification workflows on 6 instance types. The results revealed that choosing an instance type that provides the required amount of computational resources can deliver lower financial costs and faster execution times.
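
Turning a stream of container runtime metrics into resource requirements means reducing many samples to a few figures per workflow step. The sketch below shows one way to do that, assuming each sample is a (timestamp, CPU percent, memory bytes) record for a single container. The Sample and StepSummary classes and the summarize function are hypothetical illustrations, not CWL-metrics' actual data model or implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Sample:
    """One hypothetical runtime-metrics sample taken from a running container."""
    timestamp_s: float  # seconds since an arbitrary start point
    cpu_percent: float  # CPU usage; 100.0 means one fully busy core
    memory_bytes: int   # memory in use at the moment of sampling


@dataclass(frozen=True)
class StepSummary:
    """Resource figures for one workflow step, derived from its samples."""
    elapsed_s: float
    peak_memory_bytes: int
    mean_cpu_percent: float
    peak_cpu_percent: float


def summarize(samples: Iterable[Sample]) -> StepSummary:
    """Collapse a time series of samples into peak and mean usage.

    Elapsed time is approximated by the span between the first and last
    samples, so it can be short by up to one sampling interval. The mean
    is unweighted, which assumes evenly spaced samples.
    """
    ordered = sorted(samples, key=lambda s: s.timestamp_s)
    if not ordered:
        raise ValueError("no samples were collected for this step")
    cpu = [s.cpu_percent for s in ordered]
    return StepSummary(
        elapsed_s=ordered[-1].timestamp_s - ordered[0].timestamp_s,
        peak_memory_bytes=max(s.memory_bytes for s in ordered),
        mean_cpu_percent=sum(cpu) / len(cpu),
        peak_cpu_percent=max(cpu),
    )


if __name__ == "__main__":
    # Made-up values for illustration only; not measurements from the study.
    demo = [
        Sample(0.0, 95.0, 1_000_000_000),
        Sample(10.0, 390.0, 3_500_000_000),
        Sample(20.0, 210.0, 2_000_000_000),
    ]
    print(summarize(demo))
```

Peak memory is kept rather than the mean because an instance must hold the largest footprint a step reaches, while mean CPU helps show whether the allocated cores were actually busy.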

Conclusions

CWL-metrics can generate a summary of resource requirements for workflow executions, which can help users to optimize their use of cloud computing by selecting appropriate instances. The runtime metrics data generated by CWL-metrics can also help users to share workflows between different workflow management frameworks.
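
How a resource summary can guide instance selection is illustrated by the minimal sketch below. It assumes a peak CPU and peak memory figure for a workflow, plus a measured run time on each candidate instance type, and picks the cheapest type that fits. The InstanceType class, the pick_instance function, the instance names, and the prices are hypothetical placeholders, not CWL-metrics output or any cloud provider's real catalog; per-second billing at the hourly rate is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class InstanceType:
    """A hypothetical cloud instance offering; names and prices are placeholders."""
    name: str
    vcpus: int
    memory_bytes: int
    usd_per_hour: float


def required_cores(peak_cpu_percent: float) -> int:
    """Whole cores needed to cover an observed CPU peak (100% == one core)."""
    return max(1, math.ceil(peak_cpu_percent / 100.0))


def estimated_cost(instance: InstanceType, elapsed_s: float) -> float:
    """Run cost assuming per-second billing at the hourly rate (an assumption)."""
    return instance.usd_per_hour * elapsed_s / 3600.0


def pick_instance(
    catalog: Iterable[InstanceType],
    peak_cpu_percent: float,
    peak_memory_bytes: int,
    elapsed_s_by_name: Dict[str, float],
) -> Optional[InstanceType]:
    """Cheapest instance that fits the observed peaks and has a measured run time.

    Ties on cost are broken in favour of the faster run. Returns None when
    no measured instance type satisfies the requirements.
    """
    cores = required_cores(peak_cpu_percent)
    feasible = [
        inst for inst in catalog
        if inst.name in elapsed_s_by_name
        and inst.vcpus >= cores
        and inst.memory_bytes >= peak_memory_bytes
    ]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda inst: (
            estimated_cost(inst, elapsed_s_by_name[inst.name]),
            elapsed_s_by_name[inst.name],
        ),
    )


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Placeholder catalog and run times, invented for illustration.
    catalog = [
        InstanceType("type-small", 2, 4 * GIB, 0.10),
        InstanceType("type-medium", 4, 16 * GIB, 0.20),
        InstanceType("type-large", 8, 32 * GIB, 0.40),
    ]
    elapsed = {"type-small": 7200.0, "type-medium": 3000.0, "type-large": 2000.0}
    choice = pick_instance(catalog, 390.0, 3_500_000_000, elapsed)
    print(choice.name if choice else "no instance type fits")  # prints: type-medium
```

Ranking by total run cost rather than hourly price matters: with these made-up numbers, even a light workload that fits on type-small costs more there (0.20 USD) than on type-medium (about 0.17 USD) because it runs longer.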

GigaScience
Jan 23, 2022

Now published in GigaScience doi: 10.1093/gigascience/giz052

Tazro Ohta 1, Tomoya Tanjo 2, Osamu Ogasawara 3

1 Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Yata 1111, Mishima, Shizuoka 411-8540, Japan
2 National Institute of Informatics, Research Organization of Information and Systems, Tokyo 101–8430, Japan
3 DNA Data Bank of Japan, National Institute of Genetics, Research Organization of Information and Systems, Yata, Mishima 411-8540, Japan

A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giz052 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

These peer reviews were as follows:

Reviewer 1: http://dx.doi.org/10.5524/REVIEW.101638
Reviewer 2: http://dx.doi.org/10.5524/REVIEW.101639
Reviewer 3: http://dx.doi.org/10.5524/REVIEW.101640

Version published to 10.1093/gigascience/giz052 (Apr 1, 2019)
Version published to 10.1101/456756 on bioRxiv (Oct 30, 2018)
