High Throughput Reproducible Literate Phylogenetic Analysis

Abstract

We present a holistic approach, from a literate programming perspective, to framing and solving systems biology problems. In particular, given the large data sets required to answer questions about evolutionary histories, we focus on the generalization and workflow required on a typical high performance computing (HPC) cluster driven by a SLURM or PBS TORQUE queue. We demonstrate how to leverage multiple CLI tools, compiled for efficient and portable use on heterogeneous computational resources, and how to use R to generate literate, data-driven plots and analyses. HPC bottlenecks and installation barriers are also discussed, and mitigation strategies are developed. As a concrete example, we demonstrate the estimation of a phylogenetic tree, used to pose and answer questions about evolutionary lineages. In this manner, we elucidate a generalized approach to manipulating phylogenetic data that can be used for systems biology, including data validation, multiple sequence alignment, tree estimation through different models, and reproduction.
