High Throughput Reproducible Literate Phylogenetic Analysis
Abstract
We present a holistic approach, from a literate programming perspective, to framing and solving systems biology problems. In particular, given the large data sets required for answering questions relating to evolutionary histories, we focus on the generalization and workflow required on a typical SLURM or PBS TORQUE queue-driven high performance computing (HPC) cluster. We demonstrate how to leverage multiple CLI tools, compiled for efficient use in a portable manner on heterogeneous computational resources, and further demonstrate the use of R to generate literate, data-driven plots and analyses. HPC cluster bottlenecks and installation barriers are also discussed, and mitigation strategies are developed. As a concrete example, we demonstrate the estimation of a phylogenetic tree, used to pose and answer questions on evolutionary lineages. In this manner, a generalized approach that can be used for systems biology is elucidated for manipulating phylogenetic data, including its validation, multiple sequence alignment, tree estimation through different models, and reproduction.