kamino: proteome-wide variant calling for amino acid phylogenomics

Abstract

Amino acid-based phylogenetics usually relies on first clustering and aligning orthologous proteins. This approach is powerful but computationally demanding. Here, we present kamino, a reference-free and alignment-free method that builds amino acid phylogenomic alignments directly from proteomes. kamino adapts a local graph-based variant-calling algorithm to efficiently identify variable homologous positions among proteins and concatenate these polymorphic regions. Across diverse prokaryotic and eukaryotic datasets, we show that kamino generates good-quality alignments. Phylogenetic analyses reveal that kamino generally recovers signals broadly similar to those obtained from marker-based approaches, while being much faster. Its main limitations are reduced performance on deeply divergent prokaryotic datasets and substantial memory requirements for large eukaryotic datasets. kamino therefore provides a fast and simple approach for constructing phylogenomic amino acid alignments, complementing classical marker-based workflows. The program is implemented in Rust and is freely available at https://github.com/rderelle/kamino.
