Building alternative consensus trees and supertrees using k-means and Robinson and Foulds distance

Nadia Tahiri
Bernard Fichet
Vladimir Makarenkov

Abstract

Motivation

Each gene has its own evolutionary history, which can differ substantially from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree, which may display evolutionary patterns different from those of the species tree that accounts for the main patterns of vertical descent. However, traditional consensus tree and supertree inference methods output a single consensus tree or supertree.

Results

We present a new efficient method for inferring multiple alternative consensus trees and supertrees that best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some remarkable properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one cluster (for homogeneous data) or several clusters (for heterogeneous data). Moreover, we adapt the popular Caliński–Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. Exploiting the Euclidean property of the method's objective function makes it faster than existing tree clustering techniques, and thus better suited for analyzing large evolutionary datasets.
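
To illustrate why a Euclidean objective helps, here is a minimal Python sketch; it is not the authors' C++ implementation. It assumes that every tree is defined on the same set of taxa (the consensus case, not the supertree case) and is encoded as a set of canonical bipartition labels. The string labels, the function names, and the four toy trees are all hypothetical. Under that encoding the unnormalized Robinson and Foulds distance is the size of the symmetric difference of two split sets, which equals the squared Euclidean distance between the corresponding 0/1 indicator vectors. A cluster's k-means cost can therefore be computed from pairwise distances alone, without building a centroid tree. The Caliński–Harabasz function uses the textbook form of the index; the adaptation described in the article may differ in detail.

```python
from itertools import combinations


def rf_distance(splits_a, splits_b):
    """Unnormalized Robinson-Foulds distance: splits found in exactly one tree."""
    return len(splits_a ^ splits_b)


def cost_pairwise(trees):
    """Within-cluster cost: sum of pairwise RF distances divided by cluster size."""
    total = sum(rf_distance(a, b) for a, b in combinations(trees, 2))
    return total / len(trees)


def cost_centroid(trees):
    """Same cost via 0/1 split vectors: squared distances to the mean vector."""
    n = len(trees)
    cost = 0.0
    for split in set().union(*trees):
        freq = sum(split in t for t in trees) / n  # centroid coordinate
        cost += sum(((split in t) - freq) ** 2 for t in trees)
    return cost


def calinski_harabasz(clusters):
    """Textbook CH index, with SSW and SST taken from pairwise RF distances."""
    all_trees = [t for c in clusters for t in c]
    n, k = len(all_trees), len(clusters)
    ssw = sum(cost_pairwise(c) for c in clusters)
    ssb = cost_pairwise(all_trees) - ssw  # SST = SSW + SSB in Euclidean space
    return (ssb / (k - 1)) / (ssw / (n - k))


# Four hypothetical unrooted trees on taxa A-E, each given by its two
# non-trivial splits (labels assumed to be written in one canonical form).
t1 = frozenset({"AB|CDE", "ABC|DE"})
t2 = frozenset({"AB|CDE", "ABD|CE"})
t3 = frozenset({"AC|BDE", "ACD|BE"})
t4 = frozenset({"AC|BDE", "ABC|DE"})
trees = [t1, t2, t3, t4]

# The pairwise and centroid formulations give the same cost.
assert abs(cost_pairwise(trees) - cost_centroid(trees)) < 1e-9

print(calinski_harabasz([[t1, t2], [t3, t4]]))  # 2.5
print(calinski_harabasz([[t1, t4], [t2, t3]]))  # 1.0
```

In this toy example the partition {t1, t2}, {t3, t4} scores higher, so it would be preferred over the alternative two-cluster partition under this index.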

Availability and implementation

Our KMeansSuperTreeClustering program along with its C++ source code is available at: https://github.com/TahiriNadia/KMeansSuperTreeClustering.

Supplementary information

Supplementary data are available at Bioinformatics online.

Version published to 10.1093/bioinformatics/btac326
May 17, 2022
ScreenIT
Mar 27, 2021
SciScore for 10.1101/2021.03.24.436812:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your code.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.
Version published to 10.1101/2021.03.24.436812 on bioRxiv
Mar 25, 2021