Chlomito: a novel tool for precise elimination of organelle genome contamination in nuclear genome assemblies

This article has been Reviewed by the following groups

Abstract

Accurate genome assemblies are crucial for understanding biological evolution, mechanisms of disease, and biodiversity. However, contamination from organelle genomes in nuclear genome analyses often makes results inaccurate and unreliable. To address this issue, we developed a tool named Chlomito, which employs innovative algorithms to precisely identify and eliminate organelle genome contamination sequences from nuclear genome assemblies. Compared to conventional approaches, Chlomito can not only detect and eliminate organelle sequences but also effectively distinguish true organelle sequences from those transferred into the nucleus via horizontal gene transfer. To evaluate the accuracy of Chlomito, we conducted tests using sequencing data from plum and mango. The results confirmed that Chlomito can accurately detect contigs originating from the organelle genome, and the identified contigs covered most regions of the organelle reference genomes, demonstrating its efficiency and precision in comprehensively recognizing organelle genome sequences. Additionally, for user convenience, we packaged the method into a Docker image, which simplifies the data processing workflow. Overall, Chlomito provides a highly efficient and accurate method for identifying and removing contigs derived from organelle genomes in genome assembly data, thereby contributing to improved genome assembly quality and advancing research in genomics and evolutionary biology.
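The claim that the identified contigs "covered most regions of the organelle reference genomes" is usually quantified as breadth of coverage: merge the reference intervals hit by contig alignments and divide the merged length by the reference length. The following is a minimal sketch under stated assumptions, not Chlomito's implementation: it assumes alignments have already been reduced to 0-based, half-open (start, end) intervals on a single organelle reference, and the function name `covered_fraction` is hypothetical.

```python
from typing import List, Tuple

# 0-based, half-open [start, end) intervals on the organelle reference
Interval = Tuple[int, int]


def covered_fraction(hits: List[Interval], ref_len: int) -> float:
    """Hypothetical helper: fraction of the reference covered by at least
    one alignment interval (breadth of coverage)."""
    covered = 0
    cur_start, cur_end = None, None  # current merged block, if any
    for start, end in sorted(hits):
        start, end = max(0, start), min(ref_len, end)  # clip to the reference
        if start >= end:
            continue  # empty or entirely off-reference interval
        if cur_end is None or start > cur_end:
            # disjoint from the current block: bank it and start a new one
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_len


if __name__ == "__main__":
    print(covered_fraction([(0, 100), (50, 150), (300, 400)], 1000))  # 0.25
```

Here the intervals (0, 100), (50, 150) and (300, 400) on a 1,000 bp reference merge to 250 bp, giving 0.25. Organelle genomes are usually circular, so an alignment that wraps past the end of the reference would first need to be split into two intervals.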

Article activity feed

  1. … By combining these two metrics, we can significantly improve the accuracy of identifying and removing organelle genome sequences from genome assembly data

    I'm assuming the second metric relies on mapped reads. Did you consider using spanning reads as further evidence in your tool? If a read spans the junction between an organellar genome sequence and a nuclear genome sequence (perhaps with at least a k = 21 bp overlap, or potentially more), I think that would be evidence of an HGT event.

  2. Plum and Mango

    Would you be willing to provide details on the quality of these two genomes? How well characterized are their chloroplast and mitochondrial sequences (do they have gold-standard labels)?
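The spanning-read check proposed in the first comment could be sketched roughly as follows. This is a minimal illustration, not part of Chlomito: it assumes reads were aligned to the nuclear contig, that each alignment is reduced to one 0-based, half-open interval on the contig (for example from pysam's `AlignedSegment.reference_start` and `reference_end`), and that the coordinates of the organelle-like region on the contig are already known. Soft-clipped bases and split alignments are ignored, the function names are hypothetical, and k = 21 follows the reviewer's suggested minimum, applied here on each side of the junction.

```python
from typing import Iterable, Tuple

# 0-based, half-open [start, end) coordinates on the nuclear contig
Interval = Tuple[int, int]


def spans_junction(read: Interval, region: Interval, k: int = 21) -> bool:
    """Hypothetical helper: True if the read alignment covers at least k bp
    inside the organelle-like region AND at least k bp of nuclear flank on
    one side of it, i.e. it crosses a nuclear/organelle junction."""
    read_start, read_end = read
    region_start, region_end = region
    # bp of the read that fall inside the region (negative if disjoint)
    inside = min(read_end, region_end) - max(read_start, region_start)
    left_flank = region_start - read_start  # bp before the region (negative if none)
    right_flank = read_end - region_end     # bp after the region (negative if none)
    return inside >= k and (left_flank >= k or right_flank >= k)


def count_spanning(reads: Iterable[Interval], region: Interval, k: int = 21) -> int:
    """Hypothetical helper: number of reads supporting a junction."""
    return sum(spans_junction(r, region, k) for r in reads)


if __name__ == "__main__":
    region = (1000, 1500)
    reads = [(950, 1100), (1010, 1200), (990, 1100), (1450, 1600)]
    print(count_spanning(reads, region))  # 2
```

With the organelle-like region at (1000, 1500), a read aligned at (950, 1100) counts as spanning (50 bp of flank, 100 bp inside), whereas one at (990, 1100) does not, because only 10 bp fall in the flank. Requiring k bp on both sides is what separates junction evidence from reads that merely touch the edge of the region.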