Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements

Abstract

Background

The development of high-throughput sequencing technologies, and the resulting production of huge volumes of genomic data, has accelerated biological and medical research and discovery. The study of genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer.

Results

We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between 2 DNA sequences. This computational solution extracts the information content of the 2 sequences, exploiting a data compression technique to find rearrangements. We also present the Smash++ visualizer, a tool that generates an SVG (Scalable Vector Graphics) image showing the detected rearrangements along with their self- and relative complexity.
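
The general idea of locating shared regions through compression, rather than alignment, can be illustrated with a minimal Python sketch. This is not the Smash++ implementation: the fixed-order context model, the smoothing constant `alpha`, the window size, the threshold, and all function names are assumptions chosen for illustration only. A model is trained on one sequence, the cost in bits of encoding each base of the other sequence is estimated, and stretches with a low average cost are reported as candidate similar regions.

```python
import math
import random
from collections import defaultdict

def train_model(ref, k):
    """Count which base follows each length-k context in the reference."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(ref)):
        counts[ref[i - k:i]][ref[i]] += 1
    return counts

def information_profile(target, counts, k, alpha=1 / 16, alphabet="ACGT"):
    """Estimated bits needed to encode each target base given the reference model.

    Entry j of the result describes target position j + k.
    """
    profile = []
    for i in range(k, len(target)):
        ctx = counts.get(target[i - k:i], {})
        total = sum(ctx.values())
        # Additive smoothing so unseen events keep a nonzero probability.
        p = (ctx.get(target[i], 0) + alpha) / (total + alpha * len(alphabet))
        profile.append(-math.log2(p))
    return profile

def similar_regions(profile, window=16, threshold=1.0):
    """Half-open intervals of window start indices whose mean cost is below threshold."""
    regions, start = [], None
    last = len(profile) - window + 1
    for i in range(last):
        mean = sum(profile[i:i + window]) / window
        if mean < threshold and start is None:
            start = i
        elif mean >= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, last))
    return regions

def random_dna(rng, n):
    """Hypothetical helper: a random DNA string used only for this demonstration."""
    return "".join(rng.choice("ACGT") for _ in range(n))

# Synthetic demonstration: a block of the reference is relocated into the target.
rng = random.Random(0)
k = 8
ref = random_dna(rng, 2000)
target = random_dna(rng, 300) + ref[500:800] + random_dna(rng, 300)
profile = information_profile(target, train_model(ref, k), k)
print(similar_regions(profile))
```

On this synthetic input, the bases copied from the reference are cheap to encode, whereas the random flanks cost about 2 bits per base, so the low-cost interval marks the relocated block. A practical tool would additionally need to handle inverted repeats and to map each detected target region back to its position in the reference, neither of which this sketch attempts.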

Conclusions

Tested on several synthetic and real DNA sequences from bacteria, fungi, Aves, and Mammalia, the proposed tool was able to accurately find genomic rearrangements. The detected regions were in accordance with previous studies, which took alignment-based approaches or performed FISH (fluorescence in situ hybridization) analysis. The maximum peak memory usage among all experiments was ~1 GB, which makes Smash++ feasible to run on present-day standard computers.

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giaa048

    Morteza Hosseini (1), Diogo Pratas (1, 2), Burkhard Morgenstern (3, 4), Armando J. Pinho (1)

    Affiliations:
    (1) IEETA/DETI, University of Aveiro, 3810-193 Aveiro, Portugal
    (2) Department of Virology, University of Helsinki, 00100 Helsinki, Finland
    (3) Department of Bioinformatics, University of Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
    (4) Göttingen Center of Molecular Biosciences (GZMB), Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany

    For correspondence: Morteza Hosseini, seyedmorteza@ua.pt

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giaa048 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102233
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102234