RepeatFiller newly identifies megabases of aligning repetitive sequences and improves annotations of conserved non-exonic elements

Abstract

Background

Transposons and other repetitive sequences make up a large part of complex genomes. Repetitive sequences can be co-opted into a variety of functions and thus provide a source for evolutionary novelty. However, comprehensively detecting ancestral repeats that align between species is difficult because considering all repeat-overlapping seeds in alignment methods that rely on the seed-and-extend heuristic results in prohibitively high runtimes.

Results

Here, we show that ignoring repeat-overlapping alignment seeds when aligning entire genomes misses numerous alignments between repetitive elements. We present a tool, RepeatFiller, that improves genome alignments by incorporating previously undetected local alignments between repetitive sequences. By applying RepeatFiller to genome alignments between human and 20 other representative mammals, we uncover between 22 and 84 Mb of previously undetected alignments that mostly overlap transposable elements. We further show that the increased alignment coverage improves the annotation of conserved non-exonic elements, both by discovering numerous novel transposon-derived elements that evolve under constraint and by removing thousands of elements that are not under constraint in placental mammals.
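The effect of ignoring repeat-overlapping seeds can be illustrated with a toy example. The sketch below is not RepeatFiller's implementation and not the aligner used in the study. It is a minimal exact k-mer seed finder, with the hypothetical function name `find_seeds`, which assumes that repeats are soft-masked as lowercase bases. It only shows that skipping masked seeds leaves a shared repeat without any seed to extend from.

```python
def find_seeds(target, query, k=4, skip_masked=True):
    """Return (target_pos, query_pos) pairs of exact k-mer matches.

    Toy illustration only. Lowercase bases are assumed to mark
    soft-masked repeats. With skip_masked=True, any k-mer containing
    a lowercase base is ignored on both sequences.
    """
    def usable(kmer):
        # str.isupper() is True only if every cased character is uppercase
        return not skip_masked or kmer.isupper()

    # Index all usable k-mers of the target, compared case-insensitively
    index = {}
    for i in range(len(target) - k + 1):
        kmer = target[i:i + k]
        if usable(kmer):
            index.setdefault(kmer.upper(), []).append(i)

    # Look up every usable k-mer of the query in the target index
    seeds = []
    for j in range(len(query) - k + 1):
        kmer = query[j:j + k]
        if usable(kmer):
            for i in index.get(kmer.upper(), []):
                seeds.append((i, j))
    return seeds


# Both sequences share the masked repeat "acgtgcat" (made-up toy data)
target = "ACGTacgtgcatTTGA"
query = "GGacgtgcatCC"

masked_seeds = find_seeds(target, query, skip_masked=True)
all_seeds = find_seeds(target, query, skip_masked=False)
print(len(masked_seeds), len(all_seeds))
```

In this toy input, every seed that could anchor an alignment of the shared repeat overlaps masked bases, so the masked run finds none while the unmasked run does. Considering all such seeds genome-wide is what makes runtimes prohibitive. Restricting the unmasked search to short unaligned regions, as the abstract describes, avoids that cost.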

Conclusions

RepeatFiller contributes to comprehensively aligning repetitive genomic regions, which facilitates studying transposon co-option and genome evolution. Source code: https://github.com/hillerlab/GenomeAlignmentTools

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giz132

    Ekaterina Osipova (1,2,3), Nikolai Hecker (1,2,3), Michael Hiller (1,2,3)

    1 Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
    2 Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
    3 Center for Systems Biology Dresden, Germany

    For correspondence: hiller@mpi-cbg.de

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giz132), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.101980
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.101981