Detection and annotation of unique regions in mammalian genomes

Abstract

Long unique genomic regions have been reported to be highly enriched for developmental genes in mice and humans. In this paper, we identify unique genomic regions using an efficient method based on fast string matching. We quantify the resource consumption and accuracy of this method before applying it to the genomes of 18 mammals. We annotate their unique regions (URs) of at least 10 kb and find that they are strongly enriched for developmental genes across the board. We then investigate the subset of URs that lack annotations, which we call “anonymous.” The longest anonymous UR in the Tasmanian devil spans 83 kb and contains the gene encoding inositol polyphosphate-5-phosphatase A, which is an essential part of intracellular signaling. This discovery of an essential gene in a UR implies that URs might be given priority when annotating mammalian genomes. Our documented pipeline for annotating URs in any mammalian genome is available from the repository github.com/evolbioinf/auger; the additional data for this study are available from the dataverse at doi.org/10.17617/3.4IKQAG.
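
To make the notion of a unique region concrete, the following is a minimal sketch in Python. It is not the method used in the paper: it assumes a simplified stand-in in which a region counts as unique if every k-mer inside it occurs exactly once in the sequence, and it ignores the reverse strand. The function name `unique_regions` and the parameters `k` and `min_len` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def unique_regions(seq, k, min_len):
    """Return half-open (start, end) intervals covered by runs of
    consecutive k-mers that each occur exactly once in seq.

    Simplified illustration only: forward strand, exact matches,
    and k-mer counting instead of an index-based string matcher.
    """
    n = len(seq)
    if n < k:
        return []
    # Count every k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n - k + 1))
    regions = []
    start = None  # start of the current run of unique k-mers
    for i in range(n - k + 1):
        if counts[seq[i:i + k]] == 1:
            if start is None:
                start = i
        elif start is not None:
            # The last unique k-mer started at i - 1 and ends at i - 1 + k.
            end = i - 1 + k
            if end - start >= min_len:
                regions.append((start, end))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and n - start >= min_len:
        regions.append((start, n))
    return regions


if __name__ == "__main__":
    # "ACGT" occurs twice; everything else is unique at k = 4.
    print(unique_regions("ACGTACGTTTGCA", k=4, min_len=7))
```

In this toy setting, `min_len` plays the role of the 10 kb length threshold mentioned above. Counting k-mers in a dictionary would not scale to whole mammalian genomes, which is why an approach based on fast string matching is attractive in practice.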
