Combining DNA and protein alignments to improve genome annotation with LiftOn
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
As the number and variety of assembled genomes continue to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn’s protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies where they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honey bee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat or Drosophila melanogaster and D. erecta.
Article activity feed
-
Drosophila melanogaster [53] and D. erecta [54, 55], with 0.07 Dashing [45] similarity score and 0.08 Mash-distance [46].
Can you add a sentence putting these scores into context, such as the corresponding scores for human and chimpanzee?
-
We measured genomic distance using Dashing [45] and Mash [46].
Ah, OK, this type of information is what I was looking for earlier.
-
To demonstrate LiftOn’s effectiveness at mapping annotation between distinct but closely related species, we mapped human genes onto Pan troglodytes (chimpanzee). Finally, we illustrate that LiftOn works on more distantly related species by mapping annotation from Drosophila melanogaster to Drosophila erecta and from Mus musculus to Rattus norvegicus.
When reading this, I wondered whether you have guidelines for how distantly related two species can be for this method to still work. For outside readers, it might be useful to provide some metric of phylogenetic distance or DNA similarity within which the method is expected to work well.
-