GENESPACE tracks regions of interest and gene copy number variation across multiple genomes

Curation statements for this article:
  • Curated by eLife

Abstract

The development of multiple chromosome-scale reference genome sequences in many taxonomic groups has yielded a high-resolution view of the patterns and processes of molecular evolution. Nonetheless, leveraging information across multiple genomes remains a significant challenge in nearly all eukaryotic systems. These challenges range from studying the evolution of chromosome structure, to finding candidate genes for quantitative trait loci, to testing hypotheses about speciation and adaptation. Here, we present GENESPACE, which addresses these challenges by integrating conserved gene order and orthology to define the expected physical position of all genes across multiple genomes. We demonstrate this utility by dissecting presence–absence, copy-number, and structural variation at three levels of biological organization: spanning 300 million years of vertebrate sex chromosome evolution, across the diversity of the Poaceae (grass) plant family, and among 26 maize cultivars. The methods to build and visualize syntenic orthology in the GENESPACE R package offer a significant addition to existing gene family and synteny programs, especially in polyploid, outbred, and other complex genomes.

Article activity feed

  1. Evaluation Summary:

    This manuscript describing GENESPACE was found to be of high interest for the genomics community across many different fields. GENESPACE is a new and straightforward computational tool to include synteny information in the calculation of genome-wide sets of orthologs. This is very timely as more and more chromosome-scale assembled genomes are becoming available.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

  2. Reviewer #1 (Public Review):

    In this manuscript, John Lovell and colleagues introduce GENESPACE. GENESPACE is a computational tool that filters (gene sequence-based) ortholog annotations by genomic location, restricting orthologous relationships to syntenic regions. The syntenic regions can be selected according to the context of the study, for example to include or exclude homeologous regions. In addition, GENESPACE uses its ortholog annotation to define syntenic regions across the focal genomes so that broad-scale chromosomal events can be visualized in an evolutionary context. The manuscript then shows the application of GENESPACE in three different scenarios. The first analysis makes use of the broad-scale synteny annotation of GENESPACE to analyze the origin of vertebrate sex chromosomes. The second analysis explores synteny in grass genomes, and evaluates the possibility of finding PAV in these genomes given three previously defined QTL regions where a single parental allele induced the phenotypic variation. The third application deals with the assignment of paralogs within grass genomes introduced by the ancient Rho WGD. Using GENESPACE's feature to ignore the first (best) hits (orthologs), it is possible to assign WGD-induced paralogs. GENESPACE seems to be highly useful in practice, and I do not know of any other tool that would perform a similar task. I would envision broad application of GENESPACE, as it is agnostic to the species or species group as long as chromosome-level assemblies are available.

  3. Reviewer #2 (Public Review):

    The new tool GENESPACE implements a pipeline in R that combines two existing tools, OrthoFinder and MCScanX. OrthoFinder is a popular tool for finding certain groups of homologous genes within the sets of protein sequences of multiple species. It constructs gene trees as well as a species tree in order to distinguish orthologs from paralogs, and produces 'orthogroups'. OrthoFinder does not use the positions of the genes in the genome. The older MCScanX finds syntenic regions between multiple genomes. The GENESPACE pipeline calls OrthoFinder and MCScanX to identify orthogroups, using synteny to prevent gene pairs that are not syntenically matched from being placed in the same orthogroup.
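    The idea of constraining orthogroups by synteny can be illustrated with a toy sketch. This is not GENESPACE's implementation and does not call OrthoFinder or MCScanX; the orthogroups, gene positions, syntenic blocks, and all function names below are hypothetical, made up for illustration only. It keeps a cross-genome gene pair from an orthogroup only when both genes fall inside the two regions of the same syntenic block.

```python
from itertools import combinations

# Hypothetical orthogroups: ID -> list of (genome, gene) members.
orthogroups = {
    "OG1": [("A", "a1"), ("B", "b1"), ("B", "b9")],
    "OG2": [("A", "a2"), ("B", "b2")],
}

# Hypothetical positions: (genome, gene) -> (chromosome, gene-order rank).
positions = {
    ("A", "a1"): ("chr1", 10),
    ("A", "a2"): ("chr1", 11),
    ("B", "b1"): ("chr3", 50),
    ("B", "b2"): ("chr3", 51),
    ("B", "b9"): ("chr7", 5),
}

# Hypothetical syntenic blocks: pairs of regions given as
# (genome, chromosome, first rank, last rank).
blocks = [
    (("A", "chr1", 1, 100), ("B", "chr3", 40, 140)),
]


def in_region(gene, region):
    """Return True if the gene lies inside the region."""
    genome, chrom, start, end = region
    gene_chrom, gene_rank = positions[gene]
    return gene[0] == genome and gene_chrom == chrom and start <= gene_rank <= end


def is_syntenic(g1, g2):
    """Return True if the two genes sit in opposite regions of one block."""
    return any(
        (in_region(g1, r1) and in_region(g2, r2))
        or (in_region(g1, r2) and in_region(g2, r1))
        for r1, r2 in blocks
    )


def syntenic_pairs(groups):
    """Keep only cross-genome pairs supported by a syntenic block."""
    return {
        og: [
            (g1, g2)
            for g1, g2 in combinations(members, 2)
            if g1[0] != g2[0] and is_syntenic(g1, g2)
        ]
        for og, members in groups.items()
    }


result = syntenic_pairs(orthogroups)
for og, pairs in result.items():
    print(og, pairs)
```

    In this toy input, gene b9 shares an orthogroup with a1 but lies on a chromosome outside the block, so the a1/b9 pair is dropped while a1/b1 is retained.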

    The R package is relatively easy to install and run, the provided example runs smoothly, and it is straightforward to apply it to another annotated set of related genomes. The riparian plots give a good overview of large-scale rearrangements and look neat although they are generated automatically. The 'pangenome' table of orthologous genes provides copy number differences and can be used to start any downstream analysis of orthologous sets of genes, such as a search for positive selection or accelerated evolution.
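    As a rough illustration of how copy number differences could be read off such a table, the sketch below uses a hypothetical layout (one row per entry, one list of gene IDs per genome). The layout, the gene IDs, and the function names are assumptions for this example and do not reflect the actual GENESPACE output format.

```python
# Hypothetical pangenome-like table: entry ID -> genome -> list of gene IDs.
pangenome = {
    "pg1": {"A": ["a1"], "B": ["b1"], "C": ["c1"]},
    "pg2": {"A": ["a2"], "B": [], "C": ["c2", "c3"]},
    "pg3": {"A": ["a3"], "B": ["b3", "b4"], "C": ["c4"]},
}


def copy_numbers(row):
    """Count the genes placed in each genome for one table entry."""
    return {genome: len(genes) for genome, genes in row.items()}


def classify(row):
    """Label an entry by its copy-number pattern across genomes."""
    counts = set(copy_numbers(row).values())
    if 0 in counts:
        return "presence-absence"  # missing from at least one genome
    if len(counts) > 1:
        return "copy-number"  # present everywhere, but counts differ
    return "conserved"  # same copy number in every genome


summary = {entry: classify(row) for entry, row in pangenome.items()}
for entry, label in summary.items():
    print(entry, copy_numbers(pangenome[entry]), label)
```

    Entries labelled "conserved" with a single copy per genome would be the natural starting set for downstream analyses such as tests for positive selection.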

    The paper discusses several application cases of GENESPACE that are likely of great interest to the respective genomics communities. Unfortunately, though, it does not go into detail when describing the algorithm, and the method description that is given is not always clear.

    The plausibility of the results, and better performance than OrthoFinder and MCScanX on their respective tasks, are shown on polyploid and relatively closely related cotton genomes. However, a more comprehensive benchmark, in particular on data where synteny is less pronounced, was not done. It is therefore not clear up to what degree of synteny GENESPACE is better than OrthoFinder at inferring orthogroups.