The first Antechinus reference genome provides a resource for investigating the genetic basis of semelparity and age-related neuropathologies

Abstract

Antechinus are a genus of mouse-like marsupials that exhibit a rare reproductive strategy known as semelparity and also naturally develop age-related neuropathologies similar to those in humans. We provide the first annotated antechinus reference genome for the brown antechinus (Antechinus stuartii). The reference genome is 3.3 Gb in size with a scaffold N50 of 73 Mb and 93.3% complete mammalian BUSCOs. Using bioinformatic methods we assign scaffolds to chromosomes and identify 0.78 Mb of Y-chromosome scaffolds. Comparative genomics revealed interesting expansions in the NMRK2 gene and the protocadherin gamma family, which have previously been associated with aging and age-related dementias, respectively. Transcriptome data showed expression of common Alzheimer’s-related genes in the antechinus brain and highlight the potential of utilising the antechinus as a future disease model. The valuable genomic resources provided herein will enable future research to explore the genetic basis of semelparity and age-related processes in the antechinus.

Article activity feed

  1. Now published in Gigabyte doi: 10.46471/gigabyte.7

    This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    **Reviewer 1: Qiye Li** Since I am unable to access the data submitted to NCBI or GigaDB, I cannot judge this issue currently. Please make sure that the gene annotation, repeat annotation, transcriptome assembly, gene expression matrix, and genetic variant data have been uploaded somewhere in addition to the raw reads and genome assembly.

    While the bioinformatic tools used in all the steps are indicated clearly, the parameters for many tools are not defined.

    What is the gap ratio (i.e. % of unclosed gaps or Ns) of the genome assembly? As far as I know, the raw Supernova assembly may have a high proportion of gaps, although the scaffold N50 is pretty good. Additional gap-closing steps (e.g. using GapCloser, RRID:SCR_015026) would improve the completeness of the assembly.
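    As an illustration of the two metrics mentioned in this comment, the following is a minimal sketch of how a gap ratio (fraction of N bases) and a scaffold N50 could be computed from a FASTA file. It is not the authors' pipeline; the function names and the file name in the usage comment are assumptions made for this example.

```python
def read_fasta(path):
    """Yield one sequence string per FASTA record (headers are discarded)."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks)


def gap_ratio(sequences):
    """Fraction of all bases that are N (unclosed gaps), case-insensitive."""
    total = 0
    gaps = 0
    for seq in sequences:
        total += len(seq)
        gaps += seq.upper().count("N")
    return gaps / total if total else 0.0


def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Hypothetical usage; "assembly.fasta" is a placeholder file name:
# scaffolds = list(read_fasta("assembly.fasta"))
# print(gap_ratio(scaffolds), n50([len(s) for s in scaffolds]))
```

    Reporting both numbers side by side makes it easier to see whether a high scaffold N50 is achieved mainly by joining contigs across long runs of Ns.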

    BUSCO analysis is suitable for assessing the completeness of the protein-coding gene space of the genome assembly. But a good BUSCO score does not necessarily mean good assembly completeness. Another conventional way to demonstrate the completeness of the assembly is to show the metrics of DNA read mapping, such as the overall mapping rate, % in proper pair, % of covered bases, etc.
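    The read-mapping metrics listed here are simple ratios. The sketch below shows one way they could be summarised, assuming the read counts and a per-base depth list have already been obtained from an alignment tool; the function name, its parameters, and the example numbers are hypothetical and not taken from the manuscript.

```python
def mapping_summary(total_reads, mapped_reads, properly_paired_reads, per_base_depth):
    """Summarise read-mapping completeness metrics as percentages.

    per_base_depth is a sequence with one read-depth value per reference
    position; a position counts as covered when its depth is above zero.
    """
    covered = sum(1 for depth in per_base_depth if depth > 0)
    return {
        "mapping_rate_pct": 100 * mapped_reads / total_reads,
        "proper_pair_pct": 100 * properly_paired_reads / total_reads,
        "covered_bases_pct": 100 * covered / len(per_base_depth),
    }


# Toy numbers for illustration only, not results from the antechinus assembly.
summary = mapping_summary(
    total_reads=1000,
    mapped_reads=950,
    properly_paired_reads=900,
    per_base_depth=[0, 3, 5, 0, 2],
)
print(summary)
```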

    How complete is the gene set generated by the Fgenesh++ pipeline? I suggest that the authors provide a BUSCO score for the Fgenesh++ gene set, as they did for the transcriptome assembly.

    Methods related to Alzheimer’s Genes Analysis: The methods used to identify the Alzheimer’s disease (AD) related human genes in antechinus seem to be flawed, as the authors only performed unidirectional searches for homologs in the antechinus gene set. I think the authors should identify bona fide orthologs of these AD-related genes in antechinus. The conventional way to determine orthologs between two species is based on a reciprocal best hit (RBH) strategy (i.e. RBHs between the human and antechinus gene sets).
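    The reciprocal best hit strategy suggested here can be sketched as follows. This is a minimal illustration, assuming similarity searches have already been run in both directions and reduced to (query, subject, bitscore) tuples; the function names and the antechinus gene identifiers in the example are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits is an iterable of (query, subject, bitscore) tuples.
    On ties, the first hit encountered is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy example: human genes searched against made-up antechinus gene IDs.
human_vs_antechinus = [
    ("APP", "ant_g1", 900),
    ("APP", "ant_g2", 300),
    ("PSEN1", "ant_g3", 700),
    ("MAPT", "ant_g2", 400),
]
antechinus_vs_human = [
    ("ant_g1", "APP", 880),
    ("ant_g2", "APP", 310),
    ("ant_g3", "PSEN1", 690),
]
print(reciprocal_best_hits(human_vs_antechinus, antechinus_vs_human))
```

    In this toy data MAPT finds ant_g2 in the forward search, but ant_g2's best human hit is APP, so the pair is not reciprocal and is excluded. A unidirectional search would have reported it as a homolog.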

    **Reviewer 2: Walter Wolfsberger** PRJNA664282 accession number is not found on NCBI. Is it scheduled to be released with the publication?

    Appropriate tools were used for appropriate analyses. The Y chromosome identification approach seems sound.

    The bioinformatic approaches the authors used are sound, with the right tools and approaches to the analysis.

    The preprint is well worded and easy to understand and follow. It provides a good amount of context that justifies the extra analyses done in the publication. The assembly quality is adequate, with relatively low N50 but good completeness scores, given that mammalian genomes have higher levels of low-complexity/repetitive content. The metrics presented adhere to the scope of GigaByte, and the data itself is valuable to the scientific community.