De novo transcriptome assembly and genome annotation of the fat-tailed dunnart (Sminthopsis crassicaudata)

Curation statements for this article:
  • Curated by GigaByte

    Editor's Assessment: Marsupial species are invaluable for comparative studies due to their distinctive modes of reproduction and development, but there is a shortage of genomic resources for these types of analyses. To help address that data gap, multi-tissue transcriptomes and transcriptome assemblies have been sequenced and shared for the fat-tailed dunnart (Sminthopsis crassicaudata), a mouse-like marsupial that, due to its ease of breeding, is emerging as a useful lab model. Using ONT nanopore and PacBio long reads and Illumina short reads, 2,093,982 transcripts were sequenced and assembled, and functional annotation of the assembled transcripts was also carried out. Some additional work was required to provide more details on the QC metrics and access to the data, but this was resolved during review. This work ultimately produced a dunnart genome assembly measuring 3.23 Gb in length and organized into 1,848 scaffolds, with a scaffold N50 value of 72.64 Mb. These openly available resources will hopefully provide novel insights into the unique genomic architecture of this unusual species and valuable tools for future comparative mammalian studies.

    This evaluation refers to version 1 of the preprint

Abstract

Marsupials exhibit highly specialized patterns of reproduction and development, making them uniquely valuable for comparative genomics studies with their sister lineage, eutherian (also known as placental) mammals. However, marsupial genomic resources still lag far behind those of eutherian mammals, limiting our insight into mammalian diversity. Here, we present a series of novel genomic resources for the fat-tailed dunnart (Sminthopsis crassicaudata), a mouse-like marsupial that, due to its ease of husbandry and ex-utero development, is emerging as a laboratory model. To enable wider use, we have generated a multi-tissue de novo transcriptome assembly of dunnart RNA-seq reads spanning 12 tissues. This highly representative transcriptome comprises 2,093,982 assembled transcripts, with a mean transcript length of 830 bp. The transcriptome mammalian BUSCO completeness score of 93% is the highest among published marsupial transcriptomes. Additionally, we report an improved fat-tailed dunnart genome assembly which is 3.23 Gb long, organized into 1,848 scaffolds, with a scaffold N50 of 72.64 Mb. The genome annotation, supported by assembled transcripts and ab initio predictions, revealed 21,622 protein-coding genes. Altogether, these resources will contribute greatly towards characterizing marsupial biology and mammalian genome evolution.
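
For readers unfamiliar with the summary statistics quoted in the abstract, the following is a minimal sketch of how a scaffold N50 and a mean sequence length are conventionally computed from a list of sequence lengths. It is not the authors' pipeline; the function names (n50, mean_length) and the toy lengths are illustrative assumptions and are not taken from the dunnart assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length
    # until half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    return sum(lengths) / len(lengths)


# Hypothetical toy scaffold lengths (arbitrary units), for illustration only.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print("N50:", n50(toy_lengths))
print("mean length:", round(mean_length(toy_lengths), 1))
```

In the toy example the total length is 300, and the two longest sequences (80 and 70) already cover half of it, so the N50 is 70. Applied to the reported assembly, a scaffold N50 of 72.64 Mb means that scaffolds at least that long account for at least half of the 3.23 Gb.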

Article activity feed

  1. This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.118), and the reviews have been published under the same license. They are as follows.

    Reviewer 1: Qiye Li

    For the ONT, PacBio and Illumina data used for genome assembly, was any new data generated in this manuscript? Were all of the data collected from the same individual? If so, what is the sex of the individual used for genome assembly? It would be helpful to make this information clear to readers. Page 3: I think "Pacific Biosciences CRL" should be changed to "Pacific Biosciences CLR".

    Reviewer 2: Emma Peel

    Are all data available and do they match the descriptions in the paper?

    No. The figshare link doesn't work, but I presume this is because the paper hasn't been published. Will the data be accessioned in the GigaScience Database to ensure accessibility? The Illumina short-read genomic and RNA-seq datasets are available through NCBI and match the descriptions in the paper. I was unable to find the raw PacBio and ONT data from [68] that were used to generate the genome assembly. The authors of [68] indicate that these datasets are available in Supplementary Table 3, but the raw data are not at the figshare link given in that table, nor at any other location listed in the data availability section. Can the authors please clarify the location of the raw data and update the data availability section of this manuscript accordingly?

    Are the data and metadata consistent with relevant minimum information or reporting standards?

    Yes. Access to the GigaDB accession hasn't been provided, so I am unable to determine whether the data and metadata are consistent with minimum information reporting standards according to the GigaDB checklists.

    Is the data acquisition clear, complete and methodologically sound?

    Yes. Some minor clarifications are required; see comments in the PDF. For example, please include detail on how RNA quality was determined (e.g. RIN values) and provide more detail regarding the library preparation method, flow cell and instrument used for Illumina sequencing.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    Yes. The only detail lacking is the method of transcript quantification used to determine the top 90% most highly expressed transcripts.

    Is the validation suitable for this type of data?

    Yes. Data validation is suitable; however, I would like to see a comparison of the v1.1 genome assembly with other marsupial genome assemblies.

    Additional Comments:

    This study is an important addition to marsupial omics resources, and I was excited to see such a comprehensive set of transcriptomes. My main comment is the need to explain and discuss the initial assembly (v1) in the introduction to provide context for the improved assembly. See comments in the attached PDF.

    Annotated paper: https://gigabyte-review.rivervalleytechnologies.com/download-api-file?ZmlsZV9wYXRoPXVwbG9hZHMvZ3gvRFIvNDg3L2d4LURSLTE3MDE2Njk5NzdfRVAgKDIpLnBkZg==