High-speed whole-genome sequencing of a Whippet: Rapid chromosome-level assembly and annotation of an extremely fast dog’s genome
Curation statements for this article:-
Curated by GigaByte
Editors Assessment:
This Data Release paper presents the genome of the Whippet dog breed. It demonstrates a streamlined laboratory and bioinformatics workflow, based on PacBio HiFi long-read whole-genome sequencing, that enables the generation of a high-quality reference genome within one week. The study is a collaboration between an academic biodiversity institute and a medical diagnostics company, and its way of working and workflow provide examples that can be used for a wide range of future human and non-human genome projects. The final 2.47 Gbp assembly is of high quality, with a contig N50 of 55 Mbp and a scaffold N50 of 65.7 Mbp. The reference was scaffolded into 39 chromosome-length scaffolds, and the annotation resulted in 28,383 transcripts. The study also examined the Myostatin gene, which can be relevant for breeding purposes because animals heterozygous for a mutation in this gene can have an advantage in dog races. The reviewers asked the authors to clarify this part with additional results. Overall, this study demonstrates how rapidly animal genome research can be carried out through close collaboration and streamlined time management.
This evaluation refers to version 1 of the preprint
Abstract
The time required for genome sequencing and de novo assembly depends on the interaction between laboratory work, sequencing capacity, and the bioinformatics workflow, often constrained by external sequencing services. Bringing together academic biodiversity institutes and a medical diagnostics company with extensive sequencing capabilities, we aimed at generating a high-quality mammalian de novo genome in minimal time. We present the first chromosome-level genome assembly of the Whippet, using PacBio long-read high-fidelity sequencing and reference-guided scaffolding. The final assembly has a contig N50 of 55 Mbp and a scaffold N50 of 65.7 Mbp. The total assembly length is 2.47 Gbp, of which 2.43 Gbp were scaffolded into 39 chromosome-length scaffolds. Annotation using mammalian genomes and transcriptome data yielded 28,383 transcripts, 90.9% complete BUSCO genes, and identified 36.5% repeat content. Sequencing, assembling, and scaffolding the chromosome-level genome of the Whippet took less than a week, adding another high-quality reference genome to the available sequences of domestic dog breeds.
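For readers unfamiliar with the contiguity statistics quoted in the abstract, the following is a minimal sketch of how an N50 value and the scaffolded fraction can be computed. The function name `n50` and the toy contig lengths are hypothetical illustrations, not the authors' code or data; only the 2.43 Gbp and 2.47 Gbp figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (arbitrary units), NOT the Whippet assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150 is half of the total of 300

# Fraction of the assembly placed in chromosome-length scaffolds,
# using the two totals reported in the abstract (in Gbp).
scaffolded_fraction = 2.43 / 2.47
print(f"{scaffolded_fraction:.1%}")  # 98.4%
```

In practice these statistics are usually taken from an assembly evaluation tool rather than computed by hand; the sketch only illustrates the definition.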
Article activity feed
Abstract
Background: The time required for sequencing and de novo assembly of genomes is highly dependent on the interaction between laboratory work, sequencing capacity, and the bioinformatics workflow. As a result, genome projects are often not only limited by financial, computational and sequencing platform resources, but also delayed by second party sequencing service providers. By bringing together academic biodiversity institutes and a medical diagnostics company with extensive sequencing capabilities and know-how, we aimed at generating a high-quality mammalian de novo genome in the shortest possible time period. Therefore, we streamlined all processes involved and chose a very fast dog as a model: the Whippet.
Findings: We present the first chromosome-level genome assembly of the Whippet. We used PacBio long-read HiFi sequencing and reference-guided scaffolding to generate a high-quality genome assembly. The final assembly has a contig N50 of 55 Mbp and a scaffold N50 of 65.7 Mbp. The total assembly length is 2.47 Gbp, of which 2.43 Gbp were scaffolded into 39 chromosome-length scaffolds. In addition, we used available mammalian genomes and transcriptome data to annotate the genome assembly. The annotation resulted in 28,383 transcripts resembling a total of 90.9% complete BUSCO genes and identified a repeat content of 36.5%.
Conclusions: Sequencing, assembling, and scaffolding the chromosome-level genome of the Whippet took less than a week and adds a high-quality reference genome to the list of domestic dog breeds sequenced to date.
This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.134), and the reviews have been published under the same license. They are as follows.
Reviewer 1. Tianming Lan
The authors provide an example of a high-speed strategy for whole-genome sequencing, genome assembly, and annotation, using the Whippet dog as an example. This is a very novel idea in the genomic era, with plummeting sequencing costs and rapidly accumulating sequencing data but a shortage of computing resources. The authors also provide a very high-quality reference genome for the Whippet, with very good contiguity, accuracy, and completeness. However, I have several concerns that the authors should consider before the manuscript can be published in GigaByte.
Q1. There are too many keywords. Can the authors remove a few? Biodiversity conservation, comparative genomics, and evolutionary biology do not make sense for this manuscript.
Q2. The authors performed reference-guided scaffolding with the German Shepherd dog genome (GCA_011100685.1) as the reference. It would be better if the authors explained why they selected this genome, as there are several published dog genomes.
Q3. The section on heterozygosity makes no sense in this manuscript unless there is a reasonable connection with the other parts, because the dog is not a threatened species and the Whippet is not a very special breed facing extensive inbreeding and accumulation of deleterious mutations.
Q4. The section on Myostatin doesn’t make sense to me: I have read the paper the authors cited and found that not all Whippets have this mutation. They sequenced 22 individuals, of which 4 are homozygous (-/-), 5 are heterozygous (mh/+), and the rest are homozygous (+/+). So you can always obtain a result by checking this mutation, but it makes no sense. Furthermore, one individual can hardly represent a species or a population. At the beginning of this paragraph, please change “Since” to “Since”.
Q5. I think the most important finding in this manuscript is how the authors finished a high-quality genome within a very short working time. I suggest the authors remove the descriptions of heterozygosity and Myostatin, and instead add a paragraph telling readers the basic requirements or standards for such a short-term genome assembly project for a genome like that of the dog. This is just a suggestion, but I think it would improve the manuscript.
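The genotype counts quoted in Q4 can be turned into simple frequencies, which illustrates why a single sequenced animal says little about the breed. This is a rough sketch using only the counts as stated by the reviewer. It assumes that the 4 homozygous (-/-) dogs carry two copies of the mutation and the 5 heterozygous dogs carry one, and it makes no claim that this sample is representative of all Whippets.

```python
from fractions import Fraction

# Genotype counts as quoted by the reviewer (22 Whippets in the cited study).
total_dogs = 22
homozygous_mutant = 4   # assumption: two mutant copies each
heterozygous = 5        # one mutant copy each
homozygous_wild_type = total_dogs - homozygous_mutant - heterozygous

# Each dog carries two alleles, so the sample contains 2 * 22 = 44 alleles.
mutant_alleles = 2 * homozygous_mutant + heterozygous
allele_frequency = Fraction(mutant_alleles, 2 * total_dogs)

# Share of dogs in this sample that carry at least one mutant copy.
carrier_fraction = Fraction(homozygous_mutant + heterozygous, total_dogs)

print(homozygous_wild_type)              # 13
print(f"{float(allele_frequency):.3f}")  # 0.295
print(f"{float(carrier_fraction):.3f}")  # 0.409
```

Under these counts, most dogs in the sample (13 of 22) carry no copy of the mutation, so finding none in one individual would be the more likely outcome.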
Reviewer 2. Xiaobo Wang
This study outlines an approach to expedite the sequencing and de novo assembly of genomes by leveraging collaboration between academic biodiversity institutes and a medical diagnostics company with advanced sequencing capabilities. The primary focus was on generating a high-quality de novo genome of the Whippet, a fast dog breed, within an accelerated timeframe. Below are some specific comments I would like to highlight.
- The authors mentioned the use of QUAST and QualiMap software tools to assess the genome of the Whippet; however, the corresponding results were not presented in the manuscript.
- The authors' reliance solely on mammalian protein sequences for homology annotation means that unique genes specific to the Whippet remain unannotated. The discrepancy of approximately 7% between the completeness assessments of the gene set and the genome via BUSCO further underscores the incomplete nature of the gene set. To address this, I recommend integrating transcriptome data, at the very least, to incorporate de novo annotation results. This addition should enhance the comprehensiveness and accuracy of gene annotations for the Whippet genome.
- The authors claim the absence of reported mutations in the Mstn gene but have not provided corroborating evidence, such as read alignment results from the genomic region, to verify that this is not due to assembly errors.
- If feasible, I propose integrating second-generation sequencing to further polish the genome and elevate its quality.