Whole genome sequencing and assembly of the house sparrow, Passer domesticus

Curation statements for this article:
  • Curated by GigaByte

    Editors Assessment:

    This paper presents the genome sequencing of the house sparrow (Passer domesticus), carrying out genome assembly and annotation using in silico tools, and could be a valuable resource for understanding passerine evolution, biology, ethnology, geography, and demography. The final genome assembly was generated using short-read sequencing and a computational workflow that included Shovill, SPAdes, MaSuRCA, and BUSCO benchmarking, producing a 922 MB reference genome with 24,152 genes. The first draft was significantly smaller than this, but peer review provided suggestions on how to improve the assembly quality, and after a few attempts an assembly with a reasonable size and BUSCO score was achieved. This openly available data is potentially a valuable resource for studying adaptation, divergence, and speciation of birds.

    This evaluation refers to version 2 of the preprint

This article has been Reviewed by the following groups

Abstract

The common house sparrow, Passer domesticus, is a small bird belonging to the family Passeridae. Here, we provide high-quality whole-genome sequencing data along with its assembly for the house sparrow. The final genome assembly was generated using a workflow that included Shovill, SPAdes, MaSuRCA, and BUSCO. The assembly consists of contigs spanning 268,193 bases and coalescing around a 922 MB sized reference genome. We used rigorous statistical thresholds to check the coverage, as the Passer genome showed considerable similarity to the Gallus gallus (chicken) and Taeniopygia guttata (zebra finch) genomes, and we also provide functional annotations. This new annotated genome assembly will be a valuable resource for comparative and population genomic analyses of passerine, avian, and vertebrate evolution.

Article activity feed

  1. Editors Assessment:

    This paper presents the genome sequencing of the house sparrow (Passer domesticus), carrying out genome assembly and annotation using in silico tools, and could be a valuable resource for understanding passerine evolution, biology, ethnology, geography, and demography. The final genome assembly was generated using short-read sequencing and a computational workflow that included Shovill, SPAdes, MaSuRCA, and BUSCO benchmarking, producing a 922 MB reference genome with 24,152 genes. The first draft was significantly smaller than this, but peer review provided suggestions on how to improve the assembly quality, and after a few attempts an assembly with a reasonable size and BUSCO score was achieved. This openly available data is potentially a valuable resource for studying adaptation, divergence, and speciation of birds.

    This evaluation refers to version 2 of the preprint

  2. Abstract

    The common house sparrow, Passer domesticus, is a small bird belonging to the family Passeridae. Here, we provide high-quality whole genome sequence data along with an assembly for the house sparrow. The final genome assembly was produced using a Shovill/SPAdes/MaSuRCA/BUSCO workflow, consisting of contigs spanning 268193 bases and coalescing around a 922 MB sized reference genome. We employed rigorous statistical thresholds to check the coverage, as the Passer genome showed considerable similarity to the Gallus gallus (chicken) and Taeniopygia guttata (zebra finch) genomes, and we also provide a functional annotation. This new annotated genome assembly will be a valuable reference for comparative and population genomic analyses of passerine, avian, and vertebrate evolution.

    Significance

    Avian evolution has been of great interest in the context of extinction. Annotating genomes such as those of passerines would be of significant interest, as it would help us understand behavioral and foraging traits and further explore their evolutionary landscape. In this work, we provide a full genome sequence of the Indian house sparrow, Passer domesticus, which will serve as a useful resource for understanding adaptability, evolution, geography, Allee effects, and circadian rhythms.

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.161), and the reviews are published under the same license.

    Reviewer 1. Gang Wang

    Is the language of sufficient quality? Yes. Several details in the article need attention, such as citation format and spelling. For example, "[Supplementary Table 3a, 3b, 3c)" should read "(Supplementary Table 3a, 3b, 3c)". The citation format of the article also needs to be adjusted according to the journal requirements.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction? No. A previous reviewer mentioned that RagTag could be used to improve the quality of genome assembly. I suggest you seriously consider this.

    Is there sufficient information for others to reuse this dataset or integrate it with other data? No

    Overall Comments: The article is logically clear and the analysis is complete. The description of both sample collection and sequencing is relatively clear, and the analysis process shown in Figure 1 is very reasonable. However, as the previous reviewer noted, I suggest that you remove the "high-quality" description. Several details in the article need attention, such as citation format and spelling. For example, "[Supplementary Table 3a, 3b, 3c)" should read "(Supplementary Table 3a, 3b, 3c)". The citation format of the article also needs to be adjusted according to the journal requirements. In Figure 2, the lettering of panels a and b differs too much; please make it consistent. Figure 4 is completely unclear; please increase the font size. A previous reviewer mentioned that RagTag could be used to improve the quality of the genome assembly. I suggest you seriously consider this.

    Re-review: The authors used FCS-GX to exclude contaminating sequences from the genome, so I agree that this paper should be published.

    Reviewer 2. Agustin Ariel Baricalla

    Are all data available and do they match the descriptions in the paper? No. Matching data: the NCBI project, with access to the raw data deposited in NCBI-SRA. Non-matching data: Oxford Nanopore data. In their reply to a previously submitted version of the manuscript, the authors argued that these data were not used, but Fig. 1 refers to Nanopore MinION data. In addition, neither the manuscript body nor the additional data section includes the QUAST and BUSCO reports or their corresponding plots.

    Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples http://gigadb.org/site/guide No. GigaByte suggests a checklist including the genome, CDS, and proteins in FASTA format, as well as the annotations in GFF format; however, these items are not available for evaluation.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes. The FastP step for raw data processing is mentioned in the results section but is not detailed in the methods section.

    Is there sufficient data validation and statistical analyses of data quality? No. The authors have not included the BUSCO results. The OrthoDB database for 'passeriformes_odb12' contains over 10,000 curated genes, representing approximately 50-60% of the total genes in a typical passeriform genome. Therefore, the BUSCO report for the new assembly should be provided. The authors state that "The gene completeness for Passer was assessed through Benchmarking Universal Single-Copy Orthologs (Busco version 5.5.0) [26] by using the orthologous genes in the Gallus gallus [chicken] genome". However, BUSCO runs against OrthoDB lineage datasets, so I do not understand what this phrase refers to.
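
    The BUSCO report requested here is usually condensed into a one-line summary of complete (single-copy and duplicated), fragmented, and missing orthologs. The following is a minimal sketch of how such a line could be parsed and sanity-checked, assuming the common "C:..%[S:..%,D:..%],F:..%,M:..%,n:.." notation. The function name `parse_busco_summary` is hypothetical, and the numbers in the usage note are made up for illustration; they are not results from this manuscript.

```python
import re

# Assumed one-line BUSCO summary notation, for example:
#   C:90.0%[S:88.5%,D:1.5%],F:4.0%,M:6.0%,n:10000
SUMMARY_RE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco_summary(text):
    """Extract the percentages and ortholog count from a BUSCO summary line.

    Returns a dict with float percentages under the keys 'complete',
    'single', 'duplicated', 'fragmented', 'missing', and the integer
    number of orthologs searched under 'total'.
    """
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    fields = match.groupdict()
    total = int(fields.pop("total"))
    result = {name: float(value) for name, value in fields.items()}
    result["total"] = total
    return result
```

    For an illustrative line such as "C:90.0%[S:88.5%,D:1.5%],F:4.0%,M:6.0%,n:10000", the parsed single-copy and duplicated percentages add up to the complete percentage, and complete, fragmented, and missing add up to 100, which is a quick consistency check on any reported summary.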

    Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes. All the procedures are consistent and the programs or pipelines are well-known and well documented in the bioinformatic and genomic fields.

    Additional Comments: The inclusion of the mitochondrial genome represents a significant improvement in this manuscript. To enhance clarity, I recommend presenting all nuclear results together first, followed by a separate and clear description of the mitochondrial analysis and findings. The data are of interest for analyzing the genetic dynamics behind Passer domesticus adaptation and evolution, and could reveal differences from the previously available genomes derived from European reference samples, but this comparison is not presented in this work. As of this revision, NCBI lists two European reference genomes for Passer domesticus, both classified with 'chromosome-like' status (NCBI: GCF_036417665.1 and GCA_001700915.1). These genomes can be used in two distinct ways: (1) performing a 'genome-guided assembly' with MaSuRCA, using one of these genomes alongside the Illumina data, or (2) conducting genome scaffolding by employing one of these genomes as the reference and the genome assembled from raw reads as the query, using tools like RagTag or the chromosome scaffolder available in MaSuRCA. Both approaches could potentially improve scaffold number and contiguity metrics such as N50, N90, and the largest scaffold.
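
    The contiguity metrics named in this comment can be computed directly from the sequence lengths of an assembly. The following is a minimal sketch, assuming plain FASTA input; the function names `fasta_lengths` and `nx` are illustrative and are not part of QUAST, RagTag, or MaSuRCA.

```python
def fasta_lengths(lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, percent):
    """Return the Nx value of an assembly (N50 for percent=50, N90 for 90).

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least `percent` percent of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if covered * 100 >= percent * total:
            return length
```

    Computing `nx(lengths, 50)`, `nx(lengths, 90)`, `max(lengths)`, and `len(lengths)` on the assembly before and after reference-guided scaffolding would show whether either suggested approach actually improved contiguity.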

    Re-review: The authors have slightly improved the version previously presented, but the manuscript still does not meet the minimum standards established by the publisher for publication in the journal. Easily achievable changes that were requested to complement the earlier analysis have been ignored. Requests have not been answered, figures that are inconsistent with each other and with the text have not been fixed, and no relevant improvement between the previous and current versions has been shown.