A chromosome-level genome assembly and annotation of the maize elite breeding line Dan340

Abstract

Background: Maize is an important model organism for genetics and genomics research. Though reference genomes of maize are available, some genomes of important genetic germplasms for maize breeding are still lacking, for instance, the cultivar Dan340, which is a backbone inbred line of the LvDa Red Cob Group with several desirable characteristics. In this study, we constructed a high-quality chromosome-level reference genome for Dan340 by using long HiFi reads, short reads, and Hi-C. The final assembly of the Dan340 genome was 2348.72 Mb, which was anchored to 10 chromosomes. Repeat sequences accounted for 73.40% of the genome and 39,733 protein-coding genes were annotated. Comparative genomic analysis between Dan340 and other maize lines identified that 1806 genes from 359 gene families were specific to Dan340. Conclusions: Our genome assembly and annotation provide a valuable resource for improving maize breeding and further understanding the intraspecific genome diversity in maize.

Article activity feed

  1. Abstract

    This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.62), and the reviews have been published under the same license. They are as follows.

    **Reviewer 1. Kapeel Chougule**

    Are all data available and do they match the descriptions in the paper?

    No. I searched for the BioProject ID provided under "Supporting data and materials" (PRJNA795201) but could not see any information or data. Either the authors have put it on hold or the data have not been submitted. Please request the authors to release the data.

    Additional Comments: The manuscript titled "A chromosome-level genome assembly and annotation of a maize elite breeding line Dan340" provides a good overview of the genome assembly construction and structural annotation of the maize elite breeding line Dan340. The authors have used correct methods to construct the genome assembly and annotations. Although the paper describes the methods for genome construction in detail, the manuscript does not demonstrate the value of the genome as a resource. More specifically, the authors describe this line as elite, with desirable characteristics such as disease resistance, lodging resistance, and so on. The manuscript focuses mostly on methods, without significant examples that demonstrate the value of the resource. The authors could characterize some disease resistance genes, or genes affected by structural variations in the Dan340 line, and compare them to other maize lines.

    Major Comments:

    1) I searched for the BioProject ID provided under "Supporting data and materials" (PRJNA795201) but could not see any information or data. Either the authors have put it on hold or the data have not been submitted. Please request the authors to release the data.
    2) The authors built a repeat library using ab initio and homology-based methods and masked 66.09% of the Dan340 genome. Compared with other reference lines, especially B73 in Hufford et al. (https://www.science.org/action/downloadSupplement?doi=10.1126%2Fscience.abg5289&file=science.abg5289_hufford_sm.pdf), Table S5, this is significantly lower: the B73 genome is 85% masked.
    3) To assess the quality of the genome assembly, the authors use the LAI. The LAI reported for B73 in Hufford et al. (Table S2 of the same supplement) is 27.84, whereas the authors report a B73 LAI of 16.79, which is incorrect. Is the B73 version used for comparison v4 or v5? Can the authors provide a pairwise genome sequence comparison, using a dot plot between the Dan340 line and other maize lines, to visualize assembly artifacts such as inversions, deletions, or gaps in the assembly?
    4) The authors use transcription data from six tissues (stem, endosperm, embryo, bract, silk, ear and tip) for alignment. The manuscript does not mention how these data were generated. Have they also been submitted to NCBI?

    Minor comments:

    1) Line 6: please rephrase this sentence to clarify what the authors meant: "There are more than 50 Maize hybrid breeds derived from Dan340 since 2000."
    2) Table 2: use full genus and species names in column 1.
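
The masked percentages compared in major comment 2 can be checked directly from a masked assembly. The following is a minimal sketch, not the authors' or the reviewer's pipeline. It assumes the assembly is a soft-masked FASTA (repeat bases written in lowercase) and that gap bases (`N`) are excluded from the denominator. Both are assumptions, and published figures may use a different denominator. The function name `masked_fraction` is hypothetical.

```python
import io


def masked_fraction(fasta_handle):
    """Return the fraction of non-N bases that are lowercase (soft-masked).

    Assumes a soft-masked FASTA: repeats in lowercase, gaps as N/n.
    """
    masked = 0
    total = 0
    for line in fasta_handle:
        if line.startswith(">"):
            continue  # skip sequence headers
        for base in line.strip():
            if base in "Nn":
                continue  # assumption: gaps do not count toward the total
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


# Toy input standing in for a real assembly file.
example = io.StringIO(">chr1\nACGTacgtNN\n>chr2\nacgtacGT\n")
print(f"{masked_fraction(example):.2%}")  # 10 of 16 non-N bases -> 62.50%
```

For a real assembly, the same function could be called on an open file handle, for example `with open(path) as fh: masked_fraction(fh)`, where `path` is a placeholder for the soft-masked FASTA.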

    **Reviewer 2. Georg Haberer**

    Are all data available and do they match the descriptions in the paper?

    No. I could not check the data availability; the provided project number was not found in the NCBI sequence archive. The authors should ensure that both the raw genomic and RNA-seq reads are uploaded there. The final genome sequence and gene annotations should also be available to the community.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?

    No. The description of how structural variants were detected is very vague. For example, L283: “Then, SAMtools v0.1.1 (Li et al., 2009) was used to assign the structural variations from the bam file to each chromosome or scaffold.” SAMtools per se does not call any SVs, so it is unclear how this assignment was actually performed: did they filter SVs here? Did they use the full samtools+mpileup+vcftools pipeline? Also, did they really apply v0.1.1? This is an extremely outdated version (the current version is 1.15, probably dozens of updates later).

    Additional Comments:

    The authors report the sequencing and assembly of Dan340, a highly important founder line for maize breeding in China. They complement the genome sequence with gene and repeat annotations and a preliminary study of structural variants between their line and three other lines. The obtained genome sequence is of excellent quality and is evaluated by several independent statistics (BUSCO, CEGMA, LAI). Repeat and gene predictions seem to have been done with state-of-the-art methods, and the reported numbers and proportions are similar to previous reports and comparative analyses in maize. In summary, the manuscript provides a highly valuable genomic resource for maize biologists and breeders and complements the increasing number of maize pan-genomes with a major Chinese germplasm.

    I have only a few comments for the authors. See my points above about data availability and the methods used to call SVs. In addition:

    - Table 2: only species abbreviations are provided here; they have to be spelled out in the table legend, for example: Hvu: Hordeum vulgare. It may also be difficult for readers to understand which species/lines ‘ZmaL’ and ‘Zma’ refer to.
    - L31/L67: “… and so on.” is not an appropriate way to close a sentence in a scientific text.
    - L95-103: this part can be left out; the authors do not have to, and should not, describe or even justify the CCS technology here. Just mention what has been done.
    

    Recommendation: Minor Revision.

    **Reviewer 3. Xupo Ding**

    1. The gene numbers and repeat percentage should be presented in the abstract.
    2. The potential functions of the three secondary metabolite processes mentioned in the abstract could be inferred. Do they point to specific features of Dan340, such as disease resistance or others?
    3. Lines 79-82: this sentence is the same as the conclusion in the abstract; please expand it.
    4. The depth or data size of the CCS and Hi-C data should be added, corresponding to the Illumina data description.
    5. Lines 128-131: before the assembly has been assessed, its quality cannot be asserted. Please consider: "The assembly was performed in a stepwise fashion with PacBio HiFi reads, Illumina short reads, and Hi-C technology."
    6. Lines 211-213: add a description comparing the LTR data for Dan340, B73, Mo17, and SK.
    7. Line 241: state how many protein-coding genes, or what percentage of them, were supported by RNA-seq.
    8. Fig. 4A: the names of the four maize lines could be placed outside the Venn diagram.
    9. Fig. 4B and Fig. 4C are not cited in the data description.
    10. Lines 264-270: add a description of the top five or ten pathways in the GO list and infer the function of the three secondary metabolite pathways specific to Dan340.
    11. In the conclusion, the contributions should be discussed in more depth, or at least should not be identical to the details in the abstract and lines 79-82.

    Recommendation: Minor Revision