A chromosome-level genome assembly and annotation of a maize elite breeding line Dan340

Abstract

Background

Maize is not only one of the most important crops grown worldwide for food, forage, and biofuel, but also an important model organism for fundamental research in genetics and genomics. Owing to this importance, reference genomes of several common maize inbred lines have been released, but genomes of some germplasm resources that are important in maize breeding research are still lacking. The maize cultivar Dan340 is an excellent backbone inbred line of the Luda Red Cob Group with several desirable characteristics, such as disease resistance, lodging resistance, high combining ability, and wide adaptability.

Findings

In this study, we constructed a high-quality chromosome-level reference genome for Dan340 by combining PacBio HiFi long reads, Illumina short reads and chromosome conformation capture (Hi-C) sequencing reads. The final assembly of the Dan340 genome was 2,348.72 Mb, including 2,738 contigs and 2,315 scaffolds with N50 of 41.49 Mb and 215.35 Mb, respectively. Repeat sequences accounted for 73.40% of the genome size and 39,733 protein-coding genes were annotated. Genes in the Dan340 genome, together with those from B73, Mo17 and SK, were clustered into 27,654 gene families. There were 1,806 genes from 359 gene families that were specific to Dan340, of which many had functional gene ontology annotations relating to “porphyrin-containing compound metabolic process”, “tetrapyrrole biosynthetic process”, and “tetrapyrrole metabolic process”.
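For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions; this is not the authors' pipeline and the numbers are not from the Dan340 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (arbitrary units), not real assembly data.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 50 + 40 = 90 >= 75, so this prints 40
```

Because the statistic is computed separately on contig lengths and on scaffold lengths, an assembly reports two N50 values, as in the paragraph above.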

Conclusions

The completeness and continuity of the genome were comparable to those of other important maize inbred lines. The assembly and annotation of this genome not only facilitate our understanding of intraspecific genome diversity in maize, but also provide a novel resource for maize breeding improvement.

Research Areas

Genetics and Genomics; Agriculture, Plant Genetics

Data Description

Abstract

This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.62), and the reviews have been published under the same license. They are as follows.

**Reviewer 1. Kapeel Chougule**

Are all data available and do they match the descriptions in the paper?

No. I entered the BioProject ID provided under supporting data and materials (PRJNA795201) but could not see any information or data. Either the authors have put it on hold or the data have not been submitted. Please request that the authors release the data.

Additional Comments:

The manuscript titled "A chromosome-level genome assembly and annotation of a maize elite breeding line Dan340" provides a good overview of the genome assembly construction and structural annotation of the maize elite breeding line Dan340. The authors have presented correct methods for constructing the genome assembly and annotations. Although the paper provides elaborate methods for genome construction, the manuscript fails to demonstrate the value of the genome as a resource. More specifically, the authors describe this line as elite, with desirable characters such as disease resistance, lodging resistance and so on, yet the manuscript focuses mostly on methods without significant examples that demonstrate the value of the resource. The authors could characterize some disease resistance genes, or genes affected by structural variations in the Dan340 line, and compare them to other maize lines.

Major Comments:

1) I entered the BioProject ID provided under supporting data and materials (PRJNA795201) but could not see any information or data. Either the authors have put it on hold or the data have not been submitted. Please request that the authors release the data.

2) The authors built a repeat library using ab initio and homology-based methods and masked 66.09% of the Dan340 genome. This is significantly lower than for other reference lines, especially B73 in Hufford et al. (https://www.science.org/action/downloadSupplement?doi=10.1126%2Fscience.abg5289&file=science.abg5289_hufford_sm.pdf), Table S5, where the B73 genome is 85% masked.

3) To assess the quality of the genome assembly, the authors use the LAI index. The reported LAI for B73 in Hufford et al. (same as above, Table S2) is 27.84, whereas the authors report a B73 LAI of 16.79, which is incorrect. Is the B73 version used for comparison v4 or v5? Can the authors provide a pairwise comparison of genome sequences, using dot plots between the Dan340 line and other maize lines, to visualize assembly artifacts such as inversions, deletions or gaps in the assembly?

4) The authors use transcription data from six tissues (stem, endosperm, embryo, bract, silk, ear and tip) for alignment. There is no mention in the manuscript of how these were generated. Are they also submitted to NCBI?

Minor comments:

1) Line 6: rephrase the sentence to say what the authors meant: There are more than 50 maize hybrid breeds derived from Dan340 since 2000.

2) Table 2: use full genus and species names in column 1.
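The masked percentages compared in Major Comment 2 depend on how masking is counted, so it can help to recompute them the same way for every genome. The following is a minimal sketch, assuming the genomes are available as soft-masked FASTA files in which repeats are written in lowercase. The function name `masked_fraction` and the toy records are hypothetical, and this is not the method used by the authors or by Hufford et al.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of bases that are soft-masked (lowercase).

    `fasta_lines` is any iterable of FASTA text lines; header lines
    starting with '>' and blank lines are ignored.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    if total == 0:
        raise ValueError("no sequence found")
    return masked / total


# Hypothetical toy record, not real maize sequence.
toy_fasta = [">chr_toy", "ACGTacgtac", "GGGGcccc"]
print(f"{masked_fraction(toy_fasta):.2%}")  # 10 of 18 bases are lowercase
```

For a real genome the same function can be fed an open file handle, since a text file iterates line by line. Hard-masked genomes (repeats replaced by N) would need a different count.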

**Reviewer 2. Georg Haberer**

Are all data available and do they match the descriptions in the paper?

No. I could not check data availability; the provided project number was not found in the NCBI sequence archive. The authors should ensure that both raw genomic and RNA-seq reads are uploaded there. Also, the final genome sequence and gene annotations should be available to the community.

Is there sufficient detail in the methods and data-processing steps to allow reproduction?

No. The description of how structural variants were detected is very unclear. For example, L283: “Then, SAMtools v0.1.1 (Li et al., 2009) was used to assign the structural variations from the bam file to each chromosome or scaffold.” SAMtools per se does not call any SVs, so it is unclear how this assignment was actually performed. Did they filter SVs here? Did they use the full samtools+mpileup+vcftools pipeline? Also, did they really apply v0.1.1? This is an extremely outdated version (the current version is 1.15, after probably dozens of updates).
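To illustrate what an explicit per-chromosome assignment could look like, here is a minimal sketch that tallies structural variant records by contig and type. It assumes the SV calls already exist as VCF text produced by some SV caller (the manuscript excerpt does not name one) and that each SV record carries the `SVTYPE` key in its INFO column, as the VCF specification reserves for structural variants. The function name `count_svs_per_contig` and the toy records are hypothetical; this is not a reconstruction of the authors' method.

```python
from collections import Counter


def count_svs_per_contig(vcf_lines):
    """Count SV records per (contig, SVTYPE) from an iterable of VCF lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, info = fields[0], fields[7]
        # Parse key=value pairs of the INFO column; flag-only entries are ignored.
        info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        if "SVTYPE" in info_map:
            counts[(chrom, info_map["SVTYPE"])] += 1
    return counts


# Hypothetical toy records, not data from the Dan340 study.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=500",
    "chr1\t900\t.\tC\t<INS>\t.\tPASS\tSVTYPE=INS",
    "chr2\t50\t.\tG\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=80",
    "chr2\t70\t.\tG\tT\t.\tPASS\tDP=12",
]
for (chrom, sv_type), n in sorted(count_svs_per_contig(toy_vcf).items()):
    print(chrom, sv_type, n)
```

Reporting the caller, its version and any filters alongside such a tally would address the reproducibility concern raised above.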

Additional Comments:

The authors report the sequencing and assembly of Dan340, a highly important founder line for maize breeding in China. They complement the genome sequence with gene and repeat annotations and a preliminary study of structural variants between their line and three other lines. The obtained genome sequence is of excellent quality and is evaluated by several independent statistics (BUSCO, CEGMA, LAI). Repeat and gene predictions seem to be done by state-of-the-art methods, and the reported numbers and proportions are similar to previous reports and comparative analyses in maize. In summary, the manuscript provides a highly valuable genomic resource for maize biologists and breeders and complements the increasing number of maize pan-genomes with a major Chinese germplasm. I have only a few comments for the authors: see my points above about data availability and methods to call SVs. In addition:

- Table 2: only abbreviations of species are provided here; they have to be spelled out in the table legend, for example: Hvu: Hordeum vulgare. It may also be difficult for readers to understand which species/lines ‘ZmaL’ and ‘Zma’ refer to.
- L31/L67: “… and so on.” is not an appropriate way to close a sentence in scientific texts.
- L95-103: this part can be left out. The authors do not have to, and should not, describe or even justify the CCS technology here. Just mention what has been done.

Recommendation: Minor Revision.

**Reviewer 3. Xupo Ding**

The gene numbers and repeat percentage should be presented in the abstract.
The potential functions of the three secondary metabolite processes mentioned in the abstract could be inferred. Do they point to specific features of Dan340, such as disease resistance or others?
Lines 79-82: this sentence is the same as the conclusion in the abstract; please expand it.
The depth or data size of the CCS and Hi-C data should be added, corresponding to the Illumina data description.
Lines 128-131: the quality cannot yet be claimed before the assembly has been assessed. Please consider: "The assembly was performed in a stepwise fashion with PacBio HiFi reads, Illumina short reads, and Hi-C technology."
Lines 211-213: insert a description of comparative LTR data for Dan340, B73, Mo17, and SK.
Line 241: describe how many, or what percentage of, protein-coding genes were supported by RNA-seq.
Fig. 4A: the names of the four maize lines might be placed outside the Venn diagram.
Fig. 4B and Fig. 4C were not cited in the data description.
Lines 264-270: insert a description of the top five or ten pathways in the GO list, and infer the function of the three Dan340-specific secondary metabolite pathways.
In the conclusion, the contributions should be discussed in more depth, or at least should not exactly repeat the details in the abstract and lines 79-82.

Recommendation: Minor Revision
