Chromosome-scale assembly of the highly heterozygous genome of red clover (Trifolium pratense L.), an allogamous forage crop species

Abstract

Relative to other crops, red clover (Trifolium pratense L.) has various favorable traits making it an ideal forage crop. Conventional breeding has improved varieties, but modern genomic methods could accelerate progress and facilitate gene discovery. Existing short-read-based genome assemblies of the ∼420 megabase pair (Mbp) genome are fragmented into >135,000 contigs, with numerous order and orientation errors within scaffolds. These errors are probably associated with the plant’s biology: red clover displays gametophytic self-incompatibility, which results in inherently high heterozygosity. Here, we present a high-quality long-read-based assembly of red clover with a more than 500-fold reduction in contigs, improved per-base quality, and a contig N50 increased by three orders of magnitude. The 413.5 Mbp assembly is nearly 20% longer than the 350 Mbp short-read assembly and closer to the predicted genome size. We also present quality measures and full-length isoform RNA transcript sequences for assessing accuracy and for future genome annotation. The assembly accurately represents the seven main linkage groups in an allogamous (outcrossing), highly heterozygous plant genome.
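As a rough illustration of two figures quoted in the abstract, the sketch below computes a contig N50 and the relative length increase between two assemblies. It is not the authors' pipeline: the function names `n50` and `percent_increase` and the toy contig lengths are assumptions made for this example, and only the 413.5 Mbp and 350 Mbp sizes come from the text.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together cover at least half of the assembly."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


def percent_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Hypothetical toy contig lengths (not data from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches half, so N50 = 70

# Assembly sizes in Mbp as quoted in the abstract.
print(round(percent_increase(413.5, 350.0), 1))  # 18.1, i.e. "nearly 20%"
```

Because N50 depends on how the total length is distributed across contigs, merging many short contigs into a few long ones raises it sharply even when the total assembly size changes only modestly.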

Article activity feed

  1. Abstract

    A version of this preprint has been published in the Open Access journal GigaByte (see paper https://doi.org/10.46471/gigabyte.42), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    **Reviewer 1. Jose De Vega**

    I think this long-read assembly is a great improvement over the previous short-read version available to the community to date. The assembly metrics are good, the dataset is public, and there is good quality control throughout the process. The manuscript is well written and the protocols are well explained. The new assembly is of interest to the community.

    However, I think the assembly is of limited interest to the research and breeding community without a gene annotation, which is not part of the manuscript. Since the authors have the data (e.g. Iso-Seq) and the expertise, I do not understand why it was not included in the first place.

  2. Red clover

    **Reviewer 2. Jianghua Chen**

    Red clover is one of the most important forage crops in the world. Its gametophytic self-incompatibility, which results in inherently high heterozygosity, is the main obstacle to obtaining a high-quality genome sequence from traditional short-read-based assemblies. The authors, Bickhart et al., used a long-read-based assembly method to obtain a high-quality genome. It reduces the number of contigs more than 500-fold, improves the per-base quality, and brings the assembly size to 413.5 Mb, which matches the predicted genome size well. This assembly accurately represents the seven main linkage groups. It will help scientists understand the origin of the condensed tannin pathway in leafy forages, and it will facilitate gene discovery and the application of biotechnology to increase nutritional value.

    I strongly recommend that the editor accept this manuscript for publication.