Chromosome-scale assembly of the highly heterozygous genome of red clover (Trifolium pratense L.), an allogamous forage crop species
Abstract
Red clover (Trifolium pratense L.) is used as a forage crop due to a variety of favorable traits relative to other crops. Improved varieties have been developed through conventional breeding approaches, but progress could be accelerated and gene discovery facilitated using modern genomic methods. Existing short-read based genome assemblies of the ~420 Megabase (Mb) genome are fragmented into >135,000 contigs with numerous errors in order and orientation within scaffolds, likely due to the biology of the plant, which displays gametophytic self-incompatibility resulting in inherent high heterozygosity. A high-quality long-read based assembly of red clover is presented that reduces the number of contigs by more than 500-fold, improves the per-base quality, and increases the contig N50 statistic by three orders of magnitude. The 413.5 Mb assembly is nearly 20% longer than the 350 Mb short-read assembly and closer to the predicted genome size. Quality measures are presented, and full-length isoform sequences of RNA transcripts are reported for use in assessing accuracy and for future annotation of the genome. The assembly accurately represents the seven main linkage groups present in the genome of an allogamous (outcrossing), highly heterozygous plant species.
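For readers unfamiliar with the contig N50 statistic cited in the abstract, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the red clover assembly. The last lines check the size comparison using only the two figures stated in the abstract (413.5 Mb and 350 Mb).

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical values, NOT from the red clover assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Relative size increase, using the two assembly sizes stated in the abstract.
long_read_mb = 413.5
short_read_mb = 350.0
percent_longer = (long_read_mb - short_read_mb) / short_read_mb * 100

print(f"toy N50: {toy_n50}")
print(f"long-read assembly is {percent_longer:.1f}% longer")
```

A more fragmented assembly of the same total length has a lower N50, which is why this statistic is often reported alongside the contig count when comparing assemblies.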
Article activity feed
A version of this preprint has been published in the Open Access journal GigaByte (see paper https://doi.org/10.46471/gigabyte.42), where the paper and peer reviews are published openly under a CC-BY 4.0 license.
**Reviewer 1. Jose De Vega**
I think this long-read assembly is a great improvement over the previous short-read version available to the community to date. The assembly metrics are good, the dataset is public, and there is good quality control throughout the process. The manuscript is well written and the protocols are well explained. The data are public and the new assembly is of interest to the community.
However, I think the assembly is of limited interest to the research and breeding community without a gene annotation, which is not part of the manuscript. Since the authors have the data (e.g. Iso-Seq) and the expertise, I do not understand why it has not been included in the first place.
**Reviewer 2. Jianghua Chen**
Red clover is one of the most important forage crops in the world. Gametophytic self-incompatibility, which results in inherently high heterozygosity, is the major challenge in obtaining a high-quality genome sequence from traditional short-read based genome assemblies. The authors, Bickhart et al., used a long-read based assembly method to obtain a high-quality genome that reduces the number of contigs by more than 500-fold, improves the per-base quality, and brings the genome size to 413.5 Mb, which matches the predicted genome size well. This assembly accurately represents the seven main linkage groups, and it will help scientists understand the origin of the condensed tannin biological pathway in leaf forages and facilitate gene discovery and the application of biotechnology to increase nutritional value.
I strongly recommend that the editor accept this manuscript for publication.