Genome-wide association study in quinoa reveals selection pattern typical for crops with a short breeding history

Curation statements for this article:
  • Curated by eLife


    Evaluation Summary:

    This is a comprehensive study of genomic and phenotypic diversity in the orphan crop quinoa. Based on whole genome resequencing of 310 accessions and field phenotyping of the same set of accessions for two years, the study identified the genetic basis of agronomically important traits. Based on this promising work, there will likely be scope for quick improvement of this orphan crop through breeding.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

This article has been Reviewed by the following groups


Abstract

Quinoa germplasm preserves useful and substantial genetic variation, yet it remains untapped due to a lack of implementation of modern breeding tools. We have integrated field and sequence data to characterize a large diversity panel of quinoa. Whole-genome sequencing of 310 accessions revealed 2.9 million high-confidence single nucleotide polymorphism (SNP) loci. Highland and Lowland quinoa were clustered into two main groups, with an FST divergence of 0.36 and linkage disequilibrium (LD) decay of 6.5 and 49.8 kb, respectively. A genome-wide association study using multi-year phenotyping trials uncovered 600 SNPs stably associated with 17 traits. Two candidate genes are associated with thousand seed weight, and a resistance gene analog is associated with downy mildew resistance. We also identified pleiotropically acting loci for four agronomic traits important for adaptation. This work demonstrates the use of re-sequencing data from an orphan crop that is only partially domesticated to rapidly identify marker-trait associations, and provides the underpinning elements for genomics-enabled quinoa breeding.
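For readers who want to see what an FST value such as 0.36 summarizes, the sketch below computes Hudson's FST from per-SNP allele frequencies, combined across SNPs as a ratio of averages. The abstract does not say which estimator was used, so this is an illustrative choice, not the authors' method; the function name and input layout are hypothetical.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Genome-wide Hudson FST, combined across SNPs as a ratio of averages.

    p1, p2: per-SNP alternative-allele frequencies in populations 1 and 2.
    n1, n2: per-SNP numbers of sampled chromosomes (each must be > 1).
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    # Numerator: squared frequency difference minus within-population sampling noise
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that two alleles, one from each population, differ
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Sum over SNPs first, then divide (ratio of averages)
    return num.sum() / den.sum()
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, is a common choice because it keeps low-frequency SNPs from dominating the genome-wide value.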

Article activity feed

  1. Author Response:

    Reviewer #1:

    The paper details whole-genome re-sequencing of 310 quinoa accessions. This provides a good glimpse of diversity in this orphan crop, and the GWAS results help lay the foundations for identifying key genes underlying quinoa variation. This will certainly advance our knowledge of this increasingly important orphan crop.

    1. One issue that permeates the entire paper is that the analysis is fairly basic and the authors do not make full use of the data. The analysis of population diversity is restricted to PCA, ADMIXTURE, and phylogenetic analysis. It would probably broaden the impact of the paper if the authors performed a deeper analysis of quinoa diversity, for example of demographic history or of selection in highland vs. lowland quinoa.

    Thanks for this suggestion. We performed a local PCA analysis by dividing the genome into 50 kb windows; the results are presented in Fig. S9 and have been added to the text (lines 189-209 and 556-562). In addition, another study with a much larger set of genome sequences and additional outgroups is underway to better understand the demographic history of quinoa.
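A per-window PCA of the kind described above can be sketched in Python as follows. It assumes a genotype matrix coded as 0/1/2 allele dosages and sorted positions on a single chromosome; the function name and the `min_snps` cutoff are hypothetical, and published local-PCA tools additionally compare windows with one another, which this sketch omits.

```python
import numpy as np

def local_pca(genotypes, positions, window_bp=50_000, min_snps=10):
    """Run a separate PCA in each non-overlapping genomic window.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele dosages.
    positions: (n_snps,) base-pair positions on one chromosome, sorted.
    Returns a list of (window_start, pc1_scores, fraction_of_variance_on_pc1).
    """
    genotypes = np.asarray(genotypes, float)
    positions = np.asarray(positions)
    window_ids = positions // window_bp  # which 50 kb window each SNP falls in
    results = []
    for w in np.unique(window_ids):
        cols = window_ids == w
        if cols.sum() < min_snps:
            continue  # too few SNPs for a stable PCA
        X = genotypes[:, cols]
        X = X - X.mean(axis=0)  # center each SNP
        # SVD of the centered matrix: u[:, k] * s[k] are the sample scores on PC k
        u, s, _ = np.linalg.svd(X, full_matrices=False)
        var = s ** 2
        frac = var[0] / var.sum() if var.sum() > 0 else 0.0
        results.append((int(w * window_bp), u[:, 0] * s[0], frac))
    return results
```

Plotting PC1 scores or the variance fraction along the chromosome is one way to spot windows whose population structure departs from the genome-wide pattern.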

    2. There is a focus on the rapid LD decay, which the authors attribute to the short breeding history and low selection. It seems a stretch to draw this conclusion based solely on LD decay. As the authors point out, many other factors could account for it, and they should provide other lines of evidence to support this conclusion.

    Evidence of a short breeding history in quinoa is also provided by the admixture analysis (Fig. S6) and the genetic diversity analysis (Figs. S7 and S8).
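Because LD decay is central to this exchange, here is a minimal sketch of one common way to summarize it: the distance at which the mean pairwise r² between SNPs falls to half its peak value. The response does not define the decay threshold used in the paper, so the half-maximum rule, the 1 kb bins, and the function name are assumptions. r² is computed between genotype dosages, which is a reasonable approximation for largely inbred lines.

```python
import numpy as np

def ld_decay_distance(genotypes, positions, max_dist=200_000, bin_bp=1_000):
    """Distance (bin center, bp) at which mean pairwise r^2 falls to half its peak.

    genotypes: (n_samples, n_snps) 0/1/2 dosages; positions: sorted bp positions.
    Monomorphic SNPs should be removed first (their correlation is undefined).
    """
    G = np.asarray(genotypes, float)
    pos = np.asarray(positions)
    r2 = np.corrcoef(G, rowvar=False) ** 2  # SNP-by-SNP squared correlation
    i, j = np.triu_indices(len(pos), k=1)    # each SNP pair once
    d = np.abs(pos[j] - pos[i])
    keep = d <= max_dist
    d, vals = d[keep], r2[i[keep], j[keep]]
    # Average r^2 within distance bins
    bins = d // bin_bp
    centers, means = [], []
    for b in np.unique(bins):
        centers.append((b + 0.5) * bin_bp)
        means.append(vals[bins == b].mean())
    means = np.array(means)
    peak = int(np.argmax(means))
    later = np.nonzero(means[peak:] <= means[peak] / 2)[0]
    return centers[peak + later[0]] if later.size else None
```

As the reviewer notes, a short decay distance alone does not identify its cause; recombination rate, population size, and sampling all influence it.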

    3. The GWAS analysis is good and provides a solid foundation for quinoa genetics. The authors discuss possible candidate genes in these GWAS regions. For thousand seed weight, the relatively small span of the GWAS peaks allows localization to just a few genes in the GWAS region (CqPP2C5 and CqRING). The GWAS region associated with flowering time is larger (1 Mb with 605 genes), but the authors focus on the GLX2-1 gene. This is again a stretch, as the large region precludes narrowing the candidate list unless there is a compelling mutation (for example, a deletion or insertion of a major flowering time gene).

    Altogether, 605 genes are found within the 50 kb flanking regions of the PCA-associated SNPs. The flowering-time region itself is not 1 Mb but 0.1 Mb in size; this was a typing error in the text, now corrected to 8.05-8.15 Mb (line 284). In this region, we found five genes, three of which have no known annotation. The strongest association was found in the GLX2-1 gene, and this association was also ‘consistent’ between years for all four traits. We modified the text accordingly (lines 285-286 and 287-290).
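The 50 kb flanking-window search mentioned in the response amounts to a simple interval-overlap test. In the sketch below, SNPs and genes are plain tuples and all gene IDs are hypothetical; a real analysis would read them from the GWAS output and the genome annotation.

```python
def genes_near_snps(snps, genes, flank_bp=50_000):
    """Return sorted IDs of genes overlapping +/- flank_bp around any associated SNP.

    snps: iterable of (chrom, position).
    genes: iterable of (gene_id, chrom, start, end), coordinates in bp.
    """
    genes = list(genes)
    hits = set()
    for chrom, pos in snps:
        lo, hi = pos - flank_bp, pos + flank_bp
        for gene_id, g_chrom, start, end in genes:
            # A gene counts if any part of it falls inside the flanking window
            if g_chrom == chrom and start <= hi and end >= lo:
                hits.add(gene_id)
    return sorted(hits)
```

For a single SNP, a ±50 kb window spans 0.1 Mb, the size of the region given in the response; with many associated SNPs the windows add up, which is how a genome-wide total such as 605 genes arises.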

    Reviewer #2:

    A key genomic study on emerging, nutritious, alternative grain crop.

    Deep genomic data on hundreds of land races/accessions.

    Population structure analysis could be enhanced.

    Agronomic growth and yield traits are correlated and environmentally sensitive.

    Genomic dissection via GWAS to multigenic loci with candidate genes; genomic prediction and selection could be added.

    Inference on domestication.

    To improve the population structure analysis, we performed a local PCA analysis by dividing the genome into 50 kb windows; the results are presented in Fig. S9 and have been added to the text (lines 189-209 and 556-562).

    We agree that growing conditions typical of the lowlands (longer seasons) can prevent many accessions from reaching maturity. However, we observed that all accessions flowered and produced seeds. In addition, GWAS with PCA (CP) has been shown to be effective for genetically correlated traits in multiple studies (listed below). We therefore believe that our analysis addresses the bias that might arise from differences in maturity. We also discuss this in lines 386-390 and 413-417.

    • Miao, C., Xu, Y., Liu, S., Schnable, P. S., & Schnable, J. C. (2020). Increased power and accuracy of causal locus identification in time series genome-wide association in sorghum. Plant Physiology, 183(4), 1898-1909.

    • Yano, K., Morinaka, Y., Wang, F., Huang, P., Takehara, S., Hirai, T., ... & Matsuoka, M. (2019). GWAS with principal component analysis identifies a gene comprehensively controlling rice architecture. Proceedings of the National Academy of Sciences, 116(42), 21262-21267.

    • Aschard, H., Vilhjálmsson, B. J., Greliche, N., Morange, P. E., Trégouët, D. A., & Kraft, P. (2014). Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. The American Journal of Human Genetics, 94(5), 662-676.
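The idea behind GWAS on principal components of correlated traits, as in the references above, is to replace several correlated phenotypes with a few uncorrelated composite scores and to test each score like an ordinary trait. A minimal sketch of that first step, assuming a matrix of per-accession trait means with no missing values (the function name is hypothetical, and the response does not describe exactly how its PC phenotypes were built):

```python
import numpy as np

def phenotype_pcs(traits, n_components=2):
    """Turn correlated traits into principal-component phenotypes for GWAS.

    traits: (n_accessions, n_traits) array, e.g. line means of agronomic traits.
    Returns (scores, explained_variance_ratio) for the first n_components PCs.
    """
    X = np.asarray(traits, float)
    # Standardize so traits measured in different units contribute equally
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # PC values per accession
    ratio = s ** 2 / np.sum(s ** 2)
    return scores, ratio[:n_components]
```

Each column of `scores` would then be passed to the association model in place of a single trait.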

    Genomic selection and prediction are interesting points. We believe that our study marks an important first step toward genomic selection. We agree that marker-assisted selection for polygenic traits has failed in many breeding programs. However, markers from QTL explaining a large proportion of the phenotypic variance, such as the markers from our QTL regions on Cq2A, can be useful for marker-assisted selection. The next step will be to provide a database for genomic selection. This requires a more extensive set of breeding lines (a training population) grown under different environments.
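The phrase "a large proportion of the phenotypic variance" can be made concrete as the R² of a regression of phenotype on marker dosage. The sketch below is a deliberately naive single-marker version with a hypothetical function name; in a structured panel such as highland vs. lowland quinoa, variance explained is normally estimated within a mixed model that accounts for population structure and kinship, so this illustrates the quantity rather than the paper's estimate.

```python
import numpy as np

def marker_r2(dosage, phenotype):
    """Share of phenotypic variance explained by one marker (simple regression R^2)."""
    x = np.asarray(dosage, float)
    y = np.asarray(phenotype, float)
    # Least-squares fit y = a + b * x; polyfit returns [slope, intercept]
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    return 1 - resid.var() / y.var()
```

A marker with R² close to zero contributes little on its own, which is why polygenic traits are usually better served by genomic prediction over all markers.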

    Reviewer #3:

    The authors have re-sequenced 310 quinoa accessions and carried out field phenotyping of the same set of accessions for two years in order to characterize genetic diversity and analyze the genetic basis of agronomically important traits.

    The main strength of the manuscript is that the authors have carefully characterized more than 300 quinoa accessions, achieving a sufficiently large population size for GWAS analysis with good statistical power. It is especially promising that the phenotypes all show high heritability. This indicates that the field phenotyping was of high quality and provides a good starting point for discovering relevant marker-trait associations. In addition, the authors provide convincing evidence for distinct population characteristics of highland and lowland quinoa, adding information beyond previous work (Maughan, 2012).
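Heritability estimation is not described in this review. One standard definition for a trial repeated over years, given here only as an assumed example, is entry-mean broad-sense heritability, where σ²_G is the genotypic variance, σ²_GY the genotype-by-year variance, σ²_e the residual variance, y the number of years (two here), and r the number of replicates per year:

```latex
H^2 = \frac{\sigma^2_G}{\sigma^2_G + \dfrac{\sigma^2_{GY}}{y} + \dfrac{\sigma^2_e}{r\,y}}
```

Values close to 1 mean that most of the variation among accession means is genetic rather than due to year effects or plot error.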

    The weak points are related to the genotype data and the conclusions drawn based on the GWAS analysis.

    1. An important issue is related to the relatively low depth of coverage (4-10x) that was used for re-sequencing. Across the accessions, there is a pronounced negative correlation between the mean sequencing depth and the heterozygosity level, indicating that heterozygotes are overcalled in individuals with low coverage. This also results in heterozygosity levels that are generally higher than expected for what is assumed to be mainly homozygous inbred lines.

    We addressed your concern by providing the requested scatter plot and by calculating correlations between coverage and heterozygosity (Fig. S3b). The correlations were not significant; we therefore believe that the coverage was sufficient for accurate SNP calling (lines 106-108).
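The coverage-heterozygosity check described in the response can be sketched as follows. The sketch assumes genotypes coded as 0/1/2 with -1 for missing calls and a per-accession mean depth; the function name is hypothetical. A p-value would come from comparing the returned t statistic with a t distribution with n - 2 degrees of freedom (for example via `scipy.stats.pearsonr`, which returns r and its p-value directly).

```python
import numpy as np

def heterozygosity_vs_depth(genotypes, mean_depth):
    """Correlate per-sample observed heterozygosity with mean sequencing depth.

    genotypes: (n_samples, n_snps) 0/1/2 dosages, -1 for missing calls.
    mean_depth: (n_samples,) mean read depth per accession.
    Returns (per-sample heterozygosity, Pearson r, t statistic for r).
    """
    G = np.asarray(genotypes)
    called = G >= 0
    # Observed heterozygosity: share of called genotypes that are heterozygous (1)
    het = (G == 1).sum(axis=1) / called.sum(axis=1)
    depth = np.asarray(mean_depth, float)
    r = np.corrcoef(depth, het)[0, 1]
    n = len(het)
    t = r * np.sqrt((n - 2) / (1 - r ** 2))  # undefined if |r| == 1
    return het, r, t
```

A strongly negative and significant r would support the reviewer's concern that heterozygotes are overcalled at low depth.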

    2. Another potential issue concerns SNPs called in repetitive regions. Among the significant GWAS SNPs identified, a very large proportion appears to be found in intergenic regions. While this does not rule out that some of them are genuinely important associations, it does suggest a potentially high level of noise in the GWAS results. In addition to the filtering already imposed, which includes a filter for mapping quality, the SNPs called in intergenic regions with unusually high coverage could be more closely examined to determine the extent of the issue. Masking repetitive genomic regions using RepeatMasker or similar programs could be useful.

    Thank you for this suggestion; we understand that the problem could arise from poor or incorrect mapping in intergenic regions. We therefore applied stringent filtering for the GWAS analysis, removing SNPs with more than 50% missing genotype data, a mean depth below five, or a minor allele frequency below 5%. SNP densities are generally higher in intergenic regions than in genic regions. In this table, 511 (47% of all associations) of the trait-associated SNPs are intergenic and 300 (28%) are upstream or downstream of genes. Therefore, we do not think that intergenic SNPs form an overwhelming majority. We also believe that SNPs within repetitive regions can be important; for instance, repetitive elements can function in controlling gene expression. Moreover, since our SNP calling and filtering criteria were very stringent, the probability of false positives in our SNP data set is very low. Therefore, we chose not to remove them from the GWAS analysis at this stage.
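The three SNP filters listed in the response (more than 50% missing data, mean depth below five, minor allele frequency below 5%) can be expressed as a boolean mask over SNPs. This sketch assumes genotype and depth matrices of the same shape with -1 marking missing calls; in practice these filters are usually applied directly to a VCF with dedicated tools, and the function name here is hypothetical.

```python
import numpy as np

def filter_snps(genotypes, depth, max_missing=0.5, min_mean_depth=5.0, min_maf=0.05):
    """Boolean mask of SNPs passing missingness, depth, and MAF filters.

    genotypes: (n_samples, n_snps) 0/1/2 dosages, -1 for missing calls.
    depth: (n_samples, n_snps) read depth per genotype call.
    """
    G = np.asarray(genotypes, float)
    missing = G < 0
    miss_rate = missing.mean(axis=0)  # fraction of samples without a call
    mean_depth = np.asarray(depth, float).mean(axis=0)
    # Alternative-allele frequency from called genotypes only
    alt_freq = np.nanmean(np.where(missing, np.nan, G), axis=0) / 2
    maf = np.minimum(alt_freq, 1 - alt_freq)
    # Keep SNPs with <= 50% missing, mean depth >= 5, and MAF >= 5%
    return (miss_rate <= max_missing) & (mean_depth >= min_mean_depth) & (maf >= min_maf)
```

Note that none of these filters targets repeats specifically, which is the gap the reviewer's RepeatMasker suggestion addresses.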

    3. When the authors discuss their GWAS results, they frequently focus on cherry-picked candidate genes, although, in several cases, the top SNPs in the region in question are not found within these candidates. A broader focus on all genes within the LD blocks, while still mentioning the candidate genes, would be more informative.

    We obtained candidate genes based on the genome-wide average LD (50 kb), and we provided LD heatmaps to show that the saponin genes and GLX2-1 are in LD with the most strongly associated SNPs (modified lines 259-260 and 398). For thousand seed weight, we showed that the SNPs with significant p-values are located within both the CqRING and CqPP2C genes. We also modified the text accordingly (lines 24, 81, 249, 251, 254-255, 274, 275, 285-286, 287-290, 300-302, 391, 396, 397, 398, 405-406, 409-410, 413-418, 420-422).

    4. The manuscript includes statements that a particular genotype "results in" some phenotypic outcome, although no causal relationship has been demonstrated. In general, there is a tendency to draw too strong conclusions based on the GWAS results.

    We modified the text based on the reviewer’s comment, rephrasing such statements as “associated with”.

    5. As this is primarily a resource paper, the authors should make the complete genotype and phenotype data as well as the layout of the field trials available. It would not be possible to reproduce the GWAS analysis based on the data included with the current version. They should also clarify how the quinoa accessions described will be made accessible to the community and provide all scripts used for data analysis through GitHub or a similar repository.

    Most of the accessions are available from the IPK Gatersleben and USDA genebanks. Materials that are not available from the genebanks can be obtained from the authors under a Standard Material Transfer Agreement (SMTA). Genomic data (ready-to-use VCF files) and phenotypic data have been made available through the Dryad repository https://doi.org/10.5061/dryad.zgmsbcc9m. Raw sequencing data are available from NCBI SRA. Detailed descriptions of the germplasm, phenotyping methods, and phenotypes are also posted at https://quinoa.kaust.edu.sa/#/ and published in Stanschewski et al., 2021 (see lines 603-607).
