Genome-wide association mapping and predictive modeling of wet bean mass in a diverse cacao collection
Abstract
Yield improvement is a critical breeding target for the chocolate tree (Theobroma cacao L.), the source of cocoa beans. We investigated the genetic architecture of flower, fruit, and seed traits in 346 diverse cacao accessions using an integrated approach that combined phenotypic data (27 traits), genome-wide SNP genotyping (671 SNPs), and advanced machine learning. Our Bootstrap Forest-based GWAS identified novel SNP-trait associations, providing candidate genes linked to key agronomic traits such as pod index, seed number, and cotyledon mass. Phenotypic variation exhibited a complex, continuous relationship with genetic relatedness that was not fully explained by predefined accession groups. Our Neural Network model (R² = 0.715 in validation) found that cotyledon mass (portion = 0.520) and cotyledon length (portion = 0.213) were the major phenotypic contributors to the prediction of cacao total wet bean mass, a direct measurement of yield. This finding suggests that simple measurement of these early-stage phenotypic attributes may be useful in identifying productive clones of cacao. Overall, this study enhances our understanding of the genetic basis of trait variation in cacao and provides a powerful framework for implementing genomic selection to develop improved cacao varieties with enhanced yield potential.
Author summary
Improving yield in the chocolate tree, Theobroma cacao, is vital but complex, as yield results from the interplay of many genes. Identifying these genes and predicting which trees will be most productive remains a major challenge for breeders. In this study, we combined genetic information (DNA markers) with detailed measurements of 27 traits from 346 diverse cacao trees. Using advanced machine learning techniques, we pinpointed specific genetic regions associated with key yield components like pod characteristics and seed number. Furthermore, we developed a computational model that accurately predicts a tree’s total wet bean mass – a direct measure of yield. Surprisingly, this model revealed that the mass and length of the initial seed leaves (cotyledons) are the strongest predictors of future productivity. Our findings provide new insights into the genetic control of cacao yield and offer a potentially simple, early-stage tool to help breeders select superior trees more efficiently.