Predicting flowering time using integrated morphophysiological and genomic data with machine learning models


Abstract

Indigenous Cannabis sativa populations have adapted to diverse environments, resulting in substantial genetic and phenotypic diversity. Understanding the mechanisms underlying variation in flowering time is crucial for optimizing cultivation and breeding. This study employed a novel approach combining temporal phenotypic analysis, genomic data, and machine learning (ML) to identify key features associated with early, medium, and late flowering in cannabis landraces. We collected weekly data on six morphophysiological traits (stem diameter, height, growth rate, node number, internode length, and SPAD chlorophyll index) from 25 cannabis landrace populations over 13 weeks for female plants and 11 weeks for male plants. Additionally, 145 accessions were genotyped using high-density genotyping-by-sequencing, yielding 233,624 high-quality single nucleotide polymorphisms (SNPs). A comprehensive ML framework integrating mutual information (MI), recursive feature elimination (RFE), random forest (RF), and support vector machine (SVM) was used to investigate 234,002 features, encompassing SNPs, morphophysiological traits, and environmental factors. This approach identified 53 key features (22 genetic variants and 31 morphophysiological traits) that distinguish early, medium, and late flowering types with an accuracy of 96.6%. The identified SNPs were distributed across multiple chromosomes, including chromosomes 08, 09, and X. Notably, key loci such as AutoFlower3 (CsFT3) on chromosome 08 and CircadianFloweringLocus1 (CsCFL1) on chromosome 09 were identified, with several SNPs located within or near annotated genes. These findings advance the understanding of cannabis chronobiology and support the development of “smart crop” strategies by providing markers for early selection and targeted breeding programs aimed at optimizing flowering time under diverse conditions.
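A framework like the one described typically begins by ranking candidate features by mutual information with the flowering class, and then narrows the set with RFE and the RF/SVM classifiers. Below is a minimal sketch of such an MI filtering step only. It assumes every feature is already discrete: SNP genotypes coded 0/1/2, and continuous traits such as height or SPAD pre-binned. The abstract does not say which MI estimator, binning, or cutoff the authors used, so the function names, toy data, and top-k cutoff here are hypothetical. RFE and the RF/SVM stages are not shown.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Plug-in (count-based) MI, in bits, between a discrete feature and class labels.

    Assumes both sequences are discrete and aligned per accession (hypothetical encoding).
    """
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (feature value, class) pairs
    px = Counter(feature)                  # marginal counts of feature values
    py = Counter(labels)                   # marginal counts of classes
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def top_k_by_mi(features, labels, k):
    """Rank named feature columns by MI with the labels and keep the k highest."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    # Hypothetical toy data: six accessions, three candidate features.
    labels = ["early", "early", "medium", "medium", "late", "late"]
    features = {
        "snp_a": [0, 0, 1, 1, 2, 2],       # genotype tracks flowering class exactly
        "snp_b": [0, 1, 0, 1, 0, 1],       # genotype independent of class
        "height_bin": [0, 0, 0, 1, 1, 1],  # binned trait, partially informative
    }
    selected, scores = top_k_by_mi(features, labels, k=2)
    print(selected)  # → ['snp_a', 'height_bin']
```

In this toy example, `snp_a` reaches the maximum possible MI of log2(3) ≈ 1.585 bits, `height_bin` scores 2/3 bit, and `snp_b` scores 0. Plug-in MI estimates like this one are biased upward for features with many levels. That bias is one reason a cheap filter is usually followed by model-based selection such as RFE.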

Key Message

A data-driven machine learning strategy combining genomic and dynamic phenotypic traits enables accurate classification of flowering time in diverse Cannabis landraces.
