Diagnosing phenotypic signal before clustering: A simulation-based decision framework for agrobiodiversity studies

Abdel Kader NAINO JIKA

Abstract

Unsupervised clustering is widely applied to phenotypic data to explore population structure and guide decisions in agrobiodiversity research, particularly for neglected and underutilized species where genomic information is scarce. However, phenotypic datasets often exhibit weak differentiation, strong trait covariance, heteroscedasticity, and uneven sampling, raising fundamental questions about the reliability of clustering outcomes under such conditions. Here, we propose a signal-first diagnostic framework that evaluates the strength of phenotypic differentiation prior to clustering, rather than treating clustering as a default exploratory step. Using an empirically calibrated simulation design informed by trait distributions and covariance patterns observed in fonio (Digitaria exilis), we quantify clustering recoverability across a continuous gradient of phenotypic differentiation (Pst = 0.05–0.85) for eleven commonly used algorithms. Our results indicate that, under realistic trait architectures, meaningful recovery is not achievable below Pst ≈ 0.30 across the evaluated methods, and that internal validation metrics may provide misleading support for structure in low-signal regimes. The proposed framework offers a practical, transferable workflow for diagnosing when phenotypic clustering is informative, thereby supporting more robust interpretation of phenotypic diversity in data-constrained agrobiodiversity studies.
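To make the signal-first idea concrete, the following is a minimal sketch of checking a single trait's Pst before clustering. The abstract does not state how Pst is computed, so the sketch assumes the commonly used definition Pst = (c/h²)·σ²_B / ((c/h²)·σ²_B + 2·σ²_W), with variance components estimated from a one-way ANOVA. The function names `pst` and `signal_is_sufficient`, the parameter `c_over_h2`, and the example data are hypothetical; the 0.30 default only mirrors the approximate value reported in the abstract for the evaluated methods and is not a universal cutoff.

```python
from statistics import mean


def pst(groups, c_over_h2=1.0):
    """Estimate Pst for one trait from a list of per-population value lists.

    Assumed definition (not taken from the preprint):
        Pst = (c/h2 * var_between) / (c/h2 * var_between + 2 * var_within)
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = mean(v for g in groups for v in g)

    # One-way ANOVA mean squares
    ms_between = sum(
        n * (mean(g) - grand_mean) ** 2 for n, g in zip(sizes, groups)
    ) / (k - 1)
    ms_within = sum(
        (v - mean(g)) ** 2 for g in groups for v in g
    ) / (n_total - k)

    # Effective group size, which also handles unbalanced sampling
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    # Negative between-group estimates are truncated at zero
    var_between = max((ms_between - ms_within) / n0, 0.0)
    var_within = ms_within

    numerator = c_over_h2 * var_between
    denominator = numerator + 2.0 * var_within
    return numerator / denominator if denominator > 0 else 0.0


def signal_is_sufficient(pst_value, threshold=0.30):
    """Hypothetical gate: cluster only if Pst reaches the chosen threshold."""
    return pst_value >= threshold


# Illustrative (made-up) trait values for two populations
well_separated = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
overlapping = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]

print(signal_is_sufficient(pst(well_separated)))
print(signal_is_sufficient(pst(overlapping)))
```

In a multi-trait study the same check would be repeated per trait (or on a summary of traits), and the choice of `c_over_h2` would need its own sensitivity analysis, since it is rarely known for neglected and underutilized species.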

Version published to 10.21203/rs.3.rs-8574343/v1 on Research Square
Feb 12, 2026
