A Novel Hybrid Machine Learning Framework for Species Influence from Minimal Data
Abstract
Identifying ecologically influential species is crucial for biodiversity conservation. Yet classical keystone estimation approaches, such as the Ecopath-derived indices KS1, KS2, and KS3, require numerous ecological parameters that are often difficult to obtain and sensitive to uncertainty. To overcome these limitations, we introduce a hybrid machine learning framework that infers species influence from only two widely accessible inputs: diet composition matrices and biomass. The methodology integrates mechanistic descriptors (such as Relative Total Impact) with graph-based topological features (such as PageRank and extended degree centrality) and employs three core learning strategies: Random Forest for supervised prediction, Label Propagation for semi-supervised inference, and GraphSAGE for inductive graph representation learning. These complementary models are combined through an ensemble strategy to generate robust, generalizable species-influence predictions. The framework is evaluated across multiple Ecopath ecosystems for which diet matrices, biomass data, and expert-validated keystoneness rankings are available. Results demonstrate that the ensemble effectively approximates Ecopath-derived ranking behaviour while requiring far fewer ecological inputs. Importantly, the objective is not to introduce a new keystone metric but to evaluate whether hybrid machine learning models can reliably reproduce expert-derived species rankings. By reducing data requirements and improving cross-ecosystem generalizability, the proposed approach offers a scalable, evidence-driven tool for ecological assessment and conservation planning in data-scarce environments, thereby supporting conservation and SDG-aligned ecosystem management.
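To illustrate how topological features can be derived from just the two inputs named above, here is a minimal sketch. It rests on several assumptions that the abstract does not confirm. Diet fractions are stored as `diet[predator][prey]`. PageRank is run on weighted predator→prey links, so importance accumulates in species that many consumers depend on. This is one convention used in food-web PageRank analyses, not necessarily the authors' construction. Producers with no prey are treated as dangling nodes that redistribute their mass uniformly. The function names, species, and numbers are hypothetical. The sketch covers only simple degree and biomass-share features. It does not reproduce Relative Total Impact, "extended" degree centrality, the three learners, or the ensemble.

```python
from typing import Dict, List

Diet = Dict[str, Dict[str, float]]  # hypothetical layout: diet[predator][prey] = diet fraction


def pagerank(edges: Diet, nodes: List[str], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Weighted PageRank by power iteration; nodes with no out-links spread mass uniformly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Mass held by dangling nodes (here: producers with no prey) is shared by all nodes.
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in edges.items():
            total = sum(targets.values())
            for dst, w in targets.items():
                # Each predator passes rank to its prey in proportion to diet fraction.
                new[dst] += damping * rank[src] * w / total
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def species_features(diet: Diet, biomass: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Hypothetical feature table from a diet matrix and biomass only.

    Assumes every species that appears in `diet` also has an entry in `biomass`.
    """
    nodes = sorted(biomass)
    total_biomass = sum(biomass.values())
    # Keep only non-zero trophic links (predator -> prey).
    edges = {pred: {prey: f for prey, f in preys.items() if f > 0}
             for pred, preys in diet.items()}
    pr = pagerank(edges, nodes)
    features = {}
    for v in nodes:
        n_prey = len(edges.get(v, {}))
        n_predators = sum(1 for pred in edges if v in edges[pred])
        features[v] = {
            "pagerank": pr[v],
            "degree": n_prey + n_predators,  # plain (not extended) degree
            "biomass_share": biomass[v] / total_biomass,
        }
    return features


# Toy four-species web; names and values are illustrative only.
DIET: Diet = {
    "zooplankton": {"phytoplankton": 1.0},
    "small_fish": {"zooplankton": 0.8, "phytoplankton": 0.2},
    "cod": {"small_fish": 0.7, "zooplankton": 0.3},
}
BIOMASS = {"phytoplankton": 50.0, "zooplankton": 20.0, "small_fish": 8.0, "cod": 2.0}

if __name__ == "__main__":
    for species, feats in species_features(DIET, BIOMASS).items():
        print(species, feats)
```

In this toy web the basal producer ends up with the highest PageRank and the top predator with the lowest. This follows from the chosen link direction. Reversing the edges would reward consumers instead. Any real pipeline would need to fix this convention explicitly before features are passed to a learner.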