LSTM-Attention-Guided Graph Neural Networks for Integrated Genotype–Environment Modeling in Maize Yield Prediction

Abstract

We present a deep-learning framework that combines an LSTM, a graph neural network (GNN), and transformer-style attention to model genotype–environment (G × E) effects for maize yield prediction. Weather data for each growing season are summarized by an LSTM into a 21-dimensional embedding that serves as the environment node feature; 437,214 SNPs are reduced to 548 principal components that instantiate genotype nodes. Multi-head attention dynamically weights edges during message passing. We compare three architectures: A (fully connected bipartite graph), B (A plus intra-set top-k similarity edges within the genotype set and within the environment set), and C (B plus a single learnable supernode that attends over all nodes after message passing to produce a graph-level readout). The joint representations feed a compact MLP for yield prediction. Using a forward-time split (2014–2021 for training; 2022 for testing, with unseen genotypes and unseen environments), performance improves monotonically from A to C: A (RMSE 2.7749, PCC 0.4115, R² 0.1693), B (RMSE 2.3683, PCC 0.6622, R² 0.4385), C (RMSE 2.2120, PCC 0.6945, R² 0.4823). Relative to A, C reduces RMSE by 0.5629 (∼20.3%) and increases PCC by 0.2830 (∼68.8%), indicating that global, content-adaptive aggregation complements local G × E propagation. Performance remains consistent regardless of the number of genotypes per environment and stays strong under variable or unbalanced genotype sampling across environments.
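To make the described pipeline concrete, the sketch below shows one plausible PyTorch realization of architecture C: an LSTM that compresses a weather sequence into a 21-dimensional environment embedding, SNP principal components as genotype node features, multi-head attention acting as attention-weighted message passing over a dense genotype–environment graph, a learnable supernode that attends over all node states as a global readout, and a compact MLP over the joint (genotype, environment, supernode) representation. All layer sizes other than the 21-dimensional environment embedding and the 548 genotype principal components, the use of dense attention in place of an explicit edge list with top-k sparsification, and the class and argument names are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GxEYieldModel(nn.Module):
    """Hypothetical sketch of architecture C; hidden sizes, head counts,
    and the dense-attention message passing are assumptions."""

    def __init__(self, weather_dim=10, env_dim=21, geno_dim=548, hidden=64, heads=4):
        super().__init__()
        # LSTM summarizes a season's daily weather into a 21-d environment embedding
        self.weather_lstm = nn.LSTM(weather_dim, env_dim, batch_first=True)
        self.env_proj = nn.Linear(env_dim, hidden)
        self.geno_proj = nn.Linear(geno_dim, hidden)
        # multi-head attention used as edge-weighted message passing over all nodes
        self.edge_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # single learnable supernode query for the global, content-adaptive readout
        self.supernode = nn.Parameter(torch.randn(1, 1, hidden))
        self.readout_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # compact MLP over the joint genotype / environment / supernode representation
        self.mlp = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, weather_seq, geno_pcs, pairs):
        # weather_seq: (n_env, T, weather_dim); geno_pcs: (n_geno, geno_dim)
        # pairs: (n_pairs, 2) long tensor of (genotype index, environment index)
        _, (h_env, _) = self.weather_lstm(weather_seq)        # (1, n_env, env_dim)
        env = self.env_proj(h_env.squeeze(0))                 # (n_env, hidden)
        geno = self.geno_proj(geno_pcs)                       # (n_geno, hidden)
        nodes = torch.cat([geno, env], dim=0).unsqueeze(0)    # (1, N, hidden)
        # one round of attention-weighted message passing over the G x E graph
        nodes, _ = self.edge_attn(nodes, nodes, nodes)
        # supernode readout: one learnable query attends over all node states
        ctx, _ = self.readout_attn(self.supernode, nodes, nodes)
        nodes, ctx = nodes.squeeze(0), ctx.reshape(-1)
        g = nodes[pairs[:, 0]]                                # genotype embeddings
        e = nodes[geno.size(0) + pairs[:, 1]]                 # environment embeddings
        joint = torch.cat([g, e, ctx.expand(len(pairs), -1)], dim=-1)
        return self.mlp(joint).squeeze(-1)                    # (n_pairs,) predicted yield


if __name__ == "__main__":
    model = GxEYieldModel()
    weather = torch.randn(5, 120, 10)          # 5 environments, 120 daily time steps
    geno = torch.randn(8, 548)                 # 8 genotypes x 548 SNP principal components
    pairs = torch.tensor([[0, 0], [1, 2], [7, 4]])
    print(model(weather, geno, pairs).shape)   # torch.Size([3])
```

In the paper's architectures A and B the graph structure is explicit (a full bipartite edge set, optionally augmented with intra-set top-k similarity edges), whereas this sketch lets dense multi-head attention learn effective edge weights over all node pairs; a faithful reimplementation would restrict messages to the stated edge sets.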
