A Hybrid LSTM-Attention Model for Multivariate Time Series Imputation: Evaluation on Environmental Datasets
Abstract
Environmental monitoring systems generate large volumes of multivariate time series data from heterogeneous sensors, including those measuring soil, weather, and air quality parameters. However, sensor malfunctions and transmission failures frequently lead to missing values, which degrade the performance of downstream analytical and predictive models. To address this challenge, this study extends a previously proposed hybrid architecture that interleaves Long Short-Term Memory (LSTM) layers with a Multi-Head Attention mechanism for robust multivariate data imputation. The first LSTM layer captures short-term temporal dependencies, the attention layer emphasizes long-range relationships among correlated features, and the second LSTM layer re-integrates these enriched representations into a coherent temporal sequence. The model is evaluated on multiple environmental datasets covering soil temperature, meteorological variables (precipitation, temperature, wind speed, humidity), and air quality, at missingness levels ranging from 10% to 90%. Performance is compared against baseline methods, including K-Nearest Neighbour (KNN) and Bidirectional Recurrent Imputation for Time Series (BRITS). An ablation study further examines the contribution of each layer to overall model performance. Results demonstrate that the proposed hybrid model achieves higher accuracy and greater robustness than the baselines across datasets, confirming its effectiveness for environmental sensor data imputation under varying missing data conditions.
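The evaluation protocol summarized in the abstract (hiding between 10% and 90% of the values and scoring how well an imputer reconstructs them) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's code: the synthetic three-feature series, the completely-at-random masking in the hypothetical `mask_values`, the mean absolute error metric in `masked_mae`, and the simplified `knn_impute` baseline (k = 3, distance over jointly observed features) are all choices made here for illustration. The abstract does not specify the masking scheme, the error metric, or the KNN configuration.

```python
import math
import random


def mask_values(data, rate, seed=0):
    """Hide a fraction `rate` of all cells at random (assumed MCAR masking)."""
    rng = random.Random(seed)
    cells = [(t, f) for t in range(len(data)) for f in range(len(data[0]))]
    hidden = set(rng.sample(cells, round(rate * len(cells))))
    masked = [[None if (t, f) in hidden else v for f, v in enumerate(row)]
              for t, row in enumerate(data)]
    return masked, hidden


def knn_impute(masked, k=3):
    """Fill each missing cell with the mean of that feature over the k nearest rows.

    Distance is the root mean squared difference over features observed in
    both rows. Falls back to the column mean when no comparable row exists.
    """
    n_feat = len(masked[0])
    col_means = []
    for f in range(n_feat):
        observed = [row[f] for row in masked if row[f] is not None]
        col_means.append(sum(observed) / len(observed) if observed else 0.0)

    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [row[:] for row in masked]
    for t, row in enumerate(masked):
        for f in range(n_feat):
            if row[f] is not None:
                continue
            # Candidate neighbours: other rows where feature f is observed.
            candidates = sorted(
                (distance(row, other), other[f])
                for s, other in enumerate(masked)
                if s != t and other[f] is not None
            )
            nearest = [v for d, v in candidates[:k] if d != math.inf]
            filled[t][f] = sum(nearest) / len(nearest) if nearest else col_means[f]
    return filled


def masked_mae(truth, imputed, hidden):
    """Mean absolute error computed only on the artificially hidden cells."""
    return sum(abs(truth[t][f] - imputed[t][f]) for t, f in hidden) / len(hidden)


# Synthetic stand-in for a multivariate sensor series (three correlated features).
data = [[math.sin(0.1 * t), math.sin(0.1 * t) + 0.5, math.cos(0.1 * t)]
        for t in range(200)]

results = {}
for rate in (0.1, 0.3, 0.5, 0.7, 0.9):
    masked, hidden = mask_values(data, rate, seed=42)
    results[rate] = masked_mae(data, knn_impute(masked), hidden)
    print(f"missing={rate:.0%}  MAE={results[rate]:.4f}")
```

Scoring only the hidden cells keeps the comparison fair across missingness levels, since the observed cells are passed through unchanged by any imputer. Any other method, such as a learned model, could be substituted for `knn_impute` as long as it takes a masked series and returns a fully filled one.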