A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations


Abstract

Irregular multivariate time series with missing values present significant challenges for predictive modeling in domains such as healthcare. While deep learning approaches often focus on temporal interpolation or complex architectures to handle irregularities, we propose a simpler yet effective alternative: extracting time-agnostic summary statistics to eliminate the temporal axis. Our method computes four key features per variable—mean and standard deviation of observed values, and mean and variability of changes between consecutive observations—to create a fixed-dimensional representation. These features are then used with standard classifiers like logistic regression and XGBoost. Evaluated on four biomedical datasets (PhysioNet Challenge 2012, 2019, PAMAP2, and MIMIC-III), our approach achieves state-of-the-art performance, surpassing recent transformer and graph-based models by 0.5–1.7% in AUROC/AUPRC and 1.1–1.7% in accuracy/F1-score, while reducing computational complexity. Ablation studies demonstrate that feature extraction—not classifier choice—drives performance gains, with our summary statistics outperforming raw/imputed inputs across most benchmarks. Notably, we identify scenarios where missing patterns themselves encode predictive signals, as in sepsis prediction (PhysioNet 2019), where missing indicators alone can achieve 94.2% AUROC with XGBoost, only 1.6% lower than using original raw data as input. Our results challenge the necessity of complex temporal modeling when task objectives permit time-agnostic representations, offering an efficient, interpretable solution for irregular time series classification.
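The feature extraction described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not confirm: missing values are encoded as NaN, "variability" is taken to be the population standard deviation, changes are plain differences between consecutive observed values (not divided by the time gap), and a variable with no observations, or only one, falls back to zeros. The function names `summarize_variable` and `summarize_series` are hypothetical.

```python
import numpy as np


def summarize_variable(values):
    """Return [mean, std, mean of changes, std of changes] for one variable.

    Assumption: missing observations are encoded as NaN.
    """
    values = np.asarray(values, dtype=float)
    observed = values[~np.isnan(values)]  # drop missing entries, keep order

    # Assumption: a variable that was never observed maps to all zeros.
    if observed.size == 0:
        return np.zeros(4)

    # Changes between consecutive *observed* values; timestamps are ignored.
    diffs = np.diff(observed)
    if diffs.size == 0:
        # Assumption: a single observation has no change statistics.
        diff_mean, diff_std = 0.0, 0.0
    else:
        diff_mean, diff_std = diffs.mean(), diffs.std()

    return np.array([observed.mean(), observed.std(), diff_mean, diff_std])


def summarize_series(series):
    """Map a (time steps, variables) array to a fixed vector of length 4 * variables."""
    series = np.asarray(series, dtype=float)
    per_variable = [summarize_variable(series[:, v]) for v in range(series.shape[1])]
    return np.concatenate(per_variable)


# Toy example: 4 time steps, 2 variables, NaN marks a missing observation.
example = np.array([
    [1.0, np.nan],
    [np.nan, np.nan],
    [3.0, 5.0],
    [7.0, np.nan],
])
features = summarize_series(example)
```

Because the output length depends only on the number of variables, series of different lengths and sampling patterns map to vectors of the same size, which is what allows them to be passed to a standard classifier such as logistic regression or XGBoost.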
