Adaptive Segmentation and Statistical Analysis for Multivariate Big Data Forecasting

Desmond Fomo
Aki-Hiro Sato

Abstract

Forecasting high-volume univariate and multivariate longitudinal data streams is a critical challenge in Big Data systems, especially under constrained computational resources and pronounced data variability. However, existing approaches often neglect the statistical complexity of multivariate time series (e.g., covariance, skewness, kurtosis) or rely on recency-only windowing that discards informative historical fluctuation patterns, which limits their robustness under strict resource budgets. This work makes two core contributions to big data forecasting. First, we establish a formal, multi-dimensional framework for quantifying “data bigness” across statistical, computational, and algorithmic complexity, providing a rigorous foundation for analyzing resource-constrained problems. Second, guided by this framework, we extend and validate the Adaptive High-Fluctuation Recursive Segmentation (AHFRS) algorithm for multivariate time series. By incorporating second- and higher-order statistics (covariance, skewness, and kurtosis), AHFRS improves predictive accuracy under strict computational budgets. We validate the approach in two stages. First, a real-world case study on a univariate Bitcoin time series provides a practical stress test, using a Long Short-Term Memory (LSTM) network as a robust baseline. This validation shows a marked gain in forecasting robustness: in a challenging scenario, our method reduces the Root Mean Squared Error (RMSE) by more than 76%. Second, we establish the method's generalizability on synthetic multivariate datasets in Finance, Retail, and Healthcare using standard statistical models. Across domains, AHFRS consistently outperforms the baselines: RMSE decreases by up to 62.5% in the multivariate Finance simulation, and the Mean Absolute Percentage Error (MAPE) drops by more than 10 percentage points in Healthcare.
These results demonstrate that the proposed framework and AHFRS together advance both the theoretical modeling of data complexity and the design of adaptive, resource-efficient forecasting pipelines for real-world, high-volume data ecosystems.
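The abstract does not spell out how AHFRS works internally, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It scores a window of a multivariate series with a hypothetical fluctuation score built from covariance, skewness, and excess kurtosis; recursively halves any window whose score exceeds a threshold; and then, under a fixed row budget, keeps the highest-scoring segments wherever they sit in the history rather than only the most recent rows. The function names (`fluctuation_score`, `segment`, `select_under_budget`), the score weighting, the halving rule, and the `threshold`, `min_len`, and `budget` parameters are all hypothetical. RMSE and MAPE are included because the abstract reports results in those metrics.

```python
import math
from typing import List, Tuple

# Rows are time steps, columns are variables. All names below are hypothetical.
Series = List[List[float]]
Segment = Tuple[int, int]  # half-open [start, end) row range


def column_stats(rows: Series) -> List[Tuple[float, float, float, float]]:
    """Per-variable (mean, variance, skewness, excess kurtosis) from population moments."""
    n = len(rows)
    stats = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        m2 = sum((x - mean) ** 2 for x in col) / n
        m3 = sum((x - mean) ** 3 for x in col) / n
        m4 = sum((x - mean) ** 4 for x in col) / n
        if m2 == 0.0:
            stats.append((mean, 0.0, 0.0, 0.0))  # constant column: no shape information
        else:
            stats.append((mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0))
    return stats


def covariance(rows: Series) -> List[List[float]]:
    """Population covariance matrix between the variables."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / n
             for b in range(d)] for a in range(d)]


def fluctuation_score(rows: Series) -> float:
    """Hypothetical score: total variance + mean |cross-covariance|
    + mean (|skewness| + |excess kurtosis|) over variables."""
    cov = covariance(rows)
    d = len(cov)
    total_var = sum(cov[i][i] for i in range(d))
    cross = [abs(cov[a][b]) for a in range(d) for b in range(d) if a != b]
    mean_cross = sum(cross) / len(cross) if cross else 0.0
    shape = sum(abs(s[2]) + abs(s[3]) for s in column_stats(rows)) / d
    return total_var + mean_cross + shape


def segment(rows: Series, start: int, end: int,
            threshold: float, min_len: int) -> List[Segment]:
    """Recursively halve [start, end) while its score exceeds the threshold."""
    if end - start < 2 * min_len or fluctuation_score(rows[start:end]) <= threshold:
        return [(start, end)]
    mid = (start + end) // 2
    return (segment(rows, start, mid, threshold, min_len)
            + segment(rows, mid, end, threshold, min_len))


def select_under_budget(rows: Series, segments: List[Segment], budget: int) -> List[Segment]:
    """Greedily keep the highest-scoring segments whose total length fits the budget."""
    ranked = sorted(segments, key=lambda s: fluctuation_score(rows[s[0]:s[1]]), reverse=True)
    chosen, used = [], 0
    for s, e in ranked:
        if used + (e - s) <= budget:
            chosen.append((s, e))
            used += e - s
    return sorted(chosen)  # restore chronological order for training


def rmse(y: List[float], y_hat: List[float]) -> float:
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y: List[float], y_hat: List[float]) -> float:
    """Mean Absolute Percentage Error in percent; assumes no zero targets."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


if __name__ == "__main__":
    calm = [[1.0, 2.0] for _ in range(8)]
    volatile = [[float((-1) ** t * t), 0.5 * t] for t in range(8)]
    rows = calm + volatile
    segs = segment(rows, 0, len(rows), threshold=1.0, min_len=4)
    print(segs)                                       # [(0, 8), (8, 12), (12, 16)]
    print(select_under_budget(rows, segs, budget=8))  # [(8, 12), (12, 16)]
```

In the usage example, the constant first half stays a single segment, the volatile second half is split into two finer segments, and a budget of 8 rows keeps the two volatile segments while dropping the calm one. Because this hypothetical score adds quantities on different scales (variances versus dimensionless shape statistics), a practical version would likely standardize each variable before scoring.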

Version published to 10.3390/bdcc9110268 (Oct 24, 2025)
Version published to 10.20944/preprints202508.1677.v1 (Aug 22, 2025)
