Adaptive Segmentation and Statistical Analysis for Multivariate Big Data Forecasting
Abstract
Forecasting high-volume univariate and multivariate longitudinal data streams is a critical challenge in Big Data systems, especially under constrained computational resources and pronounced data variability. This paper addresses the challenge with two contributions. First, we propose a theoretical framework for quantifying “data bigness” as a function of statistical, computational, and algorithmic complexity, which allows a more precise formalization of resource-bound analytics in dynamic environments. Second, we present the Adaptive High-Fluctuation Recursive Segmentation (AHFRS) framework, which leverages multivariate fluctuation statistics to construct compact, information-dense training subsets within bounded memory windows. Unlike static or recency-based methods, AHFRS dynamically selects historical segments with significant variance, which improves predictive signal retention under strict computational budgets. The framework is validated on both real-world univariate and synthetic multivariate datasets. A univariate case study using hourly Bitcoin price data demonstrates the early effectiveness of the approach, which is then extended to synthetically generated longitudinal datasets in the Finance, Retail, and Healthcare domains, each modeling domain-specific temporal dynamics while controlling for population heterogeneity. Forecasting is performed on a per-customer basis to simulate individualized inference under constrained memory conditions. Experimental results demonstrate that AHFRS consistently improves predictive performance across learning models and domains. This approach advances both the theoretical modeling of data complexity and the design of adaptive, resource-efficient forecasting pipelines for real-world, high-volume data ecosystems.
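The core selection idea in the abstract, keeping the highest-variance historical segments that fit within a bounded memory window, can be illustrated with a minimal Python sketch. This is not the authors' AHFRS algorithm: it is a non-recursive simplification. The function names, the fixed non-overlapping segmentation, the scoring rule (mean per-variable population variance), and the row-count memory budget are all assumptions made for illustration.

```python
from statistics import pvariance


def fluctuation_score(segment):
    """Hypothetical score: mean per-variable population variance of a segment.

    `segment` is a list of rows, each row a list of floats (one per variable).
    """
    columns = list(zip(*segment))
    return sum(pvariance(col) for col in columns) / len(columns)


def select_high_fluctuation_segments(rows, segment_len, memory_budget):
    """Keep the highest-scoring segments whose total size fits the budget.

    Assumptions (not from the paper): segments are consecutive and
    non-overlapping, a trailing partial segment is dropped, and
    `memory_budget` is expressed as a maximum number of rows.
    """
    starts = range(0, len(rows) - segment_len + 1, segment_len)
    scored = [(fluctuation_score(rows[s:s + segment_len]), s) for s in starts]

    # Number of whole segments that fit in the bounded memory window.
    max_segments = memory_budget // segment_len

    # Highest score first; ties are broken in favour of the later segment.
    top = sorted(scored, reverse=True)[:max_segments]

    # Restore chronological order so the subset is still a valid time series.
    chosen_starts = sorted(start for _, start in top)
    return [row for s in chosen_starts for row in rows[s:s + segment_len]]


# Toy two-variable history for one customer (illustrative values only).
history = [
    [1.0, 10.0], [1.0, 10.0], [1.0, 10.0],   # flat segment
    [1.0, 10.0], [5.0, 2.0], [9.0, 18.0],    # volatile segment
    [2.0, 11.0], [2.1, 11.0], [2.0, 11.1],   # nearly flat segment
    [0.0, 0.0], [8.0, 20.0], [1.0, 3.0],     # volatile segment
]
subset = select_high_fluctuation_segments(history, segment_len=3, memory_budget=6)
```

With a budget of six rows, the sketch retains the two volatile segments and discards the flat ones, whereas a recency-based window would keep the last six rows regardless of how informative they are.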