Efficient data selection for time series forecasting using a lightweight linear proxy framework

Xiang Ao
Mengru Chen

Abstract

Time series forecasting is pivotal in domains such as finance, transportation, and meteorology. In practical engineering applications, model performance hinges on the quality and quantity of data. However, the dual challenges of noise and redundancy in large-scale datasets, coupled with data scarcity in specific scenarios, remain significant hurdles. To address these issues, this paper proposes a unified data selection framework based on Linear Proxy and Mirrored Influence. This approach aims to rapidly evaluate sample value through lightweight forward passes, thereby circumventing expensive gradient calculations. The proposed method achieves two core functions within a unified architecture. Firstly, for standard training scenarios, we design an in-domain pre-selection mechanism guided by a validation set. This mechanism effectively identifies and eliminates detrimental samples prior to training, significantly enhancing both the training efficiency and prediction accuracy of the subsequent main model. Secondly, for few-shot scenarios, we propose a cross-domain data retrieval strategy. Leveraging limited target domain data as guidance, this strategy adaptively selects beneficial samples with consistent distributions from a large-scale source domain pool, effectively mitigating the data scarcity problem. Extensive experiments demonstrate that our method effectively resolves the challenges of training set denoising and cross-domain data augmentation while significantly reducing computational costs.
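The abstract does not spell out the scoring rule, so the following is only a minimal sketch of one plausible reading of "linear proxy plus mirrored influence", not the authors' implementation. It assumes a ridge-regression proxy that maps a lookback window to a forecast horizon, takes a single gradient step on a guidance set (the validation set in the in-domain case), and scores every candidate sample by how much its own loss drops after that step, which needs only forward passes over the pool. The names `per_sample_loss` and `mirrored_scores`, the `lr` and `ridge` values, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def per_sample_loss(W, X, Y):
    """Mean squared forecast error of the linear proxy, one value per sample."""
    return ((X @ W - Y) ** 2).mean(axis=1)

def mirrored_scores(X_pool, Y_pool, X_guide, Y_guide, lr=0.01, ridge=1e-3):
    """Score pool samples by how much one update on the guidance set lowers their loss."""
    d = X_pool.shape[1]
    # Linear proxy (assumed): closed-form ridge regression from lookback window to horizon.
    W = np.linalg.solve(X_pool.T @ X_pool + ridge * np.eye(d), X_pool.T @ Y_pool)
    # A single gradient step on the mean squared error of the guidance set.
    grad = 2.0 * X_guide.T @ (X_guide @ W - Y_guide) / Y_guide.size
    W_guided = W - lr * grad
    # Only forward passes over the pool: loss before the step minus loss after it.
    return per_sample_loss(W, X_pool, Y_pool) - per_sample_loss(W_guided, X_pool, Y_pool)

# Synthetic demo (hypothetical data): 80 clean windows followed by 20 corrupted ones.
rng = np.random.default_rng(0)
lookback, horizon = 8, 2
W_true = rng.normal(size=(lookback, horizon))
X_pool = rng.normal(size=(100, lookback))
Y_pool = X_pool @ W_true + 0.01 * rng.normal(size=(100, horizon))
Y_pool[80:] = -Y_pool[80:]  # corrupt the last 20 targets by flipping their sign
X_val = rng.normal(size=(30, lookback))
Y_val = X_val @ W_true + 0.01 * rng.normal(size=(30, horizon))

scores = mirrored_scores(X_pool, Y_pool, X_val, Y_val)
keep = scores > 0  # retain samples whose loss drops after the guided step
```

Under these assumptions, a positive score means that moving the proxy toward the guidance data also improves the fit on that sample, so the sample is a candidate to keep, while a negative score flags a possibly detrimental one. For the few-shot case described in the abstract, the same function could in principle be called with the limited target-domain data as the guidance set and the source-domain pool as the candidates.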

Version published to 10.21203/rs.3.rs-8425294/v1 on Research Square
Jan 13, 2026

Related articles

Comparative Study of Arima, Lstm and Prophet Models for Time Series Forecasting: A Comprehensive Review

This article has 1 author:
1. Hiteash Mahajan
This article has no evaluations. Latest version: Jan 27, 2026

Deep Learning-Based Uncertainty-Driven Robust Time Series Forecasting for Backend Service Metrics

This article has 6 authors:
1. Sijia Li
2. Chengda Xu
3. Chi Zhang
4. Bolin Chen
5. Zizhao Zhang
6. Zixiao Huang
This article has no evaluations. Latest version: Jan 29, 2026

Construction of a 0.01° Monthly Seamless XCO₂ Dataset over China: Based on a Temporally Adaptive Forest Model

This article has 5 authors:
1. Wenkai Zhang
2. Xi Chen
3. Li Duan
4. Shiran Song
5. Qian Zhou
This article has no evaluations. Latest version: Jan 28, 2026
