Efficient data selection for time series forecasting using a lightweight linear proxy framework

Abstract

Time series forecasting is pivotal in domains such as finance, transportation, and meteorology. In practical engineering applications, model performance hinges on both the quality and the quantity of data. Large-scale datasets, however, suffer from noise and redundancy, while some scenarios offer too little data to train on. To address both problems, this paper proposes a unified data selection framework based on Linear Proxy and Mirrored Influence. The approach estimates the value of each sample through lightweight forward passes, avoiding expensive gradient calculations. The framework serves two functions within a single architecture. First, for standard training scenarios, we design an in-domain pre-selection mechanism guided by a validation set. This mechanism identifies and removes detrimental samples before training, improving both the training efficiency and the prediction accuracy of the subsequent main model. Second, for few-shot scenarios, we propose a cross-domain data retrieval strategy. Guided by the limited target-domain data, this strategy adaptively selects beneficial samples with consistent distributions from a large-scale source-domain pool, mitigating data scarcity. Extensive experiments demonstrate that our method addresses both training-set denoising and cross-domain data augmentation while significantly reducing computational cost.
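
The abstract does not give the algorithm, so the following is only a minimal sketch of how validation-guided pre-selection with a linear proxy might look, not the authors' implementation. It assumes a closed-form ridge regression as the linear proxy, and it assumes one reading of the mirrored-influence idea: nudge the proxy toward the validation set, then score each training sample by how much its loss drops, which needs only forward passes over the training pool. All function names and parameters (`make_windows`, `fit_ridge`, `mirrored_scores`, `select_top`, `lr`, `steps`, `keep_ratio`) are hypothetical.

```python
import numpy as np


def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, target window) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y


def fit_ridge(X, Y, lam=1e-2):
    """Closed-form ridge regression, used here as the assumed linear proxy."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def per_sample_loss(W, X, Y):
    """Mean squared error of each sample under weights W (a forward pass)."""
    return np.mean((X @ W - Y) ** 2, axis=1)


def mirrored_scores(X_tr, Y_tr, X_val, Y_val, lr=1e-2, steps=5, lam=1e-2):
    """Score training samples by their loss drop after a validation update.

    Assumption: a positive score means the sample agrees with the direction
    the validation set pulls the proxy, so it is treated as beneficial.
    """
    W_before = fit_ridge(X_tr, Y_tr, lam)
    W_after = W_before.copy()
    for _ in range(steps):
        # Gradient of the validation MSE for the linear proxy only.
        grad = 2.0 * X_val.T @ (X_val @ W_after - Y_val) / len(X_val)
        W_after -= lr * grad
    # Two forward passes over the training pool; no per-sample gradients.
    return per_sample_loss(W_before, X_tr, Y_tr) - per_sample_loss(W_after, X_tr, Y_tr)


def select_top(scores, keep_ratio=0.8):
    """Return sorted indices of the highest-scoring fraction of samples."""
    k = max(1, int(round(len(scores) * keep_ratio)))
    return np.sort(np.argsort(scores)[::-1][:k])
```

Under the same assumptions, the cross-domain case would reuse these functions unchanged: the source-domain pool takes the place of the training windows and the few target-domain windows take the place of the validation set. The retained indices would then be used to train the main forecasting model.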
