A Case Study on the Stability of Neural Network Climate Prediction Models with Different Training Stop Criteria

Abstract

Randomness in the construction of machine learning models compromises reproducibility. This study investigates how randomness affects model stability and evaluates techniques for reducing its impact, using the widely adopted shallow neural network as a testbed. Randomness in this model arises from three sources: random initialization of the model parameters, random selection of a validation subset, and random sampling of batches for parameter updates. Of these, batch randomness has a much weaker impact than the other two. In this study, training is stopped either when validation performance fails to improve or when a preset threshold on the loss or the number of epochs is reached. The final model is considerably more stable under the threshold criteria than under the validation criterion, because the former avoids the randomness associated with selecting a validation subset. Sensitivity experiments show that scaling the model's initial parameters (i.e., weights) to 0.1 times their original values mitigates the impact of initialization randomness, markedly improving model stability while also substantially enhancing predictive skill. Two other commonly used techniques, weight decay and multi-model ensembles, can also markedly enhance model stability. In this case study, compressing the initial parameters improves stability more than weight decay does, and, unlike multi-model ensembles, it does not entail a substantial increase in computational cost, which makes it the preferable technique for improving model stability.
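The abstract does not specify the network architecture, framework, or hyperparameters. As a rough illustration only, the following NumPy sketch shows how the three randomness sources, the two kinds of stop criteria, compression of the initial weights, weight decay, and a multi-model ensemble could be wired together. The function names (`train_shallow_net`, `prediction_spread`, `ensemble_predict`), the tanh hidden layer, the Glorot-uniform default initialization, restoring the best validation weights, training on all samples under the threshold criterion, and every hyperparameter value are assumptions, not details taken from the study.

```python
import numpy as np


def train_shallow_net(X, y, *, hidden=16, init_scale=1.0, stop="threshold",
                      loss_threshold=1e-2, max_epochs=500, patience=20,
                      weight_decay=0.0, lr=0.05, batch_size=32, val_frac=0.2,
                      init_seed=0, split_seed=0, batch_seed=0):
    """Train a one-hidden-layer tanh regressor with mini-batch SGD.

    Hypothetical helper: each randomness source has its own seed so it can be
    varied independently. Names and defaults are illustrative assumptions.
    """
    n, d = X.shape
    # Source 2: the validation subset. It is drawn only for the validation
    # criterion; the threshold criterion (assumed here) trains on all samples.
    if stop == "validation":
        perm = np.random.default_rng(split_seed).permutation(n)
        n_val = int(val_frac * n)
        val_idx, tr_idx = perm[:n_val], perm[n_val:]
    else:
        val_idx, tr_idx = None, np.arange(n)

    # Source 1: initial weights. Glorot-uniform is an assumed default;
    # init_scale=0.1 compresses them to 0.1 times their original values.
    rng_init = np.random.default_rng(init_seed)
    lim1, lim2 = np.sqrt(6 / (d + hidden)), np.sqrt(6 / (hidden + 1))
    W1 = init_scale * rng_init.uniform(-lim1, lim1, (d, hidden))
    W2 = init_scale * rng_init.uniform(-lim2, lim2, (hidden, 1))
    b1, b2 = np.zeros(hidden), np.zeros(1)

    def predict(A):
        return np.tanh(A @ W1 + b1) @ W2 + b2

    def mse(idx):
        return float(np.mean((predict(X[idx]) - y[idx]) ** 2))

    # Source 3: the order in which mini-batches are drawn.
    rng_batch = np.random.default_rng(batch_seed)
    best_val, best_params, wait = np.inf, None, 0
    for _ in range(max_epochs):  # the loop bound is the epoch threshold
        order = rng_batch.permutation(tr_idx)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            h = np.tanh(X[batch] @ W1 + b1)
            err = (h @ W2 + b2 - y[batch]) * (2 / len(batch))  # dMSE/d(output)
            dh = (err @ W2.T) * (1 - h ** 2)  # backprop through tanh
            # L2 weight decay adds weight_decay * W to the weight gradients.
            W2 -= lr * (h.T @ err + weight_decay * W2)
            b2 -= lr * err.sum(axis=0)
            W1 -= lr * (X[batch].T @ dh + weight_decay * W1)
            b1 -= lr * dh.sum(axis=0)
        if stop == "threshold":
            # Stop once the training loss reaches the preset loss threshold.
            if mse(tr_idx) <= loss_threshold:
                break
        else:
            # Stop when validation loss fails to improve for `patience` epochs.
            val_loss = mse(val_idx)
            if val_loss < best_val:
                best_val, wait = val_loss, 0
                best_params = [p.copy() for p in (W1, b1, W2, b2)]
            else:
                wait += 1
                if wait >= patience:
                    break
    if best_params is not None:
        W1, b1, W2, b2 = best_params  # restore the best validation weights
    return predict


def prediction_spread(X, y, X_test, n_runs=10, **kwargs):
    """Hypothetical stability metric: mean std of test predictions across
    runs that differ only in init_seed."""
    preds = [train_shallow_net(X, y, init_seed=s, **kwargs)(X_test).ravel()
             for s in range(n_runs)]
    return float(np.std(preds, axis=0).mean())


def ensemble_predict(X, y, X_test, n_members=5, **kwargs):
    """Multi-model ensemble: average members trained with different init seeds."""
    return np.mean([train_shallow_net(X, y, init_seed=s, **kwargs)(X_test)
                    for s in range(n_members)], axis=0)


# Usage on synthetic data (not the study's climate data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(200, 1))
X_test = rng.normal(size=(50, 3))
for scale in (1.0, 0.1):
    print(scale, prediction_spread(X, y, X_test, init_scale=scale, max_epochs=200))
```

Varying only `init_seed` (or only `split_seed` or `batch_seed`) while holding the other seeds fixed is one way to isolate each randomness source. The printed spreads depend on this synthetic setup and are not the study's results.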

Article activity feed