Evidence for Daily and Weekly Periodic Variability in GPT-4o Performance

Paul Tschisgale
Peter Wulff

Abstract

Large language models (LLMs) are increasingly used in research both as tools and as objects of investigation. Much of this work implicitly assumes that LLM performance under fixed conditions (identical model snapshot, hyperparameters, and prompt) is time-invariant. If average output quality changes systematically over time, this assumption is violated, threatening the reliability, validity, and reproducibility of findings. To empirically examine this assumption, we conducted a longitudinal study on the temporal variability of GPT-4o's average performance. Using a fixed model snapshot, fixed hyperparameters, and identical prompting, GPT-4o was queried via the API to solve the same multiple-choice physics task every three hours for approximately three months. Ten independent responses were generated at each time point and their scores were averaged. Spectral (Fourier) analysis of the resulting time series revealed notable periodic variability in average model performance, accounting for approximately 20% of the total variance. In particular, the observed periodic patterns are well explained by the interaction of a daily and a weekly rhythm. These findings indicate that, even under controlled conditions, LLM performance may vary periodically over time, calling into question the assumption of time invariance. Implications for ensuring validity and replicability of research that uses or investigates LLMs are discussed.
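
To make the spectral analysis described in the abstract more concrete, the following is a minimal Python sketch of how the share of variance at a daily and a weekly period might be estimated from an evenly sampled score series. It is not the authors' code. The function name periodic_variance_share, the synthetic data, the amplitudes, and the noise level are all illustrative assumptions, and the two rhythms are simply added together here, which may differ from the interaction reported in the preprint.

```python
import numpy as np


def periodic_variance_share(scores, dt_days, periods_days):
    """Return the share of total variance found at each requested period.

    scores       -- evenly spaced series of averaged scores (hypothetical input)
    dt_days      -- sampling interval in days (0.125 for one sample per 3 hours)
    periods_days -- periods of interest in days, e.g. (1.0, 7.0)
    """
    x = np.asarray(scores, dtype=float)
    x = x - x.mean()                       # remove the mean so bin 0 carries no variance
    n = x.size
    power = np.abs(np.fft.rfft(x)) ** 2    # one-sided periodogram (unnormalized)
    freqs = np.fft.rfftfreq(n, d=dt_days)  # frequencies in cycles per day

    # Interior bins of a one-sided spectrum stand for +f and -f, so count them twice.
    weights = np.full(power.shape, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0                  # the Nyquist bin appears only once
    var_per_bin = weights * power / n**2   # by Parseval, sums to the series variance
    total = var_per_bin.sum()

    shares = {}
    for p in periods_days:
        k = int(np.argmin(np.abs(freqs - 1.0 / p)))  # nearest Fourier bin
        shares[p] = float(var_per_bin[k] / total)
    return shares


# Synthetic demonstration (all numbers below are illustrative assumptions).
rng = np.random.default_rng(0)
dt = 3 / 24                                # one measurement every three hours
t = np.arange(12 * 7 * 8) * dt             # 12 weeks, 8 samples per day
scores = (
    0.6
    + 0.03 * np.sin(2 * np.pi * t)         # daily rhythm
    + 0.02 * np.sin(2 * np.pi * t / 7)     # weekly rhythm
    + rng.normal(0.0, 0.06, t.size)        # sampling noise
)
shares = periodic_variance_share(scores, dt, (1.0, 7.0))
print({period: round(share, 3) for period, share in shares.items()})
```

The synthetic record spans a whole number of weeks, so both periods fall exactly on Fourier bins. For a record of arbitrary length, power leaks into neighboring bins, and summing over adjacent bins or fitting sinusoids directly may give a better estimate.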

Version published to 10.21203/rs.3.rs-8869261/v1 on Research Square
Feb 18, 2026