Digital Data and Machine Learning for Influenza Prediction: Enhancing Healthcare Sustainability in Norway

Mesay Moges Menebo

Abstract

Background: Influenza presents a significant public health challenge globally, with seasonal outbreaks straining healthcare systems. Healthcare centers often experience high traffic from influenza-like illnesses (ILIs), many of which require only basic self-care advice, so these visits contribute to avoidable congestion. Timely ILI forecasts could support alternative strategies, such as SMS-based guidance, to reduce unnecessary visits. Internet search data offer real-time insight into public health trends and may improve upon traditional surveillance systems. This study assessed the effectiveness of using Google search query data, alongside ILI incidence, to forecast influenza activity in Norway with machine learning models.

Methods: Weekly ILI data from the Norwegian Syndromic Surveillance System (NorSySS) were collected from 2006 to 2024, along with normalized Google search query data for 13 influenza-related terms. Pearson correlation analysis was conducted to identify search terms significantly associated with ILI incidence. Machine learning models, including Linear Regression, Random Forest, XGBoost, Support Vector Regression (SVR), and Long Short-Term Memory (LSTM) networks, were used to predict ILI incidence. Model performance was evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R-squared (R²).

Results: The final predictor matrix combined 29 symptom- and medication-related Google search terms, identified through Pearson correlation (r ≥ 0.30), mutual information, and LASSO regression, with their lagged variants. These features were used to train the machine learning models. Among them, Random Forest achieved the best predictive performance (RMSE = 0.47, R² = 0.62), closely followed by XGBoost (RMSE = 0.48, R² = 0.60). Linear Regression and SVR showed moderate accuracy, while LSTM performed worst (RMSE = 0.76, R² = 0.11). Compared to LSTM, Random Forest reduced prediction error (RMSE) by 38% and most accurately captured weekly ILI trends.

Conclusions: This study highlights the potential of integrating online search query data with machine learning models to improve the accuracy of influenza forecasting. The findings support the use of digital data sources as a complementary tool for influenza prediction, contributing to more sustainable healthcare resource management and timely public health interventions.
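To make the reported evaluation metrics concrete, the following minimal Python sketch computes RMSE, MAE, and R² from their standard definitions and reproduces the relative error reduction from the two RMSE values quoted in the abstract (0.76 for LSTM, 0.47 for Random Forest). The function names and the toy observed/predicted values are assumptions for illustration only; they are not the study's code or data.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Toy values (hypothetical, NOT study data) to exercise the metric functions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))

# RMSE values quoted in the abstract: LSTM = 0.76, Random Forest = 0.47.
reduction = relative_reduction(0.76, 0.47)
print(f"{reduction:.0%}")  # (0.76 - 0.47) / 0.76 rounds to 38%
```

Under this reading, the 38% figure is the relative drop in RMSE from LSTM to Random Forest.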

Version published to 10.21203/rs.3.rs-5326310/v1 on Research Square
Oct 31, 2025

Related articles

Lowering Barriers to AI Adoption in Regional Hospitals: Predicting Patient Volumes from Minimal Data

This article has 6 authors:
1. Stefan Förstel
2. Markus Förstel
3. Markus Gallistl
4. Dario Zanca
5. Bjoern M. Eskofier
6. Eva M. Rothgang
This article has no evaluations. Latest version: Oct 14, 2025

Machine Learning for Estimating Catastrophic Health Spending in Disaster-Affected, Data-Scarce Settings

This article has 3 authors:
1. Rozana Himaz
2. Dimitra Salmanidou
3. Saman Ghaffarian
This article has no evaluations. Latest version: Oct 3, 2025

Performance evaluation of RespiCast ensemble forecasts for primary care syndromic indicators of viral respiratory disease in Europe during the 2023/24 winter season

This article has 32 authors:
1. Nicolò Gozzi
2. Corrado Gioannini
3. Paolo Milano
4. Ivan Vismara
5. Luca Rossi
6. Marco Quaggiotto
7. Stefania Fiandrino
8. Daniela Paolotti
9. Alessandro Vespignani
10. Helen Johnson
11. Rok Grah
12. Sebastian Funk
13. Katharine Sherratt
14. Niel Hens
15. Steven Abrams
16. Maikel Bosschaert
17. Francesco Celino
18. Lorenzo Zino
19. Alessandro Rizzo
20. Sasikiran Kandula
21. Birgitte Freiesleben de Blasio
22. Atte Aalto
23. Daniele Proverbio
24. Giulia Giordano
25. Jorge Gonçalves
26. Yuhan Li
27. Nicola Perra
28. Ajibola Omokanye
29. Leah J. Martin
30. Rene Niehus
31. Jose Canevari
32. Eva Bons
This article has no evaluations. Latest version: Nov 2, 2025
