Evaluation of a machine learning approach utilizing wearable data for prediction of SARS-CoV-2 infection in healthcare workers

Robert P Hirten
Lewis Tomalin
Matteo Danieletto
Eddye Golden
Micol Zweig
Sparshdeep Kaur
Drew Helmus
Anthony Biello
Renata Pyzik
Erwin P Bottinger
Laurie Keefer
Dennis Charney
Girish N Nadkarni
Mayte Suarez-Farinas
Zahi A Fayad

Listed in

Evaluated articles (ScreenIT)
Evaluated articles (Rapid Reviews Infectious Diseases)

Abstract

Objective

To determine whether a machine learning model can detect SARS-CoV-2 infection from physiological metrics collected from wearable devices.

Materials and Methods

Health care workers from 7 hospitals were enrolled and prospectively followed in a multicenter observational study. Subjects downloaded a custom smart phone app and wore Apple Watches for the duration of the study period. Daily surveys related to symptoms and the diagnosis of Coronavirus Disease 2019 were answered in the app.

Results

We enrolled 407 participants, 49 (12%) of whom had a positive nasal SARS-CoV-2 polymerase chain reaction test during follow-up. We examined 5 machine-learning approaches and found that gradient-boosting machines (GBM) had the most favorable validation performance. Across all testing sets, our GBM model predicted SARS-CoV-2 infection with an average area under the receiver operating characteristic curve (auROC) of 86.4% (confidence interval [CI] 84–89%). The model was calibrated to value sensitivity over specificity, achieving an average sensitivity of 82% (CI ±∼4%) and specificity of 77% (CI ±∼1%). The most important predictors included age and parameters describing the circadian pattern of heart rate variability: its mean (MESOR) and peak timing (acrophase).
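The abstract does not say how the MESOR and acrophase were estimated. As a rough illustration only, the sketch below fits a standard single-component cosinor model, y(t) = M + A·cos(2πt/24 + φ), by ordinary least squares using the Python standard library. The function names (fit_cosinor, solve_linear), the 24-hour period, and the sample readings are assumptions made for this example, not details taken from the study.

```python
import math


def solve_linear(a, b):
    """Solve a small linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples must span the cycle")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_cosinor(hours, values, period=24.0):
    """Fit y = M + A*cos(w*t + phi) by least squares, with w = 2*pi/period.

    Returns the MESOR (M), amplitude (A), acrophase (phi, in radians)
    and the clock hour at which the fitted curve peaks.
    """
    w = 2.0 * math.pi / period
    # Linearised form: y = M + b*cos(w*t) + g*sin(w*t),
    # where b = A*cos(phi) and g = -A*sin(phi).
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in hours]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, b, g = solve_linear(xtx, xty)
    phi = math.atan2(-g, b)
    return {
        "mesor": mesor,
        "amplitude": math.hypot(b, g),
        "acrophase_rad": phi,
        "peak_hour": (-phi / w) % period,
    }


# Hypothetical, irregularly timed HRV readings (hour of day, value in ms).
sample_hours = [0.5, 3.0, 7.25, 9.0, 13.0, 16.5, 20.0, 22.75]
sample_hrv = [48.0, 52.5, 44.0, 41.0, 33.5, 31.0, 36.0, 43.0]
print(fit_cosinor(sample_hours, sample_hrv))
```

Because the fit uses only the time stamp of each reading, it does not need evenly spaced samples, which matters when a device records HRV sporadically.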

Discussion

We show that a tree-based ML algorithm applied to physiological metrics passively collected from a wearable device can identify and predict SARS-CoV-2 infection.

Conclusion

Applying machine learning models to the passively collected physiological metrics from wearable devices may improve SARS-CoV-2 screening methods and infection tracking.

Version published to 10.1093/jamiaopen/ooac041
Apr 6, 2022
Rapid Reviews Infectious Diseases
Jan 26, 2022

Aaron Hudson

Review 1: "Evaluation of a Machine Learning Approach Utilizing Wearable Data for Prediction of SARS-CoV-2 Infection in Healthcare Workers"

This study develops a prediction model for positive COVID-19 diagnosis using data collected from Apple Watches on heart rate variability (HRV) among healthcare workers. Reviewers highlight unclear model justifications and methodology.

Rapid Reviews Infectious Diseases
Jan 26, 2022

Toyya Pujol

Review 2: "Evaluation of a Machine Learning Approach Utilizing Wearable Data for Prediction of SARS-CoV-2 Infection in Healthcare Workers"

This study develops a prediction model for positive COVID-19 diagnosis using data collected from Apple Watches on heart rate variability (HRV) among healthcare workers. Reviewers highlight unclear model justifications and methodology.

Rapid Reviews Infectious Diseases
Jan 26, 2022

Strength of evidence

Reviewers: A Hudson (UC Berkeley) | 📒📒📒 ◻️◻️
T Pujol (RAND Corporation) | 📒📒📒 ◻️◻️

SciScore for 10.1101/2021.11.04.21265931: (What is this?)

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Antibodies
Sentences	Resources
Subjects completed daily surveys to report any COVID-19 related symptoms, symptom severity, the results for any SARS-CoV-2 nasal PCR tests, and SARS-CoV-2 antibody test results.	SARS-CoV-2 suggested: None

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

There are several limitations to our study. First, HRV was collected sporadically by the Apple Watch. We employed statistical modeling to account for this. However, a denser data set using continuous data would likely further improve our predictions. Second, the model we employed used a 7-day smoothing approach. This approach observed infection-induced changes in HRV later than if HRV was estimated using a single-day method. Thus, the approach we employed is conservative. An additional limitation is that the Apple Watch provides HRV measurements only in the SDNN time domain. This limits assessments between other types of HRV measurements and COVID-19 outcomes. Additionally, other factors might impact HRV, which we were not able to capture and control for in the analysis. Furthermore, we were not routinely checking for SARS-CoV-2 infections and relied on subjects reporting a COVID-19 diagnosis. Therefore, infections could have occurred that are not accounted for. Lastly, we did not externally validate our machine learning algorithm in another cohort.
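Two of these limitations can be made concrete with a small sketch. SDNN is the standard deviation of normal-to-normal (NN) beat intervals, and a multi-day smoother reacts to an abrupt change more slowly than a single-day estimate. The code below is an illustration under stated assumptions: the helper names (sdnn, trailing_mean), the use of the sample standard deviation, the trailing 7-day mean as the smoother, and the numbers are all hypothetical and are not the authors' implementation.

```python
import statistics


def sdnn(nn_intervals_ms):
    """SDNN: standard deviation of NN intervals, in milliseconds.

    Uses the sample standard deviation; that choice is an assumption here.
    """
    if len(nn_intervals_ms) < 2:
        raise ValueError("need at least two NN intervals")
    return statistics.stdev(nn_intervals_ms)


def trailing_mean(daily_values, window=7):
    """Mean of each day and the days before it, up to `window` days in total."""
    smoothed = []
    for i in range(len(daily_values)):
        chunk = daily_values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical daily SDNN values: stable for 10 days, then an abrupt drop.
daily_sdnn = [50.0] * 10 + [30.0] * 7
smoothed = trailing_mean(daily_sdnn)

# Compare the single-day value with the smoothed value around the drop.
for day in range(8, len(daily_sdnn)):
    print(day, daily_sdnn[day], round(smoothed[day], 2))
```

In this toy series the single-day value falls immediately on the day of the change, while the trailing mean only reaches the new level once the whole window lies after the change, which is the sense in which the smoothed approach is conservative.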

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

Version published to 10.1101/2021.11.04.21265931 on medRxiv
Nov 5, 2021

Related articles

Development and Deployment of a Machine Learning–Based Predictive Model for COVID-19 Infection Using Patient Demographic and Symptom Data in Nigeria

This article has 10 authors:
1. Olanrewaju Eniade
2. Ezekiel Ukwenga
3. Uchenna Akuka
4. Opeyemi Adeniyi
5. Elonna Obak
6. Omolola Adeagbo
7. Peter Babatunde Olaitan
8. Rita Ayanbolade Olowe
9. Tolulope Opakunle
10. Olugbenga Adekunle Olowe
This article has no evaluations. Latest version: Jan 25, 2026

Health Indicator Predictions from lifestyle and biometric data using Machine Learning Models

This article has 1 author:
1. Manuela Pop
This article has no evaluations. Latest version: Dec 19, 2025

Risk Stratification for In-Hospital Mortality in Alzheimer’s Disease Using Interpretable Regression and Explainable AI

This article has 3 authors:
1. Tursun Alkam
2. Ebrahim Tarshizi
3. Andrew H. Van Benschoten
This article has no evaluations. Latest version: Jan 7, 2026
