One does not fit all: Detecting work-related stress from mouse, keyboard, and cardiac data in the field

Abstract

Background

Continuously and unobtrusively monitoring work-related stress may help combat its detrimental effects on mental and physical health. For work in offices specifically, mouse and keyboard data have been suggested as highly suitable data sources for stress detection, in addition to physiological data such as heart rate variability (HRV). However, previous studies have yielded mixed results regarding the potential of mouse and keyboard data to detect stress, and very few works have examined the connection in real work environments. Moreover, concerns regarding the robustness and validity of existing stress detection approaches have been raised, emphasising the need for more rigorous investigations in the field.

Methods

We conducted an 8-week observational field study with office employees (N = 36) in which we collected mouse, keyboard, and cardiac data as well as self-reported stress during working hours. We derived mouse movement, keystroke dynamics, and HRV features, and trained regression models to detect self-reported stress levels using machine learning algorithms including Elastic Net, Random Forest, eXtreme Gradient Boosting (XGBoost), and Recurrent Neural Networks. We compared two distinct modelling approaches: (1) one-fits-all, and (2) personalised. The first approach aims to detect the stress levels of a participant using only training data from other participants. In the second, a separate model is trained for each participant on that participant's own data.
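
The difference between the two modelling approaches comes down to how samples are partitioned before training. The following is a minimal sketch of that partitioning only, not the study's pipeline; the function names and the toy participant labels are hypothetical.

```python
from collections import defaultdict


def group_by_subject(subject_ids):
    """Map each participant label to the indices of that participant's samples."""
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)
    return dict(by_subject)


def one_fits_all_splits(subject_ids):
    """Yield (held_out, train_idx, test_idx) with one participant held out per split.

    The model for a held-out participant only ever sees other participants' data.
    """
    by_subject = group_by_subject(subject_ids)
    for held_out, test_idx in by_subject.items():
        train_idx = [
            i
            for subject, indices in by_subject.items()
            if subject != held_out
            for i in indices
        ]
        yield held_out, train_idx, test_idx


# Toy example: six samples from three hypothetical participants.
subject_ids = ["p1", "p1", "p2", "p2", "p2", "p3"]

# One-fits-all: train on everyone except the held-out participant.
splits = list(one_fits_all_splits(subject_ids))

# Personalised: each participant's samples form a separate training pool.
groups = group_by_subject(subject_ids)
```

In the personalised case, each entry of `groups` would be split further into training and test portions within that participant's own data.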

Results

The one-fits-all modelling approach yields modest correlations with true labels (Spearman’s ρ = 0.078) under leave-one-subject-out cross-validation, with a slight improvement when incorporating time series of feature values (Spearman’s ρ = 0.096). In the personalised approach, XGBoost models trained on mouse and keyboard features reach an average Spearman’s ρ of 0.188 under blocked cross-validation. When optimised across machine learning models and feature sets, performance of the personalised approach further improves, reaching an average Spearman’s ρ of 0.296.
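
For readers unfamiliar with the evaluation terms, the sketch below illustrates Spearman's ρ (the Pearson correlation of ranks, with tied values sharing their average rank) and one common form of blocked cross-validation, in which time-ordered samples are cut into contiguous, unshuffled test blocks. The abstract does not specify the study's exact fold layout, so `blocked_folds` and its parameters are assumptions for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation between the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (var_x * var_y) ** 0.5


def blocked_folds(n_samples, n_folds):
    """Split indices 0..n_samples-1 into contiguous test blocks (no shuffling)."""
    bounds = [round(k * n_samples / n_folds) for k in range(n_folds + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(n_folds)]


# Toy check with hypothetical values: a perfectly monotone relationship.
rho_up = spearman_rho([1, 2, 3, 4], [10, 20, 30, 40])
rho_down = spearman_rho([1, 2, 3, 4], [40, 30, 20, 10])
folds = blocked_folds(10, 3)
```

Keeping test blocks contiguous avoids training on samples that lie immediately next to test samples in time, which matters when consecutive stress reports from one person are correlated. An average ρ would then be the mean of the per-participant values.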

Conclusion

Our results suggest that developing robust and valid stress detection models from in-field data remains challenging, reflecting the complexity of affective computing in naturalistic settings. Personalised modelling approaches show encouraging potential and warrant further exploration. We offer actionable recommendations to advance research on automated stress detection in real-world settings and openly share our dataset to promote innovation and collaboration within the research community.
