PrevOccupAI-HAR: A Public Domain Dataset for Smartphone Sensor-Based Human Activity Recognition in Office Environments


Abstract

This article presents PrevOccupAI-HAR, a new publicly available dataset designed to advance smartphone-based human activity recognition (HAR) in office environments. PrevOccupAI-HAR comprises two sub-datasets: (1) a model development dataset collected under controlled conditions, featuring 20 subjects performing nine sub-activities associated with three main activity classes (sitting, standing, and walking), and (2) a real-world dataset captured in an unconstrained office setting from 13 subjects carrying out their daily office work for six hours continuously. Three machine learning models, namely k-nearest neighbors (KNN), support vector machine (SVM), and random forest, were trained on the model development dataset to classify the three main classes independently of sub-activity variation. On the development dataset, KNN, SVM, and random forest achieved accuracies of 90.94 %, 92.33 %, and 93.02 %, respectively. When deployed on the real-world dataset, the models attained mean accuracies of 69.32 %, 79.43 %, and 77.81 %, corresponding to performance drops of 21.62, 12.90, and 15.21 percentage points, respectively. Analysis of sequential predictions revealed frequent short-duration misclassifications, predominantly between sitting and standing, resulting in unstable model outputs. The findings highlight key challenges in transitioning HAR models from controlled to real-world contexts and point to future research directions involving temporal deep learning architectures or post-processing methods to enhance prediction consistency.
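To illustrate the kind of post-processing the abstract points to, the following is a minimal sketch of a sliding-window majority vote over a sequence of predicted labels. It is not the authors' method: the function name `majority_smooth`, the window size, and the example label sequence are all assumptions made for illustration.

```python
from collections import Counter


def majority_smooth(labels, window=5):
    """Replace each label with the most common label in a centered window.

    Hypothetical helper for illustration; `window` must be a positive odd
    integer. At the sequence edges the window is simply truncated.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        neighborhood = labels[lo:hi]
        counts = Counter(neighborhood)
        top = max(counts.values())
        if counts[labels[i]] == top:
            # Keep the original label when it is (tied for) the majority.
            smoothed.append(labels[i])
        else:
            # Otherwise take the first majority label seen in the window.
            smoothed.append(next(l for l in neighborhood if counts[l] == top))
    return smoothed


# Made-up prediction sequence with two isolated one-step flips.
predictions = [
    "sitting", "sitting", "standing", "sitting", "sitting",
    "walking", "walking", "walking", "sitting", "walking", "walking",
]
smoothed = majority_smooth(predictions, window=3)
```

With a window of three predictions, the isolated "standing" and "sitting" flips are absorbed by their neighbors, while the genuine change from sitting to walking is preserved. A wider window suppresses longer spurious runs at the cost of delaying or erasing short real activities.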
