Semantic Smoothness Optimization via Graph-Cut Energy Minimization for Temporal Action Segmentation

Abstract

Temporal action segmentation, crucial for understanding human activities in video content, remains a challenging task due to the complexity and variability of human actions. Existing approaches, such as temporal convolutional networks (TCNs) and transformer-based architectures, often fail to adequately model the intricate dependencies and semantic relationships between sequential actions. In this paper, we propose a novel framework formulated as an energy minimization problem to improve temporal action segmentation. Our approach incorporates data and smoothness costs, utilizing a graph-cut algorithm to achieve energy minimization. The data cost quantifies the likelihood of assigning appropriate semantic labels to frames based on visual features, while the smoothness cost ensures temporal consistency between neighboring frames by modeling semantic transitions. Extensive experiments on the GTEA, 50Salads, and Breakfast datasets demonstrate that our framework outperforms state-of-the-art methods, providing more accurate and temporally consistent action segmentation. By explicitly modeling semantic relationships and ensuring smooth action transitions, our approach contributes to more robust and reliable action recognition in untrimmed video sequences, with potential applications in robotics, video surveillance, and human-computer interaction. The code and datasets supporting the results of this study are publicly available at the project’s GitHub repository (https://github.com/MohannaAnsari/SSOG).
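To make the energy formulation concrete, here is a minimal sketch of an energy of the kind the abstract describes: a per-frame data cost plus a weighted pairwise smoothness cost between neighboring frames. The abstract does not give the exact cost definitions, so everything specific here is an assumption: the data cost is taken as the negative log of hypothetical per-frame class probabilities, the smoothness cost is a simple Potts penalty (0 for equal labels, 1 otherwise), and `lam` is a hypothetical weight. The sketch also does not use graph cuts; because the frames form a 1D chain, it swaps in dynamic programming (Viterbi), which minimizes this kind of chain energy exactly. The authors' actual implementation is in their repository and may differ substantially.

```python
import math


def segment_energy(labels, data_cost, smooth_cost, lam=1.0):
    """Energy of a labeling: summed data cost plus lam times summed transition cost."""
    data = sum(data_cost[t][label] for t, label in enumerate(labels))
    smooth = sum(smooth_cost[a][b] for a, b in zip(labels, labels[1:]))
    return data + lam * smooth


def minimize_chain_energy(data_cost, smooth_cost, lam=1.0):
    """Exact minimizer on a frame chain via dynamic programming (not graph cut)."""
    num_labels = len(data_cost[0])
    cost = list(data_cost[0])  # best energy of any prefix ending in each label
    back = []                  # back[t - 1][j]: best previous label if frame t is j
    for frame_cost in data_cost[1:]:
        new_cost, pointers = [], []
        for j in range(num_labels):
            best_i = min(range(num_labels),
                         key=lambda i: cost[i] + lam * smooth_cost[i][j])
            pointers.append(best_i)
            new_cost.append(cost[best_i] + lam * smooth_cost[best_i][j] + frame_cost[j])
        cost = new_cost
        back.append(pointers)
    # Trace the optimal labeling backwards from the cheapest final label.
    last = min(range(num_labels), key=lambda k: cost[k])
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return labels, cost[last]


# Hypothetical per-frame probabilities for two actions over six frames.
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
data_cost = [[-math.log(p) for p in row] for row in probs]  # assumed data cost
potts = [[0.0, 1.0], [1.0, 0.0]]                            # assumed smoothness cost

framewise = [min(range(2), key=row.__getitem__) for row in data_cost]
labels, energy = minimize_chain_energy(data_cost, potts, lam=1.0)
print("frame-wise:", framewise)
print("smoothed:  ", labels, "energy:", round(energy, 3))
```

On this toy input, the frame-wise labeling flips to the second action for a single frame, while the minimum-energy labeling absorbs that flip into the surrounding segment because one extra transition costs more than the small gain in data cost. A learned, semantics-aware transition matrix could replace `potts` without changing the rest of the sketch.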