Research on Automatic Recognition Method of Inclusive Education Classroom Behavior Based on Pose Estimation and Multimodal Fusion


Abstract

A core challenge of inclusive education is that teachers struggle to effectively identify and respond to the diverse behavioral manifestations of students with special educational needs (SEN) in mainstream classrooms. Traditional observation methods are time-consuming, labor-intensive, and highly subjective. This paper explores the potential of computer vision (CV) technology in this field, aiming to construct an objective, automated framework for classroom behavior analysis. We propose a multimodal fusion method based on pose estimation and spatiotemporal modeling that captures students' nonverbal classroom behaviors (such as body posture, head orientation, and activity level) with an RGB camera and combines them with simple audio features (voice activity) for comprehensive analysis. We collected a dataset of behavioral patterns of typical SEN students (e.g., Autism Spectrum Disorder (ASD) and Attention Deficit Hyperactivity Disorder (ADHD)) in a simulated inclusive-education classroom environment and used it to validate the proposed method. Experimental results show that the system achieves high accuracy (85.2% on average) in identifying key behavioral indicators (such as "attention," "social interaction," and "abnormal behavior"), significantly outperforming baselines that rely solely on manual observation. This study demonstrates the effectiveness of computer vision as a professional support tool for teachers, offering a new technological approach to precise, personalized inclusive-education interventions.
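The abstract describes a late-fusion pipeline: pose-derived nonverbal features (activity level, head orientation) are combined with a voice-activity cue to produce a coarse behavior label. The sketch below illustrates one way such a fusion step could look; the keypoint layout, feature definitions, thresholds, and labels are all illustrative assumptions, not the paper's actual model or parameters.

```python
import numpy as np

def pose_features(keypoints):
    """Derive simple nonverbal features from a (T, K, 2) array of 2D pose
    keypoints over T frames.

    Assumed (hypothetical) layout: index 0 = nose, index 1 = neck; image
    coordinates, so smaller y means higher in the frame.

    Returns:
        activity:   mean frame-to-frame keypoint displacement (motion level)
        head_pitch: mean vertical nose-minus-neck offset (head-pose proxy)
    """
    # Per-keypoint Euclidean displacement between consecutive frames.
    displacement = np.linalg.norm(np.diff(keypoints, axis=0), axis=-1)
    activity = float(displacement.mean())
    head_pitch = float((keypoints[:, 0, 1] - keypoints[:, 1, 1]).mean())
    return activity, head_pitch

def fuse_and_classify(keypoints, voice_activity_ratio,
                      activity_thresh=5.0, pitch_thresh=-10.0):
    """Late fusion of pose and audio cues into a coarse behavior label.

    voice_activity_ratio: fraction of the clip in which the student's voice
    is detected (0..1). Thresholds are placeholder values chosen for the
    sketch, not values reported in the study.
    """
    activity, head_pitch = pose_features(keypoints)
    # Vocalizing while physically active -> likely interacting with peers.
    if voice_activity_ratio > 0.5 and activity > activity_thresh:
        return "social_interaction"
    # Head raised toward the front of the room and little motion -> attending.
    if head_pitch < pitch_thresh and activity < activity_thresh:
        return "attention"
    return "off_task"
```

In practice the (T, K, 2) keypoint array would come from an off-the-shelf pose estimator run on the RGB stream, and the hand-set thresholds would be replaced by a learned spatiotemporal classifier; the point here is only the shape of the fusion step.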