Development of an Interactive Digital Human with Context-Sensitive Facial Expressions
Abstract
With the increasing complexity of human–computer interaction scenarios, conventional digital human facial expression systems show notable limitations in handling multi-emotion co-occurrence, dynamic expression, and semantic responsiveness. This paper proposes a digital human system framework that integrates multimodal emotion recognition and compound facial expression generation. The system establishes a complete pipeline for real-time interaction and compound emotional expression, proceeding from speech semantic parsing through multimodal emotion recognition to Action Unit (AU)-level 3D facial expression control. First, a ResNet18-based model is employed for robust emotion classification using the AffectNet dataset. Then, an AU motion curve driving module is constructed on the Unreal Engine platform, where dynamic synthesis of basic emotions is achieved via a state-machine mechanism. Finally, a Generative Pre-trained Transformer (GPT) model is used for semantic analysis, generating structured emotional weight vectors that are mapped to the AU layer to enable language-driven facial responses. Experimental results demonstrate that the proposed system significantly improves facial animation quality, with naturalness ratings increasing from 3.54 to 3.94 and semantic congruence from 3.44 to 3.80. These results validate the system's capability to generate realistic and emotionally coherent expressions in real time. This research provides a complete technical framework and practical foundation for high-fidelity digital humans with affective interaction capabilities.
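To make the "emotional weight vector mapped to the AU layer" step more concrete, the sketch below illustrates one plausible form of that mapping: a linear blend from per-emotion weights into per-AU activation levels. The emotion set, AU selection, and blend coefficients here are hypothetical placeholders, not the parameters or implementation described in the paper.

```python
import numpy as np

# Illustrative sketch of the "emotion weight vector -> AU layer" mapping step.
# The emotion list, AU indices, and blend weights are hypothetical and only
# show the shape of the computation, not the paper's actual configuration.

EMOTIONS = ["happiness", "sadness", "anger", "surprise"]
AUS = ["AU1", "AU4", "AU6", "AU12", "AU15", "AU26"]

# Each row maps one basic emotion onto a few FACS Action Units
# (e.g. AU6 cheek raiser and AU12 lip corner puller for happiness).
EMOTION_TO_AU = np.array([
    #  AU1   AU4   AU6   AU12  AU15  AU26
    [0.0,  0.0,  0.8,  1.0,  0.0,  0.0],   # happiness
    [0.7,  0.5,  0.0,  0.0,  0.9,  0.0],   # sadness
    [0.0,  1.0,  0.0,  0.0,  0.3,  0.4],   # anger
    [0.9,  0.0,  0.0,  0.0,  0.0,  1.0],   # surprise
])

def emotion_weights_to_au(weights: dict[str, float]) -> dict[str, float]:
    """Blend per-emotion weights (e.g. parsed from a structured GPT response)
    into per-AU activation levels clipped to [0, 1]."""
    w = np.array([weights.get(e, 0.0) for e in EMOTIONS])
    au = w @ EMOTION_TO_AU          # linear blend of per-emotion AU templates
    au = np.clip(au, 0.0, 1.0)      # keep activations in the valid AU range
    return dict(zip(AUS, au.round(3)))

if __name__ == "__main__":
    # A compound state such as "pleasantly surprised"; in the described system
    # these weights would come from the language model's semantic analysis.
    print(emotion_weights_to_au({"happiness": 0.6, "surprise": 0.5}))
```

The resulting AU activation levels could then drive per-AU motion curves in an animation engine such as Unreal Engine; clipping keeps compound emotions from pushing any single AU beyond its valid intensity range.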