Applying Convolutional Vision Transformer for Emotion Recognition of Children with Autism: Fusion of Facial Expressions and Speech Features
Abstract
With advances in digital technology, including deep learning and big data analytics, new methods have been developed for autism diagnosis and intervention. Emotion recognition and the detection of autism in children are prominent subjects in autism research. Previous research has typically relied on single-modal data to analyze the emotional states of children with autism, and the accuracy of its recognition algorithms needs improvement. Our study creates facial-expression and speech emotion datasets of children with autism in their natural states. A convolutional vision transformer-based emotion recognition model is constructed for each of the two datasets. The findings indicate that the model achieves accuracies of 79.12% and 83.47% for facial expression recognition and Mel spectrogram recognition, respectively. We therefore propose a multimodal data fusion strategy for emotion recognition and construct a feature fusion model based on an attention mechanism, which attains a recognition accuracy of 90.73%. Finally, gradient-weighted class activation mapping is used to produce prediction heat maps that visualize facial expressions and speech features under four emotional states. This study offers technical direction for the use of intelligent perception technology in special education and enriches the theory of emotional intelligence perception for children with autism.
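The article itself does not include code, but as a rough illustration of what the attention-based fusion of facial and speech features described above might look like, here is a minimal PyTorch sketch. The `AttentionFusion` module, feature dimensions, head count, and mean pooling are all illustrative assumptions, not the authors' published architecture; the inputs stand in for token sequences produced by the two modality-specific convolutional vision transformer branches.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Toy sketch of attention-based fusion of face and speech embeddings.
    Dimensions and layer choices are assumptions for illustration only."""
    def __init__(self, dim=256, num_heads=4, num_classes=4):
        super().__init__()
        # Cross-attention: face tokens attend to speech tokens
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, num_classes),  # logits over four emotional states
        )

    def forward(self, face_feat, speech_feat):
        # face_feat, speech_feat: (batch, seq_len, dim) token sequences
        # from the two backbone branches (hypothetical shapes).
        fused, _ = self.cross_attn(query=face_feat,
                                   key=speech_feat,
                                   value=speech_feat)
        pooled = fused.mean(dim=1)      # average-pool the fused tokens
        return self.classifier(pooled)  # emotion logits

# Usage with random stand-in features
model = AttentionFusion()
face = torch.randn(8, 49, 256)    # e.g. 7x7 patch tokens from the face branch
speech = torch.randn(8, 49, 256)  # tokens from the Mel-spectrogram branch
logits = model(face, speech)      # shape: (8, 4)
```

Cross-attention is one plausible reading of "feature fusion based on an attention mechanism"; simpler variants (concatenation followed by self-attention, or learned modality weights) would fit the abstract's description equally well.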