A Domain-Adaptive CNN-Transformer Deep Learning Model for Real-time Eye-tracking Based Classification and Scanpath Prediction in Energy-Constrained Wireless Devices

Abstract

Cultural heritage understanding and preservation are crucial for society, as they represent fundamental aspects of identity. Paintings, as a significant part of cultural heritage, continuously attract scholarly attention, particularly regarding viewer perception and the Human Visual System (HVS). This paper presents a novel approach to predicting human visual attention through eye-movement analysis during the visual experience of various paintings. We introduce a fully convolutional neural network (FCNN) designed to generate scanpaths—sequences of points likely to capture viewer attention—by incorporating differentiable channel-wise selection and Soft-Argmax modules. Additionally, our model uses learnable Gaussian distributions to simulate the visual attention biases present in natural scenes, while mitigating domain-shift effects through unsupervised general feature learning via a domain classifier trained with gradient reversal. Our results demonstrate superior accuracy and efficiency compared to existing state-of-the-art methods.

In a related exploration of cognitive processes, we investigate eye movements in reading, focusing on dyslexic and non-dyslexic readers. Traditional approaches that classify readers based on aggregated eye-movement features have been insufficient, as they overlook the sequential nature of eye movements and their interaction with linguistic stimuli. We propose two sequence models that analyze eye movements without feature aggregation, incorporating contextualized word embeddings and linguistic features. Evaluated on a Mandarin Chinese dataset, our models achieve state-of-the-art performance in dyslexia classification, suggesting that even in logographic scripts, sequence models effectively capture eye-gaze patterns.
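The abstract does not spell out the Soft-Argmax or gradient-reversal components, but both are standard constructions. Below is a minimal NumPy sketch of the two ideas: a soft-argmax that turns a 2-D attention heatmap into an expected fixation coordinate differentiably (the `beta` temperature is an assumed parameter, not taken from the paper), and the identity-forward / negated-backward pair that characterizes a gradient-reversal layer.

```python
import numpy as np

def soft_argmax(heatmap, beta=10.0):
    """Differentiable surrogate for argmax over a 2-D heatmap.

    Applies a spatial softmax (sharpness controlled by beta) and
    returns the expected (row, col) coordinate under that
    distribution. Unlike a hard argmax, the output is smooth in the
    heatmap values, so gradients can flow through it during training.
    """
    h, w = heatmap.shape
    logits = beta * heatmap.ravel()
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    probs = probs.reshape(h, w)
    # Expected coordinates under the softmax distribution.
    y = (probs.sum(axis=1) * np.arange(h)).sum()
    x = (probs.sum(axis=0) * np.arange(w)).sum()
    return y, x

def grad_reverse_forward(features):
    """Gradient-reversal layer, forward pass: plain identity."""
    return features

def grad_reverse_backward(upstream_grad, lam=1.0):
    """Backward pass: flip the gradient's sign (scaled by lam).

    Plugged between the feature extractor and a domain classifier,
    this makes the extractor *maximize* domain-classification loss,
    pushing it toward domain-invariant features.
    """
    return -lam * upstream_grad

# A sharply peaked heatmap yields a coordinate near its peak.
hm = np.zeros((10, 10))
hm[3, 5] = 1.0
y, x = soft_argmax(hm, beta=50.0)
```

With a large `beta` the spatial softmax concentrates on the peak, so `(y, x)` approaches the hard argmax `(3, 5)`; a smaller `beta` blends in neighboring locations, which is what makes the operation trainable end-to-end.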
