A UI Image Captioning Method Based on Fourier Neural Operators and Multi-Branch Feature Fusion

Abstract

Mobile UI captioning is a complex task that generates concise natural language descriptions to convey the essential content and functionality of a screen, playing a pivotal role in language-based human-computer interaction. However, conventional image feature extraction networks often overlook semantic relationships between different regions of the screen, as well as the fine-grained details of important regions, leading to suboptimal description quality. To address these challenges, this paper proposes a feature-enhanced image encoding network for UI image captioning. The model leverages the concept of Fourier neural operators to perform computations in the frequency domain, facilitating information interaction and enabling the capture of richer global semantic information. Additionally, an attention mechanism is introduced to enhance the extraction of local fine-grained features. The model is trained and tested on the Screen2Words dataset, and experimental results demonstrate that it significantly improves the generation of captions for UI images.
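
To illustrate what computing in the frequency domain can look like, the following is a minimal NumPy sketch of a generic Fourier-neural-operator-style spectral layer. It is not the encoder proposed in the paper, whose details the abstract does not give. The function name `spectral_mix`, the weight tensors, and the `modes` parameter are hypothetical and chosen only for illustration. The sketch transforms a feature map with a 2D FFT, mixes channels on a small set of retained low-frequency modes, and transforms back, so every output position depends on every input position.

```python
import numpy as np


def spectral_mix(features, w_pos, w_neg, modes):
    """Generic FNO-style spectral layer (illustrative assumption, not the paper's model).

    features: real array of shape (C, H, W)
    w_pos, w_neg: complex arrays of shape (C_out, C, modes, modes), the
        channel-mixing weights for positive and negative row frequencies
    modes: number of low-frequency modes kept per axis
        (assumes modes <= H // 2 and modes <= W // 2 + 1)
    """
    _, H, W = features.shape
    # Real 2D FFT over the spatial axes; result has shape (C, H, W // 2 + 1).
    spec = np.fft.rfft2(features, axes=(-2, -1))

    c_out = w_pos.shape[0]
    out = np.zeros((c_out, H, W // 2 + 1), dtype=complex)
    # Mix channels on the retained low-frequency modes; all other modes stay zero.
    out[:, :modes, :modes] = np.einsum(
        "oikl,ikl->okl", w_pos, spec[:, :modes, :modes]
    )
    out[:, -modes:, :modes] = np.einsum(
        "oikl,ikl->okl", w_neg, spec[:, -modes:, :modes]
    )
    # Back to the spatial domain with the original height and width.
    return np.fft.irfft2(out, s=(H, W), axes=(-2, -1))
```

Because each retained Fourier mode is a sum over the whole feature map, a change in a single input position influences the output everywhere, which is one way to read the abstract's claim about capturing global semantic information. In a trained model the weights would be learned, and such a layer is typically combined with a pointwise linear path and a nonlinearity.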
