Geo-TCAM: A Thangka Captioning Method Integrating Topic Modeling with Geometry-Guided Spatial Attention

Abstract

Thangka image captioning, an essential task in cultural heritage preservation, faces challenges due to the complexity of Thangka imagery and the depth of its semantic content. Current deep learning-based methods struggle to extract detailed features and to accurately understand the semantics of Thangka images, often producing incomplete or incorrect captions of key elements such as the main deity and the background. To address these challenges, this paper introduces a novel Thangka captioning model that integrates topic modeling and geometry-guided spatial attention (Geo-TCAM). The model employs a multi-level feature integration strategy to enhance the extraction of fine-grained features such as gestures and objects. By incorporating Latent Dirichlet Allocation (LDA) topic weights into the visual features (TIF), it leverages external domain knowledge for better semantic understanding. Geo-TCAM's geometry-guided facial spatial attention (GFSA) module improves recognition of spatial layout. Experimental results demonstrate significant improvements in captioning performance, with BLEU-1, BLEU-4, METEOR, and CIDEr scores increasing by 11.9%, 17.1%, 9.7%, and 119.5%, respectively, compared to baseline models. On the COCO public dataset, the Geo-TCAM model also demonstrates strong performance, comparable to that of other state-of-the-art models. The Geo-TCAM model thus generates accurate captions for Thangka images, facilitating the digital preservation and dissemination of cultural heritage.
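The abstract describes two fusion steps: injecting LDA topic weights into visual features (TIF), and biasing spatial attention with a geometric prior (GFSA). The following is a minimal NumPy sketch of those two ideas, not the paper's implementation: the projection matrix `W_t`, the Gaussian geometry prior, and the mean-based saliency score are all illustrative assumptions standing in for the model's learned components.

```python
import numpy as np

def geometry_prior(h, w, center, sigma=2.0):
    # Illustrative geometric prior: a Gaussian centered on a detected face/deity
    # location (the paper's GFSA module uses facial geometry; the Gaussian form
    # here is an assumption, not the published formulation).
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    prior = np.exp(-d2 / (2 * sigma ** 2))
    return prior / prior.sum()          # normalize to a probability map

def fuse_topic_visual(visual, topic_weights, W_t):
    # visual: (H*W, D) grid of visual features; topic_weights: (K,) LDA
    # document-topic vector; W_t: (K, D) hypothetical projection that maps
    # topic space into the visual feature space. Additive fusion is a sketch
    # of the topic-integrated features (TIF).
    topic_feat = topic_weights @ W_t    # (D,)
    return visual + topic_feat          # broadcast over all spatial locations

def spatial_attention(features, prior):
    # Score each location, bias the scores by the geometry prior in log space,
    # softmax-normalize, and pool into one attended context vector.
    scores = features.mean(axis=1) + np.log(prior.ravel() + 1e-8)
    e = np.exp(scores - scores.max())
    attn = e / e.sum()                  # (H*W,) attention weights, sum to 1
    return attn @ features              # (D,) context vector for the decoder

# Toy end-to-end run with random stand-ins for CNN features and LDA output.
rng = np.random.default_rng(0)
H, W, D, K = 8, 8, 16, 5
visual = rng.normal(size=(H * W, D))
topics = rng.dirichlet(np.ones(K))      # stand-in for LDA inference output
W_t = rng.normal(size=(K, D))

fused = fuse_topic_visual(visual, topics, W_t)
prior = geometry_prior(H, W, center=(3, 4))
context = spatial_attention(fused, prior)
print(context.shape)  # (16,)
```

In a full captioning pipeline the context vector would condition a recurrent or transformer decoder at each step; here it simply demonstrates how topic weights and a spatial prior can jointly reshape which image regions dominate the pooled representation.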