Enhanced Triple-branch Network for Generalized Zero-Shot Learning

Abstract

Generalized Zero-Shot Learning (GZSL) aims to transfer knowledge from seen data to unseen categories and to correctly distinguish both seen and unseen data. Most direct attribute prediction (DAP) methods focus on projecting visual features into attribute space and aligning them with class prototypes. However, domain shifts arise during dataset acquisition due to camera angles or occlusion, so directly using the class prototype cannot accurately represent each sample and misleads the alignment between visual features and attributes. Moreover, the commonly used visual feature extraction backbones, such as ResNet, are trained on ImageNet, resulting in cross-dataset bias for the GZSL task. Therefore, this paper proposes an Enhanced Triple-branch Network (ETN) with three branches: Evolving Attribute Feature Learning (EAFL), a Cross-dataset Denoising Module (CDM), and Class Prototype Purification (CPP). EAFL progressively refines attribute features by correcting the semantic prototypes and deeply explores the relationship between individual and group attribute features. CDM uses a mask to mitigate cross-dataset bias between ImageNet and GZSL benchmarks. CPP learns domain shifts of attributes, guaranteeing that the predicted attributes align with specific samples. ETN is evaluated on three datasets, demonstrating superior performance compared to existing Generalized Zero-Shot Learning methods. Specifically, on the SUN dataset, the proposed ETN outperforms the second-best method by $1.5\%$ in the GZSL setting. ETN also shows notable gains over the baseline, achieving improvements of $2.5\%$ and $3.4\%$ on the CUB and AWA2 datasets, respectively. The codes are available at \url{https://github.com/GAInuist/ETN}.
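At a high level, the three branches act on a DAP-style forward pass: project a visual feature into attribute space, denoise it with a mask, and score it against corrected class prototypes. The sketch below is a minimal illustration only; the mask and prototype offset are untrained placeholders, and all dimensions and names are assumptions rather than the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 2048-d visual features, 85 attributes, 10 seen classes.
D_VIS, D_ATT, N_CLS = 2048, 85, 10

# Class semantic prototypes (one attribute vector per class), as in DAP-style GZSL.
prototypes = rng.random((N_CLS, D_ATT))

# Projection of visual features into attribute space (a single linear map here;
# the EAFL branch refines attribute features progressively).
W = rng.standard_normal((D_VIS, D_ATT)) / np.sqrt(D_VIS)

# CDM-style mask suppressing channels biased toward ImageNet pretraining
# (learned in the paper; a random binary placeholder here).
mask = (rng.random(D_VIS) > 0.2).astype(float)

# CPP-style prototype correction: a per-class offset adapting prototypes to
# sample-specific domain shift (zero-initialized placeholder, learned in the paper).
offset = np.zeros((N_CLS, D_ATT))

def classify(x):
    """Project a masked visual feature into attribute space and score it
    against the corrected class prototypes by dot-product compatibility."""
    att = (x * mask) @ W                  # predicted attribute vector
    scores = (prototypes + offset) @ att  # compatibility score per class
    return int(np.argmax(scores))

x = rng.standard_normal(D_VIS)
pred = classify(x)
```

The returned index is the predicted class; in the full method, seen and unseen prototypes would both be scored so the model can distinguish the two domains.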