Multimodal Information Fusion with Neural Gating

Olivia Smith
Ava Martinez
Noah Brown


Abstract

We introduce a framework for multimodal learning built on gated neural network architectures. Its core component, the Fusion Gate Unit (FGU), is a building block for neural networks that learns an intermediate representation combining data from different modalities. Using multiplicative gates, the FGU learns how much each modality contributes to the unit's activation. We evaluated the approach on multilabel movie genre classification, using plot summaries and poster images as the input modalities. The FGU significantly improved the macro F-score over single-modality approaches and outperformed other fusion techniques, including mixture-of-experts models. Alongside this publication we release the MM-IMDb dataset, which is, to our knowledge, the largest publicly available multimodal dataset for movie genre prediction to date. We expect the dataset to support further research in multimodal information processing.
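The abstract does not give the FGU's equations, so the following is only a minimal sketch of one common way a bimodal multiplicative gate can be written: each modality is projected through a tanh layer, and a sigmoid gate computed from both inputs mixes the two projections elementwise. The function name `fusion_gate_unit`, the weight names, the feature dimensions, and the random inputs are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def sigmoid(a):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))


def fusion_gate_unit(x_text, x_image, W_text, W_image, W_gate):
    """Hypothetical bimodal gated fusion (illustrative, not the paper's code).

    h_text  = tanh(W_text  @ x_text)
    h_image = tanh(W_image @ x_image)
    z       = sigmoid(W_gate @ [x_text; x_image])
    h       = z * h_text + (1 - z) * h_image
    """
    h_text = np.tanh(W_text @ x_text)      # text-only hidden features
    h_image = np.tanh(W_image @ x_image)   # image-only hidden features
    # The gate sees both modalities and outputs one value in (0, 1)
    # per hidden dimension.
    z = sigmoid(W_gate @ np.concatenate([x_text, x_image]))
    # Multiplicative gating: z weights the text features and
    # (1 - z) weights the image features.
    h = z * h_text + (1.0 - z) * h_image
    return h, z


# Toy dimensions and random values, chosen only for the demonstration.
rng = np.random.default_rng(0)
d_text, d_image, d_hidden = 8, 6, 4
x_text = rng.normal(size=d_text)    # stand-in for a plot-summary embedding
x_image = rng.normal(size=d_image)  # stand-in for a poster-image embedding
W_text = rng.normal(scale=0.1, size=(d_hidden, d_text))
W_image = rng.normal(scale=0.1, size=(d_hidden, d_image))
W_gate = rng.normal(scale=0.1, size=(d_hidden, d_text + d_image))

h, z = fusion_gate_unit(x_text, x_image, W_text, W_image, W_gate)
```

In this sketch, a gate value near 1 in some hidden dimension means the fused feature follows the text projection, and a value near 0 means it follows the image projection. In a trained network the weights would be learned jointly with the classifier rather than drawn at random.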

Version published to 10.20944/preprints202409.1917.v1
Sep 24, 2024

Related articles

Pre- and Post-Gated Attention-based Multimodal Fusion for Skin Lesion Classification

This article has 3 authors:
1. Thi-Trang Nguyen
2. Van-Hieu Vu
3. Viet-Anh Nguyen
This article has no evaluations. Latest version: Sep 11, 2025

Structure-Activated and Interest-Aware Multimodal Recommendation Method

This article has 3 authors:
1. HaoYu Wang
2. HongBin Xia
3. XiaoFeng Wang
This article has no evaluations. Latest version: Oct 16, 2025

MSS-UNet: Mamba-Based Multi-directional Selective Scanning for Medical Image Segmentation

This article has 6 authors:
1. Jun Wu
2. Pengfei Zhan
3. Xinyi Zhu
4. Shuai Guo
5. Yu Chen
6. Li Yang
This article has no evaluations. Latest version: Oct 20, 2025
