Multimodal Information Fusion with Neural Gating
Abstract
We present a framework for multimodal learning based on gated neural networks. Its core component, the Fusion Gate Unit (FGU), is designed to be used as an internal unit of a neural network architecture, where it learns an intermediate representation by combining data from different modalities. Using multiplicative gates, the FGU learns how much each modality contributes to the unit's activation. We evaluated the model on a multilabel movie genre classification task, using plot summaries and poster images as the input modalities. The FGU significantly improved the macro F-score over single-modality approaches and outperformed other fusion strategies, including mixture of experts models. Along with this work we release the MM-IMDb dataset, which, to our knowledge, is the largest publicly available multimodal dataset for movie genre prediction. We expect the dataset to support further research on multimodal information processing.
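To make the gating idea concrete, the following is a minimal sketch of a two-modality gated fusion unit in NumPy. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `fusion_gate_unit`, the tanh activations, the single sigmoid gate computed from the concatenated inputs, and the feature dimensions. It only shows how a multiplicative gate can weight the contribution of each modality to the fused activation.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def fusion_gate_unit(x_text, x_image, W_text, W_image, W_gate):
    """Hypothetical two-modality gated fusion (illustrative, not the authors' code).

    x_text:  (batch, d_text)   text features, e.g. from a plot summary
    x_image: (batch, d_image)  image features, e.g. from a poster
    Returns the fused representation h and the gate z, both (batch, d_hidden).
    """
    # Per-modality hidden representations (tanh is an assumption).
    h_text = np.tanh(x_text @ W_text)
    h_image = np.tanh(x_image @ W_image)

    # The gate sees both modalities and outputs one weight per hidden unit.
    z = sigmoid(np.concatenate([x_text, x_image], axis=-1) @ W_gate)

    # Multiplicative gating: z weights the text path, (1 - z) the image path.
    h = z * h_text + (1.0 - z) * h_image
    return h, z


# Toy usage with random features; all sizes are arbitrary.
rng = np.random.default_rng(0)
batch, d_text, d_image, d_hidden = 4, 8, 6, 5
x_text = rng.normal(size=(batch, d_text))
x_image = rng.normal(size=(batch, d_image))
W_text = rng.normal(scale=0.1, size=(d_text, d_hidden))
W_image = rng.normal(scale=0.1, size=(d_image, d_hidden))
W_gate = rng.normal(scale=0.1, size=(d_text + d_image, d_hidden))

h, z = fusion_gate_unit(x_text, x_image, W_text, W_image, W_gate)
print(h.shape, z.shape)
```

In this sketch, a gate value near 1 means a hidden unit relies mostly on the text features for that sample, and a value near 0 means it relies mostly on the image features. In a trained network the weight matrices would be learned jointly with the rest of the model rather than sampled at random.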