Evaluating Multimodal Large Language Models for Implicit Advertising Reasoning
Abstract
In this paper, we present a novel method for implicit reasoning in advertising, built on a multimodal large language model combined with a specialized prompt-engineering approach. Advertising often relies on implicit reasoning: messages are conveyed indirectly through a combination of text, imagery, and emotion, which makes the subtle nuances involved difficult for traditional models to capture. To address this, we propose a multi-round prompt design that lets the model refine its reasoning iteratively, improving its ability to decode the implicit intent of an advertisement. Our method uses GPT-4 as the core reasoning engine, with prompts that guide the model through successive reasoning phases so that it effectively incorporates both textual and visual cues. We collect a custom dataset of advertising content to train and evaluate our approach, and our results demonstrate significant improvements in reasoning accuracy over baseline models. We also propose novel evaluation metrics that go beyond traditional accuracy to incorporate human-like reasoning assessments. Experimental results and human evaluations show that our method outperforms existing models in both reasoning depth and interpretability, making it a promising approach to understanding complex advertising content.
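The multi-round design described above can be sketched as a simple orchestration loop in which each round's answer is threaded back into the next prompt. This is a minimal illustrative sketch, not the paper's actual implementation: the phase prompts, the `multi_round_reasoning` function, and the `ask` callable are all assumed names for illustration; in practice `ask` would wrap a GPT-4 chat call and the image would be passed to the model directly rather than as a caption string.

```python
# Hypothetical sketch of a multi-round prompting pipeline for implicit
# advertising reasoning. The phase wording below is illustrative, not the
# paper's exact prompts.

PHASES = [
    "Describe the literal content of the ad's text and image.",
    "List the emotional or cultural cues the ad relies on.",
    "Given your notes so far, state the ad's implicit message in one sentence.",
]

def multi_round_reasoning(ad_text, image_caption, ask):
    """Run the phased prompts, feeding prior answers into each round.

    `ask` is any callable mapping a prompt string to a model reply
    (e.g. a thin wrapper around a GPT-4 chat completion).
    """
    context = f"Ad text: {ad_text}\nImage caption: {image_caption}"
    notes = []
    for phase in PHASES:
        prompt = (
            context
            + "\n\nPrevious notes:\n" + "\n".join(notes)
            + "\n\nTask: " + phase
        )
        notes.append(ask(prompt))
    # The final round carries the refined implicit-intent answer.
    return notes[-1]
```

A real run would replace `ask` with an API-backed callable; the loop itself is model-agnostic, which makes the reasoning phases easy to evaluate or swap independently.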