Auditing multimodal large language models for context-aware content moderation


Abstract

The development of multimodal large language models (MLLMs) offers new possibilities for context-aware content moderation by integrating text, images, and other data. This study uses a series of conjoint experiments to systematically audit GPT-4o, a state-of-the-art MLLM, to understand how it uses demographics and other contextual information when assessing hate speech. We show that GPT-4o's moderation decisions align with human judgments, particularly in recognizing norms regarding the use of reclaimed slurs by marginalized groups, and that additional prompting can further enhance sensitivity to context. We further demonstrate that multimodal identity cues, which combine visual and textual information, facilitate more nuanced content evaluation than unimodal approaches. This research underscores the potential and limitations of MLLMs for automated moderation and introduces conjoint analysis as a tool for auditing AI use in complex, context-sensitive applications.
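The full experimental design is described in the article itself. As a rough illustration only, a conjoint-style audit randomizes the levels of several attributes into vignettes, submits each vignette to the model, and compares moderation decisions across levels. The sketch below is a minimal, hypothetical Python version: the attribute names, levels, and prompt wording are assumptions for illustration, not the study's materials, and the call to GPT-4o is only indicated in a comment.

```python
# Minimal sketch of a conjoint-style audit of an MLLM moderator.
# Attribute names, levels, and prompt wording are hypothetical and
# do not reproduce the study's actual experimental design.
import random

# Hypothetical conjoint attributes and their levels.
ATTRIBUTES = {
    "term_type": ["reclaimed slur", "neutral descriptor"],
    "speaker_identity": [
        "a member of the group the term refers to",
        "not a member of that group",
        "of unknown group membership",
    ],
    "identity_cue": ["text only", "text plus a profile image"],
}


def sample_profile(rng: random.Random) -> dict:
    """Draw one randomized attribute combination (a conjoint profile)."""
    return {name: rng.choice(levels) for name, levels in ATTRIBUTES.items()}


def build_prompt(profile: dict) -> str:
    """Assemble a moderation vignette from the sampled attribute levels."""
    return (
        "You are a content moderator reviewing a social media post. "
        f"The post contains a {profile['term_type']}. "
        f"The author is {profile['speaker_identity']}, "
        f"and their identity is conveyed via {profile['identity_cue']}. "
        "Should the post be removed? Answer 'remove' or 'keep'."
    )


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        profile = sample_profile(rng)
        print(profile)
        print(build_prompt(profile))
        # Each prompt would then be sent to GPT-4o (e.g. via the OpenAI
        # chat completions API), and the rates of 'remove' vs. 'keep'
        # compared across attribute levels to estimate marginal effects.
        print("-" * 40)
```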
