Auditing multimodal large language models for context-aware content moderation
Abstract
The development of multimodal large language models (MLLMs) offers new possibilities for context-aware content moderation by integrating text, images, and other data. This study uses a series of conjoint experiments to systematically audit GPT-4o, a state-of-the-art MLLM, to understand how it uses demographics and other contextual information when assessing hate speech. We show that GPT-4o's moderation decisions align with human judgments, particularly in recognizing norms regarding the use of reclaimed slurs by marginalized groups, and that additional prompting can further enhance its sensitivity to context. We further demonstrate that multimodal identity cues, which combine visual and textual information, facilitate more nuanced content evaluation than unimodal approaches. This research underscores the potential and limitations of MLLMs for automated moderation and introduces conjoint analysis as a tool for auditing AI use in complex, context-sensitive applications.
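The abstract does not include the study's materials, but the sketch below illustrates the kind of conjoint-style audit query it describes: a hypothetical post is paired with a profile image as a multimodal identity cue, and GPT-4o is asked for a moderation decision through the OpenAI chat completions API. The attribute levels, prompt wording, and image URL are illustrative assumptions, not the authors' design.

```python
# Minimal sketch (not the authors' code) of one conjoint-style audit item.
# Attribute levels, prompts, and the image URL are hypothetical placeholders.
from itertools import product

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical conjoint attributes: the speaker's stated group identity and
# whether the post uses a reclaimed slur.
speaker_identities = ["in-group member", "out-group member"]
post_variants = ["<post text with reclaimed slur>", "<post text without slur>"]

def moderate(post_text: str, identity: str, profile_image_url: str) -> str:
    """Ask GPT-4o for a keep/remove judgment on one profile-post pair."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            f"The author self-identifies as an {identity}. "
                            f'Post: "{post_text}"\n'
                            "Should this post be removed as hate speech? "
                            "Answer 'remove' or 'keep' and briefly explain."
                        ),
                    },
                    # Visual identity cue (e.g., a profile photo) shown alongside the text.
                    {"type": "image_url", "image_url": {"url": profile_image_url}},
                ],
            }
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Enumerate the full factorial design, as a conjoint audit would.
for identity, post in product(speaker_identities, post_variants):
    decision = moderate(post, identity, "https://example.com/profile.jpg")
    print(identity, "|", post, "->", decision)
```

Running the full factorial of identity and post attributes, with repeated randomized draws, is what would let an auditor estimate how much each contextual cue shifts the model's moderation decisions.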