A Comprehensive Survey of Multimodal Large Language Models: Concept, Application and Safety
Abstract
Recent advancements in Multimodal Large Language Models (MLLMs), exemplified by GPT-4o, have positioned them as a significant focus within the research community. MLLMs leverage the general capabilities of Large Language Models (LLMs) to handle tasks across multiple modalities, including text, image, audio, and video. With their unique ability to understand and generate content across modalities, such as composing narratives from visual inputs, MLLMs are attracting substantial interest from both academia and industry. However, the rapid proliferation of MLLM algorithms and techniques has given rise to new architectures, applications, and safety issues. This survey aims to document and analyze the latest advancements in MLLMs. First, we introduce the fundamental concepts of MLLMs, including the development history of multimodal algorithms, the architecture of MLLMs, and their evaluation and benchmarks. We then explore advanced techniques in MLLMs, such as Multimodal In-Context Learning, Multimodal Chain of Thought, and LLM-aided Visual Reasoning. Following this, we examine the safety aspects of MLLMs, focusing on security issues, potential attacks, and model safety assessments. Finally, we discuss the current challenges and identify potential areas for future research.