Domain Specific Benchmarks for Evaluating Multimodal Large Language Models

Anjum Anjum
Muhammad Arbab Arshad
Kadhim Hayawi
Efstathios Polyzos
Asadullah Tariq
Mohamed Adel Serhani
Laiba Batool
Brady D. Lund
Nishith Reddy Mannuru
Ravi Varma Kumar Bevara
Taslim Mahbub
Muhammad Zeeshan Akram
Sakib Shahriar


Abstract

Large language models (LLMs) are increasingly being deployed across disciplines due to their advanced reasoning and problem-solving capabilities. To assess their effectiveness, researchers have developed a range of benchmarks that measure aspects of LLM reasoning, comprehension, and problem-solving. While several surveys address LLM evaluation and benchmarks, a domain-specific analysis of these benchmarks remains underexplored in the literature. This paper introduces a taxonomy of seven key disciplines that spans the domains and application areas in which LLMs are most widely used. Additionally, we provide a comprehensive review of LLM benchmarks and survey papers within each domain, highlighting the unique capabilities of LLMs and the challenges faced in their application. Finally, we compile and categorize these benchmarks by domain to create an accessible resource for researchers, with the aim of supporting progress toward artificial general intelligence (AGI).

Version published to 10.20944/preprints202505.1993.v1
May 26, 2025

