A Cross-Domain Performance Report of Open AI ChatGPT o1 Model

Kadhim Hayawi
Sakib Shahriar

Abstract

Large language models (LLMs) represent a leap in the capabilities of artificial intelligence (AI) in natural language understanding, problem-solving, and domain-specific reasoning. Comparative and cross-domain evaluations of LLMs can help us understand their versatility and limitations, including their real-world applicability. The o1 model developed by OpenAI represents a notable milestone in state-of-the-art language processing and task execution. This report evaluates the o1 (o1-preview) model on a range of tasks, including mathematics, clinical knowledge, professional ethics, and the humanities. The results reveal that o1 excels in certain areas, particularly fields requiring specialized knowledge, such as college biology (98%) and clinical knowledge (93%). By comparison, it performs worse in areas such as professional law (54%) and business ethics (81%).

Article activity feed

Version published to 10.20944/preprints202412.1930.v1 (Dec 23, 2024)

Related articles

Large Language Models: A Survey of Architectures, Training Paradigms, and Alignment Methods

This article has 5 authors:
1. Deepshikha Bhati
2. Fnu Neha
3. Devi Sri Bandaru
4. Matthew Weber
5. Ishan Dilipbhai Gajera
This article has no evaluations
Latest version Jan 15, 2026

Understanding the Impact of Dataset Characteristics on RAG-based Multi-hop QA Performance

This article has 3 authors:
1. Nimet Aksoy
2. Zekeriya Anıl Güven
3. Murat Osman Ünalır
This article has no evaluations
Latest version Dec 12, 2025

Image and Video Question Answering with Large Language Models: A Comprehensive Review

This article has 3 authors:
1. Alexander Davis
2. Justin Parker
3. Julian Perry
This article has no evaluations
Latest version Dec 19, 2025
