A Cross-Domain Performance Report of Open AI ChatGPT o1 Model
Abstract
Large language models (LLMs) represent a leap in the capabilities of artificial intelligence (AI) in natural language understanding, problem-solving, and domain-specific reasoning. Comparative, cross-domain evaluations of LLMs help clarify their versatility, limitations, and real-world applicability. The o1 model developed by OpenAI is a notable milestone in combining state-of-the-art language processing with task execution. This report evaluates the o1 (o1-preview) model on a range of tasks, including mathematics, clinical knowledge, professional ethics, and the humanities. The results show that o1 excels in certain areas, particularly in fields requiring specialized knowledge, such as college biology (98%) and clinical knowledge (93%). By comparison, it performs less well in areas such as professional law (54%) and business ethics (81%).