A Comparative Study of Six Indigenous Chinese Large Language Models' Understanding Ability: An Assessment Based on 132 College Entrance Examination Objective Test Items

Abstract

To help Chinese language teachers make evidence-based choices among useful and user-friendly domestic large language models for teaching and research, this study used 132 objective questions from the national college entrance examination Chinese language papers from 2021 to 2023 as a dataset to assess the semantic-understanding performance of six domestic large language models: Tongyi Qianwen, GLM-4, KimiChat, Baichuan, Wenxin Yiyan, and Xunfei Spark. The assessment showed that the overall accuracy of the six models' responses was 70%, 69%, 57%, 55%, 60%, and 62% respectively. Among them, Tongyi Qianwen and Xunfei Spark performed best on language-application questions, each with an accuracy of 74%; GLM-4 performed best on ancient poetry reading and modern text reading questions, with accuracies of 92% and 77% respectively. None of the six models performed well on classical Chinese reading questions. For the incorrectly answered questions, the researchers corrected and analyzed the answers using prompt strategies. Finally, the paper offers several suggestions for promoting the use of large language models to assist Chinese language teaching and research.
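
The per-model and per-category accuracy figures reported above can be reproduced from graded responses with a simple tally. The following is a minimal sketch, assuming a hypothetical record format (model name, question category, correct/incorrect flag); the field names and sample entries are illustrative and not taken from the paper's data.

```python
from collections import defaultdict

# Hypothetical grading records: one entry per (model, question), with the
# question's category and whether the model's answer matched the official key.
results = [
    {"model": "Tongyi Qianwen", "category": "language application", "correct": True},
    {"model": "GLM-4", "category": "ancient poetry reading", "correct": True},
    {"model": "KimiChat", "category": "classical Chinese reading", "correct": False},
    # ... one record for each of the 132 questions per model
]

def accuracy_by_category(records):
    """Return {model: {category: percent correct}} from graded records."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [correct, total]
    for r in records:
        cell = counts[r["model"]][r["category"]]
        cell[0] += int(r["correct"])
        cell[1] += 1
    return {
        model: {cat: round(100 * c / t) for cat, (c, t) in cats.items()}
        for model, cats in counts.items()
    }

print(accuracy_by_category(results))
```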
