Comparative Analysis of Evaluation Methods for Generative Artificial Intelligence Systems and Development of Selection Algorithm

Abstract

With the development of generative artificial intelligence and the active adoption of large language models (LLMs) across a wide range of domains, the objective evaluation of the quality of such AI systems becomes a critical task. Traditional machine learning metrics are often inapplicable, since the responses of LLM-based systems demonstrate high variability in wording while maintaining semantic correctness. This paper analyzes existing approaches to evaluating the quality of systems built on generative AI, including lexical methods, semantic embeddings, and hybrid approaches based on LLM-as-a-Judge and natural language inference (NLI). Particular attention is paid to the development of an algorithm for selecting the optimal evaluation strategy depending on task requirements, including evaluation latency, the correctness and interpretability of the results, and the stability and reproducibility of the obtained scores. For comparison, the paper presents the results of applying different evaluation methods to assessing the accuracy and relevance of AI system responses on a set of 500 test examples; the methods demonstrate correlations with expert assessments ranging from 0.67 to 0.92, depending on the chosen approach. The proposed algorithm can be used to build a suitable evaluation process for AI systems in various domains.
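To make the contrast between the evaluation families concrete, the following Python sketch compares a lexical overlap score with an embedding-based semantic score and adds a toy decision rule in the spirit of the selection algorithm described above. This is an illustration only: the function names, the sentence-transformers model choice, the latency threshold, and the decision logic are assumptions of this sketch, not the paper's actual algorithm.

# Illustrative sketch; method names, thresholds, and model choice are assumptions.
from collections import Counter

def token_f1(reference: str, candidate: str) -> float:
    """Lexical overlap score: fast and deterministic, but blind to paraphrase."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    overlap = sum((Counter(ref) & Counter(cand)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def embedding_similarity(reference: str, candidate: str) -> float:
    """Semantic score: cosine similarity of sentence embeddings.
    Requires `pip install sentence-transformers`; the model name is an example."""
    from sentence_transformers import SentenceTransformer, util
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode([reference, candidate])
    return float(util.cos_sim(emb[0], emb[1]))

def choose_method(latency_budget_ms: float, needs_interpretability: bool,
                  has_references: bool) -> str:
    """Toy decision rule trading off latency, interpretability, and
    reference availability (not the paper's published algorithm)."""
    if latency_budget_ms < 50:
        return "lexical"        # cheapest and fully reproducible
    if needs_interpretability and has_references:
        return "nli"            # entailment labels are inspectable
    if has_references:
        return "embedding"      # robust to paraphrase, still fast
    return "llm_as_judge"       # reference-free, but slower and less stable

print(token_f1("Paris is the capital of France",
               "The capital of France is Paris"))   # 1.0 despite reordering
print(choose_method(latency_budget_ms=200,
                    needs_interpretability=False,
                    has_references=True))           # "embedding"

Even this toy version shows why a selection step matters: the lexical score rewards surface overlap regardless of meaning, while the embedding score tolerates rewording at a higher computational cost, so the right choice depends on the latency and interpretability constraints of the deployment.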