Evaluating Agentic AI Systems: A Balanced Framework for Performance, Robustness, Safety and Beyond
Abstract
Agentic artificial intelligence (AI), in the form of multi-agent systems that combine large language models with external tools and autonomous planning, is rapidly transitioning from research labs into high-stakes domains. Existing evaluations emphasise narrow technical metrics such as task success or latency, leaving important sociotechnical dimensions such as human trust, ethical compliance and economic sustainability under-measured. We propose a balanced evaluation framework spanning five axes (capability & efficiency, robustness & adaptability, safety & ethics, human-centred interaction, and economics & sustainability) and introduce novel indicators, including goal-drift scores and harm-reduction indices. Beyond synthesising prior work, we identify gaps in current benchmarks, develop a conceptual diagram to visualise interdependencies, and outline experimental protocols for empirically validating the framework. Case studies from recent industry deployments illustrate that agentic AI can yield 20–60% productivity gains, yet these deployments often omit assessments of fairness, trust and long-term sustainability. We argue that multidimensional evaluation, combining automated metrics with human-in-the-loop scoring and economic analysis, is essential for the responsible adoption of agentic AI.