LLM-Skill Orchestration: Achieving 202/202 Subtask Completion via Rule-Augmented Multi-Model Collaboration in 50 Agentic Tasks
Abstract
LLM agents typically rely on a single model for multi-step tool-using tasks, creating a tension between the breadth of capability required and the limitations of any individual model. We introduce LLM-Skill Orchestration, a three-layer architecture in which: (1) a reasoning model generates orchestration rules from system constraints alone; (2) a planning model decomposes tasks into skill graphs with explicit dependencies; and (3) heterogeneous LLM-Skills, both pure-text and tool-equipped, execute in parallel through a shared context pool. We evaluate the system on 50 agentic tasks spanning five types (information retrieval, code construction, cross-system analysis, multi-step reasoning, compound decision-making), each with 4–6 binary checklist items, for a total of 202 items. The rule-augmented system (Hb) achieves 202/202 completion and an average quality score of 17.5/20 (LLM-as-Judge, σ=2.0), compared to 137/202 (68%) and 7.4/20 for the single-model baseline (A), and 166/202 (82%) and 13.7/20 for static-rule orchestration (C). Three findings from a 5-task pilot ablation study (used for relative comparisons only) shape our understanding: (i) same-model decomposition (D: 8/22) performs worse than no decomposition (A: 13/22), showing that model diversity, not parallelism, drives collaborative gains; (ii) rule-blind generation (Hb: 96/100) outperforms rule-informed generation (Hi: 76/100), suggesting that deductive reasoning from system invariants generalizes better than inductive learning from failure cases; (iii) across the full 50-task evaluation, 34 of 227 skills (15%) produced 0-byte output due to API anomalies, yet all were autonomously compensated for by the synthesis stage, an emergent architectural resilience not designed into the system.
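To make the three-layer design concrete, the sketch below shows one plausible way the layers could fit together; it is an illustration only, not the authors' implementation. All names (`Skill`, `call_model`, `run_orchestration`), the thread-pool executor, and the dictionary-based context pool are assumptions. Dependencies gate when a skill becomes ready, independent skills run in parallel, and every output lands in a shared context pool that downstream skills and the synthesis stage read from.

```python
# Illustrative sketch only; names and structure are hypothetical stand-ins
# for the three layers described in the abstract.
from dataclasses import dataclass, field
from concurrent.futures import ThreadPoolExecutor


@dataclass
class Skill:
    name: str
    model: str                                  # heterogeneous LLM backing this skill
    deps: list = field(default_factory=list)    # explicit dependencies in the skill graph
    uses_tools: bool = False                    # pure-text vs tool-equipped skill


def call_model(model: str, prompt: str) -> str:
    """Placeholder for an LLM call; swap in a real client."""
    return f"[{model}] output for: {prompt[:40]}"


def run_orchestration(task: str, rules: list[str], skills: list[Skill]) -> dict:
    """Execute skills in dependency order; independent skills run in parallel.
    Outputs accumulate in a shared context pool keyed by skill name."""
    context: dict[str, str] = {"task": task, "rules": "\n".join(rules)}
    remaining = {s.name: s for s in skills}
    with ThreadPoolExecutor() as pool:
        while remaining:
            # A skill is ready once all of its dependencies are in the context pool.
            ready = [s for s in remaining.values() if all(d in context for d in s.deps)]
            if not ready:
                raise RuntimeError("cyclic or unsatisfiable dependencies in skill graph")
            futures = {
                s.name: pool.submit(
                    call_model,
                    s.model,
                    context["rules"]
                    + "\n"
                    + "\n".join(context[d] for d in s.deps)
                    + "\n"
                    + task,
                )
                for s in ready
            }
            for name, fut in futures.items():
                # Empty (0-byte) outputs are stored as-is and left for the
                # synthesis stage to compensate for.
                context[name] = fut.result() or ""
                del remaining[name]
    return context


# Example usage with a three-skill graph:
skills = [
    Skill("retrieve", model="model-a"),
    Skill("analyze", model="model-b", deps=["retrieve"]),
    Skill("synthesize", model="model-c", deps=["retrieve", "analyze"]),
]
print(run_orchestration("summarize system logs", ["cite every source"], skills))
```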