Dimension-Direct Routing: Achieving 25% Depth Improvement in Multi-Model LLM Systems via Explicit Capability Factorization
Abstract
Large Language Models (LLMs) exhibit distinct capabilities across different knowledge domains, yet single-model deployments struggle with knowledge-intensive tasks requiring cross-domain reasoning. We present eVoiceClaw Desktop, a multi-model orchestration system that operationalizes an "AI managing AI" paradigm: instead of humans manually selecting models, the system dynamically routes complex queries to specialized models through a dimension-direct routing algorithm.

The system underwent four major configuration iterations (V1–V4), culminating in V5, which addresses critical challenges in cross-domain task allocation and semantic accumulation bias. V5 achieves a 98% workflow trigger rate across 50 benchmark questions in Chinese, leveraging 12 models with balanced diversity (top model ≤16% usage share).

We evaluate response quality using LLM-as-Judge (Claude Opus 4.6) across four dimensions: factual accuracy, completeness, depth, and structure. Compared to single-model baselines, V5 achieves a 14.3% overall quality improvement, with depth of analysis improving by 25.9%, at the expense of approximately 9× higher latency and cost.

As a meta-demonstration, the initial draft of this paper was itself generated by the system (see Appendix B).
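The abstract does not specify the internals of the dimension-direct routing algorithm, but the idea of routing via explicit capability factorization can be sketched as follows. Everything here is an illustrative assumption: the dimension names (borrowed from the paper's four evaluation dimensions), the per-model capability scores, and the dot-product selection rule are not taken from the paper itself.

```python
# Illustrative sketch only: the dimension names, model profiles, and
# argmax-over-dot-product rule are assumptions for exposition, not the
# paper's actual routing method.

DIMENSIONS = ["factual_accuracy", "completeness", "depth", "structure"]

# Hypothetical per-model capability profiles (one score per dimension).
MODEL_PROFILES = {
    "model_a": {"factual_accuracy": 0.9, "completeness": 0.6, "depth": 0.5, "structure": 0.7},
    "model_b": {"factual_accuracy": 0.6, "completeness": 0.8, "depth": 0.9, "structure": 0.6},
    "model_c": {"factual_accuracy": 0.7, "completeness": 0.7, "depth": 0.6, "structure": 0.9},
}

def route(query_weights: dict) -> str:
    """Pick the model whose capability profile best matches the query's
    per-dimension weights, scored by a simple dot product."""
    def score(model: str) -> float:
        profile = MODEL_PROFILES[model]
        return sum(query_weights.get(d, 0.0) * profile[d] for d in DIMENSIONS)
    return max(MODEL_PROFILES, key=score)

# A depth-heavy analytical query lands on the depth-specialized model
# under these assumed profiles.
print(route({"depth": 1.0, "factual_accuracy": 0.3}))  # → model_b
```

The point of factorizing capability explicitly, rather than learning an opaque router, is that the chosen model can be explained in terms of the same dimensions later used for evaluation.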