A Control-Theoretic MCP Framework for MLLMs’ Efficiency and Interpretability

Abstract

Multimodal large language models (MLLMs) suffer from computational inefficiency and limited interpretability in complex tasks such as multi-round reasoning and medical diagnosis. To address these problems, this paper proposes MCP (Model-Controller-Presenter), a three-layer collaborative framework. MCP decouples an MLLM into reasoning, generation, and retrieval sub-modules, integrates a reinforcement learning (RL)-driven dynamic routing algorithm, and adds a task adaptation mechanism, providing the first systematic integration of control theory with dynamic MLLM reasoning. Experiments on cross-modal benchmark datasets (GLUE, COCO, ScienceQA) show that, compared with baselines such as LLaMA-2 7B and GPT-3.5, MCP improves task performance by 15-30%, increases reasoning efficiency by 40%, and attains a 90% human-rated interpretability score through the Presenter layer. This work offers a practical route past a key bottleneck in deploying MLLMs.
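The abstract describes the Controller's RL-driven routing without implementation detail. The minimal Python sketch below illustrates the kind of dynamic routing the framework implies, using an epsilon-greedy bandit as a simple stand-in for the full RL policy. Every name here (Controller, route, update, presenter, the toy sub-modules, and the reward signal) is an illustrative assumption, not the paper's actual code.

    import random
    from dataclasses import dataclass, field

    # Hypothetical stand-ins for the three sub-modules named in the abstract.
    def reasoning_module(task: str) -> str:
        return f"reasoned({task})"

    def generation_module(task: str) -> str:
        return f"generated({task})"

    def retrieval_module(task: str) -> str:
        return f"retrieved({task})"

    MODULES = {
        "reasoning": reasoning_module,
        "generation": generation_module,
        "retrieval": retrieval_module,
    }

    @dataclass
    class Controller:
        """Epsilon-greedy bandit as a placeholder for the RL-driven router."""
        epsilon: float = 0.1
        values: dict = field(default_factory=lambda: {m: 0.0 for m in MODULES})
        counts: dict = field(default_factory=lambda: {m: 0 for m in MODULES})

        def route(self, task: str) -> str:
            # A real policy would condition on task features and modality;
            # this toy version ignores the task and keeps one global estimate.
            if random.random() < self.epsilon:
                return random.choice(list(MODULES))
            return max(self.values, key=self.values.get)

        def update(self, module: str, reward: float) -> None:
            # Incremental mean update of the chosen module's value estimate.
            self.counts[module] += 1
            self.values[module] += (reward - self.values[module]) / self.counts[module]

    def presenter(task: str, module: str, output: str) -> str:
        """Presenter layer: render a human-readable routing trace."""
        return f"task={task!r} -> module={module} -> output={output!r}"

    if __name__ == "__main__":
        controller = Controller()
        for step in range(5):
            task = f"query-{step}"
            module = controller.route(task)
            output = MODULES[module](task)
            reward = 1.0 if module == "reasoning" else 0.5  # toy reward signal
            controller.update(module, reward)
            print(presenter(task, module, output))

The bandit keeps the sketch self-contained and runnable; the framework's actual routing policy would presumably learn from task-level rewards over the reasoning, generation, and retrieval sub-modules rather than from a single global value estimate.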
