Optimizing Large Language Models for Efficiency: A Dual-Model Architecture with Dynamic Vocabulary Adjustment
Abstract
Large language models (LLMs) have revolutionized natural language processing but incur significant computational and energy costs. We propose a novel dual-model architecture that optimizes resource use by splitting processing between a lightweight model (Model B) and a full-capacity model (Model A). Model B handles frequent conversational patterns by mapping the 70% least-used input tokens to a single [RARE] token, with the mapping dynamically adjusted as usage patterns change. Critically, Model B also detects [RARE] tokens in its own output stream and top-k selections, routing these cases to Model A for nuanced handling. Model A processes all complex inputs and all [RARE] outputs. Simulations suggest savings of 40-55% in power consumption and 30-40% in server capacity, with potential for further gains through optimization. This approach offers a scalable, adaptive solution for deploying LLMs in resource-constrained environments, ensuring that both rare input tokens and rare output tokens are processed with full model capacity.
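The core mechanism can be illustrated with a minimal sketch. The helper names (`build_vocab_map`, `compress`, `needs_model_a`) and the frequency-ranking heuristic are illustrative assumptions, not the paper's implementation; the sketch only shows the idea of collapsing low-frequency tokens into a single [RARE] symbol and escalating to the full-capacity model when [RARE] surfaces in the output or top-k candidates.

```python
from collections import Counter

RARE = "[RARE]"

def build_vocab_map(token_counts: Counter, keep_fraction: float = 0.30) -> dict:
    """Keep only the top keep_fraction of tokens by frequency; map the
    rest (the 70% least-used tokens) to the single RARE symbol.
    Re-running this on fresh counts gives the dynamic vocabulary adjustment."""
    ranked = [tok for tok, _ in token_counts.most_common()]
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = set(ranked[:n_keep])
    return {tok: (tok if tok in kept else RARE) for tok in ranked}

def compress(tokens: list, vocab_map: dict) -> list:
    """Model B's input view: rare or unseen tokens collapse to RARE."""
    return [vocab_map.get(t, RARE) for t in tokens]

def needs_model_a(output_tokens: list, top_k_candidates: list) -> bool:
    """Route to Model A when RARE appears in Model B's output stream
    or among any step's top-k selections."""
    return RARE in output_tokens or any(RARE in ks for ks in top_k_candidates)

# Hypothetical usage: usage counts drive the vocabulary map.
counts = Counter({"the": 100, "cat": 50, "sat": 40, "on": 30,
                  "mat": 20, "quark": 2, "ephemeral": 1})
vmap = build_vocab_map(counts)          # keeps top 30% => {"the", "cat"}
print(compress(["the", "cat", "quark"], vmap))
print(needs_model_a(["the", RARE], []))
```

In a deployment, `needs_model_a` would gate a handoff: Model B serves the request end to end unless the check fires, in which case the original (uncompressed) context is forwarded to Model A.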