A Token-Agnostic Approach to Controlling Generated Text Length in Large Language Models

Abstract

The rapid expansion of language models has increased demand for precise control over text generation, particularly over output length. Traditional token-based methods often struggle to maintain consistency across languages and to preserve coherence, which makes tasks that require strict length adherence difficult. A novel token-agnostic approach has been developed to address these limitations, leveraging semantic structures such as sentences and paragraphs to manage length dynamically. This method makes text generation more flexible and adaptable across languages and writing styles, ensuring that length constraints are respected without sacrificing fluency or relevance. Experimental results demonstrate the effectiveness of the method when implemented with Llama, yielding high precision in length adherence and strong text quality across multiple evaluation metrics. The approach offers a robust solution to the ongoing challenge of managing output length in text generation, with potential applications spanning numerous domains, from summarization to content creation.
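The abstract does not give implementation details, so the sketch below is only one plausible reading of a token-agnostic stop criterion: counting sentences instead of tokens while consuming a model's output stream. Every name here (SENTENCE_END, count_sentences, generate_with_sentence_budget, the chunk-stream interface) is illustrative and not taken from the paper.

```python
import re
from typing import Iterator

# Matches a sentence-final punctuation mark followed by whitespace or
# end-of-string. Naive on purpose: abbreviations ("e.g.") and decimal
# points will be miscounted; a production system would use a
# language-aware sentence segmenter.
SENTENCE_END = re.compile(r"[.!?](?:\s|$)")

def count_sentences(text: str) -> int:
    return len(SENTENCE_END.findall(text))

def generate_with_sentence_budget(stream: Iterator[str],
                                  max_sentences: int) -> str:
    """Consume text chunks from any streaming generator and stop once
    the sentence budget is met. The stop condition is defined on
    semantic units (sentences), not token counts, so it is independent
    of the model's tokenizer."""
    out = ""
    for chunk in stream:
        out += chunk
        if count_sentences(out) >= max_sentences:
            # Cut at the end of the last allowed sentence boundary.
            ends = [m.end() for m in SENTENCE_END.finditer(out)]
            return out[: ends[max_sentences - 1]].strip()
    return out.strip()

if __name__ == "__main__":
    # Stand-in for a streaming model: yields word-sized chunks.
    words = "One. Two two. Three three three. Four exceeds the budget.".split()
    print(generate_with_sentence_budget((w + " " for w in words),
                                        max_sentences=3))
    # -> "One. Two two. Three three three."
```

Because the budget is expressed in sentences, the same constraint holds whether a sentence tokenizes into 10 or 40 subword tokens, which is the cross-language consistency the abstract highlights.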
