Task-Conditioned Representation Adaptation for Many-Shot In-Context Learning via Continued Pretraining
Abstract
While continued pretraining has been shown to improve many-shot in-context learning (ICL) in large language models, the representation dynamics that support this task generalization remain insufficiently explored. This paper introduces a task-conditioned continued pretraining strategy that enhances many-shot ICL by explicitly disentangling task-specific and task-invariant representations during pretraining. The method augments the standard language modeling objective with lightweight task-conditioning signals derived from latent task clusters, which are inferred via contrastive embedding similarity. The model is pretrained on a 150-billion-token mixed-domain corpus spanning over 3,200 instruction-defined tasks, with each training sequence incorporating up to 128 demonstrations. Empirical evaluations across arithmetic reasoning, code generation, and information extraction tasks indicate that the proposed approach yields consistent gains in many-shot ICL performance, with accuracy improvements of up to 9.3% over baseline continued pretraining. Representation probing further shows a 17% increase in task-separability scores while preserving general linguistic coherence. These findings suggest that task-conditioned representation adaptation during continued pretraining offers a scalable and data-efficient pathway to improving many-shot ICL across heterogeneous task distributions.
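The abstract does not specify the functional form of the task-conditioning signal. As a minimal sketch, assuming an InfoNCE-style contrastive term over sequence embeddings and inferred cluster centroids (the symbols z_i, c_k, K, lambda, and tau below are illustrative and not taken from the paper), the combined objective could be written as:

% Hypothetical formulation; the InfoNCE form of the conditioning term is an
% assumption, not the paper's stated objective.
\mathcal{L} = \mathcal{L}_{\mathrm{LM}} + \lambda \, \mathcal{L}_{\mathrm{task}},
\qquad
\mathcal{L}_{\mathrm{task}} = -\log \frac{\exp\!\left(\mathrm{sim}(z_i, c_{k(i)}) / \tau\right)}{\sum_{k'=1}^{K} \exp\!\left(\mathrm{sim}(z_i, c_{k'}) / \tau\right)}

where \mathcal{L}_{\mathrm{LM}} is the standard next-token prediction loss, z_i is the embedding of training sequence i, c_{k(i)} is the centroid of its inferred task cluster among K clusters, \mathrm{sim} denotes cosine similarity, \tau is a temperature, and \lambda weights the conditioning term. Under this reading, the contrastive term pulls each sequence toward its own task cluster (sharpening task-specific structure, consistent with the reported gain in task-separability scores), while the shared language modeling loss preserves task-invariant linguistic competence.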