A Comparative Analysis of Chain-of-Thought Distillation from Gemini 3 to Legacy (Flan-T5) and Modern (Gemma) SLMs for Domain-Specific Classification
Abstract
Large Language Models (LLMs) such as Gemini 3 demonstrate strong multi-step reasoning, but their memory footprint and inference latency limit their suitability for real‑time, edge‑deployed financial services. Small Language Models (SLMs) enable lower-cost deployment, yet standard supervised fine‑tuning frequently fails to capture fine‑grained intent boundaries in customer support taxonomies. A comparative analysis is conducted of a distillation-by-synthesis approach that transfers chain‑of‑thought (CoT) supervision from a Teacher LLM (Gemini 3) into two Student architectures: a legacy encoder–decoder model (Flan‑T5 Base, 250M parameters) and a modern decoder‑only model (Gemma 2B). A reasoning‑augmented training set is synthesized on Banking77 by prompting the Teacher to produce intent labels together with short, structured justifications that highlight discriminative cues (for example, separating card_arrival from card_delivery_estimate). Student models are fine‑tuned to generate both an intent label and an aligned rationale. Evaluation covers three dimensions: (1) intent accuracy, (2) reasoning fidelity measured through rubric‑based label–rationale consistency, and (3) inference latency under batch‑1 serving. Results indicate that Gemma 2B yields the strongest accuracy and the most nuanced explanations, while Flan‑T5 Base delivers a favorable deployment trade‑off, maintaining competitive accuracy with substantially lower memory demand and latency. The analysis clarifies how architectural bias (encoder–decoder stability versus decoder‑only generation capacity) interacts with CoT distillation, providing guidance for low‑latency intent classifiers in compliance‑sensitive banking environments.
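The reasoning-augmented training pairs described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function name, field names, and prompt wording are assumptions, and only the overall shape (utterance in, Teacher-produced label plus rationale as the generation target) follows the abstract.

```python
# Hypothetical sketch of packing one Banking77 example into a
# (input, target) pair for CoT distillation. The Student is trained to
# emit the intent label followed by an aligned rationale, so that
# label-rationale consistency can later be scored against a rubric.

def build_training_pair(utterance: str, teacher_label: str,
                        teacher_rationale: str) -> dict:
    """Combine an utterance with Teacher-generated supervision."""
    source = f"Classify the banking intent: {utterance}"
    # Label first, then the justification that highlights the
    # discriminative cue (e.g. delivery *estimate* vs. arrival status).
    target = f"Intent: {teacher_label}\nRationale: {teacher_rationale}"
    return {"input": source, "target": target}

pair = build_training_pair(
    "My new card still hasn't shown up, how long does delivery take?",
    "card_delivery_estimate",
    "The customer asks about expected delivery time, "
    "not whether a dispatched card has arrived.",
)
```

Pairs of this shape could be fed to either Student: as encoder input and decoder target for Flan-T5 Base, or concatenated into a single prompt-plus-completion sequence for Gemma 2B.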