Green AI for Sustainable Question Answering: Carbon-Aware Fine-Tuning and Retrieval-Augmented Generation at Scale
Abstract
Large Language Models (LLMs) have become central to modern Artificial Intelligence applications, but their environmental impact has raised concerns. This research compares the carbon footprints of three approaches to adapting LLMs to new domains: Retrieval-Augmented Generation (RAG), full fine-tuning for three epochs, and parameter-efficient fine-tuning (PEFT). RAG and fine-tuning are the two knowledge-adaptation techniques most widely used to build domain-specific question-answering (QA) models; to mitigate their environmental impact without degrading user experience, this work also discusses techniques such as PEFT, quantization, and retrieval optimization. A break-even analysis across varying query volumes and update frequencies is proposed to determine the carbon efficiency of each approach. CO2 emissions are measured by adapting two models, t5-small and DistilBERT, with three knowledge-adaptation methods (full fine-tuning, LoRA fine-tuning, and RAG) on the SQuAD QA dataset, and various mitigation strategies are provided to reduce CO2 emissions without compromising model quality. The results show that LoRA fine-tuning yields the lowest emissions among the fine-tuning methods for both t5-small and DistilBERT, and that t5-small LoRA fine-tuning emits 37.13% less carbon than DistilBERT LoRA fine-tuning. This work also demonstrates fine-tuning LLMs on freely available GPUs, since purchasing A100 or T4 GPUs outright is very costly.