Green AI for Sustainable Question Answering: Carbon-Aware Fine-Tuning and Retrieval-Augmented Generation at Scale
Abstract
Large Language Models (LLMs) have become central to modern Artificial Intelligence applications, but their environmental impact has raised concerns. This research compares the carbon footprints of three approaches to adapting LLMs to new domains: Retrieval-Augmented Generation (RAG), full fine-tuning for three epochs, and parameter-efficient fine-tuning (PEFT). RAG and fine-tuning are the two knowledge-adaptation techniques most widely used to build domain-specific question-answering (QA) models; to mitigate their environmental impact without degrading user experience, this work also discusses techniques such as PEFT, quantization, and retrieval optimization. A break-even analysis across varying query volumes and update frequencies is proposed to determine the carbon efficiency of each approach. CO2 emissions are measured by adapting two models, t5-small and DistilBERT, with three knowledge-adaptation methods (full fine-tuning, LoRA fine-tuning, and RAG) on the SQuAD QA dataset, and various mitigation strategies are provided to reduce CO2 emissions without compromising model quality. The results show that LoRA fine-tuning yields the lowest emissions among the fine-tuning methods for both t5-small and DistilBERT, and that t5-small LoRA fine-tuning emits 37.13% less carbon than DistilBERT LoRA fine-tuning. This work also demonstrates fine-tuning LLMs on freely available GPUs, since purchasing A100 or T4 GPUs outright is very costly.