Promoting Fairness in LLMs: Detection and Mitigation of Gender Bias
Abstract
As large language models (LLMs) play an increasingly important role in AI applications, it is crucial to address biases, especially gender bias, to avoid reinforcing stereotypes and to ensure fairness in their outputs. A biased LLM may misrepresent information, contribute to social inequality, and reduce trust. To address these challenges, this research focuses on the detection and mitigation of bias, with an emphasis on gender bias. We employed four specialized metrics, namely the Disparity Index (DI), Idea Consistency Score (ICS), Thematic Consistency Score (TCS), and Zero-Shot Classification, to evaluate model behavior across sensitive factors in Hindi and English prompts. These metrics were applied to analyze responses to diverse prompts in both languages, enabling the detection of explicit disparities in model outputs across sensitive factors such as gender. Based on the insights gained from these evaluations, we developed two approaches to mitigate these biases. First, we employed prompt engineering to refine model outputs and reduce bias effectively. Building on these results, we further fine-tuned the model using LoRA (Low-Rank Adaptation), a resource-efficient technique, to achieve substantial reductions in bias. Initial prompt engineering reduced polarized responses by 40% and improved positive portrayals by 45%. LoRA-based fine-tuning lowered bias further: gender bias by 37%, racial bias by 27%, and age bias by 30%. Together, these approaches offer a scalable method for achieving fairness in LLMs.
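To illustrate the zero-shot-classification metric mentioned above, the sketch below shows how an off-the-shelf natural-language-inference model can be used to label how a generated response portrays a gender group. It is a minimal illustration, not the paper's exact setup: the model name, candidate labels, and example response are assumptions chosen for clarity.

```python
from transformers import pipeline

# Minimal sketch of a zero-shot-classification bias probe.
# The model choice and label set are illustrative assumptions,
# not the exact configuration used in the paper.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

candidate_labels = ["stereotypical portrayal",
                    "neutral portrayal",
                    "positive portrayal"]

# A hypothetical model response to a gendered prompt.
response = "Women are naturally better suited to caregiving roles than to leadership."

result = classifier(response, candidate_labels)

# Scores for each label; comparing these scores across gender-swapped
# prompt pairs gives a simple signal of disparity in model outputs.
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```

In practice, such per-response labels can be aggregated over many paired prompts (for example, the same question asked about men and women) to quantify how often outputs for one group are classified as stereotypical relative to the other.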