Fine-grained Insider Threat Detection with Large Language Models: A Comparative Study
Abstract
Insider threats remain a significant challenge in cybersecurity, demanding more effective and efficient detection strategies. The advent of Large Language Models (LLMs) presents new opportunities for Insider Threat Detection (ITD), particularly in monitoring and analyzing behavioral patterns indicative of potential threats. However, LLMs also have limitations, such as a tendency to generate inaccurate or misleading outputs owing to their generative nature. This study explores the use of LLMs for ITD on the CERT r4.2 dataset. We perform a comprehensive comparative analysis of fine-tuned models (specifically BERT, LLaMA 3, and Phi 3) used as classifiers in both binary and multi-class classification tasks, as well as of generative models applied through in-context learning (ICL) techniques. Our findings demonstrate that fine-tuned LLMs achieve high accuracy and stability in detecting insider threats, even in complex multi-class scenarios. These models consistently outperform baseline methods, effectively capturing subtle behavioral cues associated with insider risk. Additionally, we introduce a refined Chain-of-Thought (CoT) prompting method that significantly improves ICL performance, particularly for scenario-specific threat identification. We also investigate the models' ability to handle previously unseen insider behaviors by incorporating a dedicated “Unknown” class. Results reveal that LLMs frequently misclassify these unknown behaviors as benign, especially in high-risk contexts, underscoring the difficulty of detecting novel threats in practical ITD applications.
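As a concrete, simplified illustration of the fine-tuned-classifier setting the abstract describes, the sketch below fine-tunes a BERT model for binary insider-threat classification with Hugging Face transformers. The serialized activity strings, field names, and label scheme are illustrative placeholders, not the paper's actual CERT r4.2 preprocessing or training configuration.

```python
# Minimal sketch: fine-tuning a BERT-style classifier for binary ITD.
# The "text"/"label" fields and the toy activity records are hypothetical
# stand-ins for serialized CERT r4.2 user-activity sequences.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Toy examples: 1 = insider-threat behavior, 0 = benign behavior.
records = [
    {"text": "logon 08:02; email sent x3; usb connected; file copy to removable media", "label": 1},
    {"text": "logon 09:00; http browsing; email sent x1; logoff 17:30", "label": 0},
]
dataset = Dataset.from_list(records)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Serialize each activity sequence into fixed-length token inputs.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# num_labels=2 for the binary task; a larger value would cover the
# multi-class (scenario-specific) setting, including an "Unknown" class.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="itd-bert",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=dataset,
)
trainer.train()
```

The same pattern would apply to the decoder-only models (LLaMA 3, Phi 3) with a classification head, though the paper's exact fine-tuning recipe is not specified in the abstract.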