Federated Learning for XSS Detection: Analysing OOD, Non-IID Challenges, and Embedding Sensitivity
Abstract
This paper investigates federated learning (FL) for cross-site scripting (XSS) detection under out-of-distribution (OOD) drift. Real-world XSS traffic involves fragmented attacks, heterogeneous benign inputs, and client imbalance, all of which degrade conventional detectors. To simulate these conditions, we construct two structurally divergent datasets: one with obfuscated, mixed-structure samples and another with syntactically regular examples, inducing structural OOD in both the malicious and benign classes. We evaluate GloVe, GraphCodeBERT, and CodeT5 in both centralised and federated settings, tracking embedding drift and client variance. FL consistently improves OOD robustness by averaging decision boundaries from cleaner clients. In the federated setting, CodeT5 achieves the best aggregated performance (97.6% accuracy, 3.5% false positive rate (FPR)), followed by GraphCodeBERT (96.8%, 4.7%), which converges more stably. GloVe reaches a competitive final accuracy (96.2%) but is highly unstable across rounds, with a higher FPR (5.5%) and pronounced variance under FedProx. These results highlight the value and limits of structure-aware embeddings and support FL as a practical, privacy-preserving defence in OOD XSS scenarios.
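To make the "averaging" step concrete, the sketch below shows a sample-size-weighted average of client parameters in the style of standard FedAvg. It is an illustration only: the abstract does not specify the aggregation rule beyond averaging and the use of FedProx, and the function name `fedavg`, the flat parameter vectors, and the client sizes are hypothetical.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Sample-size-weighted average of client parameter vectors.

    Assumption: each client's model is flattened into a list of floats.
    This is a generic FedAvg-style sketch, not the paper's implementation.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")

    aggregated = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        share = n_samples / total  # larger clients contribute more
        for i, value in enumerate(weights):
            aggregated[i] += share * value
    return aggregated


# Hypothetical example: two clients, the second holding three times more data.
global_weights = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because each client's contribution is scaled by its share of the data, a small client with heavily drifted samples shifts the global model less than a large, cleaner one. This is one plausible reading of how averaging across clients could dampen OOD effects.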