Federated Learning for XSS Detection: Analysing OOD, Non-IID Challenges, and Embedding Sensitivity

Bo Wang
Imran Khan
Martin White
Natalia Beloff


Abstract

This paper investigates federated learning (FL) for cross-site scripting (XSS) detection under realistic out-of-distribution (OOD) drift. Real-world XSS traffic mixes fragmented attack payloads, heterogeneous benign inputs and client-side class imbalance, all of which degrade conventional detectors. To emulate this variability, we construct two structurally divergent datasets: one containing obfuscated, fragmented attacks and mixed-structure benign samples that blend code, natural-language text and trace fragments, and another comprising syntactically regular examples. This split induces structural OOD in both the malicious and benign classes. We train GloVe, GraphCodeBERT and CodeT5 in centralized and federated settings while tracking embedding drift and client-level performance gaps. FL generally strengthens OOD robustness by averaging stable decision boundaries from cleaner clients into noisier ones. In federated tests, transformer-based embeddings achieve the highest global accuracy, whereas static GloVe vectors remain the least sensitive to negative-class drift. These findings highlight both the limits and the value of structure-aware features in FL and suggest FL as a practical, privacy-preserving defence against distributionally mismatched XSS attacks.
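The averaging step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a FedAvg-style aggregation weighted by client sample counts; the abstract does not state which aggregation rule the authors use, so the function `fed_avg`, the parameter name `"linear"`, and the two toy clients are hypothetical and purely illustrative.

```python
from typing import Dict, List

# Each client's model is represented as a mapping from parameter name to a flat list of floats.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Sample-size-weighted average of client parameters (FedAvg-style, assumed)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weight each client's value by its share of the total training samples.
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical round: a larger "clean" client and a smaller "noisy" client.
clean = {"linear": [1.0, 0.0]}
noisy = {"linear": [0.0, 2.0]}
global_weights = fed_avg([clean, noisy], client_sizes=[300, 100])
print(global_weights)  # {'linear': [0.75, 0.5]}
```

Under this assumed rule, the global parameters sit closer to those of the client that holds more data, which is one way a cleaner client's decision boundary can dominate a noisier one after aggregation.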

Version published to 10.20944/preprints202505.0439.v4
Jun 3, 2025
Version published to 10.20944/preprints202505.0439.v3
May 26, 2025
Version published to 10.20944/preprints202505.0439.v2
May 12, 2025
Version published to 10.20944/preprints202505.0439.v1
May 7, 2025

