The Impact of Network Preprocessing Structure Design on XAI Explanation Quality
Abstract
This study experimentally analyzes how the design of preprocessing structures for network traffic data influences the explanation quality of artificial intelligence models. Unlike previous studies that focused on improving detection accuracy or architectural complexity, this work demonstrates that even under identical training conditions, differences in input preprocessing strategies can lead to significant variations in model explainability. To this end, four preprocessing-based input structures were compared: (1) tabular inputs based on session features, (2) payload-only inputs using unstructured text, (3) a simple hybrid combination of tabular and payload inputs, and (4) a rule-based natural language structure that integrates rule-transformed tabular features with payload data. All models were trained on the same dataset under identical configurations, with detection performance evaluated using AUROC and F1-score, and explanation quality assessed through Comprehensiveness, Sufficiency, and Deletion–Insertion AUC. Experimental results show that detection performance remained similar across all four structures, but the rule-based model consistently achieved the highest explanation quality across all metrics. In particular, Integrated Gradients analysis revealed the formation of clear semantic cross-references between indicator expressions in the rule sentences and attack cues in the payload, indicating that the model effectively learned to integrate behavioral context and content evidence. These findings demonstrate that the design of network preprocessing structures is a key factor determining model interpretability and explanation fidelity, and emphasize the importance of data representation–oriented approaches for advancing toward explainable and trustworthy security AI systems.
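The explanation-quality metrics named in the abstract can be illustrated concretely. Below is a minimal sketch, not the authors' implementation, of Comprehensiveness (confidence drop after removing the top-k attributed features) and Sufficiency (confidence drop when keeping only the top-k features), using a hypothetical toy linear detector whose exact attributions are simply `w * x`; zero-masking as the "removal" operation is one common convention, and all names here are illustrative assumptions.

```python
import numpy as np

def comprehensiveness(predict, x, attributions, k):
    """Confidence drop after removing the top-k attributed features.
    Higher is better: truly important features, once removed, should hurt the score."""
    top = np.argsort(attributions)[::-1][:k]
    x_removed = x.copy()
    x_removed[top] = 0.0  # "remove" by zero-masking (one common convention)
    return predict(x) - predict(x_removed)

def sufficiency(predict, x, attributions, k):
    """Confidence drop when keeping only the top-k attributed features.
    Lower is better: the top features alone should nearly reproduce the prediction."""
    top = np.argsort(attributions)[::-1][:k]
    x_kept = np.zeros_like(x)
    x_kept[top] = x[top]
    return predict(x) - predict(x_kept)

# Hypothetical toy detector: sigmoid(w . x); exact attributions are w * x.
w = np.array([2.0, -1.0, 0.5, 3.0])
predict = lambda x: 1.0 / (1.0 + np.exp(-(w @ x)))
x = np.array([1.0, 1.0, 1.0, 1.0])
attr = w * x

print(comprehensiveness(predict, x, attr, k=2))  # large positive: removal hurts
print(sufficiency(predict, x, attr, k=2))        # near zero: top-2 nearly suffice
```

A faithful explanation scores high Comprehensiveness and near-zero Sufficiency, which is the pattern the abstract reports for the rule-based preprocessing structure.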