A Kubernetes-Based AI Framework for Scalable PII Detection and Redaction in Application Logs

Ahmed Shamsan Almasah
Sana Alyaseri


Abstract

The growing adoption of microservice architectures has made application log management harder, particularly because personally identifiable information (PII) proliferates across logs. Logs are crucial for monitoring, debugging, and compliance, but the distributed nature of microservices, coupled with stringent data privacy regulations such as GDPR, CCPA, and HIPAA, necessitates robust PII detection and redaction mechanisms. Traditional methods, such as regular expressions, are inadequate for the volume and complexity of log data in modern IT environments. This research investigates the application of Natural Language Processing (NLP), specifically transformer-based models, to automated PII detection and redaction within Kubernetes-based microservices. We compare several NLP techniques, namely TF-IDF, spaCy's pre-trained model, a CNN-LSTM architecture, and a specialized pre-trained PII detection model (iiiorg/piiranha-v1), on the AI4Privacy dataset. Our evaluation considers accuracy, precision, recall, F1-score, resource utilization, and runtime. The results expose the trade-offs between accuracy, computational cost, and contextual understanding: deep learning models such as CNN-LSTM achieve high accuracy but are resource-intensive, whereas TF-IDF is efficient but lacks the contextual awareness needed for robust PII detection. Specialized pre-trained models balance these factors best, which makes them a compelling option for practical PII redaction in resource-constrained Kubernetes environments.
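To make the regular-expression limitation and the evaluation metrics concrete, the following is a minimal sketch and not the authors' pipeline. The patterns, the sample log line, and the helper names (`redact`, `detect`, `precision_recall_f1`) are all hypothetical and chosen only for illustration. It shows a pattern-based redactor catching structured PII (an email address and an IP address) while missing a personal name, and how that miss lowers recall and F1.

```python
import re

# Hypothetical patterns for illustration only; they are not taken from the article.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(line: str) -> str:
    """Replace every pattern match with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        line = pattern.sub(f"[{label}]", line)
    return line


def detect(line: str) -> set:
    """Return the set of substrings that any pattern flags as PII."""
    found = set()
    for pattern in PII_PATTERNS.values():
        found.update(pattern.findall(line))
    return found


def precision_recall_f1(predicted: set, actual: set) -> tuple:
    """Compute precision, recall, and F1 over sets of PII strings."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# A made-up log line and its hand-labelled PII (assumed ground truth).
log_line = "user=Jane Doe email=jane.doe@example.com ip=10.0.0.7 status=200"
ground_truth = {"Jane Doe", "jane.doe@example.com", "10.0.0.7"}

redacted = redact(log_line)
precision, recall, f1 = precision_recall_f1(detect(log_line), ground_truth)

print(redacted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the name "Jane Doe" survives redaction because no fixed pattern describes it. That is the kind of contextual gap the abstract attributes to traditional methods and that a context-aware model is meant to close.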

Version published to 10.21203/rs.3.rs-6264767/v1 on Research Square
Mar 21, 2025

Related articles

VADSK: Video Anomaly Detection with Structured Keywords
This article has 1 author:
1. Thomas Foltz
This article has no evaluations
Latest version Apr 15, 2025

A Hybrid CNN and Attentive Hierarchical BiLSTM Model with SMO for Intrusion Detection in IIoT
This article has 2 authors:
1. Sushama L. Pawar
2. Mandar S. Karyakarte
This article has no evaluations
Latest version Apr 17, 2025

GPTVD: Vulnerability Detection and Analysis Method Based on LLM's Chain of Thoughts.
This article has 5 authors:
1. Yinan Chen
2. Yuan Huang
3. Xiangping Chen
4. Pengfei Shen
5. Lei Yun
This article has no evaluations
Latest version Apr 21, 2025
