Autoencoder-Enhanced Hierarchical Mondrian Anonymization via Latent Representations

Abstract

Releasing structured microdata requires balancing utility against privacy under group-based disclosure risks. We propose AE-LRHMA, a hybrid anonymization framework that performs Mondrian-style hierarchical partitioning in an autoencoder-learned latent space and integrates local (k, e)-microaggregation. To explicitly control sensitive-value concentration and diversity within each equivalence class, we introduce a tunable constraint set consisting of the minimum group size k, a maximum sensitive-proportion threshold, and an optional sensitive-entropy threshold. When enabled, the entropy threshold acts as a hard gate on candidate splits; otherwise, entropy enters the split score as a soft term. The anonymized output is generated via standard interval/set generalization in the original attribute space. Experiments on the Adult and Bank Marketing datasets show that AE-LRHMA yields lower information loss and more stable group structures than representative baselines under comparable settings. We further report linkage-attack-oriented risk metrics to empirically characterize relative disclosure trends, without claiming formal guarantees such as differential privacy.
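
The constraint set described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the balance-based split score, and the `entropy_weight` parameter are all hypothetical, chosen only to show how a minimum group size k, a maximum sensitive-proportion threshold, and an optional entropy threshold (hard gate when set, soft scoring term otherwise) might interact when evaluating a candidate split.

```python
import math
from collections import Counter


def max_sensitive_proportion(values):
    """Share of the most frequent sensitive value in a group."""
    counts = Counter(values)
    return max(counts.values()) / len(values)


def sensitive_entropy(values):
    """Shannon entropy (in bits) of the sensitive values in a group."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def group_is_admissible(values, k, max_prop, min_entropy=None):
    """Check one equivalence class against the constraint set.

    min_entropy=None means the entropy gate is disabled (assumption:
    this mirrors the 'optional' threshold mentioned in the abstract).
    """
    if len(values) < k:
        return False
    if max_sensitive_proportion(values) > max_prop:
        return False
    if min_entropy is not None and sensitive_entropy(values) < min_entropy:
        return False
    return True


def split_score(left, right, k, max_prop, min_entropy=None, entropy_weight=0.1):
    """Score a candidate split; None means the split is rejected.

    Hypothetical scoring: size balance of the two children, plus a soft
    entropy bonus only when the hard entropy gate is disabled.
    """
    for group in (left, right):
        if not group_is_admissible(group, k, max_prop, min_entropy):
            return None
    balance = min(len(left), len(right)) / max(len(left), len(right))
    if min_entropy is not None:
        return balance  # entropy already enforced as a hard gate
    soft = entropy_weight * min(sensitive_entropy(left), sensitive_entropy(right))
    return balance + soft
```

In a Mondrian-style recursion, a partitioner would evaluate such a score for each candidate cut of a group (here, presumably along latent dimensions) and stop splitting once every candidate is rejected.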
