Ensemble-Based Anomaly Detection in Brazilian Parliamentary Expenses Using Unlabeled Public Data

Abstract

The rapid growth of digital information offers both opportunities and challenges for monitoring public resource usage. In Brazil, members of the National Congress receive monthly budgets for reimbursable expenses, but manual oversight of this spending is inefficient and error-prone. This work introduces an unsupervised methodology that integrates an ensemble of six Machine Learning (ML) techniques to detect anomalies in parliamentary expenses using unlabeled government data. To guide model training, we use six datasets from ADBench, a comprehensive anomaly detection benchmark, selected for their similarity to the target domain. To evaluate the approach, we introduce four new real-world unlabeled datasets, collectively named the Brazilian Parliamentary Expenses Datasets (BPED), built from official reimbursement records of Brazil's Quota for Parliamentary Activity (CEAP). Results demonstrate that combining multiple anomaly detection methods enhances precision and robustness in identifying irregular expenditures, advancing transparency and accountability in public administration. Moreover, the proposed approach helps prioritize oversight efforts by directing investigations toward cases with a higher probability of irregular behavior.
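The idea of combining several unsupervised detectors into one ranking can be illustrated with a minimal sketch. The abstract does not name the six techniques, so the three simple statistical detectors below (z-score, median absolute deviation, interquartile range) are stand-ins chosen for illustration, not the authors' methods. The function names and the expense values are hypothetical as well. Each detector's scores are converted to ranks in [0, 1] and averaged, so that a record flagged by all detectors rises to the top of the review queue.

```python
import statistics


def zscore_scores(values):
    """Absolute distance from the mean, in standard deviations."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1.0  # avoid division by zero
    return [abs(v - mean) / stdev for v in values]


def mad_scores(values):
    """Absolute distance from the median, in median absolute deviations."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1.0
    return [abs(v - med) / mad for v in values]


def iqr_scores(values):
    """Distance outside the interquartile range, in IQR units (0 inside it)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = (q3 - q1) or 1.0
    return [max(q1 - v, v - q3, 0.0) / iqr for v in values]


def to_ranks(scores):
    """Rank-normalise to [0, 1]: fraction of the other points with a strictly smaller score."""
    n = len(scores)
    return [sum(s < t for s in scores) / (n - 1) for t in scores]


def ensemble_scores(values):
    """Average the rank-normalised scores of all detectors (illustrative stand-ins)."""
    detectors = (zscore_scores, mad_scores, iqr_scores)
    ranked = [to_ranks(detector(values)) for detector in detectors]
    return [statistics.fmean(column) for column in zip(*ranked)]


# Hypothetical reimbursement amounts, invented for this example only.
expenses = [120.0, 135.5, 110.0, 142.0, 128.0, 131.0, 2500.0, 125.0]
scores = ensemble_scores(expenses)

# Review order: highest ensemble score first.
review_order = sorted(range(len(expenses)), key=lambda i: scores[i], reverse=True)
top = review_order[0]
print(expenses[top], round(scores[top], 3))
```

Rank averaging is used here because the detectors produce scores on different scales. A real pipeline would operate on multivariate expense records rather than a single amount column.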
