Robust Bandwidth Selection for Median-of-Means Kernel Density Estimation under Heavy Tails and Contamination

Abstract

Bandwidth selection is a key practical bottleneck in kernel density estimation (KDE): oversmoothing erases structure, whereas undersmoothing fabricates spurious modes. Classical selectors such as the rule of thumb (ROT), least-squares cross-validation (LSCV), and Sheather-Jones (SJ) perform well for light-tailed, uncontaminated samples but can be destabilized by a few extreme observations. We study bandwidth selection tailored to median-of-means KDE (MoMDE), which aggregates block-level statistics to temper heavy tails and gross contamination, and we propose two selectors: a leave-block-out cross-entropy rule (MoM-CV) and a robust plug-in rule based on a MoM curvature pilot. On a logarithmic bandwidth grid, we show that the MoM objective uniformly approximates the ISE risk, that MoM-CV satisfies a weak oracle inequality, and that the robust plug-in bandwidth attains the oracle rate under $\varepsilon$-contamination. Extensive simulations and two real-data studies (equity index returns and hourly Beijing PM2.5) indicate that the proposed selectors suppress spurious modes and reduce Hellinger divergence relative to LSCV while retaining competitive fit against ROT/SJ. We also provide simple diagnostics (Hellinger ribbons and false-peak counts), guidance for choosing the number of blocks, and an implementation whose computational cost scales near-linearly with the grid size.
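
To illustrate the block-aggregation idea behind MoMDE, the following is a minimal sketch, not the authors' implementation. It assumes the common pointwise-median form of median-of-means KDE: shuffle the sample, split it into disjoint blocks, compute an ordinary KDE on each block, and take the median across blocks at every evaluation point. The Gaussian kernel, the renormalization step, the function names (`gaussian_kde_on_grid`, `mom_kde`, `hellinger`), the fixed bandwidth `h = 0.4`, and the toy contamination setup are all assumptions made for illustration; the abstract does not specify them, and the MoM-CV and plug-in selectors themselves are not reproduced here.

```python
import numpy as np


def gaussian_kde_on_grid(sample, grid, h):
    """Ordinary Gaussian KDE with bandwidth h, evaluated at each grid point."""
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (sample.size * h * np.sqrt(2.0 * np.pi))


def mom_kde(sample, grid, h, n_blocks, rng):
    """Pointwise median of per-block KDEs (assumed form of median-of-means KDE)."""
    shuffled = rng.permutation(sample)
    blocks = np.array_split(shuffled, n_blocks)  # disjoint, near-equal blocks
    block_estimates = np.stack([gaussian_kde_on_grid(b, grid, h) for b in blocks])
    density = np.median(block_estimates, axis=0)
    # A pointwise median need not integrate to one, so renormalize on the
    # (uniform) grid. This step is an illustrative choice.
    dx = grid[1] - grid[0]
    return density / (density.sum() * dx)


def hellinger(p, q, dx):
    """Hellinger distance between two densities tabulated on a uniform grid."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)


# Toy contaminated sample: 500 standard normal draws plus 10 gross outliers at 50.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(size=500), np.full(10, 50.0)])

grid = np.linspace(-6.0, 56.0, 1241)
dx = grid[1] - grid[0]
h = 0.4          # fixed bandwidth, chosen by hand for this sketch
n_blocks = 25    # more blocks than twice the number of outliers

plain = gaussian_kde_on_grid(sample, grid, h)
mom = mom_kde(sample, grid, h, n_blocks, rng)

# Compare both estimates at the outlier location and against the clean density.
truth = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
i50 = int(np.argmin(np.abs(grid - 50.0)))
print("plain KDE at x=50:", plain[i50])
print("MoM KDE at x=50:  ", mom[i50])
print("Hellinger(plain, truth):", hellinger(plain, truth, dx))
print("Hellinger(MoM, truth):  ", hellinger(mom, truth, dx))
```

In this toy setup the 10 outliers can touch at most 10 of the 25 blocks, so the median at x = 50 is taken over a majority of uncontaminated blocks and the spurious bump that the plain KDE places there is suppressed. The trade-off is that each block KDE uses far fewer points, which is one reason the choice of bandwidth and of the number of blocks matters for this estimator.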
