Robust Bandwidth Selection for Median-of-Means Kernel Density Estimation under Heavy Tails and Contamination
Abstract
Bandwidth selection is a key practical bottleneck in kernel density estimation (KDE): oversmoothing erases structure, whereas undersmoothing fabricates spurious modes. Classical selectors such as the rule of thumb (ROT), least-squares cross-validation (LSCV), and Sheather-Jones (SJ) perform well for light-tailed, uncontaminated samples but can be destabilized by a few extreme observations. We study bandwidth selection tailored to median-of-means KDE (MoMDE), which aggregates block-level statistics to temper heavy tails and gross contamination, and we propose two selectors: a leave-block-out cross-entropy rule (MoM-CV) and a robust plug-in rule based on a MoM curvature pilot. On a logarithmic bandwidth grid, we show that the MoM objective uniformly approximates the ISE risk, that MoM-CV satisfies a weak oracle inequality, and that the robust plug-in bandwidth attains the oracle rate under $\varepsilon$-contamination. Extensive simulations and two real-data studies (equity index returns and hourly Beijing PM2.5) indicate that the proposed selectors suppress spurious modes and reduce Hellinger divergence relative to LSCV while retaining competitive fit against ROT and SJ. We also provide simple diagnostics (Hellinger ribbons and false-peak counts), guidance for choosing the number of blocks, and an implementation whose computational cost scales near-linearly with the grid size.
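To make the block-aggregation idea concrete, the following is a minimal sketch of a median-of-means KDE evaluated on a grid, together with a logarithmic bandwidth grid. It is not the paper's implementation. It assumes one common construction: the sample is shuffled and split into disjoint blocks, a Gaussian KDE is computed on each block, the pointwise median is taken across blocks, and the result is renormalized numerically. The function names `mom_kde` and `log_bandwidth_grid`, the Gaussian kernel, and the renormalization step are all illustrative assumptions.

```python
import numpy as np


def mom_kde(x, grid, h, n_blocks, seed=0):
    """Pointwise median of block-wise Gaussian KDEs (illustrative sketch).

    Assumed construction, not taken from the paper: shuffle, split into
    n_blocks disjoint blocks, estimate per block, take the median.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)

    # Disjoint blocks of (nearly) equal size after a random shuffle.
    blocks = np.array_split(rng.permutation(x), n_blocks)

    block_kdes = []
    for b in blocks:
        # Standardized distances between every grid point and block point.
        z = (grid[:, None] - b[None, :]) / h
        kde = np.exp(-0.5 * z**2).sum(axis=1) / (len(b) * h * np.sqrt(2 * np.pi))
        block_kdes.append(kde)

    # A block hit by outliers shifts only its own estimate; the median
    # across blocks ignores it as long as most blocks are clean.
    med = np.median(np.stack(block_kdes), axis=0)

    # The pointwise median need not integrate to one, so renormalize
    # with a trapezoidal rule over the evaluation grid.
    area = np.sum(0.5 * (med[1:] + med[:-1]) * np.diff(grid))
    return med / area


def log_bandwidth_grid(h_min, h_max, num):
    """Candidate bandwidths equally spaced on a logarithmic scale."""
    return np.geomspace(h_min, h_max, num)
```

A selector would then evaluate some criterion at each bandwidth in `log_bandwidth_grid(...)` and keep the minimizer. The criteria themselves (MoM-CV and the robust plug-in rule) are defined in the paper and are not reproduced here.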