Novel Rapid Approach for Adaptive Gaussian Kernel Density Estimation: Gridpoint-wise Propagation of Anisotropic Diffusion Equation
Abstract
Anyone who works with data regularly faces the question of how to represent the distribution of the collected data. Histogramming is still the standard method of choice. Yet more sophisticated methods exist, such as Kernel Density Estimation (KDE), which enables a smoother and more accurate representation of the data distribution. However, some challenges remain, most of which center on the optimal choice of kernel bandwidth and the adaptiveness of this bandwidth. Here, we propose a novel method for Gaussian KDE (GKDE) that improves upon classical approaches in terms of accuracy and computational efficiency. Our approach is parallelizable and therefore fast. This parallelizability enables implementation on modern computational hardware, such as GPUs, which is a great advantage in comparison to other methods, especially if large amounts of data need to be processed. Furthermore, it automatically chooses the kernel bandwidth based on the collected data and is by design adaptive, i.e., the bandwidth varies across the data range. This allows smooth representation of broad features without oversmoothing sharp features in the distribution. Moreover, our method is applicable in an arbitrary number of dimensions. The approach is, similar to other novel methods, based on the propagation of the heat equation. We propose a novel measure for detecting the optimal bandwidth: the variance-to-mean ratio of the density estimate on the grid points, computed across different bootstrapped samples of the original data. Apart from introducing the method itself, we show some illustrative examples of the performance of the approach. To this end, we evaluate various data distributions in one and two dimensions, observing good accuracy across the board. We call this promising new approach Gridpoint-wise Adaptive Density Propagation KDE (GradePro).
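To make the two ingredients named in the abstract more concrete, the following is a minimal one-dimensional sketch, not the GradePro algorithm itself. It assumes a plain isotropic heat equation with a fixed number of diffusion steps, a uniform grid, and reflecting boundaries; the abstract does not say how the variance-to-mean ratio is turned into a per-gridpoint stopping rule, so the sketch only computes the ratio. All function names (`histogram_density`, `diffuse`, `bootstrap_vmr`) and parameter values are hypothetical.

```python
import random


def histogram_density(sample, lo, hi, n_bins):
    """Normalized histogram of `sample` on a uniform grid over [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in sample:
        idx = int((x - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
    return [c / (len(sample) * width) for c in counts]


def diffuse(density, n_steps, r=0.25):
    """Explicit finite-difference heat-equation steps with reflecting ends.

    r = D * dt / dx**2 must stay <= 0.5 for stability. Away from the
    boundaries, n_steps steps smooth with a kernel of standard deviation
    dx * sqrt(2 * r * n_steps).
    """
    u = list(density)
    for _ in range(n_steps):
        left = [u[0]] + u[:-1]    # neighbor to the left, mirrored at the edge
        right = u[1:] + [u[-1]]   # neighbor to the right, mirrored at the edge
        u = [c + r * (a - 2.0 * c + b) for a, c, b in zip(left, u, right)]
    return u


def bootstrap_vmr(data, lo, hi, n_bins, n_steps, n_boot=50, seed=0):
    """Variance-to-mean ratio per grid point across bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        hist = histogram_density(resample, lo, hi, n_bins)
        estimates.append(diffuse(hist, n_steps))
    vmr = []
    for values in zip(*estimates):  # iterate over grid points
        m = sum(values) / n_boot
        var = sum((v - m) ** 2 for v in values) / n_boot
        vmr.append(var / m if m > 0 else 0.0)  # assumption: 0 where empty
    return vmr


# Illustrative data: 500 draws from a standard normal distribution.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(500)]
for n_steps in (0, 10, 100):
    vmr = bootstrap_vmr(data, -4.0, 4.0, 64, n_steps)
    print(n_steps, round(sum(vmr) / len(vmr), 4))
```

Each grid point is updated only from its immediate neighbors, which is the property that would make a scheme of this kind straightforward to parallelize. An adaptive variant would presumably stop the diffusion at different times for different grid points, but that rule is not specified in the abstract and is left out here.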