Challenges of Principal Component Analysis in High-Dimensional Settings when n<p

Abstract

Principal Component Analysis (PCA) reduces the dimensionality of a dataset by transforming it into uncorrelated Principal Components (PCs), retaining most of the data's variation in fewer components. However, standard PCA struggles in high-dimensional settings where there are more variables than observations, because of limitations in covariance estimation. PCA relies on the covariance matrix to measure relationships among variables, using its eigenvectors to identify the directions of greatest variation and its eigenvalues to quantify the variance along those directions. This article examines the strengths and weaknesses of high-dimensional covariance matrix estimators and emphasizes the importance of well-conditioned covariance estimation for accurate finite-sample PCA. Among the various methods available for estimating the population covariance, Ledoit-Wolf shrinkage estimation is considered optimal when the number of observations is smaller than the number of variables. However, it tends to shrink the sample covariance matrix excessively, resulting in an underestimation of the true eigenspectrum and a loss of sparsity. There is therefore a need for sparse, well-conditioned covariance matrix estimators to improve the accuracy of PC estimation.
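The core problem the abstract describes can be illustrated numerically. The sketch below (an illustration assuming NumPy and scikit-learn are available, not code from the article) draws a dataset with fewer observations than variables (n < p), where the sample covariance matrix is necessarily rank-deficient and therefore singular, and contrasts it with the Ledoit-Wolf shrinkage estimate, which is well-conditioned (full rank, strictly positive eigenvalues):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
n, p = 20, 100  # fewer observations than variables (n < p)
X = rng.standard_normal((n, p))

# Sample covariance: a p x p matrix of rank at most n - 1 < p,
# so at least p - n + 1 eigenvalues are exactly zero and it is singular.
S = np.cov(X, rowvar=False)
sample_rank = np.linalg.matrix_rank(S)

# Ledoit-Wolf shrinkage: a convex combination of S and a scaled identity
# matrix, yielding a well-conditioned, positive-definite estimate.
lw = LedoitWolf().fit(X)
lw_min_eig = np.linalg.eigvalsh(lw.covariance_).min()

print(sample_rank < p)   # the sample covariance is rank-deficient
print(lw_min_eig > 0)    # the shrinkage estimate is positive definite
```

Note that the shrinkage estimate is dense: every entry of the identity target is mixed into every entry of S, which is the "loss of sparsity" the abstract refers to.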
