A Probabilistic Approach to Visualize the Effect of Missing Data on PCA in Ancient Human Genomics
Abstract
Principal Component Analysis (PCA) is widely used in population genetics to visualize genetic relationships. Methods like SmartPCA enable the projection of ancient samples despite missing genotype data due to degraded DNA, but do not quantify projection uncertainty, risking misinterpretation. We introduce TrustPCA, a probabilistic framework that models the impact of missing loci and provides uncertainty estimates for SmartPCA projections. Using simulations with high-coverage ancient human genomes, we show that TrustPCA accurately quantifies projection uncertainty. Applied to real ancient genomic data, our method improves the reliability of PCA interpretations. We provide TrustPCA as a user-friendly web tool for the research community.
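To make the underlying problem concrete, the following minimal sketch shows how missing loci can perturb a PCA projection. It is not the TrustPCA model or the SmartPCA implementation. It assumes a least-squares projection restricted to the observed loci and, in place of a probabilistic model, estimates the spread of the projection empirically by repeatedly masking loci at random in a fully observed sample. The synthetic genotype data and all function names (`fit_pca`, `project_observed`, `masking_spread`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def fit_pca(G, k=2):
    """Fit PCA on a fully observed genotype matrix G (samples x loci).
    Returns the per-locus mean and the top-k loadings (loci x k)."""
    mean = G.mean(axis=0)
    _, _, Vt = np.linalg.svd(G - mean, full_matrices=False)
    return mean, Vt[:k].T

def project_observed(g, observed, mean, V):
    """Least-squares projection of one sample using only its observed loci
    (an assumed stand-in for projecting a sample with missing genotypes)."""
    A = V[observed]
    b = g[observed] - mean[observed]
    coords, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coords

def masking_spread(g, mean, V, missing_rate, n_draws=200, seed=0):
    """Empirical spread of the projection when loci are dropped at random.
    Returns the mean position and covariance over the masked projections."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_draws, V.shape[1]))
    for i in range(n_draws):
        observed = rng.random(g.shape[0]) > missing_rate
        draws[i] = project_observed(g, observed, mean, V)
    return draws.mean(axis=0), np.cov(draws, rowvar=False)

# Synthetic reference panel: two populations with slightly different
# allele frequencies (illustrative data, not real genotypes).
rng = np.random.default_rng(1)
n_loci = 500
freq_a = rng.uniform(0.1, 0.9, n_loci)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_loci), 0.05, 0.95)
G = np.vstack([
    rng.binomial(2, freq_a, (30, n_loci)),
    rng.binomial(2, freq_b, (30, n_loci)),
]).astype(float)

mean, V = fit_pca(G)
g = rng.binomial(2, freq_a, n_loci).astype(float)  # one "complete" sample

for rate in (0.1, 0.5, 0.9):
    center, cov = masking_spread(g, mean, V, rate)
    print(f"missing rate {rate:.1f}: total variance {np.trace(cov):.3f}")
```

In this toy setting the scatter of the projected position grows as more loci are masked, which is the kind of uncertainty that a point estimate alone does not convey.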