PSMC-FAC: Automated Optimization of False-Negative Rate Corrections for Low-Coverage PSMC-Based Demographic Inference
Abstract
Inferring demographic history from whole-genome data is a central objective in evolutionary and conservation genomics. However, the Pairwise Sequentially Markovian Coalescent (PSMC) framework, one of the most widely used demographic inference methods for whole-genome sequence data, is highly sensitive to sequencing coverage: low coverage leads to systematic underestimation of heterozygosity, which in turn biases inferred effective population size trajectories. Here, we present PSMC-FAC, an automated method that optimizes the false-negative rate (FNR) correction for low-coverage genomes by minimizing geometric distances between FNR-corrected low-coverage trajectories and their corresponding high-coverage references. Whole-genome datasets from humans, gray wolves, and cattle were downsampled to multiple coverage levels and processed through standard demographic inference pipelines. Corrected trajectories, projected onto a common temporal grid, were compared using Hausdorff and discrete Fréchet distance metrics, and optimal correction factors were modeled as a function of sequencing depth using second-degree polynomial regression. Across species and demographic contexts, PSMC-FAC substantially improved concordance between low- and high-coverage trajectories and revealed highly predictable coverage-dependent correction patterns. Overall, PSMC-FAC provides a reproducible and mathematically grounded alternative to subjective correction approaches, enabling reliable demographic inference from moderate-coverage genomes and facilitating broader population-scale genomic analyses.
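The core steps described above (comparing trajectories with Hausdorff and discrete Fréchet distances, picking the correction factor that minimizes the distance, and fitting a second-degree polynomial against depth) can be illustrated with a minimal sketch. This is not the PSMC-FAC implementation. The function names `hausdorff`, `discrete_frechet`, `best_fnr`, and `fit_fnr_vs_depth` are hypothetical, and the sketch assumes each trajectory is already available as an array of (time, effective population size) points on a common grid, with any axis scaling (for example a log transform) applied beforehand. How the FNR-corrected trajectories are generated, and how candidate values are searched, is not specified in the abstract and is left outside the sketch.

```python
import numpy as np


def _pairwise(p, q):
    """Euclidean distance between every point of p (n, 2) and q (m, 2)."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=2)


def hausdorff(p, q):
    """Symmetric Hausdorff distance between two point sets."""
    d = _pairwise(p, q)
    # Largest nearest-neighbour distance, taken in both directions.
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def discrete_frechet(p, q):
    """Discrete Frechet distance between two ordered curves (dynamic programming)."""
    d = _pairwise(p, q)
    n, m = d.shape
    ca = np.empty((n, m))
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            prev = min(ca[i - 1, j], ca[i, j - 1], ca[i - 1, j - 1])
            ca[i, j] = max(prev, d[i, j])
    return ca[-1, -1]


def best_fnr(reference, corrected_by_fnr, metric=discrete_frechet):
    """Return the candidate FNR whose corrected trajectory is closest to the reference.

    corrected_by_fnr is an assumed layout: a dict mapping each candidate FNR
    value to the corresponding corrected low-coverage trajectory.
    """
    return min(corrected_by_fnr, key=lambda f: metric(reference, corrected_by_fnr[f]))


def fit_fnr_vs_depth(depths, fnrs):
    """Fit optimal FNR as a second-degree polynomial of sequencing depth.

    Returns coefficients ordered from the quadratic term down to the constant.
    """
    return np.polyfit(depths, fnrs, deg=2)
```

Under these assumptions, `best_fnr` would be called once per downsampled coverage level, and the resulting (depth, optimal FNR) pairs passed to `fit_fnr_vs_depth`; `np.polyval` can then evaluate the fitted curve at a new depth. Unlike the Hausdorff distance, the discrete Fréchet distance respects point order along the curve, which is one reason to report both.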