PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection
Abstract
Motivation
Integrating the knockoff framework with any variable-selection method delivers stringent false discovery rate (FDR) control without recourse to p-values, offering a powerful alternative for differential expression analysis of high-throughput omics datasets. However, existing knockoff generators rely on restrictive modelling assumptions or coarse approximations that often inflate the FDR when applied to real-world data.
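In the knockoff framework, each feature j gets a statistic W_j that compares its importance with that of its knockoff copy, for example the difference in absolute lasso coefficients. Large positive values favour the real feature, and the sign of W_j is symmetric for null features. Features are then kept if they exceed a data-dependent threshold T, the smallest t > 0 with (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q, where q is the target FDR. This is why no p-values are needed.

The sketch below implements only this generic knockoff+ thresholding step. It assumes the W_j have already been computed. The function names are hypothetical and are not the PLSKO package's API. It also does not generate knockoffs, which is the step PLSKO addresses.

```python
import math
from typing import List, Sequence


def knockoff_threshold(W: Sequence[float], q: float, offset: int = 1) -> float:
    """Data-dependent threshold of the knockoff filter (hypothetical helper).

    offset=1 gives the knockoff+ rule (controls the FDR);
    offset=0 gives the original, less conservative knockoff rule.
    Returns math.inf when no threshold achieves the target level q.
    """
    # Candidate thresholds are the distinct non-zero magnitudes |W_j|.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        # Negative statistics beyond -t estimate the number of false selections.
        est_false = offset + sum(1 for w in W if w <= -t)
        n_selected = sum(1 for w in W if w >= t)
        if est_false / max(1, n_selected) <= q:
            return t  # smallest qualifying t
    return math.inf


def knockoff_select(W: Sequence[float], q: float, offset: int = 1) -> List[int]:
    """Indices of features whose statistic reaches the threshold."""
    t = knockoff_threshold(W, q, offset)
    return [j for j, w in enumerate(W) if w >= t]


if __name__ == "__main__":
    # Toy statistics, made up for illustration only.
    W = [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, -0.5, 1.0, 0.8, -0.3]
    print(knockoff_threshold(W, q=0.2))  # 0.8
    print(knockoff_select(W, q=0.2))     # [0, 1, 2, 3, 4, 5, 7, 8]
```

The "+1" offset makes knockoff+ conservative when few features are selected. For instance, with W = [3.0, 2.0, 1.0] and q = 0.1, knockoff+ selects nothing, whereas offset=0 selects all three. The validity of the whole procedure rests on the knockoffs being good copies of the real features, which is where the choice of generator matters.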
Results
We introduce Partial Least Squares Knockoff (PLSKO), an efficient, assumption-free knockoff generator that remains robust across diverse omics platforms. Our extensive simulations show that, among the methods compared, PLSKO is the only one that maintains FDR control with sufficient power in complex non-linear settings. Semi-simulation studies based on RNA-seq, proteomics, metabolomics and microbiome data confirm that PLSKO generates valid knockoff variables. In multi-omics case studies of pre-eclampsia, we combine PLSKO with Aggregation Knockoff to mitigate the randomness of knockoff generation and improve power. These studies show that the method recovers biologically meaningful features.
Availability and implementation
Our proposed algorithm is available on GitHub (https://github.com/guannan-yang/PLSKO) and Zenodo (https://doi.org/10.5281/zenodo.16879594).