PLSKO: a robust knockoff generator to control false discovery rate in omics variable selection

Abstract

Motivation

Integrating the knockoff framework with any variable-selection method delivers stringent false discovery rate (FDR) control without recourse to p-values, offering a powerful alternative for differential expression analysis of high-throughput omics datasets. However, existing knockoff generators rely on restrictive modelling assumptions or coarse approximations that often inflate the FDR when applied to real-world data.
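To illustrate how a knockoff filter selects variables without p-values, the sketch below applies the standard knockoff+ threshold to a vector of feature statistics W. It assumes that a large positive W_j means the original variable looks more important than its knockoff copy. The function names and the example values are hypothetical and are not taken from the PLSKO package.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for feature statistics w at target FDR q.

    The threshold is the smallest t > 0 among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns infinity when no candidate satisfies the bound.
    """
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes, ascending.
    candidates = np.unique(np.abs(w[w != 0]))
    for t in candidates:
        # Negative statistics stand in for the unknown number of false positives.
        fdp_estimate = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_estimate <= q:
            return float(t)
    return float("inf")


def knockoff_select(w, q):
    """Return indices of variables whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    tau = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= tau)


if __name__ == "__main__":
    # Hypothetical statistics, for illustration only.
    w_example = [5.0, 4.0, 3.0, 2.5, -1.0, 2.0, 0.5, -0.2, 1.5, 3.5]
    print(knockoff_plus_threshold(w_example, q=0.2))
    print(knockoff_select(w_example, q=0.2))
```

The guarantee of the filter depends on the knockoff copies being valid, which is why the choice of generator matters.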

Results

We introduce Partial Least Squares Knockoff (PLSKO), an efficient, assumption-free generator that remains robust across diverse omics platforms. In extensive simulations, PLSKO is the only method that maintains FDR control with sufficient power in complex non-linear settings. Semi-simulation studies based on RNA-seq, proteomics, metabolomics, and microbiome experiments confirm that PLSKO generates valid knockoff variables. In pre-eclampsia multi-omics case studies, we combine PLSKO with Aggregation Knockoff to address the randomness of knockoffs and improve power, and we demonstrate the method’s ability to recover biologically meaningful features.

Availability and implementation

Our proposed algorithm is available on GitHub (https://github.com/guannan-yang/PLSKO) and Zenodo (https://doi.org/10.5281/zenodo.16879594).
