Permutation analysis prior to variable selection greatly enhances robustness of OPLS analysis in small cohorts

Marika Ström
Åsa M. Wheelock

Abstract

The R workflow roplspvs (R orthogonal projections of latent structures with permutation over variable selection) facilitates variable selection, model optimization and significance testing using permutations of OPLS-DA models, with the scaled loadings (p[corr]) as the main metric for the significance cutoff. Permutations are performed prior to variable selection (sans), after variable selection (post), and including the variable selection procedure (over). The resulting p-values for the correlation of the model (R²) and the cross-validated correlation of the model (Q²) sans, post and over variable selection are provided as additional model statistics. These model statistics are useful for determining the true significance level of OPLS models, which has otherwise proven difficult to assess, particularly for small sample sizes. Furthermore, we propose a means of estimating the background “noise” level based on permuted false positive rates of R² and Q². This novel metric is then used to calculate an adjusted Q² value. Using a publicly available metabolomics dataset, the advantage of performing permutations over variable selection was demonstrated for small sample sizes. Iteratively reducing the sample size resulted in overinflated models with increasing R² and Q², and permutations post variable selection indicated falsely significant models. In contrast, the adjusted Q² was only marginally affected by sample size and represents a robust estimate of model predictability, and permutations over variable selection showed the true significance of the models.
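The difference between permuting post and over variable selection can be illustrated with a minimal sketch. This is not the roplspvs implementation and does not fit an OPLS model: as a loudly simplified stand-in, variables are ranked by absolute Pearson correlation with the class label (a loose analogue of a p[corr] cutoff), and the mean absolute correlation of the selected variables stands in for R². All function names, the parameter `k`, and the simulated noise data are assumptions made for illustration only.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top(columns, labels, k):
    """Indices of the k variables with the largest |correlation| to the labels."""
    scores = [abs(pearson(col, labels)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


def fit_stat(columns, labels, selected):
    """Stand-in for R2 (assumption): mean |correlation| of the selected variables."""
    return mean(abs(pearson(columns[j], labels)) for j in selected)


def permutation_p_values(columns, labels, k=10, n_perm=200, seed=0):
    """Return (observed statistic, p-value post selection, p-value over selection)."""
    rng = random.Random(seed)
    selected = select_top(columns, labels, k)
    observed = fit_stat(columns, labels, selected)
    hits_post = hits_over = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        # Post: keep the variables that were chosen using the true labels.
        if fit_stat(columns, shuffled, selected) >= observed:
            hits_post += 1
        # Over: repeat the variable selection on the permuted labels.
        reselected = select_top(columns, shuffled, k)
        if fit_stat(columns, shuffled, reselected) >= observed:
            hits_over += 1
    p_post = (1 + hits_post) / (1 + n_perm)
    p_over = (1 + hits_over) / (1 + n_perm)
    return observed, p_post, p_over


# Simulated pure-noise data: 10 samples, 200 variables, no real group difference.
data_rng = random.Random(1)
labels = [0] * 5 + [1] * 5
columns = [[data_rng.gauss(0, 1) for _ in range(len(labels))] for _ in range(200)]

observed, p_post, p_over = permutation_p_values(columns, labels)
print(f"observed={observed:.3f}  p_post={p_post:.3f}  p_over={p_over:.3f}")
```

Because the same permuted labels are used for both variants, and re-selection always picks the best-scoring variables for those labels, the over-selection p-value in this sketch can never be smaller than the post-selection one. On noise-only data with few samples, the post-selection p-value would typically be expected to look misleadingly small, which is the overfitting pattern the abstract describes.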

Version published to 10.1101/2024.03.18.585475v1 on bioRxiv
Mar 20, 2024

Related articles

A new p-value based multiple testing procedure for generalized linear models

This article has 2 authors:
1. Joseph Rilling
2. Cheng Yong Tang
This article has no evaluations
Latest version Mar 21, 2024

Large-scale composite hypothesis testing for omics analyses

This article has 4 authors:
1. Annaïg De Walsche
2. Franck Gauthier
3. Alain Charcosset
4. Tristan Mary-Huard
This article has no evaluations
Latest version Apr 3, 2024

A new robust and accurate two-sample Mendelian randomization method with a large number of genetic variants

This article has 9 authors:
1. Lei Zhang
2. Jun-Jie Niu
3. Xian-Mei He
4. Xiao Zheng
5. Qi-Gang Zhao
6. Xiu-Juan Yu
7. Li Luo
8. Hai-Gang Ren
9. Yu-Fang Pei
This article has no evaluations
Latest version Apr 18, 2024
