Permutation analysis prior to variable selection greatly enhances robustness of OPLS analysis in small cohorts

Abstract

The R workflow roplspvs (R orthogonal projections of latent structures with permutation over variable selection) facilitates variable selection, model optimization and permutation-based significance testing of OPLS-DA models, using the scaled loadings (p[corr]) as the main metric for the significance cutoff. Permutations are performed including the variable selection procedure (over), prior to variable selection (sans), and after variable selection (post). The resulting p-values for the correlation of the model (R²) and the cross-validated correlation of the model (Q²) sans, post and over variable selection are provided as additional model statistics. These model statistics are useful for determining the true significance level of OPLS models, which has otherwise proven difficult to assess, particularly for small sample sizes. Furthermore, we propose a means of estimating the background "noise" level based on permuted false positive rates of R² and Q². This novel metric is then used to calculate an adjusted Q² value. Using a publicly available metabolomics dataset, the advantage of performing permutations over variable selection was demonstrated for small sample sizes. Iteratively reducing the sample size resulted in overinflated models with increasing R² and Q², and permutations post variable selection indicated falsely significant models. In contrast, the adjusted Q² was only marginally affected by sample size and represents a robust estimate of model predictability, and permutations over variable selection showed the true significance of the models.
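The difference between permuting post and over variable selection can be illustrated with a minimal Python sketch. It is not the roplspvs implementation, and several parts are assumptions made for illustration only. Variables are ranked by their absolute Pearson correlation with the class labels (a rough analogue of a p[corr] cutoff). The "model" is a single sign-aligned sum of the selected variables, standing in for an OPLS-DA predictive component. The statistic is the squared correlation between that score and the labels, standing in for R². The p-value uses the common (count + 1) / (permutations + 1) convention. All function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_variables(columns, y, k):
    """Return (index, sign) for the k variables most correlated with y."""
    rs = [pearson(col, y) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -abs(rs[j]))[:k]
    return [(j, math.copysign(1.0, rs[j])) for j in order]


def fit_statistic(columns, selected, y):
    """Squared correlation between a one-component score and y (R2 stand-in)."""
    n = len(y)
    score = [sum(sign * columns[j][i] for j, sign in selected) for i in range(n)]
    return pearson(score, y) ** 2


def permutation_p_value(columns, y, k, n_perm, rng, over_selection):
    """Permutation p-value for the fit statistic.

    over_selection=False: variables are selected once on the real labels and
    kept fixed while labels are permuted (permutation post variable selection).
    over_selection=True: variable selection is repeated on every permuted
    label vector (permutation over variable selection).
    """
    selected = select_variables(columns, y, k)
    observed = fit_statistic(columns, selected, y)
    count = 0
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        sel = select_variables(columns, y_perm, k) if over_selection else selected
        if fit_statistic(columns, sel, y_perm) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


def make_noise_data(n_samples, n_vars, rng):
    """Pure-noise variables with balanced 0/1 labels: no true class effect."""
    y = [0] * (n_samples // 2) + [1] * (n_samples - n_samples // 2)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)]
               for _ in range(n_vars)]
    return columns, y


if __name__ == "__main__":
    rng = random.Random(1)
    columns, y = make_noise_data(n_samples=20, n_vars=300, rng=rng)
    for over in (False, True):
        p = permutation_p_value(columns, y, k=5, n_perm=200, rng=rng,
                                over_selection=over)
        print("over" if over else "post", "variable selection: p =", round(p, 3))
```

Because the data here are pure noise, any apparent class separation comes from the selection step itself. Keeping the selected variables fixed during permutation hides that optimism and tends to give a very small p-value, whereas repeating the selection inside each permutation exposes it. This is the small-sample effect the abstract describes.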
