Enhancing the Robustness of OPLS Modelling in Small Cohorts by Leveraging Permutation Analysis Prior to Variable Selection

Abstract

The R workflow ropls-ViPerSNet (R Orthogonal projections of latent structures with Variable Permutation Selection and Elastic Net) facilitates variable selection, model optimization, and significance testing of OPLS-DA models using permutations, with the scaled loadings (p[corr]) as the main metric for the significance cutoff. Permutations are performed prior to (pre-), after (post-), and including (over) the variable selection procedure. The resulting p-values for the model correlation (R²) and the cross-validated model correlation (Q²) pre-, post-, and over variable selection are provided as additional model statistics. These statistics help determine the true significance level of OPLS models, which has otherwise proven difficult to assess, particularly for small sample sizes. Furthermore, a means of estimating the background noise level based on permuted false positive rates of R² and Q² is proposed. This novel metric is then used to calculate an adjusted Q² value. Using a publicly available metabolomics dataset, the advantage of performing permutations over variable selection was demonstrated for small sample sizes. Iteratively reducing the sample size resulted in overinflated models with increasing R² and Q², and permutations post variable selection indicated falsely significant models. In contrast, the adjusted Q² was only marginally affected by sample size, representing a robust estimate of model predictability, and permutations over variable selection showed the true significance of the models. An additional Elastic Net option is included in the workflow for variable selection by coefficient penalization, using an iterative approach to reduce noise while avoiding overfitting.
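The key distinction between permuting *post* variable selection and *over* variable selection can be sketched as follows. This is a minimal illustrative toy in Python, not the actual ropls-ViPerSNet implementation: a simple top-k correlation filter stands in for the p[corr]-based selection, and a leave-one-out cross-validated score stands in for the OPLS-DA Q². All names, the selection rule, and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 20 samples, 200 variables, two classes (a hypothetical stand-in
# for a small-cohort metabolomics matrix).
n, p = 20, 200
X = rng.normal(size=(n, p))
y = np.repeat([0.0, 1.0], n // 2)
X[:, :5] += y[:, None]  # only the first 5 variables carry real signal

def select_and_score(X, y, k=5):
    """Variable selection (top-k |correlation|, a stand-in for a p[corr]
    cutoff) followed by a leave-one-out Q2-style score on the selected
    variables only."""
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    keep = np.argsort(corr)[-k:]
    Xs = X[:, keep]
    press, tss = 0.0, np.sum((y - y.mean()) ** 2)
    for i in range(len(y)):  # leave-one-out cross-validation
        mask = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(Xs[mask], y[mask], rcond=None)
        press += (y[i] - Xs[i] @ beta) ** 2
    return 1.0 - press / tss  # cross-validated predictability (Q2 analogue)

q2_obs = select_and_score(X, y)

# Permutation OVER variable selection: labels are shuffled BEFORE selection,
# so the selection step itself is repeated inside every permutation. This is
# what keeps the null distribution honest; permuting only after selection
# would reuse variables already cherry-picked on the true labels.
n_perm = 200
q2_perm = np.array([select_and_score(X, rng.permutation(y))
                    for _ in range(n_perm)])
p_value = (1 + np.sum(q2_perm >= q2_obs)) / (n_perm + 1)
```

With small cohorts the permuted Q² values are often well above zero, because the selection step finds spuriously correlated variables in every shuffle; comparing the observed Q² against this selection-inclusive null, rather than against permutations of an already-selected model, is the point the abstract makes.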