Discovering potential key features of genome wide profiling data using Decision Variable Analysis

Abstract

Identifying key features related to a phenotype of interest (POI) in high-dimensional data is a central problem in omics studies, such as those based on transcriptome or DNA methylome data. However, these data are commonly contaminated by unwanted variation arising from platforms, batches, or other biological factors. The data can therefore be viewed as a combination of variation derived from the POI and variation derived from confounding factors. Failing to account for these factors can lead to spurious associations and missed signals. Based on this idea, we propose a novel feature selection method, Decision Variable Analysis (DVA), to extract the important POI-related features from data containing potential confounding factors. Applying the method to both simulated and real data, we found that DVA identified confounding factors better than other methods, including linear regression and surrogate variable analysis. Our method is particularly efficient for data in which the number of features far exceeds the sample size. We show that DVA yields improvements on such high-dimensional, small-sample datasets across different platforms. The results indicate that DVA is an effective method for dissecting sources of variation in omics data with potential confounding factors. DVA is freely available at [https://github.com/xvon1/DVA](https://github.com/xvon1/DVA).
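To make the premise concrete, the following is a minimal sketch of data built as POI signal plus confounder signal plus noise, scored with the per-feature linear regression baseline that the abstract names as a comparison method. It is not DVA and does not use the DVA package. The function names (`simulate_omics`, `poi_t_stats`), the effect sizes, and the single binary batch variable are all illustrative assumptions.

```python
import numpy as np

def simulate_omics(n_samples=40, n_features=500, n_true=25, seed=0):
    """Simulate X = POI effect + batch effect + noise (all values are assumptions)."""
    rng = np.random.default_rng(seed)
    # Binary phenotype of interest: first half controls, second half cases.
    poi = np.repeat([0.0, 1.0], n_samples // 2)
    # Batch membership that is partially correlated with the POI.
    batch = (rng.random(n_samples) < 0.3 + 0.4 * poi).astype(float)
    # Only the first n_true features carry a POI effect.
    beta = np.zeros(n_features)
    beta[:n_true] = 1.5
    # Every feature carries some batch effect.
    gamma = rng.normal(0.0, 1.0, n_features)
    noise = rng.normal(0.0, 1.0, (n_samples, n_features))
    X = np.outer(poi, beta) + np.outer(batch, gamma) + noise
    return X, poi, batch

def poi_t_stats(X, design):
    """Per-feature OLS t-statistics for column 1 of the design matrix (the POI)."""
    n, p = design.shape
    coef, _, _, _ = np.linalg.lstsq(design, X, rcond=None)  # shape (p, n_features)
    resid = X - design @ coef
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)
    xtx_inv = np.linalg.inv(design.T @ design)
    return coef[1] / np.sqrt(sigma2 * xtx_inv[1, 1])

X, poi, batch = simulate_omics()
ones = np.ones_like(poi)
# Naive model ignores the batch; adjusted model includes it as a covariate.
t_naive = poi_t_stats(X, np.column_stack([ones, poi]))
t_adjusted = poi_t_stats(X, np.column_stack([ones, poi, batch]))
```

The adjusted model is only possible here because the batch is known. Surrogate variable analysis, and DVA as described above, target the setting where such factors are not directly observed.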
