Ensemble test for microbiome data
Abstract
Motivation: Recent research indicates strong correlations between the human microbiome and various diseases. However, statistical analysis of microbiome data is challenging because the data are inherently sparse and high-dimensional. PERMANOVA (permutational multivariate analysis of variance using distance matrices) has been used extensively to test the association between microbiome data and biological features. Its non-parametric nature makes it appealing, as it imposes no restrictions on data dimension or distribution. Despite these merits, several limitations have restricted its broader application.

Results: This paper introduces E-MANOVA (ensemble multivariate analysis of variance using distance matrices), a method that addresses these limitations. Traditional PERMANOVA is not robust to the choice of distance or to the form of the association signal, which can reduce power in certain scenarios. Borrowing the idea of ensemble learning, we raise the similarity matrix to the r-th power to construct a base test and then combine multiple base tests into an ensemble test. Our test statistic demonstrates high power and robustness compared with existing methods. We also use direct moment approximation and the Pearson type III distribution to approximate the permutation null distribution, completely avoiding the computationally intensive permutation procedure. Finally, we use the Cauchy combination method to aggregate p-values from multiple distances, eliminating the need to prespecify a distance measure before analysis.

Conclusions: Our extensive simulations demonstrate that the proposed method outperforms existing methods in various situations. Further analysis of real data from cigarette smokers and of curated microbiome data shows that the proposed method yields the largest number of significant p-values among all methods compared.
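The final aggregation step mentioned in the abstract can be illustrated with a minimal sketch of the standard Cauchy combination test. This is not the authors' implementation: the function name `cauchy_combination`, the equal default weights, and the example p-values are assumptions made here for illustration. Each p-value is mapped to a standard Cauchy variate, the variates are averaged with non-negative weights, and the combined p-value is read from the Cauchy tail.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Hypothetical helper for illustration; equal weights are assumed
    when none are given.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")

    total = sum(weights)
    # Transform each p-value into a standard Cauchy variate and take
    # the weighted average (weights are normalized to sum to one).
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Example: p-values from three hypothetical distance measures.
combined = cauchy_combination([0.01, 0.5, 0.8])
```

With a single input the function returns that p-value unchanged, which is a convenient sanity check. Because the Cauchy tail is dominated by the smallest p-values, one strong signal from a single distance measure is largely preserved in the combined result.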