Zero-inflation in the Multivariate Poisson Lognormal Family

Abstract

Analyzing high-dimensional count data is challenging, and statistical model-based approaches provide an adequate and efficient framework that preserves explainability. The (multivariate) Poisson-Log-Normal (PLN) model is one such model: it assumes count data are driven by an underlying structured latent Gaussian variable, so that the dependencies between counts stem solely from the latent dependencies. However, PLN does not account for zero-inflation, a feature frequently observed in real-world datasets. Here we introduce the Zero-Inflated PLN (ZIPLN) model, which adds a multivariate zero-inflated component to the model as an additional Bernoulli latent variable. The zero-inflation can be fixed, site-specific, feature-specific, or dependent on covariates. We estimate model parameters using variational inference that scales up to datasets with a few thousand variables, and we compare two approximations: (i) independent Gaussian and Bernoulli variational distributions, or (ii) a Gaussian variational distribution conditioned on the Bernoulli one. The method is assessed on synthetic data, and the efficiency of ZIPLN is established even when zero-inflation concerns up to 90% of the observed counts. We then apply both ZIPLN and PLN to a cow microbiome dataset containing 90.6% zeroes. Accounting for zero-inflation significantly increases log-likelihood and reduces dispersion in the latent space, thus leading to improved group discrimination.
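
To make the generative structure described in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a log link between the latent Gaussian layer and the Poisson rate, a feature-specific zero-inflation probability, and no covariates or offsets; the function name `simulate_zipln` and its parameters (`mu`, `sigma`, `pi`) are hypothetical names chosen for illustration.

```python
import numpy as np

def simulate_zipln(n, mu, sigma, pi, seed=0):
    """Draw n samples from a simplified ZIPLN-style model (illustrative only).

    mu    : (p,) mean of the latent Gaussian layer (assumed, no covariates)
    sigma : (p, p) covariance of the latent Gaussian layer
    pi    : (p,) assumed feature-specific zero-inflation probabilities
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    pi = np.asarray(pi, dtype=float)
    p = mu.shape[0]

    # Latent Gaussian layer: carries all dependencies between features.
    z = rng.multivariate_normal(mu, sigma, size=n)   # shape (n, p)

    # Bernoulli latent layer: w == True marks a structural (inflated) zero.
    w = rng.random((n, p)) < pi                       # shape (n, p)

    # Poisson emission with log link (an assumption of this sketch),
    # then force zeros wherever the Bernoulli indicator is active.
    counts = rng.poisson(np.exp(z))
    y = np.where(w, 0, counts)
    return y, z, w

# Usage: three features, the last one heavily zero-inflated.
mu = np.array([1.0, 2.0, 0.5])
sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
pi = np.array([0.1, 0.3, 0.9])
y, z, w = simulate_zipln(200, mu, sigma, pi, seed=42)
print("fraction of zeros per feature:", (y == 0).mean(axis=0))
```

Note that zeros in `y` arise from two sources in this sketch: structural zeros from the Bernoulli layer and ordinary Poisson zeros when the rate is small, which is why a plain PLN fit can struggle on such data.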
