CAFT: A Compositional Log-Linear Model for Microbiome Data with Zero Cells
Abstract
Background: Differential abundance analysis is fundamental to microbiome research and provides valuable insights into host-microbe interactions. However, microbiome data are compositional, highly sparse (with many zero counts), and subject to experimental biases that differ across taxa. Standard statistical methods often overlook these features. Many approaches either analyze relative abundances without accounting for compositionality or rely on pseudocounts, which can lead to spurious associations and inadequate false discovery rate (FDR) control.

Methods: We introduce a novel framework for differential abundance analysis of microbiome data: the Compositional Accelerated Failure Time (CAFT) model. CAFT addresses zero read counts by treating them as censored observations below a detection limit. This approach is inherently resistant to multiplicative technical bias, eliminates the need for pseudocounts, and addresses compositional bias through appropriately constructed score test procedures.

Results: Extensive simulations show that CAFT outperforms competing compositional differential abundance methods, including LOCOM, LinDA, ANCOM-BC2 and its robust variant, and LDM-clr, by offering more robust type I error and FDR control, with or without technical bias. Additionally, we applied CAFT to microbiome data on inflammatory bowel disease (IBD) and the upper respiratory tract (URT) to identify gut microbial taxa that are differentially abundant between IBD patients and healthy controls, as well as URT taxa that distinguish smokers from non-smokers.

Conclusion: We present CAFT, a powerful, robust, and efficient approach for compositional differential abundance analysis. CAFT effectively controls type I error and maintains FDR control, while demonstrating enhanced power in statistical testing. These capabilities render CAFT a useful tool for compositional microbiome data analysis.
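To make the censoring idea from the Methods concrete, the following is a minimal illustrative sketch, not the CAFT implementation. It assumes a log-normal accelerated failure time model for a single taxon, in which a zero count contributes the probability of falling below a detection limit rather than a density term. The log-normal error distribution, the per-sample detection limit of one read, the toy data, and all function and parameter names (`censored_loglik`, `beta0`, `beta1`, `sigma`) are assumptions made for illustration. The compositional adjustment and the score tests described in the abstract are not shown.

```python
import math


def norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    """Log CDF of the standard normal, written with erfc.

    Adequate for moderate z; very negative z would underflow to log(0).
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def censored_loglik(counts, x, detection_limit, beta0, beta1, sigma):
    """Left-censored log-normal AFT log-likelihood for one taxon.

    counts          : read counts; 0 is treated as "below the detection limit"
    x               : covariate per sample (for example 0/1 group labels)
    detection_limit : assumed per-sample limit on the count scale
    The model is log(count) = beta0 + beta1 * x + sigma * error,
    with a standard normal error (an assumption of this sketch).
    """
    total = 0.0
    for y, xi, d in zip(counts, x, detection_limit):
        mu = beta0 + beta1 * xi
        if y > 0:
            # Observed count: density of log(y) under the model.
            z = (math.log(y) - mu) / sigma
            total += norm_logpdf(z) - math.log(sigma)
        else:
            # Zero count: probability that the latent value is below the limit.
            z = (math.log(d) - mu) / sigma
            total += norm_logcdf(z)
    return total


# Hypothetical toy data: six samples, two groups, two zero counts.
counts = [0, 3, 12, 0, 40, 55]
group = [0, 0, 0, 1, 1, 1]
limits = [1.0] * len(counts)  # assumed detection limit of one read

loglik = censored_loglik(counts, group, limits, beta0=1.5, beta1=1.5, sigma=1.5)
```

In this sketch no pseudocount is added anywhere: the zeros enter only through the censored term, so their contribution depends on the assumed detection limit rather than on an arbitrary replacement value.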