Ratio Percentile Deviation (RPD): A nonparametric, compositionally robust method for measuring the divergence of a microbial sample from a reference dataset



Abstract

Comparing a microbial sample against a reference dataset is essential for many workflows, such as quantifying change in a microbial community after experimental perturbation or evaluating whether a human microbiome sample falls within the range of healthy subjects. Here I present a new method called Ratio Percentile Deviation (RPD) for quantifying the divergence of a microbial sample from a set of reference samples. The RPD method compares features of the test sample to the empirical distribution of the same features in the reference samples, making no assumption about the underlying distribution of the data. The features used for comparison are ratios between taxa in the same sample, which are identical whether computed from counts or from relative abundances; therefore, RPD is appropriate for compositional read-based data. Applying the RPD method to several large microbial datasets shows that it outperforms approaches that compare test samples to a composite reference value (e.g. the centroid of the reference dataset). In case-control analyses using the American Gut Project and Ghana Breast Health datasets, a single RPD predictor discriminated individuals with microbiome dysfunction from healthy controls with AUCs of 0.74-0.79; DeLong tests confirmed that RPD's AUC was significantly higher than that of the best-performing comparator in all three analyses. I further show that RPD can measure compositional variation within a dataset by applying the metric to the long-term Lake Mendota microbial time series. The superior predictive power of RPD versus existing approaches across these varied datasets suggests that this method could be a useful new tool for comparing microbial samples to a reference across multiple study systems. The RPD method is available as an R function in the following GitHub repository: https://github.com/cherren8/RPD
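
The two ideas in the abstract, that within-sample ratios are unchanged by conversion to relative abundance and that each ratio is compared to its empirical distribution in the reference samples, can be illustrated with a small Python sketch. This is not the author's R implementation and not the published RPD formula: the function names, the toy data, and the choice to aggregate by averaging each ratio's distance from the 50th percentile are all assumptions made for illustration, and zero counts are not handled.

```python
from itertools import combinations
from math import log


def pairwise_log_ratios(sample):
    """Log ratio for every pair of taxa within one sample (values must be > 0)."""
    return [log(sample[i] / sample[j])
            for i, j in combinations(range(len(sample)), 2)]


def empirical_percentile(value, reference_values):
    """Fraction of reference values that are <= value (empirical CDF)."""
    return sum(r <= value for r in reference_values) / len(reference_values)


def toy_deviation_score(test_sample, reference_samples):
    """Mean distance of each ratio's percentile from 0.5.

    Hypothetical aggregation chosen for illustration only; the published
    method may combine percentiles differently.
    """
    test_ratios = pairwise_log_ratios(test_sample)
    ref_ratios = [pairwise_log_ratios(s) for s in reference_samples]
    deviations = []
    for k, ratio in enumerate(test_ratios):
        ref_column = [r[k] for r in ref_ratios]  # same taxon pair, all references
        deviations.append(abs(empirical_percentile(ratio, ref_column) - 0.5))
    return sum(deviations) / len(deviations)


def to_relative(sample):
    """Convert counts to relative abundances that sum to 1."""
    total = sum(sample)
    return [x / total for x in sample]


# Made-up counts for three taxa; not drawn from any dataset in the article.
reference_counts = [[120, 60, 20], [100, 55, 25], [130, 70, 15], [110, 50, 30]]
typical_counts = [115, 58, 22]
shifted_counts = [20, 60, 120]

typical_score = toy_deviation_score(typical_counts, reference_counts)
shifted_score = toy_deviation_score(shifted_counts, reference_counts)
print(typical_score, shifted_score)
```

Because only ratios enter the calculation, running the same functions on `to_relative(...)` versions of the samples gives the same scores, and the sample whose taxon ordering is reversed relative to the references receives the larger score.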
