A Review and Evaluation of Species Richness Estimation

Abstract

Motivation

The statistical problem of estimating the total number of distinct species in a population (or distinct elements in a multiset), given only a small sample, occurs in various areas, ranging from the unseen species problem in ecology to estimating the diversity of immune repertoires. Accurately estimating the true richness from very small samples is challenging, in particular for highly diverse populations with many rare species. Depending on the application, different estimation strategies have been proposed that incorporate explicit or implicit assumptions about either the species distribution or about the sampling process. These methods are scattered across the literature, and an extensive overview of their assumptions, methodology and performance is currently lacking.

Results

We comprehensively review and evaluate a variety of existing methods on real and simulated data with different compositions of rare and abundant elements. Our evaluation shows that the most accurate method depends on the species composition. Simpler methods, like the Chao 1 and Chiu estimators, yield accurate estimates for many of the tested species compositions, but tend to underestimate the true richness for heterogeneous populations and small samples (containing 1% to 5% of the population). When the population size is known, upsampling estimators such as PreSeq and RichnEst often yield more accurate results.
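To make the simplest of these estimators concrete, the following is a minimal sketch of the Chao 1 estimator in its widely cited textbook form, not the implementation from the authors' repository. It uses only the number of observed species and the counts of species seen exactly once (f1) and exactly twice (f2). The function name `chao1`, its `bias_corrected` parameter, and the toy sample are illustrative assumptions.

```python
from collections import Counter


def chao1(sample, bias_corrected=True):
    """Chao 1 richness estimate from an iterable of observed species labels.

    Illustrative sketch (assumed interface, not the article's code):
      classic:        S_obs + f1^2 / (2 * f2)
      bias-corrected: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    where f1 and f2 are the numbers of species observed exactly once and twice.
    """
    abundances = Counter(sample)              # species -> times observed
    s_obs = len(abundances)                   # distinct species in the sample
    freq_counts = Counter(abundances.values())
    f1 = freq_counts[1]                       # singletons
    f2 = freq_counts[2]                       # doubletons

    # The classic form is undefined when f2 == 0, so fall back to the
    # bias-corrected form in that case.
    if bias_corrected or f2 == 0:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return s_obs + f1 * f1 / (2 * f2)


# Toy sample: 5 observed species, 3 singletons (c, d, e), 1 doubleton (b).
toy_sample = ["a", "a", "a", "b", "b", "c", "d", "e"]
print(chao1(toy_sample))                        # 6.5
print(chao1(toy_sample, bias_corrected=False))  # 9.5
```

Because the estimate depends only on f1 and f2, it behaves as a lower bound on richness, which is consistent with the underestimation reported above for heterogeneous populations and very small samples.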

Availability and implementation

Source code for data simulation and richness estimation is available at https://gitlab.com/rahmannlab/speciesrichness
