Population size estimation when multiple samples carrying the risk of misidentification are taken within the same capture occasion from the same individual

Abstract

Although non-invasive sampling is increasingly used in capture-recapture (CR) monitoring, it carries a risk of misidentification that, if ignored, causes an overestimation of population size. Models that deal with misidentification have been proposed. However, these models assume that only one sample can be collected per individual per occasion. This does not hold for several DNA-based monitoring programs, for example those that extract DNA from faecal samples. Because these models do not account for repeated observations, they yield biased estimates in such cases.

In this paper, we develop an approach that extends the latent multinomial model (LMM) of Link et al. (2010) by using a Poisson distribution to model the number of samples collected from the same individual on a given occasion. We then run simulations to test how our new model performs. As an illustration, we apply the new Poisson model to a collection of Eurasian otter faeces (Lampa et al., 2015).
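The sampling process targeted by this extension can be illustrated with a small simulation. The sketch below is our own illustration, not the authors' implementation. It assumes that each individual yields an independent Poisson(λ) number of samples per occasion, that each sample is correctly identified with a probability we call alpha (a hypothetical name), and that every misidentified sample creates a distinct ghost seen in that single sample. The function names are also hypothetical.

```python
import math
import random


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_histories(n_ind: int, n_occ: int, lam: float, alpha: float, seed: int = 1):
    """Simulate observed per-occasion sample counts under misidentification.

    Assumptions (illustrative only): Poisson(lam) samples per individual per
    occasion; each sample correctly identified with probability alpha; each
    misidentified sample becomes a new identity seen exactly once.
    Returns (observed, n_ghosts): observed is a list of histories (tuples of
    sample counts per occasion); n_ghosts is how many of them are ghosts.
    """
    rng = random.Random(seed)
    observed, n_ghosts = [], 0
    for _ in range(n_ind):
        history = [0] * n_occ
        for t in range(n_occ):
            for _ in range(poisson_draw(rng, lam)):
                if rng.random() < alpha:
                    history[t] += 1        # sample attributed to the right individual
                else:
                    ghost = [0] * n_occ    # misidentified sample: a new identity seen once
                    ghost[t] = 1
                    observed.append(tuple(ghost))
                    n_ghosts += 1
        if any(history):                   # individuals never correctly identified stay unseen
            observed.append(tuple(history))
    return observed, n_ghosts
```

For example, simulate_histories(200, 5, 0.36, 0.9) returns the observed histories together with the number of ghosts among them. Setting alpha to 1 removes all ghosts, so every returned history then belongs to a real individual.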

Our model yields unbiased estimates of population size when the expected number of samples per individual per occasion (λ) is sufficiently high: in simulations, this was the case with λ ≥ 0.36 and five capture occasions, or with λ ≥ 0.23 and seven or more occasions. In contrast, when λ = 0.11 (corresponding to about 42%, 53% and 62% of individuals being detected with 5, 7 and 9 occasions, respectively), population size is consistently underestimated. Applying the model to the otter dataset confirms the presence of misidentifications, consistent with the authors’ expectations.
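The correspondence between λ and the proportion of individuals detected can be checked with a short calculation. The calculation assumes independent Poisson(λ) sample counts per occasion. Under that assumption, the total over K occasions is Poisson(λK), so an individual yields at least one sample with probability 1 - exp(-λK). This check ignores misidentification and is our own, not the authors' code. The function name is hypothetical.

```python
import math


def prop_detected(lam: float, n_occ: int) -> float:
    """Probability that an individual yields at least one sample over n_occ occasions.

    Assumes independent Poisson(lam) sample counts per occasion, so the total is
    Poisson(lam * n_occ) and equals zero with probability exp(-lam * n_occ).
    """
    return 1.0 - math.exp(-lam * n_occ)


for k in (5, 7, 9):
    print(f"{k} occasions: {100 * prop_detected(0.11, k):.1f}%")
# 5 occasions: 42.3%
# 7 occasions: 53.7%
# 9 occasions: 62.8%
```

These values agree with the percentages quoted above to within one percentage point. Under the same assumption, the formula gives about 83% for λ = 0.36 with five occasions and about 80% for λ = 0.23 with seven occasions.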

Our findings indicate that repeated observations can be modelled without bias. The application to otters shows that our model is necessary to accurately estimate population size in the presence of misidentification and repeated observations.

Ever since the first capture-mark-recapture models were implemented to estimate demographic parameters in natural populations in the 1970s, increasingly elaborate models have been developed, allowing researchers to account for issues related to sampling procedures and data uncertainty. This development results from fruitful exchanges between biostatisticians and field biologists (e.g. Arnason and Schwarz 1999), illustrated (and stimulated) by the EURING meetings. These meetings hosted many discussions between the two communities about both methodological advances and the needs arising from field data (e.g. Senar et al. 2004). Capture-recapture models historically dealt with standard individual marking of animals, whose marks were then read again upon recapture or resighting from a distance. They have since been adapted to other types of data, including data that no longer require the physical capture of individuals (i.e. non-invasive methods). These improvements allow demographic parameters to be estimated for species that are difficult to capture or for which capture may have unwanted impacts (e.g. on subsequent movements), provided that individuals can be recognised. Such remote identification can rely on phenotypic traits (e.g. coloured skin, feather or hair patterns) or, more recently, on DNA analysis, generating ‘capture’ data similar to physical captures. Yet these methods can be prone to errors, creating a risk of misidentifying individuals and thereby generating apparently ‘new’ individuals in a population (called ‘ghosts’ here). This risk is of particular concern for the estimation of one of the main demographic parameters of interest, namely population size. The concern is heightened because remote identification is widely used for species that often have small populations and/or face major conservation issues, for which knowing population size can be crucial.
Because these ‘new’ individuals are usually ‘seen’ only once, population size can be overestimated, sometimes substantially. Consequently, considerable effort has been devoted in recent years to accounting for misidentification in capture-recapture models that estimate population size.
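The inflation caused by ghosts can be made concrete with expected counts under a deliberately simple model. This is our own illustration rather than one of the models compared in the paper. It assumes Poisson(λ) samples per individual per occasion, each correctly identified with a probability we call alpha (a hypothetical name), and each misidentified sample producing its own ghost. Under these assumptions, correctly identified samples follow a thinned Poisson(alpha λ K) distribution over K occasions, while ghosts accumulate linearly with the number of misidentified samples.

```python
import math


def expected_counts(n: int, lam: float, n_occ: int, alpha: float):
    """Expected numbers of (true individuals detected, ghosts, distinct identities).

    Illustrative assumptions only: Poisson(lam) samples per individual per
    occasion, identification probability alpha per sample, and one distinct
    ghost per misidentified sample.
    """
    mu = lam * n_occ                                     # expected samples per individual
    true_detected = n * (1.0 - math.exp(-alpha * mu))    # at least one correctly identified sample
    ghosts = n * (1.0 - alpha) * mu                      # expected misidentified samples
    return true_detected, ghosts, true_detected + ghosts


detected, ghosts, naive = expected_counts(n=100, lam=0.36, n_occ=5, alpha=0.9)
print(f"{detected:.1f} true, {ghosts:.1f} ghosts, {naive:.1f} identities")
# 80.2 true, 18.0 ghosts, 98.2 identities
```

In this hypothetical setting, about 80 real individuals are expected to be detected, but roughly 98 distinct identities appear in the data, nearly one in five of them a ghost seen once. A standard closed-population model would treat all of these identities as real.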

The paper by Fraysse et al. (2026) is a welcome step forward in this collective effort. The authors’ model builds on a previous one, the Latent Multinomial Model (LMM). The LMM estimates the misidentification rate from the excess of capture histories corresponding to individuals seen only once (the ‘ghosts’) relative to their expected number in the absence of misidentification. Its main advantage over earlier approaches is that the full data set can be kept, instead of discarding all ‘capture’ histories with a single capture. Some of these histories result from misidentifications creating ghosts, while others are genuine single observations (e.g. of transient individuals). Yet the LMM assumes that individuals can only be ‘seen’ once per capture occasion, whereas individuals can in fact be detected multiple times. Here, Fraysse et al. address this main weakness by presenting an extension of the LMM that accounts for multiple observations of a given individual on a given sampling occasion. To illustrate their extension, they use both real data from an otter population and simulated data covering a range of parameter sets. With these data, they compare parameter estimates between three models: (i) a standard closed-population capture-recapture model, (ii) a model excluding all single-capture histories and (iii) their extension of the misidentification model. This extension allows several observations of the same individual (each prone to misidentification) per ‘capture’ session.
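The latent-to-observed mapping that underlies the LMM can be sketched in a few lines. The sketch assumes the coding used in the original LMM of Link et al. (2010), as we understand it. Each latent history records, at each occasion, 0 (not captured), 1 (captured and correctly identified) or 2 (captured but misidentified). Observed histories are a deterministic function of the latent ones: every 2 becomes a 0 in the individual's own history and spawns a separate ghost history with a single capture at that occasion. This covers only the one-sample-per-occasion case, not the Poisson extension. The function names are hypothetical.

```python
from collections import Counter


def observed_from_latent(latent):
    """Map one latent history (codes 0/1/2) to the observed histories it generates.

    Coding assumed here: 0 = not captured, 1 = captured and correctly identified,
    2 = captured but misidentified (producing a ghost seen only at that occasion).
    """
    own = tuple(1 if code == 1 else 0 for code in latent)
    out = [own] if any(own) else []   # an individual captured only with errors is never seen as itself
    for t, code in enumerate(latent):
        if code == 2:
            ghost = [0] * len(latent)
            ghost[t] = 1
            out.append(tuple(ghost))
    return out


def observed_counts(latent_histories):
    """Aggregate observed-history frequencies over a collection of latent histories."""
    counts = Counter()
    for latent in latent_histories:
        counts.update(observed_from_latent(latent))
    return counts
```

For instance, the latent history (1, 2, 0, 1) yields the genuine history (1, 0, 0, 1) plus the single-capture ghost (0, 1, 0, 0). Such ghosts create the excess of single-capture histories from which the misidentification rate is estimated.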

The results show that the new model extension performs better than both the standard model (in particular in terms of estimate reliability) and the model excluding single-capture histories (in particular in terms of estimate precision). The authors discuss the reasons for this improvement. Among them is the additional information contained in multiple captures of the same individuals on a given occasion, which the new extension can make full use of, unlike previous models. The simulations also help identify the conditions under which this new extension should prove most useful, namely high capture and identification rates. The study thereby also provides key information on the sampling effort required to plan data collection for efficient parameter estimation under misidentification risk. Even though small population sizes may generate an underestimation bias, the new model is of high interest because it avoids discarding valuable sampling information (e.g. transients).

The authors then discuss perspectives for future developments, in particular:

(i) the need to account for heterogeneity in individual capture rate, which could arise, for example, from spatial distribution but also from individual characteristics (e.g. personality traits);
(ii) the possibility of modelling repeated identical misidentifications leading to the same ‘new’ individuals (ghosts) being seen several times (although this should remain very rare); and
(iii) the generalisation of the model extension to estimate other demographic parameters, more specifically survival.

Further refinements can therefore be expected, which will keep improving this efficient tool for analysing field capture data from remote sampling, especially DNA-based sampling. Such data may contain misidentifications and repeated observations of the same individuals on a given capture occasion.

References

Arnason, A. N. and Schwarz, C. J. (1999). Using POPAN-5 to analyse banding data. Bird Study, 46(sup1), S157–S168. https://doi.org/10.1080/00063659909477242

Senar, J. C., Dhondt, A. A. and Conroy, M. J. (2004). The quantitative study of marked individuals in ecology, evolution and conservation biology: a foreword to the EURING 2003 Conference. Animal Biodiversity and Conservation, 27.1. https://doi.org/10.32800/abc.2004.27.0001

Fraysse, R., Choquet, R. and Pradel, R. (2026). Population size estimation when multiple samples carrying the risk of misidentification are taken within the same capture occasion from the same individual. bioRxiv, ver. 4, peer-reviewed and recommended by PCI Ecology. https://doi.org/10.1101/2024.06.12.598605
