Estimating multiplicity of infection, allele frequencies, and prevalences accounting for incomplete data

Meraj Hashemi
Kristan A. Schneider


Abstract

Molecular surveillance of infectious diseases allows pathogens to be monitored at a finer granularity than traditional epidemiological approaches and is well established for some of the most relevant infectious diseases, such as malaria. The presence of genetically distinct pathogenic variants within an infection, referred to as multiplicity of infection (MOI) or complexity of infection (COI), is common in malaria and similar infectious diseases. MOI is an important metric that scales with transmission intensity, potentially affects clinical pathogenesis, and is a confounding factor when monitoring the frequency and prevalence of pathogenic variants. Several statistical methods exist to estimate MOI and the frequency distribution of pathogen variants. However, a common problem is the quality of the underlying molecular data. If molecular assays do not fail at random, MOI and the prevalence of pathogen variants are likely to be underestimated.

Methods and findings

A statistical model is introduced that explicitly addresses data quality by assuming a probability with which a pathogen variant remains undetected in a molecular assay. This differs from the missing-at-random assumption, under which a molecular assay either performs perfectly or fails completely. The method is applicable to a single molecular marker and allows estimation of allele-frequency spectra, the distribution of MOI, and the probability that variants remain undetected (incomplete information). Based on the statistical model, expressions for the prevalence of pathogen variants are derived, and differences between frequency and prevalence are discussed. The usual desirable asymptotic properties of the maximum-likelihood estimator (MLE) are established by rewriting the model as an exponential family. The MLE has promising finite-sample properties in terms of bias and variance. The covariance matrix of the estimator is close to the Cramér-Rao lower bound (inverse Fisher information). Importantly, the estimator's variance is larger than that of a similar method that disregards incomplete information, but its bias is smaller.
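To make the distinction between frequency and prevalence, and the effect of undetected variants, more concrete, the following is a minimal simulation sketch. It is not the authors' estimator. It assumes (none of this is stated in the abstract) that MOI follows a Poisson distribution conditioned on being at least 1 with parameter `lam`, that the lineages in an infection are drawn independently according to the allele frequencies `freqs`, and that each distinct variant present is missed independently with probability `eps`. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def sample_positive_poisson(lam, rng):
    """Draw an MOI value from Poisson(lam) conditioned on being >= 1."""
    threshold = math.exp(-lam)
    while True:
        # Knuth's multiplication method for one Poisson draw.
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        if k >= 1:  # reject zeros: an infection has at least one lineage
            return k


def simulate_infection(lam, freqs, eps, rng):
    """Return (variants truly present, variants detected) for one infection."""
    moi = sample_positive_poisson(lam, rng)
    lineages = rng.choices(range(len(freqs)), weights=freqs, k=moi)
    present = set(lineages)
    # Assumption: each present variant is missed independently with prob. eps.
    detected = {v for v in present if rng.random() >= eps}
    return present, detected


def prevalence(lam, p):
    """Closed-form prevalence of a variant with frequency p under the
    assumed conditional Poisson model: 1 - P(variant absent)."""
    return (math.exp(lam) - math.exp(lam * (1.0 - p))) / (math.exp(lam) - 1.0)


def run(n=20000, lam=1.5, freqs=(0.6, 0.3, 0.1), eps=0.2, seed=1):
    """Simulate n infections; return true and observed prevalence per variant."""
    rng = random.Random(seed)
    true_hits = [0] * len(freqs)
    obs_hits = [0] * len(freqs)
    for _ in range(n):
        present, detected = simulate_infection(lam, freqs, eps, rng)
        for v in present:
            true_hits[v] += 1
        for v in detected:
            obs_hits[v] += 1
    return [h / n for h in true_hits], [h / n for h in obs_hits]


if __name__ == "__main__":
    true_prev, obs_prev = run()
    for k, p in enumerate((0.6, 0.3, 0.1)):
        print(k, p, round(prevalence(1.5, p), 3),
              round(true_prev[k], 3), round(obs_prev[k], 3))
```

Under these assumptions, prevalence exceeds frequency whenever infections can carry more than one lineage, and the naively observed prevalence falls below the true prevalence by roughly the factor `1 - eps`, which illustrates why ignoring undetected variants leads to underestimation.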

Conclusions

Although the model introduced here has convenient properties, it does not outperform, in terms of mean squared error, a simple standard method that neglects missing information. Thus, the new method is recommended only for data sets in which the molecular assays produced poor-quality results. This will be particularly true if the model is extended to accommodate information from multiple molecular markers at the same time, and incomplete information at one or more markers leads to a strong depletion of sample size.
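The comparison in terms of mean squared error can be read through the standard bias-variance decomposition. For a generic estimator $\hat{\theta}$ of a scalar parameter $\theta$ (generic symbols, not the article's notation):

```latex
\mathrm{MSE}(\hat{\theta})
  = \mathbb{E}\!\left[(\hat{\theta} - \theta)^{2}\right]
  = \operatorname{Var}(\hat{\theta})
  + \left(\mathbb{E}[\hat{\theta}] - \theta\right)^{2}
```

An estimator with smaller bias can therefore still have a larger mean squared error if its variance increases by more than the squared bias decreases, which is consistent with the trade-off reported for the two methods.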

Version published to 10.1371/journal.pone.0287161
Mar 21, 2024
Version published to 10.1101/2023.06.01.543300 on bioRxiv
Jun 2, 2023

