Learning mixed-infection strains from older hosts: a new sampling scheme for malaria epidemiology and population genetics
Abstract
Background
Malaria is characterized by frequent mixed infections and extremely high strain diversity, against which hosts do not acquire sterile immunity. The genetically divergent strains circulating within hosts and across populations pose challenges for efficient disease control. Most molecular epidemiological studies investigate diversity patterns of pathogen isolates from children, usually under the age of five, since children suffer higher rates of mortality and severe symptoms in endemic regions. However, the higher multiplicity of infection (MOI) in children makes it difficult to resolve the underlying strain genetic structure within hosts and populations, which hinders population genetic studies of malaria field samples.
Methods
In this study, we investigated the impact of host age on the inference accuracy of malaria parasite strains and diversity structure, and propose the best age-group sampling strategy for molecular epidemiological studies. By integrating an agent-based malaria transmission model with a state-of-the-art computational tool for resolving mixed infections, we compared the SNP-haplotype inference quality from host samples across different age groups and transmission intensities. Inference quality was assessed using four metrics: the recovery of true haplotypes, the recovery of the strain dictionary, the recovery of shared infections among hosts, and the underlying strain diversity structure estimated with richness estimators. We then validated the inference quality patterns using an empirical dataset from Uganda.
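As an illustration of the fourth metric, the sketch below estimates strain richness from a list of sampled strain labels. The abstract does not name the richness estimators used, so the choice of the bias-corrected Chao1 estimator here is an assumption, and the function name `chao1` and the example labels are hypothetical.

```python
from collections import Counter

def chao1(strain_labels):
    """Bias-corrected Chao1 richness estimate (assumed estimator, not
    necessarily the one used in the study).

    strain_labels: one label per sampled infection, e.g. the identifier
    of the inferred haplotype each infection was assigned to.
    """
    counts = Counter(strain_labels)
    s_obs = len(counts)                              # strains observed at least once
    f1 = sum(1 for c in counts.values() if c == 1)   # strains seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)   # strains seen exactly twice
    # Rare strains (singletons, doubletons) inform how many strains were missed.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical sample: strain A seen 3 times, B twice, C, D and E once each.
sample = ["A", "A", "A", "B", "B", "C", "D", "E"]
estimate = chao1(sample)
```

Comparing such an estimate computed from inferred strains against the same estimate computed from the simulated true strains gives one way to quantify how well a given age group recovers population-level richness.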
Findings
Samples from old hosts (age>15) yield the best inference accuracy across multiple metrics, compared with young children (age<5) and with young children carrying low levels of mixed infection (age<5, MOI≤3). The within-host strain composition is most accurately inferred from old hosts. Regarding recovery of the strain haplotype dictionary, under intermediate transmission intensity 91% of haplotypes inferred from old hosts have at least 0.9 identity to the true strains, while that percentage drops to 47% in young children. Under high transmission intensity, the percentages of inferred strains with at least 0.8 identity to the true strains are 83% and 31% in old hosts and young children, respectively. Similarly, the sharing network of infections among hosts is most accurately inferred from old hosts. More importantly, the strains accurately recovered from old hosts are those that are frequent at the population level. Evaluation of both simulated and empirical datasets indicates that the strain diversity structure inferred from old hosts also yields the best estimates of population-level strain richness.
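The dictionary-recovery percentages above can be read as the share of inferred haplotypes whose best match among the true strains reaches a given identity. The following is a minimal sketch of that calculation, assuming haplotypes are encoded as binary SNP vectors of equal length and that identity means the fraction of matching sites; the function names and the toy data are hypothetical and are not taken from SNP-Slice or from the study's code.

```python
import numpy as np

def best_match_identities(inferred, truth):
    """For each inferred haplotype, return its identity to the closest true strain.

    inferred: array of shape (n_inferred, n_sites), entries 0/1
    truth:    array of shape (n_true, n_sites), entries 0/1
    """
    inferred = np.asarray(inferred)
    truth = np.asarray(truth)
    # Pairwise fraction of matching SNP sites, shape (n_inferred, n_true).
    agree = (inferred[:, None, :] == truth[None, :, :]).mean(axis=2)
    return agree.max(axis=1)

def fraction_recovered(inferred, truth, threshold=0.9):
    """Share of inferred haplotypes with identity >= threshold to some true strain."""
    return float(np.mean(best_match_identities(inferred, truth) >= threshold))

# Toy example with two true strains over ten SNP sites.
truth = [[0, 1, 0, 1, 1, 0, 0, 1, 0, 1],
         [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]]
inferred = [[0, 1, 0, 1, 1, 0, 0, 1, 0, 1],   # exact copy of the first strain
            [0, 1, 1, 0, 0, 0, 1, 0, 1, 0],   # second strain with one site flipped
            [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]]   # chimera of the two strains
share = fraction_recovered(inferred, truth, threshold=0.85)
```

Matching each inferred haplotype to its closest true strain, rather than to a fixed partner, avoids having to align the two dictionaries first; a stricter variant could enforce one-to-one matching.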
Interpretation
Our results show that malaria parasite isolates from old hosts provide the most accurate inference of strain haplotypes, shared infections and population-level strain richness. Therefore, enriching adult sampling in malaria molecular epidemiological field studies helps inform population-level strain composition, infection patterns, and overall diversity, thereby facilitating research on parasite population genetics and improving malaria control. Although young hosts, the traditional focus of sampling, do carry more infections, their high levels of within-host mixed infection make strain inference difficult. Old hosts instead carry fewer strains, but ones with higher fitness, which improves inference accuracy while supporting practical disease control. Our findings suggest that older hosts may become a new focus for field sampling in future malaria epidemiological research.
Funding
This research was partly supported by the Ralph W. and Grace M. Showalter Award and discretionary funds from the Mary J. Elmore New Frontiers Professorship at Purdue University.
Research in context
Evidence before this study
A PubMed search conducted on November 7, 2025, using the terms ((“malaria”[Title/Abstract] OR “falciparum”[Title/Abstract]) AND (“age structure” OR “age group” OR “age pattern” OR “age-specific”) AND (“genetics” OR “var”)), yielded 81 studies published between 1978 and the present day, of which only 9 were published in the past 3 years. Most studies focus on age patterns of disease prevalence, the expression of specific genes or alleles (such as those encoding merozoite surface proteins), and the levels of antibodies. Evidence suggests that severe symptoms and mortality are concentrated in younger hosts, that disease prevalence typically peaks among pre-school or school-age children, and that disease prevalence decreases with age. At the same time, a considerable number of malaria cases still occur among adolescents and adults in regions with low or intermediate transmission intensity. As the burden of malaria shifts towards older hosts in several regions with declining transmission intensity, evidence suggests that older hosts have overall lower levels of mixed infections and carry more distinct parasites. These findings indicate that older hosts could be ideal candidates for learning parasite strain haplotypes, thus facilitating malaria population genetic studies. Nevertheless, only 4 out of the 79 studies characterized malaria parasites at the level of strain haplotypes, and the age patterns of strain genetic structure remain unexplored. A clear understanding of how malaria strain structure relates to host age is still lacking, which obscures the role of older hosts in improving the inference of genetic structure in malaria.
Added value of this study
Recent advances in algorithms that resolve strain haplotypes from mixed infections provide an opportunity to close this research gap by investigating the impact of host age on learning parasite genetics. Our study integrates a state-of-the-art strain haplotype learning algorithm, SNP-Slice, with an agent-based model. We evaluate the effect of host age on the inference quality of parasite SNP-haplotypes from four perspectives: within-host strain inference, strain dictionary across host samples, shared infections between host pairs, and population-level strain richness. Our results suggest that sampling older hosts can facilitate more accurate inference of strain haplotypes and genetic structure in populations under the threat of malaria. This sampling scheme implies potential improvements in estimating endemic transmission intensity and disease burden.
Implications of all the available evidence
First, our study confirms previous findings from empirical studies that older malaria hosts carry fewer but more distinct parasite strains. Second, our study reveals, for the first time, that sampling older hosts in field studies can provide a more accurate inference of parasite genetic structure. As a result, we propose a new sampling scheme for malaria field studies that emphasizes sampling old hosts. This scheme agrees with the viewpoint that the focal host group of malaria control should shift from young children to relatively older hosts. Finally, we propose a new protocol for malaria molecular epidemiology: survey adult hosts and collect their blood samples for genetic sequencing, then apply a haplotype reconstruction algorithm, such as SNP-Slice, to resolve strain haplotypes. With this protocol, strains can be learned with reasonably high accuracy, and there is no need to discard multi-clonal samples. To conclude, our findings will facilitate population genetic studies of malaria, as well as effective drug deployment and disease control.