Improving statistical power in severe malaria genetic association studies by augmenting phenotypic precision

James A Watson
Carolyne M Ndila
Sophie Uyoga
Alexander W Macharia
Gideon Nyutu
Mohammed Shebe
Caroline Ngetsa
Neema Mturi
Norbert Peshu
Benjamin Tsofa
Kirk Rockett
Stije Leopold
Hugh Kingston
Elizabeth C George
Kathryn Maitland
Nicholas PJ Day
Arjen Dondorp
Philip Bejon
Thomas N Williams
Chris C Holmes
Nicholas J White

Curated by eLife

Evaluation Summary:

The fundamental premise of genome-wide association studies for severe malaria is to take a population with confirmed severe malaria and compare it with a control group who do not have severe malaria. This paper presents a novel and valuable method for improving power in severe malaria genetic association studies. The method would also be useful for studies of other diseases where the clinical definition sometimes includes people who do not truly have the disease.

(This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)


Abstract

Severe falciparum malaria has substantially affected human evolution. Genetic association studies of patients with clinically defined severe malaria and matched population controls have helped characterise human genetic susceptibility to severe malaria, but phenotypic imprecision compromises discovered associations. In areas of high malaria transmission the diagnosis of severe malaria in young children and, in particular, the distinction from bacterial sepsis, is imprecise. We developed a probabilistic diagnostic model of severe malaria using platelet and white blood cell count data. Under this model we re-analysed clinical and genetic data from 2,220 Kenyan children with clinically defined severe malaria and 3,940 population controls, adjusting for phenotype mis-labelling. Our model, validated by the distribution of sickle trait, estimated that approximately one third of cases did not have severe malaria. We propose a data-tilting approach for case-control studies with phenotype mis-labelling and show that this reduces false discovery rates and improves statistical power in genome-wide association studies.
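The abstract describes a probabilistic diagnostic model built from platelet and white blood cell counts. The sketch below is a rough illustration of that kind of model and is not the authors' model. It gives each class (severe malaria vs. another cause of illness) a bivariate normal density on log10 counts. Bayes' rule then turns the two densities into a per-child probability of severe malaria. The function names are hypothetical. Choosing lower platelet counts for severe malaria reflects the well-known thrombocytopenia of falciparum malaria. Every numeric value, including the prior `prior_sm`, is an invented placeholder.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, evaluated row-wise over x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row from the mean
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def posterior_severe_malaria(platelets, wbc, prior_sm, sm_params, other_params):
    """Hypothetical helper: P(severe malaria | platelet count, WBC count).

    Counts are modelled on the log10 scale with one bivariate normal per class
    (severe malaria vs. another cause of illness). All parameters are made up.
    """
    x = np.column_stack([np.log10(platelets), np.log10(wbc)])
    log_sm = np.log(prior_sm) + mvn_logpdf(x, *sm_params)
    log_other = np.log1p(-prior_sm) + mvn_logpdf(x, *other_params)
    # Normalise in log space to avoid underflow
    return np.exp(log_sm - np.logaddexp(log_sm, log_other))


# Hypothetical class parameters: (mean of log10 counts, covariance); counts in 10^3/uL
sm_params = (np.array([1.9, 0.95]), np.diag([0.06, 0.04]))
other_params = (np.array([2.4, 1.18]), np.diag([0.06, 0.04]))

# Two hypothetical children: low platelets/WBC vs. high platelets/WBC
p = posterior_severe_malaria(
    np.array([50.0, 400.0]), np.array([8.0, 20.0]), 0.7, sm_params, other_params
)
print(p.round(3))
```

Under these made-up parameters the first child is classed as very likely severe malaria and the second as very likely something else. In practice the class densities would be estimated from reference data rather than fixed by hand.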

eLife
Jun 9, 2021

Reviewer #1 (Public Review):

When an outcome is sometimes misclassified, it can blur the association between a treatment and the outcome and reduce the power of a study of the treatment's effect on that outcome. This is a problem in studies of the effect of genotypes on severe malaria, because the standard clinical definition of severe malaria prioritizes sensitivity over specificity (the cost of failing to treat a child who has severe malaria is much greater than the cost of treating a child who does not). In this study, the authors use routinely available clinical data (platelet count and white blood cell count) to increase the specificity of the definition of severe malaria in studies of the effect of genotypes on severe malaria. The authors then use a data-tilting approach to put more weight on clinically defined severe malaria cases that meet this more specific case definition. The authors show that their approach reduces false discovery rates in an empirical study. The authors also report the interesting finding that approximately one third of clinically defined severe malaria cases in a study of Kenyan children did not have severe malaria.
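Reviewer #1 summarises data tilting as putting more weight on clinically defined cases that meet the more specific case definition. One simple way to express that idea is a weighted logistic regression. Each case is weighted by its probability of truly having severe malaria, and each control gets weight 1. This is offered as a hedged sketch and is not necessarily the authors' estimator. The helper `weighted_logistic`, the genotype counts and the case weights (0.9 and 0.2) are all hypothetical.

```python
import numpy as np


def weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Hypothetical helper: maximise sum_i w_i * loglik_i by Newton-Raphson.

    X: (n, p) design matrix including an intercept column.
    y: 0/1 outcome (1 = clinically defined case).
    w: non-negative observation weights.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - mu))                          # weighted score
        info = X.T @ (X * (w * mu * (1.0 - mu))[:, None])    # weighted information
        step = np.linalg.solve(info, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Made-up counts: (case label, carrier status, count, weight)
groups = [
    (1, 1, 6, 0.9), (1, 0, 194, 0.9),   # cases rated likely true severe malaria
    (1, 1, 15, 0.2), (1, 0, 85, 0.2),   # cases rated likely another illness
    (0, 1, 90, 1.0), (0, 0, 510, 1.0),  # population controls
]
y = np.concatenate([np.full(n, lab) for lab, _, n, _ in groups]).astype(float)
g = np.concatenate([np.full(n, car) for _, car, n, _ in groups]).astype(float)
w = np.concatenate([np.full(n, wt) for _, _, n, wt in groups])
X = np.column_stack([np.ones_like(g), g])

beta_unweighted = weighted_logistic(X, y, np.ones_like(y))
beta_tilted = weighted_logistic(X, y, w)
print(round(float(np.exp(beta_unweighted[1])), 2), round(float(np.exp(beta_tilted[1])), 2))  # 0.43 0.25
```

With a single binary genotype the weighted fit reproduces the weighted 2x2 odds ratio. Down-weighting the likely mislabelled cases therefore moves the estimate further from 1. The sketch does not compute standard errors. A weighted fit needs separate treatment for these, for example a robust sandwich estimator.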

This paper presents a novel and valuable method for improving power in severe malaria genetic association studies that would also be useful for studies of other diseases where the clinical definition lacks high specificity.

Reviewer #2 (Public Review):

The fundamental premise of genome-wide association studies for severe malaria is to take a population with confirmed severe malaria and compare it with a control group who do not have severe malaria. The authors' hypothesis is that in areas with high levels of malaria transmission the severe malaria group is diluted by patients who have been misclassified as having severe malaria (but are ill with something else). This dilution of the severe malaria group in turn dilutes the estimated effect sizes for differences between cases and the control group.
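The dilution argument can be made concrete with a little arithmetic. For illustration only, assume that mislabelled cases carry a risk allele at the same frequency as controls. If a fraction pi of labelled cases truly have severe malaria, the carrier frequency among labelled cases is pi*f1 + (1 - pi)*f0. Here f1 and f0 are the frequencies in true cases and in controls, and the observed odds ratio shrinks towards 1. The true odds ratio of 0.1 and the control carrier frequency of 0.15 are hypothetical. The choice pi = 2/3 echoes the abstract's estimate that about one third of cases did not have severe malaria. `diluted_odds_ratio` is a hypothetical helper.

```python
def diluted_odds_ratio(true_or, control_freq, frac_true_cases):
    """Odds ratio observed when only a fraction of labelled cases are true cases.

    Assumes (hypothetically) that misclassified cases have the same carrier
    frequency as controls.
    """
    control_odds = control_freq / (1 - control_freq)
    true_case_odds = true_or * control_odds
    true_case_freq = true_case_odds / (1 + true_case_odds)
    # Labelled cases are a mixture of true cases and control-like non-cases
    observed_freq = frac_true_cases * true_case_freq + (1 - frac_true_cases) * control_freq
    observed_odds = observed_freq / (1 - observed_freq)
    return observed_odds / control_odds


print(round(diluted_odds_ratio(0.1, 0.15, 2 / 3), 2))  # 0.37
# With no misclassification the true odds ratio is recovered
print(diluted_odds_ratio(0.1, 0.15, 1.0))
```

Under these assumptions a strongly protective effect (OR 0.1) appears as an OR of about 0.37 in the diluted case group. This weaker apparent association is the loss of power the reviewers describe.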

The authors propose a statistical method for correcting for the diluted severe malaria group via an approach of data tilting. The consequences of this adjustment are then followed through to a logical and sensible conclusion, namely that correcting for this dilution can lead to more hits in GWAS and greater effect sizes. I'm not an expert in genetic association studies, but to my untrained eye this portion of the analysis (roughly speaking, Figures 4-6) checks out. Instead, I will focus my attention on the probabilistic diagnostic model (roughly speaking, Figures 1-3).

Something I struggled with was keeping track of the different datasets. To this end, a table summarizing the cohorts with summary statistics such as geographic location, age, symptom severity, and other relevant epidemiological information would be very useful.

My primary concern is the comparability of the training data (Asian adults, Asian children, African children with high PfHRP2) and the testing data (Kenyan children). It's crucial that the model trained on the Asian adult data (highly specific) is valid for application to African children. What I would like to see is a more explicit demonstration that what we observe about severe malaria in Asian adults also applies to Asian children, and in turn to African children. There is evidence for this in Figure 1B and Figure S2, but there are so many different datasets that my tired mind found it difficult to follow.

Figure 1B: does the grey line fitted to the FEAST data also include the PfHRP2 = 1 data? As these values were non-detectable, is this a valid thing to do?

Figure 3. Can you check the panel labels? What's the horizontal dashed line?

Were there significant associations between parasite density and the probability of severe malaria?

Version published to 10.1101/2021.04.16.440107 on bioRxiv
Apr 17, 2021
