Survey-Weighted Bayesian Logistic Regression for Measurement Error in Cross-Sectional Data Health Surveys
Abstract
Background: Low- and middle-income countries rely on cross-sectional health surveys such as the Demographic and Health Surveys (DHS) to estimate population health indicators. These estimates can be compromised by measurement error in explanatory variables and by complex survey designs with unequal sampling probabilities. Standard logistic regression models ignore both issues, resulting in biased parameter estimates and unreliable inference. Aims: This study develops and evaluates a survey-weighted Bayesian logistic regression framework that simultaneously accounts for measurement error and complex survey design in cross-sectional health survey data. Methods: A Bayesian hierarchical model incorporates sampling weights through a pseudo-posterior formulation and treats observed covariates as noisy measurements of latent true variables. Prior distributions are placed on the regression coefficients and the measurement error variance, and posterior inference is carried out with Markov chain Monte Carlo (MCMC) methods. A simulation study compares the proposed model with a standard unweighted model using parameter estimation metrics and predictive performance indicators. Results: In the simulations, the survey-weighted Bayesian model yields lower bias, lower root mean square error, and coverage probabilities closer to the nominal level than the unweighted model. The corrected model also shows better predictive performance, with higher accuracy (0.836 vs. 0.724), precision (0.933 vs. 0.857), recall (0.948 vs. 0.696), F1-score (0.900 vs. 0.796), and AUC (0.844 vs. 0.745). Conclusion: Incorporating survey weights and correcting for measurement error within a Bayesian framework improves both parameter estimation and predictive performance. The proposed model provides a robust methodological approach for analyzing complex cross-sectional health survey data and supports more reliable population-level inference.
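To make the Methods description concrete, the following is a minimal sketch of an unnormalized log pseudo-posterior for a survey-weighted logistic model with a latent covariate. It is not the authors' implementation: the abstract does not specify the model, so everything here is an assumption. That covers the single covariate, the classical additive error model `w_obs = x_true + u`, the normal priors, the convention of rescaling weights to sum to the sample size, and the choice to weight each unit's full contribution. The function name and all parameter names are hypothetical.

```python
import numpy as np

def log_pseudo_posterior(beta, x_true, log_sigma_u, y, w_obs, weights,
                         prior_sd_beta=10.0, prior_sd_x=1.0):
    """Unnormalized log pseudo-posterior (illustrative assumptions only).

    beta        : array of shape (2,), intercept and slope
    x_true      : latent true covariate values, shape (n,)
    log_sigma_u : log of the measurement error standard deviation
    y           : binary outcomes (0/1), shape (n,)
    w_obs       : error-prone observed covariate, shape (n,)
    weights     : survey sampling weights, shape (n,)
    """
    sigma_u = np.exp(log_sigma_u)
    n = len(y)

    # Assumed convention: rescale weights so they sum to the sample size.
    w_norm = weights * n / np.sum(weights)

    # Outcome model: Bernoulli log-likelihood with a logit link,
    # log p(y | x) = y * eta - log(1 + exp(eta)).
    eta = beta[0] + beta[1] * x_true
    loglik_y = y * eta - np.logaddexp(0.0, eta)

    # Measurement model (assumed classical error):
    # w_obs | x_true ~ Normal(x_true, sigma_u^2).
    loglik_w = (-0.5 * ((w_obs - x_true) / sigma_u) ** 2
                - np.log(sigma_u) - 0.5 * np.log(2.0 * np.pi))

    # Exposure model (assumed): x_true ~ Normal(0, prior_sd_x^2).
    logp_x = (-0.5 * (x_true / prior_sd_x) ** 2
              - np.log(prior_sd_x) - 0.5 * np.log(2.0 * np.pi))

    # Pseudo-likelihood: each unit's contribution is raised to its weight,
    # which becomes a weighted sum on the log scale.
    weighted_sum = np.sum(w_norm * (loglik_y + loglik_w + logp_x))

    # Assumed priors: Normal(0, prior_sd_beta^2) on the coefficients and a
    # standard normal on log_sigma_u (both up to additive constants).
    log_prior = -0.5 * np.sum((beta / prior_sd_beta) ** 2) - 0.5 * log_sigma_u ** 2

    return weighted_sum + log_prior
```

A function of this form could serve as the target density for a generic MCMC sampler. Because the weights are rescaled inside the function, multiplying all weights by a constant leaves the value unchanged.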