Dealing with differential misclassification of an outcome or a covariate in association studies with an internally validated sample. Application to the use of a serological test for the diagnosis of SARS-CoV-2 infection
Abstract
Background: To present an analytical framework for correcting misclassification when an imperfect test is used as an indicator of a disease in association studies, when part of the sample has both the test result and the true disease status.

Methods: We explored two scenarios, depending on whether the disease is a covariate or the outcome. The analysis sample includes an internal validation sample in which the disease status is known in addition to the test result. We used joint likelihood models that account for classification errors and for the possibly non-random selection of the validation sample. Simulations were performed to evaluate the methods. We illustrated the framework using data from a multi-cohort COVID-19 serological study conducted in France between 2020 and 2021, with serology as the imperfect test and SARS-CoV-2 infection as the disease. The dataset included concomitant measurements of the serological test and the SARS-CoV-2 infection status in 7% of participants. We estimated (1) the association between incident persistent symptoms (outcome) and SARS-CoV-2 infection (covariate), and (2) the association between infection (outcome) and several covariates. For comparison, we also estimated naive models that used serology without correction, and models based solely on the validation sample.

Results: Simulations confirmed that the methods correct for misclassification and for non-random selection of the validation sample. In the application, the estimated sensitivity and specificity of the serological test with respect to SARS-CoV-2 infection were 86.2% to 87.7% and 95.8% to 97.5%, respectively. With SARS-CoV-2 infection as a covariate, the corrected analysis showed a significant association between infection and persistent symptoms, whereas the naive and validation-sample-only analyses did not. With SARS-CoV-2 infection as the outcome, the corrected analysis confirmed the associations of infection with age, gender and active smoking, but did not reproduce the associations with living with at least one child at home and with former smoking that the naive analysis had identified.

Conclusion: This methodological framework can be applied in association studies that use an imperfect test as an indicator of a disease when the disease status has been validated in a subset of the sample. We extended previous work to handle non-random selection of this validated subsample.

Registration: NCT04392388
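The abstract does not give the model equations. The following minimal sketch illustrates the general idea of a joint likelihood with an internal validation sample for the case where the disease is the outcome. It makes these simplifying assumptions: a binary disease D (infection), a binary test T (serology), a single binary covariate X, and selection into the validation sample that depends only on the observed T and X (missing at random). A validated record contributes P(T=t | D=d, X) P(D=d | X). A non-validated record contributes the sum over d of the same product. Sensitivity and specificity are allowed to differ between the strata of X, which is one form of differential misclassification. Within each stratum the likelihood is maximised with an EM algorithm. The helper names (`expected_counts`, `em_stratum`), the parameter values and the sample sizes (700 validated records out of 10,000, roughly the 7% in the study) are hypothetical. This is not the authors' implementation, and their treatment of non-random selection may differ.

```python
import math


def expected_counts(pi, se, sp, n_val, n_nonval):
    """Hypothetical helper: noise-free counts implied by (pi, se, sp)."""
    p_td = {
        (1, 1): pi * se, (0, 1): pi * (1 - se),
        (1, 0): (1 - pi) * (1 - sp), (0, 0): (1 - pi) * sp,
    }
    val = {k: n_val * p for k, p in p_td.items()}   # (T, D) both observed
    q = p_td[(1, 1)] + p_td[(1, 0)]                  # P(T = 1)
    nonval = {1: n_nonval * q, 0: n_nonval * (1 - q)}  # only T observed
    return val, nonval


def em_stratum(val, nonval, max_iter=10_000, tol=1e-12):
    """EM for (pi, se, sp) within one covariate stratum (hypothetical sketch).

    val[(t, d)]: validated records with test t and disease d.
    nonval[t]:  non-validated records with test t (disease unknown).
    """
    pi, se, sp = 0.5, 0.8, 0.8  # arbitrary starting values
    for _ in range(max_iter):
        # E-step: P(D = 1 | T = t) under the current parameters
        post = {}
        for t in (0, 1):
            a = pi * (se if t else 1 - se)
            b = (1 - pi) * ((1 - sp) if t else sp)
            post[t] = a / (a + b)
        # Expected complete-data counts: validated + imputed non-validated
        e = {(t, d): float(val.get((t, d), 0)) for t in (0, 1) for d in (0, 1)}
        for t in (0, 1):
            m = nonval.get(t, 0)
            e[(t, 1)] += m * post[t]
            e[(t, 0)] += m * (1 - post[t])
        # M-step: closed-form proportions
        n_d1 = e[(1, 1)] + e[(0, 1)]
        n_d0 = e[(1, 0)] + e[(0, 0)]
        new = (n_d1 / (n_d1 + n_d0), e[(1, 1)] / n_d1, e[(0, 0)] / n_d0)
        delta = max(abs(x - y) for x, y in zip(new, (pi, se, sp)))
        pi, se, sp = new
        if delta < tol:
            break
    return pi, se, sp


def logit(p):
    return math.log(p / (1 - p))


# Illustrative truth: prevalence, Se and Sp differ by stratum of X
truth = {0: (0.20, 0.87, 0.96), 1: (0.30, 0.80, 0.98)}
fits, naive = {}, {}
for x, (pi, se, sp) in truth.items():
    val, nonval = expected_counts(pi, se, sp, n_val=700, n_nonval=9300)
    fits[x] = em_stratum(val, nonval)
    # Naive analysis: treat test positivity as disease status
    positives = nonval[1] + val[(1, 1)] + val[(1, 0)]
    naive[x] = positives / (700 + 9300)

log_or_true = logit(0.30) - logit(0.20)
log_or_em = logit(fits[1][0]) - logit(fits[0][0])
log_or_naive = logit(naive[1]) - logit(naive[0])
print(f"true {log_or_true:.4f}  corrected {log_or_em:.4f}  naive {log_or_naive:.4f}")
```

With noise-free counts, the EM estimates recover the generating prevalence, sensitivity and specificity in each stratum. In this example the naive log odds ratio, computed from test positivity, is biased toward zero. Counts are used instead of individual records only to keep the sketch short. With more covariates, P(D=1 | X) would typically be a logistic model fitted with the same E-step weights.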