Hybrid Statistical Learning for COVID-19 Testing Operations: A Multitask Modeling of Positivity, Viral Load and Turnaround Time
Abstract
Timely and reliable COVID-19 testing requires more than accurate diagnostics: laboratories must triage infection risk, quantify viral burden among positives, and de-bottleneck turnaround times (TATs). Using a de-identified cohort of 15,524 PCR tests from a large pediatric health system in 2020, we develop a hybrid pipeline that jointly models (i) test positivity, (ii) cycle-threshold (Ct) among positives, and (iii) operational TAT components (collection$\to$receipt and receipt$\to$verification). Methodologically, we combine interpretable generalized additive models with elastic-net GLMs and gradient-boosted trees for classification; semiparametric regression and quantile methods (including quantile forests) for Ct; and accelerated failure-time and quantile regression for TAT. We embed time-aware validation, calibration, explainability, and fairness audits, and estimate policy effects of drive-through collection using orthogonalized Double Machine Learning with causal forests for heterogeneity. On rolling time splits, the stacked classifier attains AUROC $\approx 0.80$ with good low-range calibration; age and pandemic day are the dominant nonlinear drivers of positivity, Ct shows pronounced heterogeneity across settings, and TATs exhibit heavy-tailed, persistent variability. While naive average treatment effect (ATE) estimates for drive-through collection are fragile under missing data and limited overlap, subgroup causal forests highlight operational segments with plausible gains. The result is a deployment-ready blueprint that integrates predictive performance with governance (calibration, shift robustness, fairness) for laboratory operations.
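To make the time-aware validation mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes rows are already sorted by collection time and uses an expanding-window variant of rolling splits; the function names `rolling_time_splits` and `auroc`, and the parameters `n_folds` and `min_train`, are hypothetical and chosen only for illustration.

```python
from typing import Iterator, Sequence, Tuple


def rolling_time_splits(
    n: int, n_folds: int, min_train: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges over n time-ordered rows.

    Assumption: an expanding training window. Each fold trains on every
    row before its test block, so no future row leaks into training.
    """
    block = (n - min_train) // n_folds
    if block < 1:
        raise ValueError("not enough rows for the requested folds")
    for k in range(n_folds):
        start = min_train + k * block
        # The last fold absorbs any leftover rows.
        stop = n if k == n_folds - 1 else start + block
        yield range(0, start), range(start, stop)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a positive outranks a negative.

    Ties count as one half. This pairwise form is quadratic in the
    sample size and is meant for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In use, a model would be refit on each `train` range and scored on the matching `test` range, with `auroc` computed per fold, so that a drift in performance over pandemic time stays visible rather than being averaged away by a random shuffle.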