Hybrid Statistical Learning for COVID-19 Testing Operations: A Multitask Modeling of Positivity, Viral Load and Turnaround Time

Abstract

Timely and reliable COVID-19 testing requires more than accurate diagnostics: laboratories must triage infection risk, quantify viral burden among positives, and de-bottleneck turnaround times (TATs). Using a de-identified cohort of 15,524 PCR tests from a large pediatric health system in 2020, we develop a hybrid pipeline that jointly models (i) test positivity, (ii) cycle-threshold (Ct) among positives, and (iii) operational TAT components (collection-to-receipt and receipt-to-verification). Methodologically, we combine interpretable generalized additive models with elastic-net GLMs and gradient-boosted trees for classification; semiparametric regression and quantile methods (including quantile forests) for Ct; and accelerated failure-time and quantile regression for TAT. We embed time-aware validation, calibration, explainability, and fairness audits, and estimate policy effects of drive-through collection using orthogonalized Double Machine Learning with causal forests for heterogeneity. On rolling time splits, the stacked classifier attains AUROC ≈ 0.80 with good low-range calibration; age and pandemic day are the dominant nonlinear drivers of positivity, Ct shows pronounced heterogeneity across settings, and TATs exhibit heavy-tailed, persistent variability. While naive ATE estimates for drive-through are fragile under missingness/overlap, subgroup causal forests highlight operational segments with plausible gains. The result is a deployment-ready blueprint that integrates predictive performance with governance (calibration, shift robustness, fairness) for laboratory operations.
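
The abstract reports AUROC on rolling time splits but does not spell out the procedure. As a minimal sketch of what such time-aware validation could look like, the Python below yields expanding-window train/test index ranges for rows assumed to be sorted by collection time, and computes a rank-based AUROC. The function names `rolling_time_splits` and `auroc`, the fold layout, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from typing import Iterator, Sequence, Tuple


def rolling_time_splits(
    n: int, n_folds: int, min_train: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges for n rows already sorted by time.

    Each fold trains on every row before the cut and tests on the next
    block, so no later observation leaks into training (assumed layout).
    """
    if n_folds < 1 or min_train < 1 or min_train >= n:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n")
    block = (n - min_train) // n_folds
    if block < 1:
        raise ValueError("not enough rows for the requested folds")
    for k in range(n_folds):
        cut = min_train + k * block
        # The last fold absorbs any remainder rows.
        end = n if k == n_folds - 1 else cut + block
        yield range(0, cut), range(cut, end)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC: the fraction of (positive, negative) pairs in
    which the positive has the higher score, with ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy illustration only: 10 time-ordered rows, 3 folds.
    for train, test in rolling_time_splits(10, n_folds=3, min_train=4):
        print(list(train), list(test))
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice a classifier would be refit on each training range and scored on the following test range, and the per-fold AUROC values would then be summarized; the pairwise loop here is quadratic and intended for clarity rather than speed.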
