A Hybrid Pharmacovigilance Method for National-Scale Comorbidity Discovery: Association Rules with FDA-Approved PRR/Chi-square and EBGM Validation.
Abstract
Background: Clinicians need scalable, statistically rigorous maps of disease–disease co-occurrence to support screening, safer prescribing, and differential diagnosis. Spontaneous-report data such as FAERS are national in scope but prone to bias, so discovery requires conservative, interpretable validation.

Methods: We developed a three-stage pipeline: (1) Association Rule Mining for candidate triage; (2) disproportionality testing with PRR and χ² (criteria: PRR ≥ 2.0, χ² ≥ 4.0, p < 0.05, a ≥ 3); and (3) Empirical-Bayes shrinkage via EBGM (low-count rule: if a < 10 and EB05 < 2, reject). The pipeline was applied to FAERS (n = 393,130; 50 index conditions) and externally evaluated with a lab-test corpus (n = 2,461).

Results: The workflow yielded 25,083 validated condition–condition associations after ≈ 99% overall rejection of naïve pairs (≈ 85% specificity among retained links). Statistical strength was high: 97.1% of associations met p < 0.001. Risk tiering showed 43.5% of links in a High-Risk band (PRR ≥ 10), with clinically coherent hubs (e.g., hypertension, rheumatoid arthritis, Crohn’s disease, psoriasis). Internal five-fold analysis indicated strong reproducibility, and cross-corpus comparison supported generalizability, with directionally consistent effects and high rejection concordance for spurious pairs. End-to-end performance supports real-time use (< 50 ms for cached retrieval; ~2–3 s on-demand).

Conclusions: A staged, FDA-aligned pipeline (Association Rules → PRR/χ² → EBGM) converts spontaneous reports into a defensible, reproducible comorbidity network at national scale. The approach reduces false positives without sacrificing sensitivity, aligns with familiar pharmacovigilance statistics, and is production-ready for deployment in clinical decision support (medocsecondopinion.com) while providing a clear queue of novel, testable hypotheses for follow-up in longitudinal data.
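To make the acceptance criteria in the Methods concrete, the following is a minimal Python sketch of the second and third gates, not the authors' implementation. It assumes the usual 2×2 contingency layout, where a is the number of reports containing both conditions, and it assumes an uncorrected Pearson χ² with one degree of freedom, since the abstract does not say whether a continuity correction is applied. The names `Table2x2`, `passes_disproportionality`, and `passes_low_count_rule` are hypothetical, and EB05 is taken as a precomputed input because the abstract does not describe how the Empirical-Bayes prior is fitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Hypothetical 2x2 layout for an (index condition, candidate condition) pair."""
    a: int  # reports with both the index and the candidate condition
    b: int  # reports with the index condition only
    c: int  # reports with the candidate condition only
    d: int  # reports with neither condition


def prr(t: Table2x2) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    if t.a + t.b == 0 or t.c + t.d == 0:
        return 0.0  # undefined row; treat as no signal
    if t.c == 0:
        return math.inf if t.a > 0 else 0.0
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def chi_square(t: Table2x2) -> float:
    """Pearson chi-square for a 2x2 table, no continuity correction (assumption)."""
    n = t.a + t.b + t.c + t.d
    den = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    if den == 0:
        return 0.0
    return n * (t.a * t.d - t.b * t.c) ** 2 / den


def p_value_1df(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_disproportionality(t: Table2x2) -> bool:
    """Stage 2 gate as stated: PRR >= 2.0, chi2 >= 4.0, p < 0.05, a >= 3."""
    x2 = chi_square(t)
    return t.a >= 3 and prr(t) >= 2.0 and x2 >= 4.0 and p_value_1df(x2) < 0.05


def passes_low_count_rule(a: int, eb05: float) -> bool:
    """Stage 3 gate as stated: reject when a < 10 and EB05 < 2."""
    return not (a < 10 and eb05 < 2.0)


# Illustrative counts only; these are not taken from the study.
example = Table2x2(a=20, b=80, c=100, d=9800)
print(round(prr(example), 2), passes_disproportionality(example))
```

Under this reading, a pair with many co-reports (a ≥ 10) is never removed by the low-count rule regardless of EB05; whether the full pipeline applies further EBGM thresholds is not stated in the abstract.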