Machine learning detects hidden treatment response patterns only in the presence of comprehensive clinical phenotyping
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Inferential statistics traditionally used in clinical trials can miss relationships between clinical phenotypes and treatment responses. We simulated a randomised clinical trial to explore how gradient boosting (XGBoost) machine learning (ML) compares with traditional analysis when ‘ground truth’ treatment responsiveness depends on the interaction of multiple phenotypic variables. As expected, traditional analysis detected a significant treatment benefit (outcome measure change from baseline = 4.23; 95% CI 3.64–4.82). However, recommending treatment based upon this evidence would lead to 56.3% of patients failing to respond. In contrast, ML correctly predicted treatment response in 97.8% (95% CI 96.6– 99.1) of patients, with model interrogation showing the critical phenotypic variables and the values determining treatment response had been identified. Importantly, when a single variable was omitted, accuracy dropped to 69.4% (95% CI 65.3–73.4). This proof of principle underscores the significant potential of ML to maximise the insights derived from clinical research studies. However, the effectiveness of ML in this context is highly dependent on the comprehensive capture of phenotypic data.