Neural network-based identification of easily-obtainable demographic and clinical characteristics to identify people with tuberculosis
Abstract
We consider the application of machine learning to the classification of tuberculosis (TB) based on clinical and demographic data. Such data are routinely collected from people who present with cough at community-level care centres. Automatic classification based on these data could therefore identify the people who require expensive but critical confirmatory testing, offering a simple and low-cost method of triage. Logistic regression, XGBoost and convolutional neural network (CNN) classifiers are evaluated using fully nested cross-validation, with and without feature selection. Although the application of CNNs to clinical and demographic data is unconventional, we show it to be effective. Experiments are carried out on two datasets: cough diagnostic algorithm for TB (CODA TB; n = 1140) and cough audio triage for TB (CAGE-TB; n = 463). In both datasets, all participants self-presented to healthcare facilities with symptoms or risk factors suggestive of TB. Using the CNN, areas under the receiver operating characteristic curve (AUROC) of 80.48% and 83.06% are achieved on the two datasets, respectively. Furthermore, performance is shown to improve both when the set of clinical features is extended and when the number of people in the dataset increases. This holds promise for the development of an automated TB triage tool, implemented on a low-cost mobile device such as a smartphone, that is suitable for use at primary health-care facilities.
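The key point of fully nested cross-validation is that feature selection and hyperparameter tuning happen only on the training portion of each outer fold, so the outer AUROC is not optimistically biased. A minimal sketch of that evaluation loop follows, written in standard-library Python under stated assumptions. The feature selector (ranking features by absolute Pearson correlation with the label), the tuned hyperparameter (the number of kept features `k`), the fold counts (5 outer, 4 inner), the gradient-descent logistic regression standing in for the logistic regression, XGBoost and CNN classifiers, and the synthetic data are all hypothetical illustrations, not the authors' method or data. AUROC is computed as the Mann-Whitney probability that a positive case scores above a negative one.

```python
import math
import random

def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed):
    """Split positions 0..n-1 into k folds with roughly equal class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, t in enumerate(labels) if t == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def rank_features(X, y):
    """Hypothetical selector: order feature indices by |Pearson r| with the label."""
    n, my = len(y), sum(y) / len(y)
    vy = sum((t - my) ** 2 for t in y)
    strength = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mx = sum(col) / n
        vx = sum((a - mx) ** 2 for a in col)
        cov = sum((a - mx) * (t - my) for a, t in zip(col, y))
        strength.append(abs(cov) / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0)
    return sorted(range(len(strength)), key=lambda j: -strength[j])

def fit_logreg(X, y, epochs=100, lr=0.5):
    """Plain batch-gradient-descent logistic regression (stand-in classifier)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(y)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for row, t in zip(X, y):
            # Clamp the logit so math.exp cannot overflow.
            z = max(-30.0, min(30.0, b + sum(wj * xj for wj, xj in zip(w, row))))
            err = 1.0 / (1.0 + math.exp(-z)) - t
            gb += err
            for j, xj in enumerate(row):
                gw[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def fit_and_score(X, y, train, test, k):
    """Select k features and fit on `train` only, then return AUROC on `test`."""
    feats = rank_features([X[i] for i in train], [y[i] for i in train])[:k]
    sub = lambda i: [X[i][j] for j in feats]
    w, b = fit_logreg([sub(i) for i in train], [y[i] for i in train])
    scores = [b + sum(wj * xj for wj, xj in zip(w, sub(i))) for i in test]
    return auroc(scores, [y[i] for i in test])

def nested_cv(X, y, k_grid, outer_k=5, inner_k=4, seed=0):
    """Outer folds estimate performance; inner folds (outer-training data only) pick k."""
    outer_scores = []
    for o, test in enumerate(stratified_folds(y, outer_k, seed)):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        inner = stratified_folds([y[i] for i in train], inner_k, seed + 1 + o)

        def inner_mean(k):
            aucs = []
            for fold in inner:
                val = [train[p] for p in fold]  # map positions back to dataset indices
                val_set = set(val)
                fit = [i for i in train if i not in val_set]
                aucs.append(fit_and_score(X, y, fit, val, k))
            return sum(aucs) / len(aucs)

        best_k = max(k_grid, key=inner_mean)  # chosen without touching the outer test fold
        outer_scores.append(fit_and_score(X, y, train, test, best_k))
    return sum(outer_scores) / len(outer_scores), outer_scores

def make_synthetic(n, seed):
    """Synthetic stand-in data: 3 informative + 5 noise features (not CODA TB or CAGE-TB)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        t = int(rng.random() < 0.4)
        X.append([rng.gauss(0.8 * t, 1.0) for _ in range(3)] + [rng.gauss(0.0, 1.0) for _ in range(5)])
        y.append(t)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic(150, seed=1)
    mean_auc, per_fold = nested_cv(X, y, k_grid=[1, 3, 8])
    print(f"nested-CV AUROC: {mean_auc:.3f}", [round(a, 3) for a in per_fold])
```

The same structure can be expressed with scikit-learn by wrapping a `Pipeline` (selector plus classifier) in `GridSearchCV` and passing that to `cross_val_score` with `scoring="roc_auc"`. Whichever tooling is used, any step fitted to data, including feature scaling, belongs inside the inner loop.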