Predicting pyrazinamide resistance in Mycobacterium tuberculosis using a graph convolutional network
Abstract
Background
Pyrazinamide is an important first-line antibiotic for treating tuberculosis, and resistance to it is primarily caused by mutations in the pncA gene. Traditional machine learning models have predicted pyrazinamide resistance with some success, but they are limited in their ability to incorporate three-dimensional protein structural information. Graph neural networks offer the potential to integrate protein structure and residue-level features to better predict the impact of mutations on drug resistance.
Results
We trained a graph convolutional network model on PncA variants containing missense mutations and evaluated its ability to classify resistance to pyrazinamide. Each PncA variant was represented as an amino acid-level graph, with edges based on 3D spatial proximity and node features derived from chemical properties and mutation meta-predictors. The protein graphs were built from structures of the PncA variants predicted with AlphaFold2. The predicted structures of resistant PncA variants deviated more from the wild-type structure than those of susceptible variants. Our model achieved an F1 score of 81.6 %, sensitivity of 81.6 % and specificity of 80.4 % on the test set, and either matched or exceeded the performance of a published set of traditional machine learning models. We show that both structural graph connectivity and node features contribute significantly to model performance.
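To make the graph representation concrete, the sketch below builds a residue contact graph from 3D coordinates and runs one graph convolution followed by mean pooling and a logistic output. The abstract does not state the distance cutoff, which atoms define residue positions, the exact GCN variant, or the pooling scheme, so this is a minimal illustration under explicit assumptions: one coordinate per residue (for example a Cα atom from an AlphaFold2 model), a hypothetical 8 Å cutoff, the Kipf and Welling symmetric-normalized propagation rule, and mean pooling. All function names, features, and weights here are hypothetical and are not the authors' implementation.

```python
import math


def contact_edges(coords, cutoff=8.0):
    """Undirected edges (i, j) between residues within `cutoff` angstroms.

    Assumption: one 3D point per residue (e.g. C-alpha); the cutoff is hypothetical.
    """
    n = len(coords)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def normalized_adjacency(n, edges):
    """Kipf and Welling normalization: D^-1/2 (A + I) D^-1/2."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loops so each residue keeps its own features
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def predict_resistance(coords, features, w1, w_out, cutoff=8.0):
    """Graph-level score: GCN layer, mean-pool over residues, logistic output.

    `features` holds one row of hypothetical per-residue descriptors per residue.
    """
    n = len(coords)
    a_hat = normalized_adjacency(n, contact_edges(coords, cutoff))
    h = gcn_layer(a_hat, features, w1)
    pooled = [sum(col) / n for col in zip(*h)]  # mean over residues
    logit = sum(p * w for p, w in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-logit))


# Usage: four residues on a line, 3.8 A apart (typical C-alpha spacing).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
features = [[0.2, 0.0], [0.9, 1.0], [0.4, 0.0], [0.1, 0.0]]  # e.g. [hydrophobicity, mutated?]
w1 = [[0.5, -0.2], [0.1, 0.4]]  # untrained, hypothetical weights
w_out = [1.0, -1.0]
p = predict_resistance(coords, features, w1, w_out)
```

In this toy chain, residues 0 and 3 are 11.4 Å apart and are not connected, so information reaches residue 3 from residue 0 only through stacked layers. This is how spatial proximity, rather than sequence position alone, shapes which residues influence each other. A trained model would learn `w1` and `w_out` from labelled variants rather than using fixed values.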
Conclusions
Our study demonstrates that graph-based deep learning can effectively leverage protein structure and biochemical features to accurately predict antimicrobial resistance, in spite of being trained on a small dataset with little variation. We present this as a proof-of-concept for applying these methods to resistance phenotype prediction. Our approach has the potential to be extended to modelling more complex patterns and mechanisms of resistance, particularly in more genetically diverse pathogens.