Efficient Drug Discovery with LSTM-Based Models: Insights from SARS-CoV-2 Variants
Abstract
The rapid evolution of SARS-CoV-2 variants has underscored the need for accelerated drug discovery methods. This study demonstrates the use of recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) units to generate novel pharmaceutical compounds capable of inhibiting SARS-CoV-2 through protein binding, using the Alpha, Beta, Gamma, and Delta variants as reference points. Three LSTM-based RNN models were trained on a dataset of 2,572,812 preprocessed SMILES (Simplified Molecular-Input Line-Entry System) sequences from the ChEMBL 29 and MOSES databases and then fine-tuned against these variants. The models, which differ in their dropout regularization parameters, were evaluated on the validity, originality, and uniqueness of the generated molecules, and their performance was further assessed via simulated protein binding affinity scores computed with PyRx. Model 3, with the lowest dropout values (0.2 and 0.4), outperformed the others, achieving 98.0% validity, 94.1% originality, and 97.9% uniqueness, and generating molecules with high simulated binding affinities (e.g., -17.40 kcal/mol). These findings highlight the efficacy of LSTM-RNNs in automating and optimizing drug discovery, potentially offering a scalable, efficient alternative to traditional methods. Further laboratory validation is recommended to translate these computational results into practical therapeutic applications.
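The abstract reports validity, originality, and uniqueness without defining them. The sketch below shows one common way such rates are computed for a batch of generated SMILES strings. It is an illustration under stated assumptions, not the study's evaluation code. The definitions used here (validity over all generated strings, uniqueness over valid strings, originality over unique valid strings not seen in training) are assumed, and the function names `crude_is_valid` and `generation_metrics` are hypothetical. The validity check is a deliberately crude stand-in that only tests bracket balance. A real pipeline would more likely parse each string with a cheminformatics toolkit, for example by checking that RDKit's `Chem.MolFromSmiles` does not return `None`, and would canonicalize SMILES before comparing them.

```python
from typing import Callable, Dict, Iterable


def crude_is_valid(smiles: str) -> bool:
    """Stand-in validity check (an assumption, NOT real chemistry).

    Only verifies that the string is non-empty and that round and square
    brackets are balanced. Swap in a toolkit-based parser for real use.
    """
    if not smiles:
        return False
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool] = crude_is_valid,
) -> Dict[str, float]:
    """Compute validity, uniqueness, and originality as fractions in [0, 1].

    Assumed definitions:
      validity    = valid strings / all generated strings
      uniqueness  = distinct valid strings / valid strings
      originality = distinct valid strings absent from training
                    / distinct valid strings
    Strings are compared literally, so two different SMILES spellings of
    the same molecule count as distinct unless canonicalized beforehand.
    """
    generated = list(generated)
    training_set = set(training)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - training_set
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "originality": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    # Toy inputs for illustration only; not data from the study.
    sample = ["CCO", "CCO", "C(C", "c1ccccc1", "CC(=O)O"]
    seen_in_training = ["CCO"]
    for name, value in generation_metrics(sample, seen_in_training).items():
        print(f"{name}: {value:.1%}")
```

Passing the validity check in as a parameter keeps the metric arithmetic separate from the chemistry, so the same function can be reused once a proper SMILES parser is available.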