Parameter-Efficient Adaptation of Large Language Models for Drug-Target Affinity Modeling in Drug Discovery
Abstract
Accurate prediction of protein–ligand binding affinity is crucial for selecting promising hit compounds with a higher likelihood of target engagement in drug discovery and development. This study presents a novel approach that uses parameter-efficient fine-tuning techniques to adapt large language models (LLMs) for multi-class binding affinity classification, leveraging the growing adoption of LLMs in biomedical and drug discovery research. We fine-tuned the LLaMA 3.2-1B base model on an optimized BindingDB dataset containing 409,715 protein–ligand pairs, classifying binding affinities into three categories: very high affinity, moderate affinity, and low affinity. We compared two parameter-efficient fine-tuning methods: Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA). Our results demonstrate that QLoRA achieves competitive performance while reducing memory requirements by approximately 75%, making it feasible to fine-tune large language models on limited computational resources. The QLoRA model achieved an F1-macro score of 0.7257 and an F1-micro score of 0.7256, demonstrating the potential of memory-efficient approaches for applying LLMs to identify candidate compounds in the hit-identification phase of drug discovery. This study can serve as a gateway for fine-tuning LLMs on domain-specific bioscience datasets, enabling the development of more accurate, customized models for biological and biomedical applications.