DrugTar improves druggability prediction by integrating large language models and gene ontologies

Niloofar Borhani
Iman Izadi
Ali Motahharynia
Mahsa Sheikholeslami
Yousof Gheisari

Abstract

Motivation

Target discovery is crucial in drug development, especially for complex chronic diseases. Recent advances in high-throughput technologies and the explosion of biomedical data have highlighted the potential of computational druggability prediction methods. However, most current methods rely on sequence-based features with machine learning, which often face challenges related to hand-crafted features, reproducibility, and accessibility. Moreover, the potential of raw sequence and protein structure has not been fully investigated.

Results

Here, we leveraged both protein sequence and structure using deep learning techniques, revealing that protein sequence, especially in the form of pre-trained embeddings, is more informative than protein structure. Next, we developed DrugTar, a high-performance deep learning algorithm that integrates sequence embeddings from the ESM-2 pre-trained protein language model with gene ontologies to predict druggability. DrugTar achieved values of 0.94 for both the area under the receiver operating characteristic curve (AUC) and the area under the precision–recall curve (AUPRC), outperforming state-of-the-art methods. In conclusion, DrugTar streamlines target discovery, a bottleneck in developing novel therapeutics.
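
To make the idea of combining sequence embeddings with gene ontology annotations concrete, the following is a minimal sketch in plain Python. It is not DrugTar's implementation: the abstract does not state how the two feature types are pooled or merged, so mean pooling, multi-hot encoding, and concatenation are assumptions here, and a single logistic unit stands in for the deep network. The toy embedding values, the three-term GO vocabulary, the weights, and all function names are hypothetical and chosen only for illustration.

```python
import math


def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length protein vector.

    Assumption: mean pooling is one common way to summarise language-model
    embeddings; the abstract does not say which pooling DrugTar uses.
    """
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]


def multi_hot(go_terms, vocabulary):
    """Encode a protein's GO annotations as a 0/1 vector over a fixed vocabulary."""
    present = set(go_terms)
    return [1.0 if term in present else 0.0 for term in vocabulary]


def fuse(sequence_vector, go_vector):
    """Concatenate the two feature vectors (an assumed fusion strategy)."""
    return sequence_vector + go_vector


def predict_score(features, weights, bias):
    """Logistic unit standing in for a trained deep network (illustrative only)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy stand-in for per-residue embeddings: 2 residues, 3 dimensions each.
# Real protein language model embeddings have far more dimensions.
residues = [[0.2, -0.1, 0.4], [0.6, 0.3, 0.0]]

# Hypothetical three-term GO vocabulary and one protein's annotations.
go_vocab = ["GO:0004672", "GO:0005886", "GO:0006468"]
protein_go = ["GO:0004672", "GO:0006468"]

features = fuse(mean_pool(residues), multi_hot(protein_go, go_vocab))

# Made-up weights; a real model would learn these from labelled targets.
weights = [0.5, -0.2, 0.1, 0.8, 0.0, 0.3]
score = predict_score(features, weights, bias=-0.5)
print(f"feature length: {len(features)}, druggability score: {score:.3f}")
```

The point of the sketch is only the shape of the data flow: a variable-length sequence becomes a fixed-length vector, the ontology annotations become a second fixed-length vector, and a classifier sees both at once.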

Availability and implementation

DrugTar is available as a web server at www.DrugTar.com. The data and source code are at https://github.com/NBorhani/DrugTar.

Version published to 10.1093/bioinformatics/btaf360
Jun 24, 2025
Version published to 10.1101/2024.09.21.614218 on bioRxiv
Sep 24, 2024

