DrugTar improves druggability prediction by integrating large language models and gene ontologies

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Motivation

Target discovery is crucial in drug development, especially for complex chronic diseases. Recent advances in high-throughput technologies and the explosion of biomedical data have highlighted the potential of computational druggability prediction methods. However, most current methods rely on sequence-based features with machine learning, which often face challenges related to hand-crafted features, reproducibility, and accessibility. Moreover, the potential of raw sequence and protein structure has not been fully investigated.

Results

Here, we leveraged both protein sequence and structure using deep learning techniques, revealing that protein sequence, especially pre-trained embeddings, is more informative than protein structure. Next, we developed DrugTar, a high-performance deep learning algorithm integrating sequence embeddings from the ESM-2 pre-trained protein language model with gene ontologies to predict druggability. DrugTar achieved areas under the curve and precision–recall curve values of 0.94, outperforming state-of-the-art methods. In conclusion, DrugTar streamlines target discovery as a bottleneck in developing novel therapeutics.

Availability and implementation

DrugTar is available as a web server at www.DrugTar.com. The data and source code are at https://github.com/NBorhani/DrugTar.

Article activity feed