ProtNote: a multimodal method for protein–function annotation

Samir Char
Nathaniel Corley
Sarah Alamdari
Kevin K Yang
Ava P Amini


Abstract

Motivation

Understanding the protein sequence–function relationship is essential for advancing protein biology and engineering. However, fewer than 1% of known protein sequences have human-verified functions. While deep-learning methods have demonstrated promise for protein–function prediction, current models are limited to predicting only those functions on which they were trained.

Results

Here, we introduce ProtNote, a multimodal deep-learning model that leverages free-form text to enable both supervised and zero-shot protein–function prediction. ProtNote not only maintains near state-of-the-art performance for annotations in its training set but also generalizes to unseen and novel functions in zero-shot test settings. ProtNote demonstrates superior performance in the prediction of novel Gene Ontology annotations and Enzyme Commission numbers compared to baseline models by capturing nuanced sequence–function relationships that unlock a range of biological use cases inaccessible to prior models. We envision that ProtNote will enhance protein–function discovery by enabling scientists to use free text inputs without restriction to predefined labels—a necessary capability for navigating the dynamic landscape of protein biology.

Availability and Implementation

The code is available on GitHub: https://github.com/microsoft/protnote; model weights, datasets, and evaluation metrics are provided via Zenodo: https://zenodo.org/records/13897920.

Version published to 10.1093/bioinformatics/btaf170 (Apr 15, 2025)
Version published to 10.1101/2024.10.17.618952 on bioRxiv (Oct 21, 2024)

Related articles

A Survey on Efficient Protein Language Models

This article has 8 authors:
1. Shouren Wang
2. Debargha Ganguly
3. Vinooth Kulkarni
4. Wang Yang
5. Zhuoran Qiao
6. Daniel Blankenberg
7. Vipin Chaudhary
8. Xiaotian Han
This article has no evaluations
Latest version Dec 24, 2025

Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome

This article has 7 authors:
1. Valentina Carbonari
2. Annamaria Defilippo
3. Ugo Lomoio
4. Caterina Francesca Perri
5. Barbara Puccio
6. Pierangelo Veltri
7. Pietro Hiram Guzzi
This article has no evaluations
Latest version Dec 23, 2025

The Evolution of the AlphaFold Architecture

This article has 1 author:
1. Y.C.B.J. Dissanayaka
This article has no evaluations
Latest version Jan 9, 2026
