Odyssey: reconstructing evolution through emergent consensus in the global proteome

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

We present Odyssey, a family of multimodal protein language models for sequence and structure generation, protein editing and design. We scale Odyssey to more than 102 billion parameters, trained over 1.1 × 10 23 FLOPs. The Odyssey architecture uses context modalities, categorized as structural cues, semantic descriptions, and orthologous group metadata, and comprises two main components: a finite scalar quantizer for tokenizing continuous atomic coordinates, and a transformer stack for multimodal representation learning. Odyssey is trained via discrete diffusion, and characterizes the generative process as a time-dependent unmasking procedure. The finite scalar quantizer and transformer stack leverage the consensus mechanism, a replacement for attention that uses an iterative propagation scheme informed by local agreements between residues. Across various benchmarks, Odyssey achieves landmark performance for protein generation and protein structure discretization. Our empirical findings are supported by theoretical analysis.

Article activity feed