Bridging 3D Molecular Structures and Artificial Intelligence by a Conformation Description Language

Abstract

Artificial intelligence, particularly language models (LMs), is reshaping research paradigms across scientific domains. In chemistry and pharmacy, chemical language models (CLMs) have achieved remarkable success in two-dimensional (2D) molecular modeling tasks by leveraging one-dimensional (1D) representations of molecules, such as SMILES and SELFIES. However, extending these successes to three-dimensional (3D) molecular modeling remains a significant challenge, largely because effective 1D representations of 3D molecular structures are lacking. To address this gap, we introduce ConfSeq, a novel molecular conformation description language that integrates SMILES with internal coordinates, including dihedral angles, bond angles, and pseudo-chirality. This design naturally ensures SE(3) invariance while preserving the human readability and conciseness characteristic of SMILES. ConfSeq allows a range of 3D molecular modeling tasks, such as molecular conformation prediction, 3D molecular generation, and 3D molecular representation, to be reformulated as sequence modeling problems. Using only a standard Transformer architecture, we achieve state-of-the-art performance on various benchmark sets. Furthermore, compared with the diffusion-based approaches widely used in 3D molecular modeling, the ConfSeq-based method offers advantages in inference efficiency and generation controllability, and it enables scoring of generated molecules. We believe that ConfSeq can serve as a foundational tool, advancing the development of sequence-based 3D molecular modeling methods.
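
The SE(3)-invariance claim for internal coordinates can be illustrated with a small numerical sketch. The code below is not the ConfSeq implementation, and the four atom positions are hypothetical values chosen only for illustration. It computes a signed dihedral angle with a standard textbook formula, then checks that the angle is unchanged by a rotation plus translation and that its sign flips under a mirror reflection, which is why a signed torsion can also carry handedness information.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def scale(a, s):
    return tuple(x * s for x in a)

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 axis."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))  # unit vector along the axis
    # Components of b0 and b2 perpendicular to the central axis.
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(b2, scale(b1, dot(b2, b1)))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def rigid_transform(p, theta, phi, t):
    """Rotate about z by theta, then about x by phi, then translate by t."""
    x, y, z = p
    x, y = (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))
    y, z = (y * math.cos(phi) - z * math.sin(phi),
            y * math.sin(phi) + z * math.cos(phi))
    return (x + t[0], y + t[1], z + t[2])

def mirror(p):
    """Reflect through the xy-plane (an improper transformation, not in SE(3))."""
    return (p[0], p[1], -p[2])

# Hypothetical positions of four bonded atoms (arbitrary units).
original = [(1.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.5, 0.8)]
moved = [rigid_transform(p, 0.7, -1.2, (3.0, -2.0, 5.0)) for p in original]
mirrored = [mirror(p) for p in original]

angle_original = dihedral(*original)
angle_moved = dihedral(*moved)
angle_mirrored = dihedral(*mirrored)

print(f"original: {angle_original:.3f}")
print(f"moved:    {angle_moved:.3f}")
print(f"mirrored: {angle_mirrored:.3f}")
```

Because the angle depends only on the relative positions of the four atoms, a sequence built from such values needs no alignment or data augmentation over rotations and translations.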
