The Natural Coding of Language: SPECTRA is All You Need

Abstract

In the data-driven era, the recording and preservation of information have become increasingly crucial. To surpass the binary storage capabilities of silicon-based materials, several novel storage media have been proposed in recent years, such as synthetic DNA and polymers with sequence-encoding characteristics. Herein, we present an alternative strategy that uses spectra of small molecules as a natural code for linking human language to molecular structures. By assigning words to codes derived from selected regions of experimental spectra, we demonstrate a workflow that generates spectra conditioned on language inputs and translates them into molecules. The molecules produced in this way yield computed and measured spectra that match their intended target codes, confirming the fidelity of the approach. These findings introduce spectra as a compact and interpretable channel (“SPECTRA is All You Need”) for storing and expressing information, advancing data storage, cryptography, and the emerging field of chemistry-inspired information technologies.
