The genetically-encoded amino acids distribute non-randomly within a functionally-relevant chemical space
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
The standard set of 20 genetically-encoded amino acids (C20) exhibits a statistically non-random distribution in primarily two structurally-relevant physicochemical properties: hydrophobicity and molecular volume, and to a lesser extent charge. It remains an open question, however, whether evolutionary pressures similarly optimized the same alphabet for the distribution of functionally-relevant properties, such as reactivity. In this study, we used semi-empirical quantum chemistry simulations to calculate the highest occupied molecular orbital and lowest unoccupied molecular orbital (HOMO-LUMO) gaps for 84 xeno amino acids and constructed 10 million random 20-mer amino acid alphabets to determine where C20 fit amongst this background. The HOMO-LUMO gap measurements demonstrated that C20, similar to hydrophobicity and volume, also exhibits a non-random distribution. However, unlike hydrophobicity and volume, this distribution is non-random across an unevenly broad range. The results expand upon previous theory and suggest HOMO-LUMO gap energies as one synthetic biologists may consider when developing novel protein design tools or designing functional xeno amino acid alphabets.
Highlights
-
Life’s amino acid alphabet is non-randomly distributed within an expanded computationally-generated chemistry space generated from large-scale quantum chemistry simulations.
-
Amino acid alphabet coverage theory applies beyond structurally-relevant physicochemical descriptors to include functionally-relevant properties like reactivity as measured by frontier molecular orbitals
-
Findings here provide a theoretical framework to guide the design of novel proteins and development of synthetic amino acid alphabets.