Motif-weighted Structure Alignment for Classification and Evolutionary Studies of Carbonic Anhydrase

Abstract

Carbonic anhydrases (CAs) attract interest for their critical roles in various physiological processes and their potential application in CO2 sequestration to combat global warming. Although CAs are an important enzyme family, their classification and evolution remain elusive because of their high sequence diversity and long evolutionary history. In this paper, an in silico strategy, Motif-weighted Alignment for Structure-based Protein Classification (MASPC), was developed. It combines OmegaFold-predicted CA structures with a weighted structural motif alignment, TM-weighted, to enable more precise and robust polymorphic analysis of large enzyme datasets. The MASPC strategy was first validated on 74 ground-truth CA structures extracted from the PDB, where it showed improved performance compared with sequence-based polymorphic analysis (ClustalO-RAxML). Subsequently, MASPC was applied to a representative database of 1603 CAs from 117 model organisms, chosen to cover organisms from across the evolutionary history of life, with a focus on the α-, β-, and γ-CA classes. The results indicated that α-, β-, and γ-CAs were well grouped into their own classes, with clearer clustering according to the source organism of each CA. The structural differences among the α-, β-, and γ-CAs revealed by MASPC supported the current understanding that the CA classes are the result of convergent evolution. The sub-clusters within the α- and β-CAs are highly associated with organisms according to their appearance in evolutionary history, demonstrating a close correlation between CA evolution and the evolution of life. Furthermore, the MASPC method was applied to identify 27 potential α-CAs in the NCBI database that have less than 40% sequence similarity to a template human carbonic anhydrase II (HCA-II) sequence, demonstrating possible applications in enzyme identification studies.
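
The abstract does not define the TM-weighted score, so the following is only an illustrative sketch of how a motif-weighted variant of the standard TM-score might be computed. The standard TM-score terms (the per-residue factor 1 / (1 + (d/d0)^2) and the length-dependent scale d0) are well established; the per-residue weighting scheme, the function names `tm_d0` and `weighted_tm_score`, and the example weights are assumptions made for illustration, not the authors' implementation.

```python
def tm_d0(target_length):
    """Standard TM-score distance scale d0, floored at 0.5 for short chains."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def weighted_tm_score(distances, weights):
    """Hypothetical motif-weighted TM-score.

    distances: one entry per target residue, the distance (in angstroms)
               to its aligned partner after superposition, or None if the
               residue is unaligned.
    weights:   one non-negative weight per target residue; residues in a
               structural motif (e.g. an active site) get larger weights.
               This weighting scheme is an assumption, not from the paper.

    With all weights equal to 1 this reduces to the standard TM-score
    normalised by the target length.
    """
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    d0 = tm_d0(len(distances))
    score = 0.0
    for d, w in zip(distances, weights):
        if d is None:
            continue  # unaligned residues contribute nothing
        score += w / (1.0 + (d / d0) ** 2)
    return score / total_weight


# Example: a 100-residue target in which residues 40-49 form a motif.
distances = [1.0] * 100
for i in range(40, 50):
    distances[i] = 6.0           # the motif region superimposes poorly
uniform = [1.0] * 100
motif_weighted = [1.0] * 100
for i in range(40, 50):
    motif_weighted[i] = 5.0      # hypothetical 5x weight on motif residues

plain_score = weighted_tm_score(distances, uniform)
motif_score = weighted_tm_score(distances, motif_weighted)
```

In this sketch, a poorly superimposed motif lowers the weighted score more than the unweighted one, which is the qualitative behaviour one would expect from a scheme that emphasises conserved structural motifs when comparing distantly related enzymes.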