Deciphering the Proteome of Escherichia coli K-12: Integrating Transcriptomics and Machine Learning to Annotate Hypothetical Proteins
Abstract
Omics technologies have led to the discovery of a vast number of proteins that are expressed but have no functional annotation, so-called hypothetical proteins (HPs). Even in the best-studied model organism, Escherichia coli K-12, 2% of the proteome remains uncharacterized, with no sequence homologies available in public databases. This knowledge gap is even wider for microbial dark matter. However, knowing the functions of proteins is crucial for elucidating cellular and metabolic processes and for harnessing biotechnological potential. In this study, we employed machine learning (ML) to decipher the transcriptional regulatory network of E. coli K-12, deep learning to predict structural homologs, and other ML and bioinformatic tools, with the goal of assigning functions to uncharacterized HPs. We further provide a proof of concept for experimental validation of the functions of three HP-encoding genes (yhdN, yeaC and ydgH) suggested by the in silico methods, by analyzing the growth patterns of E. coli K-12 deletion mutants compared with the wild type, as well as their transcriptional responses to specific conditions. This study demonstrates that combining big omics data with artificial intelligence and experimental controls is a powerful approach to illuminating functional dark matter.