GenDiS3 database: census on prevalence of protein domain superfamilies of known structure in the entire sequence database

Abstract

Despite the vast amount of sequence data available, there is a significant disparity between the number of protein sequences identified and the relatively few structures that have been determined. This disparity highlights a central challenge in structural biology, namely bridging the gap between sequence information and three-dimensional structural data, and the need for robust databases capable of linking distant homologues to known structures. Studies have indicated that the number of structural folds is limited, despite the vast diversity of proteins. Computational tools can therefore classify protein sequences well before their structures are determined or their functions characterised, narrowing the gap between sequence and structural data. GenDiS is a repository of information on the Genomic Distribution of protein domain superfamilies, built through a one-time computational exercise that searches the vast sequence database for trusted homologues of protein domains of known structure. We have updated this database using advanced bioinformatics tools, including DELTA-BLAST for the initial detection of hits and HMMSCAN for their validation, significantly improving the accuracy of domain identification. Using these tools, over 151 million sequence homologues were identified for 2060 SCOPe superfamilies, of which 116 million were validated as true positives. Through a case study on glycolysis-related enzymes, we explore variations in the domain architectures of these enzymes, revealing evolutionary changes and functional diversity amongst these essential proteins. In a second case study, on the LoG gene, we show how one can trace significant mutations across its evolutionary lineage. The updated database, GenDiS3, and its associated tools, available at https://caps.ncbs.res.in/gendis3/, offer a powerful resource for researchers in functional annotation and evolutionary studies.
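The two-stage search described above (DELTA-BLAST to detect candidate homologues, HMMSCAN to validate them) can be illustrated with a minimal Python sketch. It assumes that "validation" means an independent HMM scan of each hit sequence must place it in the same superfamily as the query domain that found it. The abstract does not state the exact criterion, so this rule, the E-value cutoffs, the `Hit` record, the profile-to-superfamily map and every identifier below are hypothetical. Only the column layout of HMMER's `--tblout` table (profile name, sequence name and full-sequence E-value in columns 1, 3 and 5) is taken from HMMER itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A candidate homologue from the detection step (hypothetical record)."""
    sequence_id: str
    query_superfamily: str  # superfamily of the query domain that found this hit
    evalue: float           # E-value reported by the detection search


def parse_hmmscan_tblout(lines, profile_to_superfamily, evalue_cutoff=1e-3):
    """Map each sequence to the superfamilies implied by its significant HMM matches.

    In hmmscan --tblout output, whitespace-separated column 1 is the target
    (profile) name, column 3 the query (sequence) name and column 5 the
    full-sequence E-value; lines starting with '#' are comments.
    The cutoff and the profile -> superfamily map are assumptions.
    """
    assignments = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        profile, seq_id, evalue = fields[0], fields[2], float(fields[4])
        if evalue > evalue_cutoff:
            continue  # not significant enough to count as validation
        superfamily = profile_to_superfamily.get(profile)
        if superfamily is not None:
            assignments.setdefault(seq_id, set()).add(superfamily)
    return assignments


def validate_hits(hits, assignments, evalue_cutoff=1e-4):
    """Keep detected hits whose HMM-based assignment agrees with the query superfamily."""
    validated = []
    for hit in hits:
        if hit.evalue > evalue_cutoff:
            continue  # rejected at the detection stage (assumed cutoff)
        if hit.query_superfamily in assignments.get(hit.sequence_id, set()):
            validated.append(hit)
    return validated


# Toy example: identifiers and scores are made up for illustration.
hits = [
    Hit("seqA", "sf_1", 1e-20),
    Hit("seqB", "sf_1", 1e-8),
    Hit("seqC", "sf_2", 0.5),
]
tblout = [
    "# target name  accession  query name  accession  E-value ...",
    "hmm_1 - seqA - 3.1e-50 170.2 0.1 3.5e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 toy profile 1",
    "hmm_2 - seqB - 2.0e-12 45.1 0.0 2.5e-12 44.8 0.0 1.1 1 0 0 1 1 1 1 toy profile 2",
]
profile_map = {"hmm_1": "sf_1", "hmm_2": "sf_2"}

assignments = parse_hmmscan_tblout(tblout, profile_map)
validated = validate_hits(hits, assignments)
print([h.sequence_id for h in validated])  # → ['seqA']
```

In this toy run, seqB is detected but rejected because the HMM scan places it in a different superfamily, and seqC never passes the detection cutoff. At database scale, the ratio of validated to detected hits is what the abstract reports: 116 million of over 151 million, or roughly three quarters.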

Database URL: https://caps.ncbs.res.in/gendis3/