Functional annotation hypothetical proteins: a world to be explored in drug development in Trypanosomatids

This article has been reviewed by the following groups

Abstract

Many neglected tropical diseases, including Chagas disease, leishmaniasis, and human African trypanosomiasis, are caused by trypanosomatids, and the hypothetical proteins of these parasites offer an alternative route for finding potential targets for new drugs. In this work, we focus on applying functional prediction methods based on both sequence and structure to analyze the hypothetical proteins of the pathogenic agents that cause these diseases: T. cruzi (Tcr), T. brucei brucei (Tbr), T. brucei gambiense (Tbg), L. infantum (Lif), L. donovani (Ldo), and L. braziliensis (Lbz). By consulting databases and servers, we predicted functional domains for twenty-six proteins in Tcr, thirteen in Tbr, fifteen in Tbg, ten in Lif, and one each in Ldo and Lbz. With the goal of developing multi-target therapies, we grouped the domains according to how they are shared among the organisms and investigated those shared by the most species. Using targeted literature search strategies, we summarized what has already been reported for these domains, and we analyzed protein structures and sequences to describe mutations among the species and potential drug-binding sites. Published work shows that some of these domains, such as TRX, are non-essential for trypanosomatids, while others (UFC1, Ufm1, ACBP, AAA 18, and Fe-S) require further investigation because little is known about the metabolic processes they take part in. However, we identified three noteworthy domains that hold promise as targets: TPR, which plays a crucial role in ciliogenesis; Nuc deoxyrib tr, which is essential for purine recycling and salvage; and MIX, which is important for protein targeting and for the assembly of complexes such as COX. These three domains are promising drug targets because of their conservation, their potential to affect multiple species, and their exclusivity.
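The domain-grouping step described above can be sketched as a set operation: for each predicted domain, collect the organisms in which it was found, then rank domains by how many organisms share them. This is a minimal Python sketch that assumes a simple input format (an organism code mapped to a list of predicted domain names); the function name and the placeholder domains are hypothetical and do not reproduce the study's predictions.

```python
from collections import defaultdict


def group_domains_by_sharing(predictions):
    """Group predicted domains by the set of organisms in which they occur.

    predictions: dict mapping an organism code (e.g. "Tcr") to an iterable of
    predicted domain names. This input format is an assumption for the sketch.
    Returns (domain, sorted organism list) pairs, most widely shared first.
    """
    organisms_per_domain = defaultdict(set)
    for organism, domains in predictions.items():
        for domain in domains:
            organisms_per_domain[domain].add(organism)
    # Rank by number of sharing organisms (descending), then by domain name.
    return sorted(
        ((domain, sorted(orgs)) for domain, orgs in organisms_per_domain.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Illustrative placeholder input only; these are NOT the study's results.
example = {
    "Tcr": ["domain_A", "domain_B", "domain_C"],
    "Tbr": ["domain_A", "domain_B"],
    "Lif": ["domain_A"],
}

for domain, orgs in group_domains_by_sharing(example):
    print(domain, len(orgs), ",".join(orgs))
```

Domains at the top of this ranking would be the natural first candidates for a multi-target strategy, since a single inhibitor could in principle act on several species.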

Article activity feed

  1. In this work, we focus on applying functional prediction methods based on both sequence and structure to analyze the hypothetical proteins of the pathogenic agents that cause these diseases: T. cruzi (Tcr), T. brucei brucei (Tbr), T. brucei gambiense (Tbg), L. infantum (Lif), L. donovani (Ldo), and L. braziliensis (Lbz).

    This is a really cool analysis combining sequence-based and structure-based functional domain prediction! It provides tons of new info about trypanosomatid biology (even giving us some possible new drug targets!), and it could also tell us a lot about sequence/structure conservation and how we might leverage this for annotation and drug target ID purposes.

  2. SEquence-Based Functional Prediction (SEBFP) and Structure-Based Functional Prediction (STBFP) results

    It would be interesting to think about how your sequence-based predictions and structure-based predictions compare. For example, do you have proteins that were similar based on sequence but not structure, and vice versa? And how do the individual sequence-based and structure-based methods compare with one another, since you applied a bunch of tools? This could be an interesting opportunity to learn more about sequence and structure conservation in organisms that are more distant from humans!

  3. Structure-Based Functional Prediction (STBFP) results, which identified the IFT70 protein (PDBid: 4UZY) [100]. IFT70 is present in the IFT train, which is a crucial component of the intraflagellar transport protein complex responsible for ciliogenesis, an evolutionarily conserved transport process involving the bidirectional movement of particles within cilia [101].

    It might be useful to introduce this information earlier. I was confused about why you were talking about IFT70 previously, and this context was really helpful!

  4. In the upcoming sections,

    Because it seems like you're sort of starting a new section here, it might be useful to add a title so that it doesn't run together with the previous section.

  5. our alignment results indicate a higher identity of 50% between these species, with the lowest being 44% between Tcr and Lif.

    I love this bit of information about what you found in your analyses. It's very helpful for thinking about these proteins.

  6. ACBP

    Might be good to include what this abbreviation means when you first reference it. Would also be useful to include some information about what you found in your analysis that led you to include this section.

  7. we identified the UFC1 and Ufm1 domains, both of which play roles in the ubiquitination process

    It would be cool to see a figure showing how similar the sequences and structures of these proteins are to known versions. Since it seems like we know quite a bit about them (e.g., that mutations like Arg23Gln can affect binding in UFC1), it would be interesting to see whether particular important residues are conserved.

  8. the same function must be predicted in at least eight SEBFP tools and two STBFP tools

    Why these criteria? Also, by requiring matches in both the sequence-based and structure-based tools, are you missing proteins that have very similar structures (and possibly functions) but very different sequences?
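The consensus rule quoted in comment 8 can be sketched as a simple vote threshold. The Python below is a minimal illustration that assumes each candidate function comes with a count of how many sequence-based (SEBFP) and structure-based (STBFP) tools predicted it; the `ToolVotes` type, the `classify` function, and the vote counts are hypothetical. Keeping a separate "structure-only" label makes the case raised in that comment visible: candidates with strong structural support but too few sequence-based hits.

```python
from dataclasses import dataclass

# Thresholds taken from the quoted criterion: at least eight SEBFP tools and
# at least two STBFP tools must predict the same function.
MIN_SEBFP = 8
MIN_STBFP = 2


@dataclass
class ToolVotes:
    sebfp: int  # number of sequence-based tools predicting this function
    stbfp: int  # number of structure-based tools predicting this function


def classify(votes: ToolVotes, min_seq: int = MIN_SEBFP,
             min_struct: int = MIN_STBFP) -> str:
    """Label a candidate by which threshold(s) it passes."""
    seq_ok = votes.sebfp >= min_seq
    struct_ok = votes.stbfp >= min_struct
    if seq_ok and struct_ok:
        return "accepted"  # passes the combined criterion as quoted
    if struct_ok:
        # Strong structural support, weak sequence support: discarded by the
        # combined rule, possibly a structurally similar but divergent protein.
        return "structure-only"
    if seq_ok:
        return "sequence-only"
    return "rejected"


# Hypothetical vote counts for illustration; not taken from the study.
candidates = {
    "protein_1": ToolVotes(sebfp=9, stbfp=3),
    "protein_2": ToolVotes(sebfp=2, stbfp=4),
    "protein_3": ToolVotes(sebfp=8, stbfp=1),
    "protein_4": ToolVotes(sebfp=1, stbfp=0),
}

for name, votes in candidates.items():
    print(name, classify(votes))
```

Reporting the "structure-only" bucket alongside the accepted set would be one way to check how many candidates the combined criterion filters out.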