Functional annotation hypothetical proteins: a world to be explored in drug development in Trypanosomatids

Raissa Santos de Lima
Ana Carolina Silva Bulla
Manuela Leal da Silva

Abstract

Hypothetical proteins offer an alternative route to finding potential drug targets against the neglected tropical diseases caused by trypanosomatids (Chagas disease, leishmaniasis, and human African trypanosomiasis). In this work, we focus on applying functional prediction methods based on both sequence and structure to analyze the hypothetical proteins of the pathogenic agents that cause these diseases: T. cruzi (Tcr), T. brucei brucei (Tbr), T. brucei gambiense (Tbg), L. infantum (Lif), L. donovani (Ldo), and L. braziliensis (Lbz). By consulting databases and servers, we predicted functional domains for twenty-six proteins in Tcr, thirteen in Tbr, fifteen in Tbg, ten in Lif, and one each in Ldo and Lbz. With the goal of developing multi-target therapies, we grouped the domains according to how they are shared among the organisms and investigated those that are shared among more species. Using specific literature search strategies, we described what has already been reported for these domains, and we analyzed protein structures and sequences, describing mutations among the species and potential drug sites. Published work indicates that some of these domains, such as the TRX domain, are non-essential for trypanosomatids, while others demand further investigation because information about the metabolic processes involved is lacking (UFC1, Ufm1, ACBP, AAA 18, and Fe-S). However, we identified three noteworthy domains that hold promise as targets: TPR, which plays a crucial role in the ciliogenesis process; Nuc deoxyrib tr, essential in purine recycling and recovery mechanisms; and MIX, important for protein targeting and the assembly of complexes such as COX. These three domains are promising targets for drug development because of their conservation, their potential to affect multiple species, and their exclusivity.

Arcadia Science
Jan 5, 2024

In this work, we focus on applying functional prediction methods based on both sequence and structure to analyze the hypothetical proteins of the pathogenic agents that cause these diseases: T. cruzi (Tcr), T. brucei brucei (Tbr), T. brucei gambiense (Tbg), L. infantum (Lif), L. donovani (Ldo), and L. braziliensis (Lbz).

This is a really cool analysis combining sequence-based and structure-based functional domain prediction! It provides tons of new info about trypanosomatid biology (even giving us some possible new drug targets!), but could also tell us a lot about sequence/structure conservation and how we might leverage this for annotation and drug target ID purposes.

SEquence-Based Functional Prediction (SEBFP) and Structure-Based Functional Prediction (STBFP) results

It would be interesting to think about how your sequence-based predictions and structure-based predictions compare. For example, do you have proteins that were similar based on sequence but not structure, and vice versa? And how do the different sequence-based and structure-based methods compare with each other, since you applied a bunch of tools? This could be an interesting opportunity to learn more about sequence and structure conservation in an organism that's more distant from humans!
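One way to approach the tool-versus-tool comparison is sketched below. This is only an illustration under stated assumptions: the tool names, protein IDs, and domain labels are hypothetical placeholders (not results from the preprint), and "agreement" is defined here simply as the fraction of shared proteins for which two tools give the same label.

```python
from itertools import combinations


def pairwise_tool_agreement(predictions):
    """Fraction of shared proteins on which each pair of tools agrees.

    predictions maps tool name -> {protein ID -> predicted domain label}.
    Pairs of tools with no proteins in common are skipped.
    """
    agreement = {}
    for tool_a, tool_b in combinations(sorted(predictions), 2):
        calls_a, calls_b = predictions[tool_a], predictions[tool_b]
        shared = calls_a.keys() & calls_b.keys()
        if not shared:
            continue
        same = sum(calls_a[p] == calls_b[p] for p in shared)
        agreement[(tool_a, tool_b)] = same / len(shared)
    return agreement


# Hypothetical toy input: one sequence-based and one structure-based tool.
toy_predictions = {
    "seq_tool": {"p1": "TPR", "p2": "TRX", "p3": "MIX"},
    "struct_tool": {"p1": "TPR", "p2": "ACBP"},
}
toy_agreement = pairwise_tool_agreement(toy_predictions)
```

Low agreement between a sequence-based and a structure-based tool on the same proteins would flag exactly the cases asked about above.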

Structure-Based Functional Prediction (STBFP) results, which identified the IFT70 protein (PDBid: 4UZY) [100]. IFT70 is present in the IFT train, which is a crucial component of the intraflagellar transport protein complex responsible for ciliogenesis, an evolutionarily conserved transport process involving the bidirectional movement of particles within cilia [101].

It might be useful to put this information sooner. I was confused why you were talking about IFT70 previously and this context was really helpful!

In the upcoming sections,

Because it seems like you're sort of starting a new section here, it might be useful to add a title so that it doesn't run together with the previous section.

our alignment results indicate a higher identity of 50% between these species, with the lowest being 44% between Tcr and Lif.

I love this bit of information about what you found in your analyses. It's very helpful for thinking about these proteins.
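For readers who want to reproduce identity figures like these, here is a minimal sketch of a pairwise percent-identity calculation on two already-aligned sequences. It assumes identity is computed over columns where neither sequence has a gap; the preprint may use a different denominator, and the toy sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned sequences (not real trypanosomatid proteins).
toy_identity = percent_identity("ACDE-G", "ACDQFG")
```

Reporting which denominator was used (ungapped columns, alignment length, or shorter sequence length) makes values such as 44% and 50% easier to compare across studies.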

TRX

Would also mention what this abbreviation stands for in case you have readers that aren't super familiar with this protein domain.

AAA 18 domain

Again would be interesting to include what you found in your analysis about AAA 18 domains.

ACBP

Might be good to include what this abbreviation means when you first reference it. Would also be useful to include some information about what you found in your analysis that led you to include this section.

Another server

I'm sure you considered it, but you could also try employing Foldseek to search for matches in a couple of subsets of the AlphaFold database, as well as CATH50, MGnify, and others. It's also super fast! https://search.foldseek.com/search

we identified the UFC1 and Ufm1 domains, both of which play roles in the ubiquitination process

It would be cool to see a figure showing how similar the sequences and structures of these proteins are compared to known versions. Since it seems like we know quite a bit about them (e.g., that mutations like Arg23Gln can affect binding in UFC1), it would be interesting to see if particular important residues are conserved.
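A conservation check of this kind could be sketched as follows. This is a hedged illustration only: it assumes a multiple sequence alignment is already available as a dictionary of equal-length gapped strings, and the sequence names and residues below are toy placeholders, not real UFC1 sequences or positions.

```python
def alignment_column(aligned_ref, ref_position, gap="-"):
    """Map a 1-based position in the ungapped reference to a 0-based alignment column."""
    seen = 0
    for column, residue in enumerate(aligned_ref):
        if residue != gap:
            seen += 1
            if seen == ref_position:
                return column
    raise ValueError("ref_position is beyond the end of the reference sequence")


def residues_at_position(alignment, ref_name, ref_position):
    """Residue found in every aligned sequence at a given reference position."""
    column = alignment_column(alignment[ref_name], ref_position)
    return {name: seq[column] for name, seq in alignment.items()}


# Toy alignment (hypothetical names and residues).
toy_alignment = {"ref": "M-KRL", "query_1": "MAKRL", "query_2": "M-KQL"}
toy_residues = residues_at_position(toy_alignment, "ref", 3)
```

Here position 3 of the ungapped reference is an arginine; `query_1` keeps it while `query_2` carries a glutamine, which is the sort of difference a conservation figure would highlight.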

the same function must be predicted in at least eight SEBFP tools and two STBFP tools

Why these criteria? Also, by requiring matches in both the sequence-based and structure-based tools, are you missing proteins that may have very similar structures (and possibly functions) but very different sequences?
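To make the trade-off concrete, here is a minimal sketch of the quoted threshold (at least eight SEBFP tools and two STBFP tools) alongside a relaxed filter that surfaces structure-supported functions the strict rule would drop. The function names, tool names, and toy calls are hypothetical; this is not the authors' pipeline.

```python
from collections import Counter


def count_support(calls):
    """Count how many tools predicted each function, ignoring tools with no call."""
    return Counter(f for f in calls.values() if f is not None)


def consensus_functions(seq_calls, struct_calls, min_seq=8, min_struct=2):
    """Functions meeting both the sequence-tool and structure-tool thresholds."""
    seq, struct = count_support(seq_calls), count_support(struct_calls)
    return sorted(f for f in seq if seq[f] >= min_seq and struct[f] >= min_struct)


def structure_only_functions(seq_calls, struct_calls, min_seq=8, min_struct=2):
    """Functions with enough structure-tool support but too little sequence-tool support."""
    seq, struct = count_support(seq_calls), count_support(struct_calls)
    return sorted(f for f in struct if struct[f] >= min_struct and seq[f] < min_seq)


# Toy calls for one protein: tool name -> predicted function (hypothetical).
toy_seq_calls = {f"seq_tool_{i}": "TPR" for i in range(8)}
toy_seq_calls["seq_tool_8"] = "MIX"
toy_struct_calls = {"struct_a": "TPR", "struct_b": "TPR", "struct_c": "MIX", "struct_d": "MIX"}

strict = consensus_functions(toy_seq_calls, toy_struct_calls)
relaxed = structure_only_functions(toy_seq_calls, toy_struct_calls)
```

In this toy case the strict rule keeps only TPR, while the relaxed filter lists MIX as a structure-supported candidate that would otherwise be missed.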

ouch

Should this be "out"?

Version published to 10.1101/2023.12.07.570673 on bioRxiv
Dec 8, 2023