Classification of bioactive peptides: a comparative analysis of models and encodings

Edoardo Bizzotto
Guido Zampieri
Laura Treu
Pasquale Filannino
Raffaella Di Cagno
Stefano Campanaro

Abstract

Bioactive peptides are short amino acid chains possessing biological activity and exerting specific physiological effects relevant to human health, which are increasingly produced through fermentation due to their therapeutic roles. One of the main open problems related to biopeptides remains the determination of their functional potential, which still mainly relies on time-consuming in vivo tests. While bioinformatic tools for the identification of bioactive peptides are available, they are focused on specific functional classes and have not been systematically tested in realistic settings. To tackle this problem, bioactive peptide sequences and functions were collected from a variety of databases to generate a comprehensive collection of bioactive peptides from microbial fermentation. This collection was organized into nine functional classes including some previously studied and some newly defined such as immunomodulatory, opioid and cardiovascular peptides. Upon assessing their native sequence properties, four alternative encoding methods were tested in combination with a multitude of machine learning algorithms, from basic classifiers like logistic regression to advanced algorithms like BERT. By testing a total set of 171 models, it was found that, while some functions are intrinsically easier to detect, no single combination of classifiers and encoders worked universally well for all the classes. For this reason, we unified all the best individual models for each class and generated CICERON (Classification of bIoaCtive pEptides fRom micrObial fermeNtation), a tool for the functional classification of peptides. State-of-the-art classifiers were found to underperform on our benchmark dataset compared to the models included in CICERON. Altogether, our work provides a tool for real-world peptide classification and can serve as a benchmark for future model development.

Arcadia Science
Jan 9, 2024

To compare the amino acid usage in the functional classes, single amino acid, dipeptide, and tripeptide frequencies were plotted (Figure 2). The amino acid frequency plot in Figure 2A reveals that some BPs classes have distinct characteristics. For example, celiac disease BPs have the highest frequency of proline and glutamine, opioid peptides are enriched in tyrosine and glycine, while cardiovascular BPs have slightly higher frequencies of alanine and the highest frequency of the negatively charged amino acids aspartic acid and glutamic acid.

I'm curious whether using degenerate amino acid alphabets (Dayhoff encoding, hydrophobic-polar, etc.) would further improve classification accuracy or reveal interesting patterns.
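
As a rough illustration of the suggestion above, the sketch below re-encodes a peptide into the six-letter Dayhoff alphabet and computes k-mer frequencies on the reduced sequence. The function names and the letter codes `a` to `f` are choices made for this example, not something taken from the preprint or from CICERON.

```python
from collections import Counter

# Six Dayhoff groups; the single-letter codes a-f are an arbitrary naming choice.
DAYHOFF_GROUPS = {
    "a": "C",      # cysteine
    "b": "AGPST",  # small residues
    "c": "DENQ",   # acids and amides
    "d": "HKR",    # basic residues
    "e": "ILMV",   # aliphatic hydrophobic residues
    "f": "FWY",    # aromatic residues
}
AA_TO_DAYHOFF = {aa: code for code, aas in DAYHOFF_GROUPS.items() for aa in aas}


def to_dayhoff(seq):
    """Map a peptide to the reduced alphabet.

    Raises KeyError for any non-standard residue (for example X or B).
    """
    return "".join(AA_TO_DAYHOFF[aa] for aa in seq.upper())


def kmer_frequencies(seq, k=1):
    """Relative frequency of every k-mer in seq (empty dict if seq is shorter than k)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: n / len(kmers) for kmer, n in counts.items()}
```

For example, `to_dayhoff("YGGFL")` gives `"fbbfe"`, and `kmer_frequencies("fbbfe", k=2)` assigns 0.25 to each of the four distinct dipeptides. The same frequency function works on the unreduced sequence, so the two alphabets can be compared with otherwise identical features.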

After filtering the sequences, and merging functionally overlapping or related classes, the final database consisted of 3990 BPs divided into nine different functional groups (Table 1).

It would be nice to see how many peptides were dropped at each stage of filtering. Since so much data was dropped, I'm also curious whether the model would generalize better if that data were included somehow.
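
One way such a per-stage report could be produced is sketched below. The two filters (a 2-50 residue length window and a standard-residue check) are hypothetical stand-ins; the actual filtering criteria used for the database are not given in the passage quoted here.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical filter stages, applied in order: (stage name, keep-predicate).
FILTERS = [
    ("length 2-50", lambda seq: 2 <= len(seq) <= 50),
    ("standard residues only", lambda seq: set(seq) <= STANDARD_AA),
]


def apply_filters(sequences, filters=FILTERS):
    """Apply each filter in turn and record how many sequences it removed."""
    remaining = list(sequences)
    report = []
    for name, keep in filters:
        kept = [seq for seq in remaining if keep(seq)]
        report.append((name, len(remaining) - len(kept)))
        remaining = kept
    return remaining, report
```

On the toy input `["YGGFL", "A", "IPP", "VPX", "M" * 51]`, the first stage drops two sequences and the second drops one, leaving `["YGGFL", "IPP"]`.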

https://github.com/BizzoTL/CICERON/

Would you be willing to add a license to the repository so that the terms of reuse of your work are clear?

70:20:10

Sorry if I missed this, but can you report the total size of each group?
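
For orientation, the sketch below shows how group sizes follow from a 70:20:10 ratio. It assumes the split is applied to a single shuffled list and that the three parts are training, validation, and test in that order; neither assumption is confirmed by the quoted text, and the function name and seed are illustrative.

```python
import random


def split_70_20_10(items, seed=0):
    """Shuffle a copy of items and cut it into 70% / 20% / remainder."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 70 // 100
    n_val = len(items) * 20 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever is left, roughly 10%
    return train, val, test
```

Applied to 3990 items (the database size quoted elsewhere in the preprint), this yields groups of 2793, 798, and 399. Since the classifiers are binary and built per class, the sizes actually used would depend on each class and its negative set, which is why reported numbers would be helpful.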

HugginFace Transformers

Typo, I think: this should read "Hugging Face Transformers" :)

“antihypertensive”, “ACE-inhibitory” and “Renin-inhibitory” as Antihypertensive; “DPP-IV inhibitors” and “alpha-glucosidase inhibitors” as Antidiabetic; “antimicrobial”, “antifungal”, “antibacterial” and “anticancer” as Antimicrobial; “antithrombotic”, “CaMKII Inhibitor” as Cardiovascular with positive effects on vascular circulation; “Antiamnestic”, “anxiolytic-like”, “AChE inhibitors”, “PEP-inhibitory” and “neuropeptides” as Neuropeptides.

Curious if you tried without these groupings, i.e., how dissimilar are some of the peptides that were placed into combined groups, and could the model have done well without these groupings?
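
The grouping in the quoted passage can be written down as a lookup table, which also makes it straightforward to tabulate how many peptides each original label contributes to a merged class. The mapping below is transcribed from the quote; the function names and the case-insensitive matching are assumptions of this sketch.

```python
from collections import Counter

# Original database label (lower-cased) -> merged class, as listed in the quote.
RAW_TO_GROUP = {
    "antihypertensive": "Antihypertensive",
    "ace-inhibitory": "Antihypertensive",
    "renin-inhibitory": "Antihypertensive",
    "dpp-iv inhibitors": "Antidiabetic",
    "alpha-glucosidase inhibitors": "Antidiabetic",
    "antimicrobial": "Antimicrobial",
    "antifungal": "Antimicrobial",
    "antibacterial": "Antimicrobial",
    "anticancer": "Antimicrobial",
    "antithrombotic": "Cardiovascular",
    "camkii inhibitor": "Cardiovascular",
    "antiamnestic": "Neuropeptides",
    "anxiolytic-like": "Neuropeptides",
    "ache inhibitors": "Neuropeptides",
    "pep-inhibitory": "Neuropeptides",
    "neuropeptides": "Neuropeptides",
}


def merged_class(raw_label):
    """Return the merged class for a raw label, or None if it is not in the table."""
    return RAW_TO_GROUP.get(raw_label.strip().lower())


def raw_label_breakdown(records):
    """Count, per merged class, how many records carry each original label."""
    breakdown = {}
    for _sequence, raw_label in records:
        group = merged_class(raw_label)
        if group is not None:
            breakdown.setdefault(group, Counter())[raw_label.strip().lower()] += 1
    return breakdown
```

A breakdown like this, computed on the real database, would show whether a merged class is dominated by one original label or is a genuine mixture.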

otherwise, they were excluded from the analysis.

Similarly, how often does this occur?

Peptides with identical sequences but different functional class assignments were removed to avoid introducing potential biases in the classifier’s training.

How often does this occur?
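
A count of this kind could be obtained with a few lines of code. The sketch below assumes records are `(sequence, class)` pairs and removes every sequence that appears with more than one class; the function name and the toy records in the usage note are made up for illustration.

```python
from collections import defaultdict


def drop_conflicting(records):
    """Remove sequences assigned to more than one class.

    Returns the kept records and the set of conflicting sequences.
    Exact repeats of the same (sequence, class) pair are left untouched.
    """
    classes_by_sequence = defaultdict(set)
    for sequence, label in records:
        classes_by_sequence[sequence].add(label)
    conflicting = {seq for seq, labels in classes_by_sequence.items() if len(labels) > 1}
    kept = [(seq, label) for seq, label in records if seq not in conflicting]
    return kept, conflicting
```

With the toy input `[("IPP", "Antihypertensive"), ("IPP", "Antidiabetic"), ("YGGFL", "Opioid")]`, one sequence (`"IPP"`) is flagged and only the `"YGGFL"` record is kept. Reporting `len(conflicting)` for the real database would answer the question directly.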

The final result, CICERON, consists of nine different binary classifiers capable of identifying the products of microbial fermentation-derived BPs.

Did you try this as a multi-class classification problem and find that the binary classifiers performed better? Or was the underlying model you used limited to binary classification?
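
The practical difference between the two framings shows up in how per-class scores are turned into a prediction. The sketch below contrasts them using hypothetical scores; the 0.5 threshold, the function names, and the example values are assumptions and do not describe how CICERON combines its classifiers.

```python
def multilabel_calls(scores, threshold=0.5):
    """Independent binary classifiers: report every class scoring at or above the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


def multiclass_call(scores):
    """Single multi-class decision: report only the top-scoring class."""
    return max(scores, key=scores.get)


# Hypothetical per-class scores for one peptide.
example_scores = {"Antihypertensive": 0.81, "Antidiabetic": 0.64, "Opioid": 0.05}
```

Here `multilabel_calls(example_scores)` returns `["Antidiabetic", "Antihypertensive"]`, while `multiclass_call(example_scores)` returns `"Antihypertensive"`. Independent binary classifiers can therefore assign several functions to one peptide, or none at all, which a single multi-class model cannot do.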

Given the importance of BPs, there have been several attempts to create in-silico approaches to perform a preliminary assignment of the potential functional properties and facilitate the subsequent discovery and testing process in vivo [19–24]. These methods rely on several databases where peptides from various experiments have been collected and classified according to the BPs functional classes. Using the sequence properties of the peptides, such as amino acid composition, or the presence of sequence patterns of interest, peptides can be assigned to a functional class depending on the type of classifier used.

I'm curious whether this task is different from determining whether a sequence of 2-50 amino acids is a bioactive peptide, or whether one must first know that the sequence is a peptide before applying these tools to categorize it into a functional class.

Classification of bIoaCtive pEptides fRom micrObial fermeNtation

One question that this name, and the abstract in general, left me with is whether this method is extensible beyond microbial fermentation peptides. I will continue to read to find out, but I'm wondering if another sentence might be added to clarify this.

I'm also curious if microbial fermentation peptides include all known classes of peptides, or if there are some functional classifications that might not be labelled by CICERON because CICERON has not seen them before. Again I will continue reading to hopefully find out!

Version published to 10.1101/2023.10.04.560809 on bioRxiv
Oct 6, 2023