A Multi-Label Cascade Flexible Neural Forest Model for Predicting the Subcellular Location of Multi-site Bacterial Proteins
Abstract
The study of the subcellular localization of multi-site proteins provides an important reference for understanding disease pathogenesis, drug design, and disease prevention. At present, the multi-label learning models used for the subcellular localization of multi-site proteins frequently suffer from low prediction accuracy and cannot accurately localize protein sequences with low similarity. In this paper, a multi-label cascade flexible neural forest (MLCFN Forest) model is proposed for the subcellular localization of multi-site proteins. The model retains as much inter-label correlation as possible by applying a "coding-classification-decoding" procedure to protein labels. It uses the flexible neural tree (FNT) as the base learner, which automatically determines its own network structure during training. Introducing the "FNT Group" overcomes the single-output limitation of the FNT. Finally, the model uses a hierarchical processing framework that widens layer by layer, which improves prediction performance while avoiding, as far as possible, wasted model structure and computation. Experiments on Gram-negative and Gram-positive bacteria data sets, together with tests in a low-dimensional sample space, show that the proposed model can effectively improve the prediction accuracy of multi-site protein subcellular localization.
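The abstract does not specify how the "coding-classification-decoding" step is carried out. As a rough illustration only, the sketch below assumes a label-powerset style scheme: each distinct combination of subcellular locations is coded as one class, a classifier predicts a class, and the class is decoded back into a multi-label vector, so labels that co-occur stay together. The function name `build_codebook`, the toy label vectors, and the hard-coded predictions are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

LabelVector = Tuple[int, ...]  # one 0/1 flag per subcellular location


def build_codebook(
    label_vectors: List[LabelVector],
) -> Tuple[Dict[LabelVector, int], Dict[int, LabelVector]]:
    """Assign one integer code to each distinct label combination.

    Assumption: a label-powerset coding; the paper's actual scheme may differ.
    """
    encode: Dict[LabelVector, int] = {}
    for vec in label_vectors:
        if vec not in encode:
            encode[vec] = len(encode)  # codes follow first-seen order
    decode = {code: vec for vec, code in encode.items()}
    return encode, decode


# Hypothetical toy data: three candidate locations, four training proteins.
train_labels: List[LabelVector] = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)]

# Coding: each multi-label vector becomes a single class index.
encode, decode = build_codebook(train_labels)
codes = [encode[vec] for vec in train_labels]

# Classification: any single-output classifier would be trained on `codes`.
# Here its predictions are stubbed out with fixed class indices.
predicted_codes = [2, 0]

# Decoding: class indices are mapped back to multi-label vectors.
decoded = [decode[code] for code in predicted_codes]
```

In this sketch a protein found at two sites, such as `(1, 1, 0)`, is treated as its own class rather than as two independent binary decisions, which is one simple way to keep inter-label correlation intact.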