Optimizing Model Performance and Interpretability: an application to biological data classification

Zhenyu Huang
Yangkun Cao
Qiufen Chen
Bocheng Shi
Yuqing Li
Gangyi Xiao
Xuechen Mu
Ying Xu

Abstract

In biological data classification, high accuracy and interpretable results are both desirable, yet they are difficult to achieve simultaneously. We present a framework for transcriptomic data-based classification that accomplishes both. The framework has four steps: 1) identify metabolic pathways whose expression has strong discerning power in separating samples with distinct labels, which provides a basis for interpreting the classification results; 2) from these pathways, select those whose expression variance is largely captured by the first principal component of the pathway's gene-expression matrix, which allows a minimal number of discerning pathways to be selected; 3) for each selected pathway, select a minimal set of genes whose collective discerning power covers 95% of the pathway's discerning power, giving rise to the set of features (genes) used for classification; and 4) select, among the available models, the model and model parameters that give the optimal classification results. We have demonstrated the effectiveness of this framework on two cancer biology problems. We anticipate that this framework will be used to select features, models, and model parameters for a wide range of biological data classification problems.
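Steps 2 and 3 of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_ratio` cutoff, and the assumption that a pathway's discerning power is a sum of non-negative per-gene scores are all hypothetical, since the abstract does not define how discerning power is measured or what threshold is applied to the first principal component.

```python
import numpy as np


def first_pc_variance_ratio(expr):
    """Fraction of total variance captured by the first principal component.

    expr: samples x genes expression matrix for one pathway.
    """
    centered = expr - expr.mean(axis=0)              # center each gene
    s = np.linalg.svd(centered, compute_uv=False)    # singular values, descending
    var = s ** 2                                     # proportional to PC variances
    total = var.sum()
    return float(var[0] / total) if total > 0 else 0.0


def select_pathways(pathway_expr, min_ratio=0.5):
    """Keep pathways whose first PC captures at least min_ratio of the variance.

    min_ratio is a hypothetical cutoff; the abstract does not state one.
    """
    return [name for name, expr in pathway_expr.items()
            if first_pc_variance_ratio(expr) >= min_ratio]


def minimal_gene_set(gene_scores, coverage=0.95):
    """Greedily add top-scoring genes until `coverage` of the total is reached.

    Assumes non-negative, additive per-gene scores (a simplifying assumption).
    """
    ranked = sorted(gene_scores.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(gene_scores.values())
    chosen, running = [], 0.0
    for gene, score in ranked:
        if running >= coverage * total:
            break
        chosen.append(gene)
        running += score
    return chosen
```

Under these assumptions, `select_pathways` would be applied to the pathways that pass step 1, and `minimal_gene_set` to each surviving pathway with `coverage=0.95`, matching the 95% figure in the abstract. The union of the returned genes would then form the feature set passed to model selection in step 4.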

Version published to 10.21203/rs.3.rs-4646752/v1 on Research Square
Jul 22, 2024

Related articles

Classification of Bio-Data with Interval Dissimilarities: A Multidimensional Scaling Framework

This article has 4 authors:
1. Md. Anwarul Islam Bhuiyan
2. Sohana Jahan
3. Md. Babul Hasan
4. Md. Maruf Hossain
This article has no evaluations. Latest version: Jan 21, 2026

Integrating Evolutionary and Compositional Features with ML and DL for Robust and Interpretable Druggable Protein Prediction

This article has 5 authors:
1. Mujeebu Rehman
2. Qinghua Liu
3. Muhammad Javed
4. Ali Ghulam
5. Teerath Kumar
This article has no evaluations. Latest version: Dec 11, 2025

Pathway-Centric Global Expression Profiling Reveals Key Molecular Drivers in Hepatocellular Carcinoma

This article has 2 authors:
1. Raghavendra Krishnappa
2. Kanthesh M B
This article has no evaluations. Latest version: Jan 8, 2026
