High-dimensional Biomarker Identification for Scalable and Interpretable Disease Prediction via Machine Learning Models

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Omics data generated from high-throughput technologies and clinical features jointly impact many complex human diseases. Identifying key biomarkers and clinical risk factors is essential for understanding disease mechanisms and advancing early disease diagnosis and precision medicine. However, the high-dimensionality and intricate associations between disease outcomes and omics profiles present significant analytical challenges. To address these, we propose an ensemble data-driven biomarker identification tool, Hybrid Feature Screening (HFS), to construct a candidate feature set for downstream advanced machine learning models. The pre-screened candidate features from HFS are further refined using a computationally efficient permutation-based feature importance test, forming the comprehensive High-dimensional Feature Importance Test (HiFIT) framework. Through extensive numerical simulations and real-world applications, we demonstrate HiFIT’s superior performance in both outcome prediction and feature importance identification. An R package implementing HiFIT is available on GitHub ( https://github.com/BZou-lab/HiFIT ).

Article activity feed