MLMarker: A machine learning framework for tissue inference and biomarker discovery
Abstract
MLMarker is a machine learning tool that computes continuous tissue similarity scores for proteomics data, addressing the challenge of interpreting complex or sparse datasets. Trained on 34 healthy tissues, its Random Forest model generates probabilistic predictions with SHAP-based protein-level explanations. A penalty factor corrects for missing proteins, improving robustness for low-coverage samples. Across three public datasets, MLMarker revealed brain-like signatures in cerebral melanoma metastases, achieved high accuracy in a pan-cancer cohort, and identified brain and pituitary origins in biofluids. MLMarker provides an interpretable framework for tissue inference and hypothesis generation, available as a Python package and Streamlit app.
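The sketch below illustrates the two mechanisms the abstract names. It is a minimal illustration under stated assumptions and is not MLMarker's code. `forest_probability` shows the standard way a Random Forest classifier produces a probabilistic prediction: it averages the class probabilities of its individual trees. `penalized_score` relies on SHAP additivity, where the base value plus all per-protein contributions equals the model output. The penalty rule itself is hypothetical: contributions from proteins absent in the sample are scaled by an assumed factor `penalty`. The abstract does not specify how MLMarker's penalty factor is defined. The function names, tissue labels, and all numbers are invented for illustration.

```python
from typing import Dict, List, Set


def forest_probability(tree_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-tree class probabilities into one ensemble prediction,
    which is how a Random Forest classifier yields probabilistic output."""
    n_trees = len(tree_probs)
    tissues = tree_probs[0].keys()
    return {t: sum(tp[t] for tp in tree_probs) / n_trees for t in tissues}


def penalized_score(
    base_value: float,
    shap_values: Dict[str, float],
    detected: Set[str],
    penalty: float = 0.5,
) -> float:
    """HYPOTHETICAL penalty rule (not MLMarker's published formula).

    Rebuild one tissue's score from its SHAP decomposition, scaling the
    contribution of every protein missing from the sample by `penalty`.
    With penalty=1.0 this reduces to plain SHAP additivity:
        score = base_value + sum of all per-protein contributions.
    """
    total = base_value
    for protein, contribution in shap_values.items():
        weight = 1.0 if protein in detected else penalty
        total += weight * contribution
    return total


# Toy inputs, invented for illustration only.
trees = [{"brain": 0.75, "liver": 0.25}, {"brain": 0.5, "liver": 0.5}]
print(forest_probability(trees))  # {'brain': 0.625, 'liver': 0.375}

brain_shap = {"GFAP": 0.5, "ALB": -0.25, "SNAP25": 0.125}
print(penalized_score(0.25, brain_shap, detected={"GFAP"}, penalty=1.0))  # 0.625
print(penalized_score(0.25, brain_shap, detected={"GFAP"}, penalty=0.5))  # 0.6875
```

With `penalty=1.0`, the score equals the unmodified model output. Lowering the factor reduces the influence of proteins that a low-coverage sample simply failed to detect, which is the kind of robustness the abstract attributes to its penalty factor.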