scMILD: Single-cell Multiple Instance Learning for Sample Classification and Associated Subpopulation Discovery

Abstract

Linking cellular states to clinical phenotypes is a major challenge in single-cell analysis. Here, we present scMILD, a weakly supervised Multiple Instance Learning framework that robustly identifies condition-associated cells using only sample-level labels. After systematically validating scMILD’s accuracy through controlled simulations, we applied it to diverse disease datasets, confirming its ability to retrieve known biological signatures. Building on this validation, our sample-informed analysis of scMILD-identified monocytes in COVID-19 revealed a temporal transition from an early antiviral state to a late stress-response state. Furthermore, in a novel cross-disease application, a model trained on COVID-19 successfully stratified lupus patients and distinguished shared inflammatory states from disease-specific ones. scMILD thus provides a validated and versatile strategy to dissect cellular heterogeneity, bridging single-cell observations with high-level phenotypes.
