Resolution Tradeoffs in Modularity Clustering with Application to Single Cell RNA-seq


Abstract

Modularity-based clustering was introduced in the network literature for community detection and is now commonly applied to single-cell RNA-seq (scRNAseq) datasets for cell type identification. Modularity clustering depends on a resolution parameter, which implicitly determines the number of clusters inferred, but no theory exists describing the clustering as a function of the resolution. For scRNAseq, an improperly chosen resolution parameter can lead to erroneous or missed cell types.
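
For reference, one widely used form of modularity with a resolution parameter is the Reichardt-Bornholdt generalization written below. That the article works with exactly this form is an assumption, and the symbols are introduced here only for illustration.

```latex
Q(\gamma) = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
```

Here A is the adjacency matrix, k_i is the degree of node i, m is the number of edges, c_i is the cluster of node i, and γ is the resolution parameter. Setting γ = 1 recovers the classical Newman-Girvan modularity, and larger γ favors more, smaller clusters.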

In this work, we provide an explicit description of clustering as a function of the resolution parameter through the notion of a splitting resolution, the minimum resolution at which a graph or subgraph is split into multiple clusters. We show that the splitting resolution of a subgraph is inversely proportional to the frequency of the subgraph within the graph. This result extends the resolution limit result of Fortunato and Barthelemy to the setting of a general resolution parameter value.
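
To make the idea of a resolution threshold concrete, here is a minimal sketch assuming the Reichardt-Bornholdt form of modularity, Q(γ) = (internal edges)/m − γ Σ_c (D_c / 2m)², where D_c is the total degree of cluster c. Under that assumption, a fixed two-way split beats the single-cluster solution exactly when γ > 2m·cut / (D_a·D_b). The function names and the toy graph are hypothetical and are not taken from the article.

```python
def modularity(edges, labels, gamma):
    """Reichardt-Bornholdt modularity of an unweighted, undirected graph."""
    m = len(edges)
    internal = sum(1 for u, v in edges if labels[u] == labels[v])
    degree_sum = {}  # total degree per cluster
    for u, v in edges:
        for node in (u, v):
            degree_sum[labels[node]] = degree_sum.get(labels[node], 0) + 1
    penalty = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return internal / m - gamma * penalty


def bipartition_threshold(edges, labels):
    """Resolution above which the two-way split in `labels` (values 0/1)
    has higher modularity than a single cluster (illustrative helper)."""
    m = len(edges)
    cut = sum(1 for u, v in edges if labels[u] != labels[v])
    d_a = sum((labels[u] == 0) + (labels[v] == 0) for u, v in edges)
    d_b = 2 * m - d_a
    return 2 * m * cut / (d_a * d_b)


# Toy graph (an assumption for illustration): two triangles joined by one edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
ONE = {n: 0 for n in range(6)}
SPLIT = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

gamma_star = bipartition_threshold(EDGES, SPLIT)
for gamma in (gamma_star - 0.05, gamma_star + 0.05):
    q_one = modularity(EDGES, ONE, gamma)
    q_split = modularity(EDGES, SPLIT, gamma)
    print(f"gamma={gamma:.3f}  one cluster={q_one:.3f}  split={q_split:.3f}")
```

For this toy graph the threshold works out to 2/7. Note that the sketch evaluates only one fixed bipartition; the splitting resolution as defined above is a minimum over all ways of splitting, which this code does not search.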

In the network literature, the starting point for modularity clustering is a graph, but in scRNAseq applications the starting point is a cell embedding used to form a graph. We show that cell embeddings in scRNAseq can be approximated by Gaussian mixtures. We then study splitting resolutions of k-nearest neighbor graphs formed from cell embeddings drawn from a single normal distribution or from a pair of isotropic normal distributions. For such graphs, we derive formulas for the splitting resolution as a function of sample size, embedding dimension, and the covariance structure of the normals. We use our results to provide specific examples of the type I versus type II error tradeoffs implicit in the choice of the resolution parameter.
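
As a rough illustration of this setting, the sketch below draws points from a pair of isotropic normals, forms a k-nearest neighbor graph, and evaluates the resolution above which splitting by true component beats a single cluster. It assumes the Reichardt-Bornholdt form of modularity, for which that threshold is 2m·cut / (D_0·D_1). The function names, the symmetrization rule, and all parameter values are assumptions chosen for illustration; the code does not reproduce the article's formulas.

```python
import math
import random


def sample_two_normals(n_per, dim, separation, rng):
    """Draw n_per points from N(0, I) and n_per from N(separation * e_1, I)."""
    points, labels = [], []
    for label in (0, 1):
        for _ in range(n_per):
            p = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            p[0] += label * separation  # shift the second component along axis 0
            points.append(p)
            labels.append(label)
    return points, labels


def knn_edges(points, k):
    """Undirected kNN graph: i and j are joined if either is among the
    other's k nearest neighbors (one possible symmetrization)."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def bipartition_threshold(edges, labels):
    """Resolution above which the split by `labels` (values 0/1) beats one cluster."""
    m = len(edges)
    cut = sum(1 for u, v in edges if labels[u] != labels[v])
    d0 = sum((labels[u] == 0) + (labels[v] == 0) for u, v in edges)
    d1 = 2 * m - d0
    return 2 * m * cut / (d0 * d1)


rng = random.Random(0)
thresholds = {}
for separation in (1.0, 3.0, 6.0):
    points, labels = sample_two_normals(n_per=100, dim=5, separation=separation, rng=rng)
    edges = knn_edges(points, k=10)
    thresholds[separation] = bipartition_threshold(edges, labels)
    print(f"separation={separation}: threshold resolution={thresholds[separation]:.4f}")
```

One would expect the threshold to fall as the separation between the two means grows, since fewer nearest-neighbor edges cross between the components. A resolution set below the threshold merges the two populations, which is the kind of missed cell type the abstract warns about.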
