Integrated ambient modeling and genetic demultiplexing of single-cell RNA+ATAC multiome experiments with Ambimux
Abstract
Single-cell technologies have advanced at a rapid pace, providing assays for various molecular phenotypes. Droplet-based single-cell technologies, particularly those based on nuclei isolation, such as simultaneous RNA+ATAC single-cell multiome, are susceptible to exogenous ambient molecule contamination, which can increase noise in cell type-level associations. We reasoned that genotype-based sample multiplexing provides an opportunity to infer this ambient contamination by leveraging DNA variation in sequenced reads. We therefore developed ambimux, a likelihood-based method that estimates ambient fractions and demultiplexes single-cell multiome experiments using genotype-level data. Ambimux models the probability that each read is ambient or nuclear in origin, and can thus classify empty droplets and estimate droplet-specific ambient molecule fractions in each modality. We first evaluated our method on simulated data sets across a range of parameters. Ambimux closely estimated the ground-truth droplet contamination fractions in the RNA (MAE=0.048) and ATAC (MAE=0.042) modalities. As a result, ambimux maintained high specificity (>95%) and correctly assigned singlets at high ambient fractions (up to 60%) in both the RNA and ATAC modalities. By comparison, models that do not consider ambient contamination maintained similar sensitivity levels only at considerably lower ambient fractions (up to 25%). We then generated a real data set of seven visceral adipose tissue biopsies run on a single 10x Multiome channel. Running ambimux on this data set detected 4,986 singlets, a number similar to that captured by other methods.
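The read-level idea described above can be illustrated with a minimal sketch. This is not ambimux's actual implementation. It treats each variant-overlapping read in a droplet as drawn from a two-component mixture (the ambient pool versus the nucleus of the assigned sample) and estimates the droplet's ambient fraction by expectation-maximization. The function names, the error floor `eps`, and the input layout are all assumptions made for this example.

```python
# Hypothetical sketch of read-level ambient fraction estimation.
# Each read is a tuple: (allele, p_alt_nuclear, p_alt_ambient)
#   allele         : 1 if the read carries the alternate allele, else 0
#   p_alt_nuclear  : P(alt) under the assigned sample's genotype (0, 0.5, or 1)
#   p_alt_ambient  : P(alt) in the ambient pool (e.g. pooled allele frequency)


def clip(p, eps=0.01):
    """Keep probabilities away from 0 and 1 to allow for sequencing error."""
    return min(max(p, eps), 1.0 - eps)


def read_likelihood(allele, p_alt):
    """Bernoulli likelihood of observing `allele` given P(alt) = p_alt."""
    return p_alt if allele == 1 else 1.0 - p_alt


def estimate_ambient_fraction(reads, n_iter=100):
    """EM estimate of the mixture weight of the ambient component."""
    if not reads:
        raise ValueError("need at least one read")
    alpha = 0.5  # initial guess for the ambient fraction
    for _ in range(n_iter):
        resp_sum = 0.0
        for allele, p_nuc, p_amb in reads:
            # E-step: posterior probability that this read is ambient
            l_amb = alpha * read_likelihood(allele, clip(p_amb))
            l_nuc = (1.0 - alpha) * read_likelihood(allele, clip(p_nuc))
            resp_sum += l_amb / (l_amb + l_nuc)
        # M-step: the new weight is the mean responsibility
        alpha = resp_sum / len(reads)
    return alpha


def mean_absolute_error(estimates, truths):
    """MAE between estimated and true ambient fractions across droplets."""
    if len(estimates) != len(truths):
        raise ValueError("inputs must have the same length")
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(estimates)
```

In a simulation, `estimate_ambient_fraction` would be called once per droplet and modality, and `mean_absolute_error` would compare the resulting estimates against the simulated ground-truth fractions.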
Next, we evaluated the fidelity of the ambient fraction estimates from ambimux. We split singlets into ambient-enriched (>5% contamination in both modalities) or nuclear-enriched (<5% in both) droplets and performed gene-peak linkage analysis. Relative to ambient-enriched droplets, nuclear-enriched droplets yielded more significant hits, with gene-peak links enriched at the transcription start site, suggesting that the ambient-enriched droplets identified by ambimux hamper the identification of biologically meaningful signals. In summary, we developed a joint single-cell multiome demultiplexing method, ambimux, that accurately models and estimates ambient molecule contamination in each modality.
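The singlet split used for the gene-peak linkage comparison can be sketched as follows. This is a hypothetical illustration: the `Droplet` record, its field names, and the function name are assumptions, and the abstract does not say how droplets above the threshold in only one modality were handled, so this sketch simply sets them aside in a third group.

```python
from dataclasses import dataclass


@dataclass
class Droplet:
    barcode: str          # hypothetical droplet identifier
    rna_ambient: float    # estimated ambient fraction, RNA modality
    atac_ambient: float   # estimated ambient fraction, ATAC modality


def split_by_ambient(droplets, threshold=0.05):
    """Split singlets into ambient-enriched, nuclear-enriched, and discordant."""
    ambient, nuclear, discordant = [], [], []
    for d in droplets:
        if d.rna_ambient > threshold and d.atac_ambient > threshold:
            ambient.append(d)      # >5% contamination in both modalities
        elif d.rna_ambient < threshold and d.atac_ambient < threshold:
            nuclear.append(d)      # <5% contamination in both modalities
        else:
            discordant.append(d)   # assumption: excluded from both groups
    return ambient, nuclear, discordant
```

Each of the two main groups would then be passed separately to the gene-peak linkage analysis so that the number and location of significant links can be compared.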