ASPEN: Robust detection of allelic dynamics in single cell RNA-seq
Abstract
Single-cell RNA-seq data from F1 hybrids provides a unique framework for dissecting complex regulatory phenomena, but allelic measurements are limited by technical noise. Here, we present ASPEN, a statistical method for modeling allelic mean and variance in single-cell transcriptomic data from F1 hybrids. ASPEN uses a sensitive mapping pipeline and adaptive shrinkage to distinguish allelic imbalance and variance in single cells. Through extensive simulation based on sparse droplet-based single-cell data, ASPEN demonstrates improved sensitivity and control of false discoveries compared to existing approaches. Applied to mouse brain organoids and T cells, ASPEN identifies genes with incomplete X inactivation, stochastic monoallelic expression, and significant deviations in allelic variance. This reveals reduced variance in essential cellular pathways, and increased variance in neurodevelopmental and immune-specific genes.
Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
Learn more at Review Commons
Reply to the reviewers
The authors do not wish to provide a response at this time
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #3
Evidence, reproducibility and clarity
Summary: The authors present ASPEN, a tool for allelic imbalance estimation in haplotype-resolved single-cell RNA-seq data. Besides the mean of the allelic ratio, ASPEN assesses its under- and overdispersion and performs group-level comparisons. Dr. Wong and colleagues applied ASPEN to simulated and publicly available single-cell data from mouse brain organoids and T cells. They showed the general applicability of the tool to this type of data, compared it with scDALI in terms of statistical power, drew numerous conclusions regarding the allele-specific regulation of housekeeping and cell-specific gene expression in general and during cell differentiation, and identified examples of X inactivation, imprinting and random monoallelic expression.
Major comments:
- Regarding biological insights, the authors focus on genes whose allelic imbalance variance is lower than expected given the gene expression level, and find them enriched for processes essential for cell integrity. I am curious whether the variation depends on the number of available cells as well, i.e. housekeeping genes may be more stably expressed from cell to cell. In this context, the authors could compare their results with the stably expressed genes from Lin et al. [https://doi.org/10.1093/gigascience/giz106].
- Continuing with the concerns regarding gene expression level changes, the authors do not provide information about the differential expression of their findings. Even where they state "F1 hybrids revealed 33 genes with significant changes in mean allelic expression and 193 with dynamic variance, independent of total expression changes (Supp. Fig. 3B; Supp. Table 4)" in "Allelic variance reveals transcriptional plasticity across cell states", I could not find the relevant information in the corresponding figure and supplementary table. Furthermore, it has been shown that a low number of cells and a low gene expression level can affect allelic imbalance estimates as well as lead to false-positive random monoallelic expression [https://doi.org/10.1371/journal.pcbi.1008772]. The authors acknowledge this but do not properly discuss how it relates to their RME examples. Are these genes lowly expressed and/or detected in a limited number of cells?
- The histogram provided in Figure 5C suggests a general RME preference towards the maternal (C57BL/6J) haplotype. Could this be caused by reference mapping bias? The authors attribute the overall shifts of the null allelic mean, 0.52 for T cells and 0.54 for brain organoids, to reference mapping bias. However, using parental genomes should have eliminated this problem unless a substantial fraction of individual variants was missed due to the strict quality filters.
- Among the genes demonstrating dynamic allelic imbalance variance during early neurogenesis, the authors found several examples involved in autism spectrum disorders and neuroanatomical phenotypes in mice. They suggest the temporal modulation of variance as a possible regulatory mechanism which may be perturbed in disease states. However, it is hard to assess the significance of this finding without any enrichment tests. How many disease-relevant genes among those with dynamic variance can be expected by chance?
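The "expected by chance" question in the last major comment can be framed as a hypergeometric (one-sided Fisher) enrichment test. The following is a minimal Python sketch under stated assumptions, not the authors' analysis: only the count of 193 genes with dynamic variance comes from the quoted text, while the background size, the number of disease-relevant genes and the overlap are hypothetical placeholders.

```python
from math import comb


def hypergeom_sf(k, n_total, n_marked, n_drawn):
    """P(X >= k) when n_drawn genes are sampled without replacement
    from n_total genes, of which n_marked are disease-relevant."""
    upper = min(n_marked, n_drawn)
    total = comb(n_total, n_drawn)
    tail = sum(
        comb(n_marked, i) * comb(n_total - n_marked, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


n_dynamic = 193        # genes with dynamic variance (from the quoted text)
n_background = 10_000  # hypothetical: number of genes tested
n_disease = 500        # hypothetical: disease-relevant genes in the background
n_overlap = 15         # hypothetical: disease-relevant genes among the 193

# Overlap expected by chance and the one-sided enrichment p-value.
expected = n_dynamic * n_disease / n_background
p_value = hypergeom_sf(n_overlap, n_background, n_disease, n_dynamic)
print(f"expected by chance: {expected:.2f}, observed: {n_overlap}, p = {p_value:.3g}")
```

The background should be the set of genes that passed the same expression filters and were actually tested, since a genome-wide background would likely overstate the enrichment.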
Minor comments:
- Methods would definitely benefit from proofreading, e.g. there are mistakes in the beta-binomial distribution formula, the log-transformed gene-level dispersion distribution (it does not follow N(0,1) with zero mean) and the gamma likelihood function. Is rho a shape parameter instead of a rate? Specifically, I suggest describing the equations from the "Bayesian shrinkage implementation" section in more detail. Why does the formula for corrected theta provided in the article deviate from the one presented on GitHub https://github.com/ewonglab/ASPEN/blob/main/R/allelic_imbalance.R, i.e. "thetaCorrected = N/(N-K) * (theta + theta_smoothed * (delta/(N-K)))/(1 + (delta/(N-K)))" where K = 1, instead of "thetaCorrected = (N-1)/N * (theta + theta_smoothed * delta)/(1 + delta)"? Both gamma and rho also deviate from the script as far as I understood. Moreover, a few steps from the Methods remained unclear to me. First, does ASPEN apply a fixed theta threshold (i.e. 0.001 from the manual or 0.005 from the article) or perform a more sophisticated MAD-based procedure? Does ASPEN obtain the stabilized thetas using N = 20 and theta = 10, followed by ML to correct both parameters and recalculate the posterior dispersion? Why do the tests for static and dynamic allelic variance use different gene-level thetas, stabilized and non-stabilized ones? Does this affect the sensitivity and specificity of the group-level analysis?
- Besides the formulas, there are minor mistakes throughout the text as well. For example, I assume the sentence "In the dyn-mean test, the dispersion parameter (set to the stabilized group-level value)" from the "Detecting dynamic changes" section should refer to the global dispersion, not the one estimated at the group level. In the section "Allelic variance reveals transcriptional plasticity across cell states", an FDR threshold of 0.5 is mentioned instead of 0.05. Figure captions also contain minor mistakes, such as "Genes below the dashed line were excluded from the trend modelling" in Figure 4, which corresponds to panel B instead of C.
- Why does Figure 5B contain missing allelic ratio estimations? If it is due to the expression filters, please mention it in the caption.
- Given the principles of the dynamic tests, I would suggest calling them "differential", "ANOVA-like" or "group-level" instead of dynamic, since there is no actual possibility to account for the continuous changes over time.
- The example of differential variance from Figure 6D is not very clear to me and Supplementary Figure 5C does not help. I suggest adding histograms to emphasize changes in the allelic imbalance variation.
- The authors managed to uniquely map and unambiguously assign 20-38% of total reads. The weighted allocation procedure from Choi et al. [https://doi.org/10.1038/s41467-019-13099-0] might help to increase the total coverage.
- The discrete low dispersion values in Figures 2, 3A, 4B, 5C and 6A possibly stem from rounding to 4 decimal places. I suggest increasing the accuracy to improve the visual clarity.
- The sentence "Of these, 27 were X-linked, consistent with random X-inactivation dynamics in female cells, and five (Bex2, Ndufb11, Pcsk1n, Sh3bgrl, Uba1) displayed signatures of incomplete X inactivation, by demonstrating largely monoallelic expression in each cell" in the "Monoallelic expression reveals regulatory complexity" section should be rephrased to reflect the proportion of cells demonstrating both alleles expressed.
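To make the discrepancy between the two corrected-theta formulas raised in the minor comments concrete, the Python sketch below evaluates both as quoted in this report, reading "theta_smoothed(...)" and "theta_smootheddelta" as multiplications. The function names and all input values are hypothetical, chosen only for illustration, and neither function is taken from the ASPEN source.

```python
def theta_corrected_github(theta, theta_smoothed, n, delta, k=1):
    """Form quoted in the report as the GitHub version (K = 1)."""
    w = delta / (n - k)
    return n / (n - k) * (theta + theta_smoothed * w) / (1 + w)


def theta_corrected_article(theta, theta_smoothed, n, delta):
    """Form quoted in the report as the article version."""
    return (n - 1) / n * (theta + theta_smoothed * delta) / (1 + delta)


# Hypothetical inputs: raw dispersion, smoothed trend value, N and delta.
theta, theta_smoothed, n, delta = 0.05, 0.02, 20, 10

github_value = theta_corrected_github(theta, theta_smoothed, n, delta)
article_value = theta_corrected_article(theta, theta_smoothed, n, delta)
print(f"GitHub form:  {github_value:.5f}")
print(f"Article form: {article_value:.5f}")
```

For these inputs the two forms give different values: the quoted GitHub form scales delta by 1/(N-K) and so shrinks less towards the smoothed value. This is why stating which form is actually used matters.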
Significance
Allele-specific gene expression analysis of single-cell RNA-seq data is now widely used to study allele-specific bursting [https://doi.org/10.1186/s13059-017-1200-8], imprinting, X chromosome inactivation [https://doi.org/10.1038/s42003-022-03087-4] and other processes [https://doi.org/10.1016/j.tig.2024.07.003].
- My field of expertise mostly includes bioinformatic analysis of allele-specific expression and gene regulation using bulk sequencing data. However, to the best of my knowledge, there are three publicly available modern solutions for assessing allelic imbalance using single-cell gene expression data: scDALI published in January 2022 [https://doi.org/10.1186/s13059-021-02593-8], Airpart published in May 2022 [https://doi.org/10.1093/bioinformatics/btac212] and DAESC published in 2023 [https://doi.org/10.1038/s41467-023-42016-9], with the latter not being mentioned by the authors.
- While the authors used simulations to compare ASPEN to scDALI-Hom in terms of sensitivity, I could not find any specificity estimates. The basis for the statement "ASPEN demonstrated high sensitivity (98%) and specificity (92%) with a low false positive rate (<12%), confirming its capacity to distinguish distinct modes of regulatory variation during lineage differentiation (Fig. 4G)" is also unclear to me, since Figure 4G only shows the true positive rate in test and control simulations. Should the FPR not be equal to 1 - specificity?
- Moreover, I suggest that the authors compare ASPEN to Airpart and DAESC along with scDALI, as this can highlight the scenarios where ASPEN is the best or the only option. All these tools can estimate either heterogeneous (scDALI-Het) or dynamic (Airpart, DAESC) allelic imbalance, which can be compared to the allelic variance and group-level tests, respectively.
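The relation between the false positive rate and specificity mentioned above follows directly from the confusion-matrix definitions. The Python sketch below uses hypothetical counts, chosen only so that sensitivity and specificity match the quoted 98% and 92%; they are not taken from the manuscript's simulations.

```python
def rates(tp, fp, tn, fn):
    """Sensitivity, specificity and false positive rate from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    fpr = fp / (fp + tn)          # equals 1 - specificity by definition
    return sensitivity, specificity, fpr


# Hypothetical counts: 100 simulated positives and 100 simulated negatives.
sens, spec, fpr = rates(tp=98, fp=8, tn=92, fn=2)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, FPR = {fpr:.2f}")
```

Under these definitions a specificity of 92% corresponds to an FPR of 8%, so the "<12%" figure in the quoted statement presumably refers to a different quantity or a different set of simulations.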
Referee #2
Evidence, reproducibility and clarity
The authors introduce ASPEN (Allele-Specific Parameter Estimation in scRNA-seq), a statistical framework designed to model cis-regulatory variation in single-cell RNA-sequencing data, and demonstrate that ASPEN effectively detects cell state-specific allelic imbalances. Using simulated datasets, the authors show that ASPEN outperforms existing methods (e.g., scDALI) in both sensitivity and specificity. Furthermore, they demonstrate that ASPEN can be used to further dissect allelic imbalance, enabling the identification of random monoallelic expression (RME), gene expression pulsing, and dynamic regulatory shifts.
My main concerns are:
- Framework similarity with scDALI: The ASPEN framework shares many conceptual similarities with scDALI. It is not clear why ASPEN significantly outperforms scDALI. The authors should elaborate more clearly on the differences between the two approaches and provide a detailed explanation for the observed improvements.
- Scalability and runtime: The manuscript does not report computational performance metrics (e.g., runtime, memory usage), which would be important for users planning to apply ASPEN to large-scale datasets.
- Comparison to additional tools: While the comparison to scDALI is appropriate, including benchmarking against other recent allele-specific methods (e.g., SCALE, AirPart) would strengthen the evaluation and broaden its relevance.
- User guidance: A figure or supplementary table summarizing required inputs, preprocessing steps, recommended parameters, and filtering strategies would be highly beneficial for potential users.
- Time-series smoothing: The manuscript would benefit from a clearer explanation of how time-series smoothing is implemented within ASPEN, particularly in dynamic cell state contexts.
Significance
The ASPEN framework is useful for identifying single-cell ASE and for related analyses, an area that is currently underdeveloped. It is timely, and the framework is rigorous, flexible and data-driven.
Referee #1
Evidence, reproducibility and clarity
This is an interesting paper, which introduces a new approach and software, ASPEN, for the analysis of allele-specific gene expression, and applies it to transcriptomes of F1 hybrids of mouse lines. The manuscript introduces an interesting statistical technique, which to my knowledge is correct, and brings about new biological results, identifying genes with systematically decreased or increased expression variance and a stable allelic expression ratio, which seems to be controlled by the regulatory machinery.
The manuscript has some shortcomings in presentation: it is written very concisely, especially in its methods part, and is somewhat difficult to follow.
I'm not sure that the authors make the correct claim in the manuscript. The title and the abstract say that the manuscript discusses cis-regulatory heterogeneity, but in fact there is very little in the manuscript about gene regulation per se. The study demonstrates that allele-specific expression is controlled by some as yet unknown mechanisms rather than being a product of technical noise, and then presents a number of examples of different pathways with increased and decreased allele-specific variance. The manuscript also presents several examples of shifts in the variance of particular genes over developmental time.
Yet the manuscript tells virtually nothing about regulation, so the conclusion that 'ASPEN enables the interrogation of cis regulatory effects on gene expression' is not justified in its literal terms; what ASPEN does is quantify allele-specific transcriptional activity in a single-cell transcriptomics experiment. Mechanistically, the observed effects could be explained by any regulatory effect, such as DNA methylation or chromatin structure. To prove that cis-regulatory effects are important here, the authors need to show the allele-specific nature of transcription factor binding (for instance by showing TF binding motifs destroyed or created by variants). It is more difficult to take chromatin effects into account without ATAC experiments, but ATAC-seq data may be available for the parental lines and may show differential DNA accessibility in the vicinity of the genes of interest. I think only with such mechanistic illustrations can one conclude that cis-regulatory interactions play a major role here.
As another option, the authors may publish the study as it is but with a changed title, abstract and discussion, formulating it in a more phenomenological way.
Minor note
In Figures 2-5 the low-variance genes are shown as dots lying on lines parallel to the x axis. This may be related to incorrect discretisation of the variance or to a low number of reads contributing to the variance. Please double check.
Significance
The paper introduces an interesting new statistical approach for quantifying allele-specific transcription from single-cell data, using a Bayesian shrinkage technique similar to that used in edgeR. The paper has clear biological meaning, demonstrating that there are genes with decreased variability in gene expression. I believe the paper draws attention to an interesting area and as such may be published.
