Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport (PILOT)

This article has been Reviewed by the following groups

Read the full article See related articles

Listed in

Log in to save this article

Abstract

Although clinical applications represent the next challenge in single-cell genomics and digital pathology, we still lack computational methods to analyze single-cell or pathomics data to find sample-level trajectories or clusters associated with diseases. This remains challenging as single-cell/pathomics data are multi-scale, i.e., a sample is represented by clusters of cells/structures, and samples cannot be easily compared with each other. Here we propose PatIent Level analysis with Optimal Transport (PILOT). PILOT uses optimal transport to compute the Wasserstein distance between two individual single-cell samples. This allows us to perform unsupervised analysis at the sample level and uncover trajectories or cellular clusters associated with disease progression. We evaluate PILOT and competing approaches in single-cell genomics or pathomics studies involving various human diseases with up to 600 samples/patients and millions of cells or tissue structures. Our results demonstrate that PILOT detects disease-associated samples from large and complex single-cell or pathomics data. Moreover, PILOT provides a statistical approach to find changes in cell populations, gene expression, and tissue structures related to the trajectories or clusters supporting interpretation of predictions.

Article activity feed

  1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

    Learn more at Review Commons


    Reply to the reviewers

    We have addressed all the queries and suggestions put forth by the reviewers. Major changes include:

    1. Expansion of PILOT Functionality and Analysis: We have substantially extended the functionality and analysis capabilities of PILOT, particularly in relation to sample clustering. This enhancement now encompasses the incorporation of statistical tests aimed at identifying cell types and genes associated with distinct patient groups. We applied this expanded feature in an exploratory analysis of sub-clusters within pancreas ductal adenocarcinoma data (PDAC).
    2. Clarification of Benchmarking Methods: We have provided clear elucidations of the methods employed for benchmarking PILOT alongside competing methodologies. Our benchmarking approach is notably comprehensive, encompassing twelve different datasets and evaluating four to five competing methods through statistical assessment across three problem domains: clustering, distance measurement, and trajectory estimation. The outcomes of these evaluations consistently demonstrate the superior performance of PILOT's Wasserstein metric across all three problem domains. It is noteworthy that previous studies have often limited their analyses to exploratory evaluations on individual datasets, lacking the level of comprehensive benchmarking undertaken in this study.
    3. Examination of Experimental Factors: We have conducted a thorough investigation into the impacts of batch correction, cluster/cell type resolution, and parameter choices used within the PILOT framework.
    4. Enhancement of Text Description: We have enhanced the textual descriptions to provide a high-level overview of the PILOT methodology, along with justifications for the methodological decisions made.
    5. Improvement of Code and GitHub Repository: To enhance accessibility and promote reproducibility, we have made improvements to the codebase and the associated GitHub repository.

    In summary, PILOT stands as a distinctive and all-encompassing framework. It holds the unique distinction of being the sole method offering comprehensive tools for both clustering and trajectory analysis of samples within multiscale single-cell and pathomics data. Moreover, it incorporates statistical methodologies for the interpretation of results. The effectiveness of these tools has been thoroughly validated through the most extensive benchmarking study performed to date on sample-level analysis. The versatility of PILOT is demonstrated through its successful application in exploratory analyses of three distinct datasets: elucidating trajectories in myocardial infarction single-cell RNA-seq data, uncovering trajectories within pathomics data from kidney IgNA patients, and facilitating the clustering of pancreas adenocarcinoma samples. We firmly believe that these contributions hold significant value for the fields of bioinformatics, single-cell genomics, and pathology.

    Reviewer #1 (Evidence, reproducibility and clarity):

    The paper describes a computational method, PILOT, that uses optimal transport to compute the Wasserstein distance between two individual single-cell samples. It uses PILOT to detect sample (patient) level trajectories and clusters associated with diseases. The method was applied separately to single-cell genomics data and to digital pathology data. The method was applied to several datasets and compared against other tools.

    Major comments:

    The paper is not easy to follow and should be improved considerably to make it readable and reproducible. Consequently, I was not convinced that the PILOT method is much better than other methods.

    We extend our appreciation to the reviewer for their valuable suggestion. We have further refined the manuscript by incorporating a comprehensive and high-level description of our method. This expansion encompasses methodological justifications and clarifications to enhance the overall clarity. Additionally, we wish to emphasize that, to the best of our knowledge, our benchmarking analysis stands as the most comprehensive within the current literature. The results of this analysis unequivocally demonstrate that PILOT surpasses all competing methods in at least one of the various computational analysis tasks, namely clustering, trajectory estimation, and distance evaluation.

    Furthermore, we have undertaken significant enhancements in the codebase of PILOT, coupled with a reorganization of the associated GitHub repository. This effort includes the development of in-depth and improved tutorials that faithfully replicate the analyses conducted on datasets related to myocardial infarction, pancreas adenocarcinoma, and pancreas pathomics (https://pilot.readthedocs.io/en/latest/). This changes guarantee the reproducibility of the PILOT framework.

    See below for specific changes and additional clarifications.

    At first read of the title and abstract, I got the impression that the method analyzes single cell and pathomics data concurrently rather than separately. This should be fixed.

    We have changed the text of the abstract and introduction to make clear that PILOT is either applied to single cell or pathomics data independently.

    The usage of Wasserstein distance to compute distance between single-cell samples is elegant and is the main strength of this study. Given that PILOT is the main achievement, it should be described more carefully and in a detailed manner.

    For example, in the first Results paragraph, "The test indicates for features explaining the predicted pseudotime by fitting either linear or quadratic models" - I could not understand this sentence. Also, which test do the authors refer to? A few sentences down, there is a reference to a Wald test, is that it?

    PILOT has three major parts: (1) a method for measuring distance of samples with optimal transport; (2) an patient level unsupervised analysis part (clustering or trajectory analysis) and (3) a part for explaining predicted trajectories/clustering. The sentence mentioned before, refers to the interpretation approach after trajectory analysis. Here, we fit linear, quadratic or linear quadratic models to find association of predicted sample pseudo-time with data features (gene expression values in scRNA or morphological features in pathomics data). This fit can be done for all cells in the data or only for cells from a specific type. In the case of a cell specific fit, we use a Wald test to check if the cell type fit differs from all other cell types in the data, i.e. the gene is associated with the trajectory and the expression changes are specific to the cluster at hand.

    While these details were found in the method section, we agree with the referee that they can be better introduced in the main manuscript. We have therefore improved the first subsection of the results and Figure 1 to reflect this.

    One of the key aspects of the Wasserstein distance is the cost metric. The determination of the cost metric should be detailed as part of the Results. Have the authors considered and estimated other ways to define the distance?

    This is an interesting question. Currently, PILOT uses the Cosine metric. In our revision, we evaluate other metrics (Euclidean, Manhattan, and Chebyshev). This benchmarking indicates that the Cosine and Manhattan performed best regarding the clustering problem (ARI), while Cosine was better than all metrics for the Silhouette statistic; and Cosine and Euclidean performed best regarding AUPR. Therefore, we adopt the Cosine metric in the paper. We include these results in the revised manuscript and in Sup. Fig. 5F-H.

    Figure 1 provides a schematic view of PILOT. However, there is no explanation of the notation, which makes it confusing rather than helpful. Also, what is the relationship between J and j, if any?

    We understand that the figure 1 was problematic, as it did not introduce the formulation. We have now improved the first sub-section of the results page and figure 1 to improve this.

    The motivation and usage of adjusted Rand index (ARI) and Friedman-Nemenyi tests should be provided. Currently, they are not clear, including why those tests are suitable in the cases shown.

    The adjusted Rand index is a well known metric to evaluate clustering results when labels are known. Among others this metric has many interesting features as it does not require an association of clusters with class labels. Moreover, it has a correction for random clustering solutions, therefore values lower than zero indicate poor solutions and values of 1 a perfect solution.

    The Friedman-Nemenyi test allows us to compare the performance of several algorithms whenever evaluated in the same data sets. Here, the null hypothesis is that all algorithms have the same performance (same ARI statistic). The test is nonparametric and is based on the rank of the algorithm at each data set. This is important, as ARI values (or any other evaluation statistic) are data set specific, e.g. some clustering problems are more difficult than others. By evaluating the rank, the test indicates which methods perform relatively better than others. Moreover, it follows a rigorous statistical framework including correction for multiple testing. This test has an increasing adoption in the machine learning community (Demsar et al., JMLR, https://jmlr.org/papers/v7/demsar06a.html).

    We have added phrases with these justifications in the main text (subsection Evaluation of patient-level clustering and trajectory analysis) and included a new section in the materials and methods with more information in the experimental design of the benchmarking analysis.

    Fig. 2 the use of method colors should be constant across panels.

    We have changed the colors of panels in figure 2A-C (and equivalent panels everywhere else) to avoid confusions.

    The proportions method works at least as well as PILOT in 2B and 2C (silhouette and AUPR). Explain why PILOT is better.

    The benchmarking analysis shows that PILOT has the highest ARI value (clustering performance) at absolute and ranking levels (Fig. 2A). Moreover the Friedman-Neymeni test indicates this PILOT has significantly higher ranking than all evaluated methods. Regarding Silhouette (distance evaluation) and AUPR (trajectory evaluation) both proportion and PILOT have similar absolute values (Fig. 2B and 2C; panel left), while PILOT has a superior ranking in both cases (Fig. 2B and 2C panel right). Friedman-Neymeni test indicates higher ranking of PILOT than PhEMD for Silhouette and higher ranking of PILOT than PhEMD and pseudo-Bulk regarding trajectory evaluation. The difference in the results on absolute and ranking values can be understood by looking at the statistics in table Table S1. PILOT has highest AUPR in 8 out 12 data sets; proportion has highest values in 5 (including 4 ties with PILOT); proportion-PHATE had 3 best results (including 3 ties with both PILOT and proportions), while PhEMD is best in one data set and Pseudo-bulk in 3 (including 1 tie with PILOT). Altogether, PILOT obtained a higher or equal AUCPR in 9 out of the 12 data sets. We have also changed Fig.2A, 2B and 2C to include all data points and to show the mean, as this provides a better visualization of the previously reported results.

    Altogether, these results indicate that PILOT outperforms all competing methods in at least one of the evaluated problems (clustering, trajectory and distance estimation) and ranks favorably in all evaluated scenarios. We have changed the manuscript text to reflect these results.

    Likewise, Figure 2C,D and Figures S1 and S2 don't show a clear and consistent advantage for PILOT over other methods. Explain what advantage of PILOT do the fraction panels highlight in Fig. 2E and Fig. S3. Fig. 2C is not mentioned in the text.

    Figure 2D, 2E, and now figures S2 and S3 represent visualizations of the results, which were statistically evaluated in panels of Fig.2A-2C. As discussed in the previous point, PILOT does perform better than all methods for the clustering problem and performs better or as good as the proportion test on 9 of the 12 evaluated data sets in the trajectory problem. We also have improved the text to include references to all figures in the main text.

    I assume Kidney IgAN (text) and Kidney IgA (fig. 2) are the same.

    The correct name is IgAN and this has been corrected in Figure 2.

    Fig. 3B fix the p-value notation (what is p=1.05E?) and R2 (R square?). Nrte tha both this problem also occurs in other figs. Fig. 3B shows the major cellular changes.

    We now adopt the term “R-squared” in the figures. Also, the previous version did not display p-values properly. We apologize for this. This has been fixed now.

    Are these changes consistent with known ones? Explain and provide references. Are there cell types that were expected to show a change and did not? Same questions for Fig. 3C wrt genes. Is this an exploratory analysis highlighting interesting candidate genes? If so, it should be described as such.

    Cardiac remodeling after myocardial infarction is characterized by loss of cardiomyocytes, infiltration by immune cells (myeloid and lymphocytes) and increase in myofibroblast populations (doi.org/10.1038/s41392-022-00925-z;doi.org/10.3389/fcvm.2019.00026). PILOT indicates these populations, with the exception of lymphocytes, are most relevant at both clustering levels (see Sup. Fig, 6). Particularly important are results from the low granularity analysis, as this indicates particular macrophage/fibroblast sub-populations (SPP1+ Mac. and Myofibroblast) with increase in disease. PILOT could not detect changes in lymphocyte cells, but this is explained by the poor coverage of these cells in the data set (>3%). We have updated the main manuscript to reflect this.

    We also explicitly mention that the analysis of genes and cells are exploratory analysis.

    The point of Fig. S6 and its major findings should be mentioned in the text (or it can be removed).

    We now make the reference to the gene ontology analysis presented in the new Figure S7 more explicit in the text.

    Fig. 4B legend - eGFR not GFR. What do the high-low values of Fig. 2B refer to?

    We have fixed these points.Hhigh and low values of panel 4B refer to the eGFR.

    Fig. S12 is out of order in supp file.

    This has been fixed.

    AUCPR - explain.

    The AUCPR stands for area under the curve of the precision recall (AUCPR) curve. We have now improved the explanation of the evaluation metric in the main text and methods section.

    The github looks like work in progress with many internal comments (eg, add ,edit, etc). I could not find the tutorials.

    We have removed all the comments, improved the repository organization and code. The tutorials are explicitly mentioned in the main github page (https://github.com/CostaLab/PILOT/) and in readthedocs webpage (https://pilot.readthedocs.io/). It include tutorials replicating analysis with trajectory inference and clustering problems, which are discussed in the manuscript.

    In the process of code review, we have noticed that while we could replicate all the analysis, the procedure for selection of healthy cardiomyocyte genes was distinct (gene were ranked by regression model fit p-value) than the analysis of the myofibroblast genes (genes were ranked by the Wald test p-value). As explained before, the Wald test, which compares the expression of the regression model fits across samples, is a more appropriate criteria, as it finds cluster and trajectory specific genes. We have changed the analysis of the cardiomyocyte to make the gene selection to be based on the Wald-test p-value. New results recover other sarcomere related genes (MYBPC3 and MYOM1) as being dysregulated during disease progression. These findings are in accordance with observations made in the original study presenting the data (Kuppe et al. 2022). We have updated Fig.3 and respective genes accordingly.

    Minor comments:

    Introduction: "Alternatively, trajectory analysis can be performed to uncover disease progression allowing the characterization of early disease events." Citations should be added (some appear later in the text).

    We included a reference to PhEMD.

    "Currently, there are no analytical methods to compare two single cell experiments from the same tissue from two distinct individuals." There have been several comparisons among data from patients, (e.g. Cain et al, 2023), so the authors should be more careful/accurate in their statements.

    We assume that the referee mentions https://www.nature.com/articles/s41593-023-01356-x. Indeed, we were not aware of this recently published study. The manuscript focuses on comparing cell proportion changes (estimated by deconvolution) between distinct phenotypes and does not provide any approach for sample level analysis of single cell data. This is in our view a different problem than the one addressed by PILOT or PhEMD. We refer to it in our manuscript, as its cell community based analysis is an interesting approach for the interpretation of PILOT results.

    "Except for PhEMD, all related methods9, 11, 12 require labels of patients for their analysis and cannot be used in the unsupervised analysis " - this sentence comes immediately after describing ref 13, which can be used in unsupervised analysis and accordingly is not cited in this sentence. The authors did well in describing ref 13 (a bioRxiv paper), and its description should come after this sentence.

    We changed the text to reflect this.

    "These can be clusters", clustered?

    Done.

    " acquire an injury cell states" remove an.

    Done.

    "As for scRNA-seq, there is no analytical method which is able to compare two or more histological slides based on morphometric properties of their structures." The sentence seems to refer to pathomics, not to sc data as suggested in "As for scRNA-seq"

    This has been rephrased.

    "Thus PILOT represents the first approach to detect unknown patient trajectories and clusters" patient clusters were also observed by others (eg ref 13, Cain et al).

    This has been rephrased.

    Equation 7 - Cosine(Mi,Mi) should be Cosine(Mi,Mj)

    Done.

    In the beginning of the Results, PILOT is not referred to as a package but as a researcher ("PILOT explores").

    This has been rephrased.

    Reviewer #1 (Significance):

    In general, the paper is a Methods paper. Hence, likely audience includes computational biologists interested in methodologies, not to biologists interested in the actual findings.

    Although I am among the likely audience, I was not convinced by the merits of the method, potentially due to the way the paper was written.

    I do not have sufficient expertise to check the math.

    In this revision, we have significantly enhanced the text to incorporate high-level descriptions of methods tailored for non-computational experts. Additionally, we have refined the description of the benchmarking process, which, as far as our knowledge extends, stands as the most comprehensive in the literature. This comprehensive analysis strongly underscores the statistical superiority of PILOT when compared to other methods. Lastly, PILOT presents an unique framework, encompassing methods for trajectory analysis, clustering, and interpretation of sample-level analyses within the realm of multiscale single-cell genomics and pathomics data.

    Reviewer #2 (Evidence, reproducibility and clarity):

    Joodaki et al. propose PILOT, a computational method for analysing single-cell genomics and pathomics data. PILOT is a method that enable clustering, trajectory analysis, and feature detection at a patient level using scRNA-seq data. This is an important task and represent the growing application of scRNA-seq to understand diseases and other perturbations to other biological systems. In particular, PILOT enables unsupervised analysis which alleviate the need of patient labels required by many alternative methods. We have the following comments for the authors' consideration.

    1. A key consideration in dealing with scRNA-seq data at a patient level is the batch effect in the data. Typically, each patient sample may be treated as a "batch" especially when they are processed separately to obtain a scRNA-seq dataset that are subsequently combined with scRNA-seq datasets from other patients to form a single dataset. Analysing these data without batch correction may lead to the identification of "cell types" and "states" that are driven by batch effect. In Figure 1. PILOT takes a clustered and integrated scRNA-seq data as input for analysis. I wonder how PILOT would behave if there is a strong batch effect in the data and how would the authors propose to handle them?

    This is an interesting question. Currently, PILOT is using the batch procedure used in the paper proposing the original data. We evaluate now the impact of batch correction methods implemented in scanpy (Harmony, bknn and Scanorama). We focus here on single cell data, which we have access to the original count matrix (Lupus, COVID, and Diabetes). We observe no impact of the batch correction algorithm in these data sets (see Sup. Fig. 5C-E). These results are now included in the manuscript.

    We have noticed however that strong batch effects in the lung cell atlas or the kidney cell atlas.For the lung cell atlas, we observed that single cell data measured from distinct techniques (Seq-well, Drop-seq, 10x 5’ and 10x 3’) or distinct tissue sampling approaches confounded results for all evaluated approaches. Therefore, we restricted the analysis to the technology with more samples (10x genomics 3’) and to lung tissues. This sample selection was previously described in the material and methods. Of note, the use of samples from distinct 10x genomic version kits (v1, v2 or v3) did not impact results. For the kidney cell atlas, we also observed a strong batch between single nuclei and single cell protocols. Here, we opted to focus on the largest cohort of single cell RNA experiments (see Review Fig. 1). Altogether, PILOT and other evaluated methods do require samples to be analyzed with an uniform technique and sampling approaches. We now include a brief discussion about this open point in the “Discussion” section. This is an important topic of future research.

    Review Fig. 1. - Data of the Kidney Precision Medicine Project was measured using either single cell or single nucleus protocols. All evaluated methods were affected by the differences in these technologies and could not separate disease status in this data.

    1. It appears that the Wasserstein distance (W) matrix of the samples was used for patient clustering and also trajectory analysis. However, most of the figures presented in the manuscript are for trajectory analysis. Since the patient clustering were performed prior to trajectory analysis, could the authors visualize the patient data based on the W before performing disease trajectory estimation?

    Indeed, despite the clustering-based analysis (ARI statistics; Fig. 2A) the current manuscript focuses on results of the trajectory analysis. We now include additional features for clustering analysis. This includes heatmap visualizations of the OT distance matrices together with Leiden clustering (Sup. Fig. 1). See points 4 and 5 below for further changes regarding clustering analysis.

    1. In trajectory analysis in figure 2D and E, why not use Multi-scale PHATE which appears to be specifically designed for trajectory analysis? The authors also mentioned SCANCell. While these methods require labels of patients for their analysis, it would be interesting to know how well they perform in comparison to PILOT if such information is available.

    This is an interesting point. Multiscale-PHATE is based on doing a multi-resolution clustering of the cells. It then applies PHATE (instead of diffusion map) to find a non-linear embedding on the cell proportions across samples and resolutions. While this analysis is presented at Multiscale-PHATE manuscript (Fig. 5), we could not find any code or functionality in their github to replicate this (https://github.com/KrishnaswamyLab/Multiscale_PHATE). Moreover, we were not able to find a function to find the cluster/resolution associations of cells to reimplement the above mentioned analysis following the descriptions of the manuscript. We also contacted authors, but obtained no reply. It is also important to state that Multi-PHATE used a supervised filter to select cell types for further analysis.

    Alternatively, we now include an evaluation of the use of cell proportions followed by a PHATE embedding in the trajectory based evaluation, which is close to the method proposed in Multiscale-PHATE. Our benchmarking indicates that Multiscale-PHATE is the third best ranked method being overpassed by proportion and PILOT. Regarding SCANCell, it focuses on the interpretation of cell communities and it uses embedding/distances by exploring PhEMD. Therefore, its performance in the trajectory or clustering performance problem is the same as PhEMD. We refer to these points in the text now.

    1. The current design of PILOT appears to assume that there is always a "smooth" trajectory in the data. Is this going to be the case in reality? What if we have a well separated and distinct groupings of the patients and controls data? In the latter case, imposing a trajectory seems artificial. I am also not sure how meaningful the trajectory analysis would be if, biologically, such a smooth transition is not present in the data.

    The EMD based distance can be used both for clustering or trajectory analysis. Also, PILOT performed quite well in the clustering problem benchmarking (Fig. 2A). The choice of application lies on the problem at hand. In our view, both the kidney pathomics and the myocardial infarction data (explored in Fig. 3 and Fig. 4) represent medical data with potential disease trajectories. We now expand the PILOT framework to include new visualizations and statistical methods to improve the interpretation of the clusterings (see point 2; Fig. S1; and point 5 and new Fig. 5).

    1. The feature analysis is also built on trajectory analysis using regression models. Again, how would this work out if there isn't a smooth trajectory/transition in the data (e.g. the data are obtained from a discrete case-control study)?

    We expanded the PILOT framework to also include statistical tests for accessing changes in cell populations and markers for the clustering problem. First, we use a Welch’s t-test to evaluate cell proportion changes associated with detected clusters. Next, we use a differential analysis test from limma to find genes within a cluster, whose gene expression is changing between the two groups of samples for a given cluster of cells. While these are standard approaches in the literature, this further improves the functionality of PILOT as a general framework for patient level analysis. This is now described in PILOT manuscript (Results subsection “Patient level distance with Optimal Transport” and methods).

    We also include in the manuscript an explorative analysis of a sub-cluster found in the PDAC data. This analysis could find a population of PDAC patients displaying higher levels of malignant cells and marked by both increase in hypoxia and fibrosis pathways. This example highlights how PILOT can be used to find potentially interesting groups of samples. These are implemented in the main manuscript (Fig. 5). We also include a new tutorial of PILOT with this analysis (see https://pilot.readthedocs.io/).

    1. It is not clear from the formulation of PILOT (and also Figure 1) if the cell type labels is required/used or the cluster id of a clustering algorithm was used instead. The author also mentioned that the clustering output does not have much impact on the downstream analysis. I wonder why and if so can we group the data in any way we want for downstream analysis? This can be useful when one would like to focus on certain grouping of cells.

    Clustering at the cells (or structure level) is required. For the benchmarking analysis, we have used the cell annotation reported in the original paper, which were derived via clustering analysis. The use of annotated clusters is crucial for interpretation. We also included in the original manuscript an analysis on the impact of the clustering resolution of the Leiden algorithm. This indicates that the change in resolution did not have a high impact in the clustering (ARI) of the samples (Sup. Fig. 5A-B). However, this analysis could not consider any interpretation of results, as cluster labels were not present.

    We believe, however, that the granularity of the clustering will impact the interpretation of the sample analysis. To investigate this, we evaluate how using higher level annotation/clustering of the heart myocardial infarction (also reported in Kuppe et al. 2022) impacts our ability to find cell specific changes. We observe similar changes whenever using low resolution clustering (decrease of cardiomyocytes, increase in fibroblasts and myeloid cells). However, this analysis loses a lot of important nuances found in the high resolution clustering (see Sup. Fig. 6). For example, it does not recover the fact that damaged cardiomyocyte populations have a slower decay than healthy myocytes. Or the fact that myofibroblasts has an increase in the latter disease trajectory stage, while progenitor fibroblast cells (Fibro_Scara5) have an increase previous to myofibroblasts. These results show how low resolution clustering can lead to loss of interesting information contained in cellular sub-states or cell sub-populations. This is now discussed in the results subsection ‘PILOT trajectories detect events associated with cardiac remodeling in myocardial infarction’.

    Reviewer #2 (Significance):

    PILOT is designed for analyzing scRNA-seq data at a patient level. There is a growing application of scRNA-seq to diseases and the development of computational tools for analyzing such data at phenotype level is critical. The key aspect of PILOT compared to other currently available tools is that it enables unsupervised analysis which alleviate the need of patient labels required by many alternative methods.

    Thanks for this very positive feedback and constructive comments.

  2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Learn more at Review Commons


    Referee #2

    Evidence, reproducibility and clarity

    Joodaki et al. propose PILOT, a computational method for analysing single-cell genomics and pathomics data. PILOT is a method that enable clustering, trajectory analysis, and feature detection at a patient level using scRNA-seq data. This is an important task and represent the growing application of scRNA-seq to understand diseases and other perturbations to other biological systems. In particular, PILOT enables unsupervised analysis which alleviate the need of patient labels required by many alternative methods. We have the following comments for the authors' consideration.

    1. A key consideration in dealing with scRNA-seq data at a patient level is the batch effect in the data. Typically, each patient sample may be treated as a "batch" especially when they are processed separately to obtain a scRNA-seq dataset that are subsequently combined with scRNA-seq datasets from other patients to form a single dataset. Analysing these data without batch correction may lead to the identification of "cell types" and "states" that are driven by batch effect. In Figure 1. PILOT takes a clustered and integrated scRNA-seq data as input for analysis. I wonder how PILOT would behave if there is a strong batch effect in the data and how would the authors propose to handle them?
    2. It appears that the Wasserstein distance (W) matrix of the samples was used for patient clustering and also trajectory analysis. However, most of the figures presented in the manuscript are for trajectory analysis. Since the patient clustering were performed prior to trajectory analysis, could the authors visualize the patient data based on the W before performing disease trajectory estimation?
    3. In trajectory analysis in figure 2D and E, why not use Multi-scale PHATE which appears to be specifically designed for trajectory analysis? The authors also mentioned SCANCell. While these methods require labels of patients for their analysis, it would be interesting to know how well they perform in comparison to PILOT if such information is available.
    4. The current design of PILOT appears to assume that there is always a "smooth" trajectory in the data. Is this going to be the case in reality? What if we have a well separated and distinct groupings of the patients and controls data? In the latter case, imposing a trajectory seems artificial. I am also not sure how meaningful the trajectory analysis would be if, biologically, such a smooth transition is not present in the data.
    5. The feature analysis is also built on trajectory analysis using regression models. Again, how would this work out if there isn't a smooth trajectory/transition in the data (e.g. the data are obtained from a discrete case-control study)?
    6. It is not clear from the formulation of PILOT (and also Figure 1) if the cell type labels is required/used or the cluster id of a clustering algorithm was used instead. The author also mentioned that the clustering output does not have much impact on the downstream analysis. I wonder why and if so can we group the data in any way we want for downstream analysis? This can be useful when one would like to focus on certain grouping of cells.

    Significance

    PILOT is designed for analysing scRNA-seq data at a patient level. There is a growing application of scRNA-seq to diseases and the development of computational tools for analysing such data at phenotype level is critical. The key aspect of PILOT compared to other currently available tools is that it enables unsupervised analysis which alleviate the need of patient labels required by many alternative methods.

  3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

    Learn more at Review Commons


    Referee #1

    Evidence, reproducibility and clarity

    The paper describes a computational method, PILOT, that uses optimal transport to compute the Wasserstein distance between two individual single-cell samples. It uses PILOT to detect sample (patient) level trajectories and clusters associated with diseases. The method was applied separately to single-cell genomics data and to digital pathology data. The method was applied to several datasets and compared against other tools.

    Major comments:

    The paper is not easy to follow and should be improved considerably to make it readable and reproducible. Consequently, I was not convinced that the PILOT method is much better than other methods.

    At first read of the title and abstract, I got the impression that the method analyzes single cell and pathomics data concurrently rather than separately. This should be fixed.

    The usage of Wasserstein distance to compute distance between single-cell samples is elegant and is the main strength of this study. Given that PILOT is the main achievement, it should be described more carefully and in a detailed manner.

    For example, in the first Results paragraph, "The test indicates for features explaining the predicted pseudotime by fitting either linear or quadratic models" - I could not understand this sentence. Also, which test do the authors refer to? A few sentences down, there is a reference to a Wald test, is that it?
    One of the key aspects of the Wasserstein distance is the cost metric. The determination of the cost metric should be detailed as part of the Results. Have the authors considered and estimated other ways to define the distance?

    Figure 1 provides a schematic view of PILOT. However, there is no explanation of the notation, which makes it confusing rather than helpful. Also, what is the relationship between J and j, if any?

    The motivation and usage of adjusted Rand index (ARI) and Friedman-Nemenyi tests should be provided. Currently, they are not clear, including why those tests are suitable in the cases shown.

    Fig. 2 the use of method colors should be constant across panels. The proportions method works at least as well as PILOT in 2B and 2C (silhouette and AUPR). Explain why PILOT is better. Likewise, Figure 2C,D and Figures S1 and S2 don't show a clear and consistent advantage for PILOT over other methods. Explain what advantage of PILOT do the fraction panels highlight in Fig. 2E and Fig. S3. Fig. 2C is not mentioned in the text.

    I assume Kidney IgAN (text) and Kidney IgA (fig. 2) are the same.

    Fig. 3B fix the p-value notation (what is p=1.05E?) and R2 (R square?). Norte tha both this problem also occurs in other figs. Fig. 3B shows the major cellular changes. Are these changes consistent with known ones? Explain and provide references. Are there cell types that were expected to show a change and did not?
    Same questions for Fig. 3C wrt genes. Is this an exploratory analysis highlighting interesting candidate genes? If so, it should be described as such.
    The point of Fig. S6 and its major findings should be mentioned in the text (or it can be removed).

    Fig. 4B legend - eGFR not GFR. What do the high-low values of Fig. 2B refer to?
    Fig. S12 is out of order in supp file.

    AUCPR - explain.

    The github looks like work in progress with many internal comments (eg, add ,edit, etc). I could not find the tutorials.

    Minor comments:

    Introduction: "Alternatively, trajectory analysis can be performed to uncover disease progression allowing the characterization of early disease events." Citations should be added (some appear later in the text).
    "Currently, there are no analytical methods to compare two single cell experiments from the same tissue from two distinct individuals." There have been several comparisons among data from patients, (e.g. Cain et al, 2023), so the authors should be more careful/accurate in their statements.
    "Except for PhEMD, all related methods9, 11, 12 require labels of patients for their analysis and cannot be used in the unsupervised analysis " - this sentence comes immediately after describing ref 13, which can be used in unsupervised analysis and accordingly is not cited in this sentence. The authors did well in describing ref 13 (a bioRxiv paper), and its description should come after this sentence.

    "These can be clusters", clustered?

    " acquire an injury cell states" remove an.

    "As for scRNA-seq, there is no analytical method which is able to compare two or more histological slides based on morphometric properties of their structures." The sentence seems to refer to pathomics, not to sc data as suggested in "As for scRNA-seq"

    "Thus PILOT represents the first approach to detect unknown patient trajectories and clusters" patient clusters were also observed by others (eg ref 13, Cain et al).

    Equation 7 - Cosine(Mi,Mi) should be Cosine(Mi,Mj)

    In the beginning of the Results, PILOT is not referred to as a package but as a researcher ("PILOT explores").

    Significance

    In general, the paper is a Methods paper. Hence, likely audience includes computational biologists interested in methodologies, not to biologists interested in the actual findings.

    Although I am among the likely audience, I was not convinced by the merits of the method, potentially due to the way the paper was written.

    I do not have sufficient expertise to check the math.