Single-cell transcriptome profiles of Drosophila fruitless-expressing neurons from both sexes

Curation statements for this article:
  • Curated by eLife

    eLife assessment

    This manuscript will be of interest to both developmental biologists and neuroscientists. The data suggest that most (but not all) neuronal types are present in both Drosophila sexes, and the existence of mostly shared scSeq clusters suggests that sex-specific versions of the transcription factor Fruitless can modify neural function in a sex-specific way without completely altering core neural identity. This cell type gene expression atlas should prove valuable in future efforts to understand mechanisms of sex-specific development, as well as the molecular and developmental-genetic basis of sex differences in behaviour.

Abstract

Drosophila melanogaster reproductive behaviors are orchestrated by fruitless neurons. We performed single-cell RNA-sequencing on pupal neurons that produce sex-specifically spliced fru transcripts, the fru P1-expressing neurons. Uniform Manifold Approximation and Projection (UMAP) with clustering generates an atlas containing 113 clusters. While the male and female neurons overlap in UMAP space, more than half the clusters have sex differences in neuron number, and nearly all clusters display sex-differential expression. Based on an examination of enriched marker genes, we annotate clusters as circadian clock neurons, mushroom body Kenyon cell neurons, neurotransmitter- and/or neuropeptide-producing, and those that express doublesex. Marker gene analyses also show that genes that encode members of the immunoglobulin superfamily of cell adhesion molecules, transcription factors, neuropeptides, neuropeptide receptors, and Wnts have unique patterns of enriched expression across the clusters. In vivo spatial gene expression links to the clusters are examined. A functional analysis of fru P1 circadian neurons shows they have dimorphic roles in activity and period length. That most clusters comprise both male and female neurons indicates that the sexes have fru P1 neurons with common gene expression programs. Sex-specific expression is overlaid on this program, to build the potential for vastly different sex-specific behaviors.

Article activity feed

  1. Author Response

    Reviewer #1 (Public Review):

    Neuronal tissues are very complex and are composed of a large number of neuronal types. With the advent of single-cell sequencing, many researchers have used this technology to generate atlases of neuronal structures that would describe in detail the transcriptome profiles of the different cell types. Along these lines, in this manuscript, the authors present single-cell transcriptomic data of the fruitless-expressing neurons in the Drosophila male and female central nervous systems. The authors initially compare cell cluster composition between male and female flies. They then use the expression of known markers (such as Hox genes and KC neuronal markers) to annotate several of their clusters. Then, they look in detail at the expression of different terminal neuronal genes in their transcriptomic data: first, they look into neurotransmitter-related genes and how they are expressed in the fruitless-expressing neurons; they describe these populations in detail and then verify the expression patterns by looking into genetic intersections of Fru with different neurotransmitter-related genes. Then, they look at Fru-neurons that express circadian clock genes, different neuropeptides and neuropeptide receptors, and different subunits of acetylcholine receptors. Next, they look into genes that are differentially expressed between male and female neurons that belong to the same clusters. They find a large number of genes; through GO term enrichment analysis, they conclude that many IgSF proteins are differentially expressed, so they look into their expression in Fru-neurons in more detail. Finally, they compare transcription factor expression between male and female neurons of the same cluster and they identify 69 TFs with cluster-specific sex-differential expression.

    In general, the authors achieved their goal of generating and presenting a large and very useful dataset that will definitely open a large number of research avenues and has already raised a number of interesting hypotheses. The data seem to be of good quality and the authors present different aspects of their atlas.

    The main drawback is that many of the analyses are very superficial, resulting in the manuscript being handwavy and unsupported. The manuscript would benefit by reducing the number of "analyses" to the ones that are also in vivo validated and by discussing some of the drawbacks that are inherent to their experimental procedure.

    scRNA-seq studies generate atlases that are descriptive, by their nature. Therefore, we decided to keep interesting gene-expression analyses in the paper that are based on the scRNA-seq results, especially for the discoveries that point to exciting avenues for future pursuit. We reduced the text as suggested.

    1. The authors treat their male, female, and full datasets as three different samples. At the end of the day, these are, for the most part, equivalent neuronal types. The authors should decide to a) either only use the full dataset and present all analyses in this, or b) give a clear correspondence of male and female clusters onto the full ones.

    In this paper, all the analyses presented are on the full data set, with some links to the male or female data sets included. We now make clear that the full data set is the focus of the paper (lines 137-141). We provide the male and female data sets for our reader, with the individual Seurat objects uploaded to GEO, to make it easy for the reader to do follow-up analyses using the same criteria we used. We think this is helpful for our research community. We also compare the male and female clusters to the full data set using ClustifyR and report which clusters in the male or female data set analyses correspond to those in the full data set (Source data 2), as suggested by the reviewer, though ClustifyR has some limitations based on our evaluation of this tool for other annotations (see below).
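
    To make the idea of a cluster-to-cluster correspondence concrete, the following is a minimal sketch and not the ClustifyR implementation: it assumes each cluster is summarized by its mean expression over a shared gene list and assigns every query cluster to the reference cluster with the highest Pearson correlation. The cluster names, the expression values, and the helper functions `pearson` and `best_matches` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no information about similarity
    return cov / sqrt(var_x * var_y)


def best_matches(query_profiles, reference_profiles):
    """Map each query cluster to its most correlated reference cluster."""
    matches = {}
    for q_name, q_profile in query_profiles.items():
        scores = {
            r_name: pearson(q_profile, r_profile)
            for r_name, r_profile in reference_profiles.items()
        }
        best = max(scores, key=scores.get)
        matches[q_name] = (best, scores[best])
    return matches


# Hypothetical mean expression of the same four genes in each cluster.
full_clusters = {"full_0": [5.0, 0.1, 0.2, 3.0], "full_1": [0.2, 4.0, 3.5, 0.1]}
male_clusters = {"male_0": [0.3, 3.8, 3.9, 0.2], "male_1": [4.5, 0.2, 0.1, 2.8]}

male_to_full = best_matches(male_clusters, full_clusters)
for name, (match, score) in male_to_full.items():
    print(f"{name} -> {match} (r = {score:.2f})")
```

    A best-match table of this kind always returns an assignment, even for a cluster with no good counterpart, so the correlation score should be inspected alongside the match.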

    2. Most of their sections are heavily reliant on marker genes. In fact, in almost every section they mention how many of their genes of interest are marker genes. This depends heavily on specific cutoffs, making the conclusions fragile. Similarly, GO terms are used selectively and are, in many cases, vague (such as “signaling”, “neurogenesis”, “translation”).

    We evaluated marker genes because they provide molecular identities to the clusters, given that by definition they are significantly more highly expressed in a specific cluster compared to all other clusters. We used a Wilcoxon rank sum test with the following parameters in Seurat: (min.pct=0.25, logfc.threshold=0.25), which resulted in all called marker genes having p values < 0.05. We did not use more stringent criteria given that most of the marker gene analyses are descriptive, and it is important to capture a broad range of genes. Our criteria are similar to Ma et al. 2021 (PMID: 33438579) and Corrales et al. 2022 (PMID: 36289550). In the text, we refer to the top 5 marker genes in several analyses, though these marker genes have a much more significant enrichment. We agree with the reviewer's criticisms regarding the cluster-specific GO-term analyses in the text and those have been removed from the manuscript.
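
    The effect of the two cutoffs can be illustrated with a simplified sketch. This is not Seurat's code: it assumes a gene is kept when it is detected in at least `min_pct` of cells in either group and its log2 fold change (computed here with a pseudocount of 1, an assumption) is at least `logfc_threshold` in the enriched direction. The Wilcoxon test itself is omitted, and the expression values and the function name `passes_marker_prefilter` are hypothetical.

```python
from math import log2


def passes_marker_prefilter(cluster_values, other_values,
                            min_pct=0.25, logfc_threshold=0.25):
    """Return True if a gene clears both pre-test cutoffs for one cluster.

    cluster_values: expression of the gene in cells of the cluster of interest
    other_values:   expression of the gene in all remaining cells
    """
    # Fraction of cells in each group with any detected expression.
    pct_in = sum(v > 0 for v in cluster_values) / len(cluster_values)
    pct_out = sum(v > 0 for v in other_values) / len(other_values)

    # Log2 fold change of group means; the pseudocount of 1 is an assumption
    # that avoids taking the log of zero.
    mean_in = sum(cluster_values) / len(cluster_values)
    mean_out = sum(other_values) / len(other_values)
    logfc = log2(mean_in + 1) - log2(mean_out + 1)

    return max(pct_in, pct_out) >= min_pct and logfc >= logfc_threshold


# Hypothetical genes: one clearly enriched, one only weakly enriched.
gene_a_kept = passes_marker_prefilter([3, 0, 4, 2], [0, 0, 1, 0, 0, 0, 0, 0])
gene_b_kept = passes_marker_prefilter([0, 0, 0, 1], [0, 1, 0, 0, 0, 0, 0, 0])
print(gene_a_kept, gene_b_kept)
```

    Raising either cutoff shrinks the set of genes that reach the statistical test, which is why the count of marker genes depends on these parameter choices.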

    3. A few of the results are not confirmed in vivo. The authors should add a Discussion section where they discuss the inherent issues of their analyses. Are there clusters of low quality? Are there many doublets?

    We have added discussion around these topics to the conclusions section of the manuscript and the results, when appropriate.

    On the same note, their clusters are obviously non-homogeneous (i.e. they house more than one cell type). This could obviously affect the authors' cluster-specific sex-differential expression, as differences could also be attributed to the differential composition of the male and female subclusters.

    We discuss this potential limitation in the discussion of sex differences in gene expression (Lines 959-961).

    4. Immunostainings are often unannotated and, in some cases especially in the Supplement, they are blurry. The authors should annotate their images and provide better images whenever possible.

    We appreciate this being pointed out and have provided higher resolution figures. The issue was that we exceeded the manuscript submission file size limit on initial submission.

    5. I believe that the manuscript would benefit significantly by being heavily reduced in size and being focused on in vivo rigorously confirmed observations.

    We have addressed this comment by removing some of the analyses.

    Reviewer #3 (Public Review):

    This paper uses single-cell transcriptome sequencing to identify and characterize some of the neuronal populations responsible for sex-specific behaviour and physiology. This question is of interest to many biologists, and the approach taken by the authors is productive and will lead to new insights into the molecular programs that underpin sexually dimorphic development in the CNS. The dataset produced by the authors is of high quality, the analyses are detailed and well described, and the authors have made substantial progress toward the identification and characterization of some of the neuron populations. At the same time, many other cell types whose existence is suggested by this dataset remain to be identified and matched to specific neuron populations or circuits. We expect the value of this dataset to increase as other groups begin to follow up on the data and analyses reported in this paper. In general, the value of this paper to the field of Drosophila neurobiology will be high even if it is published in close to its present form. On the other hand, the current manuscript does not succeed in presenting the key take-home messages to a broader audience. A modest effort in this direction, especially re-writing the Conclusions section, will greatly enhance the accessibility and broader impact of this paper.

    While the biological conclusions reached by the authors are generally robust and of high interest, we believe that some conclusions are not sufficiently supported by the analyses that have been performed so far and need to be reexamined and confirmed. A major question concerns the authors' ability to distinguish a shared cell type with sex-biased gene expression from a pair of closely related, sex-limited cell types. There appear to be many cases that fall into this grey area, and the current analysis does not provide an objective criterion for distinguishing between sex-specific and sexually dimorphic clusters. Below we suggest some technical approaches that could be used to examine this issue. A second problem, which we do not believe to be fatal but that needs to be discussed, concerns potential differences in developmental timing and cell cycle phase between males and females, and how these differences might impact the inferences of sexual dimorphism in cell numbers and gene expression. Finally, we identify several areas, including the expression of transcription factors in different neuronal populations, that we believe could be described in more biologically insightful ways.

    For our review, we focus on three levels of evaluation:

    1). Is the dataset of high quality, useful to a large number of people, well annotated, and clearly described?

    The data appear to be high quality. The authors use reasonable neuronal markers to infer that 99% of their cells are neuronal in origin, suggesting extremely low levels of contamination from non-neuronal cells. Moreover, the gene/UMIs detected per cell are high relative to what has been reported in previous Drosophila scRNA-seq neuron papers (e.g. Allen et al., 2020). The cluster annotations are incomplete - which is not surprising, given the complexity of the cell population the authors are working with. 46 of the 113 clusters in the full dataset are named based on published expression data, gene ontology enrichments of cluster marker genes, and overlap with other CNS single cell datasets. This leaves rather a lot outstanding. It is probably unrealistic to aim for a 100% complete annotation of this dataset. But if we're thinking about how this dataset might be used by other researchers, in most cases the validation that a given cluster corresponds to a real, distinct neuron subpopulation will be left to the user.

    A major comment we have about the quality of the dataset relates to how doublets are identified and dealt with. The presence of doublets, an unavoidable byproduct of droplet-based scRNAseq protocols (like the 10x protocol used by the authors), could affect the clustering or at least bias the detection of marker genes. In large clusters, one might expect the influence of doublets on marker gene detection to be diluted, but in smaller clusters it could cause more significant problems. In extreme cases, a high proportion of doublets can produce artifactual clusters. The potential for problems is particularly high in cases where the authors identify cells with hybrid properties, such as clusters 86 and 92, which the authors describe as being serotonergic, glutamatergic, and peptidergic. Currently, the authors filter out cells with high UMI/gene counts, but it's unclear how many are removed based on these criteria, and cells can naturally vary in these values so it is not clear to us whether this approach will reliably remove doublets. That said, we acknowledge that by limiting their 'FindMarkers' analysis to genes detected in >25% of cells in a cluster the authors are likely excluding genes derived from doublets that contaminate clusters in low (but not high) numbers. We think it would be useful for the authors to report the number of cells that are filtered out because they met their doublet criteria and compare this value to the number of expected doublets for the number of cells they recovered (10x provides these figures). We would also recommend that the authors trial a doublet detection algorithm (e.g. DoubletFinder) on the unfiltered datasets (that is, unfiltered at the top end of the UMI/gene distribution). Does this identify the same cells as doublets as those the authors were filtering out?

    We appreciate this suggestion and have now added results from the doublet detection algorithm DoubletFinder to our manuscript. Please see the above response in the editorial comments. We provide a table in Figure 1 – supplement 1 that indicates the number of cells removed by our filtering criteria. In the discussion, we acknowledge that there may be additional doublets in our data set that were not removed by our filtering criteria (Lines 1098-1102), and we have also provided a new table in Source data 2 indicating the number of potential doublets identified by DoubletFinder that are present in each cluster.
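
    The two comparisons the reviewers ask for can be sketched as follows. This is an illustration under stated assumptions, not the analysis in the manuscript: the doublet rate is assumed to grow linearly at roughly 0.8% per 1,000 cells recovered (a commonly quoted rule of thumb that should be replaced by the vendor's figure for the chemistry used), and the barcodes, the cell count, and the function names are hypothetical.

```python
def expected_doublets(n_cells_recovered, rate_per_thousand=0.008):
    """Expected doublet count if the rate scales linearly with cells recovered.

    rate_per_thousand is an assumed rule-of-thumb value, not a figure
    taken from the manuscript.
    """
    doublet_rate = rate_per_thousand * (n_cells_recovered / 1000)
    return n_cells_recovered * doublet_rate


def overlap_summary(filtered_by_threshold, flagged_by_tool):
    """Compare cells removed by a UMI/gene cutoff with cells flagged by a tool."""
    shared = filtered_by_threshold & flagged_by_tool
    union = filtered_by_threshold | flagged_by_tool
    return {
        "shared": len(shared),
        "threshold_only": len(filtered_by_threshold - flagged_by_tool),
        "tool_only": len(flagged_by_tool - filtered_by_threshold),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical example: 5,000 recovered cells and two small barcode sets.
n_expected = expected_doublets(5000)
threshold_removed = {"AAAC", "AAAG", "AACT"}
tool_flagged = {"AAAG", "AACT", "AAGG", "ACCT"}
summary = overlap_summary(threshold_removed, tool_flagged)
print(round(n_expected), summary)
```

    A low overlap would suggest that the UMI/gene cutoff and the doublet caller capture different sets of cells, which is the situation the reviewers wanted ruled out.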

    2). What is the value of this study to its immediate field, Drosophila neurobiology? Are the annotation and analysis of specific cell clusters as precise and insightful as they could be? Has all the most important and novel information been extracted from this dataset?

    This is the part that we are least qualified to assess, since we, unlike the authors, are not neurobiologists. We hope some of the other referees will have sufficient expertise to evaluate the paper at this level.

    One thing we noticed (more on that in Part 3) is that the authors rely on JackStraw plots and clustree plots to identify the optimal combination of PCs and resolution to guide their clustering. This represents a relatively objective way of settling on clustering parameters. However, in a number of the UMAPs it looks like there are sub clusters that go undiscussed. E.g. in Fig. 2E clusters 1 and 3 are associated with smaller, distinct clusters and the same is true of clusters 2 and 6 in Fig 4b. Given that the authors are attempting to assemble a comprehensive atlas of fru+ neurons, it seems important for them to assess (at least transcriptomically) whether these are likely to represent distinct subpopulations.

    We appreciate these comments and address this above in the editorial comments section.

    3). How interesting, and how accessible is this paper to people outside of the authors' immediate field? What does it contribute to the "big picture" science?

    Here, we think the authors missed an important opportunity by under-utilizing the Conclusions section. The manuscript has a combined "Results and Discussion" section, where the authors talk about their identification and analysis of specific cell clusters / cell types. Frankly, to a non-specialist this often reads like a laundry list, and the key conclusions are swamped by a flood of details. This is not to criticize that section - given the complexity and potential value of this dataset, we think it is entirely appropriate to describe all these details in the Results and Discussion. However, the Conclusions section does not, in its present form, pull it all back together. We recommend using that section to summarize the 5-8 most important high-level conclusions that the authors see emerging from their work. What are the most important take-home messages they want to convey to a developmental biologist who does not work on brains, or to a neurobiologist who does not work on Drosophila? The authors can enhance the value of this paper by making it more interesting and more accessible to a broader audience.

    We appreciate this suggestion and made changes to the conclusions section to address this comment.

  2. eLife assessment

    This manuscript will be of interest to both developmental biologists and neuroscientists. The data suggest that most (but not all) neuronal types are present in both Drosophila sexes, and the existence of mostly shared scSeq clusters suggests that sex-specific versions of the transcription factor Fruitless can modify neural function in a sex-specific way without completely altering core neural identity. This cell type gene expression atlas should prove valuable in future efforts to understand mechanisms of sex-specific development, as well as the molecular and developmental-genetic basis of sex differences in behaviour.

  3. Reviewer #1 (Public Review):

    Neuronal tissues are very complex and are composed of a large number of neuronal types. With the advent of single-cell sequencing, many researchers have used this technology to generate atlases of neuronal structures that would describe in detail the transcriptome profiles of the different cell types. Along these lines, in this manuscript, the authors present single-cell transcriptomic data of the fruitless-expressing neurons in the Drosophila male and female central nervous systems. The authors initially compare cell cluster composition between male and female flies. They then use the expression of known markers (such as Hox genes and KC neuronal markers) to annotate several of their clusters. Then, they look in detail at the expression of different terminal neuronal genes in their transcriptomic data: first, they look into neurotransmitter-related genes and how they are expressed in the fruitless-expressing neurons; they describe these populations in detail and then verify the expression patterns by looking into genetic intersections of Fru with different neurotransmitter-related genes. Then, they look at Fru-neurons that express circadian clock genes, different neuropeptides and neuropeptide receptors, and different subunits of acetylcholine receptors. Next, they look into genes that are differentially expressed between male and female neurons that belong to the same clusters. They find a large number of genes; through GO term enrichment analysis, they conclude that many IgSF proteins are differentially expressed, so they look into their expression in Fru-neurons in more detail. Finally, they compare transcription factor expression between male and female neurons of the same cluster and they identify 69 TFs with cluster-specific sex-differential expression.

    In general, the authors achieved their goal of generating and presenting a large and very useful dataset that will definitely open a large number of research avenues and has already raised a number of interesting hypotheses. The data seem to be of good quality and the authors present different aspects of their atlas.

    The main drawback is that many of the analyses are very superficial, resulting in the manuscript being handwavy and unsupported. The manuscript would benefit by reducing the number of "analyses" to the ones that are also in vivo validated and by discussing some of the drawbacks that are inherent to their experimental procedure.

    1. The authors treat their male, female, and full datasets as three different samples. At the end of the day, these are, for the most part, equivalent neuronal types. The authors should decide to a) either only use the full dataset and present all analyses in this, or b) give a clear correspondence of male and female clusters onto the full ones.
    2. Most of their sections are heavily reliant on marker genes. In fact, in almost every section they mention how many of their genes of interest are marker genes. This depends heavily on specific cutoffs, making the conclusions fragile. Similarly, GO terms are used selectively and are, in many cases, vague (such as "signaling", "neurogenesis", "translation").
    3. A few of the results are not confirmed in vivo. The authors should add a Discussion section where they discuss the inherent issues of their analyses. Are there clusters of low quality? Are there many doublets?
      On the same note, their clusters are obviously non-homogeneous (i.e. they house more than one cell type). This could obviously affect the authors' cluster-specific sex-differential expression, as differences could also be attributed to the differential composition of the male and female subclusters.
    4. Immunostainings are often unannotated and, in some cases especially in the Supplement, they are blurry. The authors should annotate their images and provide better images whenever possible.
    5. I believe that the manuscript would benefit significantly by being heavily reduced in size and being focused on in vivo rigorously confirmed observations.

  4. Reviewer #2 (Public Review):

    This work characterizes the diversity of Fruitless-expressing neurons in developing, mid-pupal Drosophila brains using single-cell sequencing. This was a reasonably in-depth effort to characterize sexually specific cell types during neural development. The use of single-cell sequencing tools to understand developmental mechanisms and not just adult function is appreciated. Some of the recent broader efforts, such as a recent whole-head atlas, contained limited numbers of cells for many cell types and have likely missed rare cell types. Sorting cells of an interesting category in order to enrich their representation in the set, with specific questions in mind, is in some ways a stronger and more targeted approach.

    I am surprised that the authors observed so much overlap between Fruitless-expressing cells of the same type in males and females and so few sex-specific cell types, which is interesting. Of course, cells of the same overall "type" could have differential wiring and function given the expression of Fruitless. This study suggests that such a modification may not require a wholesale change in gene expression profiles and may be enabled via a restricted set of changes in target gene expression. Overall, while the outcome is a bit descriptive, these efforts produced some interesting biological insights, and this dataset should serve as a resource for future efforts.

  5. Reviewer #3 (Public Review):

    This paper uses single-cell transcriptome sequencing to identify and characterize some of the neuronal populations responsible for sex-specific behaviour and physiology. This question is of interest to many biologists, and the approach taken by the authors is productive and will lead to new insights into the molecular programs that underpin sexually dimorphic development in the CNS. The dataset produced by the authors is of high quality, the analyses are detailed and well described, and the authors have made substantial progress toward the identification and characterization of some of the neuron populations. At the same time, many other cell types whose existence is suggested by this dataset remain to be identified and matched to specific neuron populations or circuits. We expect the value of this dataset to increase as other groups begin to follow up on the data and analyses reported in this paper. In general, the value of this paper to the field of Drosophila neurobiology will be high even if it is published in close to its present form. On the other hand, the current manuscript does not succeed in presenting the key take-home messages to a broader audience. A modest effort in this direction, especially re-writing the Conclusions section, will greatly enhance the accessibility and broader impact of this paper.

    While the biological conclusions reached by the authors are generally robust and of high interest, we believe that some conclusions are not sufficiently supported by the analyses that have been performed so far and need to be reexamined and confirmed. A major question concerns the authors' ability to distinguish a shared cell type with sex-biased gene expression from a pair of closely related, sex-limited cell types. There appear to be many cases that fall into this grey area, and the current analysis does not provide an objective criterion for distinguishing between sex-specific and sexually dimorphic clusters. Below we suggest some technical approaches that could be used to examine this issue. A second problem, which we do not believe to be fatal but that needs to be discussed, concerns potential differences in developmental timing and cell cycle phase between males and females, and how these differences might impact the inferences of sexual dimorphism in cell numbers and gene expression. Finally, we identify several areas, including the expression of transcription factors in different neuronal populations, that we believe could be described in more biologically insightful ways.

    For our review, we focus on three levels of evaluation:

    1). Is the dataset of high quality, useful to a large number of people, well annotated, and clearly described?

    The data appear to be high quality. The authors use reasonable neuronal markers to infer that 99% of their cells are neuronal in origin, suggesting extremely low levels of contamination from non-neuronal cells. Moreover, the gene/UMIs detected per cell are high relative to what has been reported in previous Drosophila scRNA-seq neuron papers (e.g. Allen et al., 2020). The cluster annotations are incomplete - which is not surprising, given the complexity of the cell population the authors are working with. 46 of the 113 clusters in the full dataset are named based on published expression data, gene ontology enrichments of cluster marker genes, and overlap with other CNS single cell datasets. This leaves rather a lot outstanding. It is probably unrealistic to aim for a 100% complete annotation of this dataset. But if we're thinking about how this dataset might be used by other researchers, in most cases the validation that a given cluster corresponds to a real, distinct neuron subpopulation will be left to the user.

    A major comment we have about the quality of the dataset relates to how doublets are identified and dealt with. The presence of doublets, an unavoidable byproduct of droplet-based scRNAseq protocols (like the 10x protocol used by the authors), could affect the clustering or at least bias the detection of marker genes. In large clusters, one might expect the influence of doublets on marker gene detection to be diluted, but in smaller clusters it could cause more significant problems. In extreme cases, a high proportion of doublets can produce artifactual clusters. The potential for problems is particularly high in cases where the authors identify cells with hybrid properties, such as clusters 86 and 92, which the authors describe as being serotonergic, glutamatergic, and peptidergic. Currently, the authors filter out cells with high UMI/gene counts, but it's unclear how many are removed based on these criteria, and cells can naturally vary in these values so it is not clear to us whether this approach will reliably remove doublets. That said, we acknowledge that by limiting their 'FindMarkers' analysis to genes detected in >25% of cells in a cluster the authors are likely excluding genes derived from doublets that contaminate clusters in low (but not high) numbers. We think it would be useful for the authors to report the number of cells that are filtered out because they met their doublet criteria and compare this value to the number of expected doublets for the number of cells they recovered (10x provides these figures). We would also recommend that the authors trial a doublet detection algorithm (e.g. DoubletFinder) on the unfiltered datasets (that is, unfiltered at the top end of the UMI/gene distribution). Does this identify the same cells as doublets as those the authors were filtering out?

    2). What is the value of this study to its immediate field, Drosophila neurobiology? Are the annotation and analysis of specific cell clusters as precise and insightful as they could be? Has all the most important and novel information been extracted from this dataset?

    This is the part that we are least qualified to assess, since we, unlike the authors, are not neurobiologists. We hope some of the other referees will have sufficient expertise to evaluate the paper at this level.

    One thing we noticed (more on that in Part 3) is that the authors rely on JackStraw plots and clustree plots to identify the optimal combination of PCs and resolution to guide their clustering. This represents a relatively objective way of settling on clustering parameters. However, in a number of the UMAPs it looks like there are sub clusters that go undiscussed. E.g. in Fig. 2E clusters 1 and 3 are associated with smaller, distinct clusters and the same is true of clusters 2 and 6 in Fig 4b. Given that the authors are attempting to assemble a comprehensive atlas of fru+ neurons, it seems important for them to assess (at least transcriptomically) whether these are likely to represent distinct subpopulations.

    3). How interesting, and how accessible is this paper to people outside of the authors' immediate field? What does it contribute to the "big picture" science?

    Here, we think the authors missed an important opportunity by under-utilizing the Conclusions section. The manuscript has a combined "Results and Discussion" section, where the authors talk about their identification and analysis of specific cell clusters / cell types. Frankly, to a non-specialist this often reads like a laundry list, and the key conclusions are swamped by a flood of details. This is not to criticize that section - given the complexity and potential value of this dataset, we think it is entirely appropriate to describe all these details in the Results and Discussion. However, the Conclusions section does not, in its present form, pull it all back together. We recommend using that section to summarize the 5-8 most important high-level conclusions that the authors see emerging from their work. What are the most important take-home messages they want to convey to a developmental biologist who does not work on brains, or to a neurobiologist who does not work on Drosophila? The authors can enhance the value of this paper by making it more interesting and more accessible to a broader audience.