A fully automated deep learning pipeline for high-throughput colony segmentation and classification
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (Review Commons)
Abstract
Adenine auxotrophy is a commonly used non-selective genetic marker in yeast research. It allows investigators to easily visualize and quantify various genetic and epigenetic events by simply reading out colony color. However, manual counting of large numbers of colonies is extremely time-consuming, difficult to reproduce and possibly inaccurate. Using cutting-edge neural networks, we have developed a fully automated pipeline for colony segmentation and classification, which speeds up white/red colony quantification 100-fold over manual counting by an experienced researcher. Our approach uses readily available training data and can be smoothly integrated into existing protocols, vastly speeding up screening assays and increasing the statistical power of experiments that employ adenine auxotrophy.
Article activity feed
Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Reply to the reviewers
Reviewer #1:
(Evidence, reproducibility and clarity (Required)):
This study aims to develop tools for yeast researchers to automatically segment and classify yeast colonies. The machine learning method enables rapid classification compared to manual counting.
**MAJOR CONCERNS:**
Please include additional details about the types of images that must be captured for segmentation and categorization. It is important to provide details of what level of magnification might be needed during image capture. We anticipate that providing clear protocols for altering thresholds to classify colonies might be one way to overcome this challenge.
That’s correct. Details on image acquisition, such as the level of magnification, are important to obtain accurate results.
To address this, we provide a detailed protocol in our companion article on ProtocolExchange: https://protocolexchange.researchsquare.com/article/nprot-7305/v1
We have updated the manuscript to include this link.
While the program crops colonies and segments them accurately, there is no spatial information of where these colonies are located in the image. This loss of spatial information limits the ability to use this platform to identify colonies of interest following experiments such as a genetic screen.
In principle it would be possible to retain the location of each cropped colony in the form of (x,y) coordinates in pixels. This could be included in a future release. However, we doubt the utility of such information for a genetic screen, unless identification of a positive hit could be linked to robotic picking of the identified colony (which would certainly go beyond the scope of this work). In reality, researchers will pick positive hits manually anyway, making our pipeline superfluous for such an application. We emphasize that we have developed our pipeline for large-scale quantification of red/white color assays. Here the pipeline makes a huge difference compared with manual counting.
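As an illustration of what retaining coordinates could involve, the following is a minimal sketch and not the pipeline's own code. It assumes the segmentation step yields a binary mask (1 = colony pixel) and uses a plain 4-connected flood fill; the function name `colony_boxes` and the mask layout are hypothetical.

```python
from collections import deque

def colony_boxes(mask):
    """Return one record per 4-connected blob in a binary mask.

    mask: list of rows, each a list of 0/1 ints (1 = colony pixel).
    Each record holds the bounding box (x_min, y_min, x_max, y_max)
    and the centroid (x, y), all in pixel coordinates.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    records = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects the pixels of one colony.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            records.append({
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
            })
    return records

# Toy mask with two colonies, invented for illustration.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
boxes = colony_boxes(mask)
```

Storing such a record next to each cropped image would be enough to map a predicted class back to a position on the plate.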
The inability to accurately recognize sectored colonies as sectored (rather than red) is a significant limitation to the usage of this program for quantitative assays. While differentiating between red and white colonies is useful, the conclusion by the authors about its value for quantitative assays is limited unless variegation can be accurately defined. The authors should either soften this conclusion or qualify what quantitative measurements might mean given the limitations of their classification program. This somewhat diminished our overall enthusiasm.
The reviewer correctly points out that our algorithm shows lower accuracy when differentiating between red and variegating colonies than when differentiating between white and non-white colonies (including red, variegating and pink). Given this observation, we initially focused on predicting white vs. non-white colonies with our tool. However, the output of our pipeline also includes more granular predictions of numbers of white, red, pink and variegating colonies. We therefore leave it up to the user to decide which level of granularity is more appropriate, taking into account the tradeoff between granularity and accuracy. In particular, we note that for the colonies we tested, splitting the non-white category of predictions into red, variegating and pink resulted in a decrease of sensitivity from 0.98 for the non-white category to 0.86-0.88 for the individual categories, while the corresponding specificity showed a smaller reduction from 1.0 to 0.97-0.98. Considering that a lack of any predictions for the red, pink, and variegating categories effectively prohibits the researcher from detecting them at all, even a reduced sensitivity may be better than nothing and therefore acceptable in this case. In order to make this clearer in the text, we provide a more detailed comparison of performance metrics between levels of prediction, which may help to guide the user’s decision.
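The tradeoff between granularity and accuracy can be made concrete with a short sketch. It computes one-vs-rest sensitivity and specificity at both levels of prediction; the labels below are invented for illustration and are not data from the manuscript, and the helper names are hypothetical.

```python
def sensitivity_specificity(truth, predicted, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

def collapse(labels):
    """Map the four granular classes onto white / non-white."""
    return ["white" if label == "white" else "non-white" for label in labels]

# Toy labels, invented for illustration only.
truth     = ["white", "white", "red", "red", "pink", "variegating", "red", "white"]
predicted = ["white", "white", "red", "pink", "pink", "red", "red", "white"]

# Coarse level: confusions among red, pink and variegating disappear.
coarse = sensitivity_specificity(collapse(truth), collapse(predicted), "non-white")
# Granular level: the same confusions now count as errors for "red".
fine_red = sensitivity_specificity(truth, predicted, "red")
```

In this toy case every error is a confusion between two non-white classes, so the coarse metrics stay perfect while the per-class metrics for red drop.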
This program must be benchmarked with other colony classifiers. Cell Profiler is an example of a popular yeast colony segmentation program. How does this machine learning based tool compare with other colony segmentation and categorization programs? One possibility is to include an additional figure that compares their program with clear benchmarks. The outcome of the benchmarking is less important, since we believe it is useful to have many alternatives for yeast segmentation and categorization. We think this revision would be essential to the manuscript and would add significant value.
We have used other approaches and were not satisfied with the outcomes. Hence, we developed our own pipeline, specifically designed to accurately distinguish red from white colonies and quantify such assays at a large scale.
When using CellProfiler we could not reliably distinguish variegating, pink, and red colonies. White colonies show up in the red, green and blue channels, whereas red colonies show up mainly in the red channel. Variegating, pink and red colonies can therefore be distinguished from white colonies only by their reduced blue and green values, which is indirect and caused several issues. One problem was reflection of the flash during image acquisition, which produced two reflective white patches on each colony whose size in pixels depended on the magnification and colony size. We tried to prevent reflection with a ‘tent’, which reduced but could not eliminate it. As a result, the MaxIntensity of the green and blue channels was always the same for every colony, impeding classification. Furthermore, most red and pink colonies had a slim white rim of varying width, and the area of the rim relative to the colony depended on colony size, which made it impossible to tell a slightly variegating colony from a red one using the output values from CellProfiler.
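The reflection problem can be reproduced with a toy example. The sketch below does not use CellProfiler; the pixel values are invented, and a handful of saturated pixels stand in for the flash reflection. It shows why the maximum green intensity carries no information once every colony contains a glare patch.

```python
from statistics import median

def channel_summary(pixels, channel):
    """Return (max, median) of one channel over a colony's RGB pixels."""
    values = [p[channel] for p in pixels]
    return max(values), median(values)

GREEN = 1  # index of the green channel in an (R, G, B) tuple
WHITE, RED, GLARE = (230, 230, 230), (200, 40, 40), (255, 255, 255)

# Toy colonies, invented for illustration: a uniform colour plus a few
# saturated pixels standing in for the flash reflection.
white_colony = [WHITE] * 96 + [GLARE] * 4
red_colony = [RED] * 96 + [GLARE] * 4

white_max, white_med = channel_summary(white_colony, GREEN)
red_max, red_med = channel_summary(red_colony, GREEN)
```

Both colonies reach the same maximum because of the glare pixels, while the median still separates them in this toy case. A median would not address the white-rim issue described above, where the share of white pixels varies with colony size.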
If deemed useful by the editors, we will be happy to mention this in the manuscript. A systematic comparison with other classifiers seems to be a bit of an overkill though. As stated by this reviewer, the outcome of such comparison would not matter much. It is important that the community has several approaches to choose from, so that the best solution can be found for each specific application.
**MINOR CONCERN**
The program currently saves cropped images of each segmented colony. This takes up a lot of storage space. It might be useful to provide an option to save or not save these cropped images. This flexibility will be valuable for users but does not detract from the major conclusions of the manuscript.
While we appreciate that the need to save individual images of cropped colonies may be a drawback for some users, in the current implementation it is not possible to avoid this step. One could imagine a scenario in which all cropped images were stored in RAM prior to classification rather than written to a computer’s disk; however, we believe that most users would have more limitations on the availability of RAM than on disk storage, therefore making this option also not feasible.
The authors have provided excellent examples of colonies they believe are red, white or sectored. More accurately defining a pink colony would be valuable for users of this program. How much of red is classified as pink by this program?
As the reviewer points out, it is difficult to give an objective definition of a pink colony. In this case, we relied exclusively on subjective expert annotations to define which colonies were pink (as well as for all other categories).
We acknowledge that this may introduce some error into the model, as there may be some overlap between red and pink colonies or between pink and variegating colonies; however, this problem also exists in the case of manual annotation. As shown in Figure 1d, for the colonies we tested, 4 out of a total of 55 colonies annotated as red by an expert were predicted as pink by our algorithm. We would like to emphasize that our pipeline alleviates biases between different researchers who would be annotating colony color manually, therefore improving reproducibility. Such biases could be subjective or objective, such as different monitors used to inspect the images.
Providing an example data set with the protocol would be helpful for users with limited Python experience. In combination with their protocol on Protocol exchange, this would serve as a valuable resource for novices in programming.
We agree with the reviewer’s suggestion and will be happy to provide an example dataset used in the manuscript. We will defer to the journal’s guidelines as to the best way to share these raw images.
One technical issue of the program is that the program tries to open all files in the specified folder even if they aren't jpg. This causes problems if there are additional or hidden files in the folder and the program cannot process the additional files.
We appreciate the reviewer pointing out this issue and have fixed it in a new version of the code.
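How the fix was implemented is not described here. One way such a filter could look is sketched below, using only the Python standard library; the function name `list_jpegs` and the accepted extensions are assumptions for illustration.

```python
from pathlib import Path
import tempfile

def list_jpegs(folder):
    """Return the JPEG files in `folder`, skipping hidden and non-image files."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file()
        and not path.name.startswith(".")  # e.g. .DS_Store, ._resource forks
        and path.suffix.lower() in {".jpg", ".jpeg"}
    )

# Self-contained demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("plate_01.jpg", "plate_02.JPG", ".DS_Store",
                 "notes.txt", "._plate_01.jpg"):
        (Path(tmp) / name).touch()
    found = [path.name for path in list_jpegs(tmp)]
```

Matching on the lower-cased suffix accepts both `.jpg` and `.JPG`, and the leading-dot check skips the hidden files that operating systems tend to add to image folders.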
Reviewer #1 (Significance (Required)):
This manuscript describes a machine learning approach to segment and categorize yeast colonies based on a red/white selection assay. The approach has been implemented using Python, which makes it widely accessible to many researchers. Their detailed protocol on Protocol Exchange is a valuable resource which made it possible for us to evaluate its performance. The program meets its goal of reducing the time users spend on manual counting. It is also reasonably accurate in discriminating between red and white colonies based on our initial tests. However, there are several important concerns that the authors will need to address before this manuscript can become a valuable resource for the yeast community. It is important to note that our framework is one where we have a great interest in quantitative yeast genetics but cannot evaluate the strengths and weaknesses of the computational approach. Much of the review is therefore focused on what would be needed to make this tool more appropriate for users.
Reviewer #2:
(Evidence, reproducibility and clarity (Required)):
**Summary:**
Carl et al. present an application of deep learning-based image analysis that segments and classifies individual yeast colonies by their phenotype in a special plate. They evaluated the method and show that it provides accuracy similar to that achieved by experts' manual classification.
**Major comments:**
The key conclusions are convincing. The evaluation is performed on 3 datasets showing different properties (strong presence of phenotype, almost lack of the phenotype, gradual change of the phenotype).
The claims are carefully formulated. The deep learning methodology (training, validation, using modern technologies such as transfer learning, Unet, augmentation) is carefully designed and carried out. The evaluation is sound. The limitations are discussed.
For a short paper as it's formulated currently, no additional experiments are necessary.
The methods are implemented and are available on GitHub.
However, I'd strongly recommend also sharing the data used in the paper, both to make reproduction of the results possible and to serve as examples for future users.
As stated above, we agree with the reviewer’s suggestion and will be happy to provide an example dataset used in the manuscript. We will defer to the journal’s guidelines as to the best way to share these raw images.
No replicates are provided unfortunately. The manuscript would benefit from showing results from replicates, especially because they should be easily obtainable.
It is not clear to us to which experiment the reviewer is referring. All of the results presented in Figure 2 did include replicates, as detailed in the figure legend.
**Minor comments:**
I'm not familiar with the state of the art to judge on whether prior studies are referenced.
The text and figures are very clear and well formulated.
Reviewer #2 (Significance (Required)):
Although the conceptual innovation is average, the method is well-developed and seems to be very useful for yeast analysis.
I'm not an expert in the application area to judge the state of the art. The deep learning methodology carried out is top notch.
The manuscript can be interesting and useful for experts using the described assay for yeast.
My expertise is in omics, image analysis, and machine learning.
-
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #2
Evidence, reproducibility and clarity
Summary:
Carl et al. present an application of deep learning-based image analysis that segments and classifies individual yeast colonies by their phenotype in a special plate. They evaluated the method and show that it provides accuracy similar to that achieved by experts' manual classification.
Major comments:
The key conclusions are convincing. The evaluation is performed on 3 datasets showing different properties (strong presence of phenotype, almost lack of the phenotype, gradual change of the phenotype).
The claims are carefully formulated. The deep learning methodology (training, validation, using modern technologies such as transfer learning, Unet, augmentation) is carefully designed and carried out. The evaluation is sound. The limitations are discussed.
For a short paper as it's formulated currently, no additional experiments are necessary.
The methods are implemented and are available on GitHub.
However, I'd strongly recommend also sharing the data used in the paper, both to make reproduction of the results possible and to serve as examples for future users.
No replicates are provided unfortunately. The manuscript would benefit from showing results from replicates, especially because they should be easily obtainable.
Minor comments:
I'm not familiar with the state of the art to judge on whether prior studies are referenced.
The text and figures are very clear and well formulated.
Significance
Although the conceptual innovation is average, the method is well-developed and seems to be very useful for yeast analysis.
I'm not an expert in the application area to judge the state of the art. The deep learning methodology carried out is top notch.
The manuscript can be interesting and useful for experts using the described assay for yeast.
My expertise is in omics, image analysis, and machine learning.
-
Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.
Learn more at Review Commons
Referee #1
Evidence, reproducibility and clarity
This study aims to develop tools for yeast researchers to automatically segment and classify yeast colonies. The machine learning method enables rapid classification compared to manual counting.
MAJOR CONCERNS:
Please include additional details about the types of images that must be captured for segmentation and categorization. It is important to provide details of what level of magnification might be needed during image capture. We anticipate that providing clear protocols for altering thresholds to classify colonies might be one way to overcome this challenge.
While the program crops colonies and segments them accurately, there is no spatial information of where these colonies are located in the image. This loss of spatial information limits the ability to use this platform to identify colonies of interest following experiments such as a genetic screen.
The inability to accurately recognize sectored colonies as sectored (rather than red) is a significant limitation to the usage of this program for quantitative assays. While differentiating between red and white colonies is useful, the conclusion by the authors about its value for quantitative assays is limited unless variegation can be accurately defined. The authors should either soften this conclusion or qualify what quantitative measurements might mean given the limitations of their classification program. This somewhat diminished our overall enthusiasm.
This program must be benchmarked with other colony classifiers. Cell Profiler is an example of a popular yeast colony segmentation program. How does this machine learning based tool compare with other colony segmentation and categorization programs? One possibility is to include an additional figure that compares their program with clear benchmarks. The outcome of the benchmarking is less important, since we believe it is useful to have many alternatives for yeast segmentation and categorization. We think this revision would be essential to the manuscript and would add significant value.
MINOR CONCERN
The program currently saves cropped images of each segmented colony. This takes up a lot of storage space. It might be useful to provide an option to save or not save these cropped images. This flexibility will be valuable for users but does not detract from the major conclusions of the manuscript.
The authors have provided excellent examples of colonies they believe are red, white or sectored. More accurately defining a pink colony would be valuable for users of this program. How much of red is classified as pink by this program?
Providing an example data set with the protocol would be helpful for users with limited Python experience. In combination with their protocol on Protocol exchange, this would serve as a valuable resource for novices in programming.
One technical issue of the program is that the program tries to open all files in the specified folder even if they aren't jpg. This causes problems if there are additional or hidden files in the folder and the program cannot process the additional files.
Significance
This manuscript describes a machine learning approach to segment and categorize yeast colonies based on a red/white selection assay. The approach has been implemented using Python, which makes it widely accessible to many researchers. Their detailed protocol on Protocol Exchange is a valuable resource which made it possible for us to evaluate its performance. The program meets its goal of reducing the time users spend on manual counting. It is also reasonably accurate in discriminating between red and white colonies based on our initial tests. However, there are several important concerns that the authors will need to address before this manuscript can become a valuable resource for the yeast community. It is important to note that our framework is one where we have a great interest in quantitative yeast genetics but cannot evaluate the strengths and weaknesses of the computational approach. Much of the review is therefore focused on what would be needed to make this tool more appropriate for users.
-
