GuideMaker: Software to design CRISPR-Cas guide RNA pools in non-model genomes

Abstract

Background

CRISPR-Cas systems have expanded the possibilities for gene editing in bacteria and eukaryotes. There are many excellent tools for designing CRISPR-Cas guide RNAs (gRNAs) for model organisms with standard Cas enzymes. GuideMaker is intended as a fast and easy-to-use design tool for challenging projects involving (i) non-standard Cas enzymes, (ii) non-model organisms, or (iii) the design of a panel of gRNAs for genome-wide screens.

Findings

GuideMaker can rapidly design gRNAs for gene targets across the genome using a degenerate protospacer-adjacent motif (PAM) and a genome. The tool applies hierarchical navigable small world graphs to speed up the comparison of guide RNAs and optionally provides on-target and off-target scoring. This allows the user to design effective gRNAs targeting all genes in a typical bacterial genome in ∼1–2 minutes.
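As an illustration of what searching with a degenerate PAM involves, the following minimal sketch expands IUPAC codes into a regular expression and reports candidate protospacers on the forward strand only. The function names (`pam_to_regex`, `find_guides`), the 3' PAM orientation, and the demo sequence are assumptions made for this example; this is not GuideMaker's implementation.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pam_to_regex(pam: str) -> str:
    """Translate a degenerate PAM such as 'NGG' into a regex pattern."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_guides(seq: str, pam: str, guide_len: int = 20):
    """Yield (start, guide, pam_site) for 3' PAMs on the forward strand only."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping PAM sites all be reported.
    pattern = re.compile(f"(?=({pam_to_regex(pam)}))")
    for match in pattern.finditer(seq):
        pam_start = match.start()
        # Skip PAM sites that lack a full-length protospacer upstream.
        if pam_start >= guide_len:
            yield pam_start - guide_len, seq[pam_start - guide_len:pam_start], match.group(1)


# Hypothetical demo sequence: a 20-nt protospacer followed by a TGG PAM.
demo = "ACGTACGTACGTACGTACGTTGGAAC"
hits = list(find_guides(demo, "NGG"))
print(hits)
```

A complete search would also scan the reverse complement and handle enzymes whose PAM lies 5' of the protospacer.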

Conclusions

GuideMaker enables the rapid design of genome-wide gRNAs for any CRISPR-Cas enzyme in non-model organisms. While GuideMaker is designed with prokaryotic genomes in mind, it can efficiently process eukaryotic genomes as well. GuideMaker is available as command-line software, a stand-alone web application, and a tool in the CyVerse Discovery Environment. All versions are available under a Creative Commons CC0 1.0 Universal Public Domain Dedication.

Article activity feed

  1. This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giac007), which carries out open, named peer-review.

    These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer 3: Mudra Hegde

    Summary:

    In this manuscript, Poudel et al. present GuideMaker, software to rapidly design sgRNAs targeting non-model genomes. Various input parameters, such as the PAM motif, guide length, and length of the seed region for off-target searching, can be adjusted to design a panel of sgRNAs for pooled screening projects. The tool also helps pick control sgRNAs to include in the sgRNA pool. To benchmark the computational performance of their tool, the authors used GuideMaker to design sgRNAs targeting E. coli, P. aeruginosa, Aspergillus fumigatus and Arabidopsis thaliana. They also compared GuideMaker to the existing design tool CHOPCHOP and reported that the targets identified by GuideMaker were mostly similar to those identified by CHOPCHOP. This tool can be used as a stand-alone web application, as command-line software, or in the CyVerse Discovery Environment.

    Overall, the tool is very well documented and easy to use. In the current version of the manuscript, GuideMaker does not show a clear improvement over the state-of-the-art design tool, CHOPCHOP. The authors do not implement any existing on-target scoring methods to determine the targeting efficacy of the picked sgRNAs. This can lead to picking guides that are highly specific but not effective enough.

    Major points:

    1. Implementing on-target scoring methods, at least for the Cas enzymes that have on-target efficacy information, can help improve the process of picking sgRNAs. This tool will probably be used more often with standard Cas enzymes and it will be useful to have on-target efficacy scores attached to the guide RNAs.

    2. The authors do a thorough analysis of the computational performance of GuideMaker with various genomes and Cas enzymes but including a comparison of the computational performance of GuideMaker vs. CHOPCHOP will strengthen the manuscript.

    3. The authors define the PAM sequence of SaCas9 to be NGRRT whereas the canonical PAM sequence of SaCas9 is NNGRRT. This should be modified throughout the manuscript and analyses involving SaCas9 should be redone.

    4. A good addition to the tool would be to output a file with all the sequences that were designed targeting the region of interest with the specific PAM sequence. This gives the user a sense of the universe from which the final guides were picked.

    5. Another useful input parameter would be to specify a target region that the user wants to focus on such as letting the user input genomic coordinates or a gene name or locus tag. For example, CRISPy by Blin et al., 2016 takes a GenBank file as input and allows the user to input features specific to the uploaded genome.
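The practical consequence of major point 3 can be sketched in a few lines. The example below, which assumes a 3' PAM on the forward strand and a 20-nt protospacer, expands both motifs and shows that searching with NGRRT instead of NNGRRT shifts the reported protospacer by one base, because the first N of the canonical PAM is treated as the last base of the guide. The helper names and the test sequence are hypothetical and are not taken from GuideMaker.

```python
import re
from itertools import product

# Only the IUPAC codes needed for the two SaCas9 motifs discussed above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def expand(pam):
    """Return every concrete sequence matched by a degenerate PAM."""
    return {"".join(p) for p in product(*(IUPAC[b] for b in pam))}


def protospacers(seq, pam, guide_len=20):
    """Return the guide_len bases immediately 5' of each forward-strand PAM match."""
    pattern = re.compile("(?=" + "".join(f"[{IUPAC[b]}]" for b in pam) + ")")
    return [seq[m.start() - guide_len:m.start()]
            for m in pattern.finditer(seq) if m.start() >= guide_len]


short_set = expand("NGRRT")    # 4 * 2 * 2 concrete PAMs
canon_set = expand("NNGRRT")   # 4 * 4 * 2 * 2 concrete PAMs
print(len(short_set), len(canon_set))  # 16 64

# Hypothetical site: 21 nt of upstream sequence followed by CTGAGT (matches NNGRRT).
seq = "GATTACA" * 3 + "CTGAGT"
short_guides = protospacers(seq, "NGRRT")
canon_guides = protospacers(seq, "NNGRRT")
print(short_guides, canon_guides)
```

The two searches find the same genomic site, but the guide reported under NGRRT starts and ends one base later than the guide reported under NNGRRT.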

    Minor points:

    1. "CyVerse" is misspelled as "CyCVerse" in multiple places in the manuscript.

    2. Reference Figure 2 in Line 92.

    3. Line 154: "Ratios between tools were calculated by dividing the number of gRNA identified.."

    4. In Supplementary Figure 3 "wit haVX2" should be "with AVX2".

    5. GitHub link in Line 336 does not work.

    6. Line 225-226: "GuideMaker also creates off-target gRNAs for use as negative controls in highthroughput experiments." "Off-target gRNAs" is misleading in this context.

  2. Reviewer 2: Wen Wang

    The authors developed GuideMaker, software for designing CRISPR-Cas guide RNA pools in non-model genomes. Three bacterial genomes, a fungal genome, and a plant genome were used in performance benchmarking, which shows that the software supports the design of gRNAs with non-standard Cas enzymes for non-model organisms at the genome scale. However, the advantages of this software over other tools such as CHOPCHOP are neither well assessed nor well presented. Also, the software was mainly evaluated on three bacterial genomes, one fungal genome, and the Arabidopsis genome. There are no tests for non-model plant or animal genomes. Therefore, the "non-model genomes" in the title is exaggerated. I list more problems as follows.

    Major comments:

    1. The authors did not compare the computational resources and performance (running time, memory) with existing software such as CHOPCHOP. Also, the authors need to compare the score rankings with CHOPCHOP to present the relative power of GuideMaker. Are there any score rankings concerning efficiency or off-target possibilities for the designed guide RNAs?

    2. It would be better to add support for GFF-formatted annotation input files, since many non-model species do not have GenBank annotations.

    3. The authors mention that GuideMaker can design gRNAs for any small to medium-sized genome (up to about 500 megabases). The largest genome used in the article was Arabidopsis thaliana (114.1 MB), which is considerably smaller than the stated limit (about 500 megabases). We could not find a description of whether the authors had investigated larger genomes. Therefore, a detailed analysis or discussion of this issue is needed.

    4. The authors state that GuideMaker designs CRISPR-Cas guide RNA pools in non-model genomes. Arabidopsis thaliana is a model organism, and a test in a non-model plant genome would be highly valuable.

    5. It is also stated that GuideMaker can design gRNAs for any PAM sequence from any Cas system, but the results for SaCas9 and StCas9 were described in only one sentence.

    6. The source of the genomes was missing in the manuscript. In particular, some species have multiple genome versions in the same database. Therefore, to make the results more repeatable, the specific website and version number for each species are needed.
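To make the GFF suggestion in major comment 2 concrete, the following minimal sketch reads gene features from GFF3 text using only the standard library. The function name `parse_gff3_genes`, the returned dictionary layout, and the three example records are assumptions for illustration; the sketch does not describe how GuideMaker reads annotations.

```python
import io
from urllib.parse import unquote


def parse_gff3_genes(handle, feature_type="gene"):
    """Collect seqid, coordinates, strand and ID for features of one type."""
    genes = []
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds semicolon-separated, URL-escaped key=value pairs.
        attrs = {}
        for field in cols[8].split(";"):
            if field:
                key, _, value = field.partition("=")
                attrs[key] = unquote(value)
        genes.append({
            "seqid": cols[0],
            "start": int(cols[3]),  # GFF3 coordinates are 1-based, inclusive
            "end": int(cols[4]),
            "strand": cols[6],
            "id": attrs.get("ID"),
        })
    return genes


# Hypothetical records, for illustration only.
example = io.StringIO(
    "##gff-version 3\n"
    "chr1\tdemo\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=abcA\n"
    "chr1\tdemo\tCDS\t100\t900\t.\t+\t0\tID=cds0001;Parent=gene0001\n"
    "chr1\tdemo\tgene\t1200\t1800\t.\t-\t.\tID=gene0002\n"
)
genes = parse_gff3_genes(example)
print(genes)
```

A GFF3 file carries coordinates but not necessarily the sequence, so a matching FASTA file would normally be needed alongside it.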

    Minor comments: There are many typos. I give some examples here.

    1. Line 11, "bacteria" should be "bacterias".

    2. Line 38, delete ", including non-model organisms"; prokaryotic and eukaryotic organisms already include the non-model organisms.

    3. Line 111, "candidates guides" should be "candidate guides".

    4. Line 154, "gRNA identify with GuideMaker" should be "gRNA identified with GuideMaker".

    5. Line 195, "The second way GuideMaker reduces…" should be "The second way that GuideMaker reduces…".

    6. Line 204, "and", no need for italics.

    7. Line 207, "gRNA's" should be "gRNAs".

    8. Lines 209-210, "we anticipate performance will…" should be "we anticipate that performance will…".

    9. Figure 1. The font size of the description of Control gRNAs seems inconsistent with the others; please check.

    10. Lines 22, 55, 98, 159, 175, 187, 219 and 247, "Guidemaker" should be "GuideMaker".

    11. Line 262, "CAS" should be "Cas".

    12. Supplementary Figure 4. Grammar mistake in sentence "the different number of logical cores with or without AVX2 settings are available". It should be "the different number of logical cores with or without AVX2 settings is available".

  3. Reviewer 1: Kornel Labun

    In this study, "GuideMaker: Software to design CRISPR-Cas guide RNA pools in non-model genomes", Poudel et al. provide software for sgRNA design, focusing on genome-wide screens. The tool uses the original strategy of finding off-targets with Hierarchical Navigable Small World graphs to provide fast running times for the all-vs-all comparison. Additional novelty is introduced with proximity filters towards features of interest and filters for restriction sites inside the guide RNA. What's more, the tool creates control guide RNAs, which are mandatory for pooled screens. I applaud the selection of the license, as all versions of GuideMaker are available under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Below I list some of my comments and suggestions.

    Comments & suggestions:

    1. I tested the website and the tool and did not find any bugs or errors. The website is well made, congratulations!

    2. Name of the tool: GuideMaker is not self-explanatory about what it specializes in, which is pooled design. In the future, consider naming your tools more distinctly, as I am afraid that the tool will currently be buried under hundreds of other GuideSomething tools.

    3. The authors also claim to support Cas13 (page 3, line 65) but do not mention anything more specific about it. I mention this because design for RNA is vastly different from design for DNA, and it should be explained how the tool designs for RNA.

    4. From my understanding, the tool offers highly discriminatory settings for the off-target search to quickly resolve the all-vs-all comparison problem. However, the authors ignore that CRISPR off-targets are defined not by the Hamming distance but by the Levenshtein distance. This has already been shown by many studies, e.g. Tsai et al. 2015. I recommend that the authors address this issue in the paper and explain why their design may be suitable, and for what kinds of studies it would be acceptable to use Hamming distance rather than Levenshtein distance, instead of ignoring the problem.

    5. The study could gain prominence by showing a couple of figures and describing how the grid-optimization parameters were selected. This would be especially important for everyone who wants to use this tool for non-bacterial genomes (page 6, lines 128-131). Although the script for optimization is included, it would be good to see what the tradeoffs are.

    6. I believe that Figure 4 and all other AVX2 vs non-AVX2 comparisons are not interesting enough to include multiple times. AVX2 improvements are nice, but the tool is already plenty fast, and a running time of 250 vs 220 seconds does not matter for normal users. Similarly, the number of cores does not seem to influence tool speed above 8 cores, and one figure should be enough to explain that. The tool claims very fast running times but is not compared with the running times of other similar tools for the design of pooled screens; such a comparison could highlight its superiority.

    7. CHOPCHOP is a general tool for the design of pooled screens while here it is used as a pooled screen tool due to its configurability. Additionally, CHOPCHOP also supports all PAMs and all species, but only in its Python version, available here: https://bitbucket.org/valenlab/chopchop/src/master/. The website supports only some genomes because index building for Bowtie is slow.

    8. Comparisons to CHOPCHOP focus on the guides found, but I don't understand why the consensus ratio between the tools should matter. What is more important is whether GuideMaker does indeed not filter out any guides that are preferable for each gene (e.g. by CHOPCHOP ranking) and whether its Hamming-based filter is good enough not to cause significant unknown off-target effects (Levenshtein-distance off-targets not found by the Hamming-distance filter). All it takes is one bulge and the Hamming distance will become large, while the Levenshtein distance can be as low as 1.

    9. It is not clear to me why the tool can't be used with large genomes; filtering on the 11 bp seed and Hamming distance should be plenty fast even for very large genomes. Could it be that the tool should support other inputs, not only the GenBank file format?
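The Hamming versus Levenshtein concern raised in the comments above can be illustrated with a small sketch. It assumes a hypothetical 20-nt guide and an off-target site that differs from it by a single inserted base (a DNA bulge), and it assumes that a fixed-length filter compares the guide with the 20 nt closest to a 3' PAM. Neither the sequences nor the functions come from GuideMaker or CHOPCHOP.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Edit distance allowing substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x from a
                            curr[j - 1] + 1,          # insert y into a
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


guide = "GATTACAGGCTTACCATGCA"          # hypothetical 20-nt guide
# Hypothetical off-target site: the same sequence with one extra base
# inserted two positions from the PAM-proximal (3') end.
site = guide[:18] + "T" + guide[18:]    # 21 nt
window = site[1:]                        # the 20 nt a fixed-length filter would compare

bulge_edit_distance = levenshtein(guide, site)
window_hamming = hamming(guide, window)
print(bulge_edit_distance, window_hamming)
```

Because the insertion shifts every upstream base by one position, the fixed-length comparison sees many mismatches even though a single edit separates the two sequences, so a Hamming-only filter would not flag this site as a close off-target.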