Improving the annotation of the Heterorhabditis bacteriophora genome
Abstract
Background
Genome assembly and annotation remain exacting tasks. As the tools available for these tasks improve, it is useful to return to data produced with earlier techniques to assess their credibility and correctness. The entomopathogenic nematode Heterorhabditis bacteriophora is widely used to control insect pests in horticulture. The genome sequence for this species was reported to encode an unusually high proportion of unique proteins and a paucity of secreted proteins compared to other related nematodes.
Findings
We revisited the H. bacteriophora genome assembly and gene predictions to determine whether these unusual characteristics were biological or methodological in origin. We mapped an independent resequencing dataset to the genome and used the blobtools pipeline to identify potential contaminants. While present (0.2% of the genome span, 0.4% of predicted proteins), assembly contamination was not significant.
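The two contamination figures are proportions of different totals: assembly span (summed scaffold length) and number of predicted proteins. A minimal sketch of that arithmetic, assuming each scaffold has already been given a phylum-level assignment (for example by a blobtools-style taxonomy step) and each protein has been mapped to the scaffold it sits on. The `Scaffold` record, the `"Nematoda"` target label, the `"no-hit"` convention and the toy values are all hypothetical, not the study's data or blobtools' own code.

```python
from dataclasses import dataclass


@dataclass
class Scaffold:
    name: str
    length: int   # scaffold length in bp
    phylum: str   # taxonomic assignment, e.g. "Nematoda", "Proteobacteria", "no-hit"


def contamination_fractions(scaffolds, protein_to_scaffold,
                            target="Nematoda", ignore=("no-hit",)):
    """Return (fraction of assembly span, fraction of proteins) on flagged scaffolds.

    A scaffold is flagged when its phylum is neither the target taxon nor one of
    the labels in `ignore` (unassigned scaffolds are not treated as contaminants).
    """
    flagged = {s.name for s in scaffolds
               if s.phylum != target and s.phylum not in ignore}
    total_span = sum(s.length for s in scaffolds)
    flagged_span = sum(s.length for s in scaffolds if s.name in flagged)
    flagged_proteins = sum(1 for scf in protein_to_scaffold.values() if scf in flagged)
    return flagged_span / total_span, flagged_proteins / len(protein_to_scaffold)


# Toy data (not from the study): a 10 kb assembly carrying 200 proteins.
scaffolds = [
    Scaffold("scf1", 9_900, "Nematoda"),
    Scaffold("scf2", 50, "no-hit"),
    Scaffold("scf3", 50, "Proteobacteria"),
]
proteins = {f"g{i}": "scf1" for i in range(198)}
proteins.update({"g198": "scf3", "g199": "scf3"})
span_frac, prot_frac = contamination_fractions(scaffolds, proteins)
print(f"{span_frac:.1%} of span, {prot_frac:.1%} of proteins")  # 0.5% of span, 1.0% of proteins
```

Reporting both numbers matters because a few short contaminant scaffolds can carry a disproportionate share of gene predictions, so the protein fraction can exceed the span fraction.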
Conclusions
Re-prediction of the gene set using BRAKER1 and published transcriptome data generated a predicted proteome that was very different from the published one. The new gene set had a much reduced complement of unique proteins, better completeness values that were in line with other related species' genomes, and an increased number of proteins predicted to be secreted. It is thus likely that methodological issues drove the apparent uniqueness of the initial H. bacteriophora genome annotation and that similar contamination and misannotation issues affect other published genome assemblies.
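The "unique proteins" comparison rests on orthology clustering: a protein is counted as unique to a species when every member of its cluster comes from that species, including single-protein clusters. A minimal sketch of that count, assuming the clustering has already been run and its output parsed into a dict mapping a cluster ID to `(species, protein_id)` pairs. The data layout, the species labels `"Hbac"`/`"Cele"` and the toy values are hypothetical, and the paper's exact definition of uniqueness may differ.

```python
def unique_protein_fraction(orthogroups, species):
    """Fraction of `species`' proteins whose orthogroup contains no other species.

    `orthogroups` maps a group ID to a list of (species, protein_id) pairs;
    single-protein groups (singletons) count as unique.
    """
    total = 0
    unique = 0
    for members in orthogroups.values():
        n_focal = sum(1 for sp, _ in members if sp == species)
        if n_focal == 0:
            continue  # group has no proteins from the focal species
        total += n_focal
        if all(sp == species for sp, _ in members):
            unique += n_focal
    return unique / total if total else 0.0


# Toy clusters (hypothetical IDs, not results from the study).
orthogroups = {
    "OG1": [("Hbac", "p1"), ("Cele", "c1")],   # shared with another species
    "OG2": [("Hbac", "p2"), ("Hbac", "p3")],   # species-specific paralogues
    "OG3": [("Hbac", "p4")],                   # singleton
    "OG4": [("Cele", "c2")],
}
print(unique_protein_fraction(orthogroups, "Hbac"))  # 0.75
```

Counting proteins rather than clusters means a species-specific family of paralogues weighs by its size, which is how fragmented or split gene models can inflate the apparent number of unique proteins.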
**Reviewer 3. Mary Ann Tuli**
This manuscript describes the reannotation of the genome of Heterorhabditis bacteriophora, an entomopathogenic nematode widely used to control insect pests in horticulture. A previous study reported that this genome encodes an unusually high proportion of unique proteins and a paucity of secreted proteins compared with other related nematodes. This study asked whether these unusual characteristics were biological or methodological in origin.
The work was carried out in the spirit of data improvement rather than as a rebuttal, and while it is not a genome paper as such, it does reanalyse a genome using new data and different tools. It is well suited to the GigaScience philosophy and readership because of its emphasis on reproducibility and open access.
I have checked that the Methods described and the Resources used meet the minimum standards reporting checklist. I note that the data have been submitted to public repositories (SRA and INSDC) but are not yet available, so they cannot be reviewed at the moment.
I have looked at the files in https://github.com/DRL/mclean2017. There are 9 supplementary files of annotation, analyses and annotation pipelines, which look thorough and complete. The repository also includes splice site files. The manuscript states that all custom scripts developed for this manuscript are available in this repository, but I see only a single script in the /analysis folder. Is this right?
The gene prediction and protein orthology analyses and discussion were thorough and fully explained, and future work (expanded transcriptome and comparative data work) is described.
My recommendation is that this manuscript be published as a research article.
I have listed some minor typos and suggestions below. They are probably more pertinent for a copy editor, but I include them here since I noted them down.
105 BUSCO; see below). Another unusual feature of the H. bacteriophora gene set was the -> 105 BUSCO; see Table 2). Another unusual feature of the H. bacteriophora gene set was the
107 Most nematode (and other metazoan) genomes have low proportions of non-canonical introns (less than 1%), [Reference needed]
137 from the new Illumina data and sequence similarity from the NCBI nucleotide database (nt) -> 137 from the new Illumina data and sequence similarity from the NCBI nucleotide (nt) database
371 The assembly scaffolds were aligned to the NCBI nucleotide (nt) database, -> 371 The assembly scaffolds were aligned to the NCBI nt database,
397 version of the assembly. Hard masking was for known Nematoda repeats from the -> 397 version of the assembly. The assembly was hard-masked for known Nematoda repeats from the….?
[Check for consistent use of hard masked / hard-masked and soft masked / soft-masked]
406 bacteriophora annotation was identified from the general feature format file, and then-> 406 bacteriophora annotation was identified from the general feature format (GFF) file, and then
407 selected from the protein FASTA files. The general feature format file (GFF) for -> 407 selected from the protein FASTA files. The GFF file for
415 from the general feature format file as exon features -> 415 from the GFF file as exon features
423 bacteriophora. Intronic features were added to GFF3 [Explain what GFF3 is]
[Check consistent use of GFF (line 415) / GFF file / GFF format (744, 749) Should be GFF file]
424 gff3 -sort -tidy -retainids -fixregionboundaries -addintrons') and and splice sites were -> 424 gff3 -sort -tidy -retainids -fixregionboundaries -addintrons') and splice sites were
445 the 23 Clade V nematodes were downloaded from WBPS8 (available at: 446 http://parasite.wormbase.org/index.html) [Suggest link to ftp://ftp.ebi.ac.uk/pub/databases/wormbase/parasite/releases/WBPS8/]
358 Parasite (WBPS8) [34]. [This is the first mention of WormBase Parasite, so the home page link should be given here rather than at line 446]
478 using MAFFT v7.267 (RRID:SCR_011811) [50], and the alignments trimmed with NOISY [Reference needed for NOISY.]
480 v8.1.20 (RRID:SCR_006086) [51] with a PROTGAMMAGTR [Reference needed for PROTGAMMAGTR]
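The notes above ask for GFF3 to be explained, quote the GenomeTools step that adds intron features before splice sites are extracted, and query the claim that most nematode genomes have under 1% non-canonical introns. GFF3 (Generic Feature Format version 3) is a nine-column, tab-delimited annotation format with 1-based, inclusive coordinates. A minimal sketch of what those steps compute, assuming exons have already been parsed from a GFF3 file into `(transcript_id, seqid, start, end, strand)` tuples and the genome is a dict of sequence strings. This is not GenomeTools' implementation; the function names and toy sequences are hypothetical, and only GT...AG is treated as canonical here (some analyses also count GC...AG as canonical).

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def introns_from_exons(exons):
    """Infer introns as the gaps between consecutive exons of each transcript.

    exons: iterable of (transcript_id, seqid, start, end, strand), 1-based inclusive.
    """
    by_tx = defaultdict(list)
    for tx, seqid, start, end, strand in exons:
        by_tx[tx].append((seqid, start, end, strand))
    introns = []
    for tx, parts in by_tx.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        for (seqid, _, prev_end, strand), (_, next_start, _, _) in zip(parts, parts[1:]):
            if next_start > prev_end + 1:  # skip abutting or overlapping exons
                introns.append((tx, seqid, prev_end + 1, next_start - 1, strand))
    return introns


def splice_dinucleotides(intron, genome):
    """Return (donor, acceptor) dinucleotides read on the transcript's strand."""
    _, seqid, start, end, strand = intron
    seq = genome[seqid][start - 1:end].upper()  # convert to 0-based slice
    if strand == "-":
        seq = revcomp(seq)
    return seq[:2], seq[-2:]


def noncanonical_fraction(introns, genome):
    if not introns:
        return 0.0
    odd = sum(1 for i in introns if splice_dinucleotides(i, genome) != ("GT", "AG"))
    return odd / len(introns)


# Toy example (hypothetical sequences, not H. bacteriophora data).
exons = [
    ("tx1", "chr1", 1, 5, "+"),
    ("tx1", "chr1", 16, 20, "+"),
    ("tx2", "chr2", 10, 12, "-"),
    ("tx2", "chr2", 1, 3, "-"),
]
genome = {
    "chr1": "AAAAA" + "GTCCCCCCAG" + "TTTTT",  # GT...AG intron at 6-15
    "chr2": "CCC" + "GTCCAT" + "GGG",          # minus-strand intron reads AT...AC
}
introns = introns_from_exons(exons)
print(introns)
print(noncanonical_fraction(introns, genome))  # 0.5
```

Reading the dinucleotides after reverse-complementing minus-strand introns is essential; without it, every canonical minus-strand intron would appear as CT...AC and inflate the non-canonical count.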
Now published in GigaScience doi: 10.1093/gigascience/giy034
Florence McLean (1), Duncan Berger (1), Dominik R. Laetsch (1), Hillel T. Schwartz (2), Mark Blaxter (1)
1 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK
2 Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giy034 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.
These peer reviews were as follows:
Reviewer 1: http://dx.doi.org/10.5524/REVIEW.101075
Reviewer 2: http://dx.doi.org/10.5524/REVIEW.101076