Draft genome sequence of Actinopolyspora saharensis DSM 46666, a rare actinomycete isolated from the Algerian Sahara

Abstract

Actinopolyspora saharensis DSM 46666 (=H244), a rare actinomycete, was isolated from a palm grove in the oasis of Inghid in the Mzab region of the Algerian Sahara and deposited in the German Collection of Microorganisms and Cell Cultures (DSMZ). Here, we report the draft genome sequence of strain DSM 46666, with a size of 4.67 Mbp and a G+C content of 69.5 %. Bioinformatic analysis of the genomic sequence revealed the carbohydrate-active enzyme gene repertoires involved in carbohydrate metabolism and the occurrence of 14 biosynthetic gene clusters disclosing the secondary metabolite capacity of strain DSM 46666.

Article activity feed

  1. Thank you for submitting your revised manuscript to Access Microbiology addressing the reviewer comments. I am pleased to let you know that it has now been accepted for publication. Congratulations!

  2. Comments to Author

    I wish to thank the authors, who have satisfactorily addressed all the points that I raised on the original submission.

    Please confirm that no generative AI tools or large language models have been used to generate this peer review report or to assist with any part of the peer review process.

    I confirm no generative AI tools were used in preparation of this review.

    Please rate the manuscript for methodological rigour

    Satisfactory

    Please rate the quality of the presentation and structure of the manuscript

    Satisfactory

    To what extent are the conclusions supported by the data?

    Strongly support

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    No: The work does not involve human / animal work.

  3. Thank you very much for submitting your manuscript to Access Microbiology. It has now been reviewed by two experts in the field, whose comments are attached below. Both raised several concerns that will require major corrections to the manuscript. These include, but are not limited to:

     - sequencing and assembly information and parameters missing or partially missing
     - Methods lacking sufficient depth and detail
     - lack of clarity about the location where the biological samples were sourced
     - datasets must be made publicly available in a repository, according to our Open Data Policy

     These amendments, along with all other reviewer suggestions, need to be addressed in a revised version of the manuscript. Please provide a revised version of the manuscript (including a tracked changes version) and a point-by-point response to the reviewer comments within 60 days.

  4. Comments to Author

    Draft Genome Sequence of Actinopolyspora saharensis DSM 46666, a Rare Actinomycete Isolated from the Algerian Sahara

    In this study, the authors isolated a bacterium from the Algerian Sahara, characterised it as a halophile, isolated its DNA, sequenced the whole genome, and present their genome assembly. They further conducted genome analyses, including overall genome relatedness, so as to assign the taxonomy of this strain. Lastly, they conducted analyses to determine the biosynthetic potential of this strain. A draft genome assembly of a halophilic bacterium isolated from soils in the Algerian Sahara is of much benefit to investigations on the ecology, diversity and biochemistry of halophiles. That said, there are concerns about how the genome was assembled and how the sample was collected that need to be addressed, so as to ensure that subsequent in silico genome analyses of this strain are based on accurate data obtained via entirely reproducible methods.

    Methodological rigour, reproducibility and availability of underlying data

    Sample collection

    Line 49 states "isolated from a desert soil sample in Algeria". Where exactly in the desert was the sample collected? Can you give GPS co-ordinates? If those are not available, please give the name of the local area or "neighbourhood" used by residents of that area. From what depth was the soil sample taken? What is the general description of the soil? Sandy/loam/clay? Clarifying these would enable the sample collection steps to be "reproduced" by other researchers and also aid in studying the ecology of this strain. In line with the CBD, were prior informed consent forms obtained during sample collection?

    Genome assembly data

    In line with the open platform nature of Access Microbiology, I would recommend that the reads used for assembly be made publicly available through repositories like SRA. The GenBank record for JBPQHB000000000 states that the coverage is 53x, while the manuscript in line 60 states 30x. Please give the size (in bp) of all the reads sequenced, the quality control steps applied to these reads (e.g. calls with less than Q30 values were culled), the resultant size, and then the final size (in bp) of the reads used in the final assembly.

    Genome assembly

    Line 60. The authors assigned reads to a reference genome, and then state that it is a de novo genome assembly. This is self-contradictory. On what basis did the authors determine that the reference genome that was used was appropriate, since this isolate is novel? Please also justify inclusion of a step for assigning reads to a reference genome that is not from the exact same sample, especially when current tools (SPAdes, Flye) and online platforms for their application (usegalaxy, PATRIC) are easy to use and powerful enough to create high-quality assemblies without any need for a reference genome from another strain. Please provide contamination and completeness levels of this assembly. Since you have done taxonomic analysis using overall genome relatedness indices, it is important to show that the input data for these analyses were of sufficient quality (Riesco R, Trujillo ME. Update on the proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. Int J Syst Evol Microbiol. 2024 Mar;74(3):006300).

    Presentation of results

    The genome coverage is 30x, which is below the 50x threshold outlined for assigning taxonomy based on overall genome relatedness indices (Riesco R, Trujillo ME. Update on the proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. Int J Syst Evol Microbiol. 2024 Mar;74(3):006300). For ANI values, please include the alignment fraction. Also consider inclusion of AAI analysis and subsequent results alongside those of ANI. Overall genome relatedness indices are only one facet in the identification of a bacterium. Therefore, please clearly state available findings on other facets, such as the growth, colony morphology and ecological characteristics, that would identify this strain as Actinopolyspora.

    Fig 1. Include a scale bar. Improve the quality of Fig. 1 by making the lines thicker and using higher resolution images (300 dpi and above). Use of online tree editing platforms such as iTOL is recommended to do this.

    Fig 2. This figure is a screenshot of the antiSMASH output and demonstrates a lack of creativity in presenting these findings. I recommend that the authors apply other apt and creative ways of presenting data on the loci and predicted final products of BGCs.

    Table 2. Please present this data in an apt comparative figure, e.g., a dual data multi-level pie chart.

    How the style and organization of the paper communicates and represents key findings

    The key findings are a genome assembly from a halophile whose habitat is the Sahara desert. In silico genome analyses to identify this strain are key, and the authors did this. Also key are in silico genome analyses to show genes that encode traits useful to thrive in halophilic environments as well as in the specific niche in the Algerian Sahara. This is missing or not clearly brought out.

    Literature analysis or discussion

    Line 46-48. Please expound on the connections between studying a rare halophile and developing a broader understanding of the ecological diversity and functional capacities of actinomycetes. Use of concrete connections is most welcome.

    Line 88-102. It is not at all clear why the authors chose to focus on carbohydrate-active enzyme analysis for a halophile, especially in light of the study goals set out in Line 46-48. The same applies to the BGC analysis.

    Minor comments

    Line 46. For easy comprehension, consider use of "In view/In line" rather than "In frame".
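
    The coverage discrepancy noted above (30x in the manuscript versus 53x in the GenBank record) can be made concrete with a short calculation. The following is only a minimal sketch, assuming mean coverage is defined as total sequenced bases divided by genome size, and taking the 4.67 Mbp genome size from the abstract and the 50x threshold as cited in this report. The function names are hypothetical and are not part of the authors' pipeline.

```python
# Minimal sketch: relate total read bases to mean genome coverage.
# Assumption: coverage = total bases in reads / genome size.

GENOME_SIZE_BP = 4_670_000  # 4.67 Mbp, as stated in the abstract


def mean_coverage(total_read_bases: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Mean fold-coverage implied by a given number of sequenced bases."""
    return total_read_bases / genome_size_bp


def bases_needed(coverage: float, genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Total read bases that a stated fold-coverage would imply."""
    return round(coverage * genome_size_bp)


def meets_minimum(coverage: float, threshold: float = 50.0) -> bool:
    """Check a coverage value against the 50x threshold cited in this report."""
    return coverage >= threshold


if __name__ == "__main__":
    for stated in (30, 53):
        print(stated, "x implies", bases_needed(stated), "bp of reads;",
              "meets 50x:", meets_minimum(stated))
```

    Reporting the total read bases before and after quality control, as requested above, would let a reader check which of the two coverage figures the data support.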

    Please rate the manuscript for methodological rigour

    Poor

    Please rate the quality of the presentation and structure of the manuscript

    Poor

    To what extent are the conclusions supported by the data?

    Partially support

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    Yes

  5. Comments to Author

    Major issues

    (1) According to the journal's Open Data Policy, all supporting data must be deposited and be publicly available at time of submission. The Policy states that raw sequence files (FASTQ-formatted files) should be deposited rather than only the assembled genome sequence. This is because the genome assembly is an inference from the raw reads. In the current version of the manuscript, I cannot see any accession numbers (nor URLs) for the raw reads. These should be deposited in the Sequence Read Archive (SRA) (https://doi.org/10.1093/nar/gkq1019) or European Nucleotide Archive (ENA) or other appropriate sequence repository, and the accession numbers must be included and clearly signposted in the Data section.

    (2) It is essential that the manuscript provides sufficient technical detail of the methodology that a competent investigator could repeat the work. Currently, there is insufficient detail about the methods. For example, the authors simply cite documents on the Microbes NG website. These documents might not be persistently maintained. Instead, please provide full details in this manuscript or as an attached supplementary document, or cite a document that has a DOI, for example a protocol created in protocols.io. Similarly, the authors need to fully describe their protocol for the CAZy analysis mentioned in lines 91-96. What were the parameter values, options, database versions, score thresholds, etc. for these sequence similarity searches?

    (3) Line 67. The authors state that the annotation of the genome sequence assembly presented here was done with RAST. However, this conflicts with the information at https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_051594435.1/ where it is clearly stated that the publicly available annotation was generated using PGAP (https://doi.org/10.1093/nar/gkw569). Please resolve this conflict. For example, the authors could (a) remove all discussion of RAST results from the manuscript or (b) submit the RAST-based annotation to the NCBI. This also affects line 69 where the numbers of coding sequences are mentioned.

    (4) Figure 1 is intended to make a point about the phylogenetic distance between the sequenced strain and some related taxa. However, this is not achieved because insufficient information is provided about the scale. Please provide a scale bar to explain the relationship between branch lengths and phylogenetic distance.

    (5) For this CAZyme prediction resource to be useful for readers, it must be possible for the reader to locate each of the predicted CAZymes. In Table 2, for each of these CAZyme proteins, please provide the protein accession number. For example, here are some further details of some of the glycoside hydrolases:

    NZ_JBPQHB010000001.1 253234 255876 glycoside hydrolase family 172 protein WP_438386994.1 ACSLPK_RS01215
    NZ_JBPQHB010000001.1 488287 492114 glycoside hydrolase family 2 TIM barrel-domain containing protein WP_438387072.1 ACSLPK_RS02270
    NZ_JBPQHB010000002.1 234064 236793 glycoside hydrolase family 3 C-terminal domain-containing protein WP_165632255.1 ACSLPK_RS04945
    NZ_JBPQHB010000004.1 37295 38485 glycoside hydrolase family 5 protein WP_438387666.1 ACSLPK_RS07345
    NZ_JBPQHB010000004.1 196167 198686 glycoside hydrolase family 31 protein WP_438387721.1 ACSLPK_RS07895
    NZ_JBPQHB010000004.1 217128 220337 family 78 glycoside hydrolase catalytic domain WP_438387726.1 ACSLPK_RS07945
    NZ_JBPQHB010000005.1 154375 156021 putative glycoside hydrolase WP_438387800.1 ACSLPK_RS08855
    NZ_JBPQHB010000007.1 75324 76751 glycoside hydrolase family 28 protein WP_207630390.1 ACSLPK_RS10795
    NZ_JBPQHB010000007.1 171126 172031 glycoside hydrolase family 16 protein WP_438388009.1 ACSLPK_RS11200
    NZ_JBPQHB010000008.1 155371 156390 glycoside hydrolase family 25 protein WP_438388083.1 ACSLPK_RS12260
    NZ_JBPQHB010000008.1 167332 169119 glycoside hydrolase family 3 protein WP_438388086.1 ACSLPK_RS12315
    NZ_JBPQHB010000010.1 6912 9194 glycoside hydrolase family 3 C-terminal domain-containing protein WP_165631588.1 ACSLPK_RS13535
    NZ_JBPQHB010000010.1 22527 23336 aminoglycoside adenylyltransferase family protein WP_165631596.1 ACSLPK_RS13610
    NZ_JBPQHB010000013.1 147932 149788 glycoside hydrolase family 3 protein WP_438388545.1 ACSLPK_RS16655
    NZ_JBPQHB010000015.1 62490 63182 glycoside hydrolase family 11 protein WP_092521071.1 ACSLPK_RS17650
    NZ_JBPQHB010000015.1 94975 98004 glycoside hydrolase family 3 C-terminal domain-containing protein WP_438388656.1 ACSLPK_RS17800
    NZ_JBPQHB010000015.1 98327 100330 glycoside hydrolase family 31 protein WP_438388657.1 ACSLPK_RS17805
    NZ_JBPQHB010000016.1 34780 36336 glycoside hydrolase family 32 protein WP_438388690.1 ACSLPK_RS18085
    NZ_JBPQHB010000017.1 11425 13569 glycoside hydrolase family 97 protein WP_438388741.1 ACSLPK_RS18495
    NZ_JBPQHB010000019.1 1587 3464 glycoside hydrolase family 15 protein WP_092525512.1 ACSLPK_RS19350
    NZ_JBPQHB010000019.1 44027 45883 glycoside hydrolase family 2 protein WP_438388852.1 ACSLPK_RS19530

    Minor issues

    (1) Is it possible to provide a URL linking to the strain DSM 46666 in the DSMZ's web pages?

    (2) What is the basis for describing this strain or species as "rare"? The authors cite references 2-4 to support the claim that members of this genus "are considered rare", but none of those references seem to provide direct assessment of the abundance (or rarity) of these species. Is it possible to cite a primary source of evidence for this alleged rarity?

    (3) Line 31. "disclosing the secondary metabolite capacity". This is somewhat over-selling the data. At best, we can use the data to make predictions about this capacity. The predictions are limited by (a) incomplete knowledge of the relationship between gene sequences and metabolite products, (b) incomplete knowledge of gene expression and (c) the incomplete nature of a draft genome sequence assembly, especially affecting repeat-rich genes. Therefore, please tone down this claim.

    (4) Line 49. Is it possible to cite any reference for the isolation of the microbe and/or to acknowledge the investigator(s) who performed the isolation?

    (5) Please check spelling of tools. For example, at line 61 it should be SAMtools, not Samtools.

    (6) Line 66. Please consider citing references for the NCBI resources that the authors used. For example: https://doi.org/10.1093/nar/gkw569, https://doi.org/10.1093/nar/gkab1112 and https://doi.org/10.1093/nar/gky989.

    (7) At line 42, when using the LPSN, please consider citing https://doi.org/10.1099/ijsem.0.004332 or other suitable citation for this resource.

    (8) Line 71. Please clarify whether "similarity" here means "identity".

    (9) Line 88. There is no need for capital "C", "A" and "E" for the words "carbohydrate-active enzyme". These are "normal" words and not the name of a database.

    (10) Line 88. When discussing the CAZy database, please cite: https://doi.org/10.1093/nar/gkab1045.

    (11) Line 92. Please provide references for HMMER, DIAMOND and Hotpep.
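
    The glycoside hydrolase listing under major issue (5) can be turned into a structured table with a few lines of code. The following is only a minimal sketch, assuming each record reads as contig accession, start, end, product name, protein accession and locus tag (an interpretation of the listing, not a documented format) and that coordinates are 1-based and inclusive. The names `Feature`, `parse_record` and `is_glycoside_hydrolase` are hypothetical. The product-name check also shows that one listed record, the aminoglycoside adenylyltransferase, is not named as a glycoside hydrolase.

```python
import re
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    contig: str
    start: int
    end: int
    product: str
    protein: str
    locus_tag: str

    @property
    def length_bp(self) -> int:
        # Assumption: coordinates are 1-based and inclusive.
        return self.end - self.start + 1


# contig, start, end, free-text product, WP_ protein accession, locus tag
RECORD = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(.+?)\s+(WP_\d+\.\d+)\s+(\S+)$")


def parse_record(line: str) -> Optional[Feature]:
    """Parse one whitespace-separated record; return None if it does not match."""
    match = RECORD.match(line.strip())
    if match is None:
        return None
    contig, start, end, product, protein, locus_tag = match.groups()
    return Feature(contig, int(start), int(end), product, protein, locus_tag)


def is_glycoside_hydrolase(feature: Feature) -> bool:
    """True only if the product name contains the phrase 'glycoside hydrolase'."""
    return "glycoside hydrolase" in feature.product.lower()


if __name__ == "__main__":
    lines = [
        "NZ_JBPQHB010000001.1 253234 255876 glycoside hydrolase family 172 protein WP_438386994.1 ACSLPK_RS01215",
        "NZ_JBPQHB010000010.1 22527 23336 aminoglycoside adenylyltransferase family protein WP_165631596.1 ACSLPK_RS13610",
    ]
    for line in lines:
        feature = parse_record(line)
        if feature is not None:
            print(feature.protein, feature.length_bp, is_glycoside_hydrolase(feature))
```

    A table built this way could carry the protein accession alongside each CAZyme in Table 2, which is the information requested in major issue (5).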

    Please rate the manuscript for methodological rigour

    Satisfactory

    Please rate the quality of the presentation and structure of the manuscript

    Satisfactory

    To what extent are the conclusions supported by the data?

    Partially support

    Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

    No

    Is there a potential financial or other conflict of interest between yourself and the author(s)?

    No

    If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

    Yes