Importance of database curation in taxonomic assignation of 16S data.

Matteo Soverini
Andrea Castagnetti

Abstract

Microbial identification is the key component of microbial community analysis. Since the mid-2000s, with the advent of next-generation sequencing techniques, it has become necessary to use increasingly refined and complete databases to uniquely assign the taxonomy of each sequence or taxonomic unit. In this study, we evaluate the relevance of database curation in this assignation process.

Access Microbiology
Oct 26, 2023

The reviewers have highlighted major concerns with the work presented. Please ensure that you address their comments. Please deposit the data underlying the work in the Society’s data repository Figshare account here: https://microbiology.figshare.com/submit. Please also cite this data in the Data Summary of the main manuscript and list it as a unique reference in the References section. When you resubmit your article, the Editorial staff will post this data publicly on Figshare and add the DOI to the Data Summary section where you have cited it. This data will be viewable on the Figshare website with a link to the preprint and vice versa, allowing for greater discovery of your work, and the unique DOI of the data means it can be cited independently. Please provide more detail in the Methods section and ensure that software is consistently cited and its version and parameters included.

Access Microbiology
Oct 24, 2023

Comments to Author

I appreciate the authors' efforts toward the accurate taxonomic assignment of 16S rRNA data. The current version of the manuscript is poorly written, and there is no research found for the topic. A similar study (PMID: 30602085) reports that EzBioCloud performs well compared with other existing databases. However, the study did not use EzBioCloud data to create the curated database named WellMicro. Why were only the V3-V4 regions used? As several metagenome datasets are publicly available, why did the study use whole-genome data to create a mock dataset? The manuscript needs more elaboration on current measures in metagenome data analytics and on the accuracy of WellMicro at the genus and species levels.

Please rate the manuscript for methodological rigour

Poor

Please rate the quality of the presentation and structure of the manuscript

Poor

To what extent are the conclusions supported by the data?

Not at all

Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

No

Is there a potential financial or other conflict of interest between yourself and the author(s)?

No

If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

No: Ethical clearance not applicable to the study

Access Microbiology
Jan 3, 2023

Comments to Author

The authors are addressing a key issue in the microbiome field, as many of the 16S databases are limited in their curation. However, there are a few issues with the work that must be addressed before publication can be endorsed.

Major issues:

1. The authors do not make their database available. Looking at the linked GitHub repository, the script for reproducing the results is good, but it requires all the databases, which are not present in the repository, and no additional links are provided. Without the WMdb being made available, this paper provides no benefit to the community. If this database is closed to the community, then publication cannot be endorsed, as the results cannot be validated.

2. The creation of the WMdb is unclear in parts. For example, the different databases use different lineage systems. How were these combined? This is a major issue in the field, and it will be compounded by the creation of the SeqCode. How do you determine which taxonomy is correct?

3. The mock data provided by the authors in the GitHub repository are full-length sequences. I assume these are those that were 'extracted from complete bacterial genomes randomly downloaded from NCBI'. But why are the V3-V4 regions not also provided? Where are the random subsets of the fragments which were included with artificially generated DNA sequences? I must say I am not impressed by the lack of data provided by the authors in this regard. Using the data provided by the authors, it would be impossible to replicate this study. However, I do think the idea of the mock communities is good. One issue, though, is the potential bias of the WMdb: it may have been optimised to work on these mock communities if the authors put additional work into ensuring these taxa are covered. As such, real-life use cases are needed to validate the results shown in Figure 1. I would suggest analysis of the HMP 16S datasets, along with a terrestrial dataset, such as the TARA data. This would allow the applicability of the WMdb to be assessed in a real-world setting.

Minor issues:

1. Greengenes has recently been updated, and I would suggest including both the old and new versions in the analysis.

2. Replace 'L1' etc. with the taxonomic level (e.g. phyla) in the supplementary figures.

Please rate the manuscript for methodological rigour

Satisfactory

Please rate the quality of the presentation and structure of the manuscript

Good

To what extent are the conclusions supported by the data?

Partially support

Do you have any concerns of possible image manipulation, plagiarism or any other unethical practices?

No

Is there a potential financial or other conflict of interest between yourself and the author(s)?

No

If this manuscript involves human and/or animal work, have the subjects been treated in an ethical manner and the authors complied with the appropriate guidelines?

Yes

Version published to 10.1099/acmi.0.000545.v1 on Access Microbiology
Dec 16, 2022
