Unsupervised explainable AI for simultaneous molecular evolutionary study of forty thousand SARS-CoV-2 genomes
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge and is highly desirable for unveiling hidden features in big data. SARS-CoV-2 poses a serious threat to public health and one important issue in characterizing this fast-evolving virus is to elucidate various aspects of their genome sequence changes. We previously established unsupervised AI, a BLSOM (batch-learning SOM), which can analyze five million genomic sequences simultaneously. The present study applied the BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes. While only the oligonucleotide composition was given, the obtained clusters of genomes corresponded primarily to known main clades and internal divisions in the main clades. Since the BLSOM is explainable AI, it reveals which features of the oligonucleotide composition are responsible for clade clustering. The BLSOM has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes.
Article activity feed
-
SciScore for 10.1101/2020.10.11.335406: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement not detected. Randomization To solve this problem of 7-mers in PCA, 1/100 of the sequences were randomly selected, and the PCA result obtained for these sequences was used to set the initial state. Blinding not detected. Power Analysis not detected. Sex as a biological variable not detected. Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study …SciScore for 10.1101/2020.10.11.335406: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement not detected. Randomization To solve this problem of 7-mers in PCA, 1/100 of the sequences were randomly selected, and the PCA result obtained for these sequences was used to set the initial state. Blinding not detected. Power Analysis not detected. Sex as a biological variable not detected. Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on page 4. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.
Results from rtransparent:- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
-