Evaluating generalizability of artificial intelligence models for molecular datasets
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Article activity feed
-
All data is also available on the project Github at https://github.com/mims-harvard/SPECTRA and on Harvard Dataverse at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/W5UUNN.
It's great that all your data is available! It would be helpful to provide a LICENSE in the repo so others know the terms of reuse, along with improved documentation on exactly how to use SPECTRA for different cases. For example, some of the rationale for the SP decisions appears here in the discussion and could be paired with examples in the repo as well.
-
We define a spectral property (SP) as a MSP expected to affect model generalizability for a specific task (e.g. 3D protein structure for protein binding prediction). The definition of the spectral property is task-specific and, together with the molecular sequence dataset and model, are the only inputs to SPECTRA
I think this definition should appear earlier in the introduction.
-
Main
Overall, this is a really well-written introduction that can be understood by a general audience! I learned a lot and am looking forward to digging into some of the cited references.
-
a spectral property definition
I'm still confused about what this is supposed to be, even after finishing this paragraph.
-
generating a spectral performance curve (SPC). We propose the area under this curve (AUSPC)
It's pretty early in the paper, and the text is already acronym-heavy. I think some of these terms, like spectral performance curve and area under the curve, might not need to be abbreviated: the reader has to think back to what each abbreviation means every time it appears, and there are already MB and SB to keep track of.
-
metadata-based (MB) or similarity-based (SB)
Just a small note: in the abstract, SB is referred to as "sequence-similarity based", but here it is just "similarity-based". It would be good to be consistent.
-