Evaluating generalizability of artificial intelligence models for molecular datasets

Yasha Ektefaie
Andrew Shen
Daria Bykova
Maximillian G. Marin
Marinka Zitnik
Maha Farhat


Listed in

Evaluated articles (Arcadia Science)


Version published to 10.1038/s42256-024-00931-6
Dec 6, 2024
Arcadia Science
Apr 12, 2024

All data is also available on the project Github at https://github.com/mims-harvard/SPECTRA and on Harvard Dataverse at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/W5UUNN.

It's great that all your data is available! It would be helpful to provide a LICENSE in the repo so others know the terms of reuse, along with improved documentation on exactly how to use SPECTRA for different cases. For example, some of the rationale for the SP decisions appears here in the discussion and could be paired with examples in the repo as well.

Arcadia Science
Apr 12, 2024

We define a spectral property (SP) as a MSP expected to affect model generalizability for a specific task (e.g. 3D protein structure for protein binding prediction). The definition of the spectral property is task-specific and, together with the molecular sequence dataset and model, are the only inputs to SPECTRA

I think this definition should appear earlier in the introduction.

Arcadia Science
Apr 12, 2024

Main

Overall, this is a really well-written introduction that can be understood by a general audience! I learned a lot and am looking forward to digging into some of the cited references.

Arcadia Science
Apr 12, 2024

a spectral property definition

I'm still confused about what this is supposed to be, even after reading the whole paragraph.

Arcadia Science
Apr 12, 2024

generating a spectral performance curve (SPC). We propose the area under this curve (AUSPC)

It's pretty early in the paper and already acronym-heavy. I think some of these terms, like spectral performance curve and area under the curve, might not need to be abbreviated: the reader has to think back to what each one means every time it appears, and there are already MB and SB to keep track of.
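
For readers unfamiliar with the term, the quoted passage only says that a curve is generated and that the area under it is proposed as a summary. A minimal sketch of that idea follows, assuming the curve is sampled as (x, performance) pairs and integrated with the trapezoidal rule. The function name, the sample values, and the choice of integration rule are illustrative assumptions and are not taken from the paper or the SPECTRA repository.

```python
def area_under_curve(xs, ys):
    """Trapezoidal-rule area under a curve sampled at points (xs[i], ys[i]).

    Hypothetical helper for illustration; not part of SPECTRA.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two points to form a curve")

    # Sort by x so the integration runs left to right.
    points = sorted(zip(xs, ys))

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each segment contributes width * average height.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up values: performance falling as the x-axis quantity increases.
xs = [0.0, 0.5, 1.0]
ys = [0.9, 0.7, 0.5]
auspc = area_under_curve(xs, ys)
print(f"area under the curve: {auspc:.3f}")
```

Under these assumptions a single number summarizes the whole curve, which may be one reason the authors gave it its own name.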

Arcadia Science
Apr 12, 2024

metadata-based (MB) or similarity-based (SB)

Just a small note: in the abstract SB is referred to as "sequence-similarity based" and here as just "similarity-based". It would be good to be consistent.

Version published to 10.1101/2024.02.25.581982 on bioRxiv
Feb 28, 2024

Blind Challenges Let Us See the Path Forward for Predictive Models

This article has 4 authors:
1. John D. Chodera
2. W. Patrick Walters
3. Sriram Kosuri
4. James S. Fraser
This article has no evaluations. Latest version: Jan 27, 2026
LinkerMind: An Interpretable, Mechanism-Informed Deep Learning Framework for the De Novo Design of Antibody Drug Conjugate Linkers

This article has 1 author:
1. Martins Otun
This article has no evaluations. Latest version: Dec 19, 2025
