Gen2: Building a Reviewer-Defensible Benchmark for Binding Hypothesis Triage in Cryptic Pocket Discovery
Abstract
Benchmarking in cryptic-pocket and allosteric discovery is often weakened by forcing heterogeneous case studies into pooled scoring despite ambiguous labels, unstable site assignments, missing row-level outputs, or mismatched evidential standards. Here, we present Gen2, a governance-first benchmark framework for binding hypothesis triage in cryptic pocket discovery. Rather than treating benchmark assembly as a secondary administrative step, Gen2 treats it as part of the scientific method. Before pooled evaluation is considered, each candidate slice is screened against frozen evidential rules and assigned a bounded role: it is admitted, parked, excluded, or retained as calibration or falsification material. Applying this framework to the current panel produced a preserved checkpoint at which no slice was open for active evaluation. Under these rules, HIF-2α remained policy-closed, TP53 Y220C remained calibration-only, CK2 was retained as falsification material, and KRAS G12D and PTP1B remained non-row-ready, for different reasons. The principal result is therefore not pooled benchmark performance but a demonstration that Gen2 prevents invalid pooled claims by blocking premature scoring and preserving only reviewer-defensible evaluable units. This establishes a reproducible benchmark-construction layer for future multi-slice evaluation once row-ready systems and explicit row mappings are available.
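The role assignment described in this abstract can be pictured as a gate placed in front of pooled scoring. The Python sketch below is a minimal illustration under stated assumptions, not the Gen2 implementation. The five `Role` values follow the outcomes named in the abstract. `Slice`, `row_ready`, `scorable_slices`, and `pooled_evaluation_gate` are hypothetical names. Encoding HIF-2α's policy-closed status as `EXCLUDED` and the non-row-ready KRAS G12D and PTP1B slices as `PARKED` is an assumption made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Role(Enum):
    """Bounded roles a candidate slice can receive (outcomes named in the abstract)."""
    ADMITTED = "admitted"
    PARKED = "parked"
    EXCLUDED = "excluded"
    CALIBRATION = "calibration"
    FALSIFICATION = "falsification"


@dataclass(frozen=True)
class Slice:
    name: str
    role: Role
    # Hypothetical flag: True only when row-level outputs and an explicit
    # row mapping exist; None means "not recorded in this sketch".
    row_ready: Optional[bool] = None


def scorable_slices(panel: List[Slice]) -> List[Slice]:
    """Return only the slices allowed into pooled scoring.

    A slice qualifies only if it is ADMITTED *and* explicitly row-ready;
    calibration and falsification material stays in the panel but is never pooled.
    """
    return [s for s in panel if s.role is Role.ADMITTED and s.row_ready is True]


def pooled_evaluation_gate(panel: List[Slice]) -> str:
    """Block pooled scoring unless at least one slice is open for it."""
    open_slices = scorable_slices(panel)
    if not open_slices:
        return "blocked: no active slice open"
    return "open: " + ", ".join(s.name for s in open_slices)


# Illustrative panel. Mapping "policy-closed" to EXCLUDED and
# "non-row-ready" to PARKED is an assumption, not the paper's own encoding.
panel = [
    Slice("HIF-2α", Role.EXCLUDED),
    Slice("TP53 Y220C", Role.CALIBRATION),
    Slice("CK2", Role.FALSIFICATION),
    Slice("KRAS G12D", Role.PARKED, row_ready=False),
    Slice("PTP1B", Role.PARKED, row_ready=False),
]

print(pooled_evaluation_gate(panel))  # prints: blocked: no active slice open
```

Because neither calibration nor falsification slices can satisfy the gate, they remain in the panel without contributing to any pooled score. The gate opens only after a slice is both admitted and explicitly row-ready, which mirrors the condition stated in the abstract for future multi-slice evaluation.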