Gen2: Building a Reviewer-Defensible Benchmark for Binding Hypothesis Triage in Cryptic Pocket Discovery

Abstract

Benchmarking in cryptic-pocket and allosteric discovery is often weakened by forcing heterogeneous case studies into pooled scoring despite ambiguous labels, unstable site assignment, missing row-level outputs, or mismatched evidential standards. Here, we present Gen2, a governance-first benchmark framework for binding hypothesis triage in cryptic pocket discovery. Rather than treating benchmark assembly as a secondary administrative step, Gen2 treats it as part of the scientific method: each candidate slice is screened against frozen evidential rules, assigned a bounded role, and either admitted, parked, excluded, or retained as calibration or falsification material before pooled evaluation is considered. Applying this framework to the current panel produced a preserved checkpoint at which no active slice remained open. Under these rules, HIF-2α remained policy-closed, TP53 Y220C remained calibration-only, CK2 was retained as falsification material, and KRAS G12D and PTP1B remained non-row-ready for different reasons. The principal result is therefore not pooled benchmark performance but a demonstration that Gen2 prevents invalid pooled claims by blocking premature scoring and preserving only reviewer-defensible evaluable units. This establishes a reproducible benchmark-construction layer for future multi-slice evaluation once row-ready systems and explicit row mappings are available.
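The gating idea described in the abstract, where pooled scoring is considered only after every slice has a bounded role, can be illustrated with a short sketch. The Python below is a minimal, non-authoritative illustration and not the Gen2 implementation. The `Role` enum, the `Slice` record, the `row_ready` flag, and the `gate_pooled_scoring` function are all hypothetical names. The mapping of the reported panel outcomes onto roles ("policy-closed" as excluded, "non-row-ready" as parked) is also an assumption. The sketch shows how pooled scoring stays blocked when no slice is both admitted and row-ready.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # Hypothetical labels mirroring the outcomes named in the abstract
    ADMITTED = "admitted"
    PARKED = "parked"
    EXCLUDED = "excluded"
    CALIBRATION = "calibration"
    FALSIFICATION = "falsification"


@dataclass(frozen=True)
class Slice:
    name: str
    role: Role
    row_ready: bool  # assumed flag: row-level outputs and an explicit row mapping exist


def pooled_evaluable(slices):
    """Return only the slices allowed into pooled scoring: admitted AND row-ready."""
    return [s for s in slices if s.role is Role.ADMITTED and s.row_ready]


def gate_pooled_scoring(slices):
    """Block pooled scoring when no slice is evaluable; otherwise list what may be pooled."""
    evaluable = pooled_evaluable(slices)
    if not evaluable:
        return {"status": "blocked", "evaluable": []}
    return {"status": "open", "evaluable": [s.name for s in evaluable]}


# Illustrative encoding of the panel outcomes reported in the abstract.
# Role assignments for HIF-2α and for KRAS G12D / PTP1B are assumptions;
# row_ready=False for the first three slices is a placeholder, since their
# roles alone already keep them out of pooled scoring.
panel = [
    Slice("HIF-2α", Role.EXCLUDED, row_ready=False),
    Slice("TP53 Y220C", Role.CALIBRATION, row_ready=False),
    Slice("CK2", Role.FALSIFICATION, row_ready=False),
    Slice("KRAS G12D", Role.PARKED, row_ready=False),
    Slice("PTP1B", Role.PARKED, row_ready=False),
]

result = gate_pooled_scoring(panel)
print(result["status"])  # blocked
```

Keeping the gate as a pure function of the role assignments means the decision to open pooled scoring can be re-derived from the frozen slice records alone. This is one simple way to make the "no active slice open" state reproducible.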