Science should be machine-readable
Listed in
- Evaluated articles (Arcadia Science)
Abstract
We develop a machine-automated approach for extracting results from papers, which we assess via a comprehensive review of the entire eLife corpus. Our method facilitates a direct comparison of machine and peer review, and sheds light on key challenges that must be overcome in order to facilitate AI-assisted science. In particular, the results point the way towards a machine-readable framework for disseminating scientific information. We therefore argue that publication systems should optimize separately for the dissemination of data and results versus the conveying of novel ideas, and the former should be machine-readable.
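As a rough illustration of what disseminating machine-readable results could look like in practice, a single result might be stored as a structured record rather than buried in narrative prose. The field names, values, and identifiers in the sketch below are hypothetical and are not the schema proposed in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ResultRecord:
    """One extracted result, kept separate from the paper's narrative.

    All field names here are illustrative assumptions, not the paper's schema.
    """
    claim: str                                          # the result as a single declarative sentence
    evidence: list[str] = field(default_factory=list)   # figure/table IDs or data DOIs backing the claim
    support: str = "supported"                          # e.g. "supported", "unsupported", "speculative"
    source_doi: str = ""                                # the paper the result was extracted from

# Hypothetical example of a record that a human or machine reviewer could
# evaluate directly, without re-parsing the surrounding narrative.
record = ResultRecord(
    claim="Knockdown of gene X reduces cell migration by roughly 40%.",
    evidence=["Figure 2B", "doi:10.5061/dryad.example"],
    support="supported",
    source_doi="10.1101/2024.00.00.000000",
)
print(record)
```

Records like this could be aggregated across a corpus and checked against their cited evidence directly, which is the kind of separation between disseminating results and conveying ideas that the abstract argues for.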
Article activity feed
This is such an awesome effort. Some thoughts while reading:
I have been wary of claims-based review given how hard it is to decouple claims from those made to chase editorial outcomes. I'd be very interested to see whether we might prospectively use this type of result extraction to re-derive supported claims (regardless of whether those were the claims made) and have those be the content we operate on. I could even imagine authors using this type of supported content earlier in their workflow to make their narratives more robust, rather than only post-hoc in evaluation.
That said, a useful distinction made here is to separate claims based on data/citations from claims that are inference or speculation. The latter, of course, are important for proposing parsimonious models that one can use as a theoretical framework for subsequent hypothesis generation or orthogonal testing. One awesome application of this in a future machine-readable world would be a forward-looking comparison of inference- and speculation-type claims with results from the rest of the corpus, not just within-paper support.
Not at all a surprise is the increased coverage by LLMs compared to a few human reviewers, who likely have expertise covering only a fraction of any interdisciplinary or collaborative work (the expertise of many authors went into a paper; a single reviewer's expertise is unlikely to span all of it).
Another thing that strikes me as invaluable here is the implicit and uncited result pairs, which will be essential for generating knowledge graphs that can be built upon more usefully by new science/other scientists and that are far less biased than citation graphs.
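A minimal sketch of how such result pairs might seed a result-level knowledge graph; the example pairs, relation labels, and use of the networkx library below are illustrative assumptions, not anything taken from the paper.

```python
import networkx as nx

# Hypothetical extracted result pairs: (result A, result B, relation between them).
# Both the statements and the relation labels are invented for illustration.
result_pairs = [
    ("Gene X knockdown reduces migration",
     "Protein X localizes to lamellipodia", "consistent_with"),
    ("Gene X knockdown reduces migration",
     "Drug Y has no effect on migration", "uncited_support"),
]

G = nx.DiGraph()
for a, b, relation in result_pairs:
    # Nodes are individual results rather than papers, so the graph links
    # evidence directly instead of inheriting the biases of citation behaviour.
    G.add_edge(a, b, relation=relation)

for a, b, data in G.edges(data=True):
    print(f"{a!r} --{data['relation']}--> {b!r}")
```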
Finally, as someone who spent many years invested in making the flip to the new eLife model (eliminating accept/reject editorial decisions), I was particularly interested in Figure 2a comparing the old and new models. Even if unsupported claims had been dramatically higher, or even high in an absolute sense, that would still highlight the benefit of public reviews of such content, rather than the same content being rejected and then published as-is elsewhere with a stamp of editorial acceptance. However, it's striking that even with many more claims evaluated by OpenEval, unsupported claims remain low whether authors decide when to stop revising their papers or acceptance is gated by editors.