Agreeability testing of AMSTAR-PF, a tool for quality appraisal of systematic reviews of prognostic factor studies

ML Henry
NE O’Connell
RD Riley
KGM Moons
BJ Shea
L Hooft
SB Wallwork
JAA Damen
N Skoetz
RP Appiah
C Berryman
SM Crouch
GA Ferencz
AR Grant
KM Henry
AM Herman
EL Karran
I Koralegedera
HB Leake
E MacIntyre
B Mouatt
K Phuentsho
DA Van Der Laan
E Welsby
LK Wiles
EM Wilkinson
MK Wilson
MV Wilson
GL Moseley

Abstract

Background

This paper details initial testing of the agreeability and usability of a novel quality appraisal tool for systematic reviews of prognostic factor studies: AMSTAR-PF.

Methods

Fourteen appraisers each assessed eight systematic reviews using AMSTAR-PF. Their ratings for each question and each article were compared, with interrater, inter-pair and intrapair agreeability calculated using Gwet’s agreement coefficient. Time of use and time to reach consensus were also recorded.
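For readers unfamiliar with Gwet's agreement coefficient, the following is a minimal sketch of the unweighted, multi-rater form (AC1). It is an illustration only: the abstract does not say whether the authors used AC1 or the weighted AC2 variant, and the function name, argument layout and example ratings are assumptions, not taken from the study.

```python
from collections import Counter

def gwet_ac1(ratings, categories):
    """Unweighted Gwet's AC1 for any number of raters (illustrative sketch).

    ratings    -- one inner list per appraised item, holding each rater's
                  category label (None marks a missing rating)
    categories -- all possible category labels (hypothetical names here)
    """
    q = len(categories)
    if q < 2:
        raise ValueError("need at least two categories")

    agreement_terms = []                       # per-item observed agreement
    prevalence = {k: 0.0 for k in categories}  # running sum of r_ik / r_i
    n_rated = 0

    for item in ratings:
        observed = [r for r in item if r is not None]
        r_i = len(observed)
        if r_i == 0:
            continue
        counts = Counter(observed)
        unknown = set(counts) - set(categories)
        if unknown:
            raise ValueError(f"unknown categories: {unknown}")
        n_rated += 1
        for k, c in counts.items():
            prevalence[k] += c / r_i
        if r_i >= 2:
            # Share of rater pairs that chose the same category for this item
            same = sum(c * (c - 1) for c in counts.values())
            agreement_terms.append(same / (r_i * (r_i - 1)))

    if not agreement_terms:
        raise ValueError("no item was rated by two or more raters")

    p_a = sum(agreement_terms) / len(agreement_terms)
    pi = [v / n_rated for v in prevalence.values()]
    # Chance agreement as defined for AC1
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


# Made-up ratings: two raters, four items, two categories
example = [["high", "high"], ["high", "high"], ["low", "low"], ["high", "low"]]
ac1 = gwet_ac1(example, ["high", "low"])
```

In this made-up example the raters agree on three of four items, so observed agreement is 0.75, and the chance-corrected coefficient is lower than that.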

Results

Across the domains, interrater agreement averaged 0.59 (range 0.21-0.90), inter-pair agreement 0.61 (range 0.24-0.91), and intrapair agreement 0.75 (range 0.45-0.95). For the overall rating, agreement was 0.46 (95% CI 0.30-0.62) for interrater, 0.46 (95% CI 0.17-0.74) for inter-pair, and 0.68 (range of averages 0.22-1.00) for intrapair comparisons. Most intrapair ratings (60.7%) were identical, and 94.6% of final ratings for the overall appraisal were either identical or differed by only one category. The time taken to appraise a study with AMSTAR-PF decreased with use, averaging around 34 minutes after the first two appraisals.
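The proportions of identical and near-identical ratings reported above can be computed along the following lines. This is a sketch under stated assumptions: the category labels in `LEVELS` are hypothetical placeholders (the abstract does not list the AMSTAR-PF rating categories), and the paired ratings are invented for illustration.

```python
# Hypothetical ordered rating categories, lowest to highest
LEVELS = ["critically low", "low", "moderate", "high"]

def exact_and_adjacent_agreement(pairs, levels=LEVELS):
    """Return (share identical, share identical or one category apart).

    pairs -- list of (rating_a, rating_b) tuples, one per appraised review
    """
    if not pairs:
        raise ValueError("no rating pairs supplied")
    rank = {name: i for i, name in enumerate(levels)}
    # Distance in categories between the two ratings of each pair
    diffs = [abs(rank[a] - rank[b]) for a, b in pairs]
    n = len(diffs)
    exact = sum(d == 0 for d in diffs) / n
    within_one = sum(d <= 1 for d in diffs) / n
    return exact, within_one


# Invented example: four reviews, each rated by both members of a pair
pairs = [("high", "high"), ("low", "moderate"),
         ("high", "low"), ("moderate", "moderate")]
exact, within_one = exact_and_adjacent_agreement(pairs)
```

Raw percentages like these are easier to interpret than a chance-corrected coefficient, but they do not account for agreement expected by chance, which is why the two are reported side by side.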

Conclusions

Despite some variance in agreeability for different domains and between different appraisers, the testing results suggest that AMSTAR-PF has clear utility for appraising the quality of systematic reviews of prognostic factor studies.

Version published to 10.1101/2025.04.10.25325555 on medRxiv
Apr 14, 2025

The impact of a rapid risk of bias assessment compared to a traditional assessment with QUADAS-2

This article has 7 authors:
1. Shona Haston
2. Ryan PW Kenny
3. Nick Meader
4. Gemma Frances Spiers
5. Louise Tanner
6. Gurdeep S Sagoo
7. Gill Norman
This article has no evaluations. Latest version: Jan 16, 2026

Unified tools for assessing the methodological quality of intervention effects in rapid reviews: a scoping review

This article has 10 authors:
1. Deborah Edwards
2. Emily C Clark
3. Judit Csontos
4. Maureen Dobbins
5. Elizabeth Gillen
6. Juliet Hounsome
7. Sarah E. Neil-Sztramko
8. Ruth Lewis
9. Mala Mann
10. Gillian Prue
This article has no evaluations. Latest version: Feb 4, 2026

Transparency in Psychometric Reporting: A Review of Scales for Well-Being and Quality of Life in Older Persons Employing Rasch Analysis

This article has 4 authors:
1. Jeanette Melin
2. Marit Preuter
3. Marie-Louise Möllerberg
4. Kristofer Årestedt
This article has no evaluations. Latest version: Dec 16, 2025
