Departing from four points: Psychometric implications of modifying response scale width for the PHQ-9 in repeated measurements

Abstract

Background: Despite widespread use of the Patient Health Questionnaire-9 (PHQ-9) in depression screening, its default 4-point response scale lacks empirical justification. This study investigated how response scale width affects PHQ-9 psychometric properties across distributional characteristics, reliability, internal structure, and external validity.

Methods: Using a within-participant design, 549 undergraduate students (71.5% female, M age = 20.93 years) completed baseline PHQ-9, CESD-R-10, and PANAS measures. Participants then completed the same instruments in 19 near-daily assessments over a maximum of 35 days, with each administration using different response scale formats ranging from 2 to 20 points in randomized order. Analyses examined distributional properties, internal consistency, test-retest reliability, confirmatory factor analysis, and external validity correlations.

Results: Reducing response options to 2-3 points caused floor effects, lower means, and reduced variability, while expanding beyond 4 points increased means and standard deviations. Internal consistency and test-retest reliability improved from 2 to 5 scale points, with minimal incremental gains beyond this threshold. The one-factor model demonstrated comparable fit across the full spectrum of observed response-scale widths, yielding no persuasive evidence of systematic trends. External validity correlations with CESD-R-10 and PANAS Negative Affect increased modestly between 2-5 points then stabilized, while PANAS Positive Affect correlations remained stable across all response widths. Statistical comparisons revealed that 5-point and 7-point formats significantly outperformed 2-3 point scales, but provided limited evidence for advantages over the standard 4-point format.

Conclusion: These findings support the current 4-point format as a defensible and efficient standard for depression screening applications. Although modest improvements were observed with wider scales, these gains do not justify departing from the well-established 4-point standard, particularly given existing clinical cutoffs and interpretive frameworks. Practitioners should avoid reducing response scale width below 4 points, as this causes clear psychometric deterioration.
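
To make two of the quantities named in the abstract concrete, the following is a minimal sketch of how internal consistency and a floor effect might be computed for one response format. It assumes Cronbach's alpha as the internal-consistency index and defines the floor effect as the share of respondents with the lowest possible total score; the abstract does not state which indices the authors used. The function names and toy data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows (each a list of k item scores).

    Assumption: alpha stands in for "internal consistency"; the abstract
    does not say which coefficient the study reported.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose rows into per-item columns and sum the item sample variances.
    item_var_sum = sum(variance(col) for col in zip(*responses))
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def floor_proportion(responses, minimum=0):
    """Share of respondents whose total equals the lowest possible total."""
    floor_total = minimum * len(responses[0])
    at_floor = sum(1 for row in responses if sum(row) == floor_total)
    return at_floor / len(responses)


# Hypothetical toy data: 4 respondents, 3 items scored 0-3 (not study data).
toy = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
floor = floor_proportion(toy)
```

Running the same two functions on data collected under each response width (2 through 20 points) would give a width-by-width comparison of the kind the Results describe, with `minimum` set to whatever the lowest response code is for that format.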
