Departing from four points: Psychometric implications of modifying response scale width for the PHQ-9 in repeated measurements
Abstract
Background: Despite widespread use of the Patient Health Questionnaire-9 (PHQ-9) in depression screening, its default 4-point response scale lacks empirical justification. This study investigated how response scale width affects the PHQ-9's psychometric properties: distributional characteristics, reliability, internal structure, and external validity.
Methods: Using a within-participant design, 549 undergraduate students (71.5% female, M age = 20.93 years) completed baseline PHQ-9, CESD-R-10, and PANAS measures. Participants then completed the same instruments in 19 near-daily assessments over a maximum of 35 days, with each administration using a different response scale format, ranging from 2 to 20 points, in randomized order. Analyses examined distributional properties, internal consistency, test-retest reliability, confirmatory factor analysis, and external validity correlations.
Results: Reducing response options to 2 or 3 points caused floor effects, lower means, and reduced variability, while expanding beyond 4 points increased means and standard deviations. Internal consistency and test-retest reliability improved from 2 to 5 scale points, with minimal incremental gains beyond this threshold. The one-factor model showed comparable fit across all response scale widths, with no persuasive evidence of a systematic trend. External validity correlations with CESD-R-10 and PANAS Negative Affect increased modestly from 2 to 5 points and then stabilized, while PANAS Positive Affect correlations remained stable across all response widths. Statistical comparisons showed that the 5-point and 7-point formats significantly outperformed the 2- and 3-point scales, but there was limited evidence that they offered advantages over the standard 4-point format.
Conclusion: These findings support the current 4-point format as a defensible and efficient standard for depression screening applications. Although modest improvements were observed with wider scales, these gains do not justify departing from the well-established 4-point standard, particularly given existing clinical cutoffs and interpretive frameworks. Practitioners should avoid reducing response scale width below 4 points, as this causes clear psychometric deterioration.
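For readers unfamiliar with two of the quantities named above, the following is a minimal sketch of how internal consistency and a floor effect might be computed. It rests on assumptions not stated in the abstract: Cronbach's alpha is used as the internal consistency coefficient, a floor effect is taken to be the share of respondents at the lowest possible total score, and the data are an invented three-item toy example rather than PHQ-9 responses from the study. The function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Assumption: alpha stands in for "internal consistency"; the abstract
    does not say which coefficient the authors used.
    """
    k = len(rows[0])
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)  # sample variances
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def floor_rate(rows, minimum=0):
    """Share of respondents whose total equals the lowest possible total.

    Assumption: this is one common operationalization of a floor effect.
    """
    lowest_total = minimum * len(rows[0])
    return sum(sum(r) == lowest_total for r in rows) / len(rows)


# Invented toy data: 5 respondents x 3 items on a 0-3 (4-point) scale.
toy = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 2],
    [3, 3, 2],
    [0, 0, 0],
]

toy_alpha = cronbach_alpha(toy)
toy_floor = floor_rate(toy)
print(f"alpha = {toy_alpha:.3f}, floor rate = {toy_floor:.2f}")
```

Applying the same two functions to each response format separately would give one alpha and one floor rate per scale width, which is the kind of comparison the Results describe.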