Comparison with PHQ-2 | Whooley Questions

How do the Whooley Questions differ from the PHQ-2?

Six years after the Whooley Questions were validated in 1997, an almost identical instrument, the 2-item Patient Health Questionnaire (PHQ-2), was published in 2003. The PHQ-2 comprises the first 2 questions of the 9-item Patient Health Questionnaire (PHQ-9), a self-report version of the PRIME-MD "Clinician Evaluation." The Whooley Questions comprise 2 questions from the PRIME-MD "Patient Questionnaire." The PHQ-2 differs from the Whooley Questions in its time frame (last 2 weeks vs. past month), response format (multiple choice vs. yes/no), and range of scores (0 to 6 vs. 0 to 2). Because of these differences, the (yes/no) Whooley Questions are more sensitive, easier to administer, and simpler to score than the (multiple choice) PHQ-2. However, the two instruments also have many similarities (Table). Most importantly, both have poor specificity (i.e., many false positives). Thus, a positive screen on either the Whooley Questions or the PHQ-2 must be followed by a clinical interview to establish the diagnosis of major depressive disorder.
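The scoring difference described above can be sketched in Python. This is a minimal illustration, not an official scoring tool: the function names are hypothetical, the item wording is not reproduced, and each PHQ-2 item is assumed to be scored from 0 (not at all) to 3 (nearly every day). The cut points shown (at least 1 for the Whooley Questions, at least 3 for the PHQ-2) are the standard ones discussed in this section.

```python
# Hypothetical scoring helpers for illustration only; consult the published
# instruments for exact item wording and administration instructions.

def whooley_score(answers: list[bool]) -> int:
    """Count 'yes' answers to the two Whooley Questions (range 0 to 2)."""
    if len(answers) != 2:
        raise ValueError("expected exactly 2 yes/no answers")
    return sum(answers)  # True counts as 1, False as 0


def phq2_score(item_scores: list[int]) -> int:
    """Sum the two PHQ-2 items, each assumed to be scored 0 to 3 (range 0 to 6)."""
    if len(item_scores) != 2 or any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("expected 2 item scores between 0 and 3")
    return sum(item_scores)


def is_positive(score: int, cut_point: int) -> bool:
    """A screen is positive when the total meets or exceeds the cut point."""
    return score >= cut_point


# One 'yes' is enough for a positive Whooley screen at the standard cut point of 1.
print(is_positive(whooley_score([True, False]), cut_point=1))
# "Several days" on both PHQ-2 items totals 2, below a cut point of 3.
print(is_positive(phq2_score([1, 1]), cut_point=3))
```

The yes/no format means the Whooley score needs no lookup of response weights, which is the practical sense in which it is simpler to score.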

In 2015 and 2016, researchers from the University of York published two meta-analyses. Bosanquet and colleagues pooled ten studies (a total of 4618 patients) that compared the test characteristics of the Whooley Questions with a gold standard diagnostic interview for depression. Using the standard cut point of at least 1 (out of 2 possible points), they calculated a pooled sensitivity of 0.95 (95% confidence interval (CI), 0.88–0.97) and a pooled specificity of 0.65 (95% CI, 0.56–0.74). In a second meta-analysis, Manea and colleagues from the same group pooled more than 20 studies (a total of >10,000 patients) comparing the test characteristics of the PHQ-2 with a gold standard diagnostic interview for depression. For a cut point of ≥3 (out of 6 possible points), they calculated a pooled sensitivity of 0.76 (95% CI, 0.68–0.82) and a pooled specificity of 0.87 (95% CI, 0.82–0.90). For a cut point of ≥2 on the PHQ-2, pooled sensitivity was 0.91 (95% CI, 0.85–0.94) and pooled specificity was 0.70 (95% CI, 0.64–0.76). An editorial published in JAMA Internal Medicine summarized these findings.
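One way to put these pooled estimates side by side is to convert each sensitivity/specificity pair into positive and negative likelihood ratios (LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity). The Python sketch below uses only the point estimates quoted above and ignores their confidence intervals; likelihood ratios are a standard derived measure, not figures reported by these meta-analyses.

```python
# Pooled point estimates (sensitivity, specificity) quoted in the text.
POOLED_ESTIMATES = {
    "Whooley Questions (>=1)": (0.95, 0.65),
    "PHQ-2 (>=2)": (0.91, 0.70),
    "PHQ-2 (>=3)": (0.76, 0.87),
}


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    # Factor by which a positive screen multiplies the pre-test odds of depression.
    lr_positive = sensitivity / (1 - specificity)
    # Factor by which a negative screen multiplies the pre-test odds of depression.
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


for name, (sens, spec) in POOLED_ESTIMATES.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With these inputs the Whooley Questions have the lowest LR- (about 0.08), so a negative Whooley screen is the strongest evidence against depression, while a PHQ-2 cut point of ≥3 has the highest LR+ (about 5.8).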

One published study has evaluated the test characteristics of both the Whooley Questions and the PHQ-2 (each compared with a gold standard diagnostic interview for depression) in the same patients. In a sample of 1024 patients with coronary heart disease, Elderon et al reported that a cut point of at least 1 on the Whooley Questions had a sensitivity of 0.90 (specificity, 0.69), and a cut point of at least 2 on the PHQ-2 had a sensitivity of 0.82 (specificity, 0.79).

How do the Whooley Questions differ from the PHQ-9?

The 9-item Patient Health Questionnaire (PHQ-9) is a diagnostic tool that asks how often each of 9 depressive symptoms occurred during the past 2 weeks (not at all, several days, more than half the days, or nearly every day). It originated as a self-report version of the PRIME-MD "Clinician Evaluation." The PHQ-9 is more specific (i.e., generates fewer false positives) than the Whooley Questions, but this is offset by lower sensitivity (more false negatives). Assuming a 20% prevalence of major depressive disorder, for example, false positives would make up only 6% of patients screened with the PHQ-9 (vs. 28% with the Whooley Questions), but false negatives would make up 5% (vs. 1% with the Whooley Questions). The main disadvantage of the PHQ-9 is that it is more complicated to administer and takes longer to complete than the Whooley Questions. Because the Whooley Questions rule out depression in more than half of patients, using the PHQ-9 as the initial screen means that more than half of patients complete it unnecessarily.
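The percentages in the 20% prevalence example are shares of all screened patients, and they can be reproduced with simple arithmetic. The Python sketch below assumes the pooled point estimates quoted in this document (Whooley Questions at a cut point of 1: sensitivity 0.95, specificity 0.65; PHQ-9 at a cut point of 10: sensitivity 0.80, specificity 0.92); the function name is hypothetical. With these inputs the false-positive shares match the text (28% and 6%) and the Whooley Questions screen negative in 53% of patients, while the PHQ-9 false-negative share comes out at 4%, so the 5% figure may rest on slightly different estimates.

```python
def screening_outcomes(sensitivity: float, specificity: float,
                       prevalence: float) -> dict[str, float]:
    """Return the share of ALL screened patients in each outcome cell."""
    return {
        "true_positive": sensitivity * prevalence,
        "false_negative": (1 - sensitivity) * prevalence,
        "false_positive": (1 - specificity) * (1 - prevalence),
        "true_negative": specificity * (1 - prevalence),
    }


PREVALENCE = 0.20  # assumed prevalence of major depressive disorder

# (name, sensitivity, specificity): pooled point estimates quoted in the text
INSTRUMENTS = [
    ("Whooley Questions (>=1)", 0.95, 0.65),
    ("PHQ-9 (>=10)", 0.80, 0.92),
]

for name, sens, spec in INSTRUMENTS:
    out = screening_outcomes(sens, spec, PREVALENCE)
    # Patients who screen negative would not go on to a second-stage assessment.
    screen_negative = out["false_negative"] + out["true_negative"]
    print(f"{name}: false positives {out['false_positive']:.0%}, "
          f"false negatives {out['false_negative']:.0%}, "
          f"screen negative {screen_negative:.0%}")
```

The "screen negative" share is what supports the point about unnecessary PHQ-9 administrations: under these assumptions, just over half of patients would be ruled out by the Whooley Questions alone.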

Unfortunately, there is no perfect screening instrument for depression. If a screening instrument were perfect, all patients who test positive (and no patients who test negative) would have depression. In practice, higher sensitivity (fewer false negatives) comes at the cost of lower specificity (more false positives). The Whooley Questions maximize sensitivity (few false negatives) at the expense of poor specificity (many false positives). Other depression screening instruments, such as the PHQ-9, have higher specificity (fewer false positives) but lower sensitivity (more false negatives) than the Whooley Questions. Gilbody and colleagues found that, for a cut point of at least 10 (out of 27 possible points) on the PHQ-9, the pooled sensitivity was 0.80 (95% CI, 0.71–0.87) and the pooled specificity was 0.92 (95% CI, 0.88–0.95).