Most of us would agree that there aren't enough valid and meaningful health care quality measures to guide patients' choices of hospitals and physicians. While the federal government has steadily expanded the number of publicly available measures on its Hospital Compare website, the selection still falls short of what many patients, payers and providers would like. This is particularly true in the realm of outcomes such as infections and mortality rates, and in provider-level ratings.
Journalists and other organizations that produce ratings have recently attempted to fill the measurement chasm left by policymakers and health care professionals. In July, the nonprofit journalism organization ProPublica unveiled its Surgeon Scorecard, posting the "Adjusted Complication Rates" for more than 16,000 physicians across eight inpatient procedures. The Scorecard’s release set off an intense debate within the health care community about the validity of the measure, as well as the obligations of journalists when they act as scientists to create new measures. With the Surgeon Scorecard, ProPublica acted as judge and jury: it defined the measure, deemed it valid, and declared which surgeons were low quality. What assurances does the public have that such "vigilante" measures are scientifically sound? While ProPublica says its work was "guided by experts," that review was informal.
Shortly after the Scorecard was issued, some detractors on social media called for it to undergo peer review, a process that is typical for government-issued measures. That review was delivered on Friday, when several researchers in health care quality measurement, including me, published a critique on the RAND Corporation website. Our conclusion: patients should not consider the Scorecard a valid or reliable predictor of any individual surgeon's outcomes.
Among several concerns raised, we pointed out that the Adjusted Complication Rate, which was based mostly on readmissions, was not a true complication rate. The measure didn't consider complications that occurred during a hospital admission and ignored many complications that are most meaningful to patients. For instance, erectile dysfunction is common after radical prostatectomy (more than 50 percent, according to some estimates), but it was not tracked in the ProPublica measure. We also found problems with the underlying data used by ProPublica: Some surgical cases were attributed to non-surgeons or to surgeons in the wrong specialty — a finding that suggests the existence of other errors that are harder to detect.
Developing and vetting a valid new quality measure can be hard, tedious and controversial. Yet that process unearths weaknesses, improves the final product, and ultimately makes the measure more useful to patients and physicians. No matter who creates a measure — the government, journalists or nonprofit groups — we all have a duty to ensure it receives the highest level of scrutiny before it's issued, not after the fact. When journalists act as scientists, they should be held to the standards of scientists.
The Surgeon Scorecard and the Need for Measurement Standards