Changes in Hospital Quality Associated with Hospital Value-Based Purchasing


The New England Journal of Medicine

Special Article

Changes in Hospital Quality Associated with Hospital Value-Based Purchasing

Andrew M. Ryan, Ph.D., Sam Krinsky, M.A., Kristin A. Maurer, M.P.H., and Justin B. Dimick, M.D., M.P.H.

From the Department of Health Management and Policy, University of Michigan School of Public Health (A.M.R., S.K., K.A.M.), and the Department of Surgery, University of Michigan Medical School (J.B.D.), both in Ann Arbor. Address reprint requests to Dr. Ryan at the University of Michigan School of Public Health, 1415 Washington Heights, SPH II, Rm. M3124, Ann Arbor, MI 48109, or at amryan@umich.edu.

N Engl J Med 2017;376:2358-66. DOI: 10.1056/NEJMsa1613412. Copyright 2017 Massachusetts Medical Society.

ABSTRACT

BACKGROUND
Starting in fiscal year 2013, the Hospital Value-Based Purchasing (HVBP) program introduced quality performance-based adjustments of up to 1% to Medicare reimbursements for acute care hospitals.

METHODS
We evaluated whether quality improved more in acute care hospitals that were exposed to HVBP than in control hospitals (Critical Access Hospitals, which were not exposed to HVBP). The measures of quality were composite measures of clinical process and patient experience (measured in units of standard deviations, with a value of 1 indicating performance that was 1 standard deviation [SD] above the hospital mean) and 30-day risk-standardized mortality among patients who were admitted to the hospital for acute myocardial infarction, heart failure, or pneumonia. The changes in quality measures after the introduction of HVBP were assessed for matched samples of acute care hospitals (the number of hospitals included in the analyses ranged from 1364 for mortality among patients admitted for acute myocardial infarction to 2615 for mortality among patients admitted for pneumonia) and control hospitals (the number of hospitals ranged from 31 to 617). Matching was based on preintervention performance with regard to the quality measures. We evaluated performance over the first 4 years of HVBP.

RESULTS
Improvements in clinical-process and patient-experience measures were not significantly greater among hospitals exposed to HVBP than among control hospitals, with difference-in-differences estimates of 0.079 SD (95% confidence interval [CI], −0.140 to 0.299) for clinical process and −0.092 SD (95% CI, −0.307 to 0.122) for patient experience. HVBP was not associated with significant reductions in mortality among patients who were admitted for acute myocardial infarction (difference-in-differences estimate, −0.282 percentage points [95% CI, −1.715 to 1.152]) or heart failure (−0.212 percentage points [95% CI, −0.532 to 0.108]), but it was associated with a significant reduction in mortality among patients who were admitted for pneumonia (−0.431 percentage points [95% CI, −0.714 to −0.148]).

CONCLUSIONS
In our study, HVBP was not associated with improvements in measures of clinical process or patient experience and was not associated with significant reductions in two of three mortality measures. (Funded by the National Institute on Aging.)

Health care in the United States is extremely costly. There is compelling evidence that a large share of spending, particularly in Medicare, results in little or no patient benefit. 1-5 Quality performance also varies widely across hospitals. 6 Numerous public and private payer initiatives have attempted to address these problems through value-based purchasing programs. 7,8 The Patient Protection and Affordable Care Act (ACA) established value-based purchasing programs throughout Medicare, including the Hospital Value-Based Purchasing (HVBP) program. Beginning in fiscal year (FY) 2013, the HVBP program made Medicare payments to acute care hospitals (hospitals paid under the inpatient prospective payment system) conditional on performance as assessed by a variety of metrics. Starting with clinical-process and patient-experience measures in FY 2013, the program expanded to include patient outcome measures in FY 2014 and spending measures in FY 2015. The size of the program incentives has also increased gradually, from 1% of diagnosis-related-group revenue in FY 2013 to 2% by FY 2017. Beginning in 2005, hospitals that were ultimately subject to HVBP also became subject to public quality reporting through the Hospital Compare website. 9

Despite the well-intended goals of value-based purchasing programs, evidence that these programs have improved quality and spending outcomes is mixed and far from convincing. 10-14 Previous research on the first 9 months of HVBP showed no evidence that the program improved performance as assessed by clinical-process or patient-experience measures. 15 Recent research also showed that HVBP did not reduce mortality during the first 30 months of the program. 16 Nonetheless, the longer-term effects of HVBP, particularly with regard to clinical process and patient experience, are unknown. It is possible that, despite the lack of early responsiveness to the program, the effects of HVBP may grow stronger over time, as the incentives increase and hospitals have time to respond to the program. It is also possible that the effects of HVBP are heterogeneous across hospitals. Of particular interest is whether hospital characteristics (e.g., teaching status, size, and Medicaid share) or engagement in health system delivery reforms (e.g., meaningful use of electronic health records, accountable care organization programs, and bundled payment) modify performance in the program.

Methods

Data, Study Population, and Study Outcomes

Our study population included all U.S. hospitals that reported data through Hospital Compare (ranging from 4546 hospitals in the first release used in the study to 4799 hospitals in the most recent release). We restricted our study to general short-term acute care hospitals, which were exposed to HVBP (exposed hospitals), and Critical Access Hospitals, which were exempt (control hospitals). In 2014, there were 1331 Critical Access Hospitals in the United States. Reporting of quality measures for Critical Access Hospitals through Hospital Compare varied considerably across study domains (Table S1 in the Supplementary Appendix, available with the full text of this article at NEJM.org). We excluded all children's, psychiatric, and other specialty facilities; Veterans Affairs hospitals; and hospitals in Maryland. Although hospitals in Maryland were not exposed to HVBP, they were subject to a similar set of financial incentives. 17
We excluded hospitals that did not report outcomes data in each study period, mainly because they did not meet minimum sample-size requirements as specified under HVBP rules. A total of 2842 hospitals were included in the clinical-process analysis, 3247 in the patient-experience analysis, and 2195, 3256, and 3525 hospitals in the analyses of mortality among patients who were admitted to the hospital for acute myocardial infarction, heart failure, and pneumonia, respectively. All clinical-process, patient-experience, and mortality data were downloaded from Hospital Compare. For the clinical-process and patient-experience outcomes, data were available for the eight annual measurement periods (ending June 30) from 2008 through 2015. We constructed a clinical-process composite that was based on the performance of hospitals across the seven indicators that were used as the basis for incentives in each of the first 3 years of the program and were reported through Hospital Compare for the duration of the study period. A list of the indicators included in the composite is provided in Table S2 in the Supplementary Appendix.

The measure "primary percutaneous coronary intervention received within 90 minutes after hospital arrival for patients with myocardial infarction" was included for exposed hospitals but not for control hospitals (because percutaneous coronary intervention is rarely performed at Critical Access Hospitals). For hospitals in the study sample that participated in HVBP, performance with regard to the clinical-process indicators ranged from 99.7% compliance (appropriate venous thromboembolism prophylaxis for surgical patients) to 95.6% compliance (primary percutaneous coronary intervention for patients with acute myocardial infarction) by the end of the study period (Tables S3 through S6 in the Supplementary Appendix). To create the clinical-process composite, we standardized each indicator by subtracting the mean performance across the full sample (not matched) of exposed and control hospitals for the whole study period and dividing by the sample standard deviation. We then calculated the composite as the mean of these standardized indicator scores, as sketched below. A value of 1 indicates performance with regard to the composite measure that is 1 standard deviation (SD) above the mean among hospitals that met the inclusion criteria during the study period.

The patient-experience composite consisted of eight indicators that were used as a basis for incentives under the program (a list of the indicators is provided in Table S2 in the Supplementary Appendix). As specified under the program rules, for each indicator we assessed the mean percentage of patients who reported excellent performance, known as "top box" performance (e.g., communication with doctors was "always" good). Among hospitals in the study sample that participated in HVBP, by the end of the study period, performance with regard to the patient-experience indicators ranged from 87.0% of patients who reported being given discharge instructions to 64.6% of patients who reported that staff always explained medications before administering them (Tables S7 through S10 in the Supplementary Appendix). To create a composite measure for patient experience, as with clinical-process performance, we calculated the mean of the standardized indicator scores.

We evaluated hospital-level 30-day risk-standardized mortality among patients who were admitted for acute myocardial infarction, heart failure, or pneumonia as separate outcomes. Mortality was adjusted for patient age, sex, and clinical coexisting conditions (based on hierarchical condition categories). Mortality was also standardized to account for the variance in the estimates. Data on these measures were available for the seven overlapping 3-year periods (ending June 30) between 2008 and 2014. 18 For example, the 2008 extract included data from discharges occurring between July 1, 2006, and June 30, 2008, and the 2009 extract included data from discharges occurring between July 1, 2007, and June 30, 2009.

We used data on teaching status, number of beds, and Medicaid share of inpatient days, all of which were obtained from Medicare cost reports between 2008 and 2015. To assess hospital participation in other payment reforms, we obtained publicly available data on dates of participation in the meaningful use of electronic health records (stage 1 or stage 2) 19 and the Bundled Payments for Care Improvement initiative. We also obtained data on dates of participation in the Pioneer and Medicare Shared Savings accountable care organization programs from Leavitt Partners. 20
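To make the composite construction concrete, the following Python sketch standardizes each indicator against the full pooled sample of hospital-years and averages the resulting z-scores. The data layout and column names are hypothetical (pandas assumed); this illustrates the approach described above rather than reproducing the authors' code.

```python
import pandas as pd

# Hypothetical layout: one row per hospital-year, one column per
# clinical-process (or patient-experience) indicator.
PROCESS_INDICATORS = ["ind_1", "ind_2", "ind_3"]  # placeholder names

def build_composite(df: pd.DataFrame, indicators: list) -> pd.Series:
    """Mean of standardized indicator scores.

    Each indicator is standardized against the full (unmatched) pooled
    sample of exposed and control hospitals over the whole study period,
    so a composite value of 1 means performance 1 SD above the mean
    among hospitals meeting the inclusion criteria.
    """
    z_scores = pd.DataFrame(
        {c: (df[c] - df[c].mean()) / df[c].std() for c in indicators},
        index=df.index,
    )
    return z_scores.mean(axis=1)

# Hypothetical usage:
# df["clinical_process_composite"] = build_composite(df, PROCESS_INDICATORS)
```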
Statistical Analysis

We performed a difference-in-differences analysis to test the association between HVBP and the outcomes rewarded in the program. This analysis tests whether there were greater improvements in our study outcomes among exposed hospitals than among the control hospitals (Critical Access Hospitals). Although payment adjustments began to occur in FY 2013, we considered the start date of HVBP to be July 2011, the first period in which hospital performance on quality measures was subject to incentives (with 2013 payment adjustments reflecting 2011-2012 performance).

Critical Access Hospitals are small, rural hospitals that differ substantially from the general short-term acute care hospitals that are subject to HVBP. To facilitate the comparison of outcomes between acute care hospitals and Critical Access Hospitals, we created control groups of hospitals that had similar levels of and trends in the preintervention outcomes. To do this, we implemented a matching strategy using propensity scores, performing one-to-one matching with replacement and calipers (defining the maximum difference in the propensity score that was allowable for a match) of 0.01 of the propensity score. We restricted matches to exposed and control hospitals with overlapping ranges of propensity-score values, known as common support. 21 Matching was performed separately for each outcome. Our matching procedure first stratified outcomes on the basis of preintervention trends and then matched hospitals within strata. 22 For clinical process and patient experience, we stratified into deciles, and for the mortality outcomes we stratified into quintiles, because of the small number of Critical Access Hospitals that met the minimum case requirements. Multiple exposed hospitals could be matched to the same Critical Access Hospital. In the analysis, observations from Critical Access Hospitals were weighted according to the number of matches between a given Critical Access Hospital and acute care hospitals. The matching procedure was implemented with Stata software, version 12 (StataCorp), with the use of a user-written command. 23 Recent research suggests that matching can result in more accurate estimates in difference-in-differences analysis, particularly for measures like clinical process and patient experience, for which changes are closely related to baseline levels. 24 In the matched analysis, a relatively large share of acute care hospitals received suitable matches, ranging from 51% of hospitals for the patient-experience outcome to 93% for the pneumonia mortality outcome, and these hospitals were therefore included in the analysis (Table 1). A sketch of this matching procedure appears below.
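The matching itself was done in Stata with the user-written psmatch2 command (reference 23). As a rough Python analogue, with hypothetical column names and scikit-learn and pandas assumed, one-to-one caliper matching with replacement within preintervention-trend strata might look like this sketch; the weighting of reused control hospitals is indicated at the end.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def match_hospitals(df, covariates, caliper=0.01, n_strata=10):
    """One-to-one propensity-score matching with replacement, within
    strata of the preintervention trend (deciles here; the study used
    quintiles for the mortality outcomes). Returns (exposed, control)
    index pairs. A sketch, not the authors' Stata procedure.
    """
    df = df.copy()
    df["stratum"] = pd.qcut(df["pre_trend"], n_strata,
                            labels=False, duplicates="drop")
    pairs = []
    for _, grp in df.groupby("stratum"):
        if grp["exposed"].nunique() < 2:
            continue  # stratum lacks both exposed and control hospitals
        # Propensity score: probability of HVBP exposure given
        # preintervention performance covariates.
        ps = (LogisticRegression(max_iter=1000)
              .fit(grp[covariates], grp["exposed"])
              .predict_proba(grp[covariates])[:, 1])
        grp = grp.assign(ps=ps)
        exposed = grp[grp["exposed"] == 1]
        controls = grp[grp["exposed"] == 0]
        # Common support: keep exposed hospitals whose scores fall
        # within the range observed among controls.
        exposed = exposed[exposed["ps"].between(controls["ps"].min(),
                                                controls["ps"].max())]
        for i, p in exposed["ps"].items():
            dist = (controls["ps"] - p).abs()
            j = dist.idxmin()
            if dist[j] <= caliper:   # enforce the 0.01 caliper
                pairs.append((i, j))  # controls may be reused
    return pairs

# Control hospitals matched more than once receive proportionally
# larger weights in the outcome models:
# weights = pd.Series([j for _, j in pairs]).value_counts()
```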

To test the effect of HVBP, we estimated a linear fixed-effects model at the hospital level. Models were estimated separately for our study outcomes. For the clinical-process and patient-experience models, we estimated the following model for hospital j at time t:

Y_jt = a_0 + b·post_t + δ·(post_t × exposed_j) + ρ·u_j + e_jt.

In this equation, Y represents the study outcome, a_0 is the model intercept, b is the coefficient estimate for the post term, u is a vector of hospital fixed effects, ρ is a vector of coefficients for the hospital fixed effects, and e is the idiosyncratic error term. The term post is equal to 1 for observations occurring after the start of HVBP (and 0 otherwise), and the term exposed is equal to 1 for exposed hospitals (and 0 otherwise). The term post × exposed is therefore equal to 1 in the post-HVBP period among acute care hospitals and equal to 0 otherwise. The difference-in-differences estimate is the coefficient δ. The specification included hospital fixed effects (u). The term exposed is not included as a main effect because it does not vary over time and is therefore absorbed into the hospital fixed effects.

Our models took a modified form for the mortality analysis. Because the available data extracts were for rolling 36-month periods, some observations included data that spanned the pre-HVBP and post-HVBP implementation periods. To address this, our measure of hospital exposure to HVBP was specified as the proportion of months in a given data extract that followed the July 1, 2011, program start date. For instance, the 2012 data extract included discharges from July 1, 2009, through June 30, 2012, which we coded as 0.33. Similarly, we coded the 2013 extract as 0.66 and the 2014 extract as 1.

For each outcome, we also estimated a separate specification that allowed the effect of HVBP to vary across a vector of hospital characteristics (teaching status, number of beds, and Medicaid share) and across hospital statuses with regard to participation in the meaningful use of electronic health records, the Bundled Payments for Care Improvement initiative, and the Pioneer and Medicare Shared Savings accountable care organization programs (see the Additional Description of Methods section in the Supplementary Appendix).

Statistical tests and confidence intervals for the difference-in-differences estimates were based on nonparametric permutation tests with 2000 permutation resamples (see the Additional Description of Methods section in the Supplementary Appendix). These tests have been shown to have better properties than parametric methods for inference in the context of difference-in-differences analyses. 25 All analyses were performed with Stata software, version 12. 26 A sketch of the estimation and permutation approach appears below.
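As an illustration of the estimation strategy, again in Python with hypothetical column names (statsmodels assumed) rather than the authors' Stata code, the sketch below estimates the hospital fixed-effects model and outlines a simplified permutation approach to inference.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def did_delta(df):
    """delta from Y_jt = a_0 + b*post_t + delta*(post_t x exposed_j)
    + hospital fixed effects + e_jt. The 'exposed' main effect is
    time-invariant and absorbed by the hospital dummies.
    """
    df = df.assign(post_exposed=df["post"] * df["exposed"])
    fit = smf.ols("y ~ post + post_exposed + C(hospital_id)", data=df).fit()
    return fit.params["post_exposed"]

# For the mortality outcomes, 'post' would instead be the fraction of
# the rolling 36-month extract falling after July 1, 2011: the 2012
# extract (July 2009-June 2012) has 12 of 36 post-HVBP months, so
# 12/36 = 0.33; the 2013 extract is coded 0.66 and the 2014 extract 1.

def permutation_ci(df, n_resamples=2000, seed=0):
    """Permutation distribution of delta obtained by randomly
    reassigning the exposure label across hospitals; a simplified
    stand-in for the paper's nonparametric permutation tests.
    (Refitting with many hospital dummies is slow; demeaning would
    be used in practice.)
    """
    rng = np.random.default_rng(seed)
    by_hosp = df.groupby("hospital_id")["exposed"].first()
    n_exposed = int(by_hosp.sum())
    draws = []
    for _ in range(n_resamples):
        fake = set(rng.choice(by_hosp.index.to_numpy(),
                              size=n_exposed, replace=False))
        perm = df.assign(exposed=df["hospital_id"].isin(fake).astype(int))
        draws.append(did_delta(perm))
    return np.percentile(draws, [2.5, 97.5])
```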
Results

Characteristics of the Hospitals

Table 1 shows that, as compared with control hospitals, the matched samples of exposed hospitals were larger, were more likely to be teaching hospitals, had a higher share of Medicaid inpatient days, and were more likely to participate in the meaningful use of electronic health records for each of the study outcomes. However, the matched samples of exposed hospitals and control hospitals had very similar levels of preintervention performance with regard to each outcome, reflecting successful matching according to baseline performance.

Changes in Performance

Performance with regard to both clinical process and patient experience improved for both the exposed and the control hospitals before and after HVBP was implemented (Fig. 1).

Table 1. Characteristics of the Samples of Matched Exposed Hospitals and Control Hospitals Used in Analyses of the Five Study Outcomes.*

In each pair of values below, the first is for exposed hospitals and the second for control hospitals.

Characteristic           Clinical-Process Composite   Patient-Experience Composite   Acute MI Mortality        Heart Failure Mortality   Pneumonia Mortality
Hospitals, no.           2164 / 153                   1507 / 237                     1364 / 31                 2383 / 419                2615 / 617
Hospital-years, no.      17,312 / 1224                12,056 / 1896                  9548 / 217                16,681 / 2933             18,305 / 4319
Preintervention score    −0.32±0.77 / −0.33±0.79      −0.01±0.66 / −0.07±0.61        16.07±1.49 / 16.04±1.23   11.49±1.38 / 11.53±1.28   11.86±1.74 / 11.87±1.66
Teaching, %              37 / 1                       28 / 2                         39 / 0                    32 / 1                    33 / 1
BPCI, %                  13 / 0                       10 / 0                         3 / 0                     2 / 0                     3 / 0
Meaningful use, %        96 / 82                      97 / 80                        96 / 87                   96 / 82                   96 / 83
ACO, %                   19 / 15                      18 / 11                        15 / 10                   13 / 5                    14 / 6
Medicaid share, %        12±9 / 10±5                  12±9 / 10±8                    13±8 / 12±7               13±9 / 8±6                13±9 / 8±7
Beds, no.                223±187 / 24±2               177±170 / 24±2                 242±182 / 25±0            203±183 / 24±3            208±187 / 24±4

* Plus-minus values are means ±SD. The exposed hospitals were general short-term acute care hospitals, which were exposed to Hospital Value-Based Purchasing (HVBP); the control hospitals were Critical Access Hospitals, which were exempt. To facilitate the comparison of outcomes between exposed hospitals and control hospitals, we used a matching strategy with propensity scores to create control groups of hospitals that had similar levels of and trends in the preintervention outcomes. Matching was performed separately for each outcome. BPCI denotes Bundled Payments for Care Improvement, and MI myocardial infarction. The standardized clinical-process composite is a composite of the clinical-process indicators that form the basis for incentives in HVBP. The composite is the mean of the individual indicators; each indicator in the composite has a mean of 0 for all hospitals over the study period and is expressed in units of its standard deviation (SD). The composite has negative values in the preintervention period because improvement occurred during the study period. A value of 1 indicates performance on the composite that is 1 SD above the mean among hospitals meeting inclusion criteria during the study period. The standardized patient-experience composite is a composite of the patient-experience indicators that form the basis for incentives in HVBP and was constructed in the same way as the clinical-process composite. A list of the indicators included in both composite outcomes is provided in Table S2 in the Supplementary Appendix. Composite scores are expressed as standard deviations relative to all hospitals meeting inclusion criteria during the study period; mortality scores are expressed as the risk-standardized percentage of patients who died from the given condition within 30 days after a hospital admission. Participation data are the percentages of unique hospitals that participated in a given reform program at any time during the study period. The hospitals designated as participating in the meaningful use of electronic health records were those that received incentives under the Electronic Health Records Initiative (stage 1 or stage 2); those designated as participating in an accountable care organization (ACO) participated in the Pioneer or Medicare Shared Savings ACO Program. Medicaid share is measured as the share of inpatient days paid for by Medicaid over the study period.

Exposed and control hospitals had similar improvement after HVBP was implemented. Thirty-day risk-standardized mortality decreased among patients who were admitted for acute myocardial infarction, remained relatively constant among those admitted for pneumonia, and increased slightly among those admitted for heart failure in both the exposed and the control hospitals during the study period (Fig. 2). For mortality among patients who were admitted for acute myocardial infarction or heart failure, these trajectories were similar for the exposed and control hospitals before and after HVBP was initiated. Thirty-day risk-standardized mortality among patients who were admitted for pneumonia increased slightly in the post-HVBP period among the control hospitals.

Table 2 shows the estimates for each of the study outcomes. The between-group differences in preintervention trends were not significant for any study outcome. The difference-in-differences estimates comparing the change in performance from the pre-HVBP period to the post-HVBP period between exposed and control hospitals indicated that HVBP was associated with a nonsignificant increase in clinical-process performance of 0.079 SD (95% confidence interval [CI], −0.140 to 0.299). HVBP was associated with a nonsignificant reduction in patient-experience performance (−0.092 SD; 95% CI, −0.307 to 0.122). HVBP was not associated with significant reductions in 30-day risk-standardized mortality among patients who were admitted for acute myocardial infarction (−0.282 percentage points; 95% CI, −1.715 to 1.152) or heart failure (−0.212 percentage points; 95% CI, −0.532 to 0.108). HVBP was, however, associated with a significant reduction in 30-day risk-standardized mortality among patients who were admitted for pneumonia (−0.431 percentage points; 95% CI, −0.714 to −0.148).

[Figure 1. Standardized Clinical-Process and Patient-Experience Performance among Matched Exposed Hospitals and Matched Control Hospitals, 2008-2015. Panel A shows the standardized clinical-process composite score (in SD units) and Panel B the standardized patient-experience composite score, by year. The standardized clinical-process composite is a composite of the clinical-process indicators that form the basis for incentives in Hospital Value-Based Purchasing (HVBP). The composite is the mean of the individual indicators; each indicator in the composite has a mean of 0 for all hospitals over the study period and is expressed in units of its standard deviation (SD). The composite has negative values in the preintervention period because improvement occurred during the study period. A value of 1 indicates performance on the composite that is 1 SD above the mean among hospitals meeting inclusion criteria during the study period. The standardized patient-experience composite is a composite of the patient-experience indicators that form the basis for incentives in HVBP and was constructed in the same way as the clinical-process composite. A list of the indicators included in both composite outcomes is provided in Table S2 in the Supplementary Appendix.]

Sensitivity Analysis

Sensitivity analysis of the effects of HVBP across the full sample of hospitals showed an inconsistent pattern of results across the study outcomes, indicating no clear benefit in association with HVBP (Table S12 and Figs. S1 through S3 in the Supplementary Appendix). We also found that HVBP was not associated with improvement in an alternative measure of patient experience (a single item indicating the percentage of patients who gave the hospital an overall rating of 9 or 10 out of 10) or in an alternative clinical-process composite that excluded the measure "primary percutaneous coronary intervention received within 90 minutes after hospital arrival for patients with acute myocardial infarction." Sensitivity analysis also showed little evidence that the effect of HVBP was modified by teaching status, size, or Medicaid share, or by hospital participation in the meaningful use of electronic health records, the Bundled Payments for Care Improvement initiative, or accountable care organization programs.

[Figure 2. 30-Day Risk-Standardized Mortality among Patients Admitted to the Hospital for Acute Myocardial Infarction (MI), Heart Failure, or Pneumonia among Matched Exposed Hospitals and Matched Control Hospitals, 2008-2014. Panels A, B, and C show 30-day risk-standardized mortality (%) for acute MI, heart failure, and pneumonia admissions, respectively.]

A related analysis showed that hospitals with a greater share of Medicare patients, and therefore stronger incentives to improve, did not improve more under HVBP than hospitals with a smaller share of Medicare patients. Results from models stratified according to baseline performance also showed no clear pattern of effect modification. Additional details regarding the results of the sensitivity analyses are provided in Tables S13 through S22 in the Supplementary Appendix.

Discussion

Our estimates of the effect of HVBP on clinical process, patient experience, and mortality were small, not consistent with one another in the direction of the association, and generally nonsignificant. The significant reduction in 30-day risk-standardized mortality among patients who were admitted to the hospital for pneumonia was driven by an increase in mortality in the matched sample of Critical Access Hospitals. Because the incentives in the program did not appear to improve performance with regard to the clinical-process indicators related to pneumonia, 15 it is unlikely that HVBP would have reduced mortality among the patients who were admitted for pneumonia, since such reductions are harder to achieve than improvements in clinical process. We also found no meaningful variation in the effectiveness of the program across hospital characteristics and across statuses with regard to engagement in voluntary value-based reforms.

Our study provides evidence that HVBP did not result in meaningful improvements in clinical process or patient experience or in a significant reduction in mortality during its first 4 years. Our results are consistent with those from studies showing that HVBP did not increase quality with regard to clinical process or patient experience in its first 9 months 15 and with more recent research indicating that HVBP did not reduce mortality over the first 30 months of the program. 16

The seeming ineffectiveness of HVBP stands in contrast to the Medicare Hospital Readmissions Reduction Program (HRRP), which appears to have reduced rates of readmission for targeted conditions. 27 This may have resulted from the fact that the incentives in HVBP are much smaller than the incentives in the HRRP. HVBP incentives are also spread over numerous domains and performance measures, further diluting the effect of the program. In addition, whereas the incentives in HVBP involve both bonuses and penalties, the HRRP uses only penalties. These penalties may have triggered loss aversion among hospital administrators, enhancing the program's effect. 28 In addition, HVBP uses a highly complex incentive design, rewarding hospitals for a combination of relative performance and improvement across numerous performance measures. The complex, wide-ranging, and evolving bonus-based incentive structure of the program may be a less effective design than the simpler, more narrowly targeted, penalty-based design of the HRRP. 29

Table 2. Estimates of the Association between HVBP and Incentive-Associated Outcomes.

Outcomes measured in standard deviations:

Standardized clinical-process composite (2317 hospitals)
  Preintervention difference in annual trend: 0.001 (95% CI, −0.047 to 0.049)*
  Postintervention change: exposed, 0.697 (0.678 to 0.716); control, 0.617 (0.526 to 0.708)†
  Difference-in-differences estimate: 0.079 (−0.140 to 0.299)‡

Standardized patient-experience composite (1744 hospitals)
  Preintervention difference in annual trend: 0.004 (−0.043 to 0.050)*
  Postintervention change: exposed, 0.354 (0.335 to 0.373); control, 0.447 (0.337 to 0.557)†
  Difference-in-differences estimate: −0.092 (−0.307 to 0.122)‡

Outcomes measured in percentage points:

30-Day risk-standardized mortality for acute MI admissions (1395 hospitals)
  Preintervention difference in annual trend: 0.028 (−0.220 to 0.277)*
  Postintervention change: exposed, −1.756 (−1.841 to −1.670); control, −1.474 (−2.282 to −0.667)†
  Difference-in-differences estimate: −0.282 (−1.715 to 1.152)‡

30-Day risk-standardized mortality for heart failure admissions (2802 hospitals)
  Preintervention difference in annual trend: 0.016 (−0.080 to 0.113)*
  Postintervention change: exposed, 0.479 (0.419 to 0.538); control, 0.691 (0.500 to 0.882)†
  Difference-in-differences estimate: −0.212 (−0.532 to 0.108)‡

30-Day risk-standardized mortality for pneumonia admissions (3232 hospitals)
  Preintervention difference in annual trend: −0.012 (−0.083 to 0.060)*
  Postintervention change: exposed, −0.184 (−0.255 to −0.114); control, 0.247 (0.076 to 0.419)†
  Difference-in-differences estimate: −0.431 (−0.714 to −0.148)‡

* The preintervention difference in trend is the difference between the annual linear trend for hospitals exposed to HVBP and that for control hospitals in the preintervention period (measurement periods ending between 2008 and 2011). For example, for mortality among patients admitted to the hospital for acute MI, exposed hospitals were changing at an annual rate that was 0.028 percentage points greater (i.e., worsening) than control hospitals, a difference that was not significant. The 95% confidence intervals (CIs) are based on a parametric t-test with clustered standard errors.
† The postintervention change is the difference between the mean preintervention outcome and the mean postintervention outcome.
‡ The 95% CIs are based on permutation tests from 2000 resamples.

The Critical Access Hospitals that formed the control group differ from exposed hospitals across a number of distinct dimensions, including size, teaching status, and baseline quality performance. Although we used matching to attempt to address differences in preintervention quality, expectations for quality improvement may differ between Critical Access Hospitals and hospitals exposed to HVBP. In addition, the control hospitals did not face financial penalties for not reporting quality of care through Hospital Compare, and therefore they reported data at lower rates than the exposed hospitals. However, a control group made up of more motivated Critical Access Hospitals that voluntarily reported data through Hospital Compare would probably bias the results away from, rather than toward, the null. Changes in outcomes throughout the study period were very similar between the matched acute care hospitals and Critical Access Hospitals (Figs. S4 and S5 in the Supplementary Appendix). This supports our use of Critical Access Hospitals as a control group. We view our difference-in-differences analysis strategy with an imperfect control group as superior to an interrupted time-series design, because the nonlinear trajectories of many of the study outcomes can lead to biased inferences when interrupted time-series designs are used (Tables S23 through S25 in the Supplementary Appendix). 30

The measures that form the basis for incentives under HVBP have been publicly reported for several years. Hospitals also had high performance on some of the measures, particularly the clinical-process indicators, at the start of the program.
As a result, the opportunity for additional improvement under HVBP may have been limited, decreasing the apparent effectiveness of the program. In addition, although HVBP was created by the passage of the ACA in 2010, hospitals may have attempted to improve quality in anticipation of the program. 15 Also, although the confidence intervals for our matching estimators were small for most of our outcomes, the estimates for mortality among patients who were admitted to the hospital for acute myocardial infarction had larger confidence intervals as a result of the lower number of hospitals that met caseload requirements. As a result, we had lower statistical power to determine the effect of HVBP on this measure of mortality.

Finally, the incentives in HVBP changed over time, and this may have modified hospital responsiveness to the program. Future research may evaluate whether the changing incentives in the program affected hospital performance.

Our evaluation suggests that HVBP, which introduced small quality performance-based adjustments in Medicare payments, has resulted in little tangible benefit over its first 4 years. It is possible that alternative incentive designs, including those with simpler criteria for performance and larger financial incentives, might have led to greater improvement among hospitals. It may be useful for the Centers for Medicare and Medicaid Services to continue to experiment with other value-based payment models, including the HRRP, accountable care organization programs, and bundled payment programs, in an effort to improve the value of hospital spending.

Supported by grants from the National Institute on Aging (R01-AG-047932, to Dr. Ryan, Mr. Krinsky, and Ms. Maurer, and R01-AG-039434-05, to Dr. Dimick).

Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.

We thank David Muhlestein for the use of the Leavitt Partners data on hospital participation in accountable care organization programs.

References

1. Yasaitis L, Fisher ES, Skinner JS, Chandra A. Hospital quality and intensity of spending: is there an association? Health Aff (Millwood) 2009;28:w566-w572.
2. Jha AK, Orav EJ, Dobson A, Book RA, Epstein AM. Measuring efficiency: the association of hospital costs and quality of care. Health Aff (Millwood) 2009;28:897-906.
3. Hussey PS, Wertheimer S, Mehrotra A. The association between health care quality and cost: a systematic review. Ann Intern Med 2013;158:27-34.
4. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: the content, quality, and accessibility of care. Ann Intern Med 2003;138:273-87.
5. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 2: health outcomes and satisfaction with care. Ann Intern Med 2003;138:288-98.
6. Jha AK, Li Z, Orav EJ, Epstein AM. Care in U.S. hospitals: the Hospital Quality Alliance program. N Engl J Med 2005;353:265-74.
7. Rosenthal MB, Fernandopulle R, Song HR, Landon B. Paying for quality: providers' incentives for quality improvement. Health Aff (Millwood) 2004;23:127-41.
8. Ryan AM, Damberg CL. What can the past of pay-for-performance tell us about the future of Value-Based Purchasing in Medicare? Healthc (Amst) 2013;1:42-9.
9. Ryan AM, Nallamothu BK, Dimick JB. Medicare's public reporting initiative on hospital quality had modest or no impact on mortality from three key conditions. Health Aff (Millwood) 2012;31:585-92.
10. Grossbart SR. What's the return? Assessing the effect of pay-for-performance initiatives on the quality of care delivery. Med Care Res Rev 2006;63:Suppl:29S-48S.
11. Lindenauer PK, Remus D, Roman S, et al. Public reporting and pay for performance in hospital quality improvement. N Engl J Med 2007;356:486-96.
12. Ryan AM, Blustein J, Casalino LP. Medicare's flagship test of pay-for-performance did not spur more rapid quality improvement among low-performing hospitals. Health Aff (Millwood) 2012;31:797-805.
13. Ryan AM. Effects of the Premier Hospital Quality Incentive Demonstration on Medicare patient mortality and cost. Health Serv Res 2009;44:821-42.
14. Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of premier pay for performance on patient outcomes. N Engl J Med 2012;366:1606-15.
15. Ryan AM, Burgess JF Jr, Pesko MF, Borden WB, Dimick JB. The early effects of Medicare's mandatory hospital pay-for-performance program. Health Serv Res 2015;50:81-97.
16. Figueroa JF, Tsugawa Y, Zheng J, Orav EJ, Jha AK. Association between the Value-Based Purchasing pay for performance program and patient mortality in US hospitals: observational study. BMJ 2016;353:i2214.
17. Calikoglu S, Murray R, Feeney D. Hospital pay-for-performance programs in Maryland produced strong results, including reduced hospital-acquired conditions. Health Aff (Millwood) 2012;31:2649-58.
18. Krumholz HM, Wang Y, Mattera JA, et al. An administrative claims model suitable for profiling hospital performance based on 30-day mortality rates among patients with an acute myocardial infarction. Circulation 2006;113:1683-92.
19. Data and program reports. Baltimore: Centers for Medicare and Medicaid Services, 2016 (https://www.cms.gov/regulations-and-guidance/legislation/ehrincentiveprograms/dataandreports.html).
20. Colla CH, Lewis VA, Tierney E, Muhlestein DB. Hospitals participating in ACOs tend to be large and urban, allowing access to capital and data. Health Aff (Millwood) 2016;35:431-9.
21. Caliendo M, Kopeinig S. Some practical guidance for the implementation of propensity score matching. J Econ Surv 2008;22:31-72.
22. Heckman JJ, Ichimura H, Todd PE. Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Rev Econ Stud 1997;64:605-54.
23. Leuven E, Sianesi B. PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Chestnut Hill, MA: Boston College, 2003 (https://ideas.repec.org/c/boc/bocode/s432001.html).
24. Ryan AM, Burgess JF Jr, Dimick JB. Why we should not be indifferent to specification choices for difference-in-differences. Health Serv Res 2015;50:1211-35.
25. Bertrand M, Duflo E, Mullainathan S. How much should we trust differences-in-differences estimates? Q J Econ 2004;119:249-75.
26. Stata statistical software: release 14. College Station, TX: StataCorp, 2015.
27. Zuckerman RB, Sheingold SH, Orav EJ, Ruhter J, Epstein AM. Readmissions, observation, and the hospital readmissions reduction program. N Engl J Med 2016;374:1543-51.
28. Tversky A, Kahneman D. Loss aversion in riskless choice: a reference-dependent model. Q J Econ 1991;106:1039-61.
29. Doran T, Maurer KA, Ryan AM. Impact of provider incentives on quality and value of health care. Annu Rev Public Health 2017;38:449-65.
30. Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D. Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis. BMJ 2015;350:h2750.

Copyright 2017 Massachusetts Medical Society.
Effects of the Premier Hospital Quality Incentive Demonstration on Medicare patient mortality and cost. Health Serv Res 2009; 44: 821-42. 14. Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of premier pay for performance on patient outcomes. N Engl J Med 2012; 366: 1606-15. 15. Ryan AM, Burgess JF Jr, Pesko MF, Borden WB, Dimick JB. The early effects of Medicare s mandatory hospital pay-forperformance program. Health Serv Res 2015; 50: 81-97. 16. Figueroa JF, Tsugawa Y, Zheng J, Orav EJ, Jha AK. Association between the Value- Based Purchasing pay for performance program and patient mortality in US hospitals: observational study. BMJ 2016; 353: i2214. 17. Calikoglu S, Murray R, Feeney D. Hospital pay-for-performance programs in Maryland produced strong results, including reduced hospital-acquired conditions. Health Aff (Millwood) 2012; 31: 2649-58. 18. Krumholz HM, Wang Y, Mattera JA, et al. An administrative claims model suitable for profiling hospital performance based on 30-day mortality rates among patients with an acute myocardial infarction. Circulation 2006; 113: 1683-92. 19. Data and program reports. Baltimore: Centers for Medicare and Medicaid Services, 2016 (https:/ / www.cms.gov/ regulations -and-guidance/ legislation/ ehrincentive programs/ dataandreports.html). 20. Colla CH, Lewis VA, Tierney E, Muhlestein DB. participating in ACOs tend to be large and urban, allowing access to capital and data. Health Aff (Millwood) 2016; 35: 431-9. 21. Caliendo M, Kopeinig S. Some practical guidance for the implementation of propensity score matching. J Econ Surv 2008; 22: 31-72. 22. Heckman JJ, Ichimura H, Todd PE. Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Rev Econ Stud 1997; 64: 605-54. 23. Leuven E, Sianesi B. PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Chestnut Hill, MA: Boston College, 2003 (https:/ / ideas.repec.org/ c/ boc/ bocode/ s432001.html). 24. Ryan AM, Burgess JF Jr, Dimick JB. Why we should not be indifferent to specification choices for difference-in-differences. Health Serv Res 2015; 50: 1211-35. 25. Bertrand M, Duflo E, Mullainathan S. How much should we trust differencesin-differences estimates? Q J Econ 2004; 119: 249-75. 26. Stata statistical software: release 14. College Station, TX: StataCorp, 2015. 27. Zuckerman RB, Sheingold SH, Orav EJ, Ruhter J, Epstein AM. Readmissions, observation, and the hospital readmissions reduction program. N Engl J Med 2016; 374: 1543-51. 28. Tversky A, Kahneman D. Loss aversion in riskless choice: a reference-dependent model. Q J Econ 1991; 106: 1039-61. 29. Doran T, Maurer KA, Ryan AM. Impact of provider incentives on quality and value of health care. Annu Rev Public Health 2017; 38: 449-65. 30. Kontopantelis E, Doran T, Springate DA, Buchan I, Reeves D. Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis. BMJ 2015; 350: h2750. Copyright 2017 Massachusetts Medical Society. 2366