Measuring Attending Physician Performance in a General Medicine Outpatient Clinic


Measuring Attending Physician Performance in a General Medicine Outpatient Clinic

Rodney A. Hayward, MD, Brent C. Williams, MD, MPH, Larry D. Gruppen, PhD, David Rosenbaum, BA

OBJECTIVE: To determine which aspects of outpatient attending physician performance (e.g., clinical ability, teaching ability, interpersonal conduct) were measurable and separable by resident report.

DESIGN: Self-administered evaluation form.

SETTING: University internal medicine resident continuity clinic.

PARTICIPANTS: All residents with their continuity clinic at the university hospital evaluated the two attendings who staffed their clinic for the academic years 1990-1991, 1991-1992, and 1992-1993 (average of 85 total residents per year). The overall response rate was 74%.

ANALYSIS: Exploratory analyses were conducted on a preliminary evaluation form in the first two years of the study (236 evaluations of 20 different clinic attendings), and confirmatory analyses using factor analysis and generalizability analysis were performed on the third year's data (142 evaluations of 15 different clinic attendings). Analysis of variance was used to evaluate factors associated with evaluation scores.

RESULTS: Analyses demonstrated that the residents did not distinguish between the attendings' clinical and teaching abilities, resulting in a single four-item scale, named the Clinical/Teaching Excellence Scale, measured on a five-point scale from poor to outstanding (Cronbach's alpha = 0.92). A large amount of the variance for this scale score was associated with attending identity (adjusted R² = 46%). However, two alternative approaches to evaluating the performance of the attending (preference for him or her over the "average" attending and perceived impact of the attending on residents' clinical skills) did not provide useful information independent of the Clinical/Teaching Excellence Scale.
The ratings of three separate conduct scales [availability in clinic (Availability Scale), treating residents and patients with respect (Respect Scale), and time efficiency in staffing cases (Slow Staffing Scale)] were separable from each other and from the rating of clinical/teaching excellence. For the Clinical/Teaching Excellence Scale, as few as four evaluations produced good interrater reliability and eight evaluations produced excellent reliability (reliability coefficients were 0.70 and 0.84, respectively).

CONCLUSIONS: Although this evaluation instrument for measuring clinic attending performance must be considered preliminary, this study suggests that relatively few attending evaluations are required to reliably profile an individual attending's performance, that attending identity is associated with a large amount of the scale score variation, and that special issues of attending performance more relevant to the outpatient setting than the inpatient setting (availability in clinic and sensitivity to time efficiency) should be considered when evaluating clinic attending performance.

Received from the Division of General Medicine, Department of Internal Medicine (RAH, BCW), Department of Health Services Management and Policy (RAH), General Medicine Outpatient Service, Primary Care Education Programs (BCW), and the Office of Educational Resources and Research, Department of Postgraduate Medicine and Health Professions Education (LDG), University of Michigan, Ann Arbor, Michigan; and the New York University Medical School (DR), New York, New York.

Presented in part at the annual meeting of the Society of General Internal Medicine, Washington, DC, April 29, 1994.

Address correspondence and reprint requests to Dr. Hayward: University of Michigan Medical Center, Division of General Medicine, 3116 Taubman Center, Ann Arbor, MI 48109-0376.
KEY WORDS: medical education; clinical teaching; ambulatory teaching; internship and residency; internal medicine; performance evaluation; residency training. J GEN INTERN MED 1995;10:504-510.

As the focus of medical decision making and management has increasingly shifted from the inpatient to the outpatient setting, internal medicine programs have been hard-pressed to shift the focus of internal medicine teaching to the outpatient setting to keep up with this changing world. The growing mandate to train primary care physicians will only accentuate this problem. In many ways, patient-based teaching is inherently easier in the inpatient setting. Trainees work as part of a team, patients are usually available for bedside teaching for many days, and the time frame in which clinical information and consultation are obtained and patient status changes allows for a degree of longitudinal perspective of patient care within a short time period. In contrast, most patient-based ambulatory teaching occurs in a busy clinic: residents work individually, patients quickly come and go, and appreciation for the longitudinal nature of management often takes months, if not years, of observation. Clinicians and educators may believe that the latter is a more relevant longitudinal perspective of medicine, in terms of both teaching and medical care, but it nonetheless presents some difficulties in clinical teaching.1, 2

As we meet the challenge of improving ambulatory education, changes in program structure and content and in faculty development will need to be implemented and evaluated. One component of this evaluative process will be residents' perceptions of attending performance. Such evaluations will be important in identifying areas of teaching that need to be improved, in helping evaluate faculty development and curriculum enrichment interventions, and in documenting teaching excellence, or lack thereof, for promotion and tenure decisions.1, 3-5 But for attending evaluations to be used optimally and be appropriately interpreted by those being evaluated, we have to understand what specific information can be accurately obtained by resident report.

Extensive work on medical student and resident evaluations of inpatient attendings has been conducted, and this work has produced reliable evaluation tools.6-11 Most of this work suggests that trainees tend to separate out two major domains when evaluating attendings: teaching content and interpersonal skills/behavior. Although recent research suggests that an inpatient attending evaluation form can be used to reliably evaluate clinic attendings,12, 13 we were unable to find any literature discussing a systematic reappraisal of which domains can be measured in the outpatient setting. In particular, we were interested in the importance of issues regarding time management, availability, and efficiency, issues perhaps more salient to attending duties in a busy, scheduled clinic than on inpatient wards.

When we decided to more formally evaluate our faculty's outpatient teaching, we were struck by the variety of evaluation instruments being used at different institutions, by the various lengths and breadths of the instruments, and by the dearth of information about the scaling and duplicity of these instruments for evaluating outpatient attending performance. Ideally, we want to evaluate all important components of outpatient teaching.
How good are attendings at teaching history taking, physical examination, and developing differential diagnoses and management strategies; do attendings address issues concerning interpersonal, ethical, or humanistic behavior; and are they effective at improving residents' time management skills and their ability to deal with difficult or demanding patients? Perhaps the intensive, year-long relationships that many residents have with their clinic attendings allow for more detailed appraisal of their attendings' teaching performance. However, the "halo effect" may be so strong that residents develop one global rating of attendings as either good, moderate, or bad, and, therefore, only a single overall measure of attending performance is obtainable by resident report.13

In addition to knowing what attributes of attending performance are discernible and separable, other important questions remain. How stable are ratings from year to year? Does it make a difference whether attending physician performance is rated in absolute terms (e.g., "excellent," "fair," "poor"), relative to other attendings (e.g., "prefer to the average clinic attending"), or with regard to impact on residents' skills (e.g., "greatly improved my ambulatory care skills")? This information is important not only to help clinic directors monitor the teaching that is occurring in their clinics, but also to help all physicians doing outpatient teaching interpret the meaning of the teaching evaluations that they receive.

To help answer some of these questions, we conducted a three-year investigation that evaluated outpatient attending performance through resident survey at a large university hospital-based general medicine outpatient clinic. Residents evaluated attending physicians with whom they had worked one half-day a week during the preceding year. We solicited information about a variety of relevant attending attributes and used several alternative response scales.
METHODS

Study Population

Internal medicine residents evaluated clinic attendings who staffed their continuity care clinic during the preceding year. The survey was completed over three academic years, 1990-1991, 1991-1992, and 1992-1993; an average of 85 residents were in these clinics each year. Preliminary scales were developed based on exploratory analyses of 236 evaluations of 20 different attendings during the first two years of the study and confirmed using 142 evaluations of 15 different attendings in year three. All clinic attendings were general internists and had spent a minimum of one year staffing the residents' continuity clinic one half-day per week (seven or eight residents per half-day clinic). Each clinic half-day was staffed by the same two attendings throughout the year, although residents had occasional exposure to other attendings who covered the clinic during absences of the usual attendings. The clinic protocols require that all cases be presented to ("staffed by") an attending physician prior to the patient's leaving the clinic.

Survey Methods

All internal medicine residents who had a one-half-day-per-week continuity clinic at the university hospital were sent a self-administered evaluation form. The residents were promised anonymity; therefore, demographic information about the residents was not collected (the small size of the clinics would allow frequent identification of residents based on age and gender alone). However, return envelopes were coded to track who had returned the evaluation forms. If their evaluations had not been received, the residents were sent a reminder letter after one to two weeks. Two to three weeks thereafter, the nonrespondents were sent another evaluation form to complete. The overall response rate was 74%: 69% during the first two years of the study and 83% in year three. After year one of the study, the clinic attendings received detailed feedback on their teaching performance for the first time.

Table 1. Scale Scores and Interitem Reliabilities

Scale | Number of Items | Mean Score* ± SD | Interitem Reliability (Cronbach's Alpha)
Clinical/teaching scales
  Clinical/Teaching Excellence | 4 | 78 ± 20 | 0.92
  Prefer (to the "average" attending) | 2 | 70 ± 25 | 0.79
  Impact (on resident's skill) | 6 | 47 ± 20 | 0.93
Conduct scales
  Respect (for patients and colleagues) | 3 | 91 ± 17 | 0.71
  Availability (in clinic) | 2 | 66 ± 27 | 0.91
  Slow Staffing (of cases) | 1† | 78 ± 27 | —

*Scores standardized to a 100-point scale. †Single survey item.

Evaluation Instrument

We first reviewed previously developed inpatient attending and faculty development evaluation forms. Selected items were then distributed to all clinic faculty for their review. The faculty were asked to consider the many tasks, responsibilities, and goals of clinic attendings and to think of ways to evaluate these activities. Written responses of the faculty and results of a group discussion concerning the preliminary items led to modifications and additions. Areas the faculty identified as important in the evaluation were: 1) availability; 2) conscientiousness about clinic duties; 3) giving residents enough assistance with patient management without needlessly getting the residents behind schedule; 4) humanistic treatment of patients and residents; 5) respect of and sensitivity to issues of race, ethnicity, and gender; 6) commitment to and enthusiasm for teaching; 7) teaching ability; 8) perceived impact on a host of resident skills; 9) clinical knowledge and skills; and 10) feedback on resident performance.
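The interitem reliabilities in Table 1 are Cronbach's alphas. As a minimal illustrative sketch (not the authors' code; the rating data below are invented), alpha for a k-item scale compares the sum of the individual item variances with the variance of the total scores:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns.

    items: list of equal-length lists, one per scale item
    (rows = respondents). Returns
    alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

# Hypothetical ratings from five residents on a four-item scale
# (1-5 response options, as in the instrument described above).
item_scores = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 3, 4, 1],
    [4, 5, 2, 4, 2],
]
print(round(cronbach_alpha(item_scores), 2))  # -> 0.95
```

With real data, an alpha near 0.9, like the 0.92 reported for the four-item Clinical/Teaching Excellence Scale, indicates that the items behave as a single, internally consistent scale.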
The attendings' performance was evaluated using three alternative response scales: 1) direct assessment of performance on a five-point scale ranging from poor to outstanding; 2) evaluation of the perceived impact of the attending on various clinical abilities of the residents on a five-point scale; and 3) measurement of residents' preferences for presenting to the attending as opposed to a hypothetical "average clinic attending" on a Likert-like five-point scale ranging from "definitely prefer Dr. Blue" to "definitely prefer average clinic attending." All scales were standardized to range from 0 to 100 for purposes of comparison. Appendix A details these response scales.

Based on exploratory analyses and feedback from residents and faculty, we made substantial changes in the evaluation form after year one (1991-1992 academic year), and then made some minor changes after year two. Therefore, year three is considered the validation year, and all analyses of scale reliability were performed on these data (n = 142 evaluations, 15 different clinic attendings). However, the Clinical/Teaching Excellence Scale did not change over the three-year period and was therefore used to evaluate temporal changes in attendings' ratings by year.

Data Analysis

Development and evaluation of scales were conducted in two stages. First, we evaluated items directed at clinical and teaching ability using the three alternative response scales discussed above. Factor analysis, reliability analysis, and evaluation of the independent contribution of preliminary scales in distinguishing between attendings were conducted as outlined below.
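The 0-to-100 standardization of the five-point response scales described above amounts to a simple linear mapping (a hypothetical helper for illustration, not taken from the paper): a response x on a 1-to-5 scale becomes 100(x - 1)/4, and a scale score is the mean of its standardized items.

```python
def standardize(response, points=5):
    """Map a rating on a 1..points response scale onto a 0-100 range."""
    return 100.0 * (response - 1) / (points - 1)

def scale_score(responses):
    """Score a multi-item scale as the mean of its standardized items."""
    return sum(standardize(r) for r in responses) / len(responses)

# A resident rating all four Clinical/Teaching Excellence items as 4 of 5:
print(scale_score([4, 4, 4, 4]))  # -> 75.0
```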
After determining which items produced separable, useful information concerning clinical and teaching ability, we then repeated the steps of factor and reliability analysis and analysis of scale interactions for these items plus the items evaluating various aspects of conduct in carrying out clinic duties (availability, conscientiousness about clinic duties, and efficiency in staffing cases).

Factor analysis, using Promax (oblique) rotation and scree criteria for retaining factors,14-16 was used to help identify separable conceptual domains. Scales were constructed and named, and Cronbach's alphas were computed to determine internal consistency. Analysis of variance (ANOVA) was used to determine how much variance in ratings was attributable to the specific attendings being evaluated (i.e., attending identity). To determine whether the scales produced independent measures of physician performance, these ANOVAs were performed before and after adjusting each scale for its intercorrelations with other scales, using residuals from multiple linear regression analyses.16, 17

We were also interested in determining the number of evaluations required to achieve stable estimates of attending performance. We performed an ANOVA of residents nested within attending pairs crossed by items.18 This analysis produced variance components that enabled us to calculate generalizability coefficients for the houseofficers' ratings of attendings. These generalizability coefficients can be understood in the same way as reliability coefficients. As a confirmation, we also calculated group-averaged scores from jack-knifed, group-effects models using Kish's roh and the Spearman-Brown prophecy formula.16, 19

The effects of the year of the survey and the level of resident training on attending evaluation scores were evaluated using ANOVA. There was an insufficient number of attendings to allow analyses by attending characteristics (i.e., age or gender). For attendings who had ten or more evaluations in more than one year, ANOVA was used to evaluate trends toward being rated higher or lower in different years, thus testing temporal trends for attendings to improve or decline in performance (a Bonferroni correction was made to adjust for multiple comparisons).

RESULTS

Details on the scoring of the scales, the individual items for each scale, and the results of factor analyses are given in Appendix A. Analyses of items measuring attendings' clinical and teaching ability produced three scales: 1) the Clinical/Teaching Excellence Scale, 2) the Impact Scale, and 3) the Prefer Scale (Table 1 and Appendix A). Two conduct scales were identified by initial factor analyses, which we named the Respect Scale and the Availability Scale. In addition, a single item about taking "excessive time to staff cases" did not weigh heavily on any factor, had a high uniqueness value (0.7), and represented an issue that, in our experience, has high resident salience [all cases must be presented to an attending physician ("staffed") before patients can be discharged from the clinic]. We labeled this item the Slow Staffing Scale.

Summary statistics on the mean score for all attendings and scale reliabilities are shown in Table 1. Of note, the residents did not distinguish between the attendings' clinical ability and teaching ability, resulting in a single scale (the Clinical/Teaching Excellence Scale) with high reliability. In addition, questions about providing feedback to residents and enthusiasm for teaching, items included in commonly used inpatient instruments, did not add any useful information independent of the Clinical/Teaching Excellence Scale in years one and two of the study and were dropped from further consideration.
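The generalizability coefficients used in this analysis can be read like reliability coefficients: with between-attending variance var_a and residual (rater) variance var_e, the coefficient for the mean of n evaluations is var_a / (var_a + var_e / n). A sketch with invented variance components (illustrative values only, not the paper's estimates):

```python
def g_coefficient(var_attending, var_residual, n_raters):
    """Generalizability of an attending's mean score over n raters:
    between-attending ('true') variance divided by the expected
    variance of the observed n-rater mean."""
    return var_attending / (var_attending + var_residual / n_raters)

# Illustrative variance components; reliability rises quickly with n
# because the rater-noise term shrinks as var_residual / n.
var_a, var_e = 1.0, 1.7
for n in (4, 8, 12, 16):
    print(n, round(g_coefficient(var_a, var_e, n), 2))
```

With these assumed components the coefficient climbs from about 0.70 at four raters toward 0.90 at sixteen, the same qualitative pattern as the scale-specific coefficients reported below.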
There were moderate intercorrelations between all of the scales (ranging from 0.30 to 0.58). To evaluate whether these scales were duplicative or whether they gave independent information useful in evaluating attending performance, we performed analyses before and after adjusting for correlations between the different scales. First, we evaluated whether each of the three teaching scales helped distinguish between attendings (Table 2). Although a sizable percentage of variance in each unadjusted scale score was associated with attending identity, the Prefer Scale and the Impact Scale provided little or no information useful in evaluating attendings independent of the Clinical/Teaching Excellence Scale (Table 2) and were therefore dropped from subsequent analyses.

Table 2. Amount of Variance in Teaching Scale Scores Associated with Attending Physician Identity

Scale | Variance Associated with Attendings' Identity (Adjusted R²), Unadjusted for Other Scale Scores | After Adjusting for Other Scale Scores*
Clinical/Teaching Excellence | 46% | 23%†
Prefer | 18%† | 6%
Impact | 19%† | 2%

*Tests whether the attendings being evaluated (attending identity) are associated with variations in each scale's scores independent of their scores on the other scales listed in the table. †p < 0.001.

Next, we evaluated possible redundancies among the three conduct scales (Table 3). The Respect Scale, the Availability Scale, and the Slow Staffing Scale all contributed information associated with specific attendings that was independent of the information provided by the Clinical/Teaching Excellence Scale rating and of that provided by each other (Table 3).

Table 3. Amount of Variance in the Clinical/Teaching Excellence Scale and Conduct Scales Associated with Attending Physician Identity

Scale | Variance Associated with Attendings' Identity (Adjusted R²), Unadjusted for Other Scale Scores | After Adjusting for Other Scale Scores*
Clinical/Teaching Excellence | 46%† | 34%†
Respect | 25%† | 29%
Availability | 44%† | 23%
Slow Staffing | 20%† | 9%‡

*Tests whether the attendings being evaluated (attending identity) are associated with variations in each scale's scores independent of the scores on the other scales listed in the table. †p < 0.001. ‡p < 0.05.

We therefore identified four scales with excellent interitem reliability that provided independent information about attending performance; however, how many evaluators are needed to obtain a stable estimate of attending performance? The following generalizability coefficients would be achieved using four, eight, 12, and 16 evaluations: for the Clinical/Teaching Excellence Scale, 0.70, 0.84, 0.89, and 0.91; for the Respect Scale, 0.40, 0.69, 0.77, and 0.82; for the Availability Scale, 0.76, 0.87, 0.91, and 0.93; and for the Slow Staffing Scale, 0.49, 0.68, 0.76, and 0.81.

Not surprisingly, simply evaluating the attendings and providing feedback did not substantially improve outpatient teaching as rated by the residents.4, 12 Indeed, ANOVA failed to show any association between year and scale results, suggesting that no overall improvement in rating occurred after the evaluations were first reported back to the attendings. Of the eight attendings who had more than ten evaluations in multiple years, only one showed a statistically significant improvement in teaching after feedback of his or her evaluation results (from a mean Clinical/Teaching Excellence Scale rating of 59 ± 14 to a score of 78 ± 7, p < 0.001, which is statistically significant even after the Bonferroni correction). Of course, it cannot be determined whether this improvement would have occurred independent of the feedback of his or her evaluation results. No attending had a statistically significant decline in his or her evaluations over the three-year study period.

DISCUSSION

This paper reports a preliminary attempt to develop a comprehensive evaluation instrument for measuring clinic attending performance using resident reports, but it also provides useful information for all those involved in ambulatory medical education. Not only is it important for those being evaluated to understand the uses and limitations of their evaluations, but the results also give insights into how residents view their outpatient teachers and highlight issues of particular importance to them. We evaluated three potential response scales for evaluating attending teaching and clinical performance and found that, at least for the items included in our evaluation instrument, using a traditional response scale (a five-point scale from poor to outstanding) was superior to asking residents to compare their attending with a hypothetical "average clinic attending" or to rate the attendings' impact on their clinical skills.
In contrast with most previous research, which reports finding only two measurable domains in resident and medical student evaluations of attendings (clinical/teaching skill and interpersonal conduct),10-12 we found three conduct scales (Availability, Respect, and Slow Staffing) whose ratings were separable from each other and from the rating on the Clinical/Teaching Excellence Scale. Our results may be due to attendings' greater impact on time-efficient management of cases in the outpatient setting as compared with the inpatient setting, at least in some clinic settings. Even if future work supports the presence of four separable domains, this study suggests that they can be measured reliably with relatively few questions (as few as ten to 12 items).

Several other aspects of the study are notable for their contrast with what has been reported for inpatient evaluations. As others have reported,12, 13 good reliability for a clinical and teaching skills scale can be achieved with as few as six to eight evaluations (generalizability coefficient >0.80), but we found that somewhat more evaluators are required for some conduct scales. Still, in all instances, fewer evaluations were needed than the 20 or more required for evaluation of inpatient attending conduct.13 As others have noted, the reliability and ratings of evaluations can vary depending on the amount of contact trainees have with attendings. Therefore, the interrater reliabilities in this study may have been aided by the more uniform, intensive trainee-faculty contact. Although some previous results have suggested that residents consistently rate faculty members higher than do medical students,12 we did not find that evaluations differed significantly by the year of training within residency.
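The number of evaluations needed for a given reliability can be estimated with the Spearman-Brown prophecy formula cited in the Methods: if a single evaluation has reliability r, the mean of k evaluations has projected reliability kr/(1 + (k - 1)r), and inverting this gives the smallest k reaching a target. The single-rater value below is illustrative only (roughly consistent with a four-rater coefficient of 0.70), not an estimate from the paper:

```python
import math

def raters_needed(r_single, target):
    """Smallest number of raters k whose Spearman-Brown projected
    reliability k*r/(1 + (k-1)*r) reaches the target reliability."""
    k = target * (1 - r_single) / (r_single * (1 - target))
    return math.ceil(k)

r = 0.37  # illustrative single-evaluation reliability
print(raters_needed(r, 0.80))  # -> 7
print(raters_needed(r, 0.90))  # -> 16
```

With these assumed numbers, seven evaluations suffice for a coefficient above 0.80, in line with the six-to-eight figure quoted above, while pushing past 0.90 roughly doubles the required number.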
Although this study produced useful results that should aid internal medicine postgraduate education programs' efforts to evaluate their clinical teaching in the outpatient setting, the evaluation instrument described in this paper should be considered a starting point, not a finished tool that can be used in any clinic setting. The evaluation instrument was developed in a single training program, and its generalizability is not ensured. Indeed, we would expect the psychometrics of this instrument to vary with the structure and organization of the teaching clinic. Some scales may not be relevant in all settings. For example, although the Slow Staffing Scale did distinguish attendings independent of the other scales, it had borderline parameters in the factor analysis, is a single item, and should be explored further in future research. It may not be relevant in clinic settings in which the level, frequency, and requirements for presenting cases to attendings make it, and perhaps the Availability Scale, less salient than they were in our clinic (not all clinics require residents to present all cases before patients leave the clinic). Similarly, although we found that the residents did not distinguish among the attendings' clinical and teaching abilities, their enthusiasm and commitment to teaching, and their feedback on resident performance, this could also vary in other clinic settings. These questions can be answered only by further research in a variety of teaching clinic settings, whose organizations, cultures, and rules can be quite diverse.

What are the lessons to be learned from these findings? Clinic attendings can be evaluated reliably with relatively few resident evaluators, which should further encourage faculty to implement evaluation programs and to attempt to further delineate the aspects of clinic attending performance that are separable and identifiable in various clinic settings.
Such evaluations can prove useful for providing feedback, but should not be expected to improve teaching performance without other interventions.4,12 However, evaluations can also be used to target and evaluate faculty development programs and to grade attending performance for promotion and tenure considerations. This study should also serve as a reminder that specific aspects of clinical teaching (e.g., teaching of history taking and physical examination, differential diagnosis, and time management skills) are not usually discernible by trainee report, probably due, in part, to a strong halo effect.10-12 To address more detailed particulars of attending performance, alternative approaches to teaching assessment and improvement will be necessary.

JGIM Volume 10, September 1995

The authors thank Clare Weipert and Linda Buecken for typing the manuscript and Matthew Pillsbury for data entry and management. They also thank an anonymous reviewer for suggestions regarding a previous version of the manuscript and Judy Shea for review of the manuscript and her suggestions concerning the statistical analysis.

REFERENCES

1. Perkoff GT. Teaching clinical medicine in the ambulatory setting: an idea whose time has finally come. N Engl J Med. 1986;314:27-31.
2. Howell JD, Lurie N, Woolliscroft JO. Worlds apart: some thoughts to be delivered to house staff on the first day of clinic. JAMA. 1987;258:502-3.
3. SGIM Council. Guidelines for promotion of clinical teachers: draft policy statement. SGIM News. 1993;Nov:7-14.
4. Whitman N, Schwenk T. Faculty evaluation as a means of faculty development. J Fam Pract. 1982;14:1097-101.
5. Smith LG. The development of an evaluation system for house staff and attendings. J Med Soc N Jersey. 1974;71:685-7.
6. Irby D, Rakestraw P. Evaluating clinical teaching in medicine. J Med Educ. 1981;56:181-6.
7. Ramsey PG, Gillmore GM, Irby DM. Evaluating clinical teaching in the medicine clerkship: relationship of instructor experience and training setting to ratings of teaching effectiveness. J Gen Intern Med. 1988;3:351-5.
8. Tortolani AJ, Risucci DA, Rosati RJ. Resident evaluation of surgical faculty. J Surg Res. 1991;51:186-91.
9. Downing SM, English DC, Dean RE. Resident ratings of surgical faculty: improved teaching effectiveness through feedback. Am Surg. 1983;49:329-32.
10. Irby DM, Gillmore GM, Ramsey PG. Factors affecting ratings of clinical teachers by medical students and residents. J Med Educ. 1987;62:1-7.
11. Donnelly MB, Woolliscroft JO. Evaluation of clinical instructors by third-year medical students. Acad Med. 1989;64:159-64.
12. McLeod PJ, James CA, Abrahamowicz M. Clinical tutor evaluation: a 5-year study by students on an in-patient service and residents in an ambulatory care clinic. Med Educ. 1993;27:48-54.
13. Ramsbottom-Lucier MT, Gillmore GM, Irby DM, Ramsey PG. Evaluation of clinical teaching by general internal medicine faculty in outpatient and inpatient settings. Acad Med. 1994;69:152-4.
14. Kim J. Factor Analysis: Statistical Methods and Practical Issues. Beverly Hills, CA: Sage Publications; 1978.
15. Holzinger KJ, Harman HH. Factor Analysis: A Synthesis of Factorial Methods. Chicago, IL: University of Chicago Press; 1941.
16. Nunnally JC. Psychometric Theory. 2nd ed. New York: McGraw-Hill; 1978.
17. Stata Corporation. Stata Reference Manual: Release 3.1. 6th ed. College Station, TX: Stata Corporation; 1993.
18. Brennan RL. Elements of Generalizability Theory. Iowa City, IA: American College Testing Publications; 1983.
19. Ebel RL. Estimation of the reliability of ratings. Psychometrika. 1951;16:407-24.

APPENDIX A
Scale Items and Factor Analyses

A. Scale Items

All scales were standardized to 0-100-point scales. A higher score is better for all scales. The evaluation form was personalized for each attending being evaluated; the items below present the wording for a hypothetical Dr. Maize N. Blue.

1. Clinical/Teaching Excellence Scale (4 items)
(Response scale: 1 = poor, 2 = fair, 3 = good, 4 = excellent, 5 = outstanding)
a. Please rate Dr. Blue's general knowledge of outpatient medicine.
b. Please rate Dr. Blue's clinical judgment.
c. Please rate Dr. Blue's effectiveness as a teacher.
d. Please rate Dr. Blue's suitability as a role model.

2. Prefer Scale (2 items)
(Response scale: 1 = definitely prefer the "average" attending, 2 = somewhat prefer the "average" attending, 3 = have no preference, 4 = somewhat prefer Dr. Blue, 5 = definitely prefer Dr. Blue)
a. Compared with the average clinic attending, when confronted with a patient with a challenging diagnostic problem, I would:
b. Compared with the average clinic attending, when confronted with a patient with a challenging psychosocial problem, I would:

3. Impact Scale (6 items)*
(Response scale: 1 = no increase, 2 = small increase, 3 = moderate increase, 4 = large increase, 5 = extremely large increase)
Rate the impact of Dr. Blue on your:
a. History-taking skills.
b. Ability to do a physical examination.
c. Ability to present a case.
d. Ability to effectively communicate with patients.
e. Ability to order tests judiciously.
f. Ability to manage your time in clinic efficiently.

4. Respect Scale (3 items)
(Response scale: 1 = strongly disagree, 2 = somewhat disagree, 3 = unsure, 4 = somewhat agree, 5 = strongly agree)

*Two additional impact items were evaluated ("ability to develop a differential diagnosis for outpatient problems" and "competence in outpatient medicine"), but in the factor analysis they had moderate weights on both the factor for clinical/teaching excellence and the factor for impact. Therefore, these items were left out of the final Impact Scale. However, alternative analyses performed with an eight-item Impact Scale including these two items did not appreciably affect any of the results reported in the tables of this paper, except that the Impact and Clinical/Teaching Effectiveness Scales were slightly more intercorrelated (0.64 vs. 0.58) in the alternative analysis.

During the course of the past year, Dr. Blue:
a. Treated me with respect.
b. Was sensitive to issues of race, culture, and ethnicity.
c. Treated men and women physicians with equal respect.

5. Availability Scale (2 items)
(Response scale: 1 = strongly disagree, 2 = somewhat disagree, 3 = unsure, 4 = somewhat agree, 5 = strongly agree)
During the course of the past year, Dr. Blue:
a. Avoided interruptions.
b. Was conscientious about being present and available in clinic.

6. Slow Staffing Scale (1 item)
(Response scale: 1 = strongly disagree, 2 = somewhat disagree, 3 = unsure, 4 = somewhat agree, 5 = strongly agree)
a. During the course of the past year, Dr. Blue took excessive time to staff cases. (-)

B. Factor Analyses (Promax rotation with scree criteria for retaining factors)

1. For teaching and clinical items on three alternative response scales:

Survey Item   Factor 1   Factor 2   Factor 3
1a             0.88       0.04      -0.07
1b             0.80      -0.09       0.01
1c             0.79      -0.12       0.04
1d             0.77      -0.02       0.21
2a             0.25      -0.76      -0.15
2b             0.01      -0.83      -0.02
2c             0.01      -0.88       0.14
2d            -0.15      -0.77       0.22
2e             0.15      -0.63       0.12
2f             0.04      -0.69       0.13
3a             0.33       0.08       0.61
3b             0.01       0.01       0.74

2. For conduct items and items from the Clinical/Teaching Excellence Scale (Factor 1 above)†:

Survey Item   Factor 1   Factor 2   Factor 3
1a             0.93       0.17      -0.06
1b             0.82      -0.05       0.02
1c             0.78      -0.10       0.13
1d             0.75      -0.33       0.01
4a             0.11      -0.72       0.04
4b            -0.07      -0.59       0.04
4c            -0.05      -0.65       0.07
5a             0.00      -0.12       0.73
5b             0.09       0.10       0.74

†Although three teaching scales were separable by factor analyses with Promax rotation, only the Clinical/Teaching Excellence Scale produced independent information in evaluating attendings (Table 3); therefore, the items from the other two teaching scales (Prefer and Impact) were dropped from subsequent analyses.
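The 0-100 standardization described at the head of Appendix A, including reverse scoring of the negatively worded Slow Staffing item, can be sketched as follows. The linear mapping (mean − 1) / 4 × 100 is an assumed implementation, consistent with rescaling a 1-5 response scale onto 0-100 with higher scores better; the paper does not spell out the exact formula.

```python
def scale_score(responses, reverse=False):
    """Convert 1-5 Likert responses to an assumed 0-100 metric
    (higher = better).  reverse=True flips negatively worded items,
    such as the Slow Staffing item marked "(-)" in the appendix."""
    items = [6 - r if reverse else r for r in responses]
    mean = sum(items) / len(items)
    return (mean - 1) / 4 * 100  # map the 1..5 range onto 0..100

# Four hypothetical Clinical/Teaching Excellence ratings:
print(scale_score([4, 5, 4, 4]))       # -> 81.25
# Slow Staffing: "strongly agree" (5) that staffing is slow is bad,
# so it reverse-scores to the worst value on the 0-100 metric.
print(scale_score([5], reverse=True))  # -> 0.0
```

Reverse scoring before averaging keeps every scale oriented the same way, which is what lets all the reported scale scores share the "higher is better" interpretation.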