Knowledge Discovery in Databases: Improving Quality in Homecare
Bonnie L. Westra, PhD, RN, Assistant Professor
University of Minnesota, School of Nursing
An educational update to the HIMSS Management Engineering Performance Improvement Task Force
June 17, 2008
Acknowledgments
Co-Investigators
- Kay Savik, MS
- John H. Holmes, PhD
- Cristina Oancea, MS, PhD Student (RA)
- Lynn Choromanski, MS, RN, PhD Student (RA)
- Mary Dierich, MS, RN, PhD Student
Industrial Partners
- CareFacts Information Systems
- CHAMP Software
- Deb Solomon, RN, MS, Home Caring & Hospice (consultant)
Funding
- University of Minnesota Digital Technology Initiative Grant
- UMN Grant-in-Aid
- NIH Health Trajectory P20 Grant
Objectives
- Describe current homecare research using EHR data
- Demonstrate a series of steps in comparing traditional statistical analytic methods with knowledge discovery methods (data mining)
- Examine lessons learned from using EHR data for quality improvement
- Explore the use of KDD for future research
Problem
- Increasing homecare/community-based care
- Annual expenditure in 2005 of $47.5 billion
- In 2000, CMS implemented PPS for Medicare patients
- Concern about the effect of decreased services/visits on outcomes
- First study - the national hospitalization rate remained constant at 28%
- Limited research on ways to reduce hospitalization
Research Aims
- The purpose of the first study was to develop predictive models for risk factors associated with an increased likelihood of hospitalization of homecare patients, and to discover whether interventions documented as part of routine care using the Omaha System influence hospitalization.
- Use knowledge discovery in databases combined with traditional statistics.
- Reported here are the first models, which use traditional statistics.
Design/Sample
- Secondary analysis of EHR data
  - OASIS and Omaha System interventions from two different EHR systems and 15 homecare agencies
- Data included
  - All patients in 2004 receiving homecare services with a minimum of two OASIS records (for the start and end of an episode of care) who also had Omaha System interventions
KDD Process
Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R. Advances in Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press/The MIT Press; 1996.
Expertise Required
- Clinical expert
  - What data are collected, when, why, and how
  - Interpretation of the data
  - Meaningful decisions throughout the process
- Information system knowledge - specifying requirements
  - What data are available
  - Similarity across agencies and vendors
  - Database issues - how the data are stored
- Data analysis
  - Statistical knowledge
  - Data mining knowledge
- Clinical validation throughout the process
OASIS Data
Omaha System
OMAHA SYSTEM PROBLEMS
ENVIRONMENTAL
1 - Income
2 - Sanitation
3 - Residence
4 - Neighborhood/workplace safety
5 - Other
PSYCHOSOCIAL
6 - Communication with community resources
7 - Social contact
8 - Role change
9 - Interpersonal relationship
10 - Spiritual distress
11 - Grief
12 - Emotional stability
13 - Human sexuality
14 - Caretaking/parenting
15 - Neglected child/adult
16 - Abused child/adult
17 - Growth and development
18 - Other
PHYSIOLOGICAL
19 - Hearing
20 - Vision
21 - Speech and language
22 - Dentition
23 - Cognition
24 - Pain
25 - Consciousness
26 - Integument
27 - Neuro-musculo-skeletal function
28 - Respiration
29 - Circulation
30 - Digestion-hydration
31 - Bowel function
32 - Genito-urinary function
33 - Antepartum/postpartum
34 - Other
HEALTH RELATED BEHAVIORS
35 - Nutrition
36 - Sleep and rest patterns
37 - Physical activity
38 - Personal hygiene
39 - Substance use
40 - Family planning
41 - Health care supervision
42 - Prescribed medication regimen
43 - Technical procedure
44 - Other
Analyses
- Traditional statistical analyses
  - Frequencies, descriptives, histograms
  - Chi-square/bivariate association
  - Latent class analysis
  - Logistic regression analysis
- Future - data mining techniques
  - Visualization
  - Feature selection
  - Decision trees
  - Clustering
Preprocessing
- 18,067 OASIS records for 3,199 patients
  - Missing data
  - Duplicate records
  - Invalid values
- 989,772 Omaha System interventions
  - Missing data
- Matched patients with OASIS and Omaha System data
- 65,000 medication records
Data Preparation
- Preparation - cleaning data
  - Missing values
  - Duplicate records
  - Out-of-range values
- Grouping data into episodes of care
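The preparation steps above (duplicates, out-of-range values, grouping into episodes) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`patient_id`, `reason`, `pain`), and the valid range 0-3 are hypothetical and do not come from the OASIS specification or from the study's actual code.

```python
from collections import defaultdict
from datetime import date

# Hypothetical assessment records; field names are illustrative, not OASIS items.
records = [
    {"patient_id": "A", "date": date(2004, 1, 5), "reason": "start", "pain": 2},
    {"patient_id": "A", "date": date(2004, 1, 5), "reason": "start", "pain": 2},  # duplicate
    {"patient_id": "A", "date": date(2004, 2, 20), "reason": "discharge", "pain": 1},
    {"patient_id": "B", "date": date(2004, 3, 1), "reason": "start", "pain": 9},  # out of range
    {"patient_id": "B", "date": date(2004, 4, 2), "reason": "transfer", "pain": None},  # missing
]

VALID_PAIN = range(0, 4)  # assumed valid scores 0-3, for illustration only
END_REASONS = {"discharge", "transfer", "death"}


def clean(rows):
    """Drop exact duplicates and recode out-of-range values as missing (None)."""
    seen, cleaned = set(), []
    for row in rows:
        key = (row["patient_id"], row["date"], row["reason"], row["pain"])
        if key in seen:
            continue
        seen.add(key)
        row = dict(row)  # copy so the input is not modified
        if row["pain"] is not None and row["pain"] not in VALID_PAIN:
            row["pain"] = None
        cleaned.append(row)
    return cleaned


def group_into_episodes(rows):
    """Pair each start record with the next end record for the same patient."""
    by_patient = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["patient_id"], r["date"])):
        by_patient[row["patient_id"]].append(row)
    episodes = []
    for patient_id, recs in by_patient.items():
        start = None
        for rec in recs:
            if rec["reason"] == "start":
                start = rec
            elif rec["reason"] in END_REASONS and start is not None:
                episodes.append({
                    "patient_id": patient_id,
                    "start": start["date"],
                    "end": rec["date"],
                    "outcome": rec["reason"],
                    "length_of_stay": (rec["date"] - start["date"]).days,
                })
                start = None
    return episodes


cleaned = clean(records)
episodes = group_into_episodes(cleaned)
for episode in episodes:
    print(episode["patient_id"], episode["outcome"], episode["length_of_stay"])
```

Recoding invalid values as missing rather than dropping the record is one possible choice; each such decision can affect the validity of the results and should be reviewed with a clinical expert.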
Unit of Analysis
2,806 patients - 4,242 episodes
Episodes
- Death, 1.7%
- Continue, 10.9%
- Discharge, 48.8%
- Transfer, 38.6%
Transformation
- Summative scales
  - Prognosis, Pain, Pressure Ulcers, Stasis Ulcers, Surgical Wounds, Respiratory Status, ADLs, IADLs
- Clinical Classification Software
  - Primary diagnoses, then reduced into 51 smaller groups within 11 major categories
- Charlson Index of Comorbidity
  - Additional medical diagnoses
- Interventions
  - Theoretically grouped into 23 categories
- Created dummy variables
  - For non-normally distributed data
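Two of the transformations above, summative scales and dummy variables for non-normally distributed data, can be illustrated with a minimal Python sketch. The item names, scores, and cut points below are hypothetical; the cut points actually used in the study are not given here.

```python
# Hypothetical item scores for one episode; item names are illustrative only.
adl_items = {"grooming": 1, "dressing_upper": 2, "dressing_lower": 2, "bathing": 3}


def summative_scale(items):
    """Sum item scores into one scale score; return None if any item is missing."""
    if any(value is None for value in items.values()):
        return None
    return sum(items.values())


def dummy_variables(value, cut_points, prefix):
    """Code a skewed count as 0/1 indicator variables.

    cut_points are ascending, inclusive upper bounds for every level except the
    highest. The lowest level is the reference category and gets no indicator.
    """
    labels = [f"{prefix}_level{i}" for i in range(1, len(cut_points) + 1)]
    level = sum(value > cut for cut in cut_points)  # 0 means the reference level
    return {label: int(i == level) for i, label in enumerate(labels, start=1)}


print(summative_scale(adl_items))
# Assumed cut points: 0 interventions (reference), 1-5, and more than 5.
print(dummy_variables(3, [0, 5], "teaching"))
```

Because the choice of cut points changes which episodes fall into each level, it is worth recording them alongside the results.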
Primary Diagnoses
- Primary diagnosis categories: 11
- Groups: 51
- Clinical Classification Software groups: 260
- ICD-9 codes: ~13,000
Clinical Classification Software
Cardiac and Other Circulation Diseases
Grouping | CCS Categories | Descriptors
24 | 97, 98, 99, 111, 112, 113, 117, 120, 121 | Hypertension & other circulatory diseases
25 | 100, 101, 102 | Myocardial infarction
26 | 103, 104, 96, 213, 245 | Other heart disease
27 | 105, 106 | Conduction
28 | 108 | Congestive Heart Failure; NONHP
29 | 109, 110 | Acute cerebrovascular disease
30 | 114, 116, 118, 119 | Peripheral atherosclerosis
31 | 115 | Aneurysm
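The groupings listed above amount to a lookup from a CCS category code to one of the study's smaller groups. A minimal Python sketch of that lookup follows; the codes and descriptors are transcribed from this slide, while the names `CARDIAC_GROUPS`, `CCS_TO_GROUP`, and `group_for_ccs` are hypothetical. The earlier step from ICD-9 codes to CCS categories is not shown.

```python
# Grouping number -> (CCS category codes, descriptor), transcribed from the slide.
CARDIAC_GROUPS = {
    24: ((97, 98, 99, 111, 112, 113, 117, 120, 121),
         "Hypertension & other circulatory diseases"),
    25: ((100, 101, 102), "Myocardial infarction"),
    26: ((103, 104, 96, 213, 245), "Other heart disease"),
    27: ((105, 106), "Conduction"),
    28: ((108,), "Congestive Heart Failure; NONHP"),
    29: ((109, 110), "Acute cerebrovascular disease"),
    30: ((114, 116, 118, 119), "Peripheral atherosclerosis"),
    31: ((115,), "Aneurysm"),
}

# Invert the table so each CCS category code points to its grouping number.
CCS_TO_GROUP = {
    ccs: group
    for group, (codes, _descriptor) in CARDIAC_GROUPS.items()
    for ccs in codes
}


def group_for_ccs(ccs_code):
    """Return the grouping number for a CCS category, or None if not listed here."""
    return CCS_TO_GROUP.get(ccs_code)


print(group_for_ccs(108), CARDIAC_GROUPS[group_for_ccs(108)][1])
```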
Applying a Clusterer: Identifying similarities and dissimilarities
Data Analysis
- Latent class analysis
  - ADL Scale (M0640-M0710)
  - Who Provides Assistance (M0350)
  - Management of medications (M0780)
  - Diagnosis group (M0230 CCS Groups)
- Logistic regression
  - Create models for predictors of hospitalization - OASIS
  - Added interventions - Omaha System interventions
Demographics
- 2,806 patients
  - Mean age 74.4 (SD = 14.1)
  - 64.6% female
  - 97.9% White
- 4,242 episodes
  - Length of stay ranged from 1 to 6,354 days (median = 38 days)
  - 48.8% discharged
  - 38.6% transferred to inpatient setting
    - 1,620 (38.4%) hospitalized
  - 29.9% continued with care
  - 1.7% died
Demographics
- Primary diagnoses (most frequent)
  - 18.8% cardiac and circulatory diseases
  - 18.1% orthopedic/trauma surgery and follow up
  - 9.1% endocrine and nutrition
  - 7.3% respiratory problems
  - 2.3% infectious diseases
- Charlson Index of Comorbidity
  - 0-10 with a mean of 0.58 (SD = 1.32)
- Interventions (384,081)
  - 62.5% monitoring
  - 44.9% teaching
  - 30.2% treatments
  - 16.0% case management
Class I: Functionally Impaired
Risk Factors | Risk of Hospitalization
Assistance with IADLs | 1.5-2.3
Expected Prognosis | 1.9-2.2
Charlson Index | 2.6-3.3
Medicare as homecare payor | 2.0-2.3
Class I: Functionally Impaired - Significant Interventions
Variable | Frequency | OR
Monitoring Injury Prevention | Moderate | 1.7
Class III: Cardiac/Circulatory
Risk Factors | Risk of Hospitalization
IADL Status | 1.5-2.3
Expected Prognosis | 1.6-1.8
Pain | 1.9-2.2
Charlson Index | 2.1-2.6
Bowel Incontinence | 2.0
Patient equipment | 3.9
Class III: Cardiac/Circulatory - Significant Interventions
Variable | Frequency | OR
Teaching Disease Treatment | Moderate | 0.50
Providing Medication Treatment | Low | 1.9
Teaching Disease Treatment | High | 3.0
Interpreting Results
- Who interprets?
  - Nurses on research team
  - Homecare clinical manager
  - Broader homecare audience
- What were they asked?
  - Latent classes - are they meaningful?
  - Within-class predictors - what does it mean to have bowel incontinence as a predictor of hospitalization?
  - Across classes - the most consistent predictors of hospitalization are
    - Charlson Index of Comorbidity
    - Prognosis
    - Medicare
    - Patient management of equipment
    - IADLs
Discussion
- Homecare patients are heterogeneous in needs - latent class analysis was useful
  - ADLs, management of oral medications, caregiver assistance, and primary diagnoses
  - Differences between classes
  - Similarities across classes
- Most consistent predictors of hospitalization are Charlson Index of Comorbidity, prognosis, Medicare, patient management of equipment, and IADLs
- The addition of interventions to the predictive models for hospitalization modified some predictors
  - Injury prevention
- Some interventions were risk factors, others were protective
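Whether an intervention reads as a risk factor or as protective depends on which side of 1 its odds ratio (OR) falls: above 1 indicates higher odds of hospitalization, below 1 lower odds. The study's ORs came from logistic regression models; the Python sketch below instead computes a simpler, unadjusted OR from a 2x2 table, with a Wald 95% confidence interval. All counts are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def odds_ratio(exposed_events, exposed_nonevents, unexposed_events, unexposed_nonevents):
    """Unadjusted odds ratio from a 2x2 table (cross-product ratio)."""
    return (exposed_events * unexposed_nonevents) / (exposed_nonevents * unexposed_events)


def wald_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval for the OR, computed on the log scale."""
    log_or = math.log(odds_ratio(a, b, c, d))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)


# Hypothetical counts: (hospitalized, not hospitalized) with and without an intervention.
risk_like = odds_ratio(30, 70, 20, 80)        # OR above 1: associated with higher odds
protective_like = odds_ratio(10, 90, 20, 80)  # OR below 1: associated with lower odds

print(round(risk_like, 2), round(protective_like, 2))
print(tuple(round(bound, 2) for bound in wald_ci(30, 70, 20, 80)))
```

An unadjusted OR ignores the other predictors in the model, so it will generally differ from the adjusted values reported for each latent class.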
Is There a Better Way?
- Use KDD methods
  - How are they similar or different?
  - What can we learn compared with traditional statistical analyses?
  - What are the strengths and weaknesses?
Definition
- Knowledge discovery in databases (KDD)
  - Rigorous analytic approach
  - Combines traditional statistical concepts with semi-automated analyses
  - Uses tools from statistics and machine learning
  - Inductive, data-driven approach to analyzing large, complex datasets
  - Identifies patterns in data that could be missed using only traditional analytic methods
Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. San Francisco: Morgan Kaufmann; 2005.
Traditional Statistics vs. KDD
Task | Traditional Statistics | KDD
Feature Selection | Chi-Square, bivariate | Chi-Square; InfoGain; CFS evaluation (BestFirst, Greedy Stepwise, Genetic)
Clustering | Latent Class | K Means; EM
Predictive Modeling | Logistic Regression | Decision Trees; Bayesian Network
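InfoGain, listed above among the KDD feature selection methods, ranks a feature by how much knowing its value reduces the entropy of the outcome. A minimal Python sketch of that calculation for categorical features follows. The toy data and feature names are hypothetical, and this is not the implementation used in the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [label for f, label in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = hospitalized, 0 = not hospitalized.
hospitalized = [1, 1, 0, 0]
informative_feature = [1, 1, 0, 0]    # separates the outcome perfectly
uninformative_feature = [1, 0, 1, 0]  # unrelated to the outcome

print(information_gain(informative_feature, hospitalized))
print(information_gain(uninformative_feature, hospitalized))
```

Features can then be ranked by their gain, with a cut point chosen for how many to keep; that cut point is one of the analyst decisions that can influence the results.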
Strengths & Weaknesses
- Traditional statistics
  - Well known and accepted
  - Used to discover and test hypotheses
  - Limited by statistical assumptions
- KDD
  - Newer and treated with suspicion
  - Used for discovery
  - Much more flexible in working with data
  - Requires more interaction in making decisions about data
- Health care data are temporal and non-rectangular
Lessons Learned
- Health care data are messy - audit, Audit, AUDIT!!
- 80% is data preparation (minimally)
- Know your data - dwell in the data early and often
- Many decisions are made to manage the data - each could influence the validity of the results
  - Incorrectly coded data
  - Missing data
  - Data reduction strategies
  - Feature selection - cut points
  - Dummy variables - cut points
Lessons Learned
- Walk before you run
  - Phasing in steps with each subsequent study
- Comparisons between traditional and data mining techniques
  - Both use similar math
  - Difference in assumptions and how data are managed
  - Data mining - discovery
  - Traditional statistics - discovery & verification
- Art and a science
Research in Process
- Predict outcomes using protective/risk factors (OASIS), interventions (Omaha System), and medication data
  - Hospitalization and emergent care use (DTI)
  - Pressure ulcers and incontinence (P20)
  - Oral medication management/ambulation (GIA)
- Clustering of interventions
Bonnie Westra, PhD, RN
Assistant Professor & Co-Director, ICNP Center
University of Minnesota, School of Nursing
Robert Wood Johnson Nurse Executive Fellow
5-140 Weaver-Densford Hall
308 Harvard St. SE
Minneapolis, MN 55455
W - 612-625-4470
F - 612-626-3255
westr006@umn.edu
Thank you! For more information, please contact HIMSS Staff Liaison JoAnn W. Klinedinst, CPHIMS, PMP, FHIMSS at jklinedinst@himss.org