MERMAID SERIES: SECONDARY DATA ANALYSIS: TIPS AND TRICKS


Sonya Borrero, Natasha Parekh (adapted from slides by Amber Barnato)

Objectives
- Discuss benefits and downsides of using secondary data
- Describe publicly available datasets
- Identify methodological techniques and considerations related to secondary data analysis
- Apply didactics to real-life problems that have arisen in secondary data analysis

What is secondary data analysis?
- Analysis of data collected by someone else
- In contrast to primary data analysis, in which the same team of researchers designs, collects, and analyzes the data
- Health services and epidemiological research often rely on secondary data

Use of secondary data
Types of secondary data:
- Interview / survey
- Administrative
Some datasets are free; some charge fees

Use of secondary data: Advantages
- Large populations represented
- Collecting primary data is costly
- Detailed information available
- Can study rare conditions
- Don't need individual informed consent

Use of secondary data: Disadvantages
- No choice in variables available
- Data use agreements are necessary
- Time frame might not be desired
- Population is predetermined
- Complex sampling frame on some

National Center for Health Statistics (NCHS) datasets
Population surveys:
- National Health and Nutrition Examination Survey (NHANES)
- National Health Interview Survey (NHIS)
- National Survey of Family Growth (NSFG)
Vital records:
- National Vital Statistics System
- National Death Index

National Center for Health Statistics (NCHS) datasets
Provider surveys:
- National Ambulatory Medical Care Survey
- National Hospital Ambulatory Medical Care Survey
- National Hospital Care Survey
- National Study of Long-Term Care Providers

Other CDC datasets
- Behavioral Risk Factor Surveillance System (BRFSS): state-level telephone surveys assessing health-related risk behaviors, chronic health conditions, and use of preventive services
- Pregnancy Risk Assessment Monitoring System (PRAMS): state-level surveillance on maternal attitudes and experiences around pregnancy
  - Conducted in conjunction with state health departments
  - Covers about 83% of all US births

Centers for Medicare and Medicaid Services (CMS) datasets
- Medicare data: administrative claims data (i.e., data generated by billing) for people whose health care is covered by Medicare (age ≥65; kidney failure; some disabilities)
- Medicaid data: claims data on all patients enrolled in the Medicaid program
  - Summary files and MSIS Data Mart are easier to use

Veterans Affairs (VA) databases
- VA is the largest integrated health care system in the US
- Many databases: most administrative, few survey
- Very comprehensive
- Requires prior VA approval to access

SGIM Dataset Compendium
- Great resource to assist investigators conducting secondary data analysis
- User's guide and comprehensive list of public and proprietary datasets
- http://www.sgim.org/communities/research/datasetcompendium

Methodological techniques: Intro to analyses used
Selection of analysis methods is based on two elements:
- Research experiment: study design, type of variables, research questions
- Data collection methods: administrative data (population based), survey / interview data (sample based)

Intro to analyses used
Study design:
- Cross-sectional, longitudinal (repeated measures)
- Multilevel (hierarchical)
- Cohort / case-control; prospective / retrospective; clinical trials
Type of variables:
- Time-to-event (survival)
- Binary, categorical, ordered, continuous outcome
- Instrumental variables

Intro to analyses used
Research questions:
- Descriptive
- Associations (regressions: linear, non-linear, logistic, Cox, mixed; structural equation modeling; etc.)
- Estimation / inference (measures of effect: OR/RR; test for trend; etc.)
- Others (e.g., cost-effectiveness, propensity score, etc.)
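As a worked illustration of the measures of effect named above (OR/RR), here is a minimal sketch for a 2x2 table. The function names and the counts are hypothetical toy values, not results from any study in this talk.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Relative risk: risk in the exposed divided by risk in the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Toy counts (hypothetical): 20 of 100 exposed and 10 of 100 unexposed
# have the outcome.
toy_or = odds_ratio(20, 80, 10, 90)     # (20 * 90) / (80 * 10) = 2.25
toy_rr = relative_risk(20, 80, 10, 90)  # 0.20 / 0.10, about 2
```

The OR exceeds the RR here because the outcome is not rare; the two only approximate each other when risks are small.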

Intro to analyses used
Data collection methods:
- Administrative data: population; record duplication; missing data
- Survey / interview data (sampled): sampling design; analysis adjusted for the sampling design (weights)
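To show why sampled survey data need weights, here is a minimal sketch of a weighted proportion. The function name, outcomes, and weights are hypothetical. It gives a point estimate only; standard errors for a complex design also need the strata and cluster information, which dedicated survey software handles.

```python
def weighted_proportion(outcomes, weights):
    """Share of the weighted population with outcome == 1.

    outcomes: iterable of 0/1 values, one per respondent
    weights:  sampling weight for each respondent (how many people
              in the population that respondent represents)
    """
    pairs = list(zip(outcomes, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for y, w in pairs if y) / total


# Toy data (hypothetical): two of four respondents report the outcome,
# but they carry less weight than the two who do not.
toy_outcomes = [1, 0, 1, 0]
toy_weights = [100, 300, 50, 50]

unweighted = sum(toy_outcomes) / len(toy_outcomes)          # 0.5
weighted = weighted_proportion(toy_outcomes, toy_weights)   # 150 / 500
```

In this toy case ignoring the weights would overstate the prevalence, which is the kind of bias the slide's "analysis adjusted for the sampling design" guards against.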

Case-Based Problems
Let's apply what we learned to a case!

Case-Based Study Background
Study objectives:
- Assess cervical cancer screening guideline adherence after guideline changes
- Assess covariates associated with appropriate cervical cancer screening, under-screening, and over-screening

Problem 1: What level of analysis?
- Patient-level? Columns: ID #, Age, Race, # of paps, Pap 1 date, Pap 1 provider, Pap 2 date, Pap 2 provider
- Provider-level? Columns: Physician #, Specialty, Gender, # of paps performed, Pap 1 date, Pap 2 date
- Pap-level? Columns: Pap #, Patient ID #, Age, Race, Date, Provider

Problem 1: Our approach
Pap-level
- Women could have multiple outcomes (appropriate screening, under-screening, over-screening); outcome based on pap
- Some variables changed with pap (patient age, date, provider)
- Some variables did not change with pap (patient race, ethnicity, comorbidities)
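A pap-level layout can be sketched as a reshape from one record per patient to one row per pap. This is an illustration only: the field names (`id`, `race`, `birth_date`, `paps`) and the toy patient are hypothetical, not the study's actual data structure.

```python
from datetime import date


def age_on(birth_date, when):
    """Age in completed years on a given date."""
    years = when.year - birth_date.year
    if (when.month, when.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def to_pap_level(patients):
    """Turn one record per patient into one row per pap."""
    rows = []
    for patient in patients:
        ordered = sorted(patient["paps"], key=lambda pap: pap["date"])
        for pap_number, pap in enumerate(ordered, start=1):
            rows.append({
                "patient_id": patient["id"],
                "pap_number": pap_number,
                # Fixed across a patient's paps
                "race": patient["race"],
                # Vary from pap to pap
                "age": age_on(patient["birth_date"], pap["date"]),
                "date": pap["date"],
                "provider": pap["provider"],
            })
    return rows


# Toy patient (hypothetical) with two paps from different providers.
toy_patients = [{
    "id": 1,
    "race": "A",
    "birth_date": date(1980, 6, 15),
    "paps": [
        {"date": date(2010, 7, 1), "provider": "P2"},
        {"date": date(2007, 3, 1), "provider": "P1"},
    ],
}]
toy_rows = to_pap_level(toy_patients)
```

Because the same woman contributes several rows, any regression on this layout would need to account for the clustering of paps within patients.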

Problem 2: How do I define my outcome (rates of adherence)?
Can look at # of paps during time periods
Pros:
- Easier
- Consistent with other studies
Cons:
- Hard to assess >1 guideline in >1 time period
- What about paps right before and right after?
- Does not take into account actual interval between paps

Problem 2: How do I define my outcome?
Can look at time between paps
Pros:
- Is more accurate since guidelines are based on intervals
Cons:
- Can be confusing to define
- Do you look at exact amount of time between paps?
- If I look at all paps for all women, how do I look at different guideline periods?

Problem 2: Our approach
Solution: Look at time between paps only for women who have an index pap in set time periods
- Decreases sample size but is much cleaner
- Divide groups by age and time, based on guideline differences

Group  Age       Index period
A1     18-29 yo  1/1/07-6/30/07
A2     18-29 yo  11/1/09-4/30/10
B1     30-65 yo  1/1/07-6/30/07
B2     30-65 yo  11/1/09-4/30/10
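The grouping in the table above can be sketched as follows. The age bands and index periods are taken from the table; the function names are hypothetical, and the step that labels an interval as appropriate, under-, or over-screening is left out because the guideline intervals are not stated on these slides.

```python
from datetime import date

# Index periods and age bands from the table above.
INDEX_PERIODS = {
    "1": (date(2007, 1, 1), date(2007, 6, 30)),
    "2": (date(2009, 11, 1), date(2010, 4, 30)),
}
AGE_BANDS = {"A": (18, 29), "B": (30, 65)}


def assign_group(age, pap_date):
    """Return 'A1', 'A2', 'B1', 'B2', or None if the pap is not an index pap."""
    for band, (low, high) in AGE_BANDS.items():
        if low <= age <= high:
            for period, (start, end) in INDEX_PERIODS.items():
                if start <= pap_date <= end:
                    return band + period
    return None


def days_to_next_pap(index_date, pap_dates):
    """Days from the index pap to the next later pap, or None if there is none."""
    later = [d for d in pap_dates if d > index_date]
    return (min(later) - index_date).days if later else None


# Toy example (hypothetical dates): index pap in period 1, next pap a year later.
toy_group = assign_group(25, date(2007, 3, 1))
toy_interval = days_to_next_pap(
    date(2007, 3, 1), [date(2007, 3, 1), date(2008, 3, 1)]
)
```

Women whose paps all fall outside both index periods return `None` and drop out, which is the sample-size cost the slide mentions.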

Problem 3: A covariate does not make sense!
- One covariate of interest was # of annual visits
- Hypothesis: women with more annual visits had more over-screening?

Descriptive stats for visit #:
Mean  SD  Min  Max    Median
10.7  12  1    296!!  7.5

Ideas for what is going on?
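A quick descriptive check like the one in this table can be sketched with Python's standard `statistics` module. The `describe` helper and the visit counts are hypothetical toy values chosen to mimic one extreme outlier; they are not the study data.

```python
import statistics


def describe(values):
    """Basic descriptive statistics for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


# Toy annual visit counts (hypothetical) with one implausible value.
toy_visits = [1, 4, 6, 8, 9, 296]
toy_summary = describe(toy_visits)
```

A mean far above the median, together with a maximum that is clinically implausible, is the cue to go back to the raw records before modeling.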

Problem 3: A covariate does not make sense!
What questions do you have?
- How did we define visits? We defined visits by outpatient location codes for visits from billing data
- What is going on with people with 296 visits/year? The person with the max of 296 had a visit every other day for procedure code H0020 (Alcohol and/or drug services; methadone administration and/or service; provision of the drug by a licensed program)

Problem 3: A covariate does not make sense!
Options for how to handle this?
- Find out what proportion of people with >X visits per year have this procedure code?
- Determine how "visit count" summary statistics change if we exclude this procedure code?
  - Hesitant to do this because some women get methadone or alcohol treatment at the same offices as paps
- Talk to our statistician
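The first two options can be sketched on toy claims data. Everything here is hypothetical: the field names, the rule that one claim row equals one visit, and the placeholder code "OTHER". Only H0020 comes from the slides.

```python
from collections import Counter


def visit_counts(claims, exclude_codes=frozenset()):
    """Visits per patient, treating each claim row as one visit (an assumption)."""
    counts = Counter()
    for claim in claims:
        if claim["procedure_code"] not in exclude_codes:
            counts[claim["patient_id"]] += 1
    return counts


def share_with_code(claims, code, min_visits):
    """Among patients with more than min_visits visits, the share with `code`."""
    totals = visit_counts(claims)
    heavy = {pid for pid, n in totals.items() if n > min_visits}
    if not heavy:
        return None
    with_code = {
        c["patient_id"]
        for c in claims
        if c["procedure_code"] == code and c["patient_id"] in heavy
    }
    return len(with_code) / len(heavy)


# Toy claims (hypothetical): p1 has five H0020 visits plus one other visit,
# p2 has two other visits.
toy_claims = (
    [{"patient_id": "p1", "procedure_code": "H0020"}] * 5
    + [{"patient_id": "p1", "procedure_code": "OTHER"}]
    + [{"patient_id": "p2", "procedure_code": "OTHER"}] * 2
)
all_visits = visit_counts(toy_claims)
without_h0020 = visit_counts(toy_claims, exclude_codes={"H0020"})
heavy_share = share_with_code(toy_claims, "H0020", min_visits=3)
```

Comparing `all_visits` with `without_h0020` shows how much one procedure code drives the visit count before deciding whether to exclude it.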

Problem 3: Our approach
- 1st: we excluded this procedure code
  - Mean decreased from 10.7 to 7.5; median decreased from 7.5 to 6. Still very high!
- Discussed with statistician: we could keep exploring, or we could winsorize!
  - Winsorizing: transformation of statistics by limiting extreme values in the statistical data to reduce the effect of possibly spurious outliers
  - We set all visit counts >95th percentile to the 95th percentile
- We used it as a confounder/control variable rather than a covariate of interest
  - Regressions with and without this variable were similar
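Capping values above the 95th percentile can be sketched as below. This is an upper-tail-only version, matching the slide; the nearest-rank percentile definition is an assumption, since other definitions interpolate and can give a slightly different cap. The function names and toy counts are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def winsorize_upper(values, pct=95):
    """Set every value above the pct-th percentile to that percentile."""
    cap = percentile_nearest_rank(values, pct)
    return [min(v, cap) for v in values]


# Toy visit counts (hypothetical): 1 through 19, plus one extreme value.
toy_counts = list(range(1, 20)) + [296]
toy_capped = winsorize_upper(toy_counts)  # 296 is pulled down to 19
```

Unlike dropping the outliers, this keeps every woman in the sample and only limits how much leverage her visit count has.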

Problem 4: A main covariate of interest does not make sense!
[Diagram: Patient Factors and Provider Factors, each linked to Cervical Cancer Screening Guideline Adherence]

Problem 4: A main covariate of interest does not make sense!
- We assessed provider factors by linking the provider's NPI number in billing data with AAMC data on specialty, race, gender, and years in practice

Descriptive Statistics for Providers
                                n
Total                           14812
Provider Type
  Clinic                        2142
  Lab                           8494
  Physician                     2476
Provider Specialty
  Family Planning Clinic        1709
  Independent Laboratory        8494
  Family Practice               230
  General Practitioner          1437
  Internal Medicine             1
  Clinical Medical Laboratory   8235
  Family Medicine               249
  Internal Medicine             253
  OBGYN                         1076
  Pathology                     1047
  Specialist                    207

Descriptive Statistics for Providers
                                n
Billing Provider Type
  Clinic                        2200
  Laboratory                    8506
  Physician                     1796
Billing Provider Spec
  Family Planning Clinic        1714
  Family Practice               143
  General Practitioner          877
  Internal Medicine             3
  OBGYN                         193

We needed to do some more exploring

Problem 4: Our approach
- 1st: Went back to billing codes to assess whether I chose the wrong billing codes for paps
  - Codes verified as correct for paps
- Most of the paps coded with a lab-based provider did not have other providers associated with the bill
- So if we excluded these, we would exclude a LOT of paps

Problem 4: Our approach
- Contacted coders from our clinic to understand whether some procedure codes were more specific for performing a pap (more likely to be physicians) vs. interpreting pap results (more likely to be laboratories)
- Dawn: the majority of pap smear procedure codes are for interpreting paps; only 1 Medicare code exists for performing paps
  - Medicare code used very infrequently
- What would you do next?

Problem 4: Possible solutions
- Excluding these paps
- Determine usual source of primary care
- Use physician visits done within 3 days of the pap claim as proxies for the performing provider
- Dicey since a) patients' usual source of care may not be the one performing the inappropriate paps, and b) sometimes patients see many doctors in a 3-day period, so it will be hard to tease out who performed the pap
- Attribution of paps was tricky
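The 3-day proxy idea, and why it is dicey, can be sketched as follows. This is an illustration of the option discussed, not something the study adopted; the field names, provider labels, and the rule of attributing only when exactly one physician appears in the window are all hypothetical.

```python
from datetime import date, timedelta


def candidate_providers(pap_date, visits, window_days=3):
    """Physicians the patient saw within window_days before or after the pap claim."""
    window = timedelta(days=window_days)
    return sorted({
        v["provider"] for v in visits if abs(v["date"] - pap_date) <= window
    })


def attribute_pap(pap_date, visits, window_days=3):
    """Return the single candidate provider, or None when attribution is ambiguous."""
    candidates = candidate_providers(pap_date, visits, window_days)
    return candidates[0] if len(candidates) == 1 else None


# Toy visits (hypothetical): two different physicians inside the window,
# one well outside it.
toy_pap_date = date(2007, 3, 10)
toy_visits = [
    {"provider": "Dr A", "date": date(2007, 3, 9)},
    {"provider": "Dr B", "date": date(2007, 3, 12)},
    {"provider": "Dr C", "date": date(2007, 4, 1)},
]
toy_candidates = candidate_providers(toy_pap_date, toy_visits)
toy_attribution = attribute_pap(toy_pap_date, toy_visits)
```

Counting how often the result is `None` in real data would give a direct measure of how many paps cannot be attributed this way.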

Problem 4: Our solution
- Accepted that we could not assess provider factors with our data
- Looked into other sources, but not feasible within our means
- Referred to this as a limitation and moved on

Problem 5: Moving on
- We finished the cervical cancer screening project and needed to decide where to go next
- Wanted to look at mammogram guideline adherence patterns
- Issue: Medicaid population is ~30 years old
- Thoughts?

Problem 5: Moving on
- >50-year-olds in Medicaid are a special population, less generalizable
  - Disabled
  - Long-term residents
- Solution: Decided not to study mammogram screening patterns in Medicaid
  - Studied STI screening instead

Take home points
- Secondary data has benefits and downsides
- Many publicly available data sets
- Special considerations are needed for analysis
- ALWAYS check descriptive statistics for every variable and outcome
- Explore why when these don't make sense
- Once you understand why, decide if it is reparable
- There are tips and tricks for troubleshooting
- Work closely with your statistician
- Sometimes changing your original plan is the best option

Take home points
- If using billing codes, check with coders about which variables make the most sense
- Hypotheses can lead to more hypotheses
- Think creatively about troubleshooting problems and next steps of your research

Thank you! Questions?