FCSM Research and Policy Conference March 8, 2018 Joshua Goldstein

Leveraging Access to and Use of Department of Defense Data: A Case Study of Unraveling Military Attrition Through New Approaches to DoD Data Integration

FCSM Research and Policy Conference, March 8, 2018
Joshua Goldstein

Authors
US Army Research Institute for the Behavioral & Social Sciences
Andrew Slaughter, Senior Research Psychologist
Social and Decision Analytics Laboratory, Biocomplexity Institute
David Higdon, Professor of Statistics
Sallie Keller, Professor of Statistics and Director
Stephanie Shipp, Research Professor and Deputy Director
Vicki Lancaster, Senior Research Scientist
Bianica Pires, Research Scientist

Biocomplexity Institute of Virginia Tech: Social and Decision Analytics Laboratory
The Social and Decision Analytics Laboratory brings together statisticians and social and behavioral scientists to embrace today's data revolution, developing evidence-based research and quantitative methods to inform policy decision-making.

Motivation: The Challenge
"Our people are the most significant weapon in our arsenal."
Half of the U.S. defense budget is spent on these people, e.g., on pay, benefits, health care, training, housing, and child care.
Should all available data be leveraged for these decisions?
The Army population is highly dynamic and must be analyzed in context and on a longitudinal scale.
We need a socio-demographic understanding of this population.
We need an integrated ecosystem of multiple data sources.

DoD ARI / VT SDAL Collaborative Research
Research in the science of ALL data to support DoD:
Assess the ability to access, evaluate the quality of, and integrate DoD data and other data to support decision making related to the Army population.
Open opportunities to expand the types of questions that can be asked and to support findings beyond operational purposes.
Ground the research in the context of a real problem.
Identify predictors of attrition within the enlisted ranks.

Research Questions
What is the value of combining DoD, civilian, and non-federally collected data sources to enhance or complement a representative use of DMDC (PDE) data?
How does this help capture and model individual, unit, and organizational characteristics and non-military contexts that affect attrition?
How do we quantify this value?

Research Objective
To assess the best approaches for successfully accessing and integrating DoD data and other data sources within the context of a case study on military attrition of Army enlistees.

Data Discovery and Profiling in the Person Data Environment

Person Data Environment (PDE)
DoD maintains numerous datasets about military personnel and their families, covering military service information (deployments, demographics, accessions, pay, and waivers) as well as health, training, and positions held. DMDC provides access to many of these datasets through the Person Data Environment (PDE), a secure, cloud-based enclave.

PDE Access
Accessing the data requires completing several administrative steps, many of which initially seem as daunting as the research itself. These involve:
1. Obtaining Common Access Cards (CACs) for nonmilitary academics;
2. Completing the Institutional Review Board (IRB)/Human Research Protections Official (HRPO) processes; and
3. Working through the mechanics of accessing the PDE/Citrix system.

Data Science Framework

Data Profiling
Goal: Determine the quality and utility of the data for research use, looking at measures such as completeness, value validity, longitudinal consistency, and uniqueness.
Example problems include changes in an individual's recorded gender over time and duplication of records by ID.
Develop a gold standard for demographics profiling.

Steps in the Data Quality Analysis Process
Completeness: The proportion of elements properly populated for a given purpose. Issue types include record fields containing no data, records not containing necessary fields, and datasets not containing the requisite records (e.g., testing for NULLs and empty strings where they are not appropriate).
Validity: The proportion of elements whose attributes possess proper values. Checking value validity generally takes the form of straightforward domain-constraint rules, e.g., count records where gender is not in (male, female), or where age is not in [0, 110].
Uniqueness: The count of unique values taken by an attribute or combination of attributes, i.e., the frequency distribution of an element. (Note: the more homogeneous the values of an element, the less useful the element is for analysis.)
Duplication: The degree of replication of distinct observations per observation unit type, e.g., more than one registration per student per official reporting period. (Note: duplication arises from the choice of level of aggregation.)
Consistency (record): The degree to which two or more data attributes satisfy a well-defined dependency constraint, a.k.a. relationship validation, e.g., ZIP code/state consistency or gender/pregnancy consistency.
Consistency (longitudinal): The degree to which an attribute, or combination of attributes, remains consistent over time, e.g., an individual's gender changing and then changing back (clerical error), or an individual's race classification changing from one race to two races (change in the number of response options).
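The profiling dimensions above translate naturally into simple checks. The following is a minimal Python sketch, assuming records are plain dictionaries; the field names (`pid`, `sex`, `age`, `pregnant`) and the toy records are hypothetical stand-ins, not PDE columns or data.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

def is_populated(value: Any) -> bool:
    """Completeness test: not NULL/None and not an empty or blank string."""
    return value is not None and str(value).strip() != ""

def completeness(records: List[Record], field: str) -> float:
    """Proportion of records whose field is properly populated."""
    return sum(is_populated(r.get(field)) for r in records) / len(records) if records else 0.0

def validity(records: List[Record], field: str, rule: Callable[[Any], bool]) -> float:
    """Proportion of populated values that satisfy a domain-constraint rule."""
    values = [r[field] for r in records if is_populated(r.get(field))]
    return sum(rule(v) for v in values) / len(values) if values else 0.0

def uniqueness(records: List[Record], field: str) -> Counter:
    """Frequency distribution of the values taken by a field."""
    return Counter(r.get(field) for r in records)

def duplication(records: List[Record], key: str) -> Dict[Any, int]:
    """Observation units (e.g. PIDs) that appear more than once, with their counts."""
    counts = Counter(r[key] for r in records)
    return {k: n for k, n in counts.items() if n > 1}

def record_inconsistencies(records: List[Record], rule: Callable[[Record], bool]) -> List[Record]:
    """Records that violate a cross-field dependency constraint."""
    return [r for r in records if not rule(r)]

# Hypothetical toy records illustrating each issue type.
people = [
    {"pid": 1, "sex": "M", "age": 19, "pregnant": False},
    {"pid": 2, "sex": "F", "age": 22, "pregnant": True},
    {"pid": 2, "sex": "F", "age": 22, "pregnant": True},    # duplicate PID
    {"pid": 3, "sex": "", "age": 140, "pregnant": False},   # missing sex, invalid age
    {"pid": 4, "sex": "M", "age": 25, "pregnant": True},    # fails sex/pregnancy check
]

print(completeness(people, "sex"))                          # 0.8
print(validity(people, "age", lambda a: 0 <= a <= 110))     # 0.8
print(uniqueness(people, "sex"))
print(duplication(people, "pid"))                           # {2: 2}
print(record_inconsistencies(people, lambda r: not (r["pregnant"] and r["sex"] != "F")))
```

Validity is computed only over populated values, so a missing field counts against completeness rather than being double-counted as invalid.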

Data Profiling Example

Statistical demographic log analysis & reduction
Basic demographics had to be reconciled across data files, and reusable data products created.
Demographics Table: information about the enlistee that typically remains static over time, e.g., gender, race, ethnicity, and entry test scores. Simple rules are applied to resolve duplicates and invalid entries. Contains one row per PID.
Transaction Table: events or enlistee information that can change periodically, e.g., duty station, rank, and pay grade. Contains multiple rows per PID.

Demographics Table (excerpt)
Column Name             Description                         Original Table
PID_PDE                 Enlistee's Unique ID                Master
PN_SEX_CD               Gender                              Master
RACE_CD                 Race Code                           Master
INIT_ENT_TRN_END_DT     Initial Entry Training End Date     Master
DATE_BIRTH_PDE          Person Birth Date                   Master
PN_BIRTH_PLC_CTRY_CD    Person Birth Place Country Code     Master
HOR_ZIP_CODE_PDE        Home of Record ZIP Code             Analyst
ACT_SCORE               ACT Score                           Analyst
SAT_SCORE               SAT Score                           Analyst
AP                      ASVAB: Auditory Perception Score    Analyst
CO                      ASVAB: Combat Score                 Analyst
...
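One way to picture the reduction from transaction-style rows to a one-row-per-PID demographics table is the Python sketch below. The resolution rule (drop invalid codes, keep the most frequent value, break ties by recency), the code domain `{"M", "F"}`, and the `snapshot` field are assumptions for illustration, not the rules the team applied; the sketch also flags the gender "change and change back" pattern listed under longitudinal consistency.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

VALID_SEX = {"M", "F"}  # hypothetical code domain for PN_SEX_CD

def resolve(history: Sequence[Optional[str]], valid: set) -> Optional[str]:
    """Hypothetical rule: drop invalid/missing codes, keep the modal value, break ties by recency."""
    kept = [v for v in history if v in valid]
    if not kept:
        return None
    counts = Counter(kept)
    top = max(counts.values())
    # Scan from the most recent record so the latest of the tied modal values wins.
    return next(v for v in reversed(kept) if counts[v] == top)

def flip_flops(history: Sequence[Optional[str]]) -> bool:
    """True if a value changes and later returns to an earlier value (A -> B -> A)."""
    vals = [v for v in history if v is not None]
    runs = [v for k, v in enumerate(vals) if k == 0 or v != vals[k - 1]]  # collapse repeats
    return len(runs) != len(set(runs))

def build_demographics(rows: List[Tuple[int, str, Optional[str]]]) -> Dict[int, dict]:
    """rows: (pid, snapshot 'YYYY-MM', sex code), possibly many per PID -> one row per PID."""
    by_pid: Dict[int, List[Tuple[str, Optional[str]]]] = {}
    for pid, snapshot, sex in rows:
        by_pid.setdefault(pid, []).append((snapshot, sex))
    table = {}
    for pid, obs in by_pid.items():
        history = [sex for _, sex in sorted(obs, key=lambda o: o[0])]  # chronological order
        table[pid] = {"PN_SEX_CD": resolve(history, VALID_SEX),
                      "sex_flip_flop": flip_flops(history)}
    return table

rows = [
    (101, "2010-01", "M"), (101, "2010-04", "F"), (101, "2010-07", "M"),  # clerical flip-flop
    (102, "2011-01", "F"), (102, "2011-04", "F"),
    (103, "2012-01", "X"), (103, "2012-04", None),                        # no valid code
]
print(build_demographics(rows))
```

Keeping the flag alongside the resolved value preserves the evidence of a clerical correction instead of hiding it in the reduced table.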

Data Discovery: DoD Data Sources Currently Accessible in the PDE
Active Duty Military Personnel Master
Active Duty Military Personnel Transaction
MEPCOM Regular Army Analyst
Interactive Personnel Electronic Records Management System (IPERMS)
Digital Training Management System (DTMS)
Army Career and Alumni Program (ACAP)
Global Assessment Tool (GAT)
Pre-Deployment, Post-Deployment, and Periodic Health Assessments
Unit Risk Inventory (URI)
DEOMI Organization Climate Survey
Omaha 5 Behavioral Survey

Data Discovery: Non-DoD Data Sources
Data external to the PDE provide richer information on the surrounding social, political, and economic environments that affect an individual's decision to attrit.
Tables ingested by county and year (2005-2015) are available to all researchers in the PDE.
A crosswalk relates counties to each Soldier's installation and home county.
Data sources:
American Community Survey (ACS) from the U.S. Census Bureau
Bureau of Labor Statistics data, including Occupational Employment Statistics (OES), Quarterly Census of Employment and Wages (QCEW), and Current Employment Statistics (CES) data
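Attaching the county-year tables to Soldier records amounts to two lookups through the crosswalk. A minimal Python sketch follows; the record layout, the key names (`installation`, `home_county_fips`, `unemp_rate`), the placeholder FIPS codes, and the rates are all made up for illustration and are not values from ACS or BLS.

```python
from typing import Dict, List, Optional, Tuple

CountyYear = Dict[Tuple[str, int], Dict[str, float]]

def attach_context(records: List[dict], crosswalk: Dict[str, str],
                   county_year: CountyYear, var: str = "unemp_rate") -> List[dict]:
    """Add duty-county and home-county values of `var` for the record's calendar year."""
    out = []
    for rec in records:
        duty_fips: Optional[str] = crosswalk.get(rec["installation"])
        duty = county_year.get((duty_fips, rec["year"]), {}) if duty_fips else {}
        home = county_year.get((rec["home_county_fips"], rec["year"]), {})
        out.append({**rec, f"duty_{var}": duty.get(var), f"home_{var}": home.get(var)})
    return out

# Placeholder county-year table and crosswalk (made-up codes and values).
county_year = {("00001", 2010): {"unemp_rate": 5.0},
               ("00001", 2011): {"unemp_rate": 4.5},
               ("00002", 2010): {"unemp_rate": 8.0}}
crosswalk = {"INSTALLATION_A": "00001"}
quarters = [
    {"pid": 1, "year": 2010, "quarter": 2, "installation": "INSTALLATION_A", "home_county_fips": "00002"},
    {"pid": 1, "year": 2011, "quarter": 1, "installation": "INSTALLATION_A", "home_county_fips": "00002"},
]
for row in attach_context(quarters, crosswalk, county_year):
    print(row["year"], row["duty_unemp_rate"], row["home_unemp_rate"])
```

Missing lookups come back as `None` rather than being silently imputed, which keeps coverage gaps in the external data visible to the completeness checks.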

Case Study: Modeling First-Term Army Attrition

Conceptual Model of Attrition
[Figure: conceptual enlistee trajectories; x-axis: time (months), y-axis: individuals; markers show (re)enlistment, end of term, and drop out; color indicates unit.]
Through time, enlistees may transfer units, reenlist, and attrit.

An Agent-Based Model of Attrition
Develop an agent-based model (ABM) that generates simulated data for testing potential statistical models. The ABM simulates the enlistment, attrition, and re-enlistment of active enlistees.
Each agent is assigned to a unit; attrition probability follows a U-shaped curve.
At each monthly simulation tick, an agent can perform one of the following actions: enlist, transfer units, re-enlist, or attrit.
The simulation output provides synthetic data on enlistee attrition behavior across time.
[Figure: Percentage of Attrited Enlistees by Unit; x-axis: month in term, y-axis: percent.]
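A toy version of such a simulation is sketched below in Python. It is not the team's ABM: the 36-month term, the quadratic U-shaped monthly hazard, the per-unit multipliers, and the transfer and re-enlistment probabilities are all illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List

TERM_MONTHS = 36  # hypothetical length of a term, in months

@dataclass
class Agent:
    pid: int
    unit: int
    month_in_term: int = 0  # completed months in the current term
    active: bool = True

def attrition_prob(month_in_term: int, unit_multiplier: float) -> float:
    """Assumed U-shaped monthly attrition probability: high early and late in the term."""
    mid = TERM_MONTHS / 2
    base = 0.002 + 0.02 * ((month_in_term - mid) / mid) ** 2
    return min(1.0, base * unit_multiplier)

def simulate(n_units: int, n_months: int, new_per_month: int,
             p_transfer: float = 0.01, p_reenlist: float = 0.4,
             seed: int = 0) -> List[Dict]:
    rng = random.Random(seed)
    # Each unit scales the baseline hazard; this is how "unit" enters the synthetic data.
    unit_mult = [rng.uniform(0.5, 2.0) for _ in range(n_units)]
    agents: List[Agent] = []
    events: List[Dict] = []

    def log(month: int, a: Agent, event: str) -> None:
        events.append({"month": month, "pid": a.pid, "unit": a.unit,
                       "month_in_term": a.month_in_term, "event": event})

    for month in range(n_months):
        # Enlist: new agents join a random unit.
        for _ in range(new_per_month):
            a = Agent(pid=len(agents), unit=rng.randrange(n_units))
            agents.append(a)
            log(month, a, "enlist")
        for a in agents:
            if not a.active:
                continue
            if a.month_in_term >= TERM_MONTHS:
                # End of term: re-enlist (term clock resets) or separate.
                if rng.random() < p_reenlist:
                    log(month, a, "reenlist")
                    a.month_in_term = 0
                else:
                    log(month, a, "end_of_term")
                    a.active = False
                continue
            if rng.random() < attrition_prob(a.month_in_term, unit_mult[a.unit]):
                log(month, a, "attrit")
                a.active = False
                continue
            if rng.random() < p_transfer:
                a.unit = rng.randrange(n_units)
                log(month, a, "transfer")
            a.month_in_term += 1
    return events

events = simulate(n_units=5, n_months=120, new_per_month=20, seed=42)
attrit_months = [e["month_in_term"] for e in events if e["event"] == "attrit"]
print(len(events), len(attrit_months))
```

Tabulating `month_in_term` over the `attrit` events, separately by unit, gives the kind of synthetic attrition-by-month curves against which a fitted hazard model can be checked, since the true unit effects are known.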

Proportional Hazards Model
The hazard rate at time $t$ for agent $i$ is
$$\lambda_i(t) = \lambda_0(t)\exp\left(X_i^\top \beta\right),$$
where $X_i$ is a vector of covariates and $\lambda_0(t)$ is the baseline hazard. For the simulated example, $X_i$ is the Soldier's unit; fitted survival probabilities show an effect of unit on attrition.
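Under this model, covariate effects act multiplicatively on the hazard, so each unit's survival curve is the baseline curve raised to the power $\exp(X_i^\top\beta)$. A minimal Python sketch, assuming a baseline hazard that is constant within each month and dummy-coded units with made-up coefficients:

```python
import math
from typing import List, Sequence

def survival_curve(baseline_hazard: Sequence[float], x: Sequence[float],
                   beta: Sequence[float]) -> List[float]:
    """S_i(t), t = 0..T, for lambda_i(t) = lambda_0(t) * exp(x_i' beta),
    with lambda_0 constant within each month (piecewise-constant baseline)."""
    relative_risk = math.exp(sum(xj * bj for xj, bj in zip(x, beta)))
    surv, cum_hazard = [1.0], 0.0
    for h0 in baseline_hazard:
        cum_hazard += h0 * relative_risk  # Lambda_i(t) = exp(x_i' beta) * Lambda_0(t)
        surv.append(math.exp(-cum_hazard))
    return surv

baseline = [0.02] * 36   # hypothetical monthly baseline hazard over a 36-month term
beta = [0.5, -0.3]       # hypothetical effects of units B and C relative to unit A
units = {"A": [0, 0], "B": [1, 0], "C": [0, 1]}
for name, x in units.items():
    print(name, round(survival_curve(baseline, x, beta)[-1], 3))
```

A positive coefficient (unit B) lowers the whole survival curve and a negative one (unit C) raises it, which is the pattern a fitted model would need to recover from the simulated data.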

Building Complexity
Model flexibility for connecting many data sources:
Need to integrate external data sources that change over time.
Need to integrate person-specific information.
Relevant time is measured with respect to a Soldier's term.
Exposures to duties, leaders, and training.
Priors are needed to estimate many similar effects (e.g., unit, duty location).
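Why priors help with many similar effects can be seen in the simplest conjugate case. The Python sketch below is a standard normal-normal partial-pooling update, offered as an illustration rather than the model used in this work; the unit estimates, standard errors, and prior are hypothetical.

```python
from typing import List, Sequence, Tuple

def shrink_unit_effects(estimates: Sequence[float], std_errs: Sequence[float],
                        prior_mean: float, prior_sd: float) -> List[Tuple[float, float]]:
    """Conjugate update per unit: effect_u ~ N(prior_mean, prior_sd^2) and
    estimate_u ~ N(effect_u, se_u^2). Returns (posterior mean, posterior sd) per unit."""
    prior_prec = 1.0 / prior_sd ** 2
    out = []
    for y, se in zip(estimates, std_errs):
        data_prec = 1.0 / se ** 2
        post_prec = prior_prec + data_prec
        post_mean = (prior_prec * prior_mean + data_prec * y) / post_prec  # precision-weighted
        out.append((post_mean, post_prec ** -0.5))
    return out

# Hypothetical raw unit effects: one precise, two noisy (e.g. small units).
post = shrink_unit_effects([0.8, 0.1, -0.6], [0.1, 0.5, 1.0], prior_mean=0.0, prior_sd=0.5)
for (m, sd), y in zip(post, [0.8, 0.1, -0.6]):
    print(f"raw {y:+.2f} -> posterior {m:+.3f} (sd {sd:.3f})")
```

Noisy unit estimates are pulled strongly toward the prior mean, while precise ones barely move; in a full hierarchical model the prior mean and spread would themselves be estimated from all units.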

Multinomial Proportional Odds
Cox discrete proportional odds model (Cox 1972): a discrete-time version of the proportional hazards model with time-varying covariates $X_i(t)$. Extended to allow for competing risks for multiple events, where $r = 1, \ldots, R$ denotes the risk (Allison 1982):
$$\lambda_i^r(t) = \frac{\exp\left(\beta_0^r(t) + X_i(t)^\top \beta^r\right)}{1 + \sum_{s=1}^{R} \exp\left(\beta_0^s(t) + X_i(t)^\top \beta^s\right)}$$
Define $\lambda_i^r(t)$ as the cause-specific hazard, i.e., the probability that an attrition of Soldier $i$ occurs due to risk $r$ at time $t$, given that the event had not occurred by time $t-1$.
Time-varying data are represented in counting process format.
The hazard is related to the covariates as in multinomial logistic regression.
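The counting-process (person-period) layout and the multinomial form of the hazard can be sketched together in Python. The cause labels (`medical`, `misconduct`), the coefficient values, and the helper names are hypothetical; the point is that one row per Soldier-month carries the time-varying covariates, and the cause-specific hazards are a softmax with "no event" as the reference category.

```python
import math
from typing import Dict, List, Optional, Sequence

def person_period(pid: int, exit_month: int, cause: Optional[str],
                  x_by_month: Sequence[float]) -> List[Dict]:
    """One row per month t = 1..exit_month at risk; outcome is 'none' except in the exit
    month, which carries the cause (a censored term, cause=None, stays 'none')."""
    return [{"pid": pid, "t": t, "x": x_by_month[t - 1],
             "y": cause if (t == exit_month and cause is not None) else "none"}
            for t in range(1, exit_month + 1)]

def cause_specific_hazards(beta0_t: Dict[str, float], beta: Dict[str, Sequence[float]],
                           x_t: Sequence[float]) -> Dict[str, float]:
    """lambda^r(t) = exp(beta0^r(t) + x(t)'beta^r) / (1 + sum_s exp(beta0^s(t) + x(t)'beta^s));
    'none' = 1 / (1 + sum_s ...) is the probability of no event in month t."""
    eta = {r: beta0_t[r] + sum(b * x for b, x in zip(beta[r], x_t)) for r in beta0_t}
    denom = 1.0 + sum(math.exp(v) for v in eta.values())
    hazards = {r: math.exp(v) / denom for r, v in eta.items()}
    hazards["none"] = 1.0 / denom
    return hazards

rows = person_period(pid=7, exit_month=3, cause="medical", x_by_month=[0.0, 1.0, 1.0])
for row in rows:
    h = cause_specific_hazards({"medical": -4.0, "misconduct": -5.0},
                               {"medical": [0.3], "misconduct": [0.8]}, [row["x"]])
    print(row["t"], row["y"], {k: round(v, 4) for k, v in h.items()})
```

Fitting then amounts to a multinomial logistic regression of `y` on month-specific intercepts and `x` over these rows; the log-odds of each cause against `none` is exactly the linear predictor for that cause.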

Bayesian MPO Model
A Bayesian discrete-time additive model for the rate of Soldier attrition. This model builds on the multivariate proportional odds model (King 2014).
$\lambda_i^r(t)$ is the cause-specific hazard and $\lambda_i^0(t)$ is the probability of no event:
$$\eta_i^r(t) = \log\frac{\lambda_i^r(t)}{\lambda_i^0(t)}, \qquad \eta_i^r(t) = \beta_0^r(t) + \sum_j \beta_j^r\left(X_{j,i}(t)\right)$$
If $X_{j,i}(t)$ is categorical, a separate effect is estimated for each category. For continuous covariates, $\beta_j^r(X_{j,i}(t))$ is piecewise constant with a Gaussian Markov random field prior. For a random effect, $(\beta_j^1, \ldots, \beta_j^R) \sim N_R(0, \Sigma_j)$, where $\Sigma_j$ has an inverse Wishart specification.
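The additive predictor combines a time-specific intercept, category-level effects, and piecewise-constant effects of binned continuous covariates. The Python sketch below evaluates $\eta_i^r(t)$ for one cause and scores bin effects under a first-order random-walk prior; the random walk is one common GMRF choice and is an assumption here, since the structure of the GMRF is not spelled out. Covariate names, cutpoints, and values are hypothetical, and the inverse Wishart random-effect layer is omitted.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Tuple

def bin_index(value: float, cutpoints: Sequence[float]) -> int:
    """Index of the bin containing value; sorted cutpoints give len(cutpoints) + 1 bins."""
    return bisect_right(cutpoints, value)

def rw1_log_prior(effects: Sequence[float], sigma: float) -> float:
    """Unnormalized log density of a first-order random-walk GMRF on adjacent bin effects:
    -sum_k (b_k - b_{k-1})^2 / (2 sigma^2). Smoother effect sequences score higher."""
    return -sum((b - a) ** 2 for a, b in zip(effects, effects[1:])) / (2 * sigma ** 2)

def eta(beta0_t: float,
        categorical: Dict[str, Dict[str, float]], x_cat: Dict[str, str],
        continuous: Dict[str, Tuple[Sequence[float], Sequence[float]]],
        x_cont: Dict[str, float]) -> float:
    """eta_i^r(t) = beta_0^r(t) + sum_j beta_j^r(X_{j,i}(t)) for a single cause r."""
    total = beta0_t
    for name, level in x_cat.items():        # one effect per category
        total += categorical[name][level]
    for name, value in x_cont.items():       # piecewise-constant effect of a binned covariate
        cutpoints, bin_effects = continuous[name]
        total += bin_effects[bin_index(value, cutpoints)]
    return total

age_effect = ([20.0, 25.0, 30.0], [0.3, 0.1, 0.0, -0.1])  # hypothetical cutpoints and bin effects
print(eta(-4.0, {"unit": {"A": 0.0, "B": 0.4}}, {"unit": "B"}, {"age": age_effect}, {"age": 22.0}))
print(rw1_log_prior(age_effect[1], sigma=0.1))
```

In an MCMC sampler the prior term penalizes jagged bin effects, so sparsely populated bins borrow strength from their neighbors, which is what lets the model estimate nonlinear covariate effects without committing to a parametric shape.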

Model Features
This is a flexible hierarchical modeling framework that can account for:
A random effect for individual Soldiers (e.g., taste for the military)
Nonlinear, time-varying covariate effects
Competing risks due to multiple causes of attrition
Recurrent events due to reenlistment
Efficient Bayesian inference for model parameters via Markov chain Monte Carlo (MCMC).

Model Covariates
Covariates are observed quarterly for more than 1 million Soldiers, 2005-2015. They include information on:
Demographics (age, gender, race, education, marital status)
Military characteristics (rank, MOS, unit, courts-martial, deployments)
Testing (ASVAB, APFT, training)
Community factors (employment rate, wages, poverty, etc.)

Conclusion
We integrated multiple data sources within and external to the DoD and leveraged these data for a case study on Soldier attrition.
Next steps: explore attrition by MOS, unit, and geography; assess the predictive power of our model and compare it to baselines.

References
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34(2):187-220.
Allison, P. D. (1982). Discrete-time methods for the analysis of event histories. Sociological Methodology, 13(1):61-98.
King, A. J. (2014). Bayesian Event History Analysis with Applications to Recurrent Episodes of Illicit Drug Use. Ph.D. Thesis, University of California, Los Angeles.