ASM. Common Operations Failure Modes in the Process Industries International Symposium. Dr. Peter Bullemer Human Centered Solutions


Common Operations Failure Modes in the Process Industries
2009 International Symposium: Beyond Regulatory Compliance, Making Safety Second Nature
October 27-28, 2008, College Station, TX, USA
Dr. Peter Bullemer, Human Centered Solutions
Jason Laberge, Honeywell
Paper presented on behalf of the Abnormal Situation Management R&D Consortium.
ASM and Abnormal Situation Management are registered trademarks of Honeywell International.

Authors
Dr. Peter Bullemer
- Senior partner, North American human factors consulting group, Human Centered Solutions, LLP
- Specializes in human performance in process industry operations
- Technical contributor to the ASM Consortium since 1994
Jason Laberge
- Principal Investigator for the Abnormal Situation Management (ASM) Consortium
- Lead human factors researcher for ASM since 2005
- Research focuses on understanding the factors that influence performance in complex systems

Abnormal Situation Management: A Joint Research and Development Consortium
- Founded in 1994
- Creating a new paradigm for the operation of complex industrial plants, with solution concepts that improve Operations' ability to prevent and respond to abnormal situations
- Human Centered Solutions: Helping People Perform
- www.asmconsortium.org

Message
- The typical approach to incident analysis does NOT effectively identify the impact of ineffective operations practices
- This paper illustrates a methodology to identify systemic operations practice failure modes and to improve human reliability associated with plant operations practices

Project Objectives
- Understand the relation between ineffective operations practices and process industry incidents
- Systematically analyze incidents to determine common operational practice failure modes
- Identify root causes of common operational practice failure modes: why do failures occur ACROSS incidents?
This research study was sponsored by the Abnormal Situation Management (ASM) Consortium.

Incident Selection
- Identified 123 candidate incidents (99 public, 24 site)
- Priority given to recent refining/chemical incidents with severe consequences and detailed reports
- Selected 32 incidents for the study

             Public   Site   Total
USA              14      7      21
Non-USA           6      5      11
Total            20     12      32

[Figure: number of incidents and cumulative % by country: USA, Canada, UK, Korea, India, Other Non-USA (Germany, Algeria, Australia, Brazil, France, Italy, Kuwait, Mexico)]

Operational Failures
- A failure is any operational practice flaw that, if corrected, could have prevented the incident from occurring or would have significantly mitigated its consequences
  - What went wrong in the specific incident, in the investigation team's own language/terms
  - Example: Supervisor not accessible
- Common failure modes are operational practice failures shared across incidents
  - Common problems for the industry (or site)
  - Failures map to the ASM Effective Operations Practices Guidelines
  - Example: Ineffective first line leadership roles

Common Failure Modes

Top 10 Operations Failures            #     %
Hazard analysis/communication        79   15%
First-line leadership                65   12%
Continuous improvement               60   11%
Safety culture                       36    7%
Initial and refresher training       30    6%
Task communications                  29    5%
Comprehensive MOC                    28    5%
Cross functional communication       23    4%
Compliance with procedures           15    3%
Design guidelines and standards      14    3%
Other failure modes                 160   30%
TOTAL                               539

The top 10 cover 70% of the identified operations practice failures.
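The percentages and the 70% coverage figure can be rechecked from the counts alone. The following is a minimal sketch, not the study team's tooling: the function name `pareto_table` and the dictionary layout are assumptions made for illustration, and only the counts come from the slide.

```python
# Counts per failure mode as reported on the slide (top 10 plus "Other").
failure_counts = {
    "Hazard analysis/communication": 79,
    "First-line leadership": 65,
    "Continuous improvement": 60,
    "Safety culture": 36,
    "Initial and refresher training": 30,
    "Task communications": 29,
    "Comprehensive MOC": 28,
    "Cross functional communication": 23,
    "Compliance with procedures": 15,
    "Design guidelines and standards": 14,
    "Other failure modes": 160,
}


def pareto_table(counts, other_label="Other failure modes"):
    """Rank the named modes by count and attach share and cumulative share.

    Shares are computed against the grand total, including the "Other"
    bucket, so the last cumulative value is the coverage of the named modes.
    (Hypothetical helper, written only for this illustration.)
    """
    total = sum(counts.values())
    ranked = sorted(
        ((mode, n) for mode, n in counts.items() if mode != other_label),
        key=lambda item: item[1],
        reverse=True,
    )
    rows, running = [], 0
    for mode, n in ranked:
        running += n
        rows.append((mode, n, n / total, running / total))
    return rows, total


rows, total = pareto_table(failure_counts)
for mode, n, share, cumulative in rows:
    print(f"{mode:<34}{n:>5}{share:>7.0%}{cumulative:>7.0%}")
print(f"Total failures: {total}")
```

The last cumulative value in `rows` is the share of all identified failures covered by the ten named modes.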

Key Learning from Project
The explicit focus on operating practice failures identified opportunities to reduce the risk of incidents that may not be identified via traditional investigation approaches.

Key Learning from Project
The BP Texas City incident (March 23, 2005) investigation reports (Baker, CSB, BP) failed to fully identify the following operating practice failures:
- Task-oriented collaborative communication (i.e., team coordination and real-time communication)
- Training for situation management and team collaboration (i.e., CRM training)
- Need for a common console operator interface framework that supports all operator interaction requirements
Note: this investigation was not typical in level of detail and scope of coverage.
[Image from BP Incident Report (2006)]

Key Learning from Project
Typical analyses that focus on just root causes are insufficient for identifying systemic improvement opportunities:
- Root causes explain why something occurred, not what occurred in terms of failures
- Root causes are general and not specific enough to drive continuous improvement; the details are buried in the incident report
- There are no effective methods for aggregating root cause details across incidents for systemic analysis of problems and improvements
[Figure: event sequence (Event 1, Event 2, Event N, Incident, Event N+1) with root causes marked "Why event occurred" and "What went wrong" marked as missing. Caption: How to aggregate details within and across incidents?]

Some Definitions
An incident failure is any operational practice flaw that, if corrected, could have prevented the incident from occurring or would have significantly mitigated its consequences.
- What went wrong in the specific incidents, often in the investigation team's own language/terms
- In the research project, incident failures were identified based on incident reports
- Example: Supervisor did not check procedure progress

Some Definitions
Common failures are operational practice failures shared across incidents.
- Common problems for the industry (or site)
- In the research project, common failures map to the ASM Effective Operations Practices Guidelines
- Example: Ineffective first line leadership

Some Definitions
A root cause is the most basic cause (or causes) that can reasonably be identified, that management has control to fix and that, when fixed, will prevent (or significantly reduce the likelihood of) the failure's (or factor's) recurrence.
- Why a failure occurred
- In the research project, root causes were based on TapRoot
- An operations failure mode may have more than one root cause
- Example: "No Supervision" and "No communication" may both result in the "Ineffective first line leadership" failure mode

Some Definitions
Root cause manifestations are the specific expressions or indications of a root cause in an incident.
- How operational failure modes are expressed in real operations settings
- Common manifestations are the root cause details aggregated across incidents
- Basis for creating an audit checklist to proactively look for operational risks
- Example: "Supervisor not in control room to discuss problems" is a manifestation of the "No Supervision" common root cause and the "Ineffective First Line Leadership Role" common failure mode

Relation of Failures to Root Causes to Manifestations
[Figure, built up over four slides: each incident (Incident 1, Incident 2, ...) contributes Failure 1, Failure 2, ... Failure N; the incident failures roll up into Common Failures, then Common Root Causes, then Common Manifestations]
- Incident failures are often in the analysts' own language, so some kind of mapping must occur to determine common failures
- In the research project, the team mapped the incident failures to the ASM Effective Operations Practices Guidelines
- In the research project, Common Root Causes were simply the count and relative frequency of the TapRoot root causes across the incident sample

Relation of Failures to Root Causes to Manifestations
Data at all three levels is needed to:
- Focus improvement on common and systemic problems
- Understand why problems occur and develop improvement programs and corrective actions to address real root causes
The three levels, from general to specific:
- Common Failures: What? What to focus on?
- Common Root Causes: Why?
- Common Manifestations: How? How to address problems?

Relation of Failures to Root Causes to Manifestations

Incident: Texas City
  Incident failure: Shift Supervisor did not ensure procedures were being followed
  Common failure: Effective first line leadership
  Common root cause: No Supervision
  Manifestation: Supervisor did not check procedure progress before leaving site
  Common manifestation: Checking procedure progress for area of responsibility

Incident: Texas City
  Incident failure: It was not clear who was in charge when supervisor was gone
  Common failure: Effective first line leadership
  Common root cause: No Communication
  Manifestation: Supervisor did not communicate with personnel that he was leaving the site
  Common manifestation: Bi-directional communication of status between supervisors and operators
  Common root cause: Accountability needs improvement
  Manifestation: No policy that outlines responsibilities when supervisor leaves the site
  Common manifestation: Unclear policy for supervisor requirements and expectations

Incident: Esso Longford
  Incident failure: No permit was issued or reviewed for the maintenance work
  Common failure: Effective first line leadership
  Common root cause: Standards, Policies, Admin Controls (SPAC) not followed
  Manifestation: Presence of field operator was assumed to remove need for permit
  Common manifestation: Enforcing practices/procedures across the site
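One way to hold this three-level data so it can be aggregated across incidents is a flat record per root cause finding. The sketch below is illustrative only: the `Finding` class, its field names, and `group_by_failure_mode` are hypothetical, and the pairing of each root cause with an incident failure is an assumed reading of the flattened table on this slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One root cause finding for one incident failure (hypothetical schema)."""
    incident: str
    incident_failure: str
    common_failure: str
    common_root_cause: str
    manifestation: str
    common_manifestation: str


LEADERSHIP = "Effective first line leadership"

# Row pairing below is an assumption about how the slide's columns line up.
findings = [
    Finding("Texas City",
            "Shift Supervisor did not ensure procedures were being followed",
            LEADERSHIP, "No Supervision",
            "Supervisor did not check procedure progress before leaving site",
            "Checking procedure progress for area of responsibility"),
    Finding("Texas City",
            "It was not clear who was in charge when supervisor was gone",
            LEADERSHIP, "No Communication",
            "Supervisor did not communicate with personnel that he was leaving the site",
            "Bi-directional communication of status between supervisors and operators"),
    Finding("Texas City",
            "It was not clear who was in charge when supervisor was gone",
            LEADERSHIP, "Accountability needs improvement",
            "No policy that outlines responsibilities when supervisor leaves the site",
            "Unclear policy for supervisor requirements and expectations"),
    Finding("Esso Longford",
            "No permit was issued or reviewed for the maintenance work",
            LEADERSHIP, "SPAC not followed",
            "Presence of field operator was assumed to remove need for permit",
            "Enforcing practices/procedures across the site"),
]


def group_by_failure_mode(items):
    """Nest common manifestations under root cause under common failure."""
    grouped = defaultdict(lambda: defaultdict(list))
    for f in items:
        grouped[f.common_failure][f.common_root_cause].append(f.common_manifestation)
    return grouped


grouped = group_by_failure_mode(findings)
for failure, causes in grouped.items():
    print(failure)
    for cause, manifestations in causes.items():
        print(f"  {cause}: {manifestations}")
```

Keeping one record per root cause, rather than one per incident failure, lets a single failure carry several root causes, which matches the definition that a failure mode may have more than one.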

ASM Effective Practice Work Process
Inputs: Site Incident Reports, Site Practice Standards
1. Review Incident Reports: list of operational failures and root causes
2. Identify Common Failures: cluster list of failures per practice standards; list of top practice failure modes (covers at least 50%)
3. Identify Common Root Causes: list top root causes for each failure mode
4. Identify Common Manifestations: cluster manifestations associated by root cause
5. Analyze Gaps in Systems: consolidate list to highlight common elements; list of weaknesses in management systems and practice standards
Site Continuous Improvement Program
6. Define Practice Improvements: list of prioritized solutions (cost, impact, etc.)
7. Implement Practice Changes: generate improvement action plan to make changes per priority and resource constraints
8. Monitor Impact of Changes: use leading/lagging metrics to track
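The "covers at least 50%" cut-off in the Identify Common Failures step can be sketched as a simple cumulative selection. This is an illustration under assumptions, not the consortium's procedure: the function name `top_modes_covering` is hypothetical, and the counts are the top 10 failure mode counts and the total of 539 reported earlier in this presentation.

```python
def top_modes_covering(counts, total, threshold=0.5):
    """Pick failure modes in descending count order until they cover
    at least `threshold` of all `total` failures.

    Hypothetical helper: `counts` holds only the named modes, while
    `total` also includes failures outside the named modes.
    """
    selected, running = [], 0
    for mode, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if running / total >= threshold:
            break
        selected.append(mode)
        running += n
    return selected, running / total


named_counts = {
    "Hazard analysis/communication": 79,
    "First-line leadership": 65,
    "Continuous improvement": 60,
    "Safety culture": 36,
    "Initial and refresher training": 30,
    "Task communications": 29,
    "Comprehensive MOC": 28,
    "Cross functional communication": 23,
    "Compliance with procedures": 15,
    "Design guidelines and standards": 14,
}

selected, coverage = top_modes_covering(named_counts, total=539)
print(selected)
print(f"coverage: {coverage:.1%}")
```

With these counts, the first five modes are enough to pass the 50% mark, so a site applying the same rule to its own data would carry only that shorter list into the root cause step.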

Impact of Typical Approach
- Typical programs look at individual incidents for root causes
- Action plans are developed to address root causes
- Operations practice failure modes are NOT explicitly identified
- Manifestations of root causes are NOT captured to help identify gaps in management systems and operations practices
- Continuous improvement programs lack input from incident-based gap analysis
[Figure: the work process diagram (Site Incident Reports, Site Practice Standards, Site Continuous Improvement Program; Review Incident Reports, Identify Common Failures, Identify Common Root Causes, Identify Common Manifestations, Analyze Gaps in Systems, Define Practice Improvements, Implement Practice Changes, Monitor Impact of Changes)]

Value of Failure Mode Information: Common Failures vs. Root Causes

ASM Approach: Failure Modes                          #     %
Hazard analysis/communication                       79   15%
First line leadership                               65   12%
Continuous improvement                              60   11%
Safety culture                                      36    7%
Initial and refresher training                      30    6%
Task communications                                 29    5%
Comprehensive MOC                                   28    5%
Cross functional communication                      23    4%
Compliance with procedures                          15    3%
Design guidelines and standards                     14    3%
Other failure modes                                160   30%
TOTAL                                              539

Typical Approach: Root Causes                        #     %
No communication                                    71    8%
Crew Teamwork Needs Improvement                     58    7%
Hazard Analysis Needs Improvement                   46    5%
Management of Change (MOC) Needs Improvement        40    5%
Displays Need Improvement                           35    4%
No supervision                                      34    4%
Corrective Action Needs Improvement                 33    5%
No Standards, Policy or Administrative
  Controls (SPAC)                                   32    4%
SPAC confusing or incomplete                        32    4%
SPAC not followed                                   29    3%
Others                                             160   51%
TOTAL                                              432

In our analysis, Common Failure Modes correspond to specific ASM Effective Operations Practices. Moreover, failure modes need to map to a site's operations practice standards, policy and guidelines.

Improvement Opportunities for First Line Leadership
- Improvement opportunities are identified by extracting the root cause profiles for each common failure mode
- Profiles show the distribution of common root causes (i.e., why the failure occurred) across incidents

Profile                                     #      %
No supervision                             14    18%
Crew teamwork needs improvement            11    14%
SPAC [1] not followed                       8    10%
MOC needs improvement                       6     8%
Pre-job briefing needs improvement          5     6%
Other                                      36    45%
Total                                      80   100%

[Figure: pie chart of the same profile]
[1] SPAC: standards, policies, administrative controls (standardized work processes, rules, procedures)
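A root cause profile of this kind is a frequency count of root causes restricted to one failure mode. The sketch below assumes a hypothetical record layout of (failure mode, root cause) pairs and a hypothetical helper `root_cause_profile`; the records are synthesized from the counts on this slide purely to exercise the function, and are not the study's raw data.

```python
from collections import Counter


def root_cause_profile(records, failure_mode, other_label="Other"):
    """Return (root cause, count, share) rows for one failure mode.

    `records` is an iterable of (failure_mode, root_cause) pairs, an
    assumed layout. Rows are sorted by descending count, with the
    catch-all `other_label` bucket always placed last.
    """
    counts = Counter(cause for mode, cause in records if mode == failure_mode)
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: (kv[0] == other_label, -kv[1]))
    return [(cause, n, n / total) for cause, n in ordered]


# Counts taken from the slide; expanded into one record per finding.
slide_counts = {
    "No supervision": 14,
    "Crew teamwork needs improvement": 11,
    "SPAC not followed": 8,
    "MOC needs improvement": 6,
    "Pre-job briefing needs improvement": 5,
    "Other": 36,
}
records = [
    ("First line leadership", cause)
    for cause, n in slide_counts.items()
    for _ in range(n)
]

profile = root_cause_profile(records, "First line leadership")
for cause, n, share in profile:
    print(f"{cause:<38}{n:>4}{share:>7.1%}")
```

Sorting the catch-all bucket last keeps the named root causes at the top of the profile even when "Other" is the largest single group, as it is here.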

Improvement Opportunity: First Line Leadership
- Identify the root cause manifestations for each profile
- Specific reasons the failures occurred across incidents
- Manifestations are indicators of failures
- Potential candidates for leading indicators of incidents
Root causes (from profile): No supervision; Crew teamwork needs improvement

Manifestation                                                               Rating
Checking procedure progress for area of responsibility                           9
Being at job site and maintaining situation awareness                            9
Identifying and addressing risk to personnel                                     3
Monitoring high risk activities for problems/issues                              9
Enforcing violations of practices/procedures (esp. related to safety)            3
Ensuring team members (e.g., ops, maint) stay coordinated                        1
Not correcting/communicating known problems                                      3
Team members not questioning when evidence of problems                           1
Team not focusing on critical activities/indicators (tunnel vision)              3
Supervisor not keeping track of big picture, losing sight of hazards             3
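Rated manifestations like these could be filtered into a short list of leading indicator candidates. The sketch below rests on two assumptions that the slide does not state: that the ten ratings pair with the ten manifestations in the order listed, and that a higher rating means a stronger candidate. The function name `leading_indicator_candidates` and the threshold of 9 are hypothetical.

```python
# (manifestation, rating) pairs; the pairing by position is an assumption.
rated_manifestations = [
    ("Checking procedure progress for area of responsibility", 9),
    ("Being at job site and maintaining situation awareness", 9),
    ("Identifying and addressing risk to personnel", 3),
    ("Monitoring high risk activities for problems/issues", 9),
    ("Enforcing violations of practices/procedures", 3),
    ("Ensuring team members stay coordinated", 1),
    ("Not correcting/communicating known problems", 3),
    ("Team members not questioning when evidence of problems", 1),
    ("Team not focusing on critical activities/indicators", 3),
    ("Supervisor not keeping track of big picture", 3),
]


def leading_indicator_candidates(rated, min_rating=9):
    """Keep manifestations rated at or above `min_rating`, highest first.

    Assumes a higher rating marks a stronger candidate; sorted() is
    stable, so equally rated items keep their original order.
    """
    kept = [(text, rating) for text, rating in rated if rating >= min_rating]
    return sorted(kept, key=lambda item: item[1], reverse=True)


checklist = leading_indicator_candidates(rated_manifestations)
for text, rating in checklist:
    print(f"[{rating}] {text}")
```

Lowering `min_rating` to 3 would widen the checklist to the mid-rated items while still dropping the lowest tier.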

Conclusions
- If analysis is limited to individual incident analysis, the tendency is to address root causes specific to the incident
- A single-incident focus may miss the larger management system contributions to safety risk
- Hence, the improvement may not have the intended positive impact

Conclusions
- If instead the analysis is based on a sample of incidents but uses only common failures or root causes, analysts will have to make assumptions about how to address high-level root causes such as "No supervision"
- Manifestations ground improvement opportunities in the incident data, increasing the likelihood of understanding the operations practice or management system vulnerabilities

Discussion/Questions
Thank you! Questions and/or comments?

Abstract
The Abnormal Situation Management Consortium funded a study to investigate common failure modes and root causes associated with operations practices. The study team analyzed 20 public and 12 private incident reports using the TapRoot methodology to identify root causes. These root causes were mapped to operations practice failures. This presentation describes the top ten operations failure modes identified in the analysis. Specific recommendations include how to analyze plant incident reports to better understand the sources of systemic failures and improve plant operating practices.
ASM and Abnormal Situation Management are registered trademarks of Honeywell International, Inc.