Supporting the recruitment and retention of pre-registration nursing and midwifery students in Scotland through linked NES and HESA data

Laura Kate Campbell, ISD Scotland, l.campbell5@nhs.net
Colin Tilley, NHS Education for Scotland, c.tilley@nhs.net
Claire Tochel, NHS Education for Scotland, c.tochel@nhs.net

June 14, 2012

Contents
1 Introduction
2 Data
2.1 NHS Education for Scotland data
2.2 Higher Education Statistics Agency data
2.3 Linked NES and HESA data
2.4 Summary
3 Results
3.1 Descriptive statistics
3.1.1 Domicile
3.1.2 Category of dependants
3.1.3 Postcode
3.2 Spatial analysis
3.3 Five-year completion rates for 36-month pre-registration courses
3.4 Summary
4 Summary
5 Recommendations

List of Figures
1 The distribution of observations by SIMD
2 The distribution of observations by SG 6-fold urban rural index
3 Data zones of pre-registration nursing and midwifery students

List of Tables
1 Sample NES data
2 Sample HESA data
3 NES and HESA match
4 NES and HESA match by cohort
5 NES and HESA match by HEI
6 Sample of matched NES and HESA data
7 Sample NES and HESA data with the first observation
8 Domicile of nursing and midwifery students by cohort
9 Number of dependants by cohort
10 Match between linked NES-HESA data and the SNS postcode directory
11 Scottish Government 6-fold Urban Rural Classification
12 Classification of observations

1 Introduction

The purpose of this report is to assess the extent to which information from the Higher Education Statistics Agency (HESA) can provide additional support to the recruitment and retention of pre-registration nurses. The rationale for using HESA data arose from previous work undertaken by the Data Enhancement Sub Group, which used NHS Education for Scotland (NES) data to identify several determinants of five-year completion rates from 36-month pre-registration courses in Scotland, but was not able to predict with any accuracy which students were at risk of not completing their course.

The remainder of the report examines the extent to which HESA data, in combination with NES data, can support the recruitment and retention of pre-registration nurses in Scotland. Section 2 describes the NES (section 2.1), HESA (section 2.2) and linked NES-HESA data (section 2.3). Section 3 uses the linked data to describe the variables that are only available in HESA data (section 3.1), provide a spatial analysis of pre-registration nursing and midwifery training (section 3.2) and examine whether variables in HESA add explanatory power to the analysis of five-year completion rates (section 3.3). Section 4 summarises the findings and section 5 sets out recommendations.

2 Data

2.1 NHS Education for Scotland data

Until 2002, the Records Team of the National Board for Nursing, Midwifery and Health Visiting for Scotland (NBS) maintained a database of the details of all people undertaking educational programmes in Scotland leading to either registration or recording with the United Kingdom Central Council (UKCC). The NBS also had statutory responsibility for verifying documentary evidence on students. In April 2002, the NBS became part of NHS Education for Scotland, the UKCC became the Nursing and Midwifery Council (NMC) and the regulatory element of the Records Team was devolved to the Higher Education Institutions (HEIs).
HEIs submit data to the NES Records Team at the following events in the student experience: commencement of a programme; discontinuation from a programme (after which a student may resume or restart); and completion of a programme. These events are used to identify the state of each student at any point in time. A student is either active (started a programme but has neither completed nor discontinued), inactive (started a programme but has discontinued) or eligible (started and completed a programme). For the purposes of this report, the data were restricted to 36-month pre-registration programmes.

The NES data consist of one observation per student and cohort, with the following variables: PIN, date of birth, sex, cohort, educational qualifications on entry, course, HEI and whether the student was eligible for registration with the NMC when the data were extracted. Table 1 illustrates the structure of NES data using artificial information on three students, each identified by their PIN, which is a unique identifier. Each observation corresponds to a unique combination of student and cohort.

Table 1: Sample NES data
PIN   DoB         Forename   Surname   HEI   Cohort
1S    01-Jan-80   Jane       Smith     A     2002
2S    01-Jan-80   Jane       Jones     A     2002
3S    01-Jan-80   Jane       Johnson   A     2002

2.2 Higher Education Statistics Agency data

The Higher Education Statistics Agency is a national body set up in 1993 by agreements between government, higher education funding councils and HEIs across the UK. The aim was to establish an integrated system for the collection, analysis and dissemination of annually updated information about HEI activity, including detailed staff and student records and property information. The standardised electronic data collection system gathers individual-level, quantitative data from HEIs in a reporting cycle which runs annually from August to July. The data are provided by HEIs to HESA retrospectively and therefore constitute an end-of-year record of student information. HESA provides routine public access to anonymised aggregate data on students by subject area or geographical region and, in addition, works with stakeholders to offer bespoke analyses and further data access by negotiation. Annual student records contain extensive information about individuals on a particular course at a particular HEI, including a unique identifier (HESA ID).
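The NES event model from section 2.1 (commencement, discontinuation and completion events determining whether a student is active, inactive or eligible) can be illustrated with a short sketch. It assumes, for illustration only, that each submission is a (date, event) pair with the hypothetical labels "commence", "discontinue" and "complete", and that a student who resumes or restarts is submitted as a fresh "commence" event. Neither the labels nor that rule come from the NES records specification.

```python
from datetime import date
from enum import Enum
from typing import Iterable, Optional, Tuple

# Hypothetical event labels: the actual NES submission codes are not given in this report.
COMMENCE, DISCONTINUE, COMPLETE = "commence", "discontinue", "complete"


class State(Enum):
    ACTIVE = "active"      # started a programme, neither completed nor discontinued
    INACTIVE = "inactive"  # started a programme but has discontinued
    ELIGIBLE = "eligible"  # started and completed a programme


def student_state(events: Iterable[Tuple[date, str]], as_of: date) -> Optional[State]:
    """Return the student's state on `as_of`, or None before any commencement.

    Assumption: a resumption or restart arrives as a new COMMENCE event,
    which returns a discontinued student to the active state.
    """
    state: Optional[State] = None
    for when, event in sorted(events):
        if when > as_of:
            break  # ignore events after the reference date
        if event == COMMENCE:
            state = State.ACTIVE
        elif event == DISCONTINUE:
            state = State.INACTIVE
        elif event == COMPLETE:
            state = State.ELIGIBLE
        else:
            raise ValueError(f"unknown event: {event!r}")
    return state


# Artificial history: start, discontinue, resume, complete.
history = [
    (date(2002, 9, 1), COMMENCE),
    (date(2003, 1, 10), DISCONTINUE),
    (date(2003, 9, 1), COMMENCE),
    (date(2006, 6, 30), COMPLETE),
]
for day in (date(2002, 12, 1), date(2003, 6, 1), date(2007, 1, 1)):
    print(day, student_state(history, day).value)
```

For the artificial history above, the three reference dates print active, inactive and eligible respectively.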
Associated with that HESA ID are many other variables:
- Demographics: name, birth date, gender, nationality, country of domicile, postcode on entry, educational qualifications on entry, whether they had dependants on entry and term-time postcode;
- Course information: course title and subject, start and end dates, institution and campus details, and expected course duration; and
- Study information: year of study, which increments by one each year from when a student began the course; year of programme, which increments by one as a student progresses through each year of the course; reason for leaving; and status.

While some data are acquired at the point of application to a course and are fixed, other fields may change during the student's training. Initial investigation of these data indicated that the timeliness and accuracy of changes may not be consistent between institutions and years. ISD acquired HESA data from the Scottish Government's Analytical Services Unit Lifelong Learning team (ASU-LL) via a data sharing agreement. This agreement gave ISD 11 years of data on nursing and midwifery students, covering academic years 1999/2000 to 2009/2010. The agreement allows HESA data to be linked with NES, the Scottish Workforce Information Standard System (SWISS) and Scottish Neighbourhood Statistics (SNS) data. In addition, there is a separate data processing arrangement between NES and ISD, which allows NES and SWISS data to be linked.

By contrast with NES data, HESA data consist of one observation for each year that a student is recorded in training. Consider Table 2, which contains artificial data for three students. The student with HESA ID 1 is observed in their first, second and third years. The student with HESA ID 2 is recorded in four years, in two of which they are recorded as being in year 1 of their programme. Their progression to the second year of their programme may have been delayed due to a temporary discontinuation or because they repeated some or all of their first year. The student with HESA ID 3 was also recorded in four consecutive years, but the final record may indicate either that the student's completion date was recorded after the cut-off point for that year's data submission or that they repeated year three due to temporary drop-out or academic failure. The precise progression of students within HESA can only be verified by combining several pieces of information on the student: year of study, year of programme and reasons for progression.

Table 2: Sample HESA data
HESA ID  DoB        Forename  Surname  HEI  Cohort  Student year  Prog. year
1        01-Jan-80  Jane      Smith    A    2002    1             1
1        01-Jan-80  Jane      Smith    A    2002    2             2
1        01-Jan-80  Jane      Smith    A    2002    3             3
2        01-Jan-80  Jane      Jones    A    2002    1             1
2        01-Jan-80  Jane      Jones    A    2002    2             1
2        01-Jan-80  Jane      Jones    A    2002    3             2
2        01-Jan-80  Jane      Jones    A    2002    4             3
3        01-Jan-80  Jane      Johnson  A    2002    1             1
3        01-Jan-80  Jane      Johnson  A    2002    2             2
3        01-Jan-80  Jane      Johnson  A    2002    3             3
3        01-Jan-80  Jane      Johnson  A    2002    4             3

While there are several variables that could potentially enhance NES data, this report focuses on a student's domicile, whether they had dependants, their academic progression and their postcode.
A student's domicile on entry is of interest because it may indicate the likelihood of retention within Scotland and also because the Nursing and Midwifery Student Bursary is only available to students domiciled in the European Economic Area. The number of dependants for each student is important because previous research has reported the importance of child care issues in the retention of nurses and midwives. A student's year of study and year of programme provide information on academic progression that is not directly available in NES data. The postcode of a student on entry allows their location to be mapped and also allows additional data on the remoteness, rurality and deprivation of locations to be added.
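The progression patterns discussed for Table 2 can be screened mechanically. The sketch below assumes each student's HESA records are available as (student year, programme year) pairs and flags any record in which the programme year fails to advance. As section 2.2 notes, such a flag only marks a record for checking against other information such as reasons for progression; it is not a confirmed repeat or delay. The function name is hypothetical.

```python
from typing import List, Tuple


def stalled_years(records: List[Tuple[int, int]]) -> List[int]:
    """Return the student years in which the programme year did not advance.

    `records` holds (student_year, programme_year) pairs for one HESA ID.
    A returned year is a candidate repeat, temporary discontinuation or
    late-recorded completion, to be verified with other HESA fields.
    """
    ordered = sorted(records)  # order by student year
    flagged = []
    for (_, prev_prog), (year, prog) in zip(ordered, ordered[1:]):
        if prog <= prev_prog:
            flagged.append(year)
    return flagged


# Artificial records from Table 2, keyed by HESA ID.
table_2 = {
    1: [(1, 1), (2, 2), (3, 3)],
    2: [(1, 1), (2, 1), (3, 2), (4, 3)],
    3: [(1, 1), (2, 2), (3, 3), (4, 3)],
}
for hesa_id, records in table_2.items():
    print(hesa_id, stalled_years(records))
```

The sketch flags nothing for HESA ID 1, student year 2 for HESA ID 2 and student year 4 for HESA ID 3, which are the two cases described in the text.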

2.3 Linked NES and HESA data

The variables used to match NES and HESA data were the Soundex codes of the student's surname and forename, their date of birth, HEI and cohort.¹ Each combination of these matching variables identified a single observation in the NES data but, because HESA data consist of one observation for each year that a student is recorded in training, several observations in the HESA data. The matching process was therefore a one-to-many match from NES to HESA data.

A first pass at matching the data revealed a large discrepancy in the accuracy of the match before and after 2001. This was for two reasons. First, HESA data are missing information from Bell College before 2001. Second, about 50% of surnames and forenames were missing in the HESA data for 2000 and 2001. For these reasons, the match was restricted to the 2002 to 2009 cohorts.

There were 23,369 observations in the NES data between 2002 and 2009. Table 3 shows that only 867 of these, or 3.7%, remained unmatched to HESA data. The 22,502 (23,369 - 867) NES observations that were matched to HESA gave rise to 59,224 matched observations, because each NES observation matched one or more HESA observations.

Table 3: NES and HESA match
Source          Observations
NES only        867
HESA only       64,679
NES and HESA    59,224

Table 4 shows that the percentage of unmatched NES observations is relatively constant between cohorts.

Table 4: NES and HESA match by cohort
Cohort  NES observations  Unmatched  Unmatched (%)
2002    3,016             101        3.35
2003    3,027             96         3.17
2004    3,049             94         3.08
2005    3,001             118        3.93
2006    2,813             95         3.38
2007    2,770             119        4.30
2008    2,745             120        4.37
2009    2,948             124        4.21
Total   23,369            867        3.71

Table 5 shows that the percentage of unmatched NES observations varies between HEIs. In particular, the percentage of unmatched observations from Bell College (6.99%) was three and a half times that from Dundee (2.00%).
¹ Soundex is a phonetic algorithm for indexing names by sound, which increases the likelihood of a match by eliminating common spelling variations.
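The matching step can be sketched as follows. The `soundex` function implements the standard American Soundex rules (keep the first letter, code the remaining consonants 1 to 6, collapse adjacent duplicate codes, ignore H and W, pad to four characters); the report does not say which Soundex variant or software was used, so this is an assumption. The record layout (dictionaries with `pin`, `surname`, `forename`, `dob`, `hei` and `cohort` keys) and the function names are hypothetical. Because each key identifies a single NES observation but several HESA observations, the join is one-to-many.

```python
from typing import Dict, List, Tuple

# American Soundex digits; vowels, Y, H and W have no digit.
_CODES: Dict[str, str] = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name: str) -> str:
    """Four-character American Soundex code; 'Smith' and 'Smyth' both give 'S530'."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters that share a digit
        digit = _CODES.get(letter, "")  # a vowel or Y resets `prev` to ""
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit
    return code.ljust(4, "0")


MatchKey = Tuple[str, str, str, str, int]


def match_key(rec: dict) -> MatchKey:
    """Soundex of surname and forename, plus date of birth, HEI and cohort."""
    return (soundex(rec["surname"]), soundex(rec["forename"]),
            rec["dob"], rec["hei"], rec["cohort"])


def link(nes: List[dict], hesa: List[dict]) -> Tuple[List[dict], int, int]:
    """One-to-many link: copy the NES PIN onto every HESA row sharing its key.

    Returns (matched rows, NES-only count, HESA-only count). Assumes each key
    identifies a single NES observation, as the report found for its cohorts.
    """
    pin_by_key = {match_key(r): r["pin"] for r in nes}
    matched: List[dict] = []
    matched_pins = set()
    hesa_only = 0
    for row in hesa:
        pin = pin_by_key.get(match_key(row))
        if pin is None:
            hesa_only += 1
        else:
            matched.append({**row, "pin": pin})
            matched_pins.add(pin)
    nes_only = sum(1 for r in nes if r["pin"] not in matched_pins)
    return matched, nes_only, hesa_only


# Artificial example: HESA spells the names differently from NES.
nes = [{"pin": "1S", "forename": "Jane", "surname": "Smith",
        "dob": "1980-01-01", "hei": "A", "cohort": 2002}]
hesa = [{"hesa_id": 1, "forename": "Jayne", "surname": "Smyth",
         "dob": "1980-01-01", "hei": "A", "cohort": 2002, "student_year": y}
        for y in (1, 2, 3)]
matched, nes_only, hesa_only = link(nes, hesa)
print(len(matched), nes_only, hesa_only)  # 3 0 0
```

Here "Jayne Smyth" and "Jane Smith" share the codes J500 and S530, so all three HESA rows pick up PIN 1S. A probabilistic approach, as suggested in the recommendations, would replace the exact key comparison with weighted agreement scores.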

Table 5: NES and HESA match by HEI
HEI       NES observations  Unmatched  Unmatched (%)
Bell      1,930             135        6.99
Dundee    3,592             72         2.00
GCU       3,856             179        4.64
Napier    3,938             145        3.68
RGU       2,760             65         2.36
Stirling  2,763             90         3.26
UWS       4,530             181        4.00
Total     23,369            867        3.71

Table 6 illustrates the matched data using the artificial data in Tables 1 and 2. It also indicates the first benefit of linking these data: NES data identify the observations in HESA data that correspond to 36-month pre-registration training.

Table 6: Sample of matched NES and HESA data
PIN  HESA ID  DoB        Forename  Surname  HEI  Cohort  Student year  Prog. year
1S   1        01-Jan-80  Jane      Smith    A    2002    1             1
1S   1        01-Jan-80  Jane      Smith    A    2002    2             2
1S   1        01-Jan-80  Jane      Smith    A    2002    3             3
2S   2        01-Jan-80  Jane      Jones    A    2002    1             1
2S   2        01-Jan-80  Jane      Jones    A    2002    2             1
2S   2        01-Jan-80  Jane      Jones    A    2002    3             2
2S   2        01-Jan-80  Jane      Jones    A    2002    4             3
3S   3        01-Jan-80  Jane      Johnson  A    2002    1             1
3S   3        01-Jan-80  Jane      Johnson  A    2002    2             2
3S   3        01-Jan-80  Jane      Johnson  A    2002    3             3
3S   3        01-Jan-80  Jane      Johnson  A    2002    4             3

2.4 Summary

This section showed that:
- a series of information governance arrangements, produced as a direct result of the recruitment and retention work, allows NES and ISD to work with linked individual-level data from NES, HESA and SWISS;
- it is feasible to link NES and HESA data;
- the accuracy of the link is a function of the availability of common variables between the two data sets;
- HESA data add several pieces of information that are not otherwise available, such as a student's domicile, whether they had dependants, their postcode on entry and their academic progress;

- the link between NES and HESA was able to identify pre-registration courses in HESA that would otherwise have been difficult to identify;
- more than 96% of NES observations were matched to at least one HESA observation; and
- while the percentage of matched records was relatively constant between cohorts, there was some variation between HEIs.

3 Results

3.1 Descriptive statistics

This section examines the variables that linking NES and HESA data adds: a student's domicile, whether they had dependants and their postcode. The fourth such variable, academic progression, is not examined in this report because of the more complex longitudinal structure of the linked data, but it will be the subject of further analysis. Attention was therefore restricted to the first HESA observation that was matched to NES data, as highlighted in Table 7.

Table 7: Sample NES and HESA data with the first observation
PIN  HESA ID  DoB        Forename  Surname  HEI  Cohort  Student year  Prog. year  First obs.
1S   1        01-Jan-80  Jane      Smith    A    2002    1             1           yes
1S   1        01-Jan-80  Jane      Smith    A    2002    2             2
1S   1        01-Jan-80  Jane      Smith    A    2002    3             3
2S   2        01-Jan-80  Jane      Jones    A    2002    1             1           yes
2S   2        01-Jan-80  Jane      Jones    A    2002    2             1
2S   2        01-Jan-80  Jane      Jones    A    2002    3             2
2S   2        01-Jan-80  Jane      Jones    A    2002    4             3
3S   3        01-Jan-80  Jane      Johnson  A    2002    1             1           yes
3S   3        01-Jan-80  Jane      Johnson  A    2002    2             2
3S   3        01-Jan-80  Jane      Johnson  A    2002    3             3
3S   3        01-Jan-80  Jane      Johnson  A    2002    4             3

3.1.1 Domicile

Table 8 reports the domicile of students on entry to pre-registration nursing and midwifery courses by cohort and shows that the vast majority of students (21,825 of 22,502, or 97%) were domiciled in the UK. Unfortunately, these data did not distinguish between jurisdictions within the UK. This lack of variation between domicile categories means that a student's domicile is unlikely to be useful in identifying students who do not complete. However, these data may be useful to examine the retention of students within NHSScotland.
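The restriction to the first matched HESA observation for each PIN, followed by a cohort-by-domicile count of the kind reported in Table 8, can be sketched as below. It assumes the linked rows carry `pin`, `student_year`, `cohort` and `domicile` fields; these field names and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def first_observations(linked: List[dict]) -> List[dict]:
    """Keep, for each PIN, the linked HESA row with the lowest student year."""
    first: Dict[str, dict] = {}
    for row in linked:
        current = first.get(row["pin"])
        if current is None or row["student_year"] < current["student_year"]:
            first[row["pin"]] = row
    return list(first.values())


def domicile_by_cohort(linked: List[dict]) -> Counter:
    """Count first observations by (cohort, domicile); a blank domicile counts as 'Missing'."""
    return Counter(
        (row["cohort"], row.get("domicile") or "Missing")
        for row in first_observations(linked)
    )


# Hypothetical linked rows shaped like Table 7; the domicile values are invented.
linked = (
    [{"pin": "1S", "student_year": y, "cohort": 2002, "domicile": "UK"} for y in (1, 2, 3)]
    + [{"pin": "2S", "student_year": y, "cohort": 2002, "domicile": "EEA"} for y in (1, 2, 3, 4)]
    + [{"pin": "3S", "student_year": y, "cohort": 2002, "domicile": None} for y in (1, 2, 3, 4)]
)
for (cohort, domicile), n in sorted(domicile_by_cohort(linked).items()):
    print(cohort, domicile, n)
```

Selecting by the lowest student year, rather than by row order, keeps the result correct even if the linked rows arrive unsorted.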
Table 8: Domicile of nursing and midwifery students by cohort
Cohort  UK      EEA  Rest of World  Missing  Total
2002    2,793   9    51             62       2,915
2003    2,828   6    53             44       2,931
2004    2,891   4    6              54       2,955
2005    2,767   1    12             103      2,883
2006    2,592   3    24             99       2,718
2007    2,582   12   5              52       2,651
2008    2,587   9    6              23       2,625
2009    2,785   26   12             1        2,824
Total   21,825  70   169            438      22,502

3.1.2 Category of dependants

Table 9 categorises dependants by cohort and shows that this information was only reported routinely from 2007 onwards. Table 9 does, however, show that in the 2007, 2008 and 2009 cohorts a substantial minority of students had dependants: about 15% of the 2007 cohort (408 of 2,651), 21% of the 2008 cohort (553 of 2,625) and 16% of the 2009 cohort (464 of 2,824).

Table 9: Number of dependants by cohort
Cohort  Missing  Dependants  No dependants  Total
2002    2,915    0           0              2,915
2003    2,930    1           0              2,931
2004    2,944    4           7              2,955
2005    2,827    18          38             2,883
2006    2,691    12          15             2,718
2007    1,318    408         925            2,651
2008    542      553         1,530          2,625
2009    228      464         2,132          2,824
Total   16,395   1,460       4,647          22,502

3.1.3 Postcode

HESA data contain information on each student's postcode on entry. These postcodes alone convey little additional information. However, a wealth of further data can be linked once a student's postcode is known. For the purposes of this report, the postcodes of students were matched to Scottish Neighbourhood Statistics data, which contain the Scottish Index of Multiple Deprivation and the Scottish Government's six-fold urban rural classification. Table 10 shows the extent of the match between the linked NES-HESA data and the SNS postcode directory. Of the 22,502 NES-HESA observations, postcode information was matched to 21,436 (95.26%). Of the 1,066 unmatched observations, 405 had missing values, 110 had a missing-value code (99999999) and 551 had either invalid or non-Scottish postcodes. For the purposes of this report, these unmatched observations were omitted from further analysis.

The Scottish Index of Multiple Deprivation (SIMD) 2009 identifies small-area concentrations of multiple deprivation in Scotland. Each of the 6,505 data zones in Scotland is ranked according to several indicators of deprivation. These SIMD ranks can be grouped into 20 groups of equal frequency, or vigintiles.
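The postcode step can be sketched as a lookup that reproduces the outcome categories reported above (missing, missing-value code 99999999, invalid or non-Scottish, matched) and converts a matched data zone's SIMD rank into a vigintile. The lookup layout and the rank-to-vigintile rule (rank 1 taken as the most deprived data zone, vigintile equal to the ceiling of 20 × rank / 6,505) are assumptions for illustration; published SIMD lookups, where they provide vigintiles directly, would take precedence. All names and the sample postcode are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

N_DATA_ZONES = 6505        # data zones ranked in SIMD 2009
MISSING_CODE = "99999999"  # missing-value code found in the HESA postcode field

# Hypothetical lookup layout: postcode -> (data zone, SIMD rank, SG urban rural class).
Directory = Dict[str, Tuple[str, int, int]]


def simd_vigintile(rank: int, n: int = N_DATA_ZONES) -> int:
    """Equal-frequency group 1..20 for a SIMD rank, assuming rank 1 = most deprived."""
    if not 1 <= rank <= n:
        raise ValueError(f"rank {rank} outside 1..{n}")
    return (20 * rank + n - 1) // n  # integer ceiling of 20 * rank / n


def normalise(postcode: Optional[str]) -> str:
    """Upper-case and drop spaces so 'xx1 1xx' and 'XX11XX' compare equal."""
    return "".join((postcode or "").split()).upper()


def classify_postcode(
    postcode: Optional[str], directory: Directory
) -> Tuple[str, Optional[Tuple[str, int, int]]]:
    """Return (outcome, (data zone, SIMD vigintile, urban rural class) or None)."""
    pc = normalise(postcode)
    if not pc:
        return "missing", None
    if pc == MISSING_CODE:
        return "missing code", None
    entry = directory.get(pc)
    if entry is None:
        return "invalid or non-Scottish", None
    data_zone, rank, ur_class = entry
    return "matched", (data_zone, simd_vigintile(rank), ur_class)


# Under this rule every vigintile holds 325 or 326 of the 6,505 data zones.
sizes = Counter(simd_vigintile(r) for r in range(1, N_DATA_ZONES + 1))
print(min(sizes.values()), max(sizes.values()), len(sizes))

directory: Directory = {"XX11XX": ("DZ-EXAMPLE", 326, 3)}  # invented entry
for raw in (None, "99999999", "YY9 9YY", "xx1 1xx"):
    print(repr(raw), classify_postcode(raw, directory))
```

Because 6,505 is not divisible by 20, five vigintiles under this rule contain 326 data zones and the rest 325, which is as close to equal frequency as whole data zones allow.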
The most deprived five percent of data zones are in the first vigintile. Figure 1 shows that students are approximately uniformly distributed by SIMD vigintile. This suggests that pre-registration nursing and midwifery students were about equally likely to be drawn from across the SIMD distribution.

Table 10: Match between linked NES-HESA data and the SNS postcode directory
Source              Observations
NES-HESA only       1,066
SNS only            173,554
NES-HESA and SNS    21,436
Total               196,056

Figure 1: The distribution of observations by SIMD (percent of observations by SIMD 2009 vigintile, 1 to 20)

The Scottish Government Urban Rural Classification provides a standard definition of urban and rural areas in Scotland, as described in Table 11. Figure 2 shows that the majority of pre-registration nursing and midwifery students apply from either Large Urban Areas or Other Urban Areas. The distribution of pre-registration nursing and midwifery students in Figure 2 is about the same as the population distribution reported in Table 11.

3.2 Spatial analysis

Figure 3 illustrates the data zones from which the nursing and midwifery students in the linked NES-HESA data applied. Similar maps could be produced for each HEI to show the location of their students on entry.

Table 11: Scottish Government 6-fold Urban Rural Classification
Category                   Population (%)  Description
1 Large Urban Areas        38.9            Settlements of over 125,000 people
2 Other Urban Areas        30.6            Settlements of 10,000 to 125,000 people
3 Accessible Small Towns   8.5             Settlements of between 3,000 and 10,000 people and within 30 minutes' drive of a settlement of 10,000 or more
4 Remote Small Towns       3.8             Settlements of between 3,000 and 10,000 people and with a drive time of over 30 minutes to a settlement of 10,000 or more
5 Accessible Rural         11.6            Areas with a population of less than 3,000 people, and within a 30 minute drive time of a settlement of 10,000 or more
6 Remote Rural             6.5             Areas with a population of less than 3,000 people, and with a drive time of over 30 minutes to a settlement of 10,000 or more

Figure 2: The distribution of observations by SG 6-fold urban rural index (percent of observations by SG urban rural six-fold category, 1 to 6)

Figure 3: Data zones of pre-registration nursing and midwifery students

3.3 Five-year completion rates for 36-month pre-registration courses

Previous work using NES data has shown that several factors are significant determinants of five-year completion rates, but also that in combination these factors are unable to accurately predict which students do not complete. Table 12 reports the extent to which adding SIMD vigintiles and the SG's urban rural index improves the classification of students who complete within five years, using three related measures of the predictive ability of the variables from NES and linked NES-HESA data. Sensitivity is defined as the probability that a student is predicted to complete given that they do complete. Specificity is defined as the probability that a student is predicted not to complete given that they do not complete. The percentage correctly classified is the percentage of all students whose predicted outcome matches their actual outcome. Table 12 shows that none of the sensitivity, the specificity or the percentage of observations correctly classified changes very much as a result of adding the SIMD vigintiles and the SG's urban rural index. In both cases specificity is below 9%: fewer than one in ten of the students who do not complete are correctly identified.
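The three measures in Table 12 can all be computed from a two-by-two table of predicted against actual completion, treating completion as the positive outcome. The sketch below uses hypothetical counts; the report does not state the model or probability cut-off behind Table 12. The helper `implied_completion_share` is an added illustration, not a measure the report uses: it inverts the identity accuracy = p × sensitivity + (1 - p) × specificity, where p is the share of students who complete.

```python
from typing import NamedTuple


class Confusion(NamedTuple):
    """Students cross-classified by actual and predicted five-year completion."""
    tp: int  # completed, predicted to complete
    fn: int  # completed, predicted not to complete
    fp: int  # did not complete, predicted to complete
    tn: int  # did not complete, predicted not to complete


def sensitivity(c: Confusion) -> float:
    """P(predicted to complete | completed)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """P(predicted not to complete | did not complete)."""
    return c.tn / (c.tn + c.fp)


def correctly_classified(c: Confusion) -> float:
    """Share of all students whose predicted outcome matches the actual one."""
    return (c.tp + c.tn) / sum(c)


def implied_completion_share(sens: float, spec: float, accuracy: float) -> float:
    """Solve accuracy = p * sens + (1 - p) * spec for the completer share p.

    Valid only if all three figures come from the same students; the inputs may
    be proportions or percentages, since the units cancel.
    """
    return (accuracy - spec) / (sens - spec)


c = Confusion(tp=90, fn=10, fp=40, tn=10)  # hypothetical counts
print(sensitivity(c), specificity(c), correctly_classified(c))
print(round(implied_completion_share(97.35, 7.82, 70.46), 3))  # NES column of Table 12
```

Applied to the NES column of Table 12, and assuming the three figures were computed on the same students, the identity gives (70.46 - 7.82) / (97.35 - 7.82), or about 0.70: roughly 70% of students completed within five years. A rule that predicted completion for everyone would therefore also classify about 70% correctly, which helps explain how a high percentage correctly classified can coexist with a specificity below 9%.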
Table 12: Classification of observations
                      NES (%)  NES-HESA (%)
Sensitivity           97.35    97.28
Specificity           7.82     8.48
Correctly classified  70.46    70.60

3.4 Summary

This section showed that:
- while domicile may be used to examine trends, there is insufficient variation between domicile categories in the available data for it to be used to identify students who do not complete;
- whether a student had dependants is only routinely available in HESA from 2007 onwards;
- the academic progression of students has the potential to add significant insight but, because of the more complex longitudinal structure of the data, it has not yet been examined in any detail;
- the student's postcode on entry can be used to map the location of students on entry and to link additional spatial information such as SIMD and measures of remoteness and rurality;
- the distribution of students by SIMD is about the same as the population distribution;
- the distribution of students by the SG's urban rural index is about the same as the population distribution; and
- SIMD and the SG's UR index add very little explanatory power and do not substantially increase the percentage of correctly classified observations.

4 Summary

Section 2 showed that:
- a series of information governance arrangements, produced as a direct result of the recruitment and retention work, allows NES and ISD to work with linked individual-level data from NES, HESA and SWISS;
- it is feasible to link NES and HESA data;
- the accuracy of the link is a function of the availability of common variables between the two data sets;
- HESA data add several pieces of information that are not otherwise available, such as a student's domicile, whether they had dependants, their postcode on entry and their academic progress;
- the link between NES and HESA was able to identify pre-registration courses in HESA that would otherwise have been difficult to identify;
- more than 96% of NES observations were matched to at least one HESA observation; and
- while the percentage of matched records was relatively constant between cohorts, there was some variation between HEIs.

Section 3 showed that:
- while domicile may be used to examine trends, there is insufficient variation between domicile categories in the available data for it to be used to identify students who do not complete;
- whether a student had dependants is only routinely available in HESA from 2007 onwards;
- the academic progression of students has the potential to add significant insight but, because of the more complex longitudinal structure of the data, it has not yet been examined in any detail;
- the student's postcode on entry can be used to map the location of students on entry and to link additional spatial information such as SIMD and measures of remoteness and rurality;
- the distribution of students by SIMD is about the same as the population distribution;
- the distribution of students by the SG's urban rural index is about the same as the population distribution; and
- SIMD and the SG's UR index add very little explanatory power and do not substantially increase the percentage of correctly classified observations.

5 Recommendations

Undertake quality assurance of data matching
1. Further checks on matching and non-matching cases should be conducted, including cross-validation of other information shared between the two datasets and investigation of the relationship between linked variables.
2. Other matching methods, such as probabilistic matching, should be used to assess the accuracy of the match and to investigate the potential for further enhancement.

Short-term data analysis
1. Further checks on the results should be made, including exploring the distribution of students by SIMD and the SG's urban rural index.
2. The reasons for the missing 50% of student names in the HESA data for 2000 and 2001 should be explored.
3. The reasons why HESA domicile data combine the jurisdictions of the UK should be explored.
4. The value of spatial analysis should be explored further by examining whether the progression of students is a function of their location on entry.

Long-term data analysis
1. Assess the reliability of the academic progression data in HESA and undertake longitudinal analysis of completion rates that takes this into account.
2. Examine whether the longitudinal structure of HESA data can increase the percentage of correctly classified observations.
3. The feasibility and information governance of linking to local labour market statistics should be explored.
4. NES and HESA data should be linked annually to support recruitment and retention.
5. NES and SWISS data should be linked annually to support recruitment and retention.