A Semi-Supervised Recommender System to Predict Online Job Offer Performance

Similar documents
Optimization Problems in Machine Learning

Enhancing Sustainability: Building Modeling Through Text Analytics. Jessica N. Terman, George Mason University

Automatically Recommending Healthy Living Programs to Patients with Chronic Diseases through Hybrid Content-Based and Collaborative Filtering

INPATIENT SURVEY PSYCHOMETRICS

Demand and capacity models High complexity model user guidance

Staffing and Scheduling

Implementation of Automated Knowledge-based Classification of Nursing Care Categories

Statistical Analysis Tools for Particle Physics

Planning Calendar Grade 5 Advanced Mathematics. Monday Tuesday Wednesday Thursday Friday 08/20 T1 Begins

Prediction of High-Cost Hospital Patients Jonathan M. Mortensen, Linda Szabo, Luke Yancy Jr.

CWE FB MC project. PLEF SG1, March 30 th 2012, Brussels

Predicting Medicare Costs Using Non-Traditional Metrics

Quality Management Building Blocks

Summary of Findings. Data Memo. John B. Horrigan, Associate Director for Research Aaron Smith, Research Specialist

Applying client churn prediction modelling on home-based care services industry

SF Fed, March 18 Hyunyoung Choi Hal Varian

Public Funding and Its Relationship to Research Outcomes. Paula Stephan Georgia State University & NBER UNU-MERIT/MGSoG Conference November 2014

FCSM Research and Policy Conference March 8, 2018 Joshua Goldstein

Free to Choose? Reform and Demand Response in the British National Health Service

User Guide for Patients

The Life-Cycle Profile of Time Spent on Job Search

What Job Seekers Want:

29A: Hours may be used as the Base labor increment. 28Q: Are human in the loop solutions of interest for ASKE? 28A: Yes

Big Data ESSNet - WP1 Research Plan for SGA-2 (version 2.0)

Factorial Design Quantifies Effects of Hand Hygiene and Nurse-to-Patient Ratio on MRSA Acquisition

An Application of Factorial Design to Compare the Relative Effectiveness of Hospital Infection Control Measures

Fertility Response to the Tax Treatment of Children

Space Dynamics Laboratory (SDL) Request for Proposals for the Government Fiscal Year (GFY) 2016 University Nanosatellite Program (UNP)

QUEUING THEORY APPLIED IN HEALTHCARE

Profit Efficiency and Ownership of German Hospitals

Satisfaction and Experience with Health Care Services: A Survey of Albertans December 2010

Publication Development Guide Patent Risk Assessment & Stratification

Personalized Job Matching

ESSnet WP1: Webscraping Job Vacancy advancement review - France

CLINICAL PREDICTORS OF DURATION OF MECHANICAL VENTILATION IN THE ICU. Jessica Spence, BMR(OT), BSc(Med), MD PGY2 Anesthesia

Roster Quality Staffing Problem. Association, Belgium

Maternal and Child Health North Carolina Division of Public Health, Women's and Children's Health Section

DISTRICT BASED NORMATIVE COSTING MODEL

12/12/2016. The Impact of Shift Length on Mood and Fatigue in Registered Nurses: Are Nurses the Next Grumpy Cat? Program Outcomes: Background

2018 Technical Documentation for Licensure and Workforce Survey Data Analysis Addressing Nurse Workforce Issues for the Health of Florida

Technical Notes on the Standardized Hospitalization Ratio (SHR) For the Dialysis Facility Reports

Training, quai André Citroën, PARIS Cedex 15, FRANCE

2013 Workplace and Equal Opportunity Survey of Active Duty Members. Nonresponse Bias Analysis Report

Clusters, Networks, and Innovation in Small and Medium Scale Enterprises (SMEs)

The Relationship between Structural and Psychological Empowerment and Participation in Continuing Professional Development in Oncology Nurses

Palomar College ADN Model Prerequisite Validation Study. Summary. Prepared by the Office of Institutional Research & Planning August 2005

Maintenance Outsourcing - Critical Issues

Introduction FUJITSU APPROACH FOR TACKLING THE TECHNICAL CHALLENGES RELATED TO THE MANAGEMENT OF EHR

Comparison of New Zealand and Canterbury population level measures

UNITED STATES PATENT AND TRADEMARK OFFICE The Patent Hoteling Program Is Succeeding as a Business Strategy

Accountable Care Atlas

How Local Are Labor Markets? Evidence from a Spatial Job Search Model. Online Appendix

Settling for Academia? H-1B Visas and the Career Choices of International Students in the United States

The Hashemite University- School of Nursing Master s Degree in Nursing Fall Semester

Unemployment. Rongsheng Tang. August, Washington U. in St. Louis. Rongsheng Tang (Washington U. in St. Louis) Unemployment August, / 44

Decision Fatigue Among Physicians

SCHOOL - A CASE ANALYSIS OF ICT ENABLED EDUCATION PROJECT IN KERALA

Big Data NLP for improved healthcare outcomes

Repeater Patterns on NCLEX using CAT versus. Jerry L. Gorham. The Chauncey Group International. Brian D. Bontempo

Impacts of Trade liberalization on Labor allocation in Vietnam

Long-Stay Alternate Level of Care in Ontario Mental Health Beds

A strategy for building a value-based care program

Measuring healthcare service quality in a private hospital in a developing country by tools of Victorian patient satisfaction monitor

Nonprofit Organizations & Social Media Fundraising: An Analysis of the GoodGiving Guide Challenge

University of Michigan Health System. Program and Operations Analysis. CSR Staffing Process. Final Report

Artificial Intelligence Changes Evidence Based Medicine A Scalable Health White Paper

Chapter 1 INTRODUCTION TO THE ACS NSQIP PEDIATRIC. 1.1 Overview

Statistical analysis of atmospherical components in ERS SAR data

Technical Notes for HCAHPS Star Ratings (Revised for October 2017 Public Reporting)

Jobs Demand Report. Chatham-Kent, Ontario Reporting Period of October 1 December 31, February 22, 2017

Fleet and Marine Corps Health Risk Assessment, 02 January December 31, 2015

Do Hiring Credits Work in Recessions? Evidence from France

Tree Based Modeling Techniques Applied to Hospital Length of Stay

Physiotherapy outpatient services survey 2012

The Pennsylvania State University. The Graduate School ROBUST DESIGN USING LOSS FUNCTION WITH MULTIPLE OBJECTIVES

Executive Summary. Rouselle Flores Lavado (ID03P001)

Work search and the power of networks

SSF Call for Proposals: Framework Grants for Research on. Big Data and Computational Science

Last Review: Outcome: Next Review:

Session 3 Highway Safety Manual General Overview. Joe Santos, PE, FDOT, State Safety Office November 6, 2013

Nursing Manpower Allocation in Hospitals

RMS (Resume Management System)

CLINICAL STRATEGY IMPLEMENTATION - HEALTH IN YOUR HANDS

How to deal with Emergency at the Operating Room

Using Computational Approaches to Improve Risk-Stratified Patient Management: Rationale and Methods

PEONIES Member Interviews. State Fiscal Year 2012 FINAL REPORT

Call for Posters. Deadline for Submissions: May 15, Washington, DC Gaylord National Harbor Hotel October 18 21, 2015

Web Appendix: The Phantom Gender Difference in the College Wage Premium

Developing CMFs. Study Types and Potential Biases. Frank Gross VHB

Results of censuses of Independent Hospices & NHS Palliative Care Providers

Faculty of Computer Science

time to replace adjusted discharges

U.S. Naval Officer accession sources: promotion probability and evaluation of cost

Technical Notes for HCAHPS Star Ratings (Revised for April 2018 Public Reporting)

COACHING GUIDE for the Lantern Award Application

CWE Flow-based Market Coupling Project

Author's response to reviews

Transitional Housing Program Progress Reporting Form Recording Transcript

COMPANY CONSULTING Terms of Reference Development of an Open Innovation Portal for UTFSM FSM1402 Science-Based Innovation FSM1402AT8 I.

Tomahawk Deconfliction: An Exercise in System Engineering

Transcription:

A Semi-Supervised Recommender System to Predict Online Job Offer Performance
Julie Séguéla (1,2) and Gilbert Saporta (1)
(1) CNAM, Cedric Lab, Paris
(2) Multiposting.fr, Paris
October 29th 2011, Beijing
Theory and Application of High-dimensional Complex and Symbolic Data Analysis

Outline
- Introduction
  - Context and objectives
  - Recommender systems
  - Data complexity
- Methodology
  - Data handling
  - Similarity computing between job postings
  - Return estimation and system evaluation
- Experiments: job board recommendation for job postings
  - Data description
  - Experiments and results
- Conclusions and future work
October 29th - SDA 2011, Beijing

Context: Internet recruitment in France
[Figure: proportion of job offers (source: APEC)]
In 2009, 82% of vacancies were published on the internet (66% in 2006).

Context: A job posting on a job board
[Screenshot: job list]

[Screenshot: job list and a job offer, with its structured data and unstructured data highlighted]

Context: Multiposting of a job offer
Illustration of multiposting: I choose the job boards, I enter my job offer only once, and my offer is automatically multiposted.
[Illustration: posting returns for an example offer (profile searched: Senior Geophysicist; job description: "Participating as a contributive team member"), with 22, 14 and 18 applications on different job boards]
Our data are provided by Multiposting.fr, an online job posting solution.

Context: About a hundred job boards
[Figure: number of job boards that have at least X postings (x-axis: number of job boards, 0 to 100; y-axis: number of postings, 0 to 5000)]
Example: 13 job boards have 1000 postings or more.

Objectives
- With the expansion of the internet, the number of potential job boards is growing exponentially
- It is now necessary to understand job board performance in order to make adequate choices when posting a job on the internet
- Develop a predictive algorithm of job posting performance on a job board
- Develop an intelligent tool that recommends the best job boards for a given job offer
We present here a recommender system that predicts the ranking of job boards with respect to job posting returns.


Introduction to recommender systems
General idea: the aim of a recommender system is to help users find items in huge catalogues that they should appreciate and that they have not seen yet.
Illustration with a movie recommender system (fragment of a rating matrix; ? = unknown rating):

User    Harry Potter  The Chronicles of Narnia  Terminator  Rambo  The Lord of the Rings
Alice   4             5                         1           ?      ?
Bob     5             4                         2           1      5
Cindy   3             5                         ?           2      4
David   1             ?                         5           4      2

What movie should be recommended to Alice? Bob and Cindy like the same movies as Alice, so we should recommend to Alice another movie that they liked: «The Lord of the Rings».
This is a collaborative system (based on ratings, with no use of descriptive variables).
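The slide's reasoning can be made concrete with a small user-based collaborative filtering sketch on the same rating matrix. Two choices here are assumptions made for illustration: cosine similarity computed on co-rated movies, and a similarity-weighted mean over the k = 2 most similar users. The slide does not say how similarity or aggregation are computed.

```python
from math import sqrt

# Rating matrix from the slide; None marks an unknown rating ("?").
movies = ["Harry Potter", "The Chronicles of Narnia", "Terminator",
          "Rambo", "The Lord of the Rings"]
ratings = {
    "Alice": [4, 5, 1, None, None],
    "Bob":   [5, 4, 2, 1, 5],
    "Cindy": [3, 5, None, 2, 4],
    "David": [1, None, 5, 4, 2],
}


def cosine(u, v):
    """Cosine similarity restricted to the movies rated by both users."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    num = sum(a * b for a, b in pairs)
    den = sqrt(sum(a * a for a, _ in pairs)) * sqrt(sum(b * b for _, b in pairs))
    return num / den if den else 0.0


def predict(user, item, k=2):
    """Similarity-weighted mean of the item's ratings by the k most similar users."""
    neighbours = sorted(
        ((cosine(ratings[user], r), r[item])
         for name, r in ratings.items()
         if name != user and r[item] is not None),
        reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    return sum(s * x for s, x in neighbours) / total


for j, title in enumerate(movies):
    if ratings["Alice"][j] is None:
        print(f"{title}: {predict('Alice', j):.2f}")
```

On this matrix, Cindy and Bob are Alice's nearest neighbours. The Lord of the Rings gets a predicted rating of about 4.5, against about 1.5 for Rambo, which matches the recommendation on the slide.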

About recommender systems
- Collaborative filtering: predictions are based on the ratings obtained by the most similar items with respect to their rating vectors
- Content-based filtering: predictions are based on item features (the system recommends items similar to those the user liked in the past)
- Hybrid system: a system that combines collaborative and content-based approaches

Our system as a particular case of recommender system
Usual recommender objectives / issues:
- Recommendation of items (= postings) to users (= job boards) according to the expected rating (= return)
- Unlimited number of potential items
- Sparse matrix: many items, and only a few known ratings for each item
- Similarity between items is based on the ratings given by users
Our additional issues:
- We are interested in predicting ratings only for «new items», which have no rating, only descriptive variables
- It is not possible to obtain ratings for new items, because this is a «one-shot» recommendation
- A posting return is more complex than a rating (usually between 0 and 5): there is much variability within and between users
- We need to understand posting return variability


Complexity of our data and issues
Which factors are relevant to explain job posting performance?
- Identification of potential factors (job characteristics, job board, job market, etc.) coming from different sources (job offer, demographic data source, firm data, etc.)
- Use of text mining techniques to extract relevant descriptors from the job offer
High-dimensional data:
- We are working with structured and unstructured data, which have to be handled simultaneously
- Job postings are described by thousands of features
- Features have to be weighted in the algorithm according to their explanatory power

Complexity of our data and issues: display length
Irregular flow of applications and different display lengths, because:
- Each job board has a specific display length
- Some job postings are stopped before their end
We have to predict the daily performance of a posting over a given time.
[Figure: number of applications received and number of applications received per day, by displaying day, up to the length of display]


Methodology: General overview of the recommender system

Methodology: Handling of structured data
Categorical variables:
- Contract type
- Education level
- Career level
- Location (region)
- Job category (occupation)
- Industry
- Type of recruiter (company, recruitment agency, etc.)
- Year
- Month
Quantitative variables:
- Demographic characteristics of the location (city, employment area): population, unemployed people, working people
- Displaying time
Categorical variables are recoded into dummy variables.
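Recoding a categorical variable into dummy variables can be sketched in plain Python. The field names and values below are hypothetical. The sketch creates one 0/1 column per observed level rather than dropping a reference level, which is also an assumption. That choice is harmless for a method such as PLS, which does not invert X'X.

```python
def dummy_encode(records, column):
    """Recode a categorical column into 0/1 dummy columns, one per observed level."""
    levels = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}  # drop the original column
        for level in levels:
            row[f"{column}={level}"] = int(r[column] == level)
        encoded.append(row)
    return encoded


# Hypothetical postings: one categorical and one quantitative variable.
postings = [
    {"contract_type": "permanent", "population": 2_200_000},
    {"contract_type": "fixed-term", "population": 480_000},
    {"contract_type": "internship", "population": 2_200_000},
]
for row in dummy_encode(postings, "contract_type"):
    print(row)
```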

Handling of unstructured data: job offer text representation
Latent Semantic Indexing (LSI) with TF-IDF weighting:
1) Document-term matrix
2) Weighting, with TF (Term Frequency) as local weighting and IDF (Inverse Document Frequency) as global weighting
3) SVD
4) Document coordinates in the latent semantic space
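Here is a minimal numpy sketch of the four LSI steps listed above. The deck does not give its exact weighting formulas or preprocessing, so several choices are assumptions:
- raw term counts as the local TF weight;
- log(n / df) as the global IDF weight;
- whitespace tokenisation;
- document coordinates taken as U_k Σ_k from the SVD.

Real French job offers would also need proper tokenisation, stop-word removal and stemming. The example offers are invented.

```python
from collections import Counter

import numpy as np


def lsi(documents, k):
    """LSI sketch: TF-IDF-weighted document-term matrix, then truncated SVD."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: j for j, t in enumerate(vocab)}

    # 1) Document-term matrix of raw term counts (local weight: TF).
    X = np.zeros((len(documents), len(vocab)))
    for i, doc in enumerate(tokenized):
        for term, count in Counter(doc).items():
            X[i, index[term]] = count

    # 2) Global weight: IDF = log(n / number of documents containing the term).
    df = (X > 0).sum(axis=0)
    X *= np.log(len(documents) / df)

    # 3) SVD of the weighted matrix.
    U, s, _ = np.linalg.svd(X, full_matrices=False)

    # 4) Document coordinates in the k-dimensional latent space: U_k * Sigma_k.
    return U[:, :k] * s[:k]


offers = [
    "senior geophysicist seismic data processing",
    "geophysicist seismic interpretation team member",
    "accountant financial reporting team",
    "junior accountant payroll financial reporting",
]
coords = lsi(offers, k=2)
print(coords.shape)  # (4, 2)
```

In this toy example, the two geophysics offers end up closer to each other in the latent space than to the accounting offers. The deck's non-supervised comparison system uses 50 dimensions instead of 2.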


Methodology: Computing PLS components
Why PLS?
- The number of predictors can be large compared to the number of observations
- Components are uncorrelated with one another and strongly correlated with the dependent variable
- Dimensionality reduction
Method:
- Extraction of PLS components with the NIPALS algorithm
- Number of components chosen by cross-validation
- Selection of relevant predictors with the VIP (Variable Importance in Projection) indicator (VIP > 0.8)
- Computation of PLS components based on the predictors kept
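The NIPALS extraction and VIP selection can be sketched with numpy for a single dependent variable (PLS1). This is a textbook PLS1 NIPALS, not the authors' code. The synthetic data, the function names and the fixed number of components are illustrative assumptions; the deck chooses the number of components by cross-validation. Only the 0.8 VIP threshold comes from the slide.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 (one dependent variable) via NIPALS on mean-centred data.

    Returns scores T (n x A), weights W (p x A), X-loadings P (p x A)
    and y-loadings q (A,)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    T = np.zeros((n, n_components))
    W = np.zeros((p, n_components))
    P = np.zeros((p, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)            # unit-norm weight vector
        t = X @ w                         # component scores
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = y @ t / tt
        X = X - np.outer(t, P[:, a])      # deflate X
        y = y - q[a] * t                  # deflate y
        T[:, a], W[:, a] = t, w
    return T, W, P, q


def vip(T, W, q):
    """Variable Importance in Projection for a PLS1 model."""
    p = W.shape[0]
    ssy = q ** 2 * (T ** 2).sum(axis=0)   # y variance explained per component
    w2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w2 @ ssy) / ssy.sum())


# Synthetic example: only the first two of ten predictors drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

T, W, P, q = pls1_nipals(X, y, n_components=3)
scores = vip(T, W, q)
kept = np.flatnonzero(scores > 0.8)       # VIP threshold from the slide
T_kept, *_ = pls1_nipals(X[:, kept], y, n_components=2)  # components on kept predictors
```

The NIPALS scores are mutually orthogonal, which is the sense in which the components are uncorrelated. The squared VIP scores always sum to the number of predictors, which is why a threshold near 1 (here 0.8) is a common cut-off.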

Methodology: Similarity measures
Computation of the similarity of a new posting with respect to all past postings. This assumes that items that are similar with respect to their PLS components should have similar returns on a given job board.
Method: computation of Euclidean distances between posting coordinates. Similarity is a decreasing function of the Euclidean distance:
- Mean
- Distance max - distance
- Inverse distance
- Gaussian function
- Exponential function
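The decreasing functions of distance listed above can be sketched as follows. The slide does not give their formulas, so the forms used here are assumptions:
- constant weights for «mean»;
- the maximum taken over the candidate postings for «distance max»;
- a small epsilon in the inverse distance;
- exp(-d²/2σ²) for the Gaussian;
- exp(-d/σ) for the exponential.

The results slides vary the parameter as fractions of σ (σ, 1/2 σ, and so on). Here `sigma` is simply a function argument, and the coordinates in the example are hypothetical.

```python
import math


def similarities(distances, sigma):
    """Map Euclidean distances to similarities with the functions named on the
    slide. Exact formulas are assumptions; a larger distance gives a smaller weight."""
    d_max = max(distances)               # assumed: max over the candidate postings
    return {
        "mean": [1.0] * len(distances),  # assumed: equal weights (plain mean)
        "dist_max_minus_dist": [d_max - d for d in distances],
        "inverse_distance": [1.0 / (d + 1e-9) for d in distances],  # eps avoids 1/0
        "gaussian": [math.exp(-d ** 2 / (2 * sigma ** 2)) for d in distances],
        "exponential": [math.exp(-d / sigma) for d in distances],
    }


new_posting = (0.0, 0.0)                              # hypothetical PLS coordinates
past_postings = [(0.5, 0.0), (1.0, 1.0), (3.0, 4.0)]
dists = [math.dist(new_posting, p) for p in past_postings]  # 0.5, ~1.414, 5.0
sims = similarities(dists, sigma=1.0)
for name, values in sims.items():
    print(name, [round(v, 3) for v in values])
```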


Methodology: Return estimation
The expected return of an item (posting) $i_1$ is estimated with an aggregating function computed on the item's neighborhood. The neighborhood is defined by the $K$ nearest neighbors of item $i_1$ with respect to the similarity measure used.
- $R_{u,i_1}$ = expected return of item $i_1$ for user $u$ (job board)
- $r_{u,i_k}$ = return of item $i_k$ for user $u$
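The aggregation formula did not survive extraction. A common choice, assumed here rather than taken from the deck, is the similarity-weighted mean of the neighbours' returns on the same job board: $R_{u,i_1} = \sum_k s(i_1, i_k)\, r_{u,i_k} / \sum_k s(i_1, i_k)$. The sum runs over the $K$ nearest past postings that have an observed return on $u$. The sketch uses the exponential similarity and K = 40, the value suggested in the conclusions. The coordinates and returns in the example are invented.

```python
import math


def estimate_return(new_coords, past_coords, past_returns, k=40, sigma=1.0):
    """Estimate R_{u,i1} for one job board u: similarity-weighted mean of the
    returns r_{u,ik} of the K nearest past postings observed on u (assumed form)."""
    dists = [math.dist(new_coords, c) for c in past_coords]
    nearest = sorted(range(len(dists)), key=dists.__getitem__)[:k]
    # Exponential similarity (assumed); any decreasing function of distance works.
    # Assumes the weights do not all underflow to zero (sigma not tiny vs distances).
    weights = [math.exp(-dists[i] / sigma) for i in nearest]
    return sum(w * past_returns[i] for w, i in zip(weights, nearest)) / sum(weights)


# Hypothetical coordinates and daily returns of past postings on one job board.
coords = [(0.1, 0.2), (0.0, 0.1), (2.0, 2.5), (2.2, 2.4)]
returns = [3.0, 4.0, 10.0, 12.0]
estimate = estimate_return((0.0, 0.0), coords, returns, k=2)
print(round(estimate, 2))
```

With k = 2, only the two close postings count, and the nearer one (return 4.0) gets the larger weight. The estimate therefore falls between 3 and 4, slightly nearer 4.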

Methodology: Other approaches for comparison
1 - Comparison with PLS regression (model-based recommendation):
- Computation of PLS components (method described above)
- Regression of the dependent variable on the PLS components
- Prediction by 10-fold cross-validation
2 - Comparison with a non-supervised system based on text features (heuristic-based recommendation):
- LSI with TF-IDF weighting and 50 dimensions
- Similarity measures are computed directly on the LSI coordinates
- Same measures as those used in the semi-supervised system
- Same estimation technique
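«Prediction by 10-fold cross-validation» means that each posting's return is predicted by a model fitted without it. Below is a generic sketch of these out-of-fold predictions. `fit` and `predict` are hypothetical placeholders, and the example plugs in a trivial mean model instead of the PLS regression, purely to keep the sketch short.

```python
import random


def kfold_predictions(X, y, fit, predict, n_folds=10, seed=0):
    """Out-of-fold predictions: each observation is predicted by a model fitted
    on the other n_folds - 1 folds. `fit` and `predict` are placeholders."""
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    preds = [None] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict(model, X[i])
    return preds


# Placeholder model (NOT PLS-R): predicts the training mean of y.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


y = [float(v) for v in range(20)]
X = [[v] for v in y]
oof = kfold_predictions(X, y, fit_mean, predict_mean)
```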

Advantages and weaknesses of the three approaches

System                  Linearity constraint  Risk of overfitting  Interpretability  Weight fitting
PLS-R                   yes                   yes                  yes               yes
Non-supervised system   no                    no                   no                no
Semi-supervised system  no                    low                  yes               yes

Methodology: System evaluation
- $U$ = set of job boards
- $D_u$ = set of postings with an observed return on job board $u$
- $r_{u,i}$ = return of posting $i$ on job board $u$
- $p_{u,i}$ = predicted return of posting $i$ on job board $u$
Criterion: Mean Absolute Error (MAE), computed as a mean error per job board.
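The MAE formula itself is missing from the slide. Reading «mean error per job board» literally gives one plausible form, assumed here. It averages absolute errors within each job board, then across job boards: $\mathrm{MAE} = \frac{1}{|U|}\sum_{u \in U}\frac{1}{|D_u|}\sum_{i \in D_u}|r_{u,i} - p_{u,i}|$. The dictionaries in the example are hypothetical.

```python
def mae_per_job_board(observed, predicted):
    """observed / predicted: {job_board: {posting_id: return}}.

    Returns the MAE of each job board and their unweighted mean over job boards
    (the aggregation across job boards is an assumption)."""
    per_board = {}
    for u, returns in observed.items():
        errors = [abs(r - predicted[u][i]) for i, r in returns.items()]
        per_board[u] = sum(errors) / len(errors)
    overall = sum(per_board.values()) / len(per_board)
    return per_board, overall


observed = {"board_A": {1: 5.0, 2: 3.0}, "board_B": {1: 10.0}}
predicted = {"board_A": {1: 4.0, 2: 3.5}, "board_B": {1: 7.0}}
per_board, overall = mae_per_job_board(observed, predicted)
print(per_board, overall)
```

Per-board errors are also what one would compare across systems to count, as in the results tables, on how many job boards each system is best.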


Experiments: Data perimeter
Objective: predict the number of applications received by a new posting on a job board.
- We keep in the sample the job boards with at least 100 postings
- Dependent variable: number of applications / display length
[Figure: number of job boards that have at least X postings (x-axis: number of job boards, 0 to 100; y-axis: number of postings, 0 to 5000)]
Sample: 31 job boards, 14,334 postings, 30,875 returns
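Building the sample described above can be sketched as a filter plus a ratio. The record layout is a hypothetical assumption: one record per posting and job board, with `job_board`, `applications` and `display_days` fields. Skipping records with a zero display length is also an assumption.

```python
from collections import Counter


def build_sample(records, min_postings=100):
    """Keep job boards with at least `min_postings` postings and add the dependent
    variable: number of applications / display length (in days)."""
    counts = Counter(r["job_board"] for r in records)
    return [
        {**r, "daily_return": r["applications"] / r["display_days"]}
        for r in records
        # Assumed: records with a zero display length are dropped.
        if counts[r["job_board"]] >= min_postings and r["display_days"] > 0
    ]


# Hypothetical (posting, job board) return records.
records = (
    [{"job_board": "A", "applications": 20, "display_days": 30}] * 120
    + [{"job_board": "B", "applications": 5, "display_days": 10}] * 40
)
sample = build_sample(records)
print(len(sample), sample[0]["daily_return"])
```

Job board B has only 40 postings, so it falls below the 100-posting threshold and is dropped.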

Comparison of job board returns
[Figure: return variability within and between job boards (one boxplot per job board)]


Results: Introducing new relevant descriptors
Improving results by adding relevant descriptors (MAE, and the number of job boards on which each system is best):
- Average Recommender: MAE 10.2, best on 2 job boards
- PLS-R (text features): MAE 8.0, best on 5 job boards
- PLS-R (text features + job characteristics + location characteristics): MAE 7.5, best on 24 job boards
[Figure: return variability against number of postings, for the Average Recommender, PLS-R (text features) and PLS-R (text features + additional variables)]

Non-supervised approach: Discussion about parameters
[Figure: MAE according to the number of neighbors (0 to 100) and the parameter of the Gaussian and exponential functions. Left panel: Gaussian with σ, 1/2 σ, 1/3 σ and 1/4 σ; right panel: exponential with σ, 1/3 σ, 1/4 σ and 1/8 σ; PLS-R shown as a reference in both panels]

Semi-supervised approach: Discussion about parameters
[Figure: MAE according to the number of neighbors (0 to 100) and the parameter of the Gaussian and exponential functions. Left panel: Gaussian with σ, 2/3 σ, 1/2 σ and 1/3 σ; right panel: exponential with σ, 1/2 σ, 1/3 σ and 1/6 σ; PLS-R shown as a reference in both panels]

Results: Comparison of similarity functions
[Figure: MAE according to the number of neighbors (0 to 100) for each similarity function, with PLS-R as a reference. Non-supervised approach: mean, dist max - dist, inverse distance, Gaussian (1/4 σ), exponential (1/8 σ). Semi-supervised approach: mean, dist max - dist, inverse distance, Gaussian (1/3 σ), exponential (1/6 σ)]

Results: Summary
Best system of each approach (MAE, and the number of job boards on which each system is best):
- Average Recommender: MAE 10.2, best on 0 job boards
- PLS-R: MAE 7.5, best on 6 job boards
- Non-supervised system: MAE 7.1, best on 7 job boards
- Semi-supervised system: MAE 6.6, best on 18 job boards
[Figure: return variability against number of postings, for PLS-R, the non-supervised system and the semi-supervised system]


Conclusions and future work
Conclusions:
- MAE decreases as the standard deviation parameter of the Gaussian and exponential functions is reduced (but increases again if it becomes too small)
- In the semi-supervised approach, with the optimal parameter, MAE is stable with respect to the number of neighbors: one can select 40 neighbors and only tune the parameter
- The best results are obtained with the semi-supervised approach and the exponential function
- The system allows new variables to be introduced and their weight in the model to be managed
- Estimations are made on job offers that are really close to the new offer (the offer studied)
Future work:
- Improve the prediction when the posting is in fact «exactly» the same as a previous one
- Manage job boards with very few or no postings

Thank you for your attention!