The FoS Forum: A System for Reflecting on Scholarly Activities in a University Setting and Data-Integration Lessons We Learned Building It
Eleni Stroulia (stroulia@ualberta.ca)
Professor, Computing Science
Project Director, Integrated Strategic Data Systems
Faculty of Science, University of Alberta
May 2018
Faculty Members and Our CVs
Documents we maintain: CV; University Annual Report; CCV; F100; MITACS; Provincial agencies; NCEs
When each is updated (in the same order): Always up to date; Every July (and upon QA processes); With every application; According to its individual reporting cycle
Agenda
1. The Forum Software (Origin and Development Methodology)
2. Data Sources and Supported Services of The FoS Forum
3. The FoS Forum, its Data-Integration Challenges, and how we addressed them
4. Some comments on the user-interaction model in the FoS Forum
THE FORUM SOFTWARE
The Forum Software Product Line
The GRAND Forum (2009-2014) was generalized to The Forum (2014). The Forum has since been instantiated to, or cloned and evolved to:
- The AGE-WELL Forum (2014-)
- The GlycoNet Forum (2014-)
- The CFN Forum (2015-)
- The CART/GRAC Forum (2016-)
- The Quantum Forum (2016-)
- The FESRI Forum (2017-)
- GARS (2017-)
- FoS Forum (2018-)
Key Design Principles
- Each output should be entered only once, and it should be automatically associated with
  - all contributing individuals
  - all funding projects
- Reporting is a continuous process
  - It is more about maintaining a record of activity and outputs than about reporting to a report recipient
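The "enter once, associate everywhere" principle can be illustrated with a small data structure. The sketch below is not the Forum's actual schema: the class names `Output` and `OutputRegistry`, the identifiers, and the in-memory storage are all assumptions made for illustration. It shows one way a single stored output can be reached from every contributor and every funding project without being re-entered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Output:
    """A scholarly output (e.g. a publication), stored exactly once."""
    output_id: str
    title: str
    contributor_ids: Set[str] = field(default_factory=set)
    project_ids: Set[str] = field(default_factory=set)


class OutputRegistry:
    """Hypothetical in-memory store; per-person and per-project views are derived."""

    def __init__(self) -> None:
        self._outputs: Dict[str, Output] = {}

    def add(self, output: Output) -> None:
        # Re-entering a known output merges its associations instead of duplicating it.
        existing = self._outputs.get(output.output_id)
        if existing is None:
            self._outputs[output.output_id] = output
        else:
            existing.contributor_ids |= output.contributor_ids
            existing.project_ids |= output.project_ids

    def for_person(self, person_id: str) -> List[Output]:
        return [o for o in self._outputs.values() if person_id in o.contributor_ids]

    def for_project(self, project_id: str) -> List[Output]:
        return [o for o in self._outputs.values() if project_id in o.project_ids]

    def __len__(self) -> int:
        return len(self._outputs)
```

With this shape, a report for a person or a project is a query over the same records, which fits the idea of reporting as a continuous record rather than a periodic form.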
Software-Engineering Methodology
- REST-style Service-Oriented Architecture
- Development process
  - version control
  - continuous deployment
  - Development on a sandbox with fake data (development team)
  - Testing on a sandbox with actual data (chosen users)
  - Deployment on the production system
  - testing with standard sequences
- Quality control
  - access control
  - auditing
DATA SOURCES AND SUPPORTED SERVICES
To build a complete picture of one's scholarly activity, one should take advantage of multiple data sources. The task is not a set of forms: it is a data structure.
[Diagram: data sources at three levels (Faculty/Department, Institution, External) feed the FoS Forum, which holds the scholarly profiles of faculty and HQP.]
In developing the FoS Forum, we have used systems owned by the FoS, other UoA units, and external systems.
[Diagram: source systems feeding the FoS Forum: efec, Beartracks, FGSR, GARS, Common CV, FoS Awards, Grants 3.0, Tec Edmonton, GSMS, NSERC Open Data. GARS is the Graduate Application Review System (OT/CS, 1500/600 files, distributed reviewing).]
Data from different sources overlap and need to be cleaned up and aligned. In the FoS Forum, the result of this curation process is scholarly profiles for FoS members (faculty and HQP).
[Diagram: data drawn from the source systems (efec, Beartracks, FGSR, GARS, Common CV, FoS Awards, Grants 3.0, Tec Edmonton, GSMS, NSERC Open Data): FEC history, Publications, Presentations, HQP, Activities, Awards; Courses Taught; HQP milestones/committees; Grants, Spin-offs; HQP Academic/Employment history, Publications, Presentations, HQP, Activities.]
Faculty members maintain their profiles year-round.
At the time of Annual-Report submission, the relevant information is compiled into an Annual Report PDF.
The information is also compiled into Quality Assurance documents:
1. standardized faculty CVs,
2. aggregate counts and tables (e.g., courses and who taught them; instructors and their courses; publications with graduate/undergraduate student coauthors).
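The aggregate tables listed above are all groupings over the same profile data. The sketch below assumes hypothetical record shapes (instructor/course pairs, and publications carrying a list of author identifiers) and placeholder names; it is only meant to show how such tables could be derived rather than entered.

```python
from collections import defaultdict


def pivot(pairs):
    """Group the second element of each pair under the first, sorted for stable output."""
    table = defaultdict(set)
    for key, value in pairs:
        table[key].add(value)
    return {key: sorted(values) for key, values in sorted(table.items())}


def publications_with_student_coauthors(publications, student_ids):
    """Return the titles of publications that have at least one student author."""
    students = set(student_ids)
    return [p["title"] for p in publications if students & set(p["authors"])]


# Hypothetical teaching records: (instructor, course).
teaching = [("Smith", "COURSE-A"), ("Lee", "COURSE-A"), ("Smith", "COURSE-B")]

# The same records answer both questions; only the grouping key changes.
courses_by_instructor = pivot(teaching)
instructors_by_course = pivot((course, instructor) for instructor, course in teaching)
```

Because both tables come from one list of records, they cannot drift apart the way two separately maintained spreadsheets can.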
Current Services: QA CVs; QA Reports; Annual Reports
Services In-Progress: Teaching Assignments; Find-an-Expert
DATA-INTEGRATION CHALLENGES
Key Integration Challenges -> The FoS Forum Approach
1. Disambiguating person identities (within and beyond the UoA)
2. Exchanging and cross-referencing scholarly outputs
3. Aligning grants and project awards
4. Managing HQP (student) profiles and tracking progress
1. Disambiguating Person Identities
- Within the UoA, EmployeeID is the consistent identifier
- Beyond the UoA systems, names are inconsistent (middle names, umlauts, South-American names); typos exacerbate the challenge.
1. Disambiguating Person Identities: examples of name variants
- Susan Robinson 1234567: Robinson, Susan; Robinson, S.; Robinson, S. R.; Sue Robinson; S Robinsen (typo at data entry)
- Herrera, Sebastian E Sanchez; Herrera, Sebastian Sanchez; Sanchez Herrera, Sebastian Enrique
- Nandez, José Avenda; Nandez, Josà Avendaño; Avendaño Nandez, Josà Luis
1. Disambiguating Person Identities: the FoS Forum approach
-> The FoS Forum incorporates additional identifiers (ORCID, Google Scholar ID) and implements a number of heuristics to help with disambiguation and deduplication.
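The slides do not spell out the Forum's heuristics, so the following is only a sketch of what one such heuristic might look like, using the Python standard library. The function names, the 0.85 similarity threshold, and the rule itself (a shared family-name token, allowing for small typos, plus a compatible first given name or initial) are assumptions for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def fold(text):
    """Lowercase and strip accents (é -> e, ñ -> n) via Unicode decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def split_name(name):
    """Return (given_tokens, family_tokens) for 'Family, Given' or 'Given Family'."""
    folded = fold(name)
    if "," in folded:
        family, given = folded.split(",", 1)
    else:
        parts = folded.split()
        # Without a comma, assume the last token is the family name.
        given, family = " ".join(parts[:-1]), " ".join(parts[-1:])

    def clean(text):
        return [tok.strip(".") for tok in text.split() if tok.strip(".")]

    return clean(given), clean(family)


def likely_same_person(a, b, threshold=0.85):
    """Heuristic candidate match; a human or a shared ID should confirm it."""
    given_a, family_a = split_name(a)
    given_b, family_b = split_name(b)
    if not (given_a and given_b and family_a and family_b):
        return False
    # Family names match if any pair of tokens is nearly identical (tolerates typos).
    family_match = any(
        SequenceMatcher(None, fa, fb).ratio() >= threshold
        for fa in family_a
        for fb in family_b
    )
    # First given names match if equal, or if one is the initial of the other.
    ga, gb = given_a[0], given_b[0]
    given_match = ga == gb or (min(len(ga), len(gb)) == 1 and ga[0] == gb[0])
    return family_match and given_match
```

A rule like this links "Susan Robinson" to "Robinson, S. R." and to the typo "S Robinsen", but not to the nickname "Sue Robinson", nor to variants damaged by character-encoding errors. Cases like those are where a stable identifier such as an ORCID is more dependable than any name comparison.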
2. Exchanging and Cross-referencing Scholarly Outputs
- Faculty members typically maintain their scholarly history in several places (e.g., Google Scholar profile, ORCID, publisher portals, CCV)
2. Exchanging and Cross-referencing Scholarly Outputs: the FoS Forum approach
-> The FoS Forum exchanges data with multiple such systems
- It imports
  - publications (bibtex, DOI) from all the above systems
  - academic/employment history and HQP from CCV
- It cross-references each publication with every author in the system
- It annotates publications with impact measures (acceptance rates, ISI journal rankings)
- It exports academic/employment history, publications, and HQP to CCV
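When the same publication arrives from several systems, a DOI gives a natural merge key. The sketch below assumes a hypothetical record shape (`doi`, `title`, `author_ids`, `source`) and example DOIs. It is not the Forum's import code. It shows the general idea of normalizing the DOI, merging duplicates, and then indexing each publication under every known author.

```python
from collections import defaultdict

# Common ways a DOI is written; DOIs are case-insensitive, so we also lowercase.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Reduce a DOI string to a bare, lowercase form usable as a merge key."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def merge_publications(*sources):
    """Merge records from several sources into one entry per DOI."""
    merged = {}
    for records in sources:
        for rec in records:
            key = normalize_doi(rec["doi"])
            entry = merged.setdefault(
                key,
                {"doi": key, "title": rec["title"], "author_ids": set(), "sources": set()},
            )
            entry["author_ids"].update(rec.get("author_ids", ()))
            entry["sources"].add(rec["source"])
    return merged


def publications_by_author(merged):
    """Cross-reference: list each publication under every one of its authors."""
    index = defaultdict(list)
    for key in sorted(merged):
        for author in sorted(merged[key]["author_ids"]):
            index[author].append(key)
    return dict(index)
```

Records without a DOI would need a fallback key (for example a normalized title plus year), which this sketch deliberately leaves out.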
3. Aligning Grants and Project Awards
- Data representation in other systems will not necessarily align with the data representation required for (easy) integration
- Consider research projects and grant accounts
  - AR/FEC requires project title, funder(s), amount, funding period, PI, co-PIs
  - Grants 3.0 is a revenue account management system
3. Aligning Grants and Project Awards: an example
For example, looking at the grant accounts of the faculty member who also serves as the Chair of Chemistry ->
- funds flowing to the CRCs in the department
- revenue from supplies and services associated with the department position
- miscellaneous FoS revenue managed by the chair
- provincial awards to graduate students
- cross-institutional awards
NSERC discovery grants are simpler, BUT
- The same account # is re-assigned; project titles and funding periods are re-written
- Co-PIs are not always available
3. Aligning Grants and Project Awards: the FoS Forum approach
-> The FoS Forum
- imports grants and associates them with account holders and project co-applicants;
- imports NSERC awards;
- incorporates a manually curated list of grant types, for the QA CV and report generation;
- enables additions/edits through the user interface.
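The gap between a revenue-account view and a project view can be made concrete with a small transformation. Everything in the sketch below is assumed for illustration: the `AccountRow` fields, the contents of `CURATED_GRANT_TYPES`, and the choice to key projects on account number plus title and period so that a re-assigned account number does not merge two distinct projects. It does not describe the actual Grants 3.0 schema or the Forum's import logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountRow:
    """One hypothetical row from a revenue-account system."""
    account_no: str
    holder_id: str
    title: str
    sponsor: str
    start_year: int
    end_year: int
    amount: float


# Hypothetical curated list: only these sponsors count as reportable grants.
CURATED_GRANT_TYPES = {"NSERC": "federal research grant"}


def to_projects(rows):
    """Turn account rows into project records suitable for a CV or report."""
    projects = {}
    for row in rows:
        grant_type = CURATED_GRANT_TYPES.get(row.sponsor)
        if grant_type is None:
            continue  # e.g. departmental revenue that is not a research grant
        # Account numbers can be re-assigned, so the number alone is not a safe key.
        key = (row.account_no, row.title, row.start_year, row.end_year)
        project = projects.setdefault(
            key,
            {
                "title": row.title,
                "funder": row.sponsor,
                "grant_type": grant_type,
                "period": (row.start_year, row.end_year),
                "pi": row.holder_id,
                "co_pis": [],  # often missing at the source; filled in by users later
                "amount": 0.0,
            },
        )
        project["amount"] += row.amount
    return list(projects.values())
```

The empty `co_pis` list reflects the point made on the slides: some fields cannot be recovered from the source system and have to be added or edited through the user interface.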
4. Managing HQP Profiles and Tracking Progress
The information about HQP is highly distributed:
- departmental systems manage graduate-student program requirements;
- scholarships and awards may be associated with grants;
- FGSR maintains official milestones for graduate students;
- undergraduate students and postdoctoral fellows (PDFs) are less visible to systems.
4. Managing HQP Profiles and Tracking Progress: the FoS Forum approach
-> The FoS (in CS) has established a first pathway:
- Scholarly progress management with the FoS Forum (one system for all FoS scholars)
- Admission/Review with GSMS/GARS (dept. procedures with GradDB)
SOME COMMENTS ON THE USER-INTERACTION MODEL
Data Entry Should be Avoided
- Publication meta-data should be primarily imported from publishers, i.e., the authoritative data sources.
- Names (of people, roles, departments, universities) should be selected (not entered).
- Dates should be entered through special-purpose widgets.
PDFs Should be Easy to Read and Scan
- The FoS Forum PDFs are produced for broad/diverse audiences
  - FEC members
  - External reviewers
- Color for structure
- Concise look (and small size)
- Table of Contents follows the structure for easy access
Help Should be Easy to Ask (and the request should be well documented)
- The Forum has a Report Issue button that takes a screenshot of the system and sends it to support, together with data about the client's platform and a comment from the user.