Steering Committee Meeting
Location: UCSD BRF2 5A26
Date: 6/4/2015
Start time: 9:00 am PDT
End time: 10:00 am PDT

Meeting Objective: biocaddie Steering Committee Meeting
Attendees Present:
  UCSD: Lucila Ohno-Machado, Jeffrey Grethe, Stephanie Hagstrom, Cleo Maehara, Cassie Kling
  UTHealth: Hua Xu, Anupama Gururaj
  Oxford: Susanna-Assunta Sansone
  NIH: Ian Fore, Dawei Lin, Alison Yao, Eric Choi, Ron Margolis
Minute Taker: Cleo Maehara, Cassie Kling
Attendees Absent:
  UM: George Alter

Agenda Item #1: Clarification of Issues

Steering Committee (SC) meeting logistics:
  o Occurs on the first Thursday of each month at 9:00 am PT
  o Chaired by Ohno-Machado
  o Prior to each meeting, biocaddie will contact NIH for suggested agenda items, which can be combined with biocaddie's agenda
    - The day before the meeting, we will send out the final agenda; no later than 2 days after the meeting, we will circulate the meeting minutes.
Quarter definition; please see table in slides (Ohno-Machado):
  o The first year of the award is only 11 months long
    - Q4 is only two months long (7/1/15-8/31/15)
  o Supplements have to end at the same time as the parent grant year
    - Supplements are scheduled to start on 10/01/2015 and end on 08/31/2016, together with biocaddie year 2
    - Supplements can request a one-month, no-cost extension so that the project totals 12 months (carry-over funds from the 11-month budget will be used for the 1-month extension)
Steering and Executive Committee (EC) composition (Ohno-Machado):
  o EC meets biweekly; SC meets monthly
  o Maryann Martone requested that Jeff Grethe take her place on the SC and EC; the request was approved
  o Out of the five NIH SC members, only one can be a voting member (Ian Fore)
  o Nominated SC observers
    - Observers are people interested in hearing what is happening at SC meetings
    - NIH and biocaddie can each nominate up to 5 people to participate as observers in SC meetings
    - Nominations should be made via an official communication and appear in the annual progress report
    - Heidi Sofia has been nominated by NIH (Heidi also serves as the science officer on another project at the Department of Biomedical Informatics, UCSD)

Agenda Item #2: Progress and Updates Y1Q3 and Plans for Y1Q4

Dissemination and community outreach activities (Ohno-Machado):
  o Request for Applications (RFA) for a pilot project on a Harvester
    - biocaddie will issue an RFA for a pilot on a metadata harvester (software that collects metadata from the web)
    - Software must be interoperable with the prototype
    - We discussed an outline for the RFA during a phone call on 6/03/2015
    - Just one awardee would not generate enough community engagement, so there will be up to 3 awardees
  o Award amount: $25,000 per project
  o Duration of project period: 6 months (09/01/2015-02/29/2016)
  o Will be issued by UCSD
    - The RFA will need to be explained in the progress report and budget
  o Will the harvester crawl the web, or will it be directed at specific resources that are felt to be significant?
    - There is an intent to crawl the web; we are still working on the RFA text that specifies our expectations and the criteria for success
    - The pilots will be used to experiment, at least on a small scale
Working Group 1: BD2K Centers of Excellence Collaboration (Ohno-Machado)
  o Full WG1 description available here.
  o It is likely that two of the supplement proposals requested through biocaddie will be selected for support
    - Selected: Haussler (genomics BD2K), Kohane (PIC BD2K), Kumar (MD2K): count queries across data types, expand Beacons
    - Selected: Ping (Heart BD2K): ELIXIR, Oxford, and Institute for Systems Biology (Seattle)
    - Not selected: Craven (D2 Predictive Models): redundant data identification, linking the same person across data sets
    - Not selected: Musen (CEDAR): needs assessment for data used by centers (Oxford)
  o Supplement proposals not requested through biocaddie (centers that reached out to us about supplements)
    - Ma'ayan (LINCS) supplement submitted: ranking visualization, crowdsourcing (in collaboration with PP 2.1)
    - Two Haussler supplements submitted: ATHENA breast cancer sequences and clinical data; hashed IDs for observations
  o Once supplement funding begins, a key element will be the integration of projects so they are not operating independently of each other
  o We will also collaborate with CEDAR, but plans have not been laid out yet since it will not be receiving additional funding
Working Group 2: Data Identifiers Recommendation (Grethe)
  o Full WG2 description is available here.
  o Will operate in two phases
    - Phase 1 (June-July): will produce internal specifications to be delivered to the Core Dev. Team as a document and integrated into the DDI prototype
      - Will consider work done by W3C, ELIXIR, and other groups
      - First call occurs June 11; there have already been a number of communications with ELIXIR via email
      - Will have three rounds of review of specifications
      - After delivering the document to the Core Dev. Team, we will reevaluate group members to work on phase 2
    - Phase 2 (August onwards): will more broadly address long-term community needs, specifying best practices and operating procedures for identifiers
Working Group 3: Metadata Specifications (Sansone)
  o Full WG3 description available here.
  o Phase 1 (May-July): defines core metadata
  o Will be a joint effort with the CEDAR center
  o Synergies with BD2K Metadata WG (Musen/Alter) and ELIXIR activities
    - Two WGs exist: one in biocaddie and a separate BD2K WG
    - biocaddie's WG3 timeframe is much more compressed and will include an overlap of members with the BD2K WG
    - WG3 Phase 1 mainly works with CEDAR; phase 2 will work more broadly in collaboration with the BD2K WG
  o Invited external experts as new members
  o We are creating three working documents:
    1. Standard operating procedure document
    2. List of competency questions, highlighting metadata
    3. Mapping files: generic metadata schemas and life-science-specific schemas
  o Timeline:
    - First iteration of mapping is nearly complete
    - Meeting on June 11 with the internal biocaddie team, during which we will prepare material for the first meeting with invited members (June 18)
Core Development Team (Xu)
  o Team consists of 3 groups: Farcas (UCSD), Grethe (UCSD), and UTHealth
  o Meets weekly on Tuesdays at 12:00 pm PT
  o Task assignment:
    - Development site setup: Farcas (UCSD)
    - Data harvest: Farcas / Grethe (UCSD)
    - Metadata management: Tao / Xu (UTH), Grethe (UCSD)
    - Search engine/web portal: Johnson / Cohen / Xu (UTH), Grethe (UCSD)
  o Web portal (datamed.biocaddie.org) set up by 6/30/15
  o Data ingestion process
    - Protein Data Bank (PDB) dataset ingestion is complete, while database of Genotypes and Phenotypes (dbGaP) ingestion is ongoing
    - BD2K center ingestion is difficult, as only a few centers are producing datasets
    - LINCS may be a good option to pursue as a data source
  o Pilot project integration with the DDI prototype (will use pilot project 1.1, 2.1, and 2.2 software)
  o Use cases and benchmark dataset development
    - Not all datasets will be completed by 8/31/2015, but at least one or two will be done
  o User needs survey
    - Core Dev. Team consulted with ICPSR experts and concluded that an interview would be better than a survey
    - Have interviewed 5 people so far at UTHealth
    - Lucila will contact BD2K center PIs for an interview designee within their center
    - We could also interview NIH officers
  o For details on the data indexing pipeline and UI/Search workflow, please see slides.
Deliverables for Y1Q3 (Ohno-Machado)
  o We disseminated the white paper and received feedback
    - We had some feedback, but nothing very critical
  o Set up the infrastructure for the web portal: on track with the timetable
  o Design of data ingestion process with BD2K center data: started and on track
  o Working groups were organized and started
  o We are on track for this quarter's deliverables
Deliverables for Y1Q4 (only two months) (Ohno-Machado)
  o Index datasets using metadata standards, tested on 2-3 datasets from BD2K centers
    - After conversations with centers (per WG1), we know that at least 7 are using datasets rather than producing them (with the exception of LINCS and, in the future, MD2K)
    - Proposed modification: index datasets used by centers instead of datasets produced by them
    - This issue has arisen in NIH meetings, and Ian is seeking clarification, which will be communicated to the biocaddie team
  o We are currently setting up testing benchmarks
  o RFA for pilot on Harvester for DDI schema is in progress and will be issued before the end of the quarter
  o Wrap-up of pilot projects: currently working on integrating relevant pilot project software into the prototype

Agenda Item #3: Plan for the Eight Administrative Supplements

On biocaddie's part, there has been no action on this; NIH is working on resolving this issue. This item will be placed on the agenda for next month's meeting.

Action Items
1. Communicate SC observer nominations and include them in biocaddie's annual progress report (up to 5 nominations each for NIH and biocaddie)
   Due Date: 07/01/2015; Person Responsible: Steering Committee
2. Clarify with NIH whether biocaddie can index data that the BD2K centers are using instead of producing
   Due Date: 07/02/2015; Person Responsible: Ian Fore
3. Place administrative supplements on the agenda for the next Steering Committee meeting
   Due Date: 07/01/2015; Person Responsible: Admin support team
4. Include RFA for Harvester pilot in progress report and budget
   Due Date: 07/01/2015; Person Responsible: Executive Committee

Next Meeting: 07/02/2015, 9:00 am-10:00 am PDT