Systematizing Confidence in Open Research and Evidence (SCORE)

Systematizing Confidence in Open Research and Evidence (SCORE)
Adam Russell, Defense Sciences Office
SCORE Proposers Day, June 8, 2018
DARPA DSO Internal Use Only. Pre-Decisional, Not Approved for Distribution

SCORE Proposal Tips
- Read the BAA! (If the BAA differs from this presentation, be guided by the BAA.)
- If in doubt, address the Heilmeier Catechism.
- Don't overlook mandatory inclusions as highlighted by the BAA; a great idea can be sunk by ignoring the details.
- Present a compelling, innovative approach that isn't addressed by the current state of the art; describe how it will advance the science, provide new capabilities, and positively impact DoD.
- Back up your ideas and technical approaches (e.g., theoretical arguments, models, past results, new data).
- Provide quantitative metrics and milestones to assist DARPA in evaluating the feasibility and transparency of proposed work.
- Where possible, go open-source. If you can't, provide strong justification.
- Don't forget to address risks! Hope is not a management strategy.
DISTRIBUTION STATEMENT A. Approved for public release.

SCORE Objective
Automated tool to quantify the confidence DoD should have in social and behavioral science (SBS) research claims.
Outcome:
- Automated capabilities for assigning Confidence Scores (CSs) for the Reproducibility and Replicability (R&R) of different SBS research claims
- Automated mechanisms for updating Confidence Scores based on new information (retractions, etc.) and/or new signals (social media, etc.)
- Tailored, interpretable Confidence Scores for different users and applications
Impact:
- Enhance DoD's capabilities to leverage SBS research
- Enable more effective SBS modeling and simulation
- Guide future SBS research towards higher CSs
The R&R of SCORE:
- Reproducibility: the extent to which results can be computationally reproduced by others
- Replicability: the degree to which results can be replicated by others
Distribution authorized to the Department of Defense and U.S. DoD contractors only. Other requests for this document shall be referred to DARPA Defense Sciences Office.
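To make the "assign and update a Confidence Score" idea concrete, here is a minimal sketch in Python. It is purely illustrative: the record fields, the weighted update rule, and the claim identifier are assumptions for this example, not a design prescribed by SCORE or the BAA.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified record for one research claim's Confidence Score.
@dataclass
class ConfidenceScore:
    claim_id: str
    score: float                       # confidence in [0, 1] that the claim would replicate
    evidence: list = field(default_factory=list)

    def update(self, signal: str, weight: float, direction: int) -> None:
        """Shift the score toward 1 (direction=+1) or 0 (direction=-1).

        weight in [0, 1] controls how strongly the new signal moves the score.
        """
        target = 1.0 if direction > 0 else 0.0
        self.score = (1 - weight) * self.score + weight * target
        self.evidence.append(signal)

# Hypothetical claim and signals, for illustration only.
cs = ConfidenceScore(claim_id="example-claim-2016", score=0.62)
cs.update("retraction notice issued", weight=0.8, direction=-1)          # strong negative evidence
cs.update("independent replication succeeded", weight=0.5, direction=+1)  # positive evidence
print(round(cs.score, 2), cs.evidence)
```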

What is the problem SCORE is addressing?
Effective use of SBS research for national security is hampered by questions of its reproducibility and replicability. There is increasing evidence of widespread uncertainty about the confidence one should have in many SBS research claims (see References). But this does not mean that all SBS research is untrustworthy. How can an SBS consumer practically know the difference?
Search/evaluate SBS literature → create models → applications
Distribution A: Cleared for Public Release.

SCORE Impact
Impact:
- Enhance DoD's capabilities to leverage SBS research
- Enable more confident SBS modeling and simulation
- Guide future SBS research towards higher Confidence Scores
Research vetting attributes:
- Low time cost (minutes)
- Low financial cost (cents)
- Wide coverage (many signals)
- Quantified results (interpretable)
- User focused (tailorable)
SCORE will improve DoD's efficiency in evaluating SBS research and increase confidence in how that research can be leveraged for the Human Domain.
Distribution Statement

Computational Reproducibility Rubric
Rubric developed for, and used in, DARPA's Next Generation Social Science (NGS2) program: https://goo.gl/ns1vdj
Distribution authorized to the Department of Defense and U.S. DoD contractors only. Other requests for this document shall be referred to DARPA Defense Sciences Office.
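The slide does not reproduce the rubric itself (see the linked document). Purely to illustrate how a checklist-style reproducibility rubric can be turned into a score, here is a minimal sketch; the checklist items below are placeholders invented for this example and are not taken from the NGS2 rubric.

```python
# Placeholder rubric items; the actual NGS2 rubric is at the link above and differs.
RUBRIC_ITEMS = [
    "analysis code is publicly available",
    "data (or a synthetic equivalent) are publicly available",
    "computational environment is documented",
    "reported results regenerate from code and data",
]

def rubric_score(checklist: dict) -> float:
    """Fraction of rubric items satisfied, as a simple 0-1 reproducibility score."""
    return sum(bool(checklist.get(item, False)) for item in RUBRIC_ITEMS) / len(RUBRIC_ITEMS)

# Hypothetical article meeting 3 of the 4 placeholder items.
example = {item: True for item in RUBRIC_ITEMS[:3]}
print(rubric_score(example))  # 0.75
```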

Technical Challenges
Automated tools to quantify the confidence DoD should have in SBS research and claims require:
- Creating algorithms that can quantify confidence
  - with results equal to or better than the best expert methods
  - with explainable and tailorable outputs
- Developing approaches for expert scoring of SBS studies for algorithm training/test sets
  - with sufficient speed and accuracy
  - with the ability to understand the basis for scores
- Preparing a curated (selected and organized) dataset of diverse SBS literature
  - at a rate that can train/test effective machine learning algorithms
  - with sufficient diversity while being machine-readable
- Empirically testing the R&R of a representative subset of studies
  - at a rate sufficient to provide assurance of expert accuracy
  - that reflects different content, authors, and journals
Why now?
- Weak signals for algorithms to exploit
- Expert predictions at scale
- Open research and replication platforms
Distribution authorized to the Department of Defense and U.S. DoD contractors only. Other requests for this document shall be referred to DARPA Defense Sciences Office.
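The following sketch illustrates the train/test framing implied by these challenges: article-level features from a curated dataset are used to predict a binary replication outcome, and held-out accuracy is measured. Everything here is synthetic; the feature names, labels, and model are assumptions for illustration, not signals or methods prescribed by SCORE.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a curated SBS dataset: one row per article.
# The three feature columns are placeholder signals, not signals mandated by SCORE.
n_articles = 500
X = np.column_stack([
    rng.integers(20, 2000, n_articles).astype(float),  # sample size
    rng.uniform(0.001, 0.05, n_articles),               # reported p-value
    rng.integers(0, 2, n_articles).astype(float),       # pre-registered? (0/1)
])
# Placeholder labels: 1 = claim replicated in an empirical evaluation, with 10% label noise.
y = ((X[:, 0] > 300) ^ (rng.random(n_articles) < 0.1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

print("held-out accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 2))
```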

SCORE Mechanics
- Program structure
- Technical Areas
- Evaluation and performance metrics
- Teams and teaming
- Proposal details
DISTRIBUTION STATEMENT A. Approved for public release.

SCORE Program Structure
The SCORE program will be divided into three Technical Areas (TAs), with an independent Test and Evaluation (T&E) team providing oversight. The three TAs are:
- TA1: Data
- TA2: Experts
- TA3: Algorithms
Proposals to any of the TAs must address the full program timeline; however, TA3 teams will officially start work after Month 6 of Phase 1.
Proposers should structure their proposals with Phase 1 as the base period and Phase 2 as an option for funding.
Please note that, to avoid conflicts of interest, no person or organization may be a performer on more than one TA, whether as a prime or as a subcontractor.
Distribution Statement

SCORE Program Phases
Timeline: program kickoff 1Q FY19; Phase 1 (Data + Experts, 18 months) runs through 3Q FY20; Phase 2 (Data + Algorithms, 18 months) runs through 1Q FY22.
Phase 1 outcome - SCORE Common Task Framework (CTF): create datasets of SBS research with validated Confidence Scores.
Phase 1 goals:
- 6K+ curated SBS articles in the CTF for training/testing algorithms
- Understanding of expert processes and signals
Phase 2 outcome - SCORE Algorithms: build algorithms that rapidly and interpretably assign expert-like Confidence Scores.
Phase 2 goals:
- Trustworthy algorithms that convincingly overlap with experts' judgments
- Algorithms that enhance DoD use of SBS research
SCORE will combine data, experts, and algorithms to create a systematic approach for developing Confidence Score technologies.
Distribution authorized to the Department of Defense and U.S. DoD contractors only. Other requests for this document shall be referred to DARPA Defense Sciences Office.

SCORE Technical Areas
SCORE will develop and test new capabilities to rapidly and accurately estimate the Reproducibility and Replicability (R&R) of SBS research claims.
TA1 (Data) teams will:
- Curate SCORE datasets for TA2 and TA3 teams
- Empirically evaluate a representative sample of studies to test the accuracy of TA2 methods
- Test TA3 algorithms' ability to update and to detect gaming efforts
TA2 (Experts) teams will:
- Assign CSs to all TA1 datasets via expert crowd-sourcing methods
- Be 80% accurate in predicting TA1 R&R empirical evaluations in each phase
- Capture the signals that experts use to assign confidence levels
TA3 (Algorithms) teams will:
- Create algorithms that assign CSs to TA1 test datasets that correlate with the best TA2 team CSs
- Demonstrate usability of algorithms/systems for DoD SBS consumers
Distribution Statement

SCORE Technical Areas
SCORE is a two-phase program built around three Technical Areas, with data flowing between them as research studies, Confidence Scores, training data, and challenge data.
TA1: Data
- Curate datasets of studies for TA2 Experts to use in predicting Confidence Scores (CSs)
- Empirically test representative samples of studies to evaluate the accuracy of TA2 CSs
- Provide TA3 training datasets (including previous reproducibility and replication results)
- Provide datasets to test TA3 algorithms' overlap with TA2 CSs, their ability to update CSs, and their detection of gaming efforts
TA2: Experts
- Use expert crowd-sourcing methods to assign Confidence Scores to TA1 datasets
- Capture the expert processes/signals used to assign Confidence Scores
TA3: Algorithms
- Develop algorithms for automated Confidence Score generation on TA1 data using diverse signals (may use TA2 signals)
- Demonstrate algorithm updating given new data or information
- Demonstrate utility for experts and non-experts
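As a way of picturing the data hand-offs between the TAs, here is a hypothetical example of what a single CTF entry might look like as it moves from curation (TA1) through expert scoring (TA2) to algorithmic scoring (TA3). The field names and values are illustrative assumptions, not a schema defined in the BAA.

```python
# Hypothetical Common Task Framework (CTF) entry; all fields are placeholders.
ctf_entry = {
    "article_id": "ctf-000123",                              # assigned by TA1 curation
    "metadata": {
        "journal": "Example Journal of Social Psychology",   # placeholder
        "year": 2014,
        "claim": "Intervention X increases outcome Y",        # focal claim text
    },
    "ta1_empirical_result": None,      # filled only for the subset TA1 empirically evaluates
    "ta2_expert_score": 0.71,          # crowd-sourced expert Confidence Score
    "ta2_signals": ["sample size", "pre-registration", "effect size"],
    "ta3_algorithm_score": None,       # withheld in challenge data, to be predicted by TA3
}
```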

SCORE Program Schedule and Tests
(Program schedule chart spanning FY19 Q1 through FY22 Q1, marking milestones, site visits, and PI meetings; please see Figure 3 and Tables 1-2 in the BAA.)
Phase 1 / Phase 2 activities:
- TA1 (Data): CTF database curation; R&R empirical evaluations in Phase 1 and Phase 2 (milestones 1a-1c)
- TA2 (Experts): Confidence Score assignments in both phases (milestones 2a-2c)
- TA3 (Algorithms): early proof, followed by Sprints 1, 2, and 3 (milestones 3a-3d)
- TA1/TA2 kickoff at program start; TA3 kickoff later in Phase 1
Phase 1 to Phase 2 decision criteria:
- TA1 curation at a rate of 3K articles per year
- TA2 CS assessments at a rate of 3K articles per year
- 200 empirical evaluations; 80% TA2 expert accuracy
End of Phase 1:
- Curated database of research articles
- Platforms established for efficient expert assessments
- Efficacy and explainability of algorithms demonstrated
End of Phase 2:
- Algorithms with high confidence overlap with expert assessments
- Efficient algorithm return rate
- Demonstrated usability for experts and non-experts
Distribution authorized to the Department of Defense and U.S. DoD contractors only. Other requests for this document shall be referred to DARPA Defense Sciences Office.

SCORE Mid-term and Final Exams
TA1 (Data) - outcome: CTF datasets for SCORE
- Curation rate: SOA ?; Phase 1 >3K articles per year; Phase 2 >3K articles per year
- R&R empirical evaluation: SOA 100 studies / 12 months; Phase 1 200 studies / 15 months; Phase 2 200 studies / 12 months
TA2 (Experts) - outcome: accurate Confidence Scores
- CS assignment rate: SOA ?; Phase 1 >3K articles per year; Phase 2 >3K articles per year
- Accuracy: SOA 75%; Phase 1 80%; Phase 2 >80%
TA3 (Algorithms) - outcome: SCORE Algorithms
- Scoring rate: SOA N/A; Phase 1 1 study per hour; Phase 2 1 study per 30 minutes
- Correlation with TA2: SOA N/A; Phase 1 demonstration of efficacy and explainability; Phase 2 75/85/95%
(SOA = state of the art)
Distribution authorized to the Department of Defense and U.S. DoD contractors only. Other requests for this document shall be referred to DARPA Defense Sciences Office.
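For readers unfamiliar with how these two headline metrics would be computed, the sketch below shows expert accuracy against empirical R&R outcomes and the correlation between algorithm and expert Confidence Scores. The numbers are made up for illustration; the 0.5 prediction threshold is an assumption, not a value set by the BAA.

```python
import numpy as np

# Binary outcomes from TA1 empirical R&R evaluations (1 = claim replicated). Made-up data.
empirical = np.array([1, 0, 1, 1, 0, 1, 0, 1])
# Expert (TA2) and algorithm (TA3) Confidence Scores for the same claims, in [0, 1]. Made-up data.
expert_cs = np.array([0.8, 0.3, 0.7, 0.9, 0.6, 0.6, 0.2, 0.75])
algorithm_cs = np.array([0.7, 0.2, 0.6, 0.8, 0.5, 0.7, 0.3, 0.70])

# Accuracy: treat a score above 0.5 as a prediction that the claim replicates.
expert_accuracy = np.mean((expert_cs > 0.5).astype(int) == empirical)

# Correlation between TA3 and TA2 scores (the "correlation with TA2" metric).
correlation = np.corrcoef(algorithm_cs, expert_cs)[0, 1]

print(f"expert accuracy: {expert_accuracy:.0%}, TA3-TA2 correlation: {correlation:.2f}")
```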

SCORE Proposal Tips
- Read the BAA! (If the BAA differs from this presentation, be guided by the BAA.)
- If in doubt, address the Heilmeier Catechism.
- Don't overlook mandatory inclusions as highlighted by the BAA; a great idea can be sunk by ignoring the details.
- Present a compelling, innovative approach that isn't addressed by the current state of the art; describe how it will advance the science, provide new capabilities, and positively impact DoD.
- Back up your ideas and technical approaches (e.g., theoretical arguments, models, past results, new data).
- Provide quantitative metrics and milestones to assist DARPA in evaluating the feasibility and transparency of proposed work.
- Where possible, go open-source. If you can't, provide strong justification.
- Don't forget to address risks! Hope is not a management strategy.
DISTRIBUTION STATEMENT A. Approved for public release.

SCORE encourages multidisciplinary teaming!
DARPA highly encourages and will facilitate teaming; see BAA Section VIII.B.
- Teaming profiles are due June 15, 2018, no later than 4:00 pm Eastern.
- Consolidated teaming profiles will be sent via email to the proposers who submitted a valid profile.
- However, DARPA will attempt to update the consolidated teaming profiles with submissions past the due date.
- Interested parties can still submit a one-page profile to SCORE@darpa.mil including the following information:
  - Contact information
  - Proposer's technical competencies
  - Desired expertise from other teams, if applicable
- Complete teaming information is not required for abstract submission.
DISTRIBUTION STATEMENT A. Approved for public release.

But!... Specific content, communications, networking, and team formation are the sole responsibility of the participants. Neither DARPA nor the DoD endorses the information and organizations contained in the consolidated teaming profile document, nor does DARPA or the DoD exercise any responsibility for improper dissemination of the teaming profiles.
DISTRIBUTION STATEMENT A. Approved for public release.

SCORE Key Dates
- BAA published: anticipated June 12, 2018
- Teaming profiles due: June 15, 2018
- Proposers Day: June 8, 2018
- Abstracts due (TA1 and TA2): June 20, 2018
- Abstracts due (TA3): November 1, 2018
- FAQ submissions due: July 20, 2018
- Proposals due (TA1 and TA2): August 1, 2018
- Proposals due (TA3): December 12, 2018
Please refer to the BAA for any changes in dates.
DISTRIBUTION STATEMENT

Intellectual Property
- Data sharing and collaboration are key aspects of this program.
- Therefore, intellectual property rights asserted by proposers are strongly encouraged to be aligned with open-source regimes.
- See Section VI.B of the BAA for further information.
DISTRIBUTION STATEMENT A. Approved for public release.

Proposal Abstracts
- Proposers are highly encouraged to submit an abstract.
- Submit to https://baa.darpa.mil/ (do not submit via email); see BAA Section IV.E.1 for details.
- DARPA will respond to abstracts with a statement as to whether DARPA is interested in the idea.
- While it is DARPA policy to attempt to reply to abstracts within thirty calendar days, proposers may anticipate a response within approximately three weeks.
- Regardless of DARPA's response to an abstract, proposers may submit a full proposal.
- Abstracts will be reviewed in the order they are received.
- DARPA will review all full proposals submitted using the published evaluation criteria and without regard to any comments resulting from the review of an abstract.
- Complete teaming information is not required for abstract submission.
DISTRIBUTION STATEMENT A. Approved for public release.

SCORE Evaluation Criteria
Review and selection process: DARPA will conduct a scientific/technical review of each conforming proposal. Proposals will not be evaluated against each other, since they are not submitted in accordance with a common work statement.
Evaluation criteria, listed in descending order of importance:
(a) Overall scientific and technical merit
(b) Potential contribution and relevance to the DARPA mission
(c) Cost realism
See BAA Section V.A for specific details on each criterion.
DISTRIBUTION STATEMENT A. Approved for public release.

References
Camerer, C., et al. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280).
Camerer, C., et al. (unpublished). Evaluating the Replicability of Social Science Experiments in Nature and Science.
Chang, A. C., & Li, P. (2015). Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say "Usually Not." Finance and Economics Discussion Series 2015-083. Washington: Board of Governors of the Federal Reserve System.
Dreber, A., et al. (2015). Using prediction markets to estimate the reproducibility of scientific research. PNAS, 112(50).
Ioannidis, J., Stanley, T. D., & Doucouliagos, H. (2017). The Power of Bias in Economics Research. The Economic Journal, 127(605).
Klein, R., Vianello, M., Hasselman, F., & Nosek, B. (unpublished). Many Labs 2: Investigating Variation in Replicability Across Sample and Setting. Advances in Methods and Practices in Psychological Science.
Kovanis, M., Porcher, R., Ravaud, P., & Trinquart, L. (2016). The Global Burden of Journal Peer Review in the Biomedical Literature: Strong Imbalance in the Collective Enterprise. PLoS ONE, 11(11).
Makel, M., Plucker, J., & Hegarty, B. (2012). Replications in Psychology Research. Perspectives on Psychological Science, 7(6).
Martin, G. N., & Clarke, R. (2017). Are Psychology Journals Anti-replication? A Snapshot of Editorial Practices. Frontiers in Psychology.
Munafò, M., et al. (2017). A manifesto for reproducible science. Nature Human Behaviour, 0021.
Neuliep, J. W., & Crandall, R. (1990). Editorial bias against replication research. Journal of Social Behavior & Personality, 5(4).
Nosek, B., Spies, J., & Motyl, M. (2012). Scientific Utopia II: Restructuring Incentives and Practices to Promote Truth Over Publishability. Perspectives on Psychological Science, 7(6).
Nosek, B., et al. (2015). Estimating the reproducibility of psychological science. Science, 349(6251).
Smith, R. (2006). Peer review: a flawed process at the heart of science and journals. Journal of the Royal Society of Medicine, 99(4).
Steen, R. G., Casadevall, A., & Fang, F. C. (2013). Why Has the Number of Scientific Retractions Increased? PLoS ONE, 8(7).
Szucs, D., & Ioannidis, J. (2017). Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biology, 15(3).
Veldkamp, C. L. S., Nuijten, M. B., Dominguez-Alvarez, L., van Assen, M. A. L. M., & Wicherts, J. M. (2014). Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science. PLoS ONE, 9(12).
Distribution authorized to the Department of Defense and U.S. DoD contractors only. Other requests for this document shall be referred to DARPA Defense Sciences Office.

Thank you
Distribution Statement