Using Crowds to Crack Algorithmic Problems Rinat Sergeev - NASA Tournament Lab at Harvard University soon to become Crowd Innovation Lab at Harvard University
Lab Structure
Current staff: Jin Paik (Manager), Rinat Sergeev (Senior Data Scientist), Michael Menietti (Postdoctoral Fellow), Andrea Blasco (Postdoctoral Fellow)
Postdoctoral alums: Ina Ganguli (UMass Amherst), Patrick Gaulé (CERGE-EI, Prague), Christoph Riedl (Northeastern University)
Crowds Can Be Organized as Contests or Communities (Boudreau & Lakhani 2013; King & Lakhani 2013)
Contests: the innovation problem requires a diversity of approaches and broad experimentation; the sponsor is unsure what combination of skills and approaches might be useful in generating a solution; clear rules for participation and winning
Communities: the innovation problem requires cumulative knowledge building and aggregation of diverse inputs; contributions range from mix-and-match to co-production with modular tasks and functions; informal, norms-based governance
Innovation Field Experiments Aim to Identify Causal Mechanisms Underlying Innovating with Crowds
Collaborations:
Search costs in finding collaborators: HMS Advanced Imaging Grant Program, ~450 researchers, $800,000
Self-organization in online teams: NASA/TopCoder, Imaging/OCR in Documents, ~432 coders, $50,000
Contests:
Prizes vs. signals: NASA/TopCoder, Autonomous Robots, ~1,200 coders, $30,000
Incentives for internal public goods: HMS/MGH Idea Competition, ~350 employees, $27,000
Expert evaluation of scientific ideas: HMS Grant Process, ~150 proposals, 142 evaluators, $25,000-$1M
Comparing Contests & Collaborations:
Incentives & search: HMS/TopCoder, Computational Biology, ~700 coders, $6,000
Selection vs. treatment effects: NASA/TopCoder, Space Medical Kit Development, ~900 coders, $25,000
Crowds and Development
The Crowd: competition balance → structure; self-selection → optimality; incentive diversity → attractiveness; evaluation → quality
The Problem: broad-scope problems → innovation; narrow-scope problems → extreme quality; inconvenient problems → flexibility; large problems → bandwidth; diverse-skill problems → specialization
Development
The Product: software, applications, algorithms
The Partners: NASA (multiple departments), USAID, National Geographic, Department of Energy, QuakeFinder, Smithsonian Center for Astrophysics, universities, commercial companies
Innovation Tournaments Are Historically Important and Currently Popular
The Duomo, Florence, 1418: up to 2,000 florins
The Longitude Prize, 1714: up to £20,000
Invention of food canning, 1800: up to 12,000 francs
Ansari X-Prize for space travel, 1996: $10,000,000
Scientific problem solving, 2001: average $30,000
Local Motors car design, 2008: over 35,000 submissions
Why does it work?
Crowdsourcing gives access to smart people: "No matter who you are, most of the smartest people work for someone else." (Bill Joy, of Sun Microsystems, BSD Unix, Java)
who are incentivized to do the task
Extrinsic: cash, job-market signals, community prestige
Intrinsic: fun, enjoyment, learning, autonomy, taste
Prosocial: community belonging, identity
with matching skills
especially if you don't set the requirements too strictly
Multiple attempts can produce extreme-value outcomes
[Figure: probability density over the value of innovation outcomes; the best of many independent attempts lands far in the right tail]
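The extreme-value point can be sketched numerically: when outcome quality is right-skewed, the best of many independent attempts is far better than the average attempt. The lognormal distribution below is a stand-in for illustration, not the Lab's data.

```python
import random
import statistics

random.seed(0)

def expected_best(n_solvers, trials=2000):
    """Average best outcome when n_solvers draw independently from a
    right-skewed quality distribution (a lognormal stand-in here)."""
    return statistics.mean(
        max(random.lognormvariate(0, 1) for _ in range(n_solvers))
        for _ in range(trials)
    )

single = expected_best(1)    # one solver: roughly the distribution's mean
crowd = expected_best(100)   # 100 parallel solvers: an extreme-value draw
```

With these parameters the crowd's expected best is several times the expected single-solver outcome, which is the economic case for running many attempts in parallel.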
Broad participation can surface a valuable idea that the experts missed
Even many ideas!
Comparative Evaluation and Peer Pressure
Is all this cheese free? Well, there are trade-offs: management overhead; knowledge transfer; performance variability; resources wasted on non-winners; legal questions; resistance to innovation from problem stakeholders; and crowd-specific preferences and limitations
The Lab Has Designed & Executed Over 100 Challenges
[Figure: cumulative monthly counts of total consultations and total challenges, January 2011 through August 2014]
TopCoder/Appirio Contest Engine
More than 30 specialized contest types: concepts, wireframes, storyboards, prototypes, design, architecture, assembly, development, testing, bug races, Marathon Matches, Single-Round Matches
Massively Parallel Production of Innovative Assets
Copilots and competitors work with the client across contest types: idea generation, concept, storyboard, wireframes, UX, architecture, assembly, testing, rapid prototyping, big-data challenges, algorithm optimization
We like them because they collect a lot of data and are open to experiments!
Leverage Competition to Optimize Complex Big-Data Algorithmic Problems: NTL Algorithmic Projects for Science
Harvard Algorithm Challenges, 2011-2014: NASA PDS Cassini Rings; NASA Asteroid Data Hunter 1 & 2; NASA Asteroid Tracker; EPA Cyanobacteria Modeling; EPA ToxCast; USAID Atrocity Prevention; NatGeo Collective Minds & Machines; NASA Robonaut 1 & 2; NASA Longeron; NASA Robots (Signals vs. Prizes); USPTO Patent Imaging; NIH/HMS Megablast
Antibody Sequence Annotation Algorithm
122 coders submitted 654 solutions; 89 different approaches to solving the problem were identified; winners came from 5 countries (Russia, France, Egypt, Belgium & US)
Higher accuracy and a 120x speedup!
Optimizing Genome-Wide Association Studies (GWAS)
The algorithm is implemented in the PLINK package. Genome-wide association links genetic variants (SNPs) to observed health conditions and helps target proteins for future investigation.
Speedup, contest by contest: ~30x in logistic regression; ~300x over the basic use case; ~1000x with multi-threading
Streamlining: complete runs reveal all SNP correlations, from 5 hours per GWAS down to ~20 s
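The computational shape of GWAS is a tight per-SNP inner loop over thousands of variants, which is what the contests optimized. A minimal sketch of that loop, assuming synthetic 0/1/2 genotype codes and a simple allelic chi-square test rather than PLINK's actual (much more elaborate) logistic-regression kernels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: minor-allele counts (0/1/2) for
# n_samples individuals x n_snps variants, plus a binary phenotype.
n_samples, n_snps = 500, 1000
genotypes = rng.integers(0, 3, size=(n_samples, n_snps))
phenotype = rng.integers(0, 2, size=n_samples)

def allelic_chi2(genotypes, phenotype):
    """Vectorized 2x2 allelic chi-square statistic for every SNP at once:
    compares minor-allele frequency between cases and controls."""
    cases = phenotype == 1
    a = genotypes[cases].sum(axis=0)        # minor alleles in cases
    b = 2 * cases.sum() - a                 # major alleles in cases
    c = genotypes[~cases].sum(axis=0)       # minor alleles in controls
    d = 2 * (~cases).sum() - c              # major alleles in controls
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / np.maximum(den, 1)         # guard against monomorphic SNPs

chi2 = allelic_chi2(genotypes, phenotype)   # one statistic per SNP
```

Vectorizing the per-SNP loop this way (instead of testing one variant at a time) is the same kind of restructuring that drove the contest speedups.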
Maximizing the energy output from International Space Station solar panels
Crowdsourced model of the ISS; contest winners; energy output of the different solutions
4,056 registrants; 459 competitors; 2,185 submissions; 4.76 average submissions per competitor; 124,025 views of the Longeron video
Top solutions have been added to the NASA ISS reserve pool
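The Longeron challenge was a constrained optimization: choose solar-array angles that maximize collected power while limiting shadowing of structural longerons. A toy one-dimensional illustration of that trade-off, with a made-up shadow penalty and sun position (not NASA's model):

```python
import math

def power(angle_deg, sun_deg=30.0):
    """Incident power ~ cosine of the angle between panel normal and sun."""
    return max(0.0, math.cos(math.radians(angle_deg - sun_deg)))

def shadow_penalty(angle_deg):
    """Hypothetical penalty for angles where the array shades a longeron."""
    return 0.5 if 50.0 <= angle_deg <= 70.0 else 0.0

# Brute-force scan over whole-degree angles; the real problem has many
# coupled joints and orbital dynamics, which is why it needed a contest.
best = max(range(0, 180), key=lambda a: power(a) - shadow_penalty(a))
```

In this toy setup the scan settles on the angle that faces the sun directly, since the penalty region lies away from it; the contest versions had to balance many such terms at once.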
Can we train an algorithm to follow the annotating crowd in the search for Genghis Khan's tomb? Crowd archaeology from space
The winning algorithm; scientist and explorer Albert Lin on horseback, checking the predictions on site
Can we use open-source data to predict atrocities at sub-country resolution?
The winner's algorithm used GDELT and PITF data to forecast PITF events. Implemented method: random forest, using 23 predictive patterns from PITF and 13 from GDELT. Example prediction: growing atrocity risk in Aleppo province, Syria, 2010-12.
Data search, data preparation, and idea generation were crowdsourced, and the main algorithmic contest succeeded: 1,077 registrants; 93 competitors; 618 submissions; 6.65 average submissions per competitor. The top solution outperformed the baseline model by 62%.
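The winner's general recipe (a random forest over event-count features) can be sketched as follows. Everything here is synthetic: the 36 columns merely stand in for the 23 PITF + 13 GDELT patterns, and the label rule is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic region-month training rows: 36 event-count features
# standing in for the 23 PITF + 13 GDELT predictive patterns.
X = rng.poisson(3.0, size=(2000, 36)).astype(float)
# Hypothetical label: an atrocity in the following period, made
# (noisily) dependent on the first feature purely for illustration.
y = (X[:, 0] + rng.normal(0, 1, 2000) > 4).astype(int)

# Fit on earlier rows, score later ones, mimicking a forecast split.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X[:1500], y[:1500])

# Out-of-sample risk scores (probability of the positive class)
# for the held-out region-months.
risk = model.predict_proba(X[1500:])[:, 1]
```

Ranking regions by `risk` gives the kind of sub-country heat map described above; the actual contest pipeline differed in features, labels, and validation.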
Sometimes you meet the winners in person: competitor nhzp339 as the winning green dot on the results plot, and nhzp339 in real life
KaBoom: teach a NASA radar array how to track asteroids! The radar is real
The contest: 174 registrants from 37 countries; 43 competitors; 299 submissions
The plans: to make it bigger and stronger
Asteroid Data Hunter: Find Asteroids in Space Images
The orbits of known asteroids; the main way to detect them; the challenge: to automate it
NASA: Find New Moons of Saturn in Cassini Images
62 large moons of Saturn are known, and there is a way to find smaller ones: the challenge is to automate the search for propeller perturbations in Saturn's rings
Our Team Thank You!