INCITE Proposal Writing Webinar April 25, 2013 James Osborn ALCF Catalyst Team Argonne National Laboratory Matt Norman OLCF Scientific Computing Group Oak Ridge National Laboratory and Julia C. White, INCITE Manager
Overview Allocation programs [3] INCITE mission and recent stats [4-12] Titan and Mira [13-17] Tips for applicants [18-36] Common oversights Requesting a startup account Benchmarking data Q&A [37, open discussion] Conclusions [38-45] Submittal, review, and awards decisions Contact links 2
Three primary ways for access to LCF Distribution of allocable hours 10% Director's Discretionary Up to 30% ASCR Leadership Computing Challenge DOE/SC capability computing Leadership-class computing 60% INCITE 4.7 billion core-hours in CY2013 3
What is INCITE? Innovative and Novel Computational Impact on Theory and Experiment INCITE promotes transformational advances in science and technology through large allocations of computer time, supporting resources, and data storage at the Argonne and Oak Ridge Leadership Computing Facilities (LCFs) for computationally intensive, large-scale research projects. 4
INCITE criteria Access on a competitive, merit-reviewed basis* 1 Merit criterion Research campaign with the potential for significant domain and/or community impact 2 Computational leadership criterion Computationally intensive runs that cannot be done anywhere else: capability, architectural needs 3 Eligibility criterion Grant allocations regardless of funding source* Non-US-based researchers are welcome to apply *DOE High-End Computing Revitalization Act of 2004: Public Law 108-423 5
Twofold review process Peer review: INCITE panels Computational readiness review: LCF centers Award Decisions New proposal assessment Scientific and/or technical merit Appropriateness of proposal method, milestones given Team qualifications Reasonableness of requested resources Technical readiness Appropriateness for requested resources Renewal assessment Change in scope Met milestones On track to meet future milestones Scientific and/or technical merit Met technical/ computational milestones On track to meet future milestones INCITE Awards Committee comprised of LCF directors, INCITE program manager, LCF directors of science, sr. management 6
2013 INCITE statistics Request for Information helped attract new projects Call closed June 27, 2012 Total requests ~15 billion core-hours, 3x more than the 5 billion core-hours requested last year Number of proposals submitted increased nearly 20% Awards of ~5 billion core-hours for CY 2013 61 projects awarded, of which 20 are renewals Acceptance rates 33% of nonrenewal submittals and 100% of renewals Contact information Julia C. White, INCITE Manager whitejc@doeleadershipcomputing.org 7
2013 award statistics, by system
                      Jaguar        Titan         Mira          Intrepid      Intrepid
                      2012 INCITE   2013 INCITE   2013 INCITE   2012 INCITE   2013 INCITE
Number of projects*   35            32            27            31            27
Average project       27M           58M           78M           24M           27M
Median project        23M           49.5M         45M           20M           25M
* Totals of 32 projects at the OLCF, 37 projects at the ALCF (many of the ALCF projects received time on both Mira and Intrepid)
                               Titan    Mira     Intrepid
Total awards (hrs in CY2013)   1.84B    2.11B    0.721B
8
New PIs in INCITE A new PI has never previously led an INCITE submittal 32% of the nonrenewal projects are led by new PIs 41 new projects awarded, 13 led by new PIs INCITE actively engages with new research teams through outreach such as workshops, email distributions, and individual networking. 9
2013 INCITE panel peer reviewers > 50% (i.e., more than 40) of the reviewers are: Society fellows (AAAS, APS, SIAM, IEEE, etc.), Agency awardees (e.g., NSF Early Career), Laboratory fellows, National Academy members, National Society presidents 41% participated in the 2012 INCITE review 83 science experts participated in the 2013 INCITE panel review. 10
INCITE seeks high-impact research campaigns Examples of previous successful INCITE applications that advance the state of the art across a broad range of topics and different mission priorities Glimpse into dark matter Supernovae ignition Protein structure Creation of biofuels Replicating enzyme functions Global climate Accelerator design Carbon sequestration Turbulent flow Propulsor systems Membrane channels Protein folding Chemical catalyst design Plasma physics Algorithm development Nano-devices Batteries Solar cells Reactor design Nuclear structure 11
INCITE breakthroughs since inception
A few of the many science and engineering advances
Hours requested vs. allocated: ~2X per year ~3X per year
Year              2004   2005   2006    2007   2008   2009   2010   2011   2012   2013
Hours allocated   4.9M   6.5M   18.2M   95M    268M   889M   1.6B   1.7B   1.7B   5B
Projects          3      3      15      45     55     66     69     57     60     61
Researchers solved the 2D Hubbard model and presented evidence that it predicts HTSC behavior. Phys. Rev. Lett. (2005)
Modeling of molecular basis of Parkinson's disease named #1 computational accomplishment. Breakthroughs 2008
Largest simulation of a galaxy's worth of dark matter, showed for the first time the fractal-like appearance of dark matter substructures. Nature (2008), Science (2009)
World's first continuous simulation of 21,000 years of Earth's climate history. Science (2009)
Largest-ever LES of a full-sized commercial combustion chamber used in an existing helicopter turbine
Unprecedented simulation of magnitude-8 earthquake over 125 square miles. Proceedings, SC10
NIST proposes new standard reference materials from LCF concrete simulations
Calculation of the number of bound nuclei in nature. Nature (2012)
New method to rapidly determine protein structure, with limited experimental data. Science (2010), Nature (2011)
OMEN breaks the petascale barrier using more than 220,000 cores. Proceedings, SC10
12
INCITE resources: Mira at ALCF Mira - Blue Gene/Q System Node: 64 bit PowerPC A2, 1.6 GHz 16 cores, 4 HW threads/core 16 GB memory 205 GFLOPS peak 48 racks / 48k nodes / 768k cores 768 TB of memory Peak flop rate: 10 PF 35 PB of disk + large tape storage system New Visualization Systems Initial system available now Advanced visualization system in 2015 13
Overview of Blue Gene/Q
Design Parameter         BG/P       BG/Q       Improvement
Cores / Node             4          16         4x
HW threads / Core        1          4          4x
Clock Speed (GHz)        0.85       1.6        1.9x
Flop / Clock / Core      4          8          2x
Nodes / Rack             1,024      1,024      --
RAM / core (GB)          0.5        1          2x
Flops / Node (GF)        13.6       204.8      15x
Mem. BW/Node (GB/sec)    13.6       42.6       3x
Network Interconnect     3D torus   5D torus   Smaller diameter
Concurrency / Rack       4,096      65,536     16x
14
Scaling workshop at ALCF Don't miss the ALCF's scaling workshop that has produced Gordon Bell finalists for four years running. ALL ARE WELCOME! Hit your highest performance numbers in preparation for INCITE 2014 Try your code out on full-system Mira reservations Work one-on-one with ALCF experts to optimize your code's performance Register today! Go to www.alcf.anl.gov for details, or www.alcf.anl.gov/workshops/mira-performance-boot-camp-2013 15
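The Flops / Node figures for Blue Gene follow from cores x clock speed x flops per clock per core, and the Mira system totals follow from 48 racks of 1,024 nodes. A minimal Python sketch of that arithmetic, using only numbers quoted on the Mira and Blue Gene/Q slides; the function and variable names are illustrative and not part of any ALCF tool:

```python
def node_peak_gflops(cores, clock_ghz, flops_per_clock_per_core):
    """Peak GFLOPS of one node: cores x clock (GHz) x flops per clock per core."""
    return cores * clock_ghz * flops_per_clock_per_core

# Per-node peaks from the design-parameter figures on the slide
bgp_node_gf = node_peak_gflops(cores=4, clock_ghz=0.85, flops_per_clock_per_core=4)
bgq_node_gf = node_peak_gflops(cores=16, clock_ghz=1.6, flops_per_clock_per_core=8)

# Mira system totals: 48 racks, 1,024 nodes per rack, 16 cores per node
RACKS = 48
NODES_PER_RACK = 1024
mira_nodes = RACKS * NODES_PER_RACK
mira_cores = mira_nodes * 16
mira_peak_pf = mira_nodes * bgq_node_gf / 1e6  # GFLOPS -> PFLOPS

print(f"BG/P node peak: {bgp_node_gf:.1f} GF")
print(f"BG/Q node peak: {bgq_node_gf:.1f} GF ({bgq_node_gf / bgp_node_gf:.0f}x BG/P)")
print(f"Mira: {mira_nodes:,} nodes, {mira_cores:,} cores, {mira_peak_pf:.1f} PF peak")
```

The same pattern can be used to sanity-check a requested job size against the per-node figures before it goes into a proposal.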
Titan at OLCF Cray Linux Environment operating system Gemini interconnect 3-D Torus Globally addressable memory Advanced synchronization features AMD Opteron 6200 processor (Interlagos) New accelerated node design using NVIDIA multi-core accelerators NVIDIA Kepler (K20) GPUs 27 PFlops peak performance 584 TB DDR3 + 110 TB GDDR5 memory Titan Specs Compute Nodes 18,688 Login & I/O Nodes 512 Memory per node 32 GB + 6 GB NVIDIA Kepler (2012) 1.31 TFlops Opteron 2.2 GHz Opteron performance 141 GFlops Total Opteron Flops 2.6 PFlops Disk Bandwidth ~ 1 TB/s 16
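The headline Titan numbers can be cross-checked from the per-node figures in the spec list. The sketch below is plain arithmetic on values quoted on this slide (18,688 compute nodes, 1.31 TFlops per Kepler GPU, 141 GFlops per Opteron, 32 GB + 6 GB per node); the variable names are illustrative, and the memory totals only match the slide if "TB" is read as 1,024 GB, which is an assumption:

```python
NODES = 18_688            # compute nodes
GPU_TFLOPS = 1.31         # NVIDIA Kepler (K20) peak per node
CPU_GFLOPS = 141          # Opteron peak per node
DDR3_GB_PER_NODE = 32
GDDR5_GB_PER_NODE = 6

cpu_total_pf = NODES * CPU_GFLOPS / 1e6   # GFLOPS -> PFLOPS
gpu_total_pf = NODES * GPU_TFLOPS / 1e3   # TFLOPS -> PFLOPS
peak_pf = cpu_total_pf + gpu_total_pf

# Assumption: the slide's "TB" means 1,024 GB
ddr3_tb = NODES * DDR3_GB_PER_NODE / 1024
gddr5_tb = NODES * GDDR5_GB_PER_NODE / 1024

print(f"Opteron total: {cpu_total_pf:.1f} PF, GPU total: {gpu_total_pf:.1f} PF")
print(f"System peak:   {peak_pf:.1f} PF")
print(f"Memory:        {ddr3_tb:.0f} TB DDR3 + {gddr5_tb:.0f} TB GDDR5")
```

Roughly nine tenths of the peak comes from the GPUs, which is why proposals targeting Titan are expected to address accelerator use.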
Cray XK7 compute node XK7 Compute Node Characteristics AMD Opteron 6200 Interlagos 16 core processor @ 2.2GHz Tesla K20 Kepler @ 1.31 TF with 6GB GDDR5 memory Memory Host: 32GB @ 1600 MHz DDR3 GPU: 6GB @ 5200 MHz GDDR5 Gemini High Speed Interconnect Four compute nodes per XK7 blade. 24 blades per rack 17
Key questions to ask yourself Are both the scale of the runs and the time demands of the problem of LCF scale? Yes, I can't get the amount of time I need anywhere else. Yes, my simulations are too large to run on other systems. Do you need specific LCF hardware? Yes, the memory and I/O available here are necessary for my work. TIPS Do answer these questions in the proposal. This is especially helpful for the computational readiness reviewers. 18
Key questions to ask yourself (cont.) Do you have the people ready to do this work? No, I'm waiting to hire a postdoc. Yes, I have commitments from the major participants. Do you have a workflow? Do you have a post-processing strategy? Do you use ensemble runs and need LCF resources? My ensembles can run under the direction of a large job, with I/O scaling on a parallel file system. -> possibly yes My ensemble expects to run millions of serial jobs on nodes with local disk available. -> probably no Some of these characteristics are negotiable, so make sure to discuss atypical requirements with the centers. 19
Some limitations on what can be done Laws regulate what can be done on these systems LCF systems have cyber security plans that bound the types of data that can be used and stored on them Some kinds of information we cannot have Personally Identifiable Information (PII) Classified Information or National Security Information Unclassified Controlled Nuclear Information (UCNI) Naval Nuclear Propulsion Information (NNPI) Information about development of nuclear, biological or chemical weapons, or weapons of mass destruction Inquire if you are unsure or have questions 20
Proposal form: Outline
1 Principal investigator and co-principal investigators
2 Project title (80 characters)
3 Research category
4 Project summary (50 words)
5 Computational resources requested
6 Funding sources
7 Other high-performance computing support for this project
8 Project narrative, other materials
  (A) Executive summary (1 page)
  (B) Project narrative including impact of the work, objectives, benchmarking (15 pages)
  (C) Personnel justification & management plan
  (D) Milestone table
  (E) Publications resulting from INCITE Awards (*new*)
  (F) Request for Information Data Management Plan (*new*)
9 Application packages
10 Proprietary and sensitive information
11 Export control
12 Monitor information
21
Getting started: Know your audience Remember, INCITE is very broad in scope Reviewers are computational-science-savvy senior scientists/engineers drawn from national labs, universities, and industry around the world They will be assessing potential impact of this work versus other proposals submitted TIPS Don't assume that your audience is familiar with your work through other review programs (e.g., funding agencies). INCITE is very broad in scope and you may be competing against a diverse set of proposals. 22
Narrative: Impact of the work This is the principal determinant of a successful submittal What is the scientific challenge, and what is its significance? Impact of a successful computational campaign: the big picture Reasons this work needs to be done now, on the resources requested TIPS Do give a compelling picture of the impact of this work, both in the context of your field and, where appropriate, beyond. Do explain why this work cannot be done elsewhere. Reviewers scrutinize whether another allocation program may be a better fit. 23
Narrative: Objectives and milestones Successful submittals must also very clearly Describe approach to solving the problem, its challenging aspects, preliminary results Tie to the resources requested your key objectives, key simulations, and project milestones in your milestone table TIPS Do clearly articulate your project's milestones for each year. Reviewers have downgraded proposals that don't show that the PI has a well thought out plan for using the allocation. Do bear in mind that the average INCITE award of time for a single project is equivalent to several million dollars. Spend your time on the proposal accordingly. 24
Narrative: Computational approach Provide the basic foundation Describe the underlying formulation Don't assume reviewers know all the codes Do show that the code you plan to employ is the correct tool for your research plan Do explain the differences if you plan to use a private version of a well-known code List programming languages, libraries and tools used Check that what you need is available on the system 25
Narrative: Computational campaign The details are important! Describe the kind of runs you plan with your allocation L exploratory runs using M nodes for N hours X big runs using Y nodes for Z hours P analysis runs using Q nodes for R hours Big runs often have big output and/or big I/O Show you can deal with it and understand the bottlenecks Understand the size of results, where you will analyze them, and how you will get the data there TIPS Do clearly emphasize the relationship between the proposed runs and the major milestones. This helps the Awards Committee maximize your milestones, if they can't grant the full award requested. 26
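A run plan of the form "L exploratory runs using M nodes for N hours" can be turned into a requested total with simple bookkeeping. The sketch below is one hypothetical way to tabulate it in Python. Every run count, node count, and duration is made up for illustration, and the 16 cores per node is an assumption taken from the Mira and Titan node descriptions; how each center converts node-hours into charged core-hours is not covered here and should be checked with the center.

```python
from dataclasses import dataclass

CORES_PER_NODE = 16  # assumption: 16 CPU cores per node, as on the Mira and Titan slides


@dataclass
class RunClass:
    """One class of runs in the campaign (all values below are hypothetical)."""
    name: str
    count: int      # number of runs of this kind
    nodes: int      # nodes per run
    hours: float    # wall-clock hours per run

    def core_hours(self, cores_per_node=CORES_PER_NODE):
        return self.count * self.nodes * self.hours * cores_per_node


plan = [
    RunClass("exploratory", count=20, nodes=512, hours=2),
    RunClass("production", count=10, nodes=8192, hours=12),
    RunClass("analysis", count=30, nodes=256, hours=1),
]

total = sum(r.core_hours() for r in plan)
for r in plan:
    share = r.core_hours() / total
    print(f"{r.name:12s} {r.core_hours():>14,.0f} core-hours ({share:.1%})")
print(f"{'total':12s} {total:>14,.0f} core-hours")
```

Keeping the plan in this form also makes it easy to show which runs map to which milestone, and what could still be accomplished under a reduced award.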
Narrative: Personnel & Management plan Experience and credibility List the scientific and technical members and their experience as related to the proposed scientific or technical goals Successful proposal teams demonstrate a clear understanding of petascale computing and can optimally use these resources to accomplish the stated scientific/technical goals Transparent use of time Projects involving multiple teams or different thrust areas should clearly state how the allocation will be distributed and managed TIPS Do include in Personnel Justification a brief description of the role of each team member. Although not a requirement, proposals with application developers or clear connections to development teams are favorably viewed by readiness reviewers. 27
Narrative: New for 2014 Call for Proposals Publications resulting from INCITE awards To show impact of the INCITE program, we ask authors to list the publications from previous INCITE awards to this project team for work related to the proposal under consideration Include only publications with INCITE acknowledgements Request for Information Data Management Plan (DMP) We plan to implement in future solicitations a requirement for a formal DMP as part of the proposal. Submit a short document, not to exceed one page, which describes your anticipated future data management strategies and needs. [Note: this is for INCITE management and will not be included in the materials sent to reviewers.] 28
Are you ready to apply now? Port your code before submitting the proposal Check to see if someone else has already ported it Request a startup account if needed (see next slide) Provide compelling benchmark data Prove application scalability in your proposal Run example cases at full scale If you cannot show proof of runs at full scale, then provide a very tight story about how you will succeed TIPS Do make the benchmark examples as similar to your production runs as possible, or, make it clear why another benchmark example is valid for your proposed work. 29
Request a startup account now Director's Discretionary Proposals considered year-round Award up to millions of hours Allocated by LCF center directors Argonne DD Program: http://www.alcf.anl.gov/getting-started/apply-for-dd Director's Discretionary (DD) requests can be submitted anytime DD may be used for porting, tuning, scaling in preparation for an INCITE submittal Submit applications at least 2 months before INCITE Call for Proposals closes Oak Ridge DD Program: www.olcf.ornl.gov/support/getting-started/olcf-director-discretion-project-application/ 30
Computational approach Use of next-generation systems Use as much of the resources on a node as possible Strategies to consider Hybridization utilizing OpenMP or Pthreads to expose thread-level or SMP-like parallelism (for multiple cores/hw threads) Make use of available accelerators (GPUs) using, e.g. CUDA, OpenCL, compiler directives, etc. Algorithmic improvements or design to maximize data locality and memory hierarchy usage TIPS Do provide a development plan articulating a strategy for maximizing node-level parallelism. 31
Code performance overview Performance data should support the required scale Use similar problems to what you will be running Show that you can get to the range of processors required Best to run on the same machine, but similar size runs on other machines can be useful Be clear about the number of nodes, MPI ranks, threads and GPUs (if applicable) being used in runs Include production style I/O in benchmarks (checkpoint/restart, analysis) Describe how you will address any scaling deficiencies TIPS Do provide performance data in the requested format. Do provide performance of the scaling baseline, not just scaling efficiency 32
Parallel performance: Direct evidence WEAK SCALING DATA Increase problem size as resources are increased STRONG SCALING DATA Increase resources (nodes) while doing the same computation Pick the approach(es) relevant to your work and show results [Figure: Weak Scaling Example and Strong Scaling Example. Each panel plots actual and ideal time to solution (minutes for weak scaling, seconds for strong scaling) against number of processors, from 2,400 to 76,800.] 33
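Weak and strong scaling results are commonly summarized as parallel efficiency relative to the smallest run. The Python sketch below uses the usual textbook definitions, which the slides do not spell out; the timing values are invented for illustration and are not read off the example plots.

```python
def strong_scaling_efficiency(procs, times):
    """Fixed problem size: ideal time halves when processors double.

    Efficiency = (T_base * P_base) / (T_p * P), relative to the first entry.
    """
    base_p, base_t = procs[0], times[0]
    return [(base_t * base_p) / (t * p) for p, t in zip(procs, times)]


def weak_scaling_efficiency(times):
    """Problem size grows with processors: ideal time stays constant.

    Efficiency = T_base / T_p, relative to the first entry.
    """
    return [times[0] / t for t in times]


# Hypothetical measurements (not taken from the slide's figures)
procs = [2400, 4800, 9600]
strong_times_s = [6000.0, 3100.0, 1650.0]   # seconds, same problem on more processors
weak_times_min = [10.0, 10.2, 10.5]         # minutes, problem grows with processors

strong_eff = strong_scaling_efficiency(procs, strong_times_s)
weak_eff = weak_scaling_efficiency(weak_times_min)

for p, se, we in zip(procs, strong_eff, weak_eff):
    print(f"{p:>6d} processors: strong {se:.1%}, weak {we:.1%}")
```

Reporting the baseline time alongside these ratios matters: an efficiency figure alone says nothing about whether the starting point was itself well tuned.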
More about our ensemble policy or, Can I meet the computationally intensive criterion by loosely coupling my jobs? Possibly yes, If you require large numbers of discrete or loosely coupled simulations where time-to-solution is an untenable pacing issue, and If a software workflow solution (e.g., pre- and post-processing scripts that automate run management and analysis) is provided to facilitate this volume of work. Probably no, If by decoupling the simulations the work could be effectively carried out on a smaller resource within a reasonable time-to-solution. TIPS Do examine the Frequently Asked Questions for these and other topics at http://hpc.science.doe.gov/allocations/incite/faq.do 34
Proposal form: Final check
1 Principal investigator and co-principal investigators
2 Project title (80 characters)
3 Research category
4 Project summary (50 words)
5 Computational resources requested
6 Funding sources
7 Other high-performance computing support for this project
8 Project narrative, other materials
  (A) Executive summary (1 page)
  (B) Project narrative including impact of the work, objectives, benchmarking (15 pages)
  (C) Personnel justification & management plan
  (D) Milestone table
  (E) Publications resulting from INCITE Awards (*new*)
  (F) Request for Information Data Management Plan (*new*)
9 Application packages
10 Proprietary and sensitive information
11 Export control
12 Monitor information
35
Renewal form: Outline
1 Principal investigator and co-principal investigators
2 Research category
3 Project status summary (1 page)
4 Renewal computational resources requested
5 Project achievements and plans
  (A) Project achievements (1 page)
  (B) Project plans (15 pages)
Project achievements: Accomplishments, publications, allocation use, parallel performance, anticipated data storage needs
Project plans for next year: What you expect to accomplish, anticipated production-to-development job time, benchmarking data if new codes to be used
36
Q&A Open discussion on what authors should include in their proposal 37
Submitting your proposal or renewal You may save your proposal at any time without having the entire form complete Your Co-PIs may also log in and edit your proposal Required fields must be completed for the form to be successfully submitted An incomplete form may be saved for later revisions After submitting your proposal, you will not be able to edit it Submit 38
INCITE awards committee decisions The INCITE Awards Committee is composed of the LCF center directors, INCITE program manager, LCF directors of science, and senior management. The committee identifies the top-ranked proposals by a) peer-review panel ratings, rankings, and reports; and b) additional considerations, such as the desire to promote use of HPC resources by underrepresented communities. Computational readiness review is used to identify whether the top-ranked proposals are ready for the requested system. 39
INCITE awards committee decisions (cont.) A balance is struck to ensure that each awarded project has sufficient allocation to enable all or part of the proposed scientific or technical achievements, and that there is a robust support model for each INCITE project. When the centers are oversubscribed, each potential project is assessed to determine the amount of time that may be awarded to allow the researchers to accomplish significant scientific goals. Requests for appeals can be submitted to the INCITE manager or LCF center directors. If an error has occurred in the decision-making process (e.g., procedural, clerical), consideration is given by the INCITE management and an award may be granted. 40
2014 INCITE award announcements Awards will be announced by INCITE Manager, Julia White, in November 2013 Welcome and startup information from centers Agreements to sign: Start this process as soon as possible! Getting started materials: Work closely with the center Centers provide expert-to-expert assistance to help you get the most from your allocation Scientific Liaisons and Catalysts (OLCF / ALCF) 41
PI responsibilities Let us know your achievements and challenges Provide quarterly status updates (on supplied template) Milestone reports Publications, awards, journal covers, presentations, etc., related to the work Provide highlights on significant science/engineering accomplishments as they occur Submit annual renewal request Complete annual surveys Encourage your team to be good citizens on the computers Use the resources for the proposed work 42
It is a small world Let the science agency that funds your work know how significant the INCITE program and the Leadership Computing Facilities will be to your work Be sure to include the appropriate acknowledgements Contact us if you have questions: we want to hear from you 43
Relevant links INCITE General Information www.doeleadershipcomputing.org/ INCITE Proposal Site proposals.doeleadershipcomputing.org/ Argonne Discretionary Program www.alcf.anl.gov/getting-started/apply-for-dd Oak Ridge Discretionary Program www.olcf.ornl.gov/support/getting-started/olcf-director-discretion-project-application Contact the center if you'd like to request Discretionary time for benchmarking 44
Contacts For details about the INCITE program: www.doeleadershipcomputing.org General information proposals.doeleadershipcomputing.org Proposal site INCITE@DOEleadershipcomputing.org For details about the centers: www.olcf.ornl.gov help@nccs.gov, 865-241-6536 www.alcf.anl.gov support@alcf.anl.gov, 866-508-9181 45
Supplementary material 46
Innovative and Novel Computational Impact on Theory and Experiment INCITE is an annual, peer-review allocation program that provides unprecedented computational and data science resources 5 billion core-hours to be awarded for 2014 on the 27-petaflops Cray XK7 Titan and the 10-petaflops IBM BG/Q Mira Average award: 50+ million core-hours Individual awards will be up to several hundred million core-hours INCITE is open to any science domain INCITE seeks computationally intensive, large-scale research campaigns Call for Proposals The INCITE program seeks proposals for high-impact science and technology research challenges that require the power of the leadership-class systems. Allocations will be for calendar year 2014. April 15 to June 28, 2013 Contact information Julia C. White, INCITE Manager whitejc@doeleadershipcomputing.org 47
Allocation Programs at the LCFs
INCITE (60%)
  Mission: High-risk, high-payoff science that requires LCF-scale resources*
  Call: 1x/year (closes June)
  Duration: 1-3 years, yearly renewal
  Typical size: 50-70 projects, 50M to 100s of M core-hours/yr.
  Review process: Scientific peer review, computational readiness
  Managed by: INCITE management committee (ALCF & OLCF)
ALCC (30%)
  Mission: High-risk, high-payoff science aligned with DOE mission
  Call: 1x/year (closes February)
  Duration: 1 year
  Typical size: 10-20 projects, 1M-75M core-hours/yr.
  Review process: Scientific peer review, computational readiness
  Managed by: DOE Office of Science
Director's Discretionary (10%)
  Mission: Strategic LCF goals
  Call: Rolling
  Duration: 3 months, 6 months, or 1 year
  Typical size: 100s of projects, 10K-1M core-hours
  Review process: Strategic impact and feasibility
  Managed by: LCF management
Availability: Open to all scientific researchers and organizations
Capability >20% of cores
48
A sample of codes with local expertise available at Argonne and Oak Ridge
Application: Field
FLASH: Astrophysics
MILC, CPS: LQCD
Nek5000: Nuclear energy
Rosetta: Protein structure
DCA++: Materials science
ANGFMC: Nuclear structure
NUCCOR: Nuclear structure
Qbox: Chemistry
LAMMPS: Molecular dynamics
NWChem: Chemistry
GAMESS: Chemistry
MADNESS: Chemistry
CHARMM: Molecular dynamics
NAMD: Molecular dynamics
AVBP: Combustion
GTC: Fusion
Allstar: Life science
CPMD, CP2K: Molecular dynamics
CESM: Climate
CAM-SE: Climate
WRF: Climate
Amber: Molecular dynamics
enzo: Astrophysics
Falkon: Computer science/HTC
s3d: Combustion
DENOVO: Nuclear energy
LSMS: Materials science
GPAW: Materials science
49