NICS and the NSF's High-Performance Computing Program Jim Ferguson NICS Director of Education, Outreach & Training 8 September 2011
Acknowledgement
Phil Andrews, 1955-2011
NICS Director from before NICS opened through February 2011
Studied physics at Cambridge, Purdue, and Princeton
Worked at the Pittsburgh Supercomputing Center and the San Diego Supercomputer Center before moving to the National Institute for Computational Sciences.
National Institute for Computational Sciences
A partnership of the University of Tennessee (UT) and Oak Ridge National Laboratory (ORNL)
NICS is the second NSF Track 2 center
Building on the strengths of UT and ORNL
Systems are now in production
A series of computers culminating in a 1.2 PF system in 2011
NICS Resources: Kraken
Kraken: legendary sea monsters of gargantuan size, said to have dwelt off the coasts of Norway and Iceland
How Big Is the Kraken?
112,896 compute cores residing on 9,408 nodes.
147 terabytes of compute memory.
3.3 petabytes (raw) of parallel file system.
Theoretical peak performance of 1.17 petaflops, i.e. 1.17 × 10^15 floating-point operations per second (FLOPS).
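The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers above, plus one assumption labeled in the code: that "147 terabytes" is meant in binary units (TiB), under which the memory divides evenly across the nodes. Per-core peak is derived purely from the stated 1.17 PF total, not from any processor specification.

```python
# Back-of-the-envelope Kraken figures, derived from the slide's own numbers.
CORES = 112_896
NODES = 9_408
MEMORY_TIB = 147       # assumption: "147 terabytes" read as binary TiB
PEAK_FLOPS = 1.17e15   # stated theoretical peak, in FLOP/s

# Cores should divide evenly across nodes.
assert CORES % NODES == 0
cores_per_node = CORES // NODES

# Convert TiB to GiB (x1024), then share out across nodes and cores.
mem_per_node_gib = MEMORY_TIB * 1024 / NODES
mem_per_core_gib = mem_per_node_gib / cores_per_node

# Peak per core in GFLOP/s, from the aggregate peak.
peak_per_core_gflops = PEAK_FLOPS / CORES / 1e9

print(f"cores per node:  {cores_per_node}")
print(f"memory per node: {mem_per_node_gib:.1f} GiB")
print(f"memory per core: {mem_per_core_gib:.2f} GiB")
print(f"peak per core:   {peak_per_core_gflops:.1f} GFLOP/s")
```

Run as written, this reports 12 cores per node, 16.0 GiB per node, 1.33 GiB per core, and about 10.4 GFLOP/s per core. If the 147 TB figure were decimal instead, memory per node would come out to a less round 15.6 GB, which is why the binary reading seems the more likely one.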
NICS Timeline
Sep 2007: NSF Track 2 award announced
May 2008: First Cray XT3 resources available
Jun 2008: Cray XT3 in production, 40 TF
Jul 2008: Cray XT4 acceptance
Aug 2008: Cray XT4 in production, 18K cores, 166 TF
Dec 2008: Cray XT5 delivered
Feb 2009: Cray XT5 in production, 64K cores, 615 TF
Mar 2009: 100M hours awarded at TRAC
Nov 2009: Cray XT5 upgrade, 100K cores, ~1 PF
Jan 2011: Cray XT5 upgrade, 112K cores, ~1.2 PF
Other Significant NSF Awards at NICS
NSF TeraGrid XD center for Remote Data Analysis & Visualization (RDAV): a $10M, 3-year award to UT; the main computing resource is to be an SGI UltraViolet machine. Sean Ahern, Principal Investigator.
NSF Track 2D experimental system: the main computing resource will be a GPU-based hybrid system. The award is through Georgia Tech; Jeffrey Vetter is Principal Investigator.
Kraken memory upgrade
Extreme Science and Engineering Discovery Environment (XSEDE): just announced, $18M over 5 years
Access to NSF Computational Resources
Access is granted through peer review at a quarterly resource allocation meeting. Prospective Principal Investigators apply, their proposals are reviewed by peers, and allocations are made among the available NSF resources.
Directors at each center also have discretionary time they may award.
Startup accounts, which let researchers try out computing resources, are easy to obtain.
Educational allocations are usually approved readily.
IGMCS Access
UT students registered for the IGMCS program will get an allocation on NICS resources.
Details are being finalized; likely at least 20,000 hours for each student.
How Did We Get Here? The NSF Supercomputing Timeline
Pre-1984: scramble for cycles
1983-1984: first proposals, unsolicited appeals to NSF for national-level resources
1986-1996: Centers Program
1997-2004: PACI Program
2000-2011: TCS + DTF + ETF = TeraGrid
2007-2010: Track 2 (2A through 2D)
2011: Track 1
2011: TeraGrid XD (XSEDE)
Technology Changes
The initial systems of the 1980s were all vector processors (Cray or ETA), which favored vectorizable code.
Today's systems are all massively parallel machines or clusters, which favor scalable code.
Along the way we've seen several other architectures, including Single Instruction Multiple Data (SIMD) and multithreaded machines.
Already arrived: general-purpose graphics processing units (GPGPUs) and other accelerators. They offer great speedups with lower power requirements but are currently difficult to program efficiently.
Seymour Cray (photo)
NSF HPC: The Early Years
Little open capability: NMFECC, for fusion research
Early 1980s: studies evaluating the need for access to supercomputers
Program Advisory Committee at NSF, chaired by Neal Lane and including 25-30 computational scientists; frequent meetings
Oct 1983 to Feb 1984: unsolicited proposals from Princeton (Steve Orszag), Illinois (Larry Smarr), and UC San Diego (Sid Karin)
Later in 1984: NSF formal solicitation; 22 proposals received
Early Years (continued)
Early 1985: three center awards announced: San Diego Supercomputer Center, National Center for Supercomputing Applications, and John von Neumann Center
Advanced Prototype award to Cornell, which later became a center
Late 1985: friendly users at the centers; official openings in early 1986
Early 1986: award to the Pittsburgh Supercomputing Center
What About the Network?
NSFnet was formed in 1986-1987 to connect the centers.
NSF required adoption of the TCP/IP networking standards. This worked out!
NCSA Telnet provided a full TCP/IP stack for PC, Macintosh, and Sun workstations.
NSF eventually released NSFnet to private hands (1995), the beginning of today's commodity Internet.
vBNS, a fast research network connecting the centers, debuted the same day NSFnet went commodity.
Open Software Development
NCSA Telnet: a model for open source tools
Gplot, Gdoc, and P3D from PSC
SDSC and NCSA image tools
NCSA Mosaic: the ultimate NSF center software tool
What Is Different Today?
Back then, supercomputers were like tall buildings: monolithic, designed as a unit, unique, and difficult to change.
Now, supercomputers are generally built from smaller components, and pieces can be added or removed at any time.
The problems of modern centers are essentially civil engineering issues: weight, power, and heat dissipation.
NSF Reorganizes
2005: supercomputing was moved out of CISE into the Office of Cyberinfrastructure (OCI), reporting directly to the NSF Director.
Separate research award programs, SDCI and STCI, were established.
Other NSF directorates have strong input into the directions of OCI.
The result was shorter, more service-oriented, and more prescriptive awards.
Tracks 1 and 2: A New Beginning
Four Track 2 RFPs announced, with yearly awards of $30M for capital equipment; operations funded separately.
One Track 1 award: $200M for a sustained petascale system, with a production target of 2011.
Very prescriptive about allowed work.
Production oriented, with very low operations funding.
TeraGrid connectivity important.
Track 2 Awards
2A: Texas Advanced Computing Center, recommended 2006, awarded 2007. Sun system, AMD processors, custom switch.
2B: National Institute for Computational Sciences, recommended and awarded 2007. Cray XT5, AMD processors, SeaStar interconnect.
2C: Pittsburgh, recommended 2008, never awarded due to problems with the main vendor (SGI).
2D: San Diego, Georgia Tech, and Indiana awarded smaller experimental systems.
Dear Colleague Letter awards in 2010 to NICS, TACC, NCSA, SDSC, and PSC for a variety of systems and upgrades.
Track 1 Award
Awarded to the National Center for Supercomputing Applications in 2007.
IBM PERCS system (Power series): sustained petaflop, >10 PF peak, 1 PB memory, 10 PB disk, 1 EB storage, 100 Gb/s WAN connectivity.
IBM cancelled the contract (through an out clause) in August 2011; NCSA is negotiating a new deal with new vendors.
What's Next?
TeraGrid XD: 2.5 years in the making.
Two major proposals went in; NSF asked that they be blended (roughly 85/15).
The new program, XSEDE, has major partners NCSA, PSC, TACC, and NICS.
Researchers who use the current TeraGrid should see little or no disruption to current practice.
September 8, 2011 XSEDE: National Computational Science and Education Services
What Is XSEDE?
XSEDE is a comprehensive set of advanced, heterogeneous, high-end digital services, integrated into a general-purpose infrastructure.
What does that mean?
XSEDE Vision
The Extreme Science and Engineering Discovery Environment (XSEDE) will enhance the productivity of scientists and engineers by providing them with new and innovative capabilities, and will thereby facilitate scientific discovery while enabling transformational science and engineering and innovative educational programs.
XSEDE will fulfill this vision by creating an advanced, capable, and robust cyberinfrastructure supported by the combined expertise of a distributed team of leading CI (cyberinfrastructure) professionals.
XSEDE Characteristics
XSEDE forms the foundation of a national cyberinfrastructure (CI) ecosystem.
Its comprehensive suite of advanced digital services will federate (combine) with other high-end facilities and campus-based resources.
XSEDE integrates diverse digital resources.
Its open architecture allows continued addition of new technology capabilities and services.
XSEDE Is About...
Increasing productivity: leading to more science, and making the difference between a feasible project and an impractical one.
Transformative impact through active, formal requirements-gathering processes to understand the needs of the community, and through new and expanded advanced support that includes external short-term contracting for expertise beyond the current team.
Novel and innovative projects: supporting novel science areas, demographic diversity, innovative technologies, science gateway development, data repositories, and campus bridging.
And National Training, Education, and Outreach Programs
With the scope and scale to:
increase diversity of topics, modes of delivery, and reach to new communities and audiences
broaden participation among under-represented communities
support campus bridging for effective use of CI (cyberinfrastructure) resources
integrate with campuses through an expanded Champions program and additional bridging activities
establish certificate and degree programs
encourage institutional incorporation of CS&E curricula and offer professional development certificates
prepare undergraduates, graduates, and future K-12 teachers
Thank You! Jim Ferguson (jwf@utk.edu)
Who Works at a Place Like NICS?
System administrators: large HPC systems, security systems, file systems, job scheduling, mass storage systems, networks, user account administration, etc.
User support: front-line answers for users, documentation, FAQ collection, web site management, user surveys, training, allocation management, diagnosing problems.
Computational scientists: second-line, in-depth help for users; experts in various disciplines; training, documentation, diagnosing problems.
Management: reporting, proposal writing, hiring.
Outreach & education.
External relations.