High Performance Computing for Engineers

High Performance Computing for Engineers
David Thomas, dt10@ic.ac.uk, Room 903
HPCE / dt10 / 2012 / 0.1

High Performance Computing for Engineers
- Research
  - Testing communication protocols
  - Evaluating signal-processing filters
  - Simulating analogue and digital designs
- Tools
  - CAD tools: synthesis, place-and-route, verification
  - Libraries/toolboxes: filter design, compressive sensing
- Products
  - Oil exploration and discovery
  - Mobile-phone apps
  - Financial computing

High Performance Computing for Engineers
- Types of performance metrics
  - Throughput
  - Latency
  - Power
  - Design-time
  - Capital and running costs
- Required versus desired performance
  - Subject to a throughput of X, minimise average power
  - Subject to a budget of Y, maximise energy efficiency
  - Subject to Z development days, maximise throughput
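The three "subject to" statements can be read as constrained optimisation problems over a set of candidate designs. A sketch in symbols, where the notation is an assumption and not from the slides: d is a candidate design drawn from a set D, T(d) is its throughput, P(d) its average power, E(d) its energy efficiency, C(d) its cost, and t_dev(d) its development time in days.

```latex
\begin{aligned}
&\min_{d \in D} P(d) && \text{subject to } T(d) \ge X \\
&\max_{d \in D} E(d) && \text{subject to } C(d) \le Y \\
&\max_{d \in D} T(d) && \text{subject to } t_{\mathrm{dev}}(d) \le Z
\end{aligned}
```

In each case the required performance appears as the constraint and the desired performance as the objective.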

What is available to you
- Types of compute device
  - Multi-core CPUs
  - GPUs (Graphics Processing Units)
  - MPPAs (Massively Parallel Processor Arrays)
  - FPGAs (Field Programmable Gate Arrays)
- Types of compute system
  - Embedded Systems
  - Mobile Phones
  - Tablets
  - Laptops
  - Grid computing
  - Cloud computing

2012: LG Optimus 2X
NVidia Tegra 2
- CPU: Dual-core ARM Cortex A9
- GPU: ULP GeForce (8 cores)
Imgs: http://www.techradar.com/reviews/phones/mobile-phones/lg-optimus-2x-929388/review, http://www.anandtech.com/show/2911

2012: Lenovo Thinkpad Edge E525
AMD Fusion A8-3500M
- CPU: Quad-Core 2.4GHz Phenom-II
- GPU: HD 6620G 400MHz (320 cores)
Imgs: http://laptops-specs.blogspot.com/2011/09/lenovo-thinkpad-edge-e525-specs.html, http://www.techradar.com/images/zoom/amd-llano-965315/index1

2012: Imperial HPC Cluster
cx2 - SGI Altix ICE 8200 EX
- Racks and racks of high-performance PCs
- 3000+ x64 cores running at 3GHz
- Available to researchers and undergrads (if they ask nicely)
- Grid-management system
  - Run program on 1000 PCs with one command

Performance and Efficiency Relative to CPU
[Figure: two bar charts, "Performance" and "Power Efficiency", comparing GPU, MPPA and FPGA relative to a CPU for Uniform, Gaussian and Exponential, plus the geometric mean.]

Design tradeoffs

Design tradeoffs
- Task-based parallelism vs threads
  - Easy to program (less time coding)
  - Easy to get right (less time testing)
- Many implementations and APIs
  - Intel Threading Building Blocks (TBB)
  - Microsoft .NET Task Parallel Library
  - OpenCL

Design tradeoffs
Src: NVIDIA CUDA Compute Unified Device Architecture, Programming Guide

What will you learn
- Systems: what high-performance systems do you have
- Methods: how can these systems be programmed
- Practice: concrete experience with multi-core and GPUs
- Analysis: knowing what to use and when

What you won't learn
- Multi-threaded programming
  - PThreads, Windows threads, mutexes, spin-locks, ...
  - We'll look at the concepts and hardware, but ignore the practice
  - Not needed when using modern task-based methods
- OpenMP
  - API for parallelising for-loops in C/C++
  - Old technology, not very user-friendly
  - Doesn't map nicely to architectures such as GPUs
  - We'll use modern techniques such as TBB and CUDA/OpenCL
- MPI (Message Passing Interface)
  - Point-to-point communication between networks
  - Important, but very specialised: entire course by itself
  - This course only considers common non-specialist systems

Structure of the course
- Exam (50%) + two practical courseworks (50%)
- Task-based project using Intel Threading Building Blocks
  - Simple and robust framework for task-level parallelism
  - Highly portable: Linux, Windows, POSIX source
- GPU-based project using CUDA or OpenCL
  - If you have a GPU in your laptop, use that
  - Certain lab machines have GPUs compatible with CUDA
  - Will also explore using OpenCL to target both CPUs and GPUs

Skills needed
- Basic programming
  - If you can't program in _any_ language then worry
- Intel TBB uses C++ rather than C
  - Some weird C++ stuff, but not scary: explained in lectures
  - Working examples given and explained
  - Templates given as starting point for project work
- GPU programming uses CUDA or OpenCL (both C-like)
  - Lets you use whatever graphics card you happen to have
  - Working examples, explained in lectures
  - Template as starting point for project work
- Not expected to become a guru, just make it faster

Key Focus: Engineering
- How does this apply to you?
- Examples from Elec. Eng. problems
  - Mathematical analysis
  - Simulation of digital circuits
  - VLSI circuit layout
  - Communication channel evaluation
  - (Fractal zoomers)
- Tools and languages used in EE
  - C
  - MATLAB
  - qsub (Imperial HPC cluster)

Simple example: Totient function
- Euler's totient function: totient(n)
  - Number of integers in range 1..n which are relatively prime to n
  - Integers i and j are relatively prime if gcd(i,j)=1
- Totient not included in MATLAB
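As a quick worked instance of the definition, here is the count for n = 10, followed by the general case of a prime p. The symbol \varphi for the totient is conventional notation and is not used on the slides.

```latex
\begin{aligned}
\varphi(10) &= \bigl|\{\, i \in \{1,\dots,10\} : \gcd(i,10) = 1 \,\}\bigr|
             = \bigl|\{1, 3, 7, 9\}\bigr| = 4 \\
\varphi(p)  &= p - 1 \quad \text{for prime } p,
             \text{ since } \gcd(i,p) = 1 \text{ for } 1 \le i < p
             \text{ and } \gcd(p,p) = p
\end{aligned}
```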

Version 0: Simple loop
    function [res]=totient_v0(n)
    res=0;
    for i=1:n                 % Loop over all numbers in 1..n
        if gcd(i,n)==1        % Check if relatively prime
            res=res+1;        % Count any that are
        end
    end

Version 1: Vectorising
- Convert loops into vector operations
  - Standard MATLAB optimisation
  - Actually a way of making parallelism explicit

    function [res]=totient_v1(n)
    numbers=1:n;                      % Generate all numbers in 1..n
    gcd_res=(gcd(numbers,n)==1);      % Perform GCD on all numbers
    res=sum(gcd_res);                 % Count all relatively prime numbers

Version 2: Parallel for loop
- MATLAB supports a parfor command
  - Each loop iteration is/may be executed in parallel
  - Can operate on multiple cores, and even multiple machines

    function [res]=totient_v2(n)
    res=0;
    parfor i=1:n              % Loop over all numbers in 1..n
        if gcd(i,n)==1        % Check if relatively prime
            res=res+1;        % Count any that are
        end
    end
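To run a loop like this on several workers, a pool of workers has to exist. The sketch below assumes the Parallel Computing Toolbox and a release that provides `gcp` and `parpool` (R2013b or later). The slides do not say how the pool was started, so this setup is an assumption. The value of n is chosen only for illustration.

```matlab
% Sketch only: assumes the Parallel Computing Toolbox (gcp, parpool).
if isempty(gcp('nocreate'))   % Is a pool already running?
    parpool;                  % If not, start the default pool
end

n = 100000;                   % Illustrative problem size
res = 0;
parfor i = 1:n
    if gcd(i, n) == 1
        res = res + 1;        % res is a reduction variable: partial counts
    end                       % from each worker are combined by MATLAB
end
disp(res);                    % totient(100000) = 40000
```

Because `res` is only ever updated as `res = res + 1`, MATLAB can treat it as a reduction and the iteration order does not affect the result.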

Version 3: Agglomeration
- Too much overhead with current parallel loop
  - Each parallel iteration has a cost due to scheduling
  - Process space in chunks, using smaller vectors

    function [res]=totient_v3(n, step)
    if nargin<2                       % How large each chunk should be
        step=1000;
    end
    res=0;
    % Loop over each chunk; ceil (not floor) so that the final
    % partial chunk is included when n is not a multiple of step
    parfor i=1:ceil(n/step)
        % Then process each chunk as a vector
        numbers=(i-1)*step+1:min(i*step,n);
        rel_prime=(gcd(numbers,n)==1);
        res=res+sum(rel_prime);
    end
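A quick way to check a chunked implementation is to pick an n that is not a multiple of the chunk size and compare against the simple loop. The sketch below uses plain `for` loops so it runs without the Parallel Computing Toolbox. The variable names and the values of `n` and `step` are illustrative, not from the slides.

```matlab
% Sketch: list the chunk boundaries, then check that the per-chunk counts
% add up to the same answer as the simple loop.
n = 2500;                     % Deliberately not a multiple of step
step = 1000;
num_chunks = ceil(n/step);    % 3 chunks: 1..1000, 1001..2000, 2001..2500

chunked = 0;
for i = 1:num_chunks
    lo = (i-1)*step + 1;      % First number in this chunk
    hi = min(i*step, n);      % Last number, clipped to n
    fprintf('chunk %d: %d..%d\n', i, lo, hi);
    chunked = chunked + sum(gcd(lo:hi, n) == 1);
end

simple = 0;
for k = 1:n
    if gcd(k, n) == 1
        simple = simple + 1;
    end
end

assert(chunked == simple);    % Both equal totient(2500) = 1000
```

The last chunk covers only 500 numbers, which is the case a chunked loop most easily gets wrong.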

Results from my dual-core laptop
[Figure: line plot comparing v0: For Loop, v1: Vectorised, v2: ParFor Loop and v3: ParFor Chunked; vertical axis from 0 to 8, horizontal axis from 0 to 2.5 x 10^5. Axis labels not recoverable.]
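Measurements of this kind can be gathered with `tic` and `toc`. The sketch below times only the simple loop and the vectorised form. It is not the script behind the plot, and the problem sizes are hypothetical. The parallel versions could be timed the same way if the Parallel Computing Toolbox is available.

```matlab
% Sketch of a timing harness (assumed; not the script used for the slides).
sizes  = [1e3 1e4 1e5];               % Hypothetical problem sizes
t_loop = zeros(size(sizes));          % Seconds taken by the simple loop
t_vec  = zeros(size(sizes));          % Seconds taken by the vectorised form

for k = 1:numel(sizes)
    n = sizes(k);

    tic;                              % Time the simple loop
    r_loop = 0;
    for i = 1:n
        if gcd(i, n) == 1
            r_loop = r_loop + 1;
        end
    end
    t_loop(k) = toc;

    tic;                              % Time the vectorised form
    r_vec = sum(gcd(1:n, n) == 1);
    t_vec(k) = toc;

    assert(r_loop == r_vec);          % Both must give the same count
    fprintf('n=%d  loop=%.4fs  vectorised=%.4fs\n', n, t_loop(k), t_vec(k));
end
```

Single `tic`/`toc` readings are noisy, so repeating each measurement and taking the minimum or median gives steadier numbers.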

Questions?