igh erformance omputing

Similar documents
ISC 2016, Frankfurt, June 20, Highlights of the 47 th. TOP500 List. Erich Strohmaier

SC17, Denver, November 14, Highlights of the 50 th. TOP500 List. Erich Strohmaier

TOP Years - 50 Editions

7),8) (GPU) SIMD ClearSpeed (GSIC) 53% TSUBAME. NVIDIA Tesla GPU TFlops. Tokyo Institute of Technology 2 JST, CREST

Cloud, Distributed, Embedded. Erlang in the Heterogeneous Computing World. Omer

Computational Challenges in Cold QCD. Bálint Joó, Jefferson Lab Computational Nuclear Physics Workshop SURA Washington, DC July 23-24, 2012

REPORT, RE0813, MIL, ENV, 810G, TEMP, IN HOUSE, , PASS

NAWEA 2015 SYMPOSIUM

Making 100G/200G Economical for DCI

Flight Software Overview

Out-of-Core Cholesky Factorization Algorithm on GPU and the Intel MIC Co-processors

P500/700/900 Part Number. P510/710/910 Part Number N/A N/A SBB0K65455 N/A N/A N/A SBB0K65456 N/A N/A N/A SBB0K65457 N/A N/A N/A SBB0K65521 N/A

Electronic Structure Workshop, Spring 2017

FINANCIAL ANALYSIS. Stoby

HPC 2 (nee ERC) History

Profile-driven Selective Code Compression

Automotive. Transforming data into faster insight and more competitive F1 race cars

Supercomputing in Plain English

INVESTOR PRESENTATION

Spacecraft Simulation Tool. Debbie Clancy JHU/APL

INVESTOR PRESENTATION

HPC Market Update October Addison Snell Christopher Willard, Ph.D. Laura Segervall

Tier-2. Joint Annual Meeting of ÖPG/SPS/ÖGAA - Innsbruck /48. Austrian Federated WLCG

Reducing Code Size with Run-time Decompression

INVESTOR PRESENTATION. December 1, 2017

Investigating the Problems of Ship Propulsion on a Supercomputer

Fast Software-managed Code Decompression

INVESTOR PRESENTATION. March 2018

An Architecture for Combined Test Data Compression and Abort-on-Fail Test

Addressing DDR5 design challenges with IBIS-AMI modeling techniques

Scalable Data Structure to Compress Next- Generation Sequencing Files and its Application to Compressive Genomics

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Wenbing Zhao. Department of Electrical and Computer Engineering

Fast Floating Point Compression on the Cell BE Processor

Computing s Energy Problem:

Outline. Terminology. EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Steps in Capacity Planning and Management

PC-based systems ELOP II-NT

INVESTOR PRESENTATION. December 1, 2017

AccuRAID iscsi Auto-Tiering Best Practice

The Future of Field-Programmable Gate Arrays

FREE MOTION SIMULATION OF A SAILING YACHT IN UP-WIND CONDITION WITH ROUGH SEA

The Innovation Imperative! Monday, October 21, 13

Volume A Question No : 1 You can monitor your Steelhead appliance disk performance using which reports? (Select 2)

A 28nm SoC with a 1.2GHz 568nJ/ Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications

Visualize Nitrogen Gas Consumption

SAFE CAPACITY TEST: INNOVATION AND SAVINGS

A Novel Decode-Aware Compression Technique for Improved Compression and Decompression

Series Environmental Chambers

FireHawk M7 Interface Module Software Instructions OPERATION AND INSTRUCTIONS

Status of SuperKEKB Control System. April 25, 2012 Tatsuro NAKAMURA KEKB Control Group, KEK

Advanced Test Equipment Rentals ATEC (2832) OMS 600

securing networks with silicon

2011 ScheduALL FOXTEL

Pushing the Boundaries of UpFront CFD. Sean Horgan and Mike Clapp 80/20 Engineering Ltd

OptonPro. High-power laser up to 7000 mw. Laser therapy natural healing with the power of light

MEMS for automotive and consumer electronics

A Quadtree-Based Lightweight Data Compression Approach to Processing Large-Scale Geospatial Rasters

Economic Benefits Of Compressor Analysis >

HP Integrity Superdome 2

NAMD & HELIUM Enabling Work on the PRACE IBM Prototypes

WHY CHINA IS A GOOD MARKET FOR FD SOI

Less power great performances

UNIQUE HYBRID SCRUBBERS

PRC CO ² -LASER PRESENTATION

Protecting shared storage from rogue jobs by I/O profiling and dynamic load balancing

Customer Responsibilities

HASTAC High stability Altimeter SysTem for Air data Computers

The evolution of ENEAGRID/CRESCO HPC infrastructure

The evolution of ENEAGRID/CRESCO HPC infrastructure

On the use of rotor equivalent wind speed to improve CFD wind resource mapping. Yavor V. Hristov, PhD Plant Performance and Modeling Vestas TSS

PURCHASE AND DELIVERY OF A HELIUM CLOSED-CYCLE OPTICAL CRYOSTAT. The present tender is only an unofficial translation provided as a courtesy.

Code Basic module and level control complete with optionals code

NSF's Ocean Observatories Initiative: Building Research Infrastructure for the Pacific Northwest and the Broader Community

Low gas volume & flow measurements made easier Gas Endeavour

Optical Time Domain Reflectometer. Operating System 2 Windows XPe Windows XP Pro with desktop option 256 MB

Laboratory 2(a): Interfacing WiiMote. Authors: Jeff C. Jensen (National Instruments) Trung N. Tran (National Instruments)

Unisys. Imagine it. Done. c Consulting. c Systems Integration. c Outsourcing. c Infrastructure. c Server Technology. Unisys NDP 30 and NDP 110

Condor Week Condor WAN scalability improvements. A needed evolution to support the CMS compute model

Instructors: Randy H. Katz David A. PaGerson hgp://inst.eecs.berkeley.edu/~cs61c/fa10. Fall Lecture #39. Agenda

Sorted by "Department" below and then by "CrsNo" (Updated February 28, 2017)

B.R.O. Basketball Return Optimizer. Electrical & Computer Engineering

DataCore Cloud Service Provider Program (DCSPP) Product Guide

Persistent Memory Performance Benchmarking & Comparison. Eden Kim, Calypso Systems, Inc. John Kim, Mellanox Technologies, Inc.

SteelHead Product Family

HYDRAULICS. H89.8D - Hydraulic Bench

Equipment for all types of activities, excitement for one and all!

CT PET-2018 Part - B Phd COMPUTER APPLICATION Sample Question Paper

A Study on Algorithm for Compression and Decompression of Embedded Codes using Xilinx

TECHNICAL COMMERCIAL OFFER TURBOGENERATOR UNITS «TURBOSPHERE»

Energy Savings Through Compression in Embedded Java Environments

How Game Engines Can Inspire EDA Tools Development: A use case for an open-source physical design library

R I C H A R D M A H O N Y Global Director, Consulting Ovum

OPAP S.A. Corporate Presentation November Nikos Polymenakos Investor Relations

Image compression: ER Mapper 6.0 ECW v2.0 versus MrSID 1.3

Numerical simulation of the ALBA s synchrotron cooling system response to pump start-up and shut-down Page 1

KMT Waterjet Systems Inc.

Study on Fire Plume in Large Spaces Using Ground Heating

Kodiak Electric Renewable Energy Implementation. Colin Young System Engineer 4/26/2016

COMPRESSORS WITH SIDE STREAM

Evaluation of Unstructured WAVEWATCH III for Nearshore Application

Angelie EIS System. Electrical Impedance Segmentography. When the smallest thing matters

Transcription:

igh erformance omputing 2012/11/19 Xavier Vigouroux Business Development Manager 1 The views expressed are those of the author and do not reflect the official policy or position of Bull

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull 2

Yes! It s worth to invest 3

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull 4

1996 PhD The views expressed are those of the author and do not reflect the official policy or position of Bull Who am I 5

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull 6

Bull: European leader in mission-critical digital systems recognized worldwide in secure systems OPERATING IN 50 COUNTRIES 1.3bn REVENUES Efforts in research in 2011 +23% +29% growth in profitability in 1 st quarter 2012 +4,6% growth in 2011 7

The views expressed are those of the author and do not reflect the official policy or position of Bull 8

How large is a machine? top500. m. Linpack Ax = b Flops/s 9

Top 3 16 325 Tflops/s 20 132 Tflops/s 1 572 864 PowerPC cores 17 590 Tflops/s 27 112 Tflops/s 299 008 cores AMD 261 632 cores GPU 10 051 Tflops/s 11 280 Tflops/s 705 024 sparc cores 10

TOP500 1,E+09 1,E+08 1,E+07 1,E+06 1,E+05 1,E+04 1,E+03 1,E+02 1,E+01 1,E+00 0 10 20 30 40 50 60 Rpeak Rmax 11

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull 3 petaflop-scale systems TERA 100 CURIE IFERC 1.25 PetaFlops 140 000+ Xeon cores 256 TB memory 30 PB disk storage 500 GB/s IO throughput 580 m² footprint 2 PetaFlops 90 000+ Xeon cores 148 000 GPU cores 360 TB memory 10 PB disk storage 250 GB/s IO throughput 200 m² footprint 1,5 PetaFlops 70 000+ Xeon cores 280 TB memory 15 PB disk storage 120 GB/s IO throughput 200 m² footprint 12

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull 3 large GPU based systems #7 green500 TERA 100 GPU-based extension 198 bullx B505 accelerator blades 396 NVIDIA Tesla M2090 GPU processors 202,752 GPU cores CURIE GPU-based extension 144 bullx B505 accelerator blades 288 NVIDIA Tesla M2090 GPU processors 147,456 GPU cores BSC GPU-based system 126 bullx B505 accelerator blades 252 NVIDIA Tesla M2090 GPU processors 129,024 GPU cores 13

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull Source: http://www.top500.org list nov 12 14

The views expressed are those of the author and do not reflect the official policy or position of Bull 15

Load the supercomputer 16

Ramp up the users Tier 0 1 Petaflop/s 50000 cores European, PRACE TGCC, IFERC Tier 1 100 Teraflop/s 5000 cores National Meso Centres 10 Teraflop/s 500 cores Regional 17

Cooling 12kW/rack 6 C 30KW 30 C 80KW 18

Cooling Water has closed as possible to the heat source Water can be hotter (as delta T is key) Room can be hotter (remove CRAC) But, maintainability is key! No change in maintenance process CPU can be changed, DIMM can be changed Blades can be removed 30 C 80KW 19

% The views expressed are those of the author and do not reflect the official policy or position of Bull Power Feeding 15% 10% PSU Efficiency (240V) 93 92 91 90 89 88 87 86 85 84 83 82 0 50 100 150 200 250 300 Load (V) 20

Efficient Hardware 2 x CPUs 2x GPUs/Xeon Phis 21

J/s Consumed Energy (J) The views expressed are those of the author and do not reflect the official policy or position of Bull Bull - Energy aware batch scheduler Fix the CPU frequency $# srun --cpu-freq=2700000 --resv-ports -N2 -n64./cg.c.64& $# sacct -j 58 -format=jobid,elapsed,avecpufreq,consumedenergy JobID Elapsed AveCPUFreq ConsumedEnergy ------------ ---------- ---------- -------------- 66 00:00:49 2640340 19668 Effective CPU Frequency Job Power consumption 450 400 350 300 250 200 150 0 1000000 2000000 3000000 freq 25000 23000 21000 19000 17000 15000 0 50 100 Duration (s) 22

Use efficient Library Courtesyof Jack DONGARRA MAGAM + starpui 23

The views expressed are those of the author and do not reflect the official policy or position of Bull 24

2 Key elements Computing the function Managing Data Data function Results 25

Data Movement and Floating point FPS-164 and VAX (1976) 11 Mflop/s; transfer rate 44 MB/s Ratio of flops to bytes of data movement: 1 flop per 4 bytes transferred Nvidia Fermi and PCI-X to host 500 Gflop/s; transfer rate 8 GB/s Ratio of flops to bytes of data movement: 62 flops per 1 byte transferred Flop/s are cheap, so are provisioned in excess 26

[DATA] Non Volatile RAM Source: ITRS 27

[DATA] 3D Wioming from CEA leti Wide IO Memory Interface Next Generation Courtesy of CEA leti 28

[DATA] Hybrid Memory Cube Wide I/O standard comes from mobile world True «3D» (with Through Silicon Vias») Bandwidth Energy DDR-3 10,66 GB/s 50-70pJ/bit HMC 128GB/s 8 pj/bit Source: http://www.synopsys.com, http://blogs.intel.com/research/2011/09/15/hmc/, http://www.semiconwest.org/sites/semiconwest.org/files/docs/shekhar%20borkar_intel.pdf 29

[DATA] Photonics Photonic is getting closer to the CPU Basic blocks are in place: Modulator Multiplexer Routing Receiver Off chip Energy cost very close to On chip Energy cost To go deeper: Silicon Photonics for Exascale Computing Keren Bergman, Columbia universisty Lightwave Research Laboratory 30

[FUNCTION] Near threshold Voltage Drawback: surface is increasing http://www.realworldtech.com/near-threshold-voltage/ 31

1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 Frequency (MHz) The views expressed are those of the author and do not reflect the official policy or position of Bull [FUNCTION] CPU frequency 3000 2500 2000? 1500 1000 500? 0 Source: top500 - mean freq 32

Technology node size (nm) The views expressed are those of the author and do not reflect the official policy or position of Bull [FUNCTION] Mask reduction 10000 1000 100 10 1 1960 1970 1980 1990 2000 2010 2020 2030 33

The views expressed are those of the author and do not reflect the official policy or position of Bull 34

TOP500 1,E+09 1,E+08 1,E+07 1,E+06 1,E+05 1,E+04 1,E+03 1,E+02 1,E+01 1,E+00 0 10 20 30 40 50 60 Rpeak Extrap Peak Rmax Extrap Max 35

Is it a long way to Exascale Jun 97 Nov 08 Nov 12 Jun 19 asci red roadrunner Titan Eflop Sandia ORNL ORNL??? MW 0,50 2,48 8,3 20 Eflop max 1,45E-06 0,001 0,027 1 nodes 7264 6480 18688 64000 racks 104 296 200 300 U/rack 30 42 42 42 Concur. 9152 129 600 50 M O(10 9 ) Year factor Year factor Year factor Gflop/W 2,91E-03 1,549 0,4 1,651 3,3 1,519 50,0 Tflop/node 2,00E-04 1,798 0,2 1,708 1,5 1,441 15,6 W/node 68,8 1,161 383,3 1,035 439,3 0,949 312,5 nodes/rack 69,8 0,904 21,9 1,437 93,4 1,135 213,3 W/rack 4807,7 1,050 8390,1 1,487 41045,0 1,077 66666,7 Tflop/U 4,66E-04 1,579 0,09 2,455 3,23 1,637 79,37 W/U 160,3 1,019 199,8 1,487 977,3 1,077 1587,3 36

Technology node size (nm) The views expressed are those of the author and do not reflect the official policy or position of Bull Node evolution : «à la Phi» 10000 1000 100 10 1 1960 1980 2000 2020 2040 Gflop/W 50,0 Tflop/node 15,6 1-2 Phi W/node 312,5 nodes/rack 213,3 Tflop/U 79,37 10 Phi W/U 1587,3 60 cores @22nm => ~500 cores @8nm 1 Tflops @22nm => ~8 Tflops @ 8nm With µarch improvement, it should be less! Phi power consumption will have to be around 150W-300W Between 200 and 400 Phi per Rack 37

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull [#freq] x [#flop/cycle] x [#cores] 1050 MHz 700 MHz 16-64 22 000 000 38

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull Gene Amdahl John Gustafson 39

Stack Application Libraries Runtime Supercomputer Data Center Hurry UP Speed UP Keep the Pace 40

Strong Integration Ressource Manager Data exchange Library Code Resiliency Topology See Fastforward approach See MAGMA [UTK] with StarPU [INRIA] 41

Supercomputer Suite COMPLETE OPEN FLEXIBLE All functionalities o Cluster management o Development factory o Execution environment o Data storage and access All sizes o From department o to Top 10 Best of breed o Linux, o OpenMPI, o HPC Toolkit, o Nagios, o OFED, o Slurm, o Lustre, o Shine,... + Bull added value Modular: Get what you need when you need INTEGRATED Installed, deployed and operated as a single software 42

Applications and Performances team Electro-Magnetics Computational Chemistry Quantum Mechanics Computational Chemistry Molecular Dynamics Computational Biology Structural Mechanics Implicit Structural Mechanics Explicit One Bull expert (PhD) for each segment Seismic Processing Computational Fluid Dynamics Reservoir Simulation Rendering / Ray Tracing Climate / Weather Ocean Simulation Data Analytics 43

44

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull 45

The views expressed The views are expressed those of are the those author of the author and and do do not not reflect the official policy policy or position or position of Bull of Bull 46

For any question xavier.vigouroux@bull.net 47