Linpack Tuning Method on a Heterogeneous Supercomputer with Hybrid Accelerators

Toshio Endo,1,2 Akira Nukada,1,2 Satoshi Matsuoka1,3,2 and Naoya Maruyama1,2

1 Tokyo Institute of Technology
2 JST, CREST
3 National Institute of Informatics

(c) 2009 Information Processing Society of Japan

We report Linpack benchmark results on the TSUBAME supercomputer, a large-scale heterogeneous system with graphics processing units (GPUs) and ClearSpeed SIMD accelerators. Using about 10,000 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs together, we achieved 87 TFlops. This paper describes the careful tuning and load-balancing methods required to reach this performance. On the other hand, since the peak speed is 163 TFlops, the efficiency is 53%, which is lower than that of other systems. This paper also discusses the reasons for this gap from the viewpoint of system architecture.

1.

Accelerators now used for high-performance computing include the Sony/IBM/Toshiba Cell Broadband Engine, GPUs from NVIDIA and AMD (ATI), and ClearSpeed SIMD accelerators 7),8),9),10). Heterogeneous Linpack runs have been reported on TSUBAME 4),12) and on LANL RoadRunner 5).

Our previous Linpack result on TSUBAME was 77.48 TFlops 11). The result reported here is 87.01 TFlops, which places the system 41st on the Top500 list 3) (Figure 1). With Linpack, the theoretical peak is written R_peak and the measured performance R_max. For TSUBAME, R_peak is 163.2 TFlops, so R_max/R_peak = 87.01/163.2 = 53.3%. The Cell-based RoadRunner reaches 76%.
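The efficiency figure quoted above is a simple ratio, and it can be checked directly. The following is a minimal sketch using only the two numbers given in the text (87.01 TFlops and 163.2 TFlops); the function and variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the text; names are illustrative assumptions.
R_MAX_TFLOPS = 87.01    # measured Linpack performance (R_max)
R_PEAK_TFLOPS = 163.2   # theoretical peak performance (R_peak)


def efficiency(r_max: float, r_peak: float) -> float:
    """Return the Linpack efficiency R_max / R_peak as a fraction."""
    if r_peak <= 0:
        raise ValueError("r_peak must be positive")
    return r_max / r_peak


tsubame_eff = efficiency(R_MAX_TFLOPS, R_PEAK_TFLOPS)
print(f"TSUBAME efficiency: {tsubame_eff:.1%}")
```

Note that the ratio is R_max over R_peak, not the reverse, so it is always at most 1.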
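Figure 1: TSUBAME Linpack results on the Top500.

Figure 2: Network of the TSUBAME supercomputer. Voltaire InfiniBand switches #1 to #5 each serve 120 nodes and switch #6 serves 55 nodes; switches C1 and C2 and the tsubasa cluster are also shown. Link labels: IB (InfiniBand) x24, IB x4.

2. TSUBAME

TSUBAME consists of 655 SunFire nodes connected by InfiniBand.

TSUBAME node: each node has dual-core 2.4GHz Opteron CPUs with 16 CPU cores in total, 32GB of memory and two InfiniBand host channel adapters (HCAs). The operating system is 64bit SuSE Linux Enterprise Server 10. The I/O slots are PCI-X and PCI-Express 1.0 x8.

InfiniBand: each node has two 10Gbps SDR InfiniBand links, connected to 288-port Voltaire switches (Figure 2). The message passing interface (MPI) library is Voltaire MPI.

ClearSpeed: each node has a ClearSpeed X620 board 1) on PCI-X. A board carries two CSX600 SIMD processors and 1GB of DRAM (6.4GBytes/s). A CSX600 has 96 processing elements (PEs) of 420MFlops each, so the peak of a board is 80.64GFlops. The PCI-X bandwidth is 1.06GBytes/s and the power consumption is 25W. Programs are written in the C^n language; the CSXL and CSFFT libraries are provided.

Tesla: an NVIDIA Tesla S1070 is a 1U unit containing four GPUs. 316 TSUBAME nodes are each connected to two GPUs through PCI-Express gen1 x8. A Tesla T10 GPU has 30 streaming multiprocessors (SMs) and 4GB of memory with 102GBytes/s of bandwidth. Its peak is 86.4GFlops in double precision and 1.04TFlops in single precision. An S1070 consumes 700W, or 175W per GPU. Programs are written in NVIDIA CUDA C; the CUBLAS (BLAS) and CUFFT libraries are provided.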
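The accelerator figures above are related by simple arithmetic. The sketch below recomputes the ClearSpeed board peak and the per-GPU power from the quoted numbers. It assumes that the 420MFlops figure is per processing element and that a board holds two CSX600 chips; that reading reproduces the stated 80.64GFlops. All names are illustrative.

```python
# Quoted hardware figures; the per-PE reading of 420 MFlops is an assumption
# that happens to reproduce the board peak stated in the text.
PE_MFLOPS = 420        # double-precision MFlops per processing element
PES_PER_CHIP = 96      # PEs in one CSX600
CHIPS_PER_BOARD = 2    # CSX600 chips on one X620 board

S1070_WATTS = 700      # power of one Tesla S1070 unit
GPUS_PER_S1070 = 4     # GPUs in one S1070 unit


def board_peak_gflops(pe_mflops: float, pes: int, chips: int) -> float:
    """Peak of one accelerator board in GFlops."""
    return pe_mflops * pes * chips / 1000.0


clearspeed_peak = board_peak_gflops(PE_MFLOPS, PES_PER_CHIP, CHIPS_PER_BOARD)
watts_per_gpu = S1070_WATTS / GPUS_PER_S1070

print(f"ClearSpeed X620 peak: {clearspeed_peak:.2f} GFlops")
print(f"Tesla power per GPU:  {watts_per_gpu:.0f} W")
```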
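tsubasa: the tsubasa Xeon cluster is connected to TSUBAME by 20 InfiniBand links (200Gbps). It consists of 90 Sun Blade X6250 nodes. Each node has two quad-core Xeon E5440 (2.83GHz) CPUs, that is 8 CPU cores, 8GB (or 16GB) of memory, and SDR InfiniBand (10Gbps).

2.1

The system therefore contains four kinds of processors (Opteron, ClearSpeed, NVIDIA Tesla and Xeon) and three kinds of nodes:

Tesla TSUBAME node: 16 Opteron cores, 1 ClearSpeed, 2 Tesla GPUs
non-Tesla TSUBAME node: 16 Opteron cores, 1 ClearSpeed
tsubasa node: 8 Xeon cores

Figure 3: The three kinds of nodes.

3. Linpack

We use High Performance Linpack (HPL) 6), an MPI-parallel implementation of Linpack that solves a dense linear system of order N and reports the speed in Flops. The N x N matrix is divided into blocks of size B and distributed over a P x Q process grid. The kth step factorizes the kth panel (LU), communicates it among processes, and updates the rest of the matrix. Over the whole run the costs are O(N^2 B), O(N^2 (P + Q)) and O(N^3). The O(N^3) term is the matrix multiplication (DGEMM) of the update, which is executed by BLAS on the CPUs, ClearSpeed and Tesla.

4.

4.1

Heterogeneous Linpack on TSUBAME has been described in 11),12). RoadRunner 5) combines Opteron CPUs with Cell (PowerXCell 8i) processors and achieved 1.105PFlops in Linpack. In RoadRunner, 96% of the peak performance comes from Cell. In TSUBAME, the CPUs (Opteron and Xeon) account for 35% of the peak, ClearSpeed for 32% and Tesla for 33%, so all processor types have to take part in the computation.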
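To make the HPL structure above concrete, the following sketch shows two things: the standard two-dimensional block-cyclic rule that maps a matrix block to a process in the P x Q grid, and a count of the DGEMM flops in the trailing update, which is the O(N^3) term. It assumes 2 flops per multiply-add and a square trailing matrix at every step; the function names are illustrative and this is not HPL's actual code.

```python
def owner(block_row: int, block_col: int, P: int, Q: int) -> tuple[int, int]:
    """Process-grid coordinates owning a block under 2D block-cyclic layout."""
    return (block_row % P, block_col % Q)


def update_flops(N: int, B: int) -> int:
    """Total DGEMM flops of the trailing-matrix updates over all steps.

    After panel k is factorized, an (N - k*B) x (N - k*B) trailing matrix
    is updated with a rank-B product, costing about 2 * m * m * B flops.
    """
    total = 0
    for k in range(1, N // B + 1):
        m = N - k * B
        total += 2 * m * m * B
    return total


N, B = 115_200, 1152            # example sizes; B = 1152 is the value in the text
dgemm = update_flops(N, B)
leading_term = 2 * N**3 / 3     # leading term of the LU flop count
print(owner(5, 7, P=2, Q=3))
print(f"update / (2/3 N^3) = {dgemm / leading_term:.3f}")
```

With 100 block steps the update already accounts for nearly all of the (2/3)N^3 leading term, which is why DGEMM speed dominates the Linpack result.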
4.2

In RoadRunner the Cell processors have 16GB of memory. A TSUBAME node has 32GB of host memory, whereas its two Tesla GPUs and one ClearSpeed board have only 4GB+4GB+1GB. The matrix is therefore kept in host memory, and DGEMM on the accelerators requires data transfer over PCI-X or PCI-Express (PCI below).

For a DGEMM that multiplies an (M x B) matrix by a (B x N) matrix, the amount of computation is O(M N B), while the amount of PCI communication between the CPU and the accelerator is O(M N + M B + N B). A larger B thus reduces the relative cost of communication. Because ClearSpeed favors multiples of 288, we use B = 1152. RoadRunner uses B = 128, and TSUBAME with Opteron CPUs only uses B = 240.

4.3

Tesla TSUBAME nodes run 4 MPI processes, non-Tesla TSUBAME nodes run 2, and tsubasa nodes run 1. Threads are bound to CPU cores with sched_setaffinity.

Figure 4: Use of the 16 Opteron cores on a Tesla TSUBAME node with 1 ClearSpeed and 2 Tesla GPUs. 12 cores execute DGEMM; dedicated cores are used for PCI communication.
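The effect of the block size B on the balance between computation and PCI traffic can be illustrated numerically. The sketch below takes the two expressions from the text, O(M N B) for computation and O(M N + M B + N B) for transfer, and assumes constant factors of 2 flops per multiply-add and one transferred word per matrix element. Those constants and the example sizes M = N = 10,000 are assumptions for illustration only.

```python
def flops_per_word(M: int, N: int, B: int) -> float:
    """Flops performed per matrix element moved over PCI for one DGEMM.

    Computation: 2*M*N*B flops for (M x B) * (B x N).
    Transfer:    M*N + M*B + N*B elements (the three matrices involved).
    """
    return 2 * M * N * B / (M * N + M * B + N * B)


M = N = 10_000  # illustrative local matrix sizes
for label, B in [("RoadRunner", 128), ("TSUBAME, Opteron only", 240),
                 ("TSUBAME, heterogeneous", 1152)]:
    print(f"{label:24s} B={B:5d}  flops/word = {flops_per_word(M, N, B):8.1f}")
```

For M and N much larger than B the ratio approaches 2B, so raising B from 240 to 1152 lets each word sent over the slow PCI link feed several times more arithmetic.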
5. Linpack

5.1

The Linpack run on TSUBAME uses Voltaire MPI and GCC. The BLAS libraries are GotoBLAS 1.26 for Opteron and Xeon, CSXL 3.11 for ClearSpeed, and DGEMM/DTRSM routines for Tesla. The following nodes are used:

(1) Tesla TSUBAME nodes: 312
(2) non-Tesla TSUBAME nodes: 336
(3) tsubasa nodes: 80

The measured performance is R_max = 87.01TFlops, which ranks 41st on the Top500. DGEMM with GotoBLAS runs at 4.48GFlops per core on Opteron and 10.74GFlops per core on Xeon. Of the peak R_peak = 163.2TFlops, the CPUs (Opteron and Xeon) account for 35%, ClearSpeed for 32% and Tesla for 33%. The ratio R_max/R_peak is 53.3%, compared with 76% for RoadRunner.

5.2

To examine the gap between R_max and R_peak on TSUBAME, we compare DGEMM performance at the core and node level with Linpack performance, taking PCI communication into account (Figures 5 and 6).
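The node counts in Section 5.1 can be cross-checked against the processor totals quoted in the abstract. The sketch below multiplies the number of nodes of each kind by the per-node configuration given earlier (16 Opteron cores, 1 ClearSpeed and 2 Tesla GPUs on a Tesla TSUBAME node, and so on). The dictionary layout and names are illustrative.

```python
# Nodes used in the run, as listed in Section 5.1.
NODES = {"tesla_tsubame": 312, "non_tesla_tsubame": 336, "tsubasa": 80}

# Per-node resources for each kind of node.
PER_NODE = {
    "tesla_tsubame":     {"opteron_cores": 16, "xeon_cores": 0, "clearspeed": 1, "tesla": 2},
    "non_tesla_tsubame": {"opteron_cores": 16, "xeon_cores": 0, "clearspeed": 1, "tesla": 0},
    "tsubasa":           {"opteron_cores": 0,  "xeon_cores": 8, "clearspeed": 0, "tesla": 0},
}


def totals(nodes: dict[str, int], per_node: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum each resource over all nodes of every kind."""
    result: dict[str, int] = {}
    for kind, count in nodes.items():
        for resource, amount in per_node[kind].items():
            result[resource] = result.get(resource, 0) + count * amount
    return result


system = totals(NODES, PER_NODE)
for resource, amount in system.items():
    print(f"{resource:14s} {amount}")
```

The totals come out to 10,368 Opteron cores, 640 Xeon cores, 648 ClearSpeed boards and 624 Tesla GPUs, in agreement with "about 10,000 Opteron cores, 640 Xeon cores, 648 ClearSpeed accelerators and 624 NVIDIA Tesla GPUs".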
Figures 5 and 6 compare four levels: core DGEMM, node DGEMM, node hetero, and Linpack. A further plot shows DGEMM speed against BLAS size.

DGEMM on one Tesla GPU runs at 80.33GFlops on-board, that is, excluding PCI communication. DGEMM with ClearSpeed CSXL 3.11 runs at 32.15GFlops on-board per CSX600. Relative to peak, on-board DGEMM reaches 93 to 95% on Opteron, Xeon and Tesla, and 80% on ClearSpeed.

The per-node speeds are 262.4GFlops for a Tesla TSUBAME node, 121.2GFlops for a non-Tesla TSUBAME node and 85.86GFlops for a tsubasa node. A Tesla TSUBAME node runs four processes and a non-Tesla node two; scaling the non-Tesla figure accordingly gives 121.2/2 x 4 = 242.4GFlops, which is 20GFlops below the 262.4GFlops of a Tesla TSUBAME node.

The factors examined are the PCI communication needed by DGEMM on the accelerators, the Opteron cores dedicated to that communication, the block size (B = 240 for Opteron only, B = 1152 for the heterogeneous run), DTRSM, and MPI communication in Linpack.
Figure 5: TSUBAME, heterogeneous run (B = 1152).

Figure 6: TSUBAME, Opteron only (B = 240).

6.

Using about 10,000 CPU cores together with roughly 1,200 Tesla and ClearSpeed accelerators on TSUBAME, we achieved 87.01TFlops in Linpack. The efficiency is lower than that of RoadRunner, and PCI communication is one of the factors discussed.

Acknowledgments: COE program; JST CREST "ULP-HPC"; NVIDIA Corp.; ClearSpeed Inc.

References

1) ClearSpeed Technology Inc.
2) NVIDIA CUDA Documentation. develop.html
3) TOP500 supercomputer sites.
4) Toshio Endo and Satoshi Matsuoka. Massive supercomputing coping with heterogeneity of modern accelerators. In Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS08), 10 pages.
5) Michael Kistler, John Gunnels, Daniel Brokenshire, and Brad Benton. Petascale computing with accelerators. In Proceedings of ACM SIGPLAN Principles and Practice of Parallel Computing (PPoPP09).
6) A. Petitet, R. C. Whaley, J. Dongarra, and A. Cleary. HPL - a portable implementation of the high-performance Linpack benchmark for distributed-memory computers.
7) James C. Phillips, John E. Stone, and Klaus Schulten. Adapting a message-driven parallel application to GPU-accelerated clusters. In Proceedings of IEEE SC08.
8) Jeff Stuart and John Owens. Message passing on data-parallel architectures. In Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS09).
9) GPU 32GPU 700GFLOPS. In 2009-HPC-120, 6 pages.
10) CUDA poisson. In 2008-HPC-115, pages 19-23.
11) linpack. In 2009-ARC-182/HPC-119 (HOKKE2009), pages 85-90.
12) TSUBAME Linpack. 48(SIG 8 (ACS 18)):62-70.
Symantec Storage Foundation and High Availability Solutions 6.1, 6.1.1, 6.2, 6.2.1, Dynamic Multi-Pathing for VMware 6.1, 6.2, and Veritas Access 7.1, 7.2 Hardware Compatibility List Copyright 2017 Veritas
More informationInformation Systems ISM 3011
Thomson Course Technology 1 Information Systems ISM 3011 Telecommunications and Networks Unit 6A Chapter 6 Dr. Martin Hepp 1 Dr. Martin Hepp 2 Principles and Learning Objectives Effective communication
More informationGeneration of See-Through Baseball Movie from Multi-Camera Views
Generation of See-Through Baseball Movie from Multi-Camera Views Takanori Hashimoto #1, Yuko Uematsu #2, Hideo Saito #3 # Keio University 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, 223-8522 Japan 1 takanori@hvrl.ics.keio.ac.jp
More informationOptical Time Domain Reflectometer. Operating System 2 Windows XPe Windows XP Pro with desktop option 256 MB
CMA 4500 Series Optical Time Domain Reflectometer The CMA 4500 continues NetTest s tradition as the industry s premier OTDR solution designed with the high performance and scalability necessary to meet
More informationFebruary 22-27, 2010, Zurich, Switzerland
3 rd KNIME Users Meeting and Workshop February 22-27, 2010, Zurich, Switzerland Monday February 22, 11:00 am 6:00 pm, Technopark Zurich, Room Fortran KNIME Users Training Day 1 Start End Topic 11:00 am
More informationOVERPRESSURE DUE TO A MOLTEN ALUMINUM AND WATER EXPLOSION IN A CASTHOUSE
OVERPRESSURE DUE TO A MOLTEN ALUMINUM AND WATER EXPLOSION IN A CASTHOUSE Abstract Jennifer Woloshyn 1, Andrew Gerber 2, Tom Plikas 1, Duane Baker 1, Adam Blackmore 1 1 Hatch Ltd., 28 Speakman Drive, Mississauga
More informationPerformance of Fully Automated 3D Cracking Survey with Pixel Accuracy based on Deep Learning
Performance of Fully Automated 3D Cracking Survey with Pixel Accuracy based on Deep Learning Kelvin C.P. Wang Oklahoma State University and WayLink Systems Corp. 2017-10-19, Copenhagen, Denmark European
More informationSmart Cars for Safe Driving
Smart Cars for Safe Driving Prof. Dr. Dariu M. Gavrila Environment Perception Group Research and Advanced Engineering XXXII Jornadas de Automática, Sevilla, 9-9-2011 We originally thought Machine Intelligence
More informationSmart Data Role computers play in Technology
Smart Data Role computers play in Technology October 30 th 2015 Sizzle Video 2016 2 Introduction: Will Phillips INDYCAR Vice President of Technology Daniel Louks INDYCAR Support Engineer Smart Data Role
More informationArithmetic Coding Modification to Compress SMS
G8-5 2011 International Conference on Electrical Engineering and Informatics 17-19 July 2011, Bandung, Indonesia Arithmetic Coding Modification to Compress SMS Ario Yudo Husodo #1, Rinaldi Munir *2 #*
More information