A history-based estimation for LHCb job requirements

Similar documents
Building an NFL performance metric

A computer program that improves its performance at some task through experience.

Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held

Matrix-analog measure-cerrelatepredict

Smart-Walk: An Intelligent Physiological Monitoring System for Smart Families

AC : MEASUREMENT OF HYDROGEN IN HELIUM FLOW

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag

Modeling Fantasy Football Quarterbacks

Neural Networks II. Chen Gao. Virginia Tech Spring 2019 ECE-5424G / CS-5824

In addition to reading this assignment, also read Appendices A and B.

Model EXAxt AV550G. Averaging Oxygen Analyzer

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Depth Conversion with ISMap

Tier-2. Joint Annual Meeting of ÖPG/SPS/ÖGAA - Innsbruck /48. Austrian Federated WLCG

SIDRA INTERSECTION 6.1 UPDATE HISTORY

Geant4 Electromagnetic Physics for HEP Applications

Provably Secure Camouflaging Strategy for IC Protection

Computationally Efficient Determination of Long Term Extreme Out-of-Plane Loads for Offshore Turbines

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com

LIDAR Correlation to Extreme Flapwise Moment : Gust Impact Prediction Time and Feedforward Control

Delta Compressed and Deduplicated Storage Using Stream-Informed Locality

PC-based systems ELOP II-NT

Estimating Paratransit Demand Forecasting Models Using ACS Disability and Income Data

CS 221 PROJECT FINAL

intended velocity ( u k arm movements

Section I: Multiple Choice Select the best answer for each problem.

CS145: INTRODUCTION TO DATA MINING

BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG

Bayesian Methods: Naïve Bayes

Winter Steelhead Redd to Fish conversions, Spawning Ground Survey Data

Navigate to the golf data folder and make it your working directory. Load the data by typing

Lab 1. Adiabatic and reversible compression of a gas

AN ASSESSMENT OF NEW JERSEY RESIDENT HUNTER OPINION ON CROSSBOW USE

Projecting Three-Point Percentages for the NBA Draft

Lecture 5. Optimisation. Regularisation

SUPPLEMENT MATERIALS

Torrild - WindSIM Case study

Chapter 12 Practice Test

Deconstructing Data Science

A.6 Graphic Files. A Scale-Free Transportation Network Explains the City-Size Distribution. Figure 6. The Interstate route map (abridged).

Novel empirical correlations for estimation of bubble point pressure, saturated viscosity and gas solubility of crude oils

Accuracy and Precision of High-Speed Field Measurements of Pavement Surface Rutting and Cracking

Determining Occurrence in FMEA Using Hazard Function

Lab 1c Isentropic Blow-down Process and Discharge Coefficient

Evaluation of four numerical wind flow models

Predicting Season-Long Baseball Statistics. By: Brandon Liu and Bryan McLellan

Applications of ELCIRC at LNEC

CHAPTER 6 DISCUSSION ON WAVE PREDICTION METHODS

Predicting the Total Number of Points Scored in NFL Games

A Point-Based Algorithm to Generate Final Table States of Football Tournaments

The Best Use of Lockout/Tagout and Control Reliable Circuits

Analysis of Shear Lag in Steel Angle Connectors

Preparation for Salinity Control ME 121

PREDICTING TEXTURE DEFICIENCY IN PAVEMENT MANAGEMENT PREDICTING TEXTURE DEFICIENCY IN PAVEMENT MANAGEMENT

Safety-critical systems: Basic definitions

EVALUATION OF ENVISAT ASAR WAVE MODE RETRIEVAL ALGORITHMS FOR SEA-STATE FORECASTING AND WAVE CLIMATE ASSESSMENT

Performance Visualization of Bubble Sort in Worst Case Using R Programming in Personal Computer

Diagnosis of Fuel Evaporative System

Physical Analysis Model Report

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Wenbing Zhao. Department of Electrical and Computer Engineering

Additional On-base Worth 3x Additional Slugging?

RM-80 respiration monitor

Outline. Terminology. EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 6. Steps in Capacity Planning and Management

Recovery of sea level fields of the last decades from altimetry and tide gauge data

Deconstructing Data Science

CT433 - Machine Safety

Tying Knots. Approximate time: 1-2 days depending on time spent on calculator instructions.

INSTRUMENT INSTRUMENTAL ERROR (of full scale) INSTRUMENTAL RESOLUTION. Tutorial simulation. Tutorial simulation

Predicting NBA Shots

Sensing and Modeling of Terrain Features using Crawling Robots

Software Reliability 1

Should bonus points be included in the Six Nations Championship?

STATE OF NEW YORK PUBLIC SERVICE COMMISSION. At a session of the Public Service Commission held in the City of Albany on August 20, 2003

Wind Flow Modeling Software Comparison

Development and Implementation of a Relocatable Coastal and Nearshore Modeling System

AggPro: The Aggregate Projection System

PROJECTS STREETS

Vacuum Simulations of the KATRIN Experiment

C o d i n g f o r i n t e r a C t i v e d i g i t a l M e d i a

Hydroinformatics in Urban Environment

Nearshore Wind-Wave Forecasting at the Oregon Coast. Gabriel García, H. Tuba Özkan-Haller, Peter Ruggiero November 16, 2011

FMEA- FA I L U R E M O D E & E F F E C T A N A LY S I S. PRESENTED BY: AJITH FRANCIS

AccuRAID iscsi Auto-Tiering Best Practice

THE FUNDAMENTALS OF THE AIR SAMPLER CALIBRATION-VERIFICATION PROCESS

SPLIT HOPKINSON PRESSURE BAR OPERATIONS MANUAL 5/9/2014

If you need to reinstall FastBreak Pro you will need to do a complete reinstallation and then install the update.

Intelligent Decision Making Framework for Ship Collision Avoidance based on COLREGs

The Evolution of Vertical Spatial Coherence with Range from Source

Rogue Wave Statistics and Dynamics Using Large-Scale Direct Simulations

A handy systematic method for data hazards detection in an instruction set of a pipelined microprocessor

y ) s x x )(y i (x i r = 1 n 1 s y Statistics Lecture 7 Exploring Data , y 2 ,y n (x 1 ),,(x n ),(x 2 ,y 1 How two variables vary together

Essen,al Probability Concepts

Ivan Suarez Robles, Joseph Wu 1

Reported walking time and measured distances to water sources: Implications for measuring Basic Service

PREDICTING the outcomes of sporting events

Create a bungee line for an object to allow it the most thrilling, yet SAFE, fall from a height of 3 or more meters.

Machine Learning an American Pastime

Implementing Emergency Stop Systems - Safety Considerations & Regulations A PRACTICAL GUIDE V1.0.0

A Novel Approach to Predicting the Results of NBA Matches

Performance of Fully Automated 3D Cracking Survey with Pixel Accuracy based on Deep Learning

Transcription:

A history-based estimation for LHCb job requirements Nathalie Rauschmayr on behalf of LHCb Computing 13th April 2015

Introduction N. Rauschmayr 2 How long will a job run and how much memory might it need? Production Manager Underestimation: Job is killed. We lose the whole job! Overestimation: What to do with the remaining time

Introduction N. Rauschmayr 3 Well studied problem in High Performance Computing Some recent studies in HEP: CMS, WLCG multicore task force Multicore jobs: good runtime estimates allow better scheduling

Introduction N. Rauschmayr 3 Well studied problem in High Performance Computing Some recent studies in HEP: CMS, WLCG multicore task force Multicore jobs: good runtime estimates allow better scheduling

Introduction N. Rauschmayr 4 A lot of job meta data from past jobs: LHCb bookkeeping

Introduction N. Rauschmayr 4 A lot of job meta data from past jobs: LHCb bookkeeping Hardware specific: CPU model HS06 Cache size RAM Size

Introduction N. Rauschmayr 4 A lot of job meta data from past jobs: LHCb bookkeeping Hardware specific: CPU model HS06 Cache size RAM Size Input File Size Number of events Run number (LHCb RunDB) Trigger Configuration Key Avg. Multiplicity Avg. Luminosity...

Introduction N. Rauschmayr 4 A lot of job meta data from past jobs: LHCb bookkeeping Hardware specific: CPU model Job Start/End time HS06 Site Cache size Worker Node RAM Size Memory Footprint Input File Runtime Size Number of events Run number (LHCb RunDB) Trigger Configuration Key Avg. Multiplicity Avg. Luminosity...

Introduction N. Rauschmayr 4 A lot of job meta data from past jobs: LHCb bookkeeping Hardware specific: CPU model Job Start/End time HS06 Site Cache size Worker Node RAM Size Memory Footprint Input File Runtime Size Number of events Run number (LHCb RunDB) Trigger Configuration Key Avg. Multiplicity Avg. Luminosity...

Our Proposal N. Rauschmayr 5 Automate the prediction procedure based on prior jobs and given job meta data Supervised Learning Reduce false estimates Simplify Production Manager's life

Our Proposal N. Rauschmayr 6

Workload Analysis - Reconstruction Normalized CPU time per event: CPUTime HEPSPECValue NumberOfEvents Approximate a Gaussian distribution: Reprocessing 2011 6,000 Frequency 4,000 2,000 0 0 10 20 30 CPU-work per event in HS06.s

Workload Analysis - Reconstruction N. Rauschmayr 7 Normalized CPU time per event: CPUTime HEPSPECValue NumberOfEvents Approximate a Gaussian distribution: Reprocessing 2011 Reprocessing 2011 Maximum Likelihood 11.71 HS06.s Frequency 6,000 4,000 2,000 Frequency 6,000 4,000 2,000 0 0 10 20 30 CPU-work per event in HS06.s 0 -sigma +sigma 0 10 20 30 CPU-work per event in HS06.s

Workload Analysis - Reconstruction Comparison of 2011 and 2012 workloads: 2.5 104 2 Reprocessing 2011 2 Reprocessing 2012 2.5 104 Maximum Likelihood 17.91 HS06.s Frequency 1.5 1 0.5 Maximum Likelihood 11.71 HS06.s Frequency 1.5 1 0.5 0 -sigma +sigma 0 5 10 15 20 25 CPU-work per event in HS06.s 0 -sigma +sigma 0 10 20 30 CPU-work per event in HS06.s N. Rauschmayr 8

Workload Analysis - Stripping N. Rauschmayr 9 Stripping: Sort reconstructed events into different output streams 1.5 104 Reprocessing 2011 1.5 104 Reprocessing 2012 Maximum Likelihood 5.87 HS06.s Frequency 1 Maximum Likelihood 5.26 HS06.s 0.5 Frequency 1 0.5 0 +sigma +sigma 0 +sigma +sigma 2 4 6 8 10 CPU-work per event in HS06.s 0 2 4 6 8 10 12 CPU-work per event in HS06.s

Detect Most Important Features N. Rauschmayr 10 Certain well known correlations: Beam energy versus event size

Detect Most Important Features N. Rauschmayr 10 Certain well known correlations: Beam energy versus event size Pileup versus complexity of reconstruction Example: Reconstruction (2011 versus 2012)

Detect Most Important Features N. Rauschmayr 11 Avoid overfitting...

Detect Most Important Features N. Rauschmayr 11 Avoid overfitting... Linear regression: runtime per event = Θ 0 + Θ 1 x 1 + Θ 2 x 2 + Θ 3 x 3 + Θ 4 x 4 + Θ 5 x 5 + Θ 6 x 6

Detect Most Important Features Avoid overfitting... Linear regression: runtime per event = Θ 0 + Θ 1 x 1 + Θ 2 x 2 + Θ 3 x 3 + Θ 4 x 4 + Θ 5 x 5 + Θ 6 x 6 Normalize File Size Avg. Event Size HEPSPEC Number Of Events Avg. Luminosity Avg. Multiplicity

Detect Most Important Features Avoid overfitting... Linear regression: runtime per event = Θ 0 + Θ 1 x 1 + Θ 2 x 2 + Θ 3 x 3 + Θ 4 x 4 + Θ 5 x 5 + Θ 6 x 6 Normalize Find best Θ File Size Avg. Event Size HEPSPEC Number Of Events Avg. Luminosity Avg. Multiplicity 0.80 0.19-0.97-1.33 0.67-0.08

Detect Most Important Features Avoid overfitting... Linear regression: runtime per event = Θ 0 + Θ 1 x 1 + Θ 2 x 2 + Θ 3 x 3 + Θ 4 x 4 + Θ 5 x 5 + Θ 6 x 6 Normalize Find best Θ Remove small Θ File Size Avg. Event Size HEPSPEC Number Of Events Avg. Luminosity Avg. Multiplicity 0.80 0.19-0.97-1.33 0.67-0.08 1.05 x -0.97-1.55 0.59 x

Detect Most Important Features N. Rauschmayr 11 Avoid overfitting... Linear regression: runtime per event = Θ 0 + Θ 1 x 1 + Θ 2 x 2 + Θ 3 x 3 + Θ 4 x 4 + Θ 5 x 5 + Θ 6 x 6 Normalize Find best Θ Remove small Θ Evaluate RMSE File Size Avg. Event Size HEPSPEC Number Of Events Avg. Luminosity Avg. Multiplicity 0.80 0.19-0.97-1.33 0.67-0.08 1.05 x -0.97-1.55 0.59 x 2.16 22% better than naive estimator like MLE

Detect Most Important Features N. Rauschmayr 12 Stripping Jobs: Normalize Find best Θ Remove small Θ Evaluate RMSE File Size Avg. Event Size HEPSPEC Number Of Events Avg. Luminosity Avg. Multiplicity -0.19 0.70 0.19 0.23-0.08 0.003-0.18 0.62 0.19 0.27 x x 0.54 25% better than naive estimator like MLE

Supervised Learning N. Rauschmayr 13 Supervised learning: there exist some labelled training data What if only little amount of training data available?

Supervised Learning N. Rauschmayr 14 1. Find similar jobs which have already run 2. Predict requirements for the next k jobs using either (MLE/LR) 3. When k jobs have finished, update prediction formula with the new results obtained 4. Repeat step 2 and 3 until all jobs have finished k Jobs Training Data k Jobs New Test Data

Supervised Learning N. Rauschmayr 14 1. Find similar jobs which have already run 2. Predict requirements for the next k jobs using either (MLE/LR) 3. When k jobs have finished, update prediction formula with the new results obtained 4. Repeat step 2 and 3 until all jobs have finished Error Accumulated Error k Jobs + k Jobs Training Data k Jobs New Test Data

Supervised Learning N. Rauschmayr 14 1. Find similar jobs which have already run 2. Predict requirements for the next k jobs using either (MLE/LR) 3. When k jobs have finished, update prediction formula with the new results obtained 4. Repeat step 2 and 3 until all jobs have finished Error Accumulated Error k Jobs + k Jobs + k Jobs Training Data k Jobs New Test Data

Supervised Learning N. Rauschmayr 14 1. Find similar jobs which have already run 2. Predict requirements for the next k jobs using either (MLE/LR) 3. When k jobs have finished, update prediction formula with the new results obtained 4. Repeat step 2 and 3 until all jobs have finished Accumulated Error Error k Jobs + k Jobs + k Jobs + k Jobs Training Data k Jobs New Test Data

Supervised Learning Root Mean Squared Error: n i=0 (difference i) 2 n, where difference is predicted minus real values Update after 100 jobs Update after 100 jobs Accumulated RMSE 4 3 2 1 MLE LR RMSE for each k 10 8 6 4 2 0 0 0.5 1 1.5 2 2.5 0.5 1 1.5 2 2.5 Job Number 10 5 Job Number 10 5 N. Rauschmayr 15

Conclusion N. Rauschmayr 16 Historical data can help us to predict future jobs Certain meta data are strongly correlated with runtime Improved prediction up to 25% Both models (naive estimator and linear regression) can be easily implemented in Production and support the Production Manager in his work

N. Rauschmayr 17 Questions

Backup Slides N. Rauschmayr 18 Trigger configuration changes over different runs

Backup Slides N. Rauschmayr 19 Stripping Jobs: Memory and Runtime per Event Accumulated Error Update after 100 jobs 7 104 6.5 6 5.5 5 4.5 MLE LR 0.2 0.4 0.6 0.8 1 1.2 Job Number 10 5 Accumulated Error 2 1.5 1 0.5 Update after 100 jobs MLE LR 0 0 0.2 0.4 0.6 0.8 1 1.2 Job Number 10 5

Backup Slides Reconstruction: Memory Footprint 2.5 104 Reprocessing 2011 Reprocessing 2012 2.5 104 Maximum Likelihood 1.76 GB 2 2 Frequency 1.5 1 0.5 Maximum Likelihood 1.6 GB Frequency 1.5 1 0.5 0 -sigma +sigma 0 -sigma+sigma 1.6 1.8 2 1.4 1.6 1.8 2 Memory in kb 10 6 Memory in kb 10 6 N. Rauschmayr 20

Backup Slides Stripping: Memory Footprint Change from POOL to ROOT 2.5 104 Reprocessing 2011 2.5 104 Reprocessing 2012 2 2 Maximum Likelihood 3.4 GB Frequency 1.5 Maximum Likelihood 2.2 GB 1 0.5 Frequency 1.5 1 0.5 0 -sigma +sigma 0 -sigma+sigma 2 2.2 2.4 2.6 2.8 3 3 3.2 3.4 3.6 3.8 4 Memory in kb 10 6 Memory in kb 10 6 N. Rauschmayr 21