Distributed Power Management: Technical Deep Dive + Real World Example

Distributed Power Management: Technical Deep Dive + Real World Example
Breakout Session # TA2197
Anne Holler, VMware Engineering
Anthony Vecchiolla, International Integrated Solutions
Date: September 18, 2008

Disclaimer: This session may contain product features that are currently under development. This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new technologies or features discussed or presented have not been determined. These features are representative of feature areas under development.

Virtual Datacenter OS from VMware

Outline of Talk
- Who needs Distributed Power Management?
- What is DPM? (Fully supported in 2009)
- How does DPM operate? Usage; Algorithm
- Where might DPM be extended in the future?
- Why use DPM? Spoilers!
  - Theory: 30-35% power savings in a mainstream production cluster
  - Real World Example: Tony got 73% with DPM in a test/dev cluster

Who Needs Distributed Power Management?
- Data center power consumption is an increasingly important concern; issues include cost and environmental impact
- Server consolidation via virtualization can result in large power savings
  - Highlighted in the EPA report to the US Congress for Public Law 109-431
  - Effectiveness recognized by power utilities (e.g., PG&E virtualization rebate)
- After server consolidation, there is an additional opportunity to save power
  - VMware Virtual Infrastructure (VI) allows flexible use of capacity
  - Distributed Power Management (DPM) leverages VI to save power

What is Distributed Power Management (DPM)?
- Right-sizes cluster capacity in response to reduced demand
- Consolidates virtual machines (VMs) onto fewer hosts & powers hosts off when demand is low
- Powers hosts back on when needed to meet workload demand or to satisfy constraints
- Optional add-on to Distributed Resource Scheduler (DRS)
[Figure: DRS cluster with DPM enabled]

What is Distributed Power Management?
- DPM is integrated with DRS
  - Works with load balancing
  - Respects QoS policies
  - No disruption or downtime to VMs
- DRS manages cluster resource constraints & objectives:
  - Maps VM & resource pool SLAs onto ESX hosts in a cluster
  - Recommends VM initial placement & migration for load balancing
- DRS and DPM interoperate with High Availability (HA), which provisions and provides host & VM failover capability

Outline of Talk
- Who needs DPM?
- What is DPM?
- How does DPM operate? Usage; Algorithm
- Where might DPM be extended in the future?
- Why use DPM? Theory; Real World Example

How Does DPM Operate? Host Enter/Exit Standby
- Powers hosts completely off; idle hosts burn a significant percentage of peak power
- Powered-off hosts are in standby mode, available when needed
- In VC 2.5 / ESX 3.5, DPM operates on ESX 3.5+ hosts that can be awakened from ACPI S5 by Wake-on-LAN (WOL) packets
  - Sent on the VMotion networking interface by another ESX 3.5+ host (see the sketch below)
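To make the wake mechanism concrete, here is a minimal, self-contained sketch of a Wake-on-LAN magic packet sender. This is purely illustrative and is not VMware's implementation; the MAC address shown is hypothetical. A magic packet is six 0xFF bytes followed by the target MAC address repeated 16 times, broadcast over UDP.

```python
# Illustrative WOL sketch; in DPM, another ESX 3.5+ host sends the packet
# on the VMotion network interface.
import socket

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send a WOL magic packet: 6 bytes of 0xFF followed by the target
    MAC address repeated 16 times, broadcast over UDP."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    payload = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(payload, (broadcast, port))

# Hypothetical MAC of a standby host's VMotion NIC:
send_wol("00:50:56:aa:bb:cc")
```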

How Does DPM Operate? Enabling/Disabling DPM
- DPM can be enabled or disabled for the cluster; it is disabled by default
- When enabled, DPM can operate in manual or automatic mode
- DPM can be set to disabled, manual, or automatic on a per-host basis
  - Per-host settings apply only when DPM is enabled for the cluster (see the sketch below)
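A tiny sketch of the precedence rule just described; the setting names are illustrative, not a VMware API. The point is that a per-host override only matters while cluster-level DPM is enabled.

```python
# Illustrative precedence of cluster vs. per-host DPM settings.
from typing import Optional

def effective_dpm_mode(cluster_enabled: bool, cluster_mode: str,
                       host_override: Optional[str]) -> str:
    """Per-host settings apply only when DPM is enabled for the cluster."""
    if not cluster_enabled:
        return "disabled"
    return host_override if host_override is not None else cluster_mode

assert effective_dpm_mode(False, "automatic", "manual") == "disabled"
assert effective_dpm_mode(True, "automatic", None) == "automatic"
assert effective_dpm_mode(True, "automatic", "disabled") == "disabled"
```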

How Does DPM Operate? Recommendation Ratings
- Host power-off recommendations: 1 to 4 stars (aka priorities 5..2)
  - A higher star rating means larger unused powered-on cluster capacity, hence a more attractive opportunity for power savings
- Host power-on recommendations: 3 to 5 stars (aka priorities 3..1)
  - 5 stars (priority 1): meet HA or user-specified minimum powered-on capacity
  - 3 or 4 stars (priority 3 or 2): address high host utilization, with the higher rating meaning host utilization closer to saturation
- DPM discards recommendations rated below the specified threshold (sketched below)
  - The default DPM recommendation threshold is 1 star in VC 2.5
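A minimal sketch of the stars-to-priority mapping and the threshold filtering described above; as the ranges imply, stars and priority always sum to 6. Function names are illustrative, not VMware code.

```python
# Stars and priorities from the slide: 1 star = priority 5, ...,
# 5 stars = priority 1, so priority = 6 - stars.
def stars_to_priority(stars: int) -> int:
    assert 1 <= stars <= 5
    return 6 - stars

def accept_recommendation(stars: int, threshold_stars: int = 1) -> bool:
    # DPM discards recommendations rated below the configured threshold;
    # the default threshold in VC 2.5 is 1 star.
    return stars >= threshold_stars

assert stars_to_priority(5) == 1   # mandatory: minimum capacity / HA
assert stars_to_priority(1) == 5   # weakest power-off opportunity
```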

How Does DPM Operate? Algorithm Overview
- Goal: keep utilization of hosts in the target range, subject to constraints specified by DRS, HA, and DPM's operating parameters
- Considers recommending host power-on operations when there are hosts whose utilization is above the target range
- Considers recommending host power-off operations when there are hosts whose utilization is below the target range
- DPM runs as part of the periodic (default: every 5 minutes) DRS invocation, after core DRS cluster analysis and rebalancing completes (see the sketch below)
- DRS itself may recommend host power-ons if needed for migrations to address HA or DRS constraint violations, to handle user requests involving host evacuation, or to place VMs on hosts for VM power-on
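The control flow just described can be summarized in a short sketch. The Host model, threshold constants, and return values are illustrative stand-ins for the published description, not VMware code; the 45%/81% defaults come from the next slide.

```python
# Where DPM fits in the periodic DRS invocation (core DRS analysis and
# rebalancing would run before this point).
from dataclasses import dataclass

@dataclass
class Host:
    utilization: float  # demand / capacity, defined on the next slides

TARGET_LOW, TARGET_HIGH = 0.45, 0.81  # default 63% +/- 18% range

def dpm_pass(powered_on_hosts):
    actions = []
    if any(h.utilization > TARGET_HIGH for h in powered_on_hosts):
        actions.append("consider host power-ons")   # relieve high utilization
    if any(h.utilization < TARGET_LOW for h in powered_on_hosts):
        actions.append("consider host power-offs")  # reclaim idle capacity
    return actions

print(dpm_pass([Host(0.90), Host(0.50)]))  # -> ['consider host power-ons']
```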

How Does DPM Operate? Evaluating Utilization
- DPM aims to keep each host's CPU and memory resource utilization within a (default) range of 45% to 81% (63% +/- 18%)
- Resource utilization = demand / capacity (worked below):
  - Demand = total CPU or memory resource needed by the VMs running on the host
  - Capacity = total CPU or memory resource available on the host for running VMs
- Demand includes actual usage plus an estimate of unsatisfied demand
- If a host resource is heavily contended, utilization can exceed 100%
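A worked example of the utilization definition above, with made-up numbers:

```python
# Utilization as defined on the slide: demand / capacity per resource.
# Because demand includes an unsatisfied-demand estimate, utilization
# can exceed 100% under contention.
def utilization(demand: float, capacity: float) -> float:
    return demand / capacity

# e.g., 30 GHz of total VM CPU demand on a host with 24 GHz available:
print(utilization(30.0, 24.0))  # 1.25, i.e. 125%: heavily contended
```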

How Does DPM Operate? Utilization, continued
- Host demand = sum, across the host's running VMs, of each VM's average demand over the historical period + 2 standard deviations (capped at the maximum observed), as sketched below
- Average demand is used to ensure the demand value is not anomalous
- The period DPM considers when evaluating demand that may lead to host power-on is the last 5 minutes; for host power-off it is the last 40 minutes
  - Responds relatively rapidly to increases in composite VM demand
  - Responds relatively slowly to decreases in composite VM demand
- Two standard deviations above average provide wide coverage of the probable demand range, based on past demand during the period of interest
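A minimal sketch of the per-VM demand estimate described above, assuming a simple list of demand samples; the exact statistics VMware uses (e.g., sample vs. population standard deviation) are not specified here, so this is an approximation of the published description.

```python
# Per-VM demand estimate: average over the history window plus two
# standard deviations, capped at the maximum observed value.
from statistics import mean, pstdev

def vm_demand_estimate(samples):
    return min(mean(samples) + 2 * pstdev(samples), max(samples))

# Host demand is the sum of the per-VM estimates. Power-on decisions use
# the last 5 minutes of samples; power-off decisions use the last 40
# minutes, so demand drops are trusted more slowly than demand spikes.
host_demand = sum(vm_demand_estimate(s)
                  for s in ([1.0, 1.2, 2.0], [0.4, 0.5, 0.6]))
print(host_demand)  # 2.6 (both VM estimates hit their observed-max caps)
```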

How Does DPM Operate? Host Power-On Decision
- If host utilization is high, DPM iterates through standby hosts
- Runs DRS in what-if mode to rebalance VMs between hosts, assuming the standby host is powered on
- Quantifies the impact of powering on the host with respect to reducing the number of highly-utilized hosts and/or the distance above target utilization
- Computes a per-resource score, denoted highscore = sum of the weighted distance above target utilization for each host above target (sketched below)
- Compares the highscore for the cluster with & without the host powered on
- If the highscore is stably improved for the cluster with the standby host powered on, DPM generates a power-on recommendation for the host
- When comparing highscores, if memory is overcommitted on hosts, DPM gives a reduction in memory utilization higher importance than the impact on CPU
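A sketch of the highscore comparison, with uniform weights and invented utilization numbers; the real weighting (e.g., favoring memory when it is overcommitted) is not reproduced here.

```python
# Per-resource highscore: sum of weighted distances above the target
# utilization, over the hosts above target. Lower is better.
TARGET_HIGH = 0.81

def highscore(utilizations, weights=None):
    weights = weights or [1.0] * len(utilizations)
    return sum(w * (u - TARGET_HIGH)
               for u, w in zip(utilizations, weights) if u > TARGET_HIGH)

before = highscore([0.95, 0.90, 0.50])       # two hosts above target: 0.23
after = highscore([0.70, 0.72, 0.50, 0.40])  # what-if: standby host on: 0.0
power_on_worthwhile = after < before          # True: recommend power-on
```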

How Does DPM Operate? Host Power-On, continued
- Continues iterating through standby hosts as long as there are hosts exceeding the target utilization range for CPU or memory
- Then recommends powering on any additional hosts needed to reach the minimum powered-on CPU or memory resources, which may be specified by HA, optionally set by the user, or defined by default
- Hosts powered on solely to reach the specified minimum amount of CPU or memory resources are not needed to accommodate the VMs currently running in the cluster, and may be idle

How Does DPM Operate? Host Power-off Decision
- If host utilization is low, DPM iterates through powered-on hosts
- Runs DRS in what-if mode to rebalance VMs between hosts, assuming the running host is powered off
- Quantifies the impact of powering off the host with respect to reducing the number of lightly-utilized hosts and/or the distance below target utilization
- Computes a per-resource score, denoted lowscore = sum of the weighted distance below target utilization for each host below target (sketched below)
- Compares the lowscore for the cluster with & without the host powered off
- If the lowscore improves for the cluster with the host powered off & the highscore for the cluster is not made worse, DPM generates a recommendation to power off the host, along with any needed prerequisite migrations of VMs off of that host
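A sketch of the combined acceptance test just described: the what-if lowscore must improve while the highscore must not get worse. The thresholds are the defaults from the earlier slide; the per-host weighting is omitted for brevity, so this is illustrative only.

```python
# Power-off acceptance: lowscore improves AND highscore does not worsen.
TARGET_LOW, TARGET_HIGH = 0.45, 0.81

def lowscore(utils):
    return sum(TARGET_LOW - u for u in utils if u < TARGET_LOW)

def highscore(utils):
    return sum(u - TARGET_HIGH for u in utils if u > TARGET_HIGH)

def accept_power_off(utils_before, utils_after):
    return (lowscore(utils_after) < lowscore(utils_before)
            and highscore(utils_after) <= highscore(utils_before))

# Three lightly-loaded hosts vs. a what-if with one host evacuated & off:
print(accept_power_off([0.30, 0.35, 0.20], [0.45, 0.50]))  # True
```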

How Does DPM Operate? Host Power-off, continued
- Continues iterating through powered-on hosts as long as there are any hosts below the target utilization range for CPU and any below it for memory
- Several additional factors are considered when placing a host in standby:
  - DPM rejects power-offs if DRS is in its most conservative setting
  - DPM rejects power-offs that would take powered-on capacity below the minimum specified by HA or the user; by default, it keeps one host on
  - DPM rejects a power-off if the conservatively-projected power-savings benefit doesn't exceed the potential risk-adjusted performance cost by a specified multiplier, as described on the next slide

How Does DPM Operate? Host Power-off Cost/Benefit
- Host power-off has a number of potential associated costs, including:
  - Cost of migrating any running VMs off the host
  - Loss of the host's resources during the powering-off period
  - Power consumed during the powering-off period
  - Performance loss if resources become needed to meet demand while the host is off
  - Loss of the host's resources during its subsequent powering-on period
  - Power consumed during the powering-on period
  - Cost of migrating VMs back onto the host after it is powered on
- For a host to be powered off, DPM compares the above costs, weighted by their risks, with a projection of the power-savings benefit (see the sketch below)
- Estimates are made conservative by incorporating workload stability and recent worst-case demand
- DPM rejects the power-off unless the benefit exceeds the cost by a configurable factor
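A sketch of the cost/benefit gate, reduced to its essential comparison. The benefit factor shown is an invented placeholder, not the actual DPM default, and the two inputs stand in for the risk-weighted cost items and conservative savings projection listed above.

```python
# Power-off proceeds only if projected savings exceed risk-weighted
# costs by a configurable factor (value here is a placeholder, not the
# real DPM default).
def accept_power_off_cost_benefit(projected_savings: float,
                                  risk_weighted_costs: float,
                                  benefit_factor: float = 10.0) -> bool:
    return projected_savings > benefit_factor * risk_weighted_costs

print(accept_power_off_cost_benefit(projected_savings=500.0,
                                    risk_weighted_costs=20.0))  # True
```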

How Does DPM Operate? Host Sort for Power-on/-off
- For both power-on and power-off, hosts in DPM automatic mode are considered before hosts in DPM manual mode (sort order sketched below)
- Hosts at the same DPM automation level are considered in capacity order: first the more critical resource (CPU or memory), then the other resource
  - Larger-capacity hosts are favored for power-on, smaller for power-off
- For power-off, hosts at the same automation level and capacity are considered in order of lower VM evacuation cost
- For power-off, hosts matching on all the above are considered in randomized order, to spread selection across hosts for a wear-leveling effect (the EPA report does not find published data showing that power-cycling engenders wear)
- DPM may sort on other factors (e.g., host power efficiency) in the future
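The power-off consideration order can be expressed as a Python sort key, per the description above; the field names are illustrative, and the random tiebreak models the wear-leveling randomization. (For power-on, larger capacity would sort first instead.)

```python
# Power-off consideration order: automatic-mode hosts first, then
# smaller capacity (of the more critical resource), then lower VM
# evacuation cost, then a random tiebreak for wear leveling.
import random
from dataclasses import dataclass, field

@dataclass
class HostInfo:
    name: str
    automatic: bool
    capacity: float        # capacity of the more critical resource
    evacuation_cost: float
    tiebreak: float = field(default_factory=random.random)

def power_off_order(hosts):
    return sorted(hosts, key=lambda h: (not h.automatic, h.capacity,
                                        h.evacuation_cost, h.tiebreak))

hosts = [HostInfo("a", True, 32.0, 2.0), HostInfo("b", False, 16.0, 1.0),
         HostInfo("c", True, 16.0, 1.0)]
print([h.name for h in power_off_order(hosts)])  # ['c', 'a', 'b']
```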

How Does DPM Operate? Host Sort, continued
- The host consideration order doesn't dictate the order in which hosts are selected
- DPM invokes DRS in what-if mode for each candidate host, and there are many reasons why a host may be rejected, based on DRS operating constraints and objectives; examples:
  - For host power-off, constraints may lead to an inability to evacuate all VMs from a candidate host, or to cases in which the VMs to be evacuated are only moveable to hosts that will then become (more) heavily utilized
  - For host power-on, constraints may be such that no VMs would move to a host if it were powered on, or such that the VMs that would move to a candidate host are not expected to reduce load on the highly-utilized hosts
- DPM doesn't strictly adhere to selection based on its host sort order if doing so would lead to choosing a host with excessively larger capacity than needed while a smaller-capacity host that can adequately handle the demand is available

How Does DPM Operate? References
Recent related VMworld talks:
- 2008 TA2421: DRS Technical Deep Dive, Shanmuganathan & Bhatt
- 2008 TA2469: Platform Power Management Opportunities for Virtualization, Brunner & Saxena
- 2007 TA24: DRS Deep Dive & Technology Preview of DPM, Ji
References:
- VMware Resource Management Guide
- VMware Distributed Power Management: Technical Overview, white paper in preparation

Outline of Talk
- Who needs DPM?
- What is DPM?
- How does DPM operate? Usage; Algorithm
- Where might DPM be extended in the future?
- Why use DPM? Theory; Real World Example

Where Might DPM Be Extended in the Future?
- Fully supported (not experimental) in 2009
- Provide additional host wake method(s) in 2009
- Explore increased interactions with ESX host-level power management
  - Host-level power management is currently synergistic with DPM
- Investigate incorporating more metrics into the DRS/DPM algorithm:
  - Measurement: e.g., power (base & peak consumption), temperature
  - Architectural: e.g., blade enclosure info, server room layout
- Examine extending VM demand prediction methods and their use in DRS/DPM
  - Evaluate the impact of considering longer historical periods, employing additional statistical techniques, & incorporating prediction into more aspects of DPM

Outline of Talk
- Who needs DPM?
- What is DPM?
- How does DPM operate? Usage; Algorithm
- Where might DPM be extended in the future?
- Why use DPM? Theory; Real World Example

Why Use DPM? Theory
- A cluster that is not highly utilized 24/7 provides a power-saving opportunity
- Example 1: DRS+DPM, 32-host cluster with 32 hosts 60% utilized at peak, 40% at non-peak
  - During non-peak, the cluster could have 21 hosts on, each ~60% utilized
  - Reduces power 34% (assuming linear scaling of power with hosts powered on); the arithmetic is worked below
- Example 2: DRS+DPM+HA, 32-host cluster with 2-host failover; 30 hosts 60% utilized at peak, 40% at non-peak
  - During non-peak, the cluster could have 22 hosts on, 20 of them at 60% utilization
  - Reduces power 31% (again assuming linear scaling of power with hosts powered on)
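Reproducing the arithmetic behind Example 1 (the same steps apply to Example 2 after reserving the two failover hosts):

```python
# 32 hosts at 40% non-peak utilization = 12.8 host-equivalents of demand;
# running hosts at the 60% target gives 12.8 / 0.6 ~= 21 hosts, and
# 1 - 21/32 ~= 34% savings under the slide's linear-power assumption.
import math

hosts, nonpeak_util, target_util = 32, 0.40, 0.60
nonpeak_demand = hosts * nonpeak_util                      # 12.8
hosts_needed = math.floor(nonpeak_demand / target_util)    # 21, per the slide
savings = 1 - hosts_needed / hosts
print(hosts_needed, f"{savings:.0%}")                      # 21 34%
```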

Why Use DPM? Real World Example. Here's Tony!
Deployment Considerations
- Currently DPM is listed as Experimental
  - No problem! Start by using it in your lab environment
- Traditional monitoring software must be configured to properly report a host's state when using DPM
  - Traditional node up/node down monitoring will report a host failure when DPM puts a host into standby

Best Practices
- Test & verify Wake-on-LAN functionality
  - DPM currently requires Wake-on-LAN
  - Wake-on-LAN functionality MUST be individually tested on each host in the DPM cluster
- Start slow: experiment first with a subset of hosts
  - DPM can be enabled/disabled on a per-host basis
- Powered-off VMs and templates should be kept on hosts where DPM is disabled

A Real World Example
Configured systems: HP C7000 Blade Chassis
- 16 BL-460c blades, each with 2 Intel quad-core CPUs, 32 GB RAM, an FC mezzanine card, and a quad-port NIC mezzanine card
- 4 Virtual Connect Ethernet modules
- 2 Virtual Connect Fibre Channel modules
- 1 OA (Onboard Administrator) module
- 6 power supplies

A Real World Example, Continued
- All blades in the chassis run VMware ESX version 3.5
- All ESX servers boot from local disk
- All virtual machines reside on SAN storage

Power Profile without DPM
- 16 blades running 120 dev and test virtual machines
- Blade chassis powered by all 6 power supplies
- Blade chassis reported ~4800 watts of power consumed

Power Profile with DPM
- 4 blades running the same 120 dev and test virtual machines
- 12 blades powered down
- Blade chassis powered by 2 power supplies (4 placed in standby automatically by the chassis)
- Blade chassis reported ~1300 watts of power consumed

End Result
- DPM reduced power consumption by 73% (from ~4800 W to ~1300 W)
- Without DPM, each VM required an average of 40 watts of power (4800 W / 120 VMs)
- With DPM, each VM required an average of less than 11 watts of power (1300 W / 120 VMs)
- That's 11 watts per virtual server!

Summary: Why Use DPM? Power Savings!
- Servers powered off use almost no power
- Servers powered off generate no heat, reducing the power requirements of HVAC systems

Q&A
Breakout Session # TA2197
Anne Holler, VMware Engineering
Anthony Vecchiolla, International Integrated Solutions
Date: September 18, 2008