Power Management for Chip-level Multiprocessing Processors

Page 1: Power Management for Chip-level Multiprocessing Processors

04/19/23 1

Power Management for Chip-level Multiprocessing Processors

Kai Ma

Page 2: Power Management for Chip-level Multiprocessing Processors

Background

To get better performance:

1. Scale frequency (fast)

2. Replicate on-chip resources (parallel): chip multiprocessing (CMP) vs. simultaneous multithreading (SMT)

Page 3: Power Management for Chip-level Multiprocessing Processors

SMT vs CMP

                SMT                                CMP
Technique       Duplicate resources on one core    Duplicate cores on one die
Target          Instruction-level parallelism      Thread-level parallelism
Implementation  Basically a redesign               Reuse a proven design
Area            Small transistor increase          Proportional to the number of cores

Page 4: Power Management for Chip-level Multiprocessing Processors

Other justification for CMP

Memory wall, ILP wall, power wall

Higher cache coherency circuitry rate

Signal integrity

Future: many cores (many specialized cores)

Page 5: Power Management for Chip-level Multiprocessing Processors

Power management for CMP

Reduce operating costs for energy and cooling

Prolong battery life for portable and embedded systems

Reduce cooling requirements

Meet scalable performance targets

Heat dissipation and hotspots

Page 6: Power Management for Chip-level Multiprocessing Processors

Outline

1. An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget

Canturk Isci*, Alper Buyuktosunoglu*, Chen-Yong Cher*, Pradip Bose*, and Margaret Martonosi

*IBM T.J. Watson Research Center, Yorktown Heights

Department of Electrical Engineering, Princeton University

2. Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors

Radu Teodorescu and Josep Torrellas

Department of Computer Science, University of Illinois at Urbana-Champaign

Page 7: Power Management for Chip-level Multiprocessing Processors

Outline

An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget

1. Contribution

2. Global Power Management

3. Global Power Management Policies: core modes, power and performance matrix

4. Experimental Results and Evaluation

5. Conclusion

6. Critique

Page 8: Power Management for Chip-level Multiprocessing Processors

Contribution

Introduce a global power management approach

Develop a static power management analysis tool

Evaluate different policies for CMP power management

Page 9: Power Management for Chip-level Multiprocessing Processors

Global Power Management

Monitor the power of each core and set its operating mode

Page 10: Power Management for Chip-level Multiprocessing Processors

Global Power Management Policies

Priority: slow down the cores that run low-priority tasks

PullHiPushLo: speed up the low-power cores and slow down the high-power cores

MaxBIPS: predict the power and throughput (BIPS) of each power-mode combination and choose the best one that fits the power budget
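The MaxBIPS idea can be illustrated with a minimal sketch. It assumes an exhaustive search over all mode combinations, three modes whose power scales with the cube of the frequency ratio and whose BIPS scales linearly, and per-core Turbo measurements as input. The function name `max_bips`, the `MODES` table, and the example numbers are hypothetical and are not taken from the slides.

```python
from itertools import product

# Hypothetical scaling factors relative to Turbo: (power scale, BIPS scale).
# Assumption: power scales with the cube of the frequency ratio, BIPS linearly.
MODES = {"Turbo": (1.0, 1.0), "Eff1": (0.95 ** 3, 0.95), "Eff2": (0.85 ** 3, 0.85)}


def max_bips(turbo_power, turbo_bips, budget):
    """Return (modes, predicted_bips, predicted_power) for the combination with
    the highest predicted BIPS that fits the budget, or None if none fits."""
    best = None
    for combo in product(MODES, repeat=len(turbo_power)):
        power = sum(p * MODES[m][0] for p, m in zip(turbo_power, combo))
        bips = sum(b * MODES[m][1] for b, m in zip(turbo_bips, combo))
        if power <= budget and (best is None or bips > best[1]):
            best = (combo, bips, power)
    return best


# Made-up per-core measurements taken in Turbo mode.
choice = max_bips(turbo_power=[10.0, 8.0, 6.0, 4.0],
                  turbo_bips=[2.0, 1.0, 1.5, 0.5],
                  budget=24.0)
print(choice)
```

The loop visits every combination, so its cost grows quickly with the number of cores and modes.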

Page 11: Power Management for Chip-level Multiprocessing Processors

Core Power Modes

Underlying mechanism: DVFS

Overhead: on the order of microseconds

Performance degradation: elapsed execution time for the benchmark

Page 12: Power Management for Chip-level Multiprocessing Processors

Power and BIPS Matrices

        Power       BIPS
Turbo   1           1*(500/507)
Eff1    1*0.95^3    1*0.95*(500/513)
Eff2    1*0.85^3    1*0.85*(500/520)
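To make the table entries concrete, the short sketch below evaluates them numerically. The slide does not explain the 500/507, 500/513, and 500/520 factors, so they are reproduced exactly as printed. The names `ROWS`, `INTERVAL`, and `matrix_entries` are illustrative only.

```python
INTERVAL = 500.0  # numerator of the BIPS factor, as printed (units not stated)

# mode -> (frequency scale relative to Turbo, denominator of the BIPS factor)
ROWS = {"Turbo": (1.00, 507.0), "Eff1": (0.95, 513.0), "Eff2": (0.85, 520.0)}


def matrix_entries(rows=ROWS, interval=INTERVAL):
    """Evaluate the Power and BIPS expressions from the table for each mode."""
    table = {}
    for mode, (scale, denom) in rows.items():
        power = scale ** 3                 # cubic scaling, as in the Power column
        bips = scale * interval / denom    # linear scaling times the printed factor
        table[mode] = (power, bips)
    return table


for mode, (power, bips) in matrix_entries().items():
    print(f"{mode:5s} power={power:.4f} bips={bips:.4f}")
```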

Page 13: Power Management for Chip-level Multiprocessing Processors

Experimental Methodology

SPEC CPU2000 benchmarks

A trace-based CMP analysis tool is combined with IBM’s Turandot simulator

Mode switch (500ns) and statistics collection (50ns)

During a mode switch, no instructions execute, but power is still consumed

Page 14: Power Management for Chip-level Multiprocessing Processors

Static vs Dynamic

Page 15: Power Management for Chip-level Multiprocessing Processors

Policy and Budget Curve

Page 16: Power Management for Chip-level Multiprocessing Processors

Power Saving

Page 17: Power Management for Chip-level Multiprocessing Processors

Power Management Result

Page 18: Power Management for Chip-level Multiprocessing Processors

Trends under CMP Scaling

The gap between MaxBIPS and the oracle shrinks as the number of cores increases

Increasing the number of cores has a smaller impact on MaxBIPS

CMP scaling favors static per-core management over chip-wide DVFS

Page 19: Power Management for Chip-level Multiprocessing Processors

Conclusion

Global management is preferred

Dynamic management is preferred

MaxBIPS is efficient

Page 20: Power Management for Chip-level Multiprocessing Processors

Critique

MaxBIPS: the prediction cost grows superlinearly with the number of modes and cores

Power/performance estimation matrix: transition penalty

Temperature is not considered
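As a rough illustration of the first point, assume the predictor evaluates every mode combination exhaustively (an assumption, since the slides do not say how the search is done). With M modes and N cores, the number of combinations is M to the power N. The values M = 3 and N = 8 below match the three modes and eight cores mentioned elsewhere in the deck.

```latex
\[
  \#\text{combinations} = M^{N},
  \qquad M = 3,\; N = 8 \;\Rightarrow\; 3^{8} = 6561 .
\]
```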

Page 21: Power Management for Chip-level Multiprocessing Processors

Outline

Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors

1. Background

2. Contribution

3. Algorithm

4. System Implementation

5. Evaluation

6. Conclusion

7. Critique

Page 22: Power Management for Chip-level Multiprocessing Processors

Background

For CMPs, within-die process variation affects:

Static power consumption

Maximum frequency

Page 23: Power Management for Chip-level Multiprocessing Processors

Contribution

Propose variation-aware algorithms for application scheduling

Complement these algorithms with variation-aware DVFS

Page 24: Power Management for Chip-level Multiprocessing Processors

CMP Configuration

High level frequency and DVFS policy

Page 25: Power Management for Chip-level Multiprocessing Processors

Algorithms

Page 26: Power Management for Chip-level Multiprocessing Processors

Linear Programming

A technique for optimization of a linear objective function, subject to linear equality and linear inequality constraints

c and b are known vectors, A is a known matrix, and x is the vector of variables
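The formula itself did not survive on this slide. For reference, the textbook canonical form of a linear program in these symbols is shown below. This is the standard form, not necessarily the exact notation the slide used.

```latex
\[
  \text{maximize } \mathbf{c}^{\mathsf{T}}\mathbf{x}
  \quad \text{subject to} \quad
  A\mathbf{x} \le \mathbf{b}, \;\; \mathbf{x} \ge \mathbf{0}.
\]
```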

Page 27: Power Management for Chip-level Multiprocessing Processors

Power Mode Selection: LinOpt

TP: average throughput

N: number of cores

i: core index, from 1 to N

a(i): constant that depends on the thread and the core

v(i): voltage of core i

b(i) and c(i): constants introduced to approximate the power-voltage relation

Objective function:

Constraints:
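The objective function and constraints did not survive on this slide. One reconstruction consistent with the symbol definitions, offered as an assumption rather than as the paper's exact formulation, is to maximize TP = sum of a(i)*v(i), subject to sum of (b(i)*v(i) + c(i)) <= P_target and v_min <= v(i) <= v_max. Under that assumption the problem has a single budget constraint plus per-core bounds, so a greedy fill solves the continuous problem. The sketch below uses that greedy method in place of a general LP solver. `linopt_sketch`, `p_target`, `v_min`, and `v_max` are hypothetical names.

```python
def linopt_sketch(a, b, c, p_target, v_min, v_max):
    """Maximize sum(a[i]*v[i]) subject to sum(b[i]*v[i] + c[i]) <= p_target
    and v_min <= v[i] <= v_max. Assumes a[i] >= 0 and b[i] > 0."""
    n = len(a)
    v = [v_min] * n
    # Power budget left after running every core at the minimum voltage.
    slack = p_target - sum(b[i] * v_min + c[i] for i in range(n))
    if slack < 0:
        return None  # infeasible even at the lowest voltage
    # Raise voltages in order of throughput gained per unit of extra power.
    for i in sorted(range(n), key=lambda i: a[i] / b[i], reverse=True):
        step = max(0.0, min(v_max - v_min, slack / b[i]))
        v[i] += step
        slack -= step * b[i]
    return v


# Made-up coefficients for three cores.
voltages = linopt_sketch(a=[3.0, 1.0, 2.0], b=[10.0, 10.0, 5.0],
                         c=[-4.0, -4.0, -2.0], p_target=10.0,
                         v_min=0.6, v_max=1.0)
print(voltages)
```

A real controller would still have to round the continuous voltages to the discrete levels the hardware supports.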

Page 28: Power Management for Chip-level Multiprocessing Processors

Power Mode Selection: SAnn

Use simulated annealing to solve the power mode selection problem

SAnn searches the space of possible core-voltage combinations

Compared to LinOpt: more accurate but more costly
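A minimal simulated-annealing sketch over discrete per-core voltage levels is shown below. It assumes a linear throughput and power model with hypothetical coefficients `a`, `b`, `c`, and a linear cooling schedule; the slides do not specify the cost function or schedule that SAnn actually uses. The function name `sann_sketch` is illustrative.

```python
import math
import random


def sann_sketch(a, b, c, p_target, levels, steps=5000, t0=1.0, seed=0):
    """Search per-core voltage levels for high throughput under a power cap."""
    rng = random.Random(seed)
    n = len(a)

    def power(v):
        return sum(b[i] * v[i] + c[i] for i in range(n))

    def throughput(v):
        return sum(a[i] * v[i] for i in range(n))

    current = [min(levels)] * n            # start from the lowest-power setting
    if power(current) > p_target:
        return None                        # no feasible setting exists
    best = list(current)
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9         # linear cooling schedule
        cand = list(current)
        cand[rng.randrange(n)] = rng.choice(levels)  # perturb one core
        if power(cand) > p_target:
            continue                       # reject moves that break the budget
        delta = throughput(cand) - throughput(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = cand
            if throughput(current) > throughput(best):
                best = list(current)
    return best


levels = [0.6, 0.7, 0.8, 0.9, 1.0]
setting = sann_sketch(a=[3.0, 1.0, 2.0], b=[10.0, 10.0, 5.0],
                      c=[-4.0, -4.0, -2.0], p_target=10.0, levels=levels)
print(setting)
```

Accepting some worse moves early on lets the search escape local optima, at the cost of many more evaluations than a single linear solve.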

Page 29: Power Management for Chip-level Multiprocessing Processors

System Implementation

The algorithm runs on a core or on a power management unit

At each OS scheduling interval, the OS assigns threads to cores using VarF&AppIPC

Every 10ms, the LinOpt algorithm runs and sets each core to its selected power level
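The slides do not define VarF&AppIPC. Reading the name as "use the variation in core frequency together with each application's IPC", one plausible interpretation is to pair the highest-IPC threads with the highest-frequency cores. The sketch below implements only that assumed reading. `map_threads` and the example values are hypothetical.

```python
def map_threads(core_freq, thread_ipc):
    """Pair the highest-IPC threads with the highest-frequency cores.

    core_freq: {core_id: max frequency}, thread_ipc: {thread_id: profiled IPC}.
    Returns {thread_id: core_id}. Assumes there are no more threads than cores.
    """
    cores = sorted(core_freq, key=core_freq.get, reverse=True)
    threads = sorted(thread_ipc, key=thread_ipc.get, reverse=True)
    return dict(zip(threads, cores))


# Made-up frequencies (GHz) and IPC values.
mapping = map_threads(core_freq={0: 3.8, 1: 4.2, 2: 4.0},
                      thread_ipc={"A": 0.6, "B": 1.4})
print(mapping)
```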

Page 30: Power Management for Chip-level Multiprocessing Processors

Profiling for Implementation

Page 31: Power Management for Chip-level Multiprocessing Processors

Evaluation Methodology

Variation: VARIUS model

Power: SESC + Wattch + HotLeakage

Temperature: HotSpot

Critical path model:

1. Computation path delay: multiplier-like unit

2. Memory: SRAM

3. Interconnect: CACTI

4. Gate delay: alpha-power law
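For reference, the alpha-power law in its commonly cited form relates gate delay to the supply voltage and the threshold voltage as shown below. The symbols (load capacitance, supply voltage, threshold voltage, and the exponent alpha) follow the usual textbook notation and are not taken from the slides.

```latex
\[
  t_d \;\propto\; \frac{C_L \, V_{dd}}{\left(V_{dd} - V_{th}\right)^{\alpha}},
  \qquad 1 \le \alpha \le 2 .
\]
```

Since the threshold voltage varies across the die, this relation is one way within-die variation translates into different maximum frequencies per core.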

Page 32: Power Management for Chip-level Multiprocessing Processors

Workload

SPEC

Run different applications on different cores

12 billion instructions

Page 33: Power Management for Chip-level Multiprocessing Processors

Metrics

Total power

Average frequency of active cores

Throughput

Energy-delay-squared product (considers both time-to-solution and energy consumption)

Weighted throughput: each application’s IPC normalized to that application’s IPC at reference conditions
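The last two metrics can be written down as a short sketch. Summing the normalized IPC values over all applications is an assumption here, since the slide defines only the per-application normalization. The function and parameter names are illustrative.

```python
def ed2_product(energy_joules, delay_seconds):
    """Energy-delay-squared product: energy multiplied by the square of the delay."""
    return energy_joules * delay_seconds ** 2


def weighted_throughput(ipc, ipc_ref):
    """Sum of each application's IPC normalized to its IPC at reference conditions."""
    return sum(x / ref for x, ref in zip(ipc, ipc_ref))


print(ed2_product(2.0, 3.0))
print(weighted_throughput([1.0, 0.5], [2.0, 1.0]))
```

Squaring the delay weights time-to-solution more heavily than energy, so a configuration that saves energy by running much slower scores worse.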

Page 34: Power Management for Chip-level Multiprocessing Processors

Evaluation

Power and frequency variation on one die

Page 35: Power Management for Chip-level Multiprocessing Processors

Uniform Frequency & No DVFS

As the number of threads increases, no less-used cores remain available for thread mapping

Page 36: Power Management for Chip-level Multiprocessing Processors

Non-Uniform Frequency & No DVFS

Different cores run at different frequencies; policies that select less-used cores may end up choosing lower-frequency ones.

Page 37: Power Management for Chip-level Multiprocessing Processors

NoUniFreq+DVFS

Throughput:

VarF&AppIPC+LinOpt is effective

Power:

throughput gains are high when power targets are low

Page 38: Power Management for Chip-level Multiprocessing Processors

LinOpt Granularity

The deviation between the power consumed and the power target decreases as the interval between LinOpt runs increases

Page 39: Power Management for Chip-level Multiprocessing Processors

Conclusion

Within-die variation substantially affects static power consumption and maximum frequency

Variation-aware algorithms are proposed and analyzed; LinOpt is efficient

Page 40: Power Management for Chip-level Multiprocessing Processors

Critique

How can thread mapping and power mode selection be decoupled?

Static power consumption and dynamic power consumption should be discussed separately

Thread mapping takes place only once; thread migration should be considered

Page 41: Power Management for Chip-level Multiprocessing Processors

Comparison

             MICRO (Isci et al.)                  UIUC (Teodorescu and Torrellas)
Objective    Manage CMP processor power           Manage CMP processor power
Core         8 homogeneous cores (POWER4-like)    20 inhomogeneous cores (Alpha 21264-like)
Policy       Global, dynamic                      Global, dynamic
Algorithm    MaxBIPS                              LinOpt
Methodology  Simulation (Turandot)                Simulation (SESC)
Benchmark    SPEC CPU2000                         SPEC
Controller   Ad hoc                               Ad hoc