Fifth Workshop on Energy-Efficient Design (WEED 2013), June 2013. Energy efficient computing in high performance systems. Efraim Rotem (Intel Corporation; Technion, Israel), Ran Ginosar (Technion, Israel), Avi Mendelson (Technion, Israel), Uri Weiser (Technion, Israel). Work supported by “ICRI-CI”, the Intel Collaborative Research Institute for Computational Intelligence.

Page 1: Energy efficient computing in high performance systems

Fifth Workshop on Energy-Efficient Design (WEED 2013), June 2013

Energy efficient computing in high performance systems

Efraim Rotem, Intel Corporation; Technion, Israel

Ran Ginosar, Technion, Israel

Avi Mendelson, Technion, Israel

Uri Weiser, Technion, Israel

June 2013

Work supported by “ICRI-CI”, the Intel Collaborative Research Institute for Computational Intelligence

Page 2: Energy efficient computing in high performance systems

Compute Performance

[Figure: compute performance on a log scale (1 to 10,000) vs. year (1978 to 2006) for the 8086, PC-XT, 286, 386, 486, Pentium, MMX, P-2, P-4 and Core-2 Duo. Source: Dave Patterson]

Page 3: Energy efficient computing in high performance systems

The best is yet to come

• Clients:

• Create, innovate, Collaborate

• Perform complex tasks

• Deliver computational density

• Economy of scale

• Ubiquitous computing

• Lower entry cost

• Rich content compute demand

• Audio visual

• Cognitive computing

• Servers:

• Drive the connected world

• Google, Facebook …

• Perform some of the compute for thin clients

• Large scale computing

• Finance, science, engineering, cloud computing

Page 4: Energy efficient computing in high performance systems

Moore’s law for power performance

• Theoretical scaling of new process technology:

– Linear dimensions: shrink by 0.7

– Area: shrinks by 0.5

– Capacitance: shrinks by 0.7

– Voltage: scales down by 0.7

– Frequency: scales up by 1/0.7

– Power: scales down by 0.5

• There’s additional transistor and power budget for:

– New features

– Architectural extensions

– Performance improvement

Half the area

Half the power

Sustainable performance improvement at same power consumption

Power = C * V^2 * F + Leakage
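The "power scales down by 0.5" figure follows from plugging the per-generation factors into the dynamic power term. The sketch below is only that arithmetic; it ignores leakage, and the function and constant names are illustrative rather than taken from the talk.

```python
# Per-generation scaling factors quoted on the slide (idealized scaling).
CAPACITANCE = 0.7
VOLTAGE = 0.7
FREQUENCY = 1 / 0.7


def dynamic_power_scale(c: float, v: float, f: float) -> float:
    """Relative dynamic power after scaling C, V and F (P = C * V^2 * F)."""
    return c * v ** 2 * f


# Leakage is deliberately left out of this sketch.
ideal = dynamic_power_scale(CAPACITANCE, VOLTAGE, FREQUENCY)
print(f"relative dynamic power: {ideal:.2f}")  # prints 0.49, i.e. roughly half
```

Since the area also halves, power density stays roughly constant under these idealized factors.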

Page 5: Energy efficient computing in high performance systems

Recent Reality - Deep in the power wall

Practical scaling factors:

– Linear dimensions, area and active capacitance continue to shrink

– Interconnect impact increases

– Voltage: roughly the same!

– Frequency: a design choice between leakage and speed

Power = C * V^2 * F + Leakage: roughly the same @ 1/0.7 higher freq.

Power = C * V^2 * F + Leakage: 0.7X power @ same freq. and ½ area

Same transistor count: 0.7-1X power, higher power density

Same area: 1.5-2X power

The power wall – Moore’s law will continue delivering transistor density, but with tough design and architectural choices:

– Higher power density to enable the frequency speedup

– Harder to cool

– Any architectural additions come at the cost of higher power

– Cdyn/mm^2 increases with process shrink and architectural improvements

– Process energy efficiency break-even: 2X transistors, 1.5X perf.

*Reference: “MultiAmdahl: How Should I Divide My Heterogeneous Chip?”, T. Zidenberg et al.
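The practical factors can be checked with the same dynamic power term. This is a rough sketch that ignores leakage and assumes capacitance still shrinks by 0.7 while voltage stays flat; the names are illustrative.

```python
def dynamic_power_scale(c: float, v: float, f: float) -> float:
    """Relative dynamic power after scaling C, V and F (P = C * V^2 * F)."""
    return c * v ** 2 * f


AREA = 0.5  # same transistor count on the new node

# Voltage no longer scales; frequency is a design choice.
same_freq = dynamic_power_scale(0.7, 1.0, 1.0)        # about 0.7X power
higher_freq = dynamic_power_scale(0.7, 1.0, 1 / 0.7)  # about 1.0X power

# Power density for the same transistor count in half the area. Filling the
# original area with twice the transistors gives the same multipliers for
# total power.
density_same_freq = same_freq / AREA      # about 1.4X
density_higher_freq = higher_freq / AREA  # about 2.0X

print(f"{same_freq:.2f} {higher_freq:.2f}")
print(f"{density_same_freq:.2f} {density_higher_freq:.2f}")
```

The 1.4X to 2.0X range is in line with the "1.5-2X power" quoted for the same area.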

Page 6: Energy efficient computing in high performance systems

Energy Density Over Time

[Figure: normalized energy density (0.5 to 2.0) vs. technology node (250, 180, 130, 90, 65, 45, 32, 22)]

More than Moore’s Law

• Moore’s law delivers transistor density

• No longer delivers energy efficiency

• At the same area, power and energy consumption increase

• Power and energy efficiency is back in engineering hands

Page 7: Energy efficient computing in high performance systems

Historical power trends

P ~ Cdyn * V^2 * f

Cdyn trending ~ fixed for a given architecture – core area shrinks

Dynamic range of power increases (10X on recent products)

[Figure: Cdyn [nF] for 486, Pentium™, P6, Centrino™ and Core™: ~ equal Cdyn @ smaller area]

Page 8: Energy efficient computing in high performance systems

Power Management Fundamentals

Maximize user experience under multiple constraints

• User Experience (May have different preferences):

– Throughput performance

– Responsiveness - burst performance

– Ergonomics (acoustic noise, skin temp)

– Battery life / energy consumption: on and standby

• Optimizing around constraints to meet user preferences

– Silicon capabilities

– System Thermo-Mechanical capabilities – short and long

– Power delivery capabilities – from the wall to the transistor

– Workload and usage

– Workload dynamic range

Page 9: Energy efficient computing in high performance systems


Physical constraints

thermo-mechanical

Page 10: Energy efficient computing in high performance systems

TDP – Thermal design power

Traditional design approach – worst case design

Max realistic power at steady state for long period of time

[Figure: ARD Application Power (5s Peak), power axis from 25 to 39, with the TDP level marked. Bars cover a range of applications and application mixes, from Idle, Video Capture, Zip HDD-HDD, Crysis and Prime95 up to concurrent loads such as Premiere Pro CS3 + UT2004, Premiere Pro CS3 + FarCry, WME + Lost Planet, PowerDirector 7, 3DMark Vantage and PCMark Vantage]

Page 11: Energy efficient computing in high performance systems


Recent years change

• The rules of the game changed

– Focus on User Experience

• New innovative Form Factors

– High computation low power devices

– Skin temperature sensitive

– Impose changes on system engineering

• New usage models emerge at the data center

– Interactive web services: Google, Facebook *

*Source: “Online data intensive services”, D. Meisner et al.

Page 12: Energy efficient computing in high performance systems

New Concept: Thermal Capacitance

Classic Model: Steady-State Thermal Resistance

– Design guide for steady state

New Model: Steady-State Thermal Resistance AND Dynamic Thermal Capacitance

– GPU and CPU sharing

– More realistic response to power changes

– PCU manages energy budgets over multiple time constants

[Figure: temperature vs. time for the classic model response and for the new model; CPU and GPU]

Example: Cp_Al ~ 0.9 J/(gr*°K); 100gr heat sink @ 35W, 100Sec
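To get a feel for the numbers in the example, the sketch below computes how far a 100 g aluminium heat sink would warm up if it absorbed 35 W for 100 s. It assumes no heat leaves the heat sink during that time, which is a simplification made here and not a statement from the slide; the function name is illustrative.

```python
CP_ALUMINIUM = 0.9  # J/(g*K), value quoted on the slide
MASS_G = 100.0      # heat sink mass in grams
POWER_W = 35.0      # power absorbed by the heat sink
DURATION_S = 100.0  # how long that power is applied


def adiabatic_temperature_rise(power_w: float, duration_s: float,
                               mass_g: float, cp: float) -> float:
    """Temperature rise in kelvin when all energy is stored in the mass."""
    energy_j = power_w * duration_s  # 3500 J in the example
    heat_capacity_j_per_k = mass_g * cp  # 90 J/K in the example
    return energy_j / heat_capacity_j_per_k


rise = adiabatic_temperature_rise(POWER_W, DURATION_S, MASS_G, CP_ALUMINIUM)
print(f"temperature rise: {rise:.1f} K")  # prints 38.9 K
```

The thermal mass alone can therefore absorb tens of seconds of above-steady-state power, which a steady-state thermal resistance model does not capture.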

Page 13: Energy efficient computing in high performance systems

Tablet Thermal example

[Figure: allowed operating time (log scale: 10sec, 100sec, 1000sec) vs. power (1w to 6w), bounded by the Tj=90 and Tskin=40 limit curves. Data labels: 7200sec, 1906sec, 355sec, 210sec, ~100sec, ~50sec, 98sec, 14.4sec, 6.4sec. Annotations: sustained operation (traditional “TDP”), system temp limit, Tj (junction) limited, max power limit, Turbo operation region]

Page 14: Energy efficient computing in high performance systems

Mapping the Usages Of Interest

[Figure: power/perf vs. time map of usages, with the VIRUS and TDP levels marked]

Time axis: short (hitting PD, freq constraints) to long (hitting system power constraints, Tskin)

Goals: max perf within system constraints; meet QoS @ min energy (BL)

Usages shown: Idle, AOAC, Video PB, MP3 PB, Casual game, Web surfing, Video encode, Create PDF, Photo editing, File compression, Math apps, Heavy games

Page 15: Energy efficient computing in high performance systems

Mapping the Usages Of Interest

[Figure: the usage map (short vs. long time, max perf vs. QoS @ min energy, VIRUS and TDP levels) overlaid on the tablet thermal chart with the Tj=90 and Tskin=40 limits, time from 10sec to 1000sec and power from 1w to 6w]

Page 16: Energy efficient computing in high performance systems

What is CPU “Turbo”

• P1 is the guaranteed frequency

– Wide dynamic power range

• P0 is the max possible frequency

– P1 to P0 range is fully H/W controlled

• P1-P0 has a significant frequency range (GHz)

– Single thread performance

– Light load performance

• Various possible policies and user preferences

• Pn is the energy efficient point

– Lower than Pn is controlled by T-state

[Figure: voltage and frequency ladder. Labels: T-state & Throttle; LFM; Pn; OS visible states (OS control); P1; “Turbo” (H/W control); P0 2/3/4C; P0 1C]

Page 17: Energy efficient computing in high performance systems

Intel® Turbo Boost Technology

• Turbo enabled product specifications

[Table from the datasheet: P1 and P0 for the CPU, P1 and P0 for PG, and TDP (total package sustained power)]

Source: http://www.intel.com/Assets/PDF/datasheet/324692.pdf

Page 18: Energy efficient computing in high performance systems

Power Telemetry

• Power management is based on measurements

• Intel® SoCs implement a power meter

– Used for power management algorithms

– Architecturally exposed to software and system

– For the use of S/W or a system embedded controller

Average accuracy – 0.9%; STDEV 0.6%

[Figure: predicted vs. actual power for CPU, GPU and package; vertical axis 0 to 45, horizontal axis 0 to 250]
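As an illustration of how software might consume such a meter, the sketch below turns two readings of a cumulative energy counter into average power. It assumes a wrapping counter in microjoules, as exposed for example by the Linux powercap files `energy_uj` and `max_energy_range_uj`; the function name and the sample values are hypothetical.

```python
def average_power_w(energy_start_uj: int, energy_end_uj: int,
                    elapsed_s: float, max_range_uj: int) -> float:
    """Average power in watts between two readings of a wrapping counter."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_uj = energy_end_uj - energy_start_uj
    if delta_uj < 0:
        # The counter wrapped once between the two readings.
        delta_uj += max_range_uj
    return delta_uj / 1e6 / elapsed_s


# Hypothetical readings taken one second apart.
print(average_power_w(1_000_000, 36_000_000, 1.0, 2 ** 32))  # 35.0
```

Sampling often enough that the counter wraps at most once per interval keeps the correction valid.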

Page 19: Energy efficient computing in high performance systems

Intel® Turbo Boost Technology 2.0

[Figure: power vs. time: sleep or low power, then C0 (Turbo) above “TDP”, then settling at “TDP”]

Build up thermal budget during idle periods

After idle periods, the system accumulates “energy budget” and can accommodate high power/performance for a few seconds

Use accumulated energy budget to enhance user experience

In steady state conditions the power stabilizes on TDP

Page 20: Energy efficient computing in high performance systems

Energy Efficient P-State - optimizing MIPS / Watt

• Voltage scaling is not energy efficient

– Used to get raw performance

– Some applications are less energy efficient than others

– May still be more efficient than bringing up another system

• Not all workloads gain performance from frequency

– For example, many memory accesses lead to poor scalability

– “Wait slowly” to accumulate energy headroom

• Continuously generate a “scalability” metric

– Drop frequency (less turbo) if scalability is low

– Save energy OR get more performance at same energy
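One way to picture the scalability idea is sketched below. The metric used here (the fraction of cycles not stalled on memory), the threshold, and all function names are assumptions made for illustration; the slides do not say how the hardware computes its metric.

```python
def scalability(core_cycles: int, memory_stall_cycles: int) -> float:
    """Fraction of cycles expected to speed up with core frequency.

    Hypothetical metric: cycles not stalled on memory.
    """
    if core_cycles <= 0:
        raise ValueError("core_cycles must be positive")
    return 1.0 - memory_stall_cycles / core_cycles


def speedup(metric: float, f_old: float, f_new: float) -> float:
    """Expected speedup when only the scalable fraction follows frequency."""
    return 1.0 / (metric * f_old / f_new + (1.0 - metric))


def pick_frequency(metric: float, f_guaranteed: float, f_turbo: float,
                   threshold: float = 0.5) -> float:
    """Use turbo only when the workload is expected to benefit from it."""
    return f_turbo if metric >= threshold else f_guaranteed


memory_bound = scalability(core_cycles=1000, memory_stall_cycles=800)
print(pick_frequency(memory_bound, f_guaranteed=2.0, f_turbo=3.0))  # 2.0
print(round(speedup(memory_bound, 2.0, 3.0), 3))  # small gain from 1.5X freq.
```

For the memory-bound case a 1.5X frequency increase buys only about 7% performance, so the energy is better saved for a phase that scales.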

Page 21: Energy efficient computing in high performance systems

Intel® Turbo Boost Technology 2.0

[Figure: power vs. time: sleep or low power, “Turbo”, “TDP”, with the max current level marked; build up thermal budget during idle periods]

Turbo introduces a very high power dynamic range that stresses the power delivery network

* Source: “Multiple Clock and Voltage Domains for Chip Multi Processors”, Rotem et al.

Page 22: Energy efficient computing in high performance systems

Power delivery constraints

Georg Simon Ohm, 16 March 1789 – 6 July 1854

Page 23: Energy efficient computing in high performance systems


Power delivery constraints

• Power Delivery limits

– Wall To System

– System to package

– Inside Package

• DC sustained power/current

• Instantaneous and transients

Page 24: Energy efficient computing in high performance systems

Mobile platform PDN

• Power supply and battery current feeding the total platform are also limited

• Space is limited in tablets and SFF

[Figure: power delivery block diagram with brick/silver box, battery, platform VR, CPU / GT rails, SVID, and the SoC (IA core, GT, PCU)]

Page 25: Energy efficient computing in high performance systems

What PDN parameters limit us

• VR max Icc (total package)

– “TDP” – need to sustain forever (thermal limit)

– “Virus” – long time and O.K. to thermal throttle

– Instantaneous – should be treated as “never exceed”

• I*R drop on DC and AC load line

• Load release overshoot

• FET max current and magnetic saturation – technology dependent

• Over current protection

• Battery and brick max Icc (total platform!)

– Battery electrical max current and overheat

– Brick over current protection

Callouts on the slide: O.K. to apply control; by design or a-priori; fast control or a-priori

Page 26: Energy efficient computing in high performance systems

PDN controls in action

[Figure: power and P-state vs. time in C0 P0: actual instantaneous power and the PL1 time exp. average against Power limit 1, Power limit 2 and the hard limit (max Icc)]

• Voltage Regulator reported capability: CURRENT_CONFIG_CONTROL MSR

• TURBO_POWER_LIMIT Control MSR

– Enables and locks

– Package Power limit 2 – Instantaneous

– Package Power limit 1 time interval

– Package Power limit 1 clamp bit

– Package Power limit 1 - power

• Also:

– Individual power controls available

– Explicit frequency control

• User / OEM / OS preference: allow programmability
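The interplay of the two power limits can be pictured with a small simulation. This is a deliberately simplified sketch and not Intel's control algorithm: it assumes power may reach Power limit 2 while an exponentially weighted average of past power is below Power limit 1, and is held at Power limit 1 otherwise. The function names, time step and wattages are hypothetical.

```python
import math


def allowed_power(avg_power: float, pl1: float, pl2: float) -> float:
    """Grant up to PL2 while the running average is below PL1, else PL1."""
    return pl2 if avg_power < pl1 else pl1


def simulate(demand_w, pl1: float, pl2: float, tau_s: float, dt_s: float = 1.0):
    """Return the power granted at each step for a demanded power trace."""
    alpha = 1.0 - math.exp(-dt_s / tau_s)  # weight of the newest sample
    avg = 0.0
    granted = []
    for want in demand_w:
        power = min(want, allowed_power(avg, pl1, pl2))
        avg += alpha * (power - avg)  # exponential moving average of power
        granted.append(power)
    return granted


# Ten idle seconds followed by a sustained heavy load (hypothetical values).
trace = simulate([5.0] * 10 + [60.0] * 60, pl1=35.0, pl2=45.0, tau_s=10.0)
print(trace[10], trace[-1])  # burst at PL2 first, then settling at PL1
```

A longer idle period or a larger time constant stretches the burst, which matches the idea of accumulating an energy budget while idle.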

Page 27: Energy efficient computing in high performance systems

Platform Energy

Energy Aware Race to Halt

Source: “Energy Aware Race to Halt: A Down to EARtH Approach for Platform Energy Management”, E. Rotem et al.

Page 28: Energy efficient computing in high performance systems

Combined platform energy

• Total platform energy may have an optimal freq.

– Possibly within the operation range

• Can we calculate this point at run time?

– Minimize energy to complete a task

[Figure: energy vs. performance: E_CPU ~ f^2 and E_system ~ 1/f, with Pe marking the global minimum; QoS level marked; EE algorithm vs. Race to Halt]
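With the two trends from the figure, E_CPU ~ f^2 and E_system ~ 1/f, the minimum can be found in closed form. The sketch below assumes total energy E(f) = a*f^2 + b/f with hypothetical coefficients a and b; setting dE/df = 2*a*f - b/f^2 to zero gives f = (b / (2*a))^(1/3).

```python
def platform_energy(f: float, a: float, b: float) -> float:
    """Total energy for a task: CPU part ~ f^2 plus platform part ~ 1/f."""
    return a * f ** 2 + b / f


def optimal_frequency(a: float, b: float) -> float:
    """Frequency where dE/df = 2*a*f - b/f^2 equals zero."""
    return (b / (2.0 * a)) ** (1.0 / 3.0)


# Hypothetical coefficients chosen so that the optimum lands near f = 2.
a, b = 1.0, 16.0
f_opt = optimal_frequency(a, b)
print(round(f_opt, 3), round(platform_energy(f_opt, a, b), 3))
```

If the computed optimum falls outside the supported range, the nearest end applies: the lowest frequency when CPU energy dominates, race to halt when platform energy dominates.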

Page 29: Energy efficient computing in high performance systems

The analytical model – run time

Categorizing:

• Fixed platform power

– Continuous: components idle power

– While executing: component active idle

• Fixed platform energy: data transfer cost

• Freq. dependent energy: CPU DVFS

[Figure: power vs. time over tCPU and tMEM: platform constant power, platform run time power, CPU idle, CPU active power]

Page 30: Energy efficient computing in high performance systems

The analytical model – Some algebra

• CPR is a parameter that describes system power characteristics compared to workload power characteristics

• SCA is a characteristic that represents the Amdahl behavior of a workload, that is, how well performance scales with frequency

Page 31: Energy efficient computing in high performance systems

Exploring the energy function

• Relative platform energy as a function of freq.

– SCA = 1; different CPR values

– CPU power >> system power favors LFM, and vice versa

[Figure: Platform energy vs. Frequency; relative frequency 1.0 to 2.2, platform energy 0.60 to 1.80; one curve per CPR value (0.33, 0.26, 0.20, 0.15, 0.11, 0.08, 0.06) with the optimal Fc marked]

Aligns well with intuition

Page 32: Energy efficient computing in high performance systems

Exploring the energy function

• Relative platform energy as a function of freq.

– Fixed CPR; different SCA values

– The lower the SCA, the lower the optimal frequency

[Figure: Platform energy vs. Frequency; relative frequency 1.00 to 2.20, platform energy 0.60 to 1.40; one curve per SCA value (1, 0.71, 0.50, 0.35, 0.25, 0.18) with the optimal Fc marked]

Aligns well with intuition
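The trend can be reproduced with a toy model. Everything in the sketch below is an assumption made for illustration, since the model's equations are not part of this text: run time is taken as SCA/f + (1 - SCA), platform power is normalized to 1, CPU power is taken as CPR * f^3, and CPR is read as CPU power at f = 1 relative to platform power.

```python
def relative_energy(f: float, sca: float, cpr: float) -> float:
    """Task energy at relative frequency f, normalized to the energy at f = 1."""
    runtime = sca / f + (1.0 - sca)  # only the scalable part speeds up
    power = 1.0 + cpr * f ** 3       # platform power is normalized to 1
    return power * runtime / (1.0 + cpr)


def optimal_frequency(sca: float, cpr: float, f_min: float = 1.0,
                      f_max: float = 2.2, steps: int = 121) -> float:
    """Grid search for the frequency with the lowest task energy."""
    grid = [f_min + i * (f_max - f_min) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda f: relative_energy(f, sca, cpr))


CPR = 0.08  # hypothetical value
optima = [optimal_frequency(sca, CPR) for sca in (1.0, 0.71, 0.50, 0.35)]
print([round(f, 2) for f in optima])  # optimum moves down as SCA drops
```

Under these assumptions a workload that scales poorly with frequency reaches its energy minimum closer to the low end of the range.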

Page 33: Energy efficient computing in high performance systems

Heterogeneous computing

Combining a mix of “big” and “small” cores

• High performance high power cores

– For demanding compute tasks

– Excel on single threaded workloads

• Small, energy efficient cores

– Excel on low QoS workloads

– Many of them perform multi threaded workloads efficiently

Page 34: Energy efficient computing in high performance systems

EARtH – hetero platform energy

• In general, the small core is more energy efficient

• Platform energy may be different

– For some CPR/SCA values, the big core can be more energy efficient

[Figure: Total energy vs. Frequency - hybrid core; relative frequency 0.6 to 1.6, relative total energy 0.88 to 1.04; one curve per SCA value (0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1) with the optimal Fc marked; “BIG” core and “Small” core regions]

Page 35: Energy efficient computing in high performance systems

Homogeneous core policy results

• Big core usually gains from low frequency

• Small core usually gains from RtH

• But - not always

– Cannot predict a-priori which works better

[Figure: total platform energy savings for a standard voltage CPU (0% to 35%) and a low voltage CPU (0% to 60%); series: EARtH over LFM, EARtH over RtH, EARtH over random]

Page 36: Energy efficient computing in high performance systems

Benefit from Hetero

• Energy savings of an asymmetric core compared to a CPU consisting of big cores only, assuming no QoS requirement

[Figure: Asymmetric core energy savings; energy savings axis 0% to 35%]

Page 37: Energy efficient computing in high performance systems

EARtH benefits on Hetero-CPU

• EARtH policy compared to fixed frequency policy

[Figure: Asymmetric core energy savings; energy saving 0% to 50% over workloads (sorted); series: S-LFM, S-RtH, B-LFM, B-RtH]

Page 38: Energy efficient computing in high performance systems

Sensitivity to platform power

• Sensitivity of the energy savings to platform power

• The higher the platform power, the more scenarios in which the big core benefits

[Figure: Type of core that achieves the lowest energy; workloads (%) from 0% to 100% vs. platform power (%) at 35%, 50%, 60%, 70%, 80%, 85%, 90%; series: small core is better, big core is better]

Page 39: Energy efficient computing in high performance systems

Platform energy and energy “load line”

Page 40: Energy efficient computing in high performance systems

Energy proportionality

• Data centers are rarely fully utilized

• Energy cost of the data center is significant

• Server platforms attempt to achieve a power load line: a proportional dependency between utilization and power consumption
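A small calculation shows why the load line matters at low utilization. The sketch below assumes a linear power model between idle and peak power; the wattages and function names are hypothetical.

```python
def server_power(utilization: float, idle_w: float, peak_w: float) -> float:
    """Linear load line between idle power and peak power."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_w + (peak_w - idle_w) * utilization


def power_per_unit_work(utilization: float, idle_w: float, peak_w: float) -> float:
    """Power divided by utilization; lower means more efficient."""
    return server_power(utilization, idle_w, peak_w) / utilization


# Hypothetical server: 200 W peak, compared at 20% utilization.
typical = power_per_unit_work(0.2, idle_w=100.0, peak_w=200.0)
proportional = power_per_unit_work(0.2, idle_w=0.0, peak_w=200.0)
print(round(typical), round(proportional))  # 600 vs. 200
```

With a 100 W idle floor the server spends three times more power per unit of work at 20% utilization than a fully proportional one would.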

Page 41: Energy efficient computing in high performance systems

Energy proportionality

[Figure: energy proportionality chart; axis label: SSJ Operations]

Page 42: Energy efficient computing in high performance systems

Rack power delivery optimization

• Applying control vs worst case design

Page 43: Energy efficient computing in high performance systems

Summary

• Moore’s law drives transistor density

– Does not deliver energy efficiency

– Drives the need for aggressive energy efficient design, architecture and management

• Compute system goodness has many aspects

– Instantaneous and sustained performance

– Managed within multiple physical constraints

Page 44: Energy efficient computing in high performance systems

Thank You