Page 1: The SEEC Framework and Runtime System

The SEEC Framework and Runtime System

Henry Hoffmann, MIT CSAIL

http://heartbeats.csail.mit.edu/

High Performance Embedded Computing Workshop, September 21-22, 2011

This work was funded by the U.S. Government under the DARPA UHPC program. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

Page 2: The SEEC Framework and Runtime System

In the beginning…*

Application programmers had one goal: Performance

*The beginning, in this case, refers to the beginning of my career (1999)

Page 3: The SEEC Framework and Runtime System

But Modern Systems Have Increased the Burden on Application Programmers

Many additional constraints:

• Performance
• Power
• Quality
• Resiliency

Even worse, constraints can change dynamically, e.g. power cap, workload fluctuation, core failure

Page 4: The SEEC Framework and Runtime System

Most Programming Models Designed for Performance

| | Coherent Shared Memory | Message Passing |
|---|---|---|
| Concurrency | Multi-threaded | Multi-process |
| Communication | Through Memory | Through Network |
| Coordination | Locks | Messages |
| Control | Procedural | Procedural |

[Figure: Core 0 stores data and Core 1 loads it. Shared memory: each core has a register file (RF) and a local cache, backed by a global shared cache. Message passing: each core has an RF, a local memory, and a network interface, connected by a network.]

Procedural control insufficient to meet the needs of modern systems

Page 5: The SEEC Framework and Runtime System

SEEC Replaces Procedural Control with Self-Aware Control

Procedural Control (Decide → Act)

• Run in open loop
• Assumptions made at design time
• Based on guesses about future
- Application optimized for system
- No flexibility to adapt to changes

Self-Aware Control (Observe → Decide → Act)

• Run in closed loop
• Understand user goals
• Monitor the environment
+ System optimizes for application
+ Flexibly adapt behavior

The self-aware model allows the system to solve constrained optimization problems dynamically

Page 6: The SEEC Framework and Runtime System

Outline

• Introduction/Motivation
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions

Page 7: The SEEC Framework and Runtime System

The SElf-awarE Computing (SEEC) Model

• Goal: Reduce programmer burden by continuously optimizing online

• Key Features:
  1. Decoupled Approach:
     • Applications explicitly state goals and progress
     • System software and hardware state available actions
     • The SEEC runtime system dynamically selects actions to maintain goals
  2. General and Extensible:
     • New applications can be supported without training
     • New actions can be added without redesign and reimplementation

[Figure: the application developer connects through the API (Observe), the systems developer connects through the SPI (Act), and the SEEC runtime sits between them (Decide).]

Page 8: The SEEC Framework and Runtime System

Example Self-Aware System Built from SEEC

• Application: video encoder
• Goals: 30 beat/s, minimize power (heart rate readouts shown: 20 b/s, 30 b/s, 33 b/s)
• Actuators: algorithm, cores, frequency, bandwidth
• Control and learning system connects the two (Observe, Decide, Act)

[Figure: heart rate vs. time relative to the desired rate r̄, with responses labeled pure delay, slow convergence, and oscillating.]

[Figure: SEEC controller feedback loop. The observed heart rate r(k) is subtracted from the desired heart rate r̄ to give the error e(k); the controller outputs a speedup s(k) that is applied to the application.]

[Figure: initial model, speedup (1 to 9) vs. action (0 to 30).]
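The feedback loop on this slide (desired rate r̄, observed rate r(k), error e(k), speedup s(k)) can be illustrated with a minimal sketch. It assumes a plain integral controller, s(k) = s(k-1) + e(k) / b, where b is the controller's estimate of the application's base heart rate; this update rule, the function name, and the numbers are illustrative assumptions, not SEEC's actual controller.

```python
def simulate_feedback_loop(target_rate, true_base_rate, model_base_rate, steps=20):
    """Simulate an assumed integral controller tracking a target heart rate.

    The application is modeled as r(k) = true_base_rate * s(k-1), while the
    controller believes the base rate is model_base_rate (both hypothetical).
    """
    speedup = 1.0
    history = []
    for _ in range(steps):
        observed = true_base_rate * speedup           # Observe: r(k)
        error = target_rate - observed                # e(k) = r_bar - r(k)
        speedup = speedup + error / model_base_rate   # Decide: s(k)
        history.append((observed, error, speedup))    # Act happens on the next pass
    return history


# Target of 30 beat/s; the controller overestimates the base rate (20 vs. 10),
# so it converges gradually instead of jumping to the answer in one step.
history = simulate_feedback_loop(target_rate=30.0, true_base_rate=10.0, model_base_rate=20.0)
final_observed, final_error, final_speedup = history[-1]
```

With these toy numbers the error halves on every step, so the speedup settles near 3 (30 beat/s divided by the true base rate of 10).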

Page 9: The SEEC Framework and Runtime System

Roles in SEEC’s Decoupled Model

| Phase | Application Developer | Systems Developer | SEEC Runtime System |
|---|---|---|---|
| Observe | Express application goals and progress (e.g. frames/second) | | Read goals and performance |
| Decide | | | Determine how to adapt (e.g. how much to speed up the application) |
| Act | | Provide a set of actions and a callback function (e.g. allocation of cores to process) | Initiate actions based on results of decision phase |

Page 10: The SEEC Framework and Runtime System

Registering Application Goals

• Performance
  – Goals: target heart rate and/or latency between tagged heartbeats
  – Progress: issue heartbeats at important intervals
• Quality
  – Goals: distortion (distance from application defined nominal value)
  – Progress: distortion over last heartbeat
• Power
  – Goals: target heart rate / Watt and/or target energy between tagged heartbeats
  – Progress: power/energy over last heartbeat interval

[Figure: the application reports performance, power, and quality to the SEEC Decision Engine (Observe).]

Research to date focuses on meeting performance while minimizing power/maximizing quality
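As a rough illustration of registering a performance goal and reporting progress, the sketch below tracks heartbeats over a sliding window and reports the heart rate and its distance from the target. `HeartbeatMonitor` and its methods are hypothetical names invented for this example; they are not the Application Heartbeats or SEEC API.

```python
from collections import deque


class HeartbeatMonitor:
    """Hypothetical stand-in for a heartbeat interface (not the real API)."""

    def __init__(self, target_rate, window=20):
        self.target_rate = target_rate          # goal: beats per second
        self.timestamps = deque(maxlen=window)  # sliding window of beat times

    def heartbeat(self, timestamp):
        """Record progress: the application calls this at important intervals."""
        self.timestamps.append(timestamp)

    def heart_rate(self):
        """Beats per second over the current window (0.0 until two beats exist)."""
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0

    def error(self):
        """Distance between the goal and the observed heart rate."""
        return self.target_rate - self.heart_rate()


# A video encoder that wants 30 beat/s but emits one frame every 50 ms.
monitor = HeartbeatMonitor(target_rate=30.0)
for frame in range(10):
    monitor.heartbeat(frame * 0.05)  # explicit timestamps keep the example deterministic
```

Here `monitor.heart_rate()` is about 20 beat/s, so `monitor.error()` is about 10 beat/s, which is the signal a decision engine would act on.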

Page 11: The SEEC Framework and Runtime System

Registering System Actions

Each action has the following attributes:
• Estimated Speedup
  – Predicted benefit of taking an action
• Cost
  – Predicted downside of taking an action
  – Axis for cost (accuracy, power, etc.)
• RPC handle
  – A function that takes an id and implements the associated action
  – This is currently subject to change/redesign

[Figure: the actuators (algorithm, cores, frequency, bandwidth) each register an estimated speedup, cost, and callback with the SEEC Decision Engine (Act).]
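The action attributes listed above map naturally onto a small record plus a registry. The following is a sketch under assumptions: `Action`, `ActionRegistry`, and `set_core_count` are hypothetical names, and the speedup and cost values are placeholders rather than measured numbers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    action_id: int                    # id passed to the callback
    speedup: float                    # estimated benefit of taking the action
    cost: float                       # estimated downside of taking the action
    cost_axis: str                    # axis for the cost, e.g. "power" or "accuracy"
    callback: Callable[[int], None]   # function that implements the action


class ActionRegistry:
    """Hypothetical registry a systems developer might fill in."""

    def __init__(self) -> None:
        self._actions: Dict[int, Action] = {}

    def register(self, action: Action) -> None:
        self._actions[action.action_id] = action

    def apply(self, action_id: int) -> Action:
        action = self._actions[action_id]
        action.callback(action_id)    # initiate the action via its callback
        return action


applied: List[int] = []


def set_core_count(action_id: int) -> None:
    # Stand-in for a real actuator; here the id doubles as the core count.
    applied.append(action_id)


registry = ActionRegistry()
for cores in (1, 2, 4):
    registry.register(Action(cores, speedup=float(cores), cost=float(cores),
                             cost_axis="power", callback=set_core_count))
chosen = registry.apply(2)
```

Keeping the callback behind an id mirrors the slide's description of a function that takes an id and implements the associated action.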

Page 12: The SEEC Framework and Runtime System

The SEEC Decision Engine (A general, extensible approach)

Classic Control System: Generalized second order system

• Pros: Simple, Analyzable, Works well for profiled applications
• Cons: Lack of generality for unseen applications

[Figure: feedback loop. Current performance is subtracted from the performance goal; the difference drives the Controller (Decide), then the Actuator (Act), then the Application (Observe).]

Page 13: The SEEC Framework and Runtime System

The SEEC Decision Engine (A general, extensible approach)

Adaptive Control: Based on 1-Dimensional Kalman Filter for Workload Estimation

• Pros: Adapts to unseen applications
• Cons: Assumes (relative) system models are correct; cannot support race-to-idle

[Figure: the classic control loop (Controller, Actuator, Application) with an Application Model (Decide) added for adaptive control.]
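A one-dimensional Kalman filter is small enough to sketch directly. The version below assumes a random-walk model for the workload and treats each measurement as a noisy reading of it; the class name, the variances, and the measurement sequence are illustrative assumptions, and the slides do not specify how SEEC derives its workload measurements.

```python
class ScalarKalmanFilter:
    """Scalar Kalman filter with an assumed random-walk state model."""

    def __init__(self, initial_estimate, initial_variance=1.0,
                 process_variance=1e-4, measurement_variance=1e-2):
        self.estimate = initial_estimate
        self.variance = initial_variance
        self.process_variance = process_variance
        self.measurement_variance = measurement_variance

    def update(self, measurement):
        # Predict: the state is assumed constant, so only uncertainty grows.
        prior_variance = self.variance + self.process_variance
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = prior_variance / (prior_variance + self.measurement_variance)
        self.estimate += gain * (measurement - self.estimate)
        self.variance = (1.0 - gain) * prior_variance
        return self.estimate


# Toy input: a true workload of 0.1 with alternating +/-0.01 measurement error.
workload_filter = ScalarKalmanFilter(initial_estimate=1.0)
for k in range(200):
    noise = 0.01 if k % 2 == 0 else -0.01
    workload_filter.update(0.1 + noise)
```

The estimate starts far from the true value, moves most of the way on the first measurement because the initial variance is large, and then smooths out the measurement noise.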

Page 14: The SEEC Framework and Runtime System

The SEEC Decision Engine (A general, extensible approach)

Adaptive Action Selection: Approximates solution to linear programming problem for meeting goals and minimizing cost

• Pros: Supports race-to-idle and proportional allocation
• Cons: May overprovision due to system model errors

[Figure: the classic control loop with the Application Model (Decide) for adaptive control, plus a Resource Model (Decide) for adaptive action selection.]
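One way to picture the action selection problem is as a small linear program: split a time window between actions so that the average speedup meets the target while the total cost is minimal. The sketch below assumes that formulation and solves it by enumerating pairs of actions, since an optimum of this LP needs at most two actions. The function name, the idle action, and all speedup and power values are invented for illustration; this is not SEEC's actual algorithm.

```python
def select_schedule(actions, target_speedup, window=1.0):
    """Pick time shares for at most two actions that hit the target speedup.

    actions: list of (name, speedup, power) tuples (hypothetical values).
    Returns (energy, {name: time}); (None, None) if the target is unreachable.
    """
    best_cost, best_plan = None, None
    for lo_name, lo_speedup, lo_power in actions:
        for hi_name, hi_speedup, hi_power in actions:
            if not (lo_speedup <= target_speedup <= hi_speedup):
                continue
            if hi_speedup == lo_speedup:
                hi_time = window  # a single action matches the target exactly
            else:
                hi_time = window * (target_speedup - lo_speedup) / (hi_speedup - lo_speedup)
            lo_time = window - hi_time
            cost = lo_power * lo_time + hi_power * hi_time
            if best_cost is None or cost < best_cost:
                best_cost, best_plan = cost, {lo_name: lo_time, hi_name: hi_time}
    return best_cost, best_plan


# Same active actions on two toy machines that differ only in idle power.
low_idle = [("idle", 0.0, 1.0), ("slow", 1.0, 6.0), ("fast", 2.0, 8.0)]
high_idle = [("idle", 0.0, 5.0), ("slow", 1.0, 6.0), ("fast", 2.0, 8.0)]
low_cost, low_plan = select_schedule(low_idle, target_speedup=1.0)
high_cost, high_plan = select_schedule(high_idle, target_speedup=1.0)
```

On the low-idle machine the cheapest plan runs "fast" for half the window and idles for the rest (race-to-idle); on the high-idle machine it runs "slow" for the whole window (proportional allocation).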

Page 15: The SEEC Framework and Runtime System

The SEEC Decision Engine (A general, extensible approach)

Machine Learner: Uses reinforcement learning to estimate system models online; becomes a control system when models converge

[Figure: the classic control loop with the Application Model (Decide) for adaptive control and the Resource Model (Decide) for adaptive action selection, plus a System Model (Decide) for the machine learner.]
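To give a feel for estimating system models online, the sketch below nudges each action's speedup estimate toward what was actually observed, using the incremental update common in simple reinforcement learning value estimates. The update rule, the learning rate, and the "true" speedups are assumptions chosen for illustration (toy values, not measurements); the slides do not describe SEEC's learner at this level of detail.

```python
def update_model(estimates, action, observed_speedup, learning_rate=0.5):
    """Move the estimate for one action a fraction of the way to the observation."""
    estimates[action] += learning_rate * (observed_speedup - estimates[action])
    return estimates[action]


# Optimistic initial model: assume speedup is linear in the number of cores.
estimates = {cores: float(cores) for cores in (1, 2, 4, 8)}

# Toy behavior of a memory-bound application (invented values).
toy_true_speedup = {1: 1.0, 2: 1.5, 4: 1.6, 8: 1.6}

for _ in range(20):
    for cores, speedup in toy_true_speedup.items():
        update_model(estimates, cores, speedup)
```

After a few rounds the estimate for 8 cores falls from 8.0 toward 1.6, so a cost-minimizing selector would stop paying for cores that do not help.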

Page 16: The SEEC Framework and Runtime System

Outline

• Introduction/Motivation
• The SEEC Model and Implementation
• Experimental Validation
• Conclusions

Page 17: The SEEC Framework and Runtime System

Systems Built with SEEC

| System | Actions | Tradeoff | Benchmarks |
|---|---|---|---|
| Dynamic Loop Perforation | Skip some loop iterations | Performance vs. Quality | 7/13 PARSECs |
| Dynamic Knobs | Make static parameters dynamic | Performance vs. Quality | bodytrack, swaptions, x264, SWISH++ |
| Core Scheduler | Assign N cores to application | Compute vs. Power | PARSEC |
| Clock Scaler | Change processor speed | Compute vs. Power | PARSEC |
| Bandwidth Allocator | Assign memory controllers to application | Memory vs. Power | STREAM, PARSEC |
| Power Manager | Combination of the three above | Performance vs. Power | PARSEC, STREAM, simple test apps (mergesort, binary search) |
| Learned Models | Power Manager with speedup and cost learned online | Performance vs. Power | PARSEC |
| Multi-App Control | Power Manager with multiple applications | Performance vs. Power and Quality for multiple applications | Combinations of PARSECs |
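Loop perforation, the first system in the table, trades quality for performance by skipping loop iterations. A minimal sketch, assuming a simple averaging loop and a relative-error definition of distortion (both chosen for illustration, not taken from the benchmarks above):

```python
def perforated_mean(values, stride=1):
    """Average the values, visiting only every stride-th element."""
    total = 0.0
    iterations = 0
    for index in range(0, len(values), stride):
        total += values[index]
        iterations += 1
    return total / iterations, iterations


def distortion(nominal, observed):
    """Relative distance of the perforated result from the nominal result."""
    return abs(observed - nominal) / abs(nominal)


samples = [float(i) for i in range(100)]
exact, exact_iterations = perforated_mean(samples, stride=1)
approx, approx_iterations = perforated_mean(samples, stride=2)
quality_loss = distortion(exact, approx)
```

Skipping every other iteration halves the work (50 iterations instead of 100) and moves the result from 49.5 to 49.0, a distortion of about 1%. A runtime could expose the stride as an action with an estimated speedup and a cost on the quality axis.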

Page 19: The SEEC Framework and Runtime System

Constrained Optimization: Managing Performance/Watt for PARSEC

Optimize performance/Watt on multiple machines

• Application Goals: Maintain performance, minimize power
• System Actions: Allocate cores; allocate clock speed; allocate memory bandwidth
• Experiment: Execute on two machines (w/ different power profiles); compare SEEC to several other approaches, including a static oracle

Page 20: The SEEC Framework and Runtime System

Performance/Watt for PARSEC (On server with low idle power)

SEEC beats the static oracle by adjusting to phases within an application and recognizing when to race-to-idle

[Figure: bar chart of normalized performance per Watt (y-axis, 0 to 1.4) by benchmark (x-axis: blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, freqmine, raytrace, streamcluster, swaptions, vips, x264, Average). Series: static oracle, Heuristic, Classic Control, SEEC (AAS), SEEC (ML).]

Page 21: The SEEC Framework and Runtime System

Performance/Watt for PARSEC (On server with high idle power)

SEEC is able to beat the static oracle on a different machine without code changes

[Figure: bar chart of normalized performance per Watt (y-axis, 0 to 1.6) by benchmark (x-axis: blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, freqmine, raytrace, streamcluster, swaptions, vips, x264, Average). Series: static oracle, Heuristic, Classic Control, SEEC (AAS), SEEC (ML).]

Page 22: The SEEC Framework and Runtime System

Learning Models Online

Adapt system behavior when initial models are wrong

• Application Goals: Minimize power consumption while meeting target performance
• System Actions: Change cores, clock speed, and mem. bandwidth
• Experiment: Initial models are incredibly optimistic (assume linear speedup with any resource increase); benchmark: STREAM; observe convergence time and performance/Watt for converged system

Page 23: The SEEC Framework and Runtime System

SEEC Can Learn Models Online

[Figure: normalized performance per Watt (y-axis, 0 to 1.2) vs. time (x-axis, 0 to 400). Series: Classic Control, SEEC.]

SEEC learns not to allocate too many compute resources to a memory bound application

Page 24: The SEEC Framework and Runtime System

When System Models Are Wrong (Breakdown)

[Figure: normalized power (y-axis, 0 to 1.4) vs. time (x-axis, 0 to 400). Series: Classic Control, SEEC (AAS), SEEC (ML).]

[Figure: normalized performance (y-axis, 0 to 1.2) vs. time (x-axis, 0 to 400). Series: Classic Control, SEEC (AAS), SEEC (ML).]

Adaptive Control achieves performance fastest, but wastes power. ML reaches target performance slowest, but saves power

Page 25: The SEEC Framework and Runtime System

Managing Application and System Resources Concurrently

Manage multiple applications when clock frequency changes

• Application Goals:
  – bodytrack: maintain performance, minimize power
  – x264: maintain performance, minimize quality loss
• System Actions: Change core allocation to both applications; change x264’s algorithms
• Experiment: Maintain performance of both applications when clock frequency changes

Page 26: The SEEC Framework and Runtime System

SEEC Management of Multiple Applications In Response to a Power Cap

[Figure: two panels, bodytrack and x264. Each plots normalized performance (0 to 2.5) and cores (0 to 7) against time in heartbeats (40 to 240). Series per panel: the application w/ adaptation, the application without adaptation, and the application's cores.]

Annotations:
• Clock drops from 2.4 to 1.6 GHz
• w/o SEEC app misses goals
• SEEC allocates cores to bodytrack
• w/o SEEC app exceeds goals
• SEEC removes cores from x264
• SEEC adjusts algorithm to meet goals

Page 27: The SEEC Framework and Runtime System

Outline

• Introduction/Motivation
• The SEEC Framework
• Experimental Validation
• Conclusions

Page 28: The SEEC Framework and Runtime System

Conclusions

• SEEC is designed to help ease programmer burden
  – Solves resource allocation problems
  – Adapts to fluctuations in environment and application behavior

• SEEC has two distinguishing features
  – Decoupled Design
    • Incorporates goals and feedback directly from the application
    • Allows independent specification of adaptation
  – General and Extensible Decision Engine
    • Uses an adaptive second order control system to manage adaptation

• Demonstrated the benefits of SEEC in several experiments
  – Optimizes performance per Watt for multiple benchmarks on multiple machines
  – Adapts algorithms and resource allocation as environment changes

Page 29: The SEEC Framework and Runtime System

Thanks

• DARPA’s UHPC Program

• MIT, Politecnico di Milano, Freescale

• Martina Maggio, Marco D. Santambrogio, Jason E. Miller, Jonathan Eastep

• Stelios Sidiroglou, Sasa Misailovic, Michael Carbin

• Anant Agarwal, Martin Rinard, Alberto Leva, Jim Holt
