PRiME: POWER-EFFICIENT, RELIABLE, MANY-CORE EMBEDDED SYSTEMS
Dr Geoff Merrett
International Symposium on Many-Core Computing: Hardware and Software
18 January 2018 | Southampton, UK
prime-project.org


THE PRiME PROJECT

“Enable the sustainability of many-core scaling by preventing the uncontrolled increase in energy consumption and unreliability through a step change in holistic design methods and cross-layer system optimisation.”

www.prime-project.org


Runtime Management: a PRiME Perspective

Power/Energy | Reliability


PRiME’S APPROACH TO RTM

Maeda-Nunez, Luis Alfonso, Anup K. Das, Rishad A. Shafik, Geoff V. Merrett, and Bashir Al-Hashimi. "PoGo: an application-specific adaptive energy minimisation approach for embedded systems." HiPEAC Workshop on Energy Efficiency with Heterogeneous Computing, 2015.

[Figure: comparison of the ondemand Linux Governor with the PRiME Q-Learning RTM]
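To make the idea of a Q-learning RTM concrete, here is a minimal sketch of tabular Q-learning choosing a frequency level for a binned workload. It is not PoGo or the PRiME RTM itself: the frequency levels, the workload model and the reward (a large penalty for a missed deadline, otherwise a quadratic energy proxy) are all assumptions made for illustration.

```python
import random

FREQS_MHZ = [400, 800, 1200, 1600]   # hypothetical V-F operating points
N_LEVELS = len(FREQS_MHZ)            # workload is binned into as many levels

def reward(workload_level, freq_idx):
    """Toy reward: heavy penalty for a missed deadline, otherwise an energy proxy."""
    cycles_needed = (workload_level + 1) * 400 - 50    # made-up demand per period
    if FREQS_MHZ[freq_idx] < cycles_needed:
        return -10.0                                    # deadline missed
    return -(FREQS_MHZ[freq_idx] / FREQS_MHZ[-1]) ** 2  # lower frequency costs less

def train(steps=40000, alpha=0.05, gamma=0.3, epsilon=0.2, seed=1):
    """Epsilon-greedy tabular Q-learning; returns q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0] * N_LEVELS for _ in range(N_LEVELS)]
    state = rng.randrange(N_LEVELS)
    for _ in range(steps):
        if rng.random() < epsilon:                      # explore
            action = rng.randrange(N_LEVELS)
        else:                                           # exploit
            action = max(range(N_LEVELS), key=lambda a: q[state][a])
        r = reward(state, action)
        next_state = rng.randrange(N_LEVELS)            # workload drifts independently of the action
        q[state][action] += alpha * (r + gamma * max(q[next_state]) - q[state][action])
        state = next_state
    return q

def greedy_policy(q):
    """Best-known frequency index for each workload level."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]

q_table = train()
policy = greedy_policy(q_table)
print([FREQS_MHZ[a] for a in policy])
```

Unlike a utilisation-threshold governor such as ondemand, the table is learned from observed rewards, so the same loop adapts if the reward is redefined around a different application requirement.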


REINFORCEMENT LEARNING RTM

Application  Data Set  Average Temperature (Celsius)      Peak Temperature (Celsius)
                       Linux    Ge et al.   Proposed      Linux    Ge et al.   Proposed
tachyon      set1      69.2     52.6        38.6          71.5     63          60
             set2      50.5     44.5        43.8          57.3     56.3        52
             set3      50.8     44.7        41.6          57.8     54.5        48.8
mpeg2_dec    clip1     36       34          34.2          42.7     41.3        39
             clip2     35.6     34.4        34.2          42.3     42          39.3
             clip3     34.3     34.4        34            43       39.7        44.3

Average MTTF improvements: 5x (thermal aging); 4x (thermal cycling)

Das, Anup, Al-Hashimi, Bashir and Merrett, Geoff (2015) Adaptive and hierarchical run-time manager for energy-aware thermal management of embedded systems. ACM Transactions on Embedded Computing Systems, 1-25.


MODEL-BASED RTM
Stereo Matching Application: http://github.com/PRiME-project/PRiMEStereoMatch

[Figure: stereo matching pipeline. The Left Image and Right Image each pass through Grayscale & Gradient, Cost Volume Construction, Cost Volume Filtering and Disparity Selection; Post Processing then produces the Depth Map.]

• Processes still images, video or a camera feed

• OpenCL supported

• Includes test datasets
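The pipeline stages named on this slide can be sketched in a few lines of pure Python. This is a simplified illustration, not the PRiMEStereoMatch code: it works on grayscale intensities only (no gradient term), uses a plain box filter as a stand-in for the cost volume filter, and omits post processing. The synthetic image pair and all parameter values are assumptions.

```python
import random

def cost_volume(left, right, max_disp, invalid_cost=255.0):
    """Cost Volume Construction: absolute intensity difference per disparity."""
    h, w = len(left), len(left[0])
    return [[[abs(left[y][x] - right[y][x - d]) if x - d >= 0 else invalid_cost
              for x in range(w)] for y in range(h)] for d in range(max_disp + 1)]

def box_filter(img, radius):
    """Cost Volume Filtering (stand-in): mean over a window clamped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            out[y][x] = sum(img[j][i] for j in ys for i in xs) / (len(ys) * len(xs))
    return out

def select_disparity(filtered):
    """Disparity Selection: winner-takes-all over the filtered costs."""
    h, w = len(filtered[0]), len(filtered[0][0])
    result = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            result[y][x] = min(range(len(filtered)), key=lambda d: filtered[d][y][x])
    return result

def stereo_match(left, right, max_disp=4, radius=1):
    volume = cost_volume(left, right, max_disp)
    filtered = [box_filter(cost_slice, radius) for cost_slice in volume]
    return select_disparity(filtered)

# Synthetic test pair: the right image is the left image shifted by TRUE_DISP pixels.
random.seed(0)
H, W, TRUE_DISP = 6, 16, 2
left = [[random.randint(0, 255) for _ in range(W)] for _ in range(H)]
right = [[left[y][x + TRUE_DISP] if x + TRUE_DISP < W else 0 for x in range(W)]
         for y in range(H)]
disparity = stereo_match(left, right)
```

Each disparity slice is filtered independently, which is one reason this kind of algorithm maps naturally onto parallel hardware.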

Leech, Charles, Vala, Charan Kumar, Acharyya, Amit, Yang, Sheng, Merrett, Geoffrey and Al-Hashimi, Bashir (2017) Run-time performance and power optimization of parallel disparity estimation on many-core platforms ACM Transactions on Embedded Computing Systems


MODEL-BASED RTM
Model Building

Leech, Charles, Vala, Charan Kumar, Acharyya, Amit, Yang, Sheng, Merrett, Geoffrey and Al-Hashimi, Bashir (2017) Run-time performance and power optimization of parallel disparity estimation on many-core platforms ACM Transactions on Embedded Computing Systems


MODEL-BASED RTM
Runtime Management

Leech, Charles, Vala, Charan Kumar, Acharyya, Amit, Yang, Sheng, Merrett, Geoffrey and Al-Hashimi, Bashir (2017) Run-time performance and power optimization of parallel disparity estimation on many-core platforms ACM Transactions on Embedded Computing Systems


MODEL-BASED RTM: HETEROGENEITY
Heterogeneous Platforms

Run-time changes in:
• Performance requirements
• Application workload changes

[Figure: application tasks (Decode, Filter, Display), shown for Workload-A and Workload-B, are mapped onto hardware (CPU, DSP, FPGA). The runtime performs Modelling, Mapping and DVFS.]

Yang, Sheng, Shafik, Rishad Ahmed, Merrett, Geoff V., Stott, Edward, Levine, Joshua, Davis, James and Al-Hashimi, Bashir (2015) Adaptive energy minimization of embedded heterogeneous system using regression-based learning. PATMOS 2015, Salvador, BR, 01 - 04 Sep 2015. 8pp.
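As a rough illustration of combining modelling, mapping and DVFS, the sketch below fits simple regression models from profiling samples and then picks the device and frequency with the lowest predicted energy that still meets a deadline. The profiling numbers, the model forms (time linear in 1/f, power linear in f cubed) and the function names are assumptions for this example, not the method of the cited paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical profiling samples: (frequency in MHz, execution time in s, power in W).
PROFILE = {
    "CPU": [(500, 1.3, 0.425), (1000, 0.7, 1.3), (1500, 0.5, 3.675)],
    "DSP": [(300, 1.0, 0.254), (600, 0.6, 0.632)],
}

def build_models(profile):
    """Fit one time model and one power model per device."""
    models = {}
    for dev, samples in profile.items():
        freqs = [f for f, _, _ in samples]
        t_model = fit_line([1.0 / f for f in freqs], [t for _, t, _ in samples])
        p_model = fit_line([(f / 1000.0) ** 3 for f in freqs], [p for _, _, p in samples])
        models[dev] = (freqs, t_model, p_model)
    return models

def choose_config(models, deadline_s):
    """Return (energy, device, frequency) with minimum predicted energy, or None."""
    best = None
    for dev, (freqs, (ta, tb), (pa, pb)) in models.items():
        for f in freqs:
            t = ta / f + tb                      # predicted execution time
            p = pa * (f / 1000.0) ** 3 + pb      # predicted power
            if t <= deadline_s and (best is None or t * p < best[0]):
                best = (t * p, dev, f)
    return best

models = build_models(PROFILE)
relaxed = choose_config(models, deadline_s=1.5)
moderate = choose_config(models, deadline_s=0.65)
tight = choose_config(models, deadline_s=0.55)
```

Tightening the deadline moves the choice from the slow, frugal setting towards the fast, power-hungry one, which is the run-time change in performance requirements the slide refers to.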


EXECUTING MULTIPLE APPLICATIONS

• Workload and performance variation due to:
– Changes within an application
– Changing applications (sequential execution)

• RTM: Change detection and Learning transfer

• Overlapping applications? (concurrent execution)

Shafik, Rishad, Das, Anup, Maeda-Nunez, Luis, Yang, Sheng, Merrett, Geoff and Al-Hashimi, Bashir (2015) Learning transfer-based adaptive energy minimization in embedded systems. IEEE TCAD.


RTM FOR CONCURRENT EXECUTION
MRPI (Memory Reads Per Instruction)

• Supports concurrent execution of applications

• Inter-cluster Thread-to-core Mapping (ITM).

• MRPI informs DVFS control
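A minimal sketch of how MRPI could inform a frequency choice is shown below. The threshold table, the frequency values and the rule that memory-bound phases (high MRPI) run at lower frequency are assumptions for illustration; the cited paper should be consulted for the actual mapping and DVFS policy.

```python
def mrpi(memory_reads, instructions):
    """Memory Reads Per Instruction for one sampling interval."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return memory_reads / instructions

# (upper MRPI bound, frequency in MHz): hypothetical thresholds.
MRPI_TO_FREQ = [(0.01, 2000), (0.03, 1600), (0.06, 1200)]
MIN_FREQ_MHZ = 800

def select_frequency(mrpi_value):
    """Compute-bound phases (low MRPI) get high frequency; memory-bound get low."""
    for upper_bound, freq in MRPI_TO_FREQ:
        if mrpi_value < upper_bound:
            return freq
    return MIN_FREQ_MHZ

compute_bound = select_frequency(mrpi(5_000, 1_000_000))
mixed = select_frequency(mrpi(50_000, 1_000_000))
memory_bound = select_frequency(mrpi(100_000, 1_000_000))
```

The appeal of a counter-derived metric like this is that it can be sampled per thread at run time without application changes.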

Reddy, Basireddy Karunakar, Singh, Amit, Biswas, Dwaipayan, Merrett, Geoff and Al-Hashimi, Bashir (2017) Inter-cluster thread-to-core mapping and DVFS on heterogeneous multi-cores IEEE Transactions on Multiscale Computing Systems, pp. 1-14.


MANAGING TEMPERATURE
MRPI (Memory Reads Per Instruction)

• Aims to avoid frequency throttling at temperature threshold.

• Predicts temperature using a regression-based model

• Can achieve a 10% improvement in energy and performance

[Figure: App(s) are profiled (temp., freq., power) to build regression models. Online, the Run-time Manager chooses the best regression model, which feeds the temperature predictor and the mapping and DVFS setting.]

[Plots: core frequency without prediction; core frequency with prediction]
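The flow above (profile, choose the best regression model, predict temperature, set DVFS) can be sketched as follows. The profiling data, the two candidate model forms and the threshold are made up for this example; it illustrates the idea rather than reproducing the PRiME implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Candidate regression models: temperature as a linear function of f or of f^2.
FEATURES = {"f": lambda f: f, "f^2": lambda f: f * f}

# Hypothetical profiling samples: (frequency in GHz, steady-state temperature in Celsius).
PROFILE = [(0.6, 43.6), (1.0, 50.0), (1.4, 59.6), (1.8, 72.4), (2.0, 80.0)]

def choose_model(samples):
    """Fit every candidate and keep the one with the smallest squared error."""
    best = None
    for name, feature in FEATURES.items():
        xs = [feature(f) for f, _ in samples]
        ys = [t for _, t in samples]
        slope, intercept = fit_line(xs, ys)
        err = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, name, slope, intercept)
    return best[1:]

def predict_temp(model, freq):
    name, slope, intercept = model
    return slope * FEATURES[name](freq) + intercept

def highest_safe_freq(model, freqs, threshold_c):
    """Pick the fastest frequency predicted to stay at or below the threshold."""
    safe = [f for f in freqs if predict_temp(model, f) <= threshold_c]
    return max(safe) if safe else min(freqs)

model = choose_model(PROFILE)
chosen = highest_safe_freq(model, [f for f, _ in PROFILE], threshold_c=70.0)
```

Choosing a sustainable frequency ahead of time is what avoids the oscillation between full speed and throttled speed that a purely reactive thermal governor produces.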


TUNING DPM/RTM PARAMETERS

• Tune governor parameters for the executing (interactive) workload

• Account for variability in access times and user input

• Prediction/detection dependent

• Energy saving/QoE improvement compared to ‘default’, e.g.
– 13% energy saving
– 27% QoE improvement
– 9% energy + 15% QoE

Exynos-5422 A15/A7, Android 6.0
Google Chrome browser workloads

Touch input emulation
Network throttling (UL, DL, RTT latency)

Bantock, James, Robert Benjamin, Tenentes, Vasileios, Al-Hashimi, Bashir and Merrett, Geoffrey (2017) Online tuning of Dynamic Power Management for efficient execution of interactive workloads International Symposium on Low Power Electronics and Design. IEEE. 6 pp.


MEASURING/MODELLING POWER
www.powmon.ecs.soton.ac.uk

Why Power Estimation?

• Few platforms allow hardware measurement of power consumption

• RTMs need to make decisions based on real-time ‘measurements’

• Offline DPM strategy evaluation/design-space exploration

PMCs (Performance Monitoring Counters)

• On several platforms, low overhead, many events available

• …but a small number (e.g. 4-6) can be monitored simultaneously
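Because only a handful of PMC events can be counted at once, a power model has to be built from a small, well-chosen subset. The sketch below shows one simple way to pick such a subset: rank events by correlation with measured power and skip any event that is nearly redundant with one already chosen. This greedy rule, the redundancy threshold and the traces are assumptions for illustration and are not the PowMon selection procedure.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def select_events(events, power, max_events, redundancy=0.95):
    """Greedy pick: strongest correlation with power first, skipping near-duplicates."""
    ranked = sorted(events, key=lambda name: abs(pearson(events[name], power)),
                    reverse=True)
    chosen = []
    for name in ranked:
        if len(chosen) == max_events:
            break
        if all(abs(pearson(events[name], events[c])) < redundancy for c in chosen):
            chosen.append(name)
    return chosen

# Made-up traces; only the event names are borrowed from the ARM PMU event list.
power_w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
traces = {
    "CYCLE_COUNT":      [10, 20, 30, 40, 50, 61],
    "INST_SPEC":        [11, 19, 31, 39, 52, 60],
    "L2D_CACHE_ACCESS": [2, 1, 4, 3, 6, 5],
    "BUS_ACCESS":       [3, 3, 2, 4, 3, 3],
}
selected = select_events(traces, power_w, max_events=2)
```

Here INST_SPEC is skipped even though it tracks power closely, because it carries almost the same information as CYCLE_COUNT; avoiding such near-duplicates is one way to keep the fitted model coefficients stable.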

PowMon: A Stable, “Bottom-Up” Approach to Power Modelling

[Figure: PMC Exploration Results, Cluster and Correlation. Bar chart of each PMC event's correlation with power (x-axis: Events; y-axis: Correlation with Power, -0.1 to 0.9), with selectable clustering (None, Clustering A, Clustering B).]

M. J. Walker et al., "Accurate and Stable Run-Time Power Modeling for Mobile and Embedded CPUs," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 1, pp. 106-119, Jan. 2017.


STABLE vs UNSTABLE POWER MODELS
www.powmon.ecs.soton.ac.uk

Our stable approach achieves a low average error and narrow error distribution compared to existing techniques.

[Figure: Comparison Graph. x-axis: Power Model (a, b, c, d, e, P); y-axis: Percentage Error (%), 0 to 22; series: Mean.]

Training: Small set of 20 workloads

Testing: Full set of 60 workloads

[a] M. Pricopi, T. S. Muthukaruppan, V. Venkataramani, T. Mitra, and S. Vishin, “Power-performance modeling on asymmetric multi-cores,” CASES ’13.
[b] M. Walker et al., “Run-time power estimation for mobile and embedded asymmetric multi-core cpus,” HIPEAC Workshop Energy Efficiency with Hetero. Comp. 2015.
[c] S. K. Rethinagiri et al., “System-level power estimation tool for embedded processor based platforms,” RAPIDO ’14. New York, 2014.
[d], [e] R. Rodrigues et al., “A study on the use of performance counters to estimate power in microprocessors,” IEEE TCAS II, vol. 60, no. 12, pp. 882–886, Dec 2013.

M. J. Walker et al., "Accurate and Stable Run-Time Power Modeling for Mobile and Embedded CPUs," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 1, pp. 106-119, Jan. 2017.


IMPROVING USABILITY/EVALUATION

App-RTM interaction

• Communication of controls/monitors (e.g. register/deregister)

• Update controls/monitors (e.g. set/get)

RTM-Device interaction

• Communication of controls/monitors (e.g. register/deregister)

• Update controls/monitors

OpenCL Workload Management

• Tuneable parameters (e.g. thread level parallelism – kernels, device selection)

• Scheduling of kernels to compute devices

[Figure: PRiME RTM Framework layers and API]

Application Layer: App 1, App 2, … App N (prime_app.h)
• prime_app_reg()/dereg()
• PRiME App Knobs: Parallelism (No. Kernels), Heterogeneity
– prime_app_knob_reg()/dereg()
– prime_app_knob_get()
• PRiME App Monitors: Performance (fps), Accuracy (error rate)
– prime_app_mon_reg()/dereg()
– prime_app_mon_set()
– prime_app_mon_weight()

Runtime Manager: Runtime Controller, Runtime Model

Workload Scheduler: Workloads (OpenCL Kernels)

Hardware Layer: CPU, GPU, DSP, FPGA (prime_dev.h)
• PRiME Dev Knobs: V-F, Core Affinity
– prime_dev_knob_reg()/dereg()
– prime_dev_knob_type()
– prime_dev_knob_set()
• PRiME Dev Monitors: Power, Energy, Temperature
– prime_dev_mon_reg()/dereg()
– prime_dev_mon_type()
– prime_dev_mon_get()
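To illustrate the knob/monitor pattern, here is a toy Python mock of the interaction: the application registers a performance monitor and reports values, the device registers a knob, and the runtime manager adjusts the knob. The method names only echo the prime_app_* and prime_dev_* functions listed on the slide; their real signatures are not given here, so every signature, argument and the control rule below is hypothetical.

```python
class ToyRTM:
    """Stand-in for a runtime manager; every signature here is hypothetical."""

    def __init__(self):
        self.app_monitors = {}   # name -> {"target": float, "value": float or None}
        self.dev_knobs = {}      # name -> {"min": int, "max": int, "value": int}

    # Application side (echoes prime_app_mon_reg()/dereg() and prime_app_mon_set()).
    def app_mon_reg(self, name, target):
        self.app_monitors[name] = {"target": target, "value": None}

    def app_mon_dereg(self, name):
        del self.app_monitors[name]

    def app_mon_set(self, name, value):
        self.app_monitors[name]["value"] = value

    # Device side (echoes prime_dev_knob_reg() and prime_dev_knob_set()).
    def dev_knob_reg(self, name, minimum, maximum, value):
        self.dev_knobs[name] = {"min": minimum, "max": maximum, "value": value}

    def dev_knob_set(self, name, value):
        knob = self.dev_knobs[name]
        knob["value"] = max(knob["min"], min(knob["max"], value))  # clamp to range

    def step(self, mon="fps", knob="freq_level", slack=1.2):
        """One control step: speed up if below target, slow down if well above."""
        m, k = self.app_monitors[mon], self.dev_knobs[knob]
        if m["value"] is not None:
            if m["value"] < m["target"]:
                self.dev_knob_set(knob, k["value"] + 1)
            elif m["value"] > slack * m["target"]:
                self.dev_knob_set(knob, k["value"] - 1)
        return k["value"]

rtm = ToyRTM()
rtm.dev_knob_reg("freq_level", minimum=0, maximum=3, value=1)  # device exposes a V-F knob
rtm.app_mon_reg("fps", target=30.0)                            # app declares its requirement
rtm.app_mon_set("fps", 22.0)                                   # app reports a slow frame rate
level_after_slow = rtm.step()
rtm.app_mon_set("fps", 45.0)                                   # app reports a fast frame rate
level_after_fast = rtm.step()
```

The point of the split is that the controller in `step` never needs to know which application or which device it is talking to, only the registered knobs and monitors.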


THE PRiME RTM FRAMEWORK
Experimental Evaluation

Application-RTM interaction

• 3 Applications

RTM-Device interaction

• 2 Platforms


SUMMARY

PRiME’s Approach to RTM

• Online vs offline+online approaches

• Model-free vs model-based approaches

• Single > multiple > concurrent applications

• Homogeneous vs Heterogeneous platforms

Tools and Support www.prime-project.org

• PowMon power estimation: www.powmon.ecs.soton.ac.uk

• PRiME RTM Framework (available soon)

• PRiMEStereoMatch application

• + more...


Any Questions?

Dr Geoff V Merrett
Associate Professor

Electronics and Computer Science
Tel: +44 (0)23 8059 2775
Email: [email protected] | www.geoffmerrett.co.uk
Highfield Campus, Southampton, SO17 1BJ UK