Page 1: Optimal Power Allocation for Multiprogrammed Workloads on  Single-chip Heterogeneous Processors

Optimal Power Allocation for Multiprogrammed Workloads on Single-chip Heterogeneous Processors

Euijin Kwon (1,2), Jae Young Jang (2), Jae W. Lee (2), Nam Sung Kim (2,3)

[Affiliations 1, 2, 3: shown as logos on the original slide]

Page 2

Single-chip heterogeneous processors

• Compared to systems based on discrete components
  - Lower communication overhead
  - Lower power consumption
  - Lower cost (less silicon)
  - Friendly to emerging applications (sequential + parallel processing)

Examples: AMD’s Llano, Intel’s Sandy Bridge, Samsung’s Exynos

Sources: AMD, Intel, and Samsung

Page 3

Challenges

• Performance of a single-chip heterogeneous processor (SCHP): limited by power budget
  - Total chip power budget
  - CPU/GPU power budget
• Multiprogrammed workload
  - Workload-aware power allocation
  - Considering program characteristics and evaluation metrics

How can we optimize overall performance within a limited power budget?

Page 4

Outline

• Motivation
• Target platform: SCHP + MW
• Workload-aware power allocation
  - Characteristics of programs
  - Evaluation metrics
• Methodology
  - Power configuration
  - Benchmark programs
• Evaluation
• Algorithm
• Conclusion

Page 5

Target platform: SCHP + MW (multiprogrammed workload)

• 4-core CPU + 16-SM GPU
• Multiple voltage/frequency (V/F) domains for DVFS
• 2 programs running
• Hardware resources evenly divided

[Figure: block diagram of the target platform. CPU Core0 to Core3 (per-core CPU V/F domains), GPU0 (GPU0 V/F domain), GPU1 (GPU1 V/F domain), and memory controllers (MCs V/F domain), running a multiprogrammed workload of Program 1 and Program 2.]

Page 6

Workload-aware power allocation

• Characteristics of programs
  - Non-uniform performance sensitivities
• Evaluation metrics
  - Throughput vs. energy efficiency

[Figure: normalized throughput (y-axis, 0.8 to 2.0) versus power allocation using the same HW (x-axis: 28.6, 34.2, 39.8, 48.6, 59.0) for a compute-bound program (mri-q) and a memory-bound program (stream-copy). Annotation: "Allocating more power to mri-q".]


Page 8

Methodology: shared power budget

• Can change the power budget for each CPU/GPU group

Power configuration (W):
  CPU 1: 17.4, 24.8, 34.2, 46.4, 62.8
  CPU 2: 17.4, 24.8, 34.2, 46.4, 62.8
  GPU 1: 11.2, 16.8, 22.4, 31.2, 41.6
  GPU 2: 11.2, 16.8, 22.4, 31.2, 41.6

Output: throughput, energy efficiency

• Total chip power budget = 100 W
• CPU power budget = 80 W
• GPU power budget = 64 W
• Baseline configuration
  - Evenly divided (25 W for each CPU/GPU group)
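The budgets on this slide can be read as constraints on which combinations of power levels are allowed. The following is a minimal sketch of that reading, not the authors' tool. It assumes that the CPU budget applies to the sum of the two CPU groups and the GPU budget to the sum of the two GPU groups, which the slide does not state explicitly. The names `is_feasible` and `feasible_configs` are hypothetical.

```python
from itertools import product

# Power levels (W) listed on the slide for each CPU group and each GPU group.
CPU_LEVELS_W = (17.4, 24.8, 34.2, 46.4, 62.8)
GPU_LEVELS_W = (11.2, 16.8, 22.4, 31.2, 41.6)

# Budgets (W) from the slide.
TOTAL_BUDGET_W = 100.0
CPU_BUDGET_W = 80.0
GPU_BUDGET_W = 64.0

# Tolerance so that sums such as 22.4 + 41.6 are not rejected by rounding error.
EPS = 1e-9


def is_feasible(cpu1, cpu2, gpu1, gpu2):
    """Check one configuration against the three budgets.

    Assumption: the CPU (GPU) budget bounds the sum of both CPU (GPU) groups.
    """
    cpu = cpu1 + cpu2
    gpu = gpu1 + gpu2
    return (
        cpu <= CPU_BUDGET_W + EPS
        and gpu <= GPU_BUDGET_W + EPS
        and cpu + gpu <= TOTAL_BUDGET_W + EPS
    )


def feasible_configs():
    """Enumerate every (cpu1, cpu2, gpu1, gpu2) combination within budget."""
    all_configs = product(CPU_LEVELS_W, CPU_LEVELS_W, GPU_LEVELS_W, GPU_LEVELS_W)
    return [cfg for cfg in all_configs if is_feasible(*cfg)]


if __name__ == "__main__":
    configs = feasible_configs()
    print(f"{len(configs)} of {5 ** 4} configurations fit the budgets")
```

Under this reading, a configuration such as CPU 17.4 W + 17.4 W with GPU 41.6 W + 22.4 W uses the GPU budget exactly and stays below the total chip budget.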

Page 9

Methodology: benchmark programs

• Used 6 benchmark programs
• Divided into 3 groups depending on characteristics

Benchmark                      Acronym   Source        Characteristics
Magnetic Resonance Imaging Q   MRQ       Parboil       Compute-bound
Stream Cluster                 SCL       Rodinia       Compute-bound
Hotspot                        HOT       Rodinia       Neutral
Sum of Absolute Difference     SAD       Parboil       Neutral
Stencil                        STN       Parboil       Memory-bound
Stream Copy                    SCP       CS Virginia   Memory-bound


Page 11

Evaluation: case study 1 (compute- vs. memory-bound)

• 19% throughput improvement
• 32% energy efficiency improvement
• More power is allocated to the compute-bound program
• Optimal points vary depending on the metric

Page 12

Evaluation: case study 2 (memory- vs. memory-bound)

• 10% throughput improvement
• 32% energy efficiency improvement
• Power is allocated equally between the two programs
• Again, the optimal point depends on
  - Evaluation metric
  - Workload characteristics (compute- or memory-bound)

Page 13

Evaluation: variation of optimal configuration

• Depending on programs’ characteristics and evaluation metrics

Optimal power configuration in watts (C = compute-bound, M = memory-bound, N = neutral)

                      Metric 1: throughput           Metric 2: energy efficiency
                      P1             P2              P1             P2
P1        P2          CPU    GPU     CPU    GPU      CPU    GPU     CPU    GPU
MRQ (C)   SCL (C)     17.4   31.2    17.4   31.2     17.4   16.8    17.4   16.8
SCP (M)   STN (M)     17.4   31.2    17.4   31.2     17.4   11.2    17.4   11.2
SAD (N)   HOT (N)     17.4   31.2    17.4   31.2     17.4   11.2    17.4   16.8
MRQ (C)   SCP (M)     17.4   41.6    17.4   22.4     17.4   22.4    17.4   16.8
SCL (C)   SCP (M)     17.4   41.6    17.4   22.4     17.4   11.2    17.4   11.2

HOT (N)   MRQ (C)     17.4   31.2    17.4   31.2     17.4   11.2    17.4   22.4
MRQ (C)   SAD (N)     17.4   31.2    17.4   31.2     17.4   16.8    17.4   22.4
SCL (C)   SAD (N)     17.4   31.2    17.4   31.2     17.4   16.8    17.4   11.2

HOT (N)   STN (M)     17.4   41.6    17.4   22.4     17.4   11.2    17.4   11.2
HOT (N)   SCP (M)     17.4   41.6    17.4   22.4     17.4   11.2    17.4   11.2
SAD (N)   SCP (M)     17.4   41.6    17.4   22.4     17.4   11.2    17.4   22.4

Page 14

Evaluation: performance improvement from optimal power allocation

• Achieved significant improvement
  - 12% for throughput
  - 18% for energy efficiency

[Figure: normalized IPS and normalized IPS/W (y-axis ticks: 0.9, 1.1, 1.3) for each program pair: MRQ vs. SCL (CC), SCP vs. STN (MM), SAD vs. HOT (NN), MRQ vs. SCP (CM), SCL vs. SCP (CM), HOT vs. MRQ (NC), MRQ vs. SAD (CN), SCL vs. SAD (CN), HOT vs. STN (NM), HOT vs. SCP (NM), SAD vs. SCP (NM), and GEOMEAN.]
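The figure reports normalized IPS, normalized IPS/W, and a GEOMEAN bar. As a minimal sketch of how such a summary is commonly computed, the helpers below normalize a measurement against a baseline and take the geometric mean across workloads. The function names and the example inputs are hypothetical and are not taken from the slides.

```python
import math


def energy_efficiency(ips, power_w):
    """Energy efficiency as instructions per second per watt (IPS/W)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return ips / power_w


def normalize(value, baseline):
    """Express a measurement relative to the baseline configuration."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return value / baseline


def geomean(values):
    """Geometric mean of positive values, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Placeholder ratios for illustration only, not results from the slides.
    example_ratios = [1.0, 1.1, 1.2]
    print(geomean(example_ratios))
```

The geometric mean is the usual choice for averaging ratios because it is independent of which configuration is used as the normalization baseline.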

Page 15

Algorithm for throughput maximization

Flowchart:
• calculate(slope)
• If abs(sp1 - sp2) < threshold (YES): alloc(equally)
• Otherwise (NO), if sp1 > sp2 (YES): alloc(p1_more)
• Otherwise (NO): alloc(p2_more)
• wait(regular_time), then recalculate

[Figure: normalized throughput (y-axis, 0.8 to 2.0) versus power allocation (x-axis: 28.6, 34.2, 39.8, 48.6, 59.0) for a compute-bound program (mri-q) and a memory-bound program (stream-copy).]
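The decision step of the flowchart can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that sp1 and sp2 are the slopes of each program's throughput-versus-power curve, estimated here by a finite difference, and that the program with the steeper slope receives more power. The `Decision` enum and the `slope` and `decide` helpers are hypothetical names.

```python
from enum import Enum


class Decision(Enum):
    EQUAL = "alloc(equally)"
    P1_MORE = "alloc(p1_more)"
    P2_MORE = "alloc(p2_more)"


def slope(power_lo, perf_lo, power_hi, perf_hi):
    """Finite-difference estimate of throughput sensitivity to power.

    Assumption: two (power, throughput) samples per program are available.
    """
    if power_hi == power_lo:
        raise ValueError("the two samples must use different power levels")
    return (perf_hi - perf_lo) / (power_hi - power_lo)


def decide(sp1, sp2, threshold):
    """One pass through the flowchart's two decision diamonds."""
    if abs(sp1 - sp2) < threshold:
        return Decision.EQUAL
    if sp1 > sp2:
        return Decision.P1_MORE
    return Decision.P2_MORE


if __name__ == "__main__":
    # Illustrative slopes only: program 1 is far more sensitive to power.
    print(decide(sp1=0.02, sp2=0.001, threshold=0.005))
```

A run-time controller would call `decide` once per interval, apply the resulting allocation, and wait for the next interval, as in the wait(regular_time) step.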

Page 16

Algorithm for energy efficiency maximization

• Gradient search from the minimum power allocation

Flowchart:
• final = min_power
• MAX = max( EE(final), EE(final, p1++), EE(final, p2++) )
• If EE(final) == MAX (YES): exit
• Otherwise (NO), if EE(final, p1++) > EE(final, p2++) (YES): final = (final, p1++)
• Otherwise (NO): final = (final, p2++)
• Recompute MAX and repeat
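The search above can be sketched as a greedy climb over discrete power levels. This is a minimal sketch under stated assumptions: each program's power is a level index starting at 0 (the minimum), `ee` is a caller-supplied function returning the energy efficiency for a pair of level indices, and a step beyond the highest level is treated as unavailable. The function name and the toy model in the usage example are hypothetical.

```python
def maximize_energy_efficiency(ee, num_levels):
    """Greedy search from the minimum power allocation.

    ee(p1, p2) returns the energy efficiency when program 1 runs at level
    index p1 and program 2 at level index p2 (0 is the minimum level).
    """
    final = (0, 0)  # final = min_power
    while True:
        p1, p2 = final
        ee_now = ee(p1, p2)
        # EE(final, p1++) and EE(final, p2++); steps past the top level are skipped.
        ee_p1 = ee(p1 + 1, p2) if p1 + 1 < num_levels else float("-inf")
        ee_p2 = ee(p1, p2 + 1) if p2 + 1 < num_levels else float("-inf")
        # EE(final) == MAX: neither single-step increase improves on the current point.
        if ee_now >= max(ee_p1, ee_p2):
            return final
        # Move in the direction with the larger energy efficiency.
        final = (p1 + 1, p2) if ee_p1 > ee_p2 else (p1, p2 + 1)


if __name__ == "__main__":
    # Toy concave model for illustration only; it peaks at levels (2, 1).
    def toy_ee(p1, p2):
        return -((p1 - 2) ** 2) - (p2 - 1) ** 2

    print(maximize_energy_efficiency(toy_ee, num_levels=5))
```

Because each iteration raises exactly one level index, the loop ends after at most 2 × (num_levels − 1) moves. Like any greedy search, it can stop at a local optimum when the energy-efficiency surface is not unimodal.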

Page 17

Conclusion

• We propose a solution for optimal power allocation
  - Workload-aware power allocation
  - By using program characteristics and evaluation metrics
• Significant performance improvement achieved
  - 12% for throughput
  - 18% for energy efficiency
• Run-time algorithms effectively find (near-)optimal power allocation

Page 18

Backup slides

Page 19

Simulator

• Integrated CPU + GPU simulator
  - H. Wang, V. Sathish, R. Singh, M. Schulte and N. Kim, "Workload and Power Budget Partitioning for Single-Chip Heterogeneous Processors," in PACT, 2012.
  - http://cpu-gpu-sim.ece.wisc.edu/
  - gem5 + GPGPU-Sim
• Adaptive power allocation for multiprogrammed workload
  - Per-core V/F domains for CPU
  - 2 V/F domains for GPU