synergy.cs.vt.edu Power and Performance Characterization of Computational Kernels on the GPU Yang Jiao, Heshan Lin, Pavan Balaji (ANL), Wu-chun Feng

Page 1: Power and Performance Characterization of Computational Kernels on the GPU

synergy.cs.vt.edu

Power and Performance Characterization of Computational Kernels on the GPU

Yang Jiao, Heshan Lin, Pavan Balaji (ANL), Wu-chun Feng

Page 2: Power and Performance Characterization of Computational Kernels on the GPU

Graphics Processing Units (GPUs) are Powerful

* Data and image source, http://people.sc.fsu.edu/~jburkardt/latex/ajou_2009_parallel/ajou_2009_parallel.html

Page 3: Power and Performance Characterization of Computational Kernels on the GPU

GPU is Increasingly Popular in HPC

  • Three out of the top five supercomputers are GPU-based

Page 4: Power and Performance Characterization of Computational Kernels on the GPU

GPUs are Power Hungry

[Figure: bar chart of thermal design power (Watts, axis 0 to 350) for Xeon, GTX280, and Fermi]

It is imperative to investigate Green GPU computing

Page 5: Power and Performance Characterization of Computational Kernels on the GPU

Green Computing with DVFS on CPUs

  • Mechanism
      • $\text{Power} \propto \text{Voltage}^2 \times \text{Frequency}$
  • Minimizing performance impact
      • Lower voltage and frequency when the CPU is not in the critical path
  • What about GPUs?
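As a rough illustration of the proportionality on this slide, the sketch below computes dynamic power relative to a baseline when voltage and frequency are scaled. The function names and the sample scaling factors are assumptions chosen for illustration; they are not taken from the slides.

```python
def relative_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to baseline, assuming Power ∝ Voltage^2 × Frequency."""
    return voltage_scale ** 2 * frequency_scale


def relative_energy(voltage_scale: float, frequency_scale: float,
                    runtime_scale: float) -> float:
    """Energy relative to baseline: relative power times relative runtime."""
    return relative_power(voltage_scale, frequency_scale) * runtime_scale


# Hypothetical setting: 10% lower voltage and 20% lower frequency.
power = relative_power(0.9, 0.8)

# If the CPU is off the critical path, runtime is unchanged (scale 1.0),
# so the energy saving equals the power saving.
energy = relative_energy(0.9, 0.8, 1.0)

print(f"relative power: {power:.3f}, relative energy: {energy:.3f}")
```

Under this model the benefit depends on the runtime staying flat: if lowering the frequency stretched the runtime in proportion (runtime_scale = 1 / frequency_scale), the frequency term would cancel and only the voltage reduction would save energy.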

Page 6: Power and Performance Characterization of Computational Kernels on the GPU

What is this Paper about?

  • Characterize performance and power for various kernels on GPUs
      • Kernels with different compute and memory intensiveness
      • Various core and memory frequencies
  • Contributions
      • Reveal unique frequency scaling behaviors on GPUs
      • Provide useful hints for green GPU computing

Page 7: Power and Performance Characterization of Computational Kernels on the GPU

Outline

  • Introduction
  • GPU Overview
  • Characterization Methodology
  • Experimental Results
  • Conclusion & Future Work

Page 8: Power and Performance Characterization of Computational Kernels on the GPU

NVIDIA GTX280 Architecture

  • On-chip memory
      • Small sizes
      • Fast access
  • Off-chip memory
      • Large size
      • High access latency

[Figure: GTX280 architecture diagram; labels include Device (Global) Memory]

Page 9: Power and Performance Characterization of Computational Kernels on the GPU

OpenCL

  • Write once, run on any GPU
  • Allows programmers to fully exploit the power of GPUs
  • Compute kernel: function executed on a GPU

[Figure: OpenCL Device Abstraction]

Page 10: Power and Performance Characterization of Computational Kernels on the GPU

GPU Frequency Scaling

  • Two dimensional
      • Compute core frequency and memory frequency
  • Semi-automatic
      • Dynamic configuration not supported
      • User can only control peak frequencies
      • Automatically switches to idle mode when there is no computation
  • Details not available to the public
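Because the frequency space is two dimensional, a characterization run has to visit every pair of peak frequencies. A minimal sketch of that enumeration follows. It assumes the sweep values match the ranges that appear on the result plots (core 400 to 700 MHz, memory 600 to 1200 MHz); the step sizes and all names are assumptions, and actually applying a setting requires a vendor tool that is not shown here.

```python
from itertools import product

# Assumed sweep values in MHz, mirroring the ranges on the result plots.
CORE_MHZ = list(range(400, 701, 50))      # 400, 450, ..., 700
MEMORY_MHZ = list(range(600, 1201, 100))  # 600, 700, ..., 1200


def frequency_grid(core=CORE_MHZ, memory=MEMORY_MHZ):
    """Return every <core, memory> peak-frequency pair to characterize."""
    return list(product(core, memory))


grid = frequency_grid()
print(len(grid), "settings, from", grid[0], "to", grid[-1])
```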

Page 11: Power and Performance Characterization of Computational Kernels on the GPU

Outline

  • Introduction
  • GPU Overview
  • Characterization Methodology
  • Experimental Results
  • Conclusion & Future Work

Page 12: Power and Performance Characterization of Computational Kernels on the GPU

Kernel Selection

  • High performance of GPUs
      • Massive parallelism (e.g., 240 cores)
      • High memory bandwidth (e.g., 140 GB/s)
  • Three kernels of computational diversity
      • Matrix Multiplication
      • Matrix Transpose
      • Fast Fourier Transform (FFT)

[Figure: the three kernels placed on a spectrum from compute intensive to memory intensive]

Page 13: Power and Performance Characterization of Computational Kernels on the GPU

Kernel Characteristics

  • Memory to compute ratio
      $R_{mem} = \frac{\#\text{Global\_Memory\_Transactions}}{\#\text{Computation\_Instructions}}$
  • Instruction throughput
      $R_{ins} = \frac{\#\text{Computation\_Instructions}}{\text{GPU\_Time}}$
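The two ratios on this slide can be computed directly from profiler counters. The sketch below shows one way to do so; the counter values are hypothetical placeholders, not measurements from the talk, and GPU_Time is taken in whatever unit the profiler reports.

```python
import math


def memory_to_compute_ratio(global_memory_transactions: int,
                            computation_instructions: int) -> float:
    """R_mem: global memory transactions per computation instruction."""
    return global_memory_transactions / computation_instructions


def instruction_throughput(computation_instructions: int,
                           gpu_time: float) -> float:
    """R_ins: computation instructions per unit of GPU time."""
    return computation_instructions / gpu_time


# Hypothetical profiler counters for a single kernel launch.
transactions = 56_000
instructions = 1_000_000
gpu_time = 0.005

r_mem = memory_to_compute_ratio(transactions, instructions)
r_ins = instruction_throughput(instructions, gpu_time)
print(f"R_mem = {r_mem:.1%}, R_ins = {r_ins:.0f}")
```

A high R_mem suggests a kernel that is likely to be sensitive to memory frequency, while a low R_mem with a high R_ins suggests one that is likely to be sensitive to core frequency.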

Page 14: Power and Performance Characterization of Computational Kernels on the GPU

Kernel Profile

        Matrix Multiplication   Matrix Transpose   FFT
Rmem    5.6%                    53.7%              8.3%
Rins    203215711               12095895           145165788

Page 15: Power and Performance Characterization of Computational Kernels on the GPU

Measurement

  • Performance
      • Matrix multiplication, FFT: GFLOPS
      • Matrix transpose: MB/s
  • Energy
      • Whole system when executing the kernel on the GPU
  • Power
      • Reported using the average power
  • Energy Efficiency
      • Performance / power
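The metrics on this slide can be derived from a series of power-meter readings. The following is a minimal sketch, assuming the meter reports whole-system power at a fixed interval; the sample values, the one-second interval, and the function names are illustrative assumptions rather than data from the talk.

```python
def average_power(samples_watts):
    """Mean of the power readings taken while the kernel runs."""
    return sum(samples_watts) / len(samples_watts)


def energy_joules(samples_watts, sample_interval_s):
    """Energy as the sum of power readings times the sampling interval."""
    return sum(samples_watts) * sample_interval_s


def energy_efficiency(performance, avg_power_watts):
    """Performance per Watt, e.g. GFLOPS/Watt or (MB/s)/Watt."""
    return performance / avg_power_watts


# Hypothetical whole-system readings, one per second.
samples = [250.0, 260.0, 270.0]
avg = average_power(samples)
energy = energy_joules(samples, sample_interval_s=1.0)

# Hypothetical kernel performance of 130 GFLOPS.
efficiency = energy_efficiency(130.0, avg)
print(avg, energy, efficiency)
```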

Page 16: Power and Performance Characterization of Computational Kernels on the GPU

Outline

  • Introduction
  • GPU Overview
  • Characterization Methodology
  • Experimental Results
  • Conclusion & Future Work

Page 17: Power and Performance Characterization of Computational Kernels on the GPU

Experimental Setup

  • System
      • Intel Core 2 Quad Q6600
      • NVIDIA GTX280, 1GB memory
  • Power Meter
      • Watts Up? Pro ES

Page 18: Power and Performance Characterization of Computational Kernels on the GPU

Matrix Multiplication - Performance

  • Mostly affected by core frequency, almost not affected by memory frequency

[Figure: performance (GFLOPS, axis 85 to 155) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 19: Power and Performance Characterization of Computational Kernels on the GPU

Matrix Multiplication - Power

  • Mostly affected by core frequency, slightly affected by memory frequency

[Figure: power (Watts, axis 245 to 315) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 20: Power and Performance Characterization of Computational Kernels on the GPU

Matrix Multiplication - Efficiency

  • Best efficiency achieved at highest core frequency and relatively high memory frequency

[Figure: power efficiency (MFLOPS/Watt, axis 340 to 500) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 21: Power and Performance Characterization of Computational Kernels on the GPU

Matrix Transpose - Performance

  • Performance dominated by memory frequency

[Figure: performance (MB/s, axis 150 to 270) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 22: Power and Performance Characterization of Computational Kernels on the GPU

Matrix Transpose - Power

  • Higher core frequency increases power consumption (not performance)

[Figure: power (Watts, axis 195 to 240) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 23: Power and Performance Characterization of Computational Kernels on the GPU

Matrix Transpose - Efficiency

  • Best efficiency achieved at highest memory frequency and lowest core frequency

[Figure: power efficiency (KBPS/Watt, axis 650 to 1250) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 24: Power and Performance Characterization of Computational Kernels on the GPU

FFT - Performance

  • Affected by both core and memory frequencies

[Figure: performance (GFLOPS, axis 40 to 85) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 25: Power and Performance Characterization of Computational Kernels on the GPU

FFT - Power

  • Affected by both core and memory frequencies

[Figure: power (Watts, axis 225 to 285) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 26: Power and Performance Characterization of Computational Kernels on the GPU

FFT - Efficiency

  • Best efficiency at highest core and memory frequencies

[Figure: power efficiency (axis 185 to 305, labeled GFLOPS/W on the slide, although values of this size correspond to MFLOPS/Watt) versus GPU core frequency (400 to 700 MHz); legend values 600 to 1200, presumably memory frequency in MHz]

Page 27: Power and Performance Characterization of Computational Kernels on the GPU

FFT – Two Dimensional Effect

[Figure: power (Watts) and efficiency (MFLOPS/Watt) on a shared axis from 225 to 270 for three frequency settings, <550, 1200>, <600, 1000>, and <700, 800>, presumably <core, memory> in MHz; the chart carries a "7%" annotation]

Page 28: Power and Performance Characterization of Computational Kernels on the GPU

Power and Efficiency Range

[Figure: bar chart of the power range and the efficiency range (axis 0% to 45%) for Matrix Multiplication, Matrix Transpose, and FFT]

Page 29: Power and Performance Characterization of Computational Kernels on the GPU

Conclusion & Future Work

  • To take away
      • Green computing on GPUs is important
      • GPU frequency scaling is considerably different from that on CPUs
  • Next
      • Finer-grained level of characterization (e.g., different types of operations)
      • Experiments on Fermi and AMD GPUs

Page 30: Power and Performance Characterization of Computational Kernels on the GPU

Acknowledgment

  • NSF Center for High Performance Reconfigurable Computing (CHREC) for their support through NSF I/UCRC Grant IIP-0804155
  • National Science Foundation for their support partially through CNS-0915861 and CNS-0916719

Page 31: Power and Performance Characterization of Computational Kernels on the GPU

Questions?