ADAPTIVE CACHE-LINE SIZE MANAGEMENT ON 3D INTEGRATED MICROPROCESSORS
Takatsugu Ono, Koji Inoue and Kazuaki Murakami
Kyushu University, Japan
ISOCC 2009



Outline

- Introduction
- Software-Controllable Variable Line-Size (SC-VLS) Cache
- Evaluation
- Summary


3D Integration

- Stacking the main memory on processors
- Connecting them by wide on-chip buses
- The memory bandwidth can be improved

[Figure: a processor core with L1 data and instruction caches ($DL1, $IL1) connected to the stacked main memory by a wide on-chip bus]


Motivation

- 3D stacking makes it possible to reduce the cache miss penalty
- We can employ a larger cache line size to benefit from its prefetching effect
- But if programs do not have high spatial locality of memory references:
  - It might worsen the performance
  - A large amount of energy is required!


Software-Controllable Variable Line-Size Cache (1/3)

- We propose the SC-VLS cache
  - It attempts to optimize the amount of data to be transferred between cache memory and main memory
  - When a program does not require high memory bandwidth ⇒ the SC-VLS cache reduces the cache line size


Software-Controllable Variable Line-Size Cache (2/3)

- Features
  - The SC-VLS cache does not require any hardware monitor to decide the line size
- Advantages
  - The SC-VLS cache reduces energy consumption with trivial hardware overhead


Software-Controllable Variable Line-Size Cache (3/3)

- Adequate line size analysis
  - Before an application program is executed, we analyze an adequate line size for each function
- Code generation
  - Line size change instructions are inserted at the start of functions in the original program code
  - The instruction sets a status register to indicate the adequate line size
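
The code generation step can be pictured as a small rewriting pass over the program. The following is only a minimal sketch under stated assumptions, not the authors' tool: the `setls` mnemonic, the rule that a function starts at a label line ending in `:`, and the helper name `insert_line_size_instructions` are all hypothetical and do not appear in the slides.

```python
def insert_line_size_instructions(asm_lines, adequate_sizes, mnemonic="setls"):
    """Insert a line-size change instruction at the start of each function.

    asm_lines: assembly-like listing, one string per line. A function is
        assumed (for this sketch) to start at a label line such as "foo1:".
    adequate_sizes: dict mapping function name -> adequate line size in bytes.
    mnemonic: hypothetical instruction that writes the status register.
    """
    out = []
    for line in asm_lines:
        out.append(line)
        stripped = line.strip()
        if stripped.endswith(":"):
            name = stripped[:-1]
            # Only functions that were analyzed get an inserted instruction.
            if name in adequate_sizes:
                out.append(f"    {mnemonic} {adequate_sizes[name]}")
    return out


# Toy listing; the instructions themselves are placeholders.
program = [
    "foo1:",
    "    add r1, r2, r3",
    "    ret",
    "foo2:",
    "    ld r4, 0(r1)",
    "    ret",
]
rewritten = insert_line_size_instructions(program, {"foo1": 32, "foo2": 64})
print("\n".join(rewritten))
```

Because the instruction is placed at function entry, the line size in this sketch changes only at function granularity, which matches the per-function analysis described above.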


Adequate Line Size Analysis: Example

Each entry is (# of misses | # of accesses) for one function invocation.

Invocation             foo1()   foo2()   foo3()   foo1()   foo2()
line size = 64B        5|100    6|200    2|100    5|100    2|200
line size = 32B        2|100    2|200    1|100    2|100    14|200
adequate line size     32B      64B      32B      32B      64B
line size = adequate   2|100    6|200    1|100    2|100    2|200

Average miss rate (MR) of each function:

Function   ave. MR64B       ave. MR32B       Adequate line size
foo1()     10/200 = 5.0%    4/200 = 2.0%     32B
foo2()     8/400 = 2.0%     16/400 = 4.0%    64B
foo3()     2/100 = 2.0%     1/100 = 1.0%     32B

Overall miss rates:

- MR64B ≈ 2.9%
- MR32B = 3.0%
- MRadequate ≈ 1.9%
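
The arithmetic in this example can be checked with a few lines of Python. This is an illustrative sketch only: the miss and access counts are the ones shown on the slide, while the function names (`average_miss_rates`, `adequate_line_sizes`, `overall_miss_rate`) are made up here. As on the slide, it assumes the per-invocation miss counts measured at a fixed line size still hold when the line size is switched between functions.

```python
from collections import defaultdict

# (function, misses, accesses) per invocation, taken from the slide's example.
TRACES = {
    64: [("foo1", 5, 100), ("foo2", 6, 200), ("foo3", 2, 100),
         ("foo1", 5, 100), ("foo2", 2, 200)],
    32: [("foo1", 2, 100), ("foo2", 2, 200), ("foo3", 1, 100),
         ("foo1", 2, 100), ("foo2", 14, 200)],
}


def average_miss_rates(invocations):
    """Return {function: total misses / total accesses} over all invocations."""
    misses = defaultdict(int)
    accesses = defaultdict(int)
    for func, m, a in invocations:
        misses[func] += m
        accesses[func] += a
    return {f: misses[f] / accesses[f] for f in misses}


def adequate_line_sizes(traces):
    """Pick, per function, the line size with the smallest average miss rate."""
    rates = {size: average_miss_rates(inv) for size, inv in traces.items()}
    funcs = next(iter(rates.values())).keys()
    return {f: min(rates, key=lambda size: rates[size][f]) for f in funcs}


def overall_miss_rate(invocations):
    """Total misses divided by total accesses across the whole run."""
    return sum(m for _, m, _ in invocations) / sum(a for _, _, a in invocations)


adequate = adequate_line_sizes(TRACES)
# For each invocation, take the counts observed at that function's adequate size.
mixed = [TRACES[adequate[f]][i] for i, (f, _, _) in enumerate(TRACES[64])]

print(adequate)
for label, inv in (("64B", TRACES[64]), ("32B", TRACES[32]), ("adequate", mixed)):
    print(label, f"{overall_miss_rate(inv):.1%}")
```

The per-function choice gives 13 misses over 700 accesses, against 20 and 21 misses for the fixed 64B and 32B configurations.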


Evaluation

- Simulator
  - SimpleScalar and CACTI
- Benchmark programs
  - 10 programs (MiBench)
- Input data sets
  - Analysis phase: small
  - Execution phase: large
- The SC-VLS cache can dynamically choose among four line sizes: 32B, 64B, 128B and 256B


Energy

[Figure: bar chart of normalized energy (vertical axis, 0 to 3) for each benchmark program (bitcount, mad, tiff2bw, dijkstra, rijndael_enc, rijndael_dec, sha, adpcm_enc, adpcm_dec, lame) under FIX32B, FIX64B, FIX128B, FIX256B and SC-VLS. Bars that exceed the axis range are labeled 11.4, 3.7, 3.7, 9.0, 4.5, 11.4, 11.3, 7.1, 5.2 and 19.3.]


Performance

[Figure: bar chart of normalized execution time (vertical axis, 0.94 to 1.08) for each benchmark program (bitcount, mad, tiff2bw, dijkstra, rijndael_enc, rijndael_dec, sha, adpcm_enc, adpcm_dec, lame) under FIX32B, FIX64B, FIX128B, FIX256B and SC-VLS.]


Summary

- 3D integration
  - can improve memory bandwidth
  - makes it possible to reduce the cache miss penalty
- SC-VLS cache
  - can dynamically change the line size
  - reduces energy consumption by up to 75%


THANK YOU

ACKNOWLEDGEMENT

This research was supported in part by the New Energy and Industrial Technology Development Organization.


Architecture

[Figure: SC-VLS cache architecture. Recoverable labels: Processor; Address (Tag, Index, Offset); Status Reg. ("Set an adequate line size"); MUX; SRAM cell array (Tag, Valid bit, minimum line size); tag comparators (=); Hit / Miss; Data; DRAM cell array; TSV; eight 32B blocks.]


Adequate Line Size Analysis

We run a cache simulation with each line size independently to determine an adequate line size:

1. The average cache miss rate of each function is calculated

2. The average cache miss rates are compared across all line size candidates

3. The line size with the smallest cache miss rate is chosen as the adequate line size
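
The three steps above can be sketched with a toy cache model. This is a rough illustration under stated assumptions, not the SimpleScalar setup used in the evaluation: the direct-mapped organization, the 1 KiB capacity, the synthetic address trace, and the helper names `miss_rate` and `adequate_line_size` are all assumptions made here. In the actual analysis the same comparison would be made per function rather than over a whole trace.

```python
def miss_rate(addresses, line_size, cache_bytes=1024):
    """Simulate a direct-mapped cache and return misses / accesses.

    The direct-mapped organization and the 1 KiB capacity are assumptions
    made for this sketch; the slides do not state the cache configuration.
    """
    num_lines = cache_bytes // line_size
    tags = [None] * num_lines          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block that holds this address
        index = block % num_lines
        tag = block // num_lines
        if tags[index] != tag:         # cold or conflict miss: fill the line
            tags[index] = tag
            misses += 1
    return misses / len(addresses)


def adequate_line_size(addresses, candidates=(32, 64, 128, 256)):
    """Run one independent simulation per candidate and keep the best one."""
    rates = {size: miss_rate(addresses, size) for size in candidates}
    # On a tie, min() keeps the first (smallest) candidate.
    return min(rates, key=rates.get), rates


# Synthetic trace: a sequential sweep with a 4-byte stride (high spatial locality).
sequential = list(range(0, 4096, 4))
best, rates = adequate_line_size(sequential)
print(best, rates)
```

For a purely sequential sweep every new line costs exactly one miss, so the largest candidate wins; a trace with little spatial locality would not show that benefit.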


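Energy Model

$$E_{mem} = AC_{L1} \times E_{L1access} + \sum_{i=1}^{AC_{mm}} E_{DRAMSAaccess} \times N_{i}$$

- $AC_{L1} \times E_{L1access}$: total energy of $L1
  - $AC_{L1}$: # of L1 memory accesses
  - $E_{L1access}$: average energy for a cache access
- $\sum_{i=1}^{AC_{mm}} E_{DRAMSAaccess} \times N_{i}$: total energy of the stacked DRAM
  - $AC_{mm}$: # of main memory accesses
  - $E_{DRAMSAaccess}$: average energy for a DRAM sub-array access
  - $N_{i}$: # of activated DRAM sub-arrays for the i-th main memory access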
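
As a rough sketch, the model can be evaluated in a few lines of Python, assuming it is read as E_mem = AC_L1 × E_L1access + Σ_{i=1..AC_mm} (E_DRAMSAaccess × N_i). The function and parameter names are hypothetical, and the numbers in the usage example are arbitrary placeholders, not measured values from the evaluation.

```python
def total_energy(ac_l1, e_l1_access, e_dram_sa_access, activated_subarrays):
    """Evaluate E_mem = AC_L1 * E_L1access + sum_i (E_DRAMSAaccess * N_i).

    ac_l1: number of L1 cache accesses.
    e_l1_access: average energy for one cache access.
    e_dram_sa_access: average energy for an access to one DRAM sub-array.
    activated_subarrays: one entry per main memory access (its length is
        AC_mm), giving the number of DRAM sub-arrays activated by that access.
    """
    e_l1 = ac_l1 * e_l1_access
    e_dram = sum(e_dram_sa_access * n for n in activated_subarrays)
    return e_l1 + e_dram


# Placeholder numbers only: 1000 cache accesses and three main memory accesses
# that activate 2, 2 and 8 sub-arrays respectively.
example = total_energy(1000, 1.0, 10.0, [2, 2, 8])
print(example)
```

In this reading, a smaller line size lowers the second term by activating fewer sub-arrays per main memory access, while a higher miss count raises it by adding terms to the sum.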


Average SC-VLS Cache Line Size

Benchmark       Average SC-VLS cache line size (B)
bitcount        81.94
mad             233.60
tiff2bw         255.99
dijkstra        223.04
rijndael_enc    64.82
rijndael_dec    33.01
sha             141.90
adpcm_enc       233.40
adpcm_dec       255.67
lame            254.78