Drowsy Caches: Simple Techniques for Reducing Leakage Power

Authors: Krisztián Flautner (ARM Ltd); Nam Sung Kim, Steve Martin, David Blaauw & Trevor Mudge (Advanced Computer Architecture Lab, The University of Michigan)

Annual International Symposium on Computer Architecture 2002, Vol. 29, pages 148-157

In-class presentation on 11/24/2008 by: Harshit Khanna (1200127817)

Arizona State University CSE 520 Advanced Computer Architecture




Outline
• Summary
• Motivation
• Circuit Techniques
  – Traditional Circuit Techniques
    • Gated-VDD
    • ABB-MTCMOS
    • Dynamic VDD Scaling (DVS)
  – Comparison of various low-leakage circuit techniques
  – Proposed circuit technique
• Policies
  – Implementation of drowsy cache line
  – Additions to the traditional cache line
  – Basic working description
  – Working set characteristics
  – Observations
  – Results
• Policy evaluation
  – Policy evaluation
  – Test Setup
  – Energy consumption
• Future work


Summary
• The simplest policy, in which cache lines are periodically put into a low-power mode without regard to their access histories, can reduce the cache’s static power consumption by more than 80%.
• Total energy consumed in the cache can be reduced by an average of 54%.
• The fraction of leakage energy is reduced from an average of 76% in projected conventional caches to an average of 50% in the drowsy cache.
• Performance degradation: 9% for crafty and less than 4% for equake.


Motivation
• Increasing speed and density bring increasing leakage (static) power consumption.
• Leakage power accounts for 15%-20% of the total power on chips.
• As processor technology moves below 0.1 micron, static power consumption is set to increase exponentially, putting it on the path to dominating the total power used by the CPU.
• The on-chip caches are one of the main candidates for leakage reduction since they contain a significant fraction of the processor’s transistors.


Circuit Techniques


Traditional Circuit Techniques
• Gated-VDD
  – Working:
    • Reduces the leakage power by using a high-threshold (high-Vt) transistor to turn off the power to the memory cell when the cell is set to low-power mode.
  – Advantages:
    • Leakage significantly reduced.
  – Disadvantages:
    • Loses any information stored in the cell when switched into low-leakage mode.
    • Performance penalty.
    • Requires special high-Vt devices for the control logic.


Traditional Circuit Techniques (contd.)
• ABB-MTCMOS
  – Working:
    • Threshold voltages of the transistors in the cell are dynamically increased when the cell is set to drowsy mode by raising the source-to-body voltage of the transistors in the circuit.
  – Advantages:
    • Leakage significantly reduced.
  – Disadvantages:
    • Supply voltage of the circuit is increased, thereby offsetting some of the gain in total leakage power.
    • Requires special high-Vt devices for the control logic.


Dynamic VDD Scaling (DVS)
• Disadvantages
  – Process variation dependent.
  – More noise susceptible.
• Advantages
  – Retains cell information in low-power mode.
  – Fast switching between power modes.
  – Easy implementation.
  – More power reduction than ABB-MTCMOS.


Comparison of various low-leakage circuit techniques


Proposed circuit technique

• Choose between two different supply voltages in each cache line.

• DVS has been used in the past to trade off dynamic power consumption against performance.

• Here, voltage scaling is exploited to reduce static power consumption instead.

• Due to short-channel effects in deep-submicron processes, leakage current reduces significantly with voltage scaling.


Policies


Implementation of the drowsy cache line
• L1 drowsy data caches.
• All lines in an L2 cache can be kept in drowsy mode without significant impact on performance.


Additions to the cache line
• Wordline gating circuit
  – Prevents accesses when in drowsy mode, since unchecked accesses to a drowsy line could destroy the memory’s contents.
• Voltage controller
  – Determines the operating voltage of an array of memory cells in the cache line.
  – Switches the array voltage between the high (active) and low (drowsy) supply voltages depending on the state of the drowsy bit.
• Drowsy bit
  – Controls the voltage to the memory cells.


Basic working description
• If a drowsy cache line is accessed, the drowsy bit is cleared, and consequently the supply voltage is switched to high VDD.
• The wordline gating circuit prevents accesses to a line in drowsy mode: because the supply voltage of a drowsy cache line is lower than the bit line precharge voltage, unchecked accesses to a drowsy line could destroy the memory’s contents.
• Whenever a cache line is accessed, the cache controller checks the voltage state of the line by reading the drowsy bit.
  – If the accessed line is in normal mode: read the contents of the cache line. No performance is lost, because the power mode of the line can be checked by reading the drowsy bit concurrently with the read and comparison of the tag.
  – If the accessed line is in drowsy mode: prevent the discharge of the bit lines of the memory array, because it may read out incorrect data. The line is woken up automatically during the next cycle, and the data can be accessed during consecutive cycles.
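The access flow above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the names `DrowsyLine` and `access`, and the `"high"`/`"low"` supply labels, are hypothetical and not taken from the paper, and the wake-up cost is modeled as a flat one-cycle penalty.

```python
from dataclasses import dataclass

# Labels only; the slides give no actual voltage values.
HIGH_VDD = "high"  # active supply
LOW_VDD = "low"    # drowsy supply


@dataclass
class DrowsyLine:
    """Hypothetical model of one cache line with its drowsy bit."""
    tag: int
    data: bytes
    drowsy: bool = False  # the drowsy bit

    @property
    def supply(self) -> str:
        # Voltage controller: the array supply follows the drowsy bit.
        return LOW_VDD if self.drowsy else HIGH_VDD


def access(line: DrowsyLine, wake_latency: int = 1):
    """Return (data, extra_cycles) for a hit on `line`."""
    if not line.drowsy:
        # The drowsy bit is read concurrently with the tag compare,
        # so an awake line is read with no extra latency.
        return line.data, 0
    # Wordline gating blocks the read while the line is at low VDD.
    # Clearing the drowsy bit switches the supply back to high VDD;
    # the data is then readable after the wake-up latency.
    line.drowsy = False
    return line.data, wake_latency
```

A first access to a drowsy line pays the wake-up penalty, and later accesses to the same line pay nothing until some policy sets the drowsy bit again.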


Working set characteristics

• ExecFactor - expected worst-case execution time increase for the baseline algorithm
• accs - the number of accesses
• wakelatency - wakeup latency = 1 cycle
• accsperline - number of accesses per line
• memimpact - how much impact a single memory access has on overall performance
• Assumption: increase in cache access latency = increase in execution time, so memimpact is set to 1
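The equation relating these quantities is not reproduced here. As a rough illustration only, and not the authors' formula, the sketch below estimates a slowdown factor under one explicit assumption: every distinct line touched in an update window is drowsy and must be woken exactly once. The function name and the `window_size` parameter are hypothetical.

```python
def exec_factor_estimate(accs, accsperline, window_size,
                         wakelatency=1, memimpact=1.0):
    """Illustrative slowdown estimate (1.0 means no slowdown).

    Assumption (not from the paper): each distinct line accessed in a
    window costs one wake-up of `wakelatency` cycles, scaled by
    `memimpact`.
    """
    if accs == 0:
        return 1.0
    lines_woken = accs / accsperline            # distinct lines touched
    extra_cycles = lines_woken * wakelatency * memimpact
    return 1.0 + extra_cycles / window_size


# Made-up inputs: 1000 accesses, 5 accesses per line, 2000-cycle window.
print(exec_factor_estimate(accs=1000, accsperline=5, window_size=2000))
```

With these made-up inputs, 200 lines are woken in a 2000-cycle window, so the estimate is a 10% increase. More accesses per line means fewer wake-ups for the same number of accesses.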


Observations


Should tags be put into drowsy mode along with the data?

• In both cases (awake or drowsy tags), no extra latency is incurred when an awake line is accessed.

• In direct-mapped caches there is no performance advantage to keeping the tags awake: there is only one possible line for each index, so if that line is drowsy, it must be woken up immediately to be accessed.

• A drowsy access takes at least three cycles to complete.


Results
• The fraction of unique cache lines accessed during an update window is relatively small.
• On most benchmarks more than 90% of the lines can be in drowsy mode at any one time.
• Performance degradation: 9% for crafty and less than 4% for equake.
• Advantages:
  – Significantly reduces the static power consumption of the cache.
  – Prediction techniques to control the drowsy cache are not necessary if the drowsy cache can transition between drowsy and awake modes relatively quickly.


Policy evaluation


Policy evaluation
• The following parameters can be varied:
  – Update window size: specifies in cycles how frequently decisions are made about which lines to put into drowsy mode.
  – Simple or noaccess policy: the policy that uses no per-line access history is referred to as the simple policy. In this case, all lines in the cache are put into drowsy mode periodically (the period is the window size). Under the noaccess policy, only lines that have not been accessed in a window are put into drowsy mode.
  – Awake or drowsy tag: specifies whether tags in the cache may be drowsy or not.
  – Transition time: the number of cycles for waking up or putting to sleep cache lines. The authors consider only 1- or 2-cycle transition times, since the circuit simulations indicate that these are reasonable assumptions.
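The difference between the simple and noaccess policies can be sketched with a toy trace-driven model. This is a minimal illustration, not the authors' simulator: `run_policy`, its parameters, and the trace format are hypothetical, tags are ignored, and every wake-up is charged a flat `wake_latency`.

```python
def run_policy(trace, num_lines, window, policy="simple", wake_latency=1):
    """Count wake-ups for a trace of (cycle, line_index) accesses.

    `trace` must be sorted by cycle. Returns (wakeups, extra_cycles).
    """
    drowsy = [False] * num_lines     # drowsy bit per line
    accessed = [False] * num_lines   # per-line history (noaccess only)
    next_update = window
    wakeups = 0
    for cycle, idx in trace:
        # Apply every update-window boundary that passed before this access.
        while cycle >= next_update:
            for i in range(num_lines):
                # simple: all lines go drowsy;
                # noaccess: only lines untouched during the window.
                if policy == "simple" or not accessed[i]:
                    drowsy[i] = True
                accessed[i] = False
            next_update += window
        if drowsy[idx]:
            drowsy[idx] = False      # wake the line on access
            wakeups += 1
        accessed[idx] = True
    return wakeups, wakeups * wake_latency


# Made-up trace: line 0 is hot, line 1 is touched once.
trace = [(0, 0), (10, 0), (4000, 0), (4010, 1), (8000, 0)]
print(run_policy(trace, num_lines=2, window=4000, policy="simple"))
print(run_policy(trace, num_lines=2, window=4000, policy="noaccess"))
```

On this made-up trace the simple policy wakes lines more often, because the hot line is put to sleep at every window boundary, while noaccess leaves it awake at the cost of tracking one access bit per line.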


Test setup

• The authors use various benchmarks from the SPEC2000 suite on SimpleScalar using the Alpha instruction set.
• All simulations were run for 1 billion instructions.
• The simulator configuration parameters are summarized below:
  – OO4: 4-wide superscalar pipeline; 32K direct-mapped L1 icache, 32-byte line size, 1-cycle hit latency; 32K 4-way set-associative L1 dcache, 32-byte line size, 1-cycle hit latency; 8-cycle L2 cache latency.
  – IO2: 2-wide in-order pipeline, cache parameters same as for OO4.


Energy consumption
• The authors find that the simple policy with a window size of 4000 cycles reaches a reasonable compromise between simplicity of implementation, power savings, and performance.
• The impact of this policy on leakage energy is characterized by:
  – Normalized total energy: the ratio of the total energy used in the drowsy cache to the total energy consumed in a regular cache.
  – Normalized leakage energy: the ratio of leakage energy in the drowsy cache to leakage energy in a normal cache.
  – DVS columns: energy savings resulting from the scaled-VDD (DVS) circuit technique.
  – Theoretical minimum column: assumes that leakage in low-power mode can be reduced to zero (without losing state), i.e. it estimates the energy savings given the best possible hypothetical circuit technique.
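The two normalized metrics defined above can be computed with a short helper. This is a sketch under simplifying assumptions, not the paper's energy model: the function name, its parameters, and the idea of summarizing drowsy behavior by a single `drowsy_fraction` and `leak_ratio` are all hypothetical, and the inputs in the usage lines are made up.

```python
def normalized_energies(drowsy_fraction, leak_ratio,
                        dynamic_energy, leakage_energy,
                        overhead_energy=0.0):
    """Return (normalized_total, normalized_leakage).

    drowsy_fraction: share of line-cycles spent in drowsy mode (0..1).
    leak_ratio:      drowsy-line leakage relative to an awake line;
                     0.0 corresponds to the theoretical-minimum case.
    dynamic_energy, leakage_energy: regular-cache energies, same units.
    overhead_energy: assumed extra energy for mode transitions.
    """
    # Leakage of the drowsy cache: awake share at full leakage plus
    # drowsy share at reduced leakage.
    drowsy_leakage = leakage_energy * (
        (1.0 - drowsy_fraction) + drowsy_fraction * leak_ratio
    )
    norm_leak = drowsy_leakage / leakage_energy
    norm_total = (dynamic_energy + overhead_energy + drowsy_leakage) / (
        dynamic_energy + leakage_energy
    )
    return norm_total, norm_leak


# Made-up inputs: 90% of line-cycles drowsy, ideal circuit (leak_ratio = 0).
total, leak = normalized_energies(0.9, 0.0, dynamic_energy=25.0,
                                  leakage_energy=75.0)
print(round(total, 3), round(leak, 3))
```

Setting `leak_ratio` to 0.0 mirrors the theoretical-minimum column, while a nonzero value stands in for a real circuit technique such as DVS.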


• The drowsy cache implementation reduces the total energy consumed in the data cache by more than 50% without significantly impacting performance.
• Total leakage energy is reduced by:
  – an average of 71% when tags are always awake.
  – an average of 76% using the drowsy tag scheme.


Future work
• The proposed scheme is not a solution to all caches in the processor.
• L1 instruction cache does not do as well with the proposed algorithm.
• Investigate the use of instruction prefetch algorithms combined with the drowsy circuit technique.
• Extension of these techniques to other memory structures, such as branch predictors.
• Impact of having adaptive window size.


Thank you

Questions?
