
Synonymous Address Compaction for Energy Reduction in Data TLB


Page 1: Synonymous Address Compaction for Energy Reduction in Data TLB

Synonymous Address Compaction for Energy Reduction in Data TLB

Chinnakrishnan Ballapuram
Hsien-Hsin S. Lee
Milos Prvulovic

School of Electrical and Computer Engineering
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332

Page 2: Synonymous Address Compaction for Energy Reduction in Data TLB

Ballapuram et al., Georgia Tech

Background

Address translation
  Major processor power contributors
  I-TLB and D-TLB lookup for every instruction and memory reference
  TLBs are highly associative
  Multi-porting increases power consumption

Page 3: Synonymous Address Compaction for Energy Reduction in Data TLB

Outline

Motivation
  Unique access behavior and locality are analyzed for energy reduction opportunities
Synonymous Address Compaction
  Intra-Cycle Compaction
  Inter-Cycle Compaction
  Implementation Details
Performance/Energy Evaluation
Conclusions

Page 4: Synonymous Address Compaction for Energy Reduction in Data TLB

Breakdown of d-TLB accesses

For 58% of accesses, more than one d-TLB lookup occurs in the same cycle (4-wide machine).
These lookups often access the same page (intra-cycle synonymous accesses).

[Figure: percentage of data TLB accesses (y-axis, 0% to 100%), broken down into 1, 2, 3, and 4 d-TLB accesses per clock.]

Page 5: Synonymous Address Compaction for Energy Reduction in Data TLB

Breakdown of Synonymous Intra-cycle Accesses in d-TLB

~30% of accesses have synonyms, indicating redundancy.
With intra-cycle compaction, 1/2 of syn(1) accesses, 2/3 of syn(2) accesses, and 3/4 of syn(3) accesses can be eliminated.

[Figure: percentage of data TLB accesses (y-axis, 0% to 100%) for MiBench and SPEC2000, broken down into: syn(0) in 1, 2, 3, and 4 mem ref / clk; syn(1) in 2, 3, and 4 mem ref / clk; syn(2) in 3 and 4 mem ref / clk; syn(3) in 4 mem ref / clk.]
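
The 1/2, 2/3, and 3/4 figures follow a simple pattern. The sketch below assumes that syn(k) denotes an access that has k synonyms in the same cycle, so k + 1 accesses share one page and a single lookup can serve all of them. That reading, and the function name `eliminated_fraction`, are assumptions made for illustration and are not taken from the slides.

```python
from fractions import Fraction


def eliminated_fraction(k):
    """Fraction of lookups removed when k + 1 same-cycle accesses share a page.

    Assumption: syn(k) means one access plus k synonyms, so one lookup
    remains and k of the k + 1 lookups are redundant.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return Fraction(k, k + 1)


for k in range(4):
    print(f"syn({k}): {eliminated_fraction(k)} of lookups eliminated")
```

Under this assumption syn(0) accesses gain nothing, which is consistent with only the syn(1) to syn(3) categories being listed as candidates for elimination.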

Page 6: Synonymous Address Compaction for Energy Reduction in Data TLB

Inter-cycle Reuse of d-TLB Translations

Inter-cycle synonymous accesses: 68% of accesses could reuse the last address translation.
More reuse can be achieved by partitioning the d-TLB into stack (99%), global (82%), and heap (75%) TLBs.

[Figure: percentage of data TLB accesses (y-axis, 40% to 100%) for Baseline, stack-TLB, global-TLB, and heap-TLB.]
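
A toy model can show why partitioning raises last-translation reuse. The sketch below is not the authors' simulator: the trace is invented, the region labels are supplied by hand, and both function names are hypothetical. It compares one shared "last translation" against one per region.

```python
def reuse_rate(vpns):
    """Fraction of accesses whose VPN equals the immediately preceding VPN."""
    if not vpns:
        return 0.0
    hits = sum(1 for prev, cur in zip(vpns, vpns[1:]) if prev == cur)
    return hits / len(vpns)


def partitioned_reuse_rate(trace):
    """Same measure, but each region remembers its own last VPN.

    trace is a list of (region, vpn) pairs; the region labels are an
    assumption of this sketch and would come from address routing in hardware.
    """
    if not trace:
        return 0.0
    last = {}
    hits = 0
    for region, page in trace:
        if last.get(region) == page:
            hits += 1
        last[region] = page
    return hits / len(trace)


# Invented trace: stack, heap, and global accesses interleave.
trace = [
    ("stack", 0x7FFFF), ("heap", 0x08100), ("stack", 0x7FFFF),
    ("global", 0x08050), ("stack", 0x7FFFF), ("heap", 0x08100),
]
print(reuse_rate([page for _, page in trace]))  # shared last translation
print(partitioned_reuse_rate(trace))            # one last translation per region
```

On this trace the shared latch never hits because consecutive accesses alternate between regions, while the per-region version reuses half of the translations. The percentages on the slide come from the authors' measurements, not from a model like this one.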

Page 7: Synonymous Address Compaction for Energy Reduction in Data TLB

Dynamic Data Memory Distribution

~40% of dynamic memory accesses go to the stack, which is concentrated on only a few pages.
4 memory accesses ~= 2 stack, 1 global, and 1 heap.

[Figure: distribution of dynamic data memory accesses (y-axis, 0% to 100%) across stack, global, heap, text, and env.]

Page 8: Synonymous Address Compaction for Energy Reduction in Data TLB

Semantic-Aware Memory Architecture

Most memory accesses go to the smaller stack and global TLB/cache, reducing power.

[Figure: block diagram with the following labels: Virtual address; Data Address Router; ld_data_base_reg; ld_env_base_reg; ld_data_bound_reg; sTLB (entries labeled 0, 1); gTLB (entries labeled 0, 1, 2, 3); uTLB (entries labeled 0, 1, 63); sCache; gCache; hCache; Unified L2 Cache; To Processor.]

Page 9: Synonymous Address Compaction for Energy Reduction in Data TLB

VPN compaction mechanisms

Virtual address access sequence

Cycle i      0xdeadbeee  0xdeadbeef  0xdeadbef0
Cycle (i+1)  0xdeadbef2  0xdeadbeef  0x12345678

0xffffffff

-----

VPN translation lookup in d-TLB

Cycle i      0xdeadb  0xdeadb  0xdeadb
Cycle (i+1)  0xdeadb  0xdeadb  0x12345

0xfffff

-----

Page 10: Synonymous Address Compaction for Energy Reduction in Data TLB

VPN compaction mechanisms

Intra-cycle compaction

VPNs after intra-cycle compaction

Cycle i      0xdeadb  -----  -----
Cycle (i+1)  0xdeadb  -----  0x12345

0xfffff

-----

Page 11: Synonymous Address Compaction for Energy Reduction in Data TLB

VPN compaction mechanisms

Inter-cycle compaction

VPNs after inter-cycle compaction

Cycle i      0xdeadb  0xdeadb  0xdeadb
Cycle (i+1)  -----    -----    0x12345

0xfffff

-----
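
The two compaction steps can be traced on the three-slot rows from the slide. This is a minimal sketch under stated assumptions: 4 KB pages (so a 32-bit address yields a 20-bit VPN), and, for the inter-cycle case, a single hypothetical latch holding the last VPN translated in the previous cycle. The fourth slot on the slide is omitted, and the function names are illustrative only.

```python
PAGE_SHIFT = 12  # assumption: 4 KB pages, so a 32-bit address has a 20-bit VPN


def vpn(addr):
    """Virtual page number of a 32-bit virtual address."""
    return addr >> PAGE_SHIFT


def intra_cycle_compact(addrs):
    """VPNs looked up in one cycle; None marks a slot that shares its VPN
    with an earlier slot of the same cycle and so needs no lookup."""
    seen = set()
    lookups = []
    for addr in addrs:
        page = vpn(addr)
        lookups.append(None if page in seen else page)
        seen.add(page)
    return lookups


def inter_cycle_compact(prev_addrs, addrs):
    """VPNs looked up in one cycle; None marks a slot whose VPN equals the
    last VPN of the previous cycle (assumed single-latch model)."""
    latch = vpn(prev_addrs[-1]) if prev_addrs else None
    return [None if vpn(a) == latch else vpn(a) for a in addrs]


def show(lookups):
    """Render lookups the way the slide does, with ----- for skipped slots."""
    return ["-----" if page is None else hex(page) for page in lookups]


cycle_i = [0xDEADBEEE, 0xDEADBEEF, 0xDEADBEF0]
cycle_i1 = [0xDEADBEF2, 0xDEADBEEF, 0x12345678]

print(show(intra_cycle_compact(cycle_i)))            # ['0xdeadb', '-----', '-----']
print(show(intra_cycle_compact(cycle_i1)))           # ['0xdeadb', '-----', '0x12345']
print(show(inter_cycle_compact([], cycle_i)))        # ['0xdeadb', '0xdeadb', '0xdeadb']
print(show(inter_cycle_compact(cycle_i, cycle_i1)))  # ['-----', '-----', '0x12345']
```

Each printed row matches the corresponding three-slot row on the slide. Comparing every slot against the same slot of the previous cycle would give the same result on this example, so the slide alone does not settle which comparison the hardware performs.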

Page 12: Synonymous Address Compaction for Energy Reduction in Data TLB

Intra-cycle compaction mechanism

[Figure: block diagram with the following labels: Reservation Station; AGUs, IUs, FPUs; Load Buffer; Store Buffer; Memory Order Buffer; six 20-bit comparators; 32-entry fully-associative data TLBs; Physical Address.]

Page 13: Synonymous Address Compaction for Energy Reduction in Data TLB

Comparator Logic

[Figure: comparator logic drawn alongside the Memory Order Buffer. Six comparators: VPN1 == VPN2, VPN1 == VPN3, VPN1 == VPN4, VPN2 == VPN3, VPN2 == VPN4, VPN3 == VPN4. Match signals: pairwise (1 = 2, 1 = 3, 1 = 4, 2 = 3, 2 = 4, 3 = 4), three-way (1 = 2 = 3, 1 = 2 = 4, 1 = 3 = 4, 2 = 3 = 4), and four-way (1 = 2 = 3 = 4).]
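
Four VPNs give exactly six distinct pairs, which is why six comparators suffice. The sketch below models the six pairwise outputs and derives which slots still need a d-TLB lookup. It assumes all four slots hold valid VPNs, and the function names and the derived lookup mask are illustrative, not the authors' logic.

```python
from itertools import combinations


def comparator_outputs(vpns):
    """Outputs of the six pairwise comparators for four VPNs.

    Keys are 1-based slot pairs (i, j) with i < j, as labeled on the slide.
    """
    if len(vpns) != 4:
        raise ValueError("expected exactly four VPNs")
    return {(i + 1, j + 1): vpns[i] == vpns[j]
            for i, j in combinations(range(4), 2)}


def lookup_mask(vpns):
    """True for each slot that must access the d-TLB itself.

    A slot is skipped when it matches any earlier slot in the same cycle.
    """
    eq = comparator_outputs(vpns)
    return [not any(eq[(i, j)] for i in range(1, j)) for j in range(1, 5)]


vpns = [0xDEADB, 0xDEADB, 0x12345, 0xDEADB]
print(len(comparator_outputs(vpns)))  # 6
print(lookup_mask(vpns))              # [True, False, True, False]
```

The three-way and four-way match signals on the slide are conjunctions of these pairwise outputs. For example, 1 = 2 = 3 holds when both (1, 2) and (2, 3) are true.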

Page 14: Synonymous Address Compaction for Energy Reduction in Data TLB

Inter-cycle Compaction Mechanism

[Figure: the semantic-aware memory architecture block diagram with MRU latches added. Labels: Virtual address; Data Address Router; ld_data_base_reg; ld_env_base_reg; ld_data_bound_reg; sTLB (entries labeled 0, 1); gTLB (entries labeled 0, 1, 2, 3); uTLB (entries labeled 0, 32); three MRU Latches; last access reuse; sCache; gCache; hCache; Unified L2 Cache; To Processor.]

Page 15: Synonymous Address Compaction for Energy Reduction in Data TLB

Simulation Parameters

Execution Engine: Out-of-Order
Fetch / Decode / Issue / Commit: 4 / 4 / 4 / 4
L1 / L2 / Memory Latency: 1 / 6 / 150
TLB hit / miss latency: 1 / 30
L1 Cache baseline: DM 32KB, 32B
L2 Cache: 4w 512KB, 32B
Number of TLB entries: 32
Each 20-bit comparator power: 300 uW
Each MRU latch power in TLB: 140 uW
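
The per-component power figures allow a rough estimate of the added hardware cost. The sketch below multiplies them by component counts: six comparators, as stated for the intra-cycle mechanism, and three MRU latches, which is an assumption read off the inter-cycle diagram. It counts only the components itemized in the table and is not a figure reported by the authors.

```python
COMPARATOR_UW = 300  # each 20-bit comparator, from the parameter table
MRU_LATCH_UW = 140   # each MRU latch, from the parameter table


def overhead_uw(num_comparators=0, num_latches=0):
    """Added power in microwatts for the given component counts."""
    return num_comparators * COMPARATOR_UW + num_latches * MRU_LATCH_UW


intra_cycle_uw = overhead_uw(num_comparators=6)  # six 20-bit comparators
inter_cycle_uw = overhead_uw(num_latches=3)      # assumed: three MRU latches

print(intra_cycle_uw)  # 1800
print(inter_cycle_uw)  # 420
```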

Page 16: Synonymous Address Compaction for Energy Reduction in Data TLB

Energy Savings via Synonymous Compaction

Intra-cycle compaction: 27%
Inter-cycle compaction: 42%
Inter-cycle semantic-aware: 56%

[Figure: data TLB energy savings % (y-axis, 0% to 80%) for: Inter-Cycle: semantic-aware stack-only; Inter-Cycle: semantic-aware stack + global; Inter-Cycle: semantic-aware stack + heap; Inter-Cycle: all semantic-aware; Inter-Cycle: applied to baseline d-TLB; Intra-Cycle: applied to baseline d-TLB.]

Page 17: Synonymous Address Compaction for Energy Reduction in Data TLB

Performance Impact w/ Synonymous Compaction

Intra-cycle compaction: 9%
Inter-cycle compaction: 8%
Inter-cycle semantic-aware: 4%

[Figure: performance speedup (y-axis, 0.80 to 1.00) for: Inter-Cycle: semantic-aware stack only; Inter-Cycle: semantic-aware stack+global; Inter-Cycle: semantic-aware stack+heap; Inter-Cycle: all semantic-aware; Inter-Cycle: applied to baseline d-TLB; Intra-Cycle: applied to baseline d-TLB.]

Page 18: Synonymous Address Compaction for Energy Reduction in Data TLB

I- and d-TLB Energy Savings via Synonymous Compaction

Combining compaction for iTLB and dTLB gives 85% and 52% energy savings, respectively.
Overall 70% TLB energy savings.
Using semantic-aware, overall 76% energy savings.

[Figure: TLB energy savings % (y-axis, 0% to 100%) for: Intra-Cycle + Inter-Cycle: applied to baseline iTLB; Intra-Cycle + Inter-Cycle: applied to baseline dTLB; Intra-Cycle + Inter-Cycle: applied to both iTLB + baseline dTLB; Intra-Cycle + Inter-Cycle: applied to both iTLB + all semantic-aware.]

Page 19: Synonymous Address Compaction for Energy Reduction in Data TLB

I- and d-TLB Performance Impact w/ Synonymous Compaction

Combining compaction for iTLB and dTLB has a 5% and 13% performance impact, respectively.
Using semantic-aware, overall 13% performance impact.

[Figure: performance speedup (y-axis, 0.80 to 1.00) for: Intra-Cycle + Inter-Cycle: applied to baseline iTLB; Intra-Cycle + Inter-Cycle: applied to baseline dTLB; Intra-Cycle + Inter-Cycle: iTLB + dTLB; Intra-Cycle + Inter-Cycle: iTLB + all semantic-aware.]

Page 20: Synonymous Address Compaction for Energy Reduction in Data TLB

Conclusions

Consecutive TLB accesses are highly synonymous
Proposed synonymous address compaction to exploit this behavior
  Reduce energy for d-TLB and i-TLB
Energy savings and performance impact
  Intra-cycle: 27% and 9%
  Inter-cycle: 42% and 8%
  Semantic-aware: 56% and 4%

Page 21: Synonymous Address Compaction for Energy Reduction in Data TLB

Q and A