31
On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin— Madison http://www.ece.wisc.edu/~pharm

On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

Embed Size (px)

Citation preview

Page 1: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

On the Value Locality of Store Instructions

Kevin M. LepakMikko H. Lipasti

University of Wisconsin—Madison

http://www.ece.wisc.edu/~pharm

Page 2: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Motivation Value Locality (VL) is Real

More than 30 VL/VP papers Patents granted for VL work (AMD, Gabbay)

VL has been traditionally used to speed up the next state function (the “means” of computing)

VL has not been explored (generally) for the output function (the “end” of computing)

Is there any benefit to exploiting VL for outputs?

Page 3: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

VL of Memory Writes—Who Cares?

Reduction of writebacks/dirty lines is desirable Less data traffic Fewer invalidates Reduced pressure on WB buffers

Writes comparatively expensive Requires multi-porting/banking circuit tricks

Removal of “unnecessary” value storage seems intuitively satisfying Do unnecessary writes imply unnecessary

computation?

Page 4: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Outline Motivation Introduction to Silent Stores Uniprocessor Results Multiprocessor True/False Sharing New Definition of False Sharing Multi-Processor Results Conclusions

Page 5: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Terminology Silent Store: A memory write

that does not change the system state

Page 6: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Silent Stores—Is this for Real?

Percentage of silent stores is non-trivial in all cases, 20%-68%

Percentage of Silent Stores

0

10

20

30

40

50

60

70

80

90

100

Per

cen

t

PPC

SS

Page 7: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Terminology Silent Store: A memory write that

does not change the system state Store Verify: A load, compare, and

subsequent store (if non-silent) operation

Store Squashing: Removal of a silent store from program execution Dynamically Statically

Page 8: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Uniprocessor Machine Model SimpleScalar simulator

64 entry RUU; 8 issue 64K Gshare branch predictor 64K each split I/D cache 1MB L2 unified cache 16 entry load/store queue 4 memory load ports; 1 store port (4-

wide version of the 2-wide 21164)

Page 9: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Squashing Mechanism

Baseline

Implementation ofStore Squashing

Page 10: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Writebacks Eliminated

Substantial WB elimination by simplistic store verify/squash(14%-81% for cases with non-trivial WBs in the baseline case)

Percent L1-D WBs Eliminated by Squashing

0

10

20

30

40

50

60

70

80

90

Pe

rce

nt

Page 11: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

IPC Effects

7.6%, 7.9%, 14%, and 6.3% speedup of squashing over no squashing for m88ksim, gcc, vortex, and HM

IPC Effect of Squashing

0

0.5

1

1.5

2

2.5

3

3.5

m88ksim gcc vortex HM

IPC

No Squash

Squash

Page 12: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

IPC EffectsIPC Effect of Squashing

0

0.5

1

1.5

2

2.5

3

3.5

m88ksim gcc vortex HM

IPC

No Squash

Squash

No Sq. w/SF

Sq. w/SF

Squashing provides more benefit than store forwarding

Page 13: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Outline Motivation Introduction to Silent Stores Uniprocessor Results Multiprocessor True/False

Sharing New Definition of False Sharing Multi-Processor Results Conclusions

Page 14: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Multiprocessor True/False Sharing

Dubois et. al: ISCA-1993 Address-based definition Considers all “side-effects” (non-

critical words) brought in by a cache miss and future accesses to the line

Does not consider silent stores or data value prediction

Page 15: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Terminology Program Structure Store Value Locality

(PSSVL): The value locality exhibited by a given static store (can write to many addresses)

Message Passing Store Value Locality (MPSVL): The value locality exhibited for a specific memory location (can be written by many PCs)

Stochastically Silent Store: A store value which is trivially predictable by any well known method

Page 16: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

MPSVL and PSSVL

Percentage of stochastically silent (PSSVL, MPSVL) stores is non-trivial

27%-72% for PSSVL, 39%-70% for MPSVL

Address-based and PC-based SVL

0

10

20

30

40

50

60

70

80

90

go

m88

ksim gc

c

com

pres

s liijp

eg perl

vorte

xolt

p

barn

es

ocea

n

Pe

rce

nt

Silent

PSSVL

MPSVL

Page 17: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

New Definition of False Sharing

Extend Dubois’ definition with store value locality: Update False Sharing (UFS)

Consider (update) silent stores Stochastic False Sharing (SFS)

Consider stochastically (predictable by any well known method) silent stores

Message-passing Stochastic False Sharing (MSFS):

Exploit value locality based on effective memory address Program-structure Stochastic False Sharing

(PSFS): Exploit value locality based on instruction address (PC)

Page 18: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Machine Model SimOS-PPC full system simulator

4 processors 1MB data cache 4K direct-mapped stride predictor for

Program-structure and Message-passing store value locality

Remove all silent and stochastically silent stores from the cache hierarchy Limit study—must have a mechanism to

exploit (subject of current research)

Page 19: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Multiprocessor Sharing

Page 20: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Multiprocessor Sharing

Page 21: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Multiprocessor Sharing

Page 22: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Multiprocessor Sharing Measurable reduction in true/false sharing

for simple update silent squashing (UFS) Substantial reductions by squashing

update silent store hits and misses (UFS-P) and stochastically silent stores (SFS)

Squashing store misses (UFS-P) can be substantially better than simple UFS Motivates silence confidence mechanism for

store misses

Page 23: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Multiprocessor Traffic Measurable reduction in invalidate traffic

for simple update silent store squashing (UFS)—more effective than Exclusive state Substantial reduction for UFS-P and Stochastic

False Sharing (SFS) Writeback data traffic reduction by

squashing update silent store hits and misses (UFS-P) 5%-82% in oltp 16%-17% in ocean 5%-16% in barnes

Page 24: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Conclusions Significant store value locality exists

MPSVL (includes update silent stores) PSSVL

Uniprocessor performance can be enhanced by squashing silent stores

A new definition of sharing is given which accounts for update/stochastically silent stores

We can exploit the new sharing definitions to reduce address and data bus traffic UFS: Implementation given, non-trivial results SFS: Limit study shows significant potential

Page 25: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Future Work Characterization of silent stores Silence prediction and confidence

mechanisms Implementation of SFS mechanism . . .

Page 26: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Backup Slides

Page 27: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

IPC—ContinuedIPC Effect of Squashing

0

0.5

1

1.5

2

2.5

3

3.5

4

IPC

Exclusive, No Squash Exclusive, Squash Non. Ex, No Squash Non. Ex, Squash

Squashing in an exclusive 4L/1S memory system equivalent to non-exclusive/no squash

Page 28: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Previous Value Locality Works Lipasti, Shen: MICRO-1996, ASPLOS-

1996 Load value prediction (VP), input register VP

Mendelson, Gabbay: Technion TR-97 VP based on output register specifier

Gonzalez, Gonzalez: PACT-1999 Improving branch prediction

Calder et. al: ISCA-1999 Critical path optimizations

Many Others

Page 29: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

16B 32B 64B 128B 512B

Linesize (bytes)

0.01.02.03.04.05.06.07.08.09.0

10.011.0

Inv

alid

ate

s (

%/r

ef)

16B 32B 64B 128B 512B

Linesize (bytes)

0.01.02.03.04.05.06.07.08.09.0

10.011.0

Inv

alid

ate

s (

%/r

ef)

Multiprocessor Invalidates

HitMissMissNoE

SentSentNoE

OCEAN

Page 30: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

16B 32B 64B 128B 512B

Linesize (bytes)

0.0

0.2

0.4

0.6

Inva

lidat

es (

%/r

ef)

16B 32B 64B 128B 512B

Linesize (bytes)

0.0

0.2

0.4

0.6

Inva

lidat

es (

%/r

ef)

Multiprocessor Invalidates

HitMissMissNoE

SentSentNoE

BARNES

Page 31: On the Value Locality of Store Instructions Kevin M. Lepak Mikko H. Lipasti University of Wisconsin—Madison pharm

June 13, 2000Kevin M. Lepak and Mikko H.

Lipasti

Multiprocessor Invalidates

16B 32B 64B 128B 512B

Linesize (bytes)

0.0

0.5

1.0

1.5

2.0

2.5

Inva

lidat

es (

%/r

ef)

16B 32B 64B 128B 512B

Linesize (bytes)

0.0

0.5

1.0

1.5

2.0

2.5

Inva

lidat

es (

%/r

ef) Hit

MissMissNoE

SentSentNoE

OLTP