Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature Mrinmoy Ghosh Ripal Nathuji Min Lee Karsten Schwan Hsien-Hsin S. Lee ARM Microsoft Research Georgia Tech


Symbiotic Scheduling for Shared Caches in Multi-Core Systems Using Memory Footprint Signature

Mrinmoy Ghosh

Ripal Nathuji

Min Lee

Karsten Schwan

Hsien-Hsin S. Lee

ARM, Microsoft Research, Georgia Tech

Cache Interference in “Concurrent Processes”

[Diagram: Core A and Core B, each with its own L1 cache, above a shared L2 cache. Processes P1 and P2 are shown with their cache lines ("P1 $ Line", "P2 $ Line") and the labels "Line Hit!" and "Conflict!".]

Cache Interference Effect (Concurrent Processes)

Maximum performance degradation less than 10%

[Figure: bar chart. Y-axis: Relative Run Time (0.96 to 1.10). X-axis: perlbench, gobmk, hmmer, soplex, povray, omnetpp, mcf, libquantum, astar, bwaves, sphinx3, xalancbmk. Bars are annotated with benchmark names (mcf, libq, perl).]

Cache Interference in “Shared Cache Multi-Core”

[Diagram: Core A and Core B, each with its own L1 cache, above a shared L2 cache. Processes P1 and P2 are shown with their cache lines ("P1 $ Line", "P2 $ Line") and the label "Conflict!".]

Cache Interference Effect (Shared Cache Multi-Core)

Performance degraded by as much as 65%

[Figure: bar chart. Y-axis: Relative Run Times (0.0 to 1.8). X-axis: perlbench, gobmk, hmmer, soplex, povray, omnetpp, mcf, libquantum, astar, bwaves, sphinx3, xalancbmk. Bars are annotated with benchmark names (lbm, libq, bwaves, mcf, soplex).]

Intelligent Process Management Needed!

Process (In-)Compatibility in Multi-Cores

• Problem
– Processes on different cores can be incompatible
– Shared resource contention

• Observation
– Incompatible processes contend less when they run on the same core

• Insight
– Process incompatibility severely affects performance
– Compatibility-based scheduling increases throughput


Ideas

• Use Counting Bloom Filter to record memory access signature

• Compatibility test using signature

Insertion: Counting Bloom Filter

[Diagram: an N-bit data address A is fed to two N-to-m hash functions, X and Y. Each hash selects an entry that has a counter and a presence bit; the two selected entries show the value 1.]

Insertion: Counting Bloom Filter

[Diagram: an N-bit data address B is fed to the same hash functions X and Y. Three entries now show the value 1, and one counter shows 2.]

Deletion: Counting Bloom Filter

[Diagram: data address A was evicted. Its address is hashed by X and Y to locate its entries; the presence bits shown are 1 and 1, and the counter values 1 and 2 appear next to the affected entry.]

Query: Counting Bloom Filter

[Diagram: a query for data address A is hashed by X and Y. The presence bits of the selected entries are 1 and 0 (other values shown: 2 and 1).]

Data Not Present!
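The insertion, deletion and query slides can be sketched in a few lines of Python. This is only an illustration, not the hardware design from the talk: the class name, the table size m = 16, and the two hash functions (low bits, and the next group of bits) are all assumptions chosen so that the example addresses share one counter, as on the slides.

```python
class CountingBloomFilter:
    """Toy counting Bloom filter with two hash functions (hypothetical sketch)."""

    def __init__(self, m=16):
        self.m = m
        self.counters = [0] * m  # one counter per entry

    def _indices(self, address):
        # Assumed hash functions X and Y: two different bit fields of the address.
        x = address % self.m
        y = (address // self.m) % self.m
        return (x, y)

    def insert(self, address):
        # A line entering the cache increments both selected counters.
        for i in self._indices(address):
            self.counters[i] += 1

    def delete(self, address):
        # An evicted line decrements both selected counters (never below zero).
        for i in self._indices(address):
            if self.counters[i] > 0:
                self.counters[i] -= 1

    def query(self, address):
        # Present only if every selected presence bit (counter > 0) is set.
        return all(self.counters[i] > 0 for i in self._indices(address))

    def signature(self):
        # Presence-bit vector packed into an integer: bit i is 1 if counter i > 0.
        return sum(1 << i for i, c in enumerate(self.counters) if c > 0)


A, B = 0x1234, 0x5674          # A -> entries (4, 3), B -> entries (4, 7)
cbf = CountingBloomFilter(m=16)
cbf.insert(A)
cbf.insert(B)
counter_after_inserts = cbf.counters[4]   # shared entry, counted twice
cbf.delete(A)                             # A was evicted
```

After the eviction, `cbf.query(A)` is false because entry 3 has dropped to zero, while `cbf.query(B)` is still true. Keeping counters next to the presence bits is what makes deletion safe when two addresses share an entry.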

Bloom Filter Signatures vs. Cache Footprint

Strong Correlation!

[Figure: plot relating Cache Footprint and Signature Value; one axis runs from 0 to 3500, the other from 0 to 1700.]


Architectural Support

Bloom Filter Signature Multi-Core Architecture

[Diagram: Core A and Core B, each with its own L1 cache, share an L2 cache. Each core has a Last Filter and a Core Filter; the Bloom Filter Counters are shown as a separate block.]

Bloom Filter Signature Multi-Core Architecture

[Diagram: the same architecture (Core A and Core B with L1 caches, shared L2 cache, a Last Filter and a Core Filter per core, Bloom Filter Counters), now with processes P1, P2 and P3.]

Metric for Execution State

[Diagram: the Last Filter and the Core Filter are combined (the operator is extracted as "+") into the RBV (Running Bit Vector).]

Occupancy Weight = number of 1s in the RBV

Interference Metric (Complement of Symbiosis)

[Diagram: a process pool (processes waiting to be scheduled) holds Proc0, Proc1, Proc2, and so on. The RBV of Proc1 is combined with a Core Filter (the operator is extracted as "+").]

Symbiosis = 5

Interference Metric = N - 5
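The two metrics above are simple bit counts, which a short Python sketch makes concrete. It rests on one loud assumption: the operator drawn as "+" between the RBV and the Core Filter is treated as a bitwise XOR, which the extracted slide text does not confirm. The function names and the 8-bit example vectors are hypothetical; N is taken to be the filter width in bits.

```python
def popcount(bits: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(bits).count("1")


def occupancy_weight(rbv: int) -> int:
    # Occupancy weight as defined on the slide: the number of 1s in the RBV.
    return popcount(rbv)


def symbiosis(rbv: int, core_filter: int) -> int:
    # ASSUMPTION: the slide's "+" is a bitwise XOR of the two bit vectors.
    return popcount(rbv ^ core_filter)


def interference(rbv: int, core_filter: int, n_bits: int) -> int:
    # Interference metric = N - symbiosis, with N assumed to be the vector width.
    return n_bits - symbiosis(rbv, core_filter)


N = 8
rbv = 0b10110100          # made-up running bit vector of a waiting process
core_filter = 0b01100111  # made-up core filter
weight = occupancy_weight(rbv)               # 4
sym = symbiosis(rbv, core_filter)            # 5, as in the slide's example
inter = interference(rbv, core_filter, N)    # 8 - 5 = 3
```

Under this reading, a process whose vector differs from the core filter in many positions scores a high symbiosis and therefore a low interference metric.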

Process-to-Core Mapping Algorithms

• A1: Use Occupancy Weight

• A2: Use Interference Graph

• A3: Use Weighted Interference Graph

A1: Weight Sorted Algorithm

• Sort all processes according to occupancy weight
• Processes form groups using sorted weight
– # of processes in a group = Processes/Cores
• Map processes to cores based on sorting results

Processes sorted by occupancy weight:

Process  Occupancy weight
P0       100
P4       99
P2       70
P5       65
P6       43
P3       20
P1       15

[Diagram: the sorted processes are mapped onto Core A, Core B, Core C and Core D, each with its own L1 cache.]
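One possible reading of the A1 bullets is sketched below, using the seven occupancy weights shown on the slide. The function name is hypothetical, and two details are assumptions because the slide does not spell them out: groups are taken as consecutive runs of the sorted list, and the group size Processes/Cores is rounded up when it does not divide evenly.

```python
import math


def weight_sorted_mapping(weights, cores):
    """Sort processes by occupancy weight and hand one group to each core."""
    # Heaviest process first.
    ordered = sorted(weights, key=weights.get, reverse=True)
    # ASSUMPTION: round Processes/Cores up so that every process gets a core.
    group_size = math.ceil(len(ordered) / len(cores))
    return {
        core: ordered[i * group_size:(i + 1) * group_size]
        for i, core in enumerate(cores)
    }


# Occupancy weights from the slide.
weights = {"P0": 100, "P1": 15, "P2": 70, "P3": 20, "P4": 99, "P5": 65, "P6": 43}
mapping = weight_sorted_mapping(weights, ["A", "B", "C", "D"])
# mapping == {"A": ["P0", "P4"], "B": ["P2", "P5"], "C": ["P6", "P3"], "D": ["P1"]}
```

With this grouping the two heaviest processes land on the same core, so they time-share that core rather than run concurrently against each other in the shared cache, in line with the observation on the compatibility slide.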

A2: Interference Graph Algorithm

• Form interference graph using interference metric
• Find MAX-CUT of the graph

Interference metric of each process against each core (CA, CB), and the core it last ran on:

Process  Was in  CA  CB
P0       CA      20  30
P1       CA      10  45
P2       CB      40  25
P3       CB      15  50

[Figure: interference graph over the nodes P0(A), P1(A), P2(B) and P3(B), built up over four slides. The edge between P0 and P2 is first labeled 30 and 40, then 70. The complete graph has the edge weights 70, 60, 30, 75, 45 and 85. The last slide shows the full graph next to a result with the pairs P1(A), P3(B) and P0(A), P2(B) and the values 85 and 45.]
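The graph construction can be reproduced from the numbers on the slides. In the sketch below the edge rule is inferred from those numbers rather than stated by the authors: the weight between two processes is the interference of each one against the core the other last ran on, summed (30 + 40 = 70 for P0 and P2). The exhaustive MAX-CUT search is a generic brute force added for illustration; the slides do not say which MAX-CUT method is used or how the cut is turned into a core assignment, so the sketch only reports the partition it finds.

```python
from itertools import combinations

# Interference metric of each process against each core (values from the slides).
interference = {
    "P0": {"A": 20, "B": 30},
    "P1": {"A": 10, "B": 45},
    "P2": {"A": 40, "B": 25},
    "P3": {"A": 15, "B": 50},
}
last_core = {"P0": "A", "P1": "A", "P2": "B", "P3": "B"}


def build_graph(interference, last_core):
    """Edge (p, q) = p's interference with q's core + q's interference with p's core."""
    edges = {}
    for p, q in combinations(sorted(interference), 2):
        edges[(p, q)] = interference[p][last_core[q]] + interference[q][last_core[p]]
    return edges


def max_cut(edges):
    """Exhaustive MAX-CUT; only practical for a handful of nodes."""
    nodes = sorted({n for edge in edges for n in edge})
    first, rest = nodes[0], nodes[1:]
    best_weight, best_side = -1, frozenset()
    # Pin the first node to one side so each partition is visited once.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            side = frozenset((first, *extra))
            weight = sum(w for (p, q), w in edges.items() if (p in side) != (q in side))
            if weight > best_weight:
                best_weight, best_side = weight, side
    return best_weight, best_side


graph = build_graph(interference, last_core)
cut_weight, one_side = max_cut(graph)
```

The six computed weights (30, 70, 45, 85, 60, 75) match the values on the complete-graph slide, which supports the inferred edge rule.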

A3: Weighted Interference Graph Algorithm

• To address high interference issues
• Weight the edges of the interference graph
• The rest is the same as A2

Occupancy weight (OW), last core, and interference metric of each process:

Process  OW   Was in  CA  CB
P0       90   CA      20  30
P1       85   CA      10  45
P2       50   CB      40  25
P3       100  CB      15  50

[Figure: interference graph over the nodes P0(A), P1(A), P2(B) and P3(B). The edge between P0 and P2 is labeled 90*30 and 50*40.]
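The edge labels 90*30 and 50*40 suggest that each process's interference term is scaled by its own occupancy weight. The sketch below computes those two products for any pair; adding them into a single edge weight is an assumption made by analogy with the unweighted graph, since the slide shows only the products. The function names are hypothetical.

```python
# Values from the slide: interference metrics, last core, and occupancy weights.
interference = {
    "P0": {"A": 20, "B": 30},
    "P1": {"A": 10, "B": 45},
    "P2": {"A": 40, "B": 25},
    "P3": {"A": 15, "B": 50},
}
last_core = {"P0": "A", "P1": "A", "P2": "B", "P3": "B"}
occupancy_weight = {"P0": 90, "P1": 85, "P2": 50, "P3": 100}


def weighted_terms(p, q):
    """Each side's interference with the other's core, scaled by its own OW."""
    return (
        occupancy_weight[p] * interference[p][last_core[q]],
        occupancy_weight[q] * interference[q][last_core[p]],
    )


def weighted_edge(p, q):
    # ASSUMPTION: the two products are summed, as the unweighted graph sums its terms.
    return sum(weighted_terms(p, q))


terms = weighted_terms("P0", "P2")   # (90 * 30, 50 * 40) = (2700, 2000)
edge = weighted_edge("P0", "P2")     # 4700 under the summing assumption
```

Compared with the unweighted edge of 70, the scaling lets a process with a large cache footprint dominate the edge, which fits the slide's stated aim of addressing high-interference cases.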


Performance Evaluation

Evaluation Methodology

[Diagram with three setups:
1. Gather footprint in emulator: processes P1, P2, P3, ..., PN run on Fedora Linux on Simics x86; footprints are gathered through the “magic” interface and feed the process-to-core mapping.
2. Native x86 run: processes P1, P2, P3, ..., PN run on an Intel Core 2.
3. VM run: processes P1, P2, ..., PN each run in a Linux guest on the Xen hypervisor on an Intel Core 2.]

Performance Results

[Figure: two bar charts over the benchmarks astar, gobmk, hmmer, lbm, libquantum, mcf, omnetpp, perlbench, povray, soplex, sphinx and xalancbmk. The y-axis of the first chart runs from 0% to 60%, that of the second from 0% to 25%.]

Maximum performance improvement of up to 54%

Average performance improvement of up to 23%

Performance of Virtualized Systems

Maximum performance improvement of up to 26%

Average performance improvement of up to 9.5%

[Figure: bar chart over the benchmarks astar, gobmk, hmmer, lbm, libquantum, mcf, omnetpp, perlbench, povray, soplex, sphinx and xalancbmk. The y-axis runs from 0% to 10%.]

Performance Sensitivity of 3 Algorithms

[Figure: grouped bar chart. Y-axis: Performance Benefit (0% to 16%). X-axis: Application Mix. Legend: Sorted, Graph, Weighted Graph. Application mixes:
• mcf, gobmk, povray, omnetpp
• mcf, hmmer, libquantum, omnetpp
• perlbench, gobmk, libquantum, omnetpp
• gobmk, hmmer, libquantum, povray
• mcf, hmmer, libquantum, povray]

Weighted Interference Graph has the best performance


Conclusion


Shared Resource (e.g., LLC) Management is Critical

Process Scheduling using Compatibility in Multi-Core

Capturing Cache Reference Behavior for Processes

Symbiotic Scheduling with Bloom Filter Signature

Measured Speedup of 22% (up to 54%) on Intel Core 2


That’s All, Folks!

Georgia Tech ECE MARS Lab
http://arch.ece.gatech.edu