Multi-Core Systems

[Figure: a multi-core chip with four cores (CORE 0 through CORE 3), each with a private L2 cache, all sharing one DRAM memory controller and DRAM Banks 0 through 7: a shared DRAM memory system. This sharing is the source of unfairness.]
DRAM Bank Operation

[Figure: a DRAM bank organized as rows and columns, with a row decoder, a column decoder, and a row buffer. Access (Row 0, Column 0): the row buffer is empty, so row address 0 loads Row 0 into the row buffer, then column address 0 selects the data. Accesses (Row 0, Column 1) and (Row 0, Column 9): HIT, Row 0 is already in the row buffer, so only a column access is needed. Access (Row 1, Column 0): CONFLICT!, row address 1 must first replace Row 0 in the row buffer before column address 0 can be read.]
DRAM Controllers

- A row-conflict memory access takes much longer than a row-hit access (100ns vs. 50ns)
- Current controllers take advantage of the row buffer
- Commonly used scheduling policy (FR-FCFS):
  - Row-hit first: service row-hit memory accesses first
  - Oldest-first: then service older accesses first
- This scheduling policy aims to maximize DRAM throughput
- But it yields itself to a new form of denial of service with multiple threads
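The FR-FCFS selection rule above can be sketched as follows. This is an illustrative model only; the request fields (`row`, `arrival`) and the single-bank view are assumptions, not the controller's actual data structures.

```python
# Sketch of FR-FCFS request selection for one bank: among buffered
# requests, prefer row hits (request row == currently open row),
# then break ties by age (lower arrival = older request).

def fr_fcfs_pick(requests, open_row):
    """requests: list of dicts with 'row' and 'arrival' fields."""
    hits = [r for r in requests if r["row"] == open_row]
    pool = hits if hits else requests   # fall back to all requests
    return min(pool, key=lambda r: r["arrival"])

reqs = [
    {"row": 5, "arrival": 0},   # oldest, but a row conflict
    {"row": 0, "arrival": 1},   # row hit: serviced first
    {"row": 0, "arrival": 2},
]
picked = fr_fcfs_pick(reqs, open_row=0)   # the row hit, not the oldest
```

Note how the oldest request loses to any row hit; this is exactly the property a streaming thread can exploit.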
Memory Performance Attacks:
Denial of Memory Service in Multi-Core Systems

Thomas Moscibroda and Onur Mutlu
Computer Architecture Group, Microsoft Research
Outline

- The Problem: Memory Performance Hogs
- Preventing Memory Performance Hogs
- Our Solution: Fair Memory Scheduling
- Experimental Evaluation
- Conclusions
The Problem

- Multiple threads share the DRAM controller
- DRAM controllers are designed to maximize DRAM throughput
- DRAM scheduling policies are thread-unaware and unfair
  - Row-hit first: unfairly prioritizes threads with high row buffer locality
    - Streaming threads: threads that keep on accessing the same row
  - Oldest-first: unfairly prioritizes memory-intensive threads
- Vulnerable to denial of service attacks
Memory Performance Hog (MPH)

- A thread that exploits unfairness in the DRAM controller to deny memory service (for long time periods) to threads co-scheduled on the same chip
- Can severely degrade other threads' performance
  - While maintaining most of its own performance!
- Easy to write an MPH
  - An MPH can be designed intentionally (malicious)
  - Or written unintentionally!
- Implications in commodity multi-core systems:
  - Widespread performance degradation and unpredictability
  - User discomfort and loss of productivity
  - Vulnerable hardware could lose market share
An Example Memory Performance Hog: STREAM

- Sequential memory access
- Very high row buffer locality (96% hit rate)
- Memory intensive
A Co-Scheduled Application: RDARRAY

- Random memory access
- Very low row buffer locality (3% hit rate)
- Similarly memory intensive
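The contrast between the two access patterns can be sketched numerically. This is not the actual benchmark code; the array size, access count, and single-bank hit model are illustrative assumptions, using the 8KB row and 64B access size from the slides.

```python
# Illustrative sketch: a stream-like sequential address pattern vs. an
# rdarray-like random one, and the row-hit rate each sees on one bank.
import random

ROW_SIZE = 8192          # 8 KB row buffer (from the slides)
ACCESS_SIZE = 64         # 64 B cache-block accesses

def row_hit_rate(addresses):
    """Fraction of accesses that find their row already open."""
    hits, open_row = 0, None
    for addr in addresses:
        row = addr // ROW_SIZE
        if row == open_row:
            hits += 1
        open_row = row
    return hits / len(addresses)

n = 10_000
streamed = [i * ACCESS_SIZE for i in range(n)]                 # sequential
rng = random.Random(0)
rdarray = [rng.randrange(n) * ACCESS_SIZE for _ in range(n)]   # random

# Sequential accesses hit the open row about 127 times out of every 128;
# random accesses almost always force a new row to be opened.
```

The ~99% vs. ~1% hit rates this toy model produces mirror the 96% vs. 3% measured for the real STREAM and RDARRAY programs.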
What Does the MPH Do?

[Figure: the DRAM bank's request buffer fills with thread T0's (stream's) requests to Row 0, interleaved with thread T1's (rdarray's) requests to Rows 16, 111, and 5. Because Row 0 stays open in the row buffer, the row-hit-first policy keeps servicing T0's requests ahead of T1's older ones. With an 8KB row size and 64B data size, 128 requests of stream are serviced before rdarray.]
Already a Problem in Real Dual-Core Systems

Results on an Intel Pentium D running Windows XP (similar results for Intel Core Duo and AMD Turion, and on Fedora):

- 2.82X slowdown for rdarray
- 1.18X slowdown for stream
More Severe Problem with 4 and 8 Cores

Simulated config:
- 512KB private L2s
- 3-wide processors
- 128-entry instruction window
Preventing Memory Performance Hogs

- MPHs cannot be prevented in software
  - Fundamentally hard to distinguish between malicious and unintentional MPHs
  - Software has no direct control over the DRAM controller's scheduling
    - OS/VMM can adjust applications' virtual-to-physical page mappings
    - No control over the DRAM controller's hardwired page-to-bank mappings
- MPHs should be prevented in hardware
  - Hardware also cannot distinguish malicious vs. unintentional
  - Goal: contain and limit MPHs by providing fair memory scheduling
Fairness in Shared DRAM Systems

- A thread's DRAM performance depends on its inherent
  - Row-buffer locality
  - Bank parallelism
- Interference between threads can destroy either or both
- Well-studied notions of fairness are inadequate; fair scheduling cannot simply
  - Equalize memory bandwidth among threads
    - Unfairly penalizes threads with good bandwidth utilization
    - DRAM scheduling is not a pure bandwidth allocation problem: unlike a network wire, DRAM is a stateful system
  - Equalize the latency or number of requests among threads
    - Unfairly penalizes threads with good row-buffer locality: row-hit requests are less costly than row-conflict requests
- Bandwidth, latency, or request fairness does not imply performance fairness (equal slowdowns)
Our Definition of DRAM Fairness

- A DRAM system is fair if it slows down each thread equally, compared to when the thread is run alone
- Experienced Latency EL(i) = cumulative DRAM latency of thread i running with other threads
- Ideal Latency IL(i) = cumulative DRAM latency of thread i if it were run alone on the same hardware resources
- DRAM-Slowdown(i) = EL(i) / IL(i)
- The goal of a fair scheduling policy is to equalize DRAM-Slowdown(i) for all threads i
  - Considers the inherent DRAM performance of each thread
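The metric above is simple enough to compute directly. A minimal sketch, with made-up latency numbers chosen only to echo the dual-core measurements:

```python
# Sketch of the fairness metric: per-thread DRAM slowdown EL/IL, and an
# unfairness index defined as max slowdown over min slowdown (1.0 = fair).

def dram_slowdown(experienced_latency, ideal_latency):
    return experienced_latency / ideal_latency

def unfairness(slowdowns):
    return max(slowdowns) / min(slowdowns)

# Illustrative values: a stream-like thread barely slows down,
# an rdarray-like thread slows down badly.
s0 = dram_slowdown(experienced_latency=1_180, ideal_latency=1_000)  # 1.18
s1 = dram_slowdown(experienced_latency=2_820, ideal_latency=1_000)  # 2.82
u = unfairness([s0, s1])   # about 2.39, far from the fair value of 1.0
```

Because IL(i) normalizes away each thread's inherent behavior, a thread with naturally poor locality is not penalized: only interference-induced slowdown counts.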
Fair Memory Scheduling Algorithm (1)

- During each time interval (a configurable number of cycles), for each thread, the DRAM controller
  - Tracks EL (Experienced Latency)
  - Estimates IL (Ideal Latency)
- At the beginning of a scheduling cycle, the DRAM controller
  - Computes Slowdown = EL/IL for each thread
  - Computes unfairness index = MAX Slowdown / MIN Slowdown
- If unfairness is below the tolerable threshold
  - Use the baseline scheduling policy to maximize DRAM throughput: (1) row-hit first, (2) oldest-first
Fair Memory Scheduling Algorithm (2)

- If unfairness is at or above the tolerable threshold
  - Use the fairness-oriented scheduling policy: (1) requests from the thread with MAX Slowdown first (if any), (2) row-hit first, (3) oldest-first
  - Maximizes DRAM throughput if it cannot improve fairness
- Two parameters: the maximum tolerable unfairness, and the interval length, a lever between short-term and long-term fairness
  - System software determines both
  - Conveyed to the hardware via privileged instructions
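The two-mode decision described on these slides can be sketched as a single selection function. This is a simplified model, not the hardware: the request layout, the dict of slowdowns, and the threshold value are all illustrative assumptions.

```python
# Sketch of the FairMem-style decision: below the unfairness threshold,
# behave like baseline FR-FCFS (row-hit first, then oldest); at or above
# it, first restrict the pool to requests from the most-slowed thread.

def fairmem_pick(requests, open_row, slowdowns, threshold):
    """requests: dicts with 'thread', 'row', 'arrival'.
    slowdowns: thread id -> EL/IL."""
    pool = requests
    if max(slowdowns.values()) / min(slowdowns.values()) >= threshold:
        victim = max(slowdowns, key=slowdowns.get)   # most-slowed thread
        from_victim = [r for r in pool if r["thread"] == victim]
        if from_victim:                              # "(if any)"
            pool = from_victim
    hits = [r for r in pool if r["row"] == open_row]
    pool = hits if hits else pool                    # row-hit first
    return min(pool, key=lambda r: r["arrival"])     # then oldest-first

reqs = [
    {"thread": 0, "row": 0,  "arrival": 5},   # stream's row hit
    {"thread": 1, "row": 16, "arrival": 0},   # rdarray's row conflict
]
# Unfair system: rdarray's request wins despite being a row conflict.
pick = fairmem_pick(reqs, open_row=0,
                    slowdowns={0: 1.03, 1: 2.50}, threshold=1.5)
```

When unfairness is low, the same call degenerates to plain FR-FCFS, so throughput is unaffected in the common case.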
How Does FairMem Prevent an MPH?

[Figure: the same request buffer scenario as before, now under FairMem. T0's (stream's) Row 0 requests are interleaved with T1's (rdarray's) requests to Rows 16, 111, and 5. FairMem tracks per-thread slowdowns, which stay balanced in the 1.00 to 1.14 range for both threads, and keeps the unfairness index near 1.0 by servicing T1's row-conflict requests instead of letting T0's row hits monopolize the bank.]
Hardware Implementation

- Tracking Experienced Latency: relatively easy
  - Update a per-thread latency counter as long as there is an outstanding memory request for that thread
- Estimating Ideal Latency: more involved, because the thread is not running alone
  - Need to estimate inherent row buffer locality
  - Keep track of the row that would have been in the row buffer if the thread were running alone
  - For each memory request, update a per-thread latency counter based on the "would-have-been" row buffer state
- Efficient implementation possible (details in paper)
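The "would-have-been" bookkeeping can be sketched in a few lines. This is a behavioral model only: the class layout is an assumption, it tracks one shadow row per thread rather than per bank, and it uses the two latencies from the evaluation slides.

```python
# Sketch of ideal-latency estimation: per thread, remember which row
# would be open if the thread ran alone, and charge row-hit or
# row-conflict latency against that shadow state.

ROW_HIT_NS, ROW_CONFLICT_NS = 50, 100   # latencies from the slides

class IdealLatencyEstimator:
    def __init__(self):
        self.shadow_row = {}      # thread id -> row that would be open
        self.ideal_latency = {}   # thread id -> accumulated IL in ns

    def record_access(self, thread, row):
        hit = self.shadow_row.get(thread) == row
        self.shadow_row[thread] = row
        cost = ROW_HIT_NS if hit else ROW_CONFLICT_NS
        self.ideal_latency[thread] = (
            self.ideal_latency.get(thread, 0) + cost)
```

The key point is that the shadow row is updated only by the thread's own accesses, so other threads' row-buffer evictions never inflate its estimated IL.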
Evaluation Methodology

- Instruction-level performance simulation
- Detailed x86 processor model based on Intel Pentium M
  - 4 GHz processor, 128-entry instruction window
  - 512 Kbyte per-core private L2 caches
- Detailed DRAM model
  - 8 banks, 2 Kbyte row buffer
  - Row-hit latency: 50ns (200 cycles)
  - Row-conflict latency: 100ns (400 cycles)
- Benchmarks
  - Stream (MPH) and rdarray
  - SPEC CPU 2000 and Olden benchmark suites
Dual-Core Results

- MPH with non-intensive VPR
  - Fairness improvement: 7.9X, unfairness down to 1.11
  - System throughput improvement: 4.5X
- MPH with intensive MCF
  - Fairness improvement: 4.8X, unfairness down to 1.08
  - System throughput improvement: 1.9X
4-Core Results

- Non-memory-intensive applications suffer the most (VPR)
- Low row-buffer locality hurts (MCF)
- Fairness improvement: 6.6X
- System throughput improvement: 2.2X
Effect of Row Buffer Size

- Vulnerability increases with row buffer size
- Also with memory latency (details in paper)
Conclusions

- Introduced the notion of memory performance attacks in shared DRAM memory systems
  - Unfair DRAM scheduling is the cause of the vulnerability
  - More severe problem in future many-core systems
- We provide a novel definition of DRAM fairness
  - Threads should experience equal slowdowns
- New DRAM scheduling algorithm enforces this definition
  - Effectively prevents memory performance attacks
  - Preventing attacks also improves system throughput!