52
Mosaic: A GPU Memory Manager with Application-Transparent Support for Multiple Page Sizes Rachata Ausavarungnirun Joshua Landgraf Vance Miller Saugata Ghose Jayneel Gandhi Christopher J. Rossbach Onur Mutlu

Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Mosaic: A GPU Memory Manager with Application-Transparent Support

for Multiple Page Sizes

Rachata Ausavarungnirun Joshua Landgraf Vance Miller

Saugata Ghose Jayneel Gandhi

Christopher J. Rossbach Onur Mutlu

Page 2: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Executive Summary

2

• Problem: No single best page size for GPU virtual memory• Large pages: Better TLB reach• Small pages: Lower demand paging latency

• Our goal: Transparently enable both page sizes

• Key observations• Can easily coalesce an application’s contiguously-allocated small pages

into a large page• Interleaved memory allocation across applications breaks page contiguity

• Key idea: Preserve virtual address contiguity of small pages when allocating physical memory to simplify coalescing

• Mosaic is a hardware/software cooperative framework that:• Coalesces small pages into a large page without data movement• Enables the benefits of both small and large pages

• Key result: 55% average performance improvement overstate-of-the-art GPU memory management mechanism

Page 3: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

GPU Support for Virtual Memory

3

• Improves programmability with a unified address space

• Enables large data sets to be processed in the GPU

• Allows multiple applications to run on a GPU• Virtual memory can enforce memory protection

Page 4: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

GPU Core

Private TLB

State-of-the-Art Virtual Memory on GPUs

4

GPU CoreGPU CoreGPU Core

Shared TLB

Private TLB

Page Table Walkers

Page Table(Main memory)

Private TLB Private TLB

Limited TLB reach

High latency page walks

Data (Main Memory)

CPU Memory

High latency

I/O

Private

Shared

CPU-side memory

GPU-side memory

Page 5: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Trade-Off with Page Size

5

•Larger pages: •Better TLB reach•High demand paging latency

•Smaller pages: • Lower demand paging latency• Limited TLB reach

Page 6: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Trade-Off with Page Size

6

Can we get the best of both page sizes?

0.0

0.2

0.4

0.6

0.8

1.0

No

rmal

ize

dP

erf

orm

ance

Small (4KB) Large (2MB)

0.0

0.2

0.4

0.6

0.8

1.0

No

rmal

ize

d

Pe

rfo

rman

ce

Small (4KB) Large (2MB)

No Paging Overhead With Paging Overhead

-93%

52%

Page 7: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Outline

7

• Background

• Key challenges and our goal

• Mosaic

• Experimental evaluations

• Conclusions

Page 8: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Challenges with Multiple Page Sizes

8

Large Page Frame 1

Unallocated App 1 App 2

State-of-the-ArtTime

App 1 Allocation

App 2 Allocation

App 1 Allocation

Large Page Frame 2

GPU Memory

App 2 Allocation

CoalesceApp 1 Pages

Large Page Frame 3

Large Page Frame 4

Large Page Frame 5

Cannot coalesce (without migrating multiple 4K pages)

Need to search which pages to coalesce

CoalesceApp 2 Pages

Page 9: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Desirable Allocation

9

Large Page Frame 1

Unallocated App 1 App 2

Desirable BehaviorTime

App 1 Allocation

App 2 Allocation

App 1 Allocation

Large Page Frame 2

GPU Memory

App 2 Allocation

CoalesceApp 1 Pages

CoalesceApp 2 Pages

Large Page Frame 3

Large Page Frame 4

Large Page Frame 5

Can coalesce (without moving data)

Page 10: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Our Goals

10

•High TLB reach

•Low demand paging latency

•Application transparency• Programmers do not need to modify the applications

Page 11: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Outline

11

• Background

• Key challenges and our goal

• Mosaic

• Experimental evaluation

• Conclusions

Page 12: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Mosaic

12

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Hardware

GPU Runtime

Page 13: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Outline

13

• Background

• Key challenges and our goal

• Mosaic• Contiguity-Conserving Allocation

• In-Place Coalescer

• Contiguity-Aware Compaction

• Experimental evaluations

• Conclusions

Page 14: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Conserves contiguity within the large page frame

Mosaic: Data Allocation

14

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Allocate Memory

PageTable

Data

2

Large Page Frame

Hardware

GPU Runtime Application Demands Data1

Soft guarantee:A large page frame contains

pages from only a single address space

Page 15: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

• Data transfer is done at a small page granularity• A page that is transferred is immediately ready to use

Mosaic: Data Allocation

15

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Transfer Data

PageTable

Data3 System I/O Bus

CPU Memory

Large Page Frame

Hardware

GPU Runtime

Allocate Memory2

Application Demands Data1

Page 16: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Mosaic: Data Allocation

16

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

PageTable

Transfer Done4

Large Page Frame

Data

Hardware

GPU Runtime

Transfer Data

3 System I/O Bus

CPU Memory

Page 17: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Outline

17

• Background

• Key challenges and our goal

• Mosaic• Contiguity-Conserving Allocation

• In-Place Coalescer

• Contiguity-Aware Compaction

• Experimental evaluations

• Conclusions

Page 18: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

• Fully-allocated large page frame Coalesceable

• Allocator sends the list of coalesceable pages to the In-Place Coalescer

Mosaic: Coalescing

18

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Large Page Frame

List of large pages1

Large Page Frame

Hardware

GPU Runtime

Page 19: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

• In-Place Coalescer has:

• List of coalesceable large pages

• Key Task: Perform coalescing without moving data

• Simply need to update the page tables

Mosaic: Coalescing

19

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

PageTable

Data

Hardware

GPU Runtime

Update page tables2List of large pages1

Page 20: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Mosaic: Coalescing

20

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

PageTable

Data

Hardware

GPU Runtime

Update page tables2List of large pages1

0

Coalesced Bit

1

Large Page Table Small Page Table

• Application-transparent• Data can be accessed

using either page size• No TLB flush

Page 21: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Outline

21

• Background

• Key challenges and our goal

• Mosaic• Contiguity-Conserving Allocation

• In-Place Coalescer

• Contiguity-Aware Compaction

• Experimental evaluations

• Conclusions

Page 22: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

• Key Task: Free up not-fully-used large page frames

• Splinter pages Break down a large page into small pages

• Compaction Combine fragmented large page frames

Mosaic: Data Deallocation

22

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Hardware

GPU Runtime

Page 23: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

• Splinter only frames with deallocated pages

Mosaic: Data Deallocation

23

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

PageTable

Splinter Pages (reset the coalesced bit)2

Large Page Frame

Data

Hardware

GPU Runtime Application Deallocates Data 1

Page 24: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Mosaic: Compaction

24

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Hardware

GPU Runtime

• Key Task: Free up not-fully-used large page frames

• Splinter pages Break down a large page into small pages

• Compaction Combine fragmented large page frames

Page 25: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

• Compaction decreases memory bloat• Happens only when memory is highly fragmented

Mosaic: Compaction

25

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

PageTable

Compact Pages1

Large Page Frames

Data

Free large page

Hardware

GPU Runtime List of free pages2

Free large page

Page 26: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

• Once pages are compacted, they become non-coalesceable• No virtual contiguity

• Maximizes number of free large page frames

Mosaic: Compaction

26

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Hardware

GPU Runtime

Page 27: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Outline

27

• Background

• Key challenges and our goal

• Mosaic• Contiguity-Conserving Allocation

• In-Place Coalescer

• Contiguity-Aware Compaction

• Experimental evaluations

• Conclusions

Page 28: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Baseline: State-of-the-Art GPU Virtual Memory

28

GPU Core

Private TLB

GPU CoreGPU CoreGPU Core

Shared TLB

Private TLB

Page Table Walkers

Page Table(Main memory)

Private TLB Private TLB

Data (Main Memory)

CPU Memory

Private

Shared

CPU-side memory

GPU-side memory

Page 29: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Methodology

29

• GPGPU-Sim (MAFIA) modeling GTX750 Ti• 30 GPU cores

• Multiple GPGPU applications execute concurrently

• 64KB 4-way L1, 2048KB 16-way L2

• 64-entry L1 TLB, 1024-entry L2 TLB

• 8-entry large page L1 TLB, 64-entry large page L2 TLB

• 3GB main memory

• Model sequential page walks

• Model page tables and virtual-to-physical mapping

• CUDA-SDK, Rodinia, Parboil, LULESH, SHOC suites• 235 total workloads evaluated

• Available at: https://github.com/CMU-SAFARI/Mosaic

Page 30: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Comparison Points

30

• State-of-the-art CPU-GPU memory management • GPU-MMU based on [Power et al., HPCA’14]

• Upside: Utilizes parallel page walks, TLB request coalescing and page walk cache to improve performance

• Downside: Limited TLB reach

• Ideal TLB: Every TLB access is an L1 TLB hit

Page 31: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Performance

31

23.7%43.1%

31.5%

21.4%

Homogeneous Heterogeneous

39.0%33.8%

55.4%

61.5%

95.0%

0

1

2

3

4

5

6

7

1 2 3 4 5 2 3 4 5

We

igh

ted

Sp

ee

du

p

Number of Concurrently-Executing Applications

GPU-MMU Mosaic Ideal TLB

Mosaic consistently improves performance across a wide variety of workloads

Mosaic performs within 10% of the ideal TLB

Page 32: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Other Results in the Paper

32

• TLB hit rate• Mosaic achieves average TLB hit rate of 99%

• Per-application IPC• 97% of all applications perform faster

• Sensitivity to different TLB sizes• Mosaic is effective for various TLB configurations

• Memory fragmentation analysis• Mosaic reduces memory fragmentation and improves

performance regardless of the original fragmentation

• Performance with and without demand paging

Page 33: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Outline

33

• Background

• Key challenges and our goal

• Mosaic• Contiguity-Conserving Allocation

• In-Place Coalescer

• Contiguity-Aware Compaction

• Experimental evaluations

• Conclusions

Page 34: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Summary

34

• Problem: No single best page size for GPU virtual memory• Large pages: Better TLB reach• Small pages: Lower demand paging latency

• Our goal: Transparently enable both page sizes

• Key observations• Can easily coalesce an application’s contiguously-allocated small pages

into a large page• Interleaved memory allocation across applications breaks page contiguity

• Key idea: Preserve virtual address contiguity of small pages when allocating physical memory to simplify coalescing

• Mosaic is a hardware/software cooperative framework that:• Coalesces small pages into a large page without data movement• Enables the benefits of both small and large pages

• Key result: 55% average performance improvement overstate-of-the-art GPU memory management mechanism

Page 35: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Mosaic: A GPU Memory Manager with Application-Transparent Support

for Multiple Page Sizes

Rachata Ausavarungnirun Joshua Landgraf Vance Miller

Saugata Ghose Jayneel Gandhi

Christopher J. Rossbach Onur Mutlu

Page 36: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Backup Slides

Page 37: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Current Methods to Share GPUs

37

• Time sharing• Fine-grained context switching

• Coarse-grained context switching

• Spatial sharing• NVIDIA GRID

• Multi process service

Page 38: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Other Methods to Enforce Protection

38

• Segmented paging

• Static memory partitioning

Page 39: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

TLB Flush

39

• With Mosaic, the contents in the page tables are the same

• TLB flush in Mosaic occurs when page table content is modified• This invalidates content in the TLB Need to be flushed

• Both large and small page TLBs are flushed

Page 40: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Performance with Demand Paging

40

0.0

0.5

1.0

1.5

2.0

Homogeneous Heterogeneous

No

rmal

ize

dP

erf

orm

ance

GPU-MMU no Paging

GPU-MMU with Paging

Mosaic with Paging

Page 41: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

In-Place Coalescer: Coalescing

• Key assumption: Soft guarantee• Large page range always contains pages of the same application

41

Benefit: No data movement

L1 Page Table

Set Large Page Bit

Coalesce

L2 Page Table

Set Disabled Bit

Set Disabled Bit

Set Disabled Bit

Set Disabled Bit

PD PT POVA POQ: How to access large page base entry?

Page 42: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

In-Place Coalescer: Large Page Walk

• Large page index is available at leaf PTE

42

L1 Page Table

Set Large Page Bit

Coalesce

L2 Page Table

Set Disabled Bit

Set Disabled Bit

Set Disabled Bit

Set Disabled Bit

Page 43: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

0

1

2

3

4

5

Weig

hte

d S

peed

up

GPU-MMU Mosaic Ideal TLB

TLB-Friendly TLB-Sensitive

Sample Application Pairs

Page 44: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

0%

20%

40%

60%

80%

100%

1 App 2 Apps 3 Apps 4 Apps 5 Apps

TL

B H

it R

ate

Number of Concurrently-Executing ApplicationsGPU-MMU

Mosaic

L1 L2 L1 L2 L1 L2 L1 L2 L1 L2

TLB Hit Rate

Page 45: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

0.8

1.0

1.2

1.4

1.6

30% 50% 70% 90% 95% 97% 100%

No

rmalized

Perf

orm

an

ce

Fragmentation Index

no CAC CAC CAC-BC CAC-Ideal

Pre-Fragmenting DRAM

Page 46: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

0.8

1.0

1.2

1.4

1.6

No

rmalized

Perf

orm

an

ce

Large Page Frame Occupancy

no CAC CAC CAC-BC CAC-Ideal

Page Occupancy Experiment

Page 47: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Memory Bloat

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

1% 10% 25% 35% 50% 75%

Mem

ory

Blo

at

vs. G

PU

-MM

U

Page Occupancy

4KB Page GPU-MMU CAC

Page 48: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

0

1

2

3

4

5

6

7

8

0 10 20 30 40 50

No

rmali

zed

Perf

orm

an

ce

Sorted Application Number

GPU-MMU

Mosaic

Ideal-TLB

0

1

2

3

4

5

6

7

8

9

0 25 50 75

No

rmali

zed

Perf

orm

an

ce

Sorted Application Number

GPU-MMU

Mosaic

Ideal-TLB

0

1

2

3

4

5

6

7

8

0 25 50 75 100

No

rmali

zed

Perf

orm

an

ce

Sorted Application Number

GPU-MMU

Mosaic

Ideal-TLB

0

1

2

3

4

5

6

7

8

0 25 50 75 100 125

No

rmali

zed

Perf

orm

an

ce

Sorted Application Number

GPU-MMU

Mosaic

Ideal-TLB

Individual Application IPC

Page 49: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

0.8

0.9

1.0

1.1

1.2

1.3

1.4

64 128 256 512 1024 4096

No

rmalized

Perf

orm

an

ce

Shared L2 TLB Base Page Entries

GPU-MMU Mosaic

0.8

0.9

1.0

1.1

1.2

1.3

1.4

8 16 32 64 128 256

No

rmalized

Perf

orm

an

ce

Per-SM L1 TLB Base Page Entries

GPU-MMU Mosaic

0.8

0.9

1.0

1.1

1.2

1.3

1.4

32 64 128 256 512

No

rmalized

Pe

rfo

rman

ce

Shared L2 TLB Large Page Entries

GPU-MMU Mosaic

0.8

0.9

1.0

1.1

1.2

1.3

1.4

4 8 16 32 64

No

rmalized

Perf

orm

an

ce

Per-SM L1 TLB Large Page Entries

GPU-MMU Mosaic

Page 50: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Hardware

Mosaic: Putting Everything Together

50

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Application Demands Data

GPU Runtime

Allocate Memory

PageTable

Data

TransferDone

List of Large PagesCoalesce

Pages

Application Deallocate Data

SplinterPages

CompactPages

List of Free Pages

System I/O Bus Transfer Data

Page 51: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Hardware

Mosaic: Data Allocation

51

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

Application Demands Data

GPU Runtime

Allocate Memory

PageTable

Data

TransferDone

List of Large PagesCoalesce

Pages

System I/O Bus Transfer Data

Page 52: Mosaic: A GPU Memory Manager with Application-Transparent ... · Challenges with Multiple Page Sizes 8 Large Page Frame 1 Unallocated App 1 App 2 State-of-the-Art Time App 1 Allocation

Hardware

Mosaic: Data Deallocation

52

Contiguity-ConservingAllocation

In-PlaceCoalescer

Contiguity-AwareCompaction

GPU Runtime

PageTable

Data

Application Deallocate Data

SplinterPages

CompactPages

List of Free Pages