Efficient Memory Disaggregation with Infiniswap

Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, Kang G. Shin

Agenda

• Motivation and related work

• Design and system overview

• Implementation and evaluation

• Future work and conclusion

3/30/17 1

Memory-intensive applications

Performance degradation

[Figure: normalized performance of four applications with 100%, 75%, and 50% of their working sets in memory. Bar labels, in the order the applications appear on the chart:]

Application                   75% in memory   50% in memory
VoltDB (TPC-C)                0.18            0.04
Memcached (Facebook/FBSYS)    0.47            0.06
PowerGraph (TunkRank)         0.94            0.12
GraphX (PageRank)             0.97            0.04

Memory overestimation

Memory underutilization

• Google Cluster Analysis[1]

[Figure: portion of memory allocated vs. used over time (days). Chart labels: 0.8, 0.5, ≈30%.]

How to utilize ABU memory?

Can we utilize this memory?

[1] Reiss, Charles, et al. "Heterogeneity and dynamicity of clouds at scale: Google trace analysis." SoCC’12.

Disaggregate free memory

[Figure: Machines 1 to N, each with used memory and free memory. A Memory Disaggregation Layer spans the machines and turns free memory into remote memory.]

What are the challenges?

• Minimize deployment overhead
  • No hardware design
  • No application modification

• Tolerate failures
  • e.g. network disconnection, machine crash

• Manage remote memory at scale

Recent work on memory disaggregation

Compared on: no HW design, no app modification, fault-tolerance, scalability

• Memory Blade [ISCA’09]

• HPBD [CLUSTER’05] / NBDX[1]

• RDMA key-value service (e.g. HERD [SIGCOMM’14], FaRM [NSDI’14])

• Intel Rack Scale Architecture (RSA)[2]

• Infiniswap

[1] https://github.com/accelio/NBDX
[2] http://www.intel.com/content/www/us/en/architecture-and-technology/rack-scale-design-overview.html

Agenda

• Motivation and related work

• Design and system overview

• Implementation and evaluation

• Future work and conclusion

System Overview

[Figure: Machine 1 runs Application 1 and Application 2 in user space. In kernel space, the Virtual Memory Manager (VMM) sits above the Infiniswap Block Device, which connects to the local disk (async) and to the RNIC (sync). Machine 2 runs an application and the Infiniswap Daemon in user space, reached through its RNIC.]

• Infiniswap Block Device
  • Swap space
  • Request router

• Local disk
  • [ASYNC] backup swapped-out data
  • Tolerate remote memory failure

• Infiniswap Daemon
  • Local memory region
  • Remote memory service

• RDMA
  • One-sided operations
  • Bypass remote CPU
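The write and read paths in the overview can be sketched as a small in-memory model. This is an illustration under stated assumptions, not Infiniswap's kernel code: `SwapDevice`, `remote`, `disk`, and `remote_alive` are hypothetical names, a dict stands in for remote memory reached over RDMA, and a background thread stands in for the asynchronous disk backup.

```python
import queue
import threading


class SwapDevice:
    """Toy model: synchronous write to remote memory, asynchronous backup to disk."""

    def __init__(self):
        self.remote = {}            # stands in for remote memory (assumption)
        self.disk = {}              # stands in for the local backup disk
        self.remote_alive = True    # set to False to simulate a remote failure
        self._backlog = queue.Queue()
        worker = threading.Thread(target=self._flush_to_disk, daemon=True)
        worker.start()

    def _flush_to_disk(self):
        # Background thread: drains queued pages to the backup disk.
        while True:
            page_id, data = self._backlog.get()
            self.disk[page_id] = data
            self._backlog.task_done()

    def write_page(self, page_id, data):
        # Sync path: the page is in remote memory before write_page returns.
        if self.remote_alive:
            self.remote[page_id] = data
        # Async path: the disk copy is only queued here and written later.
        self._backlog.put((page_id, data))

    def read_page(self, page_id):
        # Prefer remote memory; fall back to the disk backup on failure.
        if self.remote_alive and page_id in self.remote:
            return self.remote[page_id]
        self._backlog.join()        # wait for pending backups before reading disk
        return self.disk[page_id]


dev = SwapDevice()
dev.write_page(100, b"page-100")
from_remote = dev.read_page(100)
dev.remote_alive = False            # simulate a remote machine crash
from_disk = dev.read_page(100)
```

In this model the same page is returned before and after the simulated crash, which is the role the slide assigns to the local disk: it is off the critical path for writes but still covers remote memory failures.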

How to meet the design objectives?

Objectives                     Ideas
No hardware design             Remote paging
No application modification    Remote paging
Fault-tolerance                Local backup disk
Scalability                    Decentralized remote memory management

One-to-many

[Figure: the Infiniswap Block Device on Machine 1 connects through its RNIC to the Infiniswap Daemons on Machines 2 and 3 (sync) and backs up to its local disk (async).]

Many-to-many

[Figure: Infiniswap Block Devices on Machines 1 and 4, each with its own local disk (async), both connect through their RNICs to the Infiniswap Daemons on Machines 2 and 3 (sync).]

How to scale remote memory?

• How to find remote memory in the cluster?

• Which remote mapping should be evicted?


Management unit: memory page?

[Figure: one Infiniswap Block Device mapping individual pages to three Infiniswap Daemons.]

Local Page    Remote Page
p100          <s1, p1>

1GB = 256K entries

1GB = 256K RTTs

Management unit: memory slab!

[Figure: one Infiniswap Block Device mapping slabs to three Infiniswap Daemons.]
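The "1GB = 256K entries" figure for page-granularity mapping can be checked with a few lines of arithmetic. The sketch below assumes 4 KB pages, which is what makes 1 GB come out to 256K entries. The slab size is a hypothetical value chosen only for illustration, since the slides do not state one.

```python
GIB = 1 << 30
PAGE_SIZE = 4 * 1024      # assumption: 4 KB pages
SLAB_SIZE = GIB           # hypothetical slab size, for illustration only


def mapping_entries(total_bytes, unit_bytes):
    """Number of mapping entries needed to cover total_bytes at a given unit size."""
    return -(-total_bytes // unit_bytes)   # ceiling division


page_entries = mapping_entries(GIB, PAGE_SIZE)
slab_entries = mapping_entries(GIB, SLAB_SIZE)
print(page_entries, slab_entries)
```

With these values, page-level management needs 262,144 (256K) entries per gigabyte, while slab-level management needs one entry per slab. The same ratio applies to the number of round trips needed to set up the mappings.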

Which remote machine should be selected?

Goal: balance memory utilization

➢ Central controller

➢ Decentralized approach

[Figure: an Infiniswap Block Device choosing among three Infiniswap Daemons.]

Power of two choices[1]

[Figure: an Infiniswap Block Device and three Infiniswap Daemons.]

[1] Mitzenmacher, Michael. "The power of two choices in randomized load balancing." Ph.D. thesis, U.C. Berkeley, 1996.
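As a rough illustration of the power-of-two-choices idea, the sketch below places slabs by sampling two machines at random and picking the one with more free memory. It is a minimal sketch, not Infiniswap's placement code: `pick_machine`, `place_slabs`, the machine names, and the free-memory figures are all hypothetical, and "more free memory" is an assumed stand-in for the balancing criterion.

```python
import random


def pick_machine(free_memory, rng=random):
    """Sample two distinct machines at random and return the one with more free memory."""
    a, b = rng.sample(list(free_memory), 2)
    return a if free_memory[a] >= free_memory[b] else b


def place_slabs(free_memory, num_slabs, slab_gb=1, rng=random):
    """Place slabs one at a time, updating free memory after each placement."""
    free = dict(free_memory)
    placement = []
    for _ in range(num_slabs):
        target = pick_machine(free, rng)
        free[target] -= slab_gb
        placement.append(target)
    return free, placement


rng = random.Random(0)
cluster = {f"machine{i}": 32 for i in range(2, 10)}   # hypothetical free GB per machine
free_after, placement = place_slabs(cluster, 64, rng=rng)
spread = max(free_after.values()) - min(free_after.values())
print(free_after, spread)
```

Each placement needs state from only two machines, so no central controller has to track the whole cluster. The printed `spread` gives a quick view of how evenly the remaining free memory ends up distributed.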

Slab eviction

[Figure: an Infiniswap Daemon with slabs 1 to 4, showing remote memory vs. used memory and mapped vs. unmapped slabs.]

Which slab should be evicted?

Daemon: Does not know the swap activities

Daemon: Too expensive to query all the slabs

[Figure: an Infiniswap Daemon with slabs 1 to 4.]

Power of multiple choices[1]

Select E least-active slabs from E+E’ random slabs

[Figure: an Infiniswap Daemon with slabs 1 to 4; after eviction, slabs 1, 2, and 4 remain.]

[1] Park, Gahyun. "A generalization of multiple choice balls-into-bins." PODC’11.
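The selection rule "E least-active slabs from E+E’ random slabs" can be sketched as follows. This is an illustrative model under assumptions: `choose_evictions` and the activity numbers are hypothetical, and a dict lookup stands in for querying a slab's owner about its activity, which is the expensive step the sampling is meant to limit.

```python
import random


def choose_evictions(activity, num_evict, extra, rng=random):
    """Sample num_evict + extra slabs at random and return the num_evict least active."""
    sample_size = min(len(activity), num_evict + extra)
    candidates = rng.sample(list(activity), sample_size)
    candidates.sort(key=lambda slab: activity[slab])
    return candidates[:num_evict]


# Hypothetical per-slab activity (e.g. recent page requests), keyed by slab id.
activity = {1: 120, 2: 75, 3: 4, 4: 310}
evicted = choose_evictions(activity, num_evict=1, extra=3, rng=random.Random(1))
remaining = sorted(set(activity) - set(evicted))
print(evicted, remaining)
```

With E = 1 and E’ = 3 the sample covers all four slabs, so the least-active slab (slab 3 in this made-up data) is evicted and slabs 1, 2, and 4 remain. With a smaller E’ the daemon queries fewer slabs and accepts a less precise choice.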

Agenda

• Motivation and related work

• Design and system overview

• Implementation and evaluation

• Future work and conclusion

Implementation

• Connection Management
  • One RDMA connection per active block device - daemon pair

• Control Plane
  • SEND, RECV

• Data Plane
  • One-sided RDMA READ, WRITE

[Figure: the Infiniswap Block Device (kernel space) connected to the Infiniswap Daemon (user space) over RDMA.]

What are we expecting from Infiniswap?

• Application performance

• Cluster memory utilization

• Network usage

• Eviction overhead

• Fault-tolerance overhead

• Performance as a block device

Evaluation

• 32-node cluster

• Each machine: 2 x 8 cores (32 vcores), 64GB DRAM, 56Gbps InfiniBand NIC

• InfiniBand network

Application performance

• 50% working sets in memory

• Application performance is improved by 2-16x

[Figure: normalized performance with 100% of working sets in memory, with disk + 50% of working sets in memory, and with Infiniswap + 50% of working sets in memory. Bar labels, in the order the applications appear on the chart:]

Application                   Disk + 50%   Infiniswap + 50%
VoltDB (TPC-C)                0.04         0.66
Memcached (Facebook/FBSYS)    0.06         0.77
PowerGraph (TunkRank)         0.12         0.61
GraphX (PageRank)             0.04         0.08
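The "2-16x" improvement can be reproduced from the bar labels on the chart. The sketch below assumes the labels map to the applications in the order they are listed (VoltDB, Memcached, PowerGraph, GraphX); the dictionary names are illustrative.

```python
# Normalized performance read off the chart (1.0 = 100% of the working set in memory).
disk = {"VoltDB": 0.04, "Memcached": 0.06, "PowerGraph": 0.12, "GraphX": 0.04}
infiniswap = {"VoltDB": 0.66, "Memcached": 0.77, "PowerGraph": 0.61, "GraphX": 0.08}

# Improvement factor of Infiniswap over disk-backed swap at 50% working set in memory.
speedup = {app: infiniswap[app] / disk[app] for app in disk}
for app, factor in speedup.items():
    print(f"{app}: {factor:.1f}x")
```

Under that mapping the smallest factor is 2x (GraphX) and the largest is 16.5x (VoltDB), which matches the range quoted on the slide.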

Cluster memory utilization

• 90 containers (applications), mixing all applications and memory constraints.

• Cluster memory utilization is improved from 40.8% to 60% (1.47x)

[Figure: memory utilization (%) by rank of machines, with Infiniswap vs. w/o Infiniswap.]

Agenda

• Motivation and related work

• Design and system overview

• Implementation and evaluation

• Future work and conclusion

Limitations and future work

• Trade-off in fault-tolerance
  • Local disk is the bottleneck
  • Multiple remote replicas
  • Fault-tolerance vs. space-efficiency

• Performance isolation among applications
  • W/o limitation on each application’s usage
  • W/o mapping between remote memory and applications

Conclusion

• Infiniswap: remote paging over RDMA
  • Application performance
  • Cluster memory utilization

• Efficient, practical memory disaggregation
  • No hardware design
  • No application modification
  • Fault-tolerance
  • Scalability

Source code is coming soon!
https://github.com/Infiniswap/infiniswap.git

Thank You!