34
HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, Yangyang Zhang and Jianxin Li ACT lab, Beihang University 2013-11-7

HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Embed Size (px)

Citation preview

Page 1: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster

Presented by: Dr Jianxin Li

Lei Cui, Bo Li, Yangyang Zhang and Jianxin Li

ACT lab, Beihang University

2013-11-7

Page 2: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

2

Outline• Background

• Problems

• Solution & Implementation

• Experimental Results

• Conclusions

Page 3: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

requestCloud

VM ClusterEnd users

Background• Virtual Machine

– Isolation, encapsulation, multi‐instance

– Key technique supporting cloud computing

– Limited capacity in CPU, memory, storage

• Virtual Machine Cluster– Multiple VMs are connected together to support powerful capacity

– Scientific computing, distributed database, web service, etc

3

Page 4: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

4

Background• Failure becomes a norm nowadays

– Computer node, Annual failure rate(AFR)  is 20~60% per processor [J. Physics’07]

– Storage node, AFR is 2%~4%, some even 3.9%~8.3%  [OSDI’10]

– Network node, AFR is 1.1%~11.4% [SIGCOMM’11]

• VM Skills– Migration

• Tolerate computer node failure

– vLockstep• Tolerate computer node failure

– Snapshot• Tolerate computer node and software failure

Component Disk Node Rack

MTTF 10‐50years 4.3months 10.2years

snapshot

Google Cluster

Page 5: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

5

Background• Failure becomes a norm nowadays

– Computer node, Annual failure rate(AFR)  is 20~60% per processor [J. Physics’07]

– Storage node, AFR is 2%~4%, some even 3.9%~8.3%  [OSDI’10]

– Network node, AFR is 1.1%~11.4% [SIGCOMM’11]

• VM Skills– Migration

• Tolerate computer node failure

– vLockstep• Tolerate computer node failure

– Snapshot• Tolerate computer node and software failure

Component Disk Node Rack

MTTF 10‐50years 4.3months 10.2years

snapshot

VMC snapshot and rollback occurs frequently to survive from the failures to complete the long time task running in VMC

Google Cluster

Page 6: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

6

Outline• Background

• Problems

• Solution & Implementation

• Experimental Results

• Conclusions

Page 7: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Problems• VMC snapshot

– Single VM snapshot• Save memory, disk, CPU and other devices’ state.

– Consistency protocol• Global virtual time

• m3 is sent from p2  after t1 to p3 before t1, violating consistency

• m3  should be dropped to keep consistency, but this will lead to TCP‐backoff 7

P1

P2

P3

m1

m3

m4

inconsistency

m2

t1

Page 8: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Problems• Analysis

– Current snapshot skill• Stop‐and‐copy

• Pre‐copy

8

runningpre-copy

iterate iterate

stop running

final copy

Memory snapshot Disk snapshot

Pre-copy based single VM snapshot

TCP‐backoff duration is related to downtime and difference between VMs’ snapshot completion times

Pre-copy based VMC snapshot

VM1 VM2

duration

TCP backoff

1

32

Page 9: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Problems• Experimental result, a sample

– 16 2G memory VMs. Distcc to compile the Linux kernel 2.6.32‐5

– VM9 is Distcc client, TCP‐backoff of VM9  and VM1 is 12.7s

9Details of Pre-copy based VMC snapshot

Page 10: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Problems• Experimental result, a sample

– 16 2G memory VMs. Distcc to compile the Linux kernel 2.6.32‐5

– VM9 is Distcc client, TCP‐backoff of VM9  and VM1 is 12.7s

10Details of Pre-copy based VMC snapshot

Minimize the downtime of single VM snapshot

Minimize the difference of snapshot completion times between communicating VMs

Page 11: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

11

Outline• Background

• Problems

• Solution & Implementation

• Experimental Results

• Conclusions

Page 12: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

• Key Issues in Pre‐copy method– The downtime of single VM snapshot

• Index tree in disk snapshot, several seconds

• Final copy in memory snapshot, hundreds to thousands  milliseconds

– The difference of snapshot completion times• Iterate copy the memory state. Workload impact.

• Available IO bandwidth to save memory state

• HotSnap solutions– Single VM snapshot

• Copy‐on‐write(COW) based memory snapshot

• Redirect‐on‐write (ROW) based disk snapshot

– Consistency protocol• Suitable to virtualized environments, keep global consistency

12

Solutions

12

Page 13: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Pre-copy vs. HotSnap

• Stop VM at the end

• Iterate copy the memory state

• Copy‐on‐write disk snapshot

• Longer downtime and duration

13

runningpre-copy

iterate iterate

stop running

Memory snapshot Disk snapshot

Pre-copy based single VM snapshot

• Stop VM first

• Copy‐on‐write memory snapshot

• Redirect‐on‐write disk snapshot

• Short downtime and duration

running stop

running

Memory snapshotDisk snapshot

memory

HotSnap for single VM

Pre-copy method HotSnap method

Page 14: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

• HotSnap analysis– Stop VM first, record some metadata, write-protect the memory,

resume the VM, and save the state during execution– Light-weight operations

14

Solutions

14

TCP‐backoff duration is mainly related to downtime

HotSnap based VMC snapshot

VM1 VM2

duration

TCP backoff

Page 15: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills – memory snapshot• COW based memory snapshot

– Only involve saving CPU, network, and devices’ state, and write-protecting the memory

– Intercept DMA write operations– Uniform view for bitmap and snapshot file

15

Snapshot Manager

Write Protect Controller

Write Protect

DMA Interceptor

Background Thread

Page Fault Trigger

Bitmap Snapshot file

KVM kernel space Guest Mode

Guest Write

DMA Write 

15

Guest Handler

Guestmemory

QEMU user space

Page 16: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills – memory snapshot• COW based memory snapshot

– Only involve saving CPU, network, and devices’ state, and write-protecting the memory

– Intercept DMA write operations– Uniform view for bitmap and snapshot file

16

Snapshot Manager

Write Protect Controller

Write Protect

DMA Interceptor

Background Thread

Page Fault Trigger

Bitmap Snapshot file

KVM kernel space Guest Mode

Guest Write

DMA Write 

16

Guest Handler

Guestmemory

QEMU user space

Page 17: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills – memory snapshot• COW based memory snapshot

– Only involve saving CPU, network, and devices’ state, and write-protecting the memory

– Intercept DMA write operations– Uniform view for bitmap and snapshot file

17

Snapshot Manager

Write Protect Controller

Write Protect

DMA Interceptor

Background Thread

Page Fault Trigger

Bitmap Snapshot filecheck save

QEMU user space KVM kernel space Guest Mode

Guest Write

DMA Write 

17

Guest Handler

Guestmemory

Page 18: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills – memory snapshot• COW based memory snapshot

– Only involve saving CPU, network, and devices’ state, and write-protecting the memory

– Intercept DMA write operations– Uniform view for bitmap and snapshot file

18

Snapshot Manager

Write Protect Controller

Write Protect

DMA Interceptor

Background Thread

Page Fault Trigger

Bitmap Snapshot filecheck save

QEMU user space KVM kernel space Guest Mode

Guest Write

DMA Write 

18

Guest Handler

Guestmemory

Page 19: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills – memory snapshot• COW based memory snapshot

– Only involve saving CPU, network, and devices’ state, and write-protecting the memory

– Intercept DMA write operations– Uniform view for bitmap and snapshot file

19

Snapshot Manager

Write Protect Controller

Write Protect

DMA Interceptor

Background Thread

Page Fault Trigger

Bitmap Snapshot filecheck save

QEMU user space KVM kernel space Guest Mode

Guest Write

DMA Write 

19

Guest Handler

Guestmemory

Page 20: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills – disk snapshot• ROW based disk snapshot

– Only create one bitmap and one null disk image– Lightweight metadata based on bitmap– Redirect on write

2020

Basic info snapshot1

Current status

Page 21: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills – disk snapshot• ROW based disk snapshot

– Only create one bitmap and one null disk image– Lightweight metadata based on bitmap– Redirect on write

2121

Basic info snapshot1

Current status

snapshot2

checkpoint

Page 22: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills – disk snapshot• ROW based disk snapshot

– Only create one bitmap and one null disk image– Lightweight metadata based on bitmap– Redirect on write

2222

Basic info snapshot1

checkpoint Current status

snapshot2

write

Page 23: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap skills - Consistency protocol

2323

HotSnap based VMC snapshot

VMInitiator

VM peerVMC VM 

peer

running stop running, but snapshoting

Broadcast SNAPSHOT

Recv red packet

send red packet

Peer stop over

stop over

create

finishPeer snapshot over

– Coloring method• Set bit in m_cType in MAC header

– Protocol• One VM as initiator

• Initiator broadcast SNAPSHOT to  VMs

• Peer VM create snapshot, if 

• Receive SNAPSHOT 

• Receive red packet

• VM colors the packet with red flag after finishing snapshot

• After receiving red packet – If snapshot is over, continue to run

– Else, create snapshot first, change VM state to red

Page 24: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

HotSnap Architecture• Architecture

– VM snapshot• Memory Snapshot

– DMA Write Handler– Background Thread– Guest Write Handler

• Disk Snapshot

– Consistency protocol• VMC Snapshot Manager• VM Snapshot Manager• Packet Mediator

2424

Page 25: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

25

Outline• Background

• Problems

• Solution & Implementation

• Experimental Results

• Conclusions

Page 26: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Experimental results• VM snapshot (2G memory)

– Duration: 60s in HotSnap VS. 50s in Pre-copy, related to snapshot size– Downtime: HotSnap is less than50ms, pre-copy is related to workload– Snapshot size: HotSnap snapshot size is same to memory size,Pre-copy is

much larger.

26

Duration (s) Downtime(ms) Snapshot size(GBytes)

benchmarks Pre‐copy Hotsnap Pre‐copy Hotsnap Pre‐copy Hotsnap

Idle 51.66 51.57 36.83 31.88 2.04 2.02

Compilzation 61.11 51.96 381.72 34.16 2.38 2.02

Matrixmultiplication

51.75 52.31 55.73 35.93 2.19 2.02

Memcached 69.43 54.72 150.85 33.80 2.41 2.02

Dbench 60.76 50.18 79.36 39.36 2.17 2.02

26

Page 27: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Experimental results• VMC snapshot results

– 16 2G VMs, 4 physical servers, Distcc to compile kernel– Start and end almost at the same time

2727Pre-copy based VMC snapshot HotSnap for VMC

Page 28: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

• TCP backoff– Average of difference of snapshot completion times between each two VMs

– 16 2G memory VMs, 4 physical servers

Experimental results

Different workloads Various VMC size

Various memory size Various disk size 28

Page 29: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

• TCP backoff– Average of difference of snapshot completion times between each two VMs

– 16 2G memory VMs, 4 physical servers

Experimental results

Different workloads Various VMC size

Various memory size Various disk size

TCP‐backoff duration in HotSnap is about 100‐200ms, and is regardless of workload, VMC size, VM configurations

29

Page 30: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

• Two VMs with different memory size– 1G & 4G has the largest TCP-backoff, 100s– Larger difference implies longer TCP-backoff duration for Pre-copy

method. HotSnap is regardless of difference.

Experimental results

Memory impact on TCP-backoff30

Page 31: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

• Performance impact– Kernel compilization

• Compared to pre-copy, HotSnap reduces 7%-10% time.

– BitTorrent application• 16 VMs, one VM as client, others as seeds• Compared to normal execution, download speed

reduce 28%• Compared to pre-copy, HotSnap shows better

performance when snapshot reaches over

Experimental results

Kernel complication

BT download speed 31

Page 32: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

32

Outline• Background

• Problems

• Solution & Implementation

• Experimental Results

• Conclusions

Page 33: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Conclusions• HotSnap

– Single VM snapshot• Minimize the downtime

• Minimize the difference of snapshot completion times.

– Consistency protocol suitable to virtualized environments

• Experimental results– Single VM snapshot downtime < 100ms– 32 VMs, TCP-backoff duration < 1s– TCP-backoff is regardless of workload, VMC size, VM configurations

• Future work– Evaluate HotSnap in real-world applications– Reduce the saved amount of VNC snapshot file further

33

Page 34: HotSnap: A Hot Distributed Snapshot System for Virtual ... · HotSnap: A Hot Distributed Snapshot System for Virtual Machine Cluster Presented by: Dr Jianxin Li Lei Cui, Bo Li, YangyangZhang

Q&A

ACT lab, Beihang University{cuilei, libo, zhangyy, lijx}@act.buaa.edu.cn