Evaluation and Enhancement to Memory Sharing and Swapping in Xen 4.1

Xiaowei Yang, Chuan Ye, Qiangmin Lin

{xiaowei.yang, yechuan, linqiangmin}@huawei.com

HUAWEI

Agenda

• Background

• Memory Overcommit Features / Policy

• Evaluation

HUAWEI TECHNOLOGIES CO., LTD. Huawei Confidential

Background

• Challenges
  • Memory becomes the bottleneck as VM# grows
    • E.g., VM# > 100 in the VDI scenario on a 2-socket server
  • What's the proper vMem size?
    • vMem too small: bad performance
    • vMem too big: low utilization
• Memory overcommit is a key factor for high VM density / memory utilization
• ESX / KVM have rich memory overcommit features / policies
  • ESX: Balloon, TPS, host swap, compression
  • KVM: Balloon, KSM, host swap
• Xen has had sharing / swapping since 4.0, but
  • Untested in production
  • Needs improvements in efficiency and performance
  • No policies

Goals

• Evaluate the current memory overcommit features in Xen 4.1, particularly memory sharing / swapping
• Make enhancements to the current memory sharing / swapping features
• Design a memory overcommit policy to reach higher VM density w/o sacrificing performance

Blktap2 Sharing

How it works: relies on linked clones of a parent image

Share
• 1st read: tapdisk2 reads from the parent image on disk and records the read page (p1)
• Later reads: tapdisk2 finds the record and notifies the Xen HV to share the page with p1

Unshare
• A write to the RO shared page triggers unshare
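
The share / unshare flow above can be modeled in a few lines. This is a toy, single-process Python sketch, not tapdisk2 or hypervisor code: `BlktapSharing`, `read_block`, and `write` are hypothetical names, the `parent_image` dictionary models the on-disk parent image, `read_index` stands in for the global hash table that tapdisk2 processes keep in shared memory, and a dictionary plays the role of the p2m table.

```python
class BlktapSharing:
    """Toy model of blktap2 read-based sharing across VMs cloned from one parent image."""

    def __init__(self, parent_image):
        self.parent_image = parent_image  # block number -> bytes on the parent disk
        self.read_index = {}   # block -> frame recorded on the 1st read (hash table stand-in)
        self.frames = {}       # frame id -> page contents
        self.p2m = {}          # (vm, gfn) -> frame id
        self.readonly = set()  # (vm, gfn) entries mapped RO onto a recorded frame

    def _alloc(self, data):
        fid = len(self.frames)  # frames are never freed in this toy model
        self.frames[fid] = data
        return fid

    def read_block(self, vm, gfn, block):
        if block not in self.read_index:
            # 1st read: fetch from the parent image on disk and record the page (p1).
            self.read_index[block] = self._alloc(self.parent_image[block])
        # Map the guest page RO onto the recorded frame; on later reads this
        # shares p1 instead of going to disk.
        self.p2m[(vm, gfn)] = self.read_index[block]
        self.readonly.add((vm, gfn))
        return self.frames[self.p2m[(vm, gfn)]]

    def write(self, vm, gfn, data):
        key = (vm, gfn)
        if key in self.readonly or key not in self.p2m:
            # A write to an RO shared page triggers unshare: remap to a private frame.
            self.readonly.discard(key)
            self.p2m[key] = self._alloc(data)
        else:
            self.frames[self.p2m[key]] = data
```

In this model every page populated from the parent image is mapped RO, so any write (even by the first reader) takes a private frame, and the recorded frame always stays identical to the on-disk block that later readers expect.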

Blktap2 Sharing (2)

• Pros
  • Overhead is lower than content-based page sharing
    • No need to calculate / compare page contents in the background
  • Relieves the VM bootup storm issue
    • Later reads are served from the memory cache
• Cons
  • Sharing% is much lower compared to content-based sharing
    • Only VMs with the same parent image can share pages
  • Relies on the PV driver
    • No sharing before the PV driver is loaded (i.e., during startup)
  • Page sharing # is limited by shm #
    • The shm between tapdisk2 processes stores the global hash table

0-page Sharing

How it works

Share
• zerosharing triggers the Xen HV to scan the VM's page contents periodically
• If the Xen HV finds that a page's content is all 0, it frees the original page and points the VM's corresponding p2m entry to a special RO 0-page

Unshare
• A write to the RO 0-page triggers unshare

* Red words are our enhancements
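
The periodic scan and copy-on-write unshare described above can be sketched as follows. This is a toy Python model under stated assumptions, not the hypervisor implementation: `ZeroPageSharing`, `scan`, and `write` are hypothetical names, `ZERO_PAGE` is a sentinel standing in for Xen's special RO 0-page, and a 4 KiB page size is assumed. Each `scan(batch)` call models one periodic pass over `batch` pages, resuming where the previous pass stopped.

```python
PAGE_SIZE = 4096        # assumed 4 KiB guest pages
ZERO_PAGE = object()    # stand-in for Xen's special read-only 0-page


class ZeroPageSharing:
    """Toy model of periodic 0-page scanning over one VM's p2m (gfn -> page)."""

    def __init__(self, pages):
        self.p2m = {gfn: bytearray(data) for gfn, data in pages.items()}
        self.cursor = 0  # position where the next periodic scan batch resumes

    def scan(self, batch):
        """Scan up to `batch` pages; return how many new 0-pages were shared."""
        gfns = sorted(self.p2m)
        shared = 0
        for _ in range(min(batch, len(gfns))):
            gfn = gfns[self.cursor % len(gfns)]
            self.cursor += 1
            page = self.p2m[gfn]
            if page is not ZERO_PAGE and not any(page):
                # All-0 content: drop the private page, point p2m at the RO 0-page.
                self.p2m[gfn] = ZERO_PAGE
                shared += 1
        return shared

    def write(self, gfn, offset, data):
        if self.p2m[gfn] is ZERO_PAGE:
            # A write to the RO 0-page triggers unshare: allocate a fresh zeroed page.
            self.p2m[gfn] = bytearray(PAGE_SIZE)
        self.p2m[gfn][offset:offset + len(data)] = data
```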

0-page Sharing (2)

• Per our evaluation, 0-page sharing is the most valuable part of content-based page sharing
• 0-page sharing is more useful to Windows VMs than to Linux VMs
  • All free memory is 0-page in Windows: it scrubs pages before freeing them
• A proper scan rate for 0-page sharing is important
  • Too slow: can't relieve memory pressure in time
  • Too fast: high CPU util%
• 0-page sweep is actually already used in PoD (populate-on-demand)
  • Usage is limited: only when the PoD cache is under pressure
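
The scan-rate trade-off can be made concrete with simple arithmetic: the time for one full pass over a VM is its page count divided by the pages scanned per second. The sketch assumes 4 KiB pages and models the rate as (page # per batch) × (batches per second), mirroring the "(frequency, page #)" form of the scan-rate option; the 4096-pages-every-100-ms setting is a hypothetical value, not one from the evaluation.

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages
GiB = 1 << 30


def full_pass_seconds(vmem_bytes, pages_per_batch, batches_per_second):
    """Time for a periodic 0-page scan to visit every page of a VM once."""
    pages = vmem_bytes // PAGE_SIZE
    return pages / (pages_per_batch * batches_per_second)


# Hypothetical setting: 4096 pages every 100 ms for a 1GB VDI VM.
print(full_pass_seconds(1 * GiB, 4096, 10))
```

A 1GB VM has 262,144 such pages, so this setting needs about 6.4 s per pass; doubling the frequency halves the time to relieve pressure but roughly doubles the scanning CPU cost.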

Host Swap

How it works

Swap out
• xenpaging selects guest pages through a policy
• xenpaging saves their contents to disk, then notifies the Xen HV to free them

Swap in
• Guest access to a swapped-out page triggers a violation
• The Xen HV notifies xenpaging to read the contents back into a newly allocated page
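
The swap-out / swap-in cycle above reduces to a small state machine per page. The sketch below is a toy Python model, not xenpaging itself: `HostSwap`, `swap_out`, and `access` are hypothetical names, a dictionary stands in for the on-disk swap file, and the "violation" plus notification round trip is collapsed into a single branch in `access`.

```python
class HostSwap:
    """Toy model of xenpaging-style host swap for one VM."""

    def __init__(self, pages):
        self.p2m = dict(pages)   # gfn -> contents of resident pages
        self.swapfile = {}       # gfn -> contents saved to disk
        self.swap_ins = 0

    def swap_out(self, gfns):
        """Evict pages already chosen by a selection policy."""
        for gfn in gfns:
            # Save the contents to "disk" first, then free the guest page.
            self.swapfile[gfn] = self.p2m.pop(gfn)

    def access(self, gfn):
        """Guest access; touching a swapped-out page models the violation path."""
        if gfn not in self.p2m:
            # Read the contents back into a newly allocated page.
            self.p2m[gfn] = self.swapfile.pop(gfn)
            self.swap_ins += 1
        return self.p2m[gfn]
```

Every `access` to an evicted page costs a disk read in the real system, which is why the page selection policy matters so much.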

Host Swap (2)

• Pros
  • The only memory overcommit feature that guarantees the overcommit ratio
• Cons
  • Page selection is hard
    • Which page is proper to swap out?
  • Inefficient
    • Disk access latency is much higher
  • Double swap scenario (the guest OS swaps out a page that the host has already swapped out, so the host must first swap it back in)

Host Swap Policies

• Basics
  • Random
  • Sequential
• Improvements
  • Skip low memory, which is used for the BIOS and kernel image
  • MRU: prevent recently swapped-in pages from being swapped out
  • Aggressive MRU: prevent X contiguous pages adjacent to each MRU page from being swapped out
  • Swap->sharing: if the selected page is a 0-page, share it instead of swapping it
• Advanced
  • Based on statistics of guest OS page usage in the HV
  • Based on guest OS MM knowledge (enlightened)
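
The basic and improved policies above compose naturally into one victim selector. This is a hypothetical Python sketch, not the authors' code: `select_victims` and its parameters are illustrative names, `low_mem_limit=256` (1 MiB of 4 KiB pages) is an assumed cut-off for the BIOS / kernel-image region, and reading "X contiguous pages adjacent to each MRU page" as x pages on each side is only one possible interpretation.

```python
import random


def select_victims(gfns, n, is_zero, mru=(), x=0, low_mem_limit=256,
                   randomize=True, seed=None):
    """Choose up to n victim pages; return (to_swap, to_share).

    gfns: candidate guest frame numbers; is_zero(gfn) reports an all-0 page.
    low_mem_limit: hypothetical cut-off; gfns below it (BIOS, kernel image) are skipped.
    mru: recently swapped-in gfns; x: aggressive-MRU neighbourhood size.
    """
    protected = set()
    for m in mru:
        # MRU keeps the page itself; aggressive MRU (x > 0) also keeps the
        # x pages on each side of it.
        protected.update(range(m - x, m + x + 1))
    eligible = [g for g in gfns if g >= low_mem_limit and g not in protected]
    if randomize:
        random.Random(seed).shuffle(eligible)  # "Random" basic policy
    # With randomize=False the caller's order is kept ("Sequential" basic policy).
    to_swap, to_share = [], []
    for g in eligible[:n]:
        # Swap->sharing: a selected 0-page is shared instead of written to disk.
        (to_share if is_zero(g) else to_swap).append(g)
    return to_swap, to_share
```

Returning 0-pages separately lets the caller share them without any disk I/O, which is the intuition behind swap->sharing performing best in the evaluation.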

Memory Overcommit Policy

• Metrics
  • Host Free Memory #
  • VM Free Memory #
  • VM Maximum Memory
  • VM Reserved Memory
  • VM Current Zero Page #
  • VM Current Balloon Page #
  • VM Current Swap Page #
  • VM Current Sharing Page #
• Configuration options
  • Sharing threshold (default: 20%)
  • Balloon threshold (default: 10%)
  • Swap threshold (default: 5%)
  • 0-page sharing scan rate (frequency, page #)
  • …

Memory Overcommit Policy (2)

• No memory pressure (used% < 80%)
  • Turn off memory overcommit
• Host memory pressure is moderate (used% > 80%)
  • Turn on blktap2 / 0-page sharing
  • Set the 0-page sharing scan rate
• Host memory pressure is severe (used% > 90%)
  • Adjust the 0-page sharing scan rate
  • Start ballooning; balloon # is determined by VM metrics
• Host memory pressure is critical (used% > 95%)
  • Ballooning can't keep up with memory consumption
  • Start swapping
• When the pressure goes down, return memory to the VMs
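
The pressure levels above can be expressed as a small decision function. The sketch is hypothetical (`overcommit_plan` is an illustrative name) and rests on two assumptions: that the default sharing / balloon / swap thresholds (20% / 10% / 5%) are free-memory percentages, which lines up with the 80% / 90% / 95% used% boundaries, and that each level keeps the mechanisms enabled at the lower levels.

```python
def overcommit_plan(used_pct, sharing_threshold=20.0, balloon_threshold=10.0,
                    swap_threshold=5.0):
    """Return (pressure level, enabled mechanisms) for a host memory used%.

    Thresholds are read as host free-memory percentages (an assumption):
    20/10/5% free correspond to the 80/90/95% used boundaries.
    """
    free_pct = 100.0 - used_pct
    level, enabled = "none", []          # no pressure: overcommit off
    if free_pct < sharing_threshold:
        level, enabled = "moderate", ["blktap2/0-page sharing"]
    if free_pct < balloon_threshold:
        level, enabled = "severe", enabled + ["balloon"]
    if free_pct < swap_threshold:
        level, enabled = "critical", enabled + ["swap"]
    return level, enabled
```

Calling this periodically with the current used% also covers the last bullet in a simple way: as pressure drops, the returned plan shrinks, and a controller would then give ballooned or swapped memory back to the VMs.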

Memory Overcommit Support Matrix

                    | Balloon                              | Sharing        | Host Swap
--------------------+--------------------------------------+----------------+------------------------------------------------------
QEMU                | OK                                   | Sharing breaks | Triggers swap in
PV driver           | OK                                   | Sharing breaks | Triggers swap in
I/O device passthru | OK                                   | Conflict!      | Conflict!
VM live migration   | OK                                   | Sharing breaks | 1. Triggers swap in; 2. Swapfile accessible after L.M.
VM save             | OK                                   | OK             | Triggers swap in
vMem snapshot       | OK                                   | OK             | Triggers swap in
VM resume           | OK                                   | Sharing breaks | OK
VM hibernate        | 1. Balloon in; 2. Redirect to 0-page | OK             | Triggers swap in

Experimental Environment

Host configuration
Processor | 2x Intel X5670 @ 2.93GHz, SMT enabled
Memory    | 96GB DDR3
Storage   | Intel 160GB X25-M SSD

VM workloads
SPECjbb       | 1 vCPU, 4GB vMem; SLES 11   | Heap size: 2.5GB
Kernel Build  | 1 vCPU, 512MB vMem; SLES 11 | Linux kernel: 2.6.32
Sysbench OLTP | 1 vCPU, 1GB vMem; SLES 11   | Database: MySQL
VDI benchmark | 1 vCPU, 1GB vMem; Windows 7 | Workload: Office, IE, PDF, Java

Blktap2 / 0-page Sharing # -- VM startup

• 0-page sharing # is dominant
  • Xen scrubs pages (on host startup, domain destroy, …)
  • Lots of "free memory" consists of 0-pages, which can be shared
• Linux uses more memory to boot up
  • Less 0-page sharing #
  • More blktap2 sharing #

[Chart: blktap2 / 0-page sharing # at VM startup; legend includes "Unshared #"]

Blktap2 / 0-page Sharing # -- VDI workload

VM's sharing # diff w/ VDI workload

                  | Before (MB) | After (MB) | After as % of Before
0-page sharing #  | 750         | 400        | ↓ 53%
Blktap2 sharing # | 42          | 171        | ↑ 409%
Unshared #        | 199         | 418        | ↑ 210%
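
A quick check of how the Diff percentages relate to the MB values: they match After as a percentage of Before (400 / 750 ≈ 53%, 418 / 199 ≈ 210%), not the percentage change, which would be about -47% and +110%. Computed from the rounded MB values, the blktap2 row comes out at about 407% rather than 409%, presumably because the original figures were unrounded (an assumption). `after_vs_before` is an illustrative helper, not part of any tool.

```python
def after_vs_before(before_mb, after_mb):
    """Return (after as % of before, percentage change from before)."""
    return after_mb / before_mb * 100, (after_mb - before_mb) / before_mb * 100


rows = {"0-page sharing #": (750, 400), "Blktap2 sharing #": (42, 171), "Unshared #": (199, 418)}
for name, (before, after) in rows.items():
    ratio, change = after_vs_before(before, after)
    print(f"{name:<18} {ratio:5.0f}% of before, {change:+5.0f}% change")
```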

0-page Sharing # -- Windows vs. Linux

• On startup, almost all 0-pages come from "free memory"
• Windows: free memory is 0-page all the time
  • Windows scrubs pages before freeing them
  • More friendly to page sharing
• Linux: free memory is 0-page only on startup
  • Linux doesn't scrub freed pages

* The Mem Hog test case consumes 500MB of memory while running

Performance Impact of blktap2 Sharing

• The performance impact of blktap2 sharing is negligible
• Scalability is very good
• In theory, blktap2 sharing could benefit READ-intensive workloads
  • First read: from disk; afterwards: from cache
  • But the benefit is not observed in the KB (Kernel Build) test

Performance Impact of 0-page Sharing

• Little impact on the benchmarks' scores
• The impacts of different scan rates are almost the same
  • 5%-7% CPU overhead in dom0
  • Few new 0-pages are generated during the test, so each scan finishes fast
• Is a better benchmark needed?

Host Swap Policies

• Different policies result in different performance
• The swap->sharing policy brings the best performance most of the time
• When the remaining vMem# < the working set, performance drops dramatically

Host Swap vs. Balloon

• Balloon usually performs better than swap
  • Balloon transfers the memory pressure from host to guest
  • The guest OS knows better about memory usage: which pages are free and which are not; which are least / most used
• The swap->sharing policy narrows the gap between swap / balloon

VM Density -- VDI Workload

1. W/o memory overcommit: VM# = 85; memory is the bottleneck
2. W/ balloon: VM# = 120; CPU and memory are both bottlenecks
3. W/ balloon + sharing: at VM# = 120, host free memory = 17GB; CPU is the bottleneck
4. W/ balloon + sharing: w/o the CPU bottleneck, the same memory could host 145 VMs (projected)

* Host memory = 96GB
* Tests 1/2/3 are performed w/ QoE unchanged
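
The density gains follow directly from the VM counts above, taking test 1 (85 VMs) as the baseline. `density_gain_pct` is an illustrative helper; the only inputs are the slide's own numbers.

```python
BASELINE_VMS = 85  # VM# w/o memory overcommit (test 1)


def density_gain_pct(vms, baseline=BASELINE_VMS):
    """Percentage increase in VM density over the no-overcommit baseline."""
    return (vms / baseline - 1) * 100


for label, vms in [("balloon", 120), ("balloon+sharing, projected", 145)]:
    print(f"{label}: {vms} VMs, +{density_gain_pct(vms):.1f}%")
```

Balloon alone gives about +41%; the projected 145 VMs give about +71%, which is the "70+%" figure in the takeaways.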

Takeaways

• 0-page sharing complements blktap2 sharing; the combination of both is competitive
• The performance impact and overhead of blktap2 / 0-page sharing are small if used properly
• Windows is more friendly to 0-page sharing than Linux
• Host swap policy does matter. The swap->sharing policy narrows the gap between swap / balloon
• In the VDI scenario, a good memory overcommit policy can increase VM density by 70+% w/ QoE uncompromised

Thank you www.huawei.com