Upload
the-linux-foundation
View
3.632
Download
3
Tags:
Embed Size (px)
Citation preview
Supporting Soft Real-Time Tasks
Min Lee, A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik
Paper published in VEE 2010
© 2009 Avaya Inc. All rights reserved.
Problem Statement
�Given a mix of workloads how do you schedule the workload such that
– Real-time workloads get the needed resources
– Non-real-time workloads are not starved
�Main goal:
– Present a new scheduler based on the credit scheduler which meets the demands of a real-time workload
– Target application studied: Media in an IP telephony system
2
© 2009 Avaya Inc. All rights reserved. 3
Real-Time Applications
�Requirements of a Real-time application like Media Server
– Highly I/O bound
– Needs timely allocation of compute resources
�Challenges when deployed on Xen
– I/O virtualization overhead
– Variable scheduling latency with mixed workloads
– Cache contention between workloads
�Scheduler is central to all these issues!
© 2009 Avaya Inc. All rights reserved. 4
Target Application: Enterprise IP Telephony System
Hardware
Xen Hypervisor
Do
m0
Call-C
Med
ia-S
Sip
-S
Backbone
Network Local Area Network
.
.
.
IP Endpoint
.
.
.
IP Endpoint
Call-C and Sip-S
(Call setup/tear down)
Gateway/
Media Server
Gateway/ Media Server
(stream
encode/decode)
Local Area Network
© 2009 Avaya Inc. All rights reserved. 5
Credit Scheduler: Credit Handling
�Consumes credits to run
VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5CPU0
Overpriority
Underpriority
�Every 30ms, distribute credits based on weights
– E.g. 20% CPU to VM 1, 40% CPU to VM 2
– Proportional distribution
– Default weight (256)
© 2009 Avaya Inc. All rights reserved. 6
Credit Scheduler: I/O Handling
�VCPUs woken up and boosted in priority when event arrives
Boostpriority
VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5CPU0
Overpriority
Underpriority
VCPU3
VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5CPU0 VCPU3
�Only one time boost (<10ms)
© 2009 Avaya Inc. All rights reserved. 7
Credit Scheduler: Credit Crisis
�Credits
– Good for CPU-bound task
�Boost Priority
– Good for Low-latency tasks, e.g. I/O in Dom0
– Short period boost
�Media Servers
– Need both CPU and low latency
– Timely CPU
– No support in current scheduler
CPU- bound
Compute-intensive Tasks (supported by
credit)
Compute-intensive Tasks (supported by
credit)
Low-latency Tasks (I/O
processing, Interactive)
Low-latency Tasks (I/O
processing, Interactive)
Multi-media Tasks
(Soft Real-time)
Multi-media Tasks
(Soft Real-time)
© 2009 Avaya Inc. All rights reserved. 8
Laxity-based Scheduler
� Four Components in the scheduler
– Laxity (A)
– Boost with event (B)
– Simple Load Balance (L)
– Cache-aware Real-time load balancing (LL)
© 2009 Avaya Inc. All rights reserved. 9
Experimental Setup
�Enterprise Telephony system
– Media server – ‘Media-S’
– Signaling server – ‘Call-C,Sip-S’
– Other VMs
• Dom0
• Cdom [Computational domain]
• Two more domains [Licensing, management]
�Two workload scenarios
– Standard (Cdom with no load)
– Cpuload (Cdom has 4 cpu-bound tasks)
© 2009 Avaya Inc. All rights reserved. 10
Experimental Setup
� Dell 2950 server
– 2 quad-core Xeon processors
– 4 GB of RAM
� 4 cores used: 2 cores from each socket
– So private cache
� 4cps, sample one out of 4calls
– G.711, 20ms packetization
– 30sec hold time
– Max 240 streams through Media-S
� PESQ
– Quality of voice metric
– Compare the referencewith stream from Media-S
COMPACT
IP networkIP network
Server
SIPp and RTP Clients
Xen Hypervisor
Do
m0
Me
dia
-S
Call-C
SIP
-S
C-D
om …
tcpdump
COMPACTCOMPACT
IP networkIP network
Server
SIPp and RTP Clients
Xen Hypervisor
Do
m0
Me
dia
-S
Call-C
SIP
-S
C-D
om …
tcpdump
© 2009 Avaya Inc. All rights reserved. 11
Instrumentation
� Custom events for Xentrace
� CPU utilization
� Worst/average wait time
– Each priority
� Scheduled Time
– Each priority
� Cache misses
– By xenoprof
Average wait time
L2 cache misses
© 2009 Avaya Inc. All rights reserved. 12
Reading the plots
�PESQ
– Quality of voice
– 0 (bad)~4.5
– 4.0 (toll quality)
�Cumulative Density
�Boxplots
– Min/Max
– 25%,75% percentile
– Median
Poor quality Good quality
© 2009 Avaya Inc. All rights reserved. 13
Default Credit Scheduler: Performance
�Add weights – 512, 1024
– Insignificant impact
�Pinning
– Media-S/Call-C require significant CPU
– Pinned Media-S/Call-C to CPU 0,1 respectively
– Pinned others to CPU 2,3
– Dom0 is floating
– Significant performance gain
– Underutilizing CPU!
Weights Pinning
© 2009 Avaya Inc. All rights reserved. 14
Laxity (A)
�A form of priority
– Target scheduling latency or deadline
– E.g. less than 5ms wait time in the run queue
– Parameter specified by user
�Laxity values for Real-time domains
�No value specified for Non real-time domains
– Conceptually infinite laxity
© 2009 Avaya Inc. All rights reserved. 15
Implementation of laxity
�Where to insert real-time tasks to meet their deadline
– In over priority, laxity value is ignored.
5us
20us20us&boost
VCPU4(1us)
VCPU2(7us)
VCPU7(4us)
VCPU1(13us)
VCPU0(40us)
VCPU5(21us)CPU0
VCPU#(Length of
previous time slice)Boostpriority
Overpriority
Underpriority
© 2009 Avaya Inc. All rights reserved. 16
Prediction of wait time in runqueue
� Each VCPU maintains an expected run time
– The amount of CPU time it utilized in its previous run
� Works reasonably well
– Min/max
– 25%,75% percentile
� More sophisticated formula can be used
Average difference betweenexpected and actual wait times
© 2009 Avaya Inc. All rights reserved. 17
Laxity-based Scheduler: Performance
�Improved PESQ
© 2009 Avaya Inc. All rights reserved. 18
Boost with event (B)
�Credit scheduler boosts only waiting VCPUs
– Media-S in run queue doesn’t get boosted
�Idea: Boost a VCPU that receives an event
– Even if VCPU with under priority in runqueue
VCPU4
(1us)
VCPU2
(7us)
VCPU7
(4us)
VCPU1
(13us)
VCPU0
(40us)
VCPU5
(21us)CPU0
(1) Receiving event
VCPU4
(1us)
VCPU1
(13us)
VCPU0
(40us)
VCPU5
(21us)CPU0
(2) Get boosted within queue
VCPU2
(7us)
VCPU7
(4us)
© 2009 Avaya Inc. All rights reserved. 19
Credit Scheduler: Load Balance
VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5CPU0
VCPU10 VCPU15CPU1
(1) Over or idle task?
(2) Peek peer’s Q
(3) Steal higher priority task
© 2009 Avaya Inc. All rights reserved. 20
Simple Load Balance (L)
VCPU4 VCPU2 VCPU7 VCPU1 VCPU0 VCPU5CPU0
VCPU10 VCPU15CPU1
(1) Under, over or idle task?
(2) Peek peer’s Q
(4) If same priority(a) If mine is real time task, don’t steal(b) If peer’s is real time task, steal(c) Both non-real-time, compare entrance time into queue
- with some delay (2ms)
Laxity-based scheduler (Simple Load Balance)
VCPU17 VCPU11
(3) Steal higher priority task
© 2009 Avaya Inc. All rights reserved. 21
Simple Load Balance (L)
Media-S VCPU2 VCPU7 VCPU1 VCPU0 VCPU5CPU0
VCPU10 VCPU15CPU1
Laxity-based scheduler (Simple Load Balance)
VCPU17 VCPU11
(1) This effectively distributes real-time tasks over CPUs(2) Also prevents starvation
© 2009 Avaya Inc. All rights reserved. 22
Simple Load Balance (L)
VCPU2 VCPU10 VCPU15CPU1
Laxity-based scheduler (Simple Load Balance)
VCPU17 VCPU11
(1) Moves non-real-time task if they waited for some time
Media-S VCPU2 VCPU7 VCPU1 VCPU0 VCPU5CPU0
© 2009 Avaya Inc. All rights reserved. 23
Cache-aware Real-time load balancing (LL)
�L good, but…
– Ping-ponging tasks trash cache
�Solution: Bind RT-tasks to CPU
– Fix RT-tasks to its initial CPU
– Disable load-balancing for RT-tasks
�Creates unbalance of multiple RT-tasks
– Need a new load-balancer for RT-tasks
© 2009 Avaya Inc. All rights reserved. 24
Cache-aware Real-time load balancing (LL)
�Don’t steal peer’s real-time tasks in L
– Prefer hot cache (to low wait time)
�New x-sec-load balancer
– Balance of real time task’s cpu utilization via bin-packing
CPU0 90% utilized by one real time task
CPU1 80% utilized by three real time tasks
© 2009 Avaya Inc. All rights reserved. 25
Cache-aware Real-time load balancing (LL)
�Improved PESQ
© 2009 Avaya Inc. All rights reserved. 26
Result (CPU utilization)
�We’re fully utilizing CPUs!
0
2000
4000
6000
8000
10000
12000
14000
baseline(pinned) baseline A AL ALL
Policies
Am
ou
nt
of
wo
rk d
on
e b
y c
pu
ho
gg
ing
pro
cess
© 2009 Avaya Inc. All rights reserved. 27
Result (Cache misses)
Total cache misses down
Cache misses for standard configuration
0
500
1000
1500
2000
2500
3000
3500
4000
baselin
e A
AL
ALL
Total misses
Mis
ses (
x10000)
0
200
400
600
800
1000
1200
baselin
e A
AL
ALL
baselin
e A
AL
ALL
baselin
e A
AL
ALL
baselin
e A
AL
ALL
baselin
e A
AL
ALL
C-dom Domain-0 Call-C Sip-S Media-S
Cache misses for Media Server down
© 2009 Avaya Inc. All rights reserved. 28
Conclusion
�New soft real-time aware scheduler for Xen
– Better support for real-time applications without penalizing non-real time tasks
– Fully utilizing CPU resources
– Timeliness requirement expressed by laxity
© 2009 Avaya Inc. All rights reserved.
Backup
© 2009 Avaya Inc. All rights reserved. 30
Result (Various laxity value)
�Lower laxity � more realtime � better quality
various laxity values - standard configuration
© 2009 Avaya Inc. All rights reserved. 31
Result (Various laxity value)
�Lower laxity � more realtime � better quality
various laxity values - cpuload configuration
© 2009 Avaya Inc. All rights reserved. 32
Result (Load balance)
�Load balancing is essential for multicore
PESQ improvement through load balancer in standard configuration
© 2009 Avaya Inc. All rights reserved. 33
Result (Boost with event)
�Some impact
Adding Boost with event