Transis
1
Fault Tolerant Video-On-Demand Services
Tal Anker, Danny Dolev, Idit Keidar
The Transis Project
VoD Service
• VoD: Full VCR control
• 1 video stream per client
[Figure: Client C1 sends requests to the VoD service provider and receives a video stream; the provider reads from the movies disk(s).]
High Availability
• Multiple servers – at different sites
• Fault tolerance:
  – servers can crash
• Managing the load:
  – new servers can be brought up / down
  – load should be re-distributed “on the fly”: migration of clients
The challenges
• Low overhead
• Transparency
  – How do clients know whom to connect to? “abstract” service
  – Clients should be unaware of migration
[Figure: Clients C1 and C2 connect to the abstract VoD service, which is implemented by several servers; when a server fails, another server serves its clients.]
Buffer Management and Flow Control
• Overcome jitter, message re-ordering and migration periods
• Re-fill buffers quickly after migration
  – avoid buffer overflow
• Minimize buffers
  – minimize pre-fetch bandwidth
• Dynamically adjust transmission rate to client capabilities
  – Re-negotiation of QoS
Features of our solution
• Use group communication in the control plane
  – connection establishment
  – fault tolerance and migration
• Flow control explicitly handles migration
• Low overhead
  – ~1/1000 of the bandwidth
  – Negligible memory and CPU overhead
• Commodity hardware and publicly available network technologies
Environment
• Implementation
  – UDP/IP over 10 Mbit/s switched Ethernet
  – Transis
  – Sun SPARC and BSDI PCs as video servers
  – Win NT machines as video clients
  – MPEG1 & 2 hardware decoders
• Machine and Network Failures
Implementing the abstract service
• Use group communication
  – clients communicate with a well-known group name (logical entity)
  – unaware of the number and identity of the servers in the group
• Servers periodically share information about clients (every 1/2 sec)
• If a server crashes (or is overloaded), another server transparently takes over
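The takeover step can be illustrated with a minimal sketch. It assumes that every surviving server holds the same shared client state and the same membership view, so each can compute the same reassignment locally. The round-robin rule and all names (`ClientState`, `redistribute`) are hypothetical; the slides do not say how the takeover server is chosen.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    """Per-client information that servers share periodically (fields are assumed)."""
    movie: str
    frame_offset: int  # last known playback position
    fps: int           # last known transmission rate


def redistribute(view, shared_state, failed):
    """Reassign the clients of `failed` among the surviving servers.

    view:         list of server names in the membership view before the failure
    shared_state: dict mapping server name -> {client id -> ClientState}
    Returns a dict mapping client id -> name of the server that takes over.
    The rule is deterministic, so all survivors reach the same result
    from the same inputs (hypothetical round-robin policy).
    """
    survivors = sorted(s for s in view if s != failed)
    if not survivors:
        raise RuntimeError("no surviving server can take over")
    orphans = sorted(shared_state.get(failed, {}))
    return {cid: survivors[i % len(survivors)] for i, cid in enumerate(orphans)}


if __name__ == "__main__":
    state = {
        "s1": {"c1": ClientState("movie-a", 1200, 30),
               "c2": ClientState("movie-b", 40, 30)},
        "s2": {"c3": ClientState("movie-a", 900, 24)},
        "s3": {},
    }
    print(redistribute(["s1", "s2", "s3"], state, failed="s1"))
```

Because the new server already knows each client's last reported position and rate, it can resume the stream without the client contacting anyone; the client keeps addressing the group name.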
Group Communication
• Reliable Group Multicast (Group Abstraction)
• Message Ordering
• Dynamic Reconfiguration
• Membership with Strong Semantics (Virtual Synchrony)
Systems: Transis, Horus, Ensemble, Totem, Newtop, RMP, ISIS, Psync, Relacs
The group layout of the VoD service
[Figure: VISIO-vod-grp.vsd]
Transis Allows Simple Design
• Group abstraction for connection establishment and transparent migration
• Reliable group multicast allows servers to consistently share information
• Membership service detects conditions for migration
• Reliable messages for control
– Server takes ~2500 lines of C++ code
– Client takes ~4000 lines of C code (excluding GUI and display)
Flow Control
• Feedback-based flow control (sparse):
  – FC messages are sent to the logical server (session group)
  – The client determines the changes in the flow:

Value of buffer occupancy
Range                            vs. prev   Freq     Request
0 – (critical low - 1)                      urgent   emergency
critical low – (low mark - 1)               urgent   up
low mark – (high mark - 1)       < prev     normal   up
low mark – (high mark - 1)       > prev     normal   down
high mark – full                            urgent   down
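The occupancy-to-request mapping above can be written as a small client-side function. This is a sketch under stated assumptions: the function name, the threshold values in the usage example, and the choice to send nothing when occupancy is unchanged inside the normal range are all hypothetical, since the slide does not specify them.

```python
def flow_control_request(occupancy, prev_occupancy, critical_low, low_mark, high_mark):
    """Map buffer occupancy to a (freq, request) pair following the slide's table.

    Returns None when the table lists no request, i.e. occupancy is inside
    the normal range and equal to the previous value (assumption).
    """
    if occupancy < critical_low:
        return ("urgent", "emergency")
    if occupancy < low_mark:
        return ("urgent", "up")
    if occupancy < high_mark:
        if occupancy < prev_occupancy:
            return ("normal", "up")    # buffer is draining: ask for a higher rate
        if occupancy > prev_occupancy:
            return ("normal", "down")  # buffer is filling: ask for a lower rate
        return None
    return ("urgent", "down")


if __name__ == "__main__":
    # Hypothetical thresholds, in frames, chosen only for illustration.
    marks = dict(critical_low=10, low_mark=30, high_mark=70)
    for occ, prev in [(5, 8), (20, 25), (50, 55), (50, 45), (80, 75)]:
        print(occ, prev, flow_control_request(occ, prev, **marks))
```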
Emergency Flow Control
• When the server receives an emergency message:
  – The server changes the fps rate:
    fps = latest-known-fps + emergency quantity
• The emergency quantity decays every second (by a factor)
  – While the quantity is above zero, the server ignores FC messages from the client
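A minimal server-side sketch of this rule follows. The initial emergency quantity, the decay factor, the step size for ordinary requests, and the truncation to whole frames (so that the quantity actually reaches zero) are assumptions; the slide gives only the formula and the decay-by-a-factor behaviour.

```python
class EmergencyRate:
    """Tracks the transmission rate for one client (hypothetical helper)."""

    def __init__(self, fps, boost=12, decay=0.5):
        self.fps = fps        # latest known fps for this client
        self.boost = boost    # emergency quantity added on an emergency message (assumed value)
        self.decay = decay    # per-second decay factor (assumed value)
        self.quantity = 0     # current emergency quantity

    def on_emergency(self):
        self.quantity = self.boost

    def on_fc_request(self, request, step=1):
        """Apply an ordinary up/down request unless an emergency is in progress."""
        if self.quantity > 0:
            return  # FC messages are ignored while the quantity is above zero
        if request == "up":
            self.fps += step
        elif request == "down":
            self.fps = max(1, self.fps - step)

    def tick(self):
        """Called once per second: decay the quantity, truncating to whole frames."""
        self.quantity = int(self.quantity * self.decay)

    def current_fps(self):
        # fps = latest-known-fps + emergency quantity
        return self.fps + self.quantity


if __name__ == "__main__":
    rate = EmergencyRate(fps=30)
    rate.on_emergency()
    for second in range(5):
        print(second, rate.current_fps())
        rate.tick()
```

The temporary boost refills the client's buffers quickly, for example after a migration, and the decay returns the stream to its previous rate without a further message from the client.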
Performance Measurements
• On HUJI Network (LAN)
• Servers at TAU and clients at HUJI (WAN)
• The measurements show that the system is robust and support our transparency claims
Software Buffers
[Figure: gnuplot graph, vod100.stats.softb.ps]
Hardware Buffers
[Figure: gnuplot graph, vod100.stats.hardb.ps]
Skipped Frames on LAN
[Figure: gnuplot graph, vod100.stats.pl_skipped.ps]
Skipped Frames on WAN
[Figure: gnuplot graph, vod100.stats.pl_skipped.ps]
Summary
• Scalable VoD service
• Load balancing
• Tolerating machine and network failure
• All the above are achieved practically for free:
  – ~1/1000 of the total bandwidth
  – Negligible memory and CPU overhead
Thanks to ...
• Gregory Chockler
• The other members of the Transis project