1 of 26

Scaling and Fault Tolerance for Distributed Messages in a Service and Streaming Architecture

Thesis Proposal
Hasan Bulut [email protected]


2 of 26

Outline
  • Motivation
  • Goals of the Architecture & Example Applications
  • Literature Survey
  • Research Issues and Tasks
  • Milestones
  • Typical Scenarios
  • Tests
  • Contributions
  • Summary

3 of 26

Motivation
  • Collaboration systems enable people to collaborate with each other. However, there are various open research issues in these systems. Some of them are:
      • A more fault tolerant system
          • A distributed and replicated archiving system
      • An architecture or framework to cope with network failures
          • A mechanism to recover from failures while a session is being recorded
      • Playback is available only after the session is over
          • Playback mechanism for live sessions

4 of 26

Motivation
  • An architecture or framework to recover late or broken clients
      • Late clients will miss parts of the session that have already passed
  • Extending services to unicast clients
      • What happens if the multicast feature is disabled on the network?
  • Support for heterogeneous clients
      • Support for videoconferencing (e.g. H.323 clients) and streaming clients (e.g. RealOne player)
      • Support for desktops and mobile devices such as cellular phones

5 of 26

Goals of the Architecture
  • A service oriented architecture
      • Provide RTSP (Real Time Streaming Protocol) semantics
      • Compatible with Web Services standards and technologies
  • Persistent and fault tolerant architecture
      • A distributed and replicated archiving system in a messaging system environment
      • Dynamic replay service: ability to switch among distributed replay services in case of node failures
  • Scalable architecture
      • Allow a large number of clients to connect to the system
      • Allow heterogeneous (different types of) clients to connect to the system

6 of 26

Goals of the Architecture
  • Provide a flexible and extendable framework for new services
      • Allow instant replay of streams. With this feature, it would be possible to annotate streams
  • Improve Quality of Service (QoS)
      • Time ordering of events
      • Maintaining the time spacing between consecutive events
  • Enable late and broken clients to receive the past events (streams)
  • A generic architecture that can work with any collaboration tool, such as audio/video, whiteboard, text chat, etc.

7 of 26

Example Applications
  • Consider a late client joining a live audio/video session. This client has three options:
      • Does not care about the missed stream.
      • Plays the missed stream in a faster mode until he/she catches up with the live stream.
      • Plays the stream from the beginning and follows the live session from behind.
  • The stream is not necessarily a video stream. It can be events from shared displays/applications such as whiteboards, or from other collaboration tools.
  • A client can play a 2-hour archived stream in 30 minutes (scaling a 2-hour stream to a 30-minute stream).
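The faster-mode option and the 2-hour example come down to simple arithmetic. The sketch below assumes a constant playback speed and a live stream that keeps advancing at real time; the function names are hypothetical and not part of the proposed architecture.

```python
def catch_up_seconds(lag_seconds: float, speed: float) -> float:
    """Wall-clock time a late client needs to reach the live point.

    While the client plays at `speed` times real time, the live stream keeps
    advancing at 1x, so the gap shrinks by (speed - 1) seconds per second.
    """
    if speed <= 1.0:
        raise ValueError("speed must be greater than 1 to catch up")
    return lag_seconds / (speed - 1.0)


def required_speed(stream_seconds: float, target_seconds: float) -> float:
    """Playback speed needed to fit an archived stream into a target duration."""
    if target_seconds <= 0:
        raise ValueError("target duration must be positive")
    return stream_seconds / target_seconds


if __name__ == "__main__":
    # A client joining 10 minutes late and playing at 1.5x needs 20 minutes.
    print(catch_up_seconds(600, 1.5))         # 1200.0
    # A 2-hour archived stream played in 30 minutes is a 4x scaling.
    print(required_speed(2 * 3600, 30 * 60))  # 4.0
```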

8 of 26

Literature Survey
  • Collaboration systems
      • Access Grid, InSORS, VRVS, Web based collaboration tools (WebEx, Centra)
  • Archiving and replay services used in collaboration systems
      • Voyager, IG Recorder
  • Streaming media standards
      • SMIL, RTSP (RFC 2326), RTP/RDT, data types such as H.261, H.263, MPEG-4, RealMedia
  • XGSP – XML Based General Session Protocol; GlobalMMCS
  • NaradaBrokering - Distributed messaging infrastructure

9 of 26

Collaboration Systems
  • Access Grid (AG)
      • Uses Internet2 multicast for audio/video transmission.
      • Voyager: Open source archiving tool used to record audio/video streams in MBONE sessions.
  • InSORS: Can be viewed as a commercial version of AG.
      • IG Recorder: Similar to Voyager, it records audio/video streams as well as other data streams (e.g. PowerPoint slides) in AG sessions.
  • VRVS
      • Provides some kind of integration of different A/V endpoints.
      • No information about archiving system.
  • WebEx / Centra: Web based collaboration systems.
      • Recording and playback are done in a traditional way; the session is recorded in local storage.

10 of 26

Streaming Media Standards
  • RTSP – Real Time Streaming Protocol
      • NOT a transport protocol.
      • VCR-like control protocol over media.
      • Stateful server-client communication.

[Figure: RTSP States. States: Init, Ready, Playing / Recording. Transition labels: SETUP, PLAY / RECORD, PAUSE, TEARDOWN.]
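The state diagram can be sketched as a small transition table. This is a minimal illustration that follows the simplified state machine in RFC 2326 (SETUP moves Init to Ready, PLAY or RECORD moves Ready to Playing or Recording, PAUSE returns to Ready, TEARDOWN returns to Init). The class and table names are hypothetical, and transitions that leave the state unchanged are omitted.

```python
# (current state, method) -> next state; a simplified subset of RFC 2326.
TRANSITIONS = {
    ("Init", "SETUP"): "Ready",
    ("Ready", "PLAY"): "Playing",
    ("Ready", "RECORD"): "Recording",
    ("Playing", "PAUSE"): "Ready",
    ("Recording", "PAUSE"): "Ready",
    ("Ready", "TEARDOWN"): "Init",
    ("Playing", "TEARDOWN"): "Init",
    ("Recording", "TEARDOWN"): "Init",
}


class RtspSession:
    """Tracks the state of one RTSP session (illustrative only)."""

    def __init__(self):
        self.state = "Init"

    def handle(self, method: str) -> str:
        """Apply an RTSP method and return the new state."""
        key = (self.state, method)
        if key not in TRANSITIONS:
            raise ValueError(f"{method} is not valid in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    session = RtspSession()
    for method in ("SETUP", "PLAY", "PAUSE", "TEARDOWN"):
        print(method, "->", session.handle(method))
```

Because the server keeps this state per session, a request such as PLAY before SETUP is rejected, which is what "stateful server-client communication" means in practice.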

11 of 26

Streaming Media Standards
  • SMIL - Synchronized Multimedia Integration Language
      • “An XML-based language that allows authors to write interactive multimedia presentations”
      • Multiple streams can be presented in a synchronized timeline.
  • Real Time Transport Protocol – RTP
      • Usually used in conjunction with RTCP.
      • An RTSP server can deliver media data using RTP.
  • RealNetworks’ Data Transport – RDT
      • RealNetworks’ proprietary standard to deliver media.
      • Can be used over UDP or TCP.
  • Data types
      • H.261, H.263, JPEG, etc. (mostly used in VC systems)
      • RealMedia, MPEG, etc. (mostly used in RTSP streaming clients)

12 of 26

Streaming Servers
  • Streaming servers are implementations of RTSP. Support for RTSP may vary.
  • Helix Streaming Server
      • Streaming server from RealNetworks
      • Open source version has limited capability. Formats: RealMedia, mp3
      • Commercial version provides live archiving to the local storage (as media files). Formats: RealMedia, mp3, mpeg-4, QT and WM
  • Darwin Streaming Server
      • Open source streaming server from Apple.
      • Supports QT format.
      • Archives the session to the local storage (as media files)

13 of 26

XML Based General Session Protocol (XGSP)
  • XGSP is a conference control framework.
  • The goal of XGSP is to integrate heterogeneous systems into one collaboration system.
  • Includes three components: user session management, application session management and floor control.
  • SIP is a non-XML text-based signaling protocol for Internet conferencing, telephony and instant messaging.
  • GlobalMMCS: A prototype system to verify and refine the XGSP conference control framework.
      • An XGSP media server
      • H.323, SIP gateways and Real Servers for A/V clients
      • XGSP A/V Session Server
      • The web server

14 of 26

NaradaBrokering (NB)
  • Virtualizes communication transport and endpoints
      • UDP, TCP, Multicast, SSL, ...
  • Based on a distributed network of cooperating broker nodes (brokers support a software overlay network).
  • Efficiently routes (content or endpoint-based) information from producers to consumers of content.
  • Subscriptions can be based on SQL, regular expressions and XPath queries.
  • Has been deployed and tested in the context of multimedia conferencing and Grid applications.
  • Introduces delays on the order of one to two milliseconds at each broker.
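To make the idea of regular-expression subscriptions concrete, here is a minimal in-memory sketch of topic matching at a single broker. It is not the NaradaBrokering API; `TinyBroker`, its methods and the topic naming scheme are assumptions made for illustration.

```python
import re


class TinyBroker:
    """In-memory stand-in for one broker node (hypothetical, not NB itself)."""

    def __init__(self):
        self._subscriptions = []  # list of (compiled pattern, callback)

    def subscribe(self, topic_pattern: str, callback) -> None:
        """Register a callback for every topic matching the regular expression."""
        self._subscriptions.append((re.compile(topic_pattern), callback))

    def publish(self, topic: str, event) -> int:
        """Deliver the event to all matching subscribers; return how many matched."""
        delivered = 0
        for pattern, callback in self._subscriptions:
            if pattern.fullmatch(topic):
                callback(topic, event)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = TinyBroker()
    # One subscription covers both the audio and the video topic of a session.
    broker.subscribe(r"session/42/(audio|video)",
                     lambda topic, event: print("received", topic, event))
    broker.publish("session/42/audio", b"rtp-packet")
    broker.publish("session/42/chat", "ignored by this subscriber")
```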

15 of 26

Research Issues
  • We need to research the capabilities/services that need to exist in a messaging system to achieve a higher quality of service (QoS) for the archiving and replay service
      • Effect on stream quality of:
          • Timestamping events using NTP to achieve synchronization among streams
          • Time ordering of events using the buffering service
          • Time-spaced release of events using the time differential service
  • A metadata management service for archiving and replay
      • How to build a session catalog to describe information regarding the streams in the session
      • How to manage messaging system topics for RTSP sessions
      • How to expose this service as a web service
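NTP-based timestamping rests on estimating the offset between a local clock and a time server from four timestamps of one request/response exchange. The sketch below uses the standard NTP offset and round-trip delay formulas; the function names and the millisecond values are illustrative assumptions.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t1: client send time (client clock)
    t2: server receive time (server clock)
    t3: server send time (server clock)
    t4: client receive time (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def corrected_timestamp(local_time, offset):
    """Timestamp an event on the server's timeline using the estimated offset."""
    return local_time + offset


if __name__ == "__main__":
    # Server clock 50 ms ahead, 10 ms each way, 2 ms server processing.
    offset, delay = ntp_offset_and_delay(1000, 1060, 1062, 1022)
    print(offset, delay)                      # 50.0 20
    print(corrected_timestamp(2000, offset))  # 2050.0
```

If every event source applies the same correction, events from different streams land on one shared timeline, which is what synchronization among streams requires.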

16 of 26

Research Issues
  • Improving fault tolerance of the system
      • Redundancy in archiving/replay services
      • How to provide continuity of the stream in case of a replay service node crash
      • How the replay service can leverage fault tolerance
  • Scalable replay service
      • How many requests a replay service can support
      • Load balancing among replay services
      • Effect of network threshold
      • Supporting different types of clients with different capabilities
  • Other research issues
      • Systematic applications of major and minor event concepts in event driven systems
      • How to expose RTSP semantics as a web service
      • Synchronization of replaying multiple streams

17 of 26

Research Tasks
  • RTSP semantics support in XGSP (service oriented architecture)
      • How RTSP clients can join XGSP sessions
          • An RTSP to XGSP signaling gateway
      • How XGSP will support RTSP clients
  • RTSP semantics support in NB (messaging system)
      • How to support active replay (play, pause, rewind, forward, absolute positioning, etc.) for both live and archived streams
      • Instant replay
          • How to support and provide seeking capability in live streams
          • Current RTSP servers do not support rewind in live streams
      • Changes to NB archive and replay service to support RTSP semantics
  • Do we need extensions to RTSP?
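In RFC 2326, absolute positioning and faster playback are expressed through the `Range` and `Scale` headers of a PLAY request. The sketch below only formats such a request; the helper name, URL and session identifier are placeholders, and whether a given server honors these headers for live streams is exactly the open question raised above.

```python
def rtsp_request(method, url, cseq, session=None, headers=None):
    """Format an RTSP/1.0 request as text (illustrative helper)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Seek to 90 seconds (normal play time) and play at double speed.
    request = rtsp_request(
        "PLAY",
        "rtsp://example.org/session/audio",  # placeholder URL
        cseq=4,
        session="12345678",                  # placeholder session id
        headers={"Range": "npt=90-", "Scale": "2.0"},
    )
    print(request)
```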

18 of 26

Milestones I
  • NB Time Service
      • An implementation of Network Time Protocol (RFC 1305)
      • Entities generating events in the system should utilize the Time Service to timestamp the events.
  • NB Buffering Service
      • The goal is to time-order events.
      • Delay introduced by the buffer service can vary based on the buffer parameter values.
  • Time Differential Service
      • Releases events preserving the time spacing between events.
  • Streaming Gateway
      • Transcodes audio/video streams into RealMedia format.
      • Targets both desktop PCs and cellular phones.
      • Stream conversion is a CPU intensive application.
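The buffering and time differential services can be sketched together: a small buffer holds events long enough to sort them by timestamp, and a scheduler then computes release times that keep the original spacing. This is only a sketch under assumptions; the class name, the single `hold_ms` parameter and the `speed` argument are hypothetical and do not describe the actual NB services.

```python
import heapq
import itertools


class ReorderBuffer:
    """Holds events briefly and releases them in timestamp order."""

    def __init__(self, hold_ms):
        self.hold_ms = hold_ms          # assumed parameter: how long to wait for stragglers
        self._heap = []
        self._seq = itertools.count()   # tie-breaker for equal timestamps

    def push(self, timestamp_ms, event):
        heapq.heappush(self._heap, (timestamp_ms, next(self._seq), event))

    def pop_ready(self, now_ms):
        """Return events whose timestamp is at least hold_ms old, oldest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now_ms - self.hold_ms:
            timestamp_ms, _, event = heapq.heappop(self._heap)
            ready.append((timestamp_ms, event))
        return ready


def release_schedule(ordered_events, start_ms, speed=1.0):
    """Map time-ordered (timestamp, event) pairs to release times.

    The spacing between consecutive events is preserved, divided by `speed`.
    """
    if not ordered_events:
        return []
    first_ts = ordered_events[0][0]
    return [(start_ms + (ts - first_ts) / speed, event)
            for ts, event in ordered_events]


if __name__ == "__main__":
    buffer = ReorderBuffer(hold_ms=100)
    for ts, name in [(120, "c"), (100, "a"), (110, "b")]:  # arrives out of order
        buffer.push(ts, name)
    ordered = buffer.pop_ready(now_ms=215)      # "c" is still too recent
    print(ordered)                              # [(100, 'a'), (110, 'b')]
    print(release_schedule(ordered, start_ms=1000))
```

A longer hold time tolerates more reordering but adds the same amount of delay to every event, which is the trade-off the buffering service has to tune.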

19 of 26

Milestones II
  • NB Replay Service
      • Should provide an API to support RTSP semantics.
  • RTSP Media/Topic Manager
      • Binding RTSP sessions with related NB topics.
  • XGSP Archive Manager
      • Provides RTSP RECORD semantics to start archiving of topics.
  • Session Metadata Service
      • Metadata service for the archiving system.
  • RTSP Server / Proxy
      • Ability to dynamically locate replay and archiving services.
      • Ability to switch between replicas.
  • We will apply these to the e-sports project.
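Switching between replicas can be illustrated with a generator that resumes from the last delivered position when the current replay service fails. This is a sketch under assumptions: each replica is modeled as a callable that yields `(position, event)` pairs and signals a crash with `ConnectionError`; none of these names come from the proposal.

```python
def replay_with_failover(replicas, start=0):
    """Yield events in order, moving to the next replica when one fails.

    `replicas` is a list of callables; each takes a start position and yields
    (position, event) pairs, and may raise ConnectionError mid-stream.
    """
    position = start
    for replica in replicas:
        try:
            for pos, event in replica(position):
                yield event
                position = pos + 1   # resume point for the next replica
            return                   # stream finished normally
        except ConnectionError:
            continue                 # retry from `position` on the next replica
    raise RuntimeError("all replay replicas failed")


if __name__ == "__main__":
    archive = ["e0", "e1", "e2", "e3"]

    def crashing_replica(start):
        for i in range(start, len(archive)):
            if i == 2:
                raise ConnectionError("node crashed")
            yield i, archive[i]

    def healthy_replica(start):
        for i in range(start, len(archive)):
            yield i, archive[i]

    print(list(replay_with_failover([crashing_replica, healthy_replica])))
```

The sketch only works because both replicas hold the same archived events at the same positions, which is why the architecture pairs failover with a replicated archiving system.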

20 of 26

Typical Scenario for Live Streaming and Recording

[Figure: NB connecting a Producer (XGSP client, MBONE tools, ...), an RTSP Client, an RTSP Server/Proxy, and two Replay/Archiving Services, each with stable storage. Legend: two-way NB link; one-way NB link that carries stream; local storage access; communication channel; topic X.]

  1: XGSP client sends and receives RTP packets.
  2, 3: Archiving service subscribes to the topic and records the sessions on different storages.
  4: RTSP client communicates with RTSP server/proxy and establishes an RTSP session.
  5: RTSP client receives the stream from the topic.

21 of 26

Typical Scenario for Live Streaming and Recording (with stream conversion)

[Figure: NB connecting a Producer (XGSP client, MBONE tools, ...), a Streaming Gateway, an RTSP Client, an RTSP Server/Proxy, and two Replay/Archiving Services, each with stable storage. Legend: two-way NB link; one-way NB link that carries stream; local storage access; communication channel; topic X.]

  1: XGSP client sends and receives RTP packets.
  2: Streaming Gateway (SG) subscribes to the stream topic and receives the stream.
  3: SG publishes the stream over NB link.
  4, 5: Archiving service subscribes to the topic and records the sessions on different storages.
  6: RTSP client communicates with RTSP server/proxy and establishes an RTSP session.
  7: RTSP client receives the stream from the topic.

22 of 26

Typical Scenario for Playback

[Figure: NB connecting an RTSP Client, an RTSP Server/Proxy, and two Replay/Archiving Services, each with stable storage. Legend: one-way NB link; one-way NB link that carries stream; communication channel; topic X.]

  1: RTSP client communicates with RTSP server/proxy and establishes an RTSP session.
  2: Stream is published by replay service.
  3: Alternate stream to 2.
  4: RTSP client receives the stream from the topic.

23 of 26

Typical Scenario for Instant Replay

[Figure: NB connecting a Producer (XGSP client, MBONE tools, ...), an RTSP Client, an RTSP Server/Proxy, and a Replay/Archiving Service with stable storage. Legend: two-way NB link; one-way NB link that carries stream; local storage access; communication channel; topic X.]

  1: XGSP client sends and receives RTP packets.
  2: Archiving service subscribes to the topic and records the sessions.
  3: RTSP client communicates with RTSP server/proxy and establishes an RTSP session.
  4: RTSP client receives the stream from the topic.
  5: RTSP client communicates with RTSP server/proxy for instant replay.
  6: Replay service publishes the archived stream to a topic.
  7: RTSP client receives the archived stream.
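Steps 2, 6 and 7 of the instant replay scenario can be illustrated with an append-only archive that supports seeking by timestamp while recording continues. This is a simplified sketch assuming events arrive already time-ordered; `StreamArchive` and its methods are hypothetical names, not the NB archiving service.

```python
import bisect


class StreamArchive:
    """Append-only, timestamp-ordered record of one topic's events."""

    def __init__(self):
        self._timestamps = []
        self._events = []

    def record(self, timestamp, event):
        """Store a live event; assumes timestamps never go backwards."""
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("events must be recorded in time order")
        self._timestamps.append(timestamp)
        self._events.append(event)

    def replay_from(self, timestamp):
        """Return all events recorded so far at or after `timestamp`."""
        start = bisect.bisect_left(self._timestamps, timestamp)
        return list(zip(self._timestamps[start:], self._events[start:]))


if __name__ == "__main__":
    archive = StreamArchive()
    for ts, event in [(0, "a"), (10, "b"), (20, "c"), (30, "d")]:
        archive.record(ts, event)
    # A client asks to jump back to t=15 while the session is still live.
    print(archive.replay_from(15))   # [(20, 'c'), (30, 'd')]
    archive.record(40, "e")          # recording continues during replay
    print(archive.replay_from(30))   # [(30, 'd'), (40, 'e')]
```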

24 of 26

Tests
  • NB Time Service tests on several machines.
  • Time differential service performance test.
  • Measuring number of clients that can be supported by a single replay service and storage.
  • Measuring client scalability
  • Measuring latency of recovery from failures
      • How long will it take to dynamically switch between replay services during a node failure (node that provides the replay service)?
      • How long will it take for an archiving node to recover the missed events?

25 of 26

Contribution of this Thesis
  • Combines the benefits of RTSP with a distributed messaging system and a service oriented architecture for archiving and replay in a geographically distributed large network
  • A fault tolerant architecture for collaboration systems
  • Enables late or broken clients to receive missed streams
  • An architecture for instant replay of live streams
  • A scalable replay architecture that benefits from the advantages of service oriented architecture and messaging systems
  • Support for heterogeneous clients

26 of 26

Summary
  • This thesis addresses the following open research issues in collaboration systems:
      • A framework for fault tolerance
          • Support for late or broken clients in live sessions
          • Distributed archiving/replay system
      • Support for different clients: research extension of architectures to support different clients with different capabilities, e.g. cellular phone clients
      • Client scalability: research extension of architectures to support as many clients as possible. Centralized servers support a limited number of clients.
      • An instant replay mechanism for live streams