Www.computer.org Csdl Proceedings Wowmom 2013 5827-00-06583507


Kurento: a media server technology for convergent WWW/mobile real-time multimedia communications supporting WebRTC.

Luis López Fernández, Miguel París Díaz, Raúl Benítez Mejías

Grupo de Sistemas y Comunicaciones (GSyC) Universidad Rey Juan Carlos (URJC)

(Madrid), Spain. [email protected], [email protected], [email protected]

Francisco Javier López, José Antonio Santos

Naevatec, Las Rozas (Madrid), Spain. [email protected], [email protected]

Abstract— WebRTC technologies are an opportunity for achieving a real convergence between WWW, desktop and mobile multimedia real-time communication services, which will contribute to defeating fragmentation and provide significant advantages to users and developers all around the world. Following this vision, in this paper we introduce Kurento, a media server technology based on open source software and capable of demonstrating how this convergence could take place by combining a SIP/HTTP-based signaling plane with a powerful media server infrastructure built on top of the GStreamer software stack. The presented technology is suitable for sending and receiving real-time multimedia through different protocols and formats, and capable of providing advanced processing capabilities, including media mixing, transcoding and filtering. Thanks to this, Kurento could push current WebRTC capabilities beyond plain peer-to-peer communication.

Keywords— WebRTC; SIP; HTTP; RTP; SDP; WWW/Mobile Convergence; IMS; GStreamer; Mobicents

I. INTRODUCTION

Real-time communications are one of the most relevant technologies of today's Internet. Services such as Skype, Google Hangouts, Tokbox or Apple's FaceTime count their users in the billions. This phenomenon, together with the popularization of social networking, is opening a new paradigm of human communications that is slowly cannibalizing the traditional phone service, which was dominant during the last century. At the core of this new paradigm we find two types of devices: desktop computers and smartphone platforms. Desktop computers, through WWW browsers, have traditionally been the way users prefer to access most Internet services, including social networks. However, in the last few years, smartphone platforms have contributed their ubiquity and mobility, and their adoption is skyrocketing.

The area of real-time multimedia communication services has traditionally experienced severe fragmentation of solutions due to lack of interoperability. For this reason, the above-mentioned platform diversity may aggravate the fragmentation problem. This situation is extremely negative for users, who need to deal with different services to communicate with different contacts. Clearly, this is a barrier to the massive adoption of such services and to the emergence of a whole new industry around Internet real-time communication.

In this scenario, there is an increasing effort toward the creation of standardized technologies and services suitable for defeating fragmentation and enabling an effective convergence of desktop WWW browsers and mobile/smartphone platforms. One of the most relevant initiatives in this area is WebRTC [1]. WebRTC belongs to the HTML5 ecosystem and is aimed at providing Real-Time multimedia Communications (RTC) on the WWW. It has awakened significant interest among developers (see http://www.webrtc.org). In contrast to previous proprietary WWW multimedia technologies such as Flash or Silverlight, WebRTC has been conceived to be open in a wide sense, both by building on open standards and by providing open source software implementations.

WebRTC standards are still under development and will require some additional time to consolidate; however, there are a number of ingredients that WebRTC will clearly incorporate in the near future. First, it will provide interoperable multimedia communications through well-established standardized protocols, codecs and formats. Second, it will enable an effective convergence between desktop WWW platforms and smartphones. This convergence shall take place through a double mechanism. The first is based on the fact that all relevant smartphone platforms are adhering to HTML5 and, hence, their WWW browsers will come with built-in WebRTC capabilities when the standard consolidates. The second is that the WebRTC stack is being open sourced under truly open licenses, which makes it simple and attractive for any developer or vendor to incorporate WebRTC capabilities on top of the native APIs of mobile platforms.

Given this, WebRTC is an opportunity for the creation of a truly open and interoperable technology, which could catalyze the emergence of a new generation of novel and non-fragmented social communication services. However, for this to happen, the WebRTC ecosystem needs to evolve further and provide more than the pure peer-to-peer video conferencing it offers now. Current efforts on WebRTC are concentrated on building the client-side capabilities; at the server side, only minor contributions have occurred. Nevertheless, a state-of-the-art WebRTC-capable media server could provide very interesting features such as media recording, media mixing for group communications, media adaptation and transcoding for integration with legacy systems, etc. In this paper we contribute to that vision by introducing Kurento, an open-source media server capable of strengthening the WebRTC ecosystem in several directions.

978-1-4673-5828-6/13/$31.00 ©2013 IEEE

The structure of this paper is as follows. First, we review the current state of the art and show how Kurento pushes it forward. Second, we examine the current status of WebRTC implementations and explain how the associated APIs work. After that, we introduce the Kurento Media Server architecture and show how WebRTC can be integrated into it. To conclude, we explain how WebRTC-enabled applications can be created on top of the resulting infrastructure.

II. STATE OF THE ART AND CONTRIBUTIONS

During the preceding decade, the most successful multimedia solutions on the WWW were based on proprietary technologies that made their convergence with other technologies difficult for a number of reasons. First, they used non-standard protocols controlled by commercial companies that had their own objectives and roadmaps, which were not necessarily aligned with those of users. Second, those WWW multimedia technologies were not designed for real-time communications, and the quality of experience they provide is not always satisfactory.

More recently, many different initiatives have emerged for the creation of a technology capable of bringing together the mobile and WWW real-time communication worlds. One of the most remarkable has come from the IMS ecosystem [2, 3, 4]. However, the IMS model has not succeeded in spreading beyond operators, and those initiatives did not generate a critical mass of users.

In this context, WebRTC has appeared, bringing to reality all the ingredients required for achieving an effective convergence between Web and mobile services. Early experiments and implementations of WebRTC were carried out by Ericsson; currently, however, many different companies and individuals are involved in its standardization and prototyping. The standardization efforts are split into two complementary initiatives. On the one hand, the RTCWEB group of the IETF (http://tools.ietf.org/wg/rtcweb/charters) is focused on the definition of the required protocols and interoperability mechanisms. On the other, the WebRTC group of the W3C (http://www.w3.org/2011/04/webrtc-charter.html) is working to define a number of APIs suitable for providing, through scripting languages, web browser support for interacting with media devices (microphones, webcams, speakers, etc.), media encoding/decoding capabilities and media transport features.

Although a relevant number of drafts for the protocols and APIs have already appeared [5, 6], WebRTC is still in its infancy. However, independently of the details, WebRTC is being designed in such a way that its integration with real-time mobile communication services is immediate.

First, because it solves all the complex details of real WWW architectures such as NAT traversal, browser integration, media security, etc. Second, because it is based on a well-known standard for real-time multimedia: the RTP/RTCP protocol stack. Although WebRTC does not specify any type of signaling protocol, it is fully compatible with SIP (many current WebRTC applications are based on SIP). These two ingredients make WebRTC services directly usable in IMS and other VoIP infrastructures, which are usually based on combining SIP and RTP/RTCP.

In this context, WebRTC brings a clear opportunity for the creation of non-fragmented and universal communication services accessible from the WWW, smartphones and traditional VoIP systems. Nevertheless, WebRTC efforts are currently concentrated on creating a client technology capable of providing peer-to-peer (P2P) media. Although the provision of a P2P communication model is a remarkable first step, for mass adoption WebRTC needs to progress further so that a number of requirements demanded by users (which are not possible with the current state of the standards) are fulfilled:

• The compatibility with group communications satisfying social interaction schemes.

• The capability of providing value added services such as call recording, call redirecting, answering machines, etc.

• The interoperability with legacy WWW media technologies such as Flash or Silverlight. In the same direction, the capability of integrating into legacy multimedia communication systems based on VoIP or other similar schemes.

• The ability to satisfy novel tendencies for the creation of media-aware and context-aware services involving computer vision, media augmentation, content searching, etc. This type of capability is expected to be the catalyst of a whole new ecosystem of professional services in areas such as security, entertainment, eHealth, eLearning, etc.

In this paper, we contribute to pushing the success of WebRTC by enriching current state-of-the-art technologies with the following enablers:

A. Contribution: innovative media server architecture abstracting complex details of application development.

We have created a powerful media server architecture combining the Mobicents/JBoss Application Server [7] and the GStreamer (http://www.gstreamer.net/) multimedia stack. To understand why this architecture is challenging and relevant, we need a basic understanding of GStreamer. GStreamer is architected around two main concepts: media elements and media pipelines. A media element can be seen as a "black box" capable of acting as a media sink, a media source or a media processor. Hence, media elements are usually associated with a specific function performed on the stream. Currently, GStreamer provides more than 1000 different media elements with diverse capabilities such as, for example, reading/writing streams from/to files, mixing several media streams into one, transcoding streams to/from many different formats (including H.263, H.264, VP8, Ogg, Vorbis, MP3, AMR and Speex), detecting faces in a video stream, blending multiple streams, etc. Building on this, a media pipeline is a chain of media elements where the output generated by one element is fed into one (or several) downstream elements. Hence, the pipeline can be seen as a "machine" performing complex media processing comprised of a sequence of individual media operations. In summary, GStreamer is a powerful architecture enabling the creation of rich applications performing complex media processing. However, using GStreamer to create applications and integrating it into WWW services and systems is extremely complex and requires huge effort and considerable expertise from developers.
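The element/pipeline model described above can be illustrated with a minimal sketch in plain Python. The classes and the element behaviors below are illustrative stand-ins for the concepts, not the real GStreamer API:

```python
# Conceptual sketch of GStreamer's element/pipeline model in plain Python.
# These classes are illustrative only; they are not the real GStreamer API.

class Element:
    """A media element: consumes a buffer and produces a transformed one."""
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform

    def process(self, buf):
        return self.transform(buf)

class Pipeline:
    """A chain of elements: each element's output feeds the next one."""
    def __init__(self, *elements):
        self.elements = elements

    def push(self, buf):
        for element in self.elements:
            buf = element.process(buf)
        return buf

# A toy pipeline with stand-ins for decoding, scaling and re-encoding.
decode = Element("decoder", lambda b: b.upper())   # pretend decoding
scale  = Element("scaler",  lambda b: b[:8])       # pretend downscaling
encode = Element("encoder", lambda b: b + "|enc")  # pretend encoding

pipeline = Pipeline(decode, scale, encode)
print(pipeline.push("raw-frame-bytes"))  # RAW-FRAM|enc
```

The key property the sketch captures is composability: any element that respects the buffer-in/buffer-out contract can be inserted anywhere in the chain.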

Our architecture, however, abstracts the GStreamer stack on top of the JBoss/Mobicents capabilities. This means that GStreamer media elements and pipelines can be managed through specialized Java stubs that can be used in the context of standard SIP and WWW Servlet applications. In other words, we bring the simplicity of WWW development into GStreamer. Besides, given that Mobicents may behave as an IMS Application Server, integration with mobile, VoIP and IPTV network environments is immediate.

B. Contribution: WebRTC support for GStreamer.

In the current state of the art it is not possible to use the WebRTC protocol stack within GStreamer, given that no media element provides the required capabilities. For this reason, another major contribution of our work is the creation of the appropriate enablers adding to GStreamer the capability of receiving and sending WebRTC streams. Given our above discussion, this is a clear advance over the state of the art, since it makes it possible to inject those streams into special-purpose pipelines, which can be managed in a simple and seamless manner through our media server. This opens a whole new spectrum of multimedia services for WebRTC applications, including support for flexible group communications, the integration of augmented reality, the use of computer vision for enhancing and personalizing services, etc.

III. WEBRTC TECHNOLOGIES

The current WebRTC architecture is depicted in Fig. 1. As can be observed, it is based on separating the signaling and media planes. Following the WebRTC philosophy, signaling is not part of the standards and its specific implementation is left to the application developer. Currently, different types of protocols are used for that purpose, with SIP and XMPP being the most popular. The objective of the signaling protocol is to make possible the negotiation of media formats and transport parameters between the two communicating end-points. This requires the exchange of SDP descriptions: first the offer and later the agreed answer. The details of this negotiation are out of the scope of this paper; the interested reader can find them in the JavaScript Session Establishment Protocol draft [5].
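The offer/answer idea can be sketched at a toy level: the answerer keeps, in the offerer's preference order, only the codecs both sides support. This is a drastic simplification of real SDP negotiation (which also covers transport parameters, ICE candidates and keys); the function name and codec sets are our own illustration:

```python
# Toy sketch of SDP offer/answer codec negotiation (not a real SDP parser).
# The answer lists, in the offerer's preference order, only the codecs
# that the answering side also supports.

def make_answer(offered_codecs, supported_codecs):
    """Return the codec list the answerer accepts."""
    return [codec for codec in offered_codecs if codec in supported_codecs]

offer = ["VP8", "H264", "H263"]              # offerer's preference order
answer = make_answer(offer, {"H264", "H263"})
print(answer)  # ['H264', 'H263']
```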

The WebRTC browser capabilities are exposed through an API [6] designed around two complementary concepts: PeerConnection and MediaStream.

A. PeerConnection

PeerConnection is the WebRTC component that handles communication of streaming data between peers. The capabilities of this component are exposed to developers through a JavaScript object. This object abstracts a large number of tedious details and complexities associated with the inner workings of video and audio, including packet loss concealment, echo cancellation, bandwidth adaptation, automatic gain control, ICE control for NAT traversal, etc.

B. MediaStream

The MediaStream represents the media plane of the WebRTC protocol stack and comes in two flavors: local and remote. Local MediaStreams are used as a handle for managing the audio and video captured locally (i.e. by the local peer's browser) from a webcam or microphone. A local MediaStream can be rendered through the standard HTML5 <video> tag.

In the same way, a remote MediaStream may carry video and audio channels and can be rendered using the <video> tag. However, in this case the media does not come from the local camera or microphone, but from the remote peer at the other end of the communication. From a protocol perspective, the stream traverses the network using the negotiated formats and ICE candidates. To secure the communication, SRTP is used, with DTLS as one of the possible key exchange mechanisms.

Figure 1. The current WebRTC architecture is based on the typical separation between signaling and media planes. The media plane is based on direct browser-to-browser secure RTP connections using ICE/STUN/TURN for NAT traversal. The signaling protocol is not specified, and the application developer can select her preferred option for creating it.

IV. KURENTO ARCHITECTURE FOR REAL-TIME COMMUNICATIONS

Once we have understood the basic concepts around WebRTC, we may present the media server solution into which we wish to integrate its capabilities: Kurento. Kurento is a Free Open Source Software initiative whose source code is available at http://code.google.com/p/kurento/. The Kurento signaling plane is based on Mobicents [7], an open source platform written on top of the JBoss Application Server. Developing multimedia applications based on Mobicents has a number of advantages, given that the underlying JBoss Microcontainer exposes to developers all the features of a professional and mature Java EE server infrastructure, including database connectivity, transactional capabilities, messaging, web services, ESB connectivity, security, clustering, seamless web integration, etc. Hence, the JBoss/Mobicents stack acts as an Application Server where application business logic may be created and deployed.

In addition, and given that Mobicents does not provide video capabilities, the Kurento media plane has been created independently, based on the rich multimedia features provided by the GStreamer project. The conceptual representation of the Kurento architecture can be seen in Figure 2, where the separation between the Kurento Signaling Server (KSS) and the Kurento Media Server (KMS) can be observed.

Figure 2. The architecture of Kurento is split into signaling and media planes. The former is based on the JBoss/Mobicents Java EE stack, while the latter has been built on top of the GStreamer media pipeline framework. Both communicate using Thrift RPCs for exchanging media control information. This architecture exposes the powerful media capabilities of GStreamer through the flexible and interoperable framework provided by Java EE technologies.

Observing that figure, we can understand precisely what the contributions of Kurento to the developer community are:

• First, a C++ wrapper for GStreamer, which exposes its capabilities (i.e. media elements and pipelines) through low-latency and efficient RPCs based on Thrift [8].

• Second, a number of Java proxies which consume such RPCs and which are embeddable into the Mobicents Application Server.

• Third, a number of extensions to the Mobicents Application Server enabling the management of the lifecycle of such proxies and their coordination in the context of Media Sessions.

• Fourth, a development framework based on such Media Sessions that simplifies the creation of applications combining WWW/SIP Servlets with the advanced multimedia processing capabilities of GStreamer.
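The proxy layering behind these contributions can be sketched as follows. All class and method names are hypothetical; in the real system the calls travel as Thrift RPCs from the Java proxies in the KSS to the KMS, whereas here the "RPC" is simulated by an in-process method dispatch:

```python
# Conceptual sketch of the proxy layering (assumed names, not Kurento's API):
# application code talks to a local proxy, which forwards each call as an
# RPC to the media server. The RPC transport is simulated in-process here.

class FakeMediaServer:
    """Stands in for the KMS: creates and populates pipelines on request."""
    def __init__(self):
        self.pipelines = {}
        self.next_id = 0

    def rpc(self, method, *args):
        if method == "create_pipeline":
            self.next_id += 1
            self.pipelines[self.next_id] = []
            return self.next_id
        if method == "add_element":
            pipeline_id, element = args
            self.pipelines[pipeline_id].append(element)
            return element

class MediaPipelineProxy:
    """Client-side stub: every call becomes an RPC to the media server."""
    def __init__(self, server):
        self.server = server
        self.pipeline_id = server.rpc("create_pipeline")

    def add_element(self, element):
        return self.server.rpc("add_element", self.pipeline_id, element)

kms = FakeMediaServer()
pipeline = MediaPipelineProxy(kms)
pipeline.add_element("webrtcbin")
pipeline.add_element("facedetector")
print(kms.pipelines[pipeline.pipeline_id])  # ['webrtcbin', 'facedetector']
```

The design point the sketch illustrates is that application code never touches GStreamer directly: it only manipulates proxies, which keeps the signaling server free of media-processing dependencies.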

Given this architecture, it is easy to understand that Kurento can be used in any type of application where the signaling is based on SIP or HTTP and the media is represented and transported in any of the protocols and formats supported by GStreamer. This makes Kurento an ideal candidate for the provision of advanced multimedia applications requiring more than peer-to-peer or media switching capabilities.

V. WEBRTC INTEGRATION INTO KURENTO

The integration of WebRTC into Kurento requires, as a first step, providing a WebRTC-capable media plane on top of GStreamer. For this reason, and as part of the research effort described in this paper, we have created a GStreamer media element providing such a capability. We have called it webrtcbin; in GStreamer, a bin is a special type of element that contains other elements and manages them. This component has also been open sourced and is available at http://code.google.com/p/kurento/source/checkout?repo=gst-plugins-webrtc. The webrtcbin is composed of four basic elements: nicesink, nicesrc, srtpprotect and srtpunprotect. Nicesink and nicesrc are elements provided by the libnice package; they are responsible for sending and receiving data to/from the remote client following the ICE protocol. Srtpprotect and srtpunprotect are the elements that handle security.

Figure 3. Architecture of the GStreamer webrtcbin component. This media element implements the protocol stack required for receiving and delivering WebRTC multimedia streams. Thanks to it, the Kurento Media Server is capable of combining the GStreamer pipeline architecture with the WebRTC capabilities.

These elements can also multiplex/demultiplex RTP and RTCP channels if they come within the same SRTP stream.
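The demultiplexing heuristic for such a bundled stream follows RFC 5761: the RTCP packet types in practical use (SR=200, RR=201, SDES=202, BYE=203, APP=204, plus FIR=192 and NACK=193) place the second byte of an RTCP packet in the range 192-223, a range left unused by RTP (payload types 64-95 are kept unassigned). A minimal classifier along those lines (our own sketch, not Kurento's code) looks like:

```python
# Sketch of RTP/RTCP demultiplexing for a bundled stream, following the
# heuristic standardized in RFC 5761: if the second byte of the packet
# falls in 192-223 it is an RTCP packet type; otherwise it is RTP.

def classify(packet):
    """Return 'rtcp' or 'rtp' for a raw (S)RTP/RTCP packet."""
    second_byte = packet[1]
    if 192 <= second_byte <= 223:
        return "rtcp"
    return "rtp"

rtcp_sr = bytes([0x80, 200, 0, 6])  # RTCP sender report header (PT=200)
rtp_vp8 = bytes([0x80, 96, 0, 1])   # RTP packet, dynamic payload type 96
print(classify(rtcp_sr), classify(rtp_vp8))  # rtcp rtp
```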

The current implementation works in bundle mode, which is the default in Chrome's WebRTC implementation. Bundle mode uses only one stream for all the media (audio, video and RTCP packets). In the future, this feature should be configurable, and no-bundle mode will be supported as well. This module has been tested against Chrome using VP8 as the video codec and Opus as the audio codec, using pre-existing GStreamer modules to encode and decode media.

VI. CREATING A SIMPLE WEBRTC APPLICATION WITH SERVER SUPPORT BASED ON WEBRTCBIN

For illustration and validation purposes, we explain how to create a simple demo application performing a media loopback, so that the streams the web client sends to the server are sent back to it. This application is not of special practical interest, but we include it because it makes it easy to understand the different elements and modules required for creating applications based on Kurento and WebRTC. This example also allows us to evaluate the complexity a developer needs to face when building such an application.

The client side of the application has been built on top of Chrome 25 and uses a signaling plane based on a minimal SIP implementation over a WebSocket transport created ad hoc for the experiment. As can be observed in Fig. 4, the Web application establishes a media session through the same sequence of API calls used to establish a peer-to-peer connection with a remote browser. This sequence involves invoking the appropriate primitives on the RTCPeerConnection JavaScript object exposed by the WebRTC API. In other words, from the perspective of the Web application developer, the Kurento stack is indistinguishable from another WebRTC client. Hence, application developers do not need to execute any special actions to communicate with our media server.

As Fig. 4 shows, once the call invitation has been received, the KSS has the opportunity of executing application logic that decides whether the call is accepted or not (more details about this are presented in the following section). If it is, the KMS is instructed, through a number of Thrift RPC calls, to create the appropriate GStreamer pipeline, which contains a webrtcbin capable of sending and receiving media. This bin uses the SDP offer received from the remote peer to initialize its media capabilities and generates the appropriate information for issuing an answer, which includes the supported media formats, the ICE candidates and the ciphering keys used for sending the local SRTP streams. Given this, the KMS logic is capable of building the answer SDP and delivering it as the return value of the Thrift RPC.

At this point, the KSS is able to create and issue the SIP OK message in response to the preceding INVITE. Upon reception, the Web client signaling plane delivers the remote SDP to the application, which uses it to assign the remote description and the remote ICE candidates to the local RTCPeerConnection object, enabling the WebRTC stack to initiate the media exchange. When the call is established, both the client- and server-side applications receive a signal, so that specific actions (such as rendering the media flows in the client or recording the exchanged media in the server) can be executed.

VII. CREATING CONVERGENT REAL-TIME MULTIMEDIA SERVICES

After this simple example, we can introduce more complex applications involving convergent scenarios. The creation of rich and convergent WebRTC applications based on Kurento requires developers to implement the server-side code defining the specific media processing logic. To start, a UA needs to be created and registered. UAs are server-side stateful objects that take responsibility for managing communication end-points. In other words, UAs are in charge of specific server-side SIP/HTTP URIs. Developers can subscribe event listeners to UAs. This makes it possible to provide the specific logic that will be executed when incoming calls arrive, when outgoing calls are issued, and when a call is established or terminated by a UA.

On each of these events, the UA provides its listeners a reference to a Call object. This object gives the application developer access to the low-level media stack through Joinable objects inspired by the Joinable interface introduced in JSR 309 [9]. The developer is able to instantiate different joinables such as recorders, transcoders, filters, etc., which correspond to the different media elements available in the GStreamer stack. Calls, in turn, also provide joinables representing their incoming and outgoing media streams. Given this, the application developer can create the media processing logic by joining the different media elements, so that a media pipeline performing the desired actions (i.e. adapt, augment, filter, record, transcode, etc.) on the media can be created.
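The join mechanism can be sketched conceptually as follows. The Joinable class and the element names are illustrative stand-ins for the JSR 309-inspired interfaces described above, not Kurento's actual API:

```python
# Conceptual sketch of the Joinable pattern (assumed names; the real
# interfaces are Java and are inspired by JSR 309's Joinable).

class Joinable:
    """Anything that can emit its media stream toward other joinables."""
    def __init__(self, name):
        self.name = name
        self.sinks = []

    def join(self, other):
        """Route this joinable's output into another joinable."""
        self.sinks.append(other)

    def describe(self):
        return [f"{self.name} -> {sink.name}" for sink in self.sinks]

incoming = Joinable("call-in")    # incoming stream of a Call
recorder = Joinable("recorder")   # a recorder media element
outgoing = Joinable("call-out")   # outgoing stream of the same Call

incoming.join(recorder)   # record what the caller sends
incoming.join(outgoing)   # and loop it back to the caller
print(incoming.describe())  # ['call-in -> recorder', 'call-in -> call-out']
```

Note that the two joins above reproduce the loopback-plus-recording scenario of the previous section: the same incoming stream feeds two sinks.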

A particularly interesting media object is the Mixer. The Mixer represents a GStreamer object capable of mixing media following a scheme defined by the specific type of mixer. Some of the currently available schemes include: audio mixing (full-duplex or half-duplex), audio mixing plus half-duplex video mixing, audio mixing plus grid video mixing (i.e. composing a video-wall grid from the individual incoming flows), audio mixing plus selecting the video channel associated with the strongest audio signal, etc. Mixers allow connecting clients through Ports, which are also joinable objects.
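The last scheme mentioned, selecting the video channel of the strongest audio signal, reduces to a simple selection rule. The sketch below is our own illustration with hypothetical participant data; the actual mixing is performed inside the GStreamer media plane:

```python
# Sketch of the "dominant speaker" mixing scheme: audio from everyone is
# mixed, while the broadcast video channel follows whoever is loudest.

def select_video_source(audio_levels):
    """Pick the participant whose audio signal is currently strongest."""
    return max(audio_levels, key=audio_levels.get)

levels = {"alice": 0.12, "bob": 0.87, "carol": 0.33}
print(select_video_source(levels))  # bob
```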

With this information in mind, we can understand how a rich convergent multimedia application can be created integrating WebRTC clients with SIP softphones or other types of videoconferencing applications on smartphones, tablets or desktop PCs. For example, if we want to build a group video chat among all those types of devices, we simply need to create a UA listening at the appropriate (SIP or HTTP) URIs, where clients will call to join the chat. That UA will instantiate a Mixer, which will be ready to combine the different incoming streams into a unified group call through the desired mixing scheme. Upon reception of a Call, the specific incoming-call listener simply needs to create the appropriate GStreamer media end-point capable of communicating with the calling client. This end-point will be based on our webrtcbin component when the caller is a WebRTC-capable browser. It will be a standard RTP/RTCP end-point when the caller is a traditional SIP phone. It may even be a Flash-based client, in which case we may use the RTMP GStreamer capabilities. Independently of the type of end-point, the important aspect is that all of them are joinables, and can be joined to the Mixer.

A very relevant aspect of the joinable mechanism is that Kurento provides a transparent transcoding mechanism capable of transforming the media from the source joinable format to the destination joinable format without requiring additional actions from developers. Mixers require media in raw format to be able to apply the mixing scheme. For this reason, joining to a mixer usually involves transcoding the source/mixed streams to/from raw. From a practical perspective, this means that the above-mentioned video chat application allows some clients to use, for example, VP8/Opus, while others use H.264/AMR and still others H.263/Speex, etc. In all cases, joining the downstream media flows to the mixer will involve transcoding them to raw, while joining the upstream media flows will involve transcoding the raw media provided by the mixer to the format expected by each individual call.
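The transparent-transcoding rule can be sketched as a small decision function. This is our own illustration with hypothetical element names: a decode/encode step is inserted only when the formats on the two sides of a join differ, and mixers always sit in the raw domain:

```python
# Sketch of the transparent-transcoding rule: when two joinables are
# joined, transcoding elements are inserted only if their formats differ.
# Element names ("link", "decode:X", "encode:X") are illustrative.

def join_path(src_format, dst_format):
    """Return the element chain needed to connect two media formats."""
    if src_format == dst_format:
        return ["link"]                       # direct connection, no work
    chain = []
    if src_format != "raw":
        chain.append(f"decode:{src_format}")  # bring the source to raw
    if dst_format != "raw":
        chain.append(f"encode:{dst_format}")  # leave raw for the sink
    return chain

print(join_path("VP8", "raw"))   # a VP8 caller joining a mixer
print(join_path("raw", "H264"))  # the mixed stream toward an H.264 caller
print(join_path("VP8", "VP8"))   # matching formats: no transcoding
```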

VIII. CONCLUSIONS

In this paper we have introduced Kurento: a media server technology compatible with WebRTC clients and capable of demonstrating how WebRTC applications can interoperate with other mobile and desktop real-time communication services in a seamless and simple way. We expect Kurento to contribute to the consolidation of the WebRTC ecosystem by showing a pathway toward more advanced and universal real-time communication services.

REFERENCES

[1] S. Loreto and S. P. Romano, "Real-Time Communications in the Web: Issues, Achievements, and Ongoing Standardization Efforts," IEEE Internet Computing, vol. 16, no. 5, pp. 68-73, 2012. doi: 10.1109/MIC.2012.115

[2] D. Lozano, L.A. Galindo, L. Garcia, "WIMS 2.0: Converging IMS and Web 2.0. Designing REST APIs for the Exposure of Session-Based IMS Capabilities," Next Generation Mobile Applications, Services and Technologies, 2008. NGMAST '08. The Second International Conference on , vol., no., pp.18-24, 16-19 Sept. 2008. doi: 10.1109/NGMAST.2008.97

[3] S. Islam, J.C. Grégoire, "Convergence of IMS and Web Services: A Review and a Novel Thin Client Based Architecture," Communication Networks and Services Research Conference (CNSR), 2010 Eighth Annual , vol., no., pp.221-228, 11-14 May 2010. doi: 10.1109/CNSR.2010.10

[4] L. Lopez-Fernandez, D. Gonzalez-Martinez, D.L. Llanos, C. Maestre-Terol, "AFICUS: An architecture for a future internet of User Generated Contents," Intelligence in Next Generation Networks (ICIN), 2011 15th International Conference on, vol., no., pp. 207-212, 4-7 Oct. 2011. doi: 10.1109/ICIN.2011.6081076

[5] J. Uberti, C. Jennings, “Javascript Session Establishment Protocol”. Internet Draft draft-uberti-rtcweb-jsep-02, Internet Engineering Task Force, Feb. 2012.

[6] A. Bergkvist, D.C. Burnett, C. Jennings and A. Narayanan, “WebRTC 1.0: Real-time Communications Between Browsers”, W3C Editor’s Draft 16, Jan. 2013.

[7] J. Deruelle, "JSLEE and SIP-Servlets Interoperability with Mobicents Communication Platform," Next Generation Mobile Applications, Services and Technologies, 2008. NGMAST '08. The Second International Conference on, pp. 634-639, 16-19 Sept. 2008. doi: 10.1109/NGMAST.2008.91

[8] M. Slee, A. Agarwal and M. Kwiatkowski, “Thrift: scalable cross-language services implementation”, Whitepaper, Facebook, 156 University Ave, Palo Alto, CA.

[9] T. Ericson, M. Brandt. JSR 309-Overview of Media Server Control API. Public Final Draft, Media Server Control API v1.0, 2009.

Figure 4. Sequence diagram showing the different components, messages and interactions involved in a WebRTC application using the Kurento infrastructure and the webrtcbin GStreamer module.