An Extensible Framework for Constructing SIP User Agents

Robert M. Arlein and Vijay K. Gurbani

A Session Initiation Protocol (SIP) user agent is an endpoint in a signaling network that can send or receive SIP messages. One can build a functional user agent in a few hundred lines of Java* code that sets up a call between two SIP phones. However, such a user agent will not fully comply with the protocol. Writing a compliant user agent is a complex undertaking involving thousands of lines of code. The iSURF framework greatly reduces the effort of this undertaking. iSURF uses a SIP transaction library called siptrans as its transaction processing layer. However, iSURF can use a different transaction library, and siptrans can be used in a different framework or even in a SIP proxy. In this paper, we describe the protocol requirements for a SIP user agent and how our framework facilitates building such an agent. We also describe the design and architecture of both iSURF and siptrans. © 2004 Lucent Technologies Inc.

Introduction

Session Initiation Protocol (SIP) [4] is an application layer signaling protocol that can be used for many purposes. For example, it may be used to set up a multimedia session among several participants, it may be used as a bridge to legacy networks, or it may be used to configure or implement various telephony services. There are two major SIP entities: a proxy and a user agent. User agents are signaling endpoints, while proxies aid in the rendezvous of two SIP user agents. Most SIP-based applications require specialized user agents. A user agent may be built from scratch with relative ease. In fact, one of the authors built the third-party call control application discussed in the section “An iSURF API Example” with just 500 lines of Java* code. However, this application makes many simplifying assumptions that make the application non-SIP compliant. For example, the application assumes that all the requests it sends are received. Building a SIP-compliant application requires much more code.

The solution is to write user agents with a SIP library and an application programming interface (API) that encapsulate many of the details of the SIP protocol. There are two such standard APIs for Java: JAIN* SIP API [2] and JAIN SIP lite [6]. Both of these APIs are relatively low level and message based.

There are no such standard C/C++ frameworks for building user agents. Some of the available nonstandard frameworks [5, 7] do not have a user-friendly API (in our opinion) and, furthermore, none of these frameworks could be integrated easily with our lower-level, transaction processing framework, siptrans. The siptrans framework is both stable and fully standards compliant. We are not sure whether the other frameworks are also stable and standards compliant. For these reasons, we built our own framework, iSURF (iSIP user agent resource framework), over the siptrans framework. (iSIP is a proxy from which the siptrans layer was extracted.) iSURF provides a low-level API with configurable, higher-level routines (plug-ins) to handle common processing, such as authentication. Although iSURF runs over siptrans, it can be modified to run over similar frameworks. Likewise, other higher-level frameworks can run over siptrans.

Bell Labs Technical Journal 9(3), 87–100 (2004) © 2004 Lucent Technologies Inc. Published by Wiley Periodicals, Inc. Published online in Wiley InterScience (www.interscience.wiley.com). • DOI: 10.1002/bltj.20044

Panel 1. Abbreviations, Acronyms, and Terms

API: Application programming interface
ASCII: American Standard Code for Information Interchange
ABNF: Augmented Backus-Naur form
DNS: Domain name server
HTTP: Hypertext Transfer Protocol
iSIP: Lucent’s SIP proxy
iSURF: iSIP user agent resource framework
MIME: Multipurpose Internet mail extensions
SIP: Session Initiation Protocol
SRV: Service, a type of DNS record
TCP: Transmission Control Protocol
TLS: Transport layer security
TU: Transaction user
UA: User agent
UCS: Universal Character Set
UDP: User Datagram Protocol
URI: Uniform resource identifier
UTF-8: UCS Transformation Format, an 8-bit lossless encoding of Unicode characters

The rest of this paper describes the iSURF architecture and API, illustrates the API with a sample application, and describes our implementation including siptrans.

iSURF Architecture

Figure 1 depicts the reference SIP stack as outlined in RFC 3261 [4]. It also shows which layers of the reference stack are realized by the two frameworks we discuss in this paper. Note that, as is the case with reference models, the cleanly separated layers often have interdependencies when realized in software. For example, even though the reference model depicts the lowest layer as a syntax/encoding layer, a SIP request must pass through the transport layer before it can reach the syntax/encoding layer. That is, the transport layer must have allocated appropriate resources (e.g., socket descriptors, internal buffers associated with these sockets) and recognized the boundaries of the SIP message before sending it to the syntax/encoding layer.

Working from the bottom, the first layer is the syntax and encoding layer. SIP is an American Standard Code for Information Interchange (ASCII) protocol; its syntax is defined by the augmented Backus-Naur form (ABNF) in [4]. Unlike traditional telephony signaling protocols that are encoded and transported in abstract syntax notation, no special encoding/decoding software is required for SIP. A SIP stack normally provides routines for parsing the UTF-8 encoded ASCII stream into internal data structures for consumption by the layers above.

[Figure 1. A SIP stack hierarchy. The stack layers, from top to bottom, are transaction user, transaction, transport handling, and syntax/encoding. The figure labels the entities stateless proxy, UAS, UAC, redirect, registrar, transaction-/call-stateful proxy, and B2BUA, and marks the layers realized by iSURF and by siptrans. B2BUA: back-to-back user agent; UAC: user agent client; UAS: user agent server.]

The transport handling layer defines how clients and servers behave with respect to the multiple transports supported by SIP. Unlike the Hypertext Transfer Protocol (HTTP), which runs over Transmission Control Protocol (TCP) and transport layer security (TLS)-over-TCP, SIP has been defined to work over TCP, TLS-over-TCP, User Datagram Protocol (UDP), and Stream Control Transmission Protocol. Transport plurality is thus an additional facet of a SIP stack. The transport layer isolates its idiosyncrasies from the upper layers. For instance, the transport layer will choose to transmit a SIP message over a congestion-controlled transport, such as TCP, if it determines that the message is so large that sending it over UDP will fragment it. The transport layer will place only a single SIP message into a UDP datagram, as specified by the standard.

The next layer, the transaction layer, is present in all SIP entities except stateless proxies. Transactions are a fundamental concept in SIP; a transaction consists of a request sent by a client, followed by zero or more provisional responses and one or more final responses. The ACK request is considered part of the INVITE transaction if the final response is a non-2xx response; otherwise, it is considered a separate transaction. The transaction layer is also responsible for matching responses to appropriate requests and for scheduling the retransmission of requests and responses over unreliable transports.

Siptrans implements the transaction layer and transport layer. iSURF is built over siptrans and implements the transaction user (TU) layer. As Figure 2 depicts, iSURF partitions this layer into two pieces: the iSURF core and the application. The application builds and sends outgoing messages. The application can delegate the processing of some incoming messages to plug-ins. The iSURF core notifies the application’s event handlers of incoming messages as well as other events. The iSURF core, which maintains state for calls and dialogs in structures called contexts, is described later. The iSURF API helps the application build messages in call or dialog contexts. The rest of this section describes this architecture in more detail.

Transaction/Transport Layers

The transport layer works as described above and is responsible for sending and receiving messages over the network. The transaction layer manages SIP transactions, which consist of a SIP request and the resulting responses. To this end, it maintains a state machine whose inputs are incoming SIP messages, outgoing SIP messages, and timeout indications. In addition to maintaining state, the transaction layer does the following:

• Retransmits outgoing requests until the receipt of a response or until a timeout.

• Handles duplicate incoming requests by sending cached responses.

• Passes messages from the TU to the transport layer.

• Passes messages from the transport layer to the TU.

• Generates ACKs for incoming non-2xx class responses to INVITE transactions.

• Rejects badly formed incoming messages.

• Passes validated incoming messages to the TU.
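The duplicate-handling duty in the list above can be sketched as a small table of server transactions. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not part of siptrans, and the transaction key is simplified to a single string (RFC 3261 matches on more than the Via branch parameter).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch, not the siptrans implementation.
class ServerTransactionTable {
public:
    enum class Action { PassToTU, ResendCached, Absorb };

    // Called for every request arriving from the transport layer.
    Action onRequest(const std::string& key, std::string* cached) {
        assert(cached != nullptr);
        auto it = responses_.find(key);
        if (it == responses_.end()) {
            responses_[key] = "";  // new transaction, no response yet
            return Action::PassToTU;
        }
        if (it->second.empty()) return Action::Absorb;  // duplicate while still pending
        *cached = it->second;  // duplicate after a response: replay it
        return Action::ResendCached;
    }

    // Called when the TU hands down a response for the transaction.
    void onResponseFromTU(const std::string& key, const std::string& response) {
        responses_[key] = response;
    }

private:
    std::map<std::string, std::string> responses_;  // transaction key -> last response
};
```

Only the first copy of a request reaches the TU; later copies are either absorbed or answered from the cache, so the TU never has to detect retransmissions itself.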

The TU Layer

The TU layer is responsible for responding to valid incoming messages, rejecting invalid messages, and generating fresh requests. To do this properly, this layer must maintain dialog state and other states relating to message payloads. As noted above, these responsibilities are shared between the application and the user agent (UA) core. The core’s responsibilities are:

• Maintaining dialog and call state.

• Notifying the application of various timeouts and possibly responding to a timeout autonomously. For example, if directed by the application, iSURF will retransmit successful responses to INVITEs until the arrival of an ACK.

• Processing incoming messages delegated to iSURF through registered plug-ins.

The application’s responsibilities are:

• Handling incoming messages within event handlers that the application registers with the iSURF core.

• Delegating all or some of the processing of incoming messages to plug-ins. The application provisions these plug-ins as well. For example, the application can register a plug-in to handle incoming OPTIONS requests.

• Originating outgoing requests and responses.

• Maintaining state, other than the state maintained by the framework. For example, the application maintains media state.

iSURF API

While there is a place for a higher-level interface that abstracts away the complications of SIP, such interfaces cannot be used to build an arbitrary SIP user agent. These interfaces are useful as a supplement to a lower-level interface. The iSURF API, like the SIP standard, is basically message based. iSURF’s plug-in facility may be used to elevate the level of the iSURF API by providing standard routines that the application registers and provisions.

The iSURF API provides the following facilities (the important facilities are discussed in more detail later in this section):

• An object-oriented message representation including constructors for building messages based on other messages or contexts, e.g., building a response from a request.

• Registration of event handlers to handle notifications of incoming messages, timeouts, and other events.

• Methods to send a message or repeatedly send a message at intervals, e.g., retransmitting an INVITE response or refreshing a REGISTER.

• An interface to configure the iSURF runtime environment.

• Application controlled timers.

• A mechanism for the application to store and retrieve data from messages or contexts. We show how this facility is used in the section “An iSURF API Example.”

• Registration of plug-ins and support for registered plug-ins.

[Figure 2. iSURF architecture. The TU layer is divided into an application layer (application main, event handlers) and an iSURF core layer (dispatcher, plug-ins acting as standard event handlers, contexts), which sits above the transaction/transport layer. The labeled flows are: the application initializes the framework and sends outgoing messages; incoming messages and other events rise from the transaction/transport layer to the dispatcher; the dispatcher updates the context with each message, lets a plug-in handle the message if the plug-in is active, and delivers events, including incoming messages, to the event handlers. iSIP: Lucent's SIP proxy; iSURF: iSIP user agent resource framework; TU: transaction user.]

Events and Event Handlers

The iSURF runtime environment is basically event driven. The application initializes the framework and then optionally sends one or more SIP requests. Subsequent incoming messages and other events are then delivered to the application’s event handlers. There are several categories of events: incoming messages, various timeouts, notifications of plug-in activity, notifications of context creation and destruction, and notifications of subscription creation and destruction. Every SIP message that shares the same Call-ID header is part of the same call context. Roughly, messages in the same call context that also share From and To headers are part of the same dialog context. (Contexts are discussed in a later section.) The application can infer context creation and context destruction from the incoming and outgoing messages. However, the rules for context creation and destruction are quite complicated. Notifying the application of these events can simplify the application.

Every iSURF application registers a single global event handler. In addition, iSURF allows context-scoped event handlers, i.e., handlers for messages in the same context. We have not seen mention of context-scoped event handlers in the JAIN [2] or VOCAL [7] application frameworks. This facility can better organize an application when message treatment is a function of message context, as is often the case. It is the case with third-party call control as illustrated in the “An iSURF API Example” section.

If a context-scoped event handler is active for an incoming message, that handler is invoked for the message rather than the global event handler. If a message can be handled by both a dialog-scoped and a call-scoped event handler, only the dialog-scoped event handler is invoked.
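The precedence rule just described (dialog-scoped handler first, then call-scoped, then global) can be sketched as follows. All names here are hypothetical and chosen for illustration; this is not the iSURF dispatcher, and the dialog identifier is reduced to an opaque string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical types; none of these names come from the iSURF API.
struct IncomingMessage {
    std::string callId;    // value of the Call-ID header
    std::string dialogId;  // empty if the message is outside any dialog
};

using Handler = std::function<std::string(const IncomingMessage&)>;

class Dispatcher {
public:
    explicit Dispatcher(Handler global) : global_(std::move(global)) {
        assert(global_);  // a global handler is always required
    }

    void setCallHandler(const std::string& callId, Handler h) { call_[callId] = std::move(h); }
    void setDialogHandler(const std::string& dialogId, Handler h) { dialog_[dialogId] = std::move(h); }

    // Invokes exactly one handler: dialog-scoped, else call-scoped, else global.
    std::string dispatch(const IncomingMessage& m) const {
        if (!m.dialogId.empty()) {
            auto d = dialog_.find(m.dialogId);
            if (d != dialog_.end()) return d->second(m);
        }
        auto c = call_.find(m.callId);
        if (c != call_.end()) return c->second(m);
        return global_(m);
    }

private:
    Handler global_;
    std::map<std::string, Handler> call_;
    std::map<std::string, Handler> dialog_;
};
```

Because the most specific handler wins, an application can register one handler per call leg and keep the global handler for messages that belong to no known context.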

Message Representation

iSURF presents an object-oriented representation of SIP messages and message components that models the RFC 3261 grammar. The application never directly accesses object data but uses member functions to manipulate the data. An application never needs to free these objects explicitly; the memory management scheme is described below. The iSURF message constructors simplify building messages. When a message is built from scratch, constructor arguments contain some of the mandatory message headers for the message. The other mandatory headers are derived. For example, a constructor for an OPTIONS request takes a source and destination uniform resource identifier (URI) as arguments and uses these arguments to build the Request-URI, From, and To headers. The constructor also fills in the Via, Max-Forwards, Call-ID, and CSeq header fields. When the message is not built from scratch, the message headers are inherited from another message or context. For example, a response may be constructed from a request, in which case the From header, To header, CSeq header, Via headers, and Call-ID header are copied from the request to the response.

When an application sends or receives a message, iSURF only checks those elements of the message that relate to maintaining context information. For performance reasons, iSURF does not run other checks. For example, iSURF checks the number in the CSeq header of messages in dialogs, but does not check to ensure that the Event header is not present in an INVITE request.

Every SIP message is represented as an instance of the Message class. This class has two subclasses: Request and Response. Furthermore, for each request method there is a subclass of Request such as Invite, Ack, or Options. A request whose method is unknown to the implementation is simply represented as an instance of the base Request class. The Message class is a subclass of the MemoryWrapper class, which handles the memory management for instances of Message. This mechanism is described later. The Message class contains methods for manipulating the headers and other components of a message.

For each known header type there is an iSURF class such as FromHeader and ToHeader. iSURF provides methods to manipulate the various header components. Header types unknown to the implementation are represented by the Header class, which represents such headers simply as a header name and header value.

iSURF represents message content and the associated content-related headers as the Payload class and so deviates from the standard SIP representation in which the content-related headers are not grouped separately. With this scheme, the Content-Length header is never specified but is derived from the length of the string representation of the payload and, therefore, there can never be a discrepancy between the actual payload length and the length stated in Content-Length. The Payload class represents the actual payload as a string. Subclasses of Payload provide a richer representation than a string for Session Description Protocol (SDP) and for multi-part MIME attachments. A multi-part MIME attachment is simply represented as a list of Payloads.
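The idea of deriving Content-Length rather than storing it can be illustrated with a minimal sketch. The class below is a hypothetical stand-in written for this example, not the iSURF Payload class, and it handles only a single content type header.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the idea behind Payload; not the iSURF class.
class PayloadSketch {
public:
    PayloadSketch(std::string contentType, std::string body)
        : contentType_(std::move(contentType)), body_(std::move(body)) {
        assert(!contentType_.empty());
    }

    void setBody(std::string body) { body_ = std::move(body); }

    // Content-Length is computed at serialization time and never stored,
    // so it cannot disagree with the actual body length.
    std::string serialize() const {
        return "Content-Type: " + contentType_ + "\r\n" +
               "Content-Length: " + std::to_string(body_.size()) + "\r\n" +
               "\r\n" + body_;
    }

private:
    std::string contentType_;
    std::string body_;
};
```

Changing the body through `setBody` needs no matching header update, which is the property the paragraph above describes.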

Message Resending Mechanisms

iSURF provides a mechanism to periodically resend the same message. For example, iSURF can repeatedly resend a 2xx INVITE response at appropriate (exponentially increasing) intervals until the response has been acknowledged. Similarly, iSURF can periodically send REGISTER requests or SUBSCRIBE renewals.
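As a rough illustration of the resend intervals for a 2xx INVITE response, the sketch below follows the schedule in RFC 3261: the wait starts at T1, doubles after each retransmission, is capped at T2, and retransmission stops after 64*T1 without an ACK. The function name is hypothetical, and the defaults (T1 = 500 ms, T2 = 4 s) are the RFC 3261 defaults rather than values taken from iSURF.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

using std::chrono::milliseconds;

// Returns the waits between successive retransmissions: start at t1, double
// each time, cap at t2, and stop once the total would exceed 64 * t1.
std::vector<milliseconds> retransmitSchedule(milliseconds t1 = milliseconds(500),
                                             milliseconds t2 = milliseconds(4000)) {
    assert(t1.count() > 0 && t2 >= t1);
    const milliseconds giveUp = 64 * t1;
    std::vector<milliseconds> waits;
    milliseconds elapsed(0);
    for (milliseconds wait = t1; elapsed + wait <= giveUp; wait = std::min(2 * wait, t2)) {
        waits.push_back(wait);
        elapsed += wait;
    }
    return waits;
}
```

A real implementation would arm a timer for each wait and cancel the remaining ones as soon as the ACK arrives; the sketch only computes the schedule.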

State/Contexts

iSURF maintains dialog and call context state. RFC 3261 describes dialog state in detail. Dialogs are established after a provisional response or a successful final response to an INVITE request. Extensions to the standard describe other means of establishing a dialog. For example, a successful response to a SUBSCRIBE request establishes a dialog. Once a dialog has been established, iSURF updates dialog state as additional messages in the dialog are encountered. Certain messages terminate a dialog. For example, a dialog that has been established by an INVITE may be terminated by a BYE, or certain responses to re-INVITEs or to re-SUBSCRIBEs in a dialog will terminate the dialog. Dialogs that have not been confirmed by a final response are terminated by a subsequent failure response.

The SIP standard does not mention call context explicitly. However, in iSURF, a call context is established on processing a request with a new Call-ID value. Every message with this Call-ID is in the same call context. If a response to that request establishes a dialog, the call context terminates when every dialog associated with the request terminates; otherwise, the call context terminates when the transaction associated with the request terminates.
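The lifetime rule for a call context that has established dialogs can be sketched with a table keyed by Call-ID. The names below are hypothetical, dialog identifiers are treated as opaque strings, and the transaction-based termination path (for calls that never establish a dialog) is left out of this sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch of call-context bookkeeping; not the iSURF core.
class CallContextTable {
public:
    // A request with a new Call-ID opens a call context.
    void callStarted(const std::string& callId) {
        assert(!callId.empty());
        dialogs_[callId];  // creates an empty dialog set if absent
    }

    void dialogEstablished(const std::string& callId, const std::string& dialogId) {
        dialogs_[callId].insert(dialogId);
    }

    // Returns true when removing this dialog also ends the call context,
    // i.e. when it was the last dialog associated with the call.
    bool dialogTerminated(const std::string& callId, const std::string& dialogId) {
        auto it = dialogs_.find(callId);
        if (it == dialogs_.end()) return false;
        it->second.erase(dialogId);
        if (!it->second.empty()) return false;
        dialogs_.erase(it);
        return true;
    }

    bool hasCall(const std::string& callId) const { return dialogs_.count(callId) != 0; }

private:
    std::map<std::string, std::set<std::string>> dialogs_;  // Call-ID -> active dialogs
};
```

The return value of `dialogTerminated` is the natural point at which a framework could raise a context-destruction notification to the application.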

Applications can use contexts in several ways. iSURF simplifies building messages in dialog contexts and, as explained in the “Events and Event Handlers” section, the application can use context-scoped event handlers to improve its organization. As another convenience, the application can store application data in a context and retrieve the data when processing a message in the context. Finally, the application can access the context state itself. For example, the application can access the list of active dialogs established with a call.

As mentioned previously, for performance reasons, iSURF only checks that incoming or outgoing messages do not violate state constraints such as:

• A response must match a request.

• Requests in a dialog must be in order.

• It is invalid to send a re-INVITE before receiving confirmation of the previous INVITE.

The application is responsible for all other message validation including the validation described in Section 8 of RFC 3261. The application can delegate some of this responsibility by configuring and registering plug-ins to validate messages.

Plug-ins

Plug-ins are iSURF procedures that are configured and registered by the application to partially or completely handle incoming messages. As with incoming event handlers, plug-ins may be registered globally or in the scope of a dialog or call. We now describe the types of plug-ins and their order of execution. Multiple plug-ins may be executed for a single message. The first plug-in that can be executed for an incoming message checks the message syntax using the procedures described in Section 8 of RFC 3261. The application configures this plug-in by indicating the methods it recognizes, indicating the headers it recognizes, and so forth. If this plug-in rejects a message, iSURF invokes the application’s “plug-in rejects” event handler and none of the application’s incoming message handlers are invoked for this message.

Otherwise, if the message is a request and if an authentication verification plug-in is active, it is invoked. The application configures this plug-in with a policy that indicates which requests to challenge. If this plug-in challenges the request, iSURF invokes the application’s “plug-in challenges request” event handler and none of the application’s incoming message handlers are invoked.

Otherwise, at most one plug-in is now invoked for the message. These final plug-ins are specified for requests by the request method or specified for responses by the response code. For example, there is a plug-in to handle OPTIONS requests and a plug-in to handle 301-Moved Permanently responses. If a plug-in is invoked at this time, then none of the application’s incoming event handlers are executed for the message. After the plug-in executes, the application is notified by either the “plug-in rejects message” or the “plug-in handles message” event.
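The order of plug-in execution described above can be summarized in a short sketch: syntax check first, then the authentication check for requests, then at most one method- or code-specific plug-in, and only then the application's own handlers. Every name below is hypothetical, and the sketch simplifies the final stage by reporting only "handled" where iSURF distinguishes "rejects" from "handles".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical message summary; not an iSURF type.
struct Msg {
    bool isRequest;
    std::string key;  // request method, or response code as text
};

enum class Outcome { SyntaxRejected, Challenged, HandledByPlugin, PassedToApplication };

struct PluginChain {
    std::function<bool(const Msg&)> syntaxOk;       // empty means not registered
    std::function<bool(const Msg&)> mustChallenge;  // empty means not registered
    std::map<std::string, std::function<void(const Msg&)>> finalPlugins;

    Outcome process(const Msg& m) const {
        // 1. Syntax plug-in: a rejection stops all further processing.
        if (syntaxOk && !syntaxOk(m)) return Outcome::SyntaxRejected;
        // 2. Authentication plug-in: applies to requests only.
        if (m.isRequest && mustChallenge && mustChallenge(m)) return Outcome::Challenged;
        // 3. At most one plug-in keyed by method or response code.
        auto it = finalPlugins.find(m.key);
        if (it != finalPlugins.end()) {
            assert(it->second);  // a registered plug-in must be callable
            it->second(m);
            return Outcome::HandledByPlugin;
        }
        // 4. No plug-in claimed the message: the application's handlers run.
        return Outcome::PassedToApplication;
    }
};
```

In every outcome except the last, the application's incoming message handlers are bypassed, which matches the behavior described in the three paragraphs above.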

Memory Management

iSURF seeks to reduce the onerous task of memory management by automatically managing the memory for the following objects: messages, message components, and contexts. All of these objects are represented as subclasses of the MemoryWrapper class, which contains a pointer to the data for these objects including a reference count. This single pointer is the only data member in the various subclasses. Copy constructors, regular constructors, assignment operators, and destructors for MemoryWrapper are designed to implement a reference counting scheme as well as copy-by-reference. Constructing an instance of one of these subclasses results in allocating data for the instance, storing a pointer to the allocated data in MemoryWrapper, and setting the reference count in the allocated data to one. Copying a MemoryWrapper instance is implemented by a pointer copy and an increment of the reference count. Destroying an instance decreases the reference count for the instance. Memory is freed when the reference count goes to zero. Strings or lists are represented using the C++ standard template library, which automatically manages the memory for these types.
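The reference counting scheme described above can be sketched as a small handle class. This is an illustration written for this discussion, not the MemoryWrapper source: the class name and members are hypothetical, and the count is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Illustrative only: a handle whose sole data member is a pointer to shared,
// reference-counted data. Copies share the data (copy-by-reference).
template <typename T>
class RefHandle {
    struct Shared {
        T value;
        std::size_t refs;
    };
    Shared* shared_;  // the only data member

    void release() {
        assert(shared_->refs > 0);
        if (--shared_->refs == 0) delete shared_;  // last owner frees the data
    }

public:
    // Construction allocates the data and sets the count to one.
    explicit RefHandle(T value) : shared_(new Shared{std::move(value), 1}) {}

    // Copying is a pointer copy plus an increment of the count.
    RefHandle(const RefHandle& other) : shared_(other.shared_) { ++shared_->refs; }

    RefHandle& operator=(const RefHandle& other) {
        if (shared_ != other.shared_) {
            release();
            shared_ = other.shared_;
            ++shared_->refs;
        }
        return *this;
    }

    // Destruction decrements the count and frees the data at zero.
    ~RefHandle() { release(); }

    T& get() { return shared_->value; }
    const T& get() const { return shared_->value; }
    std::size_t useCount() const { return shared_->refs; }
};
```

Since C++11 the standard library offers the same ownership model through `std::shared_ptr`, which postdates the paper.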

An iSURF API Example

We describe a sample application that demonstrates context-scoped event handlers and building messages with the iSURF API. The example application is a server that receives requests to set up SIP sessions between two parties and, thus, implements third-party call control. For each request, the application sends an INVITE without a session descriptor to user A. User A’s response contains a session descriptor, offering one or more choices for the media protocol for the session. The application does not ACK user A’s response at this time but sends an INVITE to user B with a copy of user A’s session descriptor. User B’s subsequent response contains a session descriptor that answers the offer from the INVITE. The application then ACKs user B’s response and ACKs user A’s earlier response. The application attaches the session descriptor from user B’s response to the ACK sent to user A. Media now flows between the two parties. When either party wants to end the call, it sends a BYE to the application. After responding to the BYE, the application sends a BYE to the other endpoint. Alternatively, the application can terminate the call. We illustrate the call flow for our application in Figure 3. The application terminates the call in this illustration.

The application models a call with the Call class. The Call constructor sends out an INVITE to party A and registers an instance of ALegHandler for this call leg. Upon receipt of a 200-OK response, ALegHandler::incomingInviteResponse sends an INVITE to party B and registers an instance of BLegHandler for the call leg between the application and party B. On receipt of a 200-OK response from party B, BLegHandler::incomingInviteResponse acknowledges this response and then acknowledges party A’s previous response. Both event handlers contain pointers to a Call instance and contain functions to handle BYE requests and responses. Panel 2 contains most of the application code. It does not show the BYE handling code or the code for failure responses.

Implementation

The iSURF implementation is based on siptrans, which we discuss next. The current iSURF implementation has the standard methods (INVITE, ACK, BYE, REGISTER, OPTIONS, and CANCEL) as well as SUBSCRIBE and NOTIFY. iSURF runs on the Solaris,* Linux,* Windows,* and WinCE platforms. After the siptrans discussion, we discuss how iSURF is built over siptrans, iSURF’s performance overhead, and how to extend iSURF with additional methods.

Siptrans

Siptrans is the transaction processing engine for iSURF. Every SIP entity, with the exception of a stateless proxy, requires a transaction manager. The siptrans framework provides such a transaction manager in the form of a multithreaded C library suitable for being linked in with higher-layer applications (like TUs). With reference to Figure 1, siptrans provides the capabilities of the bottom three layers.

Siptrans provides an API consisting of eight methods and three interfaces that allows TUs to use it for the services it offers. These interfaces and methods are reproduced in Table I.

In SIP, proxies are pure transaction processing entities; thus, it appears reasonable that a transaction processing library be culled from the internals of a SIP proxy. Siptrans is refactored from a commercial grade, RFC 3261-compliant [4] proxy developed by Lucent Technologies. The proxy is resident on the PacketIN® application hosting environment [1]. During its development cycle and as part of an ongoing effort to be compliant to the SIP specification, the proxy has participated in six SIP interoperability events. Thus, it was a design goal of siptrans to inherit, as much as possible, the tested functionality of the proxy.

[Figure 3. Call flow for iSURF API example. Participants: User A, Application, User B. Messages shown, in the order given by the accompanying text: Application to User A, INVITE (payload = 0); User A to Application, 200-OK response (payload = A); Application to User B, INVITE (payload = A); User B to Application, 200-OK response (payload = B); Application to User B, ACK (payload = 0); Application to User A, ACK (payload = B); media between User A and User B; BYE from the Application to each user, each answered by a 200-OK response.]

Panel 2. iSURF example

```cpp
// iSURF declarations (Response, Invite, Ack, SipStack, and so on) are
// assumed to be in scope.
struct Call;

class ALegHandler : public ContextEventHandler {
    Call *call;
public:
    ALegHandler(Call *call) { this->call = call; }
    void incomingInviteResponse(const Response resp);
    void incomingByeRequest(const Bye);
};

class BLegHandler : public ContextEventHandler {
    Call *call;
public:
    BLegHandler(Call *call) { this->call = call; }
    void incomingInviteResponse(const Response resp);
    void incomingByeRequest(const Bye);
};

struct Call {
    Response aLegResponse;
    string aUser; string bUser;
    string aHost; string bHost;
    ALegHandler a;
    BLegHandler b;

    void sendInvite(string fromUser, string toUser, string host, Payload p,
                    ContextEventHandler *handler) {
        SipAddress sender(SipStack::getMyURI(fromUser));
        SipAddress receiver(SipURI(toUser, host));
        FromHeader from(sender);
        ToHeader to(receiver);
        StandardContactHeader contact(sender);
        Invite invite(to, from, contact, p);
        SipStack::sendMessage(invite);
        CallContext call(invite);
        call.setEventHandler(handler);
    }

    Call(string auser, string ahost, string buser, string bhost)
        : a(this), b(this) {
        aUser = auser; bUser = buser; aHost = ahost; bHost = bhost;
        // Sends an Invite to party A, sets "a" as event handler for the call
        sendInvite(bUser, aUser, aHost, Payload(), &a);
    }
};

void ALegHandler::incomingInviteResponse(const Response resp) {
    // Ignore anything that is not a 2xx response
    if (resp.getResponseCode() < 200 || resp.getResponseCode() >= 300) return;
    call->aLegResponse = resp;
    // Send an Invite to party B, sets "b" as event handler for this call
    call->sendInvite(call->aUser, call->bUser, call->bHost, resp.getPayload(),
                     &call->b);
}

// Ack the response from B, then Ack the previous response from A.
// Attach B's payload to the Ack sent to A.
void BLegHandler::incomingInviteResponse(const Response resp) {
    // Ignore anything that is not a 2xx response
    if (resp.getResponseCode() < 200 || resp.getResponseCode() >= 300) return;
    Ack ack(resp);
    SipStack::sendMessage(ack);
    ack = Ack(call->aLegResponse);
    ack.setPayload(resp.getPayload());
    SipStack::sendMessage(ack);
}
```

Table I. Siptrans interfaces.

| Name | Interface or method | Description |
|---|---|---|
| siptrans_init_core | Method | Initializes the siptrans complex. |
| siptrans_version_get | Method | Returns the version of the siptrans library. |
| siptrans_log_set_level | Method | Sets the logging level for the siptrans library. |
| siptrans_log | Interface | Implements the logging capabilities specific to the TU. |
| siptrans_sip_message_to_TU | Interface | Invoked by the siptrans library to send a SIP message to the TU core. |
| siptrans_dispatch_sip_message | Method | Called by the TU to send a SIP message (request or response) to siptrans for further delivery. |
| siptrans_schedule_alarm | Method | Called by the TU to schedule an alarm. |
| siptrans_alarm_to_TU | Interface | Invoked by the siptrans library when an alarm set by the TU needs to be fired. |
| siptrans_start | Method | Called by the TU to start the siptrans library. |
| siptrans_wait | Method | Called by the TU to keep the calling thread active. |
| siptrans_stop | Method | Called by the TU to stop the siptrans library. |

SIP: Session Initiation Protocol; TU: Transaction user

Siptrans alleviates much of the drudgery associated with parsing, transport management, and transaction handling. It also provides a general purpose alarming facility for the TU. Once initialized, it accepts requests or responses either from its controlling TU or from the network. Further handling of the SIP request (or response) depends on who initiated it, the controlling TU or a peer across the network. This is described in the next section. Siptrans fully implements the client state machines corresponding to the INVITE and non-INVITE requests (Figures 5 and 6, respectively, of RFC 3261 [4]) as well as the server state machines corresponding to the INVITE and non-INVITE requests (Figures 7 and 8, respectively, of RFC 3261 [4]).

Siptrans and request handling. A request can arrive

at siptrans from one of two sources: a peer across

the network or the TU. For example, siptrans relays a

request from the network to the TU or siptrans relays

a request from the TU to its intended destination on

the network.

• Requests arriving from the network. If a request arrives at siptrans from the network, siptrans parses it and creates a transaction if none exists. The parsed request is subsequently passed to the TU through the Siptrans_sip_message_to_TU interface. This interface is provided by the TU programmer and invoked by siptrans on receiving a SIP request (or response). Siptrans makes no assumptions about the details of this interface; it only requires that the TU eventually invoke the Siptrans_dispatch_sip_message method to pass a final response corresponding to the request. The response is then added to the SIP transaction maintained by siptrans and transmitted to the network.

• Requests arriving from the TU. Another source of incoming requests to siptrans is the TU. When the TU wants to send a request, it creates one and presents it to siptrans by invoking the Siptrans_dispatch_sip_message method. Siptrans creates a transaction and analyzes the Request-URI or Route header (if present) to derive the network addresses to which the request should be sent. Siptrans supports, in part, the SIP procedures required to locate SIP servers [3]: it can query Domain Name System (DNS) service (SRV) resource records (RRs) for the TCP and UDP transports and can issue DNS A RR queries (it does not yet support DNS naming authority pointer [NAPTR] RR queries). If the DNS SRV RR lookup is successful, it builds a list of preferred servers and sends the request to each server in turn until one responds. If no server responds, siptrans creates a local failure response and sends it to the TU, thus meeting the request-response expectations of the TU.
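The failover behavior for requests arriving from the TU can be illustrated with a minimal Python sketch. It is not the siptrans implementation: the names `send_with_failover`, `Response`, and the `send` transport hook are hypothetical, and the choice of a 408 status for the locally generated failure response is an assumption (the text only says that a local failure response is created).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Response:
    status: int
    reason: str
    local: bool = False  # True when generated locally rather than by a peer


def send_with_failover(
    request: str,
    servers: Iterable[str],
    send: Callable[[str, str], Optional[Response]],
) -> Response:
    """Try each server in preference order and return the first response.

    `send` is a hypothetical transport hook that returns None when the
    server does not answer before the transaction gives up on it.
    """
    for server in servers:
        response = send(server, request)
        if response is not None:
            return response
    # No server answered: synthesize a failure response so that the TU
    # still sees exactly one response for its request (status is assumed).
    return Response(408, "Request Timeout", local=True)
```

Because the caller always gets a `Response` back, the TU needs only one code path for peer-generated and locally generated failures.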

Siptrans and response handling. Reliability and retransmission characteristics differ in SIP for INVITE and non-INVITE transactions. Non-2xx final responses to INVITE are retransmitted by the transaction layer until an ACK is received (and absorbed by the transaction layer); 2xx responses for the INVITE are retransmitted by the TU, and ACKs for such responses are passed on to the TU. Responses to non-INVITE requests, by contrast, are not retransmitted on a timer; they are cached on the server side and resent only upon receipt of a duplicate request. For siptrans, a response can arrive from one of two sources: the TU or a peer across the network. A response originated by the TU is sent out on the network by siptrans to its intended destination, whereas a response from the network is presented to the TU.

• Responses arriving from the network. Assuming a request from the TU was delivered to the appropriate peer, any responses to the request are matched to the appropriate transaction by siptrans and forwarded to the TU by invoking the Siptrans_sip_message_to_TU interface. Handling of responses in SIP differs for INVITE and non-INVITE requests. First, retransmitted 2xx responses for the INVITE are passed to the TU, even if a transaction does not exist (as mandated by RFC 3261 [4]). Second, INVITE requests require an additional method, ACK, to complete, whereas non-INVITE requests terminate when a final response is received. Following the appropriate SIP state machines, if siptrans receives a non-2xx final response for an INVITE, it generates and sends an ACK to the peer across the network and presents the response to the TU. Siptrans does not generate an ACK for a 2xx-class response to the INVITE. For all non-INVITE requests, siptrans caches the response and presents it to the TU. Retransmissions of the non-INVITE request elicit the cached response.

• Responses arriving from the TU. The TU invokes the Siptrans_dispatch_sip_message method to pass a response to siptrans. Siptrans simply follows the Via list in the response to send it onward (in SIP, responses are routed based on the Via list).

Siptrans handles error conditions such as when the peer UA has closed a TCP connection and cannot receive a response on that connection. In such cases, siptrans attempts to open a new TCP connection to the peer UA in order to send the response. If a new connection is successfully established, siptrans will send the response; otherwise, there is precious little the transaction layer can do besides clean up the transaction.
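The rules for responses arriving from the network can be summarized as a small decision function. The Python sketch below is illustrative only: the function name and the action labels are hypothetical, and ignoring other unmatched responses is an assumption of the sketch, since the text describes only the 2xx case for responses without a transaction.

```python
def handle_network_response(method, status, transaction_exists):
    """Return the actions a transaction layer takes for a response, in order."""
    if method == "INVITE":
        if 200 <= status < 300:
            # 2xx to INVITE: always handed to the TU, which sends the ACK,
            # even when no matching transaction exists any longer.
            return ["pass_to_TU"]
        if not transaction_exists:
            return ["ignore_unmatched"]  # assumption of this sketch
        if status >= 300:
            # Non-2xx final response: the transaction layer sends the ACK.
            return ["send_ACK", "pass_to_TU"]
        return ["pass_to_TU"]  # provisional (1xx) response
    if not transaction_exists:
        return ["ignore_unmatched"]  # assumption of this sketch
    # Non-INVITE: cache the response, then present it to the TU.
    return ["cache_response", "pass_to_TU"]
```

For example, a 486 response to an INVITE yields `["send_ACK", "pass_to_TU"]`, while a 200 response to the same INVITE yields only `["pass_to_TU"]`, leaving the ACK to the TU.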

Siptrans and alarm. In addition to the transaction and transport handling features, siptrans also provides a general purpose alarming facility to the TU. The TU can use this facility to schedule alarms for retransmitting 200-OK responses to an INVITE request, for instance. However, the alarming facility is designed to be as generic as possible so that arbitrary alarms can be scheduled and executed. The APIs allow the TU to store an opaque data item in the alarm, which is returned to it when the alarm fires. This allows the TU to save state in the alarm for reuse when the alarm fires. To schedule an alarm, the TU invokes the Siptrans_schedule_alarm method; siptrans, in turn, invokes a TU programmer-supplied interface, Siptrans_alarm_to_TU, when the alarm fires.
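The pattern of scheduling an alarm with an opaque data item and receiving that item back through a callback can be sketched as follows. This is a Python illustration under stated assumptions, not the siptrans API: the class `AlarmScheduler`, its method names, and the explicit `run_until` clock are all hypothetical, chosen so the example runs deterministically without real timers.

```python
import heapq
import itertools


class AlarmScheduler:
    """Minimal alarm facility that hands an opaque item back to the TU."""

    def __init__(self, alarm_to_tu):
        self._alarm_to_tu = alarm_to_tu  # callback supplied by the TU
        self._heap = []
        self._seq = itertools.count()  # tie-breaker; opaque items are never compared

    def schedule_alarm(self, fire_at, opaque):
        """Register an alarm for time `fire_at`, carrying `opaque` along."""
        heapq.heappush(self._heap, (fire_at, next(self._seq), opaque))

    def run_until(self, now):
        """Fire every alarm that is due at or before `now`, earliest first."""
        while self._heap and self._heap[0][0] <= now:
            _, _, opaque = heapq.heappop(self._heap)
            self._alarm_to_tu(opaque)
```

A TU could, for instance, store the 200-OK response it needs to retransmit as the opaque item, so that the callback has everything it needs when the alarm fires.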

Building iSURF Over Siptrans or Other Frameworks

There are a number of issues when building iSURF over a transaction framework:

• The transaction framework’s memory management scheme. iSURF must know when memory references from the transaction framework become invalid. If iSURF needs to retain data obtained from the transaction framework after the data is freed, it must copy the data before it is freed. In the case of siptrans, iSURF was able to integrate the siptrans reference counting scheme with its own.

• Transaction layer timeouts. iSURF delivers transaction layer timeouts to the application as 408 (transaction timeout) response messages. Siptrans creates these response messages and passes them to the iSURF core, which in turn passes them to the application. If some other transaction framework notified iSURF of the timeout through another mechanism, then iSURF would have to build the timeout response messages itself.

• Layering issues. Although the standard states the responsibilities of the various SIP layers, implementations are free to reassign these responsibilities. For example, siptrans is responsible for placing Vias in outgoing requests and siptrans determines the destination Internet Protocol address for outgoing requests. If some other transaction framework did not assume these responsibilities, then the iSURF core would have to assume them.

• Threading. iSURF needs to know the threading

structure of the transaction framework to avoid

deadlocks and to efficiently make its code thread

safe.
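The timeout issue in the list above can be made concrete with a small adapter. The Python sketch below assumes a hypothetical transaction framework that reports timeouts through a callback rather than through a response message; the names `TimeoutAdapter`, `SipResponse`, and `timeout_to_response` are illustrative and do not come from iSURF or siptrans. The headers copied into the synthetic response are those that RFC 3261 requires a response to share with its request.

```python
from dataclasses import dataclass, field


@dataclass
class SipResponse:
    status: int
    reason: str
    headers: dict = field(default_factory=dict)


def timeout_to_response(request_headers):
    """Build a synthetic 408 response that correlates with the request."""
    copied = {
        name: request_headers[name]
        for name in ("Via", "From", "To", "Call-ID", "CSeq")
        if name in request_headers
    }
    return SipResponse(408, "Request Timeout", copied)


class TimeoutAdapter:
    """Turns a framework's timeout callback into an ordinary response."""

    def __init__(self, deliver_to_core):
        self._deliver = deliver_to_core  # hypothetical entry point of the core

    def on_transaction_timeout(self, request_headers):
        self._deliver(timeout_to_response(request_headers))
```

With such an adapter in place, the application sees a timeout as one more response, whichever transaction framework sits underneath.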

Performance

We investigated iSURF’s performance by writing a client application that repeatedly initiates and terminates sessions at a server application. We implemented the client/server application pairs two ways: by using the iSURF API and by using the siptrans API. We then compared the performance of the two implementations. The iSURF client application underperformed the siptrans client application by 10%. However, the iSURF server application outperformed the siptrans server application by 10%. We believe the iSURF server application outperformed the siptrans server application because the siptrans server copied headers from requests to responses. Because of iSURF’s sophisticated memory management, it merely copied header references from requests to responses. There was less opportunity to exploit this advantage in the client application. Regardless, iSURF does not impose a large overhead over its siptrans transaction layer.
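The difference between copying headers and copying header references can be shown in a few lines. This Python sketch is only an analogy for the memory management described above: the function names are hypothetical, and Python's object references stand in for the reference counting scheme. The header names listed are those that RFC 3261 requires a server to carry over from request to response.

```python
import copy

# Headers that a server carries over from the request into its response.
COPIED_HEADERS = ("Via", "From", "To", "Call-ID", "CSeq")


def response_by_copy(request_headers):
    """Duplicate every carried-over header value (more allocation and copying)."""
    return {
        name: copy.deepcopy(request_headers[name])
        for name in COPIED_HEADERS
        if name in request_headers
    }


def response_by_reference(request_headers):
    """Share the header objects with the request (no duplication)."""
    return {
        name: request_headers[name]
        for name in COPIED_HEADERS
        if name in request_headers
    }
```

Sharing is safe only while the shared header objects are not modified, or while a scheme such as reference counting keeps them alive and consistent for every message that points at them.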

Extending iSURF

Adding another request type to iSURF involves the following:

• Adding a new class representing the request

method. If a header must appear in a new request,

then every constructor for the class should include

this header as an argument. A constructor that

allows the new request to be built in a dialog

should also be added to the class.

• Adding a corresponding function to its event handler class if there are any timed events associated with the request. For example, when extending iSURF to handle SUBSCRIBE-NOTIFY, we declared an event handling function that is called when a SUBSCRIBE expires.

• Implementing a periodic refresh for the request

if necessary. This was done for the SUBSCRIBE

request.

• Adding classes for any new headers necessary for

the request.

• Adjusting iSURF’s dialog state maintenance routines if a transaction involving the new request affects dialog state.

• Providing access routines to dialog state related

to the new request. For example, iSURF

provides the application access to details on

active subscriptions created by the SUBSCRIBE

request.


• Determining whether to add timers that prevent a

transaction or dialog from being stuck in the same

state indefinitely. We added such a timer, so that

an INVITE transaction cannot be stuck in the

proceeding state indefinitely.
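The first two steps in the list above can be sketched in Python. The class and method names here are hypothetical and do not reproduce the iSURF API; the sketch only shows a request class whose constructors all demand a mandatory header (for SUBSCRIBE, the Event header), an additional constructor for building the request inside a dialog, and an event handler hook for expiry. The default expiry of 3600 seconds and the dictionary used to represent a dialog are assumptions of the sketch.

```python
class Request:
    """Hypothetical base class for SIP requests."""

    method = None

    def __init__(self, request_uri, headers=None, dialog=None):
        self.request_uri = request_uri
        self.headers = dict(headers or {})
        self.dialog = dialog


class SubscribeRequest(Request):
    """New request type: every constructor takes the mandatory Event header."""

    method = "SUBSCRIBE"

    def __init__(self, request_uri, event, expires=3600, dialog=None):
        headers = {"Event": event, "Expires": str(expires)}
        super().__init__(request_uri, headers, dialog)

    @classmethod
    def in_dialog(cls, dialog, event, expires=3600):
        """Build the request inside an existing dialog."""
        return cls(dialog["remote_target"], event, expires, dialog=dialog)


class EventHandler:
    """Hypothetical handler class extended with a timed event for SUBSCRIBE."""

    def on_subscribe_expired(self, request):
        """Called when a subscription created by `request` expires."""
```

Making `event` a required constructor argument means that a SUBSCRIBE without an Event header cannot be constructed at all, which mirrors the way the message constructors guide the application writer toward correct messages.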

Conclusions/Next Steps

iSURF is a framework for building arbitrary user agents. The goal of the framework is to ease the effort of writing a user agent without surrendering generality. We believe we have succeeded in this effort. Moreover, we believe the performance penalty of achieving this ease of effort is manageable.

At the basic message passing level, the message representation is easy to learn, as it largely follows the SIP grammar. The message constructors guide the application writer in building correct messages. Moreover, iSURF relieves the application of memory management duty. iSURF provides optional higher-level constructs that facilitate application building:

• iSURF helps to dispatch incoming messages to

context-based event handlers.

• iSURF allows applications to store application data

in contexts or messages.

• iSURF provides message resending/refresh

operations.

• iSURF has configurable plug-ins that handle many

phases of processing incoming messages.

iSURF is constructed using siptrans, a SIP transaction library. The library has been designed to be compatible with any TU that implements the interfaces detailed in Table I. iSURF is one such TU; we have also been successful in using siptrans in another unrelated project within Lucent. Siptrans is portable across Solaris and Linux platforms. It has also been successfully ported to the pSOS* real-time operating system and WinCE.

Work continues on both iSURF and siptrans. We

are investigating increasing iSURF’s performance by

speeding up siptrans itself and by better integrating the

iSURF and siptrans memory management. We are also

investigating storing iSURF’s core state persistently to

help application writers build reliable user agents.

*Trademarks

JAIN and Java are trademarks and Solaris is a registered trademark of Sun Microsystems, Inc.

Linux is a registered trademark of Linus Torvalds.

pSOS is a trademark of Wind River Systems, Inc.

Windows is a registered trademark of Microsoft Corporation.

References

[1] Y. Chen, O. B. Clarisse, P. Collet, M. A. Hartman, L. Rodriguez, L. Velazquez, and B. A. Westergren, “Web Communication Services and the PacketIN Application Hosting Environment,” Bell Labs Tech. J., 7:1 (2002), 25–40.

[2] P. O’Doherty and M. Ranganathan, “JSR 32: Jain SIP API Specification,” Aug. 2003, <http://jcp.org/en/jsr/detail?id=32>.

[3] J. Rosenberg and H. Schulzrinne, “Session Initiation Protocol (SIP): Locating SIP Servers,” IETF RFC 3263, June 2002, <http://www.ietf.org/rfc/rfc3263.txt?number=3263>.

[4] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, “SIP: Session Initiation Protocol,” IETF RFC 3261, June 2002, <http://www.ietf.org/rfc/rfc3261.txt?number=3261>.

[5] K. Singh, J. Lennox, S. Narayanan, and H. Schulzrinne, CINEMA: Columbia InterNet Extensible Multimedia Architecture, Columbia University, NY, Nov. 2002, <http://www1.cs.columbia.edu/~library/TR-repository/reports/reports-2002/cucs-011-02.pdf>.

[6] D. Tweedie, “JSR 125: JAIN SIP Lite,” July 2002, <http://jcp.org/en/jsr/detail?id=125>.

[7] Vovida, Vovida Open Communications Application Library (VOCAL), Apr. 2003, <http://www.vovida.org/applications/downloads/vocal/>.

(Manuscript approved May 2004)

ROBERT M. ARLEIN is a member of technical staff in the Services Infrastructure Research Department at Lucent Technologies in Murray Hill, New Jersey. He holds B.S. and A.M. degrees in mathematics from the University of Wisconsin in Madison and an M.S. degree in computer science from New York University in New York City. He is currently investigating the uses of SIP-based components in a converged service network. Mr. Arlein holds one patent and has one application pending.


VIJAY K. GURBANI is a distinguished member of technical staff in the Wireless Next Generation Architecture and Evolution Department at Lucent Technologies in Naperville, Illinois. He holds B.Sc. and M.Sc. degrees in computer science from Bradley University in Peoria, Illinois. He is a Ph.D. candidate in computer science at the Illinois Institute of Technology in Chicago. Mr. Gurbani is currently involved in the specification, prototyping, and implementation of services based on SIP. His research interests are Internet telephony services, Internet signaling protocols, pervasive computing in the telecommunications domain, distributed systems programming, and programming languages. He is a member of the ACM and the IEEE Computer Society. He holds one patent and has four applications pending.
