Transcript

Computer Networks 50 (2006) 688–706

www.elsevier.com/locate/comnet

NetPDL: An extensible XML-based languagefor packet header description q

Fulvio Risso *, Mario Baldi

Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi, 24-10129 Torino, Italy

Received 2 January 2004; received in revised form 7 February 2005; accepted 10 May 2005Available online 2 August 2005

Responsible Editor: C.B. Westphall

Abstract

Although several applications need to know the format of network packets to perform their tasks, till now, eachapplication uses its own packet description database. This paper addresses this problem by proposing the NetPDL,an XML-based language for describing packet headers, which has the potential of enabling the realization of a com-mon, application-independent protocol description database that can be shared among several applications. Further,common functionalities related to the protocol database can be implemented in a library, which can be a basic buildingblock for implementing networking applications.� 2005 Elsevier B.V. All rights reserved.

Keywords: Protocol description language; NetPDL; XML; Protocol header description

1. Introduction

Several network applications—such as packetrouting, traffic classification, network address

1389-1286/$ - see front matter � 2005 Elsevier B.V. All rights reservdoi:10.1016/j.comnet.2005.05.029

q The work was partly funded by Microsoft Research,Cambridge, UK, and the System On Chip Division of TelecomItalia Lab S.p.A., Torino, Italy.* Corresponding author. Tel.: +39 011 564 7008; fax: +39 011

564 7099.E-mail addresses: [email protected] (F. Risso), mario.

[email protected] (M. Baldi).

translation, packet sniffing, traffic analysis, trafficgeneration, firewalling, intrusion detection—dealwith packet headers, hence need to know packetformats. Even though packet processing iscommon to a large number of applications, atpresent no solution exists to delegate it to asingle optimized component: packet processing isstill implemented within applications by customcode.

One problem with a general packet-processingcomponent stems from the different flavors ofpacket processing required by applications. For

ed.

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 689

example, while an application might need to filterpackets to further process only a subset of thosereceived, another one might need to modify thevalue of selected fields in each packet. A generalpacket processing component that fulfills the needsof any application would be required to have alarge set of functionalities, hence a high complex-ity. Another problem stems from the high degreeof portability required. In particular, the generalpacket processing component should be executableon a large number of platforms, ranging fromhardware boxes (e.g. network switches), embeddeddevices (e.g. firewalls), and workstations, runninga wide variety of operating systems.

These problems might have so far hamperedthe development of such a component. Irrespec-tive of such problems, though, a first step towardsmoving packet processing functions out of appli-cations consists in having a universal protocol

header database, which is shared among all appli-cations and contains packet descriptions for allnetwork protocols. Current applications use aproprietary syntax to describe packet headers,and packet descriptions are often hardwired intheir code. Consequently, supporting a new pro-tocol requires the intervention of the developersof the specific application. Ethereal [4] and tcp-dump [5], two well-known and widely deployedapplications, have even two different protocoldescriptions hardwired in their code: one usedwhen filtering network packets in real-time, theother one when displaying packets in a user-friendly fashion. The first description is simple(and limited) because it is designed for high-speedoperation (filtering). The second one is very com-prehensive and the corresponding packet-process-ing engine is much slower than the one using thefirst description.

This paper presents the Network Protocol

Description Language (NetPDL), an application-independent packet format description languagethat enables the creation of a universal protocoldescription database—the NetPDL database. Oneof the main design objectives of NetPDL, unlikealternative protocol description solutions (see Sec-tion 3), is simplicity. For this reason, NetPDL isnot intended as a protocol specification tool; forexample, it does not support the description of a

protocol temporal behavior—e.g., a protocol statemachine. Instead, NetPDL is targeted to an effec-tive description of packet header formats and pro-

tocol encapsulations.NetPDL is based on the eXtensible Markup

Language (XML) that is becoming the preferredway for exchanging structured data between differ-ent organizations. For this reason several tools,both stand-alone programs and libraries, existfor dealing with XML documents and can be lev-eraged for NetPDL handling. Moreover, XMLdocuments are usually parsed by applications atrun-time; by following the same approach withNetPDL, the protocol header database can bedynamically changed to include new protocols orprotocol features, without even restartingapplications.

Notice that NetPDL is beneficial also to appli-cations for which a generic packet processing en-gine and a shared database are not cost effective.An example is provided by applications thatperform simple operations on a small variety ofpacket headers. In these cases implementing pack-et processing within the application might be sim-pler than creating or interfacing a generic packetprocessing engine and leveraging from existingheader descriptions might seem to bring negligibleadvantages. However, basing header processingcode on NetPDL descriptions enables transparent(i.e., not requiring modifications to the applica-tion code) support of newer versions of theprotocols.

Section 2 elaborates on the concept of variousapplications sharing a generic packet processingengine that operates according to protocol descrip-tions stored in a NetPDL database, where thelatter can be dynamically updated. A survey ofexisting languages for describing network protocolheaders is provided in Section 3 that highlightstheir limitations for the application contextaddressed in this work. Section 4 presents an over-view of the NetPDL language and the architec-tural choices behind it, while Section 5 providesthe details of most primitives of the NetPDL lan-guage. Section 6 gives an overview of NetPDLextensions, i.e. a set of additional primitives thatcan be used to enrich NetPDL for specific pur-poses. In particular, Section 6 presents an exten-

690 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

sion aiming at the description of how packet infor-mation should be displayed. Finally, Section 7provides performance figures of a NetPDL-basedengine implementation. Conclusive remarks arepresented in Section 8.

2. Toward NetPDL-based packet processing

Fig. 1 depicts possible scenarios for the deploy-ment of a common protocol description databaseshared by various applications. A packet process-ing engine, i.e., a NetPDL-based engine in this con-text, can be either embedded into each application(left-hand side of Fig. 1) or shared among severalapplications.

A NetPDL-based engine uses a NetPDL proto-col database (NetPDL database in short), i.e. a setof XML files that contain a description of protocolheaders, to learn the structure of the packets it issupposed to process. Since the NetPDL databaseis external to both applications and NetPDL-based engines, it can be updated without requiringmodifications to the code implementing them. TheNetPDL-based engine parses these XML files andcreates an internal, engine-specific, representationof protocol headers. For example, based on theprotocol description obtained from the NetPDLdatabase, a (NetPDL-based) packet filtering en-gine can pre-calculate the offset of each field fromthe packet beginning. This will result in faster loca-

NetPDLprotocol database

Embedded NetPDL-

based Engine

Application 1

SharedNet PDL-based

Engine (e.g. DLL)

Application 2 Application 3

Public API

VisualizationExtensions

NetPDL . . .

Fig. 1. Relationships between applications, NetPDL protocoldatabase, and NetPDL-based engines.

tion of the requested fields within each incomingpacket. Hence, a filtering engine based on the Net-PDL language can have the same performance ofone based on custom protocol descriptions, i.e.,hardwired in the code of the filtering engine itself.Hence, performances of NetPDL-based engines donot depend on the characteristics of the languageitself, which (being XML-based) may seem ratherinefficient. From this point of view, a NetPDLdescription can be compared to a Java program,which can either be compiled into native code orinterpreted at run-time. The execution timestrongly depends on the tool used (compiler/inter-preter), not on the language itself.

A NetPDL database can even be remotelystored on a centralized server accessible throughthe Internet. Geographically dispersed NetPDL-based engines can (periodically) download themost recent version of the NetPDL database, usethe contained information to build their internalstructures, and perform their processing. In thisscenario NetPDL-based engines operate accordingto up-to-date and complete protocol descriptionscontained in some external, remotely locatedXML files, while not suffering performanceimpairments during packet level processing.

This paper does not focus on any specific Net-PDL-based engine, but rather on the definitionof the NetPDL database. Nevertheless, Section 7provides measurements obtained with an existingNetPDL-based engine implemented in the NetBeelibrary [3] with the objective of substantiating theabove statements on performance.

Implementing protocol-processing enginesbased on an external description database haslimitations when coming to second order optimiza-tions. For example, since a NetPDL-based engineworks on a per-protocol basis, optimizations thatrely on the combined presence of two or moreprotocol headers cannot be implemented within aNetPDL-based engine. Finally, more investigationis required to assess the applicability and benefitsof NetPDL-based packet processing in scenarioswhere performance requirements lead to thedeployment of custom hardware optimized for aspecific set of protocols. However, such applica-tion field is outside the scope of this work thatfocuses on software solutions.

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 691

3. Related work

This section provides an overview of knownprotocol description languages with specificemphasis on (1) support for packet header descrip-tion, (2) support for protocol encapsulationdescription, (3) extensibility.

Libpcap [6], one of the most widely used packetprocessing libraries, provides a set of functionsthat allow to selectively capture packets by meansof a filter. The filter, specified in high-level lan-guage (e.g. ‘‘tcp’’ means ‘‘capture only TCP traf-

fic’’), is translated into special assembly code thatis executed by a filtering engine, the BPF (BerkeleyPacket Filter) [6] virtual processor. Since the filteroperates by checking the value of selected packetheader fields, the protocol format must be known.Libpcap [7] embeds protocol definitions within itssource code, which can be (hardly) modified onlyby recompiling the library. In essence, libpcap doesnot have a language to describe protocol headers;its simple language can be used to define a packetfilter (operating on the most common protocolfields) and it cannot be used for other purposes.

One of the best-known protocol descriptionlanguages is the one deployed by Analyzer 2.0[1], a protocol analyzer developed at by one ofthe authors. An easily extensible C-like structureis used to describe packet header fields and proto-col encapsulation. A most notable feature of theresulting packet processing architecture is the abil-ity to both decode packets and customize theirsummary and detailed views based on externalfiles—Description File Format (DFF) and IndexFile Format (IFF) configuration files. However,the Analyzer 2.0 protocol description languagedoes not provide adequate support for variable-length fields and optional fields.

FALCON [9]—an evolution of the Analyzer 2.0protocol description language, notwithstanding adifferent syntax—enables more complex computa-tions, variable-length fields, and optional fields tobe specified. However, FALCON provides onlyprimitives for packet decoding (e.g. packet display-ing is not taken into consideration) and its proto-col description files have poor readability from thestandpoint of a user working on them without anyspecialized (e.g. GUI) tool.

The GASP (Generator and Analyzer System forProtocol) Language [10] is similar to Analyzer�s,but has a major emphasis on packet generation,rather than decoding. Like FALCON, it supportsonly header format description.

The protocol description language used by SPY[11] provides checking primitives to validate thecorrectness of selected fields and its protocoldescription files have excellent readability. How-ever it does not provide proper support for op-tional fields. Like Analyzer 2.0, it supports somevisualization primitives (albeit quite poor ones:only a detailed view of the packet is supported,with limited customizability); however this featureis natively provided by the language, while Ana-lyzer does the same though an extension mecha-nism, which allows arbitrary future enhancements.

The protocol description language recently pro-posed in the JnetStream project [13] is probablythe most flexible among the listed languages. Itdoes support field format descriptions, optionalfields (through conditional primitives such as if–

then–else and more), field value validation andvisualization directives (although the difference be-tween summary and detailed view of the packet isnot clear). However, it does not foresee extensionsto the language, which means that new featuresnot included in the ‘‘base’’ language cannot beadded. In addition, the complete NPL (NetworkProtocol Language) description is not easy to read(despite its C-like syntax) since all directives are to-gether without a clear separation between headerformat descriptions, field value validation andvisualization directives.

The Solidum PAX Pattern Description Lan-guage [12] is targeted at pattern description, a pat-tern being either a set of fields or a set of protocols.For instance, the IP/Ethernet stack is consideredthe most common ‘‘pattern’’ to check when look-ing for ICMP packets. The language is designedwith the objective of speeding up pattern matchingoperations on Solidum network processors. How-ever, protocol encapsulation description with thePAX language is cumbersome since all the possiblecombinations of the formats for a given protocolhave to be explicitly listed when describing theencapsulation of a higher layer protocol. Finally,PAX does not provide displaying and checking

692 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

primitives and does not properly support optionalfields.

Abstract Syntax Notation number One(ASN.1) [15] is an ISO standard notation oftenused to describe packet formats in protocol speci-fication documents. ASN.1, although a standard,is not attractive for many applications, such aspacket processing engines, due to its usagecomplexity. Moreover, several keywords aremeaningless when dealing with protocol descrip-tion, while other important features—such as sup-port for describing protocol encapsulation—aremissing. Furthermore, ASN.1 parsing is not trivialand only a few public and open-source tools doexist.

The ACT ONE language, part of the LOTOSstandard [16], is another existing approach toheader description. LOTOS, a Formal Descrip-tion Technique standardized by ISO for thedesign of distributed systems, consists of twoparts: (1) a process algebraic part is intendedfor modeling the dynamic behaviors of systems,and (2) a data algebraic part is proposed for mod-eling data structures and value expressions. Whilethe former has a different purpose than NetPDL,the latter, based on the abstract data type lan-guage ACT ONE, compares to NetPDL. How-ever, ACT ONE has been widely recognized tobe complex to use: the very nature of its buildingelements and the rigidity of its model lead tolengthy specifications with a lot of repetitive,technical details that clutter the specification[17]. The enhanced version E-LOTOS [18], whileaddressing some of these issues, is still quite com-plex to use and heavily oriented to protocol spec-ification and verification. Consequently, it doesnot support application-specific extensions, suchas data visualization. With respect to the objec-tives and intended applications of NetPDL, anal-ogous considerations apply also to Estelle [19], acurrently withdrawn standard.

Finally, the ABNF notation [14] includes someinteresting features that are useful when describingcomplex messages (like the ones of SMTP andHTTP); however it is fairly complex (due to itscompact and very efficient syntax) and it doesnot include any extension mechanism for support-ing other than header format descriptions.

In summary, the languages for protocol descrip-tion proposed thus far display several weaknesses,especially with respect to extensibility, simplicity;most of them also lack effective support for op-tional fields. Particularly, none of the examinedlanguages provides an explicit way to be extendedand only some of them support the definition ofvisualization and checking primitives. These rea-sons motivated the creation of NetPDL, whichaims specifically at addressing the aboveshortcomings.

4. NetPDL overview

This section presents the general architecture ofNetPDL and the ideas behind the language. Aftera brief look at XML (on which NetPDL is based),an overview of the protocol description languagewill be presented.

4.1. XML brief and terminology

The eXtensible Markup Language (XML) [21]is a simple, flexible text format derived from theStandard Generalized Markup Language(SGML), a.k.a. ISO 8879 [22]. Originally designedto meet the challenges of large-scale electronicpublishing, XML is playing an increasingly impor-tant role in a wide variety of data exchangesbetween computer systems using web-based proto-cols or others.

XML documents usually consist of elements,also called tags, delimited by the �<� and �>� char-acters. Each element contains a name and anoptional set of attributes; a value might be option-ally specified for each attribute. For instance,

<fruit name ¼ 00apple00=>

is an element called fruit, with an attributenamed name whose value is apple. Instead,

<person> John Black <=person>

is an element called person, that does not have anyattribute, and whose content is John Black. Ele-ments can be nested (e.g. the content of <per-son> can include <age>) in order to createmore complex structures.

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 693

The XML specification does not define elementsand attributes. This is done by using the syntacti-cal rules defined by XML to specify what elements,attributes and values are valid for a specific appli-cation. For instance, an XML-based languageaimed at describing living creatures could includea tag called <person>, while a language aimedat describing food could include a tag called<fruit>. Text files compliant to either theXML DTD (Document Type Definition) or themore powerful XML Schema standards definethe valid elements of a language. Although thesefiles are not strictly compulsory, they provide astandard way to specify the syntactical rules ofan XML-based language; therefore their use isstrongly encouraged.

An application supposed to use XML-baseddocuments must include an XML parser. Theparsing process can be split into two steps. First,locating XML elements, attributes and their val-ues; second, performing semantic actions associ-ated to each element, e.g. create a new databaserecord for the given person.

The first step is application-independent andseveral XML parsing tools, often called XML en-

gines, are available, most notably the largely usedApache Xerces [24] and Microsoft�s. For example,a typical output of the first parser is that the firstvalid XML tag within a document is the element<person>. The second step is, obviously, appli-cation-dependent. The creation of a NetPDL-based engine requires implementing only thissecond parsing step since an existing tool can bedeployed for the first one.

4.2. Why XML

One of the most common objections moved toNetPDL concerns its being based on XML insteadof a procedural language like C or Java. The mainreason for having chosen XML is that tokens oftraditional programming languages cannot be ex-tended. For instance, although a struct in theC language may be suitable to describe a protocolheader, different applications might need differentkinds of information to be provided together withpacket header formats. For example, an applica-tion might need a description of how a field should

be printed (e.g. as a hex/dec number), while an-other might need a specification of the validityrange of its values. Hence, an application pro-grammer needs to be able to extend a packetdescription language to provide the constructs re-quired to describe specific aspects needed by spe-cific applications. Extensibility is one of the keyfeatures of NetPDL and can be easily obtainedthrough the definition of new XML attributesassociated to base language elements. The same re-sult cannot be achieved with traditional program-ming languages unless heavily modified, hencechanging their very essence. Instead XML hasbuilt-in extensibility due to the structure itself ofan XML document based on elements and attri-butes (as described in the previous section). Newelements and attributes can be added whilepreserving backward compatibility with priorapplication-dependent XML parsers that cansimply ignore tokens that they are not able toprocess.

There are several additional reasons besidesextensibility for choosing XML. Being plain textfiles, XML descriptions can be easily edited anddebugged on any platform with a simple text edi-tor. Overall, thanks to the availability of XML li-braries that take care of the first parsing step,implementing parsers for XML-based documentsis definitely simpler than for any other existing lan-guage, whose parser is usually based on the lex

and yacc UNIX utilities. In addition, XML hasthe capability for strong syntactical validation,which may even be performed before the first pars-ing step by an application-independent XML toolaccording to companion definition documents fol-lowing the DTD or XML Schema standards. Thisfurther simplifies the implementation and execu-tion of application-dependent parsers since thevalidation process (which is performed automati-cally during the first parsing step) eliminates alarge set of possible errors (e.g. invalid tags, outof range values, etc.). Last but not least, XML of-fers portability across different platforms, specifi-cally web-based ones. For instance, it is prettysimple to visualize an XML description as a webpage following the formatting rules specified by acompanion eXtensible Stylesheet Language(XSL) file.

<proto name="Ethernet"><fields>

<fixed name="dst" size="6"/><fixed name="src" size="6"/><fixed name="type-length" size="2"/>

</fields>

<nextproto><switch><expr type="int">

<fieldref name="type-length"></expr>

<case value="2048"><protoref name="IP"/></case><case value="2054"><protoref name="ARP"/></case>

</switch></nextproto>

</proto>

Fig. 2. Excerpt of the NetPDL description of the Ethernetheader.

1 For simplicity, Preamble, Start Frame Delimiter, and FrameCheck Sequence are not shown in this sample description.

694 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

4.3. NetPDL basics

NetPDL aims at describing packets as definedby network protocol specifications. This includestwo complementary descriptions:

• the packet format: the list and format of thefields constituting a packet, and

• the protocol encapsulation: the rules on the fieldsof a packet that determine how—i.e., accordingto which protocol—to interpret the sequence ofbytes constituting the payload of the packet.

NetPDL was designed with the followingobjectives.

1. Simplicity: the syntax should be intuitive so that(1) it can be easily understood without a deepknowledge of the language and (2) protocoldescriptions can be written using a simple texteditor.

2. Completeness: the language must include a setof base primitives suitable to describe packetheaders of the most common (present andpossibly future) protocols, thus defining theway they can be processed. External plug-inscan be invoked by the NetPDL-based enginefor dealing with packet header formats thatcannot be described by the above-mentionedprimitives.

3. Extensibility: the language must support theaddition of new primitives to the small set ofbase elements, allowing for the language to betailored to a wide range of applications. Back-ward compatibility must be ensured when add-ing primitives: applications using the languagemust be able to skip over unknown primitives.

4. Efficiency: the performance of applications inte-grating or deploying (see the two possible archi-tectures in Fig. 1) NetPDL-based engines mustbe comparable to the ones of applications thatinclude custom code for packet processing pos-sibly based on hardwired packet descriptions.

The above objectives have driven severalchoices in the XML-based specification of Net-PDL. Each primitive consists of an element char-acterized by several attributes. For instance, a

header field is an element, the field size being anattribute of the element.

Fig. 2 shows an excerpt of the NetPDL descrip-tion of the Ethernet header that consists of 3 fixed-length fields,1 whose length is respectively six, six,and two bytes. As shown by this example, NetPDLrepresents each field as an element containing n

bytes. The <nextproto> element contains theprotocol encapsulation description, i.e., it specifieshow to determine the protocol (as indicated by thevalue of the <protoref> element) following thecurrent Ethernet header based on the value of thetype-length field (as specified by the value ofthe <fieldref> element).

5. The language

The main objective of the NetPDL specificationis the description of packet header formats andnetwork protocol encapsulation. This sectionbriefly presents the elements and attributes usedfor these purposes.

5.1. General structure

A NetPDL document, whose general structureis shown in Fig. 3, contains elements that enable

<netpdl>

<proto name="_startproto"><!– Determine which is the first header --><!– that is present in the packet -->

</proto>

<proto name="FirstProto"><fields><!-- field list -->

</fields>

<nextproto><!-- encapsulation info -->

</nextproto></proto>

<!-- Other protocols -->

<proto name="_defaultproto"><!– "Last resort" protocol -->

</proto>

</netpdl>

Fig. 3. General structure of a NetPDL document.

Table 1Basic field types defined in NetPDL

NetPDL element Description

<fixed> Fixed-length fields, aligned to abyte boundary

<masked> Field that contains bit fields<bit> Bit fields<variable> Variable-length field<line> CR/LF terminated variable length field<padding> Field realigning the header to a 16

or 32 bit boundary

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 695

proper packet processing. The document consistsof a set of descriptions, each one referred to a sin-gle protocol and contained into a <proto> ele-ment. Each description includes the elementsspecifying header formats (inside the <fields>element) and encapsulation (within the <next-proto> element), as shown in Fig. 2.

A NetPDL-based engine will start processing apacket (e.g. its binary dump) by matching the bytesequence with the elements of the description inthe order they appear.

Some predefined protocols (_startproto and_defaultproto) are used in special cases, suchas the ‘‘first’’ protocol of the encapsulationsequence and the ‘‘last resort’’ protocol to beused when no suitable protocol description isavailable for processing the remaining data of apacket. More details will be presented in the nextsections.

5.2. Base elements for packet header description

The headers defined by the majority of theprotocols currently in use contain a small setof fields, which, most often, can be catego-rized according to one of the six types shown inTable 1.

The vast majority of header fields has a fixedlength and is aligned to a byte boundary. Less fre-

quently, a field is composed of a few bits or has avariable length that can be determined only atpacket-processing time. Variable-length fieldscan be either length bounded (i.e., the length isspecified by the value of another field) or sentinelbounded (i.e., a given character or string indi-cates the end of the field). The above types offields can be specified through the <fixed> (forfixed-length fields), <masked> (identifying thepart of a header that contains bit fields), <bit>(for a bit field), and <variable> elements.Additional characteristics, such as the length ofa <variable> field, can be described bymeans of specific attributes. All the abovetypes are deployed in the example shown inFig. 4.

Due to their widespread presence in packetheaders, NetPDL includes two additional pre-defined field types: the line field (<line>)—anASCII line, which is a string terminated by a car-riage return (CR) or CR + LF (Line Feed) charac-ter—and the padding field (<padding>)—oftenused to align a protocol header to a 16 or 32 bitboundary.

A field is completely characterized by specify-ing its length, the number of occurrences, and itsposition in the packet. However, the latter twoitems are usually not needed, because normallya field is placed after a given preceding fieldand occurs only once. In order to keep the nota-tion simple, only the length of the field (throughthe attribute size) must be always specified forfixed length fields. The description of fields re-peated multiple times is addressed in Section5.3.4, while the position of a field can be specifiedthrough the optional attribute offset. A packet

<proto name="IPv6"><fields>

<masked name="ver-tc-flabel" size="4"><bit name="ver" mask="F0000000"/><bit name="tos" mask="0F000000"/><bit name="flabel" mask="00FFFFFF"/>

</masked><fixed name="plen" size="2"/><fixed name="nexthdr" size="1"/><fixed name="hop" size="1"/><fixed name="src" size="16"/><fixed name="dst" size="16"/>

<loop type="while"><!-- Loop until we find a 'break' --><expr type="bool">

<number value="1"/></expr>

<switch><expr type="int">

<fieldref name="nexthdr"></expr>

<case value="0"><includeblk name="HBH"/>

</case>

<!-- Other options follow here -->

<default><loopctrl type="break"/>

</default></switch>

</loop></fields>

<nextproto><switch><expr type="int">

<fieldref name="nexthdr"></expr>

<case value="6"><protoref name="TCP"/></case><case value="17"><protoref name="UDP"/></case>

</switch></nextproto>

<block name="HBH" longname="Hop By Hop Option"><fields><fixed name="nexthdr" size="1"/><fixed name="helen" size="1"/><variable name="Options" type="size">

<expr type="int"><fieldref name="helen"/>

</expr></variable>

</fields></block>

</proto>

Fig. 4. Extract from the IPv6 header description: <loop>,<switch>, <block> and <includeblk> elements.

696 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

trailer is a typical case in which this attribute isdeployed.

5.3. Advanced elements for packet header

description

The elements described above are often not suf-ficient: the header of a protocol as common as IP is

one such example. This section introduces a num-ber of more sophisticated, yet generally usefulelements.

5.3.1. Field blocksThe <block> element, a container for fields,

aims at improving readability of packet headerdescriptions. The content of a <block> elementis included in a protocol description through the<includeblk> tag, whose functionality can becompared to the one of macro expansions encom-passed by most high-level languages. When an<includeblk> tag is found, its content is re-placed by the content of the corresponding<block> element.

We foresee two deployment scenarios for the<block> element. First, it can be used to isolatea portion of NetPDL code that has a distinctiveidentity or function. An example can be seen inFig. 4: each option of the IPv6 protocol is definedwithin a distinct block in order to organize theNetPDL code in a clearer way. The second sce-nario is related to compactness: the same blockof fields can be present several times within theprotocol; the <block> element allows definingit once and the <includeblk> element enablesusing it multiple times within the header stack orwithin different contexts. The OSPF protocolprovides an example of the latter: a LinkState Advertisement header can be found inseveral OSPF packets (e.g. database descriptionpackets, acknowledgements, and others). Thegroup of fields is described once, and it is recalledseveral times by means of the <includeblk>element.

5.3.2. Conditional elementsIt is not uncommon within protocol headers

that the presence (or a value) of a field dependson the value of another field. Optional fields, likeIP options, are an example. Two NetPDL elementshave been defined to address this issue: the<switch>-<case> and <if> elements. Theformer allows multiple alternative descriptions tobe provided; the one actually considered whenprocessing a packet is determined by the evalua-tion of a simple condition, which is usually depen-dent on the value of another field (the one named

<proto name="Ethernet"><fields>

<fixed name="dst" size="6"/><fixed name="src" size="6"/>

<if><expr type="bool">

<lookahead><fixed size="2"/>

</lookahead><oper="le"/><number value="1500"/>

</expr>

<if-true><fixed name="length" size="2"/>

</if-true>

<if-false><fixed name="ethertype" size="2"/>

</if-false></if>

</fields>...

</proto>

Fig. 5. Ethernet header description, differentiating between theIEEE and DIX formats.

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 697

00nexthdr00 in Fig. 4). By default, each <case>compares a given value against the value of thefield specified within the <switch> element,although some more complex conditions can bedefined. For instance, the condition ‘‘this case must

be selected for all values between 10 and 20’’ isgiven by <case value=001000 maxvalue=002000>.Conditions are evaluated in order; therefore a<case> description is applied only if no otherpreceding condition matches. The element <de-fault>, also exemplified in Fig. 4, indicates a‘‘default choice’’ in case no other choice is suitable.

The <if> element enables the description of agroup of fields to be made dependent on the eval-uation of an arbitrarily complex condition, (e.g.the value of several other fields). The <expr>element can be used to define the condition inthe context of the <if> element. An example ofthe <if> and <expr> element deployment canbe found in Fig. 5.

5.3.3. Expressions

NetPDL supports mathematical, logical andstring expressions that are needed by conditionalelements. Expressions are possibly the most com-plex structures of NetPDL, mainly because users

are often used to think about expression in infixnotation (A + B * C), which is not appropriate forXML. In fact, although some solutions for usingsuch a notation exist even in XML, the choicefor NetPDL has been to define expressions in ‘‘na-tive’’ XML structures. This enables their syntacti-cal correctness to be verified through simple XMLrules.

Fig. 5 includes a sample expression: the<expr> element defines the entire expression,while child elements define operands andoperators.

5.3.4. Repeating a field or a group of fields

The repetition of a field (or a group of fields) israther common and is addressed by the <loop>element. The following four variants identified bythe type attribute as shown in Fig. 4, areavailable.

• Size-bounded loop: a (group of) field(s) is iter-ated until the cumulative size becomes equalto a given value.

• Occurrence-bounded loop: a (group of) field(s) isiterated a given number of times.

• While-bounded loop: a (group of) field(s) is iter-ated until a given condition is true (see Fig. 4).

• Do-bounded loop: a (group of) field(s) is presentat least once; the number of repetitions dependson a given condition.

The <loopctrl> element forces a corre-sponding repetition (as specified by a <loop> ele-ment) to be restarted or interrupted—similarly tothe C/C++ break and continue instructions. Thedescription of the IPv6 protocol header, shownin Fig. 4, contains a while variant of the <loop>element. An IPv6 header possibly consists of vari-ous optional headers; the number is not known inadvance. Each optional header has a ‘‘nextheader’’ field (<fixed name = 00nexthdr00>) thatidentifies the next optional header. Thus, a packetprocessing engine should continue looking for op-tional headers as long as the value of the ‘‘nextheader’’ field equals one of the values specified asvalid. In Fig. 4 this is expressed through a<switch> element that lists all the valid valuesfor the nexthdr field. The loop (i.e., repetition

698 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

of the optional header) is repeated until an ‘‘inva-lid’’ value is found in the ‘‘next header’’ field;2 inthis case, the <loopctrl> element breaks theloop.

5.3.5. Expressions with lookahead operands

Sometimes, expressions are required to look atsome bytes that have not been associated to anyfield yet. An example can be found in the wide-spread Ethernet and IEEE 802.3 headers: the fieldspanning the 13th and 14th bytes from the begin-ning of a frame can be either the Ethernet 2.0ethertype or the IEEE 802.3 length field,depending on whether its value is greater than1500 or not, respectively. Obviously, it is not pos-sible to determine whether to interpret the above-mentioned pair of bytes as either a length oran ethertype field before actually checking theirvalue. NetPDL addresses this situation throughlookahead operands.

When evaluating an expression, a NetPDL-based engine can interpret the next bytes in a pack-et dump (which have not been assigned to anyfield) as they were belonging to a new field, andevaluate the expression according to this value.

The use of a <lookahead> operand for thedescription of Ethernet headers is shown inFig. 5, which can be compared to Fig. 2 wherethe same header is described without differentiat-ing between the IEEE and Ethernet 2.0 formats.The 13th and 14th bytes are evaluated as they werepart of a <fixed> field, which is used to deter-mine which branch of the <if> element specifieshow to process the sequence of bytes. Accordingto the selected branch, the two bytes used to eval-uate the expression are interpreted as the actualfield—length or ethertype.

5.3.6. Custom plug-ins

For the sake of simplicity, NetPDL does notaim at supporting the description of every possiblefeature ever defined within a protocol. Instead,NetPDL provides a <plugin> element to enable

2 More precisely, the repetition is interrupted when the valueof the ‘‘next header’’ field does not match any of the <case>elements. This happens, for example, when the next header isthe TCP or UDP one.

the deployment of external code to handle proto-col header features not directly supported by Net-PDL elements. The <plugin> element defines alink to custom code that can be part of either theNetPDL-based engine or an external library (e.g.a Dynamic Link Library in Win32). This elementcan be used, for example, to implement the pro-cessing of protocols that have a very complexstructure (e.g. DNS, SNMP etc.).

Since a <plugin> is an interface toward spe-cial purpose native code, each NetPDL engineimplementation must include (in addition to thecode implementing NetPDL primitives) the ad-hoc code corresponding to the plug-ins possiblyin use.

5.4. Protocol encapsulation

This section presents the NetPDL primitives forhandling protocol encapsulation, i.e., how to spec-ify the protocol description to be used in process-ing a packet�s payload. Protocol encapsulation isbased on the <nextproto> element, as shownin the Ethernet description example in Fig. 2.

For most protocols, encapsulation is basedon the value of one or more header fields. Therelationship between the value of such fields andthe encapsulated protocol can be specified bythe <protoref> element deployed within<switch>-<case> or <if> elements, whichare used to evaluate the condition that brings tothe correct encapsulated protocol. The<switch>-<case> element is usually deployedto identify an encapsulated protocol through thevalue of a single field (see Fig. 2 for an example),while the <if> element allows encapsulation tobe made dependent on complex conditions (see[20] for details and examples).

In some cases, the encapsulated protocol is not(univocally) identified by any field in the encapsu-lating packet header: further processing of theencapsulated header is needed to ensure properidentification of the corresponding protocol. Forinstance, a BOOTP (Boot Protocol) and a DHCP(Dynamic Host Configuration Protocol) packetcan be discriminated only by the value of a fieldwithin their own header. The <presentif> ele-ment supports such cases by defining a condition

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 699

to be evaluated while processing a packet header inorder to verify that the correct header descriptionis being used. If the condition is valued false, theprotocol description used so far for the processingdoes not match the byte stream and another onemust be selected. An example can be seen inFig. 6: the IPv4 protocol is characterized by the‘‘version’’ field (<fieldref name= 00ver00/>)containing a value equal (<oper type=00eq00>)to four (<number value=00400/>). In case the por-tion of packet that would correspond to that fieldcontains a different value, the byte sequence beingprocessed is not an IPv4 packet.

Notwithstanding the versatility of its primitives,NetPDL does not support encapsulation rules fora few protocols. For example, there are no fieldsneither within the UDP nor the RTP (Real-TimeTransport Protocol) headers that can be used touniquely identify RTP packets. In particular, reli-able identification of RTP packets requires pro-cessing of state information on an RTP session,which is not adequately supported by the currentNetPDL specification. In fact, NetPDL providesonly limited support, by means of custom vari-ables (see Section 5.5), for stateful packetprocessing.

Finally, a NetPDL-based engine takes pre-de-fined actions when the packet header format can-not be inferred by matching the providedNetPDL packet descriptions to the byte streambeing processed. This happens at least in twocases. First, when starting processing a new data

<proto name="IPv4"><presentif>

<!-- Check that 'version' is equal to '4' --><expr type="bool">

<fieldref name="ver"/><oper type="eq"/><number value="4"/>

</expr></presentif>

<fields><masked name="verhlen">

<bit name="ver" mask="F0"/><bit name="hlen" mask="0F"/>

</masked>...

</fields></proto>

Fig. 6. Excerpt from the description of the IPv4 header: the<presentif> element.

unit, the first bytes must be interpreted accordingto the link-layer type of the interface throughwhich the packet dump has been received and can-not be inferred from the data itself. The _start-proto primitive protocol identifier specifiesaccording to which header description the first by-tes of a sequence should be interpreted/processed.Second, when none of the available protocoldescriptions can be mapped to the byte sequencebeing processed, the _defaultproto primitiveprotocol identifier specifies a sort of ‘‘last resort’’protocol, i.e., a header description to be used whenno other protocol description applies.

5.5. Variables

Variables can be declared within a NetPDLdescription and manipulated at run time. Thevalidity of a variable can be limited in time—spe-cifically, a volatile variable is valid only while pro-cessing one packet, while the content of a staticvariable is preserved through different packets—and in scope—specifically, a local variable is validonly when processing fields of the associated pro-tocol header, while a global one is valid while pro-cessing the entire packet. Hence, a variable belongsto one of the four categories resulting from allcombinations of validity: local-volatile, local-sta-tic, global-volatile, global-static. Permanent (sta-tic) variables allow information to be kept whileprocessing subsequent packets, thus enabling alimited degree of stateful packet processing.

Each NetPDL-based engine has a number ofpredefined global variables containing commoninformation such as the link-layer type and, lengthof the frame being processed and the total numberof bytes available for processing (for example, theuser can choose to capture and/or store only thefirst part of a packet), the number of bytes pro-cessed (e.g. decoded) within the frame, the numberof bytes processed within the current protocolheader, a timestamp containing the time at whichthe related byte sequence was captured on the net-work. These default variables can be accessed fromNetPDL when processing the corresponding pro-tocol headers. In addition, the user can create itsown variables.

700 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

Short Ethernet frames provide an example of asituation in which global volatile variables areneeded. The shortest Ethernet frame is 64 bytes,however the number of valid bytes in the payloadcan be smaller than 64 when carrying a short pack-et, hence ‘‘padding’’ is present. Since the Ethernet2.0 frame does not have a ‘‘frame length’’ field, thenumber of valid bytes within the Ethernet payloadis kept in a global variable and must be inferredwhen processing the next level protocol. Accordingto the IP description in [20], a NetPDL-based en-gine processing an IP packet received within ashort Ethernet frame will properly update the var-iable containing the Ethernet payload size.

6. NetPDL extensions

One of the most valuable characteristics of Net-PDL is its extensibility, i.e. the possibility to addnew keywords (that can be inserted as either attri-butes of existing NetPDL elements or new ele-ments) that will be used by some applications fortheir purposes. An example of a possible extensionis attaching to header fields information related totheir validity range; for instance, some fields allowonly a limited set of values, while others (e.g. CRCfields) must have a precise value.

Extending NetPDL in order to support new fea-tures is rather simple. As an example, this sectionpresents the first extension to this language, calledNetPDL Visualization Extension, which providesinformation on how a decoded packet should bedisplayed. For instance, a 32-bit string represent-ing an IP address should be displayed in dotted-decimal form, while a 32-bit string representing aCRC should be displayed as a hexadecimalnumber.

Only the capability to parse protocol descrip-tions based on NetPDL core is required from aNetPDL-based engine. Besides this, a NetPDL en-gine might process only extensions relevant to thespecific application for which it was designed (e.g.,packet filtering). Therefore, a NetPDL-based en-gine that does not support new extensions simplyignores extension specific attributes and elements,thus operating on a description like the one inFig. 2.

Thanks to the properties of XML, extensionscan be written in files separate from the ‘‘core’’specification, improving both readability andmaintainability (users can update files separately).

6.1. The NetPDL Visualization Extension

The NetPDL Visualization Extension (the com-plete specification can be found in [20]) has beendesigned to support protocol analyzers (a.k.a. snif-fers), which need to display captured data streamsin an intuitive and user-friendly way. WhileNetPDL elements provide protocol analyzers withenough information to decode packets, the Visual-ization Extension provides the additional informa-tion needed for displaying packet fields.

The existing NetPDL Visualization Exten-sion allows the definition of two views: a summary

view, which includes the most important fields tobe shown for each packet, and a detailed view,which includes all the fields of each packet, in fulldetail.

Even though the NetPDL Visualization Exten-sion does not represent the only possible solutionfor protocol visualization, it is noteworthy interms of simplicity and efficiency. For instance,XSL Transformations [23] have been consideredfor the definition of the above views since ourpacket decoding engine (presented in Section 7)produces an XML output (the PDML format, de-scribed in [20]). Even though XSL Transforma-tions allow for richer visualization features, theuser has to learn yet another (rather complex) pro-gramming language in order to handle visualiza-tion, which is against one of the main goalsmotivating our work: simplicity. The NetPDLVisualization Extension, instead, defines only afew visualization primitives that are based on thesame principles of the core primitives, hence quickto get familiar with. Moreover, experiences withXSL processing have shown it computation inten-sive and slow; particularly, processing time growslinearly with the number of protocols involved(although this might be just a feature of the soft-ware package we used). For instance, an experi-ment on a Pentium IV—2.4 GHz machine hasshown that displaying a packet requires about2.5 ms with XSL (Xalan-C [25] implementation)

Table 2Native code vs. NetPDL-based engine implementation performance

Tool name Results

Partial packet decoding NetBee 39 ls/pkt

Complete packet decoding Ethereal (native code) 66 ls/pktNetBee 75 ls/pkt

Complete packet decoding + packet dump (PDML) Ethereal (native code) 1077 ls/pktNetBee 648 ls/pkt

Filter generation for packet filtering WinPcap (native code) <1 msNetBee 2831 ms

<proto name="Ethernet" longname="Ethernet 802.3"><fields>

<fixed name="dst" longname="MAC Destination" size="6"showtemplate="EthMAC"/>

<fixed name="src" longname="MAC Source" size="6"showtemplate="EthMAC"/>

<fixed name="type-length" longname="Type - Length"size="2" showtemplate="FieldHex"/>

</fields>...

</proto>...<netpdlshow>

<showtemplate name="FieldHex" showtype="hex"/>

<showtemplate name="EthMAC"showtype="hex" showgrp="3" showsep="-">

<showmap><expr type="string">

<!-- Extract the first 3 bytes of the address --><strfieldref startat="0" size="3"/>

</expr>

<case value="FFFFFF" show="Broadcast address"/> <case value="000001" show="SuperLAN-2U"/>...<default show="code not available"/>

</showmap>

<showdtl><pdmlfield attrib="show"/><if>

<!-- It extracts the first byte of the --><!-- MAC address, then it matches the result --><!-- against the 'xxxxxxx0' pattern --><expr type="bool">

<fieldref startat="0" size="1"/><oper type="match"/><pattern value="xxxxxxx0"/>

</expr>

<if-true><text value=" Unicast address (Vendor "/><pdmlfield attrib="showmap"/><text value=")"/>

</if-true></if>...

</showdtl></showtemplate>

</netpdlshow>

Fig. 7. Complete NetPDL description, with visualizationextensions.

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 701

and about 0.6 ms with the NetPDL VisualizationExtension (more details are provided in Table 2).In any case, it is a decision of each NetPDL-basedengine developer whether to implement the Visual-ization Extension or to rely on the XSL Transfor-mations instead. For the sake of this paper theNetPDL Visualization Extension is just an exam-ple of a possible extension to the NetPDLlanguage.

6.2. Displaying the details of a packet

The NetPDL Visualization Extension defines aset of elements and attributes that are used withina visualization section, or template, to specify howeach field is to be displayed. Each NetPDL fieldcontains a reference to a template (through theshowtemplate attribute) that contains the rele-vant visualization elements.

The most important attributes contained in thevisualization template are showtype, showgrp,and showsep, which determine respectively theformat (hexadecimal, decimal, ascii, or binary) ofeach byte, how bytes must be grouped, and theseparator string between the groups. For examplethe MAC source and destination fields in Fig. 7specify that the field should be shown using theEthMAC template. This template displays a fieldby splitting its value in two parts (of three byteseach, as specified by the showgrp attribute) ofhexadecimal numbers (showtype attribute) sepa-rated by a ‘‘-’’ sign (showsep attribute). The finalresult looks like 000800-AB34F9.

The <showdtl> element defines a custom tem-plate that can be used to describe more sophisti-cated displaying rules. For example, in Fig. 7 the

template associated to a MAC address (EthMAC)verifies the type of the address (i.e., unicast,

702 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

multicast, or broadcast) and displays a string iden-tifying such type.

Similarly, the <showmap> element comparesthe field value against a set of choices to determinea suitable displaying format. For example, inFig. 7, this mechanism is used to determine themanufacturer of the network interface card (whichdepends on the first three bytes of the MAC ad-dress) and to show its name. The <showdtl>and <showmap> elements use various tags, suchas <if> and <switch>-<case>, alreadydefined within the NetPDL.

In case the proper way of displaying specificinformation cannot be expressed by means of theprevious attributes and elements, the showplg

attribute references a custom visualization pluginnatively implemented into the NetPDL-based en-gine. An example is a plug-in that displays thecanonical name (e.g. www.foo.bar) correspond-ing to IP addresses. The plug-in mechanism inthe Visualization Extension is similar to the onein the base language (<plugin> element in Sec-tion 5.3.6) and enables full flexibility through com-plete customizability of packet visualization.

6.3. Displaying the summary of a packet

The Visualization Extension includes a set ofprimitives for creating a summary view of eachpacket. A protocol summary should include themost important information to be displayed andhow to display it; a NetPDL-based engine (partic-ularly its visualization extension code) will put all

<proto name="Ethernet" showsumtemplate="eth"><fields>

...</fields>...

<netpdlshow><showsumtemplate name="eth"><section name="next"/><text value="Eth: "/><pdmlfield name="src" attrib="show"/><text value=" => "/><pdmlfield name="dst" attrib="show"/>

</showsumtemplate></netpdlshow>

</proto>

Fig. 8. NetPDL Visualization Extension: specifying the sum-mary related to the Ethernet header.

the ‘‘protocol summaries’’ together transparently.The summaries related to all the protocols subse-quently encapsulated into a packet are appendedto one another to create a single string.

As an example, Fig. 8 shows the elements of thesummary view for Ethernet protocol headers.Each Ethernet frame will be summarized with thestring ‘‘Eth:’’ followed by the source MAC ad-dress, the string ‘‘=>’’ and the destination MACaddress, i.e., a format looking like

Eth:0001C7-B75007=>000629-393D7E

7. Implementing a NetPDL-based engine

A first implementation of a NetPDL-based en-gine, also supporting the Visualization Extension,can be found in the NetBee library [3], which iscurrently used by the Analyzer 3.0 protocol ana-lyzer [2]. Both tools have been released as opensource software.

The NetPDL database shipped with Analyzerincludes an experimental description of 64 proto-cols, mostly related to the TCP/IP suite, includingEthernet, Token Ring, VLAN, IP, IPv6, TCP,UDP, DHCP, DNS, RIP, OSPF, BGP, PIM. ThisNetPDL-based engine, implemented as a 500 Kby-tes Dynamic Link Library (DLL) for Windows,decodes packets and generates detailed and sum-mary views. Additionally, it can also performpacket filtering. Analyzer accesses NetPDL-relatedfunctionalities by invoking functions (such asDecodePacket()) exported by the DLL. TheNetBee Library exports a very clean interfaceand decoding and printing a packet requires onlya few lines of code as shown in Fig. 9.

Table 2 provides a comparison of the perfor-mance of NetBee�s NetPDL-based engine vs. cus-tom implementations. The test, run on a PentiumIV at 2.4 GHz, is based on the analysis of 5 traces,each containing at least 8000 packets captured onour University egress link. Traces contain com-plete packets and no filters have been set in thecapture process. The analysis of each trace is re-peated 10 times and the second best processingtime is retained as representative of the task com-pletion time and shown in Table 2. The executiontime of the relevant code of a few well-known tools

while (1){struct _nbPDMLPacket *PDMLPacket;struct _nbPDMLProto *ProtocolItem;

// Read packet from file or networkRes= PacketSource->Read(&PacketHeader, &PacketData);

if (Res == nbFAILURE)break;

// Decode packetDecoder->DecodePacket(DataLinkCode, PacketCounter,

PacketHeader, PacketData);

// Get the current decoded packetPDMLReader->GetCurrentPacket(&PDMLPacket);

// Print some global information about the packetprintf("Packet number %d\n", PDMLPacket->Number);printf("Total lenght= %d\n", PDMLPacket->Length);

// Retrieve the 1st protocol contained in the packetProtocolItem= PDMLPacket->FirstProto;

// Scan the current packet and print on screen the most // relevant data related to each proto contained in itwhile(ProtocolItem){printf ("Protocol %s: size %d, offset %d\n",

ProtocolItem->LongName, ProtocolItem->Size,ProtocolItem->Position);

ProtocolItem= ProtocolItem->NextProto;}

}

Fig. 9. Sample code using the NetBee library: decoding andprinting the details of a packet.

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 703

(Tethereal [4] for packet decoding, which is the no-GUI version of Ethereal and WinPcap [8] forpacket filtering) is measured and compared to Net-Bee�s. Both tools are able to decode packets inmemory and, if required, to create a PDML file(XML-based format for packet detailed view) con-taining the packet description.

Table 2 shows that the performance of NetBee(which decodes protocols by means of a Net-PDL-based engine) and Tethereal (which imple-ments protocol dissectors with native code) isvery similar, 75 ls and 66 ls, respectively. Thesenumbers account only for the decoding time, sincethe decoded packet is only created in memoryaccording to each tool internal format. In caseonly the most important information about eachfield is required (basically the field name, its posi-tion in the packet dump, and its size), NetBee fur-ther decreases the processing cost from 75 ls/packet to 39 ls/packet; this function, called ‘‘par-tial packet decoding’’ in Table 2 is not availablein Tethereal. In case decoded packets have to bedumped on file (in PDML format), performance

increase to 0.65 ms and 1.08 ms per packet forNetBee and Ethereal, respectively. However theseincreased figures are mostly due to the efficiencyof the code that dumps results on disk, which doesnot depend on the decoding process.

Although these results provide only a generalindication of the performance obtainable fromNetPDL-based tools, they clearly demonstratethat the NetPDL language itself does not intro-duce performance penalizations; performance fullydepends on the quality of the tool deploying thelanguage, i.e., of the NetPDL engine imple-mentation.

The NetBee library includes also a second set ofclasses that implement packet filtering capabilities.These classes compile a high-level filter (in a lan-guage similar to the libpcap one) to pseudo-assem-bly instructions, which are executed by a virtualmachine for processing packets. The NetPDL lan-guage is therefore used for describing packet for-mats and encapsulations which are intended togenerate filters, while packet filtering itself is out-side the scope of the language. Although the filtergeneration times shown in Table 2 are quite long,which is at the moment due to the very immaturestage of the compiler, the implementation of thesefunctionalities is important to demonstrate how asingle NetPDL database is successfully and effec-tively deployed by different applications.

8. Conclusions

This paper presents the Network Protocol

Description Language (NetPDL), a new extensible,XML-based language for describing the format ofprotocol headers. Network applications can basetheir packet processing on the NetPDL protocoldatabase, i.e., a collection of packet descriptions.According to this architecture, new protocols canbe easily supported by updating the protocol data-base instead of changing the application. For in-stance, tools like Ethereal and libpcap/WinPcapmust be extended (i.e. new code is to be addedand the whole tool recompiled) in order to supportnew protocols, while Analyzer needs only to up-date its protocol database without any recompila-tion. Simplicity in upgrading makes NetPDL

704 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

advantageous also for applications that deal with asmall set of protocols, for which an external proto-col database might seem to offer no advantage.

In any case, this is not the only way that Net-PDL can be beneficial for packet processing. Asmany other technologies (XDR, ASN.1, IDL, . . .),the NetPDL protocol database can be used to gen-erate C code implementing packet processing. Thetarget application includes the dynamically gener-ated code as part of its sources, thus being ableto run this code (natively) at very high speed. Inthis scenario upgrading applications requires auto-matically generating new source files out of an up-dated NetPDL database and re-linking.

The paper presents the basic NetPDL primitivesneeded for packet format description. NetPDL canbe easily extended to satisfy the needs of specificapplications, its syntax is easy to understand, andimplementing parsers and support tools (such asgraphic editors) is particularly simple thanks tothe large number of existing XML-support toolsand libraries. NetPDL further allows extensionsaimed at more specific functions; as an example,this paper describes the Visualization Extension

that provides primitives for describing how pack-ets should be displayed, in a detailed and summa-rized view. NetPDL has been developed as a firststep in an effort to relieve network applicationsfrom tasks related to packet processing. The basicidea is to delegate such tasks to a packet-process-ing engine that operates based on ‘‘standard’’packet descriptions. NetPDL is intended as theformalism for such packet descriptions that canbe gathered in an application independent Net-

PDL protocol database.This will result in shorter development time for

network applications and tools like protocol ana-lyzers, firewalls, network monitors. NetPDL pro-vides a general way to describe the format ofprotocol headers, eliminating the need for imple-

menting a custom protocol database in each tool.This is a major change in the network tools arena,where each application currently defines its ownprotocol database that is often ‘‘hardwired’’ intothe application code.

NetPDL-based engines, that perform packetprocessing based on NetPDL descriptions, can beimplemented ad hoc for specific applications. In

addition, if this approach became popular, it couldtrigger the creation of a set of ‘‘general purpose’’NetPDL-based engines, which can be delegatedby the applications to perform low-level tasks likepacket decoding, packet filtering, and field extrac-tion. Applications can include a generic NetPDL-based engine as external library or interact withan external NetPDL-based engine through a pub-lic API (Fig. 1). The availability of off-the-shelfNetPDL-based engines will speed up the develop-

ment of network applications because programmerscan concentrate on high-level tasks (e.g. imple-menting filtering rules into a firewall) instead ofdealing with low-level issues (i.e. how to locate acertain field in a given protocol header). Moreover,given their wide applicability, the above-men-tioned NetPDL-based engines can be extensivelytested, deeply debugged, and optimized; conse-quently, their deployment will result in improved

application quality.The Analyzer 3.0 network sniffer is based on a

NetPDL-based engine (implemented within theNetBee library) that uses both the basic NetPDLlanguage and the Visualization Extension. TheNetBee Library has been made freely availablewith a BSD-style license in order to encouragethe adoption of the presented technologies by bothacademia and industry. Complete specifications ofthe NetPDL language and the related technologiesare available at [20].

Acknowledgements

The authors wish to thank Leonardo Scucchia,who laid the basis for this work during his Laureagraduation project and Francesco Andriani, whocontributed to the first implementation of a Net-PDL-based engine.

References

[1] Computer Networks Group (NetGroup) at Politecnico diTorino, Analyzer, Available from: <http://analyzer.pol-ito.it>, March 1999.

[2] Computer Networks Group (NetGroup) at Politecnico diTorino, Analyzer 3.0, Available from: <http://ana-lyzer.polito.it/30alpha/>, March 2003.

F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706 705

[3] Computer Networks Group (NetGroup) at Politecnico diTorino, The NetBee Library, Available from: <http://www.nbee.org/>, August 2004.

[4] Ethereal, a public-domain sniffer, Available from: <http://www.ethereal.com>.

[5] Tcpdump, a public domain sniffer. Available from:<http://www.tcpdump.org>.

[6] S. McCanne, V. Jacobson, The BSD packet filter: a newarchitecture for user-level packet capture, in: Proceedingsof the 1993 Winter USENIX Technical Conference, SanDiego, CA, USENIX, January 1993.

[7] V. Jacobson, C. Leres, S. McCanne, libpcap, LawrenceBerkeley Laboratory, Berkeley, CA. Initial public releaseJune 1994. Available from: <http://www.tcpdump.org/>.

[8] Computer Networks Group (NetGroup) at Politecnico diTorino, WinPcap Web Site, Available from: <http://www.winpcap.org>, April 2003.

[9] Surasak Sanguanpong, Ekapol Rojratanavichai, Syntaxdirected, definition supported universal protocol ana-lyzer, in: Electrical Engineering Conference (EECON),Kasetsart University, Bangkok, December 1999, Availablefrom: <http://anreg.cpe.ku.ac.th/pub/protocol.pdf> (inThai).

[10] Laurent Riesterer, Generator and Analyzer System forProtocols (GASP), March 2000, Available from: <http://laurent.riesterer.free.fr/gasp/>.

[11] Christian Lorenz, SPY LAN protocol analyzer, 1999,Available from: <http://www.gromeck.de/Spy/>.

[12] Solidum (now Integrated Device Technology—IDT), PAXPattern Description Language, October 2002, Availablefrom: <http://www.solidum.com/products/pax_pdl.cfm>.

[13] Mark Bednarczyk, JnetStrem Project, Available from:<http://netrepository.net>.

[14] D. Crocker, Augmented BNF for Syntax Specifications:ABNF, RFC 2234, IETF Network Working Group,November 1997.

[15] Olivier Dubuisson (Ed.), ASN.1—CommunicationBetween Heterogeneous Systems, Morgan Kaufmann,October 2000.

[16] International Organization for Standardization. Informa-tion processing systems—open systems interconnections,LOTOS—A Formal Description Technique Based on theTemporal Ordering of Observational Behaviour, StandardISO 8807, 1989.

[17] C. Pecheur, VLib: Infinite virtual libraries for LOTOS, in:Proceedings of the IFIP TC6/WG6.1 13th InternationalSymposium on Protocol Specification, Testing and Verifi-cation, 25–28 May 1993, Liege, Belgium, Available from:<ftp://ftp.run.montefiore.ulg.ac.be/pub/RUN-PP93-03.ps>.

[18] International Organization for Standardization. Informa-tion technology, Enhancements to LOTOS (E-LOTOS).Standard ISO/IEC 15437:2001, 2001.

[19] International Organization for Standardization. Informa-tion processing systems—open systems interconnection,Estelle—A Formal Description Technique Based on anExtended State Transition Model. Standard ISO/IEC9074, 1997.

[20] Computer Networks Group (NetGroup) at Politecnico diTorino, The NetPDL Specification, May 2003, Availablefrom: <http://www.nbee.org/NetPDL/>.

[21] Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, EveMaler, Extensible Markup Language (XML) 1.0 (seconded.), W3C Recommendation, 6 October 2000.

[22] International Organization for Standardization. Infor-mation processing—text and office systems, Stan-dard Generalized Markup Language (SGML), ISO 8879,1986.

[23] James Clark, XSL Transformations (XSLT) Version 1.0,W3C Recommendation, 16 November 1999.

[24] The Apache Project, Xerces: XML Parsers in Java andC++, Available from: <http://xml.apache.org/>.

[25] The Apache Project, Xalan: XSL stylesheet processors inJava and C++, Available from: <http://xml.apache.org/>.

Fulvio Risso (born in 1971) is an assis-tant researcher at the Department ofControl and Computer Engineering ofPolitecnico di Torino. He got his Ph.D.in computer and system engineeringfrom Politecnico di Torino in 2000. Hisresearch topic was ‘‘Quality of Ser-vice on Packet-Switched Networks’’;present research activity focuses onnetwork analysis and network moni-toring; past activities focused on the

provision of guaranteed-quality services over packet-switched

networks.

His international experiences include a one-year period atUniversity College London (UK) during his PhD and one atCisco Systems, San Jose (CA), USA as a Visiting Faculty.

He started and he is one of the maintainers of the WinPcap(http://winpcap.polito.it) and the Analyzer (http://analyzer.polito.it) projects. The former is the de-facto standard libraryfor network analysis tools under the Win32 platform, while thelatter is one of the most appreciated tools for packet sniffingand network monitoring.

Mario Baldi is an Associate Professoron tenure track at the ComputerEngineering Department of TorinoPolytechnic, Torino, Italy and VicePresident for Protocol Architecture atSynchrodyne Networks, Inc., NewYork.

He received his M.S. DegreeSumma Cum Laude in ElectricalEngineering in 1993, and his Ph.D. inComputer and System Engineering in

1998 both from Torino Polytechnic. He was an Assistant

Professor on tenure track at Torino Polytechnic from 1997 to2002. He joined Synchrodyne Networks, Inc. in November1999.

706 F. Risso, M. Baldi / Computer Networks 50 (2006) 688–706

He has been a visiting researcher at the IBM T.J. WatsonResearch Center, Yorktown Heights, NY, at Columbia Uni-versity, New York, NY, and at the International ComputerScience Institute (ICSI), Berkeley, CA.

As part of his extensive research activity at Torino Poly-technic, he has been leading various networking research pro-jects, involving Universities and industrial partners, funded byEuropean Union, Local Government, and various companies,including Telecommunications Carriers, such as Infostrada and

Telecom Italia, and research institutions, such as Telecom ItaliaLabs.

He provides on a regular basis consultancy and trainingservices, both directly to companies and through varioustraining and network consultancy centers.

He co-authored over 50 papers on various networkingrelated topics and two books, one on internetworking and oneon switched local area networks.