Biswadeep Nag Staff Engineer Performance Engineering Acceleration Techniques for XML Processors XMLConference 2004


Biswadeep Nag

Staff Engineer

Performance Engineering

Acceleration Techniques for XML Processors

XMLConference

2004


XML is Everywhere

● Configuration files (web.xml, TurboTax)

● Office documents (StarOffice, OpenOffice)

● Web content (transformed to HTML)

● Web services and application integration


XML Advantages

● Cross platform
● Human readable
● Self-describing
● Extensible


XML Disadvantages

● Verbose

– Binary encoding (Fast Infoset)

● Expensive to process

– Parsing is fundamentally costly


XML Parsing

● Vs. programming language parsing
– Similar, but simpler

● Lexical grammar defined by XML 1.x spec
– Lex, Yacc equivalent

● Semantics (types, constraints, etc) defined by XML Schema

● Current parsers handcrafted in Java, C


XML Document

<invoice>
  <lineitem lineid="1">
    <item sku="1234">
      <name> Widget </name>
    </item>
    <qty> 10 </qty>
    <price currency="USD"> 100 </price>
  </lineitem>
</invoice>


SAX Events Generated

● StartElement(invoice)
● StartElement(lineitem, lineid=1)
● StartElement(item, sku=1234)
● StartElement(name)
● Characters(Widget)
● EndElement(name)
● EndElement(item)
● StartElement(qty)
● Characters(10)
● EndElement(qty)
● StartElement(price, currency=USD)
● Characters(100)
● EndElement(price)
● EndElement(lineitem)
● EndElement(invoice)
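
The event list for the invoice document can be reproduced with the standard JAXP SAX API. The following is a minimal sketch, not the presenter's code: the class name `SaxEventDemo` and the event formatting are assumptions chosen to mirror the slide. Character data is buffered and trimmed because a SAX parser may deliver whitespace and may split text across several `characters` callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records SAX callbacks in the notation used on the slide.
class SaxEventDemo {
    static final String INVOICE =
        "<invoice><lineitem lineid=\"1\"><item sku=\"1234\"><name> Widget </name></item>"
        + "<qty> 10 </qty><price currency=\"USD\"> 100 </price></lineitem></invoice>";

    static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            // Emit buffered character data (trimmed) before the next element event.
            private void flushText() {
                String t = text.toString().trim();
                if (!t.isEmpty()) {
                    out.add("Characters(" + t + ")");
                }
                text.setLength(0);
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                flushText();
                StringBuilder sb = new StringBuilder("StartElement(" + qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(", ").append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
                out.add(sb.append(')').toString());
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                flushText();
                out.add("EndElement(" + qName + ")");
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        events(INVOICE).forEach(System.out::println);
    }
}
```

The parser drives the handler (push model), so the application never holds more than the current event.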


XML Parsing Types

● Streaming
– Push – SAX (Simple API for XML)
– Pull – StAX (Streaming API for XML)
– Small memory footprint
– Sequential access

● Tree-building
– Language independent – DOM
– Java specific – JAXB
– More memory consumption
– Random access, modification
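
To make the contrast concrete, here is a small sketch that reads the same invoice once with a pull parser (StAX) and once with a tree (DOM). The class and method names are hypothetical; only the `javax.xml.stream`, `javax.xml.parsers`, and `org.w3c.dom` calls come from the standard Java library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class contrasting streaming (pull) and tree-building access.
class ParsingStylesDemo {
    static final String INVOICE =
        "<invoice><lineitem lineid=\"1\"><item sku=\"1234\"><name> Widget </name></item>"
        + "<qty> 10 </qty><price currency=\"USD\"> 100 </price></lineitem></invoice>";

    // Streaming, pull style: the caller asks for the next event and can stop early.
    static int quantityViaStax(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("qty")) {
                    return Integer.parseInt(reader.getElementText().trim());
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
        throw new IllegalArgumentException("no <qty> element");
    }

    // Tree-building: the whole document is in memory, so any node can be reached directly.
    static String currencyViaDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element price = (Element) doc.getElementsByTagName("price").item(0);
            return price.getAttribute("currency");
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(quantityViaStax(INVOICE));
        System.out.println(currencyViaDom(INVOICE));
    }
}
```

The StAX method returns as soon as it has seen `qty`; the DOM method pays for building the full tree before it can answer.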


XML Processing Technologies

● Java API for XML Processing (JAXP)
– SAX, DOM, XSLT and in future StAX
– Basic (W3C standard based) XML parsing

● Java Architecture for XML Binding (JAXB)
– Converting XML to Java objects and vice versa

● Java API for XML-Based RPC (JAX-RPC)
– Foundation of web services

● All part of Java Web Services Developer Pack (JWSDP)


XML Acceleration Opportunity

● Diverse applications of XML
– Parsing XML documents
– Manipulating XML using Java programs
– Remote web service calls via XML

● Yet one area that binds them all!
– Basic XML parsing

● Current solutions focus on a single application


XML Offload Engine (XOE)

● Handle common, routine aspects of XML processing, similar to:
– TCP/IP offload engine (TOE)
– SSL/Crypto accelerator

● Gains come from:
– Acceleration – Specialized XML processing hardware
– Offload – Using XOE processor instead of host cycles


XML Application Stack

[Figure: layered stack, top to bottom: User Application; Java Standard API (JAXB, JAXP, JAX-RPC); XML Parsing API (SAX, DOM, XSLT, StAX); XML Parser (Xerces, XPP3, etc.)]


XML Parser Architecture

[Figure: parser pipeline from the XML Document through UTF Decoder, Character Validator, Element Scanner, Parser, Schema Validator, and API Implementor to SAX, DOM, StAX]


XML Parser Details

Parser Module      Module Function
UTF Decoder        Decodes from UTF to Java Unicode chars
Char Validator     Checks that document has valid XML chars
Element Scanner    Tokenizes elements and attributes
Parser             Checks for well-formedness
Schema Validator   Validates XML document against XML schema
API Implementor    Implements W3C standard and Java APIs
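
The first two modules are simple enough to sketch in a few lines. The example below is an illustration under stated assumptions, not Xerces code: the class `CharStageDemo` and its method names are hypothetical, the input is assumed to be UTF-8, and the validity test follows the `Char` production of the XML 1.0 specification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the UTF Decoder and Char Validator stages.
class CharStageDemo {
    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // UTF Decoder stage: strict UTF-8 to Java (UTF-16) chars; malformed input is rejected.
    static String decodeUtf8(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("malformed UTF-8 input", e);
        }
    }

    // Char Validator stage: index of the first char that is not a legal XML char, or -1.
    static int firstInvalidChar(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        String doc = decodeUtf8("<qty> 10 </qty>".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstInvalidChar(doc));
    }
}
```

Both stages touch every input byte with very little state, which is why they are natural candidates for fixed-function hardware.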


XOE Characteristics

● Smart NIC
– 10 Gigabit Ethernet
– PCI-Express card

● Functions
– TCP Termination (TOE)
– HTTP Termination
– SSL Acceleration
– XML Acceleration

● Specialized Hardware
● Processor running software/microkernel


Design Considerations

● Address broadest use-cases

● XML & WS technologies still volatile

– Avoid design obsolescence

● Keep most of the software stack unchanged

● Can fit into any host system

● Limited hardware design costs

● Use natural stratification of software stack


Design Decisions

● XOE Hardware
– Low level parser functions
– Relatively simple hardware

● XOE Firmware
– High level functions (grammar checking)
– Data structures requiring more memory

● Host Software
– Application APIs that may change (JAXB, JAX-RPC, DOM, StAX, SAX)
– Steps that are done infrequently (Schema validation)


Xerces Call Graph

[Figure: Xerces call graph, with the percentage shown for each method, grouped by class]

● XMLDocumentScanner
– scanStartElement 6.9%
– scanAttribute 3.3%
– scanEndElement 0.9%
– scanContent 7.4%

● XMLEntityScanner
– scanQName 14.1%
– skipChar 5.5%
– skipString 3.9%
– skipSpaces 2.9%
– scanLiteral 2.3%
– scanChar 2.1%
– scanContent 7.9%

● UTF8Reader
– read 3.8%

● SymbolTable
– addSymbol 11.2%

● XMLChar
– isName 0%
– isContent 0%
– isSpace 0%


XML Offload Architecture

[Figure: the parser pipeline (XML Document, UTF Decoder, Character Validator, Element Scanner, Parser, Schema Validator, API Implementor, then SAX, DOM, StAX) partitioned across Acceleration Hardware, XOE Firmware, and Host Software]


XML Acceleration Architecture

[Figure: Host CPU (User Application; JAXB, JAXP, JAX-RPC; SAX, DOM, StAX; Schema Validation; XOE Driver) exchanging Tokenized XML Format with the XML Offload Engine (Parser, Scanner, Decoder; XOE Firmware; XOE Hardware; Network TOE; Crypto Accel; Network Interface)]


Hardware Design

● Pipeline Architecture
– DMA data from local memory

– UTF decode

– Check valid XML chars

– Tokenize into elements & attributes

– Store in symbol table (hash implementation)

– Compose tokenized format

– DMA back to memory for sending to host


Firmware Design

● Control and direct XML accelerator h/w

● Implements XML grammar

● Well-formedness checking
– Each begin tag matched with an end tag

– Tags are properly nested

● Stack structure with ptrs to symbol table

● Compare endElement ptr to stack top
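
The stack check can be sketched as follows. This is an illustration only: symbol-table pointers are modeled as integer symbol ids, and the token encoding (a positive id for a start element, the negated id for an end element) is an assumption made for the example, not the XOE firmware format. It covers only tag matching and nesting, not the rest of the XML grammar.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the firmware nesting check.
class NestingCheckDemo {
    // tokens: +id = start element with that symbol id, -id = end element with that symbol id
    static boolean wellNested(int[] tokens) {
        Deque<Integer> open = new ArrayDeque<>();
        for (int token : tokens) {
            if (token > 0) {
                open.push(token);                 // remember the open element
            } else if (open.isEmpty() || open.pop() != -token) {
                return false;                     // end tag does not match the stack top
            }
        }
        return open.isEmpty();                    // every start tag must have been closed
    }

    public static void main(String[] args) {
        // invoice=1, lineitem=2, item=4, name=6, qty=7, price=8
        int[] invoice = {1, 2, 4, 6, -6, -4, 7, -7, 8, -8, -2, -1};
        System.out.println(wellNested(invoice));
    }
}
```

Because element names were already replaced by ids in the symbol table, each end tag costs one integer comparison rather than a string comparison.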


XOE Operating Mode 1

● XML from/to wire (web services)
– Inline mode
– SOAP / JAX-RPC
– Requires TCP termination
– HTTP termination (proxy)
– Possibly SSL

[Figure: Host and XOE exchange TXF over PCI-E; the XOE exchanges XML with the network (NET)]


XOE Operating Mode 2

● XML from/to host CPU
– Coprocessor mode
– Local file-system or generated by CPU
– XML document processing (JAXP, JAXB, XSLT etc)

[Figure: Host and XOE exchange both XML and TXF over PCI-E]


Data Transfer Performance

● PCI-Express interconnect
● Bulk / throughput transfers
● Avoid latency for short messages
● Buffering required
● Sharing between multiple streams
– Saving parser state in firmware

– Stopping at consistent points


Tokenized XML Format

● Close to binary format
● Used to pass data between host & XOE
● XML assumed to be valid & well-formed
● Cheaper to process than plain XML (> 2X)
● Compression of element and attr names
● Already converted to Java Unicode
● Fixed length strings


XML Document

<invoice>
  <lineitem lineid="1">
    <item sku="1234">
      <name> Widget </name>
    </item>
    <qty> 10 </qty>
    <price currency="USD"> 100 </price>
  </lineitem>
</invoice>


TXF Symbol Table

Symbol Id   Symbol
1           invoice
2           lineitem
3           lineid
4           item
5           sku
6           name
7           qty
8           price
9           currency
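
A hash-based symbol table that hands out ids in first-seen order yields exactly this numbering. The sketch below is a plain software model with hypothetical names (`SymbolTableDemo`, `intern`, `nameOf`); it says nothing about how the XOE hardware lays out its table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical software model of the TXF symbol table.
class SymbolTableDemo {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    // Returns the id already assigned to the name, or assigns the next id (starting at 1).
    int intern(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            names.add(name);
            id = names.size();
            ids.put(name, id);
        }
        return id;
    }

    // Reverse lookup, as needed by the host when expanding ids back into names.
    String nameOf(int id) {
        return names.get(id - 1);
    }

    int size() {
        return names.size();
    }

    public static void main(String[] args) {
        SymbolTableDemo table = new SymbolTableDemo();
        String[] inDocumentOrder = {
            "invoice", "lineitem", "lineid", "item", "sku", "name", "qty", "price", "currency"
        };
        for (String name : inDocumentOrder) {
            System.out.println(table.intern(name) + " " + name);
        }
    }
}
```

Repeated names, such as the `lineitem` elements of a long invoice, map to the same id, which is where the compression of element and attribute names comes from.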


TXF Document

Code              Length  Value
1 (StartElement)  -       1 invoice
1                 -       2 lineitem
3 (AttrName)      -       3 lineid
4 (AttrValue)     1       1
1                 -       4 item
3                 -       5 sku
4                 4       1234
1                 -       6 name
5 (ElemContent)   6       Widget
2 (EndElement)            /name
2                         /item
1                 -       7 qty
5                 2       10
2                         /qty
1                 -       8 price
3                 -       9 currency
4                 3       USD
5                 3       100
2                         /price
2                         /lineitem
2                         /invoice


XML Security

● Different from transport level (SSL)
● Encrypting parts of SOAP messages
● Digital signatures
● Requires Base 64 encoding


Example Encrypted Document

<invoice>
  <lineitem lineid="1">
    <item sku="1234">
      <name> Widget </name>
    </item>
    <qty> 10 </qty>
    <cipherData> hI0nMM9QhInpIe6D35cMW1prhJ </cipherData>
  </lineitem>
</invoice>


XWSS in Development

● Current implementation (Apache-based) uses non-standard APIs

● Future JSRs
– JSR 105 – XML Signatures

– JSR 106 – XML Encryption

● Very poor performance (10X slower)
● Requires iteration through XML parsing and crypto engine


XWSS Opportunity

● Co-location of:
– Crypto Accelerator

– XML Parser

● Combination has major advantages
– All processing happens in XOE

– No round-trips to host

– Transparent to host and application software

– Huge performance gains


XML Security Details

● XML hardware tokenizes XML
● Does Base 64 decoding
● Firmware recognizes cipherData tag
● Calls crypto engine with encrypted data
● XML produced fed back to XML hardware
● Resulting tokenized XML combined with original document
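
The first steps of this flow (find the `cipherData` element, Base64-decode its content) can be sketched on the host side with StAX and `java.util.Base64`. The class and method names are hypothetical, the simplified `cipherData` tag is the one used in the example document, and decryption itself is left out: the returned bytes are what would be handed to the crypto engine.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: locate cipherData and Base64-decode it; no decryption is performed.
class CipherDataDemo {
    // Returns the decoded bytes of the first cipherData element, or null if there is none.
    static byte[] extractCipherBytes(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("cipherData")) {
                    return Base64.getDecoder().decode(reader.getElementText().trim());
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in payload: real cipherData would hold encrypted bytes, not readable XML.
        String payload = "<price currency=\"USD\"> 100 </price>";
        String encoded = Base64.getEncoder()
            .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
        String xml = "<invoice><lineitem lineid=\"1\"><qty> 10 </qty><cipherData> "
            + encoded + " </cipherData></lineitem></invoice>";
        System.out.println(new String(extractCipherBytes(xml), StandardCharsets.UTF_8));
    }
}
```

In host software each of these steps is a separate pass over the data; co-locating the parser and the crypto engine is what lets the XOE avoid the round-trips.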


XML Parser Architecture

[Figure: the parser pipeline (XML Document, UTF Decoder, Character Validator, Element Scanner, Parser, Schema Validator, API Implementor, then SAX, DOM, StAX) extended with Base64 Decoding and Decryption stages]


Summary

● XML processing is expensive on general purpose CPUs.

● Using an XML Offload Engine
– Many basic XML parsing functions accelerated by special-purpose hardware.
– Generic, pre-processed, tokenized XML document produced by firmware.
– Specific APIs implemented by host software

● Combining with crypto accelerator produces major benefits in XML security