ISO/IEC JTC 1/SC 32 N 1761 Date: 2008-05-25

REPLACES: --

ISO/IEC JTC 1/SC 32

Data Management and Interchange

Secretariat: United States of America (ANSI)

Administered by Farance Inc. on behalf of ANSI

DOCUMENT TYPE: Information from JTC1 Secretariat

TITLE: Efficient Binary representation of XML - Presentation

SOURCE: JTC1 Secretariat

PROJECT NUMBER: 1.32.

STATUS: In accordance with JTC 1 Gold Coast resolution 45, the attached document, presented at the Technology Watch meeting on the Gold Coast, is forwarded to SC 6, SC 29, SC 32 and SC 34 to assess potential opportunities.

REFERENCES:

ACTION ID: ACT

REQUESTED ACTION:

DUE DATE:

Number of Pages: 27

LANGUAGE USED: English

DISTRIBUTION: P & L Members, SC Chair, WG Conveners and Secretaries

Dr. Timothy Schoechle, Secretary, ISO/IEC JTC 1/SC 32

Farance Inc *, 3066 Sixth Street, Boulder, CO, United States of America

Telephone: +1 303-443-5490; E-mail: [email protected]

Available from the JTC 1/SC 32 WebSite http://www.jtc1sc32.org/

*Farance Inc. administers the ISO/IEC JTC 1/SC 32 Secretariat on behalf of ANSI

ISO/IEC JTC 1 SWG for Technology Watch Secretariat: US (ANSI)

ISO/IEC JTC1 TW0043 2008-03-14

Document Type: Presentation

Document Title: Efficient Binary representation of XML

Document Source: Dr. Raymond Wong, Australia, National ICT Australia (NICTA)

Project Number:

Document Status: Final

Action ID: ACT or FYI

Due Date:

Distribution: TWG and JTC 1

No. of Pages: 25

Note:

The imagination driving Australia’s ICT future

Efficient Binary Representation of XML

Raymond Wong and Bill Shui

mContext Project

National ICT Australia

[email protected]

[email protected]

Motivations: XML everywhere

Even if you have a large flash memory card

• The runtime footprint will be huge.

• e.g., runtime footprint = 10 x original storage size = 50 x compressed document size

So the size of the memory footprint is critical.

[Figure: decompression and runtime footprint]
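
The rule of thumb on this slide can be worked through numerically. The sketch below only restates the two multipliers given above (10 x and 50 x) and derives the compression ratio they imply. The 20 MB input is an arbitrary, hypothetical example size, not a figure from the presentation.

```python
# Rule-of-thumb multipliers taken from the slide above.
RUNTIME_PER_ORIGINAL = 10    # runtime footprint = 10 x original XML size
RUNTIME_PER_COMPRESSED = 50  # runtime footprint = 50 x compressed size


def footprint_estimate(compressed_mb: float) -> dict:
    """Estimate sizes in MB, starting from the compressed document size."""
    runtime = compressed_mb * RUNTIME_PER_COMPRESSED
    original = runtime / RUNTIME_PER_ORIGINAL
    return {
        "compressed_mb": compressed_mb,
        "original_mb": original,
        "runtime_mb": runtime,
        "implied_compression_ratio": original / compressed_mb,
    }


# Hypothetical example: a 20 MB compressed document on a memory card.
estimate = footprint_estimate(20)
print(estimate)
```

Under these multipliers the storage saved by compression is small compared with the memory needed once the document is loaded, which is the point the slide makes.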

Problem of simply compressing XML

• When reading the compressed data

– Need decompression

– Need space for (compressed + decompressed) data

[Figure: compression + decompression]
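
A minimal sketch of the problem, assuming gzip as the compressor and Python's standard `xml.etree.ElementTree` as the parser (neither is named on this slide). The sample catalog document is invented for illustration.

```python
import gzip
import xml.etree.ElementTree as ET

# Build a small, repetitive XML document (hypothetical sample data).
items = "".join(f"<item id='{i}'><name>item {i}</name></item>" for i in range(1000))
xml_bytes = f"<catalog>{items}</catalog>".encode("utf-8")

compressed = gzip.compress(xml_bytes)        # what would sit on the storage card
decompressed = gzip.decompress(compressed)   # needed before any parsing can start

# While the document is being read, both buffers are alive at the same time.
peak_bytes = len(compressed) + len(decompressed)

# The parsed tree then costs additional memory on top of both buffers.
root = ET.fromstring(decompressed)
print(len(compressed), len(decompressed), peak_bytes, len(root))
```

A streaming decompressor can reduce the size of the intermediate buffer, but the decompression work itself cannot be avoided before the data is readable.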

Solutions

• Binary XML to improve processing and space efficiency

• Benefits:

– Preserves the existing XML effort in managing and storing information.

– Prevents branching into multiple alternative formats.

Binary XML: Property Demand

Minimal requirements

• MUST SUPPORT

– Directly Readable and Writable

– Transport Independence

– Compactness

– Human Language Neutral

– Platform Neutrality

– Integratable into XML Stack

– Royalty Free

– Fragmentable

– Streamable

– Roundtrip Support

– Generality

– Schema Extensions and Deviations

– Format Version

– Identifier

– Content Type Management

– Self Contained

• MUST NOT PREVENT

– Processing Efficiency

– Small Footprint

– Widespread Adoption

– Space Efficiency

– Implementation Cost

– Forward Compatibility

Existing proposals

• Efficient XML (EXI), a proposed W3C standard for binary XML

• ASN.1 X.694 with BER (Basic Encoding Rules)

• ASN.1 X.694 with PER (Packed Encoding Rules)

• XML + gzip

• Fast Infoset (Sun Microsystems)

• FXDI (Fujitsu Binary)

• Xebu

• ASN.1 X.694 with PER + Fast Infoset

• Efficiency Structured XML (esXML)

• BiM (from MPEG-7)

• WBXML (Wireless Binary XML or WAP Binary XML)

Meeting minimal requirements

Problems when XML data are edited

• Higher CPU usage for re-packaging and compressing the entire dataset.

• More runtime space usage:

– Runtime storage required = old version + newly compressed version.

• None of the proposed standards supports efficient update operations.
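
The cost of an edit can be sketched as follows, again assuming gzip-compressed XML and Python's standard library. The `update_price` helper and the catalog data are hypothetical and are not part of any proposal listed in this presentation.

```python
import gzip
import xml.etree.ElementTree as ET


def update_price(compressed_doc: bytes, item_id: str, new_price: str) -> bytes:
    """Change one text value in a gzip-compressed XML document."""
    xml_bytes = gzip.decompress(compressed_doc)        # 1. decompress everything
    root = ET.fromstring(xml_bytes)                    # 2. parse everything
    for item in root.iter("item"):
        if item.get("id") == item_id:
            item.find("price").text = new_price        # 3. the actual edit
    new_xml = ET.tostring(root, encoding="utf-8")      # 4. serialise everything
    return gzip.compress(new_xml)                      # 5. recompress everything


items = "".join(f"<item id='{i}'><price>1.00</price></item>" for i in range(500))
old_version = gzip.compress(f"<catalog>{items}</catalog>".encode("utf-8"))
new_version = update_price(old_version, "42", "9.99")

# Until the old version is discarded, both versions occupy storage.
storage_during_update = len(old_version) + len(new_version)
print(len(old_version), len(new_version), storage_during_update)
```

Only step 3 is the edit itself. Steps 1, 2, 4 and 5 touch the whole dataset, however small the change.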

mContext binary XML

• Meets both the MUST SUPPORT and the MUST NOT PREVENT requirements.

• Works with and without schema information.

• Small and constant runtime footprint.

• It is not a compressed format, so CPU usage is lower.

• Fast update, navigation and access of XML nodes regardless of size.

• Can map directly to existing SAX and DOM interfaces.

• Already able to link to MSXML and Xerces.
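
To illustrate what mapping a binary representation to SAX can look like, the sketch below replays a pre-tokenised document into a standard Python `xml.sax` `ContentHandler`. The token layout (`NAMES`, `TOKENS`, `replay`) is invented for this example and is an assumption throughout. It is not the mContext format or API, which this presentation does not describe.

```python
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl

# Hypothetical pre-tokenised document: a name table plus a flat token list.
# This layout is invented for illustration; it is not the mContext format.
NAMES = ["order", "item"]
START, END, TEXT = 0, 1, 2
TOKENS = [
    (START, 0, {"id": "7"}),
    (START, 1, {}),
    (TEXT, "widget"),
    (END, 1),
    (END, 0),
]


def replay(tokens, names, handler: ContentHandler) -> None:
    """Drive a standard SAX handler from tokens, with no text parsing."""
    handler.startDocument()
    for token in tokens:
        kind = token[0]
        if kind == START:
            handler.startElement(names[token[1]], AttributesImpl(token[2]))
        elif kind == END:
            handler.endElement(names[token[1]])
        else:
            handler.characters(token[1])
    handler.endDocument()


class Recorder(ContentHandler):
    """Ordinary SAX application code, unaware of the binary source."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


recorder = Recorder()
replay(TOKENS, NAMES, recorder)
print(recorder.events)
```

Because the handler sees the usual SAX callbacks, existing SAX-based application code would not need to change, whatever the underlying storage looks like.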

mContext binary XML

• Compatible with existing algorithms for efficient XPath, XQuery and XSLT processing.

• Extensible for third party text compression schemes.

• No more XML parsing.

• Tested up to 16GB of XML data, more than 770 million nodes.

• API available in C/C++, Java and C#.

• Works on mobile devices, desktop and server environments.

Summary

• The data size of XML documents will increase.

• Binary XML is needed to secure the extensive usage of XML on large and small computing devices.

• mContext Binary XML:

– Satisfies the requirements of the existing standards group on binary XML formats.

– Enables fast update, navigation and random access of XML data.

– Has a succinct structure without compression, giving better utilisation of processing resources.

– Provides an API ready to adapt to existing XML-based infrastructures.

The End

For further information, please contact us at

[email protected]

[email protected]

mContext Succinct Binary XML

• Published at the World Wide Web Conference 2007.

– http://www2007.org/htmlpapers/paper794/

Data Structure
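
The figure for this slide is not available here. As a generic illustration of a succinct tree representation (and only that), the sketch below encodes the shape of an XML tree as a balanced-parentheses sequence and navigates it without pointers. This is an assumption about the general family of techniques, not a description of the actual mContext layout. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Return (parens, tags): tree shape as '(' / ')' plus tags in document order."""
    parens, tags = [], []

    def visit(node):
        parens.append("(")
        tags.append(node.tag)
        for child in node:
            visit(child)
        parens.append(")")

    visit(root)
    return "".join(parens), tags


def find_close(parens, pos):
    """Index of the ')' matching the '(' at pos (linear scan for clarity)."""
    depth = 0
    for i in range(pos, len(parens)):
        depth += 1 if parens[i] == "(" else -1
        if depth == 0:
            return i
    raise ValueError("unbalanced sequence")


def first_child(parens, pos):
    """Position of the first child of the node at pos, or None for a leaf."""
    return pos + 1 if parens[pos + 1] == "(" else None


def next_sibling(parens, pos):
    """Position of the next sibling of the node at pos, or None."""
    nxt = find_close(parens, pos) + 1
    return nxt if nxt < len(parens) and parens[nxt] == "(" else None


def tag_at(parens, tags, pos):
    """Tag of the node at pos: count the '(' up to and including pos."""
    return tags[parens.count("(", 0, pos + 1) - 1]


doc = ET.fromstring("<a><b><c/></b><d/></a>")
parens, tags = encode(doc)
print(parens, tags)
```

Each node costs two symbols (two bits in a real bit vector) for its structure. Practical implementations add small auxiliary indexes so that `find_close` and the counting in `tag_at` do not need linear scans.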

An Example Result

100M Data (public domain)     Commercial software lib (MSXML)     mContext

Memory footprint              329MB                               67MB

Loading time                  17.8s                               0.67s

Runtime footprint (search)    333MB                               67MB

Processing time (search)      1.814s                              0.143s
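
The relative differences in this table can be derived directly from the reported figures. The short sketch below does only that arithmetic; it adds no measurements of its own.

```python
# Figures copied from the table above: (MSXML, mContext).
results = {
    "memory footprint (MB)": (329, 67),
    "loading time (s)": (17.8, 0.67),
    "runtime footprint, search (MB)": (333, 67),
    "processing time, search (s)": (1.814, 0.143),
}

ratios = {name: msxml / mcontext for name, (msxml, mcontext) in results.items()}
for name, ratio in ratios.items():
    print(f"{name}: MSXML / mContext = {ratio:.1f}x")
```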

ASN.1 X.694

• BER

– Uses ASN.1 with BER for encoding.

– Uses X.694 for mapping XSD to ASN.1.

– Advantage:

• Binary tokens and binary text give a smaller size than XML.

– Disadvantage:

• Requires schema information for the encoding.

• PER

– Same as above, but with a higher compression ratio.

– However, it still suffers from the same disadvantage.
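
A minimal sketch of BER's tag-length-value encoding, to make the trade-off concrete. The ASN.1 type assumed here, `Person ::= SEQUENCE { name UTF8String, age INTEGER }`, is a hand-written mapping chosen for illustration. It is not the output of an X.694 tool, and the helper functions are hypothetical and cover only the small subset of BER shown.

```python
def ber_length(n: int) -> bytes:
    """BER definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """One tag-length-value triple with a single-octet tag."""
    return bytes([tag]) + ber_length(len(value)) + value


def ber_integer(value: int) -> bytes:
    """Universal INTEGER (tag 0x02); this sketch handles non-negative values only."""
    if value < 0:
        raise ValueError("negative integers are not handled in this sketch")
    size = value.bit_length() // 8 + 1   # leaves room for a zero sign bit
    return tlv(0x02, value.to_bytes(size, "big"))


def ber_utf8(text: str) -> bytes:
    """Universal UTF8String (tag 0x0C)."""
    return tlv(0x0C, text.encode("utf-8"))


def ber_sequence(*members: bytes) -> bytes:
    """Universal constructed SEQUENCE (tag 0x30)."""
    return tlv(0x30, b"".join(members))


xml_text = "<person><name>Ann</name><age>30</age></person>"
encoded = ber_sequence(ber_utf8("Ann"), ber_integer(30))
print(len(xml_text), len(encoded), encoded.hex())
```

The encoded form carries no element names, which is where the size saving comes from. It is also why a reader needs the schema: nothing in the bytes says that the first member is `name` and the second is `age`.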

Fast Infoset

• Failed the compactness test.

• Performs badly without knowledge of the schema.

Fujitsu XML Data Interchange (FXDI)

• Fails the W3C version info test.

• It is also heavily dependent on knowledge of schema information.

• High compression is achieved when schema information is provided.

• Uses separate encoding and decoding APIs for reading and writing binary XML. However, it performs much worse in processing time when a schema is used.

Xebu

• Splits into Xebu and Xebu-S.

• http://www.w3.org/XML/EXI/eval/xebu-evaluation.html

• Fails the compactness test.

• Designed mainly for mobile phones. Not well tested on larger systems.

• Not totally self-contained.

Efficiency Structured XML (esXML)

• Fails the compactness test.

• Uses pointer based layers in its info-set.

XML in Enterprise Systems

Effective service-oriented architecture needs efficient XML handling

Almost all documents in XML format

XML on Mobile Devices

Pure games without deep story

Responsiveness & scalability of interactive mobile applications