A Comparison of Compression Techniques for XML-based Security Policies in Mobile Computing Environments
Xuebing Qing
Carlisle Adams
Agenda
Why Compress?
Criteria for Compression Algorithms
Gzip and Bzip
wbXML with/without Transcode
ASN.1
Combinations
– wbXML + Zip
– ASN.1 + Zip
Recent XML Compression Proposals
Conclusions and Future Directions
[Figure: network diagram with labels Foreign Domain, Internet, Home Domain, Firewall (twice), PDA, Policy Deployment/Update, Mutual Authorization Request/Response]
Why Compress?
For high interoperability between domains, XML (XACML) is a good choice for policy representation
On-device authorization decision rendering and simple policy deployment/updating are also required.
XML is too verbose and heavy for many mobile devices:
– Limited bandwidth
– Limited CPU power, RAM
– Limited battery, flash memory, etc.
Evaluating Compression Algorithms
Criterion 1: High Compression Ratio
Criterion 2: Low Processing Overhead
Criterion 3: No Semantic Ambiguity
“Nice to have”: 3rd Party API Support
We consider the most popular compression algorithms, as well as their combinations:
– Gzip and Bzip
– wbXML
– ASN.1
– wbXML with transcode + Gzip or Bzip
– ASN.1 + Gzip or Bzip
None of them introduces semantic ambiguity, and all have good 3rd party API support.
The ideal algorithm should achieve the highest compression rate while keeping decompression overhead to a minimum.
Experimental Setting
Written in Java, tested under JSDK 1.4.2 / Windows 2000 / 866 MHz CPU and 512 MB RAM
Runtime Memory Profiling: Eclipse Hyades Plug-in
Java APIs Used
– wbXML: kXML 1.1 (Open Source)
– ASN.1: Pure Java API by OSS Nokalva (Alpha Version – Trial version)
– Gzip: The gzip implementation in JDK 1.4.2
– Bzip: Apache BZip2 Implementation
Test Cases: 9 XACML files (2KB ~ 1 MB) created from the XACML (version 1.1) Conformance Test Suite
Gzip and Bzip2: Compression Rate
[Figure: GZip/BZip2 Compression Rate; y-axis: Compression Rate (%), 0 to 40; series: Gzip, BZip2]
Very good compression rate (especially when size > 70K)
Compression_rate_gzip is better than Compression_rate_bzip2 when size <= 70K, while Compression_rate_bzip2 is better than Compression_rate_gzip when size > 70K
Bzip2 performs extremely well when size >= 250K.
Zip algorithms work better with large files, yet they still compress small files (2K) to 1/3 of the original size.
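A minimal sketch of how such a compression rate could be measured. It uses Python's standard gzip and bz2 modules rather than the Java APIs used in the study. It assumes the rate is the compressed size as a percentage of the original size, which is how the slide's "1/3 of the original size" remark reads. The sample policy is a hypothetical stand-in, not one of the nine XACML test files.

```python
import bz2
import gzip


def compression_rate(original: bytes, compressed: bytes) -> float:
    """Compressed size as a percentage of the original size (lower is better)."""
    return 100.0 * len(compressed) / len(original)


# Hypothetical stand-in for an XACML policy: a repetitive, tag-heavy document.
rule = '<Rule RuleId="r{0}" Effect="Permit"><Target><Subjects><AnySubject/></Subjects></Target></Rule>'
policy = ("<Policy>" + "".join(rule.format(i) for i in range(200)) + "</Policy>").encode("utf-8")

gzip_rate = compression_rate(policy, gzip.compress(policy))
bzip2_rate = compression_rate(policy, bz2.compress(policy))
print(f"original: {len(policy)} bytes, gzip: {gzip_rate:.1f}%, bzip2: {bzip2_rate:.1f}%")
```

Which compressor wins on a given input depends on its size and content, so the printed figures should not be read as a reproduction of the results above.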
Gzip and Bzip2: Processing Overhead - Time
[Figure: GZip/BZip2 Decompression Overhead - Time; x-axis: File 1 to File 9; y-axis: Time (millisecond), 0 to 700; series: wbXML, Gzip, BZip2]
Only decompression time is considered, because XACML policies are compressed only on the server side, when policies are deployed.
Absolute decompression time alone is not enough for the evaluation.
The wbXML-to-XML conversion mainly involves XML tag replacement and is not CPU intensive, so it can be performed on a device; the time of the conversion can therefore be used as a reference to make a fairly realistic evaluation.
Gzip performs the best; BZip2 is similar to wbXML conversion.
Considering that the kXML 1.1 API has significant room for optimization, it appears that wbXML conversion may ultimately have a time overhead similar to Gzip's and hence may be acceptable on a mobile device.
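A rough sketch of a decompression-time measurement, again with Python's standard library rather than the Java setup described in the experimental setting. The helper name `time_decompress`, the repeat count and the sample input are illustrative assumptions. Absolute numbers on a desktop say little about a mobile device, which is why the slides compare against a reference conversion instead.

```python
import bz2
import gzip
import time


def time_decompress(decompress, blob: bytes, repeats: int = 20) -> float:
    """Average wall-clock milliseconds for one decompression call."""
    start = time.perf_counter()
    for _ in range(repeats):
        decompress(blob)
    return 1000.0 * (time.perf_counter() - start) / repeats


# Hypothetical test input, not one of the nine XACML files.
data = b"<Policy>" + b'<Rule Effect="Permit"><Target/></Rule>' * 5000 + b"</Policy>"
gz_blob, bz_blob = gzip.compress(data), bz2.compress(data)

gzip_ms = time_decompress(gzip.decompress, gz_blob)
bzip2_ms = time_decompress(bz2.decompress, bz_blob)
print(f"gzip: {gzip_ms:.3f} ms, bzip2: {bzip2_ms:.3f} ms")
```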
Gzip and Bzip2: Memory Overhead – Raw Data
Numbers in brackets are memory increments; numbers in red (the negative increments) mean that memory in use decreased as file size increased, which is caused by garbage collection.
Memory overhead of wbXML-to-XML is used as a reference for the estimate.
Size_memory = Size_memory_in_use + Size_memory_gced. So the memory used by File 8 is not 1,857,623 bytes (memory in use) but 3,087,933 bytes, which includes the memory garbage-collected during the process.
To analyze, we split memory into two parts: base runtime memory for the decompression API and the program itself, and decompression memory for the representation and computation of data at runtime.
Base memory is estimated by comparing the absolute memory size with that of the wbXML-to-XML conversion.
The memory size increment factor is used to estimate decompression memory.
File     Size (bytes)   Gzip memory [increment] (bytes)   Bzip2 memory [increment] (bytes)   wbXML with transcode [increment] (bytes)
File 1   2,167          913,770                           18,699,694                         1,221,647
File 2   4,798          922,000 [8,230]                   18,707,972 [8,278]                 1,272,590 [50,943]
File 3   9,479          938,566 [16,566]                  18,851,890 [143,918]               1,372,148 [99,448]
File 4   23,976         974,080 [35,514]                  18,759,269 [-92,621] (2)           1,803,052 [430,904]
File 5   70,186         1,175,590 [201,510]               18,957,045 [197,776]               1,241,162 [-561,890] (4)
File 6   140,071        1,450,050 [274,460]               19,374,474 [417,429]               1,106,431 [-134,730] (4)
File 7   278,623        1,996,007 [545,957]               19,752,229 [377,755]               1,131,434 [25,003] (4)
File 8   556,395        1,857,623 [-138,384] (1)          20,802,929 [1,050,700]             1,385,234 [253,800] (4)
File 9   1,111,939      3,482,445 [1,624,822]             8,916,388 [-11,886,541] (3)        742,690 [-642,544] (4)
Gzip and Bzip2: Memory Overhead – Result
[Figure: Mem Size Increment Factor (range per algorithm); Gzip: 2.5 to 4.5, Bzip: 3.15 to 30.7, wbXML: 19.4 to 29.7]
The memory size increment factor measures the memory increment caused by a data size increment: memory increment / file size increment.
The bigger the memory size increment factor, the more memory is used for data decompression and the more frequent garbage collection will be.
It is a range of possible values rather than one fixed value.
Result: Gzip has a very small footprint when decompressing XACML data; its processing memory overhead is reasonable and acceptable.
However, a zipped XACML policy has to be unpacked into XML and then processed. The processing overhead of Gzip is OH_gzip = OH_gzip_decompression + OH_xml_processing
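The increment factor can be worked through on the Gzip figures from the raw-data slide. The sketch below is in Python and is only one plausible reading of the definition: it compares consecutive files and, as an assumption, skips pairs whose memory increment is negative, because those readings are attributed to garbage collection.

```python
# File sizes and Gzip memory-in-use figures (bytes) copied from the raw-data table.
file_sizes = [2167, 4798, 9479, 23976, 70186, 140071, 278623, 556395, 1111939]
gzip_memory = [913770, 922000, 938566, 974080, 1175590, 1450050, 1996007, 1857623, 3482445]


def increment_factors(sizes, memory):
    """Memory increment / file size increment for each pair of consecutive files.

    Pairs with a negative memory increment are skipped, on the assumption that
    garbage collection ran in between and the reading is not meaningful.
    """
    factors = []
    for i in range(1, len(sizes)):
        mem_delta = memory[i] - memory[i - 1]
        size_delta = sizes[i] - sizes[i - 1]
        if mem_delta >= 0:
            factors.append(mem_delta / size_delta)
    return factors


gzip_factors = increment_factors(file_sizes, gzip_memory)
print(f"Gzip factor range: {min(gzip_factors):.2f} to {max(gzip_factors):.2f}")
```

On the Gzip column this gives factors between roughly 2.4 and 4.4, which illustrates why the factor is reported as a range rather than a single value.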
[Figure: Base Mem; y-axis: Percentage, 0 to 2500; Gzip: 100, BZip2: 2046, wbXML: 134]
wbXML: Overview
Part of the presentation logic in WAP
Uses a token dictionary, where each token (transcode) maps to a predefined string (mainly element tags and attribute tags).
wbXML without transcode: no explicit token dictionary specified (otherwise, wbXML with transcode).
Code segments used to generate transcode in kXML 1.1
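The token-dictionary idea can be illustrated with a toy substitution in Python. This is not the WBXML wire format and not the kXML 1.1 code referred to above. The table `TOKENS`, the byte values and the `encode`/`decode` helpers are all hypothetical, and the sketch assumes ASCII input and replaces element names only.

```python
import re

# Hypothetical token table: element name -> one-byte token.
# Bytes >= 0x80 never occur in ASCII XML, so the substitution is reversible.
TOKENS = {b"Policy": b"\x80", b"Rule": b"\x81", b"Target": b"\x82", b"Subjects": b"\x83"}
NAMES = {token: name for name, token in TOKENS.items()}

TAG_NAME = re.compile(rb"(</?)([A-Za-z_][A-Za-z0-9_.-]*)")
TAG_TOKEN = re.compile(rb"(</?)([\x80-\xff])")


def encode(xml: bytes) -> bytes:
    """Replace known element names with their tokens; unknown names pass through."""
    xml.decode("ascii")  # raises UnicodeDecodeError if the input is not pure ASCII
    return TAG_NAME.sub(lambda m: m.group(1) + TOKENS.get(m.group(2), m.group(2)), xml)


def decode(data: bytes) -> bytes:
    """Expand tokens back into element names."""
    return TAG_TOKEN.sub(lambda m: m.group(1) + NAMES.get(m.group(2), m.group(2)), data)


sample = b'<Policy><Rule Effect="Permit"><Target><Subjects/></Target></Rule></Policy>'
encoded = encode(sample)
print(len(sample), len(encoded))
```

Names missing from the table are left as text, which loosely mirrors the with/without transcode distinction: the more names the table covers, the smaller the output.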
wbXML: Compression Rate
[Figure: chart titled "GZip/BZip2 Compression Rate"; y-axis: Compression Rate (%), 0 to 100; series: without Transcode, with Transcode, Gzip]
wbXML with transcode reduces size to under 50% of the original, which is much better than wbXML without transcode.
Still not as good as Gzip, particularly when the file size is over 5 KB.
However, an XACML policy in wbXML can be processed directly by a wbXML parser without any decompression overhead.
We only discuss the processing overhead for wbXML with transcode.
wbXML: Analysis of Processing Overhead
There is no time and memory overhead for decompression.
However, it is impractical to measure and compare the CPU time and memory used when evaluating an XACML policy in wbXML form and in XML form.
We do the following analysis rather than experiments:
– Footprint_wbxml_obj < Footprint_xml_obj: since a wbXML file is 50% of its original XACML size, it is reasonable to assume that a wbXML object is approximately half the size of its XML counterpart.
– A smaller runtime representation certainly enables faster processing, but the overhead of transcode-table lookup at runtime needs to be considered.
– We can assume Processing_Time_wbxml <= Processing_Time_xml
– Evaluating an XACML policy in wbXML is less battery intensive because its in-memory representation is much smaller than its XML counterpart.
– Result: OH_wbxml = α × OH_xml_processing, where α < 1; it is smaller than OH_gzip = OH_gzip_decompression + OH_xml_processing
ASN.1: Schema Based XML Encoding
A schema-based binary encoding spec, X.694 “Mapping W3C XML Schema Definitions into ASN.1”, is under development.
The spec brings ASN.1, a binary, schema-based language, into the XML world, which is XML-Schema based.
With the specification, an XML document can be converted into ASN.1, which is then encoded with one of ASN.1’s binary encoding rules, such as PER, DER, CER, or BER.
Theoretically, ASN.1 with PER, the most compact encoding rule, can achieve the same level of compression rate as Gzip [4].
However, the Pure Java API by OSS Nokalva only offers a compression rate slightly better than wbXML's, partly because the API is still in its Alpha stage; several hot fixes were sent during the experiments in this research.
ASN.1 Encoding: Compression Rate
[Figure: chart titled "GZip/BZip2 Compression Rate"; y-axis: Compression Rate (%), 0 to 60; series: ASN.1, with Transcode, Gzip]
Slightly better than wbXML with transcode, but not comparable to Gzip.
The result differs from the one reported for Fast Web Services (FWS) [7]; this might be caused by the difference in the APIs used and/or by the different characteristics of XACML files and the Web services XML files used in FWS.
ASN.1 Encoding: Analysis of Processing Overhead
There is no need to convert an ASN.1-encoded policy to XACML for processing, because ASN.1 is a schema language and supports operations similar to XML's.
As with wbXML, we do analysis rather than experiments. The analysis is similar to the one for wbXML.
Result: OH_ASN.1 = α × OH_xml_processing, where α < 1; it is smaller than OH_gzip = OH_gzip_decompression + OH_xml_processing
According to Sun’s experimental results on FWS, α could be as small as 0.1 in a Web services environment (although no such result has been achieved in our experiments).
Combine wbXML or ASN.1 with Gzip
Gzip, wbXML and ASN.1 do not perform well enough to satisfy the criteria on their own.
Pure Gzip has more processing overhead than wbXML and ASN.1, while wbXML and ASN.1 do not compress as well as Gzip.
It makes sense to combine them:
– wbXML with transcode + Gzip
– ASN.1 with transcode + Gzip
– Other combinations are not as good as the above (wbXML with transcode is better than wbXML without transcode, and Bzip2 consumes much more memory and CPU time than Gzip for decompression).
The Combinations: Compression Rate
[Figure: chart titled "GZip/BZip2 Compression Rate"; y-axis: Compression Rate (%), 0 to 35; series: ASN.1 + Gzip, wbXML + Gzip, Gzip]
Much better than pure ASN.1 and wbXML
Even better than pure Gzip
It is interesting that the overall compression rate of wbXML + Gzip for XACML files over 100KB is better than that of ASN.1 + Gzip.
The Combinations: Analysis of Processing Overhead
For wbXML with transcode + Gzip: OH_wbxml_Gzip = OH_gzip_decompression + α × OH_xml_processing (α < 1)
For ASN.1 + Gzip: OH_ASN.1_Gzip = OH_gzip_decompression + α × OH_xml_processing
Just for reference:
– Gzip: OH_gzip = OH_gzip_decompression + OH_xml_processing
– wbXML: OH_wbxml = α × OH_xml_processing
OH_wbxml_Gzip is definitely better than OH_gzip because an XACML file is only decompressed once but processed many times.
Although OH_wbxml_Gzip is greater than OH_wbxml, the difference can be ignored, because OH_gzip_decompression is small and decompression only happens the first time the policy is downloaded and when the policy is updated.
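The "decompressed once, processed many times" argument can be made concrete with a small cost model in Python. Extending the formulas to N evaluations is an assumption of this sketch, and every number in it (D, P, ALPHA, N) is hypothetical, chosen only to show the shape of the comparison.

```python
def total_overhead(decompress_cost, processing_factor, xml_processing_cost, evaluations):
    """One-time decompression plus per-evaluation processing, in arbitrary cost units."""
    return decompress_cost + evaluations * processing_factor * xml_processing_cost


# All numbers below are hypothetical.
D = 10.0      # Gzip decompression overhead, paid once per download or update
P = 5.0       # XML processing overhead, paid on every policy evaluation
ALPHA = 0.5   # assumed wbXML processing factor (< 1)
N = 100       # policy evaluations between updates

oh_gzip = total_overhead(D, 1.0, P, N)          # decompress, then process plain XML
oh_wbxml = total_overhead(0.0, ALPHA, P, N)     # no decompression step
oh_wbxml_gzip = total_overhead(D, ALPHA, P, N)  # decompress once, then process wbXML
print(oh_gzip, oh_wbxml, oh_wbxml_gzip)
```

With these made-up values the combined scheme costs only the one-time decompression more than pure wbXML, and that gap stays constant as N grows while the gap to pure Gzip widens.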
Conclusion: wbXML + Gzip is better than ASN.1 + Gzip:
– Tag names in XACML are long; simple replacement (wbXML) achieves a good compression rate.
– Replacement (wbXML) creates less overhead than complex encoding (ASN.1)
– ASN.1 does not achieve the excellent compression rate expected (when publicly available APIs are used).
– Good open source wbXML APIs are available.
Recent XML Compression Proposals (1): XOP/MTOM
XOP: XML-binary Optimized Packaging
– an XML serialization protocol, which converts certain XML data content (usually base-64 encoded) into binary streams and puts them into a structure that looks like MIME multipart, with an XML document as the root part.
MTOM: Message Transmission Optimization Mechanism
– a description of how XOP is layered into SOAP HTTP transport (SOAP 1.2) for Web services
More HTTP friendly (it’s using MIME multipart); not originally conceived for the wireless world.
More like a communication protocol than a compression algorithm.
There appears to be no public implementation available; therefore, it is not known how well it performs with respect to our criteria (compression rate, processing overhead, semantic ambiguity)
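One reason to move base-64 content into binary parts is the size penalty of the text encoding. The Python sketch below only demonstrates that penalty with the standard base64 module; it does not implement XOP or MTOM, and the payload is a hypothetical attachment.

```python
import base64
import os

payload = os.urandom(3000)            # hypothetical binary attachment
encoded = base64.b64encode(payload)   # the form it would take inside an XML text node
print(len(payload), len(encoded))
```

Base-64 emits four characters for every three input bytes, so the embedded form is about one third larger than the raw binary.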
Recent XML Compression Proposals (2): XMill
A compression algorithm from AT&T, particularly designed for XML
Step 1 – Regrouping: separate structure, layout, and data, then distribute data elements into data streams (int, char, string, base64, etc.)
Step 2 – Use gzip, bzip2, etc., to compress these streams
XMill typically achieves a much better compression rate than conventional compressors such as gzip and bzip2 on XML data.
More processing overhead than gzip and bzip2 because of the extra “step 1”.
Compared with wbXML + Gzip, XMill output needs to be converted back to XML before processing.
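The two steps can be sketched in Python as follows. This is a simplified illustration of the regrouping idea, not XMill itself. It groups text values by element name rather than by data type, ignores attributes and tail text, and the names `regroup` and `compress_streams` are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import defaultdict


def regroup(xml_text):
    """Step 1: split a document into a structure stream and per-tag data containers."""
    structure = []                  # open markers ("<tag") and close markers (">")
    containers = defaultdict(list)  # tag name -> text values found directly under it

    def walk(elem):
        structure.append("<" + elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
        for child in elem:
            walk(child)
        structure.append(">")

    walk(ET.fromstring(xml_text))
    return structure, dict(containers)


def compress_streams(structure, containers):
    """Step 2: compress the structure stream and each data container separately."""
    streams = {"__structure__": " ".join(structure)}
    streams.update({tag: "\n".join(values) for tag, values in containers.items()})
    return {name: gzip.compress(body.encode("utf-8")) for name, body in streams.items()}


doc = "<log><e><ip>10.0.0.1</ip><code>200</code></e><e><ip>10.0.0.2</ip><code>404</code></e></log>"
structure, containers = regroup(doc)
packed = compress_streams(structure, containers)
print(containers)
```

Grouping similar values into one stream gives the back-end compressor more regular input, at the cost of the extra pass that the slide counts as additional processing overhead.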
Conclusions and Future Directions
Suggested criteria for the use of XML-based policies in mobile devices
Reviewed and compared a variety of compression algorithms for XML
Concluded that {wbXML + transcode + Gzip} offers the best combination of compression rate and processing overhead of all algorithms tested
– This combination is recommended for use with XML-based security policies in mobile computing environments
Directions for further work
– Keep an eye on ASN.1 (will public implementations match theoretical results?)
– The compression rate of wbXML with transcode can be improved by adding more transcodes into the table (e.g., built-in function names, data type names, etc.). How much improvement can be gained?
– Experiments on XMill (perform more detailed comparison with wbXML to determine the best algorithm for this environment)
References
[1] Uche Ogbuji. “Tip: Compress XML files for efficient transmission”, IBM DeveloperWorks, 9 April, 2004
[2] M. Cokus, D. Winkowski. “XML Sizing and Compression Study For Military Wireless Data”, XML 2002 Proceedings by deepX
[3] “WAP Binary XML Content Format Specifications – Version 1.2”, http://www.wapforum.org/what/technical/PROP-WBXML-19990815.pdf
[4] ASN.1 Site - XML. “What ASN.1 Can Offer for XML?”, http://asn1.elibel.tm.fr/xml/, June, 2004
[5] ITU-T X.694. “Information Technology – ASN.1 encoding rules – Mapping W3C XML Schema Definitions Into ASN.1”, Jan, 2004
[6] Nokia. “Nokia Position Paper: W3C Workshop on Binary Interchange of XML Information Item Sets”, Aug, 2003, http://www.w3.org/2003/08/binary-interchange-workshop/02-Nokia-Position-Paper_02.htm
[7] P. Sandoz, et al., Sun Microsystems. “Fast Web Services”, July, 2003, W3C Workshop on Binary Interchange of XML Information Item Sets
[8] “Compressing XML”, http://www.devx.com/xml/article/16754/0/page/1
[9] M. Girardot, N. Sundaresan. “Millau, an encoding format for efficient representation and exchange of XML over the Web”, http://www9.org/w9cdrom/154/154.html
[10] “gzip - GNU Project - Free Software Foundation (FSF)”, http://www.gnu.org/software/gzip/gzip.html
[11] “Bzip2 for Windows”, http://gnuwin32.sourceforge.net/packages/bzip2.htm
[12] “kXML with wbXML support”, http://www.kxml.org
[13] “OSS Nokalva ASN.1/Pure Java Tools - Beta”, http://www.oss.com
[14] “Hyades – Automated Software Quality Evaluation Framework”, http://www.eclipse.org/hyades/
[15] “XMill - A User Configurable XML Processor”, http://sourceforge.net/projects/xmill
Questions