http://www.pankaj-k.net/xpb4j 1
XML Processing Performance Comparison with XPB4J
Pankaj Kumar, Web Services Architect, HP
July 25, 2002
Agenda
• XPB4J: Whys and Whats?
• XStat Processing
• How to run XPB4J? -- Show it with a Demo
• Measurements
  – Parsing/Processing APIs and implementations
  – What are we looking for?
  – Input Data
  – Measurement Method
  – Results
• What Next?
• How can you benefit ( and contribute )?
Why?
• Input for Design and Development
• Performance Modeling
• Comparing parser/processor performance
• Learning XML
• Having Fun!!
A different kind of benchmark
• A benchmark for developers
  – Traditional benchmarks are for vendors of systems, to be used as a sales tool
  – XPB4J is for developers to study and understand
    – Performance tradeoffs
    – Performance modeling
    – Performance tuning
• Focus on relative numbers
• No single metric
Components of XPB4J
• Infrastructure ( Java code and Jakarta-Ant scripts ) to run the processing code on input data and report the performance numbers and results.
• A framework to plug in any XML processing code
  – A couple of light-weight Java interfaces
• A specific processing code -- XStat Processing code
XStat Processing
• Collect structural statistics on an XML file
  – No. of times an element occurred
  – No. of times it had a particular element as parent
  – No. of times it had a particular element as child
  – No. of times it had a particular attribute
  – Amount of character data it had
  – Whether the element was empty
• Other assumptions
  – Namespaces ignored. Take qualified names as the element identifiers.
  – No validation.
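To make the statistics above concrete, here is a minimal sketch of two of them (element occurrences and amount of character data) collected with the SAX API. It is not XPB4J's actual XStat code: the class name XStatSaxSketch and its fields are hypothetical, and it is written for a current JDK (8 or later) rather than JDK 1.4. Namespace processing and validation are switched off, matching the assumptions listed above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of XPB4J.
public class XStatSaxSketch extends DefaultHandler {
    // Number of occurrences per qualified element name.
    final Map<String, Integer> occurrences = new TreeMap<>();
    // Total length of character data directly inside each element name.
    final Map<String, Integer> charData = new TreeMap<>();
    // Qualified names of the elements that are currently open.
    private final Deque<String> open = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        occurrences.merge(qName, 1, Integer::sum);
        open.push(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text run, so lengths are summed.
        if (!open.isEmpty()) {
            charData.merge(open.peek(), length, Integer::sum);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    // Parses the given XML text and returns the handler holding the statistics.
    static XStatSaxSketch collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false); // qualified names are the identifiers
        factory.setValidating(false);     // no validation
        SAXParser parser = factory.newSAXParser();
        XStatSaxSketch handler = new XStatSaxSketch();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        XStatSaxSketch stats = collect("<a><b x='1'>hi</b><b/><c>abc</c></a>");
        System.out.println("Occurrences: " + stats.occurrences);
        System.out.println("Character data: " + stats.charData);
    }
}
```

The remaining statistics (parent, child and attribute counts, emptiness) would follow the same pattern, using the stack of open elements to find the parent.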
How to run XPB4J?
• Download it from http://www.pankaj-k.net/xpb4j as a .zip file
• Extract it. It creates subdirectory xpb4j-0.90
• Make sure that you have
  – JDK 1.4.x and JAVA_HOME is set to its base directory
  – Jakarta-Ant 1.4.x or higher and its bin directory is in PATH.
• Issue: ant run
• Changing Input Data and other parameters
• Changing Parser implementations
What determines processing time?
• Processing Activity
• Input Data – Type and size of data
• Machine ( CPU, RAM, OS, Disk, … )
• JVM implementation
• JVM state – Steady, First few executions
• Processing API – [SAX, XmlPull], [DOM, JDOM, DOM4J ], XSLT
• Parser/Processor implementation
Parsing/Processing APIs and implementations
• SAX
  – JDK 1.4.0, Xerces-2.0.1, GNU JAXP 1.0 beta1, Piccolo 1.02
• XmlPull
  – XPP3, kXML
• DOM
  – JDK 1.4.0, Xerces, GNU JAXP
• JDOM (beta8)
• DOM4J 1.3
• XSLT
  – JDK 1.4.0, xalan-2.3.1
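The APIs above fall into event-driven ones (SAX, XmlPull) and tree-oriented ones (DOM, JDOM, DOM4J). As a rough illustration of the tree-oriented style, the sketch below counts element occurrences by building a DOM tree through JAXP and walking it. The class name XStatDomSketch and its methods are hypothetical and not taken from XPB4J; the code assumes a current JDK (8 or later).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration; not part of XPB4J.
public class XStatDomSketch {
    // Recursively counts the element and all of its descendant elements.
    static void walk(Element elem, Map<String, Integer> counts) {
        counts.merge(elem.getTagName(), 1, Integer::sum);
        for (Node n = elem.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) n, counts);
            }
        }
    }

    // Builds the whole tree in memory first, then traverses it.
    static Map<String, Integer> count(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Integer> counts = new TreeMap<>();
        walk(doc.getDocumentElement(), counts);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<a><b/><b/><c/></a>"));
    }
}
```

Unlike an event-driven handler, this version holds the complete document in memory before any statistic is computed, which is the cost the memory measurements later in the deck look at.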
Input Data
Search Results from Google’s Web Services API on “Bill Gates”:

Input Data Set   Files                  Total Size
DS1              res0.xml               11.9KB
DS2              res0.xml,…, res9.xml   98.3KB
DS3              res.xml                111.7KB
Measurement Machine
• Self-assembled Server
  – AMD Athlon 900MHz CPU
  – 512 MB RAM
  – Dual boot -- Windows 2000/Mandrake Linux 8.1
Measurement Loop
// Fragment of the measurement loop. It assumes the surrounding code
// declares runcount, loopcount, a File[] inputFiles and a process(File) method.
Runtime rt = Runtime.getRuntime();
for (int r = 0; r < runcount; r++) // runcount runs
{
    rt.gc(); // Hope that this will force garbage collection.
    long startMem = rt.totalMemory() - rt.freeMemory();
    long startTime = System.currentTimeMillis();
    for (int l = 0; l < loopcount; l++) // loopcount loops
    {
        for (int i = 0; i < inputFiles.length; i++) // Do the processing.
            process(inputFiles[i]);
    }
    long endTime = System.currentTimeMillis();
    long endMem = rt.totalMemory() - rt.freeMemory();
    long avgPT = (endTime - startTime) / loopcount; // average per loop
    long memU = (endMem - startMem) / 1024;         // bytes to KB
    System.out.println("Processing Time: " + avgPT + " milli secs.");
    System.out.println("Memory Use: " + memU + " KB.");
}
Questions: #1
• How does performance vary with SAX parsers?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – Processing API – SAX
  – JVM State – Steady
• Variable:
  – SAX Parser – JDK1.4, Piccolo 1.02, Xerces 2.0.1, GNUJAXP-Beta1, Xerces 1.4.4
  – Input Data – DS1, DS2, DS3
Results: #1
XStat Processing using SAX API with different parsers ( on J2SE-1.4.0 from Sun )
[Bar chart: Avg. Processing Time ( milli secs ) by SAX Parser Implementation (J2SE-1.4.0, Piccolo-1.02, Xerces-2.0.1, GNUJAXP-Beta1, Xerces-1.4.4); series DS1, DS2, DS3; y-axis 0 to 160]
Questions: #2
• How does performance vary with DOM parsers?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – Processing API – DOM
  – JVM State – Steady
• Variable:
  – DOM Parser – JDK1.4, Xerces 2.0.1, GNUJAXP-Beta1, Xerces 1.4.4
  – Input Data – DS1, DS2, DS3
Results: #2
XStat Processing using DOM API with different parsers ( on J2SE-1.4.0 from Sun )
[Bar chart: Avg. Processing Time ( milli secs. ) by DOM Parsers (J2SE 1.4.0, GNUJAXP-beta1, xerces-2.0.1, xerces-1.4.4); series DS1, DS2, DS3; y-axis 0 to 500]
Questions: #3
• How does performance vary with XmlPull parsers?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – Processing API – XmlPull
  – JVM State – Steady
• Variable:
  – XmlPull Parser – XPP3, kXML
  – Input Data – DS1, DS2, DS3
Results: #3
XStat Processing using XmlPull API with different parsers ( on J2SE-1.4.0 from Sun )
[Bar chart: Avg. Processing Time ( milli secs. ) by XmlPull Parsers (xpp3, kXML); series DS1, DS2 and a third series labeled "Series3" in the chart legend (presumably DS3); y-axis 0 to 120]
Questions: #4
• How does performance vary with Memory Tree oriented parsers/processors?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – Processing API – Memory Tree oriented
  – JVM State – Steady
• Variable:
  – Parser/Processor – JDK1.4 DOM Parser, JDOM beta8, DOM4J, JDK1.4 XSLT Processor
  – Input Data – DS1, DS2, DS3
Results: #4
XStat Processing using tree oriented APIs ( on J2SE-1.4.0 from Sun )
[Bar chart: Avg. Processing Time ( milli secs. ) by Tree Oriented APIs (DOM, JDOM, dom4j, XSLT); series DS1, DS2, DS3; y-axis 0 to 1000]
Questions: #5
• How does performance compare across best of XmlPull, SAX and DOM parsers?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – JVM State – Steady
• Variable:
  – Parser/Processor – XPP3, JDK1.4 DOM, JDK1.4 SAX
  – Input Data – DS1, DS2, DS3
Results: #5
XStat Processing using Pull, SAX and DOM APIs ( on J2SE-1.4.0 from Sun )
[Bar chart: Avg. Processing Time ( milli secs. ) by Parsers (xpp3, j2sdk-sax, j2sdk-dom); series DS1, DS2, DS3; y-axis 0 to 200]
Questions: #6
• How does performance vary with JVM?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM State – Steady
  – Input Data – DS2
• Variable:
  – Parser/Processor – XPP3, Xerces 1.4.4
  – JVM – IBM-JDK1.3, JRockit1.3.1, Sun’s JDK1.3.1, Sun’s JDK1.4
Results: #6
XStat Processing on Input Data Set DS2 on different JVMs
[Bar chart: Avg. Processing Time ( milli secs. ) by JVMs (ibmjdk1.3, jrockit1.3.1, jdk1.3.1_02, jdk1.4.0); series xpp3, xerces-1.4.4 SAX, xerces-1.4.4 DOM; y-axis 0 to 160]
Questions: #7
• How does performance vary with JVM warmup?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – Input Data – DS2
• Variable:
  – Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  – JVM State – First time, Steady
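A first-time versus steady-state comparison of this kind can be sketched as follows. This is only an illustration under stated assumptions: the class WarmupSketch and its methods are hypothetical, the workload is a plain SAX parse of a tiny in-memory document rather than XStat on DS2, and the warm-up count is a parameter chosen by the caller.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of XPB4J.
public class WarmupSketch {
    // Parses the document once and returns the elapsed time in nanoseconds.
    static long timeOnce(String xml) throws Exception {
        long start = System.nanoTime();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        return System.nanoTime() - start;
    }

    // Returns { time of the first iteration, time of the iteration
    // that runs after `warmup` iterations have already completed }.
    static long[] firstAndSteady(String xml, int warmup) throws Exception {
        long first = timeOnce(xml);
        for (int i = 1; i < warmup; i++) {
            timeOnce(xml); // results discarded; these runs only warm up the JVM
        }
        long steady = timeOnce(xml);
        return new long[] { first, steady };
    }

    public static void main(String[] args) throws Exception {
        long[] t = firstAndSteady("<a><b>text</b><c/></a>", 500);
        System.out.println("First iteration: " + t[0] / 1000 + " microseconds");
        System.out.println("After warm-up:   " + t[1] / 1000 + " microseconds");
    }
}
```

The first iteration includes class loading and interpreted execution, so it is expected to be slower than later ones; a single steady-state sample is noisy, and averaging several would be more robust.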
Results: #7
XStat Processing for first iteration ( on J2SE-1.4.0 from Sun )
[Bar chart: Processing Time ( milli secs. ) by Parser (SAX, DOM, XPP, JDOM, XSLT, DOM4J); series "first" and "any after 500"; y-axis 0 to 1200]
Questions: #8
• How does memory use vary with parser/processor?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – JVM State – Steady
  – Input Data – DS2
• Variable:
  – Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
Results: #8
Memory Used ( KB ) in XStat Processing
[Bar chart: Memory Used ( KB ) by Parser (SAX, DOM, XPP, JDOM, XSLT, DOM4J); y-axis 0 to 1000]
Questions: #9
• How does performance vary with input XML file size?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – JVM State – Steady
• Variable:
  – Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  – Input Data – 100KB, 1MB, 10MB
Results: #9
XStat Processing for large xml files -- Average Processing Time

Avg. Processing Time ( ms )
File Size   SAX    DOM    XPP    JDOM   DOM4J
102KB       16     42     20     62     63
1000KB      134    355    167    440    440
9974KB      1270   3205   1540   3980   4020
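Since the benchmark focuses on relative numbers, the 9974KB row can be normalized against SAX. The sketch below does only that arithmetic on the figures from the slide; the class name RelativeTimes and its method are hypothetical.

```java
// Hypothetical illustration; the numbers are the 9974KB row from the slide.
public class RelativeTimes {
    static final String[] PARSERS = { "SAX", "DOM", "XPP", "JDOM", "DOM4J" };
    static final int[] MS_9974KB = { 1270, 3205, 1540, 3980, 4020 };

    // Divides every time by the time at index `baseline`.
    static double[] relativeTo(int[] times, int baseline) {
        double[] ratios = new double[times.length];
        for (int i = 0; i < times.length; i++) {
            ratios[i] = (double) times[i] / times[baseline];
        }
        return ratios;
    }

    public static void main(String[] args) {
        double[] ratios = relativeTo(MS_9974KB, 0); // index 0 is SAX
        for (int i = 0; i < PARSERS.length; i++) {
            System.out.println(String.format("%-6s %.2f x SAX", PARSERS[i], ratios[i]));
        }
    }
}
```

On these figures the tree-building APIs take roughly 2.5 to 3.2 times as long as SAX for the largest file, while XPP stays within about 1.2 times.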
Questions: #10
• How does memory use vary with input XML file size?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – JVM State – Steady
• Variable:
  – Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  – Input Data – 100KB, 1MB, 10MB
Results: #10
XStat Processing for large xml files -- Memory Use
Memory Use ( KB )
File Size   SAX    DOM     XPP    JDOM    DOM4J
102KB       853    600     1550   1320    700
1000KB      450    3700    300    5500    3500
9974KB      890    34000   1100   53000   39000
Questions: #11
• Any Interesting Observation?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – JVM State – Steady
  – Parser/Processor – JDOM beta8
  – Input Data – 100KB, 1MB, 10MB
• Variable
  – Node traversal loop – Loop1, Loop2
Questions: #11 ( Contd. )
Loop1:
…
List children = elem.getChildren();
for (int i = 0; i < children.size(); i++)
    collectStat((Element)children.get(i), sc);

Loop2:
…
ListIterator li = children.listIterator();
while (li.hasNext())
    collectStat((Element)li.next(), sc);
Results: #11
XStat processing with JDOM -- Loop1 and Loop2
Avg. Processing time ( milli secs. )
XML File Size   97KB   995KB   9980KB
Loop1           67     1020    108000
Loop2           62     440     3980
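The slides do not explain the gap. One plausible cause, stated here as an assumption rather than something the deck confirms, is that indexed access on the list returned by getChildren() was not constant-time in JDOM beta8, which would make Loop1 quadratic in the number of children. The sketch below reproduces that effect with java.util.LinkedList standing in for JDOM's list; it does not use JDOM, and the class name LoopCostSketch is hypothetical.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical illustration; LinkedList stands in for JDOM's child list.
public class LoopCostSketch {
    // Loop1 style: every get(i) walks the linked list to reach position i.
    static long sumIndexed(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // Loop2 style: the iterator moves one node per step.
    static long sumIterated(List<Integer> list) {
        long sum = 0;
        ListIterator<Integer> li = list.listIterator();
        while (li.hasNext()) {
            sum += li.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 50_000; i++) {
            list.add(i);
        }
        long t0 = System.nanoTime();
        long a = sumIndexed(list);
        long t1 = System.nanoTime();
        long b = sumIterated(list);
        long t2 = System.nanoTime();
        System.out.println("Indexed:  " + (t1 - t0) / 1_000_000 + " ms, sum " + a);
        System.out.println("Iterated: " + (t2 - t1) / 1_000_000 + " ms, sum " + b);
    }
}
```

Both methods compute the same result; only the traversal cost differs, and the difference grows with list length in the same way the Loop1 figures grow with file size.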
Caveats
• Different APIs are not perfect substitutes
• XSLT processors are significantly different from parsers
• Performance should be only one criterion among many others
• XStat is an artificial processing activity and favors the SAX/XmlPull APIs
What Next?
• Comparison with C/C++ Parsers/Processors
• Dynamic generation of input data
• Framework improvements
• Better Reporting and Presentation
• More processing activities
• Better tuning ?!
How can you benefit ( and contribute )?
• Benefit from XPB4J
  – Gain insight from the report
  – Learn XML by playing with code
  – Validate your assumptions
  – Tune your parser/processor ( if you are an implementer )
• Contribute to XPB4J
  – Run it under your environment and share your results
  – Write processing code
  – Extend the framework
• Discussion mailing list is: – [email protected]