
http://www.pankaj-k.net/xpb4j 1

XML Processing Performance Comparison with XPB4J

Pankaj Kumar, Web Services Architect, HP

July 25, 2002

Agenda

• XPB4J: Whys and Whats?
• XStat Processing
• How to run XPB4J? -- Show it with a Demo
• Measurements
  – Parsing/Processing APIs and implementations
  – What are we looking for?
  – Input Data
  – Measurement Method
  – Results
• What Next?
• How can you benefit ( and contribute )?

Why?

• Input for Design and Development
• Performance Modeling
• Comparing parser/processor performance
• Learning XML
• Having Fun!!

A different kind of benchmark

• A benchmark for developers
  – Traditional benchmarks are for vendors of systems, to be used as a sales tool
  – XPB4J is for developers to study and understand
    – Performance tradeoffs
    – Performance modeling
    – Performance tuning
• Focus on relative numbers
• No single metric

Components of XPB4J

• Infrastructure ( Java code and Jakarta-Ant scripts ) to run the processing code on input data and report the performance numbers and results.
• A framework to plug in any XML processing code
  – A couple of light-weight Java interfaces
• A specific processing code -- XStat Processing code

XStat Processing

• Collect structural statistics on an XML file
  – No. of times an element occurred
  – No. of times it had a particular element as parent
  – No. of times it had a particular element as child
  – No. of times it had a particular attribute
  – Amount of character data it had
  – Whether the element was empty
• Other assumptions
  – Namespaces ignored. Take qualified names as the element identifiers.
  – No validation.
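The statistics listed above can be illustrated with a minimal SAX sketch. This is not the XPB4J XStat code; the class name XStatSketch, the Stat fields and the sample document are assumptions made for illustration, and only three of the listed statistics ( occurrences, attribute count, character data ) are collected. It relies on the JDK's default JAXP factory, which is non-validating and not namespace-aware unless configured otherwise, matching the assumptions on this slide.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XStatSketch {
    /** Per-element counters; a hypothetical, reduced set of the statistics on the slide. */
    static final class Stat {
        int occurrences;   // number of times the element occurred
        int attributes;    // total number of attributes seen on it
        long charData;     // characters directly inside the element
    }

    /** Parses the XML text with SAX and returns statistics keyed by qualified name. */
    static Map<String, Stat> collect(String xml) throws Exception {
        final Map<String, Stat> stats = new TreeMap<>();
        final ArrayDeque<String> open = new ArrayDeque<>(); // stack of currently open elements
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                Stat s = stats.computeIfAbsent(qName, k -> new Stat());
                s.occurrences++;
                s.attributes += atts.getLength();
                open.push(qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Character data is credited to the innermost open element.
                if (!open.isEmpty()) {
                    stats.get(open.peek()).charData += length;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return stats;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<results><item id=\"1\">abc</item><item id=\"2\"/></results>";
        for (Map.Entry<String, Stat> e : collect(xml).entrySet()) {
            Stat s = e.getValue();
            System.out.println(e.getKey() + ": count=" + s.occurrences
                    + " attrs=" + s.attributes + " chars=" + s.charData);
        }
    }
}
```

Parent/child counts and the empty-element flag could be added in the same handler by inspecting the stack in startElement and endElement.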

How to run XPB4J?

• Download it from http://www.pankaj-k.net/xpb4j as a .zip file
• Extract it. This creates the subdirectory xpb4j-0.90
• Make sure that you have
  – JDK 1.4.x, with JAVA_HOME set to its base directory
  – Jakarta-Ant 1.4.x or higher, with its bin directory in PATH
• Issue the command: ant run
• Changing Input Data and other parameters
• Changing Parser implementations


XPB4J Demo

[ XPB4J Demo ]

What determines processing time?

• Processing Activity
• Input Data – Type and size of data
• Machine ( CPU, RAM, OS, Disk, … )
• JVM implementation
• JVM state – Steady, First few executions
• Processing API – [SAX, XmlPull], [DOM, JDOM, DOM4J ], XSLT
• Parser/Processor implementation

Parsing/Processing APIs and implementations

• SAX
  – JDK 1.4.0, Xerces-2.0.1, GNU JAXP 1.0 beta1, Piccolo 1.02
• XmlPull
  – XPP3, kXML
• DOM
  – JDK 1.4.0, Xerces, GNU JAXP
• JDOM (beta8)
• DOM4J 1.3
• XSLT
  – JDK 1.4.0, xalan-2.3.1
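For the SAX and DOM entries above, the implementation actually in use is whatever JAXP resolves at run time. The sketch below prints the factory classes picked up by the current JVM. The class name WhichParser is hypothetical, and the slides do not say how XPB4J itself switches implementations; the standard JAXP system properties javax.xml.parsers.SAXParserFactory and javax.xml.parsers.DocumentBuilderFactory are shown only as one possible mechanism.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {
    /** Returns the class name of the SAX factory that JAXP resolves in this JVM. */
    static String saxFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Returns the class name of the DOM factory that JAXP resolves in this JVM. */
    static String domFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // To try another implementation, put its jar on the classpath and start the JVM with
        //   -Djavax.xml.parsers.SAXParserFactory=<factory class of that parser>
        // (the factory class name depends on the parser and is not given in the slides).
        System.out.println("SAX factory: " + saxFactoryName());
        System.out.println("DOM factory: " + domFactoryName());
    }
}
```

Printing the factory class before a measurement run is a cheap way to confirm which parser the numbers belong to.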

Input Data

Search Results from Google’s Web Services API on “Bill Gates”:

Input Data Set   Files                   Total Size
DS1              res0.xml                11.9KB
DS2              res0.xml,…, res9.xml    98.3KB
DS3              res.xml                 111.7KB

Measurement Machine

• Self-assembled Server
  – AMD Athlon 900MHz CPU
  – 512 MB RAM
  – Dual boot -- Windows 2000/Mandrake Linux 8.1

Measurement Loop

// Pseudocode. Won’t compile.
for (int r = 0; r < runcount; r++) // runcount runs
{
    Runtime rt = Runtime.getRuntime();
    rt.gc(); // Only a hint; hope that this will force garbage collection.
    long startMem = rt.totalMemory() - rt.freeMemory();
    long startTime = System.currentTimeMillis();
    for (int l = 0; l < loopcount; l++) // loopcount loops
    {
        for (file f in input files) // Do the processing.
            process f;
    }
    long endTime = System.currentTimeMillis();
    long endMem = rt.totalMemory() - rt.freeMemory();
    long avgPT = (endTime - startTime)/loopcount; // average time per loop
    long memU = (endMem - startMem)/1024;         // change in used heap, in KB
    System.out.println("Processing Time: " + avgPT + " milli secs.");
    System.out.println("Memory Use: " + memU + " KB.");
}
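A runnable variant of the loop above might look like the following sketch. It is not the XPB4J harness: the names MeasureLoop, Task and Result, the generated in-memory document, and the run and loop counts are all assumptions chosen so that the example is self-contained. As in the pseudocode, the memory figure is only the change in used heap around the loop, and the gc() call is a request, not a guarantee.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class MeasureLoop {
    /** Outcome of one run: average wall-clock time per loop and change in used heap. */
    static final class Result {
        final long avgMillis;
        final long memKb;
        Result(long avgMillis, long memKb) {
            this.avgMillis = avgMillis;
            this.memKb = memKb;
        }
    }

    /** The unit of work being measured (hypothetical helper interface). */
    interface Task {
        void run() throws Exception;
    }

    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task loopCount times and reports averages, mirroring the pseudocode. */
    static Result measure(Task task, int loopCount) throws Exception {
        Runtime.getRuntime().gc(); // a hint only; the JVM may ignore it
        long startMem = usedMemory();
        long startTime = System.currentTimeMillis();
        for (int l = 0; l < loopCount; l++) {
            task.run();
        }
        long endTime = System.currentTimeMillis();
        long endMem = usedMemory();
        return new Result((endTime - startTime) / loopCount, (endMem - startMem) / 1024);
    }

    public static void main(String[] args) throws Exception {
        // Build a small synthetic document in memory instead of reading input files.
        StringBuilder sb = new StringBuilder("<r>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<item id=\"").append(i).append("\">text</item>");
        }
        byte[] doc = sb.append("</r>").toString().getBytes(StandardCharsets.UTF_8);

        SAXParserFactory factory = SAXParserFactory.newInstance();
        Task parse = () -> factory.newSAXParser()
                .parse(new ByteArrayInputStream(doc), new DefaultHandler());

        for (int r = 0; r < 3; r++) { // three runs, 50 loops each
            Result res = measure(parse, 50);
            System.out.println("Processing Time: " + res.avgMillis + " milli secs.");
            System.out.println("Memory Use: " + res.memKb + " KB.");
        }
    }
}
```

For small inputs the millisecond clock is coarse, so a larger loop count or System.nanoTime() may give more stable numbers.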

Questions: #1

• How does performance vary with SAX parsers?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – Processing API – SAX
  – JVM State – Steady
• Variable:
  – SAX Parser – JDK1.4, Piccolo 1.02, Xerces 2.0.1, GNUJAXP-Beta1, Xerces 1.4.4
  – Input Data – DS1, DS2, DS3

Results: #1

[ Chart: XStat Processing using SAX API with different parsers ( on J2SE-1.4.0 from Sun ). X axis: SAX Parser Implementation ( J2SE-1.4.0, Piccolo-1.02, Xerces-2.0.1, GNUJAXP-Beta1, Xerces-1.4.4 ). Y axis: Avg. Processing Time ( milli secs ), 0 to 160. Series: DS1, DS2, DS3. ]

Questions: #2

• How does performance vary with DOM parsers?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – Processing API – DOM
  – JVM State – Steady
• Variable:
  – DOM Parser – JDK1.4, Xerces 2.0.1, GNUJAXP-Beta1, Xerces 1.4.4
  – Input Data – DS1, DS2, DS3

Results: #2

[ Chart: XStat Processing using DOM API with different parsers ( on J2SE-1.4.0 from Sun ). X axis: DOM Parsers ( J2SE 1.4.0, GNUJAXP-beta1, xerces-2.0.1, xerces-1.4.4 ). Y axis: Avg. Processing Time ( milli secs. ), 0 to 500. Series: DS1, DS2, DS3. ]
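To make the contrast with SAX concrete, here is a minimal sketch of tree-based processing with the DOM API: the whole document is first built in memory, and only then traversed to count elements. The class name DomCountSketch and the sample document are assumptions, and only element occurrences are counted; it is not the XStat DOM code measured in these slides.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class DomCountSketch {
    /** Builds a DOM tree for the XML text and counts occurrences per element name. */
    static Map<String, Integer> countElements(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Integer> counts = new TreeMap<>();
        walk(doc.getDocumentElement(), counts);
        return counts;
    }

    /** Depth-first traversal using first-child / next-sibling links. */
    private static void walk(Node node, Map<String, Integer> counts) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            counts.merge(node.getNodeName(), 1, Integer::sum);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, counts);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<results><item id=\"1\">abc</item><item id=\"2\"/></results>";
        System.out.println(countElements(xml));
    }
}
```

Because the tree must exist before traversal starts, both time and memory include the cost of building every node, which is one reason tree APIs tend to cost more than streaming ones for this kind of task.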

Questions: #3

• How does performance vary with XmlPull parsers?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – Processing API – XmlPull
  – JVM State – Steady
• Variable:
  – XmlPull Parser – XPP3, kXML
  – Input Data – DS1, DS2, DS3

Results: #3

[ Chart: XStat Processing using XmlPull API with different parsers ( on J2SE-1.4.0 from Sun ). X axis: XmlPull Parsers ( xpp3, kXML ). Y axis: Avg. Processing Time ( milli secs. ), 0 to 120. Series: DS1, DS2 and "Series3" ( presumably DS3 ). ]

Questions: #4

• How does performance vary with Memory Tree oriented parsers/processors?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – Processing API – Memory Tree oriented
  – JVM State – Steady
• Variable:
  – Parser/Processor – JDK1.4 DOM Parser, JDOM beta8, DOM4J, JDK1.4 XSLT Processor
  – Input Data – DS1, DS2, DS3

Results: #4

[ Chart: XStat Processing using tree oriented APIs ( on J2SE-1.4.0 from Sun ). X axis: Tree Oriented APIs ( DOM, JDOM, dom4j, XSLT ). Y axis: Avg. Processing Time ( milli secs. ), 0 to 1000. Series: DS1, DS2, DS3. ]

Questions: #5

• How does performance compare across best of XmlPull, SAX and DOM parsers?
• Fixed:
  – Measurement Machine
  – JVM – Sun’s JDK1.4.0
  – Processing Activity – XStat
  – JVM State – Steady
• Variable:
  – Parser/Processor – XPP3, JDK1.4 DOM, JDK1.4 SAX
  – Input Data – DS1, DS2, DS3

Results: #5

[ Chart: XStat Processing using Pull, SAX and DOM APIs ( on J2SE-1.4.0 from Sun ). X axis: Parsers ( xpp3, j2sdk-sax, j2sdk-dom ). Y axis: Avg. Processing Time ( milli secs. ), 0 to 200. Series: DS1, DS2, DS3. ]

Questions: #6

• How does performance vary with JVM?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM State – Steady
  – Input Data – DS2
• Variable:
  – Parser/Processor – XPP3, Xerces 1.4.4
  – JVM – IBM-JDK1.3, JRockit1.3.1, Sun’s JDK1.3.1, Sun’s JDK1.4

Results: #6

[ Chart: XStat Processing on Input Data Set DS2 on different JVMs. X axis: JVMs ( ibmjdk1.3, jrockit1.3.1, jdk1.3.1_02, jdk1.4.0 ). Y axis: Avg. Processing Time ( milli secs. ), 0 to 160. Series: xpp3, xerces-1.4.4 SAX, xerces-1.4.4 DOM. ]

Questions: #7

• How does performance vary with JVM warmup?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – Input Data – DS2
• Variable:
  – Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  – JVM State – First time, Steady

Results: #7

[ Chart: XStat Processing for first iteration ( on J2SE-1.4.0 from Sun ). X axis: Parser ( SAX, DOM, XPP, JDOM, XSLT, DOM4J ). Y axis: Processing Time ( milli secs. ), 0 to 1200. Series: first, any after 500. ]
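The "first" versus "any after 500" comparison can be reproduced in spirit with a sketch like the one below. It times the very first parse in a JVM, then performs a number of untimed warm-up parses, then times one more. The class name WarmupSketch, the synthetic document and the use of the JDK's default SAX parser are assumptions; the slides do not show the code used for this measurement.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class WarmupSketch {
    /** Times a single SAX parse of the document, in nanoseconds. */
    static long timeOnce(SAXParserFactory factory, byte[] doc) throws Exception {
        long t0 = System.nanoTime();
        factory.newSAXParser().parse(new ByteArrayInputStream(doc), new DefaultHandler());
        return System.nanoTime() - t0;
    }

    /** Returns { first-parse time, time of one parse after warmupRuns further parses }. */
    static long[] firstVsSteady(byte[] doc, int warmupRuns) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        long first = timeOnce(factory, doc);   // includes class loading and interpretation
        for (int i = 0; i < warmupRuns; i++) {
            timeOnce(factory, doc);            // warm-up, result discarded
        }
        long steady = timeOnce(factory, doc);  // after the JIT has had a chance to compile
        return new long[] { first, steady };
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder("<r>");
        for (int i = 0; i < 2000; i++) {
            sb.append("<item id=\"").append(i).append("\">text</item>");
        }
        byte[] doc = sb.append("</r>").toString().getBytes(StandardCharsets.UTF_8);

        long[] t = firstVsSteady(doc, 500);
        System.out.println("first:         " + t[0] / 1_000_000.0 + " ms");
        System.out.println("any after 500: " + t[1] / 1_000_000.0 + " ms");
    }
}
```

The first figure is only meaningful in a fresh JVM, so each parser should be measured in its own process.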

Questions: #8

• How does memory use vary with parser/processor?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – JVM State – Steady
  – Input Data – DS2
• Variable:
  – Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J

Results: #8

[ Chart: Memory Used ( KB ) in XStat Processing. X axis: Parser ( SAX, DOM, XPP, JDOM, XSLT, DOM4J ). Y axis: Memory Used ( KB ), 0 to 1000. Series: Memory Used. ]

Questions: #9

• How does performance vary with input XML file size?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – JVM State – Steady
• Variable:
  – Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  – Input Data – 100KB, 1MB, 10MB

Results: #9

XStat Processing for large xml files -- Average Processing Time

[ Chart: X axis: Parser. Y axis: Avg. Processing Time ( ms ). Series: 102KB, 1000KB, 9974KB. ]

Avg. Processing Time ( ms )

File size   SAX     DOM     XPP     JDOM    DOM4J
102KB       16      42      20      62      63
1000KB      134     355     167     440     440
9974KB      1270    3205    1540    3980    4020
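One way to read the table above is as throughput, that is, file size divided by average processing time. The sketch below does that arithmetic on the published numbers only; the class name ThroughputSketch and the choice of KB per millisecond as the unit are assumptions for illustration. If throughput stays roughly constant down a column, processing time is growing roughly linearly with file size for that parser.

```java
public class ThroughputSketch {
    /** Throughput in KB per millisecond. */
    static double kbPerMilli(double sizeKb, double millis) {
        return sizeKb / millis;
    }

    public static void main(String[] args) {
        String[] parsers = { "SAX", "DOM", "XPP", "JDOM", "DOM4J" };
        double[] sizesKb = { 102, 1000, 9974 };
        // Average processing times in ms, copied from the Results #9 table.
        double[][] millis = {
            { 16, 42, 20, 62, 63 },
            { 134, 355, 167, 440, 440 },
            { 1270, 3205, 1540, 3980, 4020 },
        };
        for (int p = 0; p < parsers.length; p++) {
            StringBuilder line = new StringBuilder(String.format("%-6s", parsers[p]));
            for (int s = 0; s < sizesKb.length; s++) {
                line.append(String.format("  %.2f KB/ms", kbPerMilli(sizesKb[s], millis[s][p])));
            }
            System.out.println(line);
        }
    }
}
```

For example, SAX on the 9974KB file works out to about 7.85 KB/ms, against about 3.11 KB/ms for DOM on the same file.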

Questions: #10

• How does memory use vary with input XML file size?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – JVM State – Steady
• Variable:
  – Parser/Processor – XPP3, JDK1.4, JDOM beta8, DOM4J
  – Input Data – 100KB, 1MB, 10MB

Results: #10

XStat Processing for large xml files -- Memory Use

[ Chart: X axis: Parser. Y axis: Memory Use ( KB ), 0 to 60000. Series: 102KB, 1000KB, 9974KB. ]

Memory Use ( KB )

File size   SAX     DOM      XPP     JDOM     DOM4J
102KB       853     600      1550    1320     700
1000KB      450     3700     300     5500     3500
9974KB      890     34000    1100    53000    39000

Questions: #11

• Any Interesting Observation?
• Fixed:
  – Measurement Machine
  – Processing Activity – XStat
  – JVM – JDK 1.4.0
  – JVM State – Steady
  – Parser/Processor – JDOM beta8
  – Input Data – 100KB, 1MB, 10MB
• Variable:
  – Node traversal loop – Loop1, Loop2

Questions: #11 ( Contd. )

Loop1:
…
List children = elem.getChildren();
for (int i = 0; i < children.size(); i++)
    collectStat((Element)children.get(i), sc);

Loop2:
…
ListIterator li = children.listIterator();
while (li.hasNext())
    collectStat((Element)li.next(), sc);

Results: #11

XStat processing with JDOM -- Loop1 and Loop2

[ Chart: X axis: XML File Size. Y axis: Avg. Processing time ( milli secs. ), 0 to 150000. Series: Loop1, Loop2. ]

Avg. Processing time ( milli secs. )

Loop     97KB    995KB    9980KB
Loop1    67      1020     108000
Loop2    62      440      3980
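The slides do not state why Loop1 degrades so sharply on the largest file. One plausible explanation is that indexed access on the list returned by getChildren() is not constant-time, so a loop over get(i) becomes quadratic while an iterator stays linear. The sketch below illustrates that general effect with java.util.LinkedList as a stand-in; it makes no claim about JDOM's internal list implementation, and the class name LoopSketch and the list size are assumptions.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class LoopSketch {
    /** Loop1 style: indexed access; each get(i) walks the linked list. */
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    /** Loop2 style: a ListIterator visits each node exactly once. */
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        ListIterator<Integer> li = list.listIterator();
        while (li.hasNext()) {
            sum += li.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 20_000; i++) {
            list.add(i);
        }

        long t0 = System.nanoTime();
        long a = sumByIndex(list);
        long t1 = System.nanoTime();
        long b = sumByIterator(list);
        long t2 = System.nanoTime();

        System.out.println("Loop1 (index):    sum=" + a + " in " + (t1 - t0) / 1_000_000.0 + " ms");
        System.out.println("Loop2 (iterator): sum=" + b + " in " + (t2 - t1) / 1_000_000.0 + " ms");
    }
}
```

Both loops compute the same result; only the access pattern differs, which is why such a change is easy to miss without measuring.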

Caveats

• Different APIs are not perfect substitutes
• XSLT processors are significantly different from parsers
• Performance should be only one criterion among many others
• XStat is an artificial processing task and favors the SAX/XmlPull APIs

What Next?

• Comparison with C/C++ Parsers/Processors
• Dynamic generation of input data
• Framework improvements
• Better Reporting and Presentation
• More processing activities
• Better tuning ?!

How can you benefit ( and contribute )?

• Benefit from XPB4J
  – Gain insight from the report
  – Learn XML by playing with code
  – Validate your assumptions
  – Tune your parser/processor ( if you are an implementer )
• Contribute to XPB4J
  – Run it under your environment and share your results
  – Write processing code
  – Extend the framework
• Discussion mailing list is:
  – [email protected]


Q & A