Large output in XML with Unicode and namespace Thomas Aglassinger http://roskakori.at


We wanted to write this:

● XML
● Unicode
● Namespaces
● Large

We already knew how to read XML.

● xml.dom.minidom.parse()
● xml.etree.ElementTree.parse()
● xml.sax.parse()
● lxml.etree.parse()

http://encyclopediadramatica.se/File:Bill_Nye_Expert.jpg

So we went to the Python Library...

http://upload.wikimedia.org/wikipedia/commons/2/2b/Melk_-_Abbey_-_Library.jpg

...and we were like:

http://www.apple.com/switch/stories/ellenfeiss.html (in 2002)

xml.dom.minidom

● Many lines of code
● Explicit attribute for namespace
● Verbose way to add attributes
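The code from the slide is not preserved here, so the following is only a minimal sketch of what writing a namespaced document with xml.dom.minidom can look like. The namespace URI, the element names and the function `build_people_xml` are assumptions made up for illustration.

```python
from xml.dom.minidom import getDOMImplementation

# Hypothetical namespace URI, used only for this illustration.
PEOPLE_NS = "http://example.com/people"


def build_people_xml(names):
    """Build the whole document in memory and return it as UTF-8 bytes."""
    implementation = getDOMImplementation()
    document = implementation.createDocument(PEOPLE_NS, "p:people", None)
    root = document.documentElement
    # In this sketch the namespace declaration is added as an explicit
    # attribute; otherwise the prefix "p" would not be declared in the output.
    root.setAttribute("xmlns:p", PEOPLE_NS)
    for person_id, name in enumerate(names, start=1):
        person = document.createElementNS(PEOPLE_NS, "p:person")
        # Attributes are set one call at a time.
        person.setAttribute("id", str(person_id))
        person.setAttribute("name", name)
        root.appendChild(person)
    return document.toxml(encoding="utf-8")


xml_bytes = build_people_xml(["Zoë", "Jürgen"])
print(xml_bytes.decode("utf-8"))
```

Even for two small elements this takes a factory call, an explicit `xmlns:p` attribute and one `setAttribute()` call per attribute.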

xml.dom.minidom

● “Users [...] who would like to write less code for processing XML files should consider using the xml.etree.ElementTree module instead” (The Python Standard Library, Chapter 19, Structured Markup)

http://www.destructoid.com/blogs/Sevre/femshep-5-a-space-opera-208844.phtml

xml.etree

● Clark notation instead of XPath
● Generally shorter but wider
● Requires Python 2.7
● Similar issues with lxml.etree
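As a rough illustration of these points, here is a sketch of the same kind of document written with xml.etree.ElementTree. The namespace URI and element names are hypothetical; `register_namespace()` is the call that was added in Python 2.7.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
PEOPLE_NS = "http://example.com/people"

# Registers the prefix globally for serialization.
ET.register_namespace("p", PEOPLE_NS)


def build_people_xml(names):
    """Build the tree in memory and return it serialized as UTF-8 bytes."""
    # Clark notation: "{namespace-uri}localname" instead of "prefix:localname".
    root = ET.Element("{%s}people" % PEOPLE_NS)
    for person_id, name in enumerate(names, start=1):
        ET.SubElement(root, "{%s}person" % PEOPLE_NS, id=str(person_id), name=name)
    out = io.BytesIO()
    ET.ElementTree(root).write(out, encoding="utf-8", xml_declaration=True)
    return out.getvalue()


xml_bytes = build_people_xml(["Zoë", "Jürgen"])
print(xml_bytes.decode("utf-8"))
```

There are fewer lines than with minidom, but each line that names an element carries the full namespace URI, which is where the width comes from.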

Memory issues

● So far, the XML document is built in memory
● Won't work well for large sets of data
● We need a streaming interface

codecs.open() and write()

● Manual escaping
● It just doesn't feel “right”
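A minimal sketch of this approach, assuming a hypothetical `write_people()` helper and made-up element names. Every value has to be escaped by hand (here with `escape()` and `quoteattr()` from xml.sax.saxutils), and nothing checks that the tags match.

```python
import codecs
import os
import tempfile
from xml.sax.saxutils import escape, quoteattr

# Hypothetical namespace URI, used only for this illustration.
PEOPLE_NS = "http://example.com/people"


def write_people(path, people):
    """Stream (name, remark) pairs to a UTF-8 file, one element at a time."""
    with codecs.open(path, "w", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="utf-8"?>\n')
        out.write("<p:people xmlns:p=%s>\n" % quoteattr(PEOPLE_NS))
        for name, remark in people:
            # Forgetting escape() or quoteattr() here silently produces
            # broken XML as soon as the data contains <, > or &.
            out.write(
                "  <p:person name=%s>%s</p:person>\n"
                % (quoteattr(name), escape(remark))
            )
        out.write("</p:people>\n")


path = os.path.join(tempfile.mkdtemp(), "people.xml")
write_people(path, [("Zoë", "likes <tags> & ampersands")])
with open(path, encoding="utf-8") as people_file:
    written = people_file.read()
print(written)
```

Memory usage stays low because nothing is kept between iterations, but correctness depends entirely on the discipline of the caller.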

saxutils.XMLGenerator

● No support for <x/>, only <x></x>

Lack of basic validation

● Are all tags closed?
● In the correct order?
● Has a namespace been registered before usage?
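The sketch below shows how streaming with XMLGenerator might look, using hypothetical element names and a made-up `close_root` flag to simulate a forgotten end tag. It uses the generator's defaults, under which empty elements come out in the long form; Python 3.2 and later also accept a `short_empty_elements` argument, which the sketch leaves untouched.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesNSImpl

# Hypothetical namespace URI, used only for this illustration.
PEOPLE_NS = "http://example.com/people"


def write_people(names, close_root=True):
    """Stream people through XMLGenerator; optionally 'forget' the root end tag."""
    out = io.StringIO()
    generator = XMLGenerator(out, encoding="utf-8")
    generator.startDocument()
    generator.startPrefixMapping("p", PEOPLE_NS)
    generator.startElementNS((PEOPLE_NS, "people"), "p:people", AttributesNSImpl({}, {}))
    for name in names:
        attributes = AttributesNSImpl({(None, "name"): name}, {(None, "name"): "name"})
        generator.startElementNS((PEOPLE_NS, "person"), "p:person", attributes)
        generator.endElementNS((PEOPLE_NS, "person"), "p:person")
    if close_root:
        generator.endElementNS((PEOPLE_NS, "people"), "p:people")
    generator.endPrefixMapping("p")
    # No complaint here, even if the root element was never closed.
    generator.endDocument()
    return out.getvalue()


def is_well_formed(text):
    """Check the result with a real parser, since the generator does not."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


good = write_people(["Zoë"])
broken = write_people(["Zoë"], close_root=False)
print(good)
print(is_well_formed(good), is_well_formed(broken))
```

The missing end tag only shows up when something tries to read the output again.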

So we had to go all kinky...

http://mylittlefacewhen.com/f/3781/

...and write yet-another XML module

http://www.110pounds.com/?p=6880

Before you judge too harshly:

It just writes XML!

loxun

● Compact namespace syntax
● Compact attribute syntax
● Defaults to UTF-8 output
● Pure Python 2.5+
● Streaming interface for low memory usage
● Supports the with statement
● No dependencies on other modules
● Optimizes <x></x> to simply <x/>

Raises XmlError if...

● ...you add references to undefined namespaces
● ...you forget to close tags (elements)
● ...you build documents that are not well-formed
● ...you add non-ASCII characters in 8-bit strings
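To make the idea behind these checks concrete, here is a small standard-library sketch of a validating streaming writer. It is not loxun's implementation or API: `TinyXmlWriter`, `TinyXmlError` and all method names are hypothetical, and only a few of the checks are covered.

```python
import io
from xml.sax.saxutils import escape, quoteattr

# Hypothetical namespace URI, used only for this illustration.
PEOPLE_NS = "http://example.com/people"


class TinyXmlError(Exception):
    """Hypothetical error type for this sketch (not loxun's XmlError)."""


class TinyXmlWriter:
    """Writes XML piece by piece and keeps just enough state to validate."""

    def __init__(self, out):
        self._out = out
        self._open_tags = []   # stack of elements still waiting for an end tag
        self._namespaces = {}  # prefix -> URI
        self._pending = []     # prefixes to declare on the next start tag
        out.write('<?xml version="1.0" encoding="utf-8"?>\n')

    def add_namespace(self, prefix, uri):
        self._namespaces[prefix] = uri
        self._pending.append(prefix)

    def _check_prefix(self, qualified_name):
        if ":" in qualified_name:
            prefix = qualified_name.split(":", 1)[0]
            if prefix not in self._namespaces:
                raise TinyXmlError("undefined namespace prefix: %r" % prefix)

    def _write_tag(self, name, attributes, suffix):
        self._check_prefix(name)
        parts = [name]
        for prefix in self._pending:
            parts.append("xmlns:%s=%s" % (prefix, quoteattr(self._namespaces[prefix])))
        self._pending = []
        for key, value in (attributes or {}).items():
            self._check_prefix(key)
            parts.append("%s=%s" % (key, quoteattr(value)))
        self._out.write("<%s%s>" % (" ".join(parts), suffix))

    def start_tag(self, name, attributes=None):
        self._write_tag(name, attributes, "")
        self._open_tags.append(name)

    def tag(self, name, attributes=None):
        # An empty element is written in the short form <x/>.
        self._write_tag(name, attributes, "/")

    def text(self, value):
        if not self._open_tags:
            raise TinyXmlError("text must be inside an element")
        self._out.write(escape(value))

    def end_tag(self):
        if not self._open_tags:
            raise TinyXmlError("no element left to close")
        # The writer knows which element is open, so the caller cannot
        # close elements in the wrong order.
        self._out.write("</%s>" % self._open_tags.pop())

    def close(self):
        if self._open_tags:
            raise TinyXmlError("unclosed elements: %s" % ", ".join(self._open_tags))


out = io.StringIO()
writer = TinyXmlWriter(out)
writer.add_namespace("p", PEOPLE_NS)
writer.start_tag("p:people")
writer.tag("p:person", {"name": "Zoë"})
writer.end_tag()
writer.close()
result = out.getvalue()
print(result)
```

The state kept is a stack of open element names plus a prefix table, so memory use grows with nesting depth rather than with document size.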

Available from:

● http://pypi.python.org/pypi/loxun/
● https://github.com/roskakori/loxun
● Open Source

$ sudo pip install loxun
Downloading/unpacking loxun
  Downloading loxun-1.3.zip
  Running setup.py egg_info for package loxun
Installing collected packages: loxun
  Running setup.py install for loxun
Successfully installed loxun

Code examples for this talk:

https://gist.github.com/3067859

Try loxun for:

Large output in XML with Unicode and namespace

Also writes small ASCII files!