Large output in XML with Unicode and namespaces
Thomas Aglassinger
http://roskakori.at
We already knew how to read XML.
● xml.dom.minidom.parse()
● xml.etree.ElementTree.parse()
● xml.sax.parse()
● lxml.etree.parse()
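For comparison, here is a minimal sketch of the three standard library readers from the list above, applied to the same tiny document. The sample document and the `WordCollector` handler are made up for this illustration; `lxml.etree.parse()` is left out because it needs a third-party package.

```python
import io
import xml.dom.minidom
import xml.etree.ElementTree
import xml.sax

# Hypothetical sample document, used only for this comparison.
SAMPLE = b"<greeting><word>hello</word></greeting>"

# DOM: the whole document as a tree of nodes.
dom_document = xml.dom.minidom.parse(io.BytesIO(SAMPLE))
dom_word = dom_document.getElementsByTagName("word")[0].firstChild.data

# ElementTree: a lighter tree of elements.
tree = xml.etree.ElementTree.parse(io.BytesIO(SAMPLE))
tree_word = tree.getroot().find("word").text


# SAX: events are delivered to a handler while the parser reads.
class WordCollector(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.words = []

    def characters(self, content):
        self.words.append(content)


collector = WordCollector()
xml.sax.parse(io.BytesIO(SAMPLE), collector)
sax_word = "".join(collector.words)
```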
http://encyclopediadramatica.se/File:Bill_Nye_Expert.jpg
So we went to the Python Library...
http://upload.wikimedia.org/wikipedia/commons/2/2b/Melk_-_Abbey_-_Library.jpg
...and we were like:
http://www.apple.com/switch/stories/ellenfeiss.html (in 2002)
xml.dom.minidom
● “Users [...] who would like to write less code for processing XML files should consider using the xml.etree.ElementTree module instead” (The Python Standard Library, Chapter 19, Structured Markup)
http://www.destructoid.com/blogs/Sevre/femshep-5-a-space-opera-208844.phtml
xml.etree
Clark notation instead of XPath
Generally shorter but wider
Requires Python 2.7
Similar issues with lxml.etree
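A small sketch of what Clark notation looks like when writing namespaced XML with xml.etree.ElementTree. The namespace URI and the `ex` prefix are invented for this example, and the `encoding="unicode"` argument assumes Python 3; `register_namespace()` is the part that needs at least Python 2.7.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, chosen only for this example.
EXAMPLE_NS = "http://example.com/ns"
ET.register_namespace("ex", EXAMPLE_NS)  # available since Python 2.7

# Clark notation: the namespace URI in braces, followed by the local name.
root = ET.Element("{%s}report" % EXAMPLE_NS)
item = ET.SubElement(root, "{%s}item" % EXAMPLE_NS)
item.text = "hello"

# The serializer maps the URI back to the registered prefix.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)

# Lookups use the same braces-and-URI notation.
found = root.find("{%s}item" % EXAMPLE_NS)
```

The lines get wide quickly because every tag name carries the full URI.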
Memory issues
● So far, the XML document is built in memory
● Won't work well for large sets of data
● We need a streaming interface
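To illustrate what a streaming interface means here, the following sketch writes elements one at a time with `xml.sax.saxutils.XMLGenerator` from the standard library. This is not the module the talk introduces; the `write_rows` helper and the element names are assumptions made for the example.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_rows(out, row_count):
    """Write row_count <row> elements to out without building a tree."""
    writer = XMLGenerator(out, encoding="utf-8")
    writer.startDocument()
    writer.startElement("rows", {})
    for number in range(row_count):
        # Each element goes straight to the output stream;
        # no document tree is kept around.
        writer.startElement("row", {"id": str(number)})
        writer.characters("value %d" % number)
        writer.endElement("row")
    writer.endElement("rows")
    writer.endDocument()


# An in-memory buffer keeps the example self-contained;
# for large data this would be a file opened for writing.
buffer = io.StringIO()
write_rows(buffer, 3)
streamed = buffer.getvalue()
```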
Lack of basic validation
● Are all tags closed?
● In the correct order?
● Has a namespace been registered before usage?
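The questions above can be made concrete with a short sketch. It uses `XMLGenerator` from the standard library as one example of a streaming writer that performs no such checks; the broken output is only noticed when something tries to parse it.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator

out = io.StringIO()
writer = XMLGenerator(out)
writer.startElement("a", {})
writer.startElement("b", {})
writer.endElement("a")  # wrong order, but accepted without complaint
# <b> is never closed at all, and nothing reports it.

broken = out.getvalue()

# Only a parser notices that the result is not well-formed.
try:
    ET.fromstring(broken)
    is_well_formed = True
except ET.ParseError:
    is_well_formed = False
```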
So we had to go all kinky...
http://mylittlefacewhen.com/f/3781/
...and write yet-another XML module
http://www.110pounds.com/?p=6880
loxun
Pure Python 2.5+
Streaming interface for low memory usage
Supports with-statement
No dependencies on other modules
Optimizes <x></x> to simply <x/>
Raises XmlError if...
● ...you add references to undefined namespaces
● ...you forget to close tags (elements)
● ...you build non-well-formed documents
● ...you add non-ASCII characters in 8-bit strings
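The checks listed above could be sketched roughly as follows. This is an illustration only and not loxun's code or API: `SketchXmlWriter`, `SketchXmlError` and all method names are hypothetical, the namespace handling is simplified (a prefix stays valid once added), and the code assumes Python 3.

```python
import io
from xml.sax.saxutils import escape, quoteattr


class SketchXmlError(Exception):
    """Hypothetical error type for this sketch."""


class SketchXmlWriter(object):
    """Illustrative streaming writer; NOT loxun's implementation or API."""

    def __init__(self, out):
        self._out = out
        self._open_tags = []      # stack of tags still waiting to be closed
        self._pending_ns = {}     # namespaces to declare on the next start tag
        self._known_prefixes = set()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            self.close()
        return False

    def add_namespace(self, prefix, uri):
        self._pending_ns[prefix] = uri
        self._known_prefixes.add(prefix)

    def start_tag(self, name):
        # Reject prefixes that were never added as a namespace.
        if ":" in name:
            prefix = name.split(":", 1)[0]
            if prefix not in self._known_prefixes:
                raise SketchXmlError("namespace %r must be added first" % prefix)
        declarations = "".join(
            " xmlns:%s=%s" % (prefix, quoteattr(uri))
            for prefix, uri in sorted(self._pending_ns.items())
        )
        self._pending_ns = {}
        self._out.write("<%s%s>" % (name, declarations))
        self._open_tags.append(name)

    def text(self, value):
        # 8-bit strings are only accepted if they are plain ASCII.
        if isinstance(value, bytes):
            try:
                value = value.decode("ascii")
            except UnicodeDecodeError:
                raise SketchXmlError("8-bit strings must be plain ASCII")
        self._out.write(escape(value))

    def end_tag(self):
        # The writer remembers the open tag, so the order cannot go wrong.
        if not self._open_tags:
            raise SketchXmlError("no tag left to close")
        self._out.write("</%s>" % self._open_tags.pop())

    def close(self):
        if self._open_tags:
            raise SketchXmlError("unclosed tags: %s" % ", ".join(self._open_tags))


sketch_out = io.StringIO()
with SketchXmlWriter(sketch_out) as xml:
    xml.add_namespace("ex", "http://example.com/ns")  # made-up namespace
    xml.start_tag("ex:greeting")
    xml.text(u"Gr\u00fc\u00dfe")  # Unicode text passes through unchanged
    xml.end_tag()
sketch_result = sketch_out.getvalue()
```

Because the writer keeps only a stack of open tag names, memory usage stays small no matter how much output is written.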
Available from:
● http://pypi.python.org/pypi/loxun/
● https://github.com/roskakori/loxun
● Open Source
$ sudo pip install loxun
Downloading/unpacking loxun
  Downloading loxun-1.3.zip
  Running setup.py egg_info for package loxun
Installing collected packages: loxun
  Running setup.py install for loxun
Successfully installed loxun

Code examples for this talk:
https://gist.github.com/3067859