XML-Troubleshooting Professional Magazine


  • 8/8/2019 XML-Troubleshooting Professional Magazine


    XML Troubleshooting Professional Magazine

    Volume 5 Issue 3, March 2001

    By Steve Litt, published by Amirul Asyraf


    Editor's Desk

    By Steve Litt

    What's up with XML? Is it a revolutionary technology destined to be our livelihood for the next few years, or a passing fad? Is it a universal standard specified by the W3C, or has it been usurped and proprietarized by Microsoft? And for some, the most nagging question is "how the heck do I learn it?" This issue of Troubleshooting Professional will attempt to answer all three questions. But for those who turn to the last page of the book, let me answer the questions now:

    1. XML is a revolutionary technology destined to be our livelihood for the next few years.
    2. XML is a universal standard specified by the W3C.
    3. You can learn the basics of XML in this issue of Troubleshooting Professional.

    XML was detected on the trade magazines' radar in 1997 or 1998. It was proclaimed a world-changing technology. Learn it and you're rich.

    We were all skeptical. After all, the trades had predicted similar futures for push technology, ATM, and a hundred other technologies we've all forgotten. But the trades get it right sometimes. Witness Java and Linux. And definitely XML.

    It's 2001. XML is being incorporated in all sorts of projects. The reason you don't hear about it constantly is that the *app* that reads, writes, changes and renders the XML is written in a traditional language such as Java, Perl, Python or C++. In that respect XML is data. But used correctly, much of an application's logic can be stored as easily modified XML. The actual C++, Java, Python or Perl code then becomes primarily the user interface. Imagine how nice it would be to implement your business rules as XML. You can!

    Then there's the Microsoft connection. Microsoft is gung-ho about XML. Does that make XML an unwise move?

    Probably not. Even if Microsoft does what they do best, and somehow manages to proprietarize some dialects of XML, those dialects will be easy to reverse engineer, and it may even be legal to do so in spite of UCITA-supported anti-reverse-engineering license language. Meanwhile, the rest of us can use our own dialects.

    "Dialects" are numerous. As will be explained later in this magazine, XML itself is just an extremely intuitive general specification for how to declare something that could be considered hierarchical data, or a markup language, depending on your viewpoint. Within that specification, an implementer specifies his own set of rules for naming XML elements, and what other elements each element can contain. That specification can be implemented on paper, or technologically enforced with a DTD or schema. If this paragraph loses you, don't worry -- everything in this paragraph will be explained in detail in this magazine.


    Unfortunately, XML is poorly documented. There are exceptions. The W3C specifications are easily readable and understandable. But for the most part, XML books do nothing but document XML's syntax, rules and vocabulary, leading the novice reader to ask "so how can I do something with it?" If you follow along with the Java examples in this magazine, you'll know exactly what you can do with XML. Once you understand XML at that level, you can port that knowledge to Perl, Python, C++ and other languages that have XML APIs.

    XML derives its power from the fact that it can represent anything the human mind can conceive. And that representation is very readable both for a human and for a machine. The concept is so clean that upon understanding it, my first question was "why didn't I invent XML?" I certainly have the intelligence to have invented it -- XML's not rocket science. I've needed it for years, but had to "roll my own" every time I needed a configuration file or data format.

    So get familiar with XML. Whether you're in the Microsoft world or the Open Source world, or somewhere in between, you'll need to interface with it in the next couple of years.

    How can a Troubleshooter benefit from XML? XML should make applications simpler to diagnose and simpler to tweak. And an XML file provides loads of testpoints from which you can manipulate the apps interacting with it. It brings back some of the Troubleshooting advantages of the intermediate files of the Cobol era, but unlike those, it's persistent and useful in and of itself.

    So whether you're a Troubleshooter, programmer, DBA, Sysadmin, or just a person who likes technological progress, kick back, relax, and enjoy your magazine.

    Steve Litt is the documentor of the Universal Troubleshooting Process. He can be reached at Steve Litt's email address.

    About this Issue's Exercises, PLEASE READ!!

    By Steve Litt

    If you complete the XML tutorials in this issue of Troubleshooting Professional, you will have mastered the following:

    XML terminology -- documents, elements, attributes, DTDs, DOM, SAX, callbacks, well formed, valid, parsers, and the like.

    XML construction and syntax.

    XML application architecture.

    XML tree navigation.

    A thorough understanding of DOM and the frequently used interfaces and methods of the DOM API.


    Ability to code an XML/DOM app, complete with node navigation, access, modification, adds, and deletes.

    Ability to build DOM documents from scratch.

    A thorough understanding of SAX, including use of the ContentHandler and ErrorHandler objects, and construction of the major callback functions.

    Ability to code a SAX app to do what you need to.

    Ability to code a SAX app that loads per-record DOM documents for out-of-order processing.

    Guidelines concerning when to use SAX and when to use DOM.

    A thorough understanding of DTDs, and a methodology for creating a DTD to match and validate existing XML code.

    How to tell the Xerces parser to validate.

    Syntax for in-file DTDs as well as using DTDs in separate files.

    Ability to quickly read and understand intermediate and advanced XML books, as well as various specification documents from standards bodies, and XML websites.

    Ability to write XML apps on the job.

    This issue of Troubleshooting Professional Magazine is organized as a tutorial. It takes you through all aspects of beginning level XML, well into the intermediate level. I strongly recommend you go through this tutorial in the order it's written. That means going down this page through the "Learning from the Masters: How Dia Uses XML" article, then going through the "XML Java Coding Exercises" page, and then coming back to this page and continuing where you left off. Everywhere necessary, there are links to point you in the right direction.

    The coding exercises are all in Java. My research indicates Java has the most mature support for XML. Once you download Xerces from the Apache Foundation and install it, these exercises work on a Linux box with Java installed. Java is the most straightforward way I could offer coding exercises.

    I had originally intended to do the exercises in both Perl and Java, but Perl DOM support proved problematic, and there wasn't enough time.

    !! STOP THE PRESSES !!

    Xerces-Perl for Linux has shipped!

    After I had done most of the exercises in Java, I got an email message saying that Xerces-Perl for Linux now exists. It's so new it's not on CPAN, and I couldn't find it on xml.apache.org. It's been tested only on Debian. But Xerces is a killer tool, and a Perl/Linux version is a good thing. Stay tuned. More info as it comes in.

    Rest assured, though: if you're a Perl, Python or C++ person, everything you learn in this tutorial will apply to XML in your language of choice. In every exercise, I used only calls defined in the DOM and SAX specifications. I used no "native Javaisms" to manipulate XML.


    Java is a killer language. It's portable, ubiquitous, free beer and in some implementations free speech, it's fast enough, and it's corporationally correct. These are some more reasons I chose Java for the XML coding.

    This tutorial was written, tech edited, and tested in Linux (Mandrake 7.2). No effort was made to test under Windows. Instead I used the time to delve deeper into XML. That being said, I know of no reason the Java exercises shouldn't work on a Windows box that's properly configured with Java and Xerces. If you don't have a Linux box, and you can't get your hands on one, by all means use a Windows box for the Java exercises. You'll need to convert some of the shellscripts to batch files, and you'll need to do a Windows install of the JDK and Xerces instead of a Linux install, but that should be pretty easy.

    The Dia diagramming program, basis of the "Learning from the Masters: How Dia Uses XML" article, originated on Linux but has been ported to Windows. The Linux package is more mature, so if you have a choice you might want to do that exercise on a Linux box. And that's an exceptionally important exercise, so even if you don't have a Linux box, please try to find someone who will let you use theirs for this exercise. If you don't know anyone with a Linux box, find your local Linux User Group (LUG) and beg someone there to let you use their box to do the Dia exercises.

    Personally, I felt more comfortable working on a Linux platform. If you feel more comfortable on a Windows platform, I'd imagine you should be able to get this tutorial to work from within Windows, although of course I haven't tested it on Windows.

    Steve Litt is the main author of Samba Unleashed. He can be reached at Steve Litt's email address.

    What is XML?

    By Steve Litt

    In this Article You Will Learn

    XML is a styles-based markup language.

    XML is hierarchical in nature.

    XML is extremely readable and easy to understand.

    XML can represent almost any concept.

    XML can implement a major part of an application.

    This is a far trickier question than you can imagine, and I think once you master the answer, everything else falls into place.

    One possible answer is that XML is a markup language. And that's absolutely true, as anyone who sees the bracketed begin and end tags for its elements can attest. This answer is true, but almost useless, because to think of XML as HTML on steroids is to relinquish 90% of XML's functionality.


    Another possible answer is that XML is a styles-based markup language, rather than an appearance-based markup language like HTML. Once again, so true, and so useless.

    I think a much better definition for XML is a specification for a markup language that can be used to represent almost any concept. Keeping in mind that none of phonebook, person, info or name are keywords, imagine how the following could be used:

    800-555-1212407-555-5555Skating buddyRacing inlines

    800-555-1234407-555-2222

    Coworker8

    You've just implemented a phone book. Add a user interface and you're done. The user interface reads the fields from the XML, places the values from those fields in on-screen text boxes, and queries the user to change the contents of those fields. And notice that if you write that user interface well, you can add new fields simply by changing the XML. You can have a program on the other end that puts the finished XML into a database, assuming the database is flexible enough to represent such data.

    Notice a few facts about the preceding XML code:

    Just like HTML, start tags are angle-bracket enclosed, and end tags are angle-bracket enclosed with a prepended forward slash.

    An entity started by a start tag and ended by an end tag is called an element.

    Elements can contain other elements. In the preceding case, the phonebook element contains two person elements. The first of the two person elements contains four info elements and a bicycle element. Thus XML is perfect for describing and manipulating any kind of hierarchy. Please note that phonebook, person and info are NOT reserved words.

    An element can contain a mix of different elements, as shown by the first person, who has both info elements and a bicycle element. Additionally, the mixture can contain both elements and text nodes.

    An XML file must contain exactly one element at the top level. In the preceding example that top level element is the phonebook element. In any XML file, the single top level element is often called the document element.

    Free-standing text between a start tag and an end tag is called a text node. You'll learn more about this in the article on the DOM spec. In the preceding example, the actual phone numbers (such as 800-555-1212 for John Smith) are text nodes.


    Any element can have zero or more attributes. In the preceding XML code, each info element has one attribute, an attribute called name (name is not a reserved word; it could have been called infoname or whatitis). Attributes are name/value pairs, starting with the attribute's name, then an equal sign, then the attribute's value within quotes. Attributes are declared in the start tag of an element. An attribute represents a fact about the element.

    Elements, attributes and text nodes are all nodes. The idea of nodes is important because DOM documents are navigated and traversed nodewise.
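To see elements, attributes and text nodes as distinct kinds of node, here is a minimal Java sketch. It assumes the JAXP DOM API that ships with the JDK rather than the Xerces-specific DOMParser used in the exercises, and the phonebook markup is a hypothetical stand-in: the info name values work and home are made up, and the bicycle element is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class PhonebookNodes {
    // Hypothetical phonebook; the info "name" values are illustrative only.
    // No whitespace between tags, so there are no whitespace-only text nodes.
    static final String XML =
          "<phonebook>"
        + "<person lname=\"Smith\" fname=\"John\">"
        + "<info name=\"work\">800-555-1212</info>"
        + "<info name=\"home\">407-555-5555</info>"
        + "</person>"
        + "</phonebook>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element phonebook = doc.getDocumentElement();          // the document element
        Element person = (Element) phonebook.getFirstChild();  // an element node
        Attr lname = person.getAttributeNode("lname");          // an attribute node
        Element info = (Element) person.getFirstChild();
        Node number = info.getFirstChild();                     // a text node

        // getNodeType() returns the NodeType constants defined in the DOM spec.
        System.out.println(phonebook.getNodeName() + " type " + phonebook.getNodeType()); // phonebook type 1
        System.out.println(lname.getName() + "=" + lname.getValue()
                + " type " + lname.getNodeType());                  // lname=Smith type 2
        System.out.println(info.getAttribute("name") + ": " + number.getNodeValue()
                + " type " + number.getNodeType());                 // work: 800-555-1212 type 3
    }
}
```

Note that the phone number is not a property of the info element; it is a separate child node, which is why the code reaches it with getFirstChild().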

    Because elements can contain other elements, to a certain extent attributes and sub-elements are interchangeable. For instance, in the person element I described the person's name with an lname and an fname attribute. Instead, I could have had each person element contain an lname and an fname subelement, each of which had the appropriate name between the begin and end tag. In other words:

    SmithJohn

    800-555-1212407-555-5555Skating buddyRacing inlines

    Please remember there are no reserved words in the preceding example. info and name are just strings I decided upon to make it self documenting. As an alternative to the preceding, I could have even used info tags to accomplish the same purpose:

    SmithJohn800-555-1212407-555-5555

    Skating buddyRacing inlines

    Your choice of attributes vs. elements depends on things such as whether you'll need more than one of the entity (no two attributes of a single element can have the same name), and whether you should always have the entity (that might favor using an attribute). Also, use elements if order is important, because the XML specification says the order of attributes is not significant, so parsers don't necessarily preserve attribute order. All this will be explained later in this magazine.
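Because the two designs are interchangeable, code that reads them differs only in the lookup call. The following minimal Java sketch (JAXP DOM, bundled with the JDK) reads a last name from either form; the helper name lastName is hypothetical, and the two person snippets are illustrative rather than the magazine's exact markup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NameLookup {
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Returns the last name whether it's stored as an lname attribute or an lname sub-element. */
    static String lastName(Element person) {
        if (person.hasAttribute("lname")) {
            return person.getAttribute("lname");             // attribute form
        }
        // Sub-element form (getElementsByTagName searches all descendants).
        NodeList kids = person.getElementsByTagName("lname");
        return kids.getLength() > 0 ? kids.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        String asAttributes = "<person lname=\"Smith\" fname=\"John\"/>";
        String asElements = "<person><lname>Smith</lname><fname>John</fname></person>";
        System.out.println(lastName(parseRoot(asAttributes))); // Smith
        System.out.println(lastName(parseRoot(asElements)));   // Smith
    }
}
```

The attribute form is a one-call lookup, but only the element form can hold repeated values such as several phone numbers, since an element can't carry two attributes with the same name.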

    The preceding examples have used XML as a hierarchical representation. But it can also be used as stylized markup:

    Why XML is So Great
    XML is absolutely wonderful! And it's not just because XML is Corporationally Correct! Now let's talk about...


    In the preceding, the XML markup describes the styles, or functionality, of marked up text. It's up to the application rendering the XML to assign an appearance to such styles. Even the relationship between style and appearance can be moved out of the application using XSL (Extensible Stylesheet Language). XSL is a separate but related subject that is not discussed in this issue of Troubleshooting Professional.

    Tags must be nested, never interlaced. The following is not allowed:

    XML is great and good.

    The well formed way to write the preceding would be to nest tags, like this:

    XML is great and good.

    Because tags can't be interlaced, but instead must be nested, all XML represents a hierarchy. For instance, the preceding snippet could be thought of like this:

    XML is

    truly

    great

    and fantastic.
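A parser enforces the nesting rule for you. Here is a minimal Java sketch (JAXP, bundled with the JDK) that reports whether a string is well formed; the two sample strings are hypothetical stand-ins for the interlaced and nested examples in this section, and isWellFormed is a made-up helper name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NestingCheck {
    /** True if the string parses as well-formed XML, false on a well-formedness error. */
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler's fatalError() simply rethrows, replacing the default
        // handler that would also report the error on stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String interlaced = "<p><b>XML is <i>great</b> and good.</i></p>";
        String nested     = "<p><b>XML is <i>great</i></b><i> and good.</i></p>";
        System.out.println("interlaced: " + isWellFormed(interlaced)); // false
        System.out.println("nested:     " + isWellFormed(nested));     // true
    }
}
```

The nested version closes the italic run at the end of the bold run and reopens it, which is exactly the kind of restructuring interlaced markup needs before it becomes legal XML.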

    Generally speaking, in XML intended to represent a hierarchy, an element containing a text node contains no other elements or text nodes, but in XML intended to represent markup, an element often contains several text nodes and several other elements. But this is not a rule, only a custom.

    I believe the best way to learn XML is through the DOM (Document Object Model) spec, so DOM is discussed voluminously in later portions of this issue of Troubleshooting Professional Magazine.

    In this Article You Have Learned

    XML is a styles-based markup language.

    XML is hierarchical in nature.

    Tags must be nested, never interlaced.

    XML is extremely readable and easy to understand.

    XML can represent almost any concept.

    XML can implement a major part of an application.


    Steve Litt is the author of "Rapid Learning: Secret Weapon of the Successful Technologist". He can be reached at Steve Litt's email address.

    Some Definitions

    By Steve Litt

    In this Article You Will Learn

    You'll learn definitions for the following:

    o Document
    o Element
    o Attribute
    o Text node
    o Node
    o Document element
    o DTD
    o Well formed
    o Valid
    o Schema
    o DOM
    o SAX
    o DOM document
    o Namespace

    Document

    The data contained in an entire XML file:

    getchbsd.pl;

    Element

    The entity defined by a start tag and end tag, but not the entities contained between the start and end tags:

    getchbsd.pl

    Note that all elements are nodes, but not all nodes are elements. Elements inherit all methods of nodes, and add some of their own. Nodes are discussed later in this table.

    Attribute

    The name/value pairs enumerated in an element's start tag:

    getchbsd.pl

    Text Node

    The text between the open and close tag of its parent element:

    getchbsd.pl

    Node

    The most atomic XML entity that is programmatically useful. Elements, attributes and text nodes are all nodes. There are other node types, which are described in the DOM spec:

    // NodeType
    const unsigned short ELEMENT_NODE = 1;


    const unsigned short ATTRIBUTE_NODE = 2;
    const unsigned short TEXT_NODE = 3;
    const unsigned short CDATA_SECTION_NODE = 4;
    const unsigned short ENTITY_REFERENCE_NODE = 5;
    const unsigned short ENTITY_NODE = 6;
    const unsigned short PROCESSING_INSTRUCTION_NODE = 7;
    const unsigned short COMMENT_NODE = 8;
    const unsigned short DOCUMENT_NODE = 9;
    const unsigned short DOCUMENT_TYPE_NODE = 10;
    const unsigned short DOCUMENT_FRAGMENT_NODE = 11;
    const unsigned short NOTATION_NODE = 12;

    The Node interface of the DOM spec contains most of the navigational methods. Note that all elements are nodes, but not all nodes are elements. Elements inherit all methods of nodes, and add some of their own.

    Document element

    Top level element, of which there can be only one per XML file:

    getchbsd.pl;

    DTD

    A sort of type declaration for XML. Here's an ultra-simple one:

    Note that docelement is NOT a reserved word.

    Well formed

    An XML file conforming to the XML syntax rules, including:

    Every start tag has an end tag, and none are "interlaced"; instead, all are properly nested.

    Every attribute has a name followed by an equal sign followed by a quoted value.

    There is one and only one top level element.

    Valid

    Well formed, AND conforming to the rules of the DTD.

    Schema

    Performs a function similar to a DTD.

    DOM

    Stands for Document Object Model. A method of placing an entire XML file's hierarchy, with all its elements, in a memory object. This memory object is built for quick lookup, traversal and modification.

    SAX

    Stands for Simple API for XML. An event-driven method of dealing with an XML file. Instead of containing the entire hierarchy in memory at one time, it presents elements as events which can then be exploited by your code. SAX has the advantage of less memory consumption for large files, but has the disadvantage that the programmer must write code to save anything he wants saved, and must write changes to the XML file in sequential order. DOM allows random changes to elements. Because SAX needn't keep entire files in memory at once, SAX is universally useful, whereas DOM is not useful for truly huge XML files.

    DOM document

    In the DOM standard, an object containing the entire hierarchy, elements, and information of an XML file.

    DOM object

    Any object contained within a DOM document. Vague, ambiguous, and misunderstood -- don't use this term. THIS TERM IS NOT A SYNONYM FOR DOM document!!!

    Namespace

    A method of uniquifying tag names from various XML variants:

    Circuit - Vertical Zener Diode
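The difference between well formed and valid shows up clearly with a validating parser. Here is a minimal Java sketch using JAXP's DocumentBuilderFactory.setValidating(true) against an internal DTD; the docelement/item DTD and the validityErrors helper are hypothetical, made up for this example. Validity problems arrive through the ErrorHandler's error() callback and parsing continues, while well-formedness problems are fatal and abort the parse.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidityCheck {
    // Hypothetical internal DTD: <docelement> holds one or more <item>
    // elements, and each <item> holds only text.
    static final String DTD =
          "<!DOCTYPE docelement [\n"
        + "  <!ELEMENT docelement (item+)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n";

    /**
     * Parses with DTD validation on and returns the validity errors
     * (empty list = valid). A document that isn't well formed makes
     * parse() throw SAXException instead.
     */
    static List<String> validityErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // recoverable: a validity problem
            }
            // fatalError() is inherited; DefaultHandler rethrows it.
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validityErrors(DTD + "<docelement><item>a</item></docelement>").isEmpty()); // true: valid
        System.out.println(validityErrors(DTD + "<docelement><other/></docelement>").isEmpty());      // false: well formed, not valid
        try {
            validityErrors(DTD + "<docelement><item></docelement>");
        } catch (SAXException e) {
            System.out.println("not well formed");
        }
    }
}
```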

    In this Article You Have Learned

    You learned definitions for the following:

    o Document
    o Element
    o Attribute
    o Text node
    o Node
    o Document element
    o DTD
    o Well formed
    o Valid
    o Schema
    o DOM
    o SAX
    o DOM document
    o Namespace

    Steve Litt is the developer of The Universal Troubleshooting Process troubleshooting courseware. He can be reached at Steve Litt's email address.

    Anatomy of an XML App

    By Steve Litt


    In this Article You Will Learn

    The high level architecture of a DOM/XML application.

    Interaction of the XML file, parser, DOM document, XML write logic, renderer, modification logic, and output.

    An XML application reads an XML file, after which it can modify and rewrite the XML, and/or it can print output based on that XML (commonly called "rendering"). Note that "rendering" can take widely diverse forms, including changing which fields are available on a form, printing a vector graphic, or the most obvious case of rendering marked up text. Rendering can even take the form of configuring an application, or executing remote procedures.

    The DOM model is easiest to understand, so here is the architecture of an XML app using DOM:

    So here's what happens: a parser reads the XML file and builds a DOM document to match the XML file. From that point until a save is performed, all interaction between the app and the XML hits the DOM document rather than the corresponding XML file. It's interesting to note that DOM parsers are typically built on top of an event-driven, SAX-style parsing layer. The reason is simple enough. Before you build a DOM document you must detect events such as start of element (start tag encountered), end of element (end tag encountered), new attribute (name followed by equal sign followed by quoted string encountered), and the like. So DOM can be thought of as an extra abstraction to lessen the programmer's workload, at the expense of memory usage.

    Modifications are made directly to the DOM document. Elements can be added, deleted, renamed, rearranged. Text nodes can be added, deleted or changed. Elements can be moved either within the same level, or promoted or demoted to different levels.
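Here is a minimal Java sketch of those in-memory modifications using standard DOM calls (createElement, createTextNode, appendChild, setNodeValue, removeChild, insertBefore). The list/item document and the edit and items helpers are hypothetical; nothing touches a file until the document is written back out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomEdits {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Adds, changes, deletes and moves nodes; only the in-memory DOM document changes. */
    static void edit(Document doc) {
        Element root = doc.getDocumentElement();

        // Add: a new <item> element holding a new text node, appended at the end.
        Element added = doc.createElement("item");
        added.appendChild(doc.createTextNode("three"));
        root.appendChild(added);

        // Change: rewrite the text node inside the first <item>.
        Element first = (Element) root.getElementsByTagName("item").item(0);
        first.getFirstChild().setNodeValue("ONE");

        // Delete: remove the second <item>.
        Node second = root.getElementsByTagName("item").item(1);
        root.removeChild(second);

        // Move: insertBefore() detaches a node already in the tree and reinserts it.
        root.insertBefore(added, first);
    }

    /** Returns the text of every <item>, in document order, comma separated. */
    static String items(Document doc) {
        NodeList items = doc.getElementsByTagName("item");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.getLength(); i++) {
            if (i > 0) sb.append(',');
            sb.append(items.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<list><item>one</item><item>two</item></list>");
        edit(doc);
        System.out.println(items(doc)); // three,ONE
    }
}
```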

    Obviously, the DOM is modified in apps that rewrite the XML file. But DOM modification is also often done in an app that only renders the XML. The classic example is a "DOMWalker" app, which simply walks the DOM tree and prints what it finds in a hierarchical outline. In fact, the newlines and spaces intended to make the XML file more readable are actually legitimate text nodes in XML, but in an XML app concerned only with a hierarchy they're extraneous. Therefore, the first thing a DOMWalker program does is delete text nodes made up only of whitespace. Source code for an example DOMWalker is given later in this magazine.
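The whitespace cleanup step can be sketched in Java as follows. This is not the magazine's DOMWalker source; it's a minimal, hypothetical helper (strip) that uses recursion for brevity, even though this issue's walking technique is built to avoid recursion.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceStripper {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Removes every whitespace-only text node below the given node (recursive for brevity). */
    static void strip(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // grab it before child may be removed
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                strip(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a>\n  <b>x</b>\n  <c/>\n</a>");
        Node a = doc.getDocumentElement();
        System.out.println("children before: " + a.getChildNodes().getLength()); // 5
        strip(a);
        System.out.println("children after:  " + a.getChildNodes().getLength()); // 2
    }
}
```

Saving the next sibling before removing the current child matters: once a node is removed, its getNextSibling() returns null and the loop would stop early.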

    Rendering is the heavy part of most XML apps. It's often graphics intensive. Consider the Dia vector drawing program, which keeps all drawing information in XML but renders it as geometric shapes. Often there are several rendering processes, one for each kind of output. Thus a book authored in XML could be rendered as a paper book, as a PDF, as a Postscript file, or as an HTML page or series of HTML pages. Indeed, this is one of the primary benefits of styles-based documents. Often the rendering itself is decoupled from the app by use of XSL (Extensible Stylesheet Language), much the same as program logic is decoupled from the app using XML.

    Rewriting the XML file is actually easy -- about what you'd expect for your last class project in a college Programming 101 course. In the case of DOM, you've already assembled the output in a DOM document, so you just walk its tree and write the markup.
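Here is a minimal Java sketch of "walk its tree and write the markup", which also builds a tiny DOM document from scratch with DocumentBuilder.newDocument(). The TreeWriter class is hypothetical: it handles only elements, attributes and text (comments, CDATA sections and processing instructions are skipped), and escapes just the characters that can't appear literally. Production code would more likely use the JDK's javax.xml.transform serializer.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

class TreeWriter {
    /** Escapes characters that can't appear literally in text or quoted attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Walks the tree depth first, appending markup for elements, attributes and text. */
    static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                write(((Document) node).getDocumentElement(), out);
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.append(' ').append(a.getNodeName())
                       .append("=\"").append(escape(a.getNodeValue())).append('"');
                }
                out.append('>');
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    write(c, out);
                }
                out.append("</").append(node.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
                out.append(escape(node.getNodeValue()));
                break;
            default:
                break; // comments, CDATA, processing instructions: skipped in this sketch
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a small DOM document from scratch, then write it out.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element phonebook = doc.createElement("phonebook");
        Element person = doc.createElement("person");
        person.setAttribute("lname", "Smith");
        person.appendChild(doc.createTextNode("John & Jane"));
        phonebook.appendChild(person);
        doc.appendChild(phonebook);

        StringBuilder out = new StringBuilder();
        write(doc, out);
        System.out.println(out); // <phonebook><person lname="Smith">John &amp; Jane</person></phonebook>
    }
}
```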

    In the case of SAX-based XML apps it's a little harder, because you often don't read the information in the same order you want to write it. In other words, if your app's specification calls for something occurring later in the input to modify something earlier in the output, you can't just use a read-write loop. So you do the typical stuff -- keep some things in memory, or maybe write an intermediate file and then sort it, or run two passes through the XML. This is why, for apps interacting with guaranteed small XML files, DOM is better.
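For contrast with the DOM examples, here is a minimal SAX sketch in Java using JAXP's SAXParser with a DefaultHandler subclass. It illustrates the "keep some things in memory" approach: the handler alone decides what to save, here just the text of each info element from a hypothetical phonebook. The class name and the collect helper are made up for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxInfoCollector extends DefaultHandler {
    private final List<String> values = new ArrayList<>(); // what we chose to keep in memory
    private StringBuilder buffer;                            // non-null only inside <info>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("info")) {
            buffer = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one text node in several chunks, so accumulate.
        if (buffer != null) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("info")) {
            values.add(buffer.toString());
            buffer = null;
        }
    }

    static List<String> collect(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxInfoCollector handler = new SaxInfoCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<phonebook><person lname=\"Smith\">"
                   + "<info name=\"work\">800-555-1212</info>"
                   + "<info name=\"note\">Skating buddy</info>"
                   + "</person></phonebook>";
        System.out.println(collect(xml)); // [800-555-1212, Skating buddy]
    }
}
```

Nothing but the collected strings stays in memory, which is the trade-off described above: low memory use, but anything you need later must be saved explicitly.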

    In this Article You Have Learned

    The high level architecture of a DOM/XML application.

    Interaction of the XML file, parser, DOM document, XML write logic, renderer, modification logic, and output.

    Steve Litt is the documentor of the Universal Troubleshooting Process. He can be reached at Steve Litt's email address.

    Simplified Explanation of the DOM API

    By Steve Litt

    In this Article You Will Learn

    How to create a DOM Document object with a parser.

    The three main DOM activities.

    Using the "checker metaphor" to understand iterative document navigation.


    The major DOM navigation public variables and equivalent methods.

    The down, right, up, done DOM walking algorithm.

    A simple Java implementation of that algorithm.

    How to access elements by name.

    How to add, change and delete information in the DOM document.

    Navigating attributes by name and sequentially.

    If you understand DOM, you're 90% of the way to understanding XML.

    What you might think of as a "DOM object" is really an object implementing the Document interface:

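    import org.apache.xerces.parsers.DOMParser;  // Xerces' DOM parser class
    import org.w3c.dom.Document;

    DOMParser dp = new DOMParser();
    dp.parse("myfile.xml");           // can throw SAXException or IOException
    Document doc = dp.getDocument();  // the parsed DOM document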

    In the preceding code, the parser delivers an instance of Document, called doc, which contains the entire information hierarchy contained in the original file myfile.xml. You can use methods from the DOM API to extract any info from the DOM document if that information was in the original XML file (with a very few exceptions).
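If you'd rather not depend on the Xerces-specific DOMParser class, the standard JAXP interfaces bundled with modern JDKs produce the same kind of Document object. This is a swapped-in technique, not the one the exercises use; the JaxpParse class and parseFile helper are hypothetical, and myfile.xml is only the placeholder file name from the snippet above.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class JaxpParse {
    /** Parses an XML file into a DOM document using the JDK's built-in JAXP parser. */
    static Document parseFile(String path) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new File(path)); // throws SAXException or IOException on failure
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "myfile.xml";
        Document doc = parseFile(path);
        System.out.println("Document element: " + doc.getDocumentElement().getNodeName());
    }
}
```

Everything after this point, including every navigation method discussed below, works the same on a Document obtained either way, because both are implementations of the W3C DOM interfaces.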

    The simplest explanation of a DOM document is that it's an in-memory tree containing all info from the XML file hierarchy, together with various methods to navigate that tree, to get information from a specific node, and to add, delete, rearrange or modify nodes. If you can navigate, get, and change, that's pretty much all you need to do with a hierarchy.

    There's no better documentation on DOM than W3C's DOM specification papers, available at their website. To learn XML, you should spend about a day reading the parts dealing with XML (not with HTML). It is time *very* well spent.

    The purpose of this article is to help you understand what you will see when you read the DOM spec, so that you don't go off on the wrong track and you aren't overwhelmed.

    Throughout this article, keep in mind that DOM methods enable three main activities:

    1. Navigating the hierarchy tree
    2. Viewing information (get)
    3. Modifying information (put, delete, add, move, etc.)

    Navigating the hierarchy tree

    The DOM navigation methods are defined so you can navigate the tree without recursion. They do this using methods that move your current position around like a checker on top of the various nodes. Here I use the word "checker" like the round plastic play pieces used in the board game called "Checkers".

    Note: The "checker" metaphor will be used extensively throughout this issue of Troubleshooting Professional.

    Most of the methods to read and modify elements operate on the element with the checker. HOWEVER...

    My assertion that they operate by moving a checker around is not quite accurate, because these navigation methods do not change the state of the DOM document. Instead, they simply deliver a node. The programmer records the current position by assigning the returns of these methods to a node object. That node object marks the place of the "checker".

    !! CAUTION !!

    Although attributes are nodes, they are invisible to the navigation methods and public variables listed below. There are specialized methods and public variables to access and navigate attributes.
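Here is a minimal Java sketch of that caution in action (JAXP DOM, bundled with the JDK). An element with only attributes has no children at all, and its attributes are reached by name with getAttribute() or sequentially through the NamedNodeMap from getAttributes(). The person snippet and the listAttributes helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class AttributeAccess {
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Lists attributes sequentially; the order is up to the parser, not the source file. */
    static List<String> listAttributes(Element e) {
        NamedNodeMap attrs = e.getAttributes();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            out.add(a.getName() + "=" + a.getValue());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        Element person = parseRoot("<person lname=\"Smith\" fname=\"John\"/>");
        System.out.println(person.getFirstChild());       // null: attributes aren't children
        System.out.println(person.getAttribute("lname")); // Smith (access by name)
        System.out.println(listAttributes(person));       // both pairs, in parser-defined order
    }
}
```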

    The following is a list of the major navigational methods, the equivalent public variables, and the interfaces in which these methods and public variables are implemented. Immediately below the list is a sample hierarchy to walk. Observe the naming convention: in general, the variable name is converted to the method name by capitalizing the first letter and prepending either get or set as appropriate. For the time being, don't worry about the Interface column:

    Method                 Equivalent public variable                      Interface
    getOwnerDocument()     readonly attribute Document ownerDocument;      Node
    getDocumentElement()   readonly attribute Element documentElement;     Document
    getFirstChild()        readonly attribute Node firstChild;             Node
    getLastChild()         readonly attribute Node lastChild;              Node
    getNextSibling()       readonly attribute Node nextSibling;            Node
    getPreviousSibling()   readonly attribute Node previousSibling;        Node
    getParentNode()        readonly attribute Node parentNode;             Node
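    The following Java sketch, which is not part of the original article, shows the navigation methods from the table in action. It parses a small hypothetical document shaped like the sample hierarchy walked below (a document element with children A, B and C, where B has three children); since XML names can't begin with a digit, n1, n2 and n3 stand in for 1, 2 and 3. It assumes the JAXP parser that ships with the Java runtime (javax.xml.parsers), and the class name NavDemo is made up. Notice that the document never changes: each call simply returns a Node, which is assigned to the variable checker.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NavDemo {
    // Hypothetical sample hierarchy; n1, n2, n3 stand in for 1, 2, 3.
    // There is no whitespace between tags, so no text nodes get in the way.
    static final String SAMPLE = "<doc><A/><B><n1/><n2/><n3/></B><C/></doc>";

    // Parses a string into a DOM Document with the JAXP parser bundled with Java.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Moves the "checker" variable around and reports where it lands.
    // None of these calls modifies the document; each one just returns a Node.
    static String tour(String xml) {
        Document doc = parse(xml);
        Node checker = doc.getDocumentElement();   // start on <doc>
        checker = checker.getFirstChild();         // down to A
        checker = checker.getNextSibling();        // right to B
        StringBuilder out = new StringBuilder(checker.getNodeName());
        Node last = checker.getLastChild();        // B's last child, n3
        out.append(' ').append(last.getNodeName());
        out.append(' ').append(last.getPreviousSibling().getNodeName()); // n2
        out.append(' ').append(checker.getParentNode().getNodeName());   // up to doc
        out.append(' ').append(checker.getOwnerDocument() == doc);       // same Document
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tour(SAMPLE));
    }
}
```

    Running it should print B n3 n2 doc true.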

    In plain English, you start with the checker on the document element. At every juncture:

    You go down if you can go down.
    Otherwise, you go right if you can go right.
    Otherwise, you go up if you can go up.
    Otherwise, you're done.

    Trace the preceding pseudocode algorithm on the hierarchy diagram above and you'll see what I mean. Starting at the document element, you go down to A, then right to B, then down to 1, then right to 2, then right to 3, then up to B, then right to C, then up to the document element, at which time you're done because you've already been there.

    That brings up an important point. You shouldn't be able to go down from an element if you've already done so. When you first arrive at an element via a downward or a rightward movement, you descend if you can. But sooner or later, you'll come back up to that same element after you've gone as far right as you can in the level below the element. Obviously, you don't want to descend again, as that would make an infinite loop as described in the following indented paragraph:

        From A move right to B. From B move down to 1. From 1 move right to 2. From 2 move right to 3. From 3 move up to B. From B move down to 1...

    So you implement a boolean control variable (let's call it ascending) that is true when you ascend to a node, and false otherwise. The definition of "can go down" then becomes not only that there are children, but also that you are not ascending. The following Java loop walks a tree and calls printNodeInfo() (a routine you supply) once for each node it visits:

        Node mynode = doc.getDocumentElement();
        boolean ascending = false;  // true only right after the checker moves up
        while (true) {
            if (!ascending) {
                printNodeInfo(mynode);
            }
            if ((mynode.hasChildNodes()) && (!ascending)) {
                mynode = mynode.getFirstChild();
                ascending = false;
            } else if (mynode.getNextSibling() != null) {
                mynode = mynode.getNextSibling();
                ascending = false;
            } else if (mynode.getParentNode() != null) {
                mynode = mynode.getParentNode();
                ascending = true;
            } else {
                break;
            }
        }

    In the preceding Java code, object mynode is the "checker". Basically, what the code says is: perform an action (printNodeInfo() in this case) on the checkered node, and then make your move. Move the checker down if you can, otherwise move it right if you can, otherwise move it up if you can, otherwise you're done (because you've climbed back up past the document element to the Document node, which has no parent).

    Oh, and one more thing. The preceding navigation accesses not only elements, but also text nodes. You can discern the two types with the nodeType public variable or the getNodeType() method implemented in the Node interface. However, remember that the preceding navigation methods do NOT bring the checker to rest on attributes. Attributes have their own navigation and access methods. Using the "checker" metaphor, they could be said to have their own checker.
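    Here is the same down, right, up, done loop as a complete, runnable Java sketch (an illustration, not code from the article). The class name DomWalker and the two-argument printNodeInfo() are hypothetical; instead of printing, printNodeInfo() records each node's name, so text nodes show up as #text. One detail the loop makes visible: the checker actually climbs one step past the document element, to the Document node, and stops there because the Document node has no parent. Nothing is recorded for it because it is reached by ascending.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomWalker {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Hypothetical stand-in for printNodeInfo(): records the node's name
    // (text nodes report "#text") instead of printing it.
    static void printNodeInfo(Node node, List<String> visited) {
        visited.add(node.getNodeName());
    }

    // Down, right, up, done: visits every element and text node without recursion.
    static List<String> walk(Document doc) {
        List<String> visited = new ArrayList<>();
        Node mynode = doc.getDocumentElement();
        boolean ascending = false;
        while (true) {
            if (!ascending) {                       // skip nodes reached by climbing up
                printNodeInfo(mynode, visited);
            }
            if (mynode.hasChildNodes() && !ascending) {
                mynode = mynode.getFirstChild();    // down
                ascending = false;
            } else if (mynode.getNextSibling() != null) {
                mynode = mynode.getNextSibling();   // right
                ascending = false;
            } else if (mynode.getParentNode() != null) {
                mynode = mynode.getParentNode();    // up
                ascending = true;
            } else {
                break;                              // done: the Document node has no parent
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", walk(parse("<doc><A/><B><n1/><n2/><n3/></B><C/></doc>"))));
        System.out.println(String.join(" ", walk(parse("<doc>hi<A/></doc>"))));
    }
}
```

    For the sample hierarchy the walk visits doc A B n1 n2 n3 C, matching the trace given earlier; for <doc>hi<A/></doc> it visits doc #text A.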

    Accessing Elements by Name

    The preceding section of this article discussed navigating elements by tree moves. That's ideal when you don't know what elements you'll encounter. But sometimes, because of the nature of the application, you know that under a particular element you're likely to have one or more elements of a certain name. The following XML is an example:

        Smith
        John
        800-555-1212
        407-555-5555
        Skating buddy
        Racing inlines

    It's likely you'll have info elements, and you might want to list them. That's when you use the getElementsByTagName(name) syntax, which delivers a NodeList (similar to an array) of all descendant elements with that name. You can then loop through the NodeList to put your checker on each of those similarly named elements. This can be done even when you know there will be only one such named element.
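    A short Java sketch of getElementsByTagName(), assuming markup along the lines of the example above. Only the info element name comes from the article; person, lastname, firstname and phone are hypothetical tag names, as is the class name InfoLister. Note that the NodeList contains matching descendants at any depth, not just direct children, and that the text of each info element lives in its child text node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InfoLister {
    // Hypothetical markup: only the name "info" comes from the article.
    static final String PERSON = "<person><lastname>Smith</lastname><firstname>John</firstname>"
            + "<phone>800-555-1212</phone><phone>407-555-5555</phone>"
            + "<info>Skating buddy</info><info>Racing inlines</info></person>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Puts the checker on each <info> element in turn and collects its text.
    static List<String> infoTexts(Document doc) {
        List<String> texts = new ArrayList<>();
        NodeList infos = doc.getDocumentElement().getElementsByTagName("info");
        for (int i = 0; i < infos.getLength(); i++) {
            Node info = infos.item(i);
            Node text = info.getFirstChild();      // the element's child text node
            texts.add(text == null ? "" : text.getNodeValue());
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : infoTexts(parse(PERSON))) {
            System.out.println(text);
        }
    }
}
```

    Given that markup, infoTexts() returns the two strings Skating buddy and Racing inlines.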

    Viewing information (get)

    Once your checker is on an element, in some DOM implementations, including Python, you can access that element's information with variables:

        readonly attribute DOMString nodeName;
        attribute DOMString nodeValue;
        readonly attribute unsigned short nodeType;

    In other implementations, including Java, you use methods to accomplish these same things:

        public String getNodeName();
        public String getNodeValue();
        public short getNodeType();

    Some implementations allow you to do either.
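    The following Java sketch (the class name NodeInfo is hypothetical) prints nodeName, nodeValue and nodeType for an element and for the text node inside it. It illustrates a point that often trips people up: per the DOM specification, an element's own nodeValue is null, and the element's text is the nodeValue of its child text node. Node types are small integers; 1 is Node.ELEMENT_NODE and 3 is Node.TEXT_NODE.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeInfo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Reports nodeName|nodeValue|nodeType using the Java getter methods.
    static String describe(Node node) {
        return node.getNodeName() + "|" + node.getNodeValue() + "|" + node.getNodeType();
    }

    // Describes the document element, then the text node it contains.
    static String demo(String xml) {
        Element element = parse(xml).getDocumentElement();
        return describe(element) + " " + describe(element.getFirstChild());
    }

    public static void main(String[] args) {
        System.out.println(demo("<phone>800-555-1212</phone>"));
    }
}
```

    For <phone>800-555-1212</phone> it prints phone|null|1 #text|800-555-1212|3.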

    Modifying information (put, delete, add, etc.)

    To change the text of an element, change the nodeValue public variable (or call the equivalent setNodeValue() method) of the element's child text node. nodeValue is read/write, but on an element node itself it is null and setting it has no effect, because an element's text is stored in a separate text node below it. To change the name of the element, you'll need to replace the element with a different element, using the replaceChild(newChild,oldChild) syntax. Note that this works not on the node with the checker, but on a child of the node with the checker, so you need to move your checker up. Depending on language and DOM implementation, and assuming the checker is on myElement, this might be possible with a one-liner:

        myElement.parentNode.replaceChild(newElement,myElement)

    Otherwise, try something like this:

        Element tempElement = myElement;
        myElement = (Element)myElement.getParentNode();
        myElement.replaceChild(newElement,tempElement);

    An element can be inserted before the checker like this:

        Element tempElement = myElement;
        myElement = (Element)myElement.getParentNode();
        myElement.insertBefore(newElement,tempElement);
        myElement = tempElement; //Return to original position

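    An element can be inserted after the checker like this. Note that appendChild() takes only one argument and always adds the new node as the parent's last child, so it lands directly after the checker only when the checker is already the last child. Calling insertBefore() with the checker's next sibling works in every case, because a null reference node means "append at the end":

        Element tempElement = myElement;
        myElement = (Element)myElement.getParentNode();
        myElement.insertBefore(newElement,tempElement.getNextSibling());
        myElement = tempElement; //Return to original position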

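    Once the new node is in place, you can set its text. A newly created element has no text node yet, so append one (for example with the document's createTextNode() method and appendChild()); after that, change the text with the text node's nodeValue public variable or its setNodeValue() method.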
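    The insert and replace techniques above can be tried out with this Java sketch. The class name ModifyDemo and the element names list, a, b, c, before, after and renamed are all hypothetical. The checker is on b; the sketch inserts one new element before it, inserts one after it by passing the checker's next sibling to insertBefore(), and then "renames" it by replacing it with a new element that carries a text node. Be aware that replaceChild() does not copy the old element's children or attributes; if you need them, you have to move them over yourself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ModifyDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Lists the names of a node's children, in document order.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    static String demo() {
        Document doc = parse("<list><a/><b/><c/></list>");
        Element myElement = (Element) doc.getElementsByTagName("b").item(0); // the checker
        Node parent = myElement.getParentNode();

        // Insert a new element before the checker.
        parent.insertBefore(doc.createElement("before"), myElement);
        // Insert after the checker: insertBefore() its next sibling.
        parent.insertBefore(doc.createElement("after"), myElement.getNextSibling());
        // "Rename" the checker by swapping in a new element; its text is a child text node.
        Element renamed = doc.createElement("renamed");
        renamed.appendChild(doc.createTextNode("hello"));
        parent.replaceChild(renamed, myElement);

        return childNames(parent) + " / " + renamed.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

    demo() returns a before renamed after c / hello.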

    The "checker" element can be deleted like this:

        Element tempElement = myElement;
        myElement = (Element)myElement.getParentNode();
        myElement.removeChild(tempElement);

    In the case of deletion, you can't move the checker back to the original node because the original node is gone. The programmer handles this by storing where he wants to go after the deletion. For instance, a DOM walker that deletes all blank text nodes keeps a copy of where the checker was in the previous iteration, and upon deletion goes back there. In the next iteration, it gets the node "after" the deleted one.
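    Here is one way to write a blank-text-node deleter, as a Java sketch rather than the author's code. It uses the down, right, up, done walk and applies the rule stated above, storing where the checker should go after the deletion. Instead of returning to the previous position as the walker in the paragraph does, it records the destination directly: the deleted node's next sibling if there is one, otherwise its parent, reached with ascending set to true. The class name BlankTextRemover is hypothetical, and "blank" is taken to mean a text node containing only whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BlankTextRemover {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    static boolean isBlankText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty();
    }

    // Walks the tree and deletes whitespace-only text nodes; returns how many.
    static int removeBlankText(Document doc) {
        int removed = 0;
        Node mynode = doc.getDocumentElement();
        boolean ascending = false;
        while (true) {
            if (!ascending && isBlankText(mynode)) {
                // Decide where the checker goes BEFORE the node disappears.
                Node next = mynode.getNextSibling();
                Node parent = mynode.getParentNode();
                parent.removeChild(mynode);
                removed++;
                if (next != null) {
                    mynode = next;           // carry on to the right
                    ascending = false;
                } else {
                    mynode = parent;         // no right neighbor: climb up
                    ascending = true;
                }
                continue;
            }
            if (mynode.hasChildNodes() && !ascending) {
                mynode = mynode.getFirstChild();
                ascending = false;
            } else if (mynode.getNextSibling() != null) {
                mynode = mynode.getNextSibling();
                ascending = false;
            } else if (mynode.getParentNode() != null) {
                mynode = mynode.getParentNode();
                ascending = true;
            } else {
                break;
            }
        }
        return removed;
    }

    static String summary(String xml) {
        Document doc = parse(xml);
        int removed = removeBlankText(doc);
        return removed + " removed, "
                + doc.getDocumentElement().getChildNodes().getLength() + " children left";
    }

    public static void main(String[] args) {
        System.out.println(summary("<a>\n  <b>x</b>\n  <c/>\n</a>"));
    }
}
```

    Applied to an a element whose children b and c are separated by newlines and indentation, summary() reports 3 removed, 2 children left; the non-blank text x inside b is kept.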

    Navigating Attribute Nodes

    So far this article focused exclusively on navigating or accessing elements and text nodes. But within elements there are sometimes attribute nodes. There are two broad ways to access an attribute node:

    1. By attribute name
    2. Sequentially

    Navigating Attributes by Name

    Believe it or not, accessing attributes by name is by far the more useful. That's because if your app has never heard of a given attribute, there's not a whole lot it can do with it, assuming you're using attributes as they're designed to be used. So it's rare to access attributes sequentially, but it can be done.

    Navigating attributes is simpler than navigating elements because attributes cannot contain elements, and because an element cannot have two attributes with the same name.

    To get the value of a named attribute, use the element.getAttribute(attribname) syntax. If the element has no such attribute, getAttribute() returns an empty string rather than null.

    To get an attribute object, use the element.getAttributeNode(attribname) syntax. An attribute object contains the attribute name, its value, whether the value was specified as opposed to defaulted, and the element that owns the attribute.
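    A Java sketch of both calls, using a layer element like the ones Dia writes (see the Dia article later in this issue); the class name AttrByName is hypothetical. getAttribute() returns an empty string for a missing attribute, while getAttributeNode() returns null. The Attr object exposes the name, value, specified flag and owner element mentioned above through getName(), getValue(), getSpecified() and getOwnerElement().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttrByName {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    static String demo() {
        Element layer = parse("<layer name=\"Background\" visible=\"true\"/>").getDocumentElement();
        String visible = layer.getAttribute("visible");   // "true"
        String missing = layer.getAttribute("color");     // "" because there is no such attribute
        Attr nameAttr = layer.getAttributeNode("name");   // the attribute object itself
        return visible
                + "|" + missing.isEmpty()
                + "|" + nameAttr.getName() + "=" + nameAttr.getValue()
                + "|" + nameAttr.getSpecified()            // true: written in the document
                + "|" + nameAttr.getOwnerElement().getTagName()
                + "|" + (layer.getAttributeNode("color") == null);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

    demo() returns true|true|name=Background|true|layer|true.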

    Navigating Attributes Sequentially

    Getting attributes sequentially is much more difficult, and various DOM implementations have their own glitches. You'll need to experiment to get it just right. A typical use of sequential access to attributes is a reporting program, or writing the DOM document out to an XML file.

    An element's attributes are accessed as an array, not with a getNext type of API. Implementations differ, and you'll need to experiment, but typically you get the array, get the array's length, and then loop through the attribute nodes. You get the array with the attributes public variable or the getAttributes() method, defined in the Node interface, and the number of elements with the length public variable defined in the NamedNodeMap interface. Then loop, accessing each attribute with the item() method implemented in the NamedNodeMap interface, and then accessing the attribute's name and value public variables from the Attr interface. If your implementation uses only methods, use getNodeName() and getNodeValue(). The following is some Java code to do that:

        // thisNode must be an Element; getAttributes() returns null for non-element nodes
        NamedNodeMap attribs = thisNode.getAttributes();
        for (int i = 0; i < attribs.getLength(); i++) {
            Node attrib = attribs.item(i);
            System.out.print(attrib.getNodeName());
            System.out.print("=\"");
            System.out.print(attrib.getNodeValue());
            System.out.print("\"\n");
        }

    Once again, in many DOM implementations the preceding doesn't work. In some cases attribs is an array in the computer language's native format, after which it can be traversed using constructs of the language. Experiment.

    In this Article You Have Learned

    How to create a DOM Document object with a parser (DOMParser object).
    The three main DOM activities are navigating, viewing and modifying.
    Using the "checker metaphor" to understand iterative document navigation.
    The major DOM navigation public variables and equivalent methods:

        o The methods: getOwnerDocument(), getDocumentElement(), getFirstChild(), getLastChild(), getNextSibling(), getPreviousSibling(), getParentNode()

        o The public variables: readonly attribute Document ownerDocument; readonly attribute Element documentElement; readonly attribute Node firstChild; readonly attribute Node lastChild; readonly attribute Node nextSibling; readonly attribute Node previousSibling; readonly attribute Node parentNode;

    The down, right, up, done DOM walking algorithm:

        o Go down if you can
        o else go right if you can
        o else go up if you can
        o else you're done
        o Do not descend if you got to the node by ascending.
        o Do not process the node if you got to it by ascending.

    A simple Java implementation of that algorithm.
    Using getElementsByTagName(name) to access elements by name.
    How to add, change and delete information in the DOM document with appendChild(newElement), insertBefore(newElement,tempElement), replaceChild(newElement,tempElement), and removeChild(tempElement).

    Navigating attributes by name using getAttribute(attribname) and getAttributeNode(attribname), and navigating attributes sequentially with getAttributes(), getLength(), and getNodeName().

    Steve Litt is the documentor of The Universal Troubleshooting Process (http://www.troubleshooters.com/tuni.htm). He can be reached at Steve Litt's email address (http://troubleshooters.com/email_steve_litt.htm).

    Learning from the Masters: How Dia Uses XML

    By Steve Litt

    In this Article You Will Learn

    How to use Dia to learn good XML construction.
    Dia is a vector drawing package that stores its drawing information in XML format.
    Modifying the drawing modifies the XML, and modifying the XML modifies the drawing.

    This article may seem very tedious. You might be tempted to skip it. But unless you already have a deep understanding of XML and a feel for what makes good XML, this is the most important article in this magazine. If you skip this article, you'll likely fail (or at least not understand what you're doing) when you try coding the XML app exercises later in this issue. But if you spend the hour it takes to do this article's exercises, and the extra 1 to 3 hours to debrief yourself so you really understand what has happened, you will have a deep, intuitive grasp of XML, and nothing will stop you.

    !! CAREFULLY READ AND PARTICIPATE IN THIS ARTICLE !!

    Many Linux distros come with a vector graphics drawing program called Dia. Dia is an Open Source alternative to Visio. It stores not only drawings but also template shapes in XML, so it's very extensible and could surpass Visio. Using only a text editor, you can create brand new template shapes, each with an arbitrary number and placement of connection points. It's incredible.

    Dia is available on many Linux distros. I know it's on Mandrake 7.1 and 7.2, although it's not on the menu. But it's in /usr/bin. If Dia isn't installed, see if you can install it from your distribution CD (check for a file with a name like dia-0.86-2mdk.i586.rpm in your RPMS directory on Red Hat derived distros).


    If your distro didn't come with Dia, here are some places you can get it:

    Type of install    Where to find it
    Source             http://www.lysator.liu.se/~alla/dia/dia.html
    Debian package     http://packages.debian.org/unstable/graphics/dia.html
    RPM files          http://www.rpmfind.net, then search for dia.

    Dia is a diagramming tool most suitable for data flow diagrams, network system diagrams, or basically anything resembling a block diagram. Connection lines stay connected as you move components around. You can add bends to connection points by right-clicking a multi-segment connection line and choosing "add new segment". Outstanding!

    All drawings are stored as gzipped XML files. You can modify a drawing two ways: graphically, or by editing the XML. Although the latter is much more time consuming and harder to visualize, for work requiring exact measurements it might be preferable.

    Hello World Dia XML investigation

    But never mind. I came to use Dia, not to praise it. We're going to use Dia to learn how it uses XML, in preparation for our own XML app. Start by running Dia from the command line. Among other screens which are relatively extraneous, you'll see a screen like the following:

    That's the Dia toolbox. From the menu, click file, then new, and you'll be brought to a blank page. Right click the blank page, choose file, then save as, and save it as blank.xml.gz. Now close the drawing by right clicking the empty drawing and choosing close.

    Remember, Dia saves its drawings as gzipped XML files. View blank.xml.gz with the following command:

    zless blank.xml.gz

    You'll see an XML file whose document element is (with a namespace prefix; we'll discuss this much later). The second-level elements are and . Examine the element's XML code:

    There's no end tag. In XML, when an element contains no subelements or text nodes, the start tag and end tag would butt up next to each other. To enhance readability in such cases, XML syntax allows a forward slash before the ending angle bracket of the start tag to denote an end tag. The layer element has two attributes, name, with value "Background", and visible, with value "true". Remember that none of these strings are XML reserved words.

    In the case of the element, it has tons of subelements, most of which are elements (this is not an XML reserved word). As you can see, there's an element for the drawing's background, an element for the "paper" used with the drawing (size, margins, portrait/landscape and the like), an element for the grid to be used, and an for something called "guides", of which there's apparently a horizontal and a vertical instance. People, hear me well: a lot of the Dia application is specified by this layout, and this layout is extremely readable. Behold the power of XML!

    You'll notice a couple of other things. elements contain other elements (or don't, as the individual element's data dictates). XML allows storage of very freeform data. You'll also notice a element. This is intended as a container for multiple elements.

    What's in an Ellipse

    We're going to draw an ellipse, save it as ellipse.xml.gz, and then compare it with blank.xml.gz. The result will be the Dia application's XML representation of an ellipse.

    1. From the Dia toolbox, choose file and open, and open blank.xml.gz.
    2. In the toolbox, click the ellipse tool.
    3. In the drawing, click and drag to lay down the ellipse. Drag in such a way that the ellipse is considerably wider than it is high.
    4. In the drawing, right click, choose file/save as, and name the modified file ellipse.xml.gz.
    5. In the drawing, right click, choose file/close to close the drawing.

    Now use the following commands to view the difference between blank.xml.gz and ellipse.xml.gz:

        $ gunzip ellipse.xml.gz blank.xml.gz
        $ diff ellipse.xml blank.xml | less

    You get something like the following:

        58,76c58
        (19 lines of XML from ellipse.xml, each prefixed with "<", then "---", then the one line of blank.xml they replace, prefixed with ">")

    Look it over for a second. All that happened was that a single element, whose type attribute has value "Standard - Ellipse", was inserted into the layer element whose name attribute has value "Background". The element contains several elements describing all the "attributes" you'd expect of an ellipse, such as position (X and Y coords), the top left corner (X and Y coords), the width and the height. There's also an element called obj_bb, which is the four points comprising the bounding box of the object. It's all very readable.
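    Because a Dia drawing is just gzipped XML, the DOM techniques from the previous article can read it directly. The following Java sketch lists every object's id and type attribute. It assumes that objects are stored in elements named dia:object (the tag names did not survive in this copy of the article, so treat that name as an assumption), and the class name DiaObjects is hypothetical. The parser is left in its default, non-namespace-aware mode, in which getElementsByTagName() matches the prefixed name dia:object literally. With no command-line argument it runs on a tiny made-up sample; pass a file name such as ellipse.xml.gz to try it on a real drawing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DiaObjects {
    // Made-up, trimmed-down drawing; real Dia files contain far more.
    static final String SAMPLE = "<?xml version=\"1.0\"?>"
            + "<dia:diagram xmlns:dia=\"http://www.lysator.liu.se/~alla/dia/\">"
            + "<dia:layer name=\"Background\" visible=\"true\">"
            + "<dia:object type=\"Standard - Ellipse\" version=\"0\" id=\"O0\"/>"
            + "<dia:object type=\"Standard - Box\" version=\"0\" id=\"O1\"/>"
            + "</dia:layer></dia:diagram>";

    // Parses a gzipped XML stream, the way Dia stores drawings.
    static Document parseGzipped(InputStream gz) {
        try (InputStream in = new GZIPInputStream(gz)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new RuntimeException("could not read gzipped XML", e);
        }
    }

    // Lists "id: type" for each dia:object element (assumed tag name).
    static List<String> listObjects(Document doc) {
        List<String> out = new ArrayList<>();
        NodeList objects = doc.getElementsByTagName("dia:object");
        for (int i = 0; i < objects.getLength(); i++) {
            Element obj = (Element) objects.item(i);
            out.add(obj.getAttribute("id") + ": " + obj.getAttribute("type"));
        }
        return out;
    }

    // Gzips a string in memory so the example runs without a real Dia file.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream source = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(gzip(SAMPLE));
        for (String line : listObjects(parseGzipped(source))) {
            System.out.println(line);
        }
    }
}
```

    On the built-in sample it prints O0: Standard - Ellipse and O1: Standard - Box, one per line.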

    Notice there's no color listed? Let's give the ellipse a fill color and observe the change.

    How Colors are Implemented

    First, be sure to gzip ellipse.xml:

        gzip ellipse.xml

    Now open ellipse.xml.gz in Dia. Drag a rectangle around the ellipse to select it without the risk of moving it. Now right click the ellipse, choose dialogs, then properties. Click the color bar next to "Fill colour", and crank the blue all the way down until it's a pure yellow. Now click the color bar next to "Line Colour", and crank up the blue until the line is pure blue. Right click the drawing, choose File/save as, and save the drawing as colors.xml.gz. Finally, click the drawing, choose file/close to close the drawing.

    Now use the following commands to view the difference between colors.xml.gz and ellipse.xml.gz:

        $ gunzip colors.xml.gz ellipse.xml.gz
        $ diff colors.xml ellipse.xml | less

    You get the following:

        75,83d74
        (9 lines of XML from colors.xml, each prefixed with "<", that do not appear in ellipse.xml)

    It's simple to see what happened. An element with attribute name having value "inner_color" was created with a subelement called , with a val attribute whose value is "#ffff00" (pure yellow), to describe the fill color. An element called "border_color" was created with a subelement with attribute val valued at "#0000ff" (pure blue), to describe the line color. And an element called "border_width" was created with a subelement called , whose val attribute has value "0.1". Note that when I say the elements were called so-and-so, what I really mean is that they had an XML attribute called name, and the value of that attribute was so-and-so.

    If you're like me, you wonder why a border width entity was created. I'd guess that there was no border until you specified its color.

    : NOTE :

    Look what the application has done. Every property of the ellipse is described with an element. They could have had special elements called and the like, but they didn't. Likewise, they have a subelement to describe the value of the property. Each such subelement has a name corresponding to what is being measured, and a value corresponding to the actual value of the property. Why did they do this? They could have just as easily done something like this:

    But that wouldn't have been as generic. What the authors of Dia have done is to create a system where any property can be described, and all properties can be read into the app. This is how the pros do XML.

    Anyway, you now see how it handles colors. We've done quite a bit of work manipulating Dia and noting the result in XML. Now let's go the other way.

    Modifying a Drawing with a Text Editor

    Because you gunzipped colors.xml.gz, you now have an XML file called colors.xml. Using your favorite text editor, edit that file. Pay particular attention to the following two elements, which might not be next to each other in your experiment:

    As you remember, you made the ellipse much wider than it was high. That's why the elem_width is much bigger than elem_height. Using the cut and paste of your text editor, carefully exchange the values associated with elem_width and elem_height, and then resave the file. If things go as expected, pulling the diagram up in Dia should now show an ellipse higher than wide.

    Naturally, you need to gzip the file again:

    gzip colors.xml

    And finally pull the drawing up in Dia. And sure enough, the ellipse is now higher than wide (if not, troubleshoot).
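    The generic property scheme praised in the NOTE above pays off in code: one pair of helpers can read or change any property. This Java sketch (class and method names are hypothetical) performs the same width/height swap you just did by hand, on an in-memory ellipse. It assumes each property is a dia:attribute element whose name attribute is the property name and whose first child element carries the value in a val attribute, which matches how this article describes inner_color, border_color and border_width; the tag names and numbers themselves are assumptions. Writing the modified document back to colors.xml.gz is left out to keep it short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DiaProps {
    // Made-up, trimmed-down ellipse object with assumed tag names.
    static final String ELLIPSE =
            "<dia:object xmlns:dia=\"http://www.lysator.liu.se/~alla/dia/\""
            + " type=\"Standard - Ellipse\" id=\"O0\">"
            + "<dia:attribute name=\"elem_width\"><dia:real val=\"8\"/></dia:attribute>"
            + "<dia:attribute name=\"elem_height\"><dia:real val=\"2\"/></dia:attribute>"
            + "<dia:attribute name=\"inner_color\"><dia:color val=\"#ffff00\"/></dia:attribute>"
            + "</dia:object>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Finds the value-holding child (e.g. <dia:real val="8"/>) of the named property.
    static Element valueElement(Element object, String name) {
        NodeList attrs = object.getElementsByTagName("dia:attribute");
        for (int i = 0; i < attrs.getLength(); i++) {
            Element attr = (Element) attrs.item(i);
            if (!name.equals(attr.getAttribute("name"))) continue;
            for (Node n = attr.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) return (Element) n;
            }
        }
        return null;
    }

    static String getProperty(Element object, String name) {
        Element value = valueElement(object, name);
        return value == null ? null : value.getAttribute("val");
    }

    static void setProperty(Element object, String name, String val) {
        Element value = valueElement(object, name);
        if (value == null) throw new IllegalArgumentException("no property " + name);
        value.setAttribute("val", val);
    }

    // The same exchange you did with the text editor.
    static void swapWidthAndHeight(Element object) {
        String width = getProperty(object, "elem_width");
        String height = getProperty(object, "elem_height");
        setProperty(object, "elem_width", height);
        setProperty(object, "elem_height", width);
    }

    static String demo() {
        Element ellipse = parse(ELLIPSE).getDocumentElement();
        swapWidthAndHeight(ellipse);
        return getProperty(ellipse, "elem_width") + " x " + getProperty(ellipse, "elem_height")
                + " " + getProperty(ellipse, "inner_color");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

    demo() returns 2 x 8 #ffff00: width and height exchanged, fill color untouched.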

    Creating a Dia Exploration Script

    We really didn't need to go through all the gzipping and gunzipping, file save and file close and text editing and lessing. We did that just to minimize the extraneous variables so you could see the exact effects of tiny changes in the drawing. Now it's time to make a script to quickly alternate between the graphic and text view of drawings, with the ability to change in either view and view the changes in the other. Here's the script:

        resp='y'
        while test "$resp" = "y"; do
            dia test.xml.gz
            rm -f test.xml
            gunzip test.xml.gz
            vi test.xml
            gzip test.xml
            echo -n "Do it again? (y/n)===>"
            read resp
        done

    Save the preceding script as rdia and make it executable by all (chmod a+x rdia).

    : NOTE :

    If you don't like the VI editor, substitute the name of your favorite Linux editor for vi in the script.

    This script won't function if test.xml.gz doesn't exist, so before using the script go into Dia, create a blank drawing, and save it as test.xml.gz. Finally, run the script and experiment with editing both the XML text and the Dia graphics, and note how changes in one environment appropriately change the other.

    ! CAUTION !

    This script will not proceed to editing the XML file until you completely exit the Dia application. You exit Dia after saving your work by clicking the close icon on the Dia toolbox.

    If you "gum up" the XML so badly that you can't pull up the file in Dia, simply create a new blank test.xml.gz in Dia.

    Exploring Shapes and Connectors

    While in the rdia loop, make a drawing with a single ellipse, a single rectangle, a single triangle, and a single line. View it in the XML view, noting how each shape is specified in XML. Feel free to move and magnify things. Go back and forth. Have fun.

    Who's on Top?

    In Dia, make a yellow ellipse on top of a blue rectangle. Make sure the yellow ellipse is not completely inside the blue rectangle, and that the yellow ellipse doesn't completely cover the blue rectangle. If you have trouble putting the yellow ellipse on top, right click the yellow ellipse, choose objects, and then click bring to front. The yellow ellipse will now be on top. You should be able to see parts of the blue rectangle below it, and the yellow ellipse should not be entirely inside the blue rectangle. Save and exit Dia.

    Once in the XML file, you'll notice that the blue box object appears before the yellow ellipse object. That's intuitive, because objects appearing later get thrown on the canvas "on top of" existing objects. To test this theory, cut the XML for the yellow ellipse object, and paste it above the XML for the blue box object. Save the file and continue. If everything's gone right you should now see the blue box on top of the yellow ellipse.
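    The same cut-and-paste can be done through the DOM. In this Java sketch (hypothetical class name, assumed dia:layer and dia:object tag names, made-up sample), insertBefore() moves the ellipse element in front of the box element. When the node passed to insertBefore() is already in the tree, the DOM first removes it from its old position, so this is a move, not a copy.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DiaZOrder {
    // Made-up layer: the box comes first, so the ellipse is drawn on top of it.
    static final String LAYER =
            "<dia:layer xmlns:dia=\"http://www.lysator.liu.se/~alla/dia/\" name=\"Background\" visible=\"true\">"
            + "<dia:object type=\"Standard - Box\" id=\"O0\"/>"
            + "<dia:object type=\"Standard - Ellipse\" id=\"O1\"/>"
            + "</dia:layer>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Lists the objects' types in document order; later objects are drawn on top.
    static String stackingOrder(Element layer) {
        StringBuilder sb = new StringBuilder();
        for (Node n = layer.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            if (sb.length() > 0) sb.append(", ");
            sb.append(((Element) n).getAttribute("type"));
        }
        return sb.toString();
    }

    // Moves 'mover' to just before 'target' within the same parent.
    static void moveBefore(Element mover, Element target) {
        target.getParentNode().insertBefore(mover, target);
    }

    static String demo() {
        Element layer = parse(LAYER).getDocumentElement();
        NodeList objects = layer.getElementsByTagName("dia:object");
        Element box = (Element) objects.item(0);     // grab both before the live list changes
        Element ellipse = (Element) objects.item(1);
        String before = stackingOrder(layer);
        moveBefore(ellipse, box);                    // ellipse drawn first, box now on top
        return before + " -> " + stackingOrder(layer);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

    demo() returns Standard - Box, Standard - Ellipse -> Standard - Ellipse, Standard - Box, that is, the box is now last and therefore drawn on top.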

    See the beauty of XML. A concept like "how do I signify which objects are on top of which others?" would normally be difficult to implement. But if the app stores its info in XML, it's a no-brainer.

    Notice that all objects are inside a single layer object. If you want to have a little fun, within the Dia environment send different objects to different layers, then view the results in XML. Note that I've had cases where seemingly correct changes to layers caused Dia not to load the file, and I've even seen where simply saving the file in VI caused Dia not to load the file. The good news is I've always been able to correct this type of problem by deleting the new layer in VI, after which Dia would load the file. When all is working well, you can manipulate layers in the XML file and have the results show up exactly as expected in Dia.

    Experiment. The possibilities are endless.

    A Little Theory

    In your editor, go to the top of the XML file and note the following:

    The first line is the XML Declaration, and basically gives the XML version. The second line is a namespace declaration: it declares a namespace with the prefix dia and binds it to the URI http://www.lysator.liu.se/~alla/dia/. Note that I said URI, not URL. There's a subtle difference. But anyway, don't expect to find anything at that URI. It would be coincidence if you did.

    URIs are used as namespace identifiers because they are the best hope for a unique identifier worldwide. For instance, if I were to author a new XML file and wanted to give it a new namespace, I could name it after a directory on Troubleshooters.Com, knowing that Troubleshooters.Com is mine to control. Of course, this doesn't stop someone else from using Troubleshooters.Com as part of their unique identifier, but that would be very bad etiquette.

    A namespace is simply an "area" or "scope" (for want of better words) within which each name is guaranteed unique. This is important as Internet enabled apps use more and more XML files from more and more sources. The basic idea is that all elements could be prepended with dia:, in which case, for instance, would be differentiated from, let's say,.

    For the time being this isn't important in the learning process, but remember it in case you later see this syntax. And remember, you WILL NOT find a DTD or schema for the XML file at the URI. The URI is just a method of unique identification.
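    To see what the namespace declaration buys you, the following Java sketch parses with namespace processing turned on (it is off by default in JAXP) and breaks a prefixed element name into prefix, local name and namespace URI. It then shows that the prefix is only shorthand: a lookup by namespace URI and local name finds the layer element whether the file calls it dia:layer or d:layer, but not when the same prefix is bound to a different URI. The class name, the sample documents, the dia:diagram and dia:layer names, and the urn:example:other URI are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DiaNamespace {
    static final String DIA_URI = "http://www.lysator.liu.se/~alla/dia/";

    // Namespace processing is off by default in JAXP, so switch it on.
    static Document parseNamespaceAware(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Shows how a prefixed name breaks down once namespaces are honored.
    static String describeRoot(String xml) {
        Element root = parseNamespaceAware(xml).getDocumentElement();
        return root.getTagName() + "|" + root.getPrefix() + "|"
                + root.getLocalName() + "|" + root.getNamespaceURI();
    }

    // Counts layer elements in the Dia namespace, whatever prefix the file uses.
    static int countLayers(String xml) {
        return parseNamespaceAware(xml).getElementsByTagNameNS(DIA_URI, "layer").getLength();
    }

    public static void main(String[] args) {
        String usual = "<dia:diagram xmlns:dia=\"" + DIA_URI + "\"><dia:layer/></dia:diagram>";
        String otherPrefix = "<d:diagram xmlns:d=\"" + DIA_URI + "\"><d:layer/></d:diagram>";
        String otherUri = "<dia:diagram xmlns:dia=\"urn:example:other\"><dia:layer/></dia:diagram>";
        System.out.println(describeRoot(usual));
        System.out.println(countLayers(usual) + " " + countLayers(otherPrefix) + " " + countLayers(otherUri));
    }
}
```

    Expected output: dia:diagram|dia|diagram|http://www.lysator.liu.se/~alla/dia/ followed by 1 1 0.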

    Exploring Groups

    So much for theory; back to action. Here we'll explore groups of objects in Dia. Run your ./rdia script, and delete all objects from the drawing using the Ctrl+X keystroke combination. Also, if you created extra layers in prior exercises, be sure to delete them. Right click anywhere on the drawing, choose dialogs, then layers. The X button is what enables you to delete a layer. Be sure not to delete the original layer, which is probably called Background.

    Now draw yourself an ellipse (wider than high) and a rectangle (wider than high). Now drag a band around both to select them, right click on either object, select objects then group, and note that instead of both objects being selected, the pair of objects is selected. You cannot select one object without selecting both, and when you move one both move. Save the drawing and quit Dia.

    Viewing it in XML, you see that the two objects are now between a pair of tags. The complex task of grouping objects is handled just that simply.

    Go back to Dia, select the group, right click either object in the group, and choose objects and ungroup. The objects become two separate objects now. Save the drawing and exit Dia. Looking at the XML, you'll notice everything's the same except the pair of tags is gone.

    Exploring Connection Drawings

    Let's make a real diagram and see how it's implemented in XML. We'll be creating a diagram similar to the following:

    First erase everything from the drawing with the Ctrl+X keystroke combination. Next, draw the battery, resistor and zener diode using their buttons from the Circuit template group, which will probably be the default. Each of these graphic circuit components has connection points at its leads, so use the Zig Zag Line button to create zig zag lines, clicking on one connection point and dragging to the one on the next electronic component. Try to arrange the components so they look something like the drawing above. When you have something resembling the preceding drawing, save it and exit Dia to see the XML you've created.

    : NOTE: If you're really having trouble drawing this drawing in Dia, see http://troubleshooters.com/tpromag/200103/drawinghelp.htm.

    First, notice that you've created the following objects:

    Observe that each object has an id attribute with values from "O0" through "O5". That fact will come in handy investigating the mechanics of line to circuit component connections. Note that depending on the order in which you placed components and lines, your ID numbers may vary.

    Next, notice that each Zig Zag line has a element, containing two elements. The following code shows the elements for the first, second, and third zigzag lines respectively:

    So let's describe each zigzag line's connections in English. Line 1 connects to handle 0 of object O0 (the battery), and also to handle 1 of object O1 (the resistor). It connects the battery to the resistor. Zigzag 2 connects to handle 0 of object O1 (the resistor, and please remember that handle 1 of the resistor is already taken), and also to handle 1 of object O2, the zener. It connects the resistor to the zener. Zigzag 3 connects to handle 0 of object O2 (the zener), and handle 1 of object O0 (the battery). It connects the zener to the battery, completing the circuit.
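    Following those connections by hand is exactly what a program would do: look up each connection's target by its object id. This Java sketch does that on a made-up drawing. The dia:connections and dia:connection tag names, the to attribute holding the target object's id, and the type strings are assumptions based on the description above, not copied from a real Dia file; the class name DiaConnections is hypothetical. Because no DTD declares id as an ID-type attribute, the DOM's getElementById() can't be relied on here, so the sketch builds its own id-to-object map.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DiaConnections {
    // Made-up drawing; tag names, type strings and the "to" attribute are assumptions.
    static final String DRAWING =
            "<dia:layer xmlns:dia=\"http://www.lysator.liu.se/~alla/dia/\">"
            + "<dia:object type=\"battery\" id=\"O0\"/>"
            + "<dia:object type=\"resistor\" id=\"O1\"/>"
            + "<dia:object type=\"zigzag line\" id=\"O3\"><dia:connections>"
            + "<dia:connection handle=\"0\" to=\"O0\"/>"
            + "<dia:connection handle=\"1\" to=\"O1\"/>"
            + "</dia:connections></dia:object></dia:layer>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("could not parse XML", e);
        }
    }

    // Describes each connection as "lineId handle N -> targetType (targetId)".
    static List<String> describeConnections(Document doc) {
        // Build an id -> object map by hand instead of relying on getElementById().
        Map<String, Element> byId = new HashMap<>();
        NodeList objects = doc.getElementsByTagName("dia:object");
        for (int i = 0; i < objects.getLength(); i++) {
            Element obj = (Element) objects.item(i);
            byId.put(obj.getAttribute("id"), obj);
        }
        List<String> out = new ArrayList<>();
        NodeList connections = doc.getElementsByTagName("dia:connection");
        for (int i = 0; i < connections.getLength(); i++) {
            Element conn = (Element) connections.item(i);
            // connection -> connections -> the line object that owns them
            Element line = (Element) conn.getParentNode().getParentNode();
            String targetId = conn.getAttribute("to");
            Element target = byId.get(targetId);
            String targetType = (target == null) ? "missing object" : target.getAttribute("type");
            out.add(line.getAttribute("id") + " handle " + conn.getAttribute("handle")
                    + " -> " + targetType + " (" + targetId + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeConnections(parse(DRAWING))) {
            System.out.println(line);
        }
    }
}
```

    On the sample it reports O3 handle 0 -> battery (O0) and O3 handle 1 -> resistor (O1).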

    You can actually modify the XML to put the program in an illegal state. On the following line:

    Change the 0 to 1 in the handle="0" attribute, save, and pull it up in Dia, and note that everything looks fine. Now click on the zener diode, and note that Dia aborts. Change the 1 back to 0 and confirm that the illegal state has been taken care of.

    Basically, what happened is that a line with nonzero length had its begin and end handles at the same point, an error condition in both Dia and mathematics.

    That brings up an interesting hypothesis. Perhaps we should strive to make XML apps so robust that you can't put them in an illegal state by editing the XML. Such an app would indeed have all its logic in the XML file, and the executable app would merely be a viewer. Perhaps that hypothesis is a little over the top, but it's an interesting thought.

    Creating a new Template Shape

    Indeed, part of Dia's logic is in its XML files. The template shapes are determined by two XML files and one file with a dot picture. The XML files determine the properties of the template shape -- its image and its connection points. The dot file determines the icon on its button. During this exercise we'll add a new template shape to the beginning of the Circuit template group. We'll call the new shape a smily, and it will be a smiley face in a rectangular head, with a connection point at each ear. These are the four steps to making this new shape:

    1. Find the Dia shared directory
    2. Create the icon, shapes/Circuit/smily.xpm
    3. Create the smily shape, shapes/Circuit/smily.shape
    4. Add the Smily Face shape to the Circuits template group, sheets/Circuit.sheet

    #2 creates the icon. #3 defines the shape, and associates it with the icon. #4 incorporates the new shape in the Circuits template group (sheet).

    Find the Dia shared directory

    It's probably going to be /usr/share/dia, and it will have two subdirectories, shapes and sheets. The best way to find it is to find Circuit.sheet, which resides in the sheets directory under the Dia shared directory. First try the locate command:

    $ locate Circuit.sheet

    If that doesn't produce results, do a brute force search through the /usr tree:

    # find /usr -type f | grep "Circuit\.sheet"
    /usr/share/dia/sheets/Circuit.sheet
    /usr/src/RPM/SOURCES/dia-0.86/sheets/Circuit.sheet
    #

    In the preceding example, the Dia shared directory would be /usr/share/dia.
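    The brute force search can also be written as a small Python function. This is a sketch that assumes, as described above, that the Dia shared directory is the parent of a sheets directory containing Circuit.sheet. Like the shell search, it can report more than one candidate (for example an RPM source tree), and os.walk silently skips directories it cannot read.

```python
import os


def find_dia_shared_dirs(root="/usr", target="Circuit.sheet"):
    """Return the parent of every 'sheets' directory under root that holds target."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if target in filenames and os.path.basename(dirpath) == "sheets":
            # .../dia/sheets/Circuit.sheet -> .../dia is the shared directory
            found.append(os.path.dirname(dirpath))
    return sorted(found)
```

    Calling find_dia_shared_dirs() with no arguments walks all of /usr, which can take a while on a large system; pick the candidate that is not a source tree.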

    Create the icon, shapes/Circuit/smily.xpm


    This file specifies the appearance of the icon on the new shape's button. Copy and paste the contents of the following box to a file called smily.xpm in the Circuit directory below the Dia shared directory:

    /* XPM */
    static char * smily_xpm[] = {
    "22 22 3 1",
    " c None",
    ". c #000000",
    "+ c #FFFFFF",
    "                      ",
    "                      ",
    "                      ",
    "                      ",
    "                      ",
    "                      ",
    "    ..............    ",
    "    .            .    ",
    "    .   .    .   .    ",
    "    .            .    ",
    "    .     .      .    ",
    "    .  .      .  .    ",
    "    .   ..  ..   .    ",
    "    .    ...     .    ",
    "    ..............    ",
    "                      ",
    "                      ",
    "                      ",
    "                      ",
    "                      ",
    "                      ",
    "                      "};

    As you can see, it's really just a dot picture of the icon to be displayed, plus a header giving the width, height, number of colors and characters per pixel, and a table mapping each character to a color.
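    Because every pixel row must be exactly as wide as the header says, a single lost or extra space breaks the icon. Here is a minimal Python sketch of a checker; it assumes the simple layout used by smily.xpm (one quoted string per header, color, or pixel row) and ignores optional header fields such as hotspots.

```python
import re


def check_xpm(text):
    """Verify that an XPM's pixel rows agree with its header; return the header values."""
    strings = re.findall(r'"([^"]*)"', text)
    # Header: width, height, number of colors, characters per pixel.
    width, height, ncolors, cpp = (int(v) for v in strings[0].split()[:4])
    # Each color entry starts with the cpp-character code used in the pixel rows.
    colors = {entry[:cpp] for entry in strings[1:1 + ncolors]}
    rows = strings[1 + ncolors:]
    if len(rows) != height:
        raise ValueError(f"expected {height} pixel rows, found {len(rows)}")
    for number, row in enumerate(rows):
        if len(row) != width * cpp:
            raise ValueError(f"row {number} has {len(row)} characters, expected {width * cpp}")
        for i in range(0, len(row), cpp):
            if row[i:i + cpp] not in colors:
                raise ValueError(f"row {number} uses undefined color {row[i:i + cpp]!r}")
    return width, height, ncolors, cpp
```

    For a well-formed file, check_xpm(open("smily.xpm").read()) returns the four header numbers; otherwise it raises ValueError naming the first bad row.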

    Create the smily shape, shapes/Circuit/smily.shape

    This is the specification of the new shape. Copy the contents of the following box to a file called smily.shape in the Circuit directory below the Dia shared directory:

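    <shape xmlns="http://www.daa.com.au/~james/dia-shape-ns" xmlns:svg="http://www.w3.org/2000/svg">
      <name>Circuit - Smily Face</name>
      <description>A smily face to brighten your day</description>
      <icon>smily.xpm</icon>
      <connections>
        ...
      </connections>
      <svg:svg>
        ...
      </svg:svg>
    </shape>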


    The document element is <shape>. It has the following subelements:

    Subelement      Function
    <name>          The name by which this shape is known.
    <description>   A human readable description of the shape.
    <icon>          The icon file associated with the shape, in this case the one you made earlier.
    <connections>   The connection points for persistent line connections.
    <svg:svg>       The actual shape information, built from further subelements which are geometric shapes such as polygons.
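    Since a shape file is just XML, it can also be generated. The sketch below builds a complete smily.shape with Python's ElementTree (ET.indent needs Python 3.9 or later). The namespace URIs are the ones Dia shape files normally declare, but compare them with an existing .shape file; every coordinate, the eye and mouth geometry, and the style string are hypothetical values chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces declared by Dia shape files (verify against an existing .shape file).
SHAPE_NS = "http://www.daa.com.au/~james/dia-shape-ns"
SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SHAPE_NS)   # shape elements use the default namespace
ET.register_namespace("svg", SVG_NS)  # drawing elements use the svg: prefix


def build_smily_shape():
    """Return smily.shape text: a square head, two eyes, a mouth, and one
    connection point at each ear. All geometry is hypothetical."""
    shape = ET.Element(f"{{{SHAPE_NS}}}shape")
    ET.SubElement(shape, f"{{{SHAPE_NS}}}name").text = "Circuit - Smily Face"
    ET.SubElement(shape, f"{{{SHAPE_NS}}}description").text = "A smily face to brighten your day"
    ET.SubElement(shape, f"{{{SHAPE_NS}}}icon").text = "smily.xpm"

    connections = ET.SubElement(shape, f"{{{SHAPE_NS}}}connections")
    for x in ("0", "4"):  # left ear, right ear, halfway down the head
        ET.SubElement(connections, f"{{{SHAPE_NS}}}point", x=x, y="2")

    svg = ET.SubElement(shape, f"{{{SVG_NS}}}svg")
    style = "stroke-width: 0.1"
    ET.SubElement(svg, f"{{{SVG_NS}}}rect", x="0", y="0", width="4", height="4", style=style)
    for cx in ("1.3", "2.7"):  # eyes
        ET.SubElement(svg, f"{{{SVG_NS}}}circle", cx=cx, cy="1.3", r="0.2", style=style)
    ET.SubElement(svg, f"{{{SVG_NS}}}polyline",
                  points="1,2.6 1.5,3.1 2.5,3.1 3,2.6", style=style)  # mouth
    ET.indent(shape)
    return ET.tostring(shape, encoding="unicode")


print(build_smily_shape())
```

    Whatever generates it, the text of <name> is what the sheet file must match exactly.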

    Now you have a shape file describing your new template, and associating it with an icon. The last step is to inform the Circuit template group (sheet) that this shape has been added...

    Add the Smily Face shape to the Circuits template group, sheets/Circuit.sheet

    There's a file called Circuit.sheet in the sheets directory under the Dia shared directory. Copy that file to Circuit.sheet.org in the same directory. Now if you hopelessly mess up Circuit.sheet it won't be necessary to reinstall Dia to fix the mess. You can simply copy the original file back.

    Now open Circuit.sheet with your favorite text editor. You'll note that the document element is <sheet>, with several <description> subelements, each of which gives a description in a different language. It also has one <contents> subelement. That's where the rubber meets the road, because <contents> contains many <object> subelements, each of which describes a template shape. If your file has not been modified, the first <object> will be named "Circuit - Vertical Resistor". You're going to insert the smily object before the vertical resistor. Simply copy the contents of the following box between the <contents> tag and the first <object> tag:


    En smilyfaceUne smilyfaceEine smilyfaceA smilyface

    The value of the name attribute of the <object> element must be spelled, capitalized and spaced exactly as shown, because that value is how Dia finds the shape file you previously created. In fact, this text must exactly match the text of the <name> element in the shape file. Once you've completed this, quit Dia, restart it, and you should see the smily icon on the Circuit template. Click it, and drag on the drawing to make a smily. Connect zigzag lines to it, and put it into a circuit. Save, quit, and look at the results in XML. Within the XML, experiment with changing the boolean value of the elements whose name attributes are flip_vertical and show_background, and view the results in Dia.

    This is XML at its best. You've just given an application a new capability, using only a text editor.
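    The same sheet edit can also be scripted. Here is a minimal ElementTree sketch that makes the Circuit.sheet.org backup and then inserts a new <object> as the first child of <contents>. The sheet namespace URI is an assumption (copy it from the xmlns attribute in your file), and the sketch writes only a single default-language <description>, not the multilingual set shown in the box above.

```python
import shutil
import xml.etree.ElementTree as ET

# Assumed namespace of Dia .sheet files; compare with the xmlns in your Circuit.sheet.
SHEET_NS = "http://www.lysator.liu.se/~alla/dia/dia-sheet-ns"
ET.register_namespace("", SHEET_NS)  # serialize it as the default namespace


def add_shape_to_sheet(sheet_xml, shape_name, description):
    """Return the sheet XML with a new <object> placed first inside <contents>."""
    root = ET.fromstring(sheet_xml)
    contents = root.find(f"{{{SHEET_NS}}}contents")
    if contents is None:
        raise ValueError("sheet has no <contents> element")
    obj = ET.Element(f"{{{SHEET_NS}}}object", {"name": shape_name})
    ET.SubElement(obj, f"{{{SHEET_NS}}}description").text = description
    obj.tail = contents.text   # reuse the existing indentation before the next object
    contents.insert(0, obj)    # index 0: before the first existing <object>
    return ET.tostring(root, encoding="unicode")


def patch_sheet_file(path, shape_name, description):
    """Copy the sheet to <path>.org first, then write the patched version."""
    shutil.copy2(path, path + ".org")
    with open(path, "rb") as f:  # bytes, so the parser honors the XML declaration
        patched = add_shape_to_sheet(f.read(), shape_name, description)
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="utf-8"?>\n' + patched)
```

    For example, patch_sheet_file("/usr/share/dia/sheets/Circuit.sheet", "Circuit - Smily Face", "A smilyface"). ElementTree's default parser discards XML comments, which is one more reason to keep the .org backup.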

    Summary

    The exercises in this article were long, and possibly tedious. But they were worth it, because if you did them, you now know XML intellectually and intuitively. You've seen an XML dialect that has been specified for maximum adaptability. You've seen how the masters put as much of the implementation as possible in the XML, rather than in the executable app. You've seen true round trip development between a GUI and a text editor environment.

    Allow your imagination to relish all the possibilities. Imagine how XML would help you write that app you always wanted to write but never figured out how. With XML, the only limit is your imagination.

    Congratulations. You've learned as much XML as you can get from reading. Now it's time to do. You've graduated. The next article walks you through writing your own XML Hello World. Enjoy.

    In this Article You Have Learned

    How to use Dia to learn good XML construction. Dia is a vector drawing package that stores its drawing information in XML format. Modifying the drawing modifies the XML, and modifying the XML modifies the drawing.

    Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist". He can be reached at Steve Litt's email address.


    Steve Litt is the developer of The Universal Troubleshooting Process troubleshooting courseware. He can be reached at Steve Litt's email address.

    Where to Go From Here

    By Steve Litt

    If you've done all the exercises, you're pretty comfortable with XML, and could probably take on an XML project at work tomorrow. Does that mean you know all necessary XML information? Not even close.

    We covered the tip of the iceberg: just the basics of reading and writing XML from a program. There's a world of other XML knowledge to gain. There are excellent XML processing models besides SAX and DOM. Schemas offer an alternative to DTD validation. There's the entire discipline of rendering, complete with XSL, XSLT, and XML frameworks like Cocoon from the Apache Software Foundation. There are specific XML variants such as SVG for vector graphics, as well as chemical and mathematical markup languages. There's even a markup language called XML-RPC for remote procedure calls. And there's much, much more.

    Luckily, the information is easy to find, and you now have the skills to exploit it. This article gives some web resources and a couple of books that you'll find helpful.

    We definitely covered enough that the programs you write can access and work with XML. Without the info in this tutorial, going farther would have been folly. But there's a world of other XML available to you, and here are some URLs that will help you with specific projects:

    1. XML Schemas: http://www.w3.org/XML/Schema.html
       o XML Schemas are an alternative to DTDs, probably a better alternative. I just didn't have enough time to do justice to schemas, but the preceding URL gives an excellent starting place.
    2. XML processing models other than SAX and DOM:
       o pyxie: http://www.pyxie.org
         A "Python specific" hierarchy storage system that appears to have Perl support. An ultra simple alternative to DOM.
       o JDOM: http://www.jdom.org
         A lightweight, simple, Java specific XML handler with a native Java syntax.
       o Grove: http://search.cpan.org/doc/KMACLEOD/XML-Grove-0.46alpha/lib/XML/Grove.pm
         Grove allows manipulation of an XML hierarchy as native Perl hashes and arrays.
       o XML::Simple (available from CPAN)
         An ultra simple XML interface for Perl that reads a document into native Perl hashes and arrays and can write them back out as XML. If XML is a small part of the project, consider this.


       o XML::Twig (available from CPAN)
         This appears to be a "best of both worlds", with the ability to store data in trees, but possessing callbacks and other features allowing the processing of just the parts of a huge XML file that the program finds necessary. Given Perl's less than stellar support for DOM, you should give Twig some serious consideration.
    3. XSL: http://www.w3.org/Style/XSL/
       o eXtensible Stylesheet Language. In a nutshell, XSL endeavors to define how XML is transformed and rendered. The transform part is done by XSLT.
    4. XSLT: http://www.w3.org/TR/xslt
       o XSLT is a language that defines XML transformations.
    5. Cocoon: http://xml.apache.org/cocoon/
       o An XML publishing framework endeavoring to split XML work into XML authoring, XML processing, and XSL rendering. The result is the ideal we've all been looking for -- an XML file that can be simultaneously rendered as HTML (possibly different browser specific HTML formats), PostScript files, and other forms.
    6. XML-RPC: http://www.xmlrpc.com/
       o This is just too cool. At the heart of it is an XML dialect defining what procedure is to be called, and what arguments are to be passed to it. What comes back is another XML document, with each returned argument and its value defined. Can you say "distributed computing"? This could become very corporationally correct. See the XML-RPC for Newbies discussion at http://davenet.userland.com/1998/07/14/xmlRpcForNewbies and the RPCDebugger at http://frontier.userland.com/stories/storyReader$1077 for further details.
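    Python's standard library ships an XML-RPC codec, which makes the call and response dialect easy to inspect. A short sketch; the method name math.add is hypothetical, and only the message encoding is shown, not the HTTP transport (xmlrpc.client.ServerProxy handles that part).

```python
import xmlrpc.client

# Encode a call to a hypothetical remote procedure "math.add" with two arguments.
request = xmlrpc.client.dumps((2, 3), methodname="math.add")
print(request)  # a <methodCall> document with <methodName> and <params>

# The server side decodes the same document back into arguments and a method name.
params, method = xmlrpc.client.loads(request)
print(method, params)

# The answer is XML too: a <methodResponse> carrying a single return value.
response = xmlrpc.client.dumps((5,), methodresponse=True)
print(xmlrpc.client.loads(response))
```

    Printing request shows that the whole remote call is nothing more than a small XML document, which is exactly why any language with an XML parser can take part.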

    Excellent XML Books

    Beware. Most XML books talk about little else except XML syntax and validation. Most of the XML books out there don't understand that programmers learn by programming, not by memorizing YAAFAX (Yet Another Arcane Fact About XML). Most XML books out there devote several chapters to HTML, SGML, browser rendering, and blatantly Microsoft specific applications, while having little or no programming to show how a real programmer can manipulate and render XML.

    Most XML books out there are trash. That's why I wrote this tutorial -- to undo the damage done by those so-called XML books that did nothing but scare would-be XML programmers out of XML before they started. I assume that by this point in this month's Troubleshooting Professional Magazine, you understand that XML is anything but rocket science.

    Yes, most XML books are trash. But in my travels I discovered some good ones, and one truly outstanding one.

    Java and XML by Brett McLaughlin: ISBN 0-596-00016-2


    This is an astoundingly excellent book! Because it's intermediate level, once you've finished this tutorial you can graduate directly to this book.

    The first 8 chapters are built quite a bit like this tutorial, with code progressions that walk you through the process of learning the principles. McLaughlin starts the reader on SAX, then walks you through creating, parsing and interpreting DTDs and Schemas, and finally gives a thorough indoctrination in DOM and JDOM. Chapter 12, "Creating XML with Java", is also necessary to basic XML programming. All the material is thorough and rigorous, with programs done in a Javanically compliant way.

    From there he goes on to discuss all the kewl things you can do with XML now that you know its principles, including XML publishing frameworks, XML-RPC, XML and Enterprise JavaBeans, and finally business to business examples (think that might be a valuable skill?).

    This is the book I would have written if I knew as much XML and Java as McLaughlin.

    I don't say that lightly. In scope, depth, organization, and writing style, I find "Java and XML" quite similar to Samba Unleashed.

    If you have anything to do with XML, even if you're not a Java person -- get this book!

    XML Processing with Python by Sean McGrath: ISBN 0-13-021119-2

    This book was where I got my first real XML knowledge. It's an excellent book, especially if you consider Python easier than Perl and Java. It comes with a CD full of everything you'll need for the exercises in the book. There are chapters on DOM and SAX. If you're a Python programmer, this is the XML book for you.

    One word of warning: "XML Processing with Python" is heavily weighted toward Pyxie XML methodology, and the utilities and tools on the CD aren't those you'd typically download from python.org. So if you want to learn generic, language independent XML, that's a disadvantage. BUT, if you're a Python guy (and most of us are whether we admit it or not), the tools that come with this book, ESPECIALLY Pyxie, can have you up and running with XML in record time.
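    Pyxie is built around PYX, a line oriented rendering of XML in which each parsing event becomes one line: "(" opens an element, ")" closes it, "A" gives an attribute name and value, and "-" carries text. The following is a minimal sketch of that idea using Python's standard SAX parser; it illustrates the notation and is not Pyxie's own code, and it escapes only newlines in text.

```python
import xml.sax
from io import StringIO


class PyxWriter(xml.sax.ContentHandler):
    """Write one PYX-style line per SAX event (an illustration, not Pyxie's code)."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def startElement(self, name, attrs):
        self.out.write(f"({name}\n")
        for attr_name in attrs.getNames():
            self.out.write(f"A{attr_name} {attrs.getValue(attr_name)}\n")

    def endElement(self, name):
        self.out.write(f"){name}\n")

    def characters(self, content):
        # Escape newlines so every event stays on a single line.
        self.out.write("-" + content.replace("\n", "\\n") + "\n")


def xml_to_pyx(xml_text):
    out = StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), PyxWriter(out))
    return out.getvalue()


print(xml_to_pyx('<greeting lang="en">Hello</greeting>'), end="")
```

    The payoff of a format like this is that ordinary line tools such as grep can then process XML.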

    The DOM Specification:

    One might say this isn't a book, but the Level 2 Core PDF alone is 107 pages, and the Level 2 specifications taken together are as big as a book. And man, they're well written and informative. I didn't really understand XML until I read the DOM spec, and then it was obvious. Read it. URLs below:

    http://www.w3.org/TR/DOM-Level-2-Core/
    http://www.w3.org/TR/DOM-Level-2-Views/
    http://www.w3.org/TR/DOM-Level-2-Style/
    http://www.w3.org/TR/DOM-Level-2-Events/
    http://www.w3.org/TR/DOM-Level-2-Traversal-Range/


    http://www.w3.org/TR/DOM-Level-2-HTML/

    XML Devcon

    There comes a time when reading and performing exercises just aren't enough. You want to rub elbows with your peers. Luckily, there are two more XML Devcons this year. www.xmldevcon2001.com lists this year's remaining conferences as:

    New York City: Conference April 8-11, Exhibition April 9-10
    San Jose: Conference Fall 2001

    This is put on by Camelot Communications, the same people who put on ApacheCon. I went to the 2000 ApacheCon in Orlando, and it rocked. These conferences sell out, so if you want to go to the New York conference, sign up soon at http://www.xmldevcon2001.com/NY/html/registration.html. You can order a free Exhibit Pass for the New York event, good for all Expo days, which gets you into the exhibit floor, the Keynotes, the Technical Briefings, the Management Briefings, and all Special Events. Go to the registration URL and click "For Free Exhibit Pass".

    Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist. He can be reached at Steve Litt's email address.

    Apache Software Foundation and W3C Rule!

    By Steve Litt

    As you can imagine, writing this issue of Troubleshooting Professional took some time. And the more time I spent researching, the more obvious it became that the Apache Software Foundation and the World Wide Web Consortium are two of the most powerful software entities on earth. I think of them as the legislative and executive branches. W3C manages the creation of the specifications, and the Apache Software Foundation maintains the actual projects.

    When it comes to standards based specs, look what W3C has to offer. They offer the standard specifications for XML, XSL, XML Schemas, DOM, HTML, SVG (the Scalable Vector Graphics variant of XML), and Cascading Style Sheets. There's a working draft for XQuery -- a query language to extract info from XML docs. They have a working draft of the WAI -- the Web Accessibility Initiative. I've just scratched the surface. And best of all, these are *standards*. They won't be changed or kidnapped at the whim of a corporation.

    As I researched for this month's magazine, it started looking like whatever W3C recommends, the Apache Software Foundation builds or maintains. Sometimes the projects are initiated at corporations, but ASF has a reputation for running Open Source projects, so when the originators want to leave their project in good hands and move on to other things, they leave it to ASF. And of course, many ASF projects start at ASF. During the writing of this magazine, I saw so much kewl stuff from ASF that I almost forgot they're the source of the world's most popular web server.

    I'd like to take a quick look at just a few of the software tools you can download from the Apache Software Foundation website.

    Near and dear to my heart is Xerces, the "parser" that made it possible to do this tutorial. The reason I put quotes around the word is that Xerces can do things far beyond mere parsing. It contains the entire DOM interface, and all sorts of other things. And every bit of it works consistently, exactly like you'd expect it to. I had forgotten how much fun it is to work with software tools so solid you can spend your mental effort on design rather than workarounds. I used Xerces for Java, but there appear to be versions for C++ and Perl. And according to an email I just got, the Perl version now works with Linux.
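    Xerces is Java, but the DOM interface it implements is language neutral, so the same calls show up elsewhere. As an illustration only, here is a sketch using Python's xml.dom.minidom (not Xerces) with the namespace-aware DOM Level 2 method getElementsByTagNameNS. The document, the dia namespace URI and the "Circuit - Zener" type string are hypothetical.

```python
from xml.dom import minidom

DIA_NS = "http://www.lysator.liu.se/~alla/dia/"  # assumed namespace URI
DOC = f"""<dia:diagram xmlns:dia="{DIA_NS}">
  <dia:object type="Circuit - Battery" id="O0"/>
  <dia:object type="Circuit - Resistor" id="O1"/>
</dia:diagram>"""


def set_object_type(xml_text, object_id, new_type):
    """Find a dia:object by its id attribute using DOM calls and change its type."""
    document = minidom.parseString(xml_text)
    for element in document.getElementsByTagNameNS(DIA_NS, "object"):
        if element.getAttribute("id") == object_id:
            element.setAttribute("type", new_type)
    return document.documentElement.toxml()


print(set_object_type(DOC, "O1", "Circuit - Zener"))
```

    The method names (getElementsByTagNameNS, getAttribute, setAttribute) come from the DOM specification itself, which is why skills learned with Xerces carry over.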

    Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types.

    Cocoon is an XML framework. The way I interpret their description, you author your content in XML, which has no appearance component. There's then a logic component, which I don't understand, and finally an XSL component to map each XML entity to an appearance. The way I read this, the subject matter expert can write his content without having to worry about appearance. Obviously I don't understand the full picture. You might want to have a look yourself. It sounds mighty powerful.

    What can I say about SOAP? :-) :-) :-). My understanding is that Microsoft originated SOAP, which is a lightweight data exchange mechanism for distributed computing. Now it's been submitted to W3C. IBM has implemented it, and as you can see Apache supports it. The way it looks to me, the community intercepted Microsoft's pass and scored a touchdown.

    Then there's Batik: Whoooaaa! This is a series of core modules to work with the SVG (Scalable Vector Graphics) XML dialect, something very similar to the data format of Dia. Batik works with Java. Imagine being able to draw a picture in a vector drawing program (vector drawings consume much less bandwidth than their bitmap graphics cousins), and have it visible in any browser with the proper Batik plugin! Ya know, I'm sick of creating diagrams in Dia and then having to tweak them in Gimp to show them to the world. I'm not saying Batik can do that yet, but it's what crossed my mind when I read what it is. The W3C tested six SVG implementations, including Adobe and JASC (the Paint Shop Pro people), and Batik did exceptionally well in all areas except animation. See the results at http://www.w3.org/Graphics/SVG/Test/BE-ImpStatus.

    Summary

    What do you think of when someone utters the phrase "the most powerful software entity of our time"? Until a few days ago I thought of the crumbling but still mighty empire. And if money were the measure of power, Microsoft would still be the most powerful. But if quality, reliability, and staying power are the measure, W3C and ASF far surpass mere corporations.


    So the next time you get an assignment in "new technology", your very first move should be to check W3C for a recommendation or working draft, and to check the Apache Software Foundation for an implementation. A