XML-Troubleshooting Professional Magazine



Volume 5 Issue 3, March 2001

By Steve Litt - Published by Amirul Asyraf



Editor's Desk

By Steve Litt

What's up with XML? Is it a revolutionary technology destined to be our livelihood the next few years, or a passing fad? Is it a universal standard specified by the W3C, or has it been usurped and proprietarized by Microsoft? And for some, the most nagging question is "how the heck do I learn it?". This issue of Troubleshooting Professional will attempt to answer all 3 questions. But for those who turn to the last page of the book, let me answer the questions now:

1. XML is a revolutionary technology destined to be our livelihood the next few years.
2. XML is a universal standard specified by the W3C.
3. You can learn the basics of XML in this issue of Troubleshooting Professional.

XML was detected by trade mags' radar in 1997 or 1998. It was proclaimed a world changing technology. Learn it and you're rich.

We were all skeptical. After all, the trades had predicted similar futures for push technology, ATM, and a hundred other technologies we've all forgotten. But the trades get it right sometimes. Witness Java and Linux. And definitely XML.

It's 2001. XML is being incorporated in all sorts of projects. The reason you don't hear about it constantly is the *app* that reads, writes, changes and renders the XML is written in a traditional language such as Java, Perl, Python or C++. In that respect XML is data. But used correctly, much of an application's logic can be stored as easily modified XML. The actual C++, Java, Python or Perl code then becomes primarily the user interface. Imagine how nice it would be to implement your business rules as XML. You can!

Then there's the Microsoft connection. Microsoft is gung-ho about XML. Does that make XML an unwise move?

Probably not. Even if Microsoft does what they do best, and somehow manage to proprietarize some dialects of XML, it will be easy to reverse engineer, and may even be legal to do so in spite of UCITA supported anti-reverse engineering license language. Meanwhile, the rest of us can use our own dialects.

"Dialects" are numerous. As will be explained later in this magazine, XML itself is just an extremely intuitive general specification for how to declare something that could be considered hierarchical data, or markup language, depending on your viewpoint. Within that specification, an implementer specifies his own set of rules for naming XML elements, and what other elements each element can contain. That specification can be implemented on paper, or technologically enforced with a DTD or schema. If this paragraph loses you don't worry -- everything in this paragraph will be explained in detail in this magazine.



Unfortunately, XML is poorly documented. There are exceptions. The W3C specifications are easily readable and understandable. But for the most part, XML books do nothing but document XML's syntax, rules and vocabulary, leading the novice reader to ask "so how can I do something with it". If you follow along with the Java examples in this magazine, you'll know exactly what you can do with XML. Once you understand XML at that level, you can port that knowledge to Perl, Python, C++ and other languages that have XML APIs.

XML derives its power from the fact that it can represent anything the human mind can conceive. And that representation is very readable both for a human and for a machine. The concept is so clean that upon understanding it, my first question was "why didn't I invent XML?". I certainly have the intelligence to have invented it -- XML's not rocket science. I've needed it for years, but had to "roll my own" every time I needed a configuration file or data format.

So get familiar with XML. Whether you're in the Microsoft world or the Open Source world, or somewhere in between, you'll need to interface with it in the next couple years.

How can a Troubleshooter benefit from XML? XML should make applications simpler to diagnose and simpler to tweak. And an XML file provides loads of testpoints from which you can manipulate the apps interacting with it. It brings back some of the Troubleshooting advantages of the intermediate files of the Cobol era, but unlike those, it's persistent and useful in and of itself.

So whether you're a Troubleshooter, programmer, DBA, Sysadmin, or just a person who likes technological progress, kick back, relax, and enjoy your magazine.

Steve Litt is the documentor of the Universal Troubleshooting Process. He can be reached at Steve Litt's email address.

About this Issue's Exercises, PLEASE READ!!

By Steve Litt

If you complete the XML tutorials in this issue of Troubleshooting Professional, you will have mastered the following:

XML terminology -- Documents, elements, attributes, DTD's, DOM, SAX, callbacks, well formed, valid, parsers, and the like.
XML construction and syntax.
XML application architecture.
XML tree navigation.
A thorough understanding of DOM and the frequently used interfaces and methods of the DOM API.
Ability to code an XML/DOM app, complete with node navigation, access, modification, adds, and deletes.
Ability to build DOM documents from scratch.
A thorough understanding of SAX, including use of the ContentHandler and ErrorHandler objects, and construction of the major callback functions.
Ability to code a SAX app to do what you need to.
Ability to code a SAX app that loads per-record DOM documents for out of order processing.
Guidelines concerning when to use SAX and when to use DOM.
A thorough understanding of DTDs, and a methodology for creating a DTD to match and validate existing XML code.
How to tell the Xerces parser to validate.
Syntax for in-file DTDs as well as using DTDs in separate files.
Ability to quickly read and understand intermediate and advanced XML books, as well as various specification documents from standards bodies, and XML websites.
Ability to write XML apps on the job.

This issue of Troubleshooting Professional Magazine is organized as a tutorial. It takes you through all aspects of beginning level XML, well into the intermediate level. I strongly recommend you go through this tutorial in the order it's written. That means going down this page through the "Learning from the Masters: How Dia Uses XML" article, then going down the "XML Java Coding Exercises" page, and then coming back to this page and continuing where you left off. Everywhere necessary, there are links to point you in the right direction.

The coding exercises are all in Java. My research indicates Java has the most mature support for XML. Once you download Xerces from the Apache Foundation and install it, these exercises work on a Linux box with Java installed. Java is the most straightforward way I could offer coding exercises.

I had originally intended to do the exercises in both Perl and Java, but Perl DOM support proved problematic, and there wasn't enough time.

!! STOP THE PRESSES !!

Xerces-Perl for Linux has shipped!

After I had done most of the exercises in Java, I got an email message that Xerces-Perl for Linux now exists. It's so new it's not on CPAN, and I couldn't find it on xml.apache.org. It's been tested only on Debian. But Xerces is a killer tool, and a Perl/Linux version is a good thing. Stay tuned. More info as it comes in.

Rest assured, though, if you're a Perl, Python or C++ person, everything you learn in this tutorial will apply to XML in your language of choice. In every exercise, I used only calls defined in the DOM and SAX specifications. I used no "native Javaisms" to manipulate XML.



Java is a killer language. It's portable, ubiquitous, free beer and in some implementations free speech, it's fast enough, and it's corporationally correct. These are some more reasons I chose Java for the XML coding.

This tutorial was written, tech edited, and tested in Linux (Mandrake 7.2). No effort was made to test under Windows. Instead I used the time to delve deeper into XML. That being said, I know of no reason the Java exercises shouldn't work on a Windows box that's properly configured with Java and Xerces. If you don't have a Linux box, and you can't get your hands on one, by all means use a Windows box for the Java exercises. You'll need to convert some of the shellscripts to batch files, and you'll need to do a Windows install of the JDK and Xerces instead of a Linux install, but that should be pretty easy.

The Dia diagramming program, basis of the "Learning from the Masters: How Dia Uses XML" article, originated on Linux but has been ported to Windows. The Linux package is more mature, so if you have a choice you might want to do that exercise on a Linux box. And that's an exceptionally important exercise, so even if you don't have a Linux box, please try to find someone who will let you use theirs for this exercise. If you don't know anyone with a Linux box, find your local Linux User Group (LUG) and beg someone there to let you use their box to do the Dia exercises.

Personally, I felt more comfortable working on a Linux platform. If you feel more comfortable on a Windows platform, I'd imagine you should be able to get this tutorial to work from within Windows, although of course I haven't tested it on Windows.

Steve Litt is the main author of Samba Unleashed. He can be reached at Steve Litt's email address.

What is XML?

By Steve Litt

In this Article You Will Learn

XML is a styles based markup language.
XML is hierarchical in nature.
XML is extremely readable and easy to understand.
XML can represent almost any concept.
XML can implement a major part of an application.

This is a far trickier question than you can imagine, and I think once you master the answer, everything else falls into place.

One possible answer is that XML is a markup language. And that's absolutely true, as anyone who sees the bracketed begin and end tags for its elements can attest. This answer is true, but almost useless. Because to think of XML as HTML on steroids is to relinquish 90% of XML's functionality.

Another possible answer is that XML is a styles-based markup language, rather than an appearance-based markup language like HTML. Once again, so true, and so useless.

I think a much better definition for XML is a specification for a markup language that can be used to represent almost any concept. Keeping in mind that neither phonebook, person, info nor name are keywords, imagine how the following could be used:

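<phonebook>
  <person lname="Smith" fname="John">
    <info name="...">800-555-1212</info>
    <info name="...">407-555-5555</info>
    <info name="...">Skating buddy</info>
    <info name="...">Racing inlines</info>
    <bicycle>...</bicycle>
  </person>
  <person lname="..." fname="...">
    <info name="...">800-555-1234</info>
    <info name="...">407-555-2222</info>
    <info name="...">Coworker</info>
    <info name="...">8</info>
  </person>
</phonebook>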

You've just implemented a phone book. Add a user interface and you're done. The user interface reads the fields from the XML, places the values from those fields in on-screen text boxes, and lets the user change the contents of those fields. And notice that if you write that user interface well, you can add new fields simply by changing the XML. You can have a program on the other end that puts the finished XML into a database, assuming the database is flexible enough to represent such data.

Notice a few facts about the preceding XML code:

Just like HTML, start tags are angle bracket enclosed, and end tags are angle bracket enclosed with a prepended forward slash.

An entity started by a start tag and ended by an end tag is called an element.

Elements can contain other elements. In the preceding case, the phonebook element contains two person elements. The first of the two person elements contains four info elements and a bicycle element. Thus XML is perfect for describing and manipulating any kind of hierarchy. Please note that phonebook, person and info are NOT reserved words.

An element can contain a mix of different elements, as shown by the first person, who has both info elements and a bicycle element. Additionally, the mixture can contain both elements and text nodes.

An XML file must contain exactly one element in the top level. In the preceding example that top level element is the phonebook element. In any XML file, the single top level element is often called the document element.

Free standing text between a start tag and an end tag is called a text node. You'll learn more about this in the article on the DOM spec. In the preceding example, the actual phone numbers (such as 800-555-1212 for John Smith), are text nodes.

Any element can have zero or more attributes. In the preceding XML code, each info element has one attribute, an attribute called name (name is not a reserved word, it could have been called infoname or whatitis). Attributes are name/value pairs, starting with the attribute's name, then an equal sign, then the attribute's value within quotes. Attributes are declared in the start tag of an element. An attribute represents a fact about the element.

Elements, attributes and text nodes are all nodes. The idea of nodes is important because DOM documents are navigated and traversed nodewise.
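The phone book idea described earlier can be sketched in Java. This is only a minimal sketch under stated assumptions: it uses the JAXP DocumentBuilderFactory bundled with the JDK rather than a separately installed Xerces, and the class name PhoneBookFields, its method names, and the info labels workphone and homephone are hypothetical, chosen for illustration. It reads every info field of a person without knowing the field names in advance, which is the property that lets new fields be added by changing only the XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PhoneBookFields {
    // Hypothetical sample data: the info labels are assumptions for illustration.
    static final String XML =
          "<phonebook>"
        + "<person lname=\"Smith\" fname=\"John\">"
        + "<info name=\"workphone\">800-555-1212</info>"
        + "<info name=\"homephone\">407-555-5555</info>"
        + "</person>"
        + "</phonebook>";

    // Parse an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Collect every info field of one person as label -> value, in document order.
    static Map<String, String> fieldsOf(Element person) {
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList infos = person.getElementsByTagName("info");
        for (int i = 0; i < infos.getLength(); i++) {
            Element info = (Element) infos.item(i);
            fields.put(info.getAttribute("name"), info.getTextContent());
        }
        return fields;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element person = (Element) doc.getElementsByTagName("person").item(0);
        System.out.println(person.getAttribute("fname") + " " + person.getAttribute("lname"));
        // A real user interface would fill text boxes here instead of printing.
        fieldsOf(person).forEach((label, value) -> System.out.println(label + ": " + value));
    }
}
```

Adding a third info element to the XML string makes a third line appear in the output with no change to the Java code.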

Because elements can contain other elements, to a certain extent attributes and sub-elements are interchangeable. For instance, in the person element I described the person's name with an lname and an fname attribute. Instead, I could have had each person element contain an lname and an fname subelement, each of which had the appropriate name between the begin and end tag. In other words:

<person>
  <lname>Smith</lname>
  <fname>John</fname>
  <info name="...">800-555-1212</info>
  <info name="...">407-555-5555</info>
  <info name="...">Skating buddy</info>
  <info name="...">Racing inlines</info>
</person>

Please remember there are no reserved words in the preceding example. info and name are just strings I decided upon to make it self documenting. As an alternative to the preceding, I could have even used info tags to accomplish the same purpose:

<person>
  <info name="lname">Smith</info>
  <info name="fname">John</info>
  <info name="...">800-555-1212</info>
  <info name="...">407-555-5555</info>
  <info name="...">Skating buddy</info>
  <info name="...">Racing inlines</info>
</person>

Your choice of attributes vs. elements depends on things such as whether you'll need more than one of the entity (no two attributes of a single element can have the same name), and whether you should always have the entity (that might favor using an attribute). Also, use elements if order is important, because the XML specification doesn't specify the order of attributes, so parsers don't necessarily preserve attribute order. All this will be explained later in this magazine.

The preceding examples have used XML as a hierarchical representation. But it can also be used as stylized markup:

Why XML is So Great

XML is absolutely wonderful! And it's not just because XML is Corporationally Correct!

Now let's talk about...



In the preceding, the XML markup describes the styles, or functionality, of marked up text. It's up to the application rendering the XML to assign an appearance to such styles. Even the relationship between style and appearance can be moved out of the application using XSL (Extensible Style Language). XSL is a separate but related subject that is not discussed in this issue of Troubleshooting Professional.

Tags must be nested, never interlaced. The following is not allowed:

XML is great and good.

The well formed way to write the preceding would be to nest tags, like this:

XML is great and good.

Because tags can't be interlaced, but instead must be nested, all XML represents a hierarchy. For instance, the preceding snippet could be thought of like this:

XML is

truly

great

and fantastic.

Generally speaking, in XML intended to represent a hierarchy, an element containing a text node contains no other elements or text nodes, but in XML intended to represent markup, an element often contains several text nodes and several other elements. But this is not a rule, only a custom.
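The nesting rule can be checked by handing both forms to a parser. The following is a hedged sketch, not code from this issue's exercises: it uses the JDK's bundled JAXP parser, and the class name NestingCheck and the tag names p, b and i are assumptions made purely for illustration. A parser reports interlaced tags as a fatal well-formedness error, which surfaces in Java as a SAXException.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class NestingCheck {
    // Hypothetical markup: the end tag of b arrives while i is still open.
    static final String INTERLACED = "<p>XML is <b>great <i>and</b> good.</i></p>";
    // The same text with every tag closed inside the element that opened it.
    static final String NESTED = "<p>XML is <b>great <i>and</i></b><i> good.</i></p>";

    // Returns true if the parser accepts the text as well formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("interlaced well formed? " + isWellFormed(INTERLACED));
        System.out.println("nested well formed?     " + isWellFormed(NESTED));
    }
}
```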

I believe the best way to learn XML is through the DOM (Document Object Model) spec, so DOM is discussed voluminously in later portions of this issue of Troubleshooting Professional Magazine.

In this Article You Have Learned

XML is a styles based markup language.
XML is hierarchical in nature.
Tags must be nested, never interlaced.
XML is extremely readable and easy to understand.
XML can represent almost any concept.
XML can implement a major part of an application.



Steve Litt is the author of "Rapid Learning: Secret Weapon of the Successful Technologist". He can be reached at Steve Litt's email address.

Some Definitions

By Steve Litt


You'll learn definitions for the following:
o Document
o Element
o Attribute
o Text Node
o Node
o Document element
o DTD
o Well formed
o Valid
o Schema
o DOM
o SAX
o DOM document
o Namespace

Document

The data contained in an entire XML file:

getchbsd.pl;

Element

The entity defined by a start tag and end tag, but not the entities contained between the start and end tags:

getchbsd.pl

Note that all elements are nodes, but not all nodes are elements. Elements inherit all methods of nodes, and add some of their own. Nodes are discussed later in this table.

Attribute

The name/value pairs enumerated in an element's start tag:

getchbsd.pl

Text Node

The text between the open and close tag of its parent element:

getchbsd.pl

Node

The most atomic XML entity that is programmatically useful. Elements, attributes and text nodes are all nodes. There are other node types which are described in the DOM spec:

// NodeType
const unsigned short ELEMENT_NODE = 1;
const unsigned short ATTRIBUTE_NODE = 2;
const unsigned short TEXT_NODE = 3;
const unsigned short CDATA_SECTION_NODE = 4;
const unsigned short ENTITY_REFERENCE_NODE = 5;
const unsigned short ENTITY_NODE = 6;
const unsigned short PROCESSING_INSTRUCTION_NODE = 7;
const unsigned short COMMENT_NODE = 8;
const unsigned short DOCUMENT_NODE = 9;
const unsigned short DOCUMENT_TYPE_NODE = 10;
const unsigned short DOCUMENT_FRAGMENT_NODE = 11;
const unsigned short NOTATION_NODE = 12;

The Node interface of the DOM spec contains most of the navigational methods. Note that all elements are nodes, but not all nodes are elements. Elements inherit all methods of nodes, and add some of their own.

Document element

Top level element, of which there can be only one per XML file:

getchbsd.pl;

DTD

A sort of type declaration for XML. Here's an ultra-simple one:

Note that docelement is NOT a reserved word.

Well formed

An XML file conforming to the XML syntax rules, including:

Every start tag has an end tag, and none are "interlaced", but instead all are properly nested.
Every attribute has a name followed by an equal sign followed by a quoted value.
There is one and only one top level element.

Valid

Well formed, AND conforming to the rules of the DTD.

Schema

Performs a function similar to a DTD.

DOM

Stands for Document Object Model. A method of placing an entire XML file's hierarchy, with all its elements, in a memory object. This memory object is built for quick lookup, traversal and modification.

SAX

Stands for Simple API for XML. An event driven method of dealing with an XML file. Instead of containing the entire hierarchy in memory at one time, it presents elements as events which can then be exploited by your code. SAX has the advantage of less memory consumption for large files, but has the disadvantage that the programmer must write code to save anything he wants saved, and must write changes to the XML file in sequential order. DOM allows random changes to elements. Because it needn't keep entire files in memory at once, SAX is universally useful, whereas DOM is not useful for truly huge XML files.

DOM Document

In the DOM standard, an object containing the entire hierarchy, elements, and information of an XML file.

DOM object

Any object contained within a DOM document. Vague, ambiguous, and misunderstood -- don't use this term. THIS TERM IS NOT A SYNONYM FOR DOM document!!!

Namespace

A method of uniquifying tag names from various XML variants:

Circuit - Vertical Zener Diode



Steve Litt is the developer of The Universal Troubleshooting Process troubleshooting courseware. He can be reached at Steve Litt's email address.

Anatomy of an XML App

By Steve Litt


The high level architecture of a DOM/XML application.
Interaction of the XML file, parser, DOM document, XML write logic, Renderer, Modification logic, and output.

An XML application reads an XML file, after which it can modify and rewrite the XML, and/or it can print output based on that XML (commonly called "rendering"). Note that "rendering" can take widely diverse forms, including changing which fields are available on a form, printing a vector graphic, or the most obvious case of rendering marked up text. Rendering can even take the form of configuring an application, or executing remote procedures.

The DOM model is easiest to understand, so here is the architecture of an XML app using DOM:

So here's what happens: A parser reads the XML file and builds a DOM document to match the XML file. From that point until a save is performed, all interaction between the app and XML hits the DOM document rather than the corresponding XML file. It's interesting to note that almost all XML parsers use SAX. The reason is simple enough. Before you build a DOM document you must detect events such as start of element (start tag encountered), end of element (end tag encountered), new attribute (name followed by equal sign followed by quoted string encountered), and the like. So DOM can be thought of as an extra abstraction to lessen the programmer's workload, at the expense of memory usage.
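The events just described can be watched directly with a SAX handler. What follows is a minimal sketch under assumptions: it uses the JDK's bundled SAXParserFactory rather than a separately installed Xerces, and the class name SaxEvents, the method eventsOf, and the sample XML are hypothetical. Each callback simply records the event it was handed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEvents {
    // Parse the XML and return the start, attribute and end events in arrival order.
    static List<String> eventsOf(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start " + qName);
                // SAX delivers attributes together with the start tag that declares them.
                for (int i = 0; i < atts.getLength(); i++) {
                    events.add("attribute " + atts.getQName(i) + "=" + atts.getValue(i));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Hypothetical sample; the label workphone is an assumption.
        String xml = "<person lname=\"Smith\"><info name=\"workphone\">800-555-1212</info></person>";
        for (String event : eventsOf(xml)) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory here except what the handler chooses to store, which is the trade the paragraph above describes.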

Modifications are made directly to the DOM document. Elements can be added, deleted, renamed, rearranged. Text nodes can be added, deleted or changed. Elements can be moved either within the same level, or promoted or demoted to different levels.

Obviously, the DOM is modified in apps that rewrite the XML file. But DOM modification is also often done in an app that only renders the XML. The classic example is in a "DOMWalker" app, which simply walks the DOM tree and prints what it finds in a hierarchical outline. In fact, the newlines and spaces intended to make the XML file more readable are actually legitimate text nodes in XML, but in an XML app concerned only with a hierarchy they're extraneous. Therefore, the first thing a DOMWalker program does is delete text nodes made up only of whitespace. Source code for an example DOMWalker is given later in this magazine.
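The whitespace cleanup step might look like the following. This is a hedged sketch, not the DOMWalker source given later in this magazine: the class name WhitespaceStripper and its method names are hypothetical, it uses the JDK's bundled JAXP parser, and it uses recursion for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceStripper {
    // Parse an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Remove every text node below 'node' that consists only of whitespace.
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember the sibling before removing
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        String xml = "<phonebook>\n  <person>\n    <info>800-555-1212</info>\n  </person>\n</phonebook>";
        Document doc = parse(xml);
        Node top = doc.getDocumentElement();
        System.out.println("children before: " + top.getChildNodes().getLength());
        stripWhitespace(top);
        System.out.println("children after:  " + top.getChildNodes().getLength());
    }
}
```

The next sibling is saved before each removal because a removed node no longer has siblings to move to.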

Rendering is the heavy part of most XML apps. It's often graphics intensive. Consider the Dia vector drawing program, which keeps all drawing information in XML but renders as geometric shapes. Often there are several rendering processes, one for each kind of output. Thus a book authored in XML could be rendered as a paper book, as a PDF, as a Postscript file, or as an HTML page or series of HTML pages. Indeed, this is one of the primary benefits of styles based documents. Often the rendering itself is decoupled from the app by use of XSL (eXtensible Style Language), much the same as program logic is decoupled from the app using XML.

Rewriting the XML file is actually easy -- about what you'd expect for your last class project in a college Programming 101 course. In the case of DOM, you've already assembled the output in a DOM document, so you just walk its tree and write the markup.
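Walking the tree and writing the markup can be sketched as below. This is an illustrative sketch only: the class name DomWriter and its methods are hypothetical, it handles just elements, attributes and text nodes (comments, processing instructions and the like are skipped), and it uses the JDK's bundled JAXP parser to obtain the DOM document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomWriter {
    // Parse an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Escape the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Append the markup for 'node' and everything below it to 'out'.
    static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                write(((Document) node).getDocumentElement(), out);
                break;
            case Node.ELEMENT_NODE:
                out.append('<').append(node.getNodeName());
                NamedNodeMap atts = node.getAttributes();
                for (int i = 0; i < atts.getLength(); i++) {
                    Node a = atts.item(i);
                    out.append(' ').append(a.getNodeName()).append("=\"")
                       .append(escape(a.getNodeValue()).replace("\"", "&quot;"))
                       .append('"');
                }
                out.append('>');
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    write(c, out);
                }
                out.append("</").append(node.getNodeName()).append('>');
                break;
            case Node.TEXT_NODE:
                out.append(escape(node.getNodeValue()));
                break;
            default:
                break; // other node types are omitted from this sketch
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<person lname=\"Smith\"><info name=\"workphone\">800-555-1212</info></person>");
        StringBuilder out = new StringBuilder();
        write(doc, out);
        System.out.println(out);
    }
}
```

Attribute order is whatever the parser's attribute map returns, so elements with several attributes may come back in a different order than they were written.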

In the case of SAX based XML apps it's a little harder because you often don't read the information in the same order you want to write it. In other words, if your app's specification calls for something occurring later in the input modifying something earlier in the output, you can't just use a read-write loop. So you do the typical stuff -- keep some things in memory, or maybe write an intermediate file and then sort it, or run 2 passes through the XML. This is why for apps interacting with guaranteed small XML files, DOM is better.



Steve Litt is the documentor of the Universal Troubleshooting Process. He can be reached at Steve Litt's email address.

Simplified Explanation of the DOM API

By Steve Litt


How to create a DOM Document object with a parser.
The three main DOM activities.
Using the "checker metaphor" to understand iterative document navigation.
The major DOM navigation public variables and equivalent methods.
The Down, right, up, done DOM walking algorithm.
A simple Java implementation of that algorithm.
How to access elements by name.
How to add, change and delete information in the DOM document.
Navigating attributes by name and sequentially.

If you understand DOM, you're 90% of the way to understanding XML.

What you might think of as a "DOM object" is really an instance of the Document class:

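import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;

DOMParser dp = new DOMParser();
dp.parse("myfile.xml");            // read the file and build the tree
Document doc = dp.getDocument();   // the resulting DOM document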

In the preceding code, the parser delivers an instance of Document, called doc, which contains the entire information hierarchy contained in the original file myfile.xml. You can use methods from the DOM API to extract any info from the DOM document if that information was in the original XML file (with a very few exceptions).

The simplest explanation of a DOM document is that it's an in-memory tree containing all info from the XML file hierarchy, together with various methods to navigate that tree, to get information from a specific node, and to add, delete, rearrange or modify nodes. If you can navigate, get, and change, that's pretty much all you need to do with a hierarchy.

There's no better documentation on DOM than W3C's DOM specification papers, available at their website. To learn XML, you should spend about a day reading the parts dealing with XML (not with HTML). It is time *very* well spent.

The purpose of this article is to help you understand what you will see when you read the DOM spec, so that you don't go off on the wrong track and you aren't overwhelmed.

Throughout this article, keep in mind that DOM methods enable three main activities:

1. Navigating the hierarchy tree
2. Viewing information (get)
3. Modifying information (put, delete, add, move, etc.)

Navigating the hierarchy tree

The DOM navigation methods are defined so you can navigate the tree without recursion. They do this using methods that move your current position around like a checker on top of the various nodes. Here I use the word "checker" to mean the round plastic play pieces used in the board game called "Checkers".

Note: The "checker" metaphor will be used extensively throughout this issue of Troubleshooting Professional.

Most of the methods to read and modify elements operate on the element with the checker. HOWEVER...

My assertion that they operate by moving a checker around is not quite accurate, because these navigation methods do not change the state of the DOM document. Instead, they simply deliver a node. The programmer records the current position by assigning the returns of these methods to a node object. That node object marks the place of the "checker".

!! CAUTION !!

Although attributes are nodes, they are invisible to the navigation methods and public variables listed below. There are specialized methods and public variables to access and navigate attributes.
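The caution about attributes can be seen in a few lines of Java. This is a minimal sketch under assumptions: the class name AttributeWalk and its helper methods are hypothetical, and the JDK's bundled JAXP parser stands in for Xerces. It reaches attributes by name with getAttribute() and sequentially with getAttributes(), both defined in the DOM spec, and shows that child navigation never lands on them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeWalk {
    // Parse an XML string and return its document element.
    static Element rootOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // List an element's attributes as name=value strings.
    static List<String> attributesOf(Element element) {
        List<String> result = new ArrayList<>();
        NamedNodeMap atts = element.getAttributes();
        for (int i = 0; i < atts.getLength(); i++) {
            Node att = atts.item(i);
            result.add(att.getNodeName() + "=" + att.getNodeValue());
        }
        // Attribute order is not guaranteed, so sort for a stable listing.
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        Element person = rootOf("<person lname=\"Smith\" fname=\"John\"/>");
        System.out.println("by name:      " + person.getAttribute("lname"));
        System.out.println("sequentially: " + attributesOf(person));
        // The two attributes are not children, so there is nothing to go down to.
        System.out.println("has children? " + person.hasChildNodes());
    }
}
```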

The following is a list of the major navigational methods, and the equivalent public variables, and the interfaces in which these methods and public variables are implemented. Immediately below the list is a sample hierarchy to walk. Observe the naming convention that in general the variable name is converted to the method name by capitalizing the first letter, and prepending either get or set as appropriate. For the time being, don't worry about the Interface column:

Method                 Equivalent public variable                    Interface

getOwnerDocument()     readonly attribute Document ownerDocument;    Node
getDocumentElement()   readonly attribute Element documentElement;   Document
getFirstChild()        readonly attribute Node firstChild;           Node
getLastChild()         readonly attribute Node lastChild;            Node
getNextSibling()       readonly attribute Node nextSibling;          Node
getPreviousSibling()   readonly attribute Node previousSibling;      Node
getParentNode()        readonly attribute Node parentNode;           Node



In plain English, you start with the checker on the document element. At every juncture:

You go down if you can go down.
Otherwise you go right if you can go right.
Otherwise you go up if you can go up.
Otherwise you're done.

Trace the preceding pseudocode algorithm on the hierarchy diagram above and you'll see what I mean. Starting at the document element, you go down to A, then right to B, then down to 1, then right to 2, then right to 3, then up to B, then right to C, then up to the document element, at which time you're done because you've already been there.

That brings up an important point. You shouldn't be able to go down from an element if you've already done so. When you first arrive at an element via a downward or a rightward movement, you descend if you can. But sooner or later, you'll come back up to that same element after you've gone as far right as you can in the level below the element. Obviously, you don't want to descend again, as that would make an infinite loop as described in the following indented paragraph:

From A move right to B. From B move down to 1. From 1 move right to 2. From 2 move right to 3. From 3 move up to B. From B move down to 1...

So you implement a boolean control variable (let's call it ascending) that is true when you ascend to a node, and false otherwise. The definition of "can go down" then becomes not only that there are children, but also that you are not ascending. The following Java loop walks a tree and calls printNodeInfo() once for each node it reaches:

Node mynode = doc.getDocumentElement();   // the "checker"
boolean ascending = false;                // true only right after moving up
while (true) {
    if (!ascending) {
        printNodeInfo(mynode);            // first arrival at this node
    }
    if ((mynode.hasChildNodes()) && (!ascending)) {
        mynode = mynode.getFirstChild();      // down
        ascending = false;
    }
    else if (mynode.getNextSibling() != null) {
        mynode = mynode.getNextSibling();     // right
        ascending = false;
    }
    else if (mynode.getParentNode() != null) {
        mynode = mynode.getParentNode();      // up
        ascending = true;
    }
    else {
        break;                                // done
    }
}

In the preceding Java code, object mynode is the "checker". Basically, what the code says is perform an action (printNodeInfo() in this case) on the element under the checker, and then make your move. Move the checker down if you can, otherwise move it right if you can, otherwise move it up if you can, otherwise you're done (because you've climbed back up through the document element and there's nowhere left to go).

Oh, and one more thing. The preceding navigation accesses not only elements, but also text nodes. You can discern the two types with the nodeType public variable or the getNodeType() method implemented in the Node interface. However, remember that the preceding navigation methods do NOT bring the checker to rest on attributes. Attributes have their own navigation and access methods. Using the "checker" metaphor, they could be said to have their own checker.
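Here's a minimal, self-contained sketch of the down, right, up, done walk that also tells elements from text nodes with getNodeType(). It assumes the JAXP parser bundled with the JDK rather than the DOMParser class used elsewhere in this issue; the class name CheckerWalk, the helper names, and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CheckerWalk {
    // Parse an XML string with the JAXP parser that ships with the JDK (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Describe the node under the checker, telling elements from text nodes.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                return "element " + node.getNodeName();
            case Node.TEXT_NODE:
                return "text \"" + node.getNodeValue().trim() + "\"";
            default:
                return "other " + node.getNodeName();
        }
    }

    // Down, right, up, done: record each node the first time the checker lands on it.
    static List<String> walk(Document doc) {
        List<String> visited = new ArrayList<>();
        Node checker = doc.getDocumentElement();
        boolean ascending = false;
        while (true) {
            if (!ascending) {
                visited.add(describe(checker));
            }
            if (checker.hasChildNodes() && !ascending) {
                checker = checker.getFirstChild();      // down
            } else if (checker.getNextSibling() != null) {
                checker = checker.getNextSibling();     // right
                ascending = false;
            } else if (checker.getParentNode() != null) {
                checker = checker.getParentNode();      // up
                ascending = true;
            } else {
                break;                                  // done
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><A/><B><one>1</one><two/></B><C/></doc>");
        for (String line : walk(doc)) {
            System.out.println(line);
        }
    }
}
```

Note that the text node holding "1" shows up in the walk as its own stop, one level below the element that contains it.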

Accessing Elements by Name

The preceding section of this article discussed navigating elements by tree moves. That's ideal when you don't know what elements you'll encounter. But sometimes, because of the nature of the application, you know that under a particular element you'll likely have one or more elements of a certain name. The following XML is an example [element tags not shown]:

    Smith
    John
    800-555-1212
    407-555-5555
    Skating buddy
    Racing inlines

It's likely you'll have info elements, and you might want to list them. That's when you use the getElementsByTagName(name) syntax, which delivers a NodeList (similar to an array) of all such subelements. You can then loop through the NodeList to put your checker on each of those similarly named elements. This can be done even when you know there will be only one such named element.
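As a rough sketch of that loop, the following Java collects the text of every info element under a parent. Only the info element name comes from this article; the person, lname, fname and phone tags in the sample are my own placeholders, and getTextContent() is a DOM Level 3 convenience that older parsers may lack.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InfoLister {
    // Parse an XML string with the JDK's bundled JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Return the text of every <info> element found anywhere below the given element.
    static List<String> infoTexts(Element parent) {
        NodeList infos = parent.getElementsByTagName("info");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < infos.getLength(); i++) {
            texts.add(infos.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical markup: every tag name except <info> is a placeholder.
        String xml = "<person><lname>Smith</lname><fname>John</fname>"
                + "<phone>800-555-1212</phone><phone>407-555-5555</phone>"
                + "<info>Skating buddy</info><info>Racing inlines</info></person>";
        for (String text : infoTexts(parse(xml).getDocumentElement())) {
            System.out.println(text);
        }
    }
}
```

Keep in mind that getElementsByTagName() searches all descendants, not just direct children, so an info element nested deeper down is returned too.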



Viewing information (get)

Once your checker is on an element, in some DOM implementations, including Python, you can access that element's information with variables:

    readonly attribute DOMString      nodeName;
             attribute DOMString      nodeValue;
    readonly attribute unsigned short nodeType;

In other implementations, including Java, you use methods to accomplish these same things:

    public String getNodeName();
    public String getNodeValue();
    public short getNodeType();

Some implementations allow you to do either.

Modifying information (put, delete, add, etc.)

To change the value of a node, use the nodeValue public variable or its setNodeValue() method equivalent. It's read/write. Note that in the W3C DOM an element's own nodeValue is null; the text between its tags lives in a child text node, so that child is the node whose value you set. To change the name of the element, you'll need to replace the element with a different element, using the replaceChild(newChild,oldChild) syntax. Note that this works not on the node with the checker, but on a child of the node with the checker. To do this you need to move your checker up. Depending on language and DOM implementation, and assuming the checker is on myElement, this might be possible with a one-liner:

    myElement.parentNode.replaceChild(newElement,myElement)

Otherwise, try something like this:

    Element tempElement = myElement;
    myElement = (Element)myElement.getParentNode();
    myElement.replaceChild(newElement,tempElement);

An element can be inserted before the checker like this:

    Element tempElement = myElement;
    myElement = (Element)myElement.getParentNode();
    myElement.insertBefore(newElement,tempElement);
    myElement = tempElement; //Return to original position

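An element can be placed after the checker like this. Note that appendChild(newElement) takes only the new node and always makes it the parent's last child, so to land immediately after the checker you insert before the checker's next sibling (when that sibling is null, insertBefore() appends at the end):

    Element tempElement = myElement;
    myElement = (Element)myElement.getParentNode();
    myElement.insertBefore(newElement,tempElement.getNextSibling());
    myElement = tempElement; //Return to original position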

Once the new node is in place, you can change its value with its nodeValue public variable, or the setNodeValue() method.

The "checker" element can be deleted like this:

    Element tempElement = myElement;
    myElement = (Element)myElement.getParentNode();
    myElement.removeChild(tempElement);

In the case of deletion, you can't move the checker back to the original node because the original node is gone. The programmer handles this by storing where he wants to go after the deletion. For instance, a DOM walker that deletes all blank text nodes keeps a copy of where the checker was in the previous iteration, and upon deletion goes back there. In the next iteration, it gets the node "after" the deleted one.
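The following is a hedged sketch of a blank-text-node stripper. It uses a variation on the bookkeeping described above: rather than stepping back to the previous position after a deletion, it works out the next move before deleting, while the doomed node's sibling and parent links are still intact. The class and method names are hypothetical, and the JAXP parser is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class BlankStripper {
    // Parse an XML string with the JDK's bundled JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Walk the tree below root, removing whitespace-only text nodes.
    // Returns the number of nodes removed.
    static int stripBlankText(Node root) {
        int removed = 0;
        Node checker = root;
        boolean ascending = false;
        while (checker != null) {
            // Decide where to go next BEFORE any deletion happens.
            Node next;
            boolean nextAscending = false;
            if (checker.hasChildNodes() && !ascending) {
                next = checker.getFirstChild();          // down
            } else if (checker == root) {
                next = null;                             // back at the start: done
            } else if (checker.getNextSibling() != null) {
                next = checker.getNextSibling();         // right
            } else {
                next = checker.getParentNode();          // up
                nextAscending = true;
            }
            // Now it is safe to delete the node under the checker.
            if (!ascending && checker.getNodeType() == Node.TEXT_NODE
                    && checker.getNodeValue().trim().isEmpty()) {
                checker.getParentNode().removeChild(checker);
                removed++;
            }
            checker = next;
            ascending = nextAscending;
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>x</b>\n  <c> </c>\n</a>");
        int removed = stripBlankText(doc.getDocumentElement());
        System.out.println("removed " + removed + " blank text nodes");
    }
}
```

Either bookkeeping style works; the point is that you must know where the checker goes next without asking the deleted node.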



Navigating Attribute Nodes

So far this article focused exclusively on navigating or accessing elements and text nodes. But within elements there are sometimes attribute nodes. There are two broad ways to access an attribute node:

1. By attribute name
2. Sequentially

Navigating Attributes by Name

Believe it or not, accessing nodes by attribute name is by far the more useful. That's because if your app has never heard of a given attribute, there's not a whole lot it can do with it, assuming you're using attributes as they're designed to be used. So it's rare to access attributes sequentially, but it can be done.

Navigating attributes is simpler than navigating elements because attributes cannot contain anything else, and because you cannot have two attributes with the same name.

To get the value of a named attribute, use the element.getAttribute(attribname) syntax.

To get an attribute object, use the element.getAttributeNode(attribname) syntax. An attribute object contains the attribute name, its value, whether the value was specified as opposed to default, and the element that owns the attribute.
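A small sketch of both calls, assuming the standard org.w3c.dom interfaces and the JDK's JAXP parser. The class name, the describe() helper and the sample element are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttrByName {
    // Parse an XML string with the JDK's bundled JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Summarize what an attribute object knows: name, value, specified flag, owner.
    static String describe(Element element, String name) {
        Attr attr = element.getAttributeNode(name);
        if (attr == null) {
            return name + " is absent";
        }
        return attr.getName() + "=" + attr.getValue()
                + " specified=" + attr.getSpecified()
                + " owner=" + attr.getOwnerElement().getTagName();
    }

    public static void main(String[] args) {
        Element layer = parse("<layer name=\"Background\" visible=\"true\"/>")
                .getDocumentElement();
        System.out.println(layer.getAttribute("name"));   // value only
        System.out.println(describe(layer, "visible"));   // full attribute object
        System.out.println(describe(layer, "color"));     // attribute that isn't there
    }
}
```

One difference worth remembering: getAttribute() hands back an empty string for a missing attribute, while getAttributeNode() hands back null.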

Navigating Attributes Sequentially

Getting attributes sequentially is much more difficult, and various DOM implementations have their own glitches. You'll need to experiment to get it just right. A typical use of sequential access to attributes is a reporting program, or writing the DOM document out to an XML file.

An element's attributes are accessed as an array, not with a getNext type of API. Different implementations are different, and you'll need to experiment, but typically you get the array, get the array's length, and then loop through the attribute nodes. You get the array with the attributes public variable or the getAttributes() method, defined in the Node interface, and the number of elements with the length public variable defined in the NamedNodeMap interface, then loop, accessing each attribute with the item() method implemented in the NamedNodeMap interface, then accessing the attribute's name and value public variables from the Attr interface. If your implementation uses only methods, use getNodeName() and getNodeValue(). The following is some Java code to do that:

    NamedNodeMap attribs = thisNode.getAttributes();
    for (int i = 0; i < attribs.getLength(); i++) {
        Node attrib = attribs.item(i);
        System.out.print(attrib.getNodeName());
        System.out.print("=\"");
        System.out.print(attrib.getNodeValue());
        System.out.print("\"\n");
    }

Once again, in many DOM implementations the preceding doesn't work. In some cases attribs is an array in the computer language's native format, after which it can be traversed using constructs of the language. Experiment.


How to create a DOM Document object with a parser (DOMParser object).
The three main DOM activities are navigating, viewing and modifying.
Using the "checker metaphor" to understand iterative document navigation.
The major DOM navigation public variables and equivalent methods.
    o The methods: getOwnerDocument(), getDocumentElement(), getFirstChild(), getLastChild(), getNextSibling(), getPreviousSibling(), getParentNode()
    o The public variables: ownerDocument, documentElement, firstChild, lastChild, nextSibling, previousSibling, parentNode (all readonly)
The down, right, up, done DOM walking algorithm:
    o Go down if you can
    o else go right if you can
    o else go up if you can
    o else you're done
    o Do not descend if you got to the node by ascending.
    o Do not process the node if you got to it by ascending.
A simple Java implementation of that algorithm.
Using getElementsByTagName(name) to access elements by name.
How to add, change and delete information in the DOM document with appendChild(newElement), insertBefore(newElement,refElement), replaceChild(newElement,oldElement), and removeChild(oldElement).
Navigating attributes by name using getAttribute(attribname) and getAttributeNode(attribname), and navigating attributes sequentially with getAttributes(), getLength(), and getNodeName().

Steve Litt is the documentor of The Universal Troubleshooting Process. He can be reached at Steve Litt's email address.

Learning from the Masters: How Dia Uses XML

By Steve Litt

In this Article You Will Learn

How to use Dia to learn good XML construction.
Dia is a vector drawing package that stores its drawing information in XML format.
Modifying the drawing modifies the XML, and modifying the XML modifies the drawing.

This article may seem very tedious. You might be tempted to skip it. But unless you already have a deep understanding of XML and a feel for what makes good XML, this is the most important article in this magazine. If you skip this article, you'll likely fail (or at least not understand what you're doing) when you try coding the XML app exercises later in this issue. But if you spend the hour it takes to do this article's exercises, and the extra 1 to 3 hours to debrief yourself so you really understand what has happened, you will have a deep, intuitive grasp of XML, and nothing will stop you.

!! CAREFULLY READ AND PARTICIPATE IN THIS ARTICLE !!

Many Linux distros come with a vector graphics drawing program called Dia. Dia is an Open Source alternative to Visio. It stores not only drawings but also template shapes in XML, so it's very extensible and could surpass Visio. Using only a text editor, you can create brand new template shapes, each with an arbitrary number and placement of connection points. It's incredible.

Dia is available on many Linux distros. I know it's on Mandrake 7.1 and 7.2, although it's not on the menu. But it's in /usr/bin. If Dia isn't installed, see if you can install it from your distribution CD (check for a file with a name like dia-0.86-2mdk.i586.rpm in your RPMS directory on Red-Hat derived distros).



If your distro didn't come with Dia, here are some places you can get it:

Type of install    Where to find it
Source             http://www.lysator.liu.se/~alla/dia/dia.html
Debian Package     http://packages.debian.org/unstable/graphics/dia.html
RPM files          http://www.rpmfind.net, then search for dia.

Dia is a diagramming tool most suitable for data flow diagrams, network system diagrams, or basically anything resembling a block diagram. Connection lines stay connected as you move components around. You can add bends to connection lines by right-clicking a multi-segment connection line and choosing "add new segment". Outstanding!

All drawings are stored as gzipped XML files. You can modify a drawing two ways -- graphically, or by editing the XML. Although the latter is much more time consuming and harder to visualize, for work requiring exact measurements it might be preferable.

Hello World Dia XML investigation

But never mind. I came to use Dia, not to praise it. We're going to use Dia to learn how it uses XML, in preparation for our own XML app. Start by running Dia from the command line. Among other screens which are relatively extraneous, you'll see a screen like the following:



That's the Dia toolbox. From the menu, click file, then new, and you'll be brought to a blank page. Right click the blank page, choose file, then save as, and save it as blank.xml.gz. Now close the drawing by right clicking the empty drawing and choosing close.

Remember, Dia saves its drawings as gzipped xml files. View blank.xml.gz with the following command:

    zless blank.xml.gz

You'll see an XML file whose document element is <diagram> (with a namespace appended -- we'll discuss this much later). Second level elements are <diagramdata> and <layer>. Examine the <layer> element's XML code:

    <layer name="Background" visible="true"/>

There's no end tag. In XML, when an element contains no subelements or text nodes, the start tag and end tag would butt up next to each other. To enhance readability in such cases, XML syntax allows a forward slash before the ending angle bracket of the start tag to denote an end tag. The layer element has two attributes, name, with value "Background", and visible, with value "true". Remember that none of these strings are XML reserved words.

In the case of the <diagramdata> element, it has tons of subelements, most of which are <attribute> elements (this is not an XML reserved word). As you can see, there's an <attribute> element for the drawing's background, an <attribute> element for the "paper" used with the drawing (size, margins, portrait/landscape and the like), an <attribute> element for the grid to be used, and an <attribute> for something called "guides", of which there's apparently a horizontal and a vertical instance. People hear me well, a lot of the Dia application is specified by this layout, and this layout is extremely readable. Behold the power of XML!

You'll notice a couple other things. <attribute> elements contain other <attribute> elements (or don't, as the individual element's data dictates). XML allows storage of very freeform data. You'll also notice a <composite> element. This is intended as a container for multiple <attribute> elements.
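If you'd rather peek at a gzipped drawing from a program than from zless, here's a rough Java sketch. It assumes the JDK's GZIPInputStream and JAXP parser; the class name and helper names are made up, and the built-in sample stands in for a real blank.xml.gz when no file name is given on the command line.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class GzXmlPeek {
    // Gunzip on the fly and hand the XML straight to the DOM parser.
    static Document parseGzip(InputStream gzipped) {
        try (InputStream xml = new GZIPInputStream(gzipped)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Names of the element children of the document element (the "second level").
    static List<String> secondLevelNames(Document doc) {
        List<String> names = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    // Helper so the demo can run without a Dia file on disk.
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass a file such as blank.xml.gz as the first argument, or run the built-in sample.
        InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(gzip(
                        "<diagram><diagramdata/><layer name=\"Background\" visible=\"true\"/></diagram>"));
        System.out.println(secondLevelNames(parseGzip(in)));
    }
}
```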

What's in an Ellipse

We're going to draw an ellipse, save it as ellipse.xml.gz, and then compare it with blank.xml.gz. The result will be the Dia application's XML representation of an ellipse.

From the Dia toolbox, choose file and open, and open blank.xml.gz.
In the tool box, click the ellipse tool.
In the drawing, click and drag to lay down the ellipse. Drag in such a way that the ellipse is considerably wider than it is high.
In the drawing, right click, choose file/save as, and name the modified file ellipse.xml.gz.
In the drawing, right click, choose file/close to close the drawing.



Now use the following commands to view the difference between blank.xml.gz and ellipse.xml.gz:

    $ gunzip ellipse.xml.gz blank.xml.gz
    $ diff ellipse.xml blank.xml | less

You get something like the following (19 lines found only in ellipse.xml, in place of a single line of blank.xml):

    58,76c58
    < [19 lines of ellipse XML not shown]
    ---
    > [1 line not shown]

Look it over for a second. All that happened was a single <object> element, whose type attribute has value "Standard - Ellipse", has been inserted into the <layer> element whose name attribute has value "Background". The <object> element contains several <attribute> elements describing all the "attributes" you'd expect of an ellipse, such as position (X and Y coords), the top left corner (X and Y coords), the width and the height. There's also an <attribute> element called obj_bb which is the four points comprising the bounding box of the object. It's all very readable.

Notice there's no color listed? Let's give the ellipse a fill color and observe the change.

How Colors are Implemented

First, be sure to gzip ellipse.xml:

    gzip ellipse.xml

Now open ellipse.xml.gz in Dia. Drag a rectangle around the ellipse to select it without the risk of moving it. Now right click the ellipse, choose dialogs, then properties. Click the color bar next to "Fill colour", and crank the blue all the way down until it's a pure yellow. Now click the color bar next to "Line Colour", and crank up the blue until the line is pure blue. Right click the drawing, choose File/save as, and save the drawing as colors.xml.gz. Finally, right click the drawing and choose file/close to close the drawing.



Now use the following commands to view the difference between colors.xml.gz and ellipse.xml.gz:

    $ gunzip colors.xml.gz ellipse.xml.gz
    $ diff colors.xml ellipse.xml | less

You get the following (9 lines found only in colors.xml):

    75,83d74
    < [9 lines of color XML not shown]

It's simple to see what happened. An <attribute> element with attribute name having value "inner_color" was created with a subelement called <color>, with a val attribute whose value is "#ffff00" (pure yellow), to describe the fill color. An <attribute> element called "border_color" was created with a <color> subelement with attribute val valued at "#0000ff" (pure blue), to describe the line color. And an <attribute> element called "border_width" was created with a subelement called <real>, whose val attribute has value "0.1". Note that when I say the <attribute> elements were called such and so, what I really meant was that they had an XML attribute called name, and the value of that attribute was such and so.

If you're like me, you wonder why a border width element was created. I'd guess that there was no border until you specified its color.

: NOTE :

Look what the application has done. Every property of the ellipse is described with an <attribute> element. They could have had special elements named after each property, but they didn't. Likewise, they have a subelement to describe the value of the property. Each such subelement has a name corresponding to what is being measured, and a value corresponding to the actual value of the property. Why did they do this? They could just as easily have given every property its own specially named element.

But that wouldn't have been as generic. What the authors of Dia have done is to create a system where any property can be described, and all properties can be read into the app. This is how the pros do XML.
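To see why the generic layout pays off, here's a hedged sketch of a reader that loads every property of an object into a map without knowing any property name in advance. The sample markup is my approximation of the layout described above (an attribute element with a name, holding one subelement with a val); the color and real tag names, the class name and the helper names are assumptions.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class PropertyReader {
    // Parse an XML string with the JDK's bundled JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Map each <attribute name="..."> child to the val of its first subelement.
    static Map<String, String> properties(Element object) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Node n = object.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !n.getNodeName().equals("attribute")) {
                continue;
            }
            Element attribute = (Element) n;
            for (Node v = attribute.getFirstChild(); v != null; v = v.getNextSibling()) {
                if (v.getNodeType() == Node.ELEMENT_NODE) {
                    props.put(attribute.getAttribute("name"), ((Element) v).getAttribute("val"));
                    break;
                }
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Approximation of an ellipse object; tag names are assumptions.
        String xml = "<object type=\"Standard - Ellipse\">"
                + "<attribute name=\"inner_color\"><color val=\"#ffff00\"/></attribute>"
                + "<attribute name=\"border_color\"><color val=\"#0000ff\"/></attribute>"
                + "<attribute name=\"border_width\"><real val=\"0.1\"/></attribute>"
                + "</object>";
        properties(parse(xml).getDocumentElement())
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Add a brand new property to the XML and this reader picks it up with no code change, which is the whole point of the generic design.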



Anyway, you now see how it handles colors. We've done quite a bit of work manipulating Dia and noting the result in XML. Now let's go the other way.

Modifying a Drawing with a Text Editor

Because you gunzipped colors.xml.gz, you now have an XML file called colors.xml. Using your favorite text editor, edit that file. Pay particular attention to the two <attribute> elements named elem_width and elem_height, which might not be next to each other in your experiment.

As you remember, you made the ellipse much wider than it was high. That's why the elem_width is much bigger than elem_height. Using the cut and paste of your text editor, carefully exchange the values associated with elem_width and elem_height, and then resave the file. If things go as expected, pulling the diagram up in Dia should now show an ellipse higher than wide.

Naturally, you need to gzip the file again:

gzip colors.xml

And finally pull the drawing up in Dia. And sure enough, the ellipse is now higher than wide (if not, troubleshoot).

Creating a Dia Exploration Script

We really didn't need to go through all the gzipping and gunzipping, file save and file close and text editing and lessing. We did that just to minimize the extraneous variables so you could see the exact effects of tiny changes in the drawing. Now it's time to make a script to quickly alternate between the graphic and text view of drawings, with the ability to change in either view and view the changes in the other. Here's the script:

    #!/bin/bash
    resp='y'
    echo $resp
    while test "$resp" = "y"; do
        dia test.xml.gz
        rm -f test.xml      # clear any stale copy so gunzip doesn't stop to ask
        gunzip test.xml.gz
        vi test.xml
        gzip test.xml
        echo -n "Do it again? (y/n)===>"
        read resp
    done

Save the preceding script as rdia and chmod rdia as executable by all (chmod a+x rdia).



: NOTE :

If you don't like the VI editor, substitute the name of your favorite Linux editor for vi in the script.

This script won't function if there doesn't exist a test.xml.gz, so before using the script go into Dia, create a blank drawing, and save it as test.xml.gz. Finally, run the script and experiment editing both the XML text and the Dia graphics, and note how changes in one environment appropriately change the other.

! CAUTION !

This script will not proceed to editing the XML file until you completely exit the Dia application. You exit Dia after saving your work by clicking the close icon on the Dia toolbox.

If you "gum up" the XML so badly that you can't pull up the file in Dia, simply create a new blank test.xml.gz in Dia.

Exploring Shapes and Connectors

While in the rdia loop, make a drawing with a single ellipse, a single rectangle, a single triangle (make it with the polygon button), and a single line. View it in the XML view, noting how each shape is specified in XML. Feel free to move and magnify things. Go back and forth. Have fun.

Who's on Top?

In Dia, make a yellow ellipse on top of a blue rectangle. Make sure the yellow ellipse is not completely inside the blue rectangle, and that the yellow ellipse doesn't completely cover the blue rectangle. If you have trouble putting the yellow ellipse on top, right click the yellow ellipse, choose objects, and then click bring to front. The yellow ellipse will now be on top. You should be able to see parts of the blue rectangle below it, and the yellow ellipse should not be entirely inside the blue rectangle. Save and exit Dia.

Once in the XML file, you'll notice that the blue box object appears before the yellow ellipse object. That's intuitive, because objects appearing later get thrown on the canvas "on top of" existing objects. To test this theory, cut the XML for the yellow ellipse object, and place it above the XML for the blue box object, so the ellipse now comes first. Save the file and continue. If everything's gone right you should now see the blue box on top of the yellow ellipse.



See the beauty of XML. A concept like "how do I signify which objects are on top of which others" would normally be difficult to implement. But if the app stores its info in XML, it's a no-brainer.

Notice that all objects are inside a single layer object. If you want to have a little fun, within the Dia environment send different objects to different layers, then view the results in XML. Note that I've had cases where seemingly correct changes to layers caused Dia not to load the file, and I've even seen where simply saving the file in VI caused Dia not to load the file. The good news is I've always been able to correct this type of problem by deleting the new layer in VI, after which Dia would load the file. When all is working well, you can manipulate layers in the XML file and have the results show up exactly as expected in Dia.

Experiment. The possibilities are endless.

A Little Theory

In your editor, go to the top of the XML file and note the following:

The first line is the XML Declaration, and basically gives the XML version. The second line declares a namespace, called dia, and equates it with the URI http://www.lysator.liu.se/~alla/dia/. Note that I said URI, not URL. There's a subtle difference. But anyway, don't expect to find anything at that URI. It would be coincidence if you did.

The second line is a namespace declaration. It declares a namespace called dia, associating it with the unique identifier "http://www.lysator.liu.se/~alla/dia/". The reason URIs are used is because they are the best hope for a unique identifier worldwide. For instance, if I were to author a new XML file and wanted to give it a new namespace, I could name it after a directory on Troubleshooters.Com, knowing that Troubleshooters.Com is mine to control. Of course, this doesn't stop someone else from using Troubleshooters.Com as part of their unique identifier, but that would be very bad etiquette.

A namespace is simply an "area" or "scope" (for want of better words) within which each name is guaranteed unique. This is important as Internet enabled apps use more and more XML files from more and more sources. The basic idea is that all elements could be prepended with dia:, in which case, for instance, a Dia element would be differentiated from a same-named element belonging to, let's say, some other application's namespace.

For the time being this isn't important in the learning process, but remember it in case you later see this syntax. And remember, you WILL NOT find a DTD or schema for the XML file at the URI. The URI is just a method of unique identification.
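If you want to see prefixes actually doing their job, here's a small sketch using a namespace aware parser. The dia URI is the one from the drawing file; the second namespace (http://example.com/other), the prefixed layer elements and the class name are invented for the demonstration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceDemo {
    static final String DIA = "http://www.lysator.liu.se/~alla/dia/";
    static final String OTHER = "http://example.com/other";   // hypothetical second source

    // Parse with namespace awareness switched on (it is off by default in JAXP).
    static Document parseNs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Count elements by namespace URI plus local name, ignoring whatever prefix was used.
    static int count(Document doc, String uri, String localName) {
        return doc.getElementsByTagNameNS(uri, localName).getLength();
    }

    public static void main(String[] args) {
        String xml = "<dia:diagram xmlns:dia=\"" + DIA + "\" xmlns:other=\"" + OTHER + "\">"
                + "<dia:layer/><other:layer/><dia:layer/></dia:diagram>";
        Document doc = parseNs(xml);
        System.out.println("dia layers:   " + count(doc, DIA, "layer"));
        System.out.println("other layers: " + count(doc, OTHER, "layer"));
    }
}
```

The lookup is by URI, not by prefix, so two files can pick different prefixes for the same namespace and still be treated alike.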



Exploring Groups

So much for theory -- back to action. Here we'll explore groups of objects in Dia. Run your ./rdia script, and delete all objects from the drawing using the Ctrl+X keystroke combination. Also, if you created extra layers in prior exercises, be sure to delete them. Right click anywhere on the drawing, choose dialogs, then layers. The X button is what enables you to delete a layer. Be sure not to delete the original layer, which is probably called Background.

Now draw yourself an ellipse (wider than high) and a rectangle (wider than high). Now drag a band around both to select them, right click on either object, select objects then group, and note that instead of both objects being selected, the pair of objects is selected. You cannot select one object without selecting both, and when you move one both move. Save the drawing and quit Dia.

Viewing it in XML, you see that the two objects are now between a pair of <group> tags. The complex task of grouping objects is handled just that simply.

Go back to Dia, select the group, right click either object in the group, and choose objects and ungroup. The objects become two separate objects now. Save the drawing and exit Dia. Looking at the XML, you'll notice everything's the same except the pair of <group> tags is gone.

Exploring Connection Drawings

Let's make a real diagram and see how it's implemented in XML. We'll be creating a diagram similar to the following:

First erase everything from the drawing with the Ctrl+X keystroke combination. Next, draw the battery, resistor and zener diode using their buttons from the Circuit template group, which will probably be the default. Each of these graphic circuit components has connection points at its leads, so use the Zig Zag Line button to create zig zag lines, clicking on one connection point and dragging to the one on the next electronic component. Try to arrange the components so they look something like the drawing above. When you have something resembling the preceding drawing, save it and exit Dia to see the XML you've created.

: NOTE : If you're really having trouble drawing this drawing in Dia, see http://troubleshooters.com/tpromag/200103/drawinghelp.htm.

First, notice that you've created the following objects:

Observe that each object has an id attribute with values from "O0" through "O5". That fact will come in handy investigating the mechanics of line to circuit component connections. Note that depending on the order in which you placed components and lines, your ID numbers may vary.

Next, notice that each Zig Zag line has a <connections> element, containing two <connection> elements. The following code shows the <connections> elements for the first, second, and third zigzag lines respectively:

So let's describe each zigzag line's connections in English. Line 1 connects to handle 0 of object O0 (the battery), and also to handle 1 of object O1 (the resistor). It connects the battery to the resistor. Zigzag 2 connects to handle 0 of object O1 (the resistor, and please remember that handle 1 of the resistor is already taken), and also to handle 1 of object O2, the zener. It connects the resistor to the zener. Zigzag 3 connects to handle 0 of object O2 (the zener), and handle 1 of object O0 (the battery). It connects the zener to the battery, completing the circuit.

You can actually modify the XML to put the program in an illegal state. On the following line:



Change the 0 to 1 in the handle="0" attribute, save, and pull it up in Dia, and note that everything looks fine. Now click on the zener diode, and note that Dia aborts. Change the 1 back to 0 and confirm that the illegal state has been taken care of.

Basically, what happened is that a line with nonzero length had its begin and end handles at the same point -- an error condition in both Dia and mathematics.
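A program that reads such a file could catch this particular illegal state before it bites. The following is only a sketch under stated assumptions: it treats a line as suspect when the connection elements inside one connections element reuse the same handle value. The handle and id attributes come from the drawing file discussed above; the to attribute in the sample, the layer wrapper, and all class and method names are my own guesses for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ConnectionCheck {
    // Parse an XML string with the JDK's bundled JAXP parser (an assumption).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Return the ids of objects whose <connection> elements reuse a handle value.
    static List<String> linesWithDuplicateHandles(Document doc) {
        List<String> suspects = new ArrayList<>();
        NodeList groups = doc.getElementsByTagName("connections");
        for (int i = 0; i < groups.getLength(); i++) {
            Element connections = (Element) groups.item(i);
            NodeList each = connections.getElementsByTagName("connection");
            Set<String> seen = new HashSet<>();
            boolean duplicate = false;
            for (int j = 0; j < each.getLength(); j++) {
                String handle = ((Element) each.item(j)).getAttribute("handle");
                if (!seen.add(handle)) {
                    duplicate = true;   // same handle listed twice on one line
                }
            }
            if (duplicate) {
                suspects.add(((Element) connections.getParentNode()).getAttribute("id"));
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Hypothetical fragment: O3 is well formed, O4 has been "gummed up" by hand.
        String xml = "<layer>"
                + "<object id=\"O3\"><connections>"
                + "<connection handle=\"0\" to=\"O0\"/><connection handle=\"1\" to=\"O1\"/>"
                + "</connections></object>"
                + "<object id=\"O4\"><connections>"
                + "<connection handle=\"1\" to=\"O1\"/><connection handle=\"1\" to=\"O2\"/>"
                + "</connections></object>"
                + "</layer>";
        System.out.println("suspect lines: " + linesWithDuplicateHandles(parse(xml)));
    }
}
```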

That brings up an interesting hypothesis. Perhaps we should strive to make XML apps so mutable that you can't put them in an illegal state by editing the XML. Such an app would indeed have all its logic in the XML file, and the executable app would merely be a viewer. Perhaps that hypothesis is a little over the top, but it's an interesting thought.

Creating a new Template Shape

Indeed, part of Dia's logic is in its XML files. The template shapes are determined by two XML files and one file with a dot picture. The XML files determine the properties of the template shape -- its image and its connection points. The dot file determines the icon on its button. During this exercise we'll add a new template shape to the beginning of the circuit template group. We'll call that new shape a smily, and it will be a smily face, in a rectangular head, with a connection point at each ear. These are the four steps to making this new shape:

1. Find the Dia shared directory
2. Create the icon, shapes/Circuit/smily.xpm
3. Create the smily shape, shapes/Circuit/smily.shape
4. Add the Smily Face shape to the Circuits template group, sheets/Circuit.sheet

#2 creates the icon. #3 defines the shape, and associates it with the icon. #4 incorporates the new shape in the Circuits template group (sheet).

Find the Dia shared directory

It's probably going to be /usr/share/dia, and it will have two subdirectories, shapes and sheets. The best way to find it is to find Circuit.sheet, which resides in the sheets directory under the Dia shared directory. First try the locate command:

    $ locate Circuit.sheet

If that doesn't produce results, do a brute force search through the /usr tree:

    # find /usr -type f | grep "Circuit\.sheet"
    /usr/share/dia/sheets/Circuit.sheet
    /usr/src/RPM/SOURCES/dia-0.86/sheets/Circuit.sheet
    #

In the preceding example, the Dia shared directory would be /usr/share/dia.

Create the icon, shapes/Circuit/smily.xpm



This file specifies the appearance of the icon on the new shape's button. Copy and paste the contents of the following box to a file called smily.xpm in the shapes/Circuit directory below the Dia shared directory:

/* XPM */
static char * smily_xpm[] = {
"22 22 3 1",
"  c None",
". c #000000",
"+ c #FFFFFF",
"                      ",
"                      ",
"                      ",
"                      ",
"                      ",
"                      ",
"    ..............    ",
"    .            .    ",
"    .   .    .   .    ",
"    .            .    ",
"    .     .      .    ",
"    .  .      .  .    ",
"    .   ..  ..   .    ",
"    .    ...     .    ",
"    ..............    ",
"                      ",
"                      ",
"                      ",
"                      ",
"                      ",
"                      ",
"                      "};

As you can see, it's really just a dot picture of the icon to be displayed, plus the size and a few other properties.

Create the smily shape, shapes/Circuit/smily.shape

This is the specification of the new shape. Copy the contents of the following box to a file called smily.shape in the shapes/Circuit directory below the Dia shared directory [markup not shown; only the text values remain]:

    Circuit - Smily Face
    A smily face to brighten your day
    smily.xpm



The document element is <shape>. It has the following subelements:

Subelement       Function
<name>           The name by which this shape is known.
<description>    A human readable description of the shape.
<icon>           The icon file associated with the shape, in this case the one you made earlier.
<connections>    The connection points for persistent line connections.
<svg:svg>        The actual shape information, built from further subelements which are geometric shapes such as polygons.

Now you have a shape file describing your new template, and associating it with an icon. The last step is to inform the Circuit template group (sheet) that this shape has been added...

Add the Smily Face shape to the Circuits template group, sheets/Circuit.sheet

There's a file called Circuit.sheet in the sheets directory under the Dia shared directory. Copy that file to Circuit.sheet.org in the same directory. Now if you hopelessly mess up Circuit.sheet it won't be necessary to reinstall Dia to fix the mess. You can simply copy the original file back.

Now open Circuit.sheet with your favorite text editor. You'll note that the document element is <sheet>, with several <description> subelements, each of which gives a description in a different language. It also has one <contents> subelement. That's where the rubber meets the road, because <contents> contains many <object> subelements, each of which describes a template shape. If your file has not been modified, the first <object> will be named "Circuit - Vertical Resistor". You're going to insert the smily object before the vertical resistor. Simply copy the contents of the following box between the <contents> tag and the first <object> tag:



    En smilyface
    Une smilyface
    Eine smilyface
    A smilyface

The value of the name attribute of the <object> element must be spelled, capitalized and spaced exactly as shown. That's because that value is how Dia finds the shape file you previously created. In fact, this text must exactly match the text in the shape file's <name> element. Once you've completed this, quit Dia, restart it, and you should see the smily icon on the Circuit template. Click it, and drag on the drawing to make a smily. Connect zigzag lines to it, put it into a circuit. Save, quit, and see the results in XML. Within XML, experiment with changing the boolean value of the <attribute> elements whose name attributes are flip_vertical and show_background, and view the results in Dia.

This is XML at its best. You've just given an application a new capability, using only a text editor.
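The flip_vertical and show_background experiment can be scripted as well as done in a text editor. Here is a minimal Python sketch, assuming the saved drawing stores each setting as a named attribute element wrapping a boolean element with a val attribute. The namespace URI and the sample document are assumptions modeled loosely on a Dia save file, so compare them with your own file before trusting the script.

```python
import xml.etree.ElementTree as ET

# Assumed namespace and layout, modeled loosely on a Dia save file.
DIA_NS = "http://www.lysator.liu.se/~alla/dia/"
ET.register_namespace("dia", DIA_NS)

SAMPLE = f"""<dia:diagram xmlns:dia="{DIA_NS}">
  <dia:object type="Circuit - Smilyface" id="O0">
    <dia:attribute name="flip_vertical">
      <dia:boolean val="false"/>
    </dia:attribute>
    <dia:attribute name="show_background">
      <dia:boolean val="true"/>
    </dia:attribute>
  </dia:object>
</dia:diagram>"""


def toggle_boolean(root, attr_name):
    """Flip every boolean stored under attributes named attr_name.

    Returns the number of values that were flipped.
    """
    flipped = 0
    for attribute in root.iter(f"{{{DIA_NS}}}attribute"):
        if attribute.get("name") != attr_name:
            continue
        for boolean in attribute.iter(f"{{{DIA_NS}}}boolean"):
            new_val = "false" if boolean.get("val") == "true" else "true"
            boolean.set("val", new_val)
            flipped += 1
    return flipped


root = ET.fromstring(SAMPLE)
count = toggle_boolean(root, "flip_vertical")
print(count)
print(ET.tostring(root, encoding="unicode"))
```

Only the flip_vertical value changes here. Calling toggle_boolean again with "show_background" would flip the other setting the same way.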

Summary

The exercises in this article were long, and possibly tedious. But they were worth it, because if you did them, you now know XML intellectually and intuitively. You've seen an XML dialect that has been specified for maximum adaptability. You've seen how the masters put as much of the implementation as possible in the XML, rather than in the executable app. You've seen true round trip development between a GUI and a text editor environment.

Allow your imagination to relish all the possibilities. Imagine how XML would help you write that app you always wanted to write but never figured out how. With XML, the only limit is your imagination.

Congratulations. You've learned as much XML as you can get from reading. Now it's time to do. You've graduated. The next article walks you through writing your own XML Hello World. Enjoy.


How to use Dia to learn good XML construction. Dia is a vector drawing package that stores its drawing information in XML format. Modifying the drawing modifies the XML, and modifying the XML modifies the drawing.

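Steve Litt is the author of "Troubleshooting Techniques of the Successful Technologist" (http://www.troubleshooters.com/bookstore/ttech.htm). He can be reached at Steve Litt's email address (http://troubleshooters.com/email_steve_litt.htm).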

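Steve Litt is the developer of The Universal Troubleshooting Process troubleshooting courseware (http://www.troubleshooters.com/utp/tcourses.htm). He can be reached at Steve Litt's email address (http://troubleshooters.com/email_steve_litt.htm).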

Where to Go From Here

By Steve Litt

If you've done all the exercises you're pretty comfortable at XML, and could probably take on an XML project at work tomorrow. Does that mean you know all necessary XML information? Not even close.

We covered the tip of the iceberg. Just the basics of reading and writing XML from a program. There's a world of other XML knowledge to gain. There are excellent XML processing models besides SAX and DOM. Schemas offer an alternative to DTD validation. There's the entire discipline of rendering, complete with XSL, XSLT, and XML frameworks like Cocoon from Apache Software Foundation. There are specific XML variants such as SVG for vector graphics, as well as chemical and mathematical markup languages. There's even an XML dialect called XML-RPC for remote procedure calls. And there's much, much more.

Luckily, the information is easy to find, and you now have the skills to exploit it. This article gives some web resources and a couple of books that you'll find helpful.

Here are some URL's that will help you with specific projects. And we definitely covered enough so that computer programs you write can access and work with XML. Without the info in this tutorial, going farther would have been folly. But there's a world of other XML available to you:

1. XML Schemas: http://www.w3.org/XML/Schema.html
   o XML Schemas are an alternative to DTD's, probably a better alternative. I just didn't have enough time to do justice to schemas, but the preceding URL gives an excellent starting place.

2. XML processing models other than SAX and DOM:
   o pyxie: http://www.pyxie.org
     "Python specific" hierarchy storage system that appears to have Perl support. An ultra simple alternative to DOM.
   o JDOM: http://www.jdom.org
     Lightweight, simple Java specific XML handler with a native Java syntax.
   o Grove: http://search.cpan.org/doc/KMACLEOD/XML-Grove-0.46alpha/lib/XML/Grove.pm
     Grove allows manipulation of an XML hierarchy as native Perl hashes and arrays.
   o XML::Simple: http://search.cpan.org/doc/KMACLEOD/XML-Grove-0.46alpha/lib/XML/Grove.pm
     An ultra simple event model (like SAX) XML interface for Perl. If XML is a small part of the project, consider this.

   o XML::Twig: http://search.cpan.org/doc/KMACLEOD/XML-Grove-0.46alpha/lib/XML/Grove.pm
     This appears to be a "best of both worlds", with the ability to store data in trees, but possessing callbacks and other features allowing the processing of just the parts of a huge XML file that the program finds necessary. Given Perl's less than stellar support for DOM, you should give Twig some serious consideration.

3. XSL: http://www.w3.org/Style/XSL/
   o eXtensible Stylesheet Language. In a nutshell, XSL endeavors to define how XML is transformed and rendered. The transform part is done by XSLT.

4. XSLT: http://www.w3.org/TR/xslt
   o XSLT is a language that defines XML transformations.

5. Cocoon: http://xml.apache.org/cocoon/
   o An XML publishing framework endeavoring to split XML work into XML authoring, XML processing, and XSL rendering. The result is the ideal we've all been looking for -- an XML file that can be simultaneously rendered as HTML (possibly different browser specific HTML formats), Postscript files, and other forms.

6. XML-RPC: http://www.xmlrpc.com/
   o This is just too cool. At the heart of it is an XML dialect defining what procedure is to be called, and what arguments are to be passed to it. What comes back is another XML document, with each returned argument and its value defined. Can you say "distributed computing"? This could become very corporationally correct. See the XML-RPC for Newbies discussion at http://davenet.userland.com/1998/07/14/xmlRpcForNewbies and the RPCDebugger at http://frontier.userland.com/stories/storyReader$1077 for further details.
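To make the XML-RPC description concrete, here is a small sketch using Python's standard xmlrpc.client module. It only builds and decodes the XML documents, with no network involved. The procedure name calc.add and its arguments are hypothetical, chosen purely for illustration, and the response is built by hand to stand in for what a server might send back.

```python
import xmlrpc.client

# Hypothetical procedure name and arguments, for illustration only.
request_xml = xmlrpc.client.dumps((2, 3), methodname="calc.add")
print(request_xml)  # the XML document naming the procedure and its arguments

# Decode the request the way a server would.
decoded_args, decoded_method = xmlrpc.client.loads(request_xml)

# A server would answer with another XML document. Build one by hand here
# to stand in for that reply, then decode it the way a client would.
response_xml = xmlrpc.client.dumps((5,), methodresponse=True)
result_params, _ = xmlrpc.client.loads(response_xml)
print(response_xml)
print(result_params[0])
```

Printing request_xml shows the point of the paragraph above: the call itself is nothing more than a short XML document.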

Excellent XML Books

Beware. Most XML books talk about little else except XML syntax and validation. Most of the XML books out there don't understand that programmers learn by programming, not by memorizing YAAFAX (Yet Another Arcane Fact About Xml). Most XML books out there devote several chapters to HTML, SGML, browser rendering, and blatantly Microsoft specific applications, while having little or no programming to show how a real programmer can manipulate and render XML.

Most XML books out there are trash. That's why I wrote this tutorial -- to undo the damage done by those so-called XML books that did nothing but scare would-be XML programmers out of XML, before they started. I assume that by this point in this month's Troubleshooting Professional Magazine, you understand that XML is anything but rocket science.

Yes, most XML books are trash. But in my travels I discovered some good ones, and one truly outstanding one.

Java and XML by Brett McLaughlin: ISBN 0-596-00016-2

This is an astoundingly excellent book! Because it's intermediate level, once you've finished this tutorial you can graduate directly to this book.

The first 8 chapters are built quite a bit like this tutorial, with code progressions to walk you through the process of learning the principles. McLaughlin starts the reader on SAX, then walks you through creating, parsing and interpreting DTD's and Schemas, and finally gives a thorough indoctrination in DOM and JDOM. Chapter 12, "Creating XML with Java", is also necessary to basic XML programming. All the material is thorough and rigorous, with programs done in a Javanically compliant way.

From there he goes on to discuss all the Kewl things you can do with XML now that you know its principles, including XML publishing frameworks, XML-RPC, XML and Enterprise JavaBeans, and finally business to business examples (think that might be a valuable skill?).

This is the book I would have written if I knew as much XML and Java as McLaughlin.

I don't say that lightly. In scope, depth, organization, and writing style, I find "Java and XML" quite similar to Samba Unleashed.

If you have anything to do with XML, even if you're not a Java person -- get this book!

XML Processing with Python by Sean McGrath: ISBN 0-13-021119-2

This book was where I got my first real XML knowledge. It's an excellent book, especially if you consider Python easier than Perl and Java. It comes with a CD full of everything you'll need for the exercises in the book. There are chapters on DOM and SAX. If you're a Python programmer, this is the XML book for you.

One word of warning: "XML Processing with Python" is heavily weighted toward Pyxie XML methodology, and the utilities and tools on the CD aren't those you'd typically download from python.org. So if you want to learn generic XML, especially language independent, that's a disadvantage. BUT, if you're a Python guy (and most of us are whether we admit it or not), the tools that come with this book, ESPECIALLY Pyxie, can have you up and running with XML in record time.

The DOM Specification:

One might say this isn't a book, but the Level 2 Core PDF is 107 pages. Taken together, the Level 2 documents are as big as a book. And man, they're well written and informative. I didn't really understand XML until I read the DOM spec, and then it was obvious. Read it. URL's below:

http://www.w3.org/TR/DOM-Level-2-Core/
http://www.w3.org/TR/DOM-Level-2-Views/
http://www.w3.org/TR/DOM-Level-2-Style/
http://www.w3.org/TR/DOM-Level-2-Events/
http://www.w3.org/TR/DOM-Level-2-Traversal-Range/

http://www.w3.org/TR/DOM-Level-2-HTML/

XML Devcon

There comes a time when reading and performing exercises just aren't enough. You want to rub elbows with your peers. Luckily, there are two more XML Devcons this year. www.xmldevcon2001.com lists this year's remaining conferences as:

o New York City: Conference April 8-11, Exhibition April 9-10
o San Jose: Conference Fall 2001

This is put on by Camelot Communications, the same people who put on ApacheCon. I went to the 2000 ApacheCon in Orlando, and it rocked. These conferences sell out, so if you want to go to the New York conference, sign up soon at http://www.xmldevcon2001.com/NY/html/registration.html. You can order a free Exhibit Pass for the New York event, good for all Expo Days, that gets you into the exhibit floor, the Keynotes, the Technical Briefings, the Management Briefings, and All Special Events. Go to the registration URL and click "For Free Exhibit Pass".

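Steve Litt is the author of Rapid Learning: Secret Weapon of the Successful Technologist (http://www.troubleshooters.com/bookstore/rl.htm). He can be reached at Steve Litt's email address (http://troubleshooters.com/email_steve_litt.htm).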

Apache Software Foundation and W3C Rule!

By Steve Litt

As you can imagine, writing this issue of Troubleshooting Professional took some time. And the more time I spent researching, the more obvious it became that the Apache Software Foundation and the World Wide Web Consortium are two of the most powerful software entities on earth. I think of them as the legislative and executive branches. W3C manages the creation of the specifications. And Apache Software Foundation maintains the actual projects.

When it comes to standards based specs, look what W3C has to offer. They offer the standard specifications for XML, XSL, XML Schemas, DOM, HTML, SVG (Scalable Vector Graphics variant of XML), and Cascading Style Sheets. There's a working draft for XQuery -- a query language to extract info from XML docs. They have a working draft of the WAI -- the Web Accessibility Initiative. I've just scratched the surface. And best of all, these are *standards*. They won't be changed or kidnapped at the whim of a corporation.

As I researched for this month's magazine, it started looking like whatever W3C recommends, Apache Software Foundation builds or maintains. Sometimes the projects are initiated at corporations, but ASF has a reputation for running Open Source projects, so when the originators want to leave their project in good hands and move on to other things, they leave it to ASF. And of course, many ASF projects start at ASF. During the writing of this magazine, I saw so much kewl stuff from ASF that I almost forgot they're the source of the world's most popular web server.

I'd like to take a quick look at just a few of the software tools you can download from the Apache Software Foundation website.

Near and dear to my heart is Xerces, the "parser" that made it possible to do this tutorial. The reason I put quotes around the word is Xerces can do things far beyond mere parsing. It contains the entire DOM interface, and all sorts of other things. And every bit of it works consistently, exactly like you'd expect it to. I had forgotten how much fun it is to work with software tools so solid you can spend your mental effort in design, rather than workaround. I used Xerces for Java, but there appear to be versions for C++ and Perl. And according to an email I just got, the Perl version now works with Linux.
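For a feel of what "the entire DOM interface" means in practice, here is a minimal sketch. It deliberately uses Python's built-in xml.dom.minidom rather than Xerces, so it says nothing about Xerces itself. The method names (createElement, appendChild, setAttribute) are the ones the W3C DOM defines, and the little greeting document is made up for the example.

```python
from xml.dom import minidom

# A made-up document, parsed into a DOM tree.
doc = minidom.parseString("<greeting><to>World</to></greeting>")
root = doc.documentElement

# Standard DOM calls: create nodes, attach them, set an attribute.
sender = doc.createElement("from")
sender.appendChild(doc.createTextNode("Dia"))
root.appendChild(sender)
root.setAttribute("lang", "en")

# Walk the children and keep only element nodes.
tags = [node.tagName for node in root.childNodes
        if node.nodeType == node.ELEMENT_NODE]
xml_out = root.toxml()
print(tags)
print(xml_out)
```

Because the DOM is specified independently of any one language, the same sequence of calls should look familiar in a Java DOM implementation.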

Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types.

Cocoon is an XML framework. The way I interpret their description, you author your content in XML, which has no appearance component. There's then a logic component, which I don't understand, and finally an XSL component to map each XML entity to an appearance. The way I read this, the subject matter expert can write his content without having to worry about appearance. Obviously I don't understand the full picture. You might want to have a look yourself. It sounds mighty powerful.

What can I say about SOAP: :-) :-) :-). My understanding is that Microsoft originated SOAP, which is a lightweight data exchange mechanism for distributed computing. Now it's been submitted to W3C. IBM has implemented it, and as you can see Apache supports it. The way it looks to me, the community intercepted Microsoft's pass and scored a touchdown.

Then there's Batik: Whoooaaa! This is a series of core modules to work with the SVG (Scalable Vector Graphics) XML dialect. Something very similar to the data format of Dia. Batik works with Java. Imagine being able to draw a picture in a Vector Drawing Program (vector drawings consume much less bandwidth than their bitmap graphics cousins), and have it visible in any browser with the proper Batik plugin! Ya know, I'm sick of creating diagrams in Dia and then having to tweak them in Gimp to show them to the world. I'm not saying Batik can do that yet, but it's what crossed my mind when I read what it is. The W3C tested six SVG implementations, including Adobe and JASC (the Paintshop Pro people), and Batik did exceptionally well in all areas except animation. See the results at http://www.w3.org/Graphics/SVG/Test/BE-ImpStatus.
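Since SVG is just another XML dialect, a vector drawing can be produced with nothing more than an XML library. The sketch below has nothing to do with Batik's own API. It simply uses Python's standard ElementTree to write a tiny smiley as SVG, and the coordinates and colors are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace


def smiley_svg(size=100):
    """Build a small SVG smiley as an ElementTree element."""
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": str(size), "height": str(size),
                      "viewBox": "0 0 100 100"})

    def add(tag, **attrs):
        # Keyword names use underscores; SVG attribute names use hyphens.
        fixed = {key.replace("_", "-"): str(val) for key, val in attrs.items()}
        return ET.SubElement(svg, f"{{{SVG_NS}}}{tag}", fixed)

    add("circle", cx=50, cy=50, r=45, fill="yellow", stroke="black")  # face
    add("circle", cx=35, cy=40, r=5)                                  # left eye
    add("circle", cx=65, cy=40, r=5)                                  # right eye
    add("path", d="M 30 60 Q 50 80 70 60", fill="none",
        stroke="black", stroke_width=3)                               # mouth
    return svg


svg_text = ET.tostring(smiley_svg(), encoding="unicode")
print(svg_text)
```

Save the printed text to a file ending in .svg and an SVG-capable viewer should show the face. That is the same "drawing is data" idea the Dia exercises demonstrated.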

Summary

What do you think of when someone utters the phrase "the most powerful software entity of our time"? Until a few days ago I thought of the crumbling but still mighty empire. And if money were the measure of power, Microsoft would still be the most powerful. But if quality, reliability, and staying power are the measure, W3C and ASF far surpass mere corporations.



So the next time you get an assignment in "new technology", your very first move should be to check W3C for a recommendation or working draft, and to check Apache Software Foundation for an implementation. A