Benefits of Jaxp


  • 8/3/2019 Benefits of Jaxp


    The benefits of JAXP

    One of the most important technologies available in Java is the set of APIs used to work with XML. There are basically two ways to work with XML documents. SAX is an event-driven means of processing XML that uses callbacks to handle the relevant events. DOM works with an in-memory tree structure of the XML document. Sun Microsystems created the Java API for XML Processing (JAXP) toolkit, which makes XML manageable for all developers. It is a key component for exploiting the possibilities of XML technology, such as building web services.

    In this article I'm assuming that you have some basic knowledge of XML, although you may not know very much about XML parsing. If not, there are a large number of books available to help you understand the basics of XML. Now let's get started!

    There are two key things for any developer using JAXP to remember when deciding which of the two APIs to use for parsing an XML document. If you are focused on making one pass through the document and want to use the events raised along the way to capture key information, then the Simple API for XML (SAX) is the API you want. If you are looking to manipulate, transform, or query a document, then you are better off using the Document Object Model (DOM). In fact, as exposed through JAXP they are better described as abstraction layers than as APIs, since you can plug in the parsers you prefer to perform these operations.

    The Basics

    To use a parser, irrespective of what you are trying to do, the process is generally the same. The steps are the following:

    1. Create a parser object

    2. Pass your XML document to the parser

    3. Process the results

    With this process in mind, one can start to build applications that take advantage of XML. Of course, the process of building applications or web services is more involved than this, but it shows the typical flow for an application using XML.
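    The three steps can be sketched in a few lines of Java. This is only a minimal illustration that assumes a DOM parser and an XML document held in a string; the class name ThreeStepParse and the sample XML are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name; the sample XML below is invented for illustration.
class ThreeStepParse {

    static String rootElementName(String xml) throws Exception {
        // Step 1: create a parser object (here, a DOM DocumentBuilder).
        DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Step 2: pass the XML document to the parser.
        Document doc = parser.parse(new InputSource(new StringReader(xml)));

        // Step 3: process the results (here, just read the root element's name).
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootElementName("<order><item>book</item></order>")); // prints "order"
    }
}
```

    A SAX parser follows the same flow, except that step 3 happens inside callbacks while step 2 is still running.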

    Types of parsers

    There are different ways to categorize parsers. There are parsers that support the Document Object Model (DOM) as well as those that support the Simple API for XML (SAX). The parsers using these abstraction models are written in a number of languages, including Java, Perl, and C++. One can also differentiate between validating and non-validating parsers. XML documents that use a schema (or older documents that use a DTD) and follow the rules defined in that schema or DTD are called valid documents. XML documents that follow the basic tagging rules are called well-formed documents. The XML specification requires all parsers to report errors when they find that a document is not well-formed. Validation, however, is a completely different issue. Validating parsers validate XML documents as they parse them. Non-validating parsers ignore any validation errors. In other words, if an XML document is well-formed, a non-validating parser doesn't care whether the document follows the rules specified in its schema (if any).
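    The difference between well-formed and valid can be seen with a non-validating parser. The sketch below is an illustration only: the class name WellFormedCheck and the sample documents are assumptions, and it uses the JAXP DocumentBuilderFactory with validation left off. A document with no schema or DTD at all still parses, while a mismatched tag is always rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the sample documents are invented for illustration.
class WellFormedCheck {

    static boolean isWellFormed(String xml) throws ParserConfigurationException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // already the default; stated for emphasis
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors and stays silent otherwise.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors must be reported by every parser.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<note><to>Ann</to></note>")); // true
        System.out.println(isWellFormed("<note><to>Ann</note>"));      // false: <to> is never closed
    }
}
```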

    The benefits of non-validating parsers

    The benefit of using a non-validating parser is the gain in speed and efficiency from skipping validation of the document. It takes a significant amount of effort for an XML parser to process a schema and make sure that every element in an XML document follows the rules of the schema. One would only skip validation if one is confident that the XML document is already valid (either something that has been used within your organization or something from a trusted source), so there's no point in validating it again. Another scenario is when you want to find all of the XML tags in a document. Once you have found them, you can extract the data they contain and process it.

    The Simple API for XML (SAX)

    The SAX API is an event-driven means of working with the contents of XML documents. It was developed by David Megginson and other members of the XML-Dev mailing list. When you parse an XML document with a SAX parser, the parser generates events at various points in your document. You then use callback functions to decide what to do with each of those events. A SAX parser generates events at the start and end of a document, at the start and end of an element, when it finds characters inside an element, and at several other points. You write the Java code (callback) that handles each event, and you decide what to do with the information you get from the parser.

    Working with SAX

    In the SAX model, we send our XML document to the parser, and the parser notifies us when certain events happen. It's up to us to decide what we want to do with those events; if we ignore them, the information in the event is discarded. The SAX API defines a number of events. You can write Java code that handles all of the events you care about. If you don't care about a certain type of event, you don't have to write any code at all. Just ignore the event, and the parser will discard it. Here is a list of the most commonly used SAX events; there are others, but they are not relevant for this article. The callbacks for these events are part of the DefaultHandler class in the org.xml.sax.helpers package.

    startDocument - Signals the start of the document.

    endDocument - Signals the end of the document.

    startElement - Signals the start of an element. The parser fires this event when all of the contents of the opening tag have been processed. This includes the name of the tag and any attributes it might have.

    endElement - Signals the end of an element.

    characters - Contains character data, similar to a DOM Text node.
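    To see the order in which these events arrive, one can record them in a list. The following is a rough sketch rather than production code: the class name EventLog and the sample XML are assumptions. It deliberately does not override characters(), which shows that an event with no handler code is simply dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that records the name of each event it receives.
class EventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }

    @Override public void endDocument() { events.add("endDocument"); }

    @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    // characters() is not overridden, so text content is discarded.

    static List<String> record(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventLog log = new EventLog();
        parser.parse(new InputSource(new StringReader(xml)), log);
        return log.events;
    }

    public static void main(String[] args) throws Exception {
        // Prints: [startDocument, startElement:a, startElement:b, endElement:b, endElement:a, endDocument]
        System.out.println(record("<a><b>hi</b></a>"));
    }
}
```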

    A Simple SAX Parser using JAXP

    A simple SAX parser uses the following typical routine:

    1. Create a SAXParser instance using the SAXParserFactory, which instantiates a specific vendor's parser implementation.

    2. Register callback implementations (by extending DefaultHandler or another callback class).

    3. Start parsing and sit back as your callback implementations are fired off.

    JAXP's SAX component provides a simple means for doing all of this. JAXP lets you specify the parser implementation through a Java system property. The parser used by default is Sun's version of Xerces. You can change to another implementation by changing the system property or the classpath setting, without any need to recompile any code. That is the beauty of JAXP.
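    A quick way to check which implementation JAXP has plugged in is to print the class names it returns. This sketch assumes only the standard property name javax.xml.parsers.SAXParserFactory; the class name WhichParser is made up, and the output depends on your JDK and classpath, so none is shown.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic class.
class WhichParser {
    // Standard JAXP system property that selects the SAXParserFactory implementation.
    static final String FACTORY_PROPERTY = "javax.xml.parsers.SAXParserFactory";

    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // null here means no override was requested and the default lookup applies.
        System.out.println("Override requested: " + System.getProperty(FACTORY_PROPERTY));
        System.out.println("Factory in use:     " + factoryClassName());
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        System.out.println("Parser in use:      " + parser.getClass().getName());
    }
}
```

    Running it as java -Djavax.xml.parsers.SAXParserFactory=com.example.MyFactory WhichParser (where com.example.MyFactory is a placeholder for a factory class on your classpath) would switch implementations without recompiling anything.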

    Once you have set up the factory, invoking newSAXParser() returns a ready-to-use instance of the JAXP SAXParser class. This class wraps an underlying SAX parser (an instance of the SAX class org.xml.sax.XMLReader). It also protects you from using any vendor-specific additions to the parser class. (Remember the discussion about the XmlDocument class earlier in this article?) This class allows actual parsing behavior to be kicked off. The first figure shows the handler with all the callbacks.

    Figure 1

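    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    class SimpleHandler extends DefaultHandler {
        // SAX callback implementations from ContentHandler, ErrorHandler, etc.
        // Anything not overridden here keeps DefaultHandler's do-nothing behavior.
        private Writer out;

        public SimpleHandler() throws SAXException {
            try {
                out = new OutputStreamWriter(System.out, "UTF8");
            } catch (IOException e) {
                throw new SAXException("Error getting output handle.", e);
            }
        }

        public void startDocument() throws SAXException {
            print("\n");
        }

        // Echo the opening tag (attributes are not printed)
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts)
                throws SAXException {
            print("<" + qName + ">");
        }

        // Echo the closing tag
        public void endElement(String uri, String localName, String qName) throws SAXException {
            print("</" + qName + ">\n");
        }

        // Echo character data as it arrives
        public void characters(char[] ch, int start, int len) throws SAXException {
            print(new String(ch, start, len));
        }

        // Write and flush, converting I/O failures into SAX exceptions
        private void print(String s) throws SAXException {
            try {
                out.write(s);
                out.flush();
            } catch (IOException e) {
                throw new SAXException("IO Error Occurred.", e);
            }
        }
    }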


    The next figure shows the steps for how to create, configure, and use a SAX factory.

    Figure 2

    import java.io.File;
    import javax.xml.parsers.FactoryConfigurationError;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    public class SimpleSAXParsing {
        public static void main(String[] args) {
            try {
                if (args.length != 1) {
                    System.err.println("Usage: java SimpleSAXParsing [filename]");
                    System.exit(1);
                }
                // Get SAX Parser Factory
                SAXParserFactory factory = SAXParserFactory.newInstance();
                // Turn on validation, and turn off namespaces
                factory.setValidating(true);
                factory.setNamespaceAware(false);
                SAXParser parser = factory.newSAXParser();
                // Parse the file, sending events to the handler from Figure 1
                parser.parse(new File(args[0]), new SimpleHandler());
            } catch (ParserConfigurationException e) {
                System.out.println("The underlying parser does not support " +
                                   "the requested features.");
            } catch (FactoryConfigurationError e) {
                System.out.println("Error occurred obtaining SAX Parser Factory.");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    Working with Document Object Model (DOM)

    The Document Object Model defines an interface that enables programs to access and update the style, structure, and contents of XML documents. XML parsers that support the DOM implement that interface. When you use a DOM parser to parse an XML document, you get back a tree structure that contains all of the elements of the document. The DOM provides a variety of functions you can use to examine the contents and structure of the document. Here are the methods you will commonly use:

    Document.getDocumentElement() - Returns the root element of the document.

    Node.getFirstChild() - Returns the first child of a given Node.

    Node.getLastChild() - Returns the last child of a given Node.

    Node.getNextSibling() - Returns the next sibling of a given Node.

    Node.getPreviousSibling() - Returns the previous sibling of a given Node.

    Element.getAttribute(attrName) - For a given Element, returns the value of the attribute with the requested name.

    A Simple DOM Parser using JAXP

    Using DOM with JAXP is almost the same as using SAX. The differences are primarily in the names of the classes and the return types. JAXP is responsible for returning an org.w3c.dom.Document object from parsing. This object represents the XML document and is made up of DOM nodes that represent the elements, attributes, and other XML constructs.

    Unlike with SAX, we don't have any callback handler, so it is just a matter of parsing the XML document and then using the DOM object to address our needs. In this example, we show how to write out the DOM tree both forwards and in reverse.
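    Before the longer DOM listings that follow, the navigation methods described in this section can be tried out on a tiny document. This is a minimal sketch under stated assumptions: the class name DomWalk, its helper methods, and the sample XML are all invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class exercising the DOM navigation methods.
class DomWalk {

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Forwards: getFirstChild() then getNextSibling() until null.
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) names.add(n.getNodeName());
        }
        return names;
    }

    // Backwards: getLastChild() then getPreviousSibling() until null.
    static List<String> childElementNamesReversed(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getLastChild(); n != null; n = n.getPreviousSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) names.add(n.getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<list owner=\"sam\"><a/><b/><c/></list>");
        System.out.println(root.getAttribute("owner"));      // sam
        System.out.println(childElementNames(root));         // [a, b, c]
        System.out.println(childElementNamesReversed(root)); // [c, b, a]
    }
}
```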


    Figure 1

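    // Method of the XMLDocumentWriter class used in Figure 3. The field "out" (a PrintWriter)
    // and the helper fixup() are not shown in this article; fixup() presumably escapes
    // markup characters in text and attribute values.
    public void write(Node node, String indent) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: {
                Document doc = (Document) node;
                out.println(indent + "<?xml version='1.0'?>");
                // Write each top-level node in document order
                Node child = doc.getFirstChild();
                while (child != null) {
                    write(child, indent);
                    child = child.getNextSibling();
                }
                break;
            }
            case Node.DOCUMENT_TYPE_NODE: {
                DocumentType doctype = (DocumentType) node;
                out.println("<!DOCTYPE " + doctype.getName() + ">");
                break;
            }
            case Node.ELEMENT_NODE: {
                Element elt = (Element) node;
                // Start tag with its attributes
                out.print(indent + "<" + elt.getTagName());
                NamedNodeMap attrs = elt.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.print(" " + a.getNodeName() + "='" + fixup(a.getNodeValue()) + "'");
                }
                out.println(">");
                // Children, indented one level deeper
                String newindent = indent + " ";
                Node child = elt.getFirstChild();
                while (child != null) {
                    write(child, newindent);
                    child = child.getNextSibling();
                }
                // End tag
                out.println(indent + "</" + elt.getTagName() + ">");
                break;
            }
            case Node.TEXT_NODE: {
                Text textNode = (Text) node;
                String text = textNode.getData().trim();
                // Skip whitespace-only text
                if (text.length() > 0)
                    out.println(indent + fixup(text));
                break;
            }
            case Node.PROCESSING_INSTRUCTION_NODE: {
                ProcessingInstruction pi = (ProcessingInstruction) node;
                out.println(indent + "<?" + pi.getTarget() + " " + pi.getData() + "?>");
                break;
            }
            case Node.ENTITY_REFERENCE_NODE: {
                out.println(indent + "&" + node.getNodeName() + ";");
                break;
            }
            case Node.CDATA_SECTION_NODE: {
                CDATASection cdata = (CDATASection) node;
                // Careful! Don't put a CDATA section in the program itself!
                out.println(indent + "<" + "![CDATA[" + cdata.getData() + "]]" + ">");
                break;
            }
            case Node.COMMENT_NODE: {
                Comment c = (Comment) node;
                out.println(indent + "<!--" + c.getData() + "-->");
                break;
            }
            default:
                System.err.println("Ignoring node: " + node.getClass().getName());
                break;
        }
    }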


    Figure 2

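    // Same traversal as write(), but children are visited from last to first.
    // As in write(), "out" and fixup() belong to the enclosing class and are not shown.
    public void reverse(Node node, String indent) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: {
                Document doc = (Document) node;
                out.println(indent + "<?xml version='1.0'?>");
                // Start from the last top-level node and walk backwards
                Node child = doc.getLastChild();
                while (child != null) {
                    reverse(child, indent);
                    child = child.getPreviousSibling();
                }
                break;
            }
            case Node.DOCUMENT_TYPE_NODE: {
                DocumentType doctype = (DocumentType) node;
                out.println("<!DOCTYPE " + doctype.getName() + ">");
                break;
            }
            case Node.ELEMENT_NODE: {
                Element elt = (Element) node;
                // Start tag with its attributes
                out.print(indent + "<" + elt.getTagName());
                NamedNodeMap attrs = elt.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.print(" " + a.getNodeName() + "='" + fixup(a.getNodeValue()) + "'");
                }
                out.println(">");
                // Children in reverse order, indented one level deeper
                String newindent = indent + " ";
                Node child = elt.getLastChild();
                while (child != null) {
                    reverse(child, newindent);
                    child = child.getPreviousSibling();
                }
                // End tag
                out.println(indent + "</" + elt.getTagName() + ">");
                break;
            }
            case Node.TEXT_NODE: {
                Text textNode = (Text) node;
                String text = textNode.getData().trim();
                // Skip whitespace-only text
                if (text.length() > 0)
                    out.println(indent + fixup(text));
                break;
            }
            case Node.PROCESSING_INSTRUCTION_NODE: {
                ProcessingInstruction pi = (ProcessingInstruction) node;
                out.println(indent + "<?" + pi.getTarget() + " " + pi.getData() + "?>");
                break;
            }
            case Node.ENTITY_REFERENCE_NODE: {
                out.println(indent + "&" + node.getNodeName() + ";");
                break;
            }
            case Node.CDATA_SECTION_NODE: {
                CDATASection cdata = (CDATASection) node;
                // Careful! Don't put a CDATA section in the program itself!
                out.println(indent + "<" + "![CDATA[" + cdata.getData() + "]]" + ">");
                break;
            }
            case Node.COMMENT_NODE: {
                Comment c = (Comment) node;
                out.println(indent + "<!--" + c.getData() + "-->");
                break;
            }
            default:
                System.err.println("Ignoring node: " + node.getClass().getName());
                break;
        }
    }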


    Figure 3

    // Excerpt from a larger class. It assumes the enclosing class defines usage()
    // (which prints help and exits), MyErrorHandler, XMLDocumentWriter,
    // outputEncoding (for example "UTF-8"), and the JAXP 1.2 constants:
    //   JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage"
    //   W3C_XML_SCHEMA       = "http://www.w3.org/2001/XMLSchema"
    //   JAXP_SCHEMA_SOURCE   = "http://java.sun.com/xml/jaxp/properties/schemaSource"
    public static void main(String[] args) throws Exception {
        String filename = null;
        boolean dtdValidate = false;
        boolean xsdValidate = false;
        String schemaSource = null;
        boolean ignoreWhitespace = false;
        boolean ignoreComments = false;
        boolean putCDATAIntoText = false;
        boolean createEntityRefs = false;

        // Parse the command-line options; the filename must come last
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-dtd")) {
                dtdValidate = true;
            } else if (args[i].equals("-xsd")) {
                xsdValidate = true;
            } else if (args[i].equals("-xsdss")) {
                if (i == args.length - 1) {
                    usage();
                }
                xsdValidate = true;
                schemaSource = args[++i];
            } else if (args[i].equals("-ws")) {
                ignoreWhitespace = true;
            } else if (args[i].startsWith("-co")) {
                ignoreComments = true;
            } else if (args[i].startsWith("-cd")) {
                putCDATAIntoText = true;
            } else if (args[i].startsWith("-e")) {
                createEntityRefs = true;
            } else if (args[i].equals("-usage")) {
                usage();
            } else if (args[i].equals("-help")) {
                usage();
            } else {
                filename = args[i];
                if (i != args.length - 1) {
                    usage();
                }
            }
        }
        if (filename == null) {
            usage();
        }

        // Configure the factory before creating the DocumentBuilder
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(dtdValidate || xsdValidate);
        if (xsdValidate) {
            try {
                dbf.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            } catch (IllegalArgumentException x) {
                System.err.println("Error: JAXP DocumentBuilderFactory attribute not recognized: "
                        + JAXP_SCHEMA_LANGUAGE);
                System.err.println("Check to see if parser conforms to JAXP 1.2 spec.");
                System.exit(1);
            }
        }
        if (schemaSource != null) {
            dbf.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));
        }
        dbf.setIgnoringComments(ignoreComments);
        dbf.setIgnoringElementContentWhitespace(ignoreWhitespace);
        dbf.setCoalescing(putCDATAIntoText);
        dbf.setExpandEntityReferences(!createEntityRefs);

        // Create the parser, install an error handler, and parse the file
        DocumentBuilder db = dbf.newDocumentBuilder();
        OutputStreamWriter errorWriter = new OutputStreamWriter(System.err, outputEncoding);
        db.setErrorHandler(new MyErrorHandler(new PrintWriter(errorWriter, true)));
        Document doc = db.parse(new File(filename));

        // Print out the DOM tree forwards, then in reverse, using the
        // two-argument methods shown in Figures 1 and 2 with an empty indent
        OutputStreamWriter outWriter =
                new OutputStreamWriter(System.out, outputEncoding);
        XMLDocumentWriter xmlDocWriter = new XMLDocumentWriter(new PrintWriter(outWriter, true));
        xmlDocWriter.write(doc, "");
        xmlDocWriter.reverse(doc, "");
    }


    Key Point

    The key point I'll make is that in working with the Nodes in the DOM tree, you have to check the type of each Node before you work with it. Certain methods, such as getAttributes, return null for some node types. If you don't check the node type, you'll get unexpected results (at best) and exceptions (at worst).
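    A small sketch of this check, assuming a made-up helper class NodeTypeCheck and sample XML: getAttributes() is only non-null for Element nodes, so the helper tests the node type first instead of risking a NullPointerException on a Text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the node-type check.
class NodeTypeCheck {

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Safe for any node type: only elements carry an attribute map.
    static int attributeCount(Node node) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return 0; // getAttributes() would return null here
        }
        return node.getAttributes().getLength();
    }

    public static void main(String[] args) throws Exception {
        Element p = parseRoot("<p id=\"1\" lang=\"en\">hello</p>");
        Node text = p.getFirstChild();                    // the Text node "hello"
        System.out.println(attributeCount(p));            // 2
        System.out.println(text.getAttributes() == null); // true
        System.out.println(attributeCount(text));         // 0
    }
}
```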

    Which parser should you use?

    Use a DOM parser when:

    You need to know a lot about the structure of a document

    You need to move parts of the document around (you might want to sort certain elements, for example)

    You need to use the information in the document more than once

    Use a SAX parser when:

    You only need to extract a few elements from an XML document

    You don't have much memory to work with

    You're only going to use the information in the document once (as opposed to parsing the information once, then using it many times later)

    In this article, we have covered some of the basics related to using JAXP and the benefits it provides in relation to XML processing. In a future article we will look at some of the more advanced functions used with JAXP for both SAX and DOM parsers.