    RPG Has SAX Appeal!

    Contributed by Jon Paris. Tuesday, 04 March 2008. Last updated Tuesday, 04 March 2008.

    In this part of our RPG XML series, you'll learn how to use RPG's XML-SAX op-code to deal with problematic XML documents and handle situations that XML-INTO cannot deal with.

    By Jon Paris

    In the previous two articles in this series, "%Handling XML-INTO Problems" and "i5/OS Offers Native XML Support in V5R4", we focused on the capabilities of RPG's XML-INTO. As we saw, this op-code processes an entire document, either as a single piece or, when needed or desired, in "chunks" by using the capabilities of the %HANDLER BIF. There are, however, situations when this will not work for you. This often relates to limitations in RPG's data structure (DS) capabilities. As you know, a named DS is limited to a maximum size of 64K (at least until V6R1 anyway). Suppose that even a single repeating element will not fit into this? That may sound unlikely, but it doesn't take a huge number of repeating text fields to exceed this limit. Another example, and one that seems to occur quite often, arises when your XML document contains a structure that simply cannot be represented in an RPG DS. To illustrate this, take a look at the new version of our XML document, shown below:

    MC Press Online

    http://www.mcpressonline.com Powered by Joomla! Generated: 29 August, 2008, 00:07

    Toasters

    (A) Two slot chrome

    (B) This beautiful two slot chrome finished toaster is

    a perfect complement to any modern kitchen ...

    22.95

    15.95

    247

    Four slot matt black

    35.75

    23.95

    247

    Coffee Makers

    10 cup auto start

    It is substantively the same as in our previous examples, but with one very significant exception: the Description element can now be repeated. If that were the only difference, we could accommodate it by adding a DIM( ) keyword to the element's definition in the DS. But notice that not only does the element repeat, there is also a new attribute, type, which indicates the type of description (short or long) being defined. This presents us with a problem. Since an attribute is treated in the same way as a child element of the parent, the correct RPG definition for "type" would be this:

    d description DS

    d type 5a

    But this leaves us with nowhere to put the content of the description, since the content of a DS is the sum of its subfields and any data placed there would overwrite those subfields. In other words, in our situation, the description would overwrite the type field (or vice versa). Not a lot of help! In theory, a DS that looks like the one below should solve the problem:

    d description DS Qualified Dim(2)

    d description 1000a Varying

    d type 5a

    In this case, the description text would be stored in the field description.description and the "type" attribute would be stored in description.type. Makes sense, doesn't it? Maybe to you, but sadly, not to the compiler.

    IBM is aware of this deficiency, and it is on their "to-do" list, but don't expect to see it in V6R1. And don't hold me to it working the way I have described it here; IBM may well have other ideas.

    So if we cannot create a DS that matches the structure of the XML data, then we cannot use XML-INTO, or at least cannot use it for the whole task. So what are our options?

    There are effectively three options:

    - The first is to take advantage of RPG's XML-SAX op-code. This can be used either by itself to process the entire document or as a follow-on to an XML-INTO parse to "fill in the gaps." We will be dealing with the usage of XML-SAX in the balance of this article.

    - The second is to reformat the document by using an XSL transform so that it is in a format that can be expressed in RPG terms. This is the approach recommended in the IBM Redbook The Ins and Outs of XML and DB2 UDB for i5/OS. If you have the required XSL skills or are prepared to develop them, this is certainly a valid option and can also help to deal with other issues, such as empty elements. Since the Redbook provides a good working example, we won't duplicate that work here.

    - Another option would be to process the document in two passes using XML-INTO with a different target DS on each pass. You would also need to use the "AllowExtra" and "AllowMissing" processing options in order to persuade the parser to handle the document, since neither of the DSs will exactly match the document. This is not as effective as the XML-SAX option, so we will not be discussing it further.

    XML-SAX

    The operation of XML-SAX is very different from that of XML-INTO. XML-INTO parses the data from many elements at a time and places the parsed content into the appropriate field in the target DS or array. XML-SAX, on the other hand, parses the document one event at a time. Examples of events include the beginning of an element (i.e., its starting tag), the value of an element, the end of an element (i.e., its ending tag), the name of an attribute, the value of the attribute, etc.

    With XML-INTO, the use of a handler procedure is optional, but with XML-SAX, %HANDLER must always be specified. Your handler procedure will be called for every event that the parser encounters. It is up to your logic to decide if it should simply ignore the event or react to it in some way.

    Logic is needed in the handler to recognize and react to the beginning of each element and attribute and to store the values in the appropriate places. You will perhaps get a better idea of the kind of logic that might be required if you study the list below. It represents the sequence of events and the associated data (in parentheses) that would be passed to the handler when processing the section of the XML document that begins at (A) above and ends at (B).

    Start Element (description)

    Attribute Name (type)

    Attribute Characters (short)

    End Attribute (type)

    Element Characters (two-slot chrome)

    End Element (description)
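    To watch a comparable event stream without an RPG compiler at hand, the sequence above can be approximated with a small sketch in Python's standard xml.sax module. This is an illustration only, not RPG's XML-SAX: the EventLogger class and the SAMPLE fragment are assumptions made up for this example, and Python hands attributes over together with the start tag, so the sketch unpacks them into separate entries to resemble the list above.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each parser callback as an (event, data) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start element", name))
        # Python delivers attributes with the start tag; RPG's XML-SAX
        # signals them as separate name/value/end events instead.
        for attr_name in attrs.getNames():
            self.events.append(("attribute name", attr_name))
            self.events.append(("attribute chars", attrs.getValue(attr_name)))

    def characters(self, content):
        # A parser may deliver one text value in several pieces.
        self.events.append(("element chars", content))

    def endElement(self, name):
        self.events.append(("end element", name))


# Hypothetical fragment standing in for the span between (A) and (B).
SAMPLE = b'<Description type="short">Two slot chrome</Description>'

logger = EventLogger()
xml.sax.parseString(SAMPLE, logger)
for event, data in logger.events:
    print(f"{event} ({data})")
```

    Every callback is recorded rather than acted upon, which is the same "ignore it or react to it" decision that an RPG handler has to make for each event.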

    Notice that when we receive the element and attribute data, we have no idea which element/attribute it belongs to. That is up to us to determine. In fact, this is not a difficult task, as the data will always belong to the last element/attribute that began but has not yet ended. With so many events being signaled to your handler, you can no doubt see that writing the logic to completely process even a simple document with XML-SAX would be somewhat tedious, requiring a lot of rather repetitive code. Luckily, we rarely require all of the data in a document, and we also have the option to combine XML-SAX with XML-INTO to simplify our task.

    So to handle the situation in our example, that is what we will do. We will use XML-INTO to capture the bulk of the data and then process again using XML-SAX to fill in the missing piece: the type codes associated with the descriptions.
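    The rule that data belongs to the last element or attribute that began but has not yet ended amounts to keeping a stack of open names. A minimal sketch of that idea, written in Python for illustration; the owners function and the EVENTS list are hypothetical and simply replay the event sequence listed earlier:

```python
def owners(events):
    """Pair each data event with the innermost item that is still open."""
    open_items = []  # stack of element/attribute names not yet ended
    result = []
    for kind, data in events:
        if kind in ("start element", "attribute name"):
            open_items.append(data)
        elif kind in ("end element", "end attribute"):
            open_items.pop()
        else:  # "element chars" or "attribute chars"
            result.append((open_items[-1], data))
    return result


# Replay of the event sequence shown in the list above.
EVENTS = [
    ("start element", "description"),
    ("attribute name", "type"),
    ("attribute chars", "short"),
    ("end attribute", "type"),
    ("element chars", "two-slot chrome"),
    ("end element", "description"),
]

print(owners(EVENTS))
# prints [('type', 'short'), ('description', 'two-slot chrome')]
```

    A full stack is more than this article's handler needs, since it only cares about one attribute, but the sketch shows why the ownership question is easy to answer.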

    Let's look at the code that achieves this (shown at the end of this article).

    The first thing to notice is the change in the product DS (A). Notice that we have made the description field an array with two elements and also added the type field as a two-element array. Note that the name of the type field in the DS (descrType) does not match the name of the attribute (type) to ensure that XML-INTO will not try to populate it and to make that fact more obvious to those who come after us. In fact, there is no need to actually include the type in the DS at all, but it is convenient to keep all the data together.

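    The XML-INTO must have the "allowextra=yes" option specified (B) to accommodate the extra type attributes in the document. Without this option, the parse would fail since the new version of the DS no longer corresponds to the XML document. The code also specifies "allowmissing=yes", because descrType has no counterpart in the document and a product may supply only one description. Once XML-INTO has completed, we invoke XML-SAX (C) to reprocess the document.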

    There is no difference in the definition of %HANDLER, but there is a difference between the information passed to an XML-SAX handler and the information passed to the XML-INTO handler we saw in the last article. Take a look at the prototype at (D) and you will see what I mean. The only parameter that is common to the two versions is the first one, the Communication Area. The remaining parameters are as follows:

    - event is a four-byte integer that identifies the type of event being processed. Don't worry about the fact that the event is identified by a number. As you will see later, RPG supplies a number of named constants that can be compared with the event value.

    - pstring is a pointer to the beginning of the string containing the event data (e.g., the element/attribute names or data).

    - stringLen is the length of the string "pointed to" by the previous parameter. This length must be used to determine if data is present, as there are occasions when a valid pointer is passed even though there is no data. Only the number of characters indicated by this parameter should be processed.

    - exceptionId is an error code identifying any error passed to the handler by the parser. We will not be discussing this in this article. Check the RPG manuals for more information.
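    The pointer-plus-length convention is worth a small illustration. The sketch below is Python, not RPG, and the event_data helper and the raw buffer are hypothetical; it only shows the discipline of reading exactly the number of bytes the parser reports and treating a zero length as "no data", even though more of the document sits in the buffer.

```python
def event_data(buffer: bytes, length: int) -> str:
    """Return only the first `length` bytes of the buffer as text."""
    if length <= 0:
        return ""  # a valid buffer can still carry no data
    return buffer[:length].decode("utf-8")


# Hypothetical view of the memory behind the pointer: the element name
# is followed by more of the document, which must not be processed.
raw = b'Description type="short">Two slot chrome'

print(event_data(raw, 11))  # prints Description
print(repr(event_data(raw, 0)))  # prints ''
```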

    Having seen the parameters passed to the handler, it is time to study the mechanics of the handler procedure MySAXHandler. The first step (E) is to check whether any data was received. If no data is received, then the handler simply returns control to the parser. If data is present, then the procedure RmvWhiteSpace( ) is called to remove any unwanted characters and reduce them to a single space. We will look at what I mean by "unwanted" in a moment. Notice that %SUBST is used to pass only the valid portion of the data to the subprocedure. Remember, we were passed only a pointer and a length, and there is probably other data beyond the point indicated by the length parameter. It is worth noting at this point that the field string, which is based on the pointer, can be very useful during debug. If you display it, you will usually be able to see not only the data you are about to process, but also the next part of the XML document. In other words, you will know what to expect next and can perhaps set appropriate breakpoints. This is not guaranteed, as sometimes the pointer references a work area, but it is worth remembering.

    What do we mean by "unwanted" and why do we need the RmvWhiteSpace routine? Because carriage returns, new lines, tabs, and excess spaces are often present in XML data (sometimes to make it look "pretty"), and we need to remove them from the data. We will not be studying the detail of this procedure, but you will find it included in the version of the program that is available for download. Hopefully, its operation is self-explanatory. (Many thanks to IBM Toronto's Barbara Morris for supplying this routine.)
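    Since the RPG routine itself is not reproduced here, the following Python sketch shows one way the same effect could be achieved. It is not Barbara Morris's code: the rmv_white_space function is a stand-in written for this illustration, and it assumes that every run of spaces, tabs, carriage returns, and new lines should collapse to a single space, with no trimming at either end.

```python
import re


def rmv_white_space(text: str) -> str:
    """Collapse each run of spaces, tabs, CRs and new lines to one space."""
    return re.sub(r"[ \t\r\n]+", " ", text)


pretty = "This beautiful two slot\r\n\t   chrome finished toaster"
print(rmv_white_space(pretty))
# prints This beautiful two slot chrome finished toaster
```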

    At (F), the real work begins. A SELECT group is used to identify the type of event we are handling; this is where the named constants mentioned earlier come into play. For example, *XML_START_ELEMENT represents the event code that announces the arrival of a new element name. In the SELECT group at (G), we then identify the specific element that we are dealing with and process accordingly. All this logic is really doing is setting up the appropriate array indices for the Category, Product, and Description arrays. Since we know that the document we are processing is the same one that we just parsed with XML-INTO, we can afford to short-circuit the process, so no attempt is made to match the product codes with the descriptions or anything.

    If the event does not represent the beginning of an element, then we next test to see if it is an attribute name (H). If it is, we check to see if it is the type attribute, and if so, we turn on the waitingForType indicator. This indicator allows us to associate the attribute data when it arrives (I) as belonging to the type attribute. Remember, we said earlier that it is up to us to determine that. We then store the value for the type attribute in the appropriate descrType array element.
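    The index bookkeeping and the waitingForType flag can be traced end to end with a short simulation. This is a Python sketch of the same logic, not a translation of the RPG program: the collect_types function, the tuple-based event format, and the EVENTS sample are assumptions made for illustration, and unrelated events are omitted from the sample for brevity.

```python
def collect_types(events):
    """Return {(category, product, description): type} with 1-based indices."""
    cat = prod = desc = 0
    waiting_for_type = False
    types = {}
    for kind, data in events:
        if kind == "start element":
            # Starting a level bumps its index and zeroes the level below.
            if data == "Category":
                cat += 1
                prod = 0
            elif data == "Product":
                prod += 1
                desc = 0
            elif data == "Description":
                desc += 1
        elif kind == "attribute name":
            if data == "type":
                waiting_for_type = True
        elif kind == "attribute chars" and waiting_for_type:
            types[(cat, prod, desc)] = data
            waiting_for_type = False
    return types


# Hypothetical event stream: one category, two products, three descriptions.
EVENTS = [
    ("start element", "Category"),
    ("start element", "Product"),
    ("start element", "Description"),
    ("attribute name", "type"),
    ("attribute chars", "short"),
    ("start element", "Description"),
    ("attribute name", "type"),
    ("attribute chars", "long"),
    ("start element", "Product"),
    ("start element", "Description"),
    ("attribute name", "type"),
    ("attribute chars", "short"),
]

print(collect_types(EVENTS))
# prints {(1, 1, 1): 'short', (1, 1, 2): 'long', (1, 2, 1): 'short'}
```

    The dictionary keys play the role of the three array subscripts in category(catIndex).product(prodIndex).descrType(descIndex).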

    After processing the document, the XML-SAX parse completes and control returns to the program's main line at (J). At this point, the complete content of the XML document has been stored in our category DS, so our program can process or store that data as necessary. In this simple example, we will just display the data. The logic simply loops through all of the categories and products. As in our previous example, the category loop is controlled by the RPG-supplied xmlElements count in the Program Status Data Structure, which was populated by the XML-INTO operation, and the product loop completes when a blank product code is encountered. The format of our XML document is such that there must be a short description, so the first elements of the description and type arrays are displayed. At (K), the logic then tests to see if a second set is present and, if it is, displays the relevant data.

    And that's really all there is to it. I won't describe it here, but I have included in the source code accompanying this article a utility program (XMLSAXLIST) that you might find useful when studying XML documents that you need to process. It uses XML-SAX to parse the document and produces a listing of all the events signaled and the length and content of the associated data. If you run the program, you will be able to see the effect of the RmvWhiteSpace procedure, as the original length of the data item is included. If you have any questions about the operation of the program, please let me know.

    H Option(*NoDebugIO : *SrcStmt )

    // This count is populated by XML-INTO whenever the INTO

    // variable is an array

    D progStatus SDS

    D xmlElements 20i 0 Overlay(progStatus: 372)

    (D) D MySAXHandler Pr 10i 0

    D commArea Like(dummyCommArea)

    D event 10i 0 Value

    D pstring * Value

    D stringLen 20i 0 Value

    D exceptionId 10i 0 Value

    D RmvWhitespace pr 65535a Varying

    D input 65535a Varying Const

    D category DS Qualified Dim(20)

    D code 2a

    D catDescr 20a

    D product LikeDS(product) Dim(50)

    D product DS Qualified

    D code 4a

    (A) D descrType 5a Dim(2)

    D description 600a Dim(2)

    D mSRP 7p 2

    D sellPrice 7p 2

    D qtyOnHand 5i 0

    D XML_Source S 256a Varying

    D Inz('/Partner400/XML/Example5.xml')

    // Short version of Description for display purposes

    D dispDescription...

    D S 40a

    D dummyCommArea S 1a

    D i S 5i 0

    D p S 5i 0

    /Free

    (B) XML-INTO category

    %XML(XML_Source: 'case=any doc=file allowextra=yes +

    allowmissing=yes');

    // XML-INTO has filled the category array

    // Next we use XML-SAX to fill in the missing type details

    (C) XML-SAX %HANDLER(MySAXHandler: dummyCommArea)

    %XML(XML_Source: 'doc=file');

    Dsply ('xmlElements = ' + %char(xmlElements) );

    // The XML parser's element count is used to control the loop

    (J) For i = 1 to xmlElements;

    Dsply ('Cat: ' + category(i).code + ' ' +

    category(i).catDescr );

    For p = 1 to %Elem(category.product);

    If category(i).product(p).code = *Blanks;

    Leave; // Exit once blank product code entry located

    Else;

    // Process the current product entry

    dispDescription = category(i).product(p).description(1);

    Dsply ('Product: ' + dispDescription);

    Dsply ('Type: ' + category(i).product(p).descrType(1));

    // If second description is present, display details

    (K) If category(i).product(p).description(2) <> *Blanks;

    dispDescription = category(i).product(p).description(2);

    Dsply ('Product: ' + dispDescription);

    Dsply ('Type: ' + category(i).product(p).descrType(2));

    EndIf;

    EndIf;

    EndFor;

    EndFor;

    *InLR = *On;

    /End-Free

    // SAX handler

    P MySAXHandler B

    D PI 10i 0

    D commArea Like(dummyCommArea)

    D event 10i 0 Value

    D pstring * Value

    D stringLen 20i 0 Value

    D exceptionId 10i 0 Value

    D string S 65535a Based(pstring)

    D data S 65535a Varying

    // Static variables used by handler logic

    D catIndex S 10i 0 Static

    D prodIndex S 10i 0 Static

    D descIndex S 5i 0 Static

    D waitingForType S n Static

    // Constants to identify the element and attribute

    // names we are interested in.

    D categorElem C 'Category'

    D prodElem C 'Product'

    D descrElem C 'Description'

    D typeAttr C 'type'

    /free

    // If any data is supplied strip whitespace from it

    // otherwise just return to parser

    (E) If stringLen > 0;

    data = RmvWhiteSpace(%subst(string : 1 : stringLen));

    Else;

    return 0;

    endif;

    (F) Select;

    When event = *XML_START_ELEMENT;

    // Whenever we start a new element, we increment the index

    // for that level and zero the index for the next level.

    (G) Select;

    When data = categorElem;

    catIndex += 1;

    prodIndex = 0;

    When data = prodElem;

    prodIndex += 1;

    descIndex = 0;

    When data = descrElem;

    descIndex += 1;

    EndSl;

    (H) When event = *XML_ATTR_NAME;

    // Turn on the "waiting" indicator when a "type" attribute begins

    If data = typeAttr;

    waitingForType = *On;

    EndIf;

    (I) When event = *XML_ATTR_CHARS;

    // If waiting for type information then store type

    if waitingForType;

    category(catIndex).product(prodIndex).descrType(descIndex)

    = data;

    waitingForType = *Off;

    EndIf;

    EndSl;

    return 0;

    /end-free

    P E
