WebSphere Message Broker Coding Tips

WebSphere Message Broker Coding Tips

Tim Dunn [email protected]

V1.0

Last Updated 22nd August 2008

Introduction....................................................................................................................3 Identifying the areas in which Processing Costs Arise..................................................4

Parsing........................................................................................................................5 Message/Business Processing....................................................................................5 Navigation..................................................................................................................5 Tree Copying .............................................................................................................6 Resources ...................................................................................................................6

General Message Flow Coding Considerations .............................................................7 Identify the Critical Path of Processing [CPU, Memory] ......................................7 Minimize number of Compute & JavaCompute nodes [CPU, Memory] ..............7 Avoid Consecutive Short Message Flows [CPU, Memory] ..................................8 Maximise use of the built-in parsers [CPU, Memory]...........................................8 Use Subflow’s carefully [CPU, Memory] .............................................................8 Watch the order in which you define message tree lements [CPU] ......................9

ESQL........................................................................................................................10 Array Variables [CPU] ........................................................................................10 CARDINALITY function [CPU] ........................................................................10 CREATE Statement [CPU, Memory]..................................................................11 DECLARE statements [Memory]........................................................................11 EVAL Statement [CPU] ......................................................................................11 FORMAT Clause [CPU]......................................................................................11 IF and CASE statements [CPU]...........................................................................11 PASSTHRU Statement [CPU].............................................................................12 PROPAGATE [CPU, Memory]...........................................................................12 Reference Variables [CPU]..................................................................................13 Shared Variables [CPU].......................................................................................13 Minimize use of String Manipulation Functions [CPU]......................................13 Volume of ESQL [CPU, Memory] ......................................................................13 Minimize Navigation of the Logical Tree [CPU] ................................................13

Java ..........................................................................................................................14 Storing Intermediate tree references [CPU].........................................................14 String Concatenation [CPU] ................................................................................14 Optimise BLOB Processing [CPU] .....................................................................15 Java Code .............................................................................................................15

Examples......................................................................................................................16 What Not to do.........................................................................................................16

Array Subscripts...................................................................................................16 Memory Use.........................................................................................................17

What to do................................................................................................................19 Large Repeating Structures..................................................................................19

Introduction The purpose of this document is to provide coding tips for Message Broker message flow developers. WebSphere Message Broker provides a variety of transformation techniques to the message flow developer or analyst. These range from coding to mapping techniques which use drag and drop technology. These techniques are:

• ESQL code written in nodes such as Compute, Filter and Database nodes • Java written a JavaCompute node • eXtensible Stylesheet running in the XMLT node • Use of drag and drop facility in the Mapping node • WebSphere Transformation Extender running in the WebSphere

Transformation Extender plug-in node In theses facilities WebSphere Message Broker explicitly provides support for two programming languages: ESQL and Java. As with any programming language it is possible to unwittingly write ESQL or Java code that is inefficient. This often arises because the developer is not familiar with the implications of using certain features or artefacts of the programming language in a particular way. To help developers produce more efficient message flows and in particular to code more efficient ESQL and Java code in WebSphere Message Broker message flows this article outlines the key performance issues and documents some recommended best practices for code development. You will see by the side of each tip an indication of whether it helps with CPU and/or memory consumption.

[CPU] indicates that using this tip will help to reduce CPU usage by a message flow. [Memory] indicates that using this tip will help to reduce the amount of memory used by a message flow. [CPU, Memory] indicates that using this tip will help to reduce both CPU and memory usage by a message flow.

Identifying the areas in which Processing Costs Arise Before we look at specific coding tips it is helpful to understand the different areas in which processing costs typically rise during the execution of a message flow. Figure 1 below shows a simple message flow. In this message flow a message is read by the MQInput node. Next, a Filter node examines the incoming message to determine whether it is an order or process payment type. Once the message type has been determined the specific processing for that type of message takes place.

Figure 1. A simple Routing Message Flow. If it is an order the top path of execution through the Order Analysis and Order Processing nodes is followed. If it is a payment the bottom path through the Process Payment node is followed. When this or any message flow executes processing costs arise in the following areas:

o Parsing. This has two parts. The processing of incoming messages and the creation of output messages. As parsing proceeds a message tree is populated in which the elements of the incoming message are represented.

o Message/Business Processing. This is the routing and transformation logic which you code in ESQL, Java, Mapping node, XSL or WebSphere TX mappings.

o Navigation. This is the process of "walking" the message tree to access the elements which are referred to in the ESQL or Java.

o Tree Copying. This occurs in nodes which are able to change the message tree such as Compute nodes. A copy of the message tree is taken for recovery reasons.

o Resources. This is the cost of invoking resource requests such as reading or writing WebSphere MQ messages or making database requests.

Parsing Before an incoming message can be processed by the nodes or ESQL it must transformed from the sequence of bytes, which is the input message, into a structured object, which is the message tree. Some parsing will take place immediately such as the parsing of the MQMD (assuming the incoming message is an MQ message), some will take place, on demand, as fields in the message payload are referred to within the message flow. The amount of data which needs to be parsed is dependent on the organization of the message and the requirements of the message flow. Not all message flows may require access to all data in a message. When an output message is created the message tree needs to be converted into an actual message. This is a function of the parser. The process of creating the output message is referred to as serialization or flattening of the message tree. The creation of the output message is a simpler process than reading an incoming message. The whole message will be written at once when an output message is created. Figure 2 below shows this processing schematically.

Figure 2. Parsing and Serialisation in a Message Flow.

Message/Business Processing It is possible to code message manipulation or business processing in any one of a number of transformation technologies. These were touched on at the beginning of the article. It is through these technologies that the input message is processed and the tree for the output message is produced. The cost of running this processing is dependent on the amount and complexity of the transformation processing that is coded. Navigation The cost of navigation is dependent on the complexity and size of the message tree which is in turn dependent on the size and complexity of the input messages and the complexity of the processing within the message flow. As the message tree changes shape over the course of execution of the message so will the costs of accessing different parts of the tree. The cost will be proportional to the depth of tree. There

are steps which can be taken to reduce the cost of navigating the tree and we will touch on these in the sections on ESQL and Java coding Tree Copying This occurs in nodes which are able to change the message tree such as Compute nodes. A copy of the message tree is taken for recovery reasons so that if a compute node makes changes and processing in node incurs or generates an exception the message tree can be recovered to a point earlier in the message flow. Without this a failure in the message flow downstream could have implications for a different path in the message flow. Tree copying does not happen in the Filter or Database nodes as these nodes cannot modify the message tree. A tree copy is a copy of a structured object and so is relatively expensive. It is not a copy of a sequence of bytes that is being copied. For this reason it is best to minimize the number of such copies, hence the general recommendation to minimize the number of compute nodes (Compute and JavaCompute) in a message flow. Resources The cost of processing messages and databases is dependent on the type (read/write/update for example), the exact type of resource (non persistent or persistent message for example) and the level of activity. Processing non persistent messages will cost less in CPU and I/O processing then persistent messages which need to be logged to ensure data integrity. Similarly a database read will cost less in CPU and I/O activity than a database insert. With the insert data must be added to the db2 table and logged to ensure data integrity. The processing costs in each of the sections above can normally be reduced by following a series of coding recommendations which are given in the sections below. The recommendations are split into three distinct areas. There are those which are generic in nature and to do with the way in which the message flow is constructed. There are those which apply when coding in ESQL and finally there are those which apply when coding in Java. The primary effect of the recommendations will be to reduce CPU and memory usage. I/O reduction normally occurs as a result of issuing fewer resource requests. This is normally controlled as part of message flow design. This aspect is not covered in this article.

General Message Flow Coding Considerations These recommendations apply to all message flows.

Identify the Critical Path of Processing [CPU, Memory] Identify the most frequently used critical path of execution in the message flow and ensure that you optimise the processing along that path. This is particularly important in a message flow which processes multiple types of messages. You do not want to have traverse most of the message flow to start the processing of the most popular message type.

Minimize number of Compute & JavaCompute nodes [CPU, Memory] Message flows typically contain many processing nodes. Separation of processing logic, into multiple nodes, makes it easier to encapsulate pieces of processing logic. It also makes it easier to build an understanding the processing sequence when it is viewed in the Message Broker Toolkit. Nodes such as Filter and Database do not modify the incoming message tree. Other nodes such as the Compute and JavaCompute do allow the tree to be modified and as such it usual to take a copy of the message tree for back-out purposes in the event of an error or exception using code such as SET OutputRoot=Inputroot; This tree copy can be relatively expensive in both CPU and memory dependent on the complexity of the messages and the way in which they are processed. To reduce the impact of the tree copy you are recommended to do two things. Firstly reduce the number of Compute and JavaCompute nodes in the message flow and secondly to avoid consecutive Compute or JavaCompute nodes. It is important to think about node use and language at design time and which language you will use to implement common logic. In some situations a project might decide to implement a common function like an audit or log routine as a Java node whilst other functions are written as ESQL inside Compute nodes. When this happens it is not possible to combine the code in the Compute and JavaCompute nodes into a single node in the same way that it would be possible if all of the logic was written in the same language. Figure 1 above illustrates a case where the message flow could be optimised.. There are two paths of processing after the filter node. One for the order processing and one for the payment processing. Notice how the order processing leg has two adjacent Compute nodes (Order Analysis and Order Processing). The payment processing leg of the message flow has only a single compute node. To optimise this message flow we would combine the Order Analysis and Order Processing nodes into a single compute node as the nodes are adjacent. Given the way in which the flow has been constructed it is not possible to reduce the number of Compute nodes to fewer than two.

If two Compute nodes are separated by a ResetContentDescriptor node it is possible to combine all three nodes in to a single Compute node using function that is now available from Message Broker V5 fixpack 3 onwards. The CREATE with PARSE clause statement now means that the message broker’s parsers can be invoked from within a Compute node removing the need for the ResetContentDescriptor node.

There will be situations where you will need more than one compute node in a message flow and this is fine. It is as expected. The key thing is to avoid unnecessary additional Compute nodes. Do not take away the impression that you are being recommended to force all of the processing into a single compute node.

Avoid Consecutive Short Message Flows [CPU, Memory] Avoid consecutive short message flows in which the output of a message flow is immediately processed by another message flow as opposed to the output of the message flow being read by an external application. By using consecutive short message flows you are forcing additional parsing and serialisation of messages which is likely to be expensive. The only exception to this is the use of the Aggregation nodes. The use of multiple short message flows in this way will also lead to an increase of the level of WebSphere MQ messages as the results of the first message flow are placed on a WebSphere MQ queue for the second message flow to read. Do not confuse this advice with the case where you have a short message flow because that is all that is needed.

Maximise use of the built-in parsers [CPU, Memory] Within a message set in the MRM it is possible to define a default value for a field or specify that a field containing spaces is represented in the message tree as a null value or as hexadecimal zeros. Use these features of the MRM and reduce the amount of ESQL or Java code which you need to write. It is better to attach more than one wire format to a single logical message set model and allow the Message Broker writers to convert the data when it is written to the wire, than having to use multiple lines of ESQL or Java to copy field values from one logical message set model to another. This will often require more time and effort in the construction of the model, but will save coding effort in return, and will provide a smaller runtime memory footprint which will be long lasting.

Use Subflow’s carefully [CPU, Memory] The subflow in Message Broker is a development facility that encourages the reuse of code. They are typically embedded into multiple message flows to provide consistent implementations of functions like logging and auditing.

When they were introduced in Version 2 they were the only way of achieving code re-use and as such were widely used. There are now other facilities available which you should consider. But first let us consider a potential drawback to using subflow’s. Subflow's which contain common routines can be embedded into message flows. The subflow's are 'in-lined' into the message flow when the message flow is compiled. There are no additional nodes inserted into the message flow as a result of using subflow's. [The input and output terminals of the subflow are not processing nodes in the way that a Compute or Filter node are]. However be aware of implicitly adding extra nodes into a message flow as a result of using subflow's. In some situations compute nodes are added to subflow's to perform marshalling of data from one part of the message tree into a known place in the message tree so that the data can be processed by the common processing in the subflow. The result may then need to be copied back to another part of the message tree before the subflow completes. This approach can easily lead to the addition of two compute nodes, each of which performs a tree copy. In such cases the subflow facilitates the reuse of logic but unwittingly adds an additional processing overhead each time it is used. In Message Broker Version 5 ESQL schemas, procedures and functions were introduced. It is more efficient to achieve code re-use through the use of ESQL procedures as they do not result in the additional tree copying that can easily occur easily with subflow’s.

Watch the order in which you define message tree lements [CPU] When constructing an internal OutputRoot message tree structure (for an XML message) you must create the individual elements in the correct sequence as defined in the XSD and message set. The parser will not re-order the elements. This applies equally when coding with ESQL or Java. In order to ensure that the correct order is observed you can create a sequence of statements such as the ESQL shown below:

CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Surname'; CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Inits'; CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Addr1'; CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Addr2'; CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Addr3'; CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Postcode'; CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Account_Number'; CREATE LASTCHILD OF OutputRoot.XMLNSC.MsgStruct NAME 'Account_Bal'; ...

This code creates the right elements, and in the correct sequence. Then later on, when the elements are populated (generally using code like SET OutputRoot.XMLNSC.MsgStruct.<element> = …), each is already there and so the sequence is maintained. Note: It is not necessary to code the DOMAIN clause on every CREATE LASTCHILD statement. When creating a child node under parent P, the child is created by P's parser. So the parser (i.e. the domain) automatically propagates down the tree from the root. There is a performance and memory gain to be had as a result of not coding DOMAIN on the creation of the child elements although the magnitude of the gain is not quantified.

Coding Suggestions

ESQL

Array Variables [CPU] Avoid use of array subscripts [ ] on the right hand side of expressions – use LASTMOVE and reference variables instead. The reason for doing is because of the way in which array subscripts are evaluated at runtime. Every access to an element of an array will always start from the first element. There is no problem when the first element is required, but when the 10th is accessed it involves walking along the message tree from the first elememtn until the 10th is reached. When the 50th element is referenced then again the message tree has to be walked from the first element again. So the higher the array subscript the greater the cost of accessing it. The evaluation of array subscripts works in this way to support the dynamic insertion of elements into the array. Reference variables overcome this by maintaing a pointer into the message to the last element accessed. So if the 10th element has been accessed, accessing the 11th involves only walking to the next element, not starting from the first again. Here is an example of how to use reference variables to access the elements of an array. DECLARE myref REFERENCE TO

OutputRoot.XML.Invoice.Purchases.Item[1]; -- Continue processing for each item in the array WHILE LASTMOVE(myref)=TRUE DO -- Add 1 to each item in the array SET myref = myref + 1; -- Move the dynamic reference to the next item in the array MOVE myref NEXTSIBLING; END WHILE;

CARDINALITY function [CPU] Avoid use of CARDINALITY in a loop for example consider the statement

WHILE ( I < CARDINALITY (InputRoot.MRM.A.B.C[]). The CARDINALITY function has to be evaluated each time the loop is traversed. This can be a problem with large arrays where the cost of evaluating CARDINALITY is expensive and as the array is large we also iterate around the loop more often. It is better to determine the size of the array before the while loop. So code like the following is better as long as the array does not change size during the loop:

SET ARRAY_SIZE = CARDINALITY (InputRoot.MRM.A.B.C[] WHILE ( I < ARRAY_SIZE ) This way CARDINALITY is only evaluated the once.

CREATE Statement [CPU, Memory] The CREATE with PARSE clause statement in ESQL makes it possible to invoke parsers from within a Compute node. This can be used as an alternative to use of a ResetContentDescriptor node and subsequent Compute nodes and can keep processing down to a single Compute node where as previously it might have been three - Compute – ResetContentDescriptor – Compute.

DECLARE statements [Memory] The number of ESQL statements can easily be reduced by DECLAREing a variable and setting its initial value within a single statement. Alternatively, DECLARE multiple variables of the same data type within a single ESQL statement rather than in multiple statements. This will help to reduce memory usage.

EVAL Statement [CPU] Avoid use of the EVAL statement if possible as it is very expensive in CPU use. It effectively involves double execution of a statement.

FORMAT Clause [CPU] Use the FORMAT clause which became available in Message Broker V6 where possible to perform data and time formatting. It has powerful formatting capabilities which may allow multiple changes to be made in a single clause.

IF and CASE statements [CPU] Avoid Nested IF statements – It is less CPU intensive to use ELSEIF branches. If possible, use CASE statements, as this will cause the structure to be exited as soon as one of the branches is successfully evaluated. Code IF or CASE statements so that you place the most likely tests to be met at the front. This way fewer statements have to be evaluated. Calling PASSTHRU Avoid the use of the PASSTHRU statement with a CALL statement to invoke a stored procedure. Use the "CREATE PROCEDURE ... EXTERNAL ..." and "CALL ..." commands instead.

PASSTHRU Statement [CPU] When using the PASSTHRU statement use host variables (parameter markers) for data values rather than coding literal values. This allows the dynamic SQL statement to be reused by the dynamic SQL statement processor within DB2. Otherwise each statement has to be prepared each time which adds significantly to the processing cost. For example the statement

PASSTHRU(’UPDATE SHAREPRICES AS SP SET Price = 100 WHERE SP.COMPANY = ‘IBM’’);

is specific to that instance. The statement cannot be re-used unless you need to update IBM to the value of 100 again. If the value of price was 120 next time this would need another statement as the value is hard coded in the statement. This will also be seen as a different statement by the dynamic SQL statement processor. This will mean an SQL PREPARE for the new statement which is CPU intensive. By using host variables the statement can be reused by the database even if different data values are used. Here is an example of the same update statement which has been recoded to use host variables.

PASSTHRU(’UPDATE SHAREPRICES AS SP SET Price = ? WHERE SP.COMPANY = ?’, InputRoot.XML.Message.Price, InputRoot.XML.Message.Company);

The value for Price in the Update statement is taken from the location InputRoot.Message.Price and the Company name is taken from the location InputRoot.XML.Message.Company. Note: To see the level of dynamic statement cache activity use the commands: db2 connect to <database name>

db2 get snapshot for database on <database name> To see the contents of the dynamic statement cache use the commands: db2 connect to <database name>

db2 get snapshot for dynamic SQL on <database name>

PROPAGATE [CPU, Memory] When the message being processed in a message flow is complex or has multiple nested structures within it look to see whether you can process a section of the message tree at a time. Consider using the PROPAGATE function to send a portion of the message tree down the remainder of the message flow. This can save having to pass the whole of the message tree along all portions of the message flow.

Reference Variables [CPU] Use reference variables rather than long correlation names such as InputRoot.MRM.A.B.C.D.E. Declare a reference pointer using code like DECLARE refPtr REFERENCE to InputRoot.MRM.A.B.C And then to refer to element E of the message tree use the correlation name refPtr.E. Also see the use of reference variables with arrays in the section Array Subscripts above.

Shared Variables [CPU] Limit the use of shared variables to a small number of entries, tens of entries rather than hundreds or thousands, when using an array of ROW variables. Also order entries in probability of usage (the current implementation is not indexed so performance does degrade with higher numbers of entries).

Minimize use of String Manipulation Functions [CPU] All string manipulation functions used within ESQL are CPU intensive. LENGTH, SUBSTRING, RTRIM etc. need to access individual bytes in the message tree which makes them expensive to run. Avoid using such functions if possible. If you do need to use them avoid repeatedly executing the same concatenations by storing intermediate results in variables for example.

Volume of ESQL [CPU, Memory] Code ESQL using the fewest number of lines possible. This will help to reduce memory and CPU usage at runtime. It is logical that the fewer lines of code that are used the more efficient processing will be. Every statement of ESQL requires interpretation at runtime, so the memory footprint and CPU usage of a message flow can be reduced by coding fewer statements.

Minimize Navigation of the Logical Tree [CPU] Use the “REFERENCE” and “MOVE” statements to help reduce the amount of navigation within the message tree. This technique can be particularly useful when constructing large numbers of “SET” or “CREATE” statements. Instead of navigating to the same branch in the tree you can use a REFERENCE variable to establish a pointer to the branch and then use the MOVE statement to process one field at a time.

Java

Storing Intermediate tree references [CPU] Avoid building and navigating trees without storing intermediate references. An example of where this was not done and how it should be done is given below.

MbMessage newEnv = new MbMessage(env); newEnv.getRootElement().createElementAsFirstChild(MbElement.TYPE_NAME, "Destination", null); newEnv.getRootElement().getFirstChild().createElementAsFirstChild(MbElement.TYPE_NAME, "MQDestinationList", null); newEnv.getRootElement().getFirstChild().getFirstChild() createElementAsFirstChild(MbElement.TYPE_NAME,"DestinationData", null);

This repeatedly navigates from root to build the tree. It is better to store references as follows:

MbMessage newEnv = new MbMessage(env); MbElement destination = newEnv.getRootElement().createElementAsFirstChild(MbElement.TYPE_NAME,"Destination", null); MbElement mqDestinationList = destination.createElementAsFirstChild(MbElement.TYPE_NAME, "MQDestinationList", null); mqDestinationList.createElementAsFirstChild(MbElement.TYPE_NAME,"DestinationData", null);

String Concatenation [CPU] When concatenating java.lang.String objects use the StringBuffer class and append method rather than the + operator. Concatenating strings using the + operator is expensive since it (internally) involves creating a new String object for each concatenation. Code such as keyforCache = hostSystem + CommonFunctions.separator + sourceQueueValue + CommonFunctions.separator + smiKey + CommonFunctions.separator + newElement;

will perform better written as: StringBuffer keyforCacheBuf = new StringBuffer(); keyforCacheBuf.append(hostSystem); keyforCacheBuf.append(CommonFunctions.separator); keyforCacheBuf.append(sourceQueueValue); keyforCacheBuf.append(CommonFunctions.separator); keyforCacheBuf.append(smiKey); keyforCacheBuf.append(CommonFunctions.separator); keyforCacheBuf.append(newElement); keyforCache = keyforCacheBuf.toString();

Optimise BLOB Processing [CPU] In some special cases a project may need to process a BLOB – to cut it into chunks or insert characters for example. In this situation a JavaCompute node using Java strong processing capabilities may be better than using ESQL with its strong manipulation facilities such as SUBSTRING. If using the JavaCompute node use ByteArrays and ByteArrayOutputStream to process the BLOB.

Java Code Follow the usual Java coding tips.

Examples

What Not to do Here are some examples of how performance was impacted by the use of the wrong coding technique.

Array Subscripts Below is an example of some ESQL used to load records from a database table. The aim of the processing was to read the rows in from the database table and then iterate around them to create an output message which is then propagated to the next node. The load of the records from four database tables involved processing several hundred thousand rows from a database. It was taking 6-8 hours to run. Here is an extract of the code

SET Environment.Variables.DBDATA[] = ( SELECT T.* FROM Database.{'ABC'}.{'XYZ'} as T ); DECLARE A INTEGER 1; DECLARE B INTEGER CARDINALITY(Environment.Variables.*[]); SET JPcntFODS = B; WHILE A <= B DO CALL CopyMessageHeaders(); CREATE FIELD OutputRoot.XML.FODS; DECLARE outRootRef REFERENCE TO OutputRoot.XML.Data; SET outRootRef.Field1 = Trim(Environment.Variables.DBDATA[A].Field1); SET outRootRef.Field2 = Trim(Environment.Variables.DBDATA[A].Field2); SET outRootRef.Field3 = Trim(Environment.Variables.DBDATA[A].Field3); SET outRootRef.Field4 = Trim(Environment.Variables.DBDATA[A].Field4); SET outRootRef.Field5 = Trim(Environment.Variables.DBDATA[A].Field5); . . . . . . SET outRootRef.Field37 = CAST(Environment.Variables.DBDATA[A].Field37) SET A = A + 1; PROPAGATE; END WHILE;

The problem with the ESQL is the repeated use of array subscripts throughout such as Environment.Variables.DBData[A]. See the section Array Subscript above for why this is not good for performance. The solution in this case was to use REFERENCE variables and the LASTMOVE function instead. This is covered in the section Array Subscript By replacing the use of array subscripts with reference pointers the time dropped to minutes.

Memory Use The message flow was reading records from four databases into an array in Environment and processing each. The user was experiencing problems with memory usage. The flow was abending after 6 to 8 hours because of memory problems. They ESQL was as follows: SET Environment.Variables.Part1[] = ( SELECT T.* FROM Database.MyDB.TableA as T ); While loop for each row Build message PROPAGATE; End While SET Environment.Variables.Part2[] = ( SELECT T.* FROM Database.MyDB.TableB as T ); While loop for each row Build message PROPAGATE; End While SET Environment.Variables.Part3[] = ( SELECT T.* FROM Database. MyDB.TableC as T ); While loop for each row Build message PROPAGATE; End While SET Environment.Variables.Part4[] = ( SELECT T.* FROM Database.MyDB.TableD as T ); While loop for each row Build message PROPAGATE; End While Each of these loads read in between 50K-100K records. This obviously made the memory requirements large as there were hundreds of thousands of rows in total. The while loops after each read built an output message for each row. This was passed to the next node in the flow using the PROPAGATE statement.

At no point did was there any attempt to free memory and in most cases it is not needed within a message. BUT when processing large volumes of data you sometimes need to take some explicit action to avoid problems. After each part had been processed what they should have done was to issue a DELETE for that portion of the tree. For example DELETE LASTCHILD OF Environment.Variables.Part1. This would have freed the memory associated with that part of the Environment Correlation. Note: setting the field to null does not work. So for example: SET Environment.Variables.Part1[ ] =<some large array>; SET Environment.Variables.Part1 = null; Results in the named portion of the tree being detached It effectively disconnects that portion of the tree but does not delete it (that is free the memory). With a detach the memory is tracked, and released when the parser associated with the message is reset. That is when the node has finished its work or after a PROPAGATE without a DELETE NONE.

What to do

Large Repeating Structures Here is an example of how to deal efficiently with a large repeating message structure which might be many megabytes in size. This code is taken from the Large Messaging sample in the sample gallery of the Message Broker Tookit. See the sample gallery if you would like to run the sample. The message flow works by reading in the whole message, storing it a ROW variable, and then processing one element of the repeating structure at a time. Each element is then sent along the remainder of the message flow using the PROPAGATE function. When an element of the repeating structure has been processed it is deleted by using the statement DELETE PREVIOUSSIBLING OF refEnvironmentSaleList; The key factor in the success of this technique is the use of the ROW variable rowCachedInputXML when give mutable tree. InputRoot is immutable and as such portions of it cannot be deleted. CREATE COMPUTE MODULE XMLwithRepeat_to_singleXML_slicer_Compute -- ======================== -- The INPUT message format -- ======================== -- SaleEnvelope -- Header -- SaleListCount -- SaleList (n) -- Invoice (2) -- Initial (2) -- Surname -- Item (2) -- Code (3)

-- Description -- Category -- Price -- Quantity -- Balance -- Currency -- Trailer -- CompletionTime DECLARE ROOT_LEVEL CONSTANT CHARACTER 'SaleEnvelope'; DECLARE HEADER CONSTANT CHARACTER 'Header'; DECLARE REPEATING_ELEMENT_COUNT CONSTANT CHARACTER 'SaleListCount'; DECLARE REPEATING_ELEMENT CONSTANT CHARACTER 'SaleList'; -- Therefore, the repeating item which will be being processed is the 'SaleList' element. -- Elements within SafeList will not be referenced *specifically* by this code (but they will -- be parsed and hence memory will be claimed to store information about the internal elements ). -- Declare module level variables ("global" to this module) DECLARE intNumberOfSaleListsDeclared INTEGER 0; DECLARE intNumberOfSaleListsFound INTEGER 0; /* =================================== Main function to control processing =================================== */ CREATE FUNCTION Main() RETURNS BOOLEAN BEGIN CALL ProcessLargeMessageToProduceIndividualMessages(); CALL ProduceProcessingCompleteNotification(); END; /* ============================================================================================

> Declare variables > Find first instance of the element to process > For each instance found 1> Release memory used to store information about the previous instance (if appropriate) 2> Call a procedure to produce a single message the current instance 3> Look for a following instance ============================================================================================ */ CREATE PROCEDURE ProcessLargeMessageToProduceIndividualMessages() BEGIN -- Creat a (local to this node) variable to hold a mutable tree... DECLARE rowCachedInputXML ROW; -- ... and create a suitable parser (DOMAIN) to process the incoming message /* As both the incoming message AND the new parser are XMLNSC no translation is required and therefore the XML message is NOT fully parsed */ CREATE FIRSTCHILD OF rowCachedInputXML DOMAIN ('XMLNSC') NAME 'XMLNSC'; -- Create a reference variable to be used to traverse the input XML message /* Which will be processed via the local variable described above */ DECLARE refEnvironmentSaleList REFERENCE TO rowCachedInputXML.XMLNSC; -- Create a mutable tree by copying the INPUT XML to the local parser /* This is to allow data about parsed message elements to be deleted from the message tree (which can not happen on the InputRoot as its message tree is immutable) */ SET rowCachedInputXML.XMLNSC = InputRoot.XMLNSC; -- Determine how many SaleList items are expected... IF FIELDNAME( InputBody.{ROOT_LEVEL}.{HEADER}.*[>]) = REPEATING_ELEMENT_COUNT THEN SET intNumberOfSaleListsDeclared = InputBody.{ROOT_LEVEL}.{HEADER}.{REPEATING_ELEMENT_COUNT}; ELSE THROW USER EXCEPTION MESSAGE 2999 VALUES ('LMSmessageFailure', 'No count found!'); END IF; -- Acquire the first SaleList element... MOVE refEnvironmentSaleList FIRSTCHILD NAME ROOT_LEVEL; IF NOT LASTMOVE(refEnvironmentSaleList) THEN

THROW USER EXCEPTION MESSAGE 2999 VALUES ('LMSmessageFailure', 'No root element found!'); END IF; -- The next line results in the parser attempting to locate the first SaleList structure... MOVE refEnvironmentSaleList FIRSTCHILD NAME REPEATING_ELEMENT; -- Loop around each SaleList item WHILE LASTMOVE(refEnvironmentSaleList) DO -- Increment the count of SaleList items found... SET intNumberOfSaleListsFound = intNumberOfSaleListsFound + 1; -- Are we on the second, or subsequent repeating item? IF intNumberOfSaleListsFound > 1 THEN -- YES, therefore erase the parsed details about the previous item to release memory /* The following line is most significant with respect to memory usage. Its execution results in the last-but-one *repeating* element (SaleList), including subordinate elements, of the message tree being deleted allowing the memory used to hold information generated during parsing to be reused for further parsing. */ DELETE PREVIOUSSIBLING OF refEnvironmentSaleList; END IF; CALL ProduceIndividualSaleListMessage(refEnvironmentSaleList, intNumberOfSaleListsFound); -- The next line searches for another repeating element... MOVE refEnvironmentSaleList NEXTSIBLING NAME REPEATING_ELEMENT; END WHILE; END; /* ==================================================================== Produce a message consisting of one "slice" of the compound message. ==================================================================== */

CREATE PROCEDURE ProduceIndividualSaleListMessage(IN refEnvironmentSaleList REFERENCE, IN intSaleListNumber INTEGER) BEGIN -- ================================== -- The relevent OUTPUT message format -- ================================== -- Parent -- Number -- SaleList CALL CopyMessageHeaders(); SET OutputRoot.XMLNSC.{ROOT_LEVEL}.Number = intSaleListNumber; SET OutputRoot.XMLNSC.{ROOT_LEVEL}.{REPEATING_ELEMENT} = refEnvironmentSaleList; -- Generate a new message consisting of one SaleList structure PROPAGATE; END;

Documents

WebSphere Message Broker Coding Tips