
z/OS® V1R12 Communications Server

Performance Study:

OSA-Express3 Inbound Workload Queueing

Tom Moore
Patrick Brown

Document version 1.0
December 2010

© 2010 IBM Corporation

Contents

Acknowledgements
Introduction
  How does z/OS V1R12 exploit OSA-3 IWQ?
  How to enable z/OS V1R12 OSA-3 IWQ
  Copyrights
  Performance Disclaimers
  Future Updates to this Paper
Part 1: Interrupt Frequency for Inbound Network Flows
  Inbound Streaming workload
  Single Session Request-Response workload
  Setting the Lan-Idle Timers
Part 2: IWQ: Keeping TCP Streaming Data In-Order
  Why is it important to keep TCP data in-order?
  Out of Order packet delivery (or transmission) due to MP Races
  So how exactly does IWQ help z/OS keep streaming data in-order?
  z/OS CommServer APAR PM20056 (PTF UK61028)
Part 3: IWQ for Sysplex Distributor Operation
Part 4: IWQ Performance Summary
Part 5: Performance Data for Mixed (Interactive + Streaming) Workloads
  1Gb Ethernet Results (Mixed Interactive|Streaming workload)
  10Gb Ethernet Results (Mixed Interactive|Streaming workload)
  CPU Consumption for Mixed Workload
Part 6: Performance Data for Purely Streaming Workloads
  Streaming throughput with some packet loss in the network
  Streaming results on a "clean" network (no packet loss)
Part 7: Performance Data for Sysplex Distributor
  Sysplex Workload #1 - three mini-workloads
  Sysplex Workload #2 - two mini-workloads
  Performance Results for Sysplex Workload #1
  Performance Results for Sysplex Workload #2
Appendix A: IWQ-Related Diagnostics
  Determine if IWQ is enabled for your QDIO interface
  Determine whether routing variables describing the ancillary queues are registered with OSA
  Determine if inbound traffic is using Inbound Workload Queueing (IWQ) by using VTAM tuning statistics
  Find all TCP connections that are associated with the bulk-data ancillary queue
  Determine number of segments received for all TCP connections that are associated with the bulk-data ancillary queue
Appendix B: CPU Consumption Analysis for Mixed Workloads


Acknowledgements

Many thanks to the following for their contributions to the IWQ project:

Jerry Stevens, Mike Fitzpatrick, Gus Kassimis - z/OS Communications Server Design

Bruce Ratcliff, Jeff Turner, Tom Doster - OSA Design/Development/Performance

Dhananjay Patel, Jeannie Hawrysz, Jeff Haggar, Ed Zebrowski, Bebe Isrel, Dave Herr, Todd Valler, Mark McClintock, Hugh Hockett, Ted Bodenheimer - z/OS Communications Server Development/Performance

Todd Lopez, Ben Rau, Phillip Trent, Jason Mansfield - z/OS Communications Server System Test


Introduction

OSA-Express3 Inbound Workload Queueing (IWQ) was included in the July 2010 announcements of z/OS V1R12 and the zEnterprise™ 196 Server.

IWQ is an advanced new function on the System z10™ and zEnterprise 196 OSA-Express3 features, for 1Gb and 10Gb ethernet. IWQ introduces the concept of "multiple input queues", where the OSA-Express3 performs a real-time sort of a "heterogeneous" inbound traffic stream onto separate inbound processing queues (for presentation to the host operating system - z/OS in this paper). In IWQ mode, z/OS Communications Server provides the rules defining how OSA is to perform this traffic sort, and these rules are automatically selected in such a way as to improve system performance beyond what would be possible using a single input queue.

With IWQ, an inbound packet is routed to an ancillary input queue if a sorting rule matching the packet has been registered. If no sorting rule matches the packet, OSA routes the packet onto the primary input queue.

What do we mean by a "heterogeneous inbound traffic stream"? We'll use the term "heterogeneous" or "mixed" to describe a traffic stream being presented inbound to z/OS, where the stream consists of packets for a number of different workload types. For instance, we would consider a burst off the ethernet containing packets for both DB2™/DRDA and FTP as a heterogeneous or mixed inbound traffic stream.

As will be described further, these workload types have greatly differing response time demands, so it makes sense for z/OS to tailor its processing to match the response time requirements of the individual workloads.

How does z/OS V1R12 exploit OSA-3 IWQ?

When OSA-3 IWQ is enabled, z/OS Communications Server and the OSA-Express3 establish a primary input queue and one or more ancillary input queues for inbound traffic. z/OS Communications Server and the OSA-Express3 cooperatively use the multiple queues as follows:

- The TCP layer quickly and automatically detects connections operating in a bulk-data fashion (such as FTP data connections), and these connections are registered to the receiving OSA-Express3 as bulk-mode connections. (Please note: the "bulk-data" detection mechanism is not limited to any particular application, nor is it based on usage of any well-known port numbers. The intent is to automatically detect any TCP connection exhibiting streaming behavior, then get it registered to the OSA as a bulk-data connection.) The OSA-Express3 then directs an inbound packet (received on this interface) for any registered bulk-mode connection to the TCP bulk-data ancillary input queue. z/OS Communications Server tailors its processing for the bulk queue, notably by improving in-order packet delivery on multiprocessors, which generally results in improvements to CPU consumption and throughput. The processing of data on the bulk queue can be in parallel with traffic on the other queues. (A sketch of how a registered bulk-mode connection can be recognized in Netstat output follows this list.)


- The OSA-Express3 directs an inbound packet that is to be forwarded by the sysplex distributor to the sysplex distributor ancillary input queue. z/OS Communications Server then tailors its processing for the sysplex distributor queue, notably by using the multiprocessor to service sysplex distributor traffic in parallel with traffic on the other queues.

- If a packet is not directed to an ancillary input queue, the OSA-Express3 directs the packet to the primary input queue.
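As a point of reference, a registered bulk-mode connection can later be recognized from Netstat output (Appendix A walks through the IWQ-related diagnostics in detail). The sketch below is illustrative only: it assumes a TCP/IP procedure named TCPIP, an interface named OSAIWQ1, and an FTP server connection, and the exact report field names may vary by release and service level.

   D TCPIP,TCPIP,NETSTAT,ALL
   ...
   Client Name: FTPD1          Client Id: 0000004A
     ...
     Ancillary Input Queue: Yes
       BulkDataIntfName: OSAIWQ1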

How to enable z/OS V1R12 OSA-3 IWQ

To enable Inbound Workload Queueing (IWQ) separation for a specific QDIO interface, perform the following:

- Specify INBPERF DYNAMIC on the IPAQENET or IPAQENET6 INTERFACE statement with the WORKLOADQ subparameter.

  - For IPv4 QDIO interfaces defined via the DEVICE/LINK/HOME statements, you must first convert to an IPAQENET INTERFACE statement.

- Additionally, a virtual MAC address is required because the current OSA/FPGA design requires this information to separate inbound packets for certain workload types. You can allow the OSA-Express device to generate it by specifying the VMAC INTERFACE parameter without a macaddr value.

Optionally, the OLM INTERFACE parameter can be specified for an IWQ interface. When specified, the OSA-Express adapter will operate in optimized latency mode for queues that will benefit from this support. Since this function is targeting interactive traffic, this setting will not be utilized by OSA for the TCP bulk-data ancillary input queue.
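For illustration, here is a minimal sketch of an IPAQENET INTERFACE definition with IWQ enabled. The interface name, port name, and IP address (OSAIWQ1, OSAPORTA, 10.1.1.1/24) are placeholders, not values from this study:

   ; Sample PROFILE.TCPIP definition enabling IWQ on a QDIO interface
   INTERFACE OSAIWQ1
     DEFINE IPAQENET
     PORTNAME OSAPORTA
     IPADDR 10.1.1.1/24
     ; VMAC with no macaddr value lets the OSA generate the required virtual MAC
     VMAC
     ; INBPERF DYNAMIC with the WORKLOADQ subparameter enables IWQ
     INBPERF DYNAMIC WORKLOADQ
     ; OLM could optionally be added here; OSA will not apply it to the bulk-data queue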

In order to function in IWQ mode, the z10 OSA-Express3 needs to be at or above this microcode level: Driver 79, EC N24398, MCL003.
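Once the interface is active, you can confirm that workload queueing is in effect from the Netstat DEvlinks report (Appendix A covers the IWQ-related displays in detail). A sketch of the console form of the command, again assuming a TCP/IP procedure named TCPIP and interface OSAIWQ1; the exact wording of the workload-queueing indicator may differ by release level:

   D TCPIP,TCPIP,NETSTAT,DEV
   ...
   IntfName: OSAIWQ1     IntfType: IPAQENET     IntfStatus: Ready
     ...
     InbPerf: Dynamic
       WorkloadQueueing: Yes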


This paper is organized as follows:

- In Part 1, we discuss the timing benefits (improved communications latency) that can be achieved by transparently separating streaming traffic away from more latency-sensitive interactive traffic;

- Part 2 describes how IWQ helps keep streaming data in-order on a multiprocessor (and why that's important);

- Part 3 describes the benefits that can be achieved in separating traffic destined for the Sysplex Distributor function away from all other traffic targeting the host;

- Part 4 summarizes all the performance data presented in the subsequent sections, and

- Parts 5, 6 and 7 contain detailed performance data collected for IWQ-mode OSA-Express3's (with comparisons against the earlier OSA operational modes).

Two appendices are included - Appendix A describes IWQ-related diagnostics, and Appendix B contains CPU consumption analysis for mixed workloads.

Copyrights

IBM logo, AIX, DB2, System z, System z10, VTAM, zEnterprise and z/OS are trademarks or registered trademarks of IBM in the United States, other countries, or both.

SAP® is a registered trademark of SAP AG in Germany and several other countries.


Performance Disclaimers

1. The performance data presented in this paper were collected using a dedicated system environment. The results obtained in other configurations or operating system environments may vary significantly depending upon environments used. Therefore, no assurance can be given, and there is no guarantee to achieve performance equivalent to that described herein. Users of this document should verify the applicable data for their specific environment.

2. IWQ has not yet been performance-tested on a uniprocessor (UP). There may be some elements of the IWQ design that will provide positive performance effects on a UP, but at this time we cannot make performance claims for IWQ deployment on a uniprocessor.

3. IWQ has not been (and will not be) performance-tested on z10 Business-Class processors. As described in Part 2 of this document, IWQ assumes a single CP has sufficient capacity to handle the peak traffic rate arriving from an OSA-Express3. This assumption may not be valid on certain z10 BC processor models, so IWQ deployment cannot be recommended in such a configuration.

4. IWQ has not yet been performance-tested in shared-OSA configurations (i.e., configurations where multiple LPAR partitions share an OSA port).

5. Deploying IWQ will grow ECSA usage by 72 KBytes (per OSA interface) if Sysplex Distributor (SD) is in use; 36 KBytes if SD is not in use. Customers already experiencing ECSA constraint should weigh IWQ's potential performance benefits against this minor increase in ECSA usage.

Future Updates to this Paper

At this time, the IWQ function has been performance-tested only on System z10. This paper will be updated in early 2011, to include IWQ performance data for the zEnterprise 196. This planned update will also include data for some of the configurations currently described in "Performance Disclaimers".


Part 1: Interrupt Frequency for Inbound Network Flows

A critical performance decision for z/OS Communications Server is: "how long should we allow packets (inbound to the host) to queue in the communication adapter before requiring the adapter to present the inbound data?". "Presenting the data" generally involves the adapter interrupting the host, which can be an expensive operation (in terms of both host cycles consumed in fielding the interrupt, and adapter microprocessor cycles in generating the interrupt). Because network-related I/O interruptions consume considerable CPU and microprocessor resource, most ethernet hardware (with cooperation from the operating system) employs timing mechanisms to allow multiple packets to be delivered to the host on a single I/O interrupt.

On z/OS, with the Open Systems Adapter, the timing mechanism is realized via two timers, implemented in the OSA:

- Inter-packet Gap Timer

- Block Hold Timer

The two timers work together as follows: if the adapter detects a timing gap between consecutive inbound ethernet frames of longer than the Inter-Packet Gap setting, OSA will infer there's a pause in the traffic stream, and will now present any accumulated frames up to the host. Else (gap timer not exceeded): if the block hold timer has been exceeded, OSA will present the accumulated frames up to the host.

Collectively, the two timers are commonly referred to as the "OSA Lan-Idle" timing function.

Here are two examples demonstrating the OSA Lan-Idle timing function. In the first example, the Lan-Idle timing function is beneficial (greatly improves system efficiency). In the second example however, the timing function actually results in a performance penalty (significantly worse response time).

Inbound Streaming workload

Figure 1: Inbound streaming burst - short pause, start of second burst (inbound packets arrive at the receiving OSA-Express3 spaced 2 microseconds apart, followed by a 40 microsecond pause before the next burst).

Data flow for streaming workloads is usually heavily biased in a single direction, and in the example above (Figure 1) the direction of data flow is inbound to z/OS. (Consider an FTP file PUT to z/OS: the inbound side of z/OS will see payload-carrying TCP segments, while the outbound side will see only TCP control flows such as ACKs and Window Updates.) Besides being heavily payload-biased in a single direction, streaming workloads also create bursty periods where many packets flow in a single direction (with nothing flowing back in the other direction). And within each burst, we normally see the packets spaced by just a few microseconds.

From an interrupt-timing perspective, two key characteristics of streaming traffic patterns need to be considered:


- Since there is a very small timing gap between consecutive packets in the burst, read-side interrupt frequency would be extremely high if we were to allow the adapter to present each packet immediately. Immediate presentation (via read-side interrupt) of each packet in the streaming burst would result in enormous CPU consumption, which would reduce the effective capacity of the machine, while also driving up usage-based z/OS pricing.

- Because a sender normally transmits multiple packets in a single burst, there is no need for the receiver to immediately ACK every packet (or even every-other packet) in the burst. (TCP flow control allows the sender to get somewhat ahead of the receiver. The receive side can therefore delay TCP Acknowledgements for a short period without causing throughput degradation.)

With the above considerations in mind, z/OS uses OSA's Lan-Idle timing capability, such that each interrupt presents a batch of packets (rather than just one packet being presented per interrupt). So CPU processing expense on the inbound side can be minimized without impacting streaming throughput.

In the figure 1 example above, we have an arriving burst of 7 packets, with each packet separated by a 2 microsecond timing gap. A 40 microsecond pause in the stream is then seen (here, possibly the send-side had transmitted all its queued data, then needed the next FTP disk-read to complete before any new data could be injected into the network). Assume the OSA inter-packet gap timer is set at 20 microseconds. Because the first 7 packets are so tightly spaced, the inter-packet gap timer will never trigger in the gaps between any of those first 7 packets. The timer in fact will not trigger until the mid-point of the 40 microsecond pause, at which time OSA will interrupt the host to process those first 7 packets. z/OS CommServer will then process those 7 packets as a single unit of work, and will generate TCP Acknowledgement(s). On a well-tuned TCP connection, these ACKs will arrive back at the send-side before the send-side has completely filled the allowed TCP window - meaning the short Lan-Idle timer delay will have had no negative impact on the connection's throughput.

Single Session Request-Response workload

Figure 2: Single session interactive flow - a single packet (request) IN, followed by a single packet (response) OUT.

Where the streaming example in figure 1 was somewhat insensitive to receive-side latency, we'll now show a workload that's the exact opposite: this one is extremely sensitive to receive-side latency.

Single-Session Request-Response workload (such as the SAP® Upgrade Utility):

Transaction response time for single TCP session request/response is a function of system pathlengths (client side generating a request; server side processing the request; server side generating the response; client side processing the response) PLUS network times (client to server and server back to client) PLUS other latencies in the systems (e.g., delays due to resource contention). As host processor speeds and software efficiencies improve, network time becomes a larger component of overall end-to-end response time. And for z/OS, the Lan-Idle timing function can be a sizable portion of this network time.

Figure 2 looks at the server side of a single-session request/response transaction. The single packet IN might be a DB2/DRDA query, and it's followed by the DB2/DRDA response. As in the previous streaming example, again assume the OSA inter-packet gap timer is set at 20 microseconds. When the IN packet arrives at the OSA, the inter-packet gap timer starts. But unlike in the streaming example, there is no second packet flowing inbound to the server, so the inter-packet gap timer will trigger approximately 20 microseconds after the IN packet has arrived. The packet will then be presented to the host (via read-side interrupt), then the host will perform all the z/OS, TCP/IP and DB2 processing, and the DRDA response will be generated. If, say, the total server-side CPU execution time for all this was 20 CPU microseconds, then the total server-side response time will have been 40 microseconds. And of this 40 microseconds, fully half the time was spent waiting for the inter-packet gap timer to trigger. Or (said another way) in this example, the Lan-Idle timing function resulted in a doubling of server-side response time.

The above two examples should demonstrate that while the Lan-Idle timing function provides processing efficiency for streaming workload, it can be detrimental for interactive workload response time.

Setting the Lan-Idle Timers

z/OS Communications Server allows the customer to indirectly alter the Lan-Idle timer settings, via the INBPERF setting on MPCIPA LINK and INTERFACE statements.

In order to understand the role IWQ will now play in read-side interrupt frequency, let's first quickly review the INBPERF settings available to customers prior to z/OS V1R12:

MINLATENCY: the two timers are statically set to very low values, in order to minimize response time degradation (due to adapter hold-time). While this setting might be appropriate for workloads with demanding response-time requirements, this setting will usually result in excessive CPU consumption for streaming-type workloads.

MINCPU: the two timers are statically set to extremely high values, in order to minimize I/O interrupts. While this setting might be appropriate for certain streaming workloads (like inbound FTP transfers), this setting would be detrimental for workloads requiring very fast response times (such as SAP/DB2).

BALANCED: the two timers are statically set around the midpoint of the extremes used in the MINCPU and MINLATENCY settings. This is the default setting for all z/OS releases, and is intended to provide reasonably good throughput with reasonably low CPU consumption.

DYNAMIC: Unlike the above three (static) settings, with the dynamic setting, CommServer will study packet arrival patterns, and will dynamically tune the OSA timers to the point that appears to maximize system throughput. The Dynamic setting is effective for installations that occasionally run light interactive workloads but also frequently run heavy streaming workloads.

At this point it should be stressed that none of the INBPERF settings above is optimal for all workload types. Earlier we mentioned the MINLATENCY setting would be inappropriate for streaming workloads (CPU consumption would be excessive), and the MINCPU setting would be inappropriate for latency-demanding workloads (response time would be poor due to excessive hold time in the adapter). BALANCED mode is a static compromise between MINLATENCY and MINCPU. The DYNAMIC setting improves upon these static settings in that it can perform quite well for either light interactive workload or heavy streaming workload. But what happens when z/OS is servicing a "mixed" interactive AND streaming workload? The reality is - if, for instance, the inbound workload consists of both heavy FTP streaming along with light DB2/DRDA interactive activity, DYNAMIC mode will always tend toward CPU conservation (setting higher OSA timer values), which will inevitably result in a response time penalty on the interactive traffic flows. The problem prior to CommServer V1R12 has been this: there is just one set of timers for the OSA interface, so how can we possibly tune that single set of timers to perform optimally for a mixed interactive+streaming workload?

z/OS V1R12 provides the solution to the "mixed workload" dilemma: with V1R12 and OSA-Express3 IWQ, each input queue has its own set of interrupt timers. So we can now dynamically tune the interruption criteria independently, to match the latency demands of the workloads being serviced on each queue. The bulk ancillary queue will operate with the timers set to conserve CPU (doing so will not degrade streaming throughput), while the primary and sysplex distributor queues will usually operate with aggressive timer settings, in order to minimize z/OS communication latency for the interactive workloads being serviced on those queues.

Performance data demonstrating the value of independent, queue-based, interrupt timing criteria are contained in Part 5.


Part 2: IWQ: Keeping TCP Streaming Data In-Order

In the previous section, we covered the interrupt-timing benefits afforded by IWQ. We now move on to the second major benefit of IWQ exploitation - with streaming (bulk) traffic separated onto its own queue, it's much easier for z/OS to keep streaming data in-order within a multiprocessor.

Why is it important to keep TCP data in-order?

While the TCP standards (RFCs) do require resequencing logic to deal with traffic arriving out-of-order, performance-wise it's very beneficial to keep data in-order:

- Out-of-order TCP packet reception results in transmission of duplicate Acknowledgements. And when a TCP transmitter receives a third consecutive duplicate ACK (e.g., because the receiver saw at least 4 TCP segments arrive out of order), the transmitter enters the Fast Retransmit-Recovery (FRR) state. FRR is intended to perform a "fast" retransmission (rather than waiting for a longer retransmission timer pop) when there's evidence of packet loss. (Reception of a third duplicate ACK is evidence of packet loss - even if the only problem had been packets getting out of order within a multiprocessor!) So multiprocessor-induced ordering problems can lead to unnecessary "fast" retransmissions, accompanied by a reduction of the TCP congestion window on the send side (which will inhibit attainable throughput).

- Setting aside the "false" retransmission issue due to out-of-order delivery, TCP stacks employ header-prediction logic - an efficient processing path that will minimize CPU consumption when all predicted conditions are met. When data arrives out of order, these predicted conditions will not be met and the processing "fastpath" will not be taken; this will result in excessive CPU consumption.

Out of Order packet delivery (or transmission) due to MP Races

In earlier z/OS CommServer releases, it was common for customers to notice fairly high out-of-order inbound TCP packet counts, and in some cases it was shown the data was IN-ORDER when it arrived at the receiving OSA (i.e., the ordering problem occurred within the inbound side of z/OS). These earlier z/OS releases also had some likelihood of actually putting outbound streaming data on the wire already out-of-order. Both flavors of the ordering problem were brought on by processing races in the multiprocessor:

- The inbound out-of-order condition occurs when the OSA interrupts z/OS before an earlier-scheduled TCP/IP SRB has completed. On a multiprocessor, the second SRB then races with the first to present its data to the TCP layer. If both SRBs carry packets for the same TCP connection, there is some likelihood of the second SRB reaching the TCP layer first (which will result in detection of out-of-order delivery). This problem is most pronounced when multiple TCP connections are running. In figure (3), SRB #1 carries two packets for TCP connection A, and these two packets are at the end of the "batch" of packets being processed on this SRB. SRB #2 carries four packets for the same TCP connection A, but these packets are at the beginning of its batch. Since SRB #1's TCP pathlength will be longer (than SRB 2's) before reaching the connection A packets, the effect will be to drive up the odds that the TCP layer will see SRB 2's packets for connection A before it sees SRB 1's connection A packets.

Figure 3: Two SRBs, each carrying packets for four TCP connections. Connection A's packets fall at the end of SRB 1's batch but at the beginning of SRB 2's batch.

- The outbound ordering problem is another manifestation of inbound SRBs racing on a multiprocessor. It's common for the processing of an inbound TCP Acknowledgement to result in new data being transmitted on the outbound path. And when multiple SRBs (carrying Acknowledgements for a single TCP connection) race on an MP, each may result in a separate outbound burst of data. It's been common to see these multiple bursts becoming interleaved in the lower layers of TCP/IP - meaning the data is leaving z/OS already out of order.

So how exactly does IWQ help z/OS keep streaming data in-order?

We've known for some time that a single z10 or z196 CP can handle the full-sustained streaming load arriving inbound from the fastest OSA-Express3 adapters. That is to say - no throughput (or CPU consumption) problem would arise if we opted to service all the streaming traffic on a single CP (i.e., using a single MVS SRB). We've also known for some time that interactive workload response time would suffer if we opted to process all the interactive traffic on a single CP (again using a single MVS SRB). So since (on releases prior to V1R12) a single input queue was used to hold interleaved interactive and streaming traffic, it was not practical in earlier releases to use one scheduling philosophy for streaming traffic and another for the interactive traffic.

With IWQ's separation of streaming traffic away from interactive traffic, it now has become feasible to service streaming traffic with a single SRB (thereby doing away with multiprocessor-induced races), while still aggressively servicing the interactive queue (potentially with multiple SRBs). So streaming traffic will remain in-order within the multiprocessor without risk of any additional latency hit to interactive traffic.


z/OS CommServer APAR PM20056 (PTF UK61028)

In order to get the most benefit for streaming traffic over IWQ, customers should apply PTF UK61028, which incorporates IWQ-related performance changes that didn't make it onto the base V1R12 ship tape.

One of the notable changes contained in this PTF is: increased TCP Acknowledgement frequency following apparent packet loss.

During performance test of IWQ, we observed z/OS (as a receiver) employing an extremely conservative TCP-Acknowledgement frequency. Specifically, when an inbound SRB presented multiple segments for a single TCP connection, the TCP layer was always generating a single TCP ACK. And in the case where some (or all) of the data is arriving out-of-order (e.g., due to packet loss), we found this infrequent TCP ACK behavior was insufficient to trigger the Fast-Retransmit-Recovery (FRR) function on the send side (recall earlier we mentioned FRR is triggered upon receipt of a 3rd duplicate ACK). Without FRR, the send side becomes dependent upon the TCP retransmission timer, which results in a much longer stall on the connection than would be seen with a "FAST" retransmission.

In APAR PM20056, z/OS CommServer's TCP layer is therefore updated as follows:

- If data on this inbound SRB has arrived out-of-order (i.e., the receiver now suspects packet loss), increase the TCP ACK frequency to give the send side a better chance of entering the FRR state (to avoid a long retransmission stall). This change is implemented such that it takes effect only for TCP streaming connections being serviced by an IWQ-mode OSA.[1]

The positive effects of this change are visible in Part 6 - Figures 8 and 9.


[1] Consideration was given to increasing TCP ACK frequency (during out-of-order delivery) regardless of the INBPERF setting of the receiving OSA. However, since only IWQ mode is designed to retain inbound ordering within a multiprocessor, the other modes would likely see a large CPU consumption increase if we were to allow this change to take effect in the other INBPERF modes. For this reason, the change was designed such that it takes effect only for TCP connections being serviced by an IWQ-mode OSA.


Part 3: IWQ for Sysplex Distributor Operation

Parts 1 and 2 above described the performance benefits achievable in having bulk (streaming) traffic separated away from more interactive traffic. Sysplex Distributor's exploitation of IWQ is an extension of this basic idea.

When IWQ is enabled on an OSA servicing traffic for the Sysplex Distributor (SD), a separate input queue is created and reserved solely for TCP traffic destined to the sysplex distributor function. To accomplish this, Communications Server registers a sorting rule with the OSA-Express3, indicating all inbound TCP-protocol datagrams targeting any Distributed Dynamic Virtual IP Address (distributed DVIPA) are to be placed on the sysplex distributor input queue.
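For context, here is a minimal sketch of the kind of distributed DVIPA definition, on the distributing stack, whose inbound TCP traffic would match such a sorting rule. The DVIPA address and port (10.1.2.1, port 8000) are illustrative placeholders, not the configuration used in this study:

   VIPADYNAMIC
     VIPADEFINE 255.255.255.0 10.1.2.1
     VIPADISTRIBUTE DEFINE 10.1.2.1 PORT 8000 DESTIP ALL
   ENDVIPADYNAMIC

With IWQ enabled on the receiving OSA-Express3, inbound TCP packets addressed to 10.1.2.1 would be placed on the sysplex distributor ancillary input queue rather than the primary queue.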

As discussed in previous sections, each IWQ input queue is equipped with its own set of Lan-Idle interruption timers. So with SD traffic now separated onto its own queue, Communications Server can tune the SD traffic's interruption criteria independently (without regard to the traffic timing patterns present on the other two input queues). Going beyond this interrupt-timing benefit, IWQ's separation of SD traffic onto its own queue enables much-improved parallelism:

Consider a 3-CP z/OS image concurrently servicing streaming traffic, SD traffic, and other non-distributed interactive traffic. With IWQ, the three workload streams can be serviced in parallel (with each queue being serviced by its own running SRB). This increased parallelism can result in improved response times.

Performance data for Sysplex Distributor's IWQ exploitation is contained in Part 7.


Part 4: IWQ Performance Summary

If you've skipped directly to this performance summary, at a minimum please review the 'Performance Disclaimers' discussion in the introductory section.

Before deploying IWQ mode, please apply z/OS Communications Server PTF UK61028.

Summarizing the Mixed Interactive+Streaming Workload Results:

For mixed interactive|streaming workloads, the new OSA-Express3 Inbound Workload Queueing (IWQ) mode may provide substantial throughput and response time improvements for interactive traffic. In the lab, we've measured a peak interactive throughput boost of 84%, which translates to a 46% improvement in interactive response time.

Summarizing the Streaming Workload Results:

For purely-streaming workloads, the new IWQ mode may also deliver substantial throughput improvement. This throughput boost appears to be delivered with no increase in normalized (per MegaByte) CPU processing expense. (In fact, in some tests, the improvements in raw throughput were also accompanied by a decrease in per-MB CPU consumption.) Our lab results show a peak throughput boost of 41% (measured for z/OS streaming outbound to AIX over 10Gbe), and a reduction in per-MB CPU consumption of up to 12%.

Summarizing the Sysplex Distributor Results:

- If, in addition to its normal sysplex traffic-distribution function, the SD node is also servicing streaming traffic for local application(s), IWQ's transparent separation of these traffic streams enables substantial throughput improvement for the distributed traffic. (For such a workload, we measured an 18% throughput boost for the distributed traffic.)

- If the Sysplex Distributor function itself is servicing a streaming-traffic workload, IWQ enables significant throughput improvement for other interactive traffic also targeting the Sysplex Distributor node. In one such benchmark, we observed more than a doubling of interactive throughput (better than a halving of interactive response time). Such positive results will be most pronounced at installations not using the QDIO Accelerator function.

- If the traffic stream being serviced by the Sysplex Distributor function is interactive in nature (and no other bulk traffic is targeting local applications on the SD node), we find IWQ's separation of the SD traffic away from the primary queue traffic will have little performance effect. At best, this separation will enable improved parallelism via multiprocessing (resulting in somewhat improved response time for the SD traffic). At worst, there will be a minor degradation of the SD traffic's response time, due to new latencies in the OSA hardware (in performing the traffic sort of the inbound stream).


Based on the performance studies in Parts 5, 6 and 7 below, the new IWQ mode appears to have no significant performance weakness relative to the earlier INBPERF modes (while in many cases, greatly outperforming the earlier INBPERF modes). We therefore do recommend usage of IWQ, particularly for z/OS images that service any mix of bulk+interactive+Sysplex Distributor traffic.[2]

Further, it appears IWQ's Multiple Input Queue design has done away with the major performance inhibitor to Optimized Latency Mode (OLM) deployment:

- OLM is a fairly new OSA INBPERF option, which shipped in Communications Server for z/OS V1R11. But during z/OS V1R11 performance test, this mode was found to consume excessive amounts of CPU when OSA is handling inbound streaming traffic. As a result, OLM usage was not "generally" recommended. (In V1R11, we viewed OLM as a special-case mode that would be appropriate only for z/OS images that handle homogeneous workloads of purely small-data, interactive traffic patterns.)

- The new IWQ mode makes OLM deployment more attractive, as the CPU-intensive design elements of OLM do not engage on the bulk-data queue. So you can get the latency improvement for the interactive traffic without being exposed to excessive CPU consumption related to inbound streaming traffic.

Some CPU consumption increase is still likely with the IWQ+OLM combination, so we still won't recommend this combination for everyone. Many customers will see an improvement in their interactive throughput/response time when they make the switch to IWQ by itself (without also turning on OLM). So we'll recommend the IWQ+OLM combination only for z/OS customers needing more interactive throughput boost than IWQ can provide on its own -- and then only if the z/OS image also has ample CPU headroom to allow for increased CPU consumption. More detail on this point will be contained in subsequent updates to this paper.


[2] While we do recommend IWQ mode for performance reasons, we have not made it the new default INBPERF mode in Communications Server V1R12. The slight increase in ECSA usage with IWQ mode (see "Performance Disclaimers" in the introductory section) is the main reason we've opted to not change the default INBPERF setting. The default setting is still BALANCED.


Part 5: Performance Data for Mixed (Interactive + Streaming) Workloads

This section looks at mixed Interactive+Streaming workloads, and how IWQ's independent Lan-Idle timing (per-queue) can improve interactive response time and throughput. In our performance test configuration (figure 4), z/OS-B is the "system-under-test", meaning we'll vary parameters on that system, while holding all parameters on the other machines constant. The particular parameter of interest is the INBPERF mode of z/OS-B's OSA-Express3.

z/OS-A and AIX™ have both 1Gb and 10Gb connectivity to z/OS-B.

Two variants of mixed workload are studied. In the first, streaming traffic (like an FTP) is running z/OS-A to z/OS-B while interactive request/response workload is running between z/OS-B and AIX. In the second, we have the interactive traffic running between z/OS-A and z/OS-B, while the streaming traffic is running AIX to z/OS-B.

Figure 4: Performance test configuration for the mixed workload. z/OS-A and z/OS-B are z10 images running z/OS V1R12; z/OS-B's OSA-Express3 runs in Balanced, Dynamic, or the new IWQ mode; an AIX 5.3 p570 connects over 1Gb or 10Gb ethernet.


1Gb Ethernet Results (Mixed Interactive|Streaming workload)

In this 1Gb ethernet test, we ran request/response traffic (30 TCP connections) between z/OS and AIX, while the z/OS system under test is also receiving streaming traffic (one TCP connection) over the same 1Gbe interface. This primitive workload combination is a good simulation of application server/database request-response communication, while the database server is also servicing streaming traffic (like inbound FTP).

Figure 5: Mixed workload results on 1Gbe - RR30 + Stream, IWQ vs INBPERF Dynamic (rr30 is z/OS to AIX; strm1 is z/OS to z/OS):
  rr30  - DYNAMIC 46,639 TPS;     IWQ 72,418 TPS
  strm1 - DYNAMIC 66,211 KB/sec;  IWQ 69,141 KB/sec

We compare IWQ against the DYNAMIC INBPERF mode, as (prior to IWQ) DYNAMIC is the setting most likely to yield the highest Request/Response transaction rate with reasonable CPU consumption. As described earlier in section 2, in a mixed (interactive+streaming) workload, we'd expect the DYNAMIC mode to tend toward CPU conservation (growing the Lan-Idle timers), which will result in a longer hold time for the interactive flows. With IWQ's independent queue-based Lan-Idle timing (Lan-Idle timers dynamically set low on the primary queue), a 55% improvement for interactive throughput was measured. A modest streaming improvement (~4.5%) was also achieved. (Note - on the chart, we chose streaming KB/Sec rather than MB/Sec just so the streaming magnitudes would line up better with RR TPS.)

IWQ's positive results are even more pronounced in a mixed workload where the interactive component is just a single TCP request/response connection. In such a benchmark run (not charted), IWQ provided an 84% throughput boost (46% improvement in response time) for the request/response connection, as compared to the DYNAMIC setting.


10Gb Ethernet Results (Mixed Interactive|Streaming workload)

More permutations were added into the 10Gbe mixed-workload testing:

- obtained data for the BALANCED (CommServer default) setting in addition to DYNAMIC mode,

- included both z/OS-AIX and z/OS-z/OS interactive flows in our testing ('Mixed Workload #1 and #2', respectively), and

- combined IWQ with Optimized Latency Mode (OLM), which was introduced in z/OS V1R11. Prior to IWQ, OLM usage was not recommended if any degree of streaming traffic were to be serviced on the interface. In IWQ mode however, OLM's CPU-intensive latency-reduction mechanisms do not apply to the bulk queue. So the presence of streaming traffic with IWQ+OLM should result in no throughput degradation or increase in CPU consumption.

The results contained in figure 6 again demonstrate double-digit (percentage) interactive throughput boost with IWQ. (IWQ provided 17% higher interactive throughput than BALANCED mode, and 11% higher than DYNAMIC mode.)

Next, since IWQ removes the CPU consumption risk likely with Optimized Latency Mode (on streaming traffic), we ran combined IWQ+OLM as the final data point. The IWQ+OLM combination provided +41% higher interactive throughput than BALANCED mode; +33% higher than DYNAMIC mode.

Figure 6: Mixed Workload #1 results on 10Gbe - RR30 + Stream (rr30 is z/OS to AIX; strm1 is z/OS to z/OS):
  Balanced - rr30 72,543 TPS;   strm1 283,443 KB/sec
  Dynamic  - rr30 76,761 TPS;   strm1 282,624 KB/sec
  IWQ      - rr30 85,098 TPS;   strm1 227,123 KB/sec
  IWQ+OLM  - rr30 102,110 TPS;  strm1 218,010 KB/sec

A few words about streaming throughput in these results:

This 10Gbe mixed workload (in all INBPERF modes) is bottlenecked on full 100% utilization of the OSA-Express3 microprocessor. With this in mind, a boost in interactive throughput can be achieved only with some accompanying degradation of streaming throughput. This explains the streaming throughput degradation in these results (with the IWQ and IWQ+OLM settings). We believe customers running a 10Gbe mixed workload will agree -- the interactive workload should naturally be favored, even if that results in some degradation of streaming throughput.

For completeness, in our final mixed-workload tests (figure 7), we then swapped the configuration such that the interactive flows would be between z/OS and z/OS, with the streaming traffic flowing inbound to z/OS from AIX.

Figure 7: Mixed Workload #2 results on 10Gbe - RR30 + Stream (rr30 is z/OS to z/OS; strm1 is AIX to z/OS):
  Balanced - rr30 44,461 TPS;  strm1 173,261 KB/sec
  Dynamic  - rr30 54,071 TPS;  strm1 220,467 KB/sec
  IWQ      - rr30 74,856 TPS;  strm1 211,866 KB/sec
  IWQ+OLM  - rr30 77,309 TPS;  strm1 210,944 KB/sec

These positive results are more pronounced than those contained in figure 6. Notable differences:

- more of the interactive throughput boost is directly attributable to IWQ (without using OLM): +68% interactive boost vs BALANCED mode; +38% vs DYNAMIC mode.

- less degradation of the streaming throughput was seen as interactive throughput was improved.

Small timing effects due to the differing endpoints (for the interactive traffic) account for the differences between these results and those in figure 6. (Mixed Workload #1 pushed a higher aggregate packets-per-second, and actually reached the bottleneck point in the OSA. Workload #2 did not reach this bottleneck point.)


CPU Consumption for Mixed Workload

The mixed-workload results above all showed interactive throughput was boosted in OSA configurations using IWQ mode (as compared to the other INBPERF modes).

A reasonable question to then ask is: "does IWQ provide this interactive throughput improvement at a reasonable CPU cost?"

We believe the answer to this question is YES. But the analysis of CPU consumption in a mixed workload is a bit tricky. Please see Appendix B for details.

The interactive throughput boost IWQ delivers is due to very aggressive scheduling of (and interruption criteria for) traffic on the primary input queue. Interrupts and aggressive scheduling do cost CPU cycles, so some CPU growth is to be expected with IWQ in a mixed workload.

In the earlier INBPERF modes, interactive traffic would generally get a free ride on the MVS SRB(s) servicing the bulk traffic. And while these earlier modes may result in low levels of CPU consumption, our results above show these modes typically produce less-than-optimal response times for interactive traffic.

Some of the increased processing expense (due to IWQ's prompt scheduling of interactive traffic arriving on the primary queue) is offset by new processing efficiencies z/OS gains in the processing of streaming traffic arriving on the bulk queue. This aspect is discussed in the next section.

For mixed workloads, we conclude IWQ can provide substantial (up to ~45%) interactive response time improvement with minor (less than 3%) increase in normalized (per-transaction) network CPU consumption.[3]


[3] Network CPU Consumption is a measure of all TCP/IP communications-related CPU resource consumed by an application on a per-transaction (or per-Megabyte) basis. Applications consume an additional amount of CPU resource (per transaction) which is unrelated to communications, and this amount varies from application to application. An example: Network CPU Consumption might be ~15% of total CPU for certain interactive DB2/DRDA database workloads - so the 3% network CPU increase cited above would yield an overall system CPU increase of less than half a percent.


Part 6: Performance Data for Purely Streaming Workloads

In Part 2, we discussed the importance of keeping streaming data in-order, and how TCP acknowledgement frequency is a critical factor in maintaining throughput over a period of packet loss. This section now presents the performance data collected for pure-streaming workloads.

Streaming throughput with some packet loss in the network:

We were fortunate (didn't think so at the time) to have a few days in the lab where fiber connectivity into our 10Gbe switch was causing frequent CRC (cyclic redundancy check) errors. This caused packet loss, affording us a "less-than-pristine" environment to performance-test IWQ + APAR PM20056, relative to earlier INBPERF modes.

Figure 8: z/OS-z/OS streaming throughput over lossy media (CRC errors - dropped packets), 10Gb ethernet: BALANCED 359.2 MB/sec; DYNAMIC 387.7 MB/sec; IWQ 479.2 MB/sec.

In this series of tests (figure 8), IWQ outperformed the other modes by a large margin (+33% vs BALANCED mode and +24% vs DYNAMIC mode). Further, it should be noted - the CRC error rate being experienced by all modes in these runs was variable (and unrelated to the mode of the OSA under test). Of the three performance runs, the IWQ-mode run experienced the highest CRC error rate - and yet IWQ still delivered much higher streaming throughput. This positive result is largely due to the increased TCP Acknowledgement frequency change included in APAR PM20056 (discussed in Part 2).

CPU consumption in this series of tests is also a good story. On the streaming-sender side (figure 9), CPU consumption (per MB transferred) is improved ~12% (reduced from ~780 uSec/MB down to 690 uSec/MB).

Figure 9: z/OS-z/OS streaming CPU consumption over lossy media (CRC errors - dropped packets), 10Gb ethernet, streaming-sender side, CPU microseconds consumed per MB: BALANCED 776; DYNAMIC 788; IWQ 690.


IWQ's reduction in sender-side CPU consumption is likely due to Fast Retransmissions (whereas BALANCED and DYNAMIC modes' retransmissions will almost always be timer-induced).

On the streaming-receiver side (figure 10), CPU consumption (per MB transferred) is flat in all modes. This implies the higher ACK frequency change (which is enabled in IWQ mode) does not drive up CPU consumption (again, good news).

Figure 10: z/OS-z/OS streaming CPU consumption over lossy media (CRC errors - dropped packets), 10Gb ethernet, streaming-receiver side, CPU microseconds consumed per MB: BALANCED 1904; DYNAMIC 1933; IWQ 1902.

Streaming results on a "clean" network (no packet loss):

Once the fiber to the 10Gbe switch was replaced, we were back to zero packet loss in the lab network. Here's how IWQ then fared relative to DYNAMIC mode in this pristine environment:

Figure 11: z/OS-AIX streaming throughputs on a clean network, IWQ vs Dynamic, 10Gb ethernet, MB/sec: z/OS to AIX - DYNAMIC 210.2, IWQ 294.7; AIX to z/OS - DYNAMIC 492.1, IWQ 492.1.

For outbound (from z/OS) streaming workloads, the figure 11 data show IWQ provided a +41% throughput boost relative to DYNAMIC mode.

For inbound (to z/OS) streaming workload, throughput was identical between IWQ and DYNAMIC modes.

The explanation for the differing results (based on streaming direction) is related to the two types of SRB race discussed in Part 2. In DYNAMIC INBPERF mode (in the lab), we were easily able to generate the "outbound ordering" SRB race condition - and the figure 11 results show that race condition is now resolved in IWQ mode. On the other hand, for inbound (TO z/OS) streaming, we were unable to generate enough of the "inbound-data SRB race" events to produce a measurable impact on throughput in either mode.[4]


[4] A production z/OS image (with many applications running concurrently) would likely generate far more inbound TCP/IP SRB races than we were able to cause in the networking lab. Disk-related I/O interrupts, timer interrupts, and usage of shared CPs would be enough to alter TCP/IP SRB timings, thereby driving up odds of out-of-order packet delivery. (These factors were somewhat lacking in our network lab environment.)


The explanation for figure 11’s higher throughput into z/OS as compared to into AIX is beyond the scope of this paper. (The issue has nothing to do with z/OS; rather, it is related to the performance capability of the 10Gb adapter on the AIX p575 machine.)

Turning next to streaming CPU consumption on the “clean” network:

Streaming CPU consumption, IWQ vs DYNAMIC, 10 Gb Ethernet, clean network (per-MB CPU microseconds consumed): z/OS to AIX - DYNAMIC 1029, IWQ 924; AIX to z/OS - DYNAMIC 1047, IWQ 1047.
Figure 12 Stream CPU consumption - clean network

Similar to the throughput results shown in figure 11, the CPU-consumption results also show improvement (a roughly 10.5% reduction in CPU consumption per MB) for outbound (from z/OS) streaming, and a flat result for inbound (to z/OS) streaming. The flat result for inbound streaming was again due to our inability to sufficiently generate the inbound SRB race condition in the lab.

Note: some readers may have noticed that the figure 12 CPU consumption numbers for z/OS-to-AIX streaming are quite a bit higher than those in figure 9 (which was z/OS-z/OS, and coping with significant packet loss). The difference is due to the much higher TCP ACK frequency generated by the AIX TCP stack as compared to z/OS.5 AIX’s higher ACK frequency means the transmit node must process more inbound packets per outbound MB, which inevitably drives up CPU cost.

5 The increased z/OS ACK frequency introduced with APAR PM20056 becomes enabled only during periods of out-of-order delivery (which may indicate packet loss). While data is arriving in order, z/OS TCP continues to employ a low ACK frequency (to conserve CPU on both ends of the connection when there is no evidence of packet loss).

Part 7: Performance Data for Sysplex Distributor

The performance-test configuration for the Sysplex Distributor study is similar to that used in the mixed (Interactive + Streaming) study:

Test configuration: z10 processors running z/OS V1R12; z/OS-A and z/OS-B are connected over 1 Gb Ethernet through an OSA-Express3 in DYNAMIC or the new IWQ mode, with z/OS-C reachable from z/OS-B.
z/OS-B is the Sysplex Distributor node;
z/OS-C is the target node for distribution;
z/OS-A is the traffic generator (client).

z/OS-B is the Sysplex Distributor node, and we’ll vary the INBPERF setting on that machine’s OSA-Express3 (DYNAMIC and IWQ modes).

We drive this configuration with a mixed workload. Our intent is to drive all three queues (bulk, Sysplex Distributor, and primary) concurrently.

Sysplex Mixed Workload - three mini-workloads

• RR20: 20 TCP request/response connections running between z/OS-A and z/OS-B. This traffic is not targeting a distributed DVIPA, so it will be serviced via the primary queue of z/OS-B’s OSA.

• DIST-INTERACT-20: another 20 TCP request/response connections. This traffic is targeting a distributed DVIPA, so it will be serviced on the Sysplex Distributor queue of z/OS-B’s OSA and then distributed out to the target machine (z/OS-C).

• STRM1: streaming traffic running from z/OS-A into z/OS-B. This traffic will be serviced on the bulk queue of z/OS-B’s OSA.


Performance Results for the Sysplex Workload:

Sysplex Distributor mixed workload, DYNAMIC vs IWQ, 1 Gb Ethernet (RR workloads in transactions/sec; STRM1 in KB/sec): RR20 - DYNAMIC 24217, IWQ 34886; DIST-INTERACT - DYNAMIC 23174, IWQ 27252; STRM1 - DYNAMIC 63112, IWQ 55078.
Figure 13 - Sysplex Distributor results

Figure 13 provides the first glimpse of all three of IWQ’s input queues operating concurrently. By transparently separating the latency-demanding traffic away from the streaming traffic, IWQ provided:

• a +44% throughput boost to the interactive traffic for applications on the local host (RR20 above), and

• a +18% throughput boost for the distributed traffic (DIST-INTERACT-20 above).

This workload mix fully saturates the 1Gb Ethernet (even in DYNAMIC mode). So the throughput improvements above (for RR20 and DIST-INTERACT) are made possible only via some degradation of the streaming throughput (to free up bandwidth for the interactive flows). Again, this streaming degradation is the natural consequence of IWQ’s aggressive scheduling of traffic arriving on the non-bulk queues, which results in the latency-demanding traffic being given a larger slice of the 1Gb bandwidth.
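As a quick cross-check, the percentage gains quoted above can be recomputed directly from the figure 13 values, and doing so also quantifies the streaming give-back that the text describes only qualitatively. The short Python sketch below is illustrative only; it simply re-derives the deltas from the plotted numbers:

# Recompute the figure 13 deltas (IWQ relative to DYNAMIC) from the plotted values.
# RR workloads are in transactions/sec; STRM1 is in KB/sec.
dynamic = {"RR20": 24217, "DIST-INTERACT": 23174, "STRM1": 63112}
iwq     = {"RR20": 34886, "DIST-INTERACT": 27252, "STRM1": 55078}

for workload in dynamic:
    delta = iwq[workload] / dynamic[workload] - 1
    print(f"{workload:14s} {delta:+.1%}")
# Prints roughly: RR20 +44.1%, DIST-INTERACT +17.6%, STRM1 -12.7%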


Appendix A: IWQ-Related Diagnostics

Determine if IWQ is enabled for your QDIO interface

To determine whether Inbound Workload Queueing (IWQ) has been enabled on your OSA-Express3 1Gb or 10Gb Ethernet interface, use the Netstat Devlinks/-d command. This command displays the status and associated configuration values for interfaces defined to the TCP/IP stack. Below is an example of the display for an OSA-Express QDIO interface in the Netstat Devlinks/-d report:

INTFNAME: LGBNS24G  INTFTYPE: IPAQENET  INTFSTATUS: READY
  PORTNAME: GBNS24G  DATAPATH: 2DA4  DATAPATHSTATUS: READY
  CHPIDTYPE: OSD
  SPEED: 0000010000
  IPBROADCASTCAPABILITY: NO
  VMACADDR: 0200113B9213  VMACORIGIN: OSA  VMACROUTER: ALL
  ARPOFFLOAD: YES  ARPOFFLOADINFO: YES
  CFGMTU: NONE  ACTMTU: 8992
  IPADDR: 9.67.170.59/0
  VLANID: NONE  VLANPRIORITY: DISABLED
  READSTORAGE: GLOBAL (8064K)
  INBPERF: DYNAMIC
    WORKLOADQUEUEING: YES
  CHECKSUMOFFLOAD: YES
  SECCLASS: 255  MONSYSPLEX: NO
  ISOLATE: NO  OPTLATENCYMODE: NO
  MULTICAST SPECIFIC:
    MULTICAST CAPABILITY: YES
    GROUP            REFCNT      SRCFLTMD
    -----            ------      --------
    224.0.0.1        0000000001  EXCLUDE
      SRCADDR: NONE
  INTERFACE STATISTICS:
    BYTESIN                           = 316
    INBOUND PACKETS                   = 1
    INBOUND PACKETS IN ERROR          = 0
    INBOUND PACKETS DISCARDED         = 0

When the “WorkloadQueueing” field of the Netstat Devlinks/-d report is set to YES, you have validated that IWQ is enabled for this QDIO interface. You can also obtain this information with a GetIfs request for the TCP/IP callable NMI (EZBNMIFR).
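If you routinely capture the Netstat Devlinks/-d report to a file (for example, from automation), a few lines of scripting are enough to flag which interfaces have IWQ active. The Python sketch below is only an illustration of that idea; it scans saved report text for the INTFNAME and WORKLOADQUEUEING fields shown above and does not use the EZBNMIFR NMI:

import re

def iwq_enabled_interfaces(report_text):
    """Scan saved 'Netstat Devlinks/-d' output and return a dict of
    {interface name: True/False} based on the WORKLOADQUEUEING field."""
    result = {}
    current = None
    for line in report_text.upper().splitlines():
        match = re.search(r"INTFNAME:\s+(\S+)", line)
        if match:
            current = match.group(1)
        if current and "WORKLOADQUEUEING:" in line:
            result[current] = "WORKLOADQUEUEING: YES" in line
    return result

# Example with a fragment of the report shown above:
sample = """
INTFNAME: LGBNS24G  INTFTYPE: IPAQENET  INTFSTATUS: READY
  INBPERF: DYNAMIC
    WORKLOADQUEUEING: YES
"""
print(iwq_enabled_interfaces(sample))   # {'LGBNS24G': True}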

Determine whether routing variables describing the ancillary queues are registered with OSA

The new DISPLAY TCPIP,,OSAINFO command can be used to retrieve information for active IPAQENET and IPAQENET6 interfaces. When IWQ is being utilized, you will observe “Ancillary Input Queue Routing Variables” information. These routing variables identify which inbound packets are to be presented on an ancillary input queue.

D TCPIP,,OSAINFO,INTFN=LGBNS28A
. . .
Registered Addresses:
  IPv4 Unicast Addresses:
    ARP: Yes  Addr: 9.67.170.41
    Total number of IPv4 addresses: 1
  IPv4 Multicast Addresses:
    MAC: 01005E000001  Addr: 224.0.0.1
    Total number of IPv4 addresses: 1
Ancillary Input Queue Routing Variables:
  Queue Type: BULKDATA  Queue ID: 2  Protocol: TCP
    Src: 9.67.170.66..1068
    Dst: 9.67.170.41..2141
    Total number of IPv4 connections: 1
  Queue Type: SYSDIST  Queue ID: 3  Protocol: TCP
    Addr: 9.67.210.28
    Total number of IPv4 addresses: 1
31 of 31 lines displayed

From the retrieval of OSA information using DISPLAY TCPIP,,OSAINFO we can observe the following:

• There is one registered TCP bulk-data connection, and the inbound packets to be presented on the BULKDATA ancillary queue will have the following characteristics:
  • Source IP address of 9.67.170.66
  • Source port number of 1068
  • Destination IP address of 9.67.170.41
  • Destination port number of 2141
  • Protocol of TCP

• There is one sysplex distributor DVIPA registered, with an IP address of 9.67.210.28. Therefore, all TCP traffic destined for this DVIPA address will be presented to the SYSDIST ancillary queue.

Determine if inbound traffic is using Inbound Workload Queueing (IWQ) by using VTAM tuning statistics

VTAM tuning statistics can be utilized to determine whether inbound QDIO traffic is using IWQ. The IST1233I messages provide information for the read and write queues being utilized. When IWQ is being utilized, you will observe ancillary read queue information along with the primary queue information.

IST1233I DEV = 2D02 DIR = READ
IST1234I BSIZE = 4092 MAXBYTES = 274
IST1235I SIO = 5 SLOWDOWN = 0
IST1236I BYTECNTO = 0 BYTECNT = 1144
IST1570I NBYTECTO = 0 NBYTECT = 1144
IST924I -------------------------------------------------------
IST1233I DEV = 2D04 DIR = RD/1 (PRIMARY)
IST1719I PCIREALO = 0 PCIREAL = 460725
IST1720I PCIVIRTO = 0 PCIVIRT = 383419
IST1750I PCITHRSO = 0 PCITHRSH = 11043
IST1751I PCIUNPRO = 0 PCIUNPRD = 49505
IST2316I EARLYINO = 0 EARLYINT = 0
IST2317I ULPRETUO = 0 ULPRETU = 0
IST1752I RPROCDEO = 0 RPROCDEF = 0
IST1753I RREPLDEO = 0 RREPLDEF = 0
IST1754I NOREADSO = 0 NOREADS = 0
IST1721I SBALCNTO = 0 SBALCNT = 687624
IST1722I PACKCNTO = 0 PACKCNT = 1440304
IST2185I FRINVCTO = 0 FRINVCT = 0
IST1236I BYTECNTO = 0 BYTECNT = 265185634
IST1810I PKTIQDO = 0 PKTIQD = 0
IST1811I BYTIQDO = 0 BYTIQD = 0
IST924I -------------------------------------------------------
IST1233I DEV = 2D04 DIR = RD/2 (BULKDATA)
IST1754I NOREADSO = 0 NOREADS = 0
IST1721I SBALCNTO = 0 SBALCNT = 116135
IST1722I PACKCNTO = 0 PACKCNT = 2504216
IST2185I FRINVCTO = 0 FRINVCT = 0
IST1236I BYTECNTO = 0 BYTECNT = 3765010664
IST1810I PKTIQDO = 0 PKTIQD = 0
IST1811I BYTIQDO = 0 BYTIQD = 0
IST924I -------------------------------------------------------
IST1233I DEV = 2D04 DIR = RD/3 (SYSDIST)
IST1754I NOREADSO = 0 NOREADS = 0
IST1721I SBALCNTO = 0 SBALCNT = 520325
IST1722I PACKCNTO = 0 PACKCNT = 1346267
IST2185I FRINVCTO = 0 FRINVCT = 0
IST1236I BYTECNTO = 0 BYTECNT = 247709288
IST1810I PKTIQDO = 0 PKTIQD = 1344804
IST1811I BYTIQDO = 0 BYTIQD = 247442036

From the above display you can observe that inbound data is being placed onto separate inbound ancillary queues for both TCP bulk-data and sysplex distributor traffic. All other inbound data is directed to the primary read queue.
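To see at a glance how inbound traffic is splitting across the three read queues, the per-queue PACKCNT and BYTECNT counters from the IST1722I and IST1236I messages above can be totaled and compared. The Python sketch below is a minimal illustration using the values from this display; in practice you would parse the message text rather than hard-code the numbers:

# Per-queue inbound counters taken from the IST1233I groups shown above
# (PACKCNT = packets, BYTECNT = bytes).
queues = {
    "RD/1 (PRIMARY)":  {"packets": 1_440_304, "bytes": 265_185_634},
    "RD/2 (BULKDATA)": {"packets": 2_504_216, "bytes": 3_765_010_664},
    "RD/3 (SYSDIST)":  {"packets": 1_346_267, "bytes": 247_709_288},
}

total_packets = sum(q["packets"] for q in queues.values())
total_bytes   = sum(q["bytes"] for q in queues.values())

for name, q in queues.items():
    print(f"{name:16s} {q['packets'] / total_packets:6.1%} of packets, "
          f"{q['bytes'] / total_bytes:6.1%} of bytes")
# In this interval the BULKDATA queue carried roughly 88% of all inbound bytes.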

Find all TCP connections that are associated with the bulk-data ancillary queue

You can use the Netstat ALL/-A command to determine whether any TCP connections are registered to the TCP bulk-data ancillary input queue. For TCP connections that are registered to the bulk-data ancillary queue, the “Ancillary Input Queue” field will be set to YES. Additionally, the BulkDataIntfName field will indicate the name of the interface over which the inbound traffic is being received. In the sample below, the TCP connection is registered to the bulk-data ancillary queue and the inbound data is being received over the interface LGBNS28A.

D TCPIP,,NETSTAT,ALL

CLIENT NAME: BWM4  CLIENT ID: 000000B6
  LOCAL SOCKET:   9.67.170.41..2141
  FOREIGN SOCKET: 9.67.170.66..1068
  BYTESIN:     00000000010562347366
  BYTESOUT:    00000000000000000545
  SEGMENTSIN:  00000000000007444951
  SEGMENTSOUT: 00000000000000383344
  . . .
  SENDDATAQUEUED: 0000000000
  ANCILLARY INPUT QUEUE: YES
    BULKDATAINTFNAME: LGBNS28A


Determine the number of segments received for all TCP connections that are associated with the bulk-data ancillary queue

The Netstat STATS/-S command is used to show statistics for the TCP protocol (among other statistics). This command now also shows the total number of segments received for all connections from the bulk-data ancillary input queue (AIQ) of the OSA-Express QDIO inbound workload queueing function. This new TCP statistic is displayed as “SEGMENTS RECEIVED ON OSA BULK QUEUES”.

D TCPIP,TCPIP,NETSTAT,STATS

IP STATISTICS (IPV4)
  PACKETS RECEIVED                  = 3217803
  RECEIVED HEADER ERRORS            = 0
  RECEIVED ADDRESS ERRORS           = 0
  DATAGRAMS FORWARDED               = 1080
  UNKNOWN PROTOCOLS RECEIVED        = 0
  RECEIVED PACKETS DISCARDED        = 0
  RECEIVED PACKETS DELIVERED        = 3216715
  OUTPUT REQUESTS                   = 1224896
  OUTPUT DISCARDS NO ROUTE          = 0
  OUTPUT DISCARDS (OTHER)           = 0
  . . .
TCP STATISTICS
  CURRENT ESTABLISHED CONNECTIONS   = 23
  . . .
  WINDOW UPDATES RECEIVED           = 0
  SEGMENTS RECEIVED ON OSA BULK QUEUES = 2108346
  SEGMENTS SENT                     = 1224896
  WINDOW UPDATES SENT               = 25959
  DELAYED ACKS SENT                 = 2
  RESETS SENT                       = 0
  SEGMENTS RETRANSMITTED            = 0
  RETRANSMIT TIMEOUTS               = 0
  CONNECTIONS DROPPED BY RETRANSMIT = 0
  PATH MTU DISCOVERY RETRANSMITS    = 0
  PATH MTU BEYOND RETRANSMIT LIMIT  = 0
  WINDOW PROBES SENT                = 0
  CONNECTIONS DROPPED DURING PROBE  = 0
  KEEPALIVE PROBES SENT             = 0
  CONNECTIONS DROPPED BY KEEPALIVE  = 0
  CONNECTIONS DROPPED BY FINWAIT2   = 0
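Because the Netstat STATS/-S counters are cumulative, one practical pattern (an assumption about data collection, not something this paper prescribes) is to capture the report at two points in time and difference the new counter. A minimal Python sketch of that pattern, assuming the report text has been saved as shown above, with the second snapshot value being purely hypothetical:

import re

def bulk_queue_segments(stats_text):
    """Extract the 'SEGMENTS RECEIVED ON OSA BULK QUEUES' counter
    from saved Netstat STATS/-S output (0 if not present)."""
    match = re.search(r"SEGMENTS RECEIVED ON OSA BULK QUEUES\s*=\s*(\d+)",
                      stats_text.upper())
    return int(match.group(1)) if match else 0

before = "SEGMENTS RECEIVED ON OSA BULK QUEUES = 2108346"
after  = "SEGMENTS RECEIVED ON OSA BULK QUEUES = 2958346"   # hypothetical later sample
print("Bulk-queue segments in interval:",
      bulk_queue_segments(after) - bulk_queue_segments(before))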


Appendix B: CPU Consumption Analysis for Mixed Workloads (System z10)

With a mixed workload, there is no way to directly measure the amount of CPU being consumed by the R/R workload and the amount being consumed by the streaming workload. And we need to know this breakout, because IWQ favors the more CPU-intensive R/R workload (R/R is more cycle-intensive because it involves a full trip through the socket layer on every transaction; by contrast, streaming workloads have socket activity occurring only once per some large number of packets in or out).

We can derive the R/R vs. stream breakout if we start with the assumption that streaming CPU consumption per packet should be roughly equivalent in a pure streaming workload and a mixed workload. (The presence of the R/R traffic will not have any meaningful impact on the CPU consumption per megabyte of a streaming workload, so it should be valid to measure pure stream and assume the CPU consumption for stream will not change much when we add an interactive workload alongside it.)

To provide an extremely challenging comparison, we’ll use the CPU consumption of a streaming workload in BALANCED mode. (The resulting CPU consumption benchmark will then be versus BALANCED mode, since it is very CPU-efficient and is the default in the field.)

Result for BALANCED-mode streaming, single session:

79223 packets/sec @ 3 CPs x 5.869% busy (each) = 2.22 CPU microseconds per stream packet.

Next, again to provide a challenging benchmark for IWQ mode, we’ll derive the CPU cost per transaction for the R/R portion of a mixed workload in BALANCED mode:

Result for the mixed-workload test (BALANCED mode, 10Gb Ethernet, Stream1 + RR30): stream: 245816 packets per second; RR: 65128 transactions per second; 3 CPs averaging 68.08% busy. Use the 2.22 microseconds per stream packet (from the pure-stream measurement) to determine the per-transaction R/R CPU consumption:

245816 stream pkts/sec * .00000222 CPU sec/pkt = .545 (of one engine) = .1819 of each of 3 engines (each engine is 18.19% busy with the stream workload).

We actually measured the machine (each of 3 CPs) at 68.08% busy in the BALANCED-mode mixed workload, so to determine the CPU cost per R/R transaction:

(3 * (.6808 - .1819)) / 65128 R/R trans/sec = 22.98 CPU microseconds per R/R transaction.

So we now have reasonable estimates of the amount of CPU consumed by a stream packet and by an R/R transaction when they are running together in BALANCED mode. Now we apply these CPU consumption results to the throughputs seen in the IWQ mixed-workload run; here we are calculating how busy we would expect the IWQ configuration to be if a stream packet and an R/R transaction burned exactly the same amount of CPU we saw in the BALANCED configuration.

IWQ mixed-workload result (Strm1 + RR30 in IWQ mode on 10Gb Ethernet): 246991 stream packets per second and 78857 R/R transactions per second. Now, using the CPU consumption numbers just derived:

(246991 * .00000222) + (78857 * .00002298) = 2.36 (2.36 engines consumed) = 3 engines @ 78.67% busy. This is how busy we would expect the IWQ configuration to be if CPU consumption (per stream packet or R/R transaction) were identical to the BALANCED-mode configuration.

We were actually 80.53% busy in the IWQ-mode mixed workload. So 80.53 / 78.67 = 1.024, meaning IWQ mode is burning 2.4% more CPU than BALANCED mode would, if BALANCED mode could achieve this kind of transaction rate (which it cannot).

So IWQ in this test delivered a 21% improvement in interactive throughput, at an increased processing expense (per transaction) of just 2.4%.
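The arithmetic above is easy to re-run programmatically. The Python sketch below simply reproduces the Appendix B derivation from the measured values quoted in this appendix; it is a back-of-the-envelope model, not a measurement tool:

# Reconstruction of the Appendix B derivation, using only the measured
# values quoted in this appendix.
N_CPS = 3  # general-purpose CPs in the test image

# Step 1: pure streaming, BALANCED mode -> CPU microseconds per stream packet.
stream_pps_pure = 79223
busy_pure = 0.05869                      # each CP 5.869% busy
usec_per_stream_pkt = N_CPS * busy_pure * 1_000_000 / stream_pps_pure   # ~2.22

# Step 2: mixed workload, BALANCED mode -> CPU microseconds per R/R transaction.
stream_pps_mixed, rr_tps_mixed, busy_mixed = 245816, 65128, 0.6808
stream_engines = stream_pps_mixed * usec_per_stream_pkt / 1_000_000     # ~0.545
usec_per_rr_tran = (N_CPS * busy_mixed - stream_engines) * 1_000_000 / rr_tps_mixed  # ~22.98

# Step 3: apply those unit costs to the IWQ-mode throughputs.
stream_pps_iwq, rr_tps_iwq = 246991, 78857
predicted_busy = (stream_pps_iwq * usec_per_stream_pkt +
                  rr_tps_iwq * usec_per_rr_tran) / 1_000_000 / N_CPS    # ~0.787

# Step 4: compare with the measured IWQ utilization.
measured_busy_iwq = 0.8053
print(f"Predicted busy per CP: {predicted_busy:.2%}")                       # ~78.7%
print(f"IWQ CPU premium: {measured_busy_iwq / predicted_busy - 1:.1%}")     # ~2.4%
print(f"Interactive throughput gain: {rr_tps_iwq / rr_tps_mixed - 1:.0%}")  # ~21%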


End of Document

© 2010 IBM Corporation