Oracle SOA Suite 11g Troubleshooting Methodology

Compiled by: Amit Deo, Oracle FMW SME Consultant

Note: The middleware universe is full of workarounds :)

Agenda

1. Introduction
2. The Problem
3. The Basics of Troubleshooting: Where Do You Start?
4. Infrastructure Issues
5. Performance Issues
6. Deployment Issues
7. Summary

INTRODUCTION

THE PROBLEM

How Every Large Company Troubleshoots

T-Mobile's support team had an exceedingly difficult time pinpointing the specific cause of the problem.

Not only did the team have to involve representatives from each IT functional area, but they also had no way to troubleshoot from the source, and no single team had visibility of the complete picture.

In general, resolving a problem took T-Mobile's melded support team multiple days.

Problem With Troubleshooting Integrations

In the past, application and network admins were blamed for everything.

In the FMW universe, the integration folks are the new target.

Numerous touch points

Numerous SOA technologies

The focus of this document is Oracle SOA Suite 11g

[Diagram: numbered touch points (1 to 4) across Web Application, OEG, OSB, SOA Suite, OSB, and ODI/OAM/OIM]

Real World Scenario: Bizarre Behaviour

Always getting OutOfMemoryError: PermGen space after new installs/deployments

We created a WLST wrapper script that loops through all managed servers and performs garbage collection on each

OSB relentlessly fails over HTTPS or for other connectivity reasons

Weird… but at least consistent
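The wrapper script itself is not shown in the slides. As a rough sketch only, the looping logic might look like the following. To my understanding, inside a connected WLST session (WLST is Jython-based) the server runtimes can be obtained from `domainRuntimeService.getServerRuntimes()`, and each server's JVM runtime MBean exposes `runGC()`. The `FakeJvm` and `FakeServer` classes are hypothetical stand-ins so the loop can run in plain Python.

```python
def gc_all_servers(server_runtimes):
    """Request a garbage collection on every server runtime; return the names visited."""
    visited = []
    for server in server_runtimes:
        jvm = server.getJVMRuntime()  # a JVMRuntimeMBean in a real WLST session
        jvm.runGC()                   # asks that server's JVM to collect garbage
        visited.append(server.getName())
    return visited


class FakeJvm:
    """Hypothetical stand-in for a JVM runtime MBean (illustration only)."""

    def __init__(self):
        self.gc_calls = 0

    def runGC(self):
        self.gc_calls += 1


class FakeServer:
    """Hypothetical stand-in for a server runtime MBean (illustration only)."""

    def __init__(self, name):
        self._name = name
        self._jvm = FakeJvm()

    def getName(self):
        return self._name

    def getJVMRuntime(self):
        return self._jvm


if __name__ == "__main__":
    servers = [FakeServer("soa_server1"), FakeServer("bam_server1")]
    print(gc_all_servers(servers))
```

In WLST the same function would be called with the real runtime list instead of the stand-ins; the server names above are only examples taken from log file names elsewhere in this deck.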

Real World Scenario: Convoluted & Unclear

The infamous and ever-misleading “Unable to access the following endpoints” error

Could be:

Caused by: java.net.SocketTimeoutException: Read timed out

Message send failed: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
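Because the same generic message hides different root causes, it can help to scan the surrounding log text for the underlying exception. The sketch below is illustrative: the two signatures come from the slide, while the hint wording and the function name `find_causes` are my own assumptions.

```python
# Signatures taken from the slide; the hint text is an illustrative assumption.
KNOWN_CAUSES = [
    ("java.net.SocketTimeoutException",
     "timeout: the target endpoint did not answer in time"),
    ("PKIX path building failed",
     "SSL trust: the target's certificate chain is not trusted by this JVM"),
]


def find_causes(log_text):
    """Return a hint for every known signature that appears in log_text."""
    return [hint for signature, hint in KNOWN_CAUSES if signature in log_text]


if __name__ == "__main__":
    sample = "Caused by: java.net.SocketTimeoutException: Read timed out"
    for hint in find_causes(sample):
        print(hint)
```

The list is deliberately small; other causes would need their own signatures added.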

THE BASIC PRINCIPLES OF TROUBLESHOOTING: WHERE DO YOU START?

What is Troubleshooting?

Part skill

Some people have a natural tendency to pinpoint problem areas

Can be learned; usually involves a methodical approach and logic

Part knowledge

Without understanding the product, it doesn’t matter how smart you are :)

Most frustrating when it’s related to an area we don’t know

Start Somewhere: Narrow Down Problem Area

Issues fall into three areas:

Performance (server-wide or service-specific)

Runtime (composite or infrastructure)

Deployment

INFRASTRUCTURE ISSUES

Troubleshooting the Infrastructure

Could be a server issue

Could be a coding issue

Could be a business fault that should be handled by the code; contact the dev teams

Must be able to differentiate between infrastructure errors and composite instance errors

Troubleshooting the Infrastructure

1. Use logs
2. Use thread dumps

1. Using Logs

The soa_server1.out log file contains most runtime issues. For all other issues, refer to the servername.log file.

Must differentiate between infrastructure errors and composite instance errors

Example: Infrastructure Error

Random crashes immediately after go-live

Only happened in Production

No warning signs

Error does not appear on the EM console

<Aug 5, 2013 12:00:02 AM EDT> <Error> <oracle.soa.bpel.engine.dispatch> <BEA-000000>
<failed to handle message
javax.ejb.EJBException: EJB Exception:
java.lang.StackOverflowError...
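When sifting through many such entries, it can be useful to split the angle-bracket header into its parts so that entries can be filtered by severity or logger. This is a minimal sketch based only on the header layout visible in the example above; the dictionary key names are my own labels, not official field names.

```python
import re

# Matches one <...> field that contains no nested angle brackets.
FIELD_RE = re.compile(r"<([^<>]*)>")


def parse_header(line):
    """Split the first four <...> fields of a log line into a dict, or return None."""
    fields = FIELD_RE.findall(line)
    if len(fields) < 4:
        return None
    return {
        "timestamp": fields[0],
        "severity": fields[1],
        "logger": fields[2],
        "message_id": fields[3],
    }


SAMPLE = (
    "<Aug 5, 2013 12:00:02 AM EDT> <Error> <oracle.soa.bpel.engine.dispatch> "
    "<BEA-000000> <failed to handle message"
)

if __name__ == "__main__":
    print(parse_header(SAMPLE))
```

The trailing message field is left out on purpose, because in these logs it usually continues over several lines and is not closed on the header line.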

Example: Business Fault

Often easy to distinguish

Should be handled by the code

Shows as a faulted instance on the EM console

<Aug 6, 2013 10:10:33 AM EDT> <Error> <oracle.soa.mediator.serviceEngine> <BEA-000000>
<Got an exception:
oracle.fabric.common.FabricInvocationException:
javax.xml.ws.soap.SOAPFaultException:
Message: Organization 129024 not found. Stack trace: at
Core.WebServices.Message.MessageWebService.SaveNotification(Organization organization, Notification notification) in
c:\Data\1.0\Core\Message\MessageWebService.svc.cs:line 100,
detail=javax.xml.ws.soap.SOAPFaultException:

Example: System Fault (but not your fault!)

Thrown by external system

Shows as a faulted instance on the EM console

No action needed; follow up with target system

<Aug 6, 2013 10:10:33 AM EDT> <Error> <oracle.soa.mediator.serviceEngine> <BEA-000000>
<Got an exception:
oracle.fabric.common.FabricInvocationException:
javax.xml.ws.soap.SOAPFaultException:
CreateCustomer failed with Message: Cannot insert the value
NULL into column 'CustomerID', table '@Customers'; column
does not allow nulls. INSERT fails.

Example: System Fault

The infamous and ever-misleading “Unable to access the following endpoints” error

In this case, due to:

Message send failed: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Example: Coding or Infrastructure Problem?

Just an infrastructure warning

Threads would eventually clear themselves up

Does not show on the EM console

Due to failed transaction that continues to retry

<Sep 30, 2013 11:30:04 PM EDT> <Warning> <oracle.integration.platform.instance.store.async> <BEA-000000>
<Unable to allocate additional threads,
as all the threads [10] are in use.
Threads distribution :
Fabric Instance Activity = 1,Fabric-Instance-Manager = 9,>

Modifying Logger Levels

A lot more information is logged in the soa_server1-diagnostic.log file

[2012-01-01T22:35:56.144-05:00] [soa_server1] [TRACE] [] [oracle.soa.adapter]
[ecid: cb680017c6a0acfe:-3f1527ec:13487d1ea4c:-8000-0000000000000fe1,0:2]
JmsProducer_execute:[default destination = jndi/CustomerJMSQueue]:
Successfully produced message.

[2012-01-01T22:35:56.256-05:00] [soa_server1] [NOTIFICATION] [] [oracle.soa.adapter]
[ecid: cb680017c6a0acfe:-5675273b:1348cccad75:-8000-0000000000055743,0]
JMSAdapter JMSConsumer JMSMessageConsumer_consume: Got message with ID
ID:<458362.1325475356144.0> from destination jndi/CustomerJMSQueue

[2012-01-01T22:35:56.261-05:00] [soa_server1] [TRACE] [] [oracle.soa.adapter]
[ecid: cb680017c6a0acfe:-5675273b:1348cccad75:-8000-0000000000055743,0]
JMS Adapter JMSProducer:CustomerJMS [ CustomerProduce_ptt::CustomerProduce(body)
] XMLHelper_convertJmsMessageHeadersAndPropertiesToXML:
<JMSInboundHeadersAndProperties xmlns="http://xmlns.oracle.com/pcbpel/adapter/jms/">[[
<JMSInboundHeaders>
<JMSMessageID>ID:&lt;458362.1325475356144.0></JMSMessageID>
<JMSTimestamp>1325475356144</JMSTimestamp>
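Diagnostic log entries carry an ECID, and entries that share one can be read together, as the second and third entries above do. The sketch below groups entries by ECID. It assumes the bracketed header layout shown in the excerpt (timestamp, server, level, message id, logger) and treats the text before the first comma of the `ecid:` field as the identifier; the function names are hypothetical.

```python
import re
from collections import defaultdict

# First five bracketed fields, in the order seen in the excerpt above.
ENTRY_RE = re.compile(
    r"\[(?P<time>[^\]]*)\] \[(?P<server>[^\]]*)\] \[(?P<level>[^\]]*)\] "
    r"\[(?P<msg_id>[^\]]*)\] \[(?P<logger>[^\]]*)\]"
)
# Assumption: the ECID proper ends at the first comma inside the field.
ECID_RE = re.compile(r"\[ecid: ([^,\]]+)")


def parse_entry(text):
    """Return the header fields plus the ECID of one log entry, or None."""
    head = ENTRY_RE.match(text.strip())
    if head is None:
        return None
    entry = head.groupdict()
    ecid = ECID_RE.search(text)
    entry["ecid"] = ecid.group(1) if ecid else None
    return entry


def group_levels_by_ecid(entries):
    """Map each ECID to the list of log levels of its entries, in input order."""
    groups = defaultdict(list)
    for text in entries:
        parsed = parse_entry(text)
        if parsed and parsed["ecid"]:
            groups[parsed["ecid"]].append(parsed["level"])
    return dict(groups)


SAMPLE_ENTRIES = [
    "[2012-01-01T22:35:56.144-05:00] [soa_server1] [TRACE] [] [oracle.soa.adapter]\n"
    "[ecid: cb680017c6a0acfe:-3f1527ec:13487d1ea4c:-8000-0000000000000fe1,0:2]",
    "[2012-01-01T22:35:56.256-05:00] [soa_server1] [NOTIFICATION] [] [oracle.soa.adapter]\n"
    "[ecid: cb680017c6a0acfe:-5675273b:1348cccad75:-8000-0000000000055743,0]",
    "[2012-01-01T22:35:56.261-05:00] [soa_server1] [TRACE] [] [oracle.soa.adapter]\n"
    "[ecid: cb680017c6a0acfe:-5675273b:1348cccad75:-8000-0000000000055743,0]",
]

if __name__ == "__main__":
    for ecid, levels in group_levels_by_ecid(SAMPLE_ENTRIES).items():
        print(ecid, levels)
```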

2. Using Thread Dumps

When a managed server goes into warning state, what are you supposed to do?

Understanding Stuck Threads

Navigate to Servers > (managed server) > Monitoring > Threads

Understanding Stuck Threads

AdminServer.log

####<Dec 23, 2011 6:03:49 PM EST> <Error> <WebLogicServer>
<soahost1> <AdminServer> <BEA-000337> <[STUCK] ExecuteThread: '0'
for queue: 'weblogic.kernel.Default (self-tuning)' has been busy
for "658" seconds

bam_server1.log

####<Dec 23, 2011 5:53:36 PM EST> <Error> <JMX> <soahost1> <bam_server1>
<[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'>
<<WLS Kernel>> <> <> <1324680816405> <BEA-149500>
<An exception occurred while registering the MBean
com.bea:Name=AdminServer,Type=WebServiceRequestBufferingQueue,
WebServiceBuffering=AdminServer,Server=AdminServer,
WebService=AdminServer. java.lang.OutOfMemoryError: PermGen space
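Stuck-thread messages like the AdminServer.log entry above follow a fixed wording, so they can be pulled out of a log mechanically. A minimal sketch, assuming the message text always matches the form shown in that entry; the function name and result keys are hypothetical.

```python
import re

STUCK_RE = re.compile(
    r"\[STUCK\] ExecuteThread: '(\d+)' for queue: '([^']*)' "
    r"has been busy for \"(\d+)\" seconds"
)


def find_stuck_threads(log_text):
    """Return one dict per stuck-thread message found in log_text."""
    # Collapse line breaks first, since long messages may be wrapped.
    flat = " ".join(log_text.split())
    return [
        {"thread": int(thread), "queue": queue, "busy_seconds": int(seconds)}
        for thread, queue, seconds in STUCK_RE.findall(flat)
    ]


SAMPLE = """####<Dec 23, 2011 6:03:49 PM EST> <Error> <WebLogicServer>
<soahost1> <AdminServer> <BEA-000337> <[STUCK] ExecuteThread: '0'
for queue: 'weblogic.kernel.Default (self-tuning)' has been busy
for "658" seconds"""

if __name__ == "__main__":
    print(find_stuck_threads(SAMPLE))
```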

Understanding Stuck Threads

1. We found AdminServer to be in the “Warning” state, due to a stuck thread.
2. We confirmed that there was indeed a stuck “ExecuteThread”, as shown in both the Oracle WebLogic Administration Console and the AdminServer.log file.
3. By reviewing the soa_server1.log and bam_server1.log files, we found startup errors in the BAM server log.
4. The BAM server was unable to register an AdminServer MBean due to the java.lang.OutOfMemoryError exception that was thrown.

PERFORMANCE ISSUES

Server Wide Performance Issues

Is logging in to Oracle Enterprise Manager Fusion Middleware Control extremely slow?

Are all composite instances taking unusually long to complete?

Are the logs or your dehydration database growing unusually quickly?

Are you seeing an exceptionally high number of errors in the logs?

Check available disk space

Often an overlooked area

root@soahost1:/root> df -m
Filesystem  1M-blocks    Used  Available  Use%  Mounted on
/dev/sda8         996     451        494   48%  /
/dev/sda9      815881  697454      76314   91%  /u01
/dev/sda7         996      36        909    4%  /home
/dev/sda5        1984     138       1744    8%  /tmp
/dev/sda3        1984     283       1598   16%  /var
/dev/sda2        5950    3842       1802   69%  /usr
/dev/sda1          99      12         83   13%  /boot
tmpfs            8023       0       8023    0%  /dev/shm
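Since this check is easy to forget, it lends itself to automation. The following sketch parses `df -m` style output and reports the mount points at or above a usage threshold. The threshold value and the function name are assumptions chosen for illustration, and mount points containing spaces are not handled.

```python
def filesystems_over(df_output, threshold_pct):
    """Return (mount_point, use_pct) for every filesystem at or above threshold_pct."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) < 6:
            continue  # not a complete data row
        use_pct = int(parts[4].rstrip("%"))
        if use_pct >= threshold_pct:
            flagged.append((parts[5], use_pct))
    return flagged


# Output copied from the slide above.
SAMPLE = """Filesystem 1M-blocks Used Available Use% Mounted on
/dev/sda8 996 451 494 48% /
/dev/sda9 815881 697454 76314 91% /u01
/dev/sda7 996 36 909 4% /home
/dev/sda5 1984 138 1744 8% /tmp
/dev/sda3 1984 283 1598 16% /var
/dev/sda2 5950 3842 1802 69% /usr
/dev/sda1 99 12 83 13% /boot
tmpfs 8023 0 8023 0% /dev/shm"""

if __name__ == "__main__":
    for mount, pct in filesystems_over(SAMPLE, 90):
        print(mount, pct)
```

With a threshold of 90, only /u01 from the sample is reported.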

Check CPU, RAM, and I/O

The vmstat or top command easily outputs CPU, memory, and I/O statistics

Do not rely on Linux’s reporting of available memory; it is best to look at swap space usage

Why does Linux report close to 100% memory usage all the time? It uses otherwise idle RAM for buffers and cache, as the large cache value below shows

root@soahost1:/root> vmstat -S m
procs -------memory--------- --swap-- ---io-- --system-- ----cpu-----
 r  b swpd free buff cache si so bi bo in cs us sy id wa st
 0  0    0   59  402 15055  0  0  2 16  0  0  2  2 96  1  0
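To make the swap-versus-free-memory point concrete, the sketch below turns the last two lines of vmstat output into named values and checks the swap-in/swap-out columns. It assumes the single-sample layout shown above; the helper names are hypothetical, and treating free plus buffers plus cache as roughly reclaimable memory is a simplification.

```python
def parse_vmstat(output):
    """Map column names to integer values using the last two non-empty lines."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    names = lines[-2].split()
    values = [int(value) for value in lines[-1].split()]
    return dict(zip(names, values))


def is_swapping(stats):
    """True if pages are currently being swapped in or out."""
    return stats["si"] > 0 or stats["so"] > 0


def free_plus_cache(stats):
    """Free memory plus buffers and cache, in the units vmstat was asked for."""
    return stats["free"] + stats["buff"] + stats["cache"]


# Output copied from the slide above.
SAMPLE = """procs -------memory--------- --swap-- ---io-- --system-- ----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 59 402 15055 0 0 2 16 0 0 2 2 96 1 0"""

if __name__ == "__main__":
    stats = parse_vmstat(SAMPLE)
    print(is_swapping(stats), free_plus_cache(stats))
```

In the sample, only 59 units are reported as free, yet nothing is being swapped and most memory sits in cache.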

Check OS Resources

System log files can reveal resource issues:

root@soahost1:/root> cat /var/log/messages
Aug 31 20:53:22 uslx286 sshd[22480]: fatal: setresuid 10000: Resource temporarily unavailable

Too many running processes can exhaust system resources:

root@soahost1:/root> ps -A | wc -l
297

Too many open files can exhaust system resources:

root@soahost1:/root> lsof | wc -l
6064

JVM Performance Tuning

For performance, consider the following:

Switching from Sun JDK to JRockit JDK

Optimizing JVM settings

Additional JVM performance tuning documentation from Oracle can be found at:

http://docs.oracle.com/cd/E23943_01/web.1111/e13814.pdf

http://docs.oracle.com/cd/E15289_01/doc.40/e15060.pdf

JVM Logging

Add this to the PORT_MEM_ARGS argument in the setSOADomainEnv.sh (.cmd) script:

-XX:+HeapDumpOnOutOfMemoryError

Although this is not a performance setting, I recommend setting it so that the heap is dumped to an hprof file when java.lang.OutOfMemoryError exceptions are thrown

This is useful for later analysis and troubleshooting

Check JVM

Ensure that the heap allocated to the JVM is appropriately sized (that is, compare heap versus non-heap usage)

Ensure that there is no excessive garbage collection

Monitor JVM thread performance

Check Data Sources

Data source errors are usually easy to identify: when a data source is exhausted, errors show up everywhere

Check Database Performance

Involve a DBA who is familiar with the platform.

Viewing Performance Summary Graphs

Navigate to Monitoring > Performance Summary

Can choose metrics to display for any composite

Viewing Request Processing Metrics

Right-click on Monitoring > Request Processing

Utilizing SQL queries is so much better

Composite Instance Performance

Remember the SQL output from the last page?

Let’s also get the invoke durations

SELECT
  composite_instance_id,
  composite_creation_date,
  component_name,
  action,
  component_state,
  -- duration in seconds, rebuilt from the hour, minute, and second parts
  -- of the text form of (updated_time - created_time)
  TO_CHAR((TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),12,2))*60*60) +
    (TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),15,2))*60) +
    TO_NUMBER(SUBSTR(TO_CHAR(updated_time-created_time),18,4)),'999990.000') duration
FROM
  mediator_instance
WHERE
  component_name = 'Order.Create'
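The duration expression is easier to follow when the same arithmetic is replayed outside the database. The sketch below mirrors the three SUBSTR positions with Python slices. It assumes the interval text has the shape `+DDDDDDDDD HH:MI:SS.FF`, which is what the positions 12, 15, and 18 in the query imply; the sample value is made up for illustration.

```python
def interval_to_seconds(interval_text):
    """Replay the query's arithmetic on the text form of an interval."""
    hours = int(interval_text[11:13])      # SUBSTR(..., 12, 2)
    minutes = int(interval_text[14:16])    # SUBSTR(..., 15, 2)
    seconds = float(interval_text[17:21])  # SUBSTR(..., 18, 4), e.g. "05.2"
    return hours * 60 * 60 + minutes * 60 + seconds


if __name__ == "__main__":
    # Hypothetical interval: 1 minute 5.25 seconds.
    print(interval_to_seconds("+000000000 00:01:05.250000"))
```

Two consequences of the expression follow from this reading: the four-character seconds slice keeps only one fractional digit, and the day part of the interval is ignored, so an instance running longer than 24 hours would be under-reported.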

DEPLOYMENT ISSUES

Understanding the Ant Deployment Process

{We are not using Ant, but having this info won't hurt}

Involves:

1. Compilation

ant -f ant-sca-package.xml package -DcompositeDir=$CODE/HelloWorld -DcompositeName=HelloWorld -Drevision=1.0

2. Deployment

ant -f ant-sca-deploy.xml deploy -DserverURL=$SOAURL/soa-infra/deployer -Duser=$USERNAME -Dpassword=$PASSWORD -DsarLocation=$CODE/HelloWorld/deploy/sca_HelloWorld_rev1.0.jar -Dpartition=default -Doverwrite=true -DforceDefault=true
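If the two Ant calls are scripted, building the argument lists in one place keeps the composite name, revision, and SAR file name consistent. This is a sketch only: it uses just the targets and properties shown in the commands above, the helper names are hypothetical, and the SAR naming pattern is inferred from the single example `sca_HelloWorld_rev1.0.jar`.

```python
def sar_name(composite, revision):
    """SAR file name, following the pattern seen in the example above."""
    return "sca_%s_rev%s.jar" % (composite, revision)


def package_command(composite_dir, composite, revision):
    """Argument list for the packaging (compilation) step."""
    return [
        "ant", "-f", "ant-sca-package.xml", "package",
        "-DcompositeDir=" + composite_dir,
        "-DcompositeName=" + composite,
        "-Drevision=" + revision,
    ]


def deploy_command(soa_url, user, password, sar_location, partition="default"):
    """Argument list for the deployment step."""
    return [
        "ant", "-f", "ant-sca-deploy.xml", "deploy",
        "-DserverURL=" + soa_url + "/soa-infra/deployer",
        "-Duser=" + user,
        "-Dpassword=" + password,
        "-DsarLocation=" + sar_location,
        "-Dpartition=" + partition,
        "-Doverwrite=true",
        "-DforceDefault=true",
    ]


if __name__ == "__main__":
    print(" ".join(package_command("/code/HelloWorld", "HelloWorld", "1.0")))
    print(sar_name("HelloWorld", "1.0"))
```

The lists could be passed to `subprocess.run` from the directory that holds the Ant scripts; the paths and URL used in the demo are placeholders.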

Understanding the Ant Compilation Process

Compilation is done via the package target in ant-sca-package.xml

The package target calls other targets to perform:

1. Cleanup
2. Validation
3. Compilation

Compilation: The clean Target

Removes any existing SAR files

clean:
[echo] deleting /u01/svn/HelloWorld/deploy/sca_HelloWorld_rev1.0.jar

Compilation: The scac-validate Target

Sets environment variables and validates all resources within the code

scac-validate:
[echo] Running scac-validate in /u01/svn/HelloWorld/composite.xml
[echo] oracle.home = /u01/app/oracle/middleware/Oracle_SOA1/bin/..
[input] skipping input as property compositeDir has already been set.
[input] skipping input as property compositeName has already been set.
[input] skipping input as property revision has already been set.

Compilation: The scac Target

Compiles the code

scac:
[scac] Validating composite "/u01/svn/HelloWorld/composite.xml"
[scac] error: location
.
Load of wsdl "HelloWorldWebService.wsdl with Message part element undefined in wsdl [file:/u01/svn/HelloWorld/
.
[echo]
[echo] ERROR IN TRYCATCH BLOCK:
[echo] /u01/scripts/build.soa.xml:112: The following error occurred while executing this line:
.
[echo] /u01/app/oracle/middleware/Oracle_SOA1/bin/ant-sca-compile.xml:269: Java returned: 1 Check log file : /tmp/out.err for errors

Types of Errors

Understand that ant runs on the client machine, not the SOA server

[echo] /u01/app/oracle/middleware/Oracle_SOA1/bin/ant-sca-deploy.xml:188: java.lang.OutOfMemoryError: PermGen space

For compilation errors, check out.err and understand adf-config.xml

oracle.fabric.common.wsdl.SchemaBuilder.loadEmbeddedSchemas(SchemaBuilder.java:492) Caused by: java.io.IOException: oracle.mds.exception.MDSException: MDS-00054: The file to be loaded oramds:/apps/Common/HelloWorld.xsd does not exist.

Deployment errors are usually straightforward

[deployComposite] INFO: Creating HTTP connection to host:soahost1, port:8001
[deployComposite] java.net.UnknownHostException: soahost1

Location of out.err

Located in Unix/Linux:

/tmp/out.err

Located in Microsoft Windows:

C:\Users\[user]\AppData\Local\Temp\out.err
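Both locations above are the platform's default temporary directory, so a small helper can find and show the end of the file on either operating system. This sketch assumes the Ant JVM's temporary directory matches what Python's `tempfile.gettempdir()` reports, which may not hold if the temp location has been overridden; the function names are hypothetical.

```python
import os
import tempfile


def out_err_path():
    """Likely location of out.err, assuming the default temp directory."""
    return os.path.join(tempfile.gettempdir(), "out.err")


def tail(path, max_lines=20):
    """Return the last max_lines lines of a text file, or [] if it is missing."""
    if not os.path.exists(path):
        return []
    with open(path, errors="replace") as handle:
        return handle.read().splitlines()[-max_lines:]


if __name__ == "__main__":
    path = out_err_path()
    print(path)
    for line in tail(path):
        print(line)
```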

OTHER STUFF

The DMS Spy Servlet

The DMS Spy Servlet displays instant Dynamic Monitoring Service (DMS) related metrics

Navigate to http://<host>:<soaport>/dms/Spy

http://docs.oracle.com/cd/E15586_01/core.1111/e10108/monitor.htm#CFAHIAIB

Check Event Delivery Network (EDN)

The EDN Database Debug Log can be accessed at:

http://<host>:<soaport>/soa-infra/events/edn-db-log

Changing the oracle.integration.platform.blocks.event.saq logger to TRACE:32 makes the body of the event message available in the EDN trace

SUMMARY

Troubleshooting is part politics, part product knowledge

Oracle SOA Suite 11g errors can mostly be classified into:

Runtime (or infrastructure) errors

Performance issues/errors

Deployment errors

For infrastructure errors:

Identify whether it is a composite or an infrastructure error

Consider increasing logger levels

Identifying the root cause of stuck threads may require some drill-down investigation

For performance issues:

Identify whether it is a server-wide performance issue or specific to a single composite

Check overall system health, even the obvious areas

Obtaining composite instance performance metrics is easily done through SQL. In case of OSB/Paris, run SoapUI unit tests.

For deployment errors:

Understand the ant compilation (i.e., packaging) and deployment processes

Understand adf-config.xml

Book

Oracle SOA Suite 11g Administrator’s Handbook

http://www.packtpub.com/oracle-soa-suite-11g-administrators-handbook/book

Chapter 6: Troubleshooting the Oracle SOA Suite 11g Infrastructure

Highly recommended

Contact Information

Amit Deo, Senior Consultant