TECHNICAL WHITE PAPER

BMC ProactiveNet Performance Management Deep-Dive Application Diagnostics

TABLE OF CONTENTS

INTRODUCTION

PROACTIVE APPLICATION PERFORMANCE MANAGEMENT

PROBLEM ISOLATION CHALLENGES

AUTOMATED ROOT CAUSE ANALYSIS WITH DEEP-DIVE DIAGNOSTICS

    Deep-Dive Application Diagnostics Components

    Painless Deployment

    Analysis: Starting with a Bird’s Eye View

    Taking the Quick Dive

PART OF A COMPREHENSIVE BUSINESS SERVICE MANAGEMENT APPROACH

SUMMARY

INTRODUCTION

Business applications are the lifeblood of your organization. When they fail, your company stands to lose revenue and reputation. As a result, you likely find yourself under immense pressure to instantly fix problems and restore service, all while dealing with demanding management, employees, customers, and partners.

To effectively handle the potentially severe impact of application malfunctions, you must carefully and consistently manage your applications so that, if and when they fail, you have the tools to:

» Detect a problem “before the phone rings”

» Assign the right priority to a problem, based on how it might impact the business

» Isolate the root cause of a problem to determine which team needs to be engaged to diagnose and resolve the problem

» Resolve the problem before users and services are impacted

Industry analysts estimate that 40 percent of downtime is typically caused by application failures. With virtualized and cloud applications, they predict that this share will increase to 60 percent and 80 percent, respectively. In today’s dynamic IT environment, where each application is composed of hundreds of moving parts deployed in virtual and cloud environments, and where dynamic business requirements dictate constant change, early detection and root cause isolation are key challenges. Wrestling with the inherent complexity of today’s distributed Web applications, most organizations find themselves in an “all hands on deck” exercise when problems occur, disrupting process-based operations and wasting the time of expert IT staff.

Knowing that a problem exists is only the first step in effective application performance management. More important is knowing about a problem before it affects users and services; and most important is knowing the cause of a problem so that you can resolve it as quickly and efficiently as possible.

The BMC Proactive Application Performance Management solution empowers IT staff to proactively detect, and then quickly triage, prioritize, and isolate, the root cause of application problems. It also automates the resolution of application performance and availability issues in distributed, mainframe, virtualized, and cloud-based (custom and pre-packaged) applications. As a result, organizations can:

» Improve availability by proactively identifying application performance problems and automating problem isolation and resolution

» Improve service quality by minimizing the impact of application performance problems on business processes through proper prioritization of issues, avoidance of service outages, and resolution of problems before service levels are affected

» Reduce IT costs by optimizing application problem isolation and diagnosis processes across the organization and enabling effective collaboration between Application Development and IT Operations teams

By applying predictive analytics to end-user transactions, BMC ProactiveNet Performance Management (a core product within the BMC Proactive Application Performance Management solution) learns normal behavior patterns, quickly detects when irregular end-user behavior occurs, and proactively triggers the automatic capture of deep application diagnostics associated with degraded end-user transactions. This detailed diagnostic information is immediately available to operators, thus speeding root cause analysis.

BMC ProactiveNet Performance Management - Application Diagnostics provides deep application diagnostics collection to isolate problems in distributed Java EE, Windows, and .NET applications, helping you to:

» Reduce escalations to Level 3 support

» Accelerate application problem isolation and resolution (MTTR)

» Improve availability and performance of critical business applications

» Enable collaboration between Application Development and IT Operations teams

» Reduce IT costs through improved process efficiencies, eliminating finger-pointing and “war rooms”

This paper presents the challenges of application problem isolation and focuses on the deep-dive diagnostic capabilities of BMC ProactiveNet Performance Management - Application Diagnostics.


PROACTIVE APPLICATION PERFORMANCE MANAGEMENT

The BMC Proactive Application Performance Management solution learns the actual behavior of the end-user experience and the supporting application infrastructure; detects subtle changes in behavior; and proactively alerts on impending performance and availability issues at the earliest possible time. The solution also enables rapid problem isolation and resolution by automating the analysis of learned behavior in conjunction with events and change information to isolate the root cause and service impact of performance and availability issues.

At the core of the solution, BMC ProactiveNet Performance Management delivers the following application performance management capabilities across mainframe, physical, virtual, and private cloud architectures:

» Real (passive) and synthetic (active) end-user behavior monitoring of complex Web, client/server, and mainframe applications running on mainframe, physical, virtual, and cloud (private, public, and hybrid) architectures. The solution even includes optional adapters for collecting performance data and events from various third-party end-user experience monitoring solutions, including Keynote, Gomez, and HP Business Availability Center (Topaz suite). The solution measures application availability and response times, and performs transaction accuracy checking.

» SAP®, Oracle®, and Siebel application management functionality for application administrators provides additional insight into the performance and availability of packaged applications.

» Application component (application, database, middleware) behavior monitoring provides additional insight into the application tier.

» Deep-dive application diagnostics collection automatically gathers data from Java EE, Windows, and .NET application servers to provide the visibility needed to determine which application component is causing a problem.¹

Because BMC ProactiveNet Performance Management takes the guesswork out of the equation, you can escalate incidents to their appropriate owners and eliminate inefficient and frustrating “war room” situations.

The following additional capabilities are also available as part of BMC’s Proactive Application Performance Management solution:

» Application discovery and dependency modeling via BMC Atrium Discovery and Dependency Mapping

» Transaction profile monitoring (including message middleware component health and availability) via BMC Middleware Management

» Mainframe application diagnostics (for CICS, DB2®, and IMS™) via BMC MainView Transaction Analyzer

» Mainframe application component health and availability monitoring via BMC MainView products

Together, these products form the industry’s only Proactive Application Performance Management solution, delivering early problem identification with rapid problem isolation and resolution.

PROBLEM ISOLATION CHALLENGES

With IT as an enabler of your organization’s business processes, you are most likely using Java EE, Windows, or .NET application servers for backend business logic processing, integration of your enterprise applications, and Web-based applications. These application environments, while providing numerous advantages, also add several levels of complexity that make problem isolation a formidable challenge.

The multi-tier architecture of Java EE, Windows, and .NET-based applications relies on multiple networked components, including client machines, load balancers, firewalls, Web servers, application servers, security servers, transaction servers, and database servers. What’s more, the application server is itself a highly componentized entity. Increasingly, applications are being virtualized and deployed in cloud computing environments, adding further infrastructure components to manage.

¹ The BMC Proactive Application Performance Management solution supports in-depth data collection and analysis for the following: JBoss, Tomcat, Java, and .NET servers; WebLogic, WebSphere MQ, WebSphere Application Server, and WebSphere Message Broker; IBM DataPower XI50; TIBCO EMS and TIBCO RV; Sun GlassFish Enterprise Server; Oracle Database and Oracle WebLogic; CICS, DB2, IMS, z/OS, USS, zLinux, and z/VM; and IP, VTAM, and DASD storage. However, the focus of this paper is on the BMC ProactiveNet Performance Management - Application Diagnostics component that supports Java and .NET servers.


Add to the inherent complexity of distributed and virtualized applications the frequent changes these applications go through due to regular maintenance, fixes, and new business requirements, and it is easy to see why proper application performance management is vital.

Running distributed applications composed of so many different moving parts means that multiple teams touch the application, including the IT Operations staff who manage the servers, the DBAs who set up the database, the security engineers who own firewalls and authentication servers, the mainframe system administrators, the network administrators, and others.

For example, when a bank employee attempts to execute a transaction and experiences poor performance, it is unclear what is causing the problem and who needs to fix it. Is it a network hiccup? Insufficient application server connection pools? An overloaded backend server? A bug in a Java EE component? Or something else?

When critical problems occur, the typical response for many IT organizations is to summon representatives from all functional teams (both within and outside the organization), shut them in a big meeting room, and let them “figure it out.” Industry analysts estimate that, on average, 10-14 people are involved when a single application or service outage occurs. Subsequent problem analysis is based on vast amounts of log files, memory dumps, end-user reports, performance monitoring statistics, and guesswork.

Needless to say, these problem isolation methods are extremely inefficient. With little data (or too much irrelevant data) to go on, finger-pointing is common, and IT Operations staff often spend extensive amounts of time proving their innocence on problems that have nothing to do with their domain. For example, some database transactions may not be processed due to incorrect configuration of the application server or a bug in a Java EE component; in that case, DBAs would waste their time sitting in the “war room.” Even worse, the lack of clear visibility into application transaction execution means longer mean time to repair (MTTR), thereby increasing the costs associated with application downtime.

AUTOMATED ROOT CAUSE ANALYSIS WITH DEEP-DIVE DIAGNOSTICS

To accelerate problem isolation and minimize business disruptions, you need fast, reliable root cause information with detailed application transaction diagnostic data every single time a problem occurs.

With the BMC Proactive Application Performance Management solution, application problems are proactively detected through a powerful combination of end-user behavior monitoring, coupled with application infrastructure behavior monitoring, real-time predictive root cause, and service impact analytics.

As part of the solution, BMC ProactiveNet Performance Management analyzes and learns the behavior of real and synthetic end-user experiences, as well as the application infrastructure components, and alerts IT Operations when degraded performance or availability issues occur. Based on recent and current trends in end-user and application behavior, the solution can also alert IT Operations about a potential problem that is likely to occur within the next few hours.
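The two ideas in the preceding paragraph, learning a baseline and projecting a trend, can be illustrated with a minimal Python sketch. This is not BMC's analytics engine, whose algorithms this paper does not describe; the function names, the three-standard-deviation tolerance, and the sample response times are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def learn_baseline(samples_ms):
    """Return (mean, sample standard deviation) of historical response times."""
    return mean(samples_ms), stdev(samples_ms)


def is_abnormal(value_ms, baseline, tolerance=3.0):
    """Flag a value more than `tolerance` standard deviations from the mean.

    The tolerance of 3.0 is an illustrative assumption, not a product setting.
    """
    mu, sigma = baseline
    if sigma == 0:
        return value_ms != mu
    return abs(value_ms - mu) > tolerance * sigma


def hours_until_breach(hourly_values, threshold):
    """Fit a least-squares line to hourly values and estimate how many hours
    remain until it reaches `threshold`. Returns None if the trend is flat
    or falling, and 0.0 if the line has already crossed the threshold."""
    n = len(hourly_values)
    x_mean = (n - 1) / 2
    y_mean = mean(hourly_values)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(hourly_values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    crossing_hour = (threshold - intercept) / slope
    return max(crossing_hour - (n - 1), 0.0)


# Hypothetical response times (ms) observed during normal operation.
history_ms = [200, 210, 190, 205, 195, 200, 208, 192]
baseline = learn_baseline(history_ms)
print(is_abnormal(204, baseline))   # within the learned range
print(is_abnormal(320, baseline))   # far outside the learned range

# Hypothetical hourly averages that are climbing toward a 160 ms limit.
print(hours_until_breach([100, 110, 120, 130], threshold=160))
```

A production system would typically learn separate baselines per hour of day and day of week, since normal load varies over time; the single global baseline here keeps the sketch short.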

When a subtle change in end-user or application behavior is detected by the real-time predictive analytics engine, BMC ProactiveNet Performance Management generates a predictive alert or an abnormality, and proactively triggers the automatic capture of deep application diagnostics, which then can be automatically associated with the degraded end-user transactions (see Figure 1). Because the detailed diagnostic data are captured when the problem occurs, there is no need to recreate the problem.
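One common way to have diagnostics on hand at the moment a problem occurs is a bounded "flight recorder" buffer that is snapshotted when an alert fires. The Python sketch below illustrates that general pattern only; the class, the `on_abnormality` hook, and the event fields are hypothetical and do not represent the product's actual capture mechanism.

```python
from collections import deque


class FlightRecorder:
    """Keeps only the most recent diagnostic events in a bounded buffer."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently evicts the oldest entry when full.
        self._events = deque(maxlen=capacity)

    def record(self, event):
        self._events.append(event)

    def snapshot(self):
        """Return a copy of the buffered events, oldest first."""
        return list(self._events)


def on_abnormality(recorder, transaction_id):
    """Hypothetical alert hook: attach the buffered events that belong to
    the degraded transaction to the alert payload."""
    return {
        "transaction": transaction_id,
        "diagnostics": [
            event for event in recorder.snapshot()
            if event["transaction"] == transaction_id
        ],
    }


recorder = FlightRecorder(capacity=3)
for step in ("servlet", "ejb", "jdbc", "jms"):
    recorder.record({"transaction": "verify-account", "step": step})

alert = on_abnormality(recorder, "verify-account")
print([event["step"] for event in alert["diagnostics"]])
# With capacity 3, the oldest event ("servlet") has been evicted.
```

The bounded buffer is the design point: recording cost stays constant during normal operation, and the events surrounding the abnormality are already in memory when the alert arrives.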


Figure 1. The service model changes status when BMC ProactiveNet Performance Management detects subtle changes in normal end-user or application behavior, indicating current or potential degraded performance or failures.

To accelerate problem isolation, BMC ProactiveNet Performance Management provides on-demand root cause analysis for every event. The real-time predictive analytics engine automates the analysis and correlation of learned behavior, events/alerts, and change information (e.g., BMC BladeLogic or BMC Remedy changes), ensuring that IT staff have enough information available to quickly isolate the most likely root cause(s) of a problem and determine its impact on the business. This combination of early detection with automated root cause and service impact analytics allows IT Operations to find application performance problems at the earliest possible time and drive fast and efficient automated repair of problems.

This powerful combination of behavior learning, predictive root cause, service impact analytics, and continuous deep-dive diagnostic collection enables IT Operations to quickly and efficiently detect, prioritize, and isolate application problems and route them to the appropriate person or team for resolution. As a result, you can avoid costly application and service outages.

DEEP-DIVE APPLICATION DIAGNOSTICS COMPONENTS

BMC ProactiveNet Performance Management - Application Diagnostics helps IT Operations and Application Support staff to isolate performance and availability problems in the application tier of distributed applications running in Java EE, Microsoft .NET, or COM/COM+ application environments.

By quickly determining where the root cause of a problem lies, the solution enables IT staff to route the problem to the appropriate domain expert for rapid resolution. By eliminating the need to involve multiple IT groups to diagnose a problem, the entire problem resolution process is expedited, service is restored promptly, and end users are either unaware that a problem was averted or are simply satisfied that they are productive again.


BMC ProactiveNet Performance Management - Application Diagnostics consists of the following main components:

» Agents. BMC ProactiveNet Performance Management - Application Diagnostics agents are lightweight software agents deployed on production Java EE or .NET application servers. The primary role of the agents is to continuously gather deep diagnostic data on application transaction performance, execution, and errors for inclusion in application root cause analysis. These agents are based on the patented BMC AppSight “BlackBox” technology, which automates the entire process of recording application execution and captures a synchronized record of system events, performance metrics, configuration data, and code execution flow. The “BlackBox” can record application execution at all levels of detail, either locally or at remote sites, without requiring any change to the application or the application server environment. You have complete control of the recording session and can even switch to a greater level of recording detail when problems arise, without having to restart your application.

» Server. The BMC ProactiveNet Performance Management - Application Diagnostics server (i.e., BMC AppSight Server) is a middle-tier component that connects agents to the BMC ProactiveNet Performance Management server, providing access to captured data, which can be stored in a database and/or as XML files on the file system.

» Console. Operators can view the recorded detailed diagnostics data from a BMC ProactiveNet Performance Management event within the solution’s operations console. Recorded operations/transactions can be replayed to facilitate rapid problem identification and diagnosis.

Figure 2 depicts the basic components and flow of diagnostic data being collected.

Figure 2. Deep-dive diagnostic data is collected continuously for inclusion in root cause analysis.


PAINLESS DEPLOYMENT

When business applications fail to perform, IT needs to act quickly to restore service. It is essential that any solution used in the course of the problem isolation and resolution process is easy to deploy and use. After all, IT staff need to spend their time finding the root cause(s) and fixing problems rather than expending effort to manage their management tools.

BMC ProactiveNet Performance Management - Application Diagnostics requires no change to monitored application environments. You do not need to modify Java EE application server startup scripts, run a special version of the Java Virtual Machine (JVM) or Common Language Runtime (CLR), or change application code.

You can install BMC ProactiveNet Performance Management - Application Diagnostics agents through the command line or by using any existing deployment tools. The Java EE version of the agent is packaged as an EAR file and can be deployed using the Java EE application server administrator console. The Windows/.NET version is packaged as a Windows service that can be easily deployed directly or through your standard software distribution processes and tools.

After installing the agents, you can get started with gathering transaction execution data from your application. BMC ProactiveNet Performance Management - Application Diagnostics, designed with special focus on simplicity, comes with predefined configurations for monitoring all common distributed application environments; hence no scripting or special customization is involved. The tool’s ease of use enables a high level of flexibility, allowing users to choose between running agents continuously or deploying and running them only when problems occur.

ANALYSIS: STARTING WITH A BIRD’S EYE VIEW

The location of a problem’s root cause is rarely known when analysis begins. Therefore, you have to start by looking at the big picture and finding “suspect” tiers or components before drilling down. The Technology Breakdown view shown in Figure 3 provides you with exactly that. It displays duration data on transaction performance as recorded within the application server. Using this view, you can easily spot slow-performing transaction categories and determine which tier may have caused the issue.

For example, consider a situation where an application support engineer is tasked with isolating the root cause of a performance slowdown in an online Java EE-based trading application. The engineer may find that a certain type of account verification transaction performs poorly when compared to other transactions or to historical performance. Looking at the application transaction performance breakdown, the engineer sees that the majority of time was spent on the database side.
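The kind of aggregation behind such a tier-level view can be sketched in a few lines of Python. This is an illustrative assumption about how per-tier durations might be summed, not the product's implementation; the record layout `(category, tier, duration_ms)`, the function names, and the sample timings are hypothetical.

```python
from collections import defaultdict


def tier_breakdown(spans):
    """Sum recorded time per tier for each transaction category.

    `spans` is an iterable of (category, tier, duration_ms) tuples.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for category, tier, duration_ms in spans:
        totals[category][tier] += duration_ms
    return {category: dict(tiers) for category, tiers in totals.items()}


def dominant_tier(breakdown, category):
    """Return (tier, share of total time) for the slowest tier of a category."""
    tiers = breakdown[category]
    slowest = max(tiers, key=tiers.get)
    return slowest, tiers[slowest] / sum(tiers.values())


# Hypothetical timings for two transaction categories in a trading application.
spans = [
    ("verify-account", "web", 40),
    ("verify-account", "app", 110),
    ("verify-account", "database", 850),
    ("get-quote", "web", 30),
    ("get-quote", "app", 60),
    ("get-quote", "database", 20),
]

breakdown = tier_breakdown(spans)
print(dominant_tier(breakdown, "verify-account"))
print(dominant_tier(breakdown, "get-quote"))
```

With these sample numbers the account verification category spends 85 percent of its time in the database tier, which mirrors the engineer's finding in the example above.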


Figure 3. Application diagnostics break down the application technology to quickly identify where the transaction is spending the most time.

TAKING THE QUICK DIVE

While a high-level view is a good starting point for analysis, it rarely suffices for root cause isolation, as it only tells part of the story. Before making the final determination as to where the root cause lies, you need to investigate problematic transactions and understand their actual execution performance at a more granular level.

Rather than requiring you to run a different tool to gather more detailed information or sift through long server logs, BMC ProactiveNet Performance Management lets you drill down into the problematic transaction invocations at the click of a button.

The solution’s Application Transaction Breakdown view presents actual transaction execution, including the full transaction execution path (SQL queries, EJB calls, Servlets, JSPs, JMS, JCA, JTA, JNDI, ASP/Xs, COM/COM+, and more) made in the context of the transaction. Performance data is displayed for each of the transaction steps. Figure 4 illustrates some of the application components that might be displayed when application performance degradation occurs.


Figure 4. The transaction breakdown listed in the invocation tree helps you pinpoint the component(s) within the transaction that are consuming the most time.

If the support engineer zooms in on the account verification transaction from Figure 3 and further evaluates the transaction breakdown, the engineer notices that a certain type of JDBC call takes an exceptionally long time to complete and throws an exception. BMC ProactiveNet Performance Management displays the full SQL query that was sent to the database, enabling the engineer to realize that the transaction sends a request to an external database on the company partner’s extranet. When the incident is escalated to Level 3, a single mouse click allows the application developer to drill down further to the actual line-of-code details and parameter values that existed at the time the degradation occurred (see Figure 5).

As a result, instead of consuming countless hours of numerous individuals’ time, the problem is quickly and accurately isolated and escalated to the partner’s help desk team for resolution.
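The drill-down in this example amounts to walking an invocation tree and finding the step with the most time of its own. The Python sketch below shows that idea under stated assumptions: the `Call` structure, the step names, the timings, and the error text are all hypothetical and are not taken from the product's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One step in a recorded transaction (hypothetical structure)."""
    name: str
    total_ms: float
    error: str = ""
    children: list = field(default_factory=list)


def self_time(call):
    """Time spent in the call itself, excluding time spent in its children."""
    return call.total_ms - sum(child.total_ms for child in call.children)


def slowest_step(root):
    """Return the call with the largest self time anywhere in the tree."""
    best = root
    stack = [root]
    while stack:
        call = stack.pop()
        if self_time(call) > self_time(best):
            best = call
        stack.extend(call.children)
    return best


# Hypothetical invocation tree for the account verification transaction.
jdbc = Call("JDBC executeQuery", 1900, error="SQLException: query timed out")
jndi = Call("JNDI lookup", 20)
ejb = Call("EJB AccountBean.verify", 1950, children=[jndi, jdbc])
servlet = Call("Servlet VerifyAccount", 2000, children=[ejb])

culprit = slowest_step(servlet)
print(culprit.name, self_time(culprit), culprit.error)
```

Ranking by self time rather than total time matters here: the servlet and the EJB have the largest totals, but almost all of that time is inherited from the JDBC call beneath them.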

Figure 5. Deep-dive diagnostic drill-down from BMC ProactiveNet Performance Management into BMC AppSight shows parameter values, object states, and lines of code, thus enabling collaboration between application developers and IT Operations and facilitating rapid problem isolation and resolution.


PART OF A COMPREHENSIVE BUSINESS SERVICE MANAGEMENT APPROACH

Business Service Management (BSM) is a comprehensive and unified platform that simultaneously optimizes IT costs, demonstrates transparency, increases business value, controls risk, and assures quality of service. BSM simplifies, standardizes, and automates IT processes, so you can efficiently manage business services throughout their lifecycle across distributed, mainframe, virtual, and cloud-based resources. With BSM, your organization has the trusted information it needs, so you can prioritize work based on critical business services and orchestrate workflow across your IT management processes and functions.

The BMC Proactive Application Performance Management solution follows IT Infrastructure Library® (ITIL®) guidelines on problem investigation and diagnosis, and helps you achieve BSM through a unified architecture that enables you to:

» Exceed service level commitments by focusing on what’s really important to the business

» Reduce application outages by solving issues before service levels are affected

» Improve first-time resolution and slash the time it takes to repair application problems by more than 75 percent with accurate root cause and diagnostic information

» Accelerate application problem resolution by eliminating the need to reproduce problems

» Drive business value by automating manual workflows and actions across multiple vendors, platforms, and sources

SUMMARY

BMC’s Proactive Application Performance Management solution empowers IT staff to proactively detect, and then quickly and efficiently triage, prioritize, and isolate, the root cause of application problems. It also automates the resolution of application performance and availability issues in distributed, mainframe, virtualized, and cloud-based applications. As a result, IT organizations can:

» Improve availability by proactively identifying application performance problems and automating problem isolation and resolution

» Improve service quality by minimizing the impact of application performance problems on business processes through proper prioritization of issues, avoidance of service outages, and resolution of problems before service levels are affected

» Reduce IT costs by optimizing application problem isolation and diagnosis processes across the organization and enabling effective collaboration between Application Development and IT Operations teams

As a core product within this solution, BMC ProactiveNet Performance Management provides deep-dive application diagnostics capabilities that isolate problems in distributed Java EE, Windows, and .NET applications, helping you to:

» Reduce escalations to Level 3 support

» Accelerate application problem isolation and resolution (MTTR)

» Improve availability and performance of critical business applications

» Enable collaboration between Application Development and IT Operations teams

» Reduce IT costs through improved process efficiencies, eliminating finger-pointing and “war rooms”

To learn more about BMC Proactive Application Performance Management (APM) and BMC ProactiveNet Performance Management, please visit www.bmc.com/products/offering/bmC-ProactiveNet-Performance-management.html


BMC, BMC Software, and the BMC Software logo are the exclusive properties of BMC Software, Inc., are registered with the U.S. Patent and Trademark Office, and may be registered or pending registration in other countries. All other BMC trademarks, service marks, and logos may be registered or pending registration in the U.S. or in other countries. SAP R/3 is the trademark or registered trademark of SAP AG in Germany and in several other countries. Oracle is a registered trademark of Oracle Corporation. DB2 and IMS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. IT Infrastructure Library® is a registered trademark of the Office of Government Commerce and is used here by BMC Software, Inc., under license from and with the permission of OGC. ITIL® is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office, and is used here by BMC Software, Inc., under license from and with the permission of OGC. All other trademarks or registered trademarks are the property of their respective owners. © Copyright 2008, 2009, 2010. BMC Software, Inc. All rights reserved.


Business runs on IT. IT runs on BMC Software.

Business thrives when IT runs smarter, faster, and stronger. That’s why the most demanding IT organizations in the world rely on BMC Software across distributed, mainframe, virtual, and cloud environments. Recognized as the leader in Business Service Management, BMC offers a comprehensive approach and unified platform that helps IT organizations cut cost, reduce risk, and drive business profit. For the four fiscal quarters ended December 31, 2010, BMC revenue was approximately $2 billion.