Continuous Application Performance Management

A Whitepaper from dynaTrace Software Inc.


dynaTrace software ©2009

Executive Summary

This whitepaper discusses the new requirements for application performance management (APM)¹ in the face of accelerating complexity in application development, architectural design, and production environments. There is no question that applications continue to become more business critical. They are weapons upon which we rely to compete in our given marketplace. They are more externally focused than ever before, touching customers, business partners and the supply chain, and driving more and more of our business. These business-critical applications must not just be “available” 24x7; they must perform optimally at all times for all users. They must scale further than ever before, supporting more and more users driving more and more transactions. And we are tweaking, changing and enhancing them more often than ever before. All this is going on while we fold in more 3rd party code, including open source, distribute our applications globally, and shift from physical to virtual production environments.

Traditional APM approaches are struggling to keep up. Despite numerous investments in APM tools over the years, we still have high severity issues in production that take entire teams to diagnose and eventually (one hopes) resolve. We are continually surprised by scalability issues at the worst times. Regressions continue to creep in to slow us down. We are continually challenged to bring new functionality on-line in a predictable manner. And despite lots of money and the best minds working on it, it’s not getting better fast enough. In fact, for many it’s getting worse. And there is a logical reason why – your applications have become so complex that they may be beyond the capabilities of your tools to manage them. That’s an untenable position for any enterprise that depends on applications as well as for the business and IT executives that have to deliver results. Therefore, a new, innovative approach is needed to address this new application reality – one that solves today’s challenges while anticipating tomorrow’s.

The limitations of traditional APM approaches, combined with the accelerating complexity of our application development, application architectures and production environments, are driving the urgency for a new approach to APM. This approach must:

• Combine innovation in both technology and approach

• Be built from the ground up to be transaction-centric and to run continuously, 24x7, assuring rapid and predictable resolution of issues in a fraction of the current time

• Support the entire development lifecycle to proactively identify and prevent issues from reaching production

• Automate repetitive, time-consuming tasks while automatically supporting the highly dynamic nature of advanced applications

• Integrate easily into pre-existing systems and business processes to enhance the value of existing application management investments.

¹ APM in the dynaTrace context means ‘application performance management’, not ‘application performance monitoring’. This is a critical distinction when considering the differences between traditional APM tools (monitoring centric) and today’s highly evolved APM systems (management centric). APM is often a confusing area because so many companies that do very different things are lumped under the same “APM” moniker. To make matters worse, many vendors use the same nomenclature to mean different things. This whitepaper does not pretend to clarify all nomenclature issues in this market space, but some clarifications are included below to help the reader better understand what to watch out for when assessing APM solutions.


TABLE OF CONTENTS

1 WHO WILL FIND VALUE IN THIS WHITEPAPER?

2 THE CONVERGENCE OF COMPLEXITY

3 THE LIMITS OF TRADITIONAL APM

3.1 LIMITATION 1: COMPONENT MONITORING – AVERAGES ONLY

3.2 LIMITATION 2: REACTIVE TROUBLESHOOTING ONLY – SIMPLE APPS

3.3 LIMITATION 3: SILOED TOOLS – NO INTEGRATED LIFECYCLE APPROACH

3.4 LIMITATION 4: OLD ARCHITECTURE – STATIC APPS ONLY

3.5 LIMITATION 5: DIFFICULT TO INTEGRATE INTO EXISTING ENVIRONMENTS

4 THE NEW APM REQUIREMENTS

4.1 REQUIREMENT 1: TRANSACTION-CENTRIC

4.2 REQUIREMENT 2: CONTINUOUS, 24x7 OPERATION

4.3 REQUIREMENT 3: INTEGRATED ACROSS THE LIFECYCLE

4.4 REQUIREMENT 4: HIGHLY AUTOMATED

4.5 REQUIREMENT 5: OPEN TO ENHANCE EXISTING INVESTMENTS

5 THE DYNATRACE SOLUTION

5.1 PUREPATH GLOBAL TRANSACTION TRACING

5.2 LIGHT-WEIGHT AGENT WITH SCALABLE SERVER-COLLECTOR ARCHITECTURE

5.3 INTEGRATED LIFECYCLE APPROACH WITH INTEGRATED COLLABORATION SYSTEM

5.4 AUTOMATED DIAGNOSTICS AND SMART AGENT AUTOMATION WITH DYNAMIC, ON-THE-FLY PROVISIONING

5.5 OSGI PLUG-IN MODEL FOR OPEN INTEGRATION AND EXTENSIBILITY

6 CONCLUSION


1 WHO WILL FIND VALUE IN THIS WHITEPAPER?

This whitepaper is written for any senior stakeholder in the application development process: application development executives, software or system architects, performance specialists, and production operations executives. It discusses APM in the context of Agile, Continuous Integration, 3rd Party Code/Applications, Global SOA, Virtualized Data Centers/clusters, and Cloud Computing.

For more information on dynaTrace products, solutions and customer case-studies, please refer to the dynaTrace website, www.dynaTrace.com.

2 THE CONVERGENCE OF COMPLEXITY

Fifteen years ago, when application performance monitoring first emerged, externally facing business applications were quite simple. They were often 2-tier, database and front-end, all developed in-house, for a target audience measured in the thousands. Focus was on the network, the database and “pinging” the application periodically to make sure it was “available”. Companies like Tivoli (acquired by IBM), Quest and BMC emerged to satisfy these early, first-generation needs.

By 2000, the internet was rapidly becoming the application platform of choice. To support the need for more sophisticated distributed applications, a third tier evolved, often written in Java, in which the more complex business logic was embedded. Java application servers exploded onto the scene. Typical Web applications now had to scale to thousands of concurrent users, and transaction volume had to follow. These applications were still built in-house, but by a growing number of Java developers. Complexity was increasing, and APM vendors responded with a second generation of products. Not surprisingly, this second-generation response came from the innovators of the time: Mercury Interactive (acquired by HP, 2006), which established synthetic load-testing as a fundamental prerequisite to production roll-out and synthetic end-user experience monitoring as the way to measure application availability and “response time” SLAs, and Wily (acquired by CA, 2006), which established the need for Java component monitoring and problem isolation in the Java code.

But application complexity has not waned. In fact, over the past 5 years it has accelerated dramatically, not only in application scale, sophistication and architecture, but also in development techniques, in production environments, and in performance expectations. N-tier, globally distributed applications are more and more common. Service-oriented environments with components built by 3rd parties, both commercial and open-source, are commonplace. Agile development techniques and continuous integration are being brought together to accelerate release cycles. Production environments are being converted from well-segmented physical environments to virtual clusters running a combination of applications across a shared environment. Next on the list will be private clouds and ‘cloud bursting’ for peak-load requirements. And all the while, SLAs are becoming tighter, and response time expectations at a user-by-user level are increasing. For a software or system architect, any one of these changes might be manageable. But having all of them hit simultaneously creates a perfect storm of complexity. And for IT management, what looks like a great opportunity to accelerate time to market, cut development costs and reduce production footprint and power consumption becomes a nightmare for guaranteeing application performance, scalability and stability.

This convergence of complexity across development, architecture and production has made traditional APM approaches obsolete. Despite the investment made over the past decade in APM solutions, high severity issues persist in production. Teams of people are still called together to troubleshoot performance, scalability and stability issues. Test cycles continue to be squeezed and test coverage reduced, despite the growing value of the applications. And time to market continues to be challenged as companies roll out more and more sophisticated applications to drive business and compete more effectively. These challenges will not only persist, they will get worse as application complexity accelerates.

To respond to these mounting challenges, an innovative approach is required: a 3rd generation approach that not only solves today’s immediate application challenges, but also anticipates tomorrow’s. One that can proactively prevent performance, scalability and stability issues from getting to production in the first place. One that, when application issues are discovered, as they always will be, cuts resolution time to minutes from days, weeks or never. One that enables greater test coverage completed in half the time. One that can be leveraged across the entire application lifecycle to accelerate time-to-market, increase developer efficiency, reduce maintenance costs and improve the quality of the application experience delivered to customers, employees, and partners. The need for this 3rd generation approach was anticipated by Bernd Greifeneder, dynaTrace Software’s CTO, when he founded the company in 2005.

Figure: Continuous APM: A 3rd Generation Approach to Application Performance Management. (Diagram stages: Monitor, to gain visibility; Resolve, to reduce resolution time; Prevent, to proactively avoid problems.)


3 THE LIMITS OF TRADITIONAL APM

To better understand the value of the ideal 3rd generation approach, it is essential to view the limitations of traditional APM approaches in the context of today’s application complexity.

Traditional pre-production QA and APM approaches (including load testing on the pre-production side) have focused on identifying problems, which was a breakthrough 10 years ago. Synthetic load-test software was developed to simulate end-user demands on a new application and to identify and capture the potential problems that surfaced under increasing load. Production monitoring software was developed to watch application components; if average response times rose above a predefined threshold, an alert would be generated that a potential problem existed in a given application. In both cases, once the test or production teams had triaged the problem and isolated it to the application tier, it was up to the development team to figure out whether the suspected problem was indeed a problem. Operating with only complaints about the symptoms but little or no useful diagnostic information, the developers would tediously try to recreate the problem in question. If they were lucky and recreating the problem was possible, they could move on to diagnose the underlying cause (again without much useful diagnostic information) and eventually resolve the issue. When they weren’t as lucky, the developers would add debug output and log statements in the hope that this additional information might provide clues as to the source of the problem.

Today, many application performance and scalability issues are found in test before they make their way to production, and problem triage in production has become easier. Both are positive developments. But the amount of time invested by the development organization itself to recreate, diagnose and resolve performance and scalability issues has increased substantially. In fact, according to recent research studies², 25%-40% of development’s time is spent fixing application issues that surface in test and production, taking an unacceptably large number of R&D cycles away from delivering new features to support the business objectives.

As complexity has increased in development, architecture and production, these traditional approaches to APM offer less and less value in managing the performance, scalability and stability of today’s sophisticated, distributed applications. This puts even greater pressure on our operations and development teams to keep up. Below are the key limitations of these traditional APM tools given today’s application reality.

3.1 LIMITATION 1: COMPONENT MONITORING – AVERAGES ONLY

As described above, current monitoring solutions identify potential problems in components of an application. Some do this better than others. Bytecode instrumentation has proven to be the most efficient and effective way to get detailed information about an application instance or component. However, despite having access to application details, all traditional component monitoring systems report average performance metrics, such as average response times. Averages, by definition, are not the real data but a statistical summary of it. These systems then correlate the averages to produce a guesstimate of what might be happening across an application. It’s this statistical, correlated data built on averages that is used to determine potential application performance issues, and it’s this correlation of averages that is used to validate our performance against SLAs. Neither provides the level of accuracy needed to meet today’s requirements (described in more detail in Section 4), and as application complexity grows, both become increasingly problematic.
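To make the point about averages concrete, here is a minimal sketch in Python. The response times and the 2-second alert threshold are invented for illustration and do not come from any real system or product. It shows an average-based check staying silent while one transaction is far over the threshold.

```python
from statistics import mean

# Hypothetical response times (seconds) for ten requests to one component.
response_times = [0.4, 0.5, 0.4, 0.6, 0.5, 0.4, 0.5, 9.0, 0.5, 0.4]
THRESHOLD_S = 2.0  # assumed alert threshold, for illustration only


def average_alert(samples, threshold):
    """Average-based check: fires only if the mean exceeds the threshold."""
    return mean(samples) > threshold


def slow_transactions(samples, threshold):
    """Per-transaction check: returns every sample over the threshold."""
    return [s for s in samples if s > threshold]


print(average_alert(response_times, THRESHOLD_S))      # no alert is raised
print(slow_transactions(response_times, THRESHOLD_S))  # the 9-second request
```

The mean of these samples is 1.32 seconds, so the averaged view reports a healthy component even though one user waited 9 seconds.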

² The Business Case for Better Problem Resolution, Forrester, Nov. 2009, and The State of Application Performance Management, a survey of IT professionals conducted between June and September 2009, sponsored by dynaTrace software.


Consider the troubleshooting challenge. If you’re a software architect or performance specialist who must use statistical, correlated data for troubleshooting, you have a problem. You are trying to isolate the specific line of code or architectural flaw that is causing the problem in a specific situation or transaction. You are looking for a needle in a field of haystacks. Knowing which application component might be a problem may help you get to the right haystack, but you still have to find the needle. And if it’s not in the suspected haystack (remember, we are guesstimating based on averages and correlations), then what? Too often, it’s back to reams of logs to find the next haystack to investigate. The granularity and precision of application data, and the context surrounding the issue, are fundamental for consistent rapid resolution, especially as complexity grows.

Figure: Application Performance: Many Stakeholders, Different Skill Sets, Different Departments

Regarding SLAs, most of the ones written today are no longer based on “average response times”. Rather, they are written to absolute response times as encountered by end-users, transaction by transaction. Furthermore, SLAs for the same component may also differ by transaction type. Therefore, traditional threshold methods which depend on averaged infrastructure and application component data are rapidly becoming obsolete. This is one of the reasons many companies have begun looking into “end user monitoring” which promises to monitor the actual performance from the end users’ perspective. The promise is real data at a transaction level for every user interaction. As SLA requirements tighten and go from end-user level down to individual services / components – to real data at a transaction level for every interaction – traditional component monitoring solutions must give way to a new approach.
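A transaction-level SLA check of the kind described above might look like the following sketch. The transaction types, user names and limits are hypothetical, chosen only to show that each transaction is judged against an absolute limit for its own type rather than against a component average.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    kind: str       # hypothetical transaction type, e.g. "search"
    user: str       # hypothetical end user
    seconds: float  # measured end-to-end response time


# Assumed absolute SLA limits per transaction type, in seconds.
SLA_LIMITS = {"search": 1.0, "checkout": 3.0}


def sla_violations(transactions, limits):
    """Return each transaction that exceeds the limit for its own type."""
    return [t for t in transactions if t.seconds > limits[t.kind]]


transactions = [
    Transaction("search", "alice", 0.3),
    Transaction("search", "bob", 1.8),
    Transaction("checkout", "alice", 2.5),
    Transaction("checkout", "carol", 4.2),
]

for t in sla_violations(transactions, SLA_LIMITS):
    print(f"{t.kind} for {t.user} took {t.seconds}s")
```

Note that the 2.5-second checkout passes while the 1.8-second search fails: the same response time can be acceptable for one transaction type and a violation for another, which a single averaged threshold cannot express.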

3.2 LIMITATION 2: REACTIVE TROUBLESHOOTING ONLY – SIMPLE APPS

Some of the more sophisticated 2nd generation monitoring systems and load-testing products offer what they call “diagnostic” capabilities. Some claim that their diagnostics can “trace complete transactions” to give development a better view into what might be going wrong in a given application. Transaction level tracing is indeed the correct approach. However, the success of traditional tools in executing this strategy has been limited because traces have been averaged. The few 2nd generation products that claim support for distributed applications merely present performance averages based on a small subset of transaction traces, providing little value for either diagnostic purposes or SLA performance.


The main limitations of these 2nd generation approaches to dealing with application performance are as follows. First, these diagnostic capabilities are commonly an add-on, not core to the original monitoring or load-test systems, and are therefore poorly integrated and hard to use. Second, all these diagnostic solutions are reactive: they are turned on after a problem is encountered. This is an obvious issue, since problem recreation is often a long and very challenging process. Third, all these diagnostic solutions carry heavy overhead penalties of 20%-50%³ or more when their transaction tracing capabilities are activated. Therefore, they are run in production only on a reactive basis in emergencies, and only sparingly in test and performance centers. Fourth, these systems require development to stop what they’re doing and access the production system database, or develop custom trap methods for problem isolation, as opposed to being simply “turned on” and automatically capturing problems. Finally, these systems are very difficult to set up, and the data gathered, though better than nothing, is still of limited value to development for rapid resolution of issues.
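The cost of that overhead can be worked through with simple arithmetic. The sketch below assumes the definition given in footnote 3 (overhead is the fraction of total system resources consumed by tracing) and further assumes, as a simplification, that transaction capacity scales linearly with the resources left over. The function names are illustrative only.

```python
def capacity_left(overhead):
    """Fraction of system resources left for processing transactions."""
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be a fraction in [0, 1)")
    return 1.0 - overhead


def hardware_multiplier(overhead):
    """How much more hardware is needed to carry the same load with
    tracing on, assuming capacity scales linearly with free resources."""
    return 1.0 / capacity_left(overhead)


for overhead in (0.20, 0.50):
    print(f"{overhead:.0%} overhead: "
          f"{capacity_left(overhead):.0%} capacity left, "
          f"{hardware_multiplier(overhead):.2f}x hardware for the same load")
```

Under these assumptions, 20% overhead calls for a quarter more hardware and 50% overhead for twice as much, which is why such tracing tends to be switched on only in emergencies.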

Figure: Transactional tracing needs to be always-on 24x7 to reliably catch intermittent problems. (Diagram tiers: Presentation Tier, .NET Server; Business Tier, JEE Server; Data Tier, RDBMS and Mainframe; External Web services.)

Now consider this reactive, high-overhead approach given today’s multi-tier, globally distributed, dynamic applications that often incorporate various degrees of 3rd party code. First, tracing transactions requires the ability to traverse multiple tiers, a challenge neither required nor anticipated when the traditional APM diagnostic tools were designed more than a decade ago. Next, tracing across global hops requires remoting, again not a requirement for traditional APM solutions, and not one they support. Third, because today’s applications are more dynamic than ever, one cannot simply turn tracing on after an issue. An application’s behavior and underlying infrastructure may never be quite the same, and the symptoms of the problem may change or point to different underlying application issues. And finally, these traditional APM diagnostic tools were never designed to work with 3rd party code, a huge blind spot when considering today’s service-oriented applications, off-shore application development, and 3rd party frameworks.

Traditional APM and pre-production diagnostic tools, built to support the application development and troubleshooting requirements of a decade ago, may continue to suffice for simple applications, static production environments and in-house development teams. But for those who are embracing change and experiencing accelerating complexity, a new APM system may be needed.

³ The percentage of overhead here measures the portion of total system resources consumed by the transaction tracing processes of the APM system. In this sense, an APM system that has an overhead penalty of 50% is consuming fully one-half of the total system resources, leaving only the other half to process transactions.

3.3 LIMITATION 3: SILOED TOOLS – NO INTEGRATED LIFECYCLE APPROACH

In nearly all cases, traditional APM vendors amassed a cadre of tools that could be used for various tasks by different stakeholders throughout the application development lifecycle. Unfortunately, these tools have rarely been well integrated, forcing architects, developers and performance specialists to use human cycles and guesswork to correlate findings among themselves. This human factor and guesswork often lead to finger-pointing and wasted cycles trying to determine where to start to recreate and diagnose critical application issues. Whether in test or production, these wasted cycles cost real money in terms of application downtime, delays in market roll-out, and inefficiencies in development and test processes.

In production, application monitoring tools are the core APM offering. Some have added tools to “sniff the wire” between end user browsers and the web application itself to offer greater precision over “averages only”. Load test vendors argue that their virtual user monitoring tools should also be used in production to better anticipate potential performance bottlenecks. Others offer hardware and operating system monitors. Some claim their network monitors can help identify slow running applications. And many IT shops have developed tools and scripts of their own that they also use here or there. The output from all these tools provides a series of partial views into the production application environment and smart, human intervention continues to be required to figure out what it all means. Add more complexity, add more applications, add higher SLA expectations, and the management challenge here rises exponentially.

Though it’s well known that finding issues in production costs anywhere from 50x-100x more than finding them in test or development, traditional APM tools do not bridge the gap between production and pre-production. Proactive, preventative capabilities are missing from traditional production-focused APM tools. They monitor production environments 24x7, but a different set of tools, built for non-operations stakeholders, is used in pre-production and for troubleshooting production issues. Unfortunately, the perpetuation of the silos between production and test continues to block strategies for effective proactive prevention, which continues to cost us money, time, and customer satisfaction.

It is important to note that there is a key part of the application development lifecycle in which hardly any performance testing or runtime architectural review takes place, and that is development itself. A decade ago, waterfall development methodologies recommended splitting development from test, but that philosophy is rapidly changing with Agile techniques and continuous integration. The demand for new application features and capabilities to enhance our ability to compete and win business is driving faster and faster release cycles. Simultaneously, architectural complexity and the requirement to rapidly scale in production are putting pressure on quality and test coverage. Despite all the efforts to build better code, the fact remains that well over 50% of application issues discovered in production could have been found and fixed in development if the proper systems had been in place. And, with the rise of virtualization and clouds, architects must now consider the production environment when developing their applications, thinking through the implications and strategies of how to validate their applications well before they are rolled out to the world.

Figure: Most issues are introduced in early development phases but found late in test or production

Amazingly, despite all the potential value in bringing APM into development, the only “visibility” tools available to development have been profilers and debuggers. But debuggers are not APM tools: they are built for solving functional problems, and they stall the application for line-by-line execution. Profilers are not APM tools either, because they cannot be run under load or in distributed environments. These limitations prevent both kinds of tool from diagnosing most scalability problems, which occur under load in distributed systems.

Development managers need proactive performance management, and only a continuous feedback loop between developers and automated testing will shift the mindset and behavior to a proactive one. Given today’s more aggressive application development processes, a full lifecycle approach to performance, scalability and stability dictates that development be included in future APM solutions.

3.4 LIMITATION 4: OLD ARCHITECTURE – STATIC APPS ONLY

As applications have become increasingly complex and dynamic, architects can no longer predict the exact runtime behavior of their applications. They know what their applications are supposed to do, but no one knows how they really behave and how transactions are really being processed under load. This is due partly to the increase in services being used and the widely distributed nature of today’s multi-tiered applications. In addition, there is a growing amount of dynamic code coupled with 3rd party code (either open-source or outsourced) and frameworks. The dynamic code executes under load only, and the behavior of third-party code and frameworks is often impossible to determine even when the application is live. Now fold in a clustered or virtualized production environment in which resources are shared and/or dynamically provisioned, and suddenly no two transactions take the exact same path. How can an APM tool that sees only average response times, and that can be used only after a failure occurs to provide troubleshooting guesstimates, do much more than let you know you may have a problem somewhere in your application?

The rise of virtualized data centers and cloud computing, dynamic resource provisioning, dynamic application behavior and heterogeneous environments will break traditional APM approaches that were designed for earlier, more static environments. Traditional APM approaches require manual provisioning – VM by VM, server instance by server instance – and any dynamic provisioning is restricted to identical instances on identical hardware. Further, if diagnostic tracing is ever turned on and used in a dynamic environment, the set-up and provisioning effort and the associated risk will very likely outweigh the value of the results. The virtualization vendors will provide dynamic solutions for their layers of the stack, as VMware has done with AppSpeed to provide insight into its hypervisor layer. The next generation of APM solutions must follow with the full range of dynamic provisioning to provide true insight and visibility into the application layer itself, including transaction tracing for rapid root-cause analysis, dependency mapping down to the individual services, and the always-on monitoring required to be effective and valuable in these continuously changing production environments.

Execution paths change on the fly due to virtualized on-demand service deployment (figure labels: Virtualized Front-end Services; Virtualized Back-end Services; Dynamically deployed server to process increased load)

3.5 LIMITATION 5: DIFFICULT TO INTEGRATE INTO EXISTING ENVIRONMENTS

All companies have investments in tools and processes around application development and production operations. Many have large investments. No environment is homogeneous with uniform hardware, app-server instances, development processes between teams, and application architectures. And as IT continues to align to the unique requirements of each business unit, diversity may actually increase. Therefore, APM tools must be easily integrated with pre-existing systems, self-managed with complementary automation interfaces to fit pre-existing processes, and highly extensible to adapt to future needs. Traditional APM tools rarely fit this description.

In fact, traditional APM tools are often wrapped with extensive professional services. And professional services are not used simply to install and train; they are used to build integrations, configure scripts and workflows, and write the “glue-code” to make the tools work in a pre-existing environment. Some of the most popular application monitoring systems are so difficult to use and configure that they require professional services simply to instrument an application. Make any architectural or production modifications, and professional services are again needed to rework the instrumentation. Moreover, even the integration of captured data into dashboards and other application management consoles is often a professional services job. This is a very costly and inefficient arrangement that has the negative side effect of hindering adaptability and agility at the same time. This may have been tolerable a decade ago, but it’s not the expectation today.

dynaTrace’s lightweight and scalable architecture (figure labels: Load Controller, for load test control and analysis to uncover problems; Logs; application data collection control and problem diagnostics)

It should be noted that most of the traditional APM vendors have amassed their “suites” through years of acquisitions. These tool acquisitions, while providing lots of check marks on an RFP, remain poorly integrated. Sometimes this is simply neglect, but often it points to a bigger issue, namely that the tools are actually incompatible and are hard, if not impossible, to integrate. Rather than do the costly re-engineering work it would take to do the proper integration, these traditional vendors have chosen to leave it to their professional services organizations to cobble together “solutions” in the field. A large professional services component accompanying a “suite” purchase is a clear indicator that the “suite” may not be well integrated, and that continued use of it will cost more and more professional services dollars with every change a customer needs to make in their application environment. Too often, what sounded like a good idea in PowerPoint becomes a very costly and limiting solution in practice.

Taken together, these limitations of traditional APM approaches, especially in light of the accelerating application complexity we are encountering, are driving the urgency for a new APM approach. This new approach must take into consideration the limitations described above and must anticipate the future requirements arising from virtualized data centers, cloud computing, globally distributed service-oriented applications, increasing use of 3rd party code and Agile development techniques.

APM suites consisting of multiple acquired tools are difficult to integrate

4 THE NEW APM REQUIREMENTS

A 3rd generation APM approach starts with a different center-point than traditional APM approaches. Problem identification is still important, but no longer the sole objective. Rapid resolution and proactive prevention of application issues are the new focal points, considered by most, including Gartner and Forrester, to be more fundamental and more valuable than simply identifying problems. In addition, 3rd generation APM goes beyond “tool suites” to an integrated systemic approach across the application development lifecycle, breaking down silos and accelerating automation. And 3rd generation systems must offer greater application visibility under load, provide business transaction transparency to business owners, and fit easily into existing environments for fast time-to-value and lowest total cost of ownership. These new systems must solve today’s application performance and scalability challenges as well as tomorrow’s.

The cornerstones of a 3rd generation APM solution are outlined below. Taken together, they offer a dramatic reduction in the time and resources it takes to build, test and manage applications throughout their lifecycle. Developer time is optimized, test time is optimized, production issues and maintenance costs are kept to a minimum, and time-to-market for new and enhanced applications accelerates. The value proposition can be significant, and the time-to-value surprisingly short.

4.1 REQUIREMENT 1: TRANSACTION-CENTRIC

The only data that provides true visibility into application performance is real transaction data as observed and captured under load. And the transaction data must be the true trace from end to end, across physical and logical tiers, across remote hops, from web browser to database and back. Not just timing data needs to be observed and captured, but also the additional context surrounding the transaction – method arguments and return values, SQL statements and bind values, messages, remoting parameters, exceptions, logs, synchronization, CPU utilization, memory and so on. With true transaction detail, including the context surrounding each transaction, an entirely new approach to APM is enabled.

First, no longer must we rely on averages and correlations – we now are working with the actual transaction detail. There is no more guesstimating. The facts are clear. When troubleshooting, we can now drill down from abnormal transactions (too slow, too many exceptions, too many SQL statements, etc.), or from alerts when thresholds are breached, directly to the code level and runtime dependencies. We can find the needle in the haystack in minutes. And with integration with IDEs like Eclipse and Visual Studio, problem diagnosis and resolution become straightforward, further reducing MTTR.

When managing SLAs, we can now guarantee absolute response times, not just average response times. For any offending transaction we can drill down to diagnose the reason for the outlier, pinpointing the root cause at the code level for rapid resolution. In fact, with transaction detail and associated context, we can group transactions by any particular user or customer to see exactly what their response times are and for any time period – a very powerful capability for advanced SLA management.
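The difference between average and absolute response times can be illustrated with a few lines of Python. This is only a sketch: the customer names, timings and the 1000 ms limit are invented for illustration and are not taken from any real system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-transaction records: (customer, response time in ms).
transactions = [
    ("acme", 120), ("acme", 135), ("acme", 110),
    ("globex", 140), ("globex", 4200), ("globex", 125),
]

SLA_MS = 1000  # assumed absolute response-time limit per transaction


def sla_report(records, limit_ms):
    """Return the overall average and the per-customer SLA violations."""
    average = mean(ms for _, ms in records)
    violations = defaultdict(list)
    for customer, ms in records:
        if ms > limit_ms:
            violations[customer].append(ms)
    return average, dict(violations)


average, violations = sla_report(transactions, SLA_MS)
print(f"average: {average:.0f} ms")   # the average stays under the limit
print(f"violations: {violations}")    # yet one customer saw a 4200 ms response
```

With this data the average is 805 ms, which would pass an average-based SLA, while the per-transaction view shows that one customer experienced a 4200 ms response. Only per-transaction data makes that outlier visible and attributable.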

Tracing transactions across system and technology boundaries

Visibility into the true runtime behavior of even the most complex applications is enhanced dramatically. Visualizing application behavior under load through a UML diagram that allows drilldown on any VM, tier or service can be enlightening, especially for architects. Modeling is good, but visualizing and monitoring true runtime behavior at the transaction level provides missing detail for architectural validation and for proactively anticipating scalability or stability challenges before they cause user disruption.

In the case of 3rd party code, a service, framework, or entire application can be made transparent without having to understand its internals. This gives architects and performance testers the ability, for the first time, to unlock the black-box of 3rd party code, dramatically increasing visibility and test coverage and accelerating everything from application validation to root cause diagnosis. Armed with the actual transaction detail and context, no longer is there an argument with the 3rd party regarding whose problem it is if an incident occurs, a bug is found or an architectural flaw is uncovered.

Finally, when considering virtualization or cloud computing, a transaction-centric APM approach is the only approach that delivers true application-level monitoring, diagnosis and visibility. Even then, the APM system must be aware of both physical and virtual timing parameters to accurately trace transactions. Only from within the application itself can physical and virtual timing parameters be gathered to clarify issues between guest OS, hypervisor and the application itself. Virtualized environments add a new layer of complexity to application management, and only virtualization-aware, transaction-centric APM systems can provide the visibility and transparency needed to assure application performance and scalability in these highly dynamic environments.

4.2 REQUIREMENT 2: CONTINUOUS, 24x7 OPERATION

A transaction-centric approach is most valuable when it can be run continuously, 24x7. In production, all transactions should be captured, runtime visibility must be true and accurate, and if any issues occur, they should be captured with no problem recreation necessary. In test or staging, a transaction-centric APM system should be run alongside every load test, again providing visibility under load that is true and accurate, and when issues are identified by the load test software, the corresponding transaction detail must be immediately available for analysis and drill down. In addition, application behavior in production can be captured, modeled and incorporated into enhanced test suites in test and staging to accurately model 3rd party services that are not available in a test scenario, thereby increasing test coverage and realism to stop regressions from making their way back into production. A continuous approach, using the 3rd generation system 24x7 in production, test and development, takes the transaction-level value well beyond reactive troubleshooting to proactive problem identification, resolution and prevention.

Visualizing an application’s runtime behavior

As discussed above, traditional APM systems, when their transaction tracing add-ons are activated, are only allowed to run in production for emergencies and only sparingly, even for load tests. Their overhead is simply too great to allow for 24x7 continuous use. To break this barrier, 3rd generation APM systems must be lightweight enough to run at very low overhead, typically 3%-5%. To accomplish this, a new architectural approach is necessary that combines innovative agent design with a highly scalable server-collector architecture. All data processing needs to be offloaded from agent to server, and data collection and transfer should be managed by agents and collectors. This elegant architecture, when implemented effectively, is capable of tracing and capturing transactions (and their associated context) for globally distributed applications processing millions of transactions a minute.

Offloading transaction data processing to a central server enables 24x7 always-on transaction tracing at production-safe overhead levels (figure labels: Java, .NET and Database tiers; Agents; Collector; WAN; Analysis Server)
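The division of labor between agent, collector and server can be sketched in a few lines of Python. This is a simplified, single-process illustration of the offloading idea only, not dynaTrace's implementation; the class names, the in-memory queue standing in for the network, and the event fields are all assumptions made for the example.

```python
import queue
from collections import defaultdict


class Agent:
    """Lives inside the application and only hands off raw events."""

    def __init__(self, outbox):
        self.outbox = outbox

    def record(self, transaction_id, method, duration_ms):
        # No aggregation or analysis here, to keep in-process overhead low.
        self.outbox.put((transaction_id, method, duration_ms))


class AnalysisServer:
    """Does all processing, away from the monitored application."""

    def __init__(self):
        self.traces = defaultdict(list)

    def ingest(self, batch):
        for transaction_id, method, duration_ms in batch:
            self.traces[transaction_id].append((method, duration_ms))

    def recorded_ms(self, transaction_id):
        # Sum of the durations recorded for one transaction.
        return sum(ms for _, ms in self.traces[transaction_id])


class Collector:
    """Drains the agents' queue in batches and forwards them to the server."""

    def __init__(self, inbox, server, batch_size=100):
        self.inbox = inbox
        self.server = server
        self.batch_size = batch_size

    def flush(self):
        batch = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self.inbox.get_nowait())
            except queue.Empty:
                break
        if batch:
            self.server.ingest(batch)
        return len(batch)


events = queue.Queue()  # stands in for the agent-to-collector connection
server = AnalysisServer()
agent = Agent(events)
collector = Collector(events, server)

agent.record("tx-1", "LoginServlet.doPost", 20)
agent.record("tx-1", "UserDao.find", 30)
agent.record("tx-2", "SearchServlet.doGet", 7)

forwarded = collector.flush()
print(forwarded, server.recorded_ms("tx-1"), server.recorded_ms("tx-2"))
```

The point of the design is that the code path inside the application does nothing but enqueue a small record; grouping, correlation and analysis all happen on the server side, where they do not compete with the application for resources.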

Running continuously, with unprecedented transaction detail, 3rd generation systems offer a new level of business transaction management (BTM) that heretofore has been unavailable. IT can now offer business managers real-time dashboards of key business transactions. Such transactions can be anything from online offers, to new user accounts, to downloads, to searches, to top customers’ orders and so on. Any transaction group that is performance or scalability sensitive can be monitored, and if issues occur, immediately analyzed, diagnosed and corrected, often before issues are noticed by the users themselves. There are other vendors who specialize in BTM solutions only, but like traditional APM tools, they monitor for problems only; when problems occur a troubleshooting team still needs to be formed, logs pulled, and the arduous task of problem recreation and diagnosis begins. In the meantime, users suffer, and business is lost. BTM is an important new requirement for 3rd generation APM systems.

Perhaps most important is the reality that dynamic application environments – such as multi-application clusters, virtualized data centers and clouds – require 24x7 transaction-level monitoring for an APM system to add any value whatsoever. Because no two transactions necessarily take the same path under the same conditions, the connection between symptom and root cause is much more difficult, often impossible, to determine without the true, global transaction details for each transaction. So when planning for the added complexity of future application environments, a 3rd generation APM system may be an essential pre-requisite for assuring application performance, scalability and stability success.

Business Transaction Management linking real-time business dashboards to deep diagnostic views (figure labels: Service “Login”; Service “Credit”; Service “Booking”; VU London; Tenant #243; Tenant #123; Slow!; Fails!)

4.3 REQUIREMENT 3: INTEGRATED ACROSS THE LIFECYCLE

A single, integrated APM system that supports production, test and development has been desired by many for years. Such a system provides the foundation for proactive problem resolution and prevention, with the promise of substantially reducing total cost of ownership of an application over its 5-10+ year life. But the suites of tools offered by traditional APM vendors have consistently fallen short, requiring considerable professional services to glue the pieces together and manage them year in and year out.

3rd generation APM systems break through this barrier with ground-up design that anticipates and embraces this requirement. First, the system must run and add appropriate value in production, test and development. It must support all stakeholders in the application performance management chain – from developer, to architect, to operator, to IT manager, to business manager. Components must be reusable, approach and technique consistent, with an integrated collaboration framework to automate data capture and communication. The requirement for data capture - the recording of outliers, transaction by transaction, with appropriate context – should not be overlooked. Exact recordings of application behavior under load are required for fact-based communication between application stakeholders, especially if they are geographically distributed, or employees of another company. Recorded information should be available online in real-time, to allow analysis of transactions that run endlessly, as well as offline, because often there is no real-time access for developers. The ideal state is to minimize miscommunication, eliminate finger pointing, and enable immediate problem resolution by the right team and team member.

Second, these APM systems must plug in seamlessly to, and enhance the value of, existing performance environments. In production, this means plugging easily into everything from the application itself, to the management consoles used, to the alerting process as defined, to the incident tracking system of record. In test, this means plugging easily into everything from the application to be tested, to the load tester used, to troubleshooting processes, to the incident tracking system of record. And in development, this means flexibly supporting development processes (e.g. Agile), integrating with continuous integration systems, and integrating with the IDE of record. A 3rd generation APM system should add value quickly by speeding up and/or automating away redundant, time-consuming tasks done by highly paid individuals. And it should deliver expanding value over time as the application development processes are optimized across silos.

For those planning for the future, APM solutions that integrate across the lifecycle will prove a necessity when building and optimizing applications for virtualized data centers and clouds. No longer will application architecture and production environments be treated as two independent decisions. How we architect and build applications to properly scale in a virtualized production environment is a fundamental consideration and early design requirement. And the only reliable and consistent way to manage this interdependency will be with an APM approach that truly aligns and integrates development, test and production effectively.

Continuous APM requires seamless integration across the lifecycle (figure: APM for Production, APM for Test and APM for Development share a central DB, giving business, system architect and developer views on a single data source. Integrations shown: IDEs: Eclipse, Visual Studio. CI build systems: Ant, NAnt, MSBuild, Maven. Load testers: HP LoadRunner, MicroFocus SilkPerformer, Microsoft VSTS, IBM Rational Performance Center, Radview WebLOAD, iTKO SOA Test, Neotys NeoLoad, Proxysniffer, Push2Test, JMeter, OpenSTA. Ops management: IBM Tivoli, HP OpenView, BMC PM, CA Unicenter, MS SCOM. End-user monitoring: Coradiant TrueSight, BMC TM-ART.)

4.4 REQUIREMENT 4: HIGHLY AUTOMATED

There are three levels of automation to consider when assessing the value of a 3rd generation APM system. First is the level of automation for application instrumentation itself, a heavily manual task in traditional APM tools. Second is the level of automation to effectively eliminate redundant, time-consuming tasks that cost us time and money in lost resource and process efficiency. And third is the level of automation provided to handle complex, dynamic application environments, a requirement that will continue to increase in urgency as application complexity grows.

Application instrumentation in traditional APM systems is highly manual, difficult to do and can cause significant application issues if not done exactly right. 3rd generation APM systems get around this with smart agent technology and sensor placement wizards. Agent management is consolidated, handled centrally from the server. Agents themselves deploy automatically, placing their sensors at logical application component boundaries. Ideally, the agents should be “self-learning”, centrally configurable, and able to adjust automatically to load and/or new production configurations. In addition, wizards should help you place the sensors to maximize visibility with the lowest possible overhead, and the agents themselves should be aware of the overhead they may incur to help guide administrators in proper instrumentation. Certainly, manual override must be available for any agent-sensor placement and parameter. However, the more automated and intelligent the instrumentation system is, the easier the implementation, the easier the management and the fewer the snafus.

The automation of redundant, time-consuming tasks is of huge immediate value when implementing a 3rd generation APM system. Consider the time lost today by developers, architects and testers as they sort through load test reports, rerun tests with different parameters, and sort through thousands of lines of log files in an attempt to isolate the causes of performance problems. Research shows that development alone spends 10%-15% of its cycles recreating, diagnosing and fixing performance issues found in test. Imagine if every issue identified in a load test report had the full transaction-level detail available for performance analyst and developer / architect review. And imagine if the APM system automatically executed the most common diagnostic steps to accelerate problem triage. No test documentation to complete, no logs to pull, no more test reruns, no debate about who is responsible and no more arguments about what and where the issues are. Developers and architects can spend their time fixing bugs and enhancing applications rather than analyzing, recreating and diagnosing suspected issues.

Load Testing with Integrated Performance Data Collection

Likewise, consider the time lost by operations, architects, testers and management when an application issue is found in production. Documenting incidents is today a manual task costing on average an hour of time per incident, according to Forrester. A team is assembled to review the report from the monitoring tool, with its averages and correlations, make guesstimates of where to start, have logs pulled, scatter to do more due diligence, participate on long phone calls, attempt problem recreation in test or staging, and so on. Forrester estimates that development alone spends as much as 20%-25% of its time analyzing, recreating and diagnosing application issues found in production, and 25% of the time the root cause of the issue is never found. Now imagine if the monitoring system, at the time the alert was triggered, also captured a recording of the transaction detail and associated application context causing the breach: operations hands the detailed recording off to an architect, the architect drills down and triages to the appropriate development team, and the development team identifies the offending lines of code and drops them into its IDE to make the fix. No more long conference calls, no more guesstimating, no more wasted motion. Development is much more productive, customers are happier, and management rests easier.

As we move to dynamic production environments, agent based APM approaches will be particularly challenged if they have not been architected with this level of complexity in mind. Virtualized production environments and clouds will almost never be homogeneous with identical hardware configurations, OS configurations, application server configurations and so on. Therefore, to support these advanced production environments, new APM systems will need to dynamically provision each instance and initiate data gathering immediately to assure end to end transaction integrity is maintained. Likewise, as transactions take unanticipated paths and application behavior shifts unexpectedly under load, the APM system will need to dynamically adjust to capture the full paths accurately, in detail, to assure performance, scalability and stability.

Automation is a multi-faceted requirement for next generation APM systems, and an opportunity for significant value to be gained by those who incorporate them into their application development lifecycle. And as complexity increases, as our production environments become increasingly virtualized and dynamic, automation will move beyond a convenience to an imperative.

4.5 REQUIREMENT 5: OPEN TO ENHANCE EXISTING INVESTMENTS

As discussed in the “Limitation” section above, every company already has a significant investment in application management tools and processes. Each stakeholder and team across the application development lifecycle has a way they work today. It may not be optimal, but it’s familiar. Any new system that is introduced into an existing application management environment must fit seamlessly and enhance the value and efficiency of prior investments in tools, systems and processes. And integration should be easy and extensible so that professional services can be held to a minimum, and self-reliance by the customer maximized.

3rd generation APM systems have the advantage of being new, architected with modern techniques to provide an easy to work with, easy to extend, open environment. An open source model (e.g. OSGi) for monitors, extensions and integrations, is ideal for providing customers both fast time to value and maximum extensibility over time. The APM system company itself can enhance its offering over time by providing its customers with an ever expanding list of plug-ins for various adjacent capabilities such as system and OS monitors, synthetic transaction extensions, and integrations to build systems, load testers, management consoles and incident tracking systems. And customers can add new plug-ins of their own or modify existing plug-ins tuned to their specific requirements. All plug-ins, being open source, can be used as models, easily modified, or easily validated for active use. If a community portal is also made available, plug-ins can be shared between customers adding even more value beyond the initial APM investment.

An open plug-in model guarantees easy extensibility leveraging existing investments (figure: Continuous Application Performance Management Systems with three OSGi plug-in groups. Integrations: IDEs, CI and load testing, operations management, incident management. Extensions: preconfigured sensor packs, automatic session analysis, management actions. Monitors: synthetic end-user monitoring, URL pinger, system monitoring.)

An open data model is also very important to the integration capability of a 3rd generation system. Traditional systems often require custom integrations simply to export data in the proper format to satisfy a customer’s unique reporting needs or to fit a particular business process. 3rd generation APM systems should provide open Web Service based access to human-readable performance reports and to structured XML-based performance results that allow you to automate repetitive analysis tasks (e.g., did performance degrade between subsequent test runs, does a transaction execute more than 100 SQL statements, etc.). Additionally, being open also means supporting open source databases like PostgreSQL in addition to commercial databases like Oracle, DB2 and SQL Server for permanent data storage.
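As an illustration of the kind of repetitive analysis that structured XML results make easy to automate, the following Python sketch compares two test runs and flags both checks mentioned above. The XML layout, element names and attribute names are invented for this example and do not represent any actual dynaTrace export format; the 10% slowdown tolerance is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical result format: element and attribute names are assumptions.
BASELINE_XML = """
<testrun id="build-41">
  <transaction name="login" avg_ms="120" sql_count="12"/>
  <transaction name="checkout" avg_ms="300" sql_count="85"/>
</testrun>
"""

CURRENT_XML = """
<testrun id="build-42">
  <transaction name="login" avg_ms="125" sql_count="12"/>
  <transaction name="checkout" avg_ms="410" sql_count="140"/>
</testrun>
"""


def load(xml_text):
    """Map transaction name -> (average response time in ms, SQL count)."""
    root = ET.fromstring(xml_text.strip())
    return {
        t.get("name"): (float(t.get("avg_ms")), int(t.get("sql_count")))
        for t in root.iter("transaction")
    }


def find_issues(baseline, current, max_slowdown=0.10, max_sql=100):
    """Flag transactions that slowed down or issue too many SQL statements."""
    issues = []
    for name, (avg_ms, sql_count) in sorted(current.items()):
        if name in baseline:
            base_ms = baseline[name][0]
            if avg_ms > base_ms * (1 + max_slowdown):
                issues.append(
                    f"{name}: slower than baseline ({base_ms:.0f} ms -> {avg_ms:.0f} ms)"
                )
        if sql_count > max_sql:
            issues.append(f"{name}: {sql_count} SQL statements (limit {max_sql})")
    return issues


issues = find_issues(load(BASELINE_XML), load(CURRENT_XML))
for issue in issues:
    print(issue)
```

A check like this could run after every build in a continuous integration system and fail the build when the list of issues is not empty, turning a manual report review into an automated gate.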

Taken together, these requirements define the 3rd generation APM system approach that works for today’s increasingly complex application development challenges and for tomorrow’s globally distributed, highly dynamic application environments. It’s an approach that maximizes the efficiency of application development, testing and troubleshooting, thereby saving significant time and resource across the entire application lifecycle.

5 THE DYNATRACE SOLUTION

dynaTrace software was founded in 2005 to address the new APM requirements being driven by the rapid acceleration of complexity in new development techniques, new application architectures and increasingly complex production environments. It is the industry’s first 3rd generation approach to application performance management. Monitoring is only the beginning. dynaTrace has combined business transaction management (BTM), traditional monitoring (APM, with “m” standing for “monitoring”), deep diagnostics from business transaction to code level for every transaction, and proactive prevention across the application development lifecycle into a single integrated system. The system is innovative, easy to use, and provides value far beyond the traditional APM tools.

There are several breakthroughs dynaTrace has made which, taken in combination, provide the unique power of our system. These breakthroughs are required for today’s advanced applications, and anticipate the requirements of tomorrow’s virtualized data centers and cloud computing. Below is a brief discussion of these breakthroughs. This is not intended to be a complete description of the dynaTrace solution, but rather an introduction to dynaTrace’s innovative approach to APM.

5.1 PUREPATH GLOBAL TRANSACTION TRACING

At the atomic level of the dynaTrace APM system is a patent-pending transaction tracing technology that the company calls “PurePath”. The PurePath is the true path a single transaction takes from service to service, instance to instance, tier to tier, hop to hop from entry point to backend database and back. Accompanying transaction timing details is a full set of context surrounding that transaction – memory, CPU, SQL statements and bind values, synchronization parameters, method arguments and return values, exceptions, logs, remoting and so on. And each transaction is monitored, checked against pre-set parameters for out-of-bounds conditions, and recorded for off-line review and coordinated communication.
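To make the idea of a per-transaction trace with attached context more concrete, here is a small Python sketch of a trace tree. It is a generic illustration and not the PurePath data format, which is not described in this paper; the `Node` class, its fields and the sample values are assumptions, and the self-time calculation assumes child calls run sequentially.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step of a transaction: where it ran, what ran, and its context."""
    tier: str
    method: str
    total_ms: float
    context: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def self_ms(self):
        # Time spent in this step itself, excluding (sequential) child calls.
        return self.total_ms - sum(c.total_ms for c in self.children)


def walk(node):
    """Yield every node of the trace, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def hotspot(root):
    """Return the node with the largest self time."""
    return max(walk(root), key=lambda n: n.self_ms())


trace = Node("web", "CheckoutServlet.doPost", 950, children=[
    Node("app", "OrderService.place", 900, children=[
        Node("db", "executeQuery", 820, context={
            "sql": "SELECT * FROM orders WHERE customer_id = ?",
            "bind": [42],
        }),
    ]),
])

worst = hotspot(trace)
print(worst.tier, worst.method, worst.self_ms(), worst.context)
```

Because the slow step carries its own context (here the SQL statement and bind value), the person reading the trace can see not only where the time went but also what the application was doing at that moment.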

Having the true transaction detail for any transaction, no matter how complex, is hugely empowering. Business transactions can be monitored for true performance against SLAs, with any and all outliers captured for proactive analysis, diagnosis and repair. Troubleshooting even the most challenging issues, whether in test or production, is dramatically accelerated as guesswork is virtually eliminated. And the communication between stakeholders, from initial finding, to triage, to code fix is accelerated as well since the actual PurePath recording of offending transactions becomes the lingua franca all stakeholders can count on. With transaction detail and context, 3rd party code and frameworks can be penetrated for the first time, decompiled for troubleshooting with associated PurePaths for clear communication of issues. Often for the first time, architects can visualize the true behavior of their applications under load to validate architecture and anticipate scalability issues they would never have been able to determine before. And for those who are already virtualizing their production environments, transaction performance can be monitored and measured with confidence, including the variances between physical and virtual timing service by service, component by component, instance by instance.

PurePath transaction analysis pinpointing a Web service issue

The PurePath is the foundation of the dynaTrace system, but by no means its only breakthrough.

5.2 LIGHT-WEIGHT AGENT WITH SCALABLE SERVER-COLLECTOR ARCHITECTURE

dynaTrace’s PurePath approach was architected from the ground up to run continuously in the most demanding production environments. System overhead is kept to a minimum by an innovative architecture that breaks from traditional approaches. The agent is very lightweight and does not increase in footprint under load. It instruments the bytecode of the target Java / .NET application to inject sensors that gather the PurePath data automatically; no source code changes are required. All data is offloaded to the dynaTrace server for processing and recording, and collectors are used to support agent communication and data transfer for very demanding and/or globally distributed applications. This architecture removes the barriers that blocked traditional monitoring tools from continuous use when data capture add-ons were turned on. dynaTrace’s lightweight agent and scalable server-collector architecture is field proven in very large production environments with PurePath data captured continuously, 24x7, at 3%-5% overhead.
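The division of labor described above (sensors injected without source changes, raw data offloaded for processing elsewhere) can be illustrated with a small analogy. The sketch below wraps existing Python functions at runtime instead of rewriting Java or .NET bytecode, so it shows the pattern only, not dynaTrace's mechanism. `Agent`, `Server` and `OrderService` are hypothetical names.

```python
import time
from collections import deque
from functools import wraps


class Agent:
    """Hypothetical in-process agent: records raw events only, does no analysis."""

    def __init__(self):
        self.buffer = deque()

    def sensor(self, func):
        """Wrap a function so that each call records its name and duration."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                self.buffer.append((func.__name__, elapsed_ms))
        return wrapper

    def instrument(self, target, names):
        """Replace the named attributes of target with wrapped versions.

        The source code of target is not edited; only the loaded object changes.
        """
        for name in names:
            setattr(target, name, self.sensor(getattr(target, name)))

    def flush(self, server):
        """Hand all buffered events to the server and empty the local buffer."""
        while self.buffer:
            server.receive(self.buffer.popleft())


class Server:
    """Hypothetical central server: the place where processing would happen."""

    def __init__(self):
        self.events = []

    def receive(self, event):
        self.events.append(event)


class OrderService:
    """Stand-in for application code that knows nothing about monitoring."""

    def place(self, sku):
        return f"ordered {sku}"


agent, server = Agent(), Server()
agent.instrument(OrderService, ["place"])
result = OrderService().place("A-100")
agent.flush(server)
```

The agent side does as little as possible per call (one timestamp pair and one append), which is the design choice that keeps in-process overhead low; everything more expensive is deferred to the receiving side.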

When a transaction tracing APM system can be run continuously with very low overhead under heavy load, the opportunity for value increases dramatically. For the first time, true runtime behavior under load can be monitored and visualized at a detailed application level, giving architects unprecedented insight for proactive troubleshooting and application dependency mapping. SLAs can be monitored and managed against absolute response times, not just average response times. Business transactions can be managed and graphed in real time, giving business users and operators deep visibility into the performance characteristics of their most important transaction sets. Dependency maps of all your services are based on real transaction routes, and are thus available at the most granular level and always up to date. The time-consuming task of problem recreation is no longer needed, since there is already an exact recording of offending transactions and application context. There is no more guesswork to determine root cause, and no more logs to pull and pore through. There are no more communication errors as symptoms are described, often by non-developers. Response behavior of external services can be documented and used to create accurate mocks when the application accessing them is put into the test lab, which extends realism and test coverage. And the iterative task of building and testing new code and enhancing existing code can be accelerated as test reruns are eliminated and regressions are found immediately. With continuous transaction tracing always on, proactive problem prevention is truly possible, which simultaneously increases productivity, accelerates application development and improves application performance and quality.
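The difference between managing SLAs on average response times and on absolute response times is easy to show with a few numbers. The following sketch uses made-up sample data and a hypothetical `sla_report` helper; the 500 ms limit is an assumption for illustration.

```python
from statistics import mean


def sla_report(response_times_ms, limit_ms):
    """Compare an average-based SLA check with a per-transaction check."""
    average = mean(response_times_ms)
    violations = [t for t in response_times_ms if t > limit_ms]
    return {
        "average_ms": average,
        "average_within_sla": average <= limit_ms,
        "violations": violations,
    }


# Nine fast transactions and one very slow one (illustrative values).
samples = [120, 130, 110, 125, 115, 140, 135, 118, 122, 2400]
report = sla_report(samples, limit_ms=500)
```

Here the average is 351.5 ms, comfortably inside the limit, yet one user waited 2.4 seconds. Only the per-transaction view surfaces that outlier, and only a recorded trace of that specific transaction explains it.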

dynaTrace’s lightweight and scalable architecture (diagram labels: Internet, Clients, Web Servers, Application Servers, Web Services, Database Systems, Mainframe, Bytecode Sensors, dynaTrace Agent, dynaTrace Collector, dynaTrace Server, PurePath, dynaTrace Clients with Custom Dashboards)

5.3 INTEGRATED LIFECYCLE APPROACH WITH INTEGRATED COLLABORATION SYSTEM

The third breakthrough of the dynaTrace APM system is its lifecycle approach. Unlike traditional APM vendors, who offer a “suite” of tools and a large professional services organization to build out the integrations, dynaTrace architected its system from the outset to support the entire application development lifecycle: a single, integrated system supporting development, test, and production. This makes it easy to automate a number of redundant, time-consuming tasks, and creates the opportunity to improve business processes across silos, e.g. between operations and development, between test and development, and between business users and operations.

To support the entire lifecycle, the same system must be able to run and add value in all 3 core areas of the application development lifecycle. dynaTrace uses the same APM system foundation, including PurePath and its agent-collector-server architecture, for 3 targeted Editions of its system: the Development Edition, the Test Center Edition, and the Production Edition. Each Edition is packaged with out-of-the-box dashboards, integrations and instrumentation configurations to accelerate time to value for developers, testers, and operators respectively.

Development Edition

This Edition targets development organizations interested in increasing the number and predictability of releases. It integrates with popular continuous integration systems to provide early visibility into application behavior under load. Most development organizations incorporate some level of functional testing early in the development cycle, but performance and scalability testing has rarely been available at this early stage. The dynaTrace Development Edition makes this easy with pre-built plug-ins for popular build systems, and provides a series of standard dashboards and version diffing capabilities to accelerate time to value. Used continuously, with every build, architectural flaws are reduced, development time is reduced, testing is accelerated, time to market is accelerated, and production quality is improved. Enabled by automation and the lifecycle approach, the continuous feedback loop between the integration test system and the developers’ local performance work through the dynaTrace Development Edition enables the ever needed but never achieved change from sporadic, tactical troubleshooting to strategic, proactive performance engineering. Architects can further leverage the dynamic application behavior analysis to validate the architecture’s scalability, avoiding costly design flaws.
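As an illustration of what a per-build performance comparison might look like, here is a minimal sketch that flags transactions that slowed down between two builds. It is not dynaTrace's version diffing feature; `find_regressions`, the 10% tolerance and the build timings are assumptions made for the example.

```python
def find_regressions(baseline_ms, current_ms, tolerance=0.10):
    """Return transactions whose time grew by more than the given tolerance.

    Both arguments map a transaction name to its measured response time in ms.
    Transactions missing from the current build are skipped.
    """
    regressions = {}
    for name, old in baseline_ms.items():
        new = current_ms.get(name)
        if new is not None and new > old * (1 + tolerance):
            regressions[name] = (old, new)
    return regressions


# Illustrative timings from two consecutive builds.
build_41 = {"login": 80.0, "search": 200.0, "checkout": 350.0}
build_42 = {"login": 82.0, "search": 290.0, "checkout": 340.0}

regressions = find_regressions(build_41, build_42)
```

A continuous integration job could fail the build whenever the returned dictionary is non-empty, so that the slowdown in `search` is caught on the build that introduced it rather than in a later load test.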

Test Center Edition

The dynaTrace Test Center Edition significantly reduces test time and test repetitions/cycles while increasing test coverage by opening the black box, even for the most complex applications. This Edition automates a number of redundant, time-consuming tasks and interactions between test and development. The Test Center Edition is designed to run with every load test, providing PurePaths for every issue identified in the load test report. Testers no longer need to document issues and pull associated logs. Developers get the exact transaction-level recording, with context, associated with each issue in the report, eliminating guesswork and test reruns and accelerating diagnosis and repair, even for 3rd party code. Architects and performance experts “can” (package) their knowledge into Knowledge Sensor Packs and dashboards to automate performance analysis steps and enable a broader team of testers to leverage that expertise. No more issues go unsolved, and architects get the added benefit of architectural validation under substantial load prior to production roll-out. The PurePath also enables developer-centric communication, accelerating teamwork between test and development even for geographically distributed teams. Similar to the Development Edition, the Test Center Edition packages integrations with popular load test tools (e.g. HP LoadRunner, Micro Focus / Borland SilkPerformer, Apache JMeter, ProxySniffer, PushToTest, BrowserMob, iTKO) and incident tracking systems, and provides a series of standard dashboards and version diffing capabilities to accelerate time to value. Using the Test Center Edition continuously, for every application and every load test, test time can be reduced by as much as, test iterations by as much as 80%, while test effectiveness and coverage can be increased significantly.

Performance Regression Dashboard unveiling Application Components Facing Regression Problems

Production Edition

This Edition targets a series of stakeholders: production operators and managers, business users, performance experts, system architects and, for the resolution and prevention process, development as well. Production operators will see little change from the traditional monitoring tools they are familiar with: dashboards will look the same, alerts will happen the same way, and triage will proceed the same way. But looking deeper, dashboards are more flexible, so they can be designed for and empower all stakeholders, from business users to architects to developers, and can double as reports at a single click. Data capture, when an incident occurs, is automatic, and the captured data can be automatically populated into the incident tracking system of choice. And the data provided to troubleshooters and/or development is no longer subject to interpretation and guesswork: it is the PurePath transaction detail described above.

In addition, the Production Edition automates a number of manual tasks associated with traditional monitoring systems. Sensor placement is automatic and self-learning, agents and collectors are centrally managed from an easy-to-use server console, and default instrumentation levels are pre-set to reduce the impact of over-instrumentation. Packaged with a number of plug-ins for enterprise management systems and their event dashboards, incident tracking systems, and server and OS monitors, as well as standard dashboards, the Production Edition is easy to incorporate into any existing production environment and quick to add value in a variety of ways for a variety of stakeholders.

Operations dashboard highlighting a problem with a business transaction

Problem triage dashboard highlighting a problem in the application under test

Individual Building Blocks for an Integrated Lifecycle Approach

Diagram labels:
Development Edition: Architectural Validation & Review; Proactive Performance Tuning; Continuous Integration Performance Management
Test Center Edition: Performance Regression Analysis (Component-level); Scalability Tuning
Production Edition: Business Transaction Management; Application Monitoring; Application Dependency Mapping; Scalability Tuning
dynaTrace Platform: Problem Documentation and Isolation; Deep Code-level Diagnostics (Real-time & Offline); Problem Resolution
Lifecycle stages: Development; Continuous Integration; QA / Load Testing; Staging / 24x7 Production

5.4 AUTOMATED DIAGNOSTICS AND SMART AGENT AUTOMATION WITH DYNAMIC, ON-THE-FLY PROVISIONING

The dynaTrace APM system breaks through the traditional limitations of manual, static APM approaches with an innovative system that supports the 3 levels of automation required by today’s dynamic applications and high-efficiency IT organizations. First, as described above, the dynaTrace Editions very effectively automate a number of repetitive, time-consuming tasks associated with diagnosing application performance issues, while providing an additional value opportunity for business process change over time. The former provides a rapid return on investment, often in less than 6 months, while the latter can provide significant ongoing value as silos are broken down and cross-functional teamwork improves.

Second, dynaTrace has developed an innovative smart agent technology that automates a series of tasks that in traditional systems were manual, often needing vendor-specific expert services to do properly. dynaTrace agents are managed centrally from a single server. Agents are “smart”: they self-learn the application and auto-discover the application architecture in order to deploy themselves and their sensors automatically at logical application boundaries. A sensor placement wizard guides you through configuring the bytecode instrumentation of the target application in order to achieve maximum visibility with minimum overhead, a particularly useful capability when working with unfamiliar 3rd party code. Application specialists can tune the sensor placement easily using a point-and-click visual class browser, as well as configure custom sensors for deeper levels of insight. Instrumentation for a new application can be done in a fraction of the time, and with much greater confidence, than with traditional APM systems. And applications can be instrumented with multiple sensor configurations, easily swappable on the fly with no system interruption, for varying levels of detail that may be needed from time to time.

Automated Transaction Analysis Unveiling Database Access Problems Causing Slow Web Request Response Times

Diagram labels: dynaTrace Server; dynaTrace Collector; Intranet; WAN; Web Transaction Monitor; SNMP; Unix System Monitor; SQL Server Monitor; DB2 Monitor; My Custom Monitor; My Remedial Action; Monitored Application Server; Monitored Database

In addition, this smart agent technology supports advanced application architectures like globally distributed applications and virtualized data center environments. Like local applications, globally distributed applications are managed centrally, with collectors placed throughout the distributed application to support agent deployment, management and communication flow. With this intelligent server-collector approach, agent distribution, configuration, management and sensor placement are as easy as if all were local. And in heterogeneous, dynamic environments, dynaTrace senses configuration changes, such as when a new VM instance is added, automatically configures the new instance, and begins gathering data on the fly, integrating new traces seamlessly. The more complex the environment, the more valuable these smart agent capabilities become for continuous APM operation.
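The two behaviors described in this section, provisioning newly discovered instances automatically and swapping sensor configurations on the fly, can be sketched as simple bookkeeping on the collector side. This is an illustration of the concept only, not dynaTrace's implementation; the `Collector` class, its methods, and the configuration names `standard` and `detailed` are hypothetical.

```python
from typing import Dict, Iterable, List


class Collector:
    """Hypothetical collector that tracks which instances carry an agent."""

    def __init__(self, default_config: str):
        self.default_config = default_config
        self.agents: Dict[str, str] = {}  # instance id -> active sensor configuration

    def reconcile(self, discovered: Iterable[str]) -> List[str]:
        """Provision every instance not seen before; return the new instance ids."""
        added = sorted(set(discovered) - set(self.agents))
        for instance in added:
            self.agents[instance] = self.default_config
        return added

    def swap_config(self, instance: str, config: str) -> None:
        """Switch one instance to another sensor configuration in place."""
        if instance not in self.agents:
            raise KeyError(f"no agent on {instance}")
        self.agents[instance] = config


collector = Collector(default_config="standard")
first = collector.reconcile(["vm-1", "vm-2"])
second = collector.reconcile(["vm-1", "vm-2", "vm-3"])  # a new VM appeared
collector.swap_config("vm-3", "detailed")               # raise detail on one instance
```

Calling `reconcile` repeatedly with the current inventory is idempotent for known instances, so a new VM is picked up on the next pass without any manual step and without touching the instances already being monitored.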

5.5 OSGI PLUG-IN MODEL FOR OPEN INTEGRATION AND EXTENSIBILITY

In recognition that every customer’s application environment and pre-existing management tool-set will be different, and that these same companies are striving to streamline operations and reduce professional services costs, dynaTrace has opened its system with an open source, OSGi-based plug-in model for integrations, extensions and 3rd party services. This is an integration breakthrough: partners and customers can extend dynaTrace on their own with open source models and the dynaTrace plug-in development kit. Customers can thus get tailored solutions specific to their needs at no or very low additional cost. And perhaps most importantly, customers can leverage plug-ins built by others to further accelerate time to market and increase the value of their investment over time.

dynaTrace Plug-Ins today come in 3 flavors: monitors, extensions and integrations. Examples of available monitoring plug-ins include passive monitoring plug-ins (e.g. for SQL Server, VMware, database content query, and a command line tool adapter) as well as active monitoring plug-ins like the synthetic Web Transaction monitor and URL pinger. Examples of available extensions include Knowledge Sensor Packs, such as for Spring, SharePoint, SAP JCO, and many more, as well as feature additions such as annotation-based sensors or an advanced automatic session analysis library. Examples of available integrations include plug-ins to send alerts into HP OpenView, SCOM or Tivoli, and plug-ins for load test or build system integrations. More are being added all the time by dynaTrace, its partners and its customers.
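To show the general shape of a monitoring plug-in such as a URL pinger, here is a minimal sketch. It does not reproduce the dynaTrace plug-in API, which is OSGi-based and not described in this paper; `MonitorPlugin`, `UrlPinger`, `run_monitors` and the example URLs are hypothetical, and the HTTP call is replaced by an injected stand-in so the sketch runs without network access.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class MonitorPlugin(ABC):
    """Hypothetical plug-in contract: each monitor returns named measurements."""

    name: str

    @abstractmethod
    def measure(self) -> Dict[str, float]:
        ...


class UrlPinger(MonitorPlugin):
    """Active monitor: requests a URL and reports availability and latency."""

    def __init__(self, url: str, fetch: Callable[[str], int]):
        self.name = f"url-pinger:{url}"
        self.url = url
        self.fetch = fetch  # returns an HTTP status code; injected so it can be faked

    def measure(self) -> Dict[str, float]:
        start = time.perf_counter()
        status = self.fetch(self.url)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        available = 1.0 if 200 <= status < 400 else 0.0
        return {"available": available, "response_ms": elapsed_ms}


def run_monitors(plugins: List[MonitorPlugin]) -> Dict[str, Dict[str, float]]:
    """Execute every registered monitor once and collect results by plug-in name."""
    return {plugin.name: plugin.measure() for plugin in plugins}


# Stand-in for a real HTTP client so the sketch runs without network access.
def fake_fetch(url: str) -> int:
    return 200 if url.endswith("/health") else 503


results = run_monitors([
    UrlPinger("http://shop.example/health", fake_fetch),
    UrlPinger("http://shop.example/cart", fake_fetch),
])
```

Because every monitor exposes the same small contract, the host only needs to know how to call `measure`; new monitor types can be added without changing the code that schedules and collects them, which is the property a plug-in model is meant to provide.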

dynaTrace offers plug-ins as downloadable modules from its Community Portal, open to all trained customers and technology partners. Plug-ins are described, documented, and rated on the Portal. Customers can easily download and try any plug-in, and can use the built-in dynaTrace Rapid Plug-In Development Environment to create new plug-ins, adapt existing ones, and share their own by uploading them to the community. The Rapid Plug-In Development Environment allows for easy customization of pre-existing plug-ins and for development of new plug-ins to fit current or future needs.

Sensor placement wizard accelerating application instrumentation

Plug-in management infrastructure executing monitoring plug-ins

6 CONCLUSION

Over the past 5 years, application complexity has accelerated dramatically, not just in terms of scale, sophistication and architecture, but also in development techniques, production environments, and performance expectations. The cumulative impact of these changes has created a perfect storm that requires a re-evaluation of traditional approaches towards application performance management. The great opportunity presented by new development and production technologies that promise to accelerate time-to-market, cut development costs and reduce production footprint and power consumption may become a nightmare for guaranteeing application performance, scalability and stability without a new approach.

dynaTrace’s system for Continuous Application Performance Management overcomes the limitations of traditional APM tools and is the first solution to meet the new APM requirements to solve today’s increasingly complex application performance challenges. The architectural and technical breakthroughs inherent in the dynaTrace system are enabling the Global 2000, innovative governmental organizations, and leading technology companies to transform the way they manage performance throughout the application lifecycle. No longer simply ‘another tool’, the dynaTrace APM system is re-defining how applications are built, tested and managed in production. It is a single system supporting the entire lifecycle, providing application professionals unprecedented insight into even their most complex applications.

With targeted editions of the dynaTrace APM system for each stage of the lifecycle, dynaTrace enables you to:

• Improve time to market for new or enhanced applications by 30-50%. Eliminate and accelerate test-fix cycles and avoid end-of-project architecture overhauls that cause significant delays.

• Reduce mean-time-to-repair by 90%. Accelerate resolution time to minutes from days or weeks.

• Ensure application performance supports business goals. Manage SLAs and increase test coverage to ensure customer satisfaction and to drive more revenue.

• Integrate with your current systems and workflows. Leverage your existing investments and streamline your current processes to be more efficient.

• Automate to do more with less. Eliminate redundant, manual tasks and free teams to focus on more value-added initiatives.

Visit us at www.dynatrace.com to see how dynaTrace customers are proactively dealing with accelerating application complexity and shifting what had historically been a reactive, painful process of managing application performance, scalability and stability under load into a proactive, successful, and efficient one.

dynaTrace software Inc.

95 Hayden Avenue, Waltham, MA 02451, USA, T +1 781.674.4000 F +1 (781) 2075365

Headquarters EMEA: dynaTrace software GmbH

Freistädter Str. 313, 4040 Linz, Austria/Europe, T +43 (732) 908208, F +43 (732) 210100.008 E: [email protected]

All rights reserved

dynaTrace software is a registered trademark of dynaTrace software GmbH. All other marks and names mentioned herein may be trademarks of other respective companies. (090928)