Can you get away with that answer after crashing your production website with a change you just deployed? Usually you can’t, and instead you’re tasked with figuring out and fixing the problem. In this session, we will talk about typical architectural, coding and deployment problems you might recognize, show what data you need to quickly identify them, and how to catch them before impacting the business.
Works on my machine – your problem now?
Wolfgang Gottesheim
Compuware APM
Business comes up with new features
Testing?
And this is what you end up with…
System Unresponsive?
What Operations tells Developers…
…and what Devs would like to know
Page Rendering is the main component
Top Contributor is related to String handling
99% of that time comes from RegEx Pattern Matching
Attitudes like this don’t help either
Image taken from https://www.scriptrock.com/blog/devops-whats-hype-about/
Very “expensive” to work on these issues
~80% of problems are caused by ~20% of patterns
YES we know this
80% Dev Time in Bug Fixing
$60B Defect Costs
BUT
#1: Exhausted Resource Pools
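The slide names this pattern without showing code. As a rough illustration only, the sketch below models a fixed-size pool that fails fast when every resource is checked out and records its peak usage, which is the kind of data that makes exhaustion visible. `BoundedPool`, `PoolExhausted`, the connection names, and the timeout value are hypothetical and not taken from the talk.

```python
import queue
import threading
from contextlib import contextmanager


class PoolExhausted(Exception):
    """Raised when no resource becomes free within the timeout."""


class BoundedPool:
    """Hypothetical fixed-size pool that fails fast and tracks peak usage."""

    def __init__(self, resources, timeout_s=0.1):
        self._free = queue.Queue()
        for resource in resources:
            self._free.put(resource)
        self._timeout_s = timeout_s
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0

    @contextmanager
    def acquire(self):
        try:
            # Wait a bounded time instead of blocking the caller forever.
            resource = self._free.get(timeout=self._timeout_s)
        except queue.Empty:
            raise PoolExhausted("no free resource") from None
        with self._lock:
            self.in_use += 1
            self.peak_in_use = max(self.peak_in_use, self.in_use)
        try:
            yield resource
        finally:
            # Always hand the resource back, even if the caller raised.
            with self._lock:
                self.in_use -= 1
            self._free.put(resource)


pool = BoundedPool(["conn-1", "conn-2"], timeout_s=0.05)
with pool.acquire() as first, pool.acquire() as second:
    try:
        with pool.acquire():
            exhausted = False
    except PoolExhausted:
        # Both resources are checked out, so the third request fails fast.
        exhausted = True
```

Releasing in a `finally` block matters here: a code path that forgets to return a resource shrinks the pool permanently, and a `peak_in_use` value that sits at the pool size is an early hint that this is happening.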
#2: Maxing out Worker Threads
The timeline shows how the active worker threads are distributed across all JVMs
At ~10:10 AM almost all JVMs max out their available worker threads
Detailed information for every single JVM
Root Cause: Class Loading as Performance Hotspot
Most of the time is spent in CLASSLOADING during Peak Load
But the same is true for “normal” load. Class loading seems to be a general problem that is not load related
Root Cause: Trying to Load a Missing Class
Class Loading impacts ALL transactions (fast or slow)
Class Loader tries to load a class ending in TransferValidatorBPBeanInfo
It’s a class that doesn’t exist
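The slides do not say how this was fixed. One common mitigation for repeated lookups of something that does not exist is to remember the miss so the expensive search runs only once. The following is a language-neutral sketch in Python, not the JVM's actual class-loading mechanism; `NegativeCachingLoader` and `fake_class_path_scan` are hypothetical stand-ins.

```python
class NegativeCachingLoader:
    """Hypothetical loader that caches both hits and misses."""

    def __init__(self, lookup):
        self._lookup = lookup      # the expensive search, e.g. a class path scan
        self._found = {}
        self._missing = set()
        self.lookups = 0           # how often the expensive search actually ran

    def load(self, name):
        if name in self._found:
            return self._found[name]
        if name in self._missing:
            return None            # known miss: skip the expensive search
        self.lookups += 1
        try:
            value = self._lookup(name)
        except LookupError:
            self._missing.add(name)
            return None
        self._found[name] = value
        return value


def fake_class_path_scan(name):
    # Stand-in for a slow search; only one class exists in this toy setup.
    known = {"TransferValidatorBP": object}
    return known[name]             # raises KeyError (a LookupError) on a miss


loader = NegativeCachingLoader(fake_class_path_scan)
for _ in range(1000):
    loader.load("TransferValidatorBPBeanInfo")
```

In this toy run the expensive search executes once for 1000 requests. Without the miss cache it would execute on every request, which matches the slide's observation that the cost affects all transactions regardless of load.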
#3: Deployment Mistakes
Root Cause: Missing File
#4: Different settings in Test & Prod
#5: Real-world Data != Test Data
#6: N+1 Query Problem
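For readers who have not met the N+1 query problem by name, here is a minimal sketch using an in-memory SQLite database. The schema, the sample rows, and the helper names (`run`, `load_n_plus_one`, `load_joined`) are made up for illustration; the slide itself shows no code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO items VALUES (1, 1, 'book'), (2, 1, 'pen'),
                             (3, 2, 'lamp'), (4, 3, 'desk');
""")

stats = {"queries": 0}


def run(sql, params=()):
    """Execute one statement and count it as one database roundtrip."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def load_n_plus_one():
    # 1 query for the orders, then 1 more query per order: N+1 in total.
    orders = run("SELECT id, customer FROM orders ORDER BY id")
    result = {}
    for order_id, customer in orders:
        rows = run("SELECT name FROM items WHERE order_id = ? ORDER BY id",
                   (order_id,))
        result[customer] = [name for (name,) in rows]
    return result


def load_joined():
    # The same data fetched with a single JOIN.
    rows = run("SELECT o.customer, i.name FROM orders o "
               "JOIN items i ON i.order_id = o.id ORDER BY o.id, i.id")
    result = {}
    for customer, name in rows:
        result.setdefault(customer, []).append(name)
    return result


stats["queries"] = 0
slow = load_n_plus_one()
n_plus_one_queries = stats["queries"]   # 4 with three orders

stats["queries"] = 0
fast = load_joined()
joined_queries = stats["queries"]       # 1 regardless of order count
```

Both functions return the same data, so functional tests pass either way. Only the statement count reveals the difference, and that count grows with the number of orders in the first variant.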
#7: Misconfigured Caching Framework
798772 DB Calls in 30 minutes
With NO TRAFFIC
#8: Memory Leaks
Still crashes
Problem fixed!
Fixed Version Deployed
#9: Bloated Web Sites
17! JS Files – 1.7MB in Size
Useless information! Might even be a security risk!
Recent example: Healthcare.gov
55 JS Files, 16 jQuery related!
Merging files can reduce roundtrips by 95%
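The slide does not state how many files remain after merging. As a worked example under that caveat, the arithmetic below assumes the 55 JS files are merged into 3 bundles, a number chosen here only because it roughly reproduces the quoted 95%.

```python
def roundtrip_reduction(files_before, files_after):
    """Fraction of requests saved when files_before become files_after."""
    return 1 - files_after / files_before


# Assumption: 55 separate files merged into 3 bundles (not from the slides).
reduction = roundtrip_reduction(55, 3)
print(f"{reduction:.1%}")  # prints 94.5%
```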
#10: Browser caches
62! Resources not cached
49! Resources with short expiration
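Counts like these can be derived from response headers. The sketch below sorts responses into the two buckets named on the slide plus a third for long-lived entries. It is a simplification: it only inspects `Cache-Control`, ignores `Expires` and heuristic caching, and the one-day threshold for "short" is an assumption, as are the function name and the sample URLs.

```python
import re

ONE_DAY_S = 24 * 60 * 60  # assumed threshold for "short expiration"


def classify_caching(headers, short_threshold_s=ONE_DAY_S):
    """Classify one response by its Cache-Control header (simplified)."""
    cache_control = headers.get("Cache-Control", "").lower()
    # no-cache responses may be stored but must be revalidated, so each
    # use still costs a roundtrip; this sketch counts them as not cached.
    if "no-store" in cache_control or "no-cache" in cache_control:
        return "not cached"
    match = re.search(r"\bmax-age\s*=\s*(\d+)", cache_control)
    if match is None or int(match.group(1)) == 0:
        return "not cached"
    if int(match.group(1)) < short_threshold_s:
        return "short expiration"
    return "long-lived"


# Hypothetical sample responses, keyed by URL path.
responses = {
    "/js/app.js": {"Cache-Control": "public, max-age=31536000"},
    "/css/site.css": {"Cache-Control": "max-age=300"},
    "/img/logo.png": {},
    "/api/user": {"Cache-Control": "no-store"},
}
report = {url: classify_caching(h) for url, h in responses.items()}
```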
Problems that could have been avoided
BUT WHY are they still making it to Production?
HOW can we catch them earlier?
Root Cause: Disconnected Teams
Solution: DevOps + Performance Focus
Culture: Become ONE Team
Culture: Testability
Automate & Measure … Performance
Automate & Measure … Scalability
Automate Deployment
Share Tools
How? Performance Focus in Test Automation
Test Framework Results              | Architectural Data
Build #   Test Case     Status      | # SQL   # Excep   CPU
Build 17  testPurchase  OK          | 12      0         120ms
          testSearch    OK          | 3       1         68ms
Build 18  testPurchase  FAILED      | 12      5         60ms
          testSearch    OK          | 3       1         68ms
Build 19  testPurchase  OK          | 75      0         230ms
          testSearch    OK          | 3       1         68ms
Build 20  testPurchase  OK          | 12      0         120ms
          testSearch    OK          | 3       1         68ms
We identified a regression
Problem solved
Let's look behind the scenes
Exceptions are probably the reason for the failed tests
Problem fixed, but now we have an architectural regression
Now we have the functional and architectural confidence
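The idea of looking beyond pass/fail can be sketched as a small check over the figures on this slide. The pairing of metric rows to builds follows the order in which they appear, and the 20% tolerance, the class name `BuildResult`, and the function `find_regressions` are assumptions for illustration, not the tooling shown in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    build: int
    test: str
    status: str
    sql_count: int
    exceptions: int
    cpu_ms: int


RESULTS = [
    BuildResult(17, "testPurchase", "OK", 12, 0, 120),
    BuildResult(17, "testSearch", "OK", 3, 1, 68),
    BuildResult(18, "testPurchase", "FAILED", 12, 5, 60),
    BuildResult(18, "testSearch", "OK", 3, 1, 68),
    BuildResult(19, "testPurchase", "OK", 75, 0, 230),
    BuildResult(19, "testSearch", "OK", 3, 1, 68),
    BuildResult(20, "testPurchase", "OK", 12, 0, 120),
    BuildResult(20, "testSearch", "OK", 3, 1, 68),
]


def find_regressions(results, tolerance=0.2):
    """Flag failed tests and passing tests whose metrics got clearly worse."""
    baseline = {}   # test name -> last healthy result
    findings = []
    for result in sorted(results, key=lambda r: r.build):
        if result.status != "OK":
            findings.append((result.build, result.test, "functional"))
            continue
        base = baseline.get(result.test)
        if base is not None and (
            result.sql_count > base.sql_count * (1 + tolerance)
            or result.exceptions > base.exceptions
            or result.cpu_ms > base.cpu_ms * (1 + tolerance)
        ):
            # The test passed, but it now does noticeably more work.
            findings.append((result.build, result.test, "architectural"))
            continue
        baseline[result.test] = result
    return findings


findings = find_regressions(RESULTS)
```

With this data the check reports a functional regression for `testPurchase` in build 18 and an architectural one in build 19, where the test passes but the SQL count jumps from 12 to 75. Build 20 is clean on both counts.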
How? Performance Focus in Test Automation
Embed your Architectural Results in Jenkins
Version Control System
dynaTrace Server
Developer
CI Server
Commit
Trigger build
Build and run tests
Publish performance metrics
Drill down for further analysis
Inform about build status
Look beyond test pass/fail!
How? Performance Focus in Test Automation
Analyzing All Unit / Performance Tests
Analyze Perf Metrics
Identify Regressions
How? Performance Focus in Test Automation
Cross Impact of KPIs
Share Results
© 2011 Compuware Corporation. All Rights Reserved
Simply Smarter