LET’S LEARN FROM THE TOP PERF MISTAKES @grabnerandi

STP 2014 - Lets Learn from the Top Performance Mistakes in 2013

DESCRIPTION

Presentation given at STPCon 2014. It highlights the top performance problems seen in 2013 and how we can identify these problems in dev & test instead of waiting until the app crashes in production

@grabnerandi
http://apmblog.compuware.com

What to do with the fastest car …

… if it fails to reach the finish line

What to do with millions of $$ for building a web site …

Performance, Scalability & Architecture

#1: Architectural Decisions

#1: “We want more Web 2.0”

#1: Load Test Prior to Change

#1: Load Test After Change

Metrics: # Visitors, # Requests / User

Business: Do we need all these bells and whistles?

#2: Disconnected Teams

#2: “Teamwork” between Dev and Ops

SEV1 Problem in Production

Need access to log files

Where are they? Can’t get them

Need to increase log level

Can’t do! Can’t change config files in prod!

#2: Solution: Implement a Custom “On Demand” Remote Logger

#2: Implementation and Rollout

Implemented Custom Logger

Worked well in Load Testing

#2: What happened?

~1 million lock exceptions in 30 minutes

#2: Root Cause: A special WebSphere Setting!

Log Service provides a synchronized log file across ALL JVMs

Metrics: # Log Messages, # Exceptions

Share: Same Server Settings
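
The two metrics on this slide can be collected during any automated test run. The following is a minimal sketch, assuming a Python application that logs through the standard `logging` module; the logger name `app` and the `sample_request` step are made up for illustration and are not the custom remote logger from the talk.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts every log record, and separately those carrying an exception."""

    def __init__(self):
        super().__init__()
        self.messages = 0
        self.exceptions = 0

    def emit(self, record):
        self.messages += 1
        if record.exc_info:  # set by logger.exception(...) or exc_info=True
            self.exceptions += 1


def count_logging(work, logger_name="app"):
    """Run work(logger) and return (# log messages, # exceptions logged)."""
    logger = logging.getLogger(logger_name)  # "app" is a made-up logger name
    handler = CountingHandler()
    logger.addHandler(handler)
    previous_level = logger.level
    logger.setLevel(logging.DEBUG)
    try:
        work(logger)
    finally:
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
    return handler.messages, handler.exceptions


def sample_request(logger):
    """Stand-in for one test step; a real test would call the application."""
    logger.info("request started")
    try:
        raise TimeoutError("lock not acquired")
    except TimeoutError:
        logger.exception("could not write log entry")
    logger.info("request finished")
```

Calling `count_logging(sample_request)` returns the pair of counters for that step. Tracking both numbers per test and per build would make a jump in lock exceptions visible before production, provided the test servers use the same settings as production.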

#3: Implementation Flaws

#3: Solution: Cache to the RESCUE!!

#3: Implementation and Rollout

Implemented InMemory Cache

Worked well in Load Testing

#3: Result: Out of Memory Crashes!!

Still crashes

Problem fixed!

Fixed Version Deployed

Metrics: Heap Size, # Objects Allocated, # Objects in Cache, Cache Hit Ratio

Test: With realistic Data
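
The slides do not show how the in-memory cache was built. As a sketch only, assuming the crashes came from a cache that could grow without limit, a size-bounded LRU cache that also exposes the "# Objects in Cache" and "Cache Hit Ratio" metrics might look like this (`BoundedCache` and its `loader` callback are hypothetical names):

```python
from collections import OrderedDict


class BoundedCache:
    """LRU cache with a hard entry limit and hit/miss counters."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached value, loading and storing it on a miss."""
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        self.misses += 1
        value = loader(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def __len__(self):
        """# Objects in Cache."""
        return len(self._data)

    @property
    def hit_ratio(self):
        """Cache Hit Ratio: hits divided by all lookups (0.0 before any lookup)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A load test with only a handful of distinct keys would show a high hit ratio and a tiny cache either way. Only a realistic key distribution shows whether the size limit holds and whether the cache is worth its memory.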

#4: Push without a Plan

#4: Mobile Landing Page of Super Bowl Ad

434 Resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …

Total size of ~ 20MB

#4: m.store.com redirects to www.store.com

ALL CSS and JS files are redirected to the www domain

This is a lot of time “wasted”, especially on high-latency mobile connections

#4: Critical Pages not Optimized!

Browse, Search and Product Info perform well

Critical pages such as Shopping Cart are very slow …

… because they don’t follow best practices: 87 Requests, 28 Redirects, …

Metrics: Load Time, # Resources (Images, …), # HTTP 3xx, 4xx, 5xx

Dev: Build for Mobile

Test: Test on Mobile
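
These page metrics lend themselves to an automated budget check. A minimal sketch, assuming the resources of one page load are already available as `(url, http_status, size_bytes)` records, for example exported from a browser or proxy capture; the sample records and budget numbers below are made up and are not data from the talk:

```python
from collections import Counter


def summarize_page(resources):
    """Summarize (url, http_status, size_bytes) records for one page load."""
    by_class = Counter(f"{status // 100}xx" for _, status, _ in resources)
    return {
        "resources": len(resources),
        "total_bytes": sum(size for _, _, size in resources),
        "3xx": by_class["3xx"],
        "4xx": by_class["4xx"],
        "5xx": by_class["5xx"],
    }


def over_budget(summary, budget):
    """Return the metric names whose value exceeds the agreed budget."""
    return sorted(name for name, limit in budget.items() if summary[name] > limit)


# Made-up sample records, not data from the talk.
SAMPLE = [
    ("http://m.example.com/", 301, 0),
    ("http://www.example.com/", 200, 40_000),
    ("http://www.example.com/hero.jpg", 200, 900_000),
    ("http://www.example.com/old.css", 404, 300),
]
BUDGET = {"resources": 50, "total_bytes": 500_000, "3xx": 0, "4xx": 0, "5xx": 0}
```

`over_budget(summarize_page(SAMPLE), BUDGET)` lists the metrics that break the budget, which a build could treat as a failed test.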

#5: “Blindly” (Re)use Existing Components

#5: Requirement: We need a report

#5: Using Hibernate results in 4k+ SQL Statements to display 3 items!

Hibernate Executes 4k+ Statements

Individual Execution VERY FAST

But Total SUM takes 6s
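
The slides do not show the Hibernate mapping behind the 4k+ statements. One common way to reach such numbers is the "N+1 query" pattern, where each parent row triggers its own child query. The sketch below illustrates that pattern, assuming it was the cause; it uses Python's `sqlite3` module rather than Hibernate, with a hypothetical `orders`/`items` schema and a small counting wrapper.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts every SQL statement sent through query()."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.statements = 0

    def query(self, sql, params=()):
        self.statements += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_orders, items_per_order):
    """Create a hypothetical orders/items schema filled with generated rows."""
    db = CountingDb()
    db.conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    db.conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT)"
    )
    for order_id in range(1, n_orders + 1):
        db.conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        for n in range(items_per_order):
            db.conn.execute(
                "INSERT INTO items (order_id, name) VALUES (?, ?)",
                (order_id, f"item-{order_id}-{n}"),
            )
    db.conn.commit()
    return db


def report_lazy(db):
    """One query for the parents, then one more per parent (the N+1 pattern)."""
    report = {}
    for (order_id,) in db.query("SELECT id FROM orders ORDER BY id"):
        rows = db.query(
            "SELECT name FROM items WHERE order_id = ? ORDER BY id", (order_id,)
        )
        report[order_id] = [name for (name,) in rows]
    return report


def report_joined(db):
    """The same report built from a single joined query."""
    report = {}
    rows = db.query(
        "SELECT o.id, i.name FROM orders o "
        "JOIN items i ON i.order_id = o.id ORDER BY o.id, i.id"
    )
    for order_id, name in rows:
        report.setdefault(order_id, []).append(name)
    return report
```

With 100 orders, `report_lazy` sends 101 statements through `query()` and `report_joined` sends one, while both return the same data. Each lazy statement is fast on its own, which is why the problem only shows up in the "# SQLs / Web Request" metric.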

#5: Requirement: We need a fancy UI

#5: Using Telerik Controls Results in 9s for Data-Binding of UI Controls

#1: Slow Stored Procedure
Depending on the request, the execution time of this SP varies between 1 and 7.5s

#2: 240! Similar SQL Statements
Most of these 240 statements are not prepared and differ only in things like column names

Metrics: # Total SQLs, # SQLs / Web Request, # Same SQLs / Request, Transferred Rows

Test: With realistic Data

Dev: “Learn” Frameworks

12 000 000 $

#6: No “Agile” Deployment

#6: Load Spike resulted in Unavailability

Ad on air

Availability dropped to 0%

#6: Alternative: “GoDaddy goes DevOps”

Response time improved 4x

1h before Super Bowl kickoff

1h after the game ended

#6: Behind the Scenes

Metrics: Availability, Page Size, # Objects, # Hosts, # Connections

DevOps: “Feature” Switches
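
The slides do not describe which switches were used. As a rough sketch of the idea, assuming switches that let Ops turn off heavy page elements at runtime without a redeploy, it could look like the following; `FeatureSwitches`, the switch names and the page parts are all hypothetical:

```python
class FeatureSwitches:
    """In-memory switch registry; a real one would read from shared config."""

    def __init__(self, **defaults):
        self._flags = dict(defaults)

    def is_on(self, name):
        # Unknown switches count as off, so a typo never enables a feature.
        return self._flags.get(name, False)

    def set(self, name, on):
        self._flags[name] = bool(on)


def render_landing_page(switches):
    """Return the page parts to render; optional parts depend on switches."""
    parts = ["headline", "buy-button"]  # always needed to do business
    if switches.is_on("hero_video"):
        parts.append("hero-video")
    if switches.is_on("recommendations"):
        parts.append("recommendations")
    return parts
```

Switching off `hero_video` shortly before an expected load spike shrinks page size and object count while the core page keeps working, and the switch can be flipped back once the spike is over.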

What have we learned today?

UNDERSTAND THE TECHNOLOGY WE ARE WORKING WITH

# of Requests / User

# of Log Messages

# of Exceptions

# Objects Allocated

# Objects In Cache

Cache Hit Ratio

# of Images

# of SQLs

# SQLs per Request

Availability

# HTTP 3xx, 4xx

Page Size

A final thought …

How about this idea?

          Test Framework Results    Architectural Data
Build #   Test Case      Status     # SQL   # Excep   CPU
Build 17  testPurchase   OK         12      0         120ms
          testSearch     OK         3       1         68ms
Build 18  testPurchase   FAILED     12      5         60ms
          testSearch     OK         3       1         68ms
Build 19  testPurchase   OK         75      0         230ms
          testSearch     OK         3       1         68ms
Build 20  testPurchase   OK         12      0         120ms
          testSearch     OK         3       1         68ms

Test framework results only:

Build 18: We identified a regression

Build 19: Problem solved

Let’s look behind the scenes, with architectural data:

Build 18: Exceptions are probably the reason for the failed tests

Build 19: Problem fixed, but now we have an architectural regression

Build 20: Now we have functional and architectural confidence

How? Performance Focus in Test Automation

Cross Impact of KPIs

Analyzing All Unit / Performance Tests

Analyze Perf Metrics

Identify Regressions
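
The "Analyze Perf Metrics" and "Identify Regressions" steps can be sketched with the build data from the "How about this idea?" slide. The comparison rule below (baseline build 17, 20% tolerance on # SQL and CPU, any increase in exceptions) is an assumption made for illustration, not the rule used by the tooling shown in the talk.

```python
# Per build and test: (status, # SQL, # exceptions, CPU in ms), from the slide.
BUILDS = {
    17: {"testPurchase": ("OK", 12, 0, 120), "testSearch": ("OK", 3, 1, 68)},
    18: {"testPurchase": ("FAILED", 12, 5, 60), "testSearch": ("OK", 3, 1, 68)},
    19: {"testPurchase": ("OK", 75, 0, 230), "testSearch": ("OK", 3, 1, 68)},
    20: {"testPurchase": ("OK", 12, 0, 120), "testSearch": ("OK", 3, 1, 68)},
}


def find_regressions(builds, baseline_build, tolerance=0.2):
    """Compare each build to the baseline and list (build, test, metric) hits.

    Assumed rule: a non-OK status, more exceptions than the baseline, or
    # SQL / CPU more than `tolerance` above the baseline counts as a regression.
    """
    baseline = builds[baseline_build]
    findings = []
    for build in sorted(builds):
        if build == baseline_build:
            continue
        for test, (status, sql, exceptions, cpu_ms) in builds[build].items():
            _, base_sql, base_exceptions, base_cpu_ms = baseline[test]
            if status != "OK":
                findings.append((build, test, "status"))
            if sql > base_sql * (1 + tolerance):
                findings.append((build, test, "sql"))
            if exceptions > base_exceptions:
                findings.append((build, test, "exceptions"))
            if cpu_ms > base_cpu_ms * (1 + tolerance):
                findings.append((build, test, "cpu_ms"))
    return findings
```

With build 17 as the baseline, build 18 is flagged for its failed status and extra exceptions, build 19 for # SQL and CPU even though its tests pass, and build 20 is clean. A status-only check would have missed build 19.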

More Info

• My Blog: http://apmblog.compuware.com

• Tweet about it: @grabnerandi

• dynaTrace Enterprise
  – Full End-to-End Visibility in your Java, .NET, PHP Apps
  – Sign up for a 15 Days Free Trial on http://compuwareapm.com

• dynaTrace AJAX Edition
  – Browser Diagnostics for IE + FF
  – Download @ http://ajax.dynatrace.com

THANK YOU
@grabnerandi