Performance Optimization of Rails Applications

Advanced Performance Optimization of Rails Applications

Serge Smetana, RuPy 2009

www.acunote.com

What Am I Optimizing?

Acunote (www.acunote.com)
Online project management and scrum software
Ruby on Rails application since inception in 2006

~5300 companies

~13000 users

Hosted on Engine Yard

Hosted on Customer's Servers

nginx + mongrel

PostgreSQL

Performance Degradation Over Time

[Chart: request time (on development box), %, by month: April 2008, May 2008, June 2008, July 2008]

Actually happens: O(n^c)

Best case: O(log n)

Solutions?

Throw Some Hardware at it!

Solutions?

Performance Optimization!

What To Optimize?

Development?

Development AND Production

How To Optimize?

Three Rules Of Performance Optimization

1. Measure!
2. Optimize only what's slow!
3. Optimize for the user!

Things To Optimize

Development
- Ruby code
- Rails code
- Database queries
- Alternative Ruby

Production
- Shared filesystems and databases
- Live debugging
- Load balancing

Frontend
- HTTP
- Javascript
- Internet Explorer

Optimizing Ruby: Date Class

What's wrong with Date?

> puts Benchmark.realtime { 1000.times { Time.mktime(2009, 5, 6, 0, 0, 0) } }
0.005
> puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
0.080

16x slower than Time! Why?

%self  total  self  wait  child  calls  name
 7.23   0.66  0.18  0.00   0.48  18601  #reduce
 6.83   0.27  0.17  0.00   0.10   5782  #jd_to_civil
 6.43   0.21  0.16  0.00   0.05  31528  Rational#initialize
 5.62   0.23  0.14  0.00   0.09  18601  Integer#gcd

Optimizing Ruby: Date Class

Fixing Date: Use C, Luke!

Date::Performance gem with Date partially rewritten in C
by Ryan Tomayko (with patches by Alex Dymo in 0.4.7)

> puts Benchmark.realtime { 1000.times { Time.mktime(2009, 5, 6, 0, 0, 0) } }
0.005
> puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
0.080

> require 'date/performance'
> puts Benchmark.realtime { 1000.times { Date.civil(2009, 5, 6) } }
0.006

git clone git://github.com/rtomayko/date-performance.git
cd date-performance
rake package:build
cd dist && gem install date-performance-0.4.8.gem

Optimizing Ruby: Date Class

Real-world impact of Date::Performance:

Before: 0.95 sec
After: 0.65 sec
1.5x!

Optimizing Ruby: Misc

Use String::<< instead of String::+=

> long_string = "foo" * 100000
> Benchmark.realtime { long_string += "foo" }
0.0003
> Benchmark.realtime { long_string << "foo" }
...

in theory: 75x
in practice: up to 70x

Compare BigDecimal with BigDecimal

> n = BigDecimal("4.5")
> Benchmark.realtime { 10000.times { n <=> 4.5 } }
0.063
> Benchmark.realtime { 10000.times { n <=> BigDecimal("4.5") } }
0.014

in theory: 4.5x
in practice: 1.15x
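One way to see where the String speedup comes from: += builds a new String object on every call, while << appends to the existing one. The sketch below is only an illustration; the variable names and the loop count of 1000 are arbitrary choices, and the timings it prints will differ from the numbers on the slide.

```ruby
require 'benchmark'

# += replaces the variable with a newly allocated String.
base = "foo" * 100_000
copy = base
copy += "foo"
replaced = !copy.equal?(base)   # copy is now a different object than base

# << appends to the receiver in place.
buffer = "foo" * 100_000
same = buffer
same << "foo"
mutated = same.equal?(buffer)   # still the very same object

plus_time   = Benchmark.realtime { s = "foo" * 100_000; 1000.times { s += "foo" } }
append_time = Benchmark.realtime { s = "foo" * 100_000; 1000.times { s << "foo" } }
puts "+= : #{plus_time}"
puts "<< : #{append_time}"
```

Every += copies the whole 300 KB string before adding three bytes, which is also more garbage for the GC to collect.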

Optimizing Rails: String Callbacks

What can be wrong with this code?

class Task < ActiveRecord::Base
  before_save "some_check()"
end
...
100.times { Task.create attributes }

Kernel#binding is called to eval() the string callback.
That will duplicate your execution context in memory!
More memory taken => more time for GC

Optimizing Rails: String Callbacks

What to do

class Task < ActiveRecord::Base
  before_save :some_check
end
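The difference can be approximated outside Rails. The sketch below is a plain-Ruby stand-in, not ActiveRecord's actual callback code: run_string_callback and run_symbol_callback are hypothetical names, and the class only mimics the two dispatch styles (eval with a captured binding versus a method call by name).

```ruby
require 'benchmark'

# Plain-Ruby stand-in for a model; this is not how ActiveRecord is implemented.
class Task
  def some_check
    true
  end

  # String style: capture a binding and eval the source text on every call.
  def run_string_callback
    eval("some_check()", binding)
  end

  # Symbol style: an ordinary method dispatch, no binding and no parsing.
  def run_symbol_callback
    send(:some_check)
  end
end

task = Task.new
string_time = Benchmark.realtime { 10_000.times { task.run_string_callback } }
symbol_time = Benchmark.realtime { 10_000.times { task.run_symbol_callback } }
puts "string callback: #{string_time}"
puts "symbol callback: #{symbol_time}"
```

Both styles return the same result; the string version just pays for the binding and the parse each time.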

Optimizing Rails: Partial Rendering

Not too uncommon, right?

list.rhtml:

# rendered 1000 times
<%= render :partial => 'object', :locals => { :object => object } %>

We create a View instance for each of the 1000 objects here!
Why?

Optimizing Rails: Partial Rendering

Template inlining to the rescue:

# rendered 1000 times
<%= render :partial => 'object', :locals => { :object => object }, :inline => true %>

[Diagram: list.rhtml with the contents of _object.rhtml inlined once per object]

Optimizing Rails: Partial Rendering

Template Inliner plugin: http://github.com/acunote/template_inliner/

Real world effect from template inlining:

Rendering of 300 objects, 5 partials for each object
without inlining: 0.89 sec
with inlining: 0.75 sec

1.2x

Optimizing Database

How to optimize PostgreSQL:
explain analyze
explain analyze
explain analyze
...

Optimizing Database: PostgreSQL Tips

EXPLAIN ANALYZE explains everything, but...
... run it also for the "cold" database state!

Example: complex query which works on 230 000 rows and does 9 subselects / joins:
cold state: 28 sec, hot state: 2.42 sec

Database server restart doesn't help.
Need to clear the disk cache (Linux): echo 3 | sudo tee /proc/sys/vm/drop_caches

Optimizing Database: PostgreSQL Tips

Use any(array()) instead of in() to force subselect and avoid join

explain analyze select * from issues where id in (select issue_id from tags_issues);

QUERY PLAN
------------------------------------------------------------------------------
 Merge IN Join (actual time=0.096..576.704 rows=55363 loops=1)
   Merge Cond: (issues.id = tags_issues.issue_id)
   -> Index Scan using issues_pkey on issues (actual time=0.027..270.557 rows=229991 loops=1)
   -> Index Scan using tags_issues_issue_id_key on tags_issues (actual time=0.051..73.903 rows=70052 loops=1)
 Total runtime: 605.274 ms

explain analyze select * from issues where id = any( array( (select issue_id from tags_issues) ) );

QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on issues (actual time=247.358..297.932 rows=55363 loops=1)
   Recheck Cond: (id = ANY ($0))
   InitPlan
     -> Seq Scan on tags_issues (actual time=0.017..51.291 rows=70052 loops=1)
   -> Bitmap Index Scan on issues_pkey (actual time=246.589..246.589 rows=70052 loops=1)
        Index Cond: (id = ANY ($0))
 Total runtime: 325.205 ms

Almost 2x faster!

Database Optimization: PostgreSQL Tips

Push down conditions into subselects and joins.
PostgreSQL often won't do that for you.

select *,
  (select notes.author from notes
   where notes.issue_id = issues.id) as note_authors
from issues
where org_id = 1

select *,
  (select notes.author from notes
   where notes.issue_id = issues.id and org_id = 1) as note_authors
from issues
where org_id = 1

Issues: id serial, name varchar, org_id integer

Notes: id serial, name varchar, issue_id integer, org_id integer

What To Do?

Optimize For Development Box
- Ruby code
- Rails code
- Database queries
- Alternative Ruby

Optimize For Production
- Shared filesystems and databases
- Live debugging
- Load balancing

Optimize For The User
- HTTP
- Javascript
- Internet Explorer

Alternative Ruby

Everybody says "JRuby and Ruby 1.9 are faster"

Is that true in production?

Alternative Ruby

In short, YES!

= Acunote Benchmarks =
                             MRI    JRuby  1.9.1
Date/Time Intensive Ops      1.79   0.67   0.62
Rendering Intensive Ops      0.59   0.44   0.40
Calculations Intensive Ops   2.36   1.79   1.79
Database Intensive Ops       4.87   4.63   3.66

Alternative Ruby

In short, YES!

= Acunote Benchmarks =
                             MRI    JRuby  1.9.1
Date/Time Intensive Ops      1x     2.6x   2.9x
Rendering Intensive Ops      1x     1.3x   1.5x
Calculations Intensive Ops   1x     1.3x   1.3x
Database Intensive Ops       1x     1x     1.3x

JRuby: 1.55x faster
Ruby 1.9: 1.75x faster
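The two summary figures are consistent with a plain arithmetic mean of the four per-category speedups. That reading is an assumption (the slides do not say how the summary was computed), but it is easy to check:

```ruby
# Per-category speedups relative to MRI, copied from the benchmark table.
SPEEDUPS = {
  "JRuby"    => [2.6, 1.3, 1.3, 1.0],
  "Ruby 1.9" => [2.9, 1.5, 1.3, 1.3]
}

# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum / values.size
end

SPEEDUPS.each do |ruby, ratios|
  puts "#{ruby}: #{mean(ratios).round(2)}x faster"
end
```

Note that the average is dominated by the Date/Time category; for database-heavy requests the gain is much smaller.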

Alternative Ruby

What is faster?

Acunote Copy Tasks Benchmark
                          MRI    JRuby  1.9.1
Request Time              5.52   4.45   3.24
Template Rendering Time   0.35   0.21   0.21
Database Time             0.70   1.32   0.69
GC Time                   1.07   N/A    0.62

Faster template rendering!
Less GC!
JDBC database driver performance issue with JRuby?

Alternative Ruby

Why faster?

Alternative Ruby

Things I usually see in the profiler after optimizing:

%self  self  calls  name
 2.73  0.05    351  Range#each-1
 2.73  0.05  33822  Hash#[]=
 2.19  0.04      4  Acts::AdvancedTree::Tree#walk_tree
 2.19  0.04  44076  Hash#[]
 1.64  0.03   1966  Array#each-1
 1.64  0.03    378  Org#pricing_plan
 1.64  0.03   1743  Array#each
 1.09  0.02   1688  ActiveRecord::AttributeMethods#respond_to?
 1.09  0.02   1311  Hash#each
 1.09  0.02   6180  ActiveRecord::AttributeMethods#read_attribute_before_typecast
 1.09  0.02  13725  Fixnum#==
 1.09  0.02  46736  Array#[]
 1.09  0.02  15631  String#to_s
 1.09  0.02  24330  String#concat
 1.09  0.02    916  ActiveRecord::Associations#association_instance_get
 1.09  0.02    242  ActionView::Helpers::NumberHelper#number_with_precision
 1.09  0.02   7417  Fixnum#to_s

Alternative Ruby

# of method calls during one request:
50 000 - Array
35 000 - Hash
25 000 - String

Slow classes written in Ruby:
Date
Rational

Alternative Ruby

Alternative Rubies optimize mostly:
- the cost of function call
- complex computations in pure Ruby
- memory by not keeping source code AST

Alternative Ruby

So, shall I use alternative Ruby?
Definitely yes!... but

JRuby: if your application works with it (run requests hundreds of times to check)
Ruby 1.9: if all gems you need are ported

Optimizing For Shared Environment

Issues we experienced deploying on Engine Yard:

1) VPS is just too damn slow
2) VPS may have too little memory to run the request!
3) shared database server is a problem
4) network filesystem may cause harm as well

Optimizing For Shared Environment

VPS may have too little memory to run the request

Think 512M should be enough?
Think again.
We saw requests that took 1G of memory!

Solutions:
- buy more memory
- optimize memory
- set memory limits for mongrels (with monit)

Optimizing For Shared Environment

You're competing for memory cache on a shared server:
1. two databases with equal load share the cache
2. one of the databases gets more load and wins the cache

Optimizing For Shared Environment

As a result, your database can always be in a "cold" state
and you read data from disk, not from memory!

Complex query which works on 230 000 rows and does 9 subselects / joins:
from disk: 28 sec, from memory: 2.42 sec

Solutions:
- optimize for the cold state
- push down SQL conditions
- clear the disk cache when testing: echo 3 | sudo tee /proc/sys/vm/drop_caches

Optimizing For Shared Environment

fstat() is slow on network filesystem (GFS)

Request to render list of tasks in Acunote:
on development box: 0.50 sec
on production box: 0.50 - 2.50 sec

Optimizing For Shared Environment

fstat() is slow on network filesystem (GFS)
Couldn't figure out why until we ran strace.

We used:
a) filesystem store for fragment caching
b) expire_fragment(regexp)

The latter looked through all cache directories even though we knew the cache is located in only one specific subdir.

Optimizing For Shared Environment

fstat() is slow on network filesystem (GFS)

Solution:
- memcached instead of filesystem
- if filesystem is ok, here's a trick: http://blog.pluron.com/2008/07/hell-is-paved-w.html
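To make the cost concrete, here is a rough plain-Ruby sketch of the two expiry strategies. It is not Rails' fragment cache code and not necessarily the trick from the linked post: expire_by_regexp, expire_subdir and the project_N directory layout are all hypothetical names used only for illustration.

```ruby
require 'fileutils'
require 'find'
require 'tmpdir'

# Regexp-style expiry: walk every entry under the cache root and test its path.
# Returns the number of filesystem entries visited along the way.
def expire_by_regexp(root, pattern)
  visited = 0
  Find.find(root) do |path|
    visited += 1
    File.delete(path) if File.file?(path) && path =~ pattern
  end
  visited
end

# Scoped expiry: remove only the one subdirectory known to hold the fragments.
def expire_subdir(root, subdir)
  FileUtils.rm_rf(File.join(root, subdir))
end

Dir.mktmpdir do |root|
  # Hypothetical layout: 50 cache subdirectories with 20 fragments each.
  50.times do |d|
    dir = File.join(root, "project_#{d}")
    FileUtils.mkdir_p(dir)
    20.times { |f| File.write(File.join(dir, "fragment_#{f}.cache"), "x") }
  end

  visited = expire_by_regexp(root, %r{/project_7/})
  puts "regexp expiry visited #{visited} entries to delete 20 files"
  expire_subdir(root, "project_8")
end
```

On a local disk the full walk is cheap; on a network filesystem each visited entry can mean a slow stat call, which is why the scoped version matters.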

Live Debugging

To see what's wrong on "live" application:
- For Linux: strace and oprofile
- For Mac and Solaris: dtrace
- For Windows: uhm... about time to switch ;)

To monitor for known problems:
- monit
- nagios
- own scripts to analyze application logs
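As an illustration of the "own scripts" item, a log scanner for slow requests can be very small. The sketch below assumes a "Completed in <seconds> ... [<url>]" line shape; the exact format depends on the Rails version, so the regexp, the slow_requests name and the sample lines are assumptions to adapt, not Acunote's actual script.

```ruby
# Assumed log line shape (adjust the regexp to your Rails version):
#   Completed in 0.45123 (2 reqs/sec) | ... | 200 OK [http://host/path]
COMPLETED = /Completed in (\d+\.\d+) .*\[(\S+)\]/

# Returns [url, seconds] pairs for requests slower than threshold, slowest first.
def slow_requests(lines, threshold)
  lines.map { |line| COMPLETED.match(line) }
       .compact
       .map { |m| [m[2], m[1].to_f] }
       .select { |_url, seconds| seconds > threshold }
       .sort_by { |_url, seconds| -seconds }
end

sample = [
  "Completed in 0.45123 (2 reqs/sec) | DB: 0.10 (22%) | 200 OK [http://example.com/tasks]",
  "Completed in 2.50311 (0 reqs/sec) | DB: 1.90 (75%) | 200 OK [http://example.com/sprints/3]",
  "Processing TasksController#index (for 127.0.0.1) [GET]"
]
slow_requests(sample, 1.0).each { |url, seconds| puts "#{seconds}s #{url}" }
```

Against a real log the same function can be fed File.foreach("log/production.log") instead of the sample array.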

Load Balancing

The problem of round-robin and fair load balancing

[Diagram: numbered requests waiting in per-process queues in front of Rails App 1, Rails App 2 and Rails App 3]

Load Balancing

Solution: the global queue

[Diagram: one global queue of numbered requests feeding Rails App 1, Rails App 2 and Rails App 3]

mod_rails / Passenger
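A toy simulation shows why the global queue helps. It is a simplification under stated assumptions: all requests arrive at time 0, durations are known up front, and nothing here models nginx or Passenger internals. The function names round_robin and global_queue are made up for this sketch.

```ruby
# Finish time of each request when request i is pinned to worker i % workers.
def round_robin(durations, workers)
  free_at = Array.new(workers, 0)
  durations.each_with_index.map do |duration, i|
    w = i % workers
    free_at[w] += duration
  end
end

# Finish time of each request when the next request goes to the first idle worker.
def global_queue(durations, workers)
  free_at = Array.new(workers, 0)
  durations.map do |duration|
    w = free_at.each_index.min_by { |i| free_at[i] }
    free_at[w] += duration
  end
end

# One long request (10 time units) followed by five short ones, three workers.
durations = [10, 1, 1, 1, 1, 1]
puts "round-robin finish times:  #{round_robin(durations, 3).inspect}"
puts "global queue finish times: #{global_queue(durations, 3).inspect}"
```

With round-robin the fourth request is queued behind the long one even though two workers are idle; with a global queue it is picked up by the first worker that becomes free.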

Load Balancing

Dedicated queues for long-running requests

[Diagram: Rails App 1, Rails App 2 and Rails App 3 with regular per-process queues, plus a separate queue for long-running requests]

nginx dedicated queues

Load Balancing

nginx configuration for dedicated queues

upstream mongrel {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}
upstream rss_mongrel {
    server 127.0.0.1:5002;
}
server {
    location / {
        location ~ ^/feeds/(rss|atom) {
            if (!-f $request_filename) {
                proxy_pass http://rss_mongrel;
                break;
            }
        }
        if (!-f $request_filename) {
            proxy_pass http://mongrel;
        }
    }
}

Optimize For The User: HTTP

[Chart: Network and Frontend vs. Backend; labels: 5%, 95%]

Things to consider:
- Gzip HTML, CSS and JS
- Minify JS
- Collect JS and CSS (javascript_include_tag :all, :cache => true)
- Far future expires headers for JS, CSS, images
- Sprites
- Cache-Control: public
- everything else YSlow tells you

Optimize Frontend: Javascript

Things you don't want to hear from your users:

"...Your server is slow..."

said the user after clicking on the link to show a form with plain javascript (no AJAX)

Optimize Frontend: Javascript

Known hotspots in Javascript:
- eval()
- all DOM operations: avoid if possible, for example
  - use element.className instead of element.readAttribute('class')
  - use element.id instead of element.readAttribute('id')
- $$() selectors, especially attribute selectors
  - may be expensive, measure first
  - instead of $$('#some .listing td a.popup[accesslink]'), use getElementsByTagName() and iterate results
- element.style.* changes
  - change class instead
- $() and getElementById on large (~20000 elements) pages

Optimize Frontend: IE

Slow things that are especially slow in IE:
- $() and $$(), even on small pages
- getElementsByName()
- style switching

Optimize Frontend: IE

Good things about IE:

- profiler in IE8
- fast in IE => fast everywhere else!

Keep It Fast!

So, you've optimized your application.
How to keep it fast?

Keep It Fast!

- Measure, measure and measure...
- Use profiler
- Optimize CPU and Memory
- Performance Regression Tests

Keep It Fast: Measure

Keep a set of benchmarks for most frequent user requests.
For example:

Benchmark Burndown 120             0.70 0.00
Benchmark Inc. Burndown 120        0.92 0.01
Benchmark Sprint 20 x (1+5) (C)    0.45 0.00
Benchmark Issues 100 (C)           0.34 0.00
Benchmark Prediction 120           0.56 0.00
Benchmark Progress 120             0.23 0.00
Benchmark Sprint 20 x (1+5)        0.93 0.00
Benchmark Timeline 5x100           0.11 0.00
Benchmark Signup                   0.77 0.00
Benchmark Export                   0.20 0.00
Benchmark Move Here 20/120         0.89 0.00
Benchmark Order By User            0.98 0.00
Benchmark Set Field (EP)           0.21 0.00
Benchmark Task Create + Tag        0.23 0.00
... 30 more ...

Keep It Fast: Measure

Benchmarks as a special kind of tests:

class RenderingTest < ActionController::IntegrationTest
  def test_sprint_rendering
    login_with users(:user), "user"

    benchmark :title => "Sprint 20 x (1+5) (C)",
              :route => "projects/1/sprints/3/show",
              :assert_template => "tasks/index"
  end
end

Benchmark Sprint 20 x (1+5) (C)    0.45 0.00

Keep It Fast: Measure

Benchmarks as a special kind of tests:

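def benchmark(options = {})
  # 101 runs; the first one (i == 0) is a warm-up and is not recorded
  (0..100).each do |i|
    GC.start
    # each run happens in a forked child so runs don't share memory state
    pid = fork do
      begin
        out = File.open("values", "a")
        ActiveRecord::Base.transaction do
          elapsed_time = Benchmark::realtime do
            request_method = options[:post] ? :post : :get
            send(request_method, options[:route])
          end
          out.puts elapsed_time if i > 0
          out.close
          # raising rolls back whatever the request wrote to the database
          raise CustomTransactionError
        end
      rescue CustomTransactionError
        exit
      end
    end
    Process::waitpid pid
    ActiveRecord::Base.connection.reconnect!
  end
  values = File.read("values")
  # mean, sigma and to_02f are helpers not shown on the slide
  print "#{mean(values).to_02f} #{sigma(values).to_02f}\n"
end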
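The benchmark helper relies on mean and sigma, which the slides do not show. A minimal sketch of what they could look like follows. It assumes the "values" file holds one elapsed time per line and that sigma means the population standard deviation; parse_values is a hypothetical extra helper, and unlike the slide's code these versions take an array of floats, not the raw file contents.

```ruby
# Turn the contents of the "values" file (one float per line) into numbers.
def parse_values(text)
  text.split.map(&:to_f)
end

# Arithmetic mean of the samples.
def mean(values)
  values.sum / values.size
end

# Population standard deviation of the samples.
def sigma(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.size)
end

samples = parse_values("0.44\n0.45\n0.46\n")
puts format("%.2f %.2f", mean(samples), sigma(samples))
```

Reporting sigma next to the mean makes it easier to tell a real regression from run-to-run noise.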

Keep It Fast: Query Testing

Losing 10ms in benchmark might seem OK

Except that it's sometimes not because you're running one more SQL query

Keep It Fast: Query Testing

def test_queries
  queries = track_queries do
    get :index
  end
  # expected value first, actual value second
  assert_equal ["Foo Load", "Bar Load", "Event Create"], queries
end

Keep It Fast: Query Testing

module ActiveSupport
  class BufferedLogger

    attr_reader :tracked_queries

    def tracking=(val)
      @tracked_queries = []
      @tracking = val
    end

def debug_with_tracking(message) @tracked_queries