A presentation with tips and tools on how to integrate batch and asynchronous operations in a generic Ruby on Rails application. Presented at rubyday.it 2011.
To Batch Or Not To Batch
Luca Mearelli
rubyday.it 2011
@lmea #rubyday
First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is slow, people won’t use it.
Fred Wilson
Not all the interesting features are fast
Interacting with remote API
Sending emails
Media transcoding
Large dataset handling
Anatomy of an asynchronous action
The app decides it needs to do a long operation
The app asks the async system to do the operation and quickly returns the response
The async system executes the operation out-of-band
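The three steps above can be sketched in plain Ruby, with a `Queue` and a worker thread standing in for the real async system (an in-process simplification, not how the libraries below actually run workers):

```ruby
# The "async system": a queue drained by a worker thread
jobs    = Queue.new
results = []

worker = Thread.new do
  while (job = jobs.pop)       # blocks until a job arrives
    break if job == :shutdown
    job.call                   # the operation runs out-of-band
  end
end

# 1. the app decides it needs a long operation
slow_export = -> { sleep 0.1; results << :exported }

# 2. it hands the operation to the async system and returns at once
jobs << slow_export
response = :ok                 # the response goes out immediately

# 3. the worker executes the job out-of-band
jobs << :shutdown
worker.join
```

The real systems differ mainly in where the queue lives (a database table, Redis) and in running the workers as separate processes.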
Batch
Asynchronous jobs
Queues & workers
Batch
Cron
scheduled operations
unrelated to the requests
low frequency
longer run time
Anatomy of a cron batch: the rake task
namespace :export do
  task :items_xml => :environment do
    # read the env variables
    # make the export
  end
end
Anatomy of a cron batch: the shell script
#!/bin/sh
# this goes in script/item_export_full.sh
cd /usr/rails/MyApp/current
export RAILS_ENV=production

echo "Item Export Full started: `date`"
rake export:items_xml XML_FOLDER='data/exports'
echo "Item Export Full completed: `date`"
Anatomy of a cron batch: the crontab entry
0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh >> /usr/rails/MyApp/current/log/dump_item_export.log 2>&1
30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/newsletter_daily.log 2>&1
Cron helpers
Whenever
https://github.com/javan/whenever
Craken
https://github.com/latimes/craken
Whenever: schedule.rb
# adds ">> /path/to/file.log 2>&1" to all commands
set :output, '/path/to/file.log'

every 3.hours do
  rake "my:rake:task"
end

every 1.day, :at => '4:30 am' do
  runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
end

every :hour do
  # Many shortcuts available: :hour, :day, :month, :year, :reboot
  command "/usr/bin/my_great_command",
          :output => {:error => 'error.log', :standard => 'cron.log'}
end
Craken: raketab
59 * * * * thing:to_do > /tmp/thing_to_do.log 2>&1
@daily solr:reindex > /tmp/solr_daily.log 2>&1
# also @yearly, @annually, @monthly, @weekly, @midnight, @hourly
Craken: raketab.rb
Raketab.new do |cron|
  cron.schedule 'thing:to_do > /tmp/thing_to_do.log 2>&1', :every => mon..fri

  cron.schedule 'first:five:days > /tmp/thing_to_do.log 2>&1', :days => [1,2,3,4,5]

  cron.schedule 'first:day:q1 > /tmp/thing_to_do.log 2>&1', :the => '1st', :in => [jan,feb,mar]

  cron.schedule 'first:day:q4 > /tmp/thing_to_do.log 2>&1', :the => '1st', :months => 'October,November,December'
end
Queues & Workers
un-scheduled operations
responding to a request
mid to high frequency
mixed run time
Queues & Workers
Delayed job
https://github.com/collectiveidea/delayed_job
Resque
https://github.com/defunkt/resque
Delayed job
Any object method can be a job
Db backed queue
Integer-based priority
Lifecycle hooks (enqueue, before, after, ... )
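The hooks are plain methods that Delayed Job looks for on the job object. A self-contained sketch of the callback order (the hook names match DJ's, but `run_with_hooks` below is a simplified stand-in for the real worker, not DJ code):

```ruby
class ReportJob
  attr_reader :log

  def initialize
    @log = []
  end

  # Hook methods Delayed Job looks for on the job object
  def enqueue(job);  @log << :enqueue; end   # when the job is pushed
  def before(job);   @log << :before;  end   # just before perform
  def perform;       @log << :perform; end   # the actual work
  def success(job);  @log << :success; end   # perform returned normally
  def error(job, e); @log << :error;   end   # perform raised
  def after(job);    @log << :after;   end   # always runs last
end

# Simplified stand-in for what the worker does around perform:
def run_with_hooks(job)
  job.enqueue(job)   # in real DJ this fires at enqueue time
  job.before(job)
  begin
    job.perform
    job.success(job)
  rescue StandardError => e
    job.error(job, e)
  ensure
    job.after(job)
  end
end

job = ReportJob.new
run_with_hooks(job)
job.log  # => [:enqueue, :before, :perform, :success, :after]
```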
Delayed job: simple jobs
# without delayed_job
@mailer.deliver!(@event)

# with delayed_job
@mailer.delay.deliver!(@event)
# always asynchronous method
class Newsletter
  def deliver
    # long running method
  end
  handle_asynchronously :deliver
end

newsletter = Newsletter.new
newsletter.deliver
Delayed job: handle_asynchronously
handle_asynchronously :sync_method, :priority => 20
handle_asynchronously :in_the_future, :run_at => Proc.new { 5.minutes.from_now }
handle_asynchronously :call_a_class_method, :run_at => Proc.new { when_to_run }
handle_asynchronously :call_an_instance_method, :priority => Proc.new {|i| i.how_important }
Delayed job
class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsMailer.deliver_text_to_email(text, e) }
  end
end
Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', User.find(:all).collect(&:email))
Delayed job
RAILS_ENV=production script/delayed_job -n 2 --min-priority 10 start
RAILS_ENV=production script/delayed_job stop
rake jobs:work
Delayed job: checking the job status
The queue only holds scheduled and running jobs (completed jobs are removed)
Handle the status outside the Delayed::Job object
Delayed job: checking the job status
# Include this in your initializers somewhere
class Queue < Delayed::Job
  def self.status(id)
    job = find_by_id(id)
    job.nil? ? "success" : (job.last_error.nil? ? "queued" : "failure")
  end
end
# Use this method in your poll method like so:
def poll
  status = Queue.status(params[:id])
  if status == "success"
    # Success, notify the user!
  elsif status == "failure"
    # Failure, notify the user!
  end
end
Delayed job: checking the job status
class AJob < Struct.new(:options)
  def perform
    do_something(options)
  end

  def success(job)
    # record success of job.id
    Rails.cache.write("status:#{job.id}", "success")
  end
end

# a helper
def job_completed_with_success(job_id)
  Rails.cache.read("status:#{job_id}") == "success"
end
Resque
Redis-backed queues
Queue/dequeue speed independent of list size
Forking behaviour
Built in front-end
Multiple queues / no priorities
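With no numeric priorities, the queue *order* is the priority: a worker walks its queues in the order they were listed and pops from the first non-empty one. A self-contained sketch of that reservation rule (plain arrays stand in for the Redis lists; `reserve` here is an illustration, not Resque's code):

```ruby
# Queues as FIFO lists, the way Redis would hold them
queues = {
  'critical' => [],
  'high'     => ['resize_image'],
  'low'      => ['send_digest', 'reindex']
}

# A worker started with QUEUES=critical,high,low reserves like this:
# check the queues in the listed order, pop from the first non-empty one.
def reserve(queues, order)
  order.each do |name|
    job = queues[name].shift
    return [name, job] if job
  end
  nil  # nothing anywhere: the worker sleeps and retries
end

order  = %w[critical high low]
first  = reserve(queues, order)  # => ["high", "resize_image"]
second = reserve(queues, order)  # => ["low", "send_digest"]
```

So "critical" jobs always drain first, but only because that queue is named first.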
Resque: the job
class Export
  @queue = :export_jobs

  def self.perform(dataset_id, kind = 'full')
    ds = Dataset.find(dataset_id)
    ds.create_export(kind)
  end
end
Resque: enqueuing the job
class Dataset
  def async_create_export(kind)
    Resque.enqueue(Export, self.id, kind)
  end
end

ds = Dataset.find(100)
ds.async_create_export('full')
Resque: persisting the job
# jobs are persisted as JSON,
# so jobs should only take arguments that can be expressed as JSON
{
  'class': 'Export',
  'args': [ 100, 'full' ]
}

# don't do this:
Resque.enqueue(Export, self, kind)

# do this:
Resque.enqueue(Export, self.id, kind)
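Because the payload is round-tripped through JSON, only JSON-expressible arguments survive the trip. A quick stdlib check, mirroring the payload shape shown above:

```ruby
require 'json'

# What gets stored for Resque.enqueue(Export, 100, 'full')
payload = { 'class' => 'Export', 'args' => [100, 'full'] }

stored   = JSON.generate(payload)  # the string pushed onto the Redis list
restored = JSON.parse(stored)      # what the worker decodes later

restored['args']  # => [100, "full"] (an id and a string survive intact)
```

An ActiveRecord instance would not survive this round trip as a live object, which is why you pass `self.id` and re-find the record inside `perform`.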
Resque: generic async methods
# A simple async helper
class Repository < ActiveRecord::Base
  # This will be called by a worker when a job needs to be processed
  def self.perform(id, method, *args)
    find(id).send(method, *args)
  end

  # We can pass this any Repository instance method that we want to
  # run later.
  def async(method, *args)
    Resque.enqueue(Repository, id, method, *args)
  end
end

# Now we can call any method and have it execute later:
@repo.async(:update_disk_usage)
@repo.async(:update_network_source_id, 34)
Resque: anatomy of a worker
# a worker does this:
start
loop do
  if job = reserve
    job.process
  else
    sleep 5
  end
end
shutdown
Resque: working the queues
$ QUEUES=critical,high,low rake resque:work
$ QUEUES=* rake resque:work
$ PIDFILE=./resque.pid QUEUE=export_jobs rake environment resque:work
task "resque:setup" => :environment do
  AppConfig.a_parameter = ...
end
Resque: monit recipe
# example monit monitoring recipe
check process resque_worker_batch_01
  with pidfile /app/current/tmp/pids/worker_01.pid
  start program = "/bin/bash -c 'cd /app/current && RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work >> log/worker_01.log 2>&1 & echo $! > tmp/pids/worker_01.pid'" as uid deploy and gid deploy
  stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/worker_01.pid` && rm -f tmp/pids/worker_01.pid; exit 0;'"
  if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory?
  group resque_workers
Resque: built-in monitoring
Resque plugins
Resque-status
https://github.com/quirkey/resque-status
Resque-scheduler
https://github.com/bvandenbos/resque-scheduler/
More at: https://github.com/defunkt/resque/wiki/plugins
Resque-status
Simple trackable jobs for resque
Job instances have a UUID
Jobs can report their status while running
Resque-status
# inheriting from JobWithStatus
class ExportJob < Resque::JobWithStatus
  # perform is an instance method
  def perform
    limit = (options['limit'] || 1000).to_i
    items = Item.limit(limit)
    total = items.count
    exported = []
    items.each_with_index do |item, num|
      at(num, total, "At #{num} of #{total}")
      exported << item.to_csv
    end
    File.open(local_filename, 'w') { |f| f.write(exported.join("\n")) }
    complete(:filename => local_filename)
  end
end
Resque-status
job_id = SleepJob.create(:length => 100)
status = Resque::Status.get(job_id)
# the status object tells us:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Resque::Status.kill(job_id)
Resque-scheduler
Queueing for future execution
Scheduling jobs (like cron!)
Resque-scheduler
# run a job in 5 days
Resque.enqueue_in(5.days, SendFollowupEmail)
# run SomeJob at a specific time
Resque.enqueue_at(5.days.from_now, SomeJob)
Resque-scheduler
namespace :resque do
  task :setup do
    require 'resque'
    require 'resque_scheduler'
    require 'resque/scheduler'

    Resque.redis = 'localhost:6379'

    # The schedule doesn't need to be stored in a YAML, it just needs to
    # be a hash. YAML is usually the easiest.
    Resque::Scheduler.schedule = YAML.load_file('your_resque_schedule.yml')

    # When dynamic is set to true, the scheduler process looks for
    # schedule changes and applies them on the fly.
    # Also if dynamic, the Resque::Scheduler.set_schedule (and remove_schedule)
    # methods can be used to alter the schedule
    # Resque::Scheduler.dynamic = true
  end
end
$ rake resque:scheduler
Resque-scheduler: the yaml configuration
queue_documents_for_indexing:
  cron: "0 0 * * *"
  class: QueueDocuments
  queue: high
  args:
  description: "This job queues all content for indexing in solr"

export_items:
  cron: "30 6 * * 1"
  class: Export
  queue: low
  args: full
  description: "This job does a weekly export"
Other (commercial)
SimpleWorker
http://simpleworker.com
SQS https://github.com/appoxy/aws/
http://rubygems.org/gems/right_aws
http://sdruby.org/video/024_amazon_sqs.m4v
Other (historical)
Beanstalkd and Stalker
http://asciicasts.com/episodes/243-beanstalkd-and-stalker
http://kr.github.com/beanstalkd/
https://github.com/han/stalker
Backgroundjob (Bj)
https://github.com/ahoward/bj
BackgroundRb
http://backgroundrb.rubyforge.org/
Other (different approaches)
Nanite
http://www.slideshare.net/jendavis100/background-processing-with-nanite
Cloud Crowd
https://github.com/documentcloud/cloud-crowd/wiki/Getting-Started
Photo credits:
http://www.flickr.com/photos/rkbcupcakes/3373909785/
http://www.flickr.com/photos/anjin/23460398
http://www.flickr.com/photos/vivacomopuder/3122401239
http://www.flickr.com/photos/pacdog/4968422200
http://www.flickr.com/photos/comedynose/3834416952
http://www.flickr.com/photos/rhysasplundh/5177851910/
http://www.flickr.com/photos/marypcb/104308457
http://www.flickr.com/photos/shutterhacks/4474421855
http://www.flickr.com/photos/kevinschoenmakersnl/5562839479
http://www.flickr.com/photos/triplexpresso/496995086
http://www.flickr.com/photos/saxonmoseley/24523450
http://www.flickr.com/photos/gadl/89650415
http://www.flickr.com/photos/matvey_andreyev/3656451273
http://www.flickr.com/photos/bryankennedy/1992770068
http://www.flickr.com/photos/27282406@N03/4134661728/