To Batch Or Not To Batch Luca Mearelli rubyday.it 2011

DESCRIPTION

A presentation with tips and tools for integrating batch and asynchronous operations into a generic Ruby on Rails application. Presented at rubyday.it 2011.

Page 1: To Batch Or Not To Batch

To Batch Or Not To Batch

Luca Mearelli

rubyday.it 2011

Page 2: To Batch Or Not To Batch

@lmea #rubyday

First and foremost, we believe that speed is more than a feature. Speed is the most important feature. If your application is slow, people won’t use it.

Fred Wilson

Page 3: To Batch Or Not To Batch

Not all the interesting features are fast

Interacting with remote API

Sending emails

Media transcoding

Large dataset handling

Page 4: To Batch Or Not To Batch

Anatomy of an asynchronous action

The app decides it needs to do a long operation

The app asks the async system to do the operation and quickly returns the response

The async system executes the operation out-of-band
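The three steps above can be sketched with nothing but the Ruby standard library. This is a minimal illustration, not how any of the tools in this deck are built: the thread-based worker, the `handle_request` method, and the response string are all hypothetical stand-ins.

```ruby
require 'thread'

# Hypothetical in-process stand-in for the async system.
jobs    = Queue.new
results = Queue.new

# The async system: one worker thread that runs operations out-of-band.
worker = Thread.new do
  while (job = jobs.pop) != :stop
    results << job.call
  end
end

# The app: hand the long operation over and answer immediately.
def handle_request(jobs)
  jobs << lambda { sleep 0.1; "export done" } # the long operation
  "202 Accepted"                              # the quick response
end

response = handle_request(jobs)
puts response        # available before the export has finished
outcome = results.pop # blocks until the worker has run the job
puts outcome

jobs << :stop
worker.join
```

The request path only pays for pushing the job onto the queue; the cost of the operation itself is paid by the worker.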

Page 5: To Batch Or Not To Batch

Batch

Asynchronous jobs

Queues & workers

Page 6: To Batch Or Not To Batch

Batch

Page 7: To Batch Or Not To Batch

Cron

scheduled operations

unrelated to the requests

low frequency

longer run time

Page 8: To Batch Or Not To Batch

Anatomy of a cron batch: the rake task

namespace :export do
  task :items_xml => :environment do
    # read the env variables
    # make the export
  end
end

Page 9: To Batch Or Not To Batch

Anatomy of a cron batch: the shell script

#!/bin/sh
# this goes in script/item_export_full.sh
cd /usr/rails/MyApp/current
export RAILS_ENV=production

echo "Item Export Full started: `date`"
rake export:items_xml XML_FOLDER='data/exports'
echo "Item Export Full completed: `date`"

Page 10: To Batch Or Not To Batch

Anatomy of a cron batch: the crontab entry

0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh >> /usr/rails/MyApp/current/log/dump_item_export.log 2>&1

30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/newsletter_daily.log 2>&1
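A crontab line starts with five schedule fields (minute, hour, day of month, month, day of week) followed by the command. The first entry above therefore runs at midnight on the first day of every month, and the second at 13:30 every day. The sketch below only splits a line into those parts; `parse_crontab_line` and `CRON_FIELDS` are hypothetical names for illustration, and it handles neither `@daily`-style shortcuts nor environment lines.

```ruby
# Field order in a crontab line.
CRON_FIELDS = [:minute, :hour, :day_of_month, :month, :day_of_week].freeze

# Split a crontab line into its five schedule fields and the command.
def parse_crontab_line(line)
  *fields, command = line.split(/\s+/, 6)
  { :schedule => Hash[CRON_FIELDS.zip(fields)], :command => command }
end

entry = parse_crontab_line(
  "0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh"
)
puts entry[:schedule].inspect
puts entry[:command]
```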

Page 11: To Batch Or Not To Batch

Cron helpers

Whenever

https://github.com/javan/whenever

Craken

https://github.com/latimes/craken

Page 12: To Batch Or Not To Batch

Whenever: schedule.rb

# adds ">> /path/to/file.log 2>&1" to all commands
set :output, '/path/to/file.log'

every 3.hours do
  rake "my:rake:task"
end

every 1.day, :at => '4:30 am' do
  runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
end

every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
  command "/usr/bin/my_great_command", :output => {:error => 'error.log', :standard => 'cron.log'}
end

Page 13: To Batch Or Not To Batch

Craken: raketab

59 * * * * thing:to_do > /tmp/thing_to_do.log 2>&1

@daily solr:reindex > /tmp/solr_daily.log 2>&1

# also @yearly, @annually, @monthly, @weekly, @midnight, @hourly

Page 14: To Batch Or Not To Batch

Craken: raketab.rb

Raketab.new do |cron|
  cron.schedule 'thing:to_do > /tmp/thing_to_do.log 2>&1', :every => mon..fri

  cron.schedule 'first:five:days > /tmp/thing_to_do.log 2>&1', :days => [1,2,3,4,5]

  cron.schedule 'first:day:q1 > /tmp/thing_to_do.log 2>&1', :the => '1st', :in => [jan,feb,mar]

  cron.schedule 'first:day:q4 > /tmp/thing_to_do.log 2>&1', :the => '1st', :months => 'October,November,December'
end

Page 15: To Batch Or Not To Batch

Queues & Workers

un-scheduled operations

responding to a request

mid to high frequency

mixed run time

Page 16: To Batch Or Not To Batch

Queues & Workers

Delayed job

https://github.com/collectiveidea/delayed_job

Resque

https://github.com/defunkt/resque

Page 17: To Batch Or Not To Batch

Delayed job

Any object method can be a job

Db backed queue

Integer-based priority

Lifecycle hooks (enqueue, before, after, ... )
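With integer-based priorities, Delayed job runs lower numbers first and only considers jobs whose `run_at` time has passed. The sketch below mimics that selection over an in-memory array; the `Job` struct and `next_job` method are hypothetical stand-ins for the database table and its query, not Delayed job's own code.

```ruby
# Hypothetical in-memory stand-in for the jobs table.
Job = Struct.new(:name, :priority, :run_at)

# Pick the next runnable job: due jobs only, lowest priority number first,
# oldest run_at first among equal priorities.
def next_job(jobs, now)
  jobs.select { |job| job.run_at <= now }
      .min_by { |job| [job.priority, job.run_at] }
end

now = Time.now
jobs = [
  Job.new("newsletter",     20, now - 60),
  Job.new("password_reset",  0, now - 10),
  Job.new("monthly_report",  0, now + 3600) # not due yet
]

chosen = next_job(jobs, now)
puts chosen.name
```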

Page 18: To Batch Or Not To Batch

Delayed job: simple jobs

# without [email protected]!(@event)

# with [email protected]!(@event)

# always asynchronous method
class Newsletter
  def deliver
    # long running method
  end
  handle_asynchronously :deliver
end

newsletter = Newsletter.new
newsletter.deliver

Page 19: To Batch Or Not To Batch

Delayed job: handle_asynchronously

handle_asynchronously :sync_method, :priority => 20

handle_asynchronously :in_the_future, :run_at => Proc.new { 5.minutes.from_now }

handle_asynchronously :call_a_class_method, :run_at => Proc.new { when_to_run }

handle_asynchronously :call_an_instance_method, :priority => Proc.new {|i| i.how_important }
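To get a feel for what a macro like `handle_asynchronously` has to do, here is a minimal plain-Ruby sketch. It is explicitly not Delayed job's implementation: the `LaterMethods` module, the `handle_later` macro, and the in-memory `QUEUE` are hypothetical, and options such as `:priority` or `:run_at` are left out.

```ruby
# Hypothetical module, NOT Delayed job's implementation.
module LaterMethods
  QUEUE = [] # stand-in for the jobs table

  # Keep the original method under a new name and replace it
  # with a version that only records the call.
  def handle_later(name)
    sync_name = "#{name}_without_delay"
    alias_method sync_name, name
    define_method(name) do |*args|
      LaterMethods::QUEUE << [self, sync_name, args]
      :enqueued
    end
  end
end

class Newsletter
  extend LaterMethods
  attr_reader :delivered

  def deliver
    @delivered = true # the long running work would go here
  end
  handle_later :deliver
end

newsletter = Newsletter.new
newsletter.deliver                 # only enqueues the call
before = newsletter.delivered      # still nil at this point

# What a worker would do later:
receiver, method_name, args = LaterMethods::QUEUE.shift
receiver.send(method_name, *args)
puts newsletter.delivered
```

The caller's code does not change; only the class declaration decides whether the method runs inline or later.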

Page 20: To Batch Or Not To Batch

Delayed job

class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsMailer.deliver_text_to_email(text, e) }
  end
end

Delayed::Job.enqueue NewsletterJob.new('lorem ipsum...', User.find(:all).collect(&:email))

Page 21: To Batch Or Not To Batch

Delayed job

RAILS_ENV=production script/delayed_job -n 2 --min-priority 10 start

RAILS_ENV=production script/delayed_job stop

rake jobs:work

Page 22: To Batch Or Not To Batch

Delayed job: checking the job status

The queue is for scheduled and running jobs

Handle the status outside the Delayed::Job object

Page 23: To Batch Or Not To Batch

Delayed job: checking the job status

# Include this in your initializers somewhere

class Queue < Delayed::Job
  def self.status(id)
    job = find_by_id(id)
    # a job that is no longer in the table has completed
    return "success" if job.nil?
    job.last_error.nil? ? "queued" : "failure"
  end
end

# Use this method in your poll method like so:

def poll
  status = Queue.status(params[:id])
  if status == "success"
    # Success, notify the user!
  elsif status == "failure"
    # Failure, notify the user!
  end
end

Page 24: To Batch Or Not To Batch

Delayed job: checking the job status

class AJob < Struct.new(:options)
  def perform
    do_something(options)
  end

  def success(job)
    # record success of job.id
    Rails.cache.write("status:#{job.id}", "success")
  end
end

# a helper
def job_completed_with_success(job_id)
  Rails.cache.read("status:#{job_id}") == "success"
end

Page 25: To Batch Or Not To Batch

Resque

Redis-backed queues

Queue/dequeue speed independent of list size

Forking behaviour

Built in front-end

Multiple queues / no priorities

Page 26: To Batch Or Not To Batch

Resque: the job

class Export
  @queue = :export_jobs

  def self.perform(dataset_id, kind = 'full')
    ds = Dataset.find(dataset_id)
    ds.create_export(kind)
  end
end

Page 27: To Batch Or Not To Batch

Resque: enqueuing the job

class Dataset
  def async_create_export(kind)
    Resque.enqueue(Export, self.id, kind)
  end
end

ds = Dataset.find(100)
ds.async_create_export('full')

Page 28: To Batch Or Not To Batch

Resque: persisting the job

# jobs are persisted as JSON,
# so jobs should only take arguments that can be expressed as JSON
{
  'class': 'Export',
  'args': [ 100, 'full' ]
}

# don't do this: Resque.enqueue(Export, self, kind)
# do this:
Resque.enqueue(Export, self.id, kind)
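The reason for passing ids instead of objects shows up in a plain JSON round trip. The sketch below uses only Ruby's standard `json` library and does not touch Resque; the payload hash imitates the one on the slide, and the `Dataset` struct is a hypothetical stand-in for a model.

```ruby
require 'json'

# Payload shaped like the one on the slide; :full is a Symbol on purpose.
payload  = { 'class' => 'Export', 'args' => [100, :full] }
stored   = JSON.generate(payload) # what would sit in the queue
restored = JSON.parse(stored)     # what a worker would get back

puts stored
puts restored['args'].inspect     # the Symbol comes back as a String

# Hypothetical model stand-in: an arbitrary object does not survive the trip.
Dataset = Struct.new(:id)
lossy = JSON.parse(JSON.generate('args' => [Dataset.new(100)]))
puts lossy['args'].first.class
```

Integers and strings come back unchanged, symbols come back as strings, and an arbitrary object comes back as a string description of itself, which is useless to the worker.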

Page 29: To Batch Or Not To Batch

Resque: generic async methods

# A simple async helper
class Repository < ActiveRecord::Base
  # This will be called by a worker when a job needs to be processed
  def self.perform(id, method, *args)
    find(id).send(method, *args)
  end

  # We can pass this any Repository instance method that we want to
  # run later.
  def async(method, *args)
    Resque.enqueue(Repository, id, method, *args)
  end
end

# Now we can call any method and have it execute later:
@repo.async(:update_disk_usage)
@repo.async(:update_network_source_id, 34)

Page 30: To Batch Or Not To Batch

Resque: anatomy of a worker

# a worker does this:
start
loop do
  if job = reserve
    job.process
  else
    sleep 5
  end
end
shutdown

Page 31: To Batch Or Not To Batch

Resque: working the queues

$ QUEUES=critical,high,low rake resque:work

$ QUEUES=* rake resque:work

$ PIDFILE=./resque.pid QUEUE=export_jobs rake environment resque:work

task "resque:setup" => :environment do
  AppConfig.a_parameter = ...
end
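Resque has no numeric priorities: a worker checks the queues in the order they are listed in `QUEUES`, so `critical,high,low` drains `critical` before it looks at `high`. The sketch below imitates that reserve step with in-memory arrays; the queue contents and the `reserve` method are hypothetical, and real Resque queues are Redis lists.

```ruby
# Hypothetical in-memory queues; Resque keeps them as Redis lists.
QUEUES = {
  'critical' => [],
  'high'     => ['send_invoice'],
  'low'      => ['rebuild_index', 'cleanup']
}

# Check the queues in the given order and take the first job found.
def reserve(queue_names)
  queue_names.each do |name|
    job = QUEUES[name].shift
    return [name, job] if job
  end
  nil
end

order = %w[critical high low]
processed = []
while (reserved = reserve(order))
  processed << reserved
end
puts processed.inspect
```

A busy queue early in the list can starve the later ones, which is worth keeping in mind when choosing the order.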

Page 32: To Batch Or Not To Batch

Resque: monit recipe

# example monit monitoring recipe
check process resque_worker_batch_01
  with pidfile /app/current/tmp/pids/worker_01.pid
  start program = "/bin/bash -c 'cd /app/current; RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work & > log/worker_01.log && echo $! > tmp/pids/worker_01.pid'" as uid deploy and gid deploy
  stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/worker_01.pid` && rm -f tmp/pids/worker_01.pid; exit 0;'"
  if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory?
  group resque_workers

Page 33: To Batch Or Not To Batch

Resque: built-in monitoring

Page 34: To Batch Or Not To Batch

Resque plugins

Resque-status

https://github.com/quirkey/resque-status

Resque-scheduler

https://github.com/bvandenbos/resque-scheduler/

More at: https://github.com/defunkt/resque/wiki/plugins

Page 35: To Batch Or Not To Batch

Resque-status

Simple trackable jobs for resque

Job instances have a UUID

Jobs can report their status while running
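The idea behind trackable jobs can be sketched in a few lines: create a UUID at enqueue time, keep a status record under that key, and let the job update the record as it goes. Everything below is a hypothetical illustration (the `STATUSES` hash, `create_job`, and `run_job` are made up); Resque-status keeps its records in Redis and has its own API.

```ruby
require 'securerandom'

# Hypothetical status store keyed by job UUID.
STATUSES = {}

# Register a job and return the UUID the caller can poll with.
def create_job
  uuid = SecureRandom.uuid
  STATUSES[uuid] = { :status => 'queued', :pct_complete => 0 }
  uuid
end

# Run the job, reporting progress after each unit of work.
def run_job(uuid, total)
  total.times do |num|
    STATUSES[uuid] = { :status => 'working', :pct_complete => (num * 100) / total }
  end
  STATUSES[uuid] = { :status => 'completed', :pct_complete => 100 }
end

job_id = create_job
queued = STATUSES[job_id].dup # snapshot before the job runs
run_job(job_id, 4)
puts queued.inspect
puts STATUSES[job_id].inspect
```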

Page 36: To Batch Or Not To Batch

Resque-status

# inheriting from JobWithStatus
class ExportJob < Resque::JobWithStatus
  # perform is an instance method
  def perform
    # fall back to 1000 when no limit is given (nil.to_i would yield 0)
    limit = (options['limit'] || 1000).to_i
    items = Item.limit(limit)
    total = items.count
    exported = []
    items.each_with_index do |item, num|
      at(num, total, "At #{num} of #{total}")
      exported << item.to_csv
    end
    # local_filename is not defined on the slide; it is assumed to come from elsewhere
    File.open(local_filename, 'w') { |f| f.write(exported.join("\n")) }
    completed(:filename => local_filename)
  end
end

Page 37: To Batch Or Not To Batch

Resque-status

job_id = SleepJob.create(:length => 100)

status = Resque::Status.get(job_id)

# the status object tells us:

status.pct_complete #=> 0

status.status #=> 'queued'

status.queued? #=> true

status.working? #=> false

status.time #=> Time object

status.message #=> "Created at ..."

Resque::Status.kill(job_id)

Page 38: To Batch Or Not To Batch

Resque-scheduler

Queueing for future execution

Scheduling jobs (like cron!)

Page 39: To Batch Or Not To Batch

Resque-scheduler

# run a job in 5 days

Resque.enqueue_in(5.days, SendFollowupEmail)

# run SomeJob at a specific time

Resque.enqueue_at(5.days.from_now, SomeJob)
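Conceptually, queueing for future execution means storing each job with a timestamp and moving it to a normal queue once that time has passed. The sketch below shows that idea in memory only; the `DelayedQueue` class and its `pop_due` method are hypothetical and say nothing about how Resque-scheduler stores delayed jobs.

```ruby
# Hypothetical in-memory delayed queue holding [run_at, job_name] pairs.
class DelayedQueue
  def initialize
    @items = []
  end

  # Schedule a job for a specific time.
  def enqueue_at(time, job)
    @items << [time, job]
  end

  # Schedule a job a number of seconds after `now`.
  def enqueue_in(seconds, job, now = Time.now)
    enqueue_at(now + seconds, job)
  end

  # Remove and return the jobs whose time has come, earliest first.
  def pop_due(now = Time.now)
    due, @items = @items.partition { |time, _job| time <= now }
    due.sort_by(&:first).map(&:last)
  end
end

start = Time.at(0) # fixed clock so the example is deterministic
queue = DelayedQueue.new
queue.enqueue_in(5 * 24 * 3600, 'SendFollowupEmail', start)
queue.enqueue_at(start + 60, 'SomeJob')

early = queue.pop_due(start + 120)           # two minutes in
late  = queue.pop_due(start + 6 * 24 * 3600) # six days in
puts early.inspect
puts late.inspect
```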

Page 40: To Batch Or Not To Batch

Resque-scheduler

namespace :resque do
  task :setup do
    require 'resque'
    require 'resque_scheduler'
    require 'resque/scheduler'

    Resque.redis = 'localhost:6379'

    # The schedule doesn't need to be stored in a YAML, it just needs to
    # be a hash. YAML is usually the easiest.
    Resque::Scheduler.schedule = YAML.load_file('your_resque_schedule.yml')

    # When dynamic is set to true, the scheduler process looks for
    # schedule changes and applies them on the fly.
    # Also if dynamic the Resque::Scheduler.set_schedule (and remove_schedule)
    # methods can be used to alter the schedule
    #Resque::Scheduler.dynamic = true
  end
end

$ rake resque:scheduler

Page 41: To Batch Or Not To Batch

Resque-scheduler: the yaml configuration

queue_documents_for_indexing:
  cron: "0 0 * * *"
  class: QueueDocuments
  queue: high
  args:
  description: "This job queues all content for indexing in solr"

export_items:
  cron: "30 6 * * 1"
  class: Export
  queue: low
  args: full
  description: "This job does a weekly export"
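Since the schedule only needs to be a hash, it is easy to check what a YAML file like this turns into. The sketch below parses one entry with Ruby's standard `yaml` library; the keys must be indented under the job name for the nesting to survive. The cron string `30 6 * * 1` means 06:30 every Monday. The heredoc is a copy of the `export_items` entry, used here only for illustration.

```ruby
require 'yaml'

# One schedule entry, copied from the configuration above.
schedule_yaml = <<-EOS
export_items:
  cron: "30 6 * * 1"
  class: Export
  queue: low
  args: full
  description: "This job does a weekly export"
EOS

schedule = YAML.load(schedule_yaml)
entry = schedule['export_items']

puts entry['cron']
puts entry['class'] # a plain String naming the job class
puts entry['args']
```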

Page 42: To Batch Or Not To Batch

Other (commercial)

SimpleWorker

http://simpleworker.com

SQS

https://github.com/appoxy/aws/

http://rubygems.org/gems/right_aws

http://sdruby.org/video/024_amazon_sqs.m4v

Page 43: To Batch Or Not To Batch

Other (historical)

Beanstalkd and Stalker

http://asciicasts.com/episodes/243-beanstalkd-and-stalker

http://kr.github.com/beanstalkd/

https://github.com/han/stalker

Backgroundjob (Bj)

https://github.com/ahoward/bj

BackgroundRb

http://backgroundrb.rubyforge.org/

Page 44: To Batch Or Not To Batch

Other (different approaches)

Nanite

http://www.slideshare.net/jendavis100/background-processing-with-nanite

Cloud Crowd

https://github.com/documentcloud/cloud-crowd/wiki/Getting-Started

Page 46: To Batch Or Not To Batch

http://www.flickr.com/photos/rkbcupcakes/3373909785/
http://www.flickr.com/photos/anjin/23460398
http://www.flickr.com/photos/vivacomopuder/3122401239
http://www.flickr.com/photos/pacdog/4968422200
http://www.flickr.com/photos/comedynose/3834416952
http://www.flickr.com/photos/rhysasplundh/5177851910/
http://www.flickr.com/photos/marypcb/104308457
http://www.flickr.com/photos/shutterhacks/4474421855
http://www.flickr.com/photos/kevinschoenmakersnl/5562839479
http://www.flickr.com/photos/triplexpresso/496995086
http://www.flickr.com/photos/saxonmoseley/24523450
http://www.flickr.com/photos/gadl/89650415
http://www.flickr.com/photos/matvey_andreyev/3656451273
http://www.flickr.com/photos/bryankennedy/1992770068
http://www.flickr.com/photos/27282406@N03/4134661728/