Upload
gabriele-lana
View
5.946
Download
2
Embed Size (px)
DESCRIPTION
A little story to explain why I think ruby is pure magic
Citation preview
themagicof
ruby
once upona time there
was adeveloperworking for a big company
it’s very easy, we
need something quick and
dirty...
in the beginning...
1. login2. go to a
page3. scrap a
number4. that’s it!!!
in the beginning...
ok!
require "mechanize"
agent = Mechanize.new do | agent | agent.user_agent_alias = "Linux Mozilla"end
agent.get("http://example.com/login") do | login_page |
result_page = login_page.form_with(:name => "login") do | login_form | login["username"] = username login["password"] = password end.submit
result_page.search("//table[starts-with(@class,'boundaries')]").map do | option_table | { "name" => option_table.search("./caption/child::text()") "credits" => option_table.search("./descendant::td[position()=3]/child::text()") } end
end
enter mechanize + nokogiri
and then...
good, but we’d like to extract more informationsfrom a few different
pages
enter commander
command :is_registered do | command | command.syntax = "is_registered --username TELEPHONE_NUMBER [ --without-cache ]" command.description = "Check if user is registered" command.option "-u", "--username TELEPHONE_NUMBER", String, "user's telephone number" command.option "-n", "--without-cache", "bypass user's profile informations cache"
command.when_called do | arguments, options | options.default :username => "", :without_cache => false ok(is_registered(options.username, options.without_cache)) endend
extract code into functions
describe arguments
use page object pattern
def is_registered(username) browse do | agent, configuration | LoginPage.new( agent.get(configuration["login_page_url"]) ).is_registered?(username) endend
use page object pattern
class LoginPage < PageToScrub
def is_registered?(username) begin login(username, "fake password") rescue WrongPassword true rescue NotRegistered, WrongUsername, WrongArea false end end
def login(username, password) check_page( use_element(:login_form) do | login | login["username"] = username login["password"] = password end.submit ) end
def login_form @page.form_with(:name => "login") end
use page object pattern
class LoginPage < PageToScrub
def is_registered?(username) begin login(username, "fake password") rescue WrongPassword true rescue NotRegistered, WrongUsername, WrongArea false end end
def login(username, password) check_page( use_element(:login_form) do | login | login["username"] = username login["password"] = password end.submit ) end
def login_form @page.form_with(:name => "login") end
useful abstractions
use page object pattern
class PageToScrub
...
def use_element(element_name) element = self.send(element_name) raise MalformedPage.new(@page, "unable to locate #{element_name}") if ( element.nil? || (element.empty? rescue true) ) return yield(element) if block_given? element end
...
end
...few pages my A@@
45 pages and 93 different
pieces of data
after a while...
i need to feel more confidentwith this...
rspec is your friend :-)
describe "is_registered" do
context "XXX3760593" do
it "should be a consumer registered" do result = command(:is_registered, :username => "XXX3760593") result.should_not be_an_error result["area"].should == "consumer" result["registered"].should == true end
end
end
and then...
obviously not all the requests can be live
on our systems
enter the cache
def browse begin cache = CommandCache.new(database_path) configuration = YAML::load(File.open(configuration_path)) agent = Mechanize.new do | agent | agent.user_agent_alias = "Linux Mozilla" end yield(agent, configuration, cache) rescue Mechanize::ResponseCodeError => error failure(LoadPageError.new(error)) rescue Timeout::Error failure(TimeoutPageError.new) rescue ScrubError => error failure(error) rescue => error failure(UnknownError.new(error.to_s)) ensure cache.close! endend
enter the cache
def is_registered(username, without_cache) browse do | agent, configuration, cache | cache.command([ username, "is_registered" ]) do LoginPage.new( agent.get(configuration["login_page_url"]) ).is_registered?(username) end endend
single line change
enter the cache
class CommandCache
def initialize(database_path) @database = create_database(database_path) end
def command(keys) begin from_cache(keys) rescue NotInCache => e raise e if not block_given? to_cache(keys, yield) end end
end
better ask forgiveness than
permission
and then...
our systems cannot take
more than 25 concurrent requests...
make sure of it!!!
@!#$$@&#...
maybe we can use a proxy
...
god bless mechanize
def browse begin cache = CommandCache.new(database_path) configuration = YAML::load(File.open(configuration_path)) proxy = configuration["proxy"] agent = Mechanize.new do | agent | agent.user_agent_alias = "Linux Mozilla" agent.set_proxy(proxy["host"], proxy["port"]) if proxy end yield(agent, configuration, cache) rescue Mechanize::ResponseCodeError => error failure(LoadPageError.new(error)) rescue Timeout::Error failure(TimeoutPageError.new) rescue ScrubError => error failure(error) rescue => error failure(UnknownError.new(error.to_s)) ensure cache.close! endend
single line change
and then...
well, you know, we have a lot of users, so when proxy says is
overloaded you must retry a few times before give
up
@!#$$@&#
class Mechanize
alias real_fetch_page fetch_page
def fetch_page(params) ... attempts = 0 begin attempts += 1 real_fetch_page(params) rescue Net::HTTPServerException => error if is_overloaded?(error) sleep wait_for_seconds and retry if attempts < retry_for_times raise SystemError.new("SystemOverloaded") end raise error end end
def is_overloaded?(error) error.response.code == "403" end
end
god bless ruby
look at this line!!!
we can also test it :-)
class WEBrick::HTTPResponse
def serve(content) self.body = content self["Content-Length"] = content.length end
def overloaded serve("<html><body>squid</body></html>") self.status = 403 end
end
proxy = WEBrick::HTTPProxyServer.new( :Port => 2200, :ProxyContentHandler => Proc.new do | request, response | response.overloaded end)
trap("INT") { proxy.shutdown }proxy.start
finally ;-)
well... i guess we can release it...
the unexpected
but... wait...our i.t.
department said that
sometimes it crashes
you need to fix it by
tomorrow!!!
the unexpected
@!#$$@&#@!#$$@&#@!#$$@&#@!#$$@&#@!#$$@&#@!#$$@&#
If you want something done, do it yourselfhow to transform a command line program into a web application
class ScrubsHandler < Mongrel::HttpHandler
def process(request, response) command = request.params["PATH_INFO"].tr("/", "") elements = Mongrel::HttpRequest.query_parse(request.params["QUERY_STRING"]) parameters = elements.inject([]) do | parameters, parameter | name, value = parameter parameters << if value.nil? "--#{name}" else "--#{name}='#{value}'" end end.join(" ") response.start(200) do | head, out | head["Content-Type"] = "application/json" out.write(scrubs.execute(command, parameters)) end end
...end
almost a single line change
can this be true ?!?!?
well... i guess we can release it...
all the requests are live!!!
our systems are melting down!!! fix it!!! now!!!
after a while...
@!#$$@&#@!#$$@&#@!#$$@&#@!#$$@&#@!#$$@&#@!#$$@&#
change the cache implementationuse the file system luke...
def expire(keys, result = nil) FileUtils.rm path(keys), :force => true result.merge({ "from_cache" => false }) unless result.nil? end
def expire_after(keys, seconds, result = nil) expire(keys, result) if (from_cache(keys)["cached_at"] + seconds) <= now rescue nil end
def from_cache(keys) cache_file_path = path(keys) raise NotInCache.new(keys) unless File.exists?(cache_file_path) JSON.parse(File.read(cache_file_path)).merge({ "from_cache" => true }) end
def to_cache(keys, result) result = result.merge({ "cached_at" => now }) File.write(path(keys), JSON.generate(result)) result.merge({ "from_cache" => false }) end
are you handling the maintenance page right?
after a while...
maintenance page detection
class PageToScrub
def initialize(page) @page = page check_page_errors check_for_maintenance end
def check_for_maintenance @page.search("//td[@class='txtbig']").each do | node | if extract_text_from(node.search("./descendant::text()")) =~ /^.+?area.+?clienti.+?non.+?disponibile.+?stiamo.+?lavorando/im raise OnMaintenancePage.new(@page, "??? is on maintenance") end end end
...
end
good job gabriele, it’s
working beyond our expectations
after few days
tell me, these
“robots” of yours can be
used to check our systems
after few days
...
yes!
in the end...
• almost all self care’s features are replicated
• ~500.000 unique users/day• ~12.000.000 requests/day• ~4gb of cached data• specs are used to
monitoring the entire system
but in the beginning was...
it’s very easy, we
need something quick and
dirty...
that is for me the ruby
magic :-)
questions?