code == data

data == code | LRUG April 2008

Uploaded by: rob

DESCRIPTION

Morph and Pottery rubygem utilities for screen scrapers.

Page 1: data == code | LRUG April 2008

code == data

Page 2: data == code | LRUG April 2008

data == code

Page 3: data == code | LRUG April 2008

OpenStruct

Photo: Salt Fired http://www.flickr.com/photos/saltfired/201994906/

Page 4: data == code | LRUG April 2008

require 'ostruct'

Page 5: data == code | LRUG April 2008

o = OpenStruct.new

Page 6: data == code | LRUG April 2008

o.name = 'el rug'

Page 7: data == code | LRUG April 2008

o.name

Page 8: data == code | LRUG April 2008

=> "el rug"

Page 9: data == code | LRUG April 2008

o.inspect

Page 10: data == code | LRUG April 2008

=> "#<OpenStruct name=\"el rug\">"

Page 11: data == code | LRUG April 2008

# not very classy

Page 12: data == code | LRUG April 2008

o.class

Page 13: data == code | LRUG April 2008

=> OpenStruct

Page 14: data == code | LRUG April 2008

class Fund < OpenStruct
  def your_logic
  end
end
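
To make the subclassing idea concrete, here is a minimal sketch using only the standard library. The `spread` method and the bid/offer values are made up for illustration; they are not from the talk.

```ruby
require 'ostruct'

# Subclassing OpenStruct gives the object a named class
# while keeping its free-form attributes.
class Fund < OpenStruct
  # Hypothetical logic method: difference between offer and bid.
  def spread
    offer - bid
  end
end

fund = Fund.new(name: 'el rug', bid: 100, offer: 103)
fund.currency = 'GBX'   # attributes can still be added on the fly

puts fund.class    # Fund
puts fund.spread   # 3
```

The object now reports a real class, but the attributes still live in a per-instance table rather than in methods defined on `Fund`.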

Page 15: data == code | LRUG April 2008
Page 16: data == code | LRUG April 2008

public class Fund extends HashMap {

}

Page 17: data == code | LRUG April 2008

public class Fund extends HashMap {

/* bad code smell */

}

Page 18: data == code | LRUG April 2008

public class Fund extends HashMap<String, Object> {

}

Page 19: data == code | LRUG April 2008

public class Fund extends HashMap<String, Object> {

/* this stinks! */

}

Page 20: data == code | LRUG April 2008
Page 21: data == code | LRUG April 2008
Page 22: data == code | LRUG April 2008

Morph

Photo: Salt Fired http://www.flickr.com/photos/saltfired/201998836/

Page 23: data == code | LRUG April 2008

gem install morph

Page 24: data == code | LRUG April 2008

require 'morph'

Page 25: data == code | LRUG April 2008

require 'hpricot'

require 'open-uri'

Page 26: data == code | LRUG April 2008
Page 27: data == code | LRUG April 2008
Page 28: data == code | LRUG April 2008
Page 29: data == code | LRUG April 2008

class Hubbit

include Morph

Page 30: data == code | LRUG April 2008

def initialize name

doc = Hpricot open("http://github.com/#{name}")

Page 31: data == code | LRUG April 2008

(doc/'label').collect do |l|

Page 32: data == code | LRUG April 2008

label = l.inner_text

Page 33: data == code | LRUG April 2008

value = l.next_sibling.inner_text.strip

Page 34: data == code | LRUG April 2008

morph(label, value)

Page 35: data == code | LRUG April 2008

class Hubbit
  include Morph

  def initialize name
    begin
      doc = Hpricot open("http://github.com/#{name}")

      (doc/'label').collect do |node|
        label = node.inner_text
        value = node.next_sibling.inner_text.strip
        morph(label, value)
      end
    rescue
      raise "Couldn't find hubbit with name: #{name}"
    end
  end
end

Page 36: data == code | LRUG April 2008

Hubbit.morph_methods

Page 37: data == code | LRUG April 2008

=> []

Page 38: data == code | LRUG April 2008

why = Hubbit.new 'why'

Page 39: data == code | LRUG April 2008

=> #<Hubbit @name="why the lucky stiff", @email="why@why...">

Page 40: data == code | LRUG April 2008

Hubbit.morph_methods

Page 41: data == code | LRUG April 2008

=> ["email", "email=", "name", "name="]

Page 42: data == code | LRUG April 2008

why.name

Page 43: data == code | LRUG April 2008

=> "why the lucky stiff"

Page 44: data == code | LRUG April 2008

why.年龄 = 21

Page 45: data == code | LRUG April 2008

why.年龄

Page 46: data == code | LRUG April 2008

=> 21

Page 47: data == code | LRUG April 2008

why.company

Page 48: data == code | LRUG April 2008

NoMethodError: undefined method 'company'

Page 49: data == code | LRUG April 2008

# maybe should have

Page 50: data == code | LRUG April 2008

why.company?

Page 51: data == code | LRUG April 2008

# but that's not there yet

Page 52: data == code | LRUG April 2008
Page 53: data == code | LRUG April 2008

dhh = Hubbit.new 'dhh'

Page 54: data == code | LRUG April 2008

Hubbit.morph_methods

Page 55: data == code | LRUG April 2008

=> ["blog", "blog=", "company", "company=",
    "email", "email=", "location", "location=",
    "name", "name=", "年龄", "年龄="]

Page 56: data == code | LRUG April 2008

dhh.company

Page 57: data == code | LRUG April 2008

=> "37signals"

Page 58: data == code | LRUG April 2008

why.company

Page 59: data == code | LRUG April 2008

=> nil
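
The behaviour shown so far (accessors appear on the class the first time a writer is called, so `why.company` goes from NoMethodError to nil once `dhh` has a company) can be reproduced without the gem. The sketch below uses a hypothetical `MiniMorph` module and `Person` class; it illustrates the idea and is not the Morph gem's actual code.

```ruby
# MiniMorph is a hypothetical, simplified stand-in for the Morph gem.
module MiniMorph
  # Writer calls such as `obj.company = "x"` land here the first time.
  def method_missing(symbol, *args)
    return super unless symbol.to_s.end_with?('=')

    attribute = symbol.to_s.chomp('=')
    # Define reader and writer on the class, so every instance gains them.
    self.class.class_eval { attr_accessor attribute }
    send(symbol, *args)
  end

  def respond_to_missing?(symbol, include_private = false)
    symbol.to_s.end_with?('=') || super
  end
end

class Person
  include MiniMorph
end

why = Person.new
dhh = Person.new

why.name = 'why the lucky stiff'
dhh.company = '37signals'   # defines company / company= on Person

puts why.name              # why the lucky stiff
puts why.company.inspect   # nil
```

Because the accessors are defined on the class rather than on the instance, the set of methods grows with every new label seen, whichever object saw it first.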

Page 60: data == code | LRUG April 2008

implementation

Page 61: data == code | LRUG April 2008

def method_missing sym, *args
  is_writer = sym.to_s =~ /=$/
  is_writer ? morph_method_missing(sym, *args) : super
end

Page 62: data == code | LRUG April 2008

def morph_method_missing symbol, *args
  attribute = symbol.to_s.chomp '='
  # ...
  if block_given?
    yield self.class, attribute
  else
    self.class.class_eval "attr_accessor :#{attribute}"
    send(symbol, *args)
  end
  # ...
end

Page 63: data == code | LRUG April 2008

Soup

Photo: Chrissy Wainwright http://www.flickr.com/photos/wainwright/380578681/

Page 64: data == code | LRUG April 2008

gem install soup

Page 65: data == code | LRUG April 2008

require 'soup'

Page 66: data == code | LRUG April 2008

Soup.prepare

Page 67: data == code | LRUG April 2008

s = Snip.new

Page 68: data == code | LRUG April 2008

s.name = 'el rug'

Page 69: data == code | LRUG April 2008

s.inspect

Page 70: data == code | LRUG April 2008

=> "<Snip id:unset name:el rug>"

Page 71: data == code | LRUG April 2008

s.save

Page 72: data == code | LRUG April 2008

=> "<Snip id:1 name:el rug>"

Page 73: data == code | LRUG April 2008

s = Snip['el rug']

Page 74: data == code | LRUG April 2008

=> "<Snip id:1 name:el rug>"

Page 75: data == code | LRUG April 2008

# has no class

Page 76: data == code | LRUG April 2008

s.class

Page 77: data == code | LRUG April 2008

=> nil

Page 78: data == code | LRUG April 2008

BlankSlate

Page 79: data == code | LRUG April 2008

class EmptyClass
  instance_methods.each { |m| undef_method(m) unless m =~ /^(__|instance_eval|respond_to\?)/ }
end

class Snip < EmptyClass; end
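
Since Ruby 1.9 the core class `BasicObject` offers a similar blank starting point. The sketch below shows why `s.class` can come back as nil: on a blank slate even `class` falls through to `method_missing`. `BlankSnip` is a hypothetical illustration, not Soup's real `Snip`, and it stores attributes in a plain hash as an assumption.

```ruby
# BlankSnip is a hypothetical illustration, not Soup's real Snip class.
class BlankSnip < BasicObject
  def initialize
    @attributes = {}
  end

  # Nearly every call ends up here, because BasicObject defines so little.
  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?('=')
      @attributes[key.chomp('=')] = args.first
    else
      @attributes[key]   # unknown attributes read as nil
    end
  end
end

s = BlankSnip.new
s.name = 'el rug'

puts s.name            # el rug
puts s.class.inspect   # nil, since even `class` is treated as an attribute
```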

Page 80: data == code | LRUG April 2008

Pottery

Photo: zhaoshouren http://www.flickr.com/photos/ajanhelendam/2326369128/

Page 81: data == code | LRUG April 2008

gem install pottery

Page 82: data == code | LRUG April 2008
Page 83: data == code | LRUG April 2008
Page 84: data == code | LRUG April 2008
Page 85: data == code | LRUG April 2008

def get_price_rows doc
  rows = rows_starting 'Bid(GBX)', doc
  @bid_offer = rows.size > 0

Page 86: data == code | LRUG April 2008
Page 87: data == code | LRUG April 2008
Page 88: data == code | LRUG April 2008

  rows = rows_starting 'Nav(GBX)', doc unless @bid_offer
  rows
end

Page 89: data == code | LRUG April 2008

def rows_starting label, doc
  (doc/"table/tr/td/[text()='#{label}']/../../../tr")
end

Page 90: data == code | LRUG April 2008

def each_entry doc
  get_price_rows(doc).each do |row|
    cells = (row/'td').collect(&:inner_text).collect(&:strip).delete_if(&:blank?)
    cells.in_groups_of(2) do |entry|
      yield entry[0], entry[1]
    end
  end
end
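
`in_groups_of` and `blank?` come from ActiveSupport rather than core Ruby. For readers without it, the same label/value pairing can be sketched with the standard library alone. The cell texts below are invented sample data, not real scraped values.

```ruby
# Hypothetical cell texts, as they might come out of one table row.
raw_cells = [' Bid(GBX) ', '181.20', '', ' Offer(GBX) ', '183.90', '  ']

# Strip whitespace and drop empty cells (stdlib stand-in for blank?).
cells = raw_cells.map(&:strip).reject(&:empty?)

# each_slice(2) is the stdlib counterpart of in_groups_of(2),
# except that it does not pad a short final group with nil.
pairs = cells.each_slice(2).map { |label, value| [label, value] }

pairs.each { |label, value| puts "#{label}: #{value}" }
```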

Page 91: data == code | LRUG April 2008

doc = open_doc url

each_entry doc do |label, value|
  morph(label, value)
end

time = Time.now.utc.to_s
self.time = time.match(/\d\d:\d\d:\d\d/)[0]
self.name = doc.at('.FundNameHeader').inner_text
self.url = url
self.date = Date.today.to_s
self.id_name = "#{url}##{date}"
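
The timestamp and identifier logic can be tried in isolation. This sketch pins the clock to a fixed moment so the result is reproducible, and shows `strftime` as an alternative to the regex match; the `url` value is a placeholder, not a real fund page.

```ruby
require 'date'

# Fixed moment so the result is reproducible; the talk uses Time.now.utc.
now = Time.utc(2008, 4, 14, 23, 0, 14)

time_by_regex    = now.to_s.match(/\d\d:\d\d:\d\d/)[0]
time_by_strftime = now.strftime('%H:%M:%S')

# Placeholder identifier, not a real fund page.
url  = 'example/fund'
date = now.to_date.to_s
id_name = "#{url}##{date}"

puts time_by_regex      # 23:00:14
puts time_by_strftime   # 23:00:14
puts id_name            # example/fund#2008-04-14
```

Both routes give the same time string; the identifier is simply the URL and the date joined by a `#`.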

Page 92: data == code | LRUG April 2008

require 'pottery'

Page 93: data == code | LRUG April 2008

class Fund

include Pottery

Page 94: data == code | LRUG April 2008

  def initialize fund=nil
    if fund
      url = "http://funds.ft.com/funds/#{fund}"
      doc = open_doc url

      each_entry doc do |label, value|
        morph(label, value)
      end

      time = Time.now.utc.to_s
      self.time = time.match(/\d\d:\d\d:\d\d/)[0]
      self.name = doc.at('.FundNameHeader').inner_text
      self.url = url
      self.date = Date.today.to_s
      self.id_name = "#{url}##{date}"
    end
  end

  def bid_price
    @bid_offer ? bid_gbx : nav_gbx
  end

  def offer_price
    @bid_offer ? offer_gbx : ''
  end

  private

  def each_entry doc
    get_price_rows(doc).each do |row|
      cells = (row/'td').collect(&:inner_text).collect(&:strip).delete_if(&:blank?)
      cells.in_groups_of(2) do |entry|
        yield entry[0], entry[1]
      end
    end
  end

  def get_price_rows doc
    rows = rows_starting 'Bid(GBX)', doc
    @bid_offer = rows.size > 0
    rows = rows_starting 'Nav(GBX)', doc unless @bid_offer
    rows
  end

  def rows_starting label, doc
    (doc/"table/tr/td/[text()='#{label}']/../../../tr")
  end

Page 95: data == code | LRUG April 2008

end # of Fund
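
Labels scraped from the page, such as 'Bid(GBX)' and 'Nav(GBX)', surface in this class as methods like `bid_gbx` and `nav_gbx`. How Morph converts them is not shown in the talk; the sketch below is one plausible rule set, using a hypothetical helper `label_to_attribute` and a made-up '52w high' label. It is ASCII-only, so unlike Morph it would not keep a name such as 年龄.

```ruby
# label_to_attribute is a hypothetical helper; the real Morph gem
# may apply different rules.
def label_to_attribute(label)
  name = label.strip.downcase
  name = name.gsub(/[^a-z0-9]+/, '_')   # runs of other characters become "_"
  name = name.gsub(/\A_+|_+\z/, '')     # trim underscores at both ends
  name = "_#{name}" if name =~ /\A\d/   # avoid a method name starting with a digit
  name
end

puts label_to_attribute('Bid(GBX)')   # bid_gbx
puts label_to_attribute('Nav(GBX)')   # nav_gbx
puts label_to_attribute('52w high')   # _52w_high
```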

Page 96: data == code | LRUG April 2008
Page 97: data == code | LRUG April 2008

fund = Fund.new 'rufferllp/ruffer/RZBST'

Page 98: data == code | LRUG April 2008

Fund.morph_methods

Page 99: data == code | LRUG April 2008

["_52w_high", "_52w_high=", "_52w_low", "_52w_low=",
 "change", "change=", "date", "date=",
 "gross_yield", "gross_yield=", "id_name", "id_name=",
 "listed_yield", "listed_yield=", "name", "name=",
 "nav_gbx", "nav_gbx=", "net_yield", "net_yield=",
 "percentage_change", "percentage_change=", "time",
 "time=", "url", "url="]

Page 100: data == code | LRUG April 2008

fund.save

Page 101: data == code | LRUG April 2008

Fund.restore 'rufferllp/ruffer/RZBST#2008-04-14'

Page 102: data == code | LRUG April 2008

#<Fund:0x1857414 @percentage_change="+0.96", @gross_yield="-",
  @id_name="rufferllp/ruffer/RZBST#2008-04-14", @net_yield="-",
  @bid_offer=false, @date="2008-04-14", @_52w_low="142.38",
  @listed_yield="-", @time="23:00:14",
  @name="Ruffer CF Baker Steel Gold O Acc NAV", @nav_gbx="183.90",
  @url="rufferllp/ruffer/RZBST", @change="+1.74", @_52w_high="209.88">

Page 103: data == code | LRUG April 2008
Page 104: data == code | LRUG April 2008

Future features?

Page 105: data == code | LRUG April 2008

identify data types

e.g. integer, date, string
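
As a rough idea of what type identification could look like, here is a sketch that guesses a type from a scraped string. `guess_type` is a hypothetical helper, not part of Morph or Pottery; the `:float` case is an addition made here because the scraped prices are decimals, and only ISO-style dates are recognised.

```ruby
require 'date'

# guess_type is a hypothetical helper, not part of Morph or Pottery.
def guess_type(text)
  value = text.strip
  return :integer if value =~ /\A[+-]?\d+\z/
  return :float   if value =~ /\A[+-]?\d+\.\d+\z/
  if value =~ /\A\d{4}-\d{2}-\d{2}\z/ &&
     Date.valid_date?(*value.split('-').map(&:to_i))
    return :date
  end
  :string   # fallback for anything unrecognised
end

puts guess_type('21')           # integer
puts guess_type('183.90')       # float
puts guess_type('2008-04-14')   # date
puts guess_type('37signals')    # string
```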

Page 106: data == code | LRUG April 2008

generate Rails generator line

e.g. script/generate model x:string y:integer
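
Building such a line from a set of attribute names and types is mostly string joining. In this sketch `generator_line` is a hypothetical helper, and the model name `fund` is an assumption, added because the Rails generator expects the model name before the fields.

```ruby
# generator_line is a hypothetical helper; model and column names are examples.
def generator_line(model, columns)
  fields = columns.map { |name, type| "#{name}:#{type}" }
  (["script/generate model #{model}"] + fields).join(' ')
end

puts generator_line('fund', 'x' => :string, 'y' => :integer)
# script/generate model fund x:string y:integer
```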

Page 107: data == code | LRUG April 2008

generate doodle definition!

Page 108: data == code | LRUG April 2008

data == code

http://code.whytheluckystiff.net/hpricot

http://github.com/lazyatom/soup

http://github.com/robmckinnon/morph

http://github.com/robmckinnon/pottery