Writing a Crawler with Python and TDD

Thursday, July 21, 2011

The Problem

CAP

Labs Legacy

Gremlins No Manuals

Labs Legacy

xml

data model?

rdf

How do I verify it?

What to expect

• The catalog and how it works

• The process I used

• The libraries I used

• The tools I used

• The result

The Catalog

The program

The Program / Installation

$ python setup.py install

The Development

import lxml
import re
import urllib2
import setuptools

Tools

virtualenv
pip
nose

Test Driven Development

mantras

5min
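A minimal sketch of one short test-first step for a crawler, assuming Python 3 and only the standard library. The names `extract_links`, `HREF_RE`, `TestExtractLinks` and `run_tests` are hypothetical and do not come from the talk, and the regular expression is a deliberately naive stand-in for a real parser such as lxml.

```python
import re
import unittest

# Naive pattern: captures the href value of an <a> tag, in single or
# double quotes. Hypothetical helper; a real crawler would use a parser.
HREF_RE = re.compile(r'<a\s[^>]*?href=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_links(html):
    """Return the href values of all anchors, in document order."""
    return HREF_RE.findall(html)


class TestExtractLinks(unittest.TestCase):
    # Written first: the tests describe the behavior before the code exists.
    def test_no_anchors_gives_empty_list(self):
        self.assertEqual([], extract_links("<p>nothing here</p>"))

    def test_finds_links_in_order(self):
        html = '<a href="/a">A</a> <a class="x" href=\'/b\'>B</a>'
        self.assertEqual(["/a", "/b"], extract_links(html))


def run_tests():
    """Run the test case in-process and return the unittest result."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestExtractLinks)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` after every small change keeps the feedback loop short; the same class is also picked up unchanged by `nosetests`.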

pip

$ pip install SomePackage

$ pip uninstall SomeUnwantedPackage

virtualenv

$ virtualenv env
$ ls -1 env/bin/
env/bin/pip
env/bin/python
...

$ env/bin/pip install nose
$ env/bin/pip install lxml

$ env/bin/pip freeze > stable-req.txt
$ cat stable-req.txt

$ env/bin/pip install -r stable-req.txt
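On current Python 3 releases the standard library's `venv` module covers the same ground as the `virtualenv` command. A minimal sketch, assuming Python 3; the helper name `make_env` is hypothetical, and `with_pip=False` is chosen here only so the example stays fast and works offline.

```python
import os
import tempfile
import venv


def make_env(parent_dir, name="env"):
    """Create an isolated environment under parent_dir and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # with_pip=False skips bootstrapping pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = make_env(tmp)
        # List the top-level entries of the freshly created environment.
        print(sorted(os.listdir(path)))
```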

nose

import unittest

class TestSomething(unittest.TestCase):
    def test_math_should_work(self):
        self.assertEquals(3, 1+2)

if __name__ == '__main__':
    unittest.main()

from nose.tools import assert_equals

class TestSomething:
    def test_math_should_work(_):
        assert_equals(3, 1+2)

Standard xUnit Library

Nose

Wordiness

• Nose:

$ nosetests

• Python <= 2.6: not supported

• Python >= 2.7:

$ python -m unittest discover

Automatic Discovery
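Discovery can also be driven from Python through `unittest.TestLoader.discover`. A minimal sketch, assuming Python 3; the helper names `run_discovery` and `demo` and the throwaway `test_sample.py` module are made up for illustration.

```python
import os
import tempfile
import textwrap
import unittest


def run_discovery(start_dir, pattern="test*.py"):
    """Discover and run every matching test module under start_dir."""
    suite = unittest.TestLoader().discover(start_dir, pattern=pattern)
    return unittest.TextTestRunner(verbosity=0).run(suite)


def demo():
    """Write one throwaway test module to a temp dir and discover it."""
    source = textwrap.dedent('''
        import unittest

        class TestSomething(unittest.TestCase):
            def test_math_should_work(self):
                self.assertEqual(3, 1 + 2)
    ''')
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "test_sample.py"), "w") as handle:
            handle.write(source)
        return run_discovery(tmp)
```

The default pattern `test*.py` is the one the command line uses, so files named that way need no registration.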

• @SkipTest #function decorator

• raise SkipTest() #within the code

Nose - SkipTest
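Since Python 2.7 the standard library offers both forms as well: `unittest.skip` as a decorator and `unittest.SkipTest` as an exception. A minimal sketch using only `unittest`, so it runs without nose installed; the class name, test names and skip reasons are invented for illustration.

```python
import unittest


class TestCrawlerSkips(unittest.TestCase):
    @unittest.skip("not implemented yet")  # decorator form
    def test_skipped_by_decorator(self):
        self.fail("never runs")

    def test_skipped_inside_the_body(self):
        # Exception form: skip once a precondition turns out to be missing.
        raise unittest.SkipTest("needs network access")

    def test_math_should_work(self):
        self.assertEqual(3, 1 + 2)


def run_skips():
    """Run the test case in-process and return the unittest result."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestCrawlerSkips)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Skipped tests are reported separately from failures, so a partly skipped run still counts as successful.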

$ curl http://google.com/opensearch | xmllint --format -

xmllint and curl
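The formatting half of that pipeline can also be done from Python. A minimal sketch using the standard library's `xml.dom.minidom` rather than lxml, so it runs without extra packages; the function name `pretty_xml` and the sample document are hypothetical.

```python
from xml.dom import minidom


def pretty_xml(text, indent="  "):
    """Re-indent an XML string, one element per line."""
    document = minidom.parseString(text)
    pretty = document.toprettyxml(indent=indent)
    # toprettyxml can emit whitespace-only lines; drop them.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


if __name__ == "__main__":
    print(pretty_xml("<feed><entry><title>one</title></entry></feed>"))
```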

Vim 7.3

• Customizable:

:nnoremap ,t :wa \| !nosetests

• Omni-completion: C-x C-o

• Colored Column:

:set cc+=1

• Two simple commands for indenting XMLs:

:%s/></>\r</g

gg=G

Recommended