Writing a Crawler with Python and TDD 1 Thursday, July 21, 2011

Writing a Crawler with Python and TDD

Page 1: Writing a Crawler with Python and TDD

Writing a Crawler with Python and TDD

1Thursday, July 21, 2011

Page 2: Writing a Crawler with Python and TDD

The Problem

Page 3: Writing a Crawler with Python and TDD

CAP

Page 4: Writing a Crawler with Python and TDD

Page 5: Writing a Crawler with Python and TDD

Labs Legacy

Page 6: Writing a Crawler with Python and TDD

Gremlins No Manuals

Page 7: Writing a Crawler with Python and TDD

Labs Legacy

xml data model?

rdf

Page 8: Writing a Crawler with Python and TDD

How do I verify it?

Page 9: Writing a Crawler with Python and TDD

What to expect

• The catalog and how it works

• The process I used

• The libraries I used

• The tools I used

• The result

Page 10: Writing a Crawler with Python and TDD

The Catalog

Page 18: Writing a Crawler with Python and TDD

The program

Page 20: Writing a Crawler with Python and TDD

The Program / Installation

$ python setup.py install

Page 21: Writing a Crawler with Python and TDD

The Development

Page 22: Writing a Crawler with Python and TDD

import lxml
import re
import urllib2
import setuptools

Tools

virtualenv
pip
nose
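To make the role of these libraries concrete, here is a minimal sketch of the link-extraction step that any crawler needs. It is not the talk's code: it assumes Python 3 and swaps in the standard library's `html.parser` and `urllib.parse` for `lxml` and `urllib2`. The names `LinkCollector` and `extract_links` and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs found in <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the links of one page, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Sample page, invented for illustration; no network access is needed.
page = (
    '<html><body>'
    '<a href="/catalog?page=2">next</a> '
    '<a href="http://example.org/x">x</a>'
    '</body></html>'
)
links = extract_links(page, "http://example.com/catalog")
print(links)
```

Keeping the parsing step free of network calls makes it easy to test in isolation, which suits a test-driven workflow.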

Page 23: Writing a Crawler with Python and TDD

Test Driven Development

mantras

5min

Page 24: Writing a Crawler with Python and TDD

pip

Page 25: Writing a Crawler with Python and TDD

$ pip install SomePackage

$ pip uninstall SomeUnwantedPackage

Page 26: Writing a Crawler with Python and TDD

virtualenv

Page 27: Writing a Crawler with Python and TDD

$ virtualenv env
$ ls -1 env/bin/
env/bin/pip
env/bin/python
...

Page 28: Writing a Crawler with Python and TDD

$ env/bin/pip install nose
$ env/bin/pip install lxml

$ env/bin/pip freeze > stable-req.txt
$ cat stable-req.txt

$ env/bin/pip install -r stable-req.txt
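For readers curious what ends up in a file like `stable-req.txt`, the following sketch prints `name==version` lines for the packages visible to the running interpreter. It only approximates `pip freeze` (it ignores editable installs and similar cases), assumes Python 3.8 or later for `importlib.metadata`, and the function name `frozen_requirements` is hypothetical.

```python
from importlib import metadata


def frozen_requirements():
    """Return 'name==version' lines for installed distributions.

    A rough approximation of `pip freeze`, not a replacement for it.
    """
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with unreadable metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


requirements = frozen_requirements()
print("\n".join(requirements))
```

Run inside the virtualenv's interpreter (`env/bin/python`), this lists only what was installed into that environment, which is the point of pinning versions in a requirements file.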

Page 29: Writing a Crawler with Python and TDD

nose

Page 30: Writing a Crawler with Python and TDD

Wordiness

Standard xUnit Library

import unittest

class TestSomething(unittest.TestCase):
    def test_math_should_work(self):
        self.assertEqual(3, 1 + 2)

if __name__ == '__main__':
    unittest.main()

Nose

from nose.tools import assert_equals

class TestSomething:
    def test_math_should_work(_):
        assert_equals(3, 1 + 2)

Page 31: Writing a Crawler with Python and TDD

Automatic Discovery

• Nose:

$ nosetests

• unittest on Python <= 2.6: not supported

• unittest on Python >= 2.7:

$ python -m unittest discover
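Discovery can also be driven from Python rather than from the shell. The sketch below assumes Python 3 and uses only the standard `unittest` loader. It writes a throwaway test module into a temporary directory, lets the loader find it by file name, and runs it. The file name `test_sample_discovery.py` and the helper `discover_and_run` are invented for illustration.

```python
import io
import os
import tempfile
import unittest

# Source of a throwaway test module, mirroring the example test from the talk.
TEST_SOURCE = '''
import unittest

class TestSomething(unittest.TestCase):
    def test_math_should_work(self):
        self.assertEqual(3, 1 + 2)
'''


def discover_and_run(start_dir):
    """Find every test*.py module under start_dir and run the tests in it."""
    suite = unittest.TestLoader().discover(start_dir, pattern="test*.py")
    # Send the runner's report to an in-memory buffer to keep output quiet.
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "test_sample_discovery.py"), "w") as handle:
        handle.write(TEST_SOURCE)
    result = discover_and_run(tmp)

print(result.testsRun, result.wasSuccessful())
```

The loader selects modules purely by file-name pattern, so test files are picked up without being registered anywhere.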

Page 32: Writing a Crawler with Python and TDD

Nose - SkipTest

• @SkipTest  # function decorator

• raise SkipTest()  # within the code
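The two skipping styles can be sketched with the standard library alone. This example assumes Python 3 and uses `unittest.skip` and `unittest.SkipTest` as stand-ins for the nose names on the slide. The test class and the `network_available` flag are hypothetical.

```python
import io
import unittest


class TestCrawler(unittest.TestCase):
    @unittest.skip("decorator form: the whole test is skipped")
    def test_not_ready_yet(self):
        self.fail("never runs")

    def test_needs_network(self):
        network_available = False  # hypothetical precondition
        if not network_available:
            # Exception form: skip from within the test body.
            raise unittest.SkipTest("no network in this environment")
        self.fail("never reached")

    def test_math_should_work(self):
        self.assertEqual(3, 1 + 2)


suite = unittest.TestLoader().loadTestsFromTestCase(TestCrawler)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)

# Each entry in result.skipped is a (test, reason) pair.
for test, reason in result.skipped:
    print(test.id(), "->", reason)
```

Skipped tests are reported separately and do not make the run fail, so a crawler's network-dependent tests can be parked without hiding them.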

Page 33: Writing a Crawler with Python and TDD

xmllint and curl

$ curl http://google.com/opensearch | xmllint --format -
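The same pretty-printing step can be approximated from Python when `xmllint` is not at hand. This is a sketch assuming Python 3 and the standard `xml.dom.minidom` module. The XML string is an invented sample rather than a real response, and `pretty_xml` is a hypothetical helper name.

```python
import xml.dom.minidom

# Invented single-line XML, standing in for what curl would download.
raw = (
    '<?xml version="1.0"?>'
    '<OpenSearchDescription>'
    '<ShortName>Example</ShortName>'
    '<Url type="text/html" template="http://example.com/?q={searchTerms}"/>'
    '</OpenSearchDescription>'
)


def pretty_xml(text):
    """Re-indent an XML document, one element per line."""
    return xml.dom.minidom.parseString(text).toprettyxml(indent="  ")


formatted = pretty_xml(raw)
print(formatted)
```

Having the document indented makes it much easier to see which elements and attributes the crawler has to extract.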

Page 34: Writing a Crawler with Python and TDD

Vim 7.3

• Customizable:

:nnoremap ,t :wa \| !nosetests

• Omni-completion: C-x C-o

• Colored Column:

:set cc+=1

• Two simple commands for indenting XML:

:%s/></>\r</g

gg=G
