andrea-francia
Writing a Crawler with Python and TDD
Thursday, July 21, 2011
The Problem
CAP
Labs Legacy
Gremlins No Manuals
Labs Legacy
xml
data model?
rdf
How do I verify it?
What to expect
• The catalog and how it works
• The process I used
• The libraries I used
• The tools I used
• The result
The Catalog
The program
The Program / Usage
$ list-datasets http://catalog.org/description
Description at: http://catalog.org/description
Search url is: http://catalog.org/rdf/?count=&q=
Total number of records: 10504
Url for all records: http://catalog.org/rdf/?count=10234&q=
https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_1.TIF
https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_17.TIF
https://catalog.org/data/SRTM_CGIAR_JRC_v4/Z_10_19.TIF
...
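Judging by the sample output, the URL for all records looks like the search URL with a record count filled into its empty count parameter. A minimal sketch of that one step follows. The function name url_for_all_records is hypothetical, not taken from the program, and the sketch uses Python 3's urllib.parse rather than the urllib2 module listed later in the slides.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def url_for_all_records(search_url, total):
    """Return search_url with its 'count' query parameter set to total.

    Hypothetical helper: the parameter name 'count' is taken from the
    sample output; everything else is an assumption for illustration.
    """
    parts = urlsplit(search_url)
    # keep_blank_values=True preserves empty parameters such as 'q='
    query = parse_qsl(parts.query, keep_blank_values=True)
    query = [(key, str(total) if key == "count" else value)
             for key, value in query]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(url_for_all_records("http://catalog.org/rdf/?count=&q=", 10234))
```

Rebuilding the query from parsed pairs, instead of using string replacement, keeps the other parameters and their order untouched.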
The Program / Installation
$ python setup.py install
The Development
import lxml
import re
import urllib2
import setuptools
Tools
virtualenv
pip
nose
Test Driven Development
mantras
5min
pip
$ pip install SomePackage
$ pip uninstall SomeUnwantedPackage
virtualenv
$ virtualenv env
$ ls -1 env/bin/
env/bin/pip
env/bin/python
...
$ env/bin/pip install nose
$ env/bin/pip install lxml
$ env/bin/pip freeze > stable-req.txt
$ cat stable-req.txt
$ env/bin/pip install -r stable-req.txt
nose
Wordiness

Standard xUnit Library

import unittest

class TestSomething(unittest.TestCase):
    def test_math_should_work(self):
        self.assertEquals(3, 1+2)

if __name__ == '__main__':
    unittest.main()

Nose

from nose.tools import assert_equals

class TestSomething:
    def test_math_should_work(_):
        assert_equals(3, 1+2)
Automatic Discovery
• Nose:
$ nosetests
• Python <= 2.6: not supported by the standard unittest module
• Python >= 2.7:
$ python -m unittest discover
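Discovery in the standard library can also be driven from Python through unittest.TestLoader.discover. The sketch below is an illustration under assumptions: it writes a throwaway test module (the file name test_math_demo.py is made up) into a temporary directory, discovers it, and runs it. It targets Python 3.

```python
import os
import tempfile
import textwrap
import unittest

# Source of a throwaway test module, written to disk only for this demo.
TEST_SOURCE = textwrap.dedent("""
    import unittest

    class TestSomething(unittest.TestCase):
        def test_math_should_work(self):
            self.assertEqual(3, 1 + 2)
""")


def discover_and_run(start_dir):
    """Discover test_*.py files under start_dir and run them."""
    suite = unittest.TestLoader().discover(start_dir, pattern="test_*.py")
    result = unittest.TestResult()
    suite.run(result)
    return result


def demo():
    """Create one test file in a temporary directory and run discovery on it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "test_math_demo.py"), "w") as handle:
            handle.write(TEST_SOURCE)
        return discover_and_run(tmp)


if __name__ == "__main__":
    outcome = demo()
    print("tests run:", outcome.testsRun, "successful:", outcome.wasSuccessful())
```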
Nose - SkipTest
• @SkipTest  # function decorator
• raise SkipTest()  # within the code
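Both skipping styles can be sketched with the standard library alone. This example uses unittest (Python 2.7 and later) instead of nose, with unittest.skip as the decorator and unittest.SkipTest as the exception. The test names and skip reasons are invented for illustration.

```python
import unittest


class TestSkipping(unittest.TestCase):

    @unittest.skip("not implemented yet")
    def test_skipped_by_decorator(self):
        # The body never runs because the decorator skips the test first.
        self.fail("should not be reached")

    def test_skipped_from_within(self):
        # Raising SkipTest marks the test as skipped at the point of the raise.
        raise unittest.SkipTest("external catalog not reachable")

    def test_math_should_work(self):
        self.assertEqual(3, 1 + 2)


def run_suite():
    """Run TestSkipping and return the collected TestResult."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestSkipping)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_suite()
    for test, reason in outcome.skipped:
        print("skipped:", test.id(), "-", reason)
```

Skipped tests are reported separately from failures, so the suite still counts as successful.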
xmllint and curl
$ curl http://google.com/opensearch | xmllint --format -
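When xmllint is not at hand, a similar re-indentation can be approximated from Python. The sketch below uses the standard library's xml.dom.minidom rather than lxml. The helper name pretty_xml and the sample document are assumptions for illustration, and minidom's layout is not guaranteed to match xmllint's byte for byte.

```python
from xml.dom import minidom


def pretty_xml(xml_text, indent="  "):
    """Return xml_text re-indented, one element per line where possible."""
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


if __name__ == "__main__":
    # Made-up sample document, compressed onto a single line.
    sample = "<catalog><dataset><name>demo</name></dataset></catalog>"
    print(pretty_xml(sample))
```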
Vim 7.3
• Customizable:
:nnoremap ,t :wa \| !nosetests
• Omni-completion: C-x C-o
• Colored Column:
:set cc+=1
• Two simple commands for indenting XMLs:
:%s/></>\r</g
gg=G