GC3: Grid Computing Competence Center Introduction to Python programming, II (with a hint of MapReduce) Riccardo Murri Grid Computing Competence Center, University of Zurich Oct. 10, 2012

Today’s class

Explain more Python constructs and semantics by looking at John Arley Burns’ MapReduce in 98 lines of Python.

These slides are available for download from: http://www.gc3.uzh.ch/teaching/lsci2012/lecture03.pdf

LSCI2012 Python II Oct. 10, 2012


References

See the course website for an extensive and commented list.

– Dean, J., and Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters, OSDI’04

– Greiner, J., Wong, S.: Distributed Parallel Processing with MapReduce

– Carter, J.: Simple MapReduce with Ruby and Rinda


What is MapReduce?

MapReduce is:

1. a programming model

2. an associated implementation

Both are important!!


MapReduce

The Map function processes a key/value pair to produce intermediate key/value pairs.

Image source: Greiner, J., Wong, S.: Distributed Parallel Processing with MapReduce


MapReduce

The Reduce function merges all intermediate values associated with a given key.

Image source: Greiner, J., Wong, S.: Distributed Parallel Processing with MapReduce


MapReduce: advantages of the model

Programs written in this style are automatically parallelized and executed on a large cluster of machines ...

Quoted from: Dean and Ghemawat: MapReduce: Simplified Data Processing on Large Clusters


Example: word count

Input is a text file, to be split at line boundaries.

Image source: http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/


Example: word count

The Map function scans an input line and outputs a pair (word, 1) for each word in the text line.

Image source: http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/


Example: word count

The pairs are shuffled and sorted so that each reducer gets all pairs (word, 1) with the same word part.

Image source: http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/


Example: word count

The Reduce function gets all pairs (word, 1) with the same word part, and outputs a single pair (word, count) where count is the number of input items received.

Image source: http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/


Example: word count

The global output is a list of pairs (word, count) where count is the number of occurrences of word in the input text.

Image source: http://blog.jteam.nl/2009/08/04/introduction-to-hadoop/
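The word count steps above can be sketched in a few lines of plain Python. This is only an illustration of the programming model running in a single process; the function names (map_fn, shuffle, reduce_fn, word_count) are chosen for this sketch and do not come from any MapReduce library.

```python
import re
from collections import defaultdict

def map_fn(line):
    """Emit a (word, 1) pair for each word in one input line."""
    for word in re.findall(r"[a-z0-9]+", line.lower()):
        yield (word, 1)

def shuffle(pairs):
    """Group intermediate values by key, as the shuffle/sort step does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(word, counts):
    """Merge all values for one key into a single (word, count) pair."""
    return (word, sum(counts))

def word_count(text):
    # split the input at line boundaries, then map, shuffle and reduce
    pairs = [pair for line in text.splitlines() for pair in map_fn(line)]
    return sorted(reduce_fn(w, c) for w, c in shuffle(pairs).items())
```

A real implementation would run many map and reduce calls in parallel on different machines; here everything happens in one loop.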


MapReduce: features of the implementation

The run-time system takes care of the details:
– partitioning the input data,
– scheduling the program execution,
– handling machine failures,
– managing the required inter-machine communication.

Quoted from: Dean and Ghemawat: MapReduce: Simplified Data Processing on Large Clusters


These are all highly nontrivial tasks to handle!

The quality of a MapReduce implementation should be judged by how effective it is at handling the non-Map/Reduce part.


Back to Python!

mapreduce.py by John Arley Burns is a simple Python class that simulates running a MapReduce algorithm using in-memory data structures.

A MapReduce algorithm is specified by subclassing the MapReduce class and overriding methods to provide the Split, Map, and Reduce functions.

(There’s no Partition/Shuffle function because all the data is kept in memory and sorted there, so no locality issues.)


    import operator
    import re

    from mapreduce import MapReduce

    class WordCount(MapReduce):

        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data

        def split_fn(self, data):
            # one (key, value) pair per input line; the key is unused
            def line_to_tuple(line):
                return (None, line)
            data_list = [
                line_to_tuple(line)
                for line in data.splitlines() ]
            return data_list

        def map_fn(self, key, value):
            # emit (word, 1) for every non-empty word in the line
            for word in re.split(r'\W+', value.lower()):
                bareword = re.sub(r"[^A-Za-z0-9]*", r"", word)
                if len(bareword) > 0:
                    yield (bareword, 1)

        def reduce_fn(self, word, count_list):
            return [(word, sum(count_list))]

        def output_fn(self, output_list):
            # print the pairs sorted by count
            sorted_list = sorted(output_list, key=operator.itemgetter(1))
            for word, count in sorted_list:
                print(word, count)

The word count example using mapreduce.py


Importing modules

    import re
    from mapreduce import MapReduce

    class WordCount(MapReduce):
        # ...
        def map_fn(self, key, value):
            for word in re.split(...):
                bareword = re.sub(...)
                if len(bareword) > 0:
                    yield (bareword, 1)
        # ...

This imports the re (regular expressions) module.

All names defined in that module are now visible under the re namespace, e.g., re.sub, re.split.


Importing names

    import re
    from mapreduce import MapReduce

    class WordCount(MapReduce):
        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data
        # ...

This imports the MapReduce name, defined in the mapreduce module, into this module’s namespace.

So you need not use a prefix to qualify it.


Defining objects

    class WordCount(MapReduce):
        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data
        # ...

The class keyword starts the definition of a class (in the OOP sense).

The class definition is indented.


Inheritance

    class WordCount(MapReduce):
        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data
        # ...

This tells Python that the WordCount class inherits from the MapReduce class.

Every class should inherit from some other class; the root of the class hierarchy is the built-in object class. (In Python 3, a class with no listed parent inherits from object implicitly.)


Declaring methods

    class WordCount(MapReduce):
        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data
        # ...

A method declaration looks exactly like a function definition.

Every method must have at least one argument, conventionally named self.

(Why the double underscore? More on this later!)


The self argument

    class WordCount(MapReduce):
        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data
        # ...

self is a reference to the object instance (like, e.g., this in Java).

It is used to access attributes and invoke methods of the instance itself.


The self argument

Every method of a Python object always has self as first argument.

However, you do not specify it when calling a method: it’s automatically inserted by Python:

    >>> class ShowSelf(object):
    ...     def show(self):
    ...         print(self)
    ...
    >>> x = ShowSelf() # construct instance
    >>> x.show() # 'self' automatically inserted!
    <__main__.ShowSelf object at 0x299e150>

The self variable is a reference to the object instance itself. You need to use self when accessing methods or attributes of this instance.


The self argument

    class WordCount(MapReduce):
        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data
        # ...

Q: (1)

Why is the data identifier qualified with the self namespace?


The self argument

    class WordCount(MapReduce):
        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data
        # ...

Q: (2)

Why do we explicitly write self in the call MapReduce.__init__(self)?


Name resolution rules

Within a function/method body, names are resolved according to the LEGB rule:

L  Local scope: any names defined in the current function;

E  Enclosing function scope: names defined in enclosing functions (outermost last);

G  Global scope: names defined in the toplevel of the current module;

B  Built-in names (i.e., Python’s builtins module).

Any name that is not in one of the above scopes must be qualified.

So you have to write self.data to access an attribute of this instance, re.sub to mean a function defined in module re, MapReduce.__init__ to reference a method defined in the MapReduce class, etc.
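A small sketch of the LEGB lookup order may help here. All names below (greeting, outer, inner, and so on) are made up for the illustration.

```python
greeting = "global"          # G: module-level name

def outer():
    greeting = "enclosing"   # local to outer(), enclosing scope for the nested functions

    def inner():
        greeting = "local"   # L: local to inner(), found first
        return greeting

    def inner_no_local():
        return greeting      # E: not local, so found in the enclosing function

    return inner(), inner_no_local()

def uses_global():
    return greeting          # G: neither local nor enclosing, so the module-level name

def uses_builtin():
    return len("abc")        # B: `len` is only found in the built-in scope
```

Each function returns the value bound in the first scope where the name is found, which is why the same identifier yields three different strings.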


Object attributes

A Python object is (in particular) a key-value mapping: attributes (keys) are valid identifiers, values can be any Python object.

Any object has attributes, which you can access (create, read, overwrite) using the dot notation:

    # create or overwrite the 'name' attribute of 'w'
    w.name = "Joe"

    # get the value of 'w.name' and print it
    print(w.name)

So, in the constructor you create the required instance attributes using self.var = ...

Note: methods are attributes, too!


No access control

There are no “public”/“private”/etc. qualifiers for object attributes.

Any code can create/read/overwrite/delete any attribute on any object.

There are conventions, though:

– “protected” attributes: _name

– “private” attributes: __name

(But again, note that this is not enforced by the system in any way.)
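As an illustration of these conventions, here is a short sketch with a hypothetical Account class. The one thing Python does with a double-underscore attribute is rename it (name mangling); this is not access control, and outside code can still reach it.

```python
class Account(object):
    def __init__(self, owner, balance):
        self.owner = owner          # ordinary attribute
        self._balance = balance     # "protected" by convention only
        self.__pin = "0000"         # "private": stored under a mangled name

acct = Account("Joe", 100)
acct._balance = 0                   # allowed: nothing stops outside code
mangled_pin = acct._Account__pin    # the double-underscore name is merely renamed
```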


Class attributes, I

Classes are Python objects too, hence they can have attributes.

Class attributes can be created with the variable assignment syntax in a class definition block:

    class A(object):
        class_attr = value
        def __init__(self):
            # ...

Class attributes are shared among all instances of the same class!
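To make the sharing concrete, here is a minimal sketch; the Counter class is invented for this example.

```python
class Counter(object):
    instances = 0                  # class attribute, shared by all instances

    def __init__(self):
        Counter.instances += 1     # update the shared value on the class itself

a = Counter()
b = Counter()
```

Both a.instances and b.instances now read the same value from the class. Assigning a.instances = ... would instead create a separate instance attribute on a, leaving the class attribute untouched.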


Class attributes, II

Methods are class attributes, too.

However, looking up a method attribute on an instance returns a bound method, i.e., one for which self is automatically inserted.

Looking up the same method on a class returns an unbound method, which is just like a regular function, i.e., you must pass self explicitly. (In Python 3 the class lookup returns the plain function itself.)
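The two lookups can be compared side by side. The Greeter class below is a made-up example; the code runs on Python 3, where the class lookup yields a plain function.

```python
class Greeter(object):
    def greet(self):
        return "hello from " + type(self).__name__

g = Greeter()
bound = g.greet                # looked up on the instance: `self` is already bound
via_instance = bound()         # no argument needed
via_class = Greeter.greet(g)   # looked up on the class: pass `self` explicitly
```

This is also why the WordCount constructor writes MapReduce.__init__(self): the method is looked up on the class, so the instance has to be passed by hand.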


Constructors, I

    class WordCount(MapReduce):
        def __init__(self, data):
            MapReduce.__init__(self)
            self.data = data
        # ...

The __init__ method is the instance constructor.

It should never return any value (other than None).


Constructors, II

The __init__ method is the instance constructor. It should never return any value (other than None).

However, you call a constructor by class name:

    # make 'wc' an instance of 'WordCount'
    wc = WordCount("some text")

(Again, note that the self part is automatically inserted by Python.)


No overloading

Python does not allow overloading of functions.

Any function.

Hence, no overloading of constructors.

So: a class can have one and only one constructor.
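A short sketch of what "no overloading" means in practice: a second def with the same name simply rebinds the name. One common workaround, shown here with an invented area function, is a default argument value.

```python
def area(width):
    return width * width

def area(width, height=None):      # rebinds `area`; the first definition is gone
    if height is None:             # emulate the one-argument form with a default
        height = width
    return width * height
```

The same technique applies to __init__: optional arguments with defaults take the place of several overloaded constructors.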


Constructor chaining

When a class is instantiated, Python only calls the first constructor it can find in the class inheritance call-chain.

If you need to call a superclass’ constructor, you need to do it explicitly:

    class WordCount(MapReduce):
        def __init__(self, ...):
            # do WordCount-specific stuff here
            MapReduce.__init__(self, ...)
            # some more WordCount-specific stuff

Calling a superclass constructor is optional, and it can happen anywhere in the __init__ method body.
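The difference between chaining and not chaining can be seen in a small sketch. The class names Base, Silent and Chained are hypothetical.

```python
class Base(object):
    def __init__(self):
        self.log = ["Base.__init__"]

class Silent(Base):
    def __init__(self):
        # Base.__init__ is NOT called: only this constructor runs
        self.log = ["Silent.__init__"]

class Chained(Base):
    def __init__(self):
        Base.__init__(self)                 # explicit call to the superclass constructor
        self.log.append("Chained.__init__")
```

Instantiating Silent never runs Base.__init__, while Chained runs both, in the order written in its __init__ body.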


Multiple-inheritance

Python allows multiple inheritance.

Just list all the parent classes:

    class C(A, B):
        # class definition

With multiple inheritance, it is your responsibility to call all the needed superclass constructors.

Python uses the C3 algorithm to determine the call precedence in an inheritance chain.

You can always query a class for its “method resolution order”, via the __mro__ attribute:

    >>> C.__mro__
    (<class 'ex.C'>, <class 'ex.A'>, <class 'ex.B'>, <type 'object'>)
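A minimal sketch of how the method resolution order decides which method is called; the classes and the who method are invented for the example.

```python
class A(object):
    def who(self):
        return "A"

class B(object):
    def who(self):
        return "B"

class C(A, B):      # A is listed first, so A.who takes precedence
    pass

# names of the classes in the order Python searches them
mro_names = [cls.__name__ for cls in C.__mro__]
```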


Nested functions

    import re

    class WordCount(MapReduce):
        # ...
        def split_fn(self, data):
            def line_to_tuple(line):
                return (None, line)
            data_list = [
                line_to_tuple(line)
                for line in data.splitlines() ]
            return data_list
        # ...

You can define functions (and classes) within functions.

The nested functions are only visible within the enclosing function.

(But they can capture any variable from the enclosing function environment by name.)
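The capture-by-name behaviour can be sketched as follows; make_prefixer and add_prefix are names made up for this illustration.

```python
def make_prefixer(prefix):
    def add_prefix(word):          # nested: only visible inside make_prefixer
        return prefix + word       # captures `prefix` from the enclosing call
    return add_prefix              # the function object can still be handed out

dotted = make_prefixer(".")
```

The name add_prefix does not exist at module level, but the returned function keeps access to the prefix value it captured.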


List comprehensions, I

    class WordCount(MapReduce):
        # ...
        def split_fn(self, data):
            def line_to_tuple(line):
                return (None, line)
            data_list = [
                line_to_tuple(line)
                for line in data.splitlines() ]
            return data_list
        # ...

Q: What is the bracketed expression assigned to data_list?


An easy exercise

A dotfile is a file whose name starts with a dot character “.”.

How can you list the full pathname of all dotfiles in a given directory?

(The Python library call for listing the entries in a directory is os.listdir(), which returns a list of file names.)


A very basic solution

Use a for loop to accumulate the results into a list:

    dotfiles = [ ]
    for entry in os.listdir(path):
        if entry.startswith('.'):
            dotfiles.append(os.path.join(path, entry))


List comprehensions, II

Python has a better and more compact syntax for filtering elements of a list and/or applying a function to them:

    dotfiles = [ os.path.join(path, entry)
                 for entry in os.listdir(path)
                 if entry.startswith('.') ]

This is called a list comprehension.


List comprehensions, III

The general syntax of a list comprehension is:

[ expr for var in iterable if condition ]

where:

expr is any Python expression;

iterable is a (generalized) sequence;

condition is a boolean expression, depending on var;

var is a variable that will be bound in turn to each item in iterable which satisfies condition.

The ‘if condition’ part is optional.
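Here is a runnable sketch of the dotfiles exercise written as a list comprehension. The helper name list_dotfiles and the throw-away directory contents are assumptions made for the demonstration.

```python
import os
import tempfile

def list_dotfiles(path):
    """Return the full pathnames of all entries in `path` that start with a dot."""
    return [os.path.join(path, entry)        # expr
            for entry in os.listdir(path)    # var in iterable
            if entry.startswith('.')]        # condition

# try it on a temporary directory with two dotfiles and one regular file
demo_dir = tempfile.mkdtemp()
for name in ('.bashrc', '.profile', 'notes.txt'):
    open(os.path.join(demo_dir, name), 'w').close()

found = sorted(os.path.basename(p) for p in list_dotfiles(demo_dir))
```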


Generator expressions

List comprehensions are a special case of generator expressions:

    ( expr for var in iterable if condition )

A generator expression is a valid iterable and can be used to initialize tuples, sets, dicts, etc.:

    # the set of square numbers < 100
    squares = set(n*n for n in range(10))

Generator expressions are valid expressions, so they can be nested; several for clauses can also be chained:

    # cartesian product of sets A and B
    C = set( (a,b) for a in A for b in B )
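A few more uses, as a hedged sketch: the variable names are invented, and the last three lines show that a generator expression produces its values lazily, one at a time.

```python
squares = set(n * n for n in range(10))            # consumed by set()
total = sum(n * n for n in range(10))              # consumed by sum(), no list is built
pairs = set((a, b) for a in "xy" for b in (1, 2))  # two `for` clauses: all combinations

lazy = (n * n for n in range(3))   # nothing is computed yet
first = next(lazy)                 # values are produced one at a time
rest = list(lazy)                  # the remaining ones
```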


Generators

Generator expressions are a special case of generators.

A generator is like a function, except it uses yield instead of return:

    def squares():
        n = 0
        while True:
            yield n*n
            n += 1

At each iteration, execution resumes with the statement logically following yield in the generator’s execution flow.

There can be multiple yield statements in a generator.
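The resume-after-yield behaviour can be observed by pulling values out of the generator step by step. This sketch repeats the squares generator and uses itertools.islice to take a bounded slice of the infinite sequence.

```python
from itertools import islice

def squares():
    n = 0
    while True:          # infinite: the caller decides when to stop
        yield n * n
        n += 1

gen = squares()
first_two = [next(gen), next(gen)]     # runs the body up to each `yield`
next_three = list(islice(gen, 3))      # resumes right after the last `yield`
```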

Reference: http://wiki.python.org/moin/Generators

Generators in action

    class WordCount(MapReduce):
        # ...
        def map_fn(self, key, value):
            for word in re.split(r'\W+', value.lower()):
                bareword = re.sub(r"[^A-Za-z0-9]*", r"", word)
                if len(bareword) > 0:
                    yield (bareword, 1)
        # ...

The yield statement makes map_fn into a generator that returns pairs (word, 1).


The Iterator Protocol

An object can function as an iterator iff it implements a next() method (spelled __next__() in Python 3), that:

either returns the next value in the iteration,

or raises StopIteration to signal the end of the iteration.

An object can be iterated over with for if it implements an __iter__() method.

Reference: http://www.python.org/dev/peps/pep-0234/
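A minimal sketch of the protocol, using a hypothetical Countdown class. It defines __next__ for Python 3 and aliases it to next so that the same class also satisfies the Python 2 spelling.

```python
class Countdown(object):
    def __init__(self, start):
        self._current = start

    def __iter__(self):
        return self                 # the object is its own iterator

    def __next__(self):             # Python 3 spelling of the `next()` method
        if self._current <= 0:
            raise StopIteration     # signals the end of the iteration
        self._current -= 1
        return self._current + 1

    next = __next__                 # alias so the class also works on Python 2
```

A for loop calls __iter__() once and then the next method repeatedly until StopIteration is raised.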


    class WordIterator(object):

        def __init__(self, text):
            self._words = text.split()

        def next(self):
            if len(self._words) > 0:
                return self._words.pop(0)
            else:
                raise StopIteration

        def __iter__(self):
            return self

Iterate over the words in the given text: split the text at white spaces, and return the parts one by one.

Source code available at: http://www.gc3.uzh.ch/teaching/lsci2011/lecture08/worditerator.py


    class WordIterator(object):

        def __init__(self, text):
            self._words = text.split()

        def next(self):
            if len(self._words) > 0:
                return self._words.pop(0)
            else:
                raise StopIteration

        def __iter__(self):
            return self

Every class should inherit from a parent class.

If there’s no other class, inherit from the object class. (Root of the class hierarchy.)


Using iterators

Iterators can be used in a for loop:

    >>> for word in WordIterator("a nice sunny day"):
    ...     print '*'+word+'*',
    ...
    *a* *nice* *sunny* *day*

They can be composed with other iterators for effect:

    >>> for n, word in enumerate(WordIterator("a ...")):
    ...     print str(n)+':'+word,
    ...
    0:a 1:nice 2:sunny 3:day

See also: http://docs.python.org/library/itertools.html


    class WordIterator(object):

        def __init__(self, text):
            self._words = text.split()

        def next(self):
            if len(self._words) > 0:
                return self._words.pop(0)
            else:
                raise StopIteration

        def __iter__(self):
            return self

Q: What is the raise StopIteration statement?


Exceptions

Exceptions are objects that inherit from the built-in Exception class.

To create a new exception just make a new class:

    class NewKindOfError(Exception):
        """
        Do use the docstring to document
        what this error is about.
        """
        pass

Exceptions are handled by class name, so they usually do not need any new methods (although you are free to define some if needed).

See also: http://docs.python.org/library/exceptions.html


    try:
        # code that might raise an exception
    except SomeException:
        # handle some exception
    except AnotherException as ex:
        # the actual Exception instance
        # is available as variable 'ex'
    else:
        # performed on normal exit from 'try'
    finally:
        # performed on exit in any case

The optional else clause is executed if and when control flows off the end of the try clause.

The optional finally clause is executed on exit from the try or except block in any case.

Reference: http://docs.python.org/reference/compound_stmts.html#try
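The order in which the clauses run can be traced with a small sketch. The parse_int function and its events list are invented for the illustration.

```python
def parse_int(text):
    events = []
    try:
        value = int(text)
    except ValueError:
        events.append("except")      # conversion failed
        value = None
    else:
        events.append("else")        # only when `try` raised nothing
    finally:
        events.append("finally")     # always runs, with or without an error
    return value, events
```

For valid input the recorded order is else then finally; for invalid input it is except then finally.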


Raising exceptions

Use the raise statement with an Exception instance:

    if an_error_occurred:
        raise AnError("Spider sense is tingling.")

Within an except clause, you can use raise with no arguments to re-raise the current exception:

    try:
        something()
    except ItDidntWork:
        do_cleanup()
        # re-raise exception to caller
        raise


Exception handling example

Read lines from a CSV file, ignoring those that do not have the required number of fields. If other errors occur, abort. Close the file when done.

    job_state = { } # empty dict
    # open before 'try': if this fails there is no file to close
    csv_file = open('jobs.csv', 'r')
    try:
        for line in csv_file:
            line = line.strip() # remove trailing newline
            try:
                name, jobid, state = line.split(",")
            except ValueError:
                continue # ignore line
            job_state[jobid] = state
    except IOError:
        raise # up to caller
    finally:
        csv_file.close()


A common case

The “cleanup” pattern is so common that Python has a special statement to deal with it:

    with open('jobs.csv', 'r') as csv_file:
        for line in csv_file:
            line = line.strip() # remove trailing newline
            try:
                name, jobid, state = line.split(",")
            except ValueError:
                continue # ignore line
            job_state[jobid] = state

The with statement ensures that the file is closed upon exit from the with block (for whatever reason).

Reference: http://docs.python.org/reference/compound_stmts.html#with


The “context manager” protocol

Any object can be used in a with statement, provided it defines the following two methods:

__enter__()
    Called upon entrance of the with block; its return value is assigned to the variable following as (if any).

__exit__(exc_cls, exc_val, exc_tb)
    Called with three arguments upon exit from the block. If an exception occurred, the three arguments are the exception type, value and traceback; otherwise, the three arguments are all set to None.
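As a sketch of the protocol, here is a hypothetical Timer class that records how long the with block took and whether it ended with an exception. The class and attribute names are assumptions made for this example.

```python
import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self                      # bound to the name after `as`

    def __exit__(self, exc_cls, exc_val, exc_tb):
        self.elapsed = time.time() - self.start
        self.failed = exc_cls is not None
        return False                     # do not swallow the exception

with Timer() as t:
    sum(range(1000))                     # normal exit: all three arguments are None

try:
    with Timer() as t2:
        raise ValueError("boom")         # __exit__ still runs, then the error propagates
except ValueError:
    pass
```

Returning a false value from __exit__ lets the exception continue to the caller; returning a true value would suppress it.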

Q: Can you think of other examples where this could be useful?

See also: http://www.python.org/dev/peps/pep-0343/