Lecture 4 - Overview • Iterators • Regular expressions • Various things around Python




Iterators

• One of the most common tasks in a program is to repeat code.

• In many languages loops are done with numerical indices:

for i in range(len(a)):

• Another solution is to use iterators.

• An iterator says: "I can go through all objects in the collection I'm associated with, one at a time."

• An iterator must have the methods __iter__() and next().
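A rough sketch of what a for-loop does with an iterator behind the scenes. It is written for Python 3, where the built-in next() calls the method __next__ (the slides use Python 2, where that method is named next). The helper name manual_loop is made up for illustration.

```python
def manual_loop(collection):
    """Collect the items of a collection the way a for-loop walks through them."""
    items = []
    iterator = iter(collection)        # calls collection.__iter__()
    while True:
        try:
            item = next(iterator)      # asks the iterator for the next object
        except StopIteration:          # raised when the collection is exhausted
            break
        items.append(item)
    return items


print(manual_loop(["Red", "Green", "Blue"]))
```

The loop ends quietly on StopIteration, which is why the iterator classes on the following slides raise it when they run out of elements.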


A new range-function

class my_range:
    def __init__(self, last=10):
        self.last = last

    def __iter__(self):
        self.current_number = -1
        return self

    def next(self):
        self.current_number += 1
        if self.current_number == self.last:
            raise StopIteration
        return self.current_number

for n in my_range(10):
    print n

$ python my_range.py
0
1
2
3
4
5
6
7
8
9


Another iterator

# -*- coding: utf-8 -*-
# iterator_01.py
class ColorIterator:
    colors = ["Red", "Green", "Blue", "Yellow", "Black", "Brown"]

    def __iter__(self):
        self.current_color = -1
        return self

    def next(self):
        self.current_color += 1
        if self.current_color == len(self.__class__.colors):
            raise StopIteration
        return self.__class__.colors[self.current_color]

if __name__ == "__main__":
    ci = ColorIterator()
    for color in ci:
        print color

$> python iterator_01.py
Red
Green
Blue
Yellow
Black
Brown


Implementing using generators

• The yield statement saves a function's state.
• The next time a value is requested from the generator, execution continues after the yield statement, with the local variables kept.

# -*- coding: utf-8 -*-
# iterator_02.py
def colors(available = ["Red", "Green", "Blue",
                        "Yellow", "Black", "Brown"]):
    for color in available:
        yield color

for color in colors():
    print color

$> python iterator_02.py
Red
Green
Blue
Yellow
Black
Brown
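To make the saved state visible, a generator can be stepped by hand with the built-in next(). This is a small sketch in Python 3 syntax; the generator name countdown is an invented example, not part of the lecture code.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1          # execution resumes here on the next request


gen = countdown(3)
first = next(gen)           # runs the body until the first yield
second = next(gen)          # resumes after the yield, with 'start' kept
remaining = list(gen)       # drains whatever is left

print(first, second, remaining)
```

Each request picks up exactly where the previous one stopped, so no index or position needs to be stored in a class attribute as in ColorIterator.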


Fibonacci series with a generator iterator

• The Fibonacci series is a number series that begins with 0 and 1. Thereafter each number is the sum of the two preceding numbers: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …

• The series occurs in many different contexts, for example within biology.

• The series is never-ending and therefore suitable to be implemented using an iterator, which produces one number at a time.

# -*- coding: utf-8 -*-
# iterator_03.py
def fib(limit=10):
    x, y, count = 0, 1, 0
    while count < limit:
        yield x
        x, y = y, x + y
        count += 1

if __name__ == "__main__":
    for num in fib(15):
        print num,


Create a list from an iterator

• If we want to save the elements generated by an iterator in a list, we can use functions from the module itertools.

# -*- coding: utf-8 -*-
# iterator_04.py
def fib():
    x, y = 0, 1
    while True:
        yield x
        x, y = y, x + y

if __name__ == "__main__":
    import itertools
    print list(itertools.islice(fib(), 10))

$> python iterator_04.py
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
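islice cuts the endless series by position. As a complementary sketch (Python 3 syntax), itertools.takewhile cuts it by value instead; the limit of 100 is an arbitrary choice for illustration.

```python
import itertools


def fib():
    """Endless Fibonacci generator, as in iterator_04.py."""
    x, y = 0, 1
    while True:
        yield x
        x, y = y, x + y


# Stop at the first element that fails the condition
below_100 = list(itertools.takewhile(lambda n: n < 100, fib()))

# Stop after a fixed number of elements
first_ten = list(itertools.islice(fib(), 10))

print(below_100)
print(first_ten)
```

Calling list(fib()) directly would never finish, so some cut-off like these is needed before turning an endless iterator into a list.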


Parallel iterators

• To iterate over two collections in parallel, the function izip is used, also from itertools. Iteration stops when the shorter collection is exhausted.

• The module itertools contains several functions for iterators; see the documentation for details.

# -*- coding: utf-8 -*-
# iterator_05.py
import itertools
a = ["a1", "a2", "a3", "a4"]
b = ["b1", "b2", "b3"]
if __name__ == "__main__":
    for x, y in itertools.izip(a, b):
        print x, y

$> python iterator_05.py
a1 b1
a2 b2
a3 b3
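In Python 3, itertools.izip no longer exists; the built-in zip is itself lazy and plays the same role. The sketch below, in Python 3 syntax, also shows itertools.zip_longest for the case where the leftover element should not be dropped; the fill value "-" is an arbitrary choice.

```python
import itertools

a = ["a1", "a2", "a3", "a4"]
b = ["b1", "b2", "b3"]

# Stops at the shorter collection, like izip on the slide
pairs = list(zip(a, b))

# Continues to the longer collection, padding the missing values
padded = list(itertools.zip_longest(a, b, fillvalue="-"))

print(pairs)
print(padded)
```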


Processing sequences

● Often you process sequences in similar ways. For some common operations you don't have to use for-loops:

● To select a subset of a sequence:
filtered_sequence = filter(function, orig_seq)

● To apply a function to each element:
new_sequence = map(function, orig_seq)
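A minimal sketch of both operations on a made-up word list. It uses Python 3 syntax, where filter and map return iterators, so the results are wrapped in list() to get the lists that Python 2 returns directly.

```python
words = ["iterator", "re", "generator", "map"]

# Keep only the elements for which the function returns a true value
long_words = list(filter(lambda w: len(w) > 3, words))

# Apply the function to every element
lengths = list(map(len, words))

print(long_words)
print(lengths)
```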


Filter and lambda

● filter is used to remove elements; it keeps only those for which the function returns a true value:
short_sequence = filter(function, sequence)

● Lambda is used to write anonymous functions.

>>> g = lambda x: x*x
>>> g(3)
9
>>>
>>> nums = range(2, 50)
>>> for i in range(2, 7):
...     nums = filter(lambda x: x == i or x % i, nums)
...
>>> print nums # gives what?
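One way to explore the question is to run the loop as a script. The sketch below is a Python 3 version; the list() calls are an addition needed there, because filter is lazy in Python 3 and the lambdas would otherwise all see the final value of i.

```python
nums = list(range(2, 50))
for i in range(2, 7):
    # x == i keeps i itself; x % i is zero (false) for every other multiple of i.
    # list(...) runs the filter immediately, while i still has this value.
    nums = list(filter(lambda x: x == i or x % i, nums))

print(nums)
```

Each pass drops the multiples of i other than i itself. Since only 2 to 6 are tried, what survives is every prime below 50 plus 49, which is 7 * 7 and has no divisor among the numbers tested.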


Regular expressions

● Used when searching for, or matching, string patterns.

● "You can think of regular expressions as wildcards on steroids." - http://www.regular-expressions.info/

● Using wildcard notation, you specify *.txt to find .txt-files. The regex equivalent is:

.*\.txt$

● A regular expression string defines a set of ordinary strings: all the strings that it matches.
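As a small sketch of the wildcard comparison, the pattern can be tried against a few file names using Python's re module (Python 3 syntax). The file names are invented for illustration.

```python
import re

pattern = re.compile(r'.*\.txt$')

names = ['notes.txt', 'notes.txt.bak', 'image.png', 'txt']

# \. demands a literal dot and $ demands that 'txt' ends the string
matches = [name for name in names if pattern.match(name)]

print(matches)
```

The set of strings defined by this regular expression therefore contains notes.txt but none of the other three names.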


Regular expressions

● Regular expressions aren't specific to Python; they are used in many other languages as well.
● Often abbreviated 'regex'.
● The module in Python is 're'.
● A more complicated example:

\b[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}\b

which matches most e-mail addresses when matching is case-insensitive (as written, the final [A-Z]{2,4} accepts only upper-case letters).
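A hedged sketch of how the e-mail pattern behaves in Python 3. The last part of the pattern, [A-Z]{2,4}, lists only upper-case letters, so the sketch compares the pattern as written with the same pattern compiled using re.IGNORECASE. The sample text and addresses are made up.

```python
import re

EMAIL_PATTERN = r'\b[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}\b'

text = "Contact alice@example.com or BOB@EXAMPLE.ORG, not bob@localhost."

# As written: the top-level domain must be upper case
strict = re.compile(EMAIL_PATTERN).findall(text)

# Case-insensitive: lower-case domains are accepted as well
relaxed = re.compile(EMAIL_PATTERN, re.IGNORECASE).findall(text)

print(strict)
print(relaxed)
```

The address bob@localhost is rejected in both cases because the pattern requires a dot followed by two to four letters after the host name.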


Using regular expressions

1. Create a pattern.
2. Match the pattern against one or more texts.
3. Use the result.

# -*- coding: utf-8 -*-

# Import regex support, more about this later
import re

# 1. Create a pattern
p = re.compile(r'[a-z]+')

# 2. Match the pattern
result = p.match('test word')

# 3. Use the result
if result:
    print "Match"
else:
    print "No match"
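The result of a successful match is a match object that carries more than a yes/no answer. A short sketch in Python 3 syntax, using the same pattern and text as above; the second test string is an invented counter-example.

```python
import re

p = re.compile(r'[a-z]+')

result = p.match('test word')
matched_text = result.group()     # the text that matched the pattern
position = result.span()          # (start, end) indices of the match

# match() returns None when the pattern does not fit at the start
no_result = p.match('123 test')

print(matched_text, position, no_result)
```

Because a failed match gives None, testing the result with if, as in step 3, is the usual guard before calling group() or span().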


Searching in the whole string

Create a pattern in the same way. Use the method search instead of match.

# -*- coding: utf-8 -*-
import re

# Create a pattern
p = re.compile(r'[a-z]+')

# Test string
s = '12345 testword 12345'

# match only tries at the start of the string,
# which begins with digits here, so this prints "No match"
if p.match(s):
    print "Match"
else:
    print "No match"

# search scans the whole string, so this prints "Found"
if p.search(s):
    print "Found"
else:
    print "Not found"


Finding all occurrences

To find all substrings in the text that match the pattern, the method findall is used.

>>> import re
>>> p = re.compile(r'[a-z]+')
>>> print p.findall("a text with several words")
['a', 'text', 'with', 'several', 'words']


Operators in regular expressions

Character(s)   Matches
.              Any character except newline
^              The start of the string
$              The end of the string
*              Zero or more repetitions of the preceding token
+              One or more repetitions
{m}            Exactly m repetitions
{m,n}          At least m and at most n repetitions
[...]          Any character inside the set
A|B            A or B


More operators in regular expressions

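\d    Digits between 0 and 9. Equivalent to [0-9]
\D    Everything except digits. Equivalent to [^0-9]
\s    Whitespace characters. Equivalent to [ \t\n\r\f\v]
\S    Everything except whitespace. Equivalent to [^ \t\n\r\f\v]
\w    Alphanumeric characters and underscore. Equivalent to [a-zA-Z0-9_]
\W    Everything except alphanumeric characters and underscore. Equivalent to [^a-zA-Z0-9_]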
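A brief sketch of the character classes \d, \w and \s in use, written in Python 3 syntax. The sample line is invented for illustration.

```python
import re

line = "Order 66 shipped on 2010-03-15 to box_7"

numbers = re.findall(r'\d+', line)    # runs of digits
words = re.findall(r'\w+', line)      # runs of letters, digits and underscore
fields = re.split(r'\s+', line)       # split on runs of whitespace

print(numbers)
print(words)
print(fields)
```

Note that \w treats the underscore as a word character, so box_7 stays in one piece, while the hyphens in the date separate it into three words.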


Examples

Pattern       At least one hit             No hit
r'ain'        'main', 'rain', 'Maine'      'MAIN', 'ai'
r'ab+'        'abc', 'abbbbbbc'            'acc', 'bbc'
r'ab*'        'abc', 'abbc', 'acc'         'bbc'
r'[abc]'      'a', 'b', 'c'                'hej', 'A'
r'[a-h]'      'a', 'f', 'ehiusc'           'urtyx'
r'(ab){2}'    'abab', 'ababababc'          'ab', 'abcdab'
r'^ab'
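The rows of the table can be checked mechanically. The sketch below (Python 3 syntax) assumes that "hit" means re.search finds the pattern somewhere in the string; the names examples and mismatches are made up for this check, and the last row is left out because the slide gives no strings for it.

```python
import re

# pattern -> (strings expected to hit, strings expected not to hit)
examples = {
    r'ain':     (['main', 'rain', 'Maine'], ['MAIN', 'ai']),
    r'ab+':     (['abc', 'abbbbbbc'], ['acc', 'bbc']),
    r'ab*':     (['abc', 'abbc', 'acc'], ['bbc']),
    r'[abc]':   (['a', 'b', 'c'], ['hej', 'A']),
    r'[a-h]':   (['a', 'f', 'ehiusc'], ['urtyx']),
    r'(ab){2}': (['abab', 'ababababc'], ['ab', 'abcdab']),
}

mismatches = []
for pattern, (hits, misses) in examples.items():
    for s in hits:
        if not re.search(pattern, s):
            mismatches.append((pattern, s, 'expected a hit'))
    for s in misses:
        if re.search(pattern, s):
            mismatches.append((pattern, s, 'expected no hit'))

print(mismatches)
```

An empty list means every row behaves as the table says.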


Some things around Python

● Debugging
● Profiling
● Python implementations
● Other libraries
● Automated testing


Debugger

● A program that runs your program in a controlled way. It enables you to step through your code, line by line, and see how local variables change value.

● Helps you track down bugs and can also be used to understand how code works.

● Classical use case: some unknown error is occurring and the cause isn't evident from the program output. You run the program in debug mode and try to reproduce the error, i.e. to recreate the error scenario. When the error occurs, the debugger halts and you can see where it happens and inspect the local variables.

● Seldom used in academia, very common in industry.


Debugger in Python

● The standard debugger for Python is pdb. You definitely want to use some GUI front-end to this debugger, such as the one provided in Wing.

● Error messages are quite good in Python, so debuggers are not as useful as when programming in C++, for example.


Profiling

● A profiler gathers information about some aspect of performance (usually execution time) while the program runs.

Two types:

● Exact (deterministic) profilers follow the execution and record, for example, function call counts.

● Statistical profilers use random sampling to determine where the program spends its time. They only provide an approximation, but intrude less on the running program.
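As an illustration of the deterministic kind, the standard library ships cProfile together with pstats for reading the results. A minimal sketch in Python 3 syntax; the function slow_sum is a made-up workload, not something from the lecture.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately simple workload to have something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
result = profiler.runcall(slow_sum, 100000)   # run the call under the profiler

# Write the statistics to a string instead of standard output
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()

print(report)
```

The report lists call counts and time per function. A whole script can also be profiled without changing it by running python -m cProfile followed by the script name.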


Python implementations

● CPython, the official implementation. Implemented in C, which is what the C in the name refers to.

● IronPython. Runs on Microsoft's .NET framework. Makes the .NET libraries available in Python. Implemented in C#.

● Jython. For integration with Java applications. Can import any Java object.

● Unladen Swallow. A modification of CPython at Google to increase speed.


Other libraries

● Django, Pylons, web2py, TurboGears - web frameworks

● SQLAlchemy - database toolkit

● PIL - imaging library


Automated testing

For small programs it's often simple to determine whether the program does what it's supposed to do.

When the program's feature set grows, it's no longer practical to do manual testing.

When you make a small change, you want to verify that the existing functionality is still intact.

This requires some form of systematic code testing.


Automated testing

● Especially important when using a rapid-prototyping style of development, which Python is partly aimed at.

● So how can we test that the program's output/behaviour is correct without implementing exactly the same functions again?


Automated testing

● ... you could compare with previous output of the program, if your program is deterministic and no internal or external random variables affect the output. Random variables can be an explicit part of an algorithm, or can be introduced by letting the output depend on things such as execution time.

● Or you can check that the program output is within some reasonable limits that you set yourself.

● If developing an algorithm, it's good to build a test suite of known problematic cases, and preferably have some automatic quality measurements for each case.
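The three strategies above can be sketched with the standard unittest module (Python 3 syntax). This is only an illustration under assumptions: the function under test is a list-returning variant of the earlier fib generator, and the test names and the limit 10**7 are invented.

```python
import unittest


def fib(limit=10):
    """Return the first `limit` Fibonacci numbers as a list."""
    x, y, numbers = 0, 1, []
    while len(numbers) < limit:
        numbers.append(x)
        x, y = y, x + y
    return numbers


class FibTest(unittest.TestCase):
    def test_matches_previous_output(self):
        # Compare with output recorded from an earlier, trusted run
        self.assertEqual(fib(10), [0, 1, 1, 2, 3, 5, 8, 13, 21, 34])

    def test_problematic_cases(self):
        # Small limits are typical trouble spots
        self.assertEqual(fib(0), [])
        self.assertEqual(fib(1), [0])

    def test_within_reasonable_limits(self):
        # No exact answer needed: the series never decreases and stays bounded
        numbers = fib(30)
        self.assertTrue(all(a <= b for a, b in zip(numbers, numbers[1:])))
        self.assertLess(numbers[-1], 10**7)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FibTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Rerunning such a suite after every small change is what verifies that the existing functionality is still intact.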


Knowing a programming language

● Not only syntax and functions matter.

● The underlying implementation of different constructs is essential for all kinds of performance.

● Having an overview of libraries and being able to quickly start using them if necessary.

● Knowing about available development tools and roughly how to use them.


Knowledge - skill

● Being a good programmer requires a combination of the right knowledge and skill.

● Knowledge - knowing concepts, principles and information (the theory) regarding a particular subject.

● Skill - the ability to produce solutions in a problem domain ("a well-trained piano player", "a skilled carpenter"). Often a combination of knowledge and experience.


Training

• Training is something natural in many areas: music, sports and mathematics, for example.
• In programming, training often gets less focus.
• Programming is a huge area, but many things are repeated in different contexts.
• If you have good command of a language, you lower the threshold for entering new areas of the language, tackling a new problem or starting to use a new module.
• It's easier to learn new languages when you already know some.


What is good training?

• Time without interruptions.
• The possibility to try many times until success is reached.
• Being able to explore and try out things, without negative consequences if things don't work.
• A task that challenges you, but is within reach.
• Quick feedback.