
Python Data Structures · Python Data Structures Author: Zbigniew Jedrzejewski-Szmek Created Date: 9/5/2012 11:51:28 PM


Python Data Structures

Zbigniew Jędrzejewski-Szmek

Institute of Experimental Physics
University of Warsaw

Python Summer School Kiel, September 4, 2012

Version AP_version_2011-361-ge89b7ef

This work is licensed under the Creative Commons CC BY-SA 3.0 License.


Outline

1 Introduction
    Motivation
    How does Python manage memory?

2 Comprehensions, dictionaries, and sets
    Comprehensions
    Dictionaries
    How to create a dictionary?
    Specialized dictionary subclasses
    Sets

3 Iterators
    Iterator classes
    Generator expressions
    Generator functions

4 Closures

5 The end


Introduction Motivation

Introduction


Why?

You can write your own sort(), but it will take you a day.
DRY (don’t repeat yourself): use proper data structures.


The venerable list

A sequence of arbitrary values

Question from initial survey:
Is list.pop(0) faster than list.pop()?


Reminder: computational complexity

$f(x) = O(g(x))$

There is a constant $C$ such that $f(x) \le C \cdot g(x)$ for big enough $x$.

For computational complexity, $x$ is the problem size $N$.
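
A worked instance of the definition above; the function $f$ is chosen purely for illustration and does not come from the slides:

```latex
f(N) = 3N^2 + 5N \le 3N^2 + N^2 = 4N^2 \quad \text{for } N \ge 5,
\qquad \text{so } f(N) = O(N^2) \text{ with } C = 4 .
```

The step $5N \le N^2$ holds exactly when $N \ge 5$, which is what "big enough" means here.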


Test, test, test

N = 1000
L0 = range(N)
L = L0[:]
for i in xrange(N):
    L.pop(0)
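
One way to turn the loop above into a measurement is sketched below. It is written for Python 3 (the slides use Python 2), and the helper name `time_pops` and the chosen sizes are assumptions made for this illustration.

```python
import timeit


def time_pops(n, index):
    """Return the seconds needed to empty a list of n items via pop(index)."""
    def drain():
        items = list(range(n))
        while items:
            items.pop(index)
    # number=1: run the drain loop once and report the elapsed time.
    return timeit.timeit(drain, number=1)


for n in (1000, 5000, 20000):
    front = time_pops(n, 0)    # pop(0): the remaining items shift left each time
    back = time_pops(n, -1)    # pop(-1): nothing has to move
    print(n, front, back)
```

The absolute numbers depend on the machine; the interesting part is how each column grows with n.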


How does time required depend on problem size?

[Figure: running time t (µs) as a function of list size N, for N up to 1,400,000]


What does this mean?

Computational complexity depends on both the data structure and the operation, so it pays to understand what your code is doing.


How is list implemented?

[Figure: the list L and the objects it refers to; list.pop(0) removes from the front, list.pop(-1) from the back]


pop(0) vs pop(-1)

[Figure: running time t (µs) versus list size N for pop(0) and pop(-1)]


Solution for the .pop(0) problem

collections.deque pops from both ends cheaply:
    .pop()
    .popleft()

. . . but it does not make removals from the middle cheap


Trimming from both sides

>>> import collections
>>> d = collections.deque([1,2,3])

>>> d.pop()
3

>>> d.popleft()
1

>>> d.pop()
2
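
A typical use of cheap pops from the left is a first-in, first-out queue. The sketch below is Python 3, and the job names are invented for the example.

```python
from collections import deque

queue = deque()
for job in ["parse", "compute", "plot"]:
    queue.append(job)                    # enqueue on the right

processed = []
while queue:
    processed.append(queue.popleft())    # dequeue from the left

print(processed)   # ['parse', 'compute', 'plot']
```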


Testing is not enough here

Time complexity issues do not show up during development, because $O(N^2)$ is small for small $N$.


The complexity constant

$f(x) \le C \cdot g(x)$


Complexity constant example: CPython vs. Jython

[Figure: running time t (µs) versus list size N, CPython compared with Jython]


Introduction How does Python manage memory?

Other languages have “variables”

int a = 1;

a = 2;

int b = a;

Idiomatic Python by David Goodger


Python has “names”

a = 1

a = 2

b = a

Idiomatic Python by David Goodger


Example

>>> L = ['bird'] * 3
>>> L[1] += ' free!'


Example

>>> L = [['bird']] * 3
>>> L
>>> L[0][0] = 'free'
>>> L
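
What the two examples do can be checked with the sketch below (Python 3). The variable names `words`, `nested`, and `separate` are made up here, and the last part, building independent inner lists, is an addition that is not on the slides.

```python
# Strings are immutable: += builds a new string and rebinds slot 1 only.
words = ['bird'] * 3
words[1] += ' free!'
print(words)                    # ['bird', 'bird free!', 'bird']

# The inner list is mutable and shared: all three slots refer to one object.
nested = [['bird']] * 3
nested[0][0] = 'free'
print(nested)                   # [['free'], ['free'], ['free']]
print(nested[0] is nested[1])   # True

# One way to get independent inner lists: build a new list per slot.
separate = [['bird'] for _ in range(3)]
separate[0][0] = 'free'
print(separate)                 # [['free'], ['bird'], ['bird']]
```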


Removing an element from a list

N = 1000
L0 = range(N)
L = L0[:]
...

Why?


Objects are kept around if they are referenced

ints are referenced from both L and L0

[Figure: the lists L0 and L, both referring to the same int objects]


Demo

>>> x = 100000
>>> y = 100000
>>> print(id(x), id(y), sep='\n')

>>> x = 1
>>> y = 1
>>> print(id(x), id(y), sep='\n')


Identity and equality

is vs ==

if x is None:
    ....

if x == 11:
    ....
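
The difference between identity and equality is easiest to see with mutable objects. A minimal sketch (Python 3; the names `a`, `b`, `c` are illustrative only):

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate object
c = a           # a second name for the same object

print(a == b)   # True:  same value
print(a is b)   # False: two distinct objects
print(a is c)   # True:  one object, two names
```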


Why it’s good to use provided data structures

This describes an adaptive, stable, natural mergesort, modestly called timsort (hey, I earned it <wink>).

Tim Peters, in listsort.txt


Comprehensions, dictionaries, and sets

Comprehensions, dictionaries, and sets


Comprehensions, dictionaries, and sets Comprehensions

List comprehensions

[day for day in week]

[day for day in week if day[0] != 'M']

[(day, hour) for day in week
             for hour in ['12:00', '14:00', '16:00']]
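
The comprehensions above can be run as follows (Python 3). The contents of `week` are an assumption, since the slides do not define it, and the explicit loop is added only to show what the double `for` expands to.

```python
# Assumed definition: the slides use `week` without showing it.
week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']
hours = ['12:00', '14:00', '16:00']

not_m = [day for day in week if day[0] != 'M']
slots = [(day, hour) for day in week for hour in hours]

# The same pairs built with nested loops: the first `for` is the outer loop.
slots_loop = []
for day in week:
    for hour in hours:
        slots_loop.append((day, hour))

print(len(not_m), len(slots))   # 6 21
print(slots == slots_loop)      # True
```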


Comprehensions, dictionaries, and sets How to create a dictionary?

Creating dictionaries

A mapping of keys to arbitrary values

D = {'key1': "value1",
     'key2': "value2",
    }


From an iterator

text = """\
lundi Monday
mardi Tuesday
mercredi Wednesday
jeudi Thursday
vendredi Friday
samedi Saturday
dimanche Sunday"""

source = (line.split() for line in text.splitlines())

weekdays = dict(source)


From a comprehension

{x:x**2 for x in xrange(10)}


How much does it cost to enter an item in the dictionary?

>>> D = {}
>>> L = range(10000)
>>> for i in L:
...     D[i] = i

O∗(1)


How much does it cost to retrieve an item from the dictionary?

>>> for i in L:
...     assert D[i] == i

O(1)


How are dictionaries organized?

[Figure: internal layout of the dictionary D]
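
Dictionaries are hash tables: the hash of a key decides where the entry is stored, so a lookup does not have to scan all entries. The toy class below (Python 3) is only a sketch of that idea. The name `ToyDict` and the chaining scheme are assumptions made for the example; CPython's real dict uses open addressing and resizes the table as it fills, which this sketch leaves out.

```python
class ToyDict:
    """Toy hash table with chaining; for illustration only."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # hash() maps the key to an integer; the remainder picks a bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))        # new key: add to this bucket

    def __getitem__(self, key):
        # Only one bucket is searched, not the whole table.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


D = ToyDict()
for i in range(20):
    D[i] = i * i
print(D[7])   # 49
```

Because the number of buckets is fixed here, the buckets grow as items are added; a real implementation enlarges the table to keep each lookup cheap.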


Comprehensions, dictionaries, and sets Specialized dictionary subclasses

Weekdays in French and in English

The order of keys is arbitrary!

A list of the weekdays?

>>> weekdays.keys()
['mardi', 'samedi', 'vendredi', 'jeudi',
 'lundi', 'dimanche', 'mercredi']


collections.OrderedDict

>>> weekdays = collections.OrderedDict(source)

>>> weekdays.keys()
['lundi', 'mardi', 'mercredi', 'jeudi',
 'vendredi', 'samedi', 'dimanche']


Classifying objects

Sorting objects into groups

grades.log:

20120101 John 5
20120102 John 4
20120107 Mary 3
20120109 Jane 2
...


Sorting objects into groups: version 0

grades = {}
for line in open('grades.log'):
    date, person, grade = line.split()
    grades[person].append(grade)  # KeyError: person is not in grades yet


Sorting objects into groups: version 0.5

grades = {}
for line in open('grades.log'):
    date, person, grade = line.split()
    if person not in grades:
        grades[person] = []
    grades[person].append(grade)


Sorting objects into groups: version 1.0

grades = collections.defaultdict(list)
for line in open('grades.log'):
    date, person, grade = line.split()
    grades[person].append(grade)
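
To try version 1.0 without a file, the four log lines shown earlier can be kept in a list. This is a Python 3 sketch; `log_lines` is a stand-in for grades.log, and converting the grade to `int` is a choice made here, not something the slides do.

```python
import collections

# Stand-in for grades.log, using the four lines shown on the earlier slide.
log_lines = [
    '20120101 John 5',
    '20120102 John 4',
    '20120107 Mary 3',
    '20120109 Jane 2',
]

grades = collections.defaultdict(list)
for line in log_lines:
    date, person, grade = line.split()
    grades[person].append(int(grade))   # a missing key starts as an empty list

print(dict(grades))   # {'John': [5, 4], 'Mary': [3], 'Jane': [2]}
```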


Counting objects in groups

Let’s make a histogram

grades = collections.Counter()
for line in open('grades.log'):
    date, person, grade = line.split()
    grades[grade] += 1

>>> grades
Counter({'4': 2, '3': 1, '2': 1, '5': 1})

>>> print('\n'.join(x + ' ' + 'x'*y
...       for (x, y) in sorted(grades.items())))
1 x
2 xxx
3 x
4 xxx
5 xx
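
A Counter can also be filled directly from an iterable. In the Python 3 sketch below, the list `observed` is made-up data chosen to match the Counter printed above, and `most_common` is shown as one further method that may be useful.

```python
import collections

# Made-up grades, chosen so the counts match the Counter shown above.
observed = ['4', '3', '4', '2', '5']
counts = collections.Counter(observed)

# One histogram row per grade, sorted by grade.
rows = [grade + ' ' + 'x' * n for grade, n in sorted(counts.items())]
print('\n'.join(rows))

print(counts.most_common(1))   # [('4', 2)]
```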


Comprehensions, dictionaries, and sets Sets

Set

A non-ordered collection of unique elements

visitors = set()

for connection in connection_log():
    ip = connection.ip
    if ip not in visitors:
        print('new visitor')
        visitors.add(ip)


Using sets

Poetry: find words used by the poet, sorted by length

>>> poem = '''\
... She walks in beauty, like the night
... Of cloudless climes and starry skies;
... And all that's best of dark and bright
... Meet ...
... '''

>>> words = set(poem.lower().translate(None, ',;').split())
>>> words
{'she', 'like', 'cloudless', ...}

>>> sorted(words, key=len, reverse=True)
['cloudless', 'starry', 'beauty', ...]
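
The two-argument `translate(None, ',;')` call is the Python 2 form. A Python 3 sketch of the same idea follows; it uses only the three complete lines of the poem quoted above, and the names `table` and `by_length` are introduced here.

```python
poem = """\
She walks in beauty, like the night
Of cloudless climes and starry skies;
And all that's best of dark and bright
"""

# In Python 3, str.maketrans('', '', chars) builds a table that deletes chars.
table = str.maketrans('', '', ',;')
words = set(poem.lower().translate(table).split())

by_length = sorted(words, key=len, reverse=True)
print(by_length[0])   # cloudless
```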


Iterators

Iterators


Collections and their iterators

sequence .__iter__() −→ iterator

iterator .next() −→ item

iterator .next() −→ item

iterator .next() −→ item
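
The protocol above can be driven by hand. The sketch uses Python 3, where the method is spelled `__next__` and is normally called through the built-in `next()`; the list `colors` and the flag `finished` are invented for the example.

```python
colors = ['red', 'green', 'blue']

it = iter(colors)     # calls colors.__iter__()
print(next(it))       # red
print(next(it))       # green
print(next(it))       # blue

finished = False
try:
    next(it)          # no items left
except StopIteration:
    finished = True   # this is the signal a for loop uses to stop
print(finished)       # True
```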


Iterators can be non-destructive or destructive

list

file
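
The difference can be seen by iterating twice. In this Python 3 sketch, `io.StringIO` stands in for an open file so that the example needs nothing on disk; that substitution is an assumption of the example.

```python
import io

numbers = [1, 2, 3]
first_pass = list(numbers)
second_pass = list(numbers)          # a list hands out a fresh iterator each time

stream = io.StringIO('a\nb\nc\n')    # stands in for an open file
first_read = list(stream)
second_read = list(stream)           # the position is already at the end

print(second_pass)   # [1, 2, 3]
print(first_read)    # ['a\n', 'b\n', 'c\n']
print(second_read)   # []
```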


How can we create an iterator?

1 write a class with .__iter__ and .next
2 use a generator expression
3 write a generator function


Iterators Iterator classes

1. Iterator class

import random

class randomaccess(object):
    def __init__(self, list):
        self.indices = range(len(list))
        random.shuffle(self.indices)
        self.list = list

    def next(self):
        if not self.indices:
            raise StopIteration  # every element has been visited
        return self.list[self.indices.pop()]

    def __iter__(self):
        return self
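
For comparison, a Python 3 rendering of the same idea might look like the sketch below. The class name `RandomAccess` and the parameter name `items` are choices made here; the method is called `__next__` in Python 3, and `range` has to be turned into a list before it can be shuffled.

```python
import random


class RandomAccess:
    """Iterate over a sequence in random order, visiting each item once."""

    def __init__(self, items):
        self.items = items
        self.indices = list(range(len(items)))
        random.shuffle(self.indices)

    def __iter__(self):
        return self

    def __next__(self):
        if not self.indices:
            raise StopIteration          # tells the for loop to stop
        return self.items[self.indices.pop()]


shuffled = list(RandomAccess(['a', 'b', 'c', 'd']))
print(sorted(shuffled))   # ['a', 'b', 'c', 'd']
```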


Iterators Generator expressions

2. Generator expressions

# chomped lines
[line.rstrip() for line in open(some_file)]

# remove comments
[line for line in open(some_file)
      if not line.startswith('#')]

>>> type([i for i in 'abc'])
<type 'list'>
>>> type( i for i in 'abc' )
<type 'generator'>
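
Unlike a list comprehension, a generator expression computes its items only when asked. A small Python 3 sketch; the names `squares`, `first`, `second`, `rest`, and `total` are made up for the example.

```python
squares = (n * n for n in range(10))   # nothing is computed yet

first = next(squares)     # 0
second = next(squares)    # 1
rest = sum(squares)       # consumes the remaining items: 4 + 9 + ... + 81
print(first, second, rest)   # 0 1 284

# Passed directly to a function, the extra parentheses can be dropped.
total = sum(n * n for n in range(10))
print(total)   # 285
```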


Iterators Generator functions

3. Generator functions

>>> def gen():
...     print '--start'
...     yield 1
...     print '--middle'
...     yield 2
...     print '--stop'

>>> g = gen()

>>> g.next()
--start
1

>>> g.next()
--middle
2

>>> g.next()
--stop
Traceback (most recent call last):
  ...
StopIteration
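
A generator function with a loop, as a Python 3 sketch. The function `countdown` is an invented example, and `next(g)` replaces the Python 2 spelling `g.next()`.

```python
def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n       # execution pauses here until the next item is requested
        n -= 1


g = countdown(3)
print(next(g))    # 3
print(list(g))    # [2, 1]
```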


Generator objects

>>> g = gen()
>>> g
<generator object gen at 0x7f24a5e1c690>

>>> g.<TAB>
g.__iter__(   g.next(   g.send(   g.throw(   g.gi_running


Generator objects: __iter__ and next

>>> g.__iter__()
<generator object gen at 0x7f24a5e1c690>
>>> g
<generator object gen at 0x7f24a5e1c690>

>>> g.next()
--start
1


Generator objects: send, throw, and gi_running

>>> help(g.send)
send(arg) -> send 'arg' into generator,
return next yielded value or raise StopIteration.

>>> help(g.throw)
throw(typ[,val[,tb]]) -> raise exception in generator,
return next yielded value or raise StopIteration.

>>> g.gi_running
0
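
A possible use of `send` is a generator that receives values as well as yielding them. The running-mean example below is a Python 3 sketch invented for illustration; it is not taken from the slides.

```python
def running_mean():
    """Yield the mean of all values sent in so far."""
    total, count, mean = 0.0, 0, None
    while True:
        value = yield mean    # receives whatever the caller passes to send()
        total += value
        count += 1
        mean = total / count


g = running_mean()
next(g)               # advance to the first yield before sending anything
print(g.send(10))     # 10.0
print(g.send(20))     # 15.0
g.close()             # stop the generator
```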


collections.namedtuple

>>> import collections
>>> Person = collections.namedtuple('Person',
...                                 'first last SSN')
>>> Person
<class '__main__.Person'>
>>> Person.__doc__
'Person(first, last, SSN)'
>>> p = Person('John', 'Doe', 123456789)
>>> p[0], p[1], p[2]
('John', 'Doe', 123456789)
>>> p.first
'John'
>>> p.last
'Doe'
>>> p.SSN
123456789


Closures

What namespaces are involved in running a function

>>> x = 11
>>> def f():
...     x = 12
>>> f()

[Figure: the global namespace holds x = 11 and f; the local namespace of f holds x = 12]
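
The effect of the two namespaces can be checked directly. In this Python 3 sketch, the `return` statement is added only so that the local value can be inspected from outside.

```python
x = 11


def f():
    x = 12        # binds a new local name; the global x is untouched
    return x


print(f())   # 12
print(x)     # 11
```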


Closures

>>> def f(x):
...     def g():
...         return x**2
...     return g

>>> g = f(11)
>>> g
<function g at 0x1bea1b8>
>>> g()
121

>>> g.func_closure
(<cell at 0x1b6be50: int object at 0x18da6c8>,)

>>> g.func_closure[0].cell_contents
11
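
In Python 3 the attribute is spelled `__closure__` instead of `func_closure`. The sketch below repeats the example in that form and adds a second, invented example (`make_counter`) in which the closed-over variable is modified through `nonlocal`.

```python
def f(x):
    def g():
        return x ** 2
    return g


g = f(11)
print(g())                               # 121
print(g.__closure__[0].cell_contents)    # 11


def make_counter():
    count = 0

    def increment():
        nonlocal count    # rebind the variable of the enclosing function
        count += 1
        return count

    return increment


counter = make_counter()
print(counter(), counter())   # 1 2
```

Each call to `make_counter` creates a fresh `count`, so separate counters do not interfere with each other.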


Closure visualization

[Figure: diagram relating g, the closed-over x = 11, and f]


The end

That’s all!
