STAT115 STAT225 BIST512 BIO298 - Intro to Computational Biology
Python Tutorial I, Jan 25, 2012
Daniel Fernandez and Alejandro Quiroz
[email protected] [email protected]


Outline

• Introduction
• Getting started with Python
  – available resources, the environment
• Data types and operations
  – numbers, strings, lists, dictionaries, sets, tuples
• Useful statements
  – if/else ladders, for and while loops
• File input/output
• Functions
• Errors and Exceptions
• Modules
• Exercises


What is Python?

"When he began implementing Python, Guido van Rossum was also reading the published scripts from 'Monty Python's Flying Circus', a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python."


What is Python?

• Interpreted, interactive, object-oriented, portable programming language

• Easy to use and easy to learn
  – Full-featured help: just type help(object) or help(object.method)

• Claimed to offer two- to tenfold increases in programmer productivity over languages like C, C++, Java, Visual Basic (VB), and Perl

"Python has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python, and we're looking for more people with skills in this language," said Peter Norvig, director of search quality at Google, Inc.

http://www.python.org/Quotes.html


Why Python?

[Figure: C/C++, Perl, R, Python]


Resources

• www.stat115.com: see the Python entry in the left menu for more resources and tutorials
• General resources: http://www.python.org/
• An enhanced interactive Python shell: http://ipython.scipy.org/
• Python IDEs:
  – Wingware (recommended): http://wingware.com/
  – IDLE (free): http://www.python.org/idle/
  – Komodo (free trial): http://www.activestate.com/Products/Komodo/
• Python tools for computational molecular biology: http://www.biopython.org/
• Fast array manipulation: http://www.stsci.edu/resources/software_hardware/numarray


Python Environment

• Interactive mode:
  – simply type the command 'python' at a UNIX or DOS prompt
  – commands are read from the terminal at the primary prompt, usually three greater-than signs (">>> ")
  – help(): enter the name of any module, keyword, or topic to get help on using Python, e.g. math

• Stand-alone mode:
  – Make a file hello.py:
      #! /usr/bin/python
      print "hello"
  – Run:
      python hello.py
    or
      chmod +x hello.py
      ./hello.py


OOP: Objects (or Data types) and their methods (or Operations)

• Numbers
    >>> 2+2             # 4
    >>> (50-5*6)/4      # 5

    # The equals sign ("=") is used to assign a value to a variable
    >>> width = 20
    >>> height = 5*9
    >>> width * height  # 900

    # float or integer: dividing two integers truncates the result
    >>> 3 / 2           # 1
    >>> float(3)/2      # 1.5
    >>> 3.0/2           # 1.5

    # long integers are displayed with an 'L' suffix
    >>> 2**2            # 4
    >>> 9**20           # 12157665459056928801L

• Strings can be enclosed in single, double, or triple quotes (to put a single quote ' inside a string, enclose the string in double quotes, or escape the quote with a backslash inside single quotes)

    # Strings can be indexed (from 0), sliced and concatenated
    >>> word = "This is a rather long string"
    >>> word[:2]         # the first two characters: 'Th'
    >>> word[2:4]        # 'is'
    >>> word[2:]         # all but the first two characters: 'is is a rather long string'
    >>> 'x' + word[1:]   # 'xhis is a rather long string'
    >>> word[-1]         # the last character: 'g'
    >>> word[-2:]        # the last two characters: 'ng'

    # creating a new string
    >>> word[:2] + 'at'  # 'That'


Objects (and their methods)

• Lists (append, extend, insert, …)
    >>> a = ['spam', 'eggs', 100, 1234]
  – Like string indices, list indices start at 0, and lists can be sliced and concatenated
  – Unlike strings, which are immutable, individual elements of a list can be changed
  – The built-in function len() also applies to lists
  – A list can be built from a string:
    >>> x = 'a b c d e f'.split()  # x = ['a', 'b', 'c', 'd', 'e', 'f']

• Dictionaries
    >>> tel = {'jack': 4098, 'sape': 4139}
    >>> tel['guido'] = 4127  # {'sape': 4139, 'guido': 4127, 'jack': 4098}
    >>> del tel['sape']      # {'jack': 4098, 'guido': 4127}
    >>> tel.keys()           # ['jack', 'guido']
    >>> tel.items()          # [('jack', 4098), ('guido', 4127)]
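The difference between immutable strings and mutable lists noted in the list bullets can be sketched as follows. This is only an illustration: it uses Python 3 syntax (print as a function), whereas the slides use Python 2, and the names string_changed and length are made up for the example.

```python
# Strings are immutable: assigning to one character raises TypeError.
word = "spam"
try:
    word[0] = "S"
    string_changed = True
except TypeError:
    string_changed = False

# Lists are mutable: elements and slices can be replaced in place.
a = ['spam', 'eggs', 100, 1234]
a[2] = a[2] + 23      # replace a single element
a[0:2] = ['ham']      # replace a two-element slice with one element
length = len(a)       # len() works on lists just as on strings

print(string_changed)
print(a)
print(length)
```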


Objects (and their methods)

• Sets: unordered collections with no duplicate elements.
    >>> a = set('abracadabra')
    >>> b = set('alacazam')
    >>> a        # unique letters in a
    set(['a', 'r', 'b', 'c', 'd'])
    >>> a - b    # letters in a but not in b
    set(['r', 'd', 'b'])
    >>> a | b    # letters in either a or b
    set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
    >>> a & b    # letters in both a and b
    set(['a', 'c'])
    >>> a ^ b    # letters in a or b but not both
    set(['r', 'd', 'b', 'm', 'z', 'l'])

• Tuples: immutable lists
    >>> t = 12345, 54321, 'hello!'
    >>> t[0]     # 12345
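As a rough sketch of the set and tuple behaviour described on this slide, the snippet below compares sets without relying on their display order and shows that a tuple rejects item assignment. It assumes Python 3 syntax (the slides use Python 2); the names only_in_a, in_both and tuple_changed are illustrative.

```python
a = set('abracadabra')
b = set('alacazam')

only_in_a = a - b     # letters in a but not in b
in_both = a & b       # letters in both a and b

# Tuples can be indexed and unpacked like lists...
t = 12345, 54321, 'hello!'
first = t[0]
x, y, z = t

# ...but they are immutable, so item assignment raises TypeError.
try:
    t[0] = 1
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(sorted(only_in_a))
print(sorted(in_both))
print(first)
print(tuple_changed)
```

Sorting before printing is a deliberate choice here: sets are unordered, so their printed order should not be relied on.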


Control Flow Statements and Syntax

• Indentation (not "{ }") is Python's way of grouping statements. Each line within a basic block must be indented by the same amount

• if statements:
    if x < 0:
        print "x < 0"
    elif x == 0:
        print 'zero'
    else:
        print 'more'

• range() function
    range(10)        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    range(0, 10, 3)  # [0, 3, 6, 9]

• loops:
    for x in range(9):
        print x

    b = 14
    while b > 10:
        print b
        b = b-1

• break: breaks out of the smallest enclosing loop
• continue: continues with the next iteration of the loop
• else: executed when the loop terminates normally, i.e. without a break

• Example:
    >>> for n in range(2, 10):
    ...     for x in range(2, n):
    ...         if n % x == 0:
    ...             print n, 'equals', x, '*', n/x
    ...             break
    ...     else:
    ...         # loop fell through without finding a factor
    ...         print n, 'is a prime number'
    ...
    2 is a prime number
    3 is a prime number
    4 equals 2 * 2
    5 is a prime number
    6 equals 2 * 3
    7 is a prime number
    8 equals 2 * 4
    9 equals 3 * 3
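The prime-number example exercises break and the loop else clause; continue can be sketched in the same spirit. The two functions below, count_bases and first_stop_codon, are hypothetical helpers invented for this illustration (they are not part of the course material), and the code assumes Python 3 syntax.

```python
def count_bases(sequence):
    """Count A, C, G and T, skipping any other character."""
    counts = {'A': 0, 'C': 0, 'G': 0, 'T': 0}
    for base in sequence.upper():
        if base not in counts:
            continue              # skip gaps, N's, whitespace, ...
        counts[base] += 1
    return counts


def first_stop_codon(sequence):
    """Return the index of the first in-frame stop codon, or -1 if none."""
    for i in range(0, len(sequence) - 2, 3):
        if sequence[i:i + 3] in ('TAA', 'TAG', 'TGA'):
            position = i
            break                 # leave the loop at the first hit
    else:
        position = -1             # loop finished without a break
    return position


print(count_bases('GATTACA-n'))
print(first_stop_codon('ATGAAATAGCCC'))
```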


Input and Output

• File IO
    >>> f = open('tmp.txt', 'w', 0)
  Opens a file. The mode can be 'r', 'w' or 'a' for reading (default), writing or appending; the optional third argument sets the buffering (0 means unbuffered). The file is created if it doesn't exist when opened for writing or appending, and it is truncated when opened for writing.
    >>> filename = 'infile.txt'
    >>> file_object = open(filename, 'r')  # create file object
    >>> for line in file_object:
    ...     line = line.rsplit()  # splits the line into a list
    ...     # do something with the line

• raw_input and input
  raw_input() collects the characters the user types and presents them as a string, whereas input() collects them and tries to evaluate them as a Python expression.
    >>> print raw_input("Type something: ")
    >>> print input("Type a number: ")

• Output formatting:
    >>> outfile = open('filename.txt', 'w')
    >>> outfile.write('\nthis is now on a new line' + str(3))
    # note the \n and the conversion of the integer object 3 to the string object '3' with str()
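A minimal sketch of writing a file and reading it back line by line, as described above. It uses the with statement, which closes the file automatically and is a swapped-in idiom rather than something the slides show; the file name and its contents are made up, and the code assumes Python 3 syntax.

```python
import os
import tempfile

# Hypothetical file inside a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'infile.txt')

# 'w' creates the file (or truncates it); the with-block closes it on exit.
with open(path, 'w') as outfile:
    outfile.write('gene1 10 20\n')
    outfile.write('gene2 ' + str(30) + ' ' + str(45) + '\n')  # numbers need str()

# Iterating over a file object yields one line at a time.
rows = []
with open(path, 'r') as infile:
    for line in infile:
        fields = line.split()     # split on whitespace; drops the newline
        rows.append(fields)

print(rows)

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(os.path.dirname(path))
```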


More on Syntax: Printing variables within strings

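• >>> a = "spam"
  >>> b = "eggs"
  >>> "a meal of %s and %s with chips" % (a, b)  # 'a meal of spam and eggs with chips'
• >>> "a meal of %d eggs with chips" % 3  # 'a meal of 3 eggs with chips'
• >>> "I like %f spoonfuls of sugar in my coffee" % (3/2.0)  # 'I like 1.500000 spoonfuls of sugar in my coffee'
• >>> "I like %.1f spoonfuls of sugar in my coffee" % (3/2.0)  # 'I like 1.5 spoonfuls of sugar in my coffee'
• >>> c = 17.5
  >>> "The percentage is %.1f%%" % c  # 'The percentage is 17.5%'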


More on List

>>> help(list)

• append(x)
  – Add an item to the end of the list; equivalent to a[len(a):] = [x].
• extend(L)
  – Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.
• insert(i, x)
  – Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).
• remove(x)
  – Remove the first item from the list whose value is x. It is an error if there is no such item.
• pop([i])
  – Remove the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the last item in the list. (The square brackets around the i in the method signature denote that the parameter is optional, not that you should type square brackets at that position.)
• index(x)
  – Return the index in the list of the first item whose value is x. It is an error if there is no such item.
• count(x)
  – Return the number of times x appears in the list.
• sort()
  – Sort the items of the list, in place.
• reverse()
  – Reverse the elements of the list, in place.
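The methods listed above can be traced on a small list. This is an illustrative walk-through with made-up values, assuming Python 3 syntax; the comments show the list after each step.

```python
a = [5, 3, 3, 1]

a.append(3)            # [5, 3, 3, 1, 3]
a.extend([7, 8])       # [5, 3, 3, 1, 3, 7, 8]
a.insert(0, 9)         # [9, 5, 3, 3, 1, 3, 7, 8]
a.remove(3)            # first 3 removed: [9, 5, 3, 1, 3, 7, 8]
last = a.pop()         # returns 8, leaving [9, 5, 3, 1, 3, 7]
first = a.pop(0)       # returns 9, leaving [5, 3, 1, 3, 7]
where = a.index(3)     # position of the first 3
n = a.count(3)         # how many times 3 occurs

# Removing a value that is not present raises ValueError.
try:
    a.remove(42)
    missing_is_error = False
except ValueError:
    missing_is_error = True

a.sort()               # in place: [1, 3, 3, 5, 7]
a.reverse()            # in place: [7, 5, 3, 3, 1]

print(a)
```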


List Comprehensions

    lista = [2, 4, 6]
    [3*x for x in lista]           # [6, 12, 18]
    [3*x for x in lista if x > 3]  # [12, 18]
    [3*x for x in lista if x < 2]  # []
    [[x, x**2] for x in lista]     # [[2, 4], [4, 16], [6, 36]]

    # sort the dictionary by values
    tel = {2: 4098, 1: 4139, 4: 3333}
    tellist = [(v, k) for (k, v) in tel.items()]
    tellist.sort()                 # [(3333, 4), (4098, 2), (4139, 1)]
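An alternative way to sort the dictionary by value, sketched with the built-in sorted() and a key function instead of swapping each pair. This is a different technique from the one on the slide, shown only for comparison; the names by_value and keys_by_value are illustrative, and Python 3 syntax is assumed.

```python
tel = {2: 4098, 1: 4139, 4: 3333}

# sorted() returns a new list; key picks the value out of each (key, value) pair.
by_value = sorted(tel.items(), key=lambda item: item[1])

# A list comprehension then extracts just the keys, in value order.
keys_by_value = [k for (k, v) in by_value]

print(by_value)
print(keys_by_value)
```

Unlike the swap-and-sort version, the pairs stay in (key, value) order, which can be easier to read when the result is passed on.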


Functions

    def test(p1, p2=5+3, *args, **kwargs):
        '''test function'''
        print p1 + p2, args, kwargs

• Arguments with "=" have a default value (evaluated at function definition time).
    test(9)                        # 17 () {}

• If the argument list has "*args", then args is assigned a tuple of all remaining non-keyword arguments passed to the function.
    test(9, 3, 4, 5, 6)            # 12 (4, 5, 6) {}

• If the argument list has "**kwargs", then kwargs is assigned a dictionary of all extra arguments passed as keywords.
    test(9, 3, 4, 5, 6, a=3, b=4)  # 12 (4, 5, 6) {'a': 3, 'b': 4}

• Function documentation
    help(test)                     # test function


Errors and Exceptions

• Syntax errors
    >>> while True print 'Hello world'  # SyntaxError: invalid syntax (the colon after True is missing)

• Exceptions: errors detected during execution are called exceptions
    >>> 1/0      # ZeroDivisionError: integer division or modulo by zero
    >>> '2' + 3  # TypeError: cannot concatenate 'str' and 'int' objects

• Raising exceptions
    >>> a = -1
    >>> if a < 0:
    ...     raise ValueError('negative number')

• Handling exceptions
    >>> try:
    ...     a = open('testtestest')  # file does not exist
    ... except IOError:
    ...     print "Oops! No such file"  # or pass
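Raising and handling can be combined in small functions, sketched below. Both require_non_negative and read_or_default are hypothetical names invented for this example; the code assumes Python 3 syntax, where IOError is an alias of OSError.

```python
def require_non_negative(a):
    """Return a unchanged, or raise ValueError if it is negative."""
    if a < 0:
        raise ValueError('negative number: %r' % a)
    return a


def read_or_default(path, default=''):
    """Return the file's contents, or default if it cannot be opened."""
    try:
        handle = open(path)
    except IOError:               # e.g. the file does not exist
        return default
    try:
        return handle.read()
    finally:
        handle.close()            # runs whether or not read() succeeded


try:
    require_non_negative(-1)
    message = 'no error'
except ValueError as error:
    message = str(error)

print(message)
```

Catching a specific exception class, rather than using a bare except, keeps unrelated errors such as typos in variable names from being silently swallowed.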


Modules

• A module is a file containing Python Variables and Functions. The file name is the module name with the suffix .py appended

• Python Library Reference http://www.python.org/doc/current/lib/lib.html

• Import modules
    >>> import math
    >>> math.sqrt(9)  # 3.0
  or
    >>> from math import sqrt
    >>> sqrt(9)       # 3.0


Useful Modules

• Modules
  – Communicate with the interpreter: sys
  – Communicate with the OS: os
  – Standard math operations: math
  – Regular expressions: re
  – Internet access: urllib
  – Random number generator: random
  – Python interface to the R language: rpy


Important Module: sys

Variable               Content
argv                   The list of command-line arguments passed to a Python script. sys.argv[0] is the script name.
modules                Dictionary of modules that have already been loaded.
path                   Search path for external modules. Can be modified by the program. sys.path[0] is the directory of the script currently being executed.
stdin, stdout, stderr  File objects used for I/O. They can be redirected by assigning a new file object to them.

Function               Result
exit(n)                Exits with status n (usually 0 means OK). Raises the SystemExit exception (hence it can be caught and ignored by the program).
exc_info()             Info on the exception currently being handled; this is a tuple (exc_type, exc_value, exc_traceback). Warning: assigning the traceback return value to a local variable in a function handling an exception will cause a circular reference.

    try:
        1/0  # ZeroDivisionError
    except:
        print sys.exc_info()
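A short sketch of the sys entries in the table: reading argv, unpacking exc_info(), and catching the SystemExit raised by exit(). Variable names such as extra_args and exit_status are illustrative, and Python 3 syntax is assumed.

```python
import sys

script_name = sys.argv[0]     # name of the running script
extra_args = sys.argv[1:]     # everything typed after the script name

try:
    1 / 0
except ZeroDivisionError:
    exc_type, exc_value, exc_traceback = sys.exc_info()
    del exc_traceback         # drop the traceback to avoid the circular reference

# sys.exit() raises SystemExit, so a program can intercept it.
try:
    sys.exit(3)
except SystemExit as stop:
    exit_status = stop.code

print(exc_type.__name__)
print(exit_status)
```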


Important Module: os

Function             Result
chdir(path)          Changes the current directory to path.
chmod(path, mode)    Changes the mode of path to the numeric mode.
close(fd)            Closes file descriptor fd opened with posix.open.
getcwd()             Returns a string representing the current working directory.
getpid()             Returns the current process id.
listdir(path)        Lists (base)names of entries in directory path, excluding '.' and '..'. A similar function is glob.glob, which returns a list of paths matching a pathname pattern.
mkdir(path[, mode])  Creates a directory named path with numeric mode (default 0777).
remove(path)         See unlink.
rename(old, new)     Renames/moves the file or directory old to new. [On some platforms, e.g. Windows, it is an error if the target name already exists.]
rmdir(path)          Removes the empty directory path.
system(command)      Executes the string command in a subshell (the system shell, e.g. bash). Returns the exit status of the subshell (usually 0 means OK).
unlink(path)         Unlinks ("deletes") the file (not dir!) path. Same as remove.
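Several of the os functions in the table can be tried safely inside a temporary directory. The sketch below is illustrative only: the directory and file names are made up, tempfile is used so nothing outside the scratch area is touched, and Python 3 syntax is assumed.

```python
import os
import tempfile

start_dir = os.getcwd()                 # remember where we started
base = tempfile.mkdtemp()               # throwaway scratch directory
work = os.path.join(base, 'results')
os.mkdir(work)                          # create a sub-directory

old_name = os.path.join(work, 'draft.txt')
new_name = os.path.join(work, 'final.txt')
with open(old_name, 'w') as handle:
    handle.write('ACGT\n')

os.rename(old_name, new_name)           # move draft.txt to final.txt
listing = os.listdir(work)              # base names only, no '.' or '..'

os.remove(new_name)                     # delete the file (same as unlink)
os.rmdir(work)                          # rmdir only works on empty directories
os.rmdir(base)
still_there = os.path.exists(base)

print(listing)
print(still_there)
```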


Some Exercises


Ex1. Greeting people

• Write a program that asks two people for their names; stores the names in variables called name1 and name2; says hello to both of them.


Ex2. MyGenome Size

• Write a program that asks for your genome size in base pairs and prints back the size of your genome in bytes. Can you compress your genome even more? What clever way would you use? How many qubits?
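One possible starting point for the arithmetic, not a reference solution: it assumes each base (A, C, G or T) is stored in 2 bits, which is an assumption made here for illustration, and it takes the size as a function argument instead of prompting so that it can be run non-interactively. The function name genome_size_in_bytes and the 3,000,000,000 bp figure are hypothetical. Python 3 syntax is assumed.

```python
def genome_size_in_bytes(base_pairs, bits_per_base=2):
    """Bytes needed to store base_pairs bases at bits_per_base bits each."""
    total_bits = base_pairs * bits_per_base
    return (total_bits + 7) // 8        # round up to a whole number of bytes


# Hypothetical genome of three billion base pairs.
size_2bit = genome_size_in_bytes(3000000000)
size_text = genome_size_in_bytes(3000000000, bits_per_base=8)  # one character per base

print(size_2bit)
print(size_text)
```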


Ex3. Check input type

• Ask the user to type an integer, a float or a string (in quotes). Use "input" instead of "raw_input". Check the type of the user input in the following manner: include the line "import types" at the beginning of your script, then compare the type of the user input to the objects types.IntType, types.FloatType and types.StringType. Print "The input was an integer", "The input was a real number" or "The input was a string", respectively.


Ex4. Guess the Lucky Number

• Write a program that gives you five chances to guess the lucky number. If the correct number is guessed, the program prints "Good guess!" and stops; otherwise it prints "Try again!". After five incorrect guesses it stops and prints "Game over."


Monty Hall Problem

Suppose you’re on a game show, and you’re given a choice of three doors: Behind one door is a car; behind the others, goats.  You pick a door, say number 3, and the host, who knows what’s behind the doors, opens another door, say number 2, which has a goat.  He says to you, ‘Do you want to pick door number 1?’  Is it to your advantage to switch your choice of doors?
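The question can be explored by simulation. The sketch below is not the course's montyhall.py, whose contents are not shown in these slides; it is a hypothetical stand-in with made-up function names (play, win_rate), a fixed random seed for repeatability, and Python 3 syntax assumed.

```python
import random


def play(switch, rng):
    """Play one game; return True if the final pick hides the car."""
    doors = [1, 2, 3]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = [d for d in doors if d != pick and d != opened][0]
    return pick == car


def win_rate(switch, trials=10000, seed=0):
    """Fraction of games won over the given number of trials."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / float(trials)


print(win_rate(switch=True))
print(win_rate(switch=False))
```

Switching wins exactly when the first pick was a goat, which happens with probability 2/3, so the simulated rate for switching should come out close to twice the rate for staying.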


Monty Hall Problem

• Run montyhall.py to see the results.
• Read montyhall.py and try to understand what the program does.

Visual Simulation.

Python source.
