STAT115 STAT225 BIST512 BIO298 - Intro to Computational Biology
Python Tutorial I
Jan 25, 2012
Daniel Fernandez and Alejandro Quiroz
Outline
• Introduction
• Getting started with Python
  – available resources, the environment
• Data types and operations
  – numbers, strings, lists, dictionaries, sets, tuples
• Useful statements
  – if/else ladders, for and while loops
• File input/output
• Functions
• Errors and Exceptions
• Modules
• Exercises
What is Python?
“When he began implementing Python, Guido van Rossum was also reading the published scripts from “Monty Python’s Flying Circus”, a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python.”
• Interpreted, interactive, object-oriented, portable programming language
• Easy to use and easy to learn
• Full-featured help: just type help(object/method)
• Reported to offer a two- to ten-fold increase in programmer productivity over languages like C, C++, Java, Visual Basic (VB), and Perl
"Python has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python, and we're looking for more people with skills in this language," said Peter Norvig, director of search quality at Google, Inc.
http://www.python.org/Quotes.html
Why Python?
C/C++
Perl
R
Python
Resources
• www.stat115.com: go to the Python entry in the left menu for more resources and tutorials
• General resources: http://www.python.org/
• An enhanced interactive Python shell: http://ipython.scipy.org/
• Python IDEs:
  – (recommended) Wingware: http://wingware.com/
  – IDLE (free): http://www.python.org/idle/
  – Komodo (free trial): http://www.activestate.com/Products/Komodo/
• Python tools for computational molecular biology: http://www.biopython.org/
• Fast array manipulation: http://www.stsci.edu/resources/software_hardware/numarray
Python Environment
• Interactive mode:
  – simply type the command 'python' at a UNIX or DOS prompt
  – commands are read from a terminal; the primary prompt is usually three greater-than signs (">>> ")
  – help(): enter the name of any module, keyword, or topic to get help on using Python, e.g. math
• Stand-alone mode:
  – Make a file hello.py:
      #! /usr/bin/python
      print "hello"
  – Run:
      python hello.py
    or
      chmod +x hello.py
      ./hello.py
OOP: Objects (or Data types) and their methods (or Operations)
• Numbers
    >>> 2+2                # 4
    >>> (50-5*6)/4         # 5
    # The equal sign ("=") is used to assign a value to a variable
    >>> width = 20
    >>> height = 5*9
    >>> width * height     # 900
    # Dividing two integers truncates the result; make one operand a float to get a float
    >>> 3 / 2              # 1
    >>> float(3)/2         # 1.5
    >>> 3.0/2              # 1.5
    # Long integers are displayed with an 'L' suffix
    >>> 2**2               # 4
    >>> 9**20              # 12157665459056928801L
• Strings can be enclosed in single, double, or triple quotes (to put a single quote ' inside a string, enclose the string in double quotes, or escape the quote inside single quotes)
    # Strings can be indexed (from 0), sliced, and concatenated
    >>> word = "This is a rather long string"
    >>> word[:2]           # the first two characters, 'Th'
    >>> word[2:4]          # 'is'
    >>> word[2:]           # all but the first two characters, 'is is a rather long string'
    >>> 'x' + word[1:]     # 'xhis is a rather long string'
    >>> word[-1]           # the last character, 'g'
    >>> word[-2:]          # the last two characters, 'ng'
    # Creating a new string
    >>> word[:2] + 'at'    # 'That'
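The slicing rules above carry over directly to sequence data. The following is a minimal sketch, written in Python 3 syntax rather than the Python 2 used on the slides; the sequence `seq` is a made-up example, not course data. Note that in Python 3 the `/` operator always performs true division, so `//` is used for the truncating behaviour shown above.

```python
# Hypothetical DNA sequence, chosen only for illustration.
seq = "ATGCGTAC"

first_codon = seq[:3]      # characters at indices 0, 1 and 2
rest = seq[3:]             # everything after the first codon
last_base = seq[-1]        # negative indices count from the end
mutated = "C" + seq[1:]    # strings are immutable, so build a new one

# Python 3 division: // truncates towards the floor, / keeps the fraction.
whole = 3 // 2
exact = 3 / 2

print(first_codon, rest, last_base, mutated, whole, exact)
```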
Objects (and their methods)
• Lists (append, extend, insert, …)
    >>> a = ['spam', 'eggs', 100, 1234]
  – Like string indices, list indices start at 0, and lists can be sliced and concatenated
  – Unlike strings, which are immutable, it is possible to change individual elements of a list
  – The built-in function len() applies to lists
  – A list can be built from a string:
    >>> x = 'a b c d e f'.split()   # x = ['a', 'b', 'c', 'd', 'e', 'f']
• Dictionaries
    >>> tel = {'jack': 4098, 'sape': 4139}
    >>> tel['guido'] = 4127   # {'sape': 4139, 'guido': 4127, 'jack': 4098}
    >>> del tel['sape']       # {'jack': 4098, 'guido': 4127}
    >>> tel.keys()            # ['jack', 'guido']
    >>> tel.items()           # [('jack', 4098), ('guido', 4127)]
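As a small illustration of lists and dictionaries working together, the sketch below counts how often each letter occurs in a string and shows that list elements can be reassigned. It uses Python 3 syntax, and the sequence and variable names are assumptions made up for this example.

```python
seq = "ATGCGTAC"   # hypothetical sequence

# Build a dictionary that maps each base to its number of occurrences.
counts = {}
for base in seq:
    counts[base] = counts.get(base, 0) + 1

# keys() order is not something to rely on, so sort it for display.
bases = sorted(counts.keys())

# Lists are mutable: individual elements can be replaced in place.
words = "a b c".split()
words[0] = "z"

print(bases, counts["A"], words, len(words))
```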
• Sets: unordered collections with no duplicate elements
    >>> a = set('abracadabra')
    >>> b = set('alacazam')
    >>> a          # unique letters in a
    set(['a', 'r', 'b', 'c', 'd'])
    >>> a - b      # letters in a but not in b
    set(['r', 'd', 'b'])
    >>> a | b      # letters in either a or b
    set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
    >>> a & b      # letters in both a and b
    set(['a', 'c'])
    >>> a ^ b      # letters in a or b but not both
    set(['r', 'd', 'b', 'm', 'z', 'l'])
• Tuples: immutable lists
    >>> t = 12345, 54321, 'hello!'
    >>> t[0]       # 12345
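The same set operators are handy for comparing lists of identifiers. This is a sketch in Python 3 syntax; the two gene lists and the `location` tuple are hypothetical values used only to show the operators and the immutability of tuples.

```python
# Two hypothetical gene lists.
genes_a = set(["TP53", "BRCA1", "MYC"])
genes_b = set(["MYC", "EGFR", "TP53"])

shared = genes_a & genes_b      # in both lists
only_a = genes_a - genes_b      # in the first list only
either = genes_a | genes_b      # in at least one list
exactly_one = genes_a ^ genes_b # in one list but not both

# A tuple cannot be modified after creation.
location = ("chr1", 100)        # made-up (chromosome, position) pair
try:
    location[0] = "chr2"
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(sorted(shared), sorted(only_a), tuple_changed)
```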
Control Flow Statements and Syntax
• Indentation (not "{ }") is Python's way of grouping statements. Each line within a basic block must be indented by the same amount
• if statements:
    if x < 0:
        print "x < 0"
    elif x == 0:
        print 'zero'
    else:
        print 'more'
• range() function:
    range(10)         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    range(0, 10, 3)   # [0, 3, 6, 9]
• loops:
    for x in range(9):
        print x

    b = 14
    while b > 10:
        print b
        b = b-1
• break: breaks out of the smallest enclosing loop
• continue: continues with the next iteration of the loop
• else: executed when the loop terminates normally, that is, without a break
• Example:
    >>> for n in range(2, 10):
    ...     for x in range(2, n):
    ...         if n % x == 0:
    ...             print n, 'equals', x, '*', n/x
    ...             break
    ...     else:
    ...         # loop fell through without finding a factor
    ...         print n, 'is a prime number'
    ...
    2 is a prime number
    3 is a prime number
    4 equals 2 * 2
    5 is a prime number
    6 equals 2 * 3
    7 is a prime number
    8 equals 2 * 4
    9 equals 3 * 3
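To make the roles of break, continue and the loop-level else easier to see in isolation, here is a sketch in Python 3 syntax. The helper name `first_factor` is hypothetical and is not part of the slides.

```python
def first_factor(n):
    """Return the smallest factor of n in 2..n-1, or None if there is none."""
    for x in range(2, n):
        if n % x == 0:
            found = x
            break            # leaving via break skips the else clause
    else:
        found = None         # runs only when the loop was not broken
    return found

# Numbers with no factor are prime.
primes = [n for n in range(2, 10) if first_factor(n) is None]

# continue jumps straight to the next iteration.
odds = []
for n in range(10):
    if n % 2 == 0:
        continue
    odds.append(n)

print(primes, odds)
```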
Input and Output
• File IO
    >>> f = open('tmp.txt', 'w', 0)
  Open a file. The mode can be 'r', 'w' or 'a' for reading (default), writing or appending. The file will be created if it doesn't exist when opened for writing or appending; it will be truncated when opened for writing. The optional third argument is the buffer size (0 means unbuffered).
    >>> filename = 'infile.txt'
    >>> file_object = open(filename, 'r')   # create file object
    >>> for line in file_object:
    ...     line = line.rsplit()   # splits the line into a list
    ...     # do something with the line
• raw_input and input
  raw_input() collects the characters the user types and returns them as a string, whereas input() collects them and tries to evaluate them as a Python expression.
    >>> print raw_input("Type something: ")
    >>> print input("Type a number: ")
• Output formatting:
    >>> outfile = open('filename.txt', 'w')
    >>> outfile.write('\nthis is now on a new line' + str(3))
  Note the \n and the conversion of the integer object 3 to the string object '3' with str().
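A write-then-read round trip ties these pieces together. The sketch below is in Python 3 syntax and uses a `with` block, which the slides do not cover, so that each file is closed automatically. The file name, its location in a temporary directory, and the two-column contents are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "infile.txt")

# Write two lines; numbers must be converted to strings first.
with open(path, "w") as outfile:
    outfile.write("gene1 10\n")
    outfile.write("gene2 " + str(25) + "\n")

# Read the file back line by line and split each line into fields.
rows = []
with open(path, "r") as infile:
    for line in infile:
        fields = line.split()
        rows.append((fields[0], int(fields[1])))

print(rows)
```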
More on Syntax: Printing variables within strings
• >>> a = "spam"
  >>> b = "eggs"
  >>> "a meal of %s and %s with chips" % (a, b)
• >>> "a meal of %d eggs with chips" % 3
• >>> "I like %f spoonfuls of sugar in my coffee" % (3/2.0)
• >>> "I like %.1f spoonfuls of sugar in my coffee" % (3/2.0)
• >>> c = 17.5
  >>> "The percentage is %.1f%%" % c
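The slide lists the expressions without their results. As a sketch, the same format strings can be evaluated and stored; the variable names on the left are made up for this example, and the code runs unchanged under Python 3.

```python
a, b = "spam", "eggs"

meal = "a meal of %s and %s with chips" % (a, b)   # %s inserts a string
count = "a meal of %d eggs with chips" % 3         # %d inserts an integer
default_float = "%f" % (3 / 2.0)                   # %f shows six decimals
one_decimal = "%.1f" % (3 / 2.0)                   # %.1f shows one decimal
percent = "The percentage is %.1f%%" % 17.5        # %% is a literal percent sign

for text in (meal, count, default_float, one_decimal, percent):
    print(text)
```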
More on List
>>> help(list)
• append(x)
  – Add an item to the end of the list; equivalent to a[len(a):] = [x].
• extend(L)
  – Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.
• insert(i, x)
  – Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).
• remove(x)
  – Remove the first item from the list whose value is x. It is an error if there is no such item.
• pop([i])
  – Remove the item at the given position in the list, and return it. If no index is specified, a.pop() returns the last item in the list. The item is also removed from the list. (The square brackets around the i in the method signature denote that the parameter is optional, not that you should type square brackets at that position.)
• index(x)
  – Return the index in the list of the first item whose value is x. It is an error if there is no such item.
• count(x)
  – Return the number of times x appears in the list.
• sort()
  – Sort the items of the list, in place.
• reverse()
  – Reverse the elements of the list, in place.
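The methods above can be traced on one small list. This sketch uses arbitrary starting values; the comments give the state of `a` after each call.

```python
a = [3, 1]
a.append(2)          # a is now [3, 1, 2]
a.extend([1, 5])     # a is now [3, 1, 2, 1, 5]
a.insert(0, 9)       # a is now [9, 3, 1, 2, 1, 5]
a.remove(1)          # removes the first 1: [9, 3, 2, 1, 5]
last = a.pop()       # returns 5 and leaves [9, 3, 2, 1]
where = a.index(2)   # position of the first 2
ones = a.count(1)    # how many times 1 appears
a.sort()             # in place: [1, 2, 3, 9]
a.reverse()          # in place: [9, 3, 2, 1]

print(a, last, where, ones)
```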
List Comprehensions
    lista = [2, 4, 6]
    [3*x for x in lista]               # [6, 12, 18]
    [3*x for x in lista if x > 3]      # [12, 18]
    [3*x for x in lista if x < 2]      # []
    [[x, x**2] for x in lista]         # [[2, 4], [4, 16], [6, 36]]

    # sort the dictionary by values
    tel = {2: 4098, 1: 4139, 4: 3333}
    tellist = [(v, k) for (k, v) in tel.items()]
    tellist.sort()                     # tellist is now [(3333, 4), (4098, 2), (4139, 1)]
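An alternative to swapping keys and values before sorting is the built-in sorted() with a key function. This is a sketch of that variant, not the approach used on the slide; it keeps each pair in (key, value) order.

```python
tel = {2: 4098, 1: 4139, 4: 3333}

# Sort the (key, value) pairs by the value, which is item[1].
by_value = sorted(tel.items(), key=lambda item: item[1])

# A comprehension then extracts just the keys, in value order.
keys_by_value = [k for (k, v) in by_value]

print(by_value, keys_by_value)
```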
Functions
    def test(p1, p2=5+3, *args, **kwargs):
        '''test function'''
        print p1 + p2, args, kwargs

• Arguments with "=" have a default value (evaluated at function definition time).
    test(9)                       # 17 () {}
• If the argument list has "*args", then args is assigned a tuple of all remaining non-keyword arguments passed to the function.
    test(9,3,4,5,6)               # 12 (4, 5, 6) {}
• If the argument list has "**kwargs", then kwargs is assigned a dictionary of all extra arguments passed as keywords.
    test(9,3,4,5,6, a=3, b=4)     # 12 (4, 5, 6) {'a': 3, 'b': 4}
• Function documentation
    help(test)                    # test function
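Returning the values instead of printing them makes the argument handling easy to check. The sketch below is in Python 3 syntax; `describe` is a hypothetical variant of the `test` function above.

```python
def describe(p1, p2=5 + 3, *args, **kwargs):
    """Return p1 + p2 together with the extra positional and keyword arguments."""
    return p1 + p2, args, kwargs

only_required = describe(9)                      # p2 falls back to its default, 8
with_extras = describe(9, 3, 4, 5, 6)            # 4, 5, 6 are collected in args
with_keywords = describe(9, 3, 4, 5, 6, a=3, b=4)  # a and b are collected in kwargs

# The docstring is what help(describe) displays.
doc = describe.__doc__

print(only_required, with_extras, with_keywords)
```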
Errors and Exceptions
• Syntax errors
    >>> while True print 'Hello world'   # SyntaxError: the colon after True is missing
• Exceptions: errors detected during execution are called exceptions
    >>> 1/0       # ZeroDivisionError: integer division or modulo by zero
    >>> '2' + 3   # TypeError: cannot concatenate 'str' and 'int' objects
• Raising exceptions
    >>> a = -1
    >>> if a < 0:
    ...     raise ValueError('negative number')
• Handling exceptions
    >>> try:
    ...     a = open('testtestest')   # file does not exist
    ... except IOError:
    ...     print "Oops! no such file"   # or pass
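Catching a specific exception class, rather than everything, keeps unrelated errors visible. The following sketch is in Python 3 syntax; the function names `safe_ratio` and `read_or_default` and the missing file path are assumptions made up for this example.

```python
import os
import tempfile

def safe_ratio(numerator, denominator):
    """Divide two numbers; reject negative denominators, return None for zero."""
    if denominator < 0:
        raise ValueError("negative number")
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None

def read_or_default(path, default=""):
    """Return the contents of a file, or a default if it cannot be opened."""
    try:
        with open(path) as handle:
            return handle.read()
    except IOError:
        return default

# A path that does not exist, inside a fresh temporary directory.
missing = os.path.join(tempfile.mkdtemp(), "testtestest")

print(safe_ratio(6, 3), safe_ratio(1, 0), repr(read_or_default(missing)))
```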
Modules
• A module is a file containing Python variables and functions. The file name is the module name with the suffix .py appended
• Python Library Reference: http://www.python.org/doc/current/lib/lib.html
• Import modules
    >>> import math
    >>> math.sqrt(9)   # 3.0
  or
    >>> from math import sqrt
    >>> sqrt(9)        # 3.0
Useful Modules
• Modules
  – Communicate with the interpreter: sys
  – Communicate with the OS: os
  – Standard math operations: math
  – Regular expressions: re
  – Internet access: urllib
  – Random number generator: random
  – Python interface to the R language: rpy
Important Module: sys
Variable Content
argv The list of command line arguments passed to a Python script. sys.argv[0] is the script name.
modules Dictionary of modules that have already been loaded.
path Search path for external modules. Can be modified by program. sys.path[0] == directory of script currently executed.
stdin, stdout, stderr File objects used for I/O. One can redirect by assigning a new file object to them.
Function Result
exit(n) Exits with status n (usually 0 means OK). Raises SystemExit exception (hence can be caught and ignored by program)
exc_info() Info on exception currently being handled; this is a tuple (exc_type, exc_value, exc_traceback). Warning: assigning the traceback return value to a local variable in a function handling an exception will cause a circular reference.
    try:
        1/0   # ZeroDivisionError
    except:
        print sys.exc_info()
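A short sketch of these sys members in use, in Python 3 syntax. The variable names are assumptions; only the exception type and value are kept, in line with the warning about holding on to the traceback.

```python
import sys

# argv[0] is the script name; guard against an empty list in embedded use.
script_name = sys.argv[0] if sys.argv else ""

# Copy of the module search path; sys.path itself can be modified.
search_path = list(sys.path)

try:
    1 / 0
except ZeroDivisionError:
    # Take the first two items only; the traceback is deliberately not stored.
    exc_type, exc_value = sys.exc_info()[:2]

# stdout is an ordinary file object with a write() method.
sys.stdout.write("caught %s: %s\n" % (exc_type.__name__, exc_value))
```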
Important Module: os
Function Result
system(command) Execute a command on the system shell (bash)
chdir(path) Changes current directory to path.
chmod(path, mode) Changes the mode of path to the numeric mode
close(fd) Closes file descriptor fd opened with posix.open.
getcwd() Returns a string representing the current working directory.
getpid() Returns the current process id.
listdir(path) Lists (base)names of entries in directory path, excluding '.' and '..'. (A similar function is glob.glob, which returns a list of paths matching a pathname pattern.)
mkdir(path[, mode]) Creates a directory named path with numeric mode (default 0777).
remove(path) See unlink.
rename(old, new) Renames/moves the file or directory old to new. [error if target name already exists]
rmdir(path) Removes the empty directory path
system(command) Executes string command in a subshell. Returns exit status of subshell (usually 0 means OK).
unlink(path) Unlinks ("deletes") the file (not dir!) path. Same as: remove.
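Several of these functions can be tried safely inside a throwaway directory. This is a sketch in Python 3 syntax; the directory and file names are hypothetical, and tempfile is used only so that nothing outside the temporary directory is touched.

```python
import os
import tempfile

base = tempfile.mkdtemp()                 # scratch directory for the demo
results = os.path.join(base, "results")
old = os.path.join(base, "a.txt")
new = os.path.join(base, "b.txt")

os.mkdir(results)                         # create a subdirectory
open(old, "w").close()                    # create an empty file
os.rename(old, new)                       # rename a.txt to b.txt

entries = sorted(os.listdir(base))        # names only, without '.' and '..'

os.remove(new)                            # delete the file (same as unlink)
os.rmdir(results)                         # rmdir only works on empty directories
os.rmdir(base)

print(entries, os.getcwd(), os.getpid())
```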
Some Exercises
Ex1. Greeting people
• Write a program that asks two people for their names; stores the names in variables called name1 and name2; says hello to both of them.
Ex2. MyGenome Size
• Write a program that asks for your genome size in base pairs and prints the size of your genome in bytes. Can you compress your genome even more? What clever way would you use? How many qubits?
Ex3. Check input type
• Ask the user to type an integer, a float, or a string (in quotes). Use "input" instead of "raw_input". Check the type of the user input as follows: include the line "import types" at the beginning of your script, then compare the type of the user input to the objects types.IntType, types.FloatType and types.StringType. Print "The input was an integer", "The input was a real number", or "The input was a string", respectively.
Ex4. Guess the Lucky Number
• Write a program that gives you five chances to guess the lucky number. If the correct number is guessed, the program outputs "Good guess!" and stops; otherwise it outputs "Try again!". After five incorrect guesses it stops and prints "Game over."
Monty Hall Problem
Suppose you’re on a game show, and you’re given a choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say number 3, and the host, who knows what’s behind the doors, opens another door, say number 2, which has a goat. He says to you, ‘Do you want to pick door number 1?’ Is it to your advantage to switch your choice of doors?
• Run montyhall.py to see the results.
• Read montyhall.py and try to understand what the program does.
Visual Simulation.
Python source.
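The course file montyhall.py is not reproduced in these slides. As an independent sketch of how such a simulation might look (this is not the course's program), the code below plays the game many times with and without switching. The function names, the number of trials and the fixed seed are all assumptions, and the syntax is Python 3.

```python
import random

def play(switch, rng):
    """Play one round; return True if the final pick is the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining closed door.
        pick = [d for d in doors if d != pick and d != opened][0]
    return pick == car

def win_rate(switch, trials=10000, seed=0):
    """Estimate the fraction of rounds won under a given strategy."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(trials) if play(switch, rng))
    return wins / float(trials)

print("stay:   %.3f" % win_rate(False))
print("switch: %.3f" % win_rate(True))
```

With enough trials the switching strategy should win about two thirds of the time and staying about one third, which is the standard answer to the problem.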