Perl for Bioinformatics Part 2
Stuart Brown
NYU School of Medicine
Sources
• Beginning Perl for Bioinformatics
  – James Tisdall, O’Reilly Press, 2000
• Using Perl to Facilitate Biological Analysis in Bioinformatics: A Practical Guide (2nd Ed.)
  – Lincoln Stein, Wiley-Interscience, 2001
• Introduction to Programming and Perl
  – Alan M. Durham, Computer Science Dept., Univ. of São Paulo, Brazil
Debugging
• Hopefully you were lucky enough to have some bugs in your programs from the first Perl exercise.
• Test each line as you write
  – insert extra print statements to check on variables
Perl Debugging Help
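• Add -w on the first line of your programs:
    #!/usr/local/bin/perl -w
  – provides ‘warnings’ (for example, when a variable is used before it has a value)
  – the path after #! must match where perl is installed on your system
• Add use strict as the 2nd line of your programs
  – enforces proper variable names: every variable must be declared with my before it is used
  – it is also good practice to initialize variables before using them (set to some initial value such as 0 or empty)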
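A minimal sketch of a script header with both checks switched on. It assumes a reasonably modern Perl, where use warnings is the usual replacement for the -w switch; the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;      # every variable must be declared with "my" before use
use warnings;    # same kind of messages as the -w switch

my $count    = 0;     # declared and set to an initial value
my $sequence = '';    # empty string as a starting value

$count    = $count + 1;
$sequence = $sequence . 'ATG';

print "count = $count, sequence = $sequence\n";

# With "use strict", a misspelled name such as $coutn is a compile-time
# error instead of silently becoming a new, empty variable.
```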
Variable “Interpolation”
• A variable holds a value
    $value = 6;
• When you print the variable, Perl gives the value rather than the name of the variable.
    print $value;
    6
• If you put a variable inside double quotes, Perl substitutes the value (this is called variable interpolation)
    print "The result is $value\n";
    The result is 6
• If you use single quotes, the text is printed exactly as typed (no interpolation, and \n is not turned into a newline)
    print 'The result is $value\n';
    The result is $value\n
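The two quoting styles can be compared side by side. This is a small sketch; the ${value} form with braces is an extra detail not covered on the slide, useful when a variable name is followed directly by more letters.

```perl
use strict;
use warnings;

my $value = 6;

my $double = "The result is $value\n";   # interpolated: value is substituted
my $single = 'The result is $value\n';   # literal: printed exactly as typed
my $label  = "${value}bp";               # braces mark where the name ends

print $double;
print $single, "\n";
print $label, "\n";
```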
Input
• A Perl program can take input from the keyboard
  – The angle bracket operator (<>) takes input
  – Usually this is assigned to a variable
    print "Please type a number: ";
    $num = <>;
    print "Your number is $num\n";
chomp
• When data is entered from the keyboard, Perl waits for the Enter key to be typed
• But the string which is captured includes a newline (carriage return) at its end
• Perl uses the function chomp to remove the newline character:
    print "Enter your name: ";
    $name = <>;
    print "Hello $name, happy to meet you!\n";    # newline still attached: output breaks after the name
    chomp $name;
    print "Hello $name, happy to meet you!\n";    # newline removed: output stays on one line
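The effect of chomp can be checked by measuring the string before and after. In this sketch the keyboard is simulated with an in-memory filehandle (an assumption made so the example runs without typing), and the name Ada is a made-up input.

```perl
use strict;
use warnings;

# Simulated keyboard input; in a real script read from <STDIN> or <> instead.
my $typed = "Ada\n";
open(my $in, '<', \$typed) or die "Cannot open in-memory input: $!";

my $name   = <$in>;
my $before = length($name);   # counts the trailing newline
chomp $name;
my $after  = length($name);   # newline is gone

close $in;

print "Hello $name, happy to meet you!\n";
print "Length before chomp: $before, after chomp: $after\n";
```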
Working with Text Files
• To do real work, Perl has to read data out of text files and write results into output files
• This is done in two steps: open the file, then read from it or write to it
• First, you must give the file a name within the script - this is known as a filehandle
• Use the open command:
    open FILE1, '/u/schmoj01/Seqs/protein1.seq';
Read From the File
• Once the file is open, you can read from it using the <> operator
  – (put the filehandle between the angle brackets)
• Perl reads files one line at a time. Each time you input data from the file, the next line is read:
    open FILE1, '/u/prot1.seq';
    $line1 = <FILE1>;
    chomp $line1;
    $line2 = <FILE1>;
    …etc
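A runnable sketch of opening and reading, with a check that the open actually worked. It uses the three-argument form of open with a lexical filehandle, which is a common modern alternative to the FILE1 style; the temporary file and its two lines are invented so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small demo file (made-up contents) so the example can run anywhere.
my ($tmp_fh, $path) = tempfile();
print $tmp_fh "MKVLAAGIVG\nLLLAQPSWAD\n";
close $tmp_fh;

# '<' means read; die stops the script and $! says why the open failed.
open(my $fh, '<', $path) or die "Cannot open $path: $!";
my $line1 = <$fh>;    # first read returns the first line
chomp $line1;
my $line2 = <$fh>;    # second read returns the next line
chomp $line2;
close $fh;

print "First line:  $line1\n";
print "Second line: $line2\n";

unlink $path;         # remove the demo file
```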
Write to a File
• Writing to a file is similar to reading from it
• Use the > operator to open a file for writing:
    open FILE1, '>/u/prot1.seq';
• This creates a new file with that name, or overwrites an existing file
• Use >> to append text to an existing file
• print to the file using the filehandle:
    print FILE1 $data1;
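The difference between > and >> shows up when the same file is opened twice. A sketch under the assumption that a temporary file stands in for a real output path; the text written is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for a real output file.
my ($tmp_fh, $path) = tempfile();
close $tmp_fh;

open(my $out, '>', $path) or die "Cannot write $path: $!";     # create or overwrite
print $out "first run\n";
close $out;

open($out, '>>', $path) or die "Cannot append to $path: $!";   # add to the end
print $out "second run\n";
close $out;

# Read the file back to see that both lines are there.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close $in;

print @lines;
unlink $path;
```

Opening the second time with > instead of >> would leave only "second run" in the file.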
Making Decisions
• Useful programs must be able to make some decisions on their own
• The if statement runs a command only when a test is true
• It is generally used together with numerical or string comparison operators
    numerical: ==, !=, >, <, >=, <=
    strings: eq, ne, gt, lt, ge, le
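Choosing the wrong family of operators gives surprising answers, because string operators compare character by character. A short sketch with made-up values:

```perl
use strict;
use warnings;

my $x = 10;
my $y = 9;

my $numeric = 'false';
$numeric = 'true' if $x > $y;     # as numbers, 10 is greater than 9

my $string = 'false';
$string = 'true' if $x gt $y;     # as text, "10" sorts before "9"

print "10 >  9 is $numeric\n";
print "10 gt 9 is $string\n";

my $codon = 'ATG';
my $is_start = 'no';
$is_start = 'yes' if $codon eq 'ATG';   # use eq, not ==, for text
print "start codon: $is_start\n";
```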
True/False
• Perl relies on the concept of True/False decisions.
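• A comparison is true if it holds (5 > 3 is true, 5 < 3 is false). On their own, 0, the empty string and undefined values count as false; everything else is true.
• The not operator ! reverses it
    print "positive number" if ! ($a < 0);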
Conditional Blocks
• An if test can be used to control multiple lines of commands:
    print "Enter your age: ";
    $age = <>;
    chomp $age;
    if ($age < 21) {
        print "You are too young for this kind of work!\n";
        die "too young";
    }
    print "You are old enough to know better!\n";
• If the test is true, execute all the command lines inside the {} brackets. If not, then go on past the closing } to the statements below.
• if evaluates the statement in parentheses (must be true or false)
• Note: conditional block is indented
  – Perl doesn’t care about indents, but it makes your code more human readable
• die is a special function - stops your script and prints its message
  – Often used to test if keyboard input data is valid or if an input file exists.
Else & Elsif
• Instead of just letting the script go on if it fails the if test, you can designate a second block of code for the “or else” condition
• You can also perform multiple tests using elsif (note the spelling: Perl has no elseif)
    if ($A == 10) {
        print "yadda yadda";             # do some stuff
    } elsif ($A > 10) {
        print "yowsa yowsa";             # do different stuff
    } elsif ($A < 10) {
        print "do this other stuff";
    } else {
        print "if it ain't =, >, or <, then I'm stumped";
        die "not a number";
    }
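A runnable sketch of a chain of tests. Note that Perl spells the keyword elsif, wraps each test in parentheses, and uses == (not =) to compare numbers. The subroutine name compare_to_ten is hypothetical, chosen only for this example.

```perl
use strict;
use warnings;

# compare_to_ten is a made-up helper that reports how a number relates to 10.
sub compare_to_ten {
    my ($n) = @_;
    if ($n == 10) {
        return 'equal to 10';
    } elsif ($n > 10) {
        return 'greater than 10';
    } else {
        return 'less than 10';    # reached only when both tests above fail
    }
}

foreach my $n (3, 10, 42) {
    print "$n is ", compare_to_ten($n), "\n";
}
```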
Loops
• OK, we’ve got variables, input & output and decisions. Now we need Loops.
• Loops test a condition and repeat a block of code based on the result
  – while loops repeat while the condition is true
    $count = 1;
    while ($count <= 10) {
        print "$count bottles of pop\n";
        $count = $count + 1;
    }
    print "POP!\n";
[Try this program yourself]
Read a File: line by line
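    open FILE1, '/u/doej01/prot1.seq';
    $my_sequence = '';                          # start with an empty string
    while ($line = <FILE1>) {
        chomp($line);
        $my_sequence = $my_sequence . $line;    # add this line to the end
    }
    close FILE1;
• Joins every line of the file, with the newlines removed, into the single variable $my_sequence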
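The same loop can be tried without a real sequence file. In this sketch an in-memory filehandle with made-up sequence lines stands in for the file (an assumption for portability), and the .= operator is used as shorthand for appending.

```perl
use strict;
use warnings;

# In-memory "file" with invented sequence lines, so no real path is needed.
my $file_contents = "ATGCGT\nTTAGGC\nCCATAA\n";
open(my $fh, '<', \$file_contents) or die "Cannot open in-memory file: $!";

my $my_sequence = '';             # start empty
while (my $line = <$fh>) {        # loop ends when there are no more lines
    chomp $line;
    $my_sequence .= $line;        # same as $my_sequence = $my_sequence . $line
}
close $fh;

print length($my_sequence), " bases: $my_sequence\n";
```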
Arrays
• It is awkward to store a large DNA sequence in one variable, or to create many variables for a list of numbers
• Perl has a type of variable called an “array” that can store a list of data
  – multiple lines of a text file
  – a list of numbers
  – a list of words
• Array variables are referred to with an “@” symbol
    @numbers = (1,2,45,234,11);
Bioinformatics Uses Arrays
• Bioinformatics data often comes in the form of arrays
  – tab delimited lists
  – multi-line text files
• Arrays are handy because the entries are indexed
  – You can grab any entry directly by its index, for example the fourth number:
    @numbers = (1, 2, 45, 234, 11);
    print "$numbers[3]\n";
    234
    # Note - the index starts with zero, so $numbers[3] is the fourth entry!
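A few more ways to look inside an array, as a sketch. The split function, which turns a tab delimited line into an array, is not covered on the slide but is a standard Perl built-in; the tab delimited line here is invented.

```perl
use strict;
use warnings;

my @numbers = (1, 2, 45, 234, 11);

my $first  = $numbers[0];        # index 0 is the first entry
my $fourth = $numbers[3];        # index 3 is the fourth entry
my $last   = $numbers[-1];       # negative indexes count from the end
my $count  = scalar(@numbers);   # how many entries the array holds

print "first=$first fourth=$fourth last=$last count=$count\n";

# A made-up tab delimited line broken into an array of fields.
my @fields = split /\t/, "geneA\tchr1\t100";
my $chrom  = $fields[1];
print "second field: $chrom\n";
```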
Read a File into an Array
• Rather than read a file one line at a time into a scalar variable, it is often helpful to read the entire file into an array
    open FILE1, '/u/doej01/prot1.seq';
    @DNA = <FILE1>;
• join combines the elements of an array into a single scalar variable (a string)
    $DNA = join('', @DNA);
• substr takes characters out of a string
    $letter = substr($DNA, $position, 1);
join & substr
• substr($DNA, $position, 1)
  – which string, where in the string, how many letters to take
• join('', @DNA)
  – spacer (empty here), which array
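join and substr can be tried together on a small array. In this sketch the two array entries are invented lines of the kind read from a file, and chomp is applied to the whole array first so the newlines do not end up inside the joined string.

```perl
use strict;
use warnings;

my @DNA = ("ATGC\n", "GGTA\n");   # made-up lines, as if read from a file
chomp @DNA;                        # chomp removes the newline from every entry
my $DNA = join('', @DNA);          # empty spacer: entries are glued together

my $position = 2;                           # positions start at 0
my $letter = substr($DNA, $position, 1);    # one letter at position 2
my $codon  = substr($DNA, 0, 3);            # three letters from the start

print "$DNA $letter $codon\n";
```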
Exercise
• Read a DNA sequence from a text file
• Calculate the %GC content
• What about non-DNA characters in the file?
  – carriage returns and blank spaces
  – N’s or X’s or unexpected letters
• Write the output to the screen and to a file
  – use append so that the file will grow as you run this program on additional sequences
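One possible starting point for the counting step, offered as a sketch and not as the expected solution. It assumes that only A, C, G and T should count toward the total, so newlines, spaces, N, X and other letters are ignored; gc_percent is a hypothetical name, and reading the file and appending the result are left to the exercise.

```perl
use strict;
use warnings;

# gc_percent is a made-up helper: percentage of G and C among the
# A, C, G and T characters of a string (upper or lower case).
sub gc_percent {
    my ($seq) = @_;
    my $gc    = 0;
    my $total = 0;
    my $i     = 0;
    while ($i < length($seq)) {
        my $base = uc(substr($seq, $i, 1));     # one letter, upper-cased
        if ($base eq 'G' or $base eq 'C') {
            $gc    = $gc + 1;
            $total = $total + 1;
        } elsif ($base eq 'A' or $base eq 'T') {
            $total = $total + 1;
        }                                       # anything else is skipped
        $i = $i + 1;
    }
    return 0 if $total == 0;                    # avoid dividing by zero
    return 100 * $gc / $total;
}

printf "%.1f%% GC\n", gc_percent("ATGC\nGGNN tacg\n");
```

Whether N's should be left out of the total or counted as non-GC bases is a choice to make deliberately; this sketch leaves them out.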