1
Introduction to Perl scripting
Part 1 basic perl
2
What is Perl?
Scripting language
Practical Extraction and Report Language
Pathologically Eclectic Rubbish Lister
3
How do I use Perl?
$ vi hello.pl
print "hello world\n";
$ perl hello.pl
hello world
$ vi add.pl
print $ARGV[0] + $ARGV[1], "\n";
$ perl add.pl 17 25
42
4
Why Perl?
FAST text processing
Simple
Scripting language
Cross-platform
Many extensions for biological data
5
TMTOWTDI
Motto: TMTOWTDI (There’s More Than One Way To Do It)
This can be frustrating to new users
Focus on understanding what you are doing; don’t worry about all the other ways yet.
6
Getting started
Primitives
– String - "string", 'string'
– Numeric - 10, 12e4, 1e-3, 120.0123
Data types
– scalars - $var = "a"; $num = 10;
– lists - @lst = ('apple', 'orange');
– hashes - %hash = (1 => 'apple', 2 => 'orange');
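A minimal sketch of the three data types in use. The variable names and values are made up for illustration; note that the sigil changes to $ when a single item is read from a list or hash.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $name   = "apple";                             # scalar holding a string
my $count  = 10;                                  # scalar holding a number
my @fruits = ('apple', 'orange');                 # list of scalars
my %fruit_by_id = (1 => 'apple', 2 => 'orange');  # hash: key => value

print "scalar: $name, $count\n";
print "second fruit: $fruits[1]\n";   # lists are indexed from 0
print "fruit 2: $fruit_by_id{2}\n";   # hashes are indexed by key
```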
7
Starter Code
# assign a variable
$var = 12;
print "var is $var\n";

# concatenate strings
$x = "Alice";
$y = $x . " & Alex are cousins\n";
print $y;

# print can print lists of variables
print $y, "var is ", $var, "\n";
8
Tidbits
To print to screen
– print "string"
Special chars
– newline - "\n"
– tab - "\t"
String and numeric conversion is automatic
It is all about context
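A short sketch of what "all about context" means in practice: the operator, not the variable, decides whether a value is treated as a number or as a string. The values here are arbitrary.

```perl
use strict;
use warnings;

my $text = "17";          # a string that looks like a number
my $sum  = $text + 25;    # + is numeric: "17" is used as 17
my $glue = $text . 25;    # . is string concatenation: 25 is used as "25"

print "$sum\n";           # 42
print "$glue\n";          # 1725
```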
9
Math
Standard arithmetic: +, -, *, /
Mod operator: %
– 4 % 2 = 0; 5 % 2 = 1
Operate in place: $num += 3
Increment or decrement a variable: $a++, $a--
Power: 2^5 = 2**5
Square root: sqrt(9)
Natural log: log_e(5) = log(5)
– log_10(100) = log(100) / log(10)
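The change-of-base rule above can be wrapped in a small helper. This is only a sketch; log_base is a hypothetical name, not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: logarithm of $value in an arbitrary base,
# built from Perl's natural-log function log().
sub log_base {
    my ($value, $base) = @_;
    return log($value) / log($base);
}

my $power     = 2 ** 5;               # 32
my $remainder = 5 % 2;                # 1
my $log10     = log_base(100, 10);    # approximately 2

print "$power $remainder $log10\n";
```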
10
Precision
Truncate toward zero: int($x)
Round up: POSIX::ceil($x)
Round down: POSIX::floor($x)
Formatted printing: printf/sprintf
– %d, %f, %5.2f, %g, %e
– More coverage later on
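A quick sketch comparing the rounding functions on a negative number, where int and floor give different answers because int simply drops the fractional part. The sample values are arbitrary.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

my $x = -2.5;
my $truncated = int($x);      # -2: the fraction is dropped
my $floored   = floor($x);    # -3: largest integer not above $x
my $ceiled    = ceil($x);     # -2: smallest integer not below $x

# sprintf formats like printf but returns the string instead of printing it
my $two_places = sprintf("%.2f", 3.14159);    # "3.14"

print "$truncated $floored $ceiled $two_places\n";
```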
11
Some Math Code
# Pythagorean theorem
my $a = 3; my $b = 4;
my $c = sqrt($a**2 + $b**2);

# what's left over from the division
my $x = 22; my $y = 6;
my $div = int ( $x / $y );
my $mod = $x % $y;
print $div, " ", $mod, "\n";

output:
3 4
12
Logic & Equality
if / unless / elsif / else
– if( TEST ) { DO SOMETHING }
  elsif( TEST ) { SOMETHING ELSE }
  else { DO SOMETHING ELSE IN CASE }
Equality: == (numbers) and eq (strings)
Less/Greater than: <, <=, >, >=
– lt, le, gt, ge for string (lexical) comparisons
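Why two sets of operators matter: the same pair of values can compare differently as numbers and as strings. A small sketch with arbitrary values:

```perl
use strict;
use warnings;

my ($x, $y) = (10, 9);

# Numeric comparison: 10 is not less than 9.
my $numeric = ($x < $y) ? "less" : "not less";

# String comparison: "10" sorts before "9" because "1" comes before "9".
my $lexical = ($x lt $y) ? "less" : "not less";

print "numeric: $numeric\n";
print "string:  $lexical\n";
```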
13
Testing equality
$str1 = "mumbo";
$str2 = "jumbo";
if( $str1 eq $str2 ) {
  print "strings are equal\n";
}
if( $str1 lt $str2 ) {
  print "less\n";
} else {
  print "more\n";
}
if( $y >= $x ) {
  print "y is greater or equal\n";
}
14
Boolean Logic
AND – &&, and
OR – ||, or
NOT – !, not
if( $a > 10 && $a <= 20 ) { }
15
Loops
while( TEST ) { }
until( ! TEST ) { }
for( $i = 0; $i < 10; $i++ ) { }
foreach $item ( @list ) { }
for $item ( @list ) { }
16
Using logic
for( $i = 0; $i < 20; $i++ ) {
  if( $i == 0 ) {
    print "$i is 0\n";
  } elsif( $i % 2 == 0 ) {   # remainder 0 after dividing by 2 means even
    print "$i is even\n";
  } else {
    print "$i is odd\n";
  }
}
17
What is truth?
True
– if( "zero" ) { }
– if( 23 || -1 || ! 0 ) { }
– $x = "0 or none"; if( $x ) { }
False
– if( 0 || undef || '' || "0" ) { }
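The rules above can be checked directly. This sketch counts how many values in each group pass a truth test; the extra values "0.0" and "00" are additions for illustration, included because they are true even though they look like zero.

```perl
use strict;
use warnings;

# The four false values from the slide.
my @false_values = (0, '', "0", undef);

# True values; "0.0" and "00" are illustrative additions.
my @true_values = ("zero", 23, -1, "0 or none", "0.0", "00");

# grep in scalar context returns the number of items that pass the test.
my $false_count = grep { !$_ } @false_values;
my $true_count  = grep { $_ } @true_values;

print "$false_count of ", scalar(@false_values), " are false\n";
print "$true_count of ", scalar(@true_values), " are true\n";
```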
18
Special variables
This is why many people dislike Perl
Too many little silly things to remember
perldoc perlvar for detailed info
19
Some special variables
$! - error messages here
$, - separator between the items in a print list
$" - separator when doing print "@array";
$/ - record delimiter ("\n" usually)
$a, $b - used in sorting
$_ - implicit variable
perldoc perlvar for more info
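Two of these separators are easy to mix up, so here is a hedged sketch showing each one. It uses local, which is not covered in these slides, to change a special variable only inside the enclosing braces.

```perl
use strict;
use warnings;

my @bases = ('A', 'T', 'G');

my $interpolated;
{
    local $" = '-';               # separator used when a list is put inside a string
    $interpolated = "@bases";     # A-T-G
}
print "$interpolated\n";

{
    local $, = ':';               # separator print puts between its arguments
    print @bases;                 # prints A:T:G
}
print "\n";
```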
20
The Implicit variable
Implicit variable is $_
for ( @list ) { print $_ }
while(<IN>) { print $_ }
21
Input/Output: Getting and Writing Data
22
Getting Data from Files
open(HANDLE, "filename") || die $!;
$line1 = <HANDLE>;
while(defined($line = <HANDLE>)) {
  chomp($line);   # strip the newline so eq can match
  if( $line eq 'line stuff' ) { }
}

open(HANDLE, "filename") || die $!;
while(<HANDLE>) {
  print "line is $_";
}

open(HANDLE, "filename") || die $!;
@slurp = <HANDLE>;
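A self-contained sketch of the same read loop. It swaps in the three-argument form of open with a filehandle stored in a variable, which is a common alternative to the style shown above rather than what the slides use. The temporary file and its contents are made up so the example can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file (hypothetical contents) to read back.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print $tmp_fh ">seq1\nATGC\n>seq2\nGGCC\n";
close($tmp_fh) or die "close failed: $!";

# Three-argument open: handle, mode, filename.
open(my $in, '<', $path) or die "cannot open $path: $!";
my @headers;
while (my $line = <$in>) {
    chomp $line;                                   # remove the trailing newline
    push @headers, $line if substr($line, 0, 1) eq '>';
}
close($in);

print scalar(@headers), " sequences in file\n";
```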
23
Data from Streams
while(<STDIN>) {
  print "stdin read: $_";
}

# the trailing | makes open read the command's output
open(GREP, "grep '>' $filename |") || die $!;
my $i = 0;
while(<GREP>) {
  $i++;
}
close(GREP);
print "$i sequences in file\n";
24
Can pass data into a program
while(<STDIN>) {
  print "stdin read: $_";
}

# the trailing | makes open read the command's output
open(GREP, "grep '>' $filename |") || die $!;
my $i = 0;
while(<GREP>) {
  $i++;
}
close(GREP);
print "$i sequences in file\n";
25
Writing out data
open(OUT, ">outname") || die $!;
print OUT "sequence report\n";
close(OUT);

# appending with >>
open(OUT, ">>outname") || die $!;
print OUT "appended this\n";
close(OUT);
26
Filehandles as variables
$var = \*STDIN;
open($fh, ">report.txt") || die $!;
print $fh "line 1\n";
open($fh2, "report") || die $!;
$fh = $fh2;
while(<$fh>) { }
27
String manipulation
28
Some string functions
. - concatenate strings
– $together = $one . " " . $two;
reverse - reverse a string (or array)
length - get length of a string
uc - uppercase or lc - lowercase a string
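A brief sketch of these functions on an arbitrary sample string. One thing worth knowing: reverse only reverses the characters of a string when its result is used as a scalar; given a list, it reverses the order of the items instead.

```perl
use strict;
use warnings;

my $dna = "atgGCA";                    # arbitrary sample string

my $len       = length($dna);          # 6
my $upper     = uc($dna);              # ATGGCA
my $lower     = lc($dna);              # atggca
my $backwards = reverse($dna);         # scalar assignment: characters reversed
my @swapped   = reverse('first', 'second');   # list: item order reversed

print "$len $upper $lower $backwards\n";
print "@swapped\n";
```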
29
split/join
split: separate a string into a list based on a delimiter
– @lst = split("-", "hello-there-mr-frog");
join: make a string from a list using a delimiter
– $str = join(" ", @lst);
– Solves the fencepost problem nicely (want to put something between each pair of items in a list)
print join("\t", @lst), "\n";
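A sketch of split and join working together to convert a tab-delimited record to a comma-delimited one. The record contents are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical tab-delimited record: gene, chromosome, position.
my $line = "CG1000\tchr2\t1500";

my @fields = split(/\t/, $line);       # break the line apart on tabs
my $csv    = join(",", @fields);       # glue the pieces back with commas

print scalar(@fields), " fields\n";
print "$csv\n";
```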
30
index
index(STRING, SUBSTRING, [STARTINGPOS])
Find the position of a substring within a string (left to right scanning)
$codon = 'ATG';
$str = 'AGCGCATCGCATGGCGATGCAGATG';
$first = index($str, $codon);
$second = index($str, $codon, $first + length($codon));
rindex
Same as index, but right to left scanning
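The two-step search above extends naturally to a loop that collects every match. This is a sketch that, like the slide, skips past each hit by the codon length, so overlapping matches would not be reported; index returns -1 when nothing more is found.

```perl
use strict;
use warnings;

my $codon = 'ATG';
my $str   = 'AGCGCATCGCATGGCGATGCAGATG';

my @positions;
my $pos = index($str, $codon);
while ($pos >= 0) {                    # -1 means no further match
    push @positions, $pos;
    $pos = index($str, $codon, $pos + length($codon));
}

my $last = rindex($str, $codon);       # search from the right-hand end

print "found at: @positions\n";        # found at: 10 16 22
print "last at: $last\n";              # last at: 22
```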
31
substr
substr(STRING, START, [LENGTH], [REPLACE]);
Extract a substring from a larger string
$orf = substr($str,10,40);
$end = substr($str,40); # get end
Replace string
– substr($str,21,10,'NNNNNNNNNNN');
32
Zero based economy...
1st number is ‘0’ for an index or 1st character in a string
– most programming languages
Biologists often number the 1st base in a sequence as ‘1’ (GenBank, BioPerl)
Interbase coordinates (Kent-UCSC, Chado-GMOD)
33
Coordinate systems
Zero based, interbase coordinates
 A T G G G T A G A
0 1 2 3 4 5 6 7 8 9
1 based coordinates
A T G G G T A G A
1 2 3 4 5 6 7 8 9
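A sketch of pulling the same three bases out of the example sequence under each coordinate system. The start and end values are chosen for illustration; substr itself always counts from 0.

```perl
use strict;
use warnings;

my $seq = 'ATGGGTAGA';

# 1-based, inclusive coordinates (biologist style): bases 4 through 6.
my ($start1, $end1) = (4, 6);
my $from_one_based = substr($seq, $start1 - 1, $end1 - $start1 + 1);

# Interbase coordinates: the span between positions 3 and 6.
my ($start0, $end0) = (3, 6);
my $from_interbase = substr($seq, $start0, $end0 - $start0);

print "$from_one_based $from_interbase\n";   # GGT GGT
```

With interbase coordinates the length is simply end minus start, which is one reason some genome databases prefer them.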
34
Arrays and Lists
Lists are sets of items
Can be mixed types of scalars (numbers, strings, floats)
Perl uses lists extensively
Variables are prefixed by @
35
List operations
reverse - reverse list order
$list[$n] - get the $n-th item (counting from 0)
– $two = $list[2];
scalar - get length of array
– $len = scalar @list;
– $last_index = $#list;
delete $list[10] - delete entry
36
Autovivification
Automatically allocate space for an item
$array[0] = 'apple';
print scalar @array, " ";
$array[4] = 'elephant';
$array[25] = 'zebra fish';
print scalar @array, " ";
delete $array[25];
print scalar @array, "\n";
output:
1 26 5
37
pop,push,shift,unshift
# remove last item
$last = pop @list;

# remove first item
$first = shift @list;

# add to end of list
push @list, $last;

# add to beginning of list
unshift @list, $first;
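A sketch that traces the list through all four operations, using made-up names. Here the removed items are put back at the opposite ends to show where each function adds.

```perl
use strict;
use warnings;

my @list = ('alice', 'chad', 'rod');

my $last  = pop @list;      # takes 'rod';   @list is (alice, chad)
my $first = shift @list;    # takes 'alice'; @list is (chad)

push @list, $first;         # adds to the end:       (chad, alice)
unshift @list, $last;       # adds to the beginning: (rod, chad, alice)

print "@list\n";            # rod chad alice
```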
38
splicing an array
splice ARRAY,OFFSET,LENGTH,LIST
splice ARRAY,OFFSET,LENGTH
splice ARRAY,OFFSET
splice ARRAY

@list = ('alice','chad','rod');
($x,$y) = splice(@list,1,2);   # removes 'chad' and 'rod'; @list is now ('alice')
splice(@list, 1,0, ('marvin','alex'));   # inserts at position 1, removes nothing
newlist: ('alice','marvin','alex');
39
Sorting with sort
@list = ('tree','frog', 'log');
@sorted = sort @list;
# reverse order
@sorted = sort { $b cmp $a } @list;

# sort based on numerics
@list = (25,21,12,17,9,8);
@sorted = sort { $a <=> $b } @list;

# reverse order of sort
@revsorted = sort { $b <=> $a } @list;
40
How would you sort based on part of string in list?
41
@list = ('E1','F3','A2');
@sorted = sort @list; # sort lexical
@sorted = sort { substr($a,1,1) <=> substr($b,1,1) } @list;
42
Filter with grep
@list = ('aardvark', 'baboon', 'cat', 'dog','lamb','kangaroo');
@sl = grep { length($_) == 3 } @list;
@oo = grep { index($_,"oo") >= 0 } @list;
# use it to count
my $ct = grep { substr($_,1,1) eq 'a' } @list;
43
Transforming with map
@list = ('aardvark', 'baboon', 'cat', 'dog','lamb','kangaroo');
@lens = map { length($_) } @list;
# capitalize the first letter of each item; ucfirst returns a
# modified copy, so @list itself is left unchanged
@upper = map { ucfirst($_) } @list;
44
More list action
@list = ('aardvark', 'baboon', 'cat', 'dog','lamb','kangaroo');
for $animal ( @list ) {
  if( length($animal) <= 3 ) {
    print "$animal is noisy\n";
  } else {
    print "$animal is quiet\n";
  }
}
45
Sort complicated stuff
# want to sort these by gene number
@list = ('CG1000.1', 'CG0789.1', 'CG0321.1', 'CG1227.2');
@sorted = sort {
    my ($locus_a) = split(/\./, $a);
    my ($locus_b) = split(/\./, $b);
    substr($locus_a, 0, 2, '');   # strip the leading 'CG'
    substr($locus_b, 0, 2, '');
    $locus_a <=> $locus_b;        # compare the gene numbers numerically
} @list;
print "sorted are ", join(",", @sorted), "\n";
46
Scope
The section of a program a variable is valid for
Defined by braces { }
use strict;
Use ‘my’ to declare variables
#!/usr/bin/perl -w
use strict;

my $var = 10;
my $var2 = 'monkey';
print "(outside) var is $var\n" .
      "(outside) var2 is $var2\n";
{
  my $var;
  $var = 20;
  print "(inside) var is $var\n";
  $var2 = 'ape';
}
print "(outside) var is $var\n" .
      "(outside) var2 is $var2\n";
48
Good practices
Declare variables with ‘my’
Always ‘use strict’
‘use warnings’ to get warnings
49
Let’s practice (old code)
@list = ('aardvark', 'baboon', 'cat', 'dog','lamb','kangaroo');
for $animal ( @list ) {
  if( length($animal) <= 3 ) {
    print "$animal is noisy\n";
  } else {
    print "$animal is quiet\n";
  }
}
50
Let’s practice
#!/usr/bin/perl
use warnings;
use strict;
my @list = ('aardvark', 'baboon', 'cat', 'dog','lamb','kangaroo');
for my $animal ( @list ) {
  if( length($animal) <= 3 ) {
    print "$animal is noisy\n";
  } else {
    print "$animal is quiet\n";
  }
}
51
Editors
vi filename – begin by using this editor
52
Make a perl script
$ pico hello.pl
#!/usr/bin/perl
print "hello world\n";
[Control-O, Enter, Control-X, Enter]
$ perl hello.pl
hello world
$ chmod +x hello.pl
$ ./hello.pl