Introduction to Perl

Part II

Associative arrays or Hashes

Like arrays, but indexed by strings (keys) instead of numbers

@array = ('john', 'steve', 'aaron', 'max', 'juan', 'sue');

%hash = ( 'apple' => 12, 'pear' => 3, 'cherry' => 30, 'lemon' => 2, 'peach' => 6, 'kiwi' => 3 );

[Figure: the array drawn as six boxes indexed 0 to 5 holding 'john' through 'sue', next to the hash drawn as keys (apple, pear, cherry, lemon, peach, kiwi) pointing to their values (12, 3, 30, 2, 6, 3).]

Using hashes

The { } operator

Set a value: $hash{'cherry'} = 10;

Access a value: print $hash{'cherry'}, "\n";

Remove an entry: delete $hash{'cherry'};
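The three operations above can be tried together in one short script. This is a minimal sketch; the exists test is an addition not shown on the slide, included because it checks for a key without creating it.

```perl
use strict;
use warnings;

my %hash = ( 'apple' => 12, 'pear' => 3, 'cherry' => 30 );

$hash{'cherry'} = 10;            # overwrite an existing value
$hash{'kiwi'}   = 3;             # add a new key
my $cherry = $hash{'cherry'};    # look up by key
delete $hash{'pear'};            # remove the entry entirely

# exists tests for a key without creating it
my $has_pear  = exists $hash{'pear'}  ? 1 : 0;
my $has_apple = exists $hash{'apple'} ? 1 : 0;

print "cherry = $cherry, has pear: $has_pear, has apple: $has_apple\n";
```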

Get the Keys

The keys function returns a list of the hash keys

@keys = keys %hash;

for my $key ( keys %hash ) {
    print "$key => $hash{$key}\n";
}

@keys would be 'apple', 'pear', ...

Order of keys is NOT guaranteed!
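Because the order is not guaranteed, a common approach is to sort the keys before looping. The sketch below shows two orderings; the variable names @by_name and @by_count are made up for this example.

```perl
use strict;
use warnings;

my %hash = ( 'apple' => 12, 'pear' => 3, 'cherry' => 30, 'lemon' => 2 );

# Alphabetical order of the keys
my @by_name = sort keys %hash;

# Order by value, largest first; ties broken by key name
my @by_count = sort { $hash{$b} <=> $hash{$a} || $a cmp $b } keys %hash;

for my $key (@by_name) {
    print "$key => $hash{$key}\n";
}
```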

Get just the values

@values = values %hash;

for my $val ( @values ) {
    print "val is $val\n";
}

Iterate through a set

Order is not guaranteed!

while( my ($key,$value) = each %hash ) {
    print "$key => $value\n";
}

Subroutines

A block of code that can be reused

Also referred to as procedures or functions

Defining a subroutine

sub routine_name { }

Calling the routine:

routine_name();

&routine_name(); (the & is optional)

Passing data to a subroutine

Pass in a list of data:

&dosomething($var1,$var2);

The arguments arrive in the special array @_:

sub dosomething {
    my ($v1,$v2) = @_;
}
sub do2 {
    my $v1 = shift @_;
    my $v2 = shift;    # shift defaults to @_ inside a subroutine
}
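Both ways of unpacking @_ can be seen in a runnable form below. The subroutine names add_pair and label_rest are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration
sub add_pair {
    my ($v1, $v2) = @_;      # copy both arguments out of @_
    return $v1 + $v2;
}

sub label_rest {
    my $label = shift;       # first argument
    my @rest  = @_;          # whatever is left
    return "$label: " . scalar(@rest) . " values";
}

print add_pair(2, 3), "\n";
print label_rest('scores', 10, 20, 30), "\n";
```

List assignment suits a fixed number of arguments; shift is handy when the first argument is special and the rest form a list.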

Returning data from a subroutine

The value of the last statement evaluated in the routine is its return value

sub dothis {
    my $c = 10 + 20;
}
print dothis(), "\n";    # prints 30

Can also use return to specify the return value and/or leave the routine early

Write a subroutine which returns true if a codon is a stop codon (for the standard genetic code): -1 on error, 1 on true, 0 on false

sub is_stopcodon {
    my $val = shift @_;
    if( length($val) != 3 ) {
        return -1;
    } elsif( $val eq 'TAA' || $val eq 'TAG' || $val eq 'TGA' ) {
        return 1;
    } else {
        return 0;
    }
}
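A short usage sketch for is_stopcodon follows. Note that -1 counts as true in a Perl boolean test, so the caller here compares the result explicitly instead of writing if( is_stopcodon($codon) ). The subroutine is restated in compact form so the example runs on its own.

```perl
use strict;
use warnings;

sub is_stopcodon {
    my $val = shift @_;
    return -1 if length($val) != 3;
    return ( $val eq 'TAA' || $val eq 'TAG' || $val eq 'TGA' ) ? 1 : 0;
}

my @results;
for my $codon ( 'TAA', 'ATG', 'TG' ) {
    my $status = is_stopcodon($codon);
    if    ( $status == 1 ) { push @results, "$codon stop" }
    elsif ( $status == 0 ) { push @results, "$codon not stop" }
    else                   { push @results, "$codon invalid" }
}
print join( ", ", @results ), "\n";
```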

Context

Array (list) versus scalar context

my $length = @lst;     # scalar context: number of elements

my ($first) = @lst;    # list context: first element

wantarray reports the context a subroutine is called in

Can force scalar context with scalar:

my $len = scalar @lst;

subroutine context

sub dostuff {
    if( wantarray ) {
        print "array/list context\n";
    } else {
        print "scalar context\n";
    }
}

dostuff();            # void context: wantarray is undef, so this prints "scalar context"
my @a = dostuff();    # array
my %h = dostuff();    # array
my $s = dostuff();    # scalar

Why do you care about context?

sub dostuff {
    my @r = (10,20);
    return @r;
}

my @a = dostuff();    # array
my %h = dostuff();    # array
my $s = dostuff();    # scalar

print "@a\n";                     # 10 20
print join(" ", keys %h),"\n";    # 10
print "$s\n";                     # 2
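One way to make use of context deliberately is to return a list or a count depending on wantarray. The subroutine codons below is a hypothetical example, not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical example: return the list of codons in list context,
# or just how many there are in scalar context
sub codons {
    my ($seq) = @_;
    my @codons = ( $seq =~ /(...)/g );    # split into triplets
    return wantarray ? @codons : scalar @codons;
}

my @list  = codons('ATGGCCTAA');    # list context
my $count = codons('ATGGCCTAA');    # scalar context
print "@list ($count codons)\n";
```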

References

Are “pointers” to a data object instead of the object itself

Allow us to have a shorthand to refer to something and pass it around

Must “dereference” a reference to get the actual value; the “reference” is just a location in memory

Reference Operators

\ in front of a variable to get its memory location:

my $ptr = \@vals;

[ ] creates a reference to an anonymous array, { } to an anonymous hash

Can assign a pointer directly:

my $ptr = [ ('owlmonkey', 'lemur') ];

my $hashptr = { 'chrom' => 'III', 'start' => 23 };

Dereferencing

Need to cast the reference back to its datatype

my @list = @$ptr;

my %hash = %$hashptr;

Can also use { } to clarify:

my @list = @{$ptr};

my %hash = %{$hashptr};
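Copying the whole structure is often unnecessary. The sketch below uses the arrow operator (not shown on the slide) to reach single elements through a reference, alongside the equivalent sigil forms.

```perl
use strict;
use warnings;

my $ptr     = [ 'owlmonkey', 'lemur' ];
my $hashptr = { 'chrom' => 'III', 'start' => 23 };

# Arrow operator: reach one element without copying the whole structure
my $second = $ptr->[1];
my $chrom  = $hashptr->{'chrom'};

# Equivalent forms using the sigil-plus-reference syntax
my $second_again = $$ptr[1];
my $chrom_again  = ${$hashptr}{'chrom'};

# Changes made through the reference are seen by every holder of it
$hashptr->{'start'} += 100;
my $size = scalar @$ptr;

print "$second on chrom $chrom, start $hashptr->{'start'}, $size items\n";
```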

Really they are not so hard...

my @list = ('fugu', 'human', 'worm', 'fly');

my $list_ref = \@list;          # reference to @list itself

my $list_ref_copy = [@list];    # reference to a new copy of @list

for my $item ( @$list_ref ) {
    print "$item\n";
}

Why use references?

Simplify argument passing to subroutines

Allows updating of data without making multiple copies

What if we wanted to pass in 2 arrays to a subroutine?

sub func { my (@v1,@v2) = @_; }

How do we know where one stops and the other starts? We can't: @_ is one flat list, so @v1 takes every argument and @v2 is left empty.

Why use references?

Passing in two arrays to intermix.

sub func {
    my ($v1,$v2) = @_;
    my @mixed;
    while( @$v1 || @$v2 ) {
        push @mixed, shift @$v1 if @$v1;
        push @mixed, shift @$v2 if @$v2;
    }
    return \@mixed;
}
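A usage sketch for func follows. Because func shifts elements off the arrays it is given, this example passes copies made with [ ]; the names @odd and @even are made up for illustration.

```perl
use strict;
use warnings;

sub func {
    my ($v1, $v2) = @_;
    my @mixed;
    while ( @$v1 || @$v2 ) {
        push @mixed, shift @$v1 if @$v1;
        push @mixed, shift @$v2 if @$v2;
    }
    return \@mixed;
}

my @odd  = ( 1, 3, 5, 7 );
my @even = ( 2, 4 );

# Pass copies ([@odd]) so that the shifts do not empty the originals
my $mixed = func( [@odd], [@even] );

print "@$mixed\n";
print scalar(@odd), " elements still in \@odd\n";
```

Passing \@odd and \@even instead would avoid the copies, at the cost of leaving both arrays empty afterwards.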

References also allow Arrays of Arrays

my @lst;
push @lst, ['milk', 'butter', 'cheese'];
push @lst, ['wine', 'sherry', 'port'];
push @lst, ['bread', 'bagels', 'croissants'];

my @matrix = ( [1, 0, 0], [0, 1, 0], [0, 0, 1] );
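Individual cells of an array of arrays are reached with two subscripts. A minimal sketch, using the identity matrix from the slide:

```perl
use strict;
use warnings;

my @matrix = ( [1, 0, 0], [0, 1, 0], [0, 0, 1] );

my $cell = $matrix[1]->[1];    # row 1, column 1
my $same = $matrix[1][1];      # the arrow is optional between subscripts

# Sum of the diagonal (the trace)
my $trace = 0;
for my $i ( 0 .. $#matrix ) {
    $trace += $matrix[$i][$i];
}

for my $row (@matrix) {
    print join( " ", @$row ), "\n";
}
print "trace = $trace\n";
```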

Hashes of arrays

$hash{'dogs'} = ['beagle', 'shepherd', 'lab'];
$hash{'cats'} = ['calico', 'tabby', 'siamese'];
$hash{'fish'} = ['gold', 'beta', 'tuna'];

for my $key ( keys %hash ) {
    print "$key => ", join("\t", @{$hash{$key}}), "\n";
}
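A hash of arrays is often built up one item at a time instead of being assigned whole. The sketch below groups made-up (name, kind) pairs; the @pairs data is an assumption for illustration.

```perl
use strict;
use warnings;

my %hash;
# Made-up (name, kind) pairs, to be grouped by kind
my @pairs = ( [ 'beagle', 'dogs' ], [ 'tabby', 'cats' ], [ 'lab', 'dogs' ] );

for my $pair (@pairs) {
    my ( $name, $kind ) = @$pair;
    # autovivification: the inner array is created on the first push
    push @{ $hash{$kind} }, $name;
}

for my $key ( sort keys %hash ) {
    my $n = scalar @{ $hash{$key} };
    print "$key ($n): ", join( ", ", @{ $hash{$key} } ), "\n";
}
```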

More matrix use

my @matrix;
open(IN, $file) || die $!;
# read in the matrix
while(<IN>) {
    push @matrix, [split];
}
# data looks like
# GENENAME EXPVALUE STATUS
# sort by 2nd column
for my $row ( sort { $a->[1] <=> $b->[1] } @matrix ) {
    print join("\t", @$row), "\n";
}

Funny operators

my @bases = qw(C A G T);    # same as ('C', 'A', 'G', 'T')

my $msg = <<EOF;
This is the message I wanted to tell you about
EOF

Regular Expressions

Part of the “amazing power” of Perl

Allow matching of patterns

Syntax can be tricky

Worth the effort to learn!

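A simple regexp

if( $fruit eq 'apple' || $fruit eq 'Apple' || $fruit eq 'pear' ) {
    print "got a fruit $fruit\n";
}
# unanchored, so this also matches e.g. 'pineapple';
# use /^(?:[Aa]pple|pear)$/ for an exact match
if( $fruit =~ /[Aa]pple|pear/ ) {
    print "matched fruit $fruit\n";
}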

Regular Expression syntax

Use the =~ operator to match

if( $var =~ /pattern/ ) {} - scalar context: true if the pattern matches

my ($first,$second) = ( $var =~ /(\S+)\s+(\S+)/ ); - list context: returns the captured groups

if( $var !~ m/pattern/ ) { } - true if the pattern doesn’t match

m/REGEXPHERE/ - match

s/REGEXP/REPLACE/ - substitute

tr/VALUES/NEWVALUES/ - translate

m// operator (match)

Search a string for a pattern match

If no string is specified, will match against $_

Pattern can contain variables, which will be interpolated (and the pattern recompiled)

while( <DATA> ) {
    if( /A$num/ ) { $num++ }
}

With /o the pattern is compiled only once, so later changes to $num are ignored

while( <DATA> ) {
    if( /A$num/o ) { $num++ }
}

Pattern extras

m// - if you specify m, you can replace / with other delimiters, e.g. m##, m[], m!!

/i - case insensitive

/g - global match (more than one)

/x - extended regexps (allows comments and whitespace)

/o - compile regexp once
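The modifiers above can be combined in a small script. This is a sketch with a made-up sequence; GAATTC is the EcoRI recognition site, and \z (end of string) and (?: ) (grouping without capture) are standard Perl syntax not covered on the slides.

```perl
use strict;
use warnings;

my $seq = "atgGAATTCaaaGAATTCtga";

# /i: ignore case
my $has_start = ( $seq =~ /^ATG/i ) ? 1 : 0;

# /g in list context: every match, here all EcoRI sites (GAATTC)
my @sites = ( $seq =~ /GAATTC/g );

# /x: whitespace and comments inside the pattern are ignored
my $has_stop = ( $seq =~ / (?: taa | tag | tga ) \z   # stop codon at the end
                         /ix ) ? 1 : 0;

# alternative delimiters avoid escaping the slashes in a path
my $path = "/usr/local/bin/perl";
my $is_local = ( $path =~ m{^/usr/local/} ) ? 1 : 0;

print scalar(@sites), " sites, start=$has_start stop=$has_stop local=$is_local\n";
```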

Shortcuts

\s - whitespace (tab, space, newline, etc.)

\S - NOT whitespace

\d - digits ([0-9])

\D - NOT digits

\t, \n - tab, newline

. - any character except newline

Regexp Operators

+ - 1 -> many (match 1,2,3,4,... instances); /a+/ will match 'a', 'aa', 'aaaaa'

* - 0 -> many

? - 0 or 1

{N}, {M,N} - match exactly N, or M to N

[], [^] - anything in the brackets, anything but what is in the brackets
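Each operator above can be checked against a test string. The sketch below stores precompiled patterns with qr//, a standard Perl operator not introduced on the slides; the test strings are made up.

```perl
use strict;
use warnings;

my @tests = (
    [ 'AAAA',   qr/^A+$/        ],    # one or more A
    [ '',       qr/^A*$/        ],    # zero or more A matches the empty string
    [ 'color',  qr/^colou?r$/   ],    # optional u
    [ 'ACGTAC', qr/^[ACGT]{6}$/ ],    # exactly six bases
    [ 'ACGN',   qr/^[^N]+$/     ],    # fails: contains an N
);

my @outcome;
for my $t (@tests) {
    my ( $str, $re ) = @$t;
    push @outcome, ( $str =~ $re ) ? 'match' : 'no match';
}
print join( ", ", @outcome ), "\n";
```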

Saving what you matched

Things in parentheses can be retrieved via the variables $1, $2, $3, etc. for the 1st, 2nd, 3rd groups

if( /(\S+)\s+([\d\.\+\-]+)/ ) {
    print "$1 --> $2\n";
}

my ($name,$score) = ($var =~ /(\S+)\s+([\d\.\+\-]+)/);
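The list-context form is convenient when filling a hash from lines of text. A minimal sketch with made-up input lines; a failed match returns an empty list, so $name stays undefined and the line is skipped.

```perl
use strict;
use warnings;

# Made-up input lines: a name followed by a numeric score
my @lines = ( "geneA   12.5", "geneB   -3", "malformed" );

my %score;
for my $line (@lines) {
    # In list context the match returns the captures, or an empty list on failure
    my ( $name, $value ) = ( $line =~ /(\S+)\s+([\d\.\+\-]+)/ );
    next unless defined $name;    # skip lines that did not match
    $score{$name} = $value;
}

for my $name ( sort keys %score ) {
    print "$name --> $score{$name}\n";
}
```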

Simple Regexp

my $line = "aardvark";
if( $line =~ /aa/ ) { print "has double a\n" }
if( $line =~ /(a{2})/ ) { print "has double a\n" }
if( $line =~ /(a+)/ ) { print "has 1 or more a\n" }

Matching gene names

# File contains lots of gene names
# YFL001C YAR102W - yeast ORF names
# let-1, unc-7 - worm names http://biosci.umn.edu/CGC/Nomenclature/nomenguid.htm
# ENSG000000101 - human Ensembl gene names
while(<IN>) {
    if( /^(Y([A-P])(R|L)(\d{3})(W|C)(\-\w)?)/ ) {
        # $2 = chromosome letter, $3 = arm, $4 = ORF number, $5 = strand
        printf "yeast gene %s, chrom %d,%s arm, %d %s strand\n",
            $1, (ord($2)-ord('A'))+1, $3, $4, $5;
    } elsif( /^(ENSG\d+)/ ) {
        print "human gene $1\n"
    } elsif( /^(\w{3,4}\-\d+)/ ) {
        print "worm gene $1\n";
    }
}
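The same three patterns can be wrapped in a subroutine and used to tally names by organism. This is a sketch: gene_type is a hypothetical name, and the test list reuses the example names from the comments above plus one non-matching string.

```perl
use strict;
use warnings;

# Classify a name using the three patterns above (order matters)
sub gene_type {
    my ($name) = @_;
    return 'yeast' if $name =~ /^Y[A-P][RL]\d{3}[WC](?:-\w)?/;
    return 'human' if $name =~ /^ENSG\d+/;
    return 'worm'  if $name =~ /^\w{3,4}-\d+/;
    return 'unknown';
}

my %count;
for my $name (qw(YFL001C YAR102W let-1 unc-7 ENSG000000101 foo)) {
    $count{ gene_type($name) }++;
}
print join( ", ", map { "$_=$count{$_}" } sort keys %count ), "\n";
```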

Putting it all together

A parser for output from a gene prediction program

GlimmerM (Version 3.0)
Sequence name: BAC1Contig11
Sequence length: 31797 bp

Predicted genes/exons

Gene Exon Strand  Exon        Exon Range     Exon
   #    #         Type                       Length

   1    1  +  Initial     13907  13985      79
   1    2  +  Internal    14117  14594     478
   1    3  +  Internal    14635  14665      31
   1    4  +  Internal    14746  15463     718
   1    5  +  Terminal    15497  15606     110

   2    1  +  Initial     20662  21143     482
   2    2  +  Internal    21190  21618     429
   2    3  +  Terminal    21624  21990     367

   3    1  -  Single      25351  25485     135

   4    1  +  Initial     27744  27804      61
   4    2  +  Internal    27858  27952      95
   4    3  +  Internal    28091  28576     486
   4    4  +  Internal    28636  28647      12
   4    5  +  Internal    28746  28792      47
   4    6  +  Terminal    28852  28954     103

   5    3  -  Terminal    29953  30037      85
   5    2  -  Internal    30152  30235      84
   5    1  -  Initial     30302  30318      17

Putting it together

while(<>) {
    if( /^(Glimmer\S*)\s+\((.+)\)/ ) {
        $method = $1;
        $version = $2;
    } elsif( /^(?:Predicted genes|Gene|\s+\#)/ || /^\s+$/ ) {
        next
    } elsif( # glimmer 3.0 output
        /^\s+(\d+)\s+     # gene num
          (\d+)\s+        # exon num
          ([\+\-])\s+     # strand
          (\S+)\s+        # exon type
          (\d+)\s+(\d+)   # exon start, end
          \s+(\d+)        # exon length
        /ox ) {
        my ($genenum,$exonnum,$strand,$type,$start,$end,$len) = ( $1,$2,$3,$4,$5,$6,$7 );
    }
}
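The parser extracts the fields but does not yet keep them. One possible continuation, sketched below, stores each exon as a hash reference in a hash of arrays keyed by gene number. The input lines are a small in-memory sample in the GlimmerM layout (leading whitespace assumed), and the field names exon, strand, type, start, end and len are made up for this example.

```perl
use strict;
use warnings;

# A few lines in the GlimmerM layout shown earlier (leading whitespace assumed)
my @lines = (
    "GlimmerM (Version 3.0)",
    "   1    1  +  Initial     13907  13985      79",
    "   1    2  +  Internal    14117  14594     478",
    "   3    1  -  Single      25351  25485     135",
);

my ( $method, $version );
my %genes;    # gene number => list of exon records (hash references)

for (@lines) {
    if ( /^(Glimmer\S*)\s+\((.+)\)/ ) {
        ( $method, $version ) = ( $1, $2 );
    }
    elsif ( /^\s+(\d+)\s+(\d+)\s+([\+\-])\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)/ ) {
        push @{ $genes{$1} },
            { exon => $2, strand => $3, type => $4,
              start => $5, end => $6, len => $7 };
    }
}

for my $genenum ( sort { $a <=> $b } keys %genes ) {
    my $total = 0;
    $total += $_->{len} for @{ $genes{$genenum} };
    my $nexons = scalar @{ $genes{$genenum} };
    print "gene $genenum: $nexons exons, $total bp coding\n";
}
```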

s/// operator (substitute)

Same as m//, but substitutes whatever is matched in the first section with the value in the second section

$sport =~ s/soccer/football/;

$addto =~ s/(Gene)/$1-$genenum/;
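A few variations on s/// are sketched below: replacing only the first match versus all matches with /g, using the return value as a count, and reusing a capture in the replacement. The strings are made up for illustration.

```perl
use strict;
use warnings;

my $sport = "soccer is soccer";
( my $once = $sport ) =~ s/soccer/football/;     # first match only
( my $all  = $sport ) =~ s/soccer/football/g;    # /g replaces every match

# s/// returns the number of substitutions made
my $dna   = "ACGU-ACGU";
my $count = ( $dna =~ s/U/T/g );

# captured text can be reused in the replacement
my $genenum = 7;
my $addto   = "Gene predicted";
$addto =~ s/(Gene)/$1-$genenum/;

print "$once | $all | $dna ($count changes) | $addto\n";
```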

The tr/// operator (translate)

Match and replace what is in the first section, in order, with what is in the second. The sections are lists of characters, not regexps, so no [ ] are needed.

lowercase - tr/A-Z/a-z/

shift cipher - tr/A-Z/B-ZA/

revcom - $dna =~ tr/ACGT/TGCA/; $dna = reverse($dna);
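The revcom recipe can be wrapped in a subroutine that leaves the caller's string untouched. The sketch below also uses the return value of tr/// to count characters; the lowercase handling is an addition to the slide's version.

```perl
use strict;
use warnings;

# Reverse complement without modifying the caller's string
sub revcom {
    my ($dna) = @_;
    $dna =~ tr/ACGTacgt/TGCAtgca/;    # complement each base
    return scalar reverse $dna;        # scalar forces string reversal
}

# tr/// returns how many characters it matched, handy for counting
my $seq = "ATGGCC";
my $gc  = ( $seq =~ tr/GC// );

print revcom($seq), " GC=$gc\n";
```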

(aside) DNA ambiguity chars

aMino - {A,C}, Keto - {G,T}

puRines - {A,G}, pYrimidines - {C,T}

Strong - {G,C}, Weak - {A,T}

H (Not G) - {A,C,T}, B (Not A) - {C,G,T}, V (Not T) - {A,C,G}, D (Not C) - {A,G,T}

$str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/;