Computer Programming for Biologists
Class 8
Nov 28th, 2014
Karsten Hokamp
http://bioinf.gen.tcd.ie/GE3M25/programming
Overview
Revision
Subroutines
Revision - Hashes
my %seq = ();                  # initialisation
$freq{$char} = 0;              # storing a value
$freq{$char}++;                # changing a value
my $aa = $code{$codon};        # extracting a value
foreach my $header (sort keys %seq) {
    my $seq = $seq{$header};
    …
}
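The four hash operations above can be put together in a small runnable script. This is only a sketch: the headers and sequences stored in %seq are made up, and appending with .= is used here as one more way of changing a stored value.

```perl
use strict;
use warnings;

# Initialisation: an empty hash (header => sequence)
my %seq = ();

# Storing values (made-up example headers and sequences)
$seq{'seq_b'} = 'ACGTTA';
$seq{'seq_a'} = 'GGCA';

# Changing a value: append two more bases to an existing entry
$seq{'seq_a'} .= 'TT';

# Extracting values while looping over the keys;
# sort gives a fixed order, plain keys returns them in no particular order
foreach my $header (sort keys %seq) {
    my $seq = $seq{$header};
    print "$header: $seq (", length($seq), " bases)\n";
}
```

This prints `seq_a: GGCATT (6 bases)` followed by `seq_b: ACGTTA (6 bases)`.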
Hash Variables - Scalars vs Hash
Initialisation of values:
Scalars, one variable per base:
my $A = 0;
my $C = 0;
my $G = 0;
my $T = 0;
Hash, one variable for all bases:
my %frequency = ();
Hash Variables - Scalars vs Hash
Increment:
Scalars:
if ($char eq 'A') {
    $A++;
} elsif ($char eq 'C') {
    $C++;
} elsif ($char eq 'G') {
    $G++;
} elsif ($char eq 'T') {
    $T++;
}
Hash:
$frequency{$char}++;
[Figure: after reading a 'C', the count for C is 1, both in $C and in %frequency]
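Hash Variables - Scalars vs Hash
[Figure: counts after a whole sequence has been read, A => 5, C => 9, G => 7, T => 5, held either in the four scalars or in %frequency]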
Hash Variables - Scalars vs Hash
Printing the results:
Scalars:
print "Frequency of A: $A\n";
print "Frequency of C: $C\n";
print "Frequency of G: $G\n";
print "Frequency of T: $T\n";
Hash:
foreach my $char (keys %frequency) {
    print "Frequency of $char: $frequency{$char}\n";
}
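A complete, runnable version of the hash approach to base counting. The input sequence is a made-up example (in a real script it would be read from a file), and the keys are sorted so that the output order is fixed.

```perl
use strict;
use warnings;

# Made-up example input
my $dna = 'ACGTTACCGA';

# Hash approach: one hash replaces the four scalars $A, $C, $G, $T
my %frequency = ();
foreach my $char (split //, $dna) {
    $frequency{$char}++;    # a missing key starts as undef, which ++ treats as 0
}

# sort keys gives a fixed order; plain keys returns them in no particular order
foreach my $char (sort keys %frequency) {
    print "Frequency of $char: $frequency{$char}\n";
}
```

For this input the script prints frequencies A: 3, C: 3, G: 2 and T: 2. Unlike the if/elsif chain, the hash version needs no changes when the sequence contains other characters such as N: they simply get their own key.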
Subroutines
• write your own functions
• run "programs" within a program
Subroutines
Definition:
sub name_of_routine {
    # optional arguments in @_, e.g.
    my ($arg1, $arg2) = @_;
    # specify statements
    statement1;
    statement2;
    …
    # optionally return scalar or list, e.g.
    return $result1, $result2;
}
@_: special array holding the arguments passed to the subroutine
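A minimal sketch of a subroutine that takes two arguments from @_ and returns a list. The name split_codons and its frame numbering (1 to 3) are hypothetical, chosen to fit the translation theme of this class.

```perl
use strict;
use warnings;

# Hypothetical example: split a sequence into codons, starting in frame 1, 2 or 3
sub split_codons {
    my ($sequence, $frame) = @_;            # arguments arrive in @_
    my @codons = ();
    # step through the sequence 3 bases at a time; leftover bases are ignored
    for (my $i = $frame - 1; $i + 3 <= length($sequence); $i += 3) {
        push @codons, substr($sequence, $i, 3);
    }
    return @codons;                         # return a list of values
}

my @codons = split_codons('ATGGCCTAA', 1);
print "@codons\n";    # prints: ATG GCC TAA
```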
Subroutines
Usage:
name_of_routine;
or
$rv = &name_of_routine();
or
@results = &name_of_routine($arg1, $arg2);
&: (optional) symbol indicating a subroutine call
$rv = / @results = : (optionally) capture the return value(s)
($arg1, $arg2): (optionally) submit a list of arguments
Note: the bare form name_of_routine; only works if the subroutine has been defined earlier in the script.
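The calling styles above can be tried out with two hypothetical subroutines. The sketch also shows one consequence of capturing return values: when a subroutine returns a list but the call is in scalar context ($rv = ...), only the last value of the list arrives. A related detail from the Perl documentation (perlsub): the form &name_of_routine; without parentheses passes the caller's current @_ on to the subroutine.

```perl
use strict;
use warnings;

# Hypothetical subroutines, used only to show the calling styles
sub greet {
    print "Hello from greet\n";
}

sub sum_and_product {
    my ($x, $y) = @_;
    return ($x + $y, $x * $y);
}

greet;                                    # bare call: allowed because greet is defined above
my $rv      = &sum_and_product(2, 3);     # scalar context: only the last value (6)
my @results = &sum_and_product(2, 3);     # list context: both values (5, 6)
print "scalar: $rv, list: @results\n";    # prints: scalar: 6, list: 5 6
```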
Subroutines
Example:
my $dna = shift;
my $rev_comp = &reverse_complement($dna);
print "reverse complement:\n" . &format($rev_comp, 60);

# subroutines:
sub reverse_complement {
    my $out = reverse shift @_;
    $out =~ tr/acgtACGT/tgcaTGCA/;
    return $out;
}
sub format {
    my ($sequence, $width) = @_;
    …
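• my $dna = shift; outside a subroutine, shift takes the first command-line argument from @ARGV
• The value of $dna is passed on to the subroutine; reverse_complement works on its own copy, so $dna itself is not changed
• The main part of the script stays tidy, with the details hidden towards the end of the script
• The code is re-usable and can be applied multiple times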
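A runnable version of the example. The body of the slide's format subroutine is not shown, so format_seq below is a hypothetical stand-in that cuts a sequence into lines of at most $width characters; it gets a different name because format is also a Perl keyword (used to declare report formats). The fallback sequence used when no command-line argument is given is made up.

```perl
use strict;
use warnings;

# Reverse complement, as on the slide
sub reverse_complement {
    my $out = reverse shift @_;        # scalar context: reverses the string
    $out =~ tr/acgtACGT/tgcaTGCA/;     # replace each base by its complement
    return $out;
}

# Hypothetical stand-in for the slide's format subroutine:
# cut the sequence into lines of at most $width characters
sub format_seq {
    my ($sequence, $width) = @_;
    die "width must be positive\n" unless $width > 0;
    my $formatted = '';
    while (length $sequence) {
        $formatted .= substr($sequence, 0, $width, '') . "\n";
    }
    return $formatted;
}

# First command-line argument if given, otherwise a made-up example
my $dna = @ARGV ? shift @ARGV : 'AAACCCGGGT';
my $rev_comp = &reverse_complement($dna);
print "reverse complement:\n" . &format_seq($rev_comp, 4);
```

Without arguments this prints the reverse complement ACCCGGGTTT as the lines ACCC, GGGT and TT.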
Subroutines
• Can be placed anywhere in the program
• Normally, all subroutines are located after the main block of code
• Definition starts with 'sub' followed by the name
• Statements are enclosed in curly brackets
• Statements inside are normally indented
• Optionally take arguments
• Optionally return values
• Calls can be nested: a subroutine can call other subroutines, or even itself
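A small sketch of two points from the list: the subroutines sit after the main block (calling them with & and parentheses works even before their definition), and one subroutine calls another. Both subroutines, gc_percent and count_chars, are hypothetical examples.

```perl
use strict;
use warnings;

# Main block first ...
my $dna = 'ATGCGCAT';    # made-up example sequence
printf "GC content: %.1f%%\n", &gc_percent($dna);    # prints: GC content: 50.0%

# ... subroutines after the main block.
# Hypothetical example of nesting: gc_percent calls count_chars
sub gc_percent {
    my ($sequence) = @_;
    return 0 unless length($sequence);                 # avoid dividing by zero
    return 100 * &count_chars($sequence, 'GCgc') / length($sequence);
}

# Count how many characters of $sequence occur in the string $chars
sub count_chars {
    my ($sequence, $chars) = @_;
    my $count = 0;
    foreach my $char (split //, $sequence) {
        $count++ if index($chars, $char) >= 0;
    }
    return $count;
}
```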
Subroutines
Scenario:
Read in a DNA sequence.
Translate it in all six reading frames (three on the sequence itself, three on its reverse complement).
This means 6 x translation of a sequence.
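Before translating, the six frame sequences have to be derived. A sketch under one common convention, which is an assumption here: frames 1 to 3 start at the 1st, 2nd and 3rd base of the sequence, and frames -1 to -3 do the same on the reverse complement. The example sequence is made up.

```perl
use strict;
use warnings;

sub reverse_complement {
    my $out = reverse shift @_;
    $out =~ tr/acgtACGT/tgcaTGCA/;
    return $out;
}

# Assumed frame convention: offsets 0, 1, 2 on each strand
my $orig_seq = 'ATGGCCTAA';
my $rev_seq  = reverse_complement($orig_seq);

my %frame_seq;
foreach my $offset (0 .. 2) {
    $frame_seq{ $offset + 1 }    = substr $orig_seq, $offset;
    $frame_seq{ -($offset + 1) } = substr $rev_seq,  $offset;
}

foreach my $frame (1, 2, 3, -1, -2, -3) {
    print "frame $frame: $frame_seq{$frame}\n";
}
```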
Subroutines
Inefficient coding:
# frame 1:
$sequence = $orig_seq;
# Block of translation code, e.g.
$prot = '';
while (length($sequence) >= 3) {    # stop when fewer than 3 bases are left
    $codon = substr $sequence, 0, 3, '';
    $aa = $genetic_code{$codon};
    $prot .= $aa;
}
print "translation: $prot\n";
# frame 2:
$sequence = $orig_seq;    # the loop above has emptied $sequence
# remove first base
substr $sequence, 0, 1, '';
Subroutines
Inefficient coding:
# frame 1:
$sequence = $orig_seq;
Block of translation code
# frame 2:
$sequence = $orig_seq;
# remove first base
substr $sequence, 0, 1, '';
Block of translation code
# frame 3:
…
# frame -1:
…
The same block of code is specified 6 times.
Subroutines
More efficient coding:
# frame 1:
$sequence = $orig_seq;
&translate($sequence);
# frame 2:
# remove first base
substr $sequence, 0, 1, '';
&translate($sequence);
# frame 3:
…
# frame -1:
…
sub translate {
    my $input = shift;
    …
    print "translation: $prot\n";
}
The subroutine is used 6 times, but the translation code is specified only once.
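A runnable sketch of this version, where translate prints its own result, applied to frames 1 to 3 by removing one base at a time. Several details are assumptions, not taken from the slides: %genetic_code is built from the standard genetic code (NCBI translation table 1) written as a compact 64-letter string, unknown codons become 'X', and the example sequence is made up.

```perl
use strict;
use warnings;

# Assumption: the standard genetic code (NCBI table 1), built from a compact
# string; codons are enumerated with bases in the order T, C, A, G, '*' = stop
my %genetic_code;
my @aas = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
foreach my $b1 (qw(T C A G)) {
    foreach my $b2 (qw(T C A G)) {
        foreach my $b3 (qw(T C A G)) {
            $genetic_code{"$b1$b2$b3"} = shift @aas;
        }
    }
}

# One specification of the translation code; this version prints its result
sub translate {
    my $input = uc shift;                       # upper-case copy of the argument
    my $prot  = '';
    while (length($input) >= 3) {               # leftover 1-2 bases are ignored
        my $codon = substr $input, 0, 3, '';    # remove the first codon
        $prot .= $genetic_code{$codon} // 'X';  # assumption: 'X' for unknown codons
    }
    print "translation: $prot\n";
}

# Made-up example sequence
my $orig_seq = 'ATGGCCTAA';
my $sequence = $orig_seq;
&translate($sequence);          # frame 1
substr $sequence, 0, 1, '';     # remove first base
&translate($sequence);          # frame 2
substr $sequence, 0, 1, '';     # remove another base
&translate($sequence);          # frame 3
```

Because translate works on its own copy, $sequence keeps all its bases between calls; only the substr lines shorten it. The three calls print MA*, WP and GL.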
Subroutines
Alternative:
# frame 1:
$sequence = $orig_seq;
print &translate($sequence), "\n";    # print the return value
# frame 2:
# remove first base
substr $sequence, 0, 1, '';
print &translate($sequence), "\n";
# frame 3:
…
# frame -1:
…
sub translate {
    my $input = shift;
    …
    return $protein;                   # return the translated sequence
}
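A runnable sketch of the returning version, now covering all six frames with two loops. As before, the compact genetic-code string (NCBI table 1), the 'X' for unknown codons, the made-up example sequence and the frame convention (offsets 0 to 2 on the sequence and on its reverse complement) are assumptions of this sketch.

```perl
use strict;
use warnings;

# Assumption: standard genetic code (NCBI table 1), bases ordered T, C, A, G
my %genetic_code;
my @aas = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
foreach my $b1 (qw(T C A G)) {
    foreach my $b2 (qw(T C A G)) {
        foreach my $b3 (qw(T C A G)) {
            $genetic_code{"$b1$b2$b3"} = shift @aas;
        }
    }
}

sub reverse_complement {
    my $out = reverse shift @_;
    $out =~ tr/acgtACGT/tgcaTGCA/;
    return $out;
}

# Translation code specified once; it returns the protein instead of printing it
sub translate {
    my $input   = uc shift;
    my $protein = '';
    while (length($input) >= 3) {
        my $codon = substr $input, 0, 3, '';
        $protein .= $genetic_code{$codon} // 'X';   # assumption: 'X' for unknown codons
    }
    return $protein;
}

my $orig_seq = 'ATGGCCTAA';    # made-up example sequence
my $rev_seq  = &reverse_complement($orig_seq);
foreach my $offset (0 .. 2) {
    print "frame ", $offset + 1, ": ", &translate(substr $orig_seq, $offset), "\n";
}
foreach my $offset (0 .. 2) {
    print "frame -", $offset + 1, ": ", &translate(substr $rev_seq, $offset), "\n";
}
```

Returning the protein leaves the decision about what to do with it (print, store, compare) to the caller, which makes this version easier to re-use than the printing one. For the example, frames 1 to 3 give MA*, WP and GL, and frames -1 to -3 give LGH, *A and RP.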
Subroutines
Other uses – recursion:
# calculate factorial value for a given number:
my $fv = &fact(10);
print "factorial 10 is $fv\n";
sub fact {
    my $val = shift;
    my $fact = 1;
    if ($val > 1) {
        $fact = $val * &fact($val - 1);    # call subroutine within itself
    }
    return $fact;
}
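Written as a formula, fact follows the usual recursive definition of the factorial for non-negative whole numbers; the base case mirrors the subroutine, which returns 1 for any value of 1 or less (so 0 also gives 1, matching 0! = 1):

```latex
n! =
\begin{cases}
  1 & \text{if } n \le 1,\\
  n \cdot (n-1)! & \text{if } n > 1,
\end{cases}
\qquad \text{e.g. } 10! = 10 \cdot 9! = \dots = 3\,628\,800
```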
Subroutines
Other uses – recursion, step by step:
$val = 10;
$fact = $val * &fact($val-1);
$fact = 10 * &fact(9);
$fact = 10 * 9 * &fact(8);
…
$fact = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * &fact(1);
$fact = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1;    # = 3628800
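To watch the calls nest and then unwind, a hypothetical variant fact_traced takes an extra $depth argument, used only to indent its messages. It is a sketch for illustration, not part of the original script; it uses 4 instead of 10 to keep the trace short.

```perl
use strict;
use warnings;

# Hypothetical traced variant of fact: $depth only controls indentation
sub fact_traced {
    my ($val, $depth) = @_;
    my $indent = '  ' x $depth;
    print "${indent}fact($val) called\n";
    my $fact = 1;
    if ($val > 1) {
        $fact = $val * fact_traced($val - 1, $depth + 1);   # recursive call
    }
    print "${indent}fact($val) returns $fact\n";
    return $fact;
}

my $fv = fact_traced(4, 0);
print "factorial 4 is $fv\n";
```

The "called" lines appear from fact(4) down to fact(1); the multiplications only happen on the way back, so the "returns" lines come out in the reverse order: 1, 2, 6, 24.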
Subroutines
• reduce programming effort
• improve flow
• increase clarity
• enable recursion
Exercises
Extend your sequence analysis tool:
- add translation into protein as a subroutine in your script
E-mail me at [email protected] with questions or problems.
Next week: mock exam!