What is Perl?

• Practical Extraction and Report Language
• Interpreted language
– Optimized for string manipulation and file I/O
– Full support for regular expressions

Running Perl Scripts

• Windows
– Download ActivePerl from ActiveState
– Just run the script from a 'Command Prompt' window
• UNIX / Cygwin
– Put the following in the first line of your script
#!/usr/bin/perl
– Run the script
% perl script_name

Basic Syntax

• Statements end with a semicolon ';'
• Comments start with '#'
– Only single-line comments
• Variables
– You don't have to declare a variable before you access it
– You don't have to declare a variable's type

Scalars and Identifiers

• Identifiers
– A variable name
– Case sensitive
• Scalar
– A single value (string or numerical)
– Accessed by prefixing an identifier with '$'
– Assignment with '='
$scalar = expression;

Strings

• Quoting strings
– With ' (apostrophe)
• Everything is interpreted literally
– With " (double quotes)
• Variables get expanded
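The difference between the two quoting styles is easiest to see side by side. This is a minimal sketch; the variable names and the text are made up for illustration.

```perl
use strict;
use warnings;

my $name   = 'Perl';              # hypothetical value for illustration
my $single = 'Hello, $name\n';    # taken literally: no expansion, no newline
my $double = "Hello, $name\n";    # $name is expanded, \n becomes a newline

print $single, "\n";              # prints: Hello, $name\n
print $double;                    # prints: Hello, Perl
```

Single quotes are the safer default when the text contains '$' or '@' characters that should not be expanded.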

Comparison Operators

String   Operation                   Arithmetic
lt       less than                   <
gt       greater than                >
eq       equal to                    ==
le       less than or equal to       <=
ge       greater than or equal to    >=
ne       not equal to                !=
cmp      compare, return -1, 0, 1    <=>
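Choosing the wrong family of operators is a common source of bugs, because the same two values can compare differently as text and as numbers. A short sketch with illustrative values:

```perl
use strict;
use warnings;

my ($x, $y) = (10, 9);                      # illustrative values

my $numeric_less = ($x <  $y) ? 1 : 0;      # 0: 10 is not less than 9
my $string_less  = ($x lt $y) ? 1 : 0;      # 1: "10" sorts before "9" as text
my $numeric_cmp  = $x <=> $y;               # 1
my $string_cmp   = $x cmp $y;               # -1

print "numeric <: $numeric_less, string lt: $string_less\n";
print "numeric <=>: $numeric_cmp, string cmp: $string_cmp\n";
```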

Logical Operators

Operator   Operation
||, or     logical or
&&, and    logical and
!, not     logical not
xor        logical xor

String Operators

Operator   Operation
.          string concatenation
x          string repetition
.=         concatenation and assignment

$string1 = "potato";
$string2 = "head";
$newstring = $string1 . $string2;   # "potatohead"
$newerstring = $string1 x 2;        # "potatopotato"
$string1 .= $string2;               # "potatohead"

Perl Functions

• Perl functions are identified by their unique names (print, chop, close, etc.)
• Function arguments are supplied as a comma-separated list in parentheses
– The commas are necessary
– The parentheses are often not
– Be careful! You can write some nasty and unreadable code this way!

Check 02_unreadable.pl

Lists

• Ordered collection of scalars
– Zero indexed (first item in position '0')
– Elements addressed by their positions
• List operators
– (): list constructor
– , : element separator
– []: take slices (single or multiple element chunks)

List Operations

• sort(LIST)
a new list, the sorted version of LIST
• reverse(LIST)
a new list, the reverse of LIST
• join(EXPR, LIST)
a string version of LIST, delimited by EXPR
• split(PATTERN, EXPR)
a list of the portions of EXPR that lie between matches of PATTERN (the matches act as delimiters and are dropped)

Check 03_listOps.pl
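The four list operations chain together naturally. The sketch below uses made-up data, and also shows that sort compares as strings unless it is given a numeric comparison block.

```perl
use strict;
use warnings;

my @words    = split(/,/, 'pear,apple,fig');    # ('pear', 'apple', 'fig')
my @sorted   = sort(@words);                    # apple fig pear
my @reversed = reverse(@sorted);                # pear fig apple
my $joined   = join('-', @reversed);            # "pear-fig-apple"

# Default sort is by string; a block with <=> sorts numerically.
my @by_string = sort(10, 9, 100);               # 10 100 9
my @by_number = sort { $a <=> $b } (10, 9, 100);  # 9 10 100

print "$joined\n@by_string\n@by_number\n";
```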

Arrays

• A named list
– Dynamically allocated, can be saved
– Zero-indexed
– Shares list operations, and adds to them
• Array operators
– @: reference to the array (or a portion of it, with [])
– $: reference to an element (used with [])

Array Operations

• push(@ARRAY, LIST)
add the LIST to the end of @ARRAY
• pop(@ARRAY)
remove and return the last element of @ARRAY
• unshift(@ARRAY, LIST)
add the LIST to the front of @ARRAY
• shift(@ARRAY)
remove and return the first element of @ARRAY
• scalar(@ARRAY)
return the number of elements in @ARRAY

Check 04_arrayOps.pl
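A compact walk through the five array operations, using a throwaway array invented for this sketch:

```perl
use strict;
use warnings;

my @queue = ('b', 'c');          # illustrative data
push(@queue, 'd', 'e');          # b c d e
unshift(@queue, 'a');            # a b c d e
my $last  = pop(@queue);         # 'e'; @queue is now a b c d
my $first = shift(@queue);       # 'a'; @queue is now b c d
my $count = scalar(@queue);      # 3
my @slice = @queue[0, 1];        # ('b', 'c')

print "first=$first last=$last count=$count rest=@queue\n";
```

push and pop work on the end of the array, unshift and shift on the front.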

Associative Arrays - Hashes

• Arrays indexed on arbitrary string values
– Key-Value pairs
– Use the "Key" to find the element that has the "Value"
• Hash operators
– % : refers to the hash
– {}: denotes the key
– $ : the value of the element indexed by the key (used with {})

Hash Operations

• keys(%ARRAY)
return a list of all the keys in %ARRAY
• values(%ARRAY)
return a list of all the values in %ARRAY
• each(%ARRAY)
iterate through the key-value pairs of %ARRAY
• delete($ARRAY{KEY})
remove the key-value pair associated with {KEY} from %ARRAY
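A small sketch of the hash operations, with an illustrative three-entry hash. Hash order is not fixed, so the keys are sorted before printing.

```perl
use strict;
use warnings;

my %codon = (ATG => 'Met', TGG => 'Trp', TAA => 'Stop');   # illustrative data

my @all_keys = sort keys %codon;                 # ATG TAA TGG
my $pairs    = scalar(keys %codon);              # 3
delete($codon{TAA});                             # remove one key-value pair
my $still_there = exists $codon{TAA} ? 1 : 0;    # 0

for my $key (sort keys %codon) {
    print "$key => $codon{$key}\n";
}
```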

Arrays Example

#!/usr/bin/perl
# Simple list operations

# Address an element in the list
@stringInstruments = ("violin","viola","cello","bass");
@brass = ("trumpet","horn","trombone","euphonium","tuba");
$biggestInstrument = $stringInstruments[3];
print("The biggest instrument: ", $biggestInstrument, "\n");

# Join elements at positions 0, 1, 2 and 4 into a white-space delimited string
print("orchestral brass: ", join(" ",@brass[0,1,2,4]), "\n");

# Sort the list
@unsorted_num = ('3','5','2','1','4');
@sorted_num = sort( @unsorted_num );
print("Numbers (Sorted, 1-5): ", @sorted_num, "\n");

# Add a few more numbers
@numbers_10 = @sorted_num;
push(@numbers_10, ('6','7','8','9','10'));
print("Numbers (1-10): ", @numbers_10, "\n");
# Remove the last (pop returns the removed element)
print("Removed the last: ", pop(@numbers_10), "\n");
# Remove the first (shift returns the removed element)
print("Removed the first: ", shift(@numbers_10), "\n");
# Count the elements that are left
print("Count elements (2-9): ", scalar( @numbers_10 ), "\n");
print("What's left (numbers 2-9): ", @numbers_10, "\n");

Hashes Example

#!/usr/bin/perl
# Simple hash operations

$player{"clarinet"} = "Susan Bartlett";
$player{"bassoon"} = "Andrew Vandesteeg";
$player{"flute"} = "Heidi Lawson";
$player{"oboe"} = "Jeanine Hassel";
@woodwinds = keys(%player);
@woodwindPlayers = values(%player);

# Who plays the oboe?
print("Oboe: ", $player{'oboe'}, "\n");

$playerCount = scalar(@woodwindPlayers);

while (($instrument, $name) = each(%player)) {
    print( "$name plays the $instrument\n" );
}

Pattern Matching

• A pattern is a sequence of characters to be searched for in a character string
– /pattern/
• Match operators
– =~: tests whether a pattern is matched
– !~: tests whether a pattern is not matched

Patterns

Pattern        Matches
/def/          "define"
/\bdef\b/      a def word
/^def/         def at start of line
/^def$/        a line containing only def
/de?f/         df, def
/d[eE]f/       def, dEf
/d[^eE]f/      daf, dzf
/d.f/          dif
/d.+f/         dabcf
/d.*f/         df, daffff
/de{1,3}f/     deef, deeef
/de{3}f/       deeef
/de{3,}f/      deeeeef
/de{0,3}f/     df up to deeef

Character Ranges

Escape Sequence   Pattern           Description
\d                [0-9]             Any digit
\D                [^0-9]            Anything but a digit
\w                [_0-9A-Za-z]      Any word character
\W                [^_0-9A-Za-z]     Anything but a word char
\s                [ \r\t\n\f]       White-space
\S                [^ \r\t\n\f]      Anything but white-space

Backreferences

• Memorize the matched portion of input
– Use of parentheses
– /[a-z]+(.)[a-z]+\1[a-z]+/
– Matches asd-eeed-sdsa, sd-sss-ws
– NOT as_eee-dfg
• They can even be accessed immediately after the pattern is matched
– \1 in the previous pattern is what is matched by (.)
– After a successful match the same text is available in $1
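The backreference pattern can be tried against the three sample strings directly. This is a sketch; storing the pattern with qr// is a choice made here for reuse, not something the slide prescribes.

```perl
use strict;
use warnings;

my $pattern = qr/[a-z]+(.)[a-z]+\1[a-z]+/;

for my $text ('asd-eeed-sdsa', 'sd-sss-ws', 'as_eee-dfg') {
    if ($text =~ $pattern) {
        # $1 holds whatever (.) captured during the match
        print "$text: matched, separator is '$1'\n";
    } else {
        print "$text: no match\n";
    }
}
```

The third string fails because the character after the first run of letters ('_') does not reappear after the second run.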

Pattern Matching Options

Option   Description
g        Match all possible patterns
i        Ignore case
x        Ignore white-space in pattern

Substitutions

• Substitution operator
– s/pattern/substitution/options
• If $string = "abc123def";
– $string =~ s/123/456/
Result: "abc456def"
– $string =~ s/123//
Result: "abcdef"
– $string =~ s/(\d+)/[$1]/
Result: "abc[123]def"
Use of backreference!
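The options change what a substitution touches. In this sketch each result is computed from a fresh copy of an illustrative input string, so the lines do not affect one another.

```perl
use strict;
use warnings;

my $original = 'abc123def456';                    # illustrative input

(my $first_only = $original) =~ s/\d+/#/;         # abc#def456  (first match only)
(my $all        = $original) =~ s/\d+/#/g;        # abc#def#    (g: every match)
(my $wrapped    = $original) =~ s/(\d+)/[$1]/g;   # abc[123]def[456]
(my $no_case    = $original) =~ s/ABC/xyz/i;      # xyz123def456 (i: ignore case)

print "$first_only\n$all\n$wrapped\n$no_case\n";
```

The form `(my $copy = $original) =~ s/.../.../` copies first and then edits the copy, leaving $original unchanged.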

Predefined Read-only Variables

$& is the part of the string that matched the regular expression
$` is the part of the string before the part that matched
$' is the part of the string after the part that matched

EXAMPLE
$_ = "this is a sample string";
/sa.*le/;   # matches "sample" within the string
# $` is now "this is a "
# $& is now "sample"
# $' is now " string"

Because these variables are set on each successful match, you should save the values elsewhere if you need them later in the program.

The split and join Functions

The split function takes a regular expression and a string, and looks for all occurrences of the regular expression within that string. The parts of the string that don't match the regular expression are returned in sequence as a list of values.

The join function takes a list of values and glues them together with a glue string between each list element.

Split Example
$line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
@fields = split(/:/,$line);   # split $line, using : as delimiter
# now @fields is ("merlyn","","118","10","Randal",
# "/home/merlyn","/usr/bin/perl")

Join Example
$bigstring = join($glue,@list);

For example, to rebuild the password file line, try something like:
$outline = join(":", @fields);
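A round trip through split and join can be checked in a few lines. This sketch reuses the password-style line from the example.

```perl
use strict;
use warnings;

my $line    = 'merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl';
my @fields  = split(/:/, $line);       # 7 fields; the second is the empty string
my $rebuilt = join(':', @fields);      # glue the fields back together

print scalar(@fields), " fields\n";
print "round trip ok\n" if $rebuilt eq $line;
```

One caveat: split drops trailing empty fields by default, so a line that ends in ':' would not survive the round trip unless a negative limit is passed as a third argument.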

String - Pattern Examples

A simple example

#!/usr/bin/perl
print ("Ask me a question politely:\n");
$question = <STDIN>;
# what about capital P in "please"?
if ($question =~ /please/) {
    print ("Thank you for being polite!\n");
} else {
    print ("That was not very polite!\n");
}

String - Pattern Example

#!/usr/bin/perl
print ("Enter a variable name:\n");
$varname = <STDIN>;
chop ($varname);
# Try asd$asdas... It gets accepted!
if ($varname =~ /\$[A-Za-z][_0-9a-zA-Z]*/) {
    print ("$varname is a legal scalar variable\n");
} elsif ($varname =~ /@[A-Za-z][_0-9a-zA-Z]*/) {
    print ("$varname is a legal array variable\n");
} elsif ($varname =~ /[A-Za-z][_0-9a-zA-Z]*/) {
    print ("$varname is a legal file variable\n");
} else {
    print ("I don't understand what $varname is.\n");
}
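An input such as asd$asdas is accepted because an unanchored pattern only has to match somewhere inside the string. One possible fix, sketched here with a hypothetical helper named classify_name, is to anchor each pattern with ^ and $ so that it must cover the whole input.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a name using anchored patterns.
sub classify_name {
    my ($name) = @_;
    return 'scalar' if $name =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/;
    return 'array'  if $name =~ /^\@[A-Za-z][_0-9a-zA-Z]*$/;
    return 'file'   if $name =~ /^[A-Za-z][_0-9a-zA-Z]*$/;
    return 'unknown';
}

print classify_name($_), "\n" for ('$dna', '@bases', 'INFILE', 'asd$asdas');
```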

Sources

• Beginning Perl for Bioinformatics
– James Tisdall, O'Reilly Press, 2000
• Using Perl to Facilitate Biological Analysis in Bioinformatics: A Practical Guide (2nd Ed.)
– Lincoln Stein, Wiley-Interscience, 2001
• Introduction to Programming and Perl
– Alan M. Durham, Computer Science Dept., Univ. of São Paulo, Brazil

Why Write Programs?

• Automate computer work that you do by hand: save time and reduce errors
• Run the same analysis on lots of similar data files (scale-up)
• Analyze data, make decisions: sort BLAST results by e-value and/or species of best match
• Build a pipeline
• Create new analysis methods

Why Perl?

• Fairly easy to learn the basics
• Many powerful functions for working with text: search & extract, modify, combine
• Can control other programs
• Free and available for all operating systems
• Most popular language in bioinformatics
• Many pre-built "modules" are available that do useful things

Programming Concepts

• Program = a text file that contains instructions for the computer to follow
• Programming Language = a set of commands that the computer understands (via a "command interpreter")
• Input = data that is given to the program
• Output = something that is produced by the program

Programming

• Write the program (with a text editor)
• Run the program
• Look at the output
• Correct the errors (debugging)
• Repeat
(computers are VERY dumb: they do exactly what you tell them to do, so be careful what you ask for…)

Strings

• Text is handled in Perl as a string
• This basically means that you have to put quotes around any piece of text that is not an actual Perl instruction
• Perl has two kinds of quotes: single ' ' and double " "
(they are different; more about this later)

Print

• Perl uses the term "print" to create output
• Without a print statement, you won't know what your program has done
• You need to tell Perl to put a line break at the end of a printed line
– Use the "\n" (newline) escape sequence
• Include the quotes
– The "\" character is called an escape; Perl uses it a lot

Program details

• Perl programs conventionally start with the line:
#!/usr/bin/perl
– this tells the computer that this is a Perl program and where to get the Perl interpreter
• All other lines that start with # are considered comments, and are ignored by Perl
• Lines that are Perl commands end with a ;

Run your Perl program

• > chmod u+x *.pl [make the file executable]
• > perl my_perl1.pl [use the perl interpreter to run your script]

Numbers and Functions

• Perl handles numbers in most common formats:
456
5.6743
6.3E-26
• Mathematical functions work pretty much as you would expect:
4+7
6*4
43-27
256/12
2/(3-5)

Do the Math (your 2nd Perl program)

#!/usr/bin/perl
print "4+5\n";
print 4+5 , "\n";
print "4+5=" , 4+5 , "\n";

[Note: use commas to separate multiple items in a print statement; whitespace is ignored]

Variables

• To be useful at all, a program needs to be able to store information from one line to the next
• Perl stores information in variables
• A variable name starts with the "$" symbol, and it can store strings or numbers
– Variables are case sensitive
– Give them sensible names
• Use the "=" sign to assign values to variables
$one_hundred = 100;
$my_sequence = "ttattagcc";

You can do Math with Variables

#!/usr/bin/perl
# put some values in variables
$sequences_analyzed = 200;
$new_sequences = 21;
# now we will do the work
$percent_new_sequences = ($new_sequences / $sequences_analyzed) * 100;
print "% of new sequences = " , $percent_new_sequences;

% of new sequences = 10.5

String Operations

• Strings (text) in variables can be used for some math-like operations
• Concatenate (join): use the dot . operator
$seq1 = "ACTG";
$seq2 = "GGCTA";
$seq3 = $seq1 . $seq2;
print $seq3;

ACTGGGCTA

• String comparison (are they the same, > or <)
• eq (equal)
• ne (not equal)
• ge (greater or equal)
• gt (greater than)
• lt (less than)
• le (less or equal)

Uses some non-intuitive ways of comparing letters (ASCII values)
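What "non-intuitive" means in practice: string comparison goes character by character on character codes, so every uppercase letter sorts before every lowercase one. A brief sketch with made-up values:

```perl
use strict;
use warnings;

my $upper_first = ('Z' lt 'a') ? 1 : 0;         # 1: 'Z' (code 90) precedes 'a' (code 97)
my $prefix      = ('ACT' lt 'ACTG') ? 1 : 0;    # 1: a prefix sorts before the longer string
my @sorted      = sort('b', 'B', 'a', 'A');     # A B a b

print "upper_first=$upper_first prefix=$prefix sorted=@sorted\n";
```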

Concatenating DNA fragments

#!/usr/bin/perl
# Concatenate DNA
# Store two fragments in the variables $DNA1 and $DNA2
$DNA1 = 'ACGGAA';
$DNA2 = 'CCGGAAGAA';
$DNA3 = "$DNA1$DNA2";
# Double quotes replace each variable inside them with its value.
# This is called string interpolation.
# Another way to concatenate
$DNA4 = $DNA1 . $DNA2;

Transcribing DNA into RNA

#!/usr/bin/perl
# Convert DNA to RNA
$DNA1 = 'ACGGAA';
$RNA = $DNA1;
$RNA =~ s/T/U/g;
# End of program
exit;

=~ is the binding operator and s is the substitution operator: replace T with U, with g meaning all occurrences.

How would you convert uppercase to lowercase?
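Changing case works along the same lines as the T-to-U substitution. The sketch below shows two possible approaches, the built-in lc function and the tr operator; the sequence is made up for illustration.

```perl
use strict;
use warnings;

my $DNA = 'ACGGAATT';                      # illustrative sequence

my $lower_lc = lc($DNA);                   # built-in function: acggaatt
(my $lower_tr = $DNA) =~ tr/A-Z/a-z/;      # transliteration:   acggaatt
(my $RNA = $DNA) =~ s/T/U/g;               # transcription:     ACGGAAUU

print "$lower_lc $lower_tr $RNA\n";
```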

Computing the reverse complement (wrong)

#!/usr/bin/perl
# Reverse complement: A->T, T->A, C->G, G->C
$DNA1 = 'ACGGAA';
$DNA2 = $DNA1;
$DNA2 =~ s/A/T/g;
$DNA2 =~ s/T/A/g;
$DNA2 =~ s/G/C/g;
$DNA2 =~ s/C/G/g;
# End of program
exit;

=~ is the binding operator and s is the substitution operator; g means all occurrences. This version is wrong because each substitution also rewrites the letters produced by the one before it.

Computing the reverse complement

#!/usr/bin/perl
# Reverse complement: A->T, T->A, C->G, G->C
$DNA1 = 'ACGGAA';
$DNA2 = reverse $DNA1;
$DNA2 =~ tr/ATGC/TACG/;
# End of program
exit;

=~ is the binding operator and tr is the transliteration operator, which replaces all the listed characters in a single pass.
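Tracing both approaches on the same sequence shows where the step-by-step substitutions go wrong. This is a sketch; the variable names are chosen for this illustration.

```perl
use strict;
use warnings;

my $DNA = 'ACGGAA';

# Step-by-step substitution: later steps undo earlier ones.
my $wrong = $DNA;
$wrong =~ s/A/T/g;    # TCGGTT
$wrong =~ s/T/A/g;    # ACGGAA  (the new Ts are turned back into As)
$wrong =~ s/G/C/g;    # ACCCAA
$wrong =~ s/C/G/g;    # AGGGAA  (every C, old and new, becomes G)

# tr replaces every character in a single pass.
(my $complement = $DNA) =~ tr/ACGT/TGCA/;   # TGCCTT
my $revcom = reverse $complement;           # TTCCGT

print "wrong=$wrong complement=$complement revcom=$revcom\n";
```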

Reading a file's contents

#!/usr/bin/perl
$filename = 'a.dat';
open(MYFILE, $filename);
$DNA1 = <MYFILE>;   # first line
$DNA2 = <MYFILE>;   # second line
$DNA3 = <MYFILE>;   # third line
# End of program
exit;

Reading a file's contents (into an array)

#!/usr/bin/perl
$filename = 'a.dat';
open(MYFILE, $filename);
@DNA = <MYFILE>;   # all lines, one per array element
print @DNA;
close MYFILE;
# End of program
exit;

@ marks an array variable. An array is a variable that holds many scalar values.

Array example

#!/usr/bin/perl
@bases = ('A', 'G', 'C', 'T');
print @bases;      # AGCT
print "@bases";    # A G C T
print $bases[0];
print $bases[1];
# End of program
exit;

@ marks an array variable. An array is a variable that holds many scalar values.

Array example

#!/usr/bin/perl
@bases = ('A', 'G', 'C', 'T');
$base1 = pop @bases;
print $base1;   # prints T
print @bases;   # prints AGC

pop is the function that removes the last element of an array.

Array example

#!/usr/bin/perl
@bases = ('A', 'G', 'C', 'T');
$base1 = shift @bases;
print $base1;   # prints A
print @bases;   # prints GCT

shift is the function that removes the first element of an array.

Array example

#!/usr/bin/perl
@bases = ('A', 'G', 'C', 'T');
$base1 = pop @bases;
unshift (@bases, $base1);
print $base1;   # ?
print @bases;   # ?

unshift is the function that adds an element to the beginning of an array.

Array example

#!/usr/bin/perl
@bases = ('A', 'G', 'C', 'T');
$base1 = pop @bases;
push (@bases, $base1);
print $base1;   # ?
print @bases;   # ?

push is the function that adds an element to the end of an array.

Array example

#!/usr/bin/perl
@bases = ('A', 'G', 'C', 'T');
@rev1 = reverse @bases;

Stores the array in reverse order.

Array example

#!/usr/bin/perl
@bases = ('A', 'G', 'C', 'T');
print scalar @bases;          # prints 4
splice (@bases, 2, 0, 'X');   # insert X at index 2
print @bases;                 # prints AGXCT

scalar returns the number of elements; splice inserts or removes elements at a given position.

Control flow

• Conditionals
– if, if-else, unless
– if (true) { do something; } where true is 1
– if (false) { do not do something; } where false is 0
– if (true) { } else { }
– unless (1 == 0) { print "1 != 0"; }

Loops

• Loop statements
– A loop repeatedly executes the statements enclosed in {}. The loop statements include while, for, and foreach.
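The conditionals and the three loop statements can be combined in one short program. This is a sketch with illustrative data, not part of the original course files.

```perl
use strict;
use warnings;

my @bases = ('A', 'G', 'C', 'T');    # illustrative data
my $gc = 0;

# foreach visits every element of a list
foreach my $base (@bases) {
    if ($base eq 'G' or $base eq 'C') {
        $gc++;
    }
}

# while repeats as long as the condition is true
my $i = 0;
while ($i < 3) { $i++; }

# for has an initializer, a condition and a step
my $sum = 0;
for (my $n = 1; $n <= 100; $n++) { $sum += $n; }

# unless runs its block when the condition is false
unless ($gc == 0) {
    print "GC count: $gc, i: $i, sum: $sum\n";
}
```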

Loops

#!/usr/bin/perl
$proteinfilename = 'file1.pep';
# Open the file; if opening fails, print an error message and end the program
unless (open(PROTEINFILE, $proteinfilename)) {
    print "Could not open file";
    exit;
}
# Read the protein sequence data from the file in a while loop, one line at a time
while ($protein = <PROTEINFILE>) {
    print $protein;
}
# Close the file
close PROTEINFILE;
exit;

Input from the keyboard

#!/usr/bin/perl
$proteinfilename = <STDIN>;
# Remove the trailing newline from the typed filename
chomp $proteinfilename;
# Open the file; if opening fails, print an error message and end the program
unless (open(PROTEINFILE, $proteinfilename)) {
    print "Could not open file";
    exit;
}
# Read the protein sequence data from the file in a while loop, one line at a time
while ($protein = <PROTEINFILE>) {
    print $protein;
}
# Close the file
close PROTEINFILE;
exit;

Loops

#!/usr/bin/perl
for ( ; ; ) { }
while (condition) { }

Converting to one long string

#!/usr/bin/perl
$proteinfilename = 'file1.pep';
# Remove the newline from the protein filename
chomp $proteinfilename;
# Open the file; if opening fails, print an error message and end the program
unless (open(PROTEINFILE, $proteinfilename)) {
    print "Could not open file";
    exit;
}
# Read the protein sequence data from the file and store it in the array variable @protein
@protein = <PROTEINFILE>;
# Close the file
close PROTEINFILE;
# Turn the array into a scalar
$protein = join("", @protein);
# Remove whitespace
$protein =~ s/\s//g;
exit;

Splitting a string into an array

#!/usr/bin/perl
# Assume a string is stored in $DNA
@DNA = split("", $DNA);
# Initialize the counts
$count_A = 0;
$count_G = 0;
$count_C = 0;
$count_T = 0;

foreach $base (@DNA) {
    if ($base eq 'A') { ++$count_A; }
    elsif ($base eq 'G') { ++$count_G; }
    ….
}

Writing to a file

# Write the results to a file named "out1"
$outputfile = "out1";
unless (open(COUNTBASE, ">$outputfile")) {
    print "cannot open file";
    exit;
}
print COUNTBASE "A=$a G=$g… ";
close(COUNTBASE);
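Counting bases with a chain of if/elsif tests works, but a hash keyed on the base letter keeps the loop short. This is an alternative sketch, not the method shown on the slides; the sequence and the names %count and $report are made up for illustration.

```perl
use strict;
use warnings;

my $DNA   = 'AGCTTAGGCA';                          # illustrative sequence
my %count = (A => 0, C => 0, G => 0, T => 0);      # one counter per base

# split with an empty pattern yields one character per element
foreach my $base (split(//, $DNA)) {
    $count{$base}++ if exists $count{$base};       # ignore anything that is not A, C, G or T
}

my $report = join(' ', map { "$_=$count{$_}" } sort keys %count);
print "$report\n";
```

The same $report string could be printed to a filehandle opened for writing instead of to the screen.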

Exercises

1. Have the user enter two short DNA strings. Using the dot (.) operator and the assignment operator (=), append the second string to the first and print the combined DNA string.
2. Print all the numbers from 1 to 100 (use a loop).
3. Write a program that computes the reverse complement of one strand of DNA. The input is a single-strand DNA string typed at the keyboard.
4. Write a program that checks whether two strings given as arguments are reverse complements of each other. Use the Perl functions split, pop, shift, and eq.
5. Compute the percentage of G and C in a given DNA sequence.
6. Write a program that reads two files and appends the contents of the second file after the contents of the first.
7. Write a program that reads a file and prints it in reverse, from the last line to the first.

subroutine

# Append ACGT to DNA
# The original DNA
$dna = 'CGACTTAA';
$longer_dna = addACGT($dna);
print "I added ACGT to $dna and got $longer_dna";
exit;

# Definition of the subroutine
sub addACGT {
    my($dna) = @_;   # receives a single argument as a copy; the caller's variable is not modified
    $dna .= 'ACGT';
    return $dna;
}

subroutine

# When passing two or more arguments
my($dna, $protein, $name_of_genes) = @_;

Subroutine (call by reference)

#!/usr/bin/perl
# Use this when the subroutine should change the variables passed in from the main program
# Prefix the argument with \ to indicate that it is passed by reference
my @i = ('1', '2', '3');
reference_sub(\@i);
print "@i";
exit;

sub reference_sub {
    # First receive the reference as a scalar variable
    my($i) = @_;
    # When actually using it, prefix it with the symbol that shows what kind of variable it refers to
    push(@$i, '4');   # append 4
}
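The difference between passing a copy and passing a reference shows up when the caller's array is inspected afterwards. A minimal sketch; the subroutine names add_by_value and add_by_reference are hypothetical.

```perl
use strict;
use warnings;

sub add_by_value {              # hypothetical name
    my (@copy) = @_;            # the elements are copied into a new array
    push(@copy, '4');           # only the copy grows
    return scalar(@copy);
}

sub add_by_reference {          # hypothetical name
    my ($array_ref) = @_;       # a reference arrives as a single scalar
    push(@$array_ref, '4');     # dereference with @ to reach the caller's array
    return;
}

my @i = ('1', '2', '3');
add_by_value(@i);
my $after_value = "@i";         # 1 2 3
add_by_reference(\@i);
my $after_reference = "@i";     # 1 2 3 4

print "$after_value\n$after_reference\n";
```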


Counting the Gs in DNA from the command line

#!/usr/bin/perl
# $0 is a special variable that holds the name of the program
my($USAGE) = "$0 DNA\n";
# @ARGV is a special variable that holds all the command-line arguments
# If @ARGV is empty, print the usage message and stop
unless (@ARGV) {
    print $USAGE;
    exit;
}
my($dna) = $ARGV[0];
# Call the subroutine
my($num_of_Gs) = countG($dna);
# Print the result and end
print "$num_of_Gs\n";
exit;

# subroutine
sub countG {
    # Returns the number of Gs in the argument $dna
    # Initialize the argument and the variable
    my($dna) = @_;
    my($count) = 0;
    # tr returns the number of the specified characters it finds in the string.
    # No replacement set is given here, so the string itself is not changed.
    $count = ($dna =~ tr/Gg//);
    return $count;
}
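The counting behaviour of tr can be checked on a short mixed-case sequence. A sketch with a made-up input:

```perl
use strict;
use warnings;

my $dna      = 'AGgtGGcA';              # illustrative sequence
my $g_count  = ($dna =~ tr/Gg//);       # 4: counts G and g
my $gc_count = ($dna =~ tr/GgCc//);     # 5: counts G, g, C and c

print "G=$g_count GC=$gc_count dna=$dna\n";   # $dna is unchanged
```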

Exercises

1. Write a subroutine that joins two DNA strings into one.
2. Write a subroutine that computes the percentages of A, T, G, and C in DNA.
3. An array holds 10 values. Write a subroutine that computes the sum of these values.
4. An array holds 10 values. Write a subroutine that adds 50 to each of the array's values.
5. Write a subroutine that makes the reverse complement of DNA (A->T, G->C, C->G, T->A). (First reverse the sequence, then translate the characters.)
6. Write a subroutine that takes a DNA sequence as input and converts it into a protein sequence.

Making the reverse complement of a sequence

# subroutine
sub revcom {
    my($dna) = @_;
    # Scalar assignment, so that reverse reverses the characters of the string
    my $revcom = reverse $dna;
    $revcom =~ tr/ACGTacgt/TGCAtgca/;
    return $revcom;
}
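One detail worth checking in a routine like this is context: reverse reverses the characters of a string only in scalar context, while in list context it reverses the order of the list elements. The sketch below compares the two on an illustrative sequence.

```perl
use strict;
use warnings;

sub revcom {
    my ($dna) = @_;
    my $revcom = reverse $dna;              # scalar context: characters are reversed
    $revcom =~ tr/ACGTacgt/TGCAtgca/;       # complement each base
    return $revcom;
}

# List context: a one-element list reversed is the same one-element list.
my ($list_context) = reverse('ACGGAA');     # ACGGAA (not reversed)
my $rc             = revcom('ACGGAA');      # TTCCGT
my $round_trip     = revcom($rc);           # ACGGAA

print "$list_context $rc $round_trip\n";
```

Applying revcom twice returns the original sequence, which makes a convenient sanity check.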

Page 72

Regular Expressions

• Regular expressions are used to define patterns you wish to search for in strings

• Use a syntax with rules and operators

– Can create extremely sophisticated patterns

• Numbers, letters, case insensitivity, repetition, anchoring, zero or one, white space, tabs, newlines, etc.

– Patterns are deterministic but can be made extremely specific or extremely general

– Test for match, replace, select

• Lots on REGEX tomorrow!

Page 73

Using REGEX

• =~ is the operator we use with REGEX

• =~ is combined with utility operators to match, replace

$DNA = "AGATGATAT";
if ($DNA =~ /ATG/) {
    print "Match!";
}

=~ is the pattern match comparison operator

The pattern is a set of characters between //

Matching leaves the string unchanged
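As a small illustration of matching, the sketch below adds two standard features not shown on the slide: the /i modifier for case-insensitive matching, and parentheses that capture part of the match into $1. The variable names are made up for this example.

```perl
use strict;
use warnings;

my $dna = "AGATGATAT";

# Case-insensitive match with /i; the string itself is left untouched
my $has_start = ($dna =~ /atg/i) ? 1 : 0;

# Parentheses capture matched text; here, the three bases after the first ATG
my $next_codon = "";
if ($dna =~ /ATG(...)/) {
    $next_codon = $1;
}

print "has ATG: $has_start, next codon: $next_codon\n";
# prints: has ATG: 1, next codon: ATA
```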

Page 74

REGEX - Substitution

• You can substitute the parts of a string that match a regular expression with another string

$DNA = "AGATGATAT";
$DNA =~ s/T/U/g;
print $DNA, "\n";

Output: AGAUGAUAU

Substitution operator: s
Pattern to search for: T
Replacement string: U
Global replacement: g

Substitution changes the variable

Page 75

REGEX - Substitution

$DNA = "AGATGATAT";
$DNA =~ s/T/U/;
print $DNA, "\n";

Output: AGAUGATAT
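Because substitution changes the variable it is bound to, a common idiom is to copy the string first and substitute on the copy. The sketch below also shows that s///g returns the number of substitutions made; the variable names are chosen for this example only.

```perl
use strict;
use warnings;

my $dna = "AGATGATAT";

# Copy first, then substitute on the copy, so $dna stays intact
(my $rna = $dna) =~ s/T/U/g;

# s///g returns the number of substitutions it made
my $copy  = $dna;
my $count = ($copy =~ s/T/U/g);

print "$dna -> $rna ($count substitutions)\n";
# prints: AGATGATAT -> AGAUGAUAU (3 substitutions)
```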

Page 76

REGEX - Translation

• You can translate a string by exchanging one set of characters for another set of characters

$DNA = "AGATGATAT";
$DNA =~ tr/ACGT/TGCA/;
print $DNA, "\n";

Output: TCTACTATA

Translation operator: tr
Set of characters to replace: ACGT
Replacement characters: TGCA

Translation changes the variable

Page 77

s, tr

• Task
– Transcribe and reverse complement a DNA sequence

• Concepts
– Simple regular expressions using s and tr
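One possible way to approach this task is sketched below. The subroutine names transcribe and reverse_complement are hypothetical and handle uppercase bases only; they are not taken from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper names; the slides do not define these.
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ s/T/U/g;   # s: replace every T with U
    return $rna;
}

sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;         # scalar context reverses the characters
    $rc =~ tr/ACGT/TGCA/;          # tr: swap each base for its complement
    return $rc;
}

print transcribe("AGATGATAT"), "\n";            # prints AGAUGAUAU
print reverse_complement("ACATAATCAT"), "\n";   # prints ATGATTATGT
```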

Page 78

Functions

• reverse(STRING)
– Function that reverses a string

• STRING =~ s/PATTERN/REPLACEMENT/modifiers
– This is the substitute operator

• STRING =~ tr/CHARS/REPLACEMENT CHARS/
– This is the translation operator

Page 79

REGEX - recap

• REGEX are used to find patterns in text

• Use a syntax that must be learned in order to be exploited

• Extremely powerful for processing and manipulating text

• Will be examined more closely tomorrow

Page 80

Functions

• Functions (sub-routines) are like small programs inside your program

• Like programs, functions execute a series of statements that process input to produce some desired output

• Functions help to organise your program – parcel it into named functional units that can be called repeatedly

• There are literally hundreds of functions built-in to Perl

• You can make your own functions

Page 81

What happens when you call a function?

$DNA = "ACATAATCAT";
$rcDNA = reverse ($DNA);
$rcDNA =~ tr/ACGT/TGCA/;

# Conceptually, the call hands $DNA to the built-in reverse, which behaves like:
sub reverse {
    # process input
    # return output
}

Page 82

Calling a function

• Input is passed to a function by way of an ordered parameter list

Basic syntax of calling a function:

$result = function_name (parameter list);

$longDNA = "ACGACTAGCATGCATCGACTACGACTACGATCAGCATCGACT";
$shortDNA = substr ($longDNA, 0, 10);

The arguments to substr, in order:
– String from which to extract the substring
– Start from this position
– Length of the substring

Page 83

Useful string functions in Perl

• chop(STRING) OR chop(ARRAY)
– Removes the last character from a string or the last character from every element in an array. The last character chopped is returned.

• index(STRING, SUBSTRING, POSITION)
– Returns the position of the first occurrence of SUBSTRING in STRING at or after POSITION. If you don't specify POSITION, the search starts at the beginning of STRING.

• join(STRING, ARRAY)
– Returns a string that consists of all of the elements of ARRAY joined together by STRING. For instance, join(">>", ("AA", "BB", "cc")) returns "AA>>BB>>cc".

• lc(STRING)
– Returns a string with every letter of STRING in lowercase. For instance, lc("ABCD") returns "abcd".

• lcfirst(STRING)
– Returns a string with the first letter of STRING in lowercase. For instance, lcfirst("ABCD") returns "aBCD".

• length(STRING)
– Returns the length of STRING.

• split(PATTERN, STRING, LIMIT)
– Breaks up a string based on some delimiter. In a list context, it returns a list of the things that were found. In a scalar context, it returns the number of things found.

• substr(STRING, OFFSET, LENGTH)
– Returns a portion of STRING as determined by the OFFSET and LENGTH parameters. If LENGTH is not specified, then everything from OFFSET to the end of STRING is returned. A negative OFFSET can be used to start from the right side of STRING.

• uc(STRING)
– Returns a string with every letter of STRING in uppercase. For instance, uc("abcd") returns "ABCD".

• ucfirst(STRING)
– Returns a string with the first letter of STRING in uppercase. For instance, ucfirst("abcd") returns "Abcd".

source: http://www.cs.cf.ac.uk/Dave/PERL/
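A few of these string functions in action, as a short sketch with an arbitrary example sequence. Positions are 0-based, and the values in the final comment follow from the definitions above.

```perl
use strict;
use warnings;

my $dna = "ACGACTAGCATG";

my $pos    = index($dna, "TAG");            # 0-based position of the first "TAG"
my $tail   = substr($dna, -3);              # negative OFFSET counts from the right
my $lower  = lc($dna);
my $joined = join(">>", ("AA", "BB", "cc"));
my @parts  = split(/,/, "a,b,c,d", 2);      # LIMIT 2: at most two fields

print "$pos $tail $lower $joined @parts\n";
# prints: 5 ATG acgactagcatg AA>>BB>>cc a b,c,d
```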

Page 84

Useful array functions in Perl

• pop(ARRAY)
– Returns the last value of an array. It also reduces the size of the array by one.

• push(ARRAY1, ARRAY2)
– Appends the contents of ARRAY2 to ARRAY1. This increases the size of ARRAY1 as needed.

• reverse(ARRAY)
– Reverses the elements of a given array when used in a list context. When used in a scalar context, the array is converted to a string, and the string is reversed.

• scalar(ARRAY)
– Evaluates the array in a scalar context and returns the number of elements in the array.

• shift(ARRAY)
– Returns the first value of an array. It also reduces the size of the array by one.

• sort(ARRAY)
– Returns a list containing the elements of ARRAY in sorted order. See Chapter 8 on References for more information.

• split(PATTERN, STRING, LIMIT)
– Breaks up a string based on some delimiter. In a list context, it returns a list of the things that were found. In a scalar context, it returns the number of things found.

source: http://www.cs.cf.ac.uk/Dave/PERL/
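The array functions listed above can be tried out with a small sketch like the following; the array contents are arbitrary example values.

```perl
use strict;
use warnings;

my @bases = ("A", "C", "G");

push(@bases, "T", "N");         # @bases is now A C G T N
my $last  = pop(@bases);        # removes and returns "N"
my $first = shift(@bases);      # removes and returns "A"
my $count = scalar(@bases);     # number of elements left: 3
my @rev   = reverse(@bases);    # list context: T G C

print "$first $last $count @rev\n";   # prints A N 3 T G C
```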

Page 85

String functions - split

• ‘splits’ a string into an array based on a delimiter

• excellent for processing tab or comma delimited files

$line = "MacDonald,Old,The farm,Some city,BC,E1E 1O1";

# split(REGEX, STRING): the REGEX goes first, the string to split goes second
($lastname, $firstname, $address, $city, $province, $postalcode) = split (/,/, $line);

print ("LAST NAME: ", $lastname, "\n",
       "FIRST NAME: ", $firstname, "\n",
       "ADDRESS: ", $address, "\n",
       "CITY: ", $city, "\n",
       "PROVINCE: ", $province, "\n",
       "POSTAL CODE: ", $postalcode, "\n");

Output:
LAST NAME: MacDonald
FIRST NAME: Old
ADDRESS: The farm
CITY: Some city
PROVINCE: BC
POSTAL CODE: E1E 1O1
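When processing delimited files, it may help to know that split drops trailing empty fields unless a negative LIMIT is given. The sketch below uses a made-up tab-delimited line to show the difference.

```perl
use strict;
use warnings;

my $line = "MacDonald\tOld\t\t";       # two trailing empty fields

my @default = split(/\t/, $line);      # trailing empty fields are dropped
my @all     = split(/\t/, $line, -1);  # a negative LIMIT keeps them

print scalar(@default), " vs ", scalar(@all), "\n";   # prints 2 vs 4
```

Keeping the empty fields matters when every row is expected to have the same number of columns.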

Page 86

Array functions - sort

• You can sort the elements in your array with ‘sort’

@myNumbers = ("one","two","three","four");
@sorted = sort(@myNumbers);
print "@sorted\n";

Output: four one three two

sort sorts alphabetically
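Because sort compares its elements as strings by default, numbers need an explicit comparison block to be ordered numerically. A minimal sketch with arbitrary example values:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort @numbers;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric comparison

print "@as_strings\n";   # prints 1 10 100 9
print "@as_numbers\n";   # prints 1 9 10 100
```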

Page 87

Making your own function

sub function_name {
    (my $param1, my $param2, ...) = @_;
    # do something with the parameters
    my $result = ...
    return $result;
}

function_name: This is the function name. Use this name to ‘call’ the function from within your program.

@_: What is this? This is an array that gets created automatically to hold the parameter list.

sub: ‘sub’ tells the interpreter you are declaring a function.

my: What is the word ‘my’ doing here? ‘my’ is a variable qualifier that makes it local to the function. Without it, the variable is available anywhere in the program. It is good practice to use ‘my’ throughout your programs – more on this tomorrow.

return: ‘return’ tells the interpreter to go back to the place in the program that called this function. When followed by scalars or variables, these values are passed back to where the function was called. This is the output of the function.
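To illustrate how ‘my’ keeps a variable local, the sketch below defines a hypothetical function count_bases (not from the slides) whose internal $count is separate from a variable of the same name declared outside it.

```perl
use strict;
use warnings;

my $count = 10;                  # file-level variable

# Hypothetical function: its own $count does not touch the outer one
sub count_bases {
    my ($dna, $base) = @_;       # unpack the parameter list from @_
    my $count = 0;               # local to this function
    foreach my $char (split(//, $dna)) {
        $count++ if $char eq $base;
    }
    return $count;
}

my $num_a = count_bases("AGATGATAT", "A");
print "A count: $num_a, outer count: $count\n";
# prints: A count: 4, outer count: 10
```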

Page 88

Making your own function - example

# Function definition
sub mean {
    # local variables to be used inside the function
    my @values = @_;
    my $numValues = scalar @values;
    my $sum = 0;
    my $mean;

    # do the work!
    foreach my $element (@values) {
        $sum = $sum + $element;
    }
    $mean = $sum / $numValues;

    # return the answer
    return $mean;
}

$avg = mean(1,2,3,4,5);
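A possible variant of the mean example is sketched below, under two assumptions that are not in the slides: the name safe_mean is made up, and the core List::Util module is used for the sum. It guards against an empty parameter list, which would otherwise cause a division by zero.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical variant of mean: returns undef for an empty list
sub safe_mean {
    my @values = @_;
    return unless @values;      # nothing to average
    return sum(@values) / scalar(@values);
}

my $avg = safe_mean(1, 2, 3, 4, 5);
print "$avg\n";   # prints 3
```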

Page 89

Functions - recap

• A function packages up a set of statements to perform a given task

• Functions take a parameter list as input and return some output

• Perl has hundreds of functions built-in that you should familiarise yourself with
– Keep a good book, or URL handy at all times

• You can (and should!) make your own functions

Page 90

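Exercises
1. Write a program that checks whether two strings given as arguments are reverse complements of each other. Use Perl built-ins such as split, pop, shift, and eq.
2. A DNA sequence is 20 characters long. Write a program that changes the 5th base of this sequence to T.
3. Write a program that converts a DNA sequence to lowercase. Do not use tr; read and convert the characters one at a time.
4. Write a program that reads a file and prints it in reverse order, so that the last line of the file comes first. Use the push, pop, shift, and unshift functions.
5. Solve the previous problem using the reverse function.
6. Write a program that converts a DNA sequence stored in FASTA format into a protein sequence.
7. A DNA sequence is stored in FASTA format. Write a program that counts how many times the string 'TATA' occurs in this sequence.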