Introduction To Perl

Dulal C. Kar

Perl

• Perl: Practical Extraction and Report Language

• General purpose programming language

• Downloads available from:
– www.perl.com for Unix or Unix-like systems
– www.activestate.com for Windows
– www.macperl.com for Mac OS

An Example Perl Program

• Example
    #!/usr/bin/perl -w
    print "Hello world\n";

• Perl is a free-format scripting language. It behaves like both a compiler and an interpreter: the whole program is compiled to an internal form and then run

• Whitespace separates tokens

• Whitespace includes spaces, tabs, newlines, returns, and formfeeds

• Pound sign (#) is used to begin a comment line

• Semicolon (;) is used to separate statements

• There is no “main” routine as in C

Running a Perl Program

• Several ways to run a Perl program

$ perl progfile

$ progfile      # requires execute permission (chmod +x progfile) and the #! first line

Scalar Data

• Scalar data
– Numbers and strings

• Numbers
– Integer literals and float literals
– Integer literals: 10, -2005, 0245 (octal), -0x0e (negative hex number)
– Float literals: 1.45, -1.2e20, 2.34e12

• Strings
– Single-quoted strings: 'hello', 'don\'t'
– Double-quoted strings: "hello world\n"

Escape Characters

• Double quoted string escapes:

\n – newline    \t – tab    \\ – backslash

\r – carriage return    \b – backspace

\f – form feed    \" – double quote

Scalar Operators

• Addition (+), subtraction (-), Multiplication (*)

• Division (/): Perl always does floating point division

• Exponentiation (**): 3 ** 2 is 9

• Modulus (%): 8 % 3 is 2; 8.5 % 3.5 is computed as 8 % 3, which is 2 (operands are truncated to integers)

• Numeric comparison operators: <, <=, ==, >=, >, !=
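The operators on this slide can be tried out with a short script. This is a minimal sketch: the variable names are made up for illustration, and `use strict`, `use warnings`, `my`, and the `?:` conditional are ordinary Perl that the slides do not cover.

```perl
use strict;
use warnings;

my $quotient  = 10 / 4;       # floating-point division
my $power     = 3 ** 2;       # exponentiation
my $remainder = 8 % 3;        # integer modulus
my $truncated = 8.5 % 3.5;    # operands are truncated to 8 and 3 first

print "10 / 4    = $quotient\n";     # 2.5
print "3 ** 2    = $power\n";        # 9
print "8 % 3     = $remainder\n";    # 2
print "8.5 % 3.5 = $truncated\n";    # 2

# Comparison operators give a true or false value for use in conditions.
my $is_smaller = (5 < 10) ? "yes" : "no";
print "5 < 10?   $is_smaller\n";     # yes
```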

Operators for Strings

• Concatenation operator (.)
    "welcome" . "home"    # is "welcomehome"

• String repetition operator (x)
    "dog" x 4             # is "dogdogdogdog"

• String comparison operators
– Use FORTRAN-like operators: eq, ne, lt, gt, le, ge

• Precedence of operations is similar to C++
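The difference between the string and numeric comparison operators is easiest to see when the same operands are compared both ways. A small sketch follows; the variable names are illustrative, and `use strict`, `my`, and `?:` are not part of the slides.

```perl
use strict;
use warnings;

my $joined   = "welcome" . "home";    # concatenation
my $repeated = "dog" x 4;             # repetition

# The same operands can compare differently as strings and as numbers:
# "10" sorts before "9" as text, but 10 is greater than 9 numerically.
my $string_less  = ("10" lt "9") ? "true" : "false";
my $numeric_less = (10 < 9)      ? "true" : "false";

print "$joined\n";                          # welcomehome
print "$repeated\n";                        # dogdogdogdog
print "\"10\" lt \"9\": $string_less\n";    # true
print "10 < 9: $numeric_less\n";            # false
```

Choosing lt where < was intended (or the reverse) is a common source of subtle bugs, because Perl converts the operands without complaint.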

Conversion Between Numbers and Strings

• In most cases, Perl performs all conversions automatically

• If a string value is an operand in a numeric operator, Perl automatically converts it to its equivalent numeric value

– Trailing nonnumerics ignored, if any
– Leading whitespace ignored, if any
– Zero, if it is not a number at all

• Numeric value is converted to string when a string operator is encountered

    "x" . (5+5) is the same as "x" . "10", or "x10"
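The conversion rules above can be checked directly. In this sketch the sample strings and variable names are invented for illustration; `no warnings 'numeric'` is added only so that the deliberately non-numeric strings do not print warnings.

```perl
use strict;
use warnings;
no warnings 'numeric';    # the strings below are intentionally not clean numbers

my $trailing   = "12abc" + 1;     # trailing nonnumerics ignored: 13
my $leading    = "  42" + 0;      # leading whitespace ignored: 42
my $not_number = "hello" + 5;     # not a number at all, treated as 0: 5
my $label      = "x" . (5 + 5);   # number converted to a string: "x10"

print "$trailing $leading $not_number $label\n";    # 13 42 5 x10
```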

Variables

• Scalar variable
– Holds a single value

• Array
– Holds a list of scalars indexed by numbers (subscripts)

• Hash
– Holds a list of scalars indexed by strings

• By default, all variables are global

Scalar Variables

• Scalar variable names begin with a dollar sign ($) followed by a letter or underscore, and then any number of letters, digits, or underscores

• Names are case sensitive

• Scalars are typeless
– There are no such things as integer variables, string variables, floating-point variables, etc.

• No declaration of scalars needed before use

Scalar Operators and Functions

• Assignment and arithmetic operators

    $x = 23;
    $y = 'Welcome';
    $z = '50';
    $w = $z * 2;          # $w is 100
    $v = "Welcome home!";

• Similar to C/C++, it supports ++ and -- operators

• Binary assignment operators: +=, -=, *=, /=, .=, %=, **=

Scalar Operators and Functions (cont’d)

• chop function
– Built-in function that removes the last character from the string value of its argument
    $x = "Welcome";
    chop($x);         # $x is "Welcom" now
    $x = chop($x);    # But $x is "m" now, because chop returns the removed character

• chomp function
– Removes only a trailing newline character
    $x = "Welcome\n";
    chomp($x);        # $x is now "Welcome"
    chomp($x);        # $x is still "Welcome"

Interpolation of Scalars and Strings

• A double-quoted string can be used for variable interpolation
    $x = "home!";
    $y = "Welcome $x";    # $y is now "Welcome home!"

• No substitution is performed in single-quoted strings

• To prevent substitution
    $x = '$dude';         # $x is now the literal text '$dude'
    $y = "Hey $x";        # $y is now 'Hey $dude'

• Case shifting
    $x = "\Ucams";        # $x is now "CAMS"
    $x = "\LCAMS";        # $x is now "cams"

<STDIN> as a Scalar Value

• <STDIN> is a complete text line from standard input including the newline character

$a = <STDIN>; # get text and save in $a

chomp($a); # removes newline character

Or

chomp($a = <STDIN>);

Output with print Function

    print ("Welcome home\n");

    print "Welcome home\n";

    $str = "home";

    print "Welcome $str.\n";    # Welcome home.

    print 'Welcome $str.\n';    # Welcome $str.\n

List Data

• List
– Ordered scalar data

• List examples
    (1, 5, 7)               # list of 1, 5, and 7
    ($a+$b, 6, 7, 9, $a)
    ()                      # empty list
    (1 .. 4)                # same as (1, 2, 3, 4)
    (1.5 .. 4.5)            # same as (1, 2, 3, 4) in Perl 5: fractional endpoints are truncated to integers
    (1.3 .. 6.1)            # same as (1, 2, 3, 4, 5, 6)

Arrays

• Array
– Variable that holds a list
– Array name begins with the @ character
– Can have any number of elements; dynamically allocated
– Mixed values allowed

    $x = 10; $y = 20; $z = 30;
    @p = (5, 10, 15);
    @q = ('bat', 'cat', 'dog');
    @r = (12, 'red', 56, 'dog');    # mixed values
    @s = ($x, $y, $z);              # contains 10, 20, and 30
    @t = (@p, @r);                  # @t contains elements of @p and @r

More Array Examples

($y, $x) = ($x, $y); # swaps $y and $x

    ($x, $y, $z) = (1, 2, 3);
    ($w, @p) = ($x, 2, 5);    # $w = 1, @p = (2, 5)
    ($q, @p) = @p;            # makes $q = 2, @p = (5)

    @p = (5, 9, 8);
    $q = @p;                  # $q gets 3, number of items in @p
    ($q) = @p;                # $q gets 5, first element of @p

Accessing Array Elements

• Elements are indexed as 0, 1, . . ., n-1 sequentially
• Elements are also reverse-indexed beginning with the last element as -1, -2, -3, . . ., -n
• Reference to an element begins with a $ sign instead of @
• Part of an array can be accessed, called slicing
• Subscript for a slice is a list expression enclosed in square brackets ([ ])
• Example
    @f = (5, 3, 8);
    $c = $f[1];                 # $c gets 3
    $f[2] = 10;                 # now @f = (5, 3, 10)

    # Examples on slices (use [.,.,.])
    @f[1,2] = (7, 0);           # changes last two values in @f to 7 and 0
    @f[0,1,2] = @f[0, 0, 0];    # all elements set to first
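The slide mentions reverse indexing but only shows forward subscripts. The sketch below combines negative subscripts with slices; the variable names are illustrative, and `use strict` and `my` are not part of the slides.

```perl
use strict;
use warnings;

my @f = (5, 3, 8);

my $last  = $f[-1];        # last element: 8
my $first = $f[-3];        # -n reaches the first element: 5
my @ends  = @f[0, -1];     # a slice may mix forward and reverse subscripts: (5, 8)

@f[1, 2] = (7, 0);         # assign to a slice
my $after_slice = "@f";    # "5 7 0"

@f[0, 1, 2] = @f[0, 0, 0]; # every element set to the first
my $all_first = "@f";      # "5 5 5"

print "last=$last first=$first ends=@ends\n";
print "after slice: $after_slice\n";
print "all first:   $all_first\n";
```

Note the sigils: a single element is written with $ ($f[-1]), while a slice keeps the @ (@f[0, -1]) because it yields a list.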

Array Functions

• push and pop functions (LIFO behavior)
– push appends to and pop removes from the end
    push(@mylist, $item);
    push(@mylist, 3, 6, 7);
    @newlist = (2, 3, 4);
    push(@mylist, @newlist);    # push accepts a list of values
    $item = pop(@mylist);

• shift and unshift functions
– unshift prepends to and shift removes from the front
    unshift(@mylist, 5);
    unshift(@mylist, $x, $y);
    $z = shift(@mylist);
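To see how the four functions change an array step by step, here is a small runnable sketch. The starting contents of @mylist are an assumption chosen for illustration.

```perl
use strict;
use warnings;

my @mylist = (1, 2);

push(@mylist, 3, 6, 7);           # append at the end:   (1, 2, 3, 6, 7)
my $popped = pop(@mylist);        # remove from the end: 7, leaving (1, 2, 3, 6)

unshift(@mylist, 5);              # prepend at the front:  (5, 1, 2, 3, 6)
my $shifted = shift(@mylist);     # remove from the front: 5, leaving (1, 2, 3, 6)

print "popped=$popped shifted=$shifted list=@mylist\n";
# popped=7 shifted=5 list=1 2 3 6
```

Using push with pop gives stack (LIFO) behavior; using push with shift gives queue (FIFO) behavior.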

Array Functions (cont’d)

• sort function, reverse function, and chomp function (removes the terminating newline character, if any, from all elements)

• join function
    @a = ("bat", "cat", "dog");
    print join(':', @a);      # "bat:cat:dog"

• split function
    $a = 'bat:cat:dog';
    @x = split(':', $a);      # Now @x is ('bat', 'cat', 'dog')
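The slide lists sort, reverse, and chomp on arrays without showing them. The following sketch chains all five functions; the sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;

my @lines = ("dog\n", "bat\n", "cat\n");

chomp(@lines);                           # strips the newline from every element
my @sorted   = sort @lines;              # default sort is by string order
my @reversed = reverse @sorted;          # reverses the order of the list
my $joined   = join(':', @sorted);       # glue elements into one string
my @fields   = split(':', $joined);      # break the string back into a list

print "sorted:   @sorted\n";             # bat cat dog
print "reversed: @reversed\n";           # dog cat bat
print "joined:   $joined\n";             # bat:cat:dog
print "fields:   @fields\n";             # bat cat dog
```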

<STDIN> as an Array

• In a list context, <STDIN> returns all lines up to end of file (Ctrl-D or Ctrl-Z, depending on the system)

• Each line is returned as an element of the list

    @a = <STDIN>;

• If you type 5 lines and then press Ctrl-D, you will have an array @a with 5 elements, each ending with a newline character

Variable Interpolation of Arrays

• In a double-quoted string with an array:
    @a = ("dog", "cat", "bat");
    print "Animals: @a";             # "Animals: dog cat bat"

• Portion of an array with a slice:
    print "Animals: @a[1,2]";        # "Animals: cat bat"

    @b = ("bat", "cat", "dog", 1, 2);
    print "Animals: @b[@b[3,4]]";    # "Animals: cat dog"

Hashes

• Also called associative arrays

• A hash variable name starts with the % character

• A hash stores a list of scalars indexed by arbitrary scalars (called keys) instead of numbers

• Hash initialization:
    %team = ( 'Bill' => 'William Burns',
              'Jim'  => 'James Brown',
              'Tim'  => 'Timothy Johnson' );

• Insertion
    $team{'Liz'} = "Elizabeth Taylor";

Hash Access

    # Hash creation
    %team = ( 'Bill' => 'William Burns',
              'Jim'  => 'James Brown',
              'Tim'  => 'Timothy Johnson' );

    # Accessing data
    $a = $team{'Bill'};                 # $a contains 'William Burns'
    ($c, $d) = @team{'Bill', 'Tim'};    # $c contains 'William Burns'
                                        # $d contains 'Timothy Johnson'

Hash Functions

• keys (hash) function
– Yields a list of all the keys in the hash
– In a scalar context, the keys function returns the number of key-value pairs in the hash
– Consider %team as the example hash
    $n = keys(%team);       # $n is 3
    @k = keys(%team);       # @k is ('Bill', 'Jim', 'Tim'), in no guaranteed order

• values (hash) function
– Yields a list of the values in the hash
    @v = values(%team);     # @v = ('William Burns',
                            #       'James Brown',
                            #       'Timothy Johnson'), in the same order as keys

Hash Functions (cont’d)

• each (hash) function
– Returns a key-value pair as a two-element list
– Example:
    while (($first, $last) = each(%names)) {
        print "The last name of $first is $last\n";
    }

• delete function
– Removes hash elements
    delete $team{"Liz"};    # removes the key-value pair for 'Liz'
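The hash operations from the last few slides can be combined into one runnable sketch using the %team data. The variable names other than %team are illustrative, and the output is sorted because hash order is not guaranteed.

```perl
use strict;
use warnings;

my %team = (
    'Bill' => 'William Burns',
    'Jim'  => 'James Brown',
    'Tim'  => 'Timothy Johnson',
);
$team{'Liz'} = 'Elizabeth Taylor';           # insertion

my $count     = keys(%team);                 # scalar context: 4 pairs
my @nicknames = sort keys %team;             # sorted for a stable order
my ($c, $d)   = @team{'Bill', 'Tim'};        # hash slice

# each returns one key-value pair per call until the hash is exhausted.
my @pairs;
while (my ($nick, $full) = each(%team)) {
    push(@pairs, "$nick is $full");
}
@pairs = sort @pairs;

delete $team{'Liz'};                         # removal
my $remaining = keys(%team);                 # back to 3 pairs

print "count=$count remaining=$remaining\n";
print "nicknames: @nicknames\n";
print "$_\n" foreach @pairs;
```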

Relational Operators

Numeric (C-like)    String (Fortran-like)
==                  eq
!=                  ne
<                   lt
>                   gt
<=                  le
>=                  ge

Control Structures

if (some_expression) { . . . }

if (some_expression) { . . . } else { . . . }

if (some_expression) { . . . }
elsif (some_expression) { . . . }
. . .
else { . . . }

Looping Constructs

• while statement
    while (some_expression) { . . . }

• until statement
    until (some_expression) { . . . }

• do-while statement
    do { . . . } while some_expression;

• do-until statement
    do { . . . } until some_expression;

• for statement
    for (initial_exp; test_exp; re_init_exp) { . . . }
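As a concrete instance of the templates above, the sketch below sums the integers 1 through 5 with each loop form and then classifies the result with if/elsif/else. The task and all variable names are invented for illustration.

```perl
use strict;
use warnings;

my $n = 5;

# while: repeat as long as the condition is true
my ($i, $while_sum) = (1, 0);
while ($i <= $n) { $while_sum += $i; $i++; }

# until: repeat as long as the condition is false
my ($j, $until_sum) = (1, 0);
until ($j > $n) { $until_sum += $j; $j++; }

# do-while: the body always runs at least once
my ($k, $do_sum) = (1, 0);
do { $do_sum += $k; $k++; } while ($k <= $n);

# for: initialization, test, and re-initialization in one header
my $for_sum = 0;
for (my $m = 1; $m <= $n; $m++) { $for_sum += $m; }

# if / elsif / else
my $size;
if    ($for_sum < 10)  { $size = "small";  }
elsif ($for_sum < 100) { $size = "medium"; }
else                   { $size = "large";  }

print "$while_sum $until_sum $do_sum $for_sum $size\n";    # 15 15 15 15 medium
```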

foreach Statement

• foreach $i (@some_list) { . . }

    @animals = ('dog', 'cat', 'bat');
    foreach $a (reverse @animals) {
        print $a;
    }

    @animals = ('dog', 'cat', 'bat');
    foreach (@animals) {      # use $_ by default
        print;
    }

    @a = (2, 3, 4);
    foreach $item (@a) {
        $item++;
    }                         # @a is now (3, 4, 5)

File I/O

• Special filehandles: STDIN, STDOUT, STDERR

    $a = <STDIN>;     # read next line, or undef at end of file
    @a = <STDIN>;     # read all remaining lines

    # To process one line at a time
    while (defined($line = <STDIN>)) {
        # process $line
    }

    # To read into the $_ variable and remove the newline character
    while (<STDIN>) {
        chomp;
        # process $_ here
    }

Diamond Operator: <>

• Reads lines from the files given on the command line, or from STDIN if no files are given on the command line

• Suppose you have a program named test containing:
    #!/usr/bin/perl
    while (<>) {
        print $_;
    }

• And suppose you invoke it as test file1 file2. The diamond operator reads each line of file1 and then file2. If you do not specify any file names, it reads from the standard input automatically.

@ARGV array

• The <> operator makes use of the @ARGV array, which holds the command-line arguments

• The @ARGV array can even be set in the program and then used by the <> operator

    @ARGV = ("file1", "file2");
    # process lines of file1 and file2
    while (<>) {
        print $_;
    }

More File I/O

• To open a file for reading
    open(INFILE, "<filename");    # < indicates input
    open(INF, "filename");        # by default, input

• To open a file for writing
    open(OUTFILE, ">filename");

• To open a file for appending
    open(APPFILE, ">>filename");

• To close a file
    close(FILEHANDLE);

File I/O Error Checking

• Check the return status of your open() statement

    # File copy program with error checking
    open(INFILE, "file.in") ||
        die "Failed to open file for reading\n";

    open(OUTFILE, ">file.out") ||
        die "Failed to open file for writing\n";

    while ($line = <INFILE>) {    # read from file.in
        print OUTFILE $line;      # write to file.out
    }

    close(OUTFILE);
    close(INFILE);

Reading File into an Array

    open(INFILE, "<file.in") ||
        die "Failed to open file\n";

    @lines = <INFILE>;

    close(INFILE);

File for Next Examples

• Assume “input.dat” file contains:

Brown:Mark:Corpus Christi:78413

Smith:Barney:Fargo:58105

Wayne:John:Corpus Christi:78412

Williams:Cathy:New York:10075

Example Application

• Program to read and parse file:

    open(FILE, "<input.dat") || die "Open failed.";
    while ($line = <FILE>) {
        chomp($line);
        ($lname, $fname, $city, $zip) = split(':', $line);
        print "$fname $lname lives in $city\n";
    }
    close(FILE);

• Program Output………….

Reading File into a Hash

    open(INFILE, "input.dat") || die "open failed.\n";

    while ($line = <INFILE>) {
        chomp($line);
        ($lname, $fname, $city, $zip) = split(':', $line);
        $addr{$lname} = $city;
        $fname{$lname} = $fname;
    }
    close(INFILE);

    # Output where everybody lives
    foreach $n (keys(%addr)) {
        print "$fname{$n} $n lives in $addr{$n}\n";
    }

    # Output sorted by last name
    foreach $n (sort(keys(%addr))) {
        print "$fname{$n} $n lives in $addr{$n}\n";
    }
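The same parsing logic can be tried without creating input.dat on disk. This sketch is a variation, not the slide's program: it assumes a Perl built with PerlIO (the default in Perl 5.8 and later), which lets open read from a string through a reference, and it uses a lexical filehandle ($in) and the three-argument form of open, neither of which appears in the slides.

```perl
use strict;
use warnings;

# Stand-in for the contents of input.dat
my $data = <<'END';
Brown:Mark:Corpus Christi:78413
Smith:Barney:Fargo:58105
Wayne:John:Corpus Christi:78412
Williams:Cathy:New York:10075
END

# Open the string as if it were a file (in-memory filehandle).
open(my $in, '<', \$data) || die "open failed: $!\n";

my (%addr, %fname);
while (my $line = <$in>) {
    chomp($line);
    my ($lname, $fname, $city, $zip) = split(':', $line);
    $addr{$lname}  = $city;
    $fname{$lname} = $fname;
}
close($in);

# Build the report sorted by last name
my @report;
foreach my $n (sort keys %addr) {
    push(@report, "$fname{$n} $n lives in $addr{$n}");
}
print "$_\n" foreach @report;
```

With the four sample records, the sorted report starts with "Mark Brown lives in Corpus Christi" and ends with "Cathy Williams lives in New York".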

User Defined Subroutines

• A simple subroutine
    sub Hi { print "Hello World!"; }

    Hi();             # Hello World!

• Passing parameters
    sub Hi {
        ($name) = @_;
        print "Hello $name!";
    }
    Hi('Scott');      # Hello Scott!
    Hi('Dawn');       # Hello Dawn!

Defining Variable Scope

• A problem with parameters
    sub Hi {
        ($name) = @_;         # assigns to the global $name
        print "Hello $name!";
    }
    $name = 'George';
    Hi('Scott');              # Hello Scott!
    print "$name\n";          # Scott

• Fixing the problem
    sub Hi {
        my ($name) = @_;      # my makes $name local to the subroutine
        print "Hello $name!\n";
    }
    $name = 'George';
    Hi('Scott');              # Hello Scott!
    print "$name\n";          # George

Regular Expressions

• A regular expression is a pattern to be matched against a string

• A regular expression is enclosed in slashes

• Example 1
    if (/abc/) {      # regular expression abc is tested against $_
        print $_;     # variable $_ containing a text line
    }

• Example 2
    while (<>) {      # prints all lines containing reg. expr. abc
        if (/abc/) {
            print $_;
        }
    }

Regular Expressions (cont'd)

• To print lines containing an a followed by zero or more b’s, followed by a c:

    while (<>) {
        if (/ab*c/) {
            print $_;
        }
    }

• Substitute operator
    s/ab*c/def/;      # replaces the first match of ab*c in $_ with def
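Matching and substitution can be tried on a fixed list instead of input files. In this sketch the sample strings are invented; both operators act on $_, which foreach sets to each element in turn.

```perl
use strict;
use warnings;

my @lines = ("ac", "abbbc", "xyz", "an abc and ac");

# Collect the strings that contain an a, zero or more b's, then a c.
my @matched;
foreach (@lines) {
    push(@matched, $_) if /ab*c/;
}

# Substitute on a copy; s/// changes only the first match in each string.
my @edited = @lines;
foreach (@edited) {
    s/ab*c/def/;
}

print join(" | ", @matched), "\n";    # ac | abbbc | an abc and ac
print join(" | ", @edited), "\n";     # def | def | xyz | an def and ac
```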

Single Character Patterns

• Dot (".")
– Matches any single character except newline (\n)

• Pattern-matching character class
– Enclosed in [] and only one of the characters must be present
– Examples
    [abcde]         # match any character a, b, c, d, or e
    [aeiouAEIOU]    # match any vowel, upper-case or lower-case
    [0123456789]    # match any digit
    [0-9]           # same thing
    [0-9\-]         # match 0-9, or minus
    [a-z0-9]        # match any single lower-case letter or digit
    [a-zA-Z0-9_]    # match any single letter, digit, or underscore
    [^0-9]          # match any single non-digit
    [^aeiouAEIOU]   # match any single non-vowel
    [^\^]           # match any single character except an up-arrow (^)
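A few of these classes applied to sample strings show how a class matches one character anywhere in the string. The sample strings and array names below are made up for illustration.

```perl
use strict;
use warnings;

my @samples = ("perl5", "PERL", "a-1", "^");
my (@with_digit, @no_vowel, @dot_hit);

foreach (@samples) {
    push(@with_digit, $_) if /[0-9]/;               # contains at least one digit
    push(@no_vowel,   $_) unless /[aeiouAEIOU]/;    # contains no vowel at all
    push(@dot_hit,    $_) if /a.1/;                 # a, any one character, then 1
}

print "with digit: @with_digit\n";    # perl5 a-1
print "no vowel:   @no_vowel\n";      # ^
print "a.1:        @dot_hit\n";       # a-1
```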

Predefined Character Class Abbreviations

Construct          Equivalent Class    Negated Construct    Equivalent Negated Class
\d (a digit)       [0-9]               \D (digits, not)     [^0-9]
\w (word char)     [a-zA-Z0-9_]        \W (words, not)      [^a-zA-Z0-9_]
\s (space char)    [ \r\t\n\f]         \S (space, not)      [^ \r\t\n\f]

Abbreviated classes can be used as part of other character classes as well:

    [\da-fA-F]    # match one hex digit

Grouping Patterns

• Sequence (Ex: abc)

• Multiplier

– Asterisk (*) indicates zero or more of immediate previous character or class

– Question mark (?) means zero or one of immediate previous character

– Plus sign (+) means one or more of immediate previous character

– Example: /fo+ba?r/ # at least one o, optional a

– General multiplier: x{5} (exactly 5), x{5,10} (5 to 10), x{5,} (5 or more)

– Notice that * is equivalent to {0,}, + is equivalent to {1,}, and ? is equivalent to {0,1}.

• For more information on regular expressions, see the perlre manual page (perldoc perlre).
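The multipliers can be checked against a handful of candidate strings. This is a small sketch with invented test strings; the patterns themselves come from the slide.

```perl
use strict;
use warnings;

my @candidates = ("fbr", "fobr", "foobar", "fooobaar", "xxxxx", "xxxx");
my (@fobar, @five_x);

foreach (@candidates) {
    push(@fobar,  $_) if /fo+ba?r/;    # at least one o, optional single a
    push(@five_x, $_) if /x{5}/;       # five x's in a row
}

print "fo+ba?r: @fobar\n";     # fobr foobar
print "x{5}:    @five_x\n";    # xxxxx
```

"fbr" fails because + needs at least one o, and "fooobaar" fails because ? allows at most one a before the r.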

More Information

• Local online documentation
    man perl
    perldoc perl
    perldoc -f <built-in function name>
    (e.g. perldoc -f chomp)

• Web sites
    http://perl.oreilly.com/
    http://effectiveperl.com/

• Books
    "Learning Perl" by Randal Schwartz & Tom Christiansen
    "Programming Perl" by Larry Wall, Randal Schwartz & Tom Christiansen
