Topic 6: Regular expressions
CSE2395/CSE3395 Perl Programming
Llama3 chapters 7-9, pages 98-127
Camel3 pages 139-195
perlre manpage
Original Slides by Debbie Pickett, Modified by David Abramson, 2006, Copyright Monash University
In this topic
Regular expressions
  ► performing pattern matching
Matching strings
Can find one string within another using the index function
  ► returns position of start of substring, or -1 on failure
  ► $needle = "tac";
  ► print index "haystack", $needle; # 4
Only works for constant substrings
  ► not usually sufficient for common pattern-matching uses
Llama3 pages 208-209; Camel3 page 731; perlfunc manpage
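The behaviour of index described on this slide can be tried directly. The following is a minimal sketch using only core Perl; the variable names $haystack, $found and $missing are illustrative and not part of the original slides.

```perl
use strict;
use warnings;

my $haystack = "haystack";

# index returns the zero-based position of the first occurrence.
my $found = index $haystack, "tac";

# A substring that does not occur yields -1 (not undef and not 0).
my $missing = index $haystack, "needle";

print "tac starts at position $found\n";
print "needle does not occur\n" if $missing == -1;
```

Because 0 is a valid position, test the result against -1 rather than treating it as a boolean.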
Regular expressions
Regular expressions are a mini-language used to describe patterns of characters
  ► e.g., look for a “t”, followed by any vowel, followed by any letter
Some strings satisfy a given regular expression
  ► haystack
  ► taciturn (twice)
  ► settee
  ► top
Some strings can’t satisfy it
  ► mouse
  ► cattle
  ► bite me (has a space where a letter is needed)
  ► empty string
Llama3 pages 98-99
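The two word lists on this slide can be checked mechanically. This sketch assumes the pattern is written t[aeiou][a-z] (lowercase letters only) and uses Perl's grep function to keep the strings that match; the array names are illustrative.

```perl
use strict;
use warnings;

my @candidates = ("haystack", "taciturn", "settee", "top",
                  "mouse", "cattle", "bite me", "");

# Keep only strings containing "t", then a vowel, then a lowercase letter.
my @matching = grep { /t[aeiou][a-z]/ } @candidates;

print "matching: @matching\n";
```

The four strings from the "satisfy" list are kept and the four from the "can't satisfy" list are dropped.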
Regular expressions
Several Unix programs have support for regular expressions
  ► usually programs which manipulate text
  ► grep (print lines matching a pattern)
  ► sed and awk (stream editors)
  ► vi and emacs (text editors)
  ► lex (tokenizer)
  ► procmail (mail filter)
  ► perl (some programming language)
Share a (reasonably) common format
  ► some minor differences in capabilities and dialects
  ► previous slide’s example written t[aeiou][a-z]
Unix grep program
grep prints out any line in its input that matches a regular expression
  ► only distantly related to Perl’s grep function
% grep 't[aeiou][a-z]' /usr/dict/words
abated
abetted
abolition
... lots more words here ...
yesterday
youngster
ytterbium
Llama3 page 99
Regular expressions in Perl
Perl tries to match regular expression patterns to the string in the variable $_
  ► if successful anywhere inside string, result is true
  ► otherwise (unsuccessful everywhere), result is false
Pattern is written between two forward slashes
  ► /t[aeiou][a-z]/
  ► /.../ called match operator
  ► boolean value returned
      – usually used inside if or while condition
      – if (/t[aeiou][a-z]/) { ... }
Llama3 page 100; Camel3 pages 140, 145-150, 218; perlop manpage
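As a small illustration of the match operator testing $_ implicitly, the sketch below loops over two sample strings. The sample data and the @results array are assumptions made for the example.

```perl
use strict;
use warnings;

my @lines = ("haystack", "mouse");
my @results;

foreach (@lines) {              # each element is aliased to $_ in turn
    if (/t[aeiou][a-z]/) {      # the match operator tests $_
        push @results, "$_: match";
    } else {
        push @results, "$_: no match";
    }
}

print "$_\n" foreach @results;
```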
Timeout
# Find occurrences of a pattern in the named files.

# Read lines of input into $_, one at a time.
while (<>)
{
    # Check for the pattern in $_.
    if (/t[aeiou][a-z]/)
    {
        # Success. Print out this line.
        print;
    }
}
Patterns: literal characters
Alphanumeric characters match themselves
  ► /abc/ matches substring "abc"
  ► /123/ matches substring "123"
Many punctuation characters have a special meaning and require a backslash in order to match themselves
  ► /\[a\]/ matches substring "[a]"
  ► /\/usr\/bin/ matches substring "/usr/bin"
  ► if in doubt, backslash all non-alphanumerics
Backslashes before alphanumerics are special
  ► /\n/ matches newline character
  ► /\b/ matches word boundary
  ► /\d/ is shorthand for /[0-9]/
  ► /\1/ is a backreference
Llama3 page 100; Camel3 page 158; perlre manpage
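The escaping rules above can be sketched as follows. The sample strings are invented for illustration, and the example uses the binding operator =~ (covered later in this topic) so that several strings can be tested without reassigning $_.

```perl
use strict;
use warnings;

my $path = "PATH=/usr/bin:/bin";
my $text = "see note [a] on page 12";

my $has_usr_bin = ($path =~ /\/usr\/bin/) ? 1 : 0;  # escaped slashes
my $has_note    = ($text =~ /\[a\]/)      ? 1 : 0;  # escaped brackets
my $has_digit   = ($text =~ /\d/)         ? 1 : 0;  # \d: any digit
my $whole_word  = ($text =~ /\bnote\b/)   ? 1 : 0;  # "note" as a whole word
my $inside_word = ($text =~ /\bot\b/)     ? 1 : 0;  # "ot" occurs only inside "note"

print "usr/bin: $has_usr_bin, [a]: $has_note, digit: $has_digit\n";
print "whole word: $whole_word, inside word: $inside_word\n";
```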
Patterns: character classes
[letters] matches exactly one of the enclosed letters
  ► /[abc]/ matches substrings "a" or "b" or "c"
  ► can specify ranges with hyphen
  ► /[0-9]/ matches any single digit
Inverted classes: [^letters] matches any one character except those enclosed
  ► /[^abc]/ matches substring "x" but not "a"
  ► /[^0-9]/ matches any one non-digit
Some common character classes have shorthand forms
  ► /\d/ (digit) same as /[0-9]/
  ► /\s/ (space) same as /[ \t\n\r\f]/
  ► /\w/ (“word letter”) same as /[a-zA-Z0-9_]/
  ► inverted shortcuts /\D/ (non-digit), /\S/ (non-space), /\W/ (non-word)
Llama3 page 105-107; Camel3 pages 159, 165-167; perlre manpage
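To see which classes a given character belongs to, one could classify a few single-character strings. This is only a sketch: the sample characters and the %class_of hash are assumptions for the example, and it treats the shorthand classes as the ASCII sets listed on the slide.

```perl
use strict;
use warnings;

my @samples = ("a", "x", "7", " ", "_");
my %class_of;

foreach my $ch (@samples) {
    my @classes;
    push @classes, '[abc]'  if $ch =~ /[abc]/;
    push @classes, '[^abc]' if $ch =~ /[^abc]/;
    push @classes, '\d'     if $ch =~ /\d/;
    push @classes, '\s'     if $ch =~ /\s/;
    push @classes, '\w'     if $ch =~ /\w/;
    $class_of{$ch} = join " ", @classes;
}

printf "'%s' is matched by: %s\n", $_, $class_of{$_} foreach @samples;
```

Note that the underscore counts as a word character, while the space is matched only by \s and the inverted class.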
Patterns: any character
. (full stop) is shorthand for [^\n] (any character but newline)
  ► effectively “any character” because $_ seldom contains newline
      – except perhaps unchomped one at very end
  ► /d.g/ matches substrings "dog", "dig", "d g", "d!g"
  ► /...../ matches substring containing any five characters
      – true when $_ contains at least five characters
  ► /.\../ matches any character, a dot, then any character
      – true when $_ contains a dot that isn’t the first or last character of the line
Llama3 page 100; Camel3 page 159; perlre manpage
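A short sketch of the dot in action, including the newline exception. The test strings are made up for the example.

```perl
use strict;
use warnings;

# "dg" has nothing between d and g; "d\ng" has a newline there,
# which the dot refuses to match.
my @hits = grep { /d.g/ } ("dog", "d g", "d!g", "dg", "d\ng");

# grep in scalar context counts the matching strings.
# Only "a.b" has a character on both sides of a literal dot.
my $inner_dot = grep { /.\../ } ("a.b", ".ab", "ab.");

print scalar(@hits), " strings match /d.g/\n";
print "$inner_dot string has a dot with a character on each side\n";
```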
Timeout
Write regular expressions to match strings containing:
  ► the word “dog” in any form of capitalization
  ► a car’s number plate
  ► a phone number
  ► a four-letter word beginning with “s”
  ► “s” at the beginning of the line
  ► no text at all (an empty line)
  ► a double letter
Multipliers
Multipliers allow the previous part of the pattern to repeat
  ► by default, applies to previous letter or character class
      – can group using parentheses
  ► write multiplier after part of pattern to repeat
  ► * (asterisk) means “0 or more times”
      – /at*e/ matches strings "Caesar", "fate", "matter"
      – /.*/ matches zero or more of any character
      – by itself, matches any string
  ► + (plus) means “one or more times”
      – /at+e/ matches "fate", "matter" but not "Caesar"
  ► ? (question mark) means “0 or 1 times”
      – /colou?r/ matches substrings "color" and "colour"
Llama3 page 100; Camel3 pages 176-178; perlre manpage
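The three multipliers can be compared side by side on the slide's own words. This is a sketch; the extra test string "colouur" is an assumption added to show where ? stops matching.

```perl
use strict;
use warnings;

my @words = ("Caesar", "fate", "matter");

my @star = grep { /at*e/ } @words;   # zero or more "t": all three words
my @plus = grep { /at+e/ } @words;   # at least one "t": not "Caesar"

# "u" is optional, but two of them in a row is too many.
my @opt = grep { /colou?r/ } ("color", "colour", "colouur");

print "at*e:    @star\n";
print "at+e:    @plus\n";
print "colou?r: @opt\n";
```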
Alternation and grouping
| (vertical bar) separates alternatives
  ► more flexible than character classes
  ► /cat|dog/ matches substrings "cat" and "dog"
  ► /a|b|c/ means same as /[abc]/
( parentheses ) used to group part of pattern
  ► to apply multiplier to more than one character
      – /c(er)+s/ matches strings "saucers" and "sorcerers"
  ► to factor out common parts of a pattern
      – /(cat|sel)fish/ matches substrings "catfish" and "selfish"
  ► to use backreferences and capture strings
      – see later
Llama3 page 102; Camel3 pages 187-188, 182-185; perlre manpage
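The sketch below exercises the slide's patterns and adds one comparison that is not on the slide: without parentheses, | splits the whole pattern, so /cat|selfish/ means "cat" or "selfish" rather than "catfish" or "selfish". The test strings are illustrative.

```perl
use strict;
use warnings;

my @pets = grep { /cat|dog/ }        ("cat", "hotdog", "bird");
my @cers = grep { /c(er)+s/ }        ("saucers", "sorcerers", "cs");
my @fish = grep { /(cat|sel)fish/ }  ("catfish", "selfish", "dogfish");

# Without grouping, the alternatives are "cat" and "selfish",
# so a bare "cat" matches too.
my @loose = grep { /cat|selfish/ }   ("cat", "catfish", "selfish", "dogfish");

print "cat|dog:       @pets\n";
print "c(er)+s:       @cers\n";
print "(cat|sel)fish: @fish\n";
print "cat|selfish:   @loose\n";
```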
Anchors
Sometimes want a pattern to match only at beginning or end of string
  ► called “anchoring” a pattern
^ (caret) means “beginning of string”
  ► /^s/ matches beginning of string followed by “s”
      – i.e., any string that starts with “s”
$ (dollar) means “end of string”
  ► /r$/ matches “r” followed by end of string
      – i.e., any string that ends with “r”
  ► works even if string has not been chomped ($ also matches just before a trailing newline)
Both can be used in same regular expression
  ► /^dog$/ matches only if entire string is "dog" (optionally followed by a newline)
\b means “boundary between word (\w) and non-word (\W) characters”
Llama3 pages 108-109; Camel3 page 178-180; perlre manpage
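The effect of each anchor can be seen by counting how many of a few sample strings match. The strings are assumptions chosen for the example; "dog\n" stands in for an unchomped input line.

```perl
use strict;
use warnings;

my @strings = ("dog", "dog\n", "dogma", "hotdog", "hot dog");

# grep in scalar context returns the number of matching strings.
my $starts = grep { /^dog/ }    @strings;  # dog, dog\n, dogma
my $ends   = grep { /dog$/ }    @strings;  # all except dogma
my $exact  = grep { /^dog$/ }   @strings;  # dog and the unchomped dog\n
my $word   = grep { /\bdog\b/ } @strings;  # dog, dog\n, hot dog

print "start: $starts, end: $ends, exact: $exact, whole word: $word\n";
```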
Timeout
# Mail headers revisited: verify mail header format.

# Mail headers look like either of these lines:
# word: anything after the colon
#     continuation lines are indented

while (<>)
{
    # Stop when blank line reached; end of headers.
    last if /^$/;

    # Pattern matches if line starts with either
    # - at least one non-space, then colon, or
    # - a space
    unless (/^(\S+:|\s)/)
    {
        print "Bad header line:\n$_";
    }
}
split and join
split function breaks a string up into pieces
  ► takes regular expression to specify how pieces are to be separated; returns the pieces as a list
  ► @threeparts = split / /, "cat and mouse";
  ► foreach (split /\s+/, $line) { ... }
  ► @fields = split /,/, $record; # CSV
join function joins a list into a string
  ► takes string to specify what goes between pieces; returns the pieces glued together as a string
  ► $phrase = join " and ", "cat", "mouse", "fish";
  ► print join " ", @words;
  ► $record = join ",", @fields; # CSV
Llama3 pages 125-127; Camel3 pages 794-796, 733; perlfunc manpage
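A round trip through split and join might look like the sketch below. The record contents are invented, and the comma split assumes no field contains a quoted comma, which real CSV data may.

```perl
use strict;
use warnings;

my $record  = "alice,42,Melbourne";
my @fields  = split /,/, $record;       # three fields
my $rebuilt = join ",", @fields;        # identical to $record

# /\s+/ treats any run of whitespace as a single separator.
my @words = split /\s+/, "cat   and\tmouse";

# A single-space pattern yields an empty field between adjacent spaces.
my @pieces = split / /, "cat  and mouse";

print "fields: ", scalar(@fields), ", rebuilt: $rebuilt\n";
print "words: ", scalar(@words), ", pieces: ", scalar(@pieces), "\n";
```

The difference between the last two calls is why /\s+/ is usually the safer choice for splitting text into words.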
Timeout
# Iterate over every word in an input stream.

# Read each line of input
while (<STDIN>)
{
    foreach (split /\s+/, $_)
    {
        next if /^$/; # Skip blank words.
        do_something($_);
    }
}

sub do_something
{
    print "Saw word ", shift, "\n";
}
Timeout
Write regular expressions to match strings containing:
  ► the word “dog” in any form of capitalization
  ► a car’s number plate
  ► a phone number
  ► a four-letter word beginning with “s”
  ► “s” at the beginning of the line
  ► no text at all (an empty line)
  ► a double letter
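One possible set of answers for four of these exercises, using only features covered so far, is sketched below. These are not the lecturers' model answers. Number plates and phone numbers are left out because their formats vary by region, and the double letter needs the backreferences introduced under "Backreferences". The test strings are invented.

```perl
use strict;
use warnings;

# "dog" in any capitalization, one character class per letter.
my $any_dog = grep { /[Dd][Oo][Gg]/ } ("DoG", "hotDOG", "dig");

# A four-letter word beginning with "s": word boundaries on both sides.
my $four_s = grep { /\bs[a-z][a-z][a-z]\b/ } ("a ship", "ships", "is so");

# "s" at the beginning of the line.
my $starts_s = grep { /^s/ } ("snake", "a snake");

# An empty line, chomped or not.
my $empty = grep { /^$/ } ("", "\n", " ");

print "dog: $any_dog, four-letter s-word: $four_s, ",
      "leading s: $starts_s, empty: $empty\n";
```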
Advanced regular expressions
Most languages can process regular expressions of complexity seen so far
Perl has many more advanced features which use regular expressions
  ► case-insensitive matching
  ► interpolating patterns
  ► backreferences
  ► capturing matched strings
  ► substitution
  ► matching variables other than $_
  ► greedy and lazy multipliers
Case-insensitive matches
Regular expressions normally sensitive to case
  ► /a/ doesn’t match substring "A"
Can make pattern case-insensitive using i modifier
  ► put i character immediately after end of match operator
  ► /a/i matches substrings "a" or "A"
Llama3 page 116; Camel3 pages 147-178; perlre manpage
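A brief sketch contrasting the same pattern with and without the i modifier; the word list is illustrative.

```perl
use strict;
use warnings;

my @words = ("dog", "DOG", "Dog", "cat");

my @exact    = grep { /dog/ }  @words;   # lowercase only
my @any_case = grep { /dog/i } @words;   # every capitalization

print "case-sensitive:   @exact\n";
print "case-insensitive: @any_case\n";
```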
Interpolating into patterns
Variables can be interpolated into regular expressions
  ► like double-quoted strings
  ► $pattern = 'fish(es)?'; /cat$pattern/
      – same as /catfish(es)?/
Llama3 page 118; Camel3 pages 190-191; perlre manpage
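The sketch below shows the slide's interpolated pattern at work. It also shows one point the slide does not cover: metacharacters inside the variable stay active, and Perl's \Q...\E escape can be used when the variable should be matched literally. The variable names are illustrative.

```perl
use strict;
use warnings;

my $pattern = 'fish(es)?';
my @hits = grep { /cat$pattern/ } ("catfish", "catfishes", "cat");

# The dot in $literal is still a metacharacter after interpolation...
my $literal = "a.c";
my $loose   = ("abc" =~ /$literal/)     ? 1 : 0;

# ...unless \Q...\E quotes it, so that only a real dot matches.
my $strict  = ("abc" =~ /\Q$literal\E/) ? 1 : 0;
my $dotted  = ("a.c" =~ /\Q$literal\E/) ? 1 : 0;

print "hits: @hits\n";
print "loose: $loose, strict: $strict, dotted: $dotted\n";
```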
Timeout
# Perl implementation of Unix grep program

# Pattern is first command-line argument
$pattern = shift;

while (<>)
{
    # Print the line if it matches the pattern.
    # o ("once") modifier tells Perl to assume that
    # the pattern never changes; this allows Perl
    # to re-use the compiled regular expression,
    # making the program run faster.
    print if /$pattern/o;
}
Backreferences
So far, cannot write pattern to match double letter
  ► /[a-z][a-z]/ matches any two letters, even if different
Need pattern that says: “match any letter, calling the matched string ‘1’, then match string ‘1’ again”
Backreferences refer to the substrings matched by previous parts of the pattern
  ► put parentheses around part of pattern to remember
      – first ( and its matching ) become string 1
      – second ( and its matching ) become string 2
  ► write backreference as \1, \2, etc.
  ► /([a-z])\1/ matches substring composed of any double letter
  ► /\b(\w+)\b.*\b\1\b/ matches any string containing the same word twice
Llama3 pages 109-111; Camel3 pages 182-184; perlre manpage
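Both backreference patterns from this slide can be tried on a few strings. The sketch below uses invented test data.

```perl
use strict;
use warnings;

# A double letter: whatever ([a-z]) matched must occur again immediately.
my @doubled = grep { /([a-z])\1/ } ("settee", "mouse", "mississippi");

# The same whole word twice: \1 must repeat the word captured by (\w+).
my @repeated = grep { /\b(\w+)\b.*\b\1\b/ }
               ("the cat saw the dog", "a cat saw the dog");

print "double letter: @doubled\n";
print "repeated word: ", scalar(@repeated), " sentence\n";
```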
Capturing strings
Matched backreference substrings are available after the match succeeds
  ► backreference \1 is available in special variable $1
  ► backreference \2 is available in special variable $2
  ► etc.
Allows code to find out what strings matched which parts of a pattern
  ► $_ = "haystack"; /(t([aeiou])[a-z])/ puts "tac" in $1 and "a" in $2
Captured strings are available until next match succeeds
  ► if a match fails, the variables are not set; they keep whatever values they had before
Llama3 pages 109-111; Camel3 pages 182-185; perlre, perlvar manpages
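Because $1 and $2 are only meaningful after a successful match, a common habit is to copy them inside the if that tested the match. The sketch below does this for the slide's "haystack" example and for a mail-header line; the header text and variable names are assumptions for the example.

```perl
use strict;
use warnings;

$_ = "haystack";
my ($outer, $inner) = ("", "");
if (/(t([aeiou])[a-z])/) {
    # Groups are numbered by their opening parenthesis, left to right.
    ($outer, $inner) = ($1, $2);
}

my $header = "Subject: regular expressions";
my ($name, $content) = ("", "");
if ($header =~ /^(\S+):\s?(.*)$/) {
    ($name, $content) = ($1, $2);
}

print "outer: $outer, inner: $inner\n";
print "header name: $name, content: $content\n";
```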
Timeout
# Identify mail headers

while (<STDIN>)
{
    last if /^$/;

    # Extract name of header (before colon) into $1
    # and content of header (after colon to end of
    # line) into $2.
    # Match fails on continuation lines, so
    # $1 and $2 variables not set.
    if (/^(\S+):\s?(.*)$/)
    {
        print "Header name is $1, contains $2\n";
    }
}
Timeout
# Fancy Unix grep that identifies where a match was.

$pattern = shift;

# ANSI terminal escapes
$bold = "\033[1m"; $norm = "\033[0m";

while (<>)
{
    # Look for pattern, capture it into $2.
    # Also capture all previous text on line into $1
    # and all following text to $3.
    if (/^(.*)($pattern)(.*)$/o)
    {
        # $3 stops before the newline (. never matches it),
        # so put the newline back when printing.
        print "$1$bold$2$norm$3\n";
    }
}
Multiplier greediness
Multipliers *, + and ? are normally greedy
  ► if there are two ways to successfully match a string, they will try to match the longest substring
  ► $_ = "mississippi"; /m.*ss/ matches up to second “ss”
      – because /.*/ would prefer to match “issi” than just “i”
Non-greedy (lazy) multipliers *?, +? and ?? exist
  ► will try to match the shortest substring
  ► $_ = "mississippi"; /m.*?ss/ matches up to first “ss”
If only one way to match, greedy and lazy multipliers match same way
Greediness only important if need to know which part of string matched a pattern
  ► if using \1, \2, $1, $2, etc.
  ► if using s/.../.../
Camel3 pages 177-178; perlre manpage
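The "mississippi" example can be made visible by capturing what each pattern matched. This sketch relies on a match in list context returning its captured groups, a standard Perl behaviour not covered on the slides; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "mississippi";

# In list context a successful match returns the captured groups.
my ($greedy) = $text =~ /(m.*ss)/;    # .* takes as much as it can
my ($lazy)   = $text =~ /(m.*?ss)/;   # .*? takes as little as it can

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Both patterns succeed on the same string; only the extent of the match differs.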
Substitution
To replace a matched substring with a new substring, use s/pattern/replacement/ operator
pattern is a regular expression to find in the $_ variable
replacement is the string to replace the matching part of $_
  ► not a regular expression
  ► may contain $1, $2, etc. captured strings
If pattern not found, no change is made to $_
s/colou?r/hue/; # Make a synonym
Llama3 pages 122-123; Camel3 pages 152-155; perlop manpage
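A minimal sketch of the three points above: replacing the first match, using captured strings in the replacement, and leaving $_ alone when nothing matches. The sample strings and variable names are assumptions.

```perl
use strict;
use warnings;

$_ = "The colour of the color wheel";
s/colou?r/hue/;                      # only the first match is replaced
my $first_only = $_;

$_ = "2006-10-31";
s/(\d+)-(\d+)-(\d+)/$3.$2.$1/;       # reorder the captured parts
my $reordered = $_;

$_ = "grey";
my $changed = s/colou?r/hue/ ? 1 : 0;   # no match, so $_ stays "grey"
my $untouched = $_;

print "$first_only\n$reordered\n$untouched (changed: $changed)\n";
```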
Substitution
Variables are interpolated into both pattern and replacement
  ► s/$regex/$new/;
Substitution normally only occurs for the first match in a string
  ► use g (“global”) modifier to make substitution repeat as often as possible on the string
      – s/cat/dog/g;
  ► substitution also takes i (case-insensitive) modifier
Llama3 pages 123, 124; Camel3 page 153; perlop manpage
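The modifiers and interpolation described above can be combined as in the sketch below, which also uses the return value of s/// (the number of replacements made). The sentence and the $regex and $new values are invented for the example.

```perl
use strict;
use warnings;

$_ = "Cat chases cat; cat wins";
s/cat/dog/;                       # first lowercase "cat" only
my $once = $_;

$_ = "Cat chases cat; cat wins";
my $count = s/cat/dog/gi;         # every "cat", any capitalization
my $all = $_;

# Both halves of s/// interpolate variables.
my ($regex, $new) = ('w[a-z]+s', 'loses');
s/$regex/$new/;
my $final = $_;

print "$once\n$all ($count changes)\n$final\n";
```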
Timeout
# Censor: change some words in input to others.

%swearwords = (
    'Micro[s\$]oft'         => 'M.......t',
    'Windows( (95|98|ME))?' => 'Windoze',
    'Python'                => 'anti-Perl'
);

# Start at 0 so the report is right even for empty input.
$count = 0;

while (<>)
{
    while (($bad, $euphemism) = each %swearwords)
    {
        # s/// returns number of times succeeded
        $count += s/$bad/$euphemism/gi;
    }
    print;
}
print "$count words changed\n";
Binding operator =~
Match /.../ and substitution s/.../.../ operators match against $_ variable by default
Can match against any variable with binding operator =~
  ► put variable on left of operator
  ► put match/substitution on right of operator
  ► if ($string =~ /pattern/) { ... }
  ► $changeme =~ s/cat/dog/g;
  ► ($copy = $orig) =~ s/cat/dog/g;
  ► if ($_ =~ /pattern/) # Redundant
Llama3 pages 117-118; Camel3 pages 93-94; perlop manpage
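The copy-then-substitute idiom on this slide is worth seeing run, since it leaves the original untouched. The sketch also uses !~, the negated form of =~, which is standard Perl but not mentioned on the slide. The variable contents are illustrative.

```perl
use strict;
use warnings;

my $orig = "cat and cat";

# Assign first, then substitute on the copy; $orig is unchanged.
(my $copy = $orig) =~ s/cat/dog/g;

my $string   = "haystack";
my $matched  = ($string =~ /t[aeiou][a-z]/) ? 1 : 0;
my $no_z     = ($string !~ /z/)             ? 1 : 0;   # true when no match

print "orig: $orig, copy: $copy\n";
print "matched: $matched, no z: $no_z\n";
```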
Covered in this topic
Regular expressions
  ► Character classes
      – [...], ., \s, \S, \d, etc.
  ► Multipliers
      – *, +, ?, non-greedy versions *?, +?, ??
  ► Anchors
      – ^, $
Match operator /.../
Interpolation
split and join
Alternation and grouping
Backreferences and capturing substrings
  ► \1, \2, $1, $2, etc.
Substitution operator s/.../.../
Binding operator =~
Going further
Advanced regular expressions
  ► look-ahead, look-behind, evaluating Perl expressions as regular expressions, etc.
  ► Camel3 pages 195-216
  ► Mastering Regular Expressions, by Jeffrey Friedl, O’Reilly & Associates
tr/.../.../
  ► transliteration operator, like Unix tr program
sed, awk, grep, vi, ...
  ► some of Unix’s more powerful pattern-matching tools
  ► man sed, man awk, ...
Next topic
File I/O
Opening and closing files
Reading from and writing to files
Manipulating files and directories
Communicating with processes
Llama3 chapter 6, pages 86-97, chapters 11-14, pages 148-207; Camel3 pages 20-22, 28-29, 97-100, 426-428, 747-755, 770; perlfunc, perlopentut manpages