By Michael Wolfe
Grouping Things and Hierarchical Matching
In a regexp, ab|ac works, but it is inefficient to write because the “a” is typed twice.
Perl allows grouping with parentheses, so an alternation can cover just part of a regexp.
Example – a(b|c) matches either ab or ac without us typing out each full alternative.
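A minimal Perl sketch comparing the two forms from this slide. The test words (ab, ac, ad, bb, xac) are illustrative assumptions, and `use strict; use warnings;` is added only as good practice. It shows that ab|ac and a(b|c) accept the same strings, and that the parentheses also capture which alternative was used.

```perl
use strict;
use warnings;

# The slide's two equivalent patterns: a full alternation that repeats
# the "a", and a grouped version that factors the "a" out.
my $long    = qr/ab|ac/;
my $grouped = qr/a(b|c)/;

for my $w (qw(ab ac ad bb)) {
    my $l = $w =~ $long    ? "match" : "no match";
    my $g = $w =~ $grouped ? "match" : "no match";
    print "$w: ab|ac -> $l, a(b|c) -> $g\n";
}

# The parentheses also capture: after a successful match, $1 holds
# whichever alternative inside the group was used.
if ("xac" =~ $grouped) {
    print "captured: $1\n";    # prints "captured: c"
}
```

Both columns agree for every word; only the grouped form records which letter followed the "a".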
Backtracking
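The idea of trying one alternative, seeing if it matches, and moving on to the next one if it doesn’t.
The name comes from walking through the woods with multiple paths: when one path leads nowhere, you back up to the last fork and try another.
Perl keeps trying the options until one succeeds; only when every option has failed does it declare that the string does not match.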
Example - $string =~ /(abd|abc)(df|d|de)/;
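A small Perl sketch of the backtracking example, assuming the input string is "abcde" and writing the pattern with no space between the two groups. The second, end-anchored pattern is an added variation (not from the slide) that forces Perl to backtrack into the second group one more time.

```perl
use strict;
use warnings;

my $string = "abcde";    # assumed sample input

# Unanchored: at position 0, "abd" fails on the "c", so Perl backtracks
# and tries "abc". In the second group, "df" fails on the "e", so Perl
# backtracks to "d", which completes the match "abcd".
# (In list context a match returns its captured groups.)
my @unanchored = $string =~ /(abd|abc)(df|d|de)/;
print "unanchored: @unanchored\n";    # unanchored: abc d

# Anchored with $: choosing "d" leaves the "e" unmatched, so $ fails.
# Perl backtracks into the second group again, and "de" succeeds.
my @anchored = $string =~ /(abd|abc)(df|d|de)$/;
print "anchored:   @anchored\n";      # anchored:   abc de
```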
Extracting Matches
Long -
# extract hours, minutes, seconds
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {    # match hh:mm:ss format
    $hours = $1; $minutes = $2; $seconds = $3;
}
Compact - a match in list context returns the captured groups directly:
($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
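A runnable Perl sketch of both extraction styles, assuming a sample time of "14:05:59". It also shows one behaviour worth checking for: when the match fails, the list-context form returns an empty list, so the target variables are left undefined.

```perl
use strict;
use warnings;

my $time = "14:05:59";    # assumed sample input in hh:mm:ss format

# Long form: test the match, then copy the numbered capture variables.
my ($hours, $minutes, $seconds);
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {
    $hours = $1; $minutes = $2; $seconds = $3;
}
print "long:    ", join(":", $hours, $minutes, $seconds), "\n";

# Compact form: a match in list context returns ($1, $2, $3) directly.
my ($h, $m, $s) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
print "compact: ", join(":", $h, $m, $s), "\n";

# On a failed match the returned list is empty, so nothing is assigned
# and the variables stay undef; check before using them.
my @parts = ("no time here" =~ /(\d\d):(\d\d):(\d\d)/);
print "failed match returned ", scalar(@parts), " values\n";    # 0 values
```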
Matching Repetitions
Quantifiers spare us tedious expressions like \w\w\w\w|\w\w\w|\w\w|\w, which can be written simply as \w{1,4}.
The quantifier metacharacters ?, *, +, and {} control how many times the preceding item is matched:
a? = match 'a' 1 or 0 times
a* = match 'a' 0 or more times, i.e., any number of times
a+ = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times
a{n,} = match at least n times
a{n} = match exactly n times
Quantifiers that grab as much of the string as possible are called maximal match or greedy quantifiers.
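A Perl sketch that runs each quantifier from the list above against two assumed sample strings, "caaaat" and "ct". The helper `matched` is hypothetical, written only for this illustration. Each pattern starts with a literal "c" so that every quantifier applies to the same run of a's.

```perl
use strict;
use warnings;

# Hypothetical helper for this sketch: returns the text that the pattern
# string $pat matches in $str, or undef if there is no match.
sub matched {
    my ($str, $pat) = @_;
    return $str =~ /($pat)/ ? $1 : undef;
}

# Each pattern starts with a literal "c", so the quantifier applies to the
# run of a's right after it.
my @patterns = ('ca?', 'ca*', 'ca+', 'ca{2,3}', 'ca{2,}', 'ca{2}');

for my $str ("caaaat", "ct") {
    for my $pat (@patterns) {
        my $m = matched($str, $pat);
        printf "%-7s %-8s %s\n", $str, $pat, defined $m ? "'$m'" : "no match";
    }
}
```

In "caaaat", ca* and ca+ both take all four a's, which is the greedy behaviour described above. In "ct", ca? and ca* still match "c" (zero a's), but ca+ and the {} forms fail.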
Repetition Examples
/y(es)?/i;              # matches 'y', 'Y', or a case-insensitive 'yes'
$year =~ /^\d{2,4}$/;   # make sure year is at least 2 but not more than 4 digits
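A Perl sketch of these repetition examples. The helper subs (`says_yes`, `year_unanchored`, `year_anchored`) and the sample inputs are hypothetical, chosen for illustration. The comparison shows why the year check needs the ^ and $ anchors: without them, /\d{2,4}/ is satisfied by any 2 to 4 digits inside a longer number.

```perl
use strict;
use warnings;

# /y(es)?/i : a "y" optionally followed by "es", ignoring case.
sub says_yes { return $_[0] =~ /y(es)?/i ? 1 : 0 }

# Unanchored: only needs 2 to 4 digits somewhere in the string.
sub year_unanchored { return $_[0] =~ /\d{2,4}/ ? 1 : 0 }

# Anchored with ^ and $: the whole string must be 2 to 4 digits.
sub year_anchored { return $_[0] =~ /^\d{2,4}$/ ? 1 : 0 }

for my $answer (qw(y Y yes YES no)) {
    print "$answer: ", (says_yes($answer) ? "yes" : "no"), "\n";
}

for my $year (qw(99 2010 12345 7)) {
    printf "%-6s unanchored=%d anchored=%d\n",
        $year, year_unanchored($year), year_anchored($year);
}
```

"12345" passes the unanchored check but fails the anchored one; "7" fails both.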
The Regexp Principles
Principle 0: Taken as a whole, any regexp will be matched at the earliest possible position in the string.
Principle 1: In an alternation a|b|c..., the leftmost alternative that allows a match for the whole regexp will be the one used.
Principle 2: The maximal matching quantifiers ?, *, + and {n,m} will in general match as much of the string as possible while still allowing the whole regexp to match.
Principle 3: If there are two or more elements in a regexp, the leftmost greedy quantifier, if any, will match as much of the string as possible while still allowing the whole regexp to match. The next leftmost greedy quantifier, if any, will try to match as much of the string remaining available to it as possible, while still allowing the whole regexp to match. And so on, until all the regexp elements are satisfied.
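A Perl sketch with one small case per principle. The sample strings are illustrative assumptions, and each comment gives the captured values that the principle predicts.

```perl
use strict;
use warnings;

# Principle 0: the match starts at the earliest position. "hat" is listed
# first, but "cat" occurs earlier in the string, so "cat" wins.
my ($p0) = "cat and hat" =~ /(hat|cat)/;            # "cat"

# Principle 1: at a given position, the leftmost alternative that lets the
# whole regexp match is used, even if a later one would match more.
my ($p1) = "cats" =~ /(cat|cats)/;                  # "cat"

# Principle 2: a greedy quantifier takes as much as it can, then gives back
# only enough for the rest of the regexp to match.
my @p2 = "the cat in the hat" =~ /^(.*)(at)(.*)$/;  # ("the cat in the h", "at", "")

# Principle 3: the leftmost greedy quantifier is satisfied first; the next
# one gets only what is left over.
my @p3 = "aaaa" =~ /^(a+)(a+)$/;                    # ("aaa", "a")

print "P0: $p0\nP1: $p1\n";
print "P2: ", join("|", @p2), "\nP3: ", join("|", @p3), "\n";
```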
Resources
Perl Tutorial on Course website - http://www.cs.drexel.edu/~knowak/cs265_fall_2010/perlretut_2007.pdf
Many more regular-expression examples on pages 8-17 of the tutorial