By Michael Wolfe

Grouping Things and Hierarchical Matching

In a regexp, the alternation ab|ac works, but it is repetitive because "a" is written twice

Perl allows grouping with parentheses, so a choice between alternatives can apply to just part of a regexp

Example: a(b|c) matches either ab or ac without writing out each case in full
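A minimal sketch of the grouping idea, assuming a small list of test words that is not part of the original slides. It checks that the grouped form a(b|c) accepts the same words as the spelled-out alternation; the ^ and $ anchors are added here so that the whole word has to match.

```perl
use strict;
use warnings;

# Hypothetical test words, chosen only for illustration.
my @words = ('ab', 'ac', 'ad');

# Grouped form: 'a' followed by either 'b' or 'c'.
my @grouped = grep { /^a(b|c)$/ } @words;

# Spelled-out form: each alternative written in full.
my @spelled = grep { /^(ab|ac)$/ } @words;

print "grouped:     @grouped\n";
print "spelled out: @spelled\n";
```

Both lists contain ab and ac, while ad is rejected by both patterns.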

Backtracking

Backtracking means trying one alternative, seeing if it matches, and moving on to the next one if it doesn't

The term comes from the idea of walking through the woods with multiple paths: when a path hits a dead end, you go back to an earlier point and try another path

Perl tries every possible alternative before it declares that the match fails.

Example: $string =~ /(abd|abc)(df|d|de)/
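The backtracking steps can be traced with a short sketch. It assumes the input string 'abcde' and writes the pattern with no space between the two groups; both choices are assumptions made for illustration. Perl first tries abd, fails at the third character, and backtracks to abc. It then tries df, fails on the 'f', and backtracks to d, which completes the match.

```perl
use strict;
use warnings;

my $string = 'abcde';    # assumed sample input

my ($first, $second);
if ($string =~ /(abd|abc)(df|d|de)/) {
    # $1 and $2 hold the alternatives that finally succeeded.
    ($first, $second) = ($1, $2);
    print "first group: $first, second group: $second\n";
}
```

The second group ends up as d rather than de because d is the leftmost alternative that lets the whole regexp match.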

Extracting Matches

Long:

    # extract hours, minutes, seconds
    if ($time =~ /(\d\d):(\d\d):(\d\d)/) {    # match hh:mm:ss format
        $hours   = $1;
        $minutes = $2;
        $seconds = $3;
    }

Compact:

    ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
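A runnable sketch of the compact form, assuming a sample value for $time that is not in the original slides. In list context a successful match returns the captured groups, and a failed match returns an empty list, which the second half of the sketch checks.

```perl
use strict;
use warnings;

my $time = '12:34:56';    # assumed sample value

# List context: the match returns the three captured groups.
my ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
print "$hours h, $minutes m, $seconds s\n";

# A failed match in list context returns an empty list.
my @parts = ('not a time' =~ /(\d\d):(\d\d):(\d\d)/);
print "fields from a failed match: ", scalar(@parts), "\n";
```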

Matching Repetitions

Quantifiers make it unnecessary to write tedious expressions like \w\w\w\w|\w\w\w|\w\w|\w.

The quantifier metacharacters ?, *, +, and {} specify how many times the preceding element may repeat:

a? = match 'a' 1 or 0 times
a* = match 'a' 0 or more times, i.e., any number of times
a+ = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times
a{n,} = match at least n or more times
a{n} = match exactly n times

Quantifiers that grab as much of the string as possible are called maximal match or greedy quantifiers.
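The quantifiers can be tried out with a short sketch. The sample strings are hypothetical and chosen only for illustration. The first two lines compare \w{1,4} with the long alternation it replaces; the rest show +, * and {n} on their own.

```perl
use strict;
use warnings;

my $text = 'regexp';    # hypothetical sample word

# \w{1,4} is greedy, so it takes four word characters when it can.
my ($short)     = $text =~ /^(\w{1,4})/;
my ($long_form) = $text =~ /^(\w\w\w\w|\w\w\w|\w\w|\w)/;
print "quantifier: $short, alternation: $long_form\n";

my ($plus)  = 'caaat' =~ /(a+)/;      # one or more: all three a's
my ($star)  = 'ct'    =~ /c(a*)t/;    # zero or more: the empty string here
my ($exact) = 'caaat' =~ /(a{2})/;    # exactly two a's
print "a+ gives '$plus', a* gives '$star', a{2} gives '$exact'\n";
```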

Repetition Examples

/y(es)?/i; # matches 'y', 'Y', or a case-insensitive 'yes'

$year =~ /\d{2,4}/; # make sure year is at least 2 but not more than 4 digits
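One caveat is worth a sketch: without anchors, /\d{2,4}/ only requires that two to four digits appear somewhere in the string, so a five-digit value still matches. The version below adds ^ and $ to test the whole string. The sample values are assumptions made for illustration.

```perl
use strict;
use warnings;

my @years = ('99', '2010', '12345', '7');    # hypothetical inputs

# Unanchored: any run of 2 to 4 digits inside the string is enough.
my @loose = grep { /\d{2,4}/ } @years;

# Anchored: the whole string must be 2 to 4 digits.
my @anchored = grep { /^\d{2,4}$/ } @years;

# The same anchoring applied to the y(es)? pattern.
my @yes = grep { /^y(es)?$/i } ('y', 'Y', 'YES', 'yep');

print "unanchored: @loose\n";
print "anchored:   @anchored\n";
print "yes-like:   @yes\n";
```

The unanchored pattern accepts 12345 because its first four digits satisfy \d{2,4}; the anchored one rejects it.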

The RegExp principles

Principle 0: Taken as a whole, any regexp will be matched at the earliest possible position in the string.

Principle 1: In an alternation a|b|c..., the leftmost alternative that allows a match for the whole regexp will be the one used.

Principle 2: The maximal matching quantifiers ?, *, + and {n,m} will in general match as much of the string as possible while still allowing the whole regexp to match.

Principle 3: If there are two or more elements in a regexp, the leftmost greedy quantifier, if any, will match as much of the string as possible while still allowing the whole regexp to match. The next leftmost greedy quantifier, if any, will try to match as much of the string remaining available to it as possible, while still allowing the whole regexp to match. And so on, until all the regexp elements are satisfied.
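The four principles can be checked one at a time with a small sketch. The strings follow the style of the examples in the Perl tutorial listed under Resources, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Principle 0: the earliest position wins, so 'cat' is found
# even though 'dog' is listed first in the alternation.
my ($p0) = 'cats and dogs' =~ /(dog|cat)/;

# Principle 1: at a given position, the leftmost alternative
# that allows a match is used, even if a longer one would fit.
my ($p1) = 'cats' =~ /(c|ca|cat|cats)/;

# Principle 2: a greedy quantifier takes as much as it is allowed to.
my ($p2) = 'aaaa' =~ /(a{1,3})/;

# Principle 3: the leftmost greedy quantifier gets first pick,
# leaving only what the rest of the regexp needs in order to match.
my ($head, $mid, $tail) =
    'The programming republic of Perl' =~ /^(.+)(e|r)(.*)$/;

print "P0: $p0\nP1: $p1\nP2: $p2\n";
print "P3: '$head' | '$mid' | '$tail'\n";
```

In the last match, .+ takes everything up to "Pe", the alternation takes the final "r", and .* is left with "l".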

Resources

Perl Tutorial on Course website - http://www.cs.drexel.edu/~knowak/cs265_fall_2010/perlretut_2007.pdf

Lots more examples on pages 8-17 involving regular expressions
