By Michael Wolfe

Grouping Things and Hierarchical Matching

In a regexp, ab|ac works, but it is not very efficient because it spells out “a” twice.

Perl lets you group part of a regexp with parentheses, so an alternation can choose between options for just that part.

Example: a(b|c) matches either ab or ac without typing out each case in full.
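A minimal Perl sketch of the grouping example, run against a small hypothetical word list. The anchors ^ and $ are added here (they are not on the slide) so that only whole words count; without them, "abc" would also match because it contains "ab".

```perl
use strict;
use warnings;

# Hypothetical sample input for illustration
my @words = qw(ab ac ad abc);

# a(b|c): the parentheses limit the alternation to the character after 'a';
# ^ and $ (added for this sketch) require the whole word to match
my @matched = grep { /^a(b|c)$/ } @words;

print "@matched\n";   # prints "ab ac"
```

The parentheses also capture what they matched into $1. When only the grouping is wanted, Perl's non-capturing form a(?:b|c) behaves the same for matching purposes.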

Backtracking

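Backtracking is the process of trying one alternative, seeing whether it leads to a match, and moving on to the next alternative if it doesn't.

The name comes from the analogy of walking through the woods with multiple paths: when a path turns out to be a dead end, you go back to the last fork and try another.

Perl keeps trying the options until one of them produces a match; only after every option has failed does it declare that the string does not match.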

Example: $string =~ /(abd|abc)(df|d|de)/
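The sketch below traces the backtracking example, assuming $string holds "abcde" (the slide leaves the input unspecified) and writing the regexp with no space between the two groups. The comments follow the steps the engine takes.

```perl
use strict;
use warnings;

my $string = "abcde";   # assumed input for illustration

# Trace, starting at position 0:
#  1. (abd|abc): try 'abd'; 'a' and 'b' match, but 'd' fails against 'c'.
#     Backtrack to the fork and try 'abc', which matches.
#  2. (df|d|de): at 'd', try 'df'; 'd' matches, but 'f' fails against 'e'.
#     Backtrack and try 'd', which matches. The whole regexp now matches,
#     so the third alternative 'de' is never tried.
my ($first, $second) = $string =~ /(abd|abc)(df|d|de)/;

print "first group: $first, second group: $second\n";   # abc, d
```

Note that the second group captures 'd' rather than 'de': the engine stops at the first alternative that lets the whole regexp succeed, not at the longest one.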

Extracting Matches

Long:

# extract hours, minutes, seconds
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {   # match hh:mm:ss format
    $hours   = $1;
    $minutes = $2;
    $seconds = $3;
}

Compact:

($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
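A runnable version of both forms, under strict mode, assuming a hypothetical sample time of "14:05:09". It also shows what the compact form returns when the match fails.

```perl
use strict;
use warnings;

my $time = "14:05:09";   # hypothetical sample input

# Long form: test the match, then copy the capture variables
my ($hours, $minutes, $seconds);
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {   # match hh:mm:ss format
    $hours   = $1;
    $minutes = $2;
    $seconds = $3;
}

# Compact form: a match in list context returns the captured groups
my ($h, $m, $s) = $time =~ /(\d\d):(\d\d):(\d\d)/;

print "$hours:$minutes:$seconds and $h:$m:$s\n";

# If the match fails, list context yields an empty list,
# so variables assigned this way are left undefined
my @none = "no time here" =~ /(\d\d):(\d\d):(\d\d)/;
print scalar(@none), "\n";   # 0
```

The long form is useful when you need to branch on success; the compact form is shorter but gives no separate success flag beyond checking whether the variables are defined.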

Matching Repetitions

Quantifiers save us from writing tedious expressions like \w\w\w\w|\w\w\w|\w\w|\w, which can be written simply as \w{1,4}.

Perl provides the quantifier metacharacters ?, *, +, and {} for this:

a? = match 'a' 1 or 0 times
a* = match 'a' 0 or more times, i.e., any number of times
a+ = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times
a{n,} = match at least n times
a{n} = match exactly n times

Quantifiers that grab as much of the string as possible are called maximal match or greedy quantifiers.
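A small sketch exercising each quantifier on hypothetical sample strings chosen for illustration. Each match is done in list context so the captured text (or undef on failure) lands in a variable.

```perl
use strict;
use warnings;

# \w{1,4} says in one quantifier what \w\w\w\w|\w\w\w|\w\w|\w spells out by hand
my ($alt)   = "perlish" =~ /(\w\w\w\w|\w\w\w|\w\w|\w)/;
my ($quant) = "perlish" =~ /(\w{1,4})/;   # greedy, so it takes 4 characters

my ($opt)   = "color" =~ /(colou?r)/;     # ?     : 'u' may appear 0 or 1 times
my ($star)  = "baaa!" =~ /(ba*)/;         # *     : any number of 'a', greedily
my ($plus)  = "b!"    =~ /(ba+)/;         # +     : needs at least one 'a', so no match
my ($range) = "aaaaa" =~ /(a{2,3})/;      # {n,m} : at most 3, and greed takes all 3
my ($min)   = "aaaaa" =~ /(a{2,})/;       # {n,}  : 2 or more, greed takes all 5
my ($exact) = "12345" =~ /(\d{3})/;       # {n}   : exactly 3 digits

print "$alt $quant $opt $star $range $min $exact\n";
print defined $plus ? "plus matched\n" : "plus did not match\n";
```

The {n,m} and {n,} cases show greediness directly: both could stop at two 'a' characters but take as many as they are allowed.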

Repetition Examples

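/y(es)?/i;              # matches 'y', 'Y', or a case-insensitive 'yes'
$year =~ /^\d{2,4}$/;   # make sure year is at least 2 but not more than 4 digits

The anchors ^ and $ are what enforce the "not more than 4 digits" part; without them, the match succeeds on any string that contains 2 to 4 consecutive digits.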
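The sketch below compares the two checks on hypothetical sample inputs. The anchored /^y(es)?$/i in the first part is an assumption added for this sketch so that only whole answers count; the year loop shows why /\d{2,4}/ alone accepts "12345".

```perl
use strict;
use warnings;

# Hypothetical answers; anchoring (added here) rejects words like "yesterday"
my @answers = qw(y Y yes YeS no yesterday);
my @yes = grep { /^y(es)?$/i } @answers;
print "@yes\n";   # y Y yes YeS

my (%loose, %strict);
for my $year (qw(99 2024 12345 7)) {
    # true if ANY run of 2 to 4 digits appears somewhere in the string
    $loose{$year}  = $year =~ /\d{2,4}/ ? 1 : 0;
    # true only if the whole string is 2 to 4 digits
    $strict{$year} = $year =~ /^\d{2,4}$/ ? 1 : 0;
    print "$year: unanchored=$loose{$year} anchored=$strict{$year}\n";
}
```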

The Regexp Principles

Principle 0: Taken as a whole, any regexp will be matched at the earliest possible position in the string.

Principle 1: In an alternation a|b|c..., the leftmost alternative that allows a match for the whole regexp will be the one used.

Principle 2: The maximal matching quantifiers ?, *, + and {n,m} will in general match as much of the string as possible while still allowing the whole regexp to match.

Principle 3: If there are two or more elements in a regexp, the leftmost greedy quantifier, if any, will match as much of the string as possible while still allowing the whole regexp to match. The next leftmost greedy quantifier, if any, will try to match as much of the string remaining available to it as possible, while still allowing the whole regexp to match. And so on, until all the regexp elements are satisfied.
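A minimal sketch illustrating the principles on sample strings chosen for illustration. Each comment names the principle that decides the result.

```perl
use strict;
use warnings;

# Principle 0: the earliest position wins, even though 'dog' is listed first
my ($p0) = "cat and dog" =~ /(dog|cat)/;          # 'cat'

# Principle 1: the leftmost alternative that lets the whole regexp match
my ($p1) = "cats" =~ /(cat|cats)/;                # 'cat', not 'cats'

# Principles 2 and 3: (.+) first grabs the whole string, then gives back
# characters one at a time until (e|r)(.*)$ can match
my $x = "The programming republic of Perl";
my ($g1, $g2, $g3) = $x =~ /^(.+)(e|r)(.*)$/;
# $g1 = 'The programming republic of Pe', $g2 = 'r', $g3 = 'l'

# Principle 3: the leftmost greedy quantifier takes everything it can,
# leaving nothing for the second one
my ($first, $second) = "aaaa" =~ /^(a*)(a*)$/;    # 'aaaa' and ''

print "$p0 | $p1 | $g1 | $g2 | $g3 | '$first' | '$second'\n";
```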

Resources

Perl tutorial on the course website: http://www.cs.drexel.edu/~knowak/cs265_fall_2010/perlretut_2007.pdf

Many more regular expression examples appear on pages 8-17 of the tutorial.