Perl Regular Expressions

Perl Training Regex



Regular Expressions (0)

• A regular expression is a template that either matches or does not match a given string.

• Strong regular expression support is one of the most important features of Perl.

/PATTERN/


Regular Expressions (1)

The “Dirty Dozen”: Metacharacters

These characters have special meaning in regular expressions.

A backslash in front of any metacharacter makes it non-special, so it matches itself literally (e.g. \. matches a period).

\ . * + ? ( ) | [ { ^ $
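A minimal Perl sketch (the sample strings are made up): a backslash escapes a single metacharacter by hand, while the built-in quotemeta function, or \Q...\E inside a pattern, escapes every metacharacter in an interpolated string so that it is matched literally.

```perl
use strict;
use warnings;

my $price = 'Total: $5.00 (approx.)';

# Escaped by hand: "\$" and "\." match a literal dollar sign and period.
my $manual = ($price =~ /\$5\.00/) ? 1 : 0;

# A (hypothetical) user-supplied string full of metacharacters.
my $needle = '$5.00 (approx.)';

# quotemeta backslashes every non-word character, so the string is
# matched literally instead of being read as a pattern.
my $escaped = quotemeta($needle);
my $literal = ($price =~ /$escaped/) ? 1 : 0;

# \Q...\E does the same thing inside the pattern.
my $inline = ($price =~ /\Q$needle\E/) ? 1 : 0;

print "manual=$manual literal=$literal inline=$inline\n";
```

All three checks succeed, so the script prints manual=1 literal=1 inline=1.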


Regular Expressions (2)

/to.*ols/ matches ‘to’, followed by any string (possibly empty), followed by ‘ols’.

/hello.you/ matches any string that contains ‘hello’, followed by exactly one character, followed by ‘you’.

/to*ols/ the character immediately before ‘*’ may be repeated zero or more times. Matches ‘tools’, ‘tooooools’, ‘tols’ (but not ‘toxols’).

/to+ols/ the character immediately before ‘+’ may be repeated one or more times. Matches ‘tools’, ‘tooooools’ (but not ‘tols’).

“.” matches any character except a newline (“\n”).

Quantifiers decide how many times the preceding item has to be repeated.
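The patterns above can be tried in a short Perl sketch. matching() is a hypothetical helper that returns the sample words a compiled pattern (qr//) matches; the last check shows that “.” does not match “\n”.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words that the compiled pattern matches.
sub matching {
    my ($re, @words) = @_;
    return grep { $_ =~ $re } @words;
}

my @samples = qw(tools tooooools tols toxols);

my @star = matching(qr/to*ols/,  @samples);  # 'o' zero or more times
my @plus = matching(qr/to+ols/,  @samples);  # 'o' one or more times
my @dot  = matching(qr/to.*ols/, @samples);  # any run of characters between

# "." matches any character except "\n".
my $dot_newline = ("a\nb" =~ /a.b/) ? 1 : 0;

print "to*ols : @star\n";
print "to+ols : @plus\n";
print "to.*ols: @dot\n";
print "a.b on \"a\\nb\": $dot_newline\n";
```

With these samples, to*ols keeps tools, tooooools and tols; to+ols drops tols; to.*ols keeps toxols but not tols.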


Regular Expressions (3)

/to?ols/ the character before ‘?’ is optional. Thus the matched text can only be ‘tools’ or ‘tols’.

/to{2}ls/ the number in ‘{}’ gives the number of repetitions (this one matches ‘tools’)

{count} - Match exactly count times

{min,max} - Match at least min but not more than max times

{min,} - Match at least min times
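A sketch of the three brace forms on made-up sample words. The patterns are anchored with ^ and $ (see the Anchors slide) so that the whole word must match, not just part of it.

```perl
use strict;
use warnings;

my @samples = qw(tls tols tools toools tooools);

my @exact   = grep { /^to{2}ls$/   } @samples;  # exactly 2 'o'
my @range   = grep { /^to{1,3}ls$/ } @samples;  # 1 to 3 'o'
my @atleast = grep { /^to{3,}ls$/  } @samples;  # 3 or more 'o'

print "{2}  : @exact\n";
print "{1,3}: @range\n";
print "{3,} : @atleast\n";
```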

Exercise: write the {} quantifiers equivalent to ‘*’, ‘+’ and ‘?’.


Regular Expressions (4)

Grouping: parentheses ‘( )’ group one or more characters, so that a quantifier or an alternation applies to the whole group.

/(tools)+/ matches “toolstoolstoolstools”.

Alternatives:

/hello (world|Perl)/ - matches “hello world”, “hello Perl”.
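A short sketch (the sample strings are illustrative) showing that a quantifier after a group repeats the whole group, and that ‘|’ inside parentheses chooses between alternatives.

```perl
use strict;
use warnings;

my $text = "toolstoolstools";

# With parentheses the '+' applies to the whole word "tools".
my $group   = ($text =~ /^(tools)+$/) ? 1 : 0;

# Without them the '+' applies only to the last 's'.
my $nogroup = ($text =~ /^tools+$/)   ? 1 : 0;

# Alternation: either "world" or "Perl" after "hello ".
my @greetings = ("hello world", "hello Perl", "hello there");
my @matched   = grep { /^hello (world|Perl)$/ } @greetings;

print "group=$group nogroup=$nogroup\n";
print "matched: ", join(', ', @matched), "\n";
```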


Regular Expressions (5)

Character Class - a list of all possible characters for a single position, written inside ‘[ ]’

/Hello [abcde]/ matches “Hello a” or “Hello b” …

/Hello [a-e]/ the same as above

Negating:

[^abc] any character except a, b or c
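A sketch comparing an explicit list, a range and a negated class on made-up inputs; each class matches exactly one character.

```perl
use strict;
use warnings;

my @inputs = ("Hello a", "Hello e", "Hello f", "Hello z");

my @listed  = grep { /^Hello [abcde]$/ } @inputs;  # explicit list
my @ranged  = grep { /^Hello [a-e]$/ }   @inputs;  # same set as a range
my @negated = grep { /^Hello [^a-e]$/ }  @inputs;  # anything except a..e

print "listed : ", join(', ', @listed), "\n";
print "ranged : ", join(', ', @ranged), "\n";
print "negated: ", join(', ', @negated), "\n";
```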


Regular Expressions (6)

Shortcuts

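• \d digit [0-9]

• \w word character [A-Za-z0-9_]

• \s whitespace [ \t\n\r\f]

(These are the ASCII meanings. Since Perl 5.18, \s also matches the vertical tab, and on Unicode strings \d, \w and \s can match more characters.)

Negating with ^: [^\d] matches a non-digit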

\S anything not \s

\D anything not \d

\W anything not \w
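A sketch of the shortcuts on a made-up line. It uses m//g in list context, which returns every match (the /g modifier is discussed with substitutions).

```perl
use strict;
use warnings;

my $line = "id_42\tvalue 7";

my @digits = $line =~ /\d/g;               # every single digit
my @words  = $line =~ /\w+/g;              # runs of [A-Za-z0-9_]
my @spaces = $line =~ /\s/g;               # the tab and the space
my $non_digits = join '', $line =~ /\D/g;  # everything except digits

print "digits: @digits\n";
print "words : @words\n";
print "spaces: ", scalar(@spaces), "\n";
```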


Exercise: write the character classes for

1. Matching vowels

2. Matching consonants

3. Anything other than non-numbers

What is the difference between \D and [^\d]?


Regular Expressions (7)

Anchors

^ - marks the beginning of the string

$ - marks the end of the string

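/^Hello Perl/ - matches “Hello Perl, goodbye Perl”, but not “Perl Hello Perl”

What pattern will match blank lines?

/^\s*$/ - matches blank lines (empty, or containing only whitespace)

/^abc/ - “^” outside a character class anchors to the beginning of the string

/a\^bc/ - “\^” is a literal caret, so this matches “a^bc”

/[^abc]/ - “^” as the first character inside “[ ]” negates the class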
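A sketch of the anchor examples on made-up lines, including an empty line and a whitespace-only line for /^\s*$/.

```perl
use strict;
use warnings;

my @lines = ("Hello Perl, goodbye Perl", "Perl Hello Perl", "", "   \t", "a^bc");

my @starts = grep { /^Hello Perl/ } @lines;   # must begin with "Hello Perl"
my @blank  = grep { /^\s*$/ } @lines;         # empty or whitespace only
my @caret  = grep { /a\^bc/ } @lines;         # literal caret

printf "starts=%d blank=%d caret=%d\n", scalar @starts, scalar @blank, scalar @caret;
```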


Regular Expressions (8)

\b - matches at either end of a word (matches the start or the end of a group of \w characters)

/\bPerl\b/ - matches “Hello Perl”, “Perl” and also “Perl++” (there is a boundary between “l” and “+”), but not “Perls” or “SuperPerl”

\B - negative of \b: matches at every position where \b does not
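A sketch checking \b and \B against a few made-up strings. Note that “Perl++” does match /\bPerl\b/, because a boundary only needs a \w character on one side and a non-\w character (or the string edge) on the other.

```perl
use strict;
use warnings;

my @inputs = ("Hello Perl", "Perl", "Perl++", "Perls", "SuperPerl");

# \bPerl\b: "Perl" must not be glued to other word characters.
my @whole = grep { /\bPerl\b/ } @inputs;

# \BPerl: "Perl" must follow another word character.
my @inside = grep { /\BPerl/ } @inputs;

print "\\bPerl\\b: @whole\n";
print "\\BPerl  : @inside\n";
```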

Exercise: which part of “ That’s my house” does /^\w+\b/ match?


Regular Expressions (9)

Back references:

/(World|Perl) \1/ - matches “World World”, “Perl Perl”.

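/((hello|hi) (world|Perl))/ - groups are numbered by the position of their opening parenthesis:

• \1 refers to the outer group ((hello|hi) (world|Perl))

• \2 refers to (hello|hi)

• \3 refers to (world|Perl)

After a successful match, $1, $2, $3 hold the text captured by the same groups that \1, \2, \3 refer to inside the pattern.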
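A sketch of capture groups and back references; the string “say hi Perl” is made up. In list context a successful match also returns the captured groups directly.

```perl
use strict;
use warnings;

my $str = "say hi Perl";

if ($str =~ /((hello|hi) (world|Perl))/) {
    # Groups are numbered by the position of their opening parenthesis.
    print "\$1 = '$1'\n";   # outer group
    print "\$2 = '$2'\n";   # (hello|hi)
    print "\$3 = '$3'\n";   # (world|Perl)
}

# In list context a successful match returns the captures directly.
my ($whole, $greeting, $name) = $str =~ /((hello|hi) (world|Perl))/;

# \1 inside the pattern must repeat the exact text captured by group 1.
my $twice = ("Perl Perl"  =~ /(World|Perl) \1/) ? 1 : 0;
my $mixed = ("World Perl" =~ /(World|Perl) \1/) ? 1 : 0;

print "whole='$whole' twice=$twice mixed=$mixed\n";
```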


Regular Expressions (10)

Option modifiers

/i : Case insensitive

/s : “.” will match “\n”

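/m : Let “^” and “$” also match next to an embedded “\n”

/x : Ignore whitespace in the pattern (and allow # comments)

/o : Compile the pattern only once, even if it interpolates variables that change later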
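A sketch of /i, /s, /m and /x on a made-up two-line string; each flag is compared with the same pattern without it. /o is left out because it only affects when the pattern is compiled, not what it matches.

```perl
use strict;
use warnings;

my $text = "Hello\nperl world";

my $with_i      = ($text =~ /PERL/i)       ? 1 : 0;  # case-insensitive
my $plain_dot   = ($text =~ /Hello.perl/)  ? 1 : 0;  # "." stops at "\n"
my $with_s      = ($text =~ /Hello.perl/s) ? 1 : 0;  # /s: "." also matches "\n"
my $plain_caret = ($text =~ /^perl/)       ? 1 : 0;  # "^" only at string start
my $with_m      = ($text =~ /^perl/m)      ? 1 : 0;  # /m: "^" also after each "\n"

# /x: whitespace in the pattern is ignored and "#" starts a comment.
my $with_x = ($text =~ / perl \s+ world   # a word, whitespace, a word
                       /x) ? 1 : 0;

print "i=$with_i dot=$plain_dot s=$with_s caret=$plain_caret m=$with_m x=$with_x\n";
```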


Regular Expressions (11)

Bind operator “=~”: tells Perl to match the pattern on the right against the string on the left.

Pattern match operator “m//”: $str =~ /pattern/; and $str =~ m/pattern/; are equivalent. With the explicit m you may choose another delimiter, e.g. m{pattern}.


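Regular Expressions (12)

if ($str =~ /hello/) {
    # $str contains "hello"
}

while (<STDIN>) {        # each input line is read into $_
    if (/hello/) {       # same as $_ =~ /hello/
        # the current line contains "hello"
    }
}

@words = split /\s+/, $str;

When no variable is mentioned, the pattern is matched against the default variable “$_”.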
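A sketch of the default $_ variable. To stay runnable without typed input, it loops over an in-memory list instead of <STDIN> (an assumption for illustration); foreach without a loop variable sets $_ to each element, much as while (<STDIN>) sets it to each input line.

```perl
use strict;
use warnings;

# Stand-in for lines read from STDIN (made-up data).
my @lines = ("hello there", "goodbye", "say hello");
my $count = 0;

# Without a loop variable, foreach sets $_ to each element in turn,
# and a bare /hello/ is matched against $_.
foreach (@lines) {
    $count++ if /hello/;   # same as: $_ =~ /hello/
}

print "lines containing hello: $count\n";
```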


Examples

$date = "12 10 10";
if ($date =~ /(\d+)/) { print $1 . ":" . $2 . ":" . $3 . ":\n"; }
# output ($2 and $3 are empty): 12:::

if ($date =~ /(\d+)(\s+\1)+/) { print $1 . ":" . $2 . ":" . $3 . ":\n"; }
# output (notice $3 is empty): 10: 10::

$str = "Hello World";
if ($str =~ /((Hello|Hi) (World|Perl))/) { print $1 . ":" . $2 . ":" . $3 . ":\n"; }
# output: Hello World:Hello:World:

$str = "Hello Perl Hi";
if ($str =~ /((Hello|Hi) (World|Perl)) \1/) { print $1 . ":" . $2 . ":" . $3 . ":\n"; }
# output: none (\1 would have to repeat "Hello Perl")

$str = "Hello Perl Hi Perl Hi Perl";
if ($str =~ /((Hello|Hi) (World|Perl)) \1/) { print $1 . ":" . $2 . ":" . $3 . ":\n"; }
# output: Hi Perl:Hi:Perl:


Examples

1. What does this pattern match?

/^0x[0-9a-fA-F]+$/

2. Date format: Month-Day-Year -> Year:Day:Month

$date = "12-31-1901";

$date =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;
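A sketch wrapping the slide's substitution in a hypothetical helper, reorder_date(). The /r form (available since Perl 5.14) returns the changed copy and leaves the original string untouched.

```perl
use strict;
use warnings;

# Month-Day-Year -> Year:Day:Month, as on the slide.
sub reorder_date {
    my ($date) = @_;
    $date =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/;
    return $date;
}

print reorder_date("12-31-1901"), "\n";   # 1901:31:12

# Perl 5.14+: /r returns the modified copy; $original keeps its value.
my $original  = "01-02-2003";
my $reordered = $original =~ s/(\d+)-(\d+)-(\d+)/$3:$2:$1/r;

print "$original -> $reordered\n";
```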


Examples

3. Make a pattern that matches any line of input that has the same word repeated two (or more) times in a row. Whitespace between words may differ.

4. Which part of “ That’s my house” does /^\w+\b/ match?


Example

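1. /\w+/ #matches a word

2. /(\w+)/ #captures it, to refer to it later

3. /(\w+)\1/ #the same word twice, with nothing in between

4. /(\w+)\s+\1/ #whitespace between the two words

5. “This is a test” wrongly matches step 4 (“is is”, starting inside “This”) -> /\b(\w+)\s+\1/

6. “This is the theory” wrongly matches step 5 (“the the”, ending inside “theory”) -> /\b(\w+)\s+\1\b/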
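A sketch running steps 4 to 6 against the two slide sentences plus one made-up line that really does repeat a word.

```perl
use strict;
use warnings;

my @lines = ("This is a test", "This is the theory", "Paris in the   the spring");

my @step4 = grep { /(\w+)\s+\1/ }     @lines;  # no boundaries
my @step5 = grep { /\b(\w+)\s+\1/ }   @lines;  # boundary before the first word
my @step6 = grep { /\b(\w+)\s+\1\b/ } @lines;  # boundaries on both ends

# Which word was repeated in the line that really has one?
my ($word) = $lines[2] =~ /\b(\w+)\s+\1\b/;

print "step 4 matches ", scalar(@step4), " lines\n";
print "step 5 matches ", scalar(@step5), " lines\n";
print "step 6 matches: $step6[0] (repeated word: $word)\n";
```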


Let’s try

1) Write a regular expression that identifies a 24-hour clock time. For example: 0:01, 00:20, 15:00, 23:59

2) Write a regular expression that identifies a floating-point number. For example: 10, 10.0001, -0.1, +001.3456789

For both, write a single program that identifies these patterns in the input lines and prints only the matched patterns.


Negated Match

if ($str =~ /hello/) {
    # $str contains "hello"
}

if ($str !~ /hello/) {
    # Negation: $str does not contain "hello"
}


Regular Expressions (13)

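$& - what was actually matched

$` - what was before the match

$' - the rest of the string after the match

$` . $& . $' - the original string

Caution: do not use these variables unless you really need them. In Perl versions before 5.20, using any of them anywhere in a program slows down every regular expression match in that program.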
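A sketch of the three variables on a made-up string. The /p flag with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} (Perl 5.10 and later) gives the same pieces without the old program-wide performance penalty; on Perl 5.20 and later that penalty is largely gone anyway.

```perl
use strict;
use warnings;

my $str = "Hello big World";

my ($before, $match, $after);
if ($str =~ /big/) {
    ($before, $match, $after) = ($`, $&, $');
}

# Perl 5.10+: the /p flag provides the same pieces under other names.
my ($pre, $matched, $post);
if ($str =~ /big/p) {
    ($pre, $matched, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}

print "before='$before' match='$match' after='$after'\n";
print "rebuilt: ", (($before . $match . $after) eq $str ? "yes" : "no"), "\n";
```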


Regular Expressions (14)

Substitutions:

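s/T/U/; #substitutes the first T with U (only once)

s/T/U/g; #global substitution: every T is replaced

s/\s+/ /g; #collapses each run of whitespace into a single space

s/(\w+) (\w+)/$2 $1/g; #swaps each pair of words

s/T/U/; #applied to the $_ variable

$str =~ s/T/U/; #applied to $str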
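A sketch of the substitutions on made-up strings. In scalar context s///g returns the number of replacements it made.

```perl
use strict;
use warnings;

my $once = "TOTAL TAX";
$once =~ s/T/U/;                  # first T only

my $all = "TOTAL TAX";
my $n = ($all =~ s/T/U/g);        # every T; returns the number of replacements

my $spaced = "too    many \t spaces";
$spaced =~ s/\s+/ /g;             # each whitespace run becomes one space

my $pairs = "one two three four";
$pairs =~ s/(\w+) (\w+)/$2 $1/g;  # swap words pairwise

print "$once | $all ($n) | $spaced | $pairs\n";
```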


Regular Expressions (15)

File Extension Renaming:

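use strict;
use warnings;

die "Usage: $0 FROM_EXT TO_EXT\n" unless @ARGV == 2;
my ($from, $to) = @ARGV;               # old and new extension, e.g. txt html
my @files = glob("*.$from");

foreach my $file (@files) {
    my $newfile = $file;
    $newfile =~ s/\.\Q$from\E$/.$to/;   # change only the final extension
    rename($file, $newfile) or warn "Cannot rename $file: $!\n";
}

Anchor the pattern with “$”: the unanchored s/\.$from/\.$to/g also rewrites an earlier “.$from” inside the name (with txt -> html, “a.txt.old.txt” would become “a.html.old.html”), while =~ s/\.$from$/\.$to/ changes only the last one. \Q...\E makes characters in $from such as “.” or “+” match literally.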


Split and Join

$str = "aaa bbb ccc dddd";

@words = split /\s+/, $str;

$str = join ':', @words; #result is "aaa:bbb:ccc:dddd"

@words = split /\s+/, $_; #" aaa b" -> "", "aaa", "b" (leading empty field is kept)

@words = split; #" aaa b" -> "aaa", "b" (same as split ' ', $_)

@words = split ' ', $_; #" aaa b" -> "aaa", "b"
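A sketch of split and join with the slide's strings, showing the leading empty field that split /\s+/ keeps and the special ' ' form drops.

```perl
use strict;
use warnings;

my $str    = "aaa bbb ccc dddd";
my @words  = split /\s+/, $str;
my $joined = join ':', @words;

my $padded      = " aaa b";
my @regex_split = split /\s+/, $padded;  # ("", "aaa", "b"): leading empty field
my @space_split = split ' ', $padded;    # ("aaa", "b"): special ' ' form

print "joined: $joined\n";
print "regex split: ", scalar(@regex_split), " fields\n";
print "' ' split  : ", scalar(@space_split), " fields\n";
```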


Grep

grep EXPR, LIST;

@results = grep /^>/, @array; #elements of @array that start with ">"

@results = grep /^>/, <FILE>; #lines read from FILE that start with ">"
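A sketch of both grep forms on made-up lines. An in-memory filehandle (open on a reference to a string) stands in for FILE so that the example runs without a real file; grep in scalar context returns the number of matches.

```perl
use strict;
use warnings;

my @array = ("> quoted reply", "plain line", ">> nested reply");

my @results = grep /^>/, @array;       # expression form
my @block   = grep { !/^>/ } @array;   # block form, negated
my $count   = grep /^>/, @array;       # scalar context: number of matches

# In-memory filehandle standing in for FILE.
my $data = ">a\nxyz\n>b\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @headers = grep /^>/, <$fh>;
close $fh;
chomp @headers;

print "results=", scalar(@results), " count=$count headers=@headers\n";
```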


Thank You !!!