Perl Regex 2

  • 8/10/2019 Perl Regex 2

    1/37

    University of Victoria, Department of Computer Science

    SENG 265: Software Development Methods

    Perl Regular Expression: Slide 1

    Perl: Regular expressions

    A powerful tool for searching and transforming text.

  • 8/10/2019 Perl Regex 2

    2/37

    Motivation

    We have seen many operations involving string comparisons.

    Several Perl built-in functions also help with operations on strings: split & join, substr, length.

    There is a lot we can do with such functions.

    Example: Given a string holding some timestamp, extract the different parts of the date & time.

        while (my $line = <STDIN>) {
            chomp $line;
            if ($line eq "BEGIN:VSTART") {
                # ...
            }
        }

        # ...

        my ($property, $value) = split /:/, $foo;
        if ($property eq "DSTART") {
            # ... etc etc etc
        }

        @csv_fields = split /,/, $input_line;

        $output = join ":", @data;

        $first_char = substr $input, 0, 1;

        $width = length $heading;
        print $heading, "\n";
        print "-" x $width;
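The built-in functions named on this slide can be tried out together in a few lines. This is a minimal sketch; the input string is a made-up sample rather than data from the course.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the slides.
my $input_line = "alpha,beta,gamma";

my @csv_fields = split /,/, $input_line;     # break the line apart at each comma
my $output     = join ":", @csv_fields;      # glue the pieces back together with colons
my $first_char = substr $input_line, 0, 1;   # one character, starting at offset 0
my $width      = length $input_line;         # number of characters in the string

print "$output\n";
print "-" x $width, "\n";                    # an underline as wide as the input
```

Each function handles one fixed, simple job; the slides that follow show where this approach starts to strain.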

  • 8/10/2019 Perl Regex 2

    3/37

    Motivation

    Recall: iCalendar dates are used by iCal-like programs.

    The year, month, etc. portions of the code are fixed in position.

    How could we use substr to help us?

    This code certainly obtains what we need.

    But it can be a bit tricky to get right.

    Adapting code to use another date/time format is not trivial and is bugbait!

        my $datetime = "...";    # a timestamp of the form YYYYMMDDTHHMMSS

        $year  = substr $datetime, 0, 4;
        $month = substr $datetime, 4, 2;
        $day   = substr $datetime, 6, 2;
        $hour  = substr $datetime, 9, 2;
        $min   = substr $datetime, 11, 2;
        $sec   = substr $datetime, 13, 2;

        # ISO 8601 time format
        my $datetime = "...";    # a timestamp of the form iYYYY-MM-DDTHH:MM:SS...

        $year  = substr $datetime, 1, 4;
        $month = substr $datetime, 6, 2;

        # coffee break
        # ...
        $day   = substr $datetime, 9, 2;
        $hour  = substr $datetime, 12, 2;
        $min   = substr $datetime, 15, 2;
        $sec   = substr $datetime, 18, 2;

    Hazardous to your health

  • 8/10/2019 Perl Regex 2

    4/37

    Motivation

    A better method is to indicate the string's pattern in a way that reflects the actual order of pattern components:

    The date begins at the start of the string.

    The year is four digits.

    The month follows (two digits) and then the day.

    The T character separates the date and time.

    Hour, minute and second follow, each two digits long.

        my ($year, $month, $day, $hour, $minute, $second) = $datetime =~ m{
            \A          # start of string
            (\d{4})     # year
            (\d{2})     # month
            (\d{2})     # day
            T           # literal T
            (\d{2})     # hour
            (\d{2})     # minute
            (\d{2})     # second
            \z          # end of string
        }xms;

    For the elder Perlmongers:

        my $datetime = "...";    # a timestamp of the form YYYYMMDDTHHMMSS

        if ($datetime =~ /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})$/) {
            ($year, $month, $day, $hour, $min, $sec) = ($1, $2, $3, $4, $5, $6);
        }
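To see the pattern-based approach run end to end, here is a small self-contained sketch. The timestamp value is an assumed sample in YYYYMMDDTHHMMSS form, not one taken from the slides.

```perl
use strict;
use warnings;

# Assumed sample timestamp (YYYYMMDDTHHMMSS); any string of that shape works.
my $datetime = "20190810T143005";

# In list context the match returns the captured groups in order.
my ($year, $month, $day, $hour, $minute, $second) = $datetime =~ m{
    \A          # start of string
    (\d{4})     # year
    (\d{2})     # month
    (\d{2})     # day
    T           # literal T
    (\d{2})     # hour
    (\d{2})     # minute
    (\d{2})     # second
    \z          # end of string
}xms;

if (defined $year) {
    print "$year-$month-$day $hour:$minute:$second\n";
}
else {
    print "timestamp did not match\n";
}
```

If the input has a different shape, the match fails and all six variables stay undefined, which is easier to detect than a substr call that silently returns the wrong characters.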

  • 8/10/2019 Perl Regex 2

    5/37

    Motivation

    Back to our code modification example.

    Now we have a different date format.

    Using a regular expression, we can greatly reduce the possibility of bugs:

    String begins with an "i"

    followed by year

    followed by a dash

    followed by month

    etc.

        # ISO 8601 time format
        my $datetime = "...";    # a timestamp of the form iYYYY-MM-DDTHH:MM:SS...

        my ($year, $month, $day, $hour, $minute, $second) = $ical_date =~ m{
            \A          # start of string
            i           # literal i
            (\d{4})     # year
            -           # literal dash
            (\d{2})     # month
            -           # literal dash
            (\d{2})     # day
            T           # literal T
            (\d{2})     # hour
            :           # literal colon
            (\d{2})     # minute
            :           # literal colon
            (\d{2})     # second
            .+          # ignore remainder
            \z          # end of string
        }xms;

  • 8/10/2019 Perl Regex 2

    6/37

    Topics

    Simple matching

    Metacharacters

    Anchored search

    Character classes

    Range operators in character classes

    Matching any character

    Grouping

    Extracting Matches

    Search and Replace

    Our coverage of regex syntax will be much more slowly paced than the motivation just shown!

    Previous slides have been shown to give you a flavour of what regular expressions can achieve. We will learn how to construct such expressions over the next few lectures.

    We have a range of topics.

    Regular expressions can seem complex and cryptic.

    However, slow and patient work with such expressions will improve your productivity.

  • 8/10/2019 Perl Regex 2

    7/37

    Perl Regular Expressions

    Perl is renowned for its excellence at text processing.

    Handling of regular expressions plays a big factor in its fame.

    Mastering even the basics will allow you to manipulate text with ease.

    Regular expressions have a strong formalism (FSA).

    You have already used some and seen others.

        ls *.c

        ps aux | grep s265s* | less

    Other languages have some support for regexes, usually via some library.

        Java:    import java.util.regex.*;

        Python:  import re

        C#:      using System.Text.RegularExpressions;

  • 8/10/2019 Perl Regex 2

    8/37

    Simple String Matching

    Regular expressions are usually used in conjunction with an if:

    "if <string matches this pattern> ... then <do something with that match>."

    The simplest such match refers to a string.

    But note: this is much different than using eq. A regex match succeeds if the pattern occurs anywhere in the string; eq succeeds only if the whole string is identical.

        my $line = <STDIN>;
        chomp $line;

        # Unbeknownst to programmer, the first line
        # of the input is the line "Hello, World"

        if ($line =~ m/World/xms) {
            print "Regexp matches\n";
        }
        else {
            print "Oh, poop.\n";
        }

        if ($line eq "World") {
            print "line is equal to 'World'\n";
        }
        else {
            print "line sure ain't equal to 'World'\n";
        }

  • 8/10/2019 Perl Regex 2

    9/37

    A word about m/yadayada/xms

    The text between the two slashes is the regular expression (regex).

    Leading m indicates the regex is used for a match.

    Trailing xms are three regex options:

    x : Extended formatting (whitespace and comments in the regex are ignored)

    m : For line boundaries: ^ and $ match at the start and end of every line, not only of the whole string (and eliminates a cause of some subtle bugs)

    s : ensures everything, including a newline, is matched by the . symbol

    Why all of this verbiage instead of plain old /yadayada/ as of old?

    Also note: m{ } or m/ /

        /'[^\\']*(?:\\.[^\\']*)*'/

        m{ '              # an opening single quote
           [^\\']*        # any non-special chars
           (?:            # then all of...
              \\ .        # any explicitly backslashed char
              [^\\']*     # followed by any non-special chars
           )*             # repeated zero or many times
           '              # a closing single quote
        }xms
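The effect of each of the three options can be checked in isolation. The following is an illustrative sketch using a made-up two-line string; each variable records whether the corresponding match succeeded.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without m, ^ matches only at the very start of the string.
my $without_m = ($text =~ /^second/)  ? 1 : 0;
# With m, ^ also matches just after each embedded newline.
my $with_m    = ($text =~ /^second/m) ? 1 : 0;

# Without s, "." refuses to match a newline; with s it matches any character.
my $without_s = ($text =~ /first.*second/)  ? 1 : 0;
my $with_s    = ($text =~ /first.*second/s) ? 1 : 0;

# With x, whitespace and comments inside the pattern are ignored,
# so a literal space has to be written as \s (or escaped).
my $with_x = ($text =~ m{
    first   # the word "first"
    \s      # one whitespace character
    line    # the word "line"
}x) ? 1 : 0;

print "m: $without_m -> $with_m, s: $without_s -> $with_s, x: $with_x\n";
```

Always writing all three options means ^, $ and . behave the same way in every pattern, which is the consistency the slide argues for.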

  • 8/10/2019 Perl Regex 2

    10/37

    Another example

    The code on the right searches for a pattern in some dictionary file.

    Note that a command-line argument is being used for a regex!

    Also note <> syntax: This takes the first unused command-line argument, and uses it as a filename for opening!

        #!/usr/bin/perl

        use strict;

        my $regexp = shift @ARGV;
        while (my $word = <>) {
            if ($word =~ m/$regexp/xms) {
                print $word;
            }
        }

        ./search.pl pter /usr/share/dict/linux.words
        abrupter
        Acalypterae
        acanthopteran
        Acanthopteri
        ...
        unchapter
        unchaptered
        underprompter
        ...
        Zygopteris
        zygopteron
        zygopterous

  • 8/10/2019 Perl Regex 2

    11/37

    Metacharacters

    Regexes obtain their power by describing sets of strings.

    Such descriptions involve the use of metacharacters.

    Of course, some strings that we want to match will contain these characters.

    Therefore we must escape them.

        { } [ ] ( ) ^ $ . | * + ? / \

        "2+2=4" =~ m/2+2/xms     # doesn't match

        "2+2=4" =~ m/2\+2/xms    # does match

        "The interval is [0,1)." =~ m/[0,1)./xms      # syntax error

        "The interval is [0,1)." =~ m/\[0,1\)\./xms   # does match

        "/usr/bin/perl" =~ m/\/usr\/bin\/perl/xms     # matches

        "/usr/bin/perl" =~ m{/usr/bin/perl}xms        # better

        'C:\WINDOWS' =~ m/C:\\WINDOWS/                # matches
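Escaping by hand is easy to get wrong when the literal text comes from a variable. As a hedged aside that goes slightly beyond the slide, Perl's built-in quotemeta function and the equivalent \Q...\E escape sequence add the backslashes automatically. A minimal sketch:

```perl
use strict;
use warnings;

my $sentence = "The interval is [0,1).";

# Escaping each metacharacter by hand, as on the slide.
my $by_hand = ($sentence =~ m/\[0,1\)\./xms) ? 1 : 0;

# \Q...\E escapes every non-word character in the interpolated text.
my $literal   = "[0,1).";
my $by_quoted = ($sentence =~ m/\Q$literal\E/ms) ? 1 : 0;

# quotemeta returns the escaped text as an ordinary string.
my $escaped = quotemeta $literal;

print "by hand: $by_hand, quoted: $by_quoted\n";
print "$escaped\n";
```

Both matches succeed; the printed escaped form shows a backslash in front of every character that is not a letter, digit or underscore.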

  • 8/10/2019 Perl Regex 2

    12/37

    Anchoring

    We may wish to anchor a match to certain locations:

    ^ matches the beginning of a line.

    $ matches the end of a line.

    \A matches the beginning of a string.

    \z matches the end of a string.

        "housekeeper" =~ m/keeper/xms        # matches
        "housekeeper" =~ m/^keeper/xms       # does not match
        "housekeeper" =~ m/keeper$/xms       # matches
        "housekeeper\n" =~ m/keeper\n/xms    # also matches

        "keeper" =~ m/^keep$/xms             # does not match
        "keeper" =~ m/^keeper$/xms           # matches
        "keeper" =~ m{\A keeper \z}xms       # matches

        my $text = "Here is one line.\nIt is followed by\nAnother line\n";

        # Without m, $ matches only at the end of the string: prints "Oh dear"
        if ($text =~ m{line\. $}x) { print "Gotcha\n"; } else { print "Oh dear\n"; }

        # With m, $ matches at the end of every line: prints "Gotcha"
        if ($text =~ m{line\. $}xm) { print "Gotcha\n"; } else { print "Oh dear\n"; }
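The difference between the line anchors and the string anchors shows up most clearly when every match is collected. The sketch below reuses the three-line text from this slide; the g option (covered later in the deck) is used here only to gather all matches into a list.

```perl
use strict;
use warnings;

my $text = "Here is one line.\nIt is followed by\nAnother line\n";

# With m, ^ matches at the start of every line.
my @line_starts  = $text =~ m/^(\w+)/mg;

# \A matches only at the start of the string, even with m.
my @string_start = $text =~ m/\A(\w+)/mg;

# With m, $ matches at the end of every line; line one ends in "." so it has no word there.
my @line_ends    = $text =~ m/(\w+)$/mg;

# \z matches only at the very end of the string, after the final newline.
my @string_end   = $text =~ m/(\w+)\n\z/m;

print "line starts:  @line_starts\n";
print "string start: @string_start\n";
print "line ends:    @line_ends\n";
print "string end:   @string_end\n";
```

Because \A and \z ignore the m option, they are the safer choice whenever "the whole string" is what is meant.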

  • 8/10/2019 Perl Regex 2

    13/37

    Character classes

    These allow sets of possible characters to be matched.

    Used at desired points within a regex.

        m/cat/xms                  # matches 'cat'
        m/[bcr]at/xms              # matches 'bat', 'cat', or 'rat'
        m/item[0123456789]/xms     # matches 'item0', ... 'item9'

        "abc" =~ m/[cab]/xms       # matches 'a'
        m/[yY][eE][sS]/xms         # matches case-insensitive 'YES'
        m/yes/xmsi                 # simple way, using i
        m/(?i)yes/xms              # same

        m/[\]c]def/xms             # matches ']def' or 'cdef'

        $x = 'bcr';
        m/[$x]at/xms               # matches 'bat', 'cat', 'rat'
        m/[\$x]at/xms              # matches '$at' or 'xat'
        m/[\\$x]at/xms             # matches '\at', 'bat', 'cat', or 'rat'

  • 8/10/2019 Perl Regex 2

    14/37

    Range operators

    Ranges can eliminate some ugly code:

    [0123456789] becomes [0-9]

    [abcdefghijklmnopqrstuvwxyz] becomes [a-z]

    If - is the first or last character in a character class, it is treated as an ordinary character.

        m/item[0-9]/xms         # item0, item1, ... item9
        m/[0-9bx-z]aa/xms       # '0aa', ..., '9aa',
                                # 'baa', 'xaa', 'yaa',
                                # or 'zaa'
        m/[0-9a-fA-F]/xms       # matches hex digit
        m/[a-z]/i               # matches a letter

        # all are equivalent

        m/[-ab]/xms

        m/[ab-]/xms
        m/[a\-b]/xms
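Character classes and ranges are often combined with anchors to validate a whole string. The following sketch filters a made-up list of candidate strings; the inputs are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs chosen for illustration.
my @candidates = ("item7", "itemX", "cafe", "CAFE42", "-");

# 'item' followed by exactly one digit.
my @numbered_items = grep { m/\A item [0-9] \z/xms } @candidates;

# One or more hex digits and nothing else.
my @hex_strings    = grep { m/\A [0-9a-fA-F]+ \z/xms } @candidates;

# A single character that is '-', 'a' or 'b'; the leading - is ordinary.
my @dash_or_ab     = grep { m/\A [-ab] \z/xms } @candidates;

print "numbered: @numbered_items\n";
print "hex:      @hex_strings\n";
print "dash/ab:  @dash_or_ab\n";
```

Without the \A and \z anchors, a class only has to match somewhere inside the string, so "item7" would also count as containing a hex digit.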

  • 8/10/2019 Perl Regex 2

    15/37

    Negated character classes

    The special character ^ in the first position of a character class denotes a negated character class.

    Matches any character but those in the brackets.

        m/[^a]at/xms    # doesn't match 'aat' or 'at', but
                        # matches all other 'bat', 'cat',
                        # '0at', '%at', etc.

        m/[^0-9]/xms    # matches a non-numeric character

        m/[a^]at/xms    # matches 'aat' or '^at'; here '^'
                        # is ordinary

  • 8/10/2019 Perl Regex 2

    16/37

    Matching any character

    The period '.' matches any character but "\n" (unless the s option is given, in which case it matches "\n" as well).

    A period is a metacharacter; it needs to be escaped to match as an ordinary period.

        m/..rt/xms            # matches any 2 chars, followed by 'rt'
        m/end\./xms           # matches 'end.'
        m/end[.]/xms          # same thing, matches only 'end.'
        "" =~ m/./xms         # doesn't match; needs a character
        "a" =~ m/./xms        # matches

        "\n" =~ m/^.$/xm      # doesn't match; without the s option it needs
                              # a character other than "\n"
        "a\n" =~ m/^.$/xm     # matches; $ ignores the "\n"
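Since the s option changes what the period matches, it is worth checking the cases side by side. A short sketch, with each variable recording whether the match succeeded:

```perl
use strict;
use warnings;

my $empty_string    = (""    =~ m/./)    ? 1 : 0;   # nothing to match
my $newline_default = ("\n"  =~ m/./)    ? 1 : 0;   # "." skips newline by default
my $newline_with_s  = ("\n"  =~ m/./s)   ? 1 : 0;   # with s, newline is matched too
my $before_newline  = ("a\n" =~ m/^.$/)  ? 1 : 0;   # $ can match before a final newline

# An escaped period matches only a literal period.
my $literal_dot = ("end." =~ m/end\./ && "ends" !~ m/end\./) ? 1 : 0;

print "empty: $empty_string, newline: $newline_default, newline with s: $newline_with_s\n";
print "before newline: $before_newline, literal dot: $literal_dot\n";
```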

  • 8/10/2019 Perl Regex 2

    17/37

    Matching this or that

    We would like to match different possible words or character strings.

    We use the alternation character | (pipe).

        "cats and dogs" =~ /cat|dog|bird/    # matches "cat"

        "cats and dogs" =~ /dog|cat|bird/    # matches "cat": the earliest
                                             # position in the string wins

  • 8/10/2019 Perl Regex 2

    18/37

    Grouping Things Together

    Sometimes we want alternatives for part of a regular expression.

        /(a|b)b/             # matches ab or bb
        /(ac|b)b/            # matches acb or bb

        /(^a|b)c/            # matches ac at start of string or
                             # bc anywhere
        /(a|[bc])d/          # matches ad, bd, or cd

        /house(cat|)/        # matches either housecat
                             # or house

        /house(cat(s|)|)/    # matches either housecats or
                             # housecat or house.
                             # Note groups can be nested.
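Alternation and grouping can be exercised together. The sketch below uses the strings from the last two slides plus one made-up non-matching word ("housecar") to show which alternative wins and what nested groups capture.

```perl
use strict;
use warnings;

my $text = "cats and dogs";

# The match that starts earliest in the string wins, whatever the order of alternatives.
my ($first)  = $text =~ /(cat|dog|bird)/;
my ($second) = $text =~ /(dog|cat|bird)/;

# Nested groups with empty alternatives make parts of a word optional.
my @house_words = grep { /\Ahouse(cat(s|)|)\z/ } qw(house housecat housecats housecar);

# Groups are numbered by their opening parenthesis, left to right.
my ($outer, $inner) = "housecats" =~ /house(cat(s|)|)/;

print "first: $first, second: $second\n";
print "house words: @house_words\n";
print "outer: $outer, inner: $inner\n";
```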

  • 8/10/2019 Perl Regex 2

    19/37

    Extracting Matches

    The grouping metacharacters () also serve another completely different function: they allow the extraction of the parts of a string that matched.

    For each grouping, the part that matched inside goes into the special variables $1, $2, etc.

        # extract hours, minutes, seconds
        $time =~ /(\d\d):(\d\d):(\d\d)/;    # match hh:mm:ss format
                                            # \d is equivalent to [0-9]
        $hours   = $1;
        $minutes = $2;
        $seconds = $3;

        # More compact, equivalent code
        ($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
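A runnable version of the extraction, using an assumed sample time, might look like the following. It also shows what happens when the match fails: in list context the result is an empty list, so the variables stay undefined.

```perl
use strict;
use warnings;

my $time = "09:41:07";    # assumed sample value in hh:mm:ss form

my ($hours, $minutes, $seconds) = $time =~ /\A(\d\d):(\d\d):(\d\d)\z/;

# The captured pieces are ordinary strings and can be used as numbers.
my $seconds_since_midnight = $hours * 3600 + $minutes * 60 + $seconds;

# A string of the wrong shape yields no captures at all.
my @no_match = "9:41" =~ /\A(\d\d):(\d\d):(\d\d)\z/;

print "$hours h, $minutes min, $seconds s = $seconds_since_midnight seconds\n";
print "captures from bad input: ", scalar(@no_match), "\n";
```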

  • 8/10/2019 Perl Regex 2

    20/37

    Matching Repetitions

    We would like to be able to match multiple times:

    a? = match 'a' 0 or 1 times (i.e., optional)

    a* = match 'a' 0 or more times, i.e., any number of times

    a+ = match 'a' 1 or more times, i.e., at least once

    a{n,m} = match at least n times, but not more than m times.

    a{n,} = match at least n or more times.

    a{n} = match exactly n times

        $year =~ /^\d{2,4}$/    # make sure year is at least 2 but
                                # not more than 4 digits

        /[a-z]+\d*/i            # match a word and any number of digits

        /y(es)?/i               # matches y, Y,
                                # or a case-insensitive yes
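The quantifiers are easiest to compare by running them over a handful of inputs. In this sketch the test strings are made up, and the patterns are anchored with \A and \z so that the whole string must fit.

```perl
use strict;
use warnings;

# {2,4}: between two and four digits.
my @years   = grep { /\A\d{2,4}\z/ } ("7", "07", "2007", "20071");

# ?: the group (es) is optional, so "y" and "yes" both pass.
my @answers = grep { /\Ay(es)?\z/i } ("y", "Y", "Yes", "yep", "no");

# + needs at least one letter; * accepts zero digits as well.
my ($word, $digits)         = "item42" =~ /\A([a-z]+)(\d*)\z/i;
my ($bare_word, $no_digits) = "item"   =~ /\A([a-z]+)(\d*)\z/i;

print "years: @years\n";
print "answers: @answers\n";
print "word: $word, digits: $digits\n";
print "bare word: $bare_word, digits: '$no_digits'\n";
```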

  • 8/10/2019 Perl Regex 2

    21/37

    Search and Replace

    Regular expressions also play a role in search and replace operations in Perl.

    Search and replace is accomplished with the s/// operator.

    General form:

        s/regexp/replacement/modifiers

        $x = "Time to feed the cat!";

        if ( $x =~ s/cat/hamster/ ) {
            print $x;    # Time to feed the hamster!
        }

  • 8/10/2019 Perl Regex 2

    22/37

    More Search and Replace Commands

        $y = "'quoted words'";

        $y =~ s/^'(.*)'$/$1/;    # strip single quotes, $y
                                 # contains "quoted words"

        $x = "I batted 4 for 4";

        $x =~ s/4/four/;         # doesn't do it all:
                                 # $x contains
                                 # "I batted four for 4"

        $x = "I batted 4 for 4";

        $x =~ s/4/four/g;        # /g modifier does it all:
                                 # $x contains
                                 # "I batted four for four"
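The s/// operator also returns the number of substitutions it made, which is why it can sit inside an if. A short sketch with the strings from this slide:

```perl
use strict;
use warnings;

my $once = "I batted 4 for 4";
my $count_once = ($once =~ s/4/four/);     # replaces only the first match

my $all = "I batted 4 for 4";
my $count_all = ($all =~ s/4/four/g);      # g replaces every match

my $quoted = "'quoted words'";
$quoted =~ s/^'(.*)'$/$1/;                 # keep only what the group captured

print "$once ($count_once substitution)\n";
print "$all ($count_all substitutions)\n";
print "$quoted\n";
```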

  • 8/10/2019 Perl Regex 2

    23/37

    A few more regexp topics

    Advanced uses of matches

    Escape sequences

    List and scalar context, e.g., phone numbers

    Finding all instances of a match

    Parentheses

    Substituting with s///

    tr, the translate function

  • 8/10/2019 Perl Regex 2

    24/37

    Advanced uses of matches

    You can assign pattern memory directly to your own variable names (capturing):

        ($phone) = $value =~ /^phone\:(.+)$/;

    Read from right to left. Apply this pattern to the value in $value, and assign the results to the list on the left.

        ($front,$back) = /^phone\:(\d{3})-(\d{4})/;

    Apply this pattern to $_ and assign the results to the list on the left.

  • 8/10/2019 Perl Regex 2

    25/37

    Meaning of backslash letters

    \n: newline

    \r: carriage return

    \t: tab

    \f: form feed

    \d: a digit (same as [0-9])

    \D: a non-digit

    \w: an alphanumeric character, same as [0-9a-z_A-Z]

    \W: a non-alphanumeric character

    \s: a whitespace character, same as [ \t\n\r\f]

    \S: a non-whitespace character

  • 8/10/2019 Perl Regex 2

    26/37

    Reminder: list or scalar context?

    A pattern match returns false (an empty string, numerically 0) or 1 (true) in scalar context, and a list of the captured matches in list context.

    Recall: There are a lot of functions that do different things depending on whether they are used in scalar or list context.

        # returns the number of elements
        $count = @array

        # returns a reversed string
        $revString = reverse $string

        # returns a reversed list
        @revArray = reverse @array

    You must always be cautious of this behaviour.

  • 8/10/2019 Perl Regex 2

    27/37

    Practical Example of Context

        $phone = $string =~ /^.+\:(.+)$/;

    $phone contains 1 if the pattern matches, and a false value otherwise.

        ($phone) = $string =~ /^.+\:(.+)$/;

    $phone contains the matched string.
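The two assignments differ only by a pair of parentheses, so it helps to run them next to each other. This sketch uses a made-up input line; the variable names other than $string and $phone are illustrative.

```perl
use strict;
use warnings;

my $string = "phone:555-1234";    # hypothetical input line

# Scalar context: the result only says whether the match succeeded.
my $matched = $string =~ /^.+\:(.+)$/;

# List context: the parentheses on the left ask for the captured text.
my ($phone) = $string =~ /^.+\:(.+)$/;

# A failed match in scalar context gives a false value (the empty string).
my $failed = "no colon here" =~ /^.+\:(.+)$/;

print "matched: $matched\n";
print "phone:   $phone\n";
print "failed is ", ($failed ? "true" : "false"), "\n";
```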

  • 8/10/2019 Perl Regex 2

    28/37

    Finding all instances of a match

    Use the g modifier with a regular expression:

        @sites = $sequence =~ /(TATTA)/g;

    Think g for global.

    Returns a list of all the matches (in order), and stores them in the array.

    If you have n pairs of parentheses, the array looks like the following:

        ($1, $2, ..., $n, $1, $2, ..., $n, ...)
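A small sketch makes the shape of the returned list concrete. The DNA-like sequence and the key=value string are made-up inputs.

```perl
use strict;
use warnings;

my $sequence = "GGTATTACCTATTAGG";    # hypothetical sequence with two TATTA sites

# One pair of parentheses: one list element per match.
my @sites = $sequence =~ /(TATTA)/g;

# Two pairs of parentheses: the captures of each match follow one another.
my @pairs = "a=1, b=2, c=3" =~ /(\w)=(\d)/g;

print scalar(@sites), " sites: @sites\n";
print "pairs: @pairs\n";
```

Since the second list alternates keys and values, it can be assigned straight to a hash if that is more convenient.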

  • 8/10/2019 Perl Regex 2

    29/37

    Perl is Greedy

    Perl regular expressions try to match the largest possible string which matches your pattern:

        "lalaaaaagag" =~ /(la.*ag)/

    /la.*ag/ matches laag, lalag, laaaaaag

    $1 contains lalaaaaagag

    If this is not what you wanted to do, use the ? modifier:

        "lalaaaaagag" =~ /(la.+?ag)/

    /(la.+?ag)/ matches as few characters as possible to find the matching pattern

    $1 contains lalaaaaag
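The two patterns from this slide can be compared directly:

```perl
use strict;
use warnings;

my $text = "lalaaaaagag";

# Greedy: .* grabs as much as it can, so the match runs to the last "ag".
my ($greedy) = $text =~ /(la.*ag)/;

# Non-greedy: .+? grabs as little as it can, so the match stops at the first "ag".
my ($lazy)   = $text =~ /(la.+?ag)/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Both matches start at the same place; the ? only changes how far the quantifier extends.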

  • 8/10/2019 Perl Regex 2

    30/37

    Making parentheses forgetful

    Sometimes you need parentheses to make your regular expression work, but you don't actually want to keep the results. You can still use parentheses for grouping.

        /(?:group)/

    Certain characters are overloaded; recall:

    \d? means 0 or 1 instances

    \d+? means the fewest non-zero number of digits

    (?:group) means look for the group of atoms in the string, but don't remember them
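A brief sketch of the difference, using a made-up input line: a (?:...) group still groups, but it does not take up a capture number.

```perl
use strict;
use warnings;

my $line = "widget 42";    # hypothetical input: a word, a space, a number

# Both groups capture: $1 is the word, $2 is the number.
my ($name, $number) = $line =~ /(\w+) (\d+)/;

# The first group is forgetful, so the number becomes the first capture.
my ($only_number) = $line =~ /(?:\w+) (\d+)/;

print "name: $name, number: $number\n";
print "only number: $only_number\n";
```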

  • 8/10/2019 Perl Regex 2

    31/37

    Example of forgetting

        #!/usr/bin/perl
        # Method 1: make the first group forgetful when -x is given
        if (@ARGV && $ARGV[0] eq "-x") {
            $mod = "?:";
        } else {
            $mod = "";
        }
        $pat1 = "\\w+";
        $pat2 = "\\d+";
        while (<STDIN>) {
            $_ =~ /($mod$pat1) ($pat2)/;
            print $1, "\n";
        }

        #!/usr/bin/perl
        # Method 2: always capture both groups, then choose which to print
        if (@ARGV && $ARGV[0] eq "-x") {
            $ignore = 1;
        } else {
            $ignore = 0;
        }
        while (<STDIN>) {
            $_ =~ /(\w+) (\d+)/;
            if ($ignore) {
                print $2, "\n";
            }
            else {
                print $1, "\n";
            }
        }

  • 8/10/2019 Perl Regex 2

    32/37

    More examples using s///

    Substituting one word for another:

        $string =~ s/dogs/cats/

    If $string was "I love dogs", it is now "I love cats".

    Removing trailing white space:

        $string =~ s/\s+$//

    If $string was "ATG   ", it is now "ATG".

    Adding 10 to every number in a string:

        $string =~ s/(\d+)/$1+10/ge

    Note pattern memory.

    g means global (just like in regular expressions).

    e is specific to s: evaluate the expression on the right.
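The role of the e modifier is clearest when the same substitution is run with and without it. A sketch with a made-up sentence:

```perl
use strict;
use warnings;

my $string = "3 apples and 12 pears";    # hypothetical input

# With e, the replacement is evaluated as Perl code: $1 + 10 is computed.
(my $bumped = $string) =~ s/(\d+)/$1+10/ge;

# Without e, the replacement is just an interpolated string.
(my $literal = $string) =~ s/(\d+)/$1+10/g;

# Removing trailing white space, as on the slide.
my $trimmed = "ATG   ";
$trimmed =~ s/\s+$//;

print "$bumped\n";
print "$literal\n";
print "[$trimmed]\n";
```

The `(my $copy = $original) =~ s/.../.../` form copies the string first and then edits the copy, leaving the original untouched.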

  • 8/10/2019 Perl Regex 2

    33/37

    tr function

    translate or transliterate

    General form:

        tr/list1/list2/

    Even less like a regular expression than s.

    Substitutes characters in the first list with characters from the second list:

        $string =~ tr/a/A/

    Every a is translated to an A.

    No need for a global modifier using tr.

  • 8/10/2019 Perl Regex 2

    34/37

    More examples of tr

    Converting named scalar to lowercase:

        $ARGV[1] =~ tr/A-Z/a-z/

    Count the number of * in $_:

        $cnt = tr/*/*/

        $cnt = $_ =~ tr/*/*/

    Change all non-alphabetic characters to spaces:

        tr/a-zA-Z/ /c

    Notice space + c = complement search string.

    Delete all non-alphabetic characters completely:

        tr/a-zA-Z//cd

    d = delete found but unreplaced characters
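The four uses of tr on this slide can be tried on one made-up string. Each call below works on a copy so that the original stays available for the next example.

```perl
use strict;
use warnings;

my $text = "Hello, *World*! 2019";    # hypothetical input

# Translate upper case to lower case.
(my $lower = $text) =~ tr/A-Z/a-z/;

# Replacing * with itself changes nothing, but tr returns how many it found.
my $stars = ($text =~ tr/*/*/);

# c complements the search list: every non-letter becomes a space.
(my $spaced = $text) =~ tr/a-zA-Z/ /c;

# c plus d: every non-letter is deleted.
(my $letters = $text) =~ tr/a-zA-Z//cd;

print "$lower\n";
print "stars: $stars\n";
print "[$spaced]\n";
print "$letters\n";
```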

  • 8/10/2019 Perl Regex 2

    35/37

    Using the results of matches within a pattern

    \1, \2, \3 refer to what a previous set of parentheses matched:

        "abc abc" =~ /(\w+) \1/    # matches

        "abc def" =~ /(\w+) \1/    # doesn't match

    Can also use $1, $2, etc. to perform some interesting operations:

        s/^([^ ]*) *([^ ]*)/$2 $1/    # swap first two words

        /(\w+)\s*=\s*\1/              # match foo = foo

    Other default variables used in matches:

    $` : returns everything before matched string

    $& : returns entire matched string

    $' : returns everything after matched string
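A short sketch of backreferences and the three match variables, using made-up strings:

```perl
use strict;
use warnings;

# \1 must repeat exactly what the first group matched.
my $repeated  = ("abc abc" =~ /(\w+) \1/) ? 1 : 0;
my $different = ("abc def" =~ /(\w+) \1/) ? 1 : 0;

# $1 and $2 in the replacement swap the first two words.
(my $swapped = "hello world again") =~ s/^([^ ]*) *([^ ]*)/$2 $1/;

# After a successful match: text before, the match itself, text after.
my ($before, $match, $after);
if ("temperature = 20C" =~ /\d+/) {
    ($before, $match, $after) = ($`, $&, $');
}

print "repeated: $repeated, different: $different\n";
print "$swapped\n";
print "before=[$before] match=[$match] after=[$after]\n";
```

Inside the pattern the backreference is written \1; in the replacement part of s/// and in ordinary code it is written $1.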

  • 8/10/2019 Perl Regex 2

    36/37

    Example: Celsius / Fahrenheit

        #! /usr/bin/perl -w

        print "Enter temperature: \n";
        $line = <STDIN>;
        chomp($line);

        # optional sign, digits, optional fraction, optional spaces, then C or F
        if ( $line =~ /^([-+]?[0-9]+(?:\.[0-9]*)?)\s*([CF])$/i ) {
            $temp  = $1;
            $scale = $2;
            if ( $scale =~ /c/i ) {
                $cel = $temp;
                $fah = ($cel * 9 / 5) + 32;
            }
            else {
                $fah = $temp;
                $cel = ($fah - 32) * 5 / 9;
            }
            printf( "%.2f C is %.2f F\n", $cel, $fah );
        }
        else {
            printf( "Bad format\n" );
        }

  • 8/10/2019 Perl Regex 2

    37/37

    Regex on command line

    We can execute simple regular expressions on the command line:

        $ perl -p -i -e 's/kat/cat/g' in.txt

    -p: apply program to each line in file in.txt

    -i: write changes back to in.txt

    -e: program between ''