Regular Expression Question Solution

  • 8/10/2019 Regular Expression Question Solution

    1/68

    Dr. Shakir Al Faraji

    Theory of Computation

    Thank you, Shakir

    Shakir Al Faraji

    Computer Science Dept.,

    Petra University

    Amman - Jordan.

    email: [email protected]


    IMPORTANT NOTES

    Students

    This presentation is designed to be used in class as part of a guided discovery sequence. It is not self-explanatory! Please use it only for revision purposes after having taken the class. Simply going through the slides will teach you nothing. You must be actively thinking, doing and questioning to learn!


    Course Strategy

    Be warned: this is not a course that spoon-feeds students.

    Students are expected to be investigative and resourceful.

    Reading books and researching topics independently is expected.


    Material

    There is a book:

    Hopcroft, Motwani, & Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd Edition (2007), Addison Wesley

    These were the lecture notes. Well, apart from the slides.


    Regular Expression


    Definition

    A regular expression, or RE, describes strings of characters (words or phrases or any arbitrary text). It's a pattern that matches certain strings and doesn't match others.

    A regular expression is a set of characters that specify a pattern.

    OR

    Language-defining symbols.


    Definition Cont.

    Regular expressions are used to generate patterns of strings. A regular expression is an algebraic formula whose value is a pattern consisting of a set of strings, called the language of the expression.


    Operands in a regular expression

    Operands in a regular expression can be:

    characters from the alphabet over which the regular expression is defined.

    variables whose values are any pattern defined by a regular expression.

    epsilon, which denotes the empty string containing no characters.

    null, which denotes the empty set of strings.


    Operators used in regular expressions

    Union: If R1 and R2 are regular expressions, then R1 | R2 (also written as R1 U R2 or R1 + R2) is also a regular expression.

    L(R1|R2) = L(R1) U L(R2).

    Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a regular expression.

    L(R1R2) = L(R1) concatenated with L(R2).


    Operators used in regular expressions

    Kleene closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a regular expression.

    L(R1*) = {epsilon} U L(R1) U L(R1R1) U L(R1R1R1) U ...

    Closure has the highest precedence, followed by concatenation, followed by union.
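The three operators can be tried out on small finite languages. The sketch below is only an illustration: it represents a language as a Python set of strings, and the helper names `concat` and `power` are made up for this example. Since L(R1*) is an infinite union, only its first few terms are printed.

```python
def concat(l1, l2):
    """Concatenation of two languages given as sets of strings."""
    return {u + v for u in l1 for v in l2}


def power(lang, k):
    """lang concatenated with itself k times; power(lang, 0) is {epsilon}."""
    result = {""}  # the empty string plays the role of epsilon
    for _ in range(k):
        result = concat(result, lang)
    return result


r1 = {"a", "bb"}
r2 = {"c"}

print(sorted(r1 | r2))         # L(R1 | R2) = L(R1) U L(R2)
print(sorted(concat(r1, r2)))  # L(R1 R2)

# First terms of L(R1*) = {epsilon} U L(R1) U L(R1 R1) U ...
for k in range(3):
    print(k, sorted(power(r1, k)))
```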


    Examples

    The set of strings over {0,1} that end in 3 consecutive 1's.

    (0 | 1)*111 OR (0 + 1)*111

    The set of strings over {0,1} that have at least one 1.

    0*1 (0 + 1)*
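One way to gain confidence in expressions like these is to compare them with a plain-language test on every short string. The sketch below does this with Python's `re` module, which is an assumption of this example rather than part of the course material. A check up to length 8 is evidence, not a proof.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


# Python writes union as "|"; a "+" here would mean "one or more".
ends_in_111 = re.compile(r"(0|1)*111")
at_least_one_1 = re.compile(r"0*1(0|1)*")

for w in words("01", 8):
    assert bool(ends_in_111.fullmatch(w)) == w.endswith("111")
    assert bool(at_least_one_1.fullmatch(w)) == ("1" in w)
print("both expressions agree with their descriptions up to length 8")
```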


    Examples Cont.

    The set of strings over {0,1} that have at most one 1.

    0* | 0* 1 0*

    The set of strings over {A..Z,a..z} that contain the word "main".

    Let Σ = A | B | ... | Z | a | b | ... | z

    Σ* main Σ*


    Examples Cont.

    The set of strings over {A..Z,a..z} that contain 3 x's.

    With Σ = A | B | ... | Z | a | b | ... | z:

    Σ* x Σ* x Σ* x Σ*


    Examples Cont.

    The set of identifiers in Pascal.

    Let letter = A | B | ... | Z | a | b | ... | z
    Let digit = 0 | 1 | 2 | 3 ... | 9

    letter (letter | digit)*
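As a rough illustration, the identifier pattern can be written with character classes in Python's `re` syntax. This sketch assumes the usual reading of the slide, a letter followed by any number of letters or digits. Some Pascal dialects also allow underscores, which this pattern deliberately rejects.

```python
import re

# letter (letter | digit)*, written with character classes
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")

for candidate in ["x", "count2", "2count", "total_sum", ""]:
    print(repr(candidate), bool(IDENTIFIER.fullmatch(candidate)))
```

Only the first two candidates are accepted: the third starts with a digit, the fourth contains an underscore, and the last is empty.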


    Examples Cont.

    The set of real numbers in Pascal.

    Let digit = 0 | 1 | 2 | 3 ... | 9
    Let exp = 'E' sign digit digit* | epsilon
    Let sign = '+' | '-' | epsilon
    Let frac = '.' digit digit* | epsilon

    digit digit* frac exp


    Examples-Cont.

    Consider Σ = { a }

    L is the language in which each word is of odd length

    a (aa)*


    Examples-Cont.

    Consider Σ = { a, b }

    L is the language in which each word must start with the letter b

    b (a+b)*


    Examples-Cont.

    Consider Σ = { a, b, c }

    L = { a, c, ab, cb, abb, cbb, abbb, cbbb, abbbb, cbbbb . . . }

    L = language ((a+c) b*)


    Examples-Cont.

    Consider (a+b)*a(a+b)*

    L = language of all words over the alphabet Σ = { a, b } that have an a in them

    L = { a, aa, ba, aab, aba, baa, bba, aaaa, aaba, abaa . . . }


    Examples-Cont.

    Consider the following RE

    (a+b)* a(a+b)* a(a+b)*

    L = language of all words over the alphabet Σ = { a, b } that have at least two a's in them


    Examples-Cont.

    (a+b)* a(a+b)* a(a+b)* = b*ab*a(a+b)* ?

    (a+b)* a(a+b)* a(a+b)* = (a+b)*ab*ab* ?

    (a+b)* a(a+b)* a(a+b)* = b*a(a+b)*ab* ?
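Questions of this kind can be explored by listing, for each expression, every word up to some length that it accepts and then comparing the resulting sets. The sketch below uses Python's `re` module, with "|" in place of "+"; the helper `language` and the length bound of 8 are choices made for this example. Equal sets up to a bound suggest equality but do not prove it.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=8):
    """Set of all words up to max_len that the whole pattern matches."""
    regex = re.compile(pattern)
    return {
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product(alphabet, repeat=n)
        if regex.fullmatch("".join(letters))
    }


reference = language(r"(a|b)*a(a|b)*a(a|b)*")
candidates = [
    r"b*ab*a(a|b)*",
    r"(a|b)*ab*ab*",
    r"b*a(a|b)*ab*",
]
for candidate in candidates:
    print(candidate, language(candidate) == reference)
```

Each of the three candidates pins down a different pair of a's: the first two, the last two, or the first and the last.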


    Examples-Cont.

    (a+b)* a(a+b)* b(a+b)*

    ?

    Language of all words that have at least one a and at least one b!


    Examples-Cont.

    (a+b)* a(a+b)* b(a+b)*

    What about the word ba?

    MUST BE

    (a+b)* a(a+b)* b(a+b)* + (a+b)* b(a+b)* a(a+b)*


    Examples-Cont.

    IS

    (a+b)* a(a+b)* b(a+b)* + (a+b)* b(a+b)* a(a+b)*

    SAME AS

    (a+b)* a(a+b)* b(a+b)* + bb*aa*
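The role of the word ba can be seen by running a few words through all three expressions. This is a small sketch using Python's `re` module (with "|" for "+"); the variable names are invented for the example.

```python
import re

one_order = re.compile(r"(a|b)*a(a|b)*b(a|b)*")
both_orders = re.compile(r"(a|b)*a(a|b)*b(a|b)*|(a|b)*b(a|b)*a(a|b)*")
shorter = re.compile(r"(a|b)*a(a|b)*b(a|b)*|bb*aa*")

for w in ["ab", "ba", "bbaa", "aaa"]:
    print(w,
          bool(one_order.fullmatch(w)),
          bool(both_orders.fullmatch(w)),
          bool(shorter.fullmatch(w)))
```

The first expression rejects ba and bbaa because it requires some a to come before some b. The only words with both letters that it misses are those of the form bb*aa*, which is why the shorter sum describes the same language.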


    Examples-Cont.

    b* + ab*

    (epsilon + a)b*

    b* + ab* = (epsilon + a)b*


    More on RE.

    Definition

    If S and T are sets of strings of letters, we define the product set of strings of letters to be

    ST = {all combinations of a string from S concatenated with a string from T}


    More on RE - Cont.

    If S = { a, aa, aaa } and T = { bb, bbb }, then

    ST = { abb, abbb, aabb, aabbb, aaabb, aaabbb }


    Languages Associated with RE.

    Definition

    The language associated with an RE that is just a single letter is that one-letter word alone, and the language associated with epsilon is just {epsilon}, a one-word language.

    If r1 is an RE associated with the language L1 and r2 is an RE associated with the language L2, then:

    the RE (r1)(r2) is associated with the product L1L2

    the RE (r1+r2) is associated with the language formed by the union of sets L1 and L2

    the RE (r1)* is associated with the language L1*
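The recursive definition above translates almost line by line into a small evaluator. The following is a sketch only: the nested-tuple encoding of an RE and the function name `language` are hypothetical, and because L1* is infinite the result is cut off at a maximum word length.

```python
def language(expr, max_len):
    """Words of length <= max_len in the language of a small RE tree.

    expr is a nested tuple (an encoding invented for this sketch):
      ("sym", "a")      a single letter
      ("eps",)          the empty word
      ("cat", r1, r2)   the product (r1)(r2)
      ("alt", r1, r2)   the union (r1 + r2)
      ("star", r1)      the closure (r1)*
    """
    kind = expr[0]
    if kind == "sym":
        return {expr[1]} if max_len >= 1 else set()
    if kind == "eps":
        return {""}
    if kind == "alt":
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "cat":
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":
        base = language(expr[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            # Extend the newest words by one more word of the base language.
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")


# (a + c) b*
a_or_c = ("alt", ("sym", "a"), ("sym", "c"))
expr = ("cat", a_or_c, ("star", ("sym", "b")))
print(sorted(language(expr, 3), key=lambda w: (len(w), w)))
# ['a', 'c', 'ab', 'cb', 'abb', 'cbb']
```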


    Finite Languages are Regular.

    Theorem

    If L is a finite language, then L can be defined by a regular expression. In other words, all finite languages are regular.

    Proof

    List every word of L and join the words with +. For example, let L = { aa, ab, ba, bb }

    RE is aa+ab+ba+bb, or equivalently (a+b)(a+b)
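The construction in the proof is mechanical, so it can be sketched in a few lines. The helper name `finite_language_regex` is made up for this example, Python's "|" stands in for "+", and the function assumes L is non-empty (an empty pattern would match the empty word rather than nothing).

```python
import re


def finite_language_regex(words):
    """Union of all the words, each matched literally (words must be non-empty set)."""
    # Sorting only makes the output deterministic; the order is irrelevant.
    return "|".join(re.escape(w) for w in sorted(words))


L = {"aa", "ab", "ba", "bb"}
pattern = finite_language_regex(L)
print(pattern)  # aa|ab|ba|bb

# The listed form and the factored form accept the same words.
for w in ["aa", "ab", "ba", "bb", "a", "aba"]:
    assert bool(re.fullmatch(pattern, w)) == bool(re.fullmatch(r"(a|b)(a|b)", w))
```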


    Examples.

    Can you describe the following RE

    (a+b)* (aa+bb) (a+b)*

    All strings of a's and b's that at some point contain a double letter.


    Examples.

    Σ = { a, b }

    What strings do not contain a double letter?


    Examples.

    Σ = { a, b }

    What strings do not contain a double letter? (ab)*

    Is it correct? NO

    (epsilon+b)(ab)*(epsilon+a)


    Examples.

    ( a + b* )* = (a + b )* ?

    ( aa + ab*)* = (aa + ab)* ?

    ( a* b* )* = (a + b )* ?


    Examples.

    ( a + b* )* = (a + b )* YES

    ( aa + ab*)* = (aa + ab)* NO

    ( a* b* )* = (a + b )* YES
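The NO in the second line only needs one word that is in one language and not in the other. A short sketch with Python's `re` module (using "|" for "+") shows that the single letter a is such a word, since ab* can stand for a with zero b's.

```python
import re

left = re.compile(r"(aa|ab*)*")
right = re.compile(r"(aa|ab)*")

for w in ["a", "abb", "aaab"]:
    print(w, bool(left.fullmatch(w)), bool(right.fullmatch(w)))
```

The words a and abb are accepted by the left expression only, while aaab is accepted by both.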


    Examples.

    [aa + bb + ( ab + ba) (aa+bb)*(ab + ba) ]*

    EVEN-EVEN

    type1 = aa
    type2 = bb
    type3 = (ab+ba)(aa+bb)*(ab+ba)

    E = [ type1 + type2 + type3 ]*
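Assuming EVEN-EVEN means the words with an even number of a's and an even number of b's, the expression can be compared with a direct count on every short word. This sketch uses Python's `re` module and a length bound of 8, both chosen for the example.

```python
import re
from itertools import product

even_even = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        expected = w.count("a") % 2 == 0 and w.count("b") % 2 == 0
        assert bool(even_even.fullmatch(w)) == expected, w
print("matches exactly the even-even words up to length 8")
```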


    What Regular Expressions Are Exactly - Terminology

    Basically, a regular expression is a pattern describing a certain amount of text. Their name comes from the mathematical theory on which they are based. But we will not dig into that. Since most people including myself are lazy to type, you will usually find the name abbreviated to regex or regexp.


    What Regular Expressions Are Exactly - Cont.

    This first example is actually a perfectly valid regex. It is the most basic pattern, simply matching the literal text regex. A "match" is the piece of text, or sequence of bytes or characters, that the pattern was found to correspond to by the regex processing software.


    What Regular Expressions Are Exactly - Cont.

    \b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b

    is a more complex pattern. It describes a series of letters, digits, dots, percentage signs, underscores and hyphens, followed by an at sign, followed by another series of letters, digits, dots, percentage signs, underscores and hyphens, finally followed by a single dot and between two and four letters. In other words: this pattern describes an email address.
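A pattern of this kind can be tried in Python's `re` module. The pattern string below is reconstructed from the description above, so the quantifiers and the percent sign are assumptions; the sample text and addresses are made up. The `re.IGNORECASE` flag is used because the character classes list upper-case letters only.

```python
import re

# Reconstructed from the prose description; treat the exact pattern as an assumption.
EMAIL = re.compile(r"\b[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

text = ("Write to alice@example.com or bob.smith@mail.example.org, "
        "not to alice@localhost.")
print(EMAIL.findall(text))
# ['alice@example.com', 'bob.smith@mail.example.org']
```

The last address is skipped because nothing follows its final dot, so the "two to four letters" part cannot match.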


    What Regular Expressions Are Exactly - Cont.

    With the above regular expression pattern, you can search through a text file to find email addresses, or verify if a given string looks like an email address. In this tutorial, I will use the term "string" to indicate the text that I am applying the regular expression to.


    What Regular Expressions Are Exactly - Cont.

    The term "string" or "character string" is used by programmers to indicate a sequence of characters. In practice, you can use regular expressions with whatever data you can access using the application or programming language you are working with.


    What Regular Expressions Are Exactly - Cont.

    A regular expression uses metacharacters (characters that assume special meaning for matching other characters) such as *, [ ], $ and . (the dot). For example, the RE [Hh]ello!* would match Hello and hello and Hello! (and hello!!!!!). The RE [Hh](ello|i)!* would match Hello and Hi and Hi! (and so on). A backslash (\) disables the special meaning of the following character, so you could match the string [Hello] with the RE \[Hello\].
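These examples can be reproduced directly with Python's `re` module, which is used here only as one convenient regex engine. `fullmatch` is chosen so that the whole string has to fit the pattern.

```python
import re

greeting = re.compile(r"[Hh]ello!*")
for s in ["Hello", "hello", "Hello!", "hello!!!!!", "Jello"]:
    print(s, bool(greeting.fullmatch(s)))

short = re.compile(r"[Hh](ello|i)!*")
print(bool(short.fullmatch("Hi!")))

# Escaped brackets are literal characters, not a character class.
print(bool(re.fullmatch(r"\[Hello\]", "[Hello]")))
# Unescaped, [Hello] means "one character out of H, e, l, o".
print(bool(re.fullmatch(r"[Hello]", "[Hello]")))
```

Everything except Jello and the final unescaped case is reported as a match.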


    How can I use regular expressions?

    Many text editors allow regular-expression search-and-replace. EditPlus for Windows has this capability, as does BBEdit for the Macintosh.

    EditPlus

    The EditPlus search-replace window has a checkbox called "Regular expression". To use regular expressions in your search, simply check this box.


    How can I use regular expressions? - Cont.

    BBEdit

    BBEdit's search-replace window also has such a checkbox; its label, however, is "Use grep".

    Grep. What an odd term. The word "grep" is from the creators of the UNIX operating system, some of the first implementers of regular expressions. UNIX programmers delighted in reducing long commands to meaningless acronyms; grep is said to have meant "global regular expression print".


    Defining regular expression patterns

    The way regular-expression patterns work is by creating a special little language in which ordinary symbols take on special meanings. This guide will go through the special meanings little by little, with examples. You will get the most from this guide if you read all the way through it. To be sure you do, I've left a very important piece of information (how to replace what you find) for the end.


    Defining regular expression patterns - Cont.

    Dot, question mark, star, plus, and backslash

    Imagine that you have a book of letters, and you need to tag all the salutations. Salutations fall into a pattern: the word "Dear", a name, and a colon (or possibly a comma, but we'll stick with a colon for now). Obviously, the problem with finding this via ordinary search is that the name could be anything.


    Defining regular expression patterns - Cont.

    Regular expressions have a way of saying "any character": the dot, or period. To find a three-letter word beginning and ending with b, for example, you could search on b.b . This would find "bib" or "bob" or "bub", but not "bud" or "dub" or "bulb".


    Defining regular expression patterns - Cont.

    Note that whitespace characters such as space or tab can also be located by the dot. So b.b would find words with a space or tab between them. (Quick exercise: where would b.b match in the preceding sentence? There are two possibilities!) The hard return, however, is not matched by a dot; more on hard returns later.

    This still won't solve our salutation problem, though: names are made up out of a variable number of letters, not just one.
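The behaviour of the dot is easy to check in Python's `re` module, used here as a stand-in for the editors discussed in the text. By default the dot in Python also refuses to match a newline, which mirrors the remark about hard returns.

```python
import re

candidates = ["bib", "bob", "bub", "bud", "dub", "bulb", "b b", "b\tb", "b\nb"]
matches = [w for w in candidates if re.fullmatch(r"b.b", w)]
print(matches)
# ['bib', 'bob', 'bub', 'b b', 'b\tb']
```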

    http://www.textartisan.com/articles/regex.html

    Defining regular expression patterns - Cont.

    Regular expressions have several ways to say "not just one": the question mark (?), the star or asterisk (*), and the plus (+).

    The question mark means "zero or one", the star means "zero or more", and the plus means "one or more". These marks are like adjectives; they modify other characters. What's more, they're like adjectives in some foreign languages, in that they come immediately after the character they modify.


    Defining regular expression patterns - Cont.

    So the regular expression Ba? will match B or Ba but not Baa or a. The regular expression Ba* will match B or Ba or Baa, up to any number of a's. The regular expression Ba+ will match Ba or Baa and on up, but it will not match B by itself, since the plus sign demands at least one a.

    Combining the dot with the plus or star solves our salutation problem. The regular expression Dear .+: will find any imaginable business-letter salutation.
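The three quantifiers and the salutation pattern can be tried out as follows. This is a sketch in Python's `re` module; the sample letter text is invented for the example.

```python
import re

for pattern in [r"Ba?", r"Ba*", r"Ba+"]:
    matched = [w for w in ["B", "Ba", "Baa", "a"] if re.fullmatch(pattern, w)]
    print(pattern, matched)

letter = "Dear Ms. Jones:\nThank you for your order.\nDear Sir or Madam:\n"
print(re.findall(r"Dear .+:", letter))
# ['Dear Ms. Jones:', 'Dear Sir or Madam:']
```

Note that .+ is greedy: on a line containing more than one colon it would run on to the last one, so real salutations may need a stricter pattern.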


    Defining regular expression patterns - Cont.

    But what if you actually want to look for a dot, a star, a question mark, or a plus? How can you find them, if they've got special meanings?

    Any special regular-expression character loses its special meaning if there is a backslash (\) before it. So \. will find a real period, like the one at the end of this sentence. The backslash works on itself, too; to find a real backslash, put \\ in your search.
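Escaping can be illustrated in Python, where raw strings (the `r"..."` prefix) keep the backslashes intact for the regex engine. The sample text is made up; `re.escape` is a standard-library helper that adds the backslashes for you.

```python
import re

text = r"Is 3.5 + 2 = 5.5? See C:\temp."

print(re.findall(r"\.", text))   # literal periods only
print(re.findall(r"\?", text))   # a literal question mark
print(re.findall(r"\\", text))   # a literal backslash

# re.escape builds a pattern that matches its argument literally.
literal = re.escape("3.5+2?")
print(literal)
print(bool(re.fullmatch(literal, "3.5+2?")))
```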


    What we've learned so far (Metacharacters)

    Character   Regular-expression meaning

    .           Any character, including space or tab
    ?           Zero or one of the preceding character
    *           Zero or more of the preceding character
    +           One or more of the preceding character
    \           Negates the special meaning of the following character


    Metacharacters

    As you've learned, the backslash negates any special meaning that the character following it has to a regular expression. It has another function, too: it can turn ordinary characters into special ones.

    Consider the tab. You don't see it on the screen the way you see ordinary letters; you see what it does.


    Metacharacters-Cont.

    If you turn on the show-invisibles function, however, you generally see an indication that there is a character there.

    Regular expressions let you access these invisible characters (usually called metacharacters):


    Metacharacters-Cont.

    Metacharacter   Meaning

    \n              Newline (or paragraph mark, or however you think of it)
    \t              Tab character
    \s              Any whitespace character (tab, space, or newline)


    Metacharacters-Cont.

    For purposes of modifiers like star and plus, these metacharacters act like single characters. So \n+ finds one or more newlines.

    A special caution with BBEdit: Because of ancient OS wars, Macs and non-Macs treat newlines differently. If a regular expression containing \n isn't finding what you think it should, try replacing \n in your search pattern with \r.
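The whitespace metacharacters behave the same way in Python's `re` module, which is used below purely as an illustration. The sample text mixes a tab, a run of newlines and a Windows-style \r\n line ending.

```python
import re

text = "name\tvalue\n\n\nnext line\r\nlast"

print(re.findall(r"\n+", text))   # each run of newlines is one match
print(re.split(r"\s+", text))     # split on any run of whitespace
# ['name', 'value', 'next', 'line', 'last']
```

Because \s also covers the carriage return, splitting on \s+ sidesteps the \n versus \r difference mentioned above.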


    Metacharacters-Cont.

    Depending on your regular-expression engine or editing program, there may be other metacharacters available to you. Read the manual or help pages for details.

    In addition, a few more special regular-expression characters provide useful functions. Remember that to look for the actual character, you must precede it with a backslash.


    END