Language Recognizer
Connecting Type 3 languages and Finite State Automata
Copyright © 2008-2014 – Curt Hill
Introduction
• Kleene showed that Finite State Automata recognize exactly one class of languages
• This is Kleene’s Theorem
• Every set in this class may be built up using only the following:
– The empty set
– The empty string
– All single characters from the alphabet
– Union
– Concatenation
– Kleene closure
• Three starting points, three operations
Regular Sets
• A regular set is any set that can be constructed using the three starting points and three operations just given
• Thus every regular set is the language generated by a regular grammar (type 3) and accepted by an FSA
• Another way to specify these regular sets is by using regular expressions
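The construction can be sketched in Python with ordinary sets of strings. This is only an illustration: the function names are made up for this sketch, and because a Kleene closure is usually infinite, closure takes an assumed max_len cutoff so the result stays finite.

```python
def union(a, b):
    """Union of two sets of strings."""
    return a | b

def concat(a, b):
    """Every string of a followed by every string of b."""
    return {x + y for x in a for y in b}

def closure(a, max_len):
    """Kleene closure of a, truncated to strings of length <= max_len."""
    result = {""}            # zero copies gives the empty string
    frontier = {""}
    while frontier:
        # Append one more copy to the newest strings, keep only the new ones.
        frontier = {x + y for x in frontier for y in a
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

# Starting points: single characters from the alphabet.
a, b, c = {"a"}, {"b"}, {"c"}

# An a followed by any mix of b and c, cut off at two trailing letters.
sample = concat(a, closure(union(b, c), 2))
```

With this cutoff, sample holds the seven strings a, ab, ac, abb, abc, acb and acc.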
Regular Expressions
• There are two common understandings of regular expressions
– These two are fundamentally related but have different purposes
• A means of specifying a set of strings
– This will be the principal meaning for this class
• A means of specifying a string to be searched for within a document
– Much more common
Set of Strings
• In the text the operations are:
• Concatenation
– Merely the writing of two items next to each other
• Union
– Symbol: ∪ signifying that either of two sets may be used
• Kleene Closure
– Symbol: * signifying that zero or more copies may be concatenated together
• Parentheses for grouping
Examples
• An alphabet contains a, b, c
• The string aac is the concatenation of three letters
• The expression a(b∪c) represents two strings, ab and ac
• The expression a(b)* represents every string starting with an a and followed by zero or more bs
• a(a∪b∪c)*c represents all the strings that start with a and end with c
• (a∪b∪c)* is the set of all strings
Search Strings
• Fundamentally the same but modified to the task at hand
– Mathematics is not concerned with beginning and end of lines, special characters or characters not on a keyboard
• The ∪ is replaced by the |
• Concatenation and Kleene Closure are similar
• Many special characters
Specials
• The special characters include
– [ ]\^|*$.?+(){}
• Any other character just matches itself
• Since many of these characters are valuable in strings, the escape is used to match them
• Most of these are for the special requirements of finding an element of this set in a much larger piece of text or a document
Escape
• The backslash character is the escape
• Thus to look for an asterisk (a special) in a string it must be escaped: \*
– This allows a search to find the asterisk
• The C family uses some of the same escape sequences:
– \n newline or linefeed
– \t tab
– \r carriage return
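As an illustration, here is how escaping might look with Python's standard re module. The slides do not name a regex engine, so Python is an assumption here, and the sample text is made up.

```python
import re

text = "rate = 3 * 4"   # made-up sample text

# An escaped asterisk matches a literal '*' instead of acting as closure.
match = re.search(r"\*", text)
position = match.start()          # index of the asterisk in text

# re.escape adds the backslashes for any literal text we want to search for.
escaped = re.escape("a*b")

# The C-style escapes are understood inside a pattern as well.
has_tab = re.search(r"\t", "col1\tcol2") is not None
```

In Python, an unescaped * on its own is rejected as a pattern because there is no previous item to repeat.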
Positioning
• There are two specials that force a position
• ^ matches the beginning of the line
• $ matches the end of the line
• Both of these match a position rather than a character
• Without these a pattern could match anywhere within a string
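A short sketch of the two positioning specials, using Python's re module as an assumed engine and a made-up sample line:

```python
import re

line = "the cat sat"   # made-up sample line

# Without positioning, the pattern may match anywhere in the line.
anywhere = re.search(r"cat", line) is not None

# ^ pins the match to the beginning, $ pins it to the end.
at_start = re.search(r"^cat", line) is not None
at_end = re.search(r"sat$", line) is not None

# Using both forces the pattern to account for the whole line.
whole_line = re.search(r"^the cat sat$", line) is not None
```

Here anywhere, at_end and whole_line are True while at_start is False. In Python, ^ and $ refer to the start and end of the whole string unless the re.MULTILINE flag is given, in which case they apply to each line.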
Repetition
• There are three repetition characters, which together are more general than closure alone
• Closure is the *
– It represents zero or more repetitions of the previous item
– Kleene star
• The + represents one or more repetitions of the previous item
• The ? represents zero or one occurrences of the previous item
Examples
• ~* matches any number (including zero) of successive tildes
• \-* matches zero or more dashes
• .+ matches one or more of any character
• hats? matches either hat or hats
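These four patterns can be tried out directly. The sketch below assumes Python's re module; the helper name matches is made up, and it uses re.fullmatch so that the whole string has to fit the pattern.

```python
import re

def matches(pattern, text):
    """True when the whole of text fits pattern."""
    return re.fullmatch(pattern, text) is not None

checks = [
    matches(r"~*", ""),          # zero tildes is allowed by *
    matches(r"~*", "~~~"),
    matches(r"\-*", "---"),
    matches(r".+", "x9!"),
    not matches(r".+", ""),      # + needs at least one character
    matches(r"hats?", "hat"),
    matches(r"hats?", "hats"),
    not matches(r"hats?", "hatss"),
]
```

Every entry in checks evaluates to True.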
Grouping
• So far the repetitions could only be applied to a single character
• What is needed next is some type of grouping
• This is provided by parentheses
• Enclosing a pattern in parentheses makes it a group
• This group can then be followed by a repetition character
Examples
• (\*\-)* will match
– *-
– *-*-
– *-*-*- etc
• The * is greedy: it will try to match as many of these as possible
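Greediness can be observed by matching against text that continues past the repeated group. This sketch assumes Python's re module. The lazy form *? is a Python feature that the slides do not cover, shown only for contrast.

```python
import re

text = "*-*-*-x"   # made-up sample: three copies of "*-" and then an x

# Greedy closure takes as many copies of the group as it can.
greedy = re.match(r"(\*\-)*", text).group(0)

# The lazy variant (not from the slides) takes as few as it can.
lazy = re.match(r"(\*\-)*?", text).group(0)

# Zero copies is still a match, so the pattern succeeds on any text.
empty = re.match(r"(\*\-)*", "x").group(0)
```

Here greedy is the full run *-*-*- while lazy and empty are both the empty string.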
More interesting patterns
• A number is pretty easy to understand from our perspective but not so easy to describe
– Except in regular expressions
• An integer is a string of digits
– Possibly preceded by a plus or minus
• So how is this done?
• With sets and repetition
A set
• A pair of brackets may be filled with characters
• This will match any one of them
• Thus the digits could be done with: [0123456789]
• An integer could then be: [-+]?[0123456789]+
• Any single vowel is: [aeiouAEIOU]
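The integer and vowel patterns can be checked with a few lines of Python. The re module is an assumed engine; the names INTEGER, VOWEL and is_integer are made up for this sketch.

```python
import re

INTEGER = re.compile(r"[-+]?[0123456789]+")
VOWEL = re.compile(r"[aeiouAEIOU]")

def is_integer(text):
    """True when the whole of text is an optionally signed run of digits."""
    return INTEGER.fullmatch(text) is not None

# A set matches exactly one character, so findall returns vowels one by one.
vowels = VOWEL.findall("Regular Expression")
```

With these definitions is_integer accepts 42, -7 and +003 but rejects 4.2, a lone + and the empty string.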
Alternation
• A set provides intuitive alternation
• The match process may choose any character within the set to use
• The alternation is only applied to a number of single characters
• There is also an alternation character
– The vertical bar |
• This allows either simple or complicated patterns to alternate
Alternation
• Thus: A|E|I|O|U is equivalent to [AEIOU]
• However, more interesting alternations are possible and useful
– (abc)|(123) will match either of the two strings
– ([-+]?\d+)|(\w+) will match any string of characters that looks like a number or word
– Here \d matches any digit and \w matches any letter, digit or underscore
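A small Python sketch of alternation, using the re module as an assumed engine. The number branch is written here as [-+]?\d+, with the repetition on the digits only, which is an assumption about the intended pattern. The helper name kind is made up.

```python
import re
import string

vowel_bar = re.compile(r"A|E|I|O|U")
vowel_set = re.compile(r"[AEIOU]")

# The two vowel patterns agree on every single uppercase letter.
same_vowels = all(
    (vowel_bar.fullmatch(c) is None) == (vowel_set.fullmatch(c) is None)
    for c in string.ascii_uppercase
)

number_or_word = re.compile(r"([-+]?\d+)|(\w+)")

def kind(text):
    """Classify text as 'number', 'word', or None using the alternation."""
    m = number_or_word.fullmatch(text)
    if m is None:
        return None
    # Group 1 is the number branch, group 2 is the word branch.
    return "number" if m.group(1) is not None else "word"
```

For example, kind("-42") gives "number", kind("hello") gives "word", and kind("-") gives None because a lone sign fits neither branch.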
Audience Participation
• Suppose the following expression: ^ab(cde)*f$
• Which of the following lines match this?
– abf
– abcdecdef
– abcdeaf
– abcdecdecdecdef
– acdef
– abcdefa
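One way to check answers after working the exercise by hand is to run the expression over the six lines. This sketch assumes Python's re module; the variable names are made up.

```python
import re

pattern = re.compile(r"^ab(cde)*f$")

candidates = [
    "abf",
    "abcdecdef",
    "abcdeaf",
    "abcdecdecdecdef",
    "acdef",
    "abcdefa",
]

# Keep only the lines that the anchored expression accepts.
matching = [line for line in candidates if pattern.search(line)]
```

The result is that abf, abcdecdef and abcdecdecdecdef match. The other three fail because of a stray a before the f, a missing b, and a trailing a respectively.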
Limitations
• What kind of sets are not regular?
• Consider the following language: 0ⁿ1ⁿ
– The number of zeros and ones is the same
• We know that 0ᵐ1ⁿ is regular, why is 0ⁿ1ⁿ not?
We Really Do Know
[Figure: state diagram of a finite state automaton with states s0 and s1; edge labels 0, 1 and 1]
• This accepts 0ᵐ1ⁿ and is clearly an FSA
• Why is 0ⁿ1ⁿ harder?
• Counter-intuitive since 0ⁿ1ⁿ is a subset of 0ᵐ1ⁿ
• Shouldn’t it be harder to generate a full set than a subset?
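A two-state machine of this kind can be simulated in a few lines of Python. The transition table below is an assumption, since the slide's diagram is not fully recoverable: s0 loops on 0, a 1 moves s0 to s1, and s1 loops on 1. Treating both states as accepting, so that m and n may be zero, is also an assumption.

```python
# Assumed transitions; any (state, symbol) pair not listed is a dead end.
TRANSITIONS = {
    ("s0", "0"): "s0",   # keep reading zeros
    ("s0", "1"): "s1",   # first one: switch to the ones phase
    ("s1", "1"): "s1",   # keep reading ones
}
ACCEPTING = {"s0", "s1"}  # assumption: m and n may each be zero

def accepts(text):
    """Run the machine over text and report whether it ends in an accepting state."""
    state = "s0"
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:      # no transition defined: reject
            return False
    return state in ACCEPTING
```

The machine accepts 0011 and 00111 alike: it checks only that no 0 follows a 1, and it never compares the two counts.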
Memory
• An FSA determines its next state only based on input and current state
• Since it has no memory beyond its current state, it cannot remember how many zeros it has processed in order to process that many ones
• Next we consider those machines stronger than these
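To see what kind of memory is missing, here is a sketch of a recognizer for 0ⁿ1ⁿ that keeps a counter. It is deliberately not an FSA: the counter can grow without bound, which no fixed set of states can imitate. The function name is_0n1n is made up for this sketch.

```python
def is_0n1n(text):
    """True when text is n zeros followed by exactly n ones (n may be 0)."""
    unmatched_zeros = 0   # the extra memory an FSA does not have
    seen_one = False
    for symbol in text:
        if symbol == "0":
            if seen_one:              # a zero after a one breaks the shape
                return False
            unmatched_zeros += 1
        elif symbol == "1":
            seen_one = True
            unmatched_zeros -= 1
            if unmatched_zeros < 0:   # more ones than zeros so far
                return False
        else:
            return False              # symbol outside the alphabet {0, 1}
    return unmatched_zeros == 0
```

For example, is_0n1n accepts 000111 but rejects 00111 and 0110.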
Exercises
• 13.4
– 3, 5, 15