Mathematical Foundations of Computer Science Chapter 3: Regular Languages and Regular Grammars


5/13/2018 Regular Grammars - slidepdf.com

Mathematical Foundations of Computer Science

Chapter 3: Regular Languages and Regular Grammars


Languages

A language (over an alphabet Σ) is any subset of the set of all possible strings over Σ. The set of all possible strings is written as Σ*.

Example: Σ = {a, b, c}

Σ* = {λ, a, b, c, ab, ac, ba, bc, ca, aaa, …}

One language might be the set of strings of length less than or equal to 2:

●  L = {λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc}
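As a quick check of this example, the strings of length at most 2 can be enumerated mechanically. This is a minimal Python sketch; the function name `strings_up_to` is an illustrative choice, and λ appears as Python's empty string.

```python
from itertools import product

def strings_up_to(sigma, max_len):
    """Return all strings over `sigma` of length <= max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(sigma, repeat=n) yields every n-tuple of symbols
        for symbols in product(sigma, repeat=n):
            result.append("".join(symbols))
    return result

L = strings_up_to("abc", 2)
print(len(L))  # 13: one empty string, 3 of length 1, 9 of length 2
print(L)
```

Σ* itself is infinite, so any such enumeration has to be cut off at some length.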


Regular Languages

A regular language (over an alphabet Σ) is any language for which there exists a finite automaton that recognizes it.


Mathematical Models of Computation

This course studies a variety of mathematical models corresponding to notions of computation.

The finite automaton was our first example.

The finite automaton is an example of an automaton model.

There are other models as well.


Mathematical Models of Computation

Another important model is that of a grammar.

We will shortly look at regular grammars.

But first, a digression:


Regular Expressions

A regular expression is a mathematical model for describing a particular type of language.

Regular expressions are kind of like arithmetic expressions.

The regular expression is defined recursively.


Regular Expressions

Given an alphabet Σ:

∅, λ, and each a ∈ Σ are all regular expressions. (∅ denotes the empty set and λ the empty string.)

If r1 and r2 are regular expressions, then so are r1 + r2, r1 · r2, r1*, and (r1).

● Note: we usually write r1 · r2 as r1r2.

These are the only things that are regular expressions.


Regular Expressions

Meaning:

∅ represents the empty language

λ represents the language {λ}

a represents the language {a}

r1 + r2 represents the language L(r1) ∪ L(r2)

r1r2 represents L(r1)L(r2)

r1* represents (L(r1))*
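The recursive meaning of a regular expression translates almost line for line into code. The following Python sketch is only an illustration: the tuple encoding (`"empty"`, `"lam"`, `"sym"`, `"union"`, `"concat"`, `"star"`) and the function name `lang` are assumptions made here, and because the languages may be infinite, only strings of length at most `n` are computed.

```python
def lang(r, n):
    """Strings of length <= n in the language of the regular expression r."""
    kind = r[0]
    if kind == "empty":                      # the empty language
        return set()
    if kind == "lam":                        # the language {lambda}
        return {""}
    if kind == "sym":                        # the language {a}
        return {r[1]} if n >= 1 else set()
    if kind == "union":                      # L(r1) union L(r2)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":                     # L(r1) L(r2)
        return {u + v
                for u in lang(r[1], n)
                for v in lang(r[2], n)
                if len(u + v) <= n}
    if kind == "star":                       # (L(r1))*
        base = lang(r[1], n) - {""}          # lambda adds nothing when repeated
        result, frontier = {""}, {""}
        while frontier:
            # extend the newest strings by one more piece of L(r1)
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown kind: {kind}")

# a*(a + b), written in the tuple encoding
example = ("concat", ("star", ("sym", "a")),
                     ("union", ("sym", "a"), ("sym", "b")))
print(sorted(lang(example, 2), key=lambda w: (len(w), w)))  # ['a', 'b', 'aa', 'ab']
```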


Regular Expressions

Example 1:

What does a*(a + b) represent?

It represents zero or more a's followed by either an a or a b.

{a, b, aa, ab, aaa, aab, aaaa, aaab, …}


Regular Expressions

Example 2:

What does (a + b)*(a + bb) represent?

It represents zero or more symbols, each of which can be an a or a b, followed by either a or bb.

{a, bb, aa, abb, ba, bbb, aaa, aabb, aba, abbb, baa, babb, bba, bbbb, …}


Regular Expressions

Example 3:

What does (aa)*(bb)*b represent?

All strings over {a, b} that start with an even number of a's, which are then followed by an odd number of b's.

It's important to understand the underlying meaning of a regular expression.


Regular Expressions

Example 4:

Find a regular expression for strings of 0's and 1's which have at least one pair of consecutive 0's.

Each such string must have a 00 somewhere in it.

It could have any string in front of the 00 and any string after it, as long as the 00 is there!

Any string is represented by (0 + 1)*

Answer: (0 + 1)*00(0 + 1)*
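One way to gain confidence in an answer like this is to compare it with the plain-English property on every short string. The sketch below uses Python's standard `re` module, where the slides' "+" (union) is written "|"; the names `has_pair_of_zeros` and `check` are chosen here for illustration.

```python
import re
from itertools import product

# (0 + 1)*00(0 + 1)* in Python's regex syntax
AT_LEAST_ONE_00 = re.compile(r"(0|1)*00(0|1)*")

def has_pair_of_zeros(w):
    """Match the whole string w against (0 + 1)*00(0 + 1)*."""
    return AT_LEAST_ONE_00.fullmatch(w) is not None

def check(max_len=8):
    """True if the expression agrees with '00 occurs in w' on all short strings."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if has_pair_of_zeros(w) != ("00" in w):
                return False
    return True

print(check())
```

An exhaustive test up to a fixed length is evidence, not a proof, but it catches most mistakes in a candidate expression.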


Regular Expressions

Example:

Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's.

● It's a repetition of substrings, each of which is either a 1 or a 0 followed by at least one 1.

● (1 + 011*)*

● or equivalently, (1 + 01)*

● But the strings these expressions generate can't end in a 0.


Regular Expressions

Example:

Find a regular expression for strings of 0's and 1's which have no pairs of consecutive 0's.

● (1 + 011*)*

● (1 + 01)*

● But the strings these expressions generate can't end in a 0, so a string such as 10 is missed.

● So we add (0 + λ) to the end to allow for this.

● (1 + 01)*(0 + λ)

This is only one of many possible answers.
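The effect of the trailing (0 + λ) can be seen by listing the strings on which each expression disagrees with the property "contains no 00". This is a hedged Python sketch using the standard `re` module: "+" becomes "|", (0 + λ) is written `0?`, and the helper name `disagreements` is invented for this example.

```python
import re
from itertools import product

WITHOUT_FIX = re.compile(r"(1|01)*")      # (1 + 01)*
WITH_FIX = re.compile(r"(1|01)*0?")       # (1 + 01)*(0 + lambda)

def disagreements(pattern, max_len=8):
    """Strings of 0's and 1's where `pattern` and 'no 00 in w' disagree."""
    bad = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            matched = pattern.fullmatch(w) is not None
            if matched != ("00" not in w):
                bad.append(w)
    return bad

print(disagreements(WITHOUT_FIX, 2))  # ['0', '10']: legal strings ending in 0 are missed
print(disagreements(WITH_FIX))        # []
```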


Regular Expressions

Why are they called regular expressions?

Because, as it turns out, the set of languages they describe is that of the regular languages.

That means that regular expressions are just another model for the same thing as finite automata.


Regular Expressions

Homework:

Chapter 3, Section 1

Problems 1-11, 17, 18

Regular Expressions and Regular Languages

As we have said, regular expressions and finite automata are really different ways of expressing the same thing.

Let's see why.

Given a regular expression, how can we build an equivalent finite automaton?

(We won't bother going the other way, although it can be done.)

Regular Expressions and Regular Languages

Clearly there are simple finite automata corresponding to the simple regular expressions:

[Figure: automata for the simplest regular expressions, with arcs labeled λ and a]

Note that each of these has an initial state and one accepting state.

Regular Expressions and Regular Languages

On the previous slide, we saw that the simplest regular expressions can be represented by a finite automaton with an initial state (duh!) and one isolated accepting state.

Regular Expressions and Regular Languages

We can build more complex automata for more complex regular expressions using this model:

Regular Expressions and Regular Languages

Here's how we build an nfa for r1 + r2:

[Figure: nfa for r1 + r2, combining the automata for r1 and r2 through four λ-arcs]

Regular Expressions and Regular Languages

Here's how we build an nfa for r1r2:

[Figure: nfa for r1r2, chaining the automata for r1 and r2 through three λ-arcs]

Regular Expressions and Regular Languages

Here's how we build an nfa for (r1)*:

[Figure: nfa for (r1)*, wrapping the automaton for r1 with five λ-arcs]

Note: the last state added is not in the book. For safety, I add it so that only one arc goes into the final state.


Building an nfa from a regular expression

Example:

Consider the regular expression (a + bb)(a+b)*(bb)

[Figure: nfa for (a + bb)(a + b)*(bb), assembled from the constructions above, with arcs labeled a, b, and λ. Annotation on the figure: sometimes we just get tired and take an obvious shortcut.]
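The three constructions (union, concatenation, star) can be sketched in Python and applied to this example. Everything below is an illustration under stated assumptions: an nfa is a triple (initial state, accepting state, list of arcs), a label of `None` stands for λ, and the exact wiring of the λ-arcs is the usual textbook arrangement, reconstructed here because the original figures did not survive. The star construction includes one extra final state, in the spirit of the note on the (r1)* slide.

```python
from itertools import count

_fresh = count()  # supplies unused state numbers

def symbol(a):
    """nfa for the single symbol a: initial state --a--> accepting state."""
    s, f = next(_fresh), next(_fresh)
    return s, f, [(s, a, f)]

def union(m1, m2):
    """nfa for r1 + r2: four lambda-arcs around the two machines."""
    (s1, f1, t1), (s2, f2, t2) = m1, m2
    s, f = next(_fresh), next(_fresh)
    return s, f, t1 + t2 + [(s, None, s1), (s, None, s2),
                            (f1, None, f), (f2, None, f)]

def concat(m1, m2):
    """nfa for r1 r2: three lambda-arcs chaining the two machines."""
    (s1, f1, t1), (s2, f2, t2) = m1, m2
    s, f = next(_fresh), next(_fresh)
    return s, f, t1 + t2 + [(s, None, s1), (f1, None, s2), (f2, None, f)]

def star(m1):
    """nfa for (r1)*: five lambda-arcs, with an extra final state f."""
    s1, f1, t1 = m1
    s, mid, f = next(_fresh), next(_fresh), next(_fresh)
    return s, f, t1 + [(s, None, s1), (f1, None, mid), (s, None, mid),
                       (mid, None, s), (mid, None, f)]

def accepts(m, w):
    """Simulate the nfa on w, following lambda-arcs wherever possible."""
    start, final, arcs = m

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for p, label, r in arcs:
                if p == q and label is None and r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    current = closure({start})
    for ch in w:
        current = closure({r for p, label, r in arcs
                           if p in current and label == ch})
    return final in current

def bb():
    """A fresh machine for bb (machines must not share states)."""
    return concat(symbol("b"), symbol("b"))

# (a + bb)(a + b)*(bb)
m = concat(concat(union(symbol("a"), bb()),
                  star(union(symbol("a"), symbol("b")))),
           bb())
print(accepts(m, "abb"), accepts(m, "bb"))  # True False
```

The machine built this way has many more states and λ-arcs than a hand-drawn one, which is exactly why shortcuts are tempting when drawing.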

Building a regular expression from a finite automaton

The book goes on to show that it works the other way around as well: we can find a corresponding regular expression for any finite automaton.

It's fairly easy in some cases and you can "just do it."

However, it's generally complicated and not worth the bother of studying. You are not responsible for this material.

Building a regular expression from a finite automaton

[Figure: a finite automaton with arcs labeled a, "a, b", and c]

The above automaton clearly corresponds to

a*(a + b)c*


Regular Expressions and nfa's

Homework:

Chapter 3, Section 2

● Problems 1-5


Regular Grammars

Review: A grammar is a quadruple

G = (V, T, S, P) where

V is a finite set of variables

T is a finite set of symbols, called terminals

S is in V and is called the start symbol

P is a finite set of productions, which are rules of the form

α → β

● where α and β are strings consisting of terminals and variables.


Regular Grammars

A grammar is said to be right-linear if every production in P is of the form

A → xB or

A → x

where A and B are variables in V (perhaps the same, perhaps the start symbol S) and x is any string of terminal symbols (including the empty string λ).


Regular Grammars

An alternate (and better) definition of a right-linear grammar says that every production in P is of the form

A → aB or

A → a or

S → λ (to allow λ to be in the language)

where A and B are variables in V (perhaps the same, but B can't be S) and a is any terminal symbol.


Regular Grammars

The reasons I prefer the second definition (although I accept the first one, which happens to be used in the book) are:

It's easier to work with in proving things.

It's the much more common definition.


Regular Grammars

A grammar is said to be left-linear if every production in P is of the form

A → Bx or

A → x

where A and B are variables in V (perhaps the same, perhaps the start symbol S) and x is any string of terminal symbols (including the empty string λ).


Regular Grammars

The alternate definition of a left-linear grammar says that every production in P is of the form

A → Ba or

A → a or

S → λ

where A and B are variables in V (perhaps the same, but B can't be S) and a is any terminal symbol.


Regular Grammars

Any left-linear or right-linear grammar is called a regular grammar.


Regular Grammars

For brevity, we often write a set of productions such as

A → x1

A → x2

A → x3

as A → x1 | x2 | x3


Regular Grammars

A derivation in grammar G is any sequence of strings over V and T,

connected with ⇒,

starting with S and ending with a string containing no variables,

where each subsequent string is obtained by applying a production in P.

S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn

abbreviated as:

S ⇒* xn


Regular Grammars

● S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ . . . ⇒ xn

● abbreviated as:

● S ⇒* xn

● We say that xn is a sentence of the language generated by G, L(G).

● We say that the other x's are sentential forms.


Regular Grammars

● L(G) = {w | w ∈ T* and S ⇒* w}

● We call L(G) the language generated by G

● L(G) is the set of all sentences over grammar G


Example 1

S → abS | a is an example of a right-linear grammar.

Can you figure out what language it generates?

L = {w ∈ {a,b}* | w consists of alternating a's and b's, and begins and ends with an a}

= L((ab)*a)
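The answer can be checked by carrying out derivations mechanically. The Python sketch below explores sentential forms breadth-first, always rewriting the leftmost variable, and collects the sentences it reaches. The function name `generate` and the data layout (a list of (left side, right side) pairs) are assumptions made for illustration, and the length cut-off relies on sentential forms holding at most one variable, which is the case for regular grammars.

```python
from collections import deque

def generate(productions, start, variables, max_len):
    """Sentences of length <= max_len derivable from `start`."""
    sentences, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, if any
        idx = next((i for i, s in enumerate(form) if s in variables), None)
        if idx is None:
            if len(form) <= max_len:
                sentences.add(form)          # no variables left: a sentence
            continue
        for lhs, rhs in productions:
            if lhs == form[idx]:
                new = form[:idx] + rhs + form[idx + 1:]
                # one extra position is allowed for the single variable
                if len(new) <= max_len + 1 and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return sentences

# S -> abS | a
example1 = [("S", "abS"), ("S", "a")]
print(sorted(generate(example1, "S", {"S"}, 5), key=len))  # ['a', 'aba', 'ababa']
```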


Example 2

S → Aab

A → Aab | Ba

B → a

is an example of a left-linear grammar. Can you figure out what language it generates?

L = {w ∈ {a,b}* | w is aa followed by one or more copies of ab}

= L(aaab(ab)*)


Example 3

Consider the grammar

S → A

A → aB | λ

B → Ab

This grammar is NOT regular.

No "mixing and matching" left-linear and right-linear productions.
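The "no mixing" rule is easy to test mechanically. Here is a minimal Python sketch of the first (string-of-terminals) definitions of right-linear and left-linear; the function names and the representation of a grammar as (left side, right side) pairs with λ written as the empty string are assumptions made for this illustration.

```python
def is_right_linear(productions, variables):
    """Every production is A -> xB or A -> x, with x a string of terminals."""
    for lhs, rhs in productions:
        if lhs not in variables:
            return False
        # drop a trailing variable, if there is one; the rest must be terminals
        x = rhs[:-1] if rhs and rhs[-1] in variables else rhs
        if any(s in variables for s in x):
            return False
    return True

def is_left_linear(productions, variables):
    """Every production is A -> Bx or A -> x, with x a string of terminals."""
    for lhs, rhs in productions:
        if lhs not in variables:
            return False
        # drop a leading variable, if there is one; the rest must be terminals
        x = rhs[1:] if rhs and rhs[0] in variables else rhs
        if any(s in variables for s in x):
            return False
    return True

def is_regular(productions, variables):
    return (is_right_linear(productions, variables)
            or is_left_linear(productions, variables))

# S -> A, A -> aB | lambda, B -> Ab
mixed = [("S", "A"), ("A", "aB"), ("A", ""), ("B", "Ab")]
print(is_regular(mixed, {"S", "A", "B"}))        # False

# S -> abS | a
right = [("S", "abS"), ("S", "a")]
print(is_regular(right, {"S"}))                  # True
```

Each production of the mixed grammar is linear on its own; the grammar fails only because the productions do not all lean the same way.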


Regular Grammars and nfa's

It's not hard to show that regular grammars generate and nfa's accept the same class of languages: the regular languages!

It's a long proof, where we must show that any finite automaton has a corresponding left- or right-linear grammar, and any regular grammar has a corresponding nfa. We won't bother with the details.


Regular Grammars and nfa's

We get a feel for this by example.

Let S → aA, A → abS | b

[Figure: nfa for this grammar, with states labeled S and A and arcs labeled a and b]
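The idea behind the figure can be sketched in Python: each variable becomes a state, and each production A → xB or A → x becomes a chain of arcs spelling out x. This is an illustration under assumptions, not the construction from the book: it requires every right side to contain at least one terminal (true for this grammar), and the state names `q0`, `q1`, … and `FINAL` are invented here.

```python
def grammar_to_nfa(productions, variables):
    """Arcs (state, symbol, state) of an nfa for a right-linear grammar."""
    arcs, fresh = set(), 0
    for lhs, rhs in productions:
        if rhs[-1] in variables:             # form A -> xB
            terminals, target = rhs[:-1], rhs[-1]
        else:                                # form A -> x
            terminals, target = rhs, "FINAL"
        state = lhs
        for i, ch in enumerate(terminals):
            if i == len(terminals) - 1:
                nxt = target                 # last symbol lands on the target
            else:
                nxt = f"q{fresh}"            # intermediate state inside x
                fresh += 1
            arcs.add((state, ch, nxt))
            state = nxt
    return arcs

def accepts(arcs, start, w):
    """Simulate the nfa (it has no lambda-arcs) on the string w."""
    current = {start}
    for ch in w:
        current = {r for p, a, r in arcs if p in current and a == ch}
    return "FINAL" in current

# S -> aA, A -> abS | b
arcs = grammar_to_nfa([("S", "aA"), ("A", "abS"), ("A", "b")], {"S", "A"})
print(accepts(arcs, "S", "ab"), accepts(arcs, "S", "aabab"), accepts(arcs, "S", "aab"))
# True True False
```

Reading the accepted strings off the machine suggests that this grammar generates L((aab)*ab).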


Regular Grammars and Regular Expressions

Example: L(aab*a)

We can easily construct a regular grammar for this expression:

S → aA

A → aB

B → bB

B → a
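To see that this grammar and the expression aab*a agree, one can test membership both ways on all short strings. The Python sketch below is illustrative only: `derives` is a hypothetical helper that tries each production of a right-linear grammar in turn, and it assumes the terminal part of every production is non-empty so that the recursion always consumes input.

```python
import re
from itertools import product

# S -> aA, A -> aB, B -> bB | a
PRODUCTIONS = {"S": ["aA"], "A": ["aB"], "B": ["bB", "a"]}

def derives(var, w):
    """True if the variable `var` derives the terminal string w."""
    for rhs in PRODUCTIONS[var]:
        if rhs[-1] in PRODUCTIONS:           # form xB: strip x, continue from B
            x, nxt = rhs[:-1], rhs[-1]
            if x and w.startswith(x) and derives(nxt, w[len(x):]):
                return True
        elif w == rhs:                       # form x: must match exactly
            return True
    return False

def agrees_with_regex(max_len=7):
    """Compare the grammar with aab*a on all strings over {a, b} up to max_len."""
    pattern = re.compile(r"aab*a")
    for n in range(max_len + 1):
        for symbols in product("ab", repeat=n):
            w = "".join(symbols)
            if derives("S", w) != (pattern.fullmatch(w) is not None):
                return False
    return True

print(agrees_with_regex())
```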


Regular Languages

[Figure: regular expressions, regular grammars, and finite automata, three equivalent ways of describing the regular languages]