57
Automata and Language Theory First Semester SY 2009-2010

1 - Language

Embed Size (px)

Citation preview

Page 1: 1 - Language

Automata and Language TheoryFirst Semester SY 2009-2010

Page 2: 1 - Language

To understand language To know the fundamental units of

language

Page 3: 1 - Language

How can you define a language? English Filipino Japanese Chinese Etc…

Page 4: 1 - Language

English Different entities: letters, words, and

sentences Groups of letters make up words Groups of words make up sentences Not all collection of letters form a valid

word and as well as not all collection of words form a valid sentence.

Page 5: 1 - Language

Computer Languages Certain character strings are recognizable words

(DO, IF, END, …) Strings of words that are recognizable commands

Sets of commands becomes a program ( with or without data) that can be compiled.

It is necessary for us to adopt a definition of a “Language Structure” that is, a structure in which the decision of whether a given string of units constitutes a valid larger unit is not a matter of guesswork, but based on explicitly stated rules.

Page 6: 1 - Language

Theory of Formal Languages “formal” refers to the fact that all rules

for the languages are explicitly stated in terms of what strings of symbols can occur.

No liberties are tolerated, and no reference to any “deeper understanding” is required.

Page 7: 1 - Language

Languages - will be considered solely as symbols on paper and not as expressions of ideas in the minds of humans. Not a communication among intellects, but

a game of symbols with formal rules. The term “formal”, emphasizes that it is

the form of the string of symbols we are interested in, not the meaning.

Page 8: 1 - Language

Alphabet – finite set of fundamental units out of which we build structures.

Language – a certain specified set of strings of characters from the alphabet.

Words – those strings that are permissible in the language.

Page 9: 1 - Language

(empty string or null string) – a string having no letter

No matter what “alphabet” we are considering, the null string will always and for all languages the null word, if it is a word in the language, is also

Two words are considered the same if all their letters are the same and in the same order.

Page 10: 1 - Language

- language that has no words A subtle but important difference between a word

that has no letters, , and the language that has no words,

It is not true that is a word in the language since this language has no words at all.

A certain language L does not contain the word and we wish to add it to L, we use the “union of sets” operation denoted by “+” to form L + {}.

Page 11: 1 - Language

A certain language L does not contain the word and we wish to add it to L, we use the “union of sets” operation denoted by “+” to form L + {}. This language is not equal or same as L. Nevertheless, L + is the same as L since no new

word have been added. The fact that is a language even though it

has no words will turn out to be an important distinction.

Page 12: 1 - Language

English – familiar example of language The alphabet is the usual set of letters plus the

apostrophe and hyphen. Let us denote it by a Greek letter capital sigma:

= { a b c d e . . . Z ’ -}

, , , - time-honored tradition If we wished to be supermeticulous, we would

also include in the uppercase letters and the seldom used diacritical marks.

Page 13: 1 - Language

Valid Words in the Language- Strings, more or less, as is done in a dictionary

say,ENGLISH-WORDS={all the words in a standard

dictionary}

{} – mathematical notational denoting a set

o Does not have any grammaro Formal definition of language of the sentences

in English , we must begin by saying that this time our basic alphabet is the entries in the dictionary.

Page 14: 1 - Language

Consider , the capital gamma, as the alphabet

= {the entries in a standard dictionary, plus a blank space, plus the usual punctuation marks}

Language, ENGLISH-SENTENCESRule: Grammatical RulesThe trick of defining the language ENGLISH-SENTENCES by listing all the rules of English grammar allows us to give a finite description of an infinite language.

Page 15: 1 - Language

If we go by the rules of grammar only,Examine:

“I ate three Tuesdays”- this string is allowed in formal language- grammatically correct; only its meaning reveals that it is ridiculous

Meaning is something that we do not refer to in formal languages.

Syntax, not semantics or diction “just like the bad teacher who is interested only

in the correct spelling, not the ideas in a homework composition”

Page 16: 1 - Language

Two ways of defining Abstract Languages we treat Either they will be presented as an alphabet and the

exhaustive list of all valid words, or They will be presented as an alphabet and a set of

rules defining the acceptable words. Defining a Language – presenting the alphabet

and then specifying which strings are words. The word “specify” is trickier than we may at

first suppose

Page 17: 1 - Language

Consider the Language MY-PET and the alphabet for this language is{a c d g o t}- there is only one word for this language, and for

our own perverse reasons we wish to specify it by this sentence:

If the Earth and the Moon ever collide, thenMY-PET = {cat}

but, if the Earth and the Moon never collide, then

MY-PET = {dog}

Page 18: 1 - Language

MY-PET issue One or the other of these two events will occur, but

at this point in history of the universe, it is impossible to be certain whether the word dog is or is not in the language MY-PET.

This sentence is therefore not an adequate specification of the language MY-PET because it is not useful

To be an acceptable specification of a language, a set of rules must enable us to decide, in a finite amount of time, whether a given string of alphabet letters is or is not a word in the language.

Page 19: 1 - Language

MY-PET issue continued… It is not a requirement that all the letters in

the alphabet need to appear in the words selected for the language.

Page 20: 1 - Language

Language Different entities: letters/alphabets, words, and sentences

empty string or null string a string having no letter

A language with no words

+ union of sets

Page 21: 1 - Language

Automata and Language Theory

Page 22: 1 - Language

Define words in a language

Page 23: 1 - Language

The set of language-defining rules can be of two kinds- They can either tell us how to test a

string of alphabet letters that we might be presented with, to see if it is a valid word, or

- They can tell us how to construct all the words in the language by some clear procedures

Page 24: 1 - Language

Consider an alphabet having only one letter, the letter x,

= {x}Language definition: any nonempty string of

alphabet characters is a word:L1= {x xx xxx xxxx . . .}

Alternate form:L1= {xn for n = 1 2 3 . . . }

Page 25: 1 - Language

Concatenation – two strings are written down side by side to form a new longer string Example: to concatenate the word xxx and

the word xx, we obtain the word xxxxx. The words in this language are analogous to

positive integers, and the operation of concatenation is analogous to addition.

xn concatenated with xm is the word xn+m

Page 26: 1 - Language

Concatenation continued… It is often convenient to designate the

words in a given language by new symbols, that is, other than the ones in the alphabet. Say,

The word xxx is called a and the word xx is b. Concatenating a and b,

ab = xxxxx It is not always true that when two words

are concatenated they produce another word in the language

Page 27: 1 - Language

Concatenation continued… For example, the language is

L2 = {x xxx xxxxx xxxxxxx . . . }

= {xodd} = {x2n+1 for n = 0 1 2 3 . . .}

Then, a = xxx and b = xxxxx are both words in L2, but their concatenation ab = xxxxxxxx is not in L2.

In these simple examples, when we concatenate a with b, we get the same word as when we concatenate b with a.

ab = ba

Page 28: 1 - Language

Concatenation continued…ab = ba

This relationship does not hold for all languages. In English when we concatenate “house” and “boat,” we get “houseboat,” which is indeed a word but distinct from “boathouse,” which is different thing – not because they have different meanings, but because they are different words.

“Merry-go-round” and “carousel” mean the same thing, but they are different words.

Page 29: 1 - Language

Example Consider another language. Let us begin

with the alphabet: = { 0 1 2 3 4 5 6 7 8 9 }

and define the set of words:L3= {any finite string of alphabet letters that does not start with the letter zero}L3= { 1 2 3 4 5 6 7 8 9 10 11 12 . . . }

Page 30: 1 - Language

Defining Language rules … how to test …

L1= {xn for n = 1 2 3 . . . }

… how to construct … L3= {any finite string of alphabet letters that

does not start with the letter zero}

Page 31: 1 - Language

DEFINITION We define the function length of a string to be a

number of letters in the string.Example:

a = xxxx in the language L1

length(a) = 4

if c = 428 in the language L3

length© = 3Or, we could write directly that in L1

length(xxxx) = 4and in L3

length(428) = 3

Page 32: 1 - Language

Example:length() = 0

if length(w) = 0, then w = .

therefore, another definition of L3

L3={ any finite string of alphabet letters that, if it has a length more than 1, does not start with a zero}

Page 33: 1 - Language

There is some inherent ambiguity in the phrase “any finite string,” since it is not clear whether we intend to include the null string (, the string of no letters). We’ll define a language like L1 but that does

contain .

L4 = { x xx xxx xxxx . . . }

= { xn for n = 0 1 2 3 . . . }

Here x0 = , not x0 = 1 as in algebra.

Page 34: 1 - Language

DEFINITION Let us introduce the function reverse. If a

is a word in some language L, then reverse(a) is the same string of letters spelled backward, called the reverse of a, even if this backward string is not a word in L.

Page 35: 1 - Language

EXAMPLEreverse(xxx) = xxxreverse(xxxx) = xxxxreverse(145) = 541

But let us also note that in L3

reverse(140) = 041which is not a word in L3.

Page 36: 1 - Language

DEFINITIONLet us define a new language called PALINDROME over the alphabet = {a b}

PALINDROME = {, all strings x such that reverse(x)=x}

If we begin listing the elements in PALINDROME, we findPALINDROME = { a b aa bb aaa aba bab bbb

aaaa abba . . .}*Concatenation???

Page 37: 1 - Language

DEFINITIONGiven an alphabet , we wish to define a language in which any string of letters from is a word, even the null string. This language we shall call the closure of the alphabet. It is denoted by writing a star (an asterisk) after the name of the alphabet as a superscript:

*

This notation is sometimes known as the Kleene star after the logician who was one of the founders of this subject.

Page 38: 1 - Language

EXAMPLEIf = {x}, then

*=L4= { x xx xxx . . .}

EXAMPLEIf = { 0 1 }, then

* = { 0 1 00 01 10 11 000 001 . . .}

EXAMPLEIf = {a b c}, then

* ={ a b c aa ab ac ba bb bc ca cb cc aaa …}

Page 39: 1 - Language

We can think of the Kleene star as an operation that makes an infinite language of strings of letters out of an alphabet. Infinitely many words, each of finite length.

Lexicographic order words arranged in size order (words of shortest length first) and

then listed all the words of the same length alphabetically. We shall usually follow this method of sequencing a language.

DEFINITIONIf S is a set of words, then by S* we mean the set of all finite strings formed by concatenating words from S, where any word may be used as often as we like, and where the null string is also included.

Page 40: 1 - Language

Let us not make the mistake of confusing the two language

ENGLISH-WORDS* and ENGLISH-SENTENCES

the first language contains the word butterbutterbutterhat, the second does not.

Words in the ENGLISH-WORDS are the concatenate of arbitrarily many words from ENGLISH-WORDS

While ENGLISH-SENTENCES are restricted to juxtaposing only words from ENGLISH-WORDS in an order that compiles with the rules of grammar.

Page 41: 1 - Language

EXAMPLEIf S = {aa b}, then

S*= { plus any word composed of factors of aa and b}

= { plus all strings of a’s and b’s in which the a’s occur in even clumps}

= { b aa aab baa bbb aaaa aabb baab bbaa

bbbb aaaab aabaa aabbb baaaa baabb bbaab … }

The string aabaaab is not S* since it has a clump of a’s of length 3.

Page 42: 1 - Language

EXAMPLEIf S = {a ab}, then S*= { plus any word composed of factors of a and ab}

= { plus all strings of a’s and b’s except those that

start with b and those that contain double b} = { a aa ab aaa aab aba aaaa aaab aaba

abaa abab aaaaa aaaab aaaba aabaa aabab

abaaa abaab ababa ...}

Page 43: 1 - Language

To prove that a certain word is in the closure language S*, we must show how it can be written as a concatenate of words from the base set S.

In the last example, to show that abaab is in S*, we can factor as follows:

(ab)(a)(ab)These three factors are all in the set S; therefore, their concatenation is in S*. This is the only way to factor this string into factors of (a) and (ab). When this happens, we say that the factoring is unique.

Page 44: 1 - Language

Sometimes factoring is not unique. For example, consider S= {xx xxx}. Then S*= { and all strings of more than one x }

= {xn for n = 0 2 3 4 5 . . .} = { xx xxx xxxx xxxxx xxxxxx . . . }

Notice that the word x is not in the language S*. The string xxxxxxx is in the closure for any of these three reasons. It is

(xx)(xx)(xxx) or (xx)(xxx)(xx) or (xxx)(xx)(xx)Also, x6 is either x2x2x2 or else x3x3.

Page 45: 1 - Language

Proof by constructive algorithm - the method of proving that something exists by showing how to create it.

If the alphabet has no letters, then its closure is the language with the null string as its only word, because is always a word in a Kleene closure. Symbolically, we write

If = (the empty set), *= { }

This is not the same as if S = { },then S*= { }

which is also true for a different reason, that is = .

Page 46: 1 - Language

The Kleene closure of two sets can end up being he same language even if the two sets that we started with were not.

EXAMPLEConsider the following languages

S = { a b ab} and T= {a b bb}Then both S* and T* are languages of a’s and b’s since any string of a’s and b’s can be factored into syllables or either (a) and (b), both of which are in S and T.

Page 47: 1 - Language

If for some reason we with to modify the concept of closure to refer only to the concatenation of some (not zero) string for a set S, we use the notation + instead of *. For Example

if S={x},then S+ = {x xx xxx xxxx ….}(S+ = S* except the word ^)

Page 48: 1 - Language

This is not to say that S+ cannot, in general, contain a word ^. It can, but only on the condition that S contains the word I initially.

What happens if we apply the closure operator twice?

(S*)* or S**

Page 49: 1 - Language

Theorem 1 For any set S of strings we have S*=S**

Proof It is analogous to say that if people are

made up of molecules and molecules are made up of atoms, the people are made up of atoms

Page 50: 1 - Language

Every words in S** is made up of factors from S*. Every factor from S* is made up of factors from S. Therefore, every word in S** is made up of factors from S. Therefore, every word in S** is also a word in S*. We can write

S** S* means “is contained in or equal

to.”

Page 51: 1 - Language

Now, in general, it is true that for any set A we know that AA*, since A* we can chose as a word any one from A. So if we consider A to be our set S*, we have

S* S**

Together, these two inclusions prove that

S*=S**

Page 52: 1 - Language

Consider the language S*, where S = {ab ba}. a. Write at least 30 words in S*. b. Can any of this language contain the substring aaa or bbb?c. Give another description of this

language.

Page 53: 1 - Language

Consider the language S*, where S={a b}. How many word does this language have:

a. of length 2b. of length 3c. of length n

Page 54: 1 - Language

Consider the language S*, where S={aa aba baa}.

a. Show that the words:aabaa, baaabaaa, &

baaaaababaaaa are all in this language.

b. Can any in this language have an odd total number

Page 55: 1 - Language

1. Consider the language S*,where S = {aa ab ba bb}

Give another description of this language

2. Give an example of set S such that S* only contains all possible strings of a’s and b’s that have length divisible by 3.

Page 56: 1 - Language

3. Prove that for all sets S.a. (S+)* = (S*)*b. (S+)+ = S+

c. is (S*)+ = (S+)* for all sets S?

Page 57: 1 - Language