Lecture 1 String and Language

String

• A string is a finite sequence of symbols.

For example (each string is followed by the symbols it uses),

string (s, t, r, i, n, g)

CS4384 (C, S, 4, 3, 8)

101001 (1, 0)

• The symbols of a string are drawn from an alphabet.

• An alphabet is a finite set of symbols.

Examples of Alphabet

• {a, b, c, ..., x, y, z} (Roman alphabet)

• {0, 1, ..., 9}

• {0, 1} (binary alphabet)

Length of a String

• The length of a string x is the number of symbols contained in the string x, denoted by |x|.

• For example, |string| = 6, |CS5400| = 6, and |101001| = 6.

• The empty string is the string having no symbols, denoted by ε.

Equal

• Two strings x1x2···xn and y1y2···ym are equal if and only if

(1) n = m, and

(2) xi = yi for all i = 1, ..., n.

• For example, 01 ≠ 010 and 1010 ≠ 1110.

Substring

• s is a substring of x if there exist strings y and z such that x = ysz.

• In particular, when x = sz (y = ε), s is called a prefix of x; when x = ys (z = ε), s is called a suffix of x.

• For example, CS is a prefix of CS5400 and 5400 is a suffix of CS5400.
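The definitions of substring, prefix, and suffix can be checked directly in code. The following is a minimal sketch in Python; the function names `is_substring`, `is_prefix`, and `is_suffix` are illustrative choices, not part of the lecture.

```python
def is_substring(s: str, x: str) -> bool:
    """True if x = y + s + z for some (possibly empty) strings y and z."""
    n = len(x)
    return any(x[i:j] == s for i in range(n + 1) for j in range(i, n + 1))


def is_prefix(s: str, x: str) -> bool:
    """True if x = s + z for some string z (the case y = empty string)."""
    return x[:len(s)] == s


def is_suffix(s: str, x: str) -> bool:
    """True if x = y + s for some string y (the case z = empty string)."""
    return len(s) <= len(x) and x[len(x) - len(s):] == s


print(is_prefix("CS", "CS5400"), is_suffix("5400", "CS5400"))
```

Note that the empty string is a prefix, a suffix, and a substring of every string under these definitions.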

Concatenation

• The concatenation of two strings x and y is the string xy, i.e., x followed by y.

• For example, CS5400 is the concatenation of CS and 5400.

• In particular, we denote xx = x^2, xxx = x^3, xxxx = x^4, ..., and define x^0 = ε.

• For example, 101010 = (10)^3 and (10)^0 = ε.

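Solve the equation 011x = x011

• If x = ε, the equation holds.

• If |x| = 1, there is no solution.

• If |x| = 2, there is no solution.

• If |x| ≥ 3, then x begins with 011, so x = 011y. Hence 011x = x011 = 011y011, so x = y011, and therefore 011y = y011. The string y satisfies the same equation and is shorter than x.

• By induction on |x|, the solutions are x = (011)^k for k ≥ 0.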
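The solution set can be sanity-checked by brute force over short binary strings. This is only a sketch: the helper names `binary_strings` and `solutions` and the length bound 9 are assumptions made for illustration, and a finite search confirms the pattern without replacing the inductive argument.

```python
from itertools import product


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)


def solutions(max_len):
    """All x with |x| <= max_len satisfying 011x = x011."""
    return [x for x in binary_strings(max_len) if "011" + x == x + "011"]


found = solutions(9)
# Powers (011)^k whose length is at most 9, i.e. k = 0, 1, 2, 3.
expected = ["011" * k for k in range(4)]
print(found == expected)
```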

Language

• A language is a set of strings. For example, {0, 1}, {all English words}, and {0^0, 0^1, 0^2, ...} are all languages.

• The following are operations on sets and hence also on languages.

Union: A ∪ B

Intersection: A ∩ B

Difference: A \ B (also written A - B when B ⊆ A)

Complement: Ā = Σ* - A, where Σ* is the set of all strings on alphabet Σ.

Concatenation of Languages

• Concatenation: AB = {ab | a ∈ A, b ∈ B}

• For example, {0, 1}{1, 2} = {01, 02, 11, 12}.

• In particular, we denote A^1 = A, A^2 = AA, ..., and define A^0 = {ε}.
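For finite languages, concatenation and powers can be computed directly with Python sets. The sketch below represents a language as a `set` of `str` and ε as the empty string `""`; the names `concat` and `power` are illustrative.

```python
def concat(A, B):
    """AB = {ab | a in A, b in B} for finite languages A and B."""
    return {a + b for a in A for b in B}


def power(A, k):
    """A^k, with A^0 = {""} (the language containing only the empty string)."""
    result = {""}
    for _ in range(k):
        result = concat(result, A)
    return result


print(sorted(concat({"0", "1"}, {"1", "2"})))
print(sorted(power({"0", "1"}, 2)))
```

One consequence visible in this representation: concatenating with the empty language `set()` gives the empty language, while concatenating with `{""}` leaves a language unchanged.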

If AB = B for every language B, then A = {ε}.

• Choose B = {ε}. Then A = A{ε} = AB = B = {ε}; in particular, A is nonempty and cannot contain a nonempty string.

Examples

• For Σ = {0, 1}, Σ^2 = {00, 01, 10, 11}.

• (Σ^k is the set of all strings of length k on Σ.)

Therefore,

• Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ···.
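The set of all strings of length k on an alphabet can be enumerated with `itertools.product`. This is a small sketch; `sigma_power` is a hypothetical helper name, and symbols are assumed to be single characters.

```python
from itertools import product


def sigma_power(sigma, k):
    """All strings of length k over the alphabet sigma (a set of characters)."""
    return {"".join(t) for t in product(sorted(sigma), repeat=k)}


SIGMA = {"0", "1"}
print(sorted(sigma_power(SIGMA, 2)))
# A finite piece of the union defining the set of all strings: lengths 0..3.
up_to_3 = set().union(*(sigma_power(SIGMA, k) for k in range(4)))
print(len(up_to_3))
```

The set of all strings itself is infinite, so code can only ever materialize a bounded part of the union.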

Kleene Closure

• Kleene closure:

A* = A^0 ∪ A^1 ∪ A^2 ∪ ···

• Notation:

A^+ = A^1 ∪ A^2 ∪ A^3 ∪ ···

• A={grand, ε}, B={father, mother}.

What is A*B?

• A*B={father, mother, grandfather, grandmother, …}
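Because a Kleene closure is usually infinite, a program can only list its members up to some length bound. The sketch below does that for the grand/father example; `star_up_to` and the bound of 10 characters are assumptions made for illustration, with ε represented as `""`.

```python
def concat(A, B):
    """AB = {ab | a in A, b in B}."""
    return {a + b for a in A for b in B}


def star_up_to(A, max_len):
    """All strings in the Kleene closure of A whose length is at most max_len."""
    result = {""}      # A^0 contributes the empty string
    frontier = {""}    # strings discovered in the previous round
    while frontier:
        longer = {w for w in concat(frontier, A) if len(w) <= max_len}
        frontier = longer - result   # keep only strings not seen before
        result |= frontier
    return result


A = {"grand", ""}
B = {"father", "mother"}
a_star_b = concat(star_up_to(A, 10), B)
print(sorted(a_star_b, key=len))
```

With the bound of 10 on the closure of A, the result contains the plain, "grand", and "grandgrand" forms of both words; raising the bound adds longer chains.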

• What is ∅^0?

• What is ∅*?

• What is …?

• Here ∅ denotes the empty language.

A* = A^+ if and only if ε is in A

• If ε is in A, then ε is in A^+. Hence A* = A^+.

• If ε is not in A, then ε is not in A^+. Hence A* ≠ A^+.

{0, 10}* is the language of binary strings that do not contain the substring 11 and do not end with 1.

• What is the language of strings not containing substring 11 and ending with 0?

• {0, 10}+
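Both claims about {0, 10} can be checked by brute force on short strings. The sketch below compares the bounded closure with the strings picked out by the verbal description; the helper names and the bound `N = 8` are illustrative assumptions, and the check is evidence for the claim rather than a proof.

```python
from itertools import product


def concat(A, B):
    return {a + b for a in A for b in B}


def star_up_to(A, max_len):
    """All strings in the Kleene closure of A of length at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w for w in concat(frontier, A) if len(w) <= max_len} - result
        result |= frontier
    return result


def binary_strings(max_len):
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)


N = 8
star = star_up_to({"0", "10"}, N)
no_11_not_ending_1 = {w for w in binary_strings(N)
                      if "11" not in w and not w.endswith("1")}

# The empty string is not in {0, 10}, so the positive closure is the
# closure with the empty string removed.
plus = star - {""}
no_11_ending_0 = {w for w in binary_strings(N)
                  if "11" not in w and w.endswith("0")}

print(star == no_11_not_ending_1, plus == no_11_ending_0)
```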

Puzzle

• How many strings of length at most 40 are in the following language ?

552 }0,0,0,{

Lecture 2 Regular Language and Regular Expression.

Regular Languages

• The concept of regular languages on an alphabet Σ is defined recursively as follows:

(1) The empty language is regular.

(2) For every symbol a ∈ Σ, {a} is regular.

(3) If A and B are regular languages, then A ∪ B, AB, and A* are regular.

(4) Nothing else is a regular language.

{ε} is regular.

• Because the empty language ∅ is regular, ∅* = {ε} is regular.

For Σ={0,1}, {011} is regular.

• Since {0} and {1} are regular, {011}={0}{1}{1} is regular

• Remark: Every language containing only one string is regular.

{011,100} is regular.

• Because {011} and {100} are regular, {011, 100} = {011} ∪ {100} is regular.

• Remark: Every finite language is regular.

• Remark: Every infinite regular language must be obtained with Kleene closure.

Operation Precedence

• ({0}* ∪ {0}{1}{1}*){0}{0}{1}*

• (1) Kleene closure has higher precedence than union and concatenation.

• (2) Concatenation has higher precedence than union.

The language of all binary strings starting with 01 is regular.

Proof. Every string in this language has the form

01x1···xn

where x1···xn ∈ {0,1}*. Therefore, the language can be written as

{01}{0,1}* = ({0}{1})({0} ∪ {1})*,

which is regular.

The language of all binary strings ending with 01 is regular.

Proof. Every string in this language has the form

x1···xn01

where x1···xn ∈ {0,1}*. Therefore, the language can be written as

{0,1}*{01} = ({0} ∪ {1})*({0}{1}),

which is regular.

The language of all binary strings having substring 01 is regular.

Proof. Every string in this language has the form

x1···xn01y1···ym

where x1···xn, y1···ym ∈ {0,1}*. Therefore, the language can be written as

{0,1}*{01}{0,1}* = ({0} ∪ {1})*({0}{1})({0} ∪ {1})*,

which is regular.

Question:

Do you feel that the expression of the regular set in the above example contains too many parentheses?

• Here is a simpler notation: the regular expression.

Regular Expression

• (1) ∅ is a regular expression of the empty language.

• (2) ε is a regular expression of {ε}.

• (3) For any symbol a, a is a regular expression of {a}.

• (4) If rA and rB are regular expressions of languages A and B, then rA+rB is a regular expression of A ∪ B, rArB is a regular expression of AB, and rA* is a regular expression of A*.

Examples

• 011 is a regular expression of {0}{1}{1}.

• 0+1 is a regular expression of {0,1}.

• (0+1)* is a regular expression of {0,1}*.

• Remark: (0+1)^+ is also considered to be a regular expression of {0, 1}^+.

• The language of all binary strings starting with 01 has a regular expression

01(0+1)*.

• The language of all binary strings ending with 01 has a regular expression

(0+1)*01.

• The language of all binary strings having substring 01 has a regular expression

(0+1)*01(0+1)*.
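These three expressions can be tried out with Python's standard `re` module. One notational difference is an assumption to keep in mind: Python writes union as `|` rather than `+`, and `re.fullmatch` is used so that the whole string must belong to the language. The constant and function names below are illustrative.

```python
import re

# Lecture notation 01(0+1)*, (0+1)*01, (0+1)*01(0+1)* written in Python syntax.
STARTS_WITH_01 = re.compile(r"01(0|1)*")
ENDS_WITH_01 = re.compile(r"(0|1)*01")
CONTAINS_01 = re.compile(r"(0|1)*01(0|1)*")


def in_language(pattern, w):
    """True if the entire string w matches the regular expression."""
    return pattern.fullmatch(w) is not None


for w in ["01", "0110", "1101", "1100"]:
    print(w,
          in_language(STARTS_WITH_01, w),
          in_language(ENDS_WITH_01, w),
          in_language(CONTAINS_01, w))
```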

Induction Proof

• Because regular languages are defined recursively, we can prove that every regular language has a given property by proving the following:

(1) ∅ has the property.

(2) For any symbol a ∈ Σ, {a} has the property.

(3) If A and B have the property, then A ∪ B, AB, and A* all have the property.

• This is in fact a proof by induction: (1) and (2) serve as the basis step and (3) is the induction step.

• For a string x = x1x2…xn, x^R = xn…x2x1.

• For a language A, A^R = {x^R | x ∈ A}.

• Show that if A is regular, so is A^R.

Proof.

(1) ∅^R = ∅ is regular.

(2) For any symbol a, {a}^R = {a} is regular.

(3) Suppose that for regular languages A and B, A^R and B^R are regular. Then (A ∪ B)^R = A^R ∪ B^R is regular, (AB)^R = B^R A^R is regular, and (A*)^R = (A^R)* is regular.
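The three identities used in the induction step can be spot-checked on small finite languages. This is a sketch under assumptions: the sample languages `A` and `B`, the helper names, and the length bound 6 for the closure are all chosen for illustration, and the closure identity is only checked up to that bound.

```python
def concat(A, B):
    return {a + b for a in A for b in B}


def reverse(A):
    """Reversal of a language: reverse every string in A."""
    return {w[::-1] for w in A}


def star_up_to(A, max_len):
    """All strings in the Kleene closure of A of length at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w for w in concat(frontier, A) if len(w) <= max_len} - result
        result |= frontier
    return result


A = {"01", "0"}
B = {"1", "10"}

union_ok = reverse(A | B) == reverse(A) | reverse(B)
concat_ok = reverse(concat(A, B)) == concat(reverse(B), reverse(A))
star_ok = reverse(star_up_to(A, 6)) == star_up_to(reverse(A), 6)
print(union_ok, concat_ok, star_ok)
```

The order swap in the concatenation case matters: for these samples, `concat(reverse(A), reverse(B))` is a different set from `reverse(concat(A, B))`.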

Find a regular expression for {xwx^R | x ∈ (0+1)*, w ∈ (0+1)*}

• {xwx^R | x ∈ (0+1)*, w ∈ (0+1)*} = (0+1)*

Find a regular expression for {xwx^R | x ∈ (0+1)^+, w ∈ (0+1)*}

• {xwx^R | x ∈ (0+1)^+, w ∈ (0+1)*}

= 0(0+1)*0 + 1(0+1)*1
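The claimed expression can be compared against the set definition by brute force on short binary strings. The sketch below assumes Python's `re` syntax (union written `|`), a hypothetical helper `has_form_x_w_xr`, and a length bound of 10; agreement up to the bound supports the answer but is not a proof.

```python
import re
from itertools import product


def binary_strings(max_len):
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)


def has_form_x_w_xr(s):
    """True if s = x w x^R for some nonempty x and some (possibly empty) w."""
    # Try every length k >= 1 for x; x and its reversal must both fit in s.
    return any(s[:k][::-1] == s[len(s) - k:] for k in range(1, len(s) // 2 + 1))


# Lecture notation 0(0+1)*0 + 1(0+1)*1 written in Python syntax.
PATTERN = re.compile(r"0(0|1)*0|1(0|1)*1")

agree = all(has_form_x_w_xr(s) == (PATTERN.fullmatch(s) is not None)
            for s in binary_strings(10))
print(agree)
```

The intuition the check reflects is that only the first symbol of x matters: taking x to be that single symbol already forces the string to begin and end with the same symbol.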

Puzzle

• How many regular expressions can a language have?