Regular Expressions (RE's) – Review

A means of describing a possibly infinite language in finite terms. We aim to turn an RE into a Deterministic Finite State Automaton (DFA). Steps:

1: RE -> Non-Deterministic Finite State Automaton (NFA)
2: NFA -> DFA
3: DFA -> minDFA (minimised DFA)
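Step 2 above is usually done with the subset construction, where each DFA state is a set of NFA states. The following is a minimal Python sketch of that idea, not the course's own implementation. The example NFA (for words over {a, b} that end in "ab") and all state and function names are assumptions made for illustration.

```python
from collections import deque

def epsilon_closure(states, eps):
    """All NFA states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(start, accepting, delta, eps, alphabet):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    dfa_start = epsilon_closure({start}, eps)
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved |= delta.get((s, symbol), set())
            if not moved:
                continue  # no transition: the DFA rejects here
            target = epsilon_closure(moved, eps)
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {state for state in seen if state & accepting}
    return dfa_start, dfa_accepting, dfa_delta

def dfa_accepts(dfa, word):
    """Run the DFA over `word`, one symbol at a time."""
    state, accepting, delta = dfa[0], dfa[1], dfa[2]
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hypothetical NFA for words over {a, b} that end in "ab".
NFA_DELTA = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
NFA_EPS = {"s": {"q0"}}  # one epsilon move from the start state
DFA = nfa_to_dfa("s", {"q2"}, NFA_DELTA, NFA_EPS, "ab")
```

Calling dfa_accepts(DFA, "aab") follows exactly one path through the DFA, whereas the NFA may have to track several states at once. Minimisation (step 3) is not shown.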

The aim is to create a mechanism to recognise valid words in a language. In our course this means recognising words like int, float, public, etc. These are called Tokens. NB: it also classifies the Tokens!
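To make "recognise and classify" concrete, here is a small table-driven DFA in Python. It is only a sketch: the state names, the token classes (KEYWORD, IDENT, INT_LITERAL) and the keyword set are assumptions chosen for illustration, not a definition from the course.

```python
KEYWORDS = {"int", "float", "public"}

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch in "0123456789":
        return "digit"
    if ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z":
        return "letter"
    return "other"

# DFA transition table: (state, input class) -> next state.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("start", "digit"): "number",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
    ("number", "digit"): "number",
}

# Accepting states and the token class each one reports.
ACCEPTING = {"ident": "IDENT", "number": "INT_LITERAL"}

def classify(word):
    """Return the token class of `word`, or None if it is not a valid word."""
    state = "start"
    for ch in word:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None  # no transition: reject
    kind = ACCEPTING.get(state)
    if kind == "IDENT" and word in KEYWORDS:
        return "KEYWORD"  # keywords are identifiers with a reserved spelling
    return kind
```

The accepting state reached tells us which kind of Token was seen, which is how one automaton can both recognise and classify.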

JLex

Java version of Lex. We write a file containing RE's and JLex macros (a .lex file), run JLex over it, and a .java file is produced. We then ask the generated scanner for the next Token by calling next_token(). There is no need to code the DFA ourselves: it is generated automatically, which saves time.
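The calling pattern can be imitated in a few lines of Python. This is an analogy only and is not JLex output: the Lexer class, the token names and the rule list are assumptions, and Python's re module matches by backtracking rather than by building a DFA as JLex does.

```python
import re

# Rules are tried in order, so KEYWORD is listed before IDENT.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:int|float|public)\b"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

class Lexer:
    """Hand-written stand-in for a generated scanner."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self):
        """Return the next (kind, lexeme) pair, or ("EOF", "") at end of input."""
        while self.pos < len(self.text):
            match = MASTER.match(self.text, self.pos)
            if match is None:
                bad = self.text[self.pos]
                raise SyntaxError(f"unexpected character {bad!r} at position {self.pos}")
            self.pos = match.end()
            if match.lastgroup != "SKIP":  # whitespace produces no token
                return (match.lastgroup, match.group())
        return ("EOF", "")
```

A parser would call next_token() repeatedly until it receives the EOF token.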

Limitations of RE's

Say we define the following RE's:

digits = [0-9]+
sum = (digits "+")* digits

We can then define sums like 3+78+9 etc.
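The two definitions above translate directly into a single ordinary regular expression once the digits abbreviation is substituted in. A quick sketch with Python's re module (the helper name is_sum is an assumption for illustration):

```python
import re

DIGITS = r"[0-9]+"
# sum = (digits "+")* digits, with the abbreviation expanded in place.
SUM = re.compile(rf"(?:{DIGITS}\+)*{DIGITS}")

def is_sum(text):
    """True if the whole of `text` is a flat sum such as 3+78+9."""
    return SUM.fullmatch(text) is not None
```

Substitution works here because digits does not refer back to sum, so the expansion terminates.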

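If we could instead write:

digits = [0-9]+
sum = expr "+" expr
expr = "(" sum ")" | digits

we could define (1+(5+8)) etc. But sum and expr are now defined in terms of each other, so the abbreviations can no longer be expanded into a single RE.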

It is impossible for an RE to recognise balanced parentheses. A machine with only N states can only keep track of at most N levels of parenthesis nesting. Therefore we need a new notation to represent the language above. We move on to Context Free Grammars.
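For contrast, the nested language is easy to recognise once recursion is allowed. The sketch below is a hand-written recursive recogniser for the sum/expr definitions, with one function per definition. The function names are assumptions, and the Python call stack supplies the unbounded memory that a finite automaton lacks.

```python
def parse_digits(s, i):
    """Match [0-9]+ starting at index i; return the end index or None."""
    j = i
    while j < len(s) and s[j] in "0123456789":
        j += 1
    return j if j > i else None

def parse_expr(s, i):
    """expr = "(" sum ")" | digits"""
    if i < len(s) and s[i] == "(":
        j = parse_sum(s, i + 1)
        if j is not None and j < len(s) and s[j] == ")":
            return j + 1
        return None
    return parse_digits(s, i)

def parse_sum(s, i):
    """sum = expr "+" expr"""
    j = parse_expr(s, i)
    if j is None or j >= len(s) or s[j] != "+":
        return None
    return parse_expr(s, j + 1)

def is_expr(s):
    """True if the whole string is an expr."""
    return parse_expr(s, 0) == len(s)
```

Each open parenthesis triggers a recursive call, so the nesting depth is limited only by the stack and not by a fixed number of states.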

Context Free Grammars (CFG's)

RE's define lexical structure declaratively. Similarly, CFG's define syntactic structure declaratively.

Definitions: A language is a set of strings. Each string is a finite sequence of symbols. Symbols come from a finite alphabet. A CFG describes a language and is made up of productions, e.g.

symbol -> sym1 sym2 sym3 ... sym(N)

Symbols are either:

1: Terminal <--> Token
2: Non Terminal: a variable that denotes a set of strings.
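One way to hold these definitions as data is to map each non-terminal to its list of productions. The sketch below reuses the sum/expr grammar from the RE limitations slide. The dictionary layout and the helper name terminals are assumptions for illustration, and digits is treated as a Token supplied by the lexer.

```python
# Each key is a non-terminal; each value lists the right-hand sides
# of its productions, e.g. expr -> ( sum )  and  expr -> digits.
GRAMMAR = {
    "sum": [["expr", "+", "expr"]],
    "expr": [["(", "sum", ")"], ["digits"]],
}

# Non-terminals are exactly the symbols that have productions.
NON_TERMINALS = set(GRAMMAR)

def terminals(grammar):
    """Symbols that appear on a right-hand side but have no productions."""
    return {
        symbol
        for alternatives in grammar.values()
        for rhs in alternatives
        for symbol in rhs
        if symbol not in grammar
    }
```

Here terminals(GRAMMAR) collects "+", "(", ")" and "digits", which are the Tokens the lexer would deliver to the parser.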