damon-fisher
Regular Expressions (RE's) - Review
A means of describing a possibly infinite language in finite terms. We aim to turn an RE into a Deterministic Finite State Automaton (DFA).
Steps:
1: RE -> Non-Deterministic Finite State Automaton (NFA)
2: NFA -> DFA
3: DFA -> minDFA (minimised DFA)
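Step 2 is usually done with the subset construction, in which each DFA state stands for a set of states of the non-deterministic automaton. The following is a minimal sketch of that idea, not the code any particular tool generates. The example automaton (for the RE (a|b)*ab) and all function names are assumptions made for illustration.

```python
from collections import deque


def epsilon_closure(states, eps):
    """All NFA states reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def nfa_to_dfa(start, accepting, delta, eps, alphabet):
    """Subset construction: every DFA state is a frozenset of NFA states."""
    dfa_start = epsilon_closure({start}, eps)
    dfa_delta, seen, queue = {}, {dfa_start}, deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved |= delta.get((s, symbol), set())
            if not moved:
                continue  # no transition on this symbol (implicit dead state)
            target = epsilon_closure(moved, eps)
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {state for state in seen if state & accepting}
    return dfa_start, dfa_accepting, dfa_delta


def dfa_accepts(dfa, word):
    """Run the DFA over `word`; a missing transition means rejection."""
    state, accepting, delta = dfa
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting


# Hypothetical NFA for (a|b)*ab: state 0 loops on a and b,
# guesses the final "ab" by moving 0 -a-> 1 -b-> 2 (accepting).
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
DFA = nfa_to_dfa(start=0, accepting={2}, delta=NFA_DELTA, eps={}, alphabet="ab")
```

Calling dfa_accepts(DFA, "aab") follows a single path through the DFA, whereas the NFA would have to guess when the final "ab" begins. Minimisation (step 3) is not shown here.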
The aim is to create a mechanism to recognise valid words in a language. In our course this means recognising words like int, float, public etc. These are called Tokens. NB: The mechanism also classifies the Tokens!
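To make "recognise and classify" concrete, here is a small sketch that uses Python's re module in place of a generated DFA. The token categories (KEYWORD, IDENTIFIER, NUMBER, SYMBOL) and the keyword set are assumptions chosen for illustration, not the course's actual token set.

```python
import re

# Assumed keyword set, taken from the words mentioned in the notes.
KEYWORDS = {"int", "float", "public"}

# One named group per token class; the first alternative that matches wins.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>[0-9]+(?:\.[0-9]+)?)
  | (?P<WORD>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<SYMBOL>[+\-*/=;(){}])
  | (?P<SPACE>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Split `text` into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SPACE":
            continue
        if kind == "WORD":
            # Classification step: a word is either a keyword or an identifier.
            kind = "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, lexeme))
    return tokens
```

For example, tokenize("public int x = 42;") yields pairs such as ("KEYWORD", "public") and ("IDENTIFIER", "x"), so each word is both recognised and classified.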
JLex
JLex is the Java version of Lex. We write a file containing RE's and JLex macros (a .lex file) and run JLex over it, which produces a .java file. We then obtain each Token by calling next_token() on the generated lexer. There is no need to code the DFA ourselves: it is generated automatically, which saves time.
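To see what a generator saves us from, here is a rough sketch of a hand-written, table-driven DFA lexer. It is written in Python rather than Java and is not JLex output: the class TinyLexer, its transition table and its token names are all hypothetical, and only the method name next_token is borrowed from the notes.

```python
def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if "0" <= ch <= "9":
        return "digit"
    if ch == "_" or (ch.isascii() and ch.isalpha()):
        return "letter"
    return "other"


# Hand-coded DFA: (state, input class) -> next state.
TRANSITIONS = {
    ("start", "digit"): "number",
    ("number", "digit"): "number",
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}
# Accepting states and the token kind each one reports.
ACCEPTING = {"number": "NUMBER", "ident": "IDENT"}


class TinyLexer:
    """Hypothetical lexer exposing a next_token() method."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def next_token(self):
        """Return the next (kind, lexeme) pair, or None at end of input."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.text):
            return None
        state, start, i = "start", self.pos, self.pos
        last_accept = None
        while i < len(self.text):
            nxt = TRANSITIONS.get((state, char_class(self.text[i])))
            if nxt is None:
                break
            state, i = nxt, i + 1
            if state in ACCEPTING:
                last_accept = (ACCEPTING[state], i)  # remember the longest match
        if last_accept is None:
            raise ValueError(f"unexpected character {self.text[start]!r}")
        kind, end = last_accept
        self.pos = end
        return kind, self.text[start:end]
```

Even for two token classes the table has to be written and kept correct by hand, which is the work a lexer generator automates.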
Limitations of RE's
Say we define the following RE's:
digits = [0-9]+
sum = (digits "+")* digits
With these we can define sums like 3+78+9 etc.
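These two definitions translate directly into a pattern for Python's re module. This is a quick sketch for checking the examples; the helper name is_sum is an assumption.

```python
import re

DIGITS = r"[0-9]+"
# sum = (digits "+")* digits, with the literal plus sign escaped.
SUM = rf"(?:{DIGITS}\+)*{DIGITS}"


def is_sum(text):
    """True if the whole of `text` matches the sum RE."""
    return re.fullmatch(SUM, text) is not None
```

With this, is_sum("3+78+9") holds, while a string with a trailing plus such as "3+" is rejected.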
If we have:
digits = [0-9]+
sum = expr "+" expr
expr = "(" sum ")" | digits
we can define (1+(5+8)) etc.
However, these definitions refer to each other recursively, so they are no longer RE's. It is impossible for an RE to recognise balanced parentheses: a machine with only N states can recognise at most N levels of parenthesis nesting. Therefore we need a new notation to represent the language above. We move on to Context Free Grammars.
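The recursive definitions of sum and expr can be recognised by a pair of mutually recursive functions, where the call stack keeps track of the nesting depth that a finite automaton cannot. This is a minimal sketch working directly on characters; the function names are assumptions.

```python
def parse_expr(text, pos=0):
    """expr = "(" sum ")" | digits. Returns the position just after the expr."""
    if pos < len(text) and text[pos] == "(":
        pos = parse_sum(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        return pos + 1
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos:
        raise ValueError(f"expected digits or '(' at position {pos}")
    return end


def parse_sum(text, pos):
    """sum = expr "+" expr. Returns the position just after the sum."""
    pos = parse_expr(text, pos)
    if pos >= len(text) or text[pos] != "+":
        raise ValueError(f"expected '+' at position {pos}")
    return parse_expr(text, pos + 1)


def is_expr(text):
    """True if the whole of `text` is a valid expr."""
    try:
        return parse_expr(text) == len(text)
    except ValueError:
        return False
```

Here is_expr("(1+(5+8))") succeeds, and dropping the last closing parenthesis makes it fail. Note that with the definitions exactly as given, a bare 1+2 is a sum but not an expr.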
Context Free Grammars (CFG's)
RE's define lexical structure declaratively. Similarly, CFG's define syntactic structure declaratively.
Definitions: A language is a set of strings. Each string is a finite sequence of symbols. Symbols come from a finite alphabet. A CFG describes a language and is made up of productions, e.g.
symbol -> sym1 sym2 sym3 ... sym(N)
Symbols are either:
1: Terminal <--> Token
2: Non Terminal: a variable that denotes a set of strings.
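One possible way to hold a CFG as data is a mapping from each non-terminal to its list of productions. The sketch below reuses the sum/expr grammar from earlier and treats digits as a token supplied by the lexer. The representation and the helper derive are assumptions for illustration.

```python
# Each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "sum": [["expr", "+", "expr"]],
    "expr": [["(", "sum", ")"], ["digits"]],
}

# Non-terminals have productions; every other symbol is a terminal (a token).
NON_TERMINALS = set(GRAMMAR)
TERMINALS = {
    sym
    for alternatives in GRAMMAR.values()
    for rhs in alternatives
    for sym in rhs
    if sym not in GRAMMAR
}


def derive(start, choices):
    """Leftmost derivation: expand the leftmost non-terminal once per choice.

    Each entry of `choices` is the index of the production to apply.
    """
    form = [start]
    for choice in choices:
        idx = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[idx:idx + 1] = GRAMMAR[form[idx]][choice]
    return form
```

For instance, derive("expr", [0, 0, 1, 1]) rewrites expr step by step until only tokens remain, giving the token sequence ( digits + digits ).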