Regular Expressions & Automata
Nelson Padua-Perez, Bill Pugh
Department of Computer Science, University of Maryland, College Park



Page 2: Regular Expressions & Automata

Overview

Regular expressions
  Notation
  Patterns
  Java support

Automata
  Languages
  Finite State Machines
  Turing Machines
  Computability

Page 3: Regular Expressions & Automata

Regular Expression (RE)

Notation for describing simple string patterns
Very useful for text processing
  Finding / extracting patterns in text
  Manipulating strings
  Automatically generating web pages

Page 4: Regular Expressions & Automata

Regular Expression

A regular expression is composed of
  Symbols
  Operators
    Concatenation  AB
    Union          A | B
    Closure        A*

Page 5: Regular Expressions & Automata

Definitions

Alphabet
  Set of symbols
  Examples: {a, b}, {A, B, C}, {a-z, A-Z, 0-9}, …
Strings
  Sequences of 0 or more symbols from the alphabet
  Examples: ε, “a”, “bb”, “cat”, “caterpillar”, …
  ε denotes the empty string
Languages
  Sets of strings
  Examples: ∅, {ε}, {“a”}, {“bb”, “cat”}, …

Page 6: Regular Expressions & Automata

More Formally

A regular expression describes a language over an alphabet
L(E) is the language for regular expression E
  Set of strings generated from the regular expression
  A string is in the language if it matches the pattern specified by the regular expression

Page 7: Regular Expressions & Automata

Regular Expression Construction

Every symbol is a regular expression
  Example: “a”
REs can be constructed from other REs using
  Concatenation
  Union |
  Closure *

Page 8: Regular Expressions & Automata

Regular Expression Construction

Concatenation
  A followed by B
  L(AB) = { st | s ∈ L(A) AND t ∈ L(B) }
  Examples
    a   → {“a”}
    ab  → {“ab”}

Page 9: Regular Expressions & Automata

Regular Expression Construction

Union
  A or B
  L(A | B) = L(A) union L(B) = { s | s ∈ L(A) OR s ∈ L(B) }
  Example
    a | b  → {“a”, “b”}

Page 10: Regular Expressions & Automata

Regular Expression Construction

Closure
  Zero or more A
  L(A*) = { s | s = ε OR s ∈ L(A)L(A*) }
        = { s | s = ε OR s ∈ L(A) OR s ∈ L(A)L(A) OR ... }
  Examples
    a*      → {ε, “a”, “aa”, “aaa”, “aaaa”, …}
    (ab)*c  → {“c”, “abc”, “ababc”, “abababc”, …}
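The three set definitions above can be tried out directly on small finite languages. The following is a minimal sketch, not part of the original slides: the class name LanguageOps and its methods are hypothetical, languages are modeled as sets of Java strings with "" standing in for ε, and the closure is cut off after a fixed number of repetitions because L(A*) is infinite. It assumes Java 9 or later for Set.of.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: finite languages modeled as sets of strings.
class LanguageOps {
    // L(AB) = { st | s in L(A) AND t in L(B) }
    static Set<String> concat(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>();
        for (String s : a) {
            for (String t : b) {
                result.add(s + t);
            }
        }
        return result;
    }

    // L(A | B) = L(A) union L(B)
    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Finite approximation of L(A*): concatenations of at most maxRepeats strings from A.
    static Set<String> closureUpTo(Set<String> a, int maxRepeats) {
        Set<String> current = new TreeSet<>();
        current.add("");                      // zero repetitions: the empty string
        Set<String> result = new TreeSet<>(current);
        for (int i = 0; i < maxRepeats; i++) {
            current = concat(current, a);     // one more repetition
            result.addAll(current);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("a");
        Set<String> b = Set.of("b");
        System.out.println(concat(a, b));                 // language of ab
        System.out.println(union(a, b));                  // language of a | b
        System.out.println(closureUpTo(a, 3));            // start of the language of a*
        System.out.println(concat(closureUpTo(Set.of("ab"), 2), Set.of("c"))); // start of (ab)*c
    }
}
```

Sorted sets are used only so that the printed order is stable; the definitions themselves do not depend on ordering.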

Page 11: Regular Expressions & Automata

Regular Expressions in Java

Java supports regular expressions
  In java.util.regex.*
  Applies to the String class as of Java 1.4
Introduces additional specification methods
  Simplifies specification
  Does not increase power of regular expressions
  Can simulate with concatenation, union, closure

Page 12: Regular Expressions & Automata

Regular Expressions in Java

Concatenation
  ab       → “ab”
  (ab)c    → “abc”
Union (bar | or square brackets [ ] for chars)
  a | b    → “a”, “b”
  [abc]    → “a”, “b”, “c”
Closure (star *)
  (ab)*    → ε, “ab”, “abab”, “ababab”, …
  [ab]*    → ε, “a”, “b”, “aa”, “ab”, “ba”, “bb”, …
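The patterns on this slide can be checked with Pattern.matches, which tests whether an entire string is in the language of a regular expression. This is a small sketch with a hypothetical class and helper name. One detail to watch: the slide writes the union as a | b with spaces for readability, but in an actual Java pattern those spaces would be matched literally, so the code writes a|b.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the three basic operators.
class BasicOperatorsDemo {
    // True if the whole input string is in the language of the regex.
    static boolean inLanguage(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(inLanguage("(ab)c", "abc"));  // concatenation
        System.out.println(inLanguage("a|b", "b"));      // union with bar
        System.out.println(inLanguage("[abc]", "c"));    // union with square brackets
        System.out.println(inLanguage("(ab)*", ""));     // closure includes the empty string
        System.out.println(inLanguage("[ab]*", "ba"));   // any mix of a's and b's
        System.out.println(inLanguage("(ab)*", "aba"));  // not a whole number of "ab" repeats
    }
}
```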

Page 13: Regular Expressions & Automata

Regular Expressions in Java

One or more (plus +)
  a+       → one or more “a”s
Range (dash -)
  [a-z]    → any lowercase letter
  [0-9]    → any digit
Complement (caret ^ at beginning of a bracket expression)
  [^a]     → any symbol except “a”
  [^a-z]   → any symbol except lowercase letters
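A short sketch of these three shorthands, again using a hypothetical helper around Pattern.matches. It also illustrates the earlier point that the extra notation adds convenience rather than power: a+ accepts the same strings as aa*.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for plus, range, and complement.
class CharClassDemo {
    static boolean inLanguage(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Plus: one or more, so the empty string is rejected
        System.out.println(inLanguage("a+", "aaa"));
        System.out.println(inLanguage("a+", ""));
        // The same language written with only concatenation and closure
        System.out.println(inLanguage("aa*", "aaa"));
        // Range: a plain hyphen between the two endpoints
        System.out.println(inLanguage("[a-z]", "q"));
        System.out.println(inLanguage("[0-9]", "7"));
        // Complement: caret directly after the opening bracket
        System.out.println(inLanguage("[^a-z]", "Q"));
        System.out.println(inLanguage("[^a-z]", "q"));
    }
}
```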

Page 14: Regular Expressions & Automata

Regular Expressions in Java

Precedence
  Higher precedence operators take effect first
Precedence order (highest first)
  Parentheses ( … ) and bracket expressions [ … ], which each act as a single unit
  Closure a* b+
  Concatenation ab
  Union a | b

Page 15: Regular Expressions & Automata

Regular Expressions in Java

Examples
  ab+        → “ab”, “abb”, “abbb”, “abbbb”, …
  (ab)+      → “ab”, “abab”, “ababab”, …
  ab | cd    → “ab”, “cd”
  a(b | c)d  → “abd”, “acd”
  [abc]d     → “ad”, “bd”, “cd”

When in doubt, use parentheses
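The effect of precedence is easiest to see by running the same inputs against patterns that differ only in their parentheses. The sketch below uses hypothetical names and prints a small acceptance table; as before, unions are written without the spaces the slide uses for readability.

```java
import java.util.regex.Pattern;

// Hypothetical demo: how grouping changes which strings are accepted.
class PrecedenceDemo {
    static boolean inLanguage(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] patterns = { "ab+", "(ab)+", "ab|cd", "a(b|c)d", "[abc]d" };
        String[] inputs = { "abb", "abab", "cd", "acd", "bd" };
        for (String p : patterns) {
            StringBuilder row = new StringBuilder(p + ":");
            for (String in : inputs) {
                // Append only the inputs this pattern accepts in full
                if (inLanguage(p, in)) {
                    row.append(' ').append(in);
                }
            }
            System.out.println(row);
        }
    }
}
```

In ab+ the plus applies only to b because closure binds tighter than concatenation, and in ab|cd the union splits the whole expression because it binds loosest.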

Page 16: Regular Expressions & Automata

Regular Expressions in Java

Predefined character classes
  .    Any character except end of line
  \d   Digit: [0-9]
  \D   Non-digit: [^0-9]
  \s   Whitespace character: [ \t\n\x0B\f\r]
  \S   Non-whitespace character: [^\s]
  \w   Word character: [a-zA-Z_0-9]
  \W   Non-word character: [^\w]
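As an illustration of the predefined classes, here is a sketch that uses them through the String class rather than through Pattern directly. The class and method names are hypothetical. Note that each backslash is doubled inside a Java string literal.

```java
import java.util.Arrays;

// Hypothetical demo of predefined character classes via String methods.
class PredefinedClassesDemo {
    // Splits text on runs of whitespace (\s+).
    static String[] words(String text) {
        return text.trim().split("\\s+");
    }

    // True if the string consists of one or more digits (\d+).
    static boolean isNumber(String s) {
        return s.matches("\\d+");
    }

    // True if the string consists of one or more word characters (\w+).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Now is  the\ttime")));
        System.out.println(isNumber("2024"));
        System.out.println(isNumber("20x4"));
        System.out.println(isWord("foo_1"));
        System.out.println(isWord("foo bar"));
    }
}
```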

Page 17: Regular Expressions & Automata

Regular Expressions in Java

Literals using backslash \
  Need two backslashes in a Java string literal
  Java compiler interprets the 1st backslash as a String escape
Examples
  \\]    → “]”
  \\.    → “.”
  \\\\   → “\”
    4 backslashes are interpreted as \\ by the Java compiler, which the regex engine then reads as a literal \
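The two levels of escaping can be seen side by side in a short sketch. The class name and helper are hypothetical. The last lines use Pattern.quote, a standard library method not mentioned on the slide, which escapes a whole string so that every character is treated as a literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo of escaping regex metacharacters in Java string literals.
class EscapeDemo {
    static boolean inLanguage(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped dot is the "any character" class
        System.out.println(inLanguage(".", "x"));
        // Escaped dot matches only a literal "."
        System.out.println(inLanguage("\\.", "x"));
        System.out.println(inLanguage("\\.", "."));
        // Escaped closing bracket
        System.out.println(inLanguage("\\]", "]"));
        // Four backslashes in source: regex \\ : one literal backslash in the text
        System.out.println(inLanguage("\\\\", "\\"));
        // Pattern.quote escapes an entire string at once
        System.out.println(inLanguage(Pattern.quote("a.b"), "a.b"));
        System.out.println(inLanguage(Pattern.quote("a.b"), "axb"));
    }
}
```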

Page 18: Regular Expressions & Automata

Using Regular Expressions in Java

Compile pattern
  import java.util.regex.*;
  Pattern p = Pattern.compile("[a-z]+");
Create matcher for specific piece of text
  Matcher m = p.matcher("Now is the time");
Search text
  boolean found = m.find();
    Returns true if pattern is found anywhere in text
  boolean exact = m.matches();
    Returns true if pattern matches entire text
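The difference between the two search methods shows up clearly on the slide's own pattern and text. In this sketch the class and method names are hypothetical, and each call builds a fresh Matcher so that the two checks do not affect each other.

```java
import java.util.regex.Pattern;

// Hypothetical demo contrasting Matcher.find() with Matcher.matches().
class FindVsMatches {
    // True if some substring of text matches the pattern.
    static boolean foundAnywhere(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // True only if the entire text matches the pattern.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String regex = "[a-z]+";
        // Contains lowercase runs, but also a capital letter and spaces
        System.out.println(foundAnywhere(regex, "Now is the time"));
        System.out.println(matchesWhole(regex, "Now is the time"));
        // Consists only of lowercase letters
        System.out.println(matchesWhole(regex, "time"));
    }
}
```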

Page 19: Regular Expressions & Automata

Using Regular Expressions in Java

If pattern is found in text
  m.group()   string found
  m.start()   index of the first character matched
  m.end()     index after last character matched
  m.group() is the same as s.substring(m.start(), m.end()), where s is the text being searched
Calling m.find() again
  Starts search after end of current pattern match
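Repeated calls to find() can be combined with group(), start(), and end() to list every match and where it sits in the text. The following sketch uses a hypothetical class and an arbitrary "text@start-end" output format chosen only for illustration. It assumes Java 9 or later if the test-style List.of comparison is used, though the class itself needs only java.util and java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: collect each match with its start and end index.
class MatchSpans {
    static List<String> spans(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {                    // each call resumes after the previous match
            // group() returns the same text as the substring between start() and end()
            String viaSubstring = s.substring(m.start(), m.end());
            if (!viaSubstring.equals(m.group())) {
                throw new IllegalStateException("group() and substring disagree");
            }
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String span : spans("[a-z]+", "Now is the time")) {
            System.out.println(span);
        }
    }
}
```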

Page 20: Regular Expressions & Automata

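Complete Java Example

Code
  import java.util.regex.*;

  public class RegexTest {
      public static void main(String[] args) {
          // Optional run of uppercase letters, then a captured run of lowercase letters
          Pattern p = Pattern.compile("[A-Z]*([a-z]+)");
          Matcher m = p.matcher("Now is the time");
          while (m.find()) {
              // group() is the whole match; group(1) is the captured lowercase part
              System.out.println(m.group() + " - " + m.group(1));
          }
      }
  }

Output
  Now - ow
  is - is
  the - the
  time - time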

Page 21: Regular Expressions & Automata

Language Recognition

Accept string if and only if in language
Abstract representation of computation
Performing language recognition can be
  Simple
    Strings with even number of 1’s
  Not Simple
    Strings with any number of a’s, followed by the same number of b’s
  Hard
    Strings representing legal Java programs
  Impossible!
    Strings representing nonterminating Java programs
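The gap between the first two categories can be made concrete with two small recognizers. This is an illustrative sketch with hypothetical names, not taken from the slides. The first recognizer needs only a single bit of memory no matter how long the input is, while the second has to count, and the count can grow without bound.

```java
// Hypothetical recognizers for the "Simple" and "Not Simple" examples.
class Recognizers {
    // Simple: even number of 1's. One boolean of state is enough.
    static boolean evenOnes(String w) {
        boolean even = true;
        for (char c : w.toCharArray()) {
            if (c == '1') {
                even = !even;
            }
        }
        return even;
    }

    // Not simple: some number of a's followed by the same number of b's.
    // Requires counters whose values are not bounded in advance.
    static boolean sameAsThenBs(String w) {
        int i = 0;
        int n = w.length();
        int as = 0;
        while (i < n && w.charAt(i) == 'a') { as++; i++; }
        int bs = 0;
        while (i < n && w.charAt(i) == 'b') { bs++; i++; }
        // Whole input must be consumed and the two counts must agree
        return i == n && as == bs;
    }

    public static void main(String[] args) {
        System.out.println(evenOnes("0110"));
        System.out.println(evenOnes("0111"));
        System.out.println(sameAsThenBs("aabb"));
        System.out.println(sameAsThenBs("aab"));
    }
}
```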

Page 22: Regular Expressions & Automata

Automata

Simple abstract computers
Can be used to recognize languages
Finite state machine
  States + transitions
Turing machine
  States + transitions + tape

Page 23: Regular Expressions & Automata

Finite State Machine

States
  Starting
  Accepting
  Finite number allowed
Transitions
  State to state
  Labeled by symbol

[Diagram: machine M with start state q1 and accept state q2, transitions labeled 0 and 1; L(M) = { w | w ends in a 1 }]

Page 24: Regular Expressions & Automata

Finite State Machine

Operations
  Move along transitions based on symbol
  Accept string if ends up in accept state
  Reject string if ends up in non-accepting state

[Diagram: two-state machine with states q1 and q2, transitions labeled 0 and 1]
  “011” → Accept
  “10”  → Reject
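The operation described above can be sketched as a table-driven simulator. The transition table here is an assumption: the diagram itself is not recoverable from the text, so the table is simply the usual two-state machine for "ends in a 1" (reading 1 moves to q2, reading 0 moves to q1), which agrees with the two example results on the slide. Class and field names are hypothetical.

```java
// Hypothetical simulator for a two-state machine accepting strings that end in 1.
class EndsInOneMachine {
    // DELTA[state][symbol]: state 0 is q1 (start), state 1 is q2 (accept).
    // Assumed transitions: on 0 go to q1, on 1 go to q2, from either state.
    static final int[][] DELTA = {
        { 0, 1 },   // from q1
        { 0, 1 },   // from q2
    };
    static final int START = 0;
    static final int ACCEPT = 1;

    static boolean accepts(String w) {
        int state = START;
        for (char c : w.toCharArray()) {
            if (c != '0' && c != '1') {
                return false;                 // symbol outside the alphabet {0, 1}
            }
            state = DELTA[state][c - '0'];    // move along the labeled transition
        }
        return state == ACCEPT;               // accept only if we end in the accept state
    }

    public static void main(String[] args) {
        System.out.println(accepts("011"));
        System.out.println(accepts("10"));
    }
}
```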

Page 25: Regular Expressions & Automata

Finite State Machine

Properties
  Powerful enough to recognize regular expressions
  In fact, finite state machine ≡ regular expression

[Diagram: languages recognized by finite state machines ↔ languages recognized by regular expressions (1-to-1 mapping)]
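For one concrete language, the correspondence can be spot-checked by comparing a finite state machine with a regular expression on every short input. The sketch below is only an illustration under stated assumptions: the machine is the assumed two-state "ends in a 1" machine, the regular expression [01]*1 is my own choice for the same language, and agreement on all strings up to a fixed length is evidence rather than a proof of equivalence.

```java
import java.util.regex.Pattern;

// Hypothetical check: a two-state machine and the regex [01]*1 agree on short binary strings.
class EquivalenceCheck {
    // Assumed machine: state 1 after reading a 1, state 0 after reading a 0.
    static boolean machineAccepts(String w) {
        int state = 0;
        for (char c : w.toCharArray()) {
            state = (c == '1') ? 1 : 0;
        }
        return state == 1;
    }

    // Compares both recognizers on every binary string of length 0..maxLen.
    static boolean agreeUpTo(int maxLen) {
        Pattern p = Pattern.compile("[01]*1");
        for (int len = 0; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                // Build the binary string of exactly len characters for this number
                StringBuilder sb = new StringBuilder();
                for (int i = len - 1; i >= 0; i--) {
                    sb.append((bits >> i) & 1);
                }
                String w = sb.toString();
                if (machineAccepts(w) != p.matcher(w).matches()) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeUpTo(10));
    }
}
```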

Page 26: Regular Expressions & Automata

Turing Machine

Defined by Alan Turing in 1936
Finite state machine + tape
Tape
  Infinite storage
  Read / write one symbol at tape head
  Move tape head one space left / right

[Diagram: finite state machine (states q1, q2) with a tape head positioned over an infinite tape]

Page 27: Regular Expressions & Automata

Turing Machine

Allowable actions
  Read symbol from current square
  Write symbol to current square
  Move tape head left
  Move tape head right
  Go to next state

Page 28: Regular Expressions & Automata

Turing Machine

Tape:  …  *  1  0  0  1  0  *  …

[Diagram: tape head positioned over the tape]

Current State   Current Content   Value to Write   Direction to Move   New state to enter
START           *                 *                Left                MOVING
MOVING          1                 0                Left                MOVING
MOVING          0                 1                Left                MOVING
MOVING          *                 *                No move             HALT
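The transition table can be traced with a small simulator. This sketch makes two assumptions that the extracted slide text does not confirm: the tape head starts on the rightmost * (which fits a machine whose every move is to the left), and the finite string passed in stands for the interesting portion of the infinite tape. The class, enum, and method names are hypothetical. Under these assumptions the machine walks left and flips every bit until it reaches the left-hand *.

```java
// Hypothetical simulator for the four-rule machine in the table.
class BitFlipMachine {
    enum State { START, MOVING, HALT }

    static String run(String tape) {
        char[] cells = tape.toCharArray();
        int head = cells.length - 1;          // assumption: head starts on the rightmost cell
        State state = State.START;
        while (state != State.HALT) {
            if (head < 0 || head >= cells.length) {
                throw new IllegalStateException("Head moved off the simulated tape");
            }
            char symbol = cells[head];
            if (state == State.START && symbol == '*') {
                head--;                       // write * (unchanged), move left
                state = State.MOVING;
            } else if (state == State.MOVING && symbol == '1') {
                cells[head] = '0';            // write 0, move left
                head--;
            } else if (state == State.MOVING && symbol == '0') {
                cells[head] = '1';            // write 1, move left
                head--;
            } else if (state == State.MOVING && symbol == '*') {
                state = State.HALT;           // write * (unchanged), no move
            } else {
                throw new IllegalStateException("No rule for " + state + " reading " + symbol);
            }
        }
        return new String(cells);
    }

    public static void main(String[] args) {
        System.out.println(run("*10010*"));
    }
}
```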

Page 29: Regular Expressions & Automata

Turing Machine

Operations
  Read symbol on current square
  Select action based on symbol & current state
  Accept string if in accept state
  Reject string if halts in non-accepting state
  Reject string if computation does not terminate
Halting problem
  It is undecidable in general whether long-running computations will eventually accept

Page 30: Regular Expressions & Automata

Computability

Computability
  A language is computable if it can be recognized by some algorithm with a finite number of steps
Church-Turing thesis
  A Turing machine can recognize any language computable on any machine
Intuition
  Turing machine captures essence of computing
    Both in a formal sense, and in an informal practical sense