Typed and Unambiguous Pattern Matching on Strings using Regular Expressions

Claus Brabrand (ITU Copenhagen) & Jakob G. Thomsen (Aarhus University)

DANSAS 2010 (In proc. of PPDP 2010)

[http://xkcd.com/208/]


Main Message

For regular expressions:

  • Pattern matching
  • Precise syntax-directed ambiguity analysis
  • Typed mapping into a target language

Introduction & Motivation

Parsing dynamic input is a ubiquitous problem:

  • URLs:
    http://www.cs.au.dk/index.php?id=141&view=details
    (protocol, host, path, query-string; the query-string is a list of key-value pairs)

  • Log files:
    13/02/2010 66.249.65.107 get /support.html
    20/02/2010 42.116.32.64 post /search.html

The solution is pattern matching.

Example (date):

  [0-9]{1,2} "/" [0-9]{1,2} "/" [0-9]{4}

With recordings:

  <day = [0-9]{1,2} > "/" <month = [0-9]{1,2} > "/" <year = [0-9]{4} >

Matching against the string "26/06/1992" yields:

  day = 26, month = 06, year = 1992
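The date example can be approximated with Java's standard `java.util.regex` package, where a named group `(?<day>...)` plays roughly the role of the recording `<day = ...>`. This is only a sketch for comparison: it is not the authors' tool, the class name `DateExample` is made up for illustration, and every captured value comes back as a plain `String`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExample {
    // java.util.regex spelling of the slide's date pattern;
    // (?<name>...) stands in for the recording <name = ...>.
    static final Pattern DATE =
        Pattern.compile("(?<day>[0-9]{1,2})/(?<month>[0-9]{1,2})/(?<year>[0-9]{4})");

    /** Returns {day, month, year} as strings, or null if the whole input does not match. */
    static String[] match(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("day"), m.group("month"), m.group("year") };
    }

    public static void main(String[] args) {
        String[] parts = match("26/06/1992");
        // prints: day = 26 month = 06 year = 1992
        System.out.println("day = " + parts[0] + " month = " + parts[1] + " year = " + parts[2]);
    }
}
```

Note that the results are untyped strings here, which is the "many type casts" situation the type mapping part of the talk sets out to avoid.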

Example (date):

With separators:

  <day = [0-9]{1,2} > "/" <month = [0-9]{1,2} > "/" <year = [0-9]{4} >

Without separators:

  <day = [0-9]{1,2} > <month = [0-9]{1,2} > <year = [0-9]{4} >

The string 2082010 can then be matched in two ways:

  • day = 2 and month = 08 (i.e., 2nd of August)
  • day = 20 and month = 8 (i.e., 20th of August)

Why regular expressions?

  • Expressive (enough)
  • Declarative
  • Decidable properties
  • Well known

Outline

  • Our setup
  • Regular Expressions: The Recording Construction
  • Ambiguity: Disambiguation
  • Type Mapping
  • Conclusion

Our setup

  • url.rex (regular expression definitions):
      <URL = [a-z]*>; ...

  • Compile (our tool): url.rex → URL.java ...

  • Foo.java (client code):
      import URL; class Foo { ... }

  • Compile (javac): URL.java, Foo.java ... → URL.class, Foo.class ...

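Regular Expressions

Syntax: [grammar not preserved in the extracted slide]

Semantics: [definition not preserved in the extracted slide]

where:

  • $L_1 \cdot L_2$ is concatenation (i.e., $\{ \omega_1 \omega_2 \mid \omega_1 \in L_1, \omega_2 \in L_2 \}$)
  • $L^* = \bigcup_{i \geq 0} L^i$, where $L^0 = \{ \epsilon \}$ and $L^i = L \cdot L^{i-1}$

Usual extensions:

  • Any character "." as c1|c2|...|cn, where each ci ∈ Σ
  • Character ranges "[a-z]" as a|b|...|z
  • Repetitions "R{2,3}" as RR|RRR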
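As a small sanity check of the repetition extension, the sketch below compares `R{2,3}` with its expansion `RR|RRR` for `R = [a-z]`, using Java's `java.util.regex`. The class name `RepetitionDesugaring` and the sample inputs are illustrative assumptions. Agreement on a handful of strings is only evidence, not a proof that the two expressions denote the same language.

```java
import java.util.regex.Pattern;

class RepetitionDesugaring {
    // The slide's R{2,3} with R = [a-z], and its expansion RR|RRR.
    static final Pattern SUGARED = Pattern.compile("[a-z]{2,3}");
    static final Pattern EXPANDED = Pattern.compile("[a-z][a-z]|[a-z][a-z][a-z]");

    /** True if the whole input is accepted by the sugared form. */
    static boolean accepts(String input) {
        return SUGARED.matcher(input).matches();
    }

    /** True if both forms agree on whether the whole input matches. */
    static boolean agree(String input) {
        return accepts(input) == EXPANDED.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "", "a", "ab", "abc", "abcd", "a1" }) {
            System.out.println("\"" + s + "\" accepted: " + accepts(s)
                + ", forms agree: " + agree(s));
        }
    }
}
```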

Recording

Syntax: <x = R>, where "x" is a recording identifier (it "remembers" the substring it matches)

Semantics: [definition not preserved in the extracted slide]

Example (simplified emails):

  <user = [a-z]+ > "@" <domain = [a-z]+ ("." [a-z]+)* >

Matching against the string "obama@whitehouse.gov" yields:

  user = "obama", domain = "whitehouse.gov"

Related: "x as R" in XDuce; "x::R" in CDuce; and "x@R" in Scala and HaRP

Ambiguity

Example from before:

  <day = [0-9]{1,2} > <month = [0-9]{1,2} >

matched on the string "208" gives rise to:

  • day = 2 and month = 08 (i.e., 2nd of August)
  • day = 20 and month = 8 (i.e., 20th of August)

Multiple ways of matching => ambiguous

Problem: Concatenation (the boundary between day and month can fall after "2" or after "20")
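The two readings of "208" can be reproduced by brute force: try every cut point and keep the cuts where the prefix matches the day pattern and the suffix matches the month pattern. The sketch below does this with `java.util.regex`; the class `SplitEnumerator` is a made-up illustration. It only tests one concrete input string and is not the syntax-directed analysis presented in the talk, which decides ambiguity for the expression itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitEnumerator {
    static final Pattern DAY = Pattern.compile("[0-9]{1,2}");
    static final Pattern MONTH = Pattern.compile("[0-9]{1,2}");

    /** Lists every way to cut the input into a day prefix and a month suffix. */
    static List<String> splits(String input) {
        List<String> result = new ArrayList<>();
        for (int cut = 0; cut <= input.length(); cut++) {
            String day = input.substring(0, cut);
            String month = input.substring(cut);
            if (DAY.matcher(day).matches() && MONTH.matcher(month).matches()) {
                result.add("day = " + day + ", month = " + month);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // More than one split means the concatenation is ambiguous on this input.
        for (String s : splits("208")) {
            System.out.println(s);
        }
    }
}
```

For "208" this prints the two splits from the slide; for a two-digit input such as "28" only one split exists, so that particular string is matched unambiguously.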

Ambiguity analysis

Theorem: R unambiguous iff [characterization not preserved in the extracted slide]

NB: sound & complete!

Related work:

  • [Brabrand+Giegerich+Møller'09]: Similar approach for context-free grammars.
  • [Book+Even+Greibach+Ott'71] and [Hosoya'03] for XDuce, but indirectly via NFA, not directly (syntax-directed).

Disambiguation

1) Manual rewriting: Always possible :-) Tedious :-( Error-prone :-( Not structure-preserving :-(

   <foo = a > | <bar = a* > is rewritten to <foo = a > | <bar = ε|aaa* >

2) Restriction: R1 - R2, where L(R1 - R2) = L(R1) \ L(R2)

   <foo = a > | <bar = a* > using restriction becomes <foo = a > | <bar = a*-a >

3) Disambiguators: three basic operators

   choice: '|L', '|R'
   concat: '·L', '·R'
   star: '*L', '*R'

   <foo = a > | <bar = a* > using a disambiguator becomes <foo = a > |L <bar = a* >

4) Default disambiguation: concat, choice, and star are all left-biased (by default)! (Our tool does this)

   <foo = a > | <bar = a* > needs no rewriting

Related work: [Vansummeren'06], but with global, not local disambiguation
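Left-biased choice and restriction can both be observed in Java's `java.util.regex`, whose alternation tries the left alternative first. In the sketch below, the negative lookahead `(?!a\z)` is an assumed stand-in for the restriction `a* - a`; Java has no restriction operator, and the class `ChoiceDisambiguation` is hypothetical. The swapped patterns show the difference: without restriction the winner on input "a" depends on the order of the alternatives, with restriction it does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChoiceDisambiguation {
    // Ordered alternation: the left alternative is tried first.
    static final Pattern LEFT_BIASED = Pattern.compile("(?<foo>a)|(?<bar>a*)");
    // Same alternatives in the opposite order.
    static final Pattern SWAPPED = Pattern.compile("(?<bar>a*)|(?<foo>a)");
    // Emulates a* - a: the lookahead rejects an input that is exactly "a".
    static final Pattern RESTRICTED = Pattern.compile("(?<foo>a)|(?<bar>(?!a\\z)a*)");
    static final Pattern RESTRICTED_SWAPPED = Pattern.compile("(?<bar>(?!a\\z)a*)|(?<foo>a)");

    /** Returns "foo", "bar", or "none" depending on which recording took the whole input. */
    static String winner(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return "none";
        }
        return m.group("foo") != null ? "foo" : "bar";
    }

    public static void main(String[] args) {
        System.out.println(winner(LEFT_BIASED, "a"));        // foo
        System.out.println(winner(SWAPPED, "a"));            // bar
        System.out.println(winner(RESTRICTED, "a"));         // foo
        System.out.println(winner(RESTRICTED_SWAPPED, "a")); // foo
    }
}
```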

Type Mapping

Our date example:

  <day = [0-9]{2} > "/" <month = [0-9]{2} > "/" <year = [0-9]{4} >

Type of the recordings day, month, and year?

  • Strings (=> many type casts)
  • Infer the type

Type Mapping

A recording has three type components:

  • a linguistic type (language of the recording; maps to String, int, float, etc.)
  • a structural type (nested recordings; maps to (nested) classes)
  • a type modifier (maps to lists)

Related work: Exact type inference in XDuce & CDuce (soundness+completeness proof in [Vansummeren'06]), but not for stand-alone and non-intrusive usage (Java)

Type Mapping Example

  Person = <name = [a-z]+ > " (" <age = [0-9]+ > ")"

Compile (our tool):

  class Person { // auto-generated
    String name;
    int age;
    static Person match(String s) { ... }
    public String toString() { ... }
  }

Usage:

  String s = "obama (48)";
  Person p = Person.match(s);
  print(p.name + " is " + p.age + "y old");
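To make the elided method bodies concrete, here is a hand-written approximation of what such a class could look like when built on `java.util.regex`. It is a guess at the shape, not the tool's actual output: the private constructor, the `PATTERN` field, the `null` result on a failed match, and the `toString` format are all assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written approximation of the generated class; not the tool's actual output.
class Person {
    // Person = <name = [a-z]+ > " (" <age = [0-9]+ > ")"
    private static final Pattern PATTERN =
        Pattern.compile("(?<name>[a-z]+) \\((?<age>[0-9]+)\\)");

    final String name; // [a-z]+ maps to String
    final int age;     // [0-9]+ maps to int (assumes the digits fit in an int)

    private Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    /** Returns the parsed Person, or null if the whole input does not match (an assumption). */
    static Person match(String s) {
        Matcher m = PATTERN.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new Person(m.group("name"), Integer.parseInt(m.group("age")));
    }

    @Override
    public String toString() {
        return name + " (" + age + ")";
    }

    public static void main(String[] args) {
        Person p = Person.match("obama (48)");
        // prints: obama is 48y old
        System.out.println(p.name + " is " + p.age + "y old");
    }
}
```

The client code never casts: `p.age` is already an `int`, which is the benefit the type mapping is meant to deliver.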

Conclusion

Regular expressions are alive and well.

This paper:

  • Used for pattern matching
  • Precise ambiguity analysis
  • Type mapping

Future work: improve performance, subtype of recordings

"trade (excess) expressivity for safety+simplicity"

Thank you. Questions?

Abstract Syntax Trees (ASTs)

Ambiguity

Definition: $R$ ambiguous iff $\exists T, T' \in AST_R : T \neq T' \wedge \|T\| = \|T'\|$

where $\|\cdot\| : AST \to \Sigma^*$ (the flattening) is: [definition not preserved in the extracted slide]

Characterization of Ambiguity

Theorem: R unambiguous iff [characterization not preserved in the extracted slide]

NB: sound & complete!

$R^* = \epsilon \mid R R^*$

Type Inference

Type inference: R : (L, S)