Learning Semantic String Transformations from Examples Rishabh Singh and Sumit Gulwani


Page 1: Learning Semantic String Transformations from Examples

Learning Semantic String Transformations from Examples

Rishabh Singh and Sumit Gulwani

Page 2: Learning Semantic String Transformations from Examples

FlashFill

Page 3: Learning Semantic String Transformations from Examples
Page 4: Learning Semantic String Transformations from Examples

Transformations

• Syntactic Transformations
  – Concatenation of regular-expression-based substrings
  – “VLDB2012” → “VLDB”

• Semantic Transformations
  – More than just characters
  – “1/5/2010” → “May 1st 2010”

Page 5: Learning Semantic String Transformations from Examples

Semantic Transformations

• Semantic information as relational tables
  – 1 → January, 2 → February

• Learn table lookup queries
  – VLOOKUP macro 2nd most problematic

Page 6: Learning Semantic String Transformations from Examples

Outline

• Lookup Transformations

• Lookup + Syntactic Transformations

• Case Studies

Page 7: Learning Semantic String Transformations from Examples

Table Lookup Transformations

Demo

Page 8: Learning Semantic String Transformations from Examples

Learning Framework

Input Strings → F → Output String

F1, …, Fn

1. Domain-specific Language L

2. Algorithm to learn all Fs from (i,o)

Page 9: Learning Semantic String Transformations from Examples

Lookup Transformation Language

Page 10: Learning Semantic String Transformations from Examples

Emp Record

SSN EmpId Name

027-36-4557 1254 John Henry

034-83-7683 2412 William Johnson

044-58-3429 1125 Steve Russell

018-45-8949 4257 Ian Jordan

023-34-3254 6418 Mary Dina

Input v1 Output

044-58-3429 Steve Russell

Select(Name, EmpRecord, (SSN = v1))

Example - Lookup
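The lookup above can be sketched in Python as follows. This is only an illustration of what the Select expression computes, assuming the table is stored as a list of dictionaries; the `select` helper and the `EMP_RECORD` name are hypothetical and not part of the original system.

```python
# Illustrative encoding of the EmpRecord table from the slide.
EMP_RECORD = [
    {"SSN": "027-36-4557", "EmpId": "1254", "Name": "John Henry"},
    {"SSN": "034-83-7683", "EmpId": "2412", "Name": "William Johnson"},
    {"SSN": "044-58-3429", "EmpId": "1125", "Name": "Steve Russell"},
    {"SSN": "018-45-8949", "EmpId": "4257", "Name": "Ian Jordan"},
    {"SSN": "023-34-3254", "EmpId": "6418", "Name": "Mary Dina"},
]


def select(column, table, **conditions):
    """Return `column` of the unique row satisfying every condition, else None."""
    matches = [
        row for row in table
        if all(row[name] == value for name, value in conditions.items())
    ]
    return matches[0][column] if len(matches) == 1 else None


# Select(Name, EmpRecord, (SSN = v1)) with v1 bound to the input string.
v1 = "044-58-3429"
name = select("Name", EMP_RECORD, SSN=v1)
```

Returning `None` when no row (or more than one row) matches is a design choice of this sketch, so that a condition behaves like a key lookup.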

Page 11: Learning Semantic String Transformations from Examples

ItemRec

ItemId Item

ST-340 Stroller

BI-567 Bib

DI-328 Diapers

WI-989 Wipes

AS-469 Aspirator

PriceRec

ItemId Price

ST-340 $145.67

BI-567 $3.56

DI-328 $21.45

WI-989 $5.12

AS-469 $2.56

Input v1 Output

Stroller $145.67

Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))

Example – Transitive Lookup
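A transitive lookup nests one Select inside the condition of another. The sketch below mirrors the slide's expression in Python, assuming single-column conditions; the helper `select` and the function `price_of` are hypothetical names introduced for illustration.

```python
# Illustrative encodings of the two tables from the slide.
ITEM_REC = [
    {"ItemId": "ST-340", "Item": "Stroller"},
    {"ItemId": "BI-567", "Item": "Bib"},
    {"ItemId": "DI-328", "Item": "Diapers"},
    {"ItemId": "WI-989", "Item": "Wipes"},
    {"ItemId": "AS-469", "Item": "Aspirator"},
]
PRICE_REC = [
    {"ItemId": "ST-340", "Price": "$145.67"},
    {"ItemId": "BI-567", "Price": "$3.56"},
    {"ItemId": "DI-328", "Price": "$21.45"},
    {"ItemId": "WI-989", "Price": "$5.12"},
    {"ItemId": "AS-469", "Price": "$2.56"},
]


def select(column, table, key_column, key_value):
    """Return `column` of the unique row whose `key_column` equals `key_value`."""
    matches = [row[column] for row in table if row[key_column] == key_value]
    return matches[0] if len(matches) == 1 else None


def price_of(v1):
    """Select(Price, PriceRec, ItemId = Select(ItemId, ItemRec, Item = v1))."""
    item_id = select("ItemId", ITEM_REC, "Item", v1)   # inner lookup
    return select("Price", PRICE_REC, "ItemId", item_id)  # outer lookup
```

The inner lookup yields the join key (ItemId), which the outer lookup then uses to index the second table.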

Page 12: Learning Semantic String Transformations from Examples

Learn Query

ItemRec

ItemId Item

ST-340 Stroller

BI-567 Bib

DI-328 Diapers

WI-989 Wipes

AS-469 Aspirator

PriceRec

ItemId Price

ST-340 $145.67

BI-567 $3.56

DI-328 $21.45

WI-989 $5.12

AS-469 $2.56

Input v1 Output

Stroller $145.67

Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))

Page 13: Learning Semantic String Transformations from Examples

Synthesis Algorithm:

• Input: (input state, output string)

• Output: all conforming expressions

• Reachability algorithm from input strings

Page 14: Learning Semantic String Transformations from Examples

GenerateStr_t

Strings reachable from input row

044-58-3429

Emp Record

SSN EmpId Name

027-36-4557 1254 John Henry

034-83-7683 2412 William Johnson

044-58-3429 1125 Steve Russell

018-45-8949 4257 Ian Jordan

$\eta_1$  $\eta_2$  $\eta_3$

$Progs[\eta_1] = \{v_1\}$

Page 15: Learning Semantic String Transformations from Examples

GenerateStr_t

Strings in table rows of visited nodes: 044-58-3429, 1125, Steve Russell

$B \equiv \{\bigwedge_i C_i = \{val^{-1}(T[C_i, r])\}\}_j$

Page 16: Learning Semantic String Transformations from Examples

GenerateStr_t

… Repeat until k steps or fixpoint
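The reachability loop described on these slides can be sketched as below. This is a deliberately simplified illustration, not the authors' algorithm: it assumes every column may act as a lookup key and it only tracks the reachable strings, omitting the bookkeeping of the programs (Progs) that generate them. The names `reachable_strings` and `TABLES` are hypothetical.

```python
def reachable_strings(inputs, tables, k):
    """Strings reachable from `inputs` in at most k lookup steps."""
    reached = set(inputs)
    for _ in range(k):
        found = set()
        for rows in tables.values():
            for row in rows:
                # A row is visited when one of its cells is already reachable;
                # all strings in that row then become reachable too.
                if any(cell in reached for cell in row):
                    found.update(row)
        if found <= reached:  # fixpoint: nothing new was discovered
            break
        reached |= found
    return reached


# EmpRecord rows from the slide, stored as (SSN, EmpId, Name) tuples.
TABLES = {
    "EmpRecord": [
        ("027-36-4557", "1254", "John Henry"),
        ("034-83-7683", "2412", "William Johnson"),
        ("044-58-3429", "1125", "Steve Russell"),
        ("018-45-8949", "4257", "Ian Jordan"),
    ]
}

result = reachable_strings({"044-58-3429"}, TABLES, k=2)
```

Bounding the loop by k keeps the search finite even when tables link to each other in long chains; the early exit handles the case where the fixpoint arrives sooner.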

Page 17: Learning Semantic String Transformations from Examples

GenerateStr_t

…

Steve Russell

$\eta$, $Progs[\eta]$

Page 18: Learning Semantic String Transformations from Examples

GenerateStr_t

• Sound and k-complete
  – t: number of reachable strings
  – p: number of candidate keys
  – m: maximum size of a candidate key

Page 19: Learning Semantic String Transformations from Examples

Data structure

• Maintains tree structure
  – share common sub-expressions

• CNF of Boolean Conditionals
  – independent column predicates

Page 20: Learning Semantic String Transformations from Examples

Intersect_t: $D_{t1} \wedge D_{t2}$

Page 21: Learning Semantic String Transformations from Examples

Synthesize Procedure

Synthesize((i1,o1), …, (in,on))

  P = GenerateStr_t(i1, o1)

  for j = 2 to n:

    P’ = GenerateStr_t(ij, oj)

    P = Intersect_t(P’, P)

  return P
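The Synthesize procedure can be illustrated with explicit program sets. The sketch below is a toy under stated assumptions: programs are restricted to single lookups represented as `(output_column, key_column)` pairs, sets are enumerated explicitly rather than stored in the succinct data structure, and the `TABLE` contents are an illustrative example that does not come from the slides.

```python
from itertools import permutations

# Hypothetical table, chosen so that one example is ambiguous.
COLUMNS = ("Country", "Capital", "LargestCity")
TABLE = [
    {"Country": "France", "Capital": "Paris", "LargestCity": "Paris"},
    {"Country": "Australia", "Capital": "Canberra", "LargestCity": "Sydney"},
]


def run(program, v1):
    """Evaluate Select(out_col, TABLE, key_col = v1)."""
    out_col, key_col = program
    matches = [row[out_col] for row in TABLE if row[key_col] == v1]
    return matches[0] if len(matches) == 1 else None


def generate_str(i, o):
    """All single-lookup programs consistent with one (input, output) example."""
    return {p for p in permutations(COLUMNS, 2) if run(p, i) == o}


def intersect(p1, p2):
    """Programs consistent with both examples (plain set intersection here)."""
    return p1 & p2


def synthesize(examples):
    (i1, o1), *rest = examples
    programs = generate_str(i1, o1)
    for i, o in rest:
        programs = intersect(generate_str(i, o), programs)
    return programs
```

With the single example `("France", "Paris")` two programs survive; adding `("Australia", "Canberra")` leaves only the Capital lookup, which is how extra examples disambiguate.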

Page 22: Learning Semantic String Transformations from Examples

Semantic String Transformations

Demo

Page 23: Learning Semantic String Transformations from Examples

Syntactic String Language [Gulwani POPL11]

Page 24: Learning Semantic String Transformations from Examples

Combined Language

Syntactic manipulations over lookup outputs

Syntactic manipulations before indexing

Page 25: Learning Semantic String Transformations from Examples

Synthesis Algorithm:

– Reachability based on syntactic string matches

– Boolean conditionals

Page 26: Learning Semantic String Transformations from Examples

GenerateStr_u

SSN: 044-58-3429

Emp Record

SSN EmpId Name

027-36-4557 1254 John Henry

034-83-7683 2412 William Johnson

044-58-3429 1125 Steve Russell

018-45-8949 4257 Ian Jordan

Mr. Steve Russell

Page 27: Learning Semantic String Transformations from Examples

GenerateStr_u

SSN: 044-58-3429

Emp Record

SSN EmpId Name

027-36-4557 1254 John Henry

034-83-7683 2412 William Johnson

044-58-3429 1125 Steve Russell

018-45-8949 4257 Ian Jordan

GenerateStr'_t

Page 28: Learning Semantic String Transformations from Examples

GenerateStr_u

SSN: 044-58-3429

Emp Record

SSN EmpId Name

027-36-4557 1254 John Henry

034-83-7683 2412 William Johnson

044-58-3429 1125 Steve Russell

018-45-8949 4257 Ian Jordan

GenerateStr'_t

Page 29: Learning Semantic String Transformations from Examples

GenerateStr_u

Set of reachable strings: { “SSN: 044-58-3429”, “044-58-3429”, “1125”, “Steve Russell” }

Page 30: Learning Semantic String Transformations from Examples

GenerateStr_u

GenerateStr_s

{ “SSN: 044-58-3429”, “044-58-3429”, “1125”, “Steve Russell” }

Mr. Steve Russell

and in paper
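The running example combines both kinds of manipulation: a substring is extracted from the input before indexing the table, and the lookup result is then concatenated with a constant. The Python sketch below shows one such target program, assuming a regular expression for the SSN pattern; `select` and `transform` are hypothetical names, and the code illustrates the learned program rather than the learning algorithm.

```python
import re

# EmpRecord rows from the slide.
EMP_RECORD = [
    {"SSN": "027-36-4557", "EmpId": "1254", "Name": "John Henry"},
    {"SSN": "034-83-7683", "EmpId": "2412", "Name": "William Johnson"},
    {"SSN": "044-58-3429", "EmpId": "1125", "Name": "Steve Russell"},
    {"SSN": "018-45-8949", "EmpId": "4257", "Name": "Ian Jordan"},
]


def select(column, table, key_column, key_value):
    """Return `column` of the unique row whose `key_column` equals `key_value`."""
    matches = [row[column] for row in table if row[key_column] == key_value]
    return matches[0] if len(matches) == 1 else None


def transform(v1):
    # Syntactic manipulation before indexing: pull the SSN out of the input.
    found = re.search(r"\d{3}-\d{2}-\d{4}", v1)
    if found is None:
        return None
    # Lookup step on the extracted substring.
    name = select("Name", EMP_RECORD, "SSN", found.group(0))
    if name is None:
        return None
    # Syntactic manipulation over the lookup output: prepend a constant.
    return "Mr. " + name
```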

Page 31: Learning Semantic String Transformations from Examples

Experiments

• 50 benchmark problems
  – 12 , 38

• ~$10^{20}$ consistent expressions
  – Size of data structure: ~2000

• Performance: 96% in less than 1 second

• Ranking: at most 3 examples (95% 2 examples)

Page 32: Learning Semantic String Transformations from Examples

Related Work

• Matching strings for table joins
  – Record Matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
  – Schema Matching [Dhamankar et al. SIGMOD04, Warren & Tompa VLDB06]

• Query Synthesis
  – from representative view [Das Sarma et al. ICDT10, Tran et al. SIGMOD09]

• Text-editing by example
  – QuickCode [Gulwani POPL11]
  – SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]

Page 33: Learning Semantic String Transformations from Examples

Thanks!

 

End-Users

Algorithm Designers

Software Developers

Large potential

Page 34: Learning Semantic String Transformations from Examples

Backup slides

Page 35: Learning Semantic String Transformations from Examples

Semantic String Transformations

Time (24 Hr) Time (12 Hr)

0930 9:30 AM

1520 3:20 PM

1648

0830

1015

2010

1012

1425

=TEXT(C,”00 00”)+0
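For reference, the transformation illustrated by the two filled-in rows can be written directly in Python. This is a hand-written sketch of the intended mapping, assuming four-digit 24-hour inputs; the function name `to_12_hour` is hypothetical and the code is not what the tool synthesizes.

```python
def to_12_hour(hhmm):
    """Convert a four-digit 24-hour time such as "1520" to "3:20 PM"."""
    hour, minute = int(hhmm[:2]), int(hhmm[2:])
    suffix = "AM" if hour < 12 else "PM"
    hour12 = hour % 12 or 12  # 0 and 12 both display as 12
    return f"{hour12}:{minute:02d} {suffix}"
```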

Page 36: Learning Semantic String Transformations from Examples

Semantic String Transformations

Date Formatted Date

06-03-2008 Jun 3rd, 2008

03-26-2010

08-01-2009

09-24-2007

05-14-2010

07-20-1998

10-24-2004

08-24-1972
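The date example mixes a table lookup (month number to month name) with syntactic formatting (ordinal suffix, punctuation). A minimal Python sketch of the target mapping follows, assuming MM-DD-YYYY inputs; `MONTH`, `ordinal`, and `format_date` are hypothetical names, and the abbreviation table is an assumption based on the single filled-in row.

```python
# Lookup table playing the role of the "semantic" relational table.
MONTH = {
    "01": "Jan", "02": "Feb", "03": "Mar", "04": "Apr",
    "05": "May", "06": "Jun", "07": "Jul", "08": "Aug",
    "09": "Sep", "10": "Oct", "11": "Nov", "12": "Dec",
}


def ordinal(day):
    """Return the day with its English ordinal suffix, e.g. 3 -> "3rd"."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def format_date(mm_dd_yyyy):
    mm, dd, yyyy = mm_dd_yyyy.split("-")
    return f"{MONTH[mm]} {ordinal(int(dd))}, {yyyy}"
```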

Page 37: Learning Semantic String Transformations from Examples

Idea 1: Share sub-expressions

T3

C1 C2 C3

s3 s4 s5

T1

C1 C2 C3

s1 s2 s3

T2

C1 C2 C3

s2 s3 s4

Select(C3, T2, C1=e)

Select(C2, T3, C1=Select(C2, T2, C1=e))

e ≡ Select(C2, T1, C1=v1), which evaluates to s2
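The benefit of sharing can be seen by evaluating the slide's expressions over the three one-row tables. In the Python sketch below, the sub-expression `e` is computed once and reused by both larger expressions; the `select` helper is a hypothetical stand-in for Select, and the input is assumed to be v1 = s1.

```python
# One-row tables T1, T2, T3 from the slide.
T1 = [{"C1": "s1", "C2": "s2", "C3": "s3"}]
T2 = [{"C1": "s2", "C2": "s3", "C3": "s4"}]
T3 = [{"C1": "s3", "C2": "s4", "C3": "s5"}]


def select(column, table, key_column, key_value):
    """Return `column` of the unique row whose `key_column` equals `key_value`."""
    matches = [row[column] for row in table if row[key_column] == key_value]
    return matches[0] if len(matches) == 1 else None


v1 = "s1"

# Shared sub-expression: e = Select(C2, T1, C1=v1)
e = select("C2", T1, "C1", v1)

# Two different programs that both build on e.
p1 = select("C3", T2, "C1", e)                            # Select(C3, T2, C1=e)
p2 = select("C2", T3, "C1", select("C2", T2, "C1", e))    # Select(C2, T3, C1=Select(C2, T2, C1=e))
```

Both programs produce the same output string, so a representation that stores `e` once can describe them without duplicating the common part.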

Page 38: Learning Semantic String Transformations from Examples

YouTube Videos

French, Polish, Urdu, German, Serbian, Russian

http://bit.ly/flashfill

Page 39: Learning Semantic String Transformations from Examples

Idea 2: CNF conditionals

T

C1 C2 C3 … Cn Cn+1

s s s s t

v1 v2 … vm Out

s s s t

Page 40: Learning Semantic String Transformations from Examples

No. of Consistent Expressions

[Figure: Large number of consistent expressions. x-axis: Benchmarks (1 to 49); y-axis: Number of expressions (log scale, 1 to 1E+036).]

Page 41: Learning Semantic String Transformations from Examples

Succinct Representation

[Figure: Succinct Representation. x-axis: Benchmarks (1 to 49); y-axis: Size of Data Structure (ticks up to 2,000).]

Page 42: Learning Semantic String Transformations from Examples

Performance

[Figure: Running Time. x-axis: Benchmarks (1 to 46); y-axis: Running Time (in seconds), 0.00 to 12.00.]

Page 43: Learning Semantic String Transformations from Examples

Ranking

[Figure: Ranking Measure. x-axis: Number of I/O examples (1 to 3); y-axis: Number of Benchmarks (0 to 40).]

Page 44: Learning Semantic String Transformations from Examples

Idea 2: CNF conditionals

$\{\{\eta_1, \eta_2\}, \eta_2, Progs\}$

$Progs[\eta_1] \equiv \{v_1, v_2, \cdots, v_m\}$

$Progs[\eta_2] = \{Select(C_{n+1}, T, \bigwedge_i C_i = \{s, \eta_1\})\}$

$m+1$

$\Theta((m+1)^n)$
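The gap between the compact conditional and the number of expressions it stands for can be checked by brute force. The sketch below assumes, as one reading of the slide, that each of the n column predicates may compare against the constant s or any of the m input variables, giving m+1 options per column. The function names are hypothetical.

```python
from itertools import product


def cnf_size(n, m):
    """Entries stored when each column keeps its own list of m+1 options."""
    return n * (m + 1)


def expanded_conditions(n, m):
    """Every concrete conjunction obtained by picking one option per column."""
    options = ["s"] + [f"v{j}" for j in range(1, m + 1)]
    return [
        " AND ".join(f"C{i}={choice}" for i, choice in enumerate(combo, start=1))
        for combo in product(options, repeat=n)
    ]


n, m = 3, 2
compact = cnf_size(n, m)                    # grows linearly in n
explicit = len(expanded_conditions(n, m))   # grows as (m+1)**n
```

Because the column predicates are independent, the options can be stored per column instead of per combination, which is what keeps the representation small.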

Page 45: Learning Semantic String Transformations from Examples

GenerateStr_t

$val(\eta)$: string value of node $\eta$

$Progs[\eta]$: set of lookup programs to generate $val(\eta)$

$val^{-1}(s)$: node $\eta$ such that $val(\eta) = s$

Page 46: Learning Semantic String Transformations from Examples

Related Work

• Record Matching
  – Similarity functions for matching [Elmagarmid et al. 07, Koudas et al. SIGMOD06]
  – Customizable similarity function [Arasu et al. VLDB09]

• Learning Schema Matches
  – iMAP [Dhamankar et al. SIGMOD04]: concatenation of column strings using domain-specific knowledge
  – [Warren & Tompa VLDB06]: concatenation of column substrings, single table

Page 47: Learning Semantic String Transformations from Examples

Related Work

• Query Synthesis [Das Sarma et al. ICDT10, Tran et al. SIGMOD09]
  – Infer relation from large representative example view
  – No joins or projections

• Text-editing using examples
  – QuickCode [Gulwani POPL11]: string transformations
  – SMARTedit [Lau et al. ML03], Simultaneous Editing [Miller et al. USENIX01]: programming by demonstration

Page 48: Learning Semantic String Transformations from Examples

General Framework

• A Domain-specific Transformation Language L
  – Expressive and succinct

• Efficient data structures for sets of expressions
  – Version-space algebra

• GenerateStr
  – Set of all expressions consistent with an I-O example

• Intersect
  – Intersect two sets of expressions

Page 49: Learning Semantic String Transformations from Examples

Emp Record

SSN EmpId Name

027-36-4557 1254 John Henry

034-83-7683 2412 William Johnson

044-58-3429 1125 Steve Russell

018-45-8949 4257 Ian Jordan

023-34-3254 6418 Mary Dina

Input v1 Output

044-58-3429 Steve Russell

023-34-3254

Select(Name, EmpRecord, (SSN = v1))

Example - Lookup

Page 50: Learning Semantic String Transformations from Examples

ItemRec

ItemId Item

ST-340 Stroller

BI-567 Bib

DI-328 Diapers

WI-989 Wipes

AS-469 Aspirator

PriceRec

ItemId Price

ST-340 $145.67

BI-567 $3.56

DI-328 $21.45

WI-989 $5.12

AS-469 $2.56

Input v1 Output

Stroller $145.67

Bib

Aspirator

Wipes

Select(Price, PriceRec, (ItemId = Select(ItemId, ItemRec, Item = v1)))

Example – Transitive Lookups

Page 51: Learning Semantic String Transformations from Examples

Data Structure

Page 52: Learning Semantic String Transformations from Examples

Data structure for expressions

Page 53: Learning Semantic String Transformations from Examples

Data structure

Page 54: Learning Semantic String Transformations from Examples

Data structure

Page 55: Learning Semantic String Transformations from Examples

Data structure

Page 56: Learning Semantic String Transformations from Examples

T1

C1 C2 C3

s1 s2 s3

T2

C1 C2 C3

s2 s3 s4

Ti

C1 C2 C3

si si+1 si+2

Example

… Tm

Input v1 Output

s1 sm

Page 57: Learning Semantic String Transformations from Examples

Ti-1

C1 C2 C3

si-1 si si+1

Ti-2

C1 C2 C3

si-2 si-1 si

Sub-expression Sharing

$s_i$

Page 58: Learning Semantic String Transformations from Examples

Sub-expression Sharing

$s_{i-2}$  $s_{i-1}$  $s_i$

$\eta_{i-2}$  $\eta_{i-1}$  $\eta_i$

Page 59: Learning Semantic String Transformations from Examples

Sub-expression Sharing

$\{\{\eta_1, \eta_2, \cdots, \eta_m\}, \eta_m, Progs\}$

$Progs[\eta_1] \equiv \{v_1\}$

$Progs[\eta_2] = \{Select(C_2, T_1, C_1 = \{s_1, \eta_1\})\}$

Page 60: Learning Semantic String Transformations from Examples

Sub-expression Sharing

$N(i) = N(i-1) + N(i-2)$

$N(i) = \Theta(2^i)$

$\{\{\eta_1, \eta_2, \cdots, \eta_m\}, \eta_m, Progs\}$

$Progs[\eta_1] \equiv \{v_1\}$

$Progs[\eta_2] = \{Select(C_2, T_1, C_1 = \{s_1, \eta_1\})\}$

Page 61: Learning Semantic String Transformations from Examples

Intersect_t: $D_{t1} \wedge D_{t2}$

Page 62: Learning Semantic String Transformations from Examples

Current State of the Art: Help forums

Page 63: Learning Semantic String Transformations from Examples

Observations

• Semantic string transformations

• Input-output examples based interaction
  – New disambiguating inputs

• Add-in with the same interface