December 9, 2005 ISI AI Seminar Series 1 Constraint Systems Laboratory Symmetry Detection in Constraint Satisfaction Problems & its Application in Databases Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science & Engineering University of Nebraska-Lincoln Joint work with Amy Beckwith-Davis, Anagh Lal, and Eugene C. Freuder Supported by NSF CAREER award #0133568


Page 2

Outline

• Definitions
  – CSP
  – Interchangeability
  – Bundling

• Bundling in CSPs

• Bundling for join query computation

• Conclusions

Page 3

Constraint Satisfaction Problem (CSP)

• Given P = (V, D, C)
  – V: set of variables
  – D: set of their domains
  – C: set of constraints (relations) restricting the acceptable combinations of values for variables
  – A solution is a consistent assignment of values to variables
• Query: find one solution, all solutions, etc.
• Examples: SAT, scheduling, product configuration
• NP-complete in general

[Figure: example CSP with variables V1, V2, V3, V4 and domains D(V1) = {d}, D(V2) = {c, d, e, f}, D(V3) = {a, b, d}, D(V4) = {a, b, c}]

Page 4

Solution techniques (simplified)

• Search
  – Backtrack search
    • Constructive
    • Complete (in theory) and sound
  – Iterative repair
    • Repairs a complete but inconsistent assignment of values to variables by doing local repairs
    • In general, neither sound nor complete
• Constraint propagation
  – Removes from the problem values (or combinations of values) that are inconsistent with the constraints
  – In general, efficient (polynomial time)

[Figure: constraint propagation example with domains {1,2,3,4}, {1,2,3}, {1,2}]
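To make constraint propagation concrete, here is a minimal Python sketch of AC-3, one standard arc-consistency algorithm (the slide does not say which propagation algorithm it has in mind). It runs on the example domains with hypothetical constraints V1 ≠ V3, V2 ≠ V3, V2 ≠ V4, V3 ≠ V4; the slides do not list the actual constraints, and the helper names are illustrative.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

Pred = Callable[[str, str], bool]

def ac3(domains: Dict[str, Set[str]], constraints: List[Tuple[str, str, Pred]]) -> bool:
    """Prune `domains` in place until every arc is consistent (AC-3).

    Returns False if some domain is wiped out. Assumes at most one
    binary constraint per pair of variables.
    """
    arcs: Dict[Tuple[str, str], Pred] = {}
    for x, y, pred in constraints:
        arcs[(x, y)] = pred
        # Reverse arc: same relation, arguments swapped.
        arcs[(y, x)] = lambda vy, vx, p=pred: p(vx, vy)
    queue = deque(arcs)
    while queue:
        x, y = queue.popleft()
        pred = arcs[(x, y)]
        # Values of x with no supporting value in the domain of y.
        unsupported = {a for a in domains[x]
                       if not any(pred(a, b) for b in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            if not domains[x]:
                return False
            # The domain of x shrank: recheck arcs (z, x) for neighbors z other than y.
            queue.extend((z, w) for (z, w) in arcs if w == x and z != y)
    return True

# Example domains from the slide; the != constraints are hypothetical.
domains = {"V1": {"d"}, "V2": {"c", "d", "e", "f"},
           "V3": {"a", "b", "d"}, "V4": {"a", "b", "c"}}
ne: Pred = lambda a, b: a != b
constraints = [("V1", "V3", ne), ("V2", "V3", ne),
               ("V2", "V4", ne), ("V3", "V4", ne)]
print(ac3(domains, constraints), sorted(domains["V3"]))  # True ['a', 'b']
```

Propagation here removes d from V3 (it conflicts with V1's only value) in polynomial time, without any search.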

Page 5

Backtrack search

• DFS + backtracking (linear space)
  – Variable being instantiated: current variable
  – Un-instantiated variables: future variables
  – Instantiated variables: past variables
• + Constraint propagation
  – Backtrack search with forward checking (FC)

[Figure: backtrack search tree for the example CSP (root S; V1 = d; values c, e, f, d tried for V2); solution found: V1 = d, V2 = e, V3 = a, V4 = c]
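Backtrack search with forward checking, as described above, can be sketched in Python as follows. Binary constraints are predicates keyed by variable pairs; the function names and the three-variable ≠ instance in the usage line are made up for illustration and are not the slide's CSP.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Pred = Callable[[str, str], bool]

def fc_search(domains: Dict[str, Set[str]],
              constraints: Dict[Tuple[str, str], Pred],
              order: List[str]) -> Optional[Dict[str, str]]:
    """Backtrack search with forward checking (FC); returns the first solution or None."""

    def check(x: str, vx: str, y: str, vy: str) -> bool:
        if (x, y) in constraints:
            return constraints[(x, y)](vx, vy)
        if (y, x) in constraints:
            return constraints[(y, x)](vy, vx)
        return True  # no constraint between x and y

    def search(i: int, assignment: Dict[str, str],
               doms: Dict[str, Set[str]]) -> Optional[Dict[str, str]]:
        if i == len(order):
            return dict(assignment)
        current = order[i]
        for value in sorted(doms[current]):
            # Forward checking: filter the domain of every future variable.
            pruned = {f: {w for w in doms[f] if check(current, value, f, w)}
                      for f in order[i + 1:]}
            if all(pruned.values()):  # no future domain wiped out
                assignment[current] = value
                result = search(i + 1, assignment, {**doms, **pruned})
                if result is not None:
                    return result
                del assignment[current]  # backtrack
        return None

    return search(0, {}, {v: set(d) for v, d in domains.items()})

ne: Pred = lambda a, b: a != b
print(fc_search({"x": {"a", "b", "c"}, "y": {"a", "b"}, "z": {"a"}},
                {("x", "y"): ne, ("y", "z"): ne, ("x", "z"): ne},
                ["x", "y", "z"]))  # {'x': 'c', 'y': 'b', 'z': 'a'}
```

Each recursive call receives filtered copies of the future domains, so backtracking needs no explicit undo of the pruning; the price is copying the domains at every node.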

Page 6

Interchangeability [Freuder, 91]

• Captures the idea of symmetry between solutions
• Functional interchangeability
  – Any mapping between two solutions
  – Includes permutations of values across variables; equivalent to graph isomorphism
• Full interchangeability (FI)
  – Restricted to values of a single variable
  – Also likely intractable

[Figure: the example CSP; in every solution, d, e, f are interchangeable for V2; solutions V1 = d, V2 = c, V3 = a, V4 = b and V1 = d, V2 = c, V3 = b, V4 = a]
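Full interchangeability can be checked directly, if expensively, by enumerating all solutions: two values of a variable are FI when swapping one for the other in any solution yields another solution. The Python sketch below brute-forces this on the example domains with hypothetical constraints V1 ≠ V3, V2 ≠ V3, V2 ≠ V4, V3 ≠ V4 (our choice; the slides do not give the constraints). Exhaustive enumeration only works for toy instances, which is why FI is approximated by NI.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

def all_solutions(order: List[str], domains: Dict[str, Set[str]],
                  ne_pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Brute-force enumeration of all solutions of a CSP with != constraints."""
    solutions = []
    for values in product(*(sorted(domains[v]) for v in order)):
        a = dict(zip(order, values))
        if all(a[x] != a[y] for x, y in ne_pairs):
            solutions.append(a)
    return solutions

def fully_interchangeable(var: str, b: str, c: str,
                          solutions: List[Dict[str, str]]) -> bool:
    """True if swapping b and c for `var` maps every solution to a solution."""
    known = {tuple(sorted(s.items())) for s in solutions}
    for s in solutions:
        if s[var] in (b, c):
            swapped = dict(s, **{var: c if s[var] == b else b})
            if tuple(sorted(swapped.items())) not in known:
                return False
    return True

order = ["V1", "V2", "V3", "V4"]
domains = {"V1": {"d"}, "V2": {"c", "d", "e", "f"},
           "V3": {"a", "b", "d"}, "V4": {"a", "b", "c"}}
ne_pairs = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]  # hypothetical
sols = all_solutions(order, domains, ne_pairs)
print(len(sols))                                    # 14
print(fully_interchangeable("V2", "d", "e", sols))  # True
print(fully_interchangeable("V2", "c", "d", sols))  # False
```

On this instance d, e, f are FI for V2, while c is not (replacing d by c breaks V2 ≠ V4 whenever V4 = c).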

Page 7

Value interchangeability [Freuder, 91]

• Full interchangeability (FI):
  – d, e, f interchangeable for V2 in any solution
• Neighborhood interchangeability (NI):
  – Considers only the neighborhood of the variable
  – Finds e, f but misses d
  – Efficiently approximates FI
  – Discrimination tree DT(V2)

[Figure: the example CSP (D(V1) = {d}, D(V2) = {c, d, e, f}, D(V3) = {a, b, d}, D(V4) = {a, b, c}) and the discrimination tree DT(V2)]
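A discrimination tree can be sketched as a trie: for each value of the variable, walk down one edge per compatible (neighbor, neighbor value) pair, in a fixed order. Values that end at the same node have identical neighborhoods and are NI. This minimal Python sketch assumes hypothetical constraints V2 ≠ V3 and V2 ≠ V4 (the slides do not give the constraints); on that instance it reproduces the slide's observation that NI finds e, f but misses d.

```python
from typing import Callable, Dict, List, Set

Compat = Callable[[str, str, str, str], bool]

def ni_classes(var: str, domains: Dict[str, Set[str]],
               neighbors: List[str], compatible: Compat) -> List[List[str]]:
    """Partition domains[var] into NI classes using a discrimination tree."""
    root: dict = {}
    for value in sorted(domains[var]):
        node = root
        # One trie edge per compatible (neighbor, neighbor value) pair, in a fixed order.
        for n in sorted(neighbors):
            for w in sorted(domains[n]):
                if compatible(var, value, n, w):
                    node = node.setdefault((n, w), {})
        # Values ending at the same node share all their supports.
        node.setdefault("values", []).append(value)

    classes: List[List[str]] = []

    def collect(node: dict) -> None:
        for key, child in node.items():
            if key == "values":
                classes.append(child)
            else:
                collect(child)

    collect(root)
    return classes

domains = {"V1": {"d"}, "V2": {"c", "d", "e", "f"},
           "V3": {"a", "b", "d"}, "V4": {"a", "b", "c"}}
# Hypothetical constraints: V2 != V3 and V2 != V4.
ne = lambda x, vx, y, vy: vx != vy
print(ni_classes("V2", domains, ["V3", "V4"], ne))  # [['c'], ['e', 'f'], ['d']]
```

Value d is split off because it is incompatible with V3 = d, a value that no solution uses (V1 = d rules it out); NI cannot see that, since it looks only at V2's immediate neighborhood.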

Page 8

Outline

• Definitions

• Bundling in CSPs
  – Static bundling
  – Dynamic bundling
  – Dynamic bundling for non-binary CSPs

• Bundling for join query computation

• Conclusions

Page 9

Bundling: using NI in search

• Static bundling [Haselböck, 93]
  – Before search: compute & store NI sets
  – During search, when a variable:
    • is a future variable: forward checking removes a bundle of equivalent values
    • is the current variable: assign a bundle of equivalent values
• Advantages
  – Reduces search space
  – Creates bundled solutions

[Figure: static bundling search tree for the example CSP (root S; V1 = d; V2 bundles c | e, f | d); bundled solution V1 = d, V2 ∈ {e, f}, V3 = a, V4 ∈ {b, c}]
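Static bundling can be sketched as below, assuming binary constraints given by a `compatible` predicate that returns True for unconstrained pairs. NI sets are computed once, before search, by grouping values with identical supports (a hash-based stand-in for the discrimination tree). During search the current variable is assigned a whole bundle, and forward checking uses one representative, which suffices because NI values have the same supports. The ≠ constraints V1 ≠ V3, V2 ≠ V3, V2 ≠ V4, V3 ≠ V4 are hypothetical; the slides do not give the constraints.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Set

Compat = Callable[[str, str, str, str], bool]

def ni_partition(x: str, values: Set[str], others: List[str],
                 doms: Dict[str, Set[str]], compatible: Compat) -> List[Set[str]]:
    """Group values of x whose supports in the domains of `others` coincide."""
    groups: Dict[FrozenSet, Set[str]] = {}
    for v in sorted(values):
        support = frozenset((o, w) for o in others for w in doms[o]
                            if compatible(x, v, o, w))
        groups.setdefault(support, set()).add(v)
    return list(groups.values())

def static_bundling(order: List[str], domains: Dict[str, Set[str]],
                    compatible: Compat) -> List[Dict[str, FrozenSet[str]]]:
    """All solution bundles, with NI sets computed once before search."""
    nis = {x: ni_partition(x, domains[x], [o for o in order if o != x],
                           domains, compatible) for x in order}
    bundles: List[Dict[str, FrozenSet[str]]] = []

    def search(i: int, assignment: Dict[str, FrozenSet[str]],
               doms: Dict[str, Set[str]]) -> None:
        if i == len(order):
            bundles.append(dict(assignment))
            return
        x = order[i]
        for ni_set in nis[x]:
            live = ni_set & doms[x]  # part of the NI set surviving forward checking
            if not live:
                continue
            rep = min(live)  # NI values have equal supports: one representative suffices
            pruned = {f: {w for w in doms[f] if compatible(x, rep, f, w)}
                      for f in order[i + 1:]}
            if all(pruned.values()):
                assignment[x] = frozenset(live)
                search(i + 1, assignment, {**doms, **pruned})
                del assignment[x]

    search(0, {}, {x: set(d) for x, d in domains.items()})
    return bundles


NE = {frozenset(p) for p in [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]}


def compatible(x: str, vx: str, y: str, vy: str) -> bool:
    """Hypothetical != constraints; unconstrained pairs are always compatible."""
    return frozenset((x, y)) not in NE or vx != vy


domains = {"V1": {"d"}, "V2": {"c", "d", "e", "f"},
           "V3": {"a", "b", "d"}, "V4": {"a", "b", "c"}}
result = static_bundling(["V1", "V2", "V3", "V4"], domains, compatible)
print(len(result), sum(len(list(product(*b.values()))) for b in result))  # 10 14
```

On this instance, 10 bundles cover all 14 solutions; the bundle V2 ∈ {e, f} is reused across branches, while d stays separate because static NI sets are fixed before search.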

Page 10

Dynamic bundling (DynBndl) [2001]

• Dynamically identifies NI
• Using the discrimination tree for forward checking
  – is never less efficient than BT & static bundling

[Figure: search trees for the example CSP under static bundling (V2 bundles c | e, f | d) and dynamic bundling (V2 bundles c | d, e, f); the discrimination tree over <V3,a>, <V3,b>, <V4,a>, <V4,b>, <V4,c> yields the V2 bundles {c} and {d, e, f}]
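Dynamic bundling can be sketched by recomputing the NI partition of the current variable at every node, against the current (forward-checked) domains of the future variables only. This is an illustration under that reading, not the published DynBndl algorithm: it groups values by hashing instead of a discrimination tree, and it assumes hypothetical constraints V1 ≠ V3, V2 ≠ V3, V2 ≠ V4, V3 ≠ V4 (the slides do not give the constraints). On this instance the bundles found for V2 are {c} and {d, e, f}, and one solution bundle is V1 = d, V2 ∈ {d, e, f}, V3 = a, V4 ∈ {b, c}, the bundle shown on the slides.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Set

Compat = Callable[[str, str, str, str], bool]

def ni_partition(x: str, values: Set[str], others: List[str],
                 doms: Dict[str, Set[str]], compatible: Compat) -> List[Set[str]]:
    """Group values of x whose supports in the current domains of `others` coincide."""
    groups: Dict[FrozenSet, Set[str]] = {}
    for v in sorted(values):
        support = frozenset((o, w) for o in others for w in doms[o]
                            if compatible(x, v, o, w))
        groups.setdefault(support, set()).add(v)
    return list(groups.values())

def dynamic_bundling(order: List[str], domains: Dict[str, Set[str]],
                     compatible: Compat) -> List[Dict[str, FrozenSet[str]]]:
    """All solution bundles, with NI recomputed at every node of the search."""
    bundles: List[Dict[str, FrozenSet[str]]] = []

    def search(i: int, assignment: Dict[str, FrozenSet[str]],
               doms: Dict[str, Set[str]]) -> None:
        if i == len(order):
            bundles.append(dict(assignment))
            return
        x, future = order[i], order[i + 1:]
        # Bundles of x w.r.t. the future variables' current (filtered) domains only.
        for bundle in ni_partition(x, doms[x], future, doms, compatible):
            rep = min(bundle)  # all members prune future domains identically
            pruned = {f: {w for w in doms[f] if compatible(x, rep, f, w)}
                      for f in future}
            if all(pruned.values()):
                assignment[x] = frozenset(bundle)
                search(i + 1, assignment, {**doms, **pruned})
                del assignment[x]

    search(0, {}, {x: set(d) for x, d in domains.items()})
    return bundles


NE = {frozenset(p) for p in [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]}


def compatible(x: str, vx: str, y: str, vy: str) -> bool:
    """Hypothetical != constraints; unconstrained pairs are always compatible."""
    return frozenset((x, y)) not in NE or vx != vy


domains = {"V1": {"d"}, "V2": {"c", "d", "e", "f"},
           "V3": {"a", "b", "d"}, "V4": {"a", "b", "c"}}
result = dynamic_bundling(["V1", "V2", "V3", "V4"], domains, compatible)
print(len(result), sum(len(list(product(*b.values()))) for b in result))  # 4 14
```

Once V1 = d is assigned, forward checking removes d from V3, and the partition of V2 no longer sees the value that separated d from e and f. The same 14 solutions are therefore covered by 4 bundles instead of the 10 that a static partition would produce on this instance.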

Page 11

Non-binary CSPs

[Table: allowed tuples of constraints C1 (V, V1, V2), C2 (V, V3), C3 (V2, V3, V4), and C4 (V1, V4); C1 contains (1, 1, 3), (1, 3, 3), (2, 1, 3), (2, 3, 3), (3, 1, 1), (3, 2, 2), (4, 1, 1), (4, 2, 2), (5, 3, 2), (6, 3, 2)]

[Figure: constraint hypergraph over V, V1, V2, V3, V4 with constraints C1–C4; D(V) = {1, 2, 3, 4, 5, 6}, D(V1) = D(V2) = D(V3) = D(V4) = {1, 2, 3}]

• Scope(Cx): the set of variables involved in Cx

• Arity(Cx): size of its scope

Computing NI for non-binary CSPs is not a trivial extension of the binary case

Page 12

NI for non-binary CSPs [2003, 2005]

1. Building an nb-DT for each constraint
   – Determines the NI sets of a variable given a constraint
2. Intersecting the partitions from the nb-DTs
   – Yields the NI sets of V (a partition of D(V))
3. Processing paths in the nb-DTs
   – Gives, for free, the updates necessary for forward checking

[Figure: the non-binary example CSP (D(V) = {1, 2, 3, 4, 5, 6}; constraints C1–C4); nb-DT(V, C1) yields {1, 2}, {3, 4}, {5, 6}; nb-DT(V, C2) yields {1, 2}, {3, 4}, {6}, {5}; intersecting them gives {1, 2}, {3, 4}, {5}, {6}]
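The first two steps above can be sketched with hashing in place of nb-DTs: for one constraint, two values of V are NI when the tuples they appear in coincide once V's column is projected out; the NI sets of V are then the non-empty pairwise intersections of the per-constraint partitions. C1's tuples below are those listed in the slide's table; C2's tuples are an illustrative reading of the slide, consistent with the nb-DT(V, C2) partition shown. Values with no tuple in a constraint (here only 5 in C2) share an empty projection and form one class.

```python
from collections import defaultdict
from functools import reduce
from typing import FrozenSet, List, Sequence, Set, Tuple

Partition = Set[FrozenSet[int]]

def partition_for_constraint(var: str, scope: Sequence[str],
                             tuples: List[Tuple[int, ...]],
                             domain: Set[int]) -> Partition:
    """NI classes of `var` w.r.t. one constraint: values whose projected tuples coincide."""
    i = scope.index(var)
    projections = defaultdict(set)
    for t in tuples:
        projections[t[i]].add(t[:i] + t[i + 1:])  # drop var's column
    classes = defaultdict(set)
    for v in domain:
        classes[frozenset(projections[v])].add(v)
    return {frozenset(c) for c in classes.values()}

def intersect(p: Partition, q: Partition) -> Partition:
    """Coarsest common refinement of two partitions of the same domain."""
    return {a & b for a in p for b in q if a & b}

D_V = {1, 2, 3, 4, 5, 6}
C1 = (("V", "V1", "V2"), [(1, 1, 3), (1, 3, 3), (2, 1, 3), (2, 3, 3), (3, 1, 1),
                          (3, 2, 2), (4, 1, 1), (4, 2, 2), (5, 3, 2), (6, 3, 2)])
C2 = (("V", "V3"), [(1, 3), (2, 3), (3, 2), (4, 2), (6, 1)])  # illustrative reading

parts = [partition_for_constraint("V", scope, tuples, D_V) for scope, tuples in (C1, C2)]
ni_sets = reduce(intersect, parts)
print(sorted(sorted(c) for c in ni_sets))  # [[1, 2], [3, 4], [5], [6]]
```

C1 alone bundles 5 with 6; C2 separates them, so the intersection keeps {1, 2} and {3, 4} and splits {5, 6}.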

Page 13

Robust solutions

• Solution bundle
  – Cartesian product of domain bundles
  – Compact representation
  – Robust solutions
• Dynamic bundling finds larger bundles

Single solution:  V1 = d, V2 = e, V3 = a, V4 = c
Static bundling:  V1 = d, V2 ∈ {e, f}, V3 = a, V4 ∈ {b, c}
Dynamic bundling: V1 = d, V2 ∈ {d, e, f}, V3 = a, V4 ∈ {b, c}
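Because a solution bundle is the Cartesian product of its domain bundles, expanding it back into individual solutions is a one-liner with `itertools.product`. The sketch below, using the two bundles listed above (the helper name `expand` is illustrative), shows why larger bundles matter: the static bundle stands for 4 solutions, the dynamic one for 6.

```python
from itertools import product
from typing import Dict, List, Set

def expand(bundle: Dict[str, Set[str]]) -> List[Dict[str, str]]:
    """Enumerate the solutions represented by a solution bundle
    (variable -> set of values), i.e., the Cartesian product of its bundles."""
    variables = sorted(bundle)
    return [dict(zip(variables, combo))
            for combo in product(*(sorted(bundle[v]) for v in variables))]

static = {"V1": {"d"}, "V2": {"e", "f"}, "V3": {"a"}, "V4": {"b", "c"}}
dynamic = {"V1": {"d"}, "V2": {"d", "e", "f"}, "V3": {"a"}, "V4": {"b", "c"}}
print(len(expand(static)), len(expand(dynamic)))  # 4 6
```

Any member of a bundle can replace another without re-solving, which is what makes bundled solutions robust.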

Page 14

DynBndl: worth the effort?

• Finds larger bundles
• Enables forward checking at no extra cost
• Does not cost more than BT or static bundling
  – Cost model:
    • # nodes visited by search
    • # constraint checks made
  – Theoretical guarantee holds
    • for finding all solutions
    • under the same variable ordering

¿ Finding the first solution ?
  – Experiments uncover an unexpected benefit

Page 15

Bundling of no-goods…

[Figure: a no-good bundle and a solution bundle on the non-binary example CSP (variables V, V1, V2, V3, V4; constraints C1–C4)]

• … is particularly effective

Page 16

Experimental set-up

• CSP parameters:
  – n: number of variables {20, 30}
  – a: domain size {10, 15}
  – t: constraint tightness [25%, 75%]
  – CR: constraint ratio (arity: 2, 3, 4)
  – 1,000 instances per tightness value
• Phase transition
• Performance measures
  – Nodes visited (NV)
  – Constraint checks (CC)
  – CPU time
  – First bundle size (FBS)

[Figure: cost of solving vs. order parameter; mostly solvable instances below the critical value, mostly unsolvable instances above it]

Page 17

Empirical evaluations

• DynBndl versus FC (BT + forward checking)
• Randomly generated problems, Model B
• Experiments
  – Effect of varying tightness
  – In the phase-transition region
    • Effect of varying domain size
    • Effect of varying constraint ratio (CR)
• ANOVA to statistically compare the performance of DynBndl and FC with varying t
• t-distribution for confidence intervals
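The slide does not give the interval formula. A plausible reading is the standard Student-t interval for the mean of a performance measure (e.g., CPU time) over m instances, with sample mean x̄ and sample standard deviation s; m is used here to avoid a clash with n, the number of variables:

```latex
\bar{x} \pm t_{1-\alpha/2,\; m-1}\,\frac{s}{\sqrt{m}},
\qquad
s = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\left(x_i - \bar{x}\right)^2}
```

With m = 1,000 instances per tightness value, t_{0.975, 999} ≈ 1.96, so a 95% interval is roughly x̄ ± 1.96 s/√m.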

Page 18

Analysis: Varying tightness

• Low tightness
  – Large FBS
    • 33 at t = 0.35
    • 2254 (Dataset #13, t = 0.35)
  – Small additional cost
• Phase transition
  – Multiple solutions present
  – Maximum no-good bundling causes maximum savings in CPU time, NV, & CC
• High tightness
  – Problems mostly unsolvable
  – Overhead of bundling minimal

[Plot: CPU time [sec] and # NV (hundreds) vs. tightness (0.325 to 0.6) for DynBndl and FC; n = 20, a = 15, CR = CR3]

t       FBS
0.350   33.44
0.400   10.91
0.425    7.13
0.437    6.38
0.450    5.62
0.462    2.37
0.475    0.66
0.500    0.03
0.550    0.00

Page 19

Analysis: Varying domain size

• Increasing a in the phase-transition region
  – FBS increases: more chances for symmetry
  – CPU-time improvement also increases: more bundling of no-goods

Increasing a (n = 30)

        Improv. (CPU) %     FBS
CR      a=10    a=15        a=10    a=15
CR1     33.3    34.3        5.5     11.9
CR2     28.6    33.0        5.0     5.5
CR3     29.8    31.7        3.6     5.0
CR4     28.4    31.6        1.2     1.4

Because the benefits of DynBndl grow with domain size, DynBndl is particularly attractive for database applications, where large domains are typical

Page 20

Outline

• Definitions

• Bundling in CSPs

• Bundling for join query computation
  – Idea
  – A CSP model for the join query
  – Sorting-based bundling algorithm
  – Dynamic-bundling-based join algorithm

• Conclusions

Page 21

The join query

SELECT R2.A, R2.B, R2.C
FROM R1, R2
WHERE R1.A = R2.A
  AND R1.B = R2.B
  AND R1.C = R2.C

R1
A   B    C
1   12   23
1   13   23
1   14   23
2   10   25
3   16   30
4   10   25
5   12   23
5   13   23
5   14   23
6   13   27
6   14   27
7   14   28

R2
A   B    C
1   12   23
1   13   23
1   14   23
1   15   23
2   10   25
3   17   20
4   10   25
5   12   23
5   13   23
5   14   23
5   15   23
6   13   27
6   14   27

Result: 10 tuples in 3 nested tuples

R1 ⋈ R2 (compacted)
A        B              C
{1, 5}   {12, 13, 14}   {23}
{2, 4}   {10}           {25}
{6}      {13, 14}       {27}
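Because the query joins on all three attributes of two relations with the same schema, R1 ⋈ R2 here is simply the set intersection of the two relations. The Python sketch below computes it and then compacts the result by recursively bundling values of the first attribute whose remaining sub-tuples coincide. It is hash-based and written for illustration only; it is not the sorting-based algorithm of the talk, and the function name `compact` is ours. On the data above it reproduces the 10 tuples and 3 nested tuples.

```python
from collections import defaultdict
from math import prod
from typing import FrozenSet, List, Set, Tuple

Row = Tuple[int, ...]
Nested = Tuple[FrozenSet[int], ...]

def compact(rows: Set[Row]) -> List[Nested]:
    """Bundle values of the first attribute that occur with the same set of
    remaining sub-tuples, then recurse on those sub-tuples."""
    if not rows:
        return []
    if len(next(iter(rows))) == 0:  # every attribute has been bundled
        return [()]
    rest_by_value = defaultdict(set)
    for row in rows:
        rest_by_value[row[0]].add(row[1:])
    bundles = defaultdict(set)  # set of sub-tuples -> bundle of first-attribute values
    for value, rest in rest_by_value.items():
        bundles[frozenset(rest)].add(value)
    return [(frozenset(bundle),) + tail
            for rest, bundle in bundles.items()
            for tail in compact(set(rest))]

R1 = {(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25), (3, 16, 30), (4, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23), (6, 13, 27), (6, 14, 27), (7, 14, 28)}
R2 = {(1, 12, 23), (1, 13, 23), (1, 14, 23), (1, 15, 23), (2, 10, 25), (3, 17, 20),
      (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23), (5, 15, 23), (6, 13, 27),
      (6, 14, 27)}

join = R1 & R2  # natural join on (A, B, C) of relations with identical schemas
nested = compact(join)
print(len(join), len(nested), sum(prod(map(len, n)) for n in nested))  # 10 3 10
```

Each nested tuple expands, by Cartesian product, to exactly the join tuples it replaces, so the compacted form is lossless.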

Page 22

Databases & CSPs

DB terminology               CSP terminology
Table, relation              Constraint (relational constraint)
Join condition               Constraint (join-condition constraint)
Attribute                    CSP variable
Tuple in a table             Tuple in a constraint or allowed by one
Computing a join sequence    Finding all solutions to a CSP

• Same computational problems, different cost models
  – Databases: minimize # I/O operations
  – CSP community: minimize # CPU operations
• Challenges for using CSP techniques in DB
  – Use of lighter data structures to minimize memory usage
  – Fit in the iterator model of database engines

Page 23

Join query as a CSP

• R1 ⋈ R2, joining attribute x of R1 with attribute y of R2
  – Most expensive operator in terms of I/O
  – “=” as the join condition: equi-join
  – x is the same as y: natural join
• CSP model
  – Attributes of relations → CSP variables
  – Attribute values → variable domains
  – Relations → relational constraints
  – Join conditions → join-condition constraints

SELECT R1.A, R1.B, R1.C
FROM R1, R2
WHERE R1.A = R2.A
  AND R1.B = R2.B
  AND R1.C = R2.C
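The CSP model above can be sketched directly: one CSP variable per attribute, one relational constraint per relation (the attribute values must form a tuple of that relation), and one equality constraint per join condition, so that all solutions of the CSP are exactly the join result. The sketch uses plain backtracking that checks a constraint once its whole scope is assigned, interleaves the attributes linked by join conditions (R1.A, R2.A, R1.B, ...), and runs on a few rows taken from the example relations. Function names are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]

def solve_all(variables: Sequence[str], domains: Dict[str, Set[int]],
              constraints: List[Constraint]) -> List[Dict[str, int]]:
    """Find all solutions by backtracking; a constraint is checked as soon as
    every variable in its scope is assigned."""
    solutions: List[Dict[str, int]] = []

    def consistent(assignment: Dict[str, int]) -> bool:
        return all(pred(*(assignment[v] for v in scope))
                   for scope, pred in constraints
                   if all(v in assignment for v in scope))

    def backtrack(i: int, assignment: Dict[str, int]) -> None:
        if i == len(variables):
            solutions.append(dict(assignment))
            return
        var = variables[i]
        for value in sorted(domains[var]):
            assignment[var] = value
            if consistent(assignment):
                backtrack(i + 1, assignment)
            del assignment[var]

    backtrack(0, {})
    return solutions

def join_as_csp(r1: Set[Tuple[int, ...]], r2: Set[Tuple[int, ...]],
                attrs: Sequence[str] = ("A", "B", "C")) -> List[Tuple[int, ...]]:
    rels = {"R1": r1, "R2": r2}
    # Interleave attributes linked by join conditions: R1.A, R2.A, R1.B, R2.B, ...
    variables = [f"{r}.{a}" for a in attrs for r in rels]
    # Attribute values -> variable domains.
    domains = {f"{r}.{a}": {t[j] for t in rel}
               for r, rel in rels.items() for j, a in enumerate(attrs)}
    constraints: List[Constraint] = []
    for r, rel in rels.items():  # relations -> relational constraints
        scope = tuple(f"{r}.{a}" for a in attrs)
        constraints.append((scope, lambda *vals, rel=rel: vals in rel))
    for a in attrs:  # join conditions -> equality constraints
        constraints.append(((f"R1.{a}", f"R2.{a}"), lambda x, y: x == y))
    return sorted(tuple(s[f"R1.{a}"] for a in attrs)
                  for s in solve_all(variables, domains, constraints))

R1 = {(1, 12, 23), (2, 10, 25), (3, 16, 30)}
R2 = {(1, 12, 23), (1, 15, 23), (2, 10, 25), (3, 17, 20)}
print(join_as_csp(R1, R2))  # [(1, 12, 23), (2, 10, 25)]
```

This naive solver only illustrates the modeling; it ignores the I/O cost model that matters in a database engine.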

Page 24

Join algorithms

• Join algorithms
  – Nested loop
  – Sorting-based
    • Two steps: sorting, merging
    • Partitions relations by sorting, minimizes # scans of relations
  – Hashing-based
• Progressive Merge-Join [Dittrich et al. 03]
  – PMJ: a sort-merge algorithm
  – Two phases
    1. Sorting phase: sorts subsets of the relations
    2. Merging phase: merges the sorted subsets
  – PMJ produces early results
  – We use the framework of the PMJ
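For reference, the two phases of a sort-merge join look like this in a simple in-memory form. This is the textbook sort-merge equi-join, not PMJ itself: it sorts each whole relation at once rather than sorting and merging memory-sized subsets, so it does not produce early results. Join keys come from caller-supplied functions, and runs of equal keys on both sides produce their cross product; the data is the example R1 and R2.

```python
from typing import Any, Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")
S = TypeVar("S")

def sort_merge_join(r: Sequence[R], s: Sequence[S],
                    key_r: Callable[[R], Any],
                    key_s: Callable[[S], Any]) -> List[Tuple[R, S]]:
    """In-memory sort-merge equi-join: pairs (a, b) with key_r(a) == key_s(b)."""
    # Sorting phase.
    rs = sorted(r, key=key_r)
    ss = sorted(s, key=key_s)
    out: List[Tuple[R, S]] = []
    i = j = 0
    # Merging phase: advance whichever side has the smaller key.
    while i < len(rs) and j < len(ss):
        kr, ks = key_r(rs[i]), key_s(ss[j])
        if kr < ks:
            i += 1
        elif ks < kr:
            j += 1
        else:
            # Equal keys: find both runs and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(rs) and key_r(rs[i_end]) == kr:
                i_end += 1
            while j_end < len(ss) and key_s(ss[j_end]) == kr:
                j_end += 1
            out.extend((a, b) for a in rs[i:i_end] for b in ss[j:j_end])
            i, j = i_end, j_end
    return out

R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25), (3, 16, 30), (4, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23), (6, 13, 27), (6, 14, 27), (7, 14, 28)]
R2 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (1, 15, 23), (2, 10, 25), (3, 17, 20),
      (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23), (5, 15, 23), (6, 13, 27),
      (6, 14, 27)]
on_a = sort_merge_join(R1, R2, lambda t: t[0], lambda t: t[0])
on_abc = sort_merge_join(R1, R2, lambda t: t, lambda t: t)
print(len(on_a), len(on_abc))  # 31 10
```

Joining on A alone yields 31 pairs; joining on (A, B, C) yields the 10 result tuples of the example query. After sorting, each relation is scanned once.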

Page 25

New join algorithm

• Sorting phase
  – Loads subsets of the relations into memory
  – Uses sorting-based bundling (shown next)
• Merging phase
  – Computes the join of the in-memory relations using dynamically computed bundles

Page 26

Sorting-based bundling

• Heuristic for variable ordering: place variables linked by join conditions as close to each other as possible

[Figure: variable ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C over relations R1 and R2]

• Sort relations using the above ordering
• Next: compute bundles of the first variable in the ordering (R1.A)

Page 27

Computing a bundle of R1.A

[Figure: R1 split into partitions by the value of R1.A; the partitions for 1 and 5 are symmetric, the partition for 2 is unequal to them; resulting bundle {1, 5}]

R1
A   B    C
1   12   23
1   13   23
1   14   23
2   10   25
5   12   23
5   13   23
5   14   23

• Partition of a constraint
  – Tuples of the relation having the same value of R1.A
• Compare the projected tuples of the first partition with those of another partition
• Compare with every other partition to get the complete bundle
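The steps above can be sketched in Python as follows, assuming the relation fits in memory as a list of (A, B, C) tuples: sort it, cut it into partitions by the value of A with `itertools.groupby`, project out A, and compare the first remaining partition's projected tuples with every other partition. Equal projections put the corresponding A values in the same bundle. This illustrates the idea on the relation above; it is not the authors' implementation, and the function name is ours.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

Row = Tuple[int, ...]

def bundles_of_first_attribute(relation: List[Row]) -> List[List[int]]:
    """Bundle values of the first attribute whose partitions have identical projected tuples."""
    rows = sorted(set(relation))  # sorting also orders each partition's tuples
    partitions = [(a, tuple(row[1:] for row in group))  # project out the first attribute
                  for a, group in groupby(rows, key=itemgetter(0))]
    bundles: List[List[int]] = []
    while partitions:
        (a, proj), rest = partitions[0], partitions[1:]
        # Compare the first partition with every other partition.
        bundles.append([a] + [b for b, other in rest if other == proj])
        partitions = [(b, other) for b, other in rest if other != proj]
    return bundles

R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23)]
print(bundles_of_first_attribute(R1))  # [[1, 5], [2]]
```

Because the relation is sorted on all attributes, each partition's projected tuples come out in the same order, so comparing them as tuples is the same as comparing them as sets.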

Page 28

Finding the valid bundle

R1
A   B    C
1   12   23
1   13   23
1   14   23
2   10   25
3   16   30
3   16   24

R2
A   B    C
1   12   23
1   13   23
1   14   23
1   15   23
2   10   25
3   17   20

[Figure: two candidate bundles, {1, 5, x} and {1, 5, y, z}, whose common values are {1, 5}]

1. Compute a bundle for the attribute
2. Check bundle validity with future constraints
3. If no value is common, backtrack
Otherwise, assign the variable the surviving values in the bundle

R1 ⋈ R2 (compacted)
A        B              C
{1, 5}   {12, 13, 14}   {23}
{2, 4}   {10}           {25}
{6}      {13, 14}       {27}

Page 29

Experiments

• XXL library for implementation & evaluation
• Data sets
  – Random: 2 relations R1, R2 with the same schema as the example
    • Each relation: 10,000 tuples
    • Memory size: 4,000 tuples
    • Page size: 200 tuples
  – Real-world problem: 3 relations, 4 attributes
• Compaction rate achieved
  – Random problem: 1.48
  – Real-world problem: 2.26 (69 tuples in 32 nested tuples)
• Savings even with a (very) preliminary implementation

Page 30

Outline

• Definitions

• Bundling in CSPs

• Bundling for join query computation

• Conclusions
  – Summary
  – Future research

Page 31

Summary

• Dynamic bundling in finite CSPs

– Binary and non-binary constraints

– Produces multiple robust solutions

– Significantly reduces cost of search at phase transition

• Application to join-query computation

Constraint Processing inspires innovative solutions to fundamental difficult problems in Databases

Page 32

Future research

• CSPs
  – Only scratched the surface:
    • interchangeability + decomposition [ECAI 96]
    • partial interchangeability [AAAI 98, Wilson 05]
    • tractable structures
• Databases
  – Investigate benefit of bundling
    • Sampling operator
    • Automatic categorization of query results
    • Main-memory databases
• Constraint databases
  – Design bundling mechanisms for gap & linear constraints over intervals (spatial databases)

Page 33

Thank you for your attention

Time left for questions?

Page 34

Sample projects

1. Symmetry detection
   • Search, databases
2. Graduate TA Assignment Project (GTAAP)
   • Modeling, search, GUI
3. Temporal reasoning
   • STP: STP
   • TCSP: AC, search heuristics
4. Structural decompositions
   • CaT: efficient approximation of the hypertree decomposition
   • IndSet: multiple solutions, (almost) k-partite structure