Symmetry Detection in Constraint Satisfaction Problems & Its Application in Databases Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science & Engineering University of Nebraska-Lincoln Joint work with Amy Beckwith-Davis, Anagh Lal, and Eugene C. Freuder - PowerPoint PPT Presentation
Fall 2009, Advanced Constraint Programming
Symmetry Detection in Constraint Satisfaction Problems
& Its Application in Databases
Berthe Y. Choueiry
Constraint Systems Laboratory
Department of Computer Science & Engineering
University of Nebraska-Lincoln
Joint work with Amy Beckwith-Davis, Anagh Lal, and Eugene C. Freuder
Supported by NSF CAREER award #0133568
December 9, 2005
Outline
• Definitions
  – CSP
  – Interchangeability
  – Bundling
• Bundling in CSPs
• Bundling for join query computation
• Conclusions
Constraint Satisfaction Problem (CSP)
• Given P = (V, D, C)
  – V: set of variables
  – D: set of their domains
  – C: set of constraints (relations) restricting the acceptable combinations of values for variables
  – Solution: a consistent assignment of values to variables
• Query: find one solution, all solutions, etc.
• Examples: SAT, scheduling, product configuration
• NP-complete in general
[Figure: constraint graph over V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
Backtrack search
• DFS + backtracking (linear space)
  – Variable being instantiated: current variable
  – Un-instantiated variables: future variables
  – Instantiated variables: past variables
• + Constraint propagation
  – Backtrack search with forward checking (FC)
[Figure: search tree rooted at S on the constraint graph (V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}); V1 is assigned d, then V2 branches on c, e, f, d]
Solution: V1 = d, V2 = e, V3 = a, V4 = c
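As a minimal sketch of backtrack search with forward checking on the example above: the domains are those shown on the slide, but the constraints are an assumption (the slide text does not list them). Here every edge in the hypothetical list `EDGES` carries a not-equal constraint, and the names `neighbors`, `forward_check`, and `solve` are illustrative.

```python
# Domains as shown on the slide.
DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: the constraints are not given in the text; we assume a
# not-equal constraint on each of these edges.
EDGES = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def neighbors(var):
    """Variables sharing an (assumed) constraint with `var`."""
    return [b if a == var else a for a, b in EDGES if var in (a, b)]


def forward_check(var, val, domains, future):
    """Filter the domains of future neighbors; return None on a wipe-out."""
    new = dict(domains)
    for other in neighbors(var):
        if other in future:
            new[other] = {w for w in domains[other] if w != val}
            if not new[other]:
                return None
    return new


def solve(order, domains, assignment=None):
    """Depth-first backtrack search with forward checking; yields all solutions."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(order):
        yield dict(assignment)
        return
    var = order[len(assignment)]                 # current variable
    future = set(order[len(assignment) + 1:])    # future variables
    for val in sorted(domains[var]):
        pruned = forward_check(var, val, domains, future)
        if pruned is not None:
            assignment[var] = val
            yield from solve(order, pruned, assignment)
            del assignment[var]                  # backtrack


solutions = list(solve(["V1", "V2", "V3", "V4"], DOMAINS))
```

Under these assumed constraints, the slide's solution (V1 = d, V2 = e, V3 = a, V4 = c) is among the assignments returned.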
Interchangeability [Freuder, 91]
• Captures the idea of symmetry between solutions
• Functional interchangeability
  – Any mapping between two solutions
  – Includes permutation of values across variables; equivalent to graph isomorphism
• Full interchangeability (FI)
  – Restricted to values of a single variable
  – Also likely intractable
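[Figure: on the constraint graph (V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}), two solutions related by a mapping: (V1 = d, V2 = c, V3 = a, V4 = b) and (V1 = d, V2 = c, V3 = b, V4 = a); values {d, e, f} of V2 are marked "in every solution"]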
Value interchangeability [Freuder, 91]
• Full Interchangeability (FI):
  – d, e, f interchangeable for V2 in any solution
• Neighborhood Interchangeability (NI):
  – Considers only the neighborhood of the variable
  – Finds e, f but misses d
  – Efficiently approximates FI
  – Discrimination tree DT(V2)
[Figure: constraint graph over V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
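A minimal sketch of neighborhood interchangeability for V2. It groups values by their support sets in each neighbor, which gives the same partition a discrimination tree would, without building the tree. The constraint list is an assumption (not-equal constraints on the hypothetical `EDGES`); only the domains come from the slide.

```python
from collections import defaultdict

DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: not-equal constraints on these edges (not stated in the text).
EDGES = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def neighbors(var):
    """Neighborhood of `var` in the (assumed) constraint graph."""
    return sorted(b if a == var else a for a, b in EDGES if var in (a, b))


def ni_partition(var, domains):
    """Group the values of `var` that have identical supports in every neighbor."""
    groups = defaultdict(set)
    for val in domains[var]:
        signature = tuple(
            frozenset(w for w in domains[other] if w != val)  # not-equal constraint
            for other in neighbors(var)
        )
        groups[signature].add(val)
    return sorted(groups.values(), key=sorted)


ni_sets_v2 = ni_partition("V2", DOMAINS)
```

With these assumptions, e and f fall in one NI set while d stays apart, because d loses the support V3 = d that e and f keep.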
Outline
• Definitions
• Bundling in CSPs
  – Static bundling
  – Dynamic bundling
  – Dynamic bundling for non-binary CSPs
• Bundling for join query computation
• Conclusions
Bundling: using NI in search
• Static bundling [Haselböck, 93]
  – Before search: compute & store NI sets
  – During search:
    • Future variables: remove bundle of equivalent values
    • Current variable: assign a bundle of equivalent values
• Advantages
  – Reduces search space
  – Creates bundled solutions
[Figure: static bundling search tree on the constraint graph (V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}); V1 is assigned d, then V2 branches on c, {e, f}, d]
Bundled solution: V1 = d, V2 = {e, f}, V3 = a, V4 = {b, c}
Dynamic bundling (DynBndl) [2001]
• Dynamically identifies NI
• Using discrimination tree for forward checking:
  – is never less efficient than BT & static bundling
[Figure: discrimination tree for V2, with nodes annotated by variable-value pairs of V3 (a, b, d) and V4 (a, b, c), yielding the bundles {d, e, f} and {c} for V2. Search trees compared after V1 = d: static bundling branches V2 on c, {e, f}, d; dynamic bundling branches V2 on c and {d, e, f}]
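The idea of recomputing NI during search can be sketched as follows. This is a simplified illustration, not the authors' implementation: it groups values by support signature rather than building a discrimination tree, and it assumes not-equal constraints on the hypothetical `EDGES` (the slide gives only the domains). The signature of a bundle doubles as the forward-checking update for the future variables.

```python
from collections import defaultdict

DOMAINS = {
    "V1": {"d"},
    "V2": {"c", "d", "e", "f"},
    "V3": {"a", "b", "d"},
    "V4": {"a", "b", "c"},
}
# ASSUMPTION: not-equal constraints on these edges (not stated in the text).
EDGES = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def neighbors(var):
    return sorted(b if a == var else a for a, b in EDGES if var in (a, b))


def bundles(var, domains, future):
    """NI partition of var's current domain w.r.t. the current domains of
    its future neighbors; each bundle comes with the filtered future domains."""
    nbrs = [o for o in neighbors(var) if o in future]
    groups = defaultdict(set)
    for val in sorted(domains[var]):
        sig = tuple(frozenset(w for w in domains[o] if w != val) for o in nbrs)
        groups[sig].add(val)
    return [(bundle, dict(zip(nbrs, sig))) for sig, bundle in groups.items()]


def dyn_bndl(order, domains, partial=None):
    """Yield solution bundles: one set of values per variable."""
    partial = {} if partial is None else partial
    if len(partial) == len(order):
        yield dict(partial)
        return
    var = order[len(partial)]
    future = set(order[len(partial) + 1:])
    for bundle, filtered in bundles(var, domains, future):
        if all(filtered.values()):               # no future domain wiped out
            partial[var] = bundle
            yield from dyn_bndl(order, {**domains, **filtered}, partial)
            del partial[var]


solution_bundles = list(dyn_bndl(["V1", "V2", "V3", "V4"], DOMAINS))
```

Once V1 = d removes d from the domain of V3, the values d, e, f of V2 have identical supports, so they are bundled together, matching the {d, e, f} and {c} branches described for the figure.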
Non-binary CSPs
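Constraint definitions (scope: allowed tuples)
C1 (V, V1, V2): (1,1,3), (1,3,3), (2,1,3), (2,3,3), (3,1,1), (3,2,2), (4,1,1), (4,2,2), (5,3,2), (6,3,2)
C2 (V, V3): (1,3), (2,3), (3,2), (4,2), (4,2), (6,1)
C3 (V2, V3, V4): (1,2,1), (1,2,2), (2,2,1), (2,2,2), (3,1,1)
C4 (V1, V4): (1,1), (2,2), (3,1)
[Figure: constraint network with V {1, 2, 3, 4, 5, 6} and V1, V2, V3, V4, each with domain {1, 2, 3}, linked by constraints C1, C2, C3, C4]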
• Scope(Cx): the set of variables involved in Cx
• Arity(Cx): size of scope
Computing NI for non-binary CSPs is not a trivial extension from binary CSPs
NI for non-binary CSPs [2003,2005]
1. Building an nb-DT for each constraint
   – Determines the NI sets of a variable given the constraint
2. Intersecting partitions from nb-DTs
   – Yields NI sets of V (partition of D_V)
3. Processing paths in nb-DTs
   – Gives, for free, the updates necessary for forward checking
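[Figure: constraint network (V {1, 2, 3, 4, 5, 6}; V1, V2, V3, V4; constraints C1 to C4) with two non-binary discrimination trees. nb-DT(V, C1) partitions the domain of V into {1, 2}, {3, 4}, {5, 6}; nb-DT(V, C2) partitions it into {1, 2}, {3, 4}, {5}, {6}. Intersecting the two gives {1, 2}, {3, 4}, {5}, {6}]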
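Steps 1 and 2 can be sketched as follows, assuming the tuples of C1 and C2 as transcribed from the constraint table of the previous slide (the repeated C2 tuple is listed once). Grouping by support set stands in for building the nb-DT; the function names are illustrative.

```python
from collections import defaultdict

# Tuples transcribed from the slide's constraint table.
C1_SCOPE = ("V", "V1", "V2")
C1 = {(1, 1, 3), (1, 3, 3), (2, 1, 3), (2, 3, 3), (3, 1, 1),
      (3, 2, 2), (4, 1, 1), (4, 2, 2), (5, 3, 2), (6, 3, 2)}
C2_SCOPE = ("V", "V3")
C2 = {(1, 3), (2, 3), (3, 2), (4, 2), (6, 1)}
DOMAIN_V = range(1, 7)


def partition_by_constraint(var, scope, tuples, domain):
    """Step 1: values of `var` are equivalent for a constraint when they
    appear with exactly the same combinations of the other variables."""
    idx = scope.index(var)
    support = defaultdict(set)
    for t in tuples:
        support[t[idx]].add(t[:idx] + t[idx + 1:])
    groups = defaultdict(set)
    for val in domain:
        groups[frozenset(support.get(val, ()))].add(val)
    return list(groups.values())


def intersect(p1, p2):
    """Step 2: common refinement of two partitions of the same domain."""
    return [a & b for a in p1 for b in p2 if a & b]


part_c1 = partition_by_constraint("V", C1_SCOPE, C1, DOMAIN_V)
part_c2 = partition_by_constraint("V", C2_SCOPE, C2, DOMAIN_V)
ni_sets = intersect(part_c1, part_c2)
```

On this data the partitions agree with the ones shown for nb-DT(V, C1) and nb-DT(V, C2); value 5 has no support in C2 and therefore ends up alone.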
Robust solutions
• Solution bundle
  – Cartesian product of domain bundles
  – Compact representation
  – Robust solutions
• Dynamic bundling finds larger bundles
Single solution: V1 = d, V2 = e, V3 = a, V4 = c
Static bundling: V1 = d, V2 = {e, f}, V3 = a, V4 = {b, c}
Dynamic bundling: V1 = d, V2 = {d, e, f}, V3 = a, V4 = {b, c}
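To make the "Cartesian product of domain bundles" concrete, the short sketch below expands the dynamic bundle listed above into its individual solutions. The helper name `expand` is illustrative.

```python
from itertools import product

# The dynamic bundle from the slide: one set of values per variable.
dynamic_bundle = {"V1": {"d"}, "V2": {"d", "e", "f"}, "V3": {"a"}, "V4": {"b", "c"}}


def expand(bundle):
    """Enumerate every solution represented by a solution bundle."""
    variables = sorted(bundle)
    value_lists = [sorted(bundle[v]) for v in variables]
    return [dict(zip(variables, combo)) for combo in product(*value_lists)]


expanded = expand(dynamic_bundle)
```

The bundle stores 7 values yet stands for 1 × 3 × 1 × 2 = 6 solutions, including the single solution V1 = d, V2 = e, V3 = a, V4 = c.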
DynBndl: worth the effort?
• Finds larger bundles
• Enables forward checking at no extra cost
• Does not cost more than BT or static bundling
  – Cost model:
    • # nodes visited by search
    • # constraint checks made
  – Theoretical guarantee holds
    • for finding all solutions
    • under the same variable ordering
• Finding the first solution?
  – Experiments uncover an unexpected benefit
Bundling of no-goods is particularly effective
[Figure: a no-good bundle and a solution bundle over V, V1, V2, V3, V4, shown next to the constraint network (V {1, 2, 3, 4, 5, 6}; V1, V2, V3, V4 with domain {1, 2, 3}; constraints C1 to C4)]
Experimental set-up
• CSP parameters:
  – n: number of variables {20, 30}
  – a: domain size {10, 15}
  – t: constraint tightness [25%, 75%]
  – CR: constraint ratio (arity: 2, 3, 4)
  – 1,000 instances per tightness value
• Phase transition
• Performance measures
  – Nodes visited (NV)
  – Constraint checks (CC)
  – CPU time
  – First Bundle Size (FBS)
[Figure: phase transition. Axes: cost of solving versus order parameter; the critical value separates mostly solvable instances from mostly unsolvable instances]
Empirical evaluations
• DynBndl versus FC (BT + forward checking)
• Randomly generated problems, Model B
• Experiments
  – Effect of varying tightness
  – In the phase-transition region
    • Effect of varying domain size
    • Effect of varying constraint ratio (CR)
• ANOVA to statistically compare performance of DynBndl and FC with varying t
• t-distribution for confidence intervals
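For readers unfamiliar with Model B, the sketch below generates a random binary CSP in that style: a fixed number of constraints is drawn uniformly, and each one forbids a fixed number of value pairs determined by the tightness t. It is an illustration only. The experiments above also use constraints of arity 3 and 4, which this sketch does not cover, and the density parameter `p1` and the function name `model_b` are assumptions, not names from the slides.

```python
import random
from itertools import combinations, product


def model_b(n, a, p1, t, seed=0):
    """Random binary CSP, Model B style.

    n: number of variables, a: domain size,
    p1: fraction of variable pairs that are constrained (assumed parameter),
    t: tightness, the fraction of value pairs each constraint forbids.
    Returns {(i, j): set of forbidden (value_i, value_j) pairs}.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    n_constraints = round(p1 * len(pairs))       # exact count, not a probability
    n_nogoods = round(t * a * a)                 # exact count per constraint
    all_tuples = list(product(range(a), repeat=2))
    return {
        scope: set(rng.sample(all_tuples, n_nogoods))
        for scope in rng.sample(pairs, n_constraints)
    }


instance = model_b(n=20, a=10, p1=0.2, t=0.35)
```

Fixing the counts exactly, rather than drawing each constraint or tuple independently, keeps every instance at the same density and tightness, which is what makes averaging over 1,000 instances per tightness value meaningful.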
Analysis: Varying tightness
• Low tightness
  – Large FBS
    • 33 at t=0.35
    • 2254 (Dataset #13, t=0.35)
  – Small additional cost
• Phase transition
  – Multiple solutions present
  – Maximum no-good bundling causes max savings in CPU time, NV, & CC
• High tightness
  – Problems mostly unsolvable
  – Overhead of bundling minimal
[Figure: CPU time [sec] and #NV (hundreds) versus tightness (0.325 to 0.6) for DynBndl and FC; n=20, a=15, CR=CR3]
t      FBS
0.350  33.44
0.400  10.91
0.425  7.13
0.437  6.38
0.450  5.62
0.462  2.37
0.475  0.66
0.500  0.03
0.550  0.00
Analysis: Varying domain size
• Increasing a in phase-transition
  – FBS increases: more chances for symmetry
  – CPU time decreases: more bundling of no-goods
Increasing a (n=30)
CR    Improv (CPU) %    FBS
      a=10    a=15      a=10    a=15
CR1   33.3    34.3      5.5     11.9
CR2   28.6    33.0      5.0     5.5
CR3   29.8    31.7      3.6     5.0
CR4   28.4    31.6      1.2     1.4
Because the benefits of DynBndl increase with increasing domain size, DynBndl is particularly interesting for database applications where large domains are typical
Outline
• Definitions
• Bundling in CSPs
• Bundling for join query computation
  – Idea
  – A CSP model for the query join
  – Sorting-based bundling algorithm
  – Dynamic-bundling-based join algorithm
• Conclusions
The join query
Join query:
SELECT R2.A,R2.B,R2.C
FROM R1,R2
WHERE R1.A=R2.A
AND R1.B=R2.B
AND R1.C=R2.C
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
3 16 30
4 10 25
5 12 23
5 13 23
5 14 23
6 13 27
6 14 27
7 14 28
R2
A B C
1 12 23
1 13 23
1 14 23
1 15 23
2 10 25
3 17 20
4 10 25
5 12 23
5 13 23
5 14 23
5 15 23
6 13 27
6 14 27
Result: 10 tuples in 3 nested tuples
R1 ⋈ R2 (compacted)
A B C
{1, 5} {12, 13, 14} {23}
{2, 4} {10} {25}
{6} {13, 14} {27}
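The compacted result can be checked with a short sketch. Because the query equates all three attributes, the join is the set of tuples common to R1 and R2. The `compact` function below is a post-hoc compaction written for illustration (bundle the values of the first attribute whose remaining sub-tuples coincide, then recurse); it is not the dynamic-bundling join algorithm presented later.

```python
from collections import defaultdict

# Relations transcribed from the slide.
R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25), (3, 16, 30),
      (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23), (6, 13, 27),
      (6, 14, 27), (7, 14, 28)]
R2 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (1, 15, 23), (2, 10, 25),
      (3, 17, 20), (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23),
      (5, 15, 23), (6, 13, 27), (6, 14, 27)]

# Equi-join on A, B and C: tuples present in both relations.
join = set(R1) & set(R2)


def compact(rows):
    """Return nested tuples (one frozenset per attribute) covering `rows`."""
    if len(next(iter(rows))) == 1:
        return [(frozenset(r[0] for r in rows),)]
    tails_of = defaultdict(set)
    for r in rows:
        tails_of[r[0]].add(r[1:])
    groups = defaultdict(set)            # identical tail sets -> bundled values
    for val, tails in tails_of.items():
        groups[frozenset(tails)].add(val)
    return [(frozenset(vals),) + nested
            for tails, vals in groups.items()
            for nested in compact(tails)]


nested_tuples = compact(join)
```

On this data the sketch reproduces the figures stated above: 10 joined tuples represented by 3 nested tuples.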
Databases & CSPs
DB terminology               CSP terminology
Table, relation              Constraint (relational constraint)
Join condition               Constraint (join-condition constraint)
Attribute                    CSP variable
Tuple in a table             Tuple in a constraint or allowed by one
Computing a join sequence    Finding all solutions to a CSP
• Same computational problems, different cost models
  – Databases: minimize # I/O operations
  – CSP community: # CPU operations
• Challenges for using CSP techniques in DB
  – Use of lighter data structures to minimize memory usage
  – Fit in the iterator model of database engines
Modeling join query as a CSP
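• Attributes of relations → CSP variables
• Attribute values → variable domains
• Relations → relational constraints
• Join conditions → join-condition constraints
SELECT R1.A,R1.B,R1.C
FROM R1,R2
WHERE R1.A=R2.A
AND R1.B=R2.B
AND R1.C=R2.C
[Figure: CSP with variables R1.A, R1.B, R1.C under relational constraint R1 and R2.A, R2.B, R2.C under relational constraint R2, linked by join-condition constraints]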
Join operator
• R1 ⋈ (x θ y) R2
  – Most expensive operator in terms of I/O
  – θ is “=” → Equi-Join
    • x is same as y → Natural Join
• Join algorithms
  – Nested Loop
  – Sorting-based
    • Sort-Merge, Progressive Merge-Join (PMJ)
    • Partitions relations by sorting, minimizes # scans of relations
  – Hashing-based
Join query
• R1 ⋈ (x θ y) R2
  – Most expensive operator in terms of I/O
  – θ is “=” → Equi-Join
    • x is same as y → Natural Join
• CSP model
  – Attributes of relations → CSP variables
  – Attribute values → variable domains
  – Relations → relational constraints
  – Join conditions → join-condition constraints
SELECT R1.A,R1.B,R1.C
FROM R1,R2
WHERE R1.A=R2.A
AND R1.B=R2.B
AND R1.C=R2.C
[Figure: CSP with variables R1.A, R1.B, R1.C under relational constraint R1 and R2.A, R2.B, R2.C under relational constraint R2, linked by join-condition constraints]
Progressive Merge Join
• PMJ: a sort-merge algorithm [Dittrich et al. 03]
• Two phases
  1. Sorting: sorts sub-sets of relations &
  2. Merging phase: merges sorted sub-sets
• PMJ produces early results
• We use the framework of the PMJ
New join algorithm
• Sorting & merging phases
  – Load sub-sets of relations in memory
  – Compute in-memory join using dynamic bundling
    • Uses sorting-based bundling (shown next)
    • Computes join of in-memory relations using dynamically computed bundles
Sorting-based bundling
• Heuristic for variable ordering: place variables linked by join conditions as close to each other as possible
[Figure: variable ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C over relations R1 and R2]
• Sort relations using above ordering
• Next: compute bundles of the variable ahead in the variable ordering (R1.A)
Computing a bundle of R1.A
[Figure: R1 split into partitions by value of A, with unequal partitions and symmetric partitions marked; resulting bundle {1, 5}]
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
5 12 23
5 13 23
5 14 23
• Partition of a constraint
  – Tuples of the relation having the same value of R1.A
• Compare projected tuples of first partition with those of another partition
• Compare with every other partition to get complete bundle
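The partition comparison can be sketched in a few lines, using the R1 rows shown on this slide. Sorting the relation makes each partition contiguous and puts its projected tuples in a canonical order, so two partitions are symmetric exactly when their projected lists are equal. The function names are illustrative, and the sketch assumes the relation holds no duplicate rows.

```python
from itertools import groupby

# R1 as shown on this slide (attributes A, B, C).
R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23)]


def partitions(relation):
    """Sort the relation, then split it into partitions sharing the same
    value of the first attribute; keep only the projected (B, C) tuples."""
    rows = sorted(relation)
    return {a: [r[1:] for r in grp]
            for a, grp in groupby(rows, key=lambda r: r[0])}


def bundle_of(value, relation):
    """Values of A whose partitions project onto the same (B, C) tuples
    as the partition of `value`."""
    parts = partitions(relation)
    reference = parts[value]
    return {a for a, projected in parts.items() if projected == reference}


bundle_a = bundle_of(1, R1)
```

Here the partitions for A = 1 and A = 5 are symmetric, while the partition for A = 2 is unequal to both, which gives the bundle {1, 5}.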
Finding the valid bundle
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
3 16 30
3 16 24
R2
A B C
1 12 23
1 13 23
1 14 23
1 15 23
2 10 25
3 17 20
[Figure: bundle {1, 5, x} computed in R1 and bundle {1, 5, y, z} computed in R2; common values {1, 5}]
1. Compute a bundle for the attribute
2. Check bundle validity with future constraints
3. If no common value, ‘backtrack’
Assign variable with the surviving values in the bundle
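A minimal sketch of steps 2 and 3, under the simplifying assumption that checking validity amounts to intersecting the bundles computed for the joined attribute in each relation. The symbolic values x, y, z are the placeholders used in the figure; `valid_bundle` is a hypothetical helper name.

```python
def valid_bundle(*bundles):
    """Keep the values common to every bundle; None signals 'backtrack'."""
    common = set.intersection(*map(set, bundles))
    return common or None


# Bundles from the figure; "x", "y", "z" are symbolic placeholder values.
surviving = valid_bundle({1, 5, "x"}, {1, 5, "y", "z"})
no_common = valid_bundle({2, 4}, {6})
```

The surviving values {1, 5} are then assigned to the variable as one bundle; an empty intersection makes the search try another bundle.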
Experiments
• XXL library for implementation & evaluation
• Data sets
  – Random: 2 relations R1, R2 with same schema as example
    • Each relation: 10,000 tuples
    • Memory size: 4,000 tuples
    • Page size: 200 tuples
  – Real-world problem: 3 relations, 4 attributes
• Compaction rate achieved
  – Random problem: 1.48
    • Savings even with (very) preliminary implementation
  – Real-world problem: 2.26 (69 tuples in 32 nested tuples)
Outline
• Definitions
• Bundling in CSPs
• Bundling for join query computation
• Conclusions
  – Summary
  – Future research
Summary
• Dynamic bundling in finite CSPs
  – Binary and non-binary constraints
  – Produces multiple robust solutions
  – Significantly reduces cost of search at phase transition
• Application to join-query computation
Constraint Processing inspires innovative solutions to fundamental difficult problems in Databases
Future research
• CSPs
  – Only scratched the surface:
    • interchangeability + decomposition [ECAI 1996]
    • partial interchangeability [AAAI 1998]
    • tractable structures
• Databases
  – Investigate benefit of bundling
    • Sampling operator
    • Main-memory databases
    • Automatic categorization of query results
• Constraint databases
  – Design bundling mechanisms for gap & linear constraints over intervals (spatial databases)