ICDE 2011 1
Generating Test Data for Killing SQL Mutants:
A Constraint Based Approach
CSE Department, IIT Bombay
Shetal Shah, S. Sudarshan, Suhas Kajbaje, Sandeep Patidar, Bhanu Pratap Gupta, Devang Vira
Testing SQL Queries: A Challenge
Complex SQL queries are hard to get right
Question: How do we check whether an SQL query is correct?
  Formal verification is not applicable, since we do not have a specification separate from the implementation
  State-of-the-art solution: manually generate test databases and check whether the query gives the intended result
  Often misses errors
Example
Find the number of highly paid instructors in each department
Two alternative query trees, differing only in the join type:
  Alternative 1 operators: dept ⋈ instructor; σ sal>100000; deptname G count(instrID)
    Problem: does not output departments with no highly paid instructors
  Alternative 2 operators: dept ⟕ instructor; σ sal>100000; deptname G count(instrID)
The test dataset should be able to distinguish these two alternatives
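The difference between the two alternatives can be checked on a small dataset. The following is a minimal sketch using Python's sqlite3 module. The table layout, column names, and sample rows are assumptions made for illustration, and the salary condition is placed in the join condition so that the outer join keeps departments with no highly paid instructor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptname TEXT PRIMARY KEY);
    CREATE TABLE instructor (instrID INTEGER PRIMARY KEY, deptname TEXT, sal INTEGER);
    -- Music has an instructor, but none paid more than 100000.
    INSERT INTO dept VALUES ('CS'), ('Music');
    INSERT INTO instructor VALUES (1, 'CS', 120000), (2, 'Music', 40000);
""")

INNER = """
    SELECT d.deptname, COUNT(i.instrID)
    FROM dept d JOIN instructor i
      ON d.deptname = i.deptname AND i.sal > 100000
    GROUP BY d.deptname
"""
# The alternative differs only in the join type.
OUTER = INNER.replace("JOIN", "LEFT OUTER JOIN")

inner_rows = sorted(conn.execute(INNER).fetchall())
outer_rows = sorted(conn.execute(OUTER).fetchall())

# The dataset distinguishes the two alternatives if the results differ.
print(inner_rows)
print(outer_rows)
print("distinguished:", inner_rows != outer_rows)
```

A dataset in which every department has a highly paid instructor would give the same result for both queries, which is why such a dataset cannot reveal the error.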
Generating Test Data: Prior Work
Automated test data generation
  Initial work was based on database constraints alone
  Later work: DB constraints + SQL query
    Agenda, Reverse Query Processing (RQP), QAGen
  Our earlier work in ICDE 2010 (poster) on killing SQL mutants
  More recent (independent) work by de la Riva et al. [AST10] considers test data generation for killing mutants
    But limited space of mutations, no guarantees of completeness
None of the above guarantee anything about detecting errors in SQL queries
Question: How do you model SQL errors?
Answer: Query mutation
Query Mutations
Mutations: model common programming errors
  Join used instead of outer join (or vice versa)
  Join/selection condition errors
    < vs. <=, missing or extra condition
  Wrong aggregate (min vs. max)
Mutant: syntactic variation of the given query Q
  A mutant may be the intended query
A dataset D kills a mutant M of query Q if the outputs of Q and M differ on D
For a mutation space, a dataset suite is complete if all non-equivalent mutants are killed
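The definitions of "kills" and "complete" can be written down directly. This is a minimal sketch in which a query is modelled as a Python function from a dataset to a list of result rows. The function names `kills` and `is_complete`, the sample selection on salaries, and the multiset comparison are assumptions for illustration, and the caller is assumed to pass only non-equivalent mutants.

```python
from collections import Counter

def kills(dataset, query, mutant):
    """A dataset kills a mutant if query and mutant outputs differ on it."""
    # Results are compared as multisets, since SQL results may hold duplicates.
    return Counter(query(dataset)) != Counter(mutant(dataset))

def is_complete(suite, query, mutants):
    """A suite is complete if every (non-equivalent) mutant is killed by some dataset."""
    return all(any(kills(d, query, m) for d in suite) for m in mutants)

# Hypothetical query: select salaries greater than 100000.
query = lambda rows: [sal for sal in rows if sal > 100000]
mutant_ge = lambda rows: [sal for sal in rows if sal >= 100000]
mutant_lt = lambda rows: [sal for sal in rows if sal < 100000]

suite = [[150000], [100000]]
print(kills([150000], query, mutant_ge))   # both return the row, so not killed
print(kills([100000], query, mutant_ge))   # only the mutant returns the row
print(is_complete(suite, query, [mutant_ge, mutant_lt]))
```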
Mutation Testing of Queries
Mutation testing can be used to check the coverage of a test suite
  Given a test suite, generate mutants in a pre-defined mutation space and evaluate them on the test suite
Our goal is different:
  Generate datasets for the given query, based on possible mutations
  Each dataset, and the query result on it, is shown to a human, who verifies that the result is what is expected on that dataset
  We do not need to actually generate and execute mutants
Query and Mutation Space
Query: single-block SQL query with unconstrained aggregation at the top
Mutation space:
  Join type mutations: change of any of inner join, left outer join, right outer join, full outer join to another
    Across all possible join orders (unlike earlier work)
  Comparison operator mutations: an occurrence of one of (=, <, >, <=, >=, <>) in the WHERE clause replaced by another
  Unconstrained aggregation mutations: MIN ↔ MAX, SUM ↔ SUM(DISTINCT), AVG ↔ AVG(DISTINCT), COUNT ↔ COUNT(DISTINCT)
Killing Join Mutants: Basic Idea
Schema: r(A), s(B)
To kill this mutant: ensure that for some r tuple there is no matching s tuple
Generated test case: r(A) = {(1)}; s(B) = {}
Basic idea, version 1 [ICDE 2010]:
  Run the query on the given database, and from the result extract matching tuples for r and s
  Delete the s tuple to ensure there is no matching tuple for r
  Limitation: foreign keys, repeated relations
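The generated test case r(A) = {(1)}, s(B) = {} can be checked by hand-evaluating both joins. The sketch below assumes that the join condition is r.A = s.B and that the mutant replaces an inner join with a left outer join; neither is stated explicitly in the text above, and the function names are illustrative.

```python
def inner_join(r, s):
    """r JOIN s ON r.A = s.B, for single-column relations."""
    return [(a, b) for (a,) in r for (b,) in s if a == b]

def left_outer_join(r, s):
    """r LEFT OUTER JOIN s ON r.A = s.B; unmatched r tuples are padded with None."""
    out = []
    for (a,) in r:
        matches = [(a, b) for (b,) in s if a == b]
        out.extend(matches or [(a, None)])
    return out

r = [(1,)]
s = []   # no s tuple matches the r tuple

print(inner_join(r, s))        # empty result
print(left_outer_join(r, s))   # the r tuple survives, padded with None
```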
Need to Ensure Difference in Final Query Result
Schema: r(A,B), s(C,D), t(E); extra join above the mutated node
To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple
Generated test case: r(A,B) = {(1,2)}; s(C,D) = {}; t(E) = {(2)}
Our approach: generate not-matching constraints of the form
  ∃ r1 ∈ r, t1 ∈ t s.t. (r1.B = t1.E ∧ ∄ si ∈ s s.t. (si.C = r1.A))
and solve them using a solver
Informally, we “nullify” s to kill the above mutation
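To see why the matching t tuple is needed, the test case can be evaluated against both query trees. This sketch assumes the join conditions r.A = s.C and r.B = t.E (read off the not-matching constraint) and an inner join mutated to a left outer join; the helper names are hypothetical.

```python
def join_rs(r, s, outer):
    """Join r(A,B) with s(C,D) on r.A = s.C; left outer join if outer is True."""
    out = []
    for (a, b) in r:
        matches = [(a, b, c, d) for (c, d) in s if c == a]
        if not matches and outer:
            matches = [(a, b, None, None)]
        out.extend(matches)
    return out

def join_t(rows, t):
    """Inner join the intermediate result with t(E) on r.B = t.E."""
    return [row + (e,) for row in rows for (e,) in t if row[1] == e]

def satisfies_not_matching(r, s, t):
    """Some r tuple has a matching t tuple but no matching s tuple."""
    return any(
        b == e and not any(c == a for (c, _) in s)
        for (a, b) in r for (e,) in t
    )

r, s, t = [(1, 2)], [], [(2,)]
original = join_t(join_rs(r, s, outer=False), t)
mutant = join_t(join_rs(r, s, outer=True), t)
print(original, mutant, satisfies_not_matching(r, s, t))

# Without a matching t tuple the padded row is dropped by the upper join,
# so the difference never reaches the final result.
t_bad = [(3,)]
print(join_t(join_rs(r, s, outer=True), t_bad))
```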
Working around Foreign Key Constraints
teaches ⋈ instructor is equivalent to teaches ⟕ instructor if there is a foreign key from teaches.ID to instructor.ID
BUT: teaches ⋈ σ dept=CS(instructor) is not equivalent to teaches ⟕ σ dept=CS(instructor)
  Key idea: have a teaches tuple with an instructor not from CS
Selections and joins can be used to kill mutations
  We show that this comes for free when we “nullify” relations to kill other join mutations, or generate data to kill selection mutations
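The effect of the selection can be illustrated with sqlite3. This is a sketch under assumed table definitions and sample rows (the `course` column and the Physics instructor are invented for the example); it compares inner join and left outer join with and without the selection on instructor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE instructor (ID INTEGER PRIMARY KEY, dept TEXT);
    CREATE TABLE teaches (
        ID INTEGER REFERENCES instructor(ID),  -- foreign key teaches.ID -> instructor.ID
        course TEXT
    );
    -- Instructor 2 is not from CS but does teach a course.
    INSERT INTO instructor VALUES (1, 'CS'), (2, 'Physics');
    INSERT INTO teaches VALUES (1, 'c1'), (2, 'c2');
""")

def rows(join, right):
    sql = (f"SELECT t.ID, t.course, i.dept "
           f"FROM teaches t {join} {right} i ON t.ID = i.ID")
    return sorted(conn.execute(sql).fetchall())

ALL = "instructor"
CS_ONLY = "(SELECT * FROM instructor WHERE dept = 'CS')"

plain_inner = rows("JOIN", ALL)
plain_outer = rows("LEFT OUTER JOIN", ALL)
sel_inner = rows("JOIN", CS_ONLY)
sel_outer = rows("LEFT OUTER JOIN", CS_ONLY)

print(plain_inner == plain_outer)  # the foreign key guarantees a match for every teaches tuple
print(sel_inner != sel_outer)      # the non-CS instructor has no match after the selection
```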
Join Orders and Equivalent Mutations
Schema: r(A,B), s(C,D), t(E); join tree (r ⋈ s) ⋈ t; mutation: r ⋈ s to r ⟖ s
  Any result with r.B being null will be removed by the join with t
  Similarly, equivalence can result due to selections
But the same mutation of the join with s would be non-equivalent with join order (r ⋈ t) ⋈ s
An SQL query may not specify join orders; many join trees are possible; the datasets should kill mutants across all join orders
Note: de la Riva et al. [AST10] do not consider alternative join orders
Equivalent Trees and Equivalence Classes of Attributes
Whether query conditions are written as
  A.x = B.x AND B.x = C.x
or as
  A.x = B.x AND A.x = C.x
should not affect the set of mutants generated
Solution: equivalence classes of attributes
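One common way to build such equivalence classes is a union-find structure over the attributes that appear in equality conditions. The sketch below is an illustration of that general technique, not necessarily how the authors implement it; the function name and the string encoding of attributes are assumptions.

```python
def equivalence_classes(equalities):
    """Group attributes connected by equality conditions (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equalities:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for attr in list(parent):
        classes.setdefault(find(attr), set()).add(attr)
    return {frozenset(c) for c in classes.values()}

# The two ways of writing the conditions give the same single class.
form1 = equivalence_classes([("A.x", "B.x"), ("B.x", "C.x")])
form2 = equivalence_classes([("A.x", "B.x"), ("A.x", "C.x")])
print(form1 == form2)
```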
Data Generation Algorithm
Assumptions
  Only PK/FK constraints, single-block SQL queries, only simple arithmetic expressions, predicates are purely conjunctive (and a few more in the paper)
Algorithm overview
  For each dataset, generate constraints:
    1. DB constraints: primary key, foreign key constraints
    2. Query constraints: join conditions, selection conditions, other than conditions inconsistent with the not-matching constraints (see below)
    3. Constraints to kill mutations: not-matching constraints derived from join/selection conditions
       One constraint per dataset
  Generate test data
    1. Using a constraint solver (currently CVC3) on the above constraints
Main Procedure: generateDataSet(query q)
preprocess query tree
  build equivalence classes
  foreign key closure
generateDataSetForOriginalQuery()
  A constraint set with only DB and query constraints, such that the result set for q is non-empty
killEquivalenceClasses()
killOtherPredicates()
killComparisonOperators()
killAggregates()
Killing Join Mutants: Equijoin (killEquivalenceClasses)
Query: r ⋈ s ⋈ t where r.A = s.B and s.B = t.C
r.A, s.B, t.C are in one equivalence class
Generate one dataset for each element of each equivalence class
Three datasets are generated, with not-matching mutation constraints:
  D1: ∃ r1 ∈ r, ∃ s1 ∈ s (r1.A = s1.B ∧ ∄ ti ∈ t (ti.C = r1.A))
  D2: ∃ s1 ∈ s, ∃ t1 ∈ t (s1.B = t1.C ∧ ∄ ri ∈ r (ri.A = s1.B))
  D3: ∃ r1 ∈ r, ∃ t1 ∈ t (r1.A = t1.C ∧ ∄ si ∈ s (si.B = t1.C))
But if t.C is an FK referencing s.B, instead generate
  D3’: ∃ r1 ∈ r (∄ ti ∈ t, si ∈ s (si.B = r1.A ∧ ti.C = r1.A))
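For the case without foreign keys, the three datasets can be built by hand instead of by a solver. The sketch below is only an illustration of the "one dataset per element" idea: each dataset lists the values of the join attribute per relation, the nullified attribute gets no values, and foreign keys are ignored. The function names and the choice of the value 1 are assumptions.

```python
def generate_datasets(equivalence_class, value=1):
    """One dataset per element: that element is nullified, the others share a value."""
    datasets = {}
    for nullified in equivalence_class:
        datasets[nullified] = {
            attr: [] if attr == nullified else [value]
            for attr in equivalence_class
        }
    return datasets

def satisfies_not_matching(dataset, nullified):
    """The other attributes agree on some value that the nullified attribute lacks."""
    others = [set(vals) for attr, vals in dataset.items() if attr != nullified]
    shared = set.intersection(*others)
    return any(v not in dataset[nullified] for v in shared)

ec = ["r.A", "s.B", "t.C"]
datasets = generate_datasets(ec)
for nullified, data in datasets.items():
    print(nullified, data, satisfies_not_matching(data, nullified))
```

Here the dataset that nullifies t.C corresponds to D1, the one that nullifies r.A to D2, and the one that nullifies s.B to D3.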
Comparison Operation Mutations
Example of comparison operation mutations: A < 5 vs. A <= 5 vs. A > 5 vs. A >= 5 vs. A = 5 vs. A <> 5
Idea: generate three datasets (leaving the rest of the query unchanged), one each for:
  A < 5
  A = 5
  A > 5
These datasets will kill all of the above mutations
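The claim that three datasets suffice can be checked exhaustively: for every pair of distinct operators, at least one of the three values must make the two predicates disagree. The sketch below uses the integer witnesses 4, 5, and 6 as an assumed choice of values for A < 5, A = 5, and A > 5, and assumes the rest of the query lets the tuple through.

```python
import operator
from itertools import combinations

OPS = {
    "<": operator.lt, "<=": operator.le, ">": operator.gt,
    ">=": operator.ge, "=": operator.eq, "<>": operator.ne,
}
WITNESSES = [4, 5, 6]  # one value each for A < 5, A = 5, A > 5

def distinguishing_value(op1, op2, constant=5):
    """Return a witness on which 'A op1 5' and 'A op2 5' disagree, or None."""
    for a in WITNESSES:
        if OPS[op1](a, constant) != OPS[op2](a, constant):
            return a
    return None

pairs = list(combinations(OPS, 2))
for op1, op2 in pairs:
    print(f"A {op1} 5 vs. A {op2} 5: distinguished by A = {distinguishing_value(op1, op2)}")
```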
Aggregation Operation Mutations
Aggregation operations
  count(A) ↔ count(distinct A); sum(A) ↔ sum(distinct A)
  avg(A) ↔ avg(distinct A); min(A) ↔ max(A)
E.g.: select min(salary) from instructor group by dept_name;
Dataset generated: [CS, Srinivasan, 80000], [CS, Wu, 80000], [CS, Alex, 65000] will kill each of the above pairs of mutations
  One group with two values equal (and <> 0), third different
Issues:
  Database/query constraints forcing A to be unique
  Database/query constraints forcing G to be unique
  Carefully crafted set of constraints, which are relaxed to handle such cases
  Details in paper
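The generated group can be checked against each pair of aggregates with a few lines of arithmetic. This is a small sketch on the three salary values from the example; the function name is an assumption. It also shows why the repeated value must be non-zero: a repeated 0 does not change the sum.

```python
def killed_pairs(values):
    """For each aggregate pair, report whether the two aggregates differ on values."""
    distinct = set(values)
    results = {
        "count": (len(values), len(distinct)),
        "sum": (sum(values), sum(distinct)),
        "avg": (sum(values) / len(values), sum(distinct) / len(distinct)),
        "min/max": (min(values), max(values)),
    }
    return {name: a != b for name, (a, b) in results.items()}

salaries = [80000, 80000, 65000]   # one CS group: two equal values, third different
print(killed_pairs(salaries))

# If the repeated value were 0, sum and sum(distinct) would agree.
print(killed_pairs([0, 0, 65000]))
```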
On Completeness and Complexity
With respect to query size:
  Number of datasets generated: linear
  Number of mutants: exponential
Theorem: The problem of generating data to kill a mutant is NP-hard (reduction from query containment); polynomial in special cases
Theorem: For the class of queries, and the space of join-type and selection mutations considered, the suite of datasets generated by our algorithm is complete, i.e., the datasets kill all non-equivalent mutations of a given query
Also complete for a restricted class of aggregation mutations with aggregation as the top operation of the tree, under some restrictions on joins in the input
Performance Results
University database schema from Database System Concepts, 6th Ed., consisting of 7 relations
Further optimizations: unfolding of bounded first-order quantified constraints
  E.g., “for all i in 1 to 4, P(i)” is unfolded as P(1) ∧ P(2) ∧ P(3) ∧ P(4)
Also tested with the addition of input database constraints
  Forcing output tuples to come from the input database
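Unfolding a bounded quantifier is a purely syntactic expansion. A minimal sketch, assuming constraints are represented as strings and the predicate is given as a Python function from an index to its text (both are assumptions for illustration, not the solver's input format):

```python
def unfold_forall(lo, hi, predicate):
    """Expand 'for all i in lo..hi, P(i)' into an explicit conjunction."""
    return " ∧ ".join(predicate(i) for i in range(lo, hi + 1))

def unfold_exists(lo, hi, predicate):
    """Expand 'exists i in lo..hi, P(i)' into an explicit disjunction."""
    return " ∨ ".join(predicate(i) for i in range(lo, hi + 1))

unfolded = unfold_forall(1, 4, lambda i: f"P({i})")
print(unfolded)
```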
Results for inner join queries
# datasets generated is small; # mutants (killed) grows exponentially
Unfolding of constraints gives substantial benefits
Increasing # foreign keys reduces the total time taken
Results are similar for queries with left/right outer joins
Results for queries with selections, aggregations
Unfolding reduces the time taken, especially when the number of tuples is large (aggregations with joins)
(Not shown above) using the input database increases the time taken, but not by much
Conclusions and Future Work
Presented a principled approach to data generation for testing queries
  And showed it works efficiently for a large class of queries
We’ve only made first steps; more work to be done:
  Constrained aggregations and sub-queries
  Other mutations: e.g., missing/extra join and selection conditions
  Multiple queries
  Query and form parameters
Acknowledgements
  Bhupesh Chawda for discussions/implementation help
  Microsoft Research India for their generous travel support
Thank You
Questions
Related Work
Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05] defined a class of SQL query mutations
  Shortcoming: do not address test data generation
More recently (and independent of our work), de la Riva et al. [AST10] address data generation using constraints, with the Alloy solver
  Do not consider alternative join orders; no completeness results; limitations on constraints
Dataset for Original Query
Array of tuples of constraint variables, per relation
  One or more constraint tuples from the array for each occurrence of a relation
  Plus occurrences to ensure f.k. constraints
Equality conditions between variables based on equijoins
Other selection and join conditions become constraints
Constraints for primary and foreign keys
  f.k. from R.A to S.B: FORALL i EXISTS j (O_R[i].A = O_S[j].B);
  p.k. on R.A: FORALL i FORALL j (O_R[i].A = O_R[j].A) => “all other attrs equal”
  Since the range is over a finite domain, p.k. and f.k. constraints can be unfolded
See sample set of constraints in file cvc3_0.cvc
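Because the tuple arrays are finite, the quantified key constraints can be read as nested loops over array positions. The sketch below checks them on concrete values rather than generating solver constraints; tuples are modelled as dictionaries, and the function names and sample data are assumptions.

```python
def fk_holds(R, S, a, b):
    """FORALL i EXISTS j: R[i].a = S[j].b"""
    return all(any(r[a] == s[b] for s in S) for r in R)

def pk_holds(R, key):
    """FORALL i FORALL j: R[i].key = R[j].key => all other attributes equal."""
    return all(ri == rj for ri in R for rj in R if ri[key] == rj[key])

O_S = [{"B": 1, "D": "x"}, {"B": 2, "D": "y"}]
O_R = [{"A": 1, "C": 10}, {"A": 2, "C": 20}]

print(fk_holds(O_R, O_S, "A", "B"))                       # every R.A value appears in S.B
print(pk_holds(O_R, "A"))                                  # no two distinct tuples share A
print(pk_holds(O_R + [{"A": 1, "C": 99}], "A"))            # same key, different C
```

Note that, as on the slide, the primary key constraint allows two array positions to hold identical tuples; they simply represent the same row.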
Killing Join Mutants: Equijoin
killEquivalenceClasses()
  for each equivalence class ec do
    Let allRelations := set of all <rel, attr> pairs in ec
    for each element e in allRelations do
      conds := empty set
      Let e := R.a
      S := (set of elements in ec which are foreign keys referencing R.a directly or indirectly) UNION R.a
      P := ec - S
      if P.isEmpty() then
        continue
      else
        … main code for generating constraints (see next slide)
Killing Join Mutants: Equijoins
      conds.add(generateEqConds(P))
      conds.add(“NOT EXISTS i: R[i].a = ” + cvcMap(P[0]))
      for all other equivalence classes oe do
        conds.add(generateEqConds(oe))
      for each other predicate p do
        conds.add(cvcMap(p))
      conds.add(genDBConstraints()) /* P.K. and F.K. */
      callSolver(conds)
      if solution exists then
        create a dataset from solver output
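The computation of S and P in killEquivalenceClasses() amounts to a transitive closure over foreign key references within the equivalence class. A minimal sketch, assuming foreign keys are given as a mapping from a referencing attribute to the attribute it references (the function name and this representation are assumptions):

```python
def split_class(ec, fk_refs, target):
    """Return (S, P): S is target plus every attribute in ec that references it
    directly or indirectly via foreign keys; P is the rest of ec."""
    S = {target}
    changed = True
    while changed:
        changed = False
        for attr in ec:
            if attr not in S and fk_refs.get(attr) in S:
                S.add(attr)
                changed = True
    return S, set(ec) - S

ec = {"r.A", "s.B", "t.C"}
fk_refs = {"t.C": "s.B"}   # t.C is a foreign key referencing s.B

# Nullifying s.B forces t.C to be nullified too; only r.A keeps a tuple.
print(split_class(ec, fk_refs, "s.B"))
# Nullifying r.A affects nothing else.
print(split_class(ec, fk_refs, "r.A"))
```

When P comes out empty, there is nothing left to join against, which is the case the pseudocode skips with `continue`.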
Killing Other Predicates
Create a separate dataset for each attribute in the predicate
E.g., for join condition B.x = C.x + 10:
  Dataset 1 (nullifying B.x):
    ASSERT NOT EXISTS (i : B_INT) : (B[i].x = C[1].x + 10);
  Dataset 2 (nullifying C.x):
    ASSERT NOT EXISTS (i : C_INT) : (B[1].x = C[i].x + 10);