23
Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo Graph Query Reformulation with Diversity Davide Mottin, University of Trento Francesco Bonchi, Yahoo Labs - Francesco Gullo, Yahoo Labs

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Embed Size (px)

Citation preview

Page 1: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo1Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo

Graph Query Reformulation with DiversityDavide Mottin, University of TrentoFrancesco Bonchi, Yahoo Labs - Francesco Gullo, Yahoo Labs

Page 2: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo2

Issues with Pattern SearchO

OH OSQuery

510 matches

• Too many matches• Results are not

grouped

Page 3: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo3

Solution: Discovering Specializations

O

OH OS 510 matches

O

OH

OS

CH3

O

OH OS

SH

O

OH OS

H3C

O

OH OS

CH3

O

OH

OS

CH3HS

448 matches 46 matches 114 matches

382 matches 46 matches

Specializations

Page 4: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo4

Applications

• Finding groups of molecules having a particular reagent

• Analyze a set of proteins to find diseases• Workflow optimization• Anomaly detection in a network• 3D shape search with similar properties

Page 5: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo5

Dealing with Specializations in Web and Relational Data

• Faceted Search• present aspects of the results [Roy08]

• Query reformulation• Modify some of the query conditions

• In structured databases [Mishra09]• In web search [Dang10]

First Study of Problem on GRAPHS

Page 6: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo6

Graph Query Reformulation

Results

Query

Specializations:query supergraphs

Exponential numberof specializations

Page 7: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo7

Challenges

• The number of reformulation is exponential• Quantify the interestingness of a reformulation• Finding query specializations is NP-complete

Page 8: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo8

A Naïve Approach: k-most frequent super-patterns

Query

480 matches

450 matches

100 matches

SuperPatterns

30 matches420 matches

Until k patterns are found:- Retrieve the most frequent super-

patternFrequent ≠ Interesting

!

Page 9: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo9

Our Approach

Graph Query Reformulation with Diversity• Finds k meaningful specializations efficiently

Page 10: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo10

Finding Meaningful Specializations

Results

Query

Coverage Diversity

Find k meaningful specializations:1. Span all the results

2. Present different aspects of the results

?

Page 11: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo11

Diversity Matters

Results

Query

Objective function f(Q)

λ = 1• Non optimal: f({Q1’,Q2’}) = 7

• Optimal: f({Q3’,Q4’}) = 8

Page 12: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo12

Problem

Graph Query Reformulation with Diversity

12

Theorem (NP-hardness)The problem reduces to MAX-SUM Diversification Problem, so it is NP-hard

Page 13: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo13

Solution: Greedy Algorithm

13 Greedy

While k-specializations are not found

1. Find the specialization leading to the maximum increment of the objective function (marginal gain)

2. Add the specialization to the results

TheoremThe algorithm is a ½-approximation

Finding the maximum gain is #P-complete

[Valiant79]

Solution

Fast_MMPG: Branch and bound algorithm to efficiently find the specialization with the maximum marginal gain

Page 14: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo14

The multiplicity vector

Results

0 0 0 0 01 1 0 0 02 2 1 1 02 2 2 2 02 3 3 3 1

Output set of specializations

Page 15: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo15

Upper bound on the Marginal gain

LemmaThe marginal gain increases if the multiplicity of the considered item is where |Q| is the number of reformulations in the reformulated set constructed so far.

Upper bound : is the value of the objective function considering only results with multiplicity

Theorem

Page 16: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo16

Upper bound

Results

0 0 0 0 01 2 1 1 1

Output set of specializations

1 2 1 1 1

Page 17: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo17

Until the reformulation with the maximum upper bound and marginal gain is not found1. Expand the reformulation with the max upper

bound2. Prune Reformulations with marginal gain

smaller than the upper bound so far

The Fast_MMPG Algorithm

upper bound

marginal gain

Page 18: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo18

Experimental Setup

18

• Datasets: • AIDS: 10k chemical compounds

• Financial: 17k transaction workflows

• Web: 13k interactions with a recommender system

• Baseline algorithms: • k-freq: returns top-k frequent supergraphs of a query

• LIndex: informative patterns index

• Experiments: • Time and objective function value varying k, query size, λ

• Anecdotal

• Scalability

Page 19: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo19

Time Comparison

Number of specializations1. k-freq runs only slighly faster2. Time increases linearly in k3. Fast_MMPG has real-time

performance

Query size1. Fast_MMPG comparable to k-

freq2. Time decreases with query

size (less reformulations)

number of reformulations (k)

query size

Page 20: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo20

Objective function gain

20

Analysis1. Lambda correctly moves the objective function towards

diversity2. k-freq only captures coverage

Page 21: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo21

Qualitative evaluation

k-freq

Fast_MMPG

C O

O OH

C

O CH3

C

O Fe

C

O NH2

C

O

CH3

C

CH3

O CH3

C

O CH3

C C

O CH2

C C

O NH2

C

O CH2

C NH

Query

Analysis• k-freq finds specialization of the same superquery

• Fast_MMPG returns reformulations with more diversified structures

Page 22: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo22

Conclusions

• First study of the problem of query reformulation in graph databases

• Principled objective function optimizing coverage and diversity

• Algorithmic solutions with quality guarantees and real time responses

Page 23: Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo 1 Graph Query Reformulation with Diversity Davide Mottin, University

Graph Query Reformulation with Diversity – Davide Mottin, Francesco Bonchi, Francesco Gullo23

Questions?

Thank you!