THE COMMUNITY-SEARCH PROBLEM AND HOW TO PLAN A SUCCESSFUL COCKTAIL PARTY

Mauro Sozio and Aristides Gionis
Presented by: Raghu Rangan, Jialiang Bao, Ge Wang


Introduction
• Graphs are one of the most popular data representations
  • They have a wide range of applications
• Communities and social networks modeled as graphs have gained attention
  • People are represented as nodes
  • Connections between people are edges
• This paper focuses on the query-dependent variant of the community-search problem

Planning a Cocktail Party
• Participants should be “close” to the organizers (e.g. a friend of a friend).
• Everybody should know some of the participants.
• The graph should be connected.
• The number of participants should not be too small.
• Not too large either.
• This is difficult.

[Figure: example with Alice, Bob, Charlie, and David]

Community Search Problem
• Need to find the community that a given set of users belongs to.
• Given a graph and a set of nodes, find a densely connected subgraph containing the set of nodes given as input.

Related Work
• Connectivity Subgraphs
  • Work has been done to find a subgraph that connects a set of query nodes
  • Not enough
  • Need to extract the best community that the query nodes define
• Community Detection
  • Finding communities in large graphs and social networks
  • The typical approach optimizes a modularity measure
  • The problem is that most methods consider the static community-detection problem

Related Work
• Team Formation
  • Lappas et al. studied this problem
  • Given a network where nodes are labeled with a set of skills
  • Find a subgraph in which all skills are present and the communication cost is small
  • A variant of this problem is present for cocktail party planning

Problem definition
• Problem 1:
  • Given an undirected (connected) graph G(V, E), a set of query nodes Q, and a goodness function f, find the densest subgraph H = (VH, EH) of G such that:
    1. VH contains Q (all query nodes must be included)
    2. H is connected
    3. f(H) is maximized among all feasible choices of H (the larger the better)

Query node and goodness function?
• What is a query node?
  • Query nodes are the given nodes that the community must contain.
• What is a goodness function?
  • It defines how dense a subgraph is.
  • Average degree
  • Minimum degree

Why not choose the average degree function?
• It leads to unintuitive results
• It is easy to add an unrelated but dense part of the graph
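The effect can be seen on a small made-up graph. The sketch below is only an illustration: the node names, the helper names (`induced`, `min_degree`, `average_degree`) and the graph itself are assumptions, not taken from the paper. A triangle {a, b, c} plays the role of the real community, and an unrelated 5-clique hangs off node c by a single edge.

```python
from itertools import combinations


def induced(adj, nodes):
    """Return the subgraph of adj induced by nodes, as a dict of neighbor sets."""
    keep = set(nodes)
    return {v: adj[v] & keep for v in keep}


def min_degree(adj):
    """Goodness function: smallest degree of any node in the subgraph."""
    return min(len(nbrs) for nbrs in adj.values())


def average_degree(adj):
    """Goodness function: sum of degrees divided by the number of nodes."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def build_example():
    """Triangle a-b-c plus an unrelated 5-clique attached to c by one edge."""
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for u, v in combinations(["a", "b", "c"], 2):
        add_edge(u, v)
    for u, v in combinations(["v1", "v2", "v3", "v4", "v5"], 2):
        add_edge(u, v)
    add_edge("c", "v1")
    return adj


graph = build_example()
triangle = induced(graph, ["a", "b", "c"])

# Average degree rewards swallowing the clique; minimum degree does not.
print(average_degree(triangle), average_degree(graph))
print(min_degree(triangle), min_degree(graph))
```

In this example the average degree rises from 2.0 to 3.5 when the clique is added, even though the clique has nothing to do with a or b, while the minimum degree stays at 2.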

Problem definition
• Problem 2:
  • Given an undirected (connected) graph G(V, E), a set of query nodes Q, a goodness function f, and a number d as a distance bound, find the densest subgraph H = (VH, EH) of G such that:
    • VH contains Q (all query nodes must be included)
    • H is connected
    • DQ(H) <= d
    • f(H) is maximized among all feasible choices of H (the larger the better)
• We have a distance constraint now.
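The slides use DQ(H) without defining it. The sketch below assumes one plausible definition: for each node v, sum the squared hop distances from v to every query node, and take the largest such sum over all nodes of H. That definition is an assumption here and should be checked against the paper; the function names are also hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def query_distance(adj, query):
    """Assumed D_Q(H): max over nodes v of sum over q in Q of dist(v, q)^2.

    Returns infinity if some node cannot reach some query node.
    """
    per_query = [bfs_distances(adj, q) for q in query]
    worst = 0
    for v in adj:
        total = 0
        for dist in per_query:
            if v not in dist:
                return float("inf")
            total += dist[v] ** 2
        worst = max(worst, total)
    return worst


# Path a - b - c - d as a dict of neighbor sets.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(query_distance(path, ["a"]))
print(query_distance(path, ["b"]))
```

Under this assumed definition, a subgraph H is feasible for Problem 2 when `query_distance(H, Q) <= d`.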

Maximizing the minimum degree
• Greedy algorithm:
  • Steps:
    1. Set G0 = G.
    2. Delete the minimum-degree node and all its edges; repeat step 2 until a termination condition holds.
  • Termination condition:
    • Either:
      • At least one of the query nodes in Q has minimum degree
      • The query nodes in Q are no longer connected
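A minimal sketch of these steps follows. The slide does not say what the algorithm returns, so this version assumes it returns the intermediate graph (restricted to the connected component that contains Q) with the largest minimum degree. It is written for clarity, not speed, and all function names are hypothetical.

```python
def from_edges(edges):
    """Build an undirected graph as a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def component_of(adj, start):
    """Return the set of nodes reachable from start."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def greedy_min_degree(adj, query):
    """Peel minimum-degree nodes and keep the best community seen.

    Returns (nodes, min_degree), or (None, -1) if Q is not connected in adj.
    """
    query = set(query)
    anchor = next(iter(query))
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # private copy of G0
    best_nodes, best_score = None, -1
    while True:
        comp = component_of(g, anchor)
        if not query <= comp:
            break  # termination: the query nodes are no longer connected
        score = min(len(g[v]) for v in comp)
        if score > best_score:
            best_nodes, best_score = comp, score
        victim = min(g, key=lambda v: len(g[v]))
        if len(g[victim]) == min(len(g[q]) for q in query):
            break  # termination: a query node has minimum degree
        for w in g.pop(victim):  # delete the node and all its edges
            g[w].discard(victim)
    return best_nodes, best_score


# Triangle a-b-c with a tail c - d - e; query node a.
graph = from_edges([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])
nodes, score = greedy_min_degree(graph, ["a"])
print(sorted(nodes), score)
```

On this toy graph the tail nodes e and d are peeled first, leaving the triangle with minimum degree 2.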

Time complexity?
• Greedy can be implemented in linear time.
• Idea:
  1. Keep a separate list of nodes for each degree d, for d = 1, …, n.
  2. When a node u is removed from G, each neighbor of u with degree d is moved from list d to list d - 1, so the total number of moves is O(m), where m is the number of edges.
  3. The minimum-degree node can be located in O(1) time, so the running time is O(n + m).
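The bucket idea can be sketched as follows. This only covers the degree lists and the removal order; the checks on the query nodes are left out, and the function name `peel_order` is hypothetical. Buckets here start at degree 0 so that isolated nodes are handled too.

```python
def peel_order(adj):
    """Repeatedly remove a minimum-degree node using one bucket per degree.

    Returns a list of (node, degree at removal time).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)

    removed = set()
    order = []
    d = 0  # pointer to the smallest possibly non-empty bucket
    for _ in range(len(adj)):
        while not buckets[d]:
            d += 1
        u = buckets[d].pop()
        removed.add(u)
        order.append((u, d))
        for w in adj[u]:
            if w not in removed:
                # Move neighbor w from bucket deg to bucket deg - 1.
                buckets[degree[w]].discard(w)
                degree[w] -= 1
                buckets[degree[w]].add(w)
        # Removing u lowers the minimum degree by at most one.
        d = max(d - 1, 0)
    return order


# Triangle a-b-c with a tail c - d - e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
order = peel_order(graph)
print(order)
```

Each edge causes at most one bucket move, and the pointer `d` steps back by at most one per removal, which is where the O(n + m) bound comes from.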

Generalization to monotone functions
• The minimum degree function is actually a member of this family of functions.
• But sometimes we want other functions to define the density.

Problem definition
• Problem 3:
  • Given an undirected (connected) graph G(V, E), a set of query nodes Q, a node-monotone function f, and a number d as a distance bound, find the densest subgraph H = (VH, EH) of G such that:
    • VH contains Q (all query nodes must be included)
    • H is connected
    • DQ(H) <= d
    • f(H) is maximized among all feasible choices of H (the larger the better)
• We have a node-monotone function now.

GreedyGen
• Greedy algorithm:
  • Steps:
    1. Set G0 = G.
    2. Delete the node v for which f(G, v) is minimum, together with all its edges; repeat step 2 until a termination condition holds.
  • Termination condition:
    • Either:
      • At least one of the query nodes in Q has the minimum f(G, v)
      • The query nodes in Q are no longer connected

Communities with Size Restriction
• Drawback of the previous algorithms
  • They may return subgraphs with very large size.

Complexity
• Formal definition of the minimum-degree problem with an upper bound on the size
  • An integer k (size constraint)
  • Subgraph H has at most k nodes
• NP-hard

Algorithm
• Two heuristics that can be used to find communities with bounded size
  • Inspired by the Greedy algorithm for maximizing the minimum degree
  • GreedyDist, GreedyFast

Algorithm
• GreedyDist
  • The tighter the distance constraint is, the smaller the communities are

Algorithm
• GreedyDist
  • Invoke GreedyGen
  • If the query nodes are connected but the size constraint is not satisfied, re-execute GreedyGen with a tighter distance constraint
  • Repeat until the size constraint is satisfied or the query nodes are disconnected
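The control flow of GreedyDist can be sketched as a loop around GreedyGen. Several things here are assumptions: GreedyGen is passed in as a callable `greedy_gen(adj, query, d)` that returns a set of nodes, or `None` when the query nodes cannot be connected under bound d; "tighter" is taken to mean lowering d by one; and `ball_stand_in` is only a placeholder used to make the example runnable. It is not the real GreedyGen.

```python
from collections import deque


def greedy_dist(adj, query, d, k, greedy_gen):
    """Re-run greedy_gen with a shrinking distance bound d.

    Stops when the community has at most k nodes or the query nodes become
    disconnected. Returns the last connected community found (or None).
    """
    best = None
    while d >= 0:
        community = greedy_gen(adj, query, d)
        if community is None:
            break  # query nodes are disconnected under this bound
        best = community
        if len(community) <= k:
            break  # size constraint satisfied
        d -= 1  # assumed way of tightening the distance constraint
    return best


def hops_from(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def ball_stand_in(adj, query, d):
    """Placeholder for GreedyGen: nodes within d hops of every query node."""
    keep = set(adj)
    for q in query:
        keep &= {v for v, h in hops_from(adj, q).items() if h <= d}
    if not set(query) <= keep:
        return None
    seen, stack = {query[0]}, [query[0]]
    while stack:  # component of the query inside the kept nodes
        u = stack.pop()
        for w in adj[u]:
            if w in keep and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen if set(query) <= seen else None


# Path a - b - c - d - e; ask for at most 3 nodes around c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(sorted(greedy_dist(path, ["c"], 2, 3, ball_stand_in)))
```

Note that when the query nodes become disconnected before the size bound is met, this sketch falls back to the last connected result, which may still be larger than k.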

Algorithm
• GreedyFast
  • Preprocess: the input graph is restricted to the k’ closest nodes to the query nodes
  • Execute Greedy on the restricted graph
  • The closer a node is to the query nodes, the more related it is to them and the more likely it is to belong to their community
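A sketch of the preprocessing step is shown below. It assumes "closest" means the smallest hop distance to any query node, found with a breadth-first search started from all query nodes at once, and it assumes k’ is at least |Q|. The Greedy step is passed in as a callable, and all names are hypothetical.

```python
from collections import deque


def restrict_to_closest(adj, query, k_prime):
    """Induced subgraph on the k_prime nodes closest (in hops) to the query.

    Ties at the cut-off distance are broken by sorted node order, so node
    labels must be comparable.
    """
    sources = list(dict.fromkeys(query))  # drop duplicate query nodes
    dist = {q: 0 for q in sources}
    queue = deque(sources)
    order = []
    while queue and len(order) < k_prime:
        u = queue.popleft()
        order.append(u)
        for w in sorted(adj[u]):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    keep = set(order)
    return {v: adj[v] & keep for v in keep}


def greedy_fast(adj, query, k_prime, greedy):
    """Run the supplied greedy(adj, query) on the restricted graph."""
    return greedy(restrict_to_closest(adj, query, k_prime), query)


# Path a - b - c - d - e; keep the 3 nodes closest to a.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
small = restrict_to_closest(path, ["a"], 3)
print(sorted(small))
```

Because Greedy then runs on at most k’ nodes, the size of the returned community is bounded by k’ regardless of the size of the input graph.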

Experiment Evaluation
• DBLP
  • A coauthorship graph extracted from a recent snapshot of the DBLP database
  • 226K nodes, 1.4M edges
• Tag
  • A tag graph extracted from the Flickr photo-sharing portal
  • 38K nodes, 1.3M edges
• BIOMINE
  • A graph extracted from the database of the Biomine project
  • 16K nodes, 491K edges

Quantitative Results
• BASELINE: a simple and natural baseline algorithm
• |Q|: the number of query nodes
• d: distance bound
• k: size bound
• l: inter-distance between query nodes


Conclusion
• Aim to find a compact, densely connected community that contains the given query nodes
• Measurement based on constraints
  • Minimum degree
  • Distance
  • Size
• Heuristics
  • GreedyGen
  • GreedyDist
  • GreedyFast


Questions?