1
THE COMMUNITY-SEARCH PROBLEM AND HOW TO PLAN A SUCCESSFUL COCKTAIL PARTY
Mauro Sozio and Aristides Gionis
Presented by: Raghu Rangan, Jialiang Bao, Ge Wang
2
Introduction
• Graphs are one of the most popular data representations
• They have a wide range of applications
• Communities and social networks modeled as graphs have gained attention
  • People are represented as nodes
  • Connections between people are edges
• This paper focuses on the query-dependent variant of the community-search problem
3
Planning a Cocktail Party
• Participants should be “close” to the organizers (e.g. a friend of a friend).
• Everybody should know some of the participants.
• The graph should be connected.
• The number of participants should not be too small.
• Not too large either.
• This is difficult.
[Figure: example graph with nodes Alice, Bob, Charlie, David]
4
Community Search Problem
• We need to find the community that a given set of users belongs to.
• Given a graph and a set of query nodes, find a densely connected subgraph that contains the query nodes.
5
Related Work
• Connectivity Subgraphs
  • Work has been done on finding a subgraph that connects a set of query nodes
  • Not enough
  • We need to extract the best community that the query nodes define
• Community Detection
  • Finding communities in large graphs and social networks
  • The typical approach optimizes a modularity measure
  • The problem is that most methods consider the static community-detection problem
6
Related Work
• Team Formation
  • Lappas et al. studied this problem
  • Given a network where nodes are labeled with a set of skills
  • Find a subgraph in which all skills are present and the communication cost is small
  • A variant of this problem arises in cocktail party planning
7
Problem definition
• Problem 1:
  • Given an undirected (connected) graph G = (V, E), a set of query nodes Q, and a goodness function f, find the densest subgraph H = (VH, EH) of G such that:
    1. VH contains Q (all query nodes must be included)
    2. H is connected
    3. f(H) is maximized among all feasible choices of H (the larger the better)
8
Query node and goodness function?
What is a query node?
• The query nodes are the given nodes that the community must contain.
What is a goodness function?
• It measures how dense a subgraph is. Candidates:
  • Average degree
  • Minimum degree
9
Why not choose the average degree function?
• It leads to unintuitive results
• It is easy to add an unrelated but dense part of the graph
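The effect can be checked on a small example. The sketch below is only an illustration on a made-up graph (the node names and the helper functions are assumptions, not taken from the paper): a 4-clique around the query node "q" is attached, through a bridge node "x", to an unrelated 6-clique. Adding the unrelated part raises the average degree but lowers the minimum degree.

```python
from itertools import combinations

def degrees(edges, nodes):
    """Degree of every node in the subgraph induced by `nodes`."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        if u in deg and v in deg:
            deg[u] += 1
            deg[v] += 1
    return deg

def average_degree(edges, nodes):
    deg = degrees(edges, nodes)
    return sum(deg.values()) / len(deg)

def minimum_degree(edges, nodes):
    return min(degrees(edges, nodes).values())

# Hypothetical toy graph: a 4-clique that contains the query node "q" ...
community = {"q", "a", "b", "c"}
# ... and an unrelated 6-clique reached only through the bridge node "x".
dense_part = {"d1", "d2", "d3", "d4", "d5", "d6"}
edges = list(combinations(sorted(community), 2))
edges += list(combinations(sorted(dense_part), 2))
edges += [("c", "x"), ("x", "d1")]

bloated = community | dense_part | {"x"}

print(average_degree(edges, community), minimum_degree(edges, community))
print(average_degree(edges, bloated), minimum_degree(edges, bloated))
```

On this graph the average degree prefers the bloated subgraph, while the minimum degree prefers the 4-clique around "q".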
10
Problem definition
• Problem 2:
  • Given an undirected (connected) graph G = (V, E), a set of query nodes Q, a goodness function f, and a number d as a distance bound, find the densest subgraph H = (VH, EH) of G such that:
    • VH contains Q (all query nodes must be included)
    • H is connected
    • DQ(H) <= d
    • f(H) is maximized among all feasible choices of H (the larger the better)
We have a distance constraint now.
11
Maximizing the minimum degree
• Greedy algorithm:
  • Steps:
    1. Set G0 = G
    2. Delete the minimum-degree node and all its edges, then repeat step 2
  • Termination condition, either:
    • At least one of the query nodes in Q has minimum degree
    • The query nodes in Q are no longer connected
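The peeling loop above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the toy graph are made up, the graph is assumed to be a dict mapping each node to its set of neighbours, and the choice to return the best connected component seen during peeling is an assumption about how the final answer is selected. The loop recomputes the minimum in O(n) per step for clarity, so it is not the linear-time version.

```python
from collections import deque

def query_component(adj, alive, start):
    """Nodes reachable from `start` using only nodes in `alive`."""
    seen, todo = {start}, deque([start])
    while todo:
        u = todo.popleft()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                todo.append(w)
    return seen

def greedy_min_degree(adj, query):
    """Peel minimum-degree nodes; return (best community, its minimum degree)."""
    query = set(query)
    start = next(iter(query))
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    best, best_score = None, -1
    while True:
        comp = query_component(adj, alive, start)
        if not query <= comp:
            break  # the query nodes are no longer connected
        score = min(deg[v] for v in comp)
        if score > best_score:
            best, best_score = comp, score
        lowest = min(deg[v] for v in alive)
        if any(deg[q] == lowest for q in query):
            break  # a query node has minimum degree
        u = next(v for v in alive if deg[v] == lowest)
        alive.remove(u)
        for w in adj[u]:
            if w in alive:
                deg[w] -= 1
    return best, best_score

# Hypothetical toy graph: a 4-clique {q, a, b, c} with a tail c - x - y.
toy = {
    "q": {"a", "b", "c"},
    "a": {"q", "b", "c"},
    "b": {"q", "a", "c"},
    "c": {"q", "a", "b", "x"},
    "x": {"c", "y"},
    "y": {"x"},
}
community, score = greedy_min_degree(toy, {"q"})
print(sorted(community), score)  # ['a', 'b', 'c', 'q'] 3
```

The tail nodes "y" and "x" are peeled first, after which the query node itself has minimum degree and the loop stops.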
12
Time complexity?
• Greedy can be implemented in linear time.
• Idea:
  1. Keep a separate list of the nodes with degree d, for d = 1, …, n
  2. When a node u is removed from G, each neighbor of u with degree d is moved from list d to list d - 1, so the total number of moves is O(m) (m is the number of edges)
  3. We can locate the minimum-degree node in O(1) time, so the running time is O(n + m)
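The degree lists can be sketched as an array of buckets. The code below is an illustrative sketch under a few assumptions: the graph is a simple undirected graph stored as a dict of neighbour sets, buckets are Python sets rather than linked lists, and the function name `peel_order` is made up. It only produces the removal order; the termination checks on the query nodes are left out. The pointer to the smallest non-empty bucket moves back by at most one per removal, which keeps the total scanning cost at O(n) on top of the O(m) bucket moves.

```python
def peel_order(adj):
    """Remove nodes by minimum current degree using degree buckets.

    Returns a list of (node, degree at removal time) pairs.
    """
    n = len(adj)
    deg = {v: len(adj[v]) for v in adj}
    buckets = [set() for _ in range(n)]  # degrees range over 0 .. n-1
    for v, d in deg.items():
        buckets[d].add(v)
    removed = set()
    order = []
    d = 0
    for _ in range(n):
        # One removal lowers the minimum degree by at most 1.
        d = max(d - 1, 0)
        while not buckets[d]:
            d += 1
        u = buckets[d].pop()
        removed.add(u)
        order.append((u, d))
        for w in adj[u]:
            if w not in removed:
                # Move the neighbour from bucket deg[w] to bucket deg[w] - 1.
                buckets[deg[w]].remove(w)
                deg[w] -= 1
                buckets[deg[w]].add(w)
    return order

# Hypothetical toy graph: a 4-clique {q, a, b, c} with a tail c - x - y.
toy = {
    "q": {"a", "b", "c"},
    "a": {"q", "b", "c"},
    "b": {"q", "a", "c"},
    "c": {"q", "a", "b", "x"},
    "x": {"c", "y"},
    "y": {"x"},
}
order = peel_order(toy)
print([d for _, d in order])  # [1, 1, 3, 2, 1, 0]
```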
13
Generalization to monotone functions
• The minimum degree function is a member of this family of functions.
• But sometimes we want to define density with other functions.
14
Problem definition
• Problem 3:
  • Given an undirected (connected) graph G = (V, E), a set of query nodes Q, a node-monotone function f, and a number d as a distance bound, find the densest subgraph H = (VH, EH) of G such that:
    • VH contains Q (all query nodes must be included)
    • H is connected
    • DQ(H) <= d
    • f(H) is maximized among all feasible choices of H (the larger the better)
We have a node-monotone function now.
15
GreedyGen
• Greedy algorithm:
  • Steps:
    1. Set G0 = G
    2. Delete the minimum degree node
    3. Delete the node v for which f(G, v) is minimum, and all its edges, then repeat step 3
  • Termination condition, either:
    • At least one of the query nodes in Q has the minimum f(G, v)
    • The query nodes in Q are no longer connected
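A generic version of the peeling loop might look like the sketch below. It is an illustration under stated assumptions, not the authors' code: the node score is passed in as a callable `f(adj, alive, v)`, "node-monotone" is assumed to mean that a node's score cannot increase when other nodes are removed, the distance constraint is ignored, and only the repeated "delete the node with minimum f" step is modeled. The two example scores, `degree` and `triangles`, and the toy graph are hypothetical.

```python
from collections import deque

def component_of(adj, alive, start):
    """Nodes reachable from `start` using only nodes in `alive`."""
    seen, todo = {start}, deque([start])
    while todo:
        u = todo.popleft()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                todo.append(w)
    return seen

def degree(adj, alive, v):
    """Number of neighbours of v that are still alive."""
    return sum(1 for w in adj[v] if w in alive)

def triangles(adj, alive, v):
    """Number of triangles through v among the alive nodes."""
    nbrs = [w for w in adj[v] if w in alive]
    return sum(1 for i, a in enumerate(nbrs)
               for b in nbrs[i + 1:] if b in adj[a])

def greedy_gen(adj, query, f=degree):
    """Peel the node with minimum f; return (best community, its score)."""
    query = set(query)
    start = next(iter(query))
    alive = set(adj)
    best, best_score = None, None
    while True:
        comp = component_of(adj, alive, start)
        if not query <= comp:
            break  # the query nodes are no longer connected
        scores = {v: f(adj, alive, v) for v in alive}
        comp_score = min(scores[v] for v in comp)
        if best is None or comp_score > best_score:
            best, best_score = comp, comp_score
        lowest = min(scores.values())
        if any(scores[q] == lowest for q in query):
            break  # a query node has the minimum f
        alive.remove(next(v for v in alive if scores[v] == lowest))
    return best, best_score

# Hypothetical toy graph: a 4-clique {q, a, b, c} with a tail c - x - y.
toy = {
    "q": {"a", "b", "c"},
    "a": {"q", "b", "c"},
    "b": {"q", "a", "c"},
    "c": {"q", "a", "b", "x"},
    "x": {"c", "y"},
    "y": {"x"},
}
print(greedy_gen(toy, {"q"}, f=degree)[1])     # 3
print(greedy_gen(toy, {"q"}, f=triangles)[1])  # 3
```

With either score the tail nodes are removed first and the 4-clique around "q" is returned.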
16
Communities with Size Restriction
• Drawback of the previous algorithms
  • They may return very large subgraphs.
17
Complexity
• Formal definition of the minimum-degree problem with an upper bound on the size
  • An integer k (size constraint)
  • Subgraph H has at most k nodes
• NP-hard
18
Algorithm
• Two heuristics that can be used to find communities with bounded size
  • Inspired by the Greedy algorithm for maximizing the minimum degree
  • GreedyDist, GreedyFast
19
Algorithm
• GreedyDist
  • The tighter the distance constraint is, the smaller the communities are
20
Algorithm
• GreedyDist
  • Invoke GreedyGen
  • If the query nodes are connected but the size constraint is not satisfied, re-execute GreedyGen with a tighter distance constraint
  • Repeat until the size constraint is satisfied or the query nodes are disconnected
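The retry loop can be sketched as follows. This is a simplified illustration with several assumptions: `greedy_with_distance` is a hypothetical stand-in for GreedyGen that enforces the distance constraint by keeping only nodes within d hops of every query node in the original graph (a simplification of DQ(H) <= d) and then peels by minimum degree; the distance bound is tightened by 1 per round; and the last connected community is returned even if it still exceeds k, so the caller should check its size.

```python
from collections import deque

def bfs_dist(adj, allowed, source):
    """Hop distances from `source`, walking only through `allowed` nodes."""
    dist = {source: 0}
    todo = deque([source])
    while todo:
        u = todo.popleft()
        for w in adj[u]:
            if w in allowed and w not in dist:
                dist[w] = dist[u] + 1
                todo.append(w)
    return dist

def greedy_with_distance(adj, query, d):
    """Stand-in for GreedyGen; returns a node set, or None if Q is disconnected."""
    query = set(query)
    alive = set(adj)
    for q in query:
        near = bfs_dist(adj, set(adj), q)
        alive &= {v for v, hops in near.items() if hops <= d}
    if not query <= alive:
        return None
    start = next(iter(query))
    best, best_score = None, -1
    while True:
        comp = set(bfs_dist(adj, alive, start))
        if not query <= comp:
            break
        deg = {v: sum(1 for w in adj[v] if w in alive) for v in alive}
        score = min(deg[v] for v in comp)
        if score > best_score:
            best, best_score = comp, score
        lowest = min(deg.values())
        if any(deg[q] == lowest for q in query):
            break
        alive.remove(next(v for v in alive if deg[v] == lowest))
    return best

def greedy_dist(adj, query, d, k):
    """Tighten the distance bound until the community has at most k nodes."""
    best = None
    while d >= 1:
        community = greedy_with_distance(adj, query, d)
        if community is None:
            break  # the query nodes are disconnected
        best = community
        if len(community) <= k:
            break  # the size constraint is satisfied
        d -= 1
    return best

# Hypothetical toy graph: triangle {q, a, b}, plus triangles {a, c, d} and {b, e, f}.
toy = {
    "q": {"a", "b"},
    "a": {"q", "b", "c", "d"},
    "b": {"q", "a", "e", "f"},
    "c": {"a", "d"}, "d": {"a", "c"},
    "e": {"b", "f"}, "f": {"b", "e"},
}
print(sorted(greedy_dist(toy, {"q"}, d=2, k=3)))  # ['a', 'b', 'q']
```

With d = 2 all seven nodes qualify and the community is too large for k = 3; tightening to d = 1 leaves only the triangle around "q".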
21
Algorithm
• GreedyFast
  • Preprocess: the input graph is restricted to the k’ nodes closest to the query nodes
  • Execute Greedy on the restricted graph
• The closer a node is to the query nodes, the more related it is to them and the more likely it is to belong to their community
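The preprocessing step can be sketched as below. The closeness measure is an assumption made for illustration: nodes are ranked by the sum of their hop distances to the query nodes, with ties broken by node name, and the query nodes are always kept. The helper names and the toy graph are hypothetical, and `greedy` is a plain minimum-degree peeling loop standing in for the Greedy algorithm.

```python
from collections import deque

def bfs_dist(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    todo = deque([source])
    while todo:
        u = todo.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                todo.append(w)
    return dist

def closest_nodes(adj, query, k_prime):
    """The k' nodes with the smallest total hop distance to the query nodes."""
    dists = [bfs_dist(adj, q) for q in query]
    total = {v: sum(d[v] for d in dists)
             for v in adj if all(v in d for d in dists)}
    ranked = sorted(total, key=lambda v: (total[v], str(v)))
    return set(ranked[:k_prime]) | set(query)

def greedy(adj, query):
    """Minimum-degree peeling; returns the best community containing Q."""
    query = set(query)
    alive = set(adj)
    best, best_score = None, -1
    while True:
        restricted = {v: adj[v] & alive for v in alive}
        comp = set(bfs_dist(restricted, next(iter(query))))
        if not query <= comp:
            break
        deg = {v: len(restricted[v]) for v in alive}
        score = min(deg[v] for v in comp)
        if score > best_score:
            best, best_score = comp, score
        lowest = min(deg.values())
        if any(deg[q] == lowest for q in query):
            break
        alive.remove(next(v for v in alive if deg[v] == lowest))
    return best

def greedy_fast(adj, query, k_prime):
    """Restrict the graph to the k' closest nodes, then run greedy on it."""
    keep = closest_nodes(adj, query, k_prime)
    return greedy({v: adj[v] & keep for v in keep}, query)

# Hypothetical toy graph: triangle {q, a, b}, plus triangles {a, c, d} and {b, e, f}.
toy = {
    "q": {"a", "b"},
    "a": {"q", "b", "c", "d"},
    "b": {"q", "a", "e", "f"},
    "c": {"a", "d"}, "d": {"a", "c"},
    "e": {"b", "f"}, "f": {"b", "e"},
}
print(sorted(greedy_fast(toy, {"q"}, k_prime=3)))  # ['a', 'b', 'q']
```

Because the restriction happens before peeling, the returned community can never have more than k' nodes (plus any query nodes that had to be added back).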
22
Experimental Evaluation
• DBLP
  • A coauthorship graph extracted from a recent snapshot of the DBLP database
  • 226K nodes, 1.4M edges
• Tag
  • A tag graph extracted from the Flickr photo-sharing portal
  • 38K nodes, 1.3M edges
• BIOMINE
  • A graph extracted from the database of the Biomine project
  • 16K nodes, 491K edges
23
Quantitative Results
• BASELINE: a simple and natural baseline algorithm
• |Q|: the number of query nodes
• d: distance bound
• k: size bound
• l: inter-distance between query nodes
24
Quantitative Results
25
26
Conclusion
• Aim: find a compact, densely connected community that contains the given query nodes
• Measurement based on constraints
  • Minimum degree
  • Distance
  • Size
• Heuristics
  • GreedyGen
  • GreedyDist
  • GreedyFast
27
Questions?