Upload
daniele-rossetti
View
218
Download
0
Embed Size (px)
Citation preview
7/31/2019 Graph Mining and Social Networks
1/23
Graph Mining and Social Network
AnalysisData Mining and Text Mining (UIC 583 @ Politecnico di Milano)
7/31/2019 Graph Mining and Social Networks
2/23
Daniele Loiacono
q Jiawei Han and Micheline Kamber, "DataMining: Concepts and Techniques", TheMorgan Kaufmann Series in DataManagement Systems (Second Edition)
" Chapter 9
7/31/2019 Graph Mining and Social Networks
3/23
Graph Mining
7/31/2019 Graph Mining and Social Networks
4/23
Daniele Loiacono
q Graphs are becoming increasingly important to model manyphenomena in a large class of domains (e.g., bioinformatics,computer vision, social analysis)
q To deal with these needs, many data mining approacheshave been extended also to graphs and trees
q Major approaches" Mining frequent subgraphs" Indexing" Similarity search" Classification" Clustering
7/31/2019 Graph Mining and Social Networks
5/23
Daniele Loiacono
q Given a labeled graph data setD = {G1,G2,,Gn}
q We define as theq A subgraph in D is a subgraph with a support
greater than
q How to find frequent subgraph?" Apriori-based approach" Pattern-growth approach
7/31/2019 Graph Mining and Social Networks
6/23
Daniele Loiacono
q Apply a level-wise iterative algorithm1. Choose two subgraphs in S two similar subgraphs in a subgraph3. If the new subgraph is add to S4. Restart from 2. until all similar subgraphs have been
considered. Otherwise restart from 1. and move to k+1.
q What is subgraph size?" Number of vertex" Number of edges" Number of edge-disjoint paths
q Two subgraphs of size-k are similar if they have the samesize-(k-1) subgraph
q AprioriGraph has a big computational cost (due to themerging step)
7/31/2019 Graph Mining and Social Networks
7/23
7/31/2019 Graph Mining and Social Networks
8/23
Daniele Loiacono
q Closed subgraphs" G is iff there is no proper supergraph G with thesame support of G" Reduce the growth of subgraphs discovered" Is a more compact representation of knowledge
q Unlabeled (or partially labeled) graphs" Introduce a special label " can match any label or only itself
q Constrained subgraphs" Containment constraint (edges, vertex, subgraphs)" Geometric constraint" Value constraint
7/31/2019 Graph Mining and Social Networks
9/23
Daniele Loiacono
q Indexing is basilar for effective search and query processingq How to index graphs?q approach takes the as indexing unit
" All the path up to maxLlength are indexed" Does not scale very well
q approach takes and as indexing unit" A subgraph is frequent if it has a support greater than a
threshold
" A subgraph is discriminative if its support cannot be wellapproximated by the intersection of the graph sets thatcontain one of its subgraphs
7/31/2019 Graph Mining and Social Networks
10/23
Daniele Loiacono
q Mining of frequent subgraphs can be effectively used forclassification and clustering purposes
q Classification" and subgraphs are used as toperform the classification task" A subgraph is discriminative if it is frequent only in one class ofgraphs and infrequent in the others
" The threshold on frequency and discriminativeness should betuned to obtain the desired classification results
q Clustering" The mined frequent subgraphs are used to define between graphs
" Two graphs that should beconsidered and grouped in the same cluster
" The threshold on frequency can be tuned to find the desirednumber of clusters
q As the mining step affects heavily the final outcome, this is anintertwined process rather tan a two-steps process
7/31/2019 Graph Mining and Social Networks
11/23
Social Network Analysis
7/31/2019 Graph Mining and Social Networks
12/23
7/31/2019 Graph Mining and Social Networks
13/23
Daniele Loiacono
q Are social networks random graphs?q
NO!
7/31/2019 Graph Mining and Social Networks
14/23
Daniele Loiacono
Peter
Jane
SarahRalph
7/31/2019 Graph Mining and Social Networks
15/23
Daniele Loiacono
q Definitions" Nodes is the number of incident edges" Network is the max distance within 90% of
the network
q Properties"
""
n: number of nodes
e: number of eges1
7/31/2019 Graph Mining and Social Networks
16/23
Daniele Loiacono
q Several tasks can be identified in the analysis ofsocial networks
q Link based object classification" Classification of objects on the basis of its attributes, its
links and attributes of objects linked to it
" E.g., predict topic of a paper on the basis of Keywords occurrence
q Link type prediction" Prediction of link type on the basis of objects attributes" E.g., predict if a link between two Web pages is an
advertising link or not
q Predicting link existence" Predict the presence of a link between two objects
7/31/2019 Graph Mining and Social Networks
17/23
Daniele Loiacono
q Link cardinality estimation" Prediction of the number of links to an object" Prediction of the number of objects reachable from a
specific object
q Object reconciliation" Discover if two objects are the same on the basis of their
attributes and links" E.g., predict if two websites are mirrors of each otherq Group detection
" Clustering of objects on the basis both of their attributesand their links
q Subgraph detection" Discover characteristic subgraphs within network
7/31/2019 Graph Mining and Social Networks
18/23
Daniele Loiacono
q Feature construction"Not only the objects attributes need to be considered butalso attributes of" and techniques must beapplied to reduce the size of search space
q Collective classification and consolidation" Unlabeled data cannot be classified independently" New objects can be and need to be consideredto consolidate the current model
q Link prediction" The prior probability of link between two objects may bevery low
q Community mining from multirelational networks" Many approaches assume an
while social networks usually representand
7/31/2019 Graph Mining and Social Networks
19/23
Daniele Loiacono
q Link Predictionq Viral Marketing
7/31/2019 Graph Mining and Social Networks
20/23
Daniele Loiacono
q What edges will be added to the network?q
Given a snapshot of a network at time t, aimsto predict the edges that will be added before a given futuretime t
q Link prediction is generally solved assigning to each pair ofnodes a weight
q The higher the scorethe more likely that link will be added inthe near future
q The score(X,Y)can be computed in several way" : the shortest he path between X and Y thehighest is their score" : the greater the number of neighborsX and Y have in common, the highest is their score
" : weighted sum of paths thatconnects X and Y (shorter paths have usually largerweights)
7/31/2019 Graph Mining and Social Networks
21/23
Daniele Loiacono
q Several marketing approaches" Mass marketing is targeted on specific segment ofcustomers" Direct marketing is target on specific customers solely on
the basis of their characteristics
" tries to exploit the tomaximize the output of marketing actions
q Each customer has a specific based on" The number of connections" Its role in the network (e.g., opinion leader, listener)" Role of its connections
q Viral marketing aims to exploit the network value ofcustomers to predict their influence and to maximize theoutcome of marketing actions
7/31/2019 Graph Mining and Social Networks
22/23
Daniele Loiacono
q 500 randomly chosen customers are given a product (from 5000).
7/31/2019 Graph Mining and Social Networks
23/23
Daniele Loiacono
q The 500 most connected consumers are given a product.