On Finding Fine-Granularity User Communities by Profile Decomposition


The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 26-29 August, 2012, Kadir Has University, Istanbul, Turkey. On Finding Fine-Granularity User Communities by Profile Decomposition. Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee


On Finding Fine-Granularity User Communities by Profile Decomposition

Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee

Department of Knowledge Service Engineering, KAIST (Korea Advanced Institute of Science and Technology)

{seulki15, minsam.ko, brianhan87}@gmail.com, jaegil@kaist.ac.kr

The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 26-29 August, 2012, Kadir Has University, Istanbul, Turkey

Table of Contents
• Introduction
• DecompClus Algorithm
• Evaluation
• Related Work
• Conclusion

Community Discovery

Community discovery is one of the most popular tasks in social network analysis.

Many real-world applications rely on community discovery:
• Advertisement to common interest groups
• Recommendation of potential collaborators in workplaces

Relationships in Social Networks

A social network is modeled as a huge graph.
• A node is a user.
• An edge is a relationship between users.

Two types of relationships in a social network:
• Explicit relationship: follower/following, friend
• Implicit relationship: no known link, but similar interests

We focus on the implicit relationship.

Extracting Implicit Relationships

To extract implicit relationships, a user is typically represented by his/her profile, and the similarity between user profiles is measured.

The form of the profile depends on the social network and application.
• In DBLP, the profile is the list of papers the user wrote.
• In Twitter, the profile is the list of tweets the user posted.

[Figure: User A’s profile and User B’s profile; the similarity between the profiles is taken as the implicit relationship.]
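As a rough illustration of measuring similarity between profiles, the sketch below weights terms with TF-IDF and compares users with cosine similarity. The user names, the term lists, and the IDF variant `log(n / df) + 1` are assumptions chosen for the example; the slides do not specify the exact weighting formula.

```python
import math
from collections import Counter

def tfidf_vectors(profiles):
    """Map each user to a sparse TF-IDF vector {term: weight}.

    Assumed weighting: tf * (log(n / df) + 1); the slides only say "TF-IDF".
    """
    n = len(profiles)
    # Document frequency: number of profiles in which each term occurs.
    df = Counter(t for terms in profiles.values() for t in set(terms))
    vectors = {}
    for user, terms in profiles.items():
        tf = Counter(terms)
        vectors[user] = {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical profiles, flattened into term lists.
profiles = {
    "User A": ["photo", "lens", "photo", "hiking"],
    "User B": ["photo", "color", "museum"],
    "User C": ["gene", "virus", "cancer"],
}
vectors = tfidf_vectors(profiles)
sim_ab = cosine(vectors["User A"], vectors["User B"])
sim_ac = cosine(vectors["User A"], vectors["User C"])
print(sim_ab, sim_ac)
```

Users A and B share a term, so their similarity is positive and could serve as the weight of an implicit relationship; Users A and C share nothing, so their similarity is zero.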

Limitation of a Single Profile

Generally, a user is described by only a single profile, which oversimplifies the multiple characteristics of the user. This oversimplification results in the loss of meaningful communities.

For example, although User A and User B share an interest in photography, the overall similarity between the two users is not very high.
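The dilution effect can be checked on a toy example. The sketch below uses raw term counts rather than TF-IDF to keep it short, and the interest groups are labeled by hand purely for illustration (in DecompClus they would come from profile decomposition); all terms and group names are made up.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bags of terms (raw counts)."""
    u, v = Counter(a), Counter(b)
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical users, each with two hand-labeled interests.
user_a = {
    "photography": ["photo", "lens", "photo"],
    "outdoor": ["outdoor", "hiking", "trail", "hiking"],
}
user_b = {
    "photography": ["photo", "color", "photo"],
    "art": ["art", "museum", "gallery", "art"],
}

# Single-profile view: all terms of a user merged together.
whole_a = [t for terms in user_a.values() for t in terms]
whole_b = [t for terms in user_b.values() for t in terms]
whole_sim = cosine(whole_a, whole_b)

# Decomposed view: compare only the photography parts.
photo_sim = cosine(user_a["photography"], user_b["photography"])
print(whole_sim, photo_sim)
```

On this data the photography parts are far more similar to each other than the merged profiles are, because the unrelated outdoor and art terms inflate the vector norms without adding to the overlap.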

DecompClus

We propose DecompClus, a community discovery method based on profile decomposition, which divides a profile into sub-profiles.

[Figure: Profiles → Sub-Profiles → Communities. Step 1 (profile decomposition) splits each profile into sub-profiles such as "photo, lens, …", "outdoor, hiking, …", "photo, color, …", and "art, museum, …". Step 2 (sub-profile clustering) groups the sub-profiles into communities.]


Overall Procedure of DecompClus

Step 1: Profile Decomposition (1/2)

A network of unit items (e.g., papers or tweets) is constructed for each user’s profile.
• A node (item) is represented by a term vector (weight: TF-IDF).
• An edge is determined by the similarity between two nodes (cosine similarity).

[Figure: User A’s profile drawn as a network of items i1 to i7.]
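A minimal sketch of building such an item network for one user is shown below. It assumes the item term vectors are already TF-IDF weighted; the item ids, the weights, and the `min_similarity` threshold are hypothetical (the slides do not say whether weak edges are pruned).

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def build_item_network(item_vectors, min_similarity=0.0):
    """Return weighted edges {(item_i, item_j): similarity}.

    Only pairs whose similarity exceeds min_similarity (an assumed knob)
    are kept, so items without common terms stay unconnected.
    """
    edges = {}
    for (i, u), (j, v) in combinations(sorted(item_vectors.items()), 2):
        s = cosine(u, v)
        if s > min_similarity:
            edges[(i, j)] = s
    return edges

# Hypothetical TF-IDF vectors for four items in one user's profile.
items = {
    "i1": {"photo": 1.2, "lens": 0.9},
    "i2": {"photo": 1.0, "color": 0.7},
    "i3": {"hiking": 1.1, "outdoor": 0.8},
    "i4": {"hiking": 0.9, "trail": 1.0},
}
network = build_item_network(items)
for edge, weight in sorted(network.items()):
    print(edge, round(weight, 3))
```

Here only the two photography items and the two hiking items end up connected, which is the kind of structure the clustering in the next slide can separate.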

Step 1: Profile Decomposition (2/2)

Clustering is performed on the small network.
• We adopted a clustering algorithm based on modularity optimization, which tries to detect high-modularity partitions of networks [V. D. Blondel et al., 2008].

Each cluster becomes a sub-profile.

[Figure: User A’s profile and the sub-profiles obtained from it.]
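To give a feel for modularity-based clustering of an item network, here is a simplified sketch that implements only a greedy local-moving pass: each node repeatedly moves to the neighboring community with the largest modularity gain. It is not the full method of Blondel et al., which additionally aggregates communities into super-nodes and repeats; the graph, the item ids, and the function names are made up for illustration.

```python
from collections import defaultdict

def local_moving(edges):
    """Greedy modularity local moving on an undirected weighted graph.

    edges: {(u, v): weight}, no self-loops. Returns {node: community label}.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    m2 = 2.0 * sum(edges.values())                 # 2m: twice the total edge weight
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    community = {u: u for u in adj}                # start with singleton communities
    tot = dict(degree)                             # total degree per community
    improved = True
    while improved:
        improved = False
        for u in sorted(adj):
            cu = community[u]
            links = defaultdict(float)             # weight from u to each community
            for v, w in adj[u].items():
                links[community[v]] += w
            tot[cu] -= degree[u]                   # take u out of its community
            best = cu
            best_gain = links.get(cu, 0.0) - tot[cu] * degree[u] / m2
            for c, w_uc in sorted(links.items()):
                gain = w_uc - tot[c] * degree[u] / m2
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += degree[u]                 # put u into the chosen community
            if best != cu:
                community[u] = best
                improved = True
    return community

def sub_profiles(community):
    """Group items by community label; each group is one sub-profile."""
    groups = defaultdict(list)
    for item, label in sorted(community.items()):
        groups[label].append(item)
    return sorted(groups.values())

# Hypothetical item network: two tight triangles joined by one weak edge.
edges = {
    ("i1", "i2"): 1.0, ("i1", "i3"): 1.0, ("i2", "i3"): 1.0,
    ("i4", "i5"): 1.0, ("i4", "i6"): 1.0, ("i5", "i6"): 1.0,
    ("i3", "i4"): 0.1,
}
parts = sub_profiles(local_moving(edges))
print(parts)
```

On this toy graph the two triangles are separated, and each resulting group plays the role of one sub-profile.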

Step 2: Sub-Profile Clustering (1/2)

A network of sub-profiles is constructed by accumulating sub-profiles from every user.
• A node (sub-profile) is represented by a term vector (weight: TF-IDF).
• An edge is weighted by the similarity between two nodes (cosine similarity).

[Figure: the network of sub-profiles, with two sub-profiles from User A and one each from Users B, C, D, and E.]

Step 2: Sub-Profile Clustering (2/2)

Clustering is performed on the network of sub-profiles.
• The same clustering method is used to group sub-profiles.

Now, each cluster becomes a user community.

A user can belong to multiple communities (e.g., User A is in C1 and C2).
• DecompClus is a method that discovers an overlapping community structure using a non-overlapping clustering method.

[Figure: Community C1 contains sub-profiles of Users A, D, and E, so its members are Users A, D, and E; Community C2 contains sub-profiles of Users A, B, and C, so its members are Users A, B, and C.]
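The way a non-overlapping clustering of sub-profiles yields overlapping user communities can be sketched in a few lines. The sub-profile ids (a user name plus an index) and the cluster labels below are hypothetical and simply mirror the example of Users A to E.

```python
from collections import defaultdict

def user_communities(sub_profile_cluster):
    """Turn a sub-profile clustering into user communities.

    sub_profile_cluster: {(user, sub_profile_index): community label}.
    Each sub-profile has exactly one label, but a user who owns several
    sub-profiles can appear in several communities.
    """
    communities = defaultdict(set)
    for (user, _index), label in sub_profile_cluster.items():
        communities[label].add(user)
    return dict(communities)

def memberships(communities):
    """Invert the mapping: user -> set of community labels."""
    result = defaultdict(set)
    for label, users in communities.items():
        for user in users:
            result[user].add(label)
    return dict(result)

# Assumed clustering result for the sub-profile network.
clustering = {
    ("User A", 1): "C1", ("User D", 1): "C1", ("User E", 1): "C1",
    ("User A", 2): "C2", ("User B", 1): "C2", ("User C", 1): "C2",
}
communities = user_communities(clustering)
by_user = memberships(communities)
print(communities)
print(by_user["User A"])
```

User A owns one sub-profile in each cluster and therefore ends up in both communities, while every other user belongs to exactly one.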


Experimental Set-up (1/3)

Evaluation methods
• Quantitative evaluation: verify that DecompClus finds tighter, better-connected communities
  • Modularity value
  • Intra-similarity
  • Inter-similarity
• Qualitative evaluation: explain how the communities found by our method and those found by the compared method differ semantically
  • Defining the theme of each community
  • Case studies (see the paper)
  • Visualization

Experimental Set-up (2/3)

CiteULike
• Social bookmarking service for scholarly papers
• http://www.citeulike.org/faq/data.adp

Dataset
• # of users = 122
• # of articles = 25,089
• # of unique stemmed tags = 16,161
• Half of the users have more than one interest

[Figure: Distribution of users according to their tags. Tag groups: tag like 'social_network%' or 'socialnetwork%'; tag like 'data_mining%' or 'mining%' or 'knowledge_discovery%'; tag like 'recommend%'.]

Experimental Set-up (3/3)

Implementation
• Gephi library: open-source software for visualizing and analyzing large network graphs

Baseline
• Follows almost the same procedure as DecompClus.
• Uses only one overall profile per user.

[Figure: Profiles → Communities. In Baseline, each whole profile (e.g., "photo, lens, …, outdoor, hiking, …" or "photo, color, …, art, museum, …") is clustered directly into communities, without decomposition.]

Discovered Communities

Baseline
Community ID   # of users
BC1            57
BC2            65

DecompClus
Community ID   # of users
DC1            80
DC2            53
DC3            91
DC4            84

# of communities
• DecompClus finds more communities than Baseline does.

# of users in a community
• The communities discovered by DecompClus generally have more members than those discovered by Baseline, because DecompClus allows a user to belong to multiple communities at the same time.

Quantitative Evaluation

• DecompClus achieves better metrics than Baseline.
• Modularity value: the strength of the division of a network into modules
• Intra-similarity: the average value of similarities within a community
• Inter-similarity: the average value of similarities between communities
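For reference, the modularity of a weighted network partition is commonly written as follows. The slides do not spell out the formula, so this is the standard definition used with modularity-optimization methods rather than a statement of the authors' exact computation.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
\qquad k_i = \sum_j A_{ij}, \qquad m = \frac{1}{2} \sum_{i,j} A_{ij}
```

Here $A_{ij}$ is the weight of the edge between nodes $i$ and $j$, $c_i$ is the community of node $i$, and $\delta(c_i, c_j)$ is 1 when both nodes are in the same community and 0 otherwise. Higher values of $Q$ indicate that more edge weight falls inside communities than would be expected at random.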

[Figure: bar charts comparing Baseline and DecompClus on each metric.]

Metric             Baseline   DecompClus
Modularity         0.0035     0.0734
Intra-similarity   0.0133     0.0279
Inter-similarity   0.4534     0.3604

In DecompClus the connections between the members within a community are denser; in contrast, the connections between the members in different communities are sparser.
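One plausible reading of the two similarity metrics is sketched below: intra-similarity as the mean similarity over node pairs in the same community, and inter-similarity as the mean over node pairs in different communities. The exact averaging used in the evaluation is not given in the slides, so both definitions, the node names, and the similarity values are assumptions.

```python
from itertools import combinations
from statistics import mean

def intra_inter_similarity(communities, similarity):
    """Return (intra, inter) average pairwise similarity.

    communities: {label: [node, ...]} with every node in exactly one community
                 (nodes are sub-profiles, so the clustering is non-overlapping).
    similarity:  function (a, b) -> float, called with a < b.
    Assumes at least one same-community pair and one cross-community pair.
    """
    label = {node: name for name, nodes in communities.items() for node in nodes}
    intra, inter = [], []
    for a, b in combinations(sorted(label), 2):
        bucket = intra if label[a] == label[b] else inter
        bucket.append(similarity(a, b))
    return mean(intra), mean(inter)

# Hypothetical similarities between four sub-profiles.
table = {("s1", "s2"): 0.8, ("s3", "s4"): 0.6}

def toy_similarity(a, b):
    # Unlisted pairs are given a small default similarity.
    return table.get((a, b), 0.1)

communities = {"C1": ["s1", "s2"], "C2": ["s3", "s4"]}
intra, inter = intra_inter_similarity(communities, toy_similarity)
print(intra, inter)
```

A clustering with dense communities and sparse connections between them shows up as a high first value and a low second value.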

Qualitative Evaluation (1/2)

DecompClus preserves the themes defined by Baseline.
DecompClus finds new communities that are not found by Baseline.

Baseline
ID    Theme
BC1   Data mining & Recommendation
BC2   Social Network

DecompClus
ID    Theme
DC1   Data mining & Recommendation
DC2   Semantic Web (newly found)
DC3   Data mining & Bioinformatics (newly found)
DC4   Social Network

Qualitative Evaluation (2/2)

In DecompClus, a user’s minor interests are not assimilated into his/her major interests, so new communities that consist of users’ minor interests can be discovered.

[Figures: distribution of articles related to “Semantic web” and distribution of articles related to “Bioinformatics”, each comparing Baseline and DecompClus.]

Visualization

The community structure produced by DecompClus is more clearly distinguishable.

[Figure: Baseline and DecompClus community structures, drawn with the ForceAtlas2 layout provided by Gephi.]


Related Work (1/2)

Comparison with related areas

Approach                              # of profiles per user   Mapping in clustering (node : community)   Result
Non-overlapping community discovery   One profile              1:1                                        A user belongs to one community
Overlapping community discovery       One profile              1:N                                        A user belongs to multiple communities
DecompClus                            Multiple sub-profiles    1:1                                        A user belongs to multiple communities

Related Work (2/2)

Non-overlapping community discovery
• Newman’s method [Newman and Girvan, 2004]
• Multi-level graph partitioning method [Karypis and Kumar, 1995]
• Attribute augmented graph [Zhou et al., 2006]
• Bayesian generative models [Wang, 2006]

Overlapping community discovery
• CPM (clique percolation method) [Palla et al., 2005]
• Connectedness and local optimality [Goldberg et al., 2010]
• Label propagation [Gregory, 2009]

Conclusion

• A novel concept of profile decomposition, which enables us to detect fine-granularity user communities with implicit relationships
• A new approach to discovering overlapping communities with non-overlapping community discovery algorithms
• We demonstrate, using a real data set, that our algorithm effectively discovers user communities from social media data.

THANK YOU !!

Case Studies: Case 1

• Users who become members of multiple communities through profile decomposition

In our data set, there are 99 users in total (81.1%) like User A.

For example, User A’s profile:
• Sub-profile 1: semantics, semantic web, rdf, ontology, social semantic web, …
• Sub-profile 2: user model, recommender, personalization, user profiling, knn, data mining, …
• Sub-profile 3: social network analysis, social search, graphs, …

[Figure: User A and the three sub-profiles, placed among the Baseline communities BC1 (data mining & recommendation) and BC2 (social network) and the DecompClus communities DC1 (data mining & recommendation), DC2 (semantic web), DC3 (data mining & bioinformatics), and DC4 (social network).]

Case Studies: Case 2

• Users who become members of the communities newly discovered by DecompClus

There are 9 users in total (7.3%) like User B.

For example, User B’s profile:
• Sub-profile 1: statistics, cancer, genomics, gene, sequencing, virus, bacteria, database, classification, …

[Figure: User B and the sub-profile, placed among the Baseline communities BC1 (data mining & recommendation) and BC2 (social network) and the DecompClus communities DC1 (data mining & recommendation), DC2 (semantic web), DC3 (data mining & bioinformatics), and DC4 (social network).]
