On Finding Fine-Granularity User Communities by Profile Decomposition Seulki Lee , Minsam Ko, Keejun Han, Jae-Gil Lee Department of Knowledge Service Engineering KAIST(Korea Advanced Institute of Science and Technology) {seulki15, minsam.ko, brianhan87}@gmail.com, [email protected] The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 26-29 August, 2012, Kadir Has University, Istanbul, Turkey


Page 1: On Finding Fine-Granularity User  Communities by Profile Decomposition

On Finding Fine-Granularity User Communities by Profile Decomposition

Seulki Lee, Minsam Ko, Keejun Han, Jae-Gil Lee

Department of Knowledge Service Engineering, KAIST (Korea Advanced Institute of Science and Technology)

{seulki15, minsam.ko, brianhan87}@gmail.com, [email protected]

The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 26-29 August, 2012, Kadir Has University, Istanbul, Turkey

Page 2: On Finding Fine-Granularity User  Communities by Profile Decomposition

Table of Contents
• Introduction
• DecompClus Algorithm
• Evaluation
• Related Work
• Conclusion

Page 3: On Finding Fine-Granularity User  Communities by Profile Decomposition

Community Discovery

Community discovery is one of the most popular tasks in social network analysis.

Many real-world applications rely on community discovery:
• Advertisement to common interest groups
• Recommendation of potential collaborators in workplaces

Page 4: On Finding Fine-Granularity User  Communities by Profile Decomposition

Relationships in Social Networks

A social network is modeled as a huge graph.
• A node is a user.
• An edge is a relationship between users.

Two types of relationships in a social network:
• Explicit relationship: follower/following, friend
• Implicit relationship: unknown to each other, but with similar interests

We focus on the implicit relationship.

Page 5: On Finding Fine-Granularity User  Communities by Profile Decomposition

Extracting Implicit Relationships

To extract implicit relationships, a user is typically represented by his/her profile, and the similarity between user profiles is measured.

The form of the profile depends on the social network and application.
• In DBLP, the profile is a list of papers he/she wrote.
• In Twitter, the profile is a list of tweets he/she posted.

[Figure: User A's profile and User B's profile; the similarity between the profiles is the implicit relationship.]

Page 6: On Finding Fine-Granularity User  Communities by Profile Decomposition

Limitation of a Single Profile

Generally, a user is described by only a single profile, which oversimplifies the multiple characteristics of a user. This problem results in the loss of meaningful communities.

Although User A and User B share an interest in photography, the overall similarity between the two users is not very high.
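The dilution effect described on this slide can be illustrated with a small sketch. The term counts below are hypothetical (they only echo the photo/outdoor/art example used in the slides), and plain term counts are used instead of TF-IDF weights to keep the arithmetic easy to follow.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical interests: both users like photography, but each also has
# a larger, unrelated interest.
a_photo = Counter({"photo": 3, "lens": 2})
a_outdoor = Counter({"outdoor": 6, "hiking": 5, "trail": 4})
b_photo = Counter({"photo": 3, "color": 2})
b_art = Counter({"art": 6, "museum": 5, "gallery": 4})

# A single profile per user merges all interests into one vector.
user_a = a_photo + a_outdoor
user_b = b_photo + b_art

whole = cosine(user_a, user_b)    # similarity of the single profiles
part = cosine(a_photo, b_photo)   # similarity of the photography parts only

print(f"whole profiles:     {whole:.3f}")  # 0.100
print(f"photo sub-profiles: {part:.3f}")   # 0.692
```

In this toy case the shared photography interest is almost invisible at the level of whole profiles, but clearly visible once each profile is split by interest.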

Page 7: On Finding Fine-Granularity User  Communities by Profile Decomposition

DecompClus

We propose DecompClus, a community discovery method based on profile decomposition, which divides a profile into sub-profiles.

• Step 1: Profile decomposition (profiles → sub-profiles)
• Step 2: Sub-profile clustering (sub-profiles → communities)

[Figure: Profiles → Sub-Profiles → Communities. One profile (photo, lens, … / outdoor, hiking, …) and another (photo, color, … / art, museum, …) are each split into sub-profiles, which are then grouped into communities.]

Page 8: On Finding Fine-Granularity User  Communities by Profile Decomposition

Table of Contents
• Introduction
• DecompClus Algorithm
• Evaluation
• Related Work
• Conclusion

Page 9: On Finding Fine-Granularity User  Communities by Profile Decomposition

Overall Procedure of DecompClus

Page 10: On Finding Fine-Granularity User  Communities by Profile Decomposition

Step 1: Profile Decomposition (1/2)

A network of unit items (e.g., papers or tweets) is constructed for each user's profile.
• A node (item) is represented by a term vector (weight: TF-IDF).
• An edge is weighted by the similarity between two nodes (cosine similarity).

[Figure: User A's profile as a network of items i1 to i7]
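The construction of the item network can be sketched as follows. This is a minimal illustration, not the authors' implementation: the smoothed IDF formula, the choice to drop zero-similarity edges, and the toy items are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def tfidf_vectors(items):
    """Turn tokenized items into sparse TF-IDF vectors (term -> weight)."""
    n = len(items)
    df = Counter(term for tokens in items for term in set(tokens))
    vectors = []
    for tokens in items:
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): stays positive even for terms
        # that occur in every item.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def item_network(items):
    """Return weighted edges {(i, j): similarity} for one user's items."""
    vectors = tfidf_vectors(items)
    edges = {}
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim > 0:  # assumption: items sharing no term stay unconnected
            edges[(i, j)] = sim
    return edges

# Hypothetical profile: four items, each already tokenized.
profile_a = [
    ["photo", "lens", "camera"],
    ["photo", "lens", "tripod"],
    ["hiking", "outdoor", "trail"],
    ["hiking", "outdoor", "camping"],
]
edges = item_network(profile_a)
for pair, weight in sorted(edges.items()):
    print(pair, round(weight, 3))
```

With these toy items only the two photography items and the two hiking items are linked, which is the structure the clustering in the next slide is meant to pick up.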

Page 11: On Finding Fine-Granularity User  Communities by Profile Decomposition

Step 1: Profile Decomposition (2/2)

Clustering is performed on the small network.
• We adopted a clustering algorithm based on modularity optimization, which tries to detect high-modularity partitions of networks [V. D. Blondel et al., 2008].

Each cluster becomes a sub-profile.

[Figure: User A's profile → User A's sub-profiles]
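To give a feel for modularity-based clustering, the sketch below implements only the greedy "local moving" phase of the method of Blondel et al.: each node is repeatedly moved to the neighboring community that yields the largest modularity gain. The full algorithm also aggregates communities into super-nodes and repeats, which is omitted here; the function name and the toy graph are illustrative assumptions.

```python
from collections import defaultdict

def local_moving(edges):
    """Greedy modularity local moving (first phase of Louvain only).

    edges: {(i, j): weight} for an undirected graph.
    Returns {node: community label}.
    """
    adj = defaultdict(dict)
    for (i, j), w in edges.items():
        adj[i][j] = w
        adj[j][i] = w
    degree = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    two_m = sum(degree.values())
    label = {i: i for i in adj}   # every node starts in its own community
    total = dict(degree)          # summed degree per community
    moved = True
    while moved:
        moved = False
        for i in adj:
            old = label[i]
            total[old] -= degree[i]          # take i out of its community
            links = defaultdict(float)       # weight from i to each community
            for j, w in adj[i].items():
                links[label[j]] += w

            def gain(c):
                # Modularity gain of putting i into c, up to a constant factor.
                return links.get(c, 0.0) - total[c] * degree[i] / two_m

            best = old
            for c in links:
                if gain(c) > gain(best) + 1e-12:
                    best = c
            label[i] = best
            total[best] += degree[i]
            moved = moved or best != old
    return label

# Toy item network: two triangles joined by one weak edge.
toy_edges = {
    (0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
    (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
    (2, 3): 0.1,
}
labels = local_moving(toy_edges)
sub_profiles = defaultdict(list)
for node, community in labels.items():
    sub_profiles[community].append(node)
print(sorted(sub_profiles.values()))  # [[0, 1, 2], [3, 4, 5]]
```

Each resulting group of items would play the role of one sub-profile in the sense of this slide.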

Page 12: On Finding Fine-Granularity User  Communities by Profile Decomposition

Step 2: Sub-Profile Clustering (1/2)

A network of sub-profiles is constructed by accumulating sub-profiles from every user.
• A node (sub-profile) is represented by a term vector (weight: TF-IDF).
• An edge is weighted by the similarity between two nodes (cosine similarity).

[Figure: network of sub-profiles from Users A, B, C, D, and E; User A contributes two sub-profiles.]

Page 13: On Finding Fine-Granularity User  Communities by Profile Decomposition

Step 2: Sub-Profile Clustering (2/2)

Clustering is performed on the network of sub-profiles.
• The same clustering method is used to group sub-profiles.

Now, each cluster becomes a user community.

A user can belong to multiple communities (e.g., User A is in C1 and C2).
• DecompClus is a method to discover overlapping community structure with a non-overlapping clustering method.

[Figure: sub-profiles of Users A, D, and E form one community and sub-profiles of Users A, B, and C form the other, so User A appears in both C1 and C2.]
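How a non-overlapping clustering of sub-profiles turns into overlapping user communities can be shown with a short sketch. The assignment below is a hypothetical toy input that mirrors the figure (which user falls into which of C1 and C2 is assumed); each sub-profile has exactly one cluster label, yet a user with several sub-profiles can end up in several communities.

```python
from collections import defaultdict

# Hypothetical result of Step 2: (user, sub-profile index) -> cluster label.
sub_profile_cluster = {
    ("A", 1): "C1",
    ("D", 1): "C1",
    ("E", 1): "C1",
    ("A", 2): "C2",
    ("B", 1): "C2",
    ("C", 1): "C2",
}

def user_communities(assignment):
    """Map sub-profile clusters back to users.

    Returns (community -> set of users, user -> set of communities).
    """
    communities = defaultdict(set)
    memberships = defaultdict(set)
    for (user, _index), cluster in assignment.items():
        communities[cluster].add(user)
        memberships[user].add(cluster)
    return dict(communities), dict(memberships)

communities, memberships = user_communities(sub_profile_cluster)
print(sorted(communities["C1"]))  # ['A', 'D', 'E']
print(sorted(communities["C2"]))  # ['A', 'B', 'C']
print(sorted(memberships["A"]))   # ['C1', 'C2']
```

The clustering itself stays 1:1 (one sub-profile, one cluster); the overlap appears only when memberships are read at the user level.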

Page 14: On Finding Fine-Granularity User  Communities by Profile Decomposition

Overall Procedure of DecompClus

Page 15: On Finding Fine-Granularity User  Communities by Profile Decomposition

Table of Contents
• Introduction
• DecompClus Algorithm
• Evaluation
• Related Work
• Conclusion

Page 16: On Finding Fine-Granularity User  Communities by Profile Decomposition

Experimental Set-up (1/3)

Evaluation methods
• Quantitative evaluation: verify that DecompClus finds more tightly knit, well-connected communities
    • Modularity value
    • Intra-similarity
    • Inter-similarity
• Qualitative evaluation: explain how the communities found by our method and those found by the compared method differ semantically
    • Defining the theme of each community
    • Case studies (see the paper)
    • Visualization

Page 17: On Finding Fine-Granularity User  Communities by Profile Decomposition

Experimental Set-up (2/3)

CiteULike
• Social bookmarking service for scholarly papers
• http://www.citeulike.org/faq/data.adp

Dataset
• # of users = 122
• # of articles = 25,089
• # of unique stemmed tags = 16,161
• Half of the users have more than one interest.

[Figure: Distribution of users according to their tags. Tag groups: tag like 'social_network%' or 'socialnetwork%'; tag like 'data_mining%' or 'mining%' or 'knowledge_discovery%'; tag like 'recommend%']

Page 18: On Finding Fine-Granularity User  Communities by Profile Decomposition

Experimental Set-up (3/3)

Implementation
• Gephi library: open-source software for visualizing and analyzing large network graphs

Baseline
• Follows almost the same procedure.
• Uses only one overall profile for each user.

[Figure: Baseline pipeline, Profiles → Communities; each profile (e.g., photo, lens, … / outdoor, hiking, …) is clustered as a whole, without decomposition.]

Page 19: On Finding Fine-Granularity User  Communities by Profile Decomposition

Discovered Communities

Baseline
Community ID   # of users
Bc1            57
Bc2            65

DecompClus
Community ID   # of users
DC1            80
DC2            53
DC3            91
DC4            84

# of communities
• DecompClus finds more communities than Baseline does.

# of users in a community
• The communities discovered by DecompClus have a greater number of members than those discovered by Baseline, because DecompClus allows a user to belong to multiple communities at the same time.

Page 20: On Finding Fine-Granularity User  Communities by Profile Decomposition

Quantitative Evaluation

• DecompClus achieves better metrics than Baseline.
• Modularity value: the strength of division of a network into modules
• Intra-similarity: the average value of similarities within a community
• Inter-similarity: the average value of similarities between communities

Metric             Baseline   DecompClus
Modularity         0.0035     0.0734
Intra-similarity   0.0133     0.0279
Inter-similarity   0.4534     0.3604

In DecompClus, the connections between members within a community are denser; in contrast, the connections between members of different communities are sparser.
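The three metrics can be sketched on a toy similarity matrix. The slides do not spell out the exact formulas, so this is one plausible reading under stated assumptions: intra- and inter-similarity are taken as plain averages over node pairs, and modularity is Newman's weighted modularity. The matrix and labels are hypothetical.

```python
from itertools import combinations

def pair_average(sim, labels, same):
    """Average similarity over node pairs in the same (or different) community."""
    values = [sim[i][j]
              for i, j in combinations(range(len(labels)), 2)
              if (labels[i] == labels[j]) == same]
    return sum(values) / len(values) if values else 0.0

def intra_similarity(sim, labels):
    return pair_average(sim, labels, same=True)

def inter_similarity(sim, labels):
    return pair_average(sim, labels, same=False)

def modularity(sim, labels):
    """Newman's modularity Q of a partition of a weighted, undirected graph."""
    n = len(labels)
    degree = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    two_m = sum(degree)
    q = 0.0
    for community in set(labels):
        members = [i for i in range(n) if labels[i] == community]
        # Internal weight counted over ordered pairs, i.e. twice per edge.
        inside = sum(sim[i][j] for i in members for j in members if i != j)
        total = sum(degree[i] for i in members)
        q += inside / two_m - (total / two_m) ** 2
    return q

# Toy symmetric similarity matrix: nodes 0-1 and 2-3 form two tight groups.
sim = [
    [0.0, 0.8, 0.1, 0.1],
    [0.8, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.8],
    [0.1, 0.1, 0.8, 0.0],
]
labels = ["C1", "C1", "C2", "C2"]

print(round(intra_similarity(sim, labels), 3))  # 0.8
print(round(inter_similarity(sim, labels), 3))  # 0.1
print(round(modularity(sim, labels), 3))        # 0.3
```

A better partition in the sense of this slide has higher modularity and intra-similarity and lower inter-similarity, which is the pattern reported for DecompClus against Baseline.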

Page 21: On Finding Fine-Granularity User  Communities by Profile Decomposition

Qualitative Evaluation (1/2)

• DecompClus preserves the themes defined by Baseline.
• DecompClus finds new communities that are not found by Baseline.

Baseline
ID    Theme
BC1   Data mining & Recommendation
BC2   Social Network

DecompClus
ID    Theme
DC1   Data mining & Recommendation
DC2   Semantic Web (newly found)
DC3   Data mining & Bioinformatics (newly found)
DC4   Social Network

Page 22: On Finding Fine-Granularity User  Communities by Profile Decomposition

Qualitative Evaluation (2/2)

In DecompClus, a user's minor interests are not assimilated into his/her major interests, so new communities that consist of users' minor interests can be discovered.

[Figures: distribution of articles related to "Semantic web" and distribution of articles related to "Bioinformatics", each shown for Baseline and DecompClus]

Page 23: On Finding Fine-Granularity User  Communities by Profile Decomposition

Visualization

• Laid out with the ForceAtlas2 layout provided by Gephi
• The community structure produced by DecompClus is more clearly distinguishable.

[Figure: network visualizations, Baseline vs. DecompClus]

Page 24: On Finding Fine-Granularity User  Communities by Profile Decomposition

Table of Contents
• Introduction
• DecompClus Algorithm
• Evaluation
• Related Work
• Conclusion

Page 25: On Finding Fine-Granularity User  Communities by Profile Decomposition

Related Work (1/2)

Comparison with related areas

Approach                              # of profiles per user   Mapping in clustering (Node:Community)   Result
Non-overlapping community discovery   One profile              1:1                                      A user belongs to one community
Overlapping community discovery       One profile              1:N                                      A user belongs to multiple communities
DecompClus                            Multiple sub-profiles    1:1                                      A user belongs to multiple communities

Page 26: On Finding Fine-Granularity User  Communities by Profile Decomposition

Related Work (2/2)

Non-overlapping community discovery
• Newman's method [Newman and Girvan, 2004]
• Multi-level graph partitioning method [Karypis and Kumar, 1995]
• Attribute augmented graph [Zhou et al., 2006]
• Bayesian generative models [Wang, 2006]

Overlapping community discovery
• CPM (clique percolation method) [Palla et al., 2005]
• Connectedness and local optimality [Goldberg et al., 2010]
• Label propagation [Gregory, 2009]

Page 27: On Finding Fine-Granularity User  Communities by Profile Decomposition

Conclusion

• A novel concept of profile decomposition, which enables us to detect fine-granularity user communities with implicit relationships
• A new approach to discovering overlapping communities with non-overlapping community discovery algorithms
• We demonstrate, using a real data set, that our algorithm effectively discovers user communities from social media data.

Page 28: On Finding Fine-Granularity User  Communities by Profile Decomposition

THANK YOU !!

Page 29: On Finding Fine-Granularity User  Communities by Profile Decomposition

Case Studies

Case 1
• Users who become members of multiple communities through profile decomposition
• In our data set, there are a total of 99 users (81.1%) like User A.

For example, User A's profile:
• Sub-profile 1: semantics, semantic web, rdf, ontology, social semantic web, …
• Sub-profile 2: user model, recommender, personalization, user profiling, knn, data mining, …
• Sub-profile 3: social network analysis, social search, graphs, …

[Figure: community membership of User A under Baseline and DecompClus. Baseline communities: Bc1 (Data mining & Recommendation), Bc2 (Social network). DecompClus communities: Dc1 (Data mining & Recommendation), Dc2 (Semantic web), Dc3 (Data mining & Bioinformatics), Dc4 (Social network).]

Page 30: On Finding Fine-Granularity User  Communities by Profile Decomposition

Case Studies

Case 2
• Users who become members of the communities newly discovered by DecompClus
• There are a total of 9 users (7.3%) like User B.

For example, User B's profile:
• Sub-profile 1: statistics, cancer, genomics, gene, sequencing, virus, bacteria, database, classification, …

[Figure: community membership of User B under Baseline and DecompClus. Baseline communities: Bc1 (Data mining & Recommendation), Bc2 (Social network). DecompClus communities: Dc1 (Data mining & Recommendation), Dc2 (Semantic web), Dc3 (Data mining & Bioinformatics), Dc4 (Social network).]