27
2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations 1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007 Advisor: Hsin-Hsi Chen Reporter: Y.H Chang 2008-06-06 Rui Li, Shenghua Bao, Yong Yu, Zhong Su, and Ben Fei Shanghai JiaoTong University IBM China Res earch Lab

2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

Embed Size (px)

Citation preview

Page 1: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations1

Towards Effective Browsing of Large Scale Social Annotations

WWW 2007

Advisor: Hsin-Hsi ChenReporter: Y.H Chang

2008-06-06

Rui Li, Shenghua Bao, Yong Yu, Zhong Su, and Ben Fei

Shanghai JiaoTong University IBM China Research Lab

Page 2: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations2

Outline

• Introduction

• ELSABer overview

• Components of ELSABer

• Enhanced models

• Experimental results

• Conclusion

Page 3: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations3

Introduction

• Today, a lot of services (e.g., Del.icio.us, Filckr) have been provided for helping users to manage and share their favorite URLs and photos based on social annotations.

• How to effectively find desired resources from large annotation data is a new problem.

• In this paper, we propose a novel algorithm, namely Effective Large Scale Annotation Browser (ELSABer), to browse large-scale social annotation data.

Page 4: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations4

Introduction

• ELSABer helps the users browse huge number of annotations in a semantic, hierarchical and efficient way.

• By incorporating the personal and time information, ELSABer can be further extended for personalized and time-related browsing.

Page 5: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations5

A set of pages

related to the current annotation“programming”

The prototype system based on ELSABer

Sub-tags (sub category) of “programming”

Page 6: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations6

ELSABer overview• Input An empty concept set SC

• Step 1 Output the initial view of annotations– generates TOP 100 tags from 2000 most frequently URLs and tags. – They are the roots in hierarchical browsing.

• Loop User select a tag Ti

• Step 2 Concept Matching– Add tag Ti to set SC

– Calculate related tag set and URL set• Step 3 (optional) sample URL set and sample Tag set• Step 4 Hierarchical Browsing

– 4-1 Calculate candidate sub-tags– 4-2 Rank the sub-tags by Infor-score

• IF Termination condition Satisfied; Return• ELSE Loop

Page 7: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations7

Components of ELSABer

• Data setup and representation• Semantic Browsing

– a. Annotation Similarity Estimation

– b. Generating the Semantic Concept

• Hierarchical Browsing– c. Sub-Tag Generation

– d. Sub-Tag Clustering

• Efficient Browsing

Page 8: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations8

Data setup and representation

• Del.icio.us (May, 2006)• We define an annotation as a quadruple:

– (User, URL, Tag, Time).• Associated matrix Mmxn

• m and n is the total number of tags and URLs• |URL(ti)| represents the number of URLs annotated by tag ti.• Cij denote the number of users who annotate the jth URL with

the ith tag

Like the TFIDF of IR

Page 9: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations9

Data setup and representation

• Given the associated matrix Mmxn :

T1

T2

.

.

.

Tm

the tag can be represented as a row vector Ti (U1,U2,.. Un) of M

the URL can be represented as a column vector Ui (t1,t2,…,tm) of M.

U1 U2 .. … .. Un

Page 10: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations10

Semantic Browsinga. Annotation Similarity Estimation

• Similarity:

• Special case-1(stemming):

Ex: Programs & Programming

=> add 0.1 weight

• Special case-2(punctuation):

Ex: Web-dev & WebDev

=> add 0.08 weight

Page 11: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations11

Semantic Browsingb. Generating the Semantic Concept

• Given the selected tag ti, we choose a tag set STi that is most related to ti by following rules:– 1. tj should be among the N most similar tags relat

ed to ti– 2. The similarity should be larger than a threshold

θ.– N=4, θ=0.7

• semantic concept Ci = STi {ti}∪

Page 12: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations12

Semantic Browsingb. Generating the Semantic Concept

• The path of user’s clicking: t1, t2,…,tL will bring a sequence of concepts: C1, C2,…,CL.

• Let concept set SC = {C1, C2,…, CL}.

• The related URLs :– ReURL(SC ) = {u | C S∀ ∈ C ,T(u) ∩C ≠ Φ}– T(u) means the set of annotations given to URL u.

• the related tags can be defined as all the tags given to ReURL(SC ):– ReTag(SC ) {t | u ReURL(S∈ C ),t T(u)}∈

Page 13: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations13

Hierarchical Browsingc. Sub-Tag Generation

• If the intersection URL set is the main part of all the URLs of ti, but a small part of tj, we can infer that ti is a sub-tag of tj

40 related tags of “google”

Page 14: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations14

Hierarchical Browsingc. Sub-Tag Generation

Features Features

Coverage of Tags

ICR

Intersection Rate

IR’

IRR Top 1~30 =1 (by IR rank)

Top 30~60 =2

Top 60~ =3

U(ti) denotes the number of URLs tagged with ti

Page 15: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations15

Hierarchical Browsing c. Sub-Tag Generation

• Given the features above, each related tag is represented as a feature vector. A decision tree can be derived from the manually labeled data set to predict the sub-tag relations using C4.5.

Page 16: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations16

Hierarchical Browsingd. Sub-Tag Clustering

Page 17: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations17

Hierarchical Browsingd. Sub-Tag Clustering

• Infor(t) = w1TFIDF(t) + w2ICS(t) + w3TE(t) • Intra-Cluster Similarity:

– ot denotes the centroid of all the URLs associated with the tag

• Tag Entropy:

• In our experiment, these weights are 0.58, 0.27, and 0.13, respectively.

Page 18: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations18

Efficient Browsing

• Observation : People use popular tags to annotate URLs and also the popular URLs are annotated by the majority of tags.

Page 19: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations19

Efficient Browsing

• So we can get good results efficiently by running our algorithm in a small sub tagging space .

• In our experiment, we sampling 2000 most frequently annotated URLs and 2000 most frequently tag , so the size of M is 2000 × 2000

• <we do not cut off the “long tail”>

• After a sequence of click by the user, the intention of the user will be more specific, this causes a decreasing number of related URLs or related tags.

• When the number is less than 2000, all the tags and URLs will be calculated

Page 20: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations20

Enhanced Models

• User’s profile:• The user interested annotations and resources can be f

ound as follows:

• Ri denotes the vector representation of a resource, and Ti denotes the vector representation of Ai.

• Adjust the sampling and ranking algorithms according to the user’s preference:– Infor (t,U) = α × Infor (t) + β ×UI (t | P(U))

Page 21: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations21

Enhanced Models

Given the user required time interval TI= [ts, te]. We define the match of the URL’s time sequence TS and the user required time interval TI as follows:

θ=0.5

Page 22: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations22

Experiment results

• The scale of the dataset:

• Machine: Intel Pentium IV 3.0 GHz, 1GB memory, 2 processors

• Java • Lucene API is also used to build URL and Tag index.

Del.icio.us (May, 2006)

1,736,268 web pages

269,566 different annotations

Page 23: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations23

Experiment results

Red tag: owned by user

Orange tag: recommended

Page 24: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations24

Experiment results

Page 25: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations25

Conclusion

• Our main contributions:• The proposal of the effective algorithm – ELSABer b

ased on the analysis of social annotation’s characteristics.

• The proposal of enhanced models for personalized and time related browsing.

Page 26: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations26

Future work

• more user studies• emphasize on how to find more qualified URL resour

ces• utilize existing hierarchical structures such as ODP an

d WordNet for helping construct more meaningful hierarchical structures for social annotations.

Page 27: 2008/06/06 Y.H.Chang Towards Effective Browsing of Large Scale Social Annotations1 Towards Effective Browsing of Large Scale Social Annotations WWW 2007

2008/06/06 Y.H.ChangTowards Effective Browsing of Large

Scale Social Annotations27

• Thank you!!