Thesis Proposal: Prediction of popular social annotations Abon

Thesis Proposal: Prediction of popular social annotations

Abon

Outline

Background
Related Work
Problem Definition
Possible Solution
Experiment Plan
Evaluation Plan

Background

Prevalence of social web services

What do they have in common? Tags and user-generated content.

Background: What are tags for?

According to the del.icio.us founder:

"Tags are one-word descriptors that you can assign to your bookmarks on del.icio.us to help you organize and remember them. Tags are a little bit like keywords, but they're chosen by you, and they do not form a hierarchy. You can assign as many tags to a bookmark as you like and rename or delete the tags later. So, tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders."

Background: A usage example

Why TAGs are useful

In information retrieval, expanding the query is a common technique for retrieving more related data.

Tags act like human-expanded index terms.

Query expansion here

Why TAGs are useful

Traditional term expansion schemes rely on term-document relations, and the importance of each term to a document is often determined by tf-idf.

Each tag a user applies acts like a vote for which tags belong with a document. The term-document relation can therefore be measured by counting tag applications.
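The voting idea can be sketched as follows. This is a minimal illustration, assuming a tag's weight for a document is simply its share of all tag applications on that document; the function name `tag_weights` and the sample data are hypothetical.

```python
from collections import Counter

def tag_weights(tag_applications):
    """Turn a list of applied tags (one entry per vote) into weights.

    Assumption: weight = share of all tag applications on the document.
    """
    counts = Counter(tag_applications)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Hypothetical votes from several users on one bookmarked URL.
applications = ["python", "programming", "python", "tutorial", "python"]
weights = tag_weights(applications)
```

Here `weights` plays the role that a tf-idf score plays in a traditional term-document index.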

Why TAGs are useful

Tags are a human-expanded query set, which enables more complete concept mapping.

As more and more people apply tags, tag popularity reaches a stable pattern, and the top tags could be used as weighting parameters for search optimization.

Related Work

Usage patterns of collaborative tagging systems. J. Inf. Sci., Vol. 32, No. 2 (April 2006), pp. 198-208, by Golder SA, Huberman BA.

After 100+ users, a stable pattern appears

Urn model

Stable pattern: the top 7 tags remain for one year or more

Related Work

Collaborative Tagging and Semiotic Dynamics

Cattuto C, Loreto V, Pietronero L.

Long-term memory version of the classic Yule–Simon process

Memory model based on a cognitive model

Yule–Simon process

$Q_t(x) = a(t) / (x + \tau)$, where $a(t)$ is a normalizing factor and $\tau$ is the memory parameter.
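The memory kernel can be evaluated numerically as below. This is a sketch under two assumptions not stated on the slide: that $x$ counts steps into the past and ranges over $1, \dots, t$, and that $a(t)$ is chosen so the weights sum to 1. The function name `memory_kernel` is hypothetical.

```python
def memory_kernel(t, tau):
    """Return [Q_t(1), ..., Q_t(t)] with Q_t(x) = a(t) / (x + tau).

    Assumptions: x = 1..t is the number of steps back in time, and
    a(t) normalizes the weights so that they sum to 1.
    """
    raw = [1.0 / (x + tau) for x in range(1, t + 1)]
    a_t = 1.0 / sum(raw)  # normalizing factor a(t)
    return [a_t * r for r in raw]

kernel = memory_kernel(t=5, tau=2.0)
```

Recent steps (small `x`) get the largest weights; a larger `tau` flattens the kernel, so older tags are recalled almost as readily as recent ones.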

Related Work

The Complex Dynamics of Collaborative Tagging

H. Halpin, V. Robu, H. Shepherd, in Proceedings of WWW 2007

Empirical Results for Power Law Regression for Popular Sites

P(x): the tag probability distribution at each time point

Q(x): the final tag probability distribution
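The slide does not say how P(x) and Q(x) are compared. One natural choice, assumed here purely for illustration, is the Kullback-Leibler divergence of P from Q, which shrinks toward zero as the distribution at a given time point approaches the final one. The function name and the sample distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D(P || Q) for distributions given as {tag: probability} dicts.

    Assumption: every tag with p[tag] > 0 also has q[tag] > 0.
    """
    return sum(
        prob * math.log(prob / q[tag])
        for tag, prob in p.items()
        if prob > 0
    )

# Hypothetical distributions for one URL.
p_early = {"web": 0.5, "design": 0.5}
q_final = {"web": 0.5, "design": 0.25, "css": 0.25}
divergence = kl_divergence(p_early, q_final)
```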

Problem Definition

In the initial stage, a URL has not yet been sufficiently annotated by people, so it is hard to retrieve at this time.

For an immature URL, predicting its future popular tags could provide a better retrieval experience.

Mature URL: borrowed from the empirical results of [Halpin] on tag dynamics. Mature URLs are defined as URLs with three or more years of history on del.icio.us.

Expanding tag set

$T_i$: the tag set applied by the $i$-th user for a URL.

$ET_i$: the expanded tag set after the $i$-th user.

$T_0$: the tag set suggested by tf-idf term extraction; $ST_0 = T_0$.

$ET_i = ET_{i-1} \cup \mathrm{relevant}_n(T_i)$

$\mathrm{relevant}_n(T_i)$: the $n$ tags with the highest mutual information to each tag in $T_i$.

Mutual information: $f(t_i, t_j) / (f(t_i) \, f(t_j))$
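The expansion step can be sketched as follows. The slide does not define $f$; this sketch assumes $f(t)$ is the number of posts containing tag $t$ and $f(t_i, t_j)$ the number of posts containing both. All function names and the sample posts are hypothetical. The update follows the formula literally, so $T_i$ itself is not added to the expanded set.

```python
from collections import Counter
from itertools import combinations

def count_frequencies(posts):
    """Count single-tag and tag-pair frequencies over posts (sets of tags)."""
    single, pair = Counter(), Counter()
    for tags in posts:
        unique = sorted(set(tags))
        single.update(unique)
        pair.update(combinations(unique, 2))  # keys are sorted pairs
    return single, pair

def mutual_information(ti, tj, single, pair):
    """f(ti, tj) / (f(ti) * f(tj)), as written on the slide."""
    if single[ti] == 0 or single[tj] == 0:
        return 0.0
    key = (ti, tj) if ti < tj else (tj, ti)
    return pair[key] / (single[ti] * single[tj])

def relevant_n(user_tags, n, single, pair):
    """For each tag in T_i, collect its n highest-scoring co-occurring tags."""
    related = set()
    for t in user_tags:
        scored = [(mutual_information(t, c, single, pair), c)
                  for c in single if c != t]
        scored = [(s, c) for s, c in scored if s > 0]
        scored.sort(key=lambda sc: (-sc[0], sc[1]))  # ties broken by name
        related.update(c for _, c in scored[:n])
    return related

def expand(previous_et, user_tags, n, single, pair):
    """ET_i = ET_{i-1} united with relevant_n(T_i)."""
    return set(previous_et) | relevant_n(user_tags, n, single, pair)

posts = [
    {"python", "programming"},
    {"python", "django"},
    {"python", "django", "web"},
    {"programming", "java"},
    {"web", "design"},
    {"web", "css"},
]
single, pair = count_frequencies(posts)
expanded = expand({"code"}, {"python"}, 2, single, pair)
```

Because the score divides by both frequencies, a rare tag that always co-occurs with the user's tag outranks a common tag that co-occurs only sometimes.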

Cohesivity

Each tag in $ET_i$ has a score that indicates its cohesivity to $ET_i$.

Cohesivity of $t_j$ to $ET_i$: $\sum_{t_k \in ET_i} f(t_k, t_j) / (f(t_j) \, f(t_k))$
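A minimal sketch of the cohesivity score, assuming $f$ denotes post counts (single tags) and co-occurrence counts (tag pairs), and assuming the term $t_k = t_j$ is skipped in the sum. The counts below are made-up illustration data and the function name is hypothetical.

```python
from collections import Counter

def cohesivity(tj, expanded_set, single, pair):
    """Sum of f(tk, tj) / (f(tj) * f(tk)) over tk in the expanded set.

    Assumption: the self term tk == tj is skipped.
    """
    score = 0.0
    for tk in expanded_set:
        if tk == tj:
            continue
        key = tuple(sorted((tk, tj)))  # pair counts use sorted keys
        score += pair[key] / (single[tj] * single[tk])
    return score

# Hypothetical frequency counts.
single = Counter({"python": 3, "django": 2, "web": 3, "design": 1})
pair = Counter({
    ("django", "python"): 2,
    ("python", "web"): 1,
    ("django", "web"): 1,
    ("design", "web"): 1,
})
expanded_set = {"python", "django", "web", "design"}
scores = {t: cohesivity(t, expanded_set, single, pair) for t in expanded_set}
```

A tag that co-occurs with many members of the expanded set scores high; a tag linked to only one member scores low and becomes a candidate for pruning.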

Pruning $ET_i$

1. Sort the tags in $ET_i$ by popularity and take the top 7 as the suggested tag set $ST_i$.

2. Sort the tags in $ET_i$ by popularity × cohesivity and take the top 7 as the suggested tag set $ST_i$.
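Both pruning options can be sketched with one function. The slide does not define popularity; here it is assumed to be a per-tag count such as the number of users who applied the tag. The function name, the parameter `k` and the sample values are hypothetical.

```python
def prune(expanded_set, popularity, cohesivity=None, k=7):
    """Return the top-k tags of the expanded set as a ranked list.

    With cohesivity=None, rank by popularity (option 1);
    otherwise rank by popularity * cohesivity (option 2).
    """
    def score(tag):
        base = popularity.get(tag, 0)
        if cohesivity is None:
            return base
        return base * cohesivity.get(tag, 0.0)

    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(expanded_set, key=lambda t: (-score(t), t))
    return ranked[:k]

# Hypothetical values for one URL.
expanded_set = {"python", "web", "django", "design"}
popularity = {"python": 10, "web": 8, "django": 4, "design": 1}
cohesion = {"python": 0.44, "web": 0.61, "django": 0.5, "design": 0.33}

by_popularity = prune(expanded_set, popularity, k=2)
by_both = prune(expanded_set, popularity, cohesion, k=2)
```

In this toy data the two options keep the same tags but rank them differently, since `web` has lower popularity but higher cohesivity than `python`.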

Experiment Plan

Dataset from the del.icio.us RSS API, March 28 to April 19: 30,000 URLs, 234,982 taggings, 8,392 users

1. Poll del.icio.us/rss/popular every 30 min and del.icio.us/rss/recent every 2 min.

2. Fetch del.icio.us/rss/url?url=xxx.com for each URL.

Suggest tags from no user up to the 10th user.

Evaluation Plan

For each URL, we have the mature tags and the suggested tags at each iteration.

Recall and precision can then be calculated.
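The two measures can be computed as set overlaps. This is a minimal sketch, assuming the mature tags of a URL serve as the ground truth and the suggested tag set is the prediction; the function name and the sample tags are hypothetical.

```python
def precision_recall(suggested, mature):
    """Precision and recall of suggested tags against mature tags."""
    suggested, mature = set(suggested), set(mature)
    hits = len(suggested & mature)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(mature) if mature else 0.0
    return precision, recall

# Hypothetical tags for one URL at one iteration.
suggested = {"python", "django", "web", "css"}
mature = {"python", "django", "tutorial"}
precision, recall = precision_recall(suggested, mature)
```

Running this after each user (from no user up to the 10th) would give one precision and one recall value per iteration for each URL.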

Experimental conditions (factors: expanding with relevant tags; pruning with cohesivity)

            with    without
with        4.      2.
without     3.      1. Baseline