Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation

TU Graz – Knowledge Management Institute

Hypertext 2010, June 15th, 20101

Of Categorizers and Describers: An Evaluation of Quantitative

Measures for Tagging MotivationChristian Körner, Roman Kern, Hans-Peter Grahsl, Markus Strohmaier

Knowledge Management Institute and Know-CenterGraz University of Technology, Austria



Introduction

Lots of research on folksonomies, their structure and the resulting dynamics

What we do not know are the reasons and motivations users have when they tag.

Question: Why do users tag?

2



Motivation

Knowledge about intuitions why users are tagging would help to answer a number of current research questions:

What are possible improvements for tag recommendation?

What are suitable search terms for items in these systems?

How can we enhance ontology learning?

…

There already exist models for tagging motivation such as [Nov2009] and [Heckner2009].

BUT: These models rely on expert judgements

Automatic measures for inference of tagging motivation are important!

3



Presentation Overview

• Research questions

• Two types of tagging motivation

• Approximating tagging motivation

• Experiments and results– Quantitative Evaluation– Qualitative Evaluation

4



Questions

Can tagging motivation be approximated with statistical measures?

What are measures which enable the inference if a given user has a certain motivation?

Which of these measures perform best to differentiate between different types of tagging motivation?

Does the distinction of the proposed tagging motivation types have an influence on the tagging process?

5



Types of Tagging Motivations

6

Categorizer Describer

Goal later browsing later retrieval

Change of vocabulary costly cheap

Size of vocabulary limited open

Tags subjective objective

Tag reuse frequent rare

Tag purpose mimicking taxonomy descriptive labels

In the “real world” users are driven by a combination of both motivations– e.g. using tags as descriptive labels while maintaining a

few categories[Körner2009]



Terminology

Folksonomies are usually represented by tripartite graphs with hyper edges

Three different disjoint sets:– a set of users u ∈ U– a set of tags t ∈ T– a set of resources r ∈ R

A folksonomy is defined as a set of annotations F ⊆ U x T x R

Personomy is the reduction of a folksonomy F to a user u

A tag assignment (tas) is one specific triple of one user u, tag t and resource r.

7



Based on different intuitions various measures for the differentiation were developed:

• Tag/Resource Ratio (trr)– How many tags does a user use?

• Orphaned Tag Ratio– How many tags of a users vocabulary are attached to only few resources?

• Conditional Tag Entropy– How well does a user “encode” resources with his tags?

Approximating Tagging Motivation / 1

8

Authors Categories of Tagging Motivation Detection Evidence Reasoning Systems investi-gated

# of users Resourcesper user

Coates2005 [3]

Categorization, description Expertjudgment

Anecdotal Inductive Weblog 1 N/A

Hammondet al. 2005[5]

Self/self, self/others, others/self,others/others

Expertjudgment

Observation Inductive 9 different tag-ging systems

N/A N/A

Golder etal. 2006[4]

What it is about, what it is, whoowns it, refining categories, identify-ing qualities, self reference, task or-ganizing

Expertjudgment

Dataset Inductive Delicious 229 300 (aver-age)

Marlow etal. 2006[14]

Organizational, social, [and refine-ments]

Expertjudgment

N/A Deductive Flickr 10(25,000)

100 (mini-mum)

Xu et al.2006 [21]

Content-based, context-based,attribute-based, subjective, organi-zational

Expertjudgment

N/A Deductive N/A N/A N/A

Sen et al.2006 [18]

Self-expression, organizing, learning,finding, decision support

Expertjudgment

Priorexperience

Deductive MovieLens 635(3,366)

N/A

Wash et al.2007 [20]

Later retrieval, sharing, social recog-nition, [and others]

Expertjudgment

Interviews(semistruct.)

Inductive Delicious 12 950 (aver-age)

Ames et al.2007 [1]

Self/organization,self/communication,social/organization,social/communication

Expertjudgment

Interviews(in-depth)

Inductive Flickr, ZoneTag 13 N/A

Heckner etal. 2009[6]

Personal information management,resource sharing

Expertjudgment

Survey(M. Turk)

Deductive Flickr, Youtube,Delicious, Con-notea

142 20 and5 (mini-mum)

Nov et al.2009 [16]

enjoyment, commitment, self devel-opment, reputation

Expertjudgment

Survey (e-mail)

Deductive Flickr (PROusers only)

422 2,848.5(average)

Strohmaieret al. 2009[19]

Categorization, description Automatic Simulation Deductive 7 differentdatasets

2277 1,267.53(average)

Table 1: Overview of Research on Users’ Motivation for Tagging in Social Tagging Systems

users in the real world would likely be driven by a combina-

tion of both motivations, for example following a description

approach to annotating most resources, while at the same

time maintaining a few categories. Table 2 gives an overview

of different intuitions about the two types of tagging moti-

vation.

Categorizer Describer

Goal later browsing later retrievalChange of vocabulary costly cheapSize of vocabulary limited openTags subjective objectiveTag reuse frequent rareTag purpose mimicking taxonomy descriptive labels

Table 2: Intuitions about Categorizers and De-

scribers

4. MEASURES FOR TAGGING

MOTIVATION

In the following measures which capture properties of the

two types of tagging motivation (Table 2) are introduced.

4.1 Terminology

Folksonomies are usually represented by tripartite graphs

with hyper edges. Such graphs hold three finite, disjoint sets

which are 1) a set of users u ∈ U , 2) a set of resources r ∈ Rand 3) a set of tags t ∈ T annotating resources R. A folkson-

omy as a whole is defined as the annotations F ⊆ U × T ×R(cf. [15]). Subsequently a personomy of a user u ∈ U is the

reduction of a folksonomy F to the user u ([8]). In the fol-

lowing a tag assignment (tas = (u,t,r); tas ∈ TAS) is a

specific triple of one user u ∈ U , one tag t ∈ T and one

resource r ∈ R.

4.2 Tag/Resource Ratio (trr)

Tag/resource ratio relates the vocabulary size of a user

to the total number of resources annotated by this user.

Describers, who use a variety of different tags for their re-

sources, can be expected to score higher values for this mea-

sure than categorizers, who use fewer tags. Due to the lim-

ited vocabulary, a categorizer would likely achieve a lower

score on this measure than a describer who employs a the-

oretically unlimited vocabulary. Equation 1 shows the for-

mula used for this calculation where Ru represents the re-

sources which were annotated by a user u. What this mea-

sure does not reflect on is the average number of assigned

tags per post.

trr(u) =|Tu||Ru| (1)

4.3 Orphaned Tag Ratio

To capture tag reuse, the orphan tag ratio of users char-

acterizes the degree to which users produce orphaned tags.Orphaned tags are tags that are assigned to few resources

only, and therefore are used infrequently. The orphaned tagratio captures the percentage of items in a user’s vocabulary

that represent such orphaned tags. In equation 2 T ou denotes

the set of orphaned tags in a user’s tag vocabulary Tu based

on a threshold n. The threshold n is derived from each user’s

individual tagging style in which tmax denotes the tag that

was used the most. |Ru(t)| denotes the number of resources

which are tagged with tag t by user u. The measure ranges

from 0 to 1 where a value of 1 identifies users who use or-

phaned tags frequently and 0 identifies users who maintain

a more consistent vocabulary. Considering the categorizer -

describer paradigm this would mean that categorizers would

!" #$%&& #'()*+,-./0 #12 #3(4567 #8"9():0,0/; #8;(/-< #8*:()/8:09(#

8=8)-.)" #8/0=8:0./ #8/:>).?.*.;< #8)->0:(-:+)( #8): #8,08#

8,:)./.=<#4()*+,-./0 #4*.; #4)+,>(, #-*0=8:( #-=, #-.=0-,#

-.=?8:040*0:< #-,, #-+*:+)( #"(,0;/ #".-, #"..=,"8<#

(-./.=0-, #(/();< #(/90)./=(/: #(@?()0=(/:8*#

A*8,>#A*8,>"(9 #A)((#A+/#;(/08*0:< #;)8?>0-,#>8-B,#>(8*:> #>0,:.)< #>+=.) #0-./, #0"(/:0:< #0**+,:)8:0./#

0/"08#0/,?0)8:0./#0/:()8-:0./#0/+:0*0:0(,#0)8/#0)8C#0:8*<#D898,-)0?: #D.4 # *.;., #=8- #=8A08 #=80/,:)(8= #=("08#=0,:()0"0:8*08 #=.90(, #=+,0-#/890;8:0./#/()" #/(E,#?8::()/#

?>.:.;)8?><#?>? #?0,.,#?0@(* #?.*0:0-,#?.):A.*0.#?)0/:#?)098-<#)(-0?(,#)(*0;0./#)0;>:,#,8:(**0:(#,-0(/-(#,(.#

,>.-BE89(#,>.?#,.-0(:<#,:.-B#,:)((:8):#:-?8#:(=?*8:(#:>-#:.))(/:#

:)89(* #:+:.)08* #:9 #:<?( #:<?.;)8?>< #+:0*0:0(, #90"(.#E8)#E(4567#E(4"(,0;/#E(4"(9#E.=(/#E.)*"#E:A#F(0:;(0,:

Figure 1: Tag cloud example of a categorizer. Fre-quency among tags is balanced, a potential indicatorfor using the tag set as an aid for navigation.

be expected to be represented by values closer to 0 because

orphaned tags would introduce noise to their personal tax-

onomy. For a describer’s tag vocabulary, it would be repre-

sented by values closer to 1 due to the fact that describers

tag resources in a verbose and descriptive way, and do not

mind the introduction of orphaned tags to their vocabulary.

orphan(u) =|T o

u ||Tu|

, Tou = {t||R(t)| ≤ n}, n =

‰|R(tmax)|

100

ı

(2)

4.4 Conditional Tag Entropy (cte)

For categorizers, useful tags should be maximally discrim-

inative with regard to the resources they are assigned to.

This would allow categorizers to effectively use tags for nav-

igation and browsing. This observation can be exploited

to develop a measure for tagging motivation when viewing

tagging as an encoding process, where entropy can be con-

sidered a measure of the suitability of tags for this task. A

categorizer would have a strong incentive to maintain high

tag entropy (or information value) in her tag cloud. In other

words, a categorizer would want the tag-frequency as equally

distributed as possible in order for her to be useful as a

navigational aid. Otherwise, tags would be of little use in

browsing. A describer on the other hand would have little

interest in maintaining high tag entropy as tags are not used

for navigation at all.

In order to measure the suitability of tags to navigate

resources, we develop an entropy-based measure for tagging

motivation, using the set of tags and the set of resources as

random variables to calculate conditional entropy. If a user

employs tags to encode resources, the conditional entropy

should reflect the effectiveness of this encoding process:

H(R|T ) = −X

r∈R

X

t∈T

p(r, t)log2(p(r|t)) (3)

The joint probability p(r, t) depends on the distribution

!"#$%&!'(%#)&*))+, &-(%$+.(+ &/01 & 2).#3, & $44# &$44#,,(5(3(.6 &$%7(8 & $99"#9$.()8 &$9(3# &$( &$(" &$:$; &$7$<)8&

$8$36<# &$8. &$=$4>#&$=( &$==3# &$==, &$". &$?%() &$?,."($ &$?.) & $@, &5$4+?= &5$"4$7=&5$"4)%# &5$6#,&

5#>$'()?" &5#"3(8 &5(3%?89,?893#(4>>#(.#8 &53)9, &5))+ &5))+7$"+3#., &5))+, &5"$8%&

5")@,#"&5?,(8#,,&4$4># &4$+#=>=&4$+#=>==&4$3#8%$" &4$8'$,&4$=(,."$8) &4>$".,&43$,,#,&47,&

4)4)$ &4)33$5)"$.()8 & 4)8A#"#84# &4)8.(8?)?,(8.#9"$.()8 & 4)8."$4., & 4))+(89 & 4)=6"(9>. &4"?(,#4)8.")3 &4"6=. &4,, &4?",)"&

%$.$,)?"4#&%#5?9 &%#3B(4()B?,&%#=3)67#8.&%#,(98&%#'&%#'>)?,# &%)7$(8&%),&%)@83)$%&

%"&%,3&#5))+&#4C&#43(=,#&#4)8)76&#%(.)"&#3#$"8(89&#3#4.()8&#7$(3&#;=#"(7#8.&A$4#5))+&A(8$84($3&A(8$8<#8&A("#5?9&

A("#A);&A("7@$"#&A3$,>&A3#;&A36&A)8%,&A)"7,&A"$7#@)"+&A"##3$84#"&A"(.<&A?8&9$33#"6&9$7#&9$.#@$6&9#$",&9#..#;.&

9(.&9))93# &9))93#7$=,&9"#$,#7)8+#6 &9.%&9?( &>$8%6&>(9>3(9>.(89&>()9( &>),.(89&>.$44#,, &>.73&>?3?&(DE8&(F)%&($&(4)8,&(%#&(#&(8A)&(8A)"7$.()8&(8>$3.,,.)AA#&(8.#"(#?"&(8.#"'(#@&(8')(4#&(=>)8#&(=)%&(,4>93&(,)&:G?#"6&:#,?,&:)5,&

:H?#"6&:,&:,3(8.&:,= &:?"(,.(,4>#,&+#65)$"%&3$,.A7&3$.#;&3#$"8&3#5#8,7(..#3 &3#9$3&3(5"$"6&3(A#>$4+,&3)9) &3)+$3&

7$4&7$9$<(8#&7$(3 &7$=&7$=,&7$"+#.(89 &7$.>#7$.(4, &7#%($ &7#,,$9#&7(9"$.()8 &7)5(3# &7)8#6&

7)'(# &7,H3 &7?,(4 &7'4 &76,H3&7I84>#8&8% &8#@, &)=#8(% &), &=C=&=$..#"8 &=$..#"8,&=$6=$3 &=#"A)"7$84#&

=>)8# &=>).) &=>=&=3?9(8 &=7 &=89 &=)%4$,. &=)3(.(4, &=),. &="(8. &="('$.# &=")4#,, &=")9"$77(89&

=").).6=(89&=");6&=,64>)3)9(#&H7&H,&"$.(89,&"#$%&"#$%3(,.&"#4(=#,&"#A#"#84#&"#7#75#"&"#8.#&"#=?53(4$&

"#,#$"4>&"#,,)?"4#,&"#,.$?"$8.&"#.>)"(4,&"#<#=.#&",,&"?56&"?>"9#5(#.&"I4+#8&,J&,$A$"(&,$8%"$&,4$3$5(3(.6&,4>$4>&,4>))3 &,4>@$"7'#">$3.#8 &,4"##8,>). &,4"?7 &,#$"4> &,#)&,#"(#, &,>)= &,>)==(89&,+( &,+6=# &,3(%#,&,)$&

,)4($3&,)A.@$"#&,)?8%&,=$7&,H3&,.$".?=&,.$.,&,.#"5#8&,.?%6&,?5.(.3#,&,?5'#",()8&,'8&.$;&.#,.&

.#,.(89 &.#;.7$.# &.#;., & .>(8+=$% & .(7# & .))3 &.))3, &."$4 & ."$'#3 &.?.)"($3&.' &.6=)9"$=>6 &?5?8.? & ?8)5."?,('#&?8.#""(4>. &?">#5#""#4>. &?"3$?5 &?,$5(3(.6 &?.AE &'$4$.()8 & '($K7#8.)B(8A) &'(%#) &'(,?$3(<$.()8 &')(= &@$8%#"8&

@#$.>#"&@#5&@#5CBL&@#54$7&@#5%#,(98&@#,.@(89&@#..#"&@(%9#.,&@(+(&@(8%)@,&@)>8?89&@)"%&

@)"%="#,,&@)"+&;%#5?9&;>.73&;=&<#8%&<(.$.#&

Figure 2: Tag cloud example of a describer. Sometags are used often while many others are rarelyused - a distribution that can be expected whenusers tag in a descriptive, ad-hoc manner.

of tags over the resources. The conditional entropy can be

interpreted as the uncertainty of the resource that remains

given a tag. The conditional entropy is measured in bits and

is influenced by the number of resources and the tag vocab-

ulary size. To account for individual differences in users, we

propose a normalization of the conditional entropy so that

only the encoding quality remains. As a factor of normaliza-

tion we can calculate the conditional entropy Hopt(R|T ) of

an ideal categorizer, and relate it to the actual conditional

entropy of the user at hand. Calculating Hopt(R|T ) can be

accomplished by modifying p(r, t) in a way that reflects a sit-

uation where all tags are equally discriminative while at the

same time keeping the average number of tags per resource

the same as in the user’s personomy.

Based on this, we can define a measure for tagging mo-

tivation by calculating the difference between the observed

conditional entropy and the conditional entropy of an ideal

categorizer put in relation to the conditional entropy of the

ideal categorizer:

cte =H(R|T )−Hopt(R|T )

Hopt(R|T )(4)

4.5 Overlap Factor

When users assign more than one tag per resource on aver-

age, it is possible that they produce an overlap (i.e. intersec-

tion with regard to the resource sets of corresponding tags).

The overlap factor allows to measure this phenomenon by

relating the number of all resources to the total number of

tag assignments of a user and is defined as follows:

overlap = 1− |Ru||TASu|

(5)

We can speculate that categorizers would be interested in

keeping this overlap relatively low in order to be able to

produce discriminative categories, i.e. categories that are

!" #$%&& #'()*+,-./0 #12 #3(4567 #8"9():0,0/; #8;(/-< #8*:()/8:09(#

8=8)-.)" #8/0=8:0./ #8/:>).?.*.;< #8)->0:(-:+)( #8): #8,08#

8,:)./.=<#4()*+,-./0 #4*.; #4)+,>(, #-*0=8:( #-=, #-.=0-,#

-.=?8:040*0:< #-,, #-+*:+)( #"(,0;/ #".-, #"..=,"8<#

(-./.=0-, #(/();< #(/90)./=(/: #(@?()0=(/:8*#

A*8,>#A*8,>"(9 #A)((#A+/#;(/08*0:< #;)8?>0-,#>8-B,#>(8*:> #>0,:.)< #>+=.) #0-./, #0"(/:0:< #0**+,:)8:0./#

0/"08#0/,?0)8:0./#0/:()8-:0./#0/+:0*0:0(,#0)8/#0)8C#0:8*<#D898,-)0?: #D.4 # *.;., #=8- #=8A08 #=80/,:)(8= #=("08#=0,:()0"0:8*08 #=.90(, #=+,0-#/890;8:0./#/()" #/(E,#?8::()/#

?>.:.;)8?><#?>? #?0,.,#?0@(* #?.*0:0-,#?.):A.*0.#?)0/:#?)098-<#)(-0?(,#)(*0;0./#)0;>:,#,8:(**0:(#,-0(/-(#,(.#

,>.-BE89(#,>.?#,.-0(:<#,:.-B#,:)((:8):#:-?8#:(=?*8:(#:>-#:.))(/:#

:)89(* #:+:.)08* #:9 #:<?( #:<?.;)8?>< #+:0*0:0(, #90"(.#E8)#E(4567#E(4"(,0;/#E(4"(9#E.=(/#E.)*"#E:A#F(0:;(0,:








orphan(u) =|T o

u ||Tu|

, Tou = {t||R(t)| ≤ n}, n =

‰|R(tmax)|

100

ı

(2)























H(R|T ) = −X

r∈R

X

t∈T

p(r, t)log2(p(r|t)) (3)


!"#$%&!'(%#)&*))+, &-(%$+.(+ &/01 & 2).#3, & $44# &$44#,,(5(3(.6 &$%7(8 & $99"#9$.()8 &$9(3# &$( &$(" &$:$; &$7$<)8&

$8$36<# &$8. &$=$4>#&$=( &$==3# &$==, &$". &$?%() &$?,."($ &$?.) & $@, &5$4+?= &5$"4$7=&5$"4)%# &5$6#,&

5#>$'()?" &5#"3(8 &5(3%?89,?893#(4>>#(.#8 &53)9, &5))+ &5))+7$"+3#., &5))+, &5"$8%&

5")@,#"&5?,(8#,,&4$4># &4$+#=>=&4$+#=>==&4$3#8%$" &4$8'$,&4$=(,."$8) &4>$".,&43$,,#,&47,&

4)4)$ &4)33$5)"$.()8 & 4)8A#"#84# &4)8.(8?)?,(8.#9"$.()8 & 4)8."$4., & 4))+(89 & 4)=6"(9>. &4"?(,#4)8.")3 &4"6=. &4,, &4?",)"&

%$.$,)?"4#&%#5?9 &%#3B(4()B?,&%#=3)67#8.&%#,(98&%#'&%#'>)?,# &%)7$(8&%),&%)@83)$%&

%"&%,3&#5))+&#4C&#43(=,#&#4)8)76&#%(.)"&#3#$"8(89&#3#4.()8&#7$(3&#;=#"(7#8.&A$4#5))+&A(8$84($3&A(8$8<#8&A("#5?9&

A("#A);&A("7@$"#&A3$,>&A3#;&A36&A)8%,&A)"7,&A"$7#@)"+&A"##3$84#"&A"(.<&A?8&9$33#"6&9$7#&9$.#@$6&9#$",&9#..#;.&

9(.&9))93# &9))93#7$=,&9"#$,#7)8+#6 &9.%&9?( &>$8%6&>(9>3(9>.(89&>()9( &>),.(89&>.$44#,, &>.73&>?3?&(DE8&(F)%&($&(4)8,&(%#&(#&(8A)&(8A)"7$.()8&(8>$3.,,.)AA#&(8.#"(#?"&(8.#"'(#@&(8')(4#&(=>)8#&(=)%&(,4>93&(,)&:G?#"6&:#,?,&:)5,&

:H?#"6&:,&:,3(8.&:,= &:?"(,.(,4>#,&+#65)$"%&3$,.A7&3$.#;&3#$"8&3#5#8,7(..#3 &3#9$3&3(5"$"6&3(A#>$4+,&3)9) &3)+$3&

7$4&7$9$<(8#&7$(3 &7$=&7$=,&7$"+#.(89 &7$.>#7$.(4, &7#%($ &7#,,$9#&7(9"$.()8 &7)5(3# &7)8#6&

7)'(# &7,H3 &7?,(4 &7'4 &76,H3&7I84>#8&8% &8#@, &)=#8(% &), &=C=&=$..#"8 &=$..#"8,&=$6=$3 &=#"A)"7$84#&

=>)8# &=>).) &=>=&=3?9(8 &=7 &=89 &=)%4$,. &=)3(.(4, &=),. &="(8. &="('$.# &=")4#,, &=")9"$77(89&

=").).6=(89&=");6&=,64>)3)9(#&H7&H,&"$.(89,&"#$%&"#$%3(,.&"#4(=#,&"#A#"#84#&"#7#75#"&"#8.#&"#=?53(4$&

"#,#$"4>&"#,,)?"4#,&"#,.$?"$8.&"#.>)"(4,&"#<#=.#&",,&"?56&"?>"9#5(#.&"I4+#8&,J&,$A$"(&,$8%"$&,4$3$5(3(.6&,4>$4>&,4>))3 &,4>@$"7'#">$3.#8 &,4"##8,>). &,4"?7 &,#$"4> &,#)&,#"(#, &,>)= &,>)==(89&,+( &,+6=# &,3(%#,&,)$&

,)4($3&,)A.@$"#&,)?8%&,=$7&,H3&,.$".?=&,.$.,&,.#"5#8&,.?%6&,?5.(.3#,&,?5'#",()8&,'8&.$;&.#,.&

.#,.(89 &.#;.7$.# &.#;., & .>(8+=$% & .(7# & .))3 &.))3, &."$4 & ."$'#3 &.?.)"($3&.' &.6=)9"$=>6 &?5?8.? & ?8)5."?,('#&?8.#""(4>. &?">#5#""#4>. &?"3$?5 &?,$5(3(.6 &?.AE &'$4$.()8 & '($K7#8.)B(8A) &'(%#) &'(,?$3(<$.()8 &')(= &@$8%#"8&

@#$.>#"&@#5&@#5CBL&@#54$7&@#5%#,(98&@#,.@(89&@#..#"&@(%9#.,&@(+(&@(8%)@,&@)>8?89&@)"%&

@)"%="#,,&@)"+&;%#5?9&;>.73&;=&<#8%&<(.$.#&




















ideal categorizer:


Hopt(R|T )(4)

4.5 Overlap Factor








(5)







• Overlap Factor– Are tags used as discriminative categories?

• Tag/Title Intersection Ratio (ttr)– How likely does a user choose words

from the title as tags?

9

!" #$%&& #'()*+,-./0 #12 #3(4567 #8"9():0,0/; #8;(/-< #8*:()/8:09(#

8=8)-.)" #8/0=8:0./ #8/:>).?.*.;< #8)->0:(-:+)( #8): #8,08#

8,:)./.=<#4()*+,-./0 #4*.; #4)+,>(, #-*0=8:( #-=, #-.=0-,#

-.=?8:040*0:< #-,, #-+*:+)( #"(,0;/ #".-, #"..=,"8<#

(-./.=0-, #(/();< #(/90)./=(/: #(@?()0=(/:8*#

A*8,>#A*8,>"(9 #A)((#A+/#;(/08*0:< #;)8?>0-,#>8-B,#>(8*:> #>0,:.)< #>+=.) #0-./, #0"(/:0:< #0**+,:)8:0./#

0/"08#0/,?0)8:0./#0/:()8-:0./#0/+:0*0:0(,#0)8/#0)8C#0:8*<#D898,-)0?: #D.4 # *.;., #=8- #=8A08 #=80/,:)(8= #=("08#=0,:()0"0:8*08 #=.90(, #=+,0-#/890;8:0./#/()" #/(E,#?8::()/#

?>.:.;)8?><#?>? #?0,.,#?0@(* #?.*0:0-,#?.):A.*0.#?)0/:#?)098-<#)(-0?(,#)(*0;0./#)0;>:,#,8:(**0:(#,-0(/-(#,(.#

,>.-BE89(#,>.?#,.-0(:<#,:.-B#,:)((:8):#:-?8#:(=?*8:(#:>-#:.))(/:#

:)89(* #:+:.)08* #:9 #:<?( #:<?.;)8?>< #+:0*0:0(, #90"(.#E8)#E(4567#E(4"(,0;/#E(4"(9#E.=(/#E.)*"#E:A#F(0:;(0,:








orphan(u) =|T o

u ||Tu|

, Tou = {t||R(t)| ≤ n}, n =

‰|R(tmax)|

100

ı

(2)























H(R|T ) = −X

r∈R

X

t∈T

p(r, t)log2(p(r|t)) (3)


!"#$%&!'(%#)&*))+, &-(%$+.(+ &/01 & 2).#3, & $44# &$44#,,(5(3(.6 &$%7(8 & $99"#9$.()8 &$9(3# &$( &$(" &$:$; &$7$<)8&

$8$36<# &$8. &$=$4>#&$=( &$==3# &$==, &$". &$?%() &$?,."($ &$?.) & $@, &5$4+?= &5$"4$7=&5$"4)%# &5$6#,&

5#>$'()?" &5#"3(8 &5(3%?89,?893#(4>>#(.#8 &53)9, &5))+ &5))+7$"+3#., &5))+, &5"$8%&

5")@,#"&5?,(8#,,&4$4># &4$+#=>=&4$+#=>==&4$3#8%$" &4$8'$,&4$=(,."$8) &4>$".,&43$,,#,&47,&

4)4)$ &4)33$5)"$.()8 & 4)8A#"#84# &4)8.(8?)?,(8.#9"$.()8 & 4)8."$4., & 4))+(89 & 4)=6"(9>. &4"?(,#4)8.")3 &4"6=. &4,, &4?",)"&

%$.$,)?"4#&%#5?9 &%#3B(4()B?,&%#=3)67#8.&%#,(98&%#'&%#'>)?,# &%)7$(8&%),&%)@83)$%&

%"&%,3&#5))+&#4C&#43(=,#&#4)8)76&#%(.)"&#3#$"8(89&#3#4.()8&#7$(3&#;=#"(7#8.&A$4#5))+&A(8$84($3&A(8$8<#8&A("#5?9&

A("#A);&A("7@$"#&A3$,>&A3#;&A36&A)8%,&A)"7,&A"$7#@)"+&A"##3$84#"&A"(.<&A?8&9$33#"6&9$7#&9$.#@$6&9#$",&9#..#;.&

9(.&9))93# &9))93#7$=,&9"#$,#7)8+#6 &9.%&9?( &>$8%6&>(9>3(9>.(89&>()9( &>),.(89&>.$44#,, &>.73&>?3?&(DE8&(F)%&($&(4)8,&(%#&(#&(8A)&(8A)"7$.()8&(8>$3.,,.)AA#&(8.#"(#?"&(8.#"'(#@&(8')(4#&(=>)8#&(=)%&(,4>93&(,)&:G?#"6&:#,?,&:)5,&

:H?#"6&:,&:,3(8.&:,= &:?"(,.(,4>#,&+#65)$"%&3$,.A7&3$.#;&3#$"8&3#5#8,7(..#3 &3#9$3&3(5"$"6&3(A#>$4+,&3)9) &3)+$3&

7$4&7$9$<(8#&7$(3 &7$=&7$=,&7$"+#.(89 &7$.>#7$.(4, &7#%($ &7#,,$9#&7(9"$.()8 &7)5(3# &7)8#6&

7)'(# &7,H3 &7?,(4 &7'4 &76,H3&7I84>#8&8% &8#@, &)=#8(% &), &=C=&=$..#"8 &=$..#"8,&=$6=$3 &=#"A)"7$84#&

=>)8# &=>).) &=>=&=3?9(8 &=7 &=89 &=)%4$,. &=)3(.(4, &=),. &="(8. &="('$.# &=")4#,, &=")9"$77(89&

=").).6=(89&=");6&=,64>)3)9(#&H7&H,&"$.(89,&"#$%&"#$%3(,.&"#4(=#,&"#A#"#84#&"#7#75#"&"#8.#&"#=?53(4$&

"#,#$"4>&"#,,)?"4#,&"#,.$?"$8.&"#.>)"(4,&"#<#=.#&",,&"?56&"?>"9#5(#.&"I4+#8&,J&,$A$"(&,$8%"$&,4$3$5(3(.6&,4>$4>&,4>))3 &,4>@$"7'#">$3.#8 &,4"##8,>). &,4"?7 &,#$"4> &,#)&,#"(#, &,>)= &,>)==(89&,+( &,+6=# &,3(%#,&,)$&

,)4($3&,)A.@$"#&,)?8%&,=$7&,H3&,.$".?=&,.$.,&,.#"5#8&,.?%6&,?5.(.3#,&,?5'#",()8&,'8&.$;&.#,.&

.#,.(89 &.#;.7$.# &.#;., & .>(8+=$% & .(7# & .))3 &.))3, &."$4 & ."$'#3 &.?.)"($3&.' &.6=)9"$=>6 &?5?8.? & ?8)5."?,('#&?8.#""(4>. &?">#5#""#4>. &?"3$?5 &?,$5(3(.6 &?.AE &'$4$.()8 & '($K7#8.)B(8A) &'(%#) &'(,?$3(<$.()8 &')(= &@$8%#"8&

@#$.>#"&@#5&@#5CBL&@#54$7&@#5%#,(98&@#,.@(89&@#..#"&@(%9#.,&@(+(&@(8%)@,&@)>8?89&@)"%&

@)"%="#,,&@)"+&;%#5?9&;>.73&;=&<#8%&<(.$.#&




















ideal categorizer:


Hopt(R|T )(4)

4.5 Overlap Factor








(5)




free from intersections. On the other hand, describers would

not care about a possibly high overlap factor since they do

not use tags for navigation but instead aim to best support

later retrieval.

4.6 Tag/Title Intersection Ratio (ttr)

In order to address the objectiveness or subjectiveness of

tags, we introduce the tag/title intersection ratio which is

an indicator how likely users choose tags from the words of

a resource’s title (e.g. the title of a web page). This measure

is calculated by taking the intersection of the tags and the

resource’s title words of a specific user. At first, all resource

titles occurring in a personomy are tokenized to build the

set of title words TWu. Then we filtered the tags and title

words using the stop-word list which is packaged with the

Snowball1

stemmer. For normalization purposes we relate

the resulting absolute intersection size to the cardinality of

the set of title words.

ttr =|Tu ∩ TWu|

|TWu|(6)

4.7 Properties of the Presented Measures

When examining the five presented measures, we can ob-

serve that the measures focus on tagging behavior of users

as opposed to the semantics of tags. This makes the in-

troduced measures independent of particular languages. An

advantage of this is that the approach is not influenced by

special characters, internet slang or user specific words (e.g.

“to_read”). In addition, the measures evaluate statistical

properties of a single user personomy only; therefore knowl-

edge of the complete folksonomy is not required.

5. EXPERIMENTAL SETUP

5.1 Dataset

For our experiments we used a dataset from Del.icio.us

which is part of a larger collection of tagging datasets which

was crawled from May to June 2009.2

The requirements for

the resulting datasets were the following:

• The datasets should capture complete personomies.

Therefore all public resources and tags of a crawled

user must be contained.

• Each post should be stored in chronological order which

allows to capture changes in the tagging behavior of a

user over time.

• Users who abandoned their accounts with only a few

posts should be eliminated. Thus, a lower bound for

the post count (Rmin) was introduced which in the

case of the Del.icio.us dataset is Rmin = 1000.

The crawled Del.icio.us dataset consists of 896 users who in

total used 184,746 tags to annotate 1,966,269 resources.

5.2 Correlation between Measures

Figure 3 shows the pairwise Spearman rank correlation

of the proposed measures calculated on all 896 users of the

1http://snowball.tartarus.org/

2Details of the datasets can be found in [12]

Resource Tags User A Tags User B

URL 1 Tag 1A,. . . ,Tag nA Tag 1B ,. . . ,Tag mB

.

.

....

.

.

.

Table 3: Resource Alignment - This allows humansubjects to compare tagging behavior of two usersw.r.t. the same resource.

Posts User A - Tag 1 Posts User B - Tag 1

resources tags resources tagsURL 1A t1A,. . . ,tnA URL 1B t1B ,. . . ,tmB

.

.

....

.

.

....

.

.

....

Posts User A - Tag n Posts User B - Tag n

.

.

....

Table 4: Tag Alignment - This allows human sub-jects to compare tagging behavior of two users w.r.t.the same tag.

Del.icio.us dataset. An interesting observation in this con-

text is that although all measures are based on different

intuitions about the motivation for tagging, some of them

correlate to a great extent empirically. The two measures

exhibiting highest correlation are Tag/Resource Ratio and

Tag/Title Intersection Ratio, where the first measure is de-

rived from the number of unique tags and resources and the

second measure is derived from the content of the tags. Ad-

ditionally these two measures also have a relatively high cor-

relation with the other three measures. The remaining mea-

sures appear to form two separate groups. The Orphaned

Tags and Conditional Tag Entropy represent one group of

highly correlated measures whereas the Overlap Factor rep-

resents the other one. It is expected that measures with a

high correlation will also show similar behavior in the eval-

uations.

6. QUALITATIVE EVALUATION

In order to assess the usefulness of the introduced mea-

sures for tagging motivation, we relate each measure to dif-

ferent dimensions of human judgement. Based on a subset

of posts taken from users’ personomies, participants of a hu-

man subject study were given the task to classify whether

a given personomy represents the tagging record of a user

who follows a categorization or a description approach to

tagging.

To perform this task, participants were given random pairs

of Del.icio.us users, for which they had to decide whether a

user is a categorizer or describer. The information available

to the human subjects for this task is depicted in Table 3

and 4.

6.1 Sampling

We assume that each measure is capable of making a dis-

tinction between categorizers and describers by producing

high scores for describers and low scores for categorizers.

For each of the five measures listed in section 4, we ran-

domly drew five user pairs from the Del.icio.us dataset out

of the measure’s top 25% and bottom 25% users, reflecting

Categorizer Describer Proposed Measure

Goal later browsing later retrieval

Change of vocabulary costly cheap

Size of vocabulary limited open Tag/Resource Ratio

Tags subjective objective Tag/Title Intersection Ratio

Tag reuse frequent rare Orphaned Tag Ratio / Cond. Tag Entropy

Tag purpose mimicking taxonomy descriptive labels Overlap Factor




Properties of the developed measures:

• Agnostic to the semantics of used language

• Evaluate behavior of single user (as opposed to complete folksonomy)– no comparison to the complete folksonomy necessary

• Inspect the usage of tags and NOT their semantic meaning– How often are tags used?– How many tags are used on average to annotate a resource?– How good does a user “encode” her resources with tags?

10



Experimental Setup

Delicious dataset– part of a collection of tagging datasets which we crawled from May to June

2009– Captured folksonomy consists of:

• 896 users• 184,746 tags• 1,089,653 resources

Requirements for the dataset– Holding complete personomies

• all tags and resources which were publicly available– Chronological order of the posts should be conserved

• To capture changes in tagging behavior– “Mostly inactive” users who do not have a lot of annotated resources should be

neglected• The lower bound of tagged resources was 1000 in the case of the Delicious dataset

11



Correlation Between Measures

Although all measures were developed based on different intuitions with regards to the underlying motivation some of them correlate to a great extend

Highest correlation:– Cond. Tag Entropy and Orphaned

Tags– Tag/Title Intersection Ratio and

Tag/Resource Ratio

12

min: 0

max: 0.97

OrphanedTags

* **

*

*

*

**

*****

*

**

*

*

**

**

*

**

***

*

*

*

*

**

* ***

* ****

*

*

* ** *

****

*

***

* ***

*

*

***

*

*

**

**

**** *

*

*

**

*

******

**

*

**

*

** *

*

***

*

*

*

**

***

**

**

* **

*

*

*

*

*

*

*

**

* *

*** **

*

*

**

*****

* **

****

*

**

*

*

***

**

**

**

*

*

*

*

*

****

**

**

*

*

**

***

*

**

**

*

*

*

***

*

*

*

*

**

***

*

** ** *

***

**

**

**

*

*

* *

*

*

*

**

*

*

** *

*

**

*

**

*

*

***

**

***

*

**

***

**

**

* *

***

***

**

***

*

*

*

*

*

**

*** ***

**

*

**

**

***

*

*

* ** **

*

*

*

***

**

*

* *

*

**

*

** *

* *

** *

*

*

***

*

*

*

**

***

*

*

**

*

*

**

**

*

*

*

*

**

*

****

*

***

*

*

**

**

*

*

*

*

*

**

**

**

**

*

**

*

*

**

**

****

*

**

**

**

*

**

*

*

****

*

*

** **

***

*

*

**

*

*

**

*

*

*

*

*

**

** *

*

**

*

***

****

* *** *

**

**

*

*

*

*

*

* *

*

*

**

*

**

** **

*

*

**

*

*

** *

*****

***

*

*

***

***

****

*

*

**

**

**

***

*

* *

**

**

***

*

* *

***

*

**

**

*

* *

*

**

*

**

**

**

*

***

****

*

* *

*

**

*

** **

***

***

*

*

*

*

*

**

*

*

**

**

***

** *

*

*

*

**

* **

* *** ***

*

**

* **

*

** *

* *

** *** *

*

* **

**

*

**

**

*

**

*

*

**

*

****

*

****

*

*

*

**

*

***

**

**

*

*

*

**

*

*

* ****

*

* *

*

**

*

**

**

*

*

*

**

*

*

*

**

* *

*****

*

* **

*

*****

* *

**

**

*

***

*

*

*

*

**

*

*

**

**** **

**

***

**

*

*

*

***

*

* **

**

* * *

*

*

**

*

*

*****

**

***

**

**

*

*

*

***

** **

**

*

**

**

*

**

*

*

**

***

*

**

*

** *

*

**

*

**

**

* ** *

*

** *

*

***

***

* **

*

* *

*

****

*

**

* ***

***

*

*

*

*

*

***

**

*

**

**

*

*

*

*

*

*

*

* **

*

*

**

*** **

* **

*

*

*

**

**** *

*

**

*

*

**

**

*

**

****

*

*

*

**

* ***

* ***

*

*

*

* ** *

**

**

*

***

* ***

*

*

****

*

**

**

** ** *

*

*

**

*

*****

**

*

*

**

*

***

*

***

*

*

*

**

** *

**

****

*

*

*

*

*

*

*

*

**

**

*** **

*

*

**

*****

* **

* ***

*

* *

*

*

***

**

**

* *

*

*

*

*

*

****

**

**

*

*

**

***

*

**

**

*

*

*

**

*

*

*

*

*

**

**

*

*

**** *

**

*

****

**

*

*

* *

*

*

*

***

*

***

*

**

*

**

*

*

* **

**

***

*

**

***

**

**

**

***

****

***

**

*

*

*

*

**

*** ** *

**

*

**

**

***

*

*

* *** *

*

*

*

** *

**

*

* *

*

**

*

***

**

***

*

*

***

*

*

*

**

***

*

*

**

*

*

**

**

*

*

*

*

**

*

*** *

*

***

*

*

**

**

*

*

*

*

*

**

**

**

**

*

**

*

*

**

**

****

*

**

**

**

*

**

*

*

*** *

*

*

****

***

*

*

**

*

*

**

*

*

*

*

*

**

** *

*

**

*

***

****

**** *

**

**

*

*

*

*

*

**

*

*

**

*

**

****

*

*

**

*

*

* **

***

*** **

*

*

***

**

**

***

*

*

****

**

**

*

*

**

**

**

***

*

* *

***

*

**

**

*

**

*

**

*

**

**

**

*

***

***

*

*

* *

*

**

*

***

*

***

**

*

*

*

*

*

*

**

*

*

**

* *

***** *

*

*

*

**

* **

* **** *

*

*

**

* **

*

***

**

***

** **

* **

**

*

**

**

*

**

*

*

**

*

** **

*

****

*

*

*

**

*

***

**

**

*

*

*

**

*

*

* ** **

*

**

*

**

*

**

**

*

*

*

**

*

*

*

**

**

*****

*

***

*

*****

**

**

**

*

***

*

*

*

*

**

*

*

**

**** **

**

* *** *

*

*

*

***

*

***

**

* **

*

*

**

*

*

*****

**

***

**

**

*

*

*

***

** **

**

*

**

**

*

**

*

*

**

** *

*

**

*

** *

*

**

*

* *

**

* ***

*

** *

*

***

***

* **

*

* *

*

***

*

*

**

* ***

***

*

*

*

*

*

***

**

*

**

**

*

*

*

*

*

*

*

****

*

**

**

**** **

*

*

*

**

**** *

*

**

*

*

**

**

*

**

***

*

*

*

*

**

* ***

* ***

*

*

*

* ** *

**

**

*

***

* ***

*

*

****

*

**

**

** ** *

*

*

**

*

***

* **

**

*

**

*

***

*

** *

*

*

*

* *

** *

**

****

*

*

*

*

*

*

*

*

**

**

*** **

*

*

**

*****

* **

****

*

* *

*

*

***

**

**

* *

*

*

*

*

*

****

**

**

*

*

**

***

*

**

**

*

*

*

**

*

*

*

*

*

**

**

*

*

**** *

**

*

**

**

**

*

*

**

*

*

*

**

*

*

***

*

**

*

**

*

*

* **

**

***

*

**

***

**

**

**

***

***

**

***

*

*

*

*

*

**

***** *

**

*

**

**

***

*

*

* *** *

*

*

*

** *

**

*

* *

*

**

*

***

* *

* * *

*

*

***

*

*

*

**

***

*

*

**

*

*

**

**

*

*

*

*

**

*

*** *

*

***

*

*

**

**

*

*

*

*

*

**

***

**

*

*

**

*

*

**

**

****

*

**

**

**

*

**

*

*

*** *

*

*

****

***

*

*

**

*

*

**

*

*

*

*

*

**

** *

*

**

*

***

******** *

**

**

*

*

*

*

*

**

*

*

**

*

**

** **

*

*

**

*

*

***

***

*** **

*

*

***

**

****

**

*

**

**

**

**

*

*

* *

**

**

***

*

* *

***

*

**

**

*

**

*

**

*

**

**

**

*

***

***

*

*

* *

*

**

*

****

***

**

*

*

*

*

*

*

**

*

*

**

* *

***

** *

*

*

*

**

* **

* **** *

*

*

**

* **

*

***

**

***

* ***

* **

**

*

**

**

*

**

*

*

**

*

** **

*

* ***

*

*

*

**

*

***

**

**

*

*

*

**

*

*

* ** **

*

**

*

**

*

**

**

*

*

*

**

*

*

*

**

* *

*****

*

***

*

**

*****

**

**

*

***

*

*

*

*

**

*

*

**

**** **

**

***

* *

*

*

*

***

*

***

**

* * *

*

*

**

*

*

** ***

**

****

***

*

*

*

***

** **

**

*

**

**

*

**

*

*

**

** *

*

**

*

** *

*

**

*

* *

**

* ***

*

** *

*

* ****

** **

*

* *

*

***

*

*

**

* ***

***

*

*

*

*

*

***

**

*

**

**

*

*

*

*

*

*

*

****

*

**

**

** ** * *

*

*

*

**

**** *

*

* *

*

*

**

**

*

**

***

*

*

*

*

**

* ***

*** **

*

*

****

**

**

*

***

* ***

*

*

***

*

*

**

**

** ** *

*

*

**

*

** *

* **

**

*

**

*

***

*

* **

*

*

*

**

** *

**

**

***

*

*

*

*

*

*

*

**

**

*** **

*

*

**

***

*** *

*

* ****

* *

*

*

** *

**

**

* *

*

*

*

*

*

****

* *

**

*

*

**

** *

*

**

**

*

*

*

**

*

*

*

*

*

**

**

*

*

* ****

***

****

**

*

*

**

*

*

*

***

*

***

*

* *

*

**

*

*

* **

**

***

*

***

**

**

**

* *

***

***

**

* **

*

*

*

*

*

**

**** * *

**

*

**

**

**

*

*

*

* *** *

*

*

*

** *

**

*

* *

*

**

*

***

**

** *

*

*

***

*

*

*

**

***

*

*

**

*

*

**

**

*

*

*

*

* *

*

****

*

** *

*

*

**

**

*

*

*

*

*

**

**

**

**

*

**

*

*

**

**

****

*

**

**

**

*

**

*

*

****

*

*

*** *

***

*

*

**

*

*

* *

*

*

*

*

*

**

***

*

**

*

***

* *****

** *

**

**

*

*

*

*

*

**

*

*

**

*

**

** * *

*

*

**

*

*

* **

***

*** * *

*

*

** *

**

**

***

*

*

**

**

**

**

*

*

* *

**

**

***

*

* *

***

*

**

**

*

**

*

**

*

****

**

*

** *

***

*

*

**

*

**

*

***

*

** *

**

*

*

*

*

*

*

**

*

*

**

* *

***

** *

*

*

*

**

* **

* **** *

*

*

**

* **

*

***

**

***** *

*

* **

**

*

* *

**

*

**

*

*

**

*

** **

*

** **

*

*

*

**

*

***

**

**

*

*

*

**

*

*

* ** **

*

**

*

**

*

**

**

*

*

*

**

*

*

*

**

**

**** *

*

***

*

**

** ***

**

**

*

***

*

*

*

*

**

*

*

**

**** ***

*

* **

* *

*

*

*

***

*

***

**

***

*

*

**

*

*

** ***

**

* **

**

* *

*

*

*

***

* ***

**

*

**

**

*

**

*

*

**

** *

*

**

*

** *

*

**

*

* *

**

****

*

** *

*

* *****

***

*

**

*

***

*

*

**

* ***

** *

*

*

*

*

*

***

**

*

**

**

*

*

*

*

*

*

*

***

*

*

**

**

** *

0.89

min: 0.06

max: 1.47

ConditionalTag

Entropy * **

*

*

*

**

**

** *

****

*

*

*

**

*** **** **

*

***

***

***

**

*

*

*** *

****

*

***

****

*

*

**

**

*

***

*

** ** **

*

**

*******

**

*

*

*

*

***

*

****

*

*

***

**

* *

* *** *

**

**

*

***

*

**

***

**

*

**

** ******

*

****

** *

*

*

**

***

**

* ***

***

****

**

*

*

*

*

*

* ***

*

**

*****

** *

**

*

*

*

**

**

*

***

**

** *

* ***

***

****

***

**

**

*

*

**

*

***

*

*

* *****

**

******

**

**

**

*

**

****

***

**

*

*

*

** *

*** ** ***

*

*

** ****

*

*

**

** *

*

**

**

*

**

*

* *

*

**

*

***

****

*

**

***

*

*

*

**

****

*

*

*

*

*

***

**

**

***

*

*** **

**

**

*

****

**

**

*

** **

**

**

**

*

**

*

*

**

* ***

*

*

***

*

**

**

*

*

**

* *

**

***

**

**

*

***

*

*

**

*

***

*

*

*

***

*

**

*

***

****

****

*

**

***

*

*

** *

*

****

**

*

****

*

**

*

*

**

**

***

**

****

*

****

*** *

**

**

****

**

** **

*** *

******

*

**

**

*

**

***

**

*

*

*

*

*

***

**

*

* ***

**

*

*

**

*

**

** *

**

***

* **

*

*

*

*

*

**

*

*

*** *

*****

*

*

*** **

**

**

*** *

**

**

* **

*

***

*****

** **

*

*** *

*

**

***

***

*

***

** **

*

***

*

*

*

**

**

***

**

**

*

**

**

*

**

** *** ***

**

*

** **

*

***

*

*

*

*

**

**

******

****

*****

**

**

* **

***

*

*

*

*

* *

*

*

**

****** *

*

* *

** *

***

***

*****

*

**

*

**

**

*

*

*****

**

*******

*

*

*

*

**

** **

* *

*

***

*

*

**

*

*

*

*

*

* *

*

**

*

**

*

*

**

*

* **

*****

*

**

*

*******

***

*

***

** *** **

****

***

*

*

*

**

***

**

***

*

**

**

**

*

* **

**

*

**

** ***

* **

*

*

*

**

**

** *

***

*

*

*

*

**

*** **** *

**

* **

***

***

**

*

*

*** *

** **

*

***

****

*

*

**

**

*

***

*

** ** **

*

**

**** * **

**

*

*

*

*

****

** **

*

*

* **

**

* *

* *** *

**

**

*

***

*

**

***

**

*

**** *****

*

*

****

** *

*

*

**

** *

**

* **

*

***

****

**

*

*

*

*

*

* ***

*

**

*****

** *

**

*

*

*

**

**

*

***

**

** *

* ***

***

****

* ***

*

**

*

*

**

*

***

*

*

* * ****

**

**

****

**

**

**

*

**

***

*

***

**

*

*

*

** *

***** *

***

*

** ****

*

*

**

** *

*

**

**

*

**

*

* *

*

**

*

***

*** *

*

**

***

*

*

*

**

****

*

*

*

*

*

***

**

***

***

*** **

**

**

*

****

**

**

*

** ***

*

**

**

*

**

*

*

**

* ***

*

*

***

*

**

**

*

*

**

* *

**

***

**

**

*

***

*

*

**

*

***

*

*

*

***

*

**

*

***

*****

* ***

**

***

*

*

** *

*

****

**

*

***

*

*

**

*

*

****

***

**

****

*

**** **

****

**

** * *

**

** **

* ** *

******

*

**

**

*

**

***

**

*

*

*

*

*

***

**

*

* ** *

**

*

*

**

*

**

** *

**

***

* **

*

*

*

*

*

**

*

*

*** *

*** **

*

*

**

* ***

**

**

** **

***

* **

*

********

* ***

*

*** *

*

**

***

** *

*

***

** **

*

***

*

*

*

**

***

***

***

*

**

**

*

**

** *** ***

**

*

****

*

***

*

*

*

*

**

********

**

**** **

*

**

**

* **

***

*

*

*

*

* *

*

*

**

****** *

*

**

** *

* **

***

*****

*

**

*

**

**

*

*

** ***

**

****

***

*

*

*

*

**

** **

* *

*

***

*

*

**

*

*

*

*

*

* *

*

**

*

***

*

**

*

* **

*****

*

**

*

** **** *

***

*

***

** *** *

*

****

***

*

*

*

**

***

**

***

*

**

****

*

* **

**

*

**

** ** *

* **

*

*

*

**

**

** *

** *

*

*

*

*

* *

*** **** *

**

* **

******

**

*

*

****

****

*

***

****

*

*

**

**

*

***

*

** ** *

**

**

*** ** **

**

*

*

*

*

***

*

* ***

*

*

***

**

**

* * ** *

**

**

*

** *

*

**

***

**

*

**

* * ** ****

*

*** *

** *

*

*

**

** *

**

* **

*

***

****

* *

*

*

*

*

*

* **

*

*

**

*** **

** *

**

*

*

*

**

* *

*

* **

**

***

* ***

***

* ***

***

**

* *

*

*

**

*

**

*

*

*

* * ** ****

**

****

**

**

* *

*

**

***

*

** *

**

*

*

*

** *

** ** * ** *

*

*

** ** **

*

*

**

** *

*

**

**

*

**

*

* *

*

**

*

***

** **

*

**

***

*

*

*

**

* ***

*

*

*

*

*

***

**

**

** *

*

*****

**

**

*

* ***

**

**

*

** *

**

*

**

**

*

**

*

*

**

* ***

*

*

**

**

**

**

*

*

**

**

**

***

**

**

*

***

*

*

* *

*

**

*

*

*

*

***

*

**

*

** *

* ****

****

**

* **

*

*

** *

*

** **

**

*

***

*

*

**

*

*

**

**

***

**

** **

*

** ** **

***

*

**

** **

* *

** **

* ** *

*** ***

*

**

**

*

**

***

**

*

*

*

*

*

***

**

*

****

**

*

*

**

*

**

** *

**

* * *

* **

*

*

*

*

*

**

*

*

**

* **

** **

*

*

**

* * **

**

**

** **

**

** **

*

***** ***** *

*

*

**

***

* *

***

***

*

***

** **

*

** *

*

*

*

**

**

***

**

**

*

**

**

*

**

** *** ***

**

*

** **

*

***

*

*

*

*

**

**

**** **

**

**** **

*

**

**

* **

***

*

*

*

*

**

*

*

**

****

** **

* *

** *

***

***

**

* **

*

***

**

**

*

*

** ***

**

* ***

***

*

*

*

*

**

****

* *

*

***

*

*

**

*

*

*

*

*

* *

*

**

*

**

*

*

**

*

* *** *

* **

*

**

*

** *****

***

*

**

*

*** *

* **

****

** *

*

*

*

**

***

* *

***

*

**

**

**

*

* **

* *

*

**

** *

* *

0.64 0.72

min: 0

max: 0.81

Tag/TitleIntersection

Ratio

**

***

****

**

**

* *** *

*

******

**

****

** **

****

****

* ***

* **

****

**

***

*****

***

*****

*

*

**

**

***

**** * *

*

*

***

**

****** **

*

** ***

***

* ****

**

**

*

******

**

**

**** *

*****

*

** ***** *

**

*

**

** *****

* ****

************

******

*******

*

****

**

**

**

*****

**

*

* ******

***** * *****

**

****

***

*** ******

** ****

**

*****

*

**

******

*

** *

* **

*****

**

***

**

**

******

*

** **

**** *

**

* *

* ****

**** ** * ***

****

*

* **

*** **

*

**

***

**

** ***

***** *

**

**

*******

**

*

*

**

*

**

***

** *

**

****

* ***

*

*

****

***

*

*

*

**

**

** *****

** **

***

***

**

***

**

** ***

*

*

*****

***

* **

*

* *

****

*

**

*****

**

*

*

** ** *****

****

*

** ** *

*****

***

*****

***

** * ***

**

*

** ** * *******

**

*

****

****

*****

****

* **

**

* ****

* *

*

**

*

**********

****

**

**

*

**

***

*

** **

*

**

* **

*

*

**

**

**

*

**

*****

*********

* **

***

****

****

*

*

* *****

**

**

*

* **

**

**

*

*** **

*

*

***

**

**

*

***

**** **

**

*****

** ***

*

*

*

**

** ****** *****

** ***

**

***

*

** ***

*

**

* **

**

*

**** **

**

**

* **

* ***

*

*

***** *

** *

**

*

*

**

** *******

** ****

**

***

** *** *

****

*****

*****

*

*

***

***

***

**

****

****

**

*

** *

*

** **

**

* ***** ***

*

* **

*** ****

*** ***

*****

**

*

*****

***

**

*

* **

**

* **

***

*****

**

**

* * ** *

*

** ****

**

****

** *** *

* *** **

* * **

***

*** * *

*

* **

** ***

* *******

*

*

**

**

***

*** ** *

*

*

****

**** ** ***

*

*****

***

* * ***

**

* *

*

** ** **

**

**

* **** *** **

*

*** ** ***

**

*

**

** ** ***

* ****

* **** *****

***

* ****

*** ***

**

****

**

**

**

* ** **

***

* ******

* *** ** **** *

**

** *

** *

***

* ** * ***

** ***

**

**** *

*

*

***** ** *

*

***

* * *

*** *

**

*

* **

**

**

* ** ***

*

** **

**** *

**

* *

*****

* **** ** ***

****

*

* **

* ** * *

*

**

**

**

**

* * ** ******

**

* *

* ** ****

* *

*

*

**

*

**

***

** ***

** **

* ***

*

*

****

***

*

*

*

**

**

** *****

** * *

***

** *

**

* **

**

**** *

*

*

** ** *

***

***

*

**

* ***

*

**

**** *

**

*

*

** * ** * ***

**

***

** ***

* ****

***

** **

**

**** **

* *

**

*

* * ** * *** ****

**

*

********

* ******

**

* **

**

*****

* *

*

**

*

* *** ** * **

** *

***

***

*

**

***

*

** **

*

**

****

*

**

**

**

*

**

*****

*****

* ******

* **

*** *

* ***

*

*

** ****

**

**

*

** *

**

* *

*

**

* **

*

*

****

***

*

***

**** **

**

**** *

** * **

*

*

*

**

* ****** * *

** ** ** ** *

**

***

*

** ***

*

**

***

**

*

**** ***

** *

***

** **

*

*

*** ***

***

* *

*

*

* *

** ****

* * *** ** *

**

*

* **

* **** **

***

* *****

****

*

*

***

**

**

***

**

** ****

**

**

** *

*

******

* ** **** **

*

***

**** ***

* ** ***

* ****

**

*

* ****

***

* *

*

* **

**

* *

0.63 0.74 0.94

min: 0

max: 3.13

Tag/ResourceRatio

* ***

*****

**

*

** *

***

*

** *

***

**

**

**

**

*

*

*** *** *

*

** **

**

*

**

* **

*

* **

** **

** *****

**

*

*

*

**

**

****

*** *

*

**

*

*

**

*** ** ***

*

*****

***

** **

*

**

* *

*

** ** **

**

*

*

* **** *** *

**

*

** ** *

***

*

*

**

** ** **

**

*

**** ***

* ****

**

**

* ****

*** ****

*

****

***

* **

* ** **

* *

*

***

*

*** * *** *

***

***

*

**

**

** *

*

** **

** ***

***

***

*

**** *

*

*

***

** ** *

*

*

**

*

**

*** ** *

*

* *

*

**

*** ** **

*

*

***

*

****

*

**

* *

**

***

* **

** ***

**

***

*

*

* *

*

* ***

*

*

*

*

***

*

*

*

* * ** *****

*

**

*

*

* ** ****

* *

*

*

*

*

*

**

*

***

**

**

**

***

**

**

*

****

**

*

*

*

*

**

**

*****

**

*

* * *

*

**

*

* *

*

** *

* *

*

***

* *

*

*

*

* *

* *

**

***

*

***

* ***

*

*

***** *

**

*

*

** * *** **

**

* **

*

*

**

**

***

**

* **

**

****

***

**

*

* *

**

*

* * ***

*** ****

**

*

*

***

***

** *

***

**

**

**

***

****

*

**

*

**

*

* *** *

* * **** * **

*

***

*

*

*

***

*

***

*

*

**

*** *

*

*

*

*

*

*

**

*****

**

*

***** **

**

* *

* **

*** *

* ****

*

** ****

*

***

*

** *

*

*

* *

*

*

* ***

*

*

**

*

*

***

*

***

*

*** ***

***

** **

** *

*

*

*

*

*

**

** **** * *

** *****

* *

*

***

*

*

*

* ***

*

*

*

***

**

*

*

**

**

*

*

*

* *

***

*

*

*

**

*

*

** *** *

*

*

* *

*

*

* *

** ***

*** *

** ** *

*

*

*

* *

*

****

**

*

***

* *****

****

*

*

*

*

**

* **

***

**

** ** **

*

**

*

*

* *

*

******

**

* **

**

**

*

**

***** *

**

* ***

**

* ****

*

*

*

* *****

*** *

*

****

*

**

0.49 0.42 0.74 0.71

min: 0

max: 0.93

OverlapFactor

Pairwise Correlation of MeasuresNo surprise because both measures

are based on tag distribution

Interesting because

measures have different intuitions in

mind



Qualitative Evaluation / 1

For the qualitative evaluation we conducted a human subject study with 6 participants

Study participants were given personomies and asked to classify each personomy into one of the two tagging motivation types– categorizer– describer

Result– moderate agreement Cohen’s kappa = 0.51 [Cohen1960]

Reasons– Users were only given a fraction of each

personomy– Task is quite subjective and complex

13

Ran

dom

Con

ditio

nal T

agEn

tropy

Orp

hane

d Ta

gs

Tag/

Title

Inte

rsec

tion

Rat

io

Ove

rlap

Fact

or

Tag/

Res

ourc

eR

atio

Accuracy of Evaluated Measures

Accu

racy

0.0

0.2

0.4

0.6

0.8

1.0

± 0.0%

+48.4%+56.5%

+66.1%+74.2% +77.4%



Qualitative Evaluation / 2

Highest agreement– Tag/Resource Ratio– Tag/Title Intersection Ratio– Overlap Factor

Lowest agreement– Orphaned Tag Ratio– Conditional Tag Entropy

--> General agreement between measures and human judgement

14



Quantitative Evaluation / 1

To assess wether the distinction between describers and categorizers has an observable impact during tagging

For this purpose: “Recommender Evaluation”

Intuition:– A user who is motivated by categorization prefers tag recommendation

algorithms which resort to tags from the personal tag vocabulary– Users who are motivated by description favor algorithms which suggest

tags that are most descriptive for the resource she is tagging

15



Quantitative Evaluation / 2

1. Built basic recommenders which recommend tags based on personomy and folksonomy

2. Users were then evenly split between the two groups according to each measure

3. Hold out evaluation

Predict which tags a user will assign to a resource based on her tagging motivation/behavior

16



Quantitative Evaluation / 3According to the recommender evaluation two important

findings were made:

17R

ando

m

Orp

hane

d Ta

gs

Ove

rlap

Fact

or

Con

ditio

nal T

agEn

tropy

Tag/

Res

ourc

eR

atio

Tag/

Title

Inte

rsec

tion

Rat

io

Folksonomy−based RecommenderDescriber Users

Mea

n Av

erag

e Pr

ecis

ion

0.0

0.1

0.2

0.3

0.4

0.5

0.6

± 0.0%

+22.5% +25.4% +26.7% +32.3%+38.9%

– Users who prefer folksonomy-based recommendation (describers) can be best identified by a high Tag/Title intersection ratio

Ran

dom

Orp

hane

d Ta

gs

Ove

rlap

Fact

or

Con

ditio

nal T

agEn

tropy

Tag/

Res

ourc

eR

atio

Tag/

Title

Inte

rsec

tion

Rat

ioPersonomy−based Recommender

Categorizer Users

Mea

n Av

erag

e Pr

ecis

ion

0.0

0.1

0.2

0.3

0.4

0.5

0.6

± 0.0%

+11.9% +13.8% +14.0%+19.0% +17.5%

– Users who prefer personomy-based recommendation (categorizers) can be best identified by a low Tag/Resource Ratio

Motivation beh

ind

tagging has

significant impa

ct on

tagging behavi

or!



Outlook

The findings can be a starting point for the analysis of questions such as:– How does motivation behind tagging influence the performance of current

folksonomy search algorithms?– How can recommender explicitly consider tagging motivation to improve

recommendation?– To what extent are existing algorithms for the acquisition of semantic

relations from folksonomies affected by tagging motivation?

18



Conclusion

Can tagging motivation be approximated by statistical measures?– Yes. This can be seen by the introduced measures.

What are measures which enable the inference if a given user is a categorizer or a describer?– Conditional Tag Entropy, Orphaned Tags, Tag/Title intersection ratio etc.

Which of these measures perform best to differentiate these two types of tagging motivation?– Tag/Resource Ratio has highest agreement with human judgement– Tag/Resource Ratio best predictor for categorization behavior– Tag/Title Intersection best predictor for description behavior

Does the distinction of the proposed tagging motivation types have an influence on the tagging process?– Yes. It influences a user’s selection of tags.

19



References[Ames2007] Ames, M. & Naaman, M. (2007), Why we tag: motivations for annotation in mobile and online

media, in ‘CHI ’07’: Proceedings of the SIGCHI conference on Human factors in computing systems’ ACM, New York, NY, USA, pp.971--980

[Cohen1960] Cohen, Jacob. (1960) A coefficient of agreement for nominal scales, Educational and Psychological Measurement Vol.20, No.1, pp. 37–46.

[Heckner2009] Heckner, M; Heilemann, M. & Wolff, C. (2009) Personal Information Management vs. Resource Sharing: Towards a Model of Information Behavior in Social Tagging Systems, in ‘Int’l AAAI Conference on Weblogs and Social Media (ICWSM)’.

[Körner2009] Körner, C. (2009), Understanding the Motivation Behind Tagging, Student Research Competition - Hypertext 2009

[Nov2009] Nov, O.; Naaman, M. & Ye, C. (2010), 'Analysis of participation in an online photo-sharing community: A multidimensional perspective.', JASIST 61(3), 555-566.

20



Thanks for you attention

Questions?

[email protected]

21

Are you

a categorizer

or

a describer?