Effective Retrieval of Resources in Folksonomies Using a New Tag Similarity Measure

Date: 2012/10/11
Resource: CIKM’11
Advisor: Dr. Jia-Ling Koh
Speaker: I-Chih Chiu




Outline
• Introduction
• Description of the approach
  - Tag similarity computation
  - Tag expansion
  - Taming computational complexity
• Evaluation
• Conclusion


Introduction
• Social media applications: videos, pictures, music, blogs, etc.
• Pre-defined taxonomies vs. social tagging, which is:
  - informally defined
  - continually changing
  - ungoverned
• Finding content of interest has become a main challenge


Motivation
• Various classic metrics have been used to compute tag similarity: cosine similarity, the Jaccard coefficient, Pearson correlation
• These metrics assume that the underlying folksonomy is already dense
  - This assumption does not hold: most real-life folksonomies exhibit a power-law distribution of tag usage
  - Using traditional metrics such as cosine similarity would therefore almost always yield close-to-zero values


Goal
• Propose an approach that transparently induces the creation of a dense folksonomy, based on the mutual reinforcement principle
• Automatically expand the user-selected tag set when a user:
  - labels a new resource
  - submits a query to retrieve some resources

Tag similarity computation
• Cosine
• Latent semantic indexing
• SimRank
• The novel approach

Tag expansion
• It can automatically expand the tag set chosen by the user.


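Cosine Similarity (co-occurrence)
• Roughly 81% of resources were described by no more than 5 different tags (and roughly 58% by fewer than 3), so the tag-resource matrix TR is rather sparse.

[Figure: example tag-resource matrix TR]

• Each tag t_i is represented by its row tr(i) of TR, and the similarity of two tags is the cosine of the angle between their rows:

$$ s(t_i, t_j) = \frac{\langle tr(i), tr(j) \rangle}{\sqrt{\langle tr(i), tr(i) \rangle} \cdot \sqrt{\langle tr(j), tr(j) \rangle}} \tag{1} $$

• Examples:

$$ s(\mathit{user}, \mathit{system}) = \frac{2}{\sqrt{3} \cdot \sqrt{6}} = \frac{\sqrt{2}}{3} $$

$$ s(\mathit{time}, \mathit{EPS}) = \frac{0}{\sqrt{2} \cdot \sqrt{2}} = 0 $$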
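To make equation (1) concrete, here is a minimal Python sketch. The matrix `TR` below is hypothetical: its rows are an assumption, chosen only so that the two worked examples come out as on the slide (the slide's own matrix did not survive extraction).

```python
import math

# Hypothetical tag-resource matrix TR (rows: tags, columns: resources r1..r5).
# Entry = number of times the tag labeled the resource. Rows are assumed, not
# taken from the slide; they only reproduce the slide's worked values.
TR = {
    "user":      [0, 1, 1, 0, 1],
    "system":    [0, 1, 1, 2, 0],
    "time":      [0, 1, 0, 0, 1],
    "EPS":       [0, 0, 1, 1, 0],
    "human":     [1, 0, 0, 1, 0],
    "interface": [1, 0, 1, 0, 0],
}


def dot(u, v):
    """Inner product <u, v> of two equal-length count vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(ti, tj, tr=TR):
    """Equation (1): cosine of the angle between the rows of tags ti and tj."""
    u, v = tr[ti], tr[tj]
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0


print(round(cosine("user", "system"), 3))  # 0.471, i.e. sqrt(2)/3
print(cosine("time", "EPS"))               # 0.0: the two tags never co-occur
```

The zero for time/EPS is exactly the sparsity problem described above: two tags that never label the same resource get similarity 0, whatever their semantic relation.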


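Latent Semantic Indexing (1/2)
• Singular Value Decomposition (SVD) of the term-document matrix A, truncated to the k largest singular values:

$$ A = U \Sigma V^{T}, \qquad A_k = U_k \Sigma_k V_k^{T} \approx A $$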


Latent Semantic Indexing (2/2)
• Example query q = “user interface”
• The query is projected into the k-dimensional latent space, and the resulting vector q_k is then compared with every document vector in V_k using cosine similarity.
• Drawbacks:
  - The computation of LSI on large matrices is very costly.
  - Tuning the parameter k is complex and time-consuming.
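In standard LSI, the query vector q is "folded in" to the latent space with the truncated SVD factors before the comparison. The usual textbook formulation is sketched below; the exact variant used on the slide is not shown, so treat this as an assumption:

$$ q_k = \Sigma_k^{-1} U_k^{T} q, \qquad \cos(q_k, v_j) = \frac{\langle q_k, v_j \rangle}{\lVert q_k \rVert \, \lVert v_j \rVert} $$

where v_j is the j-th document vector, i.e. the j-th row of V_k. This mirrors how each column a_j of A maps to v_j = Σ_k^{-1} U_k^T a_j.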


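SimRank (1/2)
• More suitable to the folksonomy domain are techniques that rely on the mutual reinforcement principle: people are similar if they purchase similar items, and items are similar if they are purchased by similar people.
• Let O(A) be the set of items bought by person A (O_i(A) its i-th element) and I(c) the set of people who bought item c (I_i(c) its i-th element); every object has similarity 1 with itself:

$$ s(A, B) = \frac{C_1}{|O(A)|\,|O(B)|} \sum_{i=1}^{|O(A)|} \sum_{j=1}^{|O(B)|} s(O_i(A), O_j(B)) \quad \text{if } A \neq B \tag{2} $$

$$ s(c, d) = \frac{C_2}{|I(c)|\,|I(d)|} \sum_{i=1}^{|I(c)|} \sum_{j=1}^{|I(d)|} s(I_i(c), I_j(d)) \quad \text{if } c \neq d \tag{3} $$

• Example with C_1 = C_2 = 0.8:

$$ s(A, B) = \frac{0.8}{3 \cdot 3} \, (0.619 \cdot 6 + 1 + 1 + 0.437) = 0.547 $$

$$ s(\mathit{frosting}, \mathit{eggs}) = \frac{0.8}{2 \cdot 2} \, (1 + 1 + 0.547 \cdot 2) = 0.619 $$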
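A minimal sketch of bipartite SimRank, iterating equations (2) and (3) until they settle. The purchase lists in `OUT` are an assumption (A bought sugar, frosting, eggs; B bought frosting, eggs, flour), reconstructed because their fixed point reproduces the slide's values; the helper names `invert` and `bipartite_simrank` are hypothetical.

```python
from itertools import product

# Assumed purchase data (not given in the text): chosen so that the fixed
# point matches s(A, B) = 0.547 and s(frosting, eggs) = 0.619 from the slide.
OUT = {
    "A": ["sugar", "frosting", "eggs"],
    "B": ["frosting", "eggs", "flour"],
}


def invert(out):
    """I(c): for each item c, the list of people who bought it."""
    inn = {}
    for person, items in out.items():
        for item in items:
            inn.setdefault(item, []).append(person)
    return inn


def bipartite_simrank(out, c1=0.8, c2=0.8, iterations=100):
    """Apply equations (2) and (3) repeatedly, starting from the identity."""
    inn = invert(out)
    people, items = list(out), list(inn)
    s_people = {(a, b): float(a == b) for a, b in product(people, people)}
    s_items = {(c, d): float(c == d) for c, d in product(items, items)}
    for _ in range(iterations):
        new_people = {}
        for a, b in product(people, people):
            if a == b:
                new_people[a, b] = 1.0
            else:
                total = sum(s_items[x, y] for x in out[a] for y in out[b])
                new_people[a, b] = c1 * total / (len(out[a]) * len(out[b]))
        new_items = {}
        for c, d in product(items, items):
            if c == d:
                new_items[c, d] = 1.0
            else:
                total = sum(s_people[x, y] for x in inn[c] for y in inn[d])
                new_items[c, d] = c2 * total / (len(inn[c]) * len(inn[d]))
        # Both sides are updated from the previous iterate, not in place.
        s_people, s_items = new_people, new_items
    return s_people, s_items


s_people, s_items = bipartite_simrank(OUT)
print(round(s_people["A", "B"], 3))           # 0.547
print(round(s_items["frosting", "eggs"], 3))  # 0.619
```

The cap of 100 iterations is an arbitrary, generous choice for this tiny graph; the values stop changing well before that.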


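SimRank (2/2): Iteration
• Limitations for folksonomies:
  - It does not consider the number of times a tag intervenes in labeling a resource.
  - It does not distinguish between tags that have labeled exactly the same resource and tags that have labeled related resources.
• Iterative computation (with R_{k+1}(a, a) = 1):

$$ R_{k+1}(a, b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} R_k(I_i(a), I_j(b)) \quad (a \neq b), \qquad R_0(a, b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{if } a \neq b \end{cases} \tag{4} $$

• Example with C = 0.8:

$$ R_3(\mathrm{Univ}, \mathrm{ProfB}) = \frac{0.8}{1 \cdot 2} \sum_{i=1}^{1} \sum_{j=1}^{2} R_2(I_i(\mathrm{Univ}), I_j(\mathrm{ProfB})) $$

The terms needed are R_2(StudA, Univ) and R_2(StudA, StudB), where

$$ R_2(\mathrm{StudA}, \mathrm{StudB}) = \frac{0.8}{1 \cdot 1} \sum_{i=1}^{1} \sum_{j=1}^{1} R_1(I_i(\mathrm{StudA}), I_j(\mathrm{StudB})) $$

$$ R_1(\mathrm{ProfA}, \mathrm{ProfB}) = \frac{0.8}{1 \cdot 2} \sum_{i=1}^{1} \sum_{j=1}^{2} R_0(I_i(\mathrm{ProfA}), I_j(\mathrm{ProfB})) $$

with R_0(Univ, Univ) = 1 and R_0(Univ, StudB) = 0. This finally gives R_3(Univ, ProfB) = 0.128.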
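A sketch of the iteration in equation (4), run for three steps. The edge list is an assumption, consistent with the summation bounds and R_0 terms above: Univ links to ProfA and ProfB, ProfA to StudA, ProfB to StudB, StudA to Univ, and StudB to ProfB. The function name `simrank_iterate` is hypothetical.

```python
from itertools import product

# Assumed directed graph (source, destination); I(x) = in-neighbors of x.
EDGES = [
    ("Univ", "ProfA"), ("Univ", "ProfB"),
    ("ProfA", "StudA"), ("ProfB", "StudB"),
    ("StudA", "Univ"), ("StudB", "ProfB"),
]


def in_neighbors(edges):
    """Map each node to the list of nodes that point to it."""
    inn = {}
    for src, dst in edges:
        inn.setdefault(dst, []).append(src)
    return inn


def simrank_iterate(edges, k, c=0.8):
    """Return R_k for every node pair, starting from R_0 = identity."""
    inn = in_neighbors(edges)
    nodes = sorted({n for edge in edges for n in edge})
    r = {(a, b): float(a == b) for a, b in product(nodes, nodes)}
    for _ in range(k):
        nxt = {}
        for a, b in product(nodes, nodes):
            if a == b:
                nxt[a, b] = 1.0
            elif not inn.get(a) or not inn.get(b):
                nxt[a, b] = 0.0  # no in-neighbors: nothing to propagate
            else:
                total = sum(r[x, y] for x in inn[a] for y in inn[b])
                nxt[a, b] = c * total / (len(inn[a]) * len(inn[b]))
        r = nxt
    return r


r3 = simrank_iterate(EDGES, k=3)
print(round(r3["Univ", "ProfB"], 3))  # 0.128
```

Calling it with k=1 and k=2 exposes the intermediate values used in the example: R_1(ProfA, ProfB) = 0.4 and R_2(StudA, StudB) = 0.32, while R_2(StudA, Univ) = 0.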


A Novel Similarity Metric (1/2)
• Mutual reinforcement factor Ψ: it gives more relevance to tags that labeled the very same resources than to tags that labeled related (but not the very same) resources.
• Ψ_ij is equal to 1 if i = j (the same resource), while it takes a smaller value if i ≠ j.

[Equations (5)-(9): not recoverable from the slide; one of them is labeled "Cosine similarity"]


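A Novel Similarity Metric (2/2)

[Figure: example tag-resource matrix TR]

• Initial similarity (the same value cosine similarity gives):

$$ \mathit{st}_0(\mathit{human}, \mathit{interface}) = \frac{1}{\sqrt{2} \cdot \sqrt{2}} = \frac{1}{2} $$

• How is st_1(human, interface) computed? At iteration k, with n_r resources and sr_{k-1} the resource similarity from the previous iteration:

$$ \mathit{ST}_k(t_a, t_b) = \sum_{i,j=1}^{n_r} \mathit{TR}_{ai} \cdot \Psi_{ij} \cdot \mathit{sr}_{k-1}(r_i, r_j) \cdot \mathit{TR}_{bj} $$

$$ \mathit{st}_k(t_a, t_b) = \frac{\mathit{ST}_k(t_a, t_b)}{\sqrt{\mathit{ST}_k(t_a, t_a)} \cdot \sqrt{\mathit{ST}_k(t_b, t_b)}} $$

• For human (h) and interface (i), with ST_1(h, i) = 1.438:

$$ \mathit{ST}_1(h, h) = 1 + 0.6 \cdot \frac{1}{\sqrt{18}} + 0.6 \cdot \frac{1}{\sqrt{18}} + 1 \approx 2.283 $$

$$ \mathit{ST}_1(i, i) = 1 + 0.6 \cdot \frac{1}{\sqrt{12}} + 0.6 \cdot \frac{1}{\sqrt{12}} + 1 \approx 2.346 $$

$$ \mathit{st}_1(\mathit{human}, \mathit{interface}) = \frac{1.438}{\sqrt{2.283} \cdot \sqrt{2.346}} \approx \frac{1.438}{2.314} \approx 0.62 $$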
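The two formulas above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: Ψ is taken to be 1 on the diagonal and 0.6 off it (inferred from the worked terms, not stated on the slide), the resource similarities sr_0 are the 1/√18 and 1/√12 values from the example, and ST_1(h, i) = 1.438 is copied from the slide rather than recomputed. The function names `st_raw` and `st_norm` are hypothetical.

```python
import math


def st_raw(tr_a, tr_b, psi, sr_prev):
    """ST_k(t_a, t_b) = sum over i, j of TR_ai * Psi_ij * sr_{k-1}(r_i, r_j) * TR_bj."""
    n = len(tr_a)
    return sum(
        tr_a[i] * psi[i][j] * sr_prev[i][j] * tr_b[j]
        for i in range(n)
        for j in range(n)
    )


def st_norm(st_ab, st_aa, st_bb):
    """st_k(t_a, t_b) = ST_k(t_a, t_b) / (sqrt(ST_k(t_a, t_a)) * sqrt(ST_k(t_b, t_b)))."""
    return st_ab / (math.sqrt(st_aa) * math.sqrt(st_bb))


# Inputs restricted to the two resources each tag labels: every other resource
# has TR = 0 for that tag and contributes nothing to the sum.
PSI = [[1.0, 0.6], [0.6, 1.0]]  # assumed: 1 if i == j, 0.6 otherwise
SR0_HUMAN = [[1.0, 1 / math.sqrt(18)], [1 / math.sqrt(18), 1.0]]
SR0_INTERFACE = [[1.0, 1 / math.sqrt(12)], [1 / math.sqrt(12), 1.0]]

st_hh = st_raw([1, 1], [1, 1], PSI, SR0_HUMAN)      # ST_1(h, h)
st_ii = st_raw([1, 1], [1, 1], PSI, SR0_INTERFACE)  # ST_1(i, i)
st_hi = 1.438  # ST_1(h, i), taken from the slide
print(round(st_hh, 3), round(st_ii, 3))       # 2.283 2.346
print(round(st_norm(st_hi, st_hh, st_ii), 2))  # 0.62
```

The normalization step has the same shape as cosine similarity in equation (1), which is why st_k(t, t) is always 1.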


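Tag expansion (1/2)
• Key to their approach is the use of the previously computed tag similarities to automatically expand the tag set chosen by the user.
• Let tSet be the set of user-selected tags, t_j a tag in tSet, and t_i a tag not in tSet. The score of t_i is:

$$ SC(t_i, \mathit{tSet}) = \sum_{t_j \in \mathit{tSet}} sc(t_i, t_j) \tag{10} $$

$$ sc(t_i, t_j) = st(t_i, t_j) \cdot \log \mathit{count}(t_i) \cdot \mathit{IRF}(t_i) \tag{11} $$

where st(t_i, t_j) is the previously computed similarity, count(t_i) is the number of times t_i appears in the folksonomy, and IRF(t_i) is the inverse resource frequency of t_i.
• The log count(t_i) factor favors largely used tags, while IRF(t_i) favors important ones.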


Tag expansion (2/2)
• Assume tSet = {tree, sea, sky}; sun and fruit are tags not chosen by the user.
• Each candidate t_i is scored with SC(t_i, tSet), i.e. by summing sc(t_i, t_j) = st(t_i, t_j) · log count(t_i) · IRF(t_i) over the tags t_j in tSet.
• The top-k highest-scoring tags are recommended, and users can decide which ones to use.
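A minimal sketch of equations (10) and (11) plus the top-k recommendation step, using the example's tag names. Every number below (similarities, counts, IRF values) is made up for illustration, the natural logarithm is assumed because the slide does not give a base, and the function names are hypothetical.

```python
import math

# Hypothetical inputs: all values are invented for illustration only.
ST = {  # previously computed tag similarities st(t_i, t_j)
    ("sun", "tree"): 0.3, ("sun", "sea"): 0.5, ("sun", "sky"): 0.8,
    ("fruit", "tree"): 0.6, ("fruit", "sea"): 0.1, ("fruit", "sky"): 0.1,
}
COUNT = {"sun": 120, "fruit": 80}  # occurrences of each tag in the folksonomy
IRF = {"sun": 1.2, "fruit": 1.5}   # inverse resource frequency (assumed values)


def pair_score(ti, tj):
    """Equation (11): sc(t_i, t_j) = st(t_i, t_j) * log count(t_i) * IRF(t_i)."""
    return ST[ti, tj] * math.log(COUNT[ti]) * IRF[ti]


def set_score(ti, t_set):
    """Equation (10): SC(t_i, tSet), summing sc(t_i, t_j) over tSet."""
    return sum(pair_score(ti, tj) for tj in t_set)


def recommend(t_set, candidates, k):
    """Return the k highest-scoring candidate tags and all candidate scores."""
    scores = {t: set_score(t, t_set) for t in candidates}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


top, scores = recommend(["tree", "sea", "sky"], ["sun", "fruit"], k=1)
print(top)  # ['sun']
```

Because log count(t_i) and IRF(t_i) depend only on the candidate, SC(t_i, tSet) equals the summed similarity to tSet scaled by a per-candidate weight; with these made-up numbers, sun's stronger similarity to sky and sea outweighs fruit's higher IRF.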


Computational complexity
• From a theoretical standpoint, the computation of each pairwise tag similarity may require an infinite number of iterations.
• This could make our similarity measure inapplicable in practical cases, because each iteration would require exactly … computations.


Evaluation
• Is our approach able to increase the accuracy of searches?
• Does our approach scale to large folksonomies?


Datasets: Bibsonomy & CiteULike

               Bibsonomy   CiteULike
Bookmarks        648,924   2,281,609
Users              4,696      57,053
Papers           578,587   1,928,302
Distinct tags    147,076     401,620


Accuracy of User Searches
• The first experiment aimed at determining the ability of the approach to retrieve resources relevant to the user querying the folksonomy.
• Tag expansion can yield better results.

Figure 1: Retrieved Ratio on Bibsonomy and CiteULike


Scalability
• As previously pointed out, the highest cost of the approach lies in the computation of pairwise tag similarities.
• The result in (12) confirms that their similarity measure is scalable and well suited even to large folksonomies.


Conclusion
• The authors have proposed an approach that enables the effective retrieval of resources within folksonomies.
• The new tag similarity metric is used both when users label resources and when users query the folksonomy.
• Finally, the computational cost of the iterative approach is limited: convergence is guaranteed and, in practice, reached after a handful of iterations.


Thanks for listening