Exploring Linkability of User Reviews Mishari Almishari and Gene Tsudik University of California, Irvine


Page 1: Exploring  Linkability of User Reviews

Exploring Linkability of User Reviews

Mishari Almishari and Gene Tsudik

University of California, Irvine

Page 2: Exploring  Linkability of User Reviews

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 3: Exploring  Linkability of User Reviews

Motivation

Increasing Popularity of Reviewing Sites

Yelp, more than 39M visitors and 15M reviews in 2010

Page 4: Exploring  Linkability of User Reviews

Example

[Figure: sample review annotated with its category and rating]

Page 5: Exploring  Linkability of User Reviews

Motivation

Rising awareness of privacy

Page 6: Exploring  Linkability of User Reviews

Motivation

How is it applied?

Traceability/Linkability

Linkability of Ad hoc Reviews

Linkability of Several Accounts

Page 7: Exploring  Linkability of User Reviews

Goal

Assess the linkability in user reviews

Page 8: Exploring  Linkability of User Reviews

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 9: Exploring  Linkability of User Reviews

Data Set

• 1 Million Reviews
• 2000 Users
• More than 300 reviews per user

Page 10: Exploring  Linkability of User Reviews

Problem Settings

Page 11: Exploring  Linkability of User Reviews

Problem Settings

Page 12: Exploring  Linkability of User Reviews

Problem Formulation

IR: Identified Record
AR: Anonymous Record

[Figure: a set of Identified Records (IR) alongside a set of Anonymous Records (AR)]

Page 13: Exploring  Linkability of User Reviews

Problem Settings

Anonymous Record (AR)
Identified Records (IR’s)
Matching Model
Top-X Linkability, X: 1 and 10
Anonymous Record sizes: 1, 5, 10, 20, …, 60
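The Top-X linkability measure can be illustrated with a short sketch. It assumes the matching model has already produced, for each AR, a list of candidate IRs ordered from best to worst match. The function name `top_x_linkability` and the dictionary layout are hypothetical, chosen here only for illustration.

```python
def top_x_linkability(rankings, true_authors, x):
    """Fraction of ARs whose true author appears among the first x ranked IRs.

    rankings:     dict mapping AR id -> list of IR ids, best match first
    true_authors: dict mapping AR id -> IR id of the actual author
    """
    hits = sum(
        1
        for ar_id, ranked_irs in rankings.items()
        if true_authors[ar_id] in ranked_irs[:x]
    )
    return hits / len(rankings)


# Toy example with two anonymous records and three candidate users.
rankings = {
    "ar1": ["u1", "u2", "u3"],
    "ar2": ["u3", "u2", "u1"],
}
true_authors = {"ar1": "u1", "ar2": "u2"}

print(top_x_linkability(rankings, true_authors, 1))  # 0.5
print(top_x_linkability(rankings, true_authors, 2))  # 1.0
```

With X = 1 only an exact first-place match counts; with X = 10 the true author only has to appear in the first ten candidates.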

Page 14: Exploring  Linkability of User Reviews

Methodologies

(1) Naïve Bayesian Model
    Decreasing Sorted List of IRs

(2) Kullback-Leibler Divergence (KLD)
    Increasing Sorted List of IRs

Maximum-Likelihood Estimation
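A minimal sketch of how the two rankings could be produced from letter counts follows. Several details are assumptions made here, since the slides do not specify them: the add-one smoothing, the direction of the divergence (IR relative to AR), and all function names. Setting `smoothing=0` in `distribution` gives the plain maximum-likelihood estimate.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letter_counts(text):
    """Count the a-z letters of text, ignoring case and other characters."""
    return Counter(ch for ch in text.lower() if ch in ALPHABET)


def distribution(counts, smoothing=0.0):
    """Relative frequencies over the alphabet (smoothing=0 is the MLE)."""
    total = sum(counts[t] for t in ALPHABET) + smoothing * len(ALPHABET)
    return {t: (counts[t] + smoothing) / total for t in ALPHABET}


def nb_log_likelihood(ar_counts, ir_counts, smoothing=1.0):
    """Log-probability of the AR's letters under the IR's letter distribution."""
    q = distribution(ir_counts, smoothing)
    return sum(ar_counts[t] * math.log(q[t]) for t in ALPHABET)


def kl_divergence(p, q):
    """D(p || q) for two distributions over the same alphabet (q must be > 0)."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in ALPHABET if p[t] > 0)


def rank_by_nb(ar_counts, irs):
    """IR ids sorted by decreasing log-likelihood (best match first)."""
    return sorted(irs, key=lambda u: nb_log_likelihood(ar_counts, irs[u]),
                  reverse=True)


def rank_by_kld(ar_counts, irs, smoothing=1.0):
    """IR ids sorted by increasing divergence (best match first)."""
    ar_dist = distribution(ar_counts, smoothing)
    return sorted(
        irs,
        key=lambda u: kl_divergence(distribution(irs[u], smoothing), ar_dist),
    )


# Toy data: two identified users and one anonymous record.
irs = {
    "alice": letter_counts("zzz zzz pizza pizzazz"),
    "bob": letter_counts("great food and good service"),
}
ar = letter_counts("pizza pizzazz")
nb_ranking = rank_by_nb(ar, irs)
kld_ranking = rank_by_kld(ar, irs)
```

The two sort directions mirror the slide: a higher likelihood means a better match, while a lower divergence means a better match.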

Page 15: Exploring  Linkability of User Reviews

Tokens

• Unigram
    • “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y”
    • 26 values
• Digram
    • “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy”
    • 676 values
• Rating
    • 5 values
• Category
    • 28 values
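The lexical tokens above can be extracted with a few lines of code. This is a sketch under two assumptions not stated on the slide: text is lowercased before tokenizing, and digrams are taken within words only (never across spaces or punctuation). The function names are hypothetical.

```python
import re
import string


def letter_unigrams(text):
    """Single a-z letters of text, in order, ignoring case."""
    return [ch for ch in text.lower() if ch in string.ascii_lowercase]


def letter_digrams(text):
    """Adjacent letter pairs inside each alphabetic run of text."""
    digrams = []
    for word in re.findall(r"[a-z]+", text.lower()):
        digrams.extend(word[i:i + 2] for i in range(len(word) - 1))
    return digrams


print(letter_unigrams("privacy"))  # ['p', 'r', 'i', 'v', 'a', 'c', 'y']
print(letter_digrams("privacy"))   # ['pr', 'ri', 'iv', 'va', 'ac', 'cy']

# Size of each token space: 26 letters, 26 * 26 = 676 letter pairs.
print(len(string.ascii_lowercase), len(string.ascii_lowercase) ** 2)  # 26 676
```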

Page 16: Exploring  Linkability of User Reviews

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 17: Exploring  Linkability of User Reviews

Unigram Results

[Plot: NB-Unigram; Linkability Ratio vs. Anonymous Record Size]

Size 60: LR 83% (Top-1), LR 96% (Top-10)

Page 18: Exploring  Linkability of User Reviews

Digram Results

[Plot: NB-Digram; Linkability Ratio vs. Anonymous Record Size]

Size 20: LR 97% (Top-1)
Size 10: LR 88% (Top-1)

Page 19: Exploring  Linkability of User Reviews

Improvement (1): Combining Lexical and Non-Lexical Tokens

[Plot: NB Model; Linkability Ratio vs. Anonymous Record Size]

Gain, up to 20%
Size 60: 83% to 96%
Size 30: 60% to 80%
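One way to combine lexical and non-lexical tokens in a Naïve Bayesian model is to add up the log-probabilities of every token type, treating them as independent. The sketch below assumes this additive combination and add-one smoothing; neither is confirmed by the slide, and the names `smoothed_log_prob`, `combined_nb_score`, and `DOMAIN_SIZES` are hypothetical.

```python
import math
from collections import Counter

# Number of possible values per token type, as listed on the Tokens slide.
DOMAIN_SIZES = {"letter": 26, "rating": 5, "category": 28}


def smoothed_log_prob(value, counts, num_values, smoothing=1.0):
    """Log of the add-one-smoothed relative frequency of value in counts."""
    total = sum(counts.values()) + smoothing * num_values
    return math.log((counts[value] + smoothing) / total)


def combined_nb_score(ar_tokens, ir_counts):
    """Sum of log-probabilities of all AR tokens under one IR's counts.

    ar_tokens: dict mapping token type -> list of observed values in the AR
    ir_counts: dict mapping token type -> Counter of values in the IR
    """
    score = 0.0
    for token_type, values in ar_tokens.items():
        counts = ir_counts[token_type]
        for value in values:
            score += smoothed_log_prob(value, counts, DOMAIN_SIZES[token_type])
    return score


# Two users with identical letter usage but different ratings and categories.
ir_a = {
    "letter": Counter("goodfood"),
    "rating": Counter([5, 5, 4]),
    "category": Counter(["restaurants"] * 3),
}
ir_b = {
    "letter": Counter("goodfood"),
    "rating": Counter([1, 2, 1]),
    "category": Counter(["hotels"] * 3),
}
ar = {"letter": list("food"), "rating": [5], "category": ["restaurants"]}

score_a = combined_nb_score(ar, ir_a)
score_b = combined_nb_score(ar, ir_b)
```

In this toy case the letters alone cannot separate the two users, so the rating and category tokens decide the match.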

Page 20: Exploring  Linkability of User Reviews

What about Restricting Identified Record (IR) Size?

[Plots: NB Model and KLD Model; Linkability Ratio vs. Anonymous Record Size]

Affected by IR size
Performed better for smaller IR
Size 20 or less, improved

Page 21: Exploring  Linkability of User Reviews

Improvement (2): Matching All IR’s At Once

[Diagram: nodes v1 to v16, two of them marked ✖]
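The slide does not spell out the matching procedure, so the following is only one possible reading: instead of ranking IRs for each AR independently, all (AR, IR) pairs are considered together and each IR is assigned to at most one AR. The greedy strategy and the name `greedy_one_to_one` are assumptions made for illustration, not the authors' confirmed method.

```python
def greedy_one_to_one(scores):
    """Assign each AR to a distinct IR, taking the best-scoring pairs first.

    scores: dict mapping (ar_id, ir_id) -> score, where higher is better
    Returns a dict mapping ar_id -> ir_id.
    """
    assignment = {}
    used_irs = set()
    # Visit pairs from the highest score to the lowest.
    for ar_id, ir_id in sorted(scores, key=scores.get, reverse=True):
        if ar_id not in assignment and ir_id not in used_irs:
            assignment[ar_id] = ir_id
            used_irs.add(ir_id)
    return assignment


# Matched independently, both ARs would pick u1; matched jointly, ar2 gets u2.
scores = {
    ("ar1", "u1"): -1.0,
    ("ar1", "u2"): -2.0,
    ("ar2", "u1"): -1.5,
    ("ar2", "u2"): -3.0,
}
print(greedy_one_to_one(scores))  # {'ar1': 'u1', 'ar2': 'u2'}
```

The point of the toy example is that a confident match for one AR removes that IR from consideration for the others.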

Page 22: Exploring  Linkability of User Reviews

Matching All Results

[Plots: Restricted IR and Full IR; Linkability Ratio vs. Anonymous Record Size]

Gain, up to 16%
Size 30: from 74% to 90%

Gain, up to 23%
Size 20: from 35% to 55%

Page 23: Exploring  Linkability of User Reviews

Improvement (3): For Small IR Size

Changing it to: 0.5 + Review Length

[Plot: Linkability Ratio vs. Anonymous Record Size]

Size 10: 89% to 92%
Size 7: 79% to 84%
Gain up to 5%

Page 24: Exploring  Linkability of User Reviews

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 25: Exploring  Linkability of User Reviews

Discussion

o Unigram and Scalability
    o 26 vs. 676
    o 59 vs. 676
    o Less than 10%
o Prolific Users
    o In the long run, will be prolific
o Anonymous Record Size
    o A set of 60 reviews, less than 20% of minimum contribution
o Detecting Spam Reviews
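The figures in this discussion can be checked with simple arithmetic. The reading of 59 as the sum of the unigram, rating, and category token counts (26 + 5 + 28) is an assumption based on the Tokens slide, and 300 is used as the lower bound on per-user reviews from the Data Set slide.

```latex
\frac{26}{676} \approx 3.8\%, \qquad
\frac{26 + 5 + 28}{676} = \frac{59}{676} \approx 8.7\% < 10\%, \qquad
\frac{60}{300} = 20\%
```

Since every user in the data set has more than 300 reviews, a set of 60 reviews is strictly less than 20% of any user's contribution.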

Page 26: Exploring  Linkability of User Reviews

Roadmap

1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion

Page 27: Exploring  Linkability of User Reviews

Future Work

o Improving more for Small AR’s
o Other Probabilistic Models
o Using Stylometry
o Review Anonymization
o Exploring Linkability in other Preference Databases

Page 28: Exploring  Linkability of User Reviews

Conclusion

o Extensive Study to Assess Linkability of User Reviews
    o For a large set of users
    o Using very simple features
o Users are very exposed even with simple features and a large number of authors

Takeaway Point: Reviews can be accurately de-anonymized using alphabetical letter distributions

Page 29: Exploring  Linkability of User Reviews

Questions?