Quality of Claim Metrics in Social Sensing Systems: A case study on IranDeal

Pooria Taghizadeh, Dr. Hadi Tabatabaee, Dr. Mona Ghassemian, Dr. Hamed Haddadi

Outline

• Introduction
• Sources of claim uncertainty and invalidity
• Quality of claim metrics
• Datasets
• Evaluation and analysis
• Conclusion

Introduction

What is a social sensing system?
Social sensing refers to systems that use people as sensors who report (claim) the events happening in their surroundings.

The main components

Uncertainty and Invalidity

Problems:
• Spam
• Gossip
• User inaccuracy
• Sensor inaccuracy

Sources of Claim Uncertainty & Invalidity

• Gossip
  ◦ Identified with regular expressions, e.g.:
    ◦ “is (that | this | it) true”
    ◦ “wh[a]*t[?!][?1]*”
• Spam
  ◦ In web-based systems: CAPTCHA
  ◦ In social networks: by analyzing inputs such as tags, links, tips and comments
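The two gossip patterns can be tried out directly. The following is a minimal Python sketch. It assumes case-insensitive matching and treats the spaces around "|" in the first pattern as typography, so that pattern is written `is (that|this|it) true`. The `[?1]` class is kept verbatim. `is_gossip` is a hypothetical helper name, not part of the original work.

```python
import re

# Gossip patterns quoted on the slide. Assumptions: matching is case-insensitive,
# and the spaces around "|" on the slide are typographic, so they are dropped.
GOSSIP_PATTERNS = [
    re.compile(r"is (that|this|it) true", re.IGNORECASE),
    re.compile(r"wh[a]*t[?!][?1]*", re.IGNORECASE),  # matches "what?", "whaat?!", "what?1?"
]


def is_gossip(claim: str) -> bool:
    """Hypothetical helper: flag a claim if any gossip pattern occurs anywhere in it."""
    return any(p.search(claim) for p in GOSSIP_PATTERNS)
```

For example, `is_gossip("Is that true? #IranDeal")` is `True`, while a plain statement such as "The talks resume on Monday." is not flagged.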

Sources of Claim Uncertainty & Invalidity (Cont.)

Inaccuracy of users
• People are the core element of the social sensing system
• Main weak point of the system: human errors
• Claims cannot be fully trusted

Sources of Claim Uncertainty & Invalidity (Cont.)

Claim validation assessment:
• How can valid claims be identified?
• This issue has been studied on the web before:
  ◦ Fact-finding algorithms such as Sums, Average·Log and Investment
• Some possible solutions:
  ◦ Machine learning
  ◦ Natural language processing
  ◦ Data mining
  ◦ Clustering methods
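Sums, the simplest of the fact-finders listed above, alternates between two updates. A source's trustworthiness is the sum of the beliefs in the claims it asserts. A claim's belief is the sum of the trustworthiness of the sources asserting it. Both are normalized by their maximum in each round. The Python sketch below is illustrative only and is not the method evaluated in this work. Its input format (a mapping from each source to the set of claims it asserts), the uniform initial belief and the fixed iteration count are all assumptions.

```python
from typing import Dict, Hashable, Set, Tuple

Source = Hashable
Claim = Hashable


def sums_fact_finder(
    assertions: Dict[Source, Set[Claim]], iterations: int = 20
) -> Tuple[Dict[Source, float], Dict[Claim, float]]:
    """Return (trust per source, belief per claim), each max-normalized to 1."""
    claims = {c for cs in assertions.values() for c in cs}
    if not claims:
        return {s: 0.0 for s in assertions}, {}
    belief = {c: 1.0 for c in claims}  # assumption: uniform initial belief
    trust: Dict[Source, float] = {}
    for _ in range(iterations):
        # Trust of a source = sum of beliefs of the claims it asserts.
        trust = {s: sum(belief[c] for c in cs) for s, cs in assertions.items()}
        max_t = max(trust.values())
        trust = {s: t / max_t for s, t in trust.items()}
        # Belief in a claim = sum of trust of the sources asserting it.
        belief = dict.fromkeys(claims, 0.0)
        for s, cs in assertions.items():
            for c in cs:
                belief[c] += trust[s]
        max_b = max(belief.values())
        belief = {c: b / max_b for c, b in belief.items()}
    return trust, belief
```

Here, a claim asserted by several trusted users ends with a higher belief than one asserted by a single user who asserts nothing else.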

Quality of claim metrics

Content Measure:
• The richness of a claim's content facilitates the back-end applications.

Feedback (Popularity) Measure:
• Each claim published on a social network may provoke reactions:
  ◦ users' judgments
  ◦ redistribution of the claim

Content Measure

Content diversity
• The diversity of the types of information: text, video, image

User tagging
• Users can mention and notify each other
• Mentions provide new information about the importance of the claim
• Mentions can be analyzed to find debates between users
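One simple way to act on the last point is to build a directed mention graph and look for pairs of users who mention each other. The Python sketch below is only an illustration, and several of its choices are assumptions rather than the analysis used in the study: reciprocal mentions serve as a proxy for a debate, the input is a list of `(author, text)` pairs, and the `@name` regex defines what counts as a mention. The function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

MENTION_RE = re.compile(r"@(\w+)")  # assumption: Twitter-style @username mentions


def mention_graph(claims: Iterable[Tuple[str, str]]) -> DefaultDict[str, Set[str]]:
    """Map each author to the set of users they mention (self-mentions ignored)."""
    graph: DefaultDict[str, Set[str]] = defaultdict(set)
    for author, text in claims:
        for user in MENTION_RE.findall(text):
            if user != author:
                graph[author].add(user)
    return graph


def reciprocal_pairs(graph: DefaultDict[str, Set[str]]) -> Set[Tuple[str, ...]]:
    """Pairs of users who mention each other: a hypothetical proxy for debates."""
    pairs = set()
    for a, targets in graph.items():
        for b in targets:
            if a in graph.get(b, set()):  # .get does not insert new keys
                pairs.add(tuple(sorted((a, b))))
    return pairs
```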

Content Measure (Cont.)

Quantity of used keywords
• The set of keywords depends on the subject
• Building the set of keywords requires prior knowledge
• The set can be extracted by preprocessing the claims
• A higher number of used keywords increases the value of a claim

Geo-tagging
• Used to pin the locations of the users
• This information is valuable in location-based analysis to cluster the reporting users

Quantity of used hashtags
• Analyzing hashtags is easier than analyzing keywords
• One of the main approaches to querying the claims posted over a specific period of time
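The content measures above can be collected into a per-claim feature vector. This is a minimal Python sketch, and several of its choices are hypothetical rather than the authors' implementation:

• the `Claim` record and its field names;
• a lower-case keyword set supplied by the caller;
• counting distinct keywords, with hashtag text also counted as a word;
• the regexes for hashtags and mentions.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set, Tuple

HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
WORD_RE = re.compile(r"\w+")


@dataclass
class Claim:
    """Hypothetical claim record; field names are assumptions."""
    text: str
    geo: Optional[Tuple[float, float]] = None  # (latitude, longitude) if geo-tagged
    media: Tuple[str, ...] = ()                # e.g. ("image",) or ("video", "image")


def content_features(claim: Claim, keywords: Set[str]) -> dict:
    """Content measures for one claim; `keywords` is a lower-case, subject-specific set."""
    words = {w.lower() for w in WORD_RE.findall(claim.text)}
    return {
        "keywords": len(words & keywords),                # distinct keywords used
        "hashtags": len(HASHTAG_RE.findall(claim.text)),
        "mentions": len(MENTION_RE.findall(claim.text)),  # tagged users
        "geo_tagged": claim.geo is not None,
        "content_types": 1 + len(set(claim.media)),       # text plus distinct media types
    }
```

Keeping the keyword set as a parameter reflects the slide's point that it depends on the subject and must come from prior knowledge or preprocessing.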

Feedback Measure

Opinion reaction
• This parameter can help validate information from unknown users
• In some systems, users rate claims by giving stars

Redistribution
• The number of reclaims shows the popularity of the claim
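The two feedback signals can be collected per claim and used to rank claims by popularity. The Python sketch below is hypothetical. Its assumptions are the `Feedback` fields, the mean star rating as the opinion reaction, and a ranking by reclaims with favorites as the tie-breaker.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Feedback:
    claim_id: str
    reclaims: int = 0    # redistributions, e.g. retweets
    favorites: int = 0   # likes / favorites
    stars: List[int] = field(default_factory=list)  # star ratings, where supported

    @property
    def opinion(self) -> Optional[float]:
        """Mean star rating, or None when the claim has not been rated."""
        return mean(self.stars) if self.stars else None


def rank_by_popularity(items: List[Feedback]) -> List[str]:
    """Claim ids ordered by reclaims, then favorites (both descending)."""
    ranked = sorted(items, key=lambda f: (f.reclaims, f.favorites), reverse=True)
    return [f.claim_id for f in ranked]
```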

Social Network Support

Datasets

Two datasets, one hashtag-centric and one user-centric, were gathered by the crawler for the evaluation.

The first dataset was extracted from Twitter based on the IranDeal hashtag:
• 260,000 tweets
• 66,238 users

The second dataset was extracted from the Foursquare social network:
• 7,402 users
• 40,741 tips
• 35,503 restaurants

Evaluation: Comments/User

• Users are grouped according to the number of claims they reported
• About 14% of the users (36,663 users) post exactly one tweet
• Only 4% post two tweets
• The percentage decreases as the number of tweets increases
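The grouping above amounts to a histogram of claims per user. A minimal Python sketch, assuming the input lists the author id of every collected claim; `claims_per_user_distribution` is a hypothetical name. The same counting pattern applies to the favorites and tagged-user distributions on the following slides.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def claims_per_user_distribution(authors: Iterable[Hashable]) -> Dict[int, float]:
    """Map k -> percentage of users who posted exactly k claims."""
    per_user = Counter(authors)          # user -> number of claims
    if not per_user:
        return {}
    groups = Counter(per_user.values())  # k -> number of users with exactly k claims
    total = len(per_user)
    return {k: 100.0 * n / total for k, n in sorted(groups.items())}
```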

Popularity of comments

• The number of likes of each comment shows its popularity
• Comments are categorized by their number of likes
• A large fraction of tweets (93%) do not get any favorites
• The shares of tweets that get 1 and 2 favorites are 3.4% and 1.1%, respectively

Re-Tweets

• Another popularity metric is the rate at which a comment is shared
• It reveals the dependency between the quality of claim (QoC) metrics and the way the dataset is crawled: people who follow the hashtag are eager to share the news headlines
• The sparsity of the data for retweet counts above 500 affects the results
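A common way to soften the sparsity of a heavy tail, such as retweet counts above 500, is logarithmic binning. Counts are grouped into bins whose widths grow geometrically, and each bin is normalized by its width. The slides do not say whether this was done. The Python sketch below is an illustration with a hypothetical function name and an integer base of 2 by default.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def log_binned_density(values: Iterable[int], base: int = 2) -> List[Tuple[int, int, float]]:
    """Return (low, high, density) for bins [base**i, base**(i+1)); zeros are skipped."""
    if base < 2:
        raise ValueError("base must be an integer >= 2")
    counts: Counter = Counter()
    for v in values:
        if v <= 0:
            continue  # zero counts cannot be placed on a log scale
        i = 0
        while v >= base ** (i + 1):  # integer arithmetic avoids float log rounding
            i += 1
        counts[i] += 1
    n = sum(counts.values())
    result = []
    for i in sorted(counts):
        lo, hi = base ** i, base ** (i + 1)
        result.append((lo, hi, counts[i] / (n * (hi - lo))))  # normalize by bin width
    return result
```

Wide bins in the tail pool the few tweets with very large retweet counts, so single observations no longer dominate the plotted curve.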

Tagged user / comment

• Tags provide extra information that benefits claim-processing applications
• The most frequent case is a comment with a single tagged user (140,191 tweets)
• The maximum number of users tagged in a single tweet is 12
• Around 15% of tweets tag exactly two users, and the share decreases for higher numbers

Evaluation and analysis

Power-law distribution
◦ We used Zipf's law.
◦ The exponent s determines the slope of the curve on a log-log scale.

Comparing the values of s for these datasets implies that the nature of the social network used affects the characteristics of the dataset.
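Zipf's law models the frequency of the k-th value as f(k) ∝ 1/k^s. Taking logarithms gives log f(k) = c − s·log k, so s is the negative slope of the line on a log-log plot. The slides do not state how s was estimated. The Python sketch below uses an ordinary least-squares fit in log-log space as an assumed, simple estimator; the function name is hypothetical. For example, k could be the number of claims per user and f(k) the number of users with exactly k claims.

```python
import math
from typing import Mapping


def fit_power_law_exponent(freq: Mapping[int, float]) -> float:
    """Estimate s in f(k) ~ C * k**(-s) by least squares on (log k, log f(k))."""
    points = [(math.log(k), math.log(f)) for k, f in freq.items() if k > 0 and f > 0]
    if len(points) < 2:
        raise ValueError("need at least two points with k > 0 and f(k) > 0")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx  # the fitted slope equals -s
```

Least squares on log-log data is easy to interpret but is known to be biased for heavy-tailed data. A maximum-likelihood estimator is the more robust choice when the exact value of s matters, for instance when comparing the Twitter and Foursquare datasets.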

Conclusion

• We review the sources of claim uncertainty and invalidity
• We define a new set of quality of claim metrics
• The analysis shows that most of the metrics follow a power law, but this is not a general rule
• The power-law exponent depends on the nature of the dataset and the social network

Questions