Quality of Claim Metrics in Social Sensing Systems: A case study on IranDeal
Pooria Taghizadeh: pooria.tgh@gmail.com
Dr. Hadi Tabatabaee: h_tabatabaee@sbu.ac.ir
Dr. Mona Ghassemian: m_ghassemian@sbu.ac.ir
Dr. Hamed Haddadi: hamed.haddadi@qmul.ac.uk

Outline
• Introduction
• Sources of claim uncertainty and invalidity
• Quality of claim metrics
• Datasets
• Evaluation and analysis
• Conclusion

Introduction
What is a social sensing system?
Social sensing refers to systems that use people as sensors, who report (claim) the events happening in their surroundings.
Figure: The main components

Uncertainty and Invalidity
Problems:
• Spam
• Gossip
• User inaccuracy
• Sensor inaccuracy

Sources of Claim Uncertainty & Invalidity
• Gossip
  ◦ Regular expressions
    ◦ “is (that | this | it) true”
    ◦ “wh[a]*t[?!][?1]*”
• Spam
  ◦ In web-based systems: CAPTCHA
  ◦ In social networks: by analyzing the inputs such as tags, links, tips and comments
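The two gossip patterns listed for this slide can be tried directly on claim text. The following is a minimal sketch, assuming Python's `re` module, case-insensitive matching, and that the spaces inside the first pattern's alternation are only typographic; the name `looks_like_gossip` is illustrative and does not come from the slides.

```python
import re

# The two enquiry patterns quoted on the slide, compiled case-insensitively.
# Spaces inside the alternation are dropped so that "is that true" matches.
GOSSIP_PATTERNS = [
    re.compile(r"is (that|this|it) true", re.IGNORECASE),
    re.compile(r"wh[a]*t[?!][?1]*", re.IGNORECASE),
]


def looks_like_gossip(text: str) -> bool:
    """Return True if any gossip pattern occurs anywhere in the text."""
    return any(pattern.search(text) for pattern in GOSSIP_PATTERNS)
```

A claim such as "Is this true? The deal is signed" would be flagged, while a plain statement without an enquiry phrase would not.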

Sources of Claim Uncertainty & Invalidity (Cont.)
Inaccuracy of users
• People are the core element of the social sensing system
• Main weak points of the system: human errors
• Claims cannot be fully trusted

Sources of Claim Uncertainty & Invalidity (Cont.)
Claim validation assessment:
• How to identify valid claims?
• This issue was raised earlier for the web:
  ◦ Sums, Average Log, Investment
• Some possible solutions:
  ◦ machine learning
  ◦ natural language processing
  ◦ data mining
  ◦ clustering methods
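The slide names Sums, Average Log and Investment without describing them. As a rough illustration only, the sketch below follows the commonly described form of Sums, a Hubs-and-Authorities style iteration in which a source's trust is the sum of the beliefs in its claims and a claim's belief is the sum of the trust of its sources. The function name, the uniform starting belief, the max-normalisation and the iteration count are assumptions, not details taken from the slides.

```python
from collections import defaultdict


def sums_fact_finder(assertions, iterations=20):
    """Iteratively score sources and claims, Hubs-and-Authorities style.

    assertions: iterable of (source, claim) pairs.
    Returns (trust, belief) dicts, each normalised so its maximum is 1.0.
    """
    claims_of = defaultdict(set)   # source -> claims it asserts
    sources_of = defaultdict(set)  # claim -> sources asserting it
    for source, claim in assertions:
        claims_of[source].add(claim)
        sources_of[claim].add(source)
    if not claims_of:
        return {}, {}

    belief = {claim: 1.0 for claim in sources_of}  # assumed uniform start
    trust = {}
    for _ in range(iterations):
        # A source's trust is the sum of the beliefs of its claims.
        trust = {s: sum(belief[c] for c in cs) for s, cs in claims_of.items()}
        top = max(trust.values())
        trust = {s: t / top for s, t in trust.items()}
        # A claim's belief is the sum of the trust of its sources.
        belief = {c: sum(trust[s] for s in ss) for c, ss in sources_of.items()}
        top = max(belief.values())
        belief = {c: b / top for c, b in belief.items()}
    return trust, belief
```

With this scheme, a claim reported by several sources ends up with a higher belief than a claim reported by a single, otherwise unsupported source.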

Quality of claim metrics
Content Measure
• The richness of the claim contents facilitates the back-end applications.
Feedback (Popularity) Measure
• Each claim published on a social network may provoke reactions:
  ◦ users' judgments
  ◦ redistribution of the claim

Content Measure
Content diversity
• The diversity of the types of information: text, video, image
User tagging
• Users can mention and notify each other
• Tagging provides new information about the importance of the claim
• Mentions can be analyzed to find debates between users

Content Measure (Cont.)
Quantity of used keywords
• The set of keywords depends on the subject
• The set of keywords requires prior knowledge
• The set can be extracted by preprocessing the claims
• A higher number of used keywords increases the value of the claim
Geo-tagging
• It is used to pin the locations of the users
• The information is valuable in location-based analysis to cluster the reporting users
Quantity of used hashtags
• Analyzing hashtags is easier than analyzing keywords
• Hashtags are one of the main ways to query the claims posted over a specific period of time
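The content-measure signals listed above (content diversity, user tagging, keywords, geo-tagging, hashtags) could be extracted per claim roughly as follows. This is a sketch under stated assumptions: the `Claim` record, the `content_features` function, the simple `#\w+` and `@\w+` patterns, and the choice to count text as one content type are all illustrative and are not defined in the slides. The slides also do not say how the signals are weighted, so the sketch only reports raw counts.

```python
import re
from dataclasses import dataclass, field

HASHTAG = re.compile(r"#\w+")   # assumed hashtag syntax
MENTION = re.compile(r"@\w+")   # assumed user-tag syntax


@dataclass
class Claim:
    text: str
    media_types: set = field(default_factory=set)  # e.g. {"image", "video"}
    geotagged: bool = False


def content_features(claim: Claim, keywords: set) -> dict:
    """Count the content-measure signals for one claim."""
    words = {w.strip(".,!?:;#@").lower() for w in claim.text.split()}
    return {
        # Assumption: text counts as one content type; attached media add more.
        "content_diversity": 1 + len(claim.media_types),
        "tagged_users": len(MENTION.findall(claim.text)),
        "keywords": len(words & {k.lower() for k in keywords}),
        "geotagged": claim.geotagged,
        "hashtags": len(HASHTAG.findall(claim.text)),
    }
```

The keyword set is passed in by the caller because, as noted above, it depends on the subject and needs prior knowledge.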

Feedback Measure
Opinion reaction
• This parameter can help validate the information reported by unknown users.
• In some systems, users may rate a claim by giving stars
Redistribution
• The number of reclaims shows the popularity of the claim
Social Network Support

Datasets
Two datasets, one hashtag-centric and one user-centric, were gathered by the crawler for the evaluation.
The first dataset was extracted from Twitter based on the IranDeal hashtag:
• 260,000 tweets
• 66,238 users
The second dataset was extracted from the Foursquare social network:
• 7,402 users
• 40,741 tips
• 35,503 restaurants

Evaluation: Comments/User
The users are grouped according to the number of reported claims.
The 36,663 users who post exactly 1 tweet (about 55% of the 66,238 users) account for about 14% of the 260,000 tweets.
Only 4% have two posts.
The percentage decreases as the number of tweets increases.
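The grouping described here can be sketched with two counting passes: first tweets per user, then users per tweet count. The function name `share_by_activity` and the input format (one author id per tweet) are assumptions for illustration. Normalising by the number of users is also an assumption, since the slide does not state its denominator.

```python
from collections import Counter


def share_by_activity(tweet_authors):
    """Map each per-user tweet count to the fraction of users with that count.

    tweet_authors: iterable with one author id per tweet.
    """
    tweets_per_user = Counter(tweet_authors)             # user -> number of tweets
    users_per_count = Counter(tweets_per_user.values())  # count -> number of users
    total_users = len(tweets_per_user)
    return {
        count: users / total_users
        for count, users in sorted(users_per_count.items())
    }
```

The same two-pass counting applies to the other per-claim metrics, for example favorites or re-tweets per tweet, by replacing the author ids with the per-tweet values.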

Popularity of comments
The number of likes for each comment shows its popularity.
The comments are categorized based on their number of likes.
A large fraction of tweets (93%) do not get any favorites.
The portions of tweets that get 1 and 2 favorites are 3.4% and 1.1%, respectively.

Re-Tweets
Another popularity metric is the rate at which a comment is shared.
It shows the dependency between the quality of claim (QoC) metrics and the way the dataset is crawled:
people who follow the hashtag are eager to share the news headlines.
The sparsity of the data for values higher than 500 affects the results.

Tagged user / comment
The tags provide extra information that boosts claim-processing applications.
The highest frequency belongs to comments with a single tagged user (140,191 tweets).
The largest number of users tagged in a single tweet is 12.
Around 15% of tweets tag exactly two users, and the share decreases for higher numbers.

Evaluation and analysis
Power law distribution
◦ We used Zipf's law.
◦ s determines how steep the slope of the curve is.
Comparing the value of s for these datasets implies that the nature of the social network used affects the characteristics of the dataset.
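Zipf's law states that the frequency of the item with rank k is proportional to k^(-s). The slides do not say how s was estimated, so the sketch below swaps in one simple option: a least-squares line fitted in log-log space, whose negated slope is taken as s. The function name `zipf_exponent` and the rank-frequency input format are assumptions for illustration.

```python
import math


def zipf_exponent(frequencies):
    """Estimate s in f(k) ~ C * k**(-s) by a least-squares line in log-log space.

    frequencies: positive counts; they are ranked here from largest to smallest.
    """
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive frequencies")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance  # the fitted line has slope -s
```

Fitting each dataset separately and comparing the resulting values of s is one way to carry out the comparison described above. Sparse tails can distort a plain least-squares fit, so the estimate should be read with care.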

Conclusion
We reviewed the sources of claim uncertainty and invalidity.
We defined a new set of quality of claim metrics.
The analysis shows that most of the metrics follow a power law, but this is not a general rule.
The degree of the power law depends on the nature of the dataset and the social network.
Questions