Talk by Ke Tao (Web Information Systems, TU Delft) at the 23rd ACM Conference on Hypertext and Social Media, June 28, 2012, Milwaukee, WI, USA
Delft University of Technology
Semantics + Filtering + Search = Twitcident
Exploring Information in Social Web Streams
Hypertext 2012, Milwaukee, WI – June 28
Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao
Web Information Systems, TU Delft, the Netherlands
200,000,000: number of tweets published per day
Pukkelpop 2011
People tweet about everything, everywhere :-)
Pukkelpop 2011
81,000 tweets in four hours
became a tragedy
Filtering
200,000,000
Search & Analytics
Useful tweets?
Case Nijmegen: Train accident
First tweet…
And then your train blasts off full of the anvils. #Nijmegen #veolia
First picture…
Astonishing! My train rams the platform at Nijmegen!
http://pic.twitter.com/QVVfJHyd
Traditional news media
A train rammed the anvils at Nijmegen.
Research Challenges
1. (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident?
2. Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets?
Diagram: Twitter streams, Filtering (topic), Search & Analytics (information need)
Twitcident Pipeline
Automatic Filtering
Search & Analytics
Twitcident system
Incident detection
• Twitcident relies on emergency broadcasting services to detect incidents.
• In the Netherlands: the P2000 communication network
Incident Profiling
• For an incident i, the profile P(i) is described as a set of tuples.
• Each tuple includes a facet-value pair (f, v) and its weight to the incident i.
Example profile (facet, value: weight):
• Location, Netherlands: 0.4
• Incident, Train accident: 0.5
• Location, Nijmegen: 0.8
• Organization, Veolia: 0.6
• Incident, Crash: 1.0
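The profile on this slide can be written down as a small data structure. The following is a minimal sketch, assuming a plain dictionary that maps each facet-value pair to its weight; the names `Profile`, `nijmegen_profile`, and `top_pairs` are illustrative and do not come from the Twitcident code base.

```python
from typing import Dict, List, Tuple

# A facet-value pair (f, v), e.g. ("Location", "Nijmegen").
FacetValue = Tuple[str, str]
# Assumed representation: profile = mapping from (f, v) to its weight.
Profile = Dict[FacetValue, float]

# Weights copied from the example on the slide.
nijmegen_profile: Profile = {
    ("Location", "Netherlands"): 0.4,
    ("Incident", "Train accident"): 0.5,
    ("Location", "Nijmegen"): 0.8,
    ("Organization", "Veolia"): 0.6,
    ("Incident", "Crash"): 1.0,
}


def top_pairs(profile: Profile, n: int) -> List[Tuple[FacetValue, float]]:
    """Return the n facet-value pairs with the highest weight."""
    return sorted(profile.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    for (facet, value), weight in top_pairs(nijmegen_profile, 3):
        print(f"{facet}, {value}: {weight}")
```

Keying the dictionary on the (facet, value) tuple keeps lookups cheap when tweets are later compared against the profile.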
Social Media Aggregation
• Collecting Twitter messages, pictures, and videos from social media platforms, e.g. Twitter, PhotoBucket, Vimeo
Semantic Enrichment
• Named Entity Recognition
• Classification: Casualties, Damages, Risks…
• Linkage: External Resources
• Metadata extraction
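To make the enrichment step concrete, here is a rough sketch of how a tweet could be turned into facet-value pairs. It assumes a tiny hand-written lookup table (`GAZETTEER`) as a stand-in for a real named entity recognizer; the slides do not say which recognizer Twitcident uses, so the table and the `enrich` function are purely hypothetical.

```python
import re
from typing import Dict, Set, Tuple

FacetValue = Tuple[str, str]

# Hypothetical lookup table standing in for a real NER service.
GAZETTEER: Dict[str, FacetValue] = {
    "nijmegen": ("Location", "Nijmegen"),
    "veolia": ("Organization", "Veolia"),
}


def enrich(text: str) -> Set[FacetValue]:
    """Map each known token in the tweet text to a facet-value pair."""
    tokens = re.findall(r"\w+", text.lower())
    return {GAZETTEER[token] for token in tokens if token in GAZETTEER}


if __name__ == "__main__":
    tweet = "And then your train blasts off full of the anvils. #Nijmegen #veolia"
    print(sorted(enrich(tweet)))
```

A dictionary lookup like this misses spelling variants and ambiguous names, which is one reason a dedicated entity recognizer would be preferred in practice.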
Filtering
• Which tweets are relevant to the incidents?
• Preprocessing: Language detection
• Semantic Filtering: Compare tweet with P(i)
• Semantic Filtering with News Context
• P’(i): P(i) complemented with f-v pairs from news
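The slide does not spell out how a tweet is compared with P(i). One plausible reading is a weighted overlap between the facet-value pairs found in the tweet and those in the profile. The sketch below assumes exactly that: the scoring formula, the 0.3 threshold, the rule of keeping the larger weight when merging news pairs, and the `("Organization", "ProRail")` news pair are illustrative assumptions, not the method reported in the talk.

```python
from typing import Dict, Set, Tuple

FacetValue = Tuple[str, str]
Profile = Dict[FacetValue, float]

# P(i): weights copied from the incident profiling example.
PROFILE: Profile = {
    ("Location", "Netherlands"): 0.4,
    ("Incident", "Train accident"): 0.5,
    ("Location", "Nijmegen"): 0.8,
    ("Organization", "Veolia"): 0.6,
    ("Incident", "Crash"): 1.0,
}


def relevance(tweet_pairs: Set[FacetValue], profile: Profile) -> float:
    """Share of the profile's total weight that the tweet's pairs cover."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    matched = sum(w for pair, w in profile.items() if pair in tweet_pairs)
    return matched / total


def is_relevant(tweet_pairs: Set[FacetValue], profile: Profile,
                threshold: float = 0.3) -> bool:
    """Assumed decision rule: keep the tweet if its score reaches the threshold."""
    return relevance(tweet_pairs, profile) >= threshold


def expand_with_news(profile: Profile, news_pairs: Profile) -> Profile:
    """Build P'(i): add pairs from news, keeping the larger weight on overlap."""
    expanded = dict(profile)
    for pair, weight in news_pairs.items():
        expanded[pair] = max(weight, expanded.get(pair, 0.0))
    return expanded


if __name__ == "__main__":
    tweet = {("Location", "Nijmegen"), ("Organization", "Veolia")}
    print(is_relevant(tweet, PROFILE))
    # Hypothetical pair extracted from a news article about the incident.
    news = {("Organization", "ProRail"): 0.3}
    print(len(expand_with_news(PROFILE, news)))
```

Normalizing by the total profile weight keeps scores between 0 and 1, so a single threshold can be reused after the profile is expanded with news pairs.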
Faceted Search
• Strategies (ranking)
• Frequency-based
• Time-sensitive
• Personalized
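The first two ranking strategies can be contrasted with a short sketch. It assumes each filtered tweet carries a set of facet-value pairs and a timestamp in seconds; the exponential decay with a one-hour half-life is an illustrative choice for "time-sensitive" and is not taken from the talk. The personalized strategy is left out because it would need a user model the slides do not describe.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple, TypedDict, Set

FacetValue = Tuple[str, str]


class Tweet(TypedDict):
    timestamp: float          # seconds since some reference point
    pairs: Set[FacetValue]    # facet-value pairs found in the tweet


def rank_by_frequency(tweets: List[Tweet]) -> List[FacetValue]:
    """Order facet-value pairs by how many tweets mention them."""
    counts = Counter(pair for tweet in tweets for pair in tweet["pairs"])
    return [pair for pair, _ in counts.most_common()]


def rank_time_sensitive(tweets: List[Tweet], now: float,
                        half_life: float = 3600.0) -> List[FacetValue]:
    """Like frequency ranking, but each mention loses half its weight
    every `half_life` seconds (assumed decay scheme)."""
    scores: Dict[FacetValue, float] = defaultdict(float)
    for tweet in tweets:
        age = max(0.0, now - tweet["timestamp"])
        decay = 0.5 ** (age / half_life)
        for pair in tweet["pairs"]:
            scores[pair] += decay
    return sorted(scores, key=lambda pair: scores[pair], reverse=True)


if __name__ == "__main__":
    crash = ("Incident", "Crash")
    veolia = ("Organization", "Veolia")
    stream: List[Tweet] = [
        {"timestamp": 0.0, "pairs": {crash}},
        {"timestamp": 0.0, "pairs": {crash}},
        {"timestamp": 7200.0, "pairs": {veolia}},
    ]
    print(rank_by_frequency(stream))
    print(rank_time_sensitive(stream, now=7200.0))
```

In the sample stream the older pair wins on raw frequency, while the single recent mention wins once older mentions are discounted, which is the behavior a time-sensitive ranking is meant to capture.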
Real-time analytics
What type of things are mentioned in the tweets?
What aspects are mentioned over time?
What do people report about over time?
Impact Area
Evaluation - Dataset
• Twitter corpus (TREC Microblog Track 2011)
• 16 million tweets (Jan. 24th – Feb. 8th, 2011)
• 4,766,901 tweets classified as English
• 6.2 million entity-extractions
• News (same time period)
• 62 RSS news feeds
• 13,959 news articles
• 357,559 entity-extractions
Evaluation: Tweet filtering (1/2)
Semantic strategies outperform keyword-based filtering on all metrics.
Evaluation: Tweet filtering (2/2)
The semantic strategy is more robust and achieves higher precision for complex topics.
Evaluation: Faceted search (1/2)
The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.
Evaluation: Faceted search (2/2)
The strategies with semantic enrichment outperform the strategy without semantic enrichment in predicting the appropriate facet-values.
Conclusions
• What we have done:
• Twitcident, a framework for filtering, searching, and analyzing information about incidents that people publish in their Social Web Streams
• What we have achieved:
• Better filtering of Twitter messages for a given incident.
• Better search for relevant information about an incident within the filtered messages.
Thank you!
Ke Tao @taubau
@wisdelft
http://twitcident.org