Delft University of Technology Semantics + Filtering + Search = Twitcident Exploring Information in Social Web Streams Hypertext 2012, Milwaukee, WI – June 28 Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao Web Information Systems, TU Delft, the Netherlands

Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

DESCRIPTION

Talk by Ke Tao (Web Information Systems, TU Delft) at the 23rd ACM Conference on Hypertext and Social Media, June 28, 2012, Milwaukee, WI, USA

Citation preview

Page 1: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Delft University of Technology

Semantics + Filtering + Search = Twitcident
Exploring Information in Social Web Streams
Hypertext 2012, Milwaukee, WI – June 28

Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao

Web Information Systems, TU Delft, the Netherlands

Page 2: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

200,000,000: number of tweets published per day

Page 3: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Pukkelpop 2011

People tweet about everything, everywhere :-)

Page 4: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Pukkelpop 2011 became a tragedy

81,000 tweets in four hours

200,000,000 tweets → Filtering → Search & Analytics → Useful tweets?

Page 5: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Case Nijmegen: Train accident

Page 6: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

First tweet…

And then your train blasts off full of the anvils. #Nijmegen #veolia

Page 7: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

First picture…

Astonishing! My train rams the platform at Nijmegen!

http://pic.twitter.com/QVVfJHyd

Page 8: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Traditional news media

A train rammed the anvils at Nijmegen.

Page 9: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Research Challenges

1. (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident?

2. Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets?

Diagram labels: Twitter streams; Filtering (topic); Search & Analytics (information need)

Page 10: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Twitcident Pipeline

Automatic Filtering

Search & Analytics

Page 11: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Twitcident system

Page 12: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Twitcident Pipeline

Automatic Filtering

Search & Analytics

Page 13: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Incident detection

• Twitcident relies on Emergency Broadcasting Services for detecting incidents.

• In the Netherlands: the P2000 communication network

Page 14: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Incident Profiling

• For an incident i:
  • The profile of an incident is described as a set of tuples.
  • Each tuple includes a facet-value pair (f, v) and its weight to the incident i.

Example facet-value pairs and weights:

• (Location, Netherlands): 0.4
• (Incident, Train accident): 0.5
• (Location, Nijmegen): 0.8
• (Organization, Veolia): 0.6
• (Incident, Crash): 1.0
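
The profile above can be written down as a small data structure. The following is a minimal sketch, assuming a profile is simply a mapping from (facet, value) pairs to weights. The names `IncidentProfile`, `nijmegen_profile`, and `top_pairs` are illustrative and are not taken from the Twitcident system.

```python
from typing import Dict, List, Tuple

# A facet-value pair (f, v), e.g. ("Location", "Nijmegen").
FacetValue = Tuple[str, str]

# Hypothetical representation: the profile maps each (f, v) pair to its weight.
IncidentProfile = Dict[FacetValue, float]

# Weights copied from the example on this slide.
nijmegen_profile: IncidentProfile = {
    ("Location", "Netherlands"): 0.4,
    ("Incident", "Train accident"): 0.5,
    ("Location", "Nijmegen"): 0.8,
    ("Organization", "Veolia"): 0.6,
    ("Incident", "Crash"): 1.0,
}


def top_pairs(profile: IncidentProfile, k: int) -> List[Tuple[FacetValue, float]]:
    """Return the k facet-value pairs with the highest weight."""
    return sorted(profile.items(), key=lambda item: item[1], reverse=True)[:k]
```

For instance, `top_pairs(nijmegen_profile, 2)` picks out the two pairs that characterize the incident most strongly according to the weights.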

Page 15: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Twitcident Pipeline

Automatic Filtering

Search & Analytics

Page 16: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Social Media Aggregation

• Collecting Twitter messages, pictures, and videos from Social Media Platforms, e.g. Twitter, PhotoBucket, Vimeo

Page 17: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Twitcident Pipeline

Automatic Filtering

Search & Analytics

Page 18: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Semantic Enrichment

• Named Entity Recognition

• Classification: Casualties, Damages, Risks…

• Linkage: External Resources

• Metadata extraction
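
The enrichment steps listed above can be illustrated with a toy example. This is only a sketch under strong assumptions: a small hand-written gazetteer stands in for a real named entity recognizer, keyword lists stand in for trained classifiers, URLs stand in for links to external resources, and hashtags stand in for extracted metadata. None of these names or word lists come from the Twitcident system.

```python
import re
from typing import Dict, List, Set, Tuple

# Toy gazetteer standing in for a real named-entity recognizer (assumption).
GAZETTEER: Dict[str, Tuple[str, str]] = {
    "nijmegen": ("Location", "Nijmegen"),
    "veolia": ("Organization", "Veolia"),
}

# Toy keyword lists standing in for trained classifiers (assumption).
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "Casualties": {"injured", "wounded", "dead"},
    "Damages": {"damaged", "destroyed", "collapsed"},
    "Risks": {"danger", "smoke", "toxic"},
}


def enrich(text: str) -> Dict[str, List]:
    """Attach entities, categories, links, and hashtags to a tweet text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    token_set = set(tokens)
    # Named entity recognition: look each token up in the gazetteer.
    entities = [GAZETTEER[t] for t in tokens if t in GAZETTEER]
    # Classification: a category applies if any of its keywords occurs.
    categories = sorted(c for c, kws in CATEGORY_KEYWORDS.items() if kws & token_set)
    # Linkage: collect URLs that point to external resources.
    links = re.findall(r"https?://\S+", text)
    # Metadata extraction: hashtags as one simple kind of metadata.
    hashtags = re.findall(r"#(\w+)", text)
    return {
        "entities": entities,
        "categories": categories,
        "links": links,
        "hashtags": hashtags,
    }
```

The resulting entity pairs use the same (facet, value) shape as the incident profile, which is what makes a later comparison between tweets and profile possible.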

Page 19: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Twitcident Pipeline

Automatic Filtering

Search & Analytics

Page 20: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Filtering

• Which tweets are relevant to the incidents?

• Preprocessing: Language detection

• Semantic Filtering: Compare tweet with the incident profile P(i)

• Semantic Filtering with News Context
  • P’(i): P(i) complemented with f-v pairs from news
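
The comparison of a tweet with the incident profile can be sketched as follows. The slides do not give the scoring function, so this example assumes one plausible choice: sum the profile weights of the facet-value pairs found in the tweet and accept the tweet when the sum reaches a threshold. The function names and the threshold value are hypothetical.

```python
from typing import Dict, Iterable, Tuple

FacetValue = Tuple[str, str]
Profile = Dict[FacetValue, float]


def relevance(tweet_pairs: Iterable[FacetValue], profile: Profile) -> float:
    """Sum the profile weights of the distinct facet-value pairs in the tweet."""
    return sum(profile.get(pair, 0.0) for pair in set(tweet_pairs))


def with_news_context(profile: Profile, news_pairs: Profile) -> Profile:
    """Build P'(i): P(i) complemented with facet-value pairs from news.

    Assumption: weights already present in P(i) take precedence.
    """
    extended = dict(news_pairs)
    extended.update(profile)
    return extended


def is_relevant(tweet_pairs: Iterable[FacetValue], profile: Profile,
                threshold: float = 0.5) -> bool:
    """Accept the tweet if its score reaches the (hypothetical) threshold."""
    return relevance(tweet_pairs, profile) >= threshold
```

With the news-complemented profile, a tweet that mentions only pairs learned from news articles can still receive a non-zero score, which is the intended effect of adding news context.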

Page 21: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Twitcident Pipeline

Automatic Filtering

Search & Analytics

Page 22: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Faceted Search

• Strategies (ranking)
  • Frequency-based
  • Time-sensitive
  • Personalized
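
Two of the ranking strategies can be illustrated with a short sketch. The slides do not define them precisely, so the code below makes assumptions: the frequency-based strategy ranks facet-values by how often they occur in the filtered tweets, and the time-sensitive strategy weights each occurrence with an exponential decay on its age. The half-life parameter and all names are hypothetical, and the personalized strategy is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

FacetValue = Tuple[str, str]
# Each observation: (timestamp in seconds, facet-value pair mentioned in a tweet).
Observation = Tuple[float, FacetValue]


def rank_by_frequency(observations: List[Observation]) -> List[FacetValue]:
    """Rank facet-values by their number of occurrences, most frequent first."""
    counts = Counter(pair for _, pair in observations)
    return [pair for pair, _ in counts.most_common()]


def rank_time_sensitive(observations: List[Observation], now: float,
                        half_life: float = 3600.0) -> List[FacetValue]:
    """Rank facet-values by recency-weighted occurrences.

    Each occurrence contributes 0.5 ** (age / half_life), so an occurrence
    that is one half-life old counts half as much as a fresh one.
    """
    scores: Dict[FacetValue, float] = defaultdict(float)
    for timestamp, pair in observations:
        age = max(0.0, now - timestamp)
        scores[pair] += 0.5 ** (age / half_life)
    return sorted(scores, key=lambda pair: scores[pair], reverse=True)
```

The two strategies can disagree: a facet-value that was mentioned often early on may rank first by frequency but fall behind a facet-value that is being mentioned right now.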

Page 23: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Real-time analytics

What type of things are mentioned in the tweets?

What aspects are mentioned over time? What do people report about over time?

Impact Area
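
The "over time" questions above amount to counting mentions per time interval. A minimal sketch, assuming each filtered tweet has already been reduced to a creation time and one category label (for example from the classification step), and assuming one-hour buckets; the function name and the bucket size are illustrative choices.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def mentions_per_hour(tweets: Iterable[Tuple[datetime, str]]) -> Dict[datetime, Counter]:
    """Count how often each category is mentioned in each one-hour bucket."""
    buckets: Dict[datetime, Counter] = defaultdict(Counter)
    for created_at, category in tweets:
        # Truncate the timestamp to the start of its hour.
        hour = created_at.replace(minute=0, second=0, microsecond=0)
        buckets[hour][category] += 1
    return dict(buckets)
```

Plotting the counts of each category against the hour buckets gives a simple timeline of what people report about as an incident unfolds.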

Page 24: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Evaluation - Dataset

• Twitter corpus (TREC Microblog Track 2011)
  • 16 million tweets (Jan. 24th – Feb. 8th, 2011)
  • 4,766,901 tweets classified as English
  • 6.2 million entity-extractions

• News (same time period)
  • 62 RSS news feeds
  • 13,959 news articles
  • 357,559 entity-extractions

Page 25: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Evaluation for Tweet Filtering (1/2)

Semantic strategies outperform the keyword-based filtering regarding all metrics.

Page 26: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Evaluation for Tweet Filtering (2/2)

The semantic strategy is more robust and achieves higher precision for complex topics.

Page 27: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Evaluation for Faceted Search (1/2)

The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.

Page 28: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Evaluation for Faceted Search (2/2)

The strategies with semantic enrichment outperform the strategy without semantic enrichment in predicting the appropriate facet-values.

Page 29: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Conclusions

• What we have done:
  • Twitcident, a framework for filtering, searching, and analyzing information about incidents that people publish in their Social Web Streams

• What we have achieved:
  • Better filtering of Twitter messages for a given incident.
  • Better search for relevant information about an incident within the filtered messages.

Page 30: Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Thank you!

Ke Tao @taubau

@wisdelft

http://twitcident.org