Utilizing Linked Open Data (LOD) Resources for Semantic Enhancement of User-Generated Content

Dong-Po Deng (1,2), Guan-Shuo Mai (3), Cheng-Hsin Hsu (3), Chin-Lung Chang (1,4), Tyng-Ruey Chuang (1), and Kwang-Tsao Shao (3)

1 ITC, University of Twente, Enschede, the Netherlands
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan
3 Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
4 Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan

Thursday, February 7, 2013

JIST 2012

2012/12/3, JIST 2012

Outline

Background
Motivation
Data Collection
LOD resources: LODE and LOD TGN
An approach for processing UGC
    Information Extraction
    Information Formalization
    Information Reuse
Concluding remarks

Background

Web 2.0 technologies enable people to contribute their own content to the web, e.g. wikis, blogs, tagging

Social media use Web 2.0 technologies to support social interaction on the web, e.g. Twitter, Flickr, Facebook

Content contributed by people on the web and/or social media is called “User-Generated Content” (UGC)

UGC is mainly multimedia or textual data

UGC is considered a potential resource for scientific projects, e.g. citizen science

Background (cont.)

Harvesting UGC for scientific purposes raises several problems:
    Unstructured UGC is difficult to handle
    The semantics of UGC are often ambiguous and/or poor
    Social media are not designed for scientific purposes

Courtesy of http://www.datenform.de/mapeng.html

Motivation

LOD datasets as resources

LOD aims to make data available on the Web and to interconnect data so as to increase its value for users

The LOD projects comprise about 300 datasets with over 31 billion RDF triples

Each entry representing a fact in a LOD dataset has a Uniform Resource Identifier (URI) that is referenceable and linkable on the Web

The high interconnectivity between entries potentially increases the discoverability, reusability, and utility of information

Motivation (cont.)

Therefore, if named entities in UGC can be identified and connected to LOD entries, their semantics can be disambiguated, making the UGC easier to process.
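The idea can be sketched as a simple lookup: find known names in a post and attach a URI to each hit. This is only an illustration under stated assumptions. The lookup table, the example.org URIs, the sample post, and the function name `link_entities` are all hypothetical; a real system would query LODE and LOD TGN instead of a hard-coded dictionary.

```python
# Hypothetical lookup table: surface name -> LOD entry URI (placeholder URIs).
LOD_INDEX = {
    "玉帶鳳蝶": "http://example.org/lode/taxon/papilio-polytes",
    "特有生物研究保育中心": "http://example.org/tgn/place/endemic-species-research-institute",
}


def link_entities(text, index=LOD_INDEX):
    """Return (name, uri, offset) for every indexed name found in text."""
    hits = []
    for name, uri in index.items():
        start = text.find(name)
        while start != -1:
            hits.append((name, uri, start))
            start = text.find(name, start + len(name))
    # Sort by position so annotations follow reading order.
    return sorted(hits, key=lambda h: h[2])


# Illustrative post: "Saw a Papilio polytes at the Endemic Species
# Research Institute today."
post = "今天在特有生物研究保育中心看到玉帶鳳蝶"
for name, uri, offset in link_entities(post):
    print(offset, name, uri)
```

Once a mention carries a URI, its meaning no longer depends on the surface string, which is the disambiguation effect the slide describes.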

Data collection

Two Facebook interest groups for ecological observations in Taiwan

http://www.facebook.com/groups/roadkilled/
http://www.facebook.com/groups/enjoymoths/

Ecological Observations on Facebook

LOD Ecology

Linked Open Data of Ecology (LODE) is a validated dataset from a LOD project.

LODE integrated 5 previously distributed databases:

TFRI: Taiwan Forestry Research Institute

LODE in Linked Open Data Cloud

LOD Taiwan Geographic Name (TGN)

LOD TGN is mainly converted from the Taiwan Gazetteer following LOD principles

LOD TGN has 159,241 geographic name entries, of which 17,442 are linked to geonames.org

An approach for processing UGC

Information Extraction
Information Formalization
Information Reuse

Problems with Chinese species names in Facebook ecological observations

(1) Species names (adjective + noun) are shortened to the adjective part:

    玉帶鳳蝶 (Papilio polytes) is shortened to 玉帶
    曙鳳蝶 (Atrophaneura horishana) is shortened to 曙鳳
    琉璃紋鳳蝶 (Papilio hermosanus) is shortened to 琉璃

(2) 細紋 (pronounced Si-Wen, meaning “fine veined”) is the prefix of 15 species names, e.g.:

    細紋新蠍蛉
    細紋蠍蛉
    細紋黃鉤蛾
    ...

Identifying shortened species names

Confidence value = [formula not recovered from the slide]
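The confidence formula from this slide is not available in the text, so the sketch below uses a stand-in. It assumes that a shortened name is a prefix of the full name, and it sets confidence to 1 divided by the number of matching full names. Both the prefix rule and this confidence measure are assumptions made for illustration, not the method presented in the talk. The name list reuses only the species names shown on the previous slide.

```python
# Full species names taken from the examples on the previous slide.
SPECIES_NAMES = [
    "玉帶鳳蝶", "曙鳳蝶", "琉璃紋鳳蝶",
    "細紋新蠍蛉", "細紋蠍蛉", "細紋黃鉤蛾",
]


def expand_short_name(short, names=SPECIES_NAMES):
    """Return [(full_name, confidence)] for names starting with `short`.

    Confidence is simply 1 / (number of candidates): an assumed stand-in,
    not the measure used in the talk.
    """
    candidates = [n for n in names if n.startswith(short)]
    if not candidates:
        return []
    confidence = 1.0 / len(candidates)
    return [(n, confidence) for n in candidates]


print(expand_short_name("玉帶"))  # a single candidate
print(expand_short_name("細紋"))  # several candidates share the prefix
```

Under this stand-in, an unambiguous short form such as 玉帶 gets full confidence, while 細紋 is spread over every species that shares the prefix.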

Determining a species name for a thread

What if several species names are mentioned in one thread? We used three criteria:

    How many Likes do the post and its comments get?
    How prestigious are the people who post or comment?
    How many times does a species name occur in the thread?
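One way to combine the three criteria is a weighted sum per species, sketched below. The slides do not say how the criteria were actually combined, so the weighted sum, the weights, the prestige scale, and the names `Mention` and `pick_species` are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    species: str            # species name found in a post or comment
    likes: int              # Likes received by that post or comment
    author_prestige: float  # hypothetical prestige score in [0, 1]


def pick_species(mentions, w_likes=1.0, w_prestige=1.0, w_count=1.0):
    """Score each species by likes, author prestige, and occurrence count."""
    scores = defaultdict(float)
    for m in mentions:
        # Each mention adds its likes, its author's prestige, and one
        # occurrence, each scaled by its weight.
        scores[m.species] += (
            w_likes * m.likes + w_prestige * m.author_prestige + w_count
        )
    if not scores:
        return None
    return max(scores, key=scores.get)


thread = [
    Mention("Theretra nessus", likes=5, author_prestige=0.9),
    Mention("Theretra nessus", likes=1, author_prestige=0.2),
    Mention("Papilio polytes", likes=2, author_prestige=0.5),
]
print(pick_species(thread))
```

The weights are left as parameters because their relative importance would have to be tuned against threads whose species is already known.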

Problems with geographic names in Facebook ecological observations

An example: The Endemic Species Research Institute

    特有生物研究保育中心 (Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin)
    is shortened to
    特生中心 (Te-Sheng-Jhong-Sin)

There are no rules for shortening long geographic names

Identifying shortened geographic names
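The slide's method is shown only as a figure, so the following is a stand-in heuristic rather than the authors' approach. It treats a short form as a possible abbreviation of a gazetteer name when its characters appear in the full name in the same order. The function names and the two-entry sample gazetteer are hypothetical.

```python
def is_abbreviation(short, full):
    """True if the characters of `short` appear in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in short)


def candidate_places(short, gazetteer):
    """Return gazetteer names that `short` could abbreviate, shortest first."""
    matches = [name for name in gazetteer if is_abbreviation(short, name)]
    return sorted(matches, key=len)


# Sample entries chosen for illustration only.
GAZETTEER = ["特有生物研究保育中心", "生物多樣性研究中心"]
print(candidate_places("特生中心", GAZETTEER))
```

Because there are no rules for shortening, a heuristic like this can return several candidates, and some further ranking step would be needed to choose among them.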

The ontology...

    is based on a Facebook thread, an entity comprising social media content that involves people, places, time periods, photos, and links to other content

    uses standard vocabularies:
        Semantically-Interlinked Online Communities (SIOC) to represent the structure of Facebook posts, comments, and threads
        Friend of a Friend (FOAF) to describe content creators
        Dublin Core for the interlinked content they created
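To show how the three vocabularies fit together, the sketch below builds a few N-Triples lines for one thread, one post, and its author. It is not the authors' ontology. The base URI, the identifiers, and the choice of properties (`sioc:has_container`, `dcterms:creator`, `dcterms:created`, `foaf:name`) are assumptions picked from the public SIOC, Dublin Core, and FOAF vocabularies.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def thread_triples(thread_id, post_id, author_id, author_name, created):
    """Describe one post inside one thread, plus its author."""
    thread = uri(f"{BASE}thread/{thread_id}")
    post = uri(f"{BASE}post/{post_id}")
    author = uri(f"{BASE}person/{author_id}")
    return [
        (thread, uri(RDF_TYPE), uri(SIOC + "Thread")),
        (post, uri(RDF_TYPE), uri(SIOC + "Post")),
        (post, uri(SIOC + "has_container"), thread),
        (post, uri(DCT + "creator"), author),
        (post, uri(DCT + "created"), literal(created)),
        (author, uri(RDF_TYPE), uri(FOAF + "Person")),
        (author, uri(FOAF + "name"), literal(author_name)),
    ]


for s, p, o in thread_triples("t1", "p1", "u1", "Alice", "2012-12-03"):
    print(f"{s} {p} {o} .")
```

SIOC carries the thread and post structure, FOAF the person, and Dublin Core the authorship and date, which mirrors the division of labor listed on the slide.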

An ontology for formalizing the extracted information from Facebook threads

Transferring ecological observations on Facebook to RDF

http://140.109.28.64:2020/page/thread/177883715557195_440860179259546

The extracted species name from the Facebook thread is linked to LOD resources

The extracted species name is a taxon, Theretra nessus

This entry is connected to LODE via owl:sameAs
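An owl:sameAs link is a single triple stating that two URIs denote the same thing. The sketch below formats one as an N-Triples line. Both URIs are placeholders, because the actual identifiers appear only in the slide's screenshot, and `same_as` is a hypothetical helper name.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as(local_uri, lod_uri):
    """Return one N-Triples line asserting that two URIs denote the same thing."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{lod_uri}> ."


print(same_as(
    "http://example.org/taxon/theretra-nessus",       # hypothetical local URI
    "http://example.org/lode/taxon/theretra-nessus",  # hypothetical LODE URI
))
```

The same pattern would cover the link from a LOD TGN entry to geonames.org.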

The extracted place name from the Facebook thread is linked to LOD resources

The LOD TGN entry converted from the Taiwan Gazetteer

It is linked to geonames.org via owl:sameAs

Publish the processed Facebook ecological observations

A semantic annotation plug-in for entering geographic names in Facebook posts

Concluding remarks

This study reports our experience in transferring Facebook ecological observations to RDF and interlinking them with LOD resources (LODE and LOD TGN)

With these information extraction tools and LOD resources, we developed a tool for semantic enhancement of user input.

The LOD TGN is an ongoing project. In the future, we will consolidate the feature types of the geographic names, and we plan to make the LOD TGN a geospatial semantics reference resource.

Thank you for your attention

Questions?
