Utilizing Linked Open Data (LOD) Resources for
Semantic Enhancement of User-Generated Content
Dong-Po Deng¹,², Guan-Shuo Mai³, Cheng-Hsin Hsu³, Chin-Lung Chang¹,⁴, Tyng-Ruey Chuang¹, and Kwang-Tsao Shao³
¹ITC, University of Twente, Enschede, the Netherlands
²Institute of Information Science and ³Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
⁴Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Thursday, February 7, 2013
JIST2012, 2012/12/3
Outline
Background
Motivation
Data Collection
LOD resources: LODE and LOD TGN
An approach for processing UGC
  Information Extraction
  Information Formalization
  Information Reuse
Concluding remarks
Background
Web 2.0 technologies enable people to contribute content to the web, e.g., wikis, blogs, and tagging.
Social media use Web 2.0 technologies to support social interaction on the web, e.g., Twitter, Flickr, and Facebook.
Content contributed by people to the web and/or social media is called “User-Generated Content” (UGC).
UGC is mainly multimedia or textual data.
UGC is considered a potential resource for scientific projects, e.g., citizen science.
Background (cont.)
There are several problems in harvesting UGC for scientific purposes:
  Unstructured UGC is difficult to handle.
  The semantics of UGC are often ambiguous and/or poor.
  Social media are not designed for scientific purposes.
Image courtesy of http://www.datenform.de/mapeng.html
Motivation
LOD datasets as resources:
  LOD aims to make data available on the Web and to interconnect data, with the goal of increasing its value for users.
  About 300 datasets within LOD projects contain over 31 billion RDF triples.
  Each entry representing a fact in an LOD dataset has a Uniform Resource Identifier (URI) that can be referenced and linked on the Web.
  The high interconnectivity between entries can increase the discoverability, reusability, and utility of information.
Motivation (cont.)
Therefore, if named entities in UGC can be identified and linked to LOD entries, their semantics can be disambiguated, making the UGC easier to process.
Data collection
Two Facebook interest groups for ecological observations in Taiwan
http://www.facebook.com/groups/roadkilled/
http://www.facebook.com/groups/enjoymoths/
Ecological Observations on Facebook
LOD Ecology
Linked Open Data of Ecology (LODE) is a validated dataset from an LOD project.
LODE integrates five previously distributed databases:
TFRI: Taiwan Forestry Research Institute
LODE in Linked Open Data Cloud
LOD Taiwan Geographic Name (TGN)
LOD TGN is mainly derived from the Taiwan Gazetteer, following LOD principles.
LOD TGN has 159,241 geographic name entries, of which 17,442 are linked to geonames.org.
An approach for processing UGC
Information Extraction → Information Formalization → Information Reuse
Problems with Chinese species names in Facebook ecological observations
(1) Users shorten species names, which typically follow an adjective + noun pattern, to a leading prefix:
  玉帶鳳蝶 (Papilio polytes) → 玉帶
  曙鳳蝶 (Atrophaneura horishana) → 曙鳳
  琉璃紋鳳蝶 (Papilio hermosanus) → 琉璃
(2) A shortened name can be ambiguous: 細紋 (pronounced Si-Wen, meaning “fine-veined”) is the prefix of 15 species names, e.g., 細紋新蠍蛉, 細紋蠍蛉, and 細紋黃鉤蛾.
Identifying shortened species names
Confidence value = [formula shown as an image on the slide]
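The slide's confidence formula is not recoverable from the extraction, so the following is only a minimal sketch under an assumed scoring rule: the candidate full names for a short form are those that start with it, and each candidate's confidence is its share of prior mentions (a hypothetical `corpus_counts` table) among all candidates, falling back to a uniform split when nothing has been seen. The name list reuses the slide's examples; `candidate_species` and `confidence_scores` are hypothetical helpers, not the authors' code.

```python
from collections import Counter

# Species names taken from the slide examples (a real system would load a full checklist).
SPECIES_NAMES = ["玉帶鳳蝶", "曙鳳蝶", "琉璃紋鳳蝶", "細紋新蠍蛉", "細紋蠍蛉", "細紋黃鉤蛾"]


def candidate_species(short_name, species_names):
    """Full names that begin with the shortened form (hypothetical helper)."""
    if not short_name:
        return []
    return [name for name in species_names if name.startswith(short_name)]


def confidence_scores(short_name, species_names, corpus_counts):
    """Assumed confidence: each candidate's share of prior mentions among all candidates.

    `corpus_counts` is a hypothetical Counter of how often each full name was seen.
    Falls back to a uniform split when no candidate has been seen before.
    """
    candidates = candidate_species(short_name, species_names)
    if not candidates:
        return {}
    total = sum(corpus_counts[name] for name in candidates)
    if total == 0:
        return {name: 1 / len(candidates) for name in candidates}
    return {name: corpus_counts[name] / total for name in candidates}


# An unambiguous short form resolves to a single species.
print(confidence_scores("玉帶", SPECIES_NAMES, Counter()))  # → {'玉帶鳳蝶': 1.0}
# An ambiguous prefix is spread over its candidates by prior mentions.
print(confidence_scores("細紋", SPECIES_NAMES, Counter({"細紋蠍蛉": 3, "細紋黃鉤蛾": 1})))
# → {'細紋新蠍蛉': 0.0, '細紋蠍蛉': 0.75, '細紋黃鉤蛾': 0.25}
```

Using prior mentions rather than string length is a design choice for the sketch only; any other scoring could be substituted once the original formula is known.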
Determining the species name for a thread
What if several species names are mentioned in one thread? We used three criteria:
  How many Likes do the post and its comments receive?
  How prestigious are the people who post or comment?
  How many times does each species name occur in the thread?
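The three criteria above could be combined into a single score per species name. The sketch below is one hypothetical way to do so: every mention adds a fixed count weight (so frequent names accumulate score), plus the Likes of the message it appears in and the prestige of that message's author, each scaled by a weight. The `Message` structure, the prestige table, and all weights are assumptions, not values from the study.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    """A post or comment in a Facebook thread (hypothetical structure)."""
    author: str
    likes: int
    species: List[str] = field(default_factory=list)  # names extracted from this message


def pick_species(messages: List[Message], prestige: Dict[str, float],
                 w_likes: float = 1.0, w_prestige: float = 1.0,
                 w_count: float = 1.0) -> Optional[str]:
    """Score every mentioned species and return the highest-scoring one.

    Each mention adds w_count, plus the message's Likes and its author's
    prestige, each scaled by its (assumed) weight.
    """
    scores: Dict[str, float] = {}
    for msg in messages:
        bonus = w_likes * msg.likes + w_prestige * prestige.get(msg.author, 0.0)
        for name in msg.species:
            scores[name] = scores.get(name, 0.0) + w_count + bonus
    if not scores:
        return None
    return max(scores, key=scores.get)


thread = [
    Message("alice", likes=0, species=["細紋蠍蛉"]),
    Message("bob", likes=5, species=["細紋黃鉤蛾"]),
    Message("carol", likes=0, species=["細紋蠍蛉"]),
]
print(pick_species(thread, prestige={"alice": 0.5}))  # → 細紋黃鉤蛾
print(pick_species(thread, prestige={"alice": 0.5}, w_likes=0.0))  # → 細紋蠍蛉
```

The two calls show how the weights decide the outcome: with Likes counted, one well-liked comment outweighs two plain mentions; with Likes ignored, frequency and prestige win.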
Problems with geographic names in Facebook ecological observations
An example: the Endemic Species Research Institute, 特有生物研究保育中心 (Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin), is shortened to 特生中心 (Te-Sheng-Jhong-Sin).
There are no fixed rules for shortening long geographic names.
Identifying shortened geographic names
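Since there are no fixed shortening rules, one cautious heuristic (an assumption, not necessarily the authors' method) treats a short form as a candidate abbreviation of a gazetteer name when it keeps the name's first character and the rest of its characters in their original order, as 特生中心 does for 特有生物研究保育中心. The gazetteer excerpt and helper names below are hypothetical.

```python
# Hypothetical gazetteer excerpt; a real system would load the LOD TGN names.
GAZETTEER = ["特有生物研究保育中心", "特富野古道", "中央研究院"]


def is_subsequence(short: str, full: str) -> bool:
    """True if every character of `short` occurs in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` advances the iterator past the match, preserving order.
    return all(ch in remaining for ch in short)


def match_shortened(short: str, gazetteer: list) -> list:
    """Gazetteer names that `short` could abbreviate (assumed heuristic)."""
    if not short:
        return []
    return [
        name for name in gazetteer
        if len(short) < len(name) and name[0] == short[0] and is_subsequence(short, name)
    ]


print(match_shortened("特生中心", GAZETTEER))  # → ['特有生物研究保育中心']
```

Requiring the first character to match keeps the candidate list short; when several names still match, a confidence score like the one used for species names could rank them.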
The ontology...
is based on the Facebook thread, an entity comprising social media content that involves people, places, time periods, photos, and links to other content;
uses standard vocabularies: Semantically-Interlinked Online Communities (SIOC) can be used to represent the structure of Facebook posts, comments, and threads,
Friend of a Friend (FOAF) can be used to describe content creators,
and Dublin Core can be used to describe the interlinked content they create.
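To make the vocabulary choices concrete, the sketch below writes a thread and its posts as N-Triples using real SIOC, FOAF, and Dublin Core terms (sioc:Thread, sioc:Post, sioc:has_container, sioc:has_creator, sioc:reply_of, sioc:account_of, foaf:Person, foaf:name, dcterms:created). The example.org namespace, the post dictionary keys, and the function names are hypothetical; the actual ontology used in the study is only shown as a figure and may differ.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace for local resources


def iri(uri: str) -> str:
    return f"<{uri}>"


def literal(text: str) -> str:
    """N-Triples string literal with backslashes, quotes and newlines escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def thread_to_ntriples(thread_id: str, posts: list) -> str:
    """Describe a thread and its posts/comments with SIOC, FOAF and Dublin Core terms.

    Each post is a dict with hypothetical keys: id, author_id, author_name,
    created (ISO 8601 string) and optional reply_of (id of the parent post).
    """
    thread = iri(BASE + "thread/" + thread_id)
    triples = [(thread, iri(RDF_TYPE), iri(SIOC + "Thread"))]
    for post in posts:
        p = iri(BASE + "post/" + post["id"])
        account = iri(BASE + "account/" + post["author_id"])
        person = iri(BASE + "person/" + post["author_id"])
        triples += [
            (p, iri(RDF_TYPE), iri(SIOC + "Post")),
            (p, iri(SIOC + "has_container"), thread),
            (p, iri(SIOC + "has_creator"), account),
            (p, iri(DCT + "created"), literal(post["created"])),
            (account, iri(RDF_TYPE), iri(SIOC + "UserAccount")),
            (account, iri(SIOC + "account_of"), person),
            (person, iri(RDF_TYPE), iri(FOAF + "Person")),
            (person, iri(FOAF + "name"), literal(post["author_name"])),
        ]
        if post.get("reply_of"):
            triples.append((p, iri(SIOC + "reply_of"), iri(BASE + "post/" + post["reply_of"])))
    unique = dict.fromkeys(triples)  # drop repeated author triples, keep order
    return "\n".join(f"{s} {pred} {o} ." for s, pred, o in unique)


print(thread_to_ntriples("177883715557195_440860179259546", [
    {"id": "p1", "author_id": "u1", "author_name": "Alice", "created": "2012-06-01"},
    {"id": "c1", "author_id": "u2", "author_name": "Bob", "created": "2012-06-02", "reply_of": "p1"},
]))
```

Separating the sioc:UserAccount from the foaf:Person follows the SIOC convention that an account is held by a person, which lets one person be linked to accounts on several services.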
An ontology for formalizing the information extracted from Facebook threads
Transferring ecological observations on Facebook to RDF
http://140.109.28.64:2020/page/thread/177883715557195_440860179259546
The species name extracted from the Facebook thread is linked to LOD resources
The extracted species name is the taxon Theretra nessus
This entry is connected to LODE via owl:sameAs
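To illustrate the linking step, the sketch below emits N-Triples that attach an extracted scientific name to a thread and, when a lookup table has a matching LODE entry, declare the local taxon resource owl:sameAs that entry. The `observedTaxon` property, the example.org URIs, and the `LODE_INDEX` table are all hypothetical; only the use of owl:sameAs comes from the slide.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASE = "http://example.org/"  # hypothetical local namespace

# Hypothetical lookup table: scientific name -> URI of the matching LODE entry.
LODE_INDEX = {"Theretra nessus": "http://example.org/lode/taxon/Theretra_nessus"}


def link_taxon(thread_id: str, scientific_name: str, index: dict) -> list:
    """Return N-Triples lines linking a thread to a taxon and, if known, the taxon to LODE."""
    taxon = f"{BASE}taxon/{scientific_name.replace(' ', '_')}"
    lines = [f"<{BASE}thread/{thread_id}> <{BASE}vocab#observedTaxon> <{taxon}> ."]
    lod_uri = index.get(scientific_name)
    if lod_uri is not None:
        # owl:sameAs states that both URIs denote the same taxon.
        lines.append(f"<{taxon}> <{OWL_SAME_AS}> <{lod_uri}> .")
    return lines


for line in link_taxon("t1", "Theretra nessus", LODE_INDEX):
    print(line)
```

Names with no LODE match still get a local taxon resource, so they can be linked later without changing the thread's triples.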
The place name extracted from the Facebook thread is linked to LOD resources
The LOD TGN entry derived from the Taiwan Gazetteer
It is linked to geonames.org via owl:sameAs
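Because owl:sameAs is symmetric and transitive, a place mentioned in a thread can be followed through its LOD TGN entry to geonames.org. The minimal breadth-first sketch below computes the set of URIs equivalent to a starting URI; the example.org URIs stand in for real thread, LOD TGN, and geonames.org identifiers and are hypothetical.

```python
from collections import defaultdict, deque


def same_as_closure(start: str, links: list) -> set:
    """All URIs equivalent to `start`, treating owl:sameAs pairs as symmetric and transitive."""
    graph = defaultdict(set)
    for a, b in links:  # symmetry: add both directions
        graph[a].add(b)
        graph[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:  # transitivity: breadth-first search over the links
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical owl:sameAs pairs (thread place -> LOD TGN -> geonames.org).
LINKS = [
    ("http://example.org/place/1", "http://example.org/tgn/1"),
    ("http://example.org/tgn/1", "http://example.org/geonames/1"),
    ("http://example.org/tgn/2", "http://example.org/geonames/2"),
]
print(sorted(same_as_closure("http://example.org/place/1", LINKS)))
```

In practice a triple store with OWL reasoning would derive the same equivalences; the explicit search only shows what that inference amounts to.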
Publishing the processed Facebook ecological observations
A semantic annotation plug-in for entering geographic names in Facebook posts
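The plug-in's behaviour is shown only in screenshots, so the following is a hypothetical sketch of a lookup such a plug-in might perform: as the user types, it ranks gazetteer names (prefix matches first, then names containing the typed text) and returns each with its URI, so that the chosen name can be annotated with a LOD identifier. The `PLACE_INDEX` entries and URIs are placeholders, and `suggest` is not the plug-in's actual API.

```python
from typing import List, Tuple

# Hypothetical gazetteer excerpt: (geographic name, LOD URI) pairs.
PLACE_INDEX: List[Tuple[str, str]] = [
    ("特有生物研究保育中心", "http://example.org/tgn/1"),
    ("中央研究院", "http://example.org/tgn/2"),
    ("研究院路", "http://example.org/tgn/3"),
]


def suggest(typed: str, index: List[Tuple[str, str]], limit: int = 5) -> List[Tuple[str, str]]:
    """Rank entries for the text typed so far: prefix matches first, then substring matches."""
    if not typed:
        return []
    prefix = [entry for entry in index if entry[0].startswith(typed)]
    inner = [entry for entry in index if typed in entry[0] and not entry[0].startswith(typed)]
    return (prefix + inner)[:limit]


for name, uri in suggest("研究", PLACE_INDEX):
    print(name, uri)
```

Returning the URI alongside the name is the point of the design: the post then carries an unambiguous reference instead of free text that must be disambiguated later.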
Concluding remarks
This study reports our experience in transferring Facebook ecological observations to RDF and interlinking them with LOD resources (LODE and LOD TGN).
Using these information extraction tools and LOD resources, we developed a tool for the semantic enhancement of user input.
LOD TGN is an ongoing project. In the future, we will consolidate the feature types of the geographic names, and we plan to make LOD TGN a reference resource for geospatial semantics.
Thank you for your attention.
Questions?