Upload
mohsen-taheriyan
View
358
Download
1
Embed Size (px)
Citation preview
2
Motivation
How to express the intended meaning of data?
Explicit semantics is missing in many of the structured sources
name date city state workplace
1 Fred Collins Oct 1959 Seattle WA Microsoft
2 Tina Peterson May 1980 New York NY Google
Employee? CEO?
Birth date? Death date? Employment date?
Person? Organization?
Birth city? Work city?
3
Semantic ModelMap the source to the classes & properties in an
ontology
object propertydata propertysubClassOf
Person
Organization
Place
State
name
birthdatebornIn
worksFor state
namephone
namelivesIn
CityEvent
ceolocationorganizer
nearby
startDate
endDatetitle
isPartOf
postalCode
name date city state workplace
1 Fred Collins Oct 1959 Seattle WA Microsoft
2 Tina Peterson May 1980 New York NY GoogleSour
ce
Dom
ain
Onto
logy
4
Semantic Types
Person OrganizationCity State
name birthdate name namename
Person
name date city state workplace
1 Fred Collins Oct 1959 Seattle WA Microsoft
2 Tina Peterson May 1980 New York NY Google
5
Relationships
OrganizationCity State
name birthdate name namename
Person
name date city state workplace
1 Fred Collins Oct 1959 Seattle WA Microsoft
2 Tina Peterson May 1980 New York NY Google
bornIn
worksFor
state
8
Outline
• Semi-automatically building semantic models
• Learning semantics models from known models
• Inferring semantic relations from LOD
• Related Work
• Discussion & Future Work
Semi-automatically Building Semantic Models
Contribution: a graph-based approach to extract implicit relationships
10
Approach [Knoblock et al, ESWC 2012]
Domain Ontology
LearnSemantic
Types
ExtractRelationship
s
Steiner Tree
Sample Data
Construct a Graph
Implemented in Karma
http://www.isi.edu/integration/karma
@KarmaSemWeb
12
ExampleSourc
e
object propertydata propertysubClassOf
Domain Ontology
name date city state workplace
1 Fred Collins Oct 1959 Seattle WA Microsoft
2 Tina Peterson May 1980 New York NY Google
Goal: Find a semantic model for the source (map the source to the ontology)
13
Learning Semantic Types[Krishnamurthy et al., ESWC 2015]
class?
property ?
workplace
Microsoft
Amazon
…
14
Learning Semantic Types
Organization
name
workplace
Microsoft
Amazon
…
1. User Specifies2. Systems learns
15
Learning Semantic Types
Organization
name
workplace
Microsoft
Amazon
…
employer
Apple
…
16
Learning Semantic Types
Organization
name
workplace
Microsoft
Amazon
…
Organization
name
employer
Apple
…
17
Semantic Labeling Approach• Each semantic type: label• Each data column: document
• Textual Data– Compute TF/IDF vectors for documents– Compare documents using Cosine Similarity between TF/IDF
vectors
• Numeric Data– Use Statistical Hypothesis Testing – Intuition: distribution of values in different semantic types is
different, e.g., temperature vs. population
• Return Top-k suggestions based on the confidence scores
18
Construct a Graph
Construct a graph from semantic types and ontologyPerson Organizatio
nCity State
name birthdate name namename
Person
name date city state workplace
1 Fred Collins Oct 1959 Seattle WA Microsoft
2 Tina Peterson May 1980 New York NY Google
22
Inferring the RelationshipsSelect minimal tree that connects all semantic types
– A customized Steiner tree algorithm
date
Initial Model
24
Refining the ModelImpose constraints on Steiner Tree Algorithm
– Change weight of selected links to ε– Add source and target of selected link to Steiner nodes
date
Correct Model
26
EvaluationEvaluation Dataset EDM
# sources 29# classes in the ontologies 119# properties in the ontologies 351# nodes in the gold standard models 473# links in the gold standard models 444
• Measured the user effort in Karma to model the sources
• Started with no training data• User actions
– Assign/Change semantic types– Change relationships
27
User Effortsource columns Choose Type Change Link Time
(min)s1 7 7 1 3s2 12 5 2 6s3 4 0 0 2s4 17 5 7 8s5 14 4 6 7s6 18 4 4 7s7 14 1 4 6s8 6 0 4 3… … … … …s29 9 2 1 3Total 329 56 96 128
Avg. min per source: 4.4 minutesAvg. # user actions per source: 152/29=5.2
28
Limitation
• This approach does not learn the changes done by the user in relationships
• User has to go through the refinement process each time
30
Main IdeaSources in the same domain often have similar
data
Exploit knowledge of known semantic models to hypothesize a semantic model for a new sources
31
ExampleEDM
SKOS
FOAF
AAC
ElementsGr2
ORE
DCTerms
Domain ontologies:Source: Dallas Museum of Art dma(title,creationDate,name,type)
Domain: Museum Data
32
ExampleEDM
SKOS
FOAF
AAC
ElementsGr2
ORE
DCTerms
Domain ontologies:
Domain: Museum Data
Source: National Portrait Gallery npg(name,artist,year,image)
33
ExampleEDM
SKOS
FOAF
AAC
ElementsGr2
ORE
DCTerms
Domain ontologies:
Domain: Museum Data
Goal: Automatically suggest a semantic model for dia
Source: Detroit Institute of Art dia(title,credit,classification,name,imageURL)
34
Approach[Taheriyan et al, ISWC 2013, ICSC 2014, JWS 2015]
Domain Ontology
LearnSemantic
Types
Sample Data
Construct a Graph
Generate Candidate
Models
Rank ResultsKnown
Semantic Models Implemented in
Karma
36
ApproachInput
Learn semantic types for attributes(s)
• Sample data from new source (S)• Domain Ontologies (O)• Known semantic models
Construct Graph G=(V,E)
Generate mappings between attributes(S) and VGenerate and rank semantic models
1
Output• A ranked set of semantic models for
S
37
Learn Semantic Types• Learn Semantic Types for each attribute from its
data• Pick top K semantic types according to their
confidence valuesdia(title,credit, classification,name,imageURL)
title<aac:CulturalHeritageObject, dcterms:title> 0.49<aac:CulturalHeritageObject, rdfs:label> 0.28
credit<aac:CulturalHeritageObject, dcterms:provenance> 0.83<aac:Person, ElementsGr2:note> 0.06
classification
<skos:Concept, skos:prefLabel> 0.58<skos:Concept, rdfs:label> 0.41
name<aac:Person, foaf:name> 0.65<fofa:Person, fofaf:name> 0.32
imageURL<foaf:Document, uri> 0.47<edm:WebResource, uri> 0.40
38
ApproachInput
Learn semantic types for attributes(s)
• Sample data from new source (S)• Domain Ontologies (O)• Known semantic models
Construct Graph G=(V,E)
Generate mappings between attributes(S) and VGenerate and rank semantic models
2
Output• A ranked set of semantic models for
S
39
• Annotate (tag) nodes and links with list of supporting models• Adjust weight based on the number of supporting models
Build Graph G: Add Known Models
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept
1
1
1dma
dma
1dma
dma 1dma
1dma
dma
40
• Annotate (tag) nodes and links with list of supporting models• Adjust weight based on the number of tags
Build Graph G: Add Known Models
dma npg
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
EuropeanaAggregation
WebResource
image
uri
aggregatedCHOhasView
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
41
Build Graph G: Add Semantic Typestitle <CulturalHeritageObject,title>
<CulturalHeritageObject,label> credit <CulturalHeritageObject,provenance> <Person,note>
classification <Concept,prefLabel> <Concept,label>name <aac:Person,name> <foaf:Person,name>
imageURL <Document,uri> <WebResource,uri>
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
EuropeanaAggregation
WebResource
image
uri
aggregatedCHOhasView
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
42
Build Graph G: Add Semantic Typestitle <CulturalHeritageObject,title>
<CulturalHeritageObject,label> credit <CulturalHeritageObject,provenance> <Person,note>
classification <Concept,prefLabel> <Concept,label>name <aac:Person,name> <foaf:Person,name>
imageURL <Document,uri> <WebResource,uri>
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
EuropeanaAggregation
WebResource
image
uri
aggregatedCHOhasViewtitle
label
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
43
Build Graph G: Add Semantic Typestitle <CulturalHeritageObject,title>
<CulturalHeritageObject,label> credit <CulturalHeritageObject,provenance> <Person,note>
classification <Concept,prefLabel> <Concept,label>name <aac:Person,name> <foaf:Person,name>
imageURL <Document,uri> <WebResource,uri>
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
note
credit
foaf:Person
EuropeanaAggregation
WebResource
image
uri
aggregatedCHOhasView
Document
imageURL
name
nameclassification
note
credit
label
credittitle
provenance
label
uri
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
44
Build Graph G: Expand with Paths from Ontology
• Assign a high weight to the links coming from the ontology
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
note
credit
foaf:Person
EuropeanaAggregation
WebResource
image
uri
aggregatedCHOhasView
imageURL
name
nameclassification
note
credit
label
creator
Aggregationaggregates
subClassOf
page
credittitle
provenance
label
uri
aggregates
isShownAt
Document
1000
1000
1000
1000
10001000
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
45
ApproachInput
Learn semantic types for attributes(s)
• Sample data from new source (S)• Domain Ontologies (O)• Known semantic models
Construct Graph G=(V,E)
Generate mappings between attributes(S) and VGenerate and rank semantic models
3
Output• A ranked set of semantic models for
S
46
Map Source Attributes to the Graphtitle <CulturalHeritageObject,title>
<CulturalHeritageObject,label> credit <CulturalHeritageObject,provenance> <Person,note>
classification <Concept,prefLabel> <Concept,label>name <aac:Person,name> <foaf:Person,name>
imageURL <Document,uri> <WebResource,uri>
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
note
credit
foaf:Person
EuropeanaAggregation
WebResource
image
uri
aggregatedCHOhasView
Document
imageURL
name
nameclassification
note
credit
label
creator
Aggregationaggregates
subClassOf
page
credittitle
provenance
label
uri
aggregates
isShownAt1000
1000
1000
1000
10001000
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
47
Map Source Attributes to the Graphtitle <CulturalHeritageObject,title>
<CulturalHeritageObject,label> credit <CulturalHeritageObject,provenance> <Person,note>
classification <Concept,prefLabel> <Concept,label>name <aac:Person,name> <foaf:Person,name>
imageURL <Document,uri> <WebResource,uri>
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
note
credit
foaf:Person
EuropeanaAggregation
WebResource
image
uri
aggregatedCHOhasView
Document
imageURL
name
nameclassification
note
credit
label
creator
Aggregationaggregates
subClassOf
page
credittitle
provenance
label
uri
aggregates
isShownAt1000
1000
1000
1000
10001000
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
48
Map Source Attributes to the Graphtitle <CulturalHeritageObject,title>
<CulturalHeritageObject,label> credit <CulturalHeritageObject,provenance> <Person,note>
classification <Concept,prefLabel> <Concept,label>name <aac:Person,name> <foaf:Person,name>
imageURL <Document,uri> <WebResource,uri>
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
note
credit
foaf:Person
EuropeanaAggregation
WebResource
image
uri
aggregatedCHOhasView
Document
imageURL
name
nameclassification
note
credit
label
creator
Aggregationaggregates
subClassOf
page
credittitle
provenance
label
uri
aggregates
isShownAt
title
1000
1000
1000
1000
10001000
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
51
ApproachInput
Learn semantic types for attributes(s)
• Sample data from new source (S)• Domain Ontologies (O)• Known semantic models
Construct Graph G=(V,E)
Generate mappings between attributes(S) and VGenerate and rank semantic models
4
Output• A ranked set of semantic models for
S
52
Generate Semantic Models
• Compute Steiner tree for each mapping– A minimal tree connecting nodes of
mapping– A customization of BANKS algorithm
[Bhalotia et al., 2002]• Our algorithm considers both
coherence and popularity• Each tree is a candidate model• Rank the models based on coherence
and cost
53
What Is Coherence?
PlacePerson
organizer
Event
location
s1 s2 s3Person
bornIn
Place Person
bornIn
Place
Known Models
54
What Is Coherence?
PlacePerson
organizer
Event
location
s1 s2 s3Person
bornIn
Place Person
bornIn
Place
PlacePerson
organizer
Event
location
Graph1 1
bornIn0.5
s2, s3
s1s1
Known Models
55
What Is Coherence?
PlacePerson
organizer
Event
location
s1 s2 s3Person
bornIn
Place Person
bornIn
Place
PlacePerson
organizer
Event
location
Graph1 1
bornIn0.5
s2, s3
s1s1
Known Models
Semantic typesof a new source
PersonEventPlace
56
What Is Coherence?
PlacePerson
organizer
Event
location
s1 s2 s3Person
bornIn
Place Person
bornIn
Place
PlacePerson
organizer
Event
location1 1
bornIn0.5
s2, s3
s1s1
Top 3 Steiner trees
Known Models
PlacePerson
organizer
Event1
bornIn0.5
s2, s3
s1PlacePerson Event
location1
bornIn0.5
s2, s3
s1
PlacePerson
organizer
Event
location1 1
s1s1
PersonEventPlace
Semantic typesof a new source
Not minimal model
but more coherent
Graph
57
Example Mappingtitle <CulturalHeritageObject,title>
<CulturalHeritageObject,label> credit <CulturalHeritageObject,provenance> <Person,note>
classification <Concept,prefLabel> <Concept,label>name <aac:Person,name> <foaf:Person,name>
imageURL <Document,uri> <WebResource,uri>
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
note
credit
foaf:Person
EuropeanaAggregation
WebResource
image
uri
hasView
Document
imageURL
name
nameclassification
note
credit
label
creator
Aggregationaggregates
subClassOf
page
credittitle
provenance label
uri
isShownAt1000
1000
1000
1000
10001000
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
aggregatedCHOaggregates
58
Steiner Treetitle <CulturalHeritageObject,title>
<CulturalHeritageObject,label> credit <CulturalHeritageObject,provenance> <Person,note>
classification <Concept,prefLabel> <Concept,label>name <aac:Person,name> <foaf:Person,name>
imageURL <Document,uri> <WebResource,uri>
aac:Person
CulturalHeritageObject
name
creator
name
hasTypecreated title
prefLabel
type
creationDate
title
Concept aac:Person
name
sitter
name
note
credit
foaf:Person
EuropeanaAggregation
WebResource
image
uri
hasView
Document
imageURL
name
nameclassification
note
credit
label
creator
Aggregationaggregates
subClassOf
page
credittitle
provenance label
uri
isShownAt1000
1000
1000
1000
10001000
0.5
1
1
1
dma
dma
1dma
npg 1
npg 1npg 1
npg 1dma npg 0.5dma npg
0.5dma npgnpg
aggregatedCHOaggregates
59
Final Model in KarmaEDM
SKOS
FOAF
AAC
ElementsGr2
ORE
DCTerms
Domain ontologies:
Domain: Museum Data
Source: Detroit Institute of Art dia(title,credit,classification,name,imageURL)
Evaluation
60
Evaluation Dataset EDM CRM
# sources 29 29
# classes in the ontologies 119 147
# properties in the ontologies 351 409
# nodes in the gold standard models 473 812
# links in the gold standard models 444 785
63
User Effortsource columns Choose Type Change Link Time
(min)s1 7 7 1 3s2 12 5 2 => 1 6s3 4 0 0 2s4 17 5 7 => 6 8s5 14 4 6 => 2 7s6 18 4 4 => 1 7s7 14 1 4 => 0 6s8 6 0 4 => 0 3… … … … …s29 9 2 1 3Total 329 56 96 => 19 128 =>
101
Avg. min per source: 4.4 3.4 minutesAvg. # user actions per source: 5.2 2.6
Accuracy
Compute precision and recall between learned models and correct models
64
How many of the learned relationships are correct?
How many of the correct relationships are learned?
Person
Artwork
location
Museum
creator
corre
ct m
odel
Person
Museum
location
Artwork
founder
lear
ned
mod
el
<Artwork,location,Museum><Artwork,creator,Person>
<Museum,founder,Person><Artwork,location,Museum>
Precision: 0.5, Recall: 0.5
65
Experiment 1correct semantic types are given
EDM
CRM
0.0
0.2
0.4
0.6
0.8
1.0
pre-ci-sionrecall
Number of known semantic models
0.0
0.2
0.4
0.6
0.8
1.0
pre-ci-sionrecall
Number of known semantic models
68
Experiment 2learn semantic types, pick top K
candidates
0.0
0.2
0.4
0.6
0.8
precision (k=1)recall (k=1)precision (k=4)recall (k=4)
Number of known semantic models
0.0
0.2
0.4
0.6
0.8
precision (k=1)recall (k=1)precision (k=4)recall (k=4)
Number of known semantic models
EDM
CRM
Inferring Semantic Relations from Linked Open Data
Contribution: leveraging graph patterns in LOD
to infer relationships
72
Idea
• There is a huge amount of linked data available in many domains (RDF format)
• Use LOD when there is no known semantic model
• Exploit the relationships between instances
73
Approach[Taheriyan et al, COLD 2015]
Domain Ontology
LearnSemantic
Types
Sample Data
Construct a Graph
Generate Candidate
Models
Rank Results
Linked Open Data
Extract Patterns
74
LOD Patterns../person-
institution/57551 E21_Personrdf:type
Thomas Burgonskos:prefLabel
../person-institution/57551/birthP98i_was_born
../person-institution/57551/birth/date
P4_has_time-span
E67_Birthrdf:type
1787rdfs:label
E52_Time-Span
rdf:type
LOD Fragment from the British
Museum
75
LOD Patterns
E67_BirthE21_Person
P98i_was_born
../person-institution/57551 E21_Person
rdf:type
Thomas Burgonskos:prefLabel
../person-institution/57551/birthP98i_was_born
../person-institution/57551/birth/date
P4_has_time-span
E67_Birthrdf:type
1787rdfs:label
E52_Time-Span
P4_has_time-span
E52_Time-Span
rdf:type
LOD Fragment from the British
Museum
Pattern
76
Pattern Templates
• Many possible templates for patterns– Example: patterns for classes C1, C2, C3
• Consider only tree patterns• Limit the length of the patterns
78
Evaluation• Linked data: 3,398,350
triples published by Smithsonian American Art Museum
• Correct semantic types given
• Extracted patterns of length 1 and 2
Evaluation Dataset CRM
# sources 29# classes in the ontologies 147# properties in the ontologies 409# nodes in the gold standard models 812# links in the gold standard models 785
background knowledge precision recall time (s)
domain ontology 0.07 0.05 0.17domain ontology + patterns of length 1 0.65 0.55 0.75domain ontology + patterns of length 1 and 2 0.78 0.70 0.46
80
Related Work• Mapping databases and spreadsheets to ontologies
– Mapping languages and tools (D2R, R2RML)– String similarity between column names and ontology terms
• Understand semantics of Web tables– Use column headers and cell values to find the labels and
relations from a database of labels and relations populated from the Web
• Exploit Linked Open Data (LOD)– Link the values to the entities in LOD to find the types of the
values and their relationships
• Learn semantic definitions of online information sources– Learns logical rules from known sources– Only learns descriptions that are conjunctive combinations of
known descriptions
82
Discussion
• Contributions– Semi-automatically model the relationships– Learn semantic models from previous models– Infer semantic relationships from LOD
• Provide explicit semantics for large portion of LOD
• Help to publish consistent RDF data• Applications
– VIVO– Smithsonian American Art Museum– DIG for DARPA’s Memex project