Some thoughts on the gaps across languages and domains,
drawn from the experience of building core common vocabularies
Hideaki Takeda, National Institute of Informatics
Glocal KO Workshop, Thursday August 13, 2015, Copenhagen
Who am I? Hideaki Takeda, Dr. Eng.
• Professor, National Institute of Informatics
  – A research institute mainly for computer science
• Background: computer science, in particular artificial intelligence
• Current interests: Semantic Web, ontology, Linked Open Data (LOD), social media analysis
• Social activities
  – President, Linked Open Data Initiative (NPO)
  – Founder, DBpedia Japanese Chapter
  – Specialist, Information-technology Promotion Agency, Japan (IPA)
  – Chair, Japan Link Center (Registration Agency of the International DOI Foundation)
  – Board member, ORCID
Core Vocabularies
• Background
  – Everything is in the infosphere, i.e., on the web
  – Lots of information, lots of data, lots of systems
• Problems
  – Misunderstanding, mismatching, and “missing links” across different domains
  – A gap between humans and machines (computers)
Core Vocabularies
• Aim
  – Increase the interoperability of information/data
  – Bridge human and machine understanding
• Target
  – Governmental documents/data
• Method
  – Define a set of concepts that bridge (human-readable) terms and (computer-processable) symbols (URIs)
  – Start from the most common concepts
Core Vocabularies
• Activities worldwide
  – USA: NIEM Core
      • NIEM (National Information Exchange Model)
  – Europe: ISA Core Vocabularies
  – UN: United Nations Centre for Trade Facilitation and Electronic Business (UN/CEFACT)
      • Core Components Library (UN/CCL)
  – Japan: IMI Core Vocabulary
ISA Core Vocabularies v 1.1
NIEM Architecture
http://niem.github.io/technical/iepd-versions/
NIEM
http://reference.niem.gov/niem/guidance/user-guide/vol1/user-guide-vol1.pdf
http://www.epa.gov/oei/symposium/2010/roy.pdf
IMI Project
• Supported by
  – Ministry of Economy, Trade and Industry, Japan
• Technical framework
  – Data model
  – Core vocabulary
  – Design rules
• Support framework
  – Tools
      • for data developers
      • for schema developers
  – Database
      • schemas / tools / templates / …
[Figure: IMI data model. Person Type has Name (Name Type), Gender, Gender Code (Code Type), Birth Date, Address (Address Type), …; Name Type has Type, Name, Family Name, Given Name, …; Address Type has Type, Notation, Zip Code, Prefecture, City, …; Code Type has Type and Value; Codelist Type and Thing Type also appear; leaf values are Strings.]
IMI as a template for schema
[Figure: a registration form for Conference X (fields: Name, Address, Gender (M/F), Affiliation, Affiliation Address, Attending date) is designed from the IMI data model (Person Type, Name Type, Address Type, Code Type, Codelist Type, Thing Type). The resulting IMI individual form contains Person Type (Name, Gender, Address, Affiliation), Name Type (Name), Address Type (Notation, Zip-code), an Org. type (Name, Address), and Event Participation Type (Participant: Person, Date).]
Design schema
• Remove unnecessary items
• Add necessary items
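The remove/add step can be sketched as a small function over a template. This is a minimal illustration, assuming a schema is just a mapping from class names to property lists; the names `TEMPLATE` and `tailor` and the English property names are hypothetical, not IMI's published identifiers.

```python
from copy import deepcopy

# Hypothetical, simplified stand-in for part of the IMI data model:
# class name -> ordered list of property names.
TEMPLATE = {
    "Person": ["Name", "Gender", "Gender Code", "Birth Date", "Address"],
    "Name": ["Type", "Name", "Family Name", "Given Name"],
    "Address": ["Type", "Notation", "Zip Code", "Prefecture", "City"],
}


def tailor(template, remove, add):
    """Derive a form-specific schema from a template.

    `remove` and `add` map a class name to property names. Removed items are
    dropped, added items are appended (without duplicates), and classes that
    only appear in `add` are created. The template itself is left untouched.
    """
    schema = deepcopy(template)
    for cls, props in remove.items():
        schema[cls] = [p for p in schema.get(cls, []) if p not in props]
    for cls, props in add.items():
        current = schema.setdefault(cls, [])
        for p in props:
            if p not in current:
                current.append(p)
    return schema


# Registration form for "Conference X", following the slide.
form = tailor(
    TEMPLATE,
    remove={
        "Person": ["Gender Code", "Birth Date"],
        "Name": ["Type", "Family Name", "Given Name"],
        "Address": ["Type", "Prefecture", "City"],
    },
    add={
        "Person": ["Affiliation"],
        "Organization": ["Name", "Address"],
        "Event Participation": ["Participant", "Date"],
    },
)
```

Copying the template first keeps it reusable for the next form, which is the point of treating IMI as a template rather than as a fixed schema.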
Roles of IMI
• Structured concept dictionary
  – Concept dictionary
      • Terms are notations of concepts: each entry is a concept, not a term
      • Class concepts and relation concepts
      • General-specific relations
  – Structured dictionary
      • Concepts form a network, which in turn represents the meaning of the individual concepts
      • A class concept consists of relation concepts representing its attributes, plus general/specific relations
      • A relation concept consists of the class concepts it connects as domains and ranges, plus general/specific relations
• Template for schemata
  – Add or remove items for specific needs
Use of IMI
• Define the concept model
• “Serialize” it into specific “physical” forms
• Use a suitable physical form
IMI Concept Model
• RDF: for open data
  – Relaxed definition
  – Interoperability with other open data schemata
• XML: for data exchange
  – Strict definition
  – Interoperability with DB schemata
• Natural language form: for spreadsheets and documents
  – Relaxed definition with simple structure
  – Readability by humans
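The "one concept model, several physical forms" idea can be sketched by serializing the same record three ways with the Python standard library. The namespace `http://example.org/imi-sketch#` and the property names are hypothetical placeholders, not the IMI terms, and the strict XML Schema validation mentioned on the slide is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical namespace and property names; the real IMI terms differ.
EX = "http://example.org/imi-sketch#"

record = {"familyName": "Yamada", "givenName": "Taro"}


def _literal(text):
    """Quote a plain string as a Turtle literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(subject, rec):
    """RDF (Turtle) form, e.g. for publishing as open data."""
    props = " ;\n    ".join(f"ex:{k} {_literal(v)}" for k, v in rec.items())
    return f"@prefix ex: <{EX}> .\n<{subject}> a ex:Person ;\n    {props} .\n"


def to_xml(rec):
    """XML form, e.g. for data exchange; an XML Schema would enforce strictness."""
    root = ET.Element("Person")
    for k, v in rec.items():
        ET.SubElement(root, k).text = v
    return ET.tostring(root, encoding="unicode")


def to_csv(rec):
    """Spreadsheet-friendly form: one header row, one value row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(rec.keys())
    writer.writerow(rec.values())
    return buf.getvalue()
```

For example, `to_xml(record)` gives `<Person><familyName>Yamada</familyName><givenName>Taro</givenName></Person>`. The three functions share one source record, so a change in the concept model propagates to every form.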
IMI Core Vocabulary v2.2
• Published on February 3, 2015
• 48 core class terms
  – person, address, facility, location, date, …
• 206 core property terms
  – name of person, birth date, birth country, …
• Multiple formats
  – RDF Schema, XML Schema, and documents for humans
http://imi.ipa.go.jp/ns/core/2/
Class definition (person class): person 人
Description: 人の情報を表現するためのデータ型 (Data Type to describe a person)
Inherits from: ic:実体型

Property | Term (ja) | Data type | Cardinality | Description (ja) | Description (en)
ID | ID | ic:ID型 | 0..n | ID | Identification of a Person
Name of person | 氏名 | ic:氏名型 | 0..n | 氏名 (name) | Name of a Person
Gender | 性別 | xsd:string | 0..1 | 性別の表記 (notation of gender) | Gender of a Person
Gender code | 性別コード | ic:コード型 | 0..1 | 性別コード (gender code) | Gender of a Person
Birth date | 生年月日 | ic:日付型 | 0..1 | 生年月日 (date of birth) | Date of Birth of a Person
Death date | 死亡年月日 | ic:日付型 | 0..1 | 死亡年月日 (date of death) | Date of Death of a Person
Residence address | 住所 | ic:住所型 | 0..n | 現住所 (present address) | Present address of a Person
Domicile of origin | 本籍 | ic:住所型 | 0..1 | 本籍 (domicile of origin) | Legal residence address of a Person
Contact information | 連絡先 | ic:連絡先型 | 0..n | 連絡先 (contact information) | Contact information of a Person
Nationality | 国籍 | xsd:string | 0..n | 国籍の表記 (notation of nationality) | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
Nationality code | 国籍コード | ic:コード型 | 0..n | 住民基本台帳で利用されている国籍コード (nationality code used in the Basic Resident Register) | A country that assigns rights, duties, and privileges to a person because of the birth or naturalization of the person in that country.
Birth country | 出生国 | xsd:string | 0..1 | 生まれた国名 (name of the country of birth) | A location where a person was born.
Birth country code | 出生国コード | ic:コード型 | 0..1 | 生まれた国のコード (code of the country of birth) | A location where a person was born.
Birth place | 出生地 | ic:住所型 | 0..1 | 生まれた場所 (place of birth) | A location where a person was born.
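The cardinality column above is machine-checkable. A minimal sketch, assuming a record maps each property to a list of values; the snake_case keys are hypothetical stand-ins for the IMI property terms, and only part of the table is transcribed.

```python
# Cardinalities transcribed from part of the person class table above.
# The English snake_case keys are hypothetical stand-ins for the IMI terms.
# (min, max); max=None means "n" (unbounded).
PERSON_CARDINALITY = {
    "id": (0, None),
    "name": (0, None),
    "gender": (0, 1),
    "gender_code": (0, 1),
    "birth_date": (0, 1),
    "death_date": (0, 1),
    "residence_address": (0, None),
    "domicile_of_origin": (0, 1),
    "contact_information": (0, None),
    "nationality": (0, None),
    "birth_country": (0, 1),
    "birth_place": (0, 1),
}


def check_cardinality(record, spec):
    """Return violations as strings; an empty list means the record conforms.

    `record` maps property names to lists of values.
    """
    errors = [f"unknown property: {p}" for p in record if p not in spec]
    for prop, (low, high) in spec.items():
        count = len(record.get(prop, []))
        if count < low:
            errors.append(f"{prop}: expected at least {low}, got {count}")
        if high is not None and count > high:
            errors.append(f"{prop}: expected at most {high}, got {count}")
    return errors
```

A record with two birth dates is reported as `birth_date: expected at most 1, got 2`, while several nationalities pass because that property is 0..n.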
Class Structure
person 人
  – name: ic:氏名型
  – contact: ic:連絡先型
  – …
氏名 (name)
  – Family name: xsd:string
  – Romanized family name: xsd:string
  – …
contact 連絡先
  – Phone number: ic:電話番号型
  – Address: ic:住所型
  – …
電話番号 (phone number)
  – …
address 住所
  – Country: xsd:string
  – Prefecture: xsd:string
  – …
A class term has property terms as sub-elements, and a property term can refer to another class term, which in turn has its own list of property terms. This builds a layered structure of terms, as shown above.
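Because property terms point to further class terms, every leaf value is reached by a path of properties. The sketch below flattens such a layered structure into dotted paths, the same style the mapping results later use for IMI paths such as ic:人型.ic:氏名.ic:姓. The vocabulary contents, including the `number` property under PhoneNumber, are hypothetical, and the cycle guard is an added assumption for vocabularies where, for example, an organization refers back to a person.

```python
# Hypothetical, simplified layered vocabulary modelled on the structure above:
# class term -> {property term: referenced class term or primitive datatype}.
VOCAB = {
    "Person": {"name": "Name", "contact": "Contact"},
    "Name": {"familyName": "xsd:string", "romanizedFamilyName": "xsd:string"},
    "Contact": {"phoneNumber": "PhoneNumber", "address": "Address"},
    "PhoneNumber": {"number": "xsd:string"},
    "Address": {"country": "xsd:string", "prefecture": "xsd:string"},
}


def property_paths(root, vocab):
    """Return (dotted path, datatype) pairs for every leaf reachable from `root`."""
    results = []

    def walk(cls, path, on_path):
        for prop, target in vocab[cls].items():
            if target in on_path:           # skip cycles such as Person -> Org -> Person
                continue
            if target in vocab:             # property refers to another class term
                walk(target, path + [prop], on_path | {target})
            else:                           # primitive datatype: a leaf
                results.append((".".join(path + [prop]), target))

    walk(root, [root], {root})
    return results
```

For `property_paths("Person", VOCAB)` the first entry is `("Person.name.familyName", "xsd:string")`, and the address leaves come out as `Person.contact.address.country` and `Person.contact.address.prefecture`.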
Concept of the IMI framework
International interoperability was a major consideration in designing IMI.
[Figure: the IMI Core Vocabulary, a Cross Domain Vocabulary, and Domain-specific Vocabularies (Geographical Space/Facilities, Transportation, Disaster Prevention, Finance), with example terms such as Location, Shelter, Hospital, Station, and Disaster Restoration Cost. IMI is related to the Japanese local government standard (APPLIC), de facto standards (DC, FOAF, etc.), NIEM (US), ISA (EU), and Schema.org.]
Mapping between concepts in different core vocabularies
• Difficulty of concept-concept mapping
  – Matching of meanings tends to become a very abstract discussion
  – Matching of references is easier
[Figure: in each of two ontologies, a concept has a reference to the real world; whether the two concepts match is marked “?”.]
Mapping between concepts in different core vocabularies
• Difficulty of concept-concept mapping
  – Syntactical mapping vs. semantic mapping
      • Consider only what a concept refers to in the real world, not how it is represented in systems.
[Figure: concepts and their references in two ontologies, divided into a Systems World and a Cognitive World; the correspondence is marked “?”.]
[Figure: the concept “Person” in the Cognitive World set against its representation in the Systems World, the IMI person class definition given earlier; the correspondence between them is marked “?”.]
[Figure: the concept “Postal Code” in the Cognitive World and two values in the Systems World, "101-8430"^^xsd:string (a postal code in Japan) and "SW1A 0AA"@en (a postal code in Europe); the correspondence is marked “?”.]
Semantic Mapping
• Semantic mapping
  – Mapping on the cognitive layer
  – Two ways of judging a mapping
      • Extensional mapping
          – Check whether ‘things’ are shared, e.g., person
          – Mostly for class mapping
      • Intensional mapping
          – Check whether ‘values’ are shared, e.g., postal code
          – Mostly for property mapping
• Syntactical mapping
  – Mapping on the systems layer
Types of matching: SKOS
• Exact match
• Close match
• Broad/narrow match
• Related match
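Mapping judgements of these four types can be recorded with the SKOS mapping properties. The sketch below emits one N-Triples statement per judgement; the SKOS namespace and property names are the real ones, while the vocabulary IRIs in the example are hypothetical. "No match" has no SKOS property, so the function returns `None` for it.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The five SKOS mapping properties; "none" has no SKOS counterpart.
SKOS_PROPERTY = {
    "exact": SKOS + "exactMatch",
    "close": SKOS + "closeMatch",
    "broad": SKOS + "broadMatch",
    "narrow": SKOS + "narrowMatch",
    "related": SKOS + "relatedMatch",
}


def mapping_triple(source_iri, relation, target_iri):
    """Return one N-Triples statement, or None when there is no match.

    Note the SKOS direction: `A skos:broadMatch B` states that B is broader than A.
    """
    if relation == "none":
        return None
    return f"<{source_iri}> <{SKOS_PROPERTY[relation]}> <{target_iri}> ."


# Hypothetical IRIs for a core-vocabulary term and an IMI path.
example = mapping_triple(
    "http://example.org/core/LegalEntityAddress",
    "broad",
    "http://example.org/imi/Corporation.address",
)
```

The direction matters: the example states that the IMI address is the broader concept, which is how a "has broad match" row reads.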
Close match
• Close match: nearly, but not exactly, matched
• Extensional mapping
  – The coverage of ‘things’ overlaps substantially
      • The coverage of ‘Country’ is slightly different, but the ‘things’ are close
      • The reference of ‘Person’ is slightly different (person vs. legal person)
• Intensional mapping
  – The coverage of ‘values’ overlaps substantially
Broad match/narrow match
• Broad/narrow match
  – One subsumes the other
• Extensional mapping
  – The ‘things’ covered by one are subsumed by the other, i.e., the subset is an exact match
• Intensional mapping
  – The ‘values’ covered by one are subsumed by the other, i.e., the subset is an exact match
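The extensional and intensional criteria can be sketched as set comparisons: pass the sets of shared ‘things’ for classes or of ‘values’ for properties. Equality, subset, and superset follow the slides; the Jaccard threshold for "close" and the rule that any remaining overlap counts as "related" are assumptions of this sketch, not definitions from SKOS or IMI.

```python
def judge_match(a, b, close_threshold=0.8):
    """Suggest a SKOS-style relation from concept a to concept b.

    `a` and `b` are the 'things' each class covers (extensional mapping) or
    the 'values' each property takes (intensional mapping). The Jaccard
    threshold for "close" and the rule for "related" are assumptions.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return "none"
    if a == b:
        return "exact"
    if a < b:
        return "broad"    # b subsumes a: a skos:broadMatch b
    if a > b:
        return "narrow"   # a subsumes b: a skos:narrowMatch b
    jaccard = len(a & b) / len(a | b)
    if jaccard >= close_threshold:
        return "close"
    return "related" if jaccard > 0 else "none"
```

With illustrative samples, registered addresses against all addresses gives "broad", while Japanese postal codes against UK ones such as "SW1A 0AA" share no values and give "none". In practice only samples of things or values are available, so the result is evidence for a human decision rather than a final mapping.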
More kinds of matching
• Complicated match
  – An element of one system matches a combination of two or more elements
  – “Pathway” match
      • A single property matches the combination of two or more properties
  – “Conditional” match
      • An element matches the other element if some condition holds
• IdentifierIssuingAuthority | Has related match | IMI ic:ID型.ic:ID体系.ic:発行者
• LegalEntityRegisteredAddress | Has broad match | IMI ic:法人型.ic:住所; it is an exact match if the value of ic:住所.種別 is "登記住所" (registered address).
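Both kinds of complicated match can be sketched over nested records: a pathway match follows several properties in sequence, and a conditional match applies only when another field has a given value. The record layout below is hypothetical (an instance of ic:ID型 is modelled as a dict, so paths start after the type name), and the value "営業所" (sales office) is an illustrative non-registered address type.

```python
# Hypothetical nested record keyed by the IMI-style names used above.
identifier = {
    "ic:識別値": "0000000000000",
    "ic:ID体系": {"ic:発行者": "Example Issuer", "ic:URI": "http://example.org/id-scheme"},
}


def get_path(record, path):
    """Pathway match: follow a dotted path such as 'ic:ID体系.ic:発行者'."""
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def registered_address(corporation):
    """Conditional match: use ic:住所 only if its 種別 is '登記住所' (registered address)."""
    address = corporation.get("ic:住所")
    if isinstance(address, dict) and address.get("種別") == "登記住所":
        return address
    return None
```

Under this sketch, LegalEntityRegisteredAddress would be filled only from addresses that pass `registered_address`, which is what turns the broad match into an exact one.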
Results
Core vocabulary identifier | Mapping relation | Data model identifier (IMI)
Address | Has exact match | ic:住所型
AddressAddressArea | Has narrow match | ic:住所型.ic:町名
AddressAddressArea | Has narrow match | ic:住所型.ic:丁目
AddressAddressArea | Has narrow match | ic:住所型.ic:番地補足
AddressAddressArea | Has narrow match | ic:住所型.ic:番地
AddressAddressArea | Has narrow match | ic:住所型.ic:号
AddressAddressID | Has exact match | ic:住所型.ic:ID
AddressAdminUnitL1 | Has exact match | ic:住所型.ic:国
AddressAdminUnitL2 | Has narrow match | ic:住所型.ic:都道府県
AddressFullAddress | Has exact match | ic:住所型.ic:表記
AddressLocatorDesignator | Has narrow match | ic:住所型.ic:ビル番号
AddressLocatorDesignator | Has narrow match | ic:住所型.ic:部屋番号
AddressLocatorName | Has narrow match | ic:住所型.ic:ビル名
AddressPOBox | Has related match | ic:住所型.ic:方書
AddressPostCode | Has exact match | ic:住所型.ic:郵便番号
AddressPostName | Has narrow match | ic:住所型.ic:市区町村
AddressPostName | Has narrow match | ic:住所型.ic:区
AddressThoroughfare | Has no match | (none)
Agent | Has exact match | ic:実体型
Identifier | Has exact match | ic:ID型
IdentifierIdentifier | Has exact match | ic:ID型.ic:識別値
IdentifierIssueDate | Has no match | (none)
IdentifierIssuingAuthority | Has related match | ic:ID型.ic:ID体系.ic:発行者
IdentifierIssuingAuthorityURI | Has exact match | ic:ID型.ic:ID体系.ic:URI
IdentifierType | Has no match | (none)
JurisdictionIdentifier | Has related match | ic:国籍コード
JurisdictionName | Has related match | ic:国籍
LegalEntity | Has exact match | ic:法人型
LegalEntityAddress | Has broad match | ic:法人型.ic:住所
LegalEntityAlternativeName | Has no match | (none)
LegalEntityCompanyActivity | Has close match | ic:法人型.ic:事業種目
LegalEntityCompanyStatus | Has related match | ic:法人型.ic:活動状況
LegalEntityCompanyType | Has exact match | ic:法人型.ic:組織種別
LegalEntityIdentifier | Has exact match | ic:法人型.ic:ID
LegalEntityLegalIdentifier | Has no match | (none)
LegalEntityLegalName | Has broad match | ic:法人型.ic:名称.表記
LegalEntityLocation | Has related match | ic:法人型.ic:地物.説明
LegalEntityRegisteredAddress | Has broad match | ic:法人型.ic:住所
Location | Has exact match | ic:場所型
LocationAddress | Has exact match | ic:場所型.ic:住所
LocationGeographicIdentifier | Has broad match | ic:場所型.ic:地理識別子
LocationGeographicName | Has exact match | ic:場所型.ic:名称.ic:表記
LocationGeometry | Has exact match | ic:場所型.ic:地理座標
Person | Has exact match | ic:人型
PersonAddress | Has exact match | ic:人型.ic:住所
PersonAlternativeName | Has broad match | ic:人型.ic:氏名.ic:姓名
PersonBirthName | Has broad match | ic:人型.ic:氏名.ic:姓名
PersonCitizenship | Has no match | (none)
PersonCountryOfBirth | Has exact match | ic:人型.ic:出生国
PersonCountryOfDeath | Has no match | (none)
PersonDateOfBirth | Has exact match | ic:人型.ic:生年月日
PersonDateOfDeath | Has exact match | ic:人型.ic:死亡年月日
PersonFamilyName | Has exact match | ic:人型.ic:氏名.ic:姓
PersonFullName | Has exact match | ic:人型.ic:氏名.ic:姓名
PersonGender | Has exact match | ic:人型.ic:性別コード
PersonGivenName | Has exact match | ic:人型.ic:氏名.ic:名
PersonIdentifier | Has broad match | ic:人型.ic:ID
PersonPatronymicName | Has no match | ic:人型.ic:氏名.ic:姓名
PersonPlaceOfBirth | Has narrow match | ic:人型.ic:出生地
Bridging core and domain vocabularies (work in progress)
• Aim: extend the core vocabulary to domain vocabularies
  – Agriculture
  – Finance
  – Traffic
  – …
• Task
  – Can concepts really be shared between the core and the domains?
Agricultural Activity Ontology (AAO)
Agricultural activity
  – crop production activity
      • activity for propagation
      • activity in the vegetative growth stage
      • activity in the reproductive growth stage
  – activity for environment control
      • activity for soil control
      • activity for climate control
      • activity for water control
      • activity for biotic control
      • activity for chemical control
  – post production activity
      • activity for harvesting
      • activity for processing
      • activity for extending shelf-life
      • activity for wrapping
  – indirect activity
      • activity for preparing materials
      • activity for cleaning
      • activity for transport
      • activity for monitoring
      • activity for maintaining farm equipment
  – administrative activity
      • activity for business administration
http://cavoc.org/aao/
An example: “activity” (and “event”)
• activity (n): any specific behavior, e.g., "they avoided all recreational activity"
  – hypernym: act, deed, human action, human activity (n): something that people do or cause to happen
      – hypernym: event (n): something that happens at a given place and time
  – [WordNet]
• Each activity is a Happening which involves volition and participants. It has temporal dimension. It is distinguished from Events by the fact that the activity does not trigger change of state and does not have a conceptual end point.
  – [PROTON Extent module (a lightweight upper-level ontology)]
• Activity: This class represents the abstract content of an event, which may be repeated many times, once or never. For example a training course, or a play.
  – [The Event Programme Vocabulary (prog)]
• E5 Event
  – Subclass of: E4 Period
  – Superclass of: E7 Activity, E63 Beginning of Existence, E64 End of Existence
• E7 Activity
  – Subclass of: E5 Event
  – Superclass of: E8 Acquisition, E9 Move, E10 Transfer of Custody, E11 Modification, E13 Attribute Assignment, E65 Creation …
  – [CIDOC Conceptual Reference Model]
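The disagreement can be made concrete by asking each source the same question: is "activity" subsumed by "event"? The sketch encodes only the superclass links stated in the quotations above, simplified to one parent per class. The prog vocabulary is omitted because its definition states no class link, and the dictionary names are hypothetical.

```python
# Direct-superclass links transcribed (simplified) from the quoted sources.
# PROTON: "Each activity is a Happening ... distinguished from Events".
HIERARCHIES = {
    "WordNet": {"activity": "act", "act": "event"},
    "PROTON": {"Activity": "Happening"},
    "CIDOC CRM": {"E7 Activity": "E5 Event", "E5 Event": "E4 Period"},
}

# How each source labels the two concepts being compared.
LABELS = {
    "WordNet": ("activity", "event"),
    "PROTON": ("Activity", "Event"),
    "CIDOC CRM": ("E7 Activity", "E5 Event"),
}


def ancestors(hierarchy, cls):
    """All superclasses of cls, nearest first (assumes no cycles)."""
    chain = []
    while cls in hierarchy:
        cls = hierarchy[cls]
        chain.append(cls)
    return chain


activity_is_event = {
    source: LABELS[source][1] in ancestors(HIERARCHIES[source], LABELS[source][0])
    for source in HIERARCHIES
}
```

WordNet and CIDOC CRM place activity under event while PROTON keeps them apart, so a single exact match between "activity" terms across these sources would hide a real difference.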
Summary
• Sharing concepts is a long road
• There is no ground truth
  – Step-by-step understanding of the world
  – Careful consensus making
• A more flexible framework is needed
  – Simple mapping is not satisfactory