China 2009 1
Logical Foundation of the Semantic Web
Lecturer: Zhisheng Huang (黄智生)
Vrije University Amsterdam, The Netherlands
Teaching assistant: Wei Hu (胡伟)
Southeast University
China 2009 2
Schedule
China 2009 3
Lecture 4: The Semantic Web and its Logics
• Basic ideas of the Semantic Web
• RDF/RDFS
• The OWL language
• OWL-DL and its relationship to description logics
China 2009 4
Starting from Google
China 2009 5
Existing Problems
China 2009 6
Can we do it better?
• Semantics-based search
• Concept combination specification
• Domain-specific search
• Approximate search
• Search agents
China 2009 7
The Semantic Web
• Core idea: give information on the Web a well-defined meaning, that is, semantics.
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation."
[Berners-Lee et al., 2001]
China 2009 8
What the Semantic Web wants to do
• Content can be processed automatically by machines
• Content is machine-understandable
Content is machine-understandable if it is bound to some formal description of itself (i.e. metadata).
China 2009 9
WWW: Its impacts and visions
China 2009 10
Web 1.0
China 2009 11
Web 2.0
China 2009 12
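Expectations on Web 3.0
Taken literally, the name Web 3.0 suggests three expected characteristics:
• Novelty: it differs from the existing Web 1.0 and Web 2.0 technologies and offers a whole new generation of web service models (i.e., why it is not Web 1.0 or Web 2.0).
• Achievability: with effort, it can be realized in the existing network environment; it faces no insurmountable technical barriers (i.e., why it is not Web 4.0 or higher).
• Urgency: the services it provides are urgently needed by society today, and introducing its technology can have a major impact on society (i.e., why it can only be Web 3.0).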
China 2009 13
Web 3.0
China 2009 14
Web 1.0 – Web 2.0 – Web 3.0
• Web 1.0: Web of documents
• Web 2.0: Web of persons (the social web)
• Web 3.0: Web of data (semantics)
China 2009 15
An overall view of Web development
China 2009 16
Advantages of Linked Data: starting from an example
China 2009 17
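Advantages of linked data: summary
• Existing web pages are written for people to read and are hard for machines to process automatically; linked data is easy for machines to process automatically.
• Document links allow only one link on a given piece of text, whereas data links support multiple links on the same piece of text.
• Document links cover only part of the text, whereas data links guarantee that the full text is linked.
• Keyword-based search engines such as Google appear to support full-text retrieval, but they cannot distinguish the different senses of the same word. This is especially troublesome in domains where names repeat frequently, such as person names and place names, and polysemy is common in many concrete application domains.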
China 2009 18
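A unified conceptual format for linked data
• The triple approach: <subject, predicate, object>
Example: <zhishengHuang, isStaffof, VrijeUnivAm>
• Provides the ability to describe web resources
Example: <http://wasp.cs.vu.nl/~huang, isStaffof, http://www.vu.nl>
• Provides unique identification of semantics
• Makes data content independent of its presentation form
• Provides basic semantic reasoning ability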
China 2009 19
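Why is reasoning support necessary?
Example: from the facts that ZhishengHuang is a staff member of the Vrije University and that the Vrije University is in Amsterdam, we can infer that ZhishengHuang works in Amsterdam.
<ZhishengHuang, isStaffof, VrijeUnivAm>
<VrijeUnivAm, inCity, Amsterdam>
<?x, isStaffof, ?y>, <?y, inCity, ?z> -> <?x, worksin, ?z>
=> <ZhishengHuang, worksin, Amsterdam>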
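The inference on this slide can be sketched in a few lines of Python. This is only a minimal illustration, assuming that both facts name the university with the same identifier (VrijeUnivAm) and that triples are stored as plain tuples; the function name apply_works_in_rule is hypothetical and not part of any Semantic Web toolkit.

```python
# Facts are (subject, predicate, object) triples taken from the slide.
facts = {
    ("ZhishengHuang", "isStaffof", "VrijeUnivAm"),
    ("VrijeUnivAm", "inCity", "Amsterdam"),
}


def apply_works_in_rule(triples):
    """Apply <?x,isStaffof,?y>, <?y,inCity,?z> -> <?x,worksin,?z> once."""
    derived = set()
    for (x, p1, y) in triples:
        if p1 != "isStaffof":
            continue
        # Join on ?y: the object of isStaffof must be the subject of inCity.
        for (y2, p2, z) in triples:
            if p2 == "inCity" and y2 == y:
                derived.add((x, "worksin", z))
    return derived


inferred = apply_works_in_rule(facts)
print(inferred)
```

The rule only fires when the two triples share the same value for ?y, which is why a unique identifier for each resource matters.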
China 2009 20
Semantic Web and Ontologies
China 2009 21
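The main ideas of the Semantic Web in five sentences: Why the Semantic Web?
• Every information system needs data;
• Data representation should be independent of specific applications and platforms, to maximize reusability;
• A unified conceptual data representation should be adopted so that data representation is independent of any specific system (i.e., the triple/tuple form can be used);
• Data should be able to describe web resources (i.e., RDF/RDFS or a similar language should be used);
• Data should provide basic reasoning support (i.e., OWL or another knowledge representation language should be used).
(Note: RDF, RDFS, and OWL all adopt the triple semantic model.)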
China 2009 22
Trends
According to a May 2007 report by the US market research firm Gartner:
"By 2012, 70% of public Web pages will have some level of semantic markup, 20% will use more extensive Semantic Web-based ontologies"
China 2009 23
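Part of the massive body of semantic data
Ontologies and Metadata: Billion Triples dataset
• Yahoo data
• Southeast University data
• University of Maryland
• The Open University (UK)
• SemWebBase (DERI)
• Wikipedia
• Geographic names
• Publications
• English semantic dictionary
• Freebase
• US government data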
China 2009 24
Linked Data 2009
China 2009 25
A concrete linked data example: http://sindice.com/apiv2/search?q=%22zhisheng%20huang%22&format=atom&page=1&qt=term
China 2009 26
A concrete linked data example: http://sindice.com/apiv2/search?q=%22zhisheng%20huang%22&format=atom&page=5&qt=term
China 2009 27
Falcons
China 2009 28
Making data content independent of its presentation form
China 2009 29
More about the Semantic Web
See the tutorial on Saturday, August 29:
• 09:00-12:00 Tutorial 1: Introduction to the Semantic Web (Ivan Herman)
China 2009 30
HTML Markup
……
<h2>Zhisheng Huang</h2>
<b>Affiliation</b>: Department of Computer Science<br>
Faculty of Sciences<br>
Vrije University Amsterdam
<p><b>Email</b>: huang @ cs.vu.nl<br>
<b>Phone</b>: 31-20-4447740(office)
……
</html>
China 2009 31
XML Annotations
<researcher>
<name>Zhisheng Huang</name>
<affiliation>
<department>Department of Computer Science</department>
<faculty>Faculty of Sciences</faculty>
<university>Vrije University Amsterdam</university>
</affiliation>
<email>huang @ cs.vu.nl</email>
<phone id="office"> (31)-20-4447740</phone>
……
</researcher>
</html>
China 2009 32
Data Structures
• Structured data: databases
• Semi-structured data: HTML, XML, BibTeX
• Unstructured data: text
China 2009 33
XML representation of a relational database

AI group
member id   name   phone
001         John   1234567
002         Mary   7654321
…           …      …

<group name="AI">
<member id="001"><name>John</name><phone>1234567</phone></member>
<member id="002"><name>Mary</name><phone>7654321</phone></member>
…..
</group>
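The correspondence between the XML document and the relational table can be made concrete with a small Python sketch that reads the two members shown on the slide back into rows. It uses only the standard library; the function name group_to_rows is a hypothetical helper, and the elided rows ("…") are simply left out.

```python
import xml.etree.ElementTree as ET

# The two members shown on the slide; the elided rows are omitted.
GROUP_XML = """
<group name="AI">
  <member id="001"><name>John</name><phone>1234567</phone></member>
  <member id="002"><name>Mary</name><phone>7654321</phone></member>
</group>
"""


def group_to_rows(xml_text):
    """Return the group name and one (id, name, phone) tuple per member."""
    root = ET.fromstring(xml_text)
    rows = []
    for member in root.findall("member"):
        rows.append(
            (member.get("id"), member.findtext("name"), member.findtext("phone"))
        )
    return root.get("name"), rows


group_name, rows = group_to_rows(GROUP_XML)
print(group_name, rows)
```

Each member element becomes one row, with the id attribute and the two child elements as its columns.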
China 2009 34
Document Type Definition (DTD)
<!DOCTYPE researcher [
<!ELEMENT researcher (name, affiliation, email, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ATTLIST phone id CDATA #REQUIRED>
<!ELEMENT affiliation (department, faculty, university)>
…
]>
China 2009 35
Data Model
[Figure: data model diagram with nodes Researcher, Affiliation, Faculty, University, Name, and Phone; edge label hasDepartment; cardinality 1:n]
China 2009 36
XML Schema
• The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.
China 2009 37
Why XML Schemas
• XML Schemas are extensible to future additions
• XML Schemas are richer and more useful than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
• XML Schemas support namespaces
China 2009 38
Name Conflicts
• Since element names in XML are not fixed, a name conflict often occurs when two different documents use the same name to describe two different types of elements.
• If these two XML documents were added together, there would be an element name conflict, because both documents contain an element with the same name but different content and definition.
China 2009 39
XML Namespaces
• Using namespaces to solve name conflicts
Examples:
• xmlns:namespace-prefix="namespaceURI"
• xmlns:xsd="http://www.w3.org/2001/XMLSchema"
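A short Python sketch may help show how namespaces keep two same-named elements apart. The two namespace URIs and the element names below are hypothetical examples chosen for illustration, not taken from the lecture; only the standard-library ElementTree API is used.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both define a <table> element; the prefixes h: and f:
# bind them to different (hypothetical) namespace URIs.
DOC = """
<root xmlns:h="http://example.org/html-table"
      xmlns:f="http://example.org/furniture">
  <h:table><h:cell>Apples</h:cell></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>
"""

NS = {
    "h": "http://example.org/html-table",
    "f": "http://example.org/furniture",
}

root = ET.fromstring(DOC)
html_tables = root.findall("h:table", NS)
furniture_tables = root.findall("f:table", NS)
furniture_name = furniture_tables[0].findtext("f:name", namespaces=NS)

# Internally each tag is qualified by its namespace URI, so the names differ.
print(html_tables[0].tag)
print(furniture_tables[0].tag)
print(furniture_name)
```

Because the parser expands each prefix to its URI, the two table elements no longer collide even though their local names are identical.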
China 2009 40
XML Schema
<xsd:element name="researcher">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="affiliation" type="affil" minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="phone" type="xsd:string"/>
      <xsd:element name="email" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="affil">
  <xsd:sequence>
    <xsd:element name="department" type="xsd:string"/>
    <xsd:element name="faculty" type="xsd:string"/>
    <xsd:element name="university" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
China 2009 41
Resource Description Framework (RDF)
• Metadata is machine-understandable information about web resources, or about anything that has a URI. It is represented as a set of independent assertions:
[Figure: the resource http://wasp.cs.vu.nl/sekt/dig/dig.pdf with two Creator arcs, one to Zhisheng and one to Cees]
Triple: T(subject, attribute, value)
<rdf:Description rdf:about="http://wasp.cs.vu.nl/sekt/dig/dig.pdf">
  <dc:creator rdf:resource="http://www.cs.vu.nl/~huang"/>
  <dc:creator rdf:resource="mailto:[email protected]"/>
</rdf:Description>
China 2009 42
RDF: Dublin Core
• The Dublin Core provides properties for describing network objects, suitable for use by network search engines.
• The Dublin Core is a set of predefined properties for describing documents.
• The first Dublin Core properties were defined at the Metadata Workshop in Dublin, Ohio in 1995, and the set is currently maintained by the Dublin Core Metadata Initiative.
China 2009 43
Dublin Core Metadata Initiative
• The Dublin Core Metadata Initiative is an open forum engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models.
• http://dublincore.org/
China 2009 44
Annotating Metadata
<rdf:Description rdf:about=…dc-rdf/">
  <dc:title>Guidance on expressing the Dublin Core within the Resource Description Framework (RDF)</dc:title>
  <dc:creator>Eric Miller</dc:creator>
  <dc:creator>Paul Miller</dc:creator>
  <dc:creator>Dan Brickley</dc:creator>
  <dc:subject>Dublin Core; RDF; XML</dc:subject>
  <dc:publisher>Dublin Core Metadata Initiative</dc:publisher>
  <dc:contributor>Dublin Core Data Model Working Group</dc:contributor>
  <dc:date>1999-07-01</dc:date>
  <dc:format>text/html</dc:format>
  <dc:language>en</dc:language>
</rdf:Description>
China 2009 45
RDF Schema (RDFS)
• RDFS defines vocabulary for RDF
• Organizes this vocabulary in a typed hierarchy
– Class, subClassOf, type
– Property, subPropertyOf
– domain, range
China 2009 46
RDFS
[Figure: RDFS example graph with instances "Hu, W" and "Prof. Qu", classes Person, PhDStudent, and Professor, subClassOf and type edges, and the property hasSuperVisor with domain and range edges]
Using A Blank Node
• Here the blank node stands for the concept of "John Smith's address".
China 2009 48
Blank Node Identifiers
• Blank nodes must have a name for triple usage.
• Blank node identifiers have the form _:name
exstaff:85740 exterms:address _:johnaddress .
_:johnaddress exterms:street "1501 Grant Avenue" .
_:johnaddress exterms:city "Bedford" .
_:johnaddress exterms:state "Massachusetts" .
_:johnaddress exterms:zip "01730" .
• If a node in a graph needs to be referenced from outside this context, a URIref is required.
• Blank nodes make binary relationships out of an n-ary one (between John and the street, city, etc.).
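To illustrate how the blank node groups the parts of the address, the triples above can be loaded into Python and folded back into a nested record. This is a minimal sketch that treats literals as plain strings and any term starting with "_:" as a blank node; the helper names is_blank and describe are hypothetical.

```python
# The five triples from the slide, with literals as plain strings.
triples = [
    ("exstaff:85740", "exterms:address", "_:johnaddress"),
    ("_:johnaddress", "exterms:street", "1501 Grant Avenue"),
    ("_:johnaddress", "exterms:city", "Bedford"),
    ("_:johnaddress", "exterms:state", "Massachusetts"),
    ("_:johnaddress", "exterms:zip", "01730"),
]


def is_blank(node):
    """Blank node identifiers have the form _:name."""
    return node.startswith("_:")


def describe(subject, triples):
    """Collect the properties of a subject, expanding blank-node objects."""
    result = {}
    for s, p, o in triples:
        if s == subject:
            result[p] = describe(o, triples) if is_blank(o) else o
    return result


record = describe("exstaff:85740", triples)
print(record)
```

The n-ary relation between John and the street, city, state, and zip code is recovered as one nested value hanging off the single exterms:address property.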
China 2009 50
4. Other RDF Capabilities
• Containers
• Collections
• Reification
• Structured Values
China 2009 51
Key features of an Ontology
• InstanceOf relation (instances): the relation between the specific and the general
• PartOf relation (property): the relation between part and whole
• Concept hierarchy
– concept subsumption
China 2009 52
Web Ontology Language (OWL)
• OWL is built on top of RDF
• OWL is for processing information on the web
• OWL was designed to be interpreted by computers
• OWL was not designed for being read by people
• OWL is written in XML
• OWL is a web standard
China 2009 56
OWL Example: animals
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xml:base="http://wasp.cs.vu.nl/sekt/ontology/animal">
  <owl:Ontology rdf:about="animal"/>
  <owl:Class rdf:ID="Eagle">
    <rdfs:subClassOf>
      <owl:Class rdf:about="#Bird"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Animal"/>
  <owl:Class rdf:ID="Fly">
    <owl:disjointWith>
      <owl:Class rdf:about="#Penguin"/>
    </owl:disjointWith>
    <rdfs:subClassOf rdf:resource="#Animal"/>
  </owl:Class>
  <owl:Class rdf:ID="Bird">
    <rdfs:subClassOf rdf:resource="#Fly"/>
  </owl:Class>
  <owl:Class rdf:ID="Penguin">
    <rdfs:subClassOf rdf:resource="#Bird"/>
    <owl:disjointWith rdf:resource="#Fly"/>
  </owl:Class>
</rdf:RDF>
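One consequence of the axioms in this example is that Penguin is a subclass of Bird, Bird is a subclass of Fly, and yet Penguin is declared disjoint with Fly, so no individual can be a Penguin. The Python sketch below detects this by hand. It is a sound but deliberately incomplete check that handles only named classes with subClassOf and disjointWith; the data structures and function names are hypothetical, and a real OWL reasoner does considerably more.

```python
# Direct subClassOf axioms copied from the ontology above.
SUBCLASS_OF = {
    "Eagle": {"Bird"},
    "Fly": {"Animal"},
    "Bird": {"Fly"},
    "Penguin": {"Bird"},
}
# disjointWith is symmetric, so each axiom is stored as an unordered pair.
DISJOINT = {frozenset({"Fly", "Penguin"})}


def superclasses(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes():
    """A class is unsatisfiable if it falls under two disjoint classes."""
    names = set(SUBCLASS_OF) | {c for ps in SUBCLASS_OF.values() for c in ps}
    bad = set()
    for cls in names:
        supers = superclasses(cls)
        if any(pair <= supers for pair in DISJOINT):
            bad.add(cls)
    return bad


print(unsatisfiable_classes())
```

Eagle stays satisfiable because nothing places it under Penguin, while Penguin inherits Fly through Bird and so conflicts with its own disjointness axiom.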
China 2009 58
DL for OWL: SHIQ
• SHIQ = ALCQHIR+
China 2009 59
SHOIN(D) and OWL-DL
• SHOIN(D):
S: ALC + role transitivity
H: role hierarchies
O: nominals
I: inverse roles
N: cardinality restrictions
D: datatypes
China 2009 60
OWL2 (OWL1.1)
• New features
• OWL 2 adds new functionality with respect to OWL 1. Some of the new features are syntactic sugar (e.g., disjoint union of classes) while others offer new expressivity, including:
• keys;
• property chains;
• richer datatypes, data ranges;
• qualified cardinality restrictions;
• asymmetric, reflexive, and disjoint properties; and
• enhanced annotation capabilities
China 2009 61
Three directions of OWL variants (I)
• OWL 2 EL: supports efficient reasoning over large-scale ontologies
OWL 2 EL enables polynomial time algorithms for all the standard reasoning tasks; it is particularly suitable for applications where very large ontologies are needed, and where expressive power can be traded for performance guarantees.
China 2009 62
Three directions of OWL variants (II)
• OWL 2 QL: supports conjunctive queries over large-scale data using database technology
OWL 2 QL enables conjunctive queries to be answered using standard relational database technology; it is suitable for applications where relatively lightweight ontologies are used to organize large numbers of individuals or where it is useful or necessary to access the data directly via relational queries (e.g., SQL).
China 2009 63
Conjunctive Queries
• Conjunctive queries are of the general form (in the first-order language)
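The formula from the original slide is not recoverable here. In the usual textbook formulation, a conjunctive query has the shape q(x1, …, xn) ← ∃y1, …, ym. A1 ∧ … ∧ Ak, where each Ai is an atom over the answer variables x, the existential variables y, and constants. The Python sketch below evaluates such a query over a set of explicit triples. It is a plain evaluation over stated facts only, with no ontology reasoning or query rewriting; the facts about WeiHu and SoutheastUniv and all function names are hypothetical illustrations.

```python
def is_var(term):
    """Variables are written with a leading question mark, e.g. ?x."""
    return term.startswith("?")


def match(atom, fact, binding):
    """Extend binding so that atom equals fact, or return None on failure."""
    new = dict(binding)
    for a, f in zip(atom, fact):
        if is_var(a):
            if new.setdefault(a, f) != f:
                return None  # variable already bound to a different value
        elif a != f:
            return None      # constant mismatch
    return new


def answer(query_atoms, answer_vars, facts):
    """Evaluate a conjunction of atoms and project onto the answer variables."""
    bindings = [{}]
    for atom in query_atoms:
        extended = []
        for b in bindings:
            for fact in facts:
                b2 = match(atom, fact, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return {tuple(b[v] for v in answer_vars) for b in bindings}


facts = {
    ("ZhishengHuang", "isStaffof", "VrijeUnivAm"),
    ("VrijeUnivAm", "inCity", "Amsterdam"),
    ("WeiHu", "isStaffof", "SoutheastUniv"),
}

# q(?x) <- exists ?y. (?x isStaffof ?y) and (?y inCity Amsterdam)
query = [("?x", "isStaffof", "?y"), ("?y", "inCity", "Amsterdam")]
result = answer(query, ["?x"], facts)
print(result)
```

The variable ?y is existential: it must be matched consistently across both atoms but does not appear in the answer tuple.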
China 2009 64
Three directions of OWL variants (III)
• OWL 2 RL: supports efficient rule-extended reasoning over RDF data
OWL 2 RL enables the implementation of polynomial time reasoning algorithms using rule-extended database technologies operating directly on RDF triples; it is particularly suitable for applications where relatively lightweight ontologies are used to organize large numbers of individuals or where it is useful or necessary to operate directly on data in the form of RDF triples.
China 2009 65
The three variants of OWL2
China 2009 66
More about OWL2
See Friday afternoon, August 28:
• Invited talk (Dr. Jeff Z. Pan, 潘志霖): OWL2
China 2009 67
More Notations
• F: role functionality
• Q: qualified cardinality restriction
• R: generalised role inclusion
• E: existential role restriction
China 2009 68
OWL Variants and DL
• OWL Full: is not a DL
• OWL DL: SHOIN(D)
• OWL Lite: SHIF(D)
• OWL2 Full: is not a DL
• OWL2 DL: SROIQ(D)
• OWL2 EL: EL++
• OWL2 QL: DL-Lite
• OWL2 RL: DLP
China 2009 69
EL++
• A lightweight description logic that admits sound and complete reasoning in polytime.
• It drops the (allValuesFrom) restriction, whereas (someValuesFrom) is retained.
• It should be noted, however, that EL++ does admit (objectPropertyRange), which can be seen as an important case of (allValuesFrom).
China 2009 70
EL++: Syntax and Semantics
China 2009 71
EL++ Ontologies
• SNOMED CT, the Systematized Nomenclature of Medicine, Clinical Terms. SNOMED is a large-scale commercial ontology that underlies the standardized terminology of the health-care systems in the US, the UK, and a couple of other countries.
• NCI. The Thesaurus of the National Cancer Institute. An ontology that formalizes terms related to cancer research.
• The Gene Ontology formalizes terms relating to genes and gene products.
• More than 95% of the axioms of the GALEN ontology can also be expressed in EL++.
China 2009 72
Description Logic Programs (DLP)
• Description Logic Programs is a Horn fragment of OWL 2.
• The distinguishing feature of DLP is that it is an existential-free fragment; that is, while reasoning, the universe is fixed in the sense that one only needs to consider the objects explicitly used in the facts of the ontology.
China 2009 73
Overview of DLP Features
• Essentially, DLP captures RDFS subset of DL -- plus a bit more.
• RDFS subset of DL permits the following statements:
– Class C is Subclass of class D.
– Domain of property P is class C.
– Range restriction on property P is class D.
– Property P is Subproperty of property Q.
– a is an instance of class C.
– (a,b) is an instance of property P.
China 2009 74
Overview of DLP Features (continued)
• DLP also captures:
– Using the Intersection connective (conjunction) in class descriptions
– Stating that a property P is Transitive.
– Stating that a property P is Symmetric.
• DLP can partially capture: most other DL features.
• Relevant technical issues in LP:
– treatment of equality, e.g., uniqueness of names.
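Because DLP is a Horn fragment, statements such as "P is Transitive" and "P is Symmetric" can be read as rules and applied to the facts until nothing new is derived. The Python sketch below shows this reading under simplifying assumptions: the property names ancestorOf and marriedTo, the individuals, and the function close are hypothetical, and only these two kinds of rule are covered.

```python
def close(facts, transitive=(), symmetric=()):
    """Apply transitivity and symmetry rules until a fixpoint is reached."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, p, b) in facts:
            if p in symmetric:
                new.add((b, p, a))              # P(a,b) -> P(b,a)
            if p in transitive:
                for (b2, p2, c) in facts:
                    if p2 == p and b2 == b:
                        new.add((a, p, c))      # P(a,b), P(b,c) -> P(a,c)
        if not new <= facts:
            facts |= new
            changed = True
    return facts


facts = {
    ("Ann", "ancestorOf", "Bob"),
    ("Bob", "ancestorOf", "Cid"),
    ("Ann", "marriedTo", "Dan"),
}
closed = close(facts, transitive={"ancestorOf"}, symmetric={"marriedTo"})
print(sorted(closed))
```

Only individuals already named in the facts ever appear in the result, which matches the description of DLP as an existential-free fragment.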
China 2009 75
DLP and OWL DL
• DLP is able to express the following features of OWL DL:
• concept disjointness,
• domains and ranges of properties,
• inverse and symmetric properties,
• functional and inverse-functional properties,
• sub-property and equivalence relations between object properties,
• transitive properties, and
• a limited form of General Concept Inclusion axioms (GCIs).
China 2009 76
DL-Lite
• DL-Lite is a fragment of OWL DL especially tailored for efficiently handling large numbers of facts.
• The main focus is to provide efficient query answering on the data and to allow the use of relational database management technologies for this purpose.
China 2009 77
DL-Lite
• DL-Lite also includes most of the main features of conceptual models, like UML class diagrams and ER diagrams. More specifically, DL-Lite includes the following features of OWL DL:
• a constrained form of someValuesFrom restrictions,
• conjunction,
• concept disjointness,
• domains and ranges of properties,
• inverse properties,
• inclusion axioms for object properties.
China 2009 78
Complexity
• The Data Complexity: the complexity measured with respect to the number of facts in the ontology.
• The Taxonomic complexity: the complexity measured with respect to the size of the axioms in the ontology.
• The Query Complexity: the complexity measured with respect to the number of conjuncts in the conjunctive query.
• The Combined Complexity: the complexity measured with respect to both the size of the axioms and the number of facts. In the case of conjunctive query answering, the combined complexity also includes the query complexity.
China 2009 79
Complexity of Tractable Fragments- OWL DL
China 2009 80
Complexity of Tractable Fragments- OWL Lite
China 2009 81
Complexity of Tractable Fragments- EL++
China 2009 82
Complexity of Tractable Fragments- DL-Lite
China 2009 83
Complexity of Tractable Fragments- DLP
China 2009 84
Relationship between the fragments of
OWL1.1(OWL2)
China 2009 85
Key Issues of the Semantic Web
• Data, knowledge, and semantics
• Semantic relevance, semantic similarity, and semantic distance
• Knowledge representation and reasoning
• Scalability: processing massive semantic data
• Approximate reasoning
China 2009 86
Some Semantic Web application examples: DBpedia Mobile
• http://beckr.org/DBpediaMobile/?location=Beijing
• http://beckr.org/DBpediaMobile
China 2009 87
HealthFinland – Health Information on the Semantic Web
• http://www.seco.tkk.fi/applications/tervesuomi/
• Provides a new kind of solution approach to these problems on a national Finnish level. The system consists of three main components:
• Metadata, ontology, and service infrastructure.
• Semantic content creation process. A content creation and harvesting system has been implemented for producing semantically annotated contents, based on the shared metadata model and ontologies.
• Semantic portal HealthFinland (TerveSuomi) and its services. The material is published via a semantic portal that creates a single national entry-point for health information, health promotion and health-related news.
China 2009 88
National Semantic Web Ontology Project in Finland (FinnONTO)
• National Semantic Web Ontology Project in Finland (FinnONTO), 2003-2007
• A large national continuation project of FinnONTO, called Semantic Web 2.0 (FinnONTO 2.0), started at the beginning of 2008.
• The research is directed and mostly carried out by the Semantic Computing Research Group (SeCo) at the Helsinki University of Technology (TKK) and the University of Helsinki. The University of Tampere also contributes to the work.
• The consortium behind the project included 37 public organizations and companies funding the research during the final year 2007. This consortium represents a wide area of functions of the society including museums, libraries, business, health organizations, government, media, and education. Public organizations, companies, and universities are participating in the project.
China 2009 89
The Dutch Cultural Heritage Eculture Project STiTCH-Catch Chip Project
China 2009 90
Project E-Culture http://e-culture.multimedian.nl/
China 2009 95
Timeline
China 2009 96
Winner of the 2006 International Semantic Web Challenge
China 2009 97
http://www.ontology-advisory.org/
China 2009 100
Balkenende attacks Bos
"You're a twister and dishonest", said the Christian Democrat about his main opponent (30/10/06)
Polls: SP at 25 seats larger than VVD
According to a poll conducted by TNS/NIPO, the SP has risen to be the third largest party. (7/11/06)
Unrest in VVD over Rutte
Liberal MP’s also complain about personal campaign by Rita Verdonk. (1/11/2006)
Reality + SP
Reality - VVD
Balkenende – Bos
Balkenende: Bos - Ideal
VVD – VVD
VVD – Verdonk
A case study of the Semantic Web applied to political analysis: the 2006 Dutch general election
China 2009 101
Relational Content Analysis
China 2009 102
Example
China 2009 104
Trend analysis and prediction
China 2009 106
Temporal reasoning
China 2009 107
The role of logic
• Use hybrid logic and the logic of states of affairs (事态逻辑) to describe properties such as Internal Disagreement
China 2009 108
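Exercises
• Describe a family ontology in OWL-DL.
• List some ontology properties that cannot be described in OWL-DL, and propose workarounds for expressing them.
• Study the decidability and complexity of conjunctive queries in OWL-DL.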
China 2009 109
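The Romance of the Logical Foundations of the Semantic Web
Chapter 5: Amid a vast sea of information, the Semantic Web bursts onto the scene; semantic search and ontology technology draw everyone's attention.
To learn what happens next, listen to the next installment...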
China 2009 110
Questions and Discussions