Principles of the Semantic Web DB or KRDB? Alon Halevy

Agenda

A spectrum of representation and query formalisms:
– From relational databases to description logics and beyond.
Questions:
– What can we represent?
– What can we query?
– How much will it cost?
– Can we actually use this stuff?

Why Do We Care?

Need to represent data and knowledge on the semantic web.
Need to query.
Need to map between different representations of data/knowledge.
People are confused, very religious about these issues.

Perspectives from the Structure Chasm

Authoring: writing text vs. creating a schema
Querying: keywords vs. someone else’s schema
Data sharing: easy vs. committees & standards

Specifics

The relational model, a few query languages
– Representing real data
Horn rules, the DB view and the KR view.
XML: shattering the myth of no semantics.
Description Logics: a logic of descriptions.
One shameless plug for my past work.

Recalling First-Order Logic

A KB is a set of well-formed formulas:
∀X (Person(X) → Mortal(X))
∀X (Student(X) → Smart(X))
Person(a) ∨ Dog(a)
∃X (Student(X) ∧ (X = Aristotle))

Interpretation: a mapping from terms to the universe of discourse.
Model: any interpretation that satisfies the formulas.
Key idea: infer implicit facts from the explicit ones.

Example Inferences

The KB:
∀X (Person(X) → Mortal(X))
∀X (Student(X) → Smart(X))
Person(a) ∨ Dog(a)
∃X (Student(X) ∧ (X = Oren))

– Mortal(a)? Don’t know.
– Smart(Oren)? Yes.
– Mortal(a) ∨ ¬Mortal(a)? Yes.
– ∃X (Person(X) ∧ ¬Mortal(X))? No.
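The four answers can be reproduced by brute force. This is only a minimal sketch under a strong simplifying assumption: it enumerates interpretations over a fixed two-element domain in which the constants a and Oren denote distinct individuals, whereas first-order entailment ranges over all interpretations. The names `DOMAIN`, `satisfies_kb`, and `status` are illustrative and do not come from the slides.

```python
from itertools import product

# Assumption: a fixed two-element universe; "a" and "oren" denote themselves.
DOMAIN = ("a", "oren")
PREDICATES = ("Person", "Mortal", "Student", "Smart", "Dog")


def interpretations():
    """Yield every truth assignment to the ground atoms over DOMAIN."""
    atoms = [(p, d) for p in PREDICATES for d in DOMAIN]
    for bits in product((False, True), repeat=len(atoms)):
        yield dict(zip(atoms, bits))


def satisfies_kb(i):
    """Check the four KB formulas in interpretation i."""
    return (
        all(not i["Person", x] or i["Mortal", x] for x in DOMAIN)
        and all(not i["Student", x] or i["Smart", x] for x in DOMAIN)
        and (i["Person", "a"] or i["Dog", "a"])
        and i["Student", "oren"]  # exists X (Student(X) and X = Oren)
    )


# The models of the KB (within the assumed domain).
MODELS = [i for i in interpretations() if satisfies_kb(i)]


def status(query):
    """'yes' if query holds in all models, 'no' if in none, else "don't know"."""
    values = {bool(query(m)) for m in MODELS}
    if values == {True}:
        return "yes"
    if values == {False}:
        return "no"
    return "don't know"


print(status(lambda m: m["Mortal", "a"]))
print(status(lambda m: m["Smart", "oren"]))
print(status(lambda m: m["Mortal", "a"] or not m["Mortal", "a"]))
print(status(lambda m: any(m["Person", x] and not m["Mortal", x] for x in DOMAIN)))
```

The point of the sketch is the contrast between "true in every model" and "true in some model": Mortal(a) is undetermined because the KB only says that a is a person or a dog.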

The Relational Data Model

Table name: Product

Attribute names:
PName        Price    Category     Manufacturer

Tuples or rows:
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

No’s

No negation.
No disjunction.
Ambiguous support for incomplete information.

The database represents a single model.

Hence, inference is just model checking.

Integrity Constraints

Very specific forms of logical formulae. Enforced (maybe) by the database system:
– Functional dependencies: A → B
– Foreign key constraints: every tuple in the Purchase table must refer to a product in the Product table.
– Multi-valued dependencies.
– Tuple-generating dependencies, etc.
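As a rough illustration of the first two constraint types, the sketch below checks a functional dependency and a foreign key over in-memory rows. The Product rows come from the earlier slide. The Purchase rows, their column names (`buyer`, `product`), and the helper names `fd_violations` and `dangling_references` are assumptions made up for this example.

```python
PRODUCT = [
    {"pname": "Gizmo", "price": 19.99, "category": "Gadgets", "manufacturer": "GizmoWorks"},
    {"pname": "Powergizmo", "price": 29.99, "category": "Gadgets", "manufacturer": "GizmoWorks"},
    {"pname": "SingleTouch", "price": 149.99, "category": "Photography", "manufacturer": "Canon"},
    {"pname": "MultiTouch", "price": 203.99, "category": "Household", "manufacturer": "Hitachi"},
]

# Hypothetical Purchase table; the slides do not give its schema.
PURCHASE = [
    {"buyer": "joe", "product": "Gizmo"},
    {"buyer": "ann", "product": "Widget"},  # refers to no known product
]


def fd_violations(rows, lhs, rhs):
    """Return rows that break the functional dependency lhs -> rhs."""
    seen = {}
    violations = []
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row seen for a key fixes the value every later row must match.
        if seen.setdefault(key, value) != value:
            violations.append(row)
    return violations


def dangling_references(child_rows, child_attr, parent_rows, parent_attr):
    """Return child rows whose child_attr matches no parent row."""
    known = {row[parent_attr] for row in parent_rows}
    return [row for row in child_rows if row[child_attr] not in known]


print(fd_violations(PRODUCT, ["pname"], ["price"]))
print(fd_violations(PRODUCT, ["category"], ["price"]))
print(dangling_references(PURCHASE, "product", PRODUCT, "pname"))
```

Both checks are plain model checking over the single stored instance, which is why a database system can afford to enforce them.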

Inference: Type 1 (Querying)

Product(pname, price, category, manufacturer)
Company(cname, stockPrice, country)

Find all countries that manufacture some product in the ‘Gadgets’ category.

SELECT country
FROM Product, Company
WHERE manufacturer = cname AND category = 'Gadgets'

Q(c) :- Product(w, y, 'Gadgets', x), Company(x, p, c)
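To make the two formulations concrete, here is a small sketch that runs the SQL query with Python's built-in `sqlite3` module and evaluates the rule-style query as a set comprehension. The Product rows are taken from the earlier slide. The Company rows are invented for illustration and are not from the slides.

```python
import sqlite3

PRODUCTS = [
    ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
    ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
    ("SingleTouch", 149.99, "Photography", "Canon"),
    ("MultiTouch", 203.99, "Household", "Hitachi"),
]
# Hypothetical Company rows (cname, stockPrice, country).
COMPANIES = [
    ("GizmoWorks", 25.0, "USA"),
    ("Canon", 65.0, "Japan"),
    ("Hitachi", 15.0, "Japan"),
]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
    CREATE TABLE Company (cname TEXT, stockPrice REAL, country TEXT);
    """
)
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)", PRODUCTS)
conn.executemany("INSERT INTO Company VALUES (?, ?, ?)", COMPANIES)

# The SQL query from the slide: bag semantics, one output row per matching pair.
sql_rows = conn.execute(
    "SELECT country FROM Product, Company "
    "WHERE manufacturer = cname AND category = 'Gadgets'"
).fetchall()

# The rule Q(c) :- Product(w, y, 'Gadgets', x), Company(x, p, c): set semantics.
rule_answers = {
    c
    for (w, y, category, x) in PRODUCTS
    for (cname, p, c) in COMPANIES
    if category == "Gadgets" and x == cname
}

print(sql_rows)
print(rule_answers)
```

With this sample data the SQL result lists the same country once per gadget, while the rule yields it once. That difference is the duplicate handling a practical query language has to address.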

Query Language Features

Fundamental question: what queries can I express with my language?
Start with selection, projection, and join.
+ union and negation (= relational completeness)
To deal with real data:
– Grouping and aggregation
– Dealing with duplicates
– Outer joins

Inference Type 2: Query Containment

• Question: is the result of Q1 always a subset of the result of Q2?

Q1(A,B) :- cites(A,B), cites(B,A), sameTopic(A,B)
Q2(C,D) :- cites(C,C1), cites(D,D1)

• Inference on a very specific type of formula.
• Only finite models are considered.
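For conjunctive queries such as Q1 and Q2, one standard way to decide containment is the canonical-database test: freeze the variables of Q1 into constants, treat its body as a tiny database, and check whether Q2 can produce Q1's frozen head over it. The slides do not name an algorithm, so the sketch below is one possible illustration. It assumes queries without constants whose head variables all occur in the body, and the representation and the name `contained_in` are invented here.

```python
# A query is (head_variables, body); each body atom is (predicate, argument_tuple).
Q1 = (("A", "B"), [("cites", ("A", "B")), ("cites", ("B", "A")), ("sameTopic", ("A", "B"))])
Q2 = (("C", "D"), [("cites", ("C", "C1")), ("cites", ("D", "D1"))])


def contained_in(q1, q2):
    """Return True if every answer of q1 is an answer of q2 on all databases."""
    head1, body1 = q1
    head2, body2 = q2
    # Freeze q1: each variable stands for itself as a constant.
    canonical_db = {(pred, tuple(args)) for pred, args in body1}

    def extend(atoms, binding):
        """Backtracking search for a mapping of q2's variables into the frozen q1."""
        if not atoms:
            return tuple(binding[v] for v in head2) == tuple(head1)
        pred, args = atoms[0]
        for fact_pred, fact_args in canonical_db:
            if fact_pred != pred or len(fact_args) != len(args):
                continue
            candidate = dict(binding)
            # Bind each variable, or check agreement with an earlier binding.
            if all(candidate.setdefault(v, c) == c for v, c in zip(args, fact_args)):
                if extend(atoms[1:], candidate):
                    return True
        return False

    return extend(body2, {})


print(contained_in(Q1, Q2))
print(contained_in(Q2, Q1))
```

The search is exponential in the worst case, in line with the NP-completeness of containment for this class of queries.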

Complexity Results Galore

For select-project-equi-join: NP-complete.
Add comparisons (e.g., <): Π^p_2-complete.
Add negation:
– Level 0: still Π^p_2-complete.
– Level 2: undecidable.
– Level 1: Sagiv and Ullman know but don’t want to tell.
Allow at most 2 occurrences of every predicate name: polynomial.
A lot of papers.

Last Week Recap

The relational model: ground facts + unique name assumption (UNA) + closed-world assumption (CWA).
Integrity constraints: expressing more knowledge.
Inference Type 1: querying (= model checking).
– Note: polynomial time is not good enough.
Inference Type 2: query containment.
– Type 2.5: answering queries using views.
– Both are inference problems for a particular type of formula in first-order logic over finite models.

Beyond Containment: Answering Queries Using Views

Given a query Q and a set of view definitions V1,…,Vn:
Is it possible to answer Q using only the V’s?

V1(A,B) :- cites(A,B), cites(B,A)
V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1)

Query:
q(x,y) :- sameTopic(x,y), cites(x,y), cites(y,x)

Query rewriting:
q’(X,Y) :- V1(X,Y), V2(X,Y)
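A quick sanity check of the rewriting, offered as a sketch rather than a proof: the code below evaluates the query directly and through the two views on every database over a two-paper domain and compares the answers. The domain (`p1`, `p2`) and the function names are assumptions introduced for this example.

```python
from itertools import chain, combinations, product


def q(cites, same_topic):
    """q(x,y) :- sameTopic(x,y), cites(x,y), cites(y,x)"""
    return {(x, y) for (x, y) in same_topic if (x, y) in cites and (y, x) in cites}


def v1(cites):
    """V1(A,B) :- cites(A,B), cites(B,A)"""
    return {(a, b) for (a, b) in cites if (b, a) in cites}


def v2(cites, same_topic):
    """V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1)"""
    citing = {a for (a, _) in cites}  # papers that cite something
    return {(c, d) for (c, d) in same_topic if c in citing and d in citing}


def q_rewritten(cites, same_topic):
    """q'(X,Y) :- V1(X,Y), V2(X,Y): uses only the view extensions."""
    return v1(cites) & v2(cites, same_topic)


def subsets(items):
    """All subsets of items, as tuples."""
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


# Every possible cites / sameTopic relation over a two-paper domain (assumed).
PAIRS = list(product(("p1", "p2"), repeat=2))
ALL_AGREE = all(
    q(set(c), set(s)) == q_rewritten(set(c), set(s))
    for c in subsets(PAIRS)
    for s in subsets(PAIRS)
)
print(ALL_AGREE)
```

Agreement on small instances only suggests equivalence. Establishing it in general is a containment problem in both directions after the view definitions are unfolded.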

Didn’t We Say 590 Semantic Web?

Assume a virtual schema of the WWW, e.g.,– Course(number, university, title, prof, quarter)

Every data source on the web contains the answer to a view over the virtual schema:

UW database:
SELECT number, title, prof
FROM Course
WHERE university = 'UW' AND quarter = '2/02'

Stanford database:
SELECT number, title, prof, quarter
FROM Course
WHERE university = 'Stanford'

User query: find all professors who teach “database systems”.

Horn Rules / Datalog

Easy for KR people. Hard for database people.
Add recursion to the query language:
– Path(x,y) :- edge(x,y)
– Path(x,y) :- Path(x,z), Path(z,y)
DB people consider least fixed point semantics.
{ edge(a,b), Path(a,b), Path(b,c), Path(a,c) }:
– Is a model for KR folks, not DB folks.

Recursion is not expressible in first-order logic.
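The difference between "a model" and "the least fixed point" can be seen with naive bottom-up evaluation of the two Path rules. This is a minimal sketch; the function names are illustrative, and real systems use far more refined evaluation strategies.

```python
def compose(path):
    """All (x, y) derivable by Path(x,y) :- Path(x,z), Path(z,y)."""
    return {(x, y) for (x, z) in path for (w, y) in path if z == w}


def least_fixed_point(edges):
    """Apply both rules repeatedly until no new Path fact appears."""
    path = set()
    while True:
        derived = set(edges) | compose(path)
        if derived <= path:
            return path
        path |= derived


def is_model(edges, path):
    """True if path satisfies both rules, read as logical implications."""
    return set(edges) <= path and compose(path) <= path


EDGES = {("a", "b")}
SLIDE_PATH = {("a", "b"), ("b", "c"), ("a", "c")}  # the Path facts from the slide
LFP = least_fixed_point(EDGES)

print(LFP)
print(is_model(EDGES, SLIDE_PATH))
print(LFP < SLIDE_PATH)
```

The slide's set satisfies both rules, so it is a model. It also contains Path(b,c) and Path(a,c), which nothing forces, so it is not the least one.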

More on Datalog

Many clever algorithms for evaluating datalog queries:
– They have fancy names: e.g., magic sets
Query containment: undecidable, unless you constrain the queries (ask Surajit).
Some ideas made it into SQL and relational systems:
– Magic sets (useful even without recursion)
– Linear recursion in SQL-3

XML

<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufman</name>
    <state>CA</state>
  </publisher>
</db>

XML: Issues

Data model: edge-labeled graph (/tree):
– The tags can be viewed as binary relations
Features:
– The schema is embedded in the data.
– Can have a predefined schema (XML Schema)
– Nesting can be arbitrary.
– Can be irregular (e.g., different formats for elements)
– Order of elements may be important.

Querying XML

XPath, XQuery, XSLT.
XQuery is based on XPath, XML-QL, SQL.
Query language features:
– Path expressions
– The Return clause: creates the output XML document.
– Standard query language bells and whistles.
– Not in XQuery: tag variables (bind variables to schema elements).
Query containment: ask Dan and Gerome.

Knowledge Representation

McCarthy suggested some form of first-order logic.
Semantic networks: popular on the east coast.
Evolved into:
– Frame-based systems
– Description logics (a.k.a. terminological logics).

Description Logics

A subset of first-order logic with a German syntax. No variables.
Allows only:
– Unary relations (Concepts): Person, Happy
– Binary relations (Roles, attributes): childOf
A DL knowledge base:
– ABox of ground facts: Person(sue), Happy(bob)
– TBox of definitions.

Concept Descriptions

Built using a set of constructs:
C, D →  A      | Primitives Konzept
        ⊤      | Top Konzept
        ⊥      | Bottom Konzept
        C ⊓ D  | Durchschnitt
        C ⊔ D  | Vereinigung
        ¬C     | Komplement
        ∀R.C   | Rollenquantifikation/Werterestriktion
        ∃R.C   | Rollenquant./Existentielle Restriktion

Concept Descriptions

Built using a set of constructs:
C, D →  A        | Primitive concepts
        ⊤        | Top concept
        ⊥        | Bottom concept
        C ⊓ D    | Intersection
        C ⊔ D    | Union
        ¬C       | Complement
        ∀R.C     | Universal restriction
        ∃R.C     | Existential restriction
        (> n R)  | Number restriction
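The meaning of each constructor can be illustrated by computing concept extensions over one fixed finite interpretation. This sketch is only model checking over a single interpretation, not DL reasoning, which has to consider all interpretations. The tuple encoding, the function name `extension`, and the sample individuals and facts are assumptions made for this example.

```python
def extension(concept, domain, concepts, roles):
    """Return the set of individuals in `domain` that belong to `concept`."""

    def ext(c):
        return extension(c, domain, concepts, roles)

    def successors(x, role):
        return {y for (s, y) in roles.get(role, ()) if s == x}

    kind = concept[0]
    if kind == "atom":  # primitive concept
        return set(concepts.get(concept[1], ()))
    if kind == "top":
        return set(domain)
    if kind == "bottom":
        return set()
    if kind == "and":  # intersection
        return ext(concept[1]) & ext(concept[2])
    if kind == "or":  # union
        return ext(concept[1]) | ext(concept[2])
    if kind == "not":  # complement
        return set(domain) - ext(concept[1])
    if kind == "all":  # universal restriction: every role successor is in the filler
        _, role, filler = concept
        members = ext(filler)
        return {x for x in domain if successors(x, role) <= members}
    if kind == "some":  # existential restriction: some role successor is in the filler
        _, role, filler = concept
        members = ext(filler)
        return {x for x in domain if successors(x, role) & members}
    if kind == "more_than":  # number restriction (> n R)
        _, n, role = concept
        return {x for x in domain if len(successors(x, role)) > n}
    raise ValueError(f"unknown constructor: {kind}")


# A hypothetical interpretation; these facts are invented for the example.
DOMAIN = {"sue", "bob", "ann", "tom"}
CONCEPTS = {"Person": DOMAIN, "Smart": {"ann"}}
ROLES = {"child": {("sue", "ann"), ("bob", "tom")}}

PARENT = ("and", ("atom", "Person"), ("more_than", 0, "child"))
ALL_CHILDREN_SMART = ("all", "child", ("atom", "Smart"))
HAPPY_PARENT = ("and", PARENT, ALL_CHILDREN_SMART)

print(extension(PARENT, DOMAIN, CONCEPTS, ROLES))
print(extension(ALL_CHILDREN_SMART, DOMAIN, CONCEPTS, ROLES))
print(extension(HAPPY_PARENT, DOMAIN, CONCEPTS, ROLES))
```

Note that the universal restriction holds vacuously for individuals with no children, which is why it is usually combined with a number or existential restriction.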

TBox Assertions

Concept introduction:
– Person ⊑ Mammal
Concept definition:
– Parent = Person ⊓ (> 0 child)
– HappyParent = Parent ⊓ (∀child.Smart)
Inclusion assertions:
– Parent ⊓ ( child.(= name Karina)) ⊑ HappyParent
– Person ⊓ (> 5 child) ⊑ HappyParent

Reasoning in DLs.

Note: you can assert view facts:
– HappyParent(bob)
– You don’t know who the children are, but you know they’re smart.
Classification: C(a)?
Consistency: is C necessarily an empty concept?
Subsumption: C1 ⊑ C2?
Theoretically, everything boils down to subsumption.
Complexity depends on set of constructors allowed.

DLs vs. Horn rules

Horn rules can handle any variable pattern.
– DLs can handle only specific patterns.
DLs can do subsumption with negation, number restrictions, and various other features.
They can be combined, but decidability is subtle (see CARIN, [Levy and Rousset, 1996]).
