Rdf with contexts

Blogic in the real world

Adding contexts to RDF

Pat Hayes, Florida IHMC, April 2012

Saturday, April 14, 12

Blogic

"Blogic" means the logic that actually gets used on the Semantic Web.

This is not necessarily the way that the formalisms are officially defined.


Blogic in the real world

Most of the actual deployed content on the semantic web is "linked data", which is billions of RDF triples with a few tiny sprinkles of other, more expressive, notations.

The official logic of RDF is pretty trivial (&,∃). But the way that it actually gets used is different and rather more complicated.


RDF and URIs

The official view is that the 'names' in RDF, ie the URIs, are global in scope and have fixed, eternal, referents.

This keeps the logic simple, and conforms to an idealized vision of the Web (cf. TimBL's idea of "cool URIs"). Call this the 2004 globalist ideal.

The actual reality on the Web, however, seems to be that the meaning of a URI might vary depending on where and when it is used. URI referents are context-sensitive.

<can-of-worms> Note, this is about what URIs refer to when used as logical names, not what they "identify" when used by HTTP. These are two quite distinct ideas. Typically (not always) a URI identifies some (source of) data about what it refers to. </can-of-worms>


RDF and SPARQL

SPARQL is the query language designed to fit with RDF.

SPARQL queries are directed to a datastore (AKA a quad store) which has one optional 'default' RDF graph and a finite number of 'named' RDF graphs:

{<a b c> ...}
n1 {<d e f> <g h i>}
n2 {<a d x> <g h i> ...}
n3 ...

SPARQL is now widely deployed in many real-life applications. Unfortunately, datastores have no official semantics, so are being used in all kinds of ways.
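To make the datastore shape concrete, here is a minimal Python sketch of it. The class name `Dataset`, its field names, and the `quads` method are illustrative assumptions, not part of SPARQL or of any library. The example data reads the first, unnamed graph in the notation above as the default graph.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class Dataset:
    """One optional default graph plus a finite number of named graphs."""
    default: set[Triple] = field(default_factory=set)
    named: dict[str, set[Triple]] = field(default_factory=dict)

    def quads(self):
        """Yield (graph_name, s, p, o); graph_name is None for the default graph."""
        for s, p, o in sorted(self.default):
            yield (None, s, p, o)
        for name in sorted(self.named):
            for s, p, o in sorted(self.named[name]):
                yield (name, s, p, o)


# The example datastore from the slide, with hypothetical plain-string names.
ds = Dataset(
    default={("a", "b", "c")},
    named={
        "n1": {("d", "e", "f"), ("g", "h", "i")},
        "n2": {("a", "d", "x"), ("g", "h", "i")},
    },
)
```

Note that the same triple `<g h i>` sits in both `n1` and `n2`. Nothing in the structure itself says whether those two occurrences mean the same thing, which is exactly the gap a semantics for datastores has to fill.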


SPARQL datastores

A datastore can be:

1. a way to name some graphs
2. a way to keep track of versions of a graph
3. a way to keep track of time-varying data (the graph "name" encodes times)
4. a way to distinguish data from meta-data (in the default graph)
5. a way to distinguish data depending upon its provenance or source (the graph name denotes the source)
5a. a way to distinguish data depending upon its topic (the graph name denotes the topic)
6. a way to keep data sorted into groups which share a common meaning for the URIs in the graphs (an "island")
7. any combination of the above; or sometimes one of the above, sometimes another
8. various other things.


SPARQL datastores

After a huge amount of debate, discussion, argument, the RDF WG has distilled these down to two, and the current discussion is about how to find a sweet spot between these.

1. a way to name some graphs

6. a way to keep data sorted into groups which share a common meaning for the URIs in the graphs (an "island")

In 1., the graph name definitely denotes the graph. In 6., it often denotes something else. This is a problem.


SPARQL datastores

Antoine Zimmermann has suggested a model theory for datastores based on the "island" interpretation, as follows:

In substance, this formalization says that each RDF Graph in a Dataset is interpreted separately. This models the fact that different RDF Graphs hold in different contexts. This way, graphs that have been put in different "named graph pairs" can contradict with each other without making the Dataset inconsistent.

Like RDF interpretations, a dataset-interpretation is relative to a vocabulary V. Moreover, dataset interpretations are defined with respect to an entailment regime E, as defined in SPARQL 1.1 Entailment Regimes. Let KE be the set of all E-interpretations. The interpretation of an RDF Dataset (G, (<n1>,Gn1), ..., (<nk>,Gnk)) over vocabulary V is a pair (I,Con) where I is an E-interpretation of G (the default graph) and Con is a mapping from V to KE.

A dataset-interpretation (I,Con) of a vocabulary V wrt entailment regime E satisfies an RDF Dataset (G, (<n1>,Gn1), ..., (<nk>,Gnk)) iff I E-satisfies G, and for all i in [1..k], Con(ni) exists and E-satisfies Gni.

Following standard definitions, we say that a dataset D=(G, (<n1>,Gn1), ..., (<nk>,Gnk)) entails a dataset (H, (<m1>,Hm1), ..., (<mp>,Hmp)) iff every dataset-interpretation (I, Con) that satisfies D also satisfies the latter dataset.

What this does is to treat each named graph as existing in its own local context, with its URIs treated as different in meaning from the same URI occurring elsewhere. Call this the graph-local vision. Nothing could be more different from the 2004 globalist ideal.
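The satisfaction condition quoted above can be sketched in a few lines of Python. This is a toy under stated assumptions: graphs are ground (no blank nodes), an interpretation is a plain dict with hypothetical keys `"denotes"` (URI to thing) and `"ext"` (property to its set of pairs), and the function `satisfies` stands in for whatever E-satisfaction the chosen entailment regime defines.

```python
def satisfies(interp, graph):
    """Toy stand-in for E-satisfaction of a ground graph."""
    den, ext = interp["denotes"], interp["ext"]
    for s, p, o in graph:
        # Every URI in the triple must be interpreted.
        if not all(u in den for u in (s, p, o)):
            return False
        # The pair (subject, object) must be in the property's extension.
        if (den[s], den[o]) not in ext.get(den[p], set()):
            return False
    return True


def dataset_satisfies(I, con, default_graph, named_graphs):
    """(I, Con) satisfies the dataset iff I satisfies the default graph and,
    for each pair (n, Gn), Con(n) exists and satisfies Gn."""
    if not satisfies(I, default_graph):
        return False
    return all(n in con and satisfies(con[n], g) for n, g in named_graphs.items())


# Hypothetical example: the URI :jaguar denotes different things per graph.
named = {
    "n1": {(":jaguar", ":kind", ":Animal")},
    "n2": {(":jaguar", ":kind", ":Vehicle")},
}
con = {
    "n1": {"denotes": {":jaguar": "cat", ":kind": "kind", ":Animal": "animal"},
           "ext": {"kind": {("cat", "animal")}}},
    "n2": {"denotes": {":jaguar": "car", ":kind": "kind", ":Vehicle": "vehicle"},
           "ext": {"kind": {("car", "vehicle")}}},
}
I = {"denotes": {}, "ext": {}}  # interprets the (empty) default graph

ok = dataset_satisfies(I, con, set(), named)
```

Each named graph is checked only against its own interpretation, so `:jaguar` can denote a cat in `n1` and a car in `n2` without either graph constraining the other.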


http://www.w3.org/TR/sparql11-entailment/


SPARQL datastores

Sandro Hawke is running with the naming idea, and has a proposal for distinguishing between an actual name for a graph and a mere label (ie a URI used as a "graph name" in the datastore but not actually denoting the graph.) This treats the labeling relationship as a functional RDF property with the constraint that if A is a graph then (A label B) implies A=B, and then the combination

{ <name> rdf:type rdf:Graph }
...
<name> { ... graph1 ... }
...

forces this particular labeling to be a genuine naming. (This allows other labelings to not be names, which is widely used.)
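As a rough illustration of the name/label distinction, the check below asks whether a graph label has been declared a graph in the default graph. The function name `is_genuine_name` and the example URIs are hypothetical; the class name rdf:Graph is simply taken from the proposal as described above.

```python
RDF_TYPE = "rdf:type"
RDF_GRAPH = "rdf:Graph"  # class name as written in the proposal above


def is_genuine_name(label, default_graph, named_graphs):
    """True when `label` labels a graph in the datastore and the default
    graph declares it to be a graph, so that the constraint
    (A label B) implies A = B forces the label to denote that graph."""
    return label in named_graphs and (label, RDF_TYPE, RDF_GRAPH) in default_graph


default_graph = {(":g1", RDF_TYPE, RDF_GRAPH)}
named_graphs = {
    ":g1": {(":a", ":b", ":c")},       # declared a graph: a genuine name
    ":2012-04": {(":a", ":b", ":d")},  # a mere label, e.g. encoding a time
}
```

Here `:g1` names its graph, while `:2012-04` stays free to denote something else, such as a time period.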


Web contexts

Trying to make sense of all this leads to a vision of RDF on the Web as being a context logic. Let me call this RDFC. RDFC extends RDF with a notion of 'web context'.

A web context represents a social agreement concerning the meaning of a vocabulary of URIs, called the reserved vocabulary of the context. Asserting a graph in a context means making a commitment to use the reserved vocabulary in a way that conforms to the agreement. The agreement may be explicit or implicit, and it may or may not be accessible in some form from a URI used to indicate the context.

The most explicit and formal case would be a coined URI which identifies via HTTP an RDF graph document which completely formalizes the semantic constraints of the context, with the understanding that the URI denotes this RDF graph. Call this a graph context. However, not all contexts can be represented as graph contexts. The other extreme is that a URI may be used to indicate a context without any explanation or definition of the semantic restrictions it is intended to impose. This is legal, although of limited utility.

Being a context is a role rather than a classification. Anything can be treated as a context (just as anything in RDF can be a property or a class.) A given URI may therefore identify one thing via HTTP, denote another thing, and be used to indicate a context, all at the same time.


RDFC syntax

RDFC looks just like RDF, but RDF graphs are understood to always be asserted in some context, indicated by a URI.

To assert a graph G in a context C, simply include the triple < > rdf:inherits C . in G. ( < > means "this graph".) rdf:inherits is transitive, of course.

If C is a graph context, this means exactly what owl:imports means now, by the way, so it shouldn't be too revolutionary an idea :-)**

A 'bare' assertion of an RDF graph which has no rdf:inherits triple (like all such assertions to date) is understood to be made in the default topmost context, called rdf:, which defines the meaning of the RDF namespace.

**(Footnote) Noticing this similarity between owl:imports and Cyc's context inheritance is what led to the current proposal.
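Because inheritance is transitive and the top context is the default, the set of contexts a graph is asserted in can be computed by a simple graph walk. The sketch below assumes the inheritance triples have already been collected into a dict; the function name `contexts_of` and the example URIs are hypothetical.

```python
RDF_TOP = "rdf:"  # the default topmost context named in the text


def contexts_of(graph, inherits):
    """Return every context `graph` is asserted in.

    `inherits` maps a graph or context URI to the contexts named in its
    inheritance triples. Inheritance is transitive, and the top context
    rdf: is always included by default.
    """
    seen = set()
    stack = list(inherits.get(graph, []))
    while stack:
        c = stack.pop()
        if c in seen:
            continue  # also guards against a graph asserted in itself
        seen.add(c)
        stack.extend(inherits.get(c, []))
    seen.add(RDF_TOP)
    return seen


inherits = {":g": [":c1"], ":c1": [":c2"], ":self": [":self"]}
```

With this data, `:g` is asserted in `:c1`, `:c2` and `rdf:`; a bare graph with no entry is asserted only in `rdf:`; and `:self` is asserted in itself as well as in `rdf:`.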


Context inheritance

The topmost context is called rdf: and defines the RDF namespace as defined by the 2004 RDF specification documents. This is a default, so all existing RDF graphs are understood to be asserted in it. Asserting in this context is accepting the 2004 globalist ideal. If this were the only context, RDFC would be identical to 2004 RDF.

The other extreme is to assert a graph in itself, considered as a context. This effectively declares all its non-reserved URIs as reserved to it, and hence separates them in meaning from the same URIs used outside the graph. This gives Antoine's semantics for graphs named in a SPARQL dataset, ie the graph-localist perspective on graph meaning. One could do this using Sandro's naming trick as follows:

{ :name rdf:type rdf:Graph }
:name { :name rdf:inherits :name
        <other triples of the graph> }

Note that :name is the graph itself (Sandro's convention) and is also the context in which this graph is asserted (our rule for rdf:inherits) giving the pattern required. Note also how one can use a URI denoting a graph to also indicate a context.


RDFC syntax

RDFC syntax requires:

1. a way to assert a graph in a context
2. a way to specify the reserved vocabulary of a context
3. a way to describe the semantic conditions imposed on the reserved vocabulary by the context.
4. a way to assert that one context inherits another

1. is done using rdf:inherits. We will assume that 4. is also described the same way.

Right now, we do not give any general formal syntax for 2. and 3., allowing users to define their own methods, perhaps informally. (In order to be used by inference engines, an algorithm must be provided which decides, for any URI, whether or not it is in the reserved vocabulary, and the semantic constraint must be expressible as a determinate condition on RDF interpretations of the reserved vocabulary. This can be done by, for example, specifying a set of axioms and inference rules which must be valid on the interpretations, but also by a direct mathematical description of the valid interpretations.)

In the case of a graph context, the non-reserved vocabulary of the context graph is the reserved vocabulary of the defined context, and the semantic constraint is that the context graph be true.


rdf:inherits

All the RDFC context structure (both inheritance between contexts and assertion of a graph in a context) is done with the single property rdf:inherits, and since this property is part of the rdf: namespace whose meaning is fully determined by the rdf: top context inherited by default by all others, its meaning cannot be changed. So RDFC does not allow 'contextual assertions of contexts' or any other oddities. The context structure itself is global.


RDFC model theory

An RDFC interpretation of a vocabulary V is an RDF interpretation I of V together with a mapping con from the universe U of I to the set of RDF interpretations over subsets of V whose universes are subsets of U. Define voc(x) to be the vocabulary of con(x).

The interpretation of a URI uuu in a context ccc is defined to be con(I(ccc))(uuu) if uuu is in voc(I(ccc)), otherwise I(uuu).
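The lookup rule just stated is small enough to write out directly. In this Python sketch, interpretations are reduced to plain dicts from URIs to denotations, and a thing with no entry in `con` is treated as a context with an empty vocabulary; both simplifications, and all names and example values, are assumptions made for illustration.

```python
def interpret(uri, ctx_uri, I, con):
    """Denotation of `uri` in the context indicated by `ctx_uri`.

    I   : dict, URI -> element of the universe U (the base interpretation)
    con : dict, element of U -> local interpretation (URI -> element);
          the keys of con[x] play the role of voc(x).
    """
    local = con.get(I[ctx_uri], {})
    # con(I(ccc))(uuu) if uuu is in voc(I(ccc)), otherwise I(uuu)
    return local[uri] if uri in local else I[uri]


# Hypothetical example: a subcontext that refines the meaning of :Person.
I = {
    "rdf:": "top",
    ":living": "ctx-living",
    ":Person": "all human beings",
    ":alice": "alice",
}
con = {"ctx-living": {":Person": "all living human beings"}}
```

In the context `:living`, `:Person` gets the refined meaning, while `:alice`, which is outside that context's vocabulary, falls back to the base interpretation I.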

A triple sss rdf:inherits ooo is true in I just when voc(I(ooo)) is the reserved vocabulary specified for a context denoted by the URI ooo, and con(I(sss)) satisfies the semantic conditions specified for a context denoted by the URI ooo.

The remaining truth recursions for triples, graphs, blank nodes, etc. are exactly as in the 2004 RDF model theory.


Context inheritance

In RDFC we have the globalist and localist views as extreme cases within one framework, but we also have more useful cases. Since users can define their own contexts and link them to other contexts and to RDF data, new semantic conditions can be introduced, defined and named 'in the field' without necessitating the elaborate and expensive WG review process needed to define a new 'web standard'.

And since contexts can be published and linked to, we have a way for the RDF/linked-data community to use URIs to refer to things in more nuanced ways than they can at present.

For the usefulness of contexts, see Lenat's papers on the topic from the Cyc project (this proposal is almost exactly like the microtheories machinery implemented in CycL, transcribed to a Web context.)


Some examples

1. Current entailment regimes (RDFS, OWL, RIF) can be viewed as contexts (and identified using existing URIs, so we now have a realistic way to refer to them in RDF itself), but we can also define new ones, eg the {owl:sameAs, owl:functionalProperty} subset used in FOAF.

2. Time-dependent properties can be described as such in a context definition, which also specifies how its subcontexts can register temporal information. (Use case 3)

3. Topics or information sources can be used as context indicators for RDF information relevant to the topic or derived from the source. (Use cases 5 and 5a)

4. Progressively more 'refined' meanings can be indicated by contexts without inventing new vocabulary, eg the class name :Person might mean all human beings, all living human beings, all living American citizens in three successive subcontexts. (Lenat reports on the usefulness of this in Cyc.)

5. Contexts provide a degree of useful referential opacity, eg an owl:sameAs asserted in one context might cease to be true in a subcontext when more refined meanings are in use (eg chemical elements vs. chemical isotopes)

