Federated data stores using semantic web technology



Steve Ray

Distinguished Research Fellow

Carnegie Mellon University

Interoperability is all about DATA

Three Technology Trends that could help*

1. Semantic Web technologies

2. Cloud

3. Natural Language Processing

I will focus on semantic web technologies

*Inspired by “Top Three Technologies to Tame the Big Data Beast,” Huffington Post, 11/22/2011

Representation Trends

[Figure: representation and integration technologies under the labels Legacy, Current Practice, and Exploratory, including IBM Card Format, EDI, XML, XML Schema, Metadata, Metamodels, Meta-meta-models, RDF/OWL, FOL, BPML/BPEL, CBA, Semantic Mediation, Web Services, Protocols, SOA, and Info Modeling]

(Slide adapted from Donald Hall, Logistics Enterprise Services Office, DLA)


Why Consider RDF & OWL Semantic Web Technology?

RDF = Resource Description Framework

OWL = Web Ontology Language

1. Simple representation

– Everything is a triple: <subject – predicate – object>

2. Self-describing models

– Schemas and data coexist in data stores

3. Easy to interrogate

– SPARQL queries (over schema and data)

4. Easy to validate

– Supports automated reasoning

5. Easy to interoperate

– Natively supports distributed data stores


Simple Representation

Everything is stored as triples:

<subject predicate object>


Self-Describing Models

• The schema (model) and the data are stored in the same place

• Schema:

– Mammal subClassOf Animal

– Human subClassOf Mammal

• Data:

– george is-a Human

– george marriedTo lisa
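To make the point concrete, here is a minimal Python sketch in which the four schema and data statements above sit in one set of triples. It is an illustration, not a real RDF store. Plain strings stand in for IRIs, and the helper `objects` is hypothetical. Because the model and the instances share one representation, the same lookup can answer both a schema question and a data question.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One <subject predicate object> statement; plain strings stand in for IRIs (an assumption)."""
    subject: str
    predicate: str
    obj: str


# Schema and data live in the same (hypothetical, in-memory) store.
store: set[Triple] = {
    # Schema (model)
    Triple("Mammal", "subClassOf", "Animal"),
    Triple("Human", "subClassOf", "Mammal"),
    # Data
    Triple("george", "is-a", "Human"),
    Triple("george", "marriedTo", "lisa"),
}


def objects(store: set[Triple], subject: str, predicate: str) -> set[str]:
    """Hypothetical helper: every object stated for (subject, predicate)."""
    return {t.obj for t in store if t.subject == subject and t.predicate == predicate}


print(objects(store, "Human", "subClassOf"))  # a question about the schema: {'Mammal'}
print(objects(store, "george", "marriedTo"))  # a question about the data: {'lisa'}
```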


Easy to Interrogate

SPARQL†: a language to query an RDF database

(It simply matches patterns of triples)

# Placeholder namespace for the example terms
PREFIX : <http://example.org/family#>
SELECT ?x
WHERE {
  :george :marriedTo ?x .
}

Returns a table:

x

lisa

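# Placeholder namespace; rdfs is the standard RDF Schema namespace
PREFIX : <http://example.org/family#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?y
WHERE {
  ?y rdfs:subClassOf :Animal .
}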

Returns a table:

y

Mammal

† SPARQL = SPARQL Protocol and RDF Query Language
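The idea that a query "simply matches patterns of triples" can be sketched in a few lines of Python. This toy matcher is an illustration under stated assumptions, not how production SPARQL engines work. Terms beginning with `?` are variables, plain strings stand in for IRIs, and `match` and `select` are hypothetical helper names. When a query has several patterns, variable bindings are carried from one pattern to the next, which is roughly what a SPARQL basic graph pattern does.

```python
from typing import Optional

# Hypothetical toy store: plain strings stand in for IRIs.
Triple = tuple[str, str, str]
Binding = dict[str, str]

STORE: set[Triple] = {
    ("Mammal", "subClassOf", "Animal"),
    ("Human", "subClassOf", "Mammal"),
    ("george", "is-a", "Human"),
    ("george", "marriedTo", "lisa"),
}


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None if impossible."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):  # a variable: bind it, or check an existing binding
            if result.setdefault(p, t) != t:
                return None
        elif p != t:  # a constant: must match exactly
            return None
    return result


def select(store: set[Triple], patterns: list[Triple], var: str) -> list[str]:
    """Values of `var` for every way all patterns match together (a basic graph pattern)."""
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in store
            if (extended := match(pattern, t, b)) is not None
        ]
    return sorted(b[var] for b in bindings)


# SELECT ?x WHERE { george marriedTo ?x . }
print(select(STORE, [("george", "marriedTo", "?x")], "?x"))  # ['lisa']
# SELECT ?y WHERE { ?y subClassOf Animal . }
print(select(STORE, [("?y", "subClassOf", "Animal")], "?y"))  # ['Mammal']
# Two patterns joined on ?c: { george is-a ?c . ?c subClassOf ?p . }
print(select(STORE, [("george", "is-a", "?c"), ("?c", "subClassOf", "?p")], "?p"))  # ['Mammal']
```

Without inference, the second query returns only Mammal. Human is an indirect subclass of Animal, and plain pattern matching does not follow that chain.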

Easy to Validate

SPARQL can be used for reasoning, not just interrogating

If George sonOf Fred and Fred siblingOf Mary, then George nephewOf Mary

In SPARQL:

# Placeholder namespace for the example terms
PREFIX : <http://example.org/family#>
CONSTRUCT { ?a :nephewOf ?c . }
WHERE {
  ?a :sonOf ?b .
  ?b :siblingOf ?c .
}
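The nephew rule can be mimicked in Python to show what such a CONSTRUCT query computes. This is a hedged sketch with three assumptions: plain strings stand in for IRIs, `construct_nephews` is a hypothetical helper, and the extra person "Sue" is invented purely to show that the rule fires once per matching pair. Like CONSTRUCT, the function returns newly built triples and leaves the store unchanged.

```python
# Hypothetical toy data: plain strings stand in for IRIs; "Sue" is invented.
Triple = tuple[str, str, str]

store: set[Triple] = {
    ("George", "sonOf", "Fred"),
    ("Fred", "siblingOf", "Mary"),
    ("Fred", "siblingOf", "Sue"),
}


def construct_nephews(store: set[Triple]) -> set[Triple]:
    """Mimic CONSTRUCT { ?a nephewOf ?c } WHERE { ?a sonOf ?b . ?b siblingOf ?c }.

    Returns the newly built triples; like CONSTRUCT, it leaves the store unchanged.
    """
    return {
        (a, "nephewOf", c)
        for (a, p1, b) in store
        if p1 == "sonOf"
        for (b2, p2, c) in store
        if p2 == "siblingOf" and b2 == b
    }


inferred = construct_nephews(store)
print(sorted(inferred))  # [('George', 'nephewOf', 'Mary'), ('George', 'nephewOf', 'Sue')]
```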


Easy to Interoperate

• A single query can interact with more than one RDF database

– Linked Movie Database contains movies and actors

– DBpedia contains people and birthdates

• Find the birthdates of all Star Trek actors

– The answer does not exist in any single source
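In SPARQL 1.1, the standard way to send part of one query to another endpoint is the `SERVICE` keyword (SPARQL 1.1 Federated Query). The Python sketch below only imitates the resulting join in memory. Both stores, their predicates (`actor`, `birthDate`), and every name and date in them are hypothetical placeholders that play the roles of Linked Movie Database and DBpedia; they are not those sources' real vocabularies or data.

```python
# Both stores and all of their contents are hypothetical placeholders that only
# mimic the roles of Linked Movie Database and DBpedia (not their real data or vocabularies).
Triple = tuple[str, str, str]

movie_store: set[Triple] = {  # movies and actors
    ("Star Trek", "actor", "Actor A"),
    ("Star Trek", "actor", "Actor B"),
    ("Other Film", "actor", "Actor C"),
}

people_store: set[Triple] = {  # people and (placeholder) birthdates
    ("Actor A", "birthDate", "1930-01-01"),
    ("Actor B", "birthDate", "1931-02-02"),
    ("Actor C", "birthDate", "1932-03-03"),
}


def birthdates_of_cast(movie: str) -> dict[str, str]:
    """Join across stores: movie -actor-> person (store 1), person -birthDate-> date (store 2)."""
    cast = {o for (s, p, o) in movie_store if s == movie and p == "actor"}
    return dict(sorted((s, o) for (s, p, o) in people_store if p == "birthDate" and s in cast))


print(birthdates_of_cast("Star Trek"))  # {'Actor A': '1930-01-01', 'Actor B': '1931-02-02'}
```

Neither store can answer the question alone. The movie store has no birthdates, and the people store does not know who appeared in which film.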

DBpedia is just one of many RDF data stores on the Web

We are not alone

Implications

• OWL/RDF provides a representation that can natively support transformations from other modeling languages and native formats for product and process models

• The API is SPARQL

• Storage can be local or web-based
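As a rough illustration of transforming a native format into triples, the sketch below maps each row of a small CSV table to RDF-style triples using only the Python standard library. The CSV columns, the `part:` naming scheme, and `rows_to_triples` are all hypothetical. A real mapping would target terms from an agreed ontology rather than reuse column names as predicates.

```python
import csv
import io

Triple = tuple[str, str, str]

# Hypothetical native format: a small parts table exported as CSV.
NATIVE_CSV = """id,name,partOf
p1,Bolt,a1
p2,Bracket,a1
"""


def rows_to_triples(text: str) -> set[Triple]:
    """Map each CSV row to triples: the row id becomes the subject, and every
    other column becomes a predicate whose object is that column's value.
    The "part:" prefix is an illustrative naming scheme, not a standard one."""
    triples: set[Triple] = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = "part:" + row["id"]
        for column, value in row.items():
            if column != "id":
                triples.add((subject, column, value))
    return triples


for triple in sorted(rows_to_triples(NATIVE_CSV)):
    print(triple)
```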


Take-away

• Poor interoperability is expensive

• Interoperability solutions can be expensive

• Semantic technology can make interoperability solutions easier and cheaper to implement

