Programming with Semantic Broad Data

Steffen Staab (@ststaab)
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
west.uni-koblenz.de

The World of Big Data – Volume & Velocity

Genome data
• Up to 200 GB/person
Video data
• Upload 300 hrs/min
Sensor data
• 5000 sensors/jet engine
• 1 Terabit/s

360 TB/disc

https://flic.kr/p/8zuDTm

https://flic.kr/p/59jc2h

Complex?


Hard in some dimensions

18 concepts

Noise amplitudes


The World of Big Data – Variety

Data models
• Graph data
• Relational
• XML
• RDF
• CSV
• JPEG
• MPEG-1, 2, 4
• DICOM
• PDF
• Excel
• ...

Conceptual models, aka ER schemata, aka logical schemata, aka XML schemata, aka RDFS/OWL ontologies

FOAF, Dublin Core, Marc81, Unifact, ...

Dozens to hundreds

The World of Big Data – Variety – 15 years ago

SAP
• In the order of 10,000 ‘concepts’
• Days to find the right column

Medical information system (Lars)
• Treating transplant patients
• Approx. 10,000 concepts

Only my very limited experiences

Big consulting business

The World of Big Data – Variety – Today!

Wikidata
• 1,148,230 concepts
• 2515 relations
UMLS
• 1 million concepts
Bioinformatics
• 1000s of public databases
• 35 in Bio2RDF (11 billion triples)
eGov datasets
• 200,000 by Fraunhofer FOKUS
• 20,000 by ODI
Knowledge graphs
• Ask Google, Microsoft, Samsung, HP, ...
Sensor types
• 330 broad types in Wikipedia
• Tens of thousands

How to write valid, robust programs?

How to find data?

How to write a valid, robust program?

SELECT ?x WHERE { ?x a CONCEPT15 }

SELECT ?x WHERE { ?x a CONCEPT151735 }

https://flic.kr/p/8zuDTm

18 concepts · 1,166,040 concepts (Sept. ’16) · 1,148,230 concepts (March ’16)

How to approach big data

In the following, I am guessing what Axel Polleres might have told you about Enterprise Linked Data.

Traditional Information Architecture

Business Logic

Structured Data | Unstructured Data

Presentation and Interaction

Characteristics:
• Processes are known
• Data structures are known
• Meaning of data primarily in schema and code

Big Data in Today's Information Architecture

Characteristics:
• Little structure
• Semi-structured data
• Meaning of data of primary importance!

Variety Issue 1: Data Models

Data models:
• Relational
• Tree (XML, ...)
• Document oriented
• Stream
• Array
• Graph DB

RDF: graph data model as common denominator

Dealing with Issue 1: RDF as Data Model

RDF: graph data model as common denominator

(Figure: Bowie –knows→ Sarandon; Bowie –bornOn→ 8-1-1947)
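To make the "common denominator" idea concrete, here is a minimal sketch that stores the graph from the figure as plain subject-predicate-object tuples and answers a conjunction of triple patterns, where terms starting with `?` are variables. It is a toy stand-in for a SPARQL basic graph pattern, not any real RDF library; the names `GRAPH`, `match`, and `query` are made up for illustration.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# The graph from the figure: whatever the source format, data ends up as triples.
GRAPH: Set[Triple] = {
    ("Bowie", "knows", "Sarandon"),
    ("Bowie", "bornOn", "8-1-1947"),
}

def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None  # variable already bound to a different term
        elif p != t:
            return None  # constant term does not match
    return result

def query(graph: Set[Triple], patterns: List[Triple]) -> Iterator[Binding]:
    """Evaluate a conjunction of triple patterns against the graph."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in graph
            if (extended := match(pattern, triple, b)) is not None
        ]
    return iter(bindings)
```

For instance, `list(query(GRAPH, [("?who", "knows", "?friend"), ("?who", "bornOn", "?date")]))` yields a single binding that joins both edges on `?who`.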

Variety Issue 2: Conceptual Models

Conceptual models:
• ER
• UML
• ...

RDFS: ontology as common denominator

Variety Issue 2: RDFS as common conceptual meta model

RDFS for explicit conceptual description

(Figure: Bowie –knows→ Sarandon; Bowie –bornOn→ 8-1-1947; Bowie –type→ MusicArtist; Sarandon –type→ Actor)
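What the explicit conceptual description buys is inference. A minimal sketch, assuming two of the standard RDFS entailment rules (rdfs9: types propagate along subClassOf; rdfs11: subClassOf is transitive) applied to a fixpoint over plain tuples. The classes `Artist` and `Person` and their subclass axioms are hypothetical additions, not part of the slide.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Instance data from the figure, including the two type assertions.
DATA: Set[Triple] = {
    ("Bowie", "knows", "Sarandon"),
    ("Bowie", "bornOn", "8-1-1947"),
    ("Bowie", "type", "MusicArtist"),
    ("Sarandon", "type", "Actor"),
}

# Hypothetical schema triples, added only for illustration.
SCHEMA: Set[Triple] = {
    ("MusicArtist", "subClassOf", "Artist"),
    ("Actor", "subClassOf", "Artist"),
    ("Artist", "subClassOf", "Person"),
}

def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS rules until nothing new can be derived:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)      => (x type B)
    """
    closed = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in closed if p == "subClassOf"}
        new = {(a, "subClassOf", c) for (a, b) in sub for (b2, c) in sub if b == b2}
        new |= {(x, "type", b) for (x, p, a) in closed if p == "type"
                for (a2, b) in sub if a == a2}
        if new <= closed:
            return closed
        closed |= new

def types_of(triples: Set[Triple], individual: str) -> Set[str]:
    """All classes an individual belongs to in the given triple set."""
    return {o for (s, p, o) in triples if s == individual and p == "type"}
```

After `closed = rdfs_closure(DATA | SCHEMA)`, Bowie is typed not only as `MusicArtist` but also as the (hypothetical) `Artist` and `Person`, although neither was asserted.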

Variety Issue 3: System Boundaries

IRIs for globally unique referencing

(Figure: m:Bowie –f:knows→ d:Sarandon; m:Bowie –m:bornOn→ 8-1-1947; m:Bowie –rdf:type→ m:MusicArtist; d:Sarandon –rdf:type→ d:Actor)

m = http://musicbrainz.org
d = http://dbpedia.org
f = http://xmlns.com/foaf/0.1/
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
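Prefixed names are only shorthand for full IRIs, which is what makes identifiers unique across system boundaries. A minimal sketch of the expansion, assuming that the m and d namespaces end in a trailing "/" (the slide lists them without one) and using the standard W3C RDF namespace for rdf; the helper names `expand` and `expand_triple` are hypothetical.

```python
from typing import Dict, Tuple

# Prefix table from the slide. Assumption: a trailing "/" on m and d so that
# local names can be appended; rdf is the standard RDF namespace.
PREFIXES: Dict[str, str] = {
    "m": "http://musicbrainz.org/",
    "d": "http://dbpedia.org/",
    "f": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

def expand(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'm:Bowie' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local

def expand_triple(triple: Tuple[str, str, str]) -> Tuple[str, ...]:
    """Expand every prefixed term; literals without a colon (dates) stay as they are."""
    return tuple(expand(t) if ":" in t else t for t in triple)
```

With full IRIs, `d:Actor` and a hypothetical `m:Actor` become two distinct identifiers, so data from MusicBrainz and DBpedia can be merged without accidental name clashes.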


A Practical Perspective on Broad Data with LITEQ


Drosophila: Linked Open Data Cloud

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Dozens of domains

Hundreds of data sources

Thousands of concepts

Millions of entities

Billions of triples

Semantic Broad Data


Programming with Linked Data


Tasks of the programmer:
1 Schema exploration
2 Programming code types
3 Programming queries
4 Programming procedures for creating, manipulating, and persisting objects

Node Path Query Language Using Autocompletion
• Exploration of classes
• Exploration of relations

Node Path Query Language: Query Formulation
• Exploration of classes
• Exploration of relations
• Querying for instances

Type: set of mo:MusicArtist

No definition or declaration needed
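The three kinds of exploration can be pictured as operations on a knowledge base that produce autocompletion candidates. A minimal in-memory sketch, assuming a hypothetical `ToySchema` class (not LITEQ's actual data structures) with direct subclass links, property domains, and asserted instances; `mo:SoloMusicArtist` and `mo:MusicGroup` are Music Ontology subclasses of `mo:MusicArtist`, while `ex:recorded`, `ex:bowie`, and `ex:queen` are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class ToySchema:
    """Hypothetical in-memory knowledge base, for illustration only."""
    subclass_of: Dict[str, Set[str]] = field(default_factory=dict)  # class -> direct superclasses
    domain_of: Dict[str, str] = field(default_factory=dict)         # property -> domain class
    instances: Dict[str, Set[str]] = field(default_factory=dict)    # class -> asserted instances

    def subclasses(self, cls: str) -> List[str]:
        """Completion candidates when navigating down from `cls` (direct subclasses)."""
        return sorted(c for c, supers in self.subclass_of.items() if cls in supers)

    def properties(self, cls: str) -> List[str]:
        """Completion candidates for relations whose declared domain is `cls`."""
        return sorted(p for p, d in self.domain_of.items() if d == cls)

    def extension(self, cls: str) -> Set[str]:
        """Instances of `cls`, including those of its subclasses (hierarchy assumed acyclic)."""
        result = set(self.instances.get(cls, set()))
        for sub in self.subclasses(cls):
            result |= self.extension(sub)
        return result

# Illustrative data only.
KB = ToySchema(
    subclass_of={"mo:SoloMusicArtist": {"mo:MusicArtist"}, "mo:MusicGroup": {"mo:MusicArtist"}},
    domain_of={"ex:recorded": "mo:MusicArtist"},
    instances={"mo:SoloMusicArtist": {"ex:bowie"}, "mo:MusicGroup": {"ex:queen"}},
)
```

`KB.extension("mo:MusicArtist")` returns both made-up individuals; a type provider could describe that result as a set of `mo:MusicArtist` without the programmer declaring a class for it.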

Node Path Query Language for Code Development
• Exploration of classes
• Exploration of relations
• Querying for instances
• Developing code with queries

All translated into SPARQL queries at
• Development time
• Type inference at compile time (but also as part of the IDE)
• Querying again at run time

One language to bind them all
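One plausible way to translate single node-path steps into SPARQL is sketched below. The mapping is an assumption for illustration, not LITEQ's actual query generation: navigating to subclasses becomes an `rdfs:subClassOf` query, exploring relations becomes an `rdfs:domain` query, and `.Extension` becomes an `rdf:type` query. `Step` and `to_sparql` are hypothetical names; `mo:` is the Music Ontology namespace.

```python
from enum import Enum

# Prefix declarations for the namespaces used below.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX mo: <http://purl.org/ontology/mo/>\n"
)

class Step(Enum):
    SUBCLASSES = "subclasses"  # intensional: navigate down the class hierarchy
    PROPERTIES = "properties"  # navigational: relations available on a class
    EXTENSION = "extension"    # extensional: instances of the class

def to_sparql(cls: str, step: Step) -> str:
    """Translate one node-path step on the class `cls` into a SPARQL SELECT query."""
    if step is Step.SUBCLASSES:
        select = f"SELECT ?c WHERE {{ ?c rdfs:subClassOf {cls} }}"
    elif step is Step.PROPERTIES:
        select = f"SELECT ?p WHERE {{ ?p rdfs:domain {cls} }}"
    elif step is Step.EXTENSION:
        select = f"SELECT ?x WHERE {{ ?x rdf:type {cls} }}"
    else:
        raise ValueError(f"unsupported step: {step}")
    return PREFIXES + select
```

Under this scheme, the intensional and navigational queries are the ones an IDE would issue for autocompletion and type inference, while `to_sparql("mo:MusicArtist", Step.EXTENSION)` is the query that would run when the program executes.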

Node Path Query Language for Code Development
• Exploration of classes
• Exploration of relations
• Querying for instances
• Developing code with queries
• Developing code with new classes

All translated into SPARQL queries at
• Development time
• Run time update
• Persistence!
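Persisting a newly created object amounts to writing triples back to the store. A minimal sketch, assuming the store accepts SPARQL 1.1 Update and that property values are already valid SPARQL terms; `insert_data`, the subject IRI `http://example.org/artist/1`, and the name value are hypothetical, and no literal escaping is done.

```python
from typing import Dict

PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX mo: <http://purl.org/ontology/mo/>\n"
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
)

def insert_data(subject: str, cls: str, props: Dict[str, str]) -> str:
    """Build a SPARQL 1.1 INSERT DATA update persisting a new object: one rdf:type
    triple plus one triple per property. Values must already be valid SPARQL terms
    (IRIs or quoted literals); this sketch does no escaping."""
    triples = [f"{subject} rdf:type {cls} ."]
    triples += [f"{subject} {prop} {value} ." for prop, value in sorted(props.items())]
    body = "\n  ".join(triples)
    return PREFIXES + "INSERT DATA {\n  " + body + "\n}"
```

For example, `insert_data("<http://example.org/artist/1>", "mo:MusicArtist", {"foaf:name": '"New Artist"'})` produces an update with one typing triple and one name triple; a real implementation would also escape literals and handle datatypes.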

Formal NPQL Syntax

• Data browsing
• Restricting class expressions
• Evaluating class expressions
• Navigating from data to classes
• Navigating from data to property types
• URI set

Query kinds: intensional queries, extensional queries, navigational queries

NPQL Algebra (Example)

Reversibility can be used to simplify path expressions.

Summary on LITEQ (Language Integrated Types, Extensions, and Queries)

NPQL (Node Path Query Language)
• Navigational queries
• Intensional queries
• Extensional queries
• Compilation to SPARQL

LITEQ
• Implementation of NPQL as an F# type provider in Visual Studio
• Autocompletion using NPQL queries
• Automatic typing of extensional query results by intensional queries

"That seems to work very well in practice, but how does it work in theory?"

let allArtists = Store.NPQL().``mo:MusicArtist``.Extension

What is implied by such a line ... for the program? ... for the compiler?

A Foundational Perspective on Semantic Broad Data Using λDL

What we want to have: Static Type Checking

But:
• In LITEQ, queries must receive types
• The number of types in our system is very large, even infinite
• Existing type systems expect complete knowledge

Programming with Data from a Knowledge Base

Issue in our prototype

Related Work

Generic types
• Everything is a node or an edge
• No type checking!

Only 2nd place in Halo competition

Mapping approaches
• Hibernate
• LITEQ
• ActiveRDF
• Summer / Winter
• ...

Preferred in SemWeb now

Been there, done that

Example – and Issues with Mapping

Mapping DL types to PL types is problematic because of
1. a mix of nominal typing (MusicArtist) and structural typing ($\exists \mathit{recorded}.\mathit{Song}$)
2. schema-less information (influencedBy)
3. inference ($\mathit{hendrix} : \mathit{MusicArtist}$)
4. the sheer size of the terminology

How to type a query?

Example code, to be rejected: the type of the argument is not a subtype of the type the code expects

Example code, to be accepted: the type of the argument is a subtype of the type the code expects

What we want to have: Static Type Checking

Challenge:
• A programming language that accepts concept expressions as types and can deal with inferences

Programming with Data from a Knowledge Base

λDL

Core Ideas of λDL

Given
• Atomic types: $A = \{\ldots, A_i, \ldots\}$
• Plus function types: $T = \{\ldots, A_i, \ldots, T_i \to T_j, \ldots\}$

Add elements
• Concept expressions (intensional NPQL queries)
• Instances (extensional NPQL queries)

Add knowledge
• Typing and subtyping derived from the knowledge base

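Description Logics Fragment

Concept-forming expressions:

$$
\begin{array}{lll}
 & \text{Syntax} & \text{Semantics} \\
\text{Top} & \top & \Delta^{\mathcal{I}} \\
\text{Bottom} & \bot & \emptyset \\
\text{Concept name} & A & A^{\mathcal{I}} \\
\text{Intersection} & A \sqcap B & A^{\mathcal{I}} \cap B^{\mathcal{I}} \\
\text{Negation} & \neg A & \Delta^{\mathcal{I}} \setminus A^{\mathcal{I}} \\
\text{Existential restriction} & \exists R.C & \{\, a \in \Delta^{\mathcal{I}} \mid (a,b) \in R^{\mathcal{I}} \text{ and } b \in C^{\mathcal{I}} \text{ for some } b \,\}
\end{array}
$$

Axioms:

$$
\begin{array}{llll}
 & & \text{Syntax} & \text{Semantics} \\
\text{T-Box} & \text{Subclass} & C \sqsubseteq D & C^{\mathcal{I}} \subseteq D^{\mathcal{I}} \\
\text{A-Box} & \text{Concept assertion} & a : C & a^{\mathcal{I}} \in C^{\mathcal{I}} \\
\text{A-Box} & \text{Role assertion} & (a,b) : R & (a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}
\end{array}
$$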
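The semantics column of this fragment can be executed directly on a finite interpretation. A minimal sketch, assuming concept expressions are represented as small Python data classes and an interpretation is given explicitly as a domain plus extensions for concept and role names; the individuals `hendrix`, `purple_haze`, and `sarandon` are illustrative only. Note that this checks one interpretation (model checking); deciding entailment over all models is what a DL reasoner does.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union

@dataclass(frozen=True)
class Top:
    pass

@dataclass(frozen=True)
class Bottom:
    pass

@dataclass(frozen=True)
class Name:
    name: str

@dataclass(frozen=True)
class And:
    left: "Concept"
    right: "Concept"

@dataclass(frozen=True)
class Not:
    arg: "Concept"

@dataclass(frozen=True)
class Exists:
    role: str
    filler: "Concept"

Concept = Union[Top, Bottom, Name, And, Not, Exists]

@dataclass
class Interpretation:
    domain: Set[str]                        # Δ^I
    concepts: Dict[str, Set[str]]           # A -> A^I
    roles: Dict[str, Set[Tuple[str, str]]]  # R -> R^I

    def ext(self, c: Concept) -> Set[str]:
        """Compute C^I for a concept expression, one case per constructor."""
        if isinstance(c, Top):
            return set(self.domain)
        if isinstance(c, Bottom):
            return set()
        if isinstance(c, Name):
            return set(self.concepts.get(c.name, set()))
        if isinstance(c, And):
            return self.ext(c.left) & self.ext(c.right)
        if isinstance(c, Not):
            return self.domain - self.ext(c.arg)
        if isinstance(c, Exists):
            filler = self.ext(c.filler)
            return {a for (a, b) in self.roles.get(c.role, set()) if b in filler}
        raise TypeError(f"not a concept: {c!r}")

    def holds_subclass(self, c: Concept, d: Concept) -> bool:
        return self.ext(c) <= self.ext(d)           # C ⊑ D

    def holds_concept_assertion(self, a: str, c: Concept) -> bool:
        return a in self.ext(c)                     # a : C

    def holds_role_assertion(self, a: str, b: str, r: str) -> bool:
        return (a, b) in self.roles.get(r, set())   # (a, b) : R

# Illustrative interpretation with made-up individuals.
interp = Interpretation(
    domain={"hendrix", "purple_haze", "sarandon"},
    concepts={"MusicArtist": {"hendrix"}, "Song": {"purple_haze"}, "Actor": {"sarandon"}},
    roles={"recorded": {("hendrix", "purple_haze")}},
)
```

In this interpretation, the structural type ∃recorded.Song and the nominal type MusicArtist happen to have the same extension, which is exactly the kind of relationship a type checker over DL concepts has to discover.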

λ-Calculus

Universal model of computation
• Abstraction
• Application

Example:
• $\lambda f.\lambda x.\, f\,(f\,x)$

Evaluation rules
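Abstraction, application, and evaluation can be made concrete with a small interpreter. A minimal sketch of a call-by-value, environment-based evaluator for the untyped λ-calculus, assuming Python data classes for terms and two made-up built-ins (`succ`, `zero`) so that the example term λf.λx. f (f x) can be run; this is not the λDL interpreter from the talk.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:          # abstraction: λparam. body
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:          # application: fn arg
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

@dataclass
class Closure:      # value of an abstraction: the λ plus its environment
    param: str
    body: Term
    env: Dict[str, Any]

class Stuck(Exception):
    """Raised when no evaluation rule applies (e.g. applying a number)."""

def evaluate(term: Term, env: Dict[str, Any]) -> Any:
    """Call-by-value evaluation with environments instead of substitution."""
    if isinstance(term, Var):
        if term.name not in env:
            raise Stuck(f"unbound variable {term.name}")
        return env[term.name]
    if isinstance(term, Lam):
        return Closure(term.param, term.body, env)
    if isinstance(term, App):
        fn = evaluate(term.fn, env)
        arg = evaluate(term.arg, env)
        if isinstance(fn, Closure):
            # β-reduction: evaluate the body with the parameter bound to the argument
            return evaluate(fn.body, {**fn.env, fn.param: arg})
        if callable(fn):
            return fn(arg)  # built-in primitive such as succ
        raise Stuck(f"cannot apply {fn!r}")
    raise TypeError(f"not a term: {term!r}")

# The example term: λf.λx. f (f x), i.e. "apply f twice".
TWICE = Lam("f", Lam("x", App(Var("f"), App(Var("f"), Var("x")))))
PRIMITIVES = {"succ": lambda n: n + 1, "zero": 0}
```

Applying `TWICE` to `succ` and `zero` evaluates to 2, while a term such as `App(Var("zero"), Var("zero"))` has no applicable rule and gets stuck; ruling out such terms statically is the job of a type system.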

Syntax for Core λDL

Core λDL: Evaluation and Typing

Nominal DL type

Subtyping

$\infty$ many types: add KB knowledge only when it is needed for checking an application, not proactively
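Consulting the knowledge base lazily can be sketched as follows. This is a minimal illustration under assumptions: types are either concept names or function types, function subtyping is the usual rule (contravariant in the argument, covariant in the result), and `LazyKB` is a hypothetical stand-in that searches told subclass axioms; in the prototype described here, a reasoner such as HermiT would presumably answer these questions. Subsumption is asked only when an application is checked, and answers are cached.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union

@dataclass(frozen=True)
class Atomic:
    concept: str          # a DL concept name used as a type

@dataclass(frozen=True)
class Arrow:
    dom: "Type"
    cod: "Type"

Type = Union[Atomic, Arrow]

class LazyKB:
    """Hypothetical oracle: answers subsumption on demand and caches the answers."""

    def __init__(self, told: Set[Tuple[str, str]]):
        self.told = told                          # (sub, super) axioms
        self.cache: Dict[Tuple[str, str], bool] = {}
        self.queries = 0                          # how often the oracle was consulted

    def subsumed(self, sub: str, sup: str) -> bool:
        key = (sub, sup)
        if key not in self.cache:
            self.queries += 1
            self.cache[key] = self._search(sub, sup)
        return self.cache[key]

    def _search(self, sub: str, sup: str) -> bool:
        seen: Set[str] = set()
        todo = [sub]
        while todo:                               # reflexive-transitive reachability
            c = todo.pop()
            if c == sup:
                return True
            if c not in seen:
                seen.add(c)
                todo.extend(b for (a, b) in self.told if a == c)
        return False

def is_subtype(kb: LazyKB, s: Type, t: Type) -> bool:
    if isinstance(s, Atomic) and isinstance(t, Atomic):
        return kb.subsumed(s.concept, t.concept)  # the only place the KB is asked
    if isinstance(s, Arrow) and isinstance(t, Arrow):
        return is_subtype(kb, t.dom, s.dom) and is_subtype(kb, s.cod, t.cod)
    return False

def type_of_application(kb: LazyKB, fn: Type, arg: Type) -> Type:
    """Application rule with subsumption: accept if the argument type is a subtype."""
    if not isinstance(fn, Arrow):
        raise TypeError("applying a non-function")
    if not is_subtype(kb, arg, fn.dom):
        raise TypeError(f"{arg} is not a subtype of {fn.dom}")
    return fn.cod
```

With a single made-up axiom `mo:SoloMusicArtist ⊑ mo:MusicArtist`, a function expecting `mo:MusicArtist` accepts a `mo:SoloMusicArtist` argument and rejects a `dbo:Actor` argument, and repeating a check does not consult the oracle again.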

Further Issues and Opportunities in λDL

• Queries return sets
• Concept set type needed
• Set operators needed: map, fold, element
• Queries may return infinite sets
• No theoretical problem, but lack of well-defined stopping conditions in KBs
• Type dispatch based on inferencing

λDL Interpreter in F# and Using HermiT

Result for λDL

Theorem: A well-typed closed term does not get stuck during evaluation (with common exceptions, e.g. the empty list).

Typing is a safety net, but does not solve the halting problem.

Conclusion

Present of Broad Data

Broad data
• has grown from $10^4$ to $10^6$ concepts (plus data)
• continues to grow
  – more integration of distributed databases
  – more sensors of different types
  – more crowdwork
• has not yet been recognized as a problem of its own
• will lead to
  – brittleness
  – high maintenance efforts
  – loss of opportunities

Future of Broad Data

New methods for broad data:
• Explore
  – Understand
• Find
• Relate (see e.g. Linda's talk today)
• Program
• Maintain

Thank you for your attention!

Thanks to my collaborators for this work:

Stefan Scheglmann, Martin Leinberger, Matthias Thimm (WeST, Koblenz)
Evelyne Viegas (Microsoft Research, Redmond)

Ralf Lämmel (SOFTLANG, Koblenz)