44
Tetherless World Constellation On beyond OWL Jim Hendler @jahendler Tetherless World Professor of Computer, Web and Cognitive Science Director, Rensselaer Institute for Data Exploration and Applications

On Beyond OWL: challenges for ontologies on the Web

Embed Size (px)

Citation preview

Page 1: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

On beyond OWL

Jim Hendler@jahendler

Tetherless World Professor of Computer, Web and Cognitive ScienceDirector, Rensselaer Institute for Data Exploration and Applications

Rensselaer Polytechnic Institute

Page 2: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Semantic Web and its contribution

Slide ca. 2001. Were we right?

Page 3: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Were we right?Sort of? Mostly?

Page 4: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Example: Semantic Search

IEEE Computer, Jan 2010)

Page 5: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Contenders ca. 2010

Page 6: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Contenders ca. 2014

Google Sem Webbers include: R. Guha, Dan Brickley, Denny Vrandečić Natasha Noy, Chris Welty, …

Page 7: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Google 2014

Google finds embedded metadata on >20% of its crawl – Guha, 2014

Page 8: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Other success stories

Facebook: 2011 Oracle: 2012

Page 9: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Some others

Page 10: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Watson used Semantic Web

IBM

Page 11: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

All cool, but…

• Which of these use OWL in any significant way?

(This part of the slide intentionally left blank)

Page 12: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Why not OWL?

• This is NOT to say OWL isn’t being used– but it’s not much by comparison– but it’s not much on the Web

• with the possible exception of the misuse of owl:sameAs, but let’s not go there…

• Semantic markup on the Web exceeds anything we predicted in 2001…– … but OWL use on the Web as a proportion of

this lags behind the expectations many of us had

• The rest of this talk is speculation as to why…

Page 13: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

What I used to think the problem is

Page 14: On Beyond OWL: challenges for ontologies on the Web

Ontology: the OWL DL view

• Ontology as Barad-Dur (Sauron's tower):– Extremely powerful!

– Patrolled by Orcs• Let one little hobbit

in, and the whole thing could come crashing down

inconsistency

Decidable Logic basis

Page 15: On Beyond OWL: challenges for ontologies on the Web

ontology: the RDFS view

• ontology and the tower of Babel– We will build a tower

to reach the sky– We only need a little

ontological agreement

• Who cares if we all speak different languages?

Genesis 11:7 Let us go down, and there confound their language, that they may not understand one another's speech. So the Lord scattered them abroad from thence upon the face of all the earth: and they left off to build the city.

Page 16: On Beyond OWL: challenges for ontologies on the Web

ROI: Reasoning over (Enterprise) data

• This "big O" Ontology finds use cases in verticals and enterprises– Where the vocabulary can be controlled– Where finding things in the data is important

• Example– Drug discovery from data

• Model the molecule (site, chemical properties, etc) as faithfully and expressively as possible

• Use "Realization" to categorize data assets against the ontology– Bad or missed answers are money down the drain

BUT VERY EXPENSIVE

Page 17: On Beyond OWL: challenges for ontologies on the Web

ROI: Web 3.0• The "small o" ontology finds use cases in Web

Applications (at Web scales)– A lot of data, a little semantics– Finding anything in the mess can be a win!

• Example– Declare simple inferable relationships and apply, at

scale, to large, heterogeneous data collections• These are "heuristics" not every answer must be right

(qua Google) – But remember time = money!

WHAT I USED TO BELIEVE

Page 18: On Beyond OWL: challenges for ontologies on the Web

ROI: Web 3.0• The "small o" ontology finds use cases in Web

Applications (at Web scales)– A lot of data, a little semantics– Finding anything in the mess can be a win!

• Example– Declare simple inferable relationships and apply, at scale,

to large, heterogeneous data collections• eg. Use InverseFunctional triangulation to find the entities that

can be inferred to be the same– These are "heuristics" not every answer must be right (qua Google) – But remember time = money!

WHAT I NOW BELIEVE

Page 19: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

BUT, not quite the same way

• That explains why the uptake of RDFS/SPARQL is going on– and used by things like schema.org

• But still doesn’t explain why OWL isn’t used as much as it should be – especially on the Web– especially when the need for ontologies is

growing rapidly and many kinds of “ontology like things” are being used heavily

Page 20: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

What I am coming to believe

• My previous view was blaming OWL’s problems on too much expressivity– as opposed to RDFS (or really RDFS+)

• I now believe the problem is lack of expressivity (in an interoperable way)– for tasks where people really need Web

semantics

Page 21: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

The Web is increasingly about LINKING DATA

http://linkeddata.org/ cloud is a tiny fraction of what is out there

Page 22: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Current linked data approaches miss a key need

• Data is converted into RDF and SPARQL’ed– creates huge graph DBs less efficient than the

original DB• Data is converted from DB into SPARQL

return on demand– much better, but you must know the mapping

• owl:sameAs is (ab)used to map data to data– but that only lets you map equals – which is an

easy mapping to express in many ways• defining equality right in a model theory is much harder,

and thus the abuse, but let’s leave that for another talk

Page 23: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Data

MetaData

What we need

Linked Semantic Web “metadata” documents that can be used to link very large databases in distributed data systems. This could lead to orders of magnitude reduction in information flow for large-scale distributed data problems.

Page 24: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

What’s the problem

• We want a knowledge representation that can do things like:– help us find the right data for a problem

• The “Date” field in some DB could be lots of different things

– consider “Database of 1957 NYC births” vs “Database of 1957 NYC deaths”

– help us map between different databases• The problem isn’t primarily translating

ontologies to each other, it is tying the ontologies to the data (for the mappings)

Page 25: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Example: Mereology (and meronymy)

• OWL doesn’t have a part-whole relation– left out of design because we couldn’t reach

consensus (and there’s 2000 yrs of argument behind that)

• but also because most require transitive closure of parts in many cases and that had complexity issues

– but one of the mostused relations in the gene ontologyand many medicalontologies is part of

Page 26: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

But, what? Wait!

Note in the above – we use the fact that the brain has 5 lobes as a driving example for qualified cardinality - but to say that in OWL you need to INVENT the “hasdirectpart” relation

Page 27: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Database mapping

• Many things in database schemas also map to parts/wholes– Who is in what organization?– What components comprise an

assembly?– Where did something occur (that was

part of another event)?– When…

Page 28: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

When?

• Temporal reasoning is missing from OWL– Knowing [this talk occurred during “OWLED

2015” AND “OWLED 2015” occurred in October of 2015, THEREFORE this talk occurred in October of 2015] would be huge• Whole books of temporal logics out there

– picking is hard– but it is also what OWL has to do to be a standard for

this kind of mapping

Page 29: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Geophysical reasoning

• Add math to part-wholes– OY! – but GDBs are widely used for mappings of

information in big databases• especially in science (more OWL use cases)

• Talking about “math” – lots of discussion of adding probability to

OWL• and someone should some day

– but what about

Page 30: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

SIMPLE relationship reasoning

• Discovery Informatics and data mining– Huge industries, and growing– Web data, Internet of things, data interoperability

(the startup holy grails)• Example: supposing some scientific theory

“implies” that if X increases then Y should also increase– Which databases would help confirm my theory?

Which would argue against it?– Easy to check in many new database systems– How do we express that aspect of the theory in OWL?

Page 31: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

procedural attachment

• In fact, what about procedural attachment?– lots of literature on preserving

completeness w/respect to procedures • used in prolog, etc.

– Why isn’t this in OWL?• patent issues w/respect to standard• general concern about whether procedural

attachment was possible in OWL DL framework

Page 32: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Procedural Attachment

No longer a patent issue …

Page 33: On Beyond OWL: challenges for ontologies on the Web

33

Exploring knowledge graphs

• Agent reasons about a network of connected entities and chooses the best next one to discuss based on a combination of factors

– Consistency, Novelty and Ontological relationships

Page 34: On Beyond OWL: challenges for ontologies on the Web

34

Integrating• Automate Agent Creation combining:

Information Extraction: Using anew "living information extraction" technique, we will be able to create a "never-ending extractor" which will be pulling from web documents information about entities and events, and the relationships between them.

The new system can work in a dynamic node, and does not need human annotated samples for training, but it works best if there are a number of known relationships between pages to build off of.

info extraction uses ontologies

Page 35: On Beyond OWL: challenges for ontologies on the Web

35

Integrate• Automate Agent Creation combining:

Information Extraction: Using anew "living information extraction" technique, we will be able to create a "never-ending extractor" which will be pulling from web documents information about entities and events, and the relationships between them.

The new system can work in a dynamic node, and does not need human annotated samples for training, but it works best if there are a number of known relationships between pages to build off of.

Semantic Web: The Semantic Web provides a number of known relationships between pages on the Web in a number of domains. Using general knowledge sources, like dbpedia and Yago, and specialized knowledge sources, like the data from musicbrainz, the reviews from Yelp (which have semantic annotations) and even the Open Graph of Facebook (which is available in a semantic web format), provides a jumpstart for the language extraction. However, the Semantic Web relates pages, but doesn't have any sort of "understanding" of what is on the pages.

ontologies again!

Page 36: On Beyond OWL: challenges for ontologies on the Web

36

Our Approach• Automate Agent Creation combining:

Information Extraction: Using anew "living information extraction" technique, we will be able to create a "never-ending extractor" which will be pulling from web documents information about entities and events, and the relationships between them.

The new system can work in a dynamic node, and does not need human annotated samples for training, but it works best if there are a number of known relationships between pages to build off of.

Semantic Web: The Semantic Web provides a number of known relationships between pages on the Web in a number of domains. Using general knowledge sources, like dbpedia and Yago, and specialized knowledge sources, like the data from musicbrainz, the reviews from Yelp (which have semantic annotations) and even the Open Graph of Facebook (which is available in a semantic web format), provides a jumpstart for the language extraction. However, the Semantic Web relates pages, but doesn't have any sort of "understanding" of what is on the pages.

Cognitive Computing : Cognitive Computing, can allow us to have a better way of accessing information about the entities found on the Web and finding other information about the same entities using various kinds of search and language heuristics. This allows us to have more organized information, rapidly generated, about the entities being explored. However, given a large graph of entities (even the organized linked-open data cloud has information about billions of things), how do we choose what to display next? If the best we can do is provide links, all of the above isn't much better than choosing a page and clicking from there.

and here!

Page 37: On Beyond OWL: challenges for ontologies on the Web

37

Our Approach• Automate Agent Creation combining:

Information Extraction: Using anew "living information extraction" technique, we will be able to create a "never-ending extractor" which will be pulling from web documents information about entities and events, and the relationships between them.

The new system can work in a dynamic node, and does not need human annotated samples for training, but it works best if there are a number of known relationships between pages to build off of.

Semantic Web: The Semantic Web provides a number of known relationships between pages on the Web in a number of domains. Using general knowledge sources, like dbpedia and Yago, and specialized knowledge sources, like the data from musicbrainz, the reviews from Yelp (which have semantic annotations) and even the Open Graph of Facebook (which is available in a semantic web format), provides a jumpstart for the language extraction. However, the Semantic Web relates pages, but doesn't have any sort of "understanding" of what is on the pages.

Cognitive Story-telling Technology: Interactive storytelling techniques are being explored to take information in the kind of "knowledge graph" resulting from the above, and tailoring the presentation to a user using storytelling techniques. It is aimed at presenting the information as an interesting and meaningful story by taking into consideration a combination of factors ranging from topic consistency and novelty, to learned user interests and even a user’s emotional reactions. The system can essentially determine "where to go next" and what to do there in the organized information as processed above..

“User models” (which are a lot like ontologies)

Page 38: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

I could go on

• There are many other such examples– Describing how data in one set could be joined to

data in another is incredibly powerful, timely, and important

• it’s just not really what people have typically used traditional reasoners for

– Providing the ontological glue among different AI technologies: Priceless

• Why aren’t we doing more to understand these issues and bring into the OWL “family?”– de facto standardization leads to adoption

Page 39: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

My challenge

• Is the problem that we cannot have these powerful things in a decidable (sound and complete) language?– then maybe we have to give up decidability?

• or even better, define new maths of expressivity that have different kinds of “sound and complete” behavior

– i.e. approximately sound and complete» (is modern computational theory weakened by “within

epsilon” optimality?)– i.e. “anytime” algorithms for reasoners

» (conjecture) sound and complete at infinity

Page 40: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Challenge

• We may need to rethink how we are defining ontologies with respect to the necessary properties for useon the Web

Page 41: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

I’m not “anti-formalism”

• A sufficient formalism for Semantic Web applications must– Provide a model that accounts for linked data

• What is the equivalent of a DB calculus?– Provide a means for evaluating different kinds of

completeness for reasoners • In practice we must be able to model A-box effects as formally

as T-box technologies– example Weaver 2012 showed what restrictions were needed to maximize

parallelism of some OWL subsets– Think about other processors than formal reasoners

that will use the ontologies• ontologies used in many other ways

– i.e. why does an IE system w/F1=.84 need a DL reasoner?

Page 42: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Summary

• The growing world of (semantic web, linked) data needs ontologies more than ever – OWL has some of the important things– But is missing many of the really important

things• The problem isn’t (formal) expressivity

– it’s the need to express other things (esp relationships between properties, events, etc.)

– We need more research into how to formalize these kinds of relations

Page 43: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

Acknowledgments

• I get funded by lots of folks – this talk may or may not represent anything anyone there believes:– ARL, DARPA, Microsoft, Mitre, Optum Laboratories

• The Rensselaer Institute for Data Exploration and Application pays my salary – Thanks!!

• Many of my best ideas, including a lot in this talk come from listening to smart people– Frank van Harmelen and Peter Mika have influenced my

thinking about many of the ideas in this talk• The students and my colleagues at the RPI

Tetherless World Constellation (tw.rpi.edu)

Page 44: On Beyond OWL: challenges for ontologies on the Web

Tetherless World Constellation

ANY QUESTIONS?