Linked Open Data and Next Generation Science Deborah L. McGuinness Tetherless World Senior Constellation Chair Professor of Computer and Cognitive Science Rensselaer Polytechnic Institute, Troy, NY & CEO McGuinness Associates, Latham, NY Earth System Information Partners, Madison Wisconsin, July 18, 2012

20120718 linkedopendataandnextgenerationsciencemcguinnessesip final

DESCRIPTION

Linked Data and Semantic Technologies can support a next generation of science. This talk shows examples of discovery, access, integration, analysis, and shows directions towards prediction and vision.

Background I

– Access to data is exploding, with open government data and numerous agencies publishing data and providing service access, or at least FOIA access
– Citizen interest and contributions are increasing: data gathering (e.g., bird observations), reviewing (e.g., Galaxy Zoo), compute cycles (e.g., SETI), …
– Arguably, more large science problems (large in both data volume and breadth of area) need addressing; these go beyond what a single research team can easily solve

Background II

– Semantic Technologies (technological support for encoding meaning in a form computers can understand and manipulate) are maturing and increasing in usage
– Computational encodings of meaning can be used to help integrate, link, validate, filter, and so on; essentially, to make smarter, more context-aware applications
– Semantic Technologies enable linking data … and linked data provides a way of connecting and traversing information, nodes, graphs, webs, …

Take Home Message (early)

– Linked Data is usable now by any project
– Linked Data and Semantic Technologies can help in forming and connecting large, distributed, evolving efforts such as many earth and space science projects
– In the rest of the talk:
  – Brief intro to Linked Data and Semantic Technologies through examples
  – Discussion about what we might do now and strive for in the future

Linked Data

• Linked Data is quite simple and follows principles set out by Berners-Lee in http://www.w3.org/DesignIssues/LinkedData.html
– Use URIs as names for things.
– Use HTTP URIs so that people can look up those names.
– When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
– Include links to other URIs, so that they can discover more things.
• Introduction by examples and then discussion
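
As a minimal sketch of the four principles above, the following Python models a tiny in-memory "web" in which each HTTP URI maps to the RDF-style triples a server might return for it. Everything here is an assumption made for illustration: the `example.org` URIs, the `WEB` dictionary, and the `look_up` and `discover` helpers are hypothetical and not part of any real dataset or library. A real deployment would dereference the URIs over HTTP and parse the returned RDF.

```python
from collections import deque

# Hypothetical mini "web": each HTTP URI (principles 1 and 2) maps to the
# (subject, predicate, object) triples a server might return for it.
EX = "http://example.org/"
WEB = {
    EX + "station/42": [
        (EX + "station/42", EX + "vocab/locatedIn", EX + "county/rensselaer"),
        (EX + "station/42", EX + "vocab/measures", EX + "chemical/arsenic"),
    ],
    EX + "county/rensselaer": [
        (EX + "county/rensselaer", EX + "vocab/partOf", EX + "state/ny"),
    ],
    EX + "chemical/arsenic": [],
    EX + "state/ny": [],
}


def look_up(uri):
    """Principle 3: looking up a URI yields useful information (triples)."""
    return WEB.get(uri, [])


def discover(start_uri):
    """Principle 4: follow links in the returned triples to find more things."""
    seen = [start_uri]
    queue = deque([start_uri])
    while queue:
        uri = queue.popleft()
        for _subject, _predicate, obj in look_up(uri):
            # Only HTTP URIs can be looked up; skip anything already visited.
            if obj.startswith("http://") and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen


if __name__ == "__main__":
    for uri in discover(EX + "station/42"):
        print(uri)
```

Starting from a single station URI, the traversal reaches the county, the chemical, and the state purely by following links, which is the practical payoff of the fourth principle.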

Population Sciences Grid Goals

• Convey complex health-related information to consumer and public health decision makers for community health impact
• Inform the development of future research opportunities effectively utilizing cyberinfrastructure for cancer prevention and control

McGuinness, D., Shaikh, A., Lebo, T., Ding, L., Courtney, P., McCusker, J., Moser,. Morgan, G.D., Tatalovich, Z., Willis, G., Contractor, N., and Hesse, B. 2012. Towards Semantically-Enabled Next Generation Community Health Information Portals: The PopSciGrid Pilot. In Proceedings of the Hawaii International Conference on System Sciences, 2012.

Semantic Web Perspective on Initial Project Goals

• How can semantic technologies be used to integrate, present, and analyze data for a wide range of users?
• Can tools allow lay people to build their own demos and support public usage and accurate interpretation?
• How do we facilitate collaboration and “viral” applications?
• Within PopSciGrid:
– Which policies (taxation, smoking bans, etc.) are correlated with health and health care costs?
– What data should be displayed to help scientists and lay people evaluate related questions?
– What data might be presented so that people choose to make (positive) behavior changes?
– What does the data show? Why should someone believe that?
– What are appropriate follow-up questions to support actionability?

What is an Ontology?

[Figure: ontology spectrum. Labels, in order: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Disjointness, Inverse, part-of…; General Logical constraints]

Ontologies Come of Age, McGuinness, 2001, and from AAAI Panel 99: McGuinness, Welty, Uschold, Gruninger, Lehmann. Plus basis of Ontologies Come of Age, McGuinness, 2003.

Inference Web: Making Data Transparent and Actionable Using Semantic Technologies

• How and when does it make sense to use smart system results & how do we interact with them?

[Figure; surviving labels:]
– Knowledge Provenance in Virtual Observatories
– Hypothesis Investigation / Policy Advisors
– (Mobile) Intelligent Agents
– Intelligence Analyst Tools
– NSF Interops: SONET, SSIII – Sea Ice

Foundations: Web Layer Cake

[Figure; surviving labels:]
– SPARQL to XQuery translator
– RDFS materialization (Billion triple winner)
– Govt metadata search
– Linked Open Govt Data
– SPARQL WG, earlier QL – OWL-QL, Classic’ QL, …
– OWL 1 & 2 WG: edited main OWL docs, quick reference, OWL profiles (OWL RL)
– Earlier languages: DAML, DAML+OIL, Classic
– RIF WG
– AIR accountability tool
– DL, KIF, CL, N3Logic
– Inference Web, Proof Markup Language, W3C Provenance Working Group formal model, W3C incubator group, Inference Web IW Trust, AIR + Trust
– Visualization APIs
– S2S
– Govt Data
– Ontology repositories (Ontolingua), ontology evolution environment: Chimaera
– Semantic eScience Ontologies, MANY other ontologies
– Transparent Accountable Datamining Initiative (TAMI)

PopSciGrid Example

State View

• Extensible Mashups via Linked Data
– Diverse datasets from NIH
– Potentially linking to other content (e.g. “unemployment rate”)
• Accountable Mashups via Provenance
– Annotate datasets used in demos
– Feed back users’ comments to gov contact (e.g. %)
• Annotation capabilities coming (and more)

PopSciGrid II

Reflections

Successful, but …

• What if we could allow data experts to build their own demos?
• What if we could allow non-subject matter experts to function as subject-literate staff?
• What if team members could interchange roles (and thus make contributions in other areas)?
• What technological infrastructure is required?
• Claim: all of this is being done now, and it is starting to scale and growing more accessible

Updates and Motivations from a Computer Science Perspective

Old:
• Raw conversions
• Per-dataset vocabularies
• Custom queries
• Custom data management code
• Limited use because of Google Visualization licenses
• State-level data

New:
• Enhanced conversions
• Vocabulary reuse
• Generic queries
• Re-usable data management code
• Unlimited use of new open source visualization toolkit
• State and county-level data
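
To illustrate why vocabulary reuse makes generic queries possible, here is a small Python sketch under stated assumptions. The two datasets, the `example.org` vocabulary terms, the observation identifiers, and the `values_by_area` helper are all hypothetical, and the numbers are placeholders rather than real statistics. The point is only that when every converted dataset uses the same predicates, one query function serves them all.

```python
# Shared, reused vocabulary terms (hypothetical URIs for illustration).
VOCAB = "http://example.org/vocab/"
REF_AREA = VOCAB + "refArea"
VALUE = VOCAB + "value"

# Two hypothetical datasets converted with the same vocabulary.
# The numeric values are placeholders, not real statistics.
SMOKING_RATE = [
    ("obs:s1", REF_AREA, "NY"),
    ("obs:s1", VALUE, 1.0),
    ("obs:s2", REF_AREA, "WI"),
    ("obs:s2", VALUE, 2.0),
]
CIGARETTE_TAX = [
    ("obs:t1", REF_AREA, "NY"),
    ("obs:t1", VALUE, 3.0),
    ("obs:t2", REF_AREA, "WI"),
    ("obs:t2", VALUE, 4.0),
]


def values_by_area(triples):
    """Generic query: map each observation's area to its value.

    It works on any dataset that uses REF_AREA and VALUE, so no
    per-dataset query code is needed.
    """
    areas = {}
    values = {}
    for subject, predicate, obj in triples:
        if predicate == REF_AREA:
            areas[subject] = obj
        elif predicate == VALUE:
            values[subject] = obj
    return {areas[s]: values[s] for s in areas if s in values}


if __name__ == "__main__":
    print(values_by_area(SMOKING_RATE))
    print(values_by_area(CIGARETTE_TAX))
```

With per-dataset vocabularies, each dataset would need its own version of `values_by_area`; with a shared vocabulary, adding a third dataset requires no new query code.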

County average life expectancy (Summary Measures of Health)

Why Did I Show A Population Science Project and a Water Project?

Questions and goals are similar:
– What’s happening with x? (health of a country, water quality and other parts of an ecosystem, climate changes)
– What intervention strategies are being tested?
– What policies are correlated with factors under investigation?

And
– Why should people believe the outcome?

See Global Change Provenance Representation in the Global Change Information System (GCIS)

[email protected]

What’s happening with the climate and how will it affect the U.S.?

National Climate Assessment 2013
30 chapters, 240 authors
A “Highly Influential Scientific Assessment”

Why should I believe it?

GCIS presents the provenance of the report itself and of the key messages of the report, including traceable accounts of the >500 technical inputs from reports, papers, models, datasets, observations, etc.

SemantEco/SemantAqua

• Enable/empower citizens & scientists to explore pollution sites, facilities, regulations, and health impacts along with provenance.
• Demonstrates semantic monitoring possibilities.
• Map presentation of analysis
• Explanations and provenance available

http://was.tw.rpi.edu/swqp/map.html and http://aquarius.tw.rpi.edu/projects/semantaqua

1. Map view of analyzed results
2. Explanation of pollution
3. Possible health effect of contaminant (from EPA)
4. Filtering by facet to select type of data
5. Link for reporting problems
6. Now joint with USGS resource managers; expanded to endangered species; now more virtual observatory style

System Architecture

[Figure: system architecture diagram; surviving labels: access, Virtuoso]

Originally developed for VSTO, now in SSIII, SESDI, SESF, OOI …

The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. Proc. 19th Conf. on Innovative Applications of Artificial Intelligence (IAAI-07), http://www.vsto.org

Reflections

• What began as semantic water quality monitoring is now SemantEco: ecological and environmental monitoring in support of ecosystem analysis
• Now includes endangered species and related health impacts, working with USGS to prototype a resource manager dashboard
• Expanding to include citizen science reporting on water on mobile platforms
• Now working with SONet, Santa Barbara Coastal LTER, and CUAHSI to integrate other related scientific observations
– Current focus use case: ecological researcher
– Find relevant data (within and outside DataONE) by region, timeframe, chemical, measurement dimension, species
– Currently the background ontology is relatively simple and aims more at discovery and integration
• Semantic Sea Ice project aimed at helping arctic ice researchers find and evaluate data in support of understanding the state of ice in the arctic
• These technologies span the spectrum of supporting discovery, integration, analysis, and ultimately prediction

Discussion

• Semantic Technologies and Linked Data are powering a wide array of applications, many in Big Science, Team Science, or at least interdisciplinary science

• Tools and methodologies are ready for use

• We love to partner in these areas

• What do you need or want from linked data and semantic technologies?

Questions? - Deborah McGuinness

dlm @ cs . rpi . edu

Extra

RDF Data Cube Vocabulary

• For publishing multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts using RDF.
• Compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange).
• Also compatible with:
– SKOS, SCOVO, VoID, FOAF, Dublin Core Terms
• Integrated with the LOGD data conversion infrastructure
• Integrated with other tooling like Stats2RDF
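
The following Python sketch shows roughly what one statistical observation looks like when expressed with the Data Cube vocabulary. The `qb:` namespace and the `qb:Observation` and `qb:dataSet` terms come from the vocabulary itself; everything under `example.org` (the dataset, dimension, and measure URIs), the helper names, and the value are hypothetical placeholders. The serializer is a simplification that assumes every string is a URI and every non-string is a finite number; a real conversion would use an RDF library.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"
EX = "http://example.org/"  # hypothetical namespace for illustration


def observation_triples(obs_id, dataset, dimensions, measure, value):
    """Build the triples for one observation in a data cube.

    dimensions maps a dimension name to a path under EX, for example
    {"refArea": "county/example"}. All EX-based URIs are assumptions.
    """
    obs = EX + "obs/" + obs_id
    triples = [
        (obs, RDF_TYPE, QB + "Observation"),
        (obs, QB + "dataSet", EX + "dataset/" + dataset),
    ]
    for name, target in dimensions.items():
        triples.append((obs, EX + "dimension/" + name, EX + target))
    triples.append((obs, EX + "measure/" + measure, value))
    return triples


def to_ntriples(triples):
    """Serialize to N-Triples: strings become URIs, numbers xsd:double."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, str):
            obj = f"<{o}>"
        else:
            obj = f'"{float(o)!r}"^^<{XSD_DOUBLE}>'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # 1.5 is a placeholder value, not a real statistic.
    triples = observation_triples(
        "o1", "life-expectancy", {"refArea": "county/example"}, "value", 1.5
    )
    print(to_ntriples(triples))
```

Because the area is a URI rather than a string label, the observation can be linked to other datasets that describe the same county, which is the property the first bullet above describes.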

Directions

• Incorporation of TWC data Quality Facts label (Zednik et al)

• Use of DataFAQs automated data quality framework (Lebo et al)

• Additional provenance inclusion / usage (Inference / Provenance Web)

• Annotation / Collaboration facilities (Michaelis et al)

• Other data sets? Or exposition of other parameters?

• Partners in additional topic areas