This talk introduces Linked Data and the Semantic Web through two examples: the Population Sciences Grid and SemantAqua, a semantically enabled environmental monitoring system. It shows a few tools and the semantic methodology, and opens discussion of LOD and team science.
Linked Open Data as an Enabler for Team Science
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer and Cognitive Science
Rensselaer Polytechnic Institute, Troy, NY
& CEO McGuinness Associates, Latham, NY
Science of Team Science; LOD and Team Science April 19, 2012
Background
– Semantic Technologies – technological support for
encoding meaning in a form computers can
understand and manipulate – are maturing and
increasing in usage
– Computational encodings of meaning can be used
to help integrate, link, validate, filter, … essentially,
to make smarter, more context-aware applications
– Semantic Technologies enable linking data … and
linked data provides a way of connecting and
traversing information, nodes, graphs, webs, …
Linked Data
• Linked Data is quite simple and follows principles set
out by Berners-Lee in
http://www.w3.org/DesignIssues/LinkedData.html
– Use URIs as names for things
– Use HTTP URIs so that people can look up those names.
– When someone looks up a URI, provide useful information,
using the standards (RDF*, SPARQL)
– Include links to other URIs, so that they can discover more
things.
– Introduction by examples and then discussion
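The four principles can be sketched with a toy in-memory "web". This is only an illustration, not how any TWC tool is implemented: every URI under example.org, the `look_up` stand-in for an HTTP GET, and the `partOf` property are assumptions made for the example.

```python
from collections import deque

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Principles 1 and 2: things are named with HTTP URIs.
# Principle 3: looking up a URI yields useful information (here, triples).
# All example.org URIs below are hypothetical.
WEB = {
    "http://example.org/id/hawaii": [
        ("http://example.org/id/hawaii", RDFS_LABEL, "Hawaii"),
        ("http://example.org/id/hawaii",
         "http://example.org/vocab/partOf",
         "http://example.org/id/usa"),
    ],
    "http://example.org/id/usa": [
        ("http://example.org/id/usa", RDFS_LABEL, "United States"),
    ],
}


def look_up(uri):
    """Stand-in for an HTTP GET that returns the triples describing `uri`."""
    return WEB.get(uri, [])


def discover(start_uri):
    """Principle 4: follow links to other URIs, breadth first."""
    seen, queue = [], deque([start_uri])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.append(uri)
        for _subject, _predicate, obj in look_up(uri):
            # Only objects that are themselves known URIs can be followed.
            if obj in WEB:
                queue.append(obj)
    return seen


visited = discover("http://example.org/id/hawaii")
```

Starting from one URI, `discover` reaches every resource linked from it, which is the "discover more things" behavior the fourth principle describes.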
Population Sciences Grid Goals
• Convey complex health-related information to
consumer and public health decision makers
for community health impact
• Inform the development of future research
opportunities effectively utilizing
cyberinfrastructure for cancer prevention and
control
McGuinness, D., Shaikh, A., Lebo, T., Ding, L., Courtney, P., McCusker, J., Moser,. Morgan, G.D., Tatalovich, Z., Willis, G., Contractor, N., and Hesse, B.
2012. Towards Semantically-Enabled Next Generation Community Health Information Portals: The PopSciGrid Pilot. In Proceedings of the Hawaii
International Conference on System Sciences 2012.
Semantic Web Perspective on
Initial PopSciGrid Goals
• How can semantic technologies be used to integrate, present,
and analyze data for a wide range of users?
• Can tools allow lay people to build their own demos and
support public usage and accurate interpretation?
• How do we facilitate collaboration and “viral” applications?
• Within PopSciGrid:
– Which policies (taxation, smoking bans, etc.) impact health and health
care costs?
– What data should be displayed to help scientists and lay people
evaluate related questions?
– What data might be presented so that people choose to make (positive)
behavior changes?
– What does the data show? Why should someone believe that?
– What are appropriate follow up questions to support actionability?
Foundations: The Tetherless World
Constellation Linked Open Government
Data Portal
[Diagram: TWC LOGD workflow]
• Stages: Create, Convert, Enhance, Query/Access
• Access through the LOGD SPARQL Endpoint
• Output formats: RDF, RSS, JSON, XML, HTML, CSV, …
• Community Portal
• Data.gov deployment
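Access through a SPARQL endpoint can be sketched as follows. The SPARQL Protocol lets a client send a query as the `query` parameter of an HTTP GET; the endpoint URL and the query text here are placeholders, not the actual LOGD endpoint or a query from the talk, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# A generic query: list ten arbitrary triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
""".strip()


def build_query_url(endpoint, query):
    """Encode a query for the SPARQL Protocol's query-via-GET form."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)

# Decoding the URL recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

A client would then fetch `url` and ask for a result format (for example JSON or XML) through the HTTP Accept header.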
What is an Ontology?
[Figure: ontology spectrum, from least to most formal]
• Catalog/ID
• Terms/glossary
• Thesauri, “narrower term” relation
• Informal is-a
• Formal is-a
• Formal instance
• Frames (properties)
• Value Restrs.
• Disjointness, Inverse, part-of…
• General Logical constraints
Ontologies Come of Age McGuinness, 2001, and From AAAI Panel 99 – McGuinness, Welty, Uschold, Gruninger, Lehmann
Plus basis of Ontologies Come of Age – McGuinness, 2003
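One practical consequence of moving from informal to formal is-a is that subclass links can be chained mechanically. A minimal sketch, assuming a hypothetical hierarchy stored as a plain dictionary (the class names are invented for illustration, and a real system would use an OWL or RDFS reasoner instead):

```python
# Hypothetical formal is-a links: each class maps to its direct superclasses.
IS_A = {
    "Merlot": ["RedWine"],
    "RedWine": ["Wine"],
    "Wine": ["Beverage"],
}


def superclasses(cls, is_a):
    """Return every class reachable from `cls` by following is-a links."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


inferred = superclasses("Merlot", IS_A)
```

Because the relation is treated as transitive, anything stated about `Beverage` can be applied to `Merlot` without asserting that link by hand.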
Inference Web: Making Data Transparent and
Actionable Using Semantic Technologies
• How and when does it make sense to use smart system results & how do we
interact with them?
[Figure: application areas shown on the slide]
• Knowledge Provenance in Virtual Observatories
• Hypothesis Investigation / Policy Advisors
• (Mobile) Intelligent Agents
• Intelligence Analyst Tools
• NSF Interops: SONET, SSIII – Sea Ice
Foundations: Web Layer Cake
[Diagram: Semantic Web layer cake annotated with related work]
• SPARQL to XQuery translator; RDFS materialization (Billion triple winner)
• Govt metadata search; Linked Open Govt Data
• SPARQL WG; earlier QL – OWL-QL, Classic’ QL, …
• OWL 1 & 2 WG: edited main OWL docs, quick reference, OWL profiles (OWL RL); earlier languages: DAML, DAML+OIL, Classic
• RIF WG; AIR accountability tool
• DL, KIF, CL, N3Logic
• Inference Web, Proof Markup Language, W3C Provenance Working Group formal model, W3C incubator group, …
• Inference Web IW Trust, Air + Trust
• Visualization APIs; S2S; Govt Data
• Ontology repositories (Ontolingua); Ontology Evolution env: Chimaera; Semantic eScience Ontologies; MANY other ontologies
• Transparent Accountable Datamining Initiative (TAMI)
PopSciGrid Workflow
[Workflow diagram]
• Steps: CSV2RDF4LOD Direct, SemDiff, Archive, CSV2RDF4LOD Enhance, derive, visualize, Publish
• Datasets shown: Ban coverage, CHSI 2009
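The first step of such a workflow, a raw conversion from CSV to RDF, can be sketched as below. This is not the CSV2RDF4LOD tool or its output format: the base URI, the row and column naming scheme, and the sample values are all assumptions chosen for illustration, and the input is assumed to be well-formed CSV with a header row.

```python
import csv
import io

# Hypothetical base URI for the converted dataset.
BASE = "http://example.org/dataset/demo"


def csv_to_ntriples(text, base=BASE):
    """Turn each CSV row into a subject and each cell into one N-Triples line."""
    lines = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}/row/{row_number}>"
        for column, value in row.items():
            # Derive a predicate URI from the column header.
            local_name = column.strip().lower().replace(" ", "_")
            predicate = f"<{base}/vocab/{local_name}>"
            # Escape backslashes and quotes so the literal stays valid.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


# Placeholder data, not taken from any PopSciGrid dataset.
SAMPLE = "state,value\nA,1\nB,2\n"
lines = csv_to_ntriples(SAMPLE)
```

Every cell becomes an untyped string literal here; an enhancement step would then add datatypes, reuse shared vocabularies, and link values to other datasets.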
PopSciGrid Example
State: Hawaii
Extensible Mashups via Linked Data
Diverse datasets from NIH
Potentially linking to other content (e.g.
“unemployment rate”)
Accountable Mashups via Provenance
Annotate datasets used in demos
Feed back users’ comments to gov contact (e.g. %)
Annotation capabilities coming (and more)
PopSciGrid II
Reflections
Successful, but…
• What if we could allow data experts to build
their own demos?
• What if we could allow non-subject matter
experts to function as subject-literate staff?
• What if team members could interchange roles
(and thus make contributions in other areas)?
• What technological infrastructure is required?
• Claim: all of this is being done now – but not at
scale
Updates and Motivations from a
Computer Science Perspective
Old:
• Raw conversions
• Per-dataset vocabularies
• Custom queries
• Custom data
management code
• Limited use because of
Google Visualization
licenses
• State-level data
New:
• Enhanced conversions
• Vocabulary reuse
• Generic queries
• Re-usable data
management code
• Unlimited use of new
open source visualization
toolkit
• State and county-level
data
RDF Data Cube
Vocabulary
• For publishing multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts using RDF.
• Compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange).
• Also compatible with: SKOS, SCOVO, VoiD,
FOAF, Dublin Core Terms
• Integrated with the LOGD
data conversion
infrastructure
• Integrated with other tooling
like Stats2RDF
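A single statistical data point in the Data Cube model is a `qb:Observation` attached to a `qb:DataSet`, with dimension properties locating it and a measure property carrying the value. The sketch below builds those triples as plain tuples. The `qb:` namespace is the vocabulary's own; the example.org namespace, the dimension and measure names, and the numeric value are placeholders, not PopSciGrid data.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace for the dataset's own terms.
EX = "http://example.org/popsci/"


def observation(obs_id, dataset, dimensions, measure, value):
    """Return the triples describing one qb:Observation."""
    subject = EX + "obs/" + obs_id
    triples = [
        (subject, RDF_TYPE, QB + "Observation"),
        (subject, QB + "dataSet", EX + dataset),
    ]
    # Dimensions say where the observation sits in the cube.
    for prop, val in dimensions.items():
        triples.append((subject, EX + prop, val))
    # The measure carries the observed value.
    triples.append((subject, EX + measure, value))
    return triples


# Placeholder dimensions and value, for illustration only.
triples = observation(
    "o1",
    "demo",
    {"refArea": EX + "area/a", "refPeriod": "2009"},
    "measureValue",
    1.0,
)
```

Because the area is a URI rather than a string, the same observation can be linked to any other dataset that describes that area.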
[Figure: County average life expectancy (Summary Measures of Health)]
SemantEco/SemantAqua
• Enable/Empower citizens &
scientists to explore pollution
sites, facilities, regulations, and
health impacts along with
provenance.
• Demonstrates semantic
monitoring possibilities.
• Map presentation of analysis
• Explanations and Provenance
available
http://was.tw.rpi.edu/swqp/map.html and
http://aquarius.tw.rpi.edu/projects/semantaqua
[Screenshot with numbered callouts 1–5]
1. Map view of analyzed results
2. Explanation of pollution
3. Possible health effect of contaminant (from EPA)
4. Filtering by facet to select type of data
5. Link for reporting problems
6. Now joint with USGS resource managers; expanded to
endangered species; now more virtual-observatory style
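The pairing of an analysis result with its explanation and provenance can be sketched in plain Python. This is a stand-in for illustration, not the semantic reasoning the portal itself uses: the `Measurement` fields, the contaminant name, the limit, and the source label are all hypothetical, and no real regulatory threshold is implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    site: str
    contaminant: str
    value: float
    unit: str
    source: str  # provenance: where the reading came from


# Hypothetical limit; the talk does not give actual regulatory values.
LIMITS = {"contaminant_x": (10.0, "mg/L")}


def explain_violation(m: Measurement, limits=LIMITS) -> Optional[dict]:
    """Return an explanation if the reading exceeds its limit, else None."""
    entry = limits.get(m.contaminant)
    if entry is None:
        return None  # no regulation known for this contaminant
    limit, unit = entry
    # No unit conversion is attempted; mismatched units are not flagged.
    if m.unit != unit or m.value <= limit:
        return None
    return {
        "site": m.site,
        "explanation": (
            f"{m.contaminant} measured at {m.value} {m.unit} "
            f"exceeds the limit of {limit} {unit}"
        ),
        "provenance": m.source,
    }


reading = Measurement("site-1", "contaminant_x", 12.5, "mg/L", "hypothetical-sensor-feed")
result = explain_violation(reading)
```

Keeping the source alongside each flagged reading is what lets a map marker answer both "why is this site polluted?" and "who says so?".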
System Architecture
[Architecture diagram; visible labels: access, Virtuoso]
Originally developed for VSTO, now in SSIII, SESDI, SESF, OOI …
The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research.
Proc. 19th Conf. on Innovative Applications of Artificial Intelligence (IAAI-07),
http://www.vsto.org
Discussion
• Semantic Technologies and Linked Data are powering a wide array of applications, many in Big Science, Team Science, or at least interdisciplinary science
• Labeled graphs as powered by structured data can be a nice corpus for evaluation
• Tools and methodologies are ready for use
• We love to partner in these areas
• What do you need or want from linked data?
Questions? - dlm @ cs . rpi . edu
Extra
Directions
• Incorporation of TWC data Quality Facts label (Zednik et al)
• Use of DataFAQs automated data quality framework (Lebo et al)
• Additional provenance inclusion / usage (Inference / Provenance Web)
• Annotation / Collaboration facilities (Michaelis et al)
• Other data sets? Or exposition of other parameters?
• Partners in additional topic areas
Enabling Subject Area Exploration
and Hypothesis Generation
• What factors influence prevalence (and under what conditions)?
• Within smoking, should we focus on prevalence, packs sold,
quit rate, hospital admission diagnosis, other?
• What is prevalence (definition)? And how is it measured (overall
/ in this data set)?
• What are the conditions under which the data was obtained
(date, sample set, extenuating conditions, …)?
• What other data might we include? And how might we show
that data?
• What should be represented? And how should it be
manipulated?
• What tools and services do people benefit from to explore?
Encode? Act?
Semantically-enabled advisors
utilize:
• Ontologies
• Reasoning
• Social
• Mobile
• Provenance
• Context
Patton & McGuinness et al.
tw.rpi.edu/web/project/Wineagent
Semantic
Sommelier
Previous versions used ontologies
to infer descriptions of wines for
meals and query for wines
New version uses
Context: GPS location, local
restaurants and wine lists, user
preferences
Social input: Twitter, Facebook, Wiki,
mobile, …
Source variability in quality,
contradictions exist,
Maintenance is an issue… however
new models emerging
The Semantic Web
enables…
• New models of intelligent services
• E-commerce solutions
• M-commerce
• Web assistants
• …
• Semantic Technologies: ready for use
• Tools & tutorials available; deep apps and
future planning may benefit from
consultants
• Context-aware, semantic
apps are the future
New forms of web assistants/agents that act on a
human’s behalf requiring less from humans
and their communication devices…
More info: dlm @ cs.rpi.edu