DESCRIPTION
In the age of Big Data, filtering mechanisms have to be professionalized to make data more accessible. This presentation, held at the Knowledge Management Academy in Vienna, shows how technologies derived from the Semantic Web can help to establish more efficient ways of managing data and information.
Data Strategies: Metadata, Open Data & Linked Data
Andreas Blumauer, CEO, Semantic Web Company
www.semantic-web.at
www.poolparty.biz
About Semantic Web Company
Company was founded in 2001 in Vienna, Austria
>20 experts in linked data technologies
Product: PoolParty Suite (launched 2009)
Serving Global 500 companies & large NGOs
EU- & US-based consulting services
Some customers we serve
• Pearson
• GBPN
• World Bank
• Daimler
• Credit Suisse
• Roche
• Wolters Kluwer
• Council of EU
• Wood Mackenzie
• Ministry of Finance (AUT)
• Education Services (AUS)
• REEEP
Agenda
Intro
Data management – the current situation
Potential & Benefits of Linked Open Data (LOD) – what is metadata, open data, linked data, what is linked open data?
Use Cases
Global Buildings Performance Network (GBPN) & BPIE
World Bank Thesauri
EIP on Water: Marketplace
Renewable Energy & Energy Efficiency Partnership (REEEP)
Q & A
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. That is 3.4 billion HD movies.
Which problems can be solved on top of big data?
Common interests. Common topics.
Water Management
Common vocabulary? Common understanding?
Wastewater treatment
Water Management
Globalisation + Localisation = Glocalisation
Certification (Europe)
Rating (U.S.)
Common data? Questions in common?
Energy management policies
Search
Water Management
The Semantic Puzzle
Data Analytics 2.0 – The islands are now open for the experts
Data management in the environmental sector – The current situation
Example: Buildings performance
“2012 saw the launch of an impressive number of online portals sharing data and analysis on energy efficiency in buildings” (Ingeborg Nolte, Senior Communication Manager at BPIE)
However: how can the value of so many (open) data sets, which are actually isolated from each other, be leveraged?
Will Excel be the ultimate solution?
What’s wrong with Open Data?
Syntactic heterogeneity – different trees
Semantic heterogeneity – different tags and attributes (e.g. kindergarten, child_care, daycare)
<kindergarten>
  <name>Seven Dwarfs</name>
  <location>...</location>
  <description>...</description>
</kindergarten>

<child_care name="Seven Dwarfs">
  <address>
    <street>...</street>
    <zip>...</zip>
  </address>
  <text>...</text>
</child_care>

<daycare id="Seven Dwarfs"
         address="...">
  . . .
</daycare>
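To make the two kinds of heterogeneity concrete, here is a minimal Python sketch that maps the three record shapes onto one common record. It is only an illustration: the sample values (street, zip code, description), the `normalise` function and the target type name `childcare_facility` are assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Three made-up records mirroring the three shapes on the slide.
# All values (street, zip code, description) are placeholders.
SOURCES = [
    """<kindergarten>
         <name>Seven Dwarfs</name>
         <location>Example Street 1</location>
         <description>Placeholder description</description>
       </kindergarten>""",
    """<child_care name="Seven Dwarfs">
         <address>
           <street>Example Street 1</street>
           <zip>0000</zip>
         </address>
         <text>Placeholder description</text>
       </child_care>""",
    """<daycare id="Seven Dwarfs" address="Example Street 1"/>""",
]


def text_of(element, path):
    """Return the stripped text of a child element, or None if it is absent."""
    child = element.find(path)
    return child.text.strip() if child is not None and child.text else None


def normalise(xml_string):
    """Map one of the three source shapes onto a common record."""
    root = ET.fromstring(xml_string)
    if root.tag == "kindergarten":        # name and address as child elements
        name = text_of(root, "name")
        address = text_of(root, "location")
    elif root.tag == "child_care":        # name as attribute, nested address
        name = root.get("name")
        address = text_of(root, "address/street")
    elif root.tag == "daycare":           # everything as attributes
        name = root.get("id")
        address = root.get("address")
    else:
        raise ValueError(f"unknown record type: {root.tag}")
    # One shared type hides the kindergarten / child_care / daycare split.
    return {"type": "childcare_facility", "name": name, "address": address}


records = [normalise(source) for source in SOURCES]
for record in records:
    print(record)
```

Every new source shape needs another hand-written branch in `normalise`, which is the maintenance cost that shared vocabularies are meant to avoid.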
What is metadata, open data or linked data? What is linked open data?
Metadata
This is an important document about solar energy
solar energy
renewable energy
Data & Information
Metadata
Meta-metadata
Thesaurus / Ontology
[Diagram: a thesaurus drawn as a graph. Each concept is identified by a URI (http://voc.org.com/core/54, /176, /77, /44, /355, /97) and carries labels:
• Vienna (prefLabel, en), Wien (prefLabel, de), Vindobona (hiddenLabel)
• Places (prefLabel)
• Innere Stadt (prefLabel)
• Gastronomy (prefLabel)
• Café (prefLabel), Coffeehouse (altLabel)
• Café Central (prefLabel), Das Central (altLabel)
The concepts are connected by broader, narrower and related links.]
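The thesaurus diagram above can be approximated in a few lines of Python. This is a sketch under stated assumptions: the `http://example.org/concept/` URIs are hypothetical stand-ins for the URIs on the slide, and the broader links (Vienna under Places, Café under Gastronomy, Café Central under Café) are a guess at the intended hierarchy.

```python
from dataclasses import dataclass, field

BASE = "http://example.org/concept/"  # hypothetical namespace, not from the slide


@dataclass
class Concept:
    uri: str
    pref_labels: dict                                   # language code -> preferred label
    alt_labels: list = field(default_factory=list)      # synonyms shown to users
    hidden_labels: list = field(default_factory=list)   # searchable, but not displayed
    broader: list = field(default_factory=list)         # URIs of broader concepts

    def all_labels(self):
        return list(self.pref_labels.values()) + self.alt_labels + self.hidden_labels


# Assumed hierarchy; the labels themselves are taken from the diagram.
CONCEPTS = [
    Concept(BASE + "places", {"en": "Places"}),
    Concept(BASE + "vienna", {"en": "Vienna", "de": "Wien"},
            hidden_labels=["Vindobona"], broader=[BASE + "places"]),
    Concept(BASE + "gastronomy", {"en": "Gastronomy"}),
    Concept(BASE + "cafe", {"en": "Café"}, alt_labels=["Coffeehouse"],
            broader=[BASE + "gastronomy"]),
    Concept(BASE + "cafe-central", {"en": "Café Central"},
            alt_labels=["Das Central"], broader=[BASE + "cafe"]),
]

BY_URI = {concept.uri: concept for concept in CONCEPTS}
LABEL_INDEX = {label.casefold(): concept.uri
               for concept in CONCEPTS for label in concept.all_labels()}


def find(label):
    """Resolve any preferred, alternative or hidden label to a concept URI."""
    return LABEL_INDEX.get(label.casefold())


def ancestors(uri):
    """Follow broader links upwards and return the URIs in the order visited."""
    result = []
    queue = list(BY_URI[uri].broader)
    while queue:
        current = queue.pop(0)
        if current not in result:
            result.append(current)
            queue.extend(BY_URI[current].broader)
    return result


print(find("Wien"), find("Vindobona"), find("vienna"))
print(ancestors(find("Das Central")))
```

Because every label resolves to a URI, a search for "Wien", "Vienna" or "Vindobona" ends up at the same concept, and the broader links let an application widen a query from "Das Central" to cafés or gastronomy in general.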
Data Analytics 3.0 – Connected islands based on standards
What is linked data, what is linked open data?
The Free Universal Construction Kit connects Lego®, Duplo®, Fischertechnik®, Gears! Gears! Gears!®, K’Nex®, Krinkles®, Bristle Blocks®, Lincoln Logs®, Tinkertoys®, Zome®, ZomeTool® and Zoob® with a low-cost 3D-printed adapter set
CC by Golan Levin (US), Shawn Sims (US)
LOD as a giant knowledge base
Which renewable energy policies have helped to initiate projects and programmes in the agricultural sector that ultimately brought a substantial improvement to the nutritional situation in a given country?
Application example #1: Energy Market Intelligence
http://integrator.poolparty.biz/report_renewable/
Scenario #1:
I am an energy market researcher at the International Energy Agency (IEA).
I inform policy makers about the situation in specific renewable energy areas to develop targeted energy support programs.
For my research I need indicators of the utilisation levels of all alternative forms of energy, broken down by geographical and political categories.
How does it work?
• Articles about renewable energy: 72,018 documents from ~300 web sources
• Reegle Thesaurus (~3,000 concepts): traverse hierarchies below the main categories (wind, solar, etc.) and classify documents
• Geonames: annotate documents with their geographical entities
• DBpedia: look up several YAGO classes for all extracted geographical entities to assert additional categories, e.g. EU countries, French-speaking countries etc.
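The pipeline described above can be sketched in Python. Everything here is a toy stand-in: the mini-thesaurus, the country table and the function names are hypothetical, and plain substring matching replaces the real thesaurus, Geonames and DBpedia lookups.

```python
# Hypothetical mini-thesaurus: each concept maps to its narrower concepts.
NARROWER = {
    "solar": ["photovoltaic", "solar thermal"],
    "wind": ["onshore wind", "offshore wind"],
}
MAIN_CATEGORIES = ["solar", "wind"]

# Hypothetical stand-in for the Geonames / DBpedia step: country -> groups.
COUNTRY_GROUPS = {
    "France": ["EU countries", "French-speaking countries"],
    "Austria": ["EU countries"],
}


def descendants(concept):
    """Return the concept itself plus everything below it (depth-first)."""
    found = [concept]
    for child in NARROWER.get(concept, []):
        found.extend(descendants(child))
    return found


def classify(text):
    """Assign a main category if any term in its subtree occurs in the text.

    Naive substring matching: good enough for a sketch, too crude for real use.
    """
    lowered = text.lower()
    return sorted(category for category in MAIN_CATEGORIES
                  if any(term in lowered for term in descendants(category)))


def annotate(text):
    """Classify a document, then add countries and the groups they belong to."""
    countries = [country for country in COUNTRY_GROUPS if country in text]
    groups = sorted({group for country in countries
                     for group in COUNTRY_GROUPS[country]})
    return {"categories": classify(text), "countries": countries, "groups": groups}


print(annotate("France expands offshore wind capacity"))
```

The point of the hierarchy traversal is that a document mentioning only "offshore wind" is still filed under the main category "wind", and the country groups make it possible to ask questions such as "wind energy in French-speaking countries" without those words appearing in the text.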
How does it work?
Semantic Search
PoolParty Semantic Integrator
Geospatial Search
Data Visualisation
SPARQL
….
Application example #2: Health Care
Scenario #2:
I am an information officer at the Global Health Observatory of the World Health Organisation.
I inform policy makers about the global situation in specific disease areas to direct support to the required health support programs.
For my research I need data about disease prevalence in relation to socio-economic factors.
http://integrator.poolparty.biz/report_medicine/
How does it work?
• PubMed articles:
  Cardiovascular Diseases: 39,911 documents
  Neoplasms: 69,937 documents
  Nervous System Diseases: 48,128 documents
• MeSH (26,700 concepts / 346,600 triples): traverse hierarchies below the main disease categories and classify documents
• Geonames: annotate documents with their geographical entities
• DBpedia: look up the HDI (the Human Development Index is a composite statistic of life expectancy, education, and income indices used to rank countries into four tiers of human development)
How does it work?
Semantic Search
PoolParty Semantic Integrator
Geospatial Search
Data Visualisation
SPARQL
Data management in the environmental sector – The current situation
Example: Energy data
“It’s necessary to split the responsibility for different data sets between different data providers.” (Florian Bauer)
However: how can this ‘splitting’ be co-ordinated, and how can additional positive network effects be stimulated?
5 stars of data standards
• Publish Open Data in RDF, reusing vocabularies, so that it can be understood and combined by apps in unforeseen ways (e.g. visualization widgets)
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to denote things
★★★★★ link your data
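The five-star scheme is cumulative: each star presupposes the ones before it. A minimal Python sketch of that rule follows; the `Dataset` fields and the example file names are illustrative assumptions, not an official checklist.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are illustrative)."""
    name: str
    open_license_on_web: bool      # star 1: on the Web under an open license
    structured: bool               # star 2: structured data (e.g. Excel)
    non_proprietary_format: bool   # star 3: non-proprietary format (e.g. CSV)
    uses_uris: bool                # star 4: URIs denote things
    linked_to_other_data: bool     # star 5: links to other data


def star_rating(dataset):
    """Count the criteria met in order, stopping at the first one that fails."""
    criteria = [
        dataset.open_license_on_web,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_table = Dataset("scan.png", True, False, False, False, False)
spreadsheet = Dataset("table.xls", True, True, False, False, False)
csv_file = Dataset("table.csv", True, True, True, False, False)
linked_rdf = Dataset("data.ttl", True, True, True, True, True)

for dataset in (scanned_table, spreadsheet, csv_file, linked_rdf):
    print(dataset.name, "*" * star_rating(dataset))
```

Stopping at the first unmet criterion means that a well-linked RDF file without an open license still scores zero, which matches the emphasis the slides place on licensing.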
Licensing is key for open data

Kind of license      Num.   %
Not specified        132    39%
Public Domain        69     21%
Attribution          66     20%
Share alike          35     10%
Closed               16     5%
With restrictions    5      2%
Other                3      1%
Source: http://www.licensius.com/blog/lodlicenses
Use Cases
Global Buildings Performance Network (GBPN)
The Global Buildings Performance Network (GBPN) is a globally organised and regionally focused network whose mission is to advance best practice policies that can significantly reduce energy consumption and associated CO2 emissions from buildings.
Goals
Launch of the GBPN global Knowledge Platform for the Energy Performance of Buildings (www.gbpn.org)
Share Knowledge
Build awareness & showcase best practice
Stimulate collective research
Stimulate collective analysis from experts worldwide
Promote better decision-making
Help the building sector effectively reduce its impact on climate change
Linked Open Data successfully serves these objectives!
Technical Solution
Drupal CMS DB
publish
enrich
Integrated view (& search index)
GBPN Knowledge Platform
annotation & mapping
The GBPN Knowledge Platform
LOD-based GBPN Terminology: http://bit.ly/YSbD9S
GBPN News Aggregator Tool: http://bit.ly/13JLJqk
GBPN Policy Comparative Tool: http://bit.ly/X9Vihm
The GBPN Knowledge Platform is a Linked Open Data project that aims to open and connect with the best resources, data and information on buildings energy performance policies worldwide.
Report Database: http://www.gbpn.org/reports
The Laboratory: http://www.gbpn.org/laboratory
GBPN web blog: http://bit.ly/X9VSeW
Live-Demo
The World Bank Taxonomies & Thesauri
http://vocabulary.worldbank.org/
Document tagged: photovoltaic
Same document tagged: PV
Understanding synonyms & relations?
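The PV / photovoltaic case can be illustrated with a small Python sketch that normalises tags before indexing. The synonym table, the document ids and the function names are hypothetical; a real system would take the synonyms from a thesaurus rather than a hard-coded dictionary.

```python
# Hypothetical synonym table: preferred label -> alternative spellings.
SYNONYMS = {
    "photovoltaic": ["PV", "photovoltaics"],
}

# Reverse lookup: any known spelling (case-insensitive) -> preferred label.
PREFERRED = {}
for preferred, alternatives in SYNONYMS.items():
    for spelling in [preferred, *alternatives]:
        PREFERRED[spelling.casefold()] = preferred


def normalise_tag(tag):
    """Return the preferred label for a tag, or the tag itself if it is unknown."""
    return PREFERRED.get(tag.casefold(), tag)


def build_index(documents):
    """Map each preferred label to the sorted ids of the documents carrying it."""
    index = {}
    for doc_id, tags in documents.items():
        for tag in tags:
            index.setdefault(normalise_tag(tag), set()).add(doc_id)
    return {label: sorted(ids) for label, ids in index.items()}


DOCUMENTS = {  # made-up document ids
    "report-a": ["PV"],
    "report-b": ["photovoltaic"],
}
INDEX = build_index(DOCUMENTS)
print(INDEX[normalise_tag("pv")])
```

With normalisation applied at both indexing and query time, a search for either spelling returns both documents.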
Standardisation and consistency are key
Based on our experience in establishing knowledge broker portals we know:
There is a strong need to increase consistency when tagging climate and energy resources
We need to ensure the consistency of the message delivered to the public, to avoid the confusion caused by using terms in different ways
This requires standardisation of the categories and tags used
reegle thesaurus
reegle tagging API
Have a look at http://api.reegle.info – using the API is free!
blog.okfn.org/2013/04/08/sustainable-energy-policy-demands-sustainable-open-data/
Impact
[Chart: reegle.info users per year, 2008 to 2012 (not including datasets re-used on other sites); vertical axis from 0 to 3,000,000]
Contact
Andreas Blumauer, CEO, Semantic Web Company
+43 1 [email protected]
www.semantic-web.at
www.poolparty.biz
Partner network