Data Strategies Metadata, Open Data & Linked Data Andreas Blumauer CEO, Semantic Web Company www.semantic-web.at www.poolparty.biz

Data Strategies: Metadata, Open Data, Linked Data


DESCRIPTION

In the age of Big Data, filtering mechanisms have to be professionalized to make data more accessible. This presentation, held at the Knowledge Management Academy in Vienna, shows how technologies derived from the Semantic Web can help to establish more efficient means of managing data and information.


Page 1: Data Strategies: Metadata, Open Data, Linked Data

Data Strategies: Metadata, Open Data & Linked Data

Andreas Blumauer, CEO, Semantic Web Company

www.semantic-web.at, www.poolparty.biz

Page 2: Data Strategies: Metadata, Open Data, Linked Data

About Semantic Web Company

Founded in 2001 in Vienna, Austria

>20 experts in linked data technologies

Product: PoolParty Suite (launched 2009)

Serving Global 500 companies & large NGOs

EU- & US-based consulting services

Page 3: Data Strategies: Metadata, Open Data, Linked Data

Some customers we serve

• Pearson
• GBPN
• World Bank
• Daimler
• Credit Suisse
• Roche
• Wolters Kluwer
• Council of EU
• Wood Mackenzie
• Ministry of Finance (AUT)
• Education Services (AUS)
• REEEP

Page 4: Data Strategies: Metadata, Open Data, Linked Data

Agenda

Intro

Data management – the current situation

Potential & Benefits of Linked Open Data (LOD) – what is metadata, open data, linked data, what is linked open data?

Use Cases

Global Buildings Performance Network (GBPN) & BPIE

World Bank Thesauri

EIP on Water: Marketplace

Renewable Energy & Energy Efficiency Partnership (REEEP)

Q & A

Page 5: Data Strategies: Metadata, Open Data, Linked Data

Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. That is the equivalent of 3.4 billion HD movies.

Which problems can be solved on top of big data?

Page 6: Data Strategies: Metadata, Open Data, Linked Data
Page 7: Data Strategies: Metadata, Open Data, Linked Data

Common interests. Common topics.

Water Management

Page 8: Data Strategies: Metadata, Open Data, Linked Data

Common vocabulary? Common understanding?

Wastewater treatment

Wastewater treatment

Water Management

Page 9: Data Strategies: Metadata, Open Data, Linked Data

Globalisation + Localisation = Glocalisation

Certification (Europe)

Rating (U.S.)

Page 10: Data Strategies: Metadata, Open Data, Linked Data

Common data? Questions in common?

Energy management policies

Search

Water Management

Page 11: Data Strategies: Metadata, Open Data, Linked Data

The Semantic Puzzle

Page 12: Data Strategies: Metadata, Open Data, Linked Data

Data Analytics 2.0 – The islands are now open for the experts

Page 13: Data Strategies: Metadata, Open Data, Linked Data

Data management in the environmental sector – The current situation

Example: Buildings performance

“2012 saw the launch of an impressive number of online portals sharing data and analysis on energy efficiency in buildings” (Ingeborg Nolte, Senior Communication Manager at BPIE)

However: how can the value of so many (open) data sets be leveraged when they are actually isolated from each other?

Will Excel be the ultimate solution?

Page 14: Data Strategies: Metadata, Open Data, Linked Data

What’s wrong with Open Data?

Syntactic heterogeneity – different trees

Semantic heterogeneity – different tags and attributes (e.g. kindergarten, child_care, daycare)

<kindergarten>
  <name>Seven Dwarfs</name>
  <location>...</location>
  <description>...</description>
</kindergarten>

<child_care name="Seven Dwarfs">
  <address>
    <street>...</street>
    <zip>...</zip>
  </address>
  <text>...</text>
</child_care>

<daycare id="Seven Dwarfs" address="...">
  . . .
</daycare>
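To make the two kinds of heterogeneity concrete, here is a minimal Python sketch that maps the three record shapes on this slide onto one common record. It is an illustration only: the address value "Example Street 1", the zip code, the shared type name `childcare_facility` and the helper `normalize` are assumptions made for the example and do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Three records describing the same facility, shaped like the snippets on the slide.
# The address and zip values are placeholders invented for this sketch.
SOURCES = [
    "<kindergarten><name>Seven Dwarfs</name>"
    "<location>Example Street 1</location></kindergarten>",
    '<child_care name="Seven Dwarfs"><address><street>Example Street 1</street>'
    "<zip>0000</zip></address></child_care>",
    '<daycare id="Seven Dwarfs" address="Example Street 1"></daycare>',
]

# Semantic heterogeneity: three tag names, one shared concept (name is hypothetical).
TAG_TO_CONCEPT = {
    "kindergarten": "childcare_facility",
    "child_care": "childcare_facility",
    "daycare": "childcare_facility",
}


def normalize(xml_text):
    """Map one source record onto a common {type, name, address} record."""
    root = ET.fromstring(xml_text)
    # Syntactic heterogeneity: the same value sits in a child element,
    # a "name" attribute or an "id" attribute, depending on the source.
    name = root.findtext("name") or root.get("name") or root.get("id")
    address = (
        root.findtext("location")
        or root.findtext("address/street")
        or root.get("address")
    )
    return {"type": TAG_TO_CONCEPT[root.tag], "name": name, "address": address}


records = [normalize(source) for source in SOURCES]
for record in records:
    print(record)
```

Each source needs its own hand-written mapping rule here, which is exactly the effort that shared vocabularies and identifiers are meant to reduce.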

Page 15: Data Strategies: Metadata, Open Data, Linked Data
Page 16: Data Strategies: Metadata, Open Data, Linked Data

What is metadata, open data or linked data? What is linked open data?

Page 17: Data Strategies: Metadata, Open Data, Linked Data

Metadata

This is an important document about solar energy

solar energy

renewable energy

Data & Information

Metadata

Meta-metadata

Page 18: Data Strategies: Metadata, Open Data, Linked Data

Thesaurus / Ontology

[Figure: thesaurus graph. Concepts are identified by URIs and connected by broader, narrower and related links.

URIs shown: http://voc.org.com/core/54, http://voc.org.com/core/176, http://voc.org.com/core/77, http://voc.org.com/core/44, http://voc.org.com/core/355, http://voc.org.com/core/97

Labels shown: Wien (prefLabel, de), Vienna (prefLabel, en), Vindobona (hiddenLabel), Places (prefLabel), Innere Stadt (prefLabel), Gastronomy (prefLabel), Café (prefLabel), Coffeehouse (altLabel), Café Central (prefLabel), Das Central (altLabel)]
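The thesaurus on this slide can be sketched as a small data structure in Python. The label types (prefLabel, altLabel, hiddenLabel) and the broader links follow the slide; which URI belongs to which concept, and which concept sits below which, cannot be recovered from the diagram, so the pairing and the hierarchy below are assumptions. `Concept`, `find_concept` and `broader_chain` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: dict                                   # language code -> preferred label
    alt_labels: list = field(default_factory=list)     # synonyms shown to users
    hidden_labels: list = field(default_factory=list)  # matched in search, not displayed
    broader: list = field(default_factory=list)        # URIs of broader concepts


BASE = "http://voc.org.com/core/"

# Assumption: the URI-to-label pairing and the hierarchy are guessed for illustration.
THESAURUS = {
    c.uri: c
    for c in [
        Concept(BASE + "54", {"en": "Places"}),
        Concept(BASE + "176", {"de": "Wien", "en": "Vienna"},
                hidden_labels=["Vindobona"], broader=[BASE + "54"]),
        Concept(BASE + "77", {"de": "Innere Stadt"}, broader=[BASE + "176"]),
        Concept(BASE + "44", {"en": "Gastronomy"}),
        Concept(BASE + "355", {"en": "Café"},
                alt_labels=["Coffeehouse"], broader=[BASE + "44"]),
        Concept(BASE + "97", {"en": "Café Central"},
                alt_labels=["Das Central"], broader=[BASE + "355"]),
    ]
}


def find_concept(label):
    """Resolve any preferred, alternative or hidden label to its concept."""
    needle = label.casefold()
    for concept in THESAURUS.values():
        labels = (list(concept.pref_label.values())
                  + concept.alt_labels + concept.hidden_labels)
        if any(needle == candidate.casefold() for candidate in labels):
            return concept
    return None


def broader_chain(uri):
    """Follow the first broader link upwards and collect preferred labels."""
    chain = []
    current = THESAURUS[uri]
    while current.broader:
        current = THESAURUS[current.broader[0]]
        chain.append(next(iter(current.pref_label.values())))
    return chain


print(find_concept("Vindobona").uri)
print(broader_chain(BASE + "97"))
```

The point of the structure is that a document tagged with the URI stays findable under any of its labels, in any language, and can be grouped under its broader concepts.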

Page 19: Data Strategies: Metadata, Open Data, Linked Data

Data Analytics 3.0 – Connected islands based on standards

Page 20: Data Strategies: Metadata, Open Data, Linked Data

What is linked data, what is linked open data?

The Free Universal Construction Kit connects Lego®, Duplo®, Fischertechnik®, Gears! Gears! Gears!®, K’Nex®, Krinkles®, Bristle Blocks®, Lincoln Logs®, Tinkertoys®, Zome®, ZomeTool® and Zoob® with a low-cost 3D-printed adapter set

CC by Golan Levin (US), Shawn Sims (US)

Page 21: Data Strategies: Metadata, Open Data, Linked Data

LOD as a giant knowledge base

Which policies in the area of renewable energy have helped to initiate projects and programmes in the agricultural sector that ultimately brought a substantial improvement in the nutritional situation of a given country?

Page 22: Data Strategies: Metadata, Open Data, Linked Data

Application example #1: Energy Market Intelligence

http://integrator.poolparty.biz/report_renewable/

Scenario #1:

I am an energy market researcher at the International Energy Agency (IEA).

I inform policy makers about the situation in specific renewable energy areas to develop targeted energy support programs.

For my research I need indicators of the utilisation levels of all alternative forms of energy, broken down by geographical and political categories.

Page 23: Data Strategies: Metadata, Open Data, Linked Data

How does it work?

Articles about Renewable Energy: 72,018 documents from ~300 web sources

Reegle Thesaurus: ~3,000 concepts. Traverse the hierarchies below the main categories (wind, solar, etc.) and classify documents

Geonames: Annotate documents with their geographical entities

DBpedia: Look up several YAGO classes for all extracted geographical entities to assert additional categories, e.g. EU countries, French-speaking countries, etc.
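The "traverse hierarchies and classify" step can be sketched as follows. This is a simplified illustration, not the actual PoolParty pipeline: the mini-thesaurus is invented, and plain substring matching stands in for real entity extraction.

```python
# Hypothetical mini-thesaurus: concept -> narrower concepts (not the reegle data).
NARROWER = {
    "renewable energy": ["solar energy", "wind energy"],
    "solar energy": ["photovoltaics", "solar thermal"],
    "wind energy": ["offshore wind", "onshore wind"],
}

MAIN_CATEGORIES = ["solar energy", "wind energy"]


def descendants(concept):
    """Return the concept plus everything below it in the hierarchy."""
    found = [concept]
    for child in NARROWER.get(concept, []):
        found.extend(descendants(child))
    return found


def classify(text):
    """Assign every main category whose own or narrower terms occur in the text."""
    lowered = text.lower()
    return [
        category
        for category in MAIN_CATEGORIES
        if any(term in lowered for term in descendants(category))
    ]


print(classify("New offshore wind farm and photovoltaics plant announced"))
print(classify("Football results"))
```

A document that only mentions a narrow term such as "offshore wind" still ends up under the main category "wind energy", because the classifier walks the hierarchy instead of matching the category name alone.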

Page 24: Data Strategies: Metadata, Open Data, Linked Data

How does it work?

Semantic Search

PoolParty Semantic Integrator

Geospatial Search

Data Visualisation

SPARQL

….

Page 25: Data Strategies: Metadata, Open Data, Linked Data

Application example #2: Health Care

Scenario #2:

I am an information officer at the Global Health Observatory of the World Health Organisation.

I inform policy makers about the global situation in specific disease areas to direct support to the required health support programs.

For my research I need data about disease prevalence in relation to socio-economic factors.

http://integrator.poolparty.biz/report_medicine/

Page 26: Data Strategies: Metadata, Open Data, Linked Data

How does it work?

PubMed articles:
• Cardiovascular Diseases: 39,911 documents
• Neoplasms: 69,937 documents
• Nervous System Diseases: 48,128 documents

MeSH: 26,700 concepts / 346,600 triples. Traverse the hierarchies below the main disease categories and classify documents

Geonames: Annotate documents with their geographical entities

DBpedia: Look up the HDI (the Human Development Index is a composite statistic of life expectancy, education, and income indices used to rank countries into four tiers of human development)
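Once documents carry a disease category and a country, combining them with the HDI lookup is a simple join. The sketch below assumes made-up annotations and a made-up tier table; the country names, document ids and the function `documents_per_tier` are placeholders, not data from the presentation or from DBpedia.

```python
from collections import Counter

# Hypothetical annotations: (document id, disease category, country).
ANNOTATED_DOCS = [
    ("doc1", "Neoplasms", "Country A"),
    ("doc2", "Neoplasms", "Country B"),
    ("doc3", "Cardiovascular Diseases", "Country A"),
    ("doc4", "Nervous System Diseases", "Country C"),
    ("doc5", "Neoplasms", "Country D"),
]

# Hypothetical stand-in for HDI tiers that would be looked up in DBpedia.
HDI_TIER = {"Country A": "very high", "Country B": "medium", "Country C": "low"}


def documents_per_tier(docs, tiers):
    """Count documents per (disease category, HDI tier) pair."""
    counts = Counter()
    for _doc_id, disease, country in docs:
        # Countries missing from the lookup are kept visible instead of dropped.
        counts[(disease, tiers.get(country, "unknown"))] += 1
    return counts


for (disease, tier), count in sorted(documents_per_tier(ANNOTATED_DOCS, HDI_TIER).items()):
    print(f"{disease} / {tier}: {count}")
```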

Page 27: Data Strategies: Metadata, Open Data, Linked Data

How does it work?

Semantic Search

PoolParty Semantic Integrator

Geospatial Search

Data Visualisation

SPARQL

Page 28: Data Strategies: Metadata, Open Data, Linked Data

Data management in the environmental sector – The current situation

Example: Energy data

“It’s necessary to split the responsibility for different data sets between different data providers.” (Florian Bauer)

However: how can this ‘splitting’ be co-ordinated, and how can additional positive network effects be stimulated?

Page 29: Data Strategies: Metadata, Open Data, Linked Data

5 stars of data standards

• Publish Open Data in RDF, reusing vocabularies which can be understood and combined by apps in unforeseen ways (e.g. visualization widgets)

• 1 star: make your stuff available on the Web (whatever format) under an open license
• 2 stars: make it available as structured data (e.g., Excel instead of image scan of a table)
• 3 stars: use non-proprietary formats (e.g., CSV instead of Excel)
• 4 stars: use URIs to denote things
• 5 stars: link your data
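The five criteria on this slide build on each other, so a dataset's rating can be read as the number of consecutive criteria it meets, starting from the first. A minimal sketch under that assumption follows; the `Dataset` fields and `star_rating` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    on_web_open_license: bool      # available on the Web under an open license
    structured: bool               # structured data rather than e.g. a scanned table
    non_proprietary_format: bool   # e.g. CSV instead of Excel
    uses_uris: bool                # URIs denote things
    linked_to_other_data: bool     # links to other data sets


def star_rating(dataset):
    """Count criteria met in order; the first unmet criterion ends the count."""
    criteria = [
        dataset.on_web_open_license,
        dataset.structured,
        dataset.non_proprietary_format,
        dataset.uses_uris,
        dataset.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


excel_sheet = Dataset(True, True, False, False, False)
print(star_rating(excel_sheet))
```

An openly licensed Excel sheet stops at the third criterion, and a dataset without an open license gets no stars however well it is linked.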

Page 30: Data Strategies: Metadata, Open Data, Linked Data

Licensing is key for open data

Kind of license      Num.    %
Not specified        132     39%
Public Domain         69     21%
Attribution           66     20%
Share alike           35     10%
Closed                16      5%
With restrictions      5      2%
Other                  3      1%

Source: http://www.licensius.com/blog/lodlicenses

Page 31: Data Strategies: Metadata, Open Data, Linked Data

Use Cases

Page 32: Data Strategies: Metadata, Open Data, Linked Data

Global Buildings Performance Network (GBPN)

The Global Buildings Performance Network (GBPN) is a globally organised and regionally focused network whose mission is to advance best practice policies that can significantly reduce energy consumption and associated CO2 emissions from buildings.

Page 33: Data Strategies: Metadata, Open Data, Linked Data

Goals

Launch of the GBPN global Knowledge Platform for the Energy Performance of Buildings (www.gbpn.org)

Share Knowledge

Build awareness & showcase best practice

Stimulate collective research

Stimulate collective analysis from experts worldwide

Promote better decision-making

Help the building sector effectively reduce its impact on climate change

Linked Open Data successfully serves these objectives!

Page 34: Data Strategies: Metadata, Open Data, Linked Data

Technical Solution

Drupal CMS DB

publish

enrich

Integrated view (& search index)

GBPN Knowledge Platform

annotation & mapping

Page 35: Data Strategies: Metadata, Open Data, Linked Data

The GBPN Knowledge Platform

LOD based GBPN Terminology http://bit.ly/YSbD9S

GBPN News Aggregator Tool: http://bit.ly/13JLJqk

GBPN Policy Comparative Tool: http://bit.ly/X9Vihm

The GBPN Knowledge Platform is a Linked Open Data project that aims to open and connect with the best resources, data and information on buildings energy performance policies worldwide.

Report Database: http://www.gbpn.org/reports

The Laboratory: http://www.gbpn.org/laboratory

GBPN web blog: http://bit.ly/X9VSeW

Live-Demo

Page 36: Data Strategies: Metadata, Open Data, Linked Data

The World Bank Taxonomies & Thesauri

http://vocabulary.worldbank.org/

Page 37: Data Strategies: Metadata, Open Data, Linked Data

EIP on Water - Marketplace

http://www.eip-water.eu

Page 38: Data Strategies: Metadata, Open Data, Linked Data

reegle – country profiles

http://reegle.info/countries

Page 39: Data Strategies: Metadata, Open Data, Linked Data

Same document tagged: PV

Document tagged: photovoltaic

Understanding synonyms & relations?
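The tagging problem on this slide can be illustrated with a small search sketch: once "PV" and "photovoltaic" resolve to the same concept, a query for either term finds both documents. The synonym table, document names and helper functions are assumptions made for the example; a real setup would take the synonyms from a thesaurus instead of a hand-written dictionary.

```python
# Hypothetical synonym table: free-text tag -> shared concept.
CONCEPT_FOR_TAG = {
    "pv": "photovoltaics",
    "photovoltaic": "photovoltaics",
    "photovoltaics": "photovoltaics",
}

# Hypothetical documents with the tags their authors chose.
DOCUMENT_TAGS = {
    "report_a": ["PV"],
    "report_b": ["photovoltaic"],
    "report_c": ["wind"],
}


def concept_of(tag):
    """Normalize a tag; unknown tags fall back to their lowercased form."""
    key = tag.strip().lower()
    return CONCEPT_FOR_TAG.get(key, key)


def search(query):
    """Return all documents whose tags resolve to the same concept as the query."""
    wanted = concept_of(query)
    return sorted(
        doc
        for doc, tags in DOCUMENT_TAGS.items()
        if any(concept_of(tag) == wanted for tag in tags)
    )


print(search("PV"))
print(search("photovoltaic"))
```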

Page 40: Data Strategies: Metadata, Open Data, Linked Data

Standardisation and consistency are key

Based on our experience in establishing knowledge broker portals we know:

There is a strong need to increase consistency when tagging climate and energy resources

We need to ensure the consistency of the message delivered to the public, to avoid the confusion caused by using terms in different ways

This requires standardisation of the categories and tags used

Page 41: Data Strategies: Metadata, Open Data, Linked Data

reegle thesaurus

Page 42: Data Strategies: Metadata, Open Data, Linked Data

The trusted Clean Energy LOD Cloud

http://blog.semantic-web.at/

Page 43: Data Strategies: Metadata, Open Data, Linked Data

reegle tagging API

Have a look at http://api.reegle.info – using the API is free!

blog.okfn.org/2013/04/08/sustainable-energy-policy-demands-sustainable-open-data/

Page 44: Data Strategies: Metadata, Open Data, Linked Data

Impact

[Chart: reegle.info users per year, 2008 to 2012 (not including datasets re-used on other sites); vertical axis from 0 to 3,000,000]