8/11/2019 Dumitru Roman : Summer School ESWC 2014
Open Data Publication and Consumption
An Overview of Relevant Data Access Approaches and DaaS Solutions
@ESWC Summer School, 2014
Dumitru Roman, SINTEF, Norway
Outline
The context: Open Data
Data access: Web APIs, OData, SPARQL/LDP
DaaS solutions landscape and open DaaS architecture
The context: Open Data
Open Data Movement: make data available (primarily government data)
Businesses and citizens can develop new ideas, services and applications
Can support (government) transparency and accountability
Source: McKinsey, http://www.mckinsey.com/insights/business_technology/open_data_unlocking_innovation_and_performance_with_liquid_information
Gartner:
By 2016, the use of "open data" will continue to increase but slowly, and predominantly limited to Type A enterprises.
By 2017, over 60% of government open data programs that do not effectively use open data internally will be scaled back or discontinued.
By 2020, enterprises and governments will fail to protect 75% of sensitive data and will declassify and grant broad/public access to it.
Source: Gartner, http://training.gsn.gov.tw/uploads/news/6.Gartner+ExP+Briefing_Open+Data_JUN+2014_v2.pdf
Lots of open datasets on the Web
A large number of datasets have been published as open data in recent years
Many kinds of data: cultural, science, finance, statistics, transport, environment, ...
Popular formats: tabular (e.g. CSV, XLS), HTML, XML, JSON, ...
... but few applications
Applications utilizing open and distributed datasets have been rather few, e.g.:

Open Data Portal    Datasets     Applications
data.gov            ~ 110 000    ~ 350
publicdata.eu       ~ 50 000     ~ 80
data.gov.uk         ~ 20 000     ~ 350
data.norge.no       ~ 300        ~ 40

Challenges include:
Lack of resources: unreliable data access
Lack of expertise: not easily available to organisations
Technical/organizational
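To make the gap between published datasets and applications concrete, the portal figures listed on this slide can be turned into a rough datasets-per-application ratio. The sketch below only restates the slide's approximate numbers; the function name is an illustrative assumption.

```python
# Approximate (datasets, applications) figures copied from the slide's portal table.
PORTALS = {
    "data.gov": (110_000, 350),
    "publicdata.eu": (50_000, 80),
    "data.gov.uk": (20_000, 350),
    "data.norge.no": (300, 40),
}


def datasets_per_application(datasets: int, applications: int) -> float:
    """Return how many published datasets exist per listed application."""
    return datasets / applications


for portal, (datasets, applications) in PORTALS.items():
    ratio = datasets_per_application(datasets, applications)
    print(f"{portal}: ~{ratio:.1f} datasets per application")
```

Even for the smallest portal the ratio stays well above one dataset per application, which is the imbalance the slide title points at.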
Open data publication and access
Data publishers: complicated data publishing and maintenance process
Data consumers/developers: complicated programmatic data access
A decision which lifts a data publication burden from a data publisher will place that burden on the data access for the data consumer
[Diagram: easy data publication leads to complicated data access; easy data access requires complicated data publication. Goals: Simplify data access! Simplify data publication!]
(Programmatic/Web-based) Data access
Traditional approaches for programmatically consuming data: ODBC, JDBC, RMI, CORBA, ...
Modern Web applications and data services rely extensively on lightweight Web service based approaches exchanging data via standard protocols (HTTP) and formats (e.g. XML, JSON, RDF, ...)
Relevant approaches for programmatic access to open data:
Web APIs
OData
SPARQL and Linked Data Platform (LDP)
Web APIs
Programmatic interfaces accessible through HTTP calls (e.g. GET, POST)
Data (requests/responses) typically in JSON or XML
Very popular among application developers
Source: http://www.programmableweb.com/
[Diagram: on the data consumer/developer side, an app calls a client library; on the data provider side, a Web service exposes a Web API. Protocol: HTTP. Payload: JSON/XML/...]
Web APIs - example
Request:
GET http://api.yr.no/weatherapi/locationforecast/1.9/?lat=60.10;lon=9.58
Response payload: [not reproduced]
Documentation: http://api.yr.no/weatherapi/locationforecast/1.9/documentation
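As a rough illustration of how a client might issue the request above, the following sketch builds the same GET request with Python's standard library. It only constructs the request and does not send it; the function name is an assumption for illustration, and this 2014 endpoint version may no longer be served.

```python
from urllib.parse import urlunsplit
from urllib.request import Request


def build_forecast_request(lat: float, lon: float) -> Request:
    """Build (but do not send) the location forecast GET request from the slide."""
    # The slide's URL separates the two parameters with ';' rather than '&'.
    query = f"lat={lat:.2f};lon={lon:.2f}"
    url = urlunsplit(
        ("http", "api.yr.no", "/weatherapi/locationforecast/1.9/", query, "")
    )
    return Request(url, method="GET")


request = build_forecast_request(60.10, 9.58)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the actual HTTP call; that step is left out here so the sketch runs without network access.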
Open Data Protocol (OData)
ODBC for the Web
A protocol for creating and consuming data APIs
Builds on HTTP and REST
OASIS Standard (2014), promoted by Microsoft, IBM, and SAP
http://www.odata.org/
OData
Principles: Metadata, Data, Querying, Editing, Operations, Vocabularies
The OData Data Model based on the Entity Data Model (EDM)
The OData protocol: CRUD + query language
XML and JSON serialization
Source: Microsoft, http://msdn.microsoft.com/en-us/data/hh237663.aspx
OData - requesting data examples
Request (entity by ID):
GET serviceRoot/People('russellwhyte')
Response payload: [not reproduced]
Request (collections):
GET serviceRoot/People
Request (individual property):
GET serviceRoot/Airports('KSFO')/Name
Source: http://www.odata.org/getting-started/basic-tutorial/
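The three request shapes above (collection, entity by key, individual property) follow one path pattern, which can be sketched as a small helper. The function name and its parameters are illustrative assumptions, not part of any OData client library; `serviceRoot` is the placeholder used on the slide.

```python
from typing import Optional


def resource_path(
    service_root: str,
    entity_set: str,
    key: Optional[str] = None,
    prop: Optional[str] = None,
) -> str:
    """Compose an OData resource path: a collection, one entity, or one property."""
    path = f"{service_root}/{entity_set}"
    if key is not None:
        # String keys go in single quotes inside parentheses;
        # an embedded single quote is escaped by doubling it.
        path += "('" + key.replace("'", "''") + "')"
    if prop is not None:
        if key is None:
            raise ValueError("a property path needs an entity key")
        path += f"/{prop}"
    return path


print(resource_path("serviceRoot", "People"))
print(resource_path("serviceRoot", "People", "russellwhyte"))
print(resource_path("serviceRoot", "Airports", "KSFO", "Name"))
```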
OData - querying data examples
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (filter):
GET serviceRoot/People?$filter=FirstName eq 'Scott'
Response payload: [not reproduced]
Filter on complex type:
GET serviceRoot/Airports?$filter=contains(Location/Address, 'San Francisco')
orderby:
GET serviceRoot/People('scottketchum')/Trips?$orderby=EndsAt desc
top:
GET serviceRoot/People?$top=2
count:
GET serviceRoot/People/$count
expand:
GET serviceRoot/People('keithpinckney')?$expand=Friends
select:
GET serviceRoot/Airports?$select=Name, IcaoCode
search:
GET serviceRoot/People?$search=Boise
Lambda operators (any / all):
GET serviceRoot/People?$filter=Emails/any(s:endswith(s, 'contoso.com'))
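The query examples above are written with literal spaces for readability; on the wire the option values are percent-encoded. A minimal sketch of composing such URLs follows, assuming a hypothetical helper `with_query_options` and a hand-picked set of characters left unencoded for legibility.

```python
from urllib.parse import quote


def with_query_options(resource: str, **options: object) -> str:
    """Append OData system query options ($filter, $top, ...) to a resource path."""
    parts = []
    for name, value in options.items():
        # Percent-encode the value (spaces become %20) but leave the
        # punctuation used in OData expressions readable.
        parts.append(f"${name}=" + quote(str(value), safe="(),/:'$"))
    return resource + ("?" + "&".join(parts) if parts else "")


print(with_query_options("serviceRoot/People", filter="FirstName eq 'Scott'"))
print(with_query_options("serviceRoot/People", top=2))
print(with_query_options("serviceRoot/Airports", select="Name,IcaoCode"))
```

Several options can be combined in one call, for example `with_query_options("serviceRoot/People", filter="FirstName eq 'Scott'", top=2)`, which joins them with `&`.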
OData - data modification example
Source: http://www.odata.org/getting-started/basic-tutorial/
Request (Create an Entity):
POST serviceRoot/People
OData-Version: 4.0
Content-Type: application/json;odata.metadata=minimal
Accept: application/json

{
  "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
  "UserName": "teresa",
  "FirstName": "Teresa",
  "LastName": "Gilbert",
  "Gender": "Female",
  "Emails": ["[email protected]", "[email protected]"],
  "AddressInfo": [
    {
      "Address": "1 Suffolk Ln.",
      "City": {
        "CountryRegion": "United States",
        "Name": "Boise",
        "Region": "ID"
      }
    }
  ]
}
Response payload: [not reproduced]
Remove an Entity:
DELETE serviceRoot/People('vincentcalabrese')
Update an Entity (uses PATCH or PUT)
Relationship Operations (Link to Related Entities):
POST serviceRoot/People('scottketchum')/Friends/$ref
{"@odata.id": "serviceRoot/People('vincentcalabrese')"}
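The create request above can be assembled programmatically along these lines. This is a sketch only: the service root `https://example.org/odata` is a made-up placeholder, the helper name is an assumption, the entity is a trimmed version of the slide's payload, and nothing is sent over the network.

```python
import json
from urllib.request import Request


def build_create_request(service_root: str, entity_set: str, entity: dict) -> Request:
    """Build (but do not send) an OData 'create entity' POST request."""
    body = json.dumps(entity).encode("utf-8")
    headers = {
        "OData-Version": "4.0",
        "Content-Type": "application/json;odata.metadata=minimal",
        "Accept": "application/json",
    }
    return Request(
        f"{service_root}/{entity_set}", data=body, headers=headers, method="POST"
    )


# Trimmed version of the person entity shown on the slide.
person = {
    "@odata.type": "Microsoft.OData.SampleService.Models.TripPin.Person",
    "UserName": "teresa",
    "FirstName": "Teresa",
    "LastName": "Gilbert",
}

# Hypothetical service root; a real service would publish its own.
create = build_create_request("https://example.org/odata", "People", person)
print(create.get_method(), create.full_url)
```

A DELETE or PATCH request would follow the same shape, with a different `method` and the entity path (for example `People('vincentcalabrese')`) in place of the collection.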
SPARQL
A set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store
Examples taken from http://www.w3.org/TR/sparql11-overview/
Query Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
} GROUP BY ?person ?name
Result (serialized in XML, JSON, CSV, TSV): [not reproduced]
Update Language
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
INSERT DATA { <http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] . } ;
DELETE { ?person foaf:name ?mbox }
WHERE { <http://www.example.org/alice#me> foaf:knows ?person .
        ?person foaf:name ?name FILTER ( lang(?name) = "EN" ) . }
Protocol for RDF
Request:
GET /sparql/?query=[SPARQL Query]
Host: www.example.org
Response: A SPARQL Results Document or RDF graph
Service Description
Request:
GET /sparql/
Host: www.example.org
Response: An RDF description, using the Service Description vocabulary
Graph Store HTTP Protocol
POST /rdf-graphs/service?graph=http%3A%2F%2Fwww.example.org%2Falice
Host: example.org
Content-Type: text/turtle

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.example.org/alice#me> foaf:knows [ foaf:name "Dorothy" ] .
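To connect the query language with the protocol, the sketch below shows how a client might place a SPARQL query in the `query` parameter of a GET request and flatten a JSON results document. The helper names are assumptions, the endpoint is the `www.example.org` placeholder from the slide, and the sample result is hand-written rather than real data.

```python
import json
from urllib.parse import urlencode

QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE {
  ?person foaf:name ?name .
  ?person foaf:knows ?friend .
} GROUP BY ?person ?name
"""


def query_url(endpoint: str, query: str) -> str:
    """SPARQL Protocol query via GET: the query text goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows(results_json: str) -> list:
    """Flatten a SPARQL JSON results document into dicts of variable -> value."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL 1.1 Query Results JSON Format (not real data).
SAMPLE = json.dumps({
    "head": {"vars": ["name", "count"]},
    "results": {"bindings": [
        {
            "name": {"type": "literal", "value": "Alice"},
            "count": {
                "type": "literal",
                "value": "3",
                "datatype": "http://www.w3.org/2001/XMLSchema#integer",
            },
        },
    ]},
})

print(query_url("http://www.example.org/sparql/", QUERY))
print(rows(SAMPLE))
```

Values arrive as strings in the JSON format; a fuller client would use the `datatype` field to convert them, which this sketch deliberately skips.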
Linked Data Platform
Describes the use of HTTP for accessing, updating, creating and deleting resources from servers that expose data as Linked Data
Centered around LDP Resources (LDPRs), LDP Containers (LDPCs), membership, containment
Under development at W3C; working draft
http://www.w3.org/TR/ldp/
Examples taken from http://www.w3.org/TR/ldp/
Resource
Request: GET /netWorth/nw1
Response payload: [not reproduced]
LDP-BC (Basic Container)
Request: GET /c1/
Response payload: [not reproduced]
LDP-DC (Direct Container)
Request: GET /netWorth/nw1/liabilities/
Response payload: [not reproduced]
LDP-DC
Request: [not reproduced]
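The four operations named on this slide map naturally onto HTTP methods. The following sketch assumes the common mapping (GET to access, PUT to update, POST to a container to create, DELETE to delete) and Turtle as the exchange format; the helper name and the `http://example.org` host are illustrative assumptions, and the requests are only built, not sent.

```python
from urllib.request import Request

# Assumed mapping of the slide's operations onto HTTP methods.
LDP_METHODS = {"access": "GET", "update": "PUT", "create": "POST", "delete": "DELETE"}


def ldp_request(operation: str, url: str, turtle: str = "") -> Request:
    """Build (but do not send) an HTTP request for one LDP operation."""
    method = LDP_METHODS[operation]
    headers = {"Accept": "text/turtle"}
    data = None
    if method in ("PUT", "POST"):
        # Requests that carry a representation send it as Turtle.
        headers["Content-Type"] = "text/turtle"
        data = turtle.encode("utf-8")
    return Request(url, data=data, headers=headers, method=method)


read = ldp_request("access", "http://example.org/netWorth/nw1")
new = ldp_request("create", "http://example.org/c1/", "<> a <http://example.org/Thing> .")
print(read.get_method(), read.full_url)
print(new.get_method(), new.full_url)
```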
Data Access Summary
Web APIs
Very flexible, popular with Web developers, no specific commitment to data models
OData
ER-based data model, abstract interface to datastores (focus on CRUD), perceived as vendor-pushed (strong tool support)
SPARQL and LDP
Graph data model, community-pushed, some interesting features (querying, federation, linking, ...)
Though there is overlap between the various approaches, they all aim to simplify access to distributed data sources for application developers
Which approach to choose depends on many factors, e.g. type of data, size, relationships, infrastructure, skills to support, frequency of updates, end-use scenarios, ...
Data publication
Data access mechanisms simplify data consumption for application developers
But data needs to be provisioned to applications according to the chosen data access mechanism
And applications will always be dependent on the hosting for the data they use
Data publishers and application developers need to rely on generic Cloud platforms and build, deploy and maintain a complex Open Data software and data stack from scratch
Complicated data provisioning and maintenance process
Data-as-a-Service (DaaS) solutions are emerging to address this issue
Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer.
Source: Wikipedia; https://en.wikipedia.org/wiki/DaaS
Relevant DaaS solutions
Windows Azure Marketplace
Socrata
DataMarket
Factual
Junar
PublishMyData
DaPaaS
Windows Azure Marketplace
A marketplace for applications and data (~170 datasets; ~700 applications)
Charging data consumers
Tools and APIs for data publishing, analytics, metadata management, account management and pricing, monitoring and billing, as well as a data portal for dataset exploration
Supports OData
https://datamarket.azure.com/
Source: Microsoft, http://go.microsoft.com/fwlink/?LinkID=201129&clcid=0x409
Socrata
Specific focus on Open Data
Open Data Portal: data publishing & clean-up, metadata generation, data-driven portals for data exploration and portal management
API Foundry for creating and deploying RESTful APIs on top of the data
Hosted data is accessible through the Socrata Open Data API (SODA), a RESTful interface for searching and reading data in XML, JSON or RDF
http://www.socrata.com/
Source: Socrata
DataMarket
Provides statistical data from almost 100 data providers
~ 71 000 datasets
Supports embeddable visualisations of data, data export, live feeds for data updates, ability for data publishers to monetize data via the marketplace, custom data-driven portals for publishers, data portal, Web API
http://datamarket.com/
Factual
Data for ~ 65 million local businesses and points of interest in 50 countries; a product database of over 650,000 products
Used to provide the option of hosting thousands of 3rd party data sets (Community Data), but this activity has been discontinued
Data is populated by means of Web crawls, data extraction and 3rd party data services; data model is tabular, based on a taxonomy of around 400 categories
Pricing is based on a pay-per-use model
Data access is provided through a RESTful API
Provides a set of tools for data management
http://www.factual.com/
Junar
Cloud-based Open Data platform to collect, enrich, publish and analyse open data
Data can be consumed either directly via the Junar API, or via various visual widgets
http://www.junar.com/
PublishMyData
Hosted, as-a-service solution for Open and Linked Data publishing
Uses DCAT and provides data access via Web APIs, a SPARQL endpoint and raw data dumps
http://www.swirrl.com/publishmydata
Other relevant solutions
Comprehensive Knowledge Archive Network (CKAN) (http://ckan.org/): web-based open source data management system for the storage and distribution of open data; datahub (http://datahub.io/)
LOD2 (http://lod2.eu/): research project aimed at providing an open source, integrated software stack for managing the lifecycle of Linked Data, from data extraction, enrichment, interlinking, to maintenance; not meant to be an as-a-service solution
Project Open Data (http://project-open-data.github.io/): a set of open source tools, methodologies and use cases for publishing and utilising Open Data
COMSODE (http://www.comsode.eu/): research project aiming to create a publication platform for Open Data called Open Data Node
DaPaaS: towards an Open Data- and Platform-as-a-Service for Open Data
DaPaaS: research project for simplifying data publication and consumption via a Data- and Platform-as-a-Service approach
http://dapaas.eu
[Diagram: roles around the DaPaaS Platform]
Data Publisher: publishes open data
Application Developer: develops and deploys applications on top of published data
End-Users Data Consumer: consumes data resulting from the available applications
DaPaaS Requirements for Data Publisher
[Diagram: requirements arranged around the DaPaaS Platform and the Data Publisher]
DP-01: Dataset import
DP-02: Data storage and querying
DP-03: Dataset search & exploration
DP-04: Data interlinking
DP-05: Data cleaning & transformation
DP-06: Dataset bookmarking & notifications
DP-07: Dataset metadata management, statistics & access policies
DP-08: Data scalability
DP-09: Data availability
DP-10: User registration & profile management
DP-11: Secure access to platform
DP-12: UI for data publisher
DP-13: Data publishing methodology support
DaPaaS Requirements for Application Developer
[Diagram: requirements arranged around the DaPaaS Platform and the Application Developer]
AD-01: Access to Data Publisher services (DP-01 to DP-13)
AD-02: Data export
AD-03: Develop applications in state-of-the-art programming languages
AD-04: Configure application deployment
AD-05: Deploy and monitor application
AD-06: Application metadata management, statistics & access policies
AD-07: UI for application developer
AD-08: Application development methodology support
DaPaaS Requirements for End-Users Data Consumer
[Diagram: requirements arranged around the DaPaaS Platform and the End-User Data Consumer]
EU-01: User registration & profile management
EU-02: Search & explore datasets and applications
EU-03: Datasets and applications bookmarking and notifications
EU-04: Mobile and desktop GUI access
EU-05: Data export and download
EU-07: High availability of data and applications
DaPaaS Platform: Abstract High-Level Architecture
[Architecture diagram; labels recovered from the slide]
Layers: UX Layer (UX Services), Platform Layer (PaaS Services), Data Layer (DaaS Services)
Components: Open Data Warehouse, Datasets, Application Hosting Environment, Data-Driven Applications
Other labels: Usage Monitoring, Security & Access Control, Tool-supported Methodology for Data Publishing/Consumption
Summary
Lots of open datasets, but few applications using them
Simplifying data publication/consumption can enable an increase in the number (and quality) of applications using open data
Various approaches emerging
For data access: Web APIs, OData, SPARQL/LDP
For data publication/provisioning: DaaS solutions
Thank you!
Contact: [email protected]