Linked Data Marketplaces


v0.6 / Mar 2011

(Linked) Data Marketplaces

Marin Dimitrov (Ontotext)

Contents

• Introduction

• Data Marketplaces

– Factual, InfoChimps, Azure DataMarket, Freebase, Socrata, Kasabi

– DataMarket, Timetric, xIgnite

• Data Marketplaces for Linked Data

#2(Linked) Data Marketplaces Jan 2011

INTRODUCTION


Definitions

• Data-as-a-Service (DaaS)

– “Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer. Additionally, the emergence of service-oriented architecture (SOA) has rendered the actual platform on which the data resides also irrelevant” (Wikipedia)

• Data Marketplaces

– “Services that make it easy to find data from a range of secondary data sources, then consume the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers” (DataMarket.com)


Data Marketplaces properties

• Proposed classification by Bauereiss & Fensel

1. Data domain

2. Population of content

3. Community management

4. Operating party

5. Pricing models

6. Data exchange

• Some additional differentiating characteristics

– Data model, Data size, Data export

– Branded marketplaces, SLA

– Query languages, Data tools

DATA MARKETPLACES


Factual

• www.factual.com / @factual


Factual (2)

• Data domain

– Travel, finance, sports, autos, movies, music, TV, books, health, food, politics, education, science, arts, …

– High quality local data

• USA, Germany, France, Italy, UK, Japan, Switzerland, Australia, …

• Used by Facebook Places

• Data population

– Crawling the web

– Public data sources

– Community contributions

• Upload XLS/ODS, CSV


Factual (3)

• Data model

– tabular

– Taxonomy of 400 categories

• 13 Level 1 categories: Arts, Automotive, Business, Government, …

• Data size – 500,000 datasets

• Company info

– Factual Inc. (USA)

– $27M VC funding so far


Factual (4)

• Monetization model

– Pricing model not finalised yet (currently free)

– Pay-per-use pricing (per API call) with subscriptions

• Companies that contribute data will have a fee reduction

• Data access options

– REST API

• Read from table, Add/Write to table, Get schema info

– Web applications

• Read/write raw data from a web page (JavaScript)

• Web widgets for visualising, filtering and sorting data


Factual (5)

• Data tools

– AutoClipper – find tables on the web

– PageClipper – extract tabular data from a web page

– FactClipper – find individual facts (query templates)


InfoChimps

• www.infochimps.com / @infochimps


InfoChimps (2)

• Data domain

– All purpose

• Including data from Freebase, Wikipedia infoboxes, CKAN, Twitter, Data.gov, Data.gov.uk, GeoNames, …

• Data population

– Public datasets

– User submitted datasets

• Data model is dataset specific

• 10,000+ datasets organised in 13 collections


InfoChimps (3)

• Company info

– InfoChimps (USA)

– $1.6M VC funding so far

– Acquired DataMarketplace in 12/2010

• Monetization model

– Charge data sellers

• Data sellers choose the price & licensing of their data

• Charge for data storage

• 30% commission for InfoChimps on each sale


InfoChimps (4)

• Monetization model (2)

– Charge data buyers

• Baboon – free, 100K API calls / mo

• Brass Monkey – $20/mo, 500K API calls / mo

• Silverback – $250/mo, 2M API calls / mo

• Golden Ape – $4,000/mo, 15M API calls / mo

• Data access options

– REST API

• api.infochimps.com/DATASET/METHOD.json?PARAM=VALUE

– YQL tables
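The REST URL pattern above can be filled in mechanically. The following is a minimal sketch, not InfoChimps client code: the `http` scheme, the helper name `build_call_url`, and the dataset, method and parameter names are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Host taken from the slide; the scheme is an assumption.
API_ROOT = "http://api.infochimps.com"


def build_call_url(dataset: str, method: str, **params: str) -> str:
    """Fill in the DATASET/METHOD.json?PARAM=VALUE pattern."""
    url = f"{API_ROOT}/{dataset}/{method}.json"
    query = urlencode(params)  # percent-encodes values, joins pairs with '&'
    return f"{url}?{query}" if query else url


# Hypothetical dataset, method and parameter names, for illustration only.
print(build_call_url("example_dataset", "search", q="linked data"))
```

The sketch only builds the URL; authentication and API keys are not described in the slides and are left out.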


Azure DataMarket

• https://datamarket.azure.com


Azure DataMarket (2)

• Data domain

– All purpose, incl. Data.gov, UN data, Wolfram|Alpha, ESRI

• Data population

– Data publishers (need prior approval)

• Data can be stored on SQL Azure, Azure Storage or 3rd party clouds (via Data Access Layers)

• Data model

– Depends on the dataset and the storage, but always presented as OData to consumers

• Data size – 90 datasets


Azure DataMarket (3)


(c) Microsoft

Azure DataMarket (4)

• Company info

– Microsoft

• Monetization model

– Subscription for data buyers (limited/unlimited API calls)

• Access options

– OData (feeds, queries, updates)

• Data tools

– Service Explorer

– Excel add-in (find, purchase, consume data)

– Integration with SQL Server Reporting Services / Integration Services
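Since every dataset is exposed as OData, a consumer query is essentially a URL with system query options. The sketch below composes such a URL using the standard `$filter`, `$orderby` and `$top` options. The service root, entity set and property names are hypothetical, and the account key that a real DataMarket subscription would require is omitted.

```python
from urllib.parse import quote


def odata_query_url(service_root, entity_set, filter_expr=None, orderby=None, top=None):
    """Compose an OData query URL from standard system query options."""
    options = []
    if filter_expr is not None:
        options.append("$filter=" + quote(filter_expr))
    if orderby is not None:
        options.append("$orderby=" + quote(orderby))
    if top is not None:
        options.append(f"$top={int(top)}")
    url = f"{service_root.rstrip('/')}/{entity_set}"
    return url + ("?" + "&".join(options) if options else "")


# Hypothetical service root, entity set and property names.
print(odata_query_url(
    "https://example.invalid/odata",
    "Observations",
    filter_expr="Year ge 2005",
    orderby="Year desc",
    top=10,
))
```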


DataMarket

• www.datamarket.com / @datamarket


DataMarket (2)

• Data domain

– Statistical data from 2,000 providers, incl. UN, Eurostat, World Bank, US agencies, BP, FIFA, …

• Data population

– Data aggregation (2,000 data providers)

• Data size

– 13K datasets, 100M time series, 600M facts

• Company info

– DataMarket (Iceland)


DataMarket (3)

• Monetization model

– Charge data sellers

• Free datasets – $249/mo; Paid datasets – 25% commission; Branded datasets – $699/mo + commission

– Charge data buyers

• Free – 50 API calls/mo; $99 – 500 API calls/mo; $299 – 10K API calls/mo; $799 – 100K API calls/mo

• Data access

– REST API
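The buyer tiers above amount to a lookup from monthly call volume to price. A minimal sketch follows, assuming the listed prices are monthly fees in US dollars and that a buyer simply picks the cheapest tier whose quota covers the expected volume. The function name is hypothetical.

```python
# (price in USD, API calls per month), as listed on the slide.
BUYER_PLANS = [(0, 50), (99, 500), (299, 10_000), (799, 100_000)]


def cheapest_plan(calls_per_month):
    """Return the price of the cheapest tier covering the volume, or None."""
    for price, quota in BUYER_PLANS:  # tiers are ordered by ascending price
        if calls_per_month <= quota:
            return price
    return None  # above the largest listed tier; the slide gives no price


for volume in (40, 450, 9_000, 250_000):
    print(volume, cheapest_plan(volume))
```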


Socrata

• www.socrata.com / @socrata


Socrata (2)

• Data domain

– Business, education, government data

• Data population

– Uploads from data publishers

• Data size

– 13K datasets

• Data model

– tabular


Socrata (3)

• Company info

– Socrata (USA)

• Monetization model

– Charge data buyers (“Plans starting at $499 per month”)

• Basic – 100K API calls/mo + 50GB traffic; Plus – 250K API calls/mo + 250GB traffic; Premium – 1M API calls/mo + 1.2TB traffic; Ultimate – 10M API calls/mo + 5TB traffic

• Data access

– REST API (Socrata Open Data API)

– Data export (XLS, CSV, RDF, XML)

– RSS updates


Kasabi

• www.kasabi.com / @TeamKasabi


Kasabi (2)

• Data domain

– All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …

• Data population

– Public datasets

– User submitted datasets

• Data size

– 55 datasets

• Data model

– RDF


Kasabi (3)

• Company info

– Talis (UK)

• Monetization model

– Charge data consumers

– Data hosting is free

• Data access

– SPARQL / Linked Data endpoint

– REST API

– Additional APIs

– PHP & Ruby client libraries
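A SPARQL endpoint is queried over plain HTTP: per the SPARQL protocol, the query text goes in a `query` parameter and the `Accept` header selects the result format. The sketch below only prepares such a request with the Python standard library. The endpoint URL is a placeholder, not a Kasabi address, and any API key a marketplace would require is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_select_request(endpoint, query):
    """Prepare (but do not send) an HTTP GET request for a SPARQL SELECT query."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


ENDPOINT = "https://example.invalid/sparql"  # placeholder endpoint
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

request = sparql_select_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it to a live endpoint.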


Freebase

• www.freebase.com / @fbase


Freebase (2)

• Data domain

– General purpose

• Data model

– Graph (RDF dumps available)

• Data population

– Community curated data (licensed as CC-BY)

– Import of public data sources (Wikipedia, MusicBrainz, WordNet, LoC, …)

• Data size

– 20M entities


Freebase (3)

• Company info

– Metaweb (USA), now Google

• Monetization model

– Free for 100K read API calls per day (10K write)

– Paid for higher volumes

• Data access

– REST API

– Linked Data endpoint (http://rdf.freebase.com)

– Triple uploader / RDF dumps

– Acre (application hosting platform)
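The free allowance above can be read as a per-day threshold beyond which calls become billable. A minimal sketch, assuming reads and writes are counted separately and that only the excess over the free quota is charged. How Freebase actually metered or priced the excess is not stated in the slides.

```python
# Free daily allowance from the slide: 100K read calls, 10K write calls.
FREE_DAILY_CALLS = {"read": 100_000, "write": 10_000}


def billable_calls(used):
    """Return, per call type, how many of today's calls exceed the free quota."""
    return {
        kind: max(0, used.get(kind, 0) - free)
        for kind, free in FREE_DAILY_CALLS.items()
    }


print(billable_calls({"read": 120_000, "write": 4_000}))
```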


Freebase (4)

• Data tools

– Web based – schema editor, review queue, viewers, …

– GridWorks (Google Refine)

• Exploring, data cleaning, transformation of tabular data

• Map data to Freebase schema & RDF export (3rd party extension)

– Acre

• Application hosting platform

– User contributed JavaScript code (converted to Java with Rhino)

• Access & store data directly into Freebase


timetric

• www.timetric.com / @timetric


timetric (2)

• Data domain

– Economic data

• Data population

– Aggregated data from the world's leading sources of economic data (World Bank, Eurostat, …)

– User uploaded data

• Data size

– 2.5M public statistics


timetric (3)

• Company info

– Timetric Ltd. (UK)

• Monetization model

– Free public datasets

– Paid exclusive datasets

• Data access

– REST API


xIgnite

• www.xignite.com


xIgnite (2)

• Data domain

– Financial data

• Data population

– Aggregated data from leading sources (Dow Jones, Thomson Reuters, stock exchanges, …)

– Public datasets (national banks, SEC, Federal Reserve, …)

– User uploaded data

• Company info

– Xignite (USA)


xIgnite (3)

• Monetization model

– Paid subscriptions

• Data access

– Web services (REST/SOAP)


Coming soon…

• BuzzData

– www.buzzdata.com / @buzzdata

– Company: BuzzData


Data marketplaces – features summary

• Data

– Data model, domain, export options

• Monetization

– Charge buyers / sellers

– Free API calls

– Branded marketplaces & Service Level Agreements

• For developers

– REST API; query language

– Tools for data management / integration

– Application hosting


Feature matrix


                        | Factual | InfoChimps | Azure DataMarket | DataMarket | Socrata | Kasabi | Freebase | timetric | xIgnite
DATA
Data from all domains   | +       | +          | +                | -          | +       | +      | +        | -        | -
Data model              | tabular | various    | various          | ?          | tabular | RDF    | graph    | ?        | ?
Data export             | -       | -          | +                | -          | +       | ?      | +        | -        | -
RDF export              | -       | -          | -                | -          | +       | +      | +        | -        | -
MONETIZATION
Charge buyers           | +       | +/-        | +                | +/-        | +       | +      | +/-      | +/-      | +
Charge sellers          | ?       | +          | -                | +          | -       | ?      | -        | ?        | ?
Free API calls (month)  | ?       | 100K       | ?                | 50         | -       | ?      | 3M       | ?        | -
Branded marketplaces    | -       | -          | +                | +          | +       | ?      | -        | -        | -
Service Level guarantee | ?       | -          | -                | -          | -       | ?      | -        | -        | -
TOOLS
REST API                | +       | +          | +                | +          | +       | +      | +        | +        | +
Query language          | +       | -          | +                | -          | -       | +      | +        | -        | -
Tools                   | +       | -          | +                | -          | -       | +      | +        | -        | -
App hosting             | -       | -          | +                | -          | -       | ?      | +        | -        | -
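The feature matrix lends itself to a small lookup structure for comparing marketplaces programmatically. The sketch below transcribes four of its rows ("+" yes, "-" no, "?" unknown) and filters marketplaces by cell value. The variable and function names are illustrative only.

```python
MARKETPLACES = [
    "Factual", "InfoChimps", "Azure DataMarket", "DataMarket",
    "Socrata", "Kasabi", "Freebase", "timetric", "xIgnite",
]

# Four rows transcribed from the feature matrix, one cell per marketplace.
MATRIX = {
    "RDF export":     "- - - - + + + - -".split(),
    "Query language": "+ - + - - + + - -".split(),
    "Charge sellers": "? + - + - ? - ? ?".split(),
    "App hosting":    "- - + - - ? + - -".split(),
}


def with_feature(feature, value="+"):
    """List the marketplaces whose cell for `feature` equals `value`."""
    return [name for name, cell in zip(MARKETPLACES, MATRIX[feature]) if cell == value]


print(with_feature("RDF export"))
print(with_feature("Charge sellers", "?"))
```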

LINKED DATA + MARKETPLACES


Linked Data cloud (Sep 2010)


(c) R. Cyganiak and A. Jentzsch

Benefits of Linked Data for Data Marketplaces

• Unified data representation model (RDF)

– Easy consumption of the data

• Global identifiers for all objects (URI)

– Makes incremental data integration & federation easier

• Interlinked datasets

– New data added to the marketplace can be integrated with existing data

– Network effects

• Data marketplace interoperability

– Data from different marketplaces can be easily integrated
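The integration benefit comes from shared URIs: when two sources use the same identifier for an entity, merging them is a set union of triples and needs no record matching. The sketch below illustrates this with plain Python tuples standing in for RDF triples. All URIs and values are made up.

```python
from collections import defaultdict

ACME = "http://example.org/id/acme"  # made-up global identifier

# Two hypothetical marketplaces publishing facts about the same URI.
marketplace_a = {(ACME, "http://example.org/vocab/sector", "Retail")}
marketplace_b = {
    (ACME, "http://example.org/vocab/headquarters", "http://example.org/id/springfield"),
    ("http://example.org/id/springfield", "http://example.org/vocab/label", "Springfield"),
}


def merge(*graphs):
    """Integrate graphs by set union; identical URIs line up automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every property/value pair known about one subject URI."""
    properties = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            properties[p].add(o)
    return dict(properties)


combined = merge(marketplace_a, marketplace_b)
print(sorted(describe(combined, ACME)))
```

In practice the datasets would also need aligned vocabularies, which is the schema mapping challenge listed later in the deck.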


Benefits of Linked Data for Data Marketplaces (2)

• Derived knowledge / facts

– RDF inference of additional implicit facts

– (see FactForge and LinkedLifeData)

• Rich queries

– SPARQL offers unmatched query expressivity

• Easy import of existing LOD datasets

– Linked Open Data cloud already includes 200+ datasets with 20+ billion RDF triples
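To make "inference of additional implicit facts" concrete, the sketch below applies two RDFS entailment rules (subclass transitivity and type propagation) until no new triples appear. It is a naive forward-chaining loop over made-up triples, meant only as an illustration and not as a description of how FactForge or LinkedLifeData compute their inferred statements.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Made-up explicit statements, for illustration only.
explicit = {
    ("ex:Kasabi", RDF_TYPE, "ex:LinkedDataMarketplace"),
    ("ex:LinkedDataMarketplace", SUBCLASS, "ex:DataMarketplace"),
    ("ex:DataMarketplace", SUBCLASS, "ex:Service"),
}


def infer(explicit_triples):
    """Return the triples entailed by rules rdfs9 and rdfs11 but not stated explicitly."""
    closure = set(explicit_triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p2 == SUBCLASS and o == s2:
                    if p == SUBCLASS:
                        new.add((s, SUBCLASS, o2))  # rdfs11: subclass transitivity
                    elif p == RDF_TYPE:
                        new.add((s, RDF_TYPE, o2))  # rdfs9: type propagation
        if new <= closure:  # fixpoint reached, nothing new derived
            return closure - set(explicit_triples)
        closure |= new


for triple in sorted(infer(explicit)):
    print(triple)
```

Three explicit statements yield three inferred ones here, which is how a repository can report separate counts of explicit and inferred statements.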


Linked Data for marketplaces – challenges

• Quality of data

– Different (public) datasets may come with inconsistent or controversial data

– Quality more important than quantity

• Large scale data integration

– Ontology (schema) mapping of different datasets & vocabularies

• Licensing

– Some datasets come with “CC-BY-NC” or unclear licensing

• Billing

– API calls / SPARQL queries with varying computational cost

Linked Data for marketplaces – challenges (2)


• Operations

– Service Level guarantees

– Availability & scalability challenges

• Most Linked Data endpoints at present are neither scalable nor available
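One way to approach the billing challenge is to meter each request by the resources it consumed instead of charging a flat fee per call. The sketch below combines a fixed per-call fee with a charge proportional to measured execution time. The class, the rates and the use of CPU seconds as the cost measure are all assumptions, not something the slides prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class Meter:
    """Accumulates charges per account: flat fee plus a time-based component."""
    price_per_call: float          # hypothetical flat fee per request
    price_per_cpu_second: float    # hypothetical rate for execution time
    charges: dict = field(default_factory=dict)

    def record(self, account, cpu_seconds):
        cost = self.price_per_call + self.price_per_cpu_second * cpu_seconds
        self.charges[account] = self.charges.get(account, 0.0) + cost
        return cost


meter = Meter(price_per_call=0.25, price_per_cpu_second=0.5)
meter.record("buyer-1", cpu_seconds=0.0)   # cheap lookup
meter.record("buyer-1", cpu_seconds=2.0)   # expensive SPARQL query
print(meter.charges)
```

With such a scheme a simple lookup and a heavy SPARQL query are priced differently, although the buyer can no longer predict the bill from the call count alone.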


LinkedLifeData & FactForge


(c) R. Cyganiak and A. Jentzsch

FactForge

LinkedLifeData

LinkedLifeData & FactForge

• FactForge

– Integrates some of the most central LOD datasets

– General-purpose information (not specific to a domain)

– 1.2 billion explicit and 1 billion inferred statements

– The largest upper-level knowledge base

– http://www.FactForge.net

• Linked Life Data

– 25 of the most popular life-science datasets

– 2.7 billion explicit and 1.4 billion inferred statements

– http://www.LinkedLifeData.com


Strategic questions

• Monetization strategy

– Which (linked) datasets can be monetized?

– Charge buyers / charge sellers / free quota

– Branded marketplaces

• Community building

– Crowdsource the data curation to the community

– How to provide incentives to data curators?


Strategic questions (2)

• Operations

– How to ensure Service Level guarantees?

– How to deal with licensing issues?

– Account management, metering, billing

• Platform

– RDF database – data volume, query volume

– ETL tools

– Curation tools

– Data export & consumption


Data monetization with WebServius

• Benefits

– User management, quotas & restrictions

– Metering, pricing, billing

– Security, scalability, SLAs


(c) WebServius

Q & A

Questions?

@ontotext

