Page 1: Open Data and Linked Data

Open Data and Linked Data: the what and how of linked open data

James G. Boram Kim, LiST Inc.

March 5th, 2016

Page 2: Open Data and Linked Data

Prologue

Page 3: Open Data and Linked Data

Photography: Jason Madara, WIRED UK 02:13, 2013.

Tim O’Reilly, Founder and CEO, O’Reilly Media

Page 4: Open Data and Linked Data

“Data is the Next Intel Inside®.”

Photography: Jason Madara, WIRED UK 02:13, 2013.

Tim O’Reilly, Founder and CEO, O’Reilly Media

“Every significant Internet application to date has been backed by a specialized database. […] Much as the rise of proprietary software led to the Free Software movement, we expect the rise of proprietary databases to result in a Free Data movement within the next decade.” — “What is Web 2.0,” Sep. 2005.

Page 5: Open Data and Linked Data

Tim O’Reilly, Founder and CEO, O’Reilly Media

John Battelle, CEO, Co-founder, and Chairman, NewCo

Photography: James Duncan Davidson, Web 2.0 Summit, 2010.

Page 6: Open Data and Linked Data

Data is the “Intel Inside®” of the Next Generation of Applications

Tim O’Reilly, Founder and CEO, O’Reilly Media

John Battelle, CEO, Co-founder, and Chairman, NewCo

Photography: James Duncan Davidson, Web 2.0 Summit, 2010.

“Collective intelligence applications depend on managing, understanding, and responding to massive amounts of user-generated data in real-time. The ‘subsystems’ of the emerging Internet operating system are increasingly data subsystems: location, identity (of people, products, and places), and the skeins of meaning that tie them together. This leads to new levers of competitive advantage: Data is the ‘Intel Inside®’ of the next generation of computer applications.” — “Web Squared: Web 2.0 Five Years On,” Oct. 2009.

Page 7: Open Data and Linked Data

Infographic: visually, http://visual.ly/open-data-movement, 2011.

Page 8: Open Data and Linked Data

Barack Obama, 44th President of the United States

Photography: Kevin S. O’Brien, U.S. Navy, 2009.

Page 9: Open Data and Linked Data

Open Government

Barack Obama, 44th President of the United States

Photography: Kevin S. O’Brien, U.S. Navy, 2009.

“My administration is committed to creating an unprecedented level of openness in Government. We will work together to ensure the public trust and establish a system of transparency, public participation, and collaboration. Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” — Memorandum for the Heads of Executive Departments and Agencies, “Transparency and Open Government,” Jan. 2009.

Page 10: Open Data and Linked Data

Tim O’Reilly, Founder and CEO, O’Reilly Media

Photography: Eric Laycock, Esri, 2011.

Page 11: Open Data and Linked Data

“Government as a Platform”

Tim O’Reilly, Founder and CEO, O’Reilly Media

Photography: Eric Laycock, Esri, 2011.

“This is the right way to frame the question of ‘Government 2.0.’ How does government itself become an open platform that allows people inside and outside government to innovate? How do you design a system in which all of the outcomes aren’t specified beforehand, but instead evolve through interactions between the technology provider and its user community? […] That’s Government 2.0: technology helping build the kind of government the nation’s founders intended: of, for and by the people.” — “Gov 2.0: The Promise of Innovation,” Forbes, Aug. 2009.

Page 12: Open Data and Linked Data

Todd Park, 2nd United States Chief Technology Officer

Photography: U.S. Department of Labor, 2012.

Page 13: Open Data and Linked Data

Open Data Policy

Todd Park, 2nd United States Chief Technology Officer

Photography: U.S. Department of Labor, 2012.

“Making information resources accessible, discoverable, and usable by the public can help fuel entrepreneurship, innovation, and scientific discovery — all of which improve Americans’ lives and contribute significantly to job creation.” — Sylvia M. Burwell, Steven VanRoekel, Todd Park, and Dominic J. Mancini, Memorandum for the Heads of Executive Departments and Agencies, “Open Data Policy — Managing Information as an Asset,” May 2013.

Page 14: Open Data and Linked Data

Joel Gurin, President and Founder, Center for Open Data Enterprise

Photography: Techonomy, 2014.

Page 15: Open Data and Linked Data

Open Data Movement

Joel Gurin, President and Founder, Center for Open Data Enterprise

Photography: Techonomy, 2014.

“The Open Data movement began with democratic goals, fuelled by the idea that governments should make the data they collect available to the taxpayers who’ve paid to collect it. But in addition to its social benefits, Open Data has created tremendous new business opportunities.” — “Open Data Now,” McGraw-Hill Education, 2014.

Page 16: Open Data and Linked Data

Why Open Data? — Open Data and Social Impact

Sketchnote: Open Government Partnership, 2013.

Page 17: Open Data and Linked Data

Why Open Data? — Driving Growth, Ingenuity, and Innovation

Sketchnote: Open Government Partnership, 2013.

“Data is the new capital of the global economy, and as organisations seek renewed growth, stronger performance and more meaningful customer engagement, the pressure to exploit data is immense. […] As a result, we foresee that open data, and not simply big data, will be a vital driver for growth, ingenuity and innovation in the UK economy. There are four key aspects to our vision:

1. Every business will have a strategy to exploit the rapidly growing estate of open data.
2. Businesses will increasingly open up their data to revolutionise the way they compete.
3. Businesses will use open data to inspire customer engagement.
4. Businesses will work with the Government to establish a new paradigm in data responsibility and privacy.” — “Open data: Driving growth, ingenuity and innovation,” Deloitte, 2012.

Page 18: Open Data and Linked Data

Why Open Data? — Large Amount of Economic Value

“Making data more ‘liquid’ (open, widely available, and in shareable formats) has the potential to unlock a large amount of economic value (approx. $3 trillion annually), by improving the efficiency and effectiveness of existing processes; making possible new products, services, and markets; and creating value for individual consumers and citizens.” — “Open data: Unlocking innovation and performance with liquid information,” McKinsey Global Institute, Oct. 2013.

McKinsey & Company, McKinsey Global Institute

More open data for more users . . .

• 40+: Number of countries with government open data platforms*
• 90,000+: Data sets on data.gov (US site)*
• 1.4 million: Page views for the UK open data site in the summer of 2013
• 102: Cities that participated in 2013 International Open Data Hackathon Day
• 1 million+: Data sets made open by governments worldwide

* As of 2013

Page 19: Open Data and Linked Data

Why Open Data? — Large Amount of Economic Value

“While sources differ in their precise estimates of the economic potential of Open Data, all are agreed that it is potentially very large. In countries which were early movers in Open Data, there is already evidence of significant businesses having developed to exploit that potential. Leading governments have recognised that their role is not simply to publish data — they are supporting the whole value chain of the use of data […].” — “Open Data for Economic Growth,” The World Bank, Jun. 2014.

The World Bank (IBRD · IDA)

Page 20: Open Data and Linked Data

Screenshot: “Open Data 500,” http://www.opendata500.com/.

Page 21: Open Data and Linked Data

Screenshot: “Open Data 500,” http://www.opendata500.com/.

Page 22: Open Data and Linked Data

Screenshot: “Open Data 500,” http://www.opendata500.com/.

Page 23: Open Data and Linked Data

Joel Gurin, President and Founder, Center for Open Data Enterprise

Photography: The GovLab, 2013.

Page 24: Open Data and Linked Data

Joel Gurin, President and Founder, Center for Open Data Enterprise

Defining Data Categories

[Venn diagram of three overlapping circles: BIG DATA, OPEN DATA, and OPEN GOV.]

• Big Data only: non-public data for marketing, business analysis, national security
• Open Gov only: citizen engagement programs not based on data (e.g., petition websites)
• Open Data only: business reporting and other business data (e.g., ESG data and consumer complaints)
• Big Data and Open Data: large datasets from scientific research, social media, or other non-government sources
• Open Data and Open Gov: public data from state, local, federal government (e.g., budget data)
• Big Data, Open Data, and Open Gov: large public government datasets (e.g., weather, GPS, Census, SEC, healthcare)

Photography: The GovLab, 2013.
Diagram: From Joel Gurin, “Open Data Now,” McGraw-Hill Education, 2014.

Page 25: Open Data and Linked Data

Definition

Page 26: Open Data and Linked Data

Rufus Pollock, President and Co-Founder, Open Knowledge (Foundation)

Photography: Sebastiaan ter Burg, http://www.flickr.com/photos/31013861@N00/14860905785/, 2014.

Page 30: Open Data and Linked Data

Open Definition

Rufus Pollock, President and Co-Founder, Open Knowledge (Foundation)

Photography: Sebastiaan ter Burg, http://www.flickr.com/photos/31013861@N00/14860905785/, 2014.
Source: “Open Definition 2.1,” http://opendefinition.org/od/2.1/, 2015.

Knowledge is OPEN if ANYONE is FREE to ACCESS, USE, MODIFY, and SHARE it — subject, at most, to measures that preserve PROVENANCE and OPENNESS.

Page 31: Open Data and Linked Data

Open Definition

Rufus Pollock, President and Co-Founder, Open Knowledge (Foundation)

Photography: Sebastiaan ter Burg, http://www.flickr.com/photos/31013861@N00/14860905785/, 2014.
Source: “Open Definition 2.1,” http://opendefinition.org/od/2.1/, 2015.

Data and Content are OPEN if ANYONE is FREE to ACCESS, USE, MODIFY, and SHARE it — subject, at most, to measures that preserve PROVENANCE and OPENNESS.

Page 32: Open Data and Linked Data

How Data are Open or Closed, based on four characteristics

Source: From “Open data: Unlocking innovation and performance with liquid information,” McKinsey Global Institute, Oct. 2013.

Each characteristic runs on a scale from Completely Closed to Completely Open (more “liquid”).

• Degree of Access
- Completely open: Everyone has access
- Completely closed: Access to data is limited to a subset of individuals or organizations

• Machine-Readability
- Completely open: Available in formats that can be easily retrieved and processed by computers
- Completely closed: Data in formats not easily retrieved and processed by computers

• Cost
- Completely open: No cost to obtain
- Completely closed: Offered only at a significant fee

• Rights
- Completely open: Unlimited rights to reuse and redistribute data
- Completely closed: Re-use, republishing, or distribution of data is forbidden

Page 33: Open Data and Linked Data

Methods

Page 34: Open Data and Linked Data

Tim Berners-Lee, Inventor of the World Wide Web

Photography: Bret Hartman, TED, 2014.

Page 35: Open Data and Linked Data

5-Star Deployment Scheme for Open Data

Tim Berners-Lee, Inventor of the World Wide Web

Photography: Bret Hartman, TED, 2014.

“In order to encourage people — especially government data owners — along the road to good linked data, I have developed this star rating system. Linked Open Data (LOD) is Linked Data which is released under an open license, which does not impede its reuse for free. […] Linked Data does not of course in general have to be open. […] However, if it claims to be Linked Open Data then it does have to be open, to get any star at all.” — “Linked Data,” Design Issues, 2010.
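The star rating quoted above can be read as a cumulative checklist. The following is a minimal sketch in Python, assuming the five criteria as they are commonly summarized from Berners-Lee's “Linked Data” note: (1) on the Web under an open license, (2) machine-readable structured data, (3) a non-proprietary format, (4) URIs and W3C standards such as RDF to identify things, (5) links to other people's data. The `Dataset` fields and the `star_rating` function are hypothetical names chosen for illustration; they are not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustration only)."""
    open_license_on_web: bool      # 1 star: on the Web, under an open license
    machine_readable: bool         # 2 stars: structured data (e.g., a spreadsheet)
    non_proprietary_format: bool   # 3 stars: open format (e.g., CSV)
    uses_uris_and_rdf: bool        # 4 stars: URIs and W3C standards (RDF, SPARQL)
    links_to_other_data: bool      # 5 stars: links out to other datasets


def star_rating(dataset: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    criteria = [
        dataset.open_license_on_web,
        dataset.machine_readable,
        dataset.non_proprietary_format,
        dataset.uses_uris_and_rdf,
        dataset.links_to_other_data,
    ]
    stars = 0
    for criterion_met in criteria:
        if not criterion_met:
            break  # without an open license, no star is awarded at all
        stars += 1
    return stars


# Example: an openly licensed CSV file with no URIs or outbound links.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))
```

The early `break` mirrors the quote: data that is not open gets no star, however well it is linked.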

Page 36: Open Data and Linked Data
Page 37: Open Data and Linked Data

Image: Science for all, 2015.

Page 38: Open Data and Linked Data

The Tip of the Iceberg

Image: Science for all, 2015.

“All those pages on websites are only tips of icebergs:

• The real data is hidden in databases, XML files, Excel sheets, …

• You only have access to what the Web page designers allow you to see. […] Various data sources expose their data via Web Services or APIs, each with a different API, a different logic, a different structure. Mashups are forced to reinvent the wheel many times because there is no standard way getting to the data.” — Ivan Herman, “High Level Intro to Semantic Web,” Feb. 2012.

Page 39: Open Data and Linked Data

Tim Berners-Lee, Inventor of the World Wide Web

Open Data, Now!

Page 40: Open Data and Linked Data

Conformant Licenses

Screenshot: http://opendefinition.org/licenses/ Screenshot: http://licenses.opendefinition.org/

Page 41: Open Data and Linked Data

Screenshot: http://opendatacommons.org/

Page 42: Open Data and Linked Data

Licenses for the “Database” and its “Contents”

Screenshot: http://opendatacommons.org/

“The database and its contents may have separate rights. […] Different types of subject matter (e.g., code, content, or data) necessitate differences in licensing. Licenses designed for one type of subject matter — as CC licenses (lower than 4.0) were designed for content, and F/OSS licenses for code — aren’t always best suited to licensing another type of subject matter.” — “Licenses FAQ,” Open Data Commons, 2010.

Page 43: Open Data and Linked Data

Creative Commons Rights Expression Language (CC REL)

Source: https://www.w3.org/Submission/ccREL/

Page 44: Open Data and Linked Data

Open Data Rights Statement Vocabulary

Screenshot: https://alpha.openaddressesuk.org/about/terms/ Screenshot: http://schema.theodi.org/odrs/

Page 45: Open Data and Linked Data

Open Data Rights Statement Vocabulary

Screenshot: https://alpha.openaddressesuk.org/about/terms/ Screenshot: http://schema.theodi.org/odrs/

Page 46: Open Data and Linked Data
Page 47: Open Data and Linked Data

Screenshot: The Next Web, TED, 2009.

Tim Berners-Lee, Inventor of the World Wide Web

Page 48: Open Data and Linked Data

Tim Berners-Lee, Inventor of the World Wide Web

Raw (Structured) Data, Now!

Screenshot: The Next Web, TED, 2009.

Page 49: Open Data and Linked Data

No Ontological Commitment, No Machine-Understandability.

“The term ontological commitment is used as a general term in both philosophy and in information systems to refer to the essential elements of an ontology. An ontological commitment in describing ontological comparisons is taken to refer to a subset of elements of an ontology that it shares with all other ontologies based upon the same theory or conceptualization.” — Citizendium, 2013.

Diagram: John R. Brews, Citizendium, 2013.

Page 50: Open Data and Linked Data
Page 51: Open Data and Linked Data

Open Formats

Page 52: Open Data and Linked Data

Screenshot: http://data.okfn.org/

Page 53: Open Data and Linked Data

Frictionless Data

Diagram: Open Knowledge, http://blog.okfn.org/2013/04/24/frictionless-data-making-it-radically-easier-to-get-stuff-done-with-data/, 2013.

Page 54: Open Data and Linked Data

Frictionless Data

Diagram: Open Knowledge, http://data.okfn.org/roadmap

Page 55: Open Data and Linked Data

Data Package Standards & Tools

Diagram: Open Knowledge, http://blog.okfn.org/2013/04/24/frictionless-data-making-it-radically-easier-to-get-stuff-done-with-data/, 2013. Screenshot: http://data.okfn.org/tools

Page 56: Open Data and Linked Data

CSV on the Web

Screenshot: https://www.w3.org/TR/tabular-metadata/ Screenshot: http://www.w3.org/TR/tabular-data-model/ Screenshot: http://www.w3.org/TR/csv2json/ Screenshot: https://www.w3.org/TR/csv2rdf/

Page 57: Open Data and Linked Data
Page 58: Open Data and Linked Data

Core & Community Datasets

Screenshot: http://data.okfn.org/data Screenshot: https://github.com/datasets

Page 59: Open Data and Linked Data

Screenshot: https://github.com/datasets/country-codes/blob/master/data/country-codes.csv

Page 60: Open Data and Linked Data

Screenshot: http://dat-data.com/

Page 61: Open Data and Linked Data
Page 62: Open Data and Linked Data

Tim Berners-Lee, Inventor of the World Wide Web

Photography: Paul Clarke, 2014.

Page 63: Open Data and Linked Data

Linked Data

Tim Berners-Lee, Inventor of the World Wide Web

Photography: Paul Clarke, 2014.

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
4. Include links to other URIs so that they can discover more things

“I’ll refer to the steps above as rules, but they are expectations of behavior. Breaking them does not destroy anything, but misses an opportunity to make data interconnected. This in turn limits the ways it can later be reused in unexpected ways. It is the unexpected re-use of information which is the value added by the Web.” — “Linked Data,” Design Issues, 2010.
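Rules 1, 2, and 4 can be checked mechanically on a set of statements. The sketch below is a minimal illustration in Python, assuming that statements are plain (subject, predicate, object) tuples of strings and that a “link to other URIs” means an object that is an HTTP URI on a different host. The function names and the `example.org` / `other.example.net` addresses are hypothetical.

```python
from urllib.parse import urlparse


def is_http_uri(name: str) -> bool:
    """Rules 1 and 2: the name is a URI that can be looked up over HTTP(S)."""
    parts = urlparse(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def names_are_http_uris(triples) -> bool:
    """Every subject and predicate is named with an HTTP URI."""
    return all(is_http_uri(s) and is_http_uri(p) for s, p, _ in triples)


def external_links(triples, own_host: str) -> set:
    """Rule 4: objects that are HTTP URIs hosted somewhere other than own_host."""
    return {
        o for _, _, o in triples
        if is_http_uri(o) and urlparse(o).netloc != own_host
    }


# Hypothetical statements about a car, published at example.org.
triples = [
    ("http://example.org/id/car",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Car"),
    ("http://example.org/id/car",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://other.example.net/resource/Car"),
]

print(names_are_http_uris(triples))
print(external_links(triples, "example.org"))
```

Rule 3 concerns what a server returns when a URI is looked up, so it cannot be verified from the statements alone.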

Page 64: Open Data and Linked Data

Hypertext Transfer Protocol Uniform Resource Identifiers

Tim Berners-Lee, Inventor of the World Wide Web

Photography: Paul Clarke, 2014.

“The first rule, to identify things with URIs, is pretty much understood by most people doing semantic Web technology. […] The second rule, to use HTTP URIs, is also widely understood. The only deviation has been a constant tendency for people to invent new URI schemes such as XRIs, DOIs, and so on for various reasons. Typically, these involve not wanting to commit to the established Domain Name System (DNS) for delegation of authority but to construct something under separate control. Sometimes it has to do with not understanding that HTTP URIs are names […].” — “Linked Data,” Design Issues, 2010.

Page 65: Open Data and Linked Data

Hypertext Transfer Protocol Uniform Resource Identifiers

“HTTP URIs, in the Web architecture, have been used to denote documents. However, with the growth of the Semantic Web, which uses URIs to denote anything at all, the urge to use and practice of using HTTP URIs for arbitrary things grew steadily.” — “What do HTTP URIs Identify?,” Design Issues, 2002.

Diagram: “What do HTTP URIs Identify?,” Design Issues, 2002.

[Diagram: a table with rows KEY 1 to KEY 5, in which one cell holds the value “Car”, annotated with the identifiers URI 0 to URI 6.]

Page 66: Open Data and Linked Data

Resource Description Framework

Triple: Subject (URI 1), Predicate (URI 2), Object (URI 3 / Value)

[Diagram: the triple is drawn over a table with columns COL 1 to COL 6 and rows KEY 1 to KEY 4; the cell in row KEY 1, column COL 6 holds the value “Car”.]

Page 67: Open Data and Linked Data

Resource Description Framework

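Triple 1: Subject 1 (URI 1), Predicate 1 (URI 2), Object 1 (URI 3)

Triple 2: Subject 2 (URI 3, the same node as Object 1), Predicate 2 (URI 4), Object 2 (URI 5 / Value)

Graph: the two triples share a node, so together they form a graph.

[Diagram: the first table (columns COL 1 to COL 6, rows KEY 1 to KEY 4, with “Car” in row KEY 1) is joined to a second table whose column COL 7 pairs “Car” with “Tire”.]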
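The two slides above map table cells onto triples and chain triples into a graph. A minimal sketch of that idea in plain Python follows, assuming one URI per row and per column; the base address `http://example.org`, the path layout, and the helper names are all hypothetical and only stand in for the slide's URI 1 to URI 5.

```python
BASE = "http://example.org"  # hypothetical namespace for this illustration


def cell_to_triple(key: str, column: str, value: str) -> tuple:
    """One table cell becomes one triple: (row URI, column URI, cell value)."""
    return (f"{BASE}/row/{key}", f"{BASE}/col/{column}", value)


def objects(graph: set, subject: str, predicate: str) -> list:
    """All objects reachable from subject through predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def follow(graph: set, start: str, path: list) -> list:
    """Walk a chain of predicates, using each object as the next subject."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(graph, node, predicate)]
    return frontier


CAR = f"{BASE}/thing/Car"

# Triple 1: the cell in row KEY 1, column COL 6 refers to the car by URI.
triple_1 = cell_to_triple("KEY1", "COL6", CAR)
# Triple 2: the car itself is the subject; COL 7 holds the literal value "Tire".
triple_2 = (CAR, f"{BASE}/col/COL7", "Tire")

graph = {triple_1, triple_2}
print(follow(graph, f"{BASE}/row/KEY1", [f"{BASE}/col/COL6", f"{BASE}/col/COL7"]))
```

Because the object of the first triple is a URI and not a bare string, it can serve as the subject of the second triple, which is what turns separate tables into one graph.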

Page 68: Open Data and Linked Data

Linked Open Vocabularies (LOV)

Screenshot: http://lov.okfn.org/

Page 69: Open Data and Linked Data

Dereferenceable Uniform Resource Identifiers

Tim Berners-Lee, Inventor of the World Wide Web

Photography: Paul Clarke, 2014.

“The third rule, that one should serve information on the Web against a URI, is, in 2006, well followed for most ontologies, but, for some reason, not for some major datasets. […] Large datasets provide a SPARQL query service, but the basic linked data should be provided as well. Many research and evaluation projects in the few years of the Semantic Web technologies produced ontologies, and significant data stores, but the data, if available at all, is buried in a zip archive somewhere, rather than being accessible on the Web as linked data.” — “Linked Data,” Design Issues, 2010.
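Looking up a URI to obtain data, as the third rule asks, is usually done with HTTP content negotiation: the client states which representation it wants in the `Accept` header. The sketch below only builds such a request with Python's standard library and does not send it. The URI `http://example.org/id/car` is hypothetical, and `text/turtle` is one possible RDF media type that a server may or may not offer.

```python
from urllib.request import Request


def linked_data_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build a GET request that asks for an RDF representation of the URI."""
    return Request(uri, headers={"Accept": media_type})


request = linked_data_request("http://example.org/id/car")  # hypothetical URI
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))

# Sending it would be `urllib.request.urlopen(request)`; that step is left out
# here because it needs a real server that publishes linked data at the URI.
```

A dataset that is only available as a zip archive cannot answer this kind of request, which is the gap the quote describes.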

Page 70: Open Data and Linked Data

Dereferenceable Uniform Resource Identifiers

Diagram: From “Architecture of the World Wide Web, Volume One,” W3C, 2004.

Page 71: Open Data and Linked Data
Page 72: Open Data and Linked Data

Why Linked Data?

Source: Tom Heath, “How to Publish Linked Data on the Web,” 2008.

• Ease of Discovery

• Ease of Consumption

- standards-based data sharing

• Reduced Redundancy

- avoid duplication

• Added Value

- build ecosystems around your data/content

Page 73: Open Data and Linked Data

Diagram: “The Linking Open Data cloud diagram,” http://lod-cloud.net/, 2014.

Page 74: Open Data and Linked Data

Open Data Ecosystem

Figure 1. The open data ecosystem (Source: Deloitte LLP)

[Diagram: Government, Business, and Citizen are connected by two kinds of arrows, “Supplies data to” and “Uses data to deliver to”, with flows labeled government data, business data, and citizen data.]

There are three principal constituencies in any successful open data ecosystem: government, business and citizen. Each constituency supplies data to itself and to others. In turn, businesses and government use the data to deliver services demanded by all constituencies. The three classes of open data supplied by the constituencies and used to deliver services are:

• Open government data – data produced, collected or paid for by the public sector, subject to restrictions relating to sub judice, national security, commercial sensitivity and privacy. In addition, special commercial arrangements are also being made for certain trading funds, including Companies House, the Ordnance Survey, the Meteorological Office and HM Land Registry, which together form the newly created Public Data Group.

• Open business data – data produced or collected by the private sector and published freely and openly, subject to restrictions that individual businesses decide to put in place.

• Open citizen data – the personal and non-personal data of individual citizens published into the open domain.

Diagram: From “Open data: Driving growth, ingenuity and innovation,” Deloitte, 2012.

Page 75: Open Data and Linked Data
Page 76: Open Data and Linked Data
Page 77: Open Data and Linked Data

Epilogue

Page 78: Open Data and Linked Data

Diagram: John Snow, 1854.

Page 79: Open Data and Linked Data

Diagram: Florence Nightingale, “Notes on Matters Affecting the Health, Efficiency, and Hospital Administration of the British Army,” 1858.

Page 80: Open Data and Linked Data

Photography: “Don’t Panic – the Truth about Population,” BBC, 2013.

Hans Rosling, Co-founder and Chairman, Gapminder Foundation

Page 81: Open Data and Linked Data

Photography: “Don’t Panic – the Truth about Population,” BBC, 2013.

Hans Rosling, Co-founder and Chairman, Gapminder Foundation

Animation: “Mesmerizing Animation Shows How Much Healthier The World Has Become,” Business Insider, 2014.

Page 82: Open Data and Linked Data

Reality Mining: Serendipitous Reuse

Figure 1. Normalized data from Fluwatch (influenza cases, lab tests, ILI reports from sentinel physicians) and Google (number of clicks on a keyword-triggered influenza link).

Results: Over the flu-season period, the Google campaign received a total of 54,507 impressions and 4,582 clicks (Figure 1). Among all the ad campaign measures, the number of clicks on the ad was found to have the best correlation with traditional surveillance measures, which is why I show only correlation data for clicks. In general, clicks correlated better with flu events than ILI reports from sentinel physicians (Table 1). Internet clicks also were a timelier marker than ILI-SPR, in that they performed better to predict the flu events of the following week, whereas correlation coefficients in the ILI-SPR method were better for the current week than for the following week. All correlations were significant on a P<.001 level. Trivariate linear regression analysis adjusting for the ad position within Google did not improve the fit substantially, as most ads appeared close to the top anyway (data not shown). Would a threshold of 150 clicks per week have been used to trigger a flu-outbreak alert, all 11 weeks with 524 flu-cases or more following the query sampling week could be predicted with 100% specificity and sensitivity. The costs of the Google sentinel method were negligible compared to traditional methods: Google charges $0.08 per click-through, thus the campaign cost only Can$365.64 for the entire flu-season.

Table 1. Pearson correlation coefficients of ad clicks and influenza like illness reports from sentinel physicians (ILI-SPR) as measures for predicting influenza incidence data from the current or following week (all P<.001). (* see Figure 2, ** see Figure 3)

                    Clicks    ILI-SPR
Same week
  ILI-SPR           .73       —
  Lab tests         .85       .83
  Cases             .88       .80
Following week
  ILI-SPR           .81       .71
  Lab tests         .90       .82
  Cases             .91 *     .75 **
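The same-week and following-week coefficients in Table 1 are Pearson correlations between two weekly series, with the second series optionally shifted by one week. The sketch below shows that calculation, together with a simple click threshold like the 150 clicks per week mentioned in the excerpt. It is an illustration only: the weekly numbers are invented and are not the study's data, the function names are hypothetical, and treating “150 clicks” as “150 or more” is an assumption.

```python
from math import sqrt


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_pearson(predictor, outcome, lag: int = 0) -> float:
    """Correlate predictor in week t with outcome in week t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(predictor, outcome)
    return pearson(predictor[:-lag], outcome[lag:])


def alert_weeks(clicks, threshold: int = 150) -> list:
    """Indices of weeks whose click count reaches the threshold (assumed >=)."""
    return [week for week, count in enumerate(clicks) if count >= threshold]


# Invented weekly series, for illustration only (not from the study).
clicks = [40, 90, 160, 220, 180, 70]
cases = [100, 250, 480, 700, 650, 300]

print(round(lagged_pearson(clicks, cases, lag=0), 2))  # same week
print(round(lagged_pearson(clicks, cases, lag=1), 2))  # following week
print(alert_weeks(clicks))
```

Shifting the outcome series, not the predictor, keeps the question in the form the table uses: how well does this week's signal anticipate next week's flu activity.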

AMIA 2006 Symposium Proceedings Page - 246

Screenshot: https://www.google.com/adsense/start/ Diagram: Gunther Eysenbach, "Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance," 2006.

Page 83: Open Data and Linked Data

Thank You.