Bernadette Hyland, CEO, co-chair W3C Government Linked Data WG | [email protected] | @BernHyland | NARA II - College Park MD ~ 07 February 2013 | US Government Linked Data

US National Archives & Open Government Data


DESCRIPTION

Presentation to the US National Archives on the use of Linked Data by US Government. Linked Data increases access and re-use opportunities for publishers and data consumers.


Page 1: US National Archives & Open Government Data

Bernadette Hyland, CEO
co-chair W3C Government Linked Data WG

[email protected]
@BernHyland

NARA II - College Park MD ~ 07 February 2013

US Government Linked Data

1

Page 2: US National Archives & Open Government Data

Agenda

• Intros ...
• Trends in data management
• Government data publication
• Update on new Linked Data Services

2

Page 3: US National Archives & Open Government Data

3 Round Stones produces the leading platform for the publication of data on the Web. Our commercially supported Open Source platform is used by the Fortune 2000 and US Government agencies to collect, publish and reuse data, both on the public Internet and behind institutional firewalls.

3

Page 4: US National Archives & Open Government Data

Callimachus

Our Partners

4

Our partners ...

Our customers: 50% US Gov’t and 50% private sector, focused on pharma & health delivery, and business publishing.

Page 5: US National Archives & Open Government Data

5

Headlines and agency memos about government transparency with open data and various government Web sites ... innovation challenges based on open government data

... High-energy datapaloozas are emerging, with awards ranging from a couple of thousand dollars to $100k+. These challenges open the doors to innovation for better healthcare solutions and more efficient use of energy, to name but a few. They all require access to and re-use of HIGH QUALITY DATA.

In 2012, we read many headlines about big data and the world’s search engines and social media sites.

Page 6: US National Archives & Open Government Data

6

Page 7: US National Archives & Open Government Data

7

Who is sharing their data as Linked Data? Small and large commercial and government organizations, NGOs, Non-profits ... plus many universities. Governments in the last few years have been responding to Open Government initiatives that mandate publishing open government data. Some are careful, slow-moving entities who simply needed to find real solutions to real problems.

Page 8: US National Archives & Open Government Data

Governments

Goals: Governmental transparency and/or improved internal efficiencies (data warehouses)

8

Page 9: US National Archives & Open Government Data

Photo credit: http://www.flickr.com/photos/glennharper/4452247708/

However, while there is lots of gold to be mined from public data, it is an uncomfortable time for Government IT and business managers who are tasked with data management programs.

Most people are having a difficult time keeping up. If you feel like you are hanging on while the world changes too fast, you are not alone.


Page 10: US National Archives & Open Government Data

10

Linked Data is used extensively by the UK Government, which is widely seen as the global leader in data transparency. This is their home page.

Page 11: US National Archives & Open Government Data

Big Data
Simple data
Complex data
Legacy data

11

KEY POINT: Search, discovery and data access approaches have evolved over the last decade, and the techniques are beginning to come together. GoPubMed was launched in 2002 as the first semantic search portal. Later, Microsoft’s Bing and Google’s Knowledge Graph became two other well-known examples of search employing semantic techniques.

Big data research has grown to include the MapReduce programming model for handling really large data sets, often measured in terabytes or greater. This is the kind of data that people at the Large Hadron Collider at CERN are working on to provide insights into how the universe works, including the recent discovery of the Higgs Boson, the particle that gives mass to matter.

Under the big top tent of semantic search we’re dealing with different types of content: big, public, complex and legacy data. Simple, complex and legacy data come in small, medium and large sizes.

Many government agencies, by contrast, have lots of small to medium data sets in structured databases. These databases (and the systems that depend upon them) are not going away; however, fewer new data warehouse projects are likely to be started. Data warehouses are widely recognized to be costly to create and maintain, and they change SLOWLY.

The biggest win for governments worldwide who adopt a Web architecture for data publishing is combining data sets to discover new or previously uncontemplated relationships.

Page 12: US National Archives & Open Government Data

“Big Data Is Important, but Open Data Is More Valuable”

As change agents, enterprise architects can help their organizations become richer through strategies such as open data.

David Newman, VP Research, Gartner

12

Open data refers to the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other forms of control.

The term “open data” has gained popularity with open data initiatives including data.gov.uk, data.gov and other government data catalog sites.

Enterprise architects are playing an important role in fostering information-sharing practices. Access to, and use of, open data will be particularly critical for businesses that operate using the Web; organizations should focus on using open data to enhance business practices that generate growth and innovation.

Page 13: US National Archives & Open Government Data

13

A sound government information management strategy requires providing CONTEXT and CONFIDENCE to those accessing and potentially re-using your data.

When people have timely access to information, whether for disaster preparedness, scientific research or policy, the network effect of people helping people is our greatest hope.

On the heels of the recent East Coast hurricane that devastated parts of New York and New Jersey, government executives suggested that fear of cyber-doom scenarios may be taking up too much of our thinking & planning. According to Secretary Panetta, it may be driving us to unrealistic and potentially dangerous responses to threats that don’t exist.

The reality is that when disaster strikes, people come together and help one another. We don’t see paralysis, panic and social collapse.

During today’s session, I’ll describe how several agencies and private sector organizations are using Web technologies and semantics to improve information access and discovery. Simply put, semantic technologies provide CONTEXT.

Page 14: US National Archives & Open Government Data

Open Government Data

14

Page 15: US National Archives & Open Government Data

“We’re moving from managing documents to managing discrete pieces of open data and content which can be tagged, shared, secured, mashed up and presented in the way that is most useful for the consumer of that information.”

-- Report on Digital Government: Building a 21st Century Platform to Better Serve the American People

Growing chorus ...

15

The Digital Government Strategy sets out to accomplish three things: enable access to high-quality digital information & services; procure and manage devices, applications, and data in smart, secure and affordable ways; and unlock the power of government data to spur innovation.

Governments around the world are defining detailed digital services plans based on open data, open APIs and open source data platforms. They are defining how governments are publishing data with an eye towards improving access and re-use. Administrators and program managers are committing to delivery of digital services using semantic technologies broadly, and Linked Data specifically.

Page 16: US National Archives & Open Government Data

Open data + open standards + open platforms

Highly scalable computing & hosting via the Cloud

International Data Exchange Standards

5 Star Data (Linked Data)

Open Source tools

16

A Web-oriented approach to information sharing has changed how scientists, researchers, regulators and the public interact with government.

Linked data lowers the barriers to re-use and interoperability among multiple, distributed and heterogeneous data sources.

Access to high-quality Linked Open Data via the Web means millions of researchers and developers will be able to shorten the time-consuming research process involving data cleansing and modeling.

Page 17: US National Archives & Open Government Data

17

How do we get a loose coupling of shared data over Web architectures? By using the structured data model for the Web: RDF.
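To make the RDF data model concrete, here is a minimal Python sketch that represents statements as subject-predicate-object triples using plain tuples. The example.gov URIs, the vocabulary terms and the facility are hypothetical, invented for illustration; only the rdfs:label URI is a real, published term. A production system would normally use an RDF library and a triple store rather than a Python set.

```python
# A minimal sketch of the RDF data model: every statement is a
# (subject, predicate, object) triple, and things are named with URIs.
# All example.gov URIs below are hypothetical, for illustration only.

FACILITY = "http://example.gov/facility/123"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LOCATED_IN = "http://example.gov/vocab/locatedIn"
STATE_CA = "http://example.gov/place/California"

triples = {
    (FACILITY, RDFS_LABEL, "Example Cement Plant"),
    (FACILITY, LOCATED_IN, STATE_CA),
    (STATE_CA, RDFS_LABEL, "California"),
}


def objects(graph, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def label_of_location(graph, subject):
    """Follow the locatedIn link, then read the label of the target."""
    return [
        label
        for place in objects(graph, subject, LOCATED_IN)
        for label in objects(graph, place, RDFS_LABEL)
    ]
```

Because the facility and the place are both identified by URIs, following a link from one to the other is just another lookup in the same set of triples. That is the loose coupling referred to above.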

There is a project to create freely available data on the Web in this way, which is known as the Linked Open Data project.

W3C sees Linked Data as the set of best practices and technologies to support worldwide data access, integration and creative re-use of authoritative data.

Page 18: US National Archives & Open Government Data

18

September 2011: 295 datasets meet the LOD Cloud criteria. Together they consist of over 31 billion RDF triples, interlinked by around 504 million links.

Page 19: US National Archives & Open Government Data

Callimachus
http://callimachusproject.org
http://3roundstones.com

19

Callimachus is that platform. It is available via 3roundstones.com or its Open Source site callimachusproject.org.

Page 20: US National Archives & Open Government Data

[Diagram: CONTENT MANAGEMENT SYSTEM compared with LINKED DATA MANAGEMENT SYSTEM (Callimachus); labels: UNSTRUCTURED TEXT, STRUCTURED DATA]

20

Callimachus may be compared to a distributed CMS. CMSs manage mostly unstructured information; Callimachus, by contrast, manages primarily structured Linked Data. We call this a Linked Data management system.

Page 21: US National Archives & Open Government Data

21

Clinical Trials + enterprise linked data

US Legislation + enterprise data

DBpedia + enterprise datasets

Data driven Web apps using Callimachus


Callimachus integrates (very) well with other enterprise systems as well as Web content. It can form an entire application or part of one.

NB: Mention Documentum, Oracle via HTTP

Page 22: US National Archives & Open Government Data

22

• US HHS committed to making a vast array of open data more readily available to improve health care delivery & reduce costs in 2013 and beyond.

• In 2012, Sentara created a Web application that integrates authoritative data from 5 different sources including content from NLM, NOAA, EPA and DBpedia

• This application utilizes open data, open standards and an open source data platform

Page 23: US National Archives & Open Government Data

[Diagram: User connected to data sources: NOAA, US EPA AirNow, DBpedia, National Library of Medicine, US EPA SunWise]

23

Page 24: US National Archives & Open Government Data

US EPA Linked Data

• Cloud-based Linked Data provision of 3 core programs:
• 2.9M Facilities
• 100K substances
• 25 years of toxic pollution reports
• FISMA compliant
• 16 Callimachus templates
• Official launch March 2013

24

Page 25: US National Archives & Open Government Data

25

Envirofacts, EPA’s older system.

Page 26: US National Archives & Open Government Data

26

EPA’s new Linked Data system. Cooperation without coordination. Data reuse breaks the back of API gridlock. Clay Shirky stole that from me :)

Page 27: US National Archives & Open Government Data

27

This data is exactly the same data used to create the interface. Unlike traditional database-driven applications, the data is immediately accessible for reuse by third parties. This prevents data duplication, allows for tracking of provenance and avoids reinventing the wheel.

Page 28: US National Archives & Open Government Data

We’ve Seen This Before

28

Like HTML and RDF, credit cards have a human-readable side and a machine-readable side.

Page 29: US National Archives & Open Government Data

[Architecture diagram. Actors: Contractor (3 Round Stones, Inc.), Public, Registered developer. Clients: Web Browser; Application, Script or automated client. Interfaces: SPARQL endpoint, REST API, Resource URIs. Back end: Linked Data management system located at a Tier 1 Cloud Provider (FISMA compliant), with an RDF Database.]

29

Introduce Callimachus, an open source, open data platform based on open standards. 3 Round Stones provides commercial support for Callimachus and is a major contributor to the OS project.

Users of Callimachus see a generated Web interface, but can also directly access the data via REST or SPARQL.

SPARQL Named Queries (like stored procedures) allow for automated conversion to different formats for reuse in non-RDF environments.
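As an illustration of reuse in a non-RDF environment, the sketch below flattens a SPARQL result set into CSV on the client side. It assumes the results arrive in the W3C SPARQL 1.1 Query Results JSON Format; the sample data, variable names and the function `sparql_json_to_csv` are made up for this example, and it does not show how Callimachus itself performs format conversion.

```python
import csv
import io

# Hand-written sample in the SPARQL 1.1 Query Results JSON Format.
# Variable names and values are hypothetical, for illustration only.
SAMPLE_RESULTS = {
    "head": {"vars": ["facility", "year", "pounds"]},
    "results": {
        "bindings": [
            {
                "facility": {"type": "uri", "value": "http://example.gov/facility/123"},
                "year": {"type": "literal", "value": "2004"},
                "pounds": {"type": "literal", "value": "12.5"},
            },
            {
                # "pounds" is unbound in this row, which the format allows.
                "facility": {"type": "uri", "value": "http://example.gov/facility/456"},
                "year": {"type": "literal", "value": "2004"},
            },
        ]
    },
}


def sparql_json_to_csv(results):
    """Flatten SPARQL JSON results into CSV text, one row per solution."""
    columns = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for binding in results["results"]["bindings"]:
        # Unbound variables become empty cells.
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
    return buffer.getvalue()
```

The resulting text can be opened in a spreadsheet or loaded into a relational database, so a consumer needs no RDF tooling at all.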

Page 30: US National Archives & Open Government Data

From Wikipedia

From EPA

Open Street Map

30

Data may be easily combined from several sources.
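A rough sketch of why combining is easy: when two sources name the same thing with the same URI, merging their data is a set union, and the statements line up without a mapping step. The graphs, URIs and values below are hypothetical stand-ins for the sources on the slide, and the sketch ignores blank nodes, which need extra care in real RDF merges.

```python
# Two small graphs as sets of (subject, predicate, object) triples.
# All URIs and values are hypothetical, for illustration only.
CITY = "http://example.org/place/Exampleville"

from_epa = {
    ("http://example.gov/facility/123", "http://example.gov/vocab/city", CITY),
    ("http://example.gov/facility/123", "http://example.gov/vocab/name", "Example Cement Plant"),
}
from_wikipedia = {
    (CITY, "http://example.org/vocab/population", "12345"),
}


def merge(*graphs):
    """Merge graphs by set union; shared URIs connect the statements."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every (predicate, object) pair stated about a subject."""
    return {(p, o) for s, p, o in graph if s == subject}


combined = merge(from_epa, from_wikipedia)
```

After the merge, a client can start at the facility, follow the city link, and read the population that came from the second source.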

Page 31: US National Archives & Open Government Data

US GPO

• Cloud-based Linked Data provision of persistent URLs for US Government documents:
• 33K documents
• Used by 1,240 Federal Depository Libraries and the public
• In 3rd year of operation
• Deemed an “Essential service” supporting US Congress

31

Page 32: US National Archives & Open Government Data

Real World Linked Data

32

Now let’s look at the same workflow in the Linked Data Service.

Page 33: US National Archives & Open Government Data

Finding Hanson Permanente

33

By keeping the application simple, and letting the results be viewed either as a table or a map, the user can adjust their search as they see fit without extra navigation. Also, because the data is in a table that can be searched or sorted however the user likes, finding a specific facility is as easy as typing in its name or sorting on relevant criteria. This is made possible by exposing the data, rather than containing it in a standard HTML table.

I fully recognize that Envirofacts could offer identical functionality by tweaking their application, but the key underlying point is that this application was created very cheaply and quickly *because* the data is modeled as Linked Data. When the development environment is a Web browser, and the data is described and linked, an application can be a simple XHTML page with JavaScript instead of a heavyweight dedicated application.

Page 34: US National Archives & Open Government Data

Finding Mercury Released in 2004

1

2

34

There are two very important things to note on this page. 1 is that on any facility’s page, there is always an option to download the data. This data is available in two formats (RDF/XML and Turtle). With the click of a button a user can have all of the data that was used to drive the creation of the current page, which means he or she can repurpose that data into any new application. Note here that this download is not an extract, summary, or recreation of the data - it is literally the *same* data that was used to drive that page.

2 is that because this page is “data-driven”, navigation relies on exploring the data, not the system that contains it. On the same page where we get information such as the facility’s latitude and longitude, we can also find a link to a report detailing exactly how much mercury was released in 2004. We could easily do an in-page search for 2004 or Mercury to identify the releases associated with those terms.
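An automated client could ask for the same two download formats, RDF/XML and Turtle, over HTTP. The sketch below builds (but does not send) such a request using the registered media types for those formats. It assumes the service honors the HTTP Accept header for format selection, which the notes here do not confirm; the facility URI and the function `build_data_request` are hypothetical.

```python
from urllib.request import Request

# Registered media types for the two formats named in the notes.
FORMATS = {
    "turtle": "text/turtle",
    "rdfxml": "application/rdf+xml",
}


def build_data_request(resource_uri, fmt):
    """Build (but do not send) an HTTP GET asking for one RDF serialization.

    Assumption: the server selects the format from the Accept header.
    """
    return Request(resource_uri, headers={"Accept": FORMATS[fmt]}, method="GET")


# Hypothetical facility URI, for illustration only.
request = build_data_request("http://example.gov/facility/123", "turtle")
```

Passing the request to `urllib.request.urlopen` would perform the download; it is left out here so the sketch runs without network access.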

Page 35: US National Archives & Open Government Data

TRI Report

35

Rather than aggregating the data for presentation, the actual report is presented with the raw data continuously available in the top right of the page.

A subtle difference to be pointed out here is the difference in the name of the facility. Previously it was identified as Hanson Permanente, but now it is known as Lehigh Southwest Cement Co. During the modeling phase, the Linked Data was created to implicitly include this relationship (which is known via the mapping of EPA FRS identifiers). On the other hand, pulling down the CSV files would not give the user any obvious way of understanding this relationship.
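The idea can be sketched with a few triples: two records carry different facility names but point at the same registry identifier, so a client can discover that they describe one facility. The URIs, the vocabulary and the identifier value are hypothetical; only the two facility names come from the example above, and the real EPA model may be structured differently.

```python
# Hypothetical triples: two reporting records with different names,
# both mapped to the same (made-up) registry identifier URI.
REGISTRY_ID = "http://example.gov/frs/000000000"
HAS_ID = "http://example.gov/vocab/registryId"
NAME = "http://example.gov/vocab/name"

triples = {
    ("http://example.gov/report/old", NAME, "Hanson Permanente"),
    ("http://example.gov/report/old", HAS_ID, REGISTRY_ID),
    ("http://example.gov/report/new", NAME, "Lehigh Southwest Cement Co."),
    ("http://example.gov/report/new", HAS_ID, REGISTRY_ID),
}


def names_for_identifier(graph, identifier):
    """Find every name used by records that share one registry identifier."""
    records = {s for s, p, o in graph if p == HAS_ID and o == identifier}
    return sorted(o for s, p, o in graph if s in records and p == NAME)
```

In a flat CSV export the two names would simply appear as unrelated strings in different rows; here the shared identifier URI is what ties them together.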

Page 36: US National Archives & Open Government Data

Data Reuse

36

Developers can grab the data off any page, at any time during navigation. The site facilitates the reuse of data. These graphs are not natively embedded in the webpage of a given facility. Rather, by downloading the data the user can quickly and easily make new and different visualizations for a report or presentation in 10 minutes.

For example, this history of air stack pollution reports was made with a single parameterized SPARQL query and a single JavaScript pattern. This could very easily be applied to any number of facilities, changed to a bar graph, or altered in any number of other ways with very little effort thanks to the fact it was modeled using Linked Data.
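A parameterized query of this kind might look like the following sketch, in which the facility URI is the only parameter. The vocabulary (`ex:report`, `ex:year`, `ex:airStackPounds`), the endpoint URL and the facility URI are all hypothetical, not the actual EPA model. Sending the query text in a `query` URL parameter follows the SPARQL 1.1 Protocol.

```python
from string import Template
from urllib.parse import urlencode

# Hypothetical vocabulary; the real EPA model is not shown in the slides.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.gov/vocab/>
SELECT ?year ?pounds
WHERE {
  <$facility> ex:report ?report .
  ?report ex:year ?year ;
          ex:airStackPounds ?pounds .
}
ORDER BY ?year
""")


def build_query(facility_uri):
    """Fill in the facility parameter, refusing unsafe input."""
    # These characters may not appear inside a SPARQL IRI reference,
    # so rejecting them prevents the parameter from altering the query.
    if any(ch in facility_uri for ch in '<>"{}|^`\\ '):
        raise ValueError("facility_uri is not a safe IRI")
    return QUERY_TEMPLATE.substitute(facility=facility_uri)


def build_request_url(endpoint, facility_uri):
    """SPARQL 1.1 Protocol: send the query via GET in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": build_query(facility_uri)})
```

Reusing the chart for another facility then means calling `build_request_url` with a different URI; the query text and the JavaScript that draws the graph stay the same.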

Page 37: US National Archives & Open Government Data

Potential Audience

• Middle school student doing a science project

• Concerned citizen worried about local pollution

• Environmental Science PhD from EPA

• Doctor from NIH writing a research paper

37

Linked Data allowed us to reach all the members of our potential audience by giving the user options, aggregating based on relevance rather than data source, and by exposing the data that drives the service for reuse.

The middle school student or concerned citizen who wants to know the location of a facility, the amount of a particular chemical it released, and the year it was released never has to click any of the options in the Linked Data box. They can simply use the interface, explore the data, and find what they need in a read-only experience.

The Environmental Science PhD is still able to find what they are looking for with Linked Data, but can do so in a much more intuitive way. The doctor from NIH is now able to find the data they’re interested in and, if they choose to take the next step, download the actual data behind the page. By quickly and easily obtaining the raw data, anyone from scientists to journalists can generate their own applications without any knowledge of the Linked Data Service itself.

Page 38: US National Archives & Open Government Data

http://www.manning.com/dwood/

http://3roundstones.com/linking-government-data/

http://3roundstones.com/linking-enterprise-data/

38

Page 39: US National Archives & Open Government Data

39

Page 40: US National Archives & Open Government Data

The mission of the Government Linked Data (GLD) Working Group is to provide standards and other information which help governments around the world publish their data as effective and usable Linked Data using Semantic Web technologies.

40

We are 16 months into the Government Linked Data Working Group’s two-year charter.

Page 41: US National Archives & Open Government Data

Credits

David Newman, Gartner: “Innovation Insight: Linked Data Drives Innovation Through Information-Sharing Network Effects” Published: 15 December 2011

David Wood, ed. Linking Government Data, Springer (2011) http://3roundstones.com/linking-government-data/

US Executive Branch

Digital Government Strategy: Building a 21st Century Platform to Better Serve the American People, http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html

W3C Linked Data Cookbook http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

All other photos and images © 2010-2013 3 Round Stones, Inc. and released under a CC-by-sa license

41

Page 42: US National Archives & Open Government Data

This work is Copyright © 2011-2012 3 Round Stones Inc.
It is licensed under the Creative Commons Attribution 3.0 Unported License
Full details at: http://creativecommons.org/licenses/by/3.0/

You are free:

to Share — to copy, distribute and transmit the work

to Remix — to adapt the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

42

This presentation is licensed under a Creative Commons BY-SA license, allowing you to share and remix its contents as long as you give us attribution and share alike.