transparency, collaboration and information sharing solution architecture tools and techniques using the social data web george thomas, 1105 ea2009

The Social Data Web


DESCRIPTION

This presentation is the culmination of my detail to the E-Government Office in the US Office of Management and Budget and the work I did to evolve and mature initiatives like recovery.gov and data.gov.


Page 1: The Social Data Web

transparency, collaboration and information sharing

solution architecture tools and techniques using the social data web

george thomas, 1105 ea2009

Page 2: The Social Data Web

agenda

• An overview of Web Oriented Architecture (WOA) design principles that have made the Web the most successful distributed computing platform ever created will be given.

• Technologies for exposing raw data and publishing semantically enriched structured data for persistence and syndication on the Web as public records will be described.

• Technologies that enable interoperability across these published assets and currently disparate data sources to achieve low-cost, large-scale data federation will be described.

• Widgets and services that consume and transform this data for interactive and integration purposes will be discussed in the context of different stakeholder views.

• A Web-scale approach to Business Intelligence that applies Cloud Computing to data archive analysis will be described.

• Finally, the applicability of the proposed solution architecture to the Federal Segment Architecture Methodology and to tools like Visualization to Understand Expenditures in IT will be discussed.

Page 3: The Social Data Web

agenda

Page 4: The Social Data Web

Web Oriented Architecture (WOA)

• REpresentational State Transfer (REST)

– the architectural style of the World Wide Web

– aka Resource Oriented Architecture (ROA)

• hyperlinks dereference (information) resource representations

– HTTP URIs and content negotiation

• the user agent states which format it prefers (HTML, XML, RDF, etc.) in the HTTP Accept header

• stateless interaction

– servers maintain resource state, clients maintain application state

• RESTful Web services

– HTTP uniform interface

• CRUD maps to HTTP POST/GET/PUT/DELETE

– contrast with Remote Procedure Call (RPC) style Web services

• SOAP/WSDL, where you design the methods to invoke

• global visibility (the Web) and persistence (permalinks)

– caching, crawling, indexing
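Content negotiation is what lets one URI serve both people and machines. The sketch below shows how a server might pick a representation from an Accept header. It is a simplified illustration, assuming the server holds HTML, RDF/XML and plain XML versions of a resource; it honours q-values and wildcards but ignores media-type parameters and the HTTP specificity rules, so it is not a complete implementation of the spec.

```python
from __future__ import annotations

# Simplified Accept-header negotiation: q-values and wildcards only.


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    # sorted() is stable, so equal q-values keep the client's original order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a range such as 'text/*' or '*/*' covers a concrete media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    main_type, _, sub_type = media_type.partition("/")
    return range_type == main_type and range_sub in ("*", sub_type)


def negotiate(accept_header: str, available: list[str]) -> str | None:
    """Return the representation the user agent prefers, or None (HTTP 406)."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:  # q=0 means "not acceptable"
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


# Hypothetical set of representations held for one resource
AVAILABLE = ["text/html", "application/rdf+xml", "application/xml"]
print(negotiate("application/rdf+xml, text/html;q=0.5", AVAILABLE))
```

A browser sending its usual `text/html,...` header gets the HTML page, while a Linked Data client asking for `application/rdf+xml` gets RDF from the same URI.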

Page 5: The Social Data Web

agenda

Page 6: The Social Data Web

XForms - human data capture

• Orbeon server-side XForms engine, Ajax browser GUIs

• catalog and builder apps

• create new XSD-bound forms

• populate, persist, search

• Tomcat and eXist

• off-line capability

• transformation pipeline

Page 7: The Social Data Web

Atom Publishing Protocol (APP)

• automated invocation of the RESTful Web service

– HTTP POST (create) or PUT (update) the spreadsheet or XML instance doc to atomserver.codehaus.org

• where else is APP used?

– Google Data APIs, Microsoft Live Framework
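The sketch below shows what "automated invocation" of an APP service can look like from a client, using only the Python standard library. It builds, but does not send, the requests: a POST of an Atom entry to a collection, a PUT to a member's edit URI, and a POST of a raw file (such as a spreadsheet export) as a media resource with a Slug header. The collection URI, entry values and file name are hypothetical placeholders, not atomserver.codehaus.org's actual layout.

```python
from urllib.request import Request
from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ENTRY_TYPE = "application/atom+xml;type=entry"


def atom_entry(title: str, entry_id: str, updated: str, author: str) -> bytes:
    """Serialize a minimal Atom entry document to UTF-8 bytes."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)


def create_entry(collection_uri: str, entry_xml: bytes) -> Request:
    """POST to a collection creates a member; the server replies 201 Created + Location."""
    return Request(collection_uri, data=entry_xml, method="POST",
                   headers={"Content-Type": ENTRY_TYPE})


def update_entry(edit_uri: str, entry_xml: bytes) -> Request:
    """PUT a full replacement representation to an existing member's edit URI."""
    return Request(edit_uri, data=entry_xml, method="PUT",
                   headers={"Content-Type": ENTRY_TYPE})


def create_media(collection_uri: str, payload: bytes, media_type: str, slug: str) -> Request:
    """POST a non-Atom file as a media resource; Slug suggests a name for it."""
    return Request(collection_uri, data=payload, method="POST",
                   headers={"Content-Type": media_type, "Slug": slug})


COLLECTION = "https://example.org/app/recordsets"  # hypothetical collection URI
entry = atom_entry("Exhibit 53 recordset", "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
                   "2009-11-01T00:00:00Z", "Example Agency")
req = create_entry(COLLECTION, entry)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline
```

Because the verbs are the ordinary HTTP uniform interface, the same client code works against any APP-compliant server; only the collection URI changes.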

Page 8: The Social Data Web

Atom Syndication Format

• transform info captured via XForms or APP into XHTML+RDFa

• (permalinked) public recordset in feed entry <content>
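As a sketch of the target shape, the code below builds an Atom entry whose `<content type="xhtml">` carries one XHTML `div` annotated with RDFa attributes, so the syndicated record is readable by people and parseable as data. The `ex:` vocabulary, the permalink and the field names are hypothetical, and the `prefix` attribute is RDFa 1.1 syntax, assumed here for brevity rather than taken from the original deck.

```python
from xml.etree import ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"
VOCAB_PREFIX = "ex: http://example.org/vocab#"  # hypothetical vocabulary


def recordset_entry(permalink: str, title: str, updated: str, author: str,
                    fields: dict[str, str]) -> ET.Element:
    """Wrap one public record as an Atom entry with XHTML+RDFa content."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = permalink
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    author_el = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author_el, f"{{{ATOM}}}name").text = author
    # The permalink is the entry's alternate (human-readable) representation
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="alternate", href=permalink)
    # type="xhtml" means <content> holds exactly one XHTML div
    content = ET.SubElement(entry, f"{{{ATOM}}}content", type="xhtml")
    div = ET.SubElement(content, f"{{{XHTML}}}div",
                        {"about": permalink, "prefix": VOCAB_PREFIX})
    for prop, value in fields.items():
        # Each field becomes visible text plus an RDFa property on that text
        ET.SubElement(div, f"{{{XHTML}}}span", property=f"ex:{prop}").text = value
    return entry


entry = recordset_entry("http://example.org/recordsets/1", "Recordset 1",
                        "2009-11-01T00:00:00Z", "Example Agency",
                        {"agency": "HHS", "investment": "Nat Health Info Network Connect"})
print(ET.tostring(entry, encoding="unicode"))
```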

Page 9: The Social Data Web

The London Gazette (london-gazette.co.uk)

Page 10: The Social Data Web

london-gazette.co.uk/listing

small, discrete, component ontologies/data-domain metamodels

Page 11: The Social Data Web

web page = web service
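"Web page = web service" works because RDFa attributes make the page's own markup machine-readable. The extractor below is a deliberately tiny illustration built on Python's `html.parser`: it only understands `about`, `property` and `content`, does not expand CURIE prefixes, and ignores `typeof`, `rel`/`rev` chaining and most RDFa inheritance rules, so it is not a conformant RDFa processor. The sample markup and URIs are hypothetical.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "wbr"}


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from about/property/content."""

    def __init__(self, base_uri: str):
        super().__init__()
        self.subjects = [base_uri]  # stack: the nearest @about wins
        self.frames = []            # open elements that may still collect text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        pushed = "about" in a
        if pushed:
            self.subjects.append(a["about"])
        subject = self.subjects[-1]
        prop = a.get("property")
        if prop and "content" in a:
            self.triples.append((subject, prop, a["content"]))
            prop = None  # value already taken from @content
        if tag in VOID:  # void elements never get an end tag
            if pushed:
                self.subjects.pop()
            return
        self.frames.append({"tag": tag, "pushed": pushed, "subject": subject,
                            "prop": prop, "text": []})

    def handle_data(self, data):
        for frame in self.frames:  # a property's value includes nested text
            if frame["prop"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not self.frames or self.frames[-1]["tag"] != tag:
            return  # tolerate unbalanced markup by ignoring stray end tags
        frame = self.frames.pop()
        if frame["prop"]:
            value = "".join(frame["text"]).strip()
            self.triples.append((frame["subject"], frame["prop"], value))
        if frame["pushed"]:
            self.subjects.pop()


PAGE = """
<div about="http://example.org/notice/123">
  <h2 property="dc:title">Notice of Appointment</h2>
  <span property="dc:date" content="2009-11-02">2 November 2009</span>
  <meta property="dc:publisher" content="Example Gazette">
</div>
"""
extractor = TinyRDFaExtractor("http://example.org/page")
extractor.feed(PAGE)
extractor.close()
for triple in extractor.triples:
    print(triple)
```

The same page a person reads in a browser yields three triples to a program, with no separate API to build or maintain.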

Page 12: The Social Data Web

RDFa-enabled 'deep link' discovery

• Rich Snippets from Google

• SearchMonkey from Yahoo

Page 13: The Social Data Web

agenda

Page 14: The Social Data Web

goal: federated dataset correlation

• graph-based dynamic schema evolution across silos

– centralization/normalization not required (or realistic/practical!)
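Why centralization is not required: when every silo publishes triples, federating them is a set union, and the "schema" is just whichever predicates happen to be in use. The sketch below illustrates that with two hypothetical silos; the `ex:` identifiers are made-up placeholders, and a real deployment would use full URIs and an RDF store rather than Python sets.

```python
Triple = tuple[str, str, str]

# Two silos publish independently; neither knows the other's table layout.
SILO_A: set[Triple] = {
    ("ex:inv42", "ex:name", "Nat Health Info Network Connect"),
    ("ex:inv42", "ex:maintainedBy", "ex:HHS"),
}
SILO_B: set[Triple] = {
    ("ex:inv42", "ex:supportsGoal", "ex:HealthCareReform"),  # a brand-new property
    ("ex:HHS", "ex:label", "Health and Human Services"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Federate graphs by set union: no shared schema or normalization step."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def schema_of(graph: set[Triple]) -> set[str]:
    """The effective schema is the set of predicates in use, so it grows with the data."""
    return {p for _, p, _ in graph}


def describe(graph: set[Triple], subject: str) -> dict[str, set[str]]:
    """Gather everything known about one subject, whichever silo it came from."""
    props: dict[str, set[str]] = {}
    for s, p, o in graph:
        if s == subject:
            props.setdefault(p, set()).add(o)
    return props


FEDERATED = merge(SILO_A, SILO_B)
print(describe(FEDERATED, "ex:inv42"))
```

Adding `ex:supportsGoal` needed no migration in silo A: the new property simply appears once silo B's triples are merged, which is the "dynamic schema evolution" the slide refers to.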

Page 15: The Social Data Web

Web as DB - Web API

• Linking Open (Government) Data (LOD)

• SPARQL endpoints

linkeddata.org
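A SPARQL endpoint is itself a RESTful Web API: a query travels as a URL-encoded `query` parameter on an HTTP GET, and the client asks for a result format via the Accept header. The sketch below only constructs the request. The endpoint URL is an assumption, and the `owl:sameAs` query is illustrative; what it returns depends on the live data.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed endpoint URL

QUERY = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?same WHERE {
  <http://dbpedia.org/resource/Peter_O%27Toole> owl:sameAs ?same
} LIMIT 10"""


def sparql_get(endpoint: str, query: str) -> Request:
    """Build a query-via-GET request that asks for SPARQL JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get(DBPEDIA_ENDPOINT, QUERY)
print(req.full_url)
# urllib.request.urlopen(req) would run it against the live endpoint (not done here)
```

Because the whole query lives in the URL, query results can be linked to, bookmarked and cached like any other Web resource.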

Page 16: The Social Data Web

browse: from web of docs to web of data

Page 17: The Social Data Web

http://data.linkedmdb.org/page/actor/10

• content negotiation: the user agent prefers either

– human-readable (HTML) or machine-readable (RDF/XML) data

RDF/N3

Page 18: The Social Data Web

http://data.linkedmdb.org/page/actor/10

• now at the bottom of the same page/actor/10

– a triple is Subject (S) Predicate (P) Object (O)

• 10 (S) vocabulary:property (P) <object> (O)

– properties link to instances in other datasets

• that use different datatype definitions

– note the D2R Server app: it exposes a relational database (RDB) as RDF and translates SPARQL to SQL
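The core idea behind exposing an RDB as RDF can be shown in a few lines: each row gets a URI, its table becomes an `rdf:type`, and each non-key column becomes a property. The sketch below is a toy mapping over an in-memory SQLite table; it is not D2RQ's mapping language, and the base URI and vocabulary namespace are hypothetical.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://data.example.org/resource/"  # hypothetical URI space for rows
VOCAB = "http://data.example.org/vocab/"    # hypothetical property namespace


def table_to_triples(conn: sqlite3.Connection, table: str, key: str) -> list[tuple]:
    """Toy row-to-triples mapping: row -> subject URI, column -> property."""
    # table and key are trusted identifiers here; never interpolate untrusted input into SQL
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        triples.append((subject, RDF_TYPE, f"{VOCAB}{table}"))
        for column, value in record.items():
            if column != key and value is not None:  # SQL NULL: simply no triple
                triples.append((subject, f"{VOCAB}{column}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, actor_name TEXT)")
conn.execute("INSERT INTO actor VALUES (?, ?)", (10, "Peter O'Toole"))
TRIPLES = table_to_triples(conn, "actor", "actor_id")
for triple in TRIPLES:
    print(triple)
```

D2R Server goes further by rewriting incoming SPARQL into SQL at query time instead of dumping every row, but the row/column-to-triple correspondence is the same.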

Page 19: The Social Data Web

http://data.linkedmdb.org/data/actor/10

• syntax: <subject> predicate object1 , objectN ; (next predicate and its objects) repeating until the closing .

<http://data.linkedmdb.org/resource/actor/10> foaf:page <http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e> ,

<http://www.imdb.com/name/nm0000564/> ;

owl:sameAs <http://mpii.de/yago/resource/Peter_O%27Toole> , <http://dbpedia.org/resource/Peter_O%27Toole> ;

rdf:type movie:actor ,

foaf:Person .

• this is an 'N3' RDF serialization, instead of RDF/XML (or others)

• some properties have RESTful SPARQL queries as <objects>

foaf:Person rdfs:seeAlso <http://data.linkedmdb.org/sparql?query=DESCRIBE+%3Chttp://xmlns.com/foaf/0.1/Person%3E>
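To make the `;` and `,` abbreviations concrete, the sketch below takes the actor/10 description from the N3 above (prefixes left unexpanded) and expands it into one full triple per line, then re-abbreviates it. It only illustrates the abbreviation rules; it is not an N3 or Turtle parser.

```python
# Subject and predicate-object lists copied from the N3 example above.
SUBJECT = "<http://data.linkedmdb.org/resource/actor/10>"
PREDICATE_OBJECTS = [
    ("foaf:page", ["<http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e>",
                   "<http://www.imdb.com/name/nm0000564/>"]),
    ("owl:sameAs", ["<http://mpii.de/yago/resource/Peter_O%27Toole>",
                    "<http://dbpedia.org/resource/Peter_O%27Toole>"]),
    ("rdf:type", ["movie:actor", "foaf:Person"]),
]


def expand(subject, predicate_objects):
    """';' keeps the subject and starts a new predicate; ',' keeps both and adds an object."""
    return [(subject, p, o) for p, objects in predicate_objects for o in objects]


def abbreviate(subject, predicate_objects):
    """Write the same triples back in the compact ';' / ',' form, ending with '.'."""
    groups = [f"{p} {' , '.join(objects)}" for p, objects in predicate_objects]
    return f"{subject} " + " ;\n    ".join(groups) + " ."


TRIPLES = expand(SUBJECT, PREDICATE_OBJECTS)
for s, p, o in TRIPLES:
    print(s, p, o, ".")
print(abbreviate(SUBJECT, PREDICATE_OBJECTS))
```

Three predicates with two objects each yield six triples, which is why the abbreviated form is so much shorter than writing every triple in full.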

Page 20: The Social Data Web

Web based SPARQL query builder

DBpedia (http://dbpedia.org/) is powered by OpenLink 'Virtuoso' (http://www.openlinksw.com), which provides a 'SPARQL endpoint' (a 'query point' in Data Reference Model (DRM) terms)

Page 21: The Social Data Web

creates dbpedia.org query

• use response data in next query
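Using one response to drive the next query is straightforward once results come back in the SPARQL JSON results format. The sketch below pulls the URI bindings for one variable out of a SELECT result and turns them into a follow-up DESCRIBE query. The sample response and its URIs are hypothetical, and the character check is a minimal guard against building a malformed query, not full IRI validation.

```python
import json

# A response in the SPARQL JSON results format; the URIs are hypothetical sample values.
RESPONSE = json.loads("""
{"head": {"vars": ["film"]},
 "results": {"bindings": [
   {"film": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Film_A"}},
   {"film": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Film_B"}},
   {"film": {"type": "literal", "value": "not a resource"}}
 ]}}
""")

FORBIDDEN_IRI_CHARS = set('<>"{}|\\^` ')


def uri_values(result: dict, var: str) -> list[str]:
    """Return the URI bindings for one variable, skipping literals and unbound rows."""
    return [b[var]["value"] for b in result["results"]["bindings"]
            if var in b and b[var]["type"] == "uri"]


def describe_query(uris: list[str]) -> str:
    """Build the next query from the previous answers (DESCRIBE accepts several IRIs)."""
    for uri in uris:
        if FORBIDDEN_IRI_CHARS & set(uri):
            raise ValueError(f"not safe to embed as an IRI: {uri!r}")
    return "DESCRIBE " + " ".join(f"<{u}>" for u in uris)


NEXT = describe_query(uri_values(RESPONSE, "film"))
print(NEXT)
```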

Page 22: The Social Data Web

authoritative metadata - provided tags!!

• using standardized datatype and property specifications

• ontologies emerge from social folksonomy

http://commontag.org

Page 23: The Social Data Web

agenda

Page 24: The Social Data Web

indexing/searching the Data Web

Page 25: The Social Data Web

aggregation and live data reporting

http://sig.ma

Page 26: The Social Data Web

many to many set visualization

http://mqlx.com/~david/parallax interface used to aggregate data across multiple (data) 'bases' on

http://freebase.com

Page 27: The Social Data Web

ad-hoc analyst/end-user 'meshups'

Page 28: The Social Data Web

schema/bizmo/federal_enterprise

• bizmo.freebase.com = OMG Business Motivation Model (BMM) + Capital Planning and Investment Control (CPIC) (+SOA...)

– Obama is an instance of the Federal Enterprise type

• Federal Enterprise (S) Fed Ent Goal (P) Goal (O)

Page 29: The Social Data Web

/rdf/bizmo.federal_enterprise (excerpt)

• (W3C/FBase) <subject/topic> <predicate/property> <object/topic>

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.instance> <http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000c61962c>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_strategy>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_tactic>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_directive>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_objective>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_information_technology_budget>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_goal>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://www.w3.org/1999/xhtml/vocab#license> <http://creativecommons.org/licenses/by/3.0/>.

Page 30: The Social Data Web

connecting the data dots:

• create the following subject/predicate/object or topic/property/topic

schema:

Goal / amplifies / Vision

Objective / quantifies / Goal

Federal Enterprise / (has) Fed Ent Goal / (of type) Goal

Federal Agency / maintains / Exhibit 53

Exhibit 53 / contains (multiple) / Exhibit 53 Recordset(s)

Exhibit 53 Recordset / Supports Federal Goal / (of type) Goal

• then create instances with data from http://it.usaspending.gov:

Obama / is of type / Federal Enterprise

Obama / has a Fed Ent Goal / Health Care Reform

HHS / is of type / Federal Agency

HHS / maintains / HHS Exhibit 53

HHS Exhibit 53 / contains / Nat Health Info Network Connect

Nat Health Info Network Connect / supports Obama Goal / Health Care Reform
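Once the instances above exist as triples, answering a question is pattern matching, with an unknown position playing the role of the '?' in a query. The sketch below stores the slide's instance triples using their labels as identifiers (a real base would use URIs) and matches patterns in which `None` is the variable. It is an in-memory illustration, not how Freebase or a SPARQL engine is implemented.

```python
Triple = tuple[str, str, str]

# Instance triples from the slide; labels stand in for URIs.
GRAPH: set[Triple] = {
    ("Obama", "is of type", "Federal Enterprise"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
    ("HHS", "is of type", "Federal Agency"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
}


def match(graph: set[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching a pattern; None acts as the '?' variable."""
    return sorted(t for t in graph
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


# "Which things point at Health Care Reform, and how?"
for triple in match(GRAPH, o="Health Care Reform"):
    print(triple)
```

Both the Administration goal and the IT investment that supports it surface from a single pattern, even though they were entered as separate facts.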

Page 31: The Social Data Web

search all 'bases' for 'Exhibit 53'

http://mqlx.com/~david/parallax interface to http://bizmo.freebase.com

Page 32: The Social Data Web

base/bizmo/e53 returns

• a collection (2 instances) of an Exhibit 53 topic

– one each from HHS and GSA (data from it.usaspending.gov)

• triple in Exhibit 53 topic schema

– Exhibit 53 (S) contains (P) Exhibit 53 Recordset (O)

Page 33: The Social Data Web

discovering unknown data structures

• the power of 'faceted' search and browsing

• interactive query – which of these?

– Ex53 Recordset (S) Supports Federal Goal (P) ? (O)
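Faceted browsing boils down to two operations: count the distinct values of a property across the current items (the numbers shown beside each facet), and narrow the items when the user picks a value. The sketch below applies both to three hypothetical recordsets; the agency assignments and labels are placeholders, not the actual it.usaspending.gov data.

```python
from collections import Counter

# Hypothetical items: three recordsets, one of which supports an Administration goal
RECORDSETS = [
    {"label": "Recordset 1", "agency": "HHS", "supports_goal": "Health Care Reform"},
    {"label": "Recordset 2", "agency": "HHS"},
    {"label": "Recordset 3", "agency": "GSA"},
]


def facet_counts(items: list[dict], prop: str) -> Counter:
    """Count each distinct value of one property across the current items."""
    return Counter(item[prop] for item in items if prop in item)


def apply_facets(items: list[dict], selections: dict) -> list[dict]:
    """Keep only the items that match every selected facet value."""
    return [item for item in items
            if all(item.get(prop) == value for prop, value in selections.items())]


print(facet_counts(RECORDSETS, "agency"))
narrowed = apply_facets(RECORDSETS, {"agency": "HHS"})
print(facet_counts(narrowed, "supports_goal"))
```

Recomputing the counts after each selection is what lets a user discover a data structure they did not know in advance: facets with no remaining values simply disappear.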

Page 34: The Social Data Web

traversing the data graph

• from info about an IT investment

• to info about Administration priorities

• 2 Ex53s to 3 Recordsets to the 1 that has an Obama Goal

– <uri> (S) <uri> (P) <uri> (O)
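Traversing the data graph from an agency to an Administration priority is a path search over directed edges. The sketch below runs a breadth-first search over the triples from the "connecting the data dots" slide (labels standing in for URIs) and returns the chain of triples that links the two. It only follows edges forward, from subject to object.

```python
from collections import deque

EDGES = [
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "supports Obama Goal", "Health Care Reform"),
    ("Obama", "has a Fed Ent Goal", "Health Care Reform"),
]


def find_path(edges, start, goal):
    """Breadth-first search; returns the list of (s, p, o) hops, or None if unreachable."""
    outgoing = {}
    for s, p, o in edges:
        outgoing.setdefault(s, []).append((p, o))
    came_from = {start: None}  # node -> (previous node, predicate used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while came_from[node] is not None:  # walk back to the start
                prev, pred = came_from[node]
                path.append((prev, pred, node))
                node = prev
            return list(reversed(path))
        for pred, nxt in outgoing.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, pred)
                queue.append(nxt)
    return None


for hop in find_path(EDGES, "HHS", "Health Care Reform"):
    print(hop)
```

Breadth-first order guarantees the shortest chain of hops, which keeps the explanation a user sees ("HHS maintains, which contains, which supports") as short as possible.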

Page 35: The Social Data Web

http://freemix.it - more faceted filtering

Page 36: The Social Data Web

scatter chart driven by tag clouds

Page 37: The Social Data Web

more multi-dataset faceted meshups

Page 38: The Social Data Web

drag & drop metadata/data 'curation'

Page 39: The Social Data Web

publish the new freemix merged dataset: choose a stylesheet, plus the view lenses and facets your end users can interact with

Page 40: The Social Data Web

agenda

Page 41: The Social Data Web

crowdsourced analytics, shown using 'TopBraid Composer Maestro' from

http://topquadrant.com

'SPARQLMotion' script – also see Yahoo Pipes | DERI Pipes: http://pipes.yahoo.com | http://pipes.deri.org

Page 42: The Social Data Web

cloud-scale analytics (petabyte batch)

• proprietary Google

– GFS, Bigtable and MapReduce

– PageRank implementation

• open source Apache Hadoop

– HDFS, HBase and MapReduce

– entity and RDFa extraction

• Amazon EMR, Cloudera

– commercial open source (COSS) professional service providers
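The MapReduce programming model behind GFS/Hadoop-style batch analytics can be shown in a single process: a mapper emits key/value pairs, a shuffle groups them by key, and a reducer folds each group. The sketch below counts predicate usage across N-Triples-style lines; the shortened `ex:` IRIs are hypothetical. Hadoop's contribution is running the same three phases across a cluster, with the shuffle happening over the network, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    """Mapper: emit (predicate, 1) for a line shaped 'subject predicate object .'."""
    parts = line.split(None, 2)  # the object may contain spaces, so split at most twice
    if len(parts) == 3:
        yield parts[1], 1


def shuffle(pairs):
    """Group intermediate values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one predicate."""
    return key, sum(values)


def run_job(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


LINES = [
    '<ex:fed_ent> <ex:type.object.name> "Federal Enterprise"@en .',
    "<ex:fed_ent> <ex:type.type.properties> <ex:federal_enterprise_goal> .",
    "<ex:fed_ent> <ex:type.type.properties> <ex:federal_enterprise_objective> .",
    "",
]
print(run_job(LINES))
```

Because each mapper call looks at one line and each reducer at one key, both phases parallelize without coordination, which is what makes petabyte-scale batch runs feasible.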

facebook.com

Page 43: The Social Data Web

talis.com/platform - cloud graph store

• Software as a Service, enabling rapid development with zero deployment costs

• a simple, consistent web API for storing, managing and retrieving both structured and unstructured data

• flexible, schema-free metadata that allows applications to be easily evolved

• a range of data access and query options enabling easy integration into both new and existing applications

• access control options to support hosting of both public and private data

• a data hosting solution that is founded on open internet standards and web architectural best practices

• ...

• every resource in your (data)store has a unique URL from which its metadata can be retrieved with a single web request

• SPARQL queries can be used to perform more complex queries, retrieving results as a tabular result set or as RDF

• content negotiation can be used to retrieve data as RDF, XML, or JSON, allowing you to choose the right format for your application

Page 44: The Social Data Web

agenda

Page 45: The Social Data Web

application to the EA discipline: getting there from here

– stop:

• publishing / analyzing / visualizing unstructured data

• using structured data only in file or message exchanges

– start:

• align Gov and Web architecture (including EA KBs!)

• publish component ontologies on the Web

• and begin linking their metadata and data

• using the Social Data Web

– continue:

• embrace emergent structure and continuous improvement

• using open source and enabling long-tail crowd-sourcing

Page 46: The Social Data Web

q&a - discussion

• thanks for your time and attention!

• contact me

– http://xri.net/=george.thomas

– GSA OCIO Chief Enterprise Architect

– FCIOC-AIC Services Subcommittee Chair

– W3C eGov IG invited expert

– OMG GovDTF Steering Committee

– Graduate School Faculty SOA Instructor