
Digital Worlds (applications) · Digital Worlds (applications) ! VEC (Enterprise Scale) • 1,300 source databases • 10+ million views (via data integration) US Healthcare (National



Digital Worlds (applications)
❑ VEC (Enterprise Scale)
  • 1,300 source databases
  • 10+ million views (via data integration)
❑ US Healthcare (National Scale)
  • Scale
    o Health care and social assistance offices: 784,626, incl.
      • Doctors' offices: 220,131
      • Dentists: 127,057
      • Hospitals: 6,505
      • Clinics: ~5,000
      ~= SME, say 100 databases
    o Patients: 100-300+ million
    o Databases: ~32 million
  • Scope
    o Comprehensive medical events, methods, analysis, …
      • E.g., Alice (62) in Emergency Room with liver failure
    o Insurance, payments, …
    o New metric: healthcare quality
  • Examples
    o SHRINE (2009): 3 hospitals; uses 2,381,883 distinct concepts (ontologies)
    o HHS CIO (Todd Park): Open Health Data Initiative
    o US (PCAST, White House) vision


Observations
❑ Data Sources
  • Massive
    o Number
    o Heterogeneity
    o Distribution (data at source)
    o Constant change – data, model, ontology, business rules, …
  • Constrained
    o Governance: privacy, confidentiality, legal, …
    o Quality, correctness, precision, …
    o Competition
❑ Critical Requirement: meaningful
  • Human lives
  • Health of individuals, communities, nation
  • Economic impact: $ trillions / year
  • Political: meaningless debates


Trends
❑ Digital Universe
❑ Holistic Views
  • Information Ecosystems: data
  • Ecosystems: Processes over services
❑ Big Data: massive
    o Number
    o Distribution
    o Heterogeneity
      • Semantics
      • Structure: relational databases, X databases, web, deep web
      • Technology: databases, data warehouses, files, …
❑ New Models: problem solving, data, …
  • Data-driven
  • Social computing: data as social artifacts
  • Science: Wolfram Alpha
  • Pragmatics: Driven by healthcare quality improvement


Databases and AI: The Twain Just Met
❑ Database World
  • Engineering (RDBMSs) @ scale
  • Reasoning: Relational model (FoL)
❑ AI World
  • Reasoning: more powerful & expressive
  • Engineering: in the small
❑ Digital Universe, e.g., Web
  • Reasoning: beyond the RDM & AI?
  • Engineering: way beyond RDBMS
❑ Information ecosystems
  • Databases: join
  • Web: link

Power Law of Data
The value of a data element is proportional to the number of its meaningful uses.
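
The "Databases: join" versus "Web: link" contrast on this slide can be illustrated with a small sketch. Everything here is an assumption made for illustration: the tables, the example.org URLs, and the helper names `join` and `follow_links` do not come from the slides. The point is only that a join combines records through shared key values, while links connect resources through explicit references that are followed one hop at a time.

```python
# Hypothetical data; table contents and URLs are illustrative assumptions.

# Database style: two relations combined by a join on a shared key.
patients = [
    {"patient_id": 1, "name": "Alice"},
    {"patient_id": 2, "name": "Bob"},
]
visits = [
    {"patient_id": 1, "ward": "Emergency Room"},
    {"patient_id": 1, "ward": "Hepatology"},
]


def join(left, right, key):
    """Inner join of two lists of dicts on a shared key (simple hash join)."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


# Web style: resources identified by URLs and connected by explicit links.
web = {
    "http://example.org/alice": {"name": "Alice", "links": ["http://example.org/er"]},
    "http://example.org/er": {"name": "Emergency Room", "links": []},
}


def follow_links(graph, start):
    """Return the names of all resources reachable from start via links."""
    seen, stack, names = set(), [start], []
    while stack:
        url = stack.pop()
        if url in seen or url not in graph:
            continue
        seen.add(url)
        names.append(graph[url]["name"])
        stack.extend(graph[url]["links"])
    return names


joined = join(patients, visits, "patient_id")
reachable = follow_links(web, "http://example.org/alice")
```

The join needs both relations to agree on a key and its meaning up front; link traversal needs no shared schema but only reaches what someone has explicitly linked.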


What Underlies the Digital Universe
[Figure. Labels: Languages; Modelling; Semantics; Data Models; Problem Solving; Computation; Algorithms; Execution; Semantics; DBMS Engines]


What Underlies the Data Universe
[Figure. Labels: Relational Data Model; Problem Solving; Computation; Semantics; Semantics; RDBMS; Data Independence]


Relational Database Improvements
❑ Pre-Relational
  • Hierarchical
  • Network
❑ Relational
  • Row store
  • OLAP / Data Warehouse
❑ Post-Relational
  • RDF store
  • Column store
  • Bare bones relational
  • Stream / complex event processing
❑ Push Down
  • Database / data warehouse appliances (20+ on the market)
  • In-database analytics, … (10+ on the market)


Data Models For New Domains Must Honor Data Independence
❑ Array (Matrix)-store (SciDB) [Linear algebra]
❑ XML databases: structured content, information exchange
❑ Content management: e.g., SharePoint
❑ Graph/network store: social networking (Facebook), link analysis
❑ Protein store: protein folding, drug discovery, …
❑ Geospatial / map store: location-based applications
❑ Time series: signal processing, statistical and financial analysis
❑ Cloud / Mesh data (NoSQL) stores: web scale applications
❑ and they just keep coming …
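
Data independence, as required by the slide title, means application logic is written against a logical interface and keeps working when the physical layout underneath changes. A minimal sketch under stated assumptions follows: the `Store`, `RowStore`, and `ColumnStore` classes and the `average_reading` function are hypothetical names invented for this illustration, not an API from any of the systems listed above.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Logical interface the application codes against (hypothetical)."""

    @abstractmethod
    def insert(self, record: dict) -> None: ...

    @abstractmethod
    def select(self, column: str) -> list: ...


class RowStore(Store):
    """Physical layout 1: one dict per record."""

    def __init__(self):
        self._rows = []

    def insert(self, record):
        self._rows.append(dict(record))

    def select(self, column):
        return [row[column] for row in self._rows]


class ColumnStore(Store):
    """Physical layout 2: one list per column.

    Simplification: assumes every record has the same set of fields.
    """

    def __init__(self):
        self._columns = {}

    def insert(self, record):
        for name, value in record.items():
            self._columns.setdefault(name, []).append(value)

    def select(self, column):
        return list(self._columns.get(column, []))


def average_reading(store: Store) -> float:
    """Application logic: unaware of which physical layout is underneath."""
    values = store.select("reading")
    return sum(values) / len(values)


def load(store: Store) -> Store:
    """Insert the same three time-series points into any store."""
    for t, reading in [(0, 1.0), (1, 2.0), (2, 6.0)]:
        store.insert({"t": t, "reading": reading})
    return store


row_result = average_reading(load(RowStore()))
column_result = average_reading(load(ColumnStore()))
```

Swapping `RowStore` for `ColumnStore` changes nothing in `average_reading`, which is the property a new domain-specific data model would need to preserve.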


[Figure. Labels: Data Universe; Database Universe; Relational Data Universe]


[Figure. Labels: DBU; RDM; Scientific Data Model; Time Series Data Model; Geo-Spatial Data Model; Digital Media Data Model; Document Data Model; Graph-Network Data Model; ETC. ETC. ETC.; Data Universe]



Data Integration Solution Space

                               Computation                               Problem Solving
Semantic Technologies (AI)
  Knowledge Representation     Minimal                                   Powerful
  Ontologies                   Minimal                                   Powerful
  Semantic Web                 Modest / emerging                         Modest / emerging
  Semantic Data Management     Emerging                                  Emerging
Architectural
  Information-As-A-Service     Emerging                                  Emerging
  Cloud                        Emerging                                  N/A
Databases
  Relational                   Optimal for homogeneous relational data   Optimal for pure relational data
  Domain-specific              Emerging                                  Emerging

Data Independence Required


Databases vs. Semantic Web
[Figure: comparison diagram. Labels in extraction order:]
  Mathematical Logic; Discrete Worlds; Probabilistic / Eventual Reasoning; Single Versions of Truth
  Databases | Semantic Web
  What Logic?; Heterogeneous Worlds; Common Sense Reasoning?; Multiple Truths
  1,000s of databases
  Data Models | LOD Models?
  DI: Relational Join | DI: Evidence Gathering


Databases vs. Web
[Figure. Labels: Data Management; Semantically Heterogeneous Views; Single versions of truth; Multiple versions of truth; Semantically Homogeneous Databases; Web; Data Warehouses; Scale; Evidence Gathering; Analysis / BI; Exploration]


Data Integration
❑ Query: define the result
  • Entity
  • Computation
❑ Find candidate data sets: search
❑ Extract, Transform, and Load (ETL): engineering
❑ Data Integration
  • Entity resolution
  • Integration computation

[Slide annotations: Harder; Hard]
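
The four steps on this slide can be walked through in a toy pipeline. This is a sketch under stated assumptions, not the method behind the slides: the source names, field names, mappings, and records are all invented, and the exact match on (name, birth year) is a deliberately naive stand-in for real entity resolution. The query in step 1 is implicit here: the result entity is a patient and the computation is total visits per patient.

```python
from collections import defaultdict

# Hypothetical sources; all names, fields, and values are illustrative.
SOURCES = {
    "hospital_a": [{"Name": "Alice Smith ", "born": 1950, "visits": 2}],
    "clinic_b": [
        {"patient": "alice smith", "birth_year": 1950, "visits": 1},
        {"patient": "Bob Jones", "birth_year": 1971, "visits": 4},
    ],
    "billing_c": [{"invoice": 17, "amount": 120.0}],
}

# Per-source field mappings to a common schema (the "T" in ETL).
MAPPINGS = {
    "hospital_a": {"Name": "name", "born": "birth_year", "visits": "visits"},
    "clinic_b": {"patient": "name", "birth_year": "birth_year", "visits": "visits"},
}


def find_candidates(sources, mappings):
    """Step 2 (search): keep only the sources we know how to map."""
    return [name for name in sources if name in mappings]


def etl(sources, mappings, names):
    """Step 3 (ETL): extract rows, rename fields, normalize patient names."""
    rows = []
    for source in names:
        for raw in sources[source]:
            row = {mappings[source][field]: value for field, value in raw.items()}
            row["name"] = " ".join(row["name"].lower().split())
            rows.append(row)
    return rows


def resolve_entities(rows):
    """Step 4a (entity resolution): naive exact match on (name, birth_year)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["name"], row["birth_year"])].append(row)
    return groups


def integrate(groups):
    """Step 4b (integration computation): total visits per resolved entity."""
    return {key: sum(r["visits"] for r in rs) for key, rs in groups.items()}


candidates = find_candidates(SOURCES, MAPPINGS)
result = integrate(resolve_entities(etl(SOURCES, MAPPINGS, candidates)))
```

Here the two Alice records merge only because lowercasing and whitespace cleanup make their names identical; any spelling variation, missing birth year, or conflicting value would defeat this matcher, and real entity resolution is considerably more involved.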


Managing Data @ Scale I
❑ Introduction
  • Michael L. Brodie
❑ Global Data Integration and Global Data Mining
  • Chris Bizer
❑ DB vs RDF: structure vs correlation
  • Peter Boncz