1. Digital Worlds (applications)
• VEC (Enterprise Scale)
  - 1,300 source databases
  - 10+ million views (via data integration)
• US Healthcare (National Scale)
  - Scale
    o Health care and social assistance offices: 784,626, including:
      - Doctors' offices: 220,131
      - Dentists: 127,057
      - Hospitals: 6,505
      - Clinics: ~5,000 (≈ SME; say, 100 databases each)
    o Patients: 100-300+ million
    o Databases: ~32 million
  - Scope
    o Comprehensive medical events, methods, analysis, …
      - E.g., Alice (62) in the Emergency Room with liver failure
    o Insurance, payments, …
    o New metric: healthcare quality
  - Examples
    o SHRINE (2009): 3 hospitals; uses 2,381,883 distinct concepts (ontologies)
    o HHS CIO (Todd Park): Open Health Data Initiative
    o US vision (PCAST, White House)
2. Observations
• Data Sources
  - Massive
    o Number
    o Heterogeneity
    o Distribution (data at source)
    o Constant change: data, model, ontology, business rules, …
  - Constrained
    o Governance: privacy, confidentiality, legal, …
    o Quality, correctness, precision, …
    o Competition
• Critical Requirement: meaningful
  - Human lives
  - Health of individuals, communities, nation
  - Economic impact: $ trillions / year
  - Political: meaningless debates
3. Trends
• Digital Universe
• Holistic Views
  - Information Ecosystems: data
  - Ecosystems: processes over services
• Big Data: massive
  o Number
  o Distribution
  o Heterogeneity
    - Semantics
    - Structure: relational databases, X databases, web, deep web
    - Technology: databases, data warehouses, files, …
• New Models: problem solving, data, …
  - Data-driven
  - Social computing: data as social artifacts
  - Science: Wolfram Alpha
  - Pragmatics: driven by healthcare quality improvement
4. Databases and AI: The Twain Just Met
• Database World
  - Engineering: RDBMSs @ scale
  - Reasoning: relational model (first-order logic, FoL)
• AI World
  - Reasoning: more powerful & expressive
  - Engineering: in the small
• Digital Universe, e.g., Web
  - Reasoning: beyond the RDM (relational data model) & AI?
  - Engineering: way beyond RDBMS
• Information ecosystems
  - Databases: join
  - Web: link
Power Law of Data: the value of a data element is proportional to the number of its meaningful uses.
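Slide 4 contrasts how databases and the Web connect data. Databases combine rows with a join on matching values, while the Web follows explicit links. The Python sketch below is a minimal illustration under stated assumptions. The patient and visit records, the field names, and the example.org URIs are all hypothetical. The join is a simple in-memory hash join, not any particular DBMS's implementation.

```python
# Hypothetical patient and visit records (names and fields are illustrative).
patients = [
    {"patient_id": 1, "name": "Alice", "age": 62},
    {"patient_id": 2, "name": "Bob", "age": 45},
]
visits = [
    {"visit_id": 10, "patient_id": 1, "unit": "Emergency Room"},
    {"visit_id": 11, "patient_id": 2, "unit": "Cardiology"},
    {"visit_id": 12, "patient_id": 1, "unit": "Hepatology"},
]


def relational_join(left, right, key):
    """Database style: combine rows whose values for `key` are equal (hash join)."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    # For each right-hand row, emit one merged row per matching left-hand row.
    return [{**l_row, **r_row} for r_row in right for l_row in index.get(r_row[key], [])]


# Web style: each resource carries explicit links (hypothetical URIs) to others.
documents = {
    "http://example.org/patient/1": {
        "name": "Alice",
        "links": ["http://example.org/visit/10", "http://example.org/visit/12"],
    },
    "http://example.org/visit/10": {"unit": "Emergency Room", "links": []},
    "http://example.org/visit/12": {"unit": "Hepatology", "links": []},
}


def follow_links(docs, start):
    """Web style: navigate from one resource to the resources it links to."""
    return [docs[url] for url in docs[start]["links"] if url in docs]
```

Here `relational_join(patients, visits, "patient_id")` returns three combined rows. `follow_links(documents, "http://example.org/patient/1")` returns Alice's two visit documents. The join discovers connections from shared values at query time, whereas a link exists only if someone recorded it.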
5. What Underlies the Digital Universe
Modelling          Execution
Data Models        DBMS Engines
Languages          Algorithms
Semantics          Semantics
Problem Solving    Computation
6. What Underlies the Data Universe
Relational Data Model  <-- Data Independence -->  RDBMS
Semantics                                         Semantics
Problem Solving                                   Computation
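Data independence means applications are insulated from changes in how data is stored or organized. One classic relational mechanism for this is the view. The sketch below uses Python's built-in `sqlite3` module to show an unchanged application query surviving a schema reorganization. The table and view names (`patient`, `patient_v`, `patient_name`, `patient_age`) are hypothetical, and the example is an illustrative assumption rather than the deck's own. It covers logical data independence only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema version 1: a single base table (hypothetical names).
cur.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
cur.executemany("INSERT INTO patient VALUES (?, ?, ?)", [(1, "Alice", 62), (2, "Bob", 45)])
# Applications query only this view, never the base table directly.
cur.execute("CREATE VIEW patient_v AS SELECT id, name, age FROM patient")

# Application code: this query text never changes.
APP_QUERY = "SELECT name FROM patient_v WHERE age > 60"


def run_app(connection):
    return [row[0] for row in connection.execute(APP_QUERY)]


before = run_app(conn)

# Schema version 2: the base table is split in two and the view is redefined.
cur.executescript("""
    CREATE TABLE patient_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE patient_age  (id INTEGER PRIMARY KEY, age INTEGER);
    INSERT INTO patient_name SELECT id, name FROM patient;
    INSERT INTO patient_age  SELECT id, age  FROM patient;
    DROP VIEW patient_v;
    DROP TABLE patient;
    CREATE VIEW patient_v AS
        SELECT n.id, n.name, a.age
        FROM patient_name AS n JOIN patient_age AS a ON n.id = a.id;
""")

after = run_app(conn)  # same application query, new physical organization
```

Both `before` and `after` return the same answer. Only the view definition changed, so the application is shielded from the reorganization.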
7. Relational Database Improvements
• Pre-Relational
  - Hierarchical
  - Network
• Relational
  - Row store
  - OLAP / Data Warehouse
• Post-Relational
  - RDF store
  - Column store
  - Bare-bones relational
  - Stream / complex event processing
• Push Down
  - Database / data warehouse appliances (20+ on the market)
  - In-database analytics (10+ on the market)
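Slide 7 lists both row stores and column stores. The sketch below uses plain Python with hypothetical data and names to contrast the two physical layouts. In the column layout, an aggregate over one attribute reads a single column. Reassembling a full record, by contrast, needs one lookup per column. Real column stores add compression and vectorized execution, which this sketch omits.

```python
from array import array

# Hypothetical table of (patient_id, age, cost) tuples.
rows = [(1, 62, 1200.0), (2, 45, 300.0), (3, 71, 950.0)]

# Row store: each record's attributes are kept together.
row_store = list(rows)

# Column store: each attribute is kept together in its own contiguous array.
column_store = {
    "patient_id": array("q", (r[0] for r in rows)),
    "age": array("q", (r[1] for r in rows)),
    "cost": array("d", (r[2] for r in rows)),
}


def total_cost_row_store(store):
    # Row layout: reading one attribute means visiting every whole record.
    return sum(record[2] for record in store)


def total_cost_column_store(store):
    # Column layout: the aggregate reads only the one column it needs.
    return sum(store["cost"])


def fetch_record_column_store(store, i):
    # Column layout: reassembling a full record needs one lookup per column.
    return tuple(store[name][i] for name in ("patient_id", "age", "cost"))
```

The trade-off shown here is the usual one. Column layouts favor analytic scans over a few attributes (OLAP), while row layouts favor reading or writing whole records.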
8. Data Models for New Domains Must Honor Data Independence
• Array (matrix) store (SciDB): linear algebra
• XML databases: structured content, information exchange
• Content management: e.g., SharePoint
• Graph / network store: social networking (Facebook), link analysis
• Protein store: protein folding, drug discovery, …
• Geospatial / map store: location-based applications
• Time series: signal processing, statistical and financial analysis
• Cloud / mesh data (NoSQL) stores: web-scale applications
• … and they just keep coming
9. Data Universe
(Diagram: the Relational Data Universe nested within the Database Universe, nested within the Data Universe)
10. Data Universe
(Diagram: the Data Universe, containing the DBU (Database Universe) and the RDM (Relational Data Model) alongside other data models: Graph-Network, Time Series, Scientific, Geo-Spatial, Document, Digital Media, etc.)
12. Data Integration Solution Space: Data Independence Required
Approach                    Computation                              Problem Solving
Databases
  Relational                Optimal for homogeneous relational data  Optimal for pure relational data
  Domain-specific           Emerging                                 Emerging
Semantic Technologies (AI)
  Knowledge Representation  Minimal                                  Powerful
  Ontologies                Minimal                                  Powerful
  Semantic Web              Modest / emerging                        Modest / emerging
  Semantic Data Management  Emerging                                 Emerging
Architectural
  Information-As-A-Service  Emerging                                 Emerging
  Cloud                     Emerging                                 N/A
13. Databases vs. Semantic Web
• Databases
  - Discrete Worlds
  - Single Versions of Truth
  - Data Models
  - Mathematical Logic
  - 1,000s of databases
  - DI (data integration): Relational Join
• Semantic Web
  - Heterogeneous Worlds
  - Multiple Truths
  - LOD (Linked Open Data) Models?
  - What Logic?
  - Probabilistic / Eventual
  - Common Sense Reasoning
  - Reasoning?
  - DI (data integration): Evidence Gathering
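14. Databases vs. Web
(Diagram: a spectrum along a Scale axis, running from Databases, which are semantically homogeneous with single versions of truth and used for data management, through Data Warehouses and views, used for analysis / BI and evidence gathering, to the Web, which is semantically heterogeneous with multiple versions of truth and used for exploration)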
15. Data Integration
• Query: define the result (entity computation)
• Find candidate data sets: search (hard)
• Extract, Transform, and Load (ETL): engineering
• Data Integration (harder)
  - Entity resolution
  - Integration computation
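The data integration steps on slide 15 can be sketched in miniature. The Python below assumes two hypothetical hospital extracts with different schemas. It first performs a toy ETL transform that renames `birth_date` to `dob`. It then does entity resolution by blocking on birth date and comparing normalized names with the standard library's `difflib.SequenceMatcher`. The 0.85 threshold and the matching rule are illustrative assumptions, not a recommended method.

```python
from difflib import SequenceMatcher

# Two hypothetical candidate data sets describing (possibly) the same patients.
hospital_a = [
    {"name": "Alice Smith", "dob": "1950-03-14"},
    {"name": "Robert Jones", "dob": "1971-07-02"},
]
hospital_b = [
    {"name": "alice  smith", "birth_date": "1950-03-14"},
    {"name": "Bob Jones", "birth_date": "1971-07-02"},
    {"name": "Carol White", "birth_date": "1988-11-30"},
]


def transform_b(rec):
    """ETL 'transform' step: map source B's schema onto source A's."""
    return {"name": rec["name"], "dob": rec["birth_date"]}


def normalize(name):
    # Lowercase and collapse runs of whitespace.
    return " ".join(name.lower().split())


def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(left, right, threshold=0.85):
    """Entity resolution: pair records that share a birth date (the blocking
    key) and whose names are similar enough. The threshold is an assumption."""
    matches = []
    for l_rec in left:
        for r_rec in right:
            if l_rec["dob"] == r_rec["dob"] and similarity(l_rec["name"], r_rec["name"]) >= threshold:
                matches.append((l_rec["name"], r_rec["name"]))
    return matches


matches = resolve(hospital_a, [transform_b(r) for r in hospital_b])
```

With these inputs, only the two Alice Smith records match. Robert Jones and Bob Jones share a birth date, but their name similarity falls below the threshold. This is a small instance of why entity resolution is the harder step: nicknames, typos, and missing keys defeat simple string similarity.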
16. Managing Data @ Scale I
• Introduction: Michael L. Brodie
• Global Data Integration and Global Data Mining: Chris Bizer
• DB vs. RDF (structure vs. correlation): Peter Boncz