Data Virtualization: Fulfilling the Promise of Data Lakes
Dr. Christian Kurze, Principal Sales Engineer – DACH

Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise of Data Lakes"



Agenda

Key questions I want to answer today:
• What is Data Virtualization?
• How to leverage Hadoop Data Lakes to support Internet of Things / Operational Data Store / Offloading / … use cases?
• How to query Hadoop Data Lakes combined with any other structured, semi-structured and unstructured data sources using a single logical data lake? What about Cloud?
• How to avoid Data Swamps via a lightweight data governance approach that helps enterprises maximize the value of their Data Lake?
• How to use a logical data lake/data warehouse to prevent a physical data lake from becoming a silo?


Status Quo – Data Integration
Access to all information from disparate silos and technologies

The Requirement…
• Access to complete information
• … in an economically meaningful way
• … real-time and in high quality, incl. monitoring, security and audit

Consumers: Marketing, Sales, Executive, Support
Topics: Cross-sell / Up-sell, Channel, Warranty, Product, Customer

… versus the current architecture
Sources: Database, Apps, Warehouse, Cloud, Big Data, Documents, Apps, NoSQL
• Manual access to legacy systems and constantly new technologies – IoT, Big Data, Cloud
• Point-to-point connections
• Projects too slow for new initiatives


"My architecture works fine, but I am not able to access all my silos." - Enterprise Data Architect
• Different locations
• Different technologies
• Different data structures
• Too large datasets to move them
• Different APIs and access methods
• Excessive use of ETL to copy data
• Synchronization issues


The Solution: Data Virtualization as a Data Abstraction Layer

DATA ABSTRACTION LAYER
• Central repository to access all data
• Abstracts the underlying technology of the data sources
• Enables the definition of a semantic data model
• Offers a metadata-rich catalog
• Multiple access methods:
  • SQL based
  • Keyword based search (via index)
  • RESTful navigation (hyperlinks)
  • Native support for nested document structures (XML, JSON, …)


Modelling in a Data Virtualization Solution

Sources: Hadoop, Web Site, REST Web Service, Multidimensional, Salesforce, SQL Server, Oracle

Base View (Source Abstraction): Client, Address, ClientType, Company, Invoicing, ServiceUsage, Product, Logs, WebIncidents

Combine, Transform & Integrate: Customer, Invoice, Product; Customer Invoicing; Service Usage, Incident

Publish: SQL, SOAP, REST, ODATA, Message Queues (JMS), etc.; Denodo’s Information Self Service

Independent of the access method – all views use the same metadata and access privileges.
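The layering on this slide (base views that abstract the sources, derived views that combine them, and published views that consumers query) can be loosely imitated with ordinary SQL views. The following is a minimal sketch using Python's built-in sqlite3 module, not Denodo's own tooling. All table names, view names, columns and sample rows are hypothetical, and both "sources" live in one in-memory database purely for illustration.

```python
import sqlite3

# Hypothetical stand-ins for two separate sources (a CRM and a billing system).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_client (id INTEGER PRIMARY KEY, name TEXT, client_type TEXT);
    CREATE TABLE billing_invoicing (client_id INTEGER, amount REAL);

    INSERT INTO crm_client VALUES (1, 'Acme', 'Company'), (2, 'Jane Doe', 'Private');
    INSERT INTO billing_invoicing VALUES (1, 100.0), (1, 250.0), (2, 40.0);

    -- Base views: one-to-one abstractions of the source structures.
    CREATE VIEW bv_client AS SELECT id, name, client_type FROM crm_client;
    CREATE VIEW bv_invoicing AS SELECT client_id, amount FROM billing_invoicing;

    -- Derived view: combines and aggregates the base views.
    CREATE VIEW customer_invoicing AS
        SELECT c.id AS customer_id, c.name AS customer, SUM(i.amount) AS total_invoiced
        FROM bv_client AS c
        JOIN bv_invoicing AS i ON i.client_id = c.id
        GROUP BY c.id, c.name;
""")

PUBLISHED_VIEWS = {"bv_client", "bv_invoicing", "customer_invoicing"}


def query_view(view_name: str) -> list[tuple]:
    """Return all rows of a published view, ordered by its first column."""
    # View names cannot be bound as SQL parameters, so check them against a whitelist.
    if view_name not in PUBLISHED_VIEWS:
        raise ValueError(f"unknown view: {view_name}")
    return conn.execute(f"SELECT * FROM {view_name} ORDER BY 1").fetchall()


rows = query_view("customer_invoicing")
print(rows)
```

Consumers only ever see the view names, so a source table could be renamed or moved and only the matching base view would need to change.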


Common Data Virtualization Use Cases

BIG DATA, CLOUD INTEGRATION
• Advanced Analytics
• Data Warehouse Offloading
• Big Data for Enterprise
• Cloud / SaaS Integration

AGILE BUSINESS INTELLIGENCE
• Logical Data Warehouse
• Virtual Data Marts
• Self-Service BI
• Operational BI / Analytics

SINGLE VIEW APPLICATIONS
• Single Customer View - Call Centers, Portals
• Single Product View - Catalogs
• Single Inventory View - Inventory Reconciliation
• Vertical Specific - Single View of Wells

DATA SERVICES
• Unified Data Services Layer
• Logical Data Abstraction
• Agile Application Development
• Linked Data Services


Multiple platforms optimized for different workloads
Additionally in a hybrid environment: OnPrem vs. Cloud

[Diagram]
Platform categories: DWH & Marts; Advanced Analytics (multiple structures); Advanced Analytics (structured); MDM; Streams
Systems: EDW; Mart; DW Appliance; NoSQL / Graph DB; Data Lake: Hadoop / Spark / Hive / …; CRUD; Cust; Prod
Workloads: Real-time stream processing & decision management; Graph analysis; Investigative analysis, data refinery; Data mining, model development; Traditional query, reporting & analysis; Governed context information


Business requires a combination of data

Sources: MDM (CRUD; Cust, Prod), Hadoop, EDW

• Who are our customers? What products do we sell?
• What are the most popular navigational paths through our web site that led to high-fee products?
• Who are our most loyal, low-risk customers that generate low fees?
• What is the online behavior of our loyal, low-risk, low-fee customers, so that we can offer them higher-fee products?

Where do I find this data? How to combine this data? How to share it with my colleagues? What about their access privileges?
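A question such as the online behavior of loyal, low-risk, low-fee customers needs data from all three systems named on this slide. The sketch below illustrates that combination in plain Python. The records, field names, fee threshold and page paths are invented for illustration, and a real data virtualization layer would push such joins down to the sources rather than load everything into memory.

```python
from collections import Counter

# Hypothetical extracts from three systems, keyed by a shared customer id.
mdm_customers = {1: "Alice", 2: "Bob", 3: "Carol"}  # MDM: who are our customers?
edw_profiles = {                                     # EDW: loyalty, risk and fees
    1: {"loyal": True, "risk": "low", "fees": 20.0},
    2: {"loyal": True, "risk": "high", "fees": 15.0},
    3: {"loyal": True, "risk": "low", "fees": 900.0},
}
hadoop_clicks = [                                    # Hadoop: web site logs
    (1, "/savings"), (1, "/premium-card"), (1, "/premium-card"),
    (2, "/loans"), (3, "/premium-card"),
]


def target_customers(max_fees: float) -> set[int]:
    """Ids of known customers that are loyal, low risk and at or below max_fees."""
    return {
        cid
        for cid, profile in edw_profiles.items()
        if cid in mdm_customers
        and profile["loyal"]
        and profile["risk"] == "low"
        and profile["fees"] <= max_fees
    }


def top_pages(customer_ids: set[int], n: int = 3) -> list[tuple[str, int]]:
    """Most visited pages among the given customers, most frequent first."""
    counts = Counter(page for cid, page in hadoop_clicks if cid in customer_ids)
    return counts.most_common(n)


targets = target_customers(max_fees=100.0)
print(sorted(mdm_customers[cid] for cid in targets))
print(top_pages(targets))
```

The shared customer id is the key assumption here: without a common key across MDM, EDW and the web logs, the combination would first need a matching step.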


Big Data Connectivity
Big Data and Cloud Databases Connectivity
■ Hadoop Ecosystem:
  ■ SQL on Hadoop: Hive, Impala, Presto, …
  ■ HDFS, Parquet, Avro, CSV, …
  ■ Execution of map/reduce jobs
  ■ Certified with major Hadoop distributions
■ In-memory platforms: Apache Spark SQL, Presto DB, HANA, …
■ Parallel DWs and Appliances: Vertica, Impala, Teradata, Greenplum, …
■ Cloud RDBMS: Redshift, Snowflake, DynamoDB, …
■ NoSQL (MongoDB, CouchDB, Neo4J, Redis, Oracle NoSQL, Cassandra, etc.)
■ Streaming data (Spark streams, Splunk, IBM Streams, Kafka, …)

Enhanced Adapters for Big Data ecosystem
■ Delimited text files
■ Sequence files
■ Map files
■ Avro files


How to provide access by multiple tools and technologies?

Sources: DWH, MDM, Hadoop, Appliances, NoSQL, External Services
Consumers: Excel / MS BI, Tableau, Power BI, Composite Desktop, 360 Views, Cockpit, Other Applications

• Complex security policies? RBAC? Single Sign On (Kerberos)
• Governance / Audit
• Fast prototyping? Automated processes?
• Manual development of service layer?

• Source changes
• New attributes and requirements
• Accounting of source usage (cloud migration pending)
• Refactoring of sources
• New sources


A Single Governed Logical Data Lake

Data Virtualization combines one or more physical data lakes with other enterprise data to create a “virtual” or “logical” data lake.

Denodo Platform Bridges Distinct Data Architectures
• Data Lakes: Marketing, Research, Finance
• Other Data Sources: MDM, Cloud Apps
• DATA VIRTUALIZATION (Logical Data Lake): Semantic Model, Data Discovery, Metadata Catalog, Security, Governance
• Consumers: Self-Service Analytics, Operational Apps, BI/Analytical Tools, Excel, Reports

• Simplified Architecture
• Single Point of Access
• Lower TCO
• Lower Operational Costs
• Improved Agility
• Improved Flexibility
• Consistency and Integrity for multiple tools


Information Self Service: E/R diagram
1. Click on a view to navigate to the details
2. Hover on the arrows to show the details of the PK-FK relationships


Information Self Service: Browse Metadata Catalog
1. Browse and search virtual databases
2. Browse and search available views
3. Review metadata and descriptions
4. Query the view


Information Self Service: Search Metadata Catalog
1. Full-text search within view metadata (name, column names, descriptions)
2. Show additional view information and query data
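Full-text search over view metadata can be pictured as matching query terms against each view's name, column names and description. The sketch below is a simplified illustration under that assumption, not the product's search implementation. The catalog entries and the `ViewMetadata` class are hypothetical, and matching is plain case-insensitive substring matching rather than a real search index.

```python
from dataclasses import dataclass, field


@dataclass
class ViewMetadata:
    """Hypothetical catalog entry for one view."""
    name: str
    description: str
    columns: list[str] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Concatenate every searchable field into one lower-case string.
        return " ".join([self.name, self.description, *self.columns]).lower()


CATALOG = [
    ViewMetadata("customer", "Master data for all customers",
                 ["customer_id", "name", "segment"]),
    ViewMetadata("invoice", "Invoices issued per customer",
                 ["invoice_id", "customer_id", "amount"]),
    ViewMetadata("incident", "Support incidents reported via the web site",
                 ["incident_id", "product_id"]),
]


def search_catalog(query: str) -> list[str]:
    """Names of views whose metadata contains every term of the query."""
    terms = query.lower().split()
    return [
        view.name
        for view in CATALOG
        if all(term in view.searchable_text() for term in terms)
    ]


print(search_catalog("customer amount"))
print(search_catalog("web"))
```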


Information Self Service: Querying Data
1. Access to the Denodo catalog
2. Query and filter for data
3. Click on the green arrows to drill down into related information


Information Self Service: Data Lineage
1. Select Data Lineage for the view
2. Select column to see lineage
3. Hover and click the icons to see details
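Column-level lineage, as browsed on this slide, amounts to following each column back through the views it is derived from until a source column is reached. The sketch below shows that traversal over a small dictionary. The lineage entries and view names are hypothetical, and the structure is an assumption for illustration rather than how the platform stores lineage.

```python
# Hypothetical lineage metadata: each (view, column) maps to the columns it is derived from.
LINEAGE = {
    ("customer_invoicing", "customer"): [("bv_client", "name")],
    ("customer_invoicing", "total_invoiced"): [("bv_invoicing", "amount")],
    ("bv_client", "name"): [("crm.client", "name")],
    ("bv_invoicing", "amount"): [("billing.invoicing", "amount")],
}


def trace_lineage(view: str, column: str) -> list[tuple[str, str]]:
    """Return every upstream column in depth-first order, starting with direct parents."""
    upstream: list[tuple[str, str]] = []
    seen: set[tuple[str, str]] = set()
    # Reverse so that pop() visits parents in their listed order.
    stack = list(reversed(LINEAGE.get((view, column), [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles and shared ancestors
        seen.add(node)
        upstream.append(node)
        stack.extend(reversed(LINEAGE.get(node, [])))
    return upstream


print(trace_lineage("customer_invoicing", "total_invoiced"))
```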


Telematics & Predictive Maintenance
Leading Construction Manufacturer

[Diagram: Dealer; Maintenance; Parts Inventory; OSI PI; Hadoop Cluster; Tableau: Dealer / Customer Dashboard]


Business Benefits
• Improved asset performance and proactive maintenance.
• Reduced warranty costs due to proactive maintenance of parts preventing parts failure.
• Optimized pricing for services and parts among global service providers.
• New Business Model opportunities based on real-time analysis of detailed sensor data.


How can I get started?

Read New Whitepaper by Rick F. Van der Lans: Developing a Bimodal Logical Data Warehouse Architecture Using Data Virtualization
Register at: http://bit.ly/2frs782

Get Started Today!
Download Denodo Express: www.denodoexpress.com
Access Denodo on AWS: www.denodo.com/en/denodo-platform/denodo-platform-for-aws


www.denodo.com
© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.