Data Virtualization Reference Architectures: Correctly Architecting your Solutions for Analytical & Operational Uses

Alberto Bengoa, Sr. Product Manager, Denodo

Agenda

1. Introduction
2. Operational use cases
3. Analytical use cases
4. Deployment architectures
5. Optimization of use cases

Introduction

• The Denodo Platform is a powerful middleware
    • Horizontal, rather than vertical
• It can be deployed in a range of use cases
    • Each use case is different
    • The use case guides the way we deploy data virtualization
• Two main groups of use cases:
    • Operational (or transactional)
    • Analytical (or informational)

Operational Use Cases

Characteristics

• Online/interactive work: a human is in an interactive session with the software
    • This requires low latency in the queries
• Queries with low selectivity: each query uses only a small subset of our data
• Short queries: queries are fast and return small data sets
• Transactional: read & write support with many data sources
• High concurrency: typically there are many clients

Example: call center software

• Goal: retrieve information about a single customer
• Multiple data sources
    • Billing
    • Sales
    • Technical support
• Potentially write back new information about the customer case
• Many (1000s+) concurrent agents solving customer problems
• Short queries
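The call center pattern above can be sketched in a few lines of Python. This is only an illustration of the access pattern (one selective lookup per source, run in parallel, then merged), not Denodo code: the three dictionaries and the names `fetch` and `customer_view` are hypothetical stand-ins for the billing, sales and technical support systems.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three source systems, keyed by customer id.
BILLING = {42: {"balance_due": 120.50}}
SALES = {42: {"orders": 3}}
SUPPORT = {42: {"open_tickets": 2}}


def fetch(source, customer_id):
    """Selective lookup: return only the record for a single customer."""
    return source.get(customer_id, {})


def customer_view(customer_id):
    """Query every source in parallel and merge the (small) results."""
    sources = {"billing": BILLING, "sales": SALES, "support": SUPPORT}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {
            name: pool.submit(fetch, src, customer_id)
            for name, src in sources.items()
        }
        return {name: fut.result() for name, fut in futures.items()}
```

Each lookup touches a single customer, so the response time is bounded by the slowest source rather than by data volume, which is why latency and concurrency dominate the tuning of operational deployments.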

Analytical Use Cases

Characteristics

• Offline use cases: queries are typically scheduled
• Latency is not important
• Processing of very big data sets
• Long-running queries
• Read-only
• Low concurrency
• Infrequent queries

Example: LDW (logical data warehouse) reporting

• Create reports across multiple data sources
    • Daily, weekly, monthly, quarterly, yearly…
• Very big input data sets
    • Fact tables (sales!)
• Multiple dimensions
    • Products
    • Territories
• Small output data sets
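The reporting shape described above (a large fact table, a few small dimensions, a small aggregated output) can be illustrated with a toy star schema. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the real sources; the table names, column names and sample data are all assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for the sources behind the report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE territory (territory_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (product_id INTEGER, territory_id INTEGER, amount REAL);
INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO territory VALUES (1, 'EMEA'), (2, 'APAC');
""")

# 10,000 synthetic fact rows spread evenly over the four combinations.
fact_rows = [(1 + i % 2, 1 + (i // 2) % 2, 10.0) for i in range(10000)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", fact_rows)


def sales_report(connection):
    """Aggregate the fact table by product and territory."""
    return connection.execute("""
        SELECT p.name, t.name, SUM(s.amount)
        FROM sales s
        JOIN product p ON p.product_id = s.product_id
        JOIN territory t ON t.territory_id = s.territory_id
        GROUP BY p.name, t.name
        ORDER BY p.name, t.name
    """).fetchall()


report = sales_report(conn)
```

Here 10,000 input rows collapse into four output rows, one per product and territory pair, which mirrors the "very big input, small output" profile of the slide.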

Reference Architectures

Denodo Platform Reference Architectures

• Operational use cases [diagram]
• Analytical use cases [diagram]
• Analytical use cases (extended) [diagram]

Optimization and Deployments

Optimization of Use Cases

Operational uses (I)

• Optimize queries for lower execution time
    • Online use cases have a human waiting for results
• Increase data source pool sizes
    • Data source pools can be a bottleneck with many concurrent queries
• Configure a high number of concurrent users
    • Make sure no clients are waiting unnecessarily
    • Hardware resources (CPU, RAM) permitting
• Use historical analysis of queries to characterize workloads
    • Know your queries: median and mean duration, number of concurrent queries, etc.
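As a rough illustration of "know your queries", the sketch below derives the mean and median duration and the peak number of concurrent queries from a query history. The log format (start and end timestamps in milliseconds) and the function names are assumptions made for the example; a real deployment would read this from whatever query history it keeps.

```python
from statistics import mean, median

# Hypothetical query history: (start_ms, end_ms) per query.
QUERY_LOG = [(0, 200), (100, 300), (150, 250), (300, 1300)]


def peak_concurrency(intervals):
    """Sweep over start/end events and track how many queries overlap."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort so that an end (-1) precedes a start (+1) at the same time,
    # meaning back-to-back queries are not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def characterize(intervals):
    """Summarize a workload: typical duration and peak concurrency."""
    durations = [end - start for start, end in intervals]
    return {
        "mean_ms": mean(durations),
        "median_ms": median(durations),
        "peak_concurrency": peak_concurrency(intervals),
    }


summary = characterize(QUERY_LOG)
```

In this sample the mean (375 ms) is well above the median (200 ms) because of a single slow query, which is why both figures are worth tracking. The peak concurrency is a reasonable lower bound when sizing data source pools and concurrent-user limits.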

Operational uses (II)

• Use partial caching for queries that are very frequent
    • Partial caching with a low TTL is especially useful during a single agent session
• Scale horizontally: more servers before bigger servers
    • Each query is small, so many servers add capacity and resiliency
    • Bigger servers do not help with the network bottleneck
• Drop queries vs. waiting queues: your choice
    • Use case requirements will determine what is best
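The idea behind caching with a low TTL can be sketched as a small time-bounded cache keyed by query. This is a generic illustration, not how the Denodo cache is implemented or configured; the `TTLCache` class and its `get_or_load` method are hypothetical names.

```python
import time


class TTLCache:
    """Cache individual query results for a short, fixed time to live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested
        self._entries = {}  # key -> (expiry_time, value)

    def get_or_load(self, key, loader):
        """Return a fresh cached value, or call loader(key) and cache it."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self.ttl, value)
        return value
```

With a TTL of a few tens of seconds, repeated lookups for the same customer during one agent session would hit the cache, while the data is never stale for longer than the TTL.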

Operational uses (III)

• Single sign-on, pass-through credentials
    • Ensure the data source applies security restrictions when accessing the data
• Set up low logging levels (logging is slow)
    • Make sure logging is done only when absolutely necessary
    • It is possible to enable higher levels temporarily to diagnose issues
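The logging advice above is generic enough to illustrate with Python's standard `logging` module: keep the default level low in verbosity, and raise it only for the duration of a diagnosis. The logger name `dv.operational` and the helpers `temporary_level` and `run_query` are assumptions for the example and have nothing to do with the Denodo logging configuration itself.

```python
import logging
from contextlib import contextmanager

# Hypothetical logger name; the default level keeps the hot path quiet.
logger = logging.getLogger("dv.operational")
logger.setLevel(logging.WARNING)


@contextmanager
def temporary_level(log, level):
    """Raise verbosity for one diagnostic block, then restore it."""
    previous = log.level
    log.setLevel(level)
    try:
        yield log
    finally:
        log.setLevel(previous)


def run_query(sql):
    """Placeholder for query execution; logs details only when enabled."""
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("executing %s", sql)
    return "ok"
```

A diagnostic session would wrap the suspect calls in `with temporary_level(logger, logging.DEBUG): ...`, so the extra logging cost is paid only while the issue is being investigated.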

Analytical use cases (I)

• Critical to push down processing to the sources
    • Analyze the query plans and optimize for delegation
    • Pushdown can reduce network data transfers by orders of magnitude
• High memory settings
    • Source pushdown is key, but sometimes processing at the data virtualization layer is needed
• Set up swapping
    • Ensure that the server does not run out of memory
    • Gracefully degrade performance
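The effect of pushdown on data transfer can be shown with a toy comparison. In the sketch below an in-memory SQLite database plays the role of a source, and the two functions (hypothetical names) compute the same totals: one pulls every row and aggregates in the upper layer, the other delegates the `GROUP BY` to the source.

```python
import sqlite3


def make_source(n_rows=10000):
    """Build a toy source with n_rows sales rows over two territories."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (territory TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("EMEA" if i % 2 else "APAC", 1.0) for i in range(n_rows)],
    )
    return conn


def aggregate_in_layer(source):
    """No pushdown: transfer every row, then aggregate locally."""
    rows = source.execute("SELECT territory, amount FROM sales").fetchall()
    totals = {}
    for territory, amount in rows:
        totals[territory] = totals.get(territory, 0.0) + amount
    return totals, len(rows)


def aggregate_at_source(source):
    """Pushdown: the source runs the GROUP BY, one row per group returns."""
    rows = source.execute(
        "SELECT territory, SUM(amount) FROM sales GROUP BY territory"
    ).fetchall()
    return dict(rows), len(rows)
```

Both functions return the same totals, but the first transfers 10,000 rows and the second transfers 2. With real fact tables the gap grows with the table size, since the pushed-down result depends only on the number of groups.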

Analytical use cases (II)

• Fast hard drives for swapping
    • SSDs will help reduce the impact of accessing secondary storage
• Balance query optimization between speed, memory usage and data source load
    • Depending on the limitations of the source systems, the requirements of the use case, and the hardware of the data virtualization layer
• Set a low number of concurrent connections
    • Ensure your queries are not competing for server resources with other queries

Analytical use cases (III)

• Increase query timeouts
    • Standard timeouts are too small for long-running queries
• Monitor server in real time
    • Check the health of the servers and ensure everything is OK
    • Our monitoring tools allow you to inspect query plans in real time to verify correct query behavior
• Use full cache
    • To cache slow sources
    • To store intermediate results that are reused several times through the report

Analytical use cases (IV)

• Scale vertically: bigger servers before more servers
    • Bigger servers allow faster processing of big volumes of data
    • More servers do not help in a scenario with few concurrent queries
• Multi-level servers for minimal network transfers and maximum push down to sources
    • Multiple data virtualization servers, close to geographically distributed data sources
    • Each server aggregates local data
    • Each server shares its aggregation results (small data sets) with the other servers
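The multi-level arrangement above amounts to a two-stage aggregation: each regional server reduces its local rows to a small partial result, and only those partial results cross the network to be merged. The sketch below shows the idea with plain Python; the regions, sample rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical local rows held near each regional server: (product, amount).
EU_ROWS = [("widget", 10), ("gadget", 5), ("widget", 7)]
US_ROWS = [("widget", 3), ("gadget", 4)]


def local_aggregate(rows):
    """Stage 1: a regional server sums its own rows per product."""
    totals = Counter()
    for product, amount in rows:
        totals[product] += amount
    return totals


def merge_aggregates(partials):
    """Stage 2: combine the small per-region totals into a global result."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return merged


global_totals = merge_aggregates(
    [local_aggregate(EU_ROWS), local_aggregate(US_ROWS)]
)
```

This works directly for sums and counts, because partial sums can be added together. An average would need each region to ship both a sum and a count so that the final division happens once at the top level.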

Deploying Mixed Solutions

• Use separate servers for separate use cases
    • Do not mix use cases in the same server
    • Very different performance profiles
• Treat your operational and analytical servers separately
    • Assign different hardware resources
    • Assign separate security configuration
    • Assign separate update policies
    • Assign separate code deployment policies

Thanks!

www.denodo.com

© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.