Data Virtualization Reference Architectures: Correctly Architecting your Solutions for Analytical...

Correctly Architecting your Solutions for Analytical & Operational Uses

Alberto Bengoa, Sr. Product Manager, Denodo

Agenda

1. Introduction
2. Operational use cases
3. Analytical use cases
4. Deployment architectures
5. Optimization of use cases

Introduction

• The Denodo Platform is powerful middleware
  • Horizontal, rather than vertical
• It can be deployed in a range of use cases
  • Each use case is different
  • The use case guides the way we deploy data virtualization
• Two main groups of use cases:
  • Operational (or transactional)
  • Analytical (or informational)

Operational Use Cases

Operational Use Cases: Characteristics

• Online/interactive work: a human is in an interactive session with the software
  • This requires low-latency queries
• Queries with low selectivity: each query uses only a small subset of our data
• Short queries: queries are fast and return small data sets
• Transactional: read and write support with many data sources
• High concurrency: typically there are many clients

Operational Use Cases: Example (call center software)

• Goal: retrieve information about a single customer
• Multiple data sources
  • Billing
  • Sales
  • Technical support
• Potentially write back new information about the customer case
• Many (1000s+) concurrent agents solving customer problems
• Short queries

Analytical Use Cases

Analytical Use Cases: Characteristics

• Offline use cases: queries are typically scheduled
• Latency is not important
• Processing of very big data sets
• Long-running queries
• Read-only
• Low concurrency
• Infrequent queries

Analytical Use Cases: Example (LDW reporting)

• Create reports across multiple data sources
  • Daily, weekly, monthly, quarterly, yearly…
• Very big input data sets
  • Fact tables (sales!)
• Multiple dimensions
  • Products
  • Territories
• Small output data sets

Reference Architectures

Denodo Platform Reference Architectures: Operational use cases

Denodo Platform Reference Architectures: Analytical use cases

Denodo Platform Reference Architectures: Analytical use cases (extended)

Optimization and Deployments

Optimization of Use Cases: Operational uses (I)

• Optimize queries for shorter execution time
  • Online use cases have a human waiting for results
• Increase data source pool sizes
  • Data source pools can be a bottleneck with many concurrent queries
• Configure a high number of concurrent users
  • Make sure no clients are waiting unnecessarily
  • Hardware resources (CPU, RAM) permitting
• Use historical analysis of queries to characterize workloads
  • Know your queries: median and mean duration, number of concurrent queries, etc.
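
The "know your queries" advice above can be sketched as a small script over a historical query log. This is only an illustration: the log format (start and end timestamps in milliseconds) and the function name characterize are assumptions made for the example, not part of the Denodo Platform.

```python
from statistics import mean, median

def characterize(queries):
    """Summarize a workload from (start_ms, end_ms) pairs.

    The input format is a hypothetical log extract, assumed for this sketch.
    """
    durations = [end - start for start, end in queries]

    # Sweep over start/end events to find the peak number of overlapping queries.
    events = []
    for start, end in queries:
        events.append((start, 1))
        events.append((end, -1))
    # Tuple ordering sorts an end (-1) before a start (+1) at the same timestamp,
    # so back-to-back queries are not counted as concurrent.
    events.sort()

    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)

    return {
        "mean_ms": mean(durations),
        "median_ms": median(durations),
        "peak_concurrency": peak,
    }

# Three short queries and one long outlier.
sample_log = [(0, 200), (100, 400), (300, 500), (350, 2350)]
stats = characterize(sample_log)
print(stats)
```

In this sample the single long query pulls the mean well above the median, which is one reason to track both. The peak concurrency figure is the kind of number that could inform pool sizes and concurrent-user limits.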

Optimization of Use Cases: Operational uses (II)

• Use partial caching for queries that are very frequent
  • Partial caching with a low TTL is especially useful during a single agent session
• Scale horizontally: more servers before bigger servers
  • Each query is small, so many servers add capacity and resiliency
  • Bigger servers do not help with the network bottleneck
• Drop queries vs. waiting queues: your choice
  • Use case requirements will determine what is best
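
To make the low-TTL caching idea concrete, here is a minimal sketch of a cache that stores only the results that have actually been requested and expires them after a short time. It is a generic illustration, not the Denodo cache engine: the class TTLCache, the loader fetch_customer, and the 30-second TTL are all hypothetical.

```python
import time

class TTLCache:
    """Caches per-key results on demand and expires them after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the example can simulate time
        self._entries = {}          # key -> (expiry_time, value)

    def get(self, key, loader):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: no trip to the source
        value = loader(key)         # miss or expired: query the source again
        self._entries[key] = (now + self.ttl, value)
        return value

# Hypothetical source lookup; records every call that reaches the "source".
source_calls = []
def fetch_customer(customer_id):
    source_calls.append(customer_id)
    return {"id": customer_id, "open_tickets": 2}

fake_now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])

cache.get(42, fetch_customer)   # miss: goes to the source
cache.get(42, fetch_customer)   # hit: served from the cache
fake_now[0] = 31.0              # the agent comes back after the TTL has passed
cache.get(42, fetch_customer)   # expired: goes to the source again
print(len(source_calls))
```

During a single agent session the same customer record tends to be requested repeatedly within seconds, so a short TTL can absorb those repeats while keeping stale data to a minimum.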

Optimization of Use Cases: Operational uses (III)

• Single sign-on, pass-through credentials
  • Ensure the data source applies security restrictions when accessing the data
• Set up low logging levels (logging is slow)
  • Make sure logging is done only when absolutely necessary
  • It is possible to enable higher levels temporarily to diagnose issues
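
The logging advice above follows a general pattern: keep verbosity minimal in normal operation and raise it only while diagnosing. The sketch below shows that pattern with Python's standard logging module, purely as an analogy. It is not Denodo configuration, and the logger name and the helper temporary_level are hypothetical.

```python
import logging
from contextlib import contextmanager

@contextmanager
def temporary_level(logger, level):
    """Switch a logger to another level and restore the previous one on exit."""
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(previous)

query_logger = logging.getLogger("callcenter.queries")  # hypothetical name
query_logger.setLevel(logging.WARNING)  # normal operation: minimal logging

debug_before = query_logger.isEnabledFor(logging.DEBUG)
with temporary_level(query_logger, logging.DEBUG):
    # Detailed logging is active only inside this block.
    debug_during = query_logger.isEnabledFor(logging.DEBUG)
debug_after = query_logger.isEnabledFor(logging.DEBUG)

print(debug_before, debug_during, debug_after)
```

Note that in Python a higher threshold such as WARNING means less output, which corresponds to the "low logging level" recommended above.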

Optimization of Use Cases: Analytical use cases (I)

• Critical to push down processing to the sources
  • Analyze the query plans and optimize for delegation
  • Pushdown can reduce network data transfers by orders of magnitude
• High memory settings
  • Source pushdown is key, but sometimes processing at the data virtualization layer is needed
• Set up swapping
  • Ensure that the server does not run out of memory
  • Gracefully degrade performance
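
The effect of pushdown on data transfer can be illustrated with a small experiment. The sketch below uses an in-memory SQLite database as a stand-in for a data source. The sales table, its columns, and the row count are made up for the example, and the row counts stand in for network traffic.

```python
import sqlite3

# Stand-in for a remote data source holding a fact table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (territory TEXT, amount INTEGER)")
rows = [("north" if i % 2 == 0 else "south", i) for i in range(10_000)]
source.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Without pushdown: every row is transferred and aggregated in the middle layer.
fetched = source.execute("SELECT territory, amount FROM sales").fetchall()
local_totals = {}
for territory, amount in fetched:
    local_totals[territory] = local_totals.get(territory, 0) + amount

# With pushdown: the source runs the GROUP BY and returns one row per territory.
pushed = source.execute(
    "SELECT territory, SUM(amount) FROM sales GROUP BY territory"
).fetchall()

print("rows transferred without pushdown:", len(fetched))
print("rows transferred with pushdown:", len(pushed))
```

Both approaches produce the same totals, but the delegated query moves two rows instead of ten thousand. With real fact tables the gap is correspondingly larger, which is why analyzing query plans for delegation pays off.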

Optimization of Use Cases: Analytical use cases (II)

• Fast drives for swapping
  • SSDs help reduce the impact of swapping to disk
• Balance query optimization between speed, memory usage and data source load
  • Depending on the limitations of the source systems, the requirements of the use case, and the hardware of the data virtualization layer
• Set a low number of concurrent connections
  • Ensure each query is not competing for resources with other queries

Optimization of Use Cases: Analytical use cases (III)

• Increase query timeouts
  • Standard timeouts are too short for long-running queries
• Monitor servers in real time
  • Check the health of the servers and ensure everything is OK
  • Our monitoring tools allow you to inspect query plans in real time to verify correct query behavior
• Use full cache
  • To cache slow sources
  • To store intermediate results that are reused several times throughout the report

Optimization of Use Cases: Analytical use cases (IV)

• Scale vertically: bigger servers before more servers
  • Bigger servers allow faster processing of big volumes of data
  • More servers do not help in a scenario with few concurrent queries
• Multi-level servers for minimal network transfers and maximum pushdown to sources
  • Multiple data virtualization servers, close to geographically distributed data sources
  • Each server aggregates local data
  • Each server shares its aggregation results (small data sets) with the other servers
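
The multi-level idea above can be sketched as two steps: each regional server reduces its local rows to a small partial result, and only those partials are combined. This is a simplified illustration in plain Python. The region names, the row layout, and the functions aggregate_locally and merge_partials are assumptions made for the example.

```python
from collections import Counter

def aggregate_locally(rows):
    """Runs next to the data: reduces many (product, amount) rows to one total per product."""
    totals = Counter()
    for product, amount in rows:
        totals[product] += amount
    return totals

def merge_partials(partials):
    """Runs on the top-level server: combines the small per-region results."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)   # Counter.update adds the values key by key
    return dict(combined)

# Hypothetical local data held by two geographically separate servers.
europe_rows = [("laptop", 100), ("phone", 50), ("laptop", 25)]
asia_rows = [("phone", 70), ("tablet", 10)]

partials = [aggregate_locally(europe_rows), aggregate_locally(asia_rows)]
report = merge_partials(partials)
print(report)
```

Only one entry per product crosses the wide-area link, regardless of how many rows each region holds. Sums merge directly as shown. An average would need each server to share both a sum and a count.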

Deploying Mixed Solutions

• Use separate servers for separate use cases
  • Do not mix use cases in the same server
  • Very different performance profiles
• Treat your operational and analytical servers separately
  • Assign different hardware resources
  • Assign separate security configuration
  • Assign separate update policies
  • Assign separate code deployment policies

Thanks!

www.denodo.com info@denodo.com

© Copyright Denodo Technologies. All rights reserved. Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
