Attribution-NonCommercial-No Derivative http://creativecommons.org/licenses/by-nc-nd/3.0/us/ How Real Time Data Requirements Change the Data Warehouse Environment Mark Madsen – September 17, 2008 www.ThirdNature.net

How Real Time Data Changes the Data Warehouse


DESCRIPTION

Surveys show a growing demand for more up-to-date data in BI environments. Meeting that demand requires moving from strict reliance on nightly batch ETL to other methods. What is often ignored is how this affects the data warehouse: the shift introduces new technology and methods, so the warehouse must support new types of workloads. Topics covered:
• Methods and tools for processing up-to-date data
• New requirements for your data warehouse database or platform
• What to look for as you address these requirements


Page 1: How Real Time Data Changes the Data Warehouse

Attribution-NonCommercial-No Derivative
http://creativecommons.org/licenses/by-nc-nd/3.0/us/

How Real Time Data Requirements Change the Data Warehouse Environment
Mark Madsen – September 17, 2008
www.ThirdNature.net

Page 2: How Real Time Data Changes the Data Warehouse

Slide 2, Third Nature, January 2008, Mark Madsen

Outline

What’s real-time about?

Impacts on the data warehouse architecture

Delivering data to users

Extracting the data

Storing the data

Operations

Getting started

Page 3: How Real Time Data Changes the Data Warehouse

Speeding Up the Data Warehouse

Why?

Faster reaction time

Reduced decision time

New process capabilities

Page 4: How Real Time Data Changes the Data Warehouse

Which Decisions Benefit?

Most real time needs will be driven by operational decision making, not strategic decisions.

Attribute | Strategic | Operational
Decision time | Flexible, long cycle | Constrained, short cycle
Decision scope | Broad, organizational | Narrow, departmental or process
Decision model | Complex | Simple
Data latency | High, history is core to decisions | Low, recent data is core to decisions
Data scope | Many sources, many types, aggregated | Few sources, structured, detailed

Page 5: How Real Time Data Changes the Data Warehouse

Strategy, Decisions and Data Latency

[Diagram relating a goal, strategy and tactics to BI needs.]

Levels: Goal, Strategy, Tactics

Objectives shown:
• Increase share of low to mid market customers
• Reduce cost of products sold
• Efficient sourcing
• Consolidate suppliers
• Decrease out of stocks
• Improve promotional performance
• Catch out of stocks before they occur
• Improve delivery compliance

BI Needs:
• Reports & spreadsheets
• Dashboards, alerts & scorecards
• Real time alerts & embedded analytics

Page 6: How Real Time Data Changes the Data Warehouse

What People Are Doing Today

[Chart: stacked bars for 2002, 2004 and 2006 on a 0% to 100% scale, with segments for Monthly, Weekly, Daily, Multiple times per day, and On demand.]

Sources: TDWI, Gartner

At the same time, data volumes are rising for most data warehouses at 50% to 100% per year.
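To put those growth rates in perspective, the following is a small arithmetic sketch of compound growth. The 1 TB starting size and the three-year horizon are illustrative assumptions, not figures from the presentation.

```python
import math


def projected_size(start_tb: float, annual_growth: float, years: int) -> float:
    """Size after compounding `annual_growth` (0.5 means 50%) for `years` years."""
    return start_tb * (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the data volume to double at a constant growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Hypothetical 1 TB warehouse, projected three years ahead.
    for rate in (0.5, 1.0):
        size = projected_size(1.0, rate, 3)
        print(f"{rate:.0%} per year: 1 TB grows to {size:.3f} TB in 3 years, "
              f"doubling every {doubling_time_years(rate):.2f} years")
```

At the low end of the quoted range the volume roughly triples in three years; at the high end it grows eightfold, which is why load windows shrink even before any real time requirement is added.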

Page 7: How Real Time Data Changes the Data Warehouse

BI Efforts Involving Real Time Data Access

Terms you may hear from the BI market that imply real time:
• Operational BI
• Embedded analytics
• Decision automation
• Complex event processing
• Event-driven BI
• Process-driven BI

They are all similar in requiring some level of low latency data access.

Page 8: How Real Time Data Changes the Data Warehouse

Impacts on the DW Architecture

[Architecture diagram with the following labels.]

Source Environments: Databases, Documents, Flat Files, XML, Queues, ERP, Applications

DW Platforms: ETL, ODS, Warehouse Database, Mart, EDR, EII, Content Store, Delivery

Data Consumers: Databases, Dashboards, OLAP, Productivity, BAM/BPM, Reporting, Analytics, Applications

Adding current data to the system requires effort at all three layers.

Page 9: How Real Time Data Changes the Data Warehouse

One Architecture or Two?

In-line with process:
• Real time data flows separately from the warehouse data
• May include a low-latency data store in the real time environment
• This model may be needed for extremely low latency data
• More applicable for event-driven BI

Out of band:
• Data to the consumer first flows through the DW
• Unified architecture for both low and high latency data
• More applicable for on-demand BI

[Diagram labels, in-line: Process, Batch DW, BI, RT BI. Out of band: Process, DW, BI & RT BI.]

Page 10: How Real Time Data Changes the Data Warehouse

User Interface: Two BI Usage Models

Demand driven
• Users ask for current data
• Most BI tools work this way
• Harder to adapt these tools to event-driven models

Event driven
• System takes action based on data, e.g. alerts, rule engines
• May not have (or need) an end user interface
• Need understanding of decision & action process for this model
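The contrast between the two usage models can be sketched in a few lines. Everything here is hypothetical and for illustration only: the `StockReading` record, the class names, and the low-stock threshold are assumptions, not part of the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StockReading:
    sku: str
    on_hand: int


class DemandDrivenStore:
    """Demand driven: data waits in a store until a user asks for it."""

    def __init__(self) -> None:
        self._latest: Dict[str, StockReading] = {}

    def load(self, reading: StockReading) -> None:
        self._latest[reading.sku] = reading

    def query(self, sku: str) -> StockReading:
        return self._latest[sku]


class EventDrivenRules:
    """Event driven: each arriving reading is checked against a rule at once."""

    def __init__(self, threshold: int, notify: Callable[[str], None]) -> None:
        self.threshold = threshold
        self.notify = notify

    def on_event(self, reading: StockReading) -> None:
        if reading.on_hand < self.threshold:
            self.notify(f"low stock: {reading.sku} ({reading.on_hand} left)")


if __name__ == "__main__":
    alerts: List[str] = []
    store = DemandDrivenStore()
    rules = EventDrivenRules(threshold=5, notify=alerts.append)
    for reading in (StockReading("A1", 12), StockReading("B2", 3)):
        store.load(reading)      # available whenever someone asks
        rules.on_event(reading)  # acted on without anyone asking
    print(store.query("A1").on_hand)
    print(alerts)
```

The event-driven path has no user interface at all; its design effort goes into the rule and the action, which is why the decision and action process has to be understood first.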

Page 11: How Real Time Data Changes the Data Warehouse

BI Tools Need New Capabilities

Embedding BI within applications

• UI embedding
• Full embedding

Event-based integration

Feeding BI data to applications: services, not SQL, may be desired

Custom UI code may be preferable to a BI tool
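As a rough illustration of "services, not SQL", the sketch below hides a query behind a function that returns plain records. The `sales` table, its columns, and the function names are assumptions made up for this example; an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from typing import Dict, List


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a warehouse table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("east", 50.0), ("west", 75.0)],
    )
    return conn


def sales_by_region(conn: sqlite3.Connection) -> List[Dict[str, object]]:
    """Service-style call: the application gets records and never sees the SQL."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return [{"region": region, "total": total} for region, total in rows]


if __name__ == "__main__":
    print(sales_by_region(build_demo_db()))
```

In practice the function would sit behind a web service or message interface; the point is only that the calling application depends on the function's contract rather than on the warehouse schema.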

Page 12: How Real Time Data Changes the Data Warehouse

The Data Integration Layer

• Integration is the most complex element of adding real time data.
• Inline vs. out of band, demand vs. event-driven BI usage create different DI requirements.
• You may not have exactly the same metrics, attributes or data extract logic.
• Don’t count on replacing the ETL batch; more likely you are augmenting it.
• You probably need to add new DI technologies to your portfolio.
• Batch performance design isn’t like real time design.

Page 13: How Real Time Data Changes the Data Warehouse

Speeding Up Data Integration Methods

[Diagram: integration methods placed on a latency scale running from "Hourly+" to "Immediate". Methods shown: Single batch, Frequent batch, Mini-batch, Continuous load, Streaming.]
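One of the methods named above, mini-batch loading, can be sketched as a buffer that flushes on either a row-count or an age limit. This is a minimal sketch under stated assumptions: the class name, the `load` callback, and both limits are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, List, Sequence


class MiniBatchLoader:
    """Buffers incoming rows and hands them to `load` in small batches."""

    def __init__(self, load: Callable[[Sequence[dict]], None],
                 max_rows: int, max_wait_seconds: float) -> None:
        self._load = load
        self._max_rows = max_rows
        self._max_wait = max_wait_seconds
        self._buffer: List[dict] = []
        self._first_arrival = 0.0

    def add(self, row: dict, now: float) -> None:
        """Buffer a row; flush immediately if the batch is full."""
        if not self._buffer:
            self._first_arrival = now
        self._buffer.append(row)
        if len(self._buffer) >= self._max_rows:
            self.flush()

    def tick(self, now: float) -> None:
        """Call periodically; flushes once the oldest buffered row has waited too long."""
        if self._buffer and now - self._first_arrival >= self._max_wait:
            self.flush()

    def flush(self) -> None:
        """Load whatever is buffered, if anything, and empty the buffer."""
        if self._buffer:
            self._load(list(self._buffer))
            self._buffer.clear()


if __name__ == "__main__":
    batches: List[Sequence[dict]] = []
    loader = MiniBatchLoader(batches.append, max_rows=2, max_wait_seconds=10.0)
    loader.add({"id": 1}, now=0.0)
    loader.add({"id": 2}, now=1.0)   # second row fills the batch
    loader.add({"id": 3}, now=2.0)
    loader.tick(now=12.0)            # third row has waited 10 seconds
    print(batches)
```

The two limits are the tuning knobs: smaller values push the loader toward continuous loading, larger values push it back toward conventional batch.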

Page 14: How Real Time Data Changes the Data Warehouse

The Platform Layer: Data and Database

• Schemas will need changes.
• You don’t need to convert the entire database to a real time schema.
• One schema or two?
• Event-driven BI creates different query patterns and workloads.
• Configuration and tuning may be different than what you are used to with traditional BI.
• Application developers want services or ORMs, not SQL.
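One possible reading of "one schema or two" is to keep batch-loaded history and low-latency data in separate tables and expose a single view over both. The sketch below shows that option only; the table names, columns, and the choice of SQLite are assumptions for illustration, not a design taken from the presentation.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create two physical tables and one view that unifies them (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Batch-loaded history, refreshed by the nightly ETL.
        CREATE TABLE sales_history (sale_id INTEGER PRIMARY KEY, amount REAL);
        -- Small table that receives current rows at low latency.
        CREATE TABLE sales_today (sale_id INTEGER PRIMARY KEY, amount REAL);
        -- One logical schema for queries that need both.
        CREATE VIEW sales_all AS
            SELECT sale_id, amount, 'history' AS source FROM sales_history
            UNION ALL
            SELECT sale_id, amount, 'today' AS source FROM sales_today;
    """)
    return conn


if __name__ == "__main__":
    conn = build_warehouse()
    conn.execute("INSERT INTO sales_history VALUES (1, 20.0)")
    conn.execute("INSERT INTO sales_today VALUES (2, 5.0)")
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales_all").fetchone())
```

Only the small table takes the frequent writes, so the bulk of the database keeps its batch-oriented design while real time queries read through the view.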

Page 15: How Real Time Data Changes the Data Warehouse

Different Platform Workloads

Three workloads: Data loading + Normal BI + Real time BI = complications

[Diagram: the DW architecture diagram (Source Environments, DW Platforms, Data Consumers), repeated.]

Page 16: How Real Time Data Changes the Data Warehouse

Development, Maintenance & Operations

• Real time decisions on real time data mean data quality plays a larger role, and it’s harder to address.

• Warehouse availability becomes much more important to the business, and it isn’t just the database – it’s everything.

• Performance and meeting strict BI SLAs will rise in importance since you are now tied in to business operations.

Page 17: How Real Time Data Changes the Data Warehouse

A Prescription for Getting Started

1. Start with a decision process
2. Define data needs for the process
3. Ensure that data is available at the right latency
4. Determine appropriate data integration technologies
5. Design and initiate upstream work
6. Build

Page 18: How Real Time Data Changes the Data Warehouse

Thanks

Page 19: How Real Time Data Changes the Data Warehouse

CC Image Attributions

Thanks to the people who supplied the Creative Commons licensed images used in this presentation:
• Divers - http://flickr.com/photos/raveller/
• Fast dog - http://flickr.com/photos/marinacvinhal/379111290/
• Febo - http://flickr.com/photos/igor/419425754/
• Subway - http://flickr.com/photos/neilsphotoalbum/504517855/
• Cadillac ranch - http://flickr.com/photos/whatknot/179655095/

Page 20: How Real Time Data Changes the Data Warehouse

About the Presenter

Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net.