26
Ziyu Lin Peking University Oct 19, 2006 Survey Report on Real-time Data Warehouses Department of Computer Science and Technology, Peking University, Oct 19, 2006

Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Ziyu Lin

Peking University

Oct 19, 2006

Survey Report on Real-time Data Warehouses

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 2: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Project introduction

Real-Time Data Warehousing:

Challenges and Solutions

Our Research Work

Reference

Outline

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 3: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Project Background

Reporting Analysis Prediction Operationalize Activate

1 2 3 4 5

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 4: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Scalable, Real-Time and Active MPP based Data

Warehouse For Telecommunications Industry

advance the Real-Time Active Data Warehouse

(RTADW) technology on a scalable, parallel

database system platform

demonstrate its applicability in tackling emerging

business system challenges in the

telecommunication industry.

Project Objective

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 5: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Project introduction

Real-Time Data Warehousing:

Challenges and Solutions

Our Research Work

Reference

Outline

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 6: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Real-Time Data Warehousing: Challenges and Solutions

4 – scalability & query contention

3 – OLAP queries vs. changing data

2 – modeling real-time fact tables

1 – enabling real-time ETL

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 7: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Challenge 1:Enabling Real-time ETL

– almost all ETL tools and

systems, whether based on

off-the-shelf products or

custom-coded, operate in a

batch mode.

– ETL process typically

involves downtime of the

data warehouse

ETL in batch mode

– there can't be any system

downtime

– The requirements for

continuous updates with no

warehouse downtime are

generally inconsistent with

traditional ETL tools and

systems.

Real-time ETL

Problem

Statement

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 8: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Near real-

time ETL

Real-time ETL

External

Real-time

Data Cache

Direct trickle

feedTrickle & Flip

1

2 3

4

Challenge 1:Enabling Real-time ETL

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 9: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Challenge 2:Modeling Real-time Fact Tables

– where the real-time data is

stored

– how best to link it into the rest

of the data model

Problem Statement

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 10: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Modeling as

Usual with

Direct Fact

Table FeedModeling Real-

time Fact Tables

Modeling with

an External

Real-time

Data Cache

Separate

Real-time

Partition

Integrated

Real-time

through Views

1

2 3

4

Challenge 2:Modeling Real-time Fact Tables

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 11: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Challenge 3:OLAP Queries vs. Changing Data

– OLAP and Query tools were designed to

operate on top of unchanging, static historical

data

– the results may be negatively influenced by

data changes concurrent to query execution

– relational OLAP tools are particularly sensitive

to this problem

Problem

Statement

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 12: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

OLAP

Queries vs.

Changing

Data

Use a Near

Real-time

Approach

Use an

External

Real-time

Data Cache

Risk

Mitigation

for True

Real-time

1

23

Challenge 3:OLAP Queries vs. Changing Data

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 13: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Challenge 4: Scalability & Query Contention

the issue of query contention and scalability is the most difficult

issue facing organizations deploying real-time data warehouse

solutions.

1

Problem Statement

in a real-time system, the additional burden of continuously

loading and updating data further strains system resources. 1

the contention between complex selects and continuous inserts

tends to severely limit scalability 1

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 14: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Reverse Just-in-

time Data Merge

Just-in-time

Information

Merge

Simplify and

Limit Real-time

Reporting

Apply More

Database

Horsepower

Separate &

Isolate in a Real-

time Data Cache

Scalability &

Query

Contention

5

1

2

34

Challenge 4: Scalability & Query Contention

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 15: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Project introduction

Real-Time Data Warehousing:

Challenges and Solutions

Our Research Work

Reference

Outline

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 16: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Introduction

•OLAP and Query tools:

designed to operate on top of unchanging, static historical data

assume that the underlying data is not changing

the results they produce may be negatively influenced by data changes concurrent to query execution

In some cases, this can lead to inconsistent and confusing query results, which is called internal inconsistency of report.

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 17: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Introduction

0:00 create table TEMP1(

Category_Id LONG, DOLLARSALES

DOUBLE)

0:01 insert into TEMP1

select all.[Category_Id] AS Category_Id,

sum (all.[Tot_Dollar_Sales]) AS

DOLLARSALES

from [YR_CATEGORY_SLS] all

group by all.[Category_Id]

0:05 create table TEMP2 (ALLPRODUCTSD

DOUBLE)

0:06 insert into TEMP2

select sum((all.[Tot_Dollar_Sales]) AS

ALLPRODUCTSD

from [YR_CATEGORY_SLS] all

0:08 select distinct pa1.[Category_Id] AS

Category_Id,

all.[Category_Desc] AS

Category_Desc,

all.[DOLLARSALES] AS

DOLLARSALES,

(pa1.[DOLLARSALES]/pa2.[ALLPRODUCTSD])

AS DOLLARSALESC

from [TEMP1] pa1,

[TEMP2] pa2,

[LU_CATEGORY] all

where

pa1.[Category_Id]=all.[Category_Id]

0:09 drop table TEMP1

0:10 drop table TEMP2

Table 1: A multi-pass SQL statement

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 18: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

The description of layer-based view approach

Layer 1 Layer 2 Layer 3

Layer 1

Layer 2

Layer 3

This is what we see

Figure Layer technology used in painting software

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 19: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

The description of layer-based view approach

Layer 1 Layer 2

This is what OLAP tools

see at time T2

data at time T1 with the

top-right part locked

changed data between

T1 and T2 in locked area

View 1

( Layer 1+ Layer 2 )

timeT1 T2

When to update already read-locked data,

Layer 2 is generated above Layer 1.

Layer 1

Layer 2

timeT1 T2

read

lock

write

lock

Figure Layer technology used in this paper

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 20: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

The architecture of real-time OLAP

Data Source

ETL

RTDC

76

95

116

128

60

76

93102.7

40

60

80

100

120

140

00年 05年 10年 15年

预测1

预测2

Data Warehouse

OLAP

batch

load

Data Source

Data Source

real-

time

load

Figure The architecture of real-time OLAP

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 21: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

The architecture of real-time OLAP

•Data modeling

no special data modeling is required

generally modeled identically to the data warehouse

typically contains only the tables that are real-time.

•Data integrating

batch loading

change data capture

•Data merging

JIM (Just-in-time information merging )

RJIM (Reverse Just-in-time information merging)

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 22: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Project introduction

Real-Time Data Warehousing:

Challenges and Solutions

Our Research Work

Reference

Outline

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 23: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Reference

[1] Langseth, J., "Real-Time Data Warehousing: Challenges and Solutions",

DSSResources.COM, 02/08/2004.

[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan

Goldstein. Relaxed currency and consistency: how to say "good enough" in

SQL. In SIGMOD 2004.P:815-826.

[3] Robert M. Bruckner and A.M. Tjoa. Managing Time Consistency for

Active Data Warehouse Environments. DaWaK 2001, LNCS 2114, pp. 254–

263, 2001.

[4] Robert M. Bruckner and A.M. Tjoa. Capturing Delays and Valid Times in

Data Warehouses—Towards Timely Consistent Analyses. Journal of

Intelligent Information Systems, 19:2, 169–190, 2002

[5] Zaniolo, C., Ceri, S., Faloutsos, C., Snodgrass, R.T., Subrahmanian,V.S.,

and Zicari, R. Advanced DatabaseSystems. San Francisco: Morgan

Kaufmann Publishers. 1997.

[6] Gregersen, H. and Jensen, C.S. Temporal Entity-Relationship Models—A

Survey. IEEE Transactions on Knowledge and Data Engineering, 11(3),

464–497, 1999.

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 24: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Reference

[7] Widom, J. (1995). Research Problems in Data Warehousing. In Proc. of

the 5th Intl. Conference on Information and Knowledge Management (CIKM),

Baltimore, Maryland (pp. 25–30). ACM-Press.

[8] Yang, J.. Temporal DataWarehousing. Ph.D. Thesis, Department of

Computer Science, Stanford University. 2001.

[9] Yang, J. and Widom, J. .Maintaining Temporal Views over Nontemporal

Information Sources for Data Warehousing. In Proc. of the 6th Intl. Conf. On

Extending Database Technology (EDBT), Valencia, Spain, Springer LNCS

Vol. 1377 (pp. 389–403). 1998.

[10] Yang, J. and Widom, J.. Temporal View Self-Maintenance. In Proc. of

the 7th Intl. Conf. On Extending Database Technology (EDBT), Konstanz,

Germany, Springer LNCS, Vol. 1777 (pp. 395–412). 2000.

[11] Pedersen, T.B., Jensen, C.S., and Dyreson, C.E.. A Foundation for

Capturing and Querying Complex Multidimensional Data. Information

Systems, 26(5), 383–423.2001.

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 25: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

Reference

[12] Tho, M. Nguyen; Tjoa, A. Min. ―Zero-Latency Data Warehousing for

Heterogeneous Data Sources and Continuous Data Streams‖;Proceedings

of iiWAS 2003, Fifth International Conference on Information and Web-

based Applications Services, Jakarta, Indonesia; Austrian Computer Society

(OCG) (2003), 3-902134-72-0; 55–64.

[13] S. S. Conn. OLTP and OLAP data integration: a review of feasible

implementation methods and architectures for real time data analysis. In:

Southeast Con, 2005. Proceedings. IEEE., pages 515-520, 2005.

[14] Itamar Ankorion. Change Data Capture – Efficient ETL for Real-Time BI.

Article published in DM Review Magazine, January 2005 Issue.

[15] T. Thalhammer and M. Schrefl. Realizing active data warehouses with

off-the-shelf database technology. software-Practice & Experience, ACM,

32(12), pages 1193-1222, 2002.

Department of Computer Science and Technology, Peking University, Oct 19, 2006

Page 26: Survey Report on Real-time Data Warehouses[2] Hongfei Guo, Per-Ake Larson, Raghu Ramakrishnan, Jonathan Goldstein. Relaxed currency and consistency: how to say "good enough" in SQL

北京大学计算机系数据库实验室 2006年8月18日

主动数据仓库调研报告

Department of Computer Science and Technology, Peking University, Oct , 2006