Data Aggregation in Today's Data Warehouse
New England Business Objects User Group
Yossi Matias, CTO
HyperRoll
2 1/24/2006
Recap – BI made easy
3 1/24/2006
Reports made easy
4 1/24/2006
… and widespread …
5 1/24/2006
Business Objects XI Platform Capabilities
• High performance
• Scalability
• Reliability
• Service-oriented architecture
• But what about the underlying data warehouse?
6 1/24/2006
An example application
7 1/24/2006
BO Universe
8 1/24/2006
Aggregate queries
9 1/24/2006
Typical Data Warehouse
• A schema
• Methodology
• Lots of summary tables
• Table management challenges
– Number of tables
– Complex configurations
– Table refresh
– Redundant storage
10 1/24/2006
What’s wrong with this picture?
Multiple views of summary tables and a complex universe
11 1/24/2006
Performance Fundamentals: Aggregations
[Chart: processing time and query time (y-axis: Time) against the number of aggregations (x-axis)]
Pre-calculated summaries of data
Intersections of levels from each dimension
Tradeoff between processing and query times
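As a minimal sketch of what "intersections of levels from each dimension" means, the Python below pre-computes one summary per combination of levels over a toy fact table. The dimension names, levels and rows are hypothetical, and this brute-force rollup is an illustration only, not HyperRoll's algorithm. It shows the tradeoff on the slide: each extra aggregation costs processing time up front, but turns an aggregate query into a lookup.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (day, month, year, city, region, amount).
FACTS = [
    ("2006-01-19", "2006-01", "2006", "Boston", "East", 100),
    ("2006-01-20", "2006-01", "2006", "Boston", "East", 50),
    ("2006-02-01", "2006-02", "2006", "Albany", "East", 70),
    ("2006-02-01", "2006-02", "2006", "Denver", "West", 30),
]

# Each dimension lists its levels from finest to coarsest; the value is the
# column index in a fact row, and None stands for the "all" level.
DIMENSIONS = {
    "time": {"day": 0, "month": 1, "year": 2, "all": None},
    "geo": {"city": 3, "region": 4, "all": None},
}

def build_aggregations(facts):
    """Pre-compute one summary per intersection of levels (processing time)."""
    summaries = {}
    for levels in product(*(d.items() for d in DIMENSIONS.values())):
        names = tuple(name for name, _ in levels)
        totals = defaultdict(int)
        for row in facts:
            key = tuple("*" if col is None else row[col] for _, col in levels)
            totals[key] += row[-1]
        summaries[names] = dict(totals)
    return summaries

def query(summaries, levels, key):
    """Answer an aggregate query with a dictionary lookup (query time)."""
    return summaries[levels][key]

SUMMARIES = build_aggregations(FACTS)
print(len(SUMMARIES))  # 4 time levels x 3 geo levels
print(query(SUMMARIES, ("month", "region"), ("2006-01", "East")))
print(query(SUMMARIES, ("all", "all"), ("*", "*")))
```

With 4 time levels and 3 geography levels, the sketch builds 12 summaries; adding a dimension multiplies that count.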
12 1/24/2006
The Summary Table Dilemma
[Chart: query performance and maintenance burden (from "Simple" to "Unbearable") against the number of summary tables]
ROLAP engines require a steady diet of summary tables to perform
A few queries have acceptable performance…
…but the majority of queries, especially ad-hoc requests, perform poorly and system adoption suffers
At some point summary table maintenance becomes unbearable
13 1/24/2006
Typical Data Warehouse Environments
[Diagram]
Sources: Applications, Databases, Flat Files, Mainframe, EAI/EDI
ETL Layer
Data Warehouse, ODS, Data marts
$ $! Summary Tables, Multidimensional Data Stores, Bursted Reports, Data Alerts, Cached Reports, Extra Hardware (Memory, CPUs)
Need for real-time information (low to high): DSS, Ad-hoc Query, Budgeting & Planning, Operational BI, CPM, BAM, Real-Time Dashboards
Poor Query Performance & Poor User Concurrency
Longer Batch Window
14 1/24/2006
On the limitation of RDBMS
“In fact, relational DBMS were never intended to provide the very powerful functions for data synthesis, analysis, and consolidation that is being defined as multi-dimensional data analysis.
These types of functions were always intended to be provided by separate, end-user tools that were outside and complementary to the relational DBMS products.”
E.F. Codd, S.B. Codd and C.T. Salley, Providing OLAP to User-Analysts: An IT Mandate
15 1/24/2006
The Catch-22 of data aggregation in DW
• We want a Data Warehouse that performs data aggregations effectively
• The Data Warehouse should ideally consist of relational databases
• Relational databases are not designed to support data aggregation effectively
16 1/24/2006
The HyperRoll approach
• Build an effective non-relational data aggregation server
• Have the data aggregation server provide “aggregation services” to a relational database
• As a result, have a HyperRoll-enabled relational database that supports aggregations effectively
17 1/24/2006
HyperRoll for Relational
[Diagram: a DBMS view reaches the HyperRoll Engine through a gateway; the engine is drawn with Loading, Storage and Access layers; interfaces shown: DB2 CLI, ODBC, Oracle OCI, ASCII]
ETL: data from the fact table is loaded into HyperRoll (HR) in order to build aggregates
Benefit: Up to 90% reduction in batch window compared to existing aggregation strategies
Benefit: Summary table storage & maintenance reduced or eliminated
Benefit: Up to 100x faster queries, and end users continue to use familiar applications
18 1/24/2006
HyperRoll-enabled Data Warehouse
[Diagram: ROLAP queries (SQL) go to a data warehouse or mart that holds a star schema and an aggregates view, with HyperRoll behind the view]
10x – 100x performance improvement
Replaces or complements summary tables (but does NOT build or store summary tables)
19 1/24/2006
A DW implementation
[Diagram: Fact1, Fact2, HyperRoll, MV1 to MV12, Query Tools; labels: 400 Millions, 36 Millions]
20 1/24/2006
Typical Data Warehouse
• A schema
• Methodology
• Lots of summary tables
• Table management challenges
– Number of tables
– Complex configurations
– Table refresh
– Redundant storage
21 1/24/2006
Data Warehouse with HyperRoll
• Same methodology
• Same schema
• Now only “one summary” table
– Represents all aggregations
– Simplifies management
23 1/24/2006
What’s wrong with this picture?
Multiple views of summary tables and a complex universe
24 1/24/2006
Data Warehouse with HyperRoll
One view of all possible tables
25 1/24/2006
Query to the HyperRoll View
HyperRoll View: simple query
Query response time: a few seconds
26 1/24/2006
Significant Performance Enhancement
[Chart: number of seconds to complete a query (y-axis, 0 to 3500) against millions of records (x-axis: 1M, 5M, 10M, 15M, 20M); series: Business Objects + Oracle, and Business Objects + Oracle + HyperRoll; annotation: "Less than 1 second"]
27 1/24/2006
Typical Data Warehouse Environments
[Diagram]
Sources: Applications, Databases, Flat Files, Mainframe, EAI/EDI
ETL Layer
Data Warehouse, ODS, Data marts
$ $! Summary Tables, Multidimensional Data Stores, Bursted Reports, Data Alerts, Cached Reports, Extra Hardware (Memory, CPUs)
Need for real-time information (low to high): DSS, Ad-hoc Query, Budgeting & Planning, Operational BI, CPM, BAM, Real-Time Dashboards
Poor Query Performance & Poor User Concurrency
Longer Batch Window
28 1/24/2006
Typical Data Warehouse Environments
[Diagram, with the HyperRoll Engine shown twice]
Sources: Applications, Databases, Flat Files, Mainframe, EAI/EDI
ETL Layer
Data Warehouse, ODS, Data marts
$ $! Summary Tables, Multidimensional Data Stores, Bursted Reports, Data Alerts, Cached Reports, Extra Hardware (Memory, CPUs)
Need for real-time information (low to high): DSS, Ad-hoc Query, Budgeting & Planning, Operational BI, CPM, BAM, Real-Time Dashboards
Poor Query Performance & Poor User Concurrency
Longer Batch Window
29 1/24/2006
The best of both worlds
Relational (RDBMS)
• “Unlimited” scope of data
• Variety of client tools
• High maintenance
• Complex table joins and aggregations slow down queries
• Complex analysis difficult
OLAP
• Fast queries
• Complex analysis
• Limited scope
• Long cube builds
• Limited client tools
HyperRoll offers the best of both worlds
• Transparent integration with both relational and multidimensional databases
• Seamless to the existing client tools
• Fast build process (dramatically faster than OLAP)
• Fast queries without having to design, build and maintain multiple summary tables
• Broader scope of analysis (dimensions and data)
• Eliminates complex joins and GROUP BY
30 1/24/2006
The HyperRoll aggregation server
• What’s the magic with the HyperRoll aggregation server?
• Does it compute all possible aggregates?
• How come it can perform so much better than OLAP cubes?
31 1/24/2006
Multidimensional Cube
[Diagram legend: theoretical scope of data; leaf-level data; aggregated data]
Problems:
• Sparsity
• Irregularity
32 1/24/2006
Typical OLAP Problems: Data Explosion
Data Explosion Syndrome
[Chart: number of aggregations (y-axis, 0 to 70000) against number of dimensions (x-axis, 2 to 8), with 4 levels in each dimension; data labels: 16, 81, 256, 1024, 4096, 16384, 65536]
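The growth in the chart can be reproduced with a short calculation, assuming (as the slide states) 4 levels in every dimension and counting one aggregation per intersection of levels, which gives 4 to the power of the number of dimensions. The function name is hypothetical. The chart's labels agree with this formula at every point except 3 dimensions, where the slide shows 81 and the formula gives 64.

```python
def aggregation_count(dimensions, levels_per_dimension=4):
    """Number of level intersections when every dimension has the same depth."""
    return levels_per_dimension ** dimensions

# Same range as the chart's x-axis: 2 to 8 dimensions.
COUNTS = {d: aggregation_count(d) for d in range(2, 9)}
for d, n in COUNTS.items():
    print(f"{d} dimensions -> {n} aggregations")
```

Each added dimension multiplies the count by the number of levels, which is why materializing every aggregate quickly becomes impractical.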
33 1/24/2006
What is HyperRoll?
• An intelligent aggregation server
• Software engine based on proprietary algorithms for data aggregation
– Pre-computes a small-footprint data store
– Enables quick computation of aggregate values
– Highly efficient I/O
• The logical equivalent of OLAP for relational, without the limitations
• Patented architecture for standalone data aggregation
• Integrated into existing relational databases and Business Intelligence systems
34 1/24/2006
What about hardware-based solutions?
• Will better H/W make the aggregation problem go away?
• The good news:
– Better H/W platforms improve performance
• The bad news:
– The problem will just get worse over time
35 1/24/2006
InfoGlut is Only Getting Worse
[Chart: growth multiple (y-axis: 1, 2, 3) over time (x-axis: 9 Mo., 18 Mo.) for volume of corporate data versus linear processing capability (Moore’s Law)]
Consequences:
• Longer processing times
• Soaring costs
• Limited analysis flexibility
• Out-of-date information
36 1/24/2006
Financial Institution
Financial services company
Application: Expense Management
Primary business issue: Analyst Productivity
Oracle, Business Objects
Before
• Expense process required 4 reports to run sequentially
• Total time to complete the task: 4 hours
• Queries ran from 11 to 14 minutes
After
• Process now completed in minutes
• Queries run in 2 to 18 seconds
• Projected manpower saving: >$500K
Query performance increase: 37 to 90X
37 1/24/2006
Customer Test Results
Query Name        Oracle Timing (MV)   HR Timing   Improvement
Bill to           7 min 51 sec         14 sec      31 X
Territory         7 min 15 sec         1 sec       427 X
Region            11 min 4 sec         1 sec       422 X
Sales Force       12 min 12 sec        1 sec       438 X
All Sales Force   16 min 37 sec        1 sec       541 X
38 1/24/2006
Customers
39 1/24/2006
Setting up Queries to work with HyperRoll
Step 1: Analyze the Business Objects Universe, reports, queries and schema
Analyze the business:
• Analyze reports
• Analyze semantic layer
• Select measures
• Select dimensions
• Select hierarchies
• Obtain design validation
• Look for hidden requirements
40 1/24/2006
Setting up Queries to work with HyperRoll
Step 2: Design the HyperRoll metadata structure using HyperRoll HDF Builder
41 1/24/2006
Setting up Queries to work with HyperRoll
Step 3: HyperRoll is loaded with source data (RDBMS, flat files)
• Source data is read
• HyperRoll aggregation engine is loaded and calculated
• Hierarchies are developed
42 1/24/2006
Setting up Queries to work with HyperRoll
Step 4: Create the database view and link it to HyperRoll
• Define an ODBC System DSN for HyperRoll
• Create a DBlink for the DSN
• Create View as Select * from HyperRoll@DBlink
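The view-over-a-link pattern in this step can be imitated with SQLite from Python's standard library. This is only a stand-in sketch: `ATTACH DATABASE` plays the role of the ODBC DSN plus database link, and the attached `hr` store, its `aggregates` table and the `hr_sales_vw` view are hypothetical names, not HyperRoll's gateway or Oracle syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ATTACH stands in for the database link: "hr" is a second store that the
# main database can address by name (assumption: a stand-in for HyperRoll).
conn.execute("ATTACH DATABASE ':memory:' AS hr")
conn.execute("CREATE TABLE hr.aggregates (region TEXT, amount_sold INTEGER)")
conn.executemany(
    "INSERT INTO hr.aggregates VALUES (?, ?)",
    [("East", 220), ("West", 30)],
)

# SQLite only lets TEMP views span attached databases, so the view is TEMP.
conn.execute("CREATE TEMP VIEW hr_sales_vw AS SELECT * FROM hr.aggregates")

# Client tools query the view like any local table.
ROWS = conn.execute(
    "SELECT region, amount_sold FROM hr_sales_vw ORDER BY region"
).fetchall()
print(ROWS)
```

The point of the pattern is that query tools see an ordinary view in the database they already use, while the rows come from a separate store.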
43 1/24/2006
Setting up Business Objects to work with HyperRoll
Step 5: Modify the Business Objects Universe by adding the database view that points to HyperRoll
• Add the View to the Universe
• Enable the Aggregate Aware Function
44 1/24/2006
Setting up Queries to work with HyperRoll
Step 6: Users execute queries against the database as they normally would
[Diagram: RDBMS and HyperRoll]
• Transparent redirection between detail and aggregate data
• No user training
• Dramatically improved query response
• Dramatically improved manageability
45 1/24/2006
Add the View to Business Objects
• Here the new Database view has been added to the current BO Universe
• The view comprises the aggregated data for the existing schema
Database View Accessing HyperRoll data
46 1/24/2006
Enable Aggregate Aware Function
• In the Aggregate Aware function, place the matching column from the view as the first parameter, and the column from the fact table as the second parameter
@Aggregate_Aware(SH.HR_SALES_VW.AMOUNT_SOLD, SH.SALES.AMOUNT_SOLD)
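The parameter order matters because the function prefers the first column it is allowed to use. The Python below is a simplified model of that selection, not Business Objects code: the function name, the incompatibility map and the object names ("Region", "Year", "Transaction Id") are hypothetical, and in a real universe the incompatibilities are declared separately by the designer.

```python
def resolve_aggregate_aware(candidates, incompatible, requested_objects):
    """Return the first candidate column whose table can serve every requested object.

    candidates: columns ordered from most aggregated to most detailed.
    incompatible: maps a table name to the set of objects it cannot serve.
    """
    for column in candidates:
        table = column.rsplit(".", 1)[0]
        if not (incompatible.get(table, set()) & set(requested_objects)):
            return column
    raise ValueError("no candidate can serve the requested objects")

# Same order as on the slide: the view first, the fact table second.
CANDIDATES = ["SH.HR_SALES_VW.AMOUNT_SOLD", "SH.SALES.AMOUNT_SOLD"]
# Hypothetical: the aggregate view carries no transaction-level detail.
INCOMPATIBLE = {"SH.HR_SALES_VW": {"Transaction Id"}}

SUMMARY_CHOICE = resolve_aggregate_aware(CANDIDATES, INCOMPATIBLE, ["Region", "Year"])
DETAIL_CHOICE = resolve_aggregate_aware(
    CANDIDATES, INCOMPATIBLE, ["Region", "Transaction Id"]
)
print(SUMMARY_CHOICE)
print(DETAIL_CHOICE)
```

In this model a summary request resolves to the view column and a detail request falls through to the fact table, which is the redirection the end user never has to think about.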
47 1/24/2006
Setting up Queries to work with HyperRoll
[Flow diagram: a query request from the Business Objects end-user layer goes to Business Objects SQL generation, which asks "Is it a summary request?". N: the SQL request reads DETAILED data from the FACT TABLE in the RDBMS. Y: the SQL request reads SUMMARIZED data from the VIEW, which reaches the HyperRoll instance through the gateway.]
48 1/24/2006
HyperRoll Value Propositions
• Improved query performance
• Reduced batch window to load data
• Lower maintenance and support costs
• Enables operational BI
• Complementary to existing BI, DB and DW infrastructures
49 1/24/2006
How to learn more
• On algorithms for massive data sets
– http://theory.stanford.edu/~matias/
• On HyperRoll
– Talk to me over the break
– Talk to Kathleen
  • [email protected]
  • (845)-928-6974
– Take a webinar: www.hyperroll.com
50 1/24/2006
Yossi Matias, CTO
HyperRoll
NEBOUG
January 19, 2006
Realizing the Potential of Business Intelligence