How Oracle Essbase
Aggregate Storage Option
and how to
Dan Pressman
www.nTuple.net Jun 24, 2014
Seattle, WA
Warning – Danger!
The Information and techniques in this
presentation will soon be
So sayeth Gabby Rubin
So sayeth Kumar Ramaiyer
But they won’t sayeth WHEN †
We all look forward to that day!
But in the meantime…
† As governed by Oracle NDA and advance disclosure requirements and other legally necessary equivocations.
Assumption, Basis and a Caveat
Assumption:
● Basic understanding of ASO cubes
Basis:
● My chapter, How ASO Works and How to Design for Performance, in
Developing Essbase Applications: Advanced Techniques for Finance
and IT Professionals
Caveat:
● No secret discussions with Essbase developers
● Based on documentation, patent filings, and
empirical testing
Use accordingly.
By a show of hands:
How many of you actually have seen
an ASO cube?
Who Has Seen an ASO Cube?
Really? You’ve looked inside the computer
and seen an ASO cube?
Still made and used for field identification of minerals.
Similar guides are available for Trees and Birds.
An ASO Cube You Can Hold in Your Hand
Holes represent metadata presence; notches represent absence.
Note notches for all levels (67510, Abbyville, Kansas and Central).
Query for Check and Adult = ((Check) AND (Adult))
Card Used With Sorting-Needle
Cards with holes are pulled up and used for the next part of the query; cards with notches fall out.
Query for Check and Adult = ((Check) AND (Adult))
Data Is Queried Using a “Sorting-Needle”
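The needle passes described above can be sketched in a few lines of Python. This illustrates the card analogy only, not Essbase internals; the card layout, the attribute names beyond Check and Adult, and the `needle` helper are all assumptions made for the example.

```python
# Each card lists the attributes for which it has a hole (metadata present).
# Any attribute not listed is a notch (metadata absent). Data is made up.
cards = [
    {"id": 1, "holes": {"Check", "Adult", "Kansas"}, "amount": 120.0},
    {"id": 2, "holes": {"Check", "Kansas"}, "amount": 75.0},
    {"id": 3, "holes": {"Cash", "Adult", "Kansas"}, "amount": 40.0},
    {"id": 4, "holes": {"Check", "Adult", "Central"}, "amount": 10.0},
]

def needle(deck, attribute):
    """One pass of the needle: cards with a hole are lifted, notched cards fall out."""
    return [card for card in deck if attribute in card["holes"]]

# (Check) AND (Adult): one pass per attribute, the second pass over the survivors.
selected = needle(needle(cards, "Check"), "Adult")
selected_ids = [card["id"] for card in selected]

# Fact data is summed only after the query has finished.
total = sum(card["amount"] for card in selected)
```

Each extra attribute in the query costs one more pass over the lifted cards.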
Length of Sorting-Needle
Size of card box
Not based on level of query
● Upper-level data queried as fast as Level 0
Is based on number of Dimensions queried
● Each Dimension requires another pass of the
Sorting-Needle
● Unless you have the new…
Query Performance is Dependent on…
Patented Multi-Processor Sorting-Needle
Logic is reversed with multi-processor sorting-needle (holes represent absence of
metadata; notches represent presence). Multi-processor needle query pulls cards NOT
matching the query. Query for Check and Adult = NOT(NOT(Check) OR NOT(Adult))
Card Used With Multi-Processor Sorting-Needle
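The reversed logic is De Morgan's law. A short check over all four hole/notch combinations, using hypothetical function names, shows that the two formulations select the same cards:

```python
from itertools import product

def single_needle(check, adult):
    # Original logic: keep cards that have both attributes.
    return check and adult

def multi_needle(check, adult):
    # Reversed logic: one pass pulls every card that fails either test;
    # the cards left behind are the answer.
    pulled = (not check) or (not adult)
    return not pulled

# Compare both formulations over every combination of the two attributes.
truth_table = [
    (c, a, single_needle(c, a), multi_needle(c, a))
    for c, a in product([False, True], repeat=2)
]
all_agree = all(row[2] == row[3] for row in truth_table)
```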
Card-based systems widely used in 1960s
Fact data printed on card face
Manual aggregation of Fact data after query
completion
The Cards
[Figure: Level 0 bitmap mask – New York, Jan, Cola (bitmap mask and aggregated values)]
[Figure: Moving up the bitmap – New York, Qtr1, Cola (bitmap mask and aggregated values)]
[Figure: A little higher – Colas, East, Qtr1 (bitmap mask and aggregated values)]
[Figure: Almost at the top – Year, Colas, East (bitmap mask and aggregated values)]
[Figure: Top of the stack – Product, Market, Year (bitmap mask and aggregated values)]
Holes and notches represented by bitmap, as
seen on DB statistics page
● Upper-level membership coded into bitmap
Fact data
● Multiple pieces often on single card
● Equivalent to ASO Compression dimension
● Computers are fast at running query through card
“stack” and summing Fact data
● Often a hardware instruction
ASO Essbase vs. The Cards
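A minimal sketch of the idea that upper-level membership is coded into the bitmap, so that an upper-level query is the same mask-and-compare pass as a Level 0 query. The bit layout, field widths, member numbering, and data values are all hypothetical; the real Essbase layout is not disclosed in this presentation.

```python
# Hypothetical key layout: name -> (shift, width in bits).
FIELDS = {
    "quarter": (0, 3),
    "month": (3, 2),    # month within its quarter, 1..3
    "region": (5, 2),
    "market": (7, 3),
}

def encode(**levels):
    """Pack the given level values into one integer key."""
    key = 0
    for name, value in levels.items():
        shift, width = FIELDS[name]
        assert 0 < value < (1 << width)
        key |= value << shift
    return key

def mask_for(*names):
    """Build a mask covering only the fields named in the query."""
    mask = 0
    for name in names:
        shift, width = FIELDS[name]
        mask |= ((1 << width) - 1) << shift
    return mask

rows = [  # (bitmap key, fact value)
    (encode(quarter=1, month=1, region=1, market=1), 100.0),  # Jan, New York
    (encode(quarter=1, month=2, region=1, market=1), 150.0),  # Feb, New York
    (encode(quarter=1, month=1, region=1, market=2), 80.0),   # Jan, second East market
    (encode(quarter=2, month=1, region=1, market=1), 60.0),   # Apr, New York
]

def query(**levels):
    """One pass over every row: mask, compare, and sum the matches."""
    mask = mask_for(*levels)
    want = encode(**levels)
    return sum(value for key, value in rows if key & mask == want)

ny_jan = query(quarter=1, month=1, region=1, market=1)   # Level 0 query
east_qtr1 = query(quarter=1, region=1)                    # upper-level query
```

Both queries walk the same rows once; only the mask differs.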
Complex query results are calculated only after
all Sorting-Needle queries
● Equivalent to ASO Stored Hierarchy summations
● Complex query results are equivalent to MDX
Sorting-needle length and Card-box size
equivalent to amount of RAM
ASO Rule R1
● Card-deck Queries:
● If cards don’t fit into single box, you’d have to:
● Fetch Box 1
● Perform query on Box 1 and store results
● Repeat for all boxes
● Fetch combined results
R1 - The Input-level and Aggregation-data for all loaded
ASO cubes should fit into memory (or it ain’t really ASO)
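The box-by-box procedure can be sketched as follows. The deck contents and helper names are invented for illustration; the point is only that every extra box adds a fetch before any arithmetic can happen.

```python
def query_box(box, predicate):
    """Run the query on one box and return its partial sum."""
    return sum(amount for attrs, amount in box if predicate(attrs))

def query_in_boxes(deck, box_size, predicate):
    """Fetch one box at a time, store each partial result, then combine them."""
    partials = []
    for start in range(0, len(deck), box_size):
        box = deck[start:start + box_size]      # "fetch" the next box
        partials.append(query_box(box, predicate))
    return sum(partials), len(partials)

deck = [
    ({"Check", "Adult"}, 10.0),
    ({"Cash"}, 5.0),
    ({"Check", "Adult"}, 2.5),
    ({"Check"}, 1.0),
    ({"Check", "Adult"}, 4.0),
]

def is_check_adult(attrs):
    return {"Check", "Adult"} <= attrs

total, fetches = query_in_boxes(deck, box_size=2, predicate=is_check_adult)
```

With a box large enough for the whole deck, the same call makes a single fetch.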
● ASO Queries:
● Fetch data for Stored Hierarchy portion of query in
pieces and sum results
● Performance primarily related to memory
footprint of input data
Dynamic calcs first seen in v5 Essbase
● Reduced disk size of dense data blocks
● Allowed elimination of dynamic sparse blocks
ASO is logical extension of Dynamic calcs
ASO design can be summarized thus:
A BSO Analogy
Pop’s Rule
● My father (a hardware designer in the late 1940s) taught me
at age 10 this way:
● Two numbers and their sum were written on separate pieces of
paper and placed in another room
● I could fetch only one piece of paper at a time
● I was timed fetching all three pieces of paper vs. fetching two and adding them myself
● Even at age 10, I could add faster than I could fetch
ASO works the same way
Pop’s Rule - “Computers do arithmetic fast - but they don’t
like to run errands”
Remaining rules derived from analysis of
Bitmap, as on Statistics page
● Bitmap documentation first appeared in v11
● DBAG Chapter 62 page 934: An aggregate storage
database outline cannot exceed 64-bits per dimension
Note: ASOsamp application shown in DBAG differs
slightly from delivered ASOsamp. To replicate DBAG
results, modify your ASOsamp to match example.
The Rest of the Rules
The Bitmap and the Statistics Page
Bitmap Size based on:
● Width of widest level
● Number of levels
Bitmap rounded up to next higher 64-bit level
Cube size dependent on:
● Bitmap size
● Number of data rows, modified by
● Compression dimension settings
Highlights from the Bitmap
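As a rough worked example, the rounding step can be written out. This assumes that "rounded up to next higher 64-bit level" means rounding up to the next multiple of 64 bits; the function names are hypothetical, and the estimate covers bitmap keys only, ignoring fact values and Compression dimension effects.

```python
def padded_bits(bitmap_bits, word=64):
    """Round a bitmap length up to the next multiple of `word` bits (assumed rule)."""
    return -(-bitmap_bits // word) * word

def rough_key_bytes(bitmap_bits, rows):
    """Bytes spent on bitmap keys alone for a given number of data rows."""
    return padded_bits(bitmap_bits) // 8 * rows

# A 63-bit bitmap and a 54-bit bitmap both occupy one 64-bit word per row;
# a 65-bit bitmap would need two.
sizes = {bits: padded_bits(bits) for bits in (54, 63, 64, 65)}
key_bytes = rough_key_bytes(63, 311_156)
```

Under this assumption, trimming a bitmap saves space per row only when it crosses a 64-bit boundary.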
12+1 Rules numbered in order developed
Selected Rules discussed in simplest order
For more detail:
● See my chapter in “Developing Essbase
Applications”
● Note: I did not write this book to get rich! My
shameless plugs are only for my vanity and your use
The Rules of ASO Designing for Performance
R1 - The Input level and Aggregation-data for all loaded
ASO cubes should fit into memory (or it ain’t really ASO)
R2 - Wherever possible, data should be calculated from
Stored non-formula Members
R3 - All queries against the same aggregation level take the
same time
R4 - Do not depend on aggregation or other “Maintenance”
to make up for bad design
R5 - Alternate hierarchies, whether Dynamic or Stored or
Attribute, are almost always cheap… give the user what they
want
R6 - Label-Only members have no cost - use them to
enhance your cube’s readability
R7 - Changes to hierarchy order are cheap or free, so
design for user convenience
R8 - Designs requiring queries of multiple Attributes of the
same base dimension may suffer performance degradation -
evaluate and consider alternatives
R9 - The use of a Compression dimension is not a given;
consider and test alternatives including not having a
Compression dimension
R10 - The use of the Accounts dimension tag has substantial
costs - alternatives should be considered strongly
R11 - Analysis dimensions are cheap or free - use them
R12 - A query will be run against the smallest View whose
aggregation level on each dimension is less than or equal to
the aggregation level of the query (for the same hierarchy) -
you do not have to create Aggregated Views on all
dimensions
And One More Rule:
Pop’s Rule - “Computers do arithmetic fast - but they don’t
like to run errands”
Card-deck analogy makes R2 apparent
● All queries resolved as MDX combination of one or
more Stored Hierarchy queries
● Objective is to eliminate “or more”
Rules R2 and R3
R2 - Wherever possible, data should be calculated from
Stored non-formula Members
R3 - All queries against the same aggregation level take the
same time
Like a Sorting-Needle, ASO is dumb:
● Both go through entire “deck” for each query
● Unlike BSO, there’s no sparse dimension index
● Bitmap reflects all dimensions: Which ones would you index,
in what order?
Alternate Hierarchies Based on R2
● Load data with “Natural Sign”
● Positive and Negative Values
● Not + and – consolidations
● Use UDAs to flip signs for presentation
● In high solve-order MDX
R2 - Wherever possible, data should be calculated from
Stored non-formula Members
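A small sketch of the natural-sign idea: values are loaded with their own sign so the roll-up is a plain sum, and the sign is flipped only when the number is presented. The account names, values, and the `FlipSign` UDA are hypothetical; in a cube the flip would live in a high solve-order MDX member rather than in Python.

```python
# Hypothetical accounts; "FlipSign" stands in for a UDA marking presentation flips.
accounts = {
    "Sales": {"udas": set(), "value": 1000.0},            # loaded as a positive
    "COGS": {"udas": {"FlipSign"}, "value": -600.0},      # loaded as a negative
    "Expenses": {"udas": {"FlipSign"}, "value": -250.0},  # loaded as a negative
}

# Stored roll-up: a plain sum, with no "-" consolidation operators needed.
net_income = sum(account["value"] for account in accounts.values())

def presented(name):
    """Flip the sign for display only; the stored value is untouched."""
    account = accounts[name]
    if "FlipSign" in account["udas"]:
        return -account["value"]
    return account["value"]
```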
Load “Flow” data, not “Balance” data
● YTDs don’t change every period – why load them?
● Load BoY and Period deltas
● Reconstruct YTD values using:
● MDX (boo hiss), or
● Stored Hierarchies (much faster)
The Result: Major Reductions in Cube Size
● If your only data source is YTD, load it; then load it again
with signs reversed into the following month
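The "load, then load again reversed" trick can be checked with a short sketch. It assumes the two loads are additive (values landing on the same month are summed); the months and amounts are made up.

```python
from collections import defaultdict
from itertools import accumulate

months = ["Jan", "Feb", "Mar", "Apr"]
ytd = {"Jan": 100.0, "Feb": 250.0, "Mar": 250.0, "Apr": 400.0}

loaded = defaultdict(float)
for i, month in enumerate(months):
    loaded[month] += ytd[month]                # first load: YTD as supplied
    if i + 1 < len(months):
        loaded[months[i + 1]] -= ytd[month]    # second load: reversed, one month later

# What ends up stored in each month is the period flow, not the balance.
flows = [loaded[m] for m in months]

# Summing the flows (as a Stored Hierarchy would) reconstructs the YTD values.
rebuilt_ytd = list(accumulate(flows))
```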
● Avoid Summing using MDX
● Use compound members to recreate YTD values
● JunYTD instead of (Jun, YTD)
● Construct Stacked Hierarchies to calculate
● Hide ugly stacked hierarchies
● Use MDX to “redirect” queries from (Jun YTD) or (Jun,
YTD) to JunYTD
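A sketch of the compound-member approach, assuming a stacked hierarchy in which each hypothetical member such as JunYTD is a stored parent of the months up to and including June. The `redirect` function stands in for the MDX redirection and is not actual MDX.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
flows = {"Jan": 10.0, "Feb": 20.0, "Mar": 5.0, "Apr": 0.0, "May": 15.0, "Jun": 30.0}

# Each compound member rolls up the months through its own month.
stacked = {f"{m}YTD": months[: i + 1] for i, m in enumerate(months)}

def stored_rollup(member):
    """Plain summation of children, as a Stored Hierarchy would do."""
    return sum(flows[child] for child in stacked[member])

def redirect(month, view):
    """Map the tuple (Jun, YTD) to the compound member JunYTD."""
    return f"{month}YTD" if view == "YTD" else month

def lookup(month, view="Periodic"):
    member = redirect(month, view)
    return stored_rollup(member) if member in stacked else flows[member]
```

Users keep asking for (Jun, YTD), while the arithmetic is done by a stored roll-up.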
Monthly Stacked Hierarchy
[Figure: Monthly Stacked Hierarchy outline]
Monthly Stacked Hierarchy – New Info
New! The blurred concatenation formula has been replaced with a simple case statement to avoid performance issues.
[Figure: case statement formula]
Rule R10
● Rule is restatement of R2, specific to Accounts
dimension
● Use of Accounts Dimension Tag forces entire
Accounts dimension to be Dynamic
R10 - The use of the Accounts dimension tag has
substantial costs - alternatives should be considered
strongly
Tagging a dimension Compression forces it to
be Dynamic
Are there Intra-dimension calculations that
could have used Stored Hierarchies?
What is cost, in terms of increased memory
footprint, of forgoing Compression?
Rule R9
R9 - The use of a Compression dimension is not a given;
consider and test alternatives including not having a
Compression dimension
● If memory is available and Stored Hierarchy
consolidation options exist:
● Then NO Compression performs fastest
● Use Compression Dimension Wizard
● Use Real data when evaluating
● Average Bundle Fill (ABF) and Average Value Length (AVL)
must be based on realistic data
● ABF is optimal for multiples of 16 Level 0, non-formula
members
● Follow DBAG recommendations for member order in outline
● AVL is optimal when data have fewer significant digits
● Note: Two digits after decimal seem to be optimized
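To see why member order and multiples of 16 matter, here is a rough sketch of an average-bundle-fill calculation. It assumes that Compression dimension members are grouped into bundles of 16 consecutive members and that ABF is the mean number of values per bundle actually used; the function is a hypothetical illustration, not the Essbase calculation.

```python
def average_bundle_fill(rows, bundle_size=16):
    """rows: one set per data row, holding the 0-based outline positions of the
    Compression dimension members that carry a value on that row."""
    bundles = 0
    values = 0
    for positions in rows:
        used = {p // bundle_size for p in positions}   # bundles touched by this row
        bundles += len(used)
        values += len(positions)
    return values / bundles

dense = [set(range(16)), set(range(16))]   # every row fills one whole bundle
scattered = [{0, 16, 32}]                  # three values spread over three bundles
mixed = [{0, 1, 2, 3, 16}]                 # five values over two bundles

abf_dense = average_bundle_fill(dense)
abf_scattered = average_bundle_fill(scattered)
abf_mixed = average_bundle_fill(mixed)
```

Under these assumptions, placing members that usually have data together next to each other in the outline raises the fill.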
Rule R12 - But First, What Is an Aggregation?
● To visualize an Aggregation, think of card deck
● Aggregated Deck would have fewer cards
● Aggregated Deck would have “shorter” cards
R12 - A query will be run against the smallest View whose
aggregation level on each dimension is less than or equal to
the aggregation level of the query (for the same hierarchy) -
you do not have to create Aggregated Views on all
dimensions
Can calculate how much shorter the Bitmap will be
Cannot calculate how many cards, without checking every card in input level
view (aka Level 0 view)
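A toy aggregation shows why the row count cannot be predicted without visiting every input row: how many cards collapse together depends on which member combinations actually occur in the data. The hierarchy, stores, and values below are invented for the example.

```python
from collections import defaultdict

parent = {"Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1", "Apr": "Qtr2"}

level0 = [  # (month, store, value)
    ("Jan", "S1", 10.0),
    ("Feb", "S1", 5.0),
    ("Mar", "S2", 7.0),
    ("Apr", "S1", 1.0),
    ("Jan", "S2", 3.0),
]

def aggregate_time_to_l1(rows):
    """Build a view aggregated to Level 1 of Time; Stores stays at Level 0."""
    view = defaultdict(float)
    for month, store, value in rows:        # every input row must be visited
        view[(parent[month], store)] += value
    return dict(view)

view = aggregate_time_to_l1(level0)
rows_before, rows_after = len(level0), len(view)
```

Here five input rows become three, but a different mix of months and stores with the same outline would give a different count.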
Rule R12 - Data Card Representing Aggregation
L0 View: Bitmap: 63 bits; Cells: 1,249,859; Rows: 311,156
Aggregated View: Bitmap: 54 bits; Cells: ???; Rows: ???
Aggregation at: Time L1, Stores L2 and Age L1
Time to Compute
● Accuracy based on ASOSAMPLESIZEPERCENT
Disk/Memory Footprint
● Design wizard gives estimate only of Aggregation size
Rule R12 - What Is the Cost of an Aggregation?
ASOsamp Recommended Views:
24 Total
4 at L1 of Time (Qtr)
10 at L2 of Time (Half)
Queries run on smaller stack of shorter cards
Will all Aggregations be used?
● Recommended ASOsamp Aggregation:
● Levels 1&2 of Time - how often are Qtrs or Halves used?
● Consider adding hint into outline
● Better to have Aggregations that speed up YTDs
● YTDs are Stored Hierarchies now, right?
Remember, Aggregations can be done only
on Stored Hierarchies
Rule R12 - What Is the Benefit of an Aggregation?
Important: Not all dimensions require Aggregation
Some Aggregations, expected to be useful, will
be used rarely (Rule R8)
Rule R12
R12 - A query will be run against the smallest View whose
aggregation level on each dimension is less than or equal to
the aggregation level of the query (for the same hierarchy) -
you do not have to create Aggregated Views on all
dimensions
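Rule R12 can be read as a selection procedure, sketched below. The view catalogue is hypothetical: only the Level 0 row count comes from the slides, and the row counts for the aggregated views are invented placeholders. Level numbers follow the slide's convention, with 0 as the input level.

```python
views = [  # aggregation level per dimension, plus row count (aggregated counts are made up)
    {"name": "L0", "levels": {"Time": 0, "Stores": 0, "Age": 0}, "rows": 311_156},
    {"name": "AggA", "levels": {"Time": 1, "Stores": 2, "Age": 1}, "rows": 40_000},
    {"name": "AggB", "levels": {"Time": 2, "Stores": 0, "Age": 0}, "rows": 150_000},
]

def pick_view(query_levels):
    """Return the smallest view whose level on every dimension is at or below
    the level the query asks for."""
    usable = [
        v for v in views
        if all(v["levels"][dim] <= query_levels[dim] for dim in v["levels"])
    ]
    return min(usable, key=lambda v: v["rows"])["name"]

choice_high = pick_view({"Time": 2, "Stores": 2, "Age": 1})
choice_time_only = pick_view({"Time": 2, "Stores": 0, "Age": 0})
choice_low = pick_view({"Time": 1, "Stores": 1, "Age": 1})
```

The view aggregated on Time alone still serves queries at Level 0 of Stores and Age, which is why not every dimension needs an aggregation.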
Each dimension has fixed number of allocated bits
● Based on requirements of largest Alternate
Hierarchy
● Therefore, only one Hierarchy is represented in
Bitmap at any one time
Rule R5
R5 - Alternate hierarchies, whether Dynamic or Stored or
Attribute, are almost always cheap… give the user what
they want
Add all Alternate Hierarchies the users want
● Without increasing Bitmap size
Performance is independent of number of Alternate Hierarchies
● Use them freely (unlike BSO!)
But if Alternate Hierarchy is not in Bitmap, how will Sorting-Needle work?
● I don’t know… but I have some guesses
● Several algorithms can be envisioned, but precise
ASO method not disclosed
Even if we don’t know how an Alternate Hierarchy is queried in
Level 0 view, it’s easy to imagine an upper-level Attribute
Aggregation on the data card.
Note: Square Footage Hierarchy never appeared on previous slides.
Attribute Dimensions are Alternate Hierarchies
Only one Alternate Hierarchy in Bitmap at a time
ASO must query an un-Aggregated view of the
dimensions
● Aggregation no longer “knows” the base associated
Level 0
Rule R8
R8 - Designs requiring queries of multiple Attributes of
the same base dimension may suffer performance
degradation - evaluate and consider alternatives
Includes anything other than topmost level of base
dimension and Attribute dimension
● And at topmost level only if ALL Level 0 members
roll up to it
AND
● All Level 0 members are associated to each Attribute
Dimension
First rule to consider when users ask:
● Why is cube slow sometimes?
● Why are some queries slower than others?
Temp Tablespace
● A separate drive/spindle/channel
● Great place to employ SSD drives
Operating System File Compression
● If you have CPU cycles, employ Pop’s Rule:
● Try compressing primary tablespace directory
● Try compressing temp tablespace directory
Other Suggestions
Buy more memory
Use Stored Hierarchies
Stop writing MDX!!! (No one will think less of you)
Let ASO be ASO
Summary
Dan Pressman nTuple, LLC
TheEssbaseMechanic.wordpress.com
Contact Information
See You Later in 2014 at:
Like the best, most advanced Essbase
conference there ever could be
Advanced content
Good practices
Written by some of the best-known
Essbase developers
Source code at
www.developingessbasebook.com
You should buy it
Developing Essbase Applications
My chapter, How ASO Works and How to Design for
Performance includes:
12+1 Rules to guide your ASO Designs:
Previously unpublished information based on the
statistics page and documentation, and gleaned from
related patent filings, all distilled into 12+1 Rules to
guide your ASO Designs. These Rules will ensure that
your cubes perform maximally, require less
Aggregation, and have a minimal memory footprint.
The 12+1 Rules emphasize the use of “Stored
Hierarchies” and include real-world examples showing
how to design around common requirements without
using MDX and in conformance to the rules, to truly…
Let ASO be ASO.
Developing Essbase Applications
Much of this information is found nowhere else.