Performance Tuning Cubes and Queries in Analysis Services 2008 Chris Webb [email protected] [email protected]

Performance Tuning Cubes and Queries in Analysis Services 2008

Chris [email protected]

mailto:[email protected]

http://www.crossjoin.co.uk/

Who am I?

• Chris Webb ([email protected])• Independent consultant specialising in

Analysis Services and MDX: http://www.crossjoin.co.uk

• Blogger: http://cwebbbi.spaces.live.com • Author:– MDX Solutions 2nd Edition– Expert Cube Development With Analysis Services

2008

mailto:[email protected]

http://www.crossjoin.co.uk/

http://cwebbbi.spaces.live.com/

Agenda

• Designing for Performance• How Analysis Services Answers Queries• Partitions• Aggregations• Formula Engine bottlenecks• Caching

Designing for Performance

• Some features give increased flexibility at a cost to performance– Eg parent/child hierarchies, referenced relationships,

unnatural hierarchies, many-to-many, distinct count, semi-additive measures...

• Mistakes made in modelling the data warehouse are often ‘papered over’ using SSAS features– Eg calculations being done in MDX when they should have

been done in the ETL• Design decisions should always be made with query

performance in mind

Attribute Relationships

• Attribute relationships describe the one-to-many relationships between attributes in your dimensions

• Will probably not be optimal after you’ve run the Dimension wizard – only knows about foreign key relationships

• Are very important for query performance

Parent/Child Hierarchies

• Parent/child hierarchies are very flexible, but can perform badly sometimes– Especially where the hierarchy contains thousands

of members• If you know the maximum depth of the

hierarchy, use a ragged hierarchy instead– Although ragged hierarchies can also perform badly

• The ‘parent/child hierarchy naturaliser’ in BIDS Helper can do this automatically

SSAS Query Processing

Formula Engine works out what data isneeded for each query,and requests it from theStorage Engine

Storage Enginehandles retrieval of raw data from disk, and any aggregation required

MDX Query In Cellset Out

Query Subcube Requests

Cache

Cache

Disk

Tuning the Storage Engine

• Tuning the Storage Engine is all about:– Reducing the time taken to read data from disk– Reducing the time taken to aggregate that data to

the required granularity• As a result, disk and CPU have an important

role to play in Storage Engine performance• However, throwing hardware at the problem is

rarely the whole answer

Partitions

• Measure groups can be divided up into multiple partitions– Enterprise Edition only

• You should partition your measure groups to reflect the slices that users are likely to use in their queries– Usually by the Time dimension

• Partitioning improves query performance in two ways:– It reduces overall IO because AS should only scan the

partitions containing the data requested– It increases parallelism because AS can scan more than one

partition simultaneously

Partitions

• You should be partitioning so that a single partition contains between 2 million rows and 30 million rows of data

• Although SSAS should auto-detect the slice of a partition, it is good practice to set the Slice property manually, even on MOLAP partitions

• In Profiler, Progress Report Begin/End events show which partitions are being read

Aggregations

• Aggregations are pre-calculated sets of summary values– Similar to what is returned by a GROUP BY query in SQL

• Building aggregations improves query performance by reducing the amount of calculation done at query time

• Aggregations start to show an effect on regular Sum measures where partition size is greater than a few million rows

• A Measure Group can have multiple Aggregation Designs• A Partition can be associated with one Aggregation Design

– aggregations are built on a per-partition basis

Aggregation Design Methodology

1. Make your dimension design as ‘clean’ as possible – delete or disable unnecessary attributes, set attribute relationships

2. Set the AggregationUsage property appropriately on each Cube dimension

3. Run the Aggregation Design Wizard1. First using the ‘I Click Stop’ option2. Then, if this creates too many aggregations, using

‘Performance Gain’ of 30%

Aggregation Design Methodology

4. Set up Query Logging and run Usage-Based Optimisation

5. If necessary, design aggregations manually using BIDS Helper or the Aggregation Design tab

6. Check aggregations are being used by looking for the Get Data From Aggregation event in Profiler

The Formula Engine

• If Storage Engine requests make up a negligible percentage of the time taken by the query, then the Formula Engine is the problem– Profiler and MDX Studio can be used to find out if this is the case– MDX Script Performance Analyser can help to work out which, if

any, calculation on the cube is the problem• Another giveaway sign is that when a query runs on a

multi-processor server, only one CPU shows activity– The Formula Engine is single-threaded

• Rewriting MDX calculations can result in massive gains in performance

Writing Efficient MDX

• As with all code, using efficient algorithms is very important– It’s important to avoid doing expensive operations more than is

necessary– Named sets can store the result of expensive set operations– Calculated members can cache the result of expensive numeric

calculations• In the implementation of these algorithms, it’s important to

know which MDX functions perform best– MDX Studio can tell you which functions to avoid– Will vary by version and SP

• Calculations that execute in ‘bulk mode’ can perform many times better than those that execute in ‘cell-by-cell’ mode

Caching

• Caching can take place in the Storage Engine and the Formula Engine

• Caching is extremely important to query performance

• SE caching is very reliable, and caches can be shared between all users

• FE caching is much more temperamental...

Formula Engine Caching

• There are three different ‘scopes’ or lifetimes of a FE cache:– Query – for calculations defined in the WITH clause of a

query, the FE values can only be cached for the lifetime of the query

– Session – for calculations defined for a session, using the CREATE MEMBER statement executed on the client, FE values can only be cached for the lifetime of a session

– Global – for calculations defined in the cube’s MDX Script, FE values can be cached usually until processing takes place

• Global scope is best from a performance point of view!

Restrictions on FE Caching

• Any calculations defined in the WITH clause of a query will force query scope for the FE cache

• As will the use of cell security• As will the use of non-deterministic functions

like NOW()• With dimension security, calculation caches

can only be shared between users who have the same permissions

Exploiting FE Caching

• Caching numeric values, so that expensive calculations are only performed once, can be very important for calculation performance

• Use of certain MDX functions and constructs can force Query scope for FE caching, and hurt performance

Cache Warming

• Data can be loaded into the SE cache by executing CREATE CACHE statements

• The FE cache can only be warmed by running MDX queries

• Cache warming involves automatically loading data into the cache to improve query performance for real users– SSIS packages can be used to do this

Thanks!

Links

• http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part1

http://www.packtpub.com/article/query-performance-tuning-microsoft-analysis-services-part2







Documents

Performance Tuning Cubes and Queries in Analysis Services 2008 Chris Webb [email protected] [email protected]