Building Scalable OLAP Applications with Mondrian and MySQL Julian Hyde: Chief Architect, OLAP, at Pentaho and Mondrian Project Founder Tuesday, April 24th 2007

Page 1: Building Scalable OLAP Applications with Mondrian and MySQL

Building Scalable OLAP Applications with Mondrian and MySQL

Julian Hyde: Chief Architect, OLAP, at Pentaho and Mondrian Project Founder

Tuesday, April 24th 2007

Page 2: Building Scalable OLAP Applications with Mondrian and MySQL

Agenda

Pentaho Introduction

What is OLAP?

Key features and architecture

Getting started

Schemas & queries

Tuning & scalability

Active Data Warehousing

Case Studies

Business Intelligence suite

Q & A

Page 3: Building Scalable OLAP Applications with Mondrian and MySQL

Pentaho Introduction

World’s most popular enterprise open source BI Suite
  2 million lifetime downloads, averaging 100K / month

Founded in 2004: Pioneer in professional open source BI

Management - proven BI and open source veterans from Business Objects, Cognos, Hyperion, JBoss, Oracle, Red Hat, SAS

Board of Directors – deep expertise and proven success in open source

Larry Augustin - founder, VA Software, helped coin the phrase “open source”

New Enterprise Associates – investors in SugarCRM, Xensource, others

Index Ventures – investors in MySQL, Zend, others

Widely recognized as the leader in open source BI

Distributed worldwide by Red Hat via the Red Hat Exchange

Embedded in next release of OpenOffice (40 million users worldwide)

Page 4: Building Scalable OLAP Applications with Mondrian and MySQL

What is OLAP?

View data “dimensionally”
  i.e. Sales by region, by channel, by time period

Navigate and explore
  Ad hoc analysis
  “Drill-down” from year to quarter
  Pivot
  Select specific members for analysis

Interact with high performance
  Technology optimized for rapid interactive response
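The drill-down idea can be shown without any OLAP engine. The sketch below is plain Java, not Mondrian code; the `SalesRow` type, its field names, and the sample figures are invented for illustration. It totals one measure by year, then "drills down" by adding the quarter to the grouping key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DrillDownSketch {
    // Hypothetical fact row; names and figures are invented for illustration.
    static final class SalesRow {
        final int year;
        final String quarter;
        final String region;
        final int unitSales;

        SalesRow(int year, String quarter, String region, int unitSales) {
            this.year = year;
            this.quarter = quarter;
            this.region = region;
            this.unitSales = unitSales;
        }
    }

    // Sums unitSales per year, or per year and quarter when byQuarter is true.
    static Map<String, Integer> totals(List<SalesRow> rows, boolean byQuarter) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (SalesRow r : rows) {
            String key = byQuarter ? r.year + " " + r.quarter : String.valueOf(r.year);
            out.merge(key, r.unitSales, Integer::sum);
        }
        return out;
    }

    static List<SalesRow> sample() {
        return Arrays.asList(
            new SalesRow(1997, "Q1", "West", 100),
            new SalesRow(1997, "Q1", "East", 50),
            new SalesRow(1997, "Q2", "West", 70),
            new SalesRow(1998, "Q1", "West", 30));
    }

    public static void main(String[] args) {
        System.out.println(totals(sample(), false)); // year level
        System.out.println(totals(sample(), true));  // drilled down to quarter
    }
}
```

An OLAP server does the same regrouping, but against a database and with caching, so that each navigation step stays interactive.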

Page 5: Building Scalable OLAP Applications with Mondrian and MySQL

A brief history of OLAP

[Figure: timeline from 1970 to 2010. Eras and trends shown: Mainframe OLAP, Spreadsheets, Desktop OLAP, RDBMS support for DW, Open-source OLAP, Open-source BI suites. Products and milestones shown: APL, Express, Comshare, Essbase, Codd’s “12 Rules of OLAP”, Business Objects, Cognos PowerPlay, MicroStrategy, Microsoft OLAP Services, Sybase IQ, Informix MetaCube, Oracle partitions and bitmap indexes, MySQL, Mondrian, JPivot, JRubik, Palo, OpenI, BEE, Pentaho.]

Page 6: Building Scalable OLAP Applications with Mondrian and MySQL

Mondrian features and architecture

Page 7: Building Scalable OLAP Applications with Mondrian and MySQL

Key Features

On-Line Analytical Processing (OLAP) cubes
  Automated aggregation
  Speed-of-thought response times

Open Architecture
  100% Java
  J2EE
  Supports any JDBC data source
  MDX and XML/A

Analysis Viewers
  Enables ad-hoc, interactive data exploration
  Ability to slice-and-dice, drill-down, and pivot
  Provides insights into problems or successes

Page 8: Building Scalable OLAP Applications with Mondrian and MySQL

How Mondrian Extends MySQL for OLAP Applications

MySQL Provides

Data storage

SQL query execution

Heavy-duty sorting, correlation, aggregation

Integration point for all BI tools

Mondrian Provides

Dimensional view of data

MDX parsing

SQL generation

Caching

Higher-level calculations

Aggregate awareness
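The division of labour above (Mondrian generates SQL, MySQL does the heavy aggregation) can be sketched as follows. This is a simplified illustration and not Mondrian's actual SQL generator; the table name `sales_flat` and the column names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class SqlGenSketch {
    // Builds a GROUP BY query that asks the database to aggregate one measure
    // over the requested level columns. With no level columns, it asks for
    // the grand total.
    static String groupBySql(String table, List<String> levelColumns, String measureColumn) {
        String measure = "SUM(" + measureColumn + ") AS m0";
        if (levelColumns.isEmpty()) {
            return "SELECT " + measure + " FROM " + table;
        }
        String cols = String.join(", ", levelColumns);
        return "SELECT " + cols + ", " + measure
            + " FROM " + table
            + " GROUP BY " + cols;
    }

    public static void main(String[] args) {
        // Hypothetical denormalized table and columns.
        System.out.println(groupBySql("sales_flat", Arrays.asList("the_year", "quarter"), "unit_sales"));
    }
}
```

The result rows would then be cached on the Mondrian side, so a repeated request for the same cells need not reach MySQL again.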

Page 9: Building Scalable OLAP Applications with Mondrian and MySQL

Open Architecture

Open Standards (Java, XML, MDX, XML/A, SQL)

Cross Platform (Windows & Unix/Linux)

J2EE Architecture
  Server Clustering
  Fault Tolerance

Data Sources
  JDBC
  JNDI

[Figure: architecture diagram. Labels shown: Viewers, Microsoft Excel (via Spreadsheet Services), Web Server, JPivot servlet, XML/A servlet, J2EE Application Server, Mondrian, cubes, Cube Schema XML files, File or RDBMS Repository, JDBC, RDBMS.]

Page 10: Building Scalable OLAP Applications with Mondrian and MySQL

Getting started

Page 11: Building Scalable OLAP Applications with Mondrian and MySQL

Getting started with Mondrian

1. Download mondrian from http://mondrian.pentaho.org and unzip

2. Copy mondrian.war to TOMCAT_HOME/webapps.

3. Create a database:

    $ mysqladmin create foodmart
    $ mysql
    mysql> grant all privileges on *.* to 'foodmart'@'localhost' identified by 'foodmart';
    Query OK, 0 rows affected (0.00 sec)

    mysql> quit
    Bye

4. Install demo dataset (300,000 rows):

    $ java -cp "/mondrian/lib/mondrian.jar:/mondrian/lib/log4j-1.2.9.jar:/mondrian/lib/eigenbase-xom.jar:/mondrian/lib/eigenbase-resgen.jar:/mondrian/lib/eigenbase-properties.jar:/usr/local/mysql/mysql-connector-java-3.1.12-bin.jar" \
        mondrian.test.loader.MondrianFoodMartLoader \
        -verbose -tables -data -indexes \
        -jdbcDrivers=com.mysql.jdbc.Driver \
        -inputFile=/mondrian/demo/FoodMartCreateData.sql \
        -outputJdbcURL="jdbc:mysql://localhost/foodmart?user=foodmart&password=foodmart"

5. Start tomcat, and hit http://localhost:8080/mondrian

Page 12: Building Scalable OLAP Applications with Mondrian and MySQL

demonstration

Page 13: Building Scalable OLAP Applications with Mondrian and MySQL

Schemas and queries

Page 14: Building Scalable OLAP Applications with Mondrian and MySQL

A Mondrian schema consists of…

A dimensional model (logical)
  Cubes & virtual cubes
  Shared & private dimensions
  Calculated measures in cube and in query language
  Parent-child hierarchies

… mapped onto a star/snowflake schema (physical)
  Fact table
  Dimension tables
  Joined by foreign key relationships

Page 15: Building Scalable OLAP Applications with Mondrian and MySQL

Writing a Mondrian Schema

Regular cubes, dimensions, hierarchies

Shared dimensions

Virtual cubes

Parent-child hierarchies

Custom readers

Access-control

    <!-- Shared dimensions -->
    <Dimension name="Region">
      <Hierarchy hasAll="true" allMemberName="All Regions">
        <Table name="QUADRANT_ACTUALS"/>
        <Level name="Region" column="REGION" uniqueMembers="true"/>
      </Hierarchy>
    </Dimension>

    <Dimension name="Department">
      <Hierarchy hasAll="true" allMemberName="All Departments">
        <Table name="QUADRANT_ACTUALS"/>
        <Level name="Department" column="DEPARTMENT" uniqueMembers="true"/>
      </Hierarchy>
    </Dimension>

(Refer to http://mondrian.pentaho.org/documentation/schema.php )
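Because a schema is plain XML, it can be inspected with the standard Java XML parser before it is deployed. The sketch below lists each level and the table column it maps to. It uses only JDK classes, not Mondrian's own schema loader; the enclosing `<Schema name="Example">` root is added here as an assumption so that the two dimensions from the slide parse as one document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SchemaOutlineSketch {
    // The two shared dimensions from the slide, wrapped in an assumed root element.
    static final String SCHEMA_XML =
        "<Schema name=\"Example\">"
        + "<Dimension name=\"Region\">"
        + "<Hierarchy hasAll=\"true\" allMemberName=\"All Regions\">"
        + "<Table name=\"QUADRANT_ACTUALS\"/>"
        + "<Level name=\"Region\" column=\"REGION\" uniqueMembers=\"true\"/>"
        + "</Hierarchy></Dimension>"
        + "<Dimension name=\"Department\">"
        + "<Hierarchy hasAll=\"true\" allMemberName=\"All Departments\">"
        + "<Table name=\"QUADRANT_ACTUALS\"/>"
        + "<Level name=\"Department\" column=\"DEPARTMENT\" uniqueMembers=\"true\"/>"
        + "</Hierarchy></Dimension>"
        + "</Schema>";

    // Returns one line per level: "dimension / level -> table.column".
    static List<String> outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> lines = new ArrayList<>();
        NodeList dims = doc.getElementsByTagName("Dimension");
        for (int i = 0; i < dims.getLength(); i++) {
            Element dim = (Element) dims.item(i);
            NodeList tables = dim.getElementsByTagName("Table");
            String table = tables.getLength() > 0
                ? ((Element) tables.item(0)).getAttribute("name")
                : "?";
            NodeList levels = dim.getElementsByTagName("Level");
            for (int j = 0; j < levels.getLength(); j++) {
                Element level = (Element) levels.item(j);
                lines.add(dim.getAttribute("name") + " / " + level.getAttribute("name")
                    + " -> " + table + "." + level.getAttribute("column"));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : outline(SCHEMA_XML)) {
            System.out.println(line);
        }
    }
}
```

A check like this also catches malformed attribute quoting, since the parser rejects the document outright.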

Page 16: Building Scalable OLAP Applications with Mondrian and MySQL

Tools

Schema Workbench

Pentaho cube designer

cmdrunner

Page 17: Building Scalable OLAP Applications with Mondrian and MySQL

MDX – Multi-Dimensional Expressions

A language for multidimensional queries

Plays the same role in Mondrian’s API as SQL does in JDBC

SQL-like syntax

… but un-SQL-like semantics

    SELECT {[Measures].[Unit Sales]} ON COLUMNS,
      {[Store].[USA], [Store].[USA].[CA]} ON ROWS
    FROM [Sales]
    WHERE [Time].[1997].[Q1]

(Refer to http://mondrian.pentaho.org/documentation/mdx.php )
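One way to see the "un-SQL-like semantics" is that the rows axis of the query above holds both a member and one of its children, so California's sales are counted in two rows. The sketch below imitates that evaluation over an in-memory list. It is a toy model rather than Mondrian's evaluator; the `Fact` type, the dotted member paths, and the figures are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class MdxSemanticsSketch {
    // Hypothetical fact: member paths such as "1997.Q1" and "USA.CA".
    static final class Fact {
        final String time;
        final String store;
        final int unitSales;

        Fact(String time, String store, int unitSales) {
            this.time = time;
            this.store = store;
            this.unitSales = unitSales;
        }
    }

    // True when path is the member itself or one of its descendants.
    static boolean under(String path, String member) {
        return path.equals(member) || path.startsWith(member + ".");
    }

    // Value of one cell: the row's store member, restricted by the slicer (WHERE).
    static int cell(List<Fact> facts, String storeMember, String timeSlicer) {
        int sum = 0;
        for (Fact f : facts) {
            if (under(f.store, storeMember) && under(f.time, timeSlicer)) {
                sum += f.unitSales;
            }
        }
        return sum;
    }

    static List<Fact> sample() {
        return Arrays.asList(
            new Fact("1997.Q1", "USA.CA", 10),
            new Fact("1997.Q1", "USA.OR", 5),
            new Fact("1997.Q2", "USA.CA", 7),
            new Fact("1997.Q1", "Canada.BC", 3));
    }

    public static void main(String[] args) {
        for (String row : Arrays.asList("USA", "USA.CA")) {
            System.out.println(row + " -> " + cell(sample(), row, "1997.Q1"));
        }
    }
}
```

In SQL terms, the two rows would need two differently grouped queries; in MDX they are simply two members on the same axis.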

Page 18: Building Scalable OLAP Applications with Mondrian and MySQL

Scalability & Tuning

Page 19: Building Scalable OLAP Applications with Mondrian and MySQL

Tuning is a Process

Some techniques:

Test performance on a realistic data set!

Use tracing to identify long-running SQL statements
  mondrian.trace.level=2

Tune SQL queries

Run the same query several times, to examine cache effectiveness

When Mondrian starts, prime the cache by running queries

Get to know Mondrian’s system properties
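The "run the same query several times" technique can be sketched as a small timing loop. The class below is a stand-in written for this example, not part of Mondrian: it caches whole results by query text, whereas Mondrian caches cell data, and the 20 ms delay simply simulates a slow SQL round trip.

```java
import java.util.HashMap;
import java.util.Map;

final class CacheTimingSketch {
    private final Map<String, String> cache = new HashMap<>();
    int backendCalls = 0;

    // Simulates a slow trip to the database; the delay is arbitrary.
    private String runAgainstDatabase(String query) {
        backendCalls++;
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result of " + query;
    }

    // Answers from the cache when possible, otherwise goes to the backend.
    String execute(String query) {
        return cache.computeIfAbsent(query, this::runAgainstDatabase);
    }

    // Runs the query several times and records the elapsed time of each run.
    long[] timeRuns(String query, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            execute(query);
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        CacheTimingSketch sketch = new CacheTimingSketch();
        long[] nanos = sketch.timeRuns("SELECT ... FROM [Sales]", 3);
        for (int i = 0; i < nanos.length; i++) {
            System.out.println("run " + (i + 1) + ": " + nanos[i] / 1_000_000 + " ms");
        }
        System.out.println("backend calls: " + sketch.backendCalls);
    }
}
```

If later runs are not markedly faster than the first, the cache is not being used for that query and the SQL itself is the place to look.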

Page 20: Building Scalable OLAP Applications with Mondrian and MySQL

Tuning MySQL for Mondrian

Fact table:
  Index foreign keys
  Try indexing multiple combinations of foreign keys
  Use MyISAM or MERGE (also known as MRG_MyISAM)
  Partitioned tables

Dimension tables:
  Index primary and foreign keys
  Use MyISAM for large dimension tables, MEMORY for smaller dimension tables

Create aggregate tables:
  Contain pre-computed summaries
  Prevent full table scan followed by massive GROUP BY

Parent-child hierarchies
  Create closure tables
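A closure table stores every ancestor/descendant pair of a parent-child hierarchy, so that a rollup over a subtree becomes a simple join instead of a recursive walk. The sketch below computes such rows from child-to-parent links. It is a generic illustration: the row layout (parent, child, distance) and the inclusion of distance-0 self rows are assumptions of this example, and the ids are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class ClosureTableSketch {
    // parentOf maps a child id to its parent id; root ids have no entry.
    // Returns rows formatted as "parent,child,distance", assuming no cycles.
    static List<String> closureRows(Map<Integer, Integer> parentOf, Collection<Integer> ids) {
        List<String> rows = new ArrayList<>();
        for (Integer child : new TreeSet<>(ids)) {
            Integer ancestor = child;
            int distance = 0;
            while (ancestor != null) {
                if (distance > ids.size()) {
                    throw new IllegalStateException("cycle detected at id " + child);
                }
                rows.add(ancestor + "," + child + "," + distance);
                ancestor = parentOf.get(ancestor);
                distance++;
            }
        }
        return rows;
    }

    // Hypothetical three-level chain: 1 is the root, 2 reports to 1, 3 reports to 2.
    static List<String> sample() {
        Map<Integer, Integer> parentOf = new HashMap<>();
        parentOf.put(2, 1);
        parentOf.put(3, 2);
        return closureRows(parentOf, Arrays.asList(1, 2, 3));
    }

    public static void main(String[] args) {
        for (String row : sample()) {
            System.out.println(row);
        }
    }
}
```

In practice the rows would be written to a database table during ETL and referenced from the schema.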

Page 21: Building Scalable OLAP Applications with Mondrian and MySQL

Scaling to large datasets

Tune MySQL
  Partitions allow parallelism

Aggregate tables allow you to do the hard work at load time, not at runtime

Increase cache size
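The benefit of aggregate tables comes from answering a query from the smallest table that still contains the columns it needs. The sketch below shows that selection step in isolation. It is a deliberately simplified illustration, not Mondrian's aggregate-matching rules; the table names, column names, and row counts are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AggregateChoiceSketch {
    // Hypothetical description of a fact or aggregate table.
    static final class Table {
        final String name;
        final long rowCount;
        final Set<String> columns;

        Table(String name, long rowCount, Set<String> columns) {
            this.name = name;
            this.rowCount = rowCount;
            this.columns = columns;
        }
    }

    static Table table(String name, long rowCount, String... columns) {
        return new Table(name, rowCount, new HashSet<>(Arrays.asList(columns)));
    }

    // Picks the table with the fewest rows among those that have every needed column.
    static String choose(List<Table> candidates, Set<String> needed) {
        Table best = null;
        for (Table t : candidates) {
            if (t.columns.containsAll(needed) && (best == null || t.rowCount < best.rowCount)) {
                best = t;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no table covers " + needed);
        }
        return best.name;
    }

    static List<Table> sample() {
        return Arrays.asList(
            table("sales_fact", 250000, "the_year", "quarter", "state", "unit_sales"),
            table("agg_year_quarter", 8, "the_year", "quarter", "unit_sales"));
    }

    public static void main(String[] args) {
        Set<String> byYear = new HashSet<>(Arrays.asList("the_year", "unit_sales"));
        Set<String> byState = new HashSet<>(Arrays.asList("state", "unit_sales"));
        System.out.println(choose(sample(), byYear));
        System.out.println(choose(sample(), byState));
    }
}
```

A query by year can be served from the small summary table, while a query by state falls back to the fact table because the summary does not carry that column.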

Page 22: Building Scalable OLAP Applications with Mondrian and MySQL

Scaling to large numbers of concurrent users

Tune MySQL

Increase JVM memory

Make sure that connections are using the same cache

Multiple web servers

Multiple mondrian instances

Multiple DBMS instances with read-only copies of the data

[Figure: deployment diagram built up in stages. Labels shown: Viewers, Web Server, JPivot servlet, XML/A servlet, J2EE Application Server, Mondrian, cube, JDBC, RDBMS. The final stage repeats the web server, application server, and RDBMS boxes several times.]

Page 23: Building Scalable OLAP Applications with Mondrian and MySQL

Active Data Warehousing

Page 24: Building Scalable OLAP Applications with Mondrian and MySQL

Active Data Warehousing

An Active Data Warehouse:

Extends the traditional DW to support operational processing

Provides a single view of the business

Is updated in near real-time

Cost/benefit:

High cost (technical challenges)

High reward (time is money)

Page 25: Building Scalable OLAP Applications with Mondrian and MySQL

Technical challenges of Active Data Warehousing

DBMS has mixed work-load
  Short, fast queries to query specific records
  Longer queries to look at changing patterns

Emphasis on recent data
  Background DML to keep database up to date

Continuous, incremental ETL
  Continuous or near real-time
  Incremental
  Consider message-driven ETL

OLAP cache
  OLAP servers use cache and/or MOLAP database for performance
  Cache consistency

Page 26: Building Scalable OLAP Applications with Mondrian and MySQL

Mondrian and Active Data Warehouse

“ROLAP with caching”
  No MOLAP data store to maintain

Disable aggregate tables
  Unless your ETL process can maintain these incrementally

Either turn off cache
  mondrian.rolap.star.disableCaching=true
  “Pure ROLAP” mode
  Results are cached for the lifetime of a single MDX statement

Or use new CacheControl API to selectively flush Mondrian’s cache

Page 27: Building Scalable OLAP Applications with Mondrian and MySQL

Cache control

In an Active Warehouse, the most recent data are modified much more often than historic data

Cache control API allows the application to selectively flush the cache

    year  quarter  month  day  unit_sales
    2007  Q1       1      1    1500
    2007  Q1       1      2    1700
    2007  Q1       1      3    1250
    2007  Q1       1      4    1800
    2007  Q2       4      22   1378
    2007  Q2       4      23   1920
    2007  Q2       4      24   350 / 450 / 525 (successive values shown on the slide for the most recent day)

(Refer to http://mondrian.pentaho.org/documentation/cache_control.php )

Page 28: Building Scalable OLAP Applications with Mondrian and MySQL

Cache control: A simple example

Let’s run a simple MDX query:

    SELECT
      {[Time].[1997],
       [Time].[1997].[Q1],
       [Time].[1997].[Q2]} ON COLUMNS,
      {[Customers].[USA],
       [Customers].[USA].[OR],
       [Customers].[USA].[WA]} ON ROWS
    FROM [Sales]

Page 29: Building Scalable OLAP Applications with Mondrian and MySQL

Cache contents after running query

Segment: year=1997, nation=USA, measure=unit_sales

    year  nation  unit_sales
    1997  USA     266,773

Segment: year=1997, nation=USA, state={OR, WA}, measure=unit_sales

    year  nation  state  unit_sales
    1997  USA     OR     67,659
    1997  USA     WA     124,366

Segment: year=1997, quarter=any, nation=USA, measure=unit_sales

    year  quarter  nation  unit_sales
    1997  Q1       USA     66,291
    1997  Q2       USA     62,610

Segment: year=1997, quarter=*, nation=USA, state={OR, WA}, measure=unit_sales

    year  quarter  nation  state  unit_sales
    1997  Q1       USA     OR     19,287
    1997  Q1       USA     WA     30,114
    1997  Q2       USA     OR     15,079
    1997  Q2       USA     WA     29,479

Page 30: Building Scalable OLAP Applications with Mondrian and MySQL

Flushing part of a segment

Segment: year=1997, quarter=*, nation=USA, state={OR, WA}, measure=unit_sales

    year  quarter  nation  state  unit_sales
    1997  Q1       USA     OR     19,287
    1997  Q1       USA     WA     30,114
    1997  Q2       USA     OR     15,079
    1997  Q2       USA     WA     29,479

The same segment shown as a grid:

          OR      WA
    Q1    19,287  30,114
    Q2    15,079  29,479

Segment with part flushed: year=1997, quarter=*, nation=USA, state={OR, WA}, measure=unit_sales; exclude: state=WA and quarter=Q2

Cache manager

Segment: year=1997, quarter=*, nation=USA, state={CA, WA}, measure=unit_sales

          CA      WA
    Q1    16,890  30,114
    Q2    18,052  30,079
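The idea of flushing part of a segment can be modelled with a map of cells plus a set of excluded coordinates. The sketch below is a toy model of the behaviour described on this slide, not Mondrian's internal data structures; the class and method names are invented, and the figures are the ones from the OR/WA segment.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class SegmentFlushSketch {
    private final Map<String, Integer> cells = new LinkedHashMap<>();
    private final Set<String> excluded = new LinkedHashSet<>();

    private static String key(String quarter, String state) {
        return quarter + "|" + state;
    }

    // Loads (or reloads) one cell; a reloaded cell is no longer excluded.
    void put(String quarter, String state, int unitSales) {
        String k = key(quarter, state);
        cells.put(k, unitSales);
        excluded.remove(k);
    }

    // Flushes one cell: drops the cached value and records the exclusion.
    void flush(String quarter, String state) {
        String k = key(quarter, state);
        if (cells.remove(k) != null) {
            excluded.add(k);
        }
    }

    // Returns the cached value, or null when the cell must be re-read from the database.
    Integer get(String quarter, String state) {
        return cells.get(key(quarter, state));
    }

    Set<String> excluded() {
        return excluded;
    }

    // The segment year=1997, nation=USA, state={OR, WA} from the slide.
    static SegmentFlushSketch sample() {
        SegmentFlushSketch segment = new SegmentFlushSketch();
        segment.put("Q1", "OR", 19287);
        segment.put("Q1", "WA", 30114);
        segment.put("Q2", "OR", 15079);
        segment.put("Q2", "WA", 29479);
        return segment;
    }

    public static void main(String[] args) {
        SegmentFlushSketch segment = sample();
        segment.flush("Q2", "WA"); // exclude: state=WA and quarter=Q2
        System.out.println("Q1/WA cached: " + segment.get("Q1", "WA"));
        System.out.println("Q2/WA cached: " + segment.get("Q2", "WA"));
        System.out.println("excluded: " + segment.excluded());
    }
}
```

The three untouched cells stay usable, which is the point of a selective flush: only the changed region costs another trip to the database.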

Page 31: Building Scalable OLAP Applications with Mondrian and MySQL

Code using CacheControl to flush part of a segment

    import mondrian.olap.CacheControl;
    import mondrian.olap.Connection;
    import mondrian.olap.Cube;
    import mondrian.olap.Member;
    import mondrian.olap.SchemaReader;

    Connection connection;  // an open Mondrian connection, obtained elsewhere
    CacheControl cacheControl = connection.getCacheControl(null);
    Cube salesCube = connection.getSchema().lookupCube("Sales", true);
    SchemaReader schemaReader = salesCube.getSchemaReader(null);

    // Create region containing [Customers].[USA].[OR].
    Member memberOregon = schemaReader.getMemberByUniqueName(
        new String[]{"Customers", "USA", "OR"}, true);
    CacheControl.CellRegion regionOregon =
        cacheControl.createMemberRegion(memberOregon, true);

    // Create region containing [Time].[1997].[Q2].
    Member memberTimeQ2 = schemaReader.getMemberByUniqueName(
        new String[]{"Time", "1997", "Q2"}, true);
    CacheControl.CellRegion regionTimeQ2 =
        cacheControl.createMemberRegion(memberTimeQ2, true);

    // Create region containing ([Customers].[USA].[OR], [Time].[1997].[Q2]).
    CacheControl.CellRegion regionOregonQ2 =
        cacheControl.createCrossjoinRegion(regionOregon, regionTimeQ2);

    // Flush only the cells in that region; the rest of the cache is kept.
    cacheControl.flush(regionOregonQ2);

Page 32: Building Scalable OLAP Applications with Mondrian and MySQL

Case studies

Page 33: Building Scalable OLAP Applications with Mondrian and MySQL

Case Study: Frontier Airlines

Key Challenges

Understanding and optimizing fares to ensure

Maximum occupancy (no empty seats)

Maximum profitability (revenue per seat)

Pentaho Solution

Pentaho Analysis (Mondrian)

Chose Open Source RDBMS and Mondrian over Oracle

500 GB of data, 6 server cluster

Results

Comprehensive, integrated analysis to set strategic pricing

Improved per-seat profitability (amount not disclosed)

Why Pentaho

Rich analytical and MDX functionality

Cost of ownership

“The competition is intense in the airline industry and Frontier is committed to staying ahead of the curve by leveraging technology that will help us offer the best prices and the best flight experience…. [the application] fits right in with our philosophy of providing world-class performance at a low price.”

Page 34: Building Scalable OLAP Applications with Mondrian and MySQL

Case Study: U.S.-based Public Software Company (OEM Example)

Key Requirements

Embedding analysis within a pricing planning application

Incumbent Vendor

MicroStrategy

Decision Criteria

Embeddability

Extensibility

Cost of ownership

Best Practices

Evaluate replacement costs holistically

Treat migrations as an opportunity to improve a deployment, not just move it

Understand and prioritize requirements – functionality, usability, architecture, etc.

“The old solution (MicroStrategy) was built to be a standalone application. It expected to own the browser session, to have separate security, and its own user interface standards. We were able to make Mondrian a seamless part of the application, and it was much easier to do so.”

Page 35: Building Scalable OLAP Applications with Mondrian and MySQL

The big picture

Page 36: Building Scalable OLAP Applications with Mondrian and MySQL

Business Intelligence Suite

Mondrian OLAP

Analysis tools:
  Pivot table
  Charting
  Dashboards

ETL (extract/transform/load)

Integration with operational reporting

Integration with data mining

Actions on operational data

Design/tuning tools

Page 37: Building Scalable OLAP Applications with Mondrian and MySQL

Pentaho Open Source BI Offerings

All available in a Free Open Source license

Page 38: Building Scalable OLAP Applications with Mondrian and MySQL

A Sample of Joint MySQL-Pentaho Users

“Pentaho provided a robust, open source platform for our sales reporting application, and the ongoing support we needed. The experts at OpenBI provided outstanding services and training, and allowed us to deploy and start generating results very quickly.”

“We selected Pentaho for its ease-of-use. Pentaho addressed many of our requirements -- from reporting and analysis to dashboards, OLAP and ETL, and offered our business users the Excel-based access that they wanted.”

Page 39: Building Scalable OLAP Applications with Mondrian and MySQL

Next Steps and Resources

Contact Information
  Julian Hyde, Chief Architect, [email protected]

More information http://www.pentaho.org and

http://mondrian.pentaho.org

Pentaho Community Forum http://community.pentaho.org
  Go to Developer Zone
  Discussions

Pentaho BI Platform including Mondrian

http://www.pentaho.org/download/latest

Mondrian OLAP Library only

http://sourceforge.net/project/showfiles.php?group_id=35302

Page 40: Building Scalable OLAP Applications with Mondrian and MySQL

Birds of a Feather session: MySQL Data Warehousing and BI

Speakers:

James Dixon, Senior Architect and Chief Technology Officer (a.k.a. "Chief Geek"), Pentaho Corporation

Roland Bouman, Certification Developer, MySQL AB

Matt Casters, Data Integration Architect and Kettle Project Founder, Pentaho

Julian Hyde, OLAP Architect and Mondrian Project Founder, Pentaho

Brian Miezejewski, Principal Consultant, MySQL Inc.

Track: Data Warehousing and Business Intelligence

Date: TONIGHT!!

Time: 7:30pm - 9:00pm

Location: Camino Real

Page 41: Building Scalable OLAP Applications with Mondrian and MySQL

Thank you for attending!