© Copyright IBM Corporation 2009. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
IBM, the IBM logo, ibm.com, Informix, solid, DataMirror, Optim, Cognos are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries.
Other company, product, or service names may be trademarks or service marks of others.
Disclaimer
Contents
Definition of BI/DW/BA
Types of IDS BI Users
OLTP vs. Data Warehousing
Informix Warehouse
IDS Storage Optimization
Your Feedback and Requirements
Business Intelligence
• A set of concepts and methodologies to improve decision making in business through use of facts and fact-based systems … Howard Dresner, The Gartner Group
• The processes, technologies, and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business actions … David Loshin, Business Intelligence: The Savvy Manager’s Guide
The foundation that enables BI is the enterprise architecture: business, data, and technology. A well-implemented data warehousing program provides much of that foundation.
Data Warehousing
• A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management needs … W H Inmon
• The Data Warehouse is nothing more than the union of all the constituent data marts … Ralph Kimball, et al, The Data Warehouse Life Cycle Toolkit
The data warehousing process turns raw data into potentially valuable information usable by people and systems. Warehousing enhances the value of data assets by:
– Applying standards and consistency to the data
– Organizing the data into subject areas that cross business functional lines
– Integrating the data
– Enforcing data consistency over time to provide meaningful history
– Acting as a stable and reliable source
– Providing easy access to data
Business Analytics
The process of using information to enhance knowledge and apply that knowledge to help a business achieve its objectives. Analytic applications provide tools to facilitate the business analytics process.
Business Metrics and Business Management
Business Process Management
Business Performance Management
Business Activity Monitoring
Customer Relationship Management
Supply Chain Management
Performance Dashboards for Information Delivery
Real-time (or near Real-time) Monitoring
Scorecards for Information Delivery
Monitoring history & trends
Analytic Applications for Information Delivery
Customer Analysis, Marketplace Analysis, Sales Channel Analysis, …
Range of Business Analytics
• Reporting – Using Query, Reporting and search tools
• Analysis – Using OLAP & Visualization tools
• Monitoring – Using Dashboards & Scorecards
• Prediction – Using Predictive Analysis tools
[Chart: the four stages plotted by Business Value (low to high) against Complexity (up to high). Source: TDWI]
IDS in BI/Warehousing
• Given the IDS characteristics of reliability, high availability, performance, and ease of use, why isn’t IDS in this space?
– IDS has traditionally been viewed as an OLTP solution
• However, there are a lot more warehousing users on IDS than one realizes!
– Some customers have implemented IDS warehouses at terabyte levels
– There are a lot of features already in IDS that make it suitable for BI/Warehousing
– BI tools have become very sophisticated over the years
• We recognize the need to provide better warehousing capabilities for IDS users
What’s Available? IDS Warehousing Features
• Performance & Scalability
– Inherent SMP Multi-threading
– Parallel Data Query (PDQ)
– Light Scan for fast table scans
– Online Index build
– Efficient Hash Joins
– Auto Fragment Elimination
– Memory Grant Manager (MGM)
– High Performance Loader
– Optimistic Concurrency
• Ease of Management
– Time cyclic data management using Range Partitioning
– OPTCOMPIND optimization
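The idea behind fragment elimination on a range-partitioned table can be illustrated with a small sketch. This is a conceptual model only, not how IDS implements it: the fragment names, the monthly ranges, and the `fragments_to_scan` helper are all hypothetical.

```python
from datetime import date

# Hypothetical fragments: each one holds the rows for one month, [start, end).
FRAGMENTS = {
    "sales_2009_01": (date(2009, 1, 1), date(2009, 2, 1)),
    "sales_2009_02": (date(2009, 2, 1), date(2009, 3, 1)),
    "sales_2009_03": (date(2009, 3, 1), date(2009, 4, 1)),
}


def fragments_to_scan(query_start, query_end, fragments=FRAGMENTS):
    """Return the fragments whose range overlaps the query range [query_start, query_end).

    Fragments that cannot contain matching rows are skipped (eliminated).
    """
    return [
        name
        for name, (lo, hi) in fragments.items()
        if lo < query_end and query_start < hi
    ]


# A query restricted to mid-February only needs the February fragment.
print(fragments_to_scan(date(2009, 2, 10), date(2009, 2, 20)))
```

With time-cyclic data, the same layout also makes it cheap to retire the oldest period, since a whole fragment can be dropped rather than deleting rows one by one.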
BI Users Classification
1. BI on Existing OLTP Schema (Operational BI)
2. BI on Star Schema (Data Mart)
3. BI in a Mixed-Workload Environment
4. Enterprise BI
Type 1: BI/Analytics on OLTP Schema
• Majority of today’s IDS customers have the need to do BI/Analytics on their existing IDS (OLTP) database.
• They currently use a combination of 4GL programs, Excel, and BI tools (Business Objects, Cognos, Crystal Reports)
• Custom code and maintenance required by customer
• Performance may be acceptable even on an OLTP schema
• Allows for “operational BI”
OLTP vs. Data Warehousing Workload
OLTP:
• Short Transactions
– Relatively simple SQL
• Random Updates
– Few rows accessed
• Sub-second response time
• ER Modeling
– Minimizes redundancy
• Normalized data (5NF)
– Minimizes duplicates
• Few indexes
– Avoids index maintenance
• Pre-compiled queries
– Repeated execution of queries
Data Warehousing:
• Longer Transactions
– Complex SQL with analytics
• Sequential Updates
– Many rows accessed
• Secs to Mins response time
• Dimensional Modeling
– OK to have redundancy
• De-normalized data (3NF)
– Duplicates are OK
• OK to have more indexes
– Mostly read only
• Ad-hoc queries
– Unpredictable load
Type 2: BI/Analytics on IDS on Star Schema
• Transform OLTP database into Star Schema database
• Better performance for data warehousing and dimensional queries
• Star Schema database may be on a separate machine/domain
• Suitable for customers building separate data mart
• Use IDS as is against Star Schema
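As a rough illustration of the transformation described above, the sketch below builds a tiny star schema from a flat operational table and runs one dimensional query. It uses SQLite from the Python standard library purely so the example is self-contained; it is not IDS, and every table name, column name, and data value is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical operational table standing in for the OLTP source.
cur.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, region TEXT, "
    "product TEXT, qty INTEGER, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Acme", "East", "Widget", 2, 20.0),
        (2, "Acme", "East", "Gadget", 1, 15.0),
        (3, "Bolt", "West", "Widget", 5, 50.0),
    ],
)

# Dimension tables: one row per distinct customer and per distinct product.
cur.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer TEXT, region TEXT)"
)
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT)")
cur.execute(
    "INSERT INTO dim_customer (customer, region) SELECT DISTINCT customer, region FROM orders"
)
cur.execute("INSERT INTO dim_product (product) SELECT DISTINCT product FROM orders")

# Fact table: measures plus foreign keys into the dimensions.
cur.execute(
    "CREATE TABLE fact_sales (customer_key INTEGER, product_key INTEGER, qty INTEGER, amount REAL)"
)
cur.execute(
    """
    INSERT INTO fact_sales
    SELECT c.customer_key, p.product_key, o.qty, o.amount
    FROM orders o
    JOIN dim_customer c ON c.customer = o.customer AND c.region = o.region
    JOIN dim_product p ON p.product = o.product
    """
)

# A typical dimensional query: total sales amount by region.
sales_by_region = dict(
    cur.execute(
        """
        SELECT c.region, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_customer c ON c.customer_key = f.customer_key
        GROUP BY c.region
        """
    ).fetchall()
)
print(sales_by_region)
```

The fact table stays narrow (keys and measures), while descriptive attributes such as region live in the dimensions, which is what makes group-by queries like the last one straightforward.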
What’s Available? BI Tools
The Performance Management Framework
Cognos identifies best-practice decision areas, or information sweet spots, by business function:
Cognos 8 provides a comprehensive set of BI tools for:
Reporting
Analysis
Dashboards
Scorecards
Performance Management Framework for:
Solutions for different areas of the organization
Cognos Business Intelligence and Performance Management One Platform, One Architecture
Industry and Functional Solutions
Complete Coverage of all capabilities
Enterprise-Class SOA Platform
SQL Warehousing Tools Overview
• Typical process
– Identify requirements
• Data Architect
– Define data transformation (ETL/ELT) process
• SQL/ETL developer
– Development of SQL/shell scripts
• SQL/ETL developer
– Deployment in production system
• Application Architect, DBA
– Reporting
• Business user
– Refine requirements
• SQW Solution
– Data Modeling
• Physical Data Model (Reverse engineering, New from scratch, generate DDL), compare & sync
– Data Flows
• Visual Design
• Optimized SQL code generation
• Control flow supports programming logic
– Admin Console
• Schedule, Monitor, Parameterized values
– Eclipse free reporting tool
• e.g. BIRT
– Reusable flows
• Easy refinement
• Copy & paste, refactor
• Challenges
– Dynamic requirements
• Constant refinement
– Multiple roles, tools
• Each has a different perspective
• Communication cost/information loss
– Unreadable, hard-to-debug scripts
• Poor productivity
• Values
– Easy to design & reuse
• Increased productivity
– Integrated tools
• Seamless integration inside Eclipse
– Auto generated code from visualized flows
• Optimized SQL code
– Impact analysis for any data model change
SQW Architecture
[Diagram, three stages. DESIGN: Design Center / Design Studio (Eclipse), producing Data Flows + Control Flows. DEPLOY: deployment preparation builds a deployment package (code units, build profile, user scripts), which is then deployed. RUNTIME: HTTP service (WAS) hosting the SQW Runtime and Admin Console, with an SQW Control DB (IDS) and an SQW Execution DB (IDS); also shown are data source databases, applications, other servers (DataStage), and a Warehouse DB. Database labels in the diagram: IDS, DB2, Oracle, SQL Server.]
SQW: Design Studio
• Design Studio
– Eclipse based IDE
• Integrated tools, shell sharing
– Team development
• CVS, ClearCase for checkin/checkout of projects, flows
• Data Warehousing Project
– Data Models
– Data Flows
– Control Flows
– Warehouse Applications (deployment packages)
– Subflow & Subprocess (reusable flow module)
– Variables
• Data Source Explorer
– Database connections to multiple vendors, e.g. Informix, DB2 LUW, Oracle, SQL Server, MySQL, DB2 z/OS
• DataStage Servers
– Integration with IBM DataStage
SQW: Data Modeling
Physical Data Model Visualized data modeling
Impact analysis
Reverse engineering or new from scratch
Compare & sync
Generate DDL
Overview diagram
Shell Sharing with Rational Data Architect & other Data Studio products
SQW: Data Flows
Data Flow Operators: -- source & target operators (table, file)
-- SQL Transformation operators
-- Warehousing operators
File source
Table source
Table join
aggregation
Table target
SQW: Data Flows
A simple flow
Generated SQL code
-- optimization across SQL statements.
-- optimized staging strategy
-- in-database transformation
SQW: Control Flows
Control flow
Common utility operators
Control logic, parallel execution, loop iteration
Error handling
SQW Overview
• Design Studio: Eclipse based design environment
– Creates: application package (zip file), deployment profile (database connections, machine resources, variable definitions, DDL files, etc.), generated code
– The application package is deployed to the Admin Console
• Admin Console: production environment in WebSphere
– Manage warehouse applications
– Schedule
– Monitor
Admin Console
Flex RIA based Warehouse Admin Console
Admin Console manages common resources (e.g. database connections, FTP servers, DataStage servers)
Schedule & monitor warehouse processes
XPS Customers Looking to Migrate to IDS
• External Tables
– XPS style loader for easy migration
• Partitioning Strategies
– Auto fragmentation
– Fragment Advisor
– Fragment stats Update
– Truncate Fragments
• Primary Storage Manager (PSM)
– For simpler, easier management of backups (replacing ISM)
• Merge
– UpSert capabilities
* Features to be included in the next release(s)
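Using Mach11 for OLTP/Warehousing in IDS
[Diagram: two deployment options. Use Separate Boxes: OLTP apps run against an OLTP database, which feeds a separate data warehouse database (used by SQW) through ETL. OR Use MACH 11: on a blade server with a shared disk, a Connection Manager routes OLTP apps and SQW to a Primary and SDS nodes organized into an “OLTP” node group and an “SQW” node group, giving user transparency and a single database view.]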
Row Compression Concepts
• Compression looks for repeating patterns across the entire table
– When a pattern is found, the string is replaced by a 12-bit symbol
– Symbols are stored in a dictionary for fast lookup
• Data resides compressed on pages (both on-disk and in bufferpool)
– Significant I/O bandwidth savings – better performance
– Significant memory savings – more efficient memory utilization
– Some CPU overhead costs
• Rows must be uncompressed before being processed for evaluation
Row Compression Using a Compression Dictionary
• Dictionary contains repeated information from the rows in the table
– Compression candidates can be across column boundaries or within columns
Uncompressed rows:
PartCode  SPart  Quantity  LotNum  BinLoc  Aisle
ANCPRPLT  220J   200       Z165-3  NE132   6157
SNCPRPLT  580T   132       Z165-3  NE132   6157
…
Dictionary:
01  NCPRPLT
02  Z165-3NE1326157
…   …
Compressed rows:
A (01) 220J 200 (02)
S (01) 580T 132 (02)
…
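The substitution shown on this slide can be mimicked with a toy sketch. It only illustrates the idea of replacing repeated byte patterns with dictionary symbols, including a pattern that spans column boundaries; it is not the IDS on-disk format or its dictionary-building algorithm, and the `compress_row` / `decompress_row` helpers and the `(NN)` token notation are assumptions made for the example.

```python
# Dictionary taken from the slide: symbol -> repeated pattern.
DICTIONARY = {
    "01": "NCPRPLT",
    "02": "Z165-3NE1326157",  # spans the LotNum, BinLoc and Aisle columns
}


def compress_row(fields, dictionary=DICTIONARY):
    """Concatenate the column values and replace each known pattern with its symbol."""
    row = "".join(fields)
    for symbol, pattern in dictionary.items():
        row = row.replace(pattern, f"({symbol})")
    return row


def decompress_row(compressed, dictionary=DICTIONARY):
    """Expand every symbol back into its pattern (toy format: assumes the data
    itself never contains a literal '(01)'-style token)."""
    row = compressed
    for symbol, pattern in dictionary.items():
        row = row.replace(f"({symbol})", pattern)
    return row


row = ["ANCPRPLT", "220J", "200", "Z165-3", "NE132", "6157"]
packed = compress_row(row)
print(packed)                  # A(01)220J200(02)
print(decompress_row(packed))  # ANCPRPLT220J200Z165-3NE1326157
```

Because the row is treated as one string of bytes, a pattern such as `Z165-3NE1326157` can cover several adjacent columns at once, which is the "across column boundaries" case mentioned above.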
Storage savings
• Tables will often compress in the range of 60% - 80%
• Overall database storage savings will be between 40% and 50%
• That’s up to 50% less disk space needed to support an IDS 11 database!
[Chart: compressed vs. uncompressed size for a Sales Table and a Product Table; bar labels: 81% smaller, 78% smaller]
Performance Benefit
• Performance can be improved using compression
• Many queries will benefit from compression with fewer I/Os
• Consumes more CPU, but most customers are not 100% CPU bound
– Lab tests show I/O bound workloads improve by 30-40%
• Many utilities (backup and recovery, for example) will be faster
– 2x as fast in some cases, as the database may now be ½ the size
[Chart label: 40% Faster]
IDS 11 Compression Operations
• estimate_compression – Estimates compression ratio on a table
• create_dictionary – Creates compression dictionary for a table
• compress – Implicitly runs create_dictionary and compresses all existing data
• uncompress – Uncompresses the table and deactivates compression
• uncompress_offline – Takes an exclusive lock (XLOCK) on the table and uncompresses it; also deactivates compression
• purge_dictionary – Deletes old inactive dictionaries
Storage Optimization Operations
• repack – Move rows within a table or fragment to consolidate free space
• repack_offline – XLOCK the table and move rows within a table or fragment to consolidate free space
• shrink – Return free space at end of table or fragment to the dbspace
– Normally done after a repack
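What repack and shrink accomplish can be pictured with a small simulation. This is a conceptual model only: pages are plain Python lists, `rows_per_page` is a made-up capacity, and the `repack` and `shrink` functions here are illustrative helpers, not the IDS operations themselves.

```python
def repack(pages, rows_per_page):
    """Move rows toward the front so that free space collects in the trailing pages.

    The number of pages is unchanged; only the placement of rows changes.
    """
    rows = [row for page in pages for row in page]
    packed = [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]
    return packed + [[] for _ in range(len(pages) - len(packed))]


def shrink(pages):
    """Release the empty pages at the end; return the kept pages and the count released."""
    kept = list(pages)
    while kept and not kept[-1]:
        kept.pop()
    return kept, len(pages) - len(kept)


# Four half-empty pages, e.g. after compression left free space on each page.
pages = [["r1"], ["r2", "r3"], [], ["r4"]]
repacked = repack(pages, rows_per_page=2)
kept, released = shrink(repacked)
print(repacked)        # [['r1', 'r2'], ['r3', 'r4'], [], []]
print(kept, released)  # [['r1', 'r2'], ['r3', 'r4']] 2
```

The simulation also shows why shrink is normally run after repack: on the original layout the last page still holds a row, so shrink alone would have nothing to release.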
Compression On Data Page With Multiple Rows
[Animated diagram: uncompressed data pages are compressed using the dictionary (compress), the compressed rows are consolidated onto fewer pages (repack), and the resulting empty data pages are released (shrink).]
Admin API Interface
• All compression and storage optimization operations are invoked via the IDS Admin API built-in UDRs
– execute function task(…);
– execute function admin(…);
• Example:
execute function task
(
"table compress repack shrink",
"table_name", "database_name", "owner_name"
);
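For scripting, the statement above could be assembled by a small helper such as the sketch below. The `build_task_call` function is hypothetical, and it only produces the statement text in the shape shown on the slide; it does not connect to a server. Only the full "table compress repack shrink" command appears in the slides, so whether every shorter combination is accepted by the server is an assumption to verify against the product documentation.

```python
# Steps in the order used on the slide.
ALLOWED_STEPS = ("compress", "repack", "shrink")


def build_task_call(table, database, owner, steps=ALLOWED_STEPS):
    """Return an 'execute function task(...)' statement in the form shown on the slide."""
    unknown = [step for step in steps if step not in ALLOWED_STEPS]
    if unknown:
        raise ValueError(f"unsupported step(s): {unknown}")
    for value in (table, database, owner):
        if '"' in value:
            raise ValueError("double quotes are not allowed in names")
    # Keep the slide's ordering regardless of how the caller listed the steps.
    ordered = [step for step in ALLOWED_STEPS if step in steps]
    if not ordered:
        raise ValueError("at least one step is required")
    command = "table " + " ".join(ordered)
    return f'execute function task("{command}", "{table}", "{database}", "{owner}");'


# Placeholder table, database and owner names for illustration.
print(build_task_call("sales", "stores", "informix"))
```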
Features That Cannot Be Compressed
• Out-of-row data (e.g. blobs)
• Indexes
• Temp tables
• Catalog tables (Data Dictionary)
• Partition tables (Tablespace Tablespace)
• Dictionary Partitions
• Tables in the following databases:
– Sysmaster
– Sysutils
– Sysuser
– Syscdr
– Syscdcv1
HDR, ER, CDC (DataMirror) and Compression
• All are supported on compressed tables
• HDR
– Tables will be compressed on the secondary if and only if they are compressed on the primary
• ER
– Compression status of tables is independent between source and target, specified by user
• CDC
– Compression of targets is a function of what the target database supports and what the user specifies
Summary
• Storage optimization through IDS 11 compression can save 40-50% of your database storage requirements
• For I/O-bound workloads, compression can also improve performance
• You not only see your online database shrink but, often more importantly, your backup storage and disaster recovery storage are cut in half as well
• In real customer examples storage savings are realized and performance benefits are apparent
• Add in the time savings with utilities processing (database backup and recovery time in particular is cut in half) and you can see the benefits of IDS 11 compression