Oracle 10g for Data Warehousing
Hermann Baer, Oracle Product Management Data Warehousing, Server Technologies
NoCOUG Winter Conference, Feb 8th 2005

Page 1: "Oracle10g for Data Warehousing"

Oracle 10g for Data Warehousing

Hermann Baer, Oracle

Product Management Data Warehousing

Server Technologies

NoCOUG Winter Conference, Feb 8th 2005

Page 2: "Oracle10g for Data Warehousing"

Agenda

Oracle10g for data warehousing - short trip back in history
– Continuous innovation over decades

Adoption trends and drivers
– What do we see in the market

Design and build a Data Warehouse
– Ensure a well-balanced system
– Optimize Oracle

Oracle Database 10gR2 – sneak preview

Page 3: "Oracle10g for Data Warehousing"

The way to Oracle10g …

Data Warehousing development started decades ago with Oracle 7.0
– Primary focus on performance and scalability
– Extended with Manageability and the BI platform vision in the Oracle8i time frame

Data Warehousing Imperatives
– Efficient Extract, Transform, Load (ETL)
– Managing Large Data Volumes
– Fast Query Response
– Supporting Large User Population
– Managing Simply

Page 4: "Oracle10g for Data Warehousing"

Oracle10g for Data Warehousing: Continuous Innovation

Oracle 7.3

Oracle 8.0
– Partitioned Tables and Indexes
– Partition Pruning
– Parallel Index Scans
– Parallel Insert, Update, Delete
– Parallel Bitmap Star Query
– Parallel ANALYZE
– Parallel Constraint Enabling
– Server Managed Backup/Recovery
– Point-in-Time Recovery

Oracle8i
– Hash and Composite Partitioning
– Resource Manager
– Progress Monitor
– Adaptive Parallel Query
– Server-based Analytic Functions
– Materialized Views
– Transportable Tablespaces
– Direct Loader API
– Functional Indexes
– Partition-wise Joins
– Security Enhancements

Oracle9i
– List and Range-List Partitioning
– Table Compression
– Bitmap Join Index
– Self-Tuning Runtime Memory
– New Analytic Functions
– Grouping Sets
– External Tables
– MERGE
– Multi-Table Insert
– Proactive Query Governing
– System Managed Undo

Oracle10g
– Self-tuning SQL Optimization
– SQL Access Advisor
– Automatic Storage Management
– Self-tuning Memory
– Change Data Capture
– SQL Models
– SQL Frequent Itemsets
– SQL Partition Outer Joins
– Statistical functions
– and much more ...

Page 5: "Oracle10g for Data Warehousing"

Agenda

Oracle10g for data warehousing - short trip back in history
– Continuous innovation over decades

Adoption trends and drivers
– What do we see in the market

Design and build a Data Warehouse
– Ensure a well-balanced system
– Optimize Oracle

Oracle Database 10gR2 – sneak preview

Page 6: "Oracle10g for Data Warehousing"

Main Trends and Drivers

Oracle VLDWs are growing
– Fewer systems, more data

DW systems are consolidated
– Global view of the business

Importance of Data Warehousing increases dramatically
– Growing operational/tactical importance

Cost Effectiveness becomes more important
– Better decisions, lower cost

Page 7: "Oracle10g for Data Warehousing"

Oracle VLDWs are growing
Winter 2003 VLDB Survey: Largest Database Size, Decision Support

1998 Survey
Rank  Company          DBMS         Size (TB)
1     Sears            Teradata      4.63
2     HCIA             Informix      4.50
3     Wal-Mart         Teradata      4.42
4     Tele Danmark     DB2           2.84
5     CitiCorp         DB2           2.47
6     MCI              Informix      1.88
7     NDC Health       Oracle        1.85
8     Sprint           Teradata      1.30
9     Ford             Oracle        1.20
10    Acxiom           Oracle        1.13

2001 Survey
Rank  Company          DBMS         Size (TB)
1     SBC              Teradata     10.50
2     First Union      Informix      4.50
3     Dialog           Proprietary   4.25
4     Telecom Italia   DB2           3.71
5     FedEx            Teradata      3.70
6     Office Depot     Teradata      3.08
7     AT&T             Teradata      2.83
8     SK C&C           Oracle        2.54
9     NetZero          Oracle        2.47
10    Telecom Italia   Informix      2.32

2003 Survey
Rank  Company          DBMS         Size (TB)
1     France Telecom   Oracle       29.23
2     AT&T             Daytona      26.27
3     SBC              Teradata     24.81
4     Anonymous        DB2          16.19
5     Amazon.com       Oracle       13.00
6     Kmart            Teradata     12.59
7     Claria           Oracle       12.10
8     HIRA             Sybase IQ    11.94
9     FedEx            Teradata      9.98
10    Vodafone         Teradata      9.91

Page 8: "Oracle10g for Data Warehousing"

Oracle VLDWs are growing

Powerful RDBMS functionality becomes more important and visible, e.g.
– Partitioning
– Table compression
– Automatic Storage Management (ASM)
– Parallel processing

Page 9: "Oracle10g for Data Warehousing"

Increasing Importance of DW

Latency between operational and analytical data must be minimized
– Intelligence when you need it

Need for new and enhanced analytical capabilities
– More value from your data

“Classical” strengths of an RDBMS become more important
– E.g. Security, Backup/Recovery, Availability, Concurrency

Page 10: "Oracle10g for Data Warehousing"

Cost Effectiveness

Save money whenever possible
– Commodity servers
– Commodity disks
– Software manageability

Example: Amazon
– 16 low-cost Intel boxes replaced one SuperDome
– Low-cost storage arrays replaced high-end storage arrays
– 2 DBAs

Page 11: "Oracle10g for Data Warehousing"

Cost Effectiveness: Pay and Scale Incrementally

[Chart: workload (100% to 300%) plotted over months 3 to 24]

Page 12: "Oracle10g for Data Warehousing"

Cost Effectiveness: Pay and Scale Incrementally ... with RAC

[Chart: workload (100% to 300%) plotted over months 3 to 24]

Page 13: "Oracle10g for Data Warehousing"

Cost Effectiveness

Commodity components make specific database functionality more important
– RAC for Scalability and Availability
– Resource Manager
– Automatic Storage Management (ASM)
– RMAN / Oracle Backup (Oracle10gR2)

Page 14: "Oracle10g for Data Warehousing"

Oracle Database 10g DW: Major Feature Summary

ULDB support
– Database size extended to Exabytes (BIGFILES)
– Unlimited size LOBs
– Hash Partitioned Global Indexes
– ASM removes file system limits

More Value From Your Data
– Many New OLAP Features
– New Data Mining algorithms
– Stand-alone Data Mining Tool
– Advanced Statistics
– SQL Model Clause
– Frequent Item Sets
– Partition Outer Join

Intelligence When You Need It
– Cross Platform Transportable Tablespaces
– Data Pump
– Async Change Data Capture
– Enhancements to MERGE

Reduced Total Cost of Ownership: Manageability
– Workload Repository
– Automatic SQL Tuning
– Self-Tuning Global Memory
– ASM

Page 15: "Oracle10g for Data Warehousing"

Agenda

Oracle10g for data warehousing - short trip back in history
– Continuous innovation over decades

Adoption trends and drivers
– What do we see in the market

Design and build a Data Warehouse
– Ensure a well-balanced system
– Optimize Oracle

Oracle Database 10gR2 – sneak preview

Page 16: "Oracle10g for Data Warehousing"

Build the foundation for Success

Even after decades of innovation, a computer ‘still’ consists of three main components
– CPU provides the computing power
– Memory stores the transient data for computational operations
– Disks (I/O) store the persistent information

Getting the best performance means finding the right balance of all these components and using them optimally
– Size your system appropriately
– Design your database appropriately
– Use the database appropriately

Data Warehousing is ‘just a special kind of application’

Page 17: "Oracle10g for Data Warehousing"

Configuring for your Workload

CPU requirements depend on user workload:
– Concurrency of users, ratio of CPU-related tasks

Memory requirement is mostly user-process driven

IO requirements depend on query mix:
– CPU vs. IO: relative CPU power for IO-related tasks
– Logically random IOs (predominant in star schemas), required for index-driven queries, e.g. index lookups, index-driven joins, index scans
– Logically sequential IOs (predominant in 3NF schemas), required for table scans, e.g. hash joins

Find the balance between CPU and IO

Page 18: "Oracle10g for Data Warehousing"

Sizing Guidelines: Configuring for Throughput

Oracle can read 300+ MB/sec per GHz of CPU power
– Direct read, multi-block IO, e.g. parallel full table scan ('lab environment')

An ‘average’ DW system should plan for 75-100 MB/sec per GHz of CPU power
– Typical mixture of IO- and CPU-intensive operations
– Ballpark number, adjust accordingly

TPC-H plans for approx. 200 MB/sec per 3 GHz Xeon
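
The ballpark rule above can be turned into a quick calculation. The following is a minimal Python sketch, assuming the 75-100 MB/sec per GHz planning range from this slide; the function name and the example server (2 CPUs at 3 GHz) are hypothetical and only illustrate the arithmetic.

```python
def planned_io_mb_per_sec(cpus, ghz_per_cpu, mb_per_sec_per_ghz=(75, 100)):
    """Return the (low, high) I/O throughput in MB/sec to plan for.

    Applies the slide's ballpark range of MB/sec per GHz of CPU power
    to the total clock rate of the server.
    """
    total_ghz = cpus * ghz_per_cpu
    low, high = mb_per_sec_per_ghz
    return total_ghz * low, total_ghz * high


# Hypothetical server: 2 CPUs at 3 GHz each (6 GHz in total).
low, high = planned_io_mb_per_sec(cpus=2, ghz_per_cpu=3.0)
print(f"Plan for {low:.0f}-{high:.0f} MB/sec of I/O throughput")
```

The range is a starting point only; as the slide says, adjust it to the actual mix of IO- and CPU-intensive operations.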

Page 19: "Oracle10g for Data Warehousing"

Configuring for Throughput: “The weakest link” defines the throughput

Components to consider:
● CPU: quantity and speed
● HBA (Host Bus Adapter): quantity and speed
● Switch speed
● Controller: quantity and speed
● Disk: quantity and speed

[Diagram: four servers with two HBAs each (HBA1, HBA2), connected through FC-Switch1 and FC-Switch2 to Disk Arrays 1 to 8]

Page 20: "Oracle10g for Data Warehousing"

Configuring for Throughput: Bit is not Byte

Throughput Performance

Component         Theory (bit/s)   Maximum (byte/s)
HBA               1/2 Gbit/s       100/200 MB/s
16-Port Switch    8 x 2 Gbit/s     1600 MB/s
Fibre Channel     2 Gbit/s         200 MB/s
Disk Controller   2 Gbit/s         200 MB/s
GigE NIC          1 Gbit/s         80 MB/s
Infiniband        10 Gbit/s        890 MB/s
CPU                                200 MB/s
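
To make the "bit is not byte" point concrete, the sketch below compares a naive conversion (line rate divided by 8) with the maximum values quoted in the table. It is a minimal Python illustration that assumes decimal units (1 Gbit = 1000 Mbit); the helper name is hypothetical. The quoted maxima come out lower than the naive figure, which is consistent with encoding and protocol overhead on the link.

```python
# (component, nominal line rate in Gbit/s, maximum from the table in MB/s)
LINKS = [
    ("Fibre Channel", 2, 200),
    ("GigE NIC", 1, 80),
    ("Infiniband", 10, 890),
]


def naive_mb_per_sec(gbit_per_sec):
    """Line rate divided by 8 bits per byte, ignoring any overhead."""
    return gbit_per_sec * 1000 / 8


for name, gbit, table_max in LINKS:
    naive = naive_mb_per_sec(gbit)
    share = table_max / naive
    print(f"{name}: naive {naive:.0f} MB/s, table {table_max} MB/s ({share:.0%})")
```

When sizing, use the achievable byte rate of each component, not the advertised bit rate.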

Page 21: "Oracle10g for Data Warehousing"

Configuring for Throughput

[Diagram: four servers with two HBAs each (HBA1, HBA2), connected through FC-Switch1 and FC-Switch2 to Disk Arrays 1 to 8]

Each switch needs to support 800MB/s to guarantee a total system throughput of 1600 MB/s

Each machine has 2 HBAs = 400MB/s; all 8 HBAs can sustain 8 * 200MB/s = 1600 MB/s

Each machine has 2 CPUs; all four servers drive about 2 * 200MB/s * 4 = 1600 MB/s

Each disk array has one 2Gbit controller; all 8 disk arrays can sustain 8 * 200MB/s = 1600 MB/s
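
The four statements above amount to a "weakest link" check: system throughput is bounded by the component class with the lowest aggregate rate. A minimal Python sketch, using only the figures from this slide; the dictionary keys, the function name, and the what-if scenario at the end are illustrative assumptions.

```python
# Aggregate sustainable throughput per component class, in MB/s.
components = {
    "switches": 2 * 800,          # 2 FC switches at 800 MB/s each
    "HBAs": 8 * 200,              # 4 servers x 2 HBAs at 200 MB/s each
    "CPUs": 4 * 2 * 200,          # 4 servers x 2 CPUs at about 200 MB/s each
    "disk controllers": 8 * 200,  # 8 disk arrays, one 2 Gbit controller each
}


def bottleneck(throughputs):
    """Return (name, MB/s) of the component class that limits the system."""
    name = min(throughputs, key=throughputs.get)
    return name, throughputs[name]


print(bottleneck(components))

# Hypothetical what-if: with only one switch, that stage becomes the weakest link.
degraded = dict(components, switches=800)
print(bottleneck(degraded))
```

In the balanced configuration every stage sustains the same 1600 MB/s, so no single component is over- or under-provisioned.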

Page 22: "Oracle10g for Data Warehousing"

Configuring the Storage

Design for throughput, not capacity

Keep it simple
– Try using RAID 0+1

Use S.A.M.E. methodology
– Stripe And Mirror Everything
– At the HW level, if available
– Using ASM capabilities

Leverage ASM whenever possible
– Striping and mirroring capabilities
– Automatic rebalancing
– Enables low-cost storage

Page 23: "Oracle10g for Data Warehousing"

Calibrate your System

You can easily compute the theoretical I/O performance of your system
– Typically measured by the minimum of [I/O channel capacity, I/O controller capacity, disk I/O capacity]

Verify the I/O performance limits using OS-level commands
– Do this prior to using the database

Cover basic IO operations and the average future load pattern
– Random single-block IO vs. sequential multi-block IO
– Concurrency

Page 24: "Oracle10g for Data Warehousing"

Calibrate your System: Throughput dd vs. ORCL DIRECT READ

[Chart: throughput in MB/s (0 to 500) for dd and Oracle, plotted against copies of dd / degree of parallelism (1 to 9)]

● Oracle drives about 90% of what dd can drive with a table scan
● If you do not get the expected throughput, fix the hardware
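
A calibration run can be summarized with a small check like the one below. This is a minimal Python sketch, assuming the theoretical limit is the minimum of channel, controller, and disk capacity and that an Oracle table scan should reach about 90% of the dd result; all measurement values are hypothetical and are not read from the chart.

```python
def calibration_report(theoretical, dd_measured, oracle_measured, oracle_share=0.90):
    """Compare OS-level (dd) and Oracle table-scan throughput, all in MB/s."""
    return {
        "dd_vs_theoretical": dd_measured / theoretical,
        "oracle_vs_dd": oracle_measured / dd_measured,
        "oracle_ok": oracle_measured >= oracle_share * dd_measured,
    }


# Hypothetical capacities in MB/s: I/O channel, I/O controller, disks.
theoretical = min(400, 400, 480)

# Hypothetical measurements in MB/s.
report = calibration_report(theoretical, dd_measured=380, oracle_measured=350)
print(report)
```

If the dd figure itself falls well short of the theoretical limit, the problem sits below the database, which is why the slide recommends fixing the hardware first.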

Page 25: "Oracle10g for Data Warehousing"

Agenda

Oracle10g for data warehousing - short trip back in history
– Continuous innovation over decades

Adoption trends and drivers
– What do we see in the market

Design and build a Data Warehouse
– Ensure a well-balanced system
– Optimize Oracle

Oracle Database 10gR2 – sneak preview

Page 26: "Oracle10g for Data Warehousing"

Schema – which way to go?

Don’t get lost in theory and academia
– Philosophical discussions won’t help (“Star fights 3NF”)
– Neither of the two extremes will work (RedBrick?, Teradata?)

Design according to your business needs
– Reality shows that most customers mix and match
– 3NF more in an ODS layer
– ‘Denormalized’ 3NF in DW/Stage for general purposes
– Dimensional model for subject areas, e.g. sales, marketing (remember shared dimensions!)

A successful database has to support everything

* OLAP will not be covered in this presentation

Page 27: "Oracle10g for Data Warehousing"

Schema – which way to go?

The chosen schema approach determines the Oracle functionality used

The chosen schema approach determines the IO pattern
– Logically random IOs (predominant in star schemas), required for index-driven queries, e.g. index lookups, index-driven joins, index scans
– Logically sequential IOs (predominant in 3NF schemas), required for table scans, e.g. hash joins

Oracle has functionality to do both
– Push the IO to the limit
– Optimize the IO requirements

Page 28: "Oracle10g for Data Warehousing"

Schema – which way to go? Star Schema

Leading performance for dimensional schemas

Innovative usage of bitmap indexes and bitmap join indexes
– Index access instead of large table access
– Bitmap indexes 3 to 20 times smaller than B-tree indexes

Support for complex star schemas
– Multiple fact tables
– Snowflake schemas
– Large number of dimensions

Fully integrated: parallel execution, partition pruning

Page 29: "Oracle10g for Data Warehousing"

I/O – Minimize Requests: Partition Pruning

[Diagram: Sales table partitioned by month, 99-Jan to 99-Jun]

Only the relevant partitions will be accessed

Optimizer knows or finds the relevant partitions
– Static pruning with values known in advance
– Dynamic pruning uses internal recursive SQL to find the relevant partitions

Minimizes I/O operations
– Also provides order-of-magnitude performance gains
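
The idea behind static pruning can be shown with a toy model. The Python sketch below is not how the Oracle optimizer is implemented; it only illustrates that a date predicate known in advance selects a subset of range partitions. The partition names and the query range are hypothetical.

```python
from datetime import date

# Hypothetical monthly range partitions of a SALES table,
# each covering the half-open interval [low, high).
SALES_PARTITIONS = {
    "sales_99_jan": (date(1999, 1, 1), date(1999, 2, 1)),
    "sales_99_feb": (date(1999, 2, 1), date(1999, 3, 1)),
    "sales_99_mar": (date(1999, 3, 1), date(1999, 4, 1)),
    "sales_99_apr": (date(1999, 4, 1), date(1999, 5, 1)),
    "sales_99_may": (date(1999, 5, 1), date(1999, 6, 1)),
    "sales_99_jun": (date(1999, 6, 1), date(1999, 7, 1)),
}


def prune(partitions, start, end):
    """Return names of partitions that overlap the predicate range [start, end)."""
    return [
        name
        for name, (low, high) in partitions.items()
        if low < end and start < high
    ]


# Predicate: sales_date >= 1999-02-01 AND sales_date < 1999-04-01
relevant = prune(SALES_PARTITIONS, date(1999, 2, 1), date(1999, 4, 1))
print(relevant)
```

Only two of the six partitions have to be read for this predicate, which is where the I/O saving comes from.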

Page 30: "Oracle10g for Data Warehousing"

I/O – Minimize Requests: Materialized Views

Query: What were the sales in the West and South regions for the past three quarters?

[Diagram: query rewrite redirects the query from the detail table to the materialized view "Monthly Sales by Region"]

A simple rollup Month -> Quarter provides an unprecedented performance gain with minimal I/O

Page 31: "Oracle10g for Data Warehousing"

Schema – which way to go? 3NF example

[Diagram: CUSTOMER_ORDERS and CUSTOMER_ORDER_PRODUCTS, each partitioned by month (Jan, Feb, Mar, Apr, ...); the Jan partitions are split into hash subpartitions (Jan, Hash 1 to Jan, Hash 4)]

Example of an optimized parallel partition-wise join of a composite partitioned table

Page 32: "Oracle10g for Data Warehousing"

Schema – which way to go? Schema Agnostic - Parallel Execution

Use parallelism to enable single process scalability

Unrestricted parallelism
– No data layout requirement or restriction (as in shared nothing systems)
– All operations can be parallelized

[Diagram: a coordinator dispatches work to query servers; scanners read the data on disk and feed sorters (aggregators), each handling one key range: A-K, L-S, T-Z]

Page 33: "Oracle10g for Data Warehousing"

Schema – which way to go? Schema Agnostic - Parallel Execution

[Diagram: 2 concurrent users at DOP 2, total 200 MB/sec]

I/O bandwidth requirement increases with single process parallelism and multi-user concurrency
– Plan for your system’s expected I/O throughput based on average concurrent users and parallelism

Page 34: "Oracle10g for Data Warehousing"

Schema – which way to go? Schema Agnostic - Parallel Execution

[Diagram: 4 concurrent users at DOP 4, total 400 MB/sec]

I/O bandwidth requirement increases with single process parallelism and multi-user concurrency
– Plan for your system’s expected I/O throughput based on average concurrent users and parallelism

Page 35: "Oracle10g for Data Warehousing"

Schema – which way to go? Schema Agnostic - Parallel Execution

[Diagram: 8 concurrent users at DOP 8, total 800 MB/sec; 16 concurrent users at DOP 8, total 1600 MB/sec]

I/O bandwidth requirement increases with single process parallelism and multi-user concurrency
– Plan for your system’s expected I/O throughput based on average concurrent users and parallelism
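
The totals on these slides (200, 400, 800, and 1600 MB/sec) scale with the number of concurrent parallel queries. The Python sketch below captures that planning arithmetic; the rate of 100 MB/sec per concurrent query is an assumption read off the slide totals, and the function names are hypothetical.

```python
def required_io_mb_per_sec(concurrent_queries, mb_per_sec_per_query=100):
    """I/O bandwidth needed when every concurrent query scans at the given rate."""
    return concurrent_queries * mb_per_sec_per_query


def max_concurrent_queries(system_mb_per_sec, mb_per_sec_per_query=100):
    """Number of concurrent queries the I/O subsystem can feed at full rate."""
    return system_mb_per_sec // mb_per_sec_per_query


for queries in (2, 4, 8, 16):
    print(f"{queries} concurrent queries need {required_io_mb_per_sec(queries)} MB/sec")

# A system sized for 1600 MB/sec of total throughput.
print(max_concurrent_queries(1600))
```

Replace the per-query rate with a value measured on your own workload before using the result for sizing.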

Page 36: "Oracle10g for Data Warehousing"

Schema – which way to go? Oracle's functionality

Star schema
– Range-partition fact tables by time
– Bitmap indexes on dimension-key columns of fact table
– ‘Star transformation’ for end-user queries
– Materialized views for pre-aggregated cubes

3NF or normalized schema
– Composite range-hash partitioning on large tables
– ‘Partition-wise’ joins and parallel execution are key performance enablers for joining large tables

Hybrid environments
– Use both dogmas concurrently in the same system without affecting each other

Choose what fits your needs best! Oracle provides optimizations for any kind of setup

Page 37: "Oracle10g for Data Warehousing"

Init.ora – less is more: Lessons learned from History

Do not de-tune Oracle
– Very often, our performance engineers get improvements just by removing parameters
– Results of de-tuning can be poor optimizer plans, wasted memory, and serialization points

Trust Oracle
– Don’t try to second-guess the software
– With the exception of buffer and subject area related parameters, the system defaults are usually optimal

Page 38: "Oracle10g for Data Warehousing"

Init.ora – less is more: Basic Rules

Ensure that data warehouse relevant parameters are set
– Not all parameters are enabled by default in database releases prior to Oracle10g

Size and set buffer and memory related parameters
– Two parameters are enough

Do not touch other parameters unless necessary

Page 39: "Oracle10g for Data Warehousing"

Init.ora – less is more: Data Warehouse relevant parameters

COMPATIBLE
– Database release version to enable new functionality

OPTIMIZER_FEATURES_ENABLE
– Database release version to enable new functionality

DB_FILE_MULTIBLOCK_READ_COUNT
– Maximize multiblock I/O (use a multiple of the OS I/O size)

DISK_ASYNCH_IO
– Set to TRUE (only relevant for older Linux versions)

PARALLEL_MAX_SERVERS
– Adjust to system capabilities (defaults to 5 prior to Oracle10g)

QUERY_REWRITE_ENABLED
– Set to TRUE, enabled by default with Oracle10g

QUERY_REWRITE_INTEGRITY
– ENFORCED by default, can potentially be lowered

STAR_TRANSFORMATION_ENABLED
– Set to TRUE

Page 40: "Oracle10g for Data Warehousing"

Build the foundation for Success: Summary

Data Warehousing is ‘just a special kind of application’

Ensure a well-tuned I/O subsystem
– Size for I/O throughput, not for disk capacity
– Use appropriate hardware / storage

Find a schema balance
– Design according to your needs using the appropriate model, not the other way around

Init.ora settings: less is more

Page 41: "Oracle10g for Data Warehousing"

Agenda

Oracle10g for data warehousing - short trip back in history
– Continuous innovation over decades

Adoption trends and drivers
– What do we see in the market

Design and build a Data Warehouse
– Ensure a well-balanced system
– Optimize Oracle

Oracle Database 10gR2 – sneak preview

Page 42: "Oracle10g for Data Warehousing"

ETL Enhancements

DML error logging
• Column values that are too large
• Constraint violations (NOT NULL, unique, referential, check constraints)
• Errors raised during trigger execution
• Type conversion errors
• Partition mapping errors

Distributed Change Data Capture
• Enables 9.2 as source for asynchronous CDC

Page 43: "Oracle10g for Data Warehousing"

DML Error Logging (example)

INSERT INTO sales
SELECT product_id, customer_id, TRUNC(sales_date), 3,
       promotion_id, quantity, amount
FROM sales_activity_direct
LOG ERRORS INTO sales_activity_errors ('load_20050801')
REJECT LIMIT UNLIMITED;

Page 44: "Oracle10g for Data Warehousing"

Performance Enhancements

Sort
– ORDER BY statements
– (B-tree) index creation
– Up to 5 times performance improvement

Aggregation
– GROUP BY statements
– Materialized views using aggregations
– Implicit use of aggregations, e.g. statistics gathering
– Two to three times performance improvement

Query rewrite using multiple materialized views

Page 45: "Oracle10g for Data Warehousing"

Partitioning Enhancements

Scalability
– Maximum number of partitions 64K -> 1M
– Resource optimization for DROP TABLE of a partitioned table
– Support for partitioning on index-organized tables
– Support for hash-partitioned global indexes

Performance
– Support for ‘multi-dimensional’ partition pruning

Page 46: "Oracle10g for Data Warehousing"

Other Enhancements

Manageability
– SQL Access Advisor improvements
– Materialized view refresh improvements

Analytics
– SQL model clause enhancements

Page 47: "Oracle10g for Data Warehousing"

Summary

Oracle10g for data warehousing - short trip back in history
– The most powerful and successful DW platform

Adoption trends and drivers
– Be visionary, though conservative
– Guarantee success and protect investments

Design and build a Data Warehouse
– Ensure a well-balanced system
– Optimize Oracle

Oracle Database 10gR2 Beta – Interested?

Page 48: "Oracle10g for Data Warehousing"

Q & A

Questions and Answers