Tuning ETLs for Better BI

Presented by Chuck Ezell, Performance, Tuning & Optimization Services, Datavail

Datavail is the largest provider of remote database administration in the U.S. with nearly 400 DBAs, 24/7 support and onsite/offsite, onshore/offshore delivery.

www.datavail.com

Agenda
• OLTP and OLAP: an approach for tuning
• More than just data: peeling back the layers
• Components & Layers of Common ETLs
• Component Points of Failure
• Source, Transformation & Target Tuning Points
• High-Level Tuning Examples
• Monitoring ETL Activity (tools to make it easy)
• Recap & Questions

OLTP & OLAP: Organization of Data

OLTP (Online Transaction Processing)
• Best for relational database transactions
• Emphasis on fast queries and relational data integrity
• Emphasis on highly normalized data
• Business process data (operational, workflows, etc.)
• Insert, update and delete activity

OLAP (Online Analytical Processing)
• Best for structured, sometimes redundant data
• Emphasis on the ability to aggregate and analyze
• Emphasis on de-normalized data and fewer tables
• Data warehouse (trending, historical, analytical, etc.)
• Writes (loading) and reads (complex selects)

Most ETLs touch both OLTP and OLAP systems, but each is tuned differently.

The Essence of ETL
Extracting data from various sources, performing transformations and loading transformed data ready for reporting.

Extraction, Transform, Load: each stage runs as a workflow, task or procedure.

ETL Stage Components
[Diagram: Source(s) (Data Warehouse, Files, Cloud Data, EBS Data, Flat Files) feed the Transform stage (Temp Tables, Lookup Files, Lookup Tables), which loads the Target(s) (Reporting Data).]

ETL Component Layers
[Diagram: Data; Data Structure; Code Base; Database Setup and Application Setup; Host Server (Disk/CPU/RAM, CPU/RAM); OS Architecture; Storage; Network Speeds.]

High-Level ETL Tuning Helpful Tips

ETL Stage Component Points of Failure
[Diagram: the same Source(s), Transform and Target(s) components, annotated with points of failure: Disk (I/O), Network Latency, Too Much In-Memory, Limited RAM, File System or Cache Fragmentation, IOP & CPU Bottlenecks, Limited Space, Poor Code.]

Source Bottlenecks & Tuning Ideas
(Sources: Data Warehouse, Files, Cloud Data, EBS Data, Flat Files)
• Source is often OLTP structured data (but not always)
• A traditional tuning approach will apply
• Factor in DML causing fragmentation and stats problems
• Find poor plans and tune in traditional fashion

• SQL code (better filtering, use of custom and vendor functions)
• Statistics
• Indexing & table fragmentation
• Conflicting sessions or processes during ETL
• Offload or replicate data for better isolation

Transformation Bottlenecks & Tuning Ideas
(Components: Temp Tables, Lookup Files, Lookup Tables)
• Depending on your ETL, a high percentage of the work could be in-memory
• RAM and temp space are critical (the more the better)
• Filesystem lookups can be slow (lack of indexing)
• Filesystems can become fragmented (depending on the OS)

• SQL code (in-memory merges and joins)
• Statistics on temp tables can hinder performance
• Indexing could slow a process down
• Lack of proper temp space will cause failures (watch logs & ASM)
• Filesystem lookups perform better if they're converted to DB table lookups
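The idea of converting filesystem lookups into database table lookups can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 as a stand-in for the warehouse database, and the file contents, the `lookup_region` table and its `code` and `region` columns are hypothetical names, not part of the original deck.

```python
import csv
import io
import sqlite3

# Hypothetical lookup file contents; in a real ETL this would be a flat file on disk.
LOOKUP_CSV = "code,region\nUS,Americas\nDE,EMEA\nJP,APAC\n"


def load_lookup_table(conn, csv_text):
    """Load a flat lookup file into a keyed table so lookups can use an index."""
    conn.execute(
        "CREATE TABLE lookup_region (code TEXT PRIMARY KEY, region TEXT NOT NULL)"
    )
    rows = [(r["code"], r["region"]) for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO lookup_region (code, region) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def lookup_region(conn, code):
    """Resolve one code through the primary-key index instead of rescanning a file."""
    row = conn.execute(
        "SELECT region FROM lookup_region WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
loaded = load_lookup_table(conn, LOOKUP_CSV)
print(loaded, lookup_region(conn, "DE"), lookup_region(conn, "XX"))
```

The file is read once at load time; every later lookup goes through the table's key, which is the part a plain flat file cannot offer.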

Target Bottlenecks & Tuning Ideas
(Targets: Reporting Data)
• OLAP write speeds and I/O are overlooked
• Indexing and stats can be problematic
• Loading could be single inserts in a loop

• SQL code
• Inserts can benefit from the APPEND or APPEND_VALUES hint
• Inserts and updates could benefit from PARALLEL hinting
• Stats and indexing added after loads and performed in parallel (split out tasks)
• Confirm async I/O settings in OS and DB
• Use bulk loading where possible
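The contrast between "single inserts in a loop" and bulk loading can be sketched as follows. This is a hedged illustration rather than the presenter's code: it uses Python's built-in sqlite3 as a stand-in for the target database, and the `sales_fact` table and its columns are hypothetical. Oracle-specific options such as the APPEND hint are not shown here.

```python
import sqlite3


def make_target(conn):
    """(Re)create the hypothetical target table."""
    conn.execute("DROP TABLE IF EXISTS sales_fact")
    conn.execute("CREATE TABLE sales_fact (id INTEGER, amount REAL)")


def load_row_by_row(conn, rows):
    """Slow pattern: one INSERT statement and one commit per row."""
    for row in rows:
        conn.execute("INSERT INTO sales_fact (id, amount) VALUES (?, ?)", row)
        conn.commit()


def load_bulk(conn, rows):
    """Bulk pattern: hand the driver the whole batch and commit once."""
    conn.executemany("INSERT INTO sales_fact (id, amount) VALUES (?, ?)", rows)
    conn.commit()


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]


rows = [(i, i * 1.5) for i in range(1000)]
conn = sqlite3.connect(":memory:")

make_target(conn)
load_row_by_row(conn, rows)
single_count = count_rows(conn)

make_target(conn)
load_bulk(conn, rows)
bulk_count = count_rows(conn)

print(single_count, bulk_count)
```

Both paths load the same rows; the difference is the number of statement executions and commits, which is where a looped load typically loses time on a real target.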

Common Problems & Fails

What do we want from our ETLs?
Setting goals will affect our approach; however, there are two main goals for any and all ETLs: speed and consistency.

Common Problems Seen
• Doing too much in-memory
• Doing too much from the filesystem
• Not considering network speeds or drive speeds
• Not considering system or session conflicts
• Not taking advantage of ASYNC features
• Not partitioning
• Not providing enough resources to the database
• Not reviewing workflow logs
• Not knowing the business purpose of the data or each task
• Overusing or misusing hints (ORDERED, CARDINALITY, PARALLEL)

Using ORDERED /*+ HINTs */
• ORDERED forces the table join order
• It instructs the optimizer to join tables in the order they appear in the FROM clause
• Use LEADING() instead, but only for investigation

/*+ ORDERED */

/*+ LEADING(FA_BOOK_TYPE_D, FIN_BUSN_LOCATION_D) */

Using CARDINALITY /*+ HINTs */
• CARDINALITY has been deprecated from 10g on
• Use OPT_ESTIMATE() instead
• With either hint, the row count supplied should match the table's actual row count

Wrong:
CARDINALITY(5) or OPT_ESTIMATE(table tabname rows=5)

select count(*) from tabname; Result = 35,754,849

Right:
CARDINALITY(35754849) or OPT_ESTIMATE(table tabname rows=35754849)

Using PARALLEL /*+ HINTs */
Original plan: plan with full table scans

PARALLEL(auto) or PARALLEL(32)

Could cause unpredictable runtimes.

Using PARALLEL /*+ HINTs */: Parallelism Introduced
Time and cost are reduced.

Parallel hinting also consumed CPU and didn't solve the plan problems.

Plan Improvement w/ Indexing
A full table scan, caused by an NVL() function on the filter condition, was producing long operations.

Filtering against almost 1 million rows.

Plan Improvement w/ Indexing
A function-based index immediately improved performance.

The index improved filtering performance by reducing read activity from 947k rows to 253 rows.
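The function-based index idea can be sketched as follows: index the same expression that the filter applies, so the predicate no longer has to be evaluated against every row. This is a minimal sketch under stated assumptions. It uses Python's built-in sqlite3 with an expression index and COALESCE() as a stand-in for Oracle's NVL(); the `orders` table, `status` column and index name are hypothetical, and the plan text SQLite prints will differ from an Oracle plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
# Every 100th row has a NULL status, which the filter maps to 'NONE'.
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, ?)",
    [(i, None if i % 100 == 0 else "OPEN") for i in range(1, 10001)],
)

QUERY = "SELECT COUNT(*) FROM orders WHERE COALESCE(status, 'NONE') = 'NONE'"


def plan(conn):
    """Return the optimizer's plan steps for QUERY as readable text."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


count_before = conn.execute(QUERY).fetchone()[0]
plan_before = plan(conn)

# Index the same expression the filter uses, written exactly as in the query.
conn.execute("CREATE INDEX orders_status_fbi ON orders (COALESCE(status, 'NONE'))")

count_after = conn.execute(QUERY).fetchone()[0]
plan_after = plan(conn)

print(count_before, count_after)
print(plan_before)
print(plan_after)
```

The query text is unchanged and returns the same count; only the access path available to the optimizer changes, which is why comparing the plan before and after is the useful check.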

Plan Improvement w/ Indexing
Parallel hints didn't reduce long ops.

Parallel hinting could further improve performance once the index is in place, but on its own it would only be a band-aid.

Monitoring ETL Activity
Finding the Bottlenecks

Monitor Sessions

Long Operations = Potential Slow Reads (v$session_longops)
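One way to watch long operations is to poll v$session_longops and report progress per session. The sketch below is illustrative only: the query uses columns from that view (SID, OPNAME, TARGET, SOFAR, TOTALWORK, TIME_REMAINING), but the filter, the helper functions and the sample rows are assumptions, and no database connection is made here.

```python
# Query text for Oracle; the column names come from the V$SESSION_LONGOPS view.
LONGOPS_SQL = """
SELECT sid, opname, target, sofar, totalwork, time_remaining
FROM   v$session_longops
WHERE  totalwork > 0
AND    sofar < totalwork
ORDER  BY time_remaining DESC
"""


def percent_done(sofar, totalwork):
    """Share of the operation completed, rounded to one decimal place."""
    if totalwork <= 0:
        return 0.0
    return round(100.0 * sofar / totalwork, 1)


def summarize(rows):
    """Turn (sid, opname, target, sofar, totalwork, time_remaining) rows into report lines."""
    return [
        f"sid={sid} {opname} on {target}: {percent_done(sofar, totalwork)}% done, "
        f"~{remaining}s remaining"
        for sid, opname, target, sofar, totalwork, remaining in rows
    ]


# Hypothetical rows standing in for a cursor's fetchall() result.
sample_rows = [
    (101, "Table Scan", "W_SALES_F", 250000, 1000000, 90),
    (205, "Sort Output", "TEMP", 40, 50, 5),
]
report = summarize(sample_rows)
for line in report:
    print(line)
```

In practice the rows would come from executing `LONGOPS_SQL` through whatever Oracle driver the ETL host already uses; sessions that stay near the top of this list across polls are the candidates for plan review.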

Monitoring Tools: PuTTY & top

Monitoring Tools: DB Time Monitor (dominicgiles.com)

Monitoring Tools: Monitor DB (dominicgiles.com)

Monitor Tasks in DAC
DAC serves the following purposes:
- DAC is a metadata-driven administration and deployment tool
- Manages application configuration
- Manages the execution of warehouse loads
- Provides monitoring capabilities

In Closing
• OLTP and OLAP: an approach for tuning
• More than just data: peeling back the layers
• Components & Layers of Common ETLs
• Component Points of Failure
• Source, Transformation & Target Tuning Points
• High-Level Tuning Examples
• Monitoring ETL Activity (tools to make it easy)
• Recap & Questions

Questions?
Questions can also be sent to [email protected] [email protected]

Monitoring Large Datasets

Long Operations