Tuning ETLs for Better BI
Datavail is the largest provider of remote database administration in the U.S. with nearly 400 DBAs, 24/7 support and onsite/offsite, onshore/offshore delivery.
Presented by Chuck Ezell, Performance, Tuning & Optimization Services, Datavail
www.datavail.com
Agenda
• OLTP and OLAP: an approach for tuning
• More than just data: peeling back the layers
• Components & Layers of Common ETLs
• Component Points of Failure
• Source, Transformation & Target Tuning Points
• High-Level Tuning Examples
• Monitoring ETL Activity (tools to make it easy)
• Recap & Questions
OLTP & OLAP
• OLTP: Online Transaction Processing
  • Best for relational database transactions
  • Emphasis on fast queries and relational data integrity
  • Emphasis on highly normalized data
  • Business process data (operational, workflows, etc.)
  • Insert, update and delete activity
• OLAP: Online Analytical Processing
  • Best for structured, sometimes redundant data
  • Emphasis on the ability to aggregate and analyze
  • Emphasis on de-normalized data and fewer tables
  • Data warehouse (trending, historical, analytical, etc.)
  • Writes (loading) and reads (complex selects)
(Diagram: Organization of Data)
In most ETLs, both OLTP and OLAP systems are present, but each is tuned differently.
The Essence of ETL
Extracting data from various sources, performing transformations, and loading the transformed data so it is ready for reporting.
Extraction, Transform and Load are grouped together as a Workflow / Task / Procedure.
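To make the three stages concrete, here is a minimal sketch in Python. It assumes a small hypothetical CSV extract as the source and an in-memory SQLite database standing in for the reporting target; the table name sales_fact and the column names are illustrative only, not from the presentation.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real ETL might read a flat file, EBS tables or cloud data.
SOURCE_CSV = """order_id,region,amount
1,east,100.50
2,west,
3,east,42.00
"""


def extract(text: str) -> list[dict]:
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cast types, default missing amounts to 0 and upper-case region codes."""
    return [
        (int(r["order_id"]), r["region"].upper(), float(r["amount"]) if r["amount"] else 0.0)
        for r in rows
    ]


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the transformed rows to the reporting table in a single transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")  # stands in for the reporting target
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall())
```

Each stage is a separate function so it can be timed and tuned on its own, which mirrors how the deck treats source, transformation and target tuning separately.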
ETL Stage Components (diagram)
• Source(s): Data Warehouse, Files, Cloud Data, EBS Data, Flat Files
• Transform: Temp Tables, Lookup Files, Lookup Tables
• Target(s): Reporting Data
ETL Component Layers (diagram)
• Data
• Data Structure
• Code Base
• Database Setup | Application Setup
• Host Server: Disk/CPU/RAM | CPU/RAM
• OS Architecture
• Storage
• Network Speeds
ETL Stage Component Points of Failure
(Diagram: Source(s): Data Warehouse, Files, Cloud Data, EBS Data, Flat Files; Transform: Temp Tables, Lookup Files, Lookup Tables; Target(s): Reporting Data)
Points of failure marked on the diagram:
• Disk (I/O)
• Network Latency
• Too Much In-Memory
• Limited RAM
• File System or Cache Fragmentation
• IOP & CPU Bottlenecks
• Limited Space
• Poor Code
Source Bottlenecks & Tuning Ideas
• The source is often OLTP-structured data (but not always), so a traditional tuning approach applies.
• Factor in DML activity that causes fragmentation and statistics problems.
• Find poor execution plans and tune them in the traditional fashion.
• SQL code (better filtering, use of custom and vendor functions)
• Statistics
• Indexing and table fragmentation
• Conflicting sessions or processes during the ETL
• Offloading or replicating data for better isolation
(Diagram: Source(s): Data Warehouse, Files, Cloud Data, EBS Data, Flat Files)
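"Better filtering" at the source usually means pushing the predicate into the extraction query instead of pulling every row and filtering inside the ETL tool. The sketch below assumes a hypothetical orders table with a batch_id column in SQLite (not Oracle); it refreshes statistics with ANALYZE after the load and uses EXPLAIN QUERY PLAN to confirm that the filtered extract can use the index.

```python
import sqlite3

# Hypothetical OLTP-style source; batch_id marks the change batch that wrote each row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, batch_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (batch_id, amount) VALUES (?, ?)",
    [(i // 10, float(i)) for i in range(10_000)],  # 1,000 batches of 10 rows each
)
conn.commit()
conn.execute("CREATE INDEX orders_batch_idx ON orders (batch_id)")
conn.execute("ANALYZE")  # refresh optimizer statistics after heavy DML

# Anti-pattern: pull every row out of the source, then filter inside the ETL tool.
all_rows = conn.execute("SELECT id, batch_id, amount FROM orders").fetchall()
filtered_in_etl = [r for r in all_rows if r[1] == 999]

# Better: push the filter into the extraction query so the source can use its index.
extract_sql = "SELECT id, batch_id, amount FROM orders WHERE batch_id = ?"
filtered_at_source = conn.execute(extract_sql, (999,)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable plan step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + extract_sql, (999,))]
print(len(all_rows), len(filtered_at_source), plan)
```

Both approaches return the same 10 rows, but only the second one lets the database do the filtering, so far less data crosses the network into the transformation stage.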
Transformation Bottlenecks & Tuning Ideas
• Depending on your ETL, a high percentage of the work could be done in memory.
• RAM and temp space are critical (the more, the better).
• Filesystem lookups can be slow (lack of indexing).
• Filesystems can become fragmented (depending on the OS).
• SQL code (in-memory merges and joins)
• Statistics on temp tables can hinder performance.
• Indexing could slow a process down.
• Lack of proper temp space will cause failures (watch the logs and ASM).
• Filesystem lookups perform better if they’re converted to database table lookups.
(Diagram: Temp Tables, Lookup Files, Lookup Tables)
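The last tuning idea, converting a filesystem lookup into a database table lookup, can be sketched as follows. This assumes a hypothetical currency lookup file and staging table, with SQLite standing in for the ETL database: the file is loaded once into a table keyed (and therefore indexed) on the lookup code, and the lookup becomes one set-based join.

```python
import csv
import io
import sqlite3

# Hypothetical lookup file (currency code -> name) that the ETL would otherwise
# re-read or scan for every incoming row.
LOOKUP_FILE = """code,name
USD,US Dollar
EUR,Euro
GBP,Pound Sterling
"""

conn = sqlite3.connect(":memory:")
# Convert the file lookup into a table; the PRIMARY KEY gives an index on the lookup key.
conn.execute("CREATE TABLE currency_lkp (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO currency_lkp (code, name) VALUES (:code, :name)",
    csv.DictReader(io.StringIO(LOOKUP_FILE)),
)

# Hypothetical staging data that needs the lookup applied during the transform.
conn.execute("CREATE TABLE stg_payments (id INTEGER PRIMARY KEY, code TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO stg_payments (code, amount) VALUES (?, ?)",
    [("USD", 10.0), ("EUR", 20.0), ("JPY", 30.0)],
)

# One set-based join replaces a per-row file scan; the LEFT JOIN keeps
# unmatched codes visible instead of silently dropping them.
rows = conn.execute(
    """
    SELECT p.id, p.code, COALESCE(l.name, 'UNKNOWN') AS currency_name
    FROM stg_payments AS p
    LEFT JOIN currency_lkp AS l ON l.code = p.code
    ORDER BY p.id
    """
).fetchall()
print(rows)
```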
Target Bottlenecks & Tuning Ideas
• OLAP write speeds and I/O are often overlooked.
• Indexing and statistics can be problematic.
• Loading could be done with single inserts in a loop.
• SQL code
• Inserts can benefit from the APPEND or APPEND_VALUES hint.
• Inserts and updates could benefit from PARALLEL hinting.
• Gather statistics and build indexes after loads, and run them in parallel (split out tasks).
• Confirm async I/O settings in the OS and the database.
• Use bulk loading where possible.
(Diagram: Target(s): Reporting Data)
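Two of these ideas, bulk loading instead of single inserts in a loop, and building indexes and statistics only after the load, can be sketched with Python's sqlite3 module. SQLite has no APPEND or PARALLEL hints, so this hypothetical example covers only those two points; the table names and row counts are illustrative assumptions.

```python
import sqlite3
import time

# Hypothetical fact rows to load.
ROWS = [(i, f"cust-{i % 500}", i * 1.5) for i in range(20_000)]


def load_row_by_row(conn: sqlite3.Connection) -> None:
    """Anti-pattern: one INSERT and one COMMIT per row, with the index already in place."""
    conn.execute("CREATE TABLE fact_row (id INTEGER, customer TEXT, amount REAL)")
    conn.execute("CREATE INDEX fact_row_cust_idx ON fact_row (customer)")
    for row in ROWS:
        conn.execute("INSERT INTO fact_row VALUES (?, ?, ?)", row)
        conn.commit()


def load_bulk(conn: sqlite3.Connection) -> None:
    """Bulk-load an unindexed table in one transaction, then build the index and statistics."""
    conn.execute("CREATE TABLE fact_bulk (id INTEGER, customer TEXT, amount REAL)")
    with conn:
        conn.executemany("INSERT INTO fact_bulk VALUES (?, ?, ?)", ROWS)
    conn.execute("CREATE INDEX fact_bulk_cust_idx ON fact_bulk (customer)")
    conn.execute("ANALYZE fact_bulk")
    conn.commit()


conn = sqlite3.connect(":memory:")
for loader in (load_row_by_row, load_bulk):
    start = time.perf_counter()
    loader(conn)
    print(f"{loader.__name__}: {time.perf_counter() - start:.3f}s")
```

Timings vary by machine, and an in-memory database hides most of the per-commit disk cost, so the gap on a real, disk-backed target is typically larger than what this prints.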
What do we want from our ETLs?
Setting goals will shape our approach; however, there are two main goals for any and all ETLs:
Speed & Consistency
Common Problems Seen
• Doing too much in memory
• Doing too much from the filesystem
• Not considering network speeds or drive speeds
• Not considering system or session conflicts
• Not taking advantage of async features
• Not partitioning
• Not providing enough resources to the database
• Not reviewing workflow logs
• Not knowing the business purpose of the data or of each task
• Using hints too much or wrongly (ORDERED, CARDINALITY, PARALLEL)
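Reviewing workflow logs against the two goals, speed and consistency, can be partly automated. The sketch below is not DAC functionality: it assumes task runtimes have already been pulled from the logs into a hypothetical dictionary, and the 0.25 coefficient-of-variation and 1.5x slowdown thresholds are arbitrary illustrative choices.

```python
from statistics import mean, median, pstdev

# Hypothetical run history (seconds) pulled from workflow logs, keyed by task name.
RUN_HISTORY = {
    "extract_orders": [310, 305, 298, 320, 312],
    "transform_sales": [600, 610, 1450, 590, 1720],
    "load_sales_fact": [200, 210, 205, 640, 198],
}


def review(history: dict[str, list[float]], cv_limit: float = 0.25,
           slowdown_limit: float = 1.5) -> dict[str, list[str]]:
    """Flag tasks whose runtimes are inconsistent or whose latest run is unusually slow."""
    findings = {}
    for task, runs in history.items():
        cv = pstdev(runs) / mean(runs)  # coefficient of variation: spread relative to the mean
        latest_ratio = runs[-1] / median(runs)  # latest run compared with the typical run
        issues = []
        if cv > cv_limit:
            issues.append(f"inconsistent (CV={cv:.2f})")
        if latest_ratio > slowdown_limit:
            issues.append(f"latest run {latest_ratio:.1f}x median")
        findings[task] = issues
    return findings


for task, issues in review(RUN_HISTORY).items():
    print(task, issues or "ok")
```

Using the median as the baseline keeps one bad run from hiding the next one, while the coefficient of variation catches tasks that are fast on average but unpredictable.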
Using ORDERED /*+ HINTs */
• ORDERED forces the table join order.
• It instructs the optimizer to join tables in the order they appear in the FROM clause.
• Use LEADING() instead, but only for investigation.
Instead of: /*+ ORDERED */
Use: /*+ LEADING(FA_BOOK_TYPE_D, FIN_BUSN_LOCATION_D) */
Using CARDINALITY /*+ HINTs */
• CARDINALITY has been deprecated since 10g.
• Use OPT_ESTIMATE() instead.
Wrong: CARDINALITY(5) or OPT_ESTIMATE(table tabname rows=5)
select count(*) from tabname; Result=35,754,849
Right: CARDINALITY(35754849) or OPT_ESTIMATE(table tabname rows=35754849)
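The point of the Wrong/Right comparison is that a hinted row estimate should come from a measured count, not a guess. This small Python sketch (the helper names are hypothetical) renders the hint text in the form shown on the slide and reports how far a hinted estimate is from the measured count of 35,754,849. It only builds strings and does not run anything against Oracle.

```python
def estimate_ratio(hinted_rows: int, actual_rows: int) -> float:
    """How far a hinted row estimate is from the measured count (1.0 means exact)."""
    return max(hinted_rows, actual_rows) / max(min(hinted_rows, actual_rows), 1)


def opt_estimate_hint(table: str, rows: int) -> str:
    """Render the hint text in the form shown on the slide."""
    return f"/*+ OPT_ESTIMATE(table {table} rows={rows}) */"


actual = 35_754_849  # result of: select count(*) from tabname;
for hinted in (5, actual):
    print(opt_estimate_hint("tabname", hinted),
          f"off by {estimate_ratio(hinted, actual):,.0f}x")
```

An estimate of 5 rows is off by roughly seven million times, which is exactly the kind of error that steers the optimizer toward the wrong join method.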
Using PARALLEL /*+ HINTs */
Original plan: a plan with full table scans
Adding PARALLEL(auto) or PARALLEL(32) could cause unpredictable runtimes.
Using PARALLEL /*+ HINTs */: Parallelism Introduced
Time and cost are reduced.
However, parallel hinting also consumed CPU and didn’t solve the underlying plan problems.
Plan Improvement w/ Indexing
An NVL() function on the filter condition causes a full table scan, resulting in long operations.
The filter runs against almost 1 million rows.
Plan Improvement w/ Indexing
A function-based index immediately improved performance.
The index improved filtering performance by reducing read activity from 947k rows to 253 rows.
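The same effect can be reproduced on a small scale outside Oracle. In this sketch SQLite stands in for Oracle: IFNULL() plays the role of NVL(), and SQLite's index on an expression plays the role of a function-based index. The gl_lines table and its data are hypothetical; the plan check simply shows the query moving from a full scan to an index search once the indexed expression matches the filter exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: most rows have status 'C' or NULL; only a few are 'P'.
conn.execute("CREATE TABLE gl_lines (id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO gl_lines (status, amount) VALUES (?, ?)",
    [("P" if i % 1000 == 0 else (None if i % 2 else "C"), float(i)) for i in range(30_000)],
)
conn.commit()

# IFNULL() is SQLite's counterpart to NVL(); wrapping the column hides it from a plain index.
query = "SELECT id, amount FROM gl_lines WHERE IFNULL(status, 'N') = 'P'"


def plan(sql: str) -> list[str]:
    """Return the detail column of EXPLAIN QUERY PLAN for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(query)  # full table scan
# Index on exactly the expression used in the filter (SQLite's "index on an expression").
conn.execute("CREATE INDEX gl_lines_status_fbi ON gl_lines (IFNULL(status, 'N'))")
after = plan(query)  # search via the expression index
print(before, after, len(conn.execute(query).fetchall()))
```

The indexed expression must be written the same way as in the WHERE clause; otherwise the planner cannot match it and falls back to the scan.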
Plan Improvement w/ Indexing
Parallel hints didn’t reduce the long operations.
Parallel hinting could improve the performance of the indexing further, but on its own it would only be a band-aid.
Monitor Tasks in DAC (Data Warehouse Administration Console)
DAC serves the following purposes:
- It is a metadata-driven administration and deployment tool
- Manages application configuration
- Manages the execution of warehouse loads
- Provides monitoring capabilities
In Closing
• OLTP and OLAP: an approach for tuning
• More than just data: peeling back the layers
• Components & Layers of Common ETLs
• Component Points of Failure
• Source, Transformation & Target Tuning Points
• High-Level Tuning Examples
• Monitoring ETL Activity (tools to make it easy)
• Recap & Questions
Questions?
Questions can also be sent to:
[email protected] [email protected]