CS 8630 Database Administration, Dr. Guimaraes

10-05-2009, Physical Design and Performance

Class Will Start Momentarily…

CS8630 Database Administration
Dr. Mario Guimaraes

Overview

• Introduction: input to Physical Design, Decisions
• Create Index
• Rewrite SQL / Query Optimizer (Leccotech)
• Denormalization, Materialized Views
• Partition Database
• Redundant Arrays of Inexpensive Disks (RAID)
• Redefine Main memory structures (SGA in Oracle)
• Change default Block Size at installation
• Export/Import (drop indexes): defragment
• Check Locks
• Separate data by category in proper tablespaces
• Redefining Client-Server Architecture

Where should a DBA start when trying to optimize? Why?

a) DB, b) OS, c) DB Application, d) Other

DB Design Phases

• Conceptual Design
• Logical Design
• Physical Design

Introduction - Inputs to Physical Design

• Normalized relations.
• Volume estimates.
• Attribute definitions.
• Data usage: entered, retrieved, deleted, updated.
• Response time requirements.
• Requirements for security, backup, recovery, retention, integrity.
• DBMS characteristics.
• system

Physical Design Decisions

• Specifying attribute data types.
• Modifying the logical design.
• Specifying the file organization (sometimes).
• Choosing indexes.

Designing Fields

• Choosing PK.
• Choosing data type.
• Coding, compression, encryption.
• Controlling data integrity.
  – Default value.
  – Range control.
  – Null value control.
  – Referential integrity.

Selection of a Primary Key

• Consider a shorter field or selecting another candidate key to substitute for a long, multi-field primary key (and all associated foreign keys).
  – System-generated non-information-carrying key
  – Versus
  – Primary key like Phone number

Example of Data Dictionary

Attribute  Table    Null?  Unique?  Pkey?  Fkey?  Ref table  Domain
CID        College  N      Y        Y      N      NA         4 digit integer greater than 1000
Office     College  Y      N        N      N      NA         character string length 10
DID        Dept     N      Y        Y      N      NA         4 digit integer greater than 1000
Location   Dept     Y      N        N      N      NA         character string length 65
CID        Dept     Y      N        N      Y      College    4 digit integer greater than 1000
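
One possible way to turn the data dictionary above into table definitions is sketched below. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the DBMS, reads "4 digit integer greater than 1000" as the range 1001 to 9999, and the helper names make_db and try_insert are hypothetical. SQLite does not enforce VARCHAR lengths, so the string-length domains are documented in the DDL but not checked.

```python
import sqlite3

# Hypothetical DDL derived from the data dictionary rows for College and Dept.
DDL = """
CREATE TABLE College (
    CID    INTEGER PRIMARY KEY CHECK (CID > 1000 AND CID <= 9999),
    Office VARCHAR(10)                      -- nullable, length not enforced by SQLite
);
CREATE TABLE Dept (
    DID      INTEGER PRIMARY KEY CHECK (DID > 1000 AND DID <= 9999),
    Location VARCHAR(65),                   -- nullable
    CID      INTEGER REFERENCES College(CID)
             CHECK (CID > 1000 AND CID <= 9999)   -- nullable foreign key
);
"""

def make_db():
    """Create an in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn

def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this sketch, a College row with CID 999 is rejected by the range check, and a Dept row that points at a non-existent College is rejected by the foreign key, while a Dept row with a NULL CID is accepted.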

Example code-look-up table

Composite usage map

Designing Fields

• Handling missing data.
  – Substitute an estimate of the missing value.
  – Assign default value.
  – Trigger a report listing missing values.
  – In programs, ignore missing data unless the value is significant.

• END OF INTRODUCTION TO PHYSICAL DESIGN

• START OF PERFORMANCE (INDEXES, QUERY OPTIMIZATION).

INDEXES

• What is an INDEX?
• Why do we CREATE an INDEX?

A) To speed up queries? B) To speed up data entry (insert/update/delete)? C) Both?

Rules for Using Indexes

1. Use on larger tables.
2. Index the primary key of each table.
3. Index search fields.
4. Fields in WHERE clause of SQL commands.
5. Cardinality is high. For example, not on SEX, where cardinality is 2. Typically: when there are >100 different values, but not when there are <10 values.

Rules for Using Indexes

6. DBMS may have limit on number of indexes per table and number of bytes per indexed field(s).

7. Null values may not be referenced from an index (in Oracle, a row whose indexed columns are all NULL has no entry in a B-tree index).

8. Use indexes heavily for non-volatile databases (data warehouses); limit the use of indexes for volatile databases.

Different Type of Indexes

Typical Indexes
• B-Trees (traditional) Indexes
• Hash-cluster
• Bitmap Indexes
• Index-Organized Tables
• Reverse-Key Indexes
--------------------------------------
• When we issue the command:

Create index cidx on orders (cid);

What type of index do we create?

• General Format: Create index <iName> on <tname> (<col_name>);
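
The effect of the command above on the query plan can be tried out with a small experiment. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module rather than Oracle, the orders table is filled with made-up rows, and the helper plan is hypothetical. SQLite's EXPLAIN QUERY PLAN plays the role that a plan display plays in other DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (oid INTEGER PRIMARY KEY, cid INTEGER, amount REAL)")
# Made-up data: 5000 orders spread over 500 customer ids.
conn.executemany("INSERT INTO orders (cid, amount) VALUES (?, ?)",
                 [(i % 500, float(i)) for i in range(5000)])

def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM orders WHERE cid = 42"
before = plan(query)                                # no index on cid yet
conn.execute("CREATE INDEX cidx ON orders (cid)")   # same form as the slide's command
after = plan(query)                                 # the plan can now use cidx
```

Printing before and after shows the plan changing from a scan of the whole table to a search that names cidx. The exact wording of the plan text varies between SQLite versions.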

Indexes (Defaults)

• Anytime a PK is created, an index is automatically created.

• Anytime the type of index is not specified, the type of index created is a B-Tree.

B-Tree (Balanced Tree)

• Most popular type of index structure for any programming language or database.

• When you don’t know what to do, the best option is usually a B-Tree. They are flexible and perform well (though not exceptionally well) in several scenarios.

• In practice, the structure used is really the B+ tree or B* tree.

B-Trees (continued)

• One node corresponds to one block/page (minimum disk I/O).
• Non-leaf nodes (n keys, n+1 pointers).
• Leaf nodes (contain n entries, where each entry has an index key and a pointer to a data block). Also, each leaf node has a pointer to the next leaf node.
• All leaves are at the same height.
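
Because one node is one block, the height of the tree approximates the number of block reads per lookup. The following back-of-the-envelope sketch estimates that height. It assumes every node is completely full (n entries per leaf, n+1 children per non-leaf node), which real indexes rarely achieve, and the function name btree_height is hypothetical.

```python
import math

def btree_height(num_keys, keys_per_node):
    """Number of levels needed to index num_keys entries when every node
    is full: leaves hold keys_per_node entries and each non-leaf node
    has keys_per_node + 1 children (a simplifying assumption)."""
    fanout = keys_per_node + 1
    leaves = math.ceil(num_keys / keys_per_node)
    height = 1
    nodes_at_level = 1          # capacity of the lowest level reached so far
    while nodes_at_level < leaves:
        nodes_at_level *= fanout
        height += 1
    return height
```

Under these assumptions, a million keys with 100 keys per node need only three levels, which is why a lookup through the index costs a handful of block reads even on a large table.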

Good Indexing (B-Tree) Candidates

• Table must be reasonably large.
• Field is frequently queried.
• Field has a high cardinality (don’t index by sex, where the cardinality is 2!!).
• Indexes that have become unevenly or sparsely populated after many changes (loosely, "badly balanced" trees) may inhibit performance. Destroying and re-creating the index may improve performance.

Bitmap Index

• Bitmap indexes contain the key value and a bitmap listing the value of 0 or 1 (yes/no) for each row indicating whether the row contains that value or not.

• May be a good option for indexing fields that have low cardinality (opposite of B-trees).
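
The structure described above can be modelled in a few lines. This is a toy sketch, not how any DBMS stores bitmaps: each bitmap is held in a Python integer, the sample columns sex and region are made up, and the function names are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int used as a bit string:
    bit i is 1 when row i holds that value."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def rows_of(bitmap):
    """Row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two low-cardinality columns of a five-row table (made-up data).
sex = ["F", "M", "F", "F", "M"]
region = ["N", "S", "S", "N", "N"]
sex_idx = build_bitmap_index(sex)
region_idx = build_bitmap_index(region)

# WHERE sex = 'F' AND region = 'N' becomes a bitwise AND of two bitmaps.
matches = rows_of(sex_idx["F"] & region_idx["N"])
```

The sketch also hints at why low cardinality suits bitmaps: there is one bitmap per distinct value, so a column with two values needs two bitmaps, while a column with thousands of values would need thousands.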

Bitmap Index (cont.)

• Syntax: Create Bitmap index ….
• Bitmap indexes work better with equality tests (= or IN) than with < or >.
• Bitmap index maintenance can be expensive; an individual bit cannot be locked, so a single update locks a large portion of the index.
• Bitmap indexes are best in read-only data warehouse situations.

Hash Indexing

• B-tree and bitmap indexes use the key to find rows, which requires I/O to process the index itself.
• Hashing locates rows with a key-based algorithm.
• Rows are stored based on a hashed value.
• Index size should be known at index creation.
• Example:
  – create index cidx on orders (cid) hashed;

Hash Index work best with

• Very-high cardinality columns
• Only equal (=) tests are used
• Index values do not change
• Number of rows is known ahead of time
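
The conditions above follow from how hashing places keys. The sketch below is a simplified in-memory model, not a DBMS hash cluster: the bucket count is fixed when the index is built, the sample cid values are made up, and all function names are hypothetical.

```python
def build_hash_index(keys, num_buckets):
    """Place each (key, row number) pair in the bucket chosen by hashing
    the key. num_buckets is fixed up front, which is why the size should
    be known when the index is created."""
    buckets = [[] for _ in range(num_buckets)]
    for row, key in enumerate(keys):
        buckets[hash(key) % num_buckets].append((key, row))
    return buckets

def equal_lookup(buckets, key):
    """Equality test: only one bucket has to be inspected."""
    return [row for k, row in buckets[hash(key) % len(buckets)] if k == key]

def range_lookup(buckets, low, high):
    """Range test: hashing scatters neighbouring keys, so every bucket
    must be scanned and the index gives no shortcut."""
    return sorted(row for bucket in buckets for k, row in bucket
                  if low <= k <= high)

cids = [107, 42, 815, 42, 300]          # made-up key values, one per row
index = build_hash_index(cids, num_buckets=8)
```

An equality lookup touches a single bucket, whereas the range lookup has to visit all of them, which matches the advice to use hashing only when equality tests dominate.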

Index-Organized Tables

• Table data is incorporated into the B-Tree using the PK as the index.

• Table data is always in order of PK. Many sorts can be avoided.

• Especially useful for “lookup” type tables.
• Index works best when there are few (and small) columns in your table other than the PK.

Reverse Key Indexes

• Key ‘1234’ becomes ‘4321’, etc.
• Only efficient for a few scenarios involving parallel processing and a huge amount of data.
• By reversing key values, index blocks might be more evenly distributed, reducing the likelihood of densely or sparsely populated indexes.
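
A toy model can show why reversing helps with sequentially generated keys. The placement rule below is purely an assumption made for illustration (the leading character of the stored key picks a leaf block); a real index orders keys across blocks in a more involved way, and the function names are hypothetical.

```python
def reverse_key(key):
    """Reverse the digits of a key, as in the '1234' -> '4321' example."""
    return str(key)[::-1]

def leaf_block(key_text, num_blocks=10):
    """Toy placement rule (assumption): the leading digit of the stored
    key decides which of num_blocks leaf blocks receives the entry."""
    return int(key_text[0]) % num_blocks

sequential = range(1000, 1020)          # e.g. keys produced by a sequence
plain_blocks = {leaf_block(str(k)) for k in sequential}
reversed_blocks = {leaf_block(reverse_key(k)) for k in sequential}
```

In this model the twenty consecutive keys all land in one block when stored as-is, but spread over all ten blocks once reversed. The price is that neighbouring key values are no longer stored together, so range scans on the original values lose their benefit.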

Conclusions on Indexes

• For high-cardinality key values, B-Tree indexes are usually best.

• B-Trees work with all types of comparisons and gracefully shrink and grow as table changes.

• For low cardinality read-only environments, Bitmaps may be a good option.

Denormalization

• Normally, we want to design our tables up to 3NF or BCNF (at least).
• When do we want to violate 3NF / BCNF?
• When do we want to store Derived Data?
  – A) Read Only Databases?
  – B) Updateable Databases?

Rules for Adding Derived Columns

• Use when aggregate values are regularly retrieved.

• Use when aggregate values are costly to calculate.

• Permit updating only of source data.
• Create triggers to cascade changes from source data.
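
The last two rules can be illustrated with a stored total that triggers keep in step with its source rows. This is a minimal sketch using SQLite through Python's sqlite3 module; the tables orders and order_line, the derived column total, and the trigger names are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (oid INTEGER PRIMARY KEY, total REAL NOT NULL DEFAULT 0);
CREATE TABLE order_line (
    oid    INTEGER REFERENCES orders(oid),
    amount REAL NOT NULL
);
-- Cascade every change in the source data into the derived column orders.total.
CREATE TRIGGER line_ins AFTER INSERT ON order_line BEGIN
    UPDATE orders SET total = total + NEW.amount WHERE oid = NEW.oid;
END;
CREATE TRIGGER line_del AFTER DELETE ON order_line BEGIN
    UPDATE orders SET total = total - OLD.amount WHERE oid = OLD.oid;
END;
CREATE TRIGGER line_upd AFTER UPDATE OF amount ON order_line BEGIN
    UPDATE orders SET total = total - OLD.amount + NEW.amount WHERE oid = NEW.oid;
END;
""")

def order_total(oid):
    """Read the stored (derived) total instead of recomputing SUM(amount)."""
    return conn.execute("SELECT total FROM orders WHERE oid = ?", (oid,)).fetchone()[0]

# Applications only touch the source table; the triggers maintain the total.
conn.execute("INSERT INTO orders (oid) VALUES (1)")
conn.executemany("INSERT INTO order_line VALUES (1, ?)", [(10.0,), (5.0,), (2.5,)])
conn.execute("UPDATE order_line SET amount = 20.0 WHERE amount = 10.0")
conn.execute("DELETE FROM order_line WHERE amount = 5.0")
```

Reads of the total become a single-row lookup, while every insert, update, and delete on the source table pays for an extra update. That trade-off is why derived columns suit data that is read far more often than it is changed.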

Rules for Storing Repeating Groups

• Consider storing repeating groups across columns rather than down rows when:
  – The repeating group has a fixed number of occurrences, each of which has a different meaning, or
  – The entire repeating group is normally accessed and updated as one unit.

Rules for Storing Repeating Groups Across Columns

EMPLOYEE Phone

Design Option:
EMPLOYEE(EmpID, EmpName, …)
EMP_PHONE(EmpID, Phone)

Another Design Option:
EMPLOYEE(EmpID, EmpName, Phone1, Phone2, …)

Denormalization

• One-to-one relationship. Student 1,1 Submits 0,1 Application
• STUDENT and APPLICATION become a single relation STUDENT instead of 2
• Many-to-many relationship. Vendor 1,N PriceQuote 1,N Item
• Physical design may suggest collapsing ITEM and PRICE_QUOTE into a single relation ITEM_QUOTE

A possible denormalization situation:

One-to-many relationship

Partitioning

• Horizontal Partitioning: Distributing the rows of a table into several separate files/locations.

• Vertical Partitioning: Distributing the columns of a table into several separate files/locations.
  – The primary key must be repeated in each file.
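
The two kinds of partitioning can be sketched on an in-memory table. This is only a conceptual model in plain Python: the sample employee rows, the choice of region as the partitioning column, and the function names are all assumptions made for the illustration.

```python
rows = [  # made-up table contents
    {"emp_id": 1, "name": "Ann", "region": "East", "salary": 50000},
    {"emp_id": 2, "name": "Bob", "region": "West", "salary": 42000},
    {"emp_id": 3, "name": "Cy",  "region": "East", "salary": 61000},
]

def horizontal_partition(rows, key):
    """Distribute whole rows into groups by the value of one column."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts

def vertical_partition(rows, pk, column_groups):
    """Distribute columns into groups; the primary key is repeated in each."""
    return [[{c: row[c] for c in [pk, *cols]} for row in rows]
            for cols in column_groups]

def rejoin(parts, pk):
    """Rebuild full rows from vertical partitions by matching primary keys."""
    merged = {}
    for part in parts:
        for row in part:
            merged.setdefault(row[pk], {}).update(row)
    return list(merged.values())

by_region = horizontal_partition(rows, "region")
vparts = vertical_partition(rows, "emp_id", [["name"], ["region", "salary"]])
```

The rejoin helper shows why the primary key has to be repeated: it is the only way to put the column groups of one row back together, and that extra join is the cost paid by queries that span partitions.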

Partitioning

• Advantages of Partitioning:
  – Records used together are grouped together.
  – Each partition can be optimized for performance.
  – Security, recovery.
  – Partitions stored on different disks: less contention.
  – Take advantage of parallel processing capability.

• Disadvantages of Partitioning:
  – Slow retrievals across partitions.
  – Complexity.

RAID with four disks and striping

RAID: Redundant Arrays of Inexpensive Disks

Intro. To Query Processing

• In network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language.

• Programmer’s responsibility to select most appropriate execution strategy.

• With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved.

• Relieves user of knowing what constitutes good execution strategy

• Gives DBMS more control over system performance.

• Disk access tends to be dominant cost in query processing for centralized DBMS.

• Two main techniques for query optimization:
  – heuristic rules that order operations in a query;
  – comparing different strategies based on relative costs, and selecting one that minimizes resource usage.

Goals

• Aims of query processing (QP):
  – transform query written in high-level language (e.g. SQL) into correct and efficient execution strategy expressed in low-level language (implementing relational algebra, RA);
  – execute strategy to retrieve required data.

• As there are many equivalent transformations of same high-level query, aim of query optimization (QO) is to choose one that minimizes resource usage.

• Generally, reduce total execution time of query.

• Problem computationally intractable with large number of relations, so strategy adopted is reduced to finding near optimum solution.

3 alternatives

Find all Managers who work at a London branch.

SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
      (s.position = 'Manager' AND b.city = 'London');

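• Three equivalent RA queries are:

(1) $\sigma_{(\text{position='Manager'}) \land (\text{city='London'}) \land (\text{Staff.branchNo=Branch.branchNo})}(\text{Staff} \times \text{Branch})$

(2) $\sigma_{(\text{position='Manager'}) \land (\text{city='London'})}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch})$

(3) $(\sigma_{\text{position='Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\sigma_{\text{city='London'}}(\text{Branch}))$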

Comparing costs

• Assume:
  – 1000 tuples in Staff; 50 tuples in Branch;
  – 50 Managers; 5 London branches;
  – no indexes or sort keys;
  – results of any intermediate operations stored on disk;
  – cost of the final write is ignored;
  – tuples are accessed one at a time.

• Costs (in disk accesses) are:

(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160

• Cartesian product and join operations are much more expensive than selection, and the third option significantly reduces the size of the relations being joined together.
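
The three cost figures can be reproduced term by term. The sketch below recomputes them from the stated assumptions; the breakdown of each formula into reads and writes in the comments is one reading of the slide's arithmetic, and the constant and function names are hypothetical.

```python
STAFF, BRANCH = 1000, 50        # tuples in each relation
MANAGERS, LONDON = 50, 5        # tuples that survive each selection

def cost_strategy_1():
    # Read both relations, write the Cartesian product (1000 * 50 tuples),
    # then read it back to apply the selection.
    return (STAFF + BRANCH) + 2 * (STAFF * BRANCH)

def cost_strategy_2():
    # Read both relations to join them (1000 + 50); the join result has one
    # tuple per staff member, which is written and then read back (2 * 1000).
    return 2 * STAFF + (STAFF + BRANCH)

def cost_strategy_3():
    # Read Staff and write the 50 managers, read Branch and write the 5
    # London branches, then read both reduced relations to join them.
    return STAFF + MANAGERS + BRANCH + LONDON + (MANAGERS + LONDON)

costs = {1: cost_strategy_1(), 2: cost_strategy_2(), 3: cost_strategy_3()}
```

Dividing the first cost by the third gives a factor of roughly 87, which is the point of the slide: applying selections before the join shrinks the inputs to the most expensive operation.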

Phases of Query Processing

• QP has four main phases: decomposition (parsing and validation), optimization, code generation, and execution.

Dynamic versus Static Optimization

• First three phases of QP can be carried out:
  – dynamically every time query is run;
  – statically when query is first submitted.
  – Similar to compiled vs. interpreted languages.

• Advantages of dynamic QO arise from fact that information is up to date.

• Disadvantages are that performance of query is affected, time may limit finding optimum strategy.

• Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.

• Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.

• Could use a hybrid approach to overcome this.

Query Optimizer - Plan

• DBMSs allow you to view the query plan

• In Oracle, you must use either SET AUTOTRACE ON or EXPLAIN PLAN. SET AUTOTRACE ON is much simpler. EXPLAIN PLAN is a little more efficient, but more complicated.

Oracle operations (results of autotrace)

• TABLE ACCESS FULL
• TABLE ACCESS BY ROWID
• INDEX RANGE SCAN
• INDEX UNIQUE SCAN
• NESTED LOOPS

• TABLE ACCESS FULL (full table scan):

Oracle will look at every row in the table to find the requested information. This is usually the slowest way to access a table.

TABLE ACCESS BY ROWID

Oracle will use the ROWID method to find a row in the table. ROWID is a special column detailing the exact Oracle block where the row can be found. This is the fastest way to access a table (faster than any index, but less flexible than any index).

INDEX RANGE SCAN

Oracle will search an index for a range of values. Usually, this event occurs when a range or BETWEEN operation is specified by the query, or when only the leading columns in a composite index are specified by the WHERE clause. Can perform well or poorly, based on the size of the range and the fragmentation of the index.

INDEX UNIQUE SCAN

Oracle will perform this operation when the table’s primary key or a unique key is part of the WHERE clause. This is the most efficient way to search an index.

NESTED LOOPS

Indicates that a join operation is occurring. Can perform well or poorly, depending on the performance of the index and table operations on the individual tables being joined.

Tuning SQL and PL/SQL Queries

Sometimes the same query can be written in more than 1000 ways, generating more than 100 execution plans.

Some firms have products that automatically rewrite correctly written SQL queries.

ROWID

SELECT ROWID, …
INTO :EMP_ROWID, …
FROM EMP
WHERE EMP.EMP_NO = 56722
FOR UPDATE;

UPDATE EMP SET EMP.NAME = …
WHERE ROWID = :EMP_ROWID;

ROWID (cont.)

• Fastest
• Less flexible
• Very useful for removing duplicate rows
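
The duplicate-removal use can be tried with a common pattern: keep the copy with the smallest ROWID in each group of identical rows and delete the rest. The sketch below runs it on SQLite, which also exposes a rowid for ordinary tables, as a stand-in for Oracle; the emp table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (1, "Ann"), (3, "Cy"), (2, "Bob")])

# Identical rows differ only in their rowid, so the rowid is what lets us
# delete all but one copy of each (emp_no, name) combination.
conn.execute("""
    DELETE FROM emp
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM emp GROUP BY emp_no, name)
""")
remaining = conn.execute(
    "SELECT emp_no, name FROM emp ORDER BY emp_no").fetchall()
```

Without a row identifier there would be no predicate that distinguishes one duplicate from another, which is why this is a typical use of ROWID.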

SELECT STATEMENT

• NOT EXISTS in place of NOT IN
• Joins in place of EXISTS
• Avoid sub-selects
• EXISTS in place of DISTINCT
• UNION in place of OR on an index column
• WHERE instead of ORDER BY
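
Two of the rewrites listed above are sketched below as pairs of queries that return the same rows. The example runs on SQLite through Python's sqlite3 module with made-up customer and orders tables, so it only demonstrates that the forms are equivalent on this data; whether a rewrite is actually faster depends on the DBMS and its optimizer. The equivalence of NOT IN and NOT EXISTS also assumes the subquery column contains no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders   (oid INTEGER PRIMARY KEY, cid INTEGER NOT NULL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders   VALUES (10, 1), (11, 1), (12, 3);
""")

# Customers with no orders: NOT IN versus NOT EXISTS.
NOT_IN = """SELECT name FROM customer
            WHERE cid NOT IN (SELECT cid FROM orders) ORDER BY name"""
NOT_EXISTS = """SELECT name FROM customer c
                WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.cid = c.cid)
                ORDER BY name"""

# Customers with at least one order: DISTINCT over a join versus EXISTS.
DISTINCT_JOIN = """SELECT DISTINCT c.name FROM customer c, orders o
                   WHERE c.cid = o.cid ORDER BY c.name"""
EXISTS = """SELECT name FROM customer c
            WHERE EXISTS (SELECT 1 FROM orders o WHERE o.cid = c.cid)
            ORDER BY name"""

def names(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]
```

The EXISTS form can stop at the first matching order for each customer, whereas the DISTINCT form first produces one row per order and then removes the duplicates, which is the intuition behind that rule.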

QUERY OPTIMIZER

• END OF QUERY OPTIMIZER

End of Lecture

End of Today’s Lecture.