B208 Access & Constraints


  • 7/27/2019 B208 Access & Constraints


    Module 8: Access Considerations and Constraints

    After completing this module, you will be able to:

    Analyze Optimizer Access scenarios.

    Explain partial value searches and data conversions.

    Identify the effects of conflicting data types.

    Determine the cost of I/Os.

    Identify column level attributes and constraints.

    Identify table level attributes and constraints.

    Add, modify and drop constraints from tables.

    Explain how the Identity column allocates new numbers.


    Access Method Comparison

    Unique Primary Index
      Very efficient
      One AMP, one row
      No spool file

    Non-Unique Primary Index
      Efficient if the number of rows per value is reasonable and there are no severe spikes.
      One AMP, multiple rows
      Spool file if needed

    Unique Secondary Index
      Very efficient
      Two AMPs, one row
      No spool file

    Non-Unique Secondary Index
      Efficient only if the number of rows accessed is a small percentage of the total data rows in the table.
      All AMPs, multiple rows
      Spool file if needed

    Full-Table Scan
      Efficient since each row is touched only once.
      All AMPs, all rows
      Spool file may equal the table in size

    The Optimizer chooses the access method it estimates to be fastest.

    COLLECT STATISTICS to help the Optimizer make good decisions.


    Optimizer Access Scenarios

    SINGLE TABLE CASE

    WHERE Table_1.Col_1 = :value_1
      AND Table_1.Col_2 = :value_2 ;

    Column the Optimizer uses for access:

                                     Col_2
    Col_1          USI               NUSI                        NOT INDEXED
    UPI            UPI               UPI                         UPI
    NUPI           NUPI or USI (1)   NUPI                        NUPI
    USI            USI               USI                         USI
    NUSI           USI               Either, Both, or FTS (2)    NUSI or FTS (3)
    NOT INDEXED    USI               NUSI or FTS (3)             FTS

    1. The Optimizer prefers Primary Indexes over Secondary Indexes. It chooses the NUPI if only one I/O (block) is accessed. The Optimizer prefers Unique indexes over non-unique indexes. Only one row is involved with a USI even though it is a two-AMP operation.

    2. Depending on relative selectivity, the Optimizer may use either NUSI, may use both with NUSI Bit Mapping, or may do a FTS.

    3. It depends on the selectivity of the index.
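The access matrix and its three notes can be read as a small decision table. The sketch below is only an illustrative encoding of that slide: the function name `likely_access` and the string labels are hypothetical, and the real Optimizer decides from statistics and cost estimates rather than from a fixed lookup.

```python
# Hypothetical encoding of the single-table access matrix; labels are illustrative.
UPI, NUPI, USI, NUSI, NOT_INDEXED = "UPI", "NUPI", "USI", "NUSI", "NOT INDEXED"


def likely_access(col_1: str, col_2: str) -> str:
    """Return the access path the matrix lists for Col_1 = x AND Col_2 = y."""
    if col_1 == UPI:
        return "UPI"
    if col_1 == NUPI:
        # Note 1: NUPI if only one block is read, otherwise the USI may win.
        return "NUPI or USI" if col_2 == USI else "NUPI"
    if USI in (col_1, col_2):
        return "USI"
    if col_1 == NUSI and col_2 == NUSI:
        # Note 2: depends on relative selectivity (NUSI bit mapping is possible).
        return "Either, Both, or FTS"
    if NUSI in (col_1, col_2):
        # Note 3: depends on the selectivity of the index.
        return "NUSI or FTS"
    return "FTS"


print(likely_access(NUSI, NOT_INDEXED))
```

The "or" outcomes are deliberately left unresolved, since the slide says only that the choice depends on selectivity.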


    Partial Value Searches

    Column values must not be decomposable.

    LIKE, INDEX, and SUBSTRING operators indicate decomposable data.

    Show all calls placed by people within Area Code 415:

    SELECT . . . , phone, . . .
    FROM Call
    WHERE phone LIKE '415%' ;

    Always decompose data to the finest level of access usage.

    Use the SQL concatenation operator ( || ) to display the data:

    SELECT . . . , area_code || '/' || phone, . . .
    FROM Call
    WHERE area_code = 415 ;

    The Teradata Database does a FTS on a partial index value unless the index is ordered by value (Value-ordered NUSI or Hash Index).

    Data storage and display should be treated as separate issues.
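The idea of storing decomposed values and concatenating them only for display can be sketched outside SQL as well. In this toy example the rows, the `by_area_code` dictionary (standing in for an index on the area code column), and the `display` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical decomposed rows: (area_code, phone) stored as separate values.
calls = [(415, 5551234), (408, 5559876), (415, 5550000)]

# A dictionary keyed on the full area code stands in for an index on that column.
by_area_code = defaultdict(list)
for area_code, phone in calls:
    by_area_code[area_code].append(phone)


def display(area_code: int, phone: int) -> str:
    """Concatenate the parts for presentation only; storage stays decomposed."""
    return f"{area_code}/{phone}"


# Equality on the whole stored value: a direct probe, no pattern match over every row.
matches = [display(415, phone) for phone in by_area_code[415]]
print(matches)
```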


    Data Conversions

    Columns (or values) must be of the same data type to be compared.

    If column (or value) types differ, internal conversion is performed.

    Character data is compared using the host's collating sequence.

    Unequal-length character strings are converted by right-padding the shorter one with blanks.

    Numeric values are converted to the same underlying representation.

    Character to numeric comparison requires the character value to be converted to a numeric value.

    Data conversion is expensive and generally unnecessary.

    Implement data types at the Domain level.

    Comparison across data types may indicate that Domain definitions are not clearly understood.


    Storing Numeric Data

    When comparing character data to numeric, Teradata will always convert character to numeric, then do the comparison.

    Case 1

    Table 1
    CREATE TABLE Emp1
      (Emp_no CHAR(6),
       Emp_name CHAR(20))
    PRIMARY INDEX (Emp_no);

    Statement 1
    SELECT *
    FROM Emp1
    WHERE Emp_no = '1234';

    Statement 2 (results in Full Table Scan)
    SELECT *
    FROM Emp1
    WHERE Emp_no = 1234;

    Case 2

    Table 2
    CREATE TABLE Emp2
      (Emp_no INTEGER,
       Emp_name CHAR(20))
    PRIMARY INDEX (Emp_no);

    Statement 1
    SELECT *
    FROM Emp2
    WHERE Emp_no = 1234;

    Statement 2 (results in unnecessary conversion)
    SELECT *
    FROM Emp2
    WHERE Emp_no = '1234';

    Comparison Rules:
      To compare columns, they must be of the same data type.
      Character data types will always be converted to numeric (when comparing character to numeric).

    Bottom Line:
      Always store numeric data in numeric data types to avoid unnecessary and costly data conversions.
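A toy model may help show why a numeric literal against a CHAR primary index forces every row to be read. This is only a sketch: a Python dictionary stands in for hashed primary-index access, and the function `lookup_char_table` and the sample rows are hypothetical.

```python
def lookup_char_table(rows: dict, literal):
    """rows maps a CHAR(6) Emp_no to Emp_name; returns (matches, rows_examined)."""
    if isinstance(literal, str):
        # Same type: blank-pad the shorter string, then probe by key directly.
        hit = rows.get(literal.ljust(6))
        return ([hit] if hit is not None else [], 1)
    # Numeric literal: every stored character value must be converted to a
    # number before it can be compared, so the keyed probe cannot be used.
    matches = [name for emp_no, name in rows.items()
               if float(emp_no) == float(literal)]
    return (matches, len(rows))


emp1 = {"1234  ": "Smith", "001234": "Jones", "5678  ": "Lee"}
print(lookup_char_table(emp1, "1234"))  # character literal: one probe
print(lookup_char_table(emp1, 1234))    # numeric literal: all rows converted
```

In this model the numeric comparison also matches '001234', because both strings convert to the same number. That is a second reason to keep numeric data in numeric columns.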


    Data Conversion Example

    CREATE SET TABLE TFACT01.Table1
      (col1 CHAR(12) NOT NULL)
    UNIQUE PRIMARY INDEX (col1);

    EXPLAIN SELECT * FROM Table1 WHERE col1 = '8';

    1) First, we do a single-AMP RETRIEVE step from TFACT01.Table1 by way of the unique primary index "TFACT01.Table1.col1 = '8' " with no residual conditions. The estimated time for this step is 0.03 seconds.
    -> The row is sent directly back to the user as the result of statement 1. The total estimated time is 0.03 seconds.

    EXPLAIN SELECT * FROM Table1 WHERE col1 = 8;

    1) First, we lock a distinct TFACT01."pseudo table" for read on a RowHash to prevent global deadlock for TFACT01.Table1.
    2) Next, we lock TFACT01.Table1 for read.
    3) We do an all-AMPs RETRIEVE step from TFACT01.Table1 by way of an all-rows scan with a condition of ("(TFACT01.Table1.col1 (FLOAT, FORMAT '-9.99999999999999E-999')UNICODE)= 8.00000000000000E 000") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1,001 rows. The estimated time for this step is 0.28 seconds.
    4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.28 seconds.


    Matching Data Types

    The following data types are identical to the hashing algorithm:

    INTEGER = DATE = DECIMAL (x,0)

    CHAR = VARCHAR = LONG VARCHAR

    BYTE = VARBYTE

    GRAPHIC = VARGRAPHIC

    Administer data type assignments at the domain level.

    Give matching Primary Indexes across tables the same data type.


    Counting I/O Operations

    Many factors influence the number of physical I/Os in a transaction:

    Cache hits

    Swapping

    Rows per block

    Cylinder splits/migrates

    Mini-Cylpacks

    Number of spool files

    Spool file sizes

    I/Os may be done serially or in parallel.

    Data and index block I/O may or may not require Cylinder Index I/O.

    Changes to data rows and USI rows require Transient Journal I/O.

    I/O counts indicate the relative cost of a transaction.

    A given I/O operation may not cause any actual physical I/O.


    Transient Journal I/O

    The Transient Journal:

    Is a journal of transaction before images.

    Provides for automatic rollback in the event of transaction (TXN) failure.

    Is automatic and transparent.

    Takes its space from available free cylinders in the system.

    Returns its space to the free cylinder lists when a transaction completes.

    Provides Transaction Integrity.

    Therefore, when modifying a table, there are I/Os for both the data table and the Transient Journal.

    Some situations where Transient Journal is not used include:

    INSERT / SELECT into an empty table

    DELETE FROM tablename ALL

    Utilities such as FastLoad and MultiLoad
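The role of before images in rollback can be illustrated with a very small model. This is a conceptual sketch only, not how Teradata implements the Transient Journal; the class `ToyTable` and its methods are hypothetical.

```python
_MISSING = object()  # marks "row did not exist before the change"


class ToyTable:
    """In-memory table that journals a before image for every change."""

    def __init__(self):
        self.rows = {}
        self.journal = []  # before images for the open transaction

    def write(self, key, value):
        # Journal the before image first, then change the row (two writes).
        self.journal.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = value

    def commit(self):
        # The transaction completed: journal entries are no longer needed.
        self.journal.clear()

    def rollback(self):
        # Undo changes in reverse order using the before images.
        while self.journal:
            key, before = self.journal.pop()
            if before is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before


t = ToyTable()
t.write(1, "a")
t.commit()
t.write(1, "b")
t.write(2, "c")
t.rollback()
print(t.rows)
```

Every `write` touches both the journal and the table, which mirrors the extra I/O described above.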


    INSERT and DELETE Operations

    INSERT INTO tablename . . . ;
    DELETE FROM tablename . . . ;

    * = I/O Operations

    DATA ROW
      * READ DATA BLOCK
      * WRITE TRANSIENT JOURNAL
        INSERT or DELETE the DATA ROW
      * WRITE NEW DATA BLOCK
      * WRITE CYLINDER INDEX

    For each USI
      * READ INDEX BLOCK
      * WRITE TRANSIENT JOURNAL
        INSERT or DELETE the NEW INDEX ROW
      * WRITE NEW INDEX BLOCK
      * WRITE CYLINDER INDEX

    For each NUSI
      * READ INDEX BLOCK
        ADD or DELETE the ROWID on the ROWID LIST or
        ADD or DELETE the SUBTABLE ROW
      * WRITE NEW INDEX BLOCK
      * WRITE CYLINDER INDEX

    I/O operations per row = 4 + [ 4 * (#USIs) ] + [ 3 * (#NUSIs) ]

    Double for FALLBACK
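The per-row formula is easy to evaluate for a given table design. The helper below is a direct transcription of it; the function name `io_ops_per_row` is hypothetical, and the result is a relative cost, not a count of physical I/Os.

```python
def io_ops_per_row(num_usis: int, num_nusis: int, fallback: bool = False) -> int:
    """Relative I/O count for inserting or deleting one row."""
    ops = 4 + 4 * num_usis + 3 * num_nusis  # data row + each USI + each NUSI
    return 2 * ops if fallback else ops     # FALLBACK doubles the work


# A table with one USI and two NUSIs: 4 + 4 + 6 = 14, or 28 with FALLBACK.
print(io_ops_per_row(1, 2), io_ops_per_row(1, 2, fallback=True))
```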


    Permanent Journal I/O

    The total number of Permanent Journal I/O operations per row is:

    BEFORE IMAGE    AFTER IMAGE    PJ I/O COUNT (Count)
    NONE            NONE           0
    NONE            SINGLE         2
    SINGLE          NONE           2
    SINGLE          SINGLE         4
    NONE            DUAL           4
    DUAL            NONE           4
    SINGLE          DUAL           6
    DUAL            SINGLE         6
    DUAL            DUAL           8

    These counts include:

    1. Write the PJ block,

    2. Write the Cylinder Index.

    SINGLE image journaling is not allowed on FALLBACK tables.

    INSERT, DELETE:
    Total PJ I/O = Count + (#USIs * Count)

    UPDATE:
    Total PJ I/O = Count + (#USIs changed * Count * 2)

    Total I/O = Total PJ I/O + DATA I/O

    Changes to NUSI columns cause no additional I/Os.

    Changes to PI columns double the counts.
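The Permanent Journal counts can be worked through in a few lines. The sketch rests on two assumptions that are readings of the slide rather than stated facts: each journal image copy costs two I/Os (PJ block plus Cylinder Index), which matches every row of the count table, and the "#USIs changed" formula belongs to UPDATE while INSERT and DELETE share the other one. All function names are hypothetical.

```python
IMAGES = {"NONE": 0, "SINGLE": 1, "DUAL": 2}  # image copies kept per option


def pj_count(before: str, after: str) -> int:
    """PJ I/O Count: two I/Os (PJ block, Cylinder Index) per image copy."""
    return 2 * (IMAGES[before] + IMAGES[after])


def pj_io_insert_or_delete(count: int, num_usis: int) -> int:
    # Assumed pairing: INSERT and DELETE use Count + (#USIs * Count).
    return count + num_usis * count


def pj_io_update(count: int, num_usis_changed: int) -> int:
    # Assumed pairing: UPDATE uses Count + (#USIs changed * Count * 2).
    return count + num_usis_changed * count * 2


count = pj_count("SINGLE", "SINGLE")
print(count, pj_io_insert_or_delete(count, 2), pj_io_update(count, 1))
```

Adding the data I/O from the INSERT and DELETE formula to these figures gives the total I/O per row.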


    Table Level Attributes

    CREATE MULTISET TABLE Table_1, FALLBACK,
      DATABLOCKSIZE = 16384 BYTES, FREESPACE = 10 PERCENT, CHECKSUM = NONE
      (column1 INTEGER,
       column2 CHAR(5) );

    SET              Don't allow duplicate rows
    MULTISET         Allow duplicate rows (ANSI)

    DATABLOCKSIZE =  Maximum multi-row block size for table in:
      BYTES                    Rounded to nearest sector (512)
      KILOBYTES (or KBYTES)    Increments of 1024
      MINIMUM DATABLOCKSIZE    (7168)
      MAXIMUM DATABLOCKSIZE    (130,560)
      IMMEDIATE                May be used to immediately re-block the data (ALTER)

    FREESPACE        Percent of freespace to keep on cylinder during load operations (0 - 75%).

    CHECKSUM = DEFAULT | NONE | LOW | MEDIUM | HIGH | ALL
                     Disk I/O Integrity Check (V2R5.1 feature)


    Column Level Constraints

    PRIMARY KEY    No Nulls, No Duplicates
    UNIQUE         No Nulls, No Duplicates
    CHECK          Verify values or range
    REFERENCES     Relates to other columns

    CREATE TABLE Table_2
      (col1 INTEGER NOT NULL CONSTRAINT primary_1 PRIMARY KEY,
       col2 INTEGER NOT NULL CONSTRAINT unique_1 UNIQUE,
       col3 INTEGER CONSTRAINT check_1 CHECK (col3 > 0),
       col4 INTEGER CONSTRAINT reference_1 REFERENCES Table_3(col_a)
      );

    All constraints are named.

    All constraints are at column level.

    PRIMARY KEY columns must have NOT NULL attribute.

    UNIQUE columns must also have NOT NULL attribute.


    Table Level Constraints

    CREATE TABLE Table_4
      (col1 INTEGER NOT NULL,
       col2 INTEGER NOT NULL,
       col3 INTEGER NOT NULL,
       col4 INTEGER NOT NULL,
       col5 INTEGER,
       col6 INTEGER,
       CONSTRAINT primary_1 PRIMARY KEY (col1, col2),
       CONSTRAINT unique_1 UNIQUE (col3, col4),
       CONSTRAINT check_1 CHECK (col2 > 0 OR col4 > 0),
       CONSTRAINT reference_1 FOREIGN KEY (col5, col6)
         REFERENCES Table_5 (colA, colB),
       CHECK (col4 > col5),
       FOREIGN KEY (col3) REFERENCES Table_6 (colX)
      );

    Some constraints are named (primary_1, unique_1, check_1, reference_1).

    Some constraints are unnamed (the final CHECK and FOREIGN KEY).

    All constraints are at table level.


    Example: SHOW Department Table

    SHOW TABLE Department;

    CREATE SET TABLE PD.Department , FALLBACK ,
         NO BEFORE JOURNAL,
         NO AFTER JOURNAL,
         CHECKSUM = DEFAULT
         (
          dept_number INTEGER NOT NULL,
          dept_name CHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
          dept_mgr_number INTEGER,
          budget_amount DECIMAL(10,2),
    CONSTRAINT dn_1000_plus CHECK ( dept_number > 999 ),
    CONSTRAINT refer_1 FOREIGN KEY ( dept_mgr_number ) REFERENCES PD.EMPLOYEE ( EMPLOYEE_NUMBER ))
    UNIQUE PRIMARY INDEX primary_1 ( dept_number )
    UNIQUE INDEX ( dept_name );

    Notes:

    Primary key constraint becomes a named index.

    Unique constraint becomes a unique index.

    All constraints are specified at table level.

    Note: The Primary Key constraint defined with the CREATE TABLE doesn't appear in this SHOW TABLE.


    Altering Table Constraints

    To add constraints to a table:

    ALTER TABLE tablename
      ADD CONSTRAINT constrname CHECK . . .
      ADD CONSTRAINT constrname UNIQUE . . .
      ADD CONSTRAINT constrname PRIMARY KEY . . .
      ADD CONSTRAINT constrname FOREIGN KEY . . .

    To modify existing constraints:

    ALTER TABLE tablename
      MODIFY CONSTRAINT constrname . . . ;

    To drop constraints:

    ALTER TABLE tablename
      DROP CONSTRAINT constrname ;

    Note: The only constraint that can be modified is a named CHECK constraint.

    In V2R5, the ALTER TABLE command can also be used to add new columns (up to 2048) to an existing table.


    Identity Column Overview

    Also known as a DBS Generated Unique Primary Index: a table-level unique number that is system-generated for every row as it is inserted into the table.

    Identity Columns may be used to:
      Guarantee row uniqueness in a table
      Guarantee even row distribution for a table
      Optimize and simplify initial port from other databases that use generated keys

    Identity Columns are valid for:
      Single inserts
      Multi-session concurrent insert requests (e.g., TPump)
      INSERT SELECT

    Identity Columns Save Overhead/Maintenance Costs:
      Reduce need for uniqueness constraints
      Reduce manual coding tasks
      Generate unique PK values
      Comply with the ANSI Standard


    Identity Column Implementation

    Characteristics of the IDENTITY Column feature are:

    Implemented at column level in a CREATE TABLE statement

    Data type may be any exact numeric type

    GENERATED ALWAYS always generates a value

    GENERATED BY DEFAULT generates a value only when no value is specified

    GENERATED ALWAYS + NO CYCLE implies uniqueness

    CYCLE restarts numbering after the maximum/minimum number is generated

    DBSControl setting indicates the number pool size to reserve for generating numbers

    Each Vproc may reserve 1 to 1,000,000 numbers; default is 100,000.

    Numbering gaps can occur

    Generated numbers do not reflect row insertion sequence

    Exact incrementing is not guaranteed

    Scalability and performance are favored over enforced sequential numbering


    Identity Column Example 1

    Example 1: GENERATED ALWAYS AS IDENTITY

    This command always generates a value. It does not cycle and does not repeat previously used values.

    CREATE TABLE Table_A
      (Cust_Number INTEGER GENERATED ALWAYS AS IDENTITY
         (START WITH 1001 INCREMENT BY 1 MAXVALUE 1000000 NO CYCLE),
       LName VARCHAR(15),
       Zip_code INTEGER);

    INSERT INTO Table_A SELECT c_custid, c_lname, c_zipcode FROM Customer;

    Customer has 500 rows; the new customer numbers generated are not sequentially numbered from 1001 to 1500.

    Numbering gaps can occur; exact incrementing is not guaranteed.

    Pools (ranges of numbers) are reserved and allocated by Teradata software.

    Default for next allocation pool is the DBSControl parameter value of 100,000.

    SELECT * FROM Table_A ORDER BY 1;

    Cust_Number   LName     Zip_Code
    1001          Tatem     89714
    1002          Kroger    98101
    1003          Yang      77481
    1004          Miller    45458
    :             :         :
    101001        Powell    57501
    101002        Gordan    89714
    101003        Smoothe   80002
    :             :         :
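The jump from the 1000s to 101001 in the sample output is what pool-based allocation would produce. The toy model below assumes each vproc reserves one pool of consecutive numbers and uses only as many as it inserts; the function `simulate` and its parameters are hypothetical, and the actual allocation logic inside Teradata is not described in this module.

```python
def simulate(rows_per_vproc, start=1001, pool_size=100_000):
    """Return the sorted identity values produced by the toy allocation model."""
    generated = []
    next_pool_start = start
    for n in rows_per_vproc:
        assert n <= pool_size, "toy model: one pool per vproc"
        # Each vproc reserves its own pool of consecutive numbers...
        pool = range(next_pool_start, next_pool_start + pool_size)
        next_pool_start += pool_size
        # ...and uses only as many of them as it has rows to insert.
        generated.extend(pool[:n])
    return sorted(generated)


# Two vprocs inserting 3 and 2 rows leave a gap between the two pools.
print(simulate([3, 2]))
```

Unused numbers in a reserved pool are the source of the gaps, which is why START WITH 1001 and INCREMENT BY 1 do not yield 1001 to 1500 for 500 rows.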


    Identity Column Example 2

    Example 2: GENERATED BY DEFAULT AS IDENTITY

    This option generates a value only when no value is specified for the column.

    CREATE TABLE Table_B
      (Cust_Number INTEGER GENERATED BY DEFAULT AS IDENTITY
         (START WITH 10000000 INCREMENT BY -1 MINVALUE 0),
       LName VARCHAR(15),
       Zip_code INTEGER);

    INSERT INTO Table_B SELECT NULL, c_lname, c_zipcode FROM Customer;

    Customer has 500 rows; new customer numbers are generated because NULL was part of the SELECT.

    If MINVALUE is not used, the minimum value for an INTEGER is -2,147,483,647.

    CYCLE option is not used; the default is NO CYCLE.

    GENERATED BY DEFAULT provides the capability of copying the contents of one table with an Identity column into another.

    SELECT * FROM Table_B ORDER BY 1 DESC;

    Cust_Number   LName     Zip_Code
    10000000      Tatem     89714
    9999999       Kroger    98101
    9999998       Yang      77481
    9999997       Miller    45458
    :             :         :
    9900000       Powell    57501
    9899999       Gordan    89714
    9899998       Smoothe   80002
    :             :         :


    Identity Column Considerations

    Generated Always Identity Columns
      Typically define the Primary Index.
      Define as the Primary Index only if it is the primary path.
      If it is also used as an access path, consider it as a Secondary Index.

    Generated By Default Identity Columns
      Facilitate copying data from one table into another.

    Use a numeric type large enough to hold all the values that will ever be required.

    Never use as a substitute for a good logical database design.

    May not optimally utilize Teradata join and access capabilities.

    Restrictions
      A table can only have 1 Identity column.
      FastLoad and MultiLoad do not support Identity columns with Teradata V2R5.0.
      ALTER TABLE statement cannot add an Identity Column to an existing table.
      Cannot be part of a composite primary or a composite secondary index.
      Cannot be used with Global Temporary or volatile tables.
      Cannot be used in a join index, hash index, PPI or value-ordered index.
      Atomic UPSERTs are not supported on a table with an Identity Column as its PI.
      GENERATED ALWAYS Identity Column value updates are not supported.

    Note: With Teradata V2R5.1, Identity columns are supported with the FastLoad, MultiLoad, and Teradata Warehouse Builder (TWB) utilities.


    Review Questions

    1. Which one of the following situations requires the use of the Transient Journal?

    a. INSERT / SELECT into an empty table

    b. UPDATE all the rows in a table

    c. DELETE all the rows in a table

    d. loading a table with FastLoad

    2. What is a negative impact of updating a UPI value?

    ______________________________________________________

    ______________________________________________________

    3. What are the 4 types of constraints?

    _____________ _____________ _____________ _____________

    4. True or False? A primary key constraint is always implemented as a primary index.

    5. True or False? A primary key constraint is always implemented as a unique index.

    6. True or False? Multi-column constraints must be coded as table level constraints.

    7. True or False? Only named check constraints may be modified.

    8. True or False? Named primary key constraints may always be dropped if they are no longer needed.

    9. True or False? Using the START WITH 1 and INCREMENT BY 1 options with an Identity column will provide sequential numbering with no gaps for the column.


    Module 8: Review Question Answers

    1. Which one of the following situations requires the use of the Transient Journal?

    a. INSERT / SELECT into an empty table

    b. UPDATE all the rows in a table

    c. DELETE all the rows in a table

    d. loading a table with FastLoad

    Answer: b

    2. What is a negative impact of updating a UPI value?

    Very I/O intensive: updating the Primary Index requires that (internally) the data row be deleted and re-inserted into the table, and that the existing secondary index references be updated to the new RowID.

    3. What are the 4 types of constraints?

    Primary Key, Unique, References, Check

    4. True or False? A primary key constraint is always implemented as a primary index. Answer: False

    5. True or False? A primary key constraint is always implemented as a unique index. Answer: True

    6. True or False? Multi-column constraints must be coded as table level constraints. Answer: True

    7. True or False? Only named check constraints may be modified. Answer: True

    8. True or False? Named primary key constraints may always be dropped if they are no longer needed. Answer: False

    9. True orFalse? Using the START WITH 1 and INCREMENT BY 1 options with an Identity