
Join Index


4.) Column Value Management

Objectives

 

Upon completion of this module, the student should be able to:

- Insert into a table using the DEFAULT VALUES feature.
- Perform string functions using the POSITION feature.
- Test case sensitivity with the LOWER function.
- Rename a column of a table.

Using DEFAULT VALUES

  Teradata has the ability to insert a row using only the DEFAULT VALUES keywords. For this feature to work successfully, one of the following statements must be true for each column of the table:

- the column has a defined default value
- the column has a default system value specified
- the column permits nulls

If none of these statements is true, an insert using DEFAULT VALUES will fail. Note that such an insert may be executed multiple times as long as no uniqueness attributes of the table are violated.

Column Attributes:

NOT NULL - Nulls are not permitted for this column.
DEFAULT 22 - Unless otherwise specified, assign the column a value of 22.
DEFAULT DATE '2010-01-01' - Unless otherwise specified, assign a date of Jan 1, 2010.
WITH DEFAULT - Assign the system default: spaces for character strings, zero for numeric data types, and the current date for the date data type.
DEFAULT TIME - Assign the current time to the integer data type.
DEFAULT USER - Assign the user id of the session to the character string.

The command:

INSERT INTO tablename DEFAULT VALUES;

- Will insert defined default values into each column.
- Will insert a null if no default is defined.
- Will fail if no default is defined and a null is not allowed.
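As a rough cross-check of these rules, the same behavior can be sketched with SQLite from Python (the dialect differs from Teradata, but SQLite also supports INSERT ... DEFAULT VALUES, and its IntegrityError plays the role of Teradata's Failure 3811 in this sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One column with a declared default, one nullable column with no default.
con.execute("CREATE TABLE test_tbl (cola INTEGER NOT NULL DEFAULT 22, colb TEXT)")
con.execute("INSERT INTO test_tbl DEFAULT VALUES")
row = con.execute("SELECT cola, colb FROM test_tbl").fetchone()
# cola receives its declared default; colb, with no default, receives NULL.
assert row == (22, None)

# A NOT NULL column with no default makes DEFAULT VALUES fail.
con.execute("CREATE TABLE abc (a INTEGER NOT NULL)")
try:
    con.execute("INSERT INTO abc DEFAULT VALUES")
    default_values_failed = False
except sqlite3.IntegrityError:
    default_values_failed = True
assert default_values_failed
```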

Create a Test table

CREATE TABLE test_tbl
(cola SMALLINT NOT NULL DEFAULT 22
,colb CHAR(1)
,colc DATE DEFAULT DATE '2010-01-01'
,cold DEC(3,2) NOT NULL WITH DEFAULT
,cole TIME(0) DEFAULT CURRENT_TIME
,colf INT DEFAULT TIME
,colg CHAR(8) DEFAULT USER);


Populate the test table

INSERT INTO test_tbl DEFAULT VALUES;
SELECT * FROM test_tbl;

cola   colb colc     cold  cole     colf        colg
------ ---- -------- ----- -------- ----------- --------
22     ?    10/01/01 .00   15:27:31 152731      TD036

INSERT INTO test_tbl DEFAULT VALUES;
SELECT * FROM test_tbl;

cola   colb colc     cold  cole     colf        colg
------ ---- -------- ----- -------- ----------- --------
22     ?    10/01/01 .00   15:27:31 152731      TD036
22     ?    10/01/01 .00   15:27:43 152743      TD036

Defaulting Methods

  Defaulting of data values may also be accomplished by the use of positional commas in an INSERT statement. The positional commas indicate the use of a default value if one is specified and a null if one is not. If neither outcome is possible, an error is returned.

Let's look at the traditional method of defaulting values.

The INSERT

INSERT INTO test_tbl VALUES (,,,,,,);

The SELECT

SELECT * FROM test_tbl;

cola   colb colc     cold  cole     colf        colg
------ ---- -------- ----- -------- ----------- --------
22     ?    10/01/01 .00   15:27:31 152731      TD036
22     ?    10/01/01 .00   15:27:43 152743      TD036
22     ?    10/01/01 .00   15:33:13 153313      TD036

While it is possible to alter a column definition via an ALTER TABLE statement, care must be taken not to add attributes that conflict with existing attributes or existing data. Adding a new column with a NOT NULL attribute returns an error if the table already contains rows, because the new column must initially be set to either a null or a value for those existing rows.

The ALTER TABLE

ALTER TABLE test_tbl ADD colh SMALLINT NOT NULL;

***Failure 3559 Column COLH is not NULL and it has no default value.

By adding the WITH DEFAULT phrase, the NOT NULL attribute is permitted. The new column being added will carry the system default value initially.

The Correct ALTER TABLE

ALTER TABLE test_tbl ADD colh SMALLINT NOT NULL WITH DEFAULT;


The INSERT

INSERT INTO test_tbl DEFAULT VALUES;

The SELECT

SELECT * FROM test_tbl;

cola   colb colc     cold  cole     colf        colg     colh
------ ---- -------- ----- -------- ----------- -------- ------
22     ?    10/01/01 .00   15:27:31 152731      TD036    0
22     ?    10/01/01 .00   15:27:43 152743      TD036    0
22     ?    10/01/01 .00   15:33:13 153313      TD036    0
22     ?    10/01/01 .00   15:38:42 153842      TD036    0

Creating a Table

CREATE TABLE abc (a INT NOT NULL);

The INSERT

INSERT INTO abc DEFAULT VALUES;

***Failure 3811 Column 'a' is NOT NULL. Give the column a value.

Creating Tables in Teradata Mode

Tables may be created in either Teradata mode or ANSI mode. Tables created in Teradata mode will have all character columns defined as NOT CASESPECIFIC by default. This means that the data will be stored in the column in the same case in which it was entered, and it will be returned to the user in that stored case; however, all testing against the column will ignore case.

The default comparison operation for non-casespecific columns is always non-casespecific.

CREATE a Table

CREATE TABLE case_nsp_test (col1 CHAR(7) ,col2 CHAR(7));

(Columns created in Teradata mode are defaulted to NOT CASESPECIFIC)

SHOW the Table

SHOW TABLE case_nsp_test;

CREATE SET TABLE PED.case_nsp_test ,NO FALLBACK,
     NO BEFORE JOURNAL, NO AFTER JOURNAL
     (
      col1 CHAR(7) CHARACTER SET LATIN NOT CASESPECIFIC,
      col2 CHAR(7) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX (col1);


INSERT into the Table

INSERT INTO case_nsp_test VALUES('LAPTOP','laptop');

SELECT from the Table

SELECT * FROM case_nsp_test WHERE col1=col2;

Result

col1     col2
-------  -------
LAPTOP   laptop

Because both columns are defined as NOT CASESPECIFIC (NCS), all comparison tests will be done NCS.

Creating ANSI Mode Tables

  Tables created in ANSI mode will have character columns defaulted to CASESPECIFIC. As before, data will be stored and retrieved in the case of the original INSERT, however all tests and comparisons of the data will be done in casespecific mode.

In order to do non-casespecific testing, also called ‘case blind’ testing, it is necessary to apply a function such as the UPPER function to both sides of the comparison, thus rendering case a non-factor in the test.

Example

Initiate an ANSI session.

.SET SESSION TRANSACTION ANSI;

.LOGON L7544/tdxxx;

CREATE a Table

CREATE TABLE case_sp_test(col1 CHAR(7),col2 CHAR(7));

(Columns created in ANSI mode are defaulted to Casespecific.)

SHOW the Table

SHOW TABLE case_sp_test;

CREATE MULTISET TABLE tdxxx.case_sp_test ,NO FALLBACK,
     NO BEFORE JOURNAL, NO AFTER JOURNAL
     (
      col1 CHAR(7) CHARACTER SET LATIN CASESPECIFIC,
      col2 CHAR(7) CHARACTER SET LATIN CASESPECIFIC)
PRIMARY INDEX (col1);

INSERT into the Table

INSERT INTO case_sp_test VALUES('LAPTOP','laptop');

SELECT from the Table

SELECT * FROM case_sp_test WHERE col1=col2;

***Query completed. No rows found.

SELECT using UPPER

SELECT * FROM case_sp_test WHERE UPPER(col1)=UPPER(col2);

col1     col2
-------  -------
LAPTOP   laptop

In ANSI mode, non-casespecific testing requires an explicit case blind comparison.
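A rough analogue of this behavior can be sketched with SQLite from Python: SQLite's default text comparison is case specific, like an ANSI-mode session, and applying UPPER to both sides makes the test case blind. (This is an illustration, not Teradata syntax.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE case_sp_test (col1 TEXT, col2 TEXT)")
con.execute("INSERT INTO case_sp_test VALUES ('LAPTOP', 'laptop')")

# Plain '=' on text is case specific here, as in an ANSI-mode session.
strict = con.execute(
    "SELECT count(*) FROM case_sp_test WHERE col1 = col2").fetchone()[0]

# Applying UPPER to both sides makes the test case blind.
blind = con.execute(
    "SELECT count(*) FROM case_sp_test WHERE upper(col1) = upper(col2)").fetchone()[0]

assert strict == 0 and blind == 1
```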

The LOWER Function (1 of 2)

  The LOWER function behaves similarly to the UPPER function but in the reverse direction. LOWER may be used as the choice for case blind test, just as UPPER may.

The LOWER function:

- Allows case blind testing on case specific strings.
- Allows storage and retrieval of lowercase characters.

Example

SELECT * FROM case_sp_test WHERE LOWER(col1)=LOWER(col2);

col1     col2
-------  -------
LAPTOP   laptop

Note: Case blind test.

UPDATE the Table

UPDATE case_sp_test SET col2=col1;

SELECT * FROM case_sp_test;

col1     col2
-------  -------
LAPTOP   LAPTOP

Both LOWER and UPPER may be used to change the stored contents of a column from one case to another. This may be done via an UPDATE statement which applies the function to the updated column values.

UPDATE the Table

UPDATE case_sp_test SET col1=LOWER(col1);

SELECT * FROM case_sp_test;

col1     col2
-------  -------
laptop   LAPTOP

The LOWER function provides the reverse capabilities of the UPPER function.

The LOWER Function (2 of 2)

A third way to accomplish a case blind test is via use of the NOT CASESPECIFIC attribute applied to the column test. This may be abbreviated to NOT CS and is applied in parentheses following the column name. Unlike UPPER and LOWER, NOT CS is not a function but rather a column attribute, applied to the column as part of the test.

Note that the use of NOT CS, while accomplishing the same case blind test, is not considered ANSI standard syntax. If ANSI standard compliance is required, the case blind test should be done using either LOWER or UPPER.

Example

Let's INSERT a second row into the table.

INSERT INTO case_sp_test VALUES ('LAPTOP','LAPTOP');

SELECT * FROM case_sp_test;

col1     col2
-------  -------
laptop   LAPTOP
LAPTOP   LAPTOP

SELECT * FROM case_sp_test WHERE col1=col2;

col1     col2
-------  -------
LAPTOP   LAPTOP

Note: Case sensitive result.

SELECT * FROM case_sp_test WHERE col1=LOWER(col2);

col1     col2
-------  -------
laptop   LAPTOP


SELECT * FROM case_sp_test WHERE col1(NOT CS)=col2(NOT CS);

col1     col2
-------  -------
laptop   LAPTOP
LAPTOP   LAPTOP

Note: Case blind test but non-ANSI syntax.

SELECT LOWER(col1) FROM case_sp_test;

Lower(col1)
-----------
laptop
laptop

Note: Convert display to lowercase.

POSITION Function

  The POSITION function is the ANSI standard form of the INDEX function of Teradata SQL. They are both used for locating the position of a string within a string.

Both functions require two arguments, the column or character string to be tested, and the character or string of characters to be located.

With the INDEX function, the two arguments are separated with a comma.

With the POSITION function, the more English-like IN keyword is used in place of a comma.

While both functions will continue to be available for compatibility purposes, it is suggested that future coding be done with POSITION.
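For readers more familiar with general-purpose languages, POSITION/INDEX semantics can be sketched in Python. Both SQL functions are 1-based and return 0 when the string is absent, unlike Python's 0-based str.find, which returns -1. (The position helper below is a hypothetical illustration, not part of any SQL client.)

```python
def position(substring, string):
    """1-based position of substring in string; 0 when absent,
    mirroring POSITION/INDEX semantics (str.find is 0-based, -1 when absent)."""
    return string.find(substring) + 1

assert position('p', 'laptop') == 3
assert position('top', 'laptop') == 4
assert position('z', 'laptop') == 0   # not found -> 0, like POSITION
```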

Example

SELECT INDEX ('laptop','p');

Index('laptop','p')
-------------------
                  3

SELECT INDEX ('laptop','top');

Index('laptop','top')
---------------------
                    4

The POSITION function is the ANSI standard function for locating a string within a string.

SELECT POSITION ('p' IN 'laptop');

Position('p' in 'laptop')
-------------------------
                        3


SELECT POSITION ('top' IN 'laptop');

Position('top' in 'laptop')
---------------------------
                          4

Both POSITION and INDEX are available functions, but only POSITION is ANSI standard.

POSITION and Case Sensitivity

 

Care must be exercised in executing the same scripts in both ANSI and Teradata session mode, particularly where case sensitivity is involved. This is especially significant with the POSITION function, which matches case or ignores it depending on the session mode.

Note that in three of the five examples shown here, the ANSI session produces a different result than the Teradata session.

Examples

Note: "ANSI result" implies an ANSI transaction session mode. "Teradata result" implies a BTET transaction session mode.

SELECT POSITION ('top' IN 'laptop');

ANSI result: 4
Teradata result: 4

SELECT POSITION ('TOP' IN 'laptop');

ANSI result: 0
Teradata result: 4

SELECT POSITION (UPPER('top') IN 'laptop');

ANSI result: 0
Teradata result: 4

SELECT POSITION (LOWER('TOP') IN 'laptop');

ANSI result: 4
Teradata result: 4

SELECT POSITION (LOWER('TOP') IN 'LAPTOP');

ANSI result: 0
Teradata result: 4
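The examples above suggest a portable rule: to get the same answer in both session modes, normalize case on both sides of the comparison, not just one. A small Python sketch of that rule (a hypothetical helper, not Teradata code):

```python
def position_caseblind(substring, string):
    """1-based case blind position: normalize BOTH sides before searching,
    as recommended for ANSI sessions. Returns 0 when absent."""
    return string.lower().find(substring.lower()) + 1

# The same answer regardless of the case of either operand:
assert position_caseblind('TOP', 'laptop') == 4
assert position_caseblind('top', 'LAPTOP') == 4
```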

Renaming Columns

  Columns in a table may be renamed using the ALTER TABLE command. In order to qualify for renaming, a column must not be referenced by any external objects and must be assigned a new name not already in use by the table.

A column which participates in any index is not a candidate for renaming. Likewise, a column which is either a referenced or referencing column in a referential integrity constraint may not be renamed.

Note that renaming a column does not cascade the new name to any macros or views which reference it. The views and macros will no longer function properly until they have been updated to reflect the new name.

A column may be renamed provided that:

- The new name doesn't already exist in the table.
- The column is not part of an index.
- The column is not part of any referential integrity constraints.
- The column is not referenced in the UPDATE OF clause of a trigger.

Example

CREATE a Table

CREATE TABLE rename_test
(col1 INT
,col2 INT
,col3 INT)
UNIQUE PRIMARY INDEX (col1)
,INDEX (col3);

Now ALTER it

ALTER TABLE rename_test RENAME col2 TO colb;

SHOW the Table

SHOW TABLE rename_test;

CREATE SET TABLE PED.rename_test ,NO FALLBACK,
     NO BEFORE JOURNAL, NO AFTER JOURNAL
     (
      col1 INTEGER,
      colb INTEGER,
      col3 INTEGER)
UNIQUE PRIMARY INDEX (col1)
INDEX (col3);

ALTER it Again

ALTER TABLE rename_test RENAME col1 TO cola;

Result

Failure: Column COL1 is an index column and cannot be modified.

ALTER it Again

ALTER TABLE rename_test RENAME col3 TO colc;


Result

Failure: Column COL3 is an index column and cannot be modified.

Lab

Try It! For this set of lab questions you will need information from the Database Info document.

To start the online labs, click on the Access Lab Server button in the lower left hand screen of the course. A window will pop up with instructions on accessing the lab server.

You will need these instructions to log on to Teradata.

If you experience problems connecting to the lab server, contact [email protected]

Be sure to change your default database to the Customer_Service database in order to run these labs.

Click on the buttons to the left to see the answers.

Answers: Lab A  Lab B  Lab C  Lab D  Lab E

A. Show the first and last name of any employee in the employee table who uses an initial followed by a period, instead of his/her first name. Use position to solve this.

B. Show any employee who has an upper case letter B in his first name, but in a position other than the first position. Use the POSITION function in an ANSI mode session to solve this. While in ANSI mode, use the LIKE operator to solve this problem again. Try these solutions in Teradata (BTET) mode. Do they produce the same result? Is there a solution to this problem in Teradata mode?

C. Show first and last name of any employee who has the letter 'a' in the same position in both their first and last name. (Return to BTET mode first.)

D. Display first names in lowercase and last names in uppercase for employees whose last name begins with the letter R. Use Position to solve this.

E. Create a small table according to the following definition. Hint: For the remainder of these labs, it will be helpful to reset your default database to your userid. (DATABASE tdxxx;)

CREATE TABLE rename_tbl
(col1 INT
,col2 INT
,col3 INT);

Populate the table with one row as follows.

INSERT INTO rename_tbl VALUES (1,2,3);

Create a view to access this table as follows:

CREATE VIEW rename_view AS
SELECT col1 AS vcol1
,col2 AS vcol2
FROM rename_tbl;

Select the row via the view.

Attempt to rename col1 of rename_tbl to be colA.

Attempt to rename col2 of the rename_tbl to be colB.

Attempt to select the row from the view again.

Replace the view with the renamed column.

Attempt to select the row from the view again.

  5.) Identity Columns and Key Retrieval

Objectives

 

After completing this module, you should be able to:

- Use the Identity Column feature to generate an identity column for a table.
- Use this feature to implement a Primary Key column.
- Use this feature to implement a Unique column.
- Automatically retrieve generated identity column values after row insertion.

Identity Column Features

The Generated Identity Column feature permits the automated generation of a column value based on a prescribed sequencing and interval set. A typical application of this feature would be to generate the values associated with a system-assigned primary key.

The following are options which can be used with this feature:

GENERATED ALWAYS - will always generate a value for this column, whether or not a value has been specified.

GENERATED BY DEFAULT - will generate a value for this column only when defaulting or a null is specified for the column value.

START WITH - the value which will be used to start the generated sequence. (default is 1)

INCREMENT BY - the interval which is used for each new generated value. (default is 1)

MINVALUE - the smallest value which can be placed in this column (default is the smallest value supported by the data type of the column)

MAXVALUE - the largest value which can be placed in this column (default is the largest value supported by the data type of the column)


CYCLE - after the maxvalue has been generated, restart the generated values using the MINVALUE.

Only numeric data types from the following list may be used for identity columns:

INTEGER, SMALLINT, BYTEINT, DECIMAL, NUMERIC

The identity column feature only supports whole numbers.
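The sequencing options above can be sketched as a toy generator; this illustrates only the option semantics (START WITH, INCREMENT BY, MINVALUE, MAXVALUE, CYCLE), not Teradata's internal mechanism:

```python
def identity_values(start=1, increment=1, minvalue=1, maxvalue=3, cycle=False):
    """Toy model of identity sequencing: yields values forever if CYCLE,
    otherwise stops at MAXVALUE (where real Teradata raises Failure 5753)."""
    value = start
    while True:
        yield value
        if value >= maxvalue:
            if not cycle:
                return
            value = minvalue   # wrap to MINVALUE, not to START WITH
        else:
            value += increment

nocycle = list(identity_values(maxvalue=3))
gen = identity_values(maxvalue=3, cycle=True)
cycled = [next(gen) for _ in range(5)]
assert nocycle == [1, 2, 3]
assert cycled == [1, 2, 3, 1, 2]
```

Note that after the wrap the sequence restarts at MINVALUE, which is the behavior demonstrated in the examples that follow.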

Use of ALWAYS, MAXVALUE and CYCLE

  Create a table with an ALWAYS system-generated unique primary index with a maximum value of 3.

CREATE SET TABLE test_idCol ,FALLBACK
     (
      Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 3),
      Col2 INTEGER NOT NULL)
UNIQUE PRIMARY INDEX ( Col1 );

Populate the table as follows:

INSERT INTO test_idcol VALUES (, 9);
INSERT INTO test_idcol VALUES (NULL, 9);
INSERT INTO test_idcol VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a system-generated number

SELECT * FROM test_idCol ORDER BY 1;

Col1 Col2
----------- -----------
1 9
2 9
3 9

Things to notice:

A value is generated regardless of whether:

- a default is specified
- a null is specified
- a value is specified

When a value is specified and replaced, a warning is given.


Now, let's add another row.

INSERT INTO test_idcol VALUES (7, 9);

***Failure 5753 Numbering for Identity Column Col1 is over its limit.

It was not possible to add another row, because the MAXVALUE has been achieved. In fact, no new row may be added to this table ever again.

Remove the rows from the table.

DELETE FROM test_idCol;

Now try to add the additional row.

INSERT INTO test_idcol VALUES (7, 9);

***Failure 5753 Numbering for Identity Column Col1 is over its limit.

Note that even after you delete all rows from the table, you cannot add new rows if you have exceeded MAXVALUE and you have not specified the CYCLE option. You must drop and recreate this table in order to add rows.

Create another table similar to the previous one, but with the CYCLE option.

CREATE SET TABLE TEST_ID3 ,FALLBACK
     (
      Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MAXVALUE 3 CYCLE),
      Col2 INTEGER)
UNIQUE PRIMARY INDEX ( Col1 );

Again, insert the first three rows as follows:

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a system-generated number

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
2 9
3 9

Now, attempt to execute the same three insert statements a second time.

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a system-generated number


SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
-2147483647 9
-2147483646 9
-2147483645 9
1 9
2 9
3 9

Notice what happens here:

- After hitting the MAXVALUE, it reverts to the default minimum value. It does not revert to the START WITH value, which is 1.
- The default minimum value for an integer is approximately negative two billion.
- Each subsequent insert increments the value by the default increment of 1.

Use of ALWAYS, MINVALUE and CYCLE

  Drop and recreate table with a MINVALUE of 1 specified.

DROP TABLE test_id3;

CREATE SET TABLE test_id3
     (
      Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 3 CYCLE),
      Col2 INTEGER)
UNIQUE PRIMARY INDEX ( Col1 );

Insert the same three starter rows.

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a system-generated number

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
2 9
3 9

Now, insert a 4th row.

INSERT INTO test_id3 VALUES (NULL, 9);


***Failure 2801 Duplicate unique prime key error in PaulD.TEST_ID3.

What happened here?

The fourth row would have made Col1 = 1 due to the CYCLE option. This violates the uniqueness of the primary index Col1, thus it is rejected.

Drop and recreate the table with a MINVALUE specified and with a non-unique primary index.

DROP TABLE test_id3;

CREATE SET TABLE TEST_ID3
     (
      Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 3 CYCLE),
      Col2 INTEGER)
PRIMARY INDEX ( Col1 );

Insert the same three starter rows.

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a system-generated number

Now try to add the 4th row.

INSERT INTO test_id3 VALUES (NULL, 9);

***Failure 2802 Duplicate row error in PaulD.TEST_ID3.

What happened this time?

The fourth row would have had values (1,9) due to the CYCLE option. This violates the 'no duplicate row' rule, thus it is rejected.

The following rows are inserted.

INSERT INTO test_id3 VALUES (, 7);
INSERT INTO test_id3 VALUES (NULL, 7);
INSERT INTO test_id3 VALUES (6, 7);

*** Warning: 5789 Value for Identity Column is replaced by a system-generated number.

SELECT * FROM test_id3 ORDER BY 2,1;

Col1 Col2
----------- -----------
1 7
2 7
3 7
1 9
2 9
3 9

All rows are inserted successfully and the identity column recycles.

Handling Gaps in the Sequence

  Now, let's drop and recreate the table as previously with a Unique Primary Index.

DROP TABLE test_id3;

CREATE SET TABLE test_id3
     (
      Col1 INTEGER GENERATED ALWAYS AS IDENTITY (MINVALUE 1 MAXVALUE 5 CYCLE),
      Col2 INTEGER)
UNIQUE PRIMARY INDEX ( Col1 );

Insert the same three starter rows.

INSERT INTO test_id3 VALUES (, 9);
INSERT INTO test_id3 VALUES (NULL, 9);
INSERT INTO test_id3 VALUES (6, 9);

*** Warning: 5789 Value for Identity Column is replaced by a system-generated number

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
2 9
3 9

Remove the second row inserted.

DELETE FROM test_id3 WHERE Col1 = 2;

Insert an additional row.

INSERT INTO test_id3 VALUES (NULL, 9);

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
3 9
4 9

What happened here?

Gaps in the sequence are not filled.


The column always knows what its last used value is. It increments that value and attempts to assign the next value.

Now, add the following two additional rows.

INSERT INTO test_id3 VALUES (, 9); - adds the (5,9) row
INSERT INTO test_id3 VALUES (, 9); - fails

*** Failure 2802 Duplicate row error in TEST_ID3.

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
3 9
4 9
5 9

Things to notice:

- The final insert intended to insert a (1,9) row, because the MAXVALUE is 5 and CYCLE was specified.
- It couldn't do that since the (1,9) row already exists.

Insert an additional row.

INSERT INTO test_id3 VALUES (NULL, 8); - adds (2,8)

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
2 8
3 9
4 9
5 9

Note that the increment occurs, even though the insert of 1 previously failed.

Generating BY DEFAULT (1 OF 2)

  Now let's switch to BY DEFAULT mode. This mode will generate a value only when a value is not explicitly expressed.

CREATE SET TABLE TEST_ID3 ,FALLBACK
     (
      Col1 INTEGER GENERATED BY DEFAULT AS IDENTITY (MINVALUE 1 MAXVALUE 5 CYCLE),
      Col2 INTEGER)
PRIMARY INDEX ( Col1 );

Add two rows to the table.

INSERT INTO test_id3 VALUES (1, 9);
INSERT INTO test_id3 VALUES (3, 9);

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
3 9

Note that no identity column values were generated. The values were provided explicitly.

INSERT INTO test_id3 VALUES (, 9);

*** Failure 2802 Duplicate row error in TEST_ID3.

Things to notice:

This insert attempted to add the row (1,9) again. This is because it used the START WITH value of 1. Because this row already exists, the insert fails.

Add the following row.

INSERT INTO test_id3 VALUES (, 8); - adds (2,8)

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
2 8
3 9

Notice, 2 is the generated value. The insert of the value 1 previously failed, and 1 will not be reused until the sequence cycles.


Now, add the following row.

INSERT INTO test_id3 VALUES (6, 9); - adds (6,9)

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
2 8
3 9
6 9

Note: the added values are explicit, thus no defaulting occurs.

Insert the following three additional rows.

INSERT INTO test_id3 VALUES (, 8); - adds (3,8)
INSERT INTO test_id3 VALUES (, 8); - adds (4,8)
INSERT INTO test_id3 VALUES (, 8); - adds (5,8)

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 9
2 8
3 8
3 9
4 8
5 8
6 9

Note: the last generated value prior to these inserts was 2.

Add the following row.

INSERT INTO test_id3 VALUES (, 7); - adds (1,7)

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 7
1 9
2 8
3 8
3 9
4 8
5 8
6 9

Note that the recycle has begun by reverting back to the MINVALUE of 1.

Generating BY DEFAULT (2 OF 2)

  Consider that the last row generated for this table is the (1,7) row.

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 7
1 9
2 8
3 8
3 9
4 8
5 8
6 9

Now remove all rows from the table.

DELETE FROM test_id3;

Add the following row to the table.

INSERT INTO test_id3 VALUES (, 6);

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
2 6

Things To Notice:

- The id generation picks up where it left off, at 2.
- Emptying the table does not reset the generator to the START WITH value.

Add the following row to the table.

INSERT INTO test_id3 VALUES (4, 6);

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
2 6
4 6

Note that no identity column value was generated. The value 4 was provided explicitly.

Now insert the following three rows.


INSERT INTO test_id3 VALUES (, 5); - adds (3,5)
INSERT INTO test_id3 VALUES (, 5); - adds (4,5)
INSERT INTO test_id3 VALUES (, 5); - adds (5,5)

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
2 6
3 5
4 5
4 6
5 5

Note that another row with Col1 = 4 is generated even though one already exists. The generating mechanism operates independently of pre-existing rows.

INSERT INTO test_id3 VALUES (, 5); - adds (1,5)

SELECT * FROM test_id3 ORDER BY 1;

Col1 Col2
----------- -----------
1 5
2 6
3 5
4 5
4 6
5 5

Note, the recycle begins again.

Rules and Restrictions Of Identity Columns

  Identity Column as an Attribute

An Identity Column (IC) is considered an attribute of a column.

- You cannot drop (or modify) the IC attribute of a column.
- You can drop an IC column from a table.
- An IC attribute cannot co-exist with any of the following attributes: DEFAULT, BETWEEN, COMPRESS, CHECK, REFERENCES.

Some Rules for Identity Columns (ICs)

- Only one IC is permitted per table.
- It can be any column in the table, with some exceptions.
- ICs cannot be any part of any of the following:
  Composite indexes (primary or secondary)
  Hash or Join Indexes
  Partitioned Primary Indexes
  Value-ordered Indexes

- Upserts are not supported on tables where the PI is an IC.
- Column compression cannot be specified with ICs.

Bulk Row Inserts with Identity Columns (ICs)

- ICs are not supported for the load utilities FastLoad and MultiLoad.
- ICs with multi-session BTEQ or multi-statement TPump are permitted.
- Bulk inserts across multiple sessions cannot guarantee that the sequence of the IC numbers will correlate to the sequence of the specified INSERT statements.

Bulk inserts done via INSERT SELECT also cannot guarantee that the sequence of the assigned IC's will be unbroken. This is because each AMP pre-allocates a range of numbers based on a pre-defined interval (specified in the DBS Control Record). Consequently each AMP will provide its own sequence independently of the others.
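That per-AMP pre-allocation can be sketched as a toy model. The interval value and allocator below are made up for illustration (the real interval comes from the DBS Control Record), but they show why the assigned numbers end up neither contiguous nor in insert order:

```python
# Hypothetical interval; the real value lives in the DBS Control Record.
RANGE_INTERVAL = 5
_next_start = [1]  # shared pool of identity numbers

def allocate_range():
    """Hand the next block of RANGE_INTERVAL identity values to an 'AMP'."""
    start = _next_start[0]
    _next_start[0] += RANGE_INTERVAL
    return list(range(start, start + RANGE_INTERVAL))

amp1 = allocate_range()  # this AMP owns ids 1..5
amp2 = allocate_range()  # this AMP owns ids 6..10

# Two rows landing on amp2 and one on amp1 receive ids 6, 7, 1:
# each AMP draws from its own range, independently of the others.
assigned = [amp2[0], amp2[1], amp1[0]]
assert assigned == [6, 7, 1]
```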

Auto-Generated Key Retrieval

 

History

Prior to the auto-generated key retrieval feature, there was no simple way to determine the value assigned to the identity column for an INSERTed row of a table. If there was a unique column, or a unique combination of columns, with a USI assigned, then the user could do a SELECT of the identity column, qualifying on the unique column(s). This required an additional query request and would, in some cases, require an all-AMP operation, and therefore was considered inefficient. It also presented a bigger problem in the case of INSERT-SELECT which usually adds multiple rows to the table in question.

Business Value

Having the IdCol values automatically returned enhances applications that require quick or immediate retrieval of assigned identity values.

Examples

  To enable this feature, use the new .SET command in BTEQ:

.[SET] AUTOKEYRETRIEVE [OFF|COLUMN|ROW]

Where:

• OFF = Disabled
• COLUMN = Enabled, display IdCol value only
• ROW = Enabled, display entire row

Example

Let’s say we have the following CREATE TABLE statement:

CREATE TABLE customerDetails
(custID INTEGER GENERATED ALWAYS AS IDENTITY,
custName VARCHAR(30),
city VARCHAR(20),
phoneNo INTEGER);

With Teradata Database V2R6.1 and prior, we enter the following:

INSERT INTO customerDetails VALUES (, 'John', 'London', 919866234567);

And we receive the following results:

*** Insert completed. One row added.
*** Total elapsed time was 1 second.

We now have no idea what value was assigned to the customer without doing a separate retrieve of this row. In Teradata V2R6.1 and prior, quick retrieval of the new custID was possible only if a USI was present on the table. It then required a separate SELECT of the identity column, using a qualifying condition on the unique column(s).

In Teradata Database V2R6.2, we can enter the following:

.SET AUTOKEYRETRIEVE ROW;

INSERT INTO customerDetails VALUES (, 'George', 'London', 919848123);

And we receive the following results:

*** Insert completed. One row added.
*** Total elapsed time was 1 second.

custID   custName    city         phoneNo
------   ---------   ----------   ------------------
     2   George      London       919848123

Note: If no ".SET AUTOKEYRETRIEVE" statement is used, the system functions as before – no rows or IDCol values are displayed.

Considerations and Limitations

  Considerations

INSERTs have the additional cost of row retrieval if Auto Generated Key Retrieval (AGKR) is requested. However, the additional retrieval takes less time overall compared with having to run a separate SELECT to retrieve the identity value.

Limitations

The following limitations apply:

This feature supports explicit single INSERT and INSERT-SELECT statements only. It does not support any other form of insert, e.g., Upsert, MERGE-INTO, triggered inserts, Multiload and Fastload.

Iterated INSERTs have to adhere to the 2048 spool limit of the Array Support feature. A max of 1024 iterations is possible as each iteration uses an AGKR spool and a response spool.


This feature is enabled through the Client, e.g., JDBC driver or BTEQ. If the Client version does not include the AGKR capability, then there will be no AGKR response for INSERT requests.

Lab

Try It! For this set of lab questions you will need information from the Database Info document.

To start the online labs, click on the Access Lab Server button in the lower left-hand screen of the course. A window will pop up with instructions on accessing the lab server.

You will need these instructions to log on to Teradata.

If you experience problems connecting to the lab server, contact [email protected]

Prior to doing these labs, it will be helpful to reset your default database to your own user id (i.e. DATABASE TDnnnn;).


The lab problems in this module use the following table, which exists in the Customer_Service database.

CREATE SET TABLE agent_sales ,NO FALLBACK ,
  NO BEFORE JOURNAL, NO AFTER JOURNAL
  (agent_id INTEGER,
   sales_amt INTEGER)
UNIQUE PRIMARY INDEX ( agent_id );

In this lab, we will attempt to assign a badge to each agent. Each badge must be given a sequentially unique id as it is assigned to an agent. We will use an identity column to accomplish this.


1.) Create a new table in your own database using the following text:

CREATE SET TABLE agent_badge ,NO FALLBACK ,
  NO BEFORE JOURNAL, NO AFTER JOURNAL
  (badge_id INTEGER GENERATED ALWAYS AS IDENTITY,
   agent_id INTEGER)
UNIQUE PRIMARY INDEX ( agent_id );

2a.) Populate this table from the agent_sales table in the Customer_Service database using an INSERT SELECT. The first column should be assigned a NULL and the second column should select the agent_id from the agent_sales table. To keep the example concise, only populate this table with agents whose id is less than 40.
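A sketch of one way step 2a might be written, using the tables shown above (not necessarily the official lab answer):

```sql
-- NULL in the first position lets the identity column generate badge_id:
INSERT INTO agent_badge
SELECT NULL, agent_id
FROM Customer_Service.agent_sales
WHERE agent_id < 40;
```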


2b.) Verify the resulting table. Order the rows by badge_id. Are the identity column values what you might have expected?

3a.) We will now try to recreate the agent_badge table with sequentially assigned ids starting with number one and with no gaps. First, drop and recreate the table.

3b.) Now, set up and execute a BTEQ EXPORT script which exports the agent ids to a file called agent_exp.
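A minimal BTEQ EXPORT script for step 3b might look like the following (logon details omitted; the exact options may differ in your lab environment):

```sql
.EXPORT DATA FILE = agent_exp;
SELECT agent_id
FROM Customer_Service.agent_sales
WHERE agent_id < 40
ORDER BY agent_id;
.EXPORT RESET;
```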

3c.) Now, set up and execute a BTEQ IMPORT script which imports values from the agent_exp file into the agent_badge table.
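A matching BTEQ IMPORT sketch for step 3c (illustrative; the USING clause assumes a single INTEGER field per record):

```sql
.IMPORT DATA FILE = agent_exp;
.REPEAT *
USING (a INTEGER)
INSERT INTO agent_badge VALUES (NULL, :a);
```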

3d.) Verify the results by selecting all rows of the agent_badge table, ordered by badge_id.

 

  6.) Multi-Column Compression

Objectives

 

After completing this module, you should be able to:

Implement column compression on a table using the ALTER TABLE command.
Recognize the benefits of column compression implemented with ALTER TABLE.
Implement multi-value column compression for table columns.
Recognize certain benefits, considerations and limitations of multi-value column compression.

Multiple Value Column Compression

  The COMPRESS phrase allows values in one or more columns of a permanent table to be compressed to zero space, thus reducing the physical storage space required for a table.

The COMPRESS phrase has three variations:

Compression Option                   What Happens
COMPRESS                             Nulls are compressed.
COMPRESS NULL                        Nulls are compressed.
COMPRESS <constant>,<constant>,...   Nulls and the specified <constant> value(s) are compressed.

Note: COMPRESS & COMPRESS NULL mean the same thing.

The last of these three options allows up to 255 values to be compressed for a single column.

Example 1

An example of single-value column compression:

CREATE TABLE bank_account_data
  (customer_id INTEGER
  ,account_type CHAR(10) COMPRESS 'SAVINGS');

Things to notice in this example:

Both nulls and the string 'SAVINGS' will be compressed when the row is stored.

The value of 'SAVINGS' is written in the table header for table bank_account_data on each AMP in the system.

Example 2

An example of multiple-value column compression:

CREATE TABLE bank_account_data
  (customer_id INTEGER
  ,account_type CHAR(10)
     COMPRESS ('SAVINGS','CHECKING','CD','MUTUAL FUND'));

Things to notice in this example:

Each of the four specified strings is now compressed when the row is stored.
Nulls are also compressed.
Each of these values is written to the table header on each AMP.
Zero space is taken in the physical row for each of these values.
This can significantly reduce the amount of space needed to contain the table.

Compression has two primary benefits:

It reduces system storage costs.
It enhances system performance.

Impact Of Multiple Value Column Compression

  System Storage Costs:

Compression reduces storage costs by storing more 'logical' data using fewer 'physical' resources. In general, compression causes physical rows to be smaller, consequently permitting more rows per data block and thus fewer data blocks for the table.

The amount of storage reduction is a function of the following factors:

The number of values compressed.
The size of the values compressed.
The percentage of rows in the table with these values.

System Performance:

System performance can be expected to improve as a result of value compression. Because each data block can hold more rows, fewer blocks need to be read to resolve a query, and thus fewer physical I/O's can be expected to take place. Also, because the data remains compressed while in memory, more rows can be available in cache for processing per I/O.

Compression Transparency:

Compression is transparent to all user applications, utilities, ETL tools, ad hoc queries and views.

Compression Suggestions

 

Examples of highly compressible values:

Any of the following should be considered candidates for compression, when the frequency of their occurrence is high:

Nulls
Zeroes
Spaces
Default Values

Suggested Application Columns For Compression:

Any column with high-frequency values, or with a relatively small number of distinct values, should be considered a candidate for compression. The following is a list of possible candidate columns.

State
City
Country
Automobile Make
Credit Card Type
Account Type
First Name
Last Name
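One way to test whether a column is a good candidate is a simple frequency query; the table and column names here are illustrative:

```sql
-- High occurrence counts for a few values suggest good compression payback:
SELECT city, COUNT(*) AS occurrences
FROM accounts
GROUP BY city
ORDER BY occurrences DESC;
```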

Limitations On Multi-Value Column Compression

  Limitations:

A maximum of 255 values may be compressed per column.
The maximum size of a compressed value is 255 bytes.
Only columns with a fixed physical length may be compressed - e.g. CHAR but not VARCHAR.
Primary index columns cannot be compressed.
The aggregate of all compressed values may not exceed the maximum size of a table header (64K).

What is Not Compressible

The following data types are not compressible:

INTERVAL
TIME
TIMESTAMP
VARCHAR
VARBYTE
VARGRAPHIC

ALTER TABLE Compression

 

The SQL ALTER TABLE command supports adding, changing, or deleting column compression on one or more existing columns of a table, whether the table is loaded with data or is empty.

History

Traditionally there has been a trade-off between the desire to store more data and the cost of storing the additional data. The column compression feature of the Teradata database has always provided an opportunity to reduce the amount of data to be physically carried in the database by storing frequently repeating values in a table header rather than repeating them for every row in the table. The column compression feature thereby helps improve the trade-off between these competing requirements, making it less expensive to store more data.

Without the ALTER TABLE Compression feature, when compression requirements on a table change, it would be necessary to recreate the table with the new compression requirements specified, and then reload the table. The ALTER TABLE Compression feature makes it easier to add, change or delete compression requirements from an existing table without the user having to recreate and reload the table.

Examples

  The following are examples of the ALTER TABLE Compression feature. Note, whenever the COMPRESS attribute is applied to a nullable column, nulls will always be compressed by default. If other compression values are specified, they will be compressed in addition to null compression.

There is an implied sequence in the statements that follow:

Example Set 1:

ALTER TABLE Table1 ADD Col1 COMPRESS;


If the column Col1 exists and is nullable, then the column will be a compressible column with NULL as the compress value.

If the column Col1 does not exist, then the column is added to the table and the column will be a compressible column with NULL as the compress value.

ALTER TABLE Table1 ADD Col1 COMPRESS NULL;

Same as previous example.

ALTER TABLE Table1 ADD Col1 COMPRESS 'Savings';

The column will be compressed for nulls and for the constant value 'Savings'.
If the column is already compressed, the constant value 'Savings' will replace the existing compress value or list of values.

ALTER TABLE Table1 ADD Col1 COMPRESS (NULL, 'Savings', 'Checking');

The column will be compressed on the specified compress list.
If the column is already compressed, the new compress list will replace the existing compress value or list of values. Null will be compressed in either case.

Example Set 2:

ALTER TABLE Table1 ADD Col2 COMPRESS 0;

Column will be compressed for one value - zero. Nulls will also be compressed if the column is nullable (default is nullable).

ALTER TABLE Table1 ADD Col2 COMPRESS (0, 100, 1000);

Add compressed values (100, 1000).
Note that value zero must be restated if this follows the previous ALTER statement.

ALTER TABLE Table1 ADD Col2 COMPRESS (NULL,0,100,1000,10000);

Adds compressed value 10000.
Only 10000 is added to the list since NULL, 0, 100 and 1,000 are already compressed in the prior example.

ALTER TABLE Table1 ADD Col2 COMPRESS (NULL, 0, 100);

Reduces the compression list to null and two values.
Values 1,000 and 10,000 are no longer compressed.

ALTER TABLE Table1 ADD Col2 NO COMPRESS;

All compressed values are now disabled. Column is now uncompressed.
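After any of these ALTERs, the current compress list can be checked by displaying the table DDL (Table1 is the hypothetical table from the examples above):

```sql
-- Returns the CREATE TABLE text, including any COMPRESS phrases:
SHOW TABLE Table1;
```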

Compression Optimizations


 

ALTER TABLE Compression Optimizations

The following listing shows the mechanics of how compression is physically implemented via the ALTER TABLE command and the optimizations used.

The table will actually be rebuilt at the time of the execution of the ALTER TABLE statement.
Space overhead requirement for rebuilding a table is around 2 MB. The table is rebuilt one cylinder at a time, so no matter how big or small a table is, the overhead remains the same.
A full duplicate copy of the table is never created nor required.
The ALTER process is restartable via checkpoints in the Transient Journal.
No rollback process is possible. If a restore of the original table is desirable, this can be accomplished by:
- Re-ALTERing the table with the original compression specifications.
- An archive of the original table followed by a restore.

Considerations and Limitations

 

Limitations

The ALTER TABLE ADD cname syntax allows certain other attributes to be included in the same statement as the COMPRESS attribute. The following are exceptions to this rule:

A column CONSTRAINT cannot be defined at the same time.
A COMPRESS modification with a NULL value in the compress list is not allowed in conjunction with a NOT NULL attribute change.
Altering a non-compressible column to a compressible column is not allowed if changing the column to an Identity column at the same time. These changes may be implemented separately.

Additional Considerations

Compressing columns on which secondary indexes are defined is allowed, unless the index is either the PK or the FK of a Referential Constraint.

An Exclusive lock is required on the table being compressed.

 

Compression Versus VARCHAR

Character data which has significant length variability can also be stored using a VARCHAR data type to save space. VARCHAR stores only the actual value with a two-byte length field in the physical row, while omitting trailing blanks. VARCHAR data types are not eligible for compression because they are not fixed length.

When debating whether VARCHAR or compression is preferable for a character column, three factors are to be considered:

The average field length of the character data.
The maximum field length of the character data.
The frequency of occurrence of the compressible values.

The following rule dictates which approach should be favored:

Choose VARCHAR - when the difference between maximum and average field length is high and the frequency of occurrence is low.

Choose Compression - when the difference between maximum and average field length is low and the frequency of occurrence is high.

Choose VARCHAR - when there is no clear winner between the two. This is because VARCHAR uses slightly less CPU resource.
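The contrast can be sketched with two alternative column definitions (illustrative table and column names, not from the course database):

```sql
-- Lengths vary widely and few values repeat: favor VARCHAR.
CREATE TABLE t_notes (id INTEGER, note VARCHAR(100));

-- Lengths are similar and a handful of values dominate: favor compression.
CREATE TABLE t_status (id INTEGER, status CHAR(10) COMPRESS ('OPEN','CLOSED'));
```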

Summary

 

Multiple Value Column Compression

This feature permits up to 255 values to be compressed for a given column. The benefits of using this feature are:

Decreased usage of disk space.
Increased table query performance.

 

ALTER TABLE Compression

This feature of the SQL ALTER TABLE command supports adding, changing, or deleting column compression on one or more existing columns of a table, regardless of whether the table is loaded with data or is empty. When ALTER TABLE is used to change or initiate column compression, the table will be rebuilt internally.

 

Lab

Try It! For this set of lab questions you will need information from the Database Info document.

To start the online labs, click on the Access Lab Server button in the lower left-hand screen of the course. A window will pop up with instructions on accessing the lab server.

You will need these instructions to log on to Teradata.

If you experience problems connecting to the lab server, contact [email protected]

Prior to doing these labs, it will be helpful to reset your default database to your own user id (i.e. DATABASE TDnnnn;).



 

 

 


1.) Display the table definition for the Customer_Service.accounts table.

2.) How many distinct values of the column 'city' are found in the accounts table and how many occurrences are there for each value?

3.) How much space is taken by this table currently?

4.) Create the table Accounts in your database with compression specified for the following 'city' values:

    'Culver City', 'Hermosa Beach', 'Los Angeles', 'Santa Monica'

5.) Populate your table with the rows from Customer_Service.accounts

6.) See how much space your table requires compared to the uncompressed version in the Customer_Service database, as seen in lab #3.

7.) How many distinct cities, states and zip codes are contained in the accounts table? Use a single query to answer this question.
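One possible single-query form for lab 7 (the column names are assumed from context and may differ in the actual accounts table):

```sql
SELECT COUNT(DISTINCT city)
      ,COUNT(DISTINCT state)
      ,COUNT(DISTINCT zip_code)
FROM Customer_Service.accounts;
```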

  7.) Join Indexes and NUSI's

Objectives

 

After completing this module, you should be able to:

Describe the purpose of value-ordered NUSI's and their implementation.
Describe the purposes of Join Indexes and their implementation.
Distinguish between single-table and multi-table join indexes.
Add a NUSI to a Join Index.

NUSI Review

  Non-Unique Secondary Indexes (NUSI's) are a Teradata index feature which permits defining non-primary indexes on non-unique columns. Typically, this is done to improve performance on queries which use the column or columns in the WHERE clause selection criteria. NUSI's may be created either as a part of the CREATE TABLE syntax, or they may be created after table creation using CREATE INDEX syntax. NUSI's may be easily dropped when their presence is no longer needed by using the DROP INDEX syntax.

Alternative 1 – CREATE TABLE Syntax

Create an 'employee' table with a NUSI on the job code.

CREATE SET TABLE employee ,FALLBACK ,
  (employee_number INTEGER,
   manager_employee_number INTEGER,
   department_number INTEGER,
   job_code INTEGER,
   last_name CHAR(20) NOT NULL,
   first_name VARCHAR(30) NOT NULL,
   hire_date DATE FORMAT 'YY/MM/DD' NOT NULL,
   birthdate DATE FORMAT 'YY/MM/DD' NOT NULL,
   salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number )
INDEX (job_code);

Alternative 2 – CREATE INDEX Syntax

Create a NUSI on the job code column for existing 'employee' table.

CREATE INDEX (job_code) ON employee;

Example – DROP INDEX Syntax

Drop the NUSI on the job code column of the 'employee' table.

DROP INDEX (job_code) ON employee;

Upon creation of a NUSI, a subtable is built on each AMP. The subtable contains a row for each NUSI value to be found on this AMP and the row-ids of the associated base table rows which are co-located on the AMP.

Rows are sequenced in the subtable based on the hash of the NUSI value. While this is convenient for finding all rows with a particular NUSI value, it is less useful for doing 'range' searches. For example, the index created here would be useful in finding all employee rows with a job code of 122100, but less useful in locating all employee rows whose job code is between 122000 and 123000.

Value Ordered NUSIs

  Value Ordered NUSI's allow NUSI subtable rows to be sorted based on a data value, rather than on a hash of the value. This is extremely useful for range processing where a sequence of values between an upper and lower limit is desired.

Alternative 1 – CREATE TABLE Syntax

Create an 'employee' table with a value-ordered NUSI on the job code.

CREATE SET TABLE employee ,FALLBACK ,
  (employee_number INTEGER,
   manager_employee_number INTEGER,
   department_number INTEGER,
   job_code INTEGER,
   last_name CHAR(20) NOT NULL,
   first_name VARCHAR(30) NOT NULL,
   hire_date DATE FORMAT 'YY/MM/DD' NOT NULL,
   birthdate DATE FORMAT 'YY/MM/DD' NOT NULL,
   salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number )
INDEX (job_code) ORDER BY VALUES (job_code);

Alternative 2 – CREATE INDEX Syntax

Create a value-ordered NUSI on the job code column of existing 'employee' table.

CREATE INDEX (job_code) ORDER BY VALUES (job_code) ON employee;

The optimizer may now choose this index to do range searches on job codes.
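For example, the optimizer can now resolve the range search described earlier directly against the value-ordered subtable:

```sql
-- Range predicate on the value-ordered NUSI column:
SELECT employee_number, job_code
FROM employee
WHERE job_code BETWEEN 122000 AND 123000;
```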

The NUSI's are automatically maintained by the Teradata database, that is, when a base table row changes values, any corresponding values in the NUSI subtable are also changed. It is never necessary to do anything to maintain a secondary index. You can only create it, drop it and collect statistics on it.

Limitations of Value-Ordered NUSI's

  A column defined as a value-ordered index column must be:

A single column
A column which is a part of or all of the index definition
A numeric column – non-numerics are not allowed
No greater than four bytes in length – INT, SMALLINT, BYTEINT, DATE, DEC are valid

Note: Although DECIMAL data types are permitted, their storage length must not exceed four bytes and they cannot have any precision digits.

Index Covering

If a query references only those columns that are contained within a given index, the index is said to "cover" the query. In these cases, it is often more efficient for the optimizer to access only the index subtable and avoid accessing the base table rows altogether.

Covering will be considered for any query that references only columns defined in a given NUSI. These columns can be specified anywhere in the query including the:

1. SELECT list
2. WHERE clause
3. aggregate functions
4. GROUP BY
5. expressions

The presence of a WHERE condition on each of the indexed columns is not a guarantee for using the index to cover the query. The optimizer will consider the appropriateness and cost of 'covering' versus other alternative access paths and choose the optimal plan.

The potential performance gains from index covering require no user intervention and will be transparent except for the improved access time. The use of the NUSI can be validated by reviewing the execution plan returned by EXPLAIN.
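As an illustration, the following query references only the indexed column, so the job_code NUSI can cover it (whether the optimizer actually chooses the index depends on cost):

```sql
-- Only job_code appears, so the NUSI subtable alone can answer this:
SELECT job_code, COUNT(*)
FROM employee
GROUP BY job_code;
```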

Join Index

 

The Join Index feature provides indexing techniques that can improve the performance of certain types of queries. The Join Index is a physical structure, populated with rows that contain columns from one or more tables. Once created, it becomes an option available to the optimizer but is never directly accessed by the user.

Its purpose is to aid in the joining of tables by providing needed data from an index rather than having to access the base rows of the table. By using a join index the optimizer may be able to avoid having to access or redistribute many individual tables and their base rows.

The Join Index supports syntax for the following types of indexes:

1. Multiple-table Join Index - Used to pre-join multiple tables
2. Single-table Join Index - Used to rehash and redistribute the rows of a single table based on a specified column or columns
3. Aggregate Join Index - Used to create an aggregate index to be used as a summary table

In this section we will be discussing the first two only. The third item, aggregate indexes, is discussed in a separate module of this training.

Multiple-Table Join Indexes

  Multiple-table Join Indexes are used to pre-join two or more tables. Consider the following tables which are in the 'Student' database.

CREATE TABLE customers
  (cust_id INTEGER NOT NULL,
   cust_name CHAR(15),
   cust_addr CHAR(25))
UNIQUE PRIMARY INDEX ( cust_id );

CREATE TABLE orders
  (order_id INTEGER NOT NULL,
   order_date DATE FORMAT 'yyyy-mm-dd',
   cust_id INTEGER,
   order_status CHAR(1))
UNIQUE PRIMARY INDEX ( order_id );

The relationship of these tables is as follows:

There are 49 orders with valid customers.
There is 1 order that has an invalid customer.
There is 1 valid customer who has no orders.

Query 1 – Without Join Index

How many orders have assigned customers?

SELECT COUNT(order_id) FROM orders WHERE cust_id IS NOT NULL;

Count(order_id)
---------------
             50

A join index will not help this query. The 'orders' table covers the query.

 

Query 2 – Without Join Index

How many orders have assigned valid customers?

SELECT COUNT(o.order_id)
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id;

Count(order_id)
---------------
             49

A join index can help this query. Two tables are needed to cover the query.

Creating A Join Index

 

The following shows the creation of a join index which will improve the performance of any joins it can cover.

CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name)
      ,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);

The join index consists of a 'fixed portion' (the first parenthesized list) and a 'repeatable portion' (the second parenthesized list). This represents a denormalization of the data and logically looks like the following:

CUST_ID   CUST_NAME   ORDER_ID   ORDER_STATUS   ORDER_DATE
1001      ABC Corp    501        C              990120
                      502        C              990220
                      503        C              990320
                      504        C              990420
                      505        C              990520
                      506        C              990620
1002      BCD Corp    507        C              990122
                      508        C              990222
                      509        C              990322
:         :           :          :              :

Now, let's revisit the same query again.

Query 2 – With Join Index

How many orders have assigned valid customers?

SELECT COUNT(o.order_id)
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id;

Count(order_id)
---------------
             49

The join index helps this query because it covers the query and therefore the result may be generated without ever accessing the rows of the base table.

Compare the costs as shown by EXPLAIN.

Without Join Index – .39 secs.
With Join Index – .17 secs.

This represents a better than 50% decrease in query time because of the join index.

The join index is automatically maintained by the Teradata RDBMS, that is, when a base table row changes values, any corresponding values in the join index are also changed. It is never necessary to do anything to maintain a join index. You can only create it and drop it.

The join index seen above may also be used to cover other queries.

(Repeated For Convenience)

CUST_ID   CUST_NAME   ORDER_ID   ORDER_STATUS   ORDER_DATE
1001      ABC Corp    501        C              990120
                      502        C              990220
                      503        C              990320
                      504        C              990420
                      505        C              990520
                      506        C              990620
1002      BCD Corp    507        C              990122
                      508        C              990222
                      509        C              990322
:         :           :          :              :

 

Query 3 – With Join Index

How many valid customers have assigned orders in January 1999?

SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN 990101 AND 990131;

Count(cust_id)
--------------
             9

 

Because the 'order_date' column is included in the join index, once again this query is covered by it. Compare the costs as shown by EXPLAIN.

Without Join Index – .40 secs.
With Join Index – .17 secs.

This represents a greater than 50% decrease in query time because of the join index.

Assigning the Primary Index For Join Indexes

  Join Indexes are always assigned a primary index in order to hash distribute the index rows across the AMPs. In the example created here, the Primary Index of the Join Index is the column Cust_id.

We can explicitly specify the primary index for the Join Index or allow it to default to the first column specified. In our upcoming look at Single-Table Join Indexes, we will see the usefulness of being able to choose and specify a primary index on a Join Index.

CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name)
      ,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);

Value Ordering Join Index Rows

Optionally, you can specify the sequencing of the join index rows on each AMP. Normally, the rows will be sequenced on each AMP by the hash value of the primary index for the Join Index. Because this default sequencing limits the efficiency for doing 'range' processing, an ORDER BY clause is available which allows this default sequencing to be overridden.

CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name)
      ,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id
ORDER BY c.cust_id
PRIMARY INDEX (cust_id);

Query 3 – With Join Index ORDERed BY cust_id

How many distinct valid customers with customer ids between 1001 and 1005 have assigned orders?

SELECT COUNT(DISTINCT(c.cust_id))
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id
WHERE c.cust_id BETWEEN 1001 AND 1005;

Count(Distinct(cust_id))
------------------------
                       5

Because this query accesses a range of customer ids, the optimizer can access the rows of the Join Index more efficiently because the qualifying rows are already sequenced by cust_id and thus easily located.

The rules for the ORDER BY column are the same as for Value-Ordered NUSI's.

The ORDER BY column must be:

A single column
A column which is a part of or all of the fixed-portion index definition
A numeric column – non-numerics are not allowed
No greater than four bytes in length – INT, SMALLINT, BYTEINT, DATE, DEC are valid

Note: Although DECIMAL data types are permitted, their storage length must not exceed four bytes and they cannot have any precision digits.

 

NUSIs On Join Indexes

  A NUSI may be created on a join index and may be used to improve access to the join index rows. In the example just seen, we ordered the rows of the join index by 'cust_id' in order to facilitate 'range' processing on customer numbers.

Because the rows of the join index can only be sequenced by one column, we need to use another technique to facilitate 'range' processing for the order date.

We can solve this problem by adding a NUSI on the join index and value ordering it on the order date. NUSI's on join indexes can be built as part of the CREATE JOIN INDEX statement, or they can be added after join index creation using the CREATE INDEX statement.

Alternative 1 – CREATE JOIN INDEX Syntax

Create the same join index and also create a NUSI on the Join Index for the 'order_date' column. Value order the NUSI on the 'order_date' column.

CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name)
      ,(order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id
ORDER BY c.cust_id   /* This ORDER BY controls how the rows of the Join Index will be sorted on the AMPs */
PRIMARY INDEX (cust_id)
INDEX (order_date) ORDER BY (order_date);   /* This ORDER BY controls how the rows of the NUSI will be sorted on the AMPs */

Alternative 2 – CREATE INDEX Syntax


Create a NUSI on the existing join index 'cust_ord_ix'.

CREATE INDEX (order_date) ORDER BY VALUES (order_date) ON cust_ord_ix;

Note: The keyword VALUES is optional.
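With this NUSI in place, re-running Query 3 gives the optimizer the option of resolving the date-range predicate through the value-ordered order_date NUSI on the join index:

```sql
SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
  ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN 990101 AND 990131;
```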

Single-Table Join Indexes

  Single-Table Join Indexes are created to rehash and redistribute the rows of a table by a column other than the Primary Index column. The redistributed index table may be a subset of the columns (vertical subset) of the base table. It can significantly reduce the costs associated with doing a table redistribution for join processing.

In building join plans for two tables, the optimizer must first decide how to ensure that all joinable rows are co-located on the same AMP. If both tables are being joined on their respective Primary Index columns, the joinable rows are already co-located on the same AMP, thus no redistribution of data is needed. If either table is not using its Primary Index columns as the join column(s), then a redistribution must occur.

Single-Table Join Indexes provide the ability to 'pre-distribute' the rows of a table based on the hash of the join value. This will eliminate the need for the optimizer to require a redistribution to perform the join - it can take advantage of the already distributed rows of the Single-Table Join Index.

Consider the two tables 'employee' and 'department'.

CREATE SET TABLE employee ,FALLBACK
  (employee_number INTEGER,
   manager_employee_number INTEGER,
   department_number INTEGER,
   job_code INTEGER,
   last_name CHAR(20) NOT NULL,
   first_name VARCHAR(30) NOT NULL,
   hire_date DATE FORMAT 'YY/MM/DD' NOT NULL,
   birthdate DATE FORMAT 'YY/MM/DD' NOT NULL,
   salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );

CREATE TABLE department ,FALLBACK
  (department_number SMALLINT
  ,department_name CHAR(30) NOT NULL
  ,budget_amount DECIMAL(10,2)
  ,manager_employee_number INTEGER)
UNIQUE PRIMARY INDEX (department_number);

Assume we would like to perform the following query.

Query 4

Select all employee numbers, their department number and department name.


SELECT e.employee_number
      ,d.department_number
      ,d.department_name
FROM employee e INNER JOIN department d
  ON e.department_number = d.department_number;

Joining these two tables on the 'department_number' column might require a redistribution of the rows of the employee table. The department table is already distributed based on the PI column 'department_number', but the employee table is distributed on the PI column 'employee_number'. Depending on the size of the tables, the redistribution can become a costly operation.

One possible technique to expedite this join would be to create a Single-Table Join Index on the employee table as follows:

CREATE JOIN INDEX emp_deptno AS
SELECT employee_number, department_number
FROM employee
PRIMARY INDEX (department_number);

Executing Query 4 again would give the optimizer the opportunity to use the join index and thus enable it to avoid the cost of redistributing the rows of the employee table. If we EXPLAIN the query, we can see that the join index was indeed used.

Query 4 (Explained)

Select all employee numbers, their department number and department name.

EXPLAIN SELECT e.employee_number
      ,d.department_number
      ,d.department_name
FROM employee e
INNER JOIN department d
  ON e.department_number = d.department_number;

(Partial Listing)

4. We do an all-AMPs JOIN step from PED.d by way of a RowHash match scan with no residual conditions, which is joined to PED.emp_deptno. PED.d and PED.emp_deptno are joined using a merge join, with a join condition of ("PED.emp_deptno.department_number = PED.d.department_number"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 24 rows. The estimated time for this step is 0.18 seconds.

Summary

  The following are index options available for query performance enhancement which we have seen in this module.

- Hash-ordered NUSIs – traditional NUSIs
- Value-ordered NUSIs – to facilitate range searches on the index value
- Single-table join indexes – to pre-hash-distribute rows of one table to co-locate with joinable rows
- Multi-table join indexes – to pre-join existing table rows from multiple tables

Join indexes may be further enhanced by applying the following features to them:

- Define the PI of the join index – to distribute the join index rows most effectively
- Define the ordering of the join index rows – to sequence the join index rows
- Define a NUSI on the join index – to access rows in the join index more effectively (Multi-table join index only)
- Define the ordering of the NUSI rows – to sequence the NUSI rows, either hash or value-based (Multi-table join index only)

Lab

Try It! For this set of lab questions you will need information from the Database Info document.

To start the online labs, click on the Telnet button in the lower left hand screen of the course. Two windows will pop-up: a BTEQ Instruction Screen and your Telnet Window. Sometimes the BTEQ Instructions get hidden behind the Telnet Window. You will need these instructions to log on to Teradata.

Prior to doing these labs, it will be helpful to reset your default database to your own user id (i.e. DATABASE tdxxx;).

Click on the buttons to the left to see the answers.

Answers: Lab A.1  Lab A.2  Lab A.3  Lab B.1  Lab B.2  Lab B.3  Lab B.4  Lab C.1  Lab C.2

A.1 Make copies in your own database of the city and state tables which are located in the Student database. You may accomplish this by using the following SQL:

CREATE TABLE state AS Student.state WITH DATA ;

CREATE TABLE city AS Student.city WITH DATA;

 Look at the table definitions for the City table and the State table. Construct a query which returns the following information by inner joining these two tables. Order the results by city population within state population.

City Name    City Population    State Name    State Population
...          ...                ...           ...

A.2  Create a Join Index in your own database called citystateidx. The fixed portion of the index should contain the state name and the state population. The variable portion should contain the city name and the city population.

A.3  Now rerun the query to see if you get the same result. If you do, EXPLAIN the query to see if your Join Index was used.

B.1  Drop the Join Index citystateidx.

B.2  Modify the query to only show states whose population is between one and three million. Run the query.

B.3  Recreate the Join Index, however this time insure that the index rows will be sorted by the state population column.

B.4  Rerun the query to insure the same results, then EXPLAIN the query to determine if the Join Index was used.

C.1  Add a NUSI on the Join Index on the city population column. Value order the NUSI by city population.

C.2  Rerun the query to insure the same results, then EXPLAIN the query to determine if the NUSI was used.

  8.) Aggregate Join Indexes

Objectives

 

Upon completion of this module, you should be able to:

- Create an aggregate join index.
- Determine when an aggregate join index is advantageous.
- Determine when an aggregate join index is being used by the optimizer.

Join Index Review

 

A Join Index is an optional index which may be created by the user for one of the following three purposes:

- Pre-join multiple tables
- Distribute the rows of a single table on the hash value of a foreign key column
- Aggregate one or more columns of a single table or multiple tables into a summary table

The first two listed purposes are covered in an earlier module of this training program.

In this module, we will concentrate on the last of the three purposes — aggregating columns into a join index that the optimizer may choose to use as a summary table.

Why An Aggregate Index


 

Summary Tables

Queries which involve counts, sums, or averages over large tables require processing to perform the needed aggregations. If the tables are large, query performance may be affected by the cost of performing the aggregations. Traditionally, when these queries are run frequently, users have built summary tables to expedite their performance. While summary tables do help query performance, there are disadvantages associated with them as well.

Summary Tables Limitations

- Require the creation of a separate table
- Require initial population of the table
- Require refresh of changing data, either via update or reload
- Require queries to be coded to access summary tables, not the base tables
- Allow for multiple versions of the truth when the summary tables are not up-to-date

Aggregate Indexes

Aggregate indexes provide a solution that enhances the performance of the query while reducing the requirements placed on the user. All of the above listed limitations are overcome with their use.

An aggregate index is created similarly to a join index, with the difference that sums, counts and date extracts may be used in the definition. A denormalized summary table is internally created and populated when the index is created. The index can never be accessed directly by the user. It is available only to the optimizer as a tool in its query planning.

Aggregate indexes do not require any user maintenance. When underlying base table data is updated, the aggregate index totals are adjusted to reflect the changes. While this requires additional processing overhead when a base table is changed, it guarantees that the user will have up-to-date information in the index.
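The maintenance behavior described above can be illustrated with a small Python sketch. This is a conceptual model only; the function names and storage layout are invented, and real aggregate join index maintenance happens inside the database engine. The point is that each base-table change adjusts the stored SUM and COUNT, so the summary is never stale:

```python
# Conceptual sketch: keep a pre-aggregated summary in sync with base rows.
from collections import defaultdict

base_rows = []                           # (itemid, year, month, sales)
summary = defaultdict(lambda: [0.0, 0])  # key -> [sum_sales, row_count]

def insert_sale(itemid, year, month, sales):
    base_rows.append((itemid, year, month, sales))
    agg = summary[(itemid, year, month)]
    agg[0] += sales                      # maintain SUM(sales)
    agg[1] += 1                          # maintain COUNT(*)

def delete_sale(itemid, year, month, sales):
    base_rows.remove((itemid, year, month, sales))
    agg = summary[(itemid, year, month)]
    agg[0] -= sales
    agg[1] -= 1

insert_sale(10, 2010, 1, 1200.00)
insert_sale(10, 2010, 1, 750.00)
delete_sale(10, 2010, 1, 1200.00)

# The summary answers "monthly total" without rescanning base_rows.
print(summary[(10, 2010, 1)][0])  # -> 750.0
```

The extra work in insert_sale and delete_sale mirrors the "additional processing overhead when a base table is changed" mentioned above; the payoff is that reads never recompute the aggregate.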

Aggregate Index Properties

  Aggregate Indexes are similar to other Join Indexes in that they are:

- Automatically kept up to date without user involvement.
- Never accessed directly by the user.
- Optional and provide an additional choice for the optimizer.
- MultiLoad and FastLoad may not be used to load tables for which join indexes are defined.

Aggregate Indexes are different from other Join Indexes in that they:

- Use the SUM and COUNT functions.
- Permit use of EXTRACT YEAR and EXTRACT MONTH from dates.

You must have one of the following two privileges to create any join index:

- DROP TABLE rights on each of the base tables (or the containing database), or
- INDEX privilege on each of the base tables

Additionally, you must have this privilege:

CREATE TABLE on the database or user which will own the join index

The following table will be used in the subsequent examples:

CREATE SET TABLE PED.daily_sales_2010, NO FALLBACK,
     NO BEFORE JOURNAL, NO AFTER JOURNAL
     (itemid INTEGER
     ,salesdate DATE FORMAT 'YY/MM/DD'
     ,sales DECIMAL(9,2))
PRIMARY INDEX (itemid);

Without An Aggregate Index

 

Consider the following problem, without the use of an aggregate index.

Example

SELECT EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
AND Yr IN ('2009', '2010')
GROUP BY 1,2
ORDER BY 1,2;

         Yr         Mon  Sum(sales)
----------- ----------- -----------
       2009           1     2150.00
       2009           2     1950.00
       2009           8     1950.00
       2009           9     2100.00
       2010           1     1950.00
       2010           2     2100.00
       2010           8     2200.00
       2010           9     2550.00

EXPLAINing Without An Aggregate Index

  Explaining the previous query shows us that this is a primary index access against the 'daily_sales_2010' table. (Note that because the cost of aggregation is not calculated, no final cost for the query is generated.)

EXPLAIN SELECT EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
AND Yr IN ('2009', '2010')
GROUP BY 1,2
ORDER BY 1,2;


Explanation

1. First, we do a SUM step to aggregate from PED1.daily_sales_2010 by way of the primary index "PED1.daily_sales_2010.itemid = 10" with a residual condition of ("((EXTRACT(YEAR FROM (PED1.daily_sales_2010.salesdate )))= 2009) OR ((EXTRACT(YEAR FROM (PED1.daily_sales_2010.salesdate )))= 2010)"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 2. The size of Spool 2 is estimated with high confidence to be 1 to 1 rows.

2. Next, we do a single-AMP RETRIEVE step from Spool 2 (Last Use) by way of the primary index "PED1.daily_sales_2010.itemid = 10" into Spool 1, which is built locally on that AMP. Then we do a SORT to order Spool 1 by the sort key in spool field1. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.17 seconds.

3. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.

Creating An Aggregate Index

 

Creating a join index gives the optimizer the option of using the 'pre-aggregated' information kept in the index, thus avoiding the need to generate a separate aggregation step.

CREATE JOIN INDEX monthly_sales AS
SELECT itemid AS Item
      ,EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales) AS SumSales
FROM daily_sales_2010
GROUP BY 1,2,3;

If we run exactly the same query as before, with no changes, we will get the same result; however, this time it will take advantage of the aggregate index.

SELECT EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
AND Yr IN ('2009', '2010')
GROUP BY 1,2
ORDER BY 1,2;

  Yr  Mon  Sum(sales)
----  ---  ----------
2009    1     2150.00
2009    2     1950.00
2009    8     1950.00
2009    9     2100.00
2010    1     1950.00
2010    2     2100.00
2010    8     2200.00
2010    9     2550.00


EXPLAINing The Use Of Aggregate Index

 

Explaining the previous query shows us that this time the aggregate index is employed.

EXPLAIN SELECT EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
AND Yr IN ('2009', '2010')
GROUP BY 1,2
ORDER BY 1,2;

Explanation

1. First, we do a SUM step to aggregate from join index table PED1.monthly_sales by way of the primary index "PED1.monthly_sales.Item = 10", and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 2. The size of Spool 2 is estimated with low confidence to be 4 to 4 rows.

2. Next, we do a single-AMP RETRIEVE step from Spool 2 (Last Use) by way of the primary index "PED1.monthly_sales.Item = 10" into Spool 1, which is built locally on that AMP. Then we do a SORT to order Spool 1 by the sort key in spool field1. The size of Spool 1 is estimated with low confidence to be 4 rows. The estimated time for this step is 0.17 seconds.

3. Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1.

Because the aggregations are already calculated and available in the index, the costs associated with step one are reduced. The cost of step two is unchanged (0.17).

Because aggregation costs are not currently carried in EXPLAIN text, the savings in processing time for step one are not shown, however the response time reduction for the user can and should be substantial.

SHOWing The Aggregate Index

  Join index definitions may be seen using the SHOW JOIN INDEX construct.

Example

Show the aggregate index named monthly_sales.

SHOW JOIN INDEX monthly_sales;

CREATE JOIN INDEX PED1.monthly_sales, NO FALLBACK AS
SELECT COUNT(*)(FLOAT, NAMED CountStar)
      ,PED1.daily_sales_2010.itemid (NAMED Item)
      ,EXTRACT(YEAR FROM (PED1.daily_sales_2010.salesdate)) (NAMED Yr)
      ,EXTRACT(MONTH FROM (PED1.daily_sales_2010.salesdate)) (NAMED Mon)
      ,SUM(PED1.daily_sales_2010.sales) (NAMED SumSales)
FROM PED1.daily_sales_2010
GROUP BY 2,3,4
PRIMARY INDEX (Item);

In showing an index definition, some changes should be noted:

- All column names are fully qualified to the database level.
- A count of rows is automatically added if not specified in the definition, thus supporting count aggregations. This can be seen in the SHOW result - (COUNT(*) NAMED CountStar).
- If both COUNT and SUM are in the index, AVERAGE calculations may also make use of the index.
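The AVERAGE point comes down to simple arithmetic. In this hypothetical sketch (the row values are invented for illustration), the stored SumSales and CountStar for one index row are enough to answer an AVG request without touching the base table:

```python
# One aggregate-index row for (Item=10, Yr=2009, Mon=1); values illustrative.
sum_sales = 2150.00   # SumSales, stored in the index
count_star = 2        # CountStar, added automatically to the index

# AVG(sales) is derived from the two stored aggregates, not stored itself.
avg_sales = sum_sales / count_star
print(avg_sales)  # -> 1075.0
```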

Covering The Query - Example 1

  Ultimately, just as with any join index, it is the optimizer's choice whether or not the index is useful for a specific query. The index created previously is repeated here for convenience.

CREATE JOIN INDEX monthly_sales AS
SELECT itemid AS Item
      ,EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales) AS SumSales
FROM daily_sales_2010
GROUP BY 1,2,3;

Any query against the 'daily_sales_2010' table which requests an aggregation in any of the following formats will use the 'monthly_sales' index.

- Sum of sales by year
- Sum of sales by month
- Sum of sales by month within year
- Grand total sum of sales

An index is said to 'cover' the query (or cover part of the query) if the optimizer can generate the query results using the index as a replacement for one or more of the specified tables.

Example

Show the grand total sales for item 10 as contained in the daily_sales_2010 table.

SELECT itemid
      ,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
GROUP BY 1;


 itemid  Sum(sales)
-------  ----------
     10    16950.00

EXPLAIN SELECT itemid
      ,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
GROUP BY 1;

Explanation (Partial)

1. First, we do a SUM step to aggregate from join index table PED1.monthly_sales by way of the primary index "PED1.monthly_sales.Item = 10" with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 2. The size of Spool 2 is estimated with high confidence to be 1 to 1 rows.

Note: This is an example of the index 'monthly_sales' covering the query.

Covering The Query - Example 2

 

Show the total sales for item 10 for 2010.

SELECT itemid
      ,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
AND EXTRACT(YEAR FROM salesdate) = '2010'
GROUP BY 1;

 itemid  Sum(sales)
-------  ----------
     10     8800.00

EXPLAIN SELECT itemid
      ,SUM(sales)
FROM daily_sales_2010
WHERE itemid = 10
AND EXTRACT(YEAR FROM salesdate) = '2010'
GROUP BY 1;

Explanation (Partial)

1. First, we do a SUM step to aggregate from join index table PED1.monthly_sales by way of the primary index "PED1.monthly_sales.Item = 10", and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 2. The size of Spool 2 is estimated with high confidence to be 1 to 1 rows.

Aggregate Indexes With Functions


 

Aggregate indexes are not used in conjunction with queries using SUM Window, COUNT Window, WITH or WITH BY functions. Because these functions must process and display all qualifying detail rows, the value of the aggregate index is reduced. Explaining any query using these functions will validate that the index is not used.
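The reason pre-aggregated rows cannot serve these window functions can be seen in a short Python sketch (data values are invented; the point is that a windowed sum is defined per detail row, which the collapsed group total cannot reproduce):

```python
# Detail rows for one item, in date order (illustrative values).
daily_sales = [100.0, 250.0, 75.0, 300.0]

# SUM(...) OVER (ROWS UNBOUNDED PRECEDING) must emit one value per input row.
running = []
total = 0.0
for s in daily_sales:
    total += s
    running.append(total)

# An aggregate index keeps only the collapsed group total.
group_total = sum(daily_sales)

print(running)      # -> [100.0, 350.0, 425.0, 725.0]
print(group_total)  # -> 725.0; the per-row detail in `running` is unrecoverable
```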

Lab

 

Try It! For this set of lab questions you will need information from the Database Info document.

To start the online labs, click on the Access Lab Server button in the lower left hand screen of the course. A window will pop-up with instructions on accessing the lab server.

You will need these instructions to log on to Teradata.

If you experience problems connecting to the lab server, contact [email protected]

Prior to doing these labs, it will be helpful to reset your default database to your own user id (i.e. DATABASE TDnnnn;).

Click on the buttons to the left to see the answers.

Answers: Lab 1  Lab 2  Lab 3  Lab 4  Lab 5

1.) Prior to doing this lab exercise, it will be necessary to recreate a copy of the employee table in your user space. Accomplish this with the following commands:

DATABASE tdxxx;

CREATE TABLE employee AS Customer_Service.employee WITH DATA;

Create an aggregate index called 'dept_sals' which sums all of the salaries of the employees in the employee table by department.

2.) Write a query which shows each department number and the sum of salaries for that department. Order results by department number.

3.) Explain the query in lab #2. Does it use the aggregate index?

4.) Modify the query in lab #2 and add a column which shows the average salary in each department. Call this column 'Avg_Sal'.

5.) Explain this query and note if the aggregate index was used.

  9.) Hash Indexes

Objectives

  After completing this module, you should be able to:


- Recognize the performance advantage of using hash indexes.
- Create and implement hash indexes.
- Determine when hash indexes are used by a query.

Indexes Revisited

  In the next few pages, we will be looking at Hash Indexes and their properties. Because they share many attributes with secondary indexes and join indexes, let's first review the basics of secondary indexes and join indexes.

Secondary Indexes

Secondary indexes are defined to provide alternate access pathways to the base rows of a single table. Users may define secondary indexes, but they cannot be accessed directly by the user, nor can the user affect how the index rows are distributed. Their use or non-use is an option to the optimizer in its query planning.

The following are properties of secondary indexes:

- They contain pointers to the base rows of the table
- Are always defined on a single table
- Can 'cover' certain queries, but their primary purpose is locating base rows

Secondary Indexes exist in two formats:

Unique Secondary Index (USI) - there is a one-to-one relationship between the index rows and the base table rows.

Non-Unique Secondary Index (NUSI) - there is a one-to-many relationship between the index rows and the base table rows. NUSI rows may be either 'hash' or 'value' ordered.

Join Indexes

Join indexes are defined to reduce the number of rows processed in generating result sets from certain types of queries, especially joins. Like secondary indexes, users may not directly access join indexes. They are an option available to the optimizer in query planning. The following are properties of join indexes:

- Are used to replicate and 'pre-join' information from several tables into a single structure.
- Are designed to cover queries, reducing or eliminating the need for access to the base table rows.
- Usually do not contain pointers to base table rows (unless user defined to do so).
- Are distributed based on the user choice of a Primary Index on the Join Index.
- Permit Secondary Indexes to be defined on the Join Index (except for Single-Table Join Indexes), with either 'hash' or 'value' ordering.

Join Indexes exist in three general formats:

- Single Table Join Index (STJI)
  o Defined on a single table, usually for the purpose of redistributing the table rows based on the hash value of a foreign key column (or columns).
  o Facilitates the ability to join the foreign key table with the primary key table.
- (Multi-Table) Join Index (JI)
  o A join index which contains 'pre-joined' data from two or more tables.
  o Facilitates join operations by reducing or eliminating the need to redistribute and join base table rows.
- Aggregate Join Index (AJI)
  o A join index which contains an aggregation operator such as COUNT or SUM.
  o Facilitates aggregation queries wherein the pre-aggregated values contained in the AJI may be used instead of relying on base table calculations.

Hash Index Definition

  Hash Indexes are database objects that are user-defined for the purpose of improving query performance. They are file structures which contain properties of both secondary indexes and join indexes.

Like Secondary Indexes

Hash Indexes are similar to secondary indexes in the following ways:

- They are created for a single table only.
- They contain information which allows access to base table rows.
- The CREATE syntax is very similar to a secondary index.
- They may sometimes cover a query without use of the base table rows.

Like Join Indexes

Hash Indexes are similar to join indexes in the following ways:

- They 'pre-locate' joinable rows to a common location.
- The distribution and sequencing of the rows are user specified.
- They are very similar to single-table join indexes (STJI), however with added functionality.

Unlike Join Indexes

Hash Indexes are unlike join indexes in the following ways:

- No aggregation operators are permitted.
- They are always defined on a single table.
- They automatically contain the base table PI value as part of the hash index subtable row.
- They contain additional information needed to locate the base table row (e.g. the uniqueness value).
- No secondary indexes may be built on the hash index.

Note:


All indexes, whether secondary, join or hash, are automatically updated by the system when the underlying table rows are changed.

Hash Index Examples

  Creating Hash Indexes

Example 1

Consider the following Hash Index definition:

CREATE HASH INDEX hash_1 (employee_number, department_number)
ON emp1
BY (employee_number)
ORDER BY HASH (employee_number);

This index is built for the table 'emp1' which is defined as follows:

CREATE SET TABLE emp1
     (employee_number INTEGER,
      manager_employee_number INTEGER,
      department_number INTEGER,
      job_code INTEGER,
      last_name CHAR(20) NOT NULL,
      first_name VARCHAR(30) NOT NULL,
      hire_date DATE NOT NULL,
      birthdate DATE NOT NULL,
      salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX (employee_number);

Points to consider about this hash index definition:

- Each hash index row contains the employee number and the department number.
- Specifying the employee number is unnecessary, since it is the primary index of the base table and will therefore be automatically included.
- The BY clause indicates that the rows of this index will be distributed by the employee_number hash value.
- The ORDER BY clause indicates that the index rows will be ordered on each AMP in sequence by the employee_number hash value.

Example 2

The same hash index definition could have been abbreviated as follows:

CREATE HASH INDEX hash_1 (employee_number, department_number) ON emp1;

This is essentially the same definition because of the defaults for hash indexes.

- The BY clause defaults to the primary index of the base table.
- The ORDER BY clause defaults to the order of the base table rows.

Hash Index Definition Rules

There are two key rules which govern the use of the BY and ORDER BY clauses:

- The column(s) specified in the BY clause must be a subset of the columns which make up the hash index.
- When the BY clause is specified, the ORDER BY clause must also be specified.

Covered Query

The following is an example of a simple query which is covered by this index:

SELECT employee_number, department_number FROM emp1;

Normally, this query would result in a full table scan of the employee table. With the existence of the hash index, the optimizer can pick a less costly approach, namely retrieve the necessary information directly from the index rather than accessing the lengthier (and costlier) base rows.

Consider the explain of this query:

EXPLAIN SELECT employee_number, department_number FROM emp1;

1) First, we lock a distinct TD000."pseudo table" for read on a RowHash to prevent global deadlock for TD000.hash_1.
2) Next, we lock TD000.hash_1 for read.
3) We do an all-AMPs RETRIEVE step from TD000.hash_1 by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 8 rows. The estimated time for this step is 0.15 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.15 seconds.

Example 3

The following is an alternate definition of the hash index 'hash_1'.

CREATE HASH INDEX hash_1 (employee_number, department_number)
ON emp1
BY (employee_number)
ORDER BY VALUES (employee_number);

Points to consider about this hash index definition:

- This definition produces the same hash index, however the index rows are ordered based on the employee_number value rather than the hash value.
- This might be more useful for certain 'range processing' queries.
- This definition would be equally helpful in covering the query indicated previously. The order of index rows would be of no significance.
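The benefit of value ordering for range processing can be sketched in Python. This is a conceptual model of an ordered index subtable, not the actual AMP file structure, and the employee numbers are invented:

```python
# Sketch: value-ordered rows let a range predicate scan one contiguous slice,
# while hash ordering scatters adjacent values.
import bisect

employee_numbers = [1003, 1010, 1001, 1007, 1005]

value_ordered = sorted(employee_numbers)           # ORDER BY VALUES
hash_ordered = sorted(employee_numbers, key=hash)  # ORDER BY HASH (stand-in)

# Range predicate: WHERE employee_number BETWEEN 1002 AND 1008.
lo = bisect.bisect_left(value_ordered, 1002)
hi = bisect.bisect_right(value_ordered, 1008)
in_range = value_ordered[lo:hi]                    # one contiguous slice

print(in_range)  # -> [1003, 1005, 1007]
```

With hash ordering, the qualifying values are not adjacent, so the same predicate cannot be satisfied by a single contiguous scan.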

Hash Indexes and Joins

 

Creating Hash Indexes For Joins

Example 1

Consider the following Hash Index definition:

CREATE HASH INDEX hash_2 (employee_number, department_number)
ON emp1
BY (department_number)
ORDER BY HASH (department_number);

This hash index is to be used for the purpose of facilitating joins between the 'employee' and 'department' tables, based on the PK/FK relationship on 'department_number'.

EXPLAIN SELECT employee_number, department_name
FROM emp1 e
INNER JOIN dept1 d
  ON e.department_number = d.department_number;

1) First, we lock a distinct TD000."pseudo table" for read on a RowHash to prevent global deadlock for TD000.hash_2.
2) Next, we lock a distinct TD000."pseudo table" for read on a RowHash to prevent global deadlock for TD000.d.
3) We lock TD000.hash_2 for read, and we lock TD000.d for read.
4) We do an all-AMPs JOIN step from TD000.hash_2 by way of a RowHash match scan with no residual conditions, which is joined to TD000.d. TD000.hash_2 and TD000.d are joined using a merge join, with a join condition of ("TD000.hash_2.department_number = TD000.d.department_number"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 16 rows. The estimated time for this step is 0.18 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.18 seconds.

Points to consider about the effect of the hash index on this join plan:

- No redistribution of emp1 rows is needed.
- No sort of emp1 rows is needed.
- The merge join step (#4) is able to proceed directly after the lock steps.
- In this case, the hash index functions much the same as a single-table join index (STJI).

Hash Indexes and ROWID Pointers

Page 57: Join Index

  Using the ROWID in Hash Indexes

Because the primary index is automatically carried in a Hash Index (as is the uniqueness value associated with the base row-id), the system may easily calculate the row-id of the base row. This permits column values not explicitly contained in the hash index definition to be accessed and returned as part of a covered query.
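The row-id mechanics can be modeled roughly in Python. The field layout below is invented for illustration: a base row-id is treated as the pair (row hash of the PI value, uniqueness number), which is exactly the pair a hash index carries implicitly.

```python
# Toy model of ROWID-based access from a hash index row.
def row_hash(pi_value):
    # Stand-in for the real row-hash function; deterministic is all we need.
    return hash(pi_value) & 0xFFFFFFFF

base_table = {}  # (row_hash, uniqueness) -> full base row

def insert_base(pi_value, row, uniq=1):
    base_table[(row_hash(pi_value), uniq)] = row

insert_base(1001, {"employee_number": 1001,
                   "department_number": 401,
                   "job_code": 412101})

# A hash index row keeps the indexed columns plus the PI value and uniqueness.
hash_index_row = {"employee_number": 1001, "department_number": 401, "uniq": 1}

# A column outside the index (job_code) is fetched via the computed row-id.
rowid = (row_hash(hash_index_row["employee_number"]), hash_index_row["uniq"])
print(base_table[rowid]["job_code"])  # -> 412101
```

This is the step the EXPLAIN below exposes: the join to the base table is expressed as a comparison against a substring of the RowID.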

Example 1

Consider again the following Hash Index definition:

CREATE HASH INDEX hash_2 (employee_number, department_number)
ON emp1
BY (department_number)
ORDER BY HASH (department_number);

Perform the same join on the two tables, however this time add the column 'job_code' to the SELECT. Note that this column isn't part of the hash index.

EXPLAIN SELECT e.employee_number
      ,d.department_name
      ,e.job_code
FROM emp1 e
INNER JOIN dept1 d
  ON e.department_number = d.department_number;

                     :
                     :
5) We do an all-AMPs JOIN step from TD000.d by way of a RowHash match scan with no residual conditions, which is joined to TD000.hash_2. TD000.d and TD000.hash_2 are joined using a merge join, with a join condition of ("TD000.hash_2.department_number = TD000.d.department_number"). The result goes into Spool 2, which is duplicated on all AMPs. The size of Spool 2 is estimated with low confidence to be 64 rows. The estimated time for this step is 0.18 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to TD000.e. Spool 2 and TD000.e are joined using a product join, with a join condition of ("(Field_2 = TD000.e.employee_number) AND (Field_3 = (SUBSTRING(TD000.e.RowID FROM 5 FOR 4 )))"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 8 rows. The estimated time for this step is 4.92 seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 5.10 seconds.

Points to consider about this query plan:

- The hash index is used in place of the employee table.
- No redistribution of rows is needed.
- Job_code values are returned by using the ROWID pointer to the base row (Step #6).
- Row hash locks are used to access the base rows of the employee table.
- This plan assumes that both tables are fairly large.

A similar effect could have been achieved with a single table join index (STJI) by adding an explicit ROWID to the index definition as follows:

CREATE JOIN INDEX ji_emp AS
SELECT employee_number, department_number, ROWID
FROM emp1;

The following section lists advantages of Hash Indexes over STJI's.

Hash Index Advantages and Limitations

 

Hash Index Advantages

Hash indexes, by comparison, have the following advantages over Single Table Join Indexes.

- They automatically contain the primary index of the base table.
- They automatically contain ancillary information (such as the uniqueness number) needed to calculate the row-id of the base table row.
- Their syntax is similar to secondary index syntax, thus simpler.
- They are automatically compressed for storage.

Hash Index Limitations

The following are limitations of using Hash Indexes:

- A total maximum of 32 hash indexes, join indexes and secondary indexes can be associated with a table.
- A hash index can consist of no more than 16 columns.
- Hash indexes are not supported with the following Teradata features and utilities:
  o MultiLoad
  o FastLoad
  o Archive/Recovery
  o Triggers
  o Permanent Journal
  o Upsert Processing

Lab

Try It! For this set of lab questions you will need information from the Database Info document.

To start the online labs, click on the Access Lab Server button in the lower left hand screen of the course. A window will pop-up with instructions on accessing the lab server.

You will need these instructions to log on to Teradata.

If you experience problems connecting to the lab server, contact [email protected]

Prior to doing these labs, it will be helpful to reset your default database to your own user id (i.e. DATABASE TDnnnn;).

Lab 1

1.) Prior to doing this lab exercise, it will be necessary to recreate two tables in your user space. Accomplish this with the following commands:

CREATE TABLE loc1 AS Customer_Service.location WITH DATA;

CREATE TABLE loc_emp1 AS Customer_Service.location_employee WITH DATA;

1a.) Create a hash index which will facilitate joins between the 'loc1' and 'loc_emp1' tables. The hash index should:

- Contain the columns 'employee_number' and 'location_number'.
- Be distributed based on the hash of location_number.
- Be ordered based on the hash of location_number.
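One way to satisfy these requirements is sketched below; the index name 'loc_hash' is an arbitrary choice, not prescribed by the lab:

```sql
CREATE HASH INDEX loc_hash (employee_number, location_number) ON loc_emp1
BY (location_number)
ORDER BY HASH (location_number);
```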

1b.) Once the hash index is successfully created, execute the following join. EXPLAIN the join to see if the hash index was used.

SELECT l.location_number
     , employee_number
     , customer_number
FROM loc1 l INNER JOIN loc_emp1 l_e
ON l.location_number = l_e.location_number
ORDER BY 3;

 10.) Materialized Views

Objectives

 

After completing this module, you should be able to do the following:

- Implement a materialized view as a partially covered join index
- Implement a materialized view as a sparse join index

Materialized Views and Hash Index Review

  What Are Materialized Views?

Materialized views refer to precomputing and maintaining query results in a database management system. In the Teradata database, materialized view features are based on a range of capabilities built upon the existing join index technology. Said differently, Teradata implements materialized views as join indexes. Before looking at the materialized view features available with Teradata, it will be helpful to review the concept of a hash index.

Hash Indexes

First, let's review a little bit about hash indexes as seen in a previous section.

CREATE HASH INDEX hash_2 (employee_number, department_number) ON emp1 BY (department_number) ORDER BY HASH (department_number);

Hash indexes are, by definition, defined on a single table. This hash index 'hash_2' can be useful to the optimizer in handling the following query:

SELECT employee_number, department_name FROM emp1 e INNER JOIN dept1 d ON e.department_number = d.department_number;

This query is partially covered by the hash index (HI). Partially covered means that the optimizer cannot resolve the query with the hash index alone. It must bring in another table - in this case the department table - in order to retrieve the department_name column. Because the index is ordered on the hash of department number, it can join the HI directly to the department table based on matching primary index (PI) hash values.

The join step of the EXPLAIN of this query is seen here:

4) We do an all-AMPs JOIN step from TD000.hash_2 by way of a RowHash match scan with no residual conditions, which is joined to TD000.d. TD000.hash_2 and TD000.d are joined using a merge join, with a join condition of ("TD000.hash_2.department_number = TD000.d.department_number"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 16 rows. The estimated time for this step is 0.18 seconds.

As can be seen, this is simply a join between the HI and the department table. The optimizer picked this plan because it did not have to prepare either side of the join - the rows are already in department number hash sequence for both the HI and the department table. Accessing the employee table was not necessary to resolve this query.

Join Backs

Note that the following query, which additionally selects the job_code column, is also able to use the HI. This is due to the availability of the ROWID which is implicitly included in all hash indexes. The implicit ROWID allows the optimizer to 'join back' to the base employee row to pick up additional information (i.e., job_code), not available in the HI itself.

SELECT e.employee_number, d.department_name, e.job_code
FROM emp1 e INNER JOIN dept1 d
ON e.department_number = d.department_number;

5) We do an all-AMPs JOIN step from TD000.d by way of a RowHash match scan with no residual conditions, which is joined to TD000.hash_2. TD000.d and TD000.hash_2 are joined using a merge join, with a join condition of ("TD000.hash_2.department_number = TD000.d.department_number"). The result goes into Spool 2, which is duplicated on all AMPs. The size of Spool 2 is estimated with low confidence to be 64 rows. The estimated time for this step is 0.18 seconds.
6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an all-rows scan, which is joined to TD000.e. Spool 2 and TD000.e are joined using a product join, with a join condition of ("(Field_2 = TD000.e.employee_number) AND (Field_3 = (SUBSTRING(TD000.e.RowID FROM 5 FOR 4 )))"). The result goes into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 8 rows. The estimated time for this step is 4.92 seconds.

Step 5 joins the HI to the department table, and step 6 uses the implicit ROWID in the HI to locate the base row for each employee to extract its job code. We consider the HI to partially cover the query because it still requires information from another table.

A limitation of HIs is that they are, by definition, single-table indexes.

Partial Covering Join Indexes

  Multi-Table Partial Covering Join Indexes

We can also create a join index on multiple tables with the same 'partial covering' capability of hash indexes.

CREATE JOIN INDEX ji_emp_dept1 AS
SELECT e.employee_number, d.department_name
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;

Note that the following query is the same query we executed before, however in this case, the join index completely covers the query.

EXPLAIN SELECT employee_number, department_name FROM employee e INNER JOIN department d ON e.department_number = d.department_number;

1) First, we lock a distinct SQL00."pseudo table" for read on a RowHash to prevent global deadlock for SQL00.ji_emp_dept1.
2) Next, we lock SQL00.ji_emp_dept1 for read.
3) We do an all-AMPs RETRIEVE step from SQL00.ji_emp_dept1 by way of an all-rows scan with no residual conditions into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 40 rows. The estimated time for this step is 0.06 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.


-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.06 seconds.

Note how simple the EXPLAIN becomes in this case. The query is completely covered by a scan of the join index. No join is needed to get the answer set. The real work of getting the answer set all occurs in step 3 of the EXPLAIN text.

Now create a second join index which includes the ROWID for the employee table.

CREATE JOIN INDEX ji_emp_dept2 AS
SELECT e.employee_number, d.department_name, e.ROWID
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;

Now in addition to selecting the employee number and department name, let's add the job code to the select list. This column exists on the employee table and is not an explicit part of the join index.

SELECT e.employee_number, d.department_name, e.job_code
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;

The optimizer may choose to cover this query using the join index to acquire the employee number and the department name, and it may use the rowid of the employee table to 'join back' to the base employee row to acquire the job code. Thus, it works similarly to a hash index; however, the join index has the added property of being defined on multiple tables. This provides the opportunity to 'join back' to either table, assuming both ROWIDs are specified in the join index definition.

It is still considered a 'partial covering' index because it still had to join back to the employee table to fully resolve the query, but it did not have to scan the entire employee table.

The join back capability provided by the ROWID syntax is typically chosen by the optimizer when the number of rows in the table is fairly large. Smaller table joins may not demonstrate this approach.

Multiple Join Back Capability

  Specifying Multiple Rowids In a Join Index

We could also have coded the join index as follows:

CREATE JOIN INDEX ji_emp_dept3 AS
SELECT e.employee_number, d.department_name, e.ROWID AS e_rowid, d.ROWID AS d_rowid
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;


Now we are including the rowid for both tables, thus giving the optimizer the ability to join back in either direction. Thus, a query like the following may use the join index to join back to the department table this time, to pick up the department manager's number.

SELECT e.employee_number, d.department_name, d.manager_employee_number
FROM employee e INNER JOIN department d
ON e.department_number = d.department_number;

Note, to use this option, aliases must be assigned to each rowid column selected in the join index definition. If the columns are not 'renamed' using an alias, the syntaxer will not allow more than one column named 'ROWID' in the same query.

Sparse Join Indexes

  Definition of Sparse Join Index

A Sparse Index is a Join Index with a WHERE clause which restricts the participating rows from the base tables. A Sparse Index can significantly reduce the size of the join index which must be built and maintained by the system.

A sparse index makes sense when a definable subset of the rows in the underlying tables are needed to satisfy a large percentage of the queries which will use it.

For example, a join index might be defined on a join of two tables, one a history table containing 5 years of order data and the other a table providing the customer name. Since the history table might be large, maintenance of this join index might be costly. Furthermore, if 90% of the queries are typically requesting information on the current year, it might make sense to create a sparse index with entries for that year only. Queries for the remaining four years could access the base tables since their frequency is much lower.
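The scenario above can be sketched as a sparse join index. Note that the history table and its columns ('order_history', 'order_total') are hypothetical names used only for illustration:

```sql
-- Sketch only: 'order_history' and its columns are illustrative names.
CREATE JOIN INDEX cur_year_hist_ix AS
SELECT c.cust_id, c.cust_name, h.order_id, h.order_total, h.order_date
FROM order_history h INNER JOIN customers c
ON h.cust_id = c.cust_id
WHERE EXTRACT (YEAR FROM h.order_date) = 2009   -- current year only
PRIMARY INDEX (cust_id);
```

Queries against the other four years would simply bypass the index and access the base tables.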

Examples of reasons for creating a Sparse Index might include:

- Tables with lots of nulls which are ignored for purposes of most queries
- Tables with frequent access for rows which contain quantities above or below a certain limit
- Tables which are time oriented and where the most frequent accesses are for current information

Example

Create a non-sparse join index between the customers and orders tables.

CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name), (order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);


How many orders are assigned to valid customers?

SELECT COUNT(o.order_id) FROM customers c INNER JOIN orders o ON c.cust_id = o.cust_id;

Count(order_id)
---------------
             86

This query represents a simple join of the two tables to produce an aggregation. The defined join index is able to resolve this query as seen in the extract of the EXPLAIN seen here.

3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix by way of an all-rows scan with no residual conditions, and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 4. The size of Spool 4 is estimated with high confidence to be 1 row. The estimated time for this step is 0.08 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 4 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 1 row. The estimated time for this step is 0.03 seconds.

Now, let's try another query, this time restricting the time interval.

How many valid customers have assigned orders in January 2009?

SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN (DATE '2009-01-01') AND (DATE '2009-01-31');

Once again, we see that the join index is able to cover the query. The number of participating rows, however, is expected to be much smaller.

3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix by way of an all-rows scan with a condition of ("(SQL00.cust_ord_ix.order_date <= DATE '2009-01-31') AND (SQL00.cust_ord_ix.order_date >= DATE '2009-01-01')"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 4. The size of Spool 4 is estimated with high confidence to be 1 row. The estimated time for this step is 0.06 seconds.

Since we anticipate that most of the queries against this table will involve rows from the year 2009, we may wish to create a sparse index with only those rows represented in the index.

Creating a Sparse Join Index

  The following creates a new sparse join index which only includes rows for the year 2009.


CREATE JOIN INDEX cust_ord_ix_2009 AS
SELECT (c.cust_id, cust_name), (order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE EXTRACT (YEAR FROM order_date) = '2009'
PRIMARY INDEX (cust_id);
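As a side note, the same restriction could be expressed as an explicit date range. Whether the optimizer matches one form more readily than the other can vary by release, so the following is only a sketch worth verifying with EXPLAIN on your own system:

```sql
-- Alternative sketch: a range predicate instead of EXTRACT.
-- The index name 'cust_ord_ix_2009b' is illustrative.
CREATE JOIN INDEX cust_ord_ix_2009b AS
SELECT (c.cust_id, cust_name), (order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE order_date BETWEEN DATE '2009-01-01' AND DATE '2009-12-31'
PRIMARY INDEX (cust_id);
```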

Now, we can run the same query again and see if the optimizer uses the sparse index.

SELECT COUNT(c.cust_id)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN (DATE '2009-01-01') AND (DATE '2009-01-31');

Count(cust_id)
--------------
            86

3) We do an all-AMPs SUM step to aggregate from SQL00.cust_ord_ix_2009 by way of an all-rows scan with a condition of ("((EXTRACT(YEAR FROM (SQL00.cust_ord_ix_2009.order_date )))<= 2009) AND (((EXTRACT(MONTH FROM (SQL00.cust_ord_ix_2009.order_date )))<= 1) AND ((EXTRACT(YEAR FROM (SQL00.cust_ord_ix_2009.order_date )))>= 2009))"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 4. The size of Spool 4 is estimated with high confidence to be 1 row. The estimated time for this step is 0.08 seconds.

Any query for the year 2009 which is covered by the sparse index will be optimized to use the sparse index instead of the base tables.

Example

SELECT c.cust_id, cust_name, order_id, order_status, order_date
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE c.cust_id > 600
AND EXTRACT (YEAR FROM order_date) = '2009'
AND EXTRACT (MONTH FROM order_date) IN (2,3);

    cust_id cust_name       order_id order_status order_date
----------- --------------- -------- ------------ ----------
       1009 YZA Corp             648 C            2009-02-08
       1004 JKL Corp             620 O            2009-02-17
       1008 VWX Corp             645 O            2009-03-04
       1005 MNO Corp             627 O            2009-03-27
       1006 PQR Corp             633 O            2009-03-24
       1009 YZA Corp             649 O            2009-03-08
       1004 JKL Corp             621 O            2009-03-17
       1008 VWX Corp             644 C            2009-02-04
       1005 MNO Corp             626 C            2009-02-27
       1006 PQR Corp             632 C            2009-02-24
       1003 GHI Corp             614 C            2009-02-12
       1007 STU Corp             639 O            2009-03-14
       1003 GHI Corp             615 C            2009-03-12
       1007 STU Corp             638 C            2009-02-14

The following is an EXPLAIN of this query. Note the use of the sparse index.

Explanation
------------------------------------------------------------------------
1) First, we lock a distinct SQL00."pseudo table" for read on a RowHash to prevent global deadlock for SQL00.CUST_ORD_IX_2009.
2) Next, we lock SQL00.CUST_ORD_IX_2009 for read.
3) We do an all-AMPs RETRIEVE step from SQL00.CUST_ORD_IX_2009 by way of an all-rows scan with a condition of ("(SQL00.CUST_ORD_IX_2009.cust_id > 600) AND (((EXTRACT(YEAR FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 2009) AND (((EXTRACT(MONTH FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 2) OR ((EXTRACT(MONTH FROM (SQL00.CUST_ORD_IX_2009.order_date )))= 3 )))") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 1 row. The estimated time for this step is 0.06 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.06 seconds.

Sparse Index Advantages and Limitations

  Sparse Join Indexes are a type of Join Index which contains a WHERE clause that reduces the number of rows which would otherwise be included in the index. All types of join indexes, including single-table, multi-table, simple or aggregate, can be sparse.

A sparse index makes sense when a definable subset of the rows in the join index are needed to satisfy a large percentage of the queries which will use it.

By default, any join index, including a sparse join index, has a NUPI on the first column specified. You can explicitly define other columns to be the primary index.

Any combination of AND, OR and IN conditions may be applied to the sparse index WHERE clause.
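For instance, a sparse index WHERE clause combining all three condition types, with an explicit NUPI, might look like the following sketch (the index name 'open_ord_ix' is illustrative):

```sql
-- Sketch: AND, OR and IN combined in a sparse index WHERE clause,
-- with the NUPI placed explicitly on order_id instead of the first column.
CREATE JOIN INDEX open_ord_ix AS
SELECT order_id, cust_id, order_status, order_date
FROM orders
WHERE order_status IN ('O', 'C')
  AND (EXTRACT (YEAR FROM order_date) = 2009 OR order_date IS NULL)
PRIMARY INDEX (order_id);
```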

Sparse Index Advantages:

- Reduces the storage requirements for a join index subtable.
- Access is faster since the size of the subtable is smaller.
- Better maintenance performance, since not all changes to the base table will affect the sparse index.

Sparse Index Limitations:

- Sparse indexes have the same restrictions as any join index.
- They require additional space and maintenance resources over and above the base table requirements.

Summary

 

Materialized views are a cross between an index and a view.

Like an index, a materialized view has subtable rows; it can carry a row-id for join-back purposes, and it requires maintenance when base table rows change.

Like a view, a materialized view represents a subset of the data in a table and the view changes along with changes in the underlying base rows.

Materialized views are typically implemented to improve query performance. We have discussed the following materialized view features in this module.

Partial-Covering Multi-Table Join Indexes - These are multi-table join indexes which include a ROWID specification for one or more of the base tables. This permits the optimizer to 'join-back' to the base row when information is needed which is not provided in the join index. Ultimately, the optimizer will decide whether or not to use this approach in place of another, depending on the associated query costs.

Sparse Join Indexes - These are join indexes which include a WHERE clause which limits the base table rows that will be reflected in the join index. This permits a join index to be built only for the rows which are most frequently accessed by queries, such as current year or current month. The size of the join index is thereby smaller and the maintenance costs are subsequently less. The optimizer will decide if it can use a sparse index to reduce the costs associated with a given query.

 


Lab

 Try It! For this set of lab questions you will need information from the Database Info document.

To start the online labs, click on the Access Lab Server button in the lower left-hand screen of the course. A window will pop up with instructions on accessing the lab server.

You will need these instructions to log on to Teradata.

If you experience problems connecting to the lab server, contact [email protected]

Prior to doing these labs, it will be helpful to reset your default database to your own user id (i.e. DATABASE TDnnnn;).

Click on the buttons to the left to see the answers.


Answers: Lab 1b  Lab 2  Lab 3

1a.) Create and populate the following two tables in your database, then run the UPDATE statement.

CREATE TABLE orders AS Student.orders WITH DATA;
CREATE TABLE customers AS Student.customers WITH DATA;

UPDATE orders
SET order_date = order_date + INTERVAL '10' YEAR;

1b.) Create a sparse join index named cust_ord_ix_2009 with the following properties:

- Fixed index columns: cust_id, cust_name
- Variable index columns: order_id, order_status, order_date
- Inner join on customers and orders tables
- Join condition on the cust_id columns
- Where condition to include only open orders (order_status = 'O')
- NUPI index is on cust_id.
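Based on the cust_ord_ix example earlier in the module, one possible sketch of such an index is:

```sql
CREATE JOIN INDEX cust_ord_ix_2009 AS
SELECT (c.cust_id, cust_name), (order_id, order_status, order_date)
FROM customers c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE order_status = 'O'
PRIMARY INDEX (cust_id);
```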

2.) Create a query that returns a count of all open orders held by valid customers.

3