Teradata Tuts

Hello guys,

I have a few links and docs to share which I collected during my interview preparations.

Among them, the TeraTomon book is very good; it explains Teradata basics in detail.

There is one more doc, 'Teradata Utilities', on Ultimatix (Learning & Dev >> Learning > Safari Online) which explains FastLoad, MultiLoad and TPT in detail. I couldn't download that one because of insufficient tokens, so you can read it there.

These will be useful for your daily work as well, so save them somewhere.

Utilities (FastLoad and MultiLoad):
http://www.bi-dw.info/teradata-loading-tools.htm
http://www.javaorator.com/teradata/tutorial/Fast-load-in-Teradata-132.code

SQL questions:
http://usefulfreetips.com/Teradata-SQL-Tutorial/tag/teradata-sql-query-test/

Interview:
http://www.teradatahelp.com/2010/09/teradata-interview-questions-part-3.html
http://www.teradatahelp.com/2010/08/teradata-interview-questions-part-1.html

IMP DOCS

--------------------------------------------------------------------------------------------------

Steps to improve performance of the query?

Explain Primary Index. What are the constraints in selecting a PI?

What is skew factor?

If your Skew factor is going up. What are remedies?

When, how and why do we use Secondary Indexes?

What is difference between Primary Key and Primary Index?

What is difference between database and user in Teradata? What are the things you can do or cannot do in both?

When do you use BTEQ? What other software have you used, or could be used, instead of BTEQ?

What is AMP?

Types of indexes in Teradata

What is PE?

What is Collect Statistics?

What is the Hashing Algorithm?

What is Hash value

What is HashMAP

What is HashBucket

What is hash collision?

How does PI work to insert a row?

How does PI work to select a row?

What is SI?

How to tune a query?

How you will load a table/flat file with 5000 columns in Teradata?

Maximum number of columns supported in Teradata Table? In different versions of Teradata.

What is difference between Primary Key and Primary Index?

Primary Key                            | Primary Index
---------------------------------------+---------------------------------------------
Logical concept of data modeling       | Physical mechanism for access and storage
Teradata does not need to recognize it | Each table must have exactly one
No limit on the number of columns      | 64-column limit
Documented in the data model           | Defined in the CREATE TABLE statement
Uniquely identifies each row           | Used to place and locate each row on an AMP
Values should not change               | Values may be changed (delete + insert)
Must not be NULL                       | May be NULL
Does not imply an access path          | Defines the most efficient access path
Chosen for logical correctness         | Chosen for physical performance

What is the Hashing Algorithm?

- When the primary index value of a row is fed into the hashing algorithm, the output is called the row hash. The row hash is the logical storage address of the row and identifies the AMP that owns the row.

What is a Hash value?

- The hash value determines on which AMP the row will reside. It is always stored along with the row and, together with a uniqueness value, forms the unique identifier of the row.

What is a Hash Map?

- The hash map contains the hash buckets (hash map buckets), laid out in rows and columns.

What is a Hash Bucket?

- Each hash bucket contains only the number of an AMP attached to the Teradata system.

What is a hash collision?

- A hash collision occurs when the same hash value is generated for two different primary index values.

- To reduce hash collisions, increase the contrast between the column values. For example, if the input column is CHAR, try changing the values to alphanumeric to get more contrast.

What is skew factor?

- Skew factor describes how the rows of a table are distributed among the AMPs. If the data is highly skewed, some AMPs hold many rows and others very few, i.e. the data is not evenly distributed. This hurts Teradata's performance, because the most heavily loaded AMP determines how long an all-AMP operation takes. Skew is controlled by the choice of index.

If your Skew factor is going up. What are remedies?

- Choose a new primary index whose values distribute more evenly, giving a lower skew factor.
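To make the idea concrete, here is a small sketch of how a skew factor could be computed from per-AMP row counts. The formula used, (1 - average rows per AMP / maximum rows per AMP) * 100, is one commonly quoted definition and is an assumption here, not something stated in these notes; the row counts are made-up sample numbers.

```python
def skew_factor(rows_per_amp):
    """Return skew in percent: 0 means a perfectly even distribution.

    Assumed formula: (1 - average / maximum) * 100.
    """
    if not rows_per_amp or max(rows_per_amp) == 0:
        return 0.0
    average = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - average / max(rows_per_amp)) * 100


# Hypothetical row counts on a 4-AMP system.
even = [250, 250, 250, 250]
skewed = [700, 100, 100, 100]

print(round(skew_factor(even), 2))
print(round(skew_factor(skewed), 2))
```

With the skewed counts one AMP holds most of the table, so every all-AMP step has to wait for that AMP to finish.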

What is PE?

Parsing Engine (PE): we can call the PE the mother of Teradata. Whenever a user logs in to Teradata, the user is actually connected to a PE, and when the user submits a query, the PE takes action. It performs the following tasks:

1. It creates a plan and instructs the AMPs what to do in order to get the result of the query.

2. Session control (120 sessions per PE): it checks the access rights of the user, i.e. whether the user has the privilege to execute the query.

3. It acts as the OPTIMIZER: it creates the best possible execution plan and dispatches the optimized plan to the AMPs.

4. It parses the SQL request, acting like a compiler.

What is AMP?

- Access Module Processor (AMP): an AMP is connected to the PE via the BYNET, through which it receives instructions. It is also connected to its own disk and has the privilege to read and write data on that disk.

- Each AMP is allowed to read and write only on its own disk. This is known as the SHARED NOTHING ARCHITECTURE.

- An AMP is best thought of as a computer processor with its own disk attached to it.

- Whenever it receives instructions from the PE, it fetches the data from its disk and sends it back to the PE through the BYNET.

SQL

Find the first, second and third highest salary

1. List all employees

2. List only one employee, Sorted Alphabetically

Finding the 1st and 2nd highest salary from the emp table.

Using rank():

select * from (select empno, ename, sal, rank() over (order by sal desc) rn from emp) a where a.rn = 1;

-- a.rn = 1 returns the 1st highest salary and a.rn = 2 returns the 2nd highest (provided there is no tie for 1st place, since rank() skips numbers after a tie).

Using row_number():

Similarly, we can use the row_number() function in place of rank() to find the 1st highest, 2nd highest and so on.

select * from (select empno, ename, sal, row_number() over (order by sal desc) rn from emp) a where a.rn = 2;

If we want the highest salary department-wise, we need to add partition by deptno just before order by, like:

rank() over (partition by deptno order by sal desc) rn

The basic difference between the two is that rank() skips the next number(s) if more than one row receives the same rank, whereas row_number() generates a continuous sequence number.

Sql 2) How will you insert the text '2014-02-01' into a table?

Sql 3) If I have the table below, how will I search for the string abc_def?

Col
abcdef
abc_def

Sql 4) If I have a table as below:

Customer  Expenditure
A         100
A         200
A         300
B         200
B         400
B         500
C         600
C         700
C         800

We have to find the customer name with the maximum total expenditure, keeping the performance of the query in mind. The data in the table is huge.

Destination question and answer:

Table: route

dname    dno
Delhi    1
Nagpur   2
Mumbai   3
Chennai  4

Show the route rows: Delhi to Nagpur, and so on.

select a.dname, b.dname from route a inner join (select dno, dname from route) b on a.dno+1 = b.dno;
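The ranking queries and the two open questions above (searching for a literal underscore, and the customer with the largest total expenditure) can be tried out with a small sketch. It uses Python's built-in sqlite3 module instead of Teradata, so it only approximates Teradata behaviour: window functions need SQLite 3.25 or newer, LIMIT 1 stands in for Teradata's TOP 1, and the sample emp rows and the table names t and spend are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (empno INTEGER, ename TEXT, sal INTEGER);
    INSERT INTO emp VALUES (1, 'A', 5000), (2, 'B', 5000), (3, 'C', 4000);
    CREATE TABLE t (col TEXT);
    INSERT INTO t VALUES ('abcdef'), ('abc_def');
    CREATE TABLE spend (customer TEXT, expenditure INTEGER);
    INSERT INTO spend VALUES ('A', 100), ('A', 200), ('A', 300),
                             ('B', 200), ('B', 400), ('B', 500),
                             ('C', 600), ('C', 700), ('C', 800);
""")

# rank() gives tied rows the same number and then skips;
# row_number() always produces 1, 2, 3, ...
ranked = conn.execute("""
    SELECT ename, sal,
           rank() OVER (ORDER BY sal DESC) AS rnk,
           row_number() OVER (ORDER BY sal DESC, ename) AS rn
    FROM emp
    ORDER BY sal DESC, ename
""").fetchall()

# '_' is a single-character wildcard in LIKE, so it must be escaped
# to search for a literal underscore.
unescaped = conn.execute(
    "SELECT col FROM t WHERE col LIKE '%_%' ORDER BY col").fetchall()
escaped = conn.execute(
    r"SELECT col FROM t WHERE col LIKE '%\_%' ESCAPE '\' ORDER BY col").fetchall()

# Aggregate once per customer, then keep only the largest total.
top_customer = conn.execute("""
    SELECT customer, SUM(expenditure) AS total
    FROM spend
    GROUP BY customer
    ORDER BY total DESC
    LIMIT 1
""").fetchone()

print(ranked)
print(unescaped, escaped)
print(top_customer)
```

Note that with the tie at 5000 there is no row with rank 2, which is why a filter like a.rn = 2 on rank() can come back empty.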

A column has some negative values and some positive values. It is required to find the sum of the negative numbers and the sum of the positive numbers in two separate columns.

SELECT SUM(CASE WHEN num < 0 THEN num ELSE 0 END) neg,
       SUM(CASE WHEN num > 0 THEN num ELSE 0 END) pos
FROM neg_pos;

An Employee table has a column Gender. By mistake, all males have flag F and all females have flag M. How do you correct this?
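One possible answer is a single UPDATE with a CASE expression, so that both flags are swapped in one pass (two separate UPDATEs would overwrite each other). The sketch below runs it with Python's sqlite3 module rather than Teradata; the table layout and sample names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, gender TEXT);
    -- Flags are stored the wrong way round, as in the question.
    INSERT INTO employee VALUES ('Anita', 'M'), ('Ravi', 'F'), ('Meera', 'M');
""")

# Swap both values in one statement; any other value is left untouched.
conn.execute("""
    UPDATE employee
    SET gender = CASE gender
                     WHEN 'M' THEN 'F'
                     WHEN 'F' THEN 'M'
                     ELSE gender
                 END
""")

fixed = conn.execute(
    "SELECT name, gender FROM employee ORDER BY name").fetchall()
print(fixed)
```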

Find the employees who make more than twice the average salary in their department.

CREATE VIEW DS(D,S,C) AS
SELECT DEPT, SUM(SALARY), COUNT(*)
FROM EMP
GROUP BY DEPT;

SELECT E.NAME
FROM EMP E, DS
WHERE E.DEPT = DS.D
AND E.SALARY > 2*(DS.S/DS.C);

Find the departments whose salary total is more than twice the average departmental salary total.

SELECT D
FROM DS
WHERE S > 2*(SELECT AVG(S) FROM DS);

Find the employees whose salaries are among the top 100 salaries.

SELECT E1.NAME
FROM EMP E1
WHERE 99 >= (SELECT COUNT(DISTINCT SALARY) FROM EMP E2 WHERE E2.SALARY > E1.SALARY);
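The counting trick in the top-100 query can be checked on a small scale. This sketch uses Python's sqlite3 module with N = 2 instead of 100 and made-up sample rows, and compares the correlated subquery with a dense_rank() version (an alternative, not from these notes) to show that they select the same employees.

```python
import sqlite3

N = 2  # "top N salaries"; the notes use N = 100

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, salary INTEGER);
    INSERT INTO emp VALUES ('A', 900), ('B', 900), ('C', 800),
                           ('D', 700), ('E', 600);
""")

# An employee is in the top N if fewer than N distinct salaries are higher.
by_count = [row[0] for row in conn.execute("""
    SELECT e1.name
    FROM emp e1
    WHERE ? >= (SELECT COUNT(DISTINCT e2.salary)
                FROM emp e2
                WHERE e2.salary > e1.salary)
    ORDER BY e1.name
""", (N - 1,))]

# Same result with a window function: dense_rank() numbers distinct salaries.
by_rank = [row[0] for row in conn.execute("""
    SELECT name
    FROM (SELECT name, dense_rank() OVER (ORDER BY salary DESC) AS rnk
          FROM emp) ranked
    WHERE rnk <= ?
    ORDER BY name
""", (N,))]

print(by_count, by_rank)
```

Because the subquery counts distinct salaries, employees who tie on a salary are all kept, so more than N rows can come back.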

____________BASICS______________________________

Read architecture ..... indexes....(PI, Secondary etc.).....Utilities a little..., performance tuning...and queries

What are the types of PI (Primary Index) in Teradata?

There are two types of Primary Index. Unique Primary Index ( UPI) and Non Unique Primary Index (NUPI). By default, NUPI is created when the table is created. Unique keyword has to be explicitly given when UPI has to be created.

A UPI will sometimes slow down performance, because the uniqueness of the column value has to be checked for each and every row, which is an additional overhead for the system; in return, the distribution of data will be even. Care should be taken while choosing a NUPI so that the distribution of data is almost even. The UPI/NUPI decision should be taken based on the data and its usage.

How to Choose Primary Index(PI) in Teradata?

Choosing a Primary Index is based on the data distribution and the join frequency of the column. If a column is used for joining most of the tables, then it is wise to consider that column as a PI candidate. For example, we have an Employee table with EMPID and DEPTID, and this table needs to be joined to the Department table on DEPTID.

Even so, it is not a wise decision to choose DEPTID as the PI of the Employee table. The reason is that the Employee table will have thousands of employees, whereas the number of departments in a company will be fewer than 100. So choosing EMPID will give better performance in terms of distribution.
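The EMPID versus DEPTID argument can be illustrated with a toy simulation. This is only a sketch: hashlib.md5 is a stand-in for Teradata's real hashing algorithm, and the 4-AMP system, the 10,000 employees and the 3 departments are made-up numbers.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Map a PI value to an AMP number (md5 is only a stand-in hash)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# (EMPID, DEPTID) pairs: 10,000 employees spread over 3 departments.
employees = [(emp_id, emp_id % 3) for emp_id in range(1, 10001)]


def rows_per_amp(column):
    """Count rows per AMP when the given column (0 or 1) is the PI."""
    counts = Counter(amp_for(row[column]) for row in employees)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]


by_empid = rows_per_amp(0)
by_deptid = rows_per_amp(1)

print("PI = EMPID :", by_empid)
print("PI = DEPTID:", by_deptid)
```

With only 3 distinct DEPTID values, at most 3 of the 4 AMPs can receive rows at all, while the unique EMPID values spread out almost evenly.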

How the data is distributed among AMPs based on PI in Teradata?

Assume a row is to be inserted into a Teradata table:

1. The Primary Index value for the row is put into the hashing algorithm.

2. The output is a 32-bit Row Hash.

3. The Row Hash points to a bucket in the Hash Map: the first 16 bits of the Row Hash are used to locate the bucket.

4. The bucket points to a specific AMP.

5. The row, along with the Row Hash, is delivered to that AMP.

When the AMP receives a row, it places the row into the proper table and checks whether it has any other rows in the table with the same row hash. If this is the first row with this particular row hash, the AMP assigns a 32-bit uniqueness value of 1. If this is the second row with that particular row hash, the AMP assigns a uniqueness value of 2. The 32-bit row hash and the 32-bit uniqueness value make up the 64-bit Row ID. The Row ID is how tables are sorted on an AMP.
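The insert path described above (PI value, row hash, hash bucket, AMP, uniqueness value) can be sketched as follows. It is a simplified model under stated assumptions: hashlib.md5 replaces Teradata's real hashing algorithm, the hash map is filled round-robin, the system has 4 AMPs, and the class and function names are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4            # hypothetical system size
NUM_BUCKETS = 2 ** 16   # the first 16 bits of the row hash select a bucket
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]  # bucket -> AMP


def row_hash(pi_value):
    """Return a 32-bit row hash (md5 is only a stand-in hash)."""
    digest = hashlib.md5(str(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(rh):
    """Use the first 16 bits of the row hash to find the bucket, then the AMP."""
    return HASH_MAP[rh >> 16]


class Amp:
    def __init__(self):
        self.rows = {}                        # Row ID -> row
        self.last_uniqueness = defaultdict(int)

    def insert(self, rh, row):
        # 1 for the first row with this row hash, 2 for the second, ...
        self.last_uniqueness[rh] += 1
        row_id = (rh, self.last_uniqueness[rh])
        self.rows[row_id] = row
        return row_id


amps = [Amp() for _ in range(NUM_AMPS)]


def insert_row(pi_value, row):
    rh = row_hash(pi_value)
    amp_no = amp_for(rh)
    return amp_no, amps[amp_no].insert(rh, row)


# Two rows with the same NUPI value land on the same AMP with the same row hash.
first = insert_row("Smith", {"name": "Smith", "dept": 10})
second = insert_row("Smith", {"name": "Smith", "dept": 20})
print(first, second)
```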

This uniqueness value is useful in the case of NUPIs, to distinguish the rows that share the same NUPI value. Access through either a UPI or a NUPI is always a one-AMP operation, as rows with the same value are stored on the same AMP.

What are Secondary Indexes (SI), types of SI and disadvantages of Secondary Indexes in Teradata?

Secondary Indexes provide another path to access data. Teradata allows up to 32 secondary indexes per table. Keep in mind that rows are not redistributed when secondary indexes are defined. The value of secondary indexes is that they reside in a subtable and are stored on all AMPs, which is very different from how the primary index (part of the base table) is stored. Keep in mind also that Secondary Indexes, when defined, take up additional space.

Secondary Indexes are frequently used in a WHERE clause. The Secondary Index can be changed or dropped at any time. However, because of the overhead for index maintenance, it is recommended that index values should not be frequently changed.

There are two different types of Secondary Indexes, Unique Secondary Index (USI), and Non-Unique Secondary Index (NUSI). Unique Secondary Indexes are extremely efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row (in the Secondary Index subtable) that references the actual data row, which resides on the second AMP.

A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.

Secondary indexes can be useful for:

- Satisfying complex conditions

- Processing aggregates

- Value comparisons

- Matching character combinations

- Joining tables

How are the data distributed in Secondary Index Subtables in Teradata?

When a user creates a Secondary Index, Teradata automatically creates a Secondary Index Subtable. The subtable will contain the:

- Secondary Index Value

- Secondary Index Row ID

- Primary Index Row ID

When a user writes an SQL query that has an SI in the WHERE clause, the Parsing Engine will hash the Secondary Index value. The output is the Row Hash, which points to a bucket in the Hash Map. That bucket contains an AMP number, and the Parsing Engine then knows which AMP contains the Secondary Index Subtable row for the requested USI value.

The PE will direct the chosen AMP to look-up the Row Hash in the Subtable. The AMP will check to see if the Row Hash exists in the Subtable and double check the subtable row with the actual secondary index value. Then, the AMP will pass the Primary Index Row ID back up the BYNET network. This request is directed to the AMP with the base table row, which is then easily retrieved.
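The two-AMP USI lookup described above can be modelled roughly like this. It is a toy sketch, not Teradata's implementation: hashlib.md5 stands in for the hashing algorithm, a simple modulo replaces the hash map, and the column names (emp_id as UPI, ssn as USI) and sample rows are hypothetical.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def row_hash(value):
    """Return a 32-bit row hash (md5 is only a stand-in hash)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def amp_for(rh):
    """Stand-in for the hash map lookup: row hash -> AMP number."""
    return rh % NUM_AMPS


base_table = [dict() for _ in range(NUM_AMPS)]    # per AMP: Row ID -> row
usi_subtable = [dict() for _ in range(NUM_AMPS)]  # per AMP: SI value -> PI Row ID


def insert(emp_id, ssn, name):
    pi_hash = row_hash(emp_id)
    row_id = (pi_hash, 1)  # the PI is unique here, so uniqueness is always 1
    base_table[amp_for(pi_hash)][row_id] = {"emp_id": emp_id, "ssn": ssn, "name": name}
    # The subtable row is distributed by the hash of the SI value, not the PI.
    usi_subtable[amp_for(row_hash(ssn))][ssn] = row_id


def select_by_usi(ssn):
    si_amp = amp_for(row_hash(ssn))          # AMP 1: holds the subtable row
    row_id = usi_subtable[si_amp].get(ssn)
    if row_id is None:
        return None
    base_amp = amp_for(row_id[0])            # AMP 2: holds the base table row
    return si_amp, base_amp, base_table[base_amp][row_id]


insert(101, "111-22-3333", "Asha")
insert(102, "444-55-6666", "Vikram")

found = select_by_usi("111-22-3333")
missing = select_by_usi("000-00-0000")
print(found, missing)
```

At most two AMPs are touched per lookup, no matter how many AMPs the system has, which is why a USI access is so cheap.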

What are the types of JOINs available in Teradata?

Types of JOINs are: Inner Join, Outer Join (Left, Right, Full), Self Join and Cross Join (also called a Cartesian join).
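A quick way to see how these join types differ is to run them on tiny tables. The sketch below uses Python's sqlite3 module, not Teradata, and sticks to inner, left outer, cross and self joins; the emp and dept tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER, dname TEXT);
    INSERT INTO dept VALUES (10, 'HR'), (30, 'IT');
    CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER, mgr INTEGER);
    INSERT INTO emp VALUES (1, 'A', 10, NULL), (2, 'B', 20, 1), (3, 'C', NULL, 1);
""")

# Inner join: only employees whose deptno exists in dept.
inner = conn.execute("""
    SELECT e.ename, d.dname
    FROM emp e INNER JOIN dept d ON e.deptno = d.deptno
    ORDER BY e.ename
""").fetchall()

# Left outer join: every employee, with NULL where no department matches.
left = conn.execute("""
    SELECT e.ename, d.dname
    FROM emp e LEFT OUTER JOIN dept d ON e.deptno = d.deptno
    ORDER BY e.ename
""").fetchall()

# Cross join: every employee paired with every department.
cross = conn.execute(
    "SELECT e.ename, d.dname FROM emp e CROSS JOIN dept d").fetchall()

# Self join: the emp table joined to itself to find each employee's manager.
self_join = conn.execute("""
    SELECT e.ename, m.ename
    FROM emp e INNER JOIN emp m ON e.mgr = m.empno
    ORDER BY e.ename
""").fetchall()

print(inner, left, len(cross), self_join)
```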