RDBMS Using SQL Server 2000

RDBMS Concepts using SQL Server 2000

Training & Development Division

Page 1 of 194


PRAVESH – Student Guide

Subject: RDBMS using SQL server 2000

V1.0




Chapter 1: Introduction to RDBMS

Chapter 2: Introduction to Microsoft SQL Server

Chapter 3: Creating and Maintaining Databases

Chapter 4: Structured Query Language

Chapter 5: Tables

Chapter 6: In-Built Functions

Chapter 7: Constraints

Chapter 8: Subqueries and Joins

Chapter 9: Indexes

Chapter 10: Views

Chapter 11: Programming with Transact-SQL

Chapter 12: Stored Procedures & User Defined Functions

Chapter 13: Cursors

Chapter 14: Transactions and Locks

Chapter 15: Triggers




Chapter 1: Introduction to RDBMS

Objectives

• Learn the need for data organization and its maintenance

• Understand the different database systems in use

• Understand the need for a relational database system

• Learn the different paradigms used in RDBMS

• Understand the concept of normalization and its application

• Understand the concept of denormalization




Introduction

A database management system (DBMS) is a collection of interrelated data and a set of programs to access those data. The collection of data, usually referred to as the database, contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.

Database systems are designed to manage large bodies of information. Management of data involves both defining structures for storage of information and providing mechanisms for manipulation of information. The database system must ensure the safety of the information stored, despite system crashes or attempts at unauthorized access. Since the data is to be shared among several users, the system must avoid possible anomalous results.

Database Legacy

Flat file, hierarchical, and network databases are usually referred to as legacy databases. They represent the ways people used to organize information in prehistoric times, about 30 years ago.

Flat file databases

The flat file database was probably one of the earliest database management systems. The idea behind the flat file is a very simple one: one single, mostly unstructured data file. You could compare it to a desk drawer that holds virtually everything: bill stubs, letters, small change. While it takes very little effort to put information in, such a "design" makes getting the information out a nightmare, as you would have to scroll through each and every record searching for the right one. Putting relevant data into separate files and even organizing them into tables alleviates the problem somewhat but does not remove the major obstacles: data redundancy, slow processing speed, and error-prone storage and retrieval. Moreover, it required intimate knowledge of the database structure to work at all; it would be utterly useless to search for, say, orders information in the expenses file.

Example of a Flat File Database System

Name          Type      Address                             Price  Quantity
Nails         Product   n/a                                 100    2000
Ace Hardware  Customer  1234 Willow Ct Seattle, Washington  n/a    n/a
Cedar planks  Product   n/a                                 2000   5000

Dissatisfaction with these shortcomings stimulated development in the area of data storage-and-retrieval systems.
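The retrieval problem described above can be made concrete with a minimal Python sketch. It treats the three example records as one flat file held in a list; the tuple layout and the helper name find_by_name are illustrative assumptions, not part of any real flat file product.

```python
# Each record is one line of the flat file. Every record carries every field,
# so fields that do not apply to a record hold the placeholder "n/a".
FLAT_FILE = [
    ("Nails", "Product", "n/a", "100", "2000"),
    ("Ace Hardware", "Customer", "1234 Willow Ct Seattle, Washington", "n/a", "n/a"),
    ("Cedar planks", "Product", "n/a", "2000", "5000"),
]


def find_by_name(records, name):
    """Scan the records from the top until the name matches.

    Returns the matching record (or None) and the number of records examined.
    """
    examined = 0
    for record in records:
        examined += 1
        if record[0] == name:
            return record, examined
    return None, examined


record, examined = find_by_name(FLAT_FILE, "Cedar planks")
print(record[1], examined)

# Count the fields that are stored but carry no information.
unused_fields = sum(field == "n/a" for rec in FLAT_FILE for field in rec)
print(unused_fields)
```

With only three records the scan is trivial, but the number of records examined grows with the size of the file, and the unused fields show the storage wasted by keeping unrelated record types in one structure.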

Hierarchical databases

The concept of a hierarchical database has been around since the 1960s and, believe it or not, it is still in use. The hierarchical model is fairly intuitive: as the name implies, it stores data in a hierarchical structure, similar to that of a family tree, organization chart, or pyramid; some readers could visualize a computer file system as it is presented through some graphical interface. It is based on a "parent/child" paradigm in which each parent could have many children but each child has one and only one parent. You can visualize this structure as an upside-down tree, starting at the root (trunk) and branching out into many branches. Since the records in a child table are accessed through a hierarchy of levels, there could not be a record in it without a corresponding pointer record in the parent table, all the way up to the root. You could compare it to a file management system (like the tree view seen in Microsoft Windows Explorer): to get access to a file within a directory, one must first open the folder that contains this file.

Let's improve upon the previously discussed flat file model. Instead of dumping all the information into a single file, you are going to split it among three tables, each containing pertinent information: business name and address for the CUSTOMER table; product description, brand name, and price for the PRODUCT table; and an ORDER_HEADER table to store the details of the order.

In the hierarchical database model redundancy is greatly reduced (compared with the flat file database model): you store information about customer, product, and so on once only. The table ORDER_HEADER would contain pointers to the customer and to the product this customer had ordered. Whenever you need to see what products any particular customer purchased, you start with the ORDER_HEADER table and find the list of ids for all the customers who placed orders and the list of product ids for each customer; then, using the CUSTOMER table, you find the customer name you are after, and using the list of product ids you get the description of the products from the PRODUCT table.

To get any information from the hierarchical database, a user has to have an intimate knowledge of the database structure, and the structure itself was extremely inflexible. If, for instance, it is decided that the customers must place an order through a third party, you would need to rewire all relationships because the CUSTOMER table would not be related to the ORDER_HEADER table anymore, and all your queries would have to be rewritten to include one more step: finding the sales agent who sold this product, then finding the customers who bought it.

The hierarchical database is incapable of storing information in child tables without a parent table having a pointer to it: by the very definition of hierarchy there should be neither a product without an order, nor a customer without an order, which obviously cannot be the case in the real world.

Hierarchical databases handle one-to-many relationships very well. However, in many cases you will want the child to be related to more than one parent: not only could one product be present in many orders, but one order could contain many products. There is no answer (at least not an easy one) within the domain of hierarchical databases.

Network databases

Attempts to solve the problems associated with hierarchical databases produced the network database model. This model has its origins in the Conference on Data Systems Languages (CODASYL), an organization founded in 1959 with the backing of the U.S. Department of Defense. CODASYL was responsible for developing COBOL, one of the first widely popular programming languages, and for publishing the Network Database standard in 1971. The most popular commercial implementation of the network model was Adabas (long since converted to the relational model).

The network model is very similar to the hierarchical one; it is also based on the concept of parent/child relationship but removes the restriction of one child having one and only one parent. In the network database model a parent can have multiple children, and a child can have multiple parents. This structure could be visualized as several trees that share some branches. In network database jargon these relationships came to be known as sets. In addition to the ability to handle a one-to-many relationship, the network database can handle many-to-many relationships. Also, data access did not have to begin with the root; instead one could traverse the database structure starting from any table and navigating to a related table in any direction.

[Figure: two diagrams, the first with boxes ORDER HEADER, CUSTOMER, SALESMAN, and PRODUCT, the second with boxes ORDER HEADER, CUSTOMER, and PRODUCT]




In this example, to find out what products were sold to what customers we still would have to start with ORDER_HEADER and then proceed to CUSTOMER and PRODUCT; nothing new here. But things greatly improve for the scenario when customers place an order through more than one agent: no longer does one have to go through agents to list customers of the specific product, and no longer does one have to start at the root in search of records.

While providing several advantages, network databases share several problems with hierarchical databases. Both are very inflexible, and changes in the structure (for example, a new table to reflect changed business logic) require that the entire database be rebuilt; also, set relationships and record structures must be predefined.

The major disadvantage of both network and hierarchical databases was that they were programmers' domains. To answer the simplest query, one had to create a program that navigated the database structure and produced an output; unlike SQL, this program was written in a procedural, often proprietary, language and required a great deal of knowledge of both the database structure and the underlying operating system. As a result, such programs were not portable and took an enormous (by today's standards) amount of time to write.

ENTITY RELATIONSHIP MODEL

To design a database, you need to have a complete understanding of the Entity Relationship Model. As a database designer, you use an Entity Relationship (ER) diagram as a tool to build the logical database design of a system. An ER diagram represents the following three elements:

• Entities

• Relationships

• Attributes

Entities

An entity is an object with a distinct set of properties that is easily identified. Entities are the building blocks of a database. Some examples of entities are Student, Course, and Grade. You represent an entity using a rectangular box that contains the name of the entity. An entity instance is a specific value of an entity. For example, if Johan and Laura are students, they are instances of the entity Student. Similarly, Science and Mathematics are instances of the entity Course. Typically, in a problem scenario, you may treat the nouns as entities.

[Figure: the entities Student, Course, and Grade, each drawn as a rectangle]

Weak Entity Sets

An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a weak entity set. An entity set that has a primary key is termed a strong entity set. For a weak entity set to be meaningful, it must be associated with another entity set, called the identifying or owner entity set. The relationship associating the weak entity set with the identifying entity set is called the identifying relationship.

A weak entity set does not have a primary key of its own; nevertheless, there is a need to distinguish all entities of the weak entity set that depend on one particular strong entity. The discriminator of a weak entity is a set of attributes that allows this distinction to be made. Hence, the primary key of the identifying entity set plus the weak entity's discriminator forms the primary key of a weak entity set.

Attributes

An attribute is a property of an entity that differentiates it from other entities and provides information about the entity. An attribute type is a property of an entity type. For example, the attributes of the entity Student are Student Name, Student Id and Course Id. In an ER diagram, you represent attributes as ellipses and label them with the name of the attribute. Typically, in a problem scenario, you may treat the nouns that describe an entity as attributes. An attribute can be characterised by the following attribute types:

1. Simple and composite attributes. Simple attributes are those that are not divided into subparts. Composite attributes can be divided into subparts (other attributes). For example, name can be considered a composite attribute, consisting of the first name, middle name, and last name.

2. Single-valued and multivalued attributes. When an attribute has a single value for a particular entity, it is called a single-valued attribute. To see what a multivalued attribute is, consider an employee entity set with the attribute phone number. An employee may have zero, one or several phone numbers, and different employees may have different numbers of phones. This type of attribute is said to be multivalued.

3. Derived attributes. The value for this type of attribute can be derived from the values of other related attributes and entities. The value of a derived attribute is not stored, but is computed when required. Consider the customer entity set as an example, which has an attribute age. If the customer entity set also has an attribute date-of-birth, we can calculate age from the date-of-birth attribute and the current date. Thus, age is a derived attribute.
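The three attribute types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name Customer, its field names, and the sample values are hypothetical, and the phone number list is borrowed from the employee example purely to show a multivalued attribute.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Customer:
    first_name: str      # part of the composite attribute "name"
    last_name: str       # part of the composite attribute "name"
    date_of_birth: date  # simple, single-valued, stored attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        years = today.year - self.date_of_birth.year
        # Subtract one year if this year's birthday has not happened yet.
        if (today.month, today.day) < (self.date_of_birth.month, self.date_of_birth.day):
            years -= 1
        return years


customer = Customer("Laura", "Smith", date(1980, 6, 15), ["555-0100", "555-0101"])
print(customer.age(date(2000, 6, 14)))  # the day before the 20th birthday
print(customer.age(date(2000, 6, 15)))  # the 20th birthday
```

Because age is computed on demand, it can never disagree with the stored date of birth, which is the reason a derived attribute is normally not stored.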

[Figure: the entity Student with the attributes Student Id, Student Name, and Course drawn as ellipses]




Relationships

A relationship is a crucial part of the design of a database. It is an association that establishes a connection between a pair of logically related entities. Separate entities can have relationships with each other. For example, if students study various courses, the entities are Student and Course, while the relationship between them is Studies. You represent a relationship between two entities using a diamond labelled with the name of the relationship.

[Figure: the entities Student and Course joined by the relationship Studies]

A relationship may associate an entity with itself. For example, in a company, one employee may marry another employee. You then represent the relationship as follows:

[Figure: the entity Employee joined to itself by the relationship Marries]

An ER diagram also consists of other major components, such as:

1. Lines, which link attributes to entity sets and entity sets to relationship sets

2. Double ellipses, which represent multivalued attributes

3. Dashed ellipses, which denote derived attributes

4. Double lines, which indicate total participation of an entity in a relationship set

5. Double rectangles, which represent weak entity sets

Types of Relationships

There are three types of relationships that can exist between entities:

• One-to-one (1:1)

• One-to-many (1:m) or many-to-one (m:1)

• Many-to-many (m:m)

One-to-One (1:1) Relationship

Two entities have a one-to-one relationship if for every instance of the first entity, there is only one instance of the second entity. Consider the example of a university. For one department (say, the department of Social Science), there can be only one department head. One faculty member cannot head more than one department. This is an example of a one-to-one relationship.

[Figure: Department, Headed By, Faculty-Member]

One-to-Many (1:m) Relationship

Two entities are related in a one-to-many relationship if for every instance of the first entity, there can be zero, one or several instances of the second entity, and for every instance of the second entity, there is exactly one instance of the first entity.

A many-to-one relationship is the same as a one-to-many relationship read from the other side. Two entities are related in a many-to-one relationship if for every instance of the first entity, there is exactly one instance of the second entity, and for every instance of the second entity, there can be zero, one, or several instances of the first entity. For example, a student can major in only one discipline, but many students can register for a discipline. Read from Student to Discipline, this is a many-to-one relationship. Similarly, an employee can belong to a single department, but a department can have several employees.

[Figure: Student (m), Register For, Discipline (1)]

A one-to-many or many-to-one relationship is also referred to as a parent-child or a master-detail relationship. It is the most commonly used relationship among entities.

Many-to-Many (m:m) Relationship

Two entities are related in a many-to-many relationship when for every instance of the first entity, there can be multiple instances of the second entity, and for every instance of the second entity there can be multiple instances of the first entity. For example, a product can be sold to several customers, and a customer can buy several products.

[Figure: Customer (m), Purchases, Product (m)]
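The definitions above can be turned into a small check, sketched here in Python. The function name classify_relationship and the sample pairs are assumptions made for illustration. Note also that sample data can only show which pattern the data exhibits; the true cardinality of a relationship is a business rule that the designer must decide.

```python
from collections import defaultdict


def classify_relationship(pairs):
    """Classify a relationship from (first_entity, second_entity) instance pairs."""
    seconds_per_first = defaultdict(set)
    firsts_per_second = defaultdict(set)
    for first, second in pairs:
        seconds_per_first[first].add(second)
        firsts_per_second[second].add(first)

    # Does any instance on one side relate to several instances on the other?
    first_has_many = any(len(s) > 1 for s in seconds_per_first.values())
    second_has_many = any(len(f) > 1 for f in firsts_per_second.values())

    if first_has_many and second_has_many:
        return "m:m"
    if first_has_many:
        return "1:m"
    if second_has_many:
        return "m:1"
    return "1:1"


headed_by = [("PHY", "0089"), ("MAT", "0145"), ("CHE", "0127")]
register_for = [("Johan", "Science"), ("Laura", "Science")]
purchases = [("C122", "P002"), ("C122", "P003"), ("C134", "P003")]

print(classify_relationship(headed_by))     # Department to Faculty-Member
print(classify_relationship(register_for))  # Student to Discipline
print(classify_relationship(purchases))     # Customer to Product
```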




Subtypes and Supertypes

A subtype is a subset of another entity. For example, consider the entity Employee. There are two types of employees: salaried employees and wage-earning employees. In this example, Employee is a supertype, and Salaried and Wage-Earning are the subtypes. There would be some attributes such as ‘name’ and ‘address’ that are common to the subtypes. Wage-earning employees will have some attributes such as ‘overtime’ and ‘daily wage’ that do not belong to the Salaried subtype. The Salaried subtype in turn has some attributes that do not belong to the Wage-Earning subtype.

[Figure: supertype Employee (attribute Code) with the subtypes Wage-Earning (attributes Daily Wage, Overtime allowance) and Salaried (attribute Salary)]

A sub entity (or subtype) is always dependent on the super entity for its existence. The attributes of the super entity apply to all of its sub entities. The converse is not true. The subtype is connected to the supertype via an unnamed relationship. The supertype is connected to the relationship by a line containing a crossbar, as shown above. Each subtype is described by the attributes that are unique to it; these are the attributes that do not belong to the other subtypes.
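Class inheritance gives a convenient way to picture the supertype and subtype idea. The following Python sketch is only an analogy, not a database implementation; the class and field names are assumptions based on the attributes named in this section.

```python
from dataclasses import dataclass


@dataclass
class Employee:                       # supertype: attributes common to all employees
    code: str
    name: str
    address: str


@dataclass
class SalariedEmployee(Employee):     # subtype: adds its own attribute
    salary: float


@dataclass
class WageEarningEmployee(Employee):  # subtype: adds its own attributes
    daily_wage: float
    overtime_allowance: float


salaried = SalariedEmployee("E1", "Mac", "Delhi", 5000.0)
wage_earner = WageEarningEmployee("E2", "Sandra", "CA", 120.0, 30.0)

# Supertype attributes apply to every subtype ...
print(salaried.name, wage_earner.name)
# ... but the converse is not true: a subtype's attributes stay with that subtype.
print(hasattr(salaried, "daily_wage"), hasattr(wage_earner, "salary"))
```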




Tables

Once the database is designed using an ER diagram, you map the entities and relationships onto tables. A table is a set of rows and columns. You represent the attributes of the entity as column headings of the table and the data about the entities as rows.

[Figure: Department, Headed By, Faculty-Member]

The tables that you can derive from this diagram are Department and FacultyMember. You represent the tables as follows:

Department

DepartmentCode  DepartmentName  Location  DepartmentHead
PHY             Physics         P-Block   0089
MAT             Maths           M-Block   0145
CHE             Chemistry       C-Block   0127

FacultyMember

FacultyMemberCode  FacultyName     DepartmentCode  BirthDate
0024               Jim Smith       CHE             4/20/56
0056               Joe Clement     PHY             9/19/55
0089               David Shaw      PHY             9/9/51
0113               Peter Hopper    MAT             7/3/60
0127               Tony Copperman  CHE             2/22/58
0143               Sarah Smith     CHE             3/14/56
0145               Daniel Thomas   MAT             4/17/54

You can represent the table structure of the table Department as follows:

Department
    DepartmentCode
    DepartmentName
    Location
    DepartmentHead

Table and Attribute Naming Conventions

You should give tables and attributes meaningful names. If the name consists of two words, there should not be any space between them. To improve readability of the names, you can write the names in mixed case. For example, a table for faculty members can be named FacultyMember. Some attributes of the faculty member are faculty member code and faculty name, which can be named FacultyMemberCode and FacultyName respectively.




KEYS

Enforcing data integrity ensures that the data in the database is valid and correct. Keys play an important role in maintaining data integrity. The various types of keys that have been identified are:

• Candidate key

• Primary key

• Alternate key

• Composite key

• Foreign key

Candidate Key

It is important to have an attribute in a table that uniquely identifies a row. An attribute or set of attributes that uniquely identifies a row is called a Candidate key. This attribute has values that are unique. Consider the table Vehicle.

Vehicle

Serial#  Regn#  Description
023451   5602   Leyland
023452   4502   Volvo
023453   4513   Toyota

The values of the attributes Serial#, Regn# and Description are unique in every row. Therefore, all three are Candidate keys.

Note: A candidate key should not be confused with a surrogate key. A surrogate key is an artificial, system-generated identifier (such as a sequence number) that is added to a table to serve as its key.

Primary Key

The candidate key that you choose to identify each row uniquely is called the Primary key. In the table Vehicle, if you choose Serial# to identify rows uniquely, Serial# is the primary key.

Alternate Key

A Candidate key that is not chosen as a Primary key is an Alternate key. In the table Vehicle, if you choose Serial# as the Primary key, Regn# is an Alternate key. A primary key is the only sure way to identify the rows of a table. An alternate key may be allowed to hold the value NULL, but a NULL value is not permitted in a Primary key, since it would be difficult to uniquely identify rows containing NULL values.

Composite Key

In certain tables, a single attribute cannot be used to identify rows uniquely and a combination of two or more attributes is used as a Primary key. Such keys are called Composite keys. Consider the following table, Purchase, which is used to maintain the purchases made by various customers.

Purchase

CustomerCode  ProductCode  QtyPurchased  PurchaseDate
C122          P002         12            2/15/99
C134          P005         15            2/15/99
C018          P002         10            2/15/99
C122          P003         17            2/16/99
C144          P001         4             2/16/99
C134          P003         9             2/16/99

You can see that neither CustomerCode nor ProductCode has unique values on its own. However, the combination of CustomerCode and ProductCode results in all unique values. Hence, the combination can be used as a Composite Primary key.

Foreign Key

When a Primary key of one table appears as an attribute in another table, it is called the Foreign key in the second table. A Foreign key is used to relate two tables. Consider the tables Department and FacultyMember.

Department

DepartmentCode  DepartmentName  Location  DepartmentHead
PHY             Physics         P-Block   0089
MAT             Maths           M-Block   0145
CHE             Chemistry       C-Block   0127

FacultyMember

FacultyMemberCode  FacultyName     DepartmentCode  BirthDate
0024               Jim Smith       CHE             4/20/56
0056               Joe Clement     PHY             9/19/55
0089               David Shaw      PHY             9/9/51
0113               Peter Hopper    MAT             7/3/60
0127               Tony Copperman  CHE             2/22/58
0143               Sarah Smith     CHE             3/14/56
0145               Daniel Thomas   MAT             4/17/54

Since DepartmentCode is unique in the Department table, you can choose it as a Primary key. Since it appears in the FacultyMember table as an attribute, it will be the foreign key in the FacultyMember table. You must ensure that every value of the foreign key matches one of the values of the Primary key.

You can use the following conventions to represent the keys in the table structure:

• Primary key: PK

• Alternate key: AK

• Foreign key: FK

You can represent the keys in the table structures as follows:

Department
    DepartmentCode (PK)
    DepartmentName (AK)
    Location
    DepartmentHead




FacultyMember
    FacultyMemberCode (PK)
    FacultyName
    DepartmentCode (FK)
    BirthDate

Data Integrity

Data integrity falls into the following categories:

• Entity integrity

• Domain integrity

• Referential integrity

Entity Integrity

Entity integrity ensures that each row can be identified by an attribute called the Primary key. The Primary key cannot have a NULL value.

Domain Integrity

Domain integrity refers to the range of valid entries for a given column. It ensures that there are only valid entries in the column.

Referential Integrity

Referential integrity ensures that for every value of a foreign key, there is a matching value of the Primary key.

Normalization

The logical design of the database, including the tables and the relationships between them, is the core of an optimized relational database. A good logical database design can lay the foundation for optimal database and application performance. A poor logical database design can reduce the performance of the entire system. Normalization improves performance by reducing redundancy. Redundancy can lead to:

• Inconsistencies: errors are more likely to occur when facts are repeated.

• Update anomalies: inserting, modifying and deleting data may cause inconsistencies.
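The key and integrity rules described in this chapter, which also keep each fact in one place, can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in; SQL Server syntax and behaviour differ in details, the DepartmentHead and BirthDate columns are left out for brevity, and the helper name rejected is an assumption made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE Department (
    DepartmentCode TEXT NOT NULL PRIMARY KEY,  -- entity integrity
    DepartmentName TEXT NOT NULL UNIQUE,       -- alternate key
    Location       TEXT
);
CREATE TABLE FacultyMember (
    FacultyMemberCode TEXT NOT NULL PRIMARY KEY,
    FacultyName       TEXT NOT NULL,
    DepartmentCode    TEXT NOT NULL
        REFERENCES Department (DepartmentCode)  -- referential integrity
);
INSERT INTO Department VALUES ('PHY', 'Physics', 'P-Block');
INSERT INTO FacultyMember VALUES ('0089', 'David Shaw', 'PHY');
""")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second row with the same primary key value breaks entity integrity.
duplicate_key = rejected("INSERT INTO Department VALUES ('PHY', 'Photonics', 'X-Block')")
# A faculty member in a department that does not exist breaks referential integrity.
orphan_row = rejected("INSERT INTO FacultyMember VALUES ('0200', 'New Hire', 'BIO')")
# A row whose foreign key matches an existing primary key is accepted.
valid_row = rejected("INSERT INTO FacultyMember VALUES ('0056', 'Joe Clement', 'PHY')")
print(duplicate_key, orphan_row, valid_row)
```

Because the department name and location are stored once in Department and only referenced from FacultyMember, a change to a department has to be made in a single row.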

Normalizing a logical database design involves using formal methods to separate the data into multiple, related tables. A greater number of smaller tables (with fewer columns) is characteristic of a normalized database. Some of the benefits of normalization include:

• Faster sorting and index creation.

• A larger number of clustered indexes.

• Narrower and more compact indexes.

• Fewer indexes per table, which improves the performance of INSERT, UPDATE, and DELETE statements.

• Fewer null values and less opportunity for inconsistency, which increase database compactness.




As normalization increases, so do the number and complexity of joins required to retrieve data. Too many complex relational joins between too many tables can hinder performance. Reasonable normalization often includes few regularly executed queries that use joins involving more than four tables. Sometimes the logical database design is already fixed and total redesign is not feasible. Even then, however, it might be possible to normalize a large table selectively into several smaller tables. If the database is accessed through stored procedures, this schema change could take place without affecting applications. If not, it might be possible to create a view that hides the schema change from the applications. In relational-database design theory, normalization rules identify certain attributes that must be present or absent in a well-designed database.

• A table should have an identifier.

The fundamental rule of database design theory is that each table should have a unique row identifier, a column or set of columns used to distinguish any single record from every other record in the table. Each table should have an ID column, and no two records can share the same ID value. The column or columns serving as the unique row identifier for a table is the primary key of the table.

• A table should store only data for a single type of entity.

Attempting to store too much information in a table can prevent the efficient and reliable management of the data in the table. In the pubs database in SQL Server 2000, the titles and publishers information is stored in two separate tables. Although it is possible to have columns that contain information for both the book and the publisher in the titles table, this design leads to several problems. The publisher information must be added and stored redundantly for each book published by a publisher. This uses extra storage space in the database. If the address for the publisher changes, the change must be made for each book. And if the last book for a publisher is removed from the titles table, the information for that publisher is lost.

In the pubs database, with the information for books and publishers stored in the titles and publishers tables, the information about the publisher has to be entered only once and then linked to each book. Therefore, if the publisher information is changed, it must be changed in only one place, and the publisher information will be there even if the publisher has no books in the database.
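The benefit of the two-table design can be sketched as follows. The example uses Python's built-in sqlite3 module as a stand-in for SQL Server, keeps only a few columns of the titles and publishers tables, and uses made-up sample rows, so it should be read as an illustration rather than as the actual pubs schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (pub_id TEXT PRIMARY KEY, pub_name TEXT, city TEXT);
CREATE TABLE titles (
    title_id TEXT PRIMARY KEY,
    title    TEXT,
    pub_id   TEXT REFERENCES publishers (pub_id)
);
""")
conn.execute("INSERT INTO publishers VALUES (?, ?, ?)", ("P1", "Example Press", "Boston"))
conn.executemany(
    "INSERT INTO titles VALUES (?, ?, ?)",
    [("T1", "First Book", "P1"), ("T2", "Second Book", "P1")],
)

# The publisher moves: one row changes, and every title sees the new city.
conn.execute("UPDATE publishers SET city = ? WHERE pub_id = ?", ("Chicago", "P1"))
cities = conn.execute(
    "SELECT DISTINCT p.city FROM titles t JOIN publishers p ON p.pub_id = t.pub_id"
).fetchall()

# Removing the publisher's last book does not remove the publisher.
conn.execute("DELETE FROM titles")
remaining = conn.execute("SELECT COUNT(*) FROM publishers").fetchone()[0]
print(cities, remaining)
```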

• A table should avoid nullable columns.

Tables can have columns defined to allow null values. A null value indicates that there is no value. Although it can be useful to allow null values in isolated cases, it is best to use them sparingly because they require special handling that increases the complexity of data operations. If you have a table with several nullable columns and several of the rows have null values in the columns, you should consider placing these columns in another table linked to the primary table. Storing the data in two separate tables allows the primary table to be simple in design but able to accommodate the occasional need for storing this information.

• A table should not have repeating values or columns.

The table for an item in the database should not contain a list of values for a specific piece of information. For example, a book in the pubs database might be coauthored. If there is a column in the titles table for the name of the author, this presents a problem. One solution is to store the name of both authors in the column, but this makes it difficult to show a list of the individual authors. Another solution is to change the structure of the table to add another column for the name of the second author, but this accommodates only two authors. Yet another column must be added if a book has three authors.




If you find that you need to store a list of values in a single column, or if you have multiple columns for a single piece of data (au_lname1, au_lname2, and so on), you should consider placing the duplicated data in another table with a link back to the primary table. The pubs database has a table for book information and another table that stores only the ID values for the books and the IDs of the authors of the books. This design allows any number of authors for a book without modifying the definition of the table and allocates no unused storage space for books with a single author.
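A linking table of this kind can be sketched as shown here. The example again uses Python's built-in sqlite3 module as a stand-in for SQL Server; the table names follow the pubs database, but the columns are reduced to the minimum and the rows and the helper authors_of are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles  (title_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE authors (au_id TEXT PRIMARY KEY, au_lname TEXT);
-- One row per (author, book) pair; no au_lname1, au_lname2, ... columns.
CREATE TABLE titleauthor (
    au_id    TEXT REFERENCES authors (au_id),
    title_id TEXT REFERENCES titles (title_id),
    PRIMARY KEY (au_id, title_id)
);
INSERT INTO titles  VALUES ('T1', 'Solo Book'), ('T2', 'Joint Book');
INSERT INTO authors VALUES ('A1', 'Green'), ('A2', 'Carson'), ('A3', 'Ringer');
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A1', 'T2'), ('A2', 'T2'), ('A3', 'T2');
""")


def authors_of(title_id):
    """List the last names of all authors of one book, in alphabetical order."""
    rows = conn.execute(
        "SELECT a.au_lname FROM titleauthor ta "
        "JOIN authors a ON a.au_id = ta.au_id "
        "WHERE ta.title_id = ? ORDER BY a.au_lname",
        (title_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(authors_of("T1"))
print(authors_of("T2"))
```

A book with one author uses one row in the linking table and a book with three authors uses three, so no table definition has to change when a book gains another author.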

Normal Forms

Normalization results in the formation of tables that satisfy certain specified constraints and represent certain normal forms. The normal forms are used to ensure that various types of anomalies and inconsistencies are not introduced in the database. Normal forms are table structures with minimum redundancy. Several normal forms have been identified. The most important and widely used of these are:

• First Normal Form (1NF)

• Second Normal Form (2NF)

• Third Normal Form (3NF)

• Boyce-Codd Normal Form (BCNF)

First Normal Form (1 NF)

A relation R (you may recall that a table is also called a relation) is said to be in 1NF if the domains of all attributes of R are atomic. Consider the following table Project.

Project

Ecode  Dept     ProjCode  Hours
E101   Systems  P27       90
                P51       101
                P20       60
E305   Sales    P27       109
                P22       98
E508   Admin    P51       NULL
                P27       72

The data in the table is not normalized because the cells in ProjCode and Hours contain non-atomic values. By applying the 1NF definition to the Project table, you arrive at the following table.

Project

Ecode  Dept     ProjCode  Hours
E101   Systems  P27       90
E101   Systems  P51       101
E101   Systems  P20       60
E305   Sales    P27       109
E305   Sales    P22       98
E508   Admin    P51       NULL
E508   Admin    P27       72
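The conversion to 1NF shown above amounts to emitting one row for every value held in a non-atomic cell. The following Python sketch does this for the Project data; the list-based representation and the function name to_first_normal_form are assumptions made for the illustration, and None stands in for NULL.

```python
# Non-atomic form: ProjCode and Hours each hold a list of values per employee.
UNNORMALIZED = [
    ("E101", "Systems", ["P27", "P51", "P20"], [90, 101, 60]),
    ("E305", "Sales", ["P27", "P22"], [109, 98]),
    ("E508", "Admin", ["P51", "P27"], [None, 72]),  # None stands for NULL
]


def to_first_normal_form(rows):
    """Emit one row per (employee, project) pair so that every cell is atomic."""
    flat = []
    for ecode, dept, proj_codes, hours in rows:
        for proj_code, hrs in zip(proj_codes, hours):
            flat.append((ecode, dept, proj_code, hrs))
    return flat


project = to_first_normal_form(UNNORMALIZED)
for row in project:
    print(row)
```

The result has seven rows, one per employee and project, and the department is now repeated on every row of an employee, which is the redundancy that the later normal forms address.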

Functional Dependency

The Normalization theory is based on the fundamental notion of functional dependency. First, let us examine the concept of functional dependency. Given a relation (you may recall that a table is also called a relation) R, attribute A is functionally dependent on attribute B if each value of A in R is associated with precisely one value of B. In words, attribute A is functionally dependent on B if and only if, for each value of B, there is exactly one value of A. A attribute B is called the determinant. Consider the following table Employee:

Employee

Code   Name     City
E1     Mac      Delhi
E2     Sandra   CA
E3     Henry    France

Given a particular value of Code, there is precisely one corresponding value for Name. For example, considering Code E1, there is exactly one value of Name, Mac. Hence, Name is functionally dependent on Code. Similarly, there is exactly one value of City for each value of Code, so City is also functionally dependent on Code. Code is the determinant. You can also say that Code determines City and Name.
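A functional dependency can be tested against sample rows with a short Python sketch. The function name is_functionally_dependent is made up for this illustration. Note that such a test can only show that the sample rows do not contradict a dependency; the dependency itself is a rule about the data, not something the rows can prove.

```python
def is_functionally_dependent(rows, determinant, dependent):
    """Return True if each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and returns it;
        # any later, different value contradicts the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


employee = [
    {"Code": "E1", "Name": "Mac", "City": "Delhi"},
    {"Code": "E2", "Name": "Sandra", "City": "CA"},
    {"Code": "E3", "Name": "Henry", "City": "France"},
]

name_on_code = is_functionally_dependent(employee, ["Code"], ["Name"])
city_on_code = is_functionally_dependent(employee, ["Code"], ["City"])

# A made-up extra row that gives Code E1 a second name breaks the dependency.
with_conflict = employee + [{"Code": "E1", "Name": "Max", "City": "Delhi"}]
name_on_code_conflict = is_functionally_dependent(with_conflict, ["Code"], ["Name"])

print(name_on_code, city_on_code, name_on_code_conflict)
```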

Second Normal Form (2 NF)

A table is said to be in 2NF when it is in 1NF and every attribute in the row is functionally dependent upon the whole key, and not just part of the key. Consider the Project table, which has the columns ECode, ProjCode, Dept and Hours and the following rows:

Project

ECode   ProjCode   Dept      Hours
E101    P27        Systems   90
E305    P27        Finance   10
E508    P51        Admin     NULL
E101    P51        Systems   101
E101    P20        Systems   60
E508    P27        Admin     72

This table structure leads to the following problems:

• Updating

For a given employee, the employee code and department are repeated several times. Hence, if an employee is transferred to another department, this change will have to be recorded in every row of the Project table for that employee. Any omission will lead to inconsistencies.

• Deletion

If an employee completes work on a project, the employee's record will be deleted. The information regarding the department to which the employee belongs will also be lost.

The primary key here is composite (ECode + ProjCode).

The table satisfies the definition of 1NF. You now need to check if it satisfies 2NF. In the table, for each value of ECode, there is more than one value of Hours. For example, for ECode E101, there are three values of Hours: 90, 101 and 60. Hence, Hours is not functionally dependent on ECode. Similarly, for each value of ProjCode, there is more than one value of Hours. For example, for ProjCode P27, there are three values of Hours: 90, 10 and 72. However, for a combination of the ECode and ProjCode values, there is exactly one value of Hours. Hence, Hours is functionally dependent on the whole key, ECode+ProjCode.

Now, you must check if Dept is functionally dependent on the whole key, ECode+ProjCode. For each value of ECode, there is exactly one value of Dept. For example, for ECode E101, there is exactly one value, the Systems department. Hence, Dept is functionally dependent on ECode. However, for each value of ProjCode, there is more than one value of Dept. For example, for ProjCode P27, there are three values of Dept: Systems, Finance and Admin. Hence, Dept is not functionally dependent on ProjCode. Dept is, therefore, functionally dependent on part of the key (which is ECode) and not functionally dependent on the whole key (ECode+ProjCode). Therefore, the table Project is not in 2NF. For the table to be in 2NF, the non-key attributes must be fully functionally dependent on the whole key and not on part of the key.

Guidelines for Converting a Table to 2NF

• Find and remove attributes that are functionally dependent on only a part of the key and not on the whole key. Place them in a different table.
• Group the remaining attributes.

To convert the table Project into 2NF, you must remove the attributes that are not fully functionally dependent on the whole key and place them in a different table along with the attribute that they are functionally dependent on. In the above example, since Dept is not fully functionally dependent on the whole key ECode+ProjCode, you place Dept along with ECode in a separate table called EmployeeDept. Now, the table Project will contain ECode, ProjCode and Hours.

EmployeeDept

ECode   Dept
E101    Systems
E305    Finance
E508    Admin

Project

ECode   ProjCode   Hours
E101    P27        90
E101    P51        101
E101    P20        60
E305    P27        10
E508    P51        NULL
E508    P27        72
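The split into two tables can be sketched in Python as follows. The variable names are made up for this illustration and the rows are kept in memory. The last step joins the two new tables on ECode to confirm that the original rows can be rebuilt, that is, no information is lost by the split.

```python
# Sample rows of the Project table before conversion: (ECode, ProjCode, Dept, Hours).
project = [
    ("E101", "P27", "Systems", 90),
    ("E305", "P27", "Finance", 10),
    ("E508", "P51", "Admin", None),  # None stands for NULL
    ("E101", "P51", "Systems", 101),
    ("E101", "P20", "Systems", 60),
    ("E508", "P27", "Admin", 72),
]

# Dept depends only on ECode, so it moves to its own table keyed by ECode.
employee_dept = sorted({(ecode, dept) for ecode, _, dept, _ in project})

# The remaining attributes stay with the whole key (ECode, ProjCode).
project_hours = [(ecode, proj, hours) for ecode, proj, _, hours in project]

# Joining the two tables on ECode rebuilds the original rows.
dept_of = dict(employee_dept)
rejoined = [(ecode, proj, dept_of[ecode], hours) for ecode, proj, hours in project_hours]

print(employee_dept)
print(rejoined == project)
```

Each department name is now stored once per employee, so a transfer is recorded by changing a single row of EmployeeDept.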

Third Normal Form (3 NF)

A relation is said to be in 3NF when it is in 2NF and every non-key attribute is functionally dependent only on the primary key. Consider the table Employee:

Employee

ECode   Dept      DeptHead
E101    Systems   E901
E305    Finance   E909
E402    Sales     E906
E508    Admin     E908
E607    Finance   E909
E608    Finance   E909

The problems with dependencies of this kind are:

• Insertion

The department head of a new department that does not have any employees at present cannot be entered in the DeptHead column. This is because the primary key is unknown.

• Updating

For a given department, the code for a particular department head (DeptHead) is repeated several times. Hence, if a department head moves to another department, the change will have to be made consistently across the table.

• Deletion

If the record of an employee is deleted, the information regarding the head of the department will also be deleted. Hence, there will be a loss of information.

You must check if the table is in 3NF. Since each cell in the table has a single value, the table is in 1NF.

The primary key in Employee table is ECode. For each value of ECode, there is exactly one value of Dept. Hence, the attribute Dept is functionally dependent on the primary key, ECode. Similarly, for each value of ECode, there is exactly one value of DeptHead. Hence, DeptHead is functionally dependent on the primary key ECode. Hence, all the attributes are functionally dependent on the whole key, ECode. Hence the table is in 2NF.

However, the attribute DeptHead is dependent on the attribute Dept also. As per 3NF, all non-key attributes have to be functionally dependent only on the primary key. This table is not in 3NF since DeptHead is functionally dependent on Dept, which is not a primary key.

Guidelines for Converting a Table to 3NF

• Find and remove non-key attributes that are functionally dependent on attributes that are not the primary key. Place them in a different table.
• Group the remaining attributes.

To convert the table Employee into 3NF, you must remove the column DeptHead, since it is not functionally dependent on only the primary key ECode, and place it in another table called Department along with the attribute Dept on which it is functionally dependent.

Employee

ECode   Dept
E101    Systems
E305    Finance
E402    Sales
E508    Admin
E607    Finance
E608    Finance

Department

Dept      DeptHead
Systems   E901
Sales     E906
Admin     E908
Finance   E909
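The following Python sketch shows the same split on sample rows in which every Finance employee has the head E909. The names employee_3nf and department, and the replacement head code E910, are made up for this illustration. After the split, a change of department head is a single update.

```python
# Sample rows of the Employee table before conversion: (ECode, Dept, DeptHead).
employee = [
    ("E101", "Systems", "E901"),
    ("E305", "Finance", "E909"),
    ("E402", "Sales", "E906"),
    ("E508", "Admin", "E908"),
    ("E607", "Finance", "E909"),
    ("E608", "Finance", "E909"),
]

# Employee keeps only the attribute that depends directly on the key ECode.
employee_3nf = {ecode: dept for ecode, dept, _ in employee}

# Department holds DeptHead, which depends on Dept rather than on ECode.
department = {}
for _, dept, head in employee:
    # Two different heads for one department would contradict Dept -> DeptHead.
    if department.setdefault(dept, head) != head:
        raise ValueError(f"{dept} has more than one head")

# A change of head is now recorded in one place (E910 is a made-up code).
department["Finance"] = "E910"

# Looking up the head of each employee goes through the Department table.
heads = {ecode: department[dept] for ecode, dept in employee_3nf.items()}
print(heads)
```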

Boyce-Codd Normal Form

The original definition of 3NF was inadequate in some situations. It was not satisfactory for tables:

• That had multiple candidate keys
• Where the multiple candidate keys were composite
• Where the multiple candidate keys overlapped (had at least one attribute in common)

Hence, a new normal form, the Boyce-Codd normal form, was introduced. You must understand that in tables where the above three conditions do not apply, you can stop at the third normal form. In such cases, the third normal form is the same as the Boyce-Codd normal form.

A relation is in the Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key. Consider the table Project given below.

Project

ECode   Name       ProjCode   Hours
E1      Veronica   P2         48
E2      Anthony    P5         100
E3      Map        P6         15
E4      Susan      P3         250
E4      Susan      P5         75
E1      Veronica   P5         40

This table has redundancies. If the name of an employee is changed, the change will have to be made in every row of the table for that employee; otherwise there will be inconsistencies.

ECode+ProjCode is the primary key. You will notice that Name+ProjCode could be chosen as the primary key and hence, is a candidate key.

• Hours is functionally dependent on ECode+ProjCode
• Hours is also functionally dependent on Name+ProjCode
• Name is functionally dependent on ECode
• ECode is functionally dependent on Name

You will notice that this table has:

• Multiple candidate keys, that is, ECode+ProjCode and Name+ProjCode
• Candidate keys that are composite
• Candidate keys that overlap, since the attribute ProjCode is common to both
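The dependencies and candidate keys noted above can be checked against the BCNF rule (every determinant must be a candidate key) with a short Python sketch. The function name bcnf_violations is made up for this illustration, and the dependencies are entered by hand rather than derived from the data.

```python
# Functional dependencies noted for the Project table: (determinant, dependent).
dependencies = [
    (frozenset({"ECode", "ProjCode"}), "Hours"),
    (frozenset({"Name", "ProjCode"}), "Hours"),
    (frozenset({"ECode"}), "Name"),
    (frozenset({"Name"}), "ECode"),
]

candidate_keys = [
    frozenset({"ECode", "ProjCode"}),
    frozenset({"Name", "ProjCode"}),
]


def bcnf_violations(dependencies, candidate_keys):
    """Return the determinants that are not candidate keys, in sorted order."""
    offending = {
        tuple(sorted(determinant))
        for determinant, _ in dependencies
        if determinant not in candidate_keys
    }
    return sorted(offending)


violations = bcnf_violations(dependencies, candidate_keys)
print(violations)
```

An empty result would mean that the table already satisfies the BCNF rule as stated here. For this table the sketch reports ECode and Name as determinants that are not candidate keys.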

This is a case for the Boyce-Codd normal form. The table is in third normal form: the only non-key item is Hours, which is dependent on the whole key, that is, ECode+ProjCode or Name+ProjCode. ECode and Name are determinants since they are functionally dependent on each other. However, they are not candidate keys by themselves. As per BCNF, the determinants have to be candidate keys.

Guidelines for Converting a Table to BCNF

• Find and remove the overlapping candidate keys. Place the part of the candidate key and the attribute it is functionally dependent on in a different table.
• Group the remaining items into a table.

Denormalisation for Performance

Occasionally, database designers choose a structure that has redundant information (a table which is not normalised). The redundancy is used to improve performance for specific applications. For instance, suppose that the name of an account holder has to be displayed along with the account number and balance every time the account is accessed. In a normalised schema, this would require a join between the account_details and depositor relations. One alternative to joins is to store all attributes of the account_details and depositor relations in a single table. This would help display the account information faster. This process of taking a normalised schema and making it un-normalised is called denormalisation. It is done by designers to tune the performance of the system in order to support time-critical operations.
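The trade-off can be tried out with the following Python sketch. It uses the SQLite engine that ships with Python as a stand-in for SQL Server, so the SQL shown is SQLite syntax and not Transact-SQL. The column names, the sample rows and the table name account_summary are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised schema: balance and holder name live in separate tables.
    CREATE TABLE account_details (account_no TEXT PRIMARY KEY, balance REAL);
    CREATE TABLE depositor (account_no TEXT, holder_name TEXT);
    INSERT INTO account_details VALUES ('A-101', 500.0), ('A-102', 1200.0);
    INSERT INTO depositor VALUES ('A-101', 'Hayes'), ('A-102', 'Johnson');

    -- Denormalised copy: the holder name is stored redundantly with the balance.
    CREATE TABLE account_summary AS
        SELECT d.holder_name, a.account_no, a.balance
        FROM account_details a
        JOIN depositor d ON d.account_no = a.account_no;
""")

# Normalised schema: every lookup needs a join.
normalised = conn.execute(
    "SELECT d.holder_name, a.account_no, a.balance "
    "FROM account_details a JOIN depositor d ON d.account_no = a.account_no "
    "ORDER BY a.account_no"
).fetchall()

# Denormalised table: the same answer comes from a single table.
denormalised = conn.execute(
    "SELECT holder_name, account_no, balance FROM account_summary ORDER BY account_no"
).fetchall()

print(normalised == denormalised)
conn.close()
```

The price of the faster read is that the holder name is now stored in two places, so a change of name must be applied to both tables to keep them consistent.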




Summary

• Different data models used:
  o Flat File System
  o Hierarchical Data Model
  o Network Data Model

• The ER model is a conceptual model of a database which has 3 major components:
  o Entity – a person, place, thing, object, event or even a concept
  o Attribute – a characteristic or property of an entity
  o Relationship – an association between two entities

• An Entity Relationship (ER) diagram is a tool that a database designer uses to display the logical database of a system.

• In the entity-relationship diagram, entities are represented by rectangles, relationships by diamonds, and attributes by ellipses.

• There are three types of relationships:
  o One-to-one
  o One-to-many or Many-to-one
  o Many-to-many

• The type of relationship, whether many-to-many, one-to-many, or one-to-one, is represented symbolically.

• Weak entities are represented in double-lined boxes.

• Subtypes are connected to a super type by an unnamed relationship, marked with a cross-bar on the top.

• Relational systems require keys that can uniquely identify the rows of a table. The various types of keys that have been identified are:
  o Candidate (also referred to as Surrogate)
  o Primary
  o Alternate
  o Composite

• Normalisation plays a major role in designing the database and grouping the data into tables. The normal forms are:
  o First Normal Form (1NF)
  o Second Normal Form (2NF)
  o Third Normal Form (3NF)
  o BCNF
  o Fourth Normal Form (4NF)
  o Fifth Normal Form (5NF)




REVIEW QUESTIONS

1. What is the difference between a flat file system and an RDBMS?
2. Explain the difference between a weak and a strong entity set.
3. What are the features of a primary key and a super key?
4. What needs to be done to convert a relation which is in 2NF to 3NF?
5. Explain the advantages and disadvantages of Normalisation.
6. Does Denormalisation decrease the database performance?




Chapter 2: Introduction to Microsoft SQL Server

Objectives

• Introduction to Microsoft SQL Server

• Understand the architecture of SQL Server 2000

• Learn the features of SQL Server 2000

• Learn the different SQL Server components




What is Microsoft SQL Server?

Microsoft SQL Server is a powerful relational database management system (RDBMS) from Microsoft Corporation, USA. It consists of a database engine that runs on Windows platforms such as Windows NT, Windows 2000 and Windows 98, as well as various tools for users and developers.

Microsoft SQL Server is a high performance, client/server relational database management system (RDBMS). It was designed to support high volume transaction processing (such as that for the online order entry, inventory, accounting or manufacturing) as well as data warehousing and decision-support applications (such as sales analysis applications). Microsoft SQL Server 2000 includes several new features that make it an excellent database platform for large-scale online transactional processing (OLTP), data warehousing, and e-commerce applications. Microsoft SQL Server 2000 is a family of products that meet the data storage and analysis requirements of the largest data processing systems and commercial Web sites. The same products can provide easy-to-use data storage and analysis services to an individual or small business.

Fundamentals of SQL Server 2000 Architecture

Microsoft SQL Server™ 2000 is a family of products that meet the data storage requirements of the largest data processing systems and commercial Web sites, yet at the same time can provide easy-to-use data storage services to an individual or small business.

The data storage needs of a modern corporation or government organization are very complex. Some examples are:

• Online Transaction Processing (OLTP) systems must be capable of handling thousands of orders placed at the same time.

• An increasing number of corporations are implementing large Web sites as a mechanism for their customers to enter orders, contact the service department, get information about the products and for many other tasks that previously required contact with employees. These sites require data storage that is secure, yet tightly integrated with the Web.

• Organizations have many users who must continue working when they do not have access to the network. Examples are mobile disconnected users, such as traveling sales representatives or regional inspectors. These users must synchronize the data on a notebook or laptop with the current data in the corporate system, disconnect from the network, record the results of their work while in the field, and then finally reconnect with the corporate network and merge the results of their field work into the corporate data store.

• Managers and marketing personnel need increasingly sophisticated analysis of trends recorded in corporate data. They need robust Online Analytical Processing (OLAP) systems that can be easily built from OLTP data and that support sophisticated data analysis.

• Independent Software Vendors (ISVs) must be able to distribute data storage capabilities with applications targeted at individuals or small work groups. This means the data storage mechanism must be transparent to the users who purchase the application. This requires a data storage system that can be configured by the application and then tune itself automatically so that the users do not need to dedicate database administrators to constantly monitor and tune the application.

SQL Server 2000 provides two fundamental services to applications in a Windows® DNA environment:

• The SQL Server 2000 relational database engine is a modern, highly scalable, highly reliable engine for storing data. The database engine stores data in tables. Each table represents some object of interest to the organization, such as vehicles, employees, or customers. Each table has columns that each represent an attribute of the object modelled by the table (such as weight, name, or cost), and rows that each represent a single occurrence of the type of object modelled by the table (such as the car with license plate number ABC-123, or the employee with ID 123456). Applications can submit Structured Query Language (SQL) statements to the database engine, which returns the results to the application in the form of a tabular result set. The specific dialect of SQL supported by SQL Server is called Transact-SQL. Applications can also submit either SQL statements or XPath queries and request that the database engine return the results in the form of an XML document.

The distributed query feature of the database engine allows you to access data from any source of data that can be accessed using OLE DB. The tables of the remote OLE DB data source can be referenced in Transact-SQL statements just like tables that actually reside in a SQL Server database. In addition, the full-text search feature allows you to perform sophisticated pattern matches against textual data stored in SQL Server databases or Windows files.

The relational database engine is capable of storing detailed records of all the transactions generated by the top online transaction processing (OLTP) systems. The database engine can also support the demanding processing requirements for fact tables and dimension tables in the largest online analytical (OLAP) data warehouses.

• Microsoft SQL Server 2000 Analysis Services provides tools for analyzing the data stored in data warehouses and data marts. Certain analytical processes, such as getting a summary of the monthly sales by product of all the stores in a district, take a long time if run against all the detail records of an OLTP system. To speed up these types of analytical processes, data from an OLTP system is periodically summarized and stored in fact and dimension tables in a data warehouse or data mart. Analysis Services presents the data from these fact and dimension tables as multidimensional cubes that can be analyzed for trends and other information that is important for planning future work. Processing OLAP queries on multidimensional Analysis Services cubes is substantially faster than attempting the same queries on the detail data recorded in OLTP databases.

Microsoft SQL Server 2000 is a set of components that work together to meet the data storage and analysis needs of the largest websites and enterprise data processing systems.

Microsoft SQL Server 2000 data is stored in databases. The data in a database is organized into the logical components visible to users. A database is also physically implemented as two or more files on disk.

When using a database, you work primarily with the logical components such as tables, views, procedures and users. The physical implementation of files is largely transparent. Typically, only the database administrator needs to work with the physical implementation.

Each instance of SQL Server has four system databases (master, model, tempdb and msdb) and one or more user databases. Some organizations have only one user database, containing all the data for their organization. Other organizations have different databases for each group in their organization, and sometimes a database used by a single application. For example, an organization may have one database for sales, one for payroll, one for a document management application, and so on. Sometimes an application uses only one database; other applications may access several databases.

It is not necessary to run multiple copies of the SQL Server database engine to allow multiple users to access the databases on a server. An instance of the SQL Server Standard or Enterprise Edition is capable of handling thousands of users working in multiple databases at the same time. Each instance of SQL Server makes all databases in the instance available to all users that connect to the instance, subject to all defined security permissions.

When connecting to an instance of SQL Server, your connection is associated with a particular database on the server. This database is called the current database. You are usually connected to a database defined as your default database by the system administrator, although you can use connection options in the database APIs to specify another database. You can switch from one database to another using either the Transact-SQL USE database_name statement, or an API function that changes your current database context.

SQL Server 2000 allows you to detach databases from an instance of SQL Server, then reattach them to another instance, or even attach the database back to the same instance. If you have a SQL Server database file, you can tell SQL Server when you connect to attach that database file with a specific database name.

Features of SQL Server 2000

Microsoft SQL Server™ 2000 features include:

• Internet Integration

The SQL Server 2000 database engine includes integrated XML support. It also has the scalability, availability and security features required to operate as the data storage component of the largest Web sites. The SQL Server 2000 programming model is integrated with the Windows DNA architecture for developing Web applications, and SQL Server 2000 supports features such as English Query and the Microsoft Search Service to incorporate user-friendly queries and powerful search capabilities in Web applications.

• Scalability and Availability

The same database engine can be used across platforms ranging from laptop computers running Microsoft Windows 98 through large, multiprocessor servers running Microsoft Windows 2000 Data Center Edition. SQL Server 2000 supports a wide range of users accessing it at the same time. An instance of SQL Server 2000 includes the files that make up a set of databases and a copy of the DBMS software. Applications running on separate computers use a SQL Server 2000 communications component to transmit commands over a network to the SQL Server 2000 instance. When an application connects to an instance of SQL Server 2000, it can reference any of the databases in that instance that the user is authorized to access. The communication component also allows communication between an instance of SQL Server 2000 and an application running on the same computer. You can run multiple instances of SQL Server 2000 on a single computer.

SQL Server 2000 is designed to support the traffic of the largest Web sites or enterprise data processing systems. Instances of SQL Server 2000 running on large, multiprocessor servers are capable of supporting connections to thousands of users at the same time. The data in SQL Server tables can be partitioned across multiple servers, so that several multiprocessor computers can cooperate to support the database processing requirements of extremely large systems. These groups of database servers are called federations.

Although SQL Server 2000 is designed to work as the data storage engine for thousands of concurrent users who connect over a network, it is also capable of working as a standalone database directly on the same computer as an application. The scalability and ease-of-use features of SQL Server 2000 allow it to work efficiently on a single computer without consuming too many resources or requiring administrative work by the standalone user. The same features allow SQL Server 2000 to dynamically acquire the resources required to support thousands of users, while minimizing database administration and tuning. The SQL Server 2000 relational database engine dynamically tunes itself to acquire or free the appropriate computer resources required to support a varying load of users accessing an instance of SQL Server 2000 at any specific time. The SQL Server 2000 relational database engine has features to prevent the logical problems that occur if a user tries to read or modify data currently used by others.

• Enterprise-Level Database Features

The SQL Server 2000 relational database engine supports the features required to support demanding data processing environments. The database engine protects data integrity while minimizing the overhead of managing thousands of users concurrently modifying the database. SQL Server 2000 distributed queries allow you to reference data from multiple sources as if it were a part of a SQL Server 2000 database, while at the same time, the distributed transaction support protects the integrity of any updates of the distributed data. Replication also allows you to maintain multiple copies of data, while ensuring that the separate copies remain synchronized. You can replicate a set of data to multiple, mobile, disconnected users, have them work autonomously, and then merge their modifications back to the publisher.

• Ease of installation, deployment and use

SQL Server 2000 includes a set of administrative and development tools that improve upon the process of installing, deploying, managing and using SQL Server across several sites. SQL Server 2000 also supports a standards-based programming model integrated with Windows DNA, making the use of SQL Server databases and data warehouses a seamless part of building powerful and scalable systems. These features allow you to rapidly deliver SQL Server applications that customers can implement with a minimum of installation and administrative overhead.

• English-based questions

SQL Server 2000 includes tools for extracting and analyzing summary data for online analytical processes. SQL Server also includes tools for visually designing databases and analyzing data using English-based questions.

Relational Database Components

The database component of Microsoft SQL Server™ 2000 is a Structured Query Language (SQL)-based, scalable, relational database with integrated Extensible Markup Language (XML) support for Internet applications. Each of the following terms briefly describes a fundamental part of the architecture of the SQL Server 2000 database component.

Relational Database

Although there are different ways to organize data in a database, relational databases are one of the most effective. Relational database systems are an application of mathematical set theory to the problem of effectively organizing data. In a relational database, data is collected into tables (called relations in relational theory).

A table represents some class of objects that are important to an organization. For example, a company may have a database with a table for employees, another table for customers, and another for stores. Each table is built of columns and rows (called attributes and tuples in relational theory). Each column represents some attribute of the object represented by the table. For example, an Employee table would typically have columns for attributes such as first name, last name, employee ID, department, pay grade and job title. Each row represents an instance of the object represented by the table. For example, one row in the Employee table represents the employee who has employee ID 12345.

When organizing data into tables, you can usually find many different ways to define tables. Relational database theory defines a process called normalization, which ensures that the set of tables you define will organize your data effectively.

Structured Query Language

To work with data in a database, you have to use a set of commands and statements (language) defined by the DBMS software. Several different languages can be used with relational databases; the most common is SQL.

The American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) define software standards, including standards for the SQL language. SQL Server 2000 supports the Entry Level of SQL-92, the SQL standard published by ANSI and ISO in 1992. The dialect of SQL supported by Microsoft SQL Server is called Transact-SQL (T-SQL). T-SQL is the primary language used by Microsoft SQL Server applications.

Extensible Markup Language

XML is the emerging Internet standard for data. XML is a set of tags that can be used to define the structure of a hypertext document. XML documents can be easily processed by the Hypertext Markup Language (HTML), which is the most important language for displaying Web pages.

Although most SQL statements return their results in a relational, or tabular, result set, the SQL Server 2000 database component supports a FOR XML clause that returns results as an XML document. SQL Server 2000 also supports XPath queries from Internet and intranet applications. XML documents can be added to SQL Server databases, and the OPENXML clause can be used to expose data from an XML document as a relational result set.




Summary

• SQL Server 2000 is a Relational Database Management System with high end functionality

• SQL Server 2000 is highly scalable and the data stored in it can be retrieved using Structured Query Language (SQL).

• SQL Server 2000 also supports XML data




REVIEW QUESTIONS

1. Mention two features of SQL Server 2000.
2. What is meant by the statement "SQL Server is highly scalable"?
3. Does SQL Server 2000 support XML?




Chapter 3: Creating and Maintaining Databases

Objectives

• Overview of databases

• Learn about System Databases

• Understand the Physical file structure of Databases

• Creation of User Databases

• Learn various commands to access and maintain databases




Database Overview

A client/server database system comprises two components:

• Programs that provide an interface for client-based users to access data
• The database structure that manages and stores the data on the server

For example, if you use SQL Server 2000 to create a checking account application, you must set up a database structure to manage the account transaction data and an application that acts as the user interface to the database, allowing users to access checking account information. Creating a database to serve your business needs requires an understanding of how to design, create and maintain each of these components to ensure that your database performs optimally.

Topic                                     Description

Databases                                 Describes how databases are used to represent, manage and access data.

Tables                                    Describes how tables are used to store rows of data and define the relationships between multiple tables.

Indexes                                   Describes how indexes are used to increase the speed of accessing the data in the table.

Views                                     Describes views and their usefulness in providing an alternate way of looking at the data in the table.

Stored Procedures                         Describes how these Transact-SQL programs centralize business rules, tasks and processes within the server.

Enforcing Business Rules with Triggers    Describes the function of triggers as special types of stored procedures executed only when a table is modified.

Full-Text Indexes                         Describes how full-text indexes facilitate the querying of data stored in character-based columns, such as varchar and text.

Databases

A database consists of a collection of tables that contain data and other objects, such as views, indexes, stored procedures and triggers, defined to support activities performed with the data. The data stored in a database is usually related to a particular subject or process, such as inventory information for a manufacturing warehouse.

SQL Server can support many databases. Each database can store either interrelated or unrelated data from other databases. For example, a server can have one database that stores personnel data and another that stores product-related data. Alternatively, one database can store current customer order data and another related database can store historical customer orders used for yearly reporting.

Designing a database requires an understanding of both the business functions you want to model and the database concepts and features used to represent those business functions. It is important to accurately design a database to model the business because it can be time consuming to change the design of a database significantly once implemented. A well-designed database also performs better. When designing a database, consider:

• The purpose of the database and how it affects the design. Create a database plan to fit your purpose.
• Database normalization rules that prevent mistakes in the database design.
• Protection of your data integrity.
• Security requirements of the database and user permissions.
• Performance needs of the application. You must ensure that the database design takes advantage of Microsoft SQL Server 2000 features that improve performance. Achieving a balance between the size of the database and the hardware configuration is also important for performance.
• Maintenance.
• Estimating the size of a database.

Creating a Database Plan

The first step in creating a database is creating a plan that serves both as a guide to be used when implementing the database and as a functional specification for the database after it has been implemented. The complexity and detail of a database design is dictated by the complexity and size of the database application as well as the user population. The nature and complexity of the database application, as well as the process of planning it, can vary greatly. A database can be relatively simple and designed for use by a single person, or it can be large and complex, designed, for example, to handle all the banking transactions for hundreds of thousands of clients. In the first case, the database design may be little more than a few notes on some scratch paper. In the latter case, the design may be a formal document with hundreds of pages that contain every possible detail about the database.




System Databases and Data

A Microsoft SQL Server 2000 system has four system databases:

master

The master database records all of the system-level information for a SQL Server system. It records all login accounts and all system configuration settings. master is the database that records the existence of all other databases, including the location of the database files. master also records the initialization information for SQL Server. Hence, it is important to always have a recent backup of the master database available.

[Important: It is recommended that you do not create any user objects, such as tables, views, stored procedures or triggers, in the master database. The master database contains the system tables that store the system information used by SQL Server, such as configuration option settings.]
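As an illustration of master recording the existence and file location of every other database, the following minimal sketch reads the sysdatabases system table. It is a read-only query; the column selection is only one reasonable choice.

```sql
-- List every database known to this server and where its primary file lives.
-- Read-only query against the master database; nothing is modified.
SELECT name,        -- database name
       dbid,        -- database ID assigned by SQL Server
       filename     -- operating-system path of the primary data file
FROM master.dbo.sysdatabases
ORDER BY dbid
```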

tempdb

tempdb holds all temporary tables and temporary stored procedures. It also fills any other temporary storage needs, such as worktables generated by SQL Server. tempdb is a global resource; the temporary tables and stored procedures for all users connected to the system are stored there. tempdb is re-created every time SQL Server is started, so the system starts with a clean copy of the database. Because temporary tables and stored procedures are dropped automatically when the user disconnects, and no connections are active when the system is shut down, there is never anything in tempdb to be saved from one session of SQL Server to another.

By default, tempdb autogrows as needed while SQL Server is running. Unlike other databases, however, it is reset to its initial size each time the database engine is started. If the size defined for tempdb is small, part of your system-processing load may be taken up with autogrowing tempdb to the size needed to support your workload each time you restart SQL Server. You can avoid this overhead by using ALTER DATABASE to increase the size of tempdb.
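A minimal sketch of presizing tempdb with ALTER DATABASE follows. It assumes the default logical file names tempdev (data) and templog (log); the 200 MB and 50 MB targets are illustrative only and should be chosen from your own workload.

```sql
USE master
GO
-- Check the current logical file names and sizes of tempdb first.
EXEC sp_helpdb 'tempdb'
GO
-- Raise the initial size of the tempdb data file
-- (the new size must be larger than the current size).
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 200MB)
GO
-- Raise the initial size of the tempdb log file.
ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 50MB)
GO
```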

model

The model database is used as the template for all databases created on a system. When a CREATE DATABASE statement is issued, the first part of the database is created by copying in the contents of the model database, and then the remainder of the new database is filled with empty pages. Because tempdb is created every time SQL Server is started, the model database must always exist on a SQL Server system.

msdb

The msdb database is used by SQL Server Agent for scheduling alerts and jobs and recording operators.

In SQL Server 2000 and SQL Server version 7.0, every database, including the system databases, has its own set of files and does not share those files with other databases. Each database in SQL Server 2000 contains system tables recording the data needed by the SQL Server components. The successful operation of SQL Server depends on the integrity of information in the system tables; therefore, Microsoft does not support users directly updating the information in the system tables.

Logical Database Components

The topics in this section describe the way Microsoft SQL Server 2000 files and databases are organized. The organization of data in SQL Server 2000 and SQL Server version 7.0 is different from the organization of data in SQL Server version 6.5 or earlier.

Pages and Extents

The fundamental unit of data storage in SQL Server 2000 is the page, which is 8 KB in size. This means SQL Server 2000 databases have 128 pages per megabyte. The start of each page is a 96-byte header used to store system information, such as the type of page, the amount of free space on the page and the object ID of the object owning the page. The table shows the eight types of pages in the data files of a SQL Server 2000 database.

Page Type                                                 Contents
Data                                                      Data rows with all data except text, ntext and image data.
Index                                                     Index entries.
Text/Image                                                Text, ntext and image data.
Global Allocation Map, Secondary Global Allocation Map    Information about allocated extents.
Page Free Space                                           Information about free space available on pages.
Index Allocation Map                                      Information about extents used by a table or index.
Bulk Changed Map                                          Information about extents modified by bulk operations since the last BACKUP LOG statement.
Differential Changed Map                                  Information about extents that have changed since the last BACKUP DATABASE statement.

Log files do not contain pages; they contain a series of log records.

Data pages contain all the data in data rows except text, ntext and image data, which is stored in separate pages. Data rows are placed serially on the page, starting immediately after the header. A row offset table starts at the end of the page. The row offset table contains one entry for each row on the page, and each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page. Rows cannot span pages in SQL Server. In SQL Server 2000, the maximum amount of data contained in a single row is 8060 bytes, not including text, ntext and image data.

Extents are the basic unit in which space is allocated to tables and indexes. An extent is 8 contiguous pages, or 64 KB. This means SQL Server 2000 databases have 16 extents per megabyte. To make its space allocation efficient, SQL Server 2000 does not allocate entire extents to tables with small amounts of data.
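The page and extent figures quoted above follow from simple arithmetic, which can be checked with a single SELECT. This is only a worked calculation using the numbers in the text (8 KB pages, 96-byte header, 8 pages per extent); the column aliases are illustrative.

```sql
-- Page and extent arithmetic from the text (integer arithmetic throughout).
SELECT 1024 / 8       AS pages_per_mb,        -- 8 KB pages in 1 MB (1024 KB): 128
       8 * 8          AS extent_size_kb,      -- 8 pages of 8 KB each: 64
       1024 / (8 * 8) AS extents_per_mb,      -- 64 KB extents in 1 MB: 16
       8192 - 96      AS bytes_after_header   -- 8 KB page minus the 96-byte header: 8096
```

The 8060-byte row limit is smaller than the 8096 bytes left after the header because the page must also hold the row offset table and other per-row overhead.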




SQL Server 2000 has two types of extents:

1. Mixed Extent

Mixed extents are shared by up to eight objects. A new table or index is usually allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it is switched to uniform extents. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

2. Uniform Extent

Uniform extents are owned by a single object; all eight pages in the extent can be used only by the owning object.

Physical Database Files and Filegroups

SQL Server 2000 maps a database over a set of operating-system files. Data and log information are never mixed on the same file, and individual files are used only by one database. SQL Server 2000 databases have three types of files:

• Primary data files. The primary data file is the starting point of the database and points to the other files in the database. Every database has one primary data file. The recommended file name extension for primary data files is .mdf.
• Secondary data files. Secondary data files comprise all of the data files other than the primary data file. Some databases may not have any secondary data files, while others have multiple secondary data files. The recommended file name extension for secondary data files is .ndf.
• Log files. Log files hold all of the log information used to recover the database. There must be at least one log file for each database, although there can be more than one. The recommended file name extension for log files is .ldf.

SQL Server 2000 does not enforce the .mdf, .ndf and .ldf file name extensions, but these extensions are recommended to help identify the use of the file.

In SQL Server 2000, the locations of all the files in a database are recorded in both the master database and the primary file for the database. Most of the time, the database engine uses the file location information from the master database. For some operations, however, the database engine uses the file location information from the primary file to initialize the file location entries in the master database.

For example, a simple database, sales, can be created with one primary file that contains all data and objects and a log file that contains the transaction log information. Alternatively, a more complex database, orders, can be created with one primary file and five secondary files; the data and objects within the database are spread across all six files, and four additional log files contain the transaction log information.

Concept of Filegroup

Filegroups allow files to be grouped together for administrative and data allocation/placement purposes. For example, three files (Data1.ndf, Data2.ndf and Data3.ndf) can be created on three disk drives, respectively, and assigned to the filegroup fgroup1. A table can then be created specifically on the filegroup fgroup1. Queries for data from the table will be spread across the three disks, thereby improving performance. The same performance improvement can be accomplished with a single file created on a RAID (Redundant Array of Independent Disks) stripe set. Files and filegroups, however, allow you to easily add new files on new disks. Additionally, if your database exceeds the maximum size for a single Microsoft Windows NT file, you can use secondary data files to allow your database to continue to grow.
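The fgroup1 scenario described above could be set up roughly as follows. This is a sketch under stated assumptions: a database named sales already exists; D:, E: and F: are three separate physical disks with an existing sqldata directory; the file sizes and the order_history table are hypothetical.

```sql
-- Create the user-defined filegroup in the existing sales database.
ALTER DATABASE sales
ADD FILEGROUP fgroup1
GO
-- Place one secondary data file on each disk and assign all three to fgroup1.
ALTER DATABASE sales
ADD FILE
    (NAME = Data1, FILENAME = 'D:\sqldata\Data1.ndf', SIZE = 10MB),
    (NAME = Data2, FILENAME = 'E:\sqldata\Data2.ndf', SIZE = 10MB),
    (NAME = Data3, FILENAME = 'F:\sqldata\Data3.ndf', SIZE = 10MB)
TO FILEGROUP fgroup1
GO
USE sales
GO
-- Create a table explicitly on the filegroup; its pages are allocated
-- from the three files in fgroup1.
CREATE TABLE dbo.order_history
(
    order_id   int      NOT NULL,
    order_date datetime NOT NULL
) ON fgroup1
GO
```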

Rules for Designing Files and Filegroups

Rules for designing files and filegroups include:

• A file or filegroup cannot be used by more than one database. For example, the files sales.mdf and sales.ndf, which contain data and objects from the sales database, cannot be used by any other database.
• A file can be a member of only one filegroup.
• Data and transaction log information cannot be a part of the same file or filegroup.
• Transaction log files are never part of any filegroup.

Default Filegroups

A database comprises a primary filegroup and any user-defined filegroups. The filegroup that contains the primary file is the primary filegroup. When a database is created, the primary filegroup contains the primary data file and any other files that are not put into another filegroup. All system tables are allocated in the primary filegroup. The primary filegroup fills up only if autogrow is turned off or all the disks holding the files in the primary filegroup run out of space.

User-defined filegroups are any filegroups that are specifically created by the user when first creating or later altering the database. If a user-defined filegroup fills up, only the user tables specifically allocated to that filegroup are affected.

At any time, exactly one filegroup is designated as the DEFAULT filegroup. When objects are created in the database without specifying to which filegroup they belong, they are assigned to the default filegroup. The default filegroup must be large enough to hold any objects not allocated to a user-defined filegroup. Initially, the primary filegroup is the default filegroup.

The default filegroup can be changed using the ALTER DATABASE statement. After the default filegroup has been changed, any objects that do not have a filegroup specified when they are created are allocated to the data files in the new default filegroup. However, allocation for the system objects and tables remains with the PRIMARY filegroup, not the new default filegroup.
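A minimal sketch of changing the default filegroup follows. It assumes the sales database already has a user-defined filegroup named fgroup1 that contains at least one data file; the customer_notes table is hypothetical and exists only to show where a table lands when no filegroup is specified.

```sql
-- Make the user-defined filegroup the default for new objects.
ALTER DATABASE sales
MODIFY FILEGROUP fgroup1 DEFAULT
GO
USE sales
GO
-- No ON clause: the table is allocated in the current default filegroup.
CREATE TABLE dbo.customer_notes
(
    note_id   int          NOT NULL,
    note_text varchar(500) NULL
)
GO
-- Confirm which filegroup holds the table's data
-- (indid 0 = heap, indid 1 = clustered index).
SELECT FILEGROUP_NAME(groupid) AS filegroup_name
FROM sysindexes
WHERE id = OBJECT_ID('dbo.customer_notes')
  AND indid IN (0, 1)
GO
```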




Using Files and Filegroups

Filegroups use a proportional fill strategy across all files within each filegroup. As data is written to the filegroup, SQL Server writes an amount proportional to the free space in each file within the filegroup, rather than writing all the data to the first file until it is full and then writing to the next file. For example, if file f1 has 100 megabytes (MB) free and file f2 has 200 MB free, one extent is allocated from file f1, two extents from f2, and so on. This way both files become full at about the same time, and simple striping is achieved.

As soon as all the files in a filegroup are full, SQL Server automatically expands one file at a time in a round-robin fashion to accommodate more data (provided that the database is set to grow automatically). For example, suppose a filegroup comprises three files, all set to grow automatically. When space in all files in the filegroup is exhausted, only the first file is expanded. When the first file is full and no more data can be written to the filegroup, the second file is expanded, and then the third in the same way. If the third file becomes full, and no more data can be written to the filegroup, the first file is expanded again, and so on.

Using files and filegroups improves database performance by allowing a database to be created across multiple disks, multiple disk controllers, or RAID (Redundant Array of Independent Disks) systems. Additionally, files and filegroups allow data placement, because a table can be created in a specific filegroup. This improves performance because all I/O for a specific table can be directed at a specific disk.
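One way to observe how space is distributed across the files of a filegroup is to inspect file sizes with the system stored procedures sp_helpfile and sp_helpfilegroup. The sketch below assumes a hypothetical database sales with a filegroup named fgroup1.

```sql
USE sales   -- hypothetical database with several data files
GO
-- One row per file in the current database: logical name, physical path,
-- filegroup, current size, maximum size and growth setting.
EXEC sp_helpfile
GO
-- The files that belong to one filegroup (fgroup1 is a hypothetical name).
EXEC sp_helpfilegroup 'fgroup1'
GO
```

Running these before and after a large load gives a rough picture of proportional fill and of which file was expanded.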

Recommendations

These are some general recommendations for files and filegroups:

• Most databases will work well with a single data file and a single transaction log file.
• If you use multiple files, create a second filegroup for the additional file and make that filegroup the default filegroup. This way, the primary file will contain only system tables and objects.
• To maximize performance, create files or filegroups on as many different available local physical disks as possible, and place objects that compete heavily for space in different filegroups.
• Use filegroups to allow placement of objects on specific physical disks.
• Place different tables used in the same join queries in different filegroups. This will improve performance, due to parallel disk I/O searching for joined data.
• Place heavily accessed tables and the nonclustered indexes belonging to those tables on different filegroups. This will improve performance, due to parallel disk I/O, if the files are located on different physical disks.
• Do not place the transaction log file or files on the same physical disk with the other files and filegroups.

Create Database Command

Creates a new database and the files used to store the database, or attaches a database from the files of a previously created database.

Syntax:

CREATE DATABASE database_name
[ ON [PRIMARY]
    [ <filespec> [,...n] ]
    [, <filegroup> [,...n] ]
]
[ LOG ON { <filespec> [,...n] } ]
[ FOR LOAD | FOR ATTACH ]

<filespec> ::=
( [ NAME = logical_file_name, ]
    FILENAME = 'os_file_name'
    [, SIZE = size]
    [, MAXSIZE = { max_size | UNLIMITED } ]
    [, FILEGROWTH = growth_increment] ) [,...n]

<filegroup> ::=
FILEGROUP filegroup_name <filespec> [,...n]

Arguments

database_name
Is the name of the new database. Database names must be unique within a server and conform to the rules for identifiers. database_name can be up to 128 characters, unless no logical name is specified for the log. If no logical log file name is specified, Microsoft® SQL Server™ generates a logical name by appending a suffix to database_name. This limits database_name to 123 characters so that the generated logical log file name is less than 128 characters.

ON
Specifies that the disk files used to store the data portions of the database (data files) are defined explicitly. The keyword is followed by a comma-delimited list of <filespec> items defining the data files for the primary filegroup. The list of files in the primary filegroup can be followed by an optional, comma-delimited list of <filegroup> items defining user filegroups and their files.

PRIMARY
Specifies that the associated <filespec> list defines the primary file. The primary filegroup contains all of the database system tables. It also contains all objects not assigned to user filegroups. The first <filespec> entry in the primary filegroup becomes the primary file, which is the file containing the logical start of the database and its system tables. A database can have only one primary file. If PRIMARY is not specified, the first file listed in the CREATE DATABASE statement becomes the primary file.

n
Is a placeholder indicating that multiple files can be specified for the new database.

LOG ON
Specifies that the disk files used to store the database log (log files) are explicitly defined. The keyword is followed by a comma-delimited list of <filespec> items defining the log files. If LOG ON is not specified, a single log file is automatically created with a system-generated name and a size that is 25 percent of the sum of the sizes of all the data files for the database.

FOR LOAD
This clause is supported for compatibility with earlier versions of Microsoft SQL Server. The database is created with the dbo use only database option turned on, and the status is set to loading. This is not needed in SQL Server version 7.0 or later because the RESTORE statement can re-create a database as part of the restore operation.

FOR ATTACH
Specifies that a database is attached from an existing set of operating-system files. There must be a <filespec> entry specifying the first primary file. The only other <filespec> entries needed are those for any files that have a different path from when the database was first created or last attached. A <filespec> entry must be specified for these files. The database being attached must have been created using the same code page and sort order as SQL Server. Use the sp_attach_db system stored procedure instead of using CREATE DATABASE FOR ATTACH directly. Use CREATE DATABASE FOR ATTACH only when you must specify more than 16 <filespec> items. If you attach a database to a server other than the server from which the database was detached, and the detached database was enabled for replication, you should run sp_removedbreplication to remove replication from the database.

NAME
Specifies the logical name for the file defined by the <filespec>. The NAME parameter is not required when FOR ATTACH is specified.

logical_file_name
Is the name used to reference the file in any Transact-SQL statements executed after the database is created. logical_file_name must be unique in the database and conform to the rules for identifiers. The name can be a character or Unicode constant, or a regular or delimited identifier.

FILENAME
Specifies the operating-system file name for the file defined by the <filespec>.

'os_file_name'
Is the path and file name used by the operating system when it creates the physical file defined by the <filespec>. The path in os_file_name must specify a directory on the server in which SQL Server is installed. os_file_name cannot specify a directory in a compressed file system. If the file is being created on a raw partition, os_file_name must specify only the drive letter of an existing raw partition. Only one file can be created on each raw partition. Files on raw partitions do not autogrow; therefore, the MAXSIZE and FILEGROWTH parameters are not needed when os_file_name specifies a raw partition.

SIZE
Specifies the size of the file defined in the <filespec>. When a SIZE parameter is not supplied in the <filespec> for a primary file, SQL Server uses the size of the primary file in the model database. When a SIZE parameter is not specified in the <filespec> for a secondary or log file, SQL Server makes the file 1 MB.

size
Is the initial size of the file defined in the <filespec>. The KB and MB suffixes can be used to specify kilobytes or megabytes; the default is MB. Specify a whole number; do not include a decimal. The minimum value for size is 512 KB. If size is not specified, the default is 1 MB. The size specified for the primary file must be at least as large as the primary file of the model database.

MAXSIZE
Specifies the maximum size to which the file defined in the <filespec> can grow.

max_size
Is the maximum size to which the file defined in the <filespec> can grow. The KB and MB suffixes can be used to specify kilobytes or megabytes; the default is MB. Specify a whole number; do not include a decimal. If max_size is not specified, the file grows until the disk is full.

UNLIMITED
Specifies that the file defined in the <filespec> grows until the disk is full.

FILEGROWTH
Specifies the growth increment of the file defined in the <filespec>. The FILEGROWTH setting for a file cannot exceed the MAXSIZE setting.

growth_increment
Is the amount of space added to the file each time new space is needed. Specify a whole number; do not include a decimal. A value of 0 indicates no growth. The value can be specified in MB, KB, or %. If a number is specified without an MB, KB, or % suffix, the default is MB. When % is specified, the growth increment size is the specified percentage of the size of the file at the time the increment occurs. If FILEGROWTH is not specified, the default value is 10% and the minimum value is 64 KB. The size specified is rounded to the nearest 64 KB.
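The arguments above can be combined as in the following sketch, which creates a database with one explicitly sized primary data file and one log file. The database name Inventory, the logical file names and the C:\sqldata directory are assumptions made for illustration; the directory must already exist on the server.

```sql
USE master
GO
-- Hypothetical database, logical file names and paths.
CREATE DATABASE Inventory
ON PRIMARY
(   NAME = Inventory_dat,                        -- logical name used in later T-SQL statements
    FILENAME = 'C:\sqldata\inventory_dat.mdf',   -- physical primary data file
    SIZE = 10MB,                                 -- initial size
    MAXSIZE = 50MB,                              -- upper limit for automatic growth
    FILEGROWTH = 5MB )                           -- fixed growth increment
LOG ON
(   NAME = Inventory_log,
    FILENAME = 'C:\sqldata\inventory_log.ldf',   -- physical log file
    SIZE = 5MB,
    MAXSIZE = 25MB,
    FILEGROWTH = 10% )                           -- percentage growth increment
GO
-- Review the files and sizes that were created.
EXEC sp_helpdb 'Inventory'
GO
```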

Remarks

You can use one CREATE DATABASE statement to create a database and the files that store the database. SQL Server implements the CREATE DATABASE statement in two steps:

1. SQL Server uses a copy of the model database to initialize the database and its metadata.
2. SQL Server then fills the rest of the database with empty pages, except for pages that have internal data recording how the space is used in the database.

Any user-defined objects in the model database are therefore copied to all newly created databases. You can add to the model database any objects, such as tables, views, stored procedures, data types and so on, to be included in all databases.

Each new database inherits the database option settings from the model database (unless FOR ATTACH is specified). For example, the database option select into/bulkcopy is set to OFF in model and in any new databases you create. If you use ALTER DATABASE to change the options for the model database, these option settings are in effect for new databases you create. If FOR ATTACH is specified on the CREATE DATABASE statement, the new database inherits the database option settings of the original database.

Some points to note:

1. Fractions cannot be specified in the SIZE, MAXSIZE and FILEGROWTH parameters. To specify a fraction of a megabyte in size parameters, convert to kilobytes by multiplying the number by 1,024. For example, specify 1536 KB instead of 1.5 MB (1.5 multiplied by 1,024 equals 1,536).
2. When a simple CREATE DATABASE database_name statement is specified with no additional parameters, the database is made the same size as the model database.
3. All databases have at least a primary filegroup. All system tables are allocated in the primary filegroup. A database can also have user-defined filegroups.
4. You can specify a user-defined filegroup as the default filegroup using ALTER DATABASE.
5. Permission to create a database defaults to members of the sysadmin and dbcreator fixed server roles, although permissions can be granted to other users.
6. A maximum of 32,767 databases can be created on a server.
7. The name of the database must follow the rules for identifiers.
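Points 1 and 2 can be illustrated with a short sketch. The database names, logical file names and the C:\sqldata directory are hypothetical, and the example assumes the primary file of the model database is no larger than 2 MB.

```sql
USE master
GO
-- Point 2: no additional parameters, so the new database
-- is made the same size as the model database.
CREATE DATABASE scratch_small
GO
-- Point 1: 1.5 MB cannot be written as a fraction, so it is given as 1536KB.
CREATE DATABASE scratch_sized
ON PRIMARY
(   NAME = scratch_sized_dat,
    FILENAME = 'C:\sqldata\scratch_sized_dat.mdf',
    SIZE = 2MB )
LOG ON
(   NAME = scratch_sized_log,
    FILENAME = 'C:\sqldata\scratch_sized_log.ldf',
    SIZE = 1536KB )
GO
-- Compare the resulting database sizes.
EXEC sp_helpdb
GO
```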

Expanding a Database

Microsoft SQL Server 2000 can automatically expand a database according to growth parameters defined when the database was created. It is also possible to manually expand a database by allocating additional file space on an existing database file or by allocating space on a new file. You may need to expand the data or transaction log space if the existing files are becoming full. If a database has exhausted the space allocated to it and cannot grow automatically, error 1105 is raised.




When expanding a database, you must increase the size of the database by at least 1 megabyte (MB). Permission for expanding a database defaults to the database owner and is automatically transferred with database ownership. When a database is expanded, the new space is immediately made available to either the data or transaction log file, depending on which file was expanded.

If the transaction log is not set up to expand automatically, it can run out of space if certain types of activity occur in the database. The transaction log is purged only of inactive (committed) transactions when it is backed up, or at each checkpoint when the database is using the simple recovery model. SQL Server does not truncate the transaction log when backing up the database.

When you expand a database, it is recommended that you specify a maximum size to which the file is permitted to grow. This prevents the file from growing until disk space is exhausted. To specify a maximum size for the file, use the MAXSIZE parameter of the ALTER DATABASE statement or the Restrict filegrowth (MB) option when using the Properties dialog box in SQL Server Enterprise Manager to expand the database. Expanding a database to increase space for data or for the transaction log follows the same process.
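A minimal sketch of expanding a database manually with ALTER DATABASE follows. The database Sales, its logical file name Sales_dat, the new file names and the paths are assumptions for illustration. In SQL Server 2000 only one file property can be changed per MODIFY FILE, so the size and the maximum size are set in separate statements.

```sql
USE master
GO
-- Grow an existing data file to 20 MB
-- (the new size must be larger than the current size).
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_dat, SIZE = 20MB)
GO
-- Cap how far that file may grow.
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_dat, MAXSIZE = 100MB)
GO
-- Alternatively, allocate space on a new secondary data file.
ALTER DATABASE Sales
ADD FILE
(   NAME = Sales_dat2,
    FILENAME = 'D:\sqldata\sales_dat2.ndf',
    SIZE = 5MB,
    MAXSIZE = 100MB,
    FILEGROWTH = 5MB )
GO
-- Expanding the transaction log follows the same process, using ADD LOG FILE.
ALTER DATABASE Sales
ADD LOG FILE
(   NAME = Sales_log2,
    FILENAME = 'E:\sqllogs\sales_log2.ldf',
    SIZE = 5MB,
    MAXSIZE = 50MB,
    FILEGROWTH = 5MB )
GO
```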

Shrinking a Database

Microsoft SQL Server 2000 allows each file within a database to be shrunk to remove unused pages. Both data and transaction log files can be shrunk. The database files can be shrunk manually, either as a group or individually. The database can also be set to shrink automatically at given intervals. This activity occurs in the background and does not affect any user activity within the database.

When the database is set to shrink automatically using the ALTER DATABASE AUTO_SHRINK option (or the sp_dboption system stored procedure), shrinking occurs when a significant amount of free space is available in the database. However, the percentage of free space to be removed cannot be configured; as much free space as possible is removed. To configure the amount of free space to be removed, such as only 50 percent of the current free space in the database, use the Properties dialog box in SQL Server Enterprise Manager to shrink the database.

You cannot shrink the entire database to be smaller than its original size. Therefore, if a database was created with a size of 10 megabytes (MB) and grew to 100 MB, the smallest the database could be shrunk to, assuming all the data in the database has been deleted, is 10 MB. However, you can shrink the individual database files to be smaller than their initial size by using the DBCC SHRINKFILE statement. You must shrink each file individually, rather than attempting to shrink the entire database.

There are fixed boundaries from which a transaction log file can be shrunk. The size of the virtual log files within the log determines the possible reductions in size. Therefore, the log file can never be shrunk to a size less than the size of one virtual log file. For example, a transaction log file of 1 gigabyte (GB) may comprise five virtual log files of 200 MB each. Shrinking the transaction log file deletes unused virtual log files, but leaves at least one virtual log file. Because each virtual log file in this example is 200 MB, the transaction log file can shrink only to a minimum of 200 MB and can shrink only in increments of 200 MB. To allow a transaction log file to shrink to a smaller size, create a smaller transaction log and allow it to grow automatically, rather than creating a large transaction log file.

In SQL Server 2000, a DBCC SHRINKDATABASE or DBCC SHRINKFILE operation attempts to shrink a transaction log file to the requested size (subject to rounding) immediately. You should truncate the log file prior to shrinking the file, to reduce the size of the logical log and mark as inactive the virtual logs that do not hold any part of the logical log.

Note: It is not possible to shrink the database or transaction log while the database or transaction log is being backed up. Conversely, it is not possible to create a database or transaction log backup while the database or transaction log is being shrunk.
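The shrinking options described above could be exercised as in the following sketch. UserDB and the logical file names UserDB_dat and UserDB_log are hypothetical, and the target sizes are illustrative. Note that BACKUP LOG ... WITH TRUNCATE_ONLY discards log records without backing them up, so a full database backup should follow it.

```sql
-- Let SQL Server shrink the database automatically in the background.
ALTER DATABASE UserDB SET AUTO_SHRINK ON
GO
USE UserDB
GO
-- Shrink a single data file to a target size of 7 MB.
-- DBCC SHRINKFILE can go below the size the file was created with.
DBCC SHRINKFILE (UserDB_dat, 7)
GO
-- Truncate the inactive part of the log first, then shrink the log file.
-- The result is still bounded by the virtual log file sizes.
BACKUP LOG UserDB WITH TRUNCATE_ONLY
GO
DBCC SHRINKFILE (UserDB_log, 2)
GO
```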

DBCC SHRINKDATABASE

Shrinks the size of the data files in the specified database.

Syntax:

DBCC SHRINKDATABASE
( database_name [, target_percent]
    [, { NOTRUNCATE | TRUNCATEONLY } ]
)

Arguments

database_name
Is the name of the database to be shrunk. Database names must conform to the rules for identifiers. For more information, see Using Identifiers.

target_percent
Is the desired percentage of free space left in the database file after the database has been shrunk.

NOTRUNCATE
Causes the freed file space to be retained in the database files. If not specified, the freed file space is released to the operating system.

TRUNCATEONLY
Causes any unused space in the data files to be released to the operating system and shrinks the file to the last allocated extent, reducing the file size without moving any data. No attempt is made to relocate rows to unallocated pages. target_percent is ignored when TRUNCATEONLY is used.

Examples

This example decreases the size of the files in the UserDB user database to allow 10 percent free space in the files of UserDB.

DBCC SHRINKDATABASE (UserDB, 10)

Viewing a Database

You can view the definition of a database and its configuration settings when you are troubleshooting or considering making changes to a database.

sp_helpdb

Reports information about a specified database or all databases.

Syntax:

sp_helpdb [ [ @dbname = ] 'name' ]

Arguments

[ @dbname = ] 'name'
Is the name of the database for which to provide information. name is sysname, with no default. If name is not specified, sp_helpdb reports on all databases in master.dbo.sysdatabases.

Examples

A. Return information about a single database
This example displays information about the pubs database.

EXEC sp_helpdb pubs

B. Return information about all databases
This example displays information about all databases on the server running Microsoft SQL Server.

EXEC sp_helpdb

Displaying Database and Transaction Log Space

Microsoft SQL Server 2000 can display the number of rows, reserved disk space, and disk space used by a table in a database. SQL Server can also display the reserved disk space that is used by an entire database, as well as statistics about the use of transaction log space in a database. This indicates how much data is in the database, whether the database must be expanded (if autogrow is not permitted), and how fast the database is growing (if you maintain a history of the data space that is used).

Examples

A. Space information about a table
This example reports the amount of space allocated (reserved) for the titles table, the amount used for data, the amount used for index(es), and the unused space reserved by database objects.

USE pubs
EXEC sp_spaceused 'titles'

B. Updated space information about a complete database
This example summarizes space used in the current database and uses the optional parameter @updateusage.

USE pubs
EXEC sp_spaceused @updateusage = 'TRUE'

Permissions

Execute permissions default to the public role.
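The sp_spaceused examples above cover table and database space. For the transaction log statistics mentioned at the start of this topic, one option is DBCC SQLPERF with the LOGSPACE argument, sketched below together with a table-level sp_spaceused call that also refreshes the usage counts.

```sql
-- Log size (MB) and percentage of log space used,
-- one row for each database on the server.
DBCC SQLPERF (LOGSPACE)
GO
USE pubs
GO
-- Rows, reserved, data, index and unused space for one table,
-- after the usage information has been corrected.
EXEC sp_spaceused 'titles', @updateusage = 'TRUE'
GO
```

Saving the output of these commands at regular intervals gives the growth history referred to above.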

Deleting a Database

You can delete a nonsystem database when it is no longer needed or if it is moved to another database or server. When a database is deleted, the files and their data are deleted from the disk on the server. The database is permanently deleted and cannot be retrieved without using a previous backup. System databases (msdb, master, model, tempdb) cannot be deleted.

It is recommended that you back up the master database after a database is deleted, because deleting a database updates the system tables in master. If master needs to be restored, any database that has been deleted since the last backup of master will still have references in the system tables and may cause error messages to be raised.




DROP DATABASE

Removes one or more databases from Microsoft SQL Server. Removing a database deletes the database and the disk files used by the database.

Syntax:

DROP DATABASE database_name [,...n]

Arguments

database_name
Specifies the name of the database to be removed. Execute sp_helpdb from the master database to see a list of databases.

Permissions

DROP DATABASE permissions default to the database owner and to members of the sysadmin and dbcreator fixed server roles, and are not transferable.

Examples

A. Drop a single database
This example removes all references for the publishing database from the system tables.

DROP DATABASE publishing

B. Drop multiple databases
This example removes all references for each of the listed databases from the system tables (assuming the newpubs and pubs databases are present on the SQL Server).

DROP DATABASE pubs, newpubs
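A slightly more defensive variant of example A is sketched below: it drops the database only if it exists, and then backs up master as recommended earlier in this topic. The backup path is a hypothetical location; the directory must already exist.

```sql
USE master   -- a database cannot be dropped while it is the current database
GO
-- Drop the publishing database only if the server knows about it.
IF EXISTS (SELECT name FROM master.dbo.sysdatabases WHERE name = 'publishing')
    DROP DATABASE publishing
GO
-- Back up master so its system tables no longer reference the dropped database.
BACKUP DATABASE master
TO DISK = 'C:\sqlbackup\master_after_drop.bak'
GO
```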




Summary

• A database is a collection of related data stored in the form of tables, while a DBMS is a system that allows a user to retrieve, manipulate, add and delete data in the database.
   o SQL Server is a client/server based RDBMS.
• SQL Server has two types of databases:
   o System databases (master, model, tempdb, msdb)
   o User-defined databases
• The types of files used for creating a database:
   o Primary file
   o Secondary file
   o Log file
• Concept of expanding and shrinking the database
• Usage of the DROP DATABASE command




Review Questions

1. Define a database.
2. What is the physical file structure of a database?
3. Name the system databases.
4. What is the command used to create a database?
5. Which database's structure is used while creating new databases?
6. If the Primary Data File is full, can another Primary file be added to the Database?




Chapter 4: Structured Query Language

Objectives

• What is SQL?

• Creating simple queries using SELECT clause

• Usage of WHERE CLAUSE

• Learn various keywords used to retrieve the data in a specific format




What is Structured Query Language?

Structured Query Language (SQL, pronounced as sequel) is a non-procedural language for manipulating data in a database and retrieving information from it. It is the de facto standard language for all RDBMS products, including Microsoft SQL Server. SQL offers a very simple yet extremely powerful syntax for manipulating data in tables as well as retrieving information. Tasks that would typically require a large amount of code in most other languages can often be achieved with a single statement in SQL. The SQL language has several parts:

1. Data Definition Language (DDL) provides commands for defining relation schemas, deleting relations and modifying relation schemas.

2. Interactive data-manipulation language (DML) includes a query language based on both the relational algebra and the tuple relational calculus. It contains commands to insert tuples into, delete tuples from and modify tuples in the database.

3. View definition includes commands for defining views.

4. Transaction control includes commands for specifying the beginning and end of transactions.

5. Embedded SQL and dynamic SQL define how SQL statements can be embedded within general-purpose programming languages such as C, C++, Java and COBOL.

6. Integrity: the SQL DDL includes commands for specifying the integrity constraints that the data stored in the database must satisfy, so that updates which violate integrity constraints are disallowed.

7. Authorization: the SQL DDL includes commands for specifying access rights to relations and views.

The SELECT Statement

The SELECT statement is the most frequently used SQL command and is the fundamental way to query data. The syntax is intuitive, at least in its simplest forms, and resembles how you might state a request in English. As its name implies, however, SQL is not only intuitive but also structured and precise. The basic structure of an SQL expression consists of three clauses:

1. The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of the query

2. The from clause corresponds to the Cartesian product operation of the relational algebra. It lists the names of the relations (also known as tables) to be scanned in the evaluation of the expression

3. The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause
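As a minimal sketch of how the three clauses fit together, the query below reads the authors table of the pubs sample database. The choice of columns and the state value 'CA' are assumptions made purely for illustration.

```sql
-- Assumes the pubs sample database; the filter value is illustrative.
USE pubs
SELECT au_lname, au_fname   -- select: the attributes wanted in the result (projection)
FROM authors                -- from: the relation to be scanned
WHERE state = 'CA'          -- where: the predicate each row must satisfy (selection)
```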

The basic form of SELECT, using brackets ([]) to identify optional items, appears below.

SELECT [DISTINCT] [TOP n] <columns to be chosen, optionally eliminating duplicate rows from result set or limiting number of rows to be returned>
[FROM <table names>]
[WHERE <criteria that must be true for a row to be chosen>]
[GROUP BY <columns for grouping aggregate functions>]
[HAVING <criteria that must be met for aggregate functions>]
[ORDER BY <optional specification of how the results should be sorted>]




Note that the only clause that must always be present is the verb SELECT; the other clauses are optional. For example, if the entire table is needed, data doesn't need to be restricted to certain criteria, so the WHERE clause isn't needed.

Simple Retrieval with the SELECT statement

The generic SELECT…FROM block is used for retrieving data from an existing table. For example, the following is used to view all data from the publishers table:

select * from Publishers

The * symbol is used to indicate that all columns from the table need to be retrieved. If only certain columns need to be displayed, then we have to name them explicitly in the SELECT clause. The following query retrieves the ID and name of every publisher:

select pub_id, pub_name from Publishers

Manipulating the Column Names

When query results are displayed, each column heading of the result set is, by default, the column name specified in the table at the time of its creation. A user-defined column heading (a column alias) can replace the default column heading. SQL Server provides two methods of specifying the column alias.

Method 1:

Select au_lname+au_fname 'Name of Author' from Authors

Method 2:

Select 'Name of Author' = au_lname+au_fname from Authors

Concatenating the two columns

To concatenate the values from two columns, the '+' symbol can be used.

Example: To concatenate the author's last and first names, separated by a space

use pubs
Select au_lname + ' ' + au_fname from Authors

Usage of DISTINCT keyword

The DISTINCT keyword is used to remove duplicate rows from the result set. By default, queries return all records from the table, including duplicates.

Example: To list the types of books available

Select DISTINCT type 'Category' from Titles

The above query will display a unique list of all types available in the Titles table. The DISTINCT keyword excludes the duplicate occurrences from the result set.

Simple Conditional Retrieval

The WHERE clause is used in a SELECT statement to specify the criteria for retrieval of data. For example, to view the list of employees having a salary of more than 3500:

Select * from employee where salary > 3500

To view a list of employees from the MKTG department:

Select * from employee where dept_code = 'MKTG'

For a column containing character-type data, the search value must be enclosed in single quotes (double quotes are accepted only when the QUOTED_IDENTIFIER option is set OFF). The following relational operators can be used:

= equal to

!= not equal to

<> not equal to

> greater than

>= greater than or equal to

< less than

<= less than or equal to

Complex Conditional Retrieval

Complex conditional retrieval can be done by combining two or more conditions while retrieving the data. This can be achieved using the logical operators:

AND True when both conditions evaluate to true, false otherwise

OR True when at least one of the conditions evaluates to true, false otherwise

NOT Negates the condition

To view the list of all books of type modern cooking with a price greater than $10:

Select * from Titles where type = 'mod_cook' and price > 10

To view the list of books which are priced between 10 and 20:

Select * from Titles where price >= 10 and price <= 20

To view the list of all books which are either of type modern cooking or traditional cooking:

Select * from Titles where type = 'mod_cook' or type = 'trad_cook'

Alternatively, the above query can be executed using the IN keyword:

Select * from Titles where type in ('trad_cook', 'mod_cook')

To view the list of all books which are not of type modern cooking or traditional cooking:

Select * from Titles where type not in ('mod_cook', 'trad_cook')
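When AND and OR appear in the same WHERE clause, AND is evaluated first, so parentheses are often needed to get the intended result. The following sketch, which assumes the pubs sample database, contrasts the two readings and also shows NOT in use.

```sql
-- Assumes the pubs sample database. AND is evaluated before OR, so this
-- query returns every mod_cook title plus only those trad_cook titles
-- priced above 10.
SELECT title, type, price FROM titles
WHERE type = 'mod_cook' OR type = 'trad_cook' AND price > 10

-- Parentheses apply the price condition to both categories.
SELECT title, type, price FROM titles
WHERE (type = 'mod_cook' OR type = 'trad_cook') AND price > 10

-- NOT negates a condition: every title that is not a business book.
SELECT title, type, price FROM titles
WHERE NOT type = 'business'
```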

Usage of BETWEEN Keyword

The BETWEEN keyword is used to check for a range of values. When using the BETWEEN keyword, both boundary values are included in the range.

Example: To view the names of books and their category whose price is greater than or equal to 10 and less than or equal to 20

Select title 'Name of Book', type 'Category' from Titles where price between 10 and 20

To view the list of employees who joined between 01-01-1980 and 12-31-1989:

Select * from employee where emp_hire_date between '01-01-1980' and '12-31-1989'
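Because BETWEEN includes both boundary values, it can always be rewritten with >= and <=. The sketch below, assuming the pubs sample database, shows the two equivalent forms together with NOT BETWEEN, which returns the rows outside the range.

```sql
-- Assumes the pubs sample database.
-- BETWEEN is inclusive, so these two queries return the same rows.
SELECT title, price FROM titles WHERE price BETWEEN 10 AND 20
SELECT title, price FROM titles WHERE price >= 10 AND price <= 20

-- NOT BETWEEN returns the titles priced outside that range.
SELECT title, price FROM titles WHERE price NOT BETWEEN 10 AND 20
```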

Usage of LIKE Keyword

The LIKE keyword is used to search for a specific pattern in a particular column. To view the list of all employees whose names start with the letter A:

Select * from employee where emp_name like 'A%'

The wildcard character '%' matches zero, one or more characters at the specified position. To view the list of all employees whose names end with 'ari':

select * from employee where emp_name like '%ari'

To view the list of all employees who are in the highest scale of their respective grades (M1, S1, etc.):

Select * from employee where grade like '_1'

The wildcard character '_' matches exactly one character at the specified position.

Wildcard Searches For

% Any string of zero or more characters

_ (underscore) Any single character

[] Any single character within the specified range (for example, [a-f]) or the specified set (for example, [abcdef])




[^] Any single character not within the specified range (for example, [^a-f]) or the specified set (for example, [^abcdef])
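The bracket wildcards in the table above have no example in the text. A short sketch, assuming the authors table of the pubs sample database (the letter ranges are arbitrary choices):

```sql
-- Assumes the pubs sample database.
-- [] matches one character from a range: last names starting with C through G.
SELECT au_lname FROM authors WHERE au_lname LIKE '[C-G]%'

-- [^] matches one character outside a set: last names not starting with S or W.
SELECT au_lname FROM authors WHERE au_lname LIKE '[^SW]%'
```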

The ORDER BY Clause

When the data must be sorted in a particular order, we can use the ORDER BY clause. By default this clause sorts the specified column in ascending order, unless DESC is specified.

USE pubs
SELECT au_fname, au_lname, phone AS Telephone, city, state
FROM authors
ORDER BY au_lname ASC, au_fname ASC

The above query sorts the data by the author's last name and then by first name.

Example: To sort the book details in descending order of price, the following query needs to be written

Select 'Book'=title, 'Category'=type, 'Rate'=price
From titles
ORDER BY price DESC

The GROUP BY Clause

SQL Server provides a method of grouping the result set using the GROUP BY clause. The GROUP BY clause summarizes the result set into the groups defined in the query using aggregate functions. The HAVING clause further restricts the result set, to produce the data based on a condition. A GROUP BY clause can include an expression as long as it does not include any aggregate function. More than one column can be used in the GROUP BY clause to nest groups. The following example returns the minimum and maximum prices of the different types of books, considering only books whose price is greater than $10.

SELECT type, 'Minimum'=min(price), 'Maximum'=max(price)
From titles
Where price > 10
Group By type

The aggregate functions specified in the SELECT list calculate the summary values for each group.

CUBE

Specifies that, in addition to the usual rows provided by GROUP BY, summary rows are introduced into the result set. A GROUP BY summary row is returned for every possible combination of group and subgroup in the result set. A GROUP BY summary row is displayed as NULL in the result, but is used to indicate all values. Use the GROUPING function to determine whether null values in the result set are GROUP BY summary values. The number of summary rows in the result set is determined by the number of columns included in the GROUP BY clause. Each operand (column) in the GROUP BY clause is bound under the grouping NULL and grouping is applied to all other operands (columns). Because CUBE returns every possible combination of group and subgroup, the number of rows is the same, regardless of the order in which the grouping columns are specified.
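SQL Server's GROUPING function distinguishes the summary rows described above from rows whose grouping column genuinely holds NULL. The following is a minimal sketch that assumes the employee table (dept_code, salary) used throughout this chapter; the column alias 'Is Summary' is an illustrative name.

```sql
-- Assumes the employee table (dept_code, salary) used in this chapter.
-- GROUPING(dept_code) returns 1 on the summary row added by CUBE and 0 on
-- ordinary group rows, which tells the summary NULL apart from a stored NULL.
SELECT dept_code,
       SUM(salary) AS 'Total Salary',
       GROUPING(dept_code) AS 'Is Summary'
FROM employee
GROUP BY dept_code WITH CUBE
```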




CUBE explodes data to produce a result set containing a superset of groups, with a cross-tabulation of every column to the value of every other column, as well as a special super-aggregate value that can be thought of as meaning ALL VALUES.

select dept_code, sum (salary) as 'Total Salary'
from employee
group by dept_code with cube

The result set is as follows:

dept_code    Total Salary
------------ ---------------
COMM         600
FIN          17400
GEN          16000
HRD          18100
MKTG         38500
SECY         4800
SOFT         45500
NULL         146300

ROLLUP

Specifies that, in addition to the usual rows provided by GROUP BY, summary rows are introduced into the result set. Groups are summarized in a hierarchical order, from the lowest level in the group to the highest. The group hierarchy is determined by the order in which the grouping columns are specified. Changing the order of the grouping columns can affect the number of rows produced in the result set.

Important: Distinct aggregates, for example, AVG(DISTINCT column_name), COUNT(DISTINCT column_name), and SUM(DISTINCT column_name), are not supported when using CUBE or ROLLUP. If used, SQL Server returns an error message and cancels the query.

The HAVING Clause

The HAVING keyword in the SELECT query can be used to select rows from the intermediate result set. The GROUP BY clause treats the HAVING clause in the same way as the SELECT statement treats the WHERE clause. First, the WHERE clause eliminates all the records that do not meet its condition. The GROUP BY clause then collects the rows that match the condition and summarizes them to produce a single value for each group. Finally, the HAVING clause eliminates all the groups that do not match its conditions. The following example displays the book types along with their average prices, considering only titles priced above $10 and keeping only the types whose group average is greater than $15.

Select type, 'Average Price'=avg(price)
From Titles
Where Price > 10
Group by type
Having avg(price) > 15

As a final example, we shall build a query consisting of SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses.




Considering only those employees who are more than 30 years of age, get the department-wise total of salary for only those departments where the total salary is more than 10000, and print this list in descending order of the total salaries.

select dept_code, sum (salary)
from employee
where age > 30
group by dept_code
having sum (salary) > 10000
order by sum (salary) desc;

The UNION Clause

Combines the results of two or more queries into a single result set consisting of all the rows belonging to all queries in the union. This is different from using joins, which combine columns from two tables. Two basic rules for combining the result sets of two queries with UNION are:

• The number and the order of the columns must be identical in all queries
• The data types must be compatible

select * from employee
union
select * from emp

UNION ALL

Specifies that multiple result sets are to be combined and returned as a single result set. It incorporates all rows into the results, including duplicates. If ALL is not specified, duplicate rows are removed.

select * from employee
union all
select * from emp

COMPUTE Clause

Generates totals that appear as additional summary columns at the end of the result set. When used with BY, the COMPUTE clause generates control-breaks and subtotals in the result set. You can specify COMPUTE BY and COMPUTE in the same query.

Example

SELECT * FROM EMPLOYEE COMPUTE SUM(SALARY)

This example calculates the sum of the prices (for prices over $10) for each type of cookbook, in order first by type of book and then by price of book.

SELECT type, price
FROM titles
WHERE price > $10 AND type LIKE '%cook'
ORDER BY type, price
COMPUTE SUM(price) BY type

This example orders the table by dept_code and sums the salary by dept_code:




select * from employee
order by dept_code
compute sum(salary) by dept_code

TOP

The TOP clause limits the number of rows returned in the result set.

Syntax: TOP n [PERCENT]

where n specifies how many rows are to be returned if PERCENT is not specified. If PERCENT is specified, n is the percentage of rows to be returned.

QUERY 1

SELECT TOP 5 title_id, price, type
FROM titles

title_id  price     type
--------- --------- -------------
BU1032    19.9900   business
BU1111    11.9500   business
BU2075    2.9900    business
BU7832    19.9900   business
MC2222    19.9900   mod_cook

QUERY 2

SELECT TOP 5 title_id, price, type
FROM titles
ORDER BY price DESC

title_id  price     type
--------- --------- -------------
PC1035    22.9500   popular_comp
PS1372    21.5900   psychology
TC3218    20.9500   trad_cook
PC8888    20.0000   popular_comp
BU1032    19.9900   business

Notice that simply including TOP 5 in the SELECT list is equivalent to setting ROWCOUNT to 5. SQL Server just stops returning rows after the first five. In the example above, however, you should note that other books exist with a price of $19.99, but because we wanted only five, we didn't see them. Does this output really answer the question, "What are the five highest priced books?" If you answered no and really want to see all the books with the same price as the ones listed, you can use the WITH TIES option. This option is only allowable if your SELECT statement includes an ORDER BY:

SELECT TOP 5 WITH TIES title_id, price, type
FROM titles
ORDER BY price DESC

This returns the following eight rows:

title_id  price     type
--------- --------- -------------
PC1035    22.9500   popular_comp
PS1372    21.5900   psychology
TC3218    20.9500   trad_cook
PC8888    20.0000   popular_comp
BU1032    19.9900   business
BU7832    19.9900   business
MC2222    19.9900   mod_cook
PS3333    19.9900   psychology

If we want to see a certain fraction of the rows, we can use TOP with PERCENT, which will round up to the nearest integer number of rows:

SELECT TOP 30 PERCENT title_id, price, type
FROM titles
ORDER BY price DESC
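The text compares TOP 5 with setting ROWCOUNT to 5. A minimal sketch of that comparison, assuming the pubs sample database, might look like this:

```sql
-- Assumes the pubs sample database.
-- Limit the session to five rows, run the query, then remove the limit.
SET ROWCOUNT 5
SELECT title_id, price, type FROM titles
SET ROWCOUNT 0   -- 0 turns the row limit off again

-- The same request expressed with TOP, which applies to this statement only.
SELECT TOP 5 title_id, price, type FROM titles
```

SET ROWCOUNT stays in effect for the connection until it is reset, which is one reason TOP is usually the more convenient choice for a single query.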




Summary

• SQL is used to retrieve data and to add or modify the contents of the various components of SQL Server.

• The normal SELECT query used to retrieve data from a table consists of the SELECT and FROM clauses
o To retrieve all the rows and columns of the table, * is used
o The DISTINCT keyword is used to retrieve only unique values
o The TOP keyword is used to display only a certain number of topmost rows of a given result set

• For conditional retrieval of data, the WHERE clause can be used

• Using logical operators and the LIKE, BETWEEN and IN keywords, complex queries can be written to retrieve the data effectively

• The ORDER BY clause is used to sort data
o COMPUTE and COMPUTE BY can be used to present the grand total or the subtotals, respectively, of any particular column

• The GROUP BY clause is used to present summarized data by grouping rows based on a specified column

• Conditions can be applied to the groups generated by the GROUP BY clause using the HAVING clause

• CUBE and ROLLUP can be used with the GROUP BY clause




REVIEW QUESTIONS

1. Which SQL statement is used to extract data from a database?

2. How would you retrieve the Pub_name and state columns from the publishers table present in the pubs database?

3. Display the names of all books which have the word 'cook' present in their category. [Hint: The category of the book is present in the type column; use the LIKE keyword]

4. Display the 3 most expensive books from the titles table.

5. Generate a report which will print the maximum price, minimum price, average price and total number of books in every category. [Hint: The Titles table from the Pubs database would be able to give this information]

6. Display the book name, price and the book's category along with the sum total price of all books in every category.

7. Display the minimum price of the books in every category in descending order.




Chapter 5: Tables

Objectives

• Creation of Tables

• Data types in SQL

• Modifying Table Structure

• Delete and Truncate




Tables are database objects that contain all the data in a database. A table definition is a collection of columns. In tables, data is organized in a row-and-column format similar to a spreadsheet. Each row represents a unique record, and each column represents a field within a record. For example, a table containing employee data for a company can contain a row for each employee and columns representing employee information such as employee number, name, address, job title and home phone number.

Designing Tables

When you design a database, you decide what tables you need, what type of data goes in each table, who can access each table, and so on. As you create and work with tables, you continue to make more detailed decisions about them. The most efficient way to create a table is to define everything you need in the table at one time, including its data restrictions and additional components. However, you can also create a basic table, add some data to it, and then work with it for a while. This approach gives you a chance to see what types of transactions are most common and what types of data are frequently entered before you commit to a firm design by adding constraints, indexes, defaults, rules and other objects.

It is a good idea to outline your plans on paper before creating a table and its objects. Decisions that must be made include:

• Types of data the table will contain.
• Columns in the table and the data type (and length, if required) for each column.
• Whether columns accept null values.
• Whether and where to use constraints or defaults and rules.
• Types of indexes needed, where required, and which columns are primary keys and which are foreign keys.

Creating and Modifying a Table

After you have designed the database, the tables that will store the data in the database can be created. The data is usually stored in permanent tables. Tables are stored in the database files until they are deleted and are available to any user who has the appropriate permissions.

Temporary Tables

You can also create temporary tables. Temporary tables are similar to permanent tables, except that temporary tables are stored in tempdb and are deleted automatically when no longer in use. The two types of temporary tables, local and global, differ from each other in their names, their visibility and their availability. Local temporary tables have a single number sign (#) as the first character of their names; they are visible only to the current connection for the user; and they are deleted when the user disconnects from the instance of Microsoft SQL Server™ 2000. Global temporary tables have two number signs (##) as the first characters of their names; they are visible to any user after they are created; and they are deleted when all users referencing the table disconnect from SQL Server.

For example, if you create a table named Employees, the table can be used by any person who has the security permissions in the database to use it, until the table is deleted. If you create a local temporary table named #employees, you are the only person who can work with the table, and it is deleted when you disconnect. If you create a global temporary table named ##employees, any user in the database can work with this table. If no other user works with this table after you create it, the table is deleted when you disconnect. If another user works with the table after you create it, SQL Server deletes it when both of you disconnect.
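The naming rules for the two kinds of temporary table can be sketched as follows. The column definitions and the inserted row are assumptions for illustration; only the # and ## prefixes come from the text.

```sql
-- Column definitions are illustrative; only the # and ## prefixes matter here.
-- Local temporary table: visible only to the connection that creates it.
CREATE TABLE #employees (emp_id int, emp_name varchar(40))
INSERT INTO #employees VALUES (1, 'Mark')
SELECT * FROM #employees

-- Global temporary table: visible to every connection while it exists.
CREATE TABLE ##employees (emp_id int, emp_name varchar(40))

-- Both tables live in tempdb. They can be dropped explicitly instead of
-- waiting for the connections to close.
DROP TABLE #employees
DROP TABLE ##employees
```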




Table Properties

You can define up to 1,024 columns per table. Table and column names must follow the rules for identifiers; column names must be unique within a given table, but you can use the same column names in different tables in the same database. You must also define a data type for each column. The number of rows and the total size of the table are limited only by the available storage; the maximum number of bytes per row is 8,060. Although table names must be unique for each owner within a database, you can create multiple tables with the same name if you specify a different owner for each. For example, you can create two tables named employees and designate Jonah as the owner of one and Sally as the owner of the other. When you need to work with one of the employees tables, you can distinguish between the two by specifying the owner with the name of the table.
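The Jonah and Sally scenario could be written as in the sketch below. It assumes that database users named Jonah and Sally already exist and that the statements are run by a user who is allowed to create objects on behalf of other owners (for example, the database owner); the column definitions are illustrative.

```sql
-- Sketch only: assumes users Jonah and Sally exist in the current database
-- and that the caller may create objects for other owners.
CREATE TABLE Jonah.employees (emp_id int, emp_name varchar(40))
CREATE TABLE Sally.employees (emp_id int, emp_name varchar(40))

-- The owner prefix selects which of the two tables is queried.
SELECT * FROM Jonah.employees
SELECT * FROM Sally.employees
```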

CREATE TABLE

In its simplest form, the SQL (DDL) statement used to create a database table has the following syntax:

CREATE TABLE table-name ( column-definition-list ) ;

Data Types

Microsoft SQL Server supports a wide range of data types. Some of the important data types are described in the following table:

Datatype       Range
-------------  --------------------------------------------------
bigint         -2^63 to 2^63 - 1
int            -2^31 to 2^31 - 1
smallint       -2^15 to 2^15 - 1
tinyint        0 to 255
money          -922,337,203,685,477.5808 to 922,337,203,685,477.5807
smallmoney     -214,748.3648 to 214,748.3647
datetime       January 1, 1753 to December 31, 9999
smalldatetime  January 1, 1900 to June 6, 2079
bit            Either 0 or 1
decimal        -10^38 + 1 to 10^38 - 1
float          -1.79E + 308 through 1.79E + 308
char           Fixed-length non-Unicode character data with a maximum length of 8,000 characters
varchar        Variable-length non-Unicode data with a maximum of 8,000 characters
text           Variable-length non-Unicode data with a maximum length of 2^31 - 1 characters
nchar          Fixed-length Unicode data with a maximum length of 4,000 characters
nvarchar       Variable-length Unicode data with a maximum length of 4,000 characters
ntext          Variable-length Unicode data with a maximum length of 2^30 - 1 characters




Examples:

Creation of a simple table

create table Department
(
dept_id int,
dept_name varchar(20),
dept_desc varchar(100)
)

Note the following about the above code:

• The entire code given above can be entered in a single line or in multiple lines. The advantage of splitting the above code in multiple lines is better readability.

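• SQL keywords are not case sensitive. Whether object names and character data are compared case-sensitively depends on the collation in use.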

The command begins with the keywords "create table", followed by the name of the table to be created. The table name can be up to 128 characters in length. The list of columns in the table follows, within a pair of parentheses. Each column definition has two mandatory components: the column name and its data type. There may be additional clauses, but they are all optional and will be covered in detail later. The column definitions are separated by commas.

Once this statement has been executed in the Query Analyzer, the table is created and we can insert values into it. Please note that all the columns of this table are nullable by default. To ensure that columns such as dept_id and dept_name do not contain NULL values (NULLs are unknown values), we have to use the NOT NULL keyword during table creation. The table has to be created as

create table Department
(
dept_id int not null,
dept_name varchar(20) not null,
dept_desc varchar(100)
)

To ensure that the table has been created successfully and to check its structure, we can use the following stored procedure

To View Structure of a Table

sp_help <table name>

For example, to view the structure of the Department table which has just been created, we can use

sp_help Department
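As an alternative to sp_help, the column definitions can also be read with an ordinary query against the INFORMATION_SCHEMA.COLUMNS view. This is offered only as a sketch and assumes the Department table exists in the current database.

```sql
-- Assumes the Department table exists in the current database.
-- Lists each column with its data type, length and nullability.
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Department'
ORDER BY ORDINAL_POSITION
```

The IS_NULLABLE column makes it easy to confirm that dept_id and dept_name were created as NOT NULL.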




To insert values into the table

insert into Department values (1, 'Finance', 'Accounts Department')

You can check if these values have been successfully inserted into the table by using a select query as follows

select * from Department

To insert partial values into the Table

Since it is not compulsory to give a value for dept_desc, it is possible to supply values for only the first two columns, as given below

insert into Department (dept_id, dept_name) values (2, 'Training')

The above row would be added to the table, and the dept_desc column would have a NULL value.

IDENTITY Keyword

The IDENTITY keyword can be used while creating a table to indicate that whenever a new row is added to the table, Microsoft® SQL Server™ has to provide a unique, incremental value for the column. Identity columns are commonly used in conjunction with PRIMARY KEY constraints to serve as the unique row identifier for the table. The IDENTITY property can be assigned to tinyint, smallint, int, bigint, and decimal numeric columns. Only one identity column can be created per table. Bound defaults and DEFAULT constraints cannot be used with an identity column.

The syntax for the identity keyword is IDENTITY (seed, increment). You must specify both the seed and increment or neither. If neither is specified, the default is (1,1).

seed - is the value used for the very first row loaded into the table.
increment - is the incremental value added to the identity value of the previous row loaded.

Example of using the IDENTITY Keyword

create table Student
(
stud_id int identity(100,1),
stud_fname varchar(20) not null,
stud_mname varchar(20),
stud_lname varchar(20) not null,
address varchar(100) not null,
tel_no varchar(10)
)

Note:

1. The value for stud_id, would be automatically given by SQL Server, whenever a new row is inserted. The first row added would have a value 100 and for every other row, it would be incremented by one from the last value.

2. It is not mandatory for stud_mname (the student's middle name) and tel_no (the telephone number) to be entered; these columns accept NULL values. The address column is declared NOT NULL, so a value must be supplied for it.

To add values to the Student table,

insert into Student(stud_fname, stud_lname, address) values ('Mark','Robinson','New York')

To check if the values have been added to the table, you can issue a select statement as follows,

Select * from Student

It would appear as follows

stud_id    stud_fname       stud_mname       stud_lname      address    tel_no
---------- ---------------- ---------------- --------------- ---------- ---------
100        Mark             NULL             Robinson        New York   NULL

(1 row(s) affected)
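To see the increment at work, a second row can be added and the generated value read back. The sketch below assumes the Student table already holds the single row shown above; the second student's details are invented for illustration.

```sql
-- Assumes the Student table already holds the one row for Mark Robinson;
-- the second student's details are made up.
insert into Student(stud_fname, stud_lname, address)
values ('Anna', 'Lee', 'Boston')

-- SCOPE_IDENTITY() returns the last identity value generated in this scope;
-- with seed 100 and increment 1, the second row is expected to receive 101.
SELECT SCOPE_IDENTITY() AS new_stud_id

Select stud_id, stud_fname, stud_lname from Student
```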

Inserting data into a table from already existing table

The insert command can also insert data taken from an already existing table. For example, to insert data into table emp1 from employee, assuming that the two tables have the same structure, use the following command (remember that the emp1 table must already be present in the database):

insert emp1 select * from employee

or, selecting only a few columns:

insert emp1 (emp_code, emp_name) select emp_code, emp_name from employee

Create Table with SELECT INTO

Before creating a table with SELECT INTO, you must set the 'select into/bulkcopy' database option to true by executing this command:

EXEC sp_dboption 'pubs', 'select into/bulkcopy', 'true'

USE pubs
SELECT * INTO newtitles
FROM titles
WHERE price > 25 OR price < 20

Create a table with just the structure of another table

This example creates the employee table and uses GETDATE() (a function that returns the current date and time) as the default value for the employee hire date.

CREATE TABLE employee
(
emp_id char(11) NOT NULL,
emp_lname varchar(40) NOT NULL,
emp_fname varchar(20) NOT NULL,
emp_hire_date datetime DEFAULT GETDATE(),
DEPT_CODE char(5) NOT NULL,
GRADE char(5) NOT NULL,
SALARY money NOT NULL,
EMP_MGR VARCHAR(30)
)
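The effect of the DEFAULT GETDATE() clause can be checked by inserting a row that leaves out emp_hire_date. This sketch assumes the employee table has been created exactly as defined above; every inserted value is invented for illustration.

```sql
-- Assumes the employee table defined above; all values are made up.
-- emp_hire_date is left out of the column list, so the DEFAULT applies.
INSERT INTO employee (emp_id, emp_lname, emp_fname, DEPT_CODE, GRADE, SALARY)
VALUES ('E001', 'Robinson', 'Mark', 'MKTG', 'M1', 4000)

-- emp_hire_date should show the date and time at which the row was inserted.
SELECT emp_id, emp_lname, emp_hire_date FROM employee WHERE emp_id = 'E001'
```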




The following command copies just the structure of the table, because the WHERE condition 1=2 is never true and so no rows are selected.

SELECT * INTO newemp
FROM employee
Where 1=2

Dropping a Table

The DROP TABLE command is used for dropping the structure and data of a table. For example,

drop table department

The DROP TABLE command allows you to drop one or many tables in a single command. You can drop multiple tables in one statement as long as their names are separated by commas. The command assumes that you wish to drop a table in the local database unless you specify another database or a table owner. Here's an example of dropping multiple tables with one command.

DROP TABLE pubs.dbo.distributors, new_sales, pubs.sales_analysis

Modify Table Structure

The ALTER TABLE command can be used to modify the structure of a table by adding or removing columns. For example, to add a new column to the employee table:

alter table employee add new_col char(4)

To drop a column from the employee table:

alter table employee drop column new_col

Modify an already existing column

You can change the data type of an existing column only if the column is empty or all of its existing values can be converted to the new data type.

Altering a column which is empty succeeds:

alter table employee alter column grade int

Altering a column which is not empty, and whose values (for example, grades such as 'M1') cannot be converted to the new type, makes the same statement fail with an error:

alter table employee alter column grade int

Modifying Data in a Table

The UPDATE statement is used to modify existing rows in a table. For example, to increase the salary for all employees by 100:



    update employee set salary=salary+100

To increase the salary of a particular employee by 200:

    update employee set salary=salary+200 where emp_id='100'

Deleting Data in a Table

The DELETE statement is used to delete one or more rows from a table. For example, to delete the data for the employee whose last name is Greg:

    delete from employee where emp_lname='Greg'

To delete data for all employees from marketing:

    delete from employee where dept_code='MKTG'

To delete all rows from the employee table:

    delete from employee

Truncate Command

Alternatively, you can use the TRUNCATE TABLE command if all the data in the employee table needs to be removed. The difference between DELETE and TRUNCATE is that TRUNCATE is faster, because it does not log each deleted row in the transaction log; it records only the deallocation of the table's data pages. Even though TRUNCATE is faster, it is riskier, because the removed rows are not individually logged and are therefore harder to recover than rows removed with DELETE.

EXAMPLE:

    TRUNCATE TABLE employee

Creation of Temporary Tables

Tables created with names prefixed by the # symbol are temporary tables. These tables are stored in the temporary database (tempdb) and exist as long as the user's connection is active.

Local temporary tables are visible only to the current session; both the table data and the table definition are deleted when the user logs off.

    CREATE TABLE #tmp_customer_order_totals
    (
        customer_name  VARCHAR(30),
        customer_total MONEY
    )

Global temporary tables are visible to all users and are destroyed after every user who was referencing the table disconnects from SQL Server. An example of a global temporary table is given below.

    CREATE TABLE ##tmp_customer_order_totals
    (
        customer_name  VARCHAR(30),
        customer_total MONEY
    )

Creation of User Defined Datatypes

SQL Server enables the creation of user-defined datatypes. User-defined datatypes are constructed over an underlying system datatype, with possible rules and/or defaults bound to them. Let's illustrate user-defined datatypes with a mini case study: assume that we want to store an employee's social security number (SSN) as the primary key for each record in the employee table.

Since SSNs are 9-digit strings of integer numbers, we could create a column called emp_id with a datatype of INT. Using an INT for such a number has one big problem: INT does not retain leading zeros. Thus an employee with an SSN of 001-23-4567 would be stored as 1234567 in an INT column instead of 001234567, a clear error. So we must choose a CHAR(9) or VARCHAR(9) datatype so that we can retain all leading zeros. In this case, CHAR is superior to VARCHAR for two reasons. First, the column will always contain a 9-digit string. Second, the column is the primary key of the table and thus NOT NULL. These behaviors play into the strength of the CHAR datatype.

We can now create our own user-defined datatype called empid as a CHAR(9) field using the system stored procedure sp_addtype. This procedure creates a user-defined datatype by adding a descriptive record to the systypes system table.

    EXEC sp_addtype empid, 'char(9)', 'NOT NULL'

As another example, perhaps your database will store various phone numbers in many tables. Although no single, definitive way exists to store phone numbers, consistency is important in this database. You can create a phone_number user-defined datatype (UDDT) and use it for any column in any table that keeps track of phone numbers, to ensure that they all use the same datatype.
Here's how to create this UDDT:

    EXEC sp_addtype phone_number, 'varchar(20)', 'not null'

And here's how to use the new UDDT when creating a table:

    CREATE TABLE customer
    (
        cust_id      smallint     NOT NULL,
        cust_name    varchar(50)  NOT NULL,
        cust_addr1   varchar(50)  NOT NULL,
        cust_addr2   varchar(50)  NOT NULL,
        cust_city    varchar(50)  NOT NULL,
        cust_state   char(2)      NOT NULL,
        cust_zip     varchar(10)  NOT NULL,
        cust_phone   phone_number,
        cust_fax     varchar(20)  NOT NULL,
        cust_email   varchar(30)  NOT NULL,
        cust_web_url varchar(20)  NOT NULL
    )

Dropping a user-defined datatype is an even easier command than the one that creates it. Just follow this syntax:



    EXEC sp_droptype user_datatype_name

The only restriction on using sp_droptype is that a user-defined datatype cannot be dropped while a table or other database object references it. You must first drop any database objects that reference the user-defined datatype before dropping the datatype itself.
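The restriction on sp_droptype can be seen in a short sequence. This is a minimal sketch, not taken from the guide: the type name zip_code and the table name branch_office are hypothetical, chosen only for illustration, and the script assumes it is run in the pubs sample database.

```sql
-- Hypothetical names (zip_code, branch_office) used only for illustration.
USE pubs
GO
-- 1. Create the user-defined datatype over a system datatype.
EXEC sp_addtype zip_code, 'char(5)', 'NOT NULL'
GO
-- 2. Use the new datatype in a table definition.
CREATE TABLE branch_office
(
    office_id  int NOT NULL,
    office_zip zip_code
)
GO
-- 3. This call fails, because branch_office still references the datatype.
EXEC sp_droptype zip_code
GO
-- 4. Drop the referencing table first; the datatype can then be dropped.
DROP TABLE branch_office
GO
EXEC sp_droptype zip_code
GO
```

Each step is placed in its own batch (separated by GO) so that the expected failure in step 3 does not prevent the later statements from running.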



Summary

• A table is an object in a database that stores data as a collection of rows and columns

• Data types are attributes that specify what type of information can be stored in a column.

• SQL Server has two types of data types
  o System defined
  o User defined

• Values can be added to a table using the INSERT command

• For unique values in a particular column, the IDENTITY property can be used

• When a table is deleted using the DROP TABLE command, its entire structure is removed from the server

• The data from a table can be removed using either the DELETE command or the TRUNCATE TABLE command



Review Questions

1. What is the SQL statement used to create a table?

2. How would you create a table with the following structure?

   Attribute   Data type     Nullability
   stud_id     int           Not Null, Identity
   stud_name   varchar(20)   Not Null
   stud_addr   varchar(50)   Not Null
   city        varchar(10)   Not Null
   phone_no    varchar(10)   Null
   mobile      varchar(10)   Null

3. Add the following values into the table:

   Stud_id  Stud_name  Stud_addr                 City         Phone_no    Mobile
   1        Mark       Street 10, Bldg 5         Los Angeles  0923847465  NULL
   2        Garry      Bldg 3, Flat 9, Street 3  New York     2384755     987022122

4. Modify the mobile number of Mark to 9438457.

5. Delete the entry of Garry from the table.

6. Differentiate between the DELETE and TRUNCATE commands.



Chapter 6: In-Built Functions

Objectives

• Understand the need for Functions

• Learn about Aggregate Functions

• Mathematical Functions

• String Functions

• Date Functions

Aggregate Function



Aggregate functions (sometimes referred to as set functions) allow you to summarize a column of output. SQL Server provides six general aggregate functions.

Aggregate Function Description

AVG(expression) Returns the average (mean) of all the values, or only the DISTINCT values, in the expression. You can use AVG with numeric columns only. Null values are ignored.

COUNT(expression) Returns the number of non-null values in the expression. When DISTINCT is specified, COUNT finds the number of unique non-null values. You can use COUNT with both numeric and character columns. Null values are ignored.

COUNT(*) Returns the number of rows. COUNT(*) takes no parameters and can't be used with DISTINCT. All rows are counted, even those with null values.

MAX(expression) Returns the maximum value in the expression. You can use MAX with numeric, character, and datetime columns, but not with bit columns. With character columns, MAX finds the highest value in the collating sequence. MAX ignores any null values. DISTINCT is available for ANSI compatibility, but it's not meaningful with MAX.

MIN(expression) Returns the minimum value in the expression. You can use MIN with numeric, character, and datetime columns, but not with bit columns. With character columns, MIN finds the value that is lowest in the sort sequence. MIN ignores any null values. DISTINCT is available for ANSI compatibility, but it's not meaningful with MIN.

SUM(expression) Returns the sum of all the values, or only the DISTINCT values, in the expression. You can use SUM with numeric columns only. Null values are ignored.

You can increase the power of aggregate functions by allowing them to be grouped and by allowing the group to have criteria established for inclusion via the HAVING clause.
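As a minimal sketch of grouping with a HAVING condition, the query below summarizes the titles table of the pubs sample database by book type. The threshold of 15 is an arbitrary value chosen for illustration, not something the guide prescribes.

```sql
USE pubs
GO
-- Average price and number of titles per book type,
-- keeping only the types whose average price exceeds 15
-- (15 is an arbitrary illustrative threshold).
SELECT type,
       'AVERAGE PRICE' = AVG(price),
       'TITLES'        = COUNT(*)
FROM titles
GROUP BY type
HAVING AVG(price) > 15
```

GROUP BY forms one group per distinct value of type, the aggregates are computed once per group, and HAVING then filters whole groups in the same way that WHERE filters individual rows.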

The following queries provide some examples.

Example 1:

    Use pubs
    select 'MAXIMUM RATE'=max(price), 'MINIMUM RATE'=min(price) from titles

The output would be as follows:

    MAXIMUM RATE          MINIMUM RATE
    --------------------- ---------------------
    22.9500               2.9900

Example 2: To view the average price of the books present in the table:

    select 'AVERAGE PRICE OF BOOKS'=avg(price) from titles

The output is as follows:

    AVERAGE PRICE OF BOOKS
    ----------------------
    14.7662

Example 3: To see the number of authors present in California:

    select 'Number of Authors'=count(*) from authors where state='CA'

The output is as follows:

    Number of Authors
    -----------------------
    15

Example 4: To count the rows in the employee table:

    select count (*) from employee;

Example 5: Find out the total salary paid to all employees.

    select sum (salary) from employee

To find all the information for the employee earning the minimum salary, we could write:

    select * from employee where salary = ( select min (salary) from employee);

Date and Time Functions

These scalar functions perform an operation on a date and time input value and return either a string, numeric, or datetime value. Many of these functions take a datepart argument, whose values are given below.

    Datepart      Abbreviations
    year          yy, yyyy
    quarter       qq, q
    month         mm, m
    dayofyear     dy, y
    day           dd, d
    week          wk, ww
    weekday       dw
    hour          hh
    minute        mi, n
    second        ss, s
    millisecond   ms

The various Date and Time functions are given below.

Function

DATEADD

DATEDIFF

DATENAME

DATEPART

DAY

GETDATE

DATEADD

Returns a new datetime value based on adding an interval to the specified date.

Syntax

    DATEADD(datepart, number, date)

Arguments

datepart
Is the parameter that specifies on which part of the date to return a new value. The table above lists the dateparts and abbreviations recognized by SQL Server.

Example

    select 'DATEADD'=dateadd(mm,3,'2008/03/05')

Note: The date is in YYYY/MM/DD format.

Output is:

    DATEADD
    -----------------------
    2008-06-05 00:00:00.000

DATEDIFF

Returns the number of date and time boundaries crossed between two specified dates.

Syntax

    DATEDIFF(datepart, startdate, enddate)

Example

    SELECT DATEDIFF(day, '2008/01/01', '2008/01/20') AS no_of_days

The output is:

    no_of_days
    -----------
    19
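Because DATEDIFF counts boundaries crossed rather than elapsed time, two dates that are only one day apart can still differ by a whole year. The following sketch, using arbitrary illustrative dates, shows the effect.

```sql
-- Dates are arbitrary illustrative values in YYYY/MM/DD format.
SELECT DATEDIFF(year,  '2007/12/31', '2008/01/01') AS years_crossed,   -- 1: one year boundary lies between the dates
       DATEDIFF(day,   '2007/12/31', '2008/01/01') AS days_crossed,    -- 1: one midnight lies between the dates
       DATEDIFF(month, '2008/01/01', '2008/01/31') AS months_crossed   -- 0: both dates fall in the same month
```

The first column is 1 even though only a single day separates the dates, so DATEDIFF(year, ...) should not be read as "full years elapsed".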

DATENAME

Returns a character string representing the specified datepart of the specified date.

Syntax

    DATENAME(datepart, date)

Example

    SELECT DATENAME(month, '2008/01/01') AS month_name

The output is:

    month_name
    ------------------------------
    January

DATEPART

Returns an integer representing the specified datepart of the specified date.

Syntax

    DATEPART(datepart, date)

Example

    SELECT DATEPART(year, '2008/01/01') AS year

The output is:

    year
    --------
    2008
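The difference between DATENAME and DATEPART is easiest to see when both are applied to the same datepart. This is a small sketch with an arbitrary date; the month name shown in the comment assumes the session language is us_english.

```sql
-- Same date and datepart, two different return types.
SELECT DATENAME(month, '2008/01/01') AS month_name,    -- January (a character string)
       DATEPART(month, '2008/01/01') AS month_number   -- 1 (an integer)
```

Use DATENAME when the value is meant for display and DATEPART when it is meant for arithmetic or comparison.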

DAY

Returns an integer representing the day datepart of the specified date.

Syntax

    DAY(date)

Arguments

date
Is an expression of type datetime or smalldatetime.

Return Type

int

Remarks

This function is equivalent to DATEPART(dd, date).

Examples

This example returns the number of the day from the date 03/12/1998.

    SELECT DAY('03/12/1998') AS 'DAY Number'

MONTH

Returns an integer that represents the month part of a specified date.

Syntax

    MONTH(date)

Arguments

date
Is an expression returning a datetime or smalldatetime value, or a character string in a date format. Use the datetime data type only for dates after January 1, 1753.

Return Types

int

Remarks

MONTH is equivalent to DATEPART(mm, date).

YEAR

Returns an integer that represents the year part of a specified date.

Syntax

    YEAR(date)

Remarks

This function is equivalent to DATEPART(yy, date).

Examples

This example returns the number of the year from the date 03/12/1998.

    SELECT 'Year Number' = YEAR('03/12/1998')

CAST and CONVERT

Explicitly converts an expression of one data type to another. CAST and CONVERT provide similar functionality.

Syntax

Using CAST:

    CAST(expression AS data_type)

Using CONVERT:

    CONVERT(data_type[(length)], expression [, style])

Examples

Both of the following queries return the titles whose year-to-date sales figure begins with the digit 3.

    SELECT SUBSTRING(title, 1, 30) AS Title, ytd_sales
    FROM titles
    WHERE CAST(ytd_sales AS char(20)) LIKE '3%'

    SELECT SUBSTRING(title, 1, 30) AS Title, ytd_sales
    FROM titles
    WHERE CONVERT(char(20), ytd_sales) LIKE '3%'
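The optional style argument is what distinguishes CONVERT from CAST. As a brief sketch, the query below formats the current date in three of the standard date styles; the column aliases are illustrative names only.

```sql
-- The third argument of CONVERT selects the output format of the date.
SELECT CONVERT(varchar(10), GETDATE(), 101) AS us_format,       -- mm/dd/yyyy
       CONVERT(varchar(10), GETDATE(), 103) AS british_format,  -- dd/mm/yyyy
       CONVERT(varchar(10), GETDATE(), 112) AS iso_format       -- yyyymmdd
```

CAST has no equivalent of the style argument, so CONVERT is the usual choice when a date must be rendered in a particular format.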

Mathematical Functions (T-SQL)

These scalar functions perform a calculation, usually based on input values provided as arguments, and return a numeric value.

    ABS       DEGREES   RAND
    ACOS      EXP       ROUND
    ASIN      FLOOR     SIGN
    ATAN      LOG       SIN
    ATN2      LOG10     SQUARE
    CEILING   PI        SQRT
    COS       POWER     TAN
    COT       RADIANS

Note: Arithmetic functions, such as ABS, CEILING, DEGREES, FLOOR, POWER, RADIANS, and SIGN, return a value having the same data type as the input value. Trigonometric and other functions, including EXP, LOG, LOG10, SQUARE, and SQRT, cast their input values to float and return a float value.

ABS (T-SQL)

Returns the absolute, positive value of the given numeric expression.

Syntax

    ABS(numeric_expression)

Examples

This example shows the effect of the ABS function on three different numbers.

    SELECT ABS(-1.0), ABS(0.0), ABS(1.0)

Here is the result set:

    ----  ----  ----
    1.0   .0    1.0

SQUARE (T-SQL)

Returns the square of the given expression.

Syntax

    SQUARE(float_expression)

Example

    select square(25)

Here is the result set:

    --------------------------------------------
    625.0

POWER (T-SQL)

Returns the value of the given expression to the specified power.

Syntax

    POWER(numeric_expression, y)

Example

    select power(25,2)

Here is the output:

    -----------
    625

ROUND (T-SQL)

Returns a numeric expression, rounded to the specified length or precision.

Syntax

    ROUND(numeric_expression, length [, function])

Example

    SELECT ROUND(123.9994,3), ROUND(123.9995,3)

Here is the result:

    ------------ ----------
    123.9990     124.0000
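The syntax above also allows a third argument and a negative length, neither of which the example exercises. The following sketch, with arbitrary illustrative values, shows both: a nonzero function argument truncates instead of rounding, and a negative length rounds to the left of the decimal point.

```sql
-- Values are arbitrary and chosen only to make the three behaviors visible.
SELECT ROUND(150.75, 0)    AS rounded,      -- 151.00: normal rounding
       ROUND(150.75, 0, 1) AS truncated,    -- 150.00: nonzero third argument truncates
       ROUND(748.58, -2)   AS to_hundreds   -- 700.00: negative length rounds left of the decimal point
```

Note that the result keeps the scale of the input, which is why the values are shown with two decimal places.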

FLOOR (T-SQL)

Returns the largest integer less than or equal to the given numeric expression.

Syntax

    FLOOR(numeric_expression)

Examples

This example shows positive numeric, negative numeric, and currency values with the FLOOR function.

    SELECT FLOOR(123.45), FLOOR(-123.45), FLOOR($123.45)

The result is the integer portion of the calculated value in the same data type as numeric_expression.

    -----  -----  ---------------
    123    -124   123.0000

CEILING (T-SQL)

A mathematical function that returns the smallest integer greater than or equal to the given numeric expression.

Syntax

    CEILING(numeric_expression)

Example

    SELECT CEILING($123.45), CEILING($-123.45), CEILING($0.0)
    GO

Here is the result set:

    -------  -------  ---------------------------
    124.00   -123.00  0.00

String Functions (T-SQL)

These scalar functions perform an operation on a string input value and return a string or numeric value.

ASCII (T-SQL)

Returns the ASCII code value of the leftmost character of a character expression.

Syntax

    ASCII(character_expression)

CHAR (T-SQL)

A string function that converts an int ASCII code to a character.

Syntax

    CHAR(integer_expression)

    Control character   Value
    Tab                 CHAR(9)
    Line feed           CHAR(10)
    Carriage return     CHAR(13)

CHARINDEX (T-SQL)

Returns the starting position of the specified expression in a character string.

Syntax

    CHARINDEX(expression1, expression2 [, start_location])

Example

    select CHARINDEX('ful','wonderful')

    --------------
    7

    SELECT CHARINDEX('wonderful', notes, 5)
    FROM titles
    WHERE title_id = 'TC3218'

    ------------------
    46

DIFFERENCE (T-SQL)

Returns the difference between the SOUNDEX values of two character expressions as an integer.

Syntax

    DIFFERENCE(character_expression, character_expression)

Example

    SELECT DIFFERENCE('Smithers', 'Smythers')

Here is the result set:

    ---------
    4



    (1 row(s) affected)

    SELECT SOUNDEX('Green'), SOUNDEX('Green'), DIFFERENCE('Green','Green')

    -----  -----  -----------
    G650   G650   4

    SELECT SOUNDEX('Blotchet-Halls'), SOUNDEX('Green'), DIFFERENCE('Blotchet-Halls','Green')

    -----  -----  -----------
    B432   G650   0

LEFT (T-SQL)

Returns the part of a character string starting at a specified number of characters from the left.

Syntax

    LEFT(character_expression, integer_expression)

Arguments

character_expression
Is an expression of character or binary data. character_expression can be a constant, variable, or column. character_expression must be of a data type that can be implicitly converted to varchar. Otherwise, use the CAST function to explicitly convert character_expression.

integer_expression
Is a positive whole number. If integer_expression is negative, a null string is returned.

Examples

This example returns the five leftmost characters of each book title.

    USE pubs
    GO
    SELECT LEFT(title, 5)
    FROM titles
    ORDER BY title_id
    GO

Here is the result set:

    ------
    The B
    Cooki
    You C
    Strai
    Silic
    The G
    The P
    But I
    Secre
    Net E
    Compu
    Is An
    Life
    Prolo
    Emoti
    Onion
    Fifty
    Sushi

    (18 row(s) affected)

Use LEFT with a character string

This example uses LEFT to return the two leftmost characters of the character string abcdefg.

    SELECT LEFT('abcdefg',2)
    GO

Here is the result set:

    --
    ab

    (1 row(s) affected)

LEN (T-SQL)

Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

Syntax

    LEN(string_expression)

Arguments

string_expression
Is the string expression to be evaluated.

Examples

This example selects the number of characters and the data in CompanyName for companies located in Finland.

    USE Northwind
    GO
    SELECT LEN(CompanyName) AS 'Length', CompanyName
    FROM Customers
    WHERE Country = 'Finland'

Here is the result set:

    Length      CompanyName
    ----------- ---------------------
    14          Wartian Herkku
    11          Wilman Kala
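The phrase "characters, rather than bytes, excluding trailing blanks" can be checked by comparing LEN with DATALENGTH, a built-in function this section does not otherwise cover. This is a minimal sketch; the variable name @padded is hypothetical.

```sql
-- @padded is a hypothetical variable holding three letters and three trailing blanks.
DECLARE @padded varchar(20)
SET @padded = 'abc   '
SELECT LEN(@padded)        AS char_count,   -- 3: trailing blanks are not counted
       DATALENGTH(@padded) AS byte_count    -- 6: every stored byte is counted
```

For varchar data each character occupies one byte, so the two results differ here only because of the trailing blanks.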



LOWER (T-SQL)

Returns a character expression after converting uppercase character data to lowercase.

Syntax

    LOWER(character_expression)

Arguments

character_expression
Is an expression of character or binary data. character_expression can be a constant, variable, or column. character_expression must be of a data type that is implicitly convertible to varchar. Otherwise, use CAST to explicitly convert character_expression.

LTRIM (T-SQL)

Returns a character expression after removing leading blanks.

Syntax

    LTRIM(character_expression)

Arguments

character_expression
Is an expression of character or binary data. character_expression can be a constant, variable, or column. character_expression must be of a data type that is implicitly convertible to varchar. Otherwise, use CAST to explicitly convert character_expression.

Examples

This example uses LTRIM to remove leading spaces from a character variable.

    DECLARE @string_to_trim varchar(60)
    SET @string_to_trim = '     Five spaces are at the beginning of this string.'
    SELECT 'Here is the string without the leading spaces: ' + LTRIM(@string_to_trim)

Here is the result set:

    -------------------------------------------------------------
    Here is the string without the leading spaces: Five spaces are at the beginning of this string.

    (1 row(s) affected)

REVERSE (T-SQL)

Returns the reverse of a character expression.

Syntax



    REVERSE(character_expression)

Arguments

character_expression
Is an expression of character data. character_expression can be a constant, variable, or column of either character or binary data.

Examples

This example returns all author first names with the characters reversed.

    USE pubs
    GO
    SELECT REVERSE(au_fname)
    FROM authors
    ORDER BY au_fname

Here is the result set:

    ---------------
    maharbA
    okikA
    treblA
    nnA
    ennA
    truB
    enelrahC
    lyrehC
    naeD
    kriD
    rehtaeH
    sennI
    nosnhoJ
    aiviL
    eirojraM
    rednaeM
    leahciM
    lehciM
    ratsgninroM
    dlanigeR
    lyrehS
    snraetS
    aivlyS

    (23 row(s) affected)

RIGHT (T-SQL)

Returns the part of a character string starting a specified number of characters (integer_expression) from the right.

Syntax

    RIGHT(character_expression, integer_expression)

Arguments



character_expression
Is an expression of character data. character_expression can be a constant, variable, or column of either character or binary data. character_expression must be of a data type that is implicitly convertible to varchar. Otherwise, use CAST to explicitly convert character_expression.

integer_expression
Is the starting position, expressed as a positive whole number. If integer_expression is negative, an error is returned.

Examples

This example returns the five rightmost characters of each author's first name.

    USE pubs
    GO
    SELECT RIGHT(au_fname, 5)
    FROM authors
    ORDER BY au_fname

REPLACE (T-SQL)

Replaces all occurrences of the second given string expression in the first string expression with a third expression.

Syntax

    REPLACE('string_expression1', 'string_expression2', 'string_expression3')

Arguments

'string_expression1'
Is the string expression to be searched for string_expression2. string_expression1 can be of character or binary data.

'string_expression2'
Is the string expression to search for in string_expression1 and to replace with string_expression3. string_expression2 can be of character or binary data.

'string_expression3'
Is the new string expression that replaces string_expression2 in string_expression1. string_expression3 can be of character or binary data.

Return Types

Returns character data if string_expression (1, 2, or 3) is one of the supported character data types. Returns binary data if string_expression (1, 2, or 3) is one of the supported binary data types.

Examples

This example replaces the string cde in abcdefghicde with xxx.



    SELECT REPLACE('abcdefghicde','cde','xxx')

Here is the result set:

    --------------------------------------------------------------
    abxxxfghixxx

    (1 row(s) affected)

RTRIM (T-SQL)

Returns a character string after removing all trailing blanks.

Syntax

    RTRIM(character_expression)

Arguments

character_expression
Is an expression of character data. character_expression can be a constant, variable, or column of either character or binary data.

Example

    Declare @string_to_trim char(30)
    Set @string_to_trim = 'Welcome '
    SELECT 'Here is the string without the trailing spaces: ' + CHAR(13) + RTRIM(@string_to_trim)

SPACE (T-SQL)

Returns a string of repeated spaces.

Syntax

    SPACE(integer_expression)

Example

This example trims the authors' last names and concatenates a comma, two spaces, and the authors' first names.

    SELECT RTRIM(au_lname) + ',' + SPACE(2) + LTRIM(au_fname) AS Name
    FROM authors
    ORDER BY au_lname, au_fname

Here is the result set:

    Name
    ----------------------------------------------------------
    Bennet,  Abraham
    Blotchet-Halls,  Reginald
    Carson,  Cheryl
    DeFrance,  Michel
    del Castillo,  Innes
    Dull,  Ann
    Green,  Marjorie
    Greene,  Morningstar
    Gringlesby,  Burt
    Hunter,  Sheryl
    Karsen,  Livia
    Locksley,  Charlene
    MacFeather,  Stearns
    McBadden,  Heather
    O'Leary,  Michael
    Panteley,  Sylvia
    Ringer,  Albert
    Ringer,  Anne
    Smith,  Meander
    Straight,  Dean
    Stringer,  Dirk
    White,  Johnson
    Yokomoto,  Akiko

    (23 row(s) affected)

STR (T-SQL)

Returns character data converted from numeric data.

Syntax

    STR(float_expression [, length [, decimal]])

Example

    SELECT STR(123.45,6,1)

Here is the result set:

    ------
     123.5

    (1 row(s) affected)

UPPER (T-SQL)

Returns a character expression with lowercase character data converted to uppercase.

Syntax

    UPPER(character_expression)

Arguments

character_expression

Is an expression of character data. character_expression can be a constant, variable, or column of either character or binary data.

Examples

This example uses the UPPER and RTRIM functions to return the trimmed, uppercase author's last name concatenated with the author's first name.

    USE pubs
    SELECT UPPER(RTRIM(au_lname)) + ', ' + au_fname AS Name
    FROM authors
    ORDER BY au_lname

Here is the result set:

    Name
    ------------------------------------------------------------------
    BENNET, Abraham
    BLOTCHET-HALLS, Reginald
    CARSON, Cheryl
    DEFRANCE, Michel
    DEL CASTILLO, Innes
    DULL, Ann
    GREEN, Marjorie
    GREENE, Morningstar
    GRINGLESBY, Burt
    HUNTER, Sheryl
    KARSEN, Livia
    LOCKSLEY, Charlene
    MACFEATHER, Stearns
    MCBADDEN, Heather
    O'LEARY, Michael
    PANTELEY, Sylvia
    RINGER, Albert
    RINGER, Anne
    SMITH, Meander
    STRAIGHT, Dean
    STRINGER, Dirk
    WHITE, Johnson
    YOKOMOTO, Akiko

    (23 row(s) affected)

SUBSTRING (T-SQL)

Returns part of a character, binary, text, or image expression.

Syntax

    SUBSTRING(expression, start, length)

Example

    SELECT au_lname, SUBSTRING(au_fname, 1, 1)
    FROM authors
    ORDER BY au_lname

Here is the result set:

    au_lname
    ------------------------------------- -
    Bennet                                A
    Blotchet-Halls                        R
    Carson                                C
    DeFrance                              M
    del Castillo                          I

STUFF (T-SQL)

Deletes a specified length of characters and inserts another set of characters at a specified starting point.

Syntax

    STUFF(character_expression, start, length, character_expression)

Examples

This example returns a character string created by deleting three characters from the first string (abcdef) starting at position 2 (at b) and inserting the second string at the deletion point.



    SELECT STUFF('abcdef', 2, 3, 'ijklmn')
    GO

Here is the result set:

    ---------
    aijklmnef

    (1 row(s) affected)

REPLICATE (T-SQL)

Repeats a character expression a specified number of times.

Syntax

    REPLICATE(character_expression, integer_expression)

Arguments

character_expression
Is an alphanumeric expression of character data. character_expression can be a constant, variable, or column of either character or binary data.

integer_expression
Is a positive whole number. If integer_expression is negative, a null string is returned.

Examples

This example replicates each author's first name twice.

    USE pubs
    SELECT REPLICATE(au_fname, 2)
    FROM authors
    ORDER BY au_fname

Here is the result set:

    ----------------------------
    AbrahamAbraham
    AkikoAkiko
    AlbertAlbert
    AnnAnn
    AnneAnne
    BurtBurt
    CharleneCharlene
    CherylCheryl
    DeanDean
    DirkDirk
    HeatherHeather
    InnesInnes
    JohnsonJohnson
    LiviaLivia
    MarjorieMarjorie
    MeanderMeander
    MichaelMichael
    MichelMichel
    MorningstarMorningstar
    ReginaldReginald
    SherylSheryl
    StearnsStearns
    SylviaSylvia

    (23 row(s) affected)

System Functions

System functions provide a method of querying the system tables of SQL Server. These functions are used to access information about SQL Server, its databases, or its users.

    Function                        Definition
    HOST_ID()                       Returns the current host process ID number of a client process
    HOST_NAME()                     Returns the current host computer name of a client process
    SUSER_ID(['login_name'])        Returns the user's SQL Server ID number
    SUSER_NAME([server_user_id])    Returns the user's SQL Server login name
    USER_ID(['name_in_db'])         Returns the user's ID number in the database
    DB_ID(['db_name'])              Returns the database ID number of the database
    DB_NAME([db_id])                Returns the database name
    OBJECT_ID('objname')            Returns the database object ID number
    OBJECT_NAME(obj_id)             Returns the database object name

Example:

    -- To retrieve the user_id of user SET
    select user_id('SET')

Output:

    ------
    5

Example:

    -- To retrieve the name and ID of the current database
    select 'Current Database is ' + db_name() + ' whose id is ' + convert(char(5), db_id(db_name()))

Output:

    --------------------------------------------------
    Current Database is pubs whose id is 200
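OBJECT_ID and OBJECT_NAME are inverses of each other, which a short round trip makes visible. The sketch below assumes the pubs sample database and its authors table; the column aliases are illustrative names only, and the actual ID number varies from one installation to another.

```sql
USE pubs
GO
-- Look up the ID of the authors table, then translate that ID back to a name.
SELECT OBJECT_ID('authors')              AS authors_id,
       OBJECT_NAME(OBJECT_ID('authors')) AS round_trip_name   -- authors
```

OBJECT_ID returns NULL when no object of that name exists in the current database, which makes it a convenient existence test before a DROP statement.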



Summary

• SQL Server has a number of in-built functions

• Aggregate functions are used to summarize data

• Date and Time functions are used to manipulate date values and perform date arithmetic

• String functions are used to manipulate a character expression

• System functions are used to retrieve system-level information about the user or the SQL Server



REVIEW QUESTIONS

1. Display the current date and time using the built-in functions.

2. Display the square root of 9.

3. Display 15 characters starting from the 5th character of the given string "Life is all about Imagineering".



Chapter 7: Constraints

Objectives

• What is a Constraint?

• Types of constraints

• Understand Rules and Defaults



Microsoft SQL Server allows you to specify constraints of various types on tables, so that the database engine itself can handle various types of validation automatically. For example, you can create a primary key constraint on a table to ensure uniqueness of key values. Similarly, you can use a check constraint to limit the possible values for a field or a group of fields.

Constraints can be specified in two ways: as a column constraint or as a table constraint. A column constraint is linked to a specific column in a table, while a table constraint can refer to one or more columns within a table. In a CREATE TABLE statement, a column constraint is specified along with the name of the column that it refers to, while a table constraint is given after all the column specifications have been given. Constraints can be explicitly named by you, or may be left unnamed, in which case SQL Server assigns a unique name to the constraint.

NOT NULL Constraints

A NOT NULL constraint on a column tells the database that a value must always be specified for that column; it cannot be null. For example,

    create table a ( b char (1) not null)

If you want to allow a null value for a column, you must use the NULL clause instead. For example:

    create table a2 ( b char (1) null)

By default, SQL Server treats a column as NOT NULL, although the ANSI standard default is NULL; connection and database settings can change this behavior.

Primary Key Constraints

A primary key constraint specifies that a field or group of fields in a table should have unique, non-null values. For example,

    create table dept (
        dept_code char (4) primary key,
        dept_name varchar(10) )

In this case we have specified PRIMARY KEY as an unnamed column constraint for the table dept.
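To see a primary key constraint at work, the sketch below creates a throwaway table and attempts three inserts. The table name dept_demo and the sample values are hypothetical, used here so that the example does not collide with the dept table in the text.

```sql
-- dept_demo and its sample rows are hypothetical, for illustration only.
CREATE TABLE dept_demo
(
    dept_code char(4) PRIMARY KEY,
    dept_name varchar(10)
)
GO
INSERT dept_demo VALUES ('MKTG', 'Marketing')   -- succeeds
GO
INSERT dept_demo VALUES ('MKTG', 'Markets')     -- fails: duplicate key value
GO
INSERT dept_demo VALUES (NULL, 'Sales')         -- fails: a key column cannot be null
GO
DROP TABLE dept_demo
GO
```

Each INSERT runs in its own batch, so the two expected failures do not stop the script, and only the first row is ever stored.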
We can also specify a named column constraint for the same purpose, as follows:

    create table dept (
        dept_code char (4) constraint dept_pk primary key,
        dept_name varchar(10))

Alternatively, we can specify a named table constraint, as follows:

    create table dept (
        dept_code char (4),
        dept_name varchar(10),
        constraint dept_pk primary key (dept_code) )



The table-constraint format is especially useful if a combination of two or more fields forms the primary key, such as:

    create table invoice (
        invoice_year numeric (4),
        invoice_number numeric (5),
        customer_code char (4),
        -- More columns to be added here
        constraint invoice_pk primary key (invoice_year, invoice_number))

SQL Server automatically creates a unique index for the primary key expression of a table. To view the list of constraints on a table, you can give the following command:

    exec sp_helpconstraint dept

Unique Constraints

A unique constraint is similar to a primary key constraint: values for the fields comprising a unique constraint must be unique within a table, and an index on the unique constraint expression is automatically created by the system. However, there is one small difference: a primary key expression cannot have a null value, whereas a unique constraint expression can. Example:

    create table employee (
        emp_code int,
        emp_name varchar (30),
        passport_number char (20),
        constraint employee_unique unique (passport_number))

Foreign Key Constraints

In a typical database, the various tables have some logical relationship among themselves; a database is not just a group of unrelated tables. In most such databases, certain referential integrity relationships must exist between tables; otherwise the data would be logically corrupt.

To illustrate the concept of referential integrity, let us consider a department table and an employee table. The department table would typically consist of a department code, department name, etc. The employee table would have, among other fields, a department code. Now, if the employee table contains a department code that does not exist in the department table, it would be a violation of referential integrity. The department code field in the employee table should refer to the department code field in the department table.

SQL Server enforces referential integrity through the use of foreign key constraints.
For example,

    create table dept (
        dept_code char (4),
        dept_name varchar(10),
        constraint dept_pk primary key (dept_code) )

    create table employee (
        emp_code int,
        emp_name char (30),
        dept_code char (4),
        constraint employee_fk01 foreign key (dept_code) references dept (dept_code) )

In such a situation the dept table would be considered a master table, while the employee table would be considered a transaction table. To specify a foreign key constraint, there must be a primary key or a unique constraint on the master table, on the referenced expression. However, the transaction table need not have an index on the foreign key expression.

Matching column names and datatypes

You don't have to use identical column names in tables involved in a foreign key reference, but it's often good practice. For example, the cust_id and location_num column names are defined in the customer table, while the orders table, which references the customer table, uses the names cust_num and cust_loc. Although the names of related columns can differ, the datatypes of the related columns must be identical, except for nullability and variable-length attributes. (For example, a column of char (10) NOT NULL can reference one of varchar (10) NULL, but it can't reference a column of char (12) NOT NULL. A column of type smallint can't reference a column of type int.)

Check Constraints

A check constraint implements domain integrity, i.e., rules that govern what the valid entries are in a table. For example, in an employee table, you may not want the employee age to be less than 18 years or more than 65 years. This can be specified as follows:

    create table employee (
        emp_code int,
        age tinyint,
        …
        constraint employee_ck01 check (age between 18 and 65) )

Default Constraints

A default constraint allows you to specify a default value for a column if no value is specified for that column while inserting a new row.
For example,

    create table employee
    (
        emp_code  int,
        …
        sex       char (1) default 'f'
    )

Restrictions on Dropping Tables

Before you drop a referenced table, you must either drop all the tables that reference it or drop the referencing FOREIGN KEY constraints. For example, suppose that an orders table references a customer_location table, and that a master_customer table also exists. The following sequence of DROP statements fails because a table being dropped is referenced by a table that still exists; that is, customer_location can't be dropped because the orders table references it, and orders isn't dropped until later:

    DROP TABLE customer_location
    DROP TABLE master_customer
    DROP TABLE orders

Changing the sequence to the following works fine because orders is dropped first:

    DROP TABLE orders
    DROP TABLE customer_location
    DROP TABLE master_customer

When two tables reference each other, you must first drop the constraints or set them to NOCHECK (both operations use ALTER TABLE) before the tables can be dropped. Similarly, a table that is being referenced can't be the target of a TRUNCATE TABLE command. You must drop or disable the constraint, or simply drop and rebuild the table.

    ALTER TABLE employee DROP CONSTRAINT employee_fk01
    DROP TABLE dept

Self-Referencing Tables

A table can be self-referencing; that is, the foreign key can reference one or more columns in the same table. The following example shows an employee table in which a column for the manager references another employee entry:

    CREATE TABLE employee
    (
        emp_id    int          NOT NULL PRIMARY KEY,
        emp_name  varchar (30) NOT NULL,
        mgr_id    int          NOT NULL REFERENCES employee (emp_id)
    )

The employee table is a perfectly reasonable table, and it illustrates most of the issues we've discussed. Because the table refers to itself, a single INSERT command that satisfies the reference is legal.
For example, if the CEO of the company has an emp_id of 1 and is also his own manager, the following INSERT is allowed and is a useful way to insert the first row in a self-referencing table:

    INSERT employee VALUES (1, 'Chris Smith', 1)

Although SQL Server doesn't currently provide a deferred option for constraints, self-referencing tables add a twist that sometimes makes SQL Server use deferred operations internally. Consider the case of a nonqualified DELETE statement that deletes many rows in the table. After all the rows are ultimately deleted, you can assume that no constraint violation would occur. However, while some rows have been deleted internally and others remain during the delete operation, violations would occur because some of the referencing rows would be orphaned before they were actually deleted. SQL Server handles such interim violations automatically and without any user intervention. As long as the self-referencing constraints are valid at the end of the data modification statement, no errors are raised during processing.

To handle these interim violations gracefully, however, additional processing and worktables are required to hold the work in progress. This adds substantial overhead and can also limit the actual number of foreign keys that can be used. An UPDATE statement can also cause an interim violation. For example, if all employee numbers are to be changed by multiplying each by 1000, the following UPDATE statement would require worktables to avoid the possibility of raising an error on an interim violation:

    UPDATE employee SET emp_id = emp_id * 1000, mgr_id = mgr_id * 1000

The additional worktables and the processing needed to handle them are made part of the execution plan. Therefore, if the optimizer sees that a data modification statement could cause an interim violation, the additional temporary worktable is created even if no such interim violation ever actually occurs. These extra steps are needed only in the following situations:

• A table is self-referencing (it has a FOREIGN KEY constraint that refers back to itself).

• A single data modification statement (UPDATE, DELETE, or INSERT based on a SELECT) is performed and can affect more than one row. (The optimizer can't determine a priori, based on the WHERE clause and unique indexes, whether more than one row could be affected.) Multiple data modification statements within the transaction don't apply; this condition must be a single statement that affects multiple rows.

• Both the referencing and referenced columns are affected (which is always the case for DELETE and INSERT operations, but might or might not be the case for UPDATE).
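The three conditions listed above can be seen together in a small Transact-SQL sketch. The table name employee_demo and the sample rows are assumptions made for illustration; the column layout follows the self-referencing employee table described earlier.

```sql
-- Hypothetical demo table (employee_demo is an assumed name).
CREATE TABLE employee_demo
(
    emp_id    int          NOT NULL PRIMARY KEY,
    emp_name  varchar (30) NOT NULL,
    mgr_id    int          NOT NULL REFERENCES employee_demo (emp_id)
)
GO

-- The first row refers to itself, so a single INSERT satisfies the reference.
INSERT employee_demo VALUES (1, 'Chris Smith', 1)
INSERT employee_demo VALUES (2, 'Pat Jones', 1)
INSERT employee_demo VALUES (3, 'Lee Wong', 2)
GO

-- One statement, several rows, and both the referenced column (emp_id)
-- and the referencing column (mgr_id) change, so all three conditions apply.
UPDATE employee_demo
SET emp_id = emp_id * 1000,
    mgr_id = mgr_id * 1000
GO

-- Inspect the rows after the update.
SELECT emp_id, emp_name, mgr_id
FROM employee_demo
ORDER BY emp_id
GO

DROP TABLE employee_demo
GO
```

Viewing the execution plan of the UPDATE (for example with SET SHOWPLAN_TEXT ON) is one way to look for the extra worktable steps the text describes.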

If a data modification statement in your application meets the above criteria, you can be sure that SQL Server is automatically using a limited and special-purpose form of deferred constraints to protect against interim violations.

The Order of Integrity Checks

The modification of a given row fails if any constraint is violated or if a trigger aborts the operation. As soon as a constraint fails, the operation is aborted, subsequent checks for that row aren't performed, and no trigger fires for the row. Hence, the order of these checks can be important, as the following list shows.

1. Defaults are applied as appropriate.
2. NOT NULL violations are raised.
3. CHECK constraints are evaluated.
4. FOREIGN KEY checks of referencing tables are applied.
5. FOREIGN KEY checks of referenced tables are applied.
6. UNIQUE/PRIMARY KEY is checked for correctness.
7. Triggers fire.

CREATE DEFAULT

The CREATE DEFAULT statement creates an object called a default. When bound to a column or a user-defined data type, a default specifies a value to be inserted into the column to which the object is bound (or into all columns, in the case of a user-defined data type) when no value is explicitly supplied during an insert. Defaults, a backward compatibility feature, perform some of the same functions as default definitions created using the DEFAULT keyword of the ALTER TABLE or CREATE TABLE statement.

Syntax

    CREATE DEFAULT default AS constant_expression

Arguments

default

Is the name of the default. Default names must conform to the rules for identifiers. Specifying the default owner name is optional.

constant_expression

Is an expression that contains only constant values (it cannot include the names of any columns or other database objects). Any constant, built-in function, or mathematical expression can be used. Enclose character and date constants in single quotation marks ('); monetary, integer, and floating-point constants do not require quotation marks. Binary data must be preceded by 0x, and monetary data must be preceded by a dollar sign ($). The default value must be compatible with the data type of the column.

Remarks

A default can be created only in the current database. Within a database, default names must be unique by owner. After a default has been created, use sp_bindefault to bind it to a column or to a user-defined data type.

If the default is not compatible with the column to which it is bound, Microsoft SQL Server generates an error message when trying to insert the default value. For example, N/A cannot be used as a default for a numeric column. If the default value is too long for the column to which it is bound, the value is truncated.

The CREATE DEFAULT statement cannot be combined with other Transact-SQL statements in a single batch. A default must be dropped before creating a new one of the same name, and the default must be unbound by executing sp_unbindefault before it is dropped.

If a column has both a default and a rule associated with it, the default value must not violate the rule. A default that conflicts with a rule is never inserted, and SQL Server generates an error message each time it attempts to insert the default.

After a default is bound to a column, the default value is inserted when:

• A value is not explicitly inserted.
• Either the DEFAULT VALUES or DEFAULT keyword is used with INSERT to insert default values. For more information and examples, see the INSERT statement.
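As a rough illustration of the remarks above, the following Transact-SQL sketch creates a default, binds it, and inserts rows in the two ways that cause the default to be used. The names contact_demo and citydflt are hypothetical and exist only for this sketch.

```sql
-- Hypothetical table for the demonstration.
CREATE TABLE contact_demo
(
    contact_id  int          NOT NULL,
    city        varchar (25) NOT NULL
)
GO

-- CREATE DEFAULT must be the only statement in its batch.
CREATE DEFAULT citydflt AS 'unknown'
GO

EXEC sp_bindefault 'citydflt', 'contact_demo.city'
GO

-- Case 1: no value is supplied for city.
INSERT contact_demo (contact_id) VALUES (1)

-- Case 2: the DEFAULT keyword is used in place of a value.
INSERT contact_demo (contact_id, city) VALUES (2, DEFAULT)
GO

-- Both rows should show the bound default in the city column.
SELECT contact_id, city FROM contact_demo
GO

-- Clean up: a default must be unbound before it can be dropped.
EXEC sp_unbindefault 'contact_demo.city'
GO
DROP DEFAULT citydflt
DROP TABLE contact_demo
GO
```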
If NOT NULL is specified when creating a column and a default is not created for it, an error message is generated whenever a user fails to make an entry in that column. The following table illustrates the relationship between the existence of a default and the definition of a column as NULL or NOT NULL. The entries in the table show the result.

Column definition   No entry, no default   No entry, default   Enter NULL, no default   Enter NULL, default
NULL                NULL                   default             NULL                     NULL
NOT NULL            Error                  default             Error                    Error

Examples

Create a simple character default

This example creates a character default of unknown.

    CREATE DEFAULT phonedflt AS 'unknown'

Bind a default to a table column

This example binds the default created in the previous example. The default takes effect only if there is no entry in the phone column of the authors table. Note that no entry is not the same as an explicit null value.

Note: If a default named phonedflt does not exist (that is, the previous example has not been run), the following Transact-SQL statement fails. This example is for illustration only.



    sp_bindefault phonedflt, 'authors.phone'

CREATE RULE

Creates an object called a rule. When bound to a column or a user-defined data type, a rule specifies the acceptable values that can be inserted into that column. Rules, a backward compatibility feature, perform some of the same functions as CHECK constraints. CHECK constraints, created using the CHECK keyword of ALTER TABLE or CREATE TABLE, are the preferred, standard way to restrict the values in a column (multiple constraints can be defined on one column or on multiple columns). A column or user-defined data type can have only one rule bound to it. However, a column can have both a rule and one or more CHECK constraints associated with it. When this is true, all restrictions are evaluated.

Syntax

    CREATE RULE rule AS condition_expression

Arguments

rule

Is the name of the new rule. Rule names must conform to the rules for identifiers.

condition_expression

Is the condition or conditions defining the rule. A rule can be any expression that is valid in a WHERE clause and can include such elements as arithmetic operators, relational operators, and predicates (for example, IN, LIKE, BETWEEN). The expression contains one variable, prefixed with the at sign (@), that stands for the value being entered. A rule cannot reference columns or other database objects. Built-in functions that do not reference database objects can be included.

Please Note: The CREATE RULE statement cannot be combined with other Transact-SQL statements in a single batch. Rules do not apply to data already existing in the database at the time the rules are created, and rules cannot be bound to system data types. A rule can be created only in the current database. After creating a rule, execute sp_bindrule to bind the rule to a column or to a user-defined data type.

The rule must be compatible with the data type of the column. For example, "@value LIKE A%" cannot be used as a rule for a numeric column. A rule cannot be bound to a text, image, or timestamp column. Be sure to enclose character and date constants in single quotation marks (') and to precede binary constants with 0x. If the rule is not compatible with the column to which it has been bound, SQL Server returns an error message when a value is inserted (not when the rule is bound).

A rule bound to a user-defined data type is activated only when you attempt to insert a value into, or to update, a database column of the user-defined data type. Because rules do not test variables, do not assign a value to a user-defined data type variable that would be rejected by a rule bound to a column of the same data type.

You can bind a new rule to a column or data type without unbinding the previous one; the new rule overrides the previous one. Rules bound to columns always take precedence over rules bound to user-defined data types. Binding a rule to a column replaces a rule already bound to the user-defined data type of that column. But binding a rule to a data type does not replace a rule bound to a column of that user-defined data type. The table shows the precedence in effect when binding rules to columns and to user-defined data types where rules already exist.

                            Old rule bound to
New rule bound to           User-defined data type   Column
User-defined data type      Old rule replaced        No change
Column                      Old rule replaced        Old rule replaced

Example

This example creates a rule that accepts only values made up of nine digits and binds it to empid (a name that is not in table.column form is taken to be a user-defined data type).

    CREATE RULE check_emp_id AS @value LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
    GO
    sp_bindrule check_emp_id, empid

Rule with a list

This example creates a rule that restricts the values entered into the column or columns to which the rule is bound to only those listed in the rule.

    CREATE RULE list_rule AS @list IN ('1389', '0736', '0877')

Rule with a pattern

This example creates a rule that follows a pattern of any two characters followed by a hyphen, any number of characters (or no characters), and ending with a digit between 0 and 9.

    CREATE RULE pattern_rule AS @value LIKE '__-%[0-9]'

sp_bindrule

This stored procedure binds a rule to a column or to a user-defined data type.

    EXEC sp_bindrule 'check_emp_id', 'employee.[emp_id]'

Drop a rule

To drop a rule, you must first unbind it from every column to which it is bound and then drop it.

Syntax to drop a rule

    DROP RULE rule_name

To unbind a rule

    sp_unbindrule 'table_name.column_name'
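The steps described above (create, bind, test, unbind, drop) can be put together in one short Transact-SQL sketch. The table rule_demo and the rule pub_code_rule are hypothetical names chosen for this illustration; the list of values is the one used in list_rule above.

```sql
-- Hypothetical table to which the rule will be bound.
CREATE TABLE rule_demo (pub_code char (4) NOT NULL)
GO

-- CREATE RULE must be the only statement in its batch.
CREATE RULE pub_code_rule AS @code IN ('1389', '0736', '0877')
GO

EXEC sp_bindrule 'pub_code_rule', 'rule_demo.pub_code'
GO

-- Accepted: the value is in the rule's list.
INSERT rule_demo VALUES ('0736')
GO

-- Rejected: the value conflicts with the bound rule, so SQL Server
-- raises an error and the row is not inserted.
INSERT rule_demo VALUES ('9999')
GO

-- Clean up: unbind the rule before dropping it.
EXEC sp_unbindrule 'rule_demo.pub_code'
GO
DROP RULE pub_code_rule
DROP TABLE rule_demo
GO
```

Each INSERT is placed in its own batch so that the expected error from the second one does not affect the other statements.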



Summary

• Data integrity helps to maintain the reliability, completeness and accuracy of data.
• Data integrity can be classified into:
    o Entity integrity
    o Domain integrity
    o Referential integrity
    o User-defined integrity
• Constraints are rules that can be specified at either the table level or the column level.
• Defaults and rules are objects that are bound to columns or user-defined datatypes to specify default values and restricted values respectively.
• A default is created using the CREATE DEFAULT statement and is bound to a column or user-defined datatype using the sp_bindefault procedure.
• A default is unbound using the sp_unbindefault procedure.
• A rule is created using the CREATE RULE statement and is bound to a column or user-defined datatype using the sp_bindrule procedure.
• A rule is unbound using the sp_unbindrule procedure.

REVIEW QUESTIONS

1. Create the following tables:

Department

Attribute     Data Type      Nullability   Constraints
Dept_code     char(5)        NOT NULL      Primary key
Dept_name     varchar(30)    NOT NULL      -
Dept_budget   int            NOT NULL      Should be greater than 0
Dept_base     varchar(200)   NULL          -

Employee

Attribute     Data Type      Nullability   Constraints
Emp_code      char(5)        NOT NULL      Primary key
Emp_name      varchar(30)    NOT NULL      -
DOB           datetime       NOT NULL      There should be a difference of 21 years between the current date and the DOB
DOJ           datetime       NULL          Should be the current date
Dept_code     char(5)        NOT NULL      Should have a reference to the Department table's Dept_code attribute
Emp_addr      varchar(200)   NOT NULL      -
Emp_city      varchar(25)    NOT NULL      Default value to be set to Mumbai
Phone         char(10)       NULL          -

3. What is the difference between a rule and a CHECK constraint?



Chapter 8: Subqueries and Joins

Objectives

• Understand the need for Joins

• Types of Joins



SUBQUERIES

A subquery is a SELECT statement nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery. It can be used to retrieve data from multiple tables and as an alternative to a join. It can be used inside the WHERE or HAVING clause of the outer SELECT, INSERT, UPDATE, or DELETE statement. SELECT statements that contain one or more subqueries are called nested queries.

Syntax

    (SELECT [ALL | DISTINCT] subquery_select_list
     [FROM {table_name | view_name} [, {table_name2 | view_name2}] ...]
     [WHERE clause]
     [GROUP BY clause]
     [HAVING clause])

A subquery must be enclosed within parentheses. It cannot use the COMPUTE or FOR BROWSE clause, and it can use the ORDER BY clause only when TOP is also specified. Subqueries can be nested up to 32 levels deep, although the practical limit depends on available memory and on the complexity of the other expressions in the query. SQL Server also limits the number of tables or views that can be used in a subquery or a join. Conceptually, SQL Server evaluates the inner query first and returns the result to the outer query for the final result set, so the result of the outer query always depends on the result of the subquery.

Subqueries can be divided into three categories depending on the values they return:

Subqueries that operate on lists: this type of subquery returns a single column with multiple values and is implemented using the IN keyword. The syntax is as follows.

    WHERE expression [NOT] IN (subquery)

Subqueries introduced with a comparison operator: a subquery introduced with an unmodified comparison operator (an operator without the ANY or ALL keyword) returns a single column and a single value for evaluation by the outer query. When the operator is modified by ANY or ALL, the subquery can return a list of values. The syntax is as follows.

    WHERE expression comparison_operator [ANY | ALL] (subquery)

Subqueries that check for the existence of data: this type of subquery checks for the existence of records in the table used in the inner query, and returns either TRUE or FALSE based on the existence of data. It is implemented using the EXISTS keyword.
The syntax is as follows:

    WHERE [NOT] EXISTS (subquery)

Subqueries With IN

A subquery introduced with IN returns zero or more values. Consider the following example, which displays the author IDs from the TitleAuthor table for all authors whose books have been sold:

Example

    SELECT Au_Id
    FROM TitleAuthor
    WHERE Title_Id IN (SELECT Title_Id FROM Sales)

The subquery returns the list of all title IDs in the Sales table to the main query, which then lists the authors of those titles. Consider the following example, in which the subquery returns a list of publisher IDs (those of the publishers of business books) to the main query, which then determines whether each publisher's Pub_Id is in that list:



Example

    SELECT Publisher = Pub_Name
    FROM Publishers
    WHERE Pub_Id IN (SELECT Pub_Id FROM Titles WHERE Type = 'business')

Result

    Publisher
    -------------------------
    New Moon Books
    Algodata Infosystems

    (2 row(s) affected)

Example

Consider another subquery with the IN clause. Note that the apostrophe inside the title is written twice so that it is not read as the end of the string:

    SELECT Type, Average = AVG (Ytd_Sales)
    FROM Titles
    WHERE Type IN (SELECT Type FROM Titles WHERE Title = 'The Busy Executive''s Database Guide')
    GROUP BY Type

Result

    Type          Average
    ------------  -----------
    business      7697

    (1 row(s) affected)

The NOT IN clause is used in the same way as the IN clause.

Example:

    SELECT Publisher = Pub_Name
    FROM Publishers
    WHERE Pub_Id NOT IN (SELECT Pub_Id FROM Titles WHERE Type = 'mod_cook')

The above query returns the names of the publishers that have not published any book of the mod_cook type.

Subqueries With EXISTS

A subquery used with the EXISTS keyword always produces a TRUE or FALSE value. It checks for the existence of data rows that satisfy the condition specified in the inner query and passes the existence status to the outer query to produce the result set. EXISTS evaluates to TRUE if the subquery returns at least one row. A query introduced with the EXISTS keyword differs from other subqueries: the EXISTS keyword is not preceded by any column name, constant or other expression, and its subquery usually has an asterisk (*) as its SELECT list. Consider the following example that displays the publishers' names:

Example



    SELECT Publisher = Pub_Name
    FROM Publishers
    WHERE EXISTS (SELECT Pub_Id FROM Titles WHERE Type = 'business')

Consider another example of a subquery with the EXISTS clause:

Example

    SELECT Publisher = Pub_Name
    FROM Publishers
    WHERE EXISTS (SELECT * FROM Publishers WHERE City = 'Paris')

Result

    Publisher
    ---------------------------
    New Moon Books
    Binnet & Hardley
    Algodata Infosystems
    Five Lakes Publishing
    Ramona Publishers
    GGG&G
    Scootney Books
    Lucerne Publishing

    (8 row(s) affected)

Neither subquery above refers to the outer query, so EXISTS yields the same value for every row of Publishers: because at least one publisher is located in Paris, all eight publishers are listed.

Aggregate functions can also be used in subqueries. Consider the following example, which displays the titles of all those books for which the advance amount is more than the average advance for business-related books.

Example

    SELECT Title
    FROM Titles
    WHERE Advance > (SELECT AVG (Advance) FROM Titles WHERE Type = 'business')

Subquery Restrictions

SQL Server restricts the use of certain methods and techniques, and enforces certain standards, when subqueries are used. The restrictions imposed are:

1. The column list of the SELECT statement of a subquery introduced with the comparison operator can include only one column.

2. The column used in the WHERE clause of the outer query should be compatible with the column used in the SELECT list of the inner query.

3. The DISTINCT keyword cannot be used with subqueries that include the GROUP BY clause.



4. The ORDER BY clause (unless TOP is also specified), the COMPUTE clause and the INTO keyword cannot be used in a subquery. In addition, a subquery introduced with an unmodified comparison operator cannot include the GROUP BY or HAVING clause, because it must return a single value.

5. A view created with a subquery cannot be updated.

Nested Subqueries

A subquery can itself contain one or more subqueries. A SELECT, INSERT, UPDATE or DELETE statement can include several subqueries, and subqueries can be nested up to 32 levels deep. Consider the following example, which displays the name of the author of the book Net Etiquette.

Example

    SELECT 'Author Name' = SUBSTRING (Au_Fname, 1, 1) + '. ' + Au_Lname
    FROM Authors
    WHERE Au_Id IN (SELECT Au_Id
                    FROM TitleAuthor
                    WHERE Title_Id = (SELECT Title_Id
                                      FROM Titles
                                      WHERE Title = 'Net Etiquette'))

Result

    Author Name
    ------------------
    C. Locksley

    (1 row(s) affected)

Consider another example, which displays the names of all the authors who have written books of the type business.

Example

    SELECT Name = SUBSTRING (Au_Fname, 1, 1) + '. ' + Au_Lname
    FROM Authors
    WHERE Au_Id IN (SELECT Au_Id
                    FROM TitleAuthor
                    WHERE Title_Id IN (SELECT Title_Id
                                       FROM Titles
                                       WHERE Type = 'business'))

Result

    Name
    ------------------------
    M. Green
    M. O'Leary
    D. Straight
    A. Bennet
    S. MacFeather

    (5 row(s) affected)
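Because a subquery can often serve as an alternative to a join, the last nested query can also be sketched as a join. The version below is an illustrative rewrite against the pubs sample database, not part of the original example set. DISTINCT is needed because an author who has written several business books would otherwise appear once per book, and au_id is included so that two authors with the same initial and last name are not merged.

```sql
USE pubs
GO

-- Join form of "authors who have written business books".
SELECT DISTINCT a.au_id,
       Name = SUBSTRING (a.au_fname, 1, 1) + '. ' + a.au_lname
FROM authors a
JOIN titleauthor ta ON ta.au_id = a.au_id
JOIN titles t ON t.title_id = ta.title_id
WHERE t.type = 'business'
GO
```

The subquery form reads closer to the question being asked, while the join form makes it easy to add columns from the titles table to the result.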

Correlated Subqueries

A correlated subquery is a subquery that depends on the outer query for its evaluation: its WHERE clause references a table in the FROM clause of the outer query. The inner query is therefore evaluated once for each row of the table specified in the outer query. If the correlated subquery uses the same table as the outer query, SQL Server requires table aliases to distinguish the two. Consider the following example, which finds the types of books that are published by more than one publisher:

Example

    USE pubs
    SELECT DISTINCT t1.type
    FROM titles t1
    WHERE t1.type IN (SELECT t2.type
                      FROM titles t2
                      WHERE t1.pub_id <> t2.pub_id)

Here is the result set:

    type
    ----------
    business
    psychology

    (2 row(s) affected)

Consider another example, which uses a comparison operator to find the sales details when the quantity is less than the average quantity of the sales for that title.

Example

    SELECT a.Stor_Id, a.Ord_Date, Quantity = a.Qty, a.Title_Id
    FROM Sales a
    WHERE a.Qty < (SELECT AVG (b.Qty)
                   FROM Sales b
                   WHERE b.Title_Id = a.Title_Id)

Result

    Stor_Id   Ord_Date               Quantity   Title_Id
    --------  ---------------------  ---------  ---------
    6380      SEP 14 1994 12:00 AM   5          BU1032
    6380      SEP 13 1994 12:00 AM   3          PS2091
    7067      SEP 14 1994 12:00 AM   10         PS2091
    7131      SEP 14 1994 12:00 AM   20         PS2091
    8042      SEP 14 1994 12:00 AM   5          MC3021

    (5 row(s) affected)

In the above example, SQL Server evaluates the outer query by selecting each row one by one. For each row considered by the outer query, the subquery calculates the average quantity over all the sales of that row's title. SQL Server includes the row in the result if its quantity is less than the calculated average.
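EXISTS is most useful when its subquery is correlated with the outer query, because the test is then made separately for each outer row. The following sketch, written against the pubs sample database as an illustration, lists only the publishers that have at least one business title; it is expected to name the same publishers as the earlier IN example.

```sql
USE pubs
GO

-- The inner query refers to p.pub_id from the outer query,
-- so EXISTS is evaluated once for each publisher.
SELECT Publisher = p.pub_name
FROM publishers p
WHERE EXISTS (SELECT *
              FROM titles t
              WHERE t.pub_id = p.pub_id
                AND t.type = 'business')
GO
```

Replacing EXISTS with NOT EXISTS in the same query would list the publishers that have no business titles.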



Queries with Modified Comparison Operators

SQL Server provides the ALL and ANY keywords, which modify a comparison operator. A subquery introduced with a modified comparison operator returns a list of zero or more values and can include the GROUP BY or HAVING clause. In the table below, the lists of constants are shorthand for the values returned by the subquery.

Operator   Description

> ALL      Means greater than every value in the list, that is, greater than the maximum value. The expression column_name > ALL (10, 20, 30) means 'greater than 30'.

> ANY      Means greater than at least one value in the list, that is, greater than the minimum value. The expression column_name > ANY (10, 20, 30) means 'greater than 10'.

= ANY      Means equal to any of the values in the list. It acts in the same way as the IN clause. The expression column_name = ANY (10, 20, 30) means 'equal to either 10 or 20 or 30'.

<> ANY     Means not equal to at least one value in the list. The expression column_name <> ANY (10, 20, 30) means 'not equal to 10, or not equal to 20, or not equal to 30', which is true for every non-null value whenever the list holds more than one distinct value. It is not the same as NOT IN.

<> ALL     Means not equal to every value in the list. It acts in the same way as the NOT IN clause. The expression column_name <> ALL (10, 20, 30) means 'not equal to 10 and not equal to 20 and not equal to 30'.

Examples

    SELECT Title_Id, Title
    FROM Titles
    WHERE price > ALL (SELECT price FROM Titles WHERE Pub_Id = '0736')

Lists all the titles along with their title IDs from the Titles table where the price is greater than the maximum price of the books published by the publisher with publisher ID 0736.

    SELECT Title_Id, Title
    FROM Titles
    WHERE price > ANY (SELECT price FROM Titles WHERE Pub_Id = '0736')

Lists all the titles along with their title IDs from the Titles table where the price is greater than the minimum price of the books published by the publisher with publisher ID 0736.

    SELECT Publisher_ID = Pub_Id, Name = Pub_Name
    FROM Publishers
    WHERE City = ANY (SELECT City FROM Authors)

Lists all the publishers whose city is the same as that of any author.
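The "maximum" and "minimum" readings of > ALL and > ANY can be checked by writing each query a second way with an aggregate function. The following sketch against the pubs sample database is for illustration only. The paired queries are expected to agree only when the subquery returns at least one row and none of the returned prices is NULL, because MAX and MIN ignore nulls while ALL and ANY compare against every returned value.

```sql
USE pubs
GO

-- Greater than every price of publisher 0736 ...
SELECT title_id, title
FROM titles
WHERE price > ALL (SELECT price FROM titles WHERE pub_id = '0736')

-- ... compared with "greater than the maximum price".
SELECT title_id, title
FROM titles
WHERE price > (SELECT MAX (price) FROM titles WHERE pub_id = '0736')

-- Greater than at least one price of publisher 0736 ...
SELECT title_id, title
FROM titles
WHERE price > ANY (SELECT price FROM titles WHERE pub_id = '0736')

-- ... compared with "greater than the minimum price".
SELECT title_id, title
FROM titles
WHERE price > (SELECT MIN (price) FROM titles WHERE pub_id = '0736')
GO
```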

JOINS

Join conditions can be specified in either the FROM or WHERE clauses; specifying them in the FROM clause is recommended. WHERE and HAVING clauses can also contain search conditions to further filter the rows selected by the join conditions. Joins can be categorized as:

1. Inner joins (the typical join operation, which uses some comparison operator like = or <>). These include equi-joins and natural joins. Inner joins use a comparison operator to match rows from two tables based on the values in common columns from each table. For example, retrieving all rows where the student identification number is the same in both the students and courses tables.



2. Outer joins. Outer joins can be a left, right, or full outer join.

Outer joins are specified with one of the following sets of keywords when they are specified in the FROM clause:

a. LEFT OUTER JOIN

The result set of a left outer join includes all the rows from the left table specified in the LEFT OUTER clause, not just the ones in which the joined columns match. When a row in the left table has no matching rows in the right table, the associated result set row contains null values for all select list columns coming from the right table.

b. RIGHT OUTER JOIN

A right outer join is the reverse of a left outer join. All rows from the right table are returned. Null values are returned for the left table any time a right table row has no matching row in the left table.

c. FULL OUTER JOIN

A full outer join returns all rows in both the left and right tables. Any time a row has no match in the other table, the select list columns from the other table contain null values. When there is a match between the tables, the entire result set row contains data values from the base tables.

3. Cross joins.

Cross joins return all rows from the left table; each row from the left table is combined with all rows from the right table. Cross joins are also called Cartesian products.
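The join categories above can be compared side by side on two very small tables. The sketch below uses hypothetical temporary tables, #students and #courses, loosely following the students and courses illustration mentioned under inner joins; all names and rows are assumptions. Student 2 has no course row, and the course row for student 3 has no matching student.

```sql
-- Hypothetical temporary tables for the demonstration.
CREATE TABLE #students (student_id int NOT NULL, student_name varchar (20) NOT NULL)
CREATE TABLE #courses  (student_id int NOT NULL, course_name  varchar (20) NOT NULL)

INSERT #students VALUES (1, 'Asha')
INSERT #students VALUES (2, 'Ben')
INSERT #courses  VALUES (1, 'SQL')
INSERT #courses  VALUES (3, 'Networks')

-- Inner join: only the matching pair (1 row).
SELECT s.student_name, c.course_name
FROM #students s INNER JOIN #courses c ON s.student_id = c.student_id

-- Left outer join: every student; Ben gets NULL for course_name (2 rows).
SELECT s.student_name, c.course_name
FROM #students s LEFT OUTER JOIN #courses c ON s.student_id = c.student_id

-- Right outer join: every course; Networks gets NULL for student_name (2 rows).
SELECT s.student_name, c.course_name
FROM #students s RIGHT OUTER JOIN #courses c ON s.student_id = c.student_id

-- Full outer join: the match plus both unmatched rows (3 rows).
SELECT s.student_name, c.course_name
FROM #students s FULL OUTER JOIN #courses c ON s.student_id = c.student_id

DROP TABLE #students
DROP TABLE #courses
```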

CROSS JOIN

A join that combines two or more tables without any join condition is called a cross join. The output of such a join is called a Cartesian product.

Example

If a cross join is performed on a table named Titles that has 18 rows and another named Publishers that has 8 rows, the result has 144 rows (18 x 8), the Cartesian product of the two tables.

    SELECT Title
    FROM Titles CROSS JOIN Publishers
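The row count of a Cartesian product can be verified directly. The following sketch, assuming the pubs sample database, counts the rows produced by the cross join and compares that with the product of the two table sizes; pair_count and expected_count are column names chosen only for this illustration.

```sql
USE pubs
GO

-- Number of rows produced by the cross join.
SELECT pair_count = COUNT (*)
FROM titles CROSS JOIN publishers

-- Product of the individual row counts; the two figures should agree.
SELECT expected_count = (SELECT COUNT (*) FROM titles) * (SELECT COUNT (*) FROM publishers)
GO
```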

Natural Join

A join that keeps redundant column data out of the result set is known as a natural join. It is implemented by naming the required columns of the tables in the SELECT list. Consider the following example, in which the title and the publisher of each book are selected from the Titles and Publishers tables and displayed.

Example

    SELECT t.Title, p.Pub_Name
    FROM Titles t JOIN Publishers p ON t.Pub_Id = p.Pub_Id



Equi Join

A join that uses an asterisk (*) in the SELECT list and displays redundant column data in the result set is termed an equi join. In an equi join, two or more tables are compared for equality.

Example

    SELECT *
    FROM Sales s
    JOIN Titles t ON s.Title_Id = t.Title_Id
    JOIN Publishers p ON t.Pub_Id = p.Pub_Id

The output produced by the above query contains redundant column data from the three tables, because each join column appears once for every table that contains it.

Self Join

A join is said to be a self-join when one row in a table correlates with other rows in the same table. Since the same table is used twice for comparison, alias names differentiate the two copies of the table. All join operators except the outer join operators can be used in a self-join. Consider the following example, in which you want to list all the titles of books from the business category that have the same publisher.

Example

    SELECT t1.Title_Id, t1.Pub_Id, t2.Title_Id, t2.Pub_Id
    FROM Titles t1 JOIN Titles t2 ON t1.Pub_Id = t2.Pub_Id
    WHERE t1.Type = 'business'

The above query returns duplicate pairings: each title is paired with itself, and every pair of titles appears in both orders. These can be eliminated by modifying the query in the following way.

    SELECT t1.Title_Id, t1.Pub_Id, t2.Title_Id, t2.Pub_Id
    FROM Titles t1 JOIN Titles t2 ON t1.Pub_Id = t2.Pub_Id
    WHERE t1.Type = 'business' AND t1.Title_Id < t2.Title_Id

Consider the next example, in which the query extracts from the Authors table the names of authors who live in the same city.

Example

    SELECT a1.Au_Fname, a1.Au_Lname, a2.Au_Fname, a2.Au_Lname
    FROM Authors a1 JOIN Authors a2 ON a1.City = a2.City
    WHERE a1.City = 'Oakland' AND a1.Au_Id < a2.Au_Id

Observe the last condition in the WHERE clause of the SELECT statement: it eliminates from the result set the rows in which both copies refer to the same author, and lists each pair of authors only once.
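A common use of the self-join is to resolve a reference that a table makes to itself, such as the manager column of the self-referencing employee table described in the chapter on constraints. The sketch below uses a hypothetical temporary table #emp with assumed sample rows; it lists each employee beside the name of his or her manager.

```sql
-- Hypothetical temporary table modelled on the self-referencing employee table.
CREATE TABLE #emp
(
    emp_id    int          NOT NULL PRIMARY KEY,
    emp_name  varchar (30) NOT NULL,
    mgr_id    int          NOT NULL
)

INSERT #emp VALUES (1, 'Chris Smith', 1)
INSERT #emp VALUES (2, 'Pat Jones', 1)
INSERT #emp VALUES (3, 'Lee Wong', 2)

-- e is the employee copy of the table, m is the manager copy.
SELECT Employee = e.emp_name, Manager = m.emp_name
FROM #emp e JOIN #emp m ON e.mgr_id = m.emp_id
WHERE e.emp_id <> e.mgr_id      -- leave out the row in which the CEO manages himself

DROP TABLE #emp
```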



Outer Join

A join is termed an outer join when the result set contains all the rows from one table and only the matching rows from the other. Rows that do not satisfy the join condition are eliminated from only one of the tables; the unmatched rows of the other table are kept.

Syntax

    SELECT column_name, column_name [, column_name]
    FROM table_name [LEFT | RIGHT | FULL] OUTER JOIN table_name
    ON table_name.ref_column_name join_operator table_name.ref_column_name

Consider the following example, in which all the publishers are displayed irrespective of whether an author is located in the same city or not.

Example

    SELECT pub_name, au_lname, au_fname
    FROM Publishers p LEFT OUTER JOIN Authors a ON p.City = a.City

LEFT OUTER JOIN ensures the inclusion of all the rows from the first table and the matching rows from the second table.

Result

    pub_name                au_lname    au_fname
    ----------------------  ----------  ----------
    New Moon Books          (null)      (null)
    Binnet & Hardley        (null)      (null)
    Algodata Infosystems    Carson      Cheryl
    Algodata Infosystems    Bennet      Abraham
    Five Lakes Publishing   (null)      (null)
    Ramona Publishers       (null)      (null)
    GGG&G                   (null)      (null)
    Scootney Books          (null)      (null)
    Lucerne Publishing      (null)      (null)

Observe that every publisher name from the Publishers table is displayed, and that only the matching records, in which the publisher's city and the author's city are the same, are selected from the Authors table and displayed.

An outer join can be combined with additional conditions. Consider the following example, in which all the titles are displayed, together with the store ID and order date of the sales made in 1999.

Example

    SELECT s.Stor_Id, s.Ord_Date, t.Title_Id
    FROM Sales s RIGHT OUTER JOIN Titles t
    ON s.Title_Id = t.Title_Id AND DATENAME (yy, s.Ord_Date) = '1999'

RIGHT OUTER JOIN ensures the inclusion of all the rows from the second table and the matching rows from the first table.

Rules for Outer Join




• An outer join returns all the records from the outer table, including those records that do not match any row in the related table.

• A single outer join operates on two tables; more tables are combined by nesting joins, as in the queries below.

• The IS NULL search condition must be avoided for the inner table.

QUERY 1

    SELECT 'Author' = RTRIM(au_lname) + ', ' + au_fname, 'Title' = title
    FROM (titleauthor AS TA
            RIGHT OUTER JOIN authors AS A ON (A.au_id = TA.au_id))
         FULL OUTER JOIN titles AS T ON (TA.title_id = T.title_id)
    WHERE A.state <> 'CA' OR A.state IS NULL
    ORDER BY 1

    Author                     Title
    -------------------------  ---------------------------------------------------------------
    NULL                       The Psychology of Computer Cooking
    Blotchet-Halls, Reginald   Fifty Years in Buckingham Palace Kitchens
    DeFrance, Michel           The Gourmet Microwave
    del Castillo, Innes        Silicon Valley Gastronomic Treats
    Greene, Morningstar        NULL
    Panteley, Sylvia           Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean
    Ringer, Albert             Is Anger the Enemy?
    Ringer, Albert             Life Without Fear
    Ringer, Anne               The Gourmet Microwave
    Ringer, Anne               Is Anger the Enemy?
    Smith, Meander             NULL

This query preserves both authors with no matching titles and titles with no matching authors. This might not be obvious, because RIGHT OUTER JOIN is clearly different from FULL OUTER JOIN.

QUERY 2

    SELECT 'Author' = au_lname + ', ' + au_fname, 'Title' = title
    FROM ((titleauthor AS TA
            FULL OUTER JOIN titles AS T ON TA.title_id = T.title_id)
          RIGHT OUTER JOIN authors AS A ON A.au_id = TA.au_id)
    WHERE A.state <> 'CA' OR A.state IS NULL
    ORDER BY 1

    Author                     Title
    -------------------------  ---------------------------------------------------------------
    Blotchet-Halls, Reginald   Fifty Years in Buckingham Palace Kitchens
    DeFrance, Michel           The Gourmet Microwave
    del Castillo, Innes        Silicon Valley Gastronomic Treats
    Greene, Morningstar        NULL
    Panteley, Sylvia           Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean
    Ringer, Albert             Is Anger the Enemy?
    Ringer, Albert             Life Without Fear
    Ringer, Anne               The Gourmet Microwave
    Ringer, Anne               Is Anger the Enemy?
    Smith, Meander             NULL

At a glance, Query 2 looks equivalent to Query 1, although the join order is slightly different. But notice how the results differ. This query did not achieve the goal of preserving the titles rows without corresponding authors: the row for The Psychology of Computer Cooking is excluded. This row would have been preserved by the first join operation:

    FULL OUTER JOIN titles AS T ON TA.title_id = T.title_id

But the row is then discarded, because the second join operation,

    RIGHT OUTER JOIN authors AS A ON A.au_id = TA.au_id

preserves only authors without matching titles. Because the title row for The Psychology of Computer Cooking is on the left-hand side of this join operation and only a RIGHT OUTER JOIN operation is specified, the row is discarded.
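One possible way to keep the join order of Query 2 and still preserve the unmatched title is to make the second join a FULL OUTER JOIN as well. The query below is only a sketch against the pubs sample database; it is an illustration of the point made above, not a query taken from this guide.

```sql
-- Sketch: same join order as Query 2, but the second join is also FULL,
-- so rows preserved by the first join are not thrown away by the second.
USE pubs
GO
SELECT 'Author' = RTRIM(au_lname) + ', ' + au_fname, 'Title' = title
FROM (titleauthor AS TA
        FULL OUTER JOIN titles AS T ON TA.title_id = T.title_id)
     FULL OUTER JOIN authors AS A ON A.au_id = TA.au_id
-- A.state IS NULL keeps the title rows that have no author at all
WHERE A.state <> 'CA' OR A.state IS NULL
ORDER BY 1
GO
```

With this form, the row for a title that has no entry in titleauthor should survive both joins, because neither join is allowed to drop rows from its left-hand side.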




Summary

• A subquery can be defined as a nested SELECT statement inside a SELECT, INSERT, UPDATE or DELETE statement

• A subquery can be used inside the WHERE or HAVING clause of the outer SELECT, INSERT, UPDATE or DELETE statement

• A subquery introduced with IN or NOT IN returns zero or more values

• A subquery used with the EXISTS clause returns TRUE or FALSE

• A subquery can itself contain one or more subqueries

• A correlated subquery can be defined as a subquery that depends on the outer query for its evaluation

• A join that includes more than one table without any condition in the WHERE clause is called a CROSS JOIN

• A join of a table with itself is called a SELF JOIN

• A join can be termed an OUTER JOIN when the result set contains all the rows from one table and the matching rows from another




REVIEW QUESTIONS

1. Can a subquery be placed in the FROM clause?
2. What is the output of a cross join?
3. What are the advantages of using subqueries?
4. Define an OUTER JOIN.
5. Differentiate between a FULL OUTER JOIN and a CROSS JOIN.




Chapter 9: Indexes

Objectives

• Understand the concept of Indexes

• Types of Indexes

• Fillfactor option




Introduction

Indexes in databases are similar to indexes in books. In a book, an index allows you to find information quickly without reading the entire book. In a database, an index allows the database program to find data in a table without scanning the entire table. An index in a book is a list of words with the page numbers that contain each word. An index in a database is a list of values in a table with the storage locations of the rows in the table that contain each value.

Indexes can be created on either a single column or a combination of columns in a table and are implemented in the form of B-trees. An index contains an entry with one or more columns (the search key) from each row in a table. A B-tree is sorted on the search key, and can be searched efficiently on any leading subset of the search key. For example, an index on columns A, B, C can be searched efficiently on A; on A, B; and on A, B, C.

Most books contain one general index of words, names, places, and so on. Databases contain individual indexes for selected types or columns of data: this is similar to a book that contains one index for names of people and another index for places. When you create a database and tune it for performance, you should create indexes for the columns used in queries to find data.

The performance benefits of indexes, however, do come with a cost. Tables with indexes require more storage space in the database. Also, commands that insert, update, or delete data can take longer and require more processing time to maintain the indexes. When you design and create indexes, you should ensure that the performance benefits outweigh the extra cost in storage space and processing resources.

Index Architecture

Objects in a SQL Server database are stored as a collection of 8-KB pages. The data rows for each table are stored in a collection of 8-KB data pages.

Each data page has a 96-byte header containing system information such as the identifier (ID) of the table that owns the page. The page header also includes pointers to the next and previous pages that are used if the pages are linked in a list. A row offset table is at the end of the page. Data rows fill the rest of the page.

Organization of Data Pages

SQL Server 2000 tables use one of two methods to organize their data pages:

• Clustered tables are tables that have a clustered index.

The data rows are stored in order based on the clustered index key. The index is implemented as a B-tree index structure that supports fast retrieval of the rows based on their clustered index key values. The pages in each level of the index, including the data pages in the leaf level, are linked in a doubly linked list, but navigation from one level to another is done using key values.

• Heaps are tables that have no clustered index.




The data rows are not stored in any particular order, and there is no particular order to the sequence of the data pages. The data pages are not linked in a linked list.

Structure of Indexes

An Index is created in the form of a B-tree structure, in which all the values of the Index expression are arranged in sorted order. An Index consists of a tree with a root from which navigation begins, possible intermediate Index levels, and the bottom-level leaf pages. The Index is used to find the correct leaf page. The number of levels in an Index varies depending on the number of rows in the table and the size of the key expression for the Index.

A B-tree provides fast access to data by searching on a key value of the Index. B-trees cluster records with similar keys. The B stands for balanced, and balancing the tree is a core feature of a B-tree's usefulness. The trees are managed, and branches are grafted as necessary, so that navigating down the tree to find a value and locate a specific record always takes only a few page accesses. Because the trees are balanced, finding any record requires about the same amount of resources, and retrieval speed is consistent because the Index has the same depth throughout.

[Figure: A heap, showing data pages 10 through 14 with the rows in no particular order]

If you create an Index using a large key, fewer entries will fit on a page, so more pages (and possibly more levels) will be needed for the Index. On a qualified retrieval or delete, the correct leaf page will be the lowest page of the tree in which one or more rows with the specified key or keys reside. In any Index, the leaf level contains every key value, in key sequence. There are two types of Index: clustered Index and nonclustered Index.

CLUSTERED INDEX

A clustered index creates an object in which the physical order of the rows is the same as the indexed order of the rows, and the bottom (leaf) level of the clustered index contains the actual data rows. A table or view is allowed one clustered index at a time. Creating a clustered index requires space available in your database equal to approximately 1.2 times the size of the data. This is space in addition to the space used by the existing table; the data is duplicated in order to create the clustered index, and the old, nonindexed data is deleted when the index is complete.

[Figure: A clustered index, with a root page above the leaf-level data pages]




Since this is a clustered index, the data pages and the leaf pages are the same. The root page has a reference to the record ANATR, which would be found on page 10. Note that the records in the data pages are in sorted order.

NONCLUSTERED INDEX

The non-clustered index is an object that specifies the logical ordering of a table. With a nonclustered index, the physical order of the rows is independent of their indexed order. The leaf level of a nonclustered index contains index rows. Each index row contains the nonclustered key value and one or more row locators that point to the row that contains the value. If the table does not have a clustered index, the row locator is the row's disk address. If the table does have a clustered index, the row locator is the clustered index key for the row.

Each table can have as many as 249 nonclustered indexes (regardless of how they are created: implicitly with PRIMARY KEY and UNIQUE constraints, or explicitly with CREATE INDEX). Each index can provide access to the data in a different sort order.
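As a hedged illustration of the two ways a nonclustered index can come into existence, the sketch below creates one implicitly through a UNIQUE constraint and one explicitly with CREATE INDEX. The table demo_customers, its columns, and the constraint and index names are hypothetical; they are not part of the pubs database.

```sql
USE pubs
GO
-- Hypothetical table used only for this illustration
CREATE TABLE demo_customers
(
    customer_id int         NOT NULL
        CONSTRAINT pk_demo_customers PRIMARY KEY CLUSTERED,
    email       varchar(60) NOT NULL
        -- implicit nonclustered index, created by the UNIQUE constraint
        CONSTRAINT uq_demo_customers_email UNIQUE NONCLUSTERED,
    city        varchar(30) NULL
)
GO
-- explicit nonclustered index
CREATE NONCLUSTERED INDEX city_ind ON demo_customers (city)
GO
-- lists all three indexes: the clustered primary key and both nonclustered indexes
EXEC sp_helpindex 'demo_customers'
GO
DROP TABLE demo_customers
GO
```

Because the table has a clustered index, the row locator stored in both nonclustered indexes would be the clustered key (customer_id) rather than a disk address.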




From the above diagram, it can be noted that a nonclustered index has the topmost root-level page, followed by the leaf pages and finally the data pages, which contain the data. For instance, the index entry for Barcelona would be found on leaf page 20. Leaf page 20 in turn indicates that the record containing the details of Barcelona is the 6th record on data page 11. Similarly, Bergamo would be found as the first record on data page 13.

Unique index

A unique index ensures that the indexed column contains no duplicate values. In the case of multicolumn unique indexes, the index ensures that each combination of values in the indexed column is unique. For example, if a unique index full_name is created on a combination of last_name, first_name, and middle_initial columns, no two people could have the same full name in the table.

Both clustered and nonclustered indexes can be unique. Therefore, provided that the data in the columns is unique, you can create both a unique clustered index and multiple unique nonclustered indexes on the same table.
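The full_name example above could be sketched as follows. The table demo_people is a hypothetical name introduced only for this illustration; the index name and column names follow the text.

```sql
USE pubs
GO
-- Hypothetical table for the multicolumn unique index described above
CREATE TABLE demo_people
(
    last_name      varchar(30) NOT NULL,
    first_name     varchar(30) NOT NULL,
    middle_initial char(1)     NOT NULL
)
GO
CREATE UNIQUE NONCLUSTERED INDEX full_name
ON demo_people (last_name, first_name, middle_initial)
GO
INSERT demo_people VALUES ('Ringer', 'Anne', 'M')
-- Allowed: only the combination of the three columns must be unique
INSERT demo_people VALUES ('Ringer', 'Albert', 'M')
-- Rejected with a duplicate key error: the same combination already exists
INSERT demo_people VALUES ('Ringer', 'Anne', 'M')
GO
DROP TABLE demo_people
GO
```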




Composite Index

A composite index is made of more than one column. The main restriction is on the size of the index key: in SQL Server 2000 a key can combine at most 16 columns and cannot exceed 900 bytes. If the index is composed of only fixed-size columns, the sum of their sizes must therefore not exceed 900 bytes.
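The sketch below shows a composite index and the kind of search conditions that match a leading subset of its key. The index name state_city_ind is hypothetical; the authors table and its state and city columns belong to the pubs sample database. Whether the optimizer actually uses the index also depends on the data, so the comments describe only what the key order makes possible.

```sql
USE pubs
GO
-- Hypothetical composite index: state is the leading column, city the second
CREATE NONCLUSTERED INDEX state_city_ind ON authors (state, city)
GO
-- Leading column only: the B-tree can be searched on state
SELECT au_lname, au_fname FROM authors WHERE state = 'CA'

-- Both key columns: the B-tree can be searched on (state, city)
SELECT au_lname, au_fname FROM authors WHERE state = 'CA' AND city = 'Oakland'

-- city alone is not a leading subset of the key, so this condition
-- cannot be used to navigate the B-tree from the root
SELECT au_lname, au_fname FROM authors WHERE city = 'Oakland'
GO
DROP INDEX authors.state_city_ind
GO
```

Putting the column that appears most often in search conditions first is the usual design choice, since only leading subsets of the key can be searched efficiently.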

CREATE INDEX Command

The syntax to create an Index is as follows:

    CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name
    ON table_name (column_name [, column_name]...)

Example for creating a SIMPLE INDEX

    USE pubs
    CREATE INDEX au_id_ind
    ON authors (au_id)

Example for creating a CLUSTERED INDEX

    USE pubs
    -- Table creation
    CREATE TABLE emp_pay
    (
        employeeID int NOT NULL,
        base_pay money NOT NULL,
        commission decimal(2, 2) NOT NULL
    )
    -- Values being added to the table
    INSERT emp_pay VALUES (1, 500, .10)
    INSERT emp_pay VALUES (2, 1000, .05)
    INSERT emp_pay VALUES (3, 800, .07)
    INSERT emp_pay VALUES (5, 1500, .03)
    INSERT emp_pay VALUES (9, 750, .06)
    -- Creation of clustered index
    CREATE UNIQUE CLUSTERED INDEX employeeID_ind
    ON emp_pay (employeeID)

This example creates an index on the employeeID column of the emp_pay table that enforces uniqueness. This index physically orders the data on disk because the CLUSTERED clause is specified.




FILL FACTOR OPTION

The FILLFACTOR option specifies a percentage that indicates how full SQL Server should make the leaf level of each index page during index creation. When an index page fills up, SQL Server must take time to split the index page to make room for new rows, which is quite expensive. For update-intensive tables, a properly chosen FILLFACTOR value yields better update performance than an improper FILLFACTOR value. The value of the original FILLFACTOR is stored with the index in sysindexes.

When FILLFACTOR is specified, SQL Server rounds up the number of rows to be placed on each page. For example, issuing CREATE CLUSTERED INDEX ... FILLFACTOR = 33 creates a clustered index with a FILLFACTOR of 33 percent. Assume that SQL Server calculates that 5.2 rows is 33 percent of the space on a page. SQL Server rounds up, so that six rows are placed on each page.
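A minimal sketch of specifying a fill factor and then reading back the stored value is shown here. The index name city_fill_ind and the value 70 are assumptions chosen for illustration; OrigFillFactor is the sysindexes column that holds the original fill factor in SQL Server 2000.

```sql
USE pubs
GO
-- Hypothetical index that leaves roughly 30 percent of each leaf page free
CREATE NONCLUSTERED INDEX city_fill_ind
ON authors (city)
WITH FILLFACTOR = 70
GO
-- The original fill factor is stored with the index in sysindexes
SELECT name, OrigFillFactor
FROM sysindexes
WHERE id = OBJECT_ID('authors')
  AND name = 'city_fill_ind'
GO
DROP INDEX authors.city_fill_ind
GO
```

A value such as 70 is only a starting point for a table that receives frequent inserts; the right figure depends on how quickly the pages fill between index rebuilds.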

Note

An explicit FILLFACTOR setting applies only when the index is first created. SQL Server does not dynamically keep the specified percentage of empty space in the pages.

User-specified FILLFACTOR values can be from 1 through 100. If no value is specified, the default is 0.

Use a FILLFACTOR of 100 only if no INSERT or UPDATE statements will occur, such as with a read-only table. If FILLFACTOR is 100, SQL Server creates indexes with leaf pages 100 percent full. An INSERT or UPDATE made after the creation of an index with a 100 percent FILLFACTOR causes page splits for each INSERT and possibly each UPDATE.

Smaller FILLFACTOR values, except 0, cause SQL Server to create new indexes with leaf pages that are not completely full. For example, a FILLFACTOR of 10 can be a reasonable choice when creating an index on a table known to contain a small portion of the data that it will eventually hold. Smaller FILLFACTOR values also cause each index to take more storage space.

The following table illustrates how the pages of an index are filled up if FILLFACTOR is specified.

    FILLFACTOR        Intermediate page    Leaf page
    ----------------  -------------------  ---------------------------
    0 percent         One free entry       100 percent full
    1 - 99 percent    One free entry       <= FILLFACTOR percent full
    100 percent       One free entry       100 percent full

The following example uses the FILLFACTOR clause set to 100. A FILLFACTOR of 100 fills every page completely and is useful only when you know that index values in the table will never change.

    USE pubs
    CREATE NONCLUSTERED INDEX zip_ind
    ON authors (zip)
    WITH FILLFACTOR = 100

Creating Useful Indexes

Creating useful Indexes is one of the most important tasks you can do to achieve good performance. Indexes can dramatically speed up data retrieval and selection, but they are a drag on data modification because, along with changes to the data, the Index entries must also be maintained and those changes must be logged. The key to creating useful Indexes is understanding the uses of the data, the types and frequencies of queries performed, and how queries can use Indexes to help SQL Server find your data quickly. A CRUD chart or similar analysis technique can be invaluable in this effort.

You might want to quickly review the difference between clustered and nonclustered Indexes, because the difference is crucial in deciding what kinds of Index to create. Clustered and nonclustered Indexes are similar at the upper (node) levels: both are organized as B-trees. Index rows above the leaf level contain Index key values and pointers to pages at the next level down. Each row keeps track of the first key value on the page it points to. The figure below shows an abstract view of an Index node for an Index on a customer's last name. The entry Johnson indicates page 1:200 (file 1, page 200), which is at the next level of the Index. Since Johnson and Jones are consecutive entries, all the entries on page 1:200 have values between Johnson (inclusive) and Jones (exclusive).

    Key        Page Number
    ---------  -----------
    Jackson    1:147
    Jenson     1:210
    Johnson    1:200
    Jones      1:186
    Juniper    1:202

Figure: An Index node page

The leaf, or bottom, level of the Index is where clustered and nonclustered Indexes differ. For both kinds of Index, the leaf level contains every key value in the table on which the Index is built, and those keys are in sorted order. In a clustered Index, the leaf level is the data level, so of course every key value is present. This means that the data in a table is sorted in the order of the clustered Index. In a nonclustered Index, the leaf level is separate from the data. In addition to the key values, the Index rows contain a bookmark indicating where to find the actual data. If the table has a clustered Index, the bookmark is the clustered Index key that corresponds to the nonclustered key in the row. (If the clustered key is composite, all parts of the key are included.)

Choose the Clustered Index Carefully

Clustered Indexes are extremely useful for range queries (for example, WHERE sales_quantity BETWEEN 500 AND 1000) and for queries in which the data must be ordered to match the clustering key. Only one clustered Index can exist per table, since it defines the physical ordering of the data for that table. Since you can have only one clustered Index per table, you should choose it carefully, based on the most critical retrieval operations. Because of the clustered Index's role in managing space within the table, nearly every table should have one. And if a table has only one Index, it should probably be clustered.

If a table is declared with a primary key (which is advisable), by default the primary key columns form the clustered Index. Again, this is because almost every table should have a clustered Index, and if the table has only one Index, it should probably be clustered. But if your table has several Indexes, some other Index might better serve as the clustered Index. This is often true when you do single-row retrieval by primary key.
A nonclustered, unique Index works nearly as well in this case and still enforces the primary key's uniqueness. So save your clustered Index for something that will benefit more from it, by adding the keyword NONCLUSTERED when you declare the PRIMARY KEY constraint.

Make Nonclustered Indexes Highly Selective

A query using an Index on a large table is often dramatically faster than a query doing a table scan. But this is not always true, and table scans are not all inherently evil. Nonclustered Index retrieval means reading B-tree entries to determine the data page that is pointed to and then retrieving the page, going back to the B-tree, retrieving another data page, and so on, until many data pages are read over and over. (Subsequent retrievals can be from cache.) With a table scan, the pages are read only once. If the Index does not disqualify a large percentage of the rows, it is cheaper to simply scan the data pages, reading every page exactly once.

The query optimizer greatly favors clustered over nonclustered Indexes, because in scanning a clustered Index the system is already scanning the data pages. Once it is at the leaf of the Index, the system has the data as well. So there is no need to read the B-tree, read the data page, and so on. This is why nonclustered Indexes must be able to eliminate a large percentage of rows to be useful (that is, they must be highly selective), whereas clustered Indexes are useful even with less selectivity.

Indexing on columns used in the WHERE clause of frequent or critical queries is often a big win, but this usually depends on how selective the Index is likely to be. For example, if a query has the clause WHERE last_name = 'Stankowski', an Index on last_name is likely to be very useful; it can probably eliminate 99.9 percent of the rows from consideration. On the other hand, a nonclustered Index will probably not be useful on a clause of WHERE sex = 'M', because it eliminates only about half of the rows from consideration; the repeated steps needed to read the B-tree entries just to read the data require far more I/O operations than simply making one single scan through all the data. So nonclustered Indexes are typically not useful on columns that do not have a wide dispersion of values.

Think of selectivity as the percentage of qualifying rows in the table (qualifying rows / total rows). If the ratio of qualifying rows to total rows is low, the Index is highly selective and is most useful. If the Index is used, it can eliminate most of the rows in the table from consideration and greatly reduce the work that must be performed. If the ratio of qualifying rows to total rows is high, the Index has poor selectivity and will not be useful. A nonclustered Index is most useful when the ratio is around 5 percent or less, that is, if the Index can eliminate 95 percent of the rows from consideration. If the ratio is much higher than that, the Index probably will not be used; either a different Index will be chosen or the table will be scanned. Recall that each Index has a histogram of sampled data values for the Index key, which the optimizer uses to estimate whether the Index is selective enough to be useful to the query.
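The ratio of qualifying rows to total rows described above can be estimated with a simple query before an index is created. The sketch below does this for the state column of the authors table in pubs; the choice of column and the variable name @total are assumptions made for illustration.

```sql
USE pubs
GO
DECLARE @total int
SELECT @total = COUNT(*) FROM authors

-- For each state: how many rows qualify, and what percentage of the table that is.
-- A value that accounts for a large percentage is a poor target for a nonclustered index.
SELECT state,
       COUNT(*) AS qualifying_rows,
       100.0 * COUNT(*) / @total AS pct_of_table
FROM authors
GROUP BY state
ORDER BY pct_of_table DESC
GO
```

If most values fall well under the 5 percent guideline mentioned above, a nonclustered index on the column is more likely to be useful; a column where one value dominates is more likely to be scanned for that value.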
Tailor Indexes to Critical Transactions

Indexes speed data retrieval at the cost of additional work for data modification. To determine a reasonable number of Indexes, you must consider the frequency of updates vs. retrievals and the relative importance of the competing types of work. If your system is almost purely a decision-support system (DSS) with little update activity, it makes sense to have as many Indexes as will be useful to the queries being issued. A DSS might reasonably have a dozen or more Indexes on a single table. If you have a predominantly online transaction processing (OLTP) application, you need relatively few Indexes on a table, probably just a couple of carefully chosen ones.

Look for opportunities to achieve Index coverage in queries, but don't get carried away. An Index "covers" the query if it has all the data values needed as part of the Index key. For example, if you have a query such as SELECT emp_name, emp_sex FROM employee WHERE emp_name LIKE 'Sm%' and you have a nonclustered Index on emp_name, it might make sense to append the emp_sex column to the Index key as well. Then the Index will still be useful for the selection, but it will already have the value for emp_sex. The optimizer won't need to read the data page for the row to get the emp_sex value; the optimizer is smart enough to simply get the value from the B-tree key. The emp_sex column is probably a char(1), so the column doesn't add greatly to the key length, and this is good.

Every nonclustered Index is a covering Index if all you are interested in is the key column of the Index. For example, if you have a nonclustered Index on first name, it covers all these queries:

• Select all the first names that begin with K.
• Find the first name that occurs most often.
• Determine whether the table contains the name Melissa.

In addition, if the table also has a clustered Index, every nonclustered Index includes the clustering key. So it can also cover any queries that need the clustered key value in addition to the nonclustered key. For example, if our nonclustered Index is on the first name and the table has a clustered Index on the last name, the following queries can all be satisfied by accessing only leaf pages of the B-tree:

• Select Tibor's last name.
• Determine whether any duplicate first and last name combinations exist.
• Find the most common first name for people with the last name Wong.
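The covering-index idea from the emp_name and emp_sex example above can be sketched as follows. The table is named demo_employee here (a hypothetical name, chosen because pubs already contains a different employee table), and the index name and the emp_salary column are likewise assumptions.

```sql
USE pubs
GO
-- Hypothetical table for the covering index illustration
CREATE TABLE demo_employee
(
    emp_id     int         NOT NULL PRIMARY KEY NONCLUSTERED,
    emp_name   varchar(40) NOT NULL,
    emp_sex    char(1)     NOT NULL,
    emp_salary money       NULL
)
GO
-- emp_sex is appended to the key so that the index holds every column the query needs
CREATE NONCLUSTERED INDEX emp_name_sex_ind
ON demo_employee (emp_name, emp_sex)
GO
-- Covered: both columns in the select list are part of the index key
SELECT emp_name, emp_sex
FROM demo_employee
WHERE emp_name LIKE 'Sm%'

-- Not covered: emp_salary is not in the index, so the data pages must be read
SELECT emp_name, emp_salary
FROM demo_employee
WHERE emp_name LIKE 'Sm%'
GO
DROP TABLE demo_employee
GO
```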




You can go too far and add all types of fields to the Index. The net effect is that the Index becomes a virtual copy of the table, just organized differently. Far fewer Index entries fit on a page, I/O increases, cache efficiency is reduced, and much more disk space is required. The covered queries technique can improve performance in some cases, but you should use it with discretion.

A unique Index (whether nonclustered or clustered) offers the greatest selectivity (that is, only one row can match), so it is most useful for queries that are intended to return exactly one row. Nonclustered Indexes are great for single-row access via the PRIMARY KEY or UNIQUE constraint values in the WHERE clause.

Indexes are also important for data modifications, not just for queries. They can speed data retrieval for selecting rows, and they can speed the data retrieval needed to find the rows that must be modified. In fact, if no useful Index for search operations exists, the only alternative is for SQL Server to scan the table to look for qualifying rows. Update or delete operations on only one row are common; you should do these operations using the primary key (or other UNIQUE constraint Index) values to be assured that there is a useful Index to that row and no others.

A need to update Indexed columns can affect the update strategy chosen. For example, to update a column that is part of the key of the clustered Index on a table, the update must be processed as a delete followed by an insert rather than as an update-in-place. When you decide which columns to Index, especially which columns to make part of the clustered Index, consider the effect the Index will have on the update method used.

Index Columns Used in Joins

Index columns are frequently used to join tables. When you create a PRIMARY KEY or UNIQUE constraint, an Index is automatically created for you. But no Index is automatically created for the referencing columns in a FOREIGN KEY constraint. Such columns are frequently used to join tables, so they are almost always among the most likely ones on which to create an Index. If your primary key and foreign key columns are not naturally compact, consider creating a surrogate key using an identity column (or a similar technique). As with row length for tables, if you can keep your Index keys compact, you can fit many more keys on a given page, which results in less physical I/O and better cache efficiency. And if you can join tables based on integer values such as an identity, you avoid having to do relatively expensive character-by-character comparisons. Ideally, columns used to join tables are integer columns, which are fast and compact.

Join density is the average number of rows in one table that match a row in the table it is being joined to. You can also think of density as the average number of duplicates for an Index key. A column with a unique Index has the lowest possible density (there can be no duplicates) and is therefore extremely selective for the join. If a column being joined has a large number of duplicates, it has a high density and is not very selective for joins.

Joins are frequently processed as nested loops. For example, when joining the orders table with order_items, the system might start with the orders table (the outer table) and then, for each qualifying order row, search the inner table for corresponding rows. Think of the join being processed as, "Given a specific row in the outer table, go find all corresponding rows in the inner table." If you think of joins in this way, you'll realize that it is important to have a useful Index on the inner table, which is the one being searched for a specific value. For the most common type of join, an equijoin that looks for equal values in columns of two tables, the optimizer automatically decides which is the inner table and which is the outer table of the join. The table order that you specify for the join doesn't matter in the equijoin case. However, the order for outer joins must match the semantics of the query, so the resulting order is dependent on the order specified. We'll talk about join strategies later in this chapter.
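As a sketch of indexing a foreign key column that is used in joins, the example below adds an index on titles.pub_id, which references publishers.pub_id in the pubs database. The index name titles_pub_id_ind is hypothetical, and whether the optimizer uses the index for this particular query depends on the data.

```sql
USE pubs
GO
-- The FOREIGN KEY on titles.pub_id does not create an index by itself,
-- so one is created explicitly (hypothetical name)
CREATE NONCLUSTERED INDEX titles_pub_id_ind ON titles (pub_id)
GO
-- For each qualifying publisher row (outer table), the matching rows in
-- titles (inner table) can now be located through the new index
SELECT p.pub_name, t.title
FROM publishers p JOIN titles t
    ON t.pub_id = p.pub_id
WHERE p.pub_name = 'New Moon Books'
GO
DROP INDEX titles.titles_pub_id_ind
GO
```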




To view indexes on a table

    sp_helpindex <table_name>

Example

    USE pubs
    EXEC sp_helpindex titles

DROP INDEX

Removes one or more indexes from the current database.

Note: The DROP INDEX statement does not apply to indexes created by defining PRIMARY KEY or UNIQUE constraints (created by using the PRIMARY KEY or UNIQUE options of either the CREATE TABLE or ALTER TABLE statements, respectively).

Syntax

    DROP INDEX 'table.index | view.index' [ ,...n ]

Example:

    DROP INDEX authors.au_id_ind

Usage of the IGNORE_DUP_KEY, DROP_EXISTING and SORT_IN_TEMPDB keywords

    CREATE UNIQUE CLUSTERED INDEX idx
    ON Publishers (pub_id)
    WITH FILLFACTOR = 60, IGNORE_DUP_KEY, DROP_EXISTING, SORT_IN_TEMPDB

IGNORE_DUP_KEY

With this keyword, a row in an INSERT statement that would violate the uniqueness required by the index is ignored with a warning instead of causing the whole statement to fail.

DROP_EXISTING

This keyword replaces the existing index of the same name created on the table. An index with that name must already exist.

SORT_IN_TEMPDB

This keyword specifies that the sort operation used to build the index should take place in tempdb.
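The effect of IGNORE_DUP_KEY can be seen with a small experiment. The table demo_codes and the index name code_ind are hypothetical and are used here only as an illustration.

```sql
USE pubs
GO
-- Hypothetical table for the IGNORE_DUP_KEY illustration
CREATE TABLE demo_codes (code char(5) NOT NULL)
GO
CREATE UNIQUE NONCLUSTERED INDEX code_ind
ON demo_codes (code)
WITH IGNORE_DUP_KEY
GO
INSERT demo_codes VALUES ('ANATR')
-- The duplicate row is discarded with a warning instead of raising an error
INSERT demo_codes VALUES ('ANATR')
GO
-- Only the first row was stored
SELECT COUNT(*) AS rows_stored FROM demo_codes
GO
DROP TABLE demo_codes
GO
```

Without IGNORE_DUP_KEY, the second INSERT would fail with a duplicate key error rather than a warning.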




Summary

• Indexes are created to enhance the performance of queries

• There are two types of indexes: clustered and non-clustered

• Data is physically sorted in a clustered index

• A clustered index should be built on attributes whose values are highly unique and do not change often

• In a non-clustered index, the physical order of rows is not the same as that of the index order




REVIEW QUESTIONS

1. What is an index?
2. How many clustered indexes can be created on a table?
3. Does index creation on a table increase performance?
4. How can you view the indexes created on a table?
5. When a table is dropped from the database, does an index automatically get dropped or does it have to be dropped explicitly?




Chapter 10: Views

Objectives

• What is a View

• Creation of a view

• Alter views

• Concept of Derived Tables




Views

A view is a virtual table that consists of a subset of columns from one or more tables. Although it looks like a table, it is not stored in the database as a physical structure; it is a query stored as an object. A view therefore derives its data from one or more tables, which are referred to as the base or underlying tables. A view does not store any data itself. It acts as a filter on the underlying tables in which the data is stored. The SELECT statement that defines the view can draw on one or more underlying tables or on other views. Once a view is defined, you can reference it like any other table in the database.

A view serves as a security mechanism. It ensures that users are able to retrieve and modify only the data exposed to them; the remaining data in the underlying tables can neither be seen nor accessed. A view also simplifies querying: a complex query can be stored as a view, and its data can then be extracted with simple queries.

A view is created by using the CREATE VIEW statement.

Syntax:

    CREATE VIEW [ < database_name > . ] [ < owner > . ] view_name [ ( column [ ,...n ] ) ]
    [ WITH < view_attribute > [ ,...n ] ]
    AS
    select_statement
    [ WITH CHECK OPTION ]

    < view_attribute > ::=
        { ENCRYPTION | SCHEMABINDING | VIEW_METADATA }

Arguments

view_name
Is the name of the view. View names must follow the rules for identifiers. Specifying the view owner name is optional.

column
Is the name to be used for a column in a view. Naming a column in CREATE VIEW is necessary only when a column is derived from an arithmetic expression, a function, or a constant, when two or more columns may otherwise have the same name (usually because of a join), or when a column in a view is given a name different from that of the column from which it is derived. Column names can also be assigned in the SELECT statement. If column is not specified, the view columns acquire the same names as the columns in the SELECT statement.

AS
Are the actions the view is to perform.

select_statement
Is the SELECT statement that defines the view. It can use more than one table and other views. To select from the objects referenced in the SELECT clause of a view, it is necessary to have the appropriate permissions on those objects.

WITH CHECK OPTION
Forces all data modification statements executed against the view to adhere to the criteria set within select_statement. When a row is modified through a view, WITH CHECK OPTION ensures that the data remains visible through the view after the modification is committed.

WITH ENCRYPTION
Indicates that SQL Server encrypts the system table columns containing the text of the CREATE VIEW statement. Using WITH ENCRYPTION prevents the view from being published as part of SQL Server replication.

SCHEMABINDING
Binds the view to the schema. When SCHEMABINDING is specified, the select_statement must include the two-part names (owner.object) of the tables, views, or user-defined functions referenced. Views or tables participating in a view created with the schema binding clause cannot be dropped unless that view is dropped or changed so that it no longer has schema binding. Otherwise, SQL Server raises an error.

Creation of a simple view

This example creates a view with a simple SELECT statement. A simple view is helpful when a combination of columns is queried frequently. CREATE VIEW must be the first statement in its batch, so a GO separates it from the USE statement.

    USE pubs
    GO
    CREATE VIEW titles_view
    AS
    SELECT title, type, price, pubdate
    FROM titles
    GO

To retrieve data from the view, issue the following query:

    SELECT * FROM titles_view

Usage of WITH ENCRYPTION keyword

This example uses the WITH ENCRYPTION option and shows computed columns, renamed columns, and multiple columns.

    USE pubs
    GO
    CREATE VIEW accounts (title, advance, amt_due)
    WITH ENCRYPTION
    AS
    SELECT title, advance, price * royalty * ytd_sales
    FROM titles
    WHERE price > $5
    GO

When the WITH ENCRYPTION keyword is used, the text of the view in the syscomments table is encrypted. The following query reads that text from the syscomments table; for an encrypted view the text column comes back unreadable.

    USE pubs
    SELECT c.id, c.text
    FROM syscomments c, sysobjects o
    WHERE c.id = o.id AND o.name = 'accounts'

Use WITH CHECK OPTION

This example shows a view named CAonly that allows data modifications to apply only to authors within the state of California.

    USE pubs
    IF EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS
               WHERE TABLE_NAME = 'CAonly')
        DROP VIEW CAonly
    GO
    CREATE VIEW CAonly
    AS
    SELECT au_lname, au_fname, city, state
    FROM authors
    WHERE state = 'CA'
    WITH CHECK OPTION
    GO

Guidelines for creating views

• Views can be created only in the current database.

• A view's name should be similar to the name of its base table so that it is easy to remember and associate. It must follow the naming conventions for identifiers.

• A view can be built on other views. SQL Server allows views to be nested up to 32 levels. A view can contain up to 1024 columns from one or more tables or views.

• Defaults, rules, and AFTER triggers cannot be associated with a view (INSTEAD OF triggers are allowed).

• A view cannot reference temporary tables.

• The view definition remains in the database even if an underlying table is dropped, but any query that then uses the view fails until the table exists again.

• The query defining the view cannot include the COMPUTE or COMPUTE BY clauses or the INTO keyword, and it cannot include ORDER BY unless the select list also uses TOP.

Modifying data through views

A view can be used to modify the data in the table provided:

• The view contains at least one table in the FROM clause of the view definition; that means the view cannot be based solely on an expression.

• No aggregate functions (AVG, COUNT, SUM, MIN, MAX, GROUPING) and no GROUP BY, UNION, DISTINCT, or TOP clauses are used in the select list. However, aggregate functions can be used within a subquery defined in the FROM clause, provided that the derived values generated by the aggregate are not modified.

• The view has no derived columns in the select list. Derived columns are result set columns formed by anything other than a simple column expression, such as using functions or addition or subtraction operators.
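As a minimal sketch of these rules in action, the statements below modify the authors table through the CAonly view created earlier in this chapter. The sketch assumes the unmodified pubs sample data, in which the author Marjorie Green lives in California; the row chosen and the new city are purely illustrative. Everything runs inside a transaction that is rolled back, so pubs is left unchanged.

```sql
USE pubs
GO
BEGIN TRANSACTION

-- Allowed: the row still satisfies state = 'CA' after the change,
-- so it remains visible through the view.
UPDATE CAonly
SET city = 'San Jose'
WHERE au_lname = 'Green' AND au_fname = 'Marjorie'

-- Rejected: the row would no longer qualify for the view,
-- so WITH CHECK OPTION makes this statement fail.
UPDATE CAonly
SET state = 'UT'
WHERE au_lname = 'Green' AND au_fname = 'Marjorie'

-- Undo the first change so the sample database is left as it was.
ROLLBACK TRANSACTION
GO
```

The first UPDATE is expected to succeed and the second to fail with an error that mentions WITH CHECK OPTION. Without WITH CHECK OPTION in the view definition, the second UPDATE would succeed and the row would simply disappear from the view.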

ALTER VIEW

Alters a previously created view (created by executing CREATE VIEW), including indexed views, without affecting dependent stored procedures or triggers and without changing permissions.

This example creates a view called All_authors that contains all authors.

    CREATE VIEW All_authors (au_fname, au_lname, address, city, zip)
    AS
    SELECT au_fname, au_lname, address, city, zip
    FROM pubs..authors
    GO

Due to a change in requirements, the view must now select only authors from Utah. ALTER VIEW is used to replace the view definition. If the view were dropped and re-created instead, the GRANT statements that give permissions on the view would have to be re-entered.

    ALTER VIEW All_authors (au_fname, au_lname, address, city, zip)
    AS
    SELECT au_fname, au_lname, address, city, zip
    FROM pubs..authors
    WHERE state = 'UT'

Indexed Views

Indexed views were introduced with SQL Server 2000. An index can be created on a view in any edition, but only the Enterprise Edition query optimizer considers an indexed view automatically when it builds a query plan. In the Standard and Personal editions the optimizer uses the indexed view only when the query references the view with the NOEXPAND hint.

Until recently, the idea of indexing a view seemed absurd, because a view is a virtual table that has no real data of its own and there appeared to be nothing to index. An indexed view changes this: when the index is created, the result set of the view is stored in the database. This makes the SQL Server 2000 indexed view the equivalent of Oracle's materialized view, and it can improve the performance of complex queries drastically, because the information in the view can be looked up directly instead of being recomputed.

The first index created on a view has to be a unique clustered index. Once this first index is created, additional non-clustered indexes can be created on the view.

It is best to build indexed views on data that is not frequently updated, because maintaining an indexed view costs more than maintaining a table index. If the data on which the indexed view is built is updated frequently, the maintenance cost may overshadow the performance benefit of using the indexed view.

Indexed views can improve the performance of the following type of query:

• Aggregations and joins that process a large number of rows.

Indexed views cannot improve the performance of the following types of workload and query:

• Database operations that consist of a large number of updates.

• OLTP systems with many write operations.

• Queries without aggregations or joins.

• Aggregations of data having a high degree of cardinality for the key. (A high degree of cardinality indicates that the key holds numerous distinct values. In the case of a unique key, every row has a different key value, which gives the highest possible degree of cardinality.)
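The following is a minimal sketch of how an indexed view might be created, not a definitive recipe. The table dbo.order_lines, the view dbo.product_qty_totals, and the index name are hypothetical names introduced for illustration; the sketch assumes it is run by the database owner in a scratch database. The SET statements establish the session options that SQL Server 2000 requires when an index is created on a view.

```sql
-- Session options required for creating an index on a view.
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
GO

-- Hypothetical base table, created under the options set above.
CREATE TABLE dbo.order_lines (
    line_id    int      NOT NULL PRIMARY KEY,
    product_id int      NOT NULL,
    qty        smallint NOT NULL
)
GO

-- The view is schema-bound and uses two-part names.
-- COUNT_BIG(*) is required because the view uses GROUP BY.
CREATE VIEW dbo.product_qty_totals
WITH SCHEMABINDING
AS
SELECT product_id,
       SUM(qty)     AS total_qty,
       COUNT_BIG(*) AS row_count
FROM dbo.order_lines
GROUP BY product_id
GO

-- The first index on a view must be unique and clustered.
CREATE UNIQUE CLUSTERED INDEX ix_product_qty_totals
ON dbo.product_qty_totals (product_id)
GO

-- NOEXPAND asks the optimizer to read the stored view data directly,
-- which matters in editions that do not pick the indexed view automatically.
SELECT product_id, total_qty
FROM dbo.product_qty_totals WITH (NOEXPAND)
GO
```

Aggregating over a NOT NULL column keeps the sketch simple, because an indexed view may not apply SUM to an expression that can be NULL.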

There are some restrictions for building indexes on views:

• The SCHEMABINDING option must be used with the view. This option binds the view to the tables specified in the view definition. Once the objects involved in a view are bound, they cannot be dropped, or altered in a way that would affect the view, until the schema-bound view is dropped or changed so that it is no longer schema-bound. SCHEMABINDING thereby prevents the view from becoming "orphaned". For example, suppose someone tries to drop a table on which a view has been created. If the view is schema-bound, the attempt fails and the view remains usable.

• Any user-defined function that the view references must itself have been created with the SCHEMABINDING option.

• The view can reference only base tables and user-defined functions; it cannot reference any other view.

• The objects referenced in the view must be in the same database as the view.




• The objects referenced in the view must use a two-part naming convention such as dbo.Employees (dbo is the owner and Employees is the database object). The view can reference only objects that have the same owner as the view itself. For example, a view owned by the user dbo can reference objects owned by dbo alone.

Drop View command

Removes one or more views from the current database. DROP VIEW can be executed against indexed views.

Syntax:

    DROP VIEW <view_name>

Renaming a View

A view can be renamed without having to drop it. The guidelines to be followed are as follows:

• The view must be in the current database.

• The new name for the view must follow the rules for identifiers.

• A view can be renamed only by its owner or by the owner of the database.

A view can be renamed using the sp_rename system stored procedure.

Syntax:

    sp_rename old_viewname, new_viewname

Example:

    EXEC sp_rename 'All_authors', 'UT_authors'




Summary

• A view is a virtual table, which consists of a subset of columns from one or more tables.

• A view derives its data from one or more tables known as base or underlying tables

• Views serve as security mechanisms, thus protecting the data in the base tables.

• A view can be created with the CREATE VIEW statement.

• SQL Server allows data to be modified only in one of the underlying tables when using views, even if the view is derived from multiple underlying tables.

• A view can be modified with the ALTER VIEW statement.

• A view can be dropped with the DROP VIEW statement.

• A view can be renamed with the sp_rename stored procedure.

• The text of the view is stored in the SYSCOMMENTS system table.




REVIEW QUESTIONS

1. Why is a view called a virtual table?

2. What are the advantages of creating a view?

3. Does a view contain data?

4. When a DELETE command is executed on a view, where is the data removed from?

5. Can a single UPDATE statement modify details from more than one base table?




Chapter 11: Programming with Transact-SQL

Objectives

• To understand the need for Transact-SQL

• IF…ELSE construct

• Print statement




What is Transact-SQL?

Transact-SQL (T-SQL for short) is a programming language that executes within the database engine. In addition to all the features of SQL, it offers a number of extensions, such as control of flow, looping, and error handling. Using these extensions, you can write complex routines entirely in SQL.

T-SQL simplifies application development and reduces the amount of network traffic between the client application and the server by allowing more code to be processed at the server. Using T-SQL you can create code blocks or procedures that can be called from client applications written in various programming languages such as C++ and Visual Basic.

T-SQL lacks many features of general-purpose programming languages and development environments: it has no user interface and no file or device I/O. It is not suited to developing an entire application. Rather, you would mainly use it to develop stored procedures, which can be called from other development environments.

Writing a T-SQL Procedure

In its simplest form, a T-SQL procedure is a collection of any valid SQL statements. For example, the following single statement would make a T-SQL procedure, though you can just as well execute it by typing it in the ISQLW window:

    select * from employee

Using Variables

You can use variables to store results temporarily, so that they can be used later as and when required. The following example illustrates the use of local variables:

    declare @minsal int, @maxsal int
    select @minsal = min (salary), @maxsal = max (salary) from employee
    select @minsal, @maxsal

This program begins by declaring two local variables. The name of a local variable must begin with a single @ character, and the data type of each variable must be specified after the variable name. The second statement shows how the output of a normal SQL query can be stored in a variable: the result of min (salary) is stored in the variable @minsal, while the result of max (salary) is stored in the variable @maxsal. The third statement displays the values of the local variables on the user console, by using a SELECT statement without any FROM clause.

A SELECT statement that assigns its result to a local variable is allowed to return multiple rows. In that case, however, only the value from the last row is available to the next statement. For example, the following code runs without an error, but may be logically incorrect:

    declare @sal int
    select @sal = salary from employee where dept_code = 'MKTG'
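One way to guard against this pitfall is sketched below. It is an illustration under stated assumptions rather than the only approach: the temporary table #employee and its three rows are hypothetical stand-ins for the employee table used in this chapter, and the variable @matches is introduced only for this example. The code counts the qualifying rows first and falls back to an aggregate when more than one row matches, so the value assigned never depends on row order.

```sql
-- Hypothetical stand-in for the employee table used in this chapter.
CREATE TABLE #employee (emp_name varchar(30), dept_code varchar(4), salary int)
INSERT INTO #employee VALUES ('Asha',  'MKTG', 30000)
INSERT INTO #employee VALUES ('Ravi',  'MKTG', 45000)
INSERT INTO #employee VALUES ('Meena', 'FIN',  52000)

DECLARE @sal int, @matches int

-- Count the qualifying rows before assigning anything.
SELECT @matches = COUNT(*) FROM #employee WHERE dept_code = 'MKTG'

IF @matches = 1
    -- Exactly one row: a plain assignment is unambiguous.
    SELECT @sal = salary FROM #employee WHERE dept_code = 'MKTG'
ELSE
    -- Several rows (or none): an aggregate always yields a single value.
    SELECT @sal = MAX(salary) FROM #employee WHERE dept_code = 'MKTG'

SELECT @matches AS matching_rows, @sal AS salary_used   -- returns 2 and 45000

DROP TABLE #employee
```

Whether MAX, MIN, or an error is the right response to multiple matches depends on the requirement; the point of the sketch is only that the choice is made explicitly.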




In Transact-SQL, there are two kinds of variables, local and global. The variable @sal declared above is lost when the execution of the batch is over. Such a variable is called a local variable, and because users define them, local variables are also called user-defined variables.

Global variables are declared by the server and are typically assigned values by the server. They are system-supplied, pre-defined variables, and they are distinguished from local variables by the two @ signs that precede their names. The server updates global variables on an ongoing basis. The following table contains a list of useful global variables.

    Variable Name        Value Returned
    @@version            Version, build date, and processor type of the current SQL Server installation
    @@servername         Name of the local SQL Server
    @@spid               Server process ID number of the current process
    @@procid             Stored procedure ID of the currently executing procedure
    @@error              0 if the last statement succeeded, else the last error number
    @@rowcount           Number of rows affected by the last statement, 0 if no rows affected
    @@connections        Number of logins or attempted logins since SQL Server was last started
    @@trancount          Number of currently active transactions for the current connection
    @@max_connections    Maximum number of simultaneous connections allowed
    @@total_errors       Number of disk read/write errors encountered since SQL Server was last started

If you declare a local variable that has the same name as a global variable, then the variable will be treated as a local variable.

Example: To display the server name using a global variable

    select 'The name of the server is ' + @@servername

Including Comments

T-SQL supports two forms of comments in source code, as illustrated in the following code example:

    declare @minsal int, @maxsal int -- declare variables
    /* The following code retrieves the minimum and the
       maximum salary values, and assigns them to the
       @minsal and @maxsal variables respectively. */
    select @minsal = min (salary), @maxsal = max (salary) from employee
    select @minsal, @maxsal -- display results

A double hyphen (--) introduces a single-line comment, whereas the C-style pair /*…*/ marks a block of one or more lines as a comment.
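Returning to the global variables tabulated in this section, the sketch below shows a common way of reading @@error and @@rowcount after a data modification. The temporary table #employee and the variables @err and @rows are hypothetical and exist only for this illustration. Because the server resets both global variables after every statement, the sketch copies them into local variables in a single SELECT immediately after the UPDATE.

```sql
-- Hypothetical stand-in for the employee table used in this chapter.
CREATE TABLE #employee (emp_name varchar(30), dept_code varchar(4), salary int)
INSERT INTO #employee VALUES ('Asha', 'MKTG', 30000)
INSERT INTO #employee VALUES ('Ravi', 'MKTG', 45000)

DECLARE @err int, @rows int

UPDATE #employee SET salary = salary + 1000 WHERE dept_code = 'MKTG'

-- Capture both values at once; any later statement would overwrite them.
SELECT @err = @@error, @rows = @@rowcount

IF @err <> 0
    PRINT 'Update failed with error ' + CAST(@err AS varchar(10))
ELSE
    PRINT CAST(@rows AS varchar(10)) + ' row(s) updated'

DROP TABLE #employee
```

With the two sample rows above, the batch prints "2 row(s) updated". Testing @@error in one statement and @@rowcount in the next would not work, since the first test itself resets the row count.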




The IF…ELSE Construct

The IF…ELSE construct is used to specify conditional execution of code, with an optional ELSE part specifying alternative code to be executed.

This example shows an IF condition with a statement block. If the average price of the mod_cook titles is less than $15, it lists those titles; otherwise it prints the text: Average title price is more than $15.

    USE pubs
    IF (SELECT AVG(price) FROM titles WHERE type = 'mod_cook') < $15
    BEGIN
        PRINT 'The following titles are excellent mod_cook books:'
        PRINT ' '
        SELECT SUBSTRING(title, 1, 35) AS Title
        FROM titles
        WHERE type = 'mod_cook'
    END
    ELSE
        PRINT 'Average title price is more than $15.'

Here is the result set:

    The following titles are excellent mod_cook books:

    Title
    -----------------------------------
    Silicon Valley Gastronomic Treats
    The Gourmet Microwave

The BEGIN…END Block

The BEGIN…END statements are used to mark a block of statements to be executed as a unit. Typically, BEGIN immediately follows an IF, ELSE, or WHILE statement. For example,

    USE pubs
    DECLARE @msg varchar(255)
    IF (SELECT COUNT(price) FROM titles
        WHERE title_id LIKE 'BU%' AND price < 20) > 0
    BEGIN
        SET @msg = 'There are several books that are a good value at under $20. These books are: '
        PRINT @msg
        SELECT title FROM titles WHERE price < 20
    END

The CASE Expression

T-SQL provides the CASE expression, which can be used as an alternative to writing code with nested IF…ELSE IF…ELSE statements. Note, however, that CASE is not a control-of-flow keyword. It is an expression, so it can be used only inside a statement such as SELECT or UPDATE. The following is an example of its use in a SELECT statement:




    USE pubs
    SELECT Category =
            CASE type
                WHEN 'popular_comp' THEN 'Popular Computing'
                WHEN 'mod_cook' THEN 'Modern Cooking'
                WHEN 'business' THEN 'Business'
                WHEN 'psychology' THEN 'Psychology'
                WHEN 'trad_cook' THEN 'Traditional Cooking'
                ELSE 'Not yet categorized'
            END,
        CAST(title AS varchar(25)) AS 'Shortened Title',
        price AS Price
    FROM titles
    WHERE price IS NOT NULL
    ORDER BY type, price
    COMPUTE AVG(price) BY type

The BREAK Statement

This statement is used to exit the innermost WHILE loop. Execution continues with the first statement after the END keyword that marks the end of the loop. BREAK is often, but not always, activated by an IF test.

The CONTINUE Statement

This statement is used to restart a WHILE loop. Any statements after the CONTINUE keyword are ignored for the current iteration. CONTINUE is often, but not always, activated by an IF test.

The PRINT Statement

T-SQL provides the PRINT statement to return a character string to the client as a message. In SQL Server 2000 the string can be up to 8,000 characters long. You can display a literal character string or a variable of type char or varchar. For example,

    print 'hello, world'

The PRINT statement takes a single argument, which must evaluate to a character string. The argument may be a string expression built with concatenation and functions, but values of other data types should be converted to character data before they are concatenated.

This example converts the results of the GETDATE function to a varchar data type and concatenates it with literal text to be returned by PRINT.

    PRINT 'This message was printed on ' + RTRIM(CONVERT(varchar(30), GETDATE())) + '.'

The WHILE Statement

The WHILE keyword sets a condition for the repeated execution of an SQL statement or statement block. The statements are executed repeatedly as long as the specified condition is true. The execution of statements in the WHILE loop can be controlled from inside the loop with the BREAK and CONTINUE keywords. If two or more WHILE loops are nested, the inner BREAK exits to the next outermost loop. First, all the statements after the end of the inner loop run, and then the next outermost loop restarts.




    USE pubs
    WHILE (SELECT AVG(price) FROM titles) < $30
    BEGIN
        UPDATE titles
            SET price = price * 2
        SELECT MAX(price) FROM titles
        IF (SELECT MAX(price) FROM titles) > $50
            BREAK
        ELSE
            CONTINUE
    END
    PRINT 'Too much for the market to bear'
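The loop above changes data in pubs, so the effect of BREAK and CONTINUE can be easier to follow in a sketch that touches no tables at all. The counter variable @i and the limits used here are arbitrary choices made for illustration.

```sql
DECLARE @i int
SET @i = 0

WHILE @i < 10
BEGIN
    SET @i = @i + 1

    -- CONTINUE skips the rest of this iteration for even values.
    IF @i % 2 = 0
        CONTINUE

    -- BREAK leaves the loop entirely once the counter passes 7.
    IF @i > 7
        BREAK

    PRINT 'i = ' + CAST(@i AS varchar(10))
END

-- Execution resumes here after BREAK, or when the condition becomes false.
PRINT 'Loop ended with @i = ' + CAST(@i AS varchar(10))
```

This prints i = 1, i = 3, i = 5, and i = 7, followed by "Loop ended with @i = 9": the even values are skipped by CONTINUE, and the loop is abandoned by BREAK when @i reaches 9.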




SUMMARY

• A batch is a set of SQL statements submitted together, to the server for execution.

• The control-of-flow statements are used to program in SQL Server. They are used in stored procedures, triggers and transactions

• The control-of-flow programming constructs are:
    o IF…ELSE
    o CASE
    o WHILE

• The keywords used while programming in SQL Server are:
    o BEGIN…END
    o PRINT
    o Comments specified with the -- and /* … */ symbols




REVIEW QUESTIONS

1. What is a batch?

2. What are the two types of variables?

3. What is the default value present in every variable?

4. What is the difference between the BREAK and CONTINUE statements?




Chapter 12: Stored Procedure & User Defined Functions

Objectives

• Understand Stored Procedures

• Types of Stored Procedures

• Create, execute and alter stored Procedure

• Drop Stored Procedure

• Introduction to User Defined Functions




What is a Stored Procedure?

A stored procedure is a block of one or more SQL statements that is stored at the server for later reuse. Its definition is stored within the database, and its execution plan is cached in memory once the procedure has been executed. Other applications having access to the database can execute the procedure directly. Procedures can be created for permanent use, for temporary use within a session (local temporary procedure), or for temporary use within all sessions (global temporary procedure).

Types of Stored Procedures

1. System Stored Procedure - Stored in the master database and typically named with the sp_ prefix.

2. Local Stored Procedure - Stored in a user database and typically designed to complete tasks in that database.

3. Temporary Stored Procedure - Stored in the tempdb database; it exists only until either the connection that created it is closed or the server is shut down.

a. Local Temporary Stored Procedure - its name begins with #

b. Global Temporary Stored Procedure - its name begins with ##

4. Extended Stored Procedure

a. An extended stored procedure uses an external program compiled as a 32-bit DLL.

b. It follows the xp_ prefix as a naming convention.

5. Remote Stored Procedure - A procedure executed on a remote SQL Server installation.

Creating and Executing a Stored Procedure

Let us begin by creating a simple stored procedure. To do this, enter the following code using ISQLW:

    create procedure getTotalSalary
    as
    select sum (salary) from employee

This procedure can be executed with the following syntax:

    exec getTotalSalary

Passing Parameters

You can pass parameters to a procedure so that it can carry out instructions based on some user-specified values. For example, the following procedure receives the code of a department, and returns the total salary of employees in that department:

    create procedure gettotalsalary2
        @deptcode varchar (4)
    as
    select sum (salary) from employee
    where dept_code = @deptcode

The procedure can then be executed as follows:




    exec gettotalsalary2 'MKTG'

An alternative way, especially useful for procedures requiring multiple parameters, is as follows:

    exec gettotalsalary2 @deptcode = 'MKTG'

The advantage of this method is that the parameters need not be sent in any particular order. The disadvantage, however, is that the caller needs to know the names of the parameters. Any change in the source code of the procedure that renames a parameter will cause all calling programs that use the old name to fail.

Output Parameters

You can specify one or more parameters as output parameters, which provide a pass-by-reference capability. This means that apart from a result set being returned by a procedure, the calling program can also retrieve the values of the output parameters. For example, the following procedure can be used to count the number of departments in the dept table, as well as the number of employees in the employee table:

    create procedure counttables
        @deptcount int output,
        @empcount int output
    as
    select @deptcount = count (*) from dept
    select @empcount = count (*) from employee

To use this procedure, you would call it as follows:

    declare @dcount int, @ecount int
    exec counttables @dcount output, @ecount output
    select @dcount, @ecount -- display the returned values

Storage of Stored Procedures

For each new stored procedure, a row is created in the sysobjects table, just as for any other database object. The text of a stored procedure is stored in the syscomments table, which allows procedures like sp_helptext to display the source code of a stored procedure. Versions of SQL Server before 7.0 also kept an internal sequence tree for each procedure in the sysprocedures table; SQL Server 2000 has no such table and compiles the procedure from the stored text when it is first executed.

If you do not want the source code of a procedure to be visible to anybody querying the syscomments table, you can create the procedure using the WITH ENCRYPTION clause, as follows (the body of the procedure follows the AS keyword):

    create procedure myprocedure
    with encryption
    as

Example of Stored Procedures:

Use a simple procedure with a complex SELECT

This stored procedure returns all authors (first and last names supplied), their titles, and their publishers from a four-table join. This stored procedure does not use any parameters. CREATE PROCEDURE must be the first statement in its batch, so a GO separates it from the USE statement.

    USE pubs
    GO
    CREATE PROCEDURE au_info_all
    AS
    SELECT au_lname, au_fname, title, pub_name
    FROM authors a
        INNER JOIN titleauthor ta ON a.au_id = ta.au_id
        INNER JOIN titles t ON t.title_id = ta.title_id
        INNER JOIN publishers p ON t.pub_id = p.pub_id
    GO

The au_info_all stored procedure can be executed in these ways:

    EXECUTE au_info_all
    -- Or
    EXEC au_info_all

Use a simple procedure with parameters

This stored procedure returns only the specified authors (first and last names supplied), their titles, and their publishers from a four-table join. This stored procedure accepts exact matches for the parameters passed.

    USE pubs
    GO
    CREATE PROCEDURE au_info
        @lastname varchar(40),
        @firstname varchar(20)
    AS
    SELECT au_lname, au_fname, title, pub_name
    FROM authors a
        INNER JOIN titleauthor ta ON a.au_id = ta.au_id
        INNER JOIN titles t ON t.title_id = ta.title_id
        INNER JOIN publishers p ON t.pub_id = p.pub_id
    WHERE au_fname = @firstname
        AND au_lname = @lastname
    GO

The au_info stored procedure can be executed in these ways:

    EXECUTE au_info 'Dull', 'Ann'
    -- Or
    EXECUTE au_info @lastname = 'Dull', @firstname = 'Ann'
    -- Or
    EXECUTE au_info @firstname = 'Ann', @lastname = 'Dull'
    -- Or
    EXEC au_info 'Dull', 'Ann'
    -- Or
    EXEC au_info @lastname = 'Dull', @firstname = 'Ann'
    -- Or
    EXEC au_info @firstname = 'Ann', @lastname = 'Dull'

Or, if this procedure is the first statement within the batch:

    au_info 'Dull', 'Ann'
    -- Or
    au_info @lastname = 'Dull', @firstname = 'Ann'
    -- Or
    au_info @firstname = 'Ann', @lastname = 'Dull'

Use a simple procedure with wildcard parameters




This stored procedure returns only the specified authors (first and last names supplied), their titles, and their publishers from a four-table join. This stored procedure pattern matches the parameters passed or, if they are not supplied, uses the preset defaults.

    USE pubs
    GO
    CREATE PROCEDURE au_info2
        @lastname varchar(30) = 'D%',
        @firstname varchar(18) = '%'
    AS
    SELECT au_lname, au_fname, title, pub_name
    FROM authors a
        INNER JOIN titleauthor ta ON a.au_id = ta.au_id
        INNER JOIN titles t ON t.title_id = ta.title_id
        INNER JOIN publishers p ON t.pub_id = p.pub_id
    WHERE au_fname LIKE @firstname
        AND au_lname LIKE @lastname
    GO

The au_info2 stored procedure can be executed in many combinations. Only a few combinations are shown here:

    EXECUTE au_info2
    -- Or
    EXECUTE au_info2 'Wh%'
    -- Or
    EXECUTE au_info2 @firstname = 'A%'
    -- Or
    EXECUTE au_info2 '[CK]ars[OE]n'
    -- Or
    EXECUTE au_info2 'Hunter', 'Sheryl'
    -- Or
    EXECUTE au_info2 'H%', 'S%'

Use OUTPUT parameters

OUTPUT parameters allow an external procedure, a batch, or more than one Transact-SQL statement to access a value set during the procedure execution. In this example, a stored procedure (titles_sum) is created that allows one optional input parameter and one output parameter. The parameters are named with a single @ so that they are not mistaken for global variables.

First, create the procedure:

    USE pubs
    GO
    CREATE PROCEDURE titles_sum
        @TITLE varchar(40) = '%',
        @SUM money OUTPUT
    AS
    SELECT 'Title Name' = title
    FROM titles
    WHERE title LIKE @TITLE
    SELECT @SUM = SUM(price)
    FROM titles
    WHERE title LIKE @TITLE
    GO

Next, use the OUTPUT parameter with control-of-flow language.




Note: The OUTPUT keyword must be specified when the procedure is created as well as when the variable is used in the call. The parameter name and variable name do not have to match; however, the data type and parameter positioning must match (unless @SUM = variable is used).

    DECLARE @TOTALCOST money
    EXECUTE titles_sum 'The%', @TOTALCOST OUTPUT
    IF @TOTALCOST < 200
    BEGIN
        PRINT ' '
        PRINT 'All of these titles can be purchased for less than $200.'
    END
    ELSE
        SELECT 'The total cost of these titles is $'
            + RTRIM(CAST(@TOTALCOST AS varchar(20)))

Here is the result set:

    Title Name
    ------------------------------------------------------------------------
    The Busy Executive's Database Guide
    The Gourmet Microwave
    The Psychology of Computer Cooking

    (3 row(s) affected)

    Warning, null value eliminated from aggregate.

    All of these titles can be purchased for less than $200.

Use the WITH RECOMPILE option

The WITH RECOMPILE clause is helpful when the parameters supplied to the procedure will not be typical, and when a new execution plan should not be cached or stored in memory.

    USE pubs
    GO
    CREATE PROCEDURE titles_by_author
        @LNAME_PATTERN varchar(30) = '%'
    WITH RECOMPILE
    AS
    SELECT RTRIM(au_fname) + ' ' + RTRIM(au_lname) AS 'Authors full name',
        title AS Title
    FROM authors a
        INNER JOIN titleauthor ta ON a.au_id = ta.au_id
        INNER JOIN titles t ON ta.title_id = t.title_id
    WHERE au_lname LIKE @LNAME_PATTERN
    GO

Use the WITH ENCRYPTION option

The WITH ENCRYPTION clause hides the text of a stored procedure from users. This example creates an encrypted procedure, uses the sp_helptext system stored procedure to get information on that encrypted procedure, and then attempts to get information on that procedure directly from the syscomments table.




    USE pubs
    GO
    CREATE PROCEDURE encrypt_this
    WITH ENCRYPTION
    AS
    SELECT *
    FROM authors
    GO

    EXEC sp_helptext encrypt_this

Here is the result set:

    The object's comments have been encrypted.

Next, select the identification number and text of the encrypted stored procedure contents.

    SELECT c.id, c.text
    FROM syscomments c
        INNER JOIN sysobjects o ON c.id = o.id
    WHERE o.name = 'encrypt_this'

Here is the result set:

Note: The text column output is shown on a separate line. When executed, this information appears on the same line as the id column information.

    id         text
    ---------- ------------------------------------------------------------
    1413580074
    ?????????????????????????????????e??????????????????????????????????????????????????????????????????????????

    (1 row(s) affected)

Alter Procedure

Alters a previously created procedure, created by executing the CREATE PROCEDURE statement, without changing permissions and without affecting any dependent stored procedures or triggers.

This example creates a procedure called Oakland_authors that, by default, contains all authors from the city of Oakland, California. Then, when the procedure must be changed to retrieve all authors from California, ALTER PROCEDURE is used to redefine the stored procedure.

    USE pubs
    GO
    CREATE PROCEDURE Oakland_authors
    AS
    SELECT au_fname, au_lname, address, city, zip
    FROM pubs..authors
    WHERE city = 'Oakland'
        AND state = 'CA'
    ORDER BY au_lname, au_fname
    GO




The procedure must be changed to include all authors from California, regardless of what city they live in.

    ALTER PROCEDURE Oakland_authors
    WITH ENCRYPTION
    AS
    SELECT au_fname, au_lname, address, city, zip
    FROM pubs..authors
    WHERE state = 'CA'
    ORDER BY au_lname, au_fname
    GO

Drop a Stored Procedure

Removes one or more stored procedures or procedure groups from the current database.

Syntax

    DROP PROCEDURE { procedure } [ ,...n ]

Example

This example removes the Oakland_authors stored procedure (in the current database).

    DROP PROCEDURE Oakland_authors
    GO

View the text of a Procedure

We can use the system stored procedure sp_helptext to check the text of a procedure.

Example:

    sp_helptext Oakland_authors

User-Defined Functions

Functions are subroutines made up of one or more Transact-SQL statements that can be used to encapsulate code for reuse. Microsoft SQL Server 2000 does not limit users to the built-in functions defined as part of the Transact-SQL language, but allows users to create their own user-defined functions.

User-defined functions are created using the CREATE FUNCTION statement, modified using the ALTER FUNCTION statement, and removed using the DROP FUNCTION statement. Each fully qualified user-defined function name (database_name.owner_name.function_name) must be unique.

You must have been granted CREATE FUNCTION permissions to create, alter, or drop user-defined functions. Users other than the owner must be granted appropriate permissions on a function before they can use it in a Transact-SQL statement. To create or alter tables with references to user-defined functions in the CHECK constraint, DEFAULT clause, or computed column definition, you must also have REFERENCES permission on the functions.

Transact-SQL errors that cause a statement to be canceled and execution to continue with the next statement in the module (such as a trigger or stored procedure) are treated differently inside a function. In functions, such errors cause the execution of the function to stop. This in turn causes the statement that invoked the function to be canceled.
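As a minimal sketch of the create, use, and drop cycle described above, the following batch defines a scalar function, calls it, and removes it again. The function name dbo.price_with_tax, its parameters, and the tax rate are hypothetical and chosen only for illustration; the sketch assumes it is run by a user who is allowed to create functions as dbo.

```sql
-- CREATE FUNCTION must be the first statement in its batch.
CREATE FUNCTION dbo.price_with_tax (@price money, @rate decimal(5,2))
RETURNS money
AS
BEGIN
    -- Add @rate percent of the price to the price itself.
    RETURN (@price + @price * @rate / 100)
END
GO

-- A scalar function is invoked with at least a two-part name.
SELECT dbo.price_with_tax(100, 12.5) AS gross_price   -- 100 plus 12.5 percent
GO

DROP FUNCTION dbo.price_with_tax
GO
```

The call returns 112.5 as a money value. Omitting the owner prefix in the SELECT would make SQL Server look for a built-in function of that name and fail.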




Types of User-Defined Functions

SQL Server 2000 supports three types of user-defined functions:

• Scalar functions

• Inline table-valued functions

• Multistatement table-valued functions

A user-defined function takes zero or more input parameters and returns either a scalar value or a table. A function can have a maximum of 1024 input parameters. When a parameter of the function has a default value, the keyword DEFAULT must be specified when calling the function to get the default value. This behavior differs from stored procedures, in which omitting a parameter that has a default value also implies the default value. User-defined functions do not support output parameters.

Scalar functions return a single data value of the type defined in a RETURNS clause. All scalar data types, including bigint and sql_variant, can be used. The timestamp data type, user-defined data types, and nonscalar types, such as table or cursor, are not supported. The body of the function, defined in a BEGIN...END block, contains the series of Transact-SQL statements that return the value. The return type can be any data type except text, ntext, image, cursor, and timestamp.

Table-valued functions return a table. For an inline table-valued function, there is no function body; the table is the result set of a single SELECT statement. For a multistatement table-valued function, the function body, defined in a BEGIN...END block, contains the Transact-SQL statements that build and insert rows into the table that will be returned.

The statements in a BEGIN...END block cannot have any side effects. Function side effects are any permanent changes to the state of a resource that has a scope outside the function, such as a modification to a database table. The only changes that can be made by the statements in the function are changes to objects local to the function, such as local cursors or variables. Modifications to database tables, operations on cursors that are not local to the function, sending e-mail, attempting a catalog modification, and generating a result set that is returned to the user are examples of actions that cannot be performed in a function.

The types of statements that are valid in a function include:

• DECLARE statements can be used to define data variables and cursors that are local to the function.

• Assignments of values to objects local to the function, such as using SET to assign values to scalar and table local variables.

• Cursor operations that reference local cursors that are declared, opened, closed, and deallocated in the function. FETCH statements that return data to the client are not allowed. Only FETCH statements that assign values to local variables using the INTO clause are allowed.

• Control-of-flow statements.

• SELECT statements containing select lists with expressions that assign values to variables that are local to the function.

• UPDATE, INSERT, and DELETE statements modifying table variables that are local to the function.




• EXECUTE statements calling an extended stored procedure.
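Several of the statement types listed above can be seen together in a multistatement table-valued function. The following is a minimal sketch, assuming the pubs sample database; the function name dbo.fn_AuthorsByState is hypothetical.

```sql
USE pubs
GO
-- Hypothetical function: returns the authors living in a given state.
CREATE FUNCTION dbo.fn_AuthorsByState (@state char(2))
RETURNS @result TABLE (au_id varchar(11), full_name varchar(61))
AS
BEGIN
   -- DECLARE a variable that is local to the function.
   DECLARE @rows int

   -- INSERT into the table variable that is local to the function.
   INSERT @result (au_id, full_name)
   SELECT au_id, au_fname + ' ' + au_lname
   FROM authors
   WHERE state = @state

   -- Assign a value to a local variable with SET.
   SET @rows = (SELECT COUNT(*) FROM @result)

   -- Control-of-flow statement acting only on local objects.
   IF @rows = 0
      INSERT @result (au_id, full_name) VALUES ('n/a', '<<no authors>>')

   RETURN
END
GO

-- The function is used like a table in the FROM clause.
SELECT au_id, full_name
FROM dbo.fn_AuthorsByState('CA')
GO
```

Every statement in the body touches only the local variable @rows or the local table variable @result, so the function has no side effects.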

The number of times that a function specified in a query is actually executed can vary between execution plans built by the optimizer. An example is a function invoked by a subquery in a WHERE clause. The number of times the subquery and its function are executed can vary with the different access paths chosen by the optimizer.

Built-in functions that can return different data on each call are not allowed in user-defined functions. The built-in functions not allowed in user-defined functions are:

@@CONNECTIONS        @@PACK_SENT        GETDATE
@@CPU_BUSY           @@PACKET_ERRORS    GETUTCDATE
@@IDLE               @@TIMETICKS        NEWID
@@IO_BUSY            @@TOTAL_ERRORS     RAND
@@MAX_CONNECTIONS    @@TOTAL_READ       TEXTPTR
@@PACK_RECEIVED      @@TOTAL_WRITE
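A common way to live with this restriction is to let the caller evaluate the disallowed function and pass its result in as a parameter. The sketch below assumes the pubs sample database; the function name dbo.fn_DaysSince is hypothetical.

```sql
USE pubs
GO
-- Hypothetical function: whole days between two dates.
-- GETDATE() may not be called inside the function body,
-- so the current date arrives through the @now parameter.
CREATE FUNCTION dbo.fn_DaysSince (@start datetime, @now datetime)
RETURNS int
AS
BEGIN
   RETURN ( DATEDIFF(day, @start, @now) )
END
GO

-- The calling statement supplies GETDATE() as an argument.
SELECT title, dbo.fn_DaysSince(pubdate, GETDATE()) AS days_since_publication
FROM titles
GO
```

Because the function itself only computes on its inputs, it returns the same result whenever it is given the same arguments.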

Calling User-Defined Functions

When calling a scalar user-defined function, you must supply at least a two-part name:

SELECT *, MyUser.MyScalarFunction()
FROM MyTable

Table-valued functions can be called by using a one-part name:

SELECT *
FROM MyTableFunction()

A scalar function can be referenced any place an expression of the same data type returned by the function is allowed in a Transact-SQL statement, including computed columns and CHECK constraint definitions. For example, this statement creates a simple function that returns a decimal:

CREATE FUNCTION CubicVolume
   (@CubeLength decimal(4,1),
    @CubeWidth decimal(4,1),
    @CubeHeight decimal(4,1) )
RETURNS decimal(12,3) -- Cubic Centimeters.
AS
BEGIN
   RETURN ( @CubeLength * @CubeWidth * @CubeHeight )
END

This function can then be used anywhere a numeric expression is allowed, such as in a computed column for a table:

CREATE TABLE Bricks
   (
    BrickPartNmbr int PRIMARY KEY,
    BrickColor nchar(20),
    BrickHeight decimal(4,1),
    BrickLength decimal(4,1),
    BrickWidth decimal(4,1),
    BrickVolume AS
       ( dbo.CubicVolume(BrickHeight, BrickLength, BrickWidth) )
   )




dbo.CubicVolume is an example of a user-defined function that returns a scalar value. The RETURNS clause defines a scalar data type for the value returned by the function. The BEGIN...END block contains one or more Transact-SQL statements that implement the function. Each RETURN statement in the function must have an argument that returns a data value that has the data type specified in the RETURNS clause, or a data type that can be implicitly converted to the type specified in RETURNS. The value of the RETURN argument is the value returned by the function.

Obtaining Information About Functions

Several catalog objects report information about user-defined functions:

• sp_help reports information about user-defined functions.

• sp_helptext reports the source of user-defined functions.
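As a short illustration of the two procedures above, the following batch inspects the CubicVolume function, assuming it has been created under the dbo owner as in the earlier example. The query against INFORMATION_SCHEMA.ROUTINES is an additional option not mentioned in the list.

```sql
-- Assumes dbo.CubicVolume from the earlier example exists in the current database.

-- Owner, object type, creation date and parameters of the function.
EXEC sp_help 'dbo.CubicVolume'

-- Source text of the CREATE FUNCTION statement.
EXEC sp_helptext 'dbo.CubicVolume'

-- All user-defined functions in the current database.
SELECT ROUTINE_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'FUNCTION'
```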




SUMMARY

• A stored procedure is a collection of various T-SQL statements that are stored under one name and executed as a single unit

• A stored procedure allows you to declare parameters, variables and use T-SQL statements and programming logic

• A stored procedure provides better performance, security and accuracy and reduces network congestion

• A stored procedure receives and sends data through the following:
   o Input parameters
   o Output parameters
   o Return codes

• A stored procedure can be executed using the EXECUTE statement

• A stored procedure can be altered using the ALTER PROCEDURE statement

• A stored procedure can be viewed using the sp_help and sp_helptext stored procedures

• A stored procedure can be removed from the database using the DROP PROCEDURE statement

• User-defined functions encapsulate reusable Transact-SQL logic and return either a scalar value or a table




REVIEW QUESTIONS

1. What are the types of Stored Procedures?

2. What are the advantages of Stored Procedures?

3. Once a stored procedure is created using the WITH ENCRYPTION keyword, how can it be decrypted?

4. How can you view the text of a stored procedure?

5. Can we write a recursive stored procedure?




RDBMS Concepts using SQL Server 2000 Chapter 13: Cursors

Objectives

• To understand the use of cursors

• To learn the different types of cursors and how to use them




Introduction

Operations in a relational database act on a complete set of rows. The set of rows returned by a SELECT statement consists of all the rows that satisfy the conditions in the WHERE clause of the statement. This complete set of rows returned by the statement is known as the result set. Applications, especially interactive, online applications, cannot always work effectively with the entire result set as a unit. These applications need a mechanism to work with one row or a small block of rows at a time. Cursors are an extension to result sets that provide that mechanism.

In Transact-SQL, the default behavior is for individual Transact-SQL statements to return a result set known as the default result set. An application must retrieve all rows in the result set before it can execute any other statement on the connection. The only time a cursor is associated with a result set in Transact-SQL is when you use the DECLARE CURSOR statement to associate a cursor with the result set of a Transact-SQL SELECT statement. You can then fetch the rows individually using the FETCH statement, and can have multiple active cursors on a connection at a time.

What are cursors?

A cursor is a database object used by applications to manipulate data row by row instead of as a set. Using cursors, multiple operations can be performed row by row against the result set, with or without returning to the original table. In other words, cursors conceptually return a result set based on tables within the database.

What are possible uses of cursors?

• Allowing positioning at specific rows of the result set.
• Retrieving a single row or set of rows from the current position in the result set.
• Supporting data modifications to the rows at the current position in the result set.
• Supporting different levels of visibility to changes made by other users to the database data that is presented in the result set.
• Providing Transact-SQL statements in scripts, stored procedures, and triggers access to the data in a result set.

SQL Server supports cursors through:

• Transact-SQL
The Transact-SQL language supports a syntax for using cursors modeled after the SQL-92 cursor syntax.

• Database Application Programming Interface (API)
An API, such as DB-Library, is a set of routines that applications developed by software programmers use to interface with SQL Server.

How does the cursor process?

1. Associate a cursor with the result set of a Transact-SQL statement and define characteristics of the cursor, such as how rows are going to be retrieved.
2. Execute the Transact-SQL statement to populate the cursor.
3. Retrieve the rows in the cursor. The operation to retrieve one row or a set of rows from a cursor is called a fetch. Performing a series of fetches to retrieve the rows backward or forward is called scrolling.
4. Optionally, perform modifications (update or delete) on the row at the current cursor position.
5. Close the cursor.

How to create a cursor?




A cursor passes through several states, which are the stages it goes through as it is created and used. A simple cursor is created and used in the following steps:

1. The DECLARE statement is used to create a cursor. It contains the SELECT statement that selects the records from the table. The syntax is:

DECLARE <Cursor_name> CURSOR FOR <Select Statement>

2. After a cursor is created and before you fetch records from it, you need to open the cursor. The OPEN statement is used to open the cursor. Syntax:

OPEN <Cursor_name>

3. Once the cursor is opened, records are fetched from the cursor to display them on the screen. The FETCH statement is used to retrieve the records from the cursor. Syntax:

FETCH <Cursor_name>

4. Optionally, a cursor can be closed when it is temporarily not required. A cursor is closed using the CLOSE statement. It closes an open cursor by releasing the current result set. Once a cursor is closed, rows can be fetched only after it is reopened. Syntax:

CLOSE <Cursor_name>

5. When the cursor is not required any more, its reference is removed. The DEALLOCATE statement is used to remove the reference to a cursor. Syntax:

DEALLOCATE <Cursor_name>

Once the cursor is created and opened, rows are fetched from the cursor. We will now look at fetching and scrolling in detail.

How does Fetching and Scrolling work?

When a cursor is opened, the current row position in the cursor is logically before the first row. Transact-SQL cursors can fetch one row at a time. The operation of retrieving the rows from the cursor is called fetching. These are the various fetch operations:

o FETCH FIRST
Fetches the first row in the cursor.

o FETCH NEXT
Fetches the row after the previously fetched row.

o FETCH PRIOR
Fetches the row before the previously fetched row.

o FETCH LAST
Fetches the last row in the cursor.

o FETCH ABSOLUTE n
If n is a positive integer, fetches the nth row counting from the first row in the cursor. If n is a negative integer, fetches the nth row counting back from the end of the cursor, so that -1 refers to the last row. If n is 0, no rows are fetched. For example, FETCH ABSOLUTE 2 fetches the second row in the cursor.

o FETCH RELATIVE n
If n is positive, fetches the nth row after the previously fetched row. If n is negative, fetches the nth row before the previously fetched row. If n is 0, the same row is fetched again.
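The less obvious cases in the list above, a negative ABSOLUTE value and RELATIVE 0, can be sketched as follows. The example assumes the pubs sample database; the cursor name au_scroll is hypothetical, and the cursor is declared with the SCROLL option so that all fetch operations are available.

```sql
USE pubs
GO
-- Hypothetical cursor over the authors' last names.
DECLARE au_scroll SCROLL CURSOR FOR
   SELECT au_lname
   FROM authors
   ORDER BY au_lname

OPEN au_scroll

FETCH FIRST FROM au_scroll         -- first row in the cursor
FETCH ABSOLUTE -1 FROM au_scroll   -- first row counting back from the end (the last row)
FETCH RELATIVE 0 FROM au_scroll    -- the same row again
FETCH RELATIVE -1 FROM au_scroll   -- the row just before it

CLOSE au_scroll
DEALLOCATE au_scroll
GO
```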




Note: By default only the FETCH NEXT option works. To use the other FETCH options, you must include additional options in the DECLARE statement when creating the cursor.

Further attributes can be added to the DECLARE statement to control the scrolling ability and behavior of the cursor:

DECLARE <Cursor_Name> CURSOR
[LOCAL | GLOBAL]
[FORWARD_ONLY | SCROLL]
[STATIC | KEYSET | DYNAMIC | FAST_FORWARD]
[READ_ONLY | SCROLL_LOCKS | OPTIMISTIC]
[TYPE_WARNING]
FOR <Select Statement>
[FOR UPDATE [OF column_name [,...n]]]

Each attribute is explained here:

o LOCAL
Limits the scope of the cursor to the batch, stored procedure, or trigger in which it is created. In other words, the name of the cursor is valid only within that scope. The cursor is implicitly deallocated when the batch, stored procedure, or trigger terminates.

o GLOBAL
Specifies that the scope of the cursor is global to the connection. The cursor name can be referenced in any stored procedure or batch executed by the connection.

o FORWARD_ONLY
Specifies that the cursor can only be scrolled from the first to the last row. FETCH NEXT is the only supported fetch option. By default a cursor is FORWARD_ONLY.

o SCROLL
Specifies that all fetch options (FIRST, LAST, PRIOR, NEXT, RELATIVE, ABSOLUTE) are available. If SCROLL is not specified in DECLARE CURSOR, NEXT is the only fetch option supported.

o STATIC
Defines a cursor that makes a temporary copy of the data to be used by the cursor. All requests to the cursor are answered from this temporary table in tempdb. Therefore, modifications made to the base tables are not reflected by fetches made to this cursor, and this cursor does not allow modifications.

o KEYSET
Specifies that the membership and order of the rows in the cursor are fixed when the cursor is opened.

o DYNAMIC
Defines a cursor that reflects all the changes made to the rows in its result set as you scroll around the cursor. It does not support the ABSOLUTE fetch option.

o FAST_FORWARD
Specifies a FORWARD_ONLY, READ_ONLY cursor with performance optimizations enabled. FAST_FORWARD cannot be specified with the SCROLL or FOR UPDATE options. FORWARD_ONLY and FAST_FORWARD are mutually exclusive: if one is specified, the other cannot be.




o READ_ONLY
Prevents updates made through this cursor. The cursor cannot be referenced in a WHERE CURRENT OF clause in an UPDATE or DELETE statement.

A DELETE statement can refer to a cursor using the WHERE CURRENT OF clause. For example:

DELETE FROM FlightSchedule WHERE CURRENT OF Try

The above DELETE statement deletes the row on which the Try cursor is currently positioned. This is known as a 'positioned delete'.

o SCROLL_LOCKS
Specifies that positioned updates or deletes made through the cursor are guaranteed to succeed. SQL Server locks the rows as they are read into the cursor to ensure their availability for later modifications. SCROLL_LOCKS cannot be specified together with the FAST_FORWARD option.

o OPTIMISTIC
Specifies that positioned updates or deletes made through the cursor do not succeed if the row has been updated since it was read into the cursor.

o TYPE_WARNING
Specifies that a warning message is sent to the client if the cursor is implicitly converted from the requested type to another.

o FOR UPDATE [OF Column_name [,...n]]
Defines updatable columns within the cursor. If OF Column_name [,...n] is specified, only the listed columns can be modified.
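Several of these attributes can be combined in one declaration. The following is a minimal sketch of a positioned update, assuming the pubs sample database; the cursor name price_cursor and the 10 percent increase are chosen only for illustration. Note that running it changes the sample data.

```sql
USE pubs
GO
DECLARE @price money

-- Hypothetical cursor: local to this batch, keyset-driven,
-- locking rows as they are read, with only the price column updatable.
DECLARE price_cursor CURSOR LOCAL KEYSET SCROLL_LOCKS
FOR SELECT price FROM titles WHERE type = 'business'
FOR UPDATE OF price

OPEN price_cursor
FETCH NEXT FROM price_cursor INTO @price

WHILE @@FETCH_STATUS = 0
BEGIN
   -- Positioned update: changes only the row the cursor is currently on.
   IF @price < 10
      UPDATE titles
      SET price = price * 1.1
      WHERE CURRENT OF price_cursor

   FETCH NEXT FROM price_cursor INTO @price
END

CLOSE price_cursor
DEALLOCATE price_cursor
GO
```

Declaring the cursor READ_ONLY instead would cause the UPDATE ... WHERE CURRENT OF statement to fail.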

Among the attributes listed above, DYNAMIC, STATIC, KEYSET and FORWARD_ONLY define the processing characteristics of a cursor, whereas the rest define its functional characteristics.

To know the status of a FETCH statement, we need to learn two global variables.

@@FETCH_STATUS

This variable returns an integer giving the status of the last cursor FETCH statement issued against any cursor currently opened by the connection.

Return value    Description
0               FETCH statement was successful.
-1              FETCH statement failed or the row was beyond the result set.
-2              Row fetched is missing.

@@CURSOR_ROWS

This variable returns the number of qualifying rows in the last cursor opened on the connection.

Examples on cursor from the Pubs database




Simple cursor and syntax

The result set generated at the opening of this cursor includes all rows and all columns in the authors table of the pubs database. This cursor can be updated, and all updates and deletes are represented in fetches made against this cursor. FETCH NEXT is the only fetch available because the SCROLL option has not been specified.

DECLARE authors_cursor CURSOR
FOR SELECT * FROM authors
OPEN authors_cursor
FETCH NEXT FROM authors_cursor

Use nested cursors to produce report output

This example shows how cursors can be nested to produce complex reports. The inner cursor is declared for each author.

DECLARE @au_id varchar(11), @au_fname varchar(20), @au_lname varchar(40),
   @message varchar(80), @title varchar(80)

DECLARE authors_cursor CURSOR FOR
SELECT au_id, au_fname, au_lname
FROM authors
WHERE state = 'UT'
ORDER BY au_id

OPEN authors_cursor

FETCH NEXT FROM authors_cursor
INTO @au_id, @au_fname, @au_lname

WHILE @@FETCH_STATUS = 0
BEGIN
   PRINT ' '
   SELECT @message = '-----Books by Author:' + @au_fname + ' ' + @au_lname
   PRINT @message

   -- Declare an inner cursor based
   -- on au_id from the outer cursor.
   DECLARE titles_cursor CURSOR FOR
   SELECT title
   FROM titleauthor ta, titles t
   WHERE ta.title_id = t.title_id AND
      ta.au_id = @au_id -- Variable value from the outer cursor

   OPEN titles_cursor
   FETCH NEXT FROM titles_cursor INTO @title

   IF @@FETCH_STATUS <> 0
      PRINT ' <<No Books>>'

   WHILE @@FETCH_STATUS = 0




   BEGIN
      SELECT @message = ' ' + @title
      PRINT @message
      FETCH NEXT FROM titles_cursor INTO @title
   END

   CLOSE titles_cursor
   DEALLOCATE titles_cursor

   -- Get the next author.
   FETCH NEXT FROM authors_cursor
   INTO @au_id, @au_fname, @au_lname
END

CLOSE authors_cursor
DEALLOCATE authors_cursor
GO

Use Fetch in a Simple Cursor

This example declares a simple cursor for the rows in the authors table with a last name beginning with B, and uses FETCH NEXT to step through the rows. The FETCH statements return the value for the column specified in the DECLARE CURSOR as a single-row result set.

DECLARE authors_cursor CURSOR FOR
SELECT au_lname FROM authors
WHERE au_lname LIKE 'B%'
ORDER BY au_lname

OPEN authors_cursor

-- Perform the first fetch.
FETCH NEXT FROM authors_cursor

-- Check @@FETCH_STATUS to see if there are more rows to fetch.
WHILE @@FETCH_STATUS = 0
BEGIN
   -- This is executed as long as the previous fetch succeeds.
   FETCH NEXT FROM authors_cursor
END

CLOSE authors_cursor
DEALLOCATE authors_cursor

Use Fetch to store values in variables

This example is similar to the last example, except the output of the FETCH statements is stored in local variables rather than being returned directly to the client. The PRINT statement combines the variables into a single string and returns them to the client.

USE pubs




GO
-- Declare the variables to store the values returned by FETCH.
DECLARE @au_lname varchar(40), @au_fname varchar(20)

DECLARE authors_cursor CURSOR FOR
SELECT au_lname, au_fname FROM authors
WHERE au_lname LIKE 'B%'
ORDER BY au_lname, au_fname

OPEN authors_cursor

-- Perform the first fetch and store the values in variables.
-- Note: The variables are in the same order as the columns
-- in the SELECT statement.
FETCH NEXT FROM authors_cursor
INTO @au_lname, @au_fname

-- Check @@FETCH_STATUS to see if there are any more rows to fetch.
WHILE @@FETCH_STATUS = 0
BEGIN
   PRINT 'Author:' + @au_fname + ' ' + @au_lname

   -- This is executed as long as the previous fetch succeeds.
   FETCH NEXT FROM authors_cursor
   INTO @au_lname, @au_fname
END

CLOSE authors_cursor
DEALLOCATE authors_cursor
GO

Declare a SCROLL cursor and use the other FETCH options

This example creates a SCROLL cursor to allow full scrolling capabilities through the LAST, PRIOR, RELATIVE, and ABSOLUTE options.

USE pubs
GO
-- Execute the SELECT statement alone to show the
-- full result set that is used by the cursor.
SELECT au_lname, au_fname FROM authors




ORDER BY au_lname, au_fname

-- Declare the cursor.
DECLARE authors_cursor SCROLL CURSOR FOR
SELECT au_lname, au_fname FROM authors
ORDER BY au_lname, au_fname

OPEN authors_cursor

-- Fetch the last row in the cursor.
FETCH LAST FROM authors_cursor

-- Fetch the row immediately prior to the current row in the cursor.
FETCH PRIOR FROM authors_cursor

-- Fetch the second row in the cursor.
FETCH ABSOLUTE 2 FROM authors_cursor

-- Fetch the row that is three rows after the current row.
FETCH RELATIVE 3 FROM authors_cursor

-- Fetch the row that is two rows prior to the current row.
FETCH RELATIVE -2 FROM authors_cursor

CLOSE authors_cursor
DEALLOCATE authors_cursor




SUMMARY

• A cursor is a database object that helps in accessing and manipulating row-by-row data in the given result set

• Cursors are implemented in the following sequence:
   o Declare the cursor (DECLARE statement)
   o Open the cursor (OPEN statement)
   o Fetch the individual rows (FETCH statement)
   o Close the cursor (CLOSE statement)
   o Release the cursor (DEALLOCATE statement)




REVIEW QUESTIONS

1. What are the different types of cursors?

2. What are the disadvantages of cursors?

3. How do you fetch the last record from a cursor?

4. What are the five operations for using cursors?

5. What is the use of the ABSOLUTE and RELATIVE keywords?




RDBMS Concepts using SQL Server 2000 Chapter 14: Transactions and Locks

Objectives

• Understand the concept of Transactions

• Global variables and error handling

• Learn the Locking Mechanism

• Understand the different kinds of locks




Transactions

A transaction is a group of SQL commands that read and update a database and are treated as a single unit. SQL Server guarantees that the changes made by a transaction are either fully recorded (if you commit the transaction) or not recorded at all (if the transaction is rolled back). By default SQL Server treats each statement as an independent transaction and immediately commits it.

One of the most important aspects of database design is to ensure the logical consistency of data. It is necessary to ensure that all the applications and queries accessing the data return accurate data in minimal time. Transactions play an important role in ensuring the consistency of data, so they should be set up and tuned properly.

A transaction is a group of database operations combined into a single unit of work that is either completely committed or completely rolled back. All the actions that occur on the database break down into one or more transactions. A logical unit of work must exhibit four properties, called the ACID properties (Atomicity, Consistency, Isolation, Durability), to qualify as a transaction.

Atomicity

A transaction must be an atomic unit of work. Either all its actions are performed or none of them is performed. It must completely succeed or completely fail. If any statement in the transaction fails, the entire transaction fails. When a transaction has executed successfully, it can be committed. If a statement fails, all previous successful statements in the transaction are rolled back, returning the data to its previous consistent state.

Consistency

When completed, a transaction must leave the data in a consistent state. In other words, before a transaction starts, the system is in a consistent state. When the transaction is completed, the system must once again be in a known, consistent state. It must be a full and complete operation.

Isolation

All transactions that modify data are isolated from each other. They do not access the same data at the same time. Transactions must be stand-alone and have no dependence or effect on other transactions. A modifying transaction can access the data only before or after another transaction is completed. Such transactions do not affect the state of other transactions.

Durability

Transaction durability implies that the modifications made by a transaction are permanent and persistent. Even when the computer crashes or reboots, the committed changes are guaranteed to be intact after it is restarted.
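Atomicity can be illustrated with a transfer between two accounts in which the second statement fails. This is a sketch under stated assumptions: the Accounts table is hypothetical and exists only for this example, and SET XACT_ABORT ON is used so that a run-time error rolls back the whole transaction rather than only the failing statement.

```sql
-- Hypothetical table, created only for this illustration.
CREATE TABLE Accounts
   (acct_id int PRIMARY KEY,
    balance money NOT NULL CHECK (balance >= 0))
INSERT Accounts VALUES (1, 100)
INSERT Accounts VALUES (2, 50)
GO

-- With XACT_ABORT ON, a run-time error ends and rolls back the transaction.
SET XACT_ABORT ON
BEGIN TRANSACTION
   -- The credit succeeds on its own...
   UPDATE Accounts SET balance = balance + 500 WHERE acct_id = 2
   -- ...but the debit violates the CHECK constraint, so both are undone.
   UPDATE Accounts SET balance = balance - 500 WHERE acct_id = 1
COMMIT TRANSACTION
GO

SET XACT_ABORT OFF
-- Shows the balances after the failed transfer.
SELECT acct_id, balance FROM Accounts
GO
```

The failing statement also ends its batch, so the COMMIT TRANSACTION after it is never reached.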




Transactions play a vital role in managing a SQL Server database. Hence, you must be aware of what happens during the execution of a SQL Server transaction.

A transaction starts with the BEGIN TRAN or BEGIN TRANSACTION statement. When this command is received, SQL Server starts the transaction. Since no work has commenced yet, SQL Server does not allocate any memory to log the records. When an INSERT, SELECT, UPDATE, or DELETE statement is detected, SQL Server allocates a new transaction ID and creates a log record in memory. If records are altered, a new record is created and recorded in memory. All altered rows are recorded in the log, and then the data page is actually changed in memory.

After all the statements have completed and SQL Server receives the COMMIT TRAN or COMMIT TRANSACTION statement, the records are written to the transaction log of the database. This ensures that all the transactions can be recovered in the event of a power failure. The log records are stored separately in temporary files and cannot be accessed with any Transact-SQL statements. Only SQL Server internal processes can access them. Once the transaction log is written, the changes are automatically applied to the database.

In the next section we will look at the different types of transactions.

Types of Transaction

There are three types of transactions.

Explicit Transactions

Explicit transactions are manually configured transactions in which the beginning and the end of the transaction are clearly defined. In earlier versions of SQL Server, these transactions were referred to as user-defined transactions. Transact-SQL reserved words are used to specify the beginning and end of the transaction. The reserved words are:

BEGIN TRANSACTION: To start an explicit transaction, the BEGIN TRAN or BEGIN TRANSACTION statement is used.

COMMIT TRANSACTION: To successfully complete the transaction, the COMMIT TRAN or COMMIT TRANSACTION statement is used.

COMMIT WORK: This statement is functionally identical to COMMIT TRANSACTION, except that COMMIT TRANSACTION accepts a user-defined transaction name.

ROLLBACK TRANSACTION and ROLLBACK WORK: To cancel the transaction, the ROLLBACK WORK or ROLLBACK TRANSACTION statement is used.

Note: ROLLBACK TRANSACTION or ROLLBACK WORK cancels all work done since the BEGIN TRAN statement that started the transaction.

Some statements that are not supported inside an explicit transaction are:

ALTER DATABASE
DROP DATABASE
RESTORE DATABASE
CREATE DATABASE
DISK INIT
LOAD DATABASE
LOAD TRANSACTION
DUMP TRANSACTION

Explicit transaction mode lasts only for the duration of the transaction. Once the transaction ends, the connection returns to the previous transaction mode.

For example, consider that a company decides to give a 10% discount on the First Class fare and adds a record to the Service table for passengers not availing any service. The explicit transaction is written as shown below (before executing the code, ensure that the flight_details and Service tables have been created).

BEGIN TRANSACTION
SELECT fare, class_code FROM flight_details
-- Add a WHERE clause on class_code to restrict the discount to First Class.
UPDATE flight_details SET fare = fare - fare * 0.1
SELECT fare, class_code FROM flight_details
INSERT Service VALUES ('NA', 'Not Availed')
SELECT * FROM Service
COMMIT TRANSACTION

OUTPUT

     Service_code    Service_name
1    CC              Child Care
2    N               Nurse
3    WC              Wheel Chair
4    NA              Not Availed

Implicit Transaction

Implicit transaction mode is enabled with the statement SET IMPLICIT_TRANSACTIONS ON. The user starts an explicit transaction (as it is user-defined), whereas the server starts an implicit transaction. Once implicit transaction mode is set on, SQL Server starts an implicit transaction whenever it encounters one of the following statements:

ALTER TABLE      INSERT
CREATE           OPEN
DELETE           REVOKE
DROP             SELECT
FETCH            TRUNCATE TABLE
GRANT            UPDATE

Once implicit transaction mode is set on, SQL Server generates a continuous chain of implicit transactions until the mode is set off. The SET IMPLICIT_TRANSACTIONS OFF statement is issued to turn implicit transaction mode off. Each transaction in the chain must be ended explicitly with COMMIT or ROLLBACK; if a transaction is rolled back, only the work done since the previous COMMIT or ROLLBACK is undone. This type of transaction does not require BEGIN TRAN or BEGIN TRANSACTION, because the implicit transaction starts automatically.

The following example shows how to use implicit transactions to ensure data consistency and recoverability:

USE pubs
GO
CREATE TABLE Practice
   (Columna int PRIMARY KEY,
    Columnb char(3) NOT NULL)
GO
SET IMPLICIT_TRANSACTIONS ON
/* First implicit transaction started by an INSERT statement */
INSERT INTO Practice VALUES (1, 'aaa')
GO
INSERT INTO Practice VALUES (2, 'bbb')
GO
/* Commit first transaction */
COMMIT TRANSACTION
GO
/* Second implicit transaction started by a SELECT statement */
SELECT COUNT(*) FROM Practice
GO
INSERT INTO Practice VALUES (3, 'ccc')
GO
SELECT * FROM Practice
GO
/* Commit second transaction */
COMMIT TRANSACTION
GO
SET IMPLICIT_TRANSACTIONS OFF

Error Handling

You can create a user-defined error message with the sp_addmessage stored procedure. Its syntax is:

sp_addmessage number, severity, 'message'

The first parameter, number, is the ID of the message and is of integer type. Acceptable values for user-defined error messages are 50001 onwards. The second parameter, severity, is the severity level of the error. It is a smallint, with valid values from 1 through 25. The third parameter is the text of the error message. For example:

EXEC sp_addmessage 50001, 10, 'This department code not allowed'
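A stored message can then be raised by its number. The sketch below is illustrative only: the message number 50010, its text, and the department code 'D99' are hypothetical, and adding a message requires membership in an administrative server role.

```sql
-- Hypothetical message with a %s placeholder for the department code.
EXEC sp_addmessage 50010, 16, 'Department code %s is not allowed.'
GO

-- Confirm that the message is stored in the sysmessages table.
SELECT error, severity, description
FROM master..sysmessages
WHERE error = 50010
GO

-- Raise the stored message by number; 'D99' replaces the %s placeholder.
RAISERROR (50010, 16, 1, 'D99')
GO

-- Remove the message again so the example leaves no trace.
EXEC sp_dropmessage 50010
GO
```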




A user=defined error can be returned from a procedure, by using the RAISEROR statement. For example, CREATE TRIGGER employee_insupd ON employee FOR INSERT, UPDATE AS /* Get the range of level for this job type from the jobs table. */ DECLARE @@MIN_LVL tinyint, @@MAX_LVL tinyint, @@EMP_LVL tinyint, @@JOB_ID smallint SELECT @@MIN_LVl = min_lvl, @@MAX_LV = max_lvl, @@ EMP_LVL = i.job_lvl, @@JOB_ID = i.job_id FROM employee e, jobs j, inserted i WHERE e.emp_id = i.emp_id AND i.job_id = j.job_id IF (@@JOB_ID = 1) and (@@EMP_lVl <> 10) BEGIN RAISERROR ('Job id 1 expects the default level of 10.', 16, 1) ROLLBACK TRANSACTION END ELSE IF NOT @@ EMP_LVL BETWEEN @@MIN_LVL AND @@MAX_LVL) BEGIN RAISERROR ('The level for job_id:%d should be between %d and %d.', 16, 1, @@JOB_ID, @@MIN_LVL, @@MAX_LVL) ROLLBACK TRANSACTION END The RAISEROR statement is called by specifying the error code as the first parameter, and severity level as the second parameter. Severity levels from 0 through 18 can be used by any user. Severity levels 19 through 25 are used only by member of the sysadmin fixed database role. The third parameter can have a value between 1 and 127, but is currently not used by SQL Server. You can view the list of system-defined error message by querying the master sysmessage table. Global Variables There are some built-in global variables is T-SQL, such as @@ERROR (last return code generated), and @@ROWCOUNT (number of rows selected or affected by the last statement). However, there is currently no facility for user-defined global variables in T-SQL. Error Checking in Transactions Certain error are treated as fatal as inside transactions, and cause the entire transaction to be automatically rolled back and aborted. For example, if the system runs out of locks or memory, a fatal error will be automatically get aborted immediately on such an error. Non-fatal errors do not result in the entire transaction to be rolled back, but only the concerned statement being skipped. 
For example, if, as part of a transaction, you attempted to insert a row in a table which would violate a foreign key constraint, that insert statement is discarded, but execution continues to the next statement within the transaction.
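Because a non-fatal error does not stop the transaction, the code has to check for it explicitly. The following is a minimal sketch of that check, not a definitive pattern. The accounts table and its data are hypothetical and are created only to keep the sketch self-contained.

```sql
-- Hypothetical table, created only for this illustration.
CREATE TABLE accounts (acct_id int PRIMARY KEY, balance money NOT NULL)
INSERT INTO accounts VALUES (1, 500)
INSERT INTO accounts VALUES (2, 100)
GO

DECLARE @err int

BEGIN TRANSACTION

UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1
SET @err = @@ERROR   -- capture at once: every statement resets @@ERROR

IF @err = 0
BEGIN
    UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2
    SET @err = @@ERROR
END

IF @err = 0
    COMMIT TRANSACTION
ELSE
BEGIN
    -- A non-fatal error occurred: undo both updates and report it.
    ROLLBACK TRANSACTION
    RAISERROR ('Transfer failed with error %d; transaction rolled back.', 16, 1, @err)
END
GO
```

The value of @@ERROR is copied into a local variable immediately because the next statement, even a successful one, overwrites it. A fatal error aborts the batch before these checks run, so this pattern covers only the non-fatal case.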



Locking

Locking is a crucial function of any multiple-user database system, including Microsoft SQL Server. As you know, SQL Server manages multiple users simultaneously and ensures that all transactions observe the properties of the specified isolation level. At the highest isolation level, serializable, SQL Server must make the multiple-user system yield results that are indistinguishable from those of a single-user system. It does this by automatically locking data to prevent changes made by one user from unexpectedly affecting work being done by another user on the same database at the same time.

Transaction Log

SQL Server maintains a log to manage all its transactions. This log is called the transaction log. The transaction log is used to undo all the transaction activities that could not be completed because of events such as power failures. This helps SQL Server maintain data integrity.

Locking

SQL Server uses locking to ensure transactional integrity and database consistency. Locking prevents users from reading data that is being changed by other users. In a multi-user environment, it also prevents users from changing the same data at the same time. In the absence of locking, data within the database may become logically incorrect, and queries against it may return unexpected results. SQL Server implements locking automatically, but you can design more efficient applications by understanding and customizing locking in your applications.

For a true transaction-processing database, the DBMS is responsible for resolving potential conflicts between two different processes that attempt to change the same piece of information at the same time. In SQL Server, this responsibility falls on the lock manager. Locks guarantee that the current user of a resource has a consistent view of that resource from the beginning to the completion of a particular operation. In other words, what we start with has to be what we work with throughout our application. Without locking, transaction processing would be impossible. Transactions and locking are therefore two parts of the same mechanism for ensuring that data modifications complete correctly.

Transactional Concurrency

SQL Server provides both optimistic and pessimistic concurrency control. It uses pessimistic concurrency control by default; optimistic concurrency control can be used through cursors.

Optimistic Concurrency

Optimistic concurrency control works on the assumption that resource conflicts between multiple users are unlikely, but not impossible. It allows transactions to execute without locking any resources. Resources are checked only when a transaction is about to commit, to determine whether any conflicts have occurred. If there is a conflict, the transaction starts again.

Pessimistic Concurrency

Pessimistic concurrency control locks resources as they are required, for the duration of a transaction. Unless a deadlock occurs, a transaction is assured of successful completion.

Concurrency Problems

In the absence of locking, when many users access a database, four types of problem may occur if their transactions use the same data at the same time. The four problems are:

1. Lost update
2. Uncommitted dependency
3. Inconsistent analysis
4. Phantom read

Lost Updates

The lost update problem occurs when two or more transactions select the same row and then update it based on the value originally selected. Each transaction is unaware of the other transactions. The last update overwrites the updates made by the previous transactions, so their data is lost.

Uncommitted Dependency (Dirty Read)

The uncommitted dependency problem is also known as the dirty read problem. It is best explained with an example. An employee is making changes to the company policy document. While the changes are being made, another employee takes a copy of the document, which includes all the changes made so far, and distributes it to the intended audience. The first employee then modifies the previously made changes and saves the document. Now the distributed document contains information that no longer exists and should be treated as if it never existed. The solution is that no one should be able to read the changed document until the first editor has determined that the changes are final.

Inconsistent Analysis

The inconsistent analysis problem is also known as the non-repeatable read problem. E.g., an employee reads a particular document twice. Between the two readings, the writer rewrites the original document. When the employee reads the document for the second time, it has completely changed. Hence the original read was not repeatable, leading to confusion. To ensure that this does not happen, the employee should be able to read the document only after the writer has completely finished writing it.

Phantom Read

The phantom read problem is also known as the phantom problem. E.g., a supervisor reviews a document submitted by an employee and suggests changes. While the suggested changes are being incorporated into the master copy of the document, it is found that the employee has added new, unreviewed content to the document, leading to confusion and problems. No one should be able to add new material to the document until the editor and production department have finished working with the original document.

SQL Server can lock the following resources.

Resource   Description
RID        Row identifier. Used to lock a single row within a table.
Key        Row lock within an index. Used to protect key ranges in serializable transactions.
Page       8-kilobyte (KB) data page or index page.
Extent     Contiguous group of eight data pages or index pages.
Table      Entire table, including all data and indexes.
DB         Database.

SQL Server locks resources using different lock modes that determine how the resources can be accessed by concurrent transactions.

SQL Server uses these resource lock modes.




Lock mode Description

Shared (S) Used for operations that do not change or update data (read-only operations), such as a SELECT statement.

Update (U) Used on resources that can be updated. Prevents a common form of deadlock that occurs when multiple sessions are reading, locking, and potentially updating resources later.

Exclusive (X) Used for data-modification operations, such as INSERT, UPDATE, or DELETE. Ensures that multiple updates cannot be made to the same resource at the same time.

Intent Used to establish a lock hierarchy. The types of intent locks are: intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).

Schema Used when an operation dependent on the schema of a table is executing. The types of schema locks are: schema modification (Sch-M) and schema stability (Sch-S).

Bulk Update (BU) Used when bulk-copying data into a table and the TABLOCK hint is specified.
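The resources and lock modes listed above can be seen on a running server with the sp_lock system stored procedure. The sketch below assumes the pubs sample database is installed; the row chosen is arbitrary, and the transaction is rolled back so the sample data is left unchanged.

```sql
USE pubs
GO

DECLARE @spid int
SET @spid = @@SPID            -- server process ID of this connection

BEGIN TRANSACTION

-- Modify one row so that this connection holds locks on it.
UPDATE authors SET contract = contract WHERE au_id = '172-32-1176'

-- List the locks held by this connection only.
EXEC sp_lock @spid

ROLLBACK TRANSACTION          -- release the locks, leave pubs untouched
GO
```

In the sp_lock result, the Type column identifies the resource (for example RID, KEY, PAG, TAB, or DB) and the Mode column identifies the lock mode (for example S, U, X, or IX).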

Shared Locks

Shared (S) locks allow concurrent transactions to read (SELECT) a resource. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the data has been read, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
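As an illustration of how long shared locks are held, the following sketch reads the same row twice inside a transaction. It assumes the pubs sample database is installed; the title_id value is simply a row from that sample.

```sql
USE pubs
GO

-- Keep shared locks until the end of the transaction.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

BEGIN TRANSACTION
SELECT price FROM titles WHERE title_id = 'BU1032'
-- The shared lock on the row is still held here, so an UPDATE of this
-- row from another connection waits until the COMMIT below.
SELECT price FROM titles WHERE title_id = 'BU1032'   -- returns the same price
COMMIT TRANSACTION

-- Restore the default isolation level for this connection.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
GO

-- The same behavior for a single table, using a locking hint instead.
BEGIN TRANSACTION
SELECT price FROM titles WITH (REPEATABLEREAD) WHERE title_id = 'BU1032'
COMMIT TRANSACTION
GO
```

Under the default READ COMMITTED level, the shared lock would be released after the first SELECT, and the second SELECT could return a different price. That is the non-repeatable read described earlier.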

Update Locks

Update (U) locks prevent a common form of deadlock. A typical update pattern consists of a transaction reading a record, acquiring a shared (S) lock on the resource (page or row), and then modifying the row, which requires lock conversion to an exclusive (X) lock. If two transactions acquire shared-mode locks on a resource and then attempt to update data concurrently, one transaction attempts the lock conversion to an exclusive (X) lock. The shared-mode-to-exclusive lock conversion must wait because the exclusive lock for one transaction is not compatible with the shared-mode lock of the other transaction; a lock wait occurs. The second transaction attempts to acquire an exclusive (X) lock for its update. Because both transactions are converting to exclusive (X) locks, and they are each waiting for the other transaction to release its shared-mode lock, a deadlock occurs.

To avoid this potential deadlock problem, update (U) locks are used. Only one transaction can obtain an update (U) lock to a resource at a time. If a transaction modifies a resource, the update (U) lock is converted to an exclusive (X) lock. Otherwise, the lock is converted to a shared-mode lock.
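A read-then-update sequence can request the update lock explicitly with the UPDLOCK locking hint, so that a second transaction running the same code waits at the SELECT instead of deadlocking at the UPDATE. This is a minimal sketch, assuming the pubs sample database. The 10 percent increase is arbitrary, and the transaction is rolled back only to leave the sample data unchanged.

```sql
USE pubs
GO

DECLARE @price money

BEGIN TRANSACTION

-- Take an update (U) lock while reading; only one transaction at a
-- time can hold a U lock on the row.
SELECT @price = price
FROM titles WITH (UPDLOCK)
WHERE title_id = 'BU1032'

-- The U lock is converted to an exclusive (X) lock for the change.
UPDATE titles
SET price = @price * 1.10
WHERE title_id = 'BU1032'

ROLLBACK TRANSACTION   -- use COMMIT TRANSACTION in real code
GO
```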

Exclusive Locks

Exclusive (X) locks prevent access to a resource by concurrent transactions. No other transactions can read or modify data locked with an exclusive (X) lock.

Intent Locks

An intent lock indicates that SQL Server wants to acquire a shared (S) lock or exclusive (X) lock on some of the resources lower down in the hierarchy. For example, a shared intent lock placed at the table level means that a transaction intends to place shared (S) locks on pages or rows within that table. Setting an intent lock at the table level prevents another transaction from subsequently acquiring an exclusive (X) lock on the table containing that page. Intent locks improve performance because SQL Server examines intent locks only at the table level to determine if a transaction can safely acquire a lock on that table. This removes the requirement to examine every row or page lock on the table to determine if a transaction can lock the entire table.

Intent locks include intent shared (IS), intent exclusive (IX), and shared with intent exclusive (SIX).

Lock mode Description

Intent shared (IS) Indicates the intention of a transaction to read some (but not all) resources lower in the hierarchy by placing S locks on those individual resources.

Intent exclusive (IX) Indicates the intention of a transaction to modify some (but not all) resources lower in the hierarchy by placing X locks on those individual resources. IX is a superset of IS.

Shared with intent exclusive (SIX) Indicates the intention of the transaction to read all of the resources lower in the hierarchy and modify some (but not all) resources lower in the hierarchy by placing IX locks on those individual resources. Concurrent IS locks at the top-level resource are allowed. For example, an SIX lock on a table places an SIX lock on the table (allowing concurrent IS locks), and IX locks on the pages being modified (and X locks on the modified rows). There can be only one SIX lock per resource at one time, preventing updates to the resource made by other transactions, although other transactions can read resources lower in the hierarchy by obtaining IS locks at the table level.

Schema Locks

Schema modification (Sch-M) locks are used when a table data definition language (DDL) operation (such as adding a column or dropping a table) is being performed.

Schema stability (Sch-S) locks are used when compiling queries. Schema stability (Sch-S) locks do not block any transactional locks, including exclusive (X) locks. Therefore, other transactions can continue to run while a query is being compiled, including transactions with exclusive (X) locks on a table. However, DDL operations cannot be performed on the table.

Bulk Update Locks

Bulk update (BU) locks are used when bulk copying data into a table and either the TABLOCK hint is specified or the table lock on bulk load table option is set using sp_tableoption. Bulk update (BU) locks allow processes to bulk copy data concurrently into the same table while preventing other processes that are not bulk copying data from accessing the table.
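The two ways of requesting bulk update locks might look as follows. This is only a sketch: the sales_staging table and the data file path are hypothetical, and the file is assumed to be comma-delimited with one row per line.

```sql
-- Hypothetical staging table for the load.
CREATE TABLE sales_staging (sale_id int, amount money)
GO

-- Option 1: set the table option once, so every bulk load into this
-- table takes a bulk update (BU) lock.
EXEC sp_tableoption 'sales_staging', 'table lock on bulk load', 'true'
GO

-- Option 2: request the table lock for a single load with the TABLOCK hint.
-- The file path below is a placeholder.
BULK INSERT sales_staging
FROM 'C:\data\sales.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK)
GO
```

Either option alone is enough; both are shown together only for comparison.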

Deadlocks

A deadlock occurs when there is a cyclic dependency between two or more threads for some set of resources.

Deadlock is a condition that can occur on any system with multiple threads, not just on a relational database management system. A thread in a multi-threaded system may acquire one or more resources (for example, locks). If the resource being acquired is currently owned by another thread, the first thread may have to wait for the owning thread to release the target resource. The waiting thread is said to have a dependency on the owning thread for that particular resource.




If the owning thread wants to acquire another resource that is currently owned by the waiting thread, the situation becomes a deadlock: both threads cannot release the resources they own until their transactions are committed or rolled back, and their transactions cannot be committed or rolled back because they are waiting on resources the other owns. For example, thread T1 running transaction 1 has an exclusive lock on the Supplier table. Thread T2 running transaction 2 obtains an exclusive lock on the Part table, and then wants a lock on the Supplier table. Transaction 2 cannot obtain the lock because transaction 1 has it. Transaction 2 is blocked, waiting on transaction 1. Transaction 1 then wants a lock on the Part table, but cannot obtain it because transaction 2 has it locked. The transactions cannot release the locks held until the transaction is committed or rolled back. The transactions cannot commit or roll back because they require a lock held by the other transaction to continue.
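The Supplier and Part scenario above can be reproduced from two Query Analyzer windows. The sketch below is an illustration under assumptions: the column definitions are hypothetical, and the TABLOCKX hint is used only to force the exclusive table locks the example describes.

```sql
-- Setup (run once). Column definitions are hypothetical.
CREATE TABLE Supplier (supplier_id int PRIMARY KEY, name varchar(30))
CREATE TABLE Part (part_id int PRIMARY KEY, name varchar(30))
GO

-- Connection 1: lock Supplier, pause, then ask for Part.
BEGIN TRANSACTION
SELECT COUNT(*) FROM Supplier WITH (TABLOCKX)   -- X lock held until the transaction ends
WAITFOR DELAY '00:00:10'                        -- time to start connection 2
SELECT COUNT(*) FROM Part WITH (TABLOCKX)       -- waits for connection 2
COMMIT TRANSACTION
GO

-- Connection 2: run in a second window during the 10-second delay.
SET DEADLOCK_PRIORITY LOW                       -- volunteer to be the deadlock victim
BEGIN TRANSACTION
SELECT COUNT(*) FROM Part WITH (TABLOCKX)       -- X lock held until the transaction ends
SELECT COUNT(*) FROM Supplier WITH (TABLOCKX)   -- waits for connection 1: deadlock
COMMIT TRANSACTION
GO
```

When SQL Server detects the cycle, it chooses one connection as the victim, rolls back its transaction, and returns error 1205 to it. The other connection then continues. Because of the SET DEADLOCK_PRIORITY LOW setting, connection 2 should be the one chosen here.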

The Lock Manager

SQL Server can lock data using several different modes. For example, read operations acquire shared locks and write operations acquire exclusive locks. Update locks are acquired during the initial portion of an update operation when the data is read. The SQL Server lock manager acquires and releases these locks. It also manages compatibility between lock modes, resolves deadlocks, and escalates locks if necessary. It controls locks on tables, on the pages of a table, on index keys, and on individual rows of data. Locks can also be held on system data, that is, data that is private to the database system, such as page headers and indexes.

In addition to locks, SQL Server uses latches, which are lightweight, short-term synchronization objects that protect internal structures while they are being read or changed. One way to look at the difference between locks and latches is that locks ensure the logical consistency of the data and latches ensure the physical consistency. Latching happens when you place a row physically on a page or move data in other ways, such as compressing the space on a page. SQL Server must guarantee that this data movement can happen without interference.

SUMMARY

• A transaction is a sequence of operations performed together as a single logical unit of work.

• A transaction must possess the four properties called ACID (Atomicity, Consistency, Isolation and Durability)

• Transactions can be handled in three ways
    o Explicit transactions
    o Autocommit transactions
    o Implicit transactions

• A SQL Server transaction is ended with either a COMMIT or a ROLLBACK statement

• Explicit transactions are user-defined transactions whose start and end are marked by transaction statements

• SQL Server uses the concept of locking to ensure transactional integrity and database consistency

• SQL Server provides two types of concurrency control
    o Optimistic concurrency
    o Pessimistic concurrency

• There are four types of concurrency problem
    o The lost update problem
    o The uncommitted dependency problem
    o The inconsistent analysis problem
    o The phantom read problem

• The following are the lock modes supported by SQL Server
    o Shared lock
    o Update lock
    o Exclusive lock
    o Intent lock
    o Schema lock
    o Bulk update lock

• A deadlock is a situation in which two users or transactions have locks on separate objects and each user is waiting for a lock on the other’s object.



REVIEW QUESTIONS

1. What types of locks are available?
2. What are the different isolation levels?
3. What is a deadlock?
4. What is the difference between implicit and explicit transactions?



Chapter 15: Triggers

Objectives

• Understand what a trigger is

• Understand the uses of triggers

• Learn how to create a trigger

• Know the types of triggers

Trigger

A trigger is a special type of stored procedure that is fired on an event-driven basis rather than by a direct call. A trigger can be fired when any one of the following events takes place on a table: INSERT, UPDATE, or DELETE. A very trivial example of a trigger is as follows:

    create trigger emp_insert
    on employee
    for insert
    as
    print 'added a new employee'

After creating this trigger, whenever a new row is inserted in the employee table, the specified message will be displayed. A single trigger can be created to execute for any or all of the INSERT, UPDATE, and DELETE actions:

    create trigger xyz
    on dept
    for insert, update, delete
    as …

A trigger is executed once for each INSERT, UPDATE, or DELETE statement, regardless of the number of rows that it affects. A trigger fires after the data modification statement has performed its work but before that work is committed. Both the statement and any modifications made in the trigger are implicitly a transaction, whether or not an explicit transaction was declared. Therefore, the trigger can roll back the work.

A trigger has access to the before image and after image of the data via the special pseudotables inserted and deleted. These two tables are structurally identical to the underlying table being changed. You can check the before and after values of specific columns and take action depending on your requirements.

Triggers are created by the user on a table and are part of the database. SQL Server invokes them automatically whenever an INSERT, UPDATE, or DELETE Transact-SQL statement modifies data in that table. A trigger may query other tables and can include complex Transact-SQL statements.

Triggers are useful in these ways:

• Triggers can cascade changes through related tables in a database. This means that if a row is deleted in a table, related rows in other tables will also be deleted. For example, a trigger on the Route_Code column of the Fare table causes the deletion of matching rows in other tables, using the Route_Code column as the key to locate the rows in the Flight_Schedule, Reservations and Cancellation tables.

• Triggers can disallow or reject changes that violate referential integrity, thereby canceling the attempted data modification. Such a trigger takes effect when the foreign key of one table is given a value that does not match the primary key of another table. For example, if a Route_Code that does not exist in the Route table is entered in the Flight_Schedule table, a violation of referential integrity is said to occur.

• Triggers can enforce more complex restrictions than those defined with CHECK constraints. Unlike CHECK constraints, triggers can refer to a column in another table.

• Triggers can find the difference between the state of a table before and after a data modification.

• Multiple triggers of the same type (INSERT, UPDATE or DELETE) on a table allow multiple different actions to take place in response to the same modification statement.

• Triggers can be used to implement error handling in transactions. Triggers can be used to raise an alert when an error occurs. Stored procedures like sp_OAGetErrorInfo can be called in a trigger to check what caused the problem.

Creating Triggers

The following points should be considered before creating triggers:

• CREATE TRIGGER must be the first statement in the batch that defines a trigger.

• Permission to create a trigger is by default available only to the owner of the table, and cannot be transferred.

• Triggers are part of the database, and the trigger name must follow the rules for identifiers.

• You can create a trigger only in the current database, although it can refer to objects in another database.

There are three types of triggers: INSERT trigger, DELETE trigger and UPDATE trigger. Any type of trigger can be created using the CREATE TRIGGER statement. The syntax for CREATE TRIGGER is:

    CREATE TRIGGER Trigger_name
    ON table
    [WITH ENCRYPTION]
    FOR [DELETE][,] [INSERT] [,] [UPDATE]
    AS
    Sql_statement

Here:

Argument                          Description
Trigger_name                      Name of the trigger.
Table                             Name of the table on which the trigger is to be executed.
WITH ENCRYPTION                   Prevents other users from seeing the text of the trigger after it is loaded into SQL Server.
[DELETE][,][INSERT][,][UPDATE]    Keywords that specify which statements will invoke the trigger.
AS                                Introduces the trigger's action.
Sql_statement                     Transact-SQL statement(s) to be executed when the trigger is invoked.
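As an illustration of this syntax, the cascading delete described earlier for the Fare table might be sketched as follows. The table definitions are hypothetical stand-ins that contain only the Route_Code column and one other column, and the trigger name is invented for the example. The sketch also assumes that no FOREIGN KEY constraint links the two tables, because such a constraint would reject the delete before the trigger fires.

```sql
-- Hypothetical minimal versions of the tables named in the text.
CREATE TABLE Fare (Route_Code int PRIMARY KEY, Fare_Amount money)
CREATE TABLE Flight_Schedule (Flight_No int PRIMARY KEY, Route_Code int)
GO

CREATE TRIGGER Fare_Cascade_Delete
ON Fare
FOR DELETE
AS
-- Remove every schedule row whose route was just deleted from Fare.
-- The deleted pseudotable holds all rows removed by the statement,
-- so multi-row deletes are handled as well.
DELETE FROM Flight_Schedule
WHERE Route_Code IN (SELECT Route_Code FROM deleted)
GO

-- Usage: deleting a route also deletes its schedule rows.
INSERT INTO Fare VALUES (101, 2500)
INSERT INTO Flight_Schedule VALUES (1, 101)
DELETE FROM Fare WHERE Route_Code = 101
SELECT COUNT(*) FROM Flight_Schedule WHERE Route_Code = 101
GO
```

The same DELETE could be repeated inside the trigger for the Reservations and Cancellation tables.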

Examples of Triggers

Triggers use the inserted and deleted tables to hold the changed data that is not yet committed. Since these tables are created dynamically in memory and do not use any permanent storage, they are called logical tables. When a record is added to the table, it is recorded in the base table as well as in the inserted table. When a record is deleted, it is recorded in the deleted table. In an update, both the inserted and deleted tables are used: the original data is moved to the deleted table, and the new data is recorded in the base table and the inserted table. Triggers query these tables to determine whether further action is needed. If the transaction is not successful, the changes are rolled back. The various types of triggers are explained in the following sections.

Insert Triggers

Insert triggers ensure the validity of the data being inserted. When an INSERT statement is executed against the table, the insert trigger is fired. While the insert trigger runs, the inserted data is held in the logical inserted table. A copy of the new row stays in the inserted table until the trigger decides whether to accept the new data. An example of an insert trigger is shown below:

    CREATE TRIGGER agecheck
    ON employee
    FOR INSERT
    AS
    /* EXISTS checks every inserted row, so multi-row inserts are validated too. */
    IF EXISTS (SELECT * FROM inserted WHERE age < 21)
    BEGIN
        PRINT 'Age cannot be less than 21'
        ROLLBACK TRANSACTION
    END

The agecheck trigger ensures that the age in the employee table is not less than 21. If it is lower than 21, the trigger outputs an error message and then rolls back the transaction.

Update Triggers

When an update trigger is executed, the original data is moved to the logical deleted table. The new rows are then moved to the inserted table. Once the data has been moved successfully, the trigger checks whether the data is valid. Update triggers are of two types:

• Table level

The update trigger is fired when any field in the row is updated. The following is an example of an update trigger at the table level.

    CREATE TRIGGER Service_Fare_Check
    ON Service
    FOR UPDATE
    AS
    /* EXISTS checks every updated row, not just one. */
    IF EXISTS (SELECT * FROM inserted WHERE SS_Fare < 50)
    BEGIN
        PRINT 'Service Fare cannot be less than 50'
        ROLLBACK TRANSACTION
    END

• Column level

The column-level update trigger takes action only when data in a particular column is updated. It uses the IF UPDATE(Column_name) clause. The following is an example of an update trigger at the column level.

    CREATE TRIGGER Service_Fare_Column
    ON Service
    FOR UPDATE
    AS
    IF UPDATE(SS_Fare)
    BEGIN
        PRINT 'You cannot update the Fare Field'
        ROLLBACK TRANSACTION
    END

Delete Triggers

Delete triggers are executed when a DELETE statement is issued against rows in a table. When rows are deleted from a table that is protected by a delete trigger, the deleted rows are moved from the target table to the logical deleted table. Delete triggers are used for two reasons: enforcing data integrity and cascading deletes. A delete trigger can prevent the deletion of crucial data, such as a row that is referenced by a foreign key. Without a delete trigger, the only way to maintain this integrity is through the methods native to SQL Server, such as PRIMARY KEY constraints. The other reason for using a delete trigger is to cascade deletes. An example of a cascading delete is when a master record is deleted and the trigger deletes all of its child records.

Example of Delete Trigger

    CREATE TRIGGER No_Records_Deleted
    ON Reservation
    FOR DELETE
    AS
    IF (SELECT COUNT(*) FROM deleted) > 5
    BEGIN
        PRINT 'You should not delete more than 5 records'
        ROLLBACK TRANSACTION
    END

This example ensures that no single DELETE statement removes more than 5 records. If somebody tries to delete more than 5 records, the transaction is cancelled.

Rolling Back a Trigger

Executing a ROLLBACK from within a trigger is different from executing a ROLLBACK from within a nested stored procedure. In a nested stored procedure, a ROLLBACK causes the outermost transaction to abort, but the flow of control continues. However, if a trigger results in a ROLLBACK (because of a fatal error or an explicit ROLLBACK command), the entire batch is aborted.
Suppose that the following pseudocode batch is issued from the Query Analyzer:

    begin tran
    delete…
    update…
    insert…        -- This starts some chain of events that fires a trigger
                   -- that rolls back the current transaction
    update…        -- Execution never gets to here; entire batch is
                   -- aborted because of the rollback in the trigger
    if… commit     -- Neither this statement nor any of the following
                   -- will be executed
    else… rollback
    begin tran…
    insert…
    if… commit
    else… rollback
    GO             -- isql batch terminator only
    select …       -- Next statement that will be executed is here

As you can see, once the trigger fired by the first INSERT statement aborts the batch, SQL Server not only rolls back the first transaction but skips the second transaction completely and continues execution following the GO.

Misconceptions about triggers include the belief that a trigger cannot do a SELECT statement that returns rows and that it cannot execute a PRINT statement. Although you can use SELECT and PRINT in a trigger, doing so is usually a dangerous practice unless you control all the applications that will work with the table that includes the trigger. Otherwise, applications not written to expect a result set or a print message following a change in data might fail when that unexpected output arrives. For the most part, you should assume the trigger will execute invisibly, and that if all goes well, users and application programmers will never even know that a trigger was fired.

Be aware that if a trigger modifies data in the same table on which the trigger exists, using the same operation (INSERT, UPDATE or DELETE) won't, by default, fire that trigger again. That is, if you have an UPDATE trigger for the inventory table that updates the inventory table within the trigger, the trigger will not be fired a second time. You can change this behavior by allowing triggers to be recursive. This is controlled on a database-by-database basis by setting the recursive triggers option to TRUE. It's up to the developer to control the recursion and make sure it terminates appropriately. However, unterminated recursion will not cause an infinite loop because, just like stored procedures, triggers can be nested only to a maximum level of 32.
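The recursive triggers option and a simple termination guard might be sketched as follows. The inventory table definition and the trigger name are hypothetical, pubs is used only as a convenient database to run the sketch in, and the guard shown is one possible way to stop the recursion rather than the only one.

```sql
USE pubs
GO

-- Allow triggers in this database to fire themselves recursively.
EXEC sp_dboption 'pubs', 'recursive triggers', 'true'
GO

-- Hypothetical table for the illustration.
CREATE TABLE inventory
(item_id int PRIMARY KEY,
 qty int NOT NULL,
 last_changed datetime NULL)
GO

CREATE TRIGGER inventory_touch
ON inventory
FOR UPDATE
AS
-- Guard: if this trigger is already running further up the call chain,
-- do nothing, so the UPDATE below does not recurse any further.
IF TRIGGER_NESTLEVEL(@@PROCID) > 1
    RETURN

-- Updating the same table fires this trigger again when the
-- recursive triggers option is on; the guard above ends the recursion.
UPDATE inventory
SET last_changed = GETDATE()
WHERE item_id IN (SELECT item_id FROM inserted)
GO
```

Without the guard, each UPDATE inside the trigger would fire the trigger again until the nesting limit of 32 is reached and the batch is aborted.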
Even if recursive triggers have not been enabled, if separate triggers exist for INSERT, UPDATE, and DELETE statements, one trigger on a table could cause other triggers on the same table to fire (but only if the sp_configure 'nested triggers' option is set to 1, as described next).

A trigger can also modify data in some other table. If that other table has a trigger, whether or not that trigger also fires depends on the current sp_configure value for the nested triggers option. If that option is set to 1 (TRUE), which is the default setting, triggers will cascade to a maximum chain of 32. If an operation would cause more than 32 triggers to fire, the batch will be aborted and any transaction will be rolled back. This prevents an infinite cycle from occurring. If your operation is hitting the limit of 32 firing triggers, you should probably look at your design: you've reached a point at which there are no longer any simple operations, so you're probably not going to be ecstatic with the performance of your system. If your operation truly is so complex that you need to perform further operations on 32 or more tables to modify any data, you could call stored procedures to perform the actions directly rather than enabling and using cascading triggers. Although valuable, overused cascading triggers can make your system a nightmare to maintain.

Limitations of a Trigger

Following are the limitations of triggers:

• FOR triggers can be created only on tables. Although a trigger can refer to temporary tables, FOR triggers cannot be created on views or temporary tables.

• Triggers should not be used to return result sets from the SQL statements within the trigger, because applications that modify the table generally do not expect them.

• Triggers cannot contain object creation statements such as CREATE DATABASE, CREATE TABLE, CREATE INDEX, CREATE PROCEDURE, CREATE DEFAULT, CREATE RULE, CREATE TRIGGER and CREATE VIEW.

• Triggers cannot contain DROP statements such as DROP DATABASE.

• Triggers cannot use database object modification statements such as ALTER TABLE and ALTER DATABASE.

• Triggers cannot use database load operations, such as LOAD DATABASE.

• Triggers cannot use statements related to object permissions, such as GRANT.

Disable Trigger

A trigger fires after the data modification statement has performed its work but before that work is committed to the database. Both the statement and any modifications made in the trigger are implicitly a transaction (whether or not an explicit BEGIN TRANSACTION was declared). Therefore, the trigger can roll back the data modifications that caused the trigger to fire. A trigger has access to the before image and after image of the data via the special pseudotables inserted and deleted. These two tables have the same set of columns as the underlying table being changed. You can check the before and after values of specific columns and take action depending on what you encounter. These tables are not physical structures; SQL Server constructs them from the transaction log. This is why an unlogged operation such as a bulk copy or SELECT INTO does not cause triggers to fire. For regular logged operations, a trigger will always fire if it exists and has not been disabled. A trigger can be disabled by using the DISABLE TRIGGER clause of the ALTER TABLE statement.

The inserted and deleted pseudotables cannot be modified directly because they don't actually exist. As mentioned earlier, the data in these tables can only be queried. The data they appear to contain is based entirely on modifications made to data in an actual, underlying base table. The inserted and deleted pseudotables will contain as many rows as the INSERT, UPDATE, or DELETE statement affected. Sometimes it is necessary to work on a row-by-row basis within the pseudotables, although, as usual, a set-based operation is generally preferable to row-by-row operations.
You can perform row-by-row operations by executing the underlying INSERT, UPDATE, or DELETE in a loop so that any single execution affects only one row, or you can perform the operations by opening a cursor on one of the inserted or deleted tables within the trigger.

The following example sets up a table and a trigger for use with the DISABLE TRIGGER option of ALTER TABLE, which disables the trigger and allows an insert that would normally violate it. ENABLE TRIGGER then re-enables the trigger.

    CREATE TABLE trig_example
    (id int,
     name varchar(10),
     salary money)
    go
    -- Create the trigger.
    CREATE TRIGGER trig1 ON trig_example FOR INSERT
    as
    IF (SELECT COUNT(*) FROM INSERTED WHERE salary > 100000) > 0
    BEGIN
        print 'TRIG Error: you attempted to insert a salary > $100,000'
        ROLLBACK TRANSACTION
    END
    GO
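The listing above stops once the trigger has been created. The remaining steps, disabling and re-enabling the trigger, might continue along these lines. This sketch assumes that the trig_example table and the trig1 trigger from the listing already exist; the inserted names and salaries are invented test values.

```sql
-- Attempt an insert that violates the trigger: trig1 rolls it back.
INSERT INTO trig_example VALUES (1, 'Pat Smith', 100001)
GO

-- Disable the trigger.
ALTER TABLE trig_example DISABLE TRIGGER trig1
GO

-- The same kind of insert now succeeds because trig1 does not fire.
INSERT INTO trig_example VALUES (2, 'Chuck Lee', 100001)
GO

-- Re-enable the trigger.
ALTER TABLE trig_example ENABLE TRIGGER trig1
GO

-- With the trigger enabled again, the insert is rolled back as before.
INSERT INTO trig_example VALUES (3, 'Mary Booth', 100001)
GO
```

Each step is in its own batch because a rollback inside a trigger aborts the rest of the batch that fired it.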




SUMMARY

• A trigger is a block of code made up of a set of T-SQL statements that is activated in response to certain actions

• A trigger fires in response to an INSERT, UPDATE or DELETE statement

• There are several restrictions on the use of triggers

• A trigger can be used to enforce business rules and data integrity

• A magic table is a conceptual table that is structurally similar to the table on which the trigger is defined

• There are two types of magic table: INSERTED and DELETED

• SQL Server supports multiple, recursive and nested triggers

• A trigger can be dropped using the DROP TRIGGER statement

• A trigger can be altered using the ALTER TRIGGER statement

• A trigger can be viewed using the sp_help and sp_helptext system stored procedures



REVIEW QUESTIONS

1. What are triggers?
2. How can a trigger be invoked on demand?
3. Can a trigger be fired recursively?
4. Does a trigger get fired when the DROP TABLE command is executed?
5. Can we write the COMMIT and ROLLBACK statements in a trigger definition?
