IBM PureData System for Analytics (Formerly known as, IBM Netezza) - Ravi

Netezza database objects creation by

DESCRIPTION

By, www.etraining.guru

  • IBM PureData System for Analytics (formerly known as IBM Netezza)

    - Ravi

  • CREATE DATABASE

  • Default database: SYSTEM
    Default database template: MASTER_DB
    In the Netezza world, DATABASE = CATALOG.
    Starting in release 7.0.3, Netezza supports multiple schemas within each database.
    Before 7.0.3, the system supported one default schema per database. That single schema matched the name of the database user who created the database.
    To enable multiple-schema support, edit the configuration file (vi /nz/data/postgresql.conf) and set the variable enable_schema_dbo_check:

    0: Single schema.

    1: Enables multiple-schema support in limited mode. A query that references an invalid schema produces a warning.

    2: Enables full support for multiple schemas. Users can create, alter, set, and drop schemas. A query that references an invalid schema returns an error.

  • Syntax:
    CREATE DATABASE name
    [ WITH ] [ DEFAULT CHARACTER SET charset ] [ DEFAULT CHARACTER SET charset COLLATION collation ] [ COLLECT HISTORY [ ON | OFF ] ] [ REPLICATION SET name ]

    Creates a database, optionally adding it to the specified replication set.

    Example:
    SYSTEM(ADMIN)=> create database trainingdb;
    CREATE DATABASE
    SYSTEM(ADMIN)=> \l
      List of databases
      DATABASE  | OWNER
    ------------+-------
     MASTER_DB  | ADMIN
     SYSTEM     | ADMIN
     TRAININGDB | ADMIN
    (3 rows)

  • SYSTEM(ADMIN)=> \c trainingdb
    You are now connected to database trainingdb
    TRAININGDB(ADMIN)=> select objid, database from _v_database;
     OBJID  |  DATABASE
    --------+------------
          2 | MASTER_DB
          1 | SYSTEM
     257262 | TRAININGDB
    (3 rows)

    TRAININGDB(ADMIN)=> \q
    [nz@netezza 257323]$ nzstats -type database

    DB Id  DB Name    Create Date         Owner Id Num Tables Num Views Num Active Users
    ------ ---------- ------------------- -------- ---------- --------- ----------------
         1 SYSTEM     2012-11-07 05:36:03      500          0         0                7
         2 MASTER_DB  2012-11-07 05:36:03      500          0         0                0
    257262 TRAININGDB 2014-01-21 01:46:57      500          1         0                0

    Question: where is this newly created database stored, on the SPU disks or on the Netezza host disks?
    Answer: /nz/data/base/

  • When we create a database, the system automatically creates 3 schemas:
    - INFORMATION_SCHEMA
    - DEFINITION_SCHEMA
    - The owner schema, which is the default schema
    INFORMATION_SCHEMA and DEFINITION_SCHEMA are used by the system to hold information about system objects and views; users cannot access them.
    A database name can be at most 128 bytes long.
    The owner of a database automatically has full privileges on all the objects in that database. Similarly, on systems that support multiple schemas in a database, the schema owner automatically has full privileges on all the objects in the schema.
    Related commands: SET CATALOG, SET SCHEMA

  • You can manage Netezza databases using:
    (1) NZSQL
    (2) Netezza Performance Portal
    (3) NzAdmin tool
    (4) Web Admin Interface
    (5) Data connectivity applications that use ODBC, JDBC, or OLE DB (for example, Aginity Workbench)

  • CREATE TABLE

  • Syntax:

    CREATE [ TEMPORARY | TEMP ] TABLE table_name (
        column_name type [ [ constraint_name ] column_constraint [ constraint_characteristics ] ] [, ... ]
        [ [ constraint_name ] table_constraint [ constraint_characteristics ] ] [, ... ]
    )
    [ DISTRIBUTE ON ( column [, ...] ) ]
    [ ORGANIZE ON { ( column [, ...] ) | NONE } ]
    [ ROW SECURITY ]

    Example:
    TRAININGDB(ADMIN)=> \dt
    No relations found.
    TRAININGDB(ADMIN)=> create table t1 (c1 int, c2 int) distribute on random;
    CREATE TABLE
    TRAININGDB(ADMIN)=> \dt
     List of relations
     Name | Type  | Owner
    ------+-------+-------
     T1   | TABLE | ADMIN
    (1 row)

    Creating a table does not consume any space on the data slices. Space on the data slices is allocated only when you insert rows.

  • Extent size: 3 MB
    Each extent is divided into 24 pages (also called blocks) of 128 KB each.
    Action item: Log in to NzAdmin and watch space being allocated the moment you insert rows into a table.
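The extent and page sizes above can be checked with a little arithmetic. The following Python sketch is only an illustration: it assumes that space is handed out in whole extents, and the helper name `extents_needed` is hypothetical, not a Netezza tool.

```python
import math

PAGE_KB = 128           # page (block) size quoted on the slide
PAGES_PER_EXTENT = 24   # pages per extent quoted on the slide
EXTENT_KB = PAGE_KB * PAGES_PER_EXTENT  # 24 * 128 KB = 3072 KB = 3 MB


def extents_needed(data_kb: float) -> int:
    """Whole extents needed to hold data_kb kilobytes.

    Assumption for illustration: allocation happens in whole extents,
    and an empty table needs none.
    """
    if data_kb <= 0:
        return 0
    return math.ceil(data_kb / EXTENT_KB)


if __name__ == "__main__":
    print("extent size in MB:", EXTENT_KB / 1024)
    for size_kb in (0, 1, 3072, 3073):
        print(size_kb, "KB ->", extents_needed(size_kb), "extent(s)")
```

Under this assumption an empty table needs no extents, which matches the statement that creating a table consumes no space until rows are inserted.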

  • Checkpoint time! Test what we have learned:
    (1) Create a new database.
    (2) Find out its object ID.
    (3) Create a table (using hash or random distribution). Check through NzAdmin whether any space has been allocated.
    (4) Insert a new row. Check the space allocation through NzAdmin again.
    (5) Insert another row. Check the space allocation through NzAdmin again.

  • NETEZZA DATATYPES

  • What is a datatype?
    A datatype represents a set of values.
    Each column can have only one datatype; you cannot mix datatypes within a column.
    1. Exact numeric data types:
    To determine the smallest data type for fixed-point numerics:
    SELECT MIN(column_name), MAX(column_name) FROM table_name;
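Once the MIN/MAX query has returned the value range of a column, picking the smallest integer type is a simple range check. The Python sketch below assumes the usual 1-, 2-, 4-, and 8-byte signed ranges for BYTEINT, SMALLINT, INTEGER, and BIGINT; the function name `smallest_integer_type` is hypothetical and is not part of Netezza.

```python
# Signed ranges assumed for the four integer types (1, 2, 4, and 8 bytes).
INTEGER_TYPES = [
    ("BYTEINT", -2**7, 2**7 - 1),
    ("SMALLINT", -2**15, 2**15 - 1),
    ("INTEGER", -2**31, 2**31 - 1),
    ("BIGINT", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value: int, max_value: int) -> str:
    """Return the narrowest integer type that holds [min_value, max_value]."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in any integer type")


if __name__ == "__main__":
    # Feed in the MIN and MAX values returned by the query above.
    print(smallest_integer_type(0, 100))
    print(smallest_integer_type(-1, 200))
    print(smallest_integer_type(0, 40000))
```

In practice you would leave some headroom for future values rather than sizing the column to the current maximum exactly.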

  • 2. Approximate numeric data types:
    Do not use floating-point data types for distribution columns, join columns, or columns that require mathematical operations such as SUM and AVG.
    Netezza cannot run a fast hash join on a floating-point data type; it must run a slower sort-merge join instead.

  • 3. Character string data types:
    To determine the optimal character data type, use the query below:
    SELECT MAX(LENGTH(TRIM(column_name))), AVG(LENGTH(TRIM(column_name))) FROM table_name;
    If MAX(LENGTH) > CHAR ==> CHAR instead of VARCHAR
    If AVG length + 2 < CHAR ==> use VARCHAR instead of CHAR

  • 4. Boolean data types:
    You can use the following words to specify booleans:
    - True or false
    - On or off
    - 0 or 1
    - true or false
    - t or f
    - on or off
    - yes or no
    Never use a Boolean data type for distribution columns.

  • 5. Temporal data types:

  • 6. Binary data types:
    Netezza supports two types of binary data types:

  • Netezza internal data types
    Netezza reserves the column names below as internal data types:

  • Row size
    Every row of every table carries a 24-byte fixed overhead for the rowid, createxid, and deletexid.
    If the table has any nullable columns, a null vector is required; it takes N/8 bytes, where N is the number of columns in the record. The system rounds the size of this header up to a multiple of 4 bytes.
    In addition, the system adds a 4-byte record header if any of the following is true:
    - A column of type VARCHAR
    - A column of type CHAR whose length is greater than 16 (stored internally as VARCHAR)
    - A column of type NCHAR
    - A column of type NVARCHAR
    A record has no header only if all the columns are defined as NOT NULL, there are no character data types larger than 16 bytes, and there are no variable-length character data types.
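The row-overhead rules above can be turned into a small calculator. This Python sketch covers only the per-row overhead, not the column data itself. It assumes that N/8 is rounded up to whole bytes and that the rounding to a multiple of 4 applies to the null vector; the `Column` class and the function names are hypothetical.

```python
import math
from dataclasses import dataclass

FIXED_OVERHEAD = 24   # rowid, createxid, and deletexid
RECORD_HEADER = 4     # added when a variable-length character column exists


@dataclass
class Column:
    name: str
    type_name: str        # e.g. "INTEGER", "CHAR", "VARCHAR"
    length: int = 0       # declared length, used for CHAR columns
    nullable: bool = True


def null_vector_bytes(columns):
    """N/8 bytes when any column is nullable, rounded up to a multiple of 4.

    Assumption: N/8 is first rounded up to whole bytes.
    """
    if not any(c.nullable for c in columns):
        return 0
    raw = math.ceil(len(columns) / 8)
    return math.ceil(raw / 4) * 4


def needs_record_header(columns):
    """True for VARCHAR, NCHAR, NVARCHAR, or CHAR longer than 16."""
    for c in columns:
        t = c.type_name.upper()
        if t in ("VARCHAR", "NCHAR", "NVARCHAR"):
            return True
        if t == "CHAR" and c.length > 16:
            return True
    return False


def row_overhead_bytes(columns):
    header = RECORD_HEADER if needs_record_header(columns) else 0
    return FIXED_OVERHEAD + null_vector_bytes(columns) + header


if __name__ == "__main__":
    table = [
        Column("id", "INTEGER", nullable=False),
        Column("name", "VARCHAR", length=50),
    ]
    print(row_overhead_bytes(table))
```

Declaring columns NOT NULL and keeping CHAR columns at 16 bytes or fewer is what keeps a row at the bare 24-byte overhead in this model.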

  • Data types description

  • SYNONYMS

  • A synonym is an alternative way of referencing tables, views, or functions.
    It lets us create easily typed names for long object names.
    You can use the following synonym commands:
    - Create Synonym
    - Drop Synonym
    - Alter Synonym
    - Grant Synonym
    - Revoke Synonym

    Syntax:
    SYSTEM(ADMIN)=> \h create synonym
    Command:     CREATE SYNONYM
    Description: Creates a new synonym
    Syntax:
    CREATE SYNONYM name FOR refname

    Example:
    SYSTEM(ADMIN)=> create synonym s_skew_student for skew..student;
    CREATE SYNONYM

  • To display synonyms:
    SYSTEM(ADMIN)=> \dy
              List of relations
          Name      |  Type   | Owner
    ----------------+---------+-------
     S_SKEW_STUDENT | SYNONYM | ADMIN
    (1 row)
    SYSTEM(ADMIN)=> alter synonym s_skew_student rename to s_student_skew;
    ALTER SYNONYM
    SYSTEM(ADMIN)=> alter synonym s_student_skew owner to RAJ;
    ALTER SYNONYM
    SYSTEM(ADMIN)=> \dy
              List of relations
          Name      |  Type   | Owner
    ----------------+---------+-------
     S_STUDENT_SKEW | SYNONYM | RAJ
    (1 row)

    SYSTEM(ADMIN)=> drop synonym s_student_skew
    SYSTEM(ADMIN)-> ;
    DROP SYNONYM
    SYSTEM(ADMIN)=> \dy
    No relations found.
    SYSTEM(ADMIN)=>

  • VIEWS

  • A view is simply a stored, named SQL statement (only the definition is stored, not the result set), so that it can easily be reused.
    For example, suppose we frequently issue the following query:
    SELECT student.sid, student.sname, marks.per FROM student, marks WHERE student.sid = marks.sid;

    We can create a view for it as below:
    CREATE VIEW v_student_marks AS SELECT student.sid, student.sname, marks.per FROM student, marks WHERE student.sid = marks.sid;

    From then on, we can simply select from the view:

    SELECT * FROM v_student_marks;

    Syntax:
    SYSTEM(ADMIN)=> \h create view
    Command:     CREATE VIEW
    Description: Constructs a virtual table
    Syntax:
    CREATE VIEW view AS SELECT query

        Creates a new view.

    CREATE OR REPLACE VIEW view AS SELECT query

        Creates a new view or replaces an existing view.

  • Materialized Views

  • Sorted, Projected, and Materialized (SPM) views
    An SPM view projects a subset of a table's columns, sorts the rows, and stores the result on disk.

    Syntax:
    SYSTEM(ADMIN)=> \h create materialized view
    Command:     CREATE MATERIALIZED VIEW
    Description: Creates a new materialized view
    Syntax:
    CREATE MATERIALIZED VIEW view AS SELECT column [, ...] FROM table [ ORDER BY column [, ...] ]

        Creates a new materialized view.

    CREATE OR REPLACE MATERIALIZED VIEW view AS SELECT column [, ...] FROM table [ ORDER BY column [, ...] ]

        Creates a new materialized view or replaces an existing materialized view.

    SPM views are used to improve query performance.
    If the selected columns exist in a materialized view, the optimizer reads the data from the materialized view instead of going through the base table.

  • A few restrictions:
    - Only one base table in the FROM clause
    - No WHERE clause
    - No expressions in materialized view columns
    - Only a user table can be the base table of an SPM view. Do not specify a CBT, system table, etc. as the base table.

    When you insert a new record into the base table, the same record is inserted into the materialized view table as well. So, as time goes on, unsorted records accumulate at the end of the materialized view table.

    So we should periodically refresh the SPM view manually, by suspending and refreshing it:

    ALTER VIEW M_STUDENT_SKEW MATERIALIZE REFRESH;

    Example:
    SKEW(ADMIN)=> create materialized view m_student_skew as select SID, SNAME from student order by sid;
    CREATE MATERIALIZED VIEW
    SKEW(ADMIN)=> \dm
                        List of relations
          Name      |       Type        | Owner | STATE
    ----------------+-------------------+-------+--------
     M_STUDENT_SKEW | MATERIALIZED VIEW | ADMIN | ACTIVE
    (1 row)

  • When you use ALTER VIEWS ON table MATERIALIZE REFRESH, the system refreshes all suspended views, and all non-suspended views whose unsorted data has exceeded the refresh threshold.
    Setting the refresh threshold: SET SYSTEM DEFAULT MATERIALIZE THRESHOLD TO number
    The threshold specifies the percentage of unsorted data in the materialized view. Default: 20.

    You can set the threshold from 1 to 99.
    For the full syntax, see \h ALTER VIEW.
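The refresh rule described above (suspended views, plus active views whose unsorted share exceeds the threshold) can be sketched as a small decision function. This Python model is only an illustration of the rule as stated on the slide; the `MaterializedView` class, its fields, and the function names are hypothetical, and "exceeded" is read as strictly greater than the threshold.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 20  # default percentage quoted on the slide


@dataclass
class MaterializedView:
    name: str
    total_rows: int
    unsorted_rows: int
    suspended: bool = False


def unsorted_percent(view: MaterializedView) -> float:
    """Share of rows appended after the last sort, as a percentage."""
    if view.total_rows == 0:
        return 0.0
    return 100.0 * view.unsorted_rows / view.total_rows


def views_to_refresh(views, threshold: int = DEFAULT_THRESHOLD):
    """Names of views a refresh would pick up under the stated rule."""
    if not 1 <= threshold <= 99:
        raise ValueError("threshold must be between 1 and 99")
    return [
        v.name
        for v in views
        if v.suspended or unsorted_percent(v) > threshold
    ]


if __name__ == "__main__":
    views = [
        MaterializedView("A", 100, 10),
        MaterializedView("B", 100, 30),
        MaterializedView("C", 100, 0, suspended=True),
    ]
    print(views_to_refresh(views))
```

A lower threshold refreshes more often and keeps the view better sorted, at the cost of more refresh work.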

  • The nzbackup command backs up only the SPM view definition, not the SPM view-specific data.
    nzrestore automatically creates and populates materialized views from the base tables, unless the SPM view is in the suspended state.

    Zone maps are created for the ORDER BY columns in materialized views.
    Best practices:
    - Include the most frequently used and most frequently restricted columns.
    - Even though you can have more than one materialized view on a table, keep this number small.
    - Limit the columns in materialized views to as few as possible.

  • System Views

  • Netezza has many catalog views. Let's look at a few of them here:
    _v_view: Lists the views available in Netezza
    _v_database: Information about all databases
    _v_user: Information about the users in the Netezza system
    _v_table: List of tables, both system tables and management tables

    RAVI(ADMIN)=> select objid, tablename from _v_table where tablename='CUSTOMER';
     OBJID  | TABLENAME
    --------+-----------
     236354 | CUSTOMER
    (1 row)

    _v_relation_column: Table/column mapping
    RAVI(ADMIN)=> select OBJID, NAME, ATTNAME from _v_relation_column where objid=236354;
     OBJID  |   NAME   | ATTNAME
    --------+----------+---------
     236354 | CUSTOMER | C3
     236354 | CUSTOMER | C2
     236354 | CUSTOMER | C1
    (3 rows)

  • _v_objects: Lists different objects such as tables, views, functions, etc.
    _v_qrystat: Statistics for currently running queries
    _v_qryhist: History of what has been run
    _v_aggregate: Returns a list of all defined aggregates
    _v_datatype: All system datatypes
    _v_function: All defined functions
    _v_group: List of all groups
    _v_system_info: System information such as system status, version info, etc.
    _v_index: List of all user indexes
    _v_operator: List of all defined operators
    _v_relation_column_def: Returns a list of all attributes of a relation that have defined defaults

  • _v_sequence: Returns a list of all defined sequences
    _v_table_index: Returns a list of all user table indexes
    _v_user: Returns a list of all users (\du)
    _v_usergroups: Returns a list of all groups of which the user is a member
    _v_groupusers: List of all users of a group
    _v_sys_group_priv: Returns a list of all defined group privileges (\dpg group_name)
    _v_sys_index: Returns a list of all system indexes (\dSi)
    _v_sys_priv: Returns a list of all user privileges (\dp)
    _v_sys_table: Returns a list of all system tables (\dSt)
    _v_sys_user_priv: Lists the permissions granted to a user (\dpu)
    _v_sys_view: Returns a list of all system views (\dSv)

  • USERS GROUPS PRIVILEGES

  • Netezza database users
    To access the Netezza database, users must have Netezza database user accounts. Remember: no OS-level user IDs are needed!
    When a user accesses Netezza databases, Netezza determines the access privileges to database objects and the administrative permissions to various tasks and capabilities.
    Create User Example:

  • Create User (Syntax)

  • Why can't I create a user like this?
    Because users are global objects, and another global object (a database) already has the same name.

  • Netezza database groups
    Groups allow administrators to group users by department or functionality.
    By default, there is a predefined group called PUBLIC. As users are created, they are automatically added to the group PUBLIC. You cannot remove users from the group PUBLIC, and you cannot drop the group PUBLIC.
    Users can be members of many groups; however, groups cannot be members of other groups.
    Groups, users, and databases share a common namespace, so group, user, and database names must be unique.

    For example: You cannot have a group named RAVI, a user named RAVI, and a database named RAVI at the same time.
    How do you display the list of groups?
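The shared namespace for groups, users, and databases can be pictured as a single lookup table keyed by name. The Python sketch below is a toy model, not Netezza's catalog: the `GlobalNamespace` class is hypothetical, and it assumes names are compared after folding to upper case, as regular identifiers are.

```python
class GlobalNamespace:
    """Toy model of one namespace shared by groups, users, and databases."""

    def __init__(self):
        self._owners = {}  # folded name -> kind of object that owns it

    def create(self, kind: str, name: str) -> None:
        key = name.upper()  # assumption: regular identifiers fold to upper case
        if key in self._owners:
            raise ValueError(
                f"cannot create {kind} {key}: a {self._owners[key]} "
                "with that name already exists"
            )
        self._owners[key] = kind


if __name__ == "__main__":
    ns = GlobalNamespace()
    ns.create("database", "RAVI")
    try:
        ns.create("user", "ravi")
    except ValueError as err:
        print(err)
```

This mirrors the earlier question about a failed CREATE USER: the name was already taken by a database.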

  • Groups Creation (Syntax)

  • CREATE GROUP/USER (Examples!)

  • Default Users & Passwords
    nz (nz): Linux user, not exposed to NPS client users

    admin (password): NPS database super-user for the NPS host software, with full access to all system functions and objects at all times

    root (Netezza): Linux super-user, which provides system root login

    The default database group is called public. All users are automatically assigned as members of the public group. You cannot delete the public group or remove users from it.

  • Privileges
    Netezza has two types of privileges:
    (1) Object privileges
    (2) Administrative privileges
    Object privileges apply to individual object instances. Administrative privileges apply to the system as a whole.

    List of object privileges:
    Abort: Allows the user to abort sessions, i.e., to use the nzsession command
    All: Allows the user to have all the object privileges
    Alter: Allows the user to modify the object attributes
    Delete: Allows the user to delete table rows
    Drop: Allows the user to drop all objects
    Execute: Allows the user to execute UDFs and UDAs in SQL queries
    GenStats: Allows the user to generate statistics on tables/databases
    Groom: Allows the user to run the GROOM TABLE command
    Insert: Allows the user to insert rows into a table
    List: Allows the user to display an object name
    Select: Allows the user to select (or query) rows within a table
    Truncate: Allows the user to delete all rows from a table
    Update: Allows the user to modify table rows

  • List of administrator privileges:
    Backup: Allows the user to perform backups. The user can run the nzbackup command.
    [Create] Aggregate: Allows the user to create user-defined aggregates (UDAs) and to operate on existing UDAs
    [Create] Database: Allows the user to create databases
    [Create] External Table: Allows the user to create external tables. Permission to operate on existing tables is controlled by object privileges.
    [Create] Function: Allows the user to create user-defined functions (UDFs) and to operate on existing UDFs
    [Create] Group: Allows the user to create groups. Permission to operate on existing groups is controlled by object privileges.
    [Create] Index: For system use only. Users cannot create indexes.
    [Create] Library: Allows the user to create user-defined shared libraries
    [Create] Materialized View: Allows the user to create materialized views
    [Create] Procedure: Allows the user to create stored procedures
    [Create] Sequence: Allows the user to create sequences
    [Create] Synonym: Allows the user to create synonyms
    [Create] Table: Allows the user to create tables
    [Create] Temp Table: Allows the user to create temporary tables
    [Create] User: Allows the user to create users
    [Create] View: Allows the user to create views
    [Manage] Hardware: Allows the user to do the following hardware-related operations: view hardware status, manage SPUs, manage topology and mirroring, and run diagnostic tests. The user can run the nzds and nzhw commands.
    [Manage] Security: Allows the user to run commands and operations that relate to history databases, such as creating and cleaning up history databases
    [Manage] System: Allows the user to do management operations. For example: nzsystem, nzstate, nzstats, and nzsession priority
    Restore: Allows the user to restore the system. The user can run the nzrestore command.
    Unfence: Allows the user to create an unfenced user-defined function (UDF) or user-defined aggregate (UDA)

  • Grant Syntax

  • Grant Example (Object Privilege)

  • Grant Example (Admin Privilege)

  • REVOKE Syntax

  • SQL IDENTIFIERS

  • Types of identifiers
    There are 2 types of identifiers in Netezza:
    Regular identifiers
    - Are case-insensitive
    - Are converted to the default system case
    - For example: Sales and SALES are equivalent
    Delimited identifiers
    - Are enclosed in double quotation marks
    - Are case-sensitive
    - Are not converted to the default system case
    - For example: "Sales" and "SALES" are different

  • Types of Identifiers (Regular Identifier Example)

  • Types of Identifiers (Delimited Identifier Example)
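The difference between the two identifier types can be modeled in a few lines. This Python sketch is an illustration only: it assumes the default system case is upper case (as the nzsql listings in this deck suggest, where trainingdb is shown as TRAININGDB), and the function name `resolve_identifier` is hypothetical.

```python
def resolve_identifier(text: str) -> str:
    """Return the name the system would store for an identifier.

    Delimited identifiers (in double quotes) keep their exact case.
    Regular identifiers are folded to the default system case, which
    is assumed to be upper case here.
    """
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return text[1:-1]
    return text.upper()


if __name__ == "__main__":
    for ident in ("Sales", "SALES", '"Sales"', '"SALES"'):
        print(ident, "->", resolve_identifier(ident))
```

Under this model Sales and SALES resolve to the same name, while "Sales" and "SALES" stay distinct.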

  • SEQUENCE

  • What is a sequence?
    A sequence is a named object in a database that can be used to generate unique numbers.
    A sequence may be of type byteint, smallint, integer, or bigint.
    You can use sequence values wherever you would use numeric values.
    You can create, alter, and drop named sequences.
    Syntax:
    CREATE SEQUENCE sequence_name AS data_type [options];

    where the options are the following:
    > START WITH start_value
    > INCREMENT BY increment_value
    > NO MINVALUE | MINVALUE minimum_value
    > NO MAXVALUE | MAXVALUE maximum_value
    > NO CYCLE | CYCLE
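How the sequence options interact is easier to see in a small simulation. The Python sketch below is a toy model rather than Netezza's implementation: the `Sequence` class is hypothetical, it handles only positive increments, and it assumes a default MINVALUE of 1, a default MAXVALUE equal to the largest value of the data type, and no cycling unless CYCLE is requested. It also ignores the gaps that a real system can produce when it caches sequence values.

```python
# Largest value of each signed integer type a sequence may use.
TYPE_MAX = {
    "byteint": 2**7 - 1,
    "smallint": 2**15 - 1,
    "integer": 2**31 - 1,
    "bigint": 2**63 - 1,
}


class Sequence:
    """Toy ascending sequence: START WITH, INCREMENT BY, MIN/MAXVALUE, CYCLE."""

    def __init__(self, data_type, start=None, increment=1,
                 minvalue=1, maxvalue=None, cycle=False):
        if increment <= 0:
            raise ValueError("this sketch models ascending sequences only")
        self.increment = increment
        self.minvalue = minvalue
        self.maxvalue = TYPE_MAX[data_type] if maxvalue is None else maxvalue
        self.cycle = cycle
        self._next = minvalue if start is None else start

    def next_value(self) -> int:
        if self._next > self.maxvalue:
            if not self.cycle:
                raise OverflowError("sequence has reached its MAXVALUE")
            self._next = self.minvalue  # CYCLE: wrap around to MINVALUE
        value = self._next
        self._next += self.increment
        return value


if __name__ == "__main__":
    seq = Sequence("integer", start=1, increment=2, maxvalue=5, cycle=True)
    print([seq.next_value() for _ in range(5)])
```

With NO CYCLE, the same sequence would raise an error once it passes MAXVALUE instead of wrapping around.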

  • Sequences do not support cross database access; you cannot obtain a sequence value from a sequence defined in a different database.

  • Questions?

    Note on CREATE DATABASE: Within the nzsql command environment, you can also use the \c option to connect to a new database. Unlike the SET CATALOG command, the \c option closes the current session and starts a new session to the database. The option syntax is as follows:
    \c[onnect] [dbname [user] [password]]

    Note on datatypes:
    Varchar: Variable-length, non-Unicode character data. Varchar stores single-byte character data.
    Nvarchar: Variable-length Unicode character data. Nvarchar requires twice the storage space of varchar.
    Unicode means a 16-bit character encoding scheme that allows characters from many other languages, such as Arabic, Hebrew, Chinese, and Japanese.

    Reference: http://pic.dhe.ibm.com/infocenter/ntz/v7r0m3/index.jsp?topic=%2Fcom.ibm.nz.adm.doc%2Fr_sysadm_qhist_views_tbls.html

    Fenced or unfenced considerations
    When creating a user-defined function (UDF), consider whether to make it an unfenced UDF. By default, UDFs are created as fenced UDFs. Fenced indicates that the database should run the UDF in a separate thread. For complex UDFs, this separation is meaningful because it avoids potential problems such as generating unique SQL cursor names. Not having to be concerned about resource conflicts is one reason to stick with the default and create the UDF as a fenced UDF. A UDF created with the NOT FENCED option tells the database that the user is requesting that the UDF run within the same thread that initiated it. Unfenced is a suggestion to the database, which can still decide to run the UDF in the same manner as a fenced UDF.

    CREATE FUNCTION QGPL.FENCED (parameter1 INTEGER)
    RETURNS INTEGER
    LANGUAGE SQL
    BEGIN
        RETURN parameter1 * 3;
    END;

    CREATE FUNCTION QGPL.UNFENCED1 (parameter1 INTEGER)
    RETURNS INTEGER
    LANGUAGE SQL
    NOT FENCED -- Build the UDF to request faster execution via the NOT FENCED option
    BEGIN
        RETURN parameter1 * 3;
    END;

    Notes on sequences:
    The default MINVALUE is 1.
    The default MAXVALUE is the maximum value possible in the datatype.
    By default, sequences do not cycle.

    Sequences have gaps because IBM Netezza caches sequence values on the host and SPUs for efficient operation.