IBM PureData System for Analytics
(formerly known as IBM Netezza)
- Ravi
Loading and Unloading Tables
Data Loading/Unloading Components:
• External Tables
• nzload command
• Backup and Restore
• nz_migrate utility
*
External Tables
*
In an IBM Netezza environment, there are the following types of tables:
• System tables: stored on the host
• User tables: stored on the disks in the storage arrays
• External tables: stored as flat files on the host or client systems
An external table allows Netezza to treat an external file as a database table.
An external table has a definition (also called a table schema), but the actual data exists outside the Netezza appliance database.
Netezza can treat a file on a client system as an external table by using the REMOTESOURCE option.
You can use INSERT INTO and SELECT FROM on external tables.
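As a minimal sketch of the points above, the shell function below prints SQL that defines an external table over a flat file and then loads it into a database table with INSERT INTO ... SELECT. The names ext_customer, /tmp/customer.dat and mydb are assumptions made for illustration; CUSTOMER is the table used elsewhere in this deck.

```shell
#!/bin/sh
# Sketch: print SQL that defines an external table over a flat file and loads from it.
# ext_customer and /tmp/customer.dat are hypothetical names chosen for illustration.
external_table_load_sql() {
  cat <<'SQL'
-- Define an external table with the same columns as CUSTOMER; the data stays in the file.
CREATE EXTERNAL TABLE ext_customer SAMEAS customer
USING (DATAOBJECT ('/tmp/customer.dat') DELIMITER '|');
-- Load the rows of the file into the database table.
INSERT INTO customer SELECT * FROM ext_customer;
SQL
}

# Print the statements. To run them, pipe the output into nzsql
# (the database name mydb is an assumption):
#   external_table_load_sql | nzsql -d mydb
external_table_load_sql
```

Printing the SQL instead of executing it keeps the sketch usable on a machine without the Netezza client tools.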
*
External Tables: Example 1
*
*
External Tables: Loading data through ODBC
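A hedged sketch of an ODBC load: the function below prints an INSERT statement that reads a file on the client system through a transient external table, using the REMOTESOURCE option described earlier. The client file path is a hypothetical argument; CUSTOMER is the table used elsewhere in this deck.

```shell
#!/bin/sh
# Sketch: print SQL that loads a client-side file over ODBC.
# The file path passed in is hypothetical; it must exist on the client, not on the host.
remote_load_sql() {
  client_file=$1
  cat <<SQL
INSERT INTO customer
SELECT * FROM EXTERNAL '$client_file'
USING (DELIM '|' REMOTESOURCE 'ODBC');
SQL
}

# Print the statement for a hypothetical client-side file.
remote_load_sql /home/user/customer.dat
```

The statement would be submitted from an ODBC client session, so that the driver on the client can stream the file to the appliance.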
*
Managing External Tables
• You can INSERT INTO and DROP an external table
• You can join an external table with database tables
• You cannot DELETE from, TRUNCATE, or UPDATE an external table
• No more than one external table may appear in the FROM/WHERE clause of a query or subquery
• UNION operations between external tables are not supported
• Statistics are automatically generated for External Tables
*
External Tables: Unload data
*
Transient External Tables
Transient external tables (TET) provide a way to define an external table that exists only for the duration of a single query
*
Export data using TET:
create external table '/tmp/customer.out' USING (DELIM '|') AS select * from customer;
Import data using TET:
truncate table customer;
INSERT INTO CUSTOMER SELECT * FROM EXTERNAL '/tmp/customer.out' USING (DELIM '|');
Compressed Binary Format External Tables
create external table ext_customer sameas customer USING (DATAOBJECT '/tmp/customer1.out' FORMAT 'internal' COMPRESS true);
*
\d customer
Table "CUSTOMER"
 Attribute |         Type          | Modifier | Default Value
-----------+-----------------------+----------+---------------
 CID       | SMALLINT              |          |
 CNAME     | CHARACTER VARYING(30) |          |
 CAGE      | BYTEINT               |          |
 CADDRESS  | CHARACTER VARYING(50) |          |
Distributed on hash: "CID"
NZLOAD
The nzload command is a wrapper around the CREATE EXTERNAL TABLE and INSERT INTO commands.
nzload lets you load data from the local host or from a remote client.
nzload is a command-line program. You can provide its inputs on the command line or through a control file.
The nzload command is an ODBC client application that loads data remotely or locally. You can use it on the Netezza host and on all supported client platforms.
Statistics are generated for load operations.
*
How the nzload command works
Processes the command-line load options
Sends queries to the host to create an external table definition
Runs the insert/select query to load the data
Drops the external table when the load completes
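The steps above can be approximated by hand. The sketch below prints SQL that is roughly equivalent to one nzload run: create an external table definition, run the insert/select, then drop the external table. The external table name ext_load_tmp is hypothetical (nzload chooses its own name), and the statements nzload actually sends may differ.

```shell
#!/bin/sh
# Sketch: print SQL roughly equivalent to what nzload does for one load.
# ext_load_tmp is a hypothetical name; the real statements issued by nzload may differ.
nzload_equivalent_sql() {
  table=$1; datafile=$2; delim=$3
  cat <<SQL
CREATE EXTERNAL TABLE ext_load_tmp SAMEAS $table USING (DATAOBJECT ('$datafile') DELIMITER '$delim');
INSERT INTO $table SELECT * FROM ext_load_tmp;
DROP TABLE ext_load_tmp;
SQL
}

# Print the three statements for a hypothetical load of CUSTOMER.
nzload_equivalent_sql customer /tmp/customer.dat '|'
```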
An nzload operation is treated as a single transaction; that is, all records are loaded with a single transaction ID.
If the load fails, the records are logically deleted. The storage space allocated for those records must be recovered later, either with nzreclaim or, if this was the first load into the table, with TRUNCATE TABLE.
Other users can run queries against the table while it is being loaded. New data becomes visible to them only after the transaction has been committed.
*
NZLOAD important options
nzload accepts many options and arguments, but the following are required:
• -host <host_name>
• -u <username>
• -pw <password>
• -db <database_name>
• -t <table_name>
Commonly used options and arguments:
• -df <filename> /* data file (input rows to be loaded) */
• -cf <filename> /* control file name */
• -delim <char> /* delimiter; the default is \t */
• -nullValue <char> /* the default is NULL; you can change this to any 1 to 4 characters */
• -maxErrors
• -dateDelim
• -dateStyle
• -allowReplay /* lets the load continue if the system pauses because of an SPU reset or failover */
nzload -db database_name -t table_name -delim '|' -maxErrors num_errors -df source_file_name
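A minimal sketch of assembling such a command line in a script. The database, table and file names are hypothetical, and the connection values fall back to placeholders when the NZ_HOST, NZ_USER and NZ_PASSWORD environment variables are unset. With DRY_RUN=1 (the default in this sketch) the command is printed instead of executed.

```shell
#!/bin/sh
# Sketch: build an nzload command line from the options listed above.
# Placeholder defaults (nzhost, admin, password) are assumptions for illustration.
run_nzload() {
  db=$1; table=$2; datafile=$3
  # Reuse the positional parameters to hold the full command line.
  set -- nzload -host "${NZ_HOST:-nzhost}" -u "${NZ_USER:-admin}" -pw "${NZ_PASSWORD:-password}" \
    -db "$db" -t "$table" -df "$datafile" -delim '|' -maxErrors 10
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run: show the command only
  else
    "$@"        # real run: requires the nzload client to be installed
  fi
}

# Dry run for a hypothetical load of CUSTOMER in database sales.
run_nzload sales customer /tmp/customer.dat
```

Setting DRY_RUN=0 runs the load for real; the delimiter is passed explicitly so the sketch does not depend on the default.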
*
NZLOAD: Example 1
*
NZLOAD: Example 2
*
Sample NZLOG/NZBAD files
*
NZLOG file
When nzload runs, it creates an nzlog file that contains messages related to the load.
By default, the nzlog file is written to your current working directory.
The file name format is <table_name>.<database_name>.nzlog
Use the -lf <file_name> option to specify a different nzlog file name.
Use the -outputDir <directory> option to specify the directory for the nzlog file.
nzload appends to the log file each time a load targets the same database table.
Periodically delete log files to free disk space.
For nzload operations, a return code is also issued as follows:
• 0 (success)
• 1 (failed, no records inserted)
• 2 (errors were found in the input but did not exceed maxErrors; the load is deemed successful and the records are inserted)
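A short sketch of how a calling script might act on the return codes listed above. The function name and the message wording are assumptions; only the three codes come from the list.

```shell
#!/bin/sh
# Sketch: translate an nzload return code into a message.
describe_nzload_rc() {
  case "$1" in
    0) echo "success" ;;
    1) echo "failed: no records inserted" ;;
    2) echo "loaded: input errors did not exceed maxErrors" ;;
    *) echo "unexpected return code: $1" ;;
  esac
}

# Typical use right after a load (the nzload arguments are omitted here):
#   nzload ... ; describe_nzload_rc $?
describe_nzload_rc 2
```

A return code of 2 is worth treating as a warning, since some input rows were rejected even though the load committed.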
*
NZBAD file
When nzload runs, it creates an nzbad file that contains only the records rejected from the load file.
By default, the nzbad file is written to your current working directory.
The file name format is <table_name>.<database_name>.nzbad
Use the -bf <file_name> option to specify a different nzbad file name.
Use the -outputDir <directory> option to specify the directory for the nzbad file.
If the file already exists, it is overwritten.
If there are no rejected records, the file is empty (0 bytes).
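Because an empty nzbad file means that no records were rejected, a script can check the file size after a load. The sketch below derives the default nzlog and nzbad paths from the name format given above and tests whether the nzbad file has any content. The function names and the example table, database and directory are hypothetical.

```shell
#!/bin/sh
# Sketch: derive the default nzlog/nzbad paths and check for rejected records.
nzload_output_files() {
  table=$1; db=$2; dir=${3:-.}
  echo "$dir/$table.$db.nzlog"
  echo "$dir/$table.$db.nzbad"
}

# Succeeds (exit status 0) only if the given nzbad file exists and is not empty.
has_rejected_records() {
  [ -s "$1" ]
}

# Print the expected paths for a hypothetical load of CUSTOMER in database sales.
nzload_output_files customer sales /tmp/loads
```

For example, `has_rejected_records /tmp/loads/customer.sales.nzbad && echo "check rejected rows"` would flag a load that needs attention.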
*
-maxErrors option (NZLOAD)
*
-maxErrors option (contd …)
*
NZLOAD using Control File
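A hedged sketch of a control-file load: the script writes a small control file and shows the nzload call that would use it. The data file path, database name, table name and control file location are assumptions for illustration, and the control file layout follows the DATAFILE { ... } form as I understand it; check the nzload documentation for your release before relying on it.

```shell
#!/bin/sh
# Sketch: write a small nzload control file.
# /tmp/customer.dat, sales and customer are hypothetical names.
write_control_file() {
  cf=$1
  cat > "$cf" <<'EOF'
DATAFILE /tmp/customer.dat
{
  Database sales
  TableName customer
  Delimiter '|'
}
EOF
}

cf="${TMPDIR:-/tmp}/customer_load.cf"
write_control_file "$cf"
cat "$cf"

# To run the load (not executed here; connection options such as -host, -u and -pw
# would still be given on the command line):
#   nzload -cf "$cf"
```

Keeping the load settings in a file makes a recurring load repeatable without retyping a long command line.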
*
NZLOAD using FIXED format
So far we have looked at delimited text loading. However, in some cases it is difficult to define a delimiter.
For example, a column may contain lengthy alphanumeric data. In such cases delimited loading is difficult, and fixed-length loading has to be used instead.
*
Questions?