Infobright Community Edition 4.0.6 GA USER GUIDE

WWW.INFOBRIGHT.COM

COPYRIGHT NOTICE

The materials provided herein are Copyright © 2005-2012 Infobright Inc.

All rights reserved.

CONFIDENTIAL: The information contained in this document is the property of Infobright Inc. Except as specifically authorized in writing by Infobright, the holder of this document shall keep the information contained herein confidential and shall protect same in whole or in part from disclosure or dissemination to third parties.

If these materials were purchased as a digital download, Infobright hereby grants the purchaser permission to reproduce a single copy (print or download) of the materials without prior written permission.

If these materials were purchased in printed form, no part of these materials shall be reproduced or retransmitted by any means, electronic, mechanical, photocopying, recording, or otherwise without written permission from Infobright.

Document Revision 4.0.6 GA-12.03.06

Contents

1. About Infobright
  Infobright Overview
  Infobright and MySQL

2. Setting up Infobright
  Technical Requirements
  Linux for Infobright
  Installation
    Installing Infobright
    Windows Installation
    Linux Installation
  Upgrade
    Windows Upgrade Instructions
    Updating Table Structures (Versions Prior to ICE 3.3.2 Only)
    Linux RPM or DPKG Upgrade
    Updating Table Structures (Versions Prior to ICE 3.3.2 Only)
    Linux TAR Upgrade
    Updating Table Structures (Versions Prior to ICE 3.3.2 Only)
  Configuration
    Configuring Infobright
    Configuration Tips and Examples

3. Using Infobright
  Starting and Stopping the Infobright Server
    Windows
    Linux
  Working with the Infobright Server
    Windows
    Linux
  Checking the Infobright Version
  Infobright is the Default Storage Engine
  About Log Files
  About Errors
  About SQL Command Syntax
  About SQL ISO Standards

4. Managing Infobright Tables
  About the Infobright Database Files
  About Supported Data Types
  Creating and Dropping Tables
  About Column Options
    NULL and NOT NULL
    Lookup Columns
    Unsupported Column Options
    Unsupported Indices Options
  Converting Oracle DDL to Infobright
  Converting SQL Server to Infobright
  Converting MySQL (MyISAM) to Infobright
  Viewing Table Information
  Viewing Compression Ratio Statistics
    Viewing Table Level Compression Ratio Statistics
    Viewing Column Compression Ratio Statistics
    Comparison of Calculated Compression Ratio to Physical Size

5. Data Manipulation Statements
  Unsupported Data Manipulation Commands (INSERT, UPDATE, DELETE)

6. Character Set Support
  Supported Character Sets
  Collations and Comparisons
  Padding

7. Importing and Exporting Data in Infobright
  Multi-character Delimiter
  About Transactions
    Using AUTOCOMMIT, COMMIT and ROLLBACK Commands
    About Transaction Behavior
    Failure Handling
  About Export Differences in Infobright
    CHAR(n) Data Type Values
    Escape Characters
    Exporting NULL Values
  Infobright Import/Export Syntax
    Infobright Loader Reject File
    Enabling the Reject File Functionality
    Disabling the Reject File Functionality
    Infobright Loader Line Terminators
    Escape Character
    End of Line (EOL) Sequence
    Importing Data
    Importing Data Using Remote Load
    Exporting Data
    Optional FIELDS Clause
    Importing Files with Invalid Values
  Importing Data Using Linux Pipes
  About Import Errors
  About Export Errors
  Sample Script (Create Table, Import Data, Export Data)
  Exporting and Importing Query Results

8. Running Queries in Infobright
  About the Knowledge Grid
    About Knowledge Nodes
  Running Queries
    Running Queries
  Enabling Queries to be Redirected to the MySQL Engine
    Enabling Queries to be Redirected to the MySQL Engine
    Viewing Queries Redirected to the MySQL Engine
    Terminating a Query
  Creating VIEWs in Infobright
    Create VIEW Syntax
  Select Syntax Supported in Infobright
    Select Syntax
    Join Syntax
    Union Syntax
    Subqueries
  Query Performance
  Rough Queries
    About Rough Query
    Query Support
    Subquery Support
    Complex Expression

9. Infobright Backup and Recovery
  Backup Procedure
  Restore Procedure

A. Infobright Optimizer - Supported Functions and Operators
  Comparison Functions and Operators
  Logical Operators
  Control Flow Functions
  String Functions
  String Comparison Functions
  Numeric Functions
  Date and Time Functions
  Text Search and Other Functions
  Group By Aggregate Functions
  Group By Modifiers

B. Infobright Data Tools
  Infobright Configuration Manager
    Running the Infobright Configuration Manager
  Charset Migration Tool
    Running the Charset Migration Tool
    Log Structure
    Collations-conversion-file Structure
  Infobright DomainExpert
    About the Infobright DomainExpert
    Decomposition Rules
    Decomposition Rules Language
    Predefined IPv4 Rule
    Other Predefined Rules
    Assigning Rules to Columns
    Applying Rules to Data
    Modifying a Rule for an Existing Column

C. Linux Tuning Settings
  System Settings for Red Hat Enterprise Linux and CentOS
    Disable SElinux
    Swappiness
    Disable Unused Processes
  File System Settings
    Ensure CacheFolder is on a Fast Local Disk
    Larger Readahead
    Use XFS File System for Data Directories
    noatime
    Deadline Elevator
    Increase ulimit to Support Large Data Volume or Users
    Note on how to detect ulimit problem

1. About Infobright

Infobright Overview

Thank you for choosing to install Infobright Community Edition (ICE) 4.0.6 GA. Infobright is a column-oriented, high performance analytic engine designed for analytic applications and data marts that need fast query response across large data volumes. Infobright was designed specifically for large volume data analytics applications with up to 50TB of data.

Infobright uses a unique and patent-pending approach to compressing, storing, and processing data that allows it to be installed and run on commodity hardware with little or no DBA intervention. Infobright requires little tuning to support ad hoc or complex business analytic queries.

Infobright is a database engine utilizing the MySQL database environment. As such, Infobright is fully compatible with all MySQL-compliant Business Intelligence tools and utilizes the MySQL administrative interface to reduce the learning curve for system administrators.

Infobright Community Edition provides a versatile, highly compressed database system optimized for analytic-type queries. The compression ratio and the speed of data import and retrieval are optimized at the expense of some transactional features, such as frequent data updates.

Infobright executes complex or ad hoc queries across vast amounts of data with a low cost of ownership.

Infobright and MySQL

ICE 4.0.6 GA combines the Infobright storage engine with the MySQL server implementation.

Infobright consists of several layers. The upper layers are provided by the MySQL server implementation, and the lower layers are provided by Infobright.

Infobright includes its own computing engine along with the storage engine. The MySQL query engine can be used with Infobright; however, since the MySQL storage engine interface is row oriented, it cannot take full advantage of the column orientation or the Knowledge Grid, and hence query performance via this path is reduced. Queries will be directed to the Infobright optimizer whenever possible.

Infobright ships with the full MySQL binaries required, including the MyISAM storage engine. MyISAM is used to store catalog information (as with other storage engines), and you can use the MyISAM instance for other purposes. However, joining MyISAM and Infobright tables may result in reduced performance, because the MySQL query engine will be used.

MySQL provides:

  • Mature connectors, tools and resources
  • Interconnectivity and certification with BI Tools
  • Management services and utilities

Infobright provides:

  • Load function that compresses data
  • Column-oriented storage engine
  • Knowledge Grid metadata layer that contains information about the compressed data
  • Optimizer/executor that uses the Knowledge Grid

Infobright and MySQL are integrated as shown below:

Since other storage engines, like InnoDB and Falcon, are not included in the Infobright distribution, they must be run as separate instances (executables). If you wish to combine other storage engines with Infobright, you will need to look at a database federation application (some BI tools provide this).

2. Setting up Infobright

Technical Requirements

Before installing Infobright, review the following technical requirements.

INFOBRIGHT TECHNICAL REQUIREMENTS

Requirement               Description

Platforms                 Windows XP (32-bit only)
                          Windows Server 2003 (64-bit only)
                          Red Hat Enterprise Linux 5
                          Debian “Lenny”
                          CentOS 5.2
                          Ubuntu 8.04 (32-bit only)
                          Fedora 9 (32-bit only)

Processor Architecture    Intel 64-bit
                          Intel 32-bit
                          AMD 64-bit
                          AMD 32-bit

For Personal Evaluation and/or Application Development

CPU Speed                 32-bit: 1.6GHz minimum, 2.0GHz or faster dual or quad core recommended
                          64-bit: 1.8GHz minimum, 2.0GHz or faster dual or quad core recommended

Memory                    32-bit: 1GB minimum, 2GB or more recommended
                          64-bit: 2GB minimum, 4GB or more recommended

For Multi-User Evaluation or Production Deployment

CPU Speed                 64-bit: 2.0GHz minimum, 2.0GHz or faster dual or quad core recommended

Memory                    64-bit: 4GB minimum, 16GB or more recommended (and at least 2GB per core)

Important: 32-bit platforms are for solution testing purposes only and are not recommended for performance or multi-user testing, or production deployments.
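
The memory minimums listed in the requirements table can be checked before installing. The following is a minimal shell sketch, not part of the Infobright distribution: the function names min_memory_gb and meets_memory_minimum and the profile labels "evaluation" and "production" are hypothetical, and the thresholds are copied from the table.

```shell
#!/bin/sh
# Sketch only: helper names and profile labels are hypothetical; the
# thresholds are the memory minimums from the technical requirements table.

# Print the minimum memory (in GB) for a word size and deployment profile.
# $1 = 32 or 64; $2 = "evaluation" or "production"
min_memory_gb() {
    case "$2:$1" in
        evaluation:32) echo 1 ;;
        evaluation:64) echo 2 ;;
        production:64) echo 4 ;;
        *) return 1 ;;   # e.g. 32-bit production is not listed in the table
    esac
}

# Succeed if the given amount of memory meets the minimum.
# $1 = installed memory in whole GB; $2 = word size; $3 = profile
meets_memory_minimum() {
    required=$(min_memory_gb "$2" "$3") || return 1
    [ "$1" -ge "$required" ]
}
```

For example, meets_memory_minimum 8 64 production succeeds, while meets_memory_minimum 1 64 evaluation fails. On Linux, the word size can typically be read with getconf LONG_BIT and the installed memory from the MemTotal line of /proc/meminfo.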

Linux for Infobright

Infobright has been optimized for various ‘flavours’ of Linux. While Infobright can be run ‘out of the box’ on any supported Linux platform, there are a number of tuning opportunities to improve performance.

See Appendix C, "Linux Tuning Settings", for a list of tuning suggestions.

Installation

Installing Infobright

The Infobright installation packages are provided as an RPM, DEB, PKG, .exe, or tarball. For non-Windows platforms, the user installing Infobright must be the root user or a user with the necessary permissions to install files, create the user mysql and create the group mysql.

Windows Installation

Windows Installation Instructions

1. Download the install package (for example, infobright-version-win32.exe) to the Windows machine on which you are installing Infobright, and double-click the .exe file to launch the Install Wizard. Click Next to continue.

2. Click I Agree to accept the GPL license agreement.

3. By default ICE is installed in C:\Program Files\Infobright. To change the default location, either enter the folder name in the field or click Browse… to select the desired install location on your computer. Click Install to accept the install location.

4. Please wait while the Install Wizard completes the installation.

5. Choose whether you want Infobright to start on completion of the installation. Click Finish to complete the installation.

6. The Install Wizard automatically creates ICE as a Windows Service, which allows the Infobright server to be started and stopped automatically when you boot or shut down Windows. If you do not want ICE to start on boot, open the Services window from the Control Panel and change the Startup Type for Infobright from “Automatic” to “Manual”.

7. The Install Wizard automatically determines the optimum memory settings based on the physical memory of the system. You may change these settings by editing the file brighthouse.ini within the data directory.

Important The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, please lower the memory settings for Infobright.

See “Recommended Memory Configurations” in "Configuration Tips and Examples" on page 14.

Uninstalling on Windows

To uninstall ICE, select “Infobright Uninstall” under the Infobright program group in the Windows Start Menu: Start/All Programs/Infobright/Infobright Uninstall

Linux Installation

Linux RPM and DPKG Installation Instructions

To install Infobright on Linux using the rpm or dpkg package:

1. Download the installation package from www.infobright.org/Download/ICE/.

2. Obtain root user access and run:

    rpm -i infobright_version_name.rpm [--prefix=path]

or

    dpkg -i infobright_version_name.deb

Important Do not install in the root or home directories due to possible MySQL permission checking issues during install, start up, and/or load. If you use the rpm --prefix option, you should manually create a softlink to the Infobright install directory from /usr/local/infobright.

3. To change the default install options, after installation run: /usr/local/infobright/postconfig.sh

You can run this script at any time after installation to change the datadir, CacheFolder, socket, or port. The script must be run as root, and Infobright must not be running.

INFOBRIGHT INSTALL OPTIONS

Parameter Description

Datadir Path to the directory where tables will be created and stored. Use a high-performance storage such as a RAID.

Cachedir Path to the directory where temporary files will be created and stored. Should be located on a fast drive, possibly not the same as the data. Allow at least 100 GB of free space (depending on database size).

Note: The Cachedir option is disabled when the Datadir option is chosen. To change Cachedir, rerun the postconfig utility and do not choose Datadir.

Port Listening port for the Infobright server instance.

Socket Socket connection point for client connections. (The socket connection point will be created during the Infobright installation.)

4. The installation determines the optimum memory settings based on the physical memory of the system. You may change these settings by editing the file brighthouse.ini within the data directory. See “Recommended Memory Configurations” in "Configuration Tips and Examples" on page 14.

Important The memory settings assume that there are no other services on the machine consuming significant memory. If this is not the case, please lower the memory settings for Infobright.

Uninstalling on Linux

To uninstall Infobright, run: rpm -e infobright

or dpkg -r infobright

Linux TAR Install

To install Infobright on Linux using the tarball package:

1. Obtain root user access.

2. Change to the parent location in which you want to install (e.g. /usr/local):

    cd /usr/local

Important Do not install in the root or home directories due to possible MySQL permission checking issues during install, start up, and/or load.

3. Unpack the tarball, which will create the product directory (e.g. infobright-version-x86_64_ice), and create a symbolic link ‘infobright’ to the product folder:

    gunzip < /path/to/infobright-version-x86_64_ice.tar.gz | tar xvf -
    ln -s /usr/local/infobright-version-x86_64_ice infobright
    cd /usr/local/infobright

4. Run the install script with the “--help” flag to check the system configuration and to see examples of the directory parameters:

    ./install-infobright.sh --help

Parameters required:

    --datadir=infobright data folder [--datadir=/usr/local/infobright/data]
    --cachedir=infobright cache folder [--cachedir=/usr/local/infobright/cache]
    --config=mysql conf file to be created [--config=/etc/my-ib.cnf]
    --port=infobright server port [--port=5029]
    --socket=socket file to be used by this server [--socket=/tmp/mysql-ib.sock]
    --user=user to be created if not exist [--user=mysql]
    --group=user group to be created if not exist [--group=mysql]

INFOBRIGHT COMMAND-LINE PARAMETERS

Parameter Description

Datadir Path to the directory where tables will be created and stored. Use a high-performance storage such as a RAID.

Cachedir Path to the directory where temporary files will be created and stored. Should be located on a fast drive, possibly not the same as the data. Allow at least 100 GB of free space (depending on database size).

Note The Cachedir option is disabled when the Datadir option is chosen. To change Cachedir, rerun the postconfig utility and do not choose Datadir.

Port Listening port for the Infobright server instance.

Config MySQL configuration file. (The configuration file will be created with defaults during the Infobright installation.)

Socket Socket connection point for client connections. (The socket connection point will be created during the Infobright installation.)

User System user who can run the Infobright server instance. User will be created if it does not exist. The default user is mysql.

Group System group for the above user. Group will be created if it does not exist. The default group is mysql.

Run the install script again, this time with directory parameters. If parameters are used that already exist, an error will occur (for example running the same script with parameters twice).

Example command:

    ./install-infobright.sh --datadir=/usr/local/infobright/data \
        --cachedir=/usr/local/infobright/cache --port=5029 \
        --config=/etc/my-ib.cnf --socket=/tmp/mysql-ib.sock \
        --user=mysql --group=mysql

5. Change the default memory configuration by editing the file brighthouse.ini within the data directory. See “Recommended Memory Configurations” in "Configuration Tips and Examples" on page 14.

Important It is critical that you increase the memory settings for systems running more than 2GB of physical memory or performance will be severely impacted.

Upgrade

Windows Upgrade Instructions Before upgrading, be sure to read the latest release notes for any special upgrade instructions.

1. Please follow the standard ICE Windows installation instructions. The Install Wizard automatically detects a previous version of ICE and upgrades your ICE installation while preserving your data and configuration settings. The install procedure automatically runs the Configuration Manager.

2. Start the Infobright server from the Start Menu items.

3. Create or ensure that the directory c:\tmp exists (necessary for step 4).

4. Run the MySQL Upgrade utility from the Windows command line:

    cd "C:\Program Files\Infobright\bin"
    mysql_upgrade.exe --defaults-file="c:\Program Files\Infobright\my-ib.ini" -uroot --tmpdir=c:\tmp

Important The MySQL Upgrade utility may display several errors regarding the use of locks with log tables and errors requiring table upgrades. The errors are all handled automatically by Infobright and/or the upgrade utility and can be ignored.

5. Stop and start the Infobright server from the Start Menu items.

6. If you are upgrading from Infobright 3.5 or earlier, run the Infobright upgrade tool from the Windows command line. This creates stored procedures used by the DomainExpert.

    cd "C:\Program Files\Infobright"
    \Infobright-upgrade.bat -u root

7. If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures. See the next section for details.

Updating Table Structures (Versions Prior to ICE 3.3.2 Only) If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures after upgrading ICE. Do NOT follow these instructions if you are upgrading from ICE 3.3.2 or higher, or you may experience data corruption. If you are unsure what version of ICE you are using, please contact Professional Services.

1. Stop the Infobright server from the Start Menu items.

2. Run the Charset Migration Tool from the Windows command line:

    cd "C:\Program Files\Infobright\bin"
    chmt.exe --datadir=\absolute\path\to\data\directory

3. Start the Infobright server from the Start Menu items.

Linux RPM or DPKG Upgrade Before upgrading, be sure to read the latest release notes for any special upgrade instructions.

To upgrade using the rpm or deb package, simply run the installation command. The package will automatically identify that Infobright is already installed and switch to upgrade mode. Your configuration settings and data will not be changed during the upgrade.

Important If the previous installation was done using the tarball package, you must upgrade using the tarball package (see "Linux TAR Upgrade" on page 11) or contact Infobright Support to move from a tar install to a package install.

To upgrade Infobright on Linux using the rpm or deb package:

1. As user root, run either:

    rpm -U infobright-version-platform.rpm

or

    dpkg -i infobright-version-platform.deb

2. Start the Infobright server: /etc/init.d/mysqld-ib start

3. Run the mysql upgrade tool to upgrade the data folder:

    cd /usr/local/infobright
    ./bin/mysql_upgrade --defaults-file=/etc/my-ib.cnf --user=root --tmpdir=/tmp

Important The MySQL Upgrade utility may display several errors regarding the use of locks with log tables and errors requiring table upgrades. The errors are all handled automatically by Infobright and/or the upgrade utility and can be ignored.

4. Restart the Infobright server: /etc/init.d/mysqld-ib restart

5. Confirm the build version as IB_4.0.6_r16086_16275: /usr/local/infobright/bin/mysqld --version

6. If you are upgrading from Infobright 3.5 or earlier, run the Infobright upgrade tool. This creates stored procedures used by the DomainExpert.

    cd /usr/local/infobright
    ./infobright_upgrade.sh -u <user> -p <password>

For example:

    ./infobright_upgrade.sh -u root

Usage: ./infobright_upgrade.sh [-u <user>] [-p <password>]

7. If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures. See the next section for details.

Updating Table Structures (Versions Prior to ICE 3.3.2 Only) If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures after upgrading ICE. Do NOT follow these instructions if you are upgrading from ICE 3.3.2 or higher or you may experience data corruption. If you are unsure what version of ICE you are using, please contact Professional Services.

1. Stop the Infobright server:

    /etc/init.d/mysqld-ib stop

2. Run the Charset Migration Tool (as user mysql):

    cd /usr/local/infobright
    ./bin/chmt --datadir=/absolute/path/to/data/directory

3. Start the Infobright server:

    /etc/init.d/mysqld-ib start

Linux TAR Upgrade Before upgrading, be sure to read the latest release notes for any special upgrade instructions.

To upgrade Infobright on Linux using the tarball package:

1. Unpack the tarball into a temporary folder. Use the gunzip utility for unpacking:

    cd /path/to/temp/
    gunzip < /path/to/infobright-version-x86_64.tar.gz | tar xvf -

2. Stop the Infobright server: /etc/init.d/mysqld-ib stop

3. Run the install script with the “--upgrade” and “--config” flags, passing in the configuration file of the previously installed version:

    ./install-infobright.sh --upgrade --config=/etc/my-ib.cnf

4. Start the Infobright server and run the mysql_upgrade utility:

    /etc/init.d/mysqld-ib start
    cd /usr/local/infobright
    ./bin/mysql_upgrade --defaults-file=/etc/my-ib.cnf --user=root --tmpdir=/tmp

Important The MySQL Upgrade utility may display several errors regarding the use of locks with log tables and errors requiring table upgrades. The errors are all handled automatically by Infobright and/or the upgrade utility and can be ignored.

5. Restart the Infobright server: /etc/init.d/mysqld-ib restart

6. Confirm the build version as IB_4.0.6_r16086_16275: /usr/local/infobright/bin/mysqld --version

7. If you are upgrading from Infobright 3.5 or earlier, run the Infobright upgrade tool. This creates stored procedures used by the DomainExpert.

    cd /usr/local/infobright
    ./infobright_upgrade.sh -u <user> -p <password>

For example:

    ./infobright_upgrade.sh -u root

Usage: ./infobright_upgrade.sh [-u <user>] [-p <password>]

8. If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures. See the next section for details.

Updating Table Structures (Versions Prior to ICE 3.3.2 Only) If you are upgrading from a version prior to ICE 3.3.2, you must update your table structures after upgrading ICE. Do NOT follow these instructions if you are upgrading from ICE 3.3.2 or higher or you may experience data corruption. If you are unsure what version of ICE you are using, please contact Professional Services.

1. Stop the Infobright server:

    /etc/init.d/mysqld-ib stop

2. Run the Charset Migration Tool (as user mysql):

    cd /usr/local/infobright
    ./bin/chmt --datadir=/absolute/path/to/data/directory

3. Start the Infobright server:

    /etc/init.d/mysqld-ib start

Configuration

Configuring Infobright The Infobright configuration file is called brighthouse.ini and is located in the data subdirectory within your Infobright installation directory. The configuration file is a text file containing the Infobright configuration parameters. See the Infobright installation package for a sample brighthouse.ini file.

Important It is critical that you specify increased memory settings for systems running more than 2GB of physical memory to ensure optimal performance.

Each parameter is shown on a separate line and uses the following form:

ParameterName=ParameterValue

If a parameter is not present in the configuration file or if the configuration file does not exist, the default values are used. Blank lines and comments (lines starting with #) are ignored.
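As an illustration of the file format described above, the following is a minimal sketch of reading one parameter from a brighthouse.ini-style file. The helper name get_ini_param and the sample values are hypothetical and are not part of the Infobright distribution; the sketch assumes there are no spaces around the equals sign.

```shell
# Print the value of one parameter from a brighthouse.ini-style file.
# Blank lines and lines starting with # are skipped; names are case-sensitive.
# get_ini_param is a hypothetical helper, not an Infobright tool.
get_ini_param() {
    # $1 = path to the ini file, $2 = parameter name
    awk -F= -v name="$2" '
        /^[[:space:]]*$/ { next }                           # skip blank lines
        /^#/             { next }                           # skip comment lines
        $1 == name       { sub(/^[^=]*=/, ""); print; exit } # print text after the first "="
    ' "$1"
}

# Demonstration on a throwaway file (values are illustrative only).
demo_ini=$(mktemp)
cat > "$demo_ini" <<'EOF'
# ServerMainHeapSize=600
LoaderMainHeapSize=400

CacheFolder=/usr/local/infobright/cache
EOF
get_ini_param "$demo_ini" LoaderMainHeapSize   # prints 400
get_ini_param "$demo_ini" ServerMainHeapSize   # prints nothing: the line is commented out
rm -f "$demo_ini"
```

An empty result corresponds to the case in which the server falls back to its default value for that parameter.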

Be sure to customize the following parameters to optimize performance. These parameters are case-sensitive and must be typed as shown.

INFOBRIGHT TUNING PARAMETERS

Parameter Syntax: ServerMainHeapSize=size
Value: Not less than 320 (default: 600)
Description: Size of the main memory heap in the server process, in MB. The larger the heap size, the more effectively the server works. However, the sum of the heap sizes in the server and the loader should not exceed physical memory installed in the machine, otherwise performance decreases radically.

Parameter Syntax: LoaderMainHeapSize=size
Value: Not less than 320 (default: 320)
Description: Size of the memory heap in the loader process, in MB. The sum of the heap sizes in the server and the loader should not exceed physical memory installed in the machine, otherwise performance decreases radically.

Parameter Syntax: CacheFolder=directory
Value: Directory name (default: none)
Description: This is a mandatory parameter. Path to the directory where temporary files will be created and stored. This is set as one of the install script parameters.

Parameter Syntax: AllowMySQLQueryPath=number
Value: 0, 1 (default: 0)
Description: Set to 1 to allow queries that are not supported in the Infobright Optimizer to be handled by the MySQL query engine. Queries that take the MySQL path will have reduced performance.

Note The heap size values are commented out (preceded by #) in the brighthouse.ini file, which causes ServerMainHeapSize and LoaderMainHeapSize to fall back to their default values of 600 and 320 respectively.

INFOBRIGHT ADDITIONAL PARAMETERS

Parameter Syntax: KNFolder=directory
Value: Directory name (default: BH_RSI_Repository)
Description: Directory where the Knowledge Grid is stored. If not specified, these files are located in a subdirectory of the data directory. Allow free space of at least 1% of database size (compressed).

Parameter Syntax: ControlMessages=number
Value: 0, 1, 2, 3 (default: 0)
Description: Set to 2 to turn the control messages on with timestamps. This is usually needed by Infobright to support performance investigation. A value of 1 omits the timestamp and session number information and is generally not used, in favour of 2. A value of 3 is new in Infobright 3.4 and adds information for resource management (total and free memory and CPU cores).

Configuration Tips and Examples

Important You should configure memory settings to ensure optimal performance.

The following table shows sample memory configurations for different systems.

RECOMMENDED MEMORY CONFIGURATIONS

System Memory   Server Main Heap Size   Loader Main Heap Size
64GB            48000                   800
48GB            32000                   800
32GB            24000                   800
16GB            10000                   800
8GB             4000                    800
4GB             1300                    400
2GB             600                     320

In most cases, the loader does not benefit from larger memory settings. However, increasing the LoaderMainHeapSize can help when:

a table to be loaded has very long text values, or

the table has many columns (e.g., 1000 columns).

You can use more memory at import if you are planning to execute several concurrent load tasks to different data tables. However, disk access may become a bottleneck.

ServerMainHeapSize should be as large as possible but safely smaller than the amount of physical memory in the machine. If performance decreases because of memory swapping by the operating system, try to set lower heap sizes. We also recommend decreasing the heap size if many users are running queries in parallel.

Important Infobright may use additional memory for heavy loads or queries. Also, other applications on your server will use memory for their processes. It is important that the total of ServerMainHeapSize and LoaderMainHeapSize is less than the total available physical memory. If the system needs to swap memory, performance will be severely impacted.
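The sample configurations and the sizing rule above can be sketched as two small shell helpers. Both function names are hypothetical. Rounding a machine that falls between two table rows down to the smaller row is an assumption made here for safety, not a rule stated in this guide.

```shell
# Map installed memory (whole GB) to the sample heap sizes in the
# RECOMMENDED MEMORY CONFIGURATIONS table.
# Prints "ServerMainHeapSize LoaderMainHeapSize" in MB.
# Assumption: sizes between two rows are rounded down to the smaller row.
recommend_heaps() {
    gb=$1
    if   [ "$gb" -ge 64 ]; then echo "48000 800"
    elif [ "$gb" -ge 48 ]; then echo "32000 800"
    elif [ "$gb" -ge 32 ]; then echo "24000 800"
    elif [ "$gb" -ge 16 ]; then echo "10000 800"
    elif [ "$gb" -ge 8  ]; then echo "4000 800"
    elif [ "$gb" -ge 4  ]; then echo "1300 400"
    else                        echo "600 320"
    fi
}

# Succeed only if the two heaps together are smaller than physical memory.
# $1 = ServerMainHeapSize (MB), $2 = LoaderMainHeapSize (MB), $3 = physical memory (MB)
heaps_fit() {
    [ $(( $1 + $2 )) -lt "$3" ]
}

recommend_heaps 16                                  # prints: 10000 800
heaps_fit 10000 800 16384 && echo "heaps fit in 16 GB"
```

Because other services and heavy queries also consume memory, a configuration that passes this check may still need to be lowered in practice.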

3. Using Infobright

Starting and Stopping the Infobright Server

Windows The Windows Install Wizard automatically creates Infobright as a Windows Service, which allows the Infobright server to be started and stopped automatically when you boot or shutdown Windows.

To manually start the Infobright server, from the Windows Start Menu run: Start/All Programs/Infobright/Infobright Start

To manually stop the Infobright server, from the Windows Start Menu run: Start/All Programs/Infobright/Infobright Stop

Linux You can start and stop the Infobright server the same way you would start and stop the original MySQL server (mysqld). Before using the Infobright server, see Starting and Stopping MySQL Automatically in the MySQL 5.1 Reference Manual.

Important It is recommended that you run Infobright using MySQL user credentials rather than root for security reasons.

To start the Infobright server on Linux, run: /etc/init.d/mysqld-ib start

To start/stop the Infobright server during system boot/shutdown use the mysqld-ib script in /etc/init.d/ for start and stop services. Use run level 2 3 4 5 to start the service, and run level 0 1 6 to stop.

The following are sample commands to create services:

(Ubuntu)

    update-rc.d mysql-ib.server start 99 2 3 4 5 . stop 01 0 1 6 .

(CentOS)

    chkconfig --add mysqld-ib
    chkconfig --level 2345 mysqld-ib on
    chkconfig --level 016 mysqld-ib off

Working with the Infobright Server You can use the tools provided with MySQL, such as the mysql client program, with the Infobright server. For more information, see Tutorial in the MySQL 5.1 Reference Manual.

You can also use GUI tools, such as the MySQL Workbench provided by MySQL AB, to query Infobright databases in a more graphical manner.

You can use the mysql client program to perform the following actions. For more information, see Tutorial in the MySQL 5.1 Reference Manual.

Windows To connect to the Infobright command line interface, run:

Start/All Programs/Infobright/Infobright Command Line Client

To enable remote connections to Infobright, you need to grant connection permissions in Infobright. From within the mysql shell, run the following grant commands:

    mysql> grant all privileges on *.* to 'root'@'localhost' with grant option;
    Query OK, 0 rows affected (0.00 sec)
    mysql> grant all privileges on *.* to 'root'@'%' with grant option;
    Query OK, 0 rows affected (0.00 sec)

Linux If you used the standard install locations, enter the following command to connect to Infobright: /usr/bin/mysql-ib

If you used a different install location, modify the above command to point to your socket file.

When the Infobright server is first installed, an administrator account with no password is created. To connect to the administrator account, use the following command: mysql-ib

To run a script when connecting to the administrator account, use the following command: mysql-ib < input_script_name.txt

For example: mysql-ib < /tmp/testing/input.txt

To run a script when connecting to the administrator account and direct all output to a text file, use the following command: mysql-ib < input_script_name.txt > output_results.txt

For example: mysql-ib < /tmp/testing/input.txt > /tmp/testing/output.txt

During the Infobright server shutdown process, the server will not shut down until all running commands are completed. To force the shutdown of the server:

Kill the mysqld process and all running bhloader processes.

Infobright can be used with most Business Intelligence tools and any MySQL GUI client tool like Toad or Navicat. Simply point to the IP address and port number for the Infobright server, and log on using any user credentials that have been set up.

Checking the Infobright Version You can use the following methods to check the version of the Infobright system.

Enter the following command at the command prompt: /usr/local/infobright/bin/mysqld --version

After connecting to the Infobright administrator account, enter the following command at the mysql command prompt: mysql> show variables like "version_comment";

The Infobright version will be shown. For example:

    mysql> show variables like "version_comment";
    +-----------------+-----------------------------------------------------+
    | Variable_name   | Value                                               |
    +-----------------+-----------------------------------------------------+
    | version_comment | build number (revision)=IB_version_r5IB_3.2_GA_5316 |
    +-----------------+-----------------------------------------------------+
    1 row in set (0.00 sec)
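When confirming a build after an upgrade, the version text can also be compared against the expected build string in a script. The following is a minimal sketch; has_build is a hypothetical helper, and the sample text passed to it below is a made-up placeholder rather than real mysqld output.

```shell
# Succeed if the version text contains the expected build string.
# has_build is a hypothetical helper, not an Infobright tool.
# $1 = version text, $2 = expected build string
has_build() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Typical use on a machine with Infobright installed:
#   has_build "$(/usr/local/infobright/bin/mysqld --version)" "IB_4.0.6_r16086_16275"

# Demonstration with placeholder text (not actual server output):
sample="placeholder version text IB_4.0.6_r16086_16275"
if has_build "$sample" "IB_4.0.6_r16086_16275"; then
    echo "build matches"
else
    echo "build does not match"
fi
```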

Infobright is the Default Storage Engine The Infobright storage engine (named “Brighthouse”) should always be used when working with Infobright data. This is the default setting created when using the installer. To view all available engines, enter the following command:

mysql> show engines;

The following information is displayed at the command prompt. In this example, BRIGHTHOUSE is shown as the default storage engine. You can combine the usage of different storage engines, but you should avoid joining across storage engines, as this can result in sub-optimal performance due to the use of the MySQL query engine. However, it can be quite useful in some cases to store query results in MEMORY or MyISAM tables and manipulate the results further there.

    mysql> show engines;
    +-------------+---------+-----------------------------------------------------------+--------------+----+------------+
    | Engine      | Support | Comment                                                   | Transactions | XA | Savepoints |
    +-------------+---------+-----------------------------------------------------------+--------------+----+------------+
    | BRIGHTHOUSE | DEFAULT | Infobright storage engine                                 | YES          | NO | NO         |
    | MRG_MYISAM  | YES     | Collection of identical MyISAM tables                     | NO           | NO | NO         |
    | CSV         | YES     | CSV storage engine                                        | NO           | NO | NO         |
    | MEMORY      | YES     | Hash based, stored in memory, useful for temporary tables | NO           | NO | NO         |
    | MyISAM      | YES     | Default engine as of MySQL 3.23 with great performance    | NO           | NO | NO         |
    +-------------+---------+-----------------------------------------------------------+--------------+----+------------+
    5 rows in set (0.00 sec)

About Log Files Infobright uses the MySQL server logs and also creates several new logs. For more information about MySQL logs, see MySQL Server Logs in the MySQL 5.1 Reference Manual.

INFOBRIGHT LOG FILES

Log Type            Information Written to Log

Error log           Errors starting, stopping and running the Infobright server (mysqld). To generate this log, add the following lines to my.cnf:
                        log-error=<filename>
                        log-output=FILE

General query log   Connection and statement information received from clients.

Infobright log      Server start and stop information. Also contains missing configuration settings.

It is possible to turn on the display of diagnostic information. By default this information is redirected to Infobright’s console, unless an error log is specified (see table above). To turn on diagnostic messages you have to modify your brighthouse.ini configuration file (see "Configuring Infobright" on page 12) and set parameter ControlMessages to 1 (log actions), 2 (to add a time stamp to each message), or 3 (to add memory and CPU resource information).

Note In general, more detail in the log may have an impact on performance; it is recommended that you find and use the setting that strikes the best balance for you in terms of performance versus log details.
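Turning on diagnostic messages comes down to editing one line of brighthouse.ini. The following is a minimal sketch of doing that from a script. set_ini_param is a hypothetical helper, and the path in the usage comment assumes the default data directory shown elsewhere in this guide.

```shell
# Set NAME=VALUE in a brighthouse.ini-style file.
# Any existing line for the parameter (active or commented out with #) is
# removed and the new setting is appended at the end of the file.
# set_ini_param is a hypothetical helper, not an Infobright tool.
# $1 = ini file, $2 = parameter name, $3 = new value
set_ini_param() {
    tmp=$(mktemp) || return 1
    # Keep every line except those that set this parameter.
    grep -v -E "^#?[[:space:]]*$2=" "$1" > "$tmp"
    echo "$2=$3" >> "$tmp"
    # Write back in place so ownership and permissions of the file are kept.
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Typical use (path assumes the default data directory):
#   set_ini_param /usr/local/infobright/data/brighthouse.ini ControlMessages 2
```

Parameter names are case-sensitive, so the name must be passed exactly as it appears in the parameter tables.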

About Errors Infobright reports the same errors as the standard MySQL server. For more information, see Appendix B. Errors, Error Codes, and Common Problems in the MySQL 5.1 Reference Manual.

There are a few additional errors specific to Infobright import and export commands. For more information, see "About Import Errors" on page 42 and "About Export Errors" on page 42.

About SQL Command Syntax The syntax for Infobright SQL commands is exactly the same as the syntax for MySQL commands. For more information, see SQL Statement Syntax in the MySQL 5.1 Reference Manual.

There are special considerations when using the following commands with Infobright. All other SQL commands can be used with Infobright as they are with the standard MySQL.

USING MYSQL COMMANDS WITH INFOBRIGHT

MySQL Command                          More Information

CREATE TABLE, DROP TABLE               "Creating and Dropping Tables" on page 24

SHOW TABLE STATUS, SHOW FULL COLUMNS   "Viewing Table Information" on page 27
                                       "Viewing Compression Ratio Statistics" on page 28

INSERT, UPDATE, DELETE                 "Unsupported Data Manipulation Commands (INSERT, UPDATE, DELETE)" on page 31
                                       Important Do not use INSERT, UPDATE, and DELETE to manipulate Infobright data.

LOAD DATA INFILE                       "Infobright Import/Export Syntax" on page 36

SELECT                                 "Running Queries in Infobright" on page 45

VIEW                                   "Creating VIEWs in Infobright" on page 47

About SQL ISO Standards As mentioned in the previous section, Infobright uses the same syntax as the standard MySQL commands. For information about the compliance of the MySQL language with ISO SQL standards, see MySQL Standards Compliance in the MySQL 5.1 Reference Manual.

Infobright is approaching full ISO SQL compliance. However, certain sections of the ISO SQL standard are open to interpretation and each DBMS, including Infobright, may implement these sections slightly differently. Consequently, Infobright query results may differ from those of other databases.

For example, the SQL standard does not define a default collation for string comparisons, which affects the ordering of query results. Different databases will implement different collation approaches, thus displaying inconsistent results for such things as sorts.

4. Managing Infobright Tables

About the Infobright Database Files Infobright tables are located in the data subdirectory in your Infobright installation directory. This is the same directory structure used for standard MySQL databases and tables. For more information, see Installation Layouts in the MySQL 5.1 Reference Manual.

Within the data subdirectory, Infobright databases are stored in separate subdirectories. Within each database subdirectory, data files for each Infobright table are stored in separate subdirectories.

Important Do not manually copy a data table from one database to another by copying the database files—internal table numbering errors and Knowledge Grid inconsistencies may occur. To copy a table, use import and export commands (see "Importing and Exporting Data in Infobright" on page 34) or backup the entire database directory (see "Infobright Backup and Recovery" on page 65).

The Infobright server uses additional directories to store temporary data, and optimization information, such as Knowledge Nodes. The following shows the data directory, containing the Infobright databases:

    [root@ib03 data]# pwd
    /usr/local/infobright/data
    [root@ib03 data]# ls
    BH_RSI_Repository  Infobright.log  Infobright.seq  ib03.corp.infobright.com.err  mysql  test

About Supported Data Types The following data types are supported in Infobright. Note that numeric data types ranges are 1 less than the MySQL minimums and maximums.

NUMERIC TYPES

Data Type                     Minimum                     Maximum
TINYINT                       -127                        127
BOOL, BOOLEAN                 -127                        127
SMALLINT                      -32767                      32767
MEDIUMINT                     -8388608                    8388607
INT (INTEGER)                 -2147483647                 2147483647
BIGINT                        -9223372036854775806        9223372036854775806
FLOAT                         -3.402823466E+38            3.402823466E+38
DOUBLE (DOUBLE PRECISION)     -1.7976931348623157E+308    1.7976931348623157E+308
DEC(M, D) (DECIMAL(M, D))     -(1E+M - 1) / (1E+D)        (1E+M - 1) / (1E+D)
  where 0 < M <= 18 and 0 <= D <= M
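The DEC(M, D) bound is the largest value with M digits in total, D of them after the decimal point: (1E+M - 1) / (1E+D). The following is a minimal sketch that writes out that value for a given M and D; the function names are hypothetical. It builds the digits as text so that it does not depend on shell integer limits.

```shell
# Print a run of "9" characters of the requested length.
repeat_nines() {
    count=$1; out=""
    while [ "$count" -gt 0 ]; do
        out="${out}9"
        count=$((count - 1))
    done
    printf '%s' "$out"
}

# Print the maximum value of DEC(M, D), i.e. (1E+M - 1) / (1E+D).
# $1 = M (total digits, 1..18), $2 = D (digits after the point, 0..M)
dec_max() {
    m=$1; d=$2
    whole=$(repeat_nines $((m - d)))   # digits before the decimal point
    frac=$(repeat_nines "$d")          # digits after the decimal point
    [ -n "$whole" ] || whole=0         # when D equals M the integer part is 0
    if [ -n "$frac" ]; then
        echo "$whole.$frac"
    else
        echo "$whole"
    fi
}

dec_max 5 2     # prints 999.99
dec_max 18 0    # prints 999999999999999999
dec_max 2 2     # prints 0.99
```

The minimum is the same value with a leading minus sign.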

DATE AND TIME TYPES

Data Type                    Minimum               Maximum               Format
DATE                         100-01-01             9999-12-31            YYYY-mm-dd
DATETIME                     100-01-01 00:00:00    9999-12-31 23:59:59   YYYY-mm-dd HH:MM:SS
TIMESTAMP                    1970-01-01 00:00:00   2038-01-01 00:59:59   YYYY-mm-dd HH:MM:SS
TIME                         -838:59:59            838:59:59             HHH:MM:SS
YEAR (4-digit format only)   1901                  2155                  YYYY

STRING TYPES

Data Type       Maximum Length
CHAR(N)         255
VARCHAR(N)      65532
BINARY(N)       255
VARBINARY(N)    65532
TINYTEXT        255
TEXT(N)         65535

Creating and Dropping Tables Use the standard MySQL commands to create and drop tables in Infobright, the same as you would with a MySQL table. For detailed syntax information, see CREATE TABLE Syntax and DROP TABLE Syntax in the MySQL 5.1 Reference Manual.

Important Do not manually copy a data table from one database to another by copying the database files—internal table numbering errors and Knowledge Grid inconsistencies may occur. To copy a table from one database to another, export from the source database and then import into the target database (see "Importing and Exporting Data in Infobright" on page 34) or backup the entire database directory (see "Infobright Backup and Recovery" on page 65). You can rename the entire database by renaming the folder. However, you should not copy a database folder from one active instance to another, or within the same active instance.

To create a table, enter the following command:

mysql> create table <table_name> (<column(s)>);

To drop a table, enter the following command:

mysql> drop table <table_name>;

See "About Column Options" on page 25 for information on supported and unsupported options when creating columns.

Note: When creating a table, always use the ENGINE= option to ensure that the correct database engine is used. Infobright is shipped with DEFAULT ENGINE = BRIGHTHOUSE, but this can be changed. Specify the engine name explicitly at the end of the CREATE TABLE statement:

mysql> create table <table_name> (<column(s)>) engine=brighthouse;

About Column Options

NULL and NOT NULL

Infobright supports NULL and NOT NULL specifications for columns.

NULL allows NULL values for the column.

NOT NULL replaces the imported NULL values with default values such as 0 (zero) for numeric columns and an empty string ('') for string columns.

Lookup Columns

Infobright provides an additional modifier for string data type columns, called a lookup column. A lookup column substitutes integers for the column's string values. You can declare a lookup column on a CHAR or VARCHAR column to increase its compression and performance in queries. However, to use a lookup column, the CHAR or VARCHAR column should meet the following criteria:

There is no fixed upper limit on the number of unique values in the column (cardinality), but the whole dictionary, whose size is the total length of all distinct values, is loaded into RAM. For example, 1 million distinct values that are each 100 characters wide will permanently occupy 100 MB of RAM.

The column must contain a large number of duplicate values: the ratio of total number of records to distinct values should be greater than 10.

Typically, a lookup column is useful for fields like state, gender, category, and the like, where the number of instances is very high but the number of unique values is very low. To determine the ratio of records to distinct values, first count the distinct values using SELECT COUNT(DISTINCT <column>) FROM …, then compare the result to the number of records returned by SELECT COUNT(<column>) FROM …

Note Using a lookup on a column where there are more than 10,000 distinct values will result in greatly reduced load speeds.
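The rules of thumb above can be combined into a quick check once the two counts are known. The following Python sketch is illustrative only: the function name lookup_candidate and its parameters are hypothetical, and the average value width must be estimated from your own data.

```python
def lookup_candidate(total_rows: int, distinct_values: int, avg_value_bytes: float) -> dict:
    """Apply the guide's rules of thumb for declaring a 'lookup' column."""
    ratio = total_rows / distinct_values if distinct_values else float("inf")
    return {
        "rows_per_distinct_value": ratio,
        # Guideline: records / distinct values should be greater than 10.
        "ratio_ok": ratio > 10,
        # Note: more than 10,000 distinct values greatly reduces load speed.
        "load_speed_ok": distinct_values <= 10_000,
        # The dictionary (all distinct values) is held in RAM.
        "dictionary_bytes": int(distinct_values * avg_value_bytes),
    }


# A "state"-like column: 5 million rows, 50 distinct values of about 12 bytes each.
print(lookup_candidate(5_000_000, 50, 12))
```

With 1 million distinct values of 100 bytes each, dictionary_bytes comes out at 100,000,000, which matches the 100 MB figure given above.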

To declare a column as a lookup column, add the comment 'lookup' on the column. Enter the following command:

mysql> create table … (… <column name> <column type> … comment 'lookup' … …) engine=brighthouse;

Unsupported Column Options

The following column options are ignored by Infobright:

default values

references to other tables

Unsupported Indices Options

Infobright uses Knowledge Grid technology instead of standard indices and does not support explicit indices. The following elements of CREATE TABLE syntax related to indices are not allowed:

keys

indices

unique columns

auto-increment columns

Converting Oracle DDL to Infobright

If you have an existing Oracle schema definition, use the following steps to make it work on Infobright:

Convert MEDIUMTEXT to VARCHAR(N), where 'N' is only as large as necessary

Convert LONGTEXT to VARCHAR(N), where 'N' is only as large as necessary

Convert DOUBLE(A,B) to DECIMAL(A,B)

INTEGER types may be converted to BIGINT

Convert VARCHAR2/CHAR2 to VARCHAR/CHAR

Converting SQL Server to Infobright

If you have an existing SQL Server schema definition, use the following steps to make it work on Infobright:

Convert MEDIUMTEXT to VARCHAR(N), where 'N' is only as large as necessary

Convert LONGTEXT to VARCHAR(N), where 'N' is only as large as necessary

Convert DOUBLE(A,B) to DECIMAL(A,B)

Convert MONEY to DECIMAL(18,4)

Convert SMALLMONEY to DECIMAL(6,4)

INTEGER types may be converted to BIGINT

NCHAR/NVARCHAR should be converted to CHAR/VARCHAR

Convert NUMBER to INTEGER

Convert NUMBER(A,B) to DECIMAL(A,B)

Converting MySQL (MyISAM) to Infobright

If you have an existing MyISAM schema definition, use the following steps to ensure compliance with Infobright:

Convert MEDIUMTEXT to VARCHAR(N), where 'N' is only as large as necessary

Convert LONGTEXT to VARCHAR(N), where 'N' is only as large as necessary

Convert DOUBLE(A,B) to DECIMAL(A,B)

Viewing Table Information

You can use the standard MySQL commands to obtain information about a table.

To view column information, enter the following command:

SHOW [FULL] COLUMNS FROM tbl_name [FROM db_name] [LIKE 'pattern'];

For more information, see SHOW COLUMNS Syntax in the MySQL 5.1 Reference Manual.

Using the FULL option provides an estimate of the compression for each column.

mysql> show full columns from dim_cars;
+------------+---------------+-------------------+------+-----+---------+-------+-------------------+-------------------------------------+
| Field      | Type          | Collation         | Null | Key | Default | Extra | Privileges        | Comment                             |
+------------+---------------+-------------------+------+-----+---------+-------+-------------------+-------------------------------------+
| make_id    | decimal(10,0) | NULL              | YES  |     | NULL    |       | select,references | Size[MB]: 0.1; Ratio: 15.64; unique |
| make_name  | varchar(25)   | latin1_swedish_ci | YES  |     | NULL    |       | select,references | Size[MB]: 0.1; Ratio: 5.05          |
| model_name | varchar(25)   | latin1_swedish_ci | YES  |     | NULL    |       | select,references | Size[MB]: 0.1; Ratio: 1.38          |
| record_dt  | datetime      | NULL              | YES  |     | NULL    |       | select,references | Size[MB]: 0.1; Ratio: 3.86          |
+------------+---------------+-------------------+------+-----+---------+-------+-------------------+-------------------------------------+
4 rows in set (0.01 sec)

To view the CREATE TABLE statement used to create a given table, enter the following command:

SHOW CREATE TABLE tbl_name;

For more information, see SHOW CREATE TABLE Syntax in the MySQL 5.1 Reference Manual.

mysql> show create table dim_cars;
+----------+--------------------------------------------------------------------+
| Table    | Create Table                                                       |
+----------+--------------------------------------------------------------------+
| dim_cars | CREATE TABLE `dim_cars` (
  `make_id` decimal(10,0) DEFAULT NULL,
  `make_name` varchar(25) DEFAULT NULL,
  `model_name` varchar(25) DEFAULT NULL,
  `record_dt` datetime DEFAULT NULL
) ENGINE=BRIGHTHOUSE DEFAULT CHARSET=latin1 |
+----------+--------------------------------------------------------------------+
1 row in set (0.00 sec)

To view table information, enter the following command:

SHOW TABLE STATUS [FROM db_name] [LIKE 'pattern'];

For more information, see SHOW TABLE STATUS Syntax in the MySQL 5.1 Reference Manual.

mysql> show table status like 'dim_cars';
+----------+-------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+--
| Name     | Engine      | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment |
+----------+-------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+--
| dim_cars | BRIGHTHOUSE |      10 | Compressed |  400 |             11 |        4672 |               0 |            0 |         0 |           NULL |
+----------+-------------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+--
---------------------+---------------------+------------+-------------------+----------+----------------+----------------------------------+
 Create_time         | Update_time         | Check_time | Collation         | Checksum | Create_options | Comment                          |
---------------------+---------------------+------------+-------------------+----------+----------------+----------------------------------+
 2008-08-28 05:30:44 | 2008-04-23 14:17:13 | NULL       | latin1_swedish_ci |     NULL |                | Overall compression ratio: 3.622 |
---------------------+---------------------+------------+-------------------+----------+----------------+----------------------------------+
1 row in set (0.01 sec)

Viewing Compression Ratio Statistics

Infobright provides specific statistics on table and column compression. The compression ratio is calculated in relation to the “natural size” of uncompressed data in the table or column. A ratio of n means that the compressed data, including the statistics and technical description of a column, is n times smaller than its theoretical natural size.

The following natural sizes (in bytes) are defined for various data types. Note the following:

For all data types, if the column is not declared as NOT NULL, add one bit per value for NULL indicators.

These data sizes take into account the typical format of data display, for example “yyyy-mm-dd” for DATE or decimal point for DEC. The size also counts the bytes that store the actual text length (VARCHAR).

DATA TYPES AND NATURAL SIZES

Data Type                                          Natural Size (in bytes)
CHAR(n), BINARY(n)                                 n*(number of rows)
BIGINT, INT, MEDIUMINT, SMALLINT, TINYINT, BOOL    (8 or 4 or 3 or 2 or 1 or 1)*(number of rows)
YEAR                                               4*(number of rows)
DATE                                               10*(number of rows)
TIME                                               8*(number of rows)
TIMESTAMP / DATETIME                               19*(number of rows)
DEC(x,y)                                           (x+1)*(number of rows)
FLOAT                                              4*(number of rows)
REAL, DOUBLE                                       8*(number of rows)
VARCHAR(n), VARBINARY(n)                           (total number of bytes used, i.e., the total length of all strings, excluding terminating characters) + 2*(number of rows)
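To make the natural-size rules concrete, here is a minimal Python sketch that evaluates the table for one column and derives a compression ratio from it. The function names are hypothetical, and rounding the one-bit-per-value NULL indicators up to whole bytes is an assumption of this sketch; the guide does not say how the bits are rounded.

```python
import math

# Bytes per value for the fixed-width types listed in the table.
FIXED_BYTES = {
    "BIGINT": 8, "INT": 4, "MEDIUMINT": 3, "SMALLINT": 2, "TINYINT": 1, "BOOL": 1,
    "YEAR": 4, "DATE": 10, "TIME": 8, "TIMESTAMP": 19, "DATETIME": 19,
    "FLOAT": 4, "REAL": 8, "DOUBLE": 8,
}


def natural_size(data_type, rows, *, n=None, x=None, total_string_bytes=None, nullable=True):
    """Natural size in bytes of one column, following the table above."""
    t = data_type.upper()
    if t in FIXED_BYTES:
        size = FIXED_BYTES[t] * rows
    elif t in ("CHAR", "BINARY"):
        size = n * rows
    elif t in ("DEC", "DECIMAL"):
        size = (x + 1) * rows                  # x digits plus the decimal point
    elif t in ("VARCHAR", "VARBINARY"):
        size = total_string_bytes + 2 * rows   # 2 bytes per row store the length
    else:
        raise ValueError(f"type not covered by the table: {data_type}")
    if nullable:
        size += math.ceil(rows / 8)            # assumption: NULL bits rounded up to bytes
    return size


def compression_ratio(natural_bytes, compressed_bytes):
    """A ratio of n means the compressed data is n times smaller."""
    return natural_bytes / compressed_bytes


print(natural_size("DATE", 400, nullable=False))
print(natural_size("DEC", 400, x=10))
```

For a nullable DEC(10,0) column with 400 rows this gives (10+1)*400 bytes plus 50 bytes of NULL indicators.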

Viewing Table Level Compression Ratio Statistics

To view the compression ratio at the table level, enter the following command:

mysql> show table status [from db_name] [like 'table_name'];

The optional like clause can be used to filter the tables. Note that the table name must be provided in single quotes. The compression statistics are provided in the table comment. For example:

mysql> show table status from test like 't1' \G
*********************** 1. Row **********************
Name: t1
Engine: BRIGHTHOUSE
Version: 10
Row_format: Compressed
Rows: 3430387
Avg_row_length: 0
Data_length: 0
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2008-09-04 15:31:39
Check_time: NULL
Update_time: 2008-09-04 15:35:30
Collation: ascii_bin
Checksum: NULL
Create_options:
Comment: Overall compression ratio 39.908
1 row in set (0.59 sec)

Viewing Column Compression Ratio Statistics

To view the compression ratio and the compressed size for a column, enter the following command:

mysql> show full columns from table_name …;

A database name and a column filter can be specified in optional clauses. For more information, see SHOW COLUMNS Syntax in the MySQL 5.1 Reference Manual.

The compression statistics are provided in the column comment. In addition to the compression information, the comment line may also contain a “unique” indicator, meaning that the column has all unique values (except nulls).

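For example, in the show full columns output for dim_cars under Viewing Table Information, the Comment column lists the size and compression ratio of each column and marks make_id as unique.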

Comparison of Calculated Compression Ratio to Physical Size

The compression ratio calculated above will differ from the compression ratio calculated from physical sizes of files on disk. The compression ratio based on physical size will be slightly smaller, due to extra files that are generated containing statistics on the imported data, such as Knowledge Nodes. Knowledge Nodes are used to optimize query execution and are discussed further in "About the Knowledge Grid" on page 45.

5. Data Manipulation Statements

Unsupported Data Manipulation Commands (INSERT, UPDATE, DELETE)

INSERT, UPDATE, and DELETE commands are not supported in Infobright Community Edition and should not be used to manipulate data in Infobright tables. Using these commands may result in errors.

When using GUI tools with Infobright Community Edition, such as MySQL Browser, use these tools in read-only mode only. Do not use these tools to insert, update, or delete data. This may result in errors and the hanging of the GUI application.

To insert data into Infobright tables, use the MySQL import command. For more information, see "Importing and Exporting Data in Infobright" on page 34.

6. Character Set Support

Supported Character Sets

Infobright storage supports all ANSI and UTF-8 character sets. This means that Infobright can store and retrieve data encoded in 8-bit and multi-byte character sets.

Important: Queries that evaluate UTF-8 character data columns run more slowly than equivalent queries against ASCII character data, because the Character Maps in the Knowledge Grid support ASCII data only (see "Running Queries in Infobright" on page 45). UTF-8 specific Knowledge Grid extensions will be available in an upcoming release.

Collations and Comparisons

Infobright supports all custom UTF-8 collations supported by MySQL 5.1:

utf8_bin
utf8_czech_ci
utf8_danish_ci
utf8_esperanto_ci
utf8_estonian_ci
utf8_general_ci (default)
utf8_hungarian_ci
utf8_icelandic_ci
utf8_latvian_ci
utf8_lithuanian_ci
utf8_persian_ci
utf8_polish_ci
utf8_roman_ci
utf8_romanian_ci
utf8_slovak_ci
utf8_slovenian_ci
utf8_spanish2_ci
utf8_spanish_ci
utf8_swedish_ci
utf8_turkish_ci
utf8_unicode_ci*

*utf8_unicode_ci properly handles both French and German collation, so specific collation types for these languages are not necessary.

For more information, see Unicode Support in the MySQL 5.1 Reference Manual.

The SQL standard does not define a default collation; therefore, many DBMS engines have different default collations and produce different results. As a result, there are several differences between Infobright and other DBMS engines.

For Infobright, character data types are case-sensitive. For example, the condition 'toronto'='Toronto' is not true in Infobright. Similarly, the condition LIKE 'Abc%' is not true for 'abcde'.

The Infobright sorting order is “A…Z a…z” (for example 'Zeta' < 'alfa'), which is the same sorting order as used by Oracle. The Infobright sorting order is different than the default MySQL sorting order, which mixes lowercase and uppercase; the SQL Server order, which is “aAbB…zZ”; and the DB2 order, which is “AaBb…Zz”.

The Infobright sorting order affects ORDER BY results, GROUP BY results (which is the order of groups and their definitions—for example, 'aaa' and 'AAA' define different groups) and DISTINCT results. WHERE conditions may also be affected if you are expecting a different sorting order than the one used by Infobright.
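The effect of a case-sensitive “A…Z a…z” order can be illustrated outside the database. The Python sketch below uses Python's own code-point string comparison as a stand-in for a binary collation (for ASCII data the two orders coincide); it does not call Infobright, and the sample values are made up.

```python
names = ["alfa", "Zeta", "Toronto", "toronto", "AAA", "aaa"]

# Code-point order: all uppercase letters sort before all lowercase letters.
binary_order = sorted(names)

# For contrast, an order that mixes upper and lower case.
mixed_order = sorted(names, key=str.casefold)

# Case-sensitive equality and prefix matching, as in the examples above.
same_city = "toronto" == "Toronto"
prefix_match = "abcde".startswith("Abc")

# 'aaa' and 'AAA' are different values, so they would form different groups.
distinct_groups = set(names)

print(binary_order)
print(mixed_order)
print(same_city, prefix_match, len(distinct_groups))
```

In the first list 'Zeta' comes before 'alfa', and the six values stay distinct, which is how ORDER BY, GROUP BY and DISTINCT behave under a case-sensitive binary order.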

To simulate Infobright collation in the MySQL engine, set a binary collation that matches the table's character set (for example, latin1_bin or ascii_bin) while creating a table (for more information, see Table Character Set and Collation in the MySQL 5.1 Reference Manual). For example, enter the following command:

mysql> create table … collate ascii_bin;

Padding

Infobright treats padding differently than other DBMS engines. Infobright assumes literal comparisons of text fields, including all whitespace characters. Therefore, a string containing two spaces is different than a string containing one space or an empty (0 length) string, which is also different than the NULL value.

The Infobright padding definition is compatible with the SQL standard. However, most DBMS systems have defined less restrictive, customizable rules regarding text comparison. For example, 'abc ' = 'abc' may be true in some databases but is not true in Infobright.

Note In CHAR columns, trailing spaces are trimmed on LOAD, whereas in VARCHAR columns values are loaded with all spaces.

7. Importing and Exporting Data in Infobright

Multi-character Delimiter

As of version 4.0.6 GA, data delimited by more than one character can be loaded in Infobright. This means that you can delimit data for each column with a sequence of characters that are not otherwise encountered as valid data. For example, instead of \t, a delimiter such as \t# can be used.
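The benefit of a multi-character delimiter is easiest to see when the single-character delimiter also occurs inside the data. The following Python sketch splits a sample row on the two-character sequence \t#; the sample row and the helper name split_fields are invented for illustration and do not reproduce the Infobright Loader.

```python
DELIMITER = "\t#"  # tab followed by '#', as in the example above


def split_fields(line: str, delimiter: str = DELIMITER) -> list[str]:
    """Split one input line into column values on a multi-character delimiter."""
    return line.rstrip("\n").split(delimiter)


# The second value contains a plain tab, which is not treated as a separator.
row = "42\t#Main St\t Apt 3\t#Toronto\n"
print(split_fields(row))
```

Because only the full sequence counts as a separator, the tab inside the address stays part of the value.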

About Transactions

Using AUTOCOMMIT, COMMIT and ROLLBACK Commands

By default, Infobright uses AUTOCOMMIT mode to finalize transactions, meaning that every transaction is automatically committed, or rolled back if an error occurs. However, you can and should choose to disable AUTOCOMMIT and use COMMIT and ROLLBACK commands instead.

A new transaction starts with the first LOAD command or DML statement entered in a new Infobright session. A new transaction also starts after each COMMIT or ROLLBACK command.

To enable the use of COMMIT and ROLLBACK commands in Infobright, you must disable AUTOCOMMIT. Enter the following command:

mysql> set autocommit=0;

You can disable AUTOCOMMIT by setting the parameter to 0 (zero) and enable AUTOCOMMIT by setting the parameter to 1. If AUTOCOMMIT is set to 1, then when a LOAD is completed, the transaction is automatically committed.

To commit the current transaction, enter the following command:

mysql> commit;

If you have not yet committed a LOAD DATA INFILE transaction, you can roll back the transaction. This will restore the import tables to the state that existed before the current transaction. Enter the following command:

mysql> rollback;

Using COMMIT and ROLLBACK makes it possible to check the load within the same session before committing the data, as the loaded data is available (viewable) to the load session. For instance, you could check something about the data (such as the number of records loaded) before committing.

After importing data using the LOAD DATA INFILE command, the status of the import and the number of affected rows are shown. All uncommitted rows, including those from previous imports, are shown; therefore, the number of affected rows may be greater than the number of rows in the file you just imported.
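The way the affected-row count accumulates across uncommitted loads can be pictured with a toy model. The Python class below is only a sketch of the behavior described in this section with AUTOCOMMIT disabled; LoadSession and its methods are hypothetical and do not talk to an Infobright server.

```python
class LoadSession:
    """Toy model of one session with AUTOCOMMIT disabled (not Infobright code)."""

    def __init__(self) -> None:
        self.committed_rows = 0
        self.uncommitted_rows = 0

    def load(self, rows_in_file: int) -> int:
        # The reported count covers all uncommitted rows, including rows
        # from earlier loads in the same transaction.
        self.uncommitted_rows += rows_in_file
        return self.uncommitted_rows

    def visible_rows(self) -> int:
        # Loaded rows are viewable in the loading session before COMMIT.
        return self.committed_rows + self.uncommitted_rows

    def commit(self) -> None:
        self.committed_rows += self.uncommitted_rows
        self.uncommitted_rows = 0

    def rollback(self) -> None:
        # Restores the state that existed before the current transaction.
        self.uncommitted_rows = 0


session = LoadSession()
print(session.load(100))       # first file: 100 rows reported
print(session.load(50))        # second file: 150 rows reported, not 50
session.rollback()
print(session.visible_rows())  # nothing was committed
```

In this model the second load reports 150 affected rows even though its file held only 50, which mirrors the note above about uncommitted rows from previous imports.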

About Transaction Behavior

While a write operation is being performed on a table, the following occurs:

Queries to the table are not executed until the current import is complete and the operation is committed.

Until the current write operation is committed, all subsequent write commands to the table are queued. They wait for the write lock to be released and then proceed in the order they were received.

While a read query is being executed on a table, the following occurs:

All subsequent queries run concurrently with the current query.

In general, Infobright uses table-level locking: only one LOAD operation can execute on a table at a time, and only after running queries on that table have completed.

Failure Handling

If AUTOCOMMIT is disabled and the Infobright server is terminated during an import session, the following occurs:

Infobright does not store the rows that were loaded during the failed import operation.

The input file and the database files are not harmed. To load data from the input file, repeat the LOAD operation.

If AUTOCOMMIT is disabled and the Infobright server is terminated after an import session is completed successfully but is not committed, the following occurs:

The transaction is rolled back and the imported data is lost when the server restarts.

The input file and the database files are not harmed by the failed import operation (the database is unaffected, as if the import session did not occur). To re-import the data, repeat the LOAD operation.

If the Infobright server is terminated during an export operation to a disk file, the following occurs:

A non-empty file is saved on disk; however, the last row in the saved file is inconsistent.

The database files are not harmed by the failed export operation. To export the data, repeat the export operation.

If Infobright tries to import data from a file created during a failed export session, the following occurs:

No data is inserted because the input file consists of corrupted table rows.

No new records are added to the database files, so no harm is done.

About Export Differences in Infobright

There are several important differences between exporting data from Infobright and exporting data from other DBMS engines.

CHAR(n) Data Type Values

In Infobright, when you export CHAR(n) data type values to a text file, the extra spaces are trimmed from the export.

Escape Characters

The Infobright Loader supports escape character definition and usage.

Exporting NULL Values

Infobright recognizes the following representations of NULL values when loading data from a text file:

NULL, \N, <field delimiter><field delimiter>

However, Infobright only exports NULL values in the following representation:

<field delimiter><field delimiter>

Other DBMS systems may have different representations of the NULL value; for example, MySQL only recognizes the representation \N for a NULL value. This can create issues if you export data from Infobright and import the data into MySQL. Since MySQL will only look for \N and will not recognize the Infobright representation of the NULL value, MySQL will change the NULL value into the default values in numeric and string columns.
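One way to avoid this when moving an export into MySQL is to rewrite empty fields as \N before importing. The Python sketch below shows the idea under simplifying assumptions: the delimiter is a semicolon, values never contain the delimiter or escape characters, and an empty last field also stands for NULL. The function name to_mysql_nulls is hypothetical.

```python
def to_mysql_nulls(line: str, delimiter: str = ";") -> str:
    r"""Rewrite empty fields (Infobright's exported NULL) as \N for MySQL."""
    # Naive split: assumes the delimiter never occurs inside a value.
    fields = line.rstrip("\n").split(delimiter)
    converted = [r"\N" if field == "" else field for field in fields]
    return delimiter.join(converted) + "\n"


exported = '1;;"abc"\n'   # the second column is NULL in the exported file
print(to_mysql_nulls(exported), end="")
```

An enclosed empty string ("") is left untouched, so it is still loaded as an empty string rather than NULL.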

Infobright Import/Export Syntax

Infobright Loader Reject File

Control if and when a load is halted (based on the number of rejected rows) using the Infobright Loader reject file. You can also determine if rejected rows are discarded or placed in a rejected row file. The default is to place them in a reject file.

Note The reject file may not exist before running load data infile.

Use the following parameters to configure the reject file:

INFOBRIGHT LOADER REJECT FILE OPTIONS

Option    Description

BH_REJECT_FILE_PATH

Path to the file where rejected rows are stored. Rejected rows are placed into the reject file in the order they are rejected. The original format is preserved to allow the operator to correct and rerun the load for only the rejected rows.

Note: If BH_REJECT_FILE_PATH is set, BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD must be set as well.

BH_ABORT_ON_COUNT

Abort and roll back the load if the number of rejected rows exceeds this value. If this value is not set, the load will be rolled back to the first bad record if the load fails. A value of -1 means never abort; a value of 0 means abort on the first rejected row. There is no upper limit on this value.

Note: BH_ABORT_ON_COUNT and BH_ABORT_ON_THRESHOLD are mutually exclusive.

BH_ABORT_ON_THRESHOLD

Abort and roll back the load if the ratio of rejected rows to total processed rows exceeds this value (the threshold test starts after one packrow has been processed). The value must be in the open interval (0, 1).

For example, set @BH_ABORT_ON_THRESHOLD = 0.01 (or 0.5, or 0.99) means that the Infobright Loader terminates once 1% (or 50%, or 99%) of all processed rows have been rejected, and saves the problematic rows in the reject file.

Note: BH_ABORT_ON_COUNT and BH_ABORT_ON_THRESHOLD are mutually exclusive.

Enabling the Reject File Functionality

To enable the reject file functionality, you must specify BH_REJECT_FILE_PATH and one of the associated parameters (BH_ABORT_ON_COUNT or BH_ABORT_ON_THRESHOLD). For example, if you want to load data from the file DATAFILE.csv to table T but you expect that 10 rows in this file might be wrongly formatted, you would run the following commands:

set @BH_REJECT_FILE_PATH = '/tmp/reject_file';
set @BH_ABORT_ON_COUNT = 10;
load data infile 'DATAFILE.csv' into table T;

If fewer than 10 rows are rejected, a warning will be output, the load will succeed and all problematic rows will be output to the file /tmp/reject_file. If the Infobright Loader finds a tenth bad row, the load will terminate with an error and all bad rows found so far will be output to the file /tmp/reject_file.
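The two abort rules can be summarized as a small decision function. The Python sketch below is a model of the option descriptions, not the loader's actual code. It makes two assumptions: the count rule follows the option description ("exceeds this value"), although the worked example above describes the load stopping at the tenth bad row with a value of 10, so the exact boundary should be verified on your system; and a packrow is taken to be 65,536 rows, a figure that does not appear in this section.

```python
PACKROW_ROWS = 65_536  # assumption: number of rows in one packrow


def should_abort(rejected, processed, abort_on_count=None, abort_on_threshold=None):
    """Model of BH_ABORT_ON_COUNT / BH_ABORT_ON_THRESHOLD (hypothetical helper)."""
    if abort_on_count is not None and abort_on_threshold is not None:
        raise ValueError("BH_ABORT_ON_COUNT and BH_ABORT_ON_THRESHOLD are mutually exclusive")
    if abort_on_count is not None:
        if abort_on_count == -1:           # -1 means never abort
            return False
        return rejected > abort_on_count   # 0 aborts on the first rejected row
    if abort_on_threshold is not None:
        if not 0 < abort_on_threshold < 1:
            raise ValueError("threshold must lie in the open interval (0, 1)")
        if processed < PACKROW_ROWS:       # test starts after one packrow
            return False
        return rejected / processed > abort_on_threshold
    raise ValueError("set one of abort_on_count or abort_on_threshold")


print(should_abort(1, 10, abort_on_count=0))
print(should_abort(700, 65_536, abort_on_threshold=0.01))
```

The second call reports an abort because 700 rejected rows out of 65,536 is slightly more than 1%.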

Disabling the Reject File Functionality

Disabling the reject file related parameters after the load is recommended to ensure the reject file functionality is not used by accident. For the same reason, setting any values for those parameters in the file my.ini/cnf is not recommended. To disable the reject file functionality, run the following commands:

set @BH_REJECT_FILE_PATH = NULL;
set @BH_ABORT_ON_COUNT = NULL;

Infobright Loader Line Terminators

You can define the type of terminator for a load by using the LINES TERMINATED BY clause.

Escape Character

The escape character cannot be part of the field terminator. For example, assume you try to execute a command such as:

LOAD DATA INFILE ... FIELDS TERMINATED BY "a\\a" ESCAPED BY '\\';

or

LOAD DATA INFILE ... FIELDS TERMINATED BY "#@\t" ESCAPED BY '@';

The following error message will appear:

Field terminator containing the escape character not supported.

If you try to execute a command such as:

LOAD DATA INFILE ... FIELDS ENCLOSED BY '"' ESCAPED BY '"';

or

LOAD DATA INFILE ... FIELDS ENCLOSED BY '#' ESCAPED BY '#';

the following error message will appear:

The same enclose and escape characters not supported.

End of Line (EOL) Sequence

When using MySQL's LOAD DATA INFILE command, you can specify the end of line (EOL) sequence in the file by adding a LINES TERMINATED BY 'X' clause. For example:

LOAD DATA INFILE 'DATAFILE.csv' INTO TABLE T ... LINES TERMINATED BY '\n' (lines terminated as in Linux)

LOAD DATA INFILE 'DATAFILE.csv' INTO TABLE T ... LINES TERMINATED BY '\r\n' (lines terminated as in Windows)

If the EOL sequence is not specified, the following error message will be output:

Query contains syntax that is not supported and will be ignored.

Importing Data

To import data into an Infobright table, use the following MySQL loading command:

LOAD DATA INFILE 'file_name' INTO TABLE tbl_name
[FIELDS
  [TERMINATED BY 'char']
  [ENCLOSED BY 'char']
  [ESCAPED BY 'char']
];

where:

file_name = path to the file to be loaded

tbl_name = name of the table where the data will be loaded

Importing Data Using Remote Load

Using the Infobright Loader, you can load data from a remote machine across the network using the LOAD DATA LOCAL INFILE syntax. This allows you to offload potentially heavy ETL processing to a separate server, keeping the Infobright server on a dedicated machine. This also allows you to save significant time when transferring large LOAD files over the network, which can typically limit load speed. For more information, see LOAD DATA INFILE Syntax in the MySQL 5.1 Reference Manual.

A few important notes:

Before importing data using the LOAD DATA LOCAL syntax, be sure you fully understand the security issues. See Security Issues with LOAD DATA LOCAL in the MySQL 5.1 Reference Manual for details.

You can disable all LOAD DATA LOCAL statements from the server side by starting mysqld with the --local-infile=0 option.

For the mysql command-line client, enable LOAD DATA LOCAL by specifying the --local-infile[=1] option, or disable it with the --local-infile=0 option. For mysqlimport, local data file loading is off by default; enable it with the --local or -L option. In any case, successful use of a local load operation requires that the server permits it.

Some (but not all) Windows GUI tools may work with remote load, even with Linux servers.

To import data into an Infobright table from a remote machine across the network, use the following MySQL loading command (for more information about command options, see the Data Loading Guide):

LOAD DATA [LOCAL] INFILE 'file_name' INTO TABLE tbl_name
[FIELDS
  [TERMINATED BY 'char']
  [ENCLOSED BY 'char']
  [ESCAPED BY 'char']
];

where:

file_name = path to the file to be loaded

tbl_name = name of the table where the data will be loaded

If LOCAL is specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full path name to specify its exact location. If given as a relative path name, the name is interpreted relative to the directory in which the client program was started.

Note Network speeds may limit the load speed. Exceptions and errors in the transfer are handled by the MySQL client, and will behave the same as the MySQL client.

Exporting Data

To export data from an Infobright table, use the following MySQL export command:

SELECT … INTO OUTFILE 'file_name'
[FIELDS
  [TERMINATED BY 'string']
  [ENCLOSED BY 'char']
  [ESCAPED BY 'char']
]
FROM tbl_name;

where:

file_name = path to the file where data will be exported

tbl_name = name of the table from which the data will be retrieved

For more information on export syntax, see SELECT Syntax in the MySQL 5.1 Reference Manual.

Optional FIELDS Clause

Several optional clauses exist for the MySQL LOAD command. All of these clauses are ignored by Infobright, with the exception of the FIELDS clause. You can also use the FIELDS clause when exporting data.

You can use the optional FIELDS clause to specify how values are provided in the input file.

To use the FIELDS clause, the data import format must be defined as variable-length text.

Within the FIELDS clause, you can use the following sub-clauses:

Use the TERMINATED BY sub-clause to specify the character recognized as the separator (delimiter) between values. By default, a semicolon ; is assumed to separate values.

Use the ENCLOSED BY sub-clause to specify the character that begins and ends each string representing a text value. By default, a double quotation mark " is assumed to enclose each value. If the text values in the input file do not use any enclosing characters, use the value NULL in the ENCLOSED BY sub-clause. Note that this is the same as using the empty string '' option in standard MySQL.

Use the ESCAPED BY sub-clause to support special characters that may be embedded within text fields.
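How the three sub-clauses interact can be previewed on a sample line before running a load. The Python sketch below uses the standard csv module as a stand-in parser, configured with a semicolon terminator, double-quote enclosure and backslash escape; it only approximates the Infobright Loader, whose exact parsing rules may differ, and the sample line is invented.

```python
import csv
import io

# One input line: a number, a value containing the delimiter, and a value
# containing escaped enclosure characters.
sample = '1;"Smith; John";"said \\"hi\\""\n'

reader = csv.reader(
    io.StringIO(sample),
    delimiter=";",      # FIELDS TERMINATED BY ';'
    quotechar='"',      # ENCLOSED BY '"'
    escapechar="\\",    # ESCAPED BY '\\'
    doublequote=False,  # quotes inside values are escaped, not doubled
)
rows = list(reader)
print(rows)
```

The semicolon inside the enclosed second value is kept as data, and the escaped quotation marks in the third value come through as literal characters.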

Importing Files with Invalid Values

Infobright may abort a load when invalid values are found. Certain invalid values, however, can be loaded in Infobright. The following rules are used with invalid data:

If a numeric, date or time value is invalid, the value is replaced by 0.

If a NULL value is imported into a column defined as NOT NULL (except for TIMESTAMP columns), it is replaced by 0 (for numerical, date and time columns) or by an empty string (for string columns).

Importing Data Using Linux Pipes

You can use Linux pipes when importing data in Infobright. The same dataformat parameter applies; see Setting Import and Export Parameters. You can also use the FIELDS clause when exporting data. For more information, see "Optional FIELDS Clause" on page 40.

To set up a Linux pipe, run the mkfifo command from Linux and ensure that the pipe is accessible to Infobright. In the following example, the pipe is set up as /pipe_test/thepipe.pipe. You can use the directory and name of your choice.

mkfifo /pipe_test/thepipe.pipe
chmod 666 /pipe_test/thepipe.pipe

Once the pipe is set up, direct a file or the output of a process to the pipe:

cat /usr/tmp/jkvarload.txt > /pipe_test/thepipe.pipe &

Then execute a LOAD DATA INFILE statement using the pipe:

mysql> load data infile '/pipe_test/thepipe.pipe' into table pipe_test lines terminated by '\n';

When finished, remember to remove the pipe:

rm /pipe_test/thepipe.pipe
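The pipe workflow above (create the pipe, feed it from a background process, read it, remove it) can be sketched in Python. This is an illustration under stated assumptions, not part of Infobright: feed_pipe and demo_pipe_roundtrip are hypothetical names, the pipe lives in a temporary directory, and a plain file read stands in for the server side of LOAD DATA INFILE.

```python
import os
import tempfile
import threading


def feed_pipe(pipe_path, rows):
    """Write semicolon-delimited rows into the named pipe, then close it."""
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(pipe_path, "w") as pipe:
        for row in rows:
            pipe.write(";".join(row) + "\n")


def demo_pipe_roundtrip(rows):
    """Create a FIFO, feed it from a background thread, and read it back."""
    workdir = tempfile.mkdtemp()
    pipe_path = os.path.join(workdir, "thepipe.pipe")
    os.mkfifo(pipe_path, 0o666)  # mode is still subject to the process umask
    try:
        writer = threading.Thread(target=feed_pipe, args=(pipe_path, rows))
        writer.start()
        # Stand-in for the consumer of the pipe: read lines until end of file.
        with open(pipe_path) as pipe:
            received = [line.rstrip("\n").split(";") for line in pipe]
        writer.join()
        return received
    finally:
        os.remove(pipe_path)  # remove the pipe when finished
        os.rmdir(workdir)


result = demo_pipe_roundtrip([["1", "alpha"], ["2", "beta"]])
```

The writer runs in the background for the same reason the cat command above ends with &: a write to a pipe does not complete until something reads from it.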

About Import Errors

Several Infobright-related errors can occur when using the LOAD DATA command on an Infobright table. These errors are described in the following table. Standard MySQL errors may also occur (for more information, see Appendix B. Errors, Error Codes, and Common Problems in the MySQL 5.1 Reference Manual).

INFOBRIGHT IMPORT ERRORS

Code | Message | Description | Action
1 | Cannot open file or pipe | Cannot open a file or a pipe containing input data | Ensure the file exists and the path is entered correctly
2 | Wrong data or column definition | Format of data does not comply with table definition | Ensure the data being imported is the correct data type and does not exceed the size specified
3 | Syntax error | Not used | N/A
4 | Cannot connect to the database | Not used | N/A
5 | Unknown error | Unspecified error occurred | Contact customer support
6 | Wrong parameter | Wrong value for one of the loading parameters | Make sure the correct parameter is used
7 | Data conversion error | A value in data cannot be converted to a column type | Ensure the data is the correct column type

About Export Errors

Several Infobright-related errors can occur when exporting data from an Infobright table. These errors are described in the following table. Standard MySQL errors may also occur (for more information, see Appendix B. Errors, Error Codes, and Common Problems in the MySQL 5.1 Reference Manual).

INFOBRIGHT EXPORT ERRORS

Code | Message | Description | Action
1 | Cannot open file or pipe | Cannot open a file or a pipe for output | Ensure the file exists and the path is entered correctly
2 | Wrong data or column definition | Not used | Ensure the data being exported is the correct data type and does not exceed the size specified
3 | Syntax error | Not used | Check the export syntax
4 | Cannot connect to the database | Not used | Ensure database exists, the correct path is given and Infobright is started
5 | Unknown error | Unspecified error occurred | Contact customer support
6 | Wrong parameter | Wrong value for one of the export parameters | Make sure the correct parameter is used
7 | Data conversion error | Not used | Ensure the data is the correct column type

Sample Script (Create Table, Import Data, Export Data)

The following sample script creates a table called customers with Infobright (ENGINE=BRIGHTHOUSE) as its storage engine, imports data from an existing text file, and exports the data.

USE Northwind;
DROP TABLE IF EXISTS customers;
CREATE TABLE customers (
    CustomerID varchar(5),
    CompanyName varchar(40),
    ContactName varchar(30),
    ContactTitle varchar(30),
    Address varchar(60),
    City varchar(15),
    Region char(15),
    PostalCode char(10),
    Country char(15),
    Phone char(24),
    Fax varchar(24),
    CreditCard float(17,1),
    FederalTaxes decimal(4,2)
) ENGINE=BRIGHTHOUSE;

-- Import the text file.
SET AUTOCOMMIT=0;
LOAD DATA INFILE "/tmp/Input/customers.txt" INTO TABLE customers
    FIELDS TERMINATED BY ';' ENCLOSED BY 'NULL'
    LINES TERMINATED BY '\r\n';
COMMIT;

-- Export the data into TEXT format.
SET @bh_dataformat = 'txt_variable';
SELECT * INTO OUTFILE "/tmp/output/customers.text"
    FIELDS TERMINATED BY ';' ENCLOSED BY 'NULL'
    LINES TERMINATED BY '\r\n'
    FROM customers;

Exporting and Importing Query Results

After exporting the results of a query to an output file, you may not be able to import the file back into the same definition of the accessed table. This is because the query may contain aggregates that produce values beyond the boundaries of the original data types. To load the output file, you may need to create a new table with appropriate data types for the values to be imported.

8. Running Queries in Infobright

About the Knowledge Grid

The Knowledge Grid is a set of Infobright metadata used by the Infobright storage engine (named “Brighthouse”) to optimize query execution. The Knowledge Grid consists of Knowledge Nodes, which are optimization data for particular tables and columns. Knowledge Nodes are stored on disk in a special directory, specified in the brighthouse.ini configuration file (see "Query Support" on page 51). Knowledge Nodes can be lost without losing data integrity.

About Knowledge Nodes

There are four kinds of Knowledge Nodes:

INFOBRIGHT KNOWLEDGE NODES

Knowledge Node Type | Description
Histogram | Used by Infobright to enhance the speed of most queries containing numerical conditions (including date/time, decimal, etc.). Histograms are created automatically during data load.
Character Map | Used by Infobright to enhance the speed of most queries containing text conditions. Character Maps are created automatically during data load.
Pack/Pack | Used to enhance joining of tables. Created or updated automatically while executing user queries.
DPN (Data Pack Nodes) | Statistical metadata that describes the content of the Data Pack. Used both to assist in data access and in rough operations. DPNs are created automatically during data load.
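To make the role of per-pack statistics concrete, the following Python sketch shows how minimum and maximum values alone can decide whether a pack needs to be read for a condition such as column > threshold. It is an illustration only, not the Brighthouse implementation: the DataPackNode class, the classify_pack function, and the labels "irrelevant" and "relevant" are assumptions introduced here ("suspect" is the term this guide uses for packs that must be examined), and NULL values are ignored.

```python
from dataclasses import dataclass


@dataclass
class DataPackNode:
    """Hypothetical per-pack statistics: smallest and largest value in the pack."""
    min_value: int
    max_value: int


def classify_pack(dpn, threshold):
    """Classify one pack for the condition `column > threshold`."""
    if dpn.max_value <= threshold:
        return "irrelevant"  # no row can match, so the pack is never read
    if dpn.min_value > threshold:
        return "relevant"    # every row matches; statistics may be enough
    return "suspect"         # some rows may match, so the pack must be read


packs = [DataPackNode(1, 10), DataPackNode(50, 90), DataPackNode(5, 60)]
labels = [classify_pack(pack, 20) for pack in packs]
```

Under these assumptions only the third pack would have to be read from disk to evaluate column > 20.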

Running Queries

To run queries on Infobright tables, use the following standard MySQL syntax:

mysql> select …;

The Infobright Optimizer is the primary engine used to resolve queries. Although the library of supported SQL has grown significantly, some queries are still executed by the MySQL query engine instead of the Infobright engine. In that case query response time tends to suffer, because the MySQL engine is row-oriented and cannot use the Knowledge Grid information; in some cases it can be too slow to be usable. For best performance, ensure your queries (and VIEWs) contain only syntax supported by the Infobright Optimizer. For the SELECT syntax supported in Infobright, see “Subquery Support” on page 56.

Enabling Queries to be Redirected to the MySQL Engine

By default, executing queries in the MySQL query engine is disabled. You can enable queries that cannot be handled by the Infobright Optimizer to be redirected to the MySQL query engine by editing the file brighthouse.ini within the data directory:

AllowMySQLQueryPath=1

If the MySQL query path is disabled, then the following message will be returned if the query would have otherwise been directed to MySQL for processing:

The query includes syntax that is not supported by the Infobright Optimizer. Infobright suggests either restructure the query with supported syntax, or enable the MySQL Query Path in the brighthouse.ini file to execute the query with reduced performance.

Viewing Queries Redirected to the MySQL Engine

When a query is redirected from the Infobright Optimizer to the MySQL query engine, a warning is reported. For example:

400 rows in set, 1 warning (0.00 sec)

This will occur when functions not optimized in Infobright are used. If you get poor query performance, you should execute the command below to identify if a query has been directed to the MySQL query engine.

After running a query, enter the following command to view any warnings:

mysql> show warnings;

The following message indicates that the query was directed to MySQL for processing:

1105 | Query syntax not implemented in Brighthouse, executed by MySQL engine.
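When queries are issued from a script, the same check can be automated. The sketch below is a hypothetical helper, not an Infobright API: it assumes the rows returned by show warnings have already been fetched as (Level, Code, Message) tuples by whatever MySQL client library you use, and it looks for the code and message text quoted above.

```python
MYSQL_FALLBACK_CODE = 1105  # code shown in the warning message above


def was_redirected(warnings):
    """Return True if any warning row reports execution by the MySQL engine."""
    return any(
        code == MYSQL_FALLBACK_CODE and "executed by MySQL engine" in message
        for _level, code, message in warnings
    )


# Hypothetical rows, shaped like the output of SHOW WARNINGS.
redirected_rows = [
    ("Warning", 1105,
     "Query syntax not implemented in Brighthouse, executed by MySQL engine."),
]
clean_rows = []
```

A script could log or reject any query for which was_redirected returns True, so that slow MySQL-path queries are noticed early.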

Important: When queries are executed on Infobright tables by the standard MySQL engine, performance can be significantly slower than when queries are executed by Infobright.

Terminating a Query

If you want to terminate a query executed from a client session before the query is complete, do the following:

1. Use the show [full] processlist command to determine the query’s process ID.

2. Use the kill <id> command to terminate the query.

OR

If you are using a command-line MySQL client, you can also use Ctrl+C to terminate the query.

Creating VIEWs in Infobright

Infobright supports the creation of VIEWs. Please note that the VIEW must contain Infobright optimized syntax, or the VIEW will be run in the MySQL query engine.

Create VIEW Syntax

The syntax to create a VIEW is as follows:

CREATE [OR REPLACE] VIEW view_name [(column_list)] AS select_statement

A VIEW must contain unique column names. If you select two columns with the same name from separate tables, at least one must be aliased or the column list option must be used.

If the VIEW's select statement contains functionality that is not supported by the Infobright Optimizer, the VIEW will perform sub-optimally, because it will always fall back to the MySQL query engine.

Select Syntax Supported in Infobright

The following SELECT syntax is supported in Infobright.

Select Syntax

For more information, see SELECT Syntax in the MySQL 5.1 Reference Manual.

SELECT [ ALL | DISTINCT | DISTINCTROW ] select_expr, …
[ FROM table_references
[ WHERE where_condition ]
[ GROUP BY {col_name | expr | position} ]
[ HAVING where_condition ]
[ ORDER BY {col_name | expr | position } [ ASC | DESC ], … ]
[ LIMIT { [ offset,] row_count | row_count OFFSET offset} ]
[ INTO OUTFILE 'file_name' export_options - AS alias_name - ORDER BY NULL ]

Join Syntax

For more information, see JOIN Syntax in the MySQL 5.1 Reference Manual.

Infobright supports the following JOIN syntax for the table_references part of SELECT statements (as described in the previous section, “Select Syntax”):

table_references:
    table_reference [, table_references]
table_reference:
    table_factor | join_table
table_factor:
    tbl_name [ [ AS ] alias]
join_table:
    table_reference [ INNER | CROSS ] JOIN table_factor [join_condition]
  | table_reference STRAIGHT_JOIN table_factor
  | table_reference STRAIGHT_JOIN table_factor ON condition
  | table_reference {LEFT|RIGHT} [OUTER] JOIN table_reference join_condition
join_condition:
    ON conditional_expr | USING (column_list)

Union Syntax

For more information, see UNION Syntax in the MySQL 5.1 Reference Manual.

SELECT …
UNION [ ALL | DISTINCT ] SELECT …
[ UNION [ ALL | DISTINCT ] SELECT … ]

Subqueries

For more information, see Subquery Syntax in the MySQL 5.1 Reference Manual.

SELECT * FROM t1 WHERE column1 = (SELECT max(column1) FROM t2);

The following subquery forms are also supported:

subquery as scalar operand
subquery with ANY, IN, SOME and ALL
EXISTS and NOT EXISTS
correlated subqueries
subqueries in the FROM clause
VIEWs in the FROM clause

Query Performance

Because of Infobright's column-oriented data organization and other Infobright-specific features, query optimization in Infobright differs slightly from traditional DBMS approaches.

Infobright works well with data tables containing many columns, where only necessary columns are accessed by query (as opposed to SELECT *). The traditional approach suggests keeping records as small as possible (e.g., using schema normalization and table decomposition). However, in Infobright, only necessary columns are used in calculations. Therefore, queries with many limiting conditions on many columns of the same table are especially well optimized in Infobright.

In traditional DBMS systems, better performance can be achieved by creating indices. In Infobright, Knowledge Nodes are used instead of indices (Knowledge Nodes are created automatically). To further enhance performance, you can try to influence the data loading procedure by keeping similar data (e.g., for similar time frames) close together. The order in which data are loaded may influence both compression ratio and query speed.

Avoid using OR in queries and, if possible, use IN instead. In some cases ORs can be translated to UNION ALL or IN. For example, “...WHERE a=1 OR a=2...” could be replaced by “...WHERE a IN (1,2)...”.

Try to replace correlated subqueries with joins and independent subqueries.

Executing queries in steps may also help with missing function support. For instance, execute the bulk of the query in Infobright and export the data to a MyISAM table. Then execute the function query on the result set.
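The OR-to-IN rewrite suggested above does not change the result of a query. The following Python sketch checks this on a small in-memory table. It uses SQLite from the Python standard library purely as a stand-in for a SQL engine (it says nothing about Infobright performance), and the table t, its columns, and the function name compare_or_with_in are hypothetical.

```python
import sqlite3


def compare_or_with_in():
    """Run the OR form and the IN form of the same filter and return both results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [(1, "x"), (2, "y"), (3, "z"), (2, "w")])
    with_or = conn.execute(
        "SELECT a, b FROM t WHERE a = 1 OR a = 2 ORDER BY a, b").fetchall()
    with_in = conn.execute(
        "SELECT a, b FROM t WHERE a IN (1, 2) ORDER BY a, b").fetchall()
    conn.close()
    return with_or, with_in


or_rows, in_rows = compare_or_with_in()
```

Both statements select the same three rows; only the form of the condition differs.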

To optimize your query performance, avoid the following, which will result in the query being handled by the MySQL query engine:

Using functions or type cast operators.
Creating queries containing mixed Infobright and MySQL tables.
Performing comparisons or arithmetical operations on two different data types (such as numbers and text).
Creating JOINs with the JOIN condition defined as NOT BETWEEN.

Rough Queries

About Rough Query

Rough query provides fast ad-hoc querying without indexes or other database optimizations. Query results are processed based on Knowledge Node information only and do not involve disk access.

Rough query will never tell you something does not exist when it actually does exist. This guarantee is an important property of the rough approximation the engine uses and is why rough query is appropriate for operational type queries (namely, iterative analytics).

This also means that all queries will return an answer lying between the upper and lower bounds. Taken to its extreme, a poor-quality estimate of a rough aggregation will return -inf, +inf. Any existential query (e.g., simple projection) has a similar guarantee: the range of values returned is guaranteed to be within the approximation of the rough evaluation. This gives you confidence in the result and is why operational telescoping queries (getting wider and narrower) provide context for queries.

Approximate query loosens this restriction and instead gives an estimate of the answer and a margin of error and confidence interval for the estimation. This difference in semantics is crucial to the way that rough query can be used vs. approximate query.

This is the most crucial property of rough query from a use case perspective as well and should not be understated.

Select “roughly” allows you to instantly see the Min/Max range of the aggregate, using only the in-memory Knowledge Grid metadata structures. For example:

Select roughly num_of_unique_visits from fact_log

returns the range in which the values in the column lie; that is, it returns two rows, the upper and the lower bound (for example, 10/20).

Filters (WHERE clause) are supported for rough query. Aggregates such as Min, Max, Sum, and Count(*) are also supported. For example:

Select roughly Sum(num_of_unique_visits) from fact_log

returns the range in which the sum of that column lies (for example, 100/200).

Ranges for VARCHARs are not supported.

Correlated subqueries are optimized using rough evaluation, so they perform faster.
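The idea of answering with a range instead of an exact value can be sketched in Python. This is a simplified model and not Infobright's algorithm: it assumes each data pack is summarized only by a row count, a minimum and a maximum (the dictionary keys rows, min and max and both function names are hypothetical), with no NULL values and no filter.

```python
def rough_range(packs):
    """Bounds on the column values: smallest pack minimum, largest pack maximum."""
    return min(p["min"] for p in packs), max(p["max"] for p in packs)


def rough_sum(packs):
    """Bounds on SUM(column): each pack adds between rows * min and rows * max."""
    lower = sum(p["rows"] * p["min"] for p in packs)
    upper = sum(p["rows"] * p["max"] for p in packs)
    return lower, upper


# Hypothetical statistics for two packs of one column.
packs = [
    {"rows": 4, "min": 10, "max": 15},
    {"rows": 6, "min": 12, "max": 20},
]
value_bounds = rough_range(packs)
sum_bounds = rough_sum(packs)
```

Because every row value lies between its pack's minimum and maximum, the true answer can never fall outside the returned bounds, which mirrors the guarantee described above. The bounds become looser as the per-pack ranges widen.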

Query Support

In general, '*' means all values, and min and max on numeric types are used to evaluate the upper and lower bounds.

When there are no available rough values (statistics) for the query, the return values will be the known upper and lower bounds. In the case of most numerical functions, this will be either -inf, +inf or 0, +inf.

In general, you should not use AllowMySQLQueryPath with rough query, because unsupported syntax may switch to the MySQL exact evaluation, resulting in a heavy query. Future releases will disable this path in the case of select roughly.

QUERY SUPPORT

SELECT | Support in Roughly
GENERAL CASE + SIMPLE SELECTS
SELECT ALL
SELECT * FROM t1; | min, max (numeric types); '*' (string types); min, max (DATETIME types)
SELECT DISTINCT(a) FROM t1; | min, max (numeric types); '*' (string types); min, max (DATETIME types)
SELECT DISTINCTROW a FROM t1; | min, max (numeric types); '*' (string types); min, max (DATETIME types)
UNION / UNION ALL
SELECT * FROM t1 UNION t2;
SELECT * FROM t1 UNION ALL t2;
INTERSECT / EXCEPT
S1 EXCEPT ALL S2

S1 INTERSECT ALL S2
S1 EXCEPT S2
S1 INTERSECT S2
JOINS
CROSS JOIN (Explicit and Implicit)
SELECT * FROM t1 CROSS JOIN t2
SELECT * FROM t1, t2
NATURAL / INNER JOIN (Semantically equivalent in MySQL)
SELECT * FROM t1 INNER JOIN t2 ON t1.a = t2.a | min, max (possibly some rough projections to evaluate t1.a = t2.a, but in the general case this will be the same as the cross join)
SELECT * FROM t1 INNER JOIN t2 USING (a) | min, max (possibly some rough projections to evaluate t1.a = t2.a, but in the general case this will be the same as the cross join)
SELECT * FROM t1, t2 WHERE t1.a = t2.a | min, max (possibly some rough projections to evaluate t1.a = t2.a, but in the general case this will be the same as the cross join)
NJ SELECT * FROM t1 NATURAL JOIN t2 | min, max (possibly some rough projections to evaluate t1.a = t2.a, but in the general case this will be the same as the cross join)
STRAIGHT JOIN

SELECT * FROM t1 STRAIGHT_JOIN t2 ON t1.a = t2.a | (Note: transformed by MySQL into an inner join) min, max (possibly some rough projections to evaluate t1.a = t2.a, but in the general case this will be the same as the cross join)
OUTER JOIN
LOJ1 SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.a = t2.a | Unknown
LOJ2 SELECT * FROM t1 LEFT OUTER JOIN t2 USING (a) | Unknown
ROJ1 SELECT * FROM t1 RIGHT OUTER JOIN t2 ON t1.a = t2.a | Unknown
ROJ2 SELECT * FROM t1 RIGHT OUTER JOIN t2 USING (a) | Unknown
FOJ SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.a = t2.a | Not Supported
FOJ-ALT1 SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a UNION SELECT * FROM t1 RIGHT JOIN t2 ON t1.a = t2.a WHERE t1.a IS NULL | min and max for every component of UNION; there would be two pairs of min and max evaluated as for each of the involved SELECTs (in this case JOINs). Exact behaviour to be checked. Unknown

FOJ-ALT2 SELECT t1.*, t2.* FROM t1 LEFT JOIN t2 ON t1.a = t2.a UNION SELECT t1.*, t2.* FROM t2 LEFT JOIN t1 ON t1.a = t2.a WHERE t1.a IS NULL | min and max for every component of UNION; there would be two pairs of min and max evaluated as for each of the involved SELECTs (in this case JOINs). Exact behaviour to be checked. Unknown
SELF-JOIN
ORDER BY SELECT a, b, c, d FROM t1 sj1, t1 sj2 WHERE sj1.a = sj1.b AND sj1.c > sj1.d OR sj1.a = 'Value' ORDER BY sj1.c, sj1.d | ORDER BY is omitted and then the query will behave as for the SELF JOIN. In general ORDER BY is ignored unless a LIMIT is supplied. If a LIMIT is supplied with ORDER BY, a more precise upper and lower bound can be provided.
CORRELATED SUBQUERIES (Alternate Joins - See SUBQUERY TABLE FOR MORE ON SUPPORTED SYNTAX)
LOJ-ALT SELECT a, (SELECT b FROM t2 WHERE t1.a = t2.a) FROM t1; | Unknown
SELECT / JOIN RELATIONS
SUPPORTED RELATIONS (in Join + Sub Query) ON or WHERE | Which of the relations are significant
t1.a = t2.a | Rough JOIN (Numeric)
t1.a > t2.a | Rough JOIN (Numeric)
t1.a >= t2.a | Rough JOIN (Numeric)

t1.a < t2.a | Rough JOIN (Numeric)
t1.a <= t2.a | Rough JOIN (Numeric)
t1.a <> t2.a | No Rough approximation possible
t1.a LIKE t2.a | Unknown
t1.a NOT LIKE t2.a | Unknown
t1.a IS BETWEEN t2.a and t2.b | Rough JOIN (Numeric)
AGGREGATE FUNCTIONS
SELECT AND SUBSELECT W/ DISTINCT | We are passing statistics from the evaluation of the WHERE clause
AVG() | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.
COUNT() | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.
MAX() | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.
MIN() | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.
SUM() | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.
SELECT AND SUBSELECT W/O DISTINCT

STD | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.
STDDEV | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.
STDDEV_POP | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.
STDDEV_SAMP | All aggregate functions return estimates of the original values; the rough constraint property (no false negatives) holds throughout.

Subquery Support

SUBQUERY SUPPORT

SUBSELECT | Support in Roughly
SIMPLE SUBSELECTS
SIMPLE SUBQUERIES IN SELECT CLAUSE
SELECT (SELECT 1 FROM t2) FROM t1; | Unknown
SELECT (SELECT a1 FROM t1) FROM t1; | Unknown
SELECT (SELECT 1 FROM t2), (SELECT min(a1) FROM t1), ...; | Unknown
SELECT (SELECT avg(sum_a5) FROM (SELECT sum(a5) AS sum_a5 FROM t1 GROUP BY a5) AS t1); | Unknown

SELECT (SELECT min(a1) FROM t1) FROM t1; | Unknown
SUBQUERIES WITHIN COMPLEX EXPRESSIONS
SELECT (SELECT min(a1) FROM t1) + a1, 1 + (SELECT min(a1) FROM t1) FROM t1; | Not yet supported (Exact or Rough)
LIMIT INSIDE SUBQUERY
SELECT (SELECT a1 FROM t1 ORDER BY a1 LIMIT 1) FROM t2 | Unknown
SELECT (SELECT a1 FROM t1 LIMIT 1) FROM t2 | Unknown
SELECT a5 + (SELECT a1 + a2 FROM t1 LIMIT 1) FROM t2 | Not yet supported (Exact or Rough)
SELECT count(*) + (SELECT a1+a2 FROM t1 LIMIT 1) FROM t2 | Not yet supported (Exact or Rough)
SUBQUERY INSIDE OF FUNCTIONS (either Row Function or an Aggregation)
SELECT avg((SELECT avg(a1) FROM t1)) FROM t1; | Unknown
SUBQUERY AS LOGICAL EXPRESSION
SELECT a1 IN (SELECT a1 FROM t1) FROM t1; | Unknown
SELECT a1 > ALL (SELECT a1 FROM t1) FROM t1; | Unknown
SELECT ISNULL((SELECT min(a1) FROM t1)); | Unknown
FROM
SIMPLE SUBQUERY

SELECT * FROM (SELECT * FROM t1 UNION SELECT * FROM t2) AS A; | Will return the min and max of the UNION (Note: generated from the four constituent rows of the UNION - min max for each query involved in the UNION)
SELECT * FROM (SELECT * FROM t1) AS A; | min, max as for SELECT * FROM t1
JOIN SUBQUERIES, OTHER TABLES, VIEWS, TEMP TABLES
SELECT * FROM (SELECT min(a1) FROM t1 WHERE … ) as A, (SELECT b1+1 FROM t2 WHERE …) as B, t3, T4_temp_table, t5_view … ; | Should behave as for the general cases outlined elsewhere (e.g. the Unsupported expression will return)
WHERE
INDEPENDENT SUBQUERIES
SELECT * FROM t1 WHERE a1 > ALL(SELECT t2.b1 FROM t2 WHERE t2.b1 = 10); | Unknown
DEPENDENT SUBQUERIES (But not in HAVING...)
SELECT * FROM t1 WHERE a1 > ALL(SELECT b1 FROM t2 WHERE t1.a1 = t2.b1); | Subquery will generate all packrows suspect, so it will evaluate as for SELECT * FROM t1 with no WHERE clause
DEPENDENT SUBQUERIES USING UNION
SELECT * FROM t1 WHERE t1.a1 > ANY( SELECT * FROM t2 WHERE t2.a1=t1.a1 UNION … ); | Subquery will generate all packrows suspect, so it will evaluate as for SELECT * FROM t1 with no WHERE clause
USE IN SUBQUERY IN CLAUSE SELECT COLUMN FROM OUTER TABLE

SELECT * FROM t1 WHERE a3 > (SELECT max(a1) FROM t2); | Subquery will generate all packrows suspect, so it will evaluate as for SELECT * FROM t1 with no WHERE clause
SELECT * FROM t1 WHERE a3 > (SELECT a1 FROM t1) | Subquery will generate all packrows suspect, so it will evaluate as for SELECT * FROM t1 with no WHERE clause
COMPLEX EXPRESSIONS CONTAINING SUBQUERIES
SELECT * FROM t1 WHERE a1 > a2 + (SELECT min(b1) FROM t2); | Not yet supported (Exact or Rough)
ROW SUBQUERIES
SELECT * FROM t1 WHERE (a1,a2) = (1,1) | Not yet supported (Exact or Rough)
SELECT * FROM t1 WHERE a1 = 1 AND a2 = 1; | As for simple projection
SELECT * FROM t1 WHERE (a1, a2) > ALL(SELECT b1,b2 FROM t2); | Not yet supported (Exact or Rough)
SELECT a1,a2,a3 FROM t1 WHERE (a1,a2,a3) IN (SELECT b1,b2,b3 FROM t2); | Not yet supported (Exact or Rough)
GROUP BY
As for SIMPLE SUBSELECTS: complete implementation of one implies the other
GENERAL CASE
SELECT * FROM t1 GROUP BY (SELECT b1 FROM t2); | Unknown
SELECT * FROM t1 GROUP BY (SELECT b1 FROM t2 WHERE a1=b1); | Unknown

SELECT * FROM t1 GROUP BY (SELECT min(a1) FROM t2); | Unknown
HAVING
DEPENDENT
SELECT * FROM t1 GROUP BY a1 HAVING a1 > ANY(SELECT b1 FROM t2 WHERE a1 = b1 ); | Unknown
INDEPENDENT
SELECT * FROM t1 GROUP BY a1 HAVING a1 > ANY(SELECT b1 FROM t2 WHERE b5 = b1 ); | Unknown
ORDER BY
GENERAL CASE
SELECT * FROM t1 ORDER BY (SELECT b1 FROM t2); | Unknown
SELECT * FROM t1 ORDER BY (SELECT b1 FROM t2 WHERE a1=b1); | Unknown
SELECT * FROM t1 ORDER BY (SELECT min(a1) FROM t2); | Unknown

Complex Expression

COMPLEX EXPRESSION

SUBSELECT | Support in Roughly
SELECT / SUBSELECT
ROW FUNCTIONS (ARITHMETIC)
SELECT SUM(a) FROM t1 WHERE a = 200 | As for previous description
SELECT SUM(a + b) FROM t1 WHERE a = 200 | Expression, so the result will always be +/-inf
AGGREGATE FUNCTIONS (ARITHMETIC)
SELECT MIN(ABS(a)) FROM t1 WHERE ... | Expression, so the result will always be +/-inf
SELECT MIN(a + b) FROM t1 WHERE ... | Expression, so the result will always be +/-inf
SELECT MIN(ABS(a + b)) FROM t1 WHERE ... | Expression, so the result will always be +/-inf
AGGREGATION OVER ROW FUNCTION (ARITHMETIC)
SELECT AVG(ABS(a + b)) FROM t1 WHERE ...
COLUMN / VARIABLE / CONSTANT
SELECT ROUGHLY 100 * 200 + 200 FROM t1 | As for exact evaluation
SELECT a, b FROM t1 WHERE a = @VAR | As for simple projection
TYPE COERCION (Implicit Conversion)
IMP1 VARCHAR(20) a, INT b; SELECT a, b FROM t1 WHERE a = b | Unknown

EXP1 VARCHAR(20) a, INT b; SELECT a, b FROM t1 WHERE CAST(a AS INT) = b | Expression, so the result will always be +/-inf
CASE HANDLING
SELECT SUM(CASE WHEN a = c THEN b ELSE 0 END) FROM t1; | Expression, so the result will always be +/-inf
OTHER
Some combination of the above
FROM
SIMPLE
SELECT a FROM t1 WHERE...
SUBQUERY
See "Subquery Support" on page 56.
JOIN SUPPORT
See "Subquery Support" on page 56.
TEMPORARY TABLE SUPPORT
TBD
WHERE
LOGICAL CONDITIONS
NOT SELECT a, b FROM t1 WHERE NOT ABS(a)
IN / NOT IN1 SELECT a, b FROM t1 WHERE a IN (b, c)
IN / NOT IN2 SELECT a, b FROM t1 WHERE a IN (b, a + b)
IN / NOT IN3 SELECT a, b FROM t1 WHERE a NOT IN (AVG(b), AVG(c))

IN / NOT IN4 SELECT a, b FROM t1 WHERE a NOT IN (a, b, NULL)
COLUMN / VARIABLE / CONSTANT
WHERECONST SELECT a FROM t1 WHERE 1
WHEREOP SELECT a FROM t1 WHERE a + b = c
WHEREVAR SELECT a FROM t1 WHERE a = @somevar
SUBQUERY
See "Subquery Support" on page 56.
GROUP BY
COMPLEX EXPRESSION
GENERAL SELECT a FROM t1 WHERE a = b GROUP BY a + b
NOTFULLGROUP SELECT a FROM t1 GROUP BY b -- implicit inclusion
FULLGROUP SELECT a FROM t1 GROUP BY b -- error if @@sql_mode = ONLY_FULL_GROUP_BY
ALIAS SELECT a, (SELECT b FROM t2) AS cA FROM t1 GROUP BY cA
CONDITIONS
COND1 SELECT a, b FROM t1 GROUP BY a>12
HAVING
Note: HAVING functionality is the same as GROUP BY, except for the general case below
COMPLEX EXPRESSION

HAVING SELECT a FROM t1 WHERE a = b GROUP BY a + b HAVING a > c
ORDER BY
COMPLEX EXPRESSION
GENERAL SELECT a + b FROM t1 WHERE ... ORDER BY 1
UNION SELECT SUM(a + b) FROM t1 UNION SELECT SUM(a + c) FROM t1 ORDER BY 1

9. Infobright Backup and Recovery

Backup Procedure

Use the following procedure to back up Infobright.

To back up the Infobright databases, copy the entire directory containing the Infobright databases (usually the data subdirectory in your Infobright installation directory).

You can take advantage of incremental backups, since only some of the database files are updated when new data is imported. Be sure to do a full backup occasionally.

Important: Some files in the KNFolder are updated when queries (using JOIN) are run, so be sure to back up the KNFolder on a regular basis.
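The full-backup step described above amounts to copying one directory tree. The following Python sketch shows that copy on a throwaway directory. It rests on several assumptions that are not taken from this guide: the function name backup_data_dir, the label full-001, and the example layout (with the KNFolder placed inside the data directory) are hypothetical, and the sketch does not address whether loads or queries are running during the copy.

```python
import os
import shutil
import tempfile


def backup_data_dir(data_dir, backup_root, label):
    """Copy the whole data directory into backup_root/label and return the path."""
    target = os.path.join(backup_root, label)
    shutil.copytree(data_dir, target)  # raises an error if target already exists
    return target


# Demonstration on a scratch directory instead of a real installation.
scratch = tempfile.mkdtemp()
data_dir = os.path.join(scratch, "data")
os.makedirs(os.path.join(data_dir, "KNFolder"))
with open(os.path.join(data_dir, "KNFolder", "node.bin"), "w") as handle:
    handle.write("knowledge node placeholder")

backup_path = backup_data_dir(data_dir, os.path.join(scratch, "backups"), "full-001")
copied = sorted(os.listdir(os.path.join(backup_path, "KNFolder")))
```

Using a new label for each run keeps earlier full backups intact. If the KNFolder is stored outside the data directory, it would need a second call with its own path.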

Restore Procedure

To restore the Infobright databases from a backup copy, do the following:

1. Replace the entire data directory (usually the data subdirectory in your Infobright installation directory) with the backup copy.

2. Replace the KNFolder with the backup copy (if the KNFolder is not inside the data directory).

Important: Do not manually modify database files or move them from one database to another. Doing so may lead to data corruption and unpredictable results.
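The backup and restore procedures above amount to copying the whole data directory and later replacing it with the copy. The following is a minimal Python sketch of that idea, not an Infobright tool. The function names and the use of a label per backup are illustrative assumptions. If the KNFolder is not inside the data directory, the same functions would have to be called for it separately.

```python
import shutil
from pathlib import Path


def backup_data_dir(data_dir: Path, backup_root: Path, label: str) -> Path:
    """Copy the entire data directory to backup_root/label and return the copy's path."""
    target = backup_root / label
    # copytree refuses to overwrite an existing directory,
    # so every backup needs a fresh label (hypothetical naming scheme).
    shutil.copytree(data_dir, target)
    return target


def restore_data_dir(backup_copy: Path, data_dir: Path) -> None:
    """Replace the entire data directory with the backup copy."""
    if data_dir.exists():
        shutil.rmtree(data_dir)  # remove the current directory completely
    shutil.copytree(backup_copy, data_dir)
```

The sketch always performs a full copy. An incremental scheme would copy only the files that changed since the previous backup.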

A. Infobright Optimizer - Supported Functions and Operators

Comparison Functions and Operators

Equal = YES

Null safe equal <=> No (MySQL engine)

Not equal <> , != YES

Less than or equal <= YES

Less than < YES

Greater than > YES

Greater than or equal >= YES

IS No (MySQL engine)

IS NOT No (MySQL engine)

IS NULL YES

IS NOT NULL YES

BETWEEN … AND … YES (except in join conditions)

NOT BETWEEN … AND ... YES

COALESCE YES

GREATEST No (MySQL engine)

IN YES

NOT IN YES

ISNULL YES

INTERVAL No (MySQL engine)

LEAST No (MySQL engine)

Logical Operators

NOT, ! YES (except in join conditions)

AND, && YES

OR, || YES

XOR No (MySQL engine)

Control Flow Functions

CASE YES

IF YES

IFNULL YES

NULLIF YES

String Functions

ASCII YES

BIN YES

BIT_LENGTH YES

CHAR No (MySQL engine)

CHAR_LENGTH YES

CHARACTER_LENGTH YES

CONCAT YES

CONCAT_WS YES

CONV YES

ELT YES

EXPORT_SET YES

FIELD YES

FIND_IN_SET YES

FORMAT YES

HEX YES

INSTR YES

LCASE YES

LEFT YES

LENGTH YES

LOAD_FILE No (MySQL engine)

LOCATE YES

LOWER YES

LPAD YES

LTRIM YES

MAKE_SET YES

MID YES

OCT YES

OCTET_LENGTH YES

ORD YES

POSITION YES

QUOTE YES

REPEAT YES

REPLACE YES

REVERSE YES

RIGHT YES

RPAD YES

RTRIM YES

SOUNDEX YES

SOUNDS LIKE No (MySQL engine)

SPACE YES

SUBSTR YES

SUBSTRING YES

SUBSTRING_INDEX YES

TRIM YES

UCASE YES

UNHEX No (MySQL engine)

UPPER YES

String Comparison Functions

LIKE YES

NOT LIKE YES

RLIKE YES

REGEXP YES

NOT REGEXP YES

STRCMP YES

Numeric Functions

Addition ( + ) YES

Subtraction ( - ) YES

Multiplication ( * ) YES

Division ( / ) YES

Modulo ( % ) YES

ABS YES

ACOS YES

ASIN YES

ATAN2, ATAN YES

ATAN YES

CEIL YES

CEILING YES

CONV YES

COS YES

COT YES

DEGREES YES

EXP YES

FLOOR YES

LN YES

LOG10 YES

LOG2 YES

LOG YES

MOD YES

OCT YES

PI YES

POW YES

POWER YES

RADIANS YES

RAND YES

ROUND YES

SIGN YES

SIN YES

SQRT YES

TAN YES

TRUNCATE YES

Date and Time Functions

ADDDATE YES

ADDTIME YES

CURDATE YES

CURRENT_DATE YES

CURRENT_TIME YES

CURRENT_TIMESTAMP YES

CURTIME YES

DATE YES

DATEDIFF YES

DATE_ADD YES

DATE_FORMAT YES

DATE_SUB YES

DAY YES

DAYNAME YES

DAYOFMONTH YES

DAYOFWEEK YES

DAYOFYEAR YES

EXTRACT YES

FROM_DAYS No (MySQL engine)

FROM_UNIXTIME YES

GET_FORMAT No (MySQL engine)

HOUR YES

LAST_DAY No (MySQL engine)

LOCALTIME YES

LOCALTIMESTAMP YES

MAKEDATE No (MySQL engine)

MAKETIME No (MySQL engine)

MICROSECOND No (MySQL engine)

MINUTE YES

MONTH YES

MONTHNAME YES

NOW YES

PERIOD_ADD YES

PERIOD_DIFF YES

QUARTER YES

SECOND YES

SEC_TO_TIME No (MySQL engine)

STR_TO_DATE No (MySQL engine)

SUBDATE YES

SUBTIME YES

SYSDATE YES

TIME YES

TIMEDIFF YES

TIMESTAMP No (MySQL engine)

TIMESTAMPADD No (MySQL engine)

TIMESTAMPDIFF No (MySQL engine)

TIME_FORMAT YES

TIME_TO_SEC No (MySQL engine)

TO_DAYS YES

UNIX_TIMESTAMP YES

UTC_DATE YES

UTC_TIME YES

UTC_TIMESTAMP No (MySQL engine)

WEEK YES

WEEKDAY No (MySQL engine)

WEEKOFYEAR No (MySQL engine)

YEAR No (MySQL engine)

YEARWEEK YES

Text Search and Other Functions

BINARY No (MySQL engine)

CAST YES

CONVERT YES

MATCH No (MySQL engine)

Bit Functions No (MySQL engine)

Encryption, Compression Functions No (MySQL engine)

Information Functions No (MySQL engine)

Group By Aggregate Functions

AVG YES

BIT_OR No (MySQL engine)

BIT_AND No (MySQL engine)

BIT_XOR No (MySQL engine)

COUNT YES

GROUP_CONCAT No (MySQL engine)

MIN YES

MAX YES

STD, STDDEV YES

STDDEV_POP YES

STDDEV_SAMP YES

SUM YES

VAR_POP YES

VAR_SAMP YES

VARIANCE YES

Group By Modifiers

ROLLUP No (error signalled)

B. Infobright Data Tools

Infobright Configuration Manager

As of ICE 3.5.2 GA, several tuning parameters are now configured automatically and have been deprecated from the brighthouse.ini file. These parameters as well as several new parameters for managing multi-core query processing have been moved to a different system-only file.

The deprecated parameters include:

• ServerCompressedHeapSize
• LoaderSaveThreadNumber
• BufferingLevel
• CachingLevel
• ClusterSize
• HugefileDir

The additional query processing parameters include:

• Threads
• QueueLength
• Depth

Setting ControlMessages=4 in the brighthouse.ini file prints the configuration settings to the log file.

Running the Infobright Configuration Manager

To run the Infobright Configuration Manager, use the following command:

confman.sh --defaults-file=/etc/my-ib.cnf --autoconfigure=yes

Charset Migration Tool

Installations prior to ICE 3.3.1 GA may have charsets and collations defined for tables/columns that do not match the actual Infobright storage of ascii charset and ascii_bin collation. When you upgrade to ICE 4.0.6 GA these settings will be respected.

Infobright includes a standalone application to adapt existing tables created prior to ICE 3.3.1 GA to UTF-8 capable structures. The Charset Migration Tool (CHMT) is in the Infobright bin directory.

Running the Charset Migration Tool

CHMT requires a text file containing a mapping between collations used for conversion.

chmt --help // help message

Executing CHMT:

chmt --datadir=/absolute/path/to/data/directory [other parameters]

INFOBRIGHT CHARSET MIGRATION TOOL PARAMETERS

| Parameter | Type | Description | Details |
|---|---|---|---|
| datadir | Mandatory | Absolute path to data directory | |
| conv-map | Optional | Absolute path to file with collations conversions | If not specified CHMT would try to use file: chmt-binary-folder/../support-files/collations.txt ; if not found there it would search for: chmt-binary-folder/collations.txt |
| database | Optional | Name of database for migrating | If specified, tables from no other databases would be migrated |
| table | Optional | Name of table for migrating | If specified, database must be also specified; no other tables but specified will be migrated |
| log-file | Optional | Absolute path to output log file | If not specified, logs will be printed to the console |
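The parameter constraints above (datadir is mandatory and absolute, table requires database) can be checked before CHMT is invoked. The sketch below builds an argument list in Python. It assumes that the optional parameters are passed in the same --name=value form that this guide shows for --datadir. That form is an assumption for every parameter other than datadir, and the function name build_chmt_args is hypothetical.

```python
from typing import List, Optional


def build_chmt_args(datadir: str,
                    conv_map: Optional[str] = None,
                    database: Optional[str] = None,
                    table: Optional[str] = None,
                    log_file: Optional[str] = None) -> List[str]:
    """Validate the documented constraints and return an argument list for chmt."""
    if not datadir.startswith("/"):
        raise ValueError("datadir must be an absolute path")
    if table is not None and database is None:
        raise ValueError("table requires database to be specified as well")

    args = ["chmt", f"--datadir={datadir}"]
    # Assumption: optional parameters use the same --name=value form as --datadir.
    optional = {
        "conv-map": conv_map,
        "database": database,
        "table": table,
        "log-file": log_file,
    }
    for name, value in optional.items():
        if value is not None:
            args.append(f"--{name}={value}")
    return args
```

The resulting list could be handed to subprocess.run. Running chmt --help first is the reliable way to confirm the exact option spelling.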

Log Structure

The logs detail information about every considered table found in a specified datadir. Each conversion finishes with [NOT NEEDED], [PASS] or [FAILED] status.

Collations-conversion-file Structure

Each conversion directive is stored in one line of the file:

collation_from_name;collation_from_id;collation_to_name;collation_to_id

For example: big5_chinese_ci;1;binary;63

where both fields containing names are only informative (all conversions will be done using only ids).
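To make the directive format concrete, the following Python sketch parses such lines and keeps only the id-to-id mapping, since the names are informative only. It is an illustration of the file format described above, not part of CHMT. The names Conversion, parse_directive and id_mapping are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Conversion(NamedTuple):
    from_name: str
    from_id: int
    to_name: str
    to_id: int


def parse_directive(line: str) -> Conversion:
    """Parse one 'from_name;from_id;to_name;to_id' directive."""
    fields = line.strip().split(";")
    if len(fields) != 4:
        raise ValueError(f"expected 4 ';'-separated fields, got {len(fields)}: {line!r}")
    from_name, from_id, to_name, to_id = fields
    return Conversion(from_name, int(from_id), to_name, int(to_id))


def id_mapping(lines: Iterable[str]) -> Dict[int, int]:
    """Reduce directives to {from_id: to_id}; blank lines are skipped."""
    mapping = {}
    for line in lines:
        if line.strip():
            conversion = parse_directive(line)
            mapping[conversion.from_id] = conversion.to_id
    return mapping
```

For the example directive above, parse_directive returns the ids 1 and 63 together with the two informative names.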

An example collations-conversion file (with the conversion directives described above) can be generated by running the following SQL:

use information_schema;

select a.collation_name a_n, a.id a_id, b.collation_name b_n, b.id b_id
from information_schema.collations a, information_schema.collations b, character_sets c
where substr(a.collation_name, 1, locate('_',a.collation_name)-1)=c.character_set_name
  and substr(a.collation_name, 1, locate('_',a.collation_name)) = substr(b.collation_name, 1, locate('_',b.collation_name))
  and b.collation_name like '%bin'
  and c.maxlen=1
UNION
select a.collation_name a_n, a.id a_id, 'binary' b_n, 63 b_id
from information_schema.collations a, character_sets c
where ((substr(a.collation_name, 1, locate('_',a.collation_name)-1)=c.character_set_name) or (locate('_',a.collation_name)=0))
  and (c.maxlen>1 or c.character_set_name ='binary')
order by a_id
into outfile '/some/path/my_collations.txt' fields terminated by ';';

Infobright DomainExpert

About the Infobright DomainExpert

The Infobright DomainExpert improves data compression and the performance of import, queries and export. The DomainExpert allows you to define the composition of data, particularly columns. The database then uses this information to optimize the storage of the data and to reduce query processing time.

DomainExpert metadata is maintained in the system tables of the database sys_infobright and should be managed only with the use of stored procedures.

Decomposition Rules

Decomposition rules are the main DomainExpert objects. Each rule describes the composition structure of values of a selected column expressed in a simple language. You can create, modify and delete rules using the following stored procedures from the system database sys_infobright:

• create_rule(id, rule, comment)
• update_rule(id, rule)
• change_rule_comment(id, comment)
• delete_rule(id)

where:

• id is a unique identifier or name of a rule
• rule defines the structure of values
• comment is a free description of the rule.

For example, to create a simple rule for email addresses, you would run the following command:

CALL sys_infobright.create_rule('email', '%s@%s', 'Rule for email addresses');

The rules are stored in the system table decomposition_dictionary. The list of all rules defined in the system can be obtained with the following query:

SELECT * FROM sys_infobright.decomposition_dictionary;

Decomposition Rules Language

The language to define the structure of values accepts three types of primitives:

• nonnegative integer number, denoted as %d
• arbitrary character sequence, denoted as %s
• literal, a sequence of characters that are to be matched exactly

Examples:

• %d.%d.%d.%d decomposes an IP address (4-byte version) into four 1-byte numerical components
• %s@%s decomposes an email address into the user name and the domain name
• %s://%s?%s decomposes a simple URL with a query string into the scheme, the address and the query string

As the percent sign (%) is a special character, to match it literally you can use a double percent sign (%%). For example, to match exactly the text 10% humidity, the rule can be defined as 10%% humidity. However, the percent sign only has a special meaning if it is followed by the letter s or d. Otherwise the percent sign is treated literally, so in the example above the unmodified text 10% humidity is also a valid rule that matches the text exactly.

There are two constraints on the rule syntax. The following ambiguous subsequences of symbols are not allowed in rules:

• %s%s
• %d%d

The matching algorithm for rules is LAZY—the algorithm moves to the next primitive in the rule as soon as possible. For example, for the text aa.bb.cc and the rule %s.%s, the first %s is matched to aa and the second %s is matched to bb.cc. However, if the most lazy approach fails, the algorithm searches back until the correct match is found or all the cases are traced. For example, for the text aa.bb.11 and the rule %s.%d, the string %s is matched to aa.bb and the number %d is matched to 11.
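The lazy matching with backtracking described above can be approximated with non-greedy regular expressions. The following Python sketch translates a rule into such a regular expression and reproduces the two examples from the text. It is an approximation for illustration only, not Infobright's implementation. It assumes that %s matches at least one character and that %d matches one or more decimal digits. The function names are hypothetical.

```python
import re
from typing import List, Optional


def tokenize(rule: str) -> List[str]:
    """Split a rule into '%s', '%d' and single literal characters."""
    tokens = []
    i = 0
    while i < len(rule):
        pair = rule[i:i + 2]
        if pair in ("%s", "%d"):
            tokens.append(pair)      # primitive
            i += 2
        elif pair == "%%":
            tokens.append("%")       # escaped percent sign
            i += 2
        else:
            tokens.append(rule[i])   # literal (a lone % is literal too)
            i += 1
    return tokens


def compile_rule(rule: str):
    """Translate a rule into a regex with lazy (non-greedy) groups."""
    tokens = tokenize(rule)
    for prev, cur in zip(tokens, tokens[1:]):
        if prev == cur and cur in ("%s", "%d"):
            raise ValueError(f"ambiguous sequence {cur}{cur} is not allowed")
    parts = []
    for tok in tokens:
        if tok == "%s":
            parts.append(r"(.+?)")   # assumption: non-empty character sequence
        elif tok == "%d":
            parts.append(r"(\d+?)")  # assumption: one or more decimal digits
        else:
            parts.append(re.escape(tok))
    return re.compile("".join(parts), re.DOTALL)


def decompose(rule: str, value: str) -> Optional[List[str]]:
    """Return the matched parts, or None if the value does not match the rule."""
    match = compile_rule(rule).fullmatch(value)
    return list(match.groups()) if match else None
```

With this sketch, decompose('%s.%s', 'aa.bb.cc') yields the parts aa and bb.cc, and decompose('%s.%d', 'aa.bb.11') yields aa.bb and 11, which agrees with the behavior described above. A value that does not match, such as a string without @ under the rule %s@%s, yields None.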

The current language is a simple, limited language that will be replaced with a much more powerful language in the future. The current language does not support the following regular expression constructs (these will be added in future releases):

• Grouping, for example, (%s.%s).%s@(%d%s).%s
• Type classes, for example, [%s|%d]@%s
• Repetition, for example, %s{5,10}
• Optional inclusion, for example, (%s.)?%d. This currently matches (string.)?1000 whereas it might more reasonably match string.1000 and 1000.
• Sub-expressions
• Word boundaries
• Back-references, i.e. each group has a reference: $1 for the match of the first group, $2 for the match of the second group and so on

Building recursive rules using the following operations is also not yet available:

• Concatenation: r1r2 where r1 and r2 are any pair of already defined rules; matches any value that is concatenation of any pair of values, with v1 matching r1 and v2 matching r2
• Union (alternative): r1|r2 matches each value that matches one of r1 and r2
• Closure: r* matches each value which is any repetition of any value matching r

Predefined IPv4 Rule

Besides user-defined rules, Infobright provides a built-in rule that is not expressible in the above language. This is the IPv4 rule that is defined and added to the DomainExpert metadata at installation. IPv4 converts the text representation of an IP address into a single 32-bit number as used in network hardware and low-level network handling software.

If you have data with IP addresses, this allows you to compare the performance of the predefined IPv4 with IP decompositions expressible in the language—for example, with the rule %d.%d.%d.%d.
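The conversion of a dotted IPv4 address into a single 32-bit number can be illustrated with Python's standard ipaddress module. This only shows the general text-to-number mapping. How Infobright stores the result internally is not described here, and the function names are hypothetical.

```python
import ipaddress


def ipv4_to_int(text: str) -> int:
    """Convert dotted-decimal IPv4 text into its 32-bit integer value."""
    return int(ipaddress.IPv4Address(text))


def int_to_ipv4(number: int) -> str:
    """Convert a 32-bit integer back into dotted-decimal IPv4 text."""
    return str(ipaddress.IPv4Address(number))
```

For example, 192.168.0.1 corresponds to 192 * 2^24 + 168 * 2^16 + 0 * 2^8 + 1 = 3232235521.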

Other Predefined Rules

The following predefined rules are provided with the DomainExpert.

| RULE ID | RULE CONTENT | COMMENTS |
|---|---|---|
| IPv4_C | %d.%d.%d.%d | Similar to IPv4 but uses generic numeric compression. |
| EMAIL | %s@%s | Username/domain split of an email address. |
| URL | %s://%s?%s | Protocol, domain and query parameters based rule. |

These rules can be improved if the user data matches more specific criteria (for example, the domain always contains a suffix such as .com). Using specific criteria may improve both the compression ratio and the response time. If you want to use more specific rules, create new rules (instead of replacing the predefined ones).

Assigning Rules to Columns

The stored procedure set_decomposition_rule(database, table, column, id) from the database sys_infobright supports the assignment of rules to particular columns from the Infobright tables. For example, to apply the predefined IPv4 rule to column ip in the table connection from the database network, run the following command:

CALL sys_infobright.set_decomposition_rule('network', 'connection', 'ip', 'IPv4');

The decomposition rules can be applied only to columns of string types that are not lookup columns:

• CHAR
• VARCHAR
• BINARY
• VARBINARY
• TEXT
• TINYTEXT

Assigning a rule to a column of another type or to a lookup column is ignored.

You cannot set multiple rules on the same column. If the set_decomposition_rule procedure is called for a column with an already assigned rule, the previous rule is replaced with the new rule.

To see the current decomposition rules for a particular table, use the show_decomposition procedure. For example:

CALL sys_infobright.show_decomposition('network', 'connection');

If a rule is assigned to a column, you cannot change or delete the rule from the decomposition_dictionary system table.

Applying Rules to Data

After decomposition rules are assigned to columns, the rules are automatically applied to any new data coming to the tables containing these rules when using the standard "LOAD DATA" DML command.

If a rule is assigned to a column, instead of storing whole values, each value inserted into the column is decomposed into the parts matching the subsequent occurrences of %s and %d in the rule and the parts are compressed and stored in separate subcollections. Each subcollection corresponds to one occurrence of %s or %d in the rule.

A value inserted into a column with a decomposition defined does not have to match the rule. Such non-matching values are inserted into a separate subcollection. This subcollection of outliers is compressed and stored independently of other subcollections.
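The split into subcollections and outliers can be pictured with a small Python sketch. It uses a hand-written regular expression as a stand-in for the rule %s@%s and only models the grouping of parts. Compression and the actual storage layout are not modeled, and the names used are hypothetical.

```python
import re
from typing import List, Tuple

# Stand-in for the rule '%s@%s' (assumption: lazy, non-empty parts).
EMAIL_RULE = re.compile(r"(.+?)@(.+?)", re.DOTALL)


def split_into_subcollections(values: List[str]) -> Tuple[List[List[str]], List[str]]:
    """Return one list per %s/%d occurrence plus a list of non-matching outliers."""
    subcollections: List[List[str]] = [[] for _ in range(EMAIL_RULE.groups)]
    outliers: List[str] = []
    for value in values:
        match = EMAIL_RULE.fullmatch(value)
        if match is None:
            outliers.append(value)  # stored whole, apart from the decomposed parts
            continue
        for part, subcollection in zip(match.groups(), subcollections):
            subcollection.append(part)
    return subcollections, outliers
```

For a column holding two email addresses and one value without @, the sketch produces one subcollection of user names, one subcollection of domain names and a single outlier.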

You can obtain the accuracy of decomposition rules by setting the ControlMessages parameter in the brighthouse.ini file to 2 (or higher):

ControlMessages = 2

If the parameter is set, then on each COMMIT Infobright reports, for each column, the number of outliers among the committed values (from INSERTs and LOADs). For example:

2011-05-25 16:59:03 Decomposition of ./network/connection.ip left 15 outliers.
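Outlier counts can be collected from the log with a short script. The Python sketch below parses lines of the form shown above. The pattern is inferred from that single example line (timestamp, ./database/table.column, outlier count), so the exact log format is an assumption, and the function name is hypothetical.

```python
import re
from typing import Dict, Optional

# Pattern inferred from the single example line in this guide (assumption).
OUTLIER_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Decomposition of \./(?P<database>[^/]+)/(?P<table>[^.]+)\.(?P<column>\S+) "
    r"left (?P<outliers>\d+) outliers\.$"
)


def parse_outlier_line(line: str) -> Optional[Dict[str, object]]:
    """Return the fields of an outlier report line, or None for any other line."""
    match = OUTLIER_LINE.match(line.strip())
    if match is None:
        return None
    fields: Dict[str, object] = dict(match.groupdict())
    fields["outliers"] = int(match.group("outliers"))
    return fields
```

Comparing the reported outlier count with the number of committed rows gives a rough measure of how well a rule fits the data.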

Note: Applying a decomposition rule DOES NOT always improve the compression ratio and processing time. A decomposition rule may result in a worse compression ratio, slower loads and slower queries. To check whether decomposition improves performance, compare load time, compression ratio and query time when loading the same data into a table with a decomposition rule defined and into a table without decomposition.

Modifying a Rule for an Existing Column

A rule for a column can be changed or deleted during the life of the table using the following stored procedures:

• set_decomposition_rule(database, table, column, id)
• delete_decomposition_rule(database, table, column)

The change applies only to new data. The old data remains decomposed with the previously used rules. If the rule for a column is deleted, new values are stored without decomposition.

C. Linux Tuning Settings

System Settings for Red Hat Enterprise Linux and CentOS

Disable SELinux

SELinux is intended to protect Linux servers on the public internet, such as Web servers. It provides an extra layer of security that isn’t really required for a back-end database server.

In /etc/sysconfig/selinux add:

SELINUX=disabled

Swappiness

Set low swappiness to avoid unnecessary paging. This only helps for machines with low levels of memory (say 4GB with 3GB allocated for Infobright).

In /etc/rc.local add:

echo "7" > /proc/sys/vm/swappiness

Disable Unused Processes

Run system-config-services (or edit the /etc/init.d directory) to disable unused services, and leave ssh running.

File System Settings

Ensure CacheFolder is on a Fast Local Disk

See “Infobright Tuning Parameters” in "Configuring Infobright" on page 12.

Larger Readahead

In /etc/rc.local add:

blockdev --setra 2048 /dev/sd<x>

Replace sd<x> with a proper device symbol (e.g. sdc); it should be the drive(s) on which datadir and/or CacheFolder resides.

Use XFS File System for Data Directories

For XFS (may need to install kmod-xfs and xfsprogs):

mkfs.xfs -b size=4096 /dev/sdc1

In /etc/fstab add:

/dev/sdc1 /bha xfs noatime 1 2

Note: This is for data folders only. The Linux boot partition can be ext3.

noatime

Use noatime options for mounting database and cache volumes (see the next section, "Deadline Elevator", for details). Otherwise the system will update the access time for files and directories (which degrades performance).

Deadline Elevator

The default scheduler, CFQ, is 1% faster than the deadline elevator for a single user. However, in a multi-user test with 4 users, the deadline elevator had 20% better performance.

In /etc/rc.local add:

echo "deadline" > /sys/block/sd<x>/queue/scheduler

Replace sd<x> with a proper device symbol (e.g. sdc); it should be the drive(s) on which datadir and/or CacheFolder resides.
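To verify that the setting took effect, the same sysfs file can be read back. On Linux the file lists the available schedulers and marks the active one with square brackets, for example noop anticipatory deadline [cfq]. The Python sketch below extracts the active entry. The function names are hypothetical and the device name passed in (such as sdc) is whatever device holds datadir or CacheFolder.

```python
from pathlib import Path
from typing import Optional


def active_scheduler(scheduler_text: str) -> Optional[str]:
    """Return the bracketed (active) scheduler name, or None if none is marked."""
    for name in scheduler_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def read_active_scheduler(device: str) -> Optional[str]:
    """Read /sys/block/<device>/queue/scheduler and return the active scheduler."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return active_scheduler(path.read_text())
```

After the echo command above has run for a device, read_active_scheduler for that device is expected to return deadline.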

Increase ulimit to Support Large Data Volume or Users

This will not change performance, but may avoid errors. Ulimit determines the maximum number of files a user can have open.

Increase ulimit to unlimited or 32,768 since the default file limit is 1024. This is insufficient for large databases (lots of columns) or servers with multiple Infobright databases.

To view current settings, run command:

# ulimit -a

To set it to a new value for this running session, which takes effect immediately, run command:

# ulimit -n 8800
# ulimit -n -1 // for unlimited; recommended if server isn't shared, reportedly doesn't work on IB03

Alternatively, if you want the changes to survive reboot, do the following:

1. Exit all shell sessions for the user you want to change limits on.

2. As root, edit the file /etc/security/limits.conf and add these two lines toward the end:

user1 soft nofile 16000
user1 hard nofile 20000

The two lines above change the max number of file handles - nofile - to new settings.

3. Save the file.

4. Login as user1 again. The new changes will be in effect.
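The open-file limit that applies to a process can also be inspected programmatically. The following Python sketch uses the standard resource module (available on Unix systems) to read the soft and hard nofile limits and compare the soft limit with the 32,768 value suggested above. The function names and the idea of running such a check are illustrative assumptions. The script has to run as the same user as the Infobright server to see that user's limits.

```python
import resource
from typing import Tuple

RECOMMENDED_NOFILE = 32768  # value suggested in this section


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def is_sufficient(soft_limit: int, recommended: int = RECOMMENDED_NOFILE) -> bool:
    """True if the soft limit is unlimited or at least the recommended value."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= recommended


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard} sufficient={is_sufficient(soft)}")
```

With the default limit of 1024 mentioned above, is_sufficient returns False.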

Note on how to detect ulimit problem

If you are noticing crashes during multi-user use cases, please check the console log for the following error:

what(): FileSystem Error : Bad file descriptor
mysqld got signal 6;

To fix this, increase ulimit (see the previous section, “Increase ulimit to Support Large Data Volume or Users”).