IBM Software Big SQL on Hadoop Connecting to the IBM Big SQL Server and running SQL queries.

Exercise 1

  • Copyright IBM Corporation, 2013

    US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

  • Contents

    Connecting to the IBM Big SQL Server and running SQL queries

    1.1 Getting Started
    1.2 Managing the Big SQL status using the command line
    1.3 Connecting to Big SQL using JSqsh
    1.4 Connecting to Big SQL using Eclipse
    1.5 Using the BigInsights Console to run Big SQL queries
    Summary

    Connecting to the IBM Big SQL Server and running SQL queries

    IBM Big SQL is a component of the IBM InfoSphere BigInsights product. Before you can start working with your Hadoop data using Big SQL, you need to be able to connect to Big SQL. There are three different methods for connecting to the IBM Big SQL Server, and you will use all three in this exercise.

    The first of the three methods is JSqsh, pronounced jay-skwish. JSqsh is an open source command-line interface (CLI) for JDBC applications such as Big SQL, so you can use it with other JDBC applications as well. The second method is Eclipse. Eclipse is generally preferred as the tool for working with Big SQL because the results are formatted in a way that is easy to read. Queries can also be organized in scripts within projects, which makes them easier to manage. The third method is the BigInsights console, which includes a web-based Big SQL console where you can run your queries and view the results.

    After completing this hands-on lab, you should be able to:

    o Manage the Big SQL Server
    o Connect to Big SQL using JSqsh to run Big SQL queries
    o Connect to Big SQL using Eclipse to run Big SQL queries
    o Use the BigInsights Console to run Big SQL queries

    Allow 30-45 minutes to complete this section of the lab.

    Throughout this lab you will be using the following account login information:

    When to use                                                            Username   Password
    Log in from the command line to accept the licenses                    root       password
    Log in from the Linux SUSE Desktop to access the BigInsights Desktop   biadmin    biadmin

    1.1 Getting Started

    __1. Start the VMware image by clicking the Play virtual machine button in the VMware Player if it is not already on.

    __2. Choose the first option to load up the image.

    __3. You'll need to log in to the image initially. Use the VM Image setup screen credentials listed at the front of the document.

    __4. Go through the VM setup screens. When you get to the screen that asks to input your passwords, use the same passwords as listed at the beginning of this document.

    __5. Log in to the VMware virtual machine using the following credentials.

    Username: biadmin

    Password: biadmin

    __6. After you log in, your screen should look similar to the one below.

    There are two ways to start up BigInsights: through the terminal or by double-clicking an icon. Both methods are shown in the following steps.

    __7. Now open the terminal by double clicking the BigInsights Shell icon.

    __8. Double-click the Terminal icon.

    __9. Once the terminal has been opened, change to the $BIGINSIGHTS_HOME/bin directory ($BIGINSIGHTS_HOME is /opt/ibm/biginsights by default) by issuing one of the following commands:

    cd $BIGINSIGHTS_HOME/bin

    or

    cd /opt/ibm/biginsights/bin

    __10. Go ahead and start up the BigInsights environment. Note that the components will take a few minutes to start.

    ./start-all.sh

    __11. If you would like to stop all components execute the command below. However, for this lab, leave all components started.

    ./stop-all.sh

    Next, let us look at how you would start all the components by double-clicking an icon.

    __12. Double-clicking the Start BigInsights icon executes a script that performs the steps above. Once all components are started, the terminal exits and you are set.

    __13. You can stop the components in a similar manner, by double-clicking the Stop BigInsights icon.

    Now that all components are started, you may move on to the next section.

    Note: Occasionally, you may need to suspend your lab image and resume your work another time. Doing so may disrupt the BigInsights instance so that some components do not function properly. If you resume a lab image and things do not work properly, go ahead and restart the BigInsights instance.
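
    If you need to restart the instance after resuming a suspended image, the stop and start commands from this section can be wrapped in a small shell function. The following is a minimal sketch, not something shipped with the lab image: the function name restart_biginsights is hypothetical, and the fallback path assumes the default install location given earlier in this section.

```shell
#!/bin/sh
# Sketch: restart all BigInsights components with the scripts used in this section.
# Assumption: fall back to the default install location when BIGINSIGHTS_HOME is unset.
BIGINSIGHTS_BIN="${BIGINSIGHTS_HOME:-/opt/ibm/biginsights}/bin"

# restart_biginsights is a hypothetical helper name, not part of BigInsights.
restart_biginsights() {
    # Refuse to continue if either script is missing or not executable.
    for script in stop-all.sh start-all.sh; do
        if [ ! -x "$BIGINSIGHTS_BIN/$script" ]; then
            echo "cannot find executable $BIGINSIGHTS_BIN/$script" >&2
            return 1
        fi
    done
    # Run from the bin directory, as the lab does; start only if stop succeeded.
    ( cd "$BIGINSIGHTS_BIN" && ./stop-all.sh && ./start-all.sh )
}
```

    After loading the function into your shell, call restart_biginsights and allow a few minutes for the components to come back up.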

    1.2 Managing the Big SQL status using the command line

    All of the BigInsights components have been started, but you can also manage the Big SQL component on its own. In this section, you will manage just the Big SQL server using the command line. Alternatively, you can start and stop the Big SQL server with the BigInsights scripts:

    $BIGINSIGHTS_HOME/bin/start.sh bigsql

    $BIGINSIGHTS_HOME/bin/stop.sh bigsql

    __1. Open up a new terminal window. Right-click on the desktop and select Open in Terminal

    __2. Switch to the bigsql user with the password bigsql. Type in su bigsql in the terminal and provide the password when prompted.

    __3. Change to the Big SQL bin directory. Type in:

    cd $BIGSQL_HOME/bin

    __4. Check the status of the Big SQL server. Type in:

    ./bigsql status

    Note that there is a Big SQL v1 instance. If you recall, this release of BigInsights comes with Big SQL and Big SQL v1 (the older, legacy version). You would only really use v1 if you need support for HBase or if your applications require specific v1 features that are not yet supported in the current release of Big SQL.

    __5. Stop the Big SQL server. Type in:

    ./bigsql stop

    __6. Once the server has stopped, check the status of the Big SQL server. Type in:

    ./bigsql status

    __7. Restart the Big SQL server. Type in:

    ./bigsql start

    __8. Close any opened terminals.
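
    The status, stop, and start steps above can be sketched as a single shell function. This is an illustration only, under stated assumptions: BIGSQL_HOME is set, you are running as the bigsql user, and the helper name cycle_bigsql is hypothetical. The exit code of the status subcommand is deliberately ignored because this lab does not describe it.

```shell
#!/bin/sh
# Sketch: status / stop / status / start / status for the Big SQL server only.
# cycle_bigsql is a hypothetical helper name, not part of Big SQL.
cycle_bigsql() {
    if [ -z "${BIGSQL_HOME:-}" ] || [ ! -x "$BIGSQL_HOME/bin/bigsql" ]; then
        echo "BIGSQL_HOME is unset or bin/bigsql is not executable" >&2
        return 1
    fi
    (
        cd "$BIGSQL_HOME/bin" || exit 1
        ./bigsql status || true    # show the state before the change
        ./bigsql stop || exit 1    # give up if the server cannot be stopped
        ./bigsql status || true    # show the state after stopping
        ./bigsql start || exit 1   # give up if the server cannot be started
        ./bigsql status || true    # show the state after restarting
    )
}
```

    Read the three status reports in order to confirm that the server went down and came back up.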

    1.3 Connecting to Big SQL using JSqsh

    In this section, you will work with Big SQL using JSqsh. You will set up a Big SQL connection and run a few simple queries to show the interaction with JSqsh.

    __1. Open a new terminal. Right-click the desktop and select Open in Terminal to open a new command line.

    __2. Create a Big SQL connection for JSqsh. Type the following:

    $JSQSH_HOME/bin/jsqsh --setup

    __3. Start the connection wizard. Type in the letter c.

    __4. There are two different connections. One for bigsql and one for bigsql1. In this lab, you will be working with the bigsql database on port 51000. Select the Big SQL driver. Type in the number 1.

    __5. You can see that there are a number of variables already defined. Make sure that the values of your variables are the same as shown in the screenshot. For the password variable, specify the password so that you will not need to provide it every time you use this connection. Select the password variable. Type in the number 5.

    __6. When prompted, enter in the password: biadmin and hit enter. You will see asterisks in place of the password:

    __7. Make sure you have entered in the correct password by performing the Test operation. Type in the letter t, to run the test.

    __8. If the connection test was not successful, make sure you update the password variable and rerun the test. If the test was successful, save the connection profile. Type in the letter s to save.

    __9. Quit the JSqsh connection wizard by typing in the letter q.

    __10. Get out of JSqsh by typing quit.

    __11. Restart JSqsh, but this time, specify the bigsql connection that we just set up.

    $JSQSH_HOME/bin/jsqsh bigsql

    Because you had saved the password in the connection profile, you will not be prompted to provide the password again.

    __12. To get help from JSqsh, type in:

    \help

    __13. To see the list of commands to use within JSqsh, type in:

    \help commands

    __14. To see the schemas, type in:

    \show schemas

    __15. To display essential information about all available tables one page at a time, type in:

    \show tables e | more

    The JSqsh more operator simply breaks up the output into pages. You can hit the space bar to continue viewing the output or hit the letter q to quit and go to the end of the output.

    __16. Create a simple Hadoop table using Big SQL. Copy and paste or type in the following:

    create hadoop table test1 (col1 int, col2 varchar(5));

    Because you did not specify a schema name for the table, it was created under your default schema, which is your username, biadmin. The statement above is equivalent to:

    create hadoop table biadmin.test1 (col1 int, col2 varchar(5));

    __17. Check that the table has been created. Type in:

    \show tables e | more

    You will notice that the command you just entered listed all of the tables, including system tables. It may be difficult to locate your particular table. Use the command in the next step to narrow down the search results.

    __18. To display just user tables (avoid views and system tables) type in:

    \tables user

    With this command, you may see tables from other users as well but you may not have the privilege to query them.

    __19. To see just your tables, namely biadmin, type in:

    \tables s BIADMIN

    Note that the login name provided here is in uppercase. That is because the system folds all names to uppercase. The search is case sensitive, so if you do not provide the login name in uppercase, you will not see what you expect. The screenshot shows that if you query with the lowercase name, you will not see the test1 table. When you query using uppercase BIADMIN, you see the test1 table.

    __20. Insert a row into the test1 table. Type in:

    insert into test1 values (1, 'one');

    Keep in mind that the INSERT statement should only be used for testing purposes. The INSERT operation is not parallelized on the cluster, so it is very inefficient for loading large amounts of data. It is recommended that you use one of the bulk loading operators that you will see in the next exercise. Those operators are parallelized and optimized, so they perform much better in a production environment.

    __21. Look at the test1 table. Type in:

    \describe BIADMIN.TEST1

    Notice again that you have to use uppercase for the schema and the table names because those values are folded to uppercase in the system catalog tables.
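
    To make the folding rule concrete, here is a small shell sketch that uppercases a schema and table name before building a catalog lookup against syscat.tables. The helper names fold_upper and catalog_filter are hypothetical, and the sketch assumes simple unquoted names made of ASCII letters and digits.

```shell
#!/bin/sh
# Sketch: fold names to uppercase, the form in which the catalog stores them.
# fold_upper and catalog_filter are hypothetical helper names.
fold_upper() {
    printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

# Print a catalog query for one table, using the folded schema and table names.
catalog_filter() {
    schema=$(fold_upper "$1")
    table=$(fold_upper "$2")
    printf "select tabschema, tabname from syscat.tables where tabschema = '%s' and tabname = '%s';\n" "$schema" "$table"
}

catalog_filter biadmin test1
```

    Running the sketch prints a query that filters on 'BIADMIN' and 'TEST1', which you could paste into JSqsh.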

    __22. To see the inserted row, type in:

    select * from test1;

    When used in queries, you do not need to uppercase any of the names.

    __23. Issue a query that restricts the number of rows returned to 5. For example, select the first 5 rows from the syscat.tables:

    select tabschema, tabname from syscat.tables fetch first 5 rows only;

    Restricting the number of rows returned by a query is useful during development when you are working with large volumes of data.
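
    If you find yourself adding the same row limit to many development queries, a small shell helper can append the clause for you. This is a sketch under assumptions: with_row_limit is a hypothetical helper name, and it expects a query that has no trailing semicolon.

```shell
#!/bin/sh
# Sketch: append "fetch first N rows only" to a query during development.
# with_row_limit is a hypothetical helper; pass the query without a trailing semicolon.
with_row_limit() {
    query=$1
    limit=$2
    # Accept digits only, then reject zero.
    case $limit in
        ''|*[!0-9]*) echo "limit must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$limit" -lt 1 ]; then
        echo "limit must be a positive integer" >&2
        return 1
    fi
    printf '%s fetch first %s rows only;\n' "$query" "$limit"
}

with_row_limit "select tabschema, tabname from syscat.tables" 5
```

    The call above prints the same statement used in the step above, ready to paste into JSqsh.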

    __24. JSqsh has a few useful commands that should be considered. You can review the history of the commands recently executed in the JSqsh shell. Type in:

    \history

    __25. To recall a query from the history, for example, to recall statement 4, type in !4. This will bring the query to the current command line. Then you just need to add a ; (semi-colon) to the final line and hit Enter to run the statement.

    __26. To recall the most recently executed statement, type in !! (two exclamation points, without spaces). Then add a ; (semi-colon) at the end to run the statement.

    __27. JSqsh also has the ability to pipe outputs to an external program. Pipe the output of the next statement to the more operator.

    select tabschema, tabname from syscat.tables

    go | more

    Note that because the first line did not have a semi-colon at the end, the statement did not run. That is because the default terminator for Big SQL is a semi-colon. The go command on the second line is what actually triggers Big SQL to run the statement. In fact, under the covers, the semi-colon at the end is a shortcut for the JSqsh go command.

    __28. Experiment with JSqsh's ability to redirect output to a local file rather than the console display. Enter the following lines on the command shell, adjusting the path information as needed for your environment.

    select tabschema, colname, colno, typename, length

    from syscat.columns

    where tabschema = USER and tabname= 'TEST1'

    go > $HOME/test1.out

    __29. View the output by opening up a new terminal window and typing in:

    gedit $HOME/test1.out

    Close the gedit screen when you are done viewing the results.

    __30. In a production environment, you are likely to have your SQL statements in script files. Maintaining SQL script files can be quite handy for repeatedly executing various queries. Create a new file with the following SQL queries. From the same command line you used to open the test1.out file, create a new test1.sql file. Type in:

    gedit $HOME/test1.sql

    __31. Copy and paste the following into the test1.sql file.

    select tabschema, tabname from syscat.tables fetch first 5 rows only;

    select tabschema, colname, colno, typename, length

    from syscat.columns

    fetch first 10 rows only;

    __32. Save and close that file.

    __33. Invoke the SQL script (test1.sql). Using the same command line, type in:

    $JSQSH_HOME/bin/jsqsh bigsql < $HOME/test1.sql

    __34. Inspect the output.

    __35. Clean up the database. In the JSqsh window, type in:

    drop table test1;

    __36. Delete the test1.sql and the test1.out from the $HOME directory.

    __37. Close any opened windows.
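
    The script-file workflow from the later steps of this section (create the file, run it through JSqsh, inspect the output, clean up) can be sketched as one shell script. This is an illustration under assumptions: it uses a temporary directory instead of $HOME, it expects JSQSH_HOME to be set and a saved connection named bigsql to exist, and it skips the run if JSqsh cannot be found.

```shell
#!/bin/sh
# Sketch: write the two lab queries to a file, run them with JSqsh, then clean up.
# Assumption: a temporary directory stands in for $HOME so nothing is left behind.
work_dir=$(mktemp -d) || exit 1
sql_file="$work_dir/test1.sql"
out_file="$work_dir/test1.out"

# The same statements the lab pastes into test1.sql.
cat > "$sql_file" <<'EOF'
select tabschema, tabname from syscat.tables fetch first 5 rows only;

select tabschema, colname, colno, typename, length
from syscat.columns
fetch first 10 rows only;
EOF

# Run the script only when JSqsh is available; otherwise just report it.
if [ -n "${JSQSH_HOME:-}" ] && [ -x "$JSQSH_HOME/bin/jsqsh" ]; then
    "$JSQSH_HOME/bin/jsqsh" bigsql < "$sql_file" > "$out_file"
    cat "$out_file"
else
    echo "jsqsh not found; script was written but not executed" >&2
fi

# Count the lines that end a statement, then remove the working files.
statement_count=$(grep -c ';' "$sql_file")
echo "statements in script: $statement_count"
rm -rf "$work_dir"
```

    Redirecting the JSqsh output to a file, as done here, keeps the results available for inspection after the session ends.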

    1.4 Connecting to Big SQL using Eclipse

    In this section, you will see how to use Eclipse (the lab uses Eclipse Juno) to work with Big SQL. People generally prefer to use Eclipse for writing queries because the results are formatted and the scripts are organized in projects.

    Before continuing with this section, make sure your BigInsights services are up and running by checking the cluster status tab in the BigInsights console. If BigInsights is not started, go ahead and start it up.

    __1. Launch Eclipse by double clicking the icon on the desktop.

    __2. Select the default workspace when prompted.

    __3. The QSE image has existing connections that you can probably use, but this exercise shows you how to create your own. Open the Database Development perspective. Window > Open Perspective > Other > Database Development

    __4. In the Data Source Explorer pane, right click on Database Connections > Add Repository

    __5. In the New Connection Profile menu, select Big SQL JDBC driver and enter a name for the new driver (e.g. My Big SQL Connection). Click Next.

    __6. Enter in the appropriate connection information for your environment.

    Schema: bigsql

    Host: bivm.ibm.com

    Port number: 51000

    User name: bigsql

    Password: bigsql

    __7. Test the connection by clicking the Test Connection button.

    __8. Ensure the test succeeds. Otherwise adjust the properties. Click the Save password checkbox.

    __9. Click the Optional tab under the Properties heading to expose another menu that allows you to add more properties to the connection.

    __10. If this has not been added, go ahead and do so:

    __i. In the Property field, enter retrieveMessagesFromServerOnGetMessage

    __ii. In the Value field, enter true

    __iii. Click Add

    __11. Click Test Connection again to verify that you can successfully connect to your target Big SQL server.

    __12. Click Finish to create the connection.

    __13. In the Data Source Explorer, expand the list of data sources and verify that your Big SQL connection appears.

    __14. Return to the BigInsights perspective.

    __15. Create a BigInsights project for your work. From the Eclipse menu bar, click File > New > Other. Expand the BigInsights folder, and select BigInsights Project, and then click Next

    __16. Type myBigSQL in the Project name field and click Finish.

    __17. Create a new SQL script file. From the Eclipse menu bar, click File > New > Other. Expand the BigInsights folder, and select SQL Script, and then click Next.

    __18. In the New SQL File window, in the Enter or select the parent folder field, select myBigSQL. Your new SQL file is stored in this project folder.

    __19. In the File name field, type aFirstFile. The sql extension is added automatically. Click Finish.

    __20. In the Select Connection Profile window, select the My Big SQL Connection (created earlier). The properties of the selected connection display in the Properties field. When you select the Big SQL connection, the Big SQL database-specific context assistant and syntax checks are activated in the editor that is used to edit your SQL file. Verify that the connection uses the JDBC driver and database name shown in the Properties pane here.

    __21. Click Finish

    About the driver selection: You may be wondering why you are using a connection that employs the com.ibm.db2.jcc.DB2Driver driver class. In 2014, IBM released a common SQL query engine as part of its DB2 and BigInsights offerings. Doing so provides greater SQL commonality across its relational DBMS and Hadoop-based offerings. It also brings a greater breadth of SQL function to Hadoop (BigInsights) users. This common query engine is accessible through the DB2 driver. The Big SQL driver remains operational and offers connectivity to an earlier, BigInsights-specific SQL query engine, also referred to as Big SQL v1. This lab focuses on using the common SQL query engine, also referred to as Big SQL.

    __22. Copy the following statement into the SQL script you just created:

    create hadoop table test1 (col1 int, col2 varchar(5));

    In some cases, the Eclipse SQL editor may flag certain SQL statements as errors. You can ignore these warnings and continue with the lab.

    __23. Save your file. Press CTRL+S or click File > Save.

    __24. Run the script. Right-click anywhere in the script to display a menu of options. Select Run SQL or press F5. This causes all of the statements in the script to be executed.

    For the remainder of this lab, to execute each SQL statement individually, highlight the statement you wish to run and then press F5. When developing a SQL script with multiple statements, it is generally a good idea to test each statement individually first to verify that each is working as expected.

    __25. Inspect the SQL Results pane that appears towards the bottom of your display. If desired, double click on the SQL Results tab to enlarge this pane. Then double click on the tab again to return the pane to its normal size. Verify that the statement executed successfully. Your Big SQL database now contains a new table named BIGSQL.TEST1, where BIGSQL is the name of the current user. Note that your schema and the table name were folded into uppercase.

    __26. From your Eclipse project, query the system for metadata about your test1 table. Type in:

    select tabschema, colname, colno, typename, length

    from syscat.columns where tabschema = USER and tabname= 'TEST1';

    In case you are wondering, syscat.columns is one of a number of views supplied over system catalog data automatically maintained for you by the Big SQL service.

    __27. Inspect the SQL Results to verify that the query executed successfully and click on the Results1 tab to view its output.

    __28. Save and close your file and the Eclipse environment. You are done with this section.
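
    The connection values and the optional property entered in the connection wizard earlier in this section end up in a JDBC URL. The sketch below shows one plausible shape of that URL. It assumes the usual DB2 JDBC form jdbc:db2://host:port/database:property=value; and treats bigsql as the database name. The helper name bigsql_jdbc_url is hypothetical, and localhost is only a placeholder host.

```shell
#!/bin/sh
# Sketch: assemble a DB2-style JDBC URL from the connection values used in the wizard.
# bigsql_jdbc_url is a hypothetical helper name; the URL layout is an assumption.
bigsql_jdbc_url() {
    host=$1
    port=$2
    database=$3
    # The optional property from the wizard is appended after the database name.
    printf 'jdbc:db2://%s:%s/%s:retrieveMessagesFromServerOnGetMessage=true;\n' \
        "$host" "$port" "$database"
}

bigsql_jdbc_url localhost 51000 bigsql
```

    Compare the printed URL with the one shown in the Properties pane of your connection profile; if they differ, trust the profile.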

    1.5 Using the BigInsights Console to run Big SQL queries

    __1. Launch the BigInsights Console. Double click the web console icon on the desktop:

    __2. Log in using the credentials bigsql / bigsql.

    __3. On the Welcome tab, click Run Big SQL queries under the Quick Links section. A new browser tab opens up. You will run Big SQL queries on this tab.

    There is a set of radio buttons to select which Big SQL connection to use. For the lab, you will be using Big SQL. However, if you need Big SQL v1, select the appropriate radio button. There is also a dropdown menu with the history of queries that have been executed. You can use this to run repeated queries.

    __4. Insert some values into the test1 table that you created earlier. Type in:

    insert into test1 values (1, 'one');

    insert into test1 values (2, 'two');

    You can enter both statements into the input box. They will both be executed by Big SQL.

    __5. Click Run. You may need to scroll down to get to the Run button.

    __6. Query the table to see the results. Type in:

    select * from test1;

    __7. The results tab will appear with the result of the run.

    If you input more than one select statement, each terminated with a semi-colon, a separate results tab is displayed for each statement.

    __8. Clean up by executing this:

    drop table test1;

    __9. Close any opened windows and applications.

    Summary

    Having completed this exercise, you should now be able to start using Big SQL with one of the three methods: JSqsh, Eclipse, and the BigInsights console.

  • Copyright IBM Corporation 2013.

    The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. This information is based on current IBM product plans and strategy, which are subject to change by IBM without notice. Product release dates and/or capabilities referenced in these materials may change at any time at IBM's sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way.

    IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
