33
SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February 1 st , 2007

SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Embed Size (px)

Citation preview

Page 1: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

SQL 101:An Introduction to writing queries

with SQL at the University of Minnesota

Peter M. RadcliffeSenior Analyst

College of Liberal ArtsFebruary 1st, 2007

Page 2: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Class Objectives

1. Understand the structure and purpose of data warehouse tables

2. Write simple queries using…• …multiple tables• …summary columns and grouping• …calculated columns• …inner and outer joins• …case statements• …subqueries

3. Understand the data warehouse query tool

Page 3: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Class Assumptions

• You haven’t written queries in SQL before (or at least not much)

• You have access to the data warehouse, but haven’t used it (or at least not much)

• You are familiar with Excel

• You perform data analysis as part of your job

• If you have more background that this, let me know and we’ll speed up the early portion of the course

Page 4: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Relational Databases

• Database tables versus spreadsheets Each table is like an individual spreadsheet Structure is more important

• Entries must be in rows and columns (no isolated cells)

• Computations on cells are more difficult, they are more commonly conducted on entire columns

Visualizing a spreadsheet may help with properly constructing a query to give the desired results

Page 5: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Relational Databases

• Column and row structure Rows are observations or records Columns are the quantities or characteristics observed or

recorded

• Linking between tables Common column(s) Concordant product (all-to-all)

• Structure of joins One-to-One One-to-Many Many-to-Many One or Many to None

Page 6: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Simple Query Structure

• Sentence and clause analogy Each query can be stated as a sentence of the form, “Return

observations on the following characteristics, from the following tables, meeting the following criteria, and organized in the following way.”

Each clause in the sentence corresponds to a particular portion of a query.

• Clauses SELECT FROM WHERE ORDER BY Will address “GROUP BY” and “HAVING” clauses when we

discuss summary columns

Page 7: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Using DW Web Query Tool

• Data Warehouse web site: http://dw.umn.edu• Web Query Tool login:

https://dw1.umn.edu/queries/queries0.asp• Data warehouse user id and password is separate

from x.500• Options for “Querying Method”

Build a Query (point-and-click query building tool) Free-Form Query (we’ll focus on this) Choose a Saved Query (after you’ve created some) Delete Saved Queries (ditto) Re-Group Saved Queries List Saved Queries

• Will return to these functions after creating first query

Page 8: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

First Example Query

• Question: Who are the Senior Analysts at the University of Minnesota?

• What do we want to know about them? Name, CUFS Area code, CUFS Area Name Columns: name, cufs_area, area_name30

• Source: PeopleSoft Employment Table, ps_dwhr_job

• Criteria: Senior Analysts, jobcode = 9370

Page 9: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

First Example Query Continued

• Log into DW web query tool, choosing “Free-Form Query”

• Type the following query into the box:

SELECT name, cufs_area, area_name30FROM sysadm.ps_dwhr_job WHERE jobcode = ‘9370’‘ORDER BY name

• Leave “Choose Row Limit” and “Print to File” dialogs on their default values (“10 Rows Paging” and “No File Type Selected”).

Page 10: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

First Example Query Continued

• Before clicking on “Submit Query” button, notice a couple details

Paragraph style (new lines for each command, indents, etc.)• This is all completely unnecessary. It is simply for clarity.

Syntax• Commas between column names are necessary.• Single quotes are necessary for string values (words), but not for

numeric data (in this case, even the seemingly numeric jobcode is a string)

• Capitalization matters for the string values in the where clause, but not for the rest of the query.

Page 11: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

First Example Query: Results

• Viewing Query Results Click on the “Submit Query” button to view the results There should be eight individuals listed Looking at the data, you can imagine this as a spreadsheet

with eight rows of data, plus column headers, and three columns.

These results can be downloaded by choosing a different output format, or they can be copied and pasted into Excel.

Page 12: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Saving Results

• Saving queries with DW query tool Notice the “Save Query” button between the query and the

results You can enter a name for a query group, or select and

existing one using the left two dialog boxes. You can enter a name for the query itself or select an existing

name if you want to update/replace a query you have already created.

After entering or selecting names, you can hit the “Save Query” button to add it to your personal query library

Page 13: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Managing Saved Queries

• You will see your saved queries, organized in groups• Scroll down if necessary to find your newly saved

query• Click on radio button next to saved query• Click on “Show Query” button at bottom of list• You should now see your query again in the “Edit Box”

Page 14: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

SQL Structure

• SELECT [necessary – Used to choose columns from table(s) that you want in your output]

• FROM [necessary - State the table(s) that you are pulling the data from]

• JOIN [optional - Clause used if you want to pull data from more than one table]

• WHERE [optional - Use if you want to limit the results that are pulled based on specified conditions/qualifications]

• GROUP BY [optional – Use if you want to organize your results by groups represented in column(s)]

• HAVING [optional – For setting the specifications for the groups that you want to display]

• ORDER BY [optional – Use if you want the results ordered in ascending (ASC) or descending order (DESC)]

Page 15: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

SELECT Clause: Overview

• Necessary – Used to choose columns from table(s) that you want in your output

• How SELECT works: 1. Returns only the columns that you list in your SELECT

statement.

2. Rows returned are specified in WHERE clause.

• Thus, a SELECT statement indicates the columns to be included in the query output

Page 16: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

FROM Clause: Overview

• Necessary • Identifies the table(s) from which to retrieve data• In relational databases, data is generally split into

multiple tables. The data warehouse was designed to combine data in ways that make inquiry-based sense.

• However, it may still be necessary to recombine tables to answer data questions

• TIP: Visualize the end result, or table, that you want to see. What columns are in the proposed table? In what tables do those columns reside?

Page 17: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

WHERE Clause: Overview

• Optional, if omitted all rows are returned

• Uses Logical operators to define criteria.

• Thus, a WHERE clause indicates the rows to be included in the query output

• Columns in the WHERE clause do not have to be included in the SELECT list

Page 18: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

ORDER BY Clause: Overview

• Optional • Used to present results ordered in ascending (ASC, the

default) or descending order (DESC) • If you use the ORDER BY statement, the columns that

you are ordering by have to be included in the SELECT statement.

• If this clause is omitted, the data will appear in the order it is in the source table

Page 19: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Second Example Query

• We want to reduce the number of cases returned by removing duplicates

• Edit your first query, adding only the word “DISTINCT”

SELECT DISTINCT name, cufs_area, area_name30FROM sysadm.ps_dwhr_jobWHERE jobcode = ‘9370’ORDER BY name

• Leave “Choose Row Limit” and “Print to File” dialogs on their default values (“10 Rows Paging” and “No File Type Selected”), and click “Submit Query”.

Page 20: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

SELECT DISTINCT

• SELECT DISTINCT Optional - Used to indicate that you want unique rows (or

records) of data for the column(s) specified, thus reducing duplicates

Eliminates rows that are identical on all selected columns Note that values for columns in the record (row) other than those

selected may be different

Page 21: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Second Query Example Results

• Switching from SELECT to SELECT DISTINCT eliminated only one row, where all the values for the selected columns were identical

• Note that the same individuals still appear on multiple rows, but now those rows indicate different offices where those people worked

• If any of these names are familiar, you may well notice that some of them are no longer at the University of Minnesota, and have not been for some time

• Clearly, we need to use more criteria to reduce the list

Page 22: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

WHERE Clause: Overview

• Quick reminder from before:

• Optional, if omitted all rows are returned

• Uses Logical operators to define criteria

• Thus, a WHERE clause indicates the rows to be included in the query output

• Columns in the WHERE clause do not have to be included in the SELECT list

Page 23: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

WHERE Clause: Operators Comparisons (=, <, >, >=, <=, !=, <>)

• WHERE efc > unmet_need Combinations of Logical Conditions (AND, OR, NOT)

• WHERE unmet_need < 2000 OR efc > 15000• WHERE effdt > 7/01/2005 AND effdt < 6/30/2006

Ranges (BETWEEN and NOT BETWEEN)• WHERE effdt BETWEEN 7/01/2005 AND 6/30/2006

Lists (IN, NOT IN)• WHERE institution IN(‘umntc’)• WHERE ethnicity NOT IN (‘Unknown’,’’,NS)

Unknown Values (IS NULL and IS NOT NULL)• WHERE ethnicity IS NULL

Character Matches (LIKE and NOT LIKE). • Outlines patterns that must be included in quotes by using one or

more wildcard symbols (% means any string of zero or more characters; _ means any single character)

• WHERE zip LIKE ‘5%’• WHERE zip NOT LIKE ‘55417’

Page 24: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

WHERE: How it Works

• Follows a sequential order of operations for arithmetic and logical operators:

1. Parentheses

2. Multiplication and Division,

3. Subtraction and Addition,

4. NOT,

5. AND, and

6. OR

• Reads and performs operations left to right

Page 25: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

WHERE Clause: Conjunctions

• Conjunctions in WHERE clause AND means the row must meet all of the listed criteria OR means the row must meet the criteria listed before the OR

clause, or the criteria after the where clause Order and grouping are critical, use parentheses to adjust order

of operations

• Examples – note the differences in these criteria: WHERE jobcode = ‘9370’ AND status_flg = ‘C’ AND

empl_status = ‘A’ WHERE jobcode = ‘9370’ AND status_flg = ‘C’ OR

empl_status = ‘A’ WHERE jobcode = ‘9370’ AND (status_flg = ‘C’ AND

empl_status = ‘A’)

Page 26: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

WHERE Clause: IN & NOT IN

• To select rows with many different values that are not in a simple range, you can use several OR statements

• Example:

WHERE area = ‘466’ or area = ‘481’ or area = ‘502’

• Alternatively, the same statement can be constructed using an IN statement

Example: WHERE area IN (‘466’,’481’,’502’)

• NOT IN can be used to exclude a set of values

Example: WHERE area NOT IN (‘999’,’000’,’ZZZ’)

Page 27: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

WHERE: NULL VALUES

• Not all rows have data in them for all columns• Where data is actually missing, rather than coded as

zero, “missing”, “unknown”, etc., are called null values• Nulls are not actual values, and therefore cannot be

referred to in any calculation or comparison• Instead, separate operators are used in the WHERE

clause to refer to null values: IS NULL (example: WHERE cum_gpa IS NULL) IS NOT NULL (example: WHERE cum_gpa IS NOT NULL)

• Sometimes lead to compound statements WHERE act_comp = 0 OR act_comp IS NULL

Page 28: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Null values in SELECT

• In addition to the WHERE clause, null values can be addressed in the SELECT clause

• To prevent null values from appearing as the word “NULL” or generally to provide a meaningful label, use the NVL command to replace the null values Example:

SELECT DISTINCT NVL(ethnic_group,’Missing’)

FROM sysadm.ps_dwsa_stix_1069

• If the column contains string data, remember to use the single quotes in specifying the replacement value

Page 29: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Third Example Query

• Use multiple criteria to reduce rows returned

• Edit your previous query to add additional criteria:

SELECT DISTINCT name, cufs_area, area_name30FROM sysadm.ps_dwhr_jobWHERE jobcode = ‘9370’ AND status_flg = ‘C’ AND empl_status = ‘A’ORDER BY name

• Submit the query and view results, there should now be four records

Page 30: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

Aside: DW Codebooks• How do we know what columns to use in the previous

queries?

• Click on button marked “Information” at top of Data Warehouse query page (right-clicking and selecting “open in a new window” will make it possible to switch back and forth between codebooks and query window)

• Top option on list is “Data Dictionaries”, click here

• Click on “Online HR-Job and Demographics Data Dictionary”

• Click on “Individual Table Descriptions”

• Click on “PS_DWHR_JOB”

• Click on “Click Here to See Layout and Sources”

Page 31: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

DW Codebooks Continued

• Table displayed has information on the columns # - position in the table, not important for our use Column Name – label of column to be used in queries Column Descr – short, sometimes helpful description of column Type/Size – Data type (string, numeric, etc.) and number of

characters in the column Table Source – PeopleSoft “setup” table from which the column

information is taken when building the data warehouse tables Column Source – Column in the source table that is imported, or

“calculated” if the data is created when the table is built

• Clicking on a column name will bring up a longer description and often allow you to run a definition query

Page 32: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

SELECT: Aliases and Wildcards

• To see all of the columns in a table, use the wildcard * in place of the column names

Example: SELECT * FROM ps_area_tbl This is the simplest form of query that can be executed

• To rename a column, give it an “alias” by following the original column name with your desired column name

Example: SELECT emplid AS student_id Note that the word “AS” is optional in most systems,

but is generally helpful for readibility

Page 33: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota Peter M. Radcliffe Senior Analyst College of Liberal Arts February

SELECT: Substring Function

• May need only part of a string column Example: the first four digits of University of Minnesota academic

plans indicates the program, the next three the degree sought, and the last two the college that owns the plan.

SELECT acad_plan(substring,1,4) as major_code

SELECT acad_plan(substring,5,3) as degree_sought

SELECT acad_plan(substring,8,2) as college_code