SQL
Structured Query Language (SQL) is a standardized language for access and manipulation of various data structures.
SQL is implemented in SAS via PROC SQL.
General Syntax
PROC SQL ;
ALTER TABLE ;
CONNECT ;
CREATE INDEX ;
CREATE TABLE ;
CREATE VIEW ;
DELETE ;
DESCRIBE ;
DISCONNECT ;
DROP ;
EXECUTE ;
INSERT ;
RESET ;
SELECT ;
UPDATE ;
VALIDATE ;
SAS vs. SQL
In data processing, some standard terms differ between SAS and SQL:
General/Raw    SAS            SQL
File           Data Set       Table
Record         Observation    Row
Field          Variable       Column
Unique “Features” of PROC SQL
PROC SQL does not require a RUN statement; the procedure is ended with a QUIT statement instead. This is because each SQL statement is executed as soon as it is submitted.
How does this differ from typical SAS procedures?
Typical Execution of SAS Procedures
Suppose I submit a PROC MEANS step that names a variable to summarize but has no RUN statement.
I get a message that PROC MEANS is running. Has it summarized the variable as specified?
Well, no. Suppose I amend it with a CLASS statement and submit that.
PROC MEANS is still running, but it has not computed the analysis for the given classes.
Typical procedures are checked for proper syntax and compiled one at a time, and they execute only when a step boundary such as a RUN statement is reached.
If you now submit a RUN statement, PROC MEANS will execute based on all statements specified.
For SQL, statements compile and execute immediately.
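The contrast above can be sketched as follows. This is only an illustration: the original slide code is not available, so the data set mysas.projects and the columns jobtotal and region are borrowed from the later examples and are assumptions here.

```sas
/* Sketch only: data set and column names are assumed from later examples. */

/* A typical procedure: nothing is computed until the RUN statement. */
proc means data=mysas.projects;
  var jobtotal;      /* variable to summarize */
  class region;      /* added later; still nothing has executed */
run;                 /* step boundary: PROC MEANS executes here */

/* PROC SQL: each statement runs as soon as it is submitted. */
proc sql;
  select region, mean(jobtotal) as jobmean
    from mysas.projects
    group by region
  ;                  /* the query has already executed at this point */
quit;                /* QUIT only ends the procedure */
```

In an interactive session, PROC SQL stays active after the SELECT runs, so further statements can be submitted before QUIT.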
Other things about SQL…
SQL statements are made up of clauses
References to variables/columns or lists of data sets are separated by commas.
The select statement is the most important example of this structure…
Example
proc sql;
select region, pol_type, jobtotal,
0.02*jobtotal as incidental
from mysas.projects
where region in ('Beaumont','Boston')
order by region, pol_type
;
quit;
Notes on this example:
PROC SQL begins SQL processing; QUIT ends SQL processing.
This SQL invocation consists of a single SELECT statement with 3 clauses (FROM, WHERE, ORDER BY).
SELECT starts with a reference to variables/columns (including, possibly, new ones) and potentially contains several clauses.
The Select Statement
Refers to a comma separated list of columns (variables) to be used/generated.
Allows for construction of new columns and aliasing (assigning a variable name with AS).
Has several clauses available…
Some Clauses
From: indicates which table/data set to read from.
Where: allows for conditional sub-setting of the data—similar to the where statement that can be used in any SAS procedure.
Order By: Sorts results by the column(s) specified (with the options ASC or DESC).
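As a small sketch of the ASC and DESC options, the query below reuses the table and columns from the earlier example and sorts regions alphabetically, with the largest job totals first within each region. It assumes jobtotal is numeric, as the earlier 0.02*jobtotal expression implies.

```sas
/* Sketch only: same table and columns as the earlier example. */
proc sql;
  select region, pol_type, jobtotal
    from mysas.projects
    where region in ('Beaumont','Boston')
    order by region asc,      /* ASC is the default and may be omitted */
             jobtotal desc    /* largest job totals first within a region */
  ;
quit;
```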
Summarizing/Grouping Data
SQL accommodates a variety of functions for summarizing data.
Some functions:
AVG (or MEAN)
COUNT (or FREQ or N)
CSS
CV
MAX
Using Summary Functions
Summary functions are applied to columns in the select statement.
Suppose I try:
proc sql;
select region, pol_type,
mean(jobtotal) as jobmean,
mean(0.02*jobtotal) as incidental
from mysas.projects
where region in ('Beaumont','Boston')
;
quit;
The summary results are “merged” back on to the original data set (as noted in the log).
Try this version:
proc sql;
select region, pol_type, jobtotal,
mean(jobtotal) as jobmean,
mean(0.02*jobtotal) as incidental
from mysas.projects
where region in ('Beaumont','Boston')
;
quit;
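For comparison, here is a sketch of a query that selects only summary functions. With no detail columns and no GROUP BY clause, PROC SQL has nothing to merge the results back onto, so it returns a single row for the whole subset.

```sas
/* Sketch only: same table and filter as the examples above. */
proc sql;
  select mean(jobtotal) as jobmean,
         mean(0.02*jobtotal) as incidental
    from mysas.projects
    where region in ('Beaumont','Boston')
  ;                  /* one summary row; no remerging takes place */
quit;
```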
Producing Short Summaries
You can produce summaries similar to what you get using PROC MEANS with a CLASS statement using the GROUP BY clause.
proc sql;
select region, pol_type,
mean(jobtotal) as jobmean,
mean(0.02*jobtotal) as incidental
from mysas.projects
where region in ('Beaumont','Boston')
group by region, pol_type
;
quit;
Summaries are now computed for each group; the behavior is similar to the GROUP option in PROC REPORT.
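For reference, a roughly comparable PROC MEANS step might look like the sketch below. It is not from the original slides. It summarizes jobtotal only; the incidental column has no direct counterpart because PROC MEANS summarizes existing variables rather than expressions.

```sas
/* Sketch only: a PROC MEANS analogue of the GROUP BY query. */
proc means data=mysas.projects mean;
  where region in ('Beaumont','Boston');
  class region pol_type;     /* plays the role of GROUP BY */
  var jobtotal;
run;
```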
Creating Tables for Output
To create an actual table (data set) from our query, we use the CREATE TABLE statement.
SELECT now becomes a clause within that statement…
Example:
proc sql;
create table incidentals as
select region, pol_type, jobtotal,
0.02*jobtotal as incidental
from mysas.projects
where region in ('Beaumont','Boston')
order by region, pol_type
;
quit;
Note that everything from CREATE TABLE to the semicolon is one statement.
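One way to check the result, sketched under the assumption that the CREATE TABLE statement has already run in the same session, so that incidentals exists in the WORK library:

```sas
/* Sketch only: inspect the table created by the CREATE TABLE statement. */
proc sql;
  describe table incidentals;   /* writes the column definitions to the log */
  select * from incidentals;    /* prints the rows of the new table */
quit;
```

A CREATE TABLE statement produces no printed output of its own, so a follow-up query like this is a simple way to view the new table.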