Learning Objective
Using SAS/BASE to connect to third-party relational database software
to extract data needed for:
  program evaluation
  research using administrative data
  operational reports, e.g. routine surveillance
SHRUG, 2014-05-02
1. What is a relational database?
2. Contact your DBA for how to connect to your database(s)
3. How to write queries using PROC SQL
Slide 3
What is a relational database?
A set of tables
  tables are made up of rows and columns
Trade names of relational databases (RDB): Oracle, Teradata, SQL Server, DB2, Access
An RDB is software designed to retain large amounts of data
  transactional DB
  reporting/warehousing DB
Slide 4
What is a relational database?
Transactional DB
  designed to increase the speed for front-end users
  complex table and table join structures
Warehousing DB
  designed for efficient storage and retrieval for reporting
  simpler table designs and table join structures
Queries for either design use the same syntax (code)
  queries for warehouses will be simpler to write
Slide 5
What is a relational database?
Why use relational databases?
  relational databases use a concept called normalization
  normalization reduces the amount of redundant data and allows for
  updates to data with less error
There are degrees of normalization
  first degree
  second degree
  third degree and higher degrees
Slide 6
First degree normalization
  each row pertains to a single entity: a patient, an encounter, a physician
  each column pertains to a characteristic of the entity: e.g. date of
  birth, sex, date of encounter, etc.

ID    FirstName  Gender  BirthCity  BirthCountry
0001  John       M       Moncton    Canada
0002  Devbani    F       Kolkata    India
Table 1: Subjects with demographic information
Slide 7
Violation of first degree normalization

SubjID  FirstName  Gender  BirthCity    BirthCountry
0001    John       43      Moncton      New Brunswick
0002    Raha       F       West Bengal  India
Table 1: Subjects with improper 1NF

What impact does violating first degree normalization have on your query
  if you want all patients born in Canada?
  if you want all male patients?
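The two questions above can be tried on a small in-memory copy of both versions of Table 1. This is a minimal sketch and not part of the original slides: Python's built-in sqlite3 module stands in for a real RDB, the table names subjects_ok and subjects_bad are hypothetical, and the key column is called id in both tables for simplicity. The WHERE clauses are ordinary SQL of the kind you would also write in PROC SQL.

```python
import sqlite3

# In-memory database standing in for a real RDB (assumption for illustration).
conn = sqlite3.connect(":memory:")
cols = "(id TEXT, firstname TEXT, gender TEXT, birthcity TEXT, birthcountry TEXT)"
conn.execute("CREATE TABLE subjects_ok " + cols)   # hypothetical table name
conn.execute("CREATE TABLE subjects_bad " + cols)  # hypothetical table name

# Rows copied from the clean and the improper versions of Table 1.
conn.executemany("INSERT INTO subjects_ok VALUES (?,?,?,?,?)", [
    ("0001", "John", "M", "Moncton", "Canada"),
    ("0002", "Devbani", "F", "Kolkata", "India"),
])
conn.executemany("INSERT INTO subjects_bad VALUES (?,?,?,?,?)", [
    ("0001", "John", "43", "Moncton", "New Brunswick"),
    ("0002", "Raha", "F", "West Bengal", "India"),
])

def ids(table, where):
    """Return the subject IDs in `table` that satisfy the WHERE condition."""
    sql = f"SELECT id FROM {table} WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

born_in_canada_ok = ids("subjects_ok", "birthcountry = 'Canada'")
born_in_canada_bad = ids("subjects_bad", "birthcountry = 'Canada'")
males_ok = ids("subjects_ok", "gender = 'M'")
males_bad = ids("subjects_bad", "gender = 'M'")

print("Born in Canada:", born_in_canada_ok, "vs", born_in_canada_bad)
print("Male patients: ", males_ok, "vs", males_bad)
```

In the improper table John is silently missed by both queries, because his country column holds a province and his gender column holds an age. No error is raised; the counts are simply wrong.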
Slide 8
Second degree normalization

Name     City     Prov  PostalCode
John     Halifax  NS    B3K 6R8
Devbani  Halifax  NS    B3H 2Y9
Table 2: Business addresses

Table 2 has employer information about rows in Table 1
The table above has some redundant information:
  name is repeated from Table 1
  province is embedded in the postal code
Better design: two or even three tables
Slide 9
Second degree normalization

SubjID  PostalCode
0001    B3K 6R8
0002    B3H 2Y9
Table 2: Revised with 2NF

PostalCode  City     Prov
B3K 6R8     Halifax  NS
B3H 2Y9     Halifax  NS
Table 3: Creating a secondary table for 2NF
Slide 10
Second degree normalization
Table 2 no longer contains the name
  it's replaced with the subject ID
  to get the subject's name we link Table 2 to the table in the first
  example (Table 1), using the SUBJID/ID column
  we get the province and city by linking Tables 2 and 3 using the
  POSTALCODE column
SUBJID is a primary key in Tables 1 and 2
POSTALCODE is a foreign key in Table 2, but a primary key in Table 3
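The linkage described above can be sketched as a three-table join. This is an illustration only, assuming Python's built-in sqlite3 module as a stand-in database; the table names subjects, business and postcodes are hypothetical labels for Tables 1, 2 and 3, and only the columns needed for the join are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real RDB (assumption)
conn.executescript("""
    -- Table 1 (subjects), Table 3 (postcodes), Table 2 (business addresses)
    CREATE TABLE subjects  (id TEXT PRIMARY KEY, firstname TEXT);
    CREATE TABLE postcodes (postalcode TEXT PRIMARY KEY, city TEXT, prov TEXT);
    CREATE TABLE business  (subjid TEXT PRIMARY KEY,
                            postalcode TEXT REFERENCES postcodes(postalcode));
    INSERT INTO subjects  VALUES ('0001', 'John'), ('0002', 'Devbani');
    INSERT INTO postcodes VALUES ('B3K 6R8', 'Halifax', 'NS'),
                                 ('B3H 2Y9', 'Halifax', 'NS');
    INSERT INTO business  VALUES ('0001', 'B3K 6R8'), ('0002', 'B3H 2Y9');
""")

# Name comes from Table 1 via SUBJID/ID; city and province from Table 3
# via POSTALCODE.
rows = conn.execute("""
    SELECT s.firstname, p.city, p.prov, b.postalcode
    FROM business AS b
    INNER JOIN subjects  AS s ON b.subjid = s.id
    INNER JOIN postcodes AS p ON b.postalcode = p.postalcode
    ORDER BY b.subjid
""").fetchall()

for row in rows:
    print(row)
```

The result reproduces the original, unnormalized business-address table, but each fact is now stored once.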
Slide 11
Primary/Foreign Keys
primary key
  a column or combination of columns that uniquely identifies each row
  in the table
  e.g. a patient medical record needs at least 3 columns to identify a
  unique record: patient ID, date of encounter, and provider ID
foreign key
  a column or combination of columns that is used to link data between
  two tables
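A composite primary key like the three-column example above can be sketched as follows. This is a hedged illustration using Python's built-in sqlite3 module; the table name encounters, its column names and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real RDB (assumption)
conn.execute("""
    CREATE TABLE encounters (
        patient_id     TEXT,
        encounter_date TEXT,
        provider_id    TEXT,
        note           TEXT,
        PRIMARY KEY (patient_id, encounter_date, provider_id)
    )
""")
insert = "INSERT INTO encounters VALUES (?,?,?,?)"
conn.execute(insert, ("0001", "2014-05-02", "P10", "first visit"))
# Same patient and date but a different provider: still a distinct row.
conn.execute(insert, ("0001", "2014-05-02", "P11", "second opinion"))

try:
    # All three key columns repeat, so the database rejects the row.
    conn.execute(insert, ("0001", "2014-05-02", "P10", "duplicate"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

n_rows = conn.execute("SELECT COUNT(*) FROM encounters").fetchone()[0]
print("duplicate rejected:", duplicate_rejected, "| rows stored:", n_rows)
```

No single column is unique here; only the combination of the three identifies a row.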
Slide 12
Questions about 2NF?
Can you see the advantage of splitting the data into different tables?
  share examples of your data where normalization is used
Higher degrees of normalization work similarly to the examples above
  you have to go through more tables for higher levels of normalization
  in order to link to the data that you need
Slide 13
Getting access to data: What do you need from DBA?
Explain to the DBA that you need to query data, but have no need to
write to the database
  this helps them determine where you belong on a user matrix
DBA or IT installs the necessary software on your machine
  Google has lots of information on SAS Connect
  SAS Connect documentation
Slide 14
How SAS authenticates
  proc sql;
  connect to oracle (user = password="&dbpass" path = prod );
  %put &sqlxmsg;
User name is provided by DBA/IT
In this example the password is held in the macro variable DBPASS
The %put statement has Oracle print any messages to the SAS log
This is an example of pass-through code
Slide 15
Using a LIBNAME to connect
Recall that slide 13 showed the pass-through facility in SAS
  most of the query is done on the database
Can use a libname statement to connect instead of pass-through
  the advantage of this method is that you are programming in SAS
  (using SAS functions and formats)
  SAS determines which program (SAS or RDB) will handle statements
  more efficiently
Slide 16
Using a LIBNAME to connect
Example using a libname statement:
  libname onco odbc dsn='Oncolog' schema=dbo;
1. onco: the name of the library
2. odbc: tells SAS that you are using an ODBC engine
3. dsn: use the name of the database that was used to set up the ODBC
   connection
NOTE: the schema option is not always required
Slide 17
Seeing your data - Views
Once the view is created, open it from the EXPLORER tab in SAS and use
it like a normal dataset
Slide 18
Seeing your data - Views
Using the view columns in SAS EXPLORER
Slide 19
Seeing your data - Views
Double-click on a table to see the data
NOTE: columns that identify personal information have been removed from
this screen shot
Slide 20
Other ways to view data
You may have software from the RDB:
  TOAD (for Oracle)
  SQL Developer (for Oracle)
  SQL Server
  Teradata
All vendors may have some limited-function development software that allows:
  viewing data
  viewing the type of a column: char, num, date, etc.
  writing SQL queries
Slide 21
Sample view from SQL Developer
Slide 22
Syntax: Single table - 1 of 2
PROC SQL
  proc sql;
  create table <new table> as
  select <column 1>, <column 2>, etc
  from <source table>
  where <condition>;
  quit;
DATA STEP
  data <new dataset>;
  set <source dataset> (keep=<variables> where=(<condition>));
  run;
Example: Create a dataset (table) with men aged 50 to 74. Assume the
source table is called demographics and contains the variables
subjectID, age and sex
Slide 23
Syntax: Single table - 2 of 2
PROC SQL
  proc sql;
  create table men5074 as
  select subjectID, age
  from work.demographics
  where sex='M' and age between 50 and 74
  ;
  quit;
DATA STEP
  data men5074 (drop=sex);
  set work.demographics (keep=subjectid sex age
                         where=(sex='M' and 50 <= age <= 74));
  run;
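The same single-table selection can be tried without a SAS session. The sketch below is an illustration only: it assumes Python's built-in sqlite3 module as the database, and the five demographics rows are invented so that the boundary ages 50 and 74 are both exercised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real RDB (assumption)
conn.execute("CREATE TABLE demographics (subjectID TEXT, age INTEGER, sex TEXT)")
# Invented rows: two men just outside the range, two on its edges, one woman.
conn.executemany("INSERT INTO demographics VALUES (?,?,?)", [
    ("A1", 49, "M"), ("A2", 50, "M"), ("A3", 74, "M"),
    ("A4", 75, "M"), ("A5", 60, "F"),
])

# Same logic as the men5074 example: BETWEEN includes both end points.
conn.execute("""
    CREATE TABLE men5074 AS
    SELECT subjectID, age
    FROM demographics
    WHERE sex = 'M' AND age BETWEEN 50 AND 74
""")

men5074 = conn.execute(
    "SELECT subjectID, age FROM men5074 ORDER BY subjectID"
).fetchall()
print(men5074)
```

Note that the literal 'M' must be quoted; an unquoted M would be read as a column name.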
Slide 52
Parsing the code - 2 of 3
  (select ptc.gender, count(*)
   from (select participant_id, sex_cd,
                case when sex_cd=222 then 'F' else 'M' end as gender
         from csprod.participant
         where trunc(birth_dt) between to_date('19520601','YYYYMMDD')
                                   and to_date('19530531','YYYYMMDD')
           and sex_cd <> 240
           and del_dt is null) ptc
select ptc.gender, count(*): put these columns in the SAS dataset part60
(select ... ) ptc: create a temporary table called ptc
Table PTC contains the columns listed from the PARTICIPANT table, with
the restrictions shown in the WHERE clause
Slide 53
Parsing the code - 3 of 3
   inner join
        (select participant_id
         from csprod.participant_program
         where program_id=1
           and program_status_cd=263
           and del_dt is null) pp
   on ptc.participant_id = pp.participant_id
   group by ptc.gender
  );
  disconnect from myconn;
  quit;
(select ... ) pp: create a temporary table, PP, from PARTICIPANT_PROGRAM
with the restrictions defined in the WHERE clause
Slide 54
Joins
  for joining two or more tables
This example shows an inner join: we want the number of males and
females, age 60 as of May 31, 2013, participating in the CRC screening
program
[Figure: overlapping circles PTC and PP; the overlap is area C]
Area C is the result of the inner join
Temporary table PTC: a subset of csprod.participant
Temporary table PP: a subset of csprod.participant_program
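The inner join of the two temporary tables can be sketched on made-up data. Everything below is an assumption for illustration: Python's built-in sqlite3 module replaces Oracle, the schema prefix is dropped, dates are stored as ISO text instead of being converted with to_date, sex code 221 for male is invented (the slides only say that 222 maps to 'F'), and the exclusion of code 240 is written with <>.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Oracle database
conn.executescript("""
    CREATE TABLE participant (participant_id INTEGER, sex_cd INTEGER,
                              birth_dt TEXT, del_dt TEXT);
    CREATE TABLE participant_program (participant_id INTEGER, program_id INTEGER,
                                      program_status_cd INTEGER, del_dt TEXT);
    INSERT INTO participant VALUES
        (1, 222, '1952-07-15', NULL),          -- female, in the birth window
        (2, 221, '1953-01-20', NULL),          -- male, in the birth window
        (3, 221, '1953-05-31', NULL),          -- male, last day of the window
        (4, 221, '1960-02-02', NULL),          -- too young
        (5, 222, '1952-09-09', '2013-01-01'),  -- deleted record
        (6, 240, '1952-10-10', NULL);          -- excluded sex code
    INSERT INTO participant_program VALUES
        (1, 1, 263, NULL), (2, 1, 263, NULL),
        (3, 2, 263, NULL),                     -- enrolled in a different program
        (4, 1, 263, NULL), (5, 1, 263, NULL), (6, 1, 263, NULL);
""")

# PTC and PP are built as inline views; only IDs present in both survive.
counts = conn.execute("""
    SELECT ptc.gender, COUNT(*) AS n
    FROM (SELECT participant_id,
                 CASE WHEN sex_cd = 222 THEN 'F' ELSE 'M' END AS gender
          FROM participant
          WHERE birth_dt BETWEEN '1952-06-01' AND '1953-05-31'
            AND sex_cd <> 240
            AND del_dt IS NULL) AS ptc
    INNER JOIN
         (SELECT participant_id
          FROM participant_program
          WHERE program_id = 1 AND program_status_cd = 263
            AND del_dt IS NULL) AS pp
      ON ptc.participant_id = pp.participant_id
    GROUP BY ptc.gender
    ORDER BY ptc.gender
""").fetchall()
print(counts)
```

Participant 3 passes the PTC restrictions but is not in PP, and participants 4 to 6 are in PP but not in PTC, so none of them fall in the overlap that the inner join returns.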
Slide 55
Task 2 - Results
What will be the query result?
  What's the table/dataset name?
  How many rows?
  How many columns?
  What are the columns called?
Slide 56
Task 2 - Results
Slide 57
Task 3 - Patients with kidney cancer
REQUEST
  Find the number of patients with invasive kidney cancer
  (ICD-O-3=C64.9) diagnosed between 2008 and 2010.
  Break down counts by age and sex. Interested in age < 60 and age >= 60
BACKGROUND
  remove any patients who were deleted
  remove any tumors that were deleted
  diagnoses are in a table called oldiagnostic
  sex is in a table called olpatient
  birth date is in a table called person
Slide 59
Task 3 - Code (1 of 5)
  proc sql feedback;
  create table onco_coh as
  select a.*, b.olsex, f.birth_dt,
         floor(yrdif(f.birth_dt,a.initdx_dt,'act/act')) as ageatdx
  from
Slide 60
Task 3 - Code (2 of 5)
  /*** get cases ***/
  (select o.personser, o.diagnosticser,
          datepart(o.DateInitialDiagnosis) as initdx_dt format=date9.,
          o.icdositecode
   from onco.oldiagnostic o
   where o.icdositecode in ('C64.9')
     /*** only invasive cancers ***/
     and substr(o.icdohistocode,6,1)='3'
     and year(o.dateinitialdiagnosis) between 2008 and 2010
     and o.dxstate='NS'
Slide 61
Task 3 - Code (3 of 5)
     /*** patient not deleted ***/
     and o.personser not in
         (SELECT ps1.Personser
          FROM onco.OlPatientSup ps1
          WHERE ps1.PersonSer = o.PersonSer
            and ps1.identifier = 'CCRPatientReportingStatu'
            AND ps1.String IN ('04','05')
            and ps1.FieldSeq = 0)
Slide 62
Task 3 - Code (4 of 5)
     /*** diagnosis not deleted ***/
     and o.diagnosticser not in
         (SELECT ds1.diagnosticser
          FROM onco.OLdiagnosticsup ds1
          WHERE ds1.PersonSer = o.PersonSer
            and o.diagnosticser = ds1.diagnosticser
            and ds1.identifier = 'CCRPrimaryReportingStatu'
            AND ds1.String IN ('04','05')
            and ds1.FieldSeq = 0)) a
Slide 63
Task 3 - Code (5 of 5)
  /*** get patient's sex ***/
  left join
    (select personser, olsex
     from onco.olpatient) b
  on a.personser=b.personser
  /*** get birth date ***/
  left join
    (select personser, datepart(DateOfBirth) as birth_dt format=date9.
     from onco.person
     where lowcase(persontype)='patient') f
  on a.personser=f.personser
  ;
  quit;
Slide 64
Task 3 - Results

         Age at diagnosis
Sex      Under 60   60 and older   Total
M        128        247            375
F        76         147            223
Total    204        394            598
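The code on the preceding slides builds the cohort table onco_coh but does not show the tabulation step. One possible way to get a sex by age-group breakdown in SQL is conditional aggregation, sketched below. This is an assumption about how the table could be produced, not the presenters' method; it uses Python's built-in sqlite3 module, and the five cohort rows are invented, so the counts are not the real results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SAS WORK library
conn.execute("CREATE TABLE onco_coh (personser INTEGER, olsex TEXT, ageatdx INTEGER)")
# Invented rows; only the column names follow the Task 3 code.
conn.executemany("INSERT INTO onco_coh VALUES (?,?,?)", [
    (1, "M", 45), (2, "M", 60), (3, "M", 72),
    (4, "F", 59), (5, "F", 81),
])

# One row per sex; each CASE expression counts the rows in one age group.
table = conn.execute("""
    SELECT olsex,
           SUM(CASE WHEN ageatdx <  60 THEN 1 ELSE 0 END) AS under_60,
           SUM(CASE WHEN ageatdx >= 60 THEN 1 ELSE 0 END) AS age_60_plus,
           COUNT(*) AS total
    FROM onco_coh
    GROUP BY olsex
    ORDER BY olsex DESC
""").fetchall()

for row in table:
    print(row)
```

A patient aged exactly 60 lands in the older group, matching the "60 and older" column heading.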
Slide 65
Self-join
Correlated sub-query
Outer from and where
UNION
Slide 66
What is the sound of one table joining?
  /* select candidates for babes becoming mothers */
  proc sql
  ;
  create table Candidates as
  select B1.BrthDate
       , B1.BirthID
       , B2.DLMBDate
       , B2.ContctID
  from SASDM.DelnBrth as B1 /* babes */
     , SASDM.DelnBrth as B2 /* mums */
  where B1.BrthDate = B2.DLMBDate;
  NOTE: Table WORK.CANDIDATES created, with 855040 rows and 4 columns.
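A self-join of this kind can be sketched on a few invented rows. The example assumes Python's built-in sqlite3 module as the database; the column names baby_birth_date, mother_id and mother_birth_date are hypothetical stand-ins for the variables in SASDM.DelnBrth, whose exact meaning the slides do not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SASDM library
conn.execute("""
    CREATE TABLE delnbrth (birth_id INTEGER, baby_birth_date TEXT,
                           mother_id INTEGER, mother_birth_date TEXT)
""")
conn.executemany("INSERT INTO delnbrth VALUES (?,?,?,?)", [
    (101, "1988-03-01", 9001, "1960-05-05"),
    (102, "1988-03-01", 9002, "1962-07-07"),
    (103, "2010-11-11", 9003, "1988-03-01"),  # this mother was born 1988-03-01
    (104, "2012-01-15", 9004, "1970-02-02"),
])

# The same table is read twice under two aliases and matched on date only.
candidates = conn.execute("""
    SELECT b1.baby_birth_date, b1.birth_id, b2.mother_birth_date, b2.mother_id
    FROM delnbrth AS b1,      -- babes
         delnbrth AS b2       -- mums
    WHERE b1.baby_birth_date = b2.mother_birth_date
    ORDER BY b1.birth_id
""").fetchall()

for row in candidates:
    print(row)
```

Both babies born on 1988-03-01 pair with the same later mother, which shows why a date match alone yields candidates rather than confirmed links, and why the row count of such a join can grow large.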
Slide 67
Correlated Sub-query
OB/Research has data in:
  Clinical ultrasound db
  Maternal serum screening db
Objective: find all mothers with abnormal screening and see if the
ultrasound indicated risk for restricted growth (small baby)
Slide 68
Correlated Sub-Query
  create table Work.WithAtlee as
  /* VP data only available after 2003 not 2000 */
  select One18.*     /* 18-wk US */
       , M.MO365     /* perinatal data */
       , M.Wgt4Age
  /* 45 lines omitted here */
Slide 69
       , M.DLPrvNND
       , M.DLPrvFTD      /* no such variable as IUGR in */
       , M.MotherID in   /* previous pregnancy - back link */
         ( select Prev.MotherID
           from SASDM.DelnBrth as Prev
           where Prev.MotherID = M.MotherID
             and Prev.Wgt4Age in ( 1, 2)/* pick