Data Reference (the very, very basics)
Data-reference: what do we need?
- Understanding of what we are looking for: not books or articles -- or facts.
- Terminology
- Strategies
- Tools
Data:
- Raw (for analysis)
- Intended for use by computer
- Collected based on social science methodologies or administrative procedures
- Computer-readable

Statistics:
- Cooked (facts)
- For human use: eye-readable charts, tables, graphs
- Produced from data
- Can be print, micro, or computer-readable
Data or Statistics: Why does it matter?
Different search strategies and tools. Defines your goal. Helps you know when you've found it!
Tip: Data or Statistics?
Determine if the user wants (needs) statistics or data.
Signs the user needs statistics:
- Do you want one number?
- Are you looking for a fact or figure?
- Do you want to know “how many?”
Signs the user needs data:
- Or… do you want a series of numbers?
- Do you want to identify trends, make comparisons, model relationships?
- Will you be using statistical software (not Excel)?
From survey to data to statistics…
Survey instrument
Q1. [enter zip code]
Q2. [enter R’s first name]
Q3. [enter sex of R]
Q4. What was your major in College?
Q5. What was your income last year?
Q6. Did you go to church last week?
Answers to Questions
Zip    Name     Sex  Major    income  church
29002  Wilma    F    lit      0       y
99005  Barney   M    engin    10      n
99005  Betty    F    .        0       n
92005  Ethel    F    theater  1000    y
12534  Fred M.  M    PE       10000   y
12534  Lucy     F    lit      700     y
25000  Ricky    M    music    11000   y
20000  Fred A.  M    dance    10500   n
15000  Ginger   F    math     9500    y
Must anonymize the data!
Zip    Name  Sex  Major    income  church
29002  001   F    lit      0       y
99005  002   M    engin    10      n
99005  003   F    .        0       n
92005  004   F    theater  1000    y
12534  005   M    PE       10000   y
12534  006   F    lit      700     y
25000  007   M    music    11000   y
20000  008   M    dance    10500   n
15000  009   F    math     9500    y
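The anonymizing step above can be sketched in a few lines of Python. This is only an illustration: the record layout and the `anonymize` helper are assumptions made for the example, and it does no more than the slide does (swap each name for a sequence number), so it is not a complete disclosure-control procedure.

```python
# Hypothetical records mirroring the first three rows of the table above.
records = [
    {"zip": "29002", "name": "Wilma", "sex": "F", "major": "lit", "income": 0, "church": "y"},
    {"zip": "99005", "name": "Barney", "sex": "M", "major": "engin", "income": 10, "church": "n"},
    {"zip": "99005", "name": "Betty", "sex": "F", "major": None, "income": 0, "church": "n"},
]


def anonymize(rows):
    """Return copies of the rows with each name replaced by a zero-padded case ID."""
    anonymized_rows = []
    for case_number, row in enumerate(rows, start=1):
        copy = dict(row)                      # leave the original record untouched
        copy["name"] = f"{case_number:03d}"   # 1 -> "001", 2 -> "002", ...
        anonymized_rows.append(copy)
    return anonymized_rows


anonymized_records = anonymize(records)
print([row["name"] for row in anonymized_records])
```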
Change Text to Numeric Codes
Zip    Name  Sex  Major    income  church
29002  001   1    lit      0       y
99005  002   2    engin    10      n
99005  003   1    .        0       n
92005  004   1    theater  1000    y
12534  005   2    PE       10000   y
12534  006   1    lit      700     y
25000  007   2    music    11000   y
20000  008   2    dance    10500   n
15000  009   1    math     9500    y
The “codebook” must document the numeric codes used!
For example:
Variable: “sex”
  1 = female
  2 = male
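A code list like this one is, in effect, a lookup table. The sketch below shows one way to apply it and to reverse it. The `SEX_CODES` dictionary uses the codes from the example above, while the helper names `encode` and `decode` are invented for illustration.

```python
# Code list from the codebook example: 1 = female, 2 = male.
SEX_CODES = {"F": 1, "M": 2}


def encode(values, code_list):
    """Replace each text value with its numeric code."""
    return [code_list[value] for value in values]


def decode(codes, code_list):
    """Turn numeric codes back into text; only possible if the code list is documented."""
    reverse = {code: label for label, code in code_list.items()}
    return [reverse[code] for code in codes]


# The "Sex" column of the nine survey responses, in order.
sex_column = ["F", "M", "F", "F", "M", "F", "M", "M", "F"]
encoded = encode(sex_column, SEX_CODES)
print(encoded)
```

Without the code list, the encoded column is just a run of 1s and 2s, which is why the codebook has to travel with the data.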
Majors and church attendance are replaced with numeric codes in the same way:
Zip    Name  Sex  Major  income  church
29002  001   1    0075   0       1
99005  002   2    0070   10      2
99005  003   1    .      0       2
92005  004   1    0076   1000    1
12534  005   2    0001   10000   1
12534  006   1    0075   700     1
25000  007   2    0077   11000   1
20000  008   2    0078   10500   2
15000  009   1    0050   9500    1
Sometimes, even numeric variables are encoded in ranges. For example:
Variable: “income”
  1 = less than 1000
  2 = 1000 - 4999
  3 = 5000 - 10000
  4 = more than 10000
  9 = not reported
Zip    Name  Sex  Major  income  church
29002  001   1    0075   1       1
99005  002   2    0070   1       2
99005  003   1    .      1       2
92005  004   1    0076   2       1
12534  005   2    0001   3       1
12534  006   1    0075   1       1
25000  007   2    0077   4       1
20000  008   2    0078   4       2
15000  009   1    0050   3       1
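The range coding for “income” can be written as a small function. This is a sketch under two assumptions not stated in the slides: incomes are whole numbers, and an unreported income is represented as `None`. The function name `income_code` is hypothetical.

```python
def income_code(income):
    """Map a reported income to the range code listed in the codebook."""
    if income is None:       # assumption: None stands for "not reported"
        return 9
    if income < 1000:
        return 1             # less than 1000
    if income < 5000:
        return 2             # 1000 - 4999 (whole-number incomes assumed)
    if income <= 10000:
        return 3             # 5000 - 10000
    return 4                 # more than 10000


# The income column of the nine survey responses, in order.
incomes = [0, 10, 0, 1000, 10000, 700, 11000, 10500, 9500]
income_codes = [income_code(value) for value in incomes]
print(income_codes)
```

Note that the coding is one-way: once 9500 and 10000 are both stored as 3, the original amounts cannot be recovered from the data file.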
Data Files do not need “headers”
29002  001  1  0075  1  1
99005  002  2  0070  1  2
99005  003  1  .     1  2
92005  004  1  0076  2  1
12534  005  2  0001  3  1
12534  006  1  0075  1  1
25000  007  2  0077  4  1
20000  008  2  0078  4  2
15000  009  1  0050  3  1
Data Files do not need extra space
290020011007511
990050022007012
990050031.   12
920050041007621
125340052000131
125340061007511
250000072007741
200000082007842
150000091005031
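Squeezing out the extra space only works if every variable keeps a fixed width, so that each value always starts in the same column. A minimal sketch of that packing step follows. The widths in `WIDTHS` are inferred from the sample rows (the slides only state the width of “sex”), and `pack` is a hypothetical helper.

```python
# Field widths inferred from the sample data: zip, ID, sex, major, income, church.
WIDTHS = (5, 3, 1, 4, 1, 1)

rows = [
    ("29002", "001", "1", "0075", "1", "1"),
    ("99005", "003", "1", ".", "1", "2"),   # "." marks the missing major
]


def pack(row, widths=WIDTHS):
    """Concatenate the values, padding each one to its fixed width."""
    return "".join(value.ljust(width) for value, width in zip(row, widths))


packed = [pack(row) for row in rows]
for line in packed:
    print(line)
```

The missing major is padded to four characters, which keeps the income and church codes in the same columns as in every other record.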
Codebook must document locations
123456789
290020011007511
990050022007012
990050031.   12
920050041007621
125340052000131
125340061007511
250000072007741
200000082007842
150000091005031
For example:
Variable: “sex”  location: column 9  width: 1
Codebook documents question, location, codes.
For example:
Q3. [enter sex of R]
Variable: “sex”  location: column 9  width: 1
Variable: “sex”  1 = female  2 = male
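To make the role of the codebook concrete, here is a sketch of reading one variable out of a record using only the location, width, and codes documented above. The `CODEBOOK` dictionary layout and the `read_variable` function are assumptions made for this example, not a standard codebook format.

```python
# One codebook entry, holding the question, location, and codes from the example.
CODEBOOK = {
    "sex": {
        "question": "Q3. [enter sex of R]",
        "column": 9,                          # 1-based column, as in the codebook
        "width": 1,
        "codes": {"1": "female", "2": "male"},
    },
}


def read_variable(record, name, codebook=CODEBOOK):
    """Cut a variable out of a fixed-width record and translate its code."""
    entry = codebook[name]
    start = entry["column"] - 1               # Python strings are 0-based
    raw = record[start:start + entry["width"]]
    return entry["codes"].get(raw, raw)       # fall back to the raw value if uncoded


first_two = ["290020011007511", "990050022007012"]
sexes = [read_variable(record, "sex") for record in first_two]
print(sexes)
```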
To Use Data You Need 3 Things
- Data: the datafile (the raw numbers)
- Metadata: the “codebook” (where the numbers are and what they mean)
- Statistical Software (for reading the datafile and analyzing the data)
[Diagram: Data + Codebook + Statistical software]
Student writes SPSS program (SPSS commands) to analyze data…
SPSS reads the program.
SPSS reads the data.
And produces charts, tables, analysis, etc.
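The same data-to-statistics workflow can be sketched without SPSS. In the Python example below, which is a stand-in for an SPSS program and not SPSS syntax, the script reads the nine-record datafile, uses the codebook's location and codes for “sex”, and produces a frequency table. The `frequencies` function and the inline `DATAFILE` string are assumptions for illustration.

```python
from collections import Counter

# The nine fixed-width records from the sample datafile.
DATAFILE = """\
290020011007511
990050022007012
990050031.   12
920050041007621
125340052000131
125340061007511
250000072007741
200000082007842
150000091005031
"""

# From the codebook: variable "sex", column 9, width 1, 1 = female, 2 = male.
SEX_LABELS = {"1": "female", "2": "male"}


def frequencies(text, column, width, labels):
    """Count how often each code appears at the given 1-based column location."""
    start = column - 1
    counts = Counter(line[start:start + width] for line in text.splitlines() if line)
    return {labels.get(code, code): count for code, count in counts.items()}


sex_counts = frequencies(DATAFILE, column=9, width=1, labels=SEX_LABELS)
print(sex_counts)
```

The resulting counts are statistics: facts produced from the data, readable without the codebook or the software.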
Tip: Data-reference is not about searching for an answer…
Data reference is often less about searching to find an answer. (That's a statistical reference question.)
Data reference is often more about exploring to find data that will enable users to ask a question.
What have we learned?
- Data and statistics are not the same.
- Data reference leads to primary research material, not facts or statistics.
- To use data, a user must have data, metadata, and statistical software.
- And… "variables" are what hold the important content of data files.
- And that means that the gold standard of data reference is variable-level searching.