
Data Reference (the very, very basics)



Data-reference: what do we need?

– Tools
– Strategies
– Terminology
– Understanding of what we are looking for: not books or articles, or facts.

La trahison des images (The Treachery of Images), René Magritte

Ceci n’est pas les “data.” (This is not “data.”)

C’est les statistiques! (These are statistics!)

Data: raw (for analysis)

– Intended for use by computer
– Collected based on social science methodologies or administrative procedures
– Computer-readable

Statistics: cooked (facts)

– For human use: eye-readable charts, tables, graphs
– Produced from data
– Can be print, microform, or computer-readable

Where do statistical babies come from?

[Figure: an equation drawn with "+" and "=" signs, labeled "Data" and "Statistics"]

Data or Statistics: Why does it matter?

– Different search strategies and tools.
– Defines your goal.
– Helps you know when you've found it!

Tip: Data or Statistics?

Determine if the user wants (needs) statistics or data.

– Do you want one number?
– Are you looking for a fact or figure?
– Do you want to know “how many?”

Or…

– Do you want a series of numbers?
– Do you want to identify trends, make comparisons, or model relationships?
– Will you be using statistical software (not Excel)?

http://factfinder.census.gov/

http://www.census.gov/compendia/statab/elections/election.pdf

http://www.census.gov/compendia/statab/tables/06s0405.xls

ftp://ftp.bls.gov/pub/special.requests/lf/aat44.txt

http://www.bls.gov/webapps/legacy/cpsatab7.htm

From survey to data to statistics…

Survey instrument

Q1. [enter zip code]
Q2. [enter R’s first name]
Q3. [enter sex of R]
Q4. What was your major in College?
Q5. What was your income last year?
Q6. Did you go to church last week?

Answers to Questions

Zip    Name     Sex  Major    income  church
29002  Wilma    F    lit      0       y
99005  Barney   M    engin    10      n
99005  Betty    F    .        0       n
92005  Ethel    F    theater  1000    y
12534  Fred M.  M    PE       10000   y
12534  Lucy     F    lit      700     y
25000  Ricky    M    music    11000   y
20000  Fred A.  M    dance    10500   n
15000  Ginger   F    math     9500    y

Must anonymize the data!

Zip    Name  Sex  Major    income  church
29002  001   F    lit      0       y
99005  002   M    engin    10      n
99005  003   F    .        0       n
92005  004   F    theater  1000    y
12534  005   M    PE       10000   y
12534  006   F    lit      700     y
25000  007   M    music    11000   y
20000  008   M    dance    10500   n
15000  009   F    math     9500    y
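A minimal sketch of the anonymizing step, assuming (as the slide's table suggests) that each name is simply replaced by a zero-padded sequence number in file order. The function name `anonymize` is hypothetical.

```python
# Respondent names as collected, in file order (taken from the slide).
names = ["Wilma", "Barney", "Betty", "Ethel", "Fred M.",
         "Lucy", "Ricky", "Fred A.", "Ginger"]

def anonymize(names):
    """Replace each name with a zero-padded sequence number: 001, 002, ..."""
    return [f"{i:03d}" for i, _ in enumerate(names, start=1)]

ids = anonymize(names)
print(ids[0], ids[-1])  # prints: 001 009
```

The sequence number carries no information about the respondent; it only keeps the rows distinguishable.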

Change Text to Numeric Codes

Zip    Name  Sex  Major    income  church
29002  001   1    lit      0       y
99005  002   2    engin    10      n
99005  003   1    .        0       n
92005  004   1    theater  1000    y
12534  005   2    PE       10000   y
12534  006   1    lit      700     y
25000  007   2    music    11000   y
20000  008   2    dance    10500   n
15000  009   1    math     9500    y

The “codebook” must document the numeric codes used!

For example:

Variable: “sex” 1 = female 2 = male
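The recoding of "sex" can be sketched as a lookup against the codebook. This is an illustration only; the names `SEX_CODES` and `encode_sex` are hypothetical, and the codes are the ones given on the slide.

```python
# Codebook entry for the variable "sex", as given on the slide.
SEX_CODES = {"F": 1, "M": 2}  # 1 = female, 2 = male

def encode_sex(value):
    """Return the numeric code for a text answer; refuse undocumented values."""
    try:
        return SEX_CODES[value]
    except KeyError:
        raise ValueError(f"no code documented for sex={value!r}") from None

# The "Sex" column from the anonymized table, in file order.
sexes = ["F", "M", "F", "F", "M", "F", "M", "M", "F"]
coded = [encode_sex(s) for s in sexes]
print(coded)  # prints: [1, 2, 1, 1, 2, 1, 2, 2, 1]
```

Raising an error on an undocumented value is a design choice made here: a code that is not in the codebook cannot be interpreted later.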

With “Major” coded (for example, lit = 0075, engin = 0070):

Zip    Name  Sex  Major  income  church
29002  001   1    0075   0       y
99005  002   2    0070   10      n
99005  003   1    .      0       n
92005  004   1    0076   1000    y
12534  005   2    0001   10000   y
12534  006   1    0075   700     y
25000  007   2    0077   11000   y
20000  008   2    0078   10500   n
15000  009   1    0050   9500    y

With “church” coded (y = 1, n = 2):

Zip    Name  Sex  Major  income  church
29002  001   1    0075   0       1
99005  002   2    0070   10      2
99005  003   1    .      0       2
92005  004   1    0076   1000    1
12534  005   2    0001   10000   1
12534  006   1    0075   700     1
25000  007   2    0077   11000   1
20000  008   2    0078   10500   2
15000  009   1    0050   9500    1


Sometimes, even numeric variables are encoded in ranges. For example:

Variable: “income”
1 = less than 1000
2 = 1000 - 4999
3 = 5000 - 10000
4 = more than 10000
9 = not reported

Zip    Name  Sex  Major  income  church
29002  001   1    0075   1       1
99005  002   2    0070   1       2
99005  003   1    .      1       2
92005  004   1    0076   2       1
12534  005   2    0001   3       1
12534  006   1    0075   1       1
25000  007   2    0077   4       1
20000  008   2    0078   4       2
15000  009   1    0050   3       1
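The range coding for "income" can be sketched as follows. Two assumptions are made here that the slides do not state: incomes are whole amounts, and an unreported income arrives as `None`. The function name `income_code` is hypothetical.

```python
def income_code(income):
    """Map a reported income to its range code from the codebook."""
    if income is None:
        return 9   # not reported (assumed to arrive as None)
    if income < 1000:
        return 1   # less than 1000
    if income <= 4999:
        return 2   # 1000 - 4999
    if income <= 10000:
        return 3   # 5000 - 10000
    return 4       # more than 10000

# The "income" column from the table, in file order.
incomes = [0, 10, 0, 1000, 10000, 700, 11000, 10500, 9500]
print([income_code(x) for x in incomes])  # prints: [1, 1, 1, 2, 3, 1, 4, 4, 3]
```

Note that the recoding loses detail: once 10500 and 11000 both become 4, the original amounts cannot be recovered from the data file.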

Data Files do not need “headers”

29002 001 1 0075 1 1
99005 002 2 0070 1 2
99005 003 1 .    1 2
92005 004 1 0076 2 1
12534 005 2 0001 3 1
12534 006 1 0075 1 1
25000 007 2 0077 4 1
20000 008 2 0078 4 2
15000 009 1 0050 3 1

Data Files do not need extra space

290020011007511
990050022007012
990050031.   12
920050041007621
125340052000131
125340061007511
250000072007741
200000082007842
150000091005031
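Dropping the headers and the extra space amounts to writing each record as fixed-width fields with no separators. A minimal sketch: the field widths in `LAYOUT` are inferred from the slide's data rather than stated there, and padding a short value such as "." with spaces is likewise an assumption. `LAYOUT` and `pack` are hypothetical names.

```python
# (field name, width) in file order; widths inferred from the slide's data.
LAYOUT = [("zip", 5), ("id", 3), ("sex", 1),
          ("major", 4), ("income", 1), ("church", 1)]

def pack(record):
    """Concatenate the fields, left-justifying each value in its fixed width."""
    return "".join(str(record[name]).ljust(width) for name, width in LAYOUT)

rows = [
    {"zip": "29002", "id": "001", "sex": 1, "major": "0075", "income": 1, "church": 1},
    {"zip": "99005", "id": "003", "sex": 1, "major": ".", "income": 1, "church": 2},
]
for row in rows:
    print(pack(row))
# prints:
# 290020011007511
# 990050031.   12
```

Every record comes out 15 characters long, which is what lets a codebook point at a variable by column number.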

Codebook must document locations

For example:

Variable: “sex” location: column 9 width: 1

123456789
290020011007511

Codebook documents question, location, codes.

For example: Q3. [enter sex of R]

Variable: “sex” location: column 9 width: 1

Variable: “sex” 1 = female 2 = male
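A sketch of how software might use such a codebook entry to pull one variable out of a packed record. The dictionary structure `CODEBOOK` and the function `read_variable` are hypothetical; only the entry for "sex" (column 9, width 1, 1 = female, 2 = male) comes from the slides.

```python
# Hypothetical codebook structure. Columns are 1-based, as on the slide.
CODEBOOK = {
    "sex": {"column": 9, "width": 1, "labels": {"1": "female", "2": "male"}},
}

def read_variable(record, name):
    """Slice the variable out of a fixed-width record and decode it."""
    entry = CODEBOOK[name]
    start = entry["column"] - 1                # 1-based column -> 0-based index
    raw = record[start:start + entry["width"]]
    return entry["labels"].get(raw, raw)       # undocumented codes pass through

records = ["290020011007511", "990050022007012"]
print([read_variable(r, "sex") for r in records])  # prints: ['female', 'male']
```

Without the codebook, the "1" in column 9 is just a digit; the location and the code list together turn it back into an answer to Q3.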

To Use Data You Need 3 Things

– Data: the datafile (the raw numbers)
– Metadata: the “codebook” (where the numbers are and what they mean)
– Statistical software (for reading the datafile and analyzing the data)

Data + Codebook + Statistical software

– Student writes SPSS program (SPSS commands) to analyze data…
– SPSS reads the program.
– SPSS reads the data.
– And produces charts, tables, analysis, etc.
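The last step, from data to statistics, can be sketched in a few lines. The slides use SPSS; the Python below is a stand-in for illustration, not SPSS syntax. It assumes the packed records and the codebook entry for "sex" shown earlier in the deck.

```python
from collections import Counter

SEX_LABELS = {"1": "female", "2": "male"}   # codes from the codebook

# The nine packed records from the data file.
records = [
    "290020011007511", "990050022007012", "990050031.   12",
    "920050041007621", "125340052000131", "125340061007511",
    "250000072007741", "200000082007842", "150000091005031",
]

# "sex" sits in column 9 (1-based), width 1, so index 8 in Python.
counts = Counter(SEX_LABELS[r[8]] for r in records)

# A small frequency table: label, count, share of all records.
for label, n in sorted(counts.items()):
    print(f"{label:<8}{n:>3}{n / len(records):>8.0%}")
```

The counts (five women, four men) are statistics; the nine records they were produced from are the data.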

[Figure: one respondent's record decoded with the codebook. The codebook entries for the variables PRES92 and DEGREE each show question text and responses. Decoded record: female, 49 years old, voted for Clinton (Pres92), junior college (Degree).]

Tip: "variables" contain the essential, important content of data files

Tip: Data-reference is not about searching for an answer…

Data reference is often less about searching to find an answer. (That's a statistical reference question.)

Data reference is often more about exploring to find data that will enable users to ask a question.

What have we learned?

– Data and statistics are not the same.
– Data reference leads to primary research material, not facts or statistics.
– To use data, a user must have data, metadata, and statistical software.
– And… “variables” are what contain the critical, important content of data files.
– And that means that the gold standard of data reference is variable-level searching.
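Variable-level searching means searching the variables' names and question text rather than study titles. A toy sketch, assuming a catalog built from the deck's own survey instrument; the names `VARIABLES` and `search_variables` are hypothetical, and a real search tool would cover many studies.

```python
# Hypothetical mini-catalog: variable name -> question text (Q4 to Q6 of the survey).
VARIABLES = {
    "major": "What was your major in College?",
    "income": "What was your income last year?",
    "church": "Did you go to church last week?",
}

def search_variables(keyword):
    """Return variables whose name or question text contains the keyword."""
    needle = keyword.lower()
    return [name for name, text in VARIABLES.items()
            if needle in name.lower() or needle in text.lower()]

print(search_variables("income"))  # prints: ['income']
print(search_variables("last"))    # prints: ['income', 'church']
```

A search on "last" finds two variables because the match is against the question text, not just the variable name.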

http://gort.ucsd.edu/calpol/

Question Text (Variable 34)

Study of July 2003