Creating Dictionaries

Page 1: Creating Dictionaries

Creating Dictionaries

Page 2: Creating Dictionaries

What is a Dictionary?

CSPro data files are text files with no metadata, only data

A dictionary is needed to describe the contents of the data file

CSPro dictionaries:
  • End with the extension .dcf
  • Are text files that can be edited manually, though that is inadvisable
  • Are not dependent on the existence of a data entry application

Every CSPro application needs a dictionary

Multiple CSPro applications can share the same dictionary

Page 3: Creating Dictionaries

CSPro Data Files

CSPro data files are:
  • Flat files (all data in a single file)
  • Text files (all data is stored in ANSI format and is human readable)

Items in the data file have a fixed length

Records in the data file are stored one per line

Data files have no specific file extension

An index is created for the data file to allow for quick access to specific cases (file extension: .idx)

Page 4: Creating Dictionaries

Identification Items

CSPro needs a way to differentiate between different cases (questionnaires)

Identification (ID) items uniquely identify all cases

Two cases in a single data file cannot have the same ID, but cases across data files can share IDs

Page 5: Creating Dictionaries

Identification Items (continued)

Generally a questionnaire has geocodes or some other system of attributes that uniquely identifies each unit of enumeration

For censuses, these IDs are almost always geocodes
  • Example: Province – District – Division – Location – Sublocation – Enumeration Area – Household Number

For surveys, these ID sections are often more condensed
  • Example: Cluster – Household Number

Page 6: Creating Dictionaries

Identification Items (continued)

It is common for the “identification section” of a questionnaire to have questions that do not help uniquely identify a household

Examples include:
  • Enumerator number
  • Household type
  • Urban/rural status

Some people prefer to make the ID section as small as possible, picking the fewest items needed to ensure that each case is unique

Other people take a more liberal approach to ID fields, but CSPro does limit how long the ID field can be (length: 127)

Page 7: Creating Dictionaries

ID Examples

ID: Year
Item on Record: Winner of U.S. presidential election

1996William Jefferson Clinton
2000George Walker Bush
2004George Walker Bush
2008Barack Hussein Obama II

ID: State, county
Item on Record: County name

0101Autauga [Alabama]
5123Weston [Wyoming]
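The ID examples can be read back by slicing fixed column ranges. The sketch below is only an illustration in Python, not CSPro logic: the function names are made up for this example, and it assumes one record per case so that a repeated ID prefix means a duplicate case.

```python
from collections import Counter

# Lines as they would appear in the data file (the bracketed state names
# on the slide are annotations, not part of the data).
ELECTIONS = [
    "1996William Jefferson Clinton",
    "2000George Walker Bush",
    "2004George Walker Bush",
    "2008Barack Hussein Obama II",
]
COUNTIES = ["0101Autauga", "5123Weston"]


def split_county_line(line):
    """Split a county line into (state, county, name) by column position."""
    return line[0:2], line[2:4], line[4:].rstrip()


def duplicate_ids(lines, id_length):
    """Return the IDs that occur on more than one line of a single file."""
    counts = Counter(line[:id_length] for line in lines)
    return sorted(case_id for case_id, n in counts.items() if n > 1)


print(split_county_line(COUNTIES[1]))
print(duplicate_ids(ELECTIONS, 4))
```

Note that the state and county codes are kept as text, so the leading zero in "01" is preserved.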

Page 8: Creating Dictionaries

Dictionary Fundamentals

Identification Items: value(s) to uniquely identify a case

Levels: a group of one or several records

Records: a group of one or several items

Items: a value, or variable, that is numeric or alphanumeric

Subitems: part of an item

Value Sets: a listing of valid values for an item

Page 9: Creating Dictionaries

Dictionary Fundamentals (with a typical survey example)

Identification Items: value(s) to uniquely identify a case
  • Cluster number, household number

Levels: a group of one or several records
  • Household questionnaire, female questionnaires

Records: a group of one or several items
  • Housing characteristics, household roster, fertility questions

Items: a value, or variable, that is numeric or alphanumeric
  • Water access, roof type, …, sex, age, …, children ever born

Subitems: part of an item
  • Date of birth broken down into year, month, day

Value Sets: a listing of valid values for an item
  • Sex: Male (1), Female (2)

Page 10: Creating Dictionaries

Naming Dictionary Elements

Every element of a dictionary has two attributes, a name and a label

Name
  • You use the name to refer to the element while programming logic
  • Can be up to 32 characters but must start with a letter
  • Each dictionary element must have a unique name, and there are some names that are reserved for CSPro keywords

Label
  • A more thorough description of the element
  • Can be up to 255 characters and can contain punctuation and spacing
  • Often labels are the only documentation that anyone sees, so be sure to take care when creating labels
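The naming rules can be expressed as a small checker. This is a hedged Python sketch, not part of CSPro: the function names are invented, the restriction to letters, digits, and underscores is an assumption that the slide does not state, names are compared without regard to case (also an assumption), and the reserved words are passed in by the caller because the slide does not list them.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits + "_")  # assumed character set


def name_problems(name, existing_names, reserved_words):
    """Return a list of rule violations for a proposed element name."""
    problems = []
    if not name or name[0] not in string.ascii_letters:
        problems.append("must start with a letter")
    if len(name) > 32:
        problems.append("longer than 32 characters")
    if any(ch not in ALLOWED for ch in name):
        problems.append("contains a character other than a letter, digit, or underscore")
    upper = name.upper()
    if upper in {n.upper() for n in existing_names}:
        problems.append("already used by another element")
    if upper in {w.upper() for w in reserved_words}:
        problems.append("reserved word")
    return problems


def label_problems(label):
    """Labels may contain punctuation and spaces; only the length is checked."""
    return ["longer than 255 characters"] if len(label) > 255 else []


# "IF" stands in for a reserved keyword purely for illustration.
print(name_problems("P11_SEX", ["P10_RELATIONSHIP"], ["IF"]))
print(name_problems("11_SEX", [], ["IF"]))
```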

Page 11: Creating Dictionaries

Naming Dictionary Elements (continued)

If you plan on writing a lot of programming logic, consider how long you make the names for elements

Three common approaches exist for naming elements when the questionnaire has each question numbered
  • Approach 1: P10_RELATIONSHIP, P11_SEX, P12_AGE
  • Approach 2: RELATIONSHIP, SEX, AGE
  • Approach 3: P10, P11, P12

Remember that each element has a name and a label, and that they do not have to be (and probably should not be) the same value

Page 12: Creating Dictionaries

Levels

Applications can have one or two levels
  • Most applications are and should be one-level applications, though some applications are better designed as two-level applications

Each level usually has its own questionnaire associated with it
  • The top level can only have one questionnaire, while multiple questionnaires can exist at lower levels

Different sections on a questionnaire translate to multiple records, not multiple levels

How many levels do these questionnaires need?
  • Household questions, population questions, agriculture questions
  • Population questions, women of reproductive age questions

Page 13: Creating Dictionaries

Records

Records are groupings of items, and generally translate to sections of a questionnaire

Examples of records in a census might be: housing record, population records, death records, emigrant records, agriculture record

A record can be optional, e.g., death records

A record can occur more than once per questionnaire, e.g., population records

When deciding how many times a record can occur, select the maximum possible reasonable value

Page 14: Creating Dictionaries

Record Type

When a dictionary has more than one kind of record, each record must have a type value

The type value differentiates one record in a data file from the other records

You can specify particular values for the record types, or allow CSPro to assign these values automatically

If your dictionary has many records, you may need to increase the length of the record type (default length: 1)

Page 15: Creating Dictionaries

Record Type in the Data File

This data file has two records: winner of the presidential election (1) and loser of the presidential election (2)

The ID item is the year of the election

RT  ID    RECORD ITEMS
1   1996  William Jefferson Clinton
2   1996  Robert Joseph Dole
1   2000  George Walker Bush
2   2000  Albert Arnold Gore, Jr.
2   2004  John Forbes Kerry
1   2004  George Walker Bush

Note that the order of the different records does not matter
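To see why the order of different record types does not matter, the following Python sketch routes each line by its record type value. It is an illustration only: it assumes the record type occupies column 1 and the year columns 2 to 5 with no separating spaces, and the function names are made up.

```python
DATA = [
    "11996William Jefferson Clinton",
    "21996Robert Joseph Dole",
    "12000George Walker Bush",
    "22000Albert Arnold Gore, Jr.",
    "22004John Forbes Kerry",   # loser record appears before the winner
    "12004George Walker Bush",
]


def parse_record(line, rt_len=1, id_len=4):
    """Split one line into (record type, ID, record items)."""
    record_type = line[:rt_len]
    case_id = line[rt_len:rt_len + id_len]
    items = line[rt_len + id_len:].rstrip()
    return record_type, case_id, items


def winners_and_losers(lines):
    """Route each line to the winner or loser table based on its record type."""
    winners, losers = {}, {}
    for line in lines:
        record_type, year, name = parse_record(line)
        if record_type == "1":
            winners[year] = name
        elif record_type == "2":
            losers[year] = name
        else:
            raise ValueError(f"unknown record type {record_type!r}")
    return winners, losers


winners, losers = winners_and_losers(DATA)
print(winners["2004"], "/", losers["2004"])
```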

Page 16: Creating Dictionaries

Multiply-Occurring Records in the Data File

This data file has two records: winner of the presidential election (1, singly-occurring) and losers of the presidential election (2, multiply-occurring)

The ID item is the year of the election

RT  ID    RECORD ITEMS
1   1996  William Jefferson Clinton
2   1996  Robert Joseph Dole
2   1996  Henry Ross Perot
1   2000  George Walker Bush
2   2000  Albert Arnold Gore, Jr.
2   2000  Ralph Nader
2   2000  Patrick Joseph Buchanan

Note that the order of the multiply-occurring records DOES matter
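A short Python sketch of assembling cases from such a file shows where the order matters: each loser line becomes the next occurrence of the loser record for its case. The column layout (record type in column 1, year in columns 2 to 5) and the names `build_cases`, `winner`, and `losers` are assumptions made for this illustration.

```python
DATA = [
    "11996William Jefferson Clinton",
    "21996Robert Joseph Dole",
    "21996Henry Ross Perot",
    "12000George Walker Bush",
    "22000Albert Arnold Gore, Jr.",
    "22000Ralph Nader",
    "22000Patrick Joseph Buchanan",
]


def build_cases(lines):
    """Group lines into cases keyed by year, keeping loser occurrences in file order."""
    cases = {}
    for line in lines:
        record_type, year, name = line[0], line[1:5], line[5:].rstrip()
        case = cases.setdefault(year, {"winner": None, "losers": []})
        if record_type == "1":
            if case["winner"] is not None:
                raise ValueError(f"winner record occurs twice for {year}")
            case["winner"] = name
        elif record_type == "2":
            case["losers"].append(name)  # occurrence number = position in this list
        else:
            raise ValueError(f"unknown record type {record_type!r}")
    return cases


cases = build_cases(DATA)
print(cases["2000"]["losers"])
```

Swapping two loser lines in the file would change which candidate is occurrence 1, which is why the order of multiply-occurring records is significant.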

Page 17: Creating Dictionaries

Items

Items (variables) describe the data for each question on a census or survey

Items have several properties:
  • Length: How many characters are needed to faithfully store all possible values for this question?
  • Data Type: Will this item contain only numeric values, or will it also store words or sentences?
  • Item Type: Is this a subitem? (use selectively)
  • Occurrences: Does this item repeat several times? (use selectively)

Page 18: Creating Dictionaries

Items (continued)

Items have several properties:
  • Decimal: Will this item hold a decimal fraction? If so, how many digits are necessary to the right of the decimal point?
  • Decimal Character: If the numeric item holds a decimal fraction, should the item be saved to the data file with a decimal point? (This is a purely cosmetic indicator, though it does have bearing on the length of the item.)
  • Zero Fill: Do you want the unused spaces to the left of a number padded with zeroes?

Page 19: Creating Dictionaries

Item Representations

This is the number 3.14 stored using various item attributes

Data Type     Length  Decimal  Decimal Character  Zero Fill  Stored Value
Numeric       4       2        Yes                Yes        3.14
Numeric       6       2        Yes                Yes        003.14
Numeric       6       2        No                 Yes        000314
Numeric       6       2        Yes                No         3.14
Numeric       6       3        Yes                No         3.140
Alphanumeric  6                                              3.14
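The representations above can be reproduced with a few lines of Python. This is a sketch under stated assumptions rather than CSPro's own formatting code: it assumes numeric values without zero fill are right-justified and padded with blanks, that alphanumeric values are left-justified, and it handles non-negative numbers only. The function names are invented.

```python
def store_numeric(value, length, decimal, decimal_char, zero_fill):
    """Format a non-negative number as a fixed-length numeric item."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    text = f"{value:.{decimal}f}"          # keep `decimal` digits after the point
    if not decimal_char:
        text = text.replace(".", "")      # decimal point is implied, not stored
    if len(text) > length:
        raise ValueError(f"{text!r} does not fit in {length} characters")
    return text.rjust(length, "0" if zero_fill else " ")


def store_alpha(value, length):
    """Format text as a fixed-length alphanumeric item (assumed left-justified)."""
    if len(value) > length:
        raise ValueError(f"{value!r} does not fit in {length} characters")
    return value.ljust(length)


for args in [(4, 2, True, True), (6, 2, True, True), (6, 2, False, True),
             (6, 2, True, False), (6, 3, True, False)]:
    print(repr(store_numeric(3.14, *args)))
print(repr(store_alpha("3.14", 6)))
```

Printing with `repr` makes the padding blanks visible, which a plain listing of the stored values hides.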

Page 20: Creating Dictionaries

Subitems

People tend to overuse subitems, but they are useful in situations in which you intend to process data that makes up a small part of a larger number

Using logic you can access parts of items without having to make them subitems, but subitems can simplify processing, as well as satisfy value set checking while on a form

Example:

Item: Social Security Number, Length 11, comprised of three subitems:
  • Area Number, digits 1-3
  • Group Number, digits 5-6
  • Serial Number, digits 8-11
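In terms of positions, a subitem is just a slice of its parent item. The Python sketch below illustrates the Social Security Number example with a made-up number; positions are 1-based and inclusive, as on the slide, and the helper name `subitem` is invented for this example.

```python
def subitem(value, first, last):
    """Return characters first..last (1-based, inclusive) of an item's value."""
    return value[first - 1:last]


SSN = "123-45-6789"  # made-up value, length 11 including the two hyphens

area_number = subitem(SSN, 1, 3)
group_number = subitem(SSN, 5, 6)
serial_number = subitem(SSN, 8, 11)
print(area_number, group_number, serial_number)
```

Positions 4 and 7 hold the hyphens, which is why the subitems skip them.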

Page 21: Creating Dictionaries

Value Sets

Value sets are optional and tell CSPro what values are considered acceptable for an item

If no value set is present, CSPro will accept all values for the item (within limits; i.e., numeric fields cannot contain letters)

If an item has multiple value sets, CSPro will use the first one to check the validity of keyed data

Using logic the programmer can change what value set is active for an item, and can even generate a value set dynamically

Value sets can contain discrete values, and for numeric items, value sets can contain ranges

Value set ranges can overlap; this is common for tabulation applications

If many items share the same possible values, you can link the value sets so that modifying the value set of one item alters the value set for linked items

Page 22: Creating Dictionaries

Value Set Examples

Sex:
Label     From  To
Male      1
Female    2

Age:
Label     From  To
Minor     0     17
Teenager  13    19
Adult     18    99
Retiree   67    99

The from/to values of each value set are what is stored in the keyed data file, not the value set labels
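The two value sets can be modeled as lists of (label, from, to) entries. The Python sketch below is an illustration, not CSPro's implementation; the function names are invented, and a missing "to" is treated as a single discrete value. It shows that a value is valid if any entry covers it, and that overlapping ranges can give one value several labels.

```python
SEX = [("Male", 1, None), ("Female", 2, None)]
AGE = [("Minor", 0, 17), ("Teenager", 13, 19), ("Adult", 18, 99), ("Retiree", 67, 99)]


def matching_labels(value, value_set):
    """Return the labels of every entry whose from/to range contains the value."""
    labels = []
    for label, low, high in value_set:
        upper = low if high is None else high  # discrete value: from == to
        if low <= value <= upper:
            labels.append(label)
    return labels


def is_valid(value, value_set):
    """A value is acceptable if at least one entry covers it."""
    return bool(matching_labels(value, value_set))


print(matching_labels(18, AGE))
print(is_valid(3, SEX))
```

Only the numeric code (for example 1 or 2 for sex) would be written to the data file; the labels exist in the dictionary alone.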

Page 23: Creating Dictionaries

Special Values

CSPro has three “special values” that describe certain kinds of data

Not Applicable: the item is blank (e.g., date of menarche would not be asked of men)

Missing: the codebook had a value for missing (or not stated) and you assign this value to be missing

Default: the item has an invalid value (e.g., your program logic assigned a three-digit value to a two-digit field)

By default CSPro ensures that keyed data fits in the value set and is not blank, but if desired CSPro can accept blank data or out of range data

Page 24: Creating Dictionaries

Documenting Dictionary Elements

To the left of every element in the dictionary editor is a small gray box under the column heading N

Clicking on this box brings up a field in which you can write notes about the dictionary element

These notes are stored in the dictionary file but are not visible during data entry

Consider making use of these notes, especially when working with partners on an application

Page 25: Creating Dictionaries

Relative Positioning

By default, CSPro will automatically assign the starting position (column number) of each item in your dictionary

When creating a new dictionary, it is best to let CSPro generate these values

Inserting an item in between other items, or modifying the length of an item, will cause all the other items’ starting positions to automatically change

There will be no gaps in the data file

The default order in the data file will be: record type, ID items, record items in the order they appear on the screen
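Relative positioning amounts to a running sum of item lengths. The Python sketch below illustrates this with the election example; the item names, the name length of 30, and the added PARTY item are hypothetical, and the function is not CSPro code.

```python
def assign_positions(items):
    """Assign 1-based starting columns so that items follow each other with no gaps."""
    positions = {}
    start = 1
    for name, length in items:
        positions[name] = start
        start += length
    return positions


# Order: record type, ID items, then record items.
layout = [("RECORD_TYPE", 1), ("YEAR", 4), ("CANDIDATE_NAME", 30)]
print(assign_positions(layout))

# Inserting an item in the middle shifts every item after it.
layout_with_party = [("RECORD_TYPE", 1), ("YEAR", 4), ("PARTY", 1), ("CANDIDATE_NAME", 30)]
print(assign_positions(layout_with_party))
```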

Page 26: Creating Dictionaries

Absolute Positioning

However, if you are creating a dictionary to match an existing data file, it may be necessary to select absolute positioning

With absolute positioning, you must specify the starting position (column number) of each item in your dictionary

It is your responsibility to make sure that items do not overlap

Gaps can exist in a data file
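With absolute positioning the overlap check falls to the dictionary author. A minimal Python sketch of such a check follows; the layout values are hypothetical and the function is an illustration, not a CSPro feature. It reports items that start before the previous item has ended and lists unused column ranges.

```python
def check_layout(items):
    """Return (overlapping item names, gaps) for items given as (name, start, length)."""
    overlaps, gaps = [], []
    next_free = 1  # first column not yet claimed by any item
    for name, start, length in sorted(items, key=lambda item: item[1]):
        if start < next_free:
            overlaps.append(name)
        elif start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, start + length)
    return overlaps, gaps


# Hypothetical layout: name first, then year, then record type, one blank column between each.
GOOD = [("CANDIDATE_NAME", 1, 30), ("YEAR", 32, 4), ("RECORD_TYPE", 37, 1)]
BAD = [("CANDIDATE_NAME", 1, 30), ("YEAR", 30, 4)]

print(check_layout(GOOD))
print(check_layout(BAD))
```

Gaps are reported but are not errors, since gaps can legitimately exist in a data file.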

Page 27: Creating Dictionaries

Relative vs. Absolute Example

Relative:

11996William Jefferson Clinton
21996Robert Joseph Dole

Absolute (one of many possibilities)

William Jefferson Clinton 1996 1
Robert Joseph Dole        1996 2

Page 28: Creating Dictionaries

Modifying the Dictionary

Before a data entry operation begins, feel free to modify the dictionary

CSPro will detect changes between the dictionary and forms, so if you rename or delete a dictionary item, the field on the form will also be renamed, or will be removed from the form

However, once some data exists using a dictionary format, modifying the dictionary must be done with great care

In all cases, make backups of your dictionary before any modifications so that you always have a dictionary that can read data entered at any point in the data entry operation

Page 29: Creating Dictionaries

Adding Fields to the Dictionary

If, after the data entry process has begun, some fields must be added to the dictionary, one option is to simply add them to the end of any given record

This means that, while the data that already exists will have blanks for these new values, the data does not have to be reformatted and can be read by the new dictionary

However, if adding the fields to the end of a record is not practical, you can insert them in the record, but then all existing data must be reformatted to the new dictionary format
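The difference between appending and inserting a field can be seen by reading one old line with two new layouts. This Python sketch is an illustration only: the PARTY item, the item lengths, and the function names are hypothetical, and layouts are given as name: (start, length) with 1-based starts.

```python
OLD_LINE = "11996William Jefferson Clinton"  # written before PARTY existed

# PARTY appended to the end of the record: existing columns are unchanged.
APPENDED = {"RECORD_TYPE": (1, 1), "YEAR": (2, 4), "NAME": (6, 30), "PARTY": (36, 1)}
# PARTY inserted before NAME: NAME now starts one column later.
INSERTED = {"RECORD_TYPE": (1, 1), "YEAR": (2, 4), "PARTY": (6, 1), "NAME": (7, 30)}


def read_item(line, start, length):
    """Read a fixed-length item; columns past the end of the line read as blanks."""
    return line[start - 1:start - 1 + length].ljust(length)


def read_record(line, layout):
    """Read every item of a layout from one line, trimming padding blanks."""
    return {name: read_item(line, start, length).strip()
            for name, (start, length) in layout.items()}


print(read_record(OLD_LINE, APPENDED))  # PARTY is blank, everything else intact
print(read_record(OLD_LINE, INSERTED))  # PARTY and NAME are misread
```

With the inserted layout the first letter of the name is read as the party, which is why existing data must be reformatted in that case.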

Page 30: Creating Dictionaries

Modifying Item Lengths

If, after the data entry process has begun, the length of some items will be increased, you must reformat the existing data files

However, if the length of some items will be decreased, it may be possible to use absolute positioning to make your old data files readable

Likewise, deleting an item from the dictionary can be done in a way that does not require reformatting, but again absolute positioning must be used

Page 31: Creating Dictionaries

Dictionary Macros

By right-clicking on the dictionary name in the tree you can access the undocumented dictionary macros

Names and labels of dictionary items, or value sets, can be copied to Excel format, modified in Excel, and then pasted back to CSPro

This can be particularly useful if you want coworkers who do not know how to use CSPro to help with the creation of the dictionary, perhaps by adding values to the codebook (value sets)