DATA PREPARATION | FREQUENCY DISTRIBUTION | CROSS-TABULATION

DATA COLLECTION PREPARATION

Resuello, Ron Jacob O.
Gamit, Ynnah Ysabelle V.


DATA

• Data is anything that has been produced or created during research. Primary data is data that you have created yourself, but your data sets can also contain data that has been created by other researchers (secondary data).

WHAT IS DATA COLLECTION?

• It is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

METHODS OF DATA COLLECTION

• A. Interview (Direct) Method – a method of person-to-person exchange between the interviewer and the interviewee.


POSITIVE

• 1) It provides consistent and more precise information, since clarifications can be asked for and given during the interview.

• 2) Questions may be repeated or modified to suit the interviewee’s level of understanding.


NEGATIVE

• 1) Time-consuming

• 2) Expensive

• 3) Limited field coverage


• B. Questionnaire (Indirect) Method – In this method, written responses are given to prepared questions. A questionnaire is used to elicit answers to the problems of the study. Questionnaires may be mailed or hand-carried.


POSITIVE

• 1) Inexpensive

• 2) Can cover a wide area in a shorter span of time.

• 3) Respondents may feel a greater sense of freedom to express views and opinions because their anonymity is maintained.


NEGATIVE

• 1) There’s a strong possibility of non-response, especially when questionnaires are mailed.

• 2) Questions not easily understood may not be answered.


• C. Registration Method – this method of gathering information is enforced by law.

Examples: registration of births, deaths, vehicles, and licenses; the number of tourists in a city.


POSITIVE

• 1) Information is kept systematized.

• 2) Information is made available to the public.


• D. Observation Method – the investigator observes the behavior of the subject/respondent. It is used when the subjects cannot talk or write.

POSITIVE

• The recording of behavior at the appropriate time and situation is made possible.


• E. Experiment Method – this method is used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions. It is usually used in scientific research.

DATA COLLECTION PREPARATION

1. MAKE LOGISTICS ARRANGEMENTS.

• In order to make logistics arrangements, you will have to (1) set up a central local headquarters and (2) contact the local authorities in the areas where the survey will be carried out.

2. PREPARE THE QUESTIONNAIRE AND TRAINING MATERIALS.

• You must pre-test the questionnaire (including any translated version) in the field.


Specifically, the pre-test should answer the following questions:

• Are respondents willing to answer questions in the way you have asked them?

• Are any of the questions particularly difficult to answer or do they address sensitive issues?

• Are the questions well understood by the respondents?

• Is it necessary to create new codes for common answers which were not included in the original questionnaire?

3. CHOOSING AND PREPARING THE EQUIPMENT

• Equipment must be purchased well in advance of the survey. Examples include weighing scales and length/height boards.

4. QUESTIONNAIRE CHECKING

• Questionnaire checking involves eliminating unacceptable questionnaires. A questionnaire may be unacceptable because it is incomplete, the instructions were not followed, the answers show little variance, pages are missing, it was returned after the cut-off date, or the respondent was not qualified.
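The screening criteria above can be written down as explicit rules. The Python sketch below is illustrative only: the record fields (`answers`, `pages_received`, `received`, `qualified`), the three-page form, and the cut-off date are hypothetical; "little variance" is approximated as the same answer given to every item; and "instructions not followed" is left out because it usually needs a human reader.

```python
from datetime import date
from statistics import pvariance

# Hypothetical survey settings, for illustration only.
TOTAL_PAGES = 3
CUT_OFF = date(2024, 3, 31)

def rejection_reasons(q):
    """Return the reasons a questionnaire is unacceptable (an empty list means keep it)."""
    reasons = []
    answers = q["answers"]
    if any(a is None for a in answers):
        reasons.append("incomplete")
    if q["pages_received"] < TOTAL_PAGES:
        reasons.append("missing pages")
    if q["received"] > CUT_OFF:
        reasons.append("past cut-off date")
    if not q["qualified"]:
        reasons.append("respondent not qualified")
    given = [a for a in answers if a is not None]
    # "Little variance" approximated as the same rating given to every item.
    if len(given) > 1 and pvariance(given) == 0:
        reasons.append("little variance")
    return reasons

# Hypothetical returned questionnaires.
questionnaires = [
    {"id": 1, "answers": [4, 2, 5, 3], "pages_received": 3,
     "received": date(2024, 3, 20), "qualified": True},
    {"id": 2, "answers": [3, 3, 3, 3], "pages_received": 3,
     "received": date(2024, 3, 22), "qualified": True},
    {"id": 3, "answers": [5, None, 4, 2], "pages_received": 2,
     "received": date(2024, 4, 2), "qualified": True},
]

accepted = [q["id"] for q in questionnaires if not rejection_reasons(q)]
print("accepted:", accepted)
for q in questionnaires:
    print(q["id"], rejection_reasons(q))
```

Keeping the list of reasons, rather than a simple keep/drop flag, makes it easier to report why questionnaires were eliminated.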

5. COLLECTING DATA AND ANALYSIS

• Editing: Editing looks to correct illegible, incomplete, inconsistent and ambiguous answers.

• Coding: Coding typically assigns alphabetic or numeric codes to answers that do not already have them so that statistical techniques can be applied.

• Transcribing: Transcribing data involves transferring data so as to make it accessible to people or applications for further processing.

• Cleaning: Cleaning reviews data for inconsistencies. Inconsistencies may arise from faulty logic, out-of-range values, or extreme values.

• Statistical adjustments: Statistical adjustments apply to data that require weighting and scale transformations.

• Analysis strategy selection: Finally, selection of a data analysis strategy is based on earlier work in designing the research project but is finalized after consideration of the characteristics of the data that has been gathered.
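A minimal sketch of the coding and cleaning steps, in Python. The codebook (1 = male, 2 = female, 9 = no answer), the valid age range of 18 to 99, and the logic check on children are hypothetical rules chosen for illustration; flagged records would normally be checked against the original questionnaire rather than simply deleted.

```python
# Hypothetical codebook: text answers -> numeric codes.
SEX_CODES = {"male": 1, "female": 2}
NO_ANSWER = 9

def code_sex(answer):
    """Assign a numeric code to a text answer; blank or unrecognized -> 9."""
    if answer is None:
        return NO_ANSWER
    return SEX_CODES.get(answer.strip().lower(), NO_ANSWER)

def cleaning_problems(record):
    """Flag out-of-range values and logically inconsistent answers."""
    problems = []
    if not 18 <= record["age"] <= 99:          # hypothetical valid range
        problems.append("age out of range")
    # Faulty logic: a child's age is reported although no children are reported.
    if record["num_children"] == 0 and record["eldest_child_age"] is not None:
        problems.append("child's age given but no children reported")
    return problems

# Hypothetical raw records.
raw = [
    {"id": 1, "sex": "Female", "age": 34, "num_children": 2, "eldest_child_age": 8},
    {"id": 2, "sex": "male ", "age": 340, "num_children": 0, "eldest_child_age": None},
    {"id": 3, "sex": None, "age": 41, "num_children": 0, "eldest_child_age": 12},
]

for r in raw:
    r["sex_code"] = code_sex(r["sex"])
    print(r["id"], r["sex_code"], cleaning_problems(r))
```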

DATA PREPARATION

WHAT IS DATA PREPARATION?

• Data preparation is about constructing a dataset from one or more data sources to be used for exploration and modeling. It is good practice to start with an initial dataset to get familiar with the data, discover first insights, and gain a good understanding of any possible data quality issues.


• Organizing the data correctly can save a lot of time and prevent mistakes.

• Most researchers use a spreadsheet, database, or statistical analysis program (e.g., Microsoft Excel or SPSS) that they can format to fit their needs in order to organize their data effectively.

• Once the data have been entered, it is crucial that the researcher check the data for accuracy.

STEPS IN DATA PREPARATION

1. CHECKING THE DATA FOR ACCURACY

• As soon as data is received, you should screen it for accuracy. In some circumstances, doing this right away will allow you to go back to the sample to clarify any problems or errors.

• Are the responses legible/readable?

• Are all important questions answered?

• Are the responses complete?

• Is all relevant contextual information included (e.g., date, time, place, researcher)?

2. EDITING

• Editing detects errors and omissions and corrects them as far as possible.

• Purpose:

• To ensure accuracy

• To bring about consistency with other information

• To make sure the data is uniformly entered

• To make sure the data is complete and arranged to simplify coding and tabulation


3. ENTERING THE DATA INTO THE COMPUTER

• The analyst should use a procedure called double entry: the data are entered a second time, each second entry is checked against the first, and any discrepancy is resolved against the original record.

• This double-entry procedure significantly reduces entry errors.

• An alternative is to enter the data once and set up a procedure for checking the data for accuracy. For example, you might spot-check records on a random basis.

• You can also use various programs to summarize the data, which allow you to check that all the data are within acceptable limits and boundaries.
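Double entry and random spot checking can be sketched as follows. This assumes each keying of the data is stored as a Python dictionary of records keyed by respondent ID; the field names and the 10% spot-check rate are hypothetical.

```python
import random

def entry_discrepancies(first, second):
    """Compare two keyings of the same records, field by field.

    Returns (id, field, first_value, second_value) for every mismatch, so each
    one can be resolved against the original paper questionnaire.
    """
    mismatches = []
    for rid in sorted(first.keys() | second.keys()):
        a, b = first.get(rid, {}), second.get(rid, {})
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                mismatches.append((rid, field, a.get(field), b.get(field)))
    return mismatches

def spot_check_sample(record_ids, rate=0.10, seed=1):
    """Single-entry alternative: pick a random subset of records to re-verify."""
    ids = sorted(record_ids)
    k = max(1, round(len(ids) * rate))
    return random.Random(seed).sample(ids, k)

# Hypothetical first and second keying of two records.
entry_1 = {101: {"age": 34, "cars": 2}, 102: {"age": 27, "cars": 1}}
entry_2 = {101: {"age": 43, "cars": 2}, 102: {"age": 27, "cars": 1}}

print(entry_discrepancies(entry_1, entry_2))  # [(101, 'age', 34, 43)]
print(spot_check_sample(range(1, 51)))        # a list of 5 record IDs to re-check
```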


4. DATA TRANSFORMATIONS

• Once the data have been entered, it is almost always necessary to transform the raw data into variables that are usable in the analyses.

• Missing values: Many analysis programs automatically treat blank values as missing. In others, you need to designate specific values to represent missing values.

• Item reversals: On scales and surveys, we sometimes use reversal items to help reduce the possibility of a response set. When you analyze the data, you want all scores for scale items to be in the same direction, where high scores mean the same thing and low scores mean the same thing. In these cases, you have to reverse the ratings for some of the scale items.


• Scale totals: Once you've transformed any individual scale items, you will often want to add or average across individual items to get a total score for the scale.

• Categories: For many variables, you will want to collapse them into categories. For instance, you may want to collapse income estimates (in dollar amounts) into income ranges.
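The four transformations can be sketched in Python under some stated assumptions: a 1-to-5 rating scale, the code 9 for a missing answer, item `q3` as the reversal item, and arbitrary income ranges. On a scale running from a minimum to a maximum value, a reversed score is (minimum + maximum) - score, so on a 1-to-5 scale a 1 becomes 5 and a 4 becomes 2.

```python
MISSING = 9               # hypothetical code for "no answer"
SCALE_MIN, SCALE_MAX = 1, 5
REVERSED_ITEMS = {"q3"}   # hypothetical reversal item

def reverse(score):
    """Reverse a rating so that high scores mean the same thing on every item."""
    return SCALE_MIN + SCALE_MAX - score

def scale_total(responses):
    """Treat 9 as missing, reverse the reversal items, then add the items.

    Returns None if any item is missing (one possible policy among several).
    """
    items = []
    for name, score in responses.items():
        if score == MISSING:
            return None
        items.append(reverse(score) if name in REVERSED_ITEMS else score)
    return sum(items)

def income_range(amount):
    """Collapse an income amount into a category (ranges are hypothetical)."""
    if amount < 10_000:
        return "under 10,000"
    if amount < 30_000:
        return "10,000-29,999"
    return "30,000 and over"

print(scale_total({"q1": 4, "q2": 5, "q3": 2}))  # 4 + 5 + (6 - 2) = 13
print(scale_total({"q1": 4, "q2": 9, "q3": 2}))  # None, because q2 is missing
print([income_range(x) for x in (8_500, 12_000, 45_000)])
```

Averaging instead of summing would divide the total by the number of items; how to handle a partly missing scale (dropping the case, as here, or averaging the answered items) is a decision for the study design.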

FREQUENCY DISTRIBUTION TABLE


• Frequency tells you how often something occurs. The frequency of an observation in statistics tells you the number of times the observation occurs in the data.

• Frequency distribution tables can show either categorical variables (sometimes called qualitative variables) or quantitative variables (sometimes called numeric variables). You can think of categorical variables as being categories (like eye color or brand of dog food) and quantitative variables as being numbers.

GROUPED AND UNGROUPED DATA

• UNGROUPED FREQUENCY DISTRIBUTION

• The data obtained in original form are called raw data or ungrouped data.

• In an ungrouped frequency distribution, each distinct value is listed individually, in order, together with its frequency.

UNGROUPED DATA

• In each of 20 homes, people were asked how many cars were registered to their households. The results were recorded as follows:

• 1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0

Number of cars (x)   Frequency (f)
0                    4
1                    6
2                    5
3                    3
4                    2
Total                20

Table 1. Frequency table for the number of cars registered in each household
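Table 1 can be reproduced with Python's `collections.Counter`, which counts how many times each value occurs. In this sketch the tally column is drawn as a plain run of strokes rather than grouped in fives.

```python
from collections import Counter

# Number of cars registered to each of the 20 households (data from the text).
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

freq = Counter(cars)  # maps each value x to its frequency f

print(f"{'x':<4}{'Tally':<10}f")
for x in sorted(freq):
    print(f"{x:<4}{'|' * freq[x]:<10}{freq[x]}")
print(f"{'Total':<14}{sum(freq.values())}")
```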


GROUPED DATA

• GROUPED DATA: values that fall within a moderate range are gathered together into a class, and each class is compared with classes of a similar range.

• CLASS FREQUENCY: number of observations belonging to a class interval.

• CLASS INTERVAL: the grouping defined by a lower limit and an upper limit (e.g., 360–369).

• CLASS BOUNDARIES: the lower and upper true limits of a class. For data recorded to the nearest whole number, they lie 0.5 below the lower limit and 0.5 above the upper limit (e.g., 359.5–369.5).

• CLASS MARKS: the midpoint of each class interval, obtained by averaging the lower class limit and the upper class limit (e.g., (360 + 369) / 2 = 364.5).

• CLASS SIZE: the difference between the upper class boundary and the lower class boundary of a class interval (e.g., 369.5 - 359.5 = 10).


• Thirty AA batteries were tested to determine how long they would last. The results, to the nearest minute, were recorded as follows:

• 423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363, 391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410, 419, 386, 390


• The lowest value is 363 and the highest is 431.

• Using the given data and a class interval (width) of 10, the first class is 360 to 369, which includes 363 (the lowest value). There should always be enough class intervals to include the highest value; here the last class, 430 to 439, includes 431.

• The number of class intervals (nc) should ideally be between 5 and 20; this distribution uses 8.


Battery life, minutes (x)   Frequency (f)
360–369                     2
370–379                     3
380–389                     5
390–399                     7
400–409                     5
410–419                     4
420–429                     3
430–439                     1
Total                       30

Table 3. Life of AA batteries, in minutes

CLASS BOUNDARY

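Class limits   CB            CM      <CF   >CF
360–369        359.5–369.5   364.5   2     30
370–379        369.5–379.5   374.5   5     28
380–389        379.5–389.5   384.5   10    25
390–399        389.5–399.5   394.5   17    20
400–409        399.5–409.5   404.5   22    13
410–419        409.5–419.5   414.5   26    8
420–429        419.5–429.5   424.5   29    4
430–439        429.5–439.5   434.5   30    1

CB = class boundaries; CM = class mark; <CF = "less than" cumulative frequency; >CF = "greater than" cumulative frequency.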
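The grouped distribution, class boundaries, class marks, and both cumulative frequency columns can be computed directly from the 30 battery lives. This Python sketch hard-codes the class width of 10 and the starting limit of 360 used in the text; <CF adds the frequencies from the first class down to the current one, and >CF counts the observations in the current class and all classes above it.

```python
# Battery life in minutes for the 30 AA batteries (data from the text).
lives = [423, 369, 387, 411, 393, 394, 371, 377, 389, 409,
         392, 408, 431, 401, 363, 391, 405, 382, 400, 381,
         399, 415, 428, 422, 396, 372, 410, 419, 386, 390]

WIDTH = 10    # class interval used in the text
START = 360   # first lower class limit; the class 360-369 contains the minimum, 363

rows = []
lower = START
while lower <= max(lives):          # keep adding classes until the maximum is covered
    upper = lower + WIDTH - 1
    rows.append({
        "limits": (lower, upper),
        "f": sum(lower <= x <= upper for x in lives),  # class frequency
        "cb": (lower - 0.5, upper + 0.5),              # class boundaries (true limits)
        "cm": (lower + upper) / 2,                     # class mark (midpoint)
    })
    lower += WIDTH

n = len(lives)
running = 0
for row in rows:
    row["less_cf"] = running + row["f"]   # <CF: observations in this class or below
    row["more_cf"] = n - running          # >CF: observations in this class or above
    running += row["f"]

for row in rows:
    lo, hi = row["limits"]
    lb, ub = row["cb"]
    print(f"{lo}-{hi}  f={row['f']}  CB={lb}-{ub}  CM={row['cm']}  "
          f"<CF={row['less_cf']}  >CF={row['more_cf']}")
```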

CROSS-TABULATION


• Cross-tabulation is a tool that allows you to compare the relationship between two variables.

• A cross-tabulation is a two (or more) dimensional table that records the number (frequency) of respondents that have the specific characteristics described in the cells of the table.

• The Chi-square statistic is the primary statistic used for testing the statistical significance of the cross-tabulation table. Chi-square tests whether or not the two variables are independent.
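A cross-tabulation is a count of respondents for every combination of two categorical answers. The sketch below builds one with the Python standard library; the survey responses (sex and preferred payment method) are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses: (sex, preferred payment method).
responses = [
    ("Male", "Cash"), ("Female", "Card"), ("Female", "Card"), ("Male", "Cash"),
    ("Male", "Card"), ("Female", "Cash"), ("Female", "Card"), ("Male", "Cash"),
]

def crosstab(pairs):
    """Count respondents in each (row category, column category) cell."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = crosstab(responses)
print("", *cols, "Total", sep="\t")
for r, counts in zip(rows, table):
    print(r, *counts, sum(counts), sep="\t")
```

In practice, `pandas.crosstab` (with `margins=True` for row and column totals) or the SPSS Crosstabs procedure produces the same kind of table.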

CROSS TABULATION WITH CHI SQUARE ANALYSIS

CHI SQUARE ANALYSIS

• The chi-square statistic is computed by first computing a chi-square value for each individual cell of the table and then summing them up to form a total Chi-square value for the table. The chi-square value for the cell is computed as:

$$\chi^2_{\text{cell}} = \frac{(O - E)^2}{E}$$

where O is the observed value and E is the expected value for the cell.
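The cell formula needs an expected value for each cell. In the standard Pearson chi-square test of independence, the expected count for the cell in row i and column j is the row total R_i times the column total C_j divided by the grand total N. The cell values are summed, and the total is compared with a chi-square distribution with (r - 1)(c - 1) degrees of freedom for a table with r rows and c columns:

```latex
E_{ij} = \frac{R_i \, C_j}{N}
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\qquad
\mathrm{df} = (r - 1)(c - 1)
```

For example, in a hypothetical 2 × 2 table with counts 3 and 1 in the first row and 1 and 3 in the second, every row and column total is 4 and N = 8. Each expected count is 4 × 4 / 8 = 2, each cell contributes (±1)² / 2 = 0.5, and χ² = 2.0 with 1 degree of freedom.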

REMEMBER

• The chi-square statistic, along with the associated probability of chance observation, may be computed for any table. If the relationship observed in the table would occur with very low probability (say, 5% or less) when the variables are actually independent, then we say that the results are “statistically significant” at the “.05 or 5% level.” This means that the observed relationship is unlikely to be due to chance alone, so we reject the hypothesis that the variables are independent. It does not give the probability that the variables are independent.
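The whole test can be sketched in a few lines of Python. The 2 × 2 table is hypothetical. The p-value uses the fact that, for 1 degree of freedom, P(χ² ≥ x) = erfc(√(x/2)); this shortcut is valid only for df = 1, so the sketch refuses larger tables.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E = R_i * C_j / N
            stat += (observed - expected) ** 2 / expected  # (O - E)^2 / E
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def p_value(stat, df):
    """Upper-tail probability; this closed form holds only for 1 degree of freedom."""
    if df != 1:
        raise ValueError("this sketch only handles df = 1 (a 2 x 2 table)")
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical 2 x 2 table: rows = group A / group B, columns = yes / no.
table = [[30, 10],
         [15, 25]]

stat, df = chi_square(table)
p = p_value(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
print("significant at the .05 level" if p < 0.05 else "not significant at the .05 level")
```

For tables of any size, `scipy.stats.chi2_contingency` performs the same test. By default it applies Yates' continuity correction when there is 1 degree of freedom, so its statistic for a 2 × 2 table will be smaller than the uncorrected value computed here.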

SPSS TUTORIAL FOR CROSS TABULATION

What is SPSS?

"SPSS is a comprehensive system for analyzing data. SPSS is the acronym of Statistical Package for the Social Sciences.

SPSS can take data from almost any type of file and use them to generate tabulated reports, charts, and plots of distributions and trends, descriptive statistics, and complex statistical analysis."

THANK YOU AND GOD BLESS!