Unit – 4 DATA PREPARATION AND ANALYSIS


Page 1: DATA PREPARATION AND ANALYSIS

Unit – 4

DATA PREPARATION AND ANALYSIS

Page 2: DATA PREPARATION AND ANALYSIS

Data Preparation

The data collected from respondents are generally not in a form that can be analyzed directly. After the responses are recorded or received, the next stage is data preparation, i.e. making the data amenable to appropriate analysis.

Data preparation includes editing, coding, and data entry. It is the activity that ensures the accuracy of the data and their conversion from raw form to reduced and classified forms that are more appropriate for analysis. Preparing a descriptive statistical summary is another preliminary step leading to an understanding of the collected data.

Page 3: DATA PREPARATION AND ANALYSIS

Data Preparation

•Editing
•Coding
•Validation of data
•Data entry
•Classification
•Tabulation

Page 4: DATA PREPARATION AND ANALYSIS

EDITING

The customary first step in analysis is to edit the raw data. Editing detects errors and omissions, corrects them when possible, and certifies that maximum data quality standards are achieved. The editor's purpose is to guarantee that data are:

1. Accurate.
2. Consistent with the intent of the question and other information in the survey.
3. Uniformly entered.
4. Complete.
5. Arranged to simplify coding and tabulation.

Page 5: DATA PREPARATION AND ANALYSIS

Editing

Field Editing

Central Editing

In large projects, field editing review is a responsibility of the field supervisor. It should be done soon after the data have been gathered. During the stress of data collection in a personal interview, or of paper-and-pencil recording in an observation, the researcher often uses ad hoc abbreviations and special symbols. Soon after the interview, experiment, or observation, the investigator should review the reporting forms.

Page 6: DATA PREPARATION AND ANALYSIS

Central Editing

It should take place when all forms or schedules have been completed and returned to the office.

This type of editing implies that all forms should get a thorough editing by a single editor in a small study and by a team of editors in case of a large inquiry.

Editor(s) may correct obvious errors such as an entry in the wrong place, an entry recorded in months when it should have been recorded in weeks, and the like. In case of inappropriate or missing replies, the editor can sometimes determine the proper answer by reviewing the other information in the schedule. At times, the respondent can be contacted for clarification.

Page 7: DATA PREPARATION AND ANALYSIS

Be familiar with instructions given to interviewers and coders.

Do not destroy, erase, or make illegible the original entry by the interviewer; original entries should remain legible.

Make all editing entries on an instrument in some distinctive color and in a standardized form.

Initial all answers changed or supplied.

Place initials and date of editing on each instrument completed.

Page 8: DATA PREPARATION AND ANALYSIS

CODING

Coding refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes.

Numeric coding simplifies the researcher's task in converting a nominal variable, like gender, to a "dummy variable". Statistical software can also use alphanumeric codes, as when we use M and F, or other letters, in combination with numbers and symbols, for gender.

Page 9: DATA PREPARATION AND ANALYSIS

CODING

Coding involves assigning numbers or other symbols to answers so that the responses can be grouped into a limited number of categories.

In coding, categories are the partitions of a data set of a given variable (e.g., if the variable is gender, the partitions are male and female).

Both closed- and open-response questions must be coded.

Page 10: DATA PREPARATION AND ANALYSIS

Some examples of pre-coded questions

Question: How often these days do you go to the cinema?
Answers and codes:
More than once a week = 1
Once a week = 2
Once a fortnight = 3
Three or four times a year = 4
Less often = 5
Never = 6

Question: Which type(s) of wristwatch do you own?
Answers and codes:
Hand-wound = 1
Automatic = 2
Electronic = 3

Question: Which battery-operated equipment do you have at home?
Answers and codes:
Torch = 1
Transistor = 2
Other (specify) = 3
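As a minimal sketch of how such a pre-coded question might be handled in software, the cinema codes can be stored in a lookup table and applied to raw answers. The names `CINEMA_CODES` and `code_responses`, the sample answers, and the choice of `None` for unrecognised answers are illustrative assumptions, not part of the slides.

```python
# Hypothetical codebook for the cinema question; the codes 1-6 follow the slide.
CINEMA_CODES = {
    "More than once a week": 1,
    "Once a week": 2,
    "Once a fortnight": 3,
    "Three or four times a year": 4,
    "Less often": 5,
    "Never": 6,
}


def code_responses(responses, codebook, missing_code=None):
    """Map each raw answer to its numeric code.

    Answers that are not in the codebook receive missing_code so that
    they can be reviewed during editing instead of being silently dropped.
    """
    return [codebook.get(answer.strip(), missing_code) for answer in responses]


# Made-up raw answers, including one that is not a listed category.
raw_answers = ["Once a week", "Never", " Once a fortnight ", "Twice a day"]
coded_answers = code_responses(raw_answers, CINEMA_CODES)
print(coded_answers)
```

Keeping the codebook separate from the data makes it easy to hand the same instructions to every coder.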

Page 11: DATA PREPARATION AND ANALYSIS

VALIDATION OF DATA

After the data are coded, they are validated for data entry errors. The data are then used for further analysis. The purpose of validation is to confirm that the data have been collected as per the specifications, in the prescribed format or questionnaire.

For example, if the respondent is asked to rate a particular aspect on a scale of 1 to 7, then the only valid responses are 1, 2, ..., 7. Any other number entered is not considered valid. In validation, such data are restricted to the integers between 1 and 7, which minimizes errors. Other validations include checking that age lies within a limit such as 100 and that dates such as birth dates and joining dates are not in the future.
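The checks described here can be sketched as a small validation routine. The field names (`rating`, `age`, `birth_date`), the function name `validate_record`, and the sample records are assumptions made for illustration only.

```python
from datetime import date


def validate_record(record, today=None):
    """Return a list of validation error messages for one coded record."""
    today = today or date.today()
    errors = []

    # A rating must be an integer between 1 and 7.
    rating = record.get("rating")
    if not (isinstance(rating, int) and 1 <= rating <= 7):
        errors.append("rating must be an integer from 1 to 7")

    # Age must lie within a plausible limit (100 here, as in the text).
    age = record.get("age")
    if not (isinstance(age, int) and 0 <= age <= 100):
        errors.append("age must be between 0 and 100")

    # Dates such as the birth date must not be in the future.
    birth = record.get("birth_date")
    if not isinstance(birth, date) or birth > today:
        errors.append("birth_date must not be in the future")

    return errors


# Made-up records: one clean, one violating all three rules.
good_record = {"rating": 5, "age": 34, "birth_date": date(1990, 1, 15)}
bad_record = {"rating": 9, "age": 130, "birth_date": date(2999, 1, 1)}
print(validate_record(good_record))
print(validate_record(bad_record))
```

Returning a list of messages, rather than stopping at the first problem, lets the editor see every issue in a record at once.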

Page 12: DATA PREPARATION AND ANALYSIS

CLASSIFICATION

Data having a common characteristic are placed in one class and in this way the entire data get divided into a number of groups or classes. Classification can be one of the following two types, depending upon the nature of the phenomenon involved:

Classification according to attributes: As stated above, data are classified on the basis of common characteristics, which can be either descriptive (such as literacy, sex, honesty, etc.) or numerical (such as weight, height, income, etc.).

Classification according to class-intervals: Data relating to income, production, age, weight, etc. come under this category. Such data are known as statistics of variables and are classified on the basis of class intervals. For instance, persons whose incomes are within Rs 201 to Rs 400 can form one group, those whose incomes are within Rs 401 to Rs 600 can form another group, and so on.
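Classification by class intervals can be sketched as follows, using the Rs 201 to Rs 400 and Rs 401 to Rs 600 groups from the text. The function name `income_class`, the assumption of whole-rupee incomes, and the sample incomes are illustrative and not taken from the slides.

```python
from collections import Counter


def income_class(income, lower=201, width=200):
    """Return the (low, high) class interval that contains a whole-rupee income.

    Intervals are 201-400, 401-600, 601-800, and so on.
    Incomes below the first interval return None.
    """
    if income < lower:
        return None
    index = (income - lower) // width
    low = lower + index * width
    return (low, low + width - 1)


# Made-up incomes in rupees.
incomes = [250, 399, 401, 600, 615]
class_counts = Counter(income_class(x) for x in incomes)
for interval, count in sorted(class_counts.items()):
    print(interval, count)
```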

Page 13: DATA PREPARATION AND ANALYSIS

TABULATION

Tabulation is the process of summarizing raw data and displaying the same in compact form (i.e., in the form of statistical tables) for further analysis. In a broader sense, tabulation is an orderly arrangement of data in columns and rows.

Tabulation is essential because of the following reasons.

1. It conserves space and reduces explanatory and descriptive statement to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
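A simple two-way table (cross-tabulation) illustrates the "orderly arrangement of data in columns and rows". This is only a sketch: the function name `cross_tabulate`, the field names, and the five sample records are made up for illustration.

```python
from collections import Counter


def cross_tabulate(rows, row_key, col_key):
    """Count records for every combination of two categorical fields.

    Returns (row_labels, col_labels, table) where table[i][j] is the
    number of records with row_labels[i] and col_labels[j].
    """
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_labels = sorted({r[row_key] for r in rows})
    col_labels = sorted({r[col_key] for r in rows})
    table = [[counts[(rl, cl)] for cl in col_labels] for rl in row_labels]
    return row_labels, col_labels, table


# Made-up survey records.
records = [
    {"gender": "F", "owns_torch": "yes"},
    {"gender": "F", "owns_torch": "no"},
    {"gender": "M", "owns_torch": "yes"},
    {"gender": "M", "owns_torch": "yes"},
    {"gender": "F", "owns_torch": "yes"},
]
row_labels, col_labels, table = cross_tabulate(records, "gender", "owns_torch")
print("gender", col_labels)
for label, counts_row in zip(row_labels, table):
    print(label, counts_row)
```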

Page 14: DATA PREPARATION AND ANALYSIS

Generally accepted principles of tabulation:

1. Every table should have a clear, concise and adequate title so as to make the table intelligible without reference to the text and this title should always be placed just above the body of the table.

2. Every table should be given a distinct number to facilitate easy reference.

3. The column headings (captions) and the row headings (stubs) of the table should be clear and brief.

4. The units of measurement under each heading or sub-heading must always be indicated.

5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the table, along with the reference symbols used in the table.

 

Page 15: DATA PREPARATION AND ANALYSIS

6. Source or sources from where the data in the table have been obtained must be indicated just below the table.

7. Usually the columns are separated from one another by lines which make the table more readable and attractive.

8. The columns may be numbered to facilitate reference.

9. Decimal points and (+) or (-) signs should be in perfect alignment.

10. Abbreviations should be avoided to the extent possible and ditto marks should not be used in the table.

 

Page 16: DATA PREPARATION AND ANALYSIS

DATA ENTRY  

Data entry converts information gathered by secondary or primary methods to a medium for viewing and manipulation.

Keyboarding remains a mainstay for researchers who need to create a data file immediately and store it in a minimal space on a variety of media.

However, researchers have profited from more efficient ways of speeding up the research process, especially from bar coding and optical character and mark recognition.

Page 17: DATA PREPARATION AND ANALYSIS

TYPES OF DATA ANALYSIS

•Qualitative Data Analysis Techniques

•Quantitative Data Analysis Techniques

Page 18: DATA PREPARATION AND ANALYSIS

Quantitative Research

Measurability: Quantitative data is measurable. For example size of the market, rate of product usage.

Features:

1. Data collected is numerical in nature.
2. Data collection methods are
a. Mail Questionnaire
b. Personal Interview
c. Telephonic Interview

Characteristic:

3. Sample size used is very large.
4. Structured questionnaire is used for data collection.

Page 19: DATA PREPARATION AND ANALYSIS

Qualitative Research

Measurability: Not possible or difficult to measure.

Features:

1. It is a kind of exploratory research

Characteristic:

2. Sample size used is usually small.
3. Unstructured questionnaire is used for data collection.

There are four major techniques in qualitative research. They are

a. Depth Interview
b. Delphi Techniques
c. Focus Group
d. Projective Techniques

Page 20: DATA PREPARATION AND ANALYSIS

Basis of Difference: Qualitative Data Analysis versus Quantitative Data Analysis

Focus
Qualitative: Understand and interpret.
Quantitative: Describe, explain and predict.

Sample Design
Qualitative: Non-probability, purposive.
Quantitative: Probability.

Interpretation
Qualitative: It relies on interpretation and logic. Qualitative researchers present their analyses using text and arguments.
Quantitative: This analysis relies on statistics. Quantitative researchers use graphs and tables to present their analyses.

Page 21: DATA PREPARATION AND ANALYSIS

Basis of Difference: Qualitative Data Analysis versus Quantitative Data Analysis

Procedures and Rules
Qualitative: Qualitative analysis has no set of rules; rather, guidelines are there to support the analysis.
Quantitative: Quantitative analysis follows agreed-upon standardised procedures and rules.

Occurrence
Qualitative: This analysis occurs simultaneously with data collection.
Quantitative: Quantitative analysis occurs after data collection is finished.

Methodology
Qualitative: Qualitative analysis may vary its methods depending on the situation.
Quantitative: Methods of quantitative analysis are determined in advance as part of the study design.

Page 22: DATA PREPARATION AND ANALYSIS

Reliability
Qualitative: Qualitative analysis is strong on validity but is less reliable or consistent. It has a corresponding weakness in its ability to compare variables in different conditions.
Quantitative: Its reliability is easy to establish, and it generally involves sophisticated comparisons of variables in different conditions.

Questions
Qualitative: Open-ended questions and probing yield detailed information that illuminates nuances and highlights diversity.
Quantitative: Specific questions obtain predetermined responses to standardised questions.

Information
Qualitative: Provides more information on the application of the program in a specific context to a specific population.
Quantitative: More likely provides information on the broad application of the program.

Suitability
Qualitative: More suitable when time and resources are limited.
Quantitative: Relies on more extensive interviewing.

Page 23: DATA PREPARATION AND ANALYSIS

BIVARIATE CORRELATION ANALYSIS

Bivariate Statistical Techniques

Linear Correlation

Simple Regression

Two - way ANOVA

Bivariate analysis refers to the simultaneous analysis of two variables. It is usually undertaken to see whether one variable, such as gender, is related to another variable, perhaps attitudes toward male/female equality.

Page 24: DATA PREPARATION AND ANALYSIS

Pearson’s correlation coefficient ‘r’ measures the direction and the strength of the linear association between two numerical paired variables in a bivariate correlation analysis.

The Pearson (product moment) correlation coefficient varies over a range of +1 through 0 to -1. The designation r symbolizes the coefficient's estimate of linear association based on sampling data.
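A minimal sketch of the calculation, using the standard deviation-from-mean form of the product moment formula, may make the range of r concrete. The function name `pearson_r` and the sample data are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product moment correlation for two paired numeric samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up data: a perfectly increasing and a perfectly decreasing relation.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3], [3, 2, 1])
print(r_positive, r_negative)
```

Values close to +1 or -1 indicate a strong linear association; values near 0 indicate little or no linear association.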

Page 25: DATA PREPARATION AND ANALYSIS

LINEAR CORRELATION

The correlation between two variables is said to be linear if, corresponding to a unit change in the value of one variable, there is a constant change in the value of the other variable, i.e. in the case of linear correlation the relation between the variables x and y is of the type

y = a + bx

y = dependent variable
x = independent variable

where a and b are constants that completely determine the line.

If a = 0, the relation becomes y = bx. In such cases the values of the variables are in constant ratio.

Page 26: DATA PREPARATION AND ANALYSIS

Non-Linear (Curvilinear) Correlation:

The correlation between two variables is said to be non-linear (curvilinear) if, corresponding to a unit change in the value of one variable, the other variable changes not at a constant rate but at a fluctuating rate.

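SIMPLE REGRESSION

The dictionary meaning of the term ‘regression’ is the act of returning or going back. The term ‘regression’ was first used by Sir Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.

Regression equation of Y on X (Y = a + bX), where N is the number of paired observations and a and b are found from the normal equations:
∑Y = Na + b∑X
∑XY = a∑X + b∑X²

Regression equation of X on Y (X = a + bY), where a and b are found from the normal equations:
∑X = Na + b∑Y
∑XY = a∑Y + b∑Y²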
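As a sketch, the two normal equations for the regression of Y on X, ∑Y = Na + b∑X and ∑XY = a∑X + b∑X², can be solved directly for a and b. The function name `fit_y_on_x` and the sample data are illustrative assumptions.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations of Y on X for the constants a and b.

    Eliminating a from the two equations gives
        b = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - sum(X)^2)
        a = (sum(Y) - b*sum(X)) / N
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples with at least two values")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


# Made-up data lying exactly on the line Y = 1 + 2X.
a, b = fit_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Swapping the two arguments gives the regression of X on Y, which in general is a different line.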

Page 27: DATA PREPARATION AND ANALYSIS

ANOVA

Prof. R.A. Fisher was the first to use the term ‘variance’. Later, Professor Snedecor and many others contributed to the development of this technique.

ANOVA is essentially a procedure for testing the difference among different groups of data for homogeneity.

Page 28: DATA PREPARATION AND ANALYSIS

“The essence of ANOVA is that the total amount of variation in a set of data is broken down into two types, that amount which can be attributed to chance and that amount which can be attributed to specified causes.”

There may be variation between samples and also within sample items.

Through ANOVA one can investigate any number of factors which are hypothesized or said to influence the dependent variable.

Page 29: DATA PREPARATION AND ANALYSIS

Two way ANOVA involves only two categorical variables or factors and examines the effect of these two factors on the dependent variable.

For example, the sales of Hyundai Verna car may be attributed to different salesmen and different states.

It examines the interaction between the different levels of these two factors. Similarly, the production of a particular product in a factory may be attributed to the different types of machines as well as the different grades of executives.

Page 30: DATA PREPARATION AND ANALYSIS

Procedure for Two – Way ANOVA

1. Identify dependent and independent variables.
2. Partition (decomposition) of total variation.
3. Calculate variations.
4. Calculate degrees of freedom.
5. Calculate mean square.
6. Calculate F statistic or F ratio.
7. Determine level of significance.
8. Interpret the results.
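Steps 2 to 6 can be sketched for the simplest layout, a two-way table with one observation per cell. This is an assumption made to keep the example short: with a single observation per cell the interaction cannot be estimated separately and is absorbed into the error term. The function name `two_way_anova` and the sales figures are hypothetical.

```python
def two_way_anova(table):
    """Two-way ANOVA for a table with one observation per cell.

    table[i][j] is the observation for level i of the row factor
    (e.g. salesman) and level j of the column factor (e.g. state).
    """
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n

    # Partition of total variation into rows, columns and error.
    ss_total = sum(x * x for row in table for x in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction
    ss_cols = sum(sum(row[j] for row in table) ** 2 for j in range(c)) / r - correction
    ss_error = ss_total - ss_rows - ss_cols

    # Degrees of freedom and mean squares.
    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_rows = ss_rows / df_rows
    ms_cols = ss_cols / df_cols
    ms_error = ss_error / df_error

    return {
        "df_rows": df_rows,
        "df_cols": df_cols,
        "df_error": df_error,
        "F_rows": ms_rows / ms_error,
        "F_cols": ms_cols / ms_error,
    }


# Hypothetical sales: rows are salesmen, columns are states.
sales = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 6],
]
result = two_way_anova(sales)
print(result)
```

For steps 7 and 8, each F ratio would be compared with the critical value of F for its degrees of freedom at the chosen level of significance, taken from an F table or a statistics package.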

Page 31: DATA PREPARATION AND ANALYSIS

Multivariate Data Analysis

Multivariate data analysis refers to any statistical technique used to analyze data that arise from more than one variable.

In most applied and behavioural research, we generally resort to multivariate analysis.

Multivariate analysis methods are typically used for

- Consumer and market research

- Quality control and quality assurance across a range of industries such as food & beverages, paint, pharmaceuticals, chemicals and telecommunications.

- Process optimization & process control

- Research and development

Page 32: DATA PREPARATION AND ANALYSIS

Multivariate techniques transform a mass of observations into a smaller number of composite scores in such a way that they reflect as much as possible of the information contained in the raw data obtained for a research study.

The contribution of these techniques is in arranging a large amount of complex information in the real data into a simplified, visible form.

Page 33: DATA PREPARATION AND ANALYSIS

Multivariate methods

[Figure: classification of multivariate methods]

Dependence methods. How many variables are dependent?
One. Is it metric? Yes: use multiple regression. No: use multiple discriminant analysis.
Many. Are they metric? Yes: multivariate analysis of variance. No: canonical analysis.

Interdependence methods: factor analysis, cluster analysis, multidimensional scaling.

Page 34: DATA PREPARATION AND ANALYSIS

Regression Analysis

Regression analysis is a statistical process for estimating the relationships among variables.

Regression analysis helps one understand how the typical value of the dependent variable (or 'Criterion Variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.

Page 35: DATA PREPARATION AND ANALYSIS

Uses of Regression Analysis

It is one of the most widely used techniques for prediction and forecasting.

Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

Page 36: DATA PREPARATION AND ANALYSIS

In multiple regression analysis there are three or more variables say X1, X2 and X3.

We now take X1 as the dependent variable and try to find out its relative movement for movements in both X2 and X3, which are independent variables.

Thus, in multiple regression analysis the effect of two or more independent variables on one dependent variable is studied.
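A sketch of the three-variable case, X1 = a + b2·X2 + b3·X3, fitted by least squares: the normal equations are built from the data and solved by Gaussian elimination. The function names, the coefficient names a, b2 and b3, and the sample data are illustrative assumptions, not notation from the slides.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system is singular; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (m[i][n] - known) / m[i][i]
    return solution


def fit_multiple_regression(x1, x2, x3):
    """Least-squares fit of X1 = a + b2*X2 + b3*X3; returns [a, b2, b3]."""
    rows = [[1.0, u, v] for u, v in zip(x2, x3)]
    # Normal equations: (Z'Z) beta = Z'x1, where Z has columns 1, X2, X3.
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    zty = [sum(r[i] * y for r, y in zip(rows, x1)) for i in range(3)]
    return solve_linear_system(ztz, zty)


# Made-up data generated exactly from X1 = 1 + 2*X2 + 3*X3.
x2_values = [1, 2, 3, 4, 5]
x3_values = [2, 1, 4, 3, 6]
x1_values = [9, 8, 19, 18, 29]
coefficients = fit_multiple_regression(x1_values, x2_values, x3_values)
print(coefficients)
```

Each slope estimates the change in X1 for a unit change in one independent variable while the other is held fixed.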

Page 37: DATA PREPARATION AND ANALYSIS

Discriminant Analysis

The discriminant analysis aims at studying the effect of two or more predictor variables on certain evaluation criterion.

The evaluation criterion is categorized into two groups, which may be good or bad, like or dislike, successful or unsuccessful, etc.

For Example:

•While grouping investment alternatives based on return, the criterion of the rate of return will be categorized into ‘good’ or ‘bad’.

•While grouping products by consumers in terms of their flavour, the criterion will be ‘like’ or ‘dislike’.

•While grouping the performance of an employee after training programme, the criterion will be ‘above expected level’ and ‘below expected level’.
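One common way to carry out a two-group discriminant analysis is Fisher's linear discriminant, sketched below for two predictor variables. The slides do not name a specific method, so this choice is an assumption, as are the function names, the group labels, and the made-up scores.

```python
def mean_vector(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, mean):
    """Sums of squares and cross-products of deviations: (sxx, sxy, syy)."""
    sxx = sum((p[0] - mean[0]) ** 2 for p in points)
    syy = sum((p[1] - mean[1]) ** 2 for p in points)
    sxy = sum((p[0] - mean[0]) * (p[1] - mean[1]) for p in points)
    return sxx, sxy, syy


def fisher_discriminant(group1, group2):
    """Return (weights, cutoff) of Fisher's linear discriminant for two groups."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    a1, b1, c1 = scatter(group1, m1)
    a2, b2, c2 = scatter(group2, m2)
    dof = len(group1) + len(group2) - 2
    # Pooled within-group covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = (a1 + a2) / dof, (b1 + b2) / dof, (c1 + c2) / dof
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # weights = inverse(pooled covariance) * (mean1 - mean2)
    weights = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    midpoint = ((m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2)
    cutoff = weights[0] * midpoint[0] + weights[1] * midpoint[1]
    return weights, cutoff


def classify(point, weights, cutoff, labels=("good", "bad")):
    """Assign a point to the first group if its score exceeds the cutoff."""
    score = weights[0] * point[0] + weights[1] * point[1]
    return labels[0] if score > cutoff else labels[1]


# Made-up scores on two predictors for 'good' and 'bad' investments.
good = [(8, 7), (9, 6), (7, 8), (9, 8)]
bad = [(3, 4), (2, 5), (4, 3), (3, 2)]
weights, cutoff = fisher_discriminant(good, bad)
print(classify((8, 8), weights, cutoff), classify((3, 3), weights, cutoff))
```

The cutoff at the midpoint of the two group means implicitly assumes equal group sizes and equal misclassification costs.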

Page 38: DATA PREPARATION AND ANALYSIS

Factor Analysis

Factor analysis is a technique used to study interrelationship among many variables.

The main purpose of factor analysis is to group a large set of variables into fewer factors.

Each factor accounts for one or more components and is a combination of many variables.

Page 39: DATA PREPARATION AND ANALYSIS

Factor Analysis

Factor analysis is an interdependence technique, i.e. the variables are not classified as independent or dependent; instead, their interrelationships are studied.

Factor analysis is used to draw inferences on unobservable quantities such as intelligence, musical ability, patriotism, consumer attitudes, that cannot be measured directly.

The goal of factor analysis is to describe the correlations between p measured traits in terms of variation in a few underlying and unobservable factors.

Page 40: DATA PREPARATION AND ANALYSIS

Factor Analysis

Factor analysis is a method of investigating whether a number of variables of interest are linearly related to a small number of unobservable factors. The observed variables are modeled as linear combinations of the factors, plus “error” terms.

Page 41: DATA PREPARATION AND ANALYSIS

Basic principle of Factor Analysis

1. Start with a large number of variables.

2. Find minimum number of underlying factors (principal components) that together account for the pattern of inter-correlations among observed variables.

3. Variables must be coded in similar ways

Page 42: DATA PREPARATION AND ANALYSIS

Purpose of Factor Analysis

1. Identify underlying Dimensions, or Factors: Factor analysis strives to identify underlying dimensions or factors, that explain the correlation among a set of variables.

2. Identify a new, smaller set of uncorrelated variables: One of the prime objectives of factor analysis is to identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).

3. Identify a smaller set of salient variables: The purpose of factor analysis is to identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis

Page 43: DATA PREPARATION AND ANALYSIS

Interpreting results of Factor Analysis

Interpretation is facilitated by identifying the variables that have large loadings on the same factor.

That factor can then be interpreted in terms of the variables that load high on it.

Plot the variables using the factor loadings as co-ordinates.

Variables that have high loadings describe the factors.

If a factor cannot be clearly defined in terms of the original variables, it should be labeled as an undefined or a general factor.

Page 44: DATA PREPARATION AND ANALYSIS

Methods of Factor Analysis

The two most commonly employed procedures of factor analysis are

•Principal Component Analysis (PCA)

•Common Factor Analysis (CFA)

Page 45: DATA PREPARATION AND ANALYSIS

Principal Component Analysis

When the objective is to summarise information from a large set of variables into fewer factors, principal component factor analysis is used.

It is a technique for forming a set of new variables that are linear combinations of the original set of variables and are uncorrelated. The new variables are called principal components.

These variables are fewer in number as compared to the original variables, but they extract most of the information provided by the original variables.
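For just two variables, the first principal component can be found in closed form from the 2 × 2 covariance matrix, which makes the idea easy to sketch. The function name `first_principal_component` and the sample data are assumptions for illustration; real studies with many variables would use a statistics package.

```python
from math import sqrt


def first_principal_component(xs, ys):
    """Return (direction, share) of the first principal component of two variables.

    direction is a unit vector of weights for the linear combination;
    share is the proportion of total variance that the component extracts.
    Assumes the data are not all identical (total variance is positive).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[var_x, cov], [cov, var_y]].
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (var_x + var_y) / 2
    spread = sqrt(((var_x - var_y) / 2) ** 2 + cov ** 2)
    largest, smallest = half_trace + spread, half_trace - spread

    # Eigenvector belonging to the largest eigenvalue.
    if cov != 0:
        vx, vy = cov, largest - var_x
    elif var_x >= var_y:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = sqrt(vx * vx + vy * vy)
    direction = (vx / norm, vy / norm)
    share = largest / (largest + smallest)
    return direction, share


# Made-up data: two perfectly correlated variables.
direction, share = first_principal_component([1, 2, 3, 4], [1, 2, 3, 4])
print(direction, share)
```

When the two variables are strongly correlated, a single component captures almost all of the variance, which is why fewer new variables can replace the original ones.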

Page 46: DATA PREPARATION AND ANALYSIS

Common Factor Analysis

If the researcher wants to analyse the components of the main factor, common factor analysis is used.

It is a statistical approach used to analyze interrelationships among a large number of variables (indicators) and to explain these variables in terms of a few unobservable constructs (factors). The factors are taken to influence the variables, which therefore serve as reflective indicators of the factors.

It helps in assessing the image of a company/enterprise and the attitudes of sales personnel and customers.

Page 47: DATA PREPARATION AND ANALYSIS

Factor Analysis - Example

Purpose: Customer feedback about a two-wheeler manufactured by a company.

Method: The Marketing Research Manager prepares a questionnaire to study the customer feedback. The researcher has identified six variables or factors for this purpose. They are as follows.

1. Fuel efficiency (A)
2. Durability of life (B)
3. Comfort (C)
4. Spare parts availability (D)
5. Breakdown frequency (E)
6. Price (F)

Page 48: DATA PREPARATION AND ANALYSIS

The application of factor analysis has led to grouping the variables as follows.

A, B, D, E into factor – 1
F into factor – 2
C into factor – 3

Factor – 1 can be termed as Technical factor
Factor – 2 can be termed as Price factor
Factor – 3 can be termed as Personal factor
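Such a grouping is usually read off a table of factor loadings by assigning each variable to the factor on which it loads most heavily. The sketch below shows only that assignment step. The loadings are invented numbers, chosen solely so that the result matches the grouping above; they are not results from the study, and the function name `group_by_highest_loading` is hypothetical.

```python
def group_by_highest_loading(loadings):
    """Assign each variable to the factor with its largest absolute loading.

    loadings maps a variable name to a tuple of loadings, one per factor.
    Returns a dict mapping factor number (starting at 1) to variable names.
    """
    groups = {}
    for variable, row in loadings.items():
        best = max(range(len(row)), key=lambda i: abs(row[i]))
        groups.setdefault(best + 1, []).append(variable)
    return groups


# Invented loadings for illustration only (not from the study).
hypothetical_loadings = {
    "A": (0.82, 0.10, 0.05),  # Fuel efficiency
    "B": (0.77, 0.15, 0.12),  # Durability of life
    "C": (0.08, 0.11, 0.88),  # Comfort
    "D": (0.71, 0.20, 0.09),  # Spare parts availability
    "E": (-0.69, 0.05, 0.14), # Breakdown frequency
    "F": (0.12, 0.91, 0.07),  # Price
}
factor_groups = group_by_highest_loading(hypothetical_loadings)
for factor in sorted(factor_groups):
    print("factor", factor, factor_groups[factor])
```

Using the absolute value means a variable with a large negative loading, such as breakdown frequency here, is still grouped with the factor it is most strongly related to.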

Page 49: DATA PREPARATION AND ANALYSIS

For future analysis, while conducting a study to obtain customer’s opinion, three factors mentioned above would be sufficient.

The basic purpose of using factor analysis is to reduce the number of independent variables in the study. With too many independent variables, the M.R. study will suffer from the following disadvantages.

1. Time for data collection is very high due to several independent variables.
2. Expenditure increases due to the time factor.
3. Computation time is more, resulting in delay.