Fundamentals of Data Analysis

Page 1: Fundamentals of Data Analysis

Fundamentals of Data Analysis

Page 2: Fundamentals of Data Analysis

Four Types of Data

• Alphabetical / Categorical / Nominal data:
– Information falls only in certain categories, not in-between categories
– No inferences possible between groups except that one group may contain more / fewer observations than the other
– Only reporting frequencies, percentages and mode makes sense (descriptive statistics)
– Chi-square measure of association (inferential statistics)
– Examples: gender, age groups, income groups, etc.
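As a minimal sketch of the descriptive statistics listed for nominal data, the Python snippet below counts frequencies, converts them to percentages and picks the mode. The response list is hypothetical and only illustrates the idea.

```python
from collections import Counter

# Hypothetical responses to a nominal question (illustrative data only).
responses = ["female", "male", "female", "female", "male", "other"]

counts = Counter(responses)                 # frequency of each category
total = sum(counts.values())
percentages = {cat: 100 * n / total for cat, n in counts.items()}

# The mode is the category with the highest frequency.
mode_category, mode_count = counts.most_common(1)[0]

for cat, n in counts.items():
    print(f"{cat}: {n} ({percentages[cat]:.1f}%)")
print("mode:", mode_category)
```

Note that no mean is computed: averaging category labels has no meaning for nominal data.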

Page 3: Fundamentals of Data Analysis

Four Types of Data

• Rank order data:
– Ranked according to some logic, e.g. preference, etc.
– Again, an in-between rank does not make sense.
– The difference between, say, ranks 1 and 2 need not be of the same magnitude as the difference between ranks 3 and 4.
– Only reporting frequencies, percentages and mode makes sense (descriptive statistics); Spearman's rho coefficient of correlation (inferential statistics)
– Examples: brand preferences, class rank on test, etc.

Page 4: Fundamentals of Data Analysis

Four Types of Data

• Interval level data:
– Numerical data in which the numbers denote the amount of presence / absence of a trait.
– Zero point does not necessarily mean complete absence of the trait
– In-between numbers make sense
– Magnitude of difference between numbers on the scale is constant.
– All descriptive and inferential statistics possible
– Examples: attitude, satisfaction, temperature, etc.

Page 5: Fundamentals of Data Analysis

Four Types of Data

• Ratio level data:
– Interval level data with a meaningful zero point meaning complete absence of the trait
– Magnitude of the difference between numbers on the scale is constant AND the zero point denotes complete absence of the trait being measured.
– All descriptive and inferential statistics possible
– Examples: sales, profits, weight, height, etc.
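One way to see why the zero point matters is to change units. The sketch below uses temperature (an interval-level example) and weight (a ratio-level example); the specific values are made up for illustration. A statement such as "twice as much" survives a unit change only on the ratio scale, while equal differences survive on both.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor

# Interval scale: the ratio of two temperatures depends on the unit,
# because 0 degrees does not mean "no temperature".
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Ratio scale: the ratio of two weights is the same in any unit,
# because 0 means complete absence of weight.
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

# Differences stay comparable on an interval scale:
# equal gaps in Celsius map to equal gaps in Fahrenheit.
gap_low = celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
gap_high = celsius_to_fahrenheit(40) - celsius_to_fahrenheit(30)

print(ratio_celsius, ratio_fahrenheit)  # the two ratios differ
print(ratio_kg, ratio_lb)               # the two ratios agree
print(gap_low, gap_high)                # the two gaps agree
```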

Page 6: Fundamentals of Data Analysis

Type of data?

Age in years                  Recall order of brands
Age groups                    Ad. costs
Income                        Number of students in various classes
Income groups                 Time
Name                          Test grades
SAT scores                    Number of players in a team
Attitude to brand             Number of students in WU
Number of ads recalled        Calories

Page 7: Fundamentals of Data Analysis

Preparing the Data for Analysis

• Data editing – the process of identifying omissions, ambiguities and errors in the responses

• Coding – process of assigning numerical values to responses according to a pre-defined system

• Statistically adjusting the data – the process of modifying the data to enhance its quality for analysis

– Weighting, transformations, variable re-specification

Page 8: Fundamentals of Data Analysis

Preparing the Data for Analysis

Problems Identified With Data Editing

• Omissions – some unanswered questions

• Ambiguity – illegible response, choosing two boxes when only one has to be chosen

• Inconsistencies – logically inconsistent response

• Lack of Cooperation – checking the same response regardless of the question

• Ineligible Respondent – ignoring a filter question

Page 9: Fundamentals of Data Analysis

Preparing the Data for Analysis

• Solutions to such problems
– Contact the respondent again and make corrections
– Throw out the whole questionnaire as unusable
– Disregard questions with missing values in the analysis
– Code illegible or missing responses as ‘don’t know’
– Compute missing values on the basis of means
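Two of the options above (disregarding missing values, and computing them from means) can be sketched in a few lines of Python. The ratings and the use of `None` as the missing-value marker are assumptions made for the example.

```python
from statistics import mean

# Hypothetical 1-5 satisfaction ratings; None marks an unanswered question.
ratings = [4, None, 5, 3, None, 4]

# Option 1: disregard the missing values and analyse only the answers given.
answered = [r for r in ratings if r is not None]
item_mean = mean(answered)

# Option 2: compute missing values on the basis of the mean of the answers given.
imputed = [item_mean if r is None else r for r in ratings]

print("answered:", answered)
print("mean of answered:", item_mean)
print("after mean imputation:", imputed)
```

Mean imputation leaves the item mean unchanged but understates the variability of the responses, so it is worth reporting how many values were filled in.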

Page 10: Fundamentals of Data Analysis

Preparing the Data for Analysis

Coding

• closed-ended questions

– Relatively simple and straightforward

• open-ended questions

– Define all possible responses and categorize each response and then assign a numerical code

– If judgment calls are needed then have several coders do the same task and check inter-coder reliability
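The slide does not say how inter-coder reliability should be checked. As one possible sketch, the snippet below computes simple percent agreement and Cohen's kappa (a chance-corrected agreement measure, introduced here as an assumption rather than taken from the source) for two coders. The codes are hypothetical.

```python
from collections import Counter

# Hypothetical codes two coders assigned to the same ten open-ended answers.
coder_a = ["price", "taste", "taste", "price", "other",
           "taste", "price", "taste", "other", "price"]
coder_b = ["price", "taste", "price", "price", "other",
           "taste", "price", "taste", "taste", "price"]

n = len(coder_a)

# Observed agreement: share of answers given the same code by both coders.
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

# Agreement expected by chance, from each coder's marginal code frequencies.
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / n ** 2

# Cohen's kappa: agreement beyond chance, scaled to a maximum of 1.
kappa = (observed - expected) / (1 - expected)

print(f"percent agreement: {observed:.0%}")
print(f"Cohen's kappa: {kappa:.3f}")
```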

Page 11: Fundamentals of Data Analysis

Statistical adjustment of data

• Weighting
– Process of enhancing / reducing the importance of certain data by assigning a number
– Usually done to increase the representativeness of the sample or achieve study objectives
– E.g. a sports drink survey would weight younger respondents more heavily than older respondents

• Scale transformations
– Manipulation of scales to make them comparable with other scales, e.g. converting lb to kg, etc.
– Z-scores (standardized scales)
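A minimal sketch of both adjustments follows, assuming hypothetical scores and weights. The weighted mean lets some respondents count more than others; the z-score transformation rescales a variable to mean 0 and standard deviation 1. The population standard deviation (`pstdev`) is used here as a choice for the example; the sample standard deviation is an equally common convention.

```python
from statistics import mean, pstdev

# Hypothetical purchase-intent scores (1-7), three younger then two older respondents.
scores = [6, 7, 5, 3, 2]
weights = [2.0, 2.0, 2.0, 1.0, 1.0]   # assumption: younger respondents count double

unweighted_mean = mean(scores)
weighted_mean = sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Z-scores: subtract the mean, then divide by the standard deviation.
mu = mean(scores)
sigma = pstdev(scores)
z_scores = [(s - mu) / sigma for s in scores]

print("unweighted mean:", unweighted_mean)
print("weighted mean:", weighted_mean)
print("z-scores:", [round(z, 2) for z in z_scores])
```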

Page 12: Fundamentals of Data Analysis

Preparing the Data for Analysis

• Variable Re-specification
– Existing data modified to create new variables

– Large number of variables collapsed into fewer variables

– Creates variables that are consistent with research questions

• Determine if the variable is categorical, rank-order, interval level or ratio level.
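Variable re-specification can be sketched as two small transformations: collapsing a ratio-level variable (age in years) into categories, and combining several items into one index. The band boundaries and item names below are hypothetical, chosen only for illustration.

```python
def age_group(age_years):
    """Collapse age in years into a hypothetical three-band grouping."""
    if age_years < 18:
        return "under 18"
    if age_years <= 24:
        return "18-24"
    return "25 and over"

# Ratio-level input becomes a categorical variable.
ages = [16, 22, 31, 18, 45]
groups = [age_group(a) for a in ages]

# Several related items collapsed into a single index (here, their average).
attitude_items = {"likes_brand": 4, "would_recommend": 5, "trusts_brand": 3}
attitude_index = sum(attitude_items.values()) / len(attitude_items)

print(groups)
print(attitude_index)
```

Collapsing moves a variable down the measurement hierarchy, which is why the level of the new variable has to be determined again before choosing a statistic.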

Page 13: Fundamentals of Data Analysis

Categorical Data Analysis - Objectives

• Describing the sample distribution for the variable (e.g. gender)
– Frequencies, percentages, quartiles, percentiles, graphs (bar, line, histogram, pie)

• What are the typical characteristics of the sample?
– Mode

• Is the categorical variable related to the distribution of another categorical variable (e.g. gender vs. whether the respondent buys the product or not)?
– Cross tabs and chi-square as a measure of association

Page 14: Fundamentals of Data Analysis

Cross tabulations – example – buyers by age

                     Under 18 yrs.   19-24 yrs.   25-34 yrs.   Total for sample
First time buyers    14%             12.5%        6.6%         11.1%
Brand loyals         21.9%           20%          14.5%        18.9%
Switchers            50%             53%          60%          60%
Never bought         14.1%           14.5%        18.9%        10%
Total                100%            100%         100%         100%

Distribution of customer types by age: If there were no differences between age groups, then each age group’s distribution would have matched the distribution for the total sample.
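The "no differences between age groups" logic is what the chi-square statistic formalizes: it compares observed counts with the counts expected if every age group shared the total-sample distribution. The slide reports percentages only, so the counts below are hypothetical (100 respondents per age group, loosely modeled on the table) and the result says nothing about the original data.

```python
rows = ["First time buyers", "Brand loyals", "Switchers", "Never bought"]
cols = ["Under 18", "19-24", "25-34"]

# Hypothetical observed counts: one row per customer type, one column per age group.
counts = [
    [14, 12, 6],
    [22, 20, 14],
    [50, 53, 60],
    [14, 15, 20],
]

row_totals = [sum(r) for r in counts]
col_totals = [sum(c) for c in zip(*counts)]
grand_total = sum(row_totals)

# Column percentages, as shown in a cross tabulation by age group.
col_pct = [[100 * counts[i][j] / col_totals[j] for j in range(len(cols))]
           for i in range(len(rows))]

# Expected count in each cell if customer type were unrelated to age group.
expected = [[row_totals[i] * col_totals[j] / grand_total for j in range(len(cols))]
            for i in range(len(rows))]

# Chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum((counts[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(len(rows)) for j in range(len(cols)))
dof = (len(rows) - 1) * (len(cols) - 1)

# 0.05 critical value for 6 degrees of freedom, from a standard chi-square table.
CRITICAL_VALUE_05_DF6 = 12.592
print(f"chi-square = {chi_square:.2f}, dof = {dof}")
print("association at the 0.05 level:", chi_square > CRITICAL_VALUE_05_DF6)
```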

Page 15: Fundamentals of Data Analysis

Crosstabs - conclusions

• The 25-34 yrs. group is less likely to be first time buyers than the sample average (6.6% vs. 11.1%)

• The under 18 yrs. group is more likely to be brand loyal than the sample average (21.9% vs. 18.9%)

Page 16: Fundamentals of Data Analysis

Rank order data analysis - Objectives

• What are respondent preferences amongst several competing alternatives? (e.g. rank your preferences amongst ten different brands of cars)
– Frequencies, Percentages, Graphs

• What is the typical preference pattern in the sample? (e.g. which car does the sample prefer the most and which one the least?)
– Mode

Page 17: Fundamentals of Data Analysis

Rank order data analysis - Objectives

• Are two sets of respondent preferences correlated? (e.g. wrist watch brand preferences with car brand preferences)
– Spearman’s rank correlation coefficient
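A minimal sketch of Spearman's rank correlation, assuming two hypothetical sets of ranks with no ties. It uses the shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks of each item. The formula is exact only when there are no tied ranks.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho for two rankings of the same items (no tied ranks)."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks given to five brands in two product categories.
watch_ranks = [1, 2, 3, 4, 5]
car_ranks = [2, 1, 4, 3, 5]

rho = spearman_rho(watch_ranks, car_ranks)
print(f"Spearman's rho = {rho:.2f}")
```

A value near +1 means the two rankings largely agree, a value near -1 means they are reversed, and a value near 0 means no monotonic relationship.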

Page 18: Fundamentals of Data Analysis

Interval level / Ratio level data analysis - Objectives

• What is the average response in the sample? (e.g. what is the mean attitude to the brand?)
– Mean / Median

• What is the average variability of the response in the sample? (e.g. on average, how dispersed are the sample’s attitudes to the brand from the mean?)
– Standard deviation
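These three statistics are available in Python's standard `statistics` module. The attitude scores below are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical attitude-to-brand scores on a 1-7 scale.
attitudes = [5, 6, 4, 7, 5, 6, 2]

average = mean(attitudes)      # arithmetic mean
middle = median(attitudes)     # middle value of the sorted scores
spread = stdev(attitudes)      # sample standard deviation (divides by n - 1)

print(f"mean = {average}, median = {middle}, standard deviation = {spread:.2f}")
```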

Page 19: Fundamentals of Data Analysis

Interval level / Ratio level data analysis - Objectives

• Do two or more subgroups in the sample differ from each other on the response / differ from a previously known / hypothesized value?

• E.g. do males like the brand significantly more than the females? (t tests, z tests)

• E.g. does attitude to WU vary by student status (freshman, sophomore, junior, senior)?
– ANOVA
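As a sketch of what these tests compute, the functions below return the test statistics only: a two-sample t statistic (the unequal-variance, or Welch, form is used here as an assumption, since the slide does not say which variant is meant) and the one-way ANOVA F statistic. All scores are hypothetical; turning a statistic into a p-value additionally requires the t or F distribution, which is not shown.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def anova_f(groups):
    """One-way ANOVA F: between-group variance over within-group variance."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical brand-liking scores for two subgroups.
males = [5, 6, 7, 6, 6]
females = [4, 5, 5, 4, 2]
t_stat = welch_t(males, females)

# Hypothetical attitude scores by student status.
freshman, sophomore, junior, senior = [4, 5, 6], [5, 6, 7], [6, 7, 8], [7, 8, 9]
f_stat = anova_f([freshman, sophomore, junior, senior])

print(f"t = {t_stat:.2f}, F = {f_stat:.2f}")
```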

Page 20: Fundamentals of Data Analysis

Interval level / Ratio level data analysis - Objectives

• Are sample responses on two variables correlated? (e.g. are sales related to the advertising expenditure?)
– Pearson correlation

• Can we determine the value of the sample’s response on a variable, if we know the value on another variable? (e.g. if we need to achieve 1 million dollars in sales next year, how much should we spend on advertising?)
– Regression analysis
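Both objectives can be sketched with plain Python. The snippet computes the Pearson correlation and a least-squares line of sales on advertising, then solves the line for the spend that corresponds to a sales target. The figures and units (both in $100k) are hypothetical, and the target lies outside the observed range, so the answer is an extrapolation to be treated with caution.

```python
from math import sqrt
from statistics import mean

# Hypothetical yearly figures, both in units of $100k.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(ad_spend), mean(sales)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, sales))
s_xx = sum((x - mean_x) ** 2 for x in ad_spend)
s_yy = sum((y - mean_y) ** 2 for y in sales)

# Pearson correlation: strength of the linear relationship, between -1 and +1.
r = s_xy / sqrt(s_xx * s_yy)

# Least-squares regression line: sales = intercept + slope * ad_spend.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Spend implied by the fitted line for a sales target of 10 (i.e. $1 million).
target_sales = 10
required_spend = (target_sales - intercept) / slope

print(f"r = {r:.3f}")
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"ad spend for target: {required_spend:.1f}")
```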