The Information School of the University of Washington LIS 570 Session 7.1 Bivariate Data Analysis



LIS 570 Univariate Analysis

Mason; p. 2

Objectives

• Reinforce the concept of standard error and the standard normal distribution (the basis of confidence levels and confidence intervals)

• Understand different approaches to the analysis of bivariate data

• Gain confidence in use of SPSS


Agenda

• Review the Central Limit Theorem
• Visualization of “confidence interval” and “confidence level”
• Overview of bivariate analysis approaches
• Exploratory data analysis using SPSS


Shapes of distribution

• Normal distribution: symmetrical, bell-shaped curve
• Positively skewed: tail on the right, cluster towards the low end of the variable
• Negatively skewed: tail on the left, cluster towards the high end of the variable
• Bimodality: a double peak

[Figure: sketches of each shape, grouped as symmetrical and asymmetrical]
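One way to put a number on the shapes above is a sample skewness coefficient. The sketch below is a minimal illustration, assuming the moment-based (Fisher-Pearson) coefficient m3 / m2^1.5 and made-up data; the helper name `skewness` is hypothetical. It returns a positive value when the tail is on the right, a negative value when the tail is on the left, and zero for perfectly symmetrical data.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up data: values cluster at the low end, with a long tail to the right
right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]
# Mirror image: values cluster at the high end, with a tail to the left
left_tailed = [-x for x in right_tailed]
# Perfectly symmetrical data
symmetrical = [1, 2, 3, 4, 5]

print(round(skewness(right_tailed), 3))  # positive -> positively skewed
print(round(skewness(left_tailed), 3))   # negative -> negatively skewed
print(round(skewness(symmetrical), 3))   # zero -> symmetrical
```

Statistics packages such as SPSS may report an adjusted sample skewness that differs slightly in size, but the sign is read the same way.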


Central Limit Theorem

The CLT states: regardless of the shape of the population distribution, as the sample size (N) becomes very large (approaches infinity), the sampling distribution of the sample mean (m) approaches a normal distribution, with a mean of µ and a standard deviation of σ/√N.
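The theorem can be checked by simulation. The sketch below assumes a strongly right-skewed exponential population (rate 1, so µ = 1 and σ = 1), a sample size of N = 30, 20,000 repeated samples, and a fixed random seed; all of these values are illustrative choices, not part of the session material. The mean of the sample means should land near µ, and their standard deviation near σ/√N.

```python
import math
import random
import statistics

random.seed(570)  # fixed seed so the run is repeatable

N = 30               # sample size
REPLICATES = 20_000  # number of samples drawn

# Exponential population with rate 1: strongly right-skewed, mu = 1, sigma = 1
mu, sigma = 1.0, 1.0

# Draw many samples of size N and keep each sample's mean
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(N))
    for _ in range(REPLICATES)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(N):     ", round(sigma / math.sqrt(N), 3))
```

A histogram of `sample_means` would look close to bell-shaped even though the population itself is skewed.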


Standard Error of the Mean

Standard error of the mean (S_m):

$$S_m = \frac{S}{\sqrt{N}}$$

where S = standard deviation and N = total number in the sample.

– Standard error is inversely related to the square root of the sample size
– To reduce standard error, increase the sample size
– Standard error is directly related to the standard deviation
– When N = 1, the standard error is equal to the standard deviation
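The formula and its consequences can be sketched in a few lines. This is a minimal illustration; the helper names are hypothetical, and `standard_error` assumes the sample standard deviation with an N - 1 denominator (what `statistics.stdev` computes).

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: S / sqrt(N), S = sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def se_from_sd(sd, n):
    """Standard error for a given standard deviation and sample size."""
    return sd / math.sqrt(n)

print(se_from_sd(12.0, 1))    # 12.0: with N = 1 the SE equals the SD
print(se_from_sd(12.0, 36))   # 2.0
print(se_from_sd(12.0, 144))  # 1.0: quadrupling N halves the SE
```

Because N sits under a square root, halving the standard error takes four times as many cases.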


Inferential statistics - univariate analysis

Interval estimates and interval variables
• Estimation of sample mean accuracy, based on random sampling and probability theory
• Standardize the sample mean to estimate the population mean:

$$t = \frac{\text{sample mean} - \text{population mean}}{\text{estimated SE}}$$

Population mean = sample mean ± t × (estimated SE)
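Applying the last line gives a confidence interval. The sketch below uses a made-up sample of 10 scores and a 95% confidence level. Python's standard library has no t distribution, so the critical value 2.262 (two-sided 95%, df = N - 1 = 9) is hard-coded from a t table; for large samples, `statistics.NormalDist().inv_cdf(0.975)` (about 1.96) is a common substitute.

```python
import math
import statistics

# Hypothetical sample of 10 scores (made-up data)
scores = [72, 85, 78, 90, 66, 81, 75, 88, 79, 84]

n = len(scores)
mean = statistics.fmean(scores)
se = statistics.stdev(scores) / math.sqrt(n)  # estimated SE of the mean

# Two-sided 95% critical t for df = n - 1 = 9, taken from a t table
T_CRIT = 2.262

lower = mean - T_CRIT * se
upper = mean + T_CRIT * se
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is symmetrical around the sample mean; a larger sample shrinks the SE and therefore the interval.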


Exercise: sampling distribution

• Coin tossing
• Probability of heads or tails: 50%
• Each of you is a “sample” for this activity.
• Flip the coin 9 times and count the number of times you get “heads”.

Live demo: http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
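The classroom exercise can also be simulated offline. The sketch below assumes a fair coin, a simulated "class" of 10,000 students (far more than a real class, so the shape is clear), and a fixed seed. It compares the simulated distribution of heads with the exact binomial probabilities, whose mean is 9 × 0.5 = 4.5 and standard deviation √(9 × 0.5 × 0.5) = 1.5.

```python
import math
import random
import statistics

random.seed(7)  # repeatable run

FLIPS = 9          # each "sample" flips the coin 9 times
STUDENTS = 10_000  # simulated class size

# Number of heads obtained by each simulated student
heads = [sum(random.random() < 0.5 for _ in range(FLIPS)) for _ in range(STUDENTS)]

# Exact binomial probabilities for 0..9 heads, for comparison
exact = {k: math.comb(FLIPS, k) * 0.5 ** FLIPS for k in range(FLIPS + 1)}

for k in range(FLIPS + 1):
    observed = heads.count(k) / STUDENTS
    print(f"{k} heads: simulated {observed:.3f}  exact {exact[k]:.3f}")

print("mean heads:", statistics.fmean(heads))    # close to 4.5
print("SD of heads:", statistics.pstdev(heads))  # close to 1.5
```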


Standard Error (for nominal & ordinal data)

The variable must have only two categories (categories could be combined to achieve this).

Standard error for a binomial distribution:

$$S_B = \sqrt{\frac{PQ}{N}}$$

where P = the % in one category of the variable, Q = the % in the other category of the variable, and N = total number in the sample.
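A minimal sketch of the binomial standard error, assuming P and Q are entered as percentages (as defined above), so the result is in percentage points; the helper name `binomial_se` and the survey figures are hypothetical.

```python
import math

def binomial_se(p, q, n):
    """S_B = sqrt(P * Q / N), with P and Q given as percentages."""
    if abs(p + q - 100) > 1e-9:
        raise ValueError("P and Q must cover both categories (sum to 100%)")
    return math.sqrt(p * q / n)

# Hypothetical survey: 60% yes, 40% no, N = 100 respondents
se = binomial_se(60, 40, 100)
print(round(se, 2))  # 4.9 percentage points
```

The standard error is largest when the two categories are evenly split (P = Q = 50%).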


Choosing the Statistical Technique*

• Specific research question or hypothesis
• Determine the number of variables in the question: univariate analysis, bivariate analysis, or multivariate analysis
• Determine the level of measurement of the variables
• Choose the univariate method of analysis
• Choose relevant descriptive statistics
• Choose relevant inferential statistics

* Source: De Vaus, D.A. (1991) Surveys in Social Research. Third edition. North Sydney, Australia: Allen & Unwin Pty Ltd., p. 133


Methods of analysis (De Vaus, p. 134)

Univariate methods:
• Frequency distributions

Bivariate methods:
• Cross tabulations
• Scattergrams
• Regression
• Correlation
• Comparison of means


Association

• Example: gender and voting
– Are gender and party supported associated (related)?
– Are gender and party supported independent (unrelated)?
– Are women more likely than men to vote Republican? Are men more likely to vote Democrat?


Association

Association in bivariate data means that certain values of one variable tend to occur more often with some values of the second variable than with other values of that variable (Moore, p. 242).


Cross Tabulation Tables

• Designate the X variable and the Y variable
• Place the values of X across the table
• Draw a column for each X value
• Place the values of Y down the table
• Draw a row for each Y value
• Insert frequencies into each CELL
• Compute totals (MARGINALS) for each column and row
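The construction steps above can be sketched in code. This is a minimal illustration with a handful of hypothetical respondent records (made-up data): X (occupation) runs across the columns, Y (vote) runs down the rows, each cell holds a frequency, and the marginals are the column, row, and grand totals.

```python
from collections import Counter

# Hypothetical raw records: (occupation, vote) for each respondent
records = [
    ("White collar", "Republican"), ("Blue collar", "Democrat"),
    ("White collar", "Democrat"), ("Blue collar", "Democrat"),
    ("White collar", "Republican"), ("Blue collar", "Republican"),
]

x_values = ["White collar", "Blue collar"]  # X across the table (columns)
y_values = ["Democrat", "Republican"]       # Y down the table (rows)

cells = Counter(records)  # frequency in each (X, Y) cell

# Marginals: totals for each column, each row, and the whole table
col_totals = {x: sum(cells[(x, y)] for y in y_values) for x in x_values}
row_totals = {y: sum(cells[(x, y)] for x in x_values) for y in y_values}
grand_total = sum(col_totals.values())

print(f"{'':12}" + "".join(f"{x:>14}" for x in x_values) + f"{'Total':>8}")
for y in y_values:
    row = "".join(f"{cells[(x, y)]:>14}" for x in x_values)
    print(f"{y:12}" + row + f"{row_totals[y]:>8}")
print(f"{'Total':12}" + "".join(f"{col_totals[x]:>14}" for x in x_values)
      + f"{grand_total:>8}")
```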


Determining if a Relationship Exists

• Compute percentages for each value of X (down each column)
– Base = marginal for each column
• Read the table by comparing values of X for each value of Y
– Read the table across each row
• Terminology: strong/weak; positive/negative; linear/curvilinear


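Cross tabulation tables

Vote by occupation (De Vaus pp. 158-160)

| Vote | White collar Freq | White collar % | Blue collar Freq | Blue collar % | Total |
|---|---|---|---|---|---|
| Democrat | 270 | 27% | 810 | 81% | 1080 |
| Republican | 730 | 73% | 190 | 19% | 920 |
| Totals | 1000 | 100% | 1000 | 100% | 2000 |

Calculate percentages down each column; read the table across each row.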
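The two steps for the occupation-by-vote table can be sketched as follows, using the counts from the table (De Vaus pp. 158-160). Percentages are computed down each column with the column marginal as the base, then compared across each row; the percentage-point difference is one simple way to express how far the columns differ.

```python
# Counts from the occupation-by-vote table (De Vaus pp. 158-160)
table = {
    "White collar": {"Democrat": 270, "Republican": 730},
    "Blue collar": {"Democrat": 810, "Republican": 190},
}

# Step 1: percentage down each column (base = column marginal)
col_pct = {}
for occupation, counts in table.items():
    base = sum(counts.values())
    col_pct[occupation] = {vote: 100 * n / base for vote, n in counts.items()}

# Step 2: read across each row and compare the column percentages
for vote in ["Democrat", "Republican"]:
    white = col_pct["White collar"][vote]
    blue = col_pct["Blue collar"][vote]
    print(f"{vote:10} white collar {white:.0f}%  blue collar {blue:.0f}%  "
          f"difference {blue - white:+.0f} points")
```

A 54-point gap between the columns in each row points to a clear association between occupation and vote in these data.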


Cross tabulation
• Use column percentages and compare these across the table
• Where there is a difference, this indicates some association


Describing association

• Direction: positive - negative
• Strength: strong - weak
• Nature: linear - curvilinear


Describing association

Two variables are positively associated when larger values of one tend to be accompanied by larger values of the other

The variables are negatively associated when larger values of one tend to be accompanied by smaller values of the other

(Moore, p. 254)


Describing association

Scattergram or scatterplot: a graph that can be used to show how two interval-level variables are related to one another

[Figure: example scattergrams with axes weight and age, and Variable A and Variable B]


Description of Scattergrams

– Strength of relationship: strong, moderate, low
– Linearity of relationship: linear, curvilinear
– Direction: positive, negative


Description of scatterplots

Strength and direction

[Figure: example scatterplots of Y against X]


Description of scatterplots

Strength and direction; nature

[Figure: example scatterplots of Y against X]


Correlation

• Correlation coefficient: a number used to describe the strength and direction of association between variables
• Very strong = .80 through 1
• Moderately strong = .60 through .79
• Moderate = .50 through .59
• Moderately weak = .30 through .49
• Very weak to no relationship = 0 through .29

The strength bands apply to the size of the coefficient regardless of sign, on a scale running from -1.00 (perfect negative correlation) through 0.00 (no relationship) to 1.00 (perfect positive correlation).
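The bands above can be turned into a small lookup. This is a minimal sketch; the function name `describe_r` is hypothetical, and values that fall in the small gaps between the published bands (for example .795) are assigned to the lower band by using "at least" thresholds, which is an assumption rather than part of the slide.

```python
def describe_r(r):
    """Label a correlation coefficient with the strength bands above.

    Bands are applied to the absolute value; the sign gives the direction.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and +1")
    size = abs(r)
    if size >= 0.80:
        strength = "very strong"
    elif size >= 0.60:
        strength = "moderately strong"
    elif size >= 0.50:
        strength = "moderate"
    elif size >= 0.30:
        strength = "moderately weak"
    else:
        strength = "very weak to no relationship"
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    return strength, direction

print(describe_r(0.8913))  # ('very strong', 'positive')
print(describe_r(-0.45))   # ('moderately weak', 'negative')
```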


Correlation Coefficients

– Nominal: Phi, Cramer’s V
– Ordinal (linear): Gamma
– Nominal and interval: Eta

http://www.nyu.edu/its/socsci/Docs/correlate.html
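Phi and Cramér's V can be computed directly from a table of counts. The sketch below assumes the standard chi-square-based definition of Cramér's V, V = sqrt(chi-square / (n(k - 1))) where k is the smaller of the number of rows and columns, and reuses the occupation-by-vote counts from the cross tabulation earlier in this session; the helper names are hypothetical. For a 2 x 2 table, V equals the absolute value of phi.

```python
import math

def phi_2x2(a, b, c, d):
    """Phi for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def cramers_v(table):
    """Cramer's V for an r x c table of counts (a list of rows)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Rows: Democrat, Republican; columns: white collar, blue collar
vote_by_occupation = [[270, 810], [730, 190]]

phi = phi_2x2(270, 810, 730, 190)
print(round(phi, 3))                           # sign depends on category order
print(round(cramers_v(vote_by_occupation), 3))  # same size as phi for 2 x 2
```

With nominal variables the sign of phi only reflects how the categories were ordered in the table, so its size is what carries the information.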


Correlation: Pearson’s r

– Interval and/or ratio variables
– Pearson product moment coefficient (r)
• two interval variables, normally distributed
• assumes a linear relationship
• can be any number from -1 through 0 to +1
• sign (+ or -) shows direction
• number shows strength
• linearity cannot be determined from the coefficient

e.g.: r = .8913
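A minimal from-scratch sketch of Pearson's r (the function name `pearson_r` and the data are hypothetical). The third example shows why linearity cannot be read from the coefficient: a perfect U-shaped (curvilinear) relationship still gives r = 0, which is one reason to look at the scattergram as well.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0: perfect positive, linear
print(pearson_r(x, [-v for v in x]))         # -1.0: perfect negative, linear
print(pearson_r(x, [v * v for v in x]))      # 0.0: perfect U-shape, not linear
```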


Summary

• Bivariate analysis
• Crosstabulation
– X: columns
– Y: rows
• Calculate percentages for columns
• Read percentages across the rows to observe association
• Correlation and scattergram: describe strength and direction of association