Multivariate Methods

LIR 832

Multivariate Methods: Topics of the Day
A. Isolating Interventions in a multi-causal world
B. Multivariate probability distributions
C. The Building Block: covariance
D. The Next Step: Correlation

A Multivariate World
Isolating Interventions in a Multi-Causal World
A. Examples of the problem: How do we evaluate a program to reduce absences from a plant? Is there age discrimination?
B. Types of data
   Experimental
   Quasi-experimental
   Non-experimental
C. We need multivariate analysis to sort out causal relationships.

Bi-Variate Relations: A First Run at Multivariate Methods
A. Many of the issues we are interested in are essentially about the relationship between two variables.
B. Bi-variate relationships can be generalized to multivariate relationships.
C. We learn the bi-variate case formally and make more intuitive reference to the multivariate case.
D. What do we mean by a bi-variate relationship?

Bi-Variate Example
Our firm has formed teams of engineers, accountants and general managers at all plants to work on several issues that are considered important in the firm. The firm has long been committed to gender diversity, and we are interested in the distribution of gender among our managerial classifications. We are particularly concerned about the distribution of gender on these teams, especially among engineers. Consider the distribution of two statistics about these three-person teams:
a. gender of the team members (X: x = number of men)
b. is the engineer a woman (Y: 0 = man, 1 = woman)
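
The joint distribution of X and Y for these teams can be sketched by listing every possible team. This is a minimal illustration, assuming each team has exactly one engineer, one accountant and one general manager, and that each position is equally likely to be filled by a man or a woman, independently of the others (the eighths used in the calculations that follow suggest this, but the slides do not state it). The names ROLES and joint_distribution are made up for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: one person per role, each a man ("M") or a woman ("W") with
# equal probability, so all 2**3 = 8 teams are equally likely.
ROLES = ("engineer", "accountant", "manager")


def joint_distribution():
    """Return {(x, y): probability}, x = number of men, y = 1 if the engineer is a woman."""
    counts = Counter()
    for team in product("MW", repeat=len(ROLES)):
        x = team.count("M")              # X: number of men on the team
        y = 1 if team[0] == "W" else 0   # Y: team[0] is the engineer
        counts[(x, y)] += 1
    total = sum(counts.values())
    return {xy: Fraction(n, total) for xy, n in counts.items()}


joint = joint_distribution()
for (x, y), p in sorted(joint.items()):
    print(f"P(X={x}, Y={y}) = {p}")
```

Cells that never occur, such as three men together with a woman engineer, simply do not appear in the dictionary.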

Bi-Variate Example (cont.)
We can also use this information to build conditional probabilities: What is the likelihood that the engineer is a woman, given that there is exactly one man on the team?

Bi-Variate Example (cont.)
What is the likelihood that the engineer is a woman, given that there is exactly one man on the team?
P(Y = 1 | X = 1) = P(Y = 1 & X = 1) / P(X = 1) = (2/8) / (3/8) = 2/3
Note: P(Y = 1 | X = 2) is read as "the probability that Y is equal to 1 given that X = 2" or "the probability that Y = 1 conditional on X = 2".

Bi-Variate Example (cont.)
What is the likelihood that there is only one man, given that the engineer is a woman?
P(X = 1 | Y = 1) = P(Y = 1 & X = 1) / P(Y = 1) = (2/8) / (4/8) = 2/4 = 1/2

Bi-Variate Example (cont.)
What is the likelihood that the engineer is a woman? P(Y = 1) = 1/2
But if we know that there are two men, we can improve our estimate:
P(Y = 1 | X = 2) = P(Y = 1 & X = 2) / P(X = 2) = (1/8) / (3/8) = 1/3
What about calculating the likelihood of two men given the engineer is a woman?
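
The conditional probabilities worked out for the team example can be checked with a short sketch. It assumes the eight possible teams are equally likely, which gives the joint table hard-coded below; the helper names are illustrative only. The last line answers the closing question about two men given a woman engineer.

```python
from fractions import Fraction

# Joint distribution P(X = x, Y = y) for the three-person team example,
# assuming all 8 gender combinations are equally likely.
# X = number of men on the team, Y = 1 if the engineer is a woman.
JOINT = {
    (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(2, 8),
    (2, 0): Fraction(2, 8), (2, 1): Fraction(1, 8),
    (3, 0): Fraction(1, 8),
}


def marginal_x(x):
    """P(X = x): sum the joint probabilities over all values of Y."""
    return sum(p for (xi, _), p in JOINT.items() if xi == x)


def marginal_y(y):
    """P(Y = y): sum the joint probabilities over all values of X."""
    return sum(p for (_, yi), p in JOINT.items() if yi == y)


def y_given_x(y, x):
    """P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    return JOINT.get((x, y), Fraction(0)) / marginal_x(x)


def x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return JOINT.get((x, y), Fraction(0)) / marginal_y(y)


print(y_given_x(1, 1))  # 2/3: woman engineer given exactly one man
print(x_given_y(1, 1))  # 1/2: exactly one man given a woman engineer
print(y_given_x(1, 2))  # 1/3: woman engineer given two men
print(x_given_y(2, 1))  # 1/4: two men given a woman engineer
```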

Example: Gender Distribution
Working with Conditional Probability:

P(female) = 50.91%

P(female | LRHR) = P(female & LRHR) / P(LRHR) = 0.36 / 0.55 ≈ 0.65, or about 65%

P(LRHR) = 55%

P(LRHR | female) = P(LRHR & female) / P(female) = 0.36 / 0.5091 ≈ 0.7, or about 70%

Independence Defined
Now that we know a bit about bi-variate relationships, we can define what it means, in a statistical sense, for two events to be independent.
If events are independent, then:
a. Their conditional probability is equal to their unconditional probability: P(X | Y) = P(X).
b. The probability of the two events occurring together is the product of their probabilities: P(X, Y) = P(X) * P(Y).
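
The product rule gives a direct test for independence: every cell of the joint distribution must equal the product of its marginals. The sketch below applies that test to the team example (assuming eight equally likely teams) and, for contrast, to two fair coin flips, which are a made-up illustration rather than something from the slides. The function name is_independent is hypothetical.

```python
from fractions import Fraction
from itertools import product


def is_independent(joint):
    """Return True if P(x, y) == P(x) * P(y) for every combination of values."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    zero = Fraction(0)
    p_x = {x: sum(joint.get((x, y), zero) for y in ys) for x in xs}
    p_y = {y: sum(joint.get((x, y), zero) for x in xs) for y in ys}
    return all(
        joint.get((x, y), zero) == p_x[x] * p_y[y]
        for x, y in product(xs, ys)
    )


# Team example: X = number of men, Y = 1 if the engineer is a woman.
team_joint = {
    (0, 1): Fraction(1, 8),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(2, 8),
    (2, 0): Fraction(2, 8), (2, 1): Fraction(1, 8),
    (3, 0): Fraction(1, 8),
}

# Illustrative contrast: two fair coin flips, each cell has probability 1/4.
coin_joint = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}

print(is_independent(team_joint))  # False: knowing X changes the odds for Y
print(is_independent(coin_joint))  # True
```

In the team example, P(Y = 1) is 1/2 but P(Y = 1 | X = 2) is 1/3, so the conditional and unconditional probabilities differ and the two variables are not independent.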

Importance of Independence
Why is independence important?
If events are independent, then we are getting unique information from each data point.
If events are not independent, then the data points overlap in the information they carry.
A practical example: running a survey on employee satisfaction within an establishment.

Example: Employee Satisfaction

Covariance
Covariance: Building Block of Multivariate Analysis
All very nice, but what we are looking for is a means of expressing and measuring the strength of association of two variables.
How closely do they move together?
Is variable A a good predictor of variable B?
We move to a slightly more complex world: no more two- and three-category variables.

Example: Age and Income Data

Descriptive Statistics: age, annual income

Variable   N    Mean    Median  StDev   SE Mean
age        23   24.565  23.000  4.251   0.886
annual I   23   17174   10000   15712   3276

Variable   Minimum  Maximum  Q1      Q3
age        22.000   42.000   22.000  26.000
annual I   0        65000    7000    25000

Example: Age and Income Data
Adding some info to the graph

Covariance and Correlation Defined
Define covariance and correlation for a random sample of data:
Let our data be composed of pairs (Xi, Yi), where X has mean mx and Y has mean my. Then the covariance, the co-movement of X and Y around their means, is defined as:
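
For reference, a common form of the sample covariance that fits this description is sketched below. Here n is the number of pairs and m_x, m_y are the means of X and Y. The divisor n − 1 is an assumption (it is the usual convention for sample data); dividing by n instead gives the population version.

```latex
\[
\mathrm{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - m_x)(Y_i - m_y)
\]
```

Each term in the sum is positive when X_i and Y_i lie on the same side of their means and negative when they lie on opposite sides.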

Example: Covariance
We observe the relationship between the number of employees at work at a plant and the output for five days in a row:

Attendance   Output
8            40
3            28
2            20
6            39
4            28

What is the covariance of attendance and output?

Example: Covariance (cont.)
The covariance is positive. This suggests that when attendance is above its mean, output is also above its mean. Similarly, when attendance is below its mean, output is below its mean.
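
The calculation for the attendance and output example can be sketched as follows, reading the five days as the pairs (8, 40), (3, 28), (2, 20), (6, 39) and (4, 28). The function name is illustrative, and the n − 1 divisor is an assumption about which convention is intended; with a divisor of n the result would be 15.4 instead.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations from the means, over n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


attendance = [8, 3, 2, 6, 4]
output = [40, 28, 20, 39, 28]

cov = sample_covariance(attendance, output)
print(round(cov, 2))  # 19.25
```

The means are 4.6 for attendance and 31 for output, and every day's pair of deviations has the same sign, which is why the covariance comes out positive.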

Example: Overtime Hours and Productivity
Covariances: prod-avg, week

            prod-avg    week
prod-avg    113.7292
week        -49.5667    22.6667

Example: Overtime Hours and Productivity (cont.)
Covariances: prod-avg, week, week-hours

              prod-avg    week       week-hours
prod-avg      233.3345
week          -51.8706    21.3986
week-hours    -89.0777    0.0000     99.3069

Correlation vs. Covariance
A limitation of covariance is that it is difficult to interpret: its units are the product of the units of the two variables, so its size depends on how each variable is measured.
Thus, we need a measure which is more readily interpreted and tells us about the strength of association.
Correlation:
Population correlation is defined as:
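
The standard definition of the population correlation, written here with σ_X and σ_Y for the population standard deviations of X and Y (notation assumed for this sketch), is:

```latex
\[
\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad -1 \le \rho_{XY} \le 1
\]
```

Dividing by the two standard deviations cancels the units, so the result is a pure number whose sign gives the direction of the linear association and whose absolute value gives its strength.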

Correlation = 1.00

Correlation = 0.94

Correlation = 0.604

Correlation = 0.198

Correlation: Previous Examples
Overtime-Productivity: Limit to 5 days, 10 hours:

Correlations: prod-avg, week, week-hours

              prod-avg    week
week          -0.734
               0.000

week-hours    -0.585      0.000
               0.000      1.000
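
These correlations can be recovered from the three-variable covariance output for prod-avg, week and week-hours by dividing each covariance by the product of the two standard deviations. The sketch below copies the covariance entries from that output; the dictionary layout and the function name are illustrative choices.

```python
import math

# Variances (diagonal) and covariances (off-diagonal) copied from the
# three-variable covariance output for prod-avg, week and week-hours.
cov = {
    ("prod-avg", "prod-avg"): 233.3345,
    ("week", "week"): 21.3986,
    ("week-hours", "week-hours"): 99.3069,
    ("prod-avg", "week"): -51.8706,
    ("prod-avg", "week-hours"): -89.0777,
    ("week", "week-hours"): 0.0,
}


def correlation(a, b):
    """Correlation = covariance divided by the product of the standard deviations."""
    return cov[(a, b)] / math.sqrt(cov[(a, a)] * cov[(b, b)])


print(round(correlation("prod-avg", "week"), 3))        # -0.734
print(round(correlation("prod-avg", "week-hours"), 3))  # -0.585
print(round(correlation("week", "week-hours"), 3))      # 0.0
```

The rescaling changes the magnitudes but never the signs: a negative covariance always gives a negative correlation, and a zero covariance gives a zero correlation.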

Example: Correlation
What about some real data: the relationship between age, gender and weekly earnings among human resource managers (admin associated occupations)?

Example: Correlation
Descriptive Statistics: Female, age, weekearn

Variable   N      N*    Mean     Median   TrMean   StDev
Female     55158  0     0.50471  1.00000  0.50524  0.49998
age        55158  0     42.357   42.000   42.103   11.662
weekearn   47576  7582  894.53   769.23   846.16   562.22

Variable   SE Mean  Minimum  Maximum  Q1       Q3
Female     0.00213  0.00000  1.00000  0.00000  1.00000
age        0.050    15.000   90.000   33.000   51.000
weekearn   2.58     0.01     2884.61  519.00   1153.00

Example: Correlation
Tabulated Statistics: Female

Rows: Female
          weekearn  weekearn
          Mean      StDev
Male      1085.4    622.1
Female    727.2     440.5
All       894.5     562.2

Example: Correlation
Tabulated Statistics: Female

Rows: Female
          weekearn  age      weekearn  age
          Mean      Mean     StDev     StDev
male      1085.4    43.256   622.1     11.856
female    727.2     41.475   440.5     11.399
all       894.5     42.357   562.2     11.662

Example: Correlation
Covariances: age, weekearn, Female

            age       weekearn    Female
age         135.99
weekearn    1119.24   316094.42
Female      -0.45     -89.17      0.25

Example: Correlation
Correlations: age, Female, weekearn

            age       Female
Female      -0.076
             0.000
weekearn     0.174    -0.318

Example: Non-Linearity

Correlation and Covariance
So covariance and correlation are measures of linear association, not measures of association in general: a strong non-linear relationship can still produce a correlation near zero.
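
A small made-up illustration of the point: in the sketch below, y is completely determined by x, yet the correlation is zero because the relationship is a symmetric curve rather than a line. The data and the function name are invented for the example.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation computed from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]

# Perfect non-linear relationship: y = x squared.
ys_curve = [x ** 2 for x in xs]
r_curve = pearson_correlation(xs, ys_curve)

# Perfect linear relationship for comparison: y = 2x + 1.
ys_line = [2 * x + 1 for x in xs]
r_line = pearson_correlation(xs, ys_line)

print(r_curve)  # zero: the positive and negative cross-products cancel
print(r_line)   # one: all points lie on an upward-sloping line
```

A scatterplot would reveal the curve immediately, which is one reason to plot the data before relying on a correlation coefficient.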

Correlation and Covariance
What if we do not have data on individuals but data on distributions? For example, we have plant-level data, but plants vary widely in employment. We want to give greater weight to plants with more employees.
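
One common way to give larger plants more weight is to weight each plant's deviations by its employment. The sketch below is an assumption about the approach rather than the slides' own formula: it treats employment as a frequency weight and divides by the sum of the weights. The plant data and variable names are hypothetical, made up purely for illustration.

```python
import math


def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_covariance(xs, ys, weights):
    """Weighted average of the products of deviations from the weighted means."""
    mean_x = weighted_mean(xs, weights)
    mean_y = weighted_mean(ys, weights)
    total = sum(weights)
    return sum(
        w * (x - mean_x) * (y - mean_y) for x, y, w in zip(xs, ys, weights)
    ) / total


def weighted_correlation(xs, ys, weights):
    """Weighted covariance scaled by the weighted standard deviations."""
    var_x = weighted_covariance(xs, xs, weights)
    var_y = weighted_covariance(ys, ys, weights)
    return weighted_covariance(xs, ys, weights) / math.sqrt(var_x * var_y)


# Hypothetical plant-level data (not from the slides).
employees = [50, 200, 1000]            # weights: plant employment
absence_rate = [2.0, 4.0, 6.0]         # percent of shifts missed
output_per_worker = [10.0, 8.0, 5.0]   # units per shift

print(weighted_covariance(absence_rate, output_per_worker, employees))
print(weighted_correlation(absence_rate, output_per_worker, employees))
```

With equal weights the function reduces to the ordinary covariance with divisor n, so the weighting only changes the answer when plants differ in size.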
