
Advanced Topics in Regression

Quantile Regression, Analysis of Causality, Mediation Analysis, Hierarchical Linear Modeling

Compiled by Nick Evangelopoulos, 2013


Part 1: Quantile Regression


Motivation for Quantile Regression

Problem

• ANOVA and regression provide information only about the conditional mean.

• More knowledge about the distribution of the statistic may be important.

• The covariates may shift not only the location or scale of the distribution, they may affect the shape as well.

Solution

• Quantile regression models the relationship between X and the conditional quantiles of Y given X = x.

Quantile Definition

• Definition: Given p ∈ [0, 1], a pth quantile of a random variable Z is any number ζp such that Pr(Z < ζp) ≤ p ≤ Pr(Z ≤ ζp). The solution always exists, but need not be unique.

• Ex: Suppose Z = {3, 4, 7, 9, 9, 11, 17, 21} and p = 0.5. Then Pr(Z < 9) = 3/8 ≤ 1/2 ≤ Pr(Z ≤ 9) = 5/8, so the 50th percentile is equal to 9.
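The definition can be checked mechanically on the example sample. The sketch below is illustrative only: the function names are made up for this note, probabilities are taken under the empirical distribution of the sample, and only sample values are tested as candidates (numbers between two sample values can also qualify when the solution is not unique).

```python
from fractions import Fraction

def is_pth_quantile(sample, zeta, p):
    """Check Pr(Z < zeta) <= p <= Pr(Z <= zeta) under the empirical distribution."""
    n = len(sample)
    below = Fraction(sum(z < zeta for z in sample), n)
    at_or_below = Fraction(sum(z <= zeta for z in sample), n)
    return below <= p <= at_or_below

def sample_quantiles(sample, p):
    """Return the distinct sample values that satisfy the definition (may be several)."""
    return sorted({z for z in sample if is_pth_quantile(sample, z, p)})

Z = [3, 4, 7, 9, 9, 11, 17, 21]
print(sample_quantiles(Z, Fraction(1, 2)))             # [9]
print(sample_quantiles([1, 2, 3, 4], Fraction(1, 2)))  # [2, 3]: not unique
```

The second call illustrates the "need not be unique" clause: with four distinct values, both 2 and 3 satisfy the two inequalities at p = 0.5.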

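Quantile Regression

• A family of conditional quantiles of Y given X=x.

• The median regression line (p = 0.5) coincides with the OLS regression line when the conditional distribution of Y is symmetric, so that the conditional mean and median agree; in general the two lines differ.

• The quantile functions are solutions to a set of linear programming problems.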

[Figure: conditional quantile lines of Y against x at 90%, 75%, 50%, 25%, and 10%]

Quantile Regression

[Figure: Daily High Temperature. A scatter of daily high temperature in Sydney, today against yesterday. The red line is the 45-degree line.]

Quantile Regression

[Figure: histogram, Cool Yesterday (n=259). x-axis: Temperature Today; y-axis: Frequency]

Quantile Regression

[Figure: histogram, Hot Yesterday (n=259). x-axis: Temperature Today; y-axis: Frequency]

Quantile Regression

Quantiles at .90, .75, .50, .25, and .10. Given yesterday’s temperature, the conditional distribution of today’s temperature is not symmetric.

[Figure: Temperature Quantiles. x-axis: Yesterday; y-axis: Today]

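Quantile Regression: Estimation

• The quantile regression coefficients are the solution to

$$\min_{\beta} \; \frac{1}{n} \sum_{i=1}^{n} \left[ p - \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sgn}\!\left(y_i - x_i'\beta\right) \right] \left(y_i - x_i'\beta\right) \qquad (1)$$

• The k first order conditions are

$$\frac{1}{n} \sum_{i=1}^{n} \left[ p - \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sgn}\!\left(y_i - x_i'\hat{\beta}_p\right) \right] x_i = 0 \qquad (2)$$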
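To make the estimation problem concrete, here is a minimal sketch of the objective with weight p - 1/2 + 1/2 sgn(u) on each residual u, together with a brute-force fit of a single line y = b0 + b1 x. It is not how production software solves the problem. The fit relies on the standard linear-programming result that, with a full-rank design, some optimal solution passes through at least as many data points as there are coefficients, so for a line it is enough to try every pair of points. The function names and the data are hypothetical.

```python
from itertools import combinations

def sgn(u):
    """Sign of u: -1, 0, or 1."""
    return (u > 0) - (u < 0)

def check_loss(p, residuals):
    """Mean of (p - 1/2 + 1/2 sgn(u)) * u over the residuals u."""
    return sum((p - 0.5 + 0.5 * sgn(u)) * u for u in residuals) / len(residuals)

def fit_quantile_line(xs, ys, p):
    """Brute-force pth quantile line: try every line through two data points.

    Only practical for small samples; real solvers use linear programming.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # two points with equal x do not define a candidate line
        b1 = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b0 = ys[i] - b1 * xs[i]
        loss = check_loss(p, [y - (b0 + b1 * x) for x, y in zip(xs, ys)])
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    return best[1], best[2]

# Hypothetical data: four points on y = x plus one large outlier.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 100]
b0, b1 = fit_quantile_line(xs, ys, 0.5)
print(b0, b1)
```

In this made-up sample the median line stays on the four collinear points and ignores the outlier, which is the usual robustness argument for median regression over OLS.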

Quantile Regression: Coefficient Interpretation

• The jth coefficient is the marginal change in the pth conditional quantile due to a marginal change in the jth element of x:

$$\beta_{pj} = \frac{\partial Q_p(y_i \mid x_i)}{\partial x_{ij}}$$

• There is no guarantee that the ith person will remain in the same quantile after her x is changed.

Quantile Regression: Bibliography

• Koenker and Hallock (2001), “Quantile Regression,” Journal of Economic Perspectives, Vol. 15, pp. 143-156.

• Buchinsky (1998), “Recent Advances in Quantile Regression Models,” Journal of Human Resources, Vol. 33, pp. 88-126.

• www.econ.uiuc.edu/~roger

• http://Lib.stat.cmu.edu/R/CRAN

Quantile Regression in SAS

Optional Reading: Colin (Lin) Chen, An Introduction to Quantile Regression and the QUANTREG Procedure, SUGI30, Paper 213-30


Part 2: Analysis of Causality

For more information: BUSI 6280

The material presented here is based on a paper by Josef Brüderl (University of Mannheim, Germany)


Panel Data

Methods for analysis of causality exploit a data structure of multi-dimensional longitudinal data, which is typically described in the statistics and econometrics literature as panel data.

Panel data is defined as a combination of cross-section data, where data on one or more variables are collected at the same point in time, and time-series data, where data are collected at regular time intervals.
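As a small illustration of that definition, the sketch below stores a panel in long format and pulls out one cross-section and one time series. The firms, years, and sales figures are invented for this note, and the "balanced" check (every unit observed in every period) is a common convention added here, not something stated on the slide.

```python
# Hypothetical long-format panel: 2 firms (cross-section) x 3 years (time series).
panel = [
    {"firm": "A", "year": 2010, "sales": 10.0},
    {"firm": "A", "year": 2011, "sales": 11.5},
    {"firm": "A", "year": 2012, "sales": 12.1},
    {"firm": "B", "year": 2010, "sales": 7.0},
    {"firm": "B", "year": 2011, "sales": 7.4},
    {"firm": "B", "year": 2012, "sales": 8.3},
]

def cross_section(rows, year):
    """All units observed at one point in time."""
    return [r for r in rows if r["year"] == year]

def time_series(rows, firm):
    """One unit followed over time, in chronological order."""
    return sorted((r for r in rows if r["firm"] == firm), key=lambda r: r["year"])

def is_balanced(rows):
    """True if every unit is observed in every period."""
    units = {r["firm"] for r in rows}
    periods = {r["year"] for r in rows}
    seen = {(r["firm"], r["year"]) for r in rows}
    return len(seen) == len(units) * len(periods)

print(cross_section(panel, 2011))
print(time_series(panel, "A"))
print(is_balanced(panel))
```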

Analysis of panel data will be performed using the TSCSREG procedure in the statistical package SAS (Allison 2005; Mohd Nor & Maarof 2007) and the xtreg command in the statistical package Stata (Brüderl 2005).

References

Allison, P.D. (2005). Fixed Effects Regression Methods for Longitudinal Data Using SAS. SAS Press.

Brüderl, J. (2005). Panel Data Analysis. University of Mannheim, http://www2.sowi.uni-mannheim.de/lsssm/veranst/Panelanalyse.pdf (accessed October 15, 2012)

Mohd Nor, A. H. S., & Maarof, F. (2007). “Panel Data Analysis Using SAS”. Proceedings of the 21st Annual SAS Malaysia Forum, 5th September 2007, Kuala Lumpur.

Halaby, C. (2004). Panel Models in Sociological Research. Annual Review of Sociology, 30: 507-544.

Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press.

Wooldridge, J. (2003). Introductory Econometrics: A Modern Approach. Thomson. Chapters 13, 14.



Part 3: Mediation Analysis

For more information: BUSI 6280, EPSY 6270

The material presented here is based on Wikipedia

Mediation Models

Mediation is a hypothesized causal chain in which one variable affects a second variable that, in turn, affects a third variable. The intervening variable, M, is the mediator. It “mediates” the relationship between a predictor, X, and an outcome, Y.

• a and b: direct effects of X on M and of M on Y, respectively

• c’: direct effect of X on Y after accounting for M

[Diagram: X → M (path a), M → Y (path b), X → Y (path c’)]

Baron and Kenny steps

The Baron and Kenny (1986) approach is not the best, but many researchers are still using it.

STEP 1: Conduct a simple regression analysis with X predicting Y to test for path c alone.

c is the total effect of X on Y, estimated without taking M into account. This is not the same as c’ on the previous slide!

[Diagram: X → Y (path c), M not in the model]

Baron and Kenny steps

STEP 2: Conduct a simple regression analysis with X predicting M to test the significance of path a alone.

[Diagram: X → M (path a)]

Baron and Kenny steps

STEP 3: Conduct a simple regression analysis with M predicting Y to test the significance of path b alone.

The purpose of Steps 1-3 is to establish that zero-order relationships among the variables exist. If one or more of these relationships are non-significant, researchers usually conclude that mediation is not possible or likely.

Assuming there are significant relationships from Steps 1 through 3, proceed to Step 4.

[Diagram: M → Y (path b)]

Baron and Kenny steps

STEP 4: Conduct a multiple regression analysis with X and M predicting Y.

In Step 4, some form of mediation is supported if the effect of M (path b) remains significant after controlling for X. If X is no longer significant when M is controlled, the finding supports full mediation. If X is still significant, the finding supports partial mediation.

[Diagram: M → Y (path b), X → Y (path c’)]
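The four regressions can be sketched with nothing more than least-squares formulas. The code below is a rough illustration under stated assumptions: the data are invented, the helper names are hypothetical, and it reports only the path coefficients. The significance tests that the steps call for need standard errors and a t distribution, which are left to a statistics package.

```python
def mean(v):
    return sum(v) / len(v)

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def two_predictor_ols(x, m, y):
    """Return (intercept, coef_x, coef_m) for y = b0 + b1*x + b2*m."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2  # zero if x and m are perfectly collinear
    b1 = (smm * sxy - sxm * smy) / det
    b2 = (sxx * smy - sxm * sxy) / det
    return my - b1 * mx - b2 * mm, b1, b2

# Hypothetical data for predictor X, mediator M, and outcome Y.
X = [1, 2, 3, 4, 5, 6, 7, 8]
M = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.3, 8.8]
Y = [1.5, 2.4, 3.1, 4.6, 4.9, 6.2, 6.8, 8.1]

_, c = simple_ols(X, Y)                    # Step 1: path c (X -> Y alone)
_, a = simple_ols(X, M)                    # Step 2: path a (X -> M)
_, b_alone = simple_ols(M, Y)              # Step 3: path b alone (M -> Y)
_, c_prime, b = two_predictor_ols(X, M, Y) # Step 4: paths c' and b together

print(f"c={c:.3f} a={a:.3f} b_alone={b_alone:.3f} c'={c_prime:.3f} b={b:.3f}")
print(f"c - (c' + a*b) = {c - (c_prime + a * b):.2e}")
```

The last line checks a known property of least-squares fits: the Step 1 coefficient c equals c’ + a·b, with a from Step 2 and b and c’ from Step 4, so the difference printed should be zero up to rounding.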

Sobel steps

STEP 1: Conduct a multiple regression analysis with X and M predicting Y: Y = b0 + b1X + b2M + e

STEP 2: Conduct a simple regression analysis with X predicting M: M = b3 + b4X + u

STEP 3: Compute the indirect effect as b_indirect = (b2)(b4). Significance is best determined using bootstrapping.

[Diagrams: X → M (path a); M → Y (path b) with X → Y (path c’)]
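The three steps and the bootstrap can be sketched as follows. This is one possible reading, not a prescribed procedure: it uses a percentile bootstrap that resamples cases with replacement, the data are simulated with made-up coefficients, and the function names, the number of replications, and the seed are all arbitrary choices for this note.

```python
import random

def mean(v):
    return sum(v) / len(v)

def indirect_effect(x, m, y):
    """Return b2 * b4 from Y = b0 + b1*X + b2*M + e and M = b3 + b4*X + u."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    b2 = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)  # Step 1: coefficient of M
    b4 = sxm / sxx                                         # Step 2: slope of M on X
    return b2 * b4                                         # Step 3

def bootstrap_ci(x, m, y, reps=1000, level=0.95, seed=0):
    """Approximate percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    tail = (1 - level) / 2
    return estimates[int(tail * reps)], estimates[int((1 - tail) * reps) - 1]

# Simulated data; the coefficients 0.8, 0.7, and 0.2 are arbitrary assumptions.
rng = random.Random(42)
X = [rng.gauss(0, 1) for _ in range(200)]
M = [0.8 * xi + rng.gauss(0, 1) for xi in X]
Y = [0.7 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(X, M)]

estimate = indirect_effect(X, M, Y)
low, high = bootstrap_ci(X, M, Y)
print(f"indirect effect = {estimate:.3f}, 95% bootstrap CI = ({low:.3f}, {high:.3f})")
```

An interval that excludes zero is usually read as evidence of a nonzero indirect effect. Resampling whole cases keeps each (X, M, Y) triple together, which is why the indices are drawn once per replication.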

SEM approach

The Structural Equation Modeling (SEM) approach is considered the best for testing mediation effects. In SEM, a single mediation model is tested.

Full mediation and partial mediation models can be compared by fitting both as alternative models. The model with the better fit statistics is the more appropriate.

[Diagrams: full mediation (paths a and b only); partial mediation (paths a, b, and c’)]

References

Baron, R.M. & Kenny, D.A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.

MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. Mahwah, NJ: Erlbaum.

Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhardt (Ed.), Sociological Methodology (pp. 290-312). Washington DC: American Sociological Association.


Part 4: Hierarchical Linear Modeling

For more information: BUSI 6480, EPSY 6230 (EPSY offered at the UNT College of Education)

Multilevel Models

Multilevel models are particularly appropriate for research designs where the data for participants are organized at more than one level.

Analysis of Covariance (ANCOVA) designs include nested designs:

• Individuals nested within groups

• Companies nested within industries
