Regression Part I


I271B Quantitative Methods

Administrative Merriments

Next week: Reading and Evaluating Research; suggested readings using regression, bivariate statistics, etc.

Course review: May 5

Exam distributed May 7 in class; no lecture

Regression versus Correlation

Correlation makes no assumption about whether one variable is dependent on the other; it is only a measure of general association.

Regression attempts to describe the dependence of a single dependent variable on one or more explanatory variables. It assumes a one-way causal link between X and Y.

Thus, correlation is a measure of the strength of a relationship (ranging from -1 to 1), while regression is a more precise description of a linear relationship (e.g., the specific slope, which is the change in Y given a change in X).

But correlation is still a part of regression: the square of the correlation coefficient (R2) expresses how much of Y's variance is explained by X.
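As a minimal Stata sketch of that last point (the dataset and variables here are illustrative, not the course data), the squared correlation between a pair of variables equals the R2 reported by the bivariate regression:

  sysuse auto, clear        // Stata's built-in example dataset
  correlate price weight    // correlation r between the two variables
  display r(rho)^2          // r squared ...
  regress price weight      // bivariate regression of price on weight
  display e(r2)             // ... equals the regression R-squared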


So... what happens if b is negative?
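If b is negative, the fitted line simply slopes downward: each one-unit increase in X is associated with a decrease of |b| in the predicted Y. With a purely hypothetical equation such as predicted Y = 50 + (-2 * X), moving from X = 3 to X = 4 lowers the predicted Y from 44 to 42.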

The vertical distance between each data point and the line of best fit is squared, and all of the squared distances are added together. The least squares line is the line that makes this sum as small as possible.


Adapted from Myers, Gamst and Guarino 2006

For any Y and X, there is one and only one line of best fit. The least squares regression equation minimizes the possible error between our observed values of Y and our predicted values of Y (often called y-hat).
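A short Stata sketch of this point (again with illustrative variables from the built-in auto dataset): the slope b1 = r x (SDy / SDx) and intercept b0 = mean(Y) - b1 x mean(X) computed by hand are exactly the coefficients that regress reports, because both are the unique least squares solution.

  sysuse auto, clear
  summarize price                   // Y: stores its mean and SD
  scalar ybar = r(mean)
  scalar sdy  = r(sd)
  summarize weight                  // X: stores its mean and SD
  scalar xbar = r(mean)
  scalar sdx  = r(sd)
  correlate price weight
  scalar b1 = r(rho) * sdy / sdx    // least-squares slope
  scalar b0 = ybar - b1 * xbar      // least-squares intercept
  display b1 _newline b0
  regress price weight              // same slope and constant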


There are two regression lines for any bivariate relationship: the line for Y regressed on X and the line for X regressed on Y.

Regression to the mean (the regression effect) appears whenever there is spread around the SD line: a 1 SD increase in X is associated with an increase of only r x SDy in the predicted Y.

The regression fallacy is attempting to explain the regression effect through some other mechanism.
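As an illustration with made-up numbers: if r = 0.5 and SDy = 10, a case that is 1 SD above the mean on X has a predicted Y only 0.5 x 10 = 5 points above the mean of Y; the prediction regresses toward the mean even though nothing causal has happened.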


We obtain a sample statistic, b, which estimates the population parameter (beta).

b is the coefficient of the X variable (i.e., how much change in the predicted Y is associated with a one-unit change in X).

We also have the standard error of b (its s.e.).

We can use a standard t-distribution with n-2 degrees of freedom for hypothesis testing.
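In Stata, the regression table reports each of these pieces directly; a minimal sketch (illustrative variables again):

  sysuse auto, clear
  regress price weight               // table shows b, its standard error, t, and p
  display _b[weight]                 // the coefficient b
  display _se[weight]                // the standard error of b
  display _b[weight] / _se[weight]   // the t statistic, with n - 2 degrees of freedom
  display 2 * ttail(e(df_r), abs(_b[weight] / _se[weight]))   // two-sided p-value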


Yi = b0 + b1xi + ei

Error = actual – predicted

The root-mean-square (r.m.s.) error is how far typical points are above or below the regression line.

Prediction errors are called residuals. The average of the residuals is 0, and the S.D. of the residuals is the same as the r.m.s. error of the regression line.
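These facts are easy to check in Stata after fitting a regression (illustrative data again):

  sysuse auto, clear
  regress price weight
  predict yhat, xb              // predicted values (y-hat)
  predict resid, residuals      // residuals = actual - predicted
  summarize resid               // mean of the residuals is (essentially) 0
  display e(rmse)               // the root-m.s. error reported by regress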

Predicted Y = constant + (coefficient * Value of X)

For example, suppose we are examining Education (X) in years and Income (Y) in thousands of dollars:
▪ Our constant is 10,000
▪ Our coefficient for X is 5
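Taking those illustrative numbers at face value, prediction is just arithmetic: for someone with 12 years of education, predicted income = 10,000 + (5 x 12) = 10,060, and for 16 years it is 10,000 + (5 x 16) = 10,080.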


OLS regression assumes that the variance of the error term is constant. If the error does not have a constant variance, then it is heteroskedastic (literally, “different scatter”).
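A quick simulated illustration in Stata (entirely made-up data, not the course dataset): here the spread of the error grows with X, so the constant-variance assumption fails.

  clear
  set obs 200
  set seed 271
  generate x = 10 * runiform()
  generate e = rnormal(0, x)     // the SD of the error grows with x: heteroskedastic
  generate y = 2 + 3*x + e
  regress y x
  rvfplot                        // residuals fan out instead of forming an even band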

Where it comes from:
▪ The error may really change as X increases
▪ Measurement error
▪ An underspecified model


Consequences: We still get unbiased parameter estimates, but our line may not be the best fit. Why? Because OLS gives more 'weight' to the cases that might actually have the most error from the predicted line.

Detecting it: We have to look at the residuals (the differences between the observed responses and the predicted responses).

First, use a residual-versus-fitted-values plot (in STATA, rvfplot) or a residual-versus-predictor plot, which plots the residuals against one of the independent variables. We should see an even band across the 0 line, indicating that our error is roughly equal.

If we are still concerned, we can run a test such as the Breusch-Pagan/Cook-Weisberg test for heteroskedasticity. It tests the null hypothesis that the error variances are all EQUAL against the alternative hypothesis that they differ. Thus, if the test is significant, we reject the null hypothesis and we have a problem of heteroskedasticity.
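In Stata both checks come immediately after the regression; a minimal sketch with illustrative variable names (for instance, continuing from the simulated data above):

  regress y x
  rvfplot, yline(0)    // residual-versus-fitted plot; look for an even band around 0
  estat hettest        // Breusch-Pagan / Cook-Weisberg test (null: constant variance)

A small p-value from estat hettest means we reject the null of equal error variances.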


Perhaps other variables better predict Y?

If you are still interested in the current X, you can run a robust regression, which will adjust the model to account for heteroskedasticity. Robust regression modifies the estimates of our standard errors, and thus our t-tests for the coefficients.
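Assuming the slide means heteroskedasticity-robust (Huber-White) standard errors rather than M-estimation, this is a one-option change in Stata; the coefficients stay the same and only the standard errors and t-tests change (illustrative variables, e.g. the simulated data above):

  regress y x                // ordinary OLS standard errors
  regress y x, vce(robust)   // same coefficients, robust standard errors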


http://www.math.csusb.edu/faculty/stanton/m262/regress/


Regress.do

GSS96_small.dta

