STAT3012 TYPED LECTURE NOTES

APPLIED LINEAR MODELS STAT3012 NOTES

Contents

Introduction ............................................................................................................................................ 7

Assessments ........................................................................................................................................ 7

Outline ................................................................................................................................................ 7

Outcomes ............................................................................................................................................ 8

Rough Weekly Outline ........................................................................................................................ 9

Background ......................................................................................................................................... 9

Role of statistics ............................................................................................................................... 9

Unit aims ....................................................................................................................................... 10

‘Linear Models’.................................................................................................................................. 10

Note on ‘Linear Model’ ................................................................................................................. 10

Why linear models? ...................................................................................................................... 10

Course Synopsis ................................................................................................................................ 11

Lecture structure ........................................................................................................................... 11

Obtaining Data ...................................................................................................................................... 11

Experiments vs Observational studies .............................................................................................. 11

Definitions ..................................................................................................................................... 11

Experiments .................................................................................................................................. 12

Observational Studies ................................................................................................................... 12

Key message: ................................................................................................................................. 14

Statistical Principles in the Design of Experiments ............................................................................... 15

Theory: .............................................................................................................................................. 15

Terminology Definitions ................................................................................................................ 15

Common Experimental design problems: ..................................................................................... 17

Intro to RStudio .................................................................................................................................... 18

Basic theory:...................................................................................................................................... 19

Expressions and assignments ........................................................................................................ 19

Sweave .......................................................................................................................................... 19

Simple Linear Regression ...................................................................................................................... 19

Assumed Knowledge: ........................................................................................................................ 20

Correlation Coefficient .................................................................................................................. 20


Data and model ................................................................................................................................. 21

Components: ................................................................................................................................. 21

Parameter Estimation: ...................................................................................................................... 24

Least Squares Estimates (LSEs) ..................................................................................................... 24

Method of Maximum Likelihood .................................................................................................... 26

Correlation Coefficient and Regression Slope .............................................................................. 27

Estimating the Error Variance: ...................................................................................................... 28

Explaining the variability of the 𝑌’s .............................................................................................. 30

Diagnostics and Inference in Regression .............................................................................................. 30

Model Diagnostics: ............................................................................................................................ 31

Assessing assumptions .................................................................................................................. 31

Q-Q plots ....................................................................................................................................... 32

Inference for a Linear Regression Model .......................................................................................... 35

Theorem: Distribution of 𝒄𝑻𝒀 ...................................................................................................... 35

Distribution of LSEs (least squares estimator) .............................................................................. 38

Inference and Prediction in simple linear regression ....................................................................... 39

Sampling Distribution of 𝛽1 .......................................................................................................... 39

Inference for the error variance 𝜎² .............................................................................................. 41

Prediction and estimation in SLR .......................................................................................................... 42

Prediction of 𝑌|𝑥0 ............................................................................................................................. 42

Prediction vs estimation of mean response ................................................................................. 43

Sampling distribution of mean response ...................................................................................... 43

Multiple Regression .............................................................................................................................. 46

Multiple regression Theory – Data ................................................................................................... 47

Principle of least squares estimation ............................................................................................ 47

Polynomial regression: .................................................................................................................. 48

Goodness of Fit (GoF) criteria: ...................................................................................................... 52

Predicted value ........................................................................................................................... 57

Matrix approach to multiple regression ........................................................................................... 60

Matrix formulation of the linear model ........................................................................................ 60

Leverage and Cook’s Distance .......................................................................................................... 64

Outlying points in ℝ𝑝 .................................................................................................................... 64

Extensions to Regression Modelling: .................................................................................................... 72

Theory: ANOVA/ANCOVA etc. ........................................................................................................... 72

Three treatments: ......................................................................................................................... 73

General F test /Multicollinearity ........................................................................................................... 76


General 𝐹 test ................................................................................................................................... 76

Classical approach ......................................................................................................................... 78

Variable selection: Backward and forward ........................................................................................... 83

Motivation: ....................................................................................................................................... 84

Possible subsets: ........................................................................................................................... 84

The linear regression model 𝑚 ..................................................................................................... 84

Automated variable selection algorithms ......................................................................................... 85

Steps .............................................................................................................................................. 85

Backward variable selection ......................................................................................................... 85

Forward variable selection ............................................................................................................ 90

Stepwise AIC and BIC ........................................................................................................................ 92

Theory: Stepwise forward variable selection ............................................................................... 93

Theory: More goodness of fit criteria ........................................................................................... 96

Polynomial Regression ........................................................................................................................ 103

Theory: ............................................................................................................................................ 104

Polynomial Regression Model..................................................................................................... 104

Collinearity .................................................................................................................................. 107

Robust Regression ............................................................................................................................... 111

References and further reading: ..................................................................................................... 111

Theory: ............................................................................................................................................ 112

𝑥 and 𝑦 outliers ........................................................................................................................... 112

Theory: ........................................................................................................................................ 114

Alternatives to LS and L1: ............................................................................................................ 116

Resistant and efficient regression ............................................................................................... 118

One way ANOVA ................................................................................................................................. 122

One way ANOVA: ............................................................................................................................ 122

Scope: .......................................................................................................................................... 122

ANOVA Model ................................................................................................................................. 123

Model equation ........................................................................................................................... 123

More 1 way ANOVA ........................................................................................................................ 129

Structure of ANOVA table for single treatment model............................................................... 130

Distribution of Treatment sum of squares (TSS) ......................................................................... 130

Contrasts: .................................................................................................................................... 131

Multiple Comparisons ......................................................................................................................... 135

Keeping the 𝛼 Error ......................................................................................................................... 135

Example: 3 CI’s ............................................................................................................................ 135


Data snooping ................................................................................................................................. 136

Tukey’s Confidence Intervals (Honest Significance difference) .................................................. 138

Bonferroni CIs.............................................................................................................................. 139

Scheffé simultaneous CI .............................................................................................................. 140

Conclusion: Multiple testing ........................................................................................................... 141

Quantitative factors ............................................................................................................................ 141

Factor or Numerical Variable? ........................................................................................................ 142

Example : drug levels .................................................................................................................. 142

Polynomial regression ..................................................................................................................... 142

Polynomial regression equivalent to ANOVA ............................................................................. 142

Nesting of linear effects .............................................................................................................. 142

2 way ANOVA ...................................................................................................................................... 148

2 way analysis of variance ............................................................................................................... 149

Additive factor model ................................................................................................................. 149

Main effects model for 2 factors ................................................................................................ 149

Estimation: .................................................................................................................................. 150

More 2 way ANOVA ........................................................................................................................ 154

Recall: Decomposing TSS given 𝑛𝑖𝑗 = 𝑟 ...................................................................................... 154

Test for interaction effects: ........................................................................................................ 155

Mean response/ interaction plot ................................................................................................ 156

Assessing Normality ............................................................................................................................ 161

Assessing normality ........................................................................................................................ 161

Data and testing problems: ......................................................................................................... 161

Pearson’s chi- squared test ......................................................................................................... 162

Kolmogorov-Smirnov (KS) test .................................................................................................... 164

How to get correct p values? ...................................................................................................... 169

Introduction to the Design of Experiments ........................................................................................ 172

Origins: RA Fisher ............................................................................................................................ 172

Randomised Design: ....................................................................................................................... 172

Example: 𝑡 = 2; 𝑛1 = 2; 𝑛2 = 3 ................................................................................................ 173

Randomised complete block design (RCBD) ............................................................................... 173

Simple design for comparing one factor ......................................................................................... 174

Completely randomised design .................................................................................................. 174

Randomised complete block design (RCBD) ....................................................................................... 179

Recall: .............................................................................................................................................. 179

2 way ANOVA for complete block design ........................................................................................ 179


Assumptions: ............................................................................................................................... 179

Construction of RCBD’s ............................................................................................................... 180

2 way ANOVA table ......................................................................................................................... 182

Example: omnibus ..................................................................................................................... 182

Pairwise differences: ................................................................................................................... 184

Latin square design ............................................................................................................................. 185

Motivation: ..................................................................................................................................... 186

Definition of standard 𝑡2 Latin square design (LSD) ................................................................... 186

Cyclic permutation in LSD ........................................................................................................... 187

Linear regression model for LSD ................................................................................................. 187

Analysis of LSD ................................................................................................................................ 187

3 way ANOVA for LSD.................................................................................................................. 188

Error variance in LSD ................................................................................................................... 190

Treatment contrasts ................................................................................................................... 191

Revisiting design of experiments ........................................................................................................ 193

Concepts: ........................................................................................................................................ 193

Previous work: ............................................................................................................................ 193

Experimental unit and observational unit ...................................................................................... 193

Example: lady tasting tea ............................................................................................................ 193

Example: tomatoes: .................................................................................................................... 194

Relationship between experimental unit (EU) and observational unit (OU) .................................. 194

Mathematical formulation .......................................................................................................... 194

Blocking ........................................................................................................................................... 194

Example: weed control ............................................................................................................... 195

How to block? ............................................................................................................................. 195

Ideal vs reality ............................................................................................................................. 197

Nested factors ..................................................................................................................................... 197

Concepts: ........................................................................................................................................ 198

Example: scientists in labs........................................................................................................... 198

Modelling with nested factors: ................................................................................................... 199

Nested Design ..................................................................................................................................... 204

Concepts ......................................................................................................................................... 204

Example: Calf feeding.................................................................................................................. 205

Pseudo-replication .......................................................................................................................... 206

Technical & biological replication ............................................................................................... 206

More on Split-plot designs .......................................................................................................... 210


Incomplete Block Design ..................................................................................................................... 211

Incomplete block designs ................................................................................................................ 212

Example: Potato yield ................................................................................................................. 212

Different types of sum of squares............................................................................................... 213

Balanced incomplete block design (BIBD) .................................................................................. 213

Analysis of Covariance (ANCOVA) ....................................................................................................... 218

Example: Optimal fish meal for ducklings ................................................................................... 219

ANCOVA .......................................................................................................................................... 220

Example: Fish meal ..................................................................................................................... 221

Common slope model ................................................................................................................. 221

ANCOVA: Treatment contrasts ................................................................................................... 222

Linear models for ANCOVA ......................................................................................................... 223

Random Effects Model ........................................................................................................................ 225

Example: Sodium content in beer ............................................................................................... 225

One way ANOVA model 2 ............................................................................................................... 226

Example: Beer sodium content ................................................................................................... 227

Differences between one-way ANOVA model 1 and 2 ............................................................... 228

Linear Mixed Models .......................................................................................................................... 230

Linear Mixed Model ........................................................................................................................ 230

Mixed model equations (MME) .................................................................................................. 231

Performance test: random effect model .................................................................................... 234

Some notes on linear mixed models: .......................................................................................... 237

Variance Component Estimation ........................................................................................................ 238

Concepts: ........................................................................................................................................ 239

Method of moments estimate of variance components .............................................................. 239

Maximum likelihood estimate of variance components ............................................................ 240

Residual (restricted) likelihood ................................................................................................... 240

Longitudinal Data ................................................................................................................................ 244

Repeated measures and longitudinal data ..................................................................................... 246

Example: Sleep data .................................................................................................................... 246

Agricultural Data ................................................................................................................................. 251

Example: Split plot experiment for bean yield............................................................................ 252

Example: Pedigree information .................................................................................................. 255

More thoughts on covariance structure ......................................................................................... 255

Hierarchical Data ................................................................................................................................. 256

Nested vs crossed random effects ...................................................................................................... 257


Small area estimation ................................................................................................................. 257

Area level model ......................................................................................................................... 258

Unit Level model: ........................................................................................................................ 260

Final Thoughts: ............................................................................................................................ 260

Revision Lecture and Exam Information ............................................................................................. 261

Exam info: ....................................................................................................................................... 261

Summary: ........................................................................................................................................ 261

Multiple Linear Regression: ........................................................................................................ 261

ANOVA and experimental Design ............................................................................................... 261

Linear Mixed Model .................................................................................................................... 262

Lecture 1.

Introduction

Michael Stewart

Carslaw 818

Assessments

- Week 04: Wed, 28/03/18, a quiz in place of the lecture

- Week 10: Fri, 18/05/18, a quiz in place of the lecture

- Week 13: no computer lab

Outline

The main objective of this course is to introduce the fundamental concepts of analysis of data from both observational studies and experimental designs using classical linear methods, together with the teaching of concepts of collection of data and design of experiments. Additional objectives are to gain competency in the application and understanding of linear models and regression methods with diagnostics for checking appropriateness of models; to be introduced to robust regression methods; to be introduced to the design and analysis of experiments and to further understand the notions of replication, randomisation and ideas of factorial designs; to enhance proficiency in the use of the R statistical package to give analyses and graphical displays.

Outcomes

- proficiency in the use of the general F-test as the main tool to choose between two nested regression models

- proficiency in assessing model assumptions and outlier detection in regression models

through standard diagnostic plots (box plot, scatterplot, Q-Q-plot, Cook’s distance plot,

leverage vs residual plot), through influence measures (leverage values, Cook’s distance) and

through tests (Bartlett test for homoscedasticity and normality tests)

- proficiency in the understanding and application of multiple linear regression and in the

understanding of $R^2$ and the adjusted $R^2$

- proficiency in the understanding and application of 1-way ANOVA models of type I and II,

including finding an interpretation of the TSS term through using the concept of orthogonal

contrasts and making inference on all parameters

- proficiency in the understanding and application of 2-way ANOVA models of type I and

making inference on all parameters

- proficiency in the calculation and decomposition of sum of squares terms in multi-way

ANOVA for orthogonal designs

- competency in correcting multiple pairwise comparisons by applying the Tukey, Scheffé

and Bonferroni correction

- competency in deriving the least-squares estimator in linear regression

- competency in the calculation and interpretation of confidence intervals for all parameters

in linear regression

- competency in the understanding of the difference between confidence intervals and

prediction intervals

- competency in model selection through using the F-test, t-test, AIC or BIC through full

searches or by using step-wise procedures (backward, forward, stepwise)

- competency in polynomial regression models and their selection through using orthogonal polynomials

- competency in using the R function lmer for the fitting of mixed models and a basic understanding of these complicated models

- competency in reducing a nominal factor in a multi-way ANOVA to a continuous variable

through using linear contrast coefficients

- competency in calculating the distribution for contrasts and using this to calculate

confidence intervals for contrasts

- competency in the design of an appropriate scheme for treatment allocation and data

collection as well as the correct analysis for complete randomised designs (CBD),

randomised CBD (RCBD), Latin square designs (LSD), incomplete block designs (IBD) and

balanced IBD (BIBD), ANCOVAs, and nested designs

- competency in the understanding of blocks, nested factors, interactions terms and

confounding in experimental designs

- competency in using R to compute estimates and standard errors for regression parameters without built-in functions such as lm and aov, and for generating treatment allocation lists for the CBD, RCBD and LSD.

- basic understanding of L1 regression, M regression and MM regression

- advanced stream students will additionally have competency in theoretical aspects of

regression methods, in particular the Gauss-Markov theorem and appreciation of the F-test;

if time permits, partial correlation coefficients will be taught and in that case a level of

competency should be reached

Rough Weekly Outline

1. Experimental designs, observational studies, software R, simple linear regression

2. Model diagnostics, inference for linear regression, fitting multiple linear regression models

3. Inference for multiple regression models, multiple correlation coefficients, Leverage and

Cook’s distance, the general F-test

4. Subset selection using stepwise procedures and AIC, Cp and BIC

5. Polynomial regression, orthogonal polynomials, Robust regression, 1-way ANOVA

6. Simultaneous CIs, decomposing sums of squares

7. Quantitative factors, 2-way ANOVA, interactions

8. 2-way ANOVA with interactions, Normality tests, experimental designs

9. Randomized complete block designs, Latin square designs, incomplete block designs

10. Analysis of covariance, nested factors

11. Revisiting Experimental Design, nested designs, random effect model

12. Variance component estimation, mixed effects models, longitudinal data

13. Agricultural data, hierarchical data, revision

5/03/2018

Background

- The scientific method is about getting knowledge based on (hard) evidence, which involves the following steps:

o Formulate question

o Collect relevant data

o Do statistical analysis of data

o Draw conclusions

Example – abundance of bird species

- What habitat characteristics explain the diversity of bird species?

- Field study: collect relevant information (ask Ecologists / Biologists)

- Calculate correlation coefficients, look at scatter plots, use multiple linear regression.

- The best statistical model is based on characteristic x, y and z.

Role of statistics

- Design an appropriate scheme for data collection;

- Perform a suitable statistical analysis;

- Derive valid conclusions.

Applied Linear Models will help you do all these things.


Unit aims

The overall aim of this unit is to develop skills in the statistical analysis of data from designed experiments and observational studies.

Specific learning objectives are

- Understanding the fundamentals of good design in experiments and studies;

- Understanding the elements of a linear model;

- Ability to develop and apply linear models for data from real-world experiments and studies;

- Further proficiency with a statistical computer package for linear modelling

Note: distinction between observational studies and experiments

‘Linear Models’

Note on ‘Linear Model’

It’s clear that $y_i = \alpha + \beta x_i + \epsilon_i$ is linear.

But what if the relationship with $x$ is not linear?

- Suppose you have data on a response variable $y$ (e.g. blood pressure) and an explanatory variable $x$ (e.g. a measurement of cholesterol).

- Want to model the relationship between the mean value of $y$, and $x$.

- Might use a ‘simple’ linear regression model with a squared term:

$$y_i = \alpha + \beta x_i + \gamma x_i^2 + \epsilon_i$$

You might say this is ‘quadratic regression’, but it is still a linear model because it is linear between $y$ and the parameters (usually $\beta$’s).

E.g.

$$E[Y] = \beta_0 + \beta_1 x^2$$

can be written as

$$E[Y] = \begin{pmatrix} 1 & x^2 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$$

- As we know $x$ (that’s from the data), it’s the $\beta$’s that we actually don’t know and have to estimate.
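As a minimal sketch of why a quadratic regression still counts as a linear model, the R code below fits one with `lm` and rebuilds the fitted means as a matrix product. The data are simulated and the coefficient values are assumptions chosen only for illustration.

```r
# Hypothetical simulated data (not from the lecture).
set.seed(1)
n <- 40
x <- runif(n, 0, 5)
y <- 1 + 2 * x - 0.5 * x^2 + rnorm(n, sd = 0.3)

# The mean is curved in x but linear in the unknown coefficients.
fit <- lm(y ~ x + I(x^2))

# The design matrix has one column per parameter: 1, x and x^2.
X <- model.matrix(fit)
head(X, 3)

# The fitted means are the matrix product of X and the estimates.
beta_hat <- coef(fit)
fitted_by_hand <- as.vector(X %*% beta_hat)
all.equal(fitted_by_hand, as.vector(fitted(fit)))
```

Because the unknowns enter only through the product of a known matrix and the parameter vector, the same least squares machinery applies as for a straight line.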

Why linear models?

- Linear models are easy to apply and interpret.

- The mathematical theory underlying linear models is very well understood.

- We can investigate the relationship between a response and lots of explanatory variables in

a straightforward manner.

- A linear model will often (but not always) provide an adequate approximation to reality


Course Synopsis

- Introduction / overview

o Statistical experiments

o RStudio: R, TeX and HTML

o Simple linear regression - again!

- Multiple linear regression

- Analysis of variance and covariance (of experimental data)

- Analysis of experimental designs

o Repeated measures, nested factors, complete designs,

o Balanced incomplete designs, LSD, random effects, mixed models, etc

Lecture structure

- Theme is shown after the lecture number

- Refresher of key concepts from previous lecture(s)

- New concepts – motivation

- Several blocks of new material

o Theory : definitions, theorems and proofs

o Examples : by hand and with R

- Summary and outlook

Obtaining Data

3 types of data:

- Available data

- Experimental

- Observational studies

Experiments vs Observational studies

Definitions

Definition 1 (Experiment).

Something is done to people, animals or objects in order to observe the response.

Definition 2 (Observational study).

Individuals are observed and variables of interest are measured, but nothing is deliberately done to

the individuals to affect the response.

Example: Blood pressure

Ten volunteers have their blood pressure measured on day 1 and day 2 in a study. Describe a first

scenario in which these volunteers take part in an observational study and a second scenario where

they are part of an experiment.


- Observational study: the volunteers receive no specific instruction, i.e. their blood pressure

is measured together with other variables (amount of sleep, alcohol and food intake, level of

exercise etc).

- Experiment: each volunteer is assigned to a treatment group (5h vs 8h sleep), with the objective of detecting different rates of change in blood pressure.

Lecture 2. Wednesday, 7 March 2018

Experiments

Theory

Terminology

- Individuals on which the experiment is done are called (experimental) units.

- Each unit is subjected to a specific experimental condition called a treatment.

- The treatment is determined by the combination of values (or levels) taken by the

explanatory variables (or factors).

Principles of a well-designed experiment:

1. Control: The effects of lurking variables on the response should be controlled.

E.g. effectiveness of drinking V: dosage, time of day, time since last meal

2. Randomization: Using impersonal chance to assign experimental units to treatments is

important for two reasons:

a. It removes the danger of experimental bias;

E.g. Young people ask young people only, thus conclusion only true for

young people.

b. It allows the laws of probability to be applied in a straightforward fashion to the

results, and for conclusions to be interpreted in terms of causation.

3. Replication: This reduces chance variation in results. The larger the n the smaller the

standard error.
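The randomization principle above can be sketched in R for the blood pressure example, where ten volunteers are split between 5h and 8h of sleep. The unit labels and the seed are assumptions made only so the example is reproducible.

```r
set.seed(2018)                        # only to make the allocation reproducible
units <- paste0("volunteer", 1:10)    # hypothetical unit labels
treatments <- rep(c("5h", "8h"), each = 5)

# Randomisation: shuffle the treatment labels by impersonal chance.
allocation <- data.frame(unit = units, sleep = sample(treatments))
allocation

# Each treatment still receives five units (replication).
table(allocation$sleep)
```

Shuffling a balanced vector of labels keeps the group sizes fixed while leaving the assignment itself to chance.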

Observational Studies

Terminology

- The population is the entire group of individuals about which we want information.

- A sample is that part of the population that is examined in order to gather data.

Example: Birds of the High Paramo

- A paramo is an exposed, high plateau in the tropical parts of South America.

- In the northern Andes, there is a pattern of ‘islands’ of vegetation within the otherwise bare paramo.

- An (observational) study was conducted to investigate the bird life in this region.

- One question of interest : what characteristics of these islands (if any) affect the diversity of

bird species?

Investigation:

For each island of vegetation the following variables were recorded:

- number of species of bird present (𝑁)

- area of the island in square kilometers (𝐴𝑅),

- elevation in thousands of meters (𝐸𝐿),

- the distance from Ecuador in kilometers (𝐷𝐸𝑐)

- distance to the nearest other island in kilometers (𝐷𝑁𝐼).

Reference: Vuilleumier (1970), ‘Insular biogeography in continental regions. I. The northern Andes of South America’, American Naturalist, 104, 373-388.

Initial questions: observational study or experiment? experimental units? treatment? levels?

population? sample?


Example: stimulating effects of caffeine

Key message: Good design is important

- Data can be obtained from a variety of sources.

- Importance of good design in experiments and observational studies cannot be overstated.

- Poorly designed studies can lead to data that cannot answer the scientific question at hand

no matter how cleverly they are analyzed. (otherwise GIGO: Garbage in, Garbage Out)


Statistical Principles in the Design of Experiments

Several important facts:

- Statistically designed experiments are economical;

- They allow the influence of one or several factors on a response to be measured;

- They allow the estimation of the magnitude of experimental error;

- Economical means achieving fixed type I and type II error rates with the smallest $n$ (the lowest error for the fewest experimental units).

New concepts:

- Notion of factors, blocks, covariates, confounding variables, design layout, effects,

interactions, replications.

- Common problems in experimental designs: masking, under-powered or overpowered

studies.

Theory:

Terminology Definitions

Block

Group of homogeneous experimental units

- Eg: STAT3012 vs STAT3912

Confounding

One or more effects that cannot unambiguously be attributed to a single factor or interaction.

- Eg: murder rate and ice cream consumption are related only through their common dependence on hot weather

Covariate:

Uncontrollable variable that influences the response but is unaffected by any other experimental

factors:

- Eg: age, health

Design (layout)

Complete specification of experimental test runs, including blocking, randomization, repeat tests,

replication, and the assignment of factor level combinations to experimental units.

Effect:

Change in the average response between two factor-level combinations or between two

experimental conditions.

Factor.

A controllable experimental variable that is thought to influence the response.

- Sitting in the front vs sitting in the back.

- Each factor level combination is regarded as a different treatment

Interaction.


Existence of joint factor effects in which the effect of each factor depends on the levels of the other factors.

Replication.

Repetition of an entire experiment or a portion of an experiment under two or more sets of

conditions.

Response.

Outcome or result of an experiment.

Unit (item).

Entity on which a measurement or an observation is made; sometimes refers to the actual

measurement or observation.

Example: agricultural experiment


Example: pipes

A test program was conducted to evaluate the quality of epoxy-glass-fiber pipes taken from each of

two manufacturing plants. Each pipe was produced under normal or severe operating conditions and

at one of two water temperatures. The following test conditions constituted the experimental

protocol:

Common Experimental design problems:

- Masking of factor effects: experimental variation masks factor effects.

- Uncontrolled factors: uncontrolled factors compromise experimental conclusions (too few

factors).

- Erroneous principles of efficiency lead to unnecessary waste or inconclusive results (too

many factors, too complex designs, e.g. gene arrays with p ≈ 30k but n ≤ 100).

- Scientific objectives for many-factor experiments may not be achieved with one-factor-at-a-time designs.


Example: masking problem

Two possible situations for an experimental factor with two levels (red/blue):

There is a clear effect (of the same size) but it can only be detected in case 2.
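The masking problem can be illustrated with a small simulation. This is only a sketch: it assumes that case 1 has large experimental variation and case 2 has small variation, which is our reading of the missing figure, and the effect size, standard deviations and sample size are hypothetical.

```r
set.seed(3)
n <- 10                 # units per level (hypothetical)
effect <- 1             # the same true difference in both cases
level <- factor(rep(c("red", "blue"), each = n))

# Case 1: large experimental variation. Case 2: small variation.
y_case1 <- c(rnorm(n, 0, 5),   rnorm(n, effect, 5))
y_case2 <- c(rnorm(n, 0, 0.5), rnorm(n, effect, 0.5))

# Estimated error standard deviation in each case.
sd1 <- summary(lm(y_case1 ~ level))$sigma
sd2 <- summary(lm(y_case2 ~ level))$sigma
c(case1 = sd1, case2 = sd2)

# Two-sample t-tests for the same true effect.
t.test(y_case1 ~ level)$p.value
t.test(y_case2 ~ level)$p.value
```

With the same true difference, the test is usually far more sensitive when the experimental variation is small, which is the sense in which variation masks a factor effect.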

Intro to RStudio

New Concepts:

- Vectors put into data frames (flexible data structure).

- A data frame is a collection of column vectors each of the same length.

- The vectors may be numeric, factor, or whatever and each particular column of a data frame

is given a name (chosen by the user, or assigned a default by R).

- A data matrix in R is a collection of numeric vectors of the same length.

- An array in R is a collection of matrices of the same size.

- A list in R is a collection of different R objects
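The data structures listed above can be sketched as follows. All object names and values are made up for illustration.

```r
# Two vectors of the same length (hypothetical values).
species <- c("a", "b", "c")
n_birds <- c(12, 7, 30)

# A data frame: named columns of equal length, possibly of different types.
df <- data.frame(species = species, n_birds = n_birds,
                 stringsAsFactors = TRUE)
str(df)

# A matrix: numeric vectors of the same length bound together (3 x 2 here).
m <- matrix(1:6, nrow = 3)

# An array: here two 3 x 2 matrices stacked along a third dimension.
a <- array(1:12, dim = c(3, 2, 2))

# A list: a collection of different R objects.
lst <- list(data = df, mat = m, note = "anything")
names(lst)
```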


Basic theory:

Expressions and assignments

Sweave

Lecture 3. Friday, 9 March 2018

Simple Linear Regression

In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the x and y coordinates in a Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.

It is common to make the additional hypothesis that the ordinary least squares method should be used to minimize the residuals (vertical distances between the points of the data set and the fitted line). Under this hypothesis, the accuracy of a line through the sample points is measured by the sum of squared residuals, and the goal is to make this sum as small as possible. Other regression methods that can be used in place of ordinary least squares include least absolute deviations (minimizing the sum of absolute values of residuals) and the Theil–Sen estimator (which chooses a line whose slope is the median of the slopes determined by pairs of sample points). Deming regression (total least squares) also finds a line that fits a set of two-dimensional sample points, but (unlike ordinary least squares, least absolute deviations, and median slope regression) it is not really an instance of simple linear regression, because it does not separate the coordinates into one dependent and one independent variable and could potentially return a vertical line as its fit.

The remainder of this section assumes an ordinary least squares regression. In this case, the slope of the fitted line is equal to the correlation between y and x corrected by the ratio of standard deviations of these variables. The intercept of the fitted line is such that it passes through the center of mass $(\bar{x}, \bar{y})$ of the data points.

Assumed Knowledge:

This topic is assumed knowledge and was already taught in STAT2x12, where the emphasis was on the practical application of simple linear regression. In STAT3x12 this topic is revisited with the aim of gaining a better theoretical understanding.

- Four assumptions for linear regression: errors are iid $N(0, \sigma^2)$.

- Least squares estimates for the parameters in the linear regression model.

- Pearson correlation (𝑟), 𝑟2, and their interpretation.

- Residuals estimate errors

Correlation Coefficient

In statistics, the Pearson correlation coefficient (PCC, pronounced /ˈpɪərsən/), also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC) or the bivariate correlation, is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is widely used in the sciences. It was developed by Karl Pearson from a related idea introduced by Francis Galton in the 1880s.

In statistics, the correlation coefficient r measures the strength and

direction of a linear relationship between two variables on

a scatterplot. The value of r is always between +1 and –1. To

interpret its value, see which of the following values your

correlation r is closest to:

Exactly –1. A perfect downhill (negative) linear relationship

–0.70. A strong downhill (negative) linear relationship

–0.50. A moderate downhill (negative) relationship

–0.30. A weak downhill (negative) linear relationship

0. No linear relationship

+0.30. A weak uphill (positive) linear relationship

+0.50. A moderate uphill (positive) relationship

+0.70. A strong uphill (positive) linear relationship

Exactly +1. A perfect uphill (positive) linear relationship

$$r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}$$
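As a quick check of the formula for $r$, the following sketch computes it term by term on simulated data (the data are hypothetical) and compares the result with R's built-in `cor`.

```r
set.seed(4)
x <- rnorm(30)
y <- 0.6 * x + rnorm(30)

# Numerator: sum of cross-products of deviations from the means.
num <- sum((x - mean(x)) * (y - mean(y)))

# Denominator: product of the square roots of the two sums of squares.
den <- sqrt(sum((x - mean(x))^2)) * sqrt(sum((y - mean(y))^2))

r_by_hand <- num / den
all.equal(r_by_hand, cor(x, y))
```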

Data and model

Simple linear regression seeks to model the relationship between

- the mean of a response variable, $Y$, and

- a single explanatory variable (or predictor/covariate) $x$.

For data $(x_1, Y_1), \ldots, (x_n, Y_n)$, where $x_1, \ldots, x_n$ (lower case) are known constants and $Y_i = y_i$ are the observed random responses, we formulate the simple linear regression model as

$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i \qquad (1)$$

Components:

There are two new components on the RHS of equation (1):

- $\beta_0$ and $\beta_1$ are the regression parameters

- $\epsilon_1, \ldots, \epsilon_n$ are error terms, satisfying the error assumptions below

Error assumptions

1. $E[\epsilon_i] = 0$, for $i = 1, \ldots, n$

2. the $\epsilon_i$ are independent

3. $Var(\epsilon_i) = \sigma^2$ (homoscedasticity assumption)

4. the $\epsilon_i$ are normally distributed

Also described as:

$$\epsilon_i \sim NID(0, \sigma^2); \quad i = 1, \ldots, n$$
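To make the model and its error assumptions concrete, here is a minimal sketch that generates data from it. The parameter values are hypothetical and chosen only for illustration.

```r
set.seed(5)
n <- 100
beta0 <- 2; beta1 <- 0.5; sigma <- 1.5    # hypothetical true values

x <- seq(0, 10, length.out = n)           # known constants
eps <- rnorm(n, mean = 0, sd = sigma)     # independent N(0, sigma^2) errors
Y <- beta0 + beta1 * x + eps              # responses from the model

# The simulated errors have mean near 0 and standard deviation near sigma.
c(mean = mean(eps), sd = sd(eps))
```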

Example: Body weight versus brain weight

- Data from Allison T and Cicchetti D (1976), Sleep in Mammals: Ecological and Constitutional Correlates, Science 194:732-734.

- It is of interest to know whether brain weight for different mammal species truly depends on

body weight.

- View brain weight as the response (𝑌) variable and body weight as the predictor (𝑥) variable.


- We can test if there is an underlying simple linear regression model

- Note that the log-transformed data result in a more homogeneous scatterplot.

Parameter Estimation:

We have $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, under the assumption $\epsilon_i \sim NID(0, \sigma^2)$, giving

$$E[Y_i \mid x_i] = \mu_i = \beta_0 + \beta_1 x_i$$

Our strategy:

- Estimate parameters $\beta_0$ and $\beta_1$ from the data with the method of least squares, which in the case of normal errors is the same as using the maximum-likelihood method.

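Least Squares Estimates (LSEs)

$\hat{\beta}_0$ and $\hat{\beta}_1$ are those values that minimize the sum of squares

$$S(\beta_0, \beta_1) = \sum_{i=1}^n (Y_i - \mu_i)^2 = \sum_{i=1}^n (Y_i - \beta_0 - \beta_1 x_i)^2$$

The least squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are given by

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}; \qquad \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$$

where the sums of squares $S_{xx}$, $S_{yy}$, $S_{xy}$ are

$$S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^n (Y_i - \bar{Y})^2, \qquad S_{xy} = \sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})$$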

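Proof of LSE

Estimate the parameters via least squares by setting the partial derivatives to zero, $\nabla S = 0$.

It is easier to find

$$\min_{\beta_0, \beta_1} S(\beta_0, \beta_1) = \min_{\beta_1} \left[ \min_{\beta_0} S(\beta_0, \beta_1) \right]$$

- So minimize over each parameter in turn

Giving:

1. From $\partial S / \partial \beta_0 = 0$:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$$

2. Subbing this into $\partial S / \partial \beta_1 = 0$, i.e. $\hat{\beta}_0 n \bar{x} + \hat{\beta}_1 \sum x_i^2 = \sum x_i Y_i$, to get

$$(\bar{Y} n \bar{x} - \hat{\beta}_1 n \bar{x}^2) + \hat{\beta}_1 \sum x_i^2 = \sum x_i Y_i$$

$$\rightarrow \hat{\beta}_1 = \frac{\sum x_i Y_i - n \bar{x} \bar{Y}}{\sum x_i^2 - n \bar{x}^2} = \frac{S_{xy}}{S_{xx}}$$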
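The closed-form estimators can be checked numerically. The sketch below uses simulated data (hypothetical values) and compares the hand-computed estimates with those returned by `lm`.

```r
set.seed(6)
x <- runif(25, 0, 10)
Y <- 3 + 1.2 * x + rnorm(25)

# Sums of squares and cross-products about the means.
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (Y - mean(Y)))

# Least squares estimates from the closed-form expressions.
b1 <- Sxy / Sxx
b0 <- mean(Y) - b1 * mean(x)

# Same estimates from R's built-in linear model function.
fit <- lm(Y ~ x)
all.equal(c(b0, b1), unname(coef(fit)))
```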

Example: Brain weight

In R: use the lm function (linear model).

- lm.out = lm(y ~ x)
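A possible way to try this on brain and body weights is sketched below. It assumes that the `mammals` data set in the MASS package (columns `body` in kg and `brain` in g) corresponds to the Allison and Cicchetti data used in the lecture, which these notes do not confirm.

```r
# Assumption: MASS::mammals stands in for the lecture's data set.
if (requireNamespace("MASS", quietly = TRUE)) {
  mammals <- MASS::mammals

  # Brain weight as response, body weight as predictor (raw scale).
  lm.out <- lm(brain ~ body, data = mammals)
  print(coef(lm.out))

  # The same model after log-transforming both variables.
  lm.log <- lm(log(brain) ~ log(body), data = mammals)
  print(coef(lm.log))
}
```

`summary(lm.out)` additionally reports standard errors, t-tests and the residual standard error.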


Method of Maximum Likelihood

- As we have seen, the simple linear regression model is $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with errors $\epsilon_i \sim NID(0, \sigma^2)$, so that

$$Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$$

Joint Density

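- Remember that the joint density of independent random variables is the product of the ‘individual’ density functions

$$f(y_1, \ldots, y_n) = \prod_{i=1}^n f_{Y_i}(y_i)$$

The joint density of the independent random responses $Y_i$ evaluated at (the observed values) $\boldsymbol{y}^T = (y_1, \ldots, y_n)$ is

$$f(\boldsymbol{y}; \beta_0, \beta_1, \sigma) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y_1 - \beta_0 - \beta_1 x_1)^2}{2\sigma^2}} \times \ldots \times \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(y_n - \beta_0 - \beta_1 x_n)^2}{2\sigma^2}}$$

$$= \left( \frac{1}{\sqrt{2\pi}\sigma} \right)^n \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \right) \qquad (3)$$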

- The method of maximum-likelihood is called such because it finds the parameter values $\hat{\beta}_0$, $\hat{\beta}_1$ and $\hat{\sigma}$ that maximise the joint density (likelihood).

o For $\beta_0$ and $\beta_1$: we want to maximise the joint density, for each $\sigma$ held fixed.

- One can show (worksheet week 2) that maximising the joint density (3) over $\beta_0$ and $\beta_1$ is independent of $\sigma$ and is achieved by minimising $\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$

- In this (special) case the method of maximum-likelihood gives the same parameter estimates

as the method of least-squares.
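The agreement between maximum likelihood and least squares can be checked numerically. This sketch uses simulated data with hypothetical parameter values. It maximises the normal log-likelihood over $\beta_0$ and $\beta_1$ with `optim` for two different fixed values of $\sigma$, then compares both answers with `lm`.

```r
set.seed(7)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)

# Negative log-likelihood in (beta0, beta1) for a fixed sigma.
negloglik <- function(beta, sigma) {
  -sum(dnorm(y, mean = beta[1] + beta[2] * x, sd = sigma, log = TRUE))
}

# Numerical maximisation for two different fixed values of sigma.
start <- c(mean(y), 0)
ml_small <- optim(start, negloglik, sigma = 0.5, method = "BFGS")$par
ml_large <- optim(start, negloglik, sigma = 3, method = "BFGS")$par

# Least squares estimates for comparison.
fit <- lm(y ~ x)
rbind(ML_sigma_0.5 = ml_small, ML_sigma_3 = ml_large, LS = coef(fit))
```

The three rows should agree up to the tolerance of the numerical optimiser, whichever value of $\sigma$ is held fixed.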

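Sampling distribution of $\hat{\beta}_0$ and $\hat{\beta}_1$

From the error assumptions, it follows that

$$\hat{\beta}_0 \sim N\left( \beta_0, \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right] \right); \qquad \hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma^2}{S_{xx}} \right)$$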
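A small Monte Carlo experiment gives a feel for the sampling distribution of $\hat{\beta}_1$. The sketch below assumes hypothetical true parameter values, simulates many data sets on a fixed design, and compares the empirical mean and variance of the slope estimates with $\beta_1$ and $\sigma^2 / S_{xx}$.

```r
set.seed(8)
x <- 1:20                                 # fixed design points
beta0 <- 1; beta1 <- 0.5; sigma <- 2      # hypothetical true values
Sxx <- sum((x - mean(x))^2)

# Re-simulate the responses 2000 times and re-estimate the slope each time.
slopes <- replicate(2000, {
  Y <- beta0 + beta1 * x + rnorm(length(x), sd = sigma)
  sum((x - mean(x)) * (Y - mean(Y))) / Sxx
})

c(empirical_mean = mean(slopes), theory = beta1)
c(empirical_var = var(slopes), theory = sigma^2 / Sxx)
```

A histogram of `slopes` (for example `hist(slopes)`) should look approximately normal.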

Fitted Regression Line

The fitted regression line equation becomes

$$y = \hat{\beta}_0 + \hat{\beta}_1 x = \bar{Y} + \hat{\beta}_1 (x - \bar{x})$$

- Thus, the regression line passes through the componentwise mean $(\bar{x}, \bar{Y})$

Correlation Coefficient and Regression Slope

Recall that the Pearson correlation coefficient between vectors $x$ and $y$ is

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \in [-1, 1]$$

So we see that

$$\hat{\beta}_1 = r \sqrt{\frac{S_{yy}}{S_{xx}}}$$

- $\hat{\beta}_1$ has the same sign as $r$ (it is a scaled version of $r$)
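The link between the slope and the correlation coefficient can be verified directly. The sketch uses simulated data (hypothetical values) and the fact that $\sqrt{S_{yy} / S_{xx}}$ equals the ratio of the sample standard deviations, since the factors $1/(n-1)$ cancel.

```r
set.seed(9)
x <- rnorm(40, mean = 10, sd = 2)
y <- 5 - 0.8 * x + rnorm(40)

r <- cor(x, y)

# sqrt(Syy / Sxx) is the same as sd(y) / sd(x).
slope_from_r <- r * sd(y) / sd(x)
slope_lm <- unname(coef(lm(y ~ x))[2])

all.equal(slope_from_r, slope_lm)
sign(r) == sign(slope_lm)
```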


Fitted regression line:

𝐵𝑟𝑎𝑖𝑛𝑊𝑡 = 91.00 + 0.97 × 𝐵𝑜𝑑𝑦𝑊𝑡


Estimating the Error Variance:

Residual sum of squares (RSS)

- $\sigma^2$ (the error variance) is also an unknown parameter

- an estimator can be obtained using the residual sum of squares (RSS)

$$RSS = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

We find that:

$$RSS \sim \sigma^2 \chi^2_{n-2} \implies E[RSS] = (n - 2)\sigma^2$$

Unbiased estimate of $\sigma^2$

So, the unbiased estimate of $\sigma^2$ is

$$s^2 = \frac{1}{n - 2} RSS = \frac{1}{n - 2} \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$
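The estimate $s^2$ can be computed by hand and compared with what `lm` reports. The data below are simulated with hypothetical parameter values. In R, `summary(fit)$sigma` is the residual standard error, so its square should match $s^2$.

```r
set.seed(10)
n <- 30
x <- runif(n)
Y <- 1 + 2 * x + rnorm(n, sd = 0.5)

fit <- lm(Y ~ x)

# Residual sum of squares and the unbiased variance estimate.
RSS <- sum((Y - fitted(fit))^2)
s2 <- RSS / (n - 2)

# Compare with the squared residual standard error from summary().
all.equal(s2, summary(fit)$sigma^2)
```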


Residuals as Error Estimators:

- The residuals $R_i = Y_i - \hat{Y}_i$ can be thought of as estimates for the error terms, $\epsilon_i$, in the model.

- The empirical distribution of the residuals is an estimator of the error distribution.

- The residuals sum up to 0:

$$\sum_{i=1}^n R_i = 0$$

- Note: random variables that sum to a constant cannot be independent!
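Both remarks about the residuals can be seen in a short sketch on simulated data (hypothetical values). The residuals of a fit with an intercept sum to zero up to floating-point rounding, so any one residual is determined by the others.

```r
set.seed(11)
x <- runif(20)
Y <- 4 + 3 * x + rnorm(20)

R <- resid(lm(Y ~ x))

# Zero up to floating-point rounding.
sum(R)

# The last residual is fixed once the other 19 are known.
all.equal(unname(R[20]), -sum(R[1:19]))
```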