STAT 250, Dr. Kari Lock Morgan: Simple Linear Regression (4/17/2015)


Statistics: Unlocking the Power of Data Lock5

STAT 250 Dr. Kari Lock Morgan

Simple Linear Regression

SECTION 9.1

• Inference for correlation

• Inference for slope

• Conditions for inference


Social Networks and the Brain

Is the size of certain regions of your brain correlated with the size of your social network?

Data from 40 students at City College London

How to measure brain size?

How to measure social network size?

Source: R. Kanai, B. Bahrami, R. Roylance and G. Rees (2011). Online social network size is reflected in human brain structure, Proceedings of the Royal Society B: Biological Sciences. 10/19/11.


Measuring Brain Size

• Structural Magnetic Resonance Imaging (MRI)

• Voxel-based morphometry (VBM) to compute regional grey matter volume based on T1-weighted anatomical MRI scans

• Brain regions found significant in initial study:
    • Amygdala (emotion and emotional memory)
    • Middle temporal gyrus (social perception)
    • Entorhinal cortex (memory and navigation)
    • Superior temporal sulcus (perception of others)

Response: normalized z-score of grey matter density for these brain regions


Brain Regions

Image from Do our Brains Determine our Facebook Friend Count? (www.nature.com)


Social Networks and the Brain

How to measure size of social network?

• How many were present at your 18th or 21st birthday party?

• If you were going to have a party now, how many people would you invite?

• What is the total number of friends in your phonebook?

• Write down the names of the people to whom you would send a text message marking a celebratory event. How many people is that?

• Write down the names of people in your phonebook you would meet for a chat in a small group (one to three people). How many people is that?

• How many friends have you kept from school and university whom you could have a friendly conversation with now?

• How many friends do you have on ‘Facebook’?

• How many friends do you have from outside school or university?

• Write down the names of the people of whom you feel you could ask a favor and expect to have it granted. How many people is that?

Explanatory variable


Social Networks and the Brain

r = 0.436

Is the association significant?


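Standard Error Formulas

Parameter | Distribution | Standard Error
Proportion | Normal | $\sqrt{\frac{p(1-p)}{n}}$
Difference in Proportions | Normal | $\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Mean | t, df = n – 1 | $\sqrt{\frac{\sigma^2}{n}}$
Difference in Means | t, df = min(n1, n2) – 1 | $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Correlation | t, df = n – 2 | $\sqrt{\frac{1 - r^2}{n - 2}}$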


Social Networks and the Brain

• Is the grey matter volume of these regions of the brain significantly correlated with number of Facebook friends?

• From n = 40 people, we find r = .436. Is this significant?

(a) Yes

(b) No


Social Networks and the Brain

1. State hypotheses:

2. Check conditions:

3. Calculate test statistic:

4. Compute p-value:

5. Interpret in context:
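
The calculation steps can be sketched numerically for the values on the slides (r = 0.436, n = 40). This is a minimal sketch using only the Python standard library, not the course's Minitab workflow. The helper names `t_pdf` and `t_two_sided_p` are hypothetical, and the p-value is approximated by numerically integrating the t density, so treat it as approximate.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t_stat, df, steps=2000):
    """Approximate two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # Each tail holds 0.5 minus the area between 0 and |t|
    return 2 * (0.5 - area_0_to_t)

r, n = 0.436, 40                       # sample correlation and sample size from the slides
df = n - 2                             # degrees of freedom for a correlation test
se = math.sqrt((1 - r ** 2) / df)      # standard error of r
t_stat = (r - 0) / se                  # (sample statistic - null value) / SE, with H0: rho = 0
p_value = t_two_sided_p(t_stat, df)

print(f"SE = {se:.3f}, t = {t_stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

A p-value well below 0.05 would indicate a significant association; whether that association is causal is a separate question that the test cannot answer.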


Social Networks and the Brain

Should you go out and add more Facebook friends to increase the size of your brain?

a) Yes b) No


Limitations


Social Networks and the Brain

Give a 95% confidence interval for ρ, the true correlation between grey matter volume in the left middle temporal gyrus and number of Facebook friends. (Can use t* = 2).

a) (0.34, 0.54) b) (0.24, 0.64) c) (0.14, 0.73) d) (0.04, 0.83)

r = 0.436

$SE = \sqrt{\frac{1 - 0.436^2}{40 - 2}} = 0.146$
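
The interval can be checked with a few lines of arithmetic. This is a small sketch using the slide's values and its suggested shortcut t* = 2; the variable names are chosen here for illustration.

```python
import math

r, n = 0.436, 40          # sample correlation and sample size from the slides
t_star = 2                # approximate 95% critical value suggested on the slide

# Standard error of the correlation: sqrt((1 - r^2) / (n - 2))
se = math.sqrt((1 - r ** 2) / (n - 2))

# Confidence interval: statistic +/- t* x SE
lower, upper = r - t_star * se, r + t_star * se

print(f"SE = {se:.3f}")
print(f"95% CI for rho: ({lower:.2f}, {upper:.2f})")
```

The result matches option (c), (0.14, 0.73).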


$R^2$

$R^2$ is the proportion of the variability in the response variable, Y, that is explained by the explanatory variable, X

For simple linear regression, $R^2 = r^2$ ($R^2$ is just the sample correlation squared)
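
To see that the two definitions agree, the sketch below fits a least squares line to a small made-up data set and computes the proportion of variability explained directly, then compares it with the squared correlation. The data are hypothetical and are not from the study.

```python
import math

# Hypothetical toy data, for illustration only (not the study's data)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.7, 7.4, 7.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)          # total variability in y
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares slope and intercept
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# R^2 = 1 - (unexplained variability) / (total variability)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

# Sample correlation
r = sxy / math.sqrt(sxx * syy)

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

The two printed values agree up to floating-point rounding.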


$R^2$

[Two plots compared: $R^2 = 0.67$ and $R^2 = 0.09$]

How much does the variability in Y decrease if you know X?


Regression in Minitab

Stat -> Regression -> Fitted Line Plot

$0.436^2 = 0.19$


Sample to Population

Everything we have done so far is based solely on sample data

Now, we will extend from the sample to the population

Statistical inference!


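Simple Linear Model

• The population/true simple linear model is

$y = \beta_0 + \beta_1 x + \varepsilon$

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the random error

• $\beta_0$ and $\beta_1$ are unknown parameters

• Can use familiar inference methods!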
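
One way to make the model concrete is to simulate data from it and see how close the least squares estimates come to the parameters. The sketch below assumes hypothetical "true" values (`beta0`, `beta1`, `sigma`) chosen only for illustration; in real data these are unknown.

```python
import random

random.seed(0)                               # fixed seed so the run is repeatable
beta0, beta1, sigma = 1.0, 0.5, 1.0          # hypothetical "true" parameters

# Generate a sample from y = beta0 + beta1 * x + epsilon, with epsilon ~ N(0, sigma)
n = 2000
x = [random.uniform(0, 10) for _ in range(n)]
y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]

# Least squares estimates of the intercept and slope
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(f"true slope = {beta1}, estimated slope = {b1:.3f}")
print(f"true intercept = {beta0}, estimated intercept = {b0:.3f}")
```

The estimates land near, but not exactly on, the true values, which is why inference (intervals and tests) is needed.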


Inference for the Slope

• Test for whether the slope is significantly different from 0 (whether there is any linear relationship between x and y):

$H_0: \beta_1 = 0$

$H_a: \beta_1 \neq 0$

• Confidence interval for the true slope


Inference for the Slope

• Confidence intervals and hypothesis tests for the slope can be done using the familiar formulas:

$t = \frac{\text{sample statistic} - \text{null value}}{SE}$

$\text{sample statistic} \pm t^* \cdot SE$

• Population Parameter: $\beta_1$, Sample Statistic: $\hat{\beta}_1$

• Use t-distribution with n – 2 degrees of freedom


Regression in Minitab

Stat -> Regression -> Regression -> Fit Regression Model


Inference for Slope

Is the slope significantly different from 0? (a) Yes (b) No

n = 40

Give a 95% confidence interval for the true slope.


Hypothesis Test


Regression in Minitab

Stat -> Regression -> Regression -> Fit Regression Model


Two Quantitative Variables

• The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical!

• They are equivalent ways of testing for a linear association between two quantitative variables.
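
The equivalence can be checked numerically. The sketch below uses a small made-up data set (hypothetical, not the study's data) and computes the t-statistic both ways: once from the slope and its standard error, and once from the correlation. The slope's standard error is taken as the residual standard error divided by the square root of the sum of squared deviations of x, the standard formula for simple linear regression.

```python
import math

# Hypothetical toy data, for illustration only
x = [3, 5, 6, 8, 10, 11, 13, 14, 16, 18]
y = [4.0, 3.1, 6.5, 5.2, 8.9, 6.7, 9.8, 8.1, 12.4, 10.3]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Test statistic based on the slope
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))          # residual standard error
se_b1 = s_e / math.sqrt(sxx)            # standard error of the slope
t_slope = (b1 - 0) / se_b1

# Test statistic based on the correlation
r = sxy / math.sqrt(sxx * syy)
se_r = math.sqrt((1 - r ** 2) / (n - 2))
t_corr = (r - 0) / se_r

print(f"t (slope) = {t_slope:.4f}, t (correlation) = {t_corr:.4f}")
```

Both statistics use n – 2 degrees of freedom, so identical t-values give identical p-values.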


Confidence Interval

$\text{statistic} \pm t^* \cdot SE$

$0.0023 \pm 2 \times 0.00077 = (0.00076, 0.00384)$

We are 95% confident that the true slope, regressing grey matter volume of the left middle temporal gyrus on number of Facebook friends, is between 0.00076 and 0.00384.
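
The same numbers also give the test statistic for the slope. This short sketch plugs the slide's estimate and standard error into the familiar formulas, using the shortcut t* = 2; the variable names are chosen here for illustration.

```python
b1_hat = 0.0023       # estimated slope (from the slide)
se = 0.00077          # standard error of the slope (from the slide)
t_star = 2            # approximate 95% critical value used on the slides
null_value = 0        # H0: beta_1 = 0

# Test statistic: (sample statistic - null value) / SE
t_stat = (b1_hat - null_value) / se

# Confidence interval: sample statistic +/- t* x SE
lower, upper = b1_hat - t_star * se, b1_hat + t_star * se

print(f"t = {t_stat:.2f}")
print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
```

The interval excludes 0, which agrees with a t-statistic of about 3 on 38 degrees of freedom.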


Multiple Testing?


False Positive (Type I Error) Protection

To further protect against Type I errors, they performed two independent analyses on two separate samples (n = 125, then n = 40)
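
A rough illustration of why replication helps: the sketch below compares the chance of at least one false positive across many tests with the chance of a false positive in both of two independent samples. The significance level `alpha` and the number of tests `k` are assumptions chosen for illustration; the slides do not state the values used in the study.

```python
alpha = 0.05          # assumed significance level (not stated on the slides)
k = 20                # hypothetical number of independent tests

# Chance of at least one false positive when all k null hypotheses are true
p_any_false_positive = 1 - (1 - alpha) ** k

# Chance that a true-null effect is a false positive in BOTH of two independent samples
p_both_false_positive = alpha ** 2

print(f"P(at least one false positive in {k} tests) = {p_any_false_positive:.4f}")
print(f"P(false positive in both independent samples) = {p_both_false_positive:.4f}")
```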


Real-World Network Size

What about real-world network size?


Conditions

Inference based on the simple linear model is only valid if the following conditions hold:

1) Linearity

2) Constant Variability of Residuals

3) Normality of Residuals


Linearity

• The relationship between x and y is linear (it makes sense to draw a line through the scatterplot)


Dog Years

• 1 dog year = 7 human years

• Linear: human age = 7 × dog age

• From www.dogyears.com: “The old rule-of-thumb that one dog year equals seven years of a human life is not accurate. The ratio is higher with youth and decreases a bit as the dog ages.”

[Figure: LINEAR rule compared with the ACTUAL relationship; photo of Charlie]

A linear model can still be useful, even if it doesn’t perfectly fit the data.


“All models are wrong, but some are useful”

-George Box


Residuals (errors)

$\varepsilon_i \sim N(0, \sigma_\varepsilon)$

Conditions for residuals:

• The errors are normally distributed (check with a histogram)

• The average of the errors is 0 (always true for least squares regression)

• The standard deviation of the errors is constant for all cases (constant spread of points around the line)
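
These checks can be sketched in a few lines. The example below uses hypothetical toy data (not the study's data), computes the residuals from a least squares fit, confirms their average is 0, compares the residual spread for small and large x as a crude check of constant variability, and prints a simple text histogram. In practice a residual plot and histogram from Minitab serve the same purpose.

```python
import math
import statistics

# Hypothetical toy data, for illustration only (x is already sorted)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.3, 2.8, 4.1, 4.6, 6.2, 6.4, 8.1, 8.3, 9.9, 10.8]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# The average residual is 0 (up to rounding) for any least squares line with an intercept
mean_resid = statistics.mean(residuals)

# Crude constant-variability check: residual spread for the smaller vs. larger x values
sd_low = statistics.stdev(residuals[: n // 2])
sd_high = statistics.stdev(residuals[n // 2 :])

print(f"mean residual = {mean_resid:.2e}")
print(f"SD of residuals: low x = {sd_low:.3f}, high x = {sd_high:.3f}")

# Crude text histogram of residuals, in bins of width 0.25
width = 0.25
counts = {}
for e in residuals:
    b = math.floor(e / width)
    counts[b] = counts.get(b, 0) + 1
for b in sorted(counts):
    print(f"[{b * width:+.2f}, {(b + 1) * width:+.2f}) " + "*" * counts[b])
```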

Regression in Minitab

Is the association approximately linear? a) Yes b) No

Is the spread of the points around the line approximately constant? a) Yes b) No


Histogram of Residuals

Are the residuals approximately normally distributed? a) Yes b) No


Non-Constant Variability


Non-Normal Residuals


Conditions not Met?

• If the association isn’t linear: don’t use simple linear regression

• If variability is not constant, or residuals are not normal: The model itself is still valid, but inference may not be accurate

• If you want to do something more fancy so the conditions are met… take STAT 462!


Simple Linear Regression

1) Plot your data!
    • Association approximately linear?
    • Outliers?
    • Constant variability?

2) Fit the model (least squares)

3) Use the model
    • Interpret coefficients
    • Make predictions

4) Look at histogram of residuals (normal?)

5) Inference (extend to population)
    • Inference on slope (interval and test)


To Do

Read Section 9.1

Do HW 9.1 (due Friday, 3/24)

Study for Exam 3 (Friday, 3/24)