Carolin Kosiol Institute of Population Genetics Vetmeduni Vienna <[email protected]> Spezielle Statistik in der Biomedizin WS 2014/15 Normal Distribution




Normal Distribution

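Central Limit Theorem:

If you take repeated independent samples from a population with finite variance and calculate their averages, then these averages will be approximately normally distributed. The approximation improves as the sample size grows.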

Let's demonstrate it for ourselves:

# 10000 sample means, each from 5 draws of a uniform distribution on [0, 10]
means <- numeric(10000)
for (i in 1:10000){
  means[i] <- mean(runif(5)*10)
}
hist(means, ylim=c(0,1600))

How close is this to a normal distribution?

mean(means)
sd(means)

Probability density function:

# mean and sd are the values obtained in one run of the simulation above;
# the factor 5000 rescales the density to counts
# (10000 samples times a histogram bin width of 0.5)
xv <- seq(0,10,0.1)
yv <- dnorm(xv, mean=4.998581, sd=1.2899)*5000
lines(xv,yv)
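As a rough cross-check, the mean and standard deviation of the sample means can also be worked out directly: a uniform variable on [0, 10] has mean 5 and variance 100/12, so the mean of 5 independent draws has standard deviation sqrt(100/12/5). The sketch below compares these values with a fresh simulation. The variable names and the seed are illustrative choices, not part of the original slides.

```r
set.seed(1)                            # arbitrary seed, only for reproducibility
n_draws <- 5                           # draws per sample, as in the demonstration
theo_mean <- 10 / 2                    # mean of a uniform on [0, 10]
theo_sd <- sqrt(10^2 / 12 / n_draws)   # sd of the mean of n_draws uniforms

# replicate() repeats the sampling without an explicit loop
sim_means <- replicate(10000, mean(runif(n_draws) * 10))

c(theoretical = theo_mean, simulated = mean(sim_means))
c(theoretical = theo_sd, simulated = sd(sim_means))
```

The theoretical standard deviation is about 1.291, which agrees with the value 1.2899 used for the density curve.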


Cumulative Probability

pnorm(-2)

Just under 2.5% of values lie more than 2 standard deviations below the mean.

pnorm(-1)

About 16% of values lie more than 1 standard deviation below the mean.

1-pnorm(3)

The probability that a value from a normal distribution lies more than 3 standard deviations above the mean is less than 0.2%.
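The one-sided probabilities above can be combined into two-sided ones by subtracting two pnorm values. The following sketch computes the probability of falling within 1, 2 and 3 standard deviations of the mean; the mean of 135 and standard deviation of 5 in the last line are illustrative values only.

```r
k <- 1:3
# probability of falling within k standard deviations of the mean
within_k <- pnorm(k) - pnorm(-k)
round(within_k, 4)

# the same function works on the original scale if mean and sd are supplied:
# 125 is two standard deviations below 135, so this equals pnorm(-2)
pnorm(125, mean = 135, sd = 5)
```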


Properties of the Normal Distribution


Quantiles of the Normal Distribution

qnorm(c(0.025,0.975))


Suppose we have measured the height of 300 horses:

horses <- rnorm(300, 135, 5)
hist(horses, xlab="Height at withers", ylab="Frequency")

For large sample sizes n, the histogram approaches the shape of a Gaussian distribution:

horses <- rnorm(10000, 135, 5)
hist(horses, 20, xlim=c(100,180), xlab="Height at withers", ylab="Frequency")


Plot for Testing Normality of Single Samples

Quantile-quantile plot

# store each sample so that qqnorm and qqline use the same data
x <- rnorm(100, mean = 5, sd = 3)
qqnorm(x)
qqline(x, col = 2)

x <- rnorm(1000, mean = 5, sd = 3)
qqnorm(x)
qqline(x, col = 2)
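To see what a quantile-quantile plot looks like when the data are not normal, it can help to compare a normal sample with a clearly skewed one. The sketch below uses an exponential sample as the non-normal case (an assumption made for illustration) and summarises how straight each plot is by the correlation of the plotted points. This is an informal summary, not a formal normality test.

```r
set.seed(42)                           # arbitrary seed for reproducibility
normal_sample <- rnorm(200, mean = 5, sd = 3)
skewed_sample <- rexp(200, rate = 1)   # right-skewed, clearly non-normal

# plot.it = FALSE returns the plotted coordinates without drawing
qq_normal <- qqnorm(normal_sample, plot.it = FALSE)
qq_skewed <- qqnorm(skewed_sample, plot.it = FALSE)

# points close to a straight line give a correlation near 1
cor_normal <- cor(qq_normal$x, qq_normal$y)
cor_skewed <- cor(qq_skewed$x, qq_skewed$y)
c(normal = cor_normal, skewed = cor_skewed)
```

Calling qqnorm(skewed_sample) followed by qqline(skewed_sample, col = 2) draws the corresponding plot, in which the points bend away from the line.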


Student's t-distribution

Used instead of the normal distribution when the standard deviation has to be estimated from a small sample (rule of thumb: n<30)

Shaped like the normal distribution, but with heavier tails

The equivalents to pnorm and qnorm are pt and qt

Quantiles of the Student t-distribution

qt(0.975,5)

2.570582
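The effect of the heavier tails can be seen by comparing t quantiles for increasing degrees of freedom with the corresponding normal quantile. The degrees of freedom chosen below are arbitrary illustrative values.

```r
df <- c(2, 5, 10, 30, 100, 1000)
t_quantiles <- qt(0.975, df)
data.frame(df = df, t_0.975 = round(t_quantiles, 3))

qnorm(0.975)    # value approached by the t quantiles as df grows

# heavier tails: more probability below -2 than under the normal distribution
pt(-2, df = 5)
pnorm(-2)
```

With few degrees of freedom the 97.5% quantile is noticeably larger than the normal one, so confidence intervals based on small samples become wider.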


t-test

Comes in different "flavors":

One-sample t-test: compares the mean of a sample to a known value

Two-sample t-test: compares the means of two independent samples (see the example on the next slides)

Paired-sample t-test: compares the means of two variables measured on the same individuals (e.g. before and after treatment)
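All three flavors are available through R's t.test function. The sketch below runs each of them on simulated data; the variable names, means and standard deviations are hypothetical and only serve as an illustration.

```r
set.seed(7)                                # arbitrary seed
before <- rnorm(20, mean = 50, sd = 5)     # hypothetical first measurement
after  <- before + rnorm(20, mean = 2, sd = 2)   # same individuals, measured again
other  <- rnorm(20, mean = 55, sd = 5)     # an independent second group

one_sample <- t.test(before, mu = 50)               # sample mean vs. known value
two_sample <- t.test(before, other)                 # two independent samples
paired     <- t.test(after, before, paired = TRUE)  # same individuals twice

c(one = one_sample$p.value, two = two_sample$p.value, paired = paired$p.value)
```

Note that for two samples t.test defaults to the Welch variant, which does not assume equal variances; var.equal = TRUE requests the classical pooled-variance test.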


Assumptions of the t-test

1) normal distribution of the data

2) variance homogeneity (equal variances) of the 2 samples

The assumptions need to be checked!

If the data are not normally distributed -> use the Wilcoxon test instead (nonparametric test)
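One possible way to check these assumptions in R is sketched below, using the Shapiro-Wilk test for normality and an F test for equal variances. Both are choices made here for illustration (the slides do not say which checks were used), and the data are simulated. A quantile-quantile plot is a graphical alternative for the normality check.

```r
set.seed(3)                                # arbitrary seed
group_a <- rnorm(30, mean = 10, sd = 2)    # hypothetical data
group_b <- rnorm(30, mean = 12, sd = 2)

# a small p-value suggests a departure from normality
p_norm_a <- shapiro.test(group_a)$p.value
p_norm_b <- shapiro.test(group_b)$p.value

# a small p-value suggests unequal variances
p_var <- var.test(group_a, group_b)$p.value

# nonparametric alternative if normality looks doubtful
p_wilcox <- wilcox.test(group_a, group_b)$p.value

c(normal_a = p_norm_a, normal_b = p_norm_b, variances = p_var, wilcoxon = p_wilcox)
```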


t-test Example: Davis balanced dataset with 88 females and 88 males




t-Test for Unequal Sample Sizes
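The content of this slide did not survive, so the following is only a sketch of how a t-test with unequal sample sizes can be run in R. The group sizes, means and standard deviations are hypothetical. t.test accepts samples of different length directly, and its default Welch variant adjusts the degrees of freedom.

```r
set.seed(11)                                   # arbitrary seed
small_group <- rnorm(12, mean = 165, sd = 6)   # hypothetical values
large_group <- rnorm(60, mean = 170, sd = 6)

welch  <- t.test(small_group, large_group)                    # default: Welch
pooled <- t.test(small_group, large_group, var.equal = TRUE)  # classical test

welch$parameter     # Welch degrees of freedom, usually not an integer
pooled$parameter    # n1 + n2 - 2 = 70
c(welch = welch$p.value, pooled = pooled$p.value)
```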


Summary: the “norm” family

rnorm(n, mu, sqrt(sigma^2)) simulates an iid sample of size n from a normal distribution with mean mu and variance sigma^2 (but note that the function expects the standard deviation, i.e. the square root of the variance, as its third argument!)

dnorm(x, mu, sqrt(sigma^2)) returns the value of the normal probability density function (what is this?) at a particular value x

pnorm(q, mu, sqrt(sigma^2)) calculates the area under the density curve from negative infinity to the value q

qnorm(p, mu, sqrt(sigma^2)) is the inverse of pnorm, i.e. it takes a probability p (an output of pnorm) and returns the corresponding value q (why is this useful?)
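The relationship between the four functions can be checked numerically. In the sketch below, the mean of 135 and variance of 25 are illustrative values, and the finite lower integration limit is a practical stand-in for negative infinity.

```r
mu <- 135                                # illustrative mean
sigma2 <- 25                             # illustrative variance
sigma <- sqrt(sigma2)                    # the functions expect the sd

p <- pnorm(140, mu, sigma)               # P(X <= 140)
q <- qnorm(p, mu, sigma)                 # inverse of pnorm: returns 140 again

# central 95% range of the distribution
central_95 <- qnorm(c(0.025, 0.975), mu, sigma)

# dnorm is the height of the density curve; integrating it recovers pnorm
# (lower limit 10 sd below the mean, where the density is negligible)
area <- integrate(dnorm, lower = mu - 10 * sigma, upper = 140,
                  mean = mu, sd = sigma)$value

c(p = p, q = q, area = area)
central_95
```

This also hints at why qnorm is useful: it turns a chosen probability, such as 2.5% in each tail, into cut-off values on the measurement scale.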


R exercise (develop your own code)

Using the "rnorm" function in R, simulate two samples of size n=40; the first with mean µ=3.7 and the second with mean µ=5.5 (average number of offspring for dormice in conifer forest and broad-leafed forest environments, respectively). Vary the standard deviation of the samples (1,2,3,4,5,6).

Use a t-test to answer the following question: At which standard deviation is the difference between the two samples no longer significant (P > 0.05)?
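One possible starting point for the exercise is sketched below. It is not the official solution: the variable names and the seed are assumptions, and because each sample is random, the standard deviation at which significance is lost will vary from run to run.

```r
set.seed(2014)                 # arbitrary seed; results change with it
sds <- 1:6                     # standard deviations to try

p_values <- sapply(sds, function(s) {
  conifer   <- rnorm(40, mean = 3.7, sd = s)   # first sample
  broadleaf <- rnorm(40, mean = 5.5, sd = s)   # second sample
  t.test(conifer, broadleaf)$p.value
})

data.frame(sd = sds,
           p_value = signif(p_values, 3),
           significant = p_values <= 0.05)
```

Repeating the simulation many times for each standard deviation, for example with replicate, gives a more stable picture than a single run.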