BIOL2608 Biometrics 2011-2012 Computer lab session II Basic concepts in statistics

Measures of central tendency

• Also known as measures of location

• Indicate the location of the population/sample along the measurement scale

• Useful for describing and comparing populations

[Figure: measurements plotted on a scale from 10.0 to 16.0 cm]

Mean (= Arithmetic mean)

• Commonly called average

• Sum of all measurements in the population/sample divided by the population/sample size

Mean = (10.5 + 11.5 × 2 + 12.0 + 12.5 + 13.0 × 3 + 13.5 × 2 + 14.0 + 14.5 + 15.0) / 13 = 12.88 cm
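The calculation above can be reproduced with a short Python sketch. The variable names are assumptions made for illustration; the values are the 13 measurements used in the worked example.

```python
# The 13 measurements (cm) from the worked example; the list name is illustrative.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

# Arithmetic mean: sum of all measurements divided by the sample size.
sample_mean = sum(lengths_cm) / len(lengths_cm)

print(round(sample_mean, 2))  # 12.88
```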

Median

• Middle measurement in an ordered dataset

10.5 11.5 11.5 12.0 12.5 13.0 13.0 13.0 13.5 13.5 14.0 14.5 15.0

Median = the middle (7th) of the 13 ordered measurements = 13.0 cm
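As a minimal sketch, the median can be found by sorting the data and picking the middle value. The variable names are illustrative assumptions, and the direct indexing only works because the sample size here is odd; `statistics.median` from the Python standard library handles both odd and even sizes.

```python
from statistics import median

# The 13 measurements (cm) from the worked example, deliberately unsorted here.
lengths_cm = [13.0, 10.5, 14.5, 11.5, 13.5, 12.0, 15.0, 13.0,
              11.5, 14.0, 12.5, 13.5, 13.0]

ordered = sorted(lengths_cm)
n = len(ordered)

# With an odd sample size the median is the single middle value:
# 0-based index n // 2 = 6, i.e. the 7th of 13 ordered measurements.
middle_value = ordered[n // 2]

print(middle_value)        # 13.0
print(median(lengths_cm))  # 13.0
```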

Quartile

• Describes an ordered dataset in four equal fractions

– 1/4 of the data smaller than 1st quartile (Q1)

– 1/4 lies between Q1 and Q2

– 1/4 lies between Q2 and Q3

– 1/4 bigger than the Q3

10.5 11.5 11.5 12.0 12.5 13.0 13.0 13.0 13.5 13.5 14.0 14.5 15.0

Q1 = 11.63; Q2 = Median = 13.0; Q3 = 13.88

Percentile

• Describes an ordered dataset in 100 equal fractions

– 25th percentile = 1st quartile

– 50th percentile = 2nd quartile = median

– 75th percentile = 3rd quartile
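The link between quartiles and percentiles can be checked with a small sketch using `statistics.quantiles` from the Python standard library. Its default "exclusive" interpolation rule is only one of several conventions in use, so the Q1 and Q3 it returns need not equal the 11.63 and 13.88 quoted above, which presumably come from a different rule. The variable names are illustrative assumptions.

```python
from statistics import quantiles

# The 13 ordered measurements (cm) from the worked example.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

# n=4 returns the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(lengths_cm, n=4)

# n=100 returns 99 cut points; index 24 is the 25th percentile, and so on.
percentiles = quantiles(lengths_cm, n=100)

print(q1, q2, q3)
print(percentiles[24], percentiles[49], percentiles[74])
```

Whatever interpolation rule is chosen, the second quartile of these data is the median, 13.0.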

Measures of dispersion and variability

• Indicate how the measurements are spread around the center of the distribution

[Figure: Sample A and Sample B plotted on a scale from 10.0 to 16.0 cm]

Variance and standard deviation

                         Sample A    Sample B
Variance (s²)            1.17 cm²    2.67 cm²
Standard deviation (s)   1.08 cm     1.63 cm

Population or sample?

• Population

– Entire collection of measurements in which one is interested

– Often large, making it hard to obtain all measurements

• Sample

– Subset of all measurements in the population

[Figure: Population (very large size) and Sample, linked by Sampling and Inference]

Commonly used symbols

                     Population   Sample
Mean                 μ            x̄
Size                 N            n
Variance             σ²           s²
Standard deviation   σ            s

Estimation of mean

• Confidence interval

– Allows us to express the precision of the estimate of the population mean (μ) from the sample mean (x̄)

– When we say that, at the 95% confidence level, μ = x̄ ± y, it means that we are 95% confident that μ lies between x̄ - y and x̄ + y
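The slide does not say how the half-width y is obtained. A minimal sketch, assuming the common t-based interval (sample mean ± t × s / √n), is given below for the 13 measurements used earlier. The critical value 2.179 is the two-tailed 5% value of Student's t for 12 degrees of freedom, read from a standard t table; it and all variable names are assumptions added for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# The 13 measurements (cm) from the earlier worked example.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

n = len(lengths_cm)
sample_mean = mean(lengths_cm)

# Standard error of the mean: sample standard deviation divided by sqrt(n).
standard_error = stdev(lengths_cm) / sqrt(n)

# Assumed critical value: two-tailed t at alpha = 0.05 with n - 1 = 12 d.f.
T_CRIT_95_DF12 = 2.179

# Half-width of the interval (the "y" in the bullet above).
half_width = T_CRIT_95_DF12 * standard_error
lower = sample_mean - half_width
upper = sample_mean + half_width

print(f"95% CI: {lower:.2f} to {upper:.2f} cm")
```

For a different sample size the critical value changes, so it would need to be looked up again for the new degrees of freedom.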

Estimation of variance and standard deviation

• NOTE:

– The variance and standard deviation of a population are calculated using slightly different formulae from those of a sample
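The formulae themselves are not shown here. As a hedged sketch, assuming the usual definitions (the population variance divides the sum of squared deviations by N, the sample variance divides it by n - 1), the Python standard library exposes both versions. The same 13 measurements are treated first as a whole population and then as a sample purely to show the difference; the variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

# The 13 measurements (cm) from the earlier worked example.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

# Treated as a population: sum of squared deviations divided by N.
population_variance = pvariance(lengths_cm)
population_sd = pstdev(lengths_cm)

# Treated as a sample: sum of squared deviations divided by n - 1.
sample_variance = variance(lengths_cm)
sample_sd = stdev(lengths_cm)

print(f"population: variance = {population_variance:.2f}, sd = {population_sd:.2f}")
print(f"sample:     variance = {sample_variance:.2f}, sd = {sample_sd:.2f}")
```

Because the sample version divides by the smaller number, it always gives the larger value for the same data.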

Normal distribution

• A very common bell-shaped statistical distribution; when data follow it, we can carry out a range of statistical analyses that assume normality

Normality check

• 6 criteria:

Mean & Median            Mean = Median

Histogram

Bin: The ideal bin size is obtained by dividing the range by the ideal number of bins (number of bins = 5 log n, where n is the sample size)
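The bin rule can be sketched as follows for the 13 measurements used earlier. Two points are assumptions, since the slide does not state them: the logarithm is taken to base 10, and the number of bins is rounded to the nearest whole number. The variable names are illustrative.

```python
from math import log10

# The 13 measurements (cm) from the earlier worked example.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

n = len(lengths_cm)

# Ideal number of bins = 5 log n (base 10 assumed), rounded to a whole number.
n_bins = round(5 * log10(n))

# Ideal bin size = range of the data divided by the number of bins.
data_range = max(lengths_cm) - min(lengths_cm)
bin_width = data_range / n_bins

print(n_bins, bin_width)  # 6 0.75
```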

Skewness

• Negative skew

– Longer left tail

– Data concentrated on the right

• Positive skew

– Longer right tail

– Data concentrated on the left

Kurtosis

• Measure of “peakedness” and “tailedness”

• Positive kurtosis (leptokurtic)

– More acute peak around mean

– Longer, fatter tails

• Negative kurtosis (platykurtic)

– Lower, wider peak around mean

– Shorter, thinner tails
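Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the simple moment-based definitions, which are an assumption: statistical packages often report bias-corrected versions, so their values may differ slightly, especially for small samples. The function names are hypothetical.

```python
from statistics import fmean


def central_moment(values, order):
    """Central moment of the given order, using divisor n."""
    centre = fmean(values)
    return sum((v - centre) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal curve)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# The 13 measurements (cm) from the earlier worked example.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

print(f"skewness = {skewness(lengths_cm):.2f}")
print(f"excess kurtosis = {excess_kurtosis(lengths_cm):.2f}")
```

A negative skewness indicates a longer left tail and a positive one a longer right tail; positive excess kurtosis corresponds to the leptokurtic case and negative to the platykurtic case.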

Box plot

P-P Plot / Q-Q Plot
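The points of a Q-Q plot pair each ordered measurement with the value expected at the same position under a normal distribution. A minimal sketch follows, assuming the normal curve is fitted with the sample mean and standard deviation and that the plotting positions are (i - 0.5) / n; software packages may use other plotting positions. The variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev

# The 13 measurements (cm) from the earlier worked example.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

ordered = sorted(lengths_cm)
n = len(ordered)

# Normal distribution with the same mean and standard deviation as the sample.
fitted = NormalDist(mean(ordered), stdev(ordered))

# Assumed plotting positions: (i - 0.5) / n for the i-th ordered value.
positions = [(i - 0.5) / n for i in range(1, n + 1)]

# Expected normal quantile for each position, paired with the observed value.
theoretical = [fitted.inv_cdf(p) for p in positions]
qq_points = list(zip(theoretical, ordered))

for expected, observed in qq_points:
    print(f"{expected:6.2f}  {observed:5.1f}")
```

If the data are close to normal, plotting the observed values against the expected ones gives dots that lie near a straight line.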

Normality check

• 6 criteria:

Mean & Median            Mean = Median
Histogram                Like a bell shape
Skewness & Kurtosis      Within ± 1
Box plot                 Symmetric
P-P plot / Q-Q plot      Dots follow the inclined straight line
Goodness of fit test     K-S one-sample test; p > 0.05

K-S one-sample test
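The test statistic D is the largest vertical gap between the cumulative distribution of the sample and that of the reference distribution. The sketch below computes D only, assuming the reference is a normal curve fitted with the sample mean and standard deviation. It does not compute the p-value needed for the p > 0.05 criterion; that would come from a K-S table or a statistics package. The variable names are illustrative.

```python
from statistics import NormalDist, mean, stdev

# The 13 measurements (cm) from the earlier worked example.
lengths_cm = [10.5, 11.5, 11.5, 12.0, 12.5, 13.0, 13.0, 13.0,
              13.5, 13.5, 14.0, 14.5, 15.0]

ordered = sorted(lengths_cm)
n = len(ordered)

# Reference distribution: normal with the sample mean and standard deviation.
fitted = NormalDist(mean(ordered), stdev(ordered))

# D is the largest gap between the empirical step function and the fitted
# normal CDF, checked just after and just before each step.
d_statistic = 0.0
for i, x in enumerate(ordered, start=1):
    expected_cdf = fitted.cdf(x)
    gap_above = i / n - expected_cdf        # step top minus normal CDF
    gap_below = expected_cdf - (i - 1) / n  # normal CDF minus step bottom
    d_statistic = max(d_statistic, gap_above, gap_below)

print(f"D = {d_statistic:.3f}")
```

A small D means the sample follows the fitted normal curve closely; the larger D becomes, the stronger the evidence against normality.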

Related Readings

• Zar, J. H. (1999). Biostatistical Analysis, 4th edition. New Jersey: Prentice-Hall.

– Chapters 2, 3, 4, 6
