Statistics: Data Presentation & Analysis Fr Clinic I


Statistics: Data Presentation & Analysis

Fr Clinic I


Overview

• Tables & Graphs
• Populations & Samples
• Mean, Median, & Variance
• Error Bars
  – Standard Deviation, Standard Error & 95% Confidence Interval (CI)
• Comparing Means of Two Populations
• Linear Regression (LR)


Warning

• Statistics is a huge field; I’ve simplified considerably here. For example:
  – Mean, Median, and Standard Deviation
    • There are alternative formulas
  – 95% Confidence Interval
    • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than single mean…)
  – Error Bars
    • Don’t go beyond the interpretations I give here!
  – Comparing Means of Two Data Sets
    • We cover only the t test for two means when the variances are unknown but equal; there are other tests
  – Linear Regression
    • We look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!


Tables

Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters

Water        Turbidity (NTU)   True Color (Pt-Co)   Apparent Color (Pt-Co)
(1)          (2)               (3)                  (4)
Pond Water   10                13                   30
Sweetwater   4                 5                    12
Hiker        3                 8                    11

Consistent Format, Title, Units, Big Fonts
Differentiate Headings, Number Columns


Figures

Figure 1: Turbidity of Pond Water, Treated and Untreated
[Bar chart. x-axis: Filter (Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager); y-axis: Turbidity (NTU), 0 to 25.]

Consistent Format, Title, Units
Good Axis Titles, Big Fonts


Populations and Samples

• Population
  – All possible outcomes of experiment or observation
    • US population
    • Particular type of steel beam
• Sample
  – Finite number of outcomes measured or observations made
    • 1000 US citizens
    • 5 beams
• Use samples to estimate population properties
  – Mean, Variance
    • E.g., Height of 1000 US citizens used to estimate mean of US population


Central Tendency

• Mean and Median
  – Data (NTU): 1, 3, 3, 6, 8, 10
  – Mean = xbar = sum of values divided by sample size
    = (1+3+3+6+8+10)/6 = 5.2 NTU
  – Median = m = middle number
      Rank:   1  2  3  4  5  6
      Number: 1  3  3  6  8  10
    For an even number of sample points, average the middle two
    = (3+6)/2 = 4.5 NTU

Excel: Mean – AVERAGE; Median – MEDIAN
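As a cross-check on the arithmetic above, here is a minimal Python sketch using only the standard library's `statistics` module. The variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median

# Turbidity readings from the slide, in NTU
turbidity_ntu = [1, 3, 3, 6, 8, 10]

# Mean: sum of values divided by sample size
sample_mean = mean(turbidity_ntu)

# Median: middle value; for an even sample size, the average of the middle two
sample_median = median(turbidity_ntu)

print(f"mean   = {sample_mean:.1f} NTU")
print(f"median = {sample_median:.1f} NTU")
```

These calls play the same role as Excel's AVERAGE and MEDIAN.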


Variability

• Variance, s²
  – Sum of the squares of the deviations about the mean, divided by the degrees of freedom
  – $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$
  – Where $x_i$ = a data point, $\bar{x}$ = the sample mean (xbar), and n = number of data points
• Example (cont.)
  – s² = [(1-5.2)² + (3-5.2)² + (3-5.2)² + (6-5.2)² + (8-5.2)² + (10-5.2)²] / (6-1) = 11.8 NTU²

Excel: Variance – VAR
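The variance formula can be checked numerically. The sketch below computes it once by hand and once with `statistics.variance`, which uses the same n - 1 divisor. Variable names are illustrative.

```python
from statistics import mean, variance

# Turbidity readings from the example, in NTU
turbidity_ntu = [1, 3, 3, 6, 8, 10]
n = len(turbidity_ntu)
xbar = mean(turbidity_ntu)

# Sum of squared deviations about the mean, divided by degrees of freedom (n - 1)
s2_by_hand = sum((x - xbar) ** 2 for x in turbidity_ntu) / (n - 1)

# Library sample variance (also divides by n - 1)
s2_library = variance(turbidity_ntu)

print(f"s^2 by hand = {s2_by_hand:.1f} NTU^2")
print(f"s^2 library = {s2_library:.1f} NTU^2")
```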


Error Bars

• Show data variability on plot of mean values
• Types of error bars include:
  • Max/min, ± Standard Deviation, ± Standard Error, ± 95% CI

[Figure: bar chart of mean turbidity with error bars. x-axis: Filter Type (Filter 1, Filter 2, Filter 3); y-axis: Turbidity (NTU), 0 to 10.]


Standard Deviation, s

• Square root of the variance
• If a phenomenon follows the Normal Distribution (bell curve), 95% of the population lies within 1.96 standard deviations of the mean
• Error bar is s above & below mean

[Figure: Normal Distribution curve. x-axis: Standard Deviations from Mean, -4 to 4; the central 95% of the area lies between -1.96 and 1.96.]

Excel: standard deviation – STDEV
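A small sketch of both points on this slide: the standard deviation of the earlier turbidity data, and a check that ±1.96 standard deviations encloses 95% of a Normal Distribution. It assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist, stdev

# Turbidity readings from the earlier example, in NTU
turbidity_ntu = [1, 3, 3, 6, 8, 10]

# Sample standard deviation: square root of the sample variance
s = stdev(turbidity_ntu)

# z value that leaves 2.5% in each tail of a standard Normal Distribution
z = NormalDist().inv_cdf(0.975)

# Fraction of the population within +/- z standard deviations of the mean
central_fraction = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(f"s = {s:.2f} NTU")
print(f"z = {z:.2f}, central fraction = {central_fraction:.2f}")
```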


Standard Error of Mean

• Also called St-Err or $s_{\bar{x}}$ (sxbar)
• For a sample of size n taken from a population with standard deviation estimated as s:
  $s_{\bar{x}} = s / \sqrt{n}$
• As n ↑, sxbar ↓, i.e., the estimate of the population mean improves
• Error bar is St-Err above & below mean: $\bar{x} \pm s_{\bar{x}}$
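To illustrate how the standard error shrinks with sample size, the sketch below computes s/√n for the earlier turbidity data, then holds s fixed and varies n. The larger sample sizes are hypothetical, chosen only to show the trend.

```python
from math import sqrt
from statistics import stdev

def standard_error(data):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(data) / sqrt(len(data))

# Turbidity readings from the earlier example, in NTU
turbidity_ntu = [1, 3, 3, 6, 8, 10]
se = standard_error(turbidity_ntu)
print(f"St-Err for n = {len(turbidity_ntu)}: {se:.2f} NTU")

# Hypothetical: same spread s, larger samples. Quadrupling n halves St-Err.
s = stdev(turbidity_ntu)
for n in (6, 24, 96):
    print(f"n = {n:3d}: St-Err = {s / sqrt(n):.2f} NTU")
```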


95% Confidence Interval (CI) for Mean

• A 95% Confidence Interval is expected to contain the population mean 95% of the time (i.e., of the 95% CIs from 100 samples, about 95 are expected to contain the population mean)
• t95%,n-1 is a statistic for the 95% CI from a sample of size n
  – t95%,n-1 = TINV(0.05,n-1)
  – If n ≥ 30, t95%,n-1 ≈ 1.96 (Normal Distribution)
• 95% CI: $\bar{x} \pm t_{95\%,n-1}\, s_{\bar{x}}$
• Error bar is $t_{95\%,n-1}\, s_{\bar{x}}$ above & below mean
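The following sketch computes a 95% CI half-width for the earlier six-point turbidity data. Python's standard library has no t distribution, so the critical values are hard-coded from a standard two-sided t table; this lookup table is an assumption of the sketch, standing in for Excel's TINV(0.05, n-1).

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Standard table values (assumed here in place of TINV(0.05, df)).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def ci95(data):
    """Return (mean, half-width) of the 95% CI for the mean of a small sample."""
    n = len(data)
    std_err = stdev(data) / sqrt(n)
    return mean(data), T95[n - 1] * std_err

xbar, half_width = ci95([1, 3, 3, 6, 8, 10])
print(f"95% CI: {xbar:.1f} +/- {half_width:.1f} NTU")
```

The table covers only samples of 2 to 6 points; larger samples would need more entries or a statistics library.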


Using Error Bars to Compare Data

• Standard Deviation
  – Demonstrates data variability, but no comparison possible
• Standard Error
  – If bars overlap, any difference in means is not statistically significant
  – If bars do not overlap, indicates nothing!
• 95% Confidence Interval
  – If bars overlap, indicates nothing!
  – If bars do not overlap, difference is statistically significant
• We’ll use 95% CI in this class
  – Any time you have 3 or more data points, determine mean, standard deviation, standard error, and t95%,n-1, then plot mean with error bars showing the 95% confidence interval
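The 95% CI rule can be written as a small overlap check. This sketch uses the rounded means and half-widths from the three-filter turbidity example (Example 1) in this deck. The function names are illustrative.

```python
def ci_bounds(mean, half_width):
    """Lower and upper ends of an error bar centred on the mean."""
    return mean - half_width, mean + half_width

def bars_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    (lo_a, hi_a), (lo_b, hi_b) = a, b
    return lo_a <= hi_b and lo_b <= hi_a

# Mean and 95% CI half-width (NTU), rounded as in Example 1
ci = {
    "Filter 1": ci_bounds(2.1, 0.14),
    "Filter 2": ci_bounds(4.2, 2.28),
    "Filter 3": ci_bounds(4.3, 0.38),
}

for a, b in [("Filter 1", "Filter 2"), ("Filter 1", "Filter 3"), ("Filter 2", "Filter 3")]:
    if bars_overlap(ci[a], ci[b]):
        print(f"{a} vs {b}: bars overlap, indicates nothing")
    else:
        print(f"{a} vs {b}: bars do not overlap, difference is statistically significant")
```

With these values only Filter 1 and Filter 3 have non-overlapping bars, so that is the only pair for which this rule supports a conclusion.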


Adding Error Bars to an Excel Graph

• Create Graph
  – Column, scatter,…
• Select Data Series
• In Layout Tab-Analysis Group, select Error Bars
• Select More Error Bar Options
• Select Custom and Specify Values and select cells containing the values $t_{95\%,n-1}\, s_{\bar{x}}$


Example 1: 95% CI

Turbidity Data +/- 95% CI (all values in NTU except n and t95%,2)

           1     2     3     mean   St Dev   n   St-Err   t95%,2   t95%,2 × St-Err
Filter 1   2.1   2.1   2.2   2.1    0.06     3   0.03     4.30     0.14
Filter 2   3.2   4.4   5     4.2    0.92     3   0.53     4.30     2.28
Filter 3   4.3   4.2   4.5   4.3    0.15     3   0.09     4.30     0.38

[Figure: bar chart of mean turbidity (2.1, 4.2, 4.3 NTU) with 95% CI error bars. x-axis: Portable Water Filter (Filter 1, Filter 2, Filter 3); y-axis: Turbidity (NTU), 0.0 to 7.0.]
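The columns of the Example 1 table can be reproduced from the three raw readings per filter. This is a minimal sketch that takes t95%,2 = 4.30 directly from the table rather than computing it. The dictionary and variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

T95_DF2 = 4.30  # t95%,2 as listed in the table, i.e. TINV(0.05, 2)

# Raw turbidity readings from the table, in NTU
readings_ntu = {
    "Filter 1": [2.1, 2.1, 2.2],
    "Filter 2": [3.2, 4.4, 5.0],
    "Filter 3": [4.3, 4.2, 4.5],
}

rows = {}
for name, data in readings_ntu.items():
    s = stdev(data)                 # St Dev
    std_err = s / sqrt(len(data))   # St-Err = s / sqrt(n)
    rows[name] = (mean(data), s, std_err, T95_DF2 * std_err)

print("filter     mean  StDev  StErr  95% CI half-width")
for name, (m, s, se, half) in rows.items():
    print(f"{name}   {m:.1f}   {s:.2f}   {se:.2f}   {half:.2f}")
```

The last column holds the values to enter as custom error bars in the graph.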


What can we do?

• Lift the weight multiple times using different solar panel combinations (or hydroturbines, or gear boxes) and plot the mean with 95% Confidence Interval error bars.
  – If error bars overlap between two different test conditions, indicates nothing!
  – If error bars do not overlap, difference is statistically significant


T Test

• A more sophisticated way to compare means
• Use t test to determine if means of two populations are different
  • E.g., lift times with different solar panel combinations or turbines or…


Comparing Two Data Sets using the t test

• Example - You lift the weight with two panels in series and with two panels in parallel.
  – Series: Mean = 2 min, s = 0.5 min, n = 20
  – Parallel: Mean = 3 min, s = 0.6 min, n = 20
• You ask the question - Do the different panel combinations result in different lift times?
  – Different in a statistically significant way


Are the Lift Times Different?

• Use TTEST (Excel)
• Returns the fractional probability of being wrong if you claim the two populations are different
  – We’ll say they are significantly different if the probability is ≤ 0.05

Lift times (min):

Series   Parallel
1.5      3
2        2.4
2.2      2.2
1.8      2.6
3        3.4
1.6      3.6
1.2      3.8
2.1      3.5
1.9      2.7
2.2      2.4
2.6      3.5
1.7      3.8
1.8      2.1
1.5      2.5
2.4      3.4
2.5      3.3
2.7      2.4
1.4      3.6
1.5      2.3
2.6      3.7
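For readers without Excel, here is a hedged sketch of the equal-variance (pooled) two-sample t test on the lift-time data. The standard library cannot compute the probability that TTEST returns, so the sketch instead compares |t| with the two-sided 5% critical value for 38 degrees of freedom, about 2.024 from a standard t table (an assumption of this sketch). Function and variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Lift times in minutes, from the slide
series = [1.5, 2, 2.2, 1.8, 3, 1.6, 1.2, 2.1, 1.9, 2.2,
          2.6, 1.7, 1.8, 1.5, 2.4, 2.5, 2.7, 1.4, 1.5, 2.6]
parallel = [3, 2.4, 2.2, 2.6, 3.4, 3.6, 3.8, 3.5, 2.7, 2.4,
            3.5, 3.8, 2.1, 2.5, 3.4, 3.3, 2.4, 3.6, 2.3, 3.7]

def pooled_t(a, b):
    """t statistic and degrees of freedom, assuming equal population variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

t_stat, df = pooled_t(series, parallel)

# Two-sided 5% critical value for df = 38 (standard table value, assumed)
T_CRIT_DF38 = 2.024
significant = abs(t_stat) > T_CRIT_DF38

print(f"t = {t_stat:.2f}, df = {df}, significant at 0.05: {significant}")
```

Exceeding the critical value corresponds to a TTEST probability below 0.05, which is the criterion used on the slide.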


Marbles


Linear Regression

• Fit the best straight line to a data set

[Figure: scatter plot with fitted trendline. x-axis: Height (m), 0 to 12; y-axis: Grade Point Average, 0 to 25. Trendline: y = 1.897x + 0.8667, R² = 0.9762.]

Right-click on a data point and select “trendline”. Select options to show the equation and R².
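The slide does not list the plotted data, so the sketch below fits a line to a small hypothetical data set. It assumes "best" means ordinary least squares, which is what Excel's linear trendline computes. The function name and data are illustrative.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept

# Hypothetical data, not taken from the slide's plot
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```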


R2 - Coefficient of multiple Determination

• $R^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 / \sum_{i=1}^{n} (y_i - \bar{y})^2$
  – $\hat{y}_i$ = predicted y values, from regression equation
  – $y_i$ = observed y values
  – $\bar{y}$ = mean of y (ybar)
• R² = fraction of variance explained by regression
  – R² = 1 if data lies along a straight line
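A minimal sketch of the R² formula: the sum of squared deviations of the predicted values about the mean, divided by the same sum for the observed values. The data are hypothetical. The slope and intercept passed in are the least-squares values for that data (0.6 and 2.2), and the formula is meant to be used with a least-squares fit.

```python
from statistics import mean

def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the line slope * x + intercept."""
    ybar = mean(ys)
    predicted = [slope * x + intercept for x in xs]
    explained = sum((y_hat - ybar) ** 2 for y_hat in predicted)
    total = sum((y - ybar) ** 2 for y in ys)
    return explained / total

# Hypothetical scattered data with its least-squares line y = 0.6x + 2.2
r2_scattered = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 0.6, 2.2)

# Hypothetical data lying exactly on the line y = 2x + 1
r2_perfect = r_squared([0, 1, 2], [1, 3, 5], 2, 1)

print(f"scattered data: R^2 = {r2_scattered:.2f}")
print(f"perfect line:   R^2 = {r2_perfect:.2f}")
```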