N318b Winter 2002 Nursing Statistics
Normal distribution, Z-scores, Central Limit Theorem,
Probability
Lecture 4
Nur 318b 2002 Lecture 4: page 2
School of Nursing
Institute for Work & Health
Today’s Class
- Normal distribution
- Z-scores
- Central limit theorem
<< 10 min break >>
- Probability
- Applying knowledge to assigned readings (Wolfe et al., 1996)
No work group today!
A Quick Review from Last Week
Data presentation
- Bar graphs, pie charts
- Histograms, polygons (lines)
- Box plots
Measures of distribution shape
- Skew (asymmetry)
- Kurtosis (peakedness)
Normal Distribution
“What is all the fuss about?!”
- Statistics is a branch of applied math
- Most statistical tests are based on a set of basic assumptions about data
- Most assumptions refer to distribution
- If the assumptions are not true, the tests are not valid!
Review: How do you check normality of data?
The (Standard) Normal Curve
- a hypothetical distribution that forms the basis of statistical theory (also called the Gaussian curve)
(See Figure 3.1 in textbook, page 64)
Why use the normal curve?
- Many variables are normally distributed
- Many tests require a normal distribution
- Allows for tests of inference since study results can be compared against it (i.e. it is a probability or “chance” distribution)
“Understanding the normal curve prepares you for understanding the concept of hypothesis testing” (Textbook page 64)
Where did the normal curve come from?
- There is an elegant mathematical formula (theory) underlying the distribution (you don’t need to know it!)
- Discovered in the 1700s by de Moivre, developed later by Gauss (1800s), and then used by Galton (medicine)
- Another example of mathematical theory helping to explain observed phenomena
What is the normal curve used for?
Test if your observed value (e.g. BP) is different from expected value (i.e. can use standardized or Z-scores to check this)
Estimate precision of observed study mean (i.e. confidence intervals)
Tests based on probability (likelihood) that observed results “fit” normal curve
What are the properties of the normal curve?
- X-axis measured in SDs (from mean)
- Y-axis is frequency (units or counts)
- Mean, median, mode all same
- Symmetrical (“bell-shaped”) around mean
- +/- 1 SD includes about 68% of population
- +/- 2 SDs includes about 95% of population
- “tails” hold very small % of population
(REMEMBER: total area under curve = 100% or 1.0)
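The coverage figures for the normal curve can be checked numerically. The sketch below is not from the lecture; it uses the standard identity that the area within k SDs of the mean equals erf(k / √2), and the function name `area_within` is made up for illustration.

```python
import math


def area_within(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


# Print the coverage for 1, 2 and 3 SDs either side of the mean
for k in (1, 2, 3):
    print(f"+/- {k} SD covers {100 * area_within(k):.2f}% of the population")
```

Running this gives roughly 68% for 1 SD and roughly 95% for 2 SDs, leaving only a small share in the tails.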
Standard normal curve
[Figure: bell curve centred on the mean. +/- 1 SD either side of the mean includes about 68% of the sample; +/- 2 SD includes about 95% of the sample.]
Z-scores
If a variable is normally distributed then observed (mean) values can be converted to a z-score
WHY?
Test if your study mean (e.g. BP) is different from expected value
Z-score just another name for SD “distance” from the population mean
HOW?
Z-scores – an example
A population has a mean systolic BP of 110 mmHg and an SD of 15 mmHg
What proportion (%) of people have BP between 95 and 120?
$$Z = \frac{X - \bar{X}}{SD}$$
where $\bar{X}$ = sample mean and SD = sample SD
Z-scores – an example
$$Z_1 = \frac{X_1 - \bar{X}}{SD} = \frac{95 - 110}{15} = -1.0$$
$$Z_2 = \frac{X_2 - \bar{X}}{SD} = \frac{120 - 110}{15} = 0.67$$
Z-scores – an example
Now need to extract % values from the Z-scores using a table (e.g. Appendix A, pp. 417-8 of textbook)
Negative values are % areas to the left of the mean, positive values are to the right of the mean
From Table in Appendix A:
Z1 = -1.0 = 34.13% (between 95 and 110)
Z2 = 0.67 = 24.86% (between 110 and 120)
Total area = 34.13 + 24.86 = 58.99%
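The table lookup can be cross-checked in software. This sketch is not part of the lecture; it assumes Python 3.8+ for `statistics.NormalDist`, and `proportion_between` is a hypothetical helper name. It computes the area between two BP values directly from the cumulative normal distribution.

```python
from statistics import NormalDist

# Population from the example: mean systolic BP 110 mmHg, SD 15 mmHg
bp = NormalDist(mu=110, sigma=15)


def proportion_between(dist: NormalDist, low: float, high: float) -> float:
    """Area under the normal curve between two raw values."""
    return dist.cdf(high) - dist.cdf(low)


p = proportion_between(bp, 95, 120)
print(f"Proportion with BP between 95 and 120: {100 * p:.1f}%")
```

The result is about 58.9%, marginally below the table-based 58.99% because the table method rounds Z2 to 0.67 before the lookup.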
Z-scores – example 2
What proportion (%) of people have a systolic BP above 140?
$$Z = \frac{X - \bar{X}}{SD} = \frac{140 - 110}{15} = 2.0$$
From Table in Appendix A: Z = 2.0 = 47.72%
But this represents what? The area between 110 and 140
> 140 = 50 – 47.72 = 2.28%
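The "50 minus table area" step can be sketched as follows. This is an illustrative check rather than the lecture's method: it assumes Python 3.8+ and reproduces both the table-style area (mean to Z) and the upper tail from the standard normal curve.

```python
from statistics import NormalDist

# Z-score for a systolic BP of 140 when the mean is 110 and the SD is 15
z = (140 - 110) / 15

standard_normal = NormalDist()  # mean 0, SD 1

# Table-style area: from the mean up to Z
area_mean_to_z = standard_normal.cdf(z) - 0.5

# Upper tail: everything beyond Z
upper_tail = 0.5 - area_mean_to_z

print(f"Z = {z:.1f}")
print(f"Area between 110 and 140: {100 * area_mean_to_z:.2f}%")
print(f"Area above 140: {100 * upper_tail:.2f}%")
```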
Central Limit Theorem - What is it?
When large enough samples (e.g. n >= 25) are drawn from a population with a known variance, the sample means will be approximately normally distributed
The theorem holds even if the underlying distribution is moderately non-normal (e.g. a bit skewed)
i.e. if you plot the sample means ($\bar{X}$’s) you get a bell curve
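A small simulation can make the theorem concrete. The sketch below is an illustration, not lecture material: the exponential population (mean 1, SD 1), the seed, the number of draws and the name `sample_means` are all arbitrary assumptions chosen to give a clearly right-skewed starting distribution.

```python
import random
import statistics

random.seed(318)  # arbitrary seed so the run is repeatable


def sample_means(n: int, draws: int = 5000) -> list:
    """Means of `draws` samples of size n from a skewed (exponential) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(draws)
    ]


means = sample_means(n=25)

# The sample means cluster around the population mean (1.0) ...
print(f"Mean of sample means: {statistics.fmean(means):.3f}")
# ... with a spread close to SD / sqrt(n) = 1 / 5 = 0.2
print(f"SD of sample means:   {statistics.stdev(means):.3f}")
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the individual observations are strongly skewed.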
Central Limit Theorem – What is its importance?
- Now have ability to statistically test the likelihood of observed (sample) mean
- Variation (“dispersion”) about true mean is called “Standard error” (SE) of mean
- SE (of mean) and SD (of sample) are directly related mathematically
$$SE = \frac{SD}{\sqrt{n}}$$
(where n = sample size)
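As a quick worked illustration of the SE formula, the sketch below uses the SD of 15 mmHg from the blood pressure example; the function name `standard_error` and the choice of sample sizes are assumptions made for the example.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


# The SE shrinks as the sample size grows
for n in (10, 25, 100):
    print(f"n = {n:>3}: SE = {standard_error(15, n):.2f} mmHg")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.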
Z-scores – for means
How likely is it (i.e. what %) that a sample of size n = 100 will have mean systolic BP > 113 (assuming $\mu$ = 110 and $\sigma$ = 15)?
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{113 - 110}{15 / 10} = 2.0$$
From Z-score Table in Appendix A: Z = 2.0 = 47.72% of area to right of $\mu$
But once again this represents what? Sample means between 110 and 113 mmHg
$\bar{X}$ > 113 = 50 – 47.72 = 2.28%
- Want Z-scores >= about 2!
Effect of sample size on mean?
What happens if sample size drops to 10 (i.e. n = 10, $\bar{X}$ > 113 and $\mu$ = 110, $\sigma$ = 15)?
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{113 - 110}{15 / 3.16} = 0.63$$
From Table in Appendix A: Z = 0.63 = 23.57%
But once again this represents what? Sample means between 110 and 113 mmHg
For $\bar{X}$ > 113 = 50 – 23.57 = 26.43%
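The effect of sample size can be compared side by side in code. This is a sketch under the example's stated assumptions (population mean 110, SD 15, threshold 113), not the lecture's own material; `prob_sample_mean_above` is a hypothetical helper and `statistics.NormalDist` needs Python 3.8+.

```python
import math
from statistics import NormalDist


def prob_sample_mean_above(threshold, mu, sigma, n):
    """Return (Z, probability) that a sample mean of size n exceeds threshold."""
    standard_error = sigma / math.sqrt(n)
    z = (threshold - mu) / standard_error
    return z, 1 - NormalDist().cdf(z)


z100, p100 = prob_sample_mean_above(113, mu=110, sigma=15, n=100)
z10, p10 = prob_sample_mean_above(113, mu=110, sigma=15, n=10)

print(f"n = 100: Z = {z100:.2f}, P(mean > 113) = {100 * p100:.1f}%")
print(f"n =  10: Z = {z10:.2f}, P(mean > 113) = {100 * p10:.1f}%")
```

With n = 100 a sample mean above 113 is unusual (a little over 2%), whereas with n = 10 it happens in roughly a quarter of samples, so the same observed mean is far weaker evidence in a small sample.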
10 minute break !
Probability
Think of it as a statistical measure of chance
A proportion (e.g. %) that lets you make intelligent guesses about future events
Often expressed as a “p-value”
p-value “rules” in (quantitative) research
P(event) = number of events / number of subjects
(Often expressed as % when multiplied by 100)
Probability – cont’d
You read a well-done clinical trial that followed 1000 women with breast cancer (BC), 200 of whom died from BC within 5 yrs
You then see a woman with BC on the ward and she asks you if she is going to live – what do you tell her?
She has a 20% probability, or a 1 in 5 chance, of dying from BC within 5 yrs
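The arithmetic behind the 20% figure is simply events divided by subjects. The sketch below uses the trial numbers quoted above; the function name `probability` is made up for illustration, and `Fraction` is used only to show the "1 in 5" form.

```python
from fractions import Fraction


def probability(events: int, subjects: int) -> Fraction:
    """Probability of an event as (number of events) / (number of subjects)."""
    return Fraction(events, subjects)


p_death = probability(200, 1000)

print(f"As a fraction: {p_death.numerator} in {p_death.denominator}")
print(f"As a percentage: {float(p_death) * 100:.0f}%")
```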
Probability – cont’d
What if she then tells you she is node negative and the tumour was small?
Then she tells you her mother and sister both died from BC by age 45
Probability is a way of quantifying risk or likelihood of events occurring (usually according to a set of criteria)
Probability – Facts
Probabilities always between 0 and 1
(0 = min value = no chance)
(1 = max value = definite event)
P-value = probability of getting a result at least as extreme as the one observed by chance alone
Significance cut-off arbitrarily “set” at p <= 0.05 in most cases, but it can vary from 0.2 to < 0.01
P-value refers to the area in the “tails” of the normal curve distribution (lower = less likely to be chance!)
Probability – Rules
Conditional Probabilities
- probability of event A given event B
Multiplication Rule (Independence!)
- probability of A and B = P(A) x P(B)
Addition Rule (Mutually exclusive!)
- probability of A or B = P(A) + P(B)
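The three rules can be tried out on a simple case. The example below is hypothetical and not from the lecture: it uses a fair six-sided die, with exact fractions so the results are easy to verify by hand. The conditional probability uses the standard definition P(A given B) = P(A and B) / P(B).

```python
from fractions import Fraction

# Hypothetical setting: a fair six-sided die
p_one = Fraction(1, 6)   # P(roll a 1)
p_two = Fraction(1, 6)   # P(roll a 2)
p_even = Fraction(3, 6)  # P(roll a 2, 4 or 6)

# Addition rule: "1 or 2" on a single roll (mutually exclusive outcomes)
p_one_or_two = p_one + p_two

# Multiplication rule: "1 on the first roll and 2 on the second" (independent rolls)
p_one_then_two = p_one * p_two

# Conditional probability: P(roll is 2 given roll is even)
# "2 and even" is the same outcome as "2", so P(2 and even) = P(2)
p_two_given_even = p_two / p_even

print(f"P(1 or 2)       = {p_one_or_two}")
print(f"P(1 then 2)     = {p_one_then_two}")
print(f"P(2 given even) = {p_two_given_even}")
```

Note that the addition rule in this simple form only works when the events cannot occur together, and the multiplication rule only when one event does not affect the other.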
Part 2: Application to the Assigned Reading
Wolfe et al. (1996)
Quick summary of the paper:
- an etiologic study aimed at exploring possible causal pathways between back pain and osteoarthritis of the knee
- a 3-year consecutive series of 368 knee OA patients via a rheumatology clinic
- cross-sectional questionnaire assessment of key study variables (possible bias?)
Wolfe et al. (1996)
Typical example of a sophisticated multistage exploratory analysis
- Descriptive analysis
- Exploratory univariate analysis
- Causal pathway multivariate analysis
Some questions …
What does Figure 1 tell us?
Why did they group BMI in quartiles?
Some questions …
Do you understand the major features of the data in Table 1?
What do all the columns mean? e.g. “unadjusted” vs. “adjusted”
Odds ratios and confidence intervals are studied later (CIs in the next lecture!)
Next Week - Lecture 5: Inference testing, Type I and Type II errors, p-values, and Confidence Intervals
For next week’s class please review:
1. Page 14 in syllabus
2. Textbook Chapter 3, pages 80-91
3. Syllabus papers:
   i) Birenbaum et al. (1996)
   ii) Gulick (1995)
Research Practicum
Can those who signed up please stay for a few extra minutes to decide placements?
Do those who signed up last term and did NOT get placed want to be put back in the “pool” to be placed?