martin-barnett
1
DATASET INTRODUCTION
2
Dataset: Urine
From Cleveland Clinic 1981-1984
3
Outcome Variable:
Categorical Variable: Calcium Oxalate Crystal Presence
In this analysis, this variable will be our
• Outcome Variable
• Response Variable
• Dependent Variable
Note: The dataset is coded directly as Yes/No (not 0/1 coding).
4
Other Variables (Covariates)
Quantitative Variables
Specific Gravity
pH
Osmolarity
Conductivity
Urea Concentration (millimoles/liter)
Calcium Concentration (millimoles/liter)
Cholesterol: serum cholesterol levels
5
Discussion/Review
Purpose of dataset: Determine which of the covariates are related to the outcome.
Covariates can also be called
• Independent Variables
• Predictors
• Explanatory Variables
Outcomes and covariates can be categorical or quantitative.
A given study can have more than one outcome and many covariates, with any mixture of variable types.
6
Calcium Concentration (millimoles/liter) by Calcium Oxalate Crystal Presence
Crystal Presence   N    Mean   Std Dev   Min    Q1     Median   Q3     Max
No                 42   2.69   1.90      0.17   1.22   2.16     3.93   8.48
Yes                31   5.92   3.59      0.27   3.10   6.19     7.82   14.34
7
Discussion
In this sample, those with calcium oxalate crystals present tend to have higher calcium concentrations (mean 5.92 vs. 2.69 millimoles/liter; median 6.19 vs. 2.16).
Later we will learn to conduct hypothesis tests in such situations.
For now, we use these data to illustrate concepts of probability.
8
Comments
To facilitate our discussion of probability and classification tests, we will categorize the quantitative variable Calcium Concentration into four groups:
1 = 0-1.99
2 = 2-4.99
3 = 5-7.99
4 = 8 or More
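A minimal Python sketch of this grouping, assuming concentrations are recorded to two decimals so that the half-open intervals [0, 2), [2, 5), [5, 8), and [8, ∞) match the labels above. The function name `calcium_group` is hypothetical, not part of the original analysis.

```python
# Hypothetical helper: map a calcium concentration (millimoles/liter)
# to the four groups defined above.
def calcium_group(calc: float) -> int:
    """Return 1 for 0-1.99, 2 for 2-4.99, 3 for 5-7.99, 4 for 8 or more."""
    if calc < 0:
        raise ValueError("concentration cannot be negative")
    if calc < 2:
        return 1
    if calc < 5:
        return 2
    if calc < 8:
        return 3
    return 4


# Example values taken from the summary table (a minimum, medians, a maximum).
for value in (0.17, 2.16, 6.19, 14.34):
    print(value, "->", calcium_group(value))
```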
9
BASIC PROBABILITY
Part 1 (Unconditional Probability Using Logic)
10
Back to the Urine Dataset
Suppose one individual is selected from our sample, and consider the following questions:
• What is the probability that the individual has calcium oxalate crystals present?
• What is the probability that the individual has a calcium concentration of 5 or more?
• What is the probability the individual has calcium oxalate crystals present AND has a calcium concentration of 5 or more?
• What is the probability the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?
11
Comments
All four of these probability questions relate to the ENTIRE SAMPLE.
We begin by answering the questions logically from the table we created using software.
12
Let’s Practice!
Basic Probability of an Event
• What is the probability that the individual has calcium oxalate crystals present? We will denote this event by A.
• P(A) = 31/73 ≈ 0.425 = PREVALENCE of calcium oxalate crystals in our sample
Table of group by r (group = Calcium Concentration Group; r = Calcium Oxalate Crystal Presence)
Frequency    No   Yes   Total
0-1.99       19     4      23
2-4.99       17     9      26
5-7.99        5    11      16
8 or More     1     7       8
Total        42    31      73
13
Let’s Practice!
Basic Probability of an Event
• What is the probability that the individual has a calcium concentration of 5 or more? We will denote this event by B.
14
Let’s Practice!
Basic Probability of an Event: Intersections
• What is the probability the individual has calcium oxalate crystals present AND has a calcium concentration of 5 or more?
15
Let’s Practice!
Basic Probability of an Event: Unions
• What is the probability the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?
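The four questions above can be answered by counting cells of the frequency table. This Python sketch does that with exact fractions; the dictionary `counts` and the set `high` are illustrative names, and the counts are copied from the table.

```python
from fractions import Fraction

# Counts from the table: group -> (No, Yes) for calcium oxalate crystals.
counts = {
    "0-1.99": (19, 4),
    "2-4.99": (17, 9),
    "5-7.99": (5, 11),
    "8 or More": (1, 7),
}
high = {"5-7.99", "8 or More"}  # event B: calcium concentration of 5 or more

total = sum(no + yes for no, yes in counts.values())  # 73 subjects
n_a = sum(yes for no, yes in counts.values())  # event A: crystals present
n_b = sum(no + yes for g, (no, yes) in counts.items() if g in high)
n_a_and_b = sum(yes for g, (no, yes) in counts.items() if g in high)
# A or B: everyone with crystals, plus those without crystals in the high groups
n_a_or_b = n_a + sum(no for g, (no, yes) in counts.items() if g in high)

p_a = Fraction(n_a, total)
p_b = Fraction(n_b, total)
p_a_and_b = Fraction(n_a_and_b, total)
p_a_or_b = Fraction(n_a_or_b, total)

for name, p in [("P(A)", p_a), ("P(B)", p_b),
                ("P(A and B)", p_a_and_b), ("P(A or B)", p_a_or_b)]:
    print(f"{name} = {p.numerator}/{p.denominator} = {float(p):.3f}")
```

Exact fractions keep the results free of rounding; the decimals printed are 0.425, 0.329, 0.247, and 0.507.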
16
USING PROBABILITY RULES
Part 1
17
Probability Rules
Rules are created and used for many reasons.
The rules and properties stated previously are important and useful in probability and sometimes in statistics.
They are not always needed:
• If you can determine the answer through logic alone, you may not need a rule!
• If you are provided only pieces of the puzzle, a rule is sometimes faster than logic!
18
Continuing
We now illustrate a few formulas using the questions we have already answered using logic.
19
Let’s Practice Again!
Complement Rule
• What is the probability that the individual DOES NOT have calcium oxalate crystals present?
• We could use logic and count the No's instead of the Yes's; however, knowing P(Yes) = P(A), the complement rule gives P(not A) = 1 - P(A) = 1 - 31/73 = 42/73 ≈ 0.575
20
Let’s Practice Again!
Addition Rule (Unions)
• What is the probability the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?
21
Let’s Practice Again!
Addition Rule (Unions)
• What is the probability the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?
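A short Python sketch of the complement and addition rules, starting only from the three probabilities already found by counting (31/73, 24/73, and 18/73). Variable names are illustrative.

```python
from fractions import Fraction

# Probabilities found earlier by counting (denominator = 73 subjects).
p_a = Fraction(31, 73)        # crystals present
p_b = Fraction(24, 73)        # calcium concentration of 5 or more
p_a_and_b = Fraction(18, 73)  # both

# Complement rule: P(not A) = 1 - P(A)
p_not_a = 1 - p_a
# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b

print(f"P(not A)  = {p_not_a} = {float(p_not_a):.3f}")
print(f"P(A or B) = {p_a_or_b} = {float(p_a_or_b):.3f}")

# With rounded decimals, the same rule gives 0.425 + 0.329 - 0.247
print(round(0.425 + 0.329 - 0.247, 3))
```

The addition rule subtracts P(A and B) because those 18 subjects are counted once in P(A) and again in P(B); the result matches the 37/73 obtained by direct counting.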
22
INDEPENDENCE
Part 1
23
Independent Events
Two events are independent if knowing that one event occurs does not change the probability of the other.
This is not the same as "disjoint" events, which are separate in that they cannot occur together.
These are two entirely different concepts.
Independence is a statement that the probability of one event is the same whether or not the other event occurs (or is occurring, or has occurred).
24
Let’s Practice!
Investigating Independence Part 1
We know the following from our sample
P(A) ≈ 0.425, P(B) ≈ 0.329, P(A and B) ≈ 0.247
Is P(A and B) = P(A) × P(B)?
25
Let’s Practice!
Investigating Independence Part 1
From our sample we have: P(A) × P(B) ≈ 0.425 × 0.329 ≈ 0.140
This is clearly not equal to P(A and B) ≈ 0.247!
In our sample the events are dependent (we can test this hypothesis about the population later).
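The independence check can be sketched in Python with exact fractions, so that the comparison is not affected by rounding. The inputs are the sample probabilities found by counting.

```python
from fractions import Fraction

p_a = Fraction(31, 73)        # P(A): crystals present
p_b = Fraction(24, 73)        # P(B): calcium concentration of 5 or more
p_a_and_b = Fraction(18, 73)  # P(A and B)

product = p_a * p_b
print(f"P(A) x P(B) = {float(product):.3f}")
print(f"P(A and B)  = {float(p_a_and_b):.3f}")
# Independence (in the sample) requires exact equality of these two values.
print("independent in sample?", product == p_a_and_b)
```

This only describes the sample; whether the population events are independent is the hypothesis-testing question deferred above.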
26
BASIC PROBABILITY
Part 2: Conditional Probability (Logic & Formula)
27
Conditional Probability
So far, we have divided by the TOTAL.
Sometimes, however, we have additional CONDITIONS that cause us to alter the denominator (bottom) of our probability calculation.
Suppose, when choosing one person from the Urine data, we ask:
• Given the individual has Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
“Conditional” refers to the fact that we have these additional conditions, restrictions, or other information
28
Let’s Practice!
CONDITIONAL Probability of an Event
• Given the individual has Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
29
Let’s Practice!
CONDITIONAL Probability FORMULA
• Given the individual has Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
30
Let’s Practice!
CONDITIONAL Probability of an Event
• Given the individual DOES NOT HAVE Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
31
MORE PRACTICE
Conditional Probability
32
Let’s Verify!
CONDITIONAL Probability of an Event
• Given the individual has a calcium concentration of 5 or above, what is the probability the individual has calcium oxalate crystals?
• We have a small amount of rounding error this time
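The conditional probabilities in these slides can be sketched in Python by collapsing the table into a 2x2 split at a concentration of 5. The sketch compares the "restrict the denominator" logic with the formula P(B | A) = P(A and B) / P(A), and shows where rounding error enters when the formula is applied to rounded decimals. All variable names are illustrative.

```python
from fractions import Fraction

# Collapse the table into a 2x2: B = calcium concentration of 5 or more
high_no, high_yes = 5 + 1, 11 + 7    # B
low_no, low_yes = 19 + 17, 4 + 9     # not B
total = high_no + high_yes + low_no + low_yes  # 73 subjects
n_yes = high_yes + low_yes           # A: crystals present (31)
n_no = high_no + low_no              # not A (42)

# Logic: restrict the denominator to the group we are conditioning on
p_b_given_a = Fraction(high_yes, n_yes)               # 18/31
p_b_given_not_a = Fraction(high_no, n_no)             # 6/42 = 1/7
p_a_given_b = Fraction(high_yes, high_no + high_yes)  # 18/24 = 3/4

# Formula: P(B | A) = P(A and B) / P(A); exact fractions agree with the logic
assert Fraction(high_yes, total) / Fraction(n_yes, total) == p_b_given_a

# Using rounded decimals in the formula adds a little error: 0.247 / 0.329
rounded = round(0.247 / 0.329, 3)

print(f"P(B | A)     = {float(p_b_given_a):.3f}")
print(f"P(B | not A) = {float(p_b_given_not_a):.3f}")
print(f"P(A | B)     = {float(p_a_given_b):.3f} (formula with rounded inputs: {rounded})")
```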
33
INDEPENDENCE
Part 2
34
Let’s Practice!
Investigating Independence Part 2
We know the following from our sample
Is P(B | A) = P(B | not A)? In our sample, P(B | A) = 18/31 ≈ 0.581 and P(B | not A) = 6/42 ≈ 0.143.
35
Comments
Investigating Independence Part 2
These probabilities are clearly unequal in our sample; our eventual question is whether this is also true in the population.
In this sample, these events are dependent.
From our analysis so far, it seems likely that they are also dependent in the population (we can test this later).
Knowing whether or not the person has calcium oxalate crystals present CHANGES the probability of having a calcium concentration of 5 or above!!
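A Python sketch of this second independence check: if A and B were independent, P(B | A), P(B | not A), and P(B) would all be equal. The fractions are the sample values from the frequency table.

```python
from fractions import Fraction

p_b = Fraction(24, 73)             # P(B): 5 or more, whole sample
p_b_given_a = Fraction(18, 31)     # among those with crystals
p_b_given_not_a = Fraction(6, 42)  # among those without crystals

# Under independence all three would be equal; here they clearly differ.
for label, p in [("P(B)", p_b), ("P(B | A)", p_b_given_a),
                 ("P(B | not A)", p_b_given_not_a)]:
    print(f"{label:<13} = {float(p):.3f}")
print("equal?", p_b_given_a == p_b_given_not_a)
```

The overall P(B) sits between the two conditional probabilities, as it must, since it is a weighted average of them.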
36
GENERAL MULTIPLICATION RULE
37
General Multiplication Rule
P(A and B) = P(A) × P(B | A) = P(B) × P(A | B)
This formula comes from rearranging the definition of conditional probability, P(B | A) = P(A and B) / P(A).
To obtain the second form on the right, start instead from P(A | B) = P(A and B) / P(B) and note that the numerator is unchanged.
38
General Multiplication Rule
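A quick Python check, using the sample values, that both forms of the general multiplication rule recover the same joint probability P(A and B) = 18/73.

```python
from fractions import Fraction

p_a, p_b = Fraction(31, 73), Fraction(24, 73)
p_b_given_a = Fraction(18, 31)
p_a_given_b = Fraction(18, 24)

# P(A and B) = P(A) * P(B | A) = P(B) * P(A | B)
form1 = p_a * p_b_given_a
form2 = p_b * p_a_given_b
print(form1, form2)
```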
39
REPEATED SAMPLING
40
Repeated Sampling
Often we consider problems in which we draw multiple individuals from a set of individuals:
• Drawing parts from a box where some are defective
• Choosing multiple people from a certain population
The formulas we have investigated can be used to calculate probabilities in these situations
41
Let’s Practice!
If we select two subjects at random from our sample, what is the probability that both have a calcium concentration of 8 or more?
42
WANT TO LEARN MORE?
READ THE FOLLOWING OPTIONAL MATERIAL
The remaining slides are optional. They illustrate some more difficult probability rules along with additional examples of probability related to the health sciences.
43
Optional Content: Read About
Relative Risk
Total Probability Rule
Bayes’ Rule
Screening Tests
• Sensitivity/Specificity
• PV+/PV-
• False Positive and False Negative Rates
ROC Curves
44
Relative Risk
Relative risk is
• the risk of an “event” relative to an “exposure”
• the ratio of the probability of the event occurring among “exposed” versus “non-exposed”
• If A and B are independent, the relative risk is 1
RR = P(B | A) / P(B | not A)
In our rule, B is the EVENT and A is the EXPOSURE.
45
Let’s Practice!
Find the Relative Risk of High Calcium Concentration Given Calcium Oxalate Crystal Presence
• Note: this is the reverse of what we would probably want in this case; computing it the other way around is good extra practice!
• INTERPRET RR: Having a calcium concentration of 5 or more is around 4 times as likely among those with calcium oxalate crystals as among those without.
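A Python sketch of the relative risk calculation, taking B (calcium concentration of 5 or more) as the event and A (crystals present) as the exposure, with the conditional probabilities read from the frequency table.

```python
from fractions import Fraction

# Event B: calcium concentration of 5 or more; exposure A: crystals present
p_b_given_a = Fraction(18, 31)     # risk among the "exposed"
p_b_given_not_a = Fraction(6, 42)  # risk among the "non-exposed"

rr = p_b_given_a / p_b_given_not_a
print(f"RR = {float(rr):.2f}")
```

The printed value, about 4.06, is the basis for the "around 4 times as likely" interpretation.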
46
Total Probability Rule
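The slide's formula is not reproduced in this text; the sketch below assumes the standard two-part form P(B) = P(B | A) P(A) + P(B | not A) P(not A) and checks it against the directly counted P(B) = 24/73.

```python
from fractions import Fraction

p_a = Fraction(31, 73)             # P(A): crystals present
p_not_a = 1 - p_a                  # P(not A)
p_b_given_a = Fraction(18, 31)     # P(B | A)
p_b_given_not_a = Fraction(6, 42)  # P(B | not A)

# Total probability: weight each conditional by the probability of its condition
p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
print(p_b, round(float(p_b), 3))
```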
47
Bayes’ Rule
We want to find P(A|B), so we will need to “rearrange” the formula, swapping the A’s and B’s.
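A Python sketch of Bayes' rule in its common form, P(A | B) = P(B | A) P(A) / [P(B | A) P(A) + P(B | not A) P(not A)], where the denominator is the total probability of B. This assumes the slide uses that standard form; the result should agree with the direct count 18/24.

```python
from fractions import Fraction

p_a = Fraction(31, 73)
p_not_a = 1 - p_a
p_b_given_a = Fraction(18, 31)
p_b_given_not_a = Fraction(6, 42)

numerator = p_b_given_a * p_a
denominator = p_b_given_a * p_a + p_b_given_not_a * p_not_a  # equals P(B)
p_a_given_b = numerator / denominator
print(p_a_given_b, float(p_a_given_b))
```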
48
Bayes’ Rule
49
Let’s Verify!
CONDITIONAL Probability of an Event
• Given the individual has a calcium concentration of 5 or above, what is the probability the individual has calcium oxalate crystals?
• We have a small amount of rounding error this time
50
SCREENING TESTS and ROC Curves
51
Screening Tests
52
Sensitivity & Specificity
“Epi” Style
                 Has Condition   Does Not Have Condition   Total
Test Positive    A (TP)          B (FP)                    A+B (Total Positive Test)
Test Negative    C (FN)          D (TN)                    C+D (Total Negative Test)
Total            A+C             B+D
(A+C = Number with Condition; B+D = Number without Condition)
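A Python sketch of the usual screening-test summaries computed from the A, B, C, D cells of the layout above. It follows this deck's convention that the false positive rate is 1 - specificity and the false negative rate is 1 - sensitivity (some texts define these differently). The `ScreeningStats` type and `screening_stats` function are hypothetical names.

```python
from fractions import Fraction
from typing import NamedTuple


class ScreeningStats(NamedTuple):
    sensitivity: Fraction
    specificity: Fraction
    ppv: Fraction                  # PV+
    npv: Fraction                  # PV-
    false_positive_rate: Fraction
    false_negative_rate: Fraction


def screening_stats(a: int, b: int, c: int, d: int) -> ScreeningStats:
    """a = TP, b = FP, c = FN, d = TN, as in the "Epi" style layout."""
    sens = Fraction(a, a + c)  # P(Test+ | Condition)
    spec = Fraction(d, b + d)  # P(Test- | No condition)
    return ScreeningStats(
        sensitivity=sens,
        specificity=spec,
        ppv=Fraction(a, a + b),  # P(Condition | Test+)
        npv=Fraction(d, c + d),  # P(No condition | Test-)
        false_positive_rate=1 - spec,
        false_negative_rate=1 - sens,
    )


# Usage: cut-point "5 or more" from the urine data (TP=18, FP=6, FN=13, TN=36)
stats = screening_stats(a=18, b=6, c=13, d=36)
print({k: round(float(v), 3) for k, v in stats._asdict().items()})
```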
53
Sensitivity & Specificity
                        Has Condition   Does Not Have Condition
NEGATIVE (0-1.99)              4                 19
POSITIVE (2 or more)          27                 23
Total                         31                 42
Table of group by r (group = Calcium Concentration Group; r = Calcium Oxalate Crystal Presence)
Frequency    Yes   No   Total
0-1.99         4   19      23
2-4.99         9   17      26
5-7.99        11    5      16
8 or More      7    1       8
Total         31   42      73
54
Sensitivity & Specificity
                        Has Condition   Does Not Have Condition
NEGATIVE (0-4.99)             13                 36
POSITIVE (5 or more)          18                  6
Total                         31                 42
55
Sensitivity & Specificity
                        Has Condition   Does Not Have Condition
NEGATIVE (0-7.99)             24                 41
POSITIVE (8 or more)           7                  1
Total                         31                 42
56
Bayes’ Rule
                        Has Condition   Does Not Have Condition
Negative (0-7.99)             24                 41
Positive (8 or more)           7                  1
Total                         31                 42
Here we define: A = Disease, B = Test Positive
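With A = Disease and B = Test Positive, Bayes' rule gives the positive predictive value P(A | B). This Python sketch uses the "8 or more" cut-point counts above and the sample prevalence 31/73, then checks the answer against the direct count TP / (TP + FP). Note that PV+ computed this way depends on the prevalence used; in another population with a different prevalence it would change.

```python
from fractions import Fraction

# Cut-point "8 or more" (positive) vs. 0-7.99 (negative)
tp, fp, fn, tn = 7, 1, 24, 41
prevalence = Fraction(tp + fn, tp + fp + fn + tn)  # P(A) = 31/73
sensitivity = Fraction(tp, tp + fn)                # P(B | A)
specificity = Fraction(tn, fp + tn)                # P(not B | not A)

# Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
ppv_bayes = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)
ppv_direct = Fraction(tp, tp + fp)
print(ppv_bayes, ppv_direct, float(ppv_bayes))
```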
57
Choosing Different Cut-Off
2 or more
Cut-point Sensitivity Specificity
2 or more 0.87 0.45
5 or more 0.58 0.86
8 or more 0.23 0.98
High Sensitivity but Low Specificity
58
Choosing Different Cut-Off
5 or more
Cut-point Sensitivity Specificity
2 or more 0.87 0.45
5 or more 0.58 0.86
8 or more 0.23 0.98
Specificity increased, but sensitivity is reduced
59
Choosing Different Cut-Off
8 or more
Cut-point Sensitivity Specificity
2 or more 0.87 0.45
5 or more 0.58 0.86
8 or more 0.23 0.98
Very High Specificity, Very Low Sensitivity (High False Negative Rate)
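The cut-point table can be reproduced in Python from the group counts: for each cut-point, subjects in that group or higher test positive. The lists `yes` and `no` hold the crystal Yes/No counts by group, lowest to highest; names are illustrative.

```python
from fractions import Fraction

# Counts by calcium group (0-1.99, 2-4.99, 5-7.99, 8 or more)
yes = [4, 9, 11, 7]   # crystals present
no = [19, 17, 5, 1]   # crystals absent
cut_labels = {1: "2 or more", 2: "5 or more", 3: "8 or more"}

rows = {}
for k, label in cut_labels.items():
    # Test positive = group index k or higher
    sens = Fraction(sum(yes[k:]), sum(yes))
    spec = Fraction(sum(no[:k]), sum(no))
    rows[label] = (sens, spec)
    print(f"{label:<10} {float(sens):.2f} {float(spec):.2f}")
```

Raising the cut-point moves subjects from "positive" to "negative", so sensitivity can only fall and specificity can only rise, which is the trade-off these slides describe.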
60
What happens when we assign all individuals a positive test result?
• Sensitivity = P(Test+|Disease) = 1
• Specificity = P(Test-|No Disease) = 0
• 1 – Specificity = 1
What happens when we assign all individuals a negative test result?
• Sensitivity = P(Test+|Disease) = 0
• Specificity = P(Test-|No Disease) = 1
• 1 – Specificity = 0
These two extremes give the endpoints (1, 1) and (0, 0) of the ROC curve.
61
Receiver Operating Characteristic curve (ROC curve)
Figure: ROC Curve for Calcium Oxalate (x-axis: False Positive Rate (1-Specificity); y-axis: True Positive Rate (Sensitivity); points labeled by cut-point 2, 5, and 8)
62
ROC Curves
Area under the curve = probability that, for a randomly selected pair of normal and abnormal subjects, the test will correctly identify which subject is normal based on the “measurement”
Area = 0.89 for the example on the left
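The pairwise interpretation of the area can be sketched in Python by comparing every (crystals Yes, crystals No) pair of subjects. Two assumptions: only the four calcium groups are used, and a pair in the same group counts as one half (a common convention for ties). Under these assumptions the result is about 0.768, lower than the 0.89 quoted above; that value was presumably computed from a different, likely finer-grained curve, which cannot be confirmed from this text.

```python
from fractions import Fraction
from itertools import product

yes = [4, 9, 11, 7]   # "abnormal" (crystals present), by group 1..4
no = [19, 17, 5, 1]   # "normal" (crystals absent), by group 1..4

concordant = ties = 0
for (gy, ny), (gn, nn) in product(enumerate(yes), enumerate(no)):
    pairs = ny * nn
    if gy > gn:        # abnormal subject has the higher measurement
        concordant += pairs
    elif gy == gn:     # same group: the test cannot tell them apart
        ties += pairs

auc_pairs = (Fraction(concordant) + Fraction(ties, 2)) / (sum(yes) * sum(no))
print(round(float(auc_pairs), 3))
```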
63
Trapezoidal Rule (FYI)
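A Python sketch of the trapezoidal rule applied to the three cut-points from these slides, with the endpoints (0, 0) and (1, 1) added. Each ROC point is (1 - specificity, sensitivity) computed from the counts; `trapezoid_auc` is a hypothetical helper. Connecting only these points gives an area of about 0.768, which differs from the 0.89 quoted for the slide's example, presumably because that curve used more cut-points.

```python
from fractions import Fraction

# ROC points (FPR, TPR) = (1 - specificity, sensitivity), plus both endpoints
points = [
    (Fraction(0), Fraction(0)),
    (Fraction(1, 42), Fraction(7, 31)),    # 8 or more
    (Fraction(6, 42), Fraction(18, 31)),   # 5 or more
    (Fraction(23, 42), Fraction(27, 31)),  # 2 or more
    (Fraction(1), Fraction(1)),
]


def trapezoid_auc(pts):
    """Sum the trapezoid areas between consecutive points sorted by FPR."""
    pts = sorted(pts)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


auc = trapezoid_auc(points)
print(auc, round(float(auc), 3))
```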