Chapter 3

Introduction to Probability

3.1 What is Probability?

Table 3.1: 5-year incidence of breast cancer in 2000 women aged 45-54 years

Had first child         Total   Diagnosed with breast cancer   Proportion
Before the age of 20    1000    4                              0.004
After the age of 30     1000    5                              0.005
Total                   2000    9                              0.0045

BIOS 2041 Statistical Methods, Abdus S. Wahed

• Is this evidence enough to confirm a difference in risk between the two groups?

• How about if we increase sample sizes by 10-fold?

Table 3.2: 5-year incidence of breast cancer in 20000 women aged 45-54 years

Had first child         Total   Diagnosed with breast cancer   Proportion
Before the age of 20    10000   40                             0.004
After the age of 30     10000   50                             0.005
Total                   20000   90                             0.0045

• Not sure, as these differences in risk might just be due to chance.

Thus, we need a formal way of judging whether such differences in a random phenomenon could be attributed to chance alone.

3.2 Probability

3.2.1 Experiment

An experiment is any action or process that generates observations.

1. Tossing a coin once,

2. Rolling a die twice,

3. Measuring blood pressure levels,

4. Obtaining blood types,

5. Picking a student from this class at random and asking what grade he/she expects in this class, etc.

3.2.2 Sample space

The sample space of an experiment, denoted by S, is the set of all

possible outcomes of the experiment. For the experiments mentioned

in the previous section,

1. S = {H, T},

2. S = {11, 12, 13, 14, 15, 16, 21, 22, . . . , 61, 62, 63, 64, 65, 66},

3. S = {x : x ≥ 0},

4. S = {A+, A−, B+, B−, . . .},

5. S = {A+, A, A−, . . .}.

3.2.3 Event

An event is any collection of outcomes contained in the sample space.

For the sample spaces mentioned in the previous section,

1. • E1 = {H, T} = Heads or Tails,

• E2 = {H} = Heads only,

• E3 = {T} = tails only,

• E4 = {} = ∅ (Empty Set)= Nothing, etc.

2. • E1 = {11} = Both rolls show 1,

• E2 = {11, 12, 13, 21, 22, 31} = Sum of the numbers is less than 5, etc.

3. • E1 = {x : 80 < x < 92} = Blood pressure level is between 80 and 92,

• E2 = {x : x > 100} = Blood pressure level exceeds 100, etc.

4. • E1 = {A+} = A positive blood group, etc.

5. • E1 = {A+, A, A−} = At least an A−,

• E2 = {A+, A, A−, B+, B, B−} = At least a B−, etc.
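The link between a sample space and its events can be checked by brute force for the die-rolling experiment. The sketch below is only an illustration: it assumes each outcome is stored as an ordered pair (first roll, second roll), and the variable names are invented for this example.

```python
from itertools import product

# Sample space for rolling a die twice: all ordered pairs (first, second).
sample_space = set(product(range(1, 7), repeat=2))

# Event E1: both rolls show 1.
both_ones = {(1, 1)}

# Event E2: the sum of the two numbers is less than 5.
sum_below_5 = {outcome for outcome in sample_space if sum(outcome) < 5}

# An event is any subset of the sample space.
assert both_ones <= sample_space
assert sum_below_5 <= sample_space

print(len(sample_space))    # 36
print(sorted(sum_below_5))  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]
```

The six pairs printed correspond to the outcomes 11, 12, 13, 21, 22 and 31 listed for E2.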

3.2.4 Probability

In the coin-tossing example, if the experiment is conducted fairly, the chance of “H” appearing is “50-50” in any toss. Why is that? From experience we know that if we toss the coin a large number of times, the number of times we see “H” will closely match the number of times we see “T”. Thus we say that in this experiment the two outcomes are equally likely, or equivalently, that the outcome “H” occurs with a probability of 1/2. We write

Pr(H) = 1/2 = Pr(T).

Note that if the coin is weighted in such a way that “H” is three times as likely to occur as “T”, then

Pr(H) = 3/4, and Pr(T) = 1/4.

Similarly, when tossing an “unbiased” die once, every number is equally likely to show up, resulting in

Pr(1) = Pr(2) = Pr(3) = Pr(4) = Pr(5) = Pr(6) = 1/6.

In both cases we have assigned a number between 0 and 1 to each

of the outcomes in the sample space such that the total equals 1.

In a similar fashion, one can define the probabilities of events. For

example, while tossing a fair die, the two events

E1 : Less than 4

and

E2 : Greater than 3

are equally likely. We write

Pr(E1) = Pr(E2) = 1/2.
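The assignments above can be reproduced with exact fractions. This is a minimal sketch: it assumes probabilities are obtained by normalizing relative weights, and the helper names `probabilities_from_weights` and `pr` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def probabilities_from_weights(weights):
    """Turn relative weights of outcomes into probabilities that sum to 1."""
    total = sum(weights.values())
    return {outcome: Fraction(w, total) for outcome, w in weights.items()}

def pr(event, probs):
    """Probability of an event: the sum of the probabilities of its outcomes."""
    return sum(probs[outcome] for outcome in event)

fair_coin = probabilities_from_weights({"H": 1, "T": 1})
weighted_coin = probabilities_from_weights({"H": 3, "T": 1})  # H three times as likely as T
fair_die = probabilities_from_weights({face: 1 for face in range(1, 7)})

less_than_4 = {1, 2, 3}     # E1
greater_than_3 = {4, 5, 6}  # E2

print(weighted_coin["H"], weighted_coin["T"])                   # 3/4 1/4
print(pr(less_than_4, fair_die), pr(greater_than_3, fair_die))  # 1/2 1/2
```

Using `Fraction` rather than floating point keeps the totals exactly equal to 1, which matches the requirement that the outcome probabilities add up to 1.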

Assigning probabilities in the blood pressure measurement example or in the blood type example is not as straightforward as in these toy examples. However, the concept of probability generalizes easily to those situations. In the coin toss example, we assigned a probability of 1/2 to the outcome “H” because in a “large” number of tosses we would expect heads to occur about 50% of the time. Similarly, if we measure blood pressure a “large” number of times, we can find the proportion of times the blood pressure level falls between 80 and 92. This proportion serves as an “estimate” of the probability of the corresponding event. That is,

Pr(E) = Pr{x : 80 < x < 92}

= (Number of measurements greater than 80 but less than 92) / (Total number of measurements).

Such probabilities are known as “empirical probabilities”.
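The idea of an empirical probability can be illustrated by simulation. The sketch below assumes a fair coin simulated with Python's pseudo-random generator and a fixed seed; the function names are hypothetical, and the printed estimates depend on the seed, so no particular values are claimed.

```python
import random

def empirical_probability(trial, event, n_trials, seed=0):
    """Estimate Pr(event) as the fraction of trials in which the event occurs."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    hits = sum(event(trial(rng)) for _ in range(n_trials))
    return hits / n_trials

def toss_fair_coin(rng):
    """One toss of a fair coin, returning "H" or "T"."""
    return rng.choice("HT")

def is_heads(outcome):
    return outcome == "H"

# The estimate should settle near 1/2 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, empirical_probability(toss_fair_coin, is_heads, n))
```

The same function would apply to blood pressure data if `trial` returned a measurement and `event` tested whether it lies strictly between 80 and 92.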

In Table 3.1 of FOB, the probability of a male live birth during 1965 is given by

1,927,054 / 3,760,358 = 0.51247.

If my chair randomly picks a student from this class to ask about my teaching style, what is the probability that the student will be female?

Pr(F) = (# female students in this class) / (Total number of students).

BASIC PROBABILITY LAWS

(i) For any event E, 0 ≤ Pr(E) ≤ 1.

(ii) Pr(S) = 1.

Intersection, Union and Complement

The union of two events E1 and E2, denoted by E1 ∪ E2 and read as “E1 or E2”, is the event consisting of all outcomes that are in E1, in E2, or in both. If, in the blood pressure measurement example,

E1 = {x : x < 90}

and

E2 = {x : 90 ≤ x < 95},

then

E1 ∪ E2 = {x : x < 95}.

The intersection of two events E1 and E2, denoted by E1 ∩ E2 and read as “E1 and E2”, is the event consisting of all outcomes that are both in E1 and in E2. In the above example,

E1 ∩ E2 = ∅.

But if we define another event as E3 = {x : x < 94}, then

E1 ∩ E3 = {x : x < 90},

and

E2 ∩ E3 = {x : 90 ≤ x < 94}.

The complement of an event E, denoted by Ē or E^c, is the event consisting of all outcomes that are not in E. In the above example,

E1^c = {x : x ≥ 90},

and

E2^c = {x : x < 90 or x ≥ 95}.
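Union, intersection and complement map directly onto set operations. The sketch below makes a simplifying assumption that is not in the notes: blood pressure readings are treated as whole numbers from 0 to 199, so that the events can be written as finite Python sets.

```python
# Hypothetical finite sample space: whole-number readings from 0 to 199.
S = set(range(200))

E1 = {x for x in S if x < 90}
E2 = {x for x in S if 90 <= x < 95}
E3 = {x for x in S if x < 94}

union_12 = E1 | E2         # outcomes in E1, in E2, or in both
intersection_12 = E1 & E2  # outcomes in both E1 and E2
intersection_23 = E2 & E3
complement_1 = S - E1      # outcomes not in E1
complement_2 = S - E2      # outcomes not in E2

print(union_12 == {x for x in S if x < 95})                # True
print(intersection_12 == set())                            # True
print(intersection_23 == {90, 91, 92, 93})                 # True
print(complement_1 == {x for x in S if x >= 90})           # True
print(complement_2 == {x for x in S if x < 90 or x >= 95}) # True
```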

Disjoint events

Two events E1 and E2 are disjoint or mutually exclusive if they cannot both happen at the same time. In other words, two disjoint events E1 and E2 do not share any common outcomes. For example, in the coin-tossing example, the two events E2 = {H} and E3 = {T} are mutually exclusive. However, E1 = {H, T} and E2 are not disjoint. In the blood pressure measurement example, the events E1 = {x : x < 90} and E2 = {x : 90 ≤ x < 95} are disjoint.

BASIC PROBABILITY LAWS

For mutually exclusive events E1 and E2,

(i) E1 ∩ E2 = ∅, and

(ii) Pr(E1 ∪ E2) = Pr(E1) + Pr(E2).

(iii) For any event E, Pr(E^c) = 1 − Pr(E), because E and E^c are mutually exclusive and E ∪ E^c = S.

Example 3.2.1. Example 3.12 (FOB). Suppose A = mother is hypertensive (DBP ≥ 95), B = father is hypertensive. Further suppose Pr(A) = 0.1 and Pr(B) = 0.2.

1. What is the probability that the father is not hypertensive?

Pr(B^c) = 1 − Pr(B) = 1 − 0.2 = 0.8.

2. What can we tell about the probability that both mother and

father are hypertensive?

3.2.5 Independent Events

Two events E1 and E2 are called independent events if the occurrence

of one does not depend on the occurrence of the other. In terms of

probability,

Multiplicative Probability Law for Independent Events

For two independent events E1 and E2,

(i) Pr(E1 ∩ E2) = Pr(E1) × Pr(E2)

Example 3.2.2. Example 3.13 (FOB). Suppose A = mother is hypertensive (DBP ≥ 95), B = father is hypertensive. Further suppose Pr(A) = 0.1 and Pr(B) = 0.2.

1. What can we tell about the probability that both mother and

father are hypertensive?

If we assume that the hypertensive status of the mother does

not depend at all on that of the father, then the probability

that both mother and father are hypertensive is Pr(A ∩ B) =

Pr(A) × Pr(B) = 0.1 × 0.2 = 0.02.

3.2.6 Dependent Events

Two events E1 and E2 are called dependent events if the occurrence

of one depends on the occurrence of the other. In terms of probability,

for two dependent events E1 and E2,

Pr(E1 ∩ E2) ≠ Pr(E1) × Pr(E2).

Example 3.2.3. Example 3.15 (FOB). Suppose

A+ = Doctor A makes a positive diagnosis,

B+ = Doctor B makes a positive diagnosis.

Given that

Pr(A+) = 0.1,

Pr(B+) = 0.17,

and

Pr(A+ ∩ B+) = 0.08,

do you think that doctors A and B make independent diagnoses?

Pr(A+) × Pr(B+) = 0.1 × 0.17 = 0.017 ≠ Pr(A+ ∩ B+) = 0.08.

Thus events A+ and B+ are not independent.
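The independence check is a single comparison, sketched below. The function name `are_independent` and the tolerance are assumptions made for this illustration; a tolerance is used because floating-point products are rarely exactly equal to the stated joint probability.

```python
import math

def are_independent(pr_a, pr_b, pr_a_and_b, rel_tol=1e-9):
    """Check the multiplicative law Pr(A and B) = Pr(A) * Pr(B)."""
    return math.isclose(pr_a * pr_b, pr_a_and_b, rel_tol=rel_tol)

# Doctors A and B: Pr(A+) = 0.1, Pr(B+) = 0.17, Pr(A+ and B+) = 0.08.
print(are_independent(0.1, 0.17, 0.08))  # False

# Hypertensive parents, where independence was assumed: 0.1 * 0.2 = 0.02.
print(are_independent(0.1, 0.2, 0.02))   # True
```

With real data the probabilities are estimates, so an exact comparison like this is only a first look; judging whether a discrepancy is larger than chance alone would produce needs formal methods.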

Additive Probability Law

For two events E1 and E2,

(i) Pr(E1 ∪ E2) = Pr(E1) + Pr(E2) − Pr(E1 ∩ E2).

Example 3.2.4. Example 3.16 (FOB). Suppose

A+ = Doctor A makes a positive diagnosis,

B+ = Doctor B makes a positive diagnosis.

Given that

Pr(A+) = 0.1,

Pr(B+) = 0.17,

and

Pr(A+ ∩ B+) = 0.08,

what is the probability that a patient will be diagnosed positive by at least one of the two doctors?

Pr(A+ ∪ B+) = Pr(A+) + Pr(B+) − Pr(A+ ∩ B+)

= 0.1 + 0.17 − 0.08

= 0.19.

3.3 Conditional Probability

In the above example, suppose doctor A diagnoses a patient as positive. The patient wonders what would have happened had doctor B seen the patient instead. Can our probability theory help here?

Given A+, what can we say about B+?

In what proportion of cases does doctor B diagnose positive when doctor A diagnoses positive?

Pr(B+|A+) = Pr(B+ ∩ A+) / Pr(A+) = 0.08 / 0.10 = 0.80.

• The probability that B occurs, given that A has already occurred, denoted by Pr(B|A) (read as “probability of B given A”), is known as the conditional probability of B given A and is given by the formula:

Pr(B|A) = Pr(B ∩ A) / Pr(A). (3.3.1)

Similarly,

Pr(A|B) = Pr(A ∩ B) / Pr(B) = Pr(B|A) × Pr(A) / Pr(B). (3.3.2)

Some Properties

For two independent events E1 and E2,

(i) Pr(E1|E2) = Pr(E1)

(ii) Pr(E1|E2^c) = Pr(E1)

(iii) Pr(E1^c|E2) = Pr(E1^c)

(iv) Pr(E1^c|E2^c) = Pr(E1^c)

3.3.1 Relative Risk

The relative risk of B given A is defined as

RR = Pr(B|A) / Pr(B|A^c). (3.3.3)

Example 3.3.1. Example 3.20 (FOB).

Pr(B+|A+) = Pr(B+ ∩ A+) / Pr(A+) = 0.08 / 0.10 = 0.80.

Pr(B+|A−) = Pr(B+ ∩ A−) / Pr(A−)

= [Pr(B+) − Pr(A+ ∩ B+)] / Pr(A−)

= (0.17 − 0.08) / (1 − 0.10)

= 0.10.

RR = Pr(B+|A+) / Pr(B+|A−) = 0.8 / 0.1 = 8,

indicating that doctor B is 8 times as likely to diagnose a patient as positive when doctor A diagnoses the patient as positive as when doctor A diagnoses the patient as negative.
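The conditional probabilities and the relative risk in this example can be recomputed in a few lines. This is a sketch under the figures given above; the helper name `conditional` and the variable names are hypothetical.

```python
def conditional(pr_joint, pr_given):
    """Pr(B|A) = Pr(B and A) / Pr(A), as in (3.3.1)."""
    if pr_given == 0:
        raise ValueError("cannot condition on an event of probability 0")
    return pr_joint / pr_given

pr_a_pos, pr_b_pos, pr_both_pos = 0.10, 0.17, 0.08

pr_b_given_a_pos = conditional(pr_both_pos, pr_a_pos)

# B+ splits into (B+ and A+) and (B+ and A-), two disjoint pieces,
# so Pr(B+ and A-) = Pr(B+) - Pr(A+ and B+).
pr_b_given_a_neg = conditional(pr_b_pos - pr_both_pos, 1 - pr_a_pos)

relative_risk = pr_b_given_a_pos / pr_b_given_a_neg

print(round(pr_b_given_a_pos, 2))  # 0.8
print(round(pr_b_given_a_neg, 2))  # 0.1
print(round(relative_risk, 2))     # 8.0
```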

3.3.2 Total Probability

Let us consider the following example:

Example 3.3.2. A chain of drug stores sells three different brands of over-the-counter (OC) pain relievers. Of its OC pain reliever sales, 50% are brand A, 30% are brand B, and 20% are brand C. Each manufacturer offers a 6-month satisfaction warranty. It is known that 10% of brand A is returned to the store for a refund within 6 months, whereas the corresponding percentages for brands B and C are 7% and 3%, respectively.

1. What is the probability that a randomly selected purchaser who

has bought an OC pain reliever will return to the store for a

refund within 6 months?

2. If a customer returns to the store for a refund, what is the proba-

bility that it is a brand A pain reliever? A brand B pain reliever?

A brand C pain reliever?

Probability tree for Example 3.3.2 (R = returned for a refund within 6 months):

Brand   P(brand)   P(R | brand)   P(not R | brand)
A       .5         .10            .90
B       .3         .07            .93
C       .2         .03            .97

Law of total probability

Suppose A1, A2, . . ., An are mutually exclusive events such that

A1 ∪ A2 ∪ . . . ∪ An = S.

If B is another event in the sample space, then

Pr(B) = ∑_{i=1}^{n} Pr(B|Ai) Pr(Ai).
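Applied to the first question of Example 3.3.2, the law of total probability is a weighted sum over the three brands. The sketch below is one way to write it; the function name `total_probability` and the dictionary layout are assumptions made for this illustration.

```python
def total_probability(priors, likelihoods):
    """Pr(B) = sum over i of Pr(B|A_i) * Pr(A_i), for a partition A_1, ..., A_n."""
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("the Pr(A_i) must sum to 1 for a partition of S")
    return sum(likelihoods[a] * priors[a] for a in priors)

brand_share = {"A": 0.5, "B": 0.3, "C": 0.2}     # Pr(A), Pr(B), Pr(C)
return_rate = {"A": 0.10, "B": 0.07, "C": 0.03}  # Pr(R|A), Pr(R|B), Pr(R|C)

pr_return = total_probability(brand_share, return_rate)
print(round(pr_return, 3))  # 0.077
```

So about 7.7% of purchasers return for a refund within 6 months, under the stated brand shares and return rates.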

Example 3.3.3. FOB 3.19, 3.21. Suppose that 20 in 100,000 women with negative mammograms will develop breast cancer within 2 years, whereas 1 woman in 10 with positive mammograms will have developed breast cancer within 2 years. Suppose that only 7% of the general population of women will have a positive mammogram.

1. What is the probability that a randomly selected woman will develop breast cancer within 2 years of having a mammogram?

2. Suppose that a woman is diagnosed with breast cancer. What is the probability that she had a negative result in her last mammogram?
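One way to work both questions numerically is sketched below. It assumes the two mammogram results (positive, negative) partition the population, applies the law of total probability for the first question and formula (3.3.2) for the second; the variable names are invented for this illustration.

```python
pr_pos = 0.07                       # Pr(mammogram+)
pr_neg = 1 - pr_pos                 # Pr(mammogram-)
pr_cancer_given_pos = 1 / 10        # Pr(cancer | mammogram+)
pr_cancer_given_neg = 20 / 100_000  # Pr(cancer | mammogram-)

# Question 1: law of total probability over the two test results.
pr_cancer = pr_cancer_given_pos * pr_pos + pr_cancer_given_neg * pr_neg

# Question 2: Pr(mammogram- | cancer) from (3.3.2).
pr_neg_given_cancer = pr_cancer_given_neg * pr_neg / pr_cancer

print(round(pr_cancer, 6))            # 0.007186
print(round(pr_neg_given_cancer, 4))  # 0.0259
```

Under these figures, roughly 0.7% of women develop breast cancer within 2 years, and about 2.6% of those women had a negative mammogram.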

3.3.3 Bayes’ Rule

Bayes’ Theorem

Suppose A1, A2, . . ., An are mutually exclusive events such that

A1 ∪ A2 ∪ . . . ∪ An = S.

If B is another event in the sample space, then

Pr(Aj|B) = Pr(B|Aj) Pr(Aj) / ∑_{i=1}^{n} Pr(B|Ai) Pr(Ai).

In Example 3.3.2, to answer the second question

If a customer returns to the store for a refund, what is the

probability that it is a brand A pain reliever? A brand B

pain reliever? A brand C pain reliever?

we use Bayes’ theorem:

Pr(A|R) = Pr(R|A)Pr(A) / [Pr(R|A)Pr(A) + Pr(R|B)Pr(B) + Pr(R|C)Pr(C)]

= .1(.5) / [.1(.5) + .07(.3) + .03(.2)]

= 50/77 = 0.65. (3.3.4)

Pr(B|R) = Pr(R|B)Pr(B) / [Pr(R|A)Pr(A) + Pr(R|B)Pr(B) + Pr(R|C)Pr(C)]

= .07(.3) / [.1(.5) + .07(.3) + .03(.2)]

= 3/11 = 0.27. (3.3.5)

Pr(C|R) = Pr(R|C)Pr(C) / [Pr(R|A)Pr(A) + Pr(R|B)Pr(B) + Pr(R|C)Pr(C)]

= .03(.2) / [.1(.5) + .07(.3) + .03(.2)]

= 6/77 = 0.08. (3.3.6)
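The three calculations share one denominator, so they can be done in a single pass. The sketch below uses exact fractions so that the results can be compared directly with 50/77, 3/11 and 6/77; the function name `posterior` is hypothetical.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem: Pr(A_j|B) for every A_j in a partition of S."""
    joint = {a: likelihoods[a] * priors[a] for a in priors}  # Pr(B|A_i) Pr(A_i)
    pr_b = sum(joint.values())                               # law of total probability
    return {a: joint[a] / pr_b for a in joint}

brand_share = {"A": Fraction(50, 100), "B": Fraction(30, 100), "C": Fraction(20, 100)}
return_rate = {"A": Fraction(10, 100), "B": Fraction(7, 100), "C": Fraction(3, 100)}

brand_given_return = posterior(brand_share, return_rate)
for brand, p in brand_given_return.items():
    print(brand, p, round(float(p), 2))
# A 50/77 0.65
# B 3/11 0.27
# C 6/77 0.08
```

The posterior probabilities add up to 1, as they must, because a returned item is of exactly one of the three brands.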

Positive Predictive Value/Predictive Value Positive

Positive Predictive Value (PPV)/Predictive Value Positive (PV+) of

a screening test is the probability that a person has a disease given

that the test is positive.

PV+ = Pr(disease | test+).

Negative Predictive Value/Predictive Value Negative

Negative Predictive Value (NPV)/Predictive Value Negative (PV−)

of a screening test is the probability that a person does not have a

disease given that the test is negative.

PV− = Pr(no disease | test−).

Example 3.3.4. Example 3.3.3 continued. For the mammogram test data, the positive predictive value of the mammogram test is

PV+ = Pr(Breast Cancer | Mammogram+) = 1/10 = 0.1.

The negative predictive value is

PV− = Pr(No Breast Cancer | Mammogram−) = 1 − 20/100000 = 0.9998.

Sensitivity

The sensitivity of a test is the probability that the test is positive when the person has the disease, i.e.,

Sensitivity = Pr(Positive Test | disease).

Specificity

The specificity of a test is the probability that the test is negative when the person is disease-free, i.e.,

Specificity = Pr(Negative Test | no disease).

Example 3.3.5. Review Question 3, Page 59, FOB.

                    PSA test result
Prostate cancer     +       −       Total
+                   92      46      138
−                   27      72      99
Total               119     118     237

1. Sensitivity of PSA test

Sensitivity = Pr(Positive Test | disease) = 92/138 = 0.67. (3.3.7)

In 67% of the cases the PSA test detects prostate cancer when the patient has cancer.

Specificity of PSA test

Specificity = Pr(Negative Test | no disease) = 72/99 = 0.73. (3.3.8)

In 73% of the cases the PSA test correctly declares that there is no prostate cancer when the patient does not have cancer.

2. Positive and negative predictive values

PV+ = Pr(Prostate Cancer | PSA+) = 92/119 = 0.77.

The negative predictive value is

PV− = Pr(No Prostate Cancer | PSA−) = 72/118 = 0.61.
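All four measures come from the same 2×2 table and differ only in which total is used as the denominator. The sketch below reads the counts from the PSA table, with rows taken as true disease status and columns as test result; the variable names are invented here. The negative predictive value is computed as the fraction of PSA-negative men who do not have cancer, that is, 72 of the 118 in the PSA− column.

```python
# Counts from the PSA table: rows are true status, columns are test result.
tp, fn = 92, 46  # prostate cancer present: PSA+, PSA-
fp, tn = 27, 72  # prostate cancer absent:  PSA+, PSA-

sensitivity = tp / (tp + fn)  # Pr(PSA+ | cancer),    row total 138
specificity = tn / (fp + tn)  # Pr(PSA- | no cancer), row total 99
ppv = tp / (tp + fp)          # Pr(cancer | PSA+),    column total 119
npv = tn / (fn + tn)          # Pr(no cancer | PSA-), column total 118

print(round(sensitivity, 2), round(specificity, 2))  # 0.67 0.73
print(round(ppv, 2), round(npv, 2))                  # 0.77 0.61
```

Sensitivity and specificity condition on the true disease status (row totals), whereas the predictive values condition on the test result (column totals).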

Example 3.3.6. Mental Health: Table 3.5 on Page 69, FOB.

Table 3.3: Prevalence of Alzheimer’s disease (cases per 100 population)

Age group   Males   Females
65-69       1.6     0.0
70-74       0.0     2.2
75-79       4.9     2.3
80-84       8.6     7.8
85+         35.0    27.9

Suppose an unrelated 77-year-old man, 76-year-old woman, and

82-year-old woman are selected from the community. Let

A:{77-year-old man has Alzheimer’s disease},

B:{76-year-old woman has Alzheimer’s disease}, and

C:{82-year-old woman has Alzheimer’s disease}. Then,

Pr(A) = 0.049

Pr(B) = 0.023

Pr(C) = 0.078

3.17. Pr(All three have Alzheimer’s disease)

Pr(ABC) = Pr(A)Pr(B)Pr(C) = 0.000087906

3.20. Pr(Exactly one of the three has Alzheimer’s disease)

= Pr(A B^c C^c) + Pr(A^c B C^c) + Pr(A^c B^c C)

= 0.049 × 0.977 × 0.922 + 0.951 × 0.023 × 0.922 + 0.951 × 0.977 × 0.078

= 0.137

3.22. Let D: {exactly two of the three people have Alzheimer’s disease}. That is, D = A B C^c ∪ A B^c C ∪ A^c B C.

Pr(BC|D) = Pr(BCD) / Pr(D)

= Pr(A^c B C) / Pr(D)

= Pr(A^c B C) / [Pr(A B C^c) + Pr(A B^c C) + Pr(A^c B C)]
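These three answers can be checked by enumerating all eight disease patterns for the three people. The sketch below assumes, as the calculations above do, that the three unrelated people are independent; the helper name `pr_pattern` and the variable names are hypothetical.

```python
from itertools import product

# Pr(A), Pr(B), Pr(C) read from Table 3.3.
prevalence = {"A": 0.049, "B": 0.023, "C": 0.078}

def pr_pattern(has_disease):
    """Probability of one specific yes/no pattern, assuming independence."""
    p = 1.0
    for person, affected in has_disease.items():
        p *= prevalence[person] if affected else 1 - prevalence[person]
    return p

# All 8 combinations of affected / not affected for A, B and C.
patterns = [dict(zip("ABC", flags)) for flags in product([True, False], repeat=3)]

pr_all_three = pr_pattern({"A": True, "B": True, "C": True})
pr_exactly_one = sum(pr_pattern(p) for p in patterns if sum(p.values()) == 1)
pr_exactly_two = sum(pr_pattern(p) for p in patterns if sum(p.values()) == 2)

# Both women affected, given that exactly two of the three are affected.
pr_both_women_given_two = pr_pattern({"A": False, "B": True, "C": True}) / pr_exactly_two

print(round(pr_exactly_one, 3))           # 0.137
print(round(pr_both_women_given_two, 3))  # 0.263
```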
