v2020
1 / 7
Biomathematics 2
Probability, random variables.
Continuous random variable. Normal, standard normal
distribution.
Dr. Beáta Bugyi
associate professor
University of Pécs, Medical School
Department of Biophysics
2020
CONTINUOUS RANDOM VARIABLE
Continuous: takes an uncountable, infinite number of values; arises from measurement.
Probability: discrete/continuous random variables
Consider a statistical experiment whose outcome corresponds to
A) a discrete random variable X with values from 0 to 10 (finite number of outcomes: 10).
Give the probability that the outcome is 6.
$P(X = 6) = \frac{1}{10} = 0.1$ (assuming all 10 outcomes are equally likely)
B) a continuous random variable X with values from 0 to 10 (infinite number of outcomes).
Give the probability that the outcome is 6. Exactly 6, not 6.1, 6.01, ..., 6.00000000001.
$P(X = 6) = \frac{1}{\infty} = 0$ (informally: a single exact value has zero probability)
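One way to see why a single exact value has zero probability for a continuous variable is to look at ever narrower intervals around 6. The sketch below assumes, purely for illustration, that X is uniformly distributed on [0, 10]; the text does not specify a distribution, and `interval_probability` is a hypothetical helper name.

```python
def interval_probability(center, half_width, low=0.0, high=10.0):
    """P(center - half_width < X < center + half_width).

    Assumption: X is uniform on [low, high], so probability = length / range.
    """
    a = max(low, center - half_width)
    b = min(high, center + half_width)
    return max(0.0, b - a) / (high - low)


# The interval around 6 shrinks, and so does its probability.
for half_width in (1.0, 0.1, 0.01, 1e-6, 0.0):
    print(half_width, interval_probability(6.0, half_width))
```

As the half-width goes to 0, the probability goes to 0 as well, which is the content of case B.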
NORMAL DISTRIBUTION
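$N(\mu, \sigma)$, $\mu$ = mean, $\sigma$ = standard deviation
Probability density function (PDF)
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
Cumulative distribution function (CDF)
$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right) dt$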
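The two formulas can be evaluated numerically. The following is a minimal sketch, not part of the original handout: the function names are hypothetical, the closed form for the CDF uses the error function from Python's standard library, and the trapezoid integration starts at an assumed lower bound of 10 standard deviations below the mean instead of at minus infinity.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density f(x) of the normal distribution N(mu, sigma)."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def normal_cdf(x, mu, sigma):
    """F(x) = P(X < x), written with the error function erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cdf_by_integration(x, mu, sigma, steps=20000):
    """Approximate F(x) as the area under the PDF (trapezoid rule).

    Assumption: the area below mu - 10*sigma is negligible.
    """
    lower = mu - 10.0 * sigma
    h = (x - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h, mu, sigma)
    return total * h


print(normal_cdf(80, 60, 10))          # closed form
print(cdf_by_integration(80, 60, 10))  # area under the PDF
```

Both calls should agree to several decimal places, which illustrates that the CDF is the accumulated area under the PDF.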
Graphical representation of the PDF and CDF of normal distributions.
The normal distribution is defined by its mean ($\mu$) and standard deviation ($\sigma$).
The PDF has a characteristic bell shape.
The PDF is symmetric about the mean of the distribution.
The inflection points of the PDF lie one standard deviation from the mean, at $x = \mu \pm \sigma$.
The width of the PDF (full width at half maximum, $2\sqrt{2 \ln 2}\,\sigma \approx 2.355\,\sigma$) is proportional to the standard deviation; the larger the standard deviation, the larger the width.
Probability is given by the area under the PDF (see examples below).
Example 1
The test results of students in Subject 1 follow a normal distribution with a mean of 60% and a standard deviation of 10%: $N(\mu, \sigma) = N(60, 10)$. Represent graphically the following probabilities.
Q1.1: What is the probability that a student scores 60%? $P(X = 60) = ?$
Q1.2: What is the probability that a student scores less than 60%? $P(X < 60) = ?$
Q1.3: What is the probability that a student scores more than 60%? $P(X > 60) = ?$
Q1.4: What is the probability that a student scores less than 80%? $P(X < 80) = ?$
Q1.5: What is the probability that a student scores between 60% and 80%? $P(60 < X < 80) = ?$
Example 2
The test results of students in Subject 2 follow a normal distribution with a mean of 62% and a standard deviation of 8%: $N(\mu, \sigma) = N(62, 8)$.
Question:
How can we work with different normal distributions? Do we need the PDF of each and every normal
distribution?
Answer:
Normal distributions can be standardized: the infinitely many possible normal distributions all reduce to one standardized distribution (the standard normal distribution).
How to standardize normal distributions?
$N(\mu, \sigma)$
z score: $Z = \frac{X - \mu}{\sigma}$
z score: how many standard deviations ($\sigma$) a given value ($x$) lies from the mean ($\mu$)
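The standardization step is a one-line calculation. A minimal sketch follows; `z_score` is a hypothetical helper name, and the input values are taken from the two examples in this handout.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


print(z_score(80, 60, 10))  # Subject 1, score of 80%: 2.0
print(z_score(65, 62, 8))   # Subject 2, score of 65%: 0.375
print(z_score(45, 62, 8))   # Subject 2, score of 45%: -2.125
```

A negative z score means the value lies below the mean, a positive one means it lies above.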
STANDARD NORMAL DISTRIBUTION
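$N_Z(0, 1)$, $\mu = 0$, $\sigma = 1$
Probability density function (PDF)
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, where $\mu = 0$ and $\sigma = 1$: $f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$
Cumulative distribution function (CDF)
$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$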
Graphical representation of the PDF and CDF of the standard normal distribution.
Z table
summarizes the CDF of the standard normal distribution
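Entries of a Z table can be reproduced with Python's standard library. The sketch below assumes the common table layout with four decimal places and z values in steps of 0.01; the layout of the table used in the course may differ, and the helper names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_table_entry(z):
    """CDF of the standard normal distribution at z, rounded as in a Z table."""
    return round(STANDARD_NORMAL.cdf(z), 4)


def z_table_row(first_decimal):
    """Ten entries of one table row, e.g. 0.3 gives z = 0.30, 0.31, ..., 0.39."""
    return [z_table_entry(round(first_decimal + k / 100.0, 2)) for k in range(10)]


print(z_table_entry(0.0))  # 0.5
print(z_table_entry(2.0))  # 0.9772
print(z_table_row(0.3))
```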
Example 1
The test results of students in Subject 1 follow a normal distribution with a mean of 60% and a standard deviation of 10%: $N(\mu, \sigma) = N(60, 10)$. Standardize the normal distribution. Give the probabilities by using the Z table.
Q1.1: What is the probability that a student scores 60%? $P(X = 60) = ?$
$P(X = 60) = 0$
Q1.2: What is the probability that a student scores less than 60%? $P(X < 60) = ?$
$z = \frac{x - \mu}{\sigma} = \frac{60 - 60}{10} = 0.00$
$P(X < 60) = 0.5$ → 50 %
Q1.3: What is the probability that a student scores more than 60%? $P(X > 60) = ?$
$P(X > 60) + P(X < 60) = 1$
$P(X > 60) = 1 - P(X < 60) = 1 - 0.5 = 0.5$ → 50 %
Q1.4: What is the probability that a student scores less than 80%? $P(X < 80) = ?$
$z = \frac{x - \mu}{\sigma} = \frac{80 - 60}{10} = 2.00$
$P(X < 80) = 0.9772$ → 97.72 %
Q1.5: What is the probability that a student scores between 60% and 80%? $P(60 < X < 80) = ?$
$P(60 < X < 80) = P(X < 80) - P(X < 60) = 0.9772 - 0.5 = 0.4772$ → 47.72 %
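The answers to Example 1 can be cross-checked without a Z table. This is a sketch using `statistics.NormalDist` from Python's standard library; the variable names are illustrative and not part of the handout.

```python
from statistics import NormalDist

# Subject 1: N(mu, sigma) = N(60, 10)
scores = NormalDist(mu=60, sigma=10)

p_less_60 = scores.cdf(60)           # Q1.2
p_more_60 = 1.0 - p_less_60          # Q1.3, complement rule
p_less_80 = scores.cdf(80)           # Q1.4
p_between = p_less_80 - p_less_60    # Q1.5, difference of two CDF values

print(round(p_less_60, 4))  # 0.5
print(round(p_more_60, 4))  # 0.5
print(round(p_less_80, 4))  # 0.9772
print(round(p_between, 4))  # 0.4772
```

The code works directly on the original scale; internally this is equivalent to standardizing first and then reading the standard normal CDF.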
Example 2
The test results of students in Subject 2 follow a normal distribution with a mean of 62% and a standard deviation of 8%: $N(\mu, \sigma) = N(62, 8)$. Give the probabilities by using the Z table.
Q2.1: What is the probability that a student scores less than 65%? $P(X < 65) = ?$
$z = \frac{x - \mu}{\sigma} = \frac{65 - 62}{8} = +0.375$
If a value is not listed in the table, use the following approximation (average the two neighbouring table entries):
$+0.375 = \frac{0.37 + 0.38}{2}$
$P(X < 65) = \frac{0.6443 + 0.6480}{2} = 0.6462$ → 64.62 %
Q2.2: What is the probability that a student scores less than 45%? $P(X < 45) = ?$
$z = \frac{x - \mu}{\sigma} = \frac{45 - 62}{8} = -2.125$
If a value is not listed in the table, use the following approximation (average the two neighbouring table entries):
$-2.125 = \frac{-2.12 + (-2.13)}{2}$
$P(X < 45) = \frac{0.0170 + 0.0166}{2} = 0.0168$ → 1.68 %
Q2.3: What is the probability that a student scores between 45% and 65%? $P(45 < X < 65) = ?$
$P(45 < X < 65) = P(X < 65) - P(X < 45) = 0.6462 - 0.0168 = 0.6294$ → 62.94 %
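The averaging rule used for z values that fall between two table entries is a special case of linear interpolation. The sketch below compares it with the directly computed CDF; it assumes a Z table with four decimals and a step of 0.01, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mu = 0, sigma = 1


def table_value(z):
    """Z table entry: standard normal CDF rounded to four decimals."""
    return round(STANDARD_NORMAL.cdf(z), 4)


def interpolated_cdf(z):
    """Linear interpolation between the two neighbouring table entries."""
    lower = math.floor(z * 100) / 100
    upper = round(lower + 0.01, 2)
    weight = (z - lower) / (upper - lower)
    return (1.0 - weight) * table_value(lower) + weight * table_value(upper)


for z in (0.375, -2.125):
    print(z, round(interpolated_cdf(z), 4), round(STANDARD_NORMAL.cdf(z), 4))
```

For a z value exactly halfway between two entries the weight is 0.5, so the result is the plain average used in Q2.1 and Q2.2.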
Q2.4: What is the median of the students' scores? $P(X < x) = 0.5$, $x = ?$
$P(X < x) = 0.5 \Rightarrow z = 0.00$
$z = \frac{x - \mu}{\sigma} \Rightarrow 0.00 = \frac{x - 62}{8} \Rightarrow x = 62$
Note: The mean of a normally distributed data set is equal to its median.
Q2.5: What is the first quartile of the students' scores? $P(X < x) = 0.25$, $x = ?$
$P(X < x) = 0.25 \Rightarrow z = -0.675$
$z = \frac{x - \mu}{\sigma} \Rightarrow -0.675 = \frac{x - 62}{8} \Rightarrow x = 56.6$
Q2.6: What is the third quartile of the students' scores? $P(X < x) = 0.75$, $x = ?$
$P(X < x) = 0.75 \Rightarrow z = 0.675$
$z = \frac{x - \mu}{\sigma} \Rightarrow 0.675 = \frac{x - 62}{8} \Rightarrow x = 67.4$
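Reading the Z table backwards (from a probability to a value) corresponds to the inverse CDF. A minimal sketch using `NormalDist.inv_cdf` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Subject 2: N(mu, sigma) = N(62, 8)
scores = NormalDist(mu=62, sigma=8)

first_quartile = scores.inv_cdf(0.25)  # x with P(X < x) = 0.25
median = scores.inv_cdf(0.50)          # x with P(X < x) = 0.50
third_quartile = scores.inv_cdf(0.75)  # x with P(X < x) = 0.75

print(round(first_quartile, 1))  # 56.6
print(round(median, 1))          # 62.0
print(round(third_quartile, 1))  # 67.4
```

The quartiles lie symmetrically around the mean, as expected for a symmetric distribution.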
Q2.7: Find what percentage of the data lies within mean ± 1 × standard deviation, mean ± 2 × standard deviation, and mean ± 3 × standard deviation.
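One possible way to work out Q2.7 is to take the difference of two CDF values for each interval. The sketch below is not the handout's own solution; `fraction_within` is a hypothetical helper name.

```python
from statistics import NormalDist


def fraction_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X following N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k, mu=62, sigma=8), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```

The result does not depend on the chosen mean and standard deviation, because standardizing maps every interval mean ± k × standard deviation to the same z range from -k to +k.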
IMPORTANCE OF NORMAL DISTRIBUTION
CENTRAL LIMIT THEOREM
Example 3
In a population of persons let X = life expectancy of a person (in years). The distribution of X
has a mean and standard deviation of 72 and 18.2 years, respectively.
$X$ = life expectancy of a person in a population (years)
$X = x_{\text{person 1}}, x_{\text{person 2}}, \ldots$
We choose samples from the population, each consisting of n persons. By finding the average lifetime in each sample ($\bar{x}$, sample mean), we obtain the distribution of $\bar{X}$.
Sampling distribution of sample means: a distribution of the sample means calculated from all
possible random samples of a specific size (n) taken from a population.
$\bar{X}$ = average life expectancy of persons in a sample (years)
$\bar{X} = \bar{x}_{\text{sample 1}}, \bar{x}_{\text{sample 2}}, \ldots$
Properties of the distribution of the sample means
$\mu_{\bar{X}} = \mu_X$
$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$ (standard error of the mean, SEM)
Characteristics of the distribution: Central limit theorem (CLT)
POPULATION | SAMPLE
$X = x$: life expectancy of a person in a population | $\bar{X} = \bar{x}$: average life expectancy of persons in a sample
normal distribution | normal distribution for any n
not normal/not known distribution | CLT: if n is large enough ($n \geq 30$), approximated by normal distribution; the larger n, the better the approximation
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
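The behaviour of sample means can also be explored with a small simulation. The sketch below is an illustration under stated assumptions, not part of the handout: the population is taken to be exponential with mean 72 (a deliberately non-normal choice, for which the standard deviation is also 72, not 18.2), and the function name, sample count, and seed are arbitrary.

```python
import random
import statistics


def simulate_sample_means(n, num_samples=5000, population_mean=72.0, seed=0):
    """Draw num_samples samples of size n and return their sample means.

    Assumption: the population is exponential with the given mean,
    so its standard deviation equals its mean.
    """
    rng = random.Random(seed)
    rate = 1.0 / population_mean
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(rate) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means_40 = simulate_sample_means(n=40)
print(statistics.fmean(means_40))  # close to the population mean, 72
print(statistics.stdev(means_40))  # close to 72 / sqrt(40), about 11.4
```

Plotting a histogram of `means_40` (for example with a plotting library of your choice) should show a roughly bell-shaped curve even though the population itself is strongly skewed.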
Q3.1: Consider that X has a normal distribution: $N_X(72, 18.2)$. What is the distribution of $\bar{X}$ if n = 10 or n = 40?
n = 10: normal; n = 40: normal
Q3.2: Consider that the distribution of X is not known/not normal. What is the distribution of $\bar{X}$ if n = 10 or n = 40?
n = 10: not known/not normal; n = 40: approximated by normal
Q3.3: What are the mean and the standard deviation of $\bar{X}$ (standard error of the mean) if n = 40?
$\mu_{\bar{X}} = \mu_X = 72$
$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{18.2}{\sqrt{40}} = 2.88$
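The standard error shrinks with the square root of the sample size. A minimal sketch with a hypothetical helper name, evaluated for the two sample sizes used in this example:

```python
import math


def standard_error(sigma_population, n):
    """Standard error of the mean: sigma_X / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma_population / math.sqrt(n)


for n in (10, 40):
    print(n, round(standard_error(18.2, n), 2))
# 10 5.76
# 40 2.88
```

Quadrupling the sample size from 10 to 40 halves the standard error.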
$N_{\bar{X}}(72, 2.88)$
Q3.4: Find $P(X < 70)$ and $P(\bar{X} < 70)$.
$P(X < 70)$: What is the probability that the life expectancy of a person in the population is less than 70 years?
$N_X(72, 18.2)$
$z = \frac{x - \mu}{\sigma} = \frac{70 - 72}{18.2} = -0.109 \approx -0.11$
$P(X < 70) = 0.4562$ → 45.62 %
$P(\bar{X} < 70)$: What is the probability that the average life expectancy of persons in a sample is less than 70 years?
$N_{\bar{X}}(72, 2.88)$
$z = \frac{x - \mu}{\sigma} = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}} = \frac{70 - 72}{2.88} \approx -0.7$
$P(\bar{X} < 70) = 0.2420$ → 24.2 %
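Both probabilities in Q3.4 can be computed directly, which makes the contrast between a single person and a sample mean easy to see. The sketch below assumes X is normal with mean 72 and standard deviation 18.2 and uses n = 40; the variable names are illustrative. Because the code does not round z to two decimals, its results can differ slightly from Z table values.

```python
import math
from statistics import NormalDist

MU_X = 72.0
SIGMA_X = 18.2
N = 40

# Distribution of a single person's life expectancy.
population = NormalDist(mu=MU_X, sigma=SIGMA_X)
# Distribution of the sample mean: same mean, standard deviation = SEM.
sample_mean = NormalDist(mu=MU_X, sigma=SIGMA_X / math.sqrt(N))

p_person = population.cdf(70)
p_sample_mean = sample_mean.cdf(70)

print(round(p_person, 3))
print(round(p_sample_mean, 3))
```

The sample mean is much less likely than an individual value to fall below 70 years, because its distribution is narrower around 72.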