
v2020


Biomathematics 2

Probability, random variables.

Continuous random variable. Normal, standard normal distribution.

Dr. Beáta Bugyi

associate professor

University of Pécs, Medical School

Department of Biophysics

2020


CONTINUOUS RANDOM VARIABLE

Continuous: takes an uncountably infinite number of values; arises from measurement.

Probability: discrete/continuous random variables

Let's consider a statistical experiment whose outcome corresponds to

A) a discrete random variable X that takes the values 1, 2, …, 10 with equal probability (finite number of outcomes: 10).

Give the probability that the outcome is 6.

$$P(X = 6) = \frac{1}{10} = 0.1$$

B) a continuous random variable X that takes values between 0 and 10 (infinite number of outcomes).

Give the probability that the outcome is 6. Exactly 6, not 6.1, 6.01, …, 6.00000000001.

Informally, $P(X = 6) = \frac{1}{\infty} = 0$: a continuous random variable takes any single exact value with probability 0.
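The limit in case B can be illustrated numerically. The following is a minimal sketch that assumes X is uniformly distributed on [0, 10] (the handout does not name the distribution, so the uniform shape and the helper name `prob_within` are assumptions made here). It shows that the probability of landing in an interval around 6 shrinks towards 0 as the interval narrows.

```python
def prob_within(center, half_width, low=0.0, high=10.0):
    """P(center - half_width < X < center + half_width) for X uniform on [low, high].

    The uniform distribution is an assumption made for illustration only.
    """
    a = max(low, center - half_width)   # clip the interval to the support
    b = min(high, center + half_width)
    return max(0.0, b - a) / (high - low)


# Narrow the interval around 6: the probability tends to 0.
for half_width in (1.0, 0.1, 0.001, 1e-9, 0.0):
    print(f"half-width {half_width:g}: P = {prob_within(6.0, half_width):.3g}")
```

Only intervals carry probability for a continuous random variable; a single point is the limiting case of an interval of zero width.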

NORMAL DISTRIBUTION

$N(\mu, \sigma)$, $\mu$ = mean, $\sigma$ = standard deviation

Probability density function (PDF)

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

Cumulative distribution function (CDF)

$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right) dt$$

Graphical representation of the PDF and CDF of normal distributions.
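Both functions can be evaluated directly. The sketch below is one possible way to do it in Python: the PDF is coded from the formula, the CDF uses the error function `math.erf` (a standard closed form for the normal integral), and both are compared with `statistics.NormalDist` from the standard library. The parameter values 60 and 10 and the function names are chosen here for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal PDF written out from the formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def normal_cdf(x, mu, sigma):
    """Normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))


dist = NormalDist(mu=60, sigma=10)  # illustrative parameters
print(normal_pdf(60, 60, 10), dist.pdf(60))  # the two PDF values should agree
print(normal_cdf(80, 60, 10), dist.cdf(80))  # the two CDF values should agree
```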

The normal distribution is defined by its mean ($\mu$) and standard deviation ($\sigma$).

The PDF has a characteristic bell shape.

The PDF is symmetric about the mean of the distribution.


The inflection points of the PDF lie one standard deviation from the mean, at $x = \mu \pm \sigma$.

The width of the PDF (full width at half maximum) is proportional to the standard deviation, $\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma$; the larger the standard deviation, the wider the PDF.

Probability is given by the area under the PDF (see examples below).
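The statement about the area can be checked numerically. This is a minimal sketch using the trapezoidal rule; the step count, the function names and the parameter values (mean 60, standard deviation 10) are assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal PDF written out from the formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def area_under_pdf(a, b, mu, sigma, steps=10_000):
    """Trapezoidal approximation of the area under the PDF between a and b."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h


# Area between the mean and two standard deviations above it.
print(round(area_under_pdf(60, 80, 60, 10), 4))
# Area over a range wide enough to hold practically the whole distribution.
print(round(area_under_pdf(-20, 140, 60, 10), 4))
```

The second value is practically 1: the total area under any PDF is 1, which is why areas can be read as probabilities.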

Example 1

The test results of students in Subject 1 follow a normal distribution with a mean of 60% and a standard deviation of 10%: $N(\mu, \sigma) = N(60, 10)$. Represent the following probabilities graphically.

Q1.1: What is the probability that a student scores 60%? $P(X = 60) = ?$

Q1.2: What is the probability that a student scores less than 60%? $P(X < 60) = ?$

Q1.3: What is the probability that a student scores more than 60%? $P(X > 60) = ?$

Q1.4: What is the probability that a student scores less than 80%? $P(X < 80) = ?$

Q1.5: What is the probability that a student scores between 60% and 80%? $P(60 < X < 80) = ?$

Example 2

The test results of students in Subject 2 follow a normal distribution with a mean of 62% and a standard deviation of 8%: $N(\mu, \sigma) = N(62, 8)$.

Question:

How can we work with different normal distributions? Do we need the PDF of each and every normal distribution?

Answer:

Normal distributions can be standardized: each of the infinitely many normal distributions maps onto one standardized distribution (the standard normal distribution).

How to standardize normal distributions?

For $N(\mu, \sigma)$, the z score is

$$z = \frac{x - \mu}{\sigma}$$

z score: how many standard deviations ($\sigma$) a given value ($x$) lies from the mean ($\mu$)
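A short sketch of standardization in Python, assuming `statistics.NormalDist` as the numerical reference; the function name `z_score` and the example values are chosen here for illustration. It shows that a probability computed on the original scale equals the probability of the corresponding z score under the standard normal distribution.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations between x and the mean."""
    return (x - mu) / sigma


standard_normal = NormalDist(mu=0, sigma=1)

x, mu, sigma = 80, 60, 10                         # illustrative values
z = z_score(x, mu, sigma)
p_original = NormalDist(mu, sigma).cdf(x)         # P(X < x) on the original scale
p_standard = standard_normal.cdf(z)               # P(Z < z) on the standard scale
print(z, p_original, p_standard)                  # the two probabilities agree
```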

STANDARD NORMAL DISTRIBUTION

$SN(0, 1)$, $\mu = 0$, $\sigma = 1$

Probability density function (PDF)

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad \text{where } \mu = 0 \text{ and } \sigma = 1: \quad f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$

Cumulative distribution function (CDF)

$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$$

Graphical representation of the PDF and CDF of the standard normal distribution.


Z table

summarizes the CDF of the standard normal distribution
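To make the layout of a Z table concrete, the sketch below regenerates a few rows with `statistics.NormalDist`. It assumes the common layout in which the row gives z to one decimal and the ten columns add the second decimal (0.00 to 0.09); the table used in class may be arranged differently, and the function name `z_table_row` is hypothetical.

```python
from statistics import NormalDist


def z_table_row(z_tenths, digits=4):
    """CDF values for z = z_tenths + 0.00, + 0.01, ..., + 0.09."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [round(std.cdf(z_tenths + k / 100), digits) for k in range(10)]


print("   z  " + " ".join(f"  .0{k}  " for k in range(10)))
for start in (0.0, 0.3, 2.0):
    print(f"{start:4.1f}  " + " ".join(f"{p:.4f}" for p in z_table_row(start)))
```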

Example 1

The test results of students in Subject 1 follow a normal distribution with a mean of 60% and a standard deviation of 10%: $N(\mu, \sigma) = N(60, 10)$. Standardize the normal distribution. Give the probabilities by using the Z table.

Q1.1: What is the probability that a student scores 60%? $P(X = 60) = ?$

$P(X = 60) = 0$ (a continuous random variable takes any single exact value with probability 0)

Q1.2: What is the probability that a student scores less than 60%? $P(X < 60) = ?$

$$z = \frac{x - \mu}{\sigma} = \frac{60 - 60}{10} = 0.00$$

$P(X < 60) = 0.5 \rightarrow 50\%$

Q1.3: What is the probability that a student scores more than 60%? $P(X > 60) = ?$

$P(X > 60) + P(X < 60) = 1$

$P(X > 60) = 1 - P(X < 60) = 1 - 0.5 = 0.5 \rightarrow 50\%$

Q1.4: What is the probability that a student scores less than 80%? $P(X < 80) = ?$

$$z = \frac{x - \mu}{\sigma} = \frac{80 - 60}{10} = 2.00$$

$P(X < 80) = 0.9772 \rightarrow 97.72\%$

Q1.5: What is the probability that a student scores between 60% and 80%? $P(60 < X < 80) = ?$

$P(X < 80) - P(X < 60) = 0.9772 - 0.5 = 0.4772 \rightarrow 47.72\%$
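The table lookups of Example 1 can be cross-checked numerically. This is a minimal sketch with `statistics.NormalDist`; the variable names are chosen here and are not part of the handout.

```python
from statistics import NormalDist

scores = NormalDist(mu=60, sigma=10)  # Subject 1: N(60, 10)

p_less_60 = scores.cdf(60)                    # Q1.2: P(X < 60)
p_more_60 = 1 - scores.cdf(60)                # Q1.3: P(X > 60), complement rule
p_less_80 = scores.cdf(80)                    # Q1.4: P(X < 80)
p_between = scores.cdf(80) - scores.cdf(60)   # Q1.5: P(60 < X < 80)

for label, p in [("P(X < 60)", p_less_60), ("P(X > 60)", p_more_60),
                 ("P(X < 80)", p_less_80), ("P(60 < X < 80)", p_between)]:
    print(f"{label} = {p:.4f}")
```

Probabilities of intervals are always obtained as differences of CDF values, exactly as in Q1.5.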

Example 2


The test results of students in Subject 2 follow a normal distribution with a mean of 62% and a standard deviation of 8%: $N(\mu, \sigma) = N(62, 8)$. Give the probabilities by using the Z table.

Q2.1: What is the probability that a student scores less than 65%? $P(X < 65) = ?$

$$z = \frac{x - \mu}{\sigma} = \frac{65 - 62}{8} = +0.375$$

If a value is not listed in the table, use the following approximation:

$$+0.375 = \frac{0.37 + 0.38}{2}$$

$$P(X < 65) = \frac{0.6443 + 0.6480}{2} = 0.6462 \rightarrow 64.62\%$$

Q2.2: What is the probability that a student scores less than 45%? $P(X < 45) = ?$

$$z = \frac{x - \mu}{\sigma} = \frac{45 - 62}{8} = -2.125$$

If a value is not listed in the table, use the following approximation:

$$-2.125 = \frac{-2.12 + (-2.13)}{2}$$

$$P(X < 45) = \frac{0.0170 + 0.0166}{2} = 0.0168 \rightarrow 1.68\%$$
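Averaging two neighbouring table entries is linear interpolation at the midpoint. The sketch below generalizes this to any z between two listed values and compares the result with a direct evaluation. The four table entries are the ones quoted in Q2.1 and Q2.2; the dictionary `Z_TABLE` and the function `interpolate` are hypothetical helpers introduced here.

```python
from statistics import NormalDist

# Table entries quoted in Q2.1 and Q2.2 (z -> CDF value).
Z_TABLE = {0.37: 0.6443, 0.38: 0.6480, -2.13: 0.0166, -2.12: 0.0170}


def interpolate(z, z_low, z_high, table=Z_TABLE):
    """Linear interpolation between the table entries at z_low and z_high."""
    p_low, p_high = table[z_low], table[z_high]
    weight = (z - z_low) / (z_high - z_low)  # 0.5 when z is the midpoint
    return p_low + weight * (p_high - p_low)


p_65 = interpolate(0.375, 0.37, 0.38)      # Q2.1
p_45 = interpolate(-2.125, -2.13, -2.12)   # Q2.2
print(round(p_65, 4), round(NormalDist().cdf(0.375), 4))
print(round(p_45, 4), round(NormalDist().cdf(-2.125), 4))
```

Over such a short stretch the CDF is nearly straight, so the interpolated and directly computed values differ only in the fourth decimal at most.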

Q2.3: What is the probability that a student scores between 45% and 65%? $P(45 < X < 65) = ?$

$P(45 < X < 65) = P(X < 65) - P(X < 45) = 0.6462 - 0.0168 = 0.6294 \rightarrow 62.94\%$

Q2.4: What is the median of the students' scores? $P(X < x) = 0.5$, $x = ?$

$P(X < x) = 0.5 \rightarrow z = 0.00$

$$z = \frac{x - \mu}{\sigma} \rightarrow 0.00 = \frac{x - 62}{8} \rightarrow x = 62$$

Note: The mean of a data set following a normal distribution is equal to its median.

Q2.5: What is the first quartile of the students' scores? $P(X < x) = 0.25$, $x = ?$

$P(X < x) = 0.25 \rightarrow z = -0.675$

$$z = \frac{x - \mu}{\sigma} \rightarrow -0.675 = \frac{x - 62}{8} \rightarrow x = 56.6$$

Q2.6: What is the third quartile of the students' scores? $P(X < x) = 0.75$, $x = ?$

$P(X < x) = 0.75 \rightarrow z = 0.675$

$$z = \frac{x - \mu}{\sigma} \rightarrow 0.675 = \frac{x - 62}{8} \rightarrow x = 67.4$$
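Questions Q2.4 to Q2.6 read the Z table in reverse: from a probability to a z score, then back to x through $x = \mu + z\sigma$. A minimal sketch of the same steps, assuming the inverse CDF `NormalDist.inv_cdf` from the Python standard library in place of the table lookup:

```python
from statistics import NormalDist

scores = NormalDist(mu=62, sigma=8)   # Subject 2: N(62, 8)
standard_normal = NormalDist()        # mean 0, standard deviation 1

for name, p in [("first quartile", 0.25), ("median", 0.50), ("third quartile", 0.75)]:
    z = standard_normal.inv_cdf(p)    # reverse table lookup: probability -> z
    x = scores.mean + z * scores.stdev  # undo the standardization
    print(f"{name}: z = {z:+.3f}, x = {x:.1f}")

# The same quantiles obtained directly on the original scale.
q1, median, q3 = (scores.inv_cdf(p) for p in (0.25, 0.50, 0.75))
```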

Q2.7: Find what percentage of the data lies within mean ± 1×standard deviation, mean ± 2×standard deviation, and mean ± 3×standard deviation.
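One way to approach Q2.7 is to note that mean ± k×standard deviation corresponds to z = ±k for every normal distribution, so the answer does not depend on the particular mean and standard deviation. The sketch below is an assumption-level illustration with `statistics.NormalDist`, not the handout's worked solution; the function name `fraction_within` is chosen here.

```python
from statistics import NormalDist

standard_normal = NormalDist()


def fraction_within(k):
    """P(-k < Z < k): fraction of data within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * fraction_within(k):.2f} %")
```

The three values are roughly 68%, 95% and 99.7% of the data.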


IMPORTANCE OF NORMAL DISTRIBUTION

CENTRAL LIMIT THEOREM

Example 3

In a population of persons, let X = life expectancy of a person (in years). The distribution of X has a mean of 72 years and a standard deviation of 18.2 years.

$X$ = life expectancy of a person in a population (years)

$X = x_{\text{person 1}}, x_{\text{person 2}}, \ldots$

We choose samples from the population, each consisting of n persons; by finding the average lifetime in each sample ($\bar{x}$, sample mean) we obtain the distribution of $\bar{X}$.

Sampling distribution of sample means: the distribution of the sample means calculated from all possible random samples of a specific size (n) taken from a population.

$\bar{X}$ = average life expectancy of persons in a sample (years)

$\bar{X} = \bar{x}_{\text{sample 1}}, \bar{x}_{\text{sample 2}}, \ldots$

Properties of the distribution of the sample means

$$\mu_{\bar{X}} = \mu_X$$

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} \quad \text{(standard error of the mean, SEM)}$$

Characteristics of the distribution: Central limit theorem (CLT)

| POPULATION | SAMPLE |
| --- | --- |
| $X = x$: life expectancy of a person in a population | $\bar{X} = \bar{x}$: average life expectancy of persons in a sample |
| normal distribution | normal distribution for any n |
| not normal/not known distribution | CLT: if n is large enough ($n \geq 30$), approximated by normal distribution; the larger n, the better the approximation |

http://onlinestatbook.com/stat_sim/sampling_dist/index.html
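The behaviour of the sample means can also be explored with a small simulation. The sketch below makes a loud assumption: the population is taken to be uniformly distributed, with limits chosen so that its mean is 72 and its standard deviation is 18.2. This shape is not a model of real life expectancy; it only serves as a clearly non-normal population. The function name, the seed and the number of samples are likewise chosen for illustration.

```python
import math
import random
import statistics

MU, SIGMA = 72.0, 18.2                 # population mean and SD from Example 3
HALF_WIDTH = SIGMA * math.sqrt(3)      # uniform on [MU - hw, MU + hw] has SD = SIGMA


def draw_sample_means(n, num_samples, rng):
    """Means of num_samples random samples of size n from the assumed uniform population."""
    means = []
    for _ in range(num_samples):
        sample = [rng.uniform(MU - HALF_WIDTH, MU + HALF_WIDTH) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


rng = random.Random(0)                 # fixed seed so the run is repeatable
means = draw_sample_means(n=40, num_samples=5000, rng=rng)

print("mean of sample means:", round(statistics.fmean(means), 2))
print("SD of sample means:  ", round(statistics.stdev(means), 2))
print("sigma / sqrt(n):     ", round(SIGMA / math.sqrt(40), 2))
```

The mean of the simulated sample means stays close to the population mean, and their standard deviation stays close to $\sigma_X / \sqrt{n}$, although the population itself is not normal.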

Q3.1: Consider that X has a normal distribution: $N_X(72, 18.2)$. What is the distribution of $\bar{X}$ if n = 10 or n = 40?

n = 10: normal; n = 40: normal

Q3.2: Consider that the distribution of X is not known/not normal. What is the distribution of $\bar{X}$ if n = 10 or n = 40?

n = 10: not known/not normal; n = 40: approximated by normal

Q3.3: What are the mean of $\bar{X}$ and the standard deviation of $\bar{X}$ (standard error of the mean) if n = 40?

$$\mu_{\bar{X}} = \mu_X = 72$$

$$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{18.2}{\sqrt{40}} = 2.88$$

$N_{\bar{X}}(72, 2.88)$
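The dependence of the standard error on the sample size can be tabulated in a few lines. A minimal sketch, with the function name `standard_error` and the extra sample sizes (10 and 160) chosen here for illustration:

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return sigma / math.sqrt(n)


for n in (10, 40, 160):
    print(f"n = {n:3d}: SEM = {standard_error(18.2, n):.2f} years")
```

Because of the square root, the sample size has to be quadrupled to halve the standard error.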

Q3.4: Find $P(X < 70)$ and $P(\bar{X} < 70)$.

$P(X < 70)$: What is the probability that the life expectancy of a person in the population is less than 70 years?

$N_X(72, 18.2)$

$$z = \frac{x - \mu}{\sigma} = \frac{70 - 72}{18.2} = -0.109 \approx -0.11$$

$P(X < 70) = 0.4562 \rightarrow 45.62\%$

$P(\bar{X} < 70)$: What is the probability that the average life expectancy of persons in a sample is less than 70 years?

$N_{\bar{X}}(72, 2.88)$

$$z = \frac{x - \mu}{\sigma} = \frac{\bar{x} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu}{\sigma_X / \sqrt{n}} = \frac{70 - 72}{2.88} \approx -0.7$$

$P(\bar{X} < 70) = 0.2420 \rightarrow 24.2\%$
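The two probabilities of Q3.4 can be compared side by side. This is a minimal sketch assuming n = 40 (as in Q3.3) and a normally distributed X. It evaluates the CDF at unrounded z scores, so the values may differ slightly from table lookups, which use rounded z. The variable names are chosen here.

```python
import math
from statistics import NormalDist

mu, sigma, n = 72, 18.2, 40

population = NormalDist(mu=mu, sigma=sigma)                  # distribution of X
sample_mean = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))  # distribution of the sample mean

p_person = population.cdf(70)   # P(X < 70) for a single person
p_sample = sample_mean.cdf(70)  # P(sample mean < 70) for samples of size n

print(f"single person: {p_person:.4f}")
print(f"sample mean:   {p_sample:.4f}")
```

The sample mean is less likely than a single person to fall below 70 years, because the distribution of sample means is narrower (SEM instead of the population standard deviation) around the same mean.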