OPIM 5103 Descriptive Statistics Random Sampling Intro to Probability and Discrete Distributions Jan Stallaert Professor of OPIM


OPIM 5103 Descriptive Statistics

Random Sampling

Intro to Probability and Discrete Distributions

Jan Stallaert

Professor of OPIM


Median


Measures of Central Tendency

Central Tendency

Average Median Mode

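Geometric Mean

Average (sample):

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Average (population):

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Geometric mean:

$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$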


Mean (Arithmetic Mean)

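• Mean (arithmetic mean) of data values

– Sample mean (n = sample size):

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

– Population mean (N = population size):

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$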


Mean (Arithmetic Mean)

• The most common measure of central tendency

• Affected by extreme values (outliers)

[Figure: two dot plots; the first on an axis from 0 to 10 with Mean = 5, the second on an axis from 0 to 14 with Mean = 6]

Excel function: =average(range)


Median

• Robust measure of central tendency

• Not affected by extreme values

• In an ordered array, the median is the “middle” number

[Figure: two dot plots; the first on an axis from 0 to 10, the second on an axis from 0 to 14; Median = 5 in both]

Excel function: =median(range)
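The contrast between the two measures can be sketched with Python's standard `statistics` module. The two data sets below are hypothetical, chosen so that the results agree with the values quoted on the slides (mean 5 versus 6, median 5 in both cases); they are not the slides' own data.

```python
from statistics import mean, median

# Hypothetical data sets for illustration only (not taken from the slides).
base = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # largest value replaced by an extreme value

mean_base, median_base = mean(base), median(base)
mean_out, median_out = mean(with_outlier), median(with_outlier)

# The mean moves with the extreme value; the median stays put.
print("base:        ", mean_base, median_base)
print("with outlier:", mean_out, median_out)
```

The same comparison in Excel would use =average(range) and =median(range) on the two ranges.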


Measures of Variation

Variation

• Range

• Interquartile Range

• Variance

– Population Variance

– Sample Variance

• Standard Deviation

– Population Standard Deviation

– Sample Standard Deviation

• Coefficient of Variation

Example

[Figure: two histograms of frequency by bin, each with a percentage axis from 0% to 120%; one uses bins 0 to 9, the other bins 0 to 4.5 in steps of 0.5]

Example

[Figure: two histograms of frequency by bin, each with a percentage axis from 0% to 120%; both use bins 0 to 9]


Range

• Measure of variation

• Difference between the largest and the smallest observations:

$$\text{Range} = X_{\text{Largest}} - X_{\text{Smallest}}$$

• Ignores the way in which data are distributed

[Figure: two dot plots on an axis from 7 to 12 with differently distributed data; in both, Range = 12 - 7 = 5]


Quartiles

• Split Ordered Data into 4 Quarters (25% of the observations in each), separated by $Q_1$, $Q_2$ and $Q_3$

• $Q_2$ = Median, A Measure of Central Tendency

Excel function: =quartile(range, number)

number = 0: minimum value

number = 1: Q1

…

number = 4: maximum value


Interquartile Range

• Measure of spread/dispersion

• Also known as midspread

– Spread in the middle 50%

• Difference between the first and third quartiles: $\text{IQR} = Q_3 - Q_1$

• Not affected by extreme values
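The range, quartiles and interquartile range can be sketched in Python as follows. The data set is hypothetical, and the `"inclusive"` interpolation method of `statistics.quantiles` is an assumption made here; other tools may use slightly different quartile conventions and give slightly different values.

```python
from statistics import quantiles

# Hypothetical ordered data for illustration only.
data = [7, 8, 9, 10, 11, 12]

# Range: largest minus smallest observation.
data_range = max(data) - min(data)

# Quartiles: three cut points that split the ordered data into 4 quarters.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print("range:", data_range)
print("Q1, Q2 (median), Q3:", q1, q2, q3)
print("IQR:", iqr)
```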


Variance

• Important measure of variation

• Shows variation about the mean

– Sample variance:

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

• “Average of squared deviations from the mean”

• “Standard deviation” = square root of variance


Excel functions

• Variance

=VAR(range)

• Standard Deviation

=STDEV(range)
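For readers working outside Excel, a rough Python counterpart is sketched below with the standard `statistics` module. `variance` and `stdev` divide by n - 1 (sample versions), while `pvariance` and `pstdev` divide by N (population versions). The data set is hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data for illustration only; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = variance(data)   # sum of squared deviations / (n - 1)
sample_sd = stdev(data)       # square root of the sample variance
pop_var = pvariance(data)     # sum of squared deviations / N
pop_sd = pstdev(data)         # square root of the population variance

print("sample:    ", sample_var, sample_sd)
print("population:", pop_var, pop_sd)
```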


Comparing Standard Deviations

[Figure: three dot plots, each on an axis from 11 to 21]

Data A: Mean = 15.5, s = 3.338

Data B: Mean = 15.5, s = .9258

Data C: Mean = 15.5, s = 4.57


Coefficient of Variation

• Measures relative variation

• Always in percentage (%)

• Shows variation relative to mean

• Is used to compare two or more sets of data measured in different units

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$


Comparing Coefficient of Variation

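• Stock A:

– Average price last year = $50

– Standard deviation = $5

• Stock B:

– Average price last year = $100

– Standard deviation = $5

• Coefficient of variation:

– Stock A:

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$50}\right) \cdot 100\% = 10\%$$

– Stock B:

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$100}\right) \cdot 100\% = 5\%$$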
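The stock comparison can be reproduced with a few lines of Python. The function name `coefficient_of_variation` is a label chosen for this sketch, not something from the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage of the mean."""
    return 100.0 * std_dev / mean

# Figures from the stock example: same standard deviation, different means.
cv_stock_a = coefficient_of_variation(std_dev=5, mean=50)
cv_stock_b = coefficient_of_variation(std_dev=5, mean=100)

print(f"Stock A: CV = {cv_stock_a:.0f}%")
print(f"Stock B: CV = {cv_stock_b:.0f}%")
```

Both stocks have the same standard deviation in dollars, but relative to its average price Stock A varies twice as much as Stock B.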


Exploratory Data Analysis

• Box-and-whisker plot

– Graphical display of data using 5-number summary: $X_{\text{smallest}}$, $Q_1$, Median ($Q_2$), $Q_3$, $X_{\text{largest}}$

[Figure: box-and-whisker plot on an axis from 4 to 12]


Coefficient of Correlation

• Measures the strength of the linear relationship between two quantitative variables

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$


Features of Correlation Coefficient

• Unit free

• Ranges between –1 and 1

• The closer to –1, the stronger the negative linear relationship

• The closer to 1, the stronger the positive linear relationship

• The closer to 0, the weaker any linear relationship
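A minimal sketch of the coefficient of correlation, written directly from its definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name and the small data sets are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Coefficient of correlation r between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: a perfect positive and a perfect negative linear relation.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 6, 8, 10])
r_down = correlation(x, [10, 8, 6, 4, 2])

print(r_up, r_down)
```

Because deviations are divided by their own scale, changing the units of X or Y leaves r unchanged, which is the "unit free" property.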


Scatter Plots of Data with Various Correlation Coefficients

[Figure: five scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = .6 and r = 1]


Producing Data

• Sampling methods

• Survey Errors


Probability Sampling

• Subjects of the sample are chosen based on known probabilities

Probability Samples

• Simple Random

• Systematic

• Stratified

• Cluster


Simple Random Samples

• Every individual or item from the frame has an equal chance of being selected

• Selection may be with replacement or without replacement

• Samples obtained from table of random numbers or computer random number generators
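As a sketch of drawing a simple random sample with a computer random number generator, the snippet below uses Python's standard `random` module. The frame of 64 numbered individuals, the sample size of 8 and the seed are all assumptions made for illustration.

```python
import random

rng = random.Random(42)        # fixed seed only so the sketch is repeatable
frame = list(range(1, 65))     # hypothetical frame: individuals numbered 1..64

# Without replacement: no individual can be selected more than once.
without_replacement = rng.sample(frame, k=8)

# With replacement: the same individual may be selected more than once.
with_replacement = rng.choices(frame, k=8)

print(without_replacement)
print(with_replacement)
```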


Random Samples


Systematic Samples

• Decide on sample size: n

• Divide frame of N individuals into groups of k individuals: k = N/n

• Randomly select one individual from the 1st group

• Select every k-th individual thereafter

Example: N = 64, n = 8, k = 8

[Figure: frame divided into groups, with the first group labeled “First Group”]
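The systematic sampling steps can be sketched as below, using the N = 64, n = 8 example. The function name, the numbered frame and the seed are assumptions for illustration; the sketch also assumes N is a multiple of n.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick one random individual from the first group, then every k-th after."""
    k = len(frame) // n              # group size, k = N / n
    start = rng.randrange(k)         # random position within the first group
    return frame[start::k][:n]

frame = list(range(1, 65))           # hypothetical frame, N = 64
sample = systematic_sample(frame, n=8, rng=random.Random(7))

print(sample)
```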


Stratified Samples

• Population divided into two or more groups according to some common characteristic

• Simple random sample selected from each group

• The two or more samples are combined into one
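A minimal sketch of a stratified sample: a simple random sample is drawn from each group and the results are combined. The two strata, their sizes and the per-stratum sample sizes are hypothetical.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample from each stratum and combine them."""
    combined = []
    for name, members in strata.items():
        combined.extend(rng.sample(members, sizes[name]))
    return combined

# Hypothetical population split by a common characteristic.
strata = {
    "undergraduate": [f"U{i}" for i in range(60)],
    "graduate": [f"G{i}" for i in range(40)],
}
sizes = {"undergraduate": 6, "graduate": 4}   # proportional to stratum size

sample = stratified_sample(strata, sizes, random.Random(0))
print(sample)
```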


Advantages and Disadvantages

• Simple random sample and systematic sample

– Simple to use

– May not be a good representation of the population’s underlying characteristics

• Stratified sample

– Ensures representation of individuals across the entire population

• Cluster sample

– More cost effective

– Less efficient (need larger sample to acquire the same level of precision)


Key Definitions

• A population (universe) is the collection of things under consideration

• A sample is a portion of the frame selected for analysis

• A parameter is a summary measure computed to describe a characteristic of the population

• A statistic is a summary measure computed to describe a characteristic of the sample


Population and Sample

• Population: use parameters to summarize features

• Sample: use statistics to summarize features

• Inference on the population from the sample


Reasons for Drawing a Sample

• Less time consuming than a census

• Less costly to administer than a census

• Less cumbersome and more practical to administer than a census of the targeted population


Evaluating Survey Worthiness

• What is the purpose of the survey?

• Is the survey based on a probability sample?

• Coverage error – appropriate frame

• Nonresponse error – follow up

• Measurement error – good questions elicit good responses

• Sampling error – always exists when sample ≠ population


Types of Survey Errors

• Coverage error: Excluded from frame.

• Non response error: Follow up on non responses.

• Sampling error: Chance differences from sample to sample.

• Measurement error: Bad Question!


Measurement Errors

• Question Phrasing

– Avoid negations

• Telescoping Effect

• “Halo” Effect

• Overzealous/Underzealous


Probability


Probability

• Probability is the numerical measure of the likelihood that an event will occur

• Value is between 0 and 1

• Sum of the probabilities of all mutually exclusive and collectively exhaustive events is 1

[Figure: probability scale from 0 (Impossible) through .5 to 1 (Certain)]


Computing Probabilities

• The probability of an event E:

$$P(E) = \frac{\text{number of event outcomes}}{\text{total number of possible outcomes in the sample space}} = \frac{X}{T}$$

• Each of the outcomes in the sample space is equally likely to occur

e.g. P(one die shows 6 and the other shows 4) = 2/36 (There are 2 ways to get one 6 and the other 4)
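The dice example can be checked by listing the sample space explicitly. This sketch assumes two fair six-sided dice, so that all 36 ordered outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two dice (T = 36).
outcomes = list(product(range(1, 7), repeat=2))

# Event: one die shows 6 and the other shows 4 (X = 2, namely (4, 6) and (6, 4)).
event = [roll for roll in outcomes if sorted(roll) == [4, 6]]

p_event = Fraction(len(event), len(outcomes))
print(len(event), len(outcomes), p_event)
```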


Empirical Probability

Example: Find the probability that a randomly selected person will be struck by lightning this year.

The sample space consists of two simple events: the person is struck by lightning or is not. Because these simple events are not equally likely, we can use the relative frequency approximation (Rule 1) or subjectively estimate the probability (Rule 3). Using Rule 1, we can research past events to determine that in a recent year 377 people were struck by lightning in the US, which has a population of about 274,037,295. Therefore,

P(struck by lightning in a year) ≈ 377 / 274,037,295 ≈ 1/727,000


Computing Joint Probability

• The probability of a joint event, A and B:

$$P(A \text{ and } B) = P(A \cap B) = \frac{\text{number of outcomes from both A and B}}{\text{total number of possible outcomes in sample space}}$$

E.g.

$$P(\text{Red Card and Ace}) = \frac{2 \text{ Red Aces}}{52 \text{ Total Number of Cards}} = \frac{1}{26}$$


Computing Compound Probability

• Probability of a compound event, A or B:

$$P(A \text{ or } B) = P(A \cup B) = \frac{\text{number of outcomes from either A or B or both}}{\text{total number of outcomes in sample space}}$$

E.g.

$$P(\text{Red Card or Ace}) = \frac{4 \text{ Aces} + 26 \text{ Red Cards} - 2 \text{ Red Aces}}{52 \text{ total number of cards}} = \frac{28}{52} = \frac{7}{13}$$


Compound Probability (Addition Rule)

P(A or B) = P(A) + P(B) - P(A and B)

For Mutually Exclusive Events: P(A or B) = P(A) + P(B)

[Figure: Venn diagram of events A and B; the overlap of P(A) and P(B) is P(A and B)]
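The addition rule can be verified on the card example by enumerating a standard 52-card deck. The helper names (`prob`, `is_red`, `is_ace`) are chosen for this sketch only.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))          # 52 equally likely cards

def prob(event):
    """Probability of an event = matching cards / total cards."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def is_ace(card):
    return card[0] == "A"

p_red = prob(is_red)
p_ace = prob(is_ace)
p_red_and_ace = prob(lambda c: is_red(c) and is_ace(c))
p_red_or_ace = prob(lambda c: is_red(c) or is_ace(c))

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p_red_or_ace, p_red + p_ace - p_red_and_ace)
```

Subtracting P(A and B) avoids counting the two red aces twice.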


Computing Conditional Probability

• The probability of event A given that event B has occurred:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

E.g.

$$P(\text{Red Card given that it is an Ace}) = \frac{2 \text{ Red Aces}}{4 \text{ Aces}} = \frac{1}{2}$$


Conditional Probability

          American   Int’l   Total
Men         0.25     0.15    0.40
Women       0.45     0.15    0.60
Total       0.70     0.30

Q: What is the probability that a randomly selected student is American, knowing that the student is female?
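One way to work the question, sketched in Python, is to divide the joint probability of "Women and American" by the marginal probability of "Women", following the definition of conditional probability. The dictionary layout is just a convenient encoding of the table.

```python
# Joint probabilities from the table, keyed by (gender, nationality).
joint = {
    ("Men", "American"): 0.25,
    ("Men", "Int'l"): 0.15,
    ("Women", "American"): 0.45,
    ("Women", "Int'l"): 0.15,
}

# Marginal probability that the student is a woman (row total).
p_women = sum(p for (gender, _), p in joint.items() if gender == "Women")

# P(American | Women) = P(American and Women) / P(Women)
p_american_given_women = joint[("Women", "American")] / p_women

print(round(p_women, 2), round(p_american_given_women, 2))
```

Note that this differs from the unconditional P(American) = 0.70, so nationality and gender are not independent in this table.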


Conditional Probability and Joint Probability

• Conditional probability:

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$

• Multiplication rule for joint probability:

$$P(A \text{ and } B) = P(A \mid B)\, P(B) = P(B \mid A)\, P(A)$$


Conditional Probability and Statistical Independence

• Events A and B are independent if

$$P(A \mid B) = P(A)$$

$$\text{or } P(B \mid A) = P(B)$$

$$\text{or } P(A \text{ and } B) = P(A)\, P(B)$$

• Events A and B are independent when the probability of one event, A, is not affected by another event, B


Example

• A company has two suppliers A and B. Rush orders are placed to both. If no raw material arrives in 4 days, the process shuts down.

– A can deliver within 4 days with 55% probability.

– B can deliver within 4 days with 35% probability.

1. What is the probability that A and B deliver within 4 days?

2. What is the probability the process shuts down?

3. What is the probability at least one delivers in 4 days?
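A possible worked solution is sketched below. It rests on an assumption the example does not state explicitly: that the two suppliers deliver independently of each other, so joint probabilities are products of the individual ones.

```python
# Probabilities of on-time delivery (within 4 days) from the example.
p_a = 0.55
p_b = 0.35

# ASSUMPTION: suppliers A and B deliver independently.
# 1. Both deliver within 4 days: P(A and B) = P(A) * P(B)
p_both = p_a * p_b

# 2. Process shuts down: neither delivers within 4 days.
p_shutdown = (1 - p_a) * (1 - p_b)

# 3. At least one delivers: complement of "neither delivers".
p_at_least_one = 1 - p_shutdown

print(round(p_both, 4), round(p_shutdown, 4), round(p_at_least_one, 4))
```

The third answer also follows from the addition rule: 0.55 + 0.35 - 0.1925 = 0.7075.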


Stock Trader’s Almanac

• 1998 stock trader’s almanac has 48 years of data (1950-1997)

• Stocks up in January: 31 times

• Stocks up in year: 36 times

• Stocks up in January AND year: 29 times
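One way to read these counts, treating each of the 48 years as an equally likely outcome (an assumption of this sketch), is to compare the conditional probability of an up year given an up January with the unconditional probability of an up year.

```python
from fractions import Fraction

YEARS = 48          # 1950-1997
UP_JANUARY = 31
UP_YEAR = 36
UP_BOTH = 29

p_january = Fraction(UP_JANUARY, YEARS)
p_year = Fraction(UP_YEAR, YEARS)
p_both = Fraction(UP_BOTH, YEARS)

# Conditional probability: P(year up | January up) = P(both) / P(January up)
p_year_given_january = p_both / p_january

# Independence would require P(both) == P(January up) * P(year up).
independent = p_both == p_january * p_year

print(float(p_year), float(p_year_given_january), independent)
```

In this data, an up January goes with a noticeably higher relative frequency of an up year (29/31 versus 36/48), so the two events do not look independent.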


Binomial Probability Distribution

• ‘n’ identical trials

– e.g.: 15 tosses of a coin; ten light bulbs taken from a warehouse

• Two mutually exclusive outcomes on each trial

– e.g.: Head or tail in each toss of a coin; defective or not defective light bulb

• Trials are independent

– The outcome of one trial does not affect the outcome of the other

• Constant probability for each trial

– e.g.: Probability of getting a tail is the same each time we toss the coin


Excel’s Binomial Function

=BINOMDIST(no. of successes, no. of trials, prob. of success, cumulative?)

Example

=BINOMDIST(2,8,0.5, FALSE) (=0.11)

“Probability of tossing (exactly) two heads within 8 trials”

=BINOMDIST(2,8,0.5, TRUE) (=0.14)

“Probability of tossing two heads or fewer within 8 trials”
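The two Excel results can be cross-checked with a short Python sketch of the binomial probability, C(n, k) p^k (1 - p)^(n - k). The function names `binom_pmf` and `binom_cdf` are chosen here for illustration and are not Excel or library functions.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

exactly_two = binom_pmf(2, 8, 0.5)    # counterpart of cumulative? = FALSE
two_or_fewer = binom_cdf(2, 8, 0.5)   # counterpart of cumulative? = TRUE

print(round(exactly_two, 2), round(two_or_fewer, 2))
```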


Binomial Setting

Examples

• Number of times newspaper arrives on time (i.e., before 7:30 AM) in a week/month

• Number of times I roll “5” on a die in 20 rolls

• Number of times I toss heads within 20 trials

• Students pick random number between 1 and 10. Number of students who picked “7”

• Number of people who will vote “Republican” in a group of 20

• Number of left-handed people in a group of 40


Service Center Staffing

Assumptions

- 50 computers sold

- Prob. customer calls for service = 0.02

- Want < 5% probability that there is no engineer

Calls   Probability   Cumul. Prob.
0       0.36417       0.36417
1       0.371602      0.735771
2       0.185801      0.921572
3       0.06067       0.982242
4       0.014548      0.99679
5       0.002732      0.999522
6       0.000418      0.99994
7       5.36E-05      0.999994
8       5.88E-06      0.999999
9       5.6E-07       1
10      4.69E-08      1
11      3.48E-09      1
12      2.31E-10      1
13      1.38E-11      1
14      7.42E-13      1
15      3.64E-14      1
16      1.62E-15      1
17      6.63E-17      1
18      2.48E-18      1
19      8.52E-20      1
20      2.7E-21       1
21      7.86E-23      1
22      2.11E-24      1
23      5.25E-26      1
24      1.21E-27      1
25      2.56E-29      1
26      5.02E-31      1
27      9.11E-33      1

[Figure: Probability and Cumul. Prob. plotted against Number of Service Calls (0 to 50)]
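The staffing question can be sketched as a search over the cumulative binomial distribution with n = 50 and p = 0.02. Two assumptions are made here that the slide does not spell out: customers call independently of each other, and each service call occupies one engineer, so "no engineer" means more calls than engineers.

```python
from math import comb

N_COMPUTERS = 50      # computers sold
P_CALL = 0.02         # probability a customer calls for service
MAX_RISK = 0.05       # acceptable probability of having no engineer free

def binom_pmf(k, n, p):
    """Probability of exactly k calls out of n customers."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def engineers_needed(n, p, max_risk):
    """Smallest staff level k with P(calls > k) below max_risk."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if 1.0 - cumulative < max_risk:
            return k
    return n

staff = engineers_needed(N_COMPUTERS, P_CALL, MAX_RISK)
print("engineers needed:", staff)
```

With two engineers the chance of more calls than engineers is still about 7.8%; with three it drops to about 1.8%.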


Poisson Distribution

• Poisson Process:

– Discrete events in an “interval”

• The probability of One Success in an interval is stable

• The probability of More than One Success in this interval is 0

– The probability of success is independent from interval to interval

– e.g.: number of customers arriving in 15 minutes

– e.g.: number of defects per case of light bulbs

$$P(X = x \mid \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$


Excel’s Poisson Function

=POISSON(no. of occurrences, mean, cumulative?)

Example

=POISSON(5,2,FALSE) (=0.036)

“Probability that (exactly) five customers arrive within an hour when the overall average is two”

=POISSON(5,2,TRUE) (=0.983)

“Probability that five or fewer customers arrive within an hour when the overall average is two”
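As a cross-check of the two Excel results, the Poisson probability e^(-λ) λ^x / x! can be sketched in Python. The function names `poisson_pmf` and `poisson_cdf` are chosen here for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """Probability of x or fewer occurrences when the mean is lam."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))

exactly_five = poisson_pmf(5, 2)     # counterpart of cumulative? = FALSE
five_or_fewer = poisson_cdf(5, 2)    # counterpart of cumulative? = TRUE

print(round(exactly_five, 3), round(five_or_fewer, 3))
```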


Poisson Setting

Examples

• Number of accidents at an intersection in 6 months

• Number of people entering a bank in a 30-minute interval

• Number of kids ringing the doorbell in 30 minutes for Halloween

• Number of times a Microsoft machine crashes within 24 hours

• Number of sewing flaws per (100) garment(s)


Halloween

Assume: on average 4 kids/hour (= lambda)

Note: the tabulated values match a Poisson mean of 8 rather than 4, since P(0) = e^{-8} ≈ 0.000335.

Kids    Probability   Cum. Prob.
0       0.000335      0.000335
1       0.002684      0.003019
2       0.010735      0.013754
3       0.028626      0.04238
4       0.057252      0.099632
5       0.091604      0.191236
6       0.122138      0.313374
7       0.139587      0.452961
8       0.139587      0.592547
9       0.124077      0.716624
10      0.099262      0.815886
11      0.07219       0.888076
12      0.048127      0.936203
13      0.029616      0.965819
14      0.016924      0.982743
15      0.009026      0.991769
16      0.004513      0.996282
17      0.002124      0.998406
18      0.000944      0.99935
19      0.000397      0.999747
20      0.000159      0.999906

[Figure: “A Poisson Distribution”, showing Probability and Cum. Prob. for each count]