
Data Exploration: Statistics (One Variable)

1. Basic Excel/MATLAB functions for data exploration
2. Measures of central tendency, Distributions
   1. Mean
   2. Median
   3. Mode
3. Measures of spread
   1. Range
   2. Variance
4. Simple Sampling
5. Example of Sampling by using Excel


1. Working with Data in Excel: Arithmetic

1. Summary Statistics in Excel (One Variable)

Use "Insert", then "Function", then "All" or "Statistical" to find an alphabetical list of functions.

1. Summary Statistics in Excel (Average)

1. Summary Statistics in Excel (Median)

1. Summary Statistics in Excel (Standard Deviation)

1. Summary Statistics in Excel (RAND & RANDBETWEEN)

1. Summary Statistics in Excel (Sort)

1. Summary Statistics in MATLAB

Function   Description
max        Maximum value
mean       Average or mean value
median     Median value
min        Smallest value
mode       Most frequent value
std        Standard deviation
var        Variance, which measures the spread or dispersion of the values
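As a rough cross-check of the functions listed for MATLAB, the same one-variable summary can be sketched in Python with the standard `statistics` module. This is only an illustration, not part of the original slides: the helper name `summarize` is hypothetical, and the data are the eight SAT scores visible on the Excel worksheet later in the deck. The sketch uses the sample (n - 1) forms of the standard deviation and variance, and breaks ties for the mode by taking the smallest value.

```python
import statistics


def summarize(values):
    """Return the one-variable summary statistics named on the slide."""
    return {
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        # With ties, take the smallest of the most frequent values.
        "mode": min(statistics.multimode(values)),
        # Sample (n - 1) standard deviation and variance.
        "std": statistics.stdev(values),
        "var": statistics.variance(values),
    }


# The eight SAT scores shown on the Excel worksheet slides.
sat_scores = [1008, 1025, 952, 1090, 1127, 1015, 965, 1161]

summary = summarize(sat_scores)
for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
```

Because every score in this small list is distinct, the mode is not informative here; it is included only to mirror the function table.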

2. Distributions: Continuous Probability Distributions

Uniform Probability Distribution
Normal Probability Distribution
Exponential Probability Distribution

[Figure: sketches of the uniform, normal, and exponential density curves, each plotting f(x) against x]

Uniform Probability Distribution

A random variable is uniformly distributed whenever the probability is proportional to the interval's length.

The uniform probability density function is:

f(x) = 1/(b - a)   for a < x < b
     = 0           elsewhere

where:
a = smallest value the variable can assume
b = largest value the variable can assume

Uniform Probability Distribution

Expected Value of x
E(x) = (a + b)/2

Variance of x
Var(x) = (b - a)^2/12
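The uniform density, expected value, and variance formulas can be checked with a few lines of Python. This is a minimal sketch rather than anything from the slides: the function names are hypothetical, and the endpoints a = 120 and b = 140 are made-up values chosen only to give round numbers.

```python
def uniform_pdf(x, a, b):
    """Density f(x) = 1/(b - a) for a < x < b, and 0 elsewhere."""
    return 1.0 / (b - a) if a < x < b else 0.0


def uniform_mean(a, b):
    """Expected value E(x) = (a + b)/2."""
    return (a + b) / 2


def uniform_variance(a, b):
    """Variance Var(x) = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12


def uniform_probability(c, d, a, b):
    """P(c < x < d) for a <= c <= d <= b: proportional to the interval length."""
    return (d - c) / (b - a)


# Hypothetical endpoints, used only for illustration.
a, b = 120, 140
print(uniform_pdf(130, a, b))
print(uniform_mean(a, b))
print(uniform_variance(a, b))
print(uniform_probability(120, 130, a, b))
```

The last call illustrates the defining property: an interval covering half of the range carries half of the probability.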

Normal Probability Distribution

Characteristics

The highest point on the normal curve is at the mean, which is also the median and mode.

Normal Probability Distribution

Characteristics

The mean can be any numerical value: negative, zero, or positive.

[Figure: normal curves centered at means of -10, 0, and 20 on the x axis]

3. Normal Probability Distribution

Characteristics

The standard deviation determines the width of the curve: larger values result in wider, flatter curves.

[Figure: two normal curves on the x axis, one with σ = 15 and a wider, flatter one with σ = 25]
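The effect of the standard deviation on the shape of the curve can be seen numerically. The sketch below is an illustration under stated assumptions: it uses the standard normal density formula, which does not appear on the slides, together with the two standard deviations from the figure (15 and 25) and a hypothetical mean of 0. The function name `normal_pdf` is also hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Standard normal density formula (assumed here; not shown on the slides)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


mu = 0  # hypothetical mean; any value gives the same comparison
for sigma in (15, 25):
    peak = normal_pdf(mu, mu, sigma)       # height at the mean
    tail = normal_pdf(mu + 40, mu, sigma)  # height 40 units from the mean
    print(f"sigma={sigma}: peak={peak:.5f}, height at mean+40={tail:.5f}")
```

The curve with the larger standard deviation has the lower peak but is higher far from the mean, which is what "wider, flatter" means in practice.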

Standard Normal Probability Distribution

Converting to the Standard Normal Distribution

z = (x - μ)/σ

We can think of z as a measure of the number of standard deviations x is from μ.
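The conversion to a standard normal value is a one-line calculation. The following sketch assumes the usual definition z = (x - μ)/σ. The function name `z_score` and the x values are hypothetical; the mean of 990 and standard deviation of 80 are borrowed from the SAT example later in the deck.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Mean and standard deviation taken from the SAT example; x values are made up.
mu, sigma = 990, 80
for x in (830, 990, 1070):
    print(x, z_score(x, mu, sigma))
```

A negative z means the value lies below the mean, and a z of 0 means it equals the mean.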

3. Normal Probability Distribution

Characteristics

[Figure: normal curve with the x axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, and μ + 3σ]

68.26% of values lie within μ ± 1σ
95.44% of values lie within μ ± 2σ
99.72% of values lie within μ ± 3σ
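These three percentages can be reproduced from the error function, since for a normal variable P(|x - μ| < kσ) = erf(k/√2). The sketch below is an illustration with a hypothetical function name. Computed this way the values come out as roughly 68.27%, 95.45%, and 99.73%; the slide's figures differ in the last digit, which is consistent with their having been read from a four-decimal normal table.

```python
import math


def coverage_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * coverage_within(k):.2f}%")
```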

4. Sampling and Sampling Distributions

Simple Random Sampling
Point Estimation
Introduction to Sampling Distributions
Sampling Distribution of x̄
Sampling Distribution of p̄
Other Sampling Methods

4. Simple Random Sampling

Finite populations are often defined by lists such as:
  Organization membership roster
  Credit card account numbers
  Inventory product numbers

A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
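The definition can be made concrete by listing every possible sample from a very small population. The sketch below is only an illustration: the five-element population and its labels are hypothetical, and the final line simply counts how many samples of size 30 exist in a population of 900.

```python
from itertools import combinations
from math import comb

# Hypothetical population of N = 5 elements and sample size n = 2.
population = ["A", "B", "C", "D", "E"]
n = 2

# Every possible sample of size n (order does not matter).
samples = list(combinations(population, n))

# Under simple random sampling each of these is equally likely.
probability_each = 1 / len(samples)

print(len(samples), "possible samples, each with probability", probability_each)

# For N = 900 and n = 30 the count is far too large to list.
print("digits in the number of samples of 30 from 900:", len(str(comb(900, 30))))
```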

4. Point Estimation

In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.

We refer to x̄ as the point estimator of the population mean μ.

s is the point estimator of the population standard deviation σ.

p̄ is the point estimator of the population proportion p.

Sampling Distribution of x̄

Process of Statistical Inference

Population with mean μ = ?

A simple random sample of n elements is selected from the population.

The sample data provide a value for the sample mean x̄.

The value of x̄ is used to make inferences about the value of μ.

4. Simple Random Sampling

The population parameters of interest are the SAT scores and the percentage of students planning to live in dorms.

Now suppose that the necessary data on the current year's applicants were not yet entered in the college's database.

Furthermore, the Director of Admissions must obtain estimates of the population parameters of interest for a meeting taking place in a few hours.

She decides a sample of 30 applicants will be used.

The applicants were numbered, from 1 to 900, as their applications arrived.

4. Simple Random Sampling

Taking a Sample of 30 Applicants

Step 1: Assign a random number to each of the 900 applicants.

Excel's RAND function generates random numbers between 0 and 1.

Step 2: Select the 30 applicants corresponding to the 30 smallest random numbers.
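The two steps can be sketched outside Excel as well. The Python below is a minimal illustration, not the deck's own method: `random.random()` stands in for Excel's RAND, and the function name `select_sample` and its `seed` parameter are hypothetical (the seed is there only so that a run can be repeated).

```python
import random


def select_sample(applicant_numbers, sample_size, seed=None):
    """Pick a simple random sample by sorting on a random key."""
    rng = random.Random(seed)
    # Step 1: assign a random number in [0, 1) to each applicant.
    keyed = [(rng.random(), number) for number in applicant_numbers]
    # Step 2: keep the applicants with the smallest random numbers.
    keyed.sort()
    return [number for _, number in keyed[:sample_size]]


# Applicants are numbered from 1 to 900, as on the slides.
applicants = range(1, 901)
sample = select_sample(applicants, 30, seed=1)
print(sample)
```

Calling the function with a different seed, or with no seed, identifies a different set of 30 applicants.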

4. Using Excel to Select a Simple Random Sample

Excel Formula Worksheet

     A                  B           C                   D
1    Applicant Number   SAT Score   On-Campus Housing   Random Number
2    1                  1008        Yes                 =RAND()
3    2                  1025        No                  =RAND()
4    3                  952         Yes                 =RAND()
5    4                  1090        Yes                 =RAND()
6    5                  1127        Yes                 =RAND()
7    6                  1015        No                  =RAND()
8    7                  965         Yes                 =RAND()
9    8                  1161        No                  =RAND()

4. Using Excel to Select a Simple Random Sample

Excel Value Worksheet

     A                  B           C                   D
1    Applicant Number   SAT Score   On-Campus Housing   Random Number
2    1                  1008        Yes                 0.61021
3    2                  1025        No                  0.83762
4    3                  952         Yes                 0.58935
5    4                  1090        Yes                 0.19934
6    5                  1127        Yes                 0.86658
7    6                  1015        No                  0.60579
8    7                  965         Yes                 0.80960
9    8                  1161        No                  0.33224

4. Using Excel to Select a Simple Random Sample

Put Random Numbers in Ascending Order

Step 1 Select cells A2:A901
Step 2 Select the Data menu
Step 3 Choose the Sort option
Step 4 When the Sort dialog box appears:
  Choose Random Numbers in the Sort by text box
  Choose Ascending
  Click OK

Using Excel to Select a Simple Random Sample

Excel Value Worksheet (Sorted)

     A                  B           C                   D
1    Applicant Number   SAT Score   On-Campus Housing   Random Number
2    12                 1107        No                  0.00027
3    773                1043        Yes                 0.00192
4    408                991         Yes                 0.00303
5    58                 1008        No                  0.00481
6    116                1127        Yes                 0.00538
7    185                982         Yes                 0.00583
8    510                1163        Yes                 0.00649
9    394                1008        No                  0.00667

Point Estimation

x̄ as Point Estimator of μ
x̄ = Σx_i / 30 = 29,910 / 30 = 997

s as Point Estimator of σ
s = sqrt( Σ(x_i - x̄)^2 / 29 ) = sqrt( 163,996 / 29 ) = 75.2

p̄ as Point Estimator of p
p̄ = 20 / 30 = .68

Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
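The three point estimates can be recomputed from the totals shown on this slide. The sketch below is only an arithmetic check with hypothetical variable names; it uses the reported sum of scores (29,910), sum of squared deviations (163,996), and count of applicants wanting housing (20), because the 30 individual observations are not listed in the deck. Note that 20/30 evaluates to about .667, whereas the slide reports .68; the underlying count cannot be recovered from the slides.

```python
import math

n = 30                     # sample size
sum_x = 29_910             # reported sum of the 30 SAT scores
sum_sq_dev = 163_996       # reported sum of squared deviations from the sample mean
count_housing = 20         # reported number wanting on-campus housing

x_bar = sum_x / n                      # point estimate of the population mean
s = math.sqrt(sum_sq_dev / (n - 1))    # point estimate of the population std. deviation
p_bar = count_housing / n              # point estimate of the population proportion

print(f"x_bar = {x_bar:.0f}")
print(f"s     = {s:.1f}")
print(f"p_bar = {p_bar:.3f}")
```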

Summary of Point Estimates Obtained from a Simple Random Sample

Population Parameter                               Parameter Value   Point Estimator                                Point Estimate
μ = Population mean SAT score                      990               x̄ = Sample mean SAT score                      997
σ = Population std. deviation for SAT score        80                s = Sample std. deviation for SAT score        75.2
p = Population proportion wanting campus housing   .72               p̄ = Sample proportion wanting campus housing   .68

Other Sampling Methods

Stratified Random Sampling
Cluster Sampling
Systematic Sampling
Convenience Sampling
Judgment Sampling