claud-todd
DATA Exploration: Statistics (One Variable)
1. Basic Excel/MATLAB functions for data exploration
2. Measures of central tendency, distributions
   1. Mean
   2. Median
   3. Mode
3. Measures of spread
   1. Range
   2. Variance
4. Simple sampling
5. Example of sampling using Excel
1. Working with Data in Excel: Arithmetic
Use “Insert” then “Function” then “All” or “Statistical” to find an alphabetical list of functions
1. Summary Statistics in Excel (One Variable)
1. Summary Statistics in Excel (Average)
1. Summary Statistics in Excel (Median)
1. Summary Statistics in Excel (Standard Deviation)
1. Summary Statistics in Excel (RAND and RANDBETWEEN)
1. Summary Statistics in Excel (Sort)
1. Summary Statistics in MATLAB
Function   Description
max        Maximum value
mean       Average or mean value
median     Median value
min        Smallest value
mode       Most frequent value
std        Standard deviation
var        Variance, which measures the spread or dispersion of the values
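For readers without MATLAB, the same summary statistics can be sketched with Python's standard library. The data values below are hypothetical and chosen only for illustration. As with MATLAB's default `std` and `var`, the `statistics.stdev` and `statistics.variance` functions divide by n - 1.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9]

summary = {
    "max": max(data),                    # maximum value
    "mean": statistics.mean(data),       # average or mean value
    "median": statistics.median(data),   # median value
    "min": min(data),                    # smallest value
    "mode": statistics.mode(data),       # most frequent value
    "std": statistics.stdev(data),       # sample standard deviation (n - 1)
    "var": statistics.variance(data),    # sample variance (n - 1)
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

The dictionary keys simply reuse the MATLAB function names from the table so the two can be compared side by side.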
2. Distributions: Continuous Probability Distributions
Uniform Probability Distribution
Normal Probability Distribution
Exponential Probability Distribution
[Figure: density curves f(x) versus x for the uniform, normal, and exponential distributions]
Uniform Probability Distribution
A random variable is uniformly distributed whenever the probability is proportional to the interval’s length.
The uniform probability density function is:
$f(x) = \dfrac{1}{b - a}$ for $a < x < b$, and $f(x) = 0$ elsewhere
where:
a = smallest value the variable can assume
b = largest value the variable can assume
Uniform Probability Distribution
Expected Value of x: $E(x) = \dfrac{a + b}{2}$
Variance of x: $\mathrm{Var}(x) = \dfrac{(b - a)^2}{12}$
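The uniform formulas can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, and the interval from 0 to 1 is chosen only because it matches the range of Excel's RAND function. It compares the closed-form mean and variance with estimates from simulated draws.

```python
import random

def uniform_pdf(x, a, b):
    """Density 1/(b - a) inside the interval (a, b), 0 elsewhere."""
    return 1 / (b - a) if a < x < b else 0.0

def uniform_mean(a, b):
    """E(x) = (a + b) / 2."""
    return (a + b) / 2

def uniform_variance(a, b):
    """Var(x) = (b - a)^2 / 12."""
    return (b - a) ** 2 / 12

# Illustrative interval (an assumption): 0 to 1.
a, b = 0.0, 1.0

# Simulate draws with a fixed seed so the run is repeatable.
rng = random.Random(0)
draws = [rng.uniform(a, b) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

print("formula mean:", uniform_mean(a, b), "simulated:", round(sample_mean, 4))
print("formula var: ", uniform_variance(a, b), "simulated:", round(sample_var, 4))
```

With this many draws the simulated values should land close to the formula values, though they will not match exactly.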
Normal Probability Distribution: Characteristics
The highest point on the normal curve is at the mean, which is also the median and mode.
The mean can be any numerical value: negative, zero, or positive.
[Figure: normal curves plotted against x with means of -10, 0, and 20]
3. Normal Probability Distribution: Characteristics
The standard deviation determines the width of the curve: larger values result in wider, flatter curves.
[Figure: two normal curves, one with $\sigma = 15$ and one with $\sigma = 25$]
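The effect of the standard deviation on the curve's shape can be illustrated with the normal density formula. This is a small Python sketch under stated assumptions: the function name `normal_pdf` is hypothetical, the mean is set to 0 for convenience, and the two standard deviations are the values 15 and 25 from the figure.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu = 0.0  # assumed mean, for illustration only

for sigma in (15, 25):
    peak = normal_pdf(mu, mu, sigma)       # height at the mean
    tail = normal_pdf(mu + 40, mu, sigma)  # height 40 units from the mean
    print(f"sigma={sigma}: peak={peak:.5f}, density at mu+40={tail:.5f}")
```

The curve with the larger standard deviation has the lower peak and the higher density far from the mean, which is what "wider, flatter" means here.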
Standard Normal Probability Distribution
Converting to the Standard Normal Distribution
$z = \dfrac{x - \mu}{\sigma}$
We can think of z as a measure of the number of standard deviations x is from $\mu$.
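The conversion to a standard normal value is a one-line computation. The Python sketch below uses a hypothetical helper name, `z_score`. The example numbers are borrowed from the SAT example later in this deck (population mean 990, population standard deviation 80, and an applicant score of 1090) purely for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Illustrative values taken from the SAT example in this deck.
z = z_score(1090, mu=990, sigma=80)
print(z)  # 1.25
```

A positive z means the value lies above the mean, and a negative z means it lies below.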
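3. Normal Probability Distribution: Characteristics
68.26% of the values of a normal random variable are within $\mu \pm 1\sigma$.
95.44% of the values of a normal random variable are within $\mu \pm 2\sigma$.
99.72% of the values of a normal random variable are within $\mu \pm 3\sigma$.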
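These percentages can be reproduced from the normal distribution using the error function in Python's standard library. This is a sketch, and the helper name `prob_within` is hypothetical. The computed values differ from the slide figures by about 0.01 percentage points, most likely because the slide figures were read from a rounded probability table.

```python
import math

def prob_within(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * prob_within(k):.2f}%")
```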
4. Sampling and Sampling Distributions
Simple Random Sampling
Point Estimation
Introduction to Sampling Distributions
Sampling Distribution of $\bar{x}$
Sampling Distribution of $\bar{p}$
Other Sampling Methods
4. Simple Random Sampling
Finite populations are often defined by lists such as:
Organization membership roster
Credit card account numbers
Inventory product numbers
A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
4. Point Estimation
In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
We refer to $\bar{x}$ as the point estimator of the population mean $\mu$.
s is the point estimator of the population standard deviation $\sigma$.
$\bar{p}$ is the point estimator of the population proportion p.
Sampling Distribution of $\bar{x}$: Process of Statistical Inference
1. Population with mean $\mu$ = ?
2. A simple random sample of n elements is selected from the population.
3. The sample data provide a value for the sample mean $\bar{x}$.
4. The value of $\bar{x}$ is used to make inferences about the value of $\mu$.
4. Simple Random Sampling
The population parameters of interest are the SAT scores and the percentage of students planning to live in dorms.
Now suppose that the necessary data on the current year’s applicants were not yet entered in the college’s database.
Furthermore, the Director of Admissions must obtain estimates of the population parameters of interest for a meeting taking place in a few hours.
She decides a sample of 30 applicants will be used.
The applicants were numbered, from 1 to 900, as their applications arrived.
4. Simple Random Sampling: Taking a Sample of 30 Applicants
Excel’s RAND function generates random numbers between 0 and 1.
Step 1: Assign a random number to each of the 900 applicants.
Step 2: Select the 30 applicants corresponding to the 30 smallest random numbers.
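The two steps can be sketched outside Excel as well. The Python version below is a minimal illustration, not the spreadsheet procedure itself: `random.random()` stands in for `=RAND()`, and the function name and the seed value are hypothetical choices made so the run is repeatable.

```python
import random

def simple_random_sample(population_ids, n, seed=None):
    """Select n elements by tagging each with a random number and keeping the n smallest."""
    rng = random.Random(seed)
    # Step 1: assign a random number in [0, 1) to every element, like =RAND().
    tagged = [(rng.random(), pid) for pid in population_ids]
    # Step 2: sort by random number (ascending) and keep the first n elements.
    tagged.sort()
    return [pid for _, pid in tagged[:n]]

applicants = range(1, 901)  # applicants numbered 1 to 900
sample = simple_random_sample(applicants, 30, seed=42)
print(sample)
```

Python's built-in `random.sample(population, n)` draws a simple random sample directly. The longer form is shown only because it mirrors the two spreadsheet steps.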
4. Using Excel to Select a Simple Random Sample
Excel Formula Worksheet
     A                  B           C                   D
1    Applicant Number   SAT Score   On-Campus Housing   Random Number
2    1                  1008        Yes                 =RAND()
3    2                  1025        No                  =RAND()
4    3                  952         Yes                 =RAND()
5    4                  1090        Yes                 =RAND()
6    5                  1127        Yes                 =RAND()
7    6                  1015        No                  =RAND()
8    7                  965         Yes                 =RAND()
9    8                  1161        No                  =RAND()
4. Using Excel to Select a Simple Random Sample
Excel Value Worksheet
     A                  B           C                   D
1    Applicant Number   SAT Score   On-Campus Housing   Random Number
2    1                  1008        Yes                 0.61021
3    2                  1025        No                  0.83762
4    3                  952         Yes                 0.58935
5    4                  1090        Yes                 0.19934
6    5                  1127        Yes                 0.86658
7    6                  1015        No                  0.60579
8    7                  965         Yes                 0.80960
9    8                  1161        No                  0.33224
4. Using Excel to Select a Simple Random Sample
Put Random Numbers in Ascending Order
Step 1 Select cells A2:A901
Step 2 Select the Data menu
Step 3 Choose the Sort option
Step 4 When the Sort dialog box appears:
  Choose Random Numbers in the Sort by text box
  Choose Ascending
  Click OK
Using Excel to Select a Simple Random Sample
Excel Value Worksheet (Sorted)
     A                  B           C                   D
1    Applicant Number   SAT Score   On-Campus Housing   Random Number
2    12                 1107        No                  0.00027
3    773                1043        Yes                 0.00192
4    408                991         Yes                 0.00303
5    58                 1008        No                  0.00481
6    116                1127        Yes                 0.00538
7    185                982         Yes                 0.00583
8    510                1163        Yes                 0.00649
9    394                1008        No                  0.00667
Point Estimation
$\bar{x}$ as Point Estimator of $\mu$
$\bar{x} = \dfrac{\sum x_i}{30} = \dfrac{29{,}910}{30} = 997$
s as Point Estimator of $\sigma$
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\dfrac{163{,}996}{29}} = 75.2$
$\bar{p}$ as Point Estimator of p
$\bar{p} = \dfrac{20}{30} = .67$
Note: Different random numbers would have identified a different sample, which would have resulted in different point estimates.
Summary of Point Estimates Obtained from a Simple Random Sample
Population Parameter                                    Parameter Value   Point Estimator                                         Point Estimate
$\mu$ = Population mean SAT score                       990               $\bar{x}$ = Sample mean SAT score                       997
$\sigma$ = Population std. deviation for SAT score      80                s = Sample std. deviation for SAT score                 75.2
p = Population proportion wanting campus housing        .72               $\bar{p}$ = Sample proportion wanting campus housing    .67
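The three point estimators can be computed with a few lines of Python. This is a sketch with a hypothetical function name, `point_estimates`. Only the eight rows visible in the sorted worksheet are used as input, so the results illustrate the arithmetic and are not the 30-applicant estimates. The last two lines recompute the sample mean and sample standard deviation from the totals given in the slides.

```python
import math

def point_estimates(scores, housing):
    """Return (sample mean, sample standard deviation, sample proportion answering 'Yes')."""
    n = len(scores)
    x_bar = sum(scores) / n
    # The sample standard deviation divides by n - 1.
    s = math.sqrt(sum((x - x_bar) ** 2 for x in scores) / (n - 1))
    p_bar = sum(1 for h in housing if h == "Yes") / n
    return x_bar, s, p_bar

# The eight rows visible in the sorted worksheet (not the full sample of 30).
scores = [1107, 1043, 991, 1008, 1127, 982, 1163, 1008]
housing = ["No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No"]

x_bar, s, p_bar = point_estimates(scores, housing)
print(f"x_bar={x_bar:.1f}, s={s:.1f}, p_bar={p_bar:.3f}")

# Recomputing from the totals reported for the full sample of 30.
slide_x_bar = 29910 / 30
slide_s = math.sqrt(163996 / 29)
print(f"slide x_bar={slide_x_bar:.0f}, slide s={slide_s:.1f}")
```

Because the eight rows are a subset, their estimates differ from those of the full sample, in the same way that a different random sample would give different point estimates.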
Other Sampling Methods
Stratified Random Sampling
Cluster Sampling
Systematic Sampling
Convenience Sampling
Judgment Sampling