zahra-zulaikha
Dec 15, 2011 with Ma'am Daisy
Teaching Basic Statistics
INTRODUCTION TO STATISTICS AND
STATISTICAL INFERENCE
Session 1.2
TEACHING BASIC STATISTICS
Session 1.3
Realities about Statistics
“There are three kinds of lies: lies, damned lies, and statistics.” – Mark Twain
One cannot go about without statistics.
“Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” – Aaron Levenstein
Session 1.4
Definition of Statistics
plural sense: numerical facts, e.g. CPI, peso-dollar exchange rate
singular sense: scientific discipline consisting of theory and methods for processing numerical information that one can use when making decisions in the face of uncertainty.
Session 1.5
History of Statistics
The term statistics came from the Latin phrase “ratio status” which means study of practical politics or the statesman’s art.
In the middle of the 18th century, the term statistik (a term due to Achenwall) was used, a German term defined as “the political science of several countries.”
From statistik it became statistics defined as a statement in figures and facts of the present condition of a state.
Session 1.6
Application of Statistics
Diverse applications
“During the 20th Century statistical thinking and methodology have become the scientific framework for literally dozens of fields including education, agriculture, economics, biology, and medicine, and with increasing influence recently on the hard sciences such as astronomy, geology, and physics. In other words, we have grown from a small obscure field into a big obscure field.” – Brad Efron
Session 1.7
Application of Statistics
Comparing the effects of five kinds of fertilizers on the yield of a particular variety of corn
Determining the income distribution of Ateneo students under CHED
Comparing the effectiveness of two diet programs
Prediction of daily temperatures
Evaluation of student performance
Session 1.8
Two Aims of Statistics
Statistics aims to uncover structure in data, to explain variation…
Descriptive Inferential
Session 1.9
Areas of Statistics
Descriptive statistics: methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a larger group
Inferential statistics: methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data
Session 1.10
Examples of Descriptive Statistics
Presenting the Philippine population by constructing a graph indicating the total number of Filipinos counted during the last census by age group and sex
The Department of Social Welfare and Development (DSWD) cited statistics showing an increase in the number of child abuse cases during the past five years.
Session 1.11
Examples of Inferential Statistics
A new milk formulation designed to improve the psychomotor development of infants was tested on randomly selected infants. Based on the results, it was concluded that the new milk formulation is effective in improving the psychomotor development of infants.
Session 1.12
Inferential Statistics
[Diagram: a smaller set (n units/observations) is taken from a larger set (N units/observations); inferences and generalizations about the larger set are drawn from the smaller set.]
Session 1.13
Key Definitions
A variable is a characteristic observed or measured on every unit of the universe.
A population is the set of all possible values of the variable.
Session 1.14
Key Definitions
Parameters are numerical measures that describe the population or universe of interest. They are usually denoted by Greek letters: μ (mu), σ (sigma), ρ (rho), λ (lambda), τ (tau), θ (theta), α (alpha), and β (beta).
Statistics are numerical measures of a sample.
Session 1.15
Types of Variables
[Diagram: VARIABLES split into Qualitative and Quantitative; Quantitative splits into Discrete and Continuous.]
Qualitative variable: non-numerical values
Quantitative variable: numerical values
a. Discrete: countable
b. Continuous: measurable
Session 1.16
Levels of Measurement
1. Nominal scale: numbers or symbols used to classify
2. Ordinal scale: accounts for order; no indication of distance between positions
3. Interval scale: equal intervals; no absolute zero
4. Ratio scale: has absolute zero
Session 1.17
NOMINAL SCALE
A nominal scale consists of a set of categories that have different names.
Measurements on a nominal scale label and categorize observations, but do not make any quantitative distinctions between observations.
Variables measured at the nominal scale:
Gender (1=male, 0=female)
ZIP code (7000=Philippines, …)
Plate numbers of vehicles (JK3429, MC001, …)
Course (Biology, Mathematics, History, …)
Race (Asian, American, …)
Eye color (Brown, Blue, …)
Session 1.18
ORDINAL SCALE
An ordinal scale consists of a set of categories that are organized in an ordered sequence.
Measurements on an ordinal scale rank observations in terms of size.
Variables that can be measured at the ordinal scale:
Ranks in a race (first, second, third, …)
Sizes of shirts (small, medium, large, …)
Order of birth (first child, second child, third child, …)
Socio-economic status (lower, middle, upper, …)
Difficulty level of a test (easy, average, difficult, …)
Degree of agreement (SD, D, A, SA)
Session 1.19
INTERVAL SCALE
An interval scale consists of ordered categories that are all intervals of exactly the same size.
Equal differences between numbers on the scale reflect equal differences in magnitude; however, ratios of magnitudes are not meaningful.
Variables measured at the interval scale:
Temperature (in °F or °C)
IQ
SAT scores
Session 1.20
RATIO SCALE
A ratio scale is an interval scale with the additional feature of an absolute zero point.
Ratios of numbers do reflect ratios of magnitude.
Variables measured at the ratio scale:
Age (16, 20, 28, …)
Height (165cm, 154cm, 144cm, …)
Reaction time (20sec, 43sec, 37sec, …)
Number of siblings (2, 5, 8, …)
Hours spent on studying for an exam (0, 2, 3, …)
Session 1.21
Methods of Presenting Data
Textual
Tabular
Graphical
Session 1.22
Summary Measures
Location: Maximum, Minimum; Central Tendency (Mean, Median, Mode); Percentile, Quartile, Decile
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Skewness
Kurtosis
Session 1.23
Measures of Location
A Measure of Location summarizes a data set by giving a “typical value” within the range of the data values that describes its location relative to the entire data set.
Some Common Measures:
Minimum, Maximum
Central Tendency
Percentiles, Deciles, Quartiles
Session 1.24
Maximum and Minimum
Minimum is the smallest value in the data set, denoted as MIN.
Maximum is the largest value in the data set, denoted as MAX.
Session 1.25
Measure of Central Tendency
A single value that is used to identify the “center” of the data
it is thought of as a typical value of the distribution
precise yet simple
most representative value of the data
Session 1.26
Mean
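Most common measure of the center
Also known as arithmetic average
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$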
Session 1.27
Properties of the Mean
may not be an actual observation in the data set
can be applied to data measured at least at the interval level
easy to compute
every observation contributes to the value of the mean
Session 1.28
Properties of the Mean
subgroup means can be combined to come up with a group mean
easily affected by extreme values
[Figure: two dot plots. On an axis from 0 to 10, Mean = 5; on an axis extended to 14, Mean = 6.]
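The two properties on this slide can be checked numerically. The sketch below is only an illustration: the data values, the subgroup sizes, and the helper name `combined_mean` are all made up for this example and do not come from the slides.

```python
from statistics import mean


def combined_mean(sizes, means):
    """Combine subgroup means into one group mean, weighting by subgroup size."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)


# Hypothetical data: one extreme value is enough to pull the mean upward.
base = [3, 4, 5, 6, 7]
with_extreme = base + [35]

mean_base = mean(base)                  # 25 / 5 = 5
mean_with_extreme = mean(with_extreme)  # 60 / 6 = 10

# Hypothetical subgroups: 10 units averaging 4 and 30 units averaging 8.
group_mean = combined_mean([10, 30], [4, 8])  # (40 + 240) / 40 = 7.0

print(mean_base, mean_with_extreme, group_mean)
```

Note that the combined mean is not the simple average of 4 and 8; each subgroup mean is weighted by the number of units behind it.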
Session 1.29
Median
Divides the observations into two equal parts
If n is odd, the median is the middle number.
If n is even, the median is the average of the 2 middle numbers.
Sample median is denoted as $\tilde{x}$, while population median is denoted as $\tilde{\mu}$.
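The odd/even rule can be written out directly. This is a minimal sketch with made-up data; `median_by_position` is a hypothetical helper name, and the result is compared against the standard library's `statistics.median`.

```python
from statistics import median


def median_by_position(values):
    """Median by the positional rule: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n odd: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average of the 2 middle numbers


odd_n = [7, 1, 5, 3, 9]        # sorted: 1 3 5 7 9
even_n = [7, 1, 5, 3, 9, 11]   # sorted: 1 3 5 7 9 11

median_odd = median_by_position(odd_n)    # 5
median_even = median_by_position(even_n)  # (5 + 7) / 2 = 6.0

print(median_odd, median_even, median(odd_n), median(even_n))
```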
Session 1.30
Properties of a Median
may not be an actual observation in the data set
can be applied to data measured at least at the ordinal level
a positional measure; not affected by extreme values
[Figure: two dot plots, one on an axis from 0 to 10 and one on an axis extended to 14; Median = 5.]
Session 1.31
Mode
occurs most frequently
nominal average
computation of the mode for ungrouped or raw data
[Figure: two dot plots. On an axis from 0 to 14, Mode = 9; on an axis from 0 to 6, No Mode.]
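A small sketch of finding the mode in raw data follows. It assumes the convention suggested by the "No Mode" panel, namely that a data set in which every value occurs equally often has no mode; the function name `modes` and all data values are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s) of a non-empty data set.

    Assumed convention: if several distinct values all occur equally
    often, the data set is treated as having no mode (empty list).
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


one_mode = modes([2, 9, 9, 4, 9, 7])                              # [9]
no_mode = modes([0, 1, 2, 3, 4, 5, 6])                            # []
two_modes = modes(["blue", "brown", "brown", "blue", "green"])   # ['blue', 'brown']

print(one_mode, no_mode, two_modes)
```

The last call uses qualitative data, where the mode is the only one of the three averages that can be computed.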
Session 1.32
Properties of a Mode
can be used for qualitative as well as quantitative data
may not be unique
not affected by extreme values
may not exist
Session 1.33
Mean, Median & Mode
Use the mean when:
sampling stability is desired
other measures are to be computed
Session 1.34
Mean, Median & Mode
Use the median when:
the exact midpoint of the distribution is desired
there are extreme observations
Session 1.35
Mean, Median & Mode
Use the mode when:
the "typical" value is desired
the dataset is measured on a nominal scale
Session 1.36
Percentiles
Numerical measures that give the relative position of a data value relative to the entire data set.
Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
The jth percentile, denoted as Pj, is the data value in the data set that separates the bottom j% of the data from the top (100-j)%.
Session 1.37
EXAMPLE
Suppose LJ was told that relative to the other scores on a certain test, his score was the 95th percentile. This means that 95% of those who took the test had scores less than or equal to LJ’s score, while 5% had scores higher than LJ’s.
Session 1.38
Deciles
Divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by Dj.
The 1st decile is the 10th percentile; the 2nd decile is the 20th percentile; and so on.
Session 1.39
Quartiles
Divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by Qj.
The 1st quartile is the 25th percentile; the 2nd quartile is the 50th percentile, which is also the median; and the 3rd quartile is the 75th percentile.
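The relationships among percentiles, deciles, and quartiles can be checked with `statistics.quantiles` (Python 3.8 or later). This is a sketch under stated assumptions: the scores are hypothetical, and the library's default "exclusive" interpolation rule is only one of several conventions, so other software may return slightly different cut points for the same data.

```python
from statistics import median, quantiles

# Hypothetical test scores 1, 2, ..., 99.
scores = list(range(1, 100))

percentiles = quantiles(scores, n=100)  # 99 cut points: P1 .. P99
deciles = quantiles(scores, n=10)       # 9 cut points:  D1 .. D9
quartiles = quantiles(scores, n=4)      # 3 cut points:  Q1, Q2, Q3

# D1 is P10, D2 is P20; Q1 is P25, Q2 is P50 (the median), Q3 is P75.
d1_equals_p10 = deciles[0] == percentiles[9]
d2_equals_p20 = deciles[1] == percentiles[19]
q2_equals_median = quartiles[1] == percentiles[49] == median(scores)

print(quartiles)
print(d1_equals_p10, d2_equals_p20, q2_equals_median)
```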
Session 1.40
Measures of Variation
A measure of variation is a single value that is used to describe the spread of the distribution.
A measure of central tendency alone does not uniquely describe a distribution.
Session 1.41
A look at dispersion…
[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
Session 1.42
Two Types of Measures of Dispersion
Absolute Measures of Dispersion: Range, Inter-quartile Range, Variance, Standard Deviation
Relative Measure of Dispersion: Coefficient of Variation
Session 1.43
Range (R)
The difference between the maximum and minimum value in a data set, i.e.
R = MAX – MIN
Example: Pulse rates of 15 male residents of a certain village
54 58 58 60 62 65 66 71 74 75 77 78 80 82 85
R = 85 - 54 = 31
Session 1.44
Some Properties of the Range
The larger the value of the range, the more dispersed the observations are.
It is quick and easy to understand.
A rough measure of dispersion.
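Why the range is only a rough measure can be seen in a short sketch. It uses the pulse-rate data from the range example; the second data set is hypothetical and was chosen so that only the two end values match.

```python
def value_range(values):
    """R = MAX - MIN."""
    return max(values) - min(values)


# Pulse rates from the range example.
pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]

# Hypothetical data: same minimum and maximum, but every other value is 70.
clustered = [54] + [70] * 13 + [85]

r_pulse = value_range(pulse_rates)    # 85 - 54 = 31
r_clustered = value_range(clustered)  # also 31

print(r_pulse, r_clustered)
```

Both data sets have R = 31 even though the second is far more tightly clustered, because the range looks only at the two most extreme observations.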
Session 1.45
Inter-Quartile Range (IQR)
The difference between the third quartile and first quartile, i.e.
IQR = Q3 – Q1
Example: Pulse rates of 15 residents of a certain village
54 58 58 60 62 65 66 71 74 75 77 78 80 82 85
IQR = 78 - 60 = 18
Session 1.46
Some Properties of IQR
Reduces the influence of extreme values.
Not as easy to calculate as the Range.
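The sketch below reproduces the IQR of the pulse-rate example and then adds one extreme value to show how little the IQR moves compared with the range. It assumes Python 3.8+ and the default "exclusive" rule of `statistics.quantiles`, which happens to place Q1 and Q3 at the 4th and 12th ordered values for n = 15; the added reading of 120 is hypothetical.

```python
from statistics import quantiles


def iqr(values):
    """IQR = Q3 - Q1, using the library's default quartile rule."""
    q1, _q2, q3 = quantiles(values, n=4)
    return q3 - q1


def value_range(values):
    """R = MAX - MIN, for comparison."""
    return max(values) - min(values)


# Pulse rates from the IQR example.
pulse_rates = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]
with_extreme = pulse_rates + [120]  # hypothetical extreme reading

iqr_original = iqr(pulse_rates)             # 78 - 60 = 18
iqr_extreme = iqr(with_extreme)             # 79.5 - 60.5 = 19
range_original = value_range(pulse_rates)   # 31
range_extreme = value_range(with_extreme)   # 66

print(iqr_original, iqr_extreme, range_original, range_extreme)
```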
Session 1.47
Variance
important measure of variation
shows variation about the mean
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Session 1.48
Standard Deviation (SD)
most important measure of variation
square root of Variance
has the same units as the original data
Population SD: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample SD: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Session 1.49
Computation of Standard Deviation
Data: 10 12 14 15 17 18 18 24
n = 8 Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{7}} = \sqrt{\frac{130}{7}} \approx 4.309$
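The same computation can be traced step by step in code. This sketch uses the eight data values from the slide; the variable names are illustrative, and the last lines compare the hand computation with the standard library's `stdev` (divisor n - 1) and `pstdev` (divisor N).

```python
from math import sqrt
from statistics import pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)                                      # 8
x_bar = sum(data) / n                              # 128 / 8 = 16.0
squared_devs = [(x - x_bar) ** 2 for x in data]    # 36, 16, 4, 1, 1, 4, 4, 64
sum_sq = sum(squared_devs)                         # 130.0

sample_variance = sum_sq / (n - 1)                 # 130 / 7
sample_sd = sqrt(sample_variance)                  # about 4.309

# If the eight values were the whole population, the divisor would be N.
population_sd = sqrt(sum_sq / n)                   # sqrt(16.25), about 4.031

print(round(sample_sd, 3), round(population_sd, 3))
print(round(stdev(data), 3), round(pstdev(data), 3))
```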
Session 1.50
Remarks on Standard Deviation
If there is a large amount of variation, then on average, the data values will be far from the mean. Hence, the SD will be large.
If there is only a small amount of variation, then on average, the data values will be close to the mean. Hence, the SD will be small.
Session 1.51
Comparing Standard Deviation
[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
Session 1.52
Comparing Standard Deviation
Example: Team A - Heights of five marathon players in inches
65” 65” 65” 65” 65”
Mean = 65” s = 0
Session 1.53
Comparing Standard Deviation
Example: Team B - Heights of five marathon players in inches
62” 67” 66” 70” 60”
Mean = 65” s = 4.0”
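The two teams can be compared directly in code. This is a minimal sketch using the heights from the Team A and Team B slides and the sample standard deviation from the standard library; the variable names are illustrative.

```python
from statistics import mean, stdev

team_a = [65, 65, 65, 65, 65]   # heights in inches
team_b = [62, 67, 66, 70, 60]   # heights in inches

mean_a, sd_a = mean(team_a), stdev(team_a)   # 65 and 0.0
mean_b, sd_b = mean(team_b), stdev(team_b)   # 65 and 4.0

print(mean_a, sd_a)
print(mean_b, sd_b)
```

The means are identical, so only the standard deviation reveals that Team B's heights vary while Team A's do not.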
Session 1.54
Properties of Standard Deviation
It is the most widely used measure of dispersion. (Chebychev’s Inequality)
It is based on all the items and is rigidly defined.
It is used to test the reliability of measures calculated from samples.
The standard deviation is sensitive to the presence of extreme values.
It is not easy to calculate by hand (unlike the range).
Session 1.55
Coefficient of Variation (CV)
measure of relative variation
usually expressed in percent
shows variation relative to mean
used to compare 2 or more groups
Formula: $CV = \frac{SD}{\text{Mean}} \times 100\%$
Session 1.56
Comparing CVs
Stock A: Average Price = P50, SD = P5, CV = 10%
Stock B: Average Price = P100, SD = P5, CV = 5%
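The stock comparison can be reproduced with a one-line function. This is a sketch only; `coefficient_of_variation` is a hypothetical helper name, and the prices and standard deviations are the ones given for Stock A and Stock B.

```python
def coefficient_of_variation(sd, mean):
    """CV = (SD / Mean) x 100%, returned as a percentage."""
    return 100 * sd / mean


cv_stock_a = coefficient_of_variation(sd=5, mean=50)    # 10.0
cv_stock_b = coefficient_of_variation(sd=5, mean=100)   # 5.0

print(cv_stock_a, cv_stock_b)
```

Both stocks have the same absolute spread (SD = P5), but relative to its average price Stock A is twice as variable as Stock B.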
Session 1.57
Measure of Skewness
Describes the degree of departures of the distribution of the data from symmetry.
The degree of skewness is measured by the coefficient of skewness, denoted as SK and computed as
$SK = \frac{3(\text{Mean} - \text{Median})}{SD}$
Session 1.58
What is Symmetry?
A distribution is said to be symmetric about the mean, if the distribution to the left of mean is the “mirror image” of the distribution to the right of the mean. Likewise, a symmetric distribution has SK=0 since its mean is equal to its median and its mode.
Session 1.59
Measure of Skewness
[Figure: two distribution curves, one positively skewed and one negatively skewed]
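The sign of the coefficient of skewness can be illustrated with three small data sets. This is a sketch under stated assumptions: SK is taken as 3 × (Mean − Median) / SD, the sample standard deviation is used (the slides do not say which SD is meant), and all data values are hypothetical.

```python
from statistics import mean, median, stdev


def coefficient_of_skewness(values):
    """SK = 3 * (mean - median) / SD, with the sample SD assumed."""
    return 3 * (mean(values) - median(values)) / stdev(values)


long_right_tail = [1, 2, 3, 4, 10]       # mean 4 > median 3
long_left_tail = [-10, -4, -3, -2, -1]   # mean -4 < median -3
symmetric = [1, 2, 3, 4, 5]              # mean 3 = median 3

sk_right = coefficient_of_skewness(long_right_tail)   # positive
sk_left = coefficient_of_skewness(long_left_tail)     # negative
sk_symmetric = coefficient_of_skewness(symmetric)     # zero

print(round(sk_right, 4), round(sk_left, 4), sk_symmetric)
```

A long right tail pulls the mean above the median (SK > 0, positively skewed), a long left tail pulls it below (SK < 0, negatively skewed), and a symmetric data set gives SK = 0.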
Session 1.60
Measure of Kurtosis
Describes the extent of peakedness or flatness of the distribution of the data.
Measured by coefficient of kurtosis (K) computed as
$K = \frac{\sum_{i=1}^{N} (X_i - \mu)^4}{N \sigma^4} - 3$
Session 1.61
Measure of Kurtosis
K = 0: mesokurtic
K > 0: leptokurtic
K < 0: platykurtic
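The classification can be tried on small data sets. This is a sketch that assumes the usual population form of the coefficient, K = Σ(Xᵢ − μ)⁴ / (N σ⁴) − 3, with σ the population standard deviation; the function names and both data sets are hypothetical, and real data will rarely give K exactly equal to 0.

```python
from statistics import fmean, pstdev


def coefficient_of_kurtosis(values):
    """K = sum((X - mu)^4) / (N * sigma^4) - 3, population form assumed."""
    mu = fmean(values)
    sigma = pstdev(values)
    n = len(values)
    return sum((x - mu) ** 4 for x in values) / (n * sigma ** 4) - 3


def classify(k):
    """Label a value of K using the three cases on the slide."""
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                  # evenly spread values
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]    # most values at the center, two far out

k_flat = coefficient_of_kurtosis(flat)       # 34 / (5 * 4) - 3 = -1.3
k_peaked = coefficient_of_kurtosis(peaked)   # 4.5 - 3 = 1.5

print(round(k_flat, 2), classify(k_flat))
print(round(k_peaked, 2), classify(k_peaked))
```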