Deviations from Normality
Skewness and Kurtosis
Why do we care if the distribution is not normal?
• It helps you understand how a characteristic exhibits itself in your sample or in the population.
• It impacts what descriptive statistics you might use.
• It impacts the inferential statistics you might use.
Skewness
• The majority of scores do not fall in the middle of the distribution.
• The distribution is asymmetrical.
• You label the kind of skew according to the longer tail of the distribution.
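Normal versus Skewed Distributions
[Figure: frequency curves for a normal distribution, a negatively skewed distribution, and a positively skewed distribution, with the mean, median, and mode marked. Source: en.wikipedia.org/wiki/Image:Standard_deviation_diagram.png]
• Negative skew: long tail is on the negative end
• Positive skew: long tail is on the positive end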
What Positive Skew Means
If this represented the results of a quiz, the majority of the participants scored very low; almost no one scored in the highest range.
[Figure: histogram of quiz scores on a 0 to 100 axis, with the mean, median, and mode marked.]
Was this test very easy or very difficult?
What Negative Skew Means
If this represented the results of a quiz, the majority of the participants scored high; almost no one scored in the lowest range.
[Figure: histogram of quiz scores on a 0 to 100 axis, with the mean, median, and mode marked.]
Was this test very easy or very difficult?
Normal vs. Skewed Distributions
Income Distributions
http://www.city-data.com/city/Chicago-Illinois.html
Kurtosis
[Figure: a leptokurtic curve and a platykurtic curve.]
Leaping leptokurtic distribution
[Figure: leptokurtic distribution on a 0 to 100 axis.]
Platykurtic like a platypus distribution
[Figure: platykurtic distribution on a 0 to 100 axis.]
[Figure: leptokurtic, normal (mesokurtic), and platykurtic curves compared.]
Sprinthall - Quick Kurtosis Rule
When you have a distribution and know its standard deviation and range, you can estimate its kurtosis.
Fact: For a normal distribution, the standard deviation is about 1/6 of the range.
If the standard deviation is more than 1/6 of the range, then a distribution is platykurtic.
If the standard deviation is less than 1/6 of the range, then a distribution is leptokurtic.
How you determine this:
Range / 6 = Standard Value (SV)
Compare actual sd to SV
If sd > SV, platykurtic
If sd < SV, leptokurtic
Dr. Bellini’s MCC Research
47/6 = SV
SV = 7.83
8.98 > 7.83
Platykurtic
Dr. Bellini’s MCC Research
19/6 = SV
SV = 3.17
3.52 > 3.17
Platykurtic
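The quick rule can be sketched in a few lines of Python. This is only an illustration of the range/6 rule of thumb stated above, applied to the two examples (range 47 with sd 8.98, and range 19 with sd 3.52). The function name quick_kurtosis is made up for this sketch, and returning "mesokurtic" when sd exactly equals SV is an assumption; the slides only cover the greater-than and less-than cases.

```python
def quick_kurtosis(value_range, sd):
    """Classify kurtosis with the range/6 rule of thumb."""
    standard_value = value_range / 6  # SV: the sd expected if the shape were normal
    if sd > standard_value:
        label = "platykurtic"
    elif sd < standard_value:
        label = "leptokurtic"
    else:
        label = "mesokurtic"  # assumption: an exact tie is treated as normal
    return round(standard_value, 2), label


# The two worked examples from the slides
print(quick_kurtosis(47, 8.98))  # (7.83, 'platykurtic')
print(quick_kurtosis(19, 3.52))  # (3.17, 'platykurtic')
```

Because the rule uses only the range and the standard deviation, it is a rough estimate and is sensitive to a single extreme score.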
Skew
[Figure: histogram of Beginning Salary ($20,000 to $60,000) by Count.]
[Figure: histogram of Current Salary ($25,000 to $125,000) by Count.]
C:\Program Files\SPSS\University of Florida graduate salaries.sav
[Figure: histogram of Starting Salary (10000 to 60000) by Count.]
Statistics
Starting Salary
N                1100
Mean             26064.20
Median           26000.00
Mode             20000
Std. Deviation   6967.982
Range            58300
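One common rule of thumb for reading skew from summary statistics is that the mean is pulled toward the longer tail, so a mean above the median hints at positive skew. The sketch below applies that heuristic to the Starting Salary statistics. The function name skew_direction is hypothetical, and the heuristic itself is an assumption brought in for illustration; it can mislead for unusual distributions, so treat the result as a hint rather than a test.

```python
def skew_direction(mean, median):
    """Rule of thumb: the mean is pulled toward the longer tail."""
    if mean > median:
        return "positive skew"
    if mean < median:
        return "negative skew"
    return "no skew indicated"


# Values read from the Starting Salary statistics
mean, median, mode = 26064.20, 26000.00, 20000

print(skew_direction(mean, median))  # positive skew
print(mode < median < mean)          # True: the ordering usually seen with positive skew
```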
Measurement Scales
Measurement
• Assigning numbers to observations following a set of rules.
How are numbers assigned to observations?
What scale is used?
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Nominal Data
• Using numbers to label categories, but the numbers have no inherent numerical qualities
• Male = 1, Female = 2
• social security number
• jersey numbers
• race/ethnicity
Other examples of nominal scaling
• Whether a participant does or does not have a driver’s license (0,1)
• Whether the participant belongs to the experimental group or the control group (0,1)
• The school the participant attends (1,2,3,4,5)
Uses of nominal data
• Generally, the most you can do with nominal data is count it.
Categorical or Continuous?
• Is a variable that uses a nominal scale of measurement categorical or continuous?
Ordinal Scaling
• The assigned number provides information about the rank of an observation.
• Ordinal scales put observations in order.
• Rating scales are often considered ordinal in how they measure characteristics.
Example
Statement: Statistical Thinking is my favorite class
1 = Strongly agree
2 = Agree
3 = Neither agree nor disagree
4 = Disagree
5 = Strongly disagree
The respondent marks one option with an X.
Hospital pain scales (1-10)
Ordinal Scales have two basic rules:
1. Equality/non-equality rule
2. Greater-than-or-less-than rule
Ranking of Tennis Players http://sports.espn.go.com/sports/tennis/rankings?sport=WOMRANK
Uses of ordinal data
You can express if something is greater than or less than (but you can’t express how much greater than or less than).
< >
• Strongly agreeing is more than simply agreeing
• Being tenth in the class is lower than being ninth
How much lower?
Issues
There are limitations on what you can do statistically with ordinal data.
• Degree of violation
• Consensus
NOIR Interval Scaling
Satisfies all the requirements of an ordinal scale (there is a high to low structure to the scale) and
• The intervals between the points on the scale become meaningful because the distance between successive points on an interval scale is equal.
Examples
• Degrees Fahrenheit or Celsius
• Calendar years
Interval scales do not have a meaningful zero point. They may contain a zero point, but it is arbitrary.
NOIR Ratio
• Satisfies all the requirements of an interval scale
• There is a real and meaningful zero point on a ratio scale
• Weight, height, heart rate, breaths per minute, degrees Kelvin, annual income, miles per hour, pulse, etc.
HINT: If your scale has negative numbers, like with temperature, then it is interval but it probably isn’t ratio.
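A short worked example may make the interval versus ratio distinction concrete. The sketch below uses two illustrative temperatures (20 and 10 degrees Celsius, chosen for this example and not taken from the slides). Differences agree on both scales, but the "twice as warm" ratio only appears on the Celsius scale, whose zero point is arbitrary; on the Kelvin scale, which has a true zero, the ratio is close to 1.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0  # illustrative values, not from the slides

# Differences are meaningful on an interval scale and match across scales
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful when zero is a true zero (ratio scale)
ratio_celsius = warm_c / cool_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(round(difference_c, 2), round(difference_k, 2))  # 10.0 10.0
print(ratio_celsius)                                   # 2.0
print(round(ratio_kelvin, 3))                          # 1.035
```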
Our Survey
R N O N N R N O O R
Measures of Central Tendency and Dispersion
Describing data
Describing Data vs. Describing People
Measures of Central Tendency
• Mean (M or X̄)
• Median (Md)
• Mode (Mo)
Issues with Describing Data
1  2  4  6
1  2  4  6
1  2  4  6
1  2  4  6
1  2  4
2  3  4
2  3  4
2  3  4
2  3  5
2  4  5
n = 34
Mean = 3.15 ???
What does that mean?
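The puzzle in this slide can be checked with a quick calculation. The sketch below rebuilds the 34 scores from a tally of how often each value appears in the grid (the tally was counted by hand from the slide, so treat it as an assumption) and shows that the mean is a number that no one actually scored.

```python
from statistics import mean

# How many times each score appears in the slide's grid (hand tally)
counts = {1: 5, 2: 10, 3: 4, 4: 9, 5: 2, 6: 4}

# Expand the tally back into the individual scores
scores = [value for value, n in counts.items() for _ in range(n)]

print(len(scores))              # 34
print(round(mean(scores), 2))   # 3.15
print(mean(scores) in scores)   # False: no observation equals the mean
```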
The Mean
Most common synonym is the “average”
But what does the mean “mean”?
Definitions:
• “arithmetic average”
• a descriptor of the center of the data, when data are distributed “normally”
When a mean value is most useful
• To simply summarize a data set that is normally distributed
• To summarize data from a sample that can be used to estimate information about an entire population
When a mean can be less useful as a descriptor
• To summarize data that is skewed (especially when it is used as the only descriptor)
• To summarize data where there is an outlier
• To summarize data measured using a nominal scale
Mean and Skew - Find the mean annual income
$7,200
$9,011
$20,074
$24,999
$36,567
$32,145
$54,158
$567,987
Mean = $94,018
How well does this number represent a measure of the center of our data set?
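A short calculation shows why the question matters. The sketch below computes the mean of the eight incomes listed on this slide and counts how many of them fall below it; the variable names are chosen for this example only.

```python
from statistics import mean

# The eight annual incomes from the slide
incomes = [7200, 9011, 20074, 24999, 36567, 32145, 54158, 567987]

average = mean(incomes)
below = sum(1 for income in incomes if income < average)

print(round(average))               # 94018
print(below, "of", len(incomes))    # 7 of 8
```

Seven of the eight incomes sit below the mean, so the single very large income is doing most of the work in that figure.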
Median
a.k.a. Middle score in a ranked set of scores.
It divides the distribution of scores into equal halves.
When there is an odd number of scores:
1 2 2 3 5 5 7 8 10 15 16 16 21
Median score = 7 (the middle score)
When there is an even number of scores:
4 5 7 15 16 19 31 32
Median score = average of the two middle scores = (15+16)/2 = 15.5
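The two cases can be written out as a small function. This is a minimal sketch of the ranking-and-middle-score procedure described on this slide; the name median_score is made up for illustration, and Python's built-in statistics.median gives the same answers.

```python
from statistics import median


def median_score(scores):
    """Rank the scores, then take the middle one (or average the middle two)."""
    ranked = sorted(scores)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]                       # odd count: single middle score
    return (ranked[middle - 1] + ranked[middle]) / 2  # even count: average the two middle scores


odd_scores = [1, 2, 2, 3, 5, 5, 7, 8, 10, 15, 16, 16, 21]
even_scores = [4, 5, 7, 15, 16, 19, 31, 32]

print(median_score(odd_scores))    # 7
print(median_score(even_scores))   # 15.5
print(median(odd_scores), median(even_scores))  # 7 15.5
```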
Median and Skew - Find the median annual income
$7,200
$9,011
$20,074
$24,999
$32,145
$36,567
$54,158
$567,987
Median = $28,572
How well does this number represent a measure of the center of our data set?
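To see how differently the two measures react to the one very large income, the sketch below computes the mean and median of the ranked incomes with and without the top value. Dropping the top value is done here purely for comparison and is not something the slides do.

```python
from statistics import mean, median

# The eight annual incomes, ranked from lowest to highest
incomes = [7200, 9011, 20074, 24999, 32145, 36567, 54158, 567987]
without_top = incomes[:-1]  # same data with the largest income left out

print(median(incomes), round(mean(incomes)))          # 28572.0 94018
print(median(without_top), round(mean(without_top)))  # 24999 26308
```

Removing one income moves the mean by roughly $68,000 but moves the median by under $4,000, which is why the median is often preferred for skewed data such as income.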
[Figure: histogram of VAR00002 (10000.00 to 50000.00) by Count.]
Mode
Most frequently occurring score in a group of scores.
Exam scores:
78 85 92 55 87 85 98 84 71
88 85 78 65 99 100 85 62 100
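Finding the mode is a matter of counting. The sketch below tallies the exam scores listed above with Python's collections.Counter and reports the most frequent score along with how often it occurs.

```python
from collections import Counter

# Exam scores from the slide
exam_scores = [78, 85, 92, 55, 87, 85, 98, 84, 71,
               88, 85, 78, 65, 99, 100, 85, 62, 100]

tally = Counter(exam_scores)                      # score -> number of occurrences
mode_score, frequency = tally.most_common(1)[0]   # the single most frequent score

print(mode_score, frequency)  # 85 4
```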
Measures of central tendency
Guide to which measures of central tendency are appropriate to use with each scale of measurement:
            Mean    Median   Mode
Nominal                      X
Ordinal     (X)?    X        X
Interval    X       X        X
Ratio       X       X        X
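The guide can also be expressed as a simple lookup. This is only a sketch of the table above: the names APPROPRIATE_MEASURES and measures_for are hypothetical, and the mean is left out for ordinal data because the slide marks it with a question mark, which is a judgment call rather than a rule.

```python
# Scale of measurement -> measures of central tendency marked appropriate in the guide
APPROPRIATE_MEASURES = {
    "nominal": ["mode"],
    "ordinal": ["median", "mode"],  # mean is shown as "(X)?" in the guide, so omitted here
    "interval": ["mean", "median", "mode"],
    "ratio": ["mean", "median", "mode"],
}


def measures_for(scale):
    """Return the measures the guide marks as appropriate for a scale."""
    return APPROPRIATE_MEASURES[scale.lower()]


print(measures_for("Nominal"))  # ['mode']
print(measures_for("Ratio"))    # ['mean', 'median', 'mode']
```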
Mean, Median and/or Mode
Class Survey Age Variable
Describe this dataset in terms of its
Mean
Median
Mode
Range
Descriptive Statistics
                     N    Range   Mean    Std. Deviation
age                  14   37      36.79   10.714
Valid N (listwise)   14
[Figure: histogram of age (30 to 50) by Count.]
Class Survey Commute Variable
Describe this dataset in terms of its
Mean
Median
Mode
Range
What about this data set makes it difficult to describe using measures of central tendency?
[Figure: histogram of travel (10.0 to 60.0) by Count.]
Descriptive Statistics
                     N    Range   Mean     Std. Deviation
travel               14   64.5    10.914   16.3124
Valid N (listwise)   14
Commute
Giving more information, such as how skewed this variable is, would be helpful in the description.
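One way to report how skewed a variable is would be a skewness coefficient. The sketch below computes the population (moment) skewness, which is the average of the cubed z-scores; this particular formula is an assumption, since the slides do not say which skewness statistic to use, and statistical packages often apply a small-sample correction that gives slightly different values. The raw commute values are not listed, so the eight annual incomes from the earlier slide stand in as data.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population (moment) skewness: the mean of the cubed z-scores."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    return sum(((x - m) / s) ** 3 for x in values) / len(values)


# Stand-in data: the annual incomes from the earlier slide
incomes = [7200, 9011, 20074, 24999, 32145, 36567, 54158, 567987]
symmetric = [1, 2, 3, 4, 5]  # illustrative symmetric data

print(skewness(incomes) > 0)          # True: long tail on the positive end
print(round(skewness(symmetric), 6))  # 0.0: no skew
```

A positive value indicates a long tail on the positive end, a negative value a long tail on the negative end, and a value near zero a roughly symmetrical distribution.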