Measures of Skewness and Kurtosis

Skewness and Kurtosis

A fundamental task in many statistical analyses is to characterize the location and variability of a data set. A further characterization of the data includes skewness and kurtosis. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case. The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set.

Definition of Skewness

For univariate data Y1, Y2, ..., YN, the formula for skewness is:

$$\text{skewness} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3}{(N-1)\,s^3}$$

where $\bar{Y}$ is the mean, $s$ is the standard deviation, and N is the number of data points. The skewness for a normal distribution is zero, and any symmetric data should have a skewness near zero. Negative values for the skewness indicate data that are skewed left and positive values for the skewness indicate data that are skewed right. By skewed left, we mean that the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is long relative to the left tail. Some measurements have a lower bound and are skewed right. For example, in reliability studies, failure times cannot be negative.

Definition of Kurtosis

For univariate data Y1, Y2, ..., YN, the formula for kurtosis is:

$$\text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4}{(N-1)\,s^4}$$

where $\bar{Y}$ is the mean, $s$ is the standard deviation, and N is the number of data points.

Alternative Definition of Kurtosis

The kurtosis for a standard normal distribution is three. For this reason, some sources use the following definition of kurtosis (often referred to as "excess kurtosis"):

$$\text{excess kurtosis} = \text{kurtosis} - 3$$

This definition is used so that the standard normal distribution has a kurtosis of zero. In addition, with the second definition positive kurtosis indicates a "peaked" distribution and negative kurtosis indicates a "flat" distribution.
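The two conventions differ only by the constant 3, which is easy to see in code. The following is a minimal sketch, not a reference implementation: the function names are made up for illustration, and it assumes the plain moment form with N in every denominator, which can differ slightly from other normalizations for small samples.

```python
def central_moment(data, k):
    """k-th central moment of the data, dividing by N (an assumed convention)."""
    n = len(data)
    mean = sum(data) / n
    return sum((y - mean) ** k for y in data) / n


def skewness(data):
    """Third central moment scaled by the 3/2 power of the second."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Original definition: a normal distribution scores about 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


def excess_kurtosis(data):
    """Alternative definition: a normal distribution scores about 0."""
    return kurtosis(data) - 3.0


symmetric = [1, 2, 3, 4, 5]          # mirror image around 3
right_skewed = [1, 2, 2, 3, 3, 3, 10]  # one large value stretches the right tail

print("symmetric skewness:   ", skewness(symmetric))
print("right-skewed skewness:", skewness(right_skewed))
print("kurtosis vs. excess:  ", kurtosis(symmetric), excess_kurtosis(symmetric))
```

When comparing against a statistics package, checking its output on a small symmetric data set like this one is a quick way to find out which of the two kurtosis conventions it reports.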

Which definition of kurtosis is used is a matter of convention (this handbook uses the original definition). When using software to compute the sample kurtosis, you need to be aware of which convention is being followed. Many sources use the term kurtosis when they are actually computing "excess kurtosis", so it may not always be clear.

Examples

The following example shows histograms for 10,000 random numbers generated from a normal, a double exponential, a Cauchy, and a Weibull distribution.

Normal Distribution

The first histogram is a sample from a normal distribution. The normal distribution is a symmetric distribution with well-behaved tails. This is indicated by the skewness of 0.03. The kurtosis of 2.96 is near the expected value of 3. The histogram verifies the symmetry.

Double Exponential Distribution

The second histogram is a sample from a double exponential distribution. The double exponential is a symmetric distribution. Compared to the normal, it has a stronger peak, more rapid decay, and heavier tails. That is, we would expect a skewness near zero and a kurtosis higher than 3. The skewness is 0.06 and the kurtosis is 5.9.

Cauchy Distribution

The third histogram is a sample from a Cauchy distribution. For better visual comparison with the other data sets, we restricted the histogram of the Cauchy distribution to values between -10 and 10. The full data set for the Cauchy data in fact has a minimum of approximately -29,000 and a maximum of approximately 89,000.

The Cauchy distribution is a symmetric distribution with heavy tails and a single peak at the center of the distribution. Since it is symmetric, we would expect a skewness near zero. Due to the heavier tails, we might expect the kurtosis to be larger than for a normal distribution. In fact the skewness is 69.99 and the kurtosis is 6,693. These extremely high values can be explained by the heavy tails. Just as the mean and standard deviation can be distorted by extreme values in the tails, so too can the skewness and kurtosis measures.
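The distortion caused by heavy tails can be reproduced with a small simulation. The sketch below is an illustration only, not the code behind the histograms described here: the seed, the helper names, the divide-by-N moment form, and the inverse-CDF Cauchy generator are all assumptions. The exact numbers depend on the seed, so none are quoted.

```python
import math
import random


def central_moment(data, k):
    """k-th central moment, dividing by N (an assumed convention)."""
    n = len(data)
    mean = sum(data) / n
    return sum((y - mean) ** k for y in data) / n


def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2


rng = random.Random(12345)  # fixed seed so the run is repeatable
n = 10_000

# Standard normal sample: well-behaved tails.
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Standard Cauchy sample via the inverse CDF: tan(pi * (U - 1/2)).
cauchy = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

for name, sample in (("normal", normal), ("cauchy", cauchy)):
    print(f"{name:7s} min={min(sample):12.1f} max={max(sample):12.1f} "
          f"skewness={skewness(sample):9.2f} kurtosis={kurtosis(sample):10.2f}")
```

A handful of extreme draws dominates the third and fourth powers in the Cauchy sample, which is why its sample skewness can land far from zero even though the distribution itself is symmetric.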

Weibull Distribution

The fourth histogram is a sample from a Weibull distribution with shape parameter 1.5. The Weibull distribution is a skewed distribution with the amount of skewness depending on the value of the shape parameter. The degree of decay as we move away from the center also depends on the value of the shape parameter. For this data set, the skewness is 1.08 and the kurtosis is 4.46, which indicates moderate skewness and kurtosis.

Dealing with Skewness and Kurtosis

Many classical statistical tests and intervals depend on normality assumptions. Significant skewness and kurtosis clearly indicate that data are not normal. If a data set exhibits significant skewness or kurtosis (as indicated by a histogram or the numerical measures), what can we do about it?

One approach is to apply some type of transformation to try to make the data normal, or more nearly normal. The Box-Cox transformation is a useful technique for trying to normalize a data set. In particular, taking the log or square root of a data set is often useful for data that exhibit moderate right skewness.

Another approach is to use techniques based on distributions other than the normal. For example, in reliability studies, the exponential, Weibull, and lognormal distributions are typically used as a basis for modeling rather than using the normal distribution. The probability plot correlation coefficient plot and the probability plot are useful tools for determining a good distributional model for the data.
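To see how a log or square-root transformation pulls in a long right tail, here is a small sketch. It assumes lognormal test data, chosen only because its logarithm is exactly normal. The seed, the parameters, and the helper name are hypothetical, and the skewness uses the divide-by-N moment form.

```python
import math
import random


def skewness(data):
    """Moment coefficient of skewness, dividing by N (an assumed convention)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((y - mean) ** 2 for y in data) / n
    m3 = sum((y - mean) ** 3 for y in data) / n
    return m3 / m2 ** 1.5


rng = random.Random(2024)  # fixed seed so the run is repeatable

# Right-skewed, strictly positive data (lognormal with mu=0, sigma=0.5).
raw = [rng.lognormvariate(0.0, 0.5) for _ in range(10_000)]

# Two common transformations for right-skewed positive data.
rooted = [math.sqrt(x) for x in raw]
logged = [math.log(x) for x in raw]

print("raw skewness:        ", round(skewness(raw), 3))
print("square-root skewness:", round(skewness(rooted), 3))
print("log skewness:        ", round(skewness(logged), 3))
```

Both transformations require positive values (the square root tolerates zero), so data with negative or zero observations would need a shift first. Which transformation works best depends on the data, and a Box-Cox fit is one way to choose.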

Software

The skewness and kurtosis coefficients are available in most general purpose statistical software programs, including Dataplot.

Skewness

The first thing you usually notice about a distribution's shape is whether it has one mode (peak) or more than one. If it's unimodal (has just one peak), like most data sets, the next thing you notice is whether it's symmetric or skewed to one side. If the bulk of the data is at the left and the right tail is longer, we say that the distribution is skewed right or positively skewed; if the peak is toward the right and the left tail is longer, we say that the distribution is skewed left or negatively skewed.

Look at the two graphs below. They both have μ = 0.6923 and σ = 0.1685, but their shapes are different.

[Figure: two density curves. First panel: Beta(α=4.5, β=2), skewness = −0.5370. Second panel: 1.3846 − Beta(α=4.5, β=2), skewness = +0.5370.]

The first one is moderately skewed left: the left tail is longer and most of the distribution is at the right. By contrast, the second distribution is moderately skewed right: its right tail is longer and most of the distribution is at the left.

You can get a general impression of skewness by drawing a histogram, but there are also some common numerical measures of skewness. Some authors favor one, some favor another. This Web page presents one of them. In fact, these are the same formulas that Excel uses in its Descriptive Statistics tool in Analysis Toolpak.

You may remember that the mean and standard deviation have the same units as the original data, and the variance has the square of those units. However, the skewness has no units: it's a pure number, like a z-score.

Computing

The moment coefficient of skewness of a data set is

$$g_1 = \frac{m_3}{m_2^{3/2}} \qquad (1)$$

where

$$m_3 = \frac{\sum (x - \bar{x})^3}{n} \quad\text{and}\quad m_2 = \frac{\sum (x - \bar{x})^2}{n},$$

$\bar{x}$ is the mean and n is the sample size, as usual. m3 is called the third moment of the data set. m2 is the variance, the square of the standard deviation.

You'll remember that you have to choose one of two different measures of standard deviation, depending on whether you have data for the whole population or just a sample. The same is true of skewness. If you have the whole population, then g1 above is the measure of skewness. But if you have just a sample, you need the sample skewness:

$$G_1 = \frac{\sqrt{n(n-1)}}{n-2}\; g_1 \qquad (2)$$

Source: D. N. Joanes and C. A. Gill, "Comparing Measures of Sample Skewness and Kurtosis," The Statistician 47(1):183–189.

Excel doesn't concern itself with whether you have a sample or a population: its measure of skewness is always G1.

Example 1: College Men's Heights

Here are grouped data for heights of 100 randomly selected male students, adapted from Spiegel & Stephens, Theory and Problems of Statistics 3/e (McGraw-Hill, 1999), page 68.

Height (inches)   Class Mark, x   Frequency, f
59.5–62.5         61               5
62.5–65.5         64              18
65.5–68.5         67              42
68.5–71.5         70              27
71.5–74.5         73               8

A histogram shows that the data are skewed left, not symmetric.

But how highly skewed are they, compared to other data sets? To answer this question, you have to compute the skewness.

Begin with the sample size and sample mean. (The sample size was given, but it never hurts to check.)

n = 5 + 18 + 42 + 27 + 8 = 100

x̄ = (61×5 + 64×18 + 67×42 + 70×27 + 73×8) ÷ 100
  = (305 + 1152 + 2814 + 1890 + 584) ÷ 100
  = 6745 ÷ 100 = 67.45

Now, with the mean in hand, you can compute the skewness. (Of course in real life you'd probably use Excel or a statistics package, but it's good to know where the numbers come from.)

Class Mark, x   Frequency, f   xf      (x − x̄)   (x − x̄)²f   (x − x̄)³f
61               5              305    −6.45      208.01      −1341.68
64              18             1152    −3.45      214.25       −739.15
67              42             2814    −0.45        8.51         −3.83
70              27             1890     2.55      175.57        447.70
73               8              584     5.55      246.42       1367.63
∑                              6745    n/a        852.75       −269.33
x̄, m2, m3                     67.45    n/a        8.5275       −2.6933

Finally, the skewness is

$$g_1 = \frac{m_3}{m_2^{3/2}} = \frac{-2.6933}{8.5275^{3/2}} = -0.1082$$

But wait, there's more! That would be the skewness if you had data for the whole population. But obviously there are more than 100 male students in the world, or even in almost any school, so what you have here is a sample, not the population. You must compute the sample skewness:

$$G_1 = \frac{\sqrt{n(n-1)}}{n-2}\; g_1 = \frac{\sqrt{100 \times 99}}{98} \times \frac{-2.6933}{8.5275^{3/2}} = -0.1098$$
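The hand computation for the heights example can be checked with a few lines of code. This is only a sketch of the arithmetic for grouped data, with illustrative variable names. Because it carries full precision instead of the two-decimal table entries, the last digit of an intermediate value may differ slightly from the hand-rounded figures.

```python
import math

# Grouped data from the college men's heights example.
class_marks = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]

# Sample size and mean, weighting each class mark by its frequency.
n = sum(frequencies)
mean = sum(x * f for x, f in zip(class_marks, frequencies)) / n

# Second and third moments about the mean (divide by n).
m2 = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, frequencies)) / n
m3 = sum(f * (x - mean) ** 3 for x, f in zip(class_marks, frequencies)) / n

# Moment coefficient of skewness, as if the data were the whole population.
g1 = m3 / m2 ** 1.5

# Sample skewness: the same quantity with the small-sample adjustment.
G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1

print(f"n = {n}, mean = {mean:.2f}")
print(f"m2 = {m2:.4f}, m3 = {m3:.4f}")
print(f"g1 = {g1:.4f}, G1 = {G1:.4f}")
```

For n = 100 the adjustment factor is only about 1.015, which is why g1 and G1 agree to two decimal places here. The gap widens for small samples.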

Interpreting

If skewness is positive, the data are positively skewed or skewed right, meaning that the right tail of the distribution is longer than the left. If skewness is negative, the data are negatively skewed or skewed left, meaning that the left tail is longer. If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite unlikely for real-world data, so how can you interpret the skewness number?

Bulmer, M. G., Principles of Statistics (Dover, 1979), a classic, suggests this rule of thumb: If skewness is less than −1 or greater than +1, the distribution is highly skewed. If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed. If skewness is between −½ and +½, the distribution is approximately symmetric.

With a skewness of −0.1098, the sample data for student heights are approximately symmetric.

Caution: This is an interpretation of the data you actually have. When you have data for the whole population, that's fine. But when you have a sample, the sample skewness doesn't necessarily apply to the whole population. In that case the question is, from the sample skewness, can you conclude anything about the population skewness? To answer that question, see the next section.
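The rule of thumb translates directly into a small lookup. The sketch below is illustrative only: the function name is made up, and the treatment of values that fall exactly on a boundary (±½ or ±1) is an assumption, since the rule as quoted does not say which band they belong to.

```python
def classify_skewness(skew):
    """Label a skewness value using Bulmer's rule of thumb.

    Boundary handling is an assumption: exactly 0.5 and exactly 1.0
    (in absolute value) are both placed in the "moderately skewed" band.
    """
    magnitude = abs(skew)
    if magnitude > 1.0:
        return "highly skewed"
    if magnitude >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"


# Values quoted earlier in this document.
for label, value in (("student heights", -0.1098),
                     ("Beta(4.5, 2)", -0.5370),
                     ("Weibull sample", 1.08)):
    print(f"{label:16s} {value:+.4f} -> {classify_skewness(value)}")
```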

Inferring

Your data set is just one sample drawn from a population. Maybe, from ordinary sample variability, your sample is skewed even though the population is symmetric. But if the sample is skewed too much for random chance to be the explanation, then you can conclude that there is skewness in the population.

But what do I mean by "too much for random chance to be the explanation"? To answer that, you need to divide the sample skewness G1 by the standard error of skewness (SES) to get the test statistic, which measures how many standard errors separate the sample skewness from zero:

$$Z_{g_1} = \frac{G_1}{\text{SES}} \qquad (3)$$

where

$$\text{SES} = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}$$

This formula is adapted from page 85 of Cramer, Duncan, Basic Statistics for Social Research (Routledge, 1997). (Some authors suggest √(6/n), but for small samples that's a poor approximation. And anyway, we've all got calculators, so you may as well do it right.)

The critical value of Zg1 is approximately 2. (This is a two-tailed test of skewness ≠ 0 at roughly the 0.05 significance level.)
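As a rough sketch of the test, the snippet below evaluates the standard error and the test statistic for the college heights sample (G1 = −0.1098, n = 100). The function names are illustrative, and the cutoff of 2 is the approximate critical value mentioned above, not an exact quantile.

```python
import math


def standard_error_of_skewness(n):
    """SES for a sample of size n (requires n > 2)."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewness_test_statistic(sample_skewness, n):
    """Number of standard errors separating the sample skewness from zero."""
    return sample_skewness / standard_error_of_skewness(n)


n = 100
G1 = -0.1098  # sample skewness of the heights data

ses = standard_error_of_skewness(n)
z = skewness_test_statistic(G1, n)
rough_ses = math.sqrt(6.0 / n)  # the cruder approximation some authors use

print(f"SES = {ses:.4f} (rough approximation: {rough_ses:.4f})")
print(f"Z = {z:.4f}")
print("evidence of population skewness" if abs(z) > 2
      else "no conclusion either way at roughly the 0.05 level")
```

With |Z| well under 2, this sample gives no grounds for concluding that the population of heights is skewed. It does not prove the population is symmetric either.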