Lecture3 Handouts


  • 7/28/2019 Lecture3 Handouts


    Lecture 3: Descriptive Statistics

    Matt Golder & Sona Golder

    Pennsylvania State University

    Introduction

In a broad sense, making an inference implies partially or completely describing a phenomenon.

Before discussing inference making, we must have a method for characterizing or describing a set of numbers.

The characterizations must be meaningful, so that knowledge of the descriptive measures enables us to clearly visualize a set of numbers.

Generally, we can characterize a set of numbers using either graphical or numerical methods.

We're now going to look at numerical methods.

    Numerical Methods

Graphical displays are not usually adequate for the purpose of making inferences.

We need rigorously defined quantities for summarizing the information contained in the sample. These sample quantities typically have mathematical properties that allow us to make probability statements regarding the goodness of our inferences.

    The quantities we define are descriptive measures of a set of data.

    Descriptive Measures

Typically, we are interested in two types of descriptive numbers: (i) measures of central tendency and (ii) measures of dispersion or variation.

Measures of central tendency, sometimes called location statistics, summarize a distribution by its typical value.

Measures of dispersion indicate, through a single number, the extent to which a distribution's observations differ from one another.

Together with information on its shape, these measures convey the most distinctive aspects of a distribution.

    Central Tendency

    There are three basic measures of central tendency:

    1 Mean

    2 Median

    3 Mode

    Central Tendency

Redskins 7     Giants 16      Bengals 10      Ravens 17
Jets 20        Dolphins 14    Chiefs 10       Patriots 17
Texans 17      Steelers 38    Jaguars 10      Titans 17
Lions 21       Falcons 34     Seahawks 10     Bills 34
Rams 3         Eagles 38     Buccaneers 20    Saints 24
Bears 29       Colts 13       Panthers 26     Chargers 24
Cardinals 23   49ers 13       Cowboys 28      Browns 10
Vikings 19     Packers 24     Broncos 41      Raiders 14

Figure: Points Scored in Week One of the 2008 NFL Season. [Histogram of the scores; x-axis: Points Scored, y-axis: Frequency/Density.]

    Mean

    The mean is also known as an expected value or, colloquially, as an average.

The mean of a sample of N measured responses X1, X2, . . . , XN is given by:

X̄ = (1/N) Σ_{i=1}^{N} Xi = (1/N)(X1 + X2 + . . . + XN)

The symbol X̄, read "X bar", refers to the sample mean; μ is used to denote the population mean.
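As a quick check, the sample mean can be computed directly, here in Python, using the Week One scores from the table above:

```python
# Week One scores from the table above (32 teams).
points = [7, 16, 10, 17, 20, 14, 10, 17,
          17, 38, 10, 17, 21, 34, 10, 34,
          3, 38, 20, 24, 29, 13, 26, 24,
          23, 13, 28, 10, 19, 24, 41, 14]

N = len(points)               # N = 32
x_bar = sum(points) / N       # (X1 + X2 + ... + XN) / N
print(round(x_bar, 2))        # 20.03
```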

    Mean

Calculating a mean only makes sense if the data are numeric, and (in general) if the variable in question is measured at least at the interval level.

While the mean of an ordinal variable might provide some useful information, the fact that the data are only ordinal means that the assumption necessary for summation (that the values of the variable are equally spaced) is questionable.

It never makes sense to calculate the mean of a nominal-level variable.

    Mean

It's common to think of the mean as the balance point of the data, that is, the point in X at which, if every value of X were given weight according to its size, the data would balance.

Thus, if we add a single data point (call it XN+1) to an existing variable X, the new mean (using N + 1 observations) will:

be greater than the old one if XN+1 > X̄,

be less than the old one if XN+1 < X̄, and

be the same as the old one if and only if XN+1 = X̄.

    Mean

A mean can also be thought of as the value that minimizes the squared deviations between itself and every value of Xi. That is, suppose we are interested in choosing a value (call it c) that minimizes:

f(c) = Σ_{i=1}^{N} (Xi − c)² = Σ_{i=1}^{N} (Xi² + c² − 2cXi)

    Mean

To find the minimum, we need to calculate ∂f(c)/∂c, set that equal to zero, and solve.

∂f(c)/∂c = Σ_{i=1}^{N} (2c − 2Xi)

Σ_{i=1}^{N} (2c − 2Xi) = 0

2Nc − 2 Σ_{i=1}^{N} Xi = 0

2Nc = 2 Σ_{i=1}^{N} Xi

c = (1/N) Σ_{i=1}^{N} Xi ≡ X̄
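A quick numerical sketch of this result: over a fine grid of candidate values c, the sum of squared deviations is never smaller than it is at the mean (the data here are arbitrary illustrative values):

```python
# Check numerically that f(c), the sum of squared deviations, is minimized at the mean.
X = [3, 7, 10, 14, 41]
mean = sum(X) / len(X)  # 15.0

def f(c):
    return sum((x - c) ** 2 for x in X)

grid = [i / 100 for i in range(5001)]  # candidate values 0.00, 0.01, ..., 50.00
best = min(grid, key=f)

print(best)                 # 15.0 -- the minimizer is the mean
print(f(mean) <= f(best))   # True
```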

    Mean

Note that the mean is very susceptible to the effect of outliers in the data. Exceptionally large values in X will tend to pull the mean in their direction, and so can provide a misleading picture of the actual central location of the variable in question.

    Geometric Mean

So far, we have looked at what is formally called the arithmetic mean.

There are two other potentially useful variants on the mean: the geometric mean and the harmonic mean.

The geometric mean is defined as:

XG = ( Π_{i=1}^{N} Xi )^(1/N)

which can also be written as the N-th root of the product:

XG = (X1 × X2 × . . . × XN)^(1/N)

    Geometric Mean

The geometric mean can also be written using logarithms:

( Π_{i=1}^{N} Xi )^(1/N) = exp[ (1/N) Σ_{i=1}^{N} ln Xi ]

That is, the geometric mean of a variable X is equal to the exponential of the arithmetic mean of the natural logarithm of that variable.
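This identity is easy to verify numerically; a small Python sketch with illustrative values:

```python
import math

X = [2, 8]  # illustrative values

g1 = math.prod(X) ** (1 / len(X))                    # N-th root of the product
g2 = math.exp(sum(math.log(x) for x in X) / len(X))  # exp of the mean of ln X

print(round(g1, 6), round(g2, 6))  # 4.0 4.0
```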

    Geometric Mean

The geometric mean has a geometric interpretation. You can think of the geometric mean of two values X1 and X2 as the answer to the question:

What is the length of one side of a square that has an area equal to that of a rectangle with width X1 and height X2?

Similarly, for three values X1, X2, and X3, one could ask:

What is the length of one side of a cube that has a volume equal to that of a box with width X1, height X2, and depth X3?

    Geometric Mean

This idea is represented graphically (for the two-value case) in Figure 2.

Figure: The Geometric Mean: A (Geometric!) Interpretation. [A rectangle with sides X1 and X2, and a square of equal area whose side is √(X1 × X2) = XG.]

    Geometric Mean

Finally, note a few things:

The geometric mean is only appropriate for variables with positive values.

The geometric mean is always less than or equal to the arithmetic mean: XG ≤ X̄. The two are only equal if the values of X are the same for all N observations.

While rare, the geometric mean is actually the more appropriate measure of central tendency to use for phenomena (such as percentages) that are more accurately multiplied rather than summed.

    Geometric Mean

Example: Suppose that the price of something doubled (went up to 200 percent of the original price) in 2008, and then decreased by 50 percent (back to its original price) in 2009.

The actual average change in the price across the two years is not (200 + 50)/2 = 125 percent, but rather √(200 × 50) = √10000 = 100 percent, i.e., zero percent average (annualized) net change.
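In code, treating the two years as growth factors (2.0 for doubling, 0.5 for halving):

```python
import math

factors = [2.0, 0.5]  # doubled in 2008, halved in 2009

arithmetic = sum(factors) / len(factors)        # 1.25 -> misleadingly suggests +25% per year
geometric = math.sqrt(factors[0] * factors[1])  # 1.0  -> correctly, no net change

print(arithmetic, geometric)  # 1.25 1.0
```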

    Harmonic Mean

The harmonic mean is defined as:

XH = N / ( Σ_{i=1}^{N} (1/Xi) )

i.e., it is the product of N and the reciprocal of the sum of the reciprocals of the Xi's.

Equivalently:

XH = 1 / [ (1/N) Σ_{i=1}^{N} (1/Xi) ]

i.e., the harmonic mean is the reciprocal of the (arithmetic) mean of the reciprocals of X.

    Harmonic Mean

Because it considers reciprocals, the harmonic mean is the friendliest toward small values in the data, and the least friendly to large values.

This means, among other things, that:

the harmonic mean will always be the smallest of the means in value (the arithmetic mean will be the biggest, and the geometric mean will be in between the other two);

the harmonic mean tends to limit the impact of large outliers, and increase the weight given to small values of X.

To be honest, there are not a lot of instances where one is likely to use the harmonic mean.
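The ordering XH ≤ XG ≤ X̄ can be checked on any positive data; a small sketch with illustrative values:

```python
import math

X = [1, 4, 4]

am = sum(X) / len(X)                  # arithmetic mean: 3.0
gm = math.prod(X) ** (1 / len(X))     # geometric mean: 16^(1/3), about 2.52
hm = len(X) / sum(1 / x for x in X)   # harmonic mean: 3 / 1.5 = 2.0

print(hm <= gm <= am)  # True
```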

    Median

The median of a variable X, sometimes labeled XMed or X̃, is the middle observation, or the 50th percentile.

Practically speaking, the median is the value of:

the ((N − 1)/2 + 1)th-largest value of X when N is odd, and

the mean of the (N/2)th- and ((N + 2)/2)th-largest values of X when N is even.

    Median

Example: For our NFL opening day data, there are 32 teams.

The median is therefore the average of the numbers of points scored by the 16th- and 17th-most point-scoring teams.

Those values are 17 and 19, making the median 18 (even though exactly 18 points were not scored by any team that week).

The median is typically only calculated for ordinal-, interval-, or ratio-level data.

There is no particular problem with ordinal variables, since the median just reflects a middle value (and an ordinal variable orders the observations).

The median is not calculated for nominal data.

    Median

While the mean is the number that minimizes the squared distance to the data, the median is the value of c that minimizes the absolute distance to the data:

XMed = argmin_c Σ_{i=1}^{N} |Xi − c|

The median is (relatively) unaffected by outliers and, as a result, is often known as a robust statistic.
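As with the mean, this minimization property can be checked numerically (illustrative data, grid search over c):

```python
# Check numerically that the median minimizes the sum of absolute deviations.
X = [3, 7, 10, 14, 41]
med = sorted(X)[len(X) // 2]  # middle value of the sorted data: 10

def g(c):
    return sum(abs(x - c) for x in X)

grid = [i / 10 for i in range(501)]  # candidates 0.0, 0.1, ..., 50.0
best = min(grid, key=g)

print(g(med) <= g(best))  # True: no candidate beats the median
```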

    Mode

The mode XMode is nothing more than the most commonly-occurring value of X.

In our NFL data, for example, the most common number of points scored was ten, i.e., XMode = 10.

The mode is the only measure of central tendency appropriate for use with data measured at any level, including nominal.

That means that the mode is the only descriptive statistic appropriate for nominal variables.

The mode is (technically) undefined for any variable that has equal maximum frequencies for two or more variable values.
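The mode is simple to compute by counting; here with the NFL scores from above:

```python
from collections import Counter

points = [7, 16, 10, 17, 20, 14, 10, 17,
          17, 38, 10, 17, 21, 34, 10, 34,
          3, 38, 20, 24, 29, 13, 26, 24,
          23, 13, 28, 10, 19, 24, 41, 14]

mode, freq = Counter(points).most_common(1)[0]
print(mode, freq)  # 10 5 -- five teams scored exactly 10 points
```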

    Dichotomous Variables

Think about a dichotomous (binary) variable, D. Note a few things:

The mean D̄ = (1/N) Σ_{i=1}^{N} Di ∈ [0, 1] is equal to the proportion of 1s in the data.

The median DMed ∈ {0, 1}, depending on whether there are more 0s or 1s in the data.

The mode DMode = DMed.

The mean of a binary variable tells us a lot about it: not just its mean but, as we'll see, its variance as well.

Unless every observation takes the same value, a binary variable's mean is never equal to a value actually present in the data.

It is common to use the median/mode as a measure of central tendency for binary variables.

    Relationships

Figure: Central Tendency in a Symmetric Distribution with a Single Peak. [The mean, median, and mode coincide at the peak.]

In a perfectly symmetrical continuous variable, the mean and median are identical.

If the same variable is unimodal, then the mode is also equal to the mean and the median.

    Relationships

If a continuous variable Z is right-skewed, then

ZMode ≤ ZMed ≤ Z̄

Similarly, if a continuous variable Z is left-skewed, then

Z̄ ≤ ZMed ≤ ZMode

In both cases, the mean is the most affected by outliers, as can be seen in the next figure.

    Relationships

Figure: Points Scored in Week One of the 2008 NFL Season Redux. [Histogram; x-axis: Points Scored, y-axis: Frequency.] Mode = 10, Median = 18, Mean = 20.03.

    Cautions

The mean, mode, and median are all poor descriptions of the data when you have a bimodal distribution.

Strongly bimodal distributions are an indication that what you've really got is a dichotomous variable, or possibly that there are multiple dimensions in the intermediate values (Polity).

With skewed distributions, notably exponential and power-law distributions, the mean can be much larger than the median and the mode.

    Dispersion

A measure of central tendency does not provide an adequate description of some variable X on its own, because it only locates the center of the distribution of data.

Figure: Frequency Distributions with Equal Means but Different Amounts of Variation

    Range

The range is the most basic measure of dispersion: it is the difference between the highest and lowest values of an (interval- or ratio-level) variable:

Range(X) = Xmax − Xmin

Example: The range of our NFL points variable is (41 − 3) = 38.

Things to note about the range:

The range tells you how much variation there is in your variable, and does so in units that are native to the variable itself.

The range also scales with the variable, i.e., if you rescale the variable, the range adjusts as well.

    Percentiles, IQR, etc.

We can also get a handle on the variation in a variable by looking at its percentiles.

The kth percentile is the value of the variable below which k percent of the observations fall.

If we have data on N = 100 observations, the 50th percentile is the value of the 50th-largest-valued observation.

If we have data on N = 6172 observations, then the 50th percentile is the value of the 6172 × 0.5 = 3086th-largest-valued observation in the data.

Thus:

The 50th percentile is the same thing as the median XMed.

The 0th percentile is the same thing as Xmin.

The 100th percentile is the same thing as Xmax.

    Percentiles, IQR, etc.

We generally look at evenly-numbered percentiles, in particular quartiles and deciles.

Quartiles are the 25th, 50th, and 75th percentiles of a variable.

Example: In our NFL data:

Lower quartile = 13 (25% of teams scored less than or equal to 13 points).

Middle quartile = 18.

Upper quartile = 24.5 (only 25% of teams scored more than 24.5 points).
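These quartiles can be reproduced in Python; note that statistical software packages use different interpolation conventions, and the "inclusive" method here is the one that matches the numbers above:

```python
import statistics

points = [7, 16, 10, 17, 20, 14, 10, 17,
          17, 38, 10, 17, 21, 34, 10, 34,
          3, 38, 20, 24, 29, 13, 26, 24,
          23, 13, 28, 10, 19, 24, 41, 14]

q1, q2, q3 = statistics.quantiles(points, n=4, method="inclusive")
print(q1, q2, q3)  # 13.0 18.0 24.5
```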

    Percentiles, IQR, etc.

The inter-quartile range (IQR), sometimes called the midspread, is defined as:

IQR(X) = 75th percentile(X) − 25th percentile(X)

Example: The IQR for our NFL data is (24.5 − 13) = 11.5, which means that the middle 50 percent of teams scored points that fell within about a 12-point range.

The IQR is analogous to the range, except that it is robust to outlying data points.

Example: If Denver had scored 82 points instead of 41, the range would then be (82 − 3) = 79, rather than 38, but the IQR would be unaffected.

    Percentiles, IQR, etc.

Deciles are percentiles by tens: the tenth, twentieth, thirtieth, etc. percentiles of the data.

Deciles can be useful when the data are skewed, i.e., when there are small numbers of relatively high or low values in the data. This is because they provide a finer-grained picture of the variation in X than quartiles.

Deciles are often used to analyze data where there are large disparities between large and small values.

    Deviations

A limitation of percentiles and ranges is that they do not make use of the information in all of the data.

Approaches that make use of all the information are typically based on the idea of a deviation.

A deviation is just the extent to which an observation's value on X differs from some benchmark value.

The typical benchmarks are (i) the median and (ii) the mean.

In either case, the deviation is just the (signed) difference between an observation's value and that benchmark.

Deviations from the mean are (Xi − X̄). Deviations from the median are (Xi − XMed).

    Mean Deviation

Individual deviations are not particularly interesting. Instead we want, say, a typical deviation.

We might think to start with the average deviation from the mean value of X:

Mean Deviation = (1/N) Σ_{i=1}^{N} (Xi − X̄)

But note that this measure is actually useless, because the mean deviation always equals 0, no matter the spread of the distribution.

A corollary of the fact that the mean minimizes the sum of squared deviations is that the mean is also the value of X that makes the sum of deviations from it equal to zero.

    Mean Deviation

Proof.

Let the sum of the deviations be:

d = Σ (X − X̄)

Take the summation through the brackets. Note that Σ X̄ is the same as N X̄, and so

d = Σ X − N X̄

And we know that X̄ = (Σ X)/N. And so

d = Σ X − N (Σ X)/N

The N's cancel, leaving

d = Σ X − Σ X = 0

    Mean Squared Deviation

One way to avoid having the positive and negative deviations cancel each other out is to use the squared deviation from the mean, (Xi − X̄)².

If we consider the average of this value, we get the mean squared deviation (MSD):

Mean Squared Deviation = MSD = (1/N) Σ_{i=1}^{N} (Xi − X̄)²

The MSD is intuitive enough, in that it is the average squared deviation from the mean.

    Mean Squared Deviation

But, think for a minute about the idea of variability. Imagine a dataset with a single observation:

team points

------------------

Rams 14

------------------

The mean is (obviously) 14; but what's the MSD?


The MSD is zero, because the mean and Xi are identical.

The point is that from one observation we can know something about how many points were scored on average, but we can't know anything about the distribution (spread) of those points. Were all the games 14-14 ties? Were they mostly 28-0 blowouts? There's no way to know.

    Mean Squared Deviation

Now, add an observation:

team points

------------------

Rams 14

Titans 20

------------------

The (new) mean is now 17, and the MSD is

(1/2)[(14 − 17)² + (20 − 17)²] = (1/2)(9 + 9) = 18/2 = 9

At this point, we can begin to learn something not just about the mean of the data, but also about its variability.

    Mean Squared Deviation

Informally, this suggests a principle that it is wise to always remember:

You cannot learn about more characteristics of data than you have observations.

If you have one observation, you can learn about the mean, but not the variability.

With two observations, you can begin to learn about the mean and the variation around that mean, but not the skewness (see below). And so forth.

The formal name for this intuitive idea is degrees of freedom.

Degrees of freedom are essentially pieces of information, equal to the sample size, N, minus the number of parameters, P, estimated from the data.

    Variance

The relevance of degrees of freedom is that, in the simple little example above, we actually only have one effective observation telling us about the variation in X, not two.

As a result, we should consider revising the denominator of our estimate of the MSD downwards by one.

This gives us the variance:

Variance = s² = (1/(N − 1)) Σ_{i=1}^{N} (Xi − X̄)²

The variance in the sample is denoted by s², whereas the variance in the population is denoted by σ².

Note that as N → ∞, s² → MSD, but the two can be quite different in small samples.
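The two-observation example above, in code (a sketch; Python's `statistics.variance` uses the N − 1 denominator):

```python
import statistics

X = [14, 20]
mean = sum(X) / len(X)                    # 17.0
sq_dev = sum((x - mean) ** 2 for x in X)  # 9 + 9 = 18

msd = sq_dev / len(X)         # divide by N:     9.0
var = sq_dev / (len(X) - 1)   # divide by N - 1: 18.0

print(msd, var)  # 9.0 18.0
```

`statistics.variance(X)` agrees with the N − 1 version.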

    Variance

We can provide a shortcut for calculating Σ (X − X̄)².

(X − X̄)² = (X − X̄)(X − X̄) = X² − 2XX̄ + X̄²

Now apply the summation:

Σ (X − X̄)² = Σ X² − 2X̄ Σ X + N X̄²

We have taken advantage of the fact that Σ X̄ = N X̄. Since X̄ = (Σ X)/N, we can now collect terms:

Σ (X − X̄)² = Σ X² − 2 (Σ X)²/N + N (Σ X)²/N²

= Σ X² − (Σ X)²/N

In other words, you can calculate the variance with just two pieces of information: Σ X² and (Σ X)².
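A quick numerical check of the shortcut, using the data from the worked table later in the handout (10, 20, 30, 50, 90):

```python
X = [10, 20, 30, 50, 90]
n = len(X)
mean = sum(X) / n

direct = sum((x - mean) ** 2 for x in X)            # sum of squared deviations
shortcut = sum(x * x for x in X) - sum(X) ** 2 / n  # uses only sum(X^2) and (sum X)^2

print(direct, shortcut)  # 4000.0 4000.0
```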

    Standard Deviation

The variance is expressed in terms of squared units rather than the units of X.

To put s² back on the same scale as X, we can take its square root and obtain the standard deviation:

Standard Deviation = s = √[ (1/(N − 1)) Σ_{i=1}^{N} (Xi − X̄)² ]

s (or σ for a population) is the closest analogue to an average deviation from the mean that we have.

s is also the standard measure of empirical variability used with interval- and ratio-level data.

In fact, it's generally good practice to report s every time you report X̄.

    Standard Deviation

Notice that, as with the mean, s is expressed in the units of the original variable X.

Example: In our 32-team NFL data, the variance s² is 93.6, while the standard deviation is about 9.67.

That means that an average (or, more accurately, typical) team's score was about 9-10 points away from the empirical mean of (about) 20.

    Changing the Origin and Scale

For a change in the origin, we have the following results:

If x′ = x + a, then x̄′ = x̄ + a and s_x′ = s_x.

For a change in the scale, we have the following results:

If x′ = bx, then x̄′ = b x̄ and s_x′ = |b| s_x.

We can combine these results into one by considering y = a + bx:

If y = a + bx, then ȳ = a + b x̄ and s_y = |b| s_x.
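These rules are easy to confirm numerically; a sketch with illustrative data and a = 100, b = −2:

```python
import math
import statistics

x = [10, 20, 30, 50, 90]
a, b = 100, -2
y = [a + b * xi for xi in x]  # y = a + bx

print(statistics.mean(y) == a + b * statistics.mean(x))                 # True: mean shifts and scales
print(math.isclose(statistics.stdev(y), abs(b) * statistics.stdev(x)))  # True: s scales by |b|
```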

    A Variant: Geometric s

As with the (arithmetic) mean, there's a variant of s that is better used when the geometric mean is the more appropriate statistic.

The geometric standard deviation is defined as:

sG = exp{ √[ (1/N) Σ_{i=1}^{N} (ln Xi − ln XG)² ] }

sG is the geometric analogue to s, and is best applied in every circumstance where the geometric mean is also more appropriately used.

    Mean Absolute Deviation

Variances and standard deviations, because they are based on means, have many of the same problems that means have.

In particular, they can be drastically affected by outlier values of X, with the result that one or two small or large values can artificially distort the picture presented by s.

Example: Suppose that Denver put up 82 points instead of 41, but the other teams' scores were identical.

The result is that the mean of points rises (from 20 to about 21.3), but the standard deviation rises even more dramatically (from 9.67 to about 14.2).

That is, changing the points scored by a single team caused s to increase by almost 50 percent.

    Mean Absolute Deviation

An alternative to the variance and standard deviation is to consider not squared deviations, but rather absolute values of deviations from some benchmark.

Because the concern is over resistance to outliers, we typically substitute the median for the mean, in two ways:

1 We consider deviations around the median, rather than the mean, and

2 When considering a typical value for the deviations, we use the median value rather than the mean.

The result is the median absolute deviation (MAD), defined as:

Median Absolute Deviation = MAD = median( |Xi − XMed| )

    Mean Absolute Deviation

Note that some textbooks refer to the mean absolute deviation when referring to MAD.

Mean Absolute Deviation = MAD2 = (1/N) Σ_{i=1}^{N} |Xi − X̄|

This is merely a substitution of the absolute value into the standard formula for a mean deviation, but one that lacks the robustness to outliers that MAD has.

The median absolute deviation (MAD) is often presented where there are highly influential outlier observations, i.e., in the same situations where the median is used as a measure of central tendency.
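The Denver example above can be used to contrast s and MAD directly (a Python sketch; the helper `mad` is ours, not a library function):

```python
import statistics

points = [7, 16, 10, 17, 20, 14, 10, 17,
          17, 38, 10, 17, 21, 34, 10, 34,
          3, 38, 20, 24, 29, 13, 26, 24,
          23, 13, 28, 10, 19, 24, 41, 14]
with_outlier = [82 if p == 41 else p for p in points]  # Denver scores 82 instead of 41

def mad(xs):
    """Median absolute deviation from the median."""
    m = statistics.median(xs)
    return statistics.median([abs(x - m) for x in xs])

print(round(statistics.stdev(points), 2), round(statistics.stdev(with_outlier), 1))  # 9.67 14.2
print(mad(points), mad(with_outlier))  # 6.0 6.0 -- the MAD is unchanged
```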

    Example

Table: Mean Squared Deviation (MSD), Variance (s²), and Standard Deviation (s)

Data X   Deviations (X − X̄)   Squared Deviations (X − X̄)²
10       −30                   900
20       −20                   400
30       −10                   100
50        10                   100
90        50                   2500

X̄ = 200/5 = 40;  Σ(X − X̄) = 0;  MSD = 4000/5 = 800;  s² = 4000/4 = 1000;  s = √1000 ≈ 32

Example

Table: Deviations, MAD, and MAD2

Data X   Absolute Deviations (Median) |X − XMed|   Absolute Deviations (Mean) |X − X̄|
10       20                                        30
20       10                                        20
30        0                                        10
50       20                                        10
90       60                                        50

X̄ = 200/5 = 40;  XMed = 30;  MAD = 20;  MAD2 = 120/5 = 24

    Grouped (Frequency) Data

If we have grouped (frequency) data, then it is possible to calculate our various measures of central tendency and dispersion using relative frequencies.

This can be useful if we only have a summary of, rather than the, original data.

Mean for Grouped Data:

X̄ = (1/N) Σ_{j=1}^{J} Xj fj

= (1/N) [ (X1 + X1 + . . . + X1) {f1 times} + (X2 + X2 + . . . + X2) {f2 times} + . . . ]

= (1/N) (X1 f1 + X2 f2 + . . . + XJ fJ)

where j indexes distinct values of X, and fj are the frequencies (counts) of each of the J distinct values of X.

    Grouped (Frequency) Data

Variance for Grouped Data:

Variance = s² = (1/(N − 1)) Σ_{j=1}^{J} (Xj − X̄)² fj

Standard Deviation for Grouped Data:

Standard Deviation = s = √[ (1/(N − 1)) Σ_{j=1}^{J} (Xj − X̄)² fj ]

    Example

Table: Calculation of Mean and Standard Deviation from a Relative Frequency Distribution (n = 200)

Height X   Weighting fj   Deviation (X − X̄)   Squared Deviations (X − X̄)²   Weighted Squared Deviations (X − X̄)² fj
60          4             −9                   81                            324
63         12             −6                   36                            432
66         44             −3                    9                            396
69         64              0                    0                              0
72         56              3                    9                            504
75         16              6                   36                            576
78          4              9                   81                            324

(Deviations are rounded to the nearest whole number.)

X̄ = 69.3;  s² = 2556/199 = 12.8;  s = √12.8 = 3.6
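The grouped-data formulas can be applied directly to the height table; a sketch reproducing its results (to one decimal place):

```python
import math

heights = [60, 63, 66, 69, 72, 75, 78]
freqs = [4, 12, 44, 64, 56, 16, 4]

N = sum(freqs)  # 200
mean = sum(x * f for x, f in zip(heights, freqs)) / N
var = sum((x - mean) ** 2 * f for x, f in zip(heights, freqs)) / (N - 1)

print(mean, round(var, 1), round(math.sqrt(var), 1))  # 69.3 12.8 3.6
```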

    Moments

Means and variances are examples of moments.

Moments are typically used to characterize random variables (rather than empirical distributions of variables), but they give rise to some useful additional statistics.

Think of a moment as a description of a distribution.

We can take a moment around some number, usually the mean (or zero).

In general, the kth moment around a variable's mean is Mk = E[(X − μ)^k].

The mean is the first moment: M1 = X̄ = E(X) = (Σ X)/N.

The variance is the second moment: M2 = s² = E[(X − μ)²] = Σ (X − X̄)² / (N − 1).

    Skew

Skew is the dimensionless version of the third moment:

M3 = E[(X − μ)³] = Σ (X − X̄)³ / N

The third moment is rendered dimensionless by dividing by the cube of the standard deviation of X:

s³ = s.d.(X)³ = (√s²)³

Thus, the skew of a distribution is given as:

Skew = γ1 = M3 / s³ = [ (1/N) Σ_{i=1}^{N} (Xi − X̄)³ ] / [ (1/N) Σ_{i=1}^{N} (Xi − X̄)² ]^(3/2)

    Skew

Skew is a measure of symmetry: it measures the extent to which a distribution has long, drawn-out tails on one side or the other.

Skew = 0: the distribution is symmetrical.

Skew > 0: positive skew (tail to the right).

Skew < 0: negative skew (tail to the left).

A Normal distribution is symmetrical and has a skew of 0. This will be useful in determining whether a variable is normally distributed.
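A sketch of the skew formula (the function below implements γ1 as defined above, on illustrative data):

```python
def skew(xs):
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # second moment about the mean
    m3 = sum((x - m) ** 3 for x in xs) / n   # third moment about the mean
    return m3 / m2 ** 1.5

print(skew([1, 2, 3]))                 # 0.0 -- symmetric
print(skew([1, 1, 2, 2, 3, 10]) > 0)   # True -- long right tail
```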

    Skew

Figure: Symmetrical Distribution (Zero Skew)

Skew

Figure: Right-Skewed Distribution (Skew > 0)

Skew

Figure: Left-Skewed Distribution (Skew < 0)

    Kurtosis

Kurtosis is the dimensionless version of the fourth moment:

M4 = E[(X − μ)⁴] = Σ (X − X̄)⁴ / N

The fourth moment is rendered dimensionless by dividing by the square of the variance of X:

s⁴ = (s²)²

Thus, the kurtosis of a distribution is given as:

Kurtosis = γ2 = M4 / s⁴ − 3 = [ (1/N) Σ_{i=1}^{N} (Xi − X̄)⁴ ] / [ (1/N) Σ_{i=1}^{N} (Xi − X̄)² ]² − 3

    Kurtosis

Kurtosis has to do with the peakedness of a distribution.

Peaked, heavy-tailed = leptokurtic (M4 is large).

Medium-tailed = mesokurtic.

Flat, light-tailed = platykurtic (M4 is small).

Kurtosis can also be viewed as a measure of non-Normality.

The Normal distribution is bell-shaped, whereas a kurtotic distribution is not.

The Normal distribution has zero (excess) kurtosis, while a flat-topped (platykurtic) distribution has negative values, and a pointy (leptokurtic) distribution has positive values.
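A sketch of the (excess) kurtosis formula γ2, on illustrative data:

```python
def excess_kurtosis(xs):
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n   # second moment about the mean
    m4 = sum((x - m) ** 4 for x in xs) / n   # fourth moment about the mean
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]             # flat-topped, short tails
print(excess_kurtosis(flat) < 0)   # True: platykurtic (value is about -1.3)
```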

    Moments

Figure: Kurtosis: Examples. [Three densities: Mesokurtic, Leptokurtic, and Platykurtic.]

    Special Cases

If the data are symmetrical, then a few things are true:

The median is equal to the average (mean) of the first and third quartiles. That, in turn, means that:

Half the IQR equals the MAD (i.e., the distance between the median and (say) the 75th percentile is equal to the MAD).

The skew is zero.

    Special Cases

    If you have nominal-level variables, all you can do is report category percentages.

    You should not use any of the measures discussed above with nominal-level data.

    If you have ordinal-level variables, it is usually a good idea to stick with robust measures of variation/dispersion.

    That means using MADs and the like, rather than variances and standard deviations.

    While this is not usually (read: almost never) done in practice, it remains the right thing to do.
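    A robust dispersion measure like the MAD is easy to compute by hand; a minimal Python sketch (unscaled, unlike R's mad(), which multiplies by a constant of 1.4826 by default):

```python
import statistics

def mad(x):
    """Unscaled median absolute deviation from the median."""
    med = statistics.median(x)
    return statistics.median(abs(xi - med) for xi in x)

mad([1, 1, 2, 3, 3, 4, 5])   # ordered codes; deviations 2,2,1,0,0,1,2 -> 1
```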

    Special Cases

    If you have dichotomous variables, then the descriptive statistics described above have some special characteristics.

    The range is always 1.

    The percentiles are always either 0 or 1.

    The IQR is always either 0 (if there are fewer than 25 or more than 75 percent 1s) or 1 (if there are between 25 and 75 percent 1s).

    The variance of a dichotomous variable D is related to the mean.

    s^2_D = D̄(1 − D̄)

    Since D̄ is necessarily between zero and one, s^2_D is as well.

    It follows that s_D > s^2_D.

    s_D and s^2_D will always be greatest when D̄ = 0.5, i.e., with an equal number of zeros and ones, and will decline as one moves away from this value.
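    The mean–variance link for a dummy can be checked directly; a minimal Python sketch (this is the N-denominator variance implied by the formula above; Stata's summarize divides by N − 1 instead, which is why its NFC example reports .2580645 rather than .25):

```python
def dichotomous_var(d):
    """Variance of a 0/1 variable: p * (1 - p), where p is the share of ones."""
    p = sum(d) / len(d)   # the mean of a dummy is the proportion of ones
    return p * (1 - p)

dichotomous_var([0, 1] * 16)   # equal zeros and ones -> 0.25, the maximum
```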


    Special Cases

    Example: If we added an indicator called NFC to our NFL Week One data, it would look like this:

    . sum NFC, detail

                                NFC
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs                  32
    25%            0              0       Sum of Wgt.          32
    50%           .5                      Mean                 .5
                            Largest       Std. Dev.      .5080005
    75%            1              1
    90%            1              1       Variance       .2580645
    95%            1              1       Skewness              0
    99%            1              1       Kurtosis              1

    Best Practices

    It is useful to present summary statistics for every variable you use in a paper.

    Presenting summary measures is a good way of ensuring that your reader can (better) understand what is going on.

    Typically, one presents means, standard deviations, and minimums & maximums, as well as an indication of the total number of observations in the data.

    For an important dependent variable, it's also generally a good idea to present some sort of graphical display of the distribution of the response variable. This is less important if the variable is dichotomous.
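    The recommended table can be produced with a few lines of code; a hedged Python sketch (variable names and data here are hypothetical, not the handout's):

```python
import statistics

def summary_table(variables):
    """Build (name, mean, std. dev., min, max) rows from a dict of name -> values."""
    return [(name, statistics.mean(v), statistics.stdev(v), min(v), max(v))
            for name, v in variables.items()]

# Hypothetical data for illustration only.
rows = summary_table({"Turnout": [0.61, 0.55, 0.72, 0.48]})
```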

    Best Practices

    Summary Statistics

                                              Standard
    Variable                            Mean  Deviation  Minimum  Maximum
    Assassination                       0.01       0.09        0        1
    Previous Assassinations Since 1945  0.45       0.76        0        4
    GDP Per Capita / 1000               5.83       6.04     0.33    46.06
    Political Unrest                    0.01       1.01    -1.67    20.11
    Political Instability              -0.03       0.92    -4.66    10.08
    Executive Selection                 1.54       1.34        0        4
    Executive Power                     3.17       2.39        0        6
    Repression                          1.67       1.19        0        3

    Note: N = 5614. Statistics are based on all non-missing observations in the model in Table X.


    Best Practices

    Figure: Conditioning Plots / Statistics (histograms of Points Scored, graphed by conference: AFC and NFC panels)

    Stata

    . summarize points, detail

                              points
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            3              3
     5%            7              7
    10%           10             10       Obs                  32
    25%           13             10       Sum of Wgt.          32
    50%           18                      Mean           20.03125
                            Largest       Std. Dev.      9.673657
    75%           25             34
    90%           34             38       Variance       93.57964
    95%           38             38       Skewness       .5219589
    99%           41             41       Kurtosis       2.518387

    . ameans points

        Variable |    Type          Obs        Mean    [95% Conf. Interval]
    -------------+----------------------------------------------------------
          points | Arithmetic        32    20.03125    16.54352    23.51898
                 | Geometric         32    17.57103    14.36262    21.49615
                 | Harmonic          32    14.65255    11.30963     20.8009
    ------------------------------------------------------------------------
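    The three means that ameans reports can be reproduced with Python's standard library; a small sketch on toy numbers (not the handout's points data):

```python
import statistics

x = [2, 4, 8]                              # toy data
arithmetic = statistics.mean(x)            # (2 + 4 + 8) / 3, about 4.667
geometric = statistics.geometric_mean(x)   # (2 * 4 * 8) ** (1/3), about 4.0
harmonic = statistics.harmonic_mean(x)     # 3 / (1/2 + 1/4 + 1/8), about 3.429
```

    Note the familiar ordering harmonic ≤ geometric ≤ arithmetic, also visible in the Stata output (14.65 ≤ 17.57 ≤ 20.03).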

    Stata

    . tabstat points, stats(mean median sum max min range sd variance skewness kurtosis q)

    variable | mean p50 sum max min range-------------+-------------------------------------------------------------

    points | 20.03125 18 641 41 3 38---------------------------------------------------------------------------

    sd variance skewness kurtosis p25 p50 p75--------------------------------------------------------------------------

    9.673657 93.57964 .5219589 2.518387 13 18 25--------------------------------------------------------------------------


    Stata

    . summarize population, detail
    . return list

    scalars:
                      r(N) = 42
                  r(sum_w) = 42
                   r(mean) = 18531.88095238095
                    r(Var) = 871461347.2293844
                     r(sd) = 29520.52416928576
               r(skewness) = 2.480049617027598
               r(kurtosis) = 9.764342116660073
                    r(sum) = 778339
                    r(min) = 27
                    r(max) = 146001
                     r(p1) = 27
                     r(p5) = 32
                    r(p10) = 276
                    r(p25) = 2405
                    r(p50) = 5372
                    r(p75) = 15892
                    r(p90) = 59330
                    r(p95) = 65667
                    r(p99) = 146001

    . scalar mean = r(mean)
    . display mean
    18531.881

    Stata

    . sort NFC
    . by NFC: summarize points

    ------------------------------------------------------------------------
    -> NFC = AFC

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          points |        16      19.125    10.07224         10         41

    ------------------------------------------------------------------------
    -> NFC = NFC

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          points |        16     20.9375    9.497149          3         38

    R

    > install.packages("foreign")
    > library(foreign)
    > install.packages("psych")
    > library(psych)
    > NFL <- read.dta(...)
    > attach(NFL)
    > summary(points)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       3.00   13.00   18.00   20.03   24.50   41.00

    > geometric.mean(points)
    [1] 17.57103
    > harmonic.mean(points)
    [1] 14.65255


    R

    > library(Hmisc)
    > describe(points)
    points
          n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
         32       0      18   20.03    8.65   10.00   13.00   18.00   24.50   34.00   38.00

               3  7 10 13 14 16 17 19 20 21 23 24 26 28 29 34 38 41
    Frequency  1  1  5  2  2  1  4  1  2  1  1  3  1  1  1  2  2  1
    %          3  3 16  6  6  3 12  3  6  3  3  9  3  3  3  6  6  3

    > library(pastecs)
    > stat.desc(points)
         nbr.val    nbr.null      nbr.na         min         max       range         sum      median
         32.0000      0.0000      0.0000      3.0000     41.0000     38.0000    641.0000     18.0000
            mean     SE.mean  CI.mean.0.95        var     std.dev    coef.var
         20.0312      1.7101        3.4877     93.5796      9.6737      0.4829

    R

    > var(points)
    [1] 93.58
    > sd(points)
    [1] 9.674
    > mad(points)
    [1] 8.896

    > library(moments)
    > skewness(points)
    [1] 0.522
    > kurtosis(points)
    [1] 2.518

    > by(points, NFC, summary)

    NFC: AFC
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       10.0    12.2    17.0    19.1    21.0    41.0
    ------------------------------------------------------------
    NFC: NFC
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
        3.0    15.2    22.0    20.9    26.5    38.0
