
Descriptive Statistics PPT


  • 3. The Methodology of Descriptive Statistics

    The purpose of a descriptive statistical investigation is to clarify a number of characteristics of a given variable, measured over time or as a cross section.

    A descriptive statistical analysis consists of:

    Setting up a histogram (or a time series plot)

    Calculating descriptive statistics

    Measures of location and position

    Inspection for outliers

    Classification of the distribution of the examined data set

    The range of statistical techniques utilized have not provided us with anything more than we would have got by taking the [...] variables and looking at their graphs

    Statistics EUS & Negot Chinese 1

    In statistics, we consider the following types of data:

    Cross-section: Many sectors/categories/regions at a given point in time

    Time series: One sector/category/region over a period of time, e.g. a year

    Panel: A combination of time series and cross section

    Census: Statistics provided through a questionnaire

  • 4. Histogram

    A histogram displays the classification of a quantitative variable into intervals.

    The horizontal axis (x-axis) is the interval scale.

    The vertical axis (y-axis) is used to display the frequency.

    Data set with 20 observations of incomes in 1,000 DKK:

    9 6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17

    Ranked:

    6 9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24

    How can the data set be divided into some efficient categories or groups?

    Ad hoc method:

    Interval (1,000 DKK)   Below 5   6 to 10   11 to 15   16 to 20   21 or more   Total
    Frequency                    0         3          5          9            3      20
    Relative frequency        0.00      0.15       0.25       0.45         0.15    1.00
    Cumulative frequency      0.00      0.15       0.40       0.85         1.00

    More mathematical approach: $2^k = n$, where $k$ is the number of categories and $n$ is the number of observations. With $n = 20$, $k = \ln 20 / \ln 2 \approx 4.3$, so about 4 categories, each of width $(24 - 6)/4 = 4.5$:

    Interval (1,000 DKK)   6 to 10.5   10.5 to 15   15 to 19.5   19.5 to 24   Total
    Frequency                      3            5            8            4      20
    Relative frequency          0.15         0.25         0.40         0.20    1.00
    Cumulative frequency        0.15         0.40         0.80         1.00
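    The mathematical approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frequency_table` is hypothetical, k is truncated to a whole number, and each class is taken to include its upper end point (so that 15 falls in the second class, which is what the frequencies 3, 5, 8, 4 imply).

```python
import math

incomes = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
           16, 17, 17, 18, 18, 19, 20, 21, 22, 24]


def frequency_table(data):
    """Group data into k equal-width classes, with k taken from 2^k = n."""
    n = len(data)
    k = int(math.log2(n))        # log2(20) is about 4.3, truncated to 4
    low, high = min(data), max(data)
    width = (high - low) / k     # (24 - 6) / 4 = 4.5
    counts = [0] * k
    for x in data:
        # Class j covers (low + j*width, low + (j+1)*width];
        # the minimum itself is placed in class 0.
        j = max(0, math.ceil((x - low) / width) - 1)
        counts[j] += 1
    edges = [low + j * width for j in range(k + 1)]
    return edges, counts


edges, counts = frequency_table(incomes)
relative = [c / len(incomes) for c in counts]

# Cumulative relative frequencies, built up class by class.
cumulative = []
running = 0.0
for r in relative:
    running += r
    cumulative.append(round(running, 2))

print(edges)
print(counts)
print(relative)
print(cumulative)
```

    Whether the boundary values go to the lower or the upper class is a convention; other software may place 15 differently and report slightly different frequencies.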

  • [Figure: histogram "Monthly Income". x-axis: Interval (1,000 DKK) with categories Under 5, 5 to 10, 11 to 15, 16 to 20, Over 20; y-axis: Frequency, 0 to 10.]

    Construction of a Histogram by use of Excel

  • A Special Histogram

    Age                     0 to 4   5 to 14   15 to 29   30 to 49   50 to 69   70 or more     Total
    Persons, mill.          116.60    196.90     350.50     283.10     147.90        36.80   1131.90
    Persons, %               10.30     17.40      30.97      25.01      13.07         3.25    100.00
    Units of 5 years             1         2          3          4          4         [4]*        18
    % per unit of 5 years    10.30      8.70      10.32       6.25       3.27         0.81

    * = assumed

    Using data from the first part of the table the following graph can be drawn:

    [Figure: bar chart "Population China 1990". x-axis: Age (0 to 4, 5 to 14, 15 to 29, 30 to 49, 50 to 69, 70 or more); y-axis: Percent, 0 to 35.]

    [Figure: histogram "Population China 1.7.1990". x-axis: age in 5-year steps from 0 to 85 plus; y-axis: 0 to 12.]

    Age, year    0     5     10    15    20    25    30    35    40    45    50    55    60    65    70    75    80    85    >85
    Person, %    10.3  8.7   8.7   10.3  10.3  10.3  6.25  6.25  6.25  6.25  3.3   3.3   3.3   3.3   0.8   0.8   0.8   0.8   0.8
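    The rescaling behind the special histogram (percent per unit of 5 years, so that intervals of unequal width become comparable) can be sketched as follows. The variable names are illustrative, the total is recomputed from the listed groups, and the width of the open-ended last group is the same assumption that the table marks with an asterisk.

```python
age_groups = ["0 to 4", "5 to 14", "15 to 29", "30 to 49", "50 to 69", "70 or more"]
persons_mill = [116.60, 196.90, 350.50, 283.10, 147.90, 36.80]
# Interval widths in units of 5 years; the last one is assumed (marked * in the table).
units_of_5_years = [1, 2, 3, 4, 4, 4]

total = sum(persons_mill)
share_pct = [100 * p / total for p in persons_mill]

# Dividing each share by the interval width gives the height of the bar.
pct_per_unit = [s / u for s, u in zip(share_pct, units_of_5_years)]

for group, share, height in zip(age_groups, share_pct, pct_per_unit):
    print(f"{group:>10}: {share:6.2f} % in total, {height:5.2f} % per 5-year unit")
```

    Without this rescaling, the wide age groups (such as 15 to 29) look dominant simply because they span more years.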

  • 5. Measures of Location

    Most frequent or typical observation

    Sample mean (MB page 26)

    Modus or Mode (MB page 32)

    Median (MB page 31)

    Geometric Mean (MB page 57)

    Relation among the mean, mode and median

    Quartiles and Percentiles

    The mean

    Uses information from all observations

    Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

    From the example: $\bar{x} = \frac{317}{20} = 15.85$

    Grouped data set: $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$

  • Example of Grouped data set on Grades

    Exam in the course International Economics that was held in February 2011 at the BA-int study in Flensburg

    Grades of passed (7-point DK scale)    2    4    7   10   12   Total
    Frequency                             10   26   33   19    4      92

    Grouped mean: $\bar{x} = \frac{2 \cdot 10 + 4 \cdot 26 + 7 \cdot 33 + 10 \cdot 19 + 12 \cdot 4}{92} = \frac{593}{92} = 6.45$

    Modus or Mode

    This is the most commonly observed value (highest frequency)

    Income data example: mode = 16

    Grouped data example: mode = 7

    Median

    The middlemost observation:

    Median = 0.50(n + 1) ordered position

    0.50(20 + 1) = 10.5 ordered observation = 16

    Example with grades: At the 46.5 ordered obs. = 7

    Important measure because it is not sensitive to outliers

    Data            6    9   10   12   13   14   14   15   16   16   16   17   17   18   18   19   20   21   22   24
    Frequency     .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05  .05
    Cumulative    .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95 1.00
    Number          1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
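    As a cross-check, the three measures of location can be computed with Python's standard `statistics` module. This is a small sketch, not the method used in the course (which refers to Excel and Megastat); the helper names `grouped_mean`, `grouped_mode` and `grouped_median` are made up for the illustration.

```python
from statistics import mean, median, mode

incomes = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
           16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

# Grade -> number of students, from the grouped grades example.
grades = {2: 10, 4: 26, 7: 33, 10: 19, 12: 4}


def grouped_mean(freq):
    """Weighted mean: sum of f_i * x_i divided by n."""
    return sum(x * f for x, f in freq.items()) / sum(freq.values())


def grouped_mode(freq):
    """The value with the highest frequency."""
    return max(freq, key=freq.get)


def grouped_median(freq):
    """Median of the data set obtained by repeating each value f_i times."""
    expanded = [x for x, f in sorted(freq.items()) for _ in range(f)]
    return median(expanded)


print(mean(incomes), mode(incomes), median(incomes))
print(round(grouped_mean(grades), 2), grouped_mode(grades), grouped_median(grades))
```

    For an even number of observations `median` averages the two middle values, which agrees with reading off position 0.50(n + 1) by interpolation.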

  • Sum function:

    Dealing with symmetry

  • Summing up

    With $M_o$ the mode, $M_d$ the median and $\bar{x}$ the mean:

    Symmetry: $M_o = M_d = \bar{x}$

    Skewed to the right: $M_o < M_d < \bar{x}$ (bulk of data left)

    Skewed to the left: $\bar{x} < M_d < M_o$ (bulk of data right)

    Income data set: $\bar{x} = 15.85 < M_o = 16$ and $M_d = 16$, so the data is skewed to the left

    Grade data set: $\bar{x} = 6.45 < M_o = 7$ and $M_d = 7$, so the data is skewed to the left

    Quartiles and Percentiles

    Quartile = q(n + 1) ordered position

    Percentile = p(n + 1) ordered position

    5-point summary:

    1st decile is the 0.10-percentile

    Lower quartile is the 0.25-percentile (called Q1)

    Median is the 0.50-percentile

    Upper quartile is the 0.75-percentile (called Q3)

    9th decile is the 0.90-percentile

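  • Example

    10: (20 + 1)(10/100) = 2.10, so the observation appears at 9.10

    25: (20 + 1)(25/100) = 5.25, so the observation appears at 13.75

    50: (20 + 1)(50/100) = 10.50, so the observation appears at 16.00

    75: (20 + 1)(75/100) = 15.75, so the observation appears at 18.25

    90: (20 + 1)(90/100) = 18.90, so the observation appears at 21.90

    Note: interpolating at positions 5.25 and 15.75 gives 13.25 and 18.75. The quartile values 13.75 and 18.25 listed here, which are also used in the box-plot calculations, match the position rule p(n - 1) + 1 instead (the rule behind Excel's QUARTILE function).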

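    The percentile positions can be checked with a short sketch. The function `percentile` and its `rule` argument are hypothetical names introduced here; the "n+1" rule is the one stated in the slides, while the "n-1" rule (position p(n - 1) + 1) is an alternative convention that is assumed here only because it reproduces the quartiles 13.75 and 18.25.

```python
incomes_ranked = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
                  16, 17, 17, 18, 18, 19, 20, 21, 22, 24]


def percentile(ranked, p, rule="n+1"):
    """Linear interpolation between ordered observations.

    rule="n+1": 1-based position p * (n + 1), as stated in the slides.
    rule="n-1": 1-based position p * (n - 1) + 1 (alternative convention).
    """
    n = len(ranked)
    pos = p * (n + 1) if rule == "n+1" else p * (n - 1) + 1
    pos = min(max(pos, 1), n)          # keep the position inside the data
    lower = int(pos)                   # ordered observation just below
    frac = pos - lower                 # how far towards the next observation
    if lower >= n:
        return float(ranked[-1])
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(p,
          round(percentile(incomes_ranked, p, "n+1"), 2),
          round(percentile(incomes_ranked, p, "n-1"), 2))
```

    The two rules agree on the median but differ in the tails, which is why quartiles from different programs do not always match.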

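    Geometric (multiplicative) Mean

    Defined as: $\bar{x}_G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$

    For positive observations, the geometric mean is never larger than the arithmetic mean; the two are equal only when all observations are identical

    Example: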
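    A minimal sketch of the geometric mean, assuming the usual definition as the n-th root of the product of n positive observations. The growth factors below are hypothetical numbers chosen for illustration; they are not taken from the course material.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires positive values")
    return math.prod(values) ** (1 / len(values))


# Hypothetical yearly growth factors: +10 %, +20 %, -10 %.
growth_factors = [1.10, 1.20, 0.90]

g = geometric_mean(growth_factors)
a = sum(growth_factors) / len(growth_factors)

print(f"geometric mean:  {g:.4f}")   # average growth factor per year
print(f"arithmetic mean: {a:.4f}")
```

    The geometric mean is the natural average for multiplicative quantities such as growth factors, because compounding it over n periods reproduces the total growth.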

  • 6. Measures of Dispersion

    Range, interquartile range, decile range and Box-plot

    Variance and standard deviation

    Coefficient of variation

    Skewness and kurtosis

    Range = maximum - minimum

    Quartile range = Q3 - Q1 = 50 % of obs.

    Decile range = D9 - D1 = 80 % of obs.

    Box-plot

    A Box-plot is used in order to identify outliers

    Outlier: obs. more than 3 times the IQR below Q1 or above Q3

    Suspected outlier: obs. more than 1.5 (but less than 3) times the IQR below Q1 or above Q3

    For our little data set we get

    [Figure: box-plot on an axis from 0 to 30, marking the (suspected) outlier, Q1, the median and Q3.]

  • Lower inner fence: Q1 - 1.5 IQR = 13.75 - 1.5(4.5) = 7.00

    Lower outer fence: Q1 - 3.0 IQR = 13.75 - 3.0(4.5) = 0.25

    Upper inner fence: Q3 + 1.5 IQR = 18.25 + 1.5(4.5) = 25.00

    Upper outer fence: Q3 + 3.0 IQR = 18.25 + 3.0(4.5) = 31.75
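    The fence rules can be sketched as a small classifier. The function names `fences` and `classify` are illustrative, and the quartiles 13.75 and 18.25 are taken as given from the example.

```python
incomes = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
           16, 17, 17, 18, 18, 19, 20, 21, 22, 24]


def fences(q1, q3):
    """Inner fences at 1.5 IQR and outer fences at 3.0 IQR from the quartiles."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }


def classify(x, f):
    """Label an observation relative to the fences."""
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "outlier"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "suspected outlier"
    return "regular"


f = fences(13.75, 18.25)
flagged = {x: classify(x, f) for x in incomes if classify(x, f) != "regular"}

print(f)
print(flagged)
```

    With these quartiles only the smallest income, 6, falls below the lower inner fence of 7.00, so it is the single suspected outlier.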

    Variance and Standard Deviation

    Make use of all observations

    or

    Example on data set for incomes
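    A minimal sketch for the income data, assuming the sample variance with n - 1 in the denominator. That choice is an assumption, made because it reproduces the sample variance 19.92 and standard deviation 4.46 listed in the summary table of section 9. Two algebraically equivalent forms are shown: the definition via squared deviations and a shortcut via the sum of squares. Both function names are hypothetical.

```python
import math

incomes = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
           16, 17, 17, 18, 18, 19, 20, 21, 22, 24]


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Equivalent form: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)


variance = sample_variance(incomes)
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 2))
print(round(sample_variance_shortcut(incomes), 2))
```

    The shortcut form is convenient on a pocket calculator, but the deviation form is numerically safer when the mean is large relative to the spread.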

  • Grouped data set

    Example:

    The Coefficient of Variation:

    Gives the relative dispersion

    Recommended for comparisons of different data sets

    If the distribution has large variation (is very flat) then CV takes a large value.

    If the distribution has small variation (is very steep) then CV takes a small value.
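    A short sketch, assuming the standard definition of the coefficient of variation as the standard deviation divided by the mean (the formula itself is not shown in the text above). The function name is illustrative.

```python
from statistics import mean, stdev

incomes = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
           16, 17, 17, 18, 18, 19, 20, 21, 22, 24]


def coefficient_of_variation(data):
    """Sample standard deviation relative to the mean (unit free)."""
    return stdev(data) / mean(data)


cv_thousands = coefficient_of_variation(incomes)
# The same incomes expressed in DKK instead of 1,000 DKK.
cv_dkk = coefficient_of_variation([1000 * x for x in incomes])

print(round(cv_thousands, 3), round(cv_dkk, 3))
```

    Because the unit of measurement cancels, the two results coincide, which is what makes CV suitable for comparing data sets measured on different scales.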

  • Some examples:

    SK > 0: Skewed to the right

    SK = 0: Symmetric

    SK < 0: Skewed to the left

    KU large: Peaked distribution (observations concentrated around the mean)

    KU low: Flat distribution (close to uniform)
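    The skewness and kurtosis figures reported for the income data in section 9 (-0.35 and 0.12) can be approximated with the sample-adjusted formulas sketched below. It is an assumption that these are the formulas behind the reported figures; they are the ones used by Excel's SKEW and KURT functions, and the function names here are hypothetical.

```python
from statistics import mean, stdev

incomes = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
           16, 17, 17, 18, 18, 19, 20, 21, 22, 24]


def sample_skewness(data):
    """Sample-adjusted skewness: n / ((n-1)(n-2)) * sum of cubed z-scores."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in data)


def sample_excess_kurtosis(data):
    """Sample-adjusted excess kurtosis (about 0 for a normal distribution)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    fourth = sum(((x - xbar) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


sk = sample_skewness(incomes)
ku = sample_excess_kurtosis(incomes)
print(round(sk, 2), round(ku, 2))
```

    A negative SK agrees with the earlier finding that the mean of the income data lies below the median and the mode.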

    7. Descriptive statistics on a Computer or Calculator

    Use of Excel

    Use of Megastat

    Use of pocket calculator

  • 8. Descriptive Statistics in a Grouped Data Set

    More complex data set for the distribution of income, Denmark

    Disposable household incomes, Denmark, 1987

    No.   Interval (1,000 DKK)    f_i     x_i    f_i x_i   x_i - x̄   (x_i - x̄)^2   f_i (x_i - x̄)^2
    1       0 -  49.9             146    36.9      5,387    -128.7      16563.69           2418298
    2      50 -  99.9             590    73.2     43,202     -92.4       8537.76           5036983
    3     100 - 149.9             414   123.7     51,224     -41.9       1755.61            726822
    4     150 - 199.9             323   175.1     56,568       9.5         90.25             29151
    5     200 - 249.9             325   225.9     73,435      60.3       3636.09           1181729
    6     250 - 299.9             210   273.6     57,446     108.0      11664.00           2449440
    7     300 - 399.9             139   340.6     47,339     175.0      30625.00           4256875
    8     400 -                    55   548.3     30,156     382.7     146459.29           8055261
    Sum                         2,202            364,757                                  24154559

    f_i: number of households (1,000). x_i: mean income (1,000 DKK). f_i x_i: income mass (Mio. DKK). x_i - x̄: deviation. (x_i - x̄)^2: square.

    Source: Statistics Denmark, Annual Statistical Review, 1994, pages 220-221.

    Mean and Standard Deviation

    There are 8 categories, i.e. k = 8. By insertion in the formulas:

    Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{364{,}757}{2{,}202} = 165.648$ (1,000 DKK), i.e. 165,648 DKK

    Standard deviation: $s = \sqrt{\frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2} = \sqrt{\frac{24{,}154{,}559}{2{,}202}} = 104.73$ (1,000 DKK)
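    The grouped mean and standard deviation can be recomputed from the table columns with the sketch below. Variable names are illustrative. The income mass column is used as printed; the squared deviations are recomputed from the rounded mean incomes, so the result agrees with the figure 104.73 only up to rounding.

```python
import math

# Columns from the table: households (1,000), mean income (1,000 DKK),
# income mass (Mio. DKK).
households = [146, 590, 414, 323, 325, 210, 139, 55]
mean_income = [36.9, 73.2, 123.7, 175.1, 225.9, 273.6, 340.6, 548.3]
income_mass = [5387, 43202, 51224, 56568, 73435, 57446, 47339, 30156]

n = sum(households)

# Grouped mean: total income mass divided by the number of households.
grouped_mean = sum(income_mass) / n

# Grouped standard deviation with n (not n - 1) in the denominator,
# matching the sum 24,154,559 / 2,202 used above.
weighted_squares = sum(f * (x - grouped_mean) ** 2
                       for f, x in zip(households, mean_income))
grouped_std = math.sqrt(weighted_squares / n)

print(n, round(grouped_mean, 3), round(grouped_std, 1))
```

    With more than two thousand (thousand) households, dividing by n or by n - 1 makes no practical difference to the standard deviation.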

  • Histogram, Quartiles, Median and Box-plot

    Consider the relative and cumulative distribution of data

    Disposable household incomes, Denmark, 1987

    No.   Interval (1,000 DKK)   Households (1,000), f_i   Frequency (%), f_i/n   Cumulative frequency (%)
    1       0 -  49.9                              146                    6.6                        6.6
    2      50 -  99.9                              590                   26.8                       33.4
    3     100 - 149.9                              414                   18.8                       52.2
    4     150 - 199.9                              323                   14.7                       66.9
    5     200 - 249.9                              325                   14.8                       81.7
    6     250 - 299.9                              210                    9.5                       91.2
    7     300 - 399.9                              139                    6.3                       97.5
    8     400 -                                     55                    2.5                      100.0
    Sum                                          2,202                  100.0

    Source: Statistics Denmark, Annual Statistical Review, 1994, pages 220-221.

    Histogram

    [Figure: histogram "Distribution Income, Denmark, 1987". x-axis: income intervals 0 - 49, 50 - 99, 100 - 149, 150 - 199, 200 - 249, 250 - 299, 300 - 349, 350 - 399, Above 400; y-axis: %, 0 to 30.]

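  • Sum Function

    How to do the interpolation

    We use a formula, for example given as:

    $\text{Value} = \text{end value of interval} - \frac{\text{excess over fractile (pct)}}{\text{total width of interval (pct)}} \times \text{interval width in value}$

    Here the excess is the cumulative percentage at the end of the interval minus the fractile, and the total width in percent is the frequency of the interval.

    Illustration: [Figure: cumulative frequency (%) against income (1,000 DKK). The curve rises from 33.4 % at 100 to 52.2 % at 149; the income at the 50 % level is marked "?".]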

  • Median: $150{,}000 - \frac{(52.2 - 50)}{18.8} \times 50{,}000 = 150{,}000 - 5{,}851 = 144{,}149$

    Similarly for the other quartiles and deciles:

    Lower quartile (Q1): $100{,}000 - \frac{(33.4 - 25)}{26.8} \times 50{,}000 = 84{,}328$

    Upper quartile (Q3): $250{,}000 - \frac{(81.7 - 75)}{14.8} \times 50{,}000 = 227{,}365$

    Lower decile: $100{,}000 - \frac{(33.4 - 10)}{26.8} \times 50{,}000 = 56{,}343$

    Upper decile: $300{,}000 - \frac{(91.2 - 90)}{9.5} \times 50{,}000 = 293{,}684$

    Inter Quartile Range (IQR): (Q3 - Q1) = 227,365 - 84,328 = 143,037

    Lower inner fence: Q1 - 1.5 IQR = 84,328 - 1.5(143,037) = -130,228

    Lower outer fence: Q1 - 3.0 IQR = 84,328 - 3.0(143,037) = -344,783

    Upper inner fence: Q3 + 1.5 IQR = 227,365 + 1.5(143,037) = 441,921

    Upper outer fence: Q3 + 3.0 IQR = 227,365 + 3.0(143,037) = 656,476

    Box-plot

    [Figure: box-plot on an axis from -300 to 600 (1,000 DKK), marking LOF = -345, LIF = -130, Q1 = 84, M = 144, Q3 = 227, UIF = 442, UOF = 656.]
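    The interpolation in the grouped income distribution can be sketched as a single function. The name `grouped_fractile` is hypothetical, the percentages are the rounded ones from the table, and the open-ended top interval is left out because it has no end value to interpolate from.

```python
# End value and width of each closed interval (DKK), with the rounded
# frequency and cumulative frequency in percent from the table.
upper_ends = [50_000, 100_000, 150_000, 200_000, 250_000, 300_000, 400_000]
widths = [50_000, 50_000, 50_000, 50_000, 50_000, 50_000, 100_000]
freq_pct = [6.6, 26.8, 18.8, 14.7, 14.8, 9.5, 6.3]
cum_pct = [6.6, 33.4, 52.2, 66.9, 81.7, 91.2, 97.5]


def grouped_fractile(p):
    """Income at the p-th percent fractile, by linear interpolation."""
    for end, width, freq, cum in zip(upper_ends, widths, freq_pct, cum_pct):
        if cum >= p:
            # Step back from the end of the interval by the share of the
            # interval that lies above the fractile.
            return end - (cum - p) / freq * width
    raise ValueError("fractile falls in the open-ended top interval")


median = round(grouped_fractile(50))
q1 = round(grouped_fractile(25))
q3 = round(grouped_fractile(75))
d1 = round(grouped_fractile(10))
d9 = round(grouped_fractile(90))
iqr = q3 - q1

print(d1, q1, median, q3, d9, iqr)
print(q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences
print(q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # outer fences
```

    Both lower fences are negative, so no household income can fall below them; only the upper fences are relevant for spotting outliers in this distribution.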

  • 9. Descriptive Statistics: an Example of Outliers

    Outliers are extremes

    Outliers make distributions non-normal

    Outliers change the mean, standard deviation and skewness

    However, the median remains constant

                                  Basic   Max=34   Max=44   Max=54
    Mean                          15.85    16.35    16.85    17.35   Increases
    Standard Error                 1.00     1.29     1.69     2.13
    Median                           16       16       16       16   Constant!!
    Modus / Mode                     16       16       16       16
    Standard deviation             4.46     5.79     7.56     9.52
    Sample variance               19.92    33.50    57.08    90.66
    Kurtosis                       0.12     3.88     8.99    12.55
    Skewness                      -0.35     1.19     2.43     3.16   Increases
    Range                            18       28       38       48
    Minimum                           6        6        6        6
    Maximum                          24       34       44       54
    Sum                             317      327      337      347
    Observations                     20       20       20       20
    Confidence interval (95 %)     2.09     2.71     3.54     4.46   Increases
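    The experiment in the table can be repeated with a few lines of Python: the largest income (24) is replaced by 34, 44 and 54 in turn, and the mean, median and standard deviation are recomputed. The dictionary `results` is just an illustrative container.

```python
from statistics import mean, median, stdev

basic = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

results = {}
for new_max in (24, 34, 44, 54):
    # Replace the largest observation and keep the other 19 unchanged.
    data = basic[:-1] + [new_max]
    results[new_max] = (mean(data), median(data), stdev(data))

for new_max, (m, md, s) in results.items():
    print(f"max={new_max}: mean={m:.2f}  median={md}  std.dev={s:.2f}")
```

    The mean and standard deviation move with every change of the extreme value, while the median stays at 16 because the middle of the ranked data is untouched.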