Looking at Data - Distributions Displaying Distributions with Graphs IPS Chapter 1.1 © 2009 W.H. Freeman and Company


  • Objectives (IPS Chapter 1.1): Displaying distributions with graphs

    Variables
    Types of variables
    Graphs for categorical variables: bar graphs, pie charts
    Graphs for quantitative variables: histograms, stemplots
    Stemplots versus histograms
    Interpreting histograms
    Time plots

  • Variables

    In a study, we collect information (data) from individuals. Individuals can be people, animals, plants, or any object of interest.

    A variable is any characteristic of an individual. A variable varies among individuals.

    Example: age, height, blood pressure, ethnicity, leaf length, first language

    The distribution of a variable tells us what values the variable takes and how often it takes these values.

  • Two types of variables

    Variables can be either quantitative or categorical.

    Quantitative: something that takes numerical values for which arithmetic operations, such as adding and averaging, make sense.
    Example: how tall you are, your age, your blood cholesterol level, the number of credit cards you own.

    Categorical: something that falls into one of several categories. What can be recorded is the count or proportion of individuals in each category.
    Example: your blood type (A, B, AB, O), your hair color, your ethnicity, whether you paid income tax last tax year or not.
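
    As a minimal sketch of the distinction above: averaging makes sense for a quantitative variable, while a categorical variable is summarized by counts or proportions. The five records below are made up for illustration, and the field names `age` and `blood_type` are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for five individuals; the values are made up.
individuals = [
    {"age": 34, "blood_type": "A"},
    {"age": 51, "blood_type": "O"},
    {"age": 27, "blood_type": "O"},
    {"age": 45, "blood_type": "AB"},
    {"age": 38, "blood_type": "A"},
]

# Quantitative variable: arithmetic such as averaging is meaningful.
mean_age = mean(p["age"] for p in individuals)

# Categorical variable: count the individuals in each category,
# then turn the counts into proportions of the whole sample.
counts = Counter(p["blood_type"] for p in individuals)
proportions = {category: n / len(individuals) for category, n in counts.items()}

print("mean age:", mean_age)
print("counts:", dict(counts))
print("proportions:", proportions)
```

    Averaging the blood types would be meaningless, which is one practical way to tell the two kinds of variable apart.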

  • How do you know if a variable is categorical or quantitative?

    Ask:
    What are the n individuals/units in the sample (of size n)?
    What is being recorded about those n individuals/units?
    Is that a number (quantitative) or a statement (categorical)?

    Quantitative: each individual is attributed a numerical value.
    Categorical: each individual is assigned to one of several categories.

  • Ways to chart categorical data

    Because the variable is categorical, the data in the graph can be ordered any way we want (alphabetically, by increasing value, by year, by personal preference, etc.).

    Bar graphs: each category is represented by a bar.

    Pie charts: the slices must represent the parts of one whole.

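  • Example: Top 10 causes of death in the United States, 2001

    Rank  Cause of death         Count       Percent of all deaths
    1     Heart diseases         700,142     29.0%
    2     Cancers                553,768     22.9%
    3     Cerebrovascular        163,538      6.8%
    4     Chronic respiratory    123,013      5.1%
    5     Accidents              101,537      4.2%
    6     Diabetes mellitus       71,372      3.0%
    7     Flu & pneumonia         62,034      2.6%
    8     Alzheimer's disease     53,852      2.2%
    9     Kidney disorders        39,480      1.6%
    10    Septicemia              32,238      1.3%
          All causes           2,416,425

    For each individual who died in the United States in 2001, we record the cause of death. The table above is a summary of that information.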

  • Bar graphs

    Each category is represented by one bar. The bar's height shows the count (or sometimes the percentage) for that particular category.

    Top 10 causes of death in the United States, 2001

    [Figure: bar graph of the top 10 causes of death, one bar per cause. Bars show counts (x1,000).]

    Rank based on number of deaths. Source: U.S. National Center for Health Statistics, National Vital Statistics Report, vol. 52, no. 3, Sept. 18, 2003. Web: www.cdc.gov/nchs (via http://www.infoplease.com/ipa/A0005110.html).

  • Top 10 causes of death in the United States, 2001

    Bar graph sorted by rank: easy to analyze.

    Bar graph sorted alphabetically: much less useful.

    [Figures: the same bar graph drawn twice, once with the bars sorted by rank and once with the bars sorted alphabetically by cause. Bars show counts (x1,000).]
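
    The effect of bar order can be sketched with a plain-text bar graph. The counts (in thousands) are the 2001 figures quoted in this section; the helper `text_bars` and its `scale` parameter are hypothetical names introduced only for this illustration.

```python
# Counts in thousands, as quoted in this section's 2001 cause-of-death data.
deaths = {
    "Heart diseases": 700.142,
    "Cancers": 553.768,
    "Cerebrovascular": 163.538,
    "Chronic respiratory": 123.013,
    "Accidents": 101.537,
    "Diabetes mellitus": 71.372,
    "Flu & pneumonia": 62.034,
    "Alzheimer's disease": 53.852,
    "Kidney disorders": 39.48,
    "Septicemia": 32.238,
}

def text_bars(items, scale=20.0):
    """Draw one '#' per `scale` thousand deaths (the scale is arbitrary)."""
    return [f"{name:<20} {'#' * round(count / scale)}" for name, count in items]

# Same data, two orderings: by decreasing count, then alphabetically.
by_rank = sorted(deaths.items(), key=lambda kv: kv[1], reverse=True)
by_name = sorted(deaths.items(), key=lambda kv: kv[0])

print("Sorted by rank")
print("\n".join(text_bars(by_rank)))
print()
print("Sorted alphabetically")
print("\n".join(text_bars(by_name)))
```

    In the rank-sorted output the bars shrink steadily from top to bottom, so the ranking can be read at a glance; in the alphabetical output the reader has to compare bar lengths one by one.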

  • Pie charts

    Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

    Percent of people dying from top 10 causes of death in the United States in 2000

  • Percent of deaths from top 10 causes versus percent of deaths from all causes

    Make sure your labels match the data.

    Make sure all percents add up to 100.
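
    The check that percents add up to 100 can be sketched as follows. The counts and the all-causes total are the 2001 figures quoted in this section; the "All other causes" slice is computed here as the remainder, which is an assumption about how that slice should be defined.

```python
# Total deaths from all causes and the top 10 counts, as quoted in this section.
total_deaths = 2_416_425
top10 = {
    "Heart diseases": 700_142,
    "Cancers": 553_768,
    "Cerebrovascular": 163_538,
    "Chronic respiratory": 123_013,
    "Accidents": 101_537,
    "Diabetes mellitus": 71_372,
    "Flu & pneumonia": 62_034,
    "Alzheimer's disease": 53_852,
    "Kidney disorders": 39_480,
    "Septicemia": 32_238,
}

# Assumption: the remaining slice is whatever is left of the whole.
other = total_deaths - sum(top10.values())
slices = {**top10, "All other causes": other}

# Each slice as a percent of the one whole (all deaths).
percents = {name: 100 * n / total_deaths for name, n in slices.items()}

for name, pct in percents.items():
    print(f"{name:<20} {pct:5.1f}%")
print(f"{'Total':<20} {sum(percents.values()):5.1f}%")
```

    Percents computed against the top 10 subtotal instead of the all-causes total would describe a different whole, which is why the label on the chart has to match the denominator that was used.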

  • Child poverty before and after government intervention (UNICEF, 1996)

    What does this chart tell you?

    The United States has the highest rate of child poverty among developed nations (22% of those under 18).
    Its government does the least, through taxes and subsidies, to remedy the problem (size of orange bars and percent difference between orange and blue bars).

    Could you transform this bar graph to fit in one pie chart? In two pie charts? Why?

    The poverty line is defined as 50% of national median income.

  • Ways to chart quantitative data

    Histograms and stemplots: these are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.

    Line graphs (time plots): use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.

  • Histograms

    The range of values that a variable can take is divided into equal-size intervals.

    The histogram shows the number of individual data points that fall in each interval.

    The first column represents all states with a Hispanic percent in their population between 0% and 4.99%. The height of the column shows how many states (27) have a percent in this range.

    The last column represents all states with a Hispanic percent in their population between 40% and 44.99%. There is only one such state: New Mexico, at 42.1% Hispanic.
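
    A minimal sketch of how a histogram's column heights are obtained with equal-size intervals of width 5. The data are hypothetical percentages for a handful of states; only the 42.1 value comes from the text. The function name `histogram_counts` is an assumption made for this example.

```python
import math

def histogram_counts(values, width, start=0.0):
    """Count the values in each interval [start + k*width, start + (k+1)*width)."""
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts

# Hypothetical percentages; only 42.1 (New Mexico) is taken from the text.
percents = [0.7, 1.5, 2.4, 3.2, 4.7, 5.3, 6.8, 9.0, 13.3, 17.1, 25.3, 32.0, 32.4, 42.1]

counts = histogram_counts(percents, width=5)
for k, c in enumerate(counts):
    # Each row is one column of the histogram, turned on its side.
    print(f"{5 * k:>2}% to <{5 * (k + 1):>2}%: {'*' * c}")
```

    Every value lands in exactly one interval, so the column heights always add up to the number of observations.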

  • Stemplots

    How to make a stemplot:

    1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which is that remaining final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
    2. Write the stems in a vertical column with the smallest value at the top, and draw a vertical line at the right of this column.
    3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.

    [Figure: example stemplot with columns labeled STEM and LEAVES.]
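
    The steps above can be sketched in a few lines for non-negative whole numbers. The function name `stemplot` and the sample values are hypothetical, chosen only to illustrate the procedure.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot for non-negative integers."""
    leaves = defaultdict(list)
    # Step 1: stem = all but the final digit, leaf = the final digit.
    # Sorting first puts the leaves of each stem in increasing order (step 3).
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Step 2: list every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>3} | {row}")
    return rows

sample = [12, 15, 21, 21, 34, 8]  # hypothetical observations
print("\n".join(stemplot(sample)))
```

    Stems with no observations are still printed, so gaps in the data stay visible in the plot. For values with one decimal, such as 58.2, multiplying by 10 first gives the stem 58 and the leaf 2.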

  • Percent of Hispanic residents in each of the 50 states

  • Stemplots

    To compare two related distributions, a back-to-back stemplot with common stems is useful.

    Stemplots do not work well for large datasets.

    When the observed values have too many digits, trim the numbers before making a stemplot.

    When plotting a moderate number of observations, you can split each stem.

  • Stemplots versus histograms

    Stemplots are quick-and-dirty histograms that can easily be done by hand, and therefore are very convenient for back-of-the-envelope calculations. However, they are rarely found in scientific or lay publications.

  • Interpreting histograms

    When describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a histogram by its shape, center, and spread.

    Histogram with a line connecting each column: too detailed.
    Histogram with a smoothed curve: highlights the overall pattern of the distribution.

  • Most common distribution shapes

    A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

  • Outliers

    An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

    The overall pattern is fairly symmetrical except for two states, Alaska and Florida, that clearly do not belong to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

    A large gap in the distribution is typically a sign of an outlier.
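
    One rough way to look for such gaps is to sort the observations and flag neighboring values that are unusually far apart. This is only a sketch: the data are hypothetical, the function name `gaps_wider_than` is made up, and the threshold is an arbitrary choice rather than a rule from the text.

```python
def gaps_wider_than(values, threshold):
    """Return (low, high) pairs of neighboring sorted values more than `threshold` apart."""
    ordered = sorted(values)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > threshold]

# Hypothetical percentages: one unusually low value and one unusually high value.
elderly_percent = [5.0, 10.0, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 18.5]

for low, high in gaps_wider_than(elderly_percent, threshold=3.0):
    print(f"gap between {low} and {high}")
```

    A gap only points at candidates. Whether the isolated value is a true outlier still has to be explained from what is known about the data.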

  • How to create a histogram

    It is an iterative process: try and try again.

    What bin size should you use?

    Not too many bins with either 0 or 1 counts.
    Not so summarized that you lose all the information.
    Not so detailed that it is no longer a summary.

    Rule of thumb: start with 5 to 10 bins. Look at the distribution and refine your bins. (There isn't a unique or perfect solution.)
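
    The try-and-refine loop can be sketched by counting, for several candidate numbers of bins, how many bins end up with 0 or 1 observations. The data are the 25 women's heights (in inches) used later in this deck; the function `bin_counts` and the candidate bin numbers are assumptions made for this example.

```python
def bin_counts(values, n_bins):
    """Split [min, max] into n_bins equal-width bins and count the values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes the values are not all identical
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    return counts

# Heights in inches of the 25 women from the mean example in this deck.
heights = [58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9,
           63.1, 63.9, 63.9, 64.0, 64.1, 64.5, 64.8, 65.2, 65.7, 66.2,
           66.7, 67.1, 67.8, 68.9, 69.6]

for n_bins in (3, 6, 12, 24):
    counts = bin_counts(heights, n_bins)
    sparse = sum(c <= 1 for c in counts)  # bins holding 0 or 1 observations
    print(f"{n_bins:>2} bins: {counts}  (bins with 0 or 1 counts: {sparse})")
```

    Very few bins hide the shape, while very many bins leave most of them nearly empty. The useful choices lie in between, and more than one of them can be reasonable.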

  • Same data set

  • IMPORTANT NOTE: Your data are the way they are. Do not try to force them into a particular shape.

    It is a common misconception that if you have a large enough data set, the data will eventually turn out nice and symmetrical.

    Histogram of Drydays in 1995

  • Line graphs: time plots

    In a time plot, time always goes on the horizontal, x axis. We describe time series by looking for an overall pattern and for striking deviations from that pattern. In a time series:

  • Retail price of fresh oranges over time

    Time is on the horizontal, x axis. The variable of interest, here the retail price of fresh oranges, goes on the vertical, y axis.

    This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing, most likely due to similar seasonal variations in the production of fresh oranges.

    There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry.
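
    One simple way to separate the two patterns described above is to average each year's observations: the seasonal swings cancel within a year, and what remains is the trend. The series below is entirely hypothetical (it is not the orange price data), and yearly averaging is just one illustrative technique, not the method used to draw the slide's plot.

```python
# Hypothetical monthly price index: a seasonal swing on top of a slow rise.
seasonal = [0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1]  # repeats every 12 months
prices = [100 + 2 * year + seasonal[month]
          for year in range(4) for month in range(12)]

def yearly_means(series, period=12):
    """Average each consecutive block of `period` observations."""
    return [sum(series[i:i + period]) / period
            for i in range(0, len(series), period)]

trend = yearly_means(prices)
print("yearly means:", trend)
print("first year, month by month:", prices[:12])
```

    The monthly values go up and down within each year, while the yearly means only rise, which is the upward trend with the seasonal variation averaged out.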

  • A time plot can be used to compare two or more data sets covering the same time period.

    The pattern over time for the number of flu diagnoses closely resembles that for the number of deaths from the flu, indicating that about 8% to 10% of the people diagnosed that year died shortly afterward from complications of the flu.

  • Scales matter

    A picture is worth a thousand words, BUT there is nothing like hard numbers. Look at the scales.

    How you stretch the axes and choose your scales can give a different impression.

  • Why does it matter?

    What's wrong with these graphs?

    [Figures: Cornell's tuition over time; Cornell's ranking over time.]

    Careful reading reveals that:

    1. The ranking graph covers an 11-year period, the tuition graph 35 years, yet they are shown comparatively on the cover and without a horizontal time scale.
    2. Ranking and tuition have very different units, yet both graphs are placed on the same page without a vertical axis to show the units.
    3. The impression of a recent sharp drop in the ranking graph actually shows that Cornell's rank has IMPROVED from 15th to 6th ...

  • Looking at Data - Distributions: Describing distributions with numbers (IPS Chapter 1.2) © 2009 W.H. Freeman and Company

  • Objectives (IPS Chapter 1.2): Describing distributions with numbers

    Measures of center: mean, median
    Mean versus median
    Measures of spread: quartiles, standard deviation
    Five-number summary and boxplot
    Choosing among summary statistics
    Changing the unit of measurement

  • Measure of center: the mean

    The mean or arithmetic average

    To calculate the average, or mean, add all values, then divide by the number of individuals. It is the "center of mass."

    Sum of heights is 1598.3; divided by 25 women = 63.9 inches.
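
    The calculation above can be reproduced directly. The 25 heights are the values listed in the data for this slide; the variable names are chosen for this sketch.

```python
# Heights in inches of the 25 women, as listed in this slide's data.
heights = [58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9,
           63.1, 63.9, 63.9, 64.0, 64.1, 64.5, 64.8, 65.2, 65.7, 66.2,
           66.7, 67.1, 67.8, 68.9, 69.6]

total = sum(heights)             # add all values
n = len(heights)                 # number of individuals
mean_height = total / n          # divide the sum by the count

print(f"sum = {total:.1f}, n = {n}, mean = {mean_height:.1f} inches")
```

    The results are rounded to one decimal for display, matching the precision of the measurements.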

    Heights of the 25 women, in inches (sorted): 58.2, 59.5, 60.7, 60.9, 61.9, 61.9, 62.2, 62.2, 62.4, 62.9, 63.1, 63.9, 63.9, 64.0, 64.1, 64.5, 64.8, 65.2, 65.7, 66.2, 66.7, 67.1, 67.8, 68.9, 69.6. Sum: 1598.3. Average: 63.9.

    [Figure: histogram of the women's heights.]

  • Mathematical notation

    Learn right away how to get the mean using your calculators.
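
    In the usual notation, the verbal rule "add all values, then divide by the number of individuals" can be written as below. The symbols are the standard conventions and are an assumption here, since the slide's own formula is not reproduced in this text: $x_1, \dots, x_n$ are the observations, $n$ is the number of individuals, and $\bar{x}$ is the mean.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

    For the women's heights, $n = 25$ and $\sum x_i = 1598.3$, so $\bar{x} = 1598.3 / 25 \approx 63.9$ inches.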

  • Your numerical summary must be meaningful.

    The distribution of women's heights appears coherent and symmetrical. The mean is a good numerical summary.

  • A single numerical summary here would not make sense.

    [Figure: histogram "Height of Plants by Color" with separate series for blue, pink, and red flowers. Horizontal axis: height in centimeters, 58 to 84. Vertical axis: number of plants.]

    30183

    194

    204

    214

    225

    235

    246

    256

    267

    278

    289

    2914

    3016

    0

    0

    0

    0

    0

    0

    0

    0

    0

    Frequency

    Bin

    Frequency

    Histogram

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    Year since diagnosis

    Percent dying per year

    Percent of people dying each year post-diagnosis from multiple myeloma

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    Years since diagnosis

    Percent of Patients

    Percent Dead from Multiple Myeloma

  • Measure of center: the median

    The median is the midpoint of a distribution: the number such that half of the observations are smaller and half are larger.

    1. Sort observations by size.

    n = number of observations
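The definition above can be sketched in a few lines of Python. Only step 1 (sorting) survives in the slide text; the midpoint rule used here (the middle observation for odd n, the mean of the two middle observations for even n) is the standard one and is assumed rather than quoted. The function name `median` and both sample lists are illustrative.

```python
def median(values):
    """Return the median: the midpoint of the sorted observations."""
    ordered = sorted(values)          # step 1: sort observations by size
    n = len(ordered)                  # n = number of observations
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle observation
    # even n: mean of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [3.4, 0.6, 6.1, 2.2, 4.35]   # illustrative values
even_sample = [1, 4, 2, 3]                # illustrative values
print(median(odd_sample))
print(median(even_sample))
```

Half of the observations lie on each side of the returned value, which is the property the slide uses to define the median.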

  • Comparing the mean and the median

    The mean and the median are equal when the distribution is exactly symmetric. In a skewed distribution the mean is pulled farther toward the long tail than the median. The median is a measure of center that is resistant to skew and outliers. The mean is not.

    [Figure: mean and median marked on three distributions: symmetric, left skew, and right skew. Panel titles: "Mean and median for a symmetric distribution" and "Mean and median for skewed distributions".]
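A small sketch of what "resistant" means in practice. The 25 values are the multiple myeloma survival times as they appear in the worksheet embedded with these slides; the transcription is an assumption, and the helper `center_summary` and the exaggerated value 140.0 are hypothetical.

```python
from statistics import mean, median

# Years until death after diagnosis with multiple myeloma (n = 25), transcribed
# from the worksheet embedded with the slides; the transcription is an assumption.
myeloma_years = [
    0.2, 0.7, 0.8, 0.9, 1.0, 1.0, 1.0, 1.1, 1.2, 1.5, 1.6, 1.8, 2.5,
    2.8, 2.8, 3.5, 3.8, 4.0, 5.0, 5.0, 5.1, 5.5, 7.0, 10.0, 14.0,
]


def center_summary(values):
    """Return (mean, median) so the two measures can be compared side by side."""
    return mean(values), median(values)


avg, mid = center_summary(myeloma_years)
print(f"mean = {avg:.3f}, median = {mid}")    # long right tail: mean sits above median

# Exaggerate the largest observation (hypothetical change) to see which measure moves.
stretched = myeloma_years[:-1] + [140.0]
avg2, mid2 = center_summary(stretched)
print(f"mean = {avg2:.3f}, median = {mid2}")
```

Stretching the largest observation changes the mean but leaves the median where it was, because the median depends only on the middle of the sorted data.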

  • Mean and median of a distribution with outliers

    [Figure: plot with axis label "Percent of people dying".]

  • Impact of skewed data

  • Measure of spread: the quartiles

    The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (it is the median of the lower half of the sorted data, excluding M).

    The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (it is the median of the upper half of the sorted data, excluding M).

    Example: Q1 = first quartile = 2.2, M = median = 3.4, Q3 = third quartile = 4.35
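The median-of-halves rule above can be sketched as follows. The 25 values are the series labelled disease X in the worksheet embedded with these slides; treat the transcription as an assumption. The function name `quartiles` is illustrative. Other quartile conventions exist, and software packages may give slightly different answers.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule described on the slide."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]                    # observations below the median position
    # With odd n the middle observation is M itself, so it is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


# Years until death, series labelled "disease x" in the embedded worksheet (assumed transcription).
disease_x_years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]
q1, m, q3 = quartiles(disease_x_years)
print(round(q1, 2), m, round(q3, 2))
```

With these values the sketch should reproduce the quartiles quoted on the slide (2.2, 3.4, and 4.35), up to floating-point rounding.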

  • Five-number summary and boxplot

    Five-number summary: min, Q1, M, Q3, max

    Smallest = min = 0.6
    Q1 = first quartile = 2.2
    M = median = 3.4
    Q3 = third quartile = 4.35
    Largest = max = 6.1

    [Figure: boxplot drawn from these five values.]
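To make the link between the five numbers and the picture concrete, here is a rough text-only sketch that draws a boxplot row from a five-number summary. The function `ascii_boxplot`, the axis limits, and the row width are all hypothetical choices; a plotting library would normally do this job.

```python
def ascii_boxplot(minimum, q1, med, q3, maximum, lo, hi, width=61):
    """Draw one text row such as |---[===M===]---| on an axis running from lo to hi."""
    if hi <= lo:
        raise ValueError("axis needs hi > lo")
    if not (lo <= minimum <= q1 <= med <= q3 <= maximum <= hi):
        raise ValueError("expected lo <= min <= Q1 <= M <= Q3 <= max <= hi")

    def col(x):
        # Map a data value to a column index in the row.
        return round((x - lo) / (hi - lo) * (width - 1))

    row = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"                       # whiskers span min..max
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                       # box spans Q1..Q3
    row[col(minimum)] = "|"
    row[col(maximum)] = "|"
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(med)] = "M"                    # median marker inside the box
    return "".join(row)


# Five-number summary from the slide: min, Q1, M, Q3, max.
summary = (0.6, 2.2, 3.4, 4.35, 6.1)
line = ascii_boxplot(*summary, lo=0.0, hi=7.0, width=71)
print(line)
```

The box covers the middle half of the data (Q1 to Q3), the marker inside it is the median, and the whiskers reach out to the smallest and largest observations.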

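  • Boxplots for skewed data

    Boxplots remain true to the data and clearly depict symmetry or skew.

    [Figure: "Comparison Using Box Plots", comparing boxplots for a normal and a right-skewed distribution. Side-by-side boxplots of years until death for Disease X and Multiple Myeloma.]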
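One rough way to read symmetry or skew off a boxplot is to compare how far the plot extends on each side of the median. The sketch below does this for two five-number summaries taken from the chart data embedded with this slide. Which summary belongs to which disease is inferred from the worksheet and should be treated as an assumption. The helper names and the 1.5 threshold in `lean` are hypothetical, and this is a reading aid, not a formal skewness measure.

```python
def box_shape(minimum, q1, med, q3, maximum):
    """Return the lengths of the four boxplot segments, left to right."""
    return {
        "lower_whisker": q1 - minimum,
        "lower_box": med - q1,
        "upper_box": q3 - med,
        "upper_whisker": maximum - q3,
    }


def lean(shape, ratio=1.5):
    """Crude label: compare the plot length above the median with the length below it."""
    below = shape["lower_whisker"] + shape["lower_box"]
    above = shape["upper_box"] + shape["upper_whisker"]
    if above > ratio * below:
        return "right-skewed"
    if below > ratio * above:
        return "left-skewed"
    return "roughly symmetric"


# Five-number summaries (min, Q1, M, Q3, max) from the embedded chart data;
# the disease labels are an assumed reading of the worksheet.
disease_x = (0.6, 2.2, 3.4, 4.35, 6.1)
multiple_myeloma = (0.2, 1.0, 2.5, 4.5, 14.0)

for name, summary in [("Disease X", disease_x), ("Multiple myeloma", multiple_myeloma)]:
    print(name, lean(box_shape(*summary)))
```

A long upper whisker relative to the rest of the plot is the visual signature of a right-skewed distribution, which is what the second summary shows.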

  • Suspected outliers

    Outliers are troublesome data points, and it is important to be able to identify them.

    One way to flag a suspected outlier is to measure the distance from the suspicious data point to the nearest quartile (Q1 or Q3) and compare it to the interquartile range (IQR), the distance between Q1 and Q3.

    We call an observation a suspected outlier if it falls more than 1.5 times the IQR above the third quartile or below the first quartile. This is called the 1.5 * IQR rule for outliers.

  • Q1 = 2.2, Q3 = 4.35

    Interquartile range: Q3 - Q1 = 4.35 - 2.2 = 2.15

    Distance to Q3: 7.9 - 4.35 = 3.55

    Individual #25 has a value of 7.9 years, which is 3.55 years above the third quartile. This is more than 1.5 * IQR = 3.225 years. Thus, individual #25 is a suspected outlier.
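The arithmetic in this example can be checked with a short sketch of the 1.5 * IQR rule. The quartiles and the value 7.9 come from the slide; the function name `suspected_outliers` and the other sample values passed to it are illustrative.

```python
def suspected_outliers(values, q1, q3, k=1.5):
    """Return the observations lying more than k * IQR below Q1 or above Q3."""
    iqr = q3 - q1
    low_fence = q1 - k * iqr       # anything below this is flagged
    high_fence = q3 + k * iqr      # anything above this is flagged
    return [x for x in values if x < low_fence or x > high_fence]


q1, q3 = 2.2, 4.35                 # quartiles quoted on the slide
iqr = q3 - q1
print(round(iqr, 3), round(1.5 * iqr, 3))

# 7.9 is individual #25 from the slide; the other values are illustrative.
print(suspected_outliers([0.6, 3.4, 6.1, 7.9], q1, q3))
```

Only 7.9 lies beyond Q3 + 1.5 * IQR (about 7.575 years), so it is the only value flagged. A flagged value is a candidate for closer inspection, not automatically an error.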

    [Embedded spreadsheet data for the height charts omitted. Recoverable chart titles: Women's Height; Women's Height + 4; Height of women on UCI basketball team; Height of Class, Women's and Men's Teams; Women's Height: Class and Team (x-axis: height in inches; y-axis: number of individuals); Size of Red-flowered Plants; Height of All Plants; Height of Plants by Color (x-axis: height in centimeters; y-axis: number of plants).]

    Source: http://www.dhs.vic.gov.au/phb/hce/peri/pn/tables/t20.html

    lower limit  upper limit      N  percent  died  percent dead  (unlabeled)
            500          999    325      0.5   178         54.77          147
           1000         1499    384      0.6    51         13.28          333
           1500         1999    755      1.2    56          7.42          699
           2000         2499   2433      3.8    57          2.34         2376
           2500         2999   9832     15.2    62          0.63         9770
           3000         3499  23739     36.7    54          0.23        23685
           3500         3999  19736     30.5    35          0.18        19701
           4000         4499   6320      9.8    14          0.22         6306
           4500         4999   1056      1.6     5          0.47         1051
           5000           up    109      0.2     1          0.92          108
          total                64689

    [Embedded spreadsheet data for the survival charts omitted. Recoverable chart titles: Years until death after diagnosis with disease X; Years until death after diagnosis with multiple myeloma (x-axis: years until death; y-axis: number of individuals).]

  • Measure of spread: the standard deviation

    The standard deviation s is used to describe the variation around the mean. Like the mean, it is not resistant to skew or outliers.

    [Figure: mean ± 1 s.d.]

  • Calculations

    We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator or software.

    Women's height (inches):

    Mean = 63.4

    Sum of squared deviations from mean = 85.2

    Degrees of freedom (df) = (n - 1) = 13

    s² = variance = 85.2/13 = 6.55 inches squared

    s = standard deviation = √6.55 = 2.56 inches
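As a rough sketch of what the calculator or software does, the arithmetic above can be reproduced in Python. The helper names are hypothetical; the sum of squared deviations (85.2) and n = 14 (implied by df = 13) come from the slide, while the short list of heights at the end is invented purely to show the from-scratch calculation.

```python
import math


def variance_and_sd_from_ss(sum_sq_dev, n):
    """Sample variance and standard deviation from a sum of squared deviations."""
    variance = sum_sq_dev / (n - 1)   # divide by the degrees of freedom
    return variance, math.sqrt(variance)


def sample_sd(data):
    """Sample standard deviation computed step by step from raw data."""
    mean = sum(data) / len(data)
    sum_sq_dev = sum((x - mean) ** 2 for x in data)
    return variance_and_sd_from_ss(sum_sq_dev, len(data))[1]


# Values from the slide: 14 women, sum of squared deviations 85.2.
variance, sd = variance_and_sd_from_ss(85.2, 14)
print(f"variance = {variance:.2f} inches squared, s = {sd:.2f} inches")

# Hypothetical raw heights, only to exercise sample_sd().
print(f"s for made-up data = {sample_sd([61, 62, 63, 64, 65]):.3f}")
```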

  • Properties of Standard Deviation

    s measures spread about the mean and should be used only when the mean is the measure of center.

    s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.

    s is not resistant to outliers.

    s has the same units of measurement as the original observations.
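These properties are easy to check numerically. The sketch below uses Python's standard `statistics.stdev`; all of the data values are hypothetical and chosen only to make each property visible.

```python
import statistics

# s = 0 only when every observation has the same value.
identical = [64, 64, 64, 64]
sd_identical = statistics.stdev(identical)

# s is not resistant: one extreme value inflates it.
heights_in = [61, 62, 63, 64, 65]          # hypothetical heights in inches
with_outlier = heights_in + [78]
sd_plain = statistics.stdev(heights_in)
sd_outlier = statistics.stdev(with_outlier)

# s has the units of the data: converting inches to centimeters
# multiplies s by the same factor, 2.54.
heights_cm = [h * 2.54 for h in heights_in]
sd_cm = statistics.stdev(heights_cm)

print("identical values:", sd_identical)
print(f"without outlier: {sd_plain:.2f} in, with outlier: {sd_outlier:.2f} in")
print(f"in centimeters: {sd_cm:.2f} cm")
```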

  • Software output for summary statistics:

    Excel - From Menu: Tools/Data Analysis/Descriptive Statistics

    Gives common statistics of your sample data.

    Minitab

  • Choosing among summary statistics

    Because the mean is not resistant to outliers or skew, use it to describe distributions that are fairly symmetrical and don't have outliers. Plot the mean and use the standard deviation for error bars.

    Otherwise, use the median and the five-number summary, which can be plotted as a boxplot.

    [Figure: boxplot compared with mean ± SD]

  • What should you use, when, and why? Arithmetic mean or median?

    Middletown is considering imposing an income tax on citizens. City hall wants a numerical summary of its citizens' income to estimate the total tax base.

    In a study of the standard of living of typical families in Middletown, a sociologist makes a numerical summary of family income in that city.

    Mean (city hall): Although income is likely to be right-skewed, the city government wants to know about the total tax base, and the total is the mean multiplied by the number of citizens.

    Median (sociologist): The sociologist is interested in a typical family and wants to lessen the impact of extreme incomes.
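A small numerical sketch may make the distinction concrete. The incomes below are invented for illustration (they are not Middletown data); they are right-skewed by a single very large value.

```python
import statistics

# Hypothetical right-skewed incomes for ten citizens.
incomes = [28_000, 32_000, 35_000, 41_000, 46_000,
           52_000, 60_000, 75_000, 90_000, 1_200_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# City hall: mean times the number of citizens recovers the total tax base.
total_from_mean = mean_income * len(incomes)

print("mean:", mean_income)
print("median:", median_income)
print("total from mean:", total_from_mean, "actual total:", sum(incomes))
```

The median describes the typical citizen far better here, but it cannot be scaled up to the total, which is why the two users need different summaries.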

  • Changing the unit of measurement

    Variables can be recorded in different units of measurement. Most often, one measurement unit is a linear transformation of another measurement unit: x_new = a + bx.

    Temperatures can be expressed in degrees Fahrenheit or degrees Celsius: Temperature (Fahrenheit) = 32 + (9/5) * Temperature (Celsius), which has the form a + bx with a = 32 and b = 9/5.

    Linear transformations do not change the basic shape of a distribution (skew, symmetry, multimodal). But they do change the measures of center and spread: Multiplying each observation by a positive number b multiplies both measures of center (mean, median) and spread (IQR, s) by b. Adding the same number a (positive or negative) to each observation adds a to measures of center and to quartiles but it does not change measures of spread (IQR, s).
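The effect of a linear transformation on the summary statistics can be verified with a short Python sketch. The temperatures are hypothetical, `linear_transform` and `iqr` are helper names made up for this example, and the quartiles use the default method of `statistics.quantiles`, which may differ slightly from the textbook's convention.

```python
import statistics


def linear_transform(xs, a, b):
    """Apply x_new = a + b * x to every observation."""
    return [a + b * x for x in xs]


def iqr(xs):
    """Interquartile range from the quartile cut points."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1


celsius = [12.0, 15.5, 18.0, 21.0, 30.5]   # hypothetical temperatures
a, b = 32, 9 / 5
fahrenheit = linear_transform(celsius, a, b)

summaries = [("mean", statistics.mean), ("median", statistics.median),
             ("s", statistics.stdev), ("IQR", iqr)]
for name, stat in summaries:
    print(f"{name:>6}: Celsius {stat(celsius):7.3f}   Fahrenheit {stat(fahrenheit):7.3f}")
```

Center moves to a + b * (old center), while spread is only multiplied by b; the added constant 32 drops out of s and the IQR.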

  • Looking at Data - Distributions: Density Curves and Normal Distributions. IPS Chapter 1.3. © 2009 W.H. Freeman and Company

  • Objectives (IPS Chapter 1.3): Density curves and Normal distributions

    Density curves

    Measuring center and spread for density curves

    Normal distributions

    The 68-95-99.7 rule

    Standardizing observations

    Using the standard Normal Table

    Inverse Normal calculations

    Normal quantile plots

  • Density curves

    A density curve is a mathematical model of a distribution.

    The total area under the curve, by definition, is equal to 1, or 100%.

    The area under the curve for a range of values is the proportion of all observations for that range.

    [Figure: histogram of a sample with the smoothed density curve that describes the population theoretically.]
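To illustrate the two area statements, here is a minimal Python sketch. The density f(x) = 2(1 - x) on [0, 1] is an assumption chosen because its areas are easy to verify by hand; it does not come from the slides, and the trapezoidal rule is just one simple way to approximate an area.

```python
def density(x):
    """A simple right-skewed density on [0, 1]: f(x) = 2 * (1 - x)."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def area(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi (trapezoidal rule)."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width


total_area = area(density, 0.0, 1.0)          # should be 1, i.e. 100%
share_below_half = area(density, 0.0, 0.5)    # proportion of values in [0, 0.5]

print(f"total area: {total_area:.4f}")
print(f"proportion between 0 and 0.5: {share_below_half:.4f}")
```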

  • Density curves come in any imaginable shape.

    Some are well known mathematically and others aren't.

  • Median and mean of a density curve

    The median of a density curve is the equal-areas point: the point that divides the area under the curve in half.

    The mean of a density curve is the balance point, at which the curve would balance if it were made of solid material.

    The median and mean are the same for a symmetric density curve. The mean of a skewed curve is pulled in the direction of the long tail.
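Both definitions can be tried out numerically. The sketch below assumes a hypothetical right-skewed density, f(x) = 2(1 - x) on [0, 1], finds the equal-areas point by bisection, and computes the balance point as the area under x * f(x). The function names are illustrative only.

```python
def density(x):
    """Hypothetical right-skewed density on [0, 1]: f(x) = 2 * (1 - x)."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def integrate(f, lo, hi, steps=2_000):
    """Trapezoidal approximation of the area under f between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width


def density_median(f, lo, hi, tol=1e-9):
    """Equal-areas point: bisect until half the area lies to the left."""
    left, right = lo, hi
    while right - left > tol:
        mid = 0.5 * (left + right)
        if integrate(f, lo, mid) < 0.5:
            left = mid
        else:
            right = mid
    return 0.5 * (left + right)


def density_mean(f, lo, hi):
    """Balance point: the area under x * f(x)."""
    return integrate(lambda x: x * f(x), lo, hi)


median = density_median(density, 0.0, 1.0)
mean = density_mean(density, 0.0, 1.0)
print(f"median = {median:.4f}, mean = {mean:.4f}")
```

For this curve the long tail is on the right, and the mean comes out larger than the median, as the slide describes.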

  • Normal distributions

    Normal or Gaussian distributions are a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard deviation σ (sigma): N(μ, σ).

    e = 2.71828..., the base of the natural logarithm

    π = pi = 3.14159...
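The slide's own formula did not survive extraction, so the sketch below assumes the standard form of the Normal density, f(x) = 1 / (σ √(2π)) * e^(-(1/2) ((x - μ) / σ)²), which uses both constants listed above. The function name `normal_pdf` is made up for this example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the N(mu, sigma) density curve at x (standard formula)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The curve is symmetric about mu and highest at mu.
print(normal_pdf(12, 15, 3), normal_pdf(18, 15, 3))
print(normal_pdf(15, 15, 3))

# Same mean, larger sigma: the curve is lower and more spread out.
for sigma in (2, 4, 6):
    print(f"sigma = {sigma}: peak height = {normal_pdf(15, 15, sigma):.4f}")
```

Changing μ slides the curve along the axis without changing its shape, while changing σ changes how spread out it is.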

  • A family of density curves

    Here, means are different (μ = 10, 15, and 20) while standard deviations are the same (σ = 3).

    Here, means are the same (μ = 15) while standard deviations are different (σ = 2, 4, and 6).

    [Embedded chart data omitted. Recoverable chart information: Normal density curves for Species A, Species B, and Species C; pine needle length (inches) for Species A (μ = 10, σ = 4) and Species B (μ = 15, σ = 6), shown as population distributions and as sampling distributions of the mean needle length for n = 4 and n = 25; gestation time (days) for "Vitamins only" (μ = 250, σ = 20) and "Vitamins and better food" (μ = 266, σ = 15).]