Upload
dineshhsenid
View
218
Download
4
Tags:
Embed Size (px)
DESCRIPTION
separator design
Citation preview
Looking at Data - DistributionsDisplaying Distributions with GraphsIPS Chapter 1.1 2009 W.H. Freeman and Company
Objectives (IPS Chapter 1.1)Displaying distributions with graphs VariablesTypes of variablesGraphs for categorical variablesBar graphs Pie chartsGraphs for quantitative variablesHistograms Stemplots Stemplots versus histogramsInterpreting histogramsTime plots
VariablesIn a study, we collect informationdatafrom individuals. Individuals can be people, animals, plants, or any object of interest.
A variable is any characteristic of an individual. A variable varies among individuals.
Example: age, height, blood pressure, ethnicity, leaf length, first language
The distribution of a variable tells us what values the variable takes and how often it takes these values.
Two types of variables Variables can be either quantitative
Something that takes numerical values for which arithmetic operations, such as adding and averaging, make sense.Example: How tall you are, your age, your blood cholesterol level, the number of credit cards you own.
or categorical.
Something that falls into one of several categories. What can be counted is the count or proportion of individuals in each category.Example: Your blood type (A, B, AB, O), your hair color, your ethnicity, whether you paid income tax last tax year or not.
How do you know if a variable is categorical or quantitative?Ask: What are the n individuals/units in the sample (of size n)?What is being recorded about those n individuals/units?Is that a number ( quantitative) or a statement ( categorical)?Quantitative Each individual is attributed a numerical value.Categorical Each individual is assigned to one of several categories.
Ways to chart categorical dataBecause the variable is categorical, the data in the graph can be ordered any way we want (alphabetical, by increasing value, by year, by personal preference, etc.)
Bar graphs Each category is represented by a bar.
Pie charts The slices must represent the parts of one whole.
Example: Top 10 causes of death in the United States 2001For each individual who died in the United States in 2001, we record what was the cause of death. The table above is a summary of that information.
Top 10 causes of deaths in the United States 2001Bar graphs
Each category is represented by one bar. The bars height shows the count (or sometimes the percentage) for that particular category.
Chart20
700.142
553.768
163.538
123.013
101.537
71.372
62.034
53.852
39.48
32.238
Percents
Counts (x1000)
Sheet1
Heart disease909,894
All Races, Females Percent* 1. Heart Disease 28.6 2. Cancer 21.6 3. Stroke 8.0 4. Chronic lower respiratory diseases 5.2 5. Alzheimer's disease 3.4 6. Diabetes 3.1 7. Unintentional injuries 3.0 8. Influenza and pneumonia 3.0 9. Kidney disease 1.7
All Races, FemalesPercent*Percent
1. Heart Disease28.6
2. Cancer21.6
3. Stroke8
4. Chronic lower respiratory diseases5.2
5. Alzheimer's disease3.4
6. Diabetes3.1
7. Unintentional injuries3
8. Influenza and pneumonia3
9. Kidney disease1.7
10. Septicemia1.5
All other causes20.9
79.1
Rank1Causes of deathNumberDeaths per
100,000 populationAll causes2,416,425848.51.Diseases of heart700,142245.82.Malignant neoplasms (cancer)553,768194.43.Cerebrovascular diseases 163,53857.44.Chronic lower respiratory diseases123,01343.25.Accidents (unintentional injuries)101,53735.76.Diabe
2,416,425
RankCauses of deathCountsPercentsCounts (x1,000)
1Heart diseases700,14229.0%700.142
2Cancers553,76822.9%553.768
3Cerebrovascular163,5386.8%163.538
4Chronic respiratory123,0135.1%123.013
5Accidents101,5374.2%101.537
6Diabetes mellitus71,3723.0%71.372
7Flu & pneumonia62,0342.6%62.034
8Alzheimer's disease53,8522.2%53.852
9Kidney disorders39,4801.6%39.48
10Septicemia32,2381.3%32.238
All other causes629,96726.1%629.967
http://www.infoplease.com/ipa/A0005110.html
1. Rank based on number of deaths.
Source: U.S. National Center for Health Statistics, National Vital Statistics Report, vol. 52, no. 3, Sept. 18, 2003. Web: www.cdc.gov/nchs .
NUM
B
ER
O
F
IND
I
VID
U
AL
S
CA
T
E
GO
RY
(
D
is
ease
)
Ch
ea
se
Hea
rt
D
i
sea
s
e
C
a
nce
r
St
rok
e
r
on
i
c
Re
s
pi
r
a
t
o
r
y D
is
Acc
i
d
e
nt
s
D
i
abe
t
e
s
P
neu
m
on
i
a
/I
nf
l
uenz
a
A
l
zhe
im
e
r
's
D
i
s
e
ase
K
i
dney
D
i
se
a
se
S
ep
ti
ce
mi
a
T
OT
AL
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
NUM
B
ER
O
F
IND
I
VID
U
AL
S
CA
T
E
GO
RY
(
D
is
ease
)
Ch
ea
se
Hea
rt
D
i
sea
s
e
C
a
nce
r
St
rok
e
r
on
i
c
Re
s
pi
r
a
t
o
r
y D
is
Acc
i
d
e
nt
s
D
i
abe
t
e
s
P
neu
m
on
i
a
/I
nf
l
uenz
a
A
l
zhe
im
e
r
's
D
i
s
e
ase
K
i
dney
D
i
se
a
se
S
ep
ti
ce
mi
a
T
OT
AL
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
Sheet1
0
0
0
0
0
0
0
0
0
0
Percents
Counts (x1,000)
Sheet2
0
0
0
0
0
0
0
0
0
0
0
Percents
Percent
Sheet3
0
0
0
0
0
0
0
0
0
0
0
Counts
0
0
0
0
0
0
0
0
0
0
Counts
Cerebrovasc diseases
Bar graph sorted by rank Easy to analyzeTop 10 causes of deaths in the United States 2001Sorted alphabetically Much less useful
Chart20
700.142
553.768
163.538
123.013
101.537
71.372
62.034
53.852
39.48
32.238
Percents
Counts (x1000)
Sheet1
Heart disease909,894
All Races, Females Percent* 1. Heart Disease 28.6 2. Cancer 21.6 3. Stroke 8.0 4. Chronic lower respiratory diseases 5.2 5. Alzheimer's disease 3.4 6. Diabetes 3.1 7. Unintentional injuries 3.0 8. Influenza and pneumonia 3.0 9. Kidney disease 1.7
All Races, FemalesPercent*Percent
1. Heart Disease28.6
2. Cancer21.6
3. Stroke8
4. Chronic lower respiratory diseases5.2
5. Alzheimer's disease3.4
6. Diabetes3.1
7. Unintentional injuries3
8. Influenza and pneumonia3
9. Kidney disease1.7
10. Septicemia1.5
All other causes20.9
79.1
Rank1Causes of deathNumberDeaths per
100,000 populationAll causes2,416,425848.51.Diseases of heart700,142245.82.Malignant neoplasms (cancer)553,768194.43.Cerebrovascular diseases 163,53857.44.Chronic lower respiratory diseases123,01343.25.Accidents (unintentional injuries)101,53735.76.Diabe
2,416,425
RankCauses of deathCountsPercentsCounts (x1,000)
1Heart diseases700,14229.0%700.142
2Cancers553,76822.9%553.768
3Cerebrovascular163,5386.8%163.538
4Chronic respiratory123,0135.1%123.013
5Accidents101,5374.2%101.537
6Diabetes mellitus71,3723.0%71.372
7Flu & pneumonia62,0342.6%62.034
8Alzheimer's disease53,8522.2%53.852
9Kidney disorders39,4801.6%39.48
10Septicemia32,2381.3%32.238
All other causes629,96726.1%629.967
http://www.infoplease.com/ipa/A0005110.html
1. Rank based on number of deaths.
Source: U.S. National Center for Health Statistics, National Vital Statistics Report, vol. 52, no. 3, Sept. 18, 2003. Web: www.cdc.gov/nchs .
NUM
B
ER
O
F
IND
I
VID
U
AL
S
CA
T
E
GO
RY
(
D
is
ease
)
Ch
ea
se
Hea
rt
D
i
sea
s
e
C
a
nce
r
St
rok
e
r
on
i
c
Re
s
pi
r
a
t
o
r
y D
is
Acc
i
d
e
nt
s
D
i
abe
t
e
s
P
neu
m
on
i
a
/I
nf
l
uenz
a
A
l
zhe
im
e
r
's
D
i
s
e
ase
K
i
dney
D
i
se
a
se
S
ep
ti
ce
mi
a
T
OT
AL
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
NUM
B
ER
O
F
IND
I
VID
U
AL
S
CA
T
E
GO
RY
(
D
is
ease
)
Ch
ea
se
Hea
rt
D
i
sea
s
e
C
a
nce
r
St
rok
e
r
on
i
c
Re
s
pi
r
a
t
o
r
y D
is
Acc
i
d
e
nt
s
D
i
abe
t
e
s
P
neu
m
on
i
a
/I
nf
l
uenz
a
A
l
zhe
im
e
r
's
D
i
s
e
ase
K
i
dney
D
i
se
a
se
S
ep
ti
ce
mi
a
T
OT
AL
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
Sheet1
0
0
0
0
0
0
0
0
0
0
Percents
Counts (x1,000)
Sheet2
0
0
0
0
0
0
0
0
0
0
0
Percents
Percent
Sheet3
0
0
0
0
0
0
0
0
0
0
0
Counts
0
0
0
0
0
0
0
0
0
0
Counts
Cerebrovasc diseases
Chart21
101.537
53.852
553.768
163.538
123.013
71.372
62.034
700.142
39.48
32.238
Percents
Counts (x1000)
Sheet1
Heart disease909,894
All Races, Females Percent* 1. Heart Disease 28.6 2. Cancer 21.6 3. Stroke 8.0 4. Chronic lower respiratory diseases 5.2 5. Alzheimer's disease 3.4 6. Diabetes 3.1 7. Unintentional injuries 3.0 8. Influenza and pneumonia 3.0 9. Kidney disease 1.7
All Races, FemalesPercent*Percent
1. Heart Disease28.6
2. Cancer21.6
3. Stroke8
4. Chronic lower respiratory diseases5.2
5. Alzheimer's disease3.4
6. Diabetes3.1
7. Unintentional injuries3
8. Influenza and pneumonia3
9. Kidney disease1.7
10. Septicemia1.5
All other causes20.9
79.1
Rank1Causes of deathNumberDeaths per
100,000 populationAll causes2,416,425848.51.Diseases of heart700,142245.82.Malignant neoplasms (cancer)553,768194.43.Cerebrovascular diseases 163,53857.44.Chronic lower respiratory diseases123,01343.25.Accidents (unintentional injuries)101,53735.76.Diabe
2,416,425
RankCauses of deathCountsPercentsCounts (x1,000)
5Accidents101,5374.2%101.537
8Alzheimer's disease53,8522.2%53.852
2Cancers553,76822.9%553.768
3Cerebrovascular163,5386.8%163.538
4Chronic respiratory123,0135.1%123.013
6Diabetes mellitus71,3723.0%71.372
7Flu & pneumonia62,0342.6%62.034
1Heart diseases700,14229.0%700.142
9Kidney disorders39,4801.6%39.48
10Septicemia32,2381.3%32.238
All other causes629,96726.1%629.967
http://www.infoplease.com/ipa/A0005110.html
1. Rank based on number of deaths.
Source: U.S. National Center for Health Statistics, National Vital Statistics Report, vol. 52, no. 3, Sept. 18, 2003. Web: www.cdc.gov/nchs .
NUM
B
ER
O
F
IND
I
VID
U
AL
S
CA
T
E
GO
RY
(
D
is
ease
)
Ch
ea
se
Hea
rt
D
i
sea
s
e
C
a
nce
r
St
rok
e
r
on
i
c
Re
s
pi
r
a
t
o
r
y D
is
Acc
i
d
e
nt
s
D
i
abe
t
e
s
P
neu
m
on
i
a
/I
nf
l
uenz
a
A
l
zhe
im
e
r
's
D
i
s
e
ase
K
i
dney
D
i
se
a
se
S
ep
ti
ce
mi
a
T
OT
AL
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
NUM
B
ER
O
F
IND
I
VID
U
AL
S
CA
T
E
GO
RY
(
D
is
ease
)
Ch
ea
se
Hea
rt
D
i
sea
s
e
C
a
nce
r
St
rok
e
r
on
i
c
Re
s
pi
r
a
t
o
r
y D
is
Acc
i
d
e
nt
s
D
i
abe
t
e
s
P
neu
m
on
i
a
/I
nf
l
uenz
a
A
l
zhe
im
e
r
's
D
i
s
e
ase
K
i
dney
D
i
se
a
se
S
ep
ti
ce
mi
a
T
OT
AL
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
709
,
894
551
,
833
166
,
028
123
,
550
93
,
592
68
,
662
67
,
024
49
,
044
37
,
672
31
,
613
1
,
898,912
Sheet1
0
0
0
0
0
0
0
0
0
0
Percents
Counts (x1,000)
Sheet2
0
0
0
0
0
0
0
0
0
0
0
Percents
Percent
Sheet3
700142
553768
163538
123013
101537
71372
62034
53852
39480
32238
629967
Counts
0
0
0
0
0
0
0
0
0
0
Counts
Cerebrovasc diseases
Percent of people dying fromtop 10 causes of death in the United States in 2000Pie chartsEach slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.
Percent of deaths from top 10 causesPercent of deaths from all causesMake sure your labels match the data.
Make sure all percentsadd up to 100.
Child poverty before and after government interventionUNICEF, 1996What does this chart tell you?The United States has the highest rate of child poverty among developed nations (22% of under 18).Its government does the leastthrough taxes and subsidiesto remedy the problem (size of orange bars and percent difference between orange/blue bars).
Could you transform this bar graph to fit in 1 pie chart? In two pie charts? Why?The poverty line is defined as 50% of national median income.
Ways to chart quantitative data
Histograms and stemplotsThese are summary graphs for a single variable. They are very useful to understand the pattern of variability in the data.
Line graphs: time plotsUse when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.
HistogramsThe range of values that a variable can take is divided into equal size intervals.
The histogram shows the number of individual data points that fall in each interval.The first column represents all states with a Hispanic percent in their population between 0% and 4.99%. The height of the column shows how many states (27) have a percent in this range.
The last column represents all states with a Hispanic percent in their population between 40% and 44.99%. There is only one such state: New Mexico, at 42.1% Hispanics.
Stem plotsHow to make a stemplot:Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which is that remaining final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.Write the stems in a vertical column with the smallest value at the top, and draw a vertical line at the right of this column.Write each leaf in the row to the right of its stem, in increasing order out from the stem.STEMLEAVES
Percent of Hispanic residents in each of the 50 states
Stem PlotTo compare two related distributions, a back-to-back stem plot with common stems is useful.
Stem plots do not work well for large datasets.
When the observed values have too many digits, trim the numbers before making a stem plot.
When plotting a moderate number of observations, you can split each stem.
Stemplots versus histograms Stemplots are quick and dirty histograms that can easily be done by hand, and therefore are very convenient for back of the envelope calculations. However, they are rarely found in scientific or laymen publications.
Interpreting histogramsWhen describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a histogram by its shape, center, and spread.Histogram with a line connecting each column too detailedHistogram with a smoothed curve highlighting the overall pattern of the distribution
Most common distribution shapesA distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
OutliersAn important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.AlaskaFloridaThe overall pattern is fairly symmetrical except for 2 states that clearly do not belong to the main trend. Alaska and Florida have unusual representation of the elderly in their population.
A large gap in the distribution is typically a sign of an outlier.
How to create a histogramIt is an iterative process try and try again.
What bin size should you use?
Not too many bins with either 0 or 1 countsNot overly summarized that you loose all the informationNot so detailed that it is no longer summary rule of thumb: start with 5 to 10 binsLook at the distribution and refine your bins(There isnt a unique or perfect solution)
Same data set
IMPORTANT NOTE:Your data are the way they are. Do not try to force them into a particular shape.It is a common misconception that if you have a large enough data set, the data will eventually turn out nice and symmetrical.Histogram of Drydays in 1995
Line graphs: time plotsIn a time plot, time always goes on the horizontal, x axis. We describe time series by looking for an overall pattern and for striking deviations from that pattern. In a time series:
Retail price of fresh oranges over timeThis time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing most likely due to similar seasonal variations in the production of fresh oranges.
There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry.Time is on the horizontal, x axis.
The variable of interesthere retail price of fresh oranges goes on the vertical, y axis.
A time plot can be used to compare two or more data sets covering the same time period.The pattern over time for the number of flu diagnoses closely resembles that for the number of deaths from the flu, indicating that about 8% to 10% of the people diagnosed that year died shortly afterward, from complications of the flu.
A picture is worth a thousand words,
BUT There is nothing like hard numbers. Look at the scales.Scales matterHow you stretch the axes and choose your scales can give a different impression.
Why does it matter?What's wrong with these graphs?
Careful reading reveals that:
1. The ranking graph covers an 11-year period, the tuition graph 35 years, yet they are shown comparatively on the cover and without a horizontal time scale.2. Ranking and tuition have very different units, yet both graphs are placed on the same page without a vertical axis to show the units.3. The impression of a recent sharp drop in the ranking graph actually shows that Cornells rank has IMPROVED from 15th to 6th ...Cornells tuition over timeCornells ranking over time
Looking at Data - DistributionsDescribing distributions with numbersIPS Chapter 1.2 2009 W.H. Freeman and Company
Objectives (IPS Chapter 1.2)Describing distributions with numbers
Measures of center: mean, medianMean versus medianMeasures of spread: quartiles, standard deviationFive-number summary and boxplotChoosing among summary statisticsChanging the unit of measurement
The mean or arithmetic average
To calculate the average, or mean, add all values, then divide by the number of individuals. It is the center of mass.
Sum of heights is 1598.3divided by 25 women = 63.9 inches Measure of center: the mean
3 small ht
heightbinnormalFrequencyoutlierFrequencystemleavesheightheight
58.25757057058258.258.264.0
59.55858058059559.559.564.5
60.759591591607960.760.764.1
60.960601601619960.960.964.8
61.96161261262224961.961.965.2
61.9626226226319961.961.965.7
62.26363463464015862.262.266.2
62.264644644652762.262.266.7
62.465653653662762.462.467.1
62.966662662671862.962.967.8
63.16767267268963.163.968.9
63.96868268269663.963.169.6
63.96969169163.963.9
64.07070170164.0
64.171More071064.1
64.57272064.5
64.87474164.8
65.2More065.2
65.765.7
66.266.2
66.766.7
67.167.1
67.867.8
68.968.9
69.669.6
74.01598.3sum
63.9average
63.9
3 small ht
Frequency
Bin
Frequency
Histogram
birthweight
Frequency
Bin
Frequency
Histogram
myeloma
lowerlimitupperlimitNpercentdiepercent deadhttp://www.dhs.vic.gov.au/phb/hce/peri/pn/tables/t20.html
5009993250.517854.77147
100014993840.65113.28333
150019997551.2567.42699
2000249924333.8572.342376
25002999983215.2620.639770
300034992373936.7540.2323685
350039991973630.5350.1819701
4000449963209.8140.226306
4500499910561.650.471051
5000up1090.210.92108
total64689
myeloma
YrSinceDiagnosisPercentDeadThisYearCumulativeDeadBinForN=30PatientsYears2LiveBinFrequency
129299112215
21645521446
31257431664
4966341882
587425110101
668026112120
758517114141
838818116161
9391191More0
102930102More0
111940112
121950122
131960132
141971142
151980152
161990163
1711001173
30183
194
204
214
225
235
246
256
267
278
289
2914
3016
Frequency
Bin
Frequency
Histogram
Mathematical notation:Learn right away how to get the mean using your calculators.
Your numerical summary must be meaningful.The distribution of womens heights appears coherent and symmetrical. The mean is a good numerical summary.
58 60 62 64 66 68 70 72 74 76 78 80 82 84A single numerical summary here would not make sense.
Chart11
100
100
200
200
400
400
310
220
200
210
110
100
040
030
010
011
004
012
002
000
001
000
001
001
001
001
000
001
blue
pink
red
Height in centimeters
Number of Plants
Height of Plants by Color
Ht,flowers
heightbinclassFrequencystemleavesheightheightwomanheightwomanheight58.264.04 big outliersFrequencyNumberNamePOSFeetInchesTotalBinUCI womenClassUCI mentotal=allflowerscompressNamefeetinchestotalinchesweightMenFrequencyMenAloneFrequencyclass,women, men, no outliersBinFrequencyBinbluepinkred
58.25757058258.258.264.0158.21464.059.564.55701Lisa FaulknerG55655901012Jesse Obrand627418557074158.257059100
59.55858059559.559.564.5259.51564.560.764.15803Ashley BigginsC64766001014Jerry Green637519058075459.558060100
60.759591607960.760.764.1360.71664.160.964.85914Katie SturgeonF62746102028Jeff Gloger637517559076260.759161200
60.960601619960.960.964.8460.91764.861.965.260110Lisa WoznickG511716202028DeVaughn Peace637518060077260.960162200
61.96161262224961.961.965.2561.91865.261.965.761211Jana CiperovaG60726304045Aras Baskauskas637518761078061.961263400
61.9626226319961.961.965.7661.91965.762.266.262212Wendy GabbeF511716404043Mike Hood647619062079161.962264400
62.26363464015862.262.266.2762.22066.262.266.763413Courtney FergusonG56666513047Jeff Hufford647617563080062.263465310
62.264644652762.262.266.7862.22166.762.467.164414Kristen GreenG58686622043Jordan Harris657721764081162.264466220
62.465653662762.462.467.1962.42267.162.967.865320Kimberly MartinF60726702027Ross Schraeder657718565082162.465467200
62.966662671862.962.967.81062.92367.863.968.966221Erin TomlinsonG59696812036Matt Okoro677922266083162.966468210
63.16767268963.163.968.91163.92468.963.169.667222Chanda McLeodF511716911021J.R. Christ698124567084163.167269110
63.96868269663.963.169.61263.12569.663.974.068224Nina HanG56667001012Stanislav Zuzak6108222068085063.968370100
63.96969163.963.91363.9n=2569134Brandy HudsonF61737140042Ryan Codi6118321069086163.969271040
64.07070164.070142Cindy OparahF511717230031Adam Parada7084240700More064.070172030
64.171More064.171044Christina CallawayF607270.4666666667731001Dave Korfman728627578.3333333333710this is for red plants64.171473010
64.57264.572074101272064.572374011
64.87364.873075004473064.873175004
65.27465.274076102374165.274276012
65.77565.775077002275465.775477002
66.27666.276078040476266.276378000
66.77766.777079001177266.777279001
67.17867.178480000078067.178080000
67.87967.8More081001179167.879181001
68.98068.982001180068.980082001
69.663.98169.683001181169.681183001
78.064.5821598.3sum8400118216582184001
78.08363.9average8500008317683185000
78.0848600118417484186001
78.065.98563.985071850
8686172861
More071More0
66
68average69.6
72
69
71
66
73
71
72
74
75
75
75
75
76
76
77
77
79
81
82
83
84
86
Ht,flowers
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height
birthweight
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height + 4
myeloma
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Bin
Frequency
Height of women on on UCI basketball team
01
01
02
02
04
04
13
22
02
12
11
01
40
30
10
10
00
10
00
04
00
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
UCI women
Class
UCI men
Height in inches
Number of Individuals
Height of Class, Women's and Men's Teams
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
UCI women
Class
Height in inches
Number of Women
Women's Height: Class and Team
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
UCI women
Class
UCI men
Height in inches
Number of Individuals
Plant Height and Flower Color
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Size of Red-flowered Plants
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Height of Plants
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
blue
pink
red
Height in centimeters
Number of Plants
Height of Plants by Color
lowerlimitupperlimitNpercentdiepercent deadhttp://www.dhs.vic.gov.au/phb/hce/peri/pn/tables/t20.html
5009993250.517854.77147
100014993840.65113.28333
150019997551.2567.42699
2000249924333.8572.342376
25002999983215.2620.639770
300034992373936.7540.2323685
350039991973630.5350.1819701
4000449963209.8140.226306
4500499910561.650.471051
5000up1090.210.92108
total64689
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
YrSinceDiagnosisPercentDeadThisYearCumulativeDeadBinForN=30PatientsYears2LiveBinFrequency
129299112215
21645521446
3125745731664
4966341882
587425110101
668026112120
758517114141
838818116161
9391191More0
102930102More0
111940112
121950122
131960132
141971142
151980152
161990163
1711001173
30183
194
204
214
225
235
246
256
267
278
289
2914
3016
0
0
0
0
0
0
0
0
0
Frequency
Bin
Frequency
Histogram
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Year since diagnosis
Percent dying per year
Percent of people dying each year post-diagnosis from multiple myeloma
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Years since diagnosis
Percent of Patients
Percent Dead from Multiple Myeloma
Measure of center: the medianThe median is the midpoint of a distributionthe number such that half of the observations are smaller and half are larger. 1. Sort observations by size.n = number of observations______________________________
Mean and median for skewed distributionsMean and median for a symmetric distributionLeft skewRight skewMeanMedianMean MedianMeanMedian Comparing the mean and the medianThe mean and the median are the same only if the distribution is symmetrical. The median is a measure of center that is resistant to skew and outliers. The mean is not.
Percent of people dying Mean and median of a distribution with outliers
Impact of skewed data
M = median = 3.4Q1= first quartile = 2.2Q3= third quartile = 4.35Measure of spread: the quartilesThe first quartile, Q1, is the value in the sample that has 25% of the data at or below it ( it is the median of the lower half of the sorted data, excluding M).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it ( it is the median of the upper half of the sorted data, excluding M).
Ht,flowers
heightbinclassFrequencystemleavesheightheightwomanheightwomanheight58.264.04 big outliersFrequencyNumberNamePOSFeetInchesTotalBinUCI womenClassUCI mentotal=allflowerscompressNamefeetinchestotalinchesweightMenFrequencyMenAloneFrequencyclass,women, men, no outliersBinFrequencyBinwhitepinkred
58.25757058258.258.264.0158.21464.059.564.55701Lisa FaulknerG55655901012Jesse Obrand627418557074158.257059100
59.55858059559.559.564.5259.51564.560.764.15803Ashley BigginsC64766001014Jerry Green637519058075459.558060100
60.759591607960.760.764.1360.71664.160.964.85914Katie SturgeonF62746102028Jeff Gloger637517559076260.759161200
60.960601619960.960.964.8460.91764.861.965.260110Lisa WoznickG511716202028DeVaughn Peace637518060077260.960162200
61.96161262224961.961.965.2561.91865.261.965.761211Jana CiperovaG60726304045Aras Baskauskas637518761078061.961263400
61.9626226319961.961.965.7661.91965.762.266.262212Wendy GabbeF511716404043Mike Hood647619062079161.962264400
62.26363464015862.262.266.2762.22066.262.266.763413Courtney FergusonG56666513047Jeff Hufford647617563080062.263465310
62.264644652762.262.266.7862.22166.762.467.164414Kristen GreenG58686622043Jordan Harris657721764081162.264466220
62.465653662762.462.467.1962.42267.162.967.865320Kimberly MartinF60726702027Ross Schraeder657718565082162.465467200
62.966662671862.962.967.81062.92367.863.968.966221Erin TomlinsonG59696812036Matt Okoro677922266083162.966468210
63.16767268963.163.968.91163.92468.963.169.667222Chanda McLeodF511716911021J.R. Christ698124567084163.167269110
63.96868269663.963.169.61263.12569.663.974.068224Nina HanG56667001012Stanislav Zuzak6108222068085063.968370100
63.96969163.963.91363.9n=2569134Brandy HudsonF61737140042Ryan Codi6118321069086163.969271040
64.07070164.070142Cindy OparahF511717230031Adam Parada7084240mean700More064.070172030
64.171More064.171044Christina CallawayF607270.4666666667731001Dave Korfman728627578.3333333333710this is for red plants64.171473010
64.57264.5720741012median72064.572374011
64.87364.87307500447773064.873175004
65.27465.274076102374165.274276012
65.77565.775077002275465.775477002
66.27666.276078040476266.276378000
66.77766.777079001177266.777279001
67.1median7867.178480000078067.178080000
67.863.97967.8More081001179167.879181001
68.9mean8068.982001180068.980082001
69.663.98169.683001181169.681183001
78.0821598.3sum8400118216582184001
78.08363.9average8500008317683185000
78.0mean c out848600118417484186001
78.065.98563.985071850
median with out8686172861
64.1More071More0
66
68average69.6
72
69
71
66
73
71
72
74
75
75
75
75
76
76
77
77
79
81
82
83
84
86
Ht,flowers
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height
birthweight
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height + 4
myeloma
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Bin
Frequency
Height of women on on UCI basketball team
Sheet1
01
01
02
02
04
04
13
22
02
12
11
01
40
30
10
10
00
10
00
04
00
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
UCI women
Class
UCI men
Height in inches
Number of Individuals
Height of Class, Women's and Men's Teams
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
UCI women
Class
Height in inches
Number of Women
Women's Height: Class and Team
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Size of Red-flowered Plants
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Height of All Plants
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
white
pink
red
Height in centimeters
Number of Plants
Height of Plants by Color
lowerlimitupperlimitNpercentdiepercent deadhttp://www.dhs.vic.gov.au/phb/hce/peri/pn/tables/t20.html
5009993250.517854.77147
100014993840.65113.28333
150019997551.2567.42699
2000249924333.8572.342376
25002999983215.2620.639770
300034992373936.7540.2323685
350039991973630.5350.1819701
4000449963209.8140.226306
4500499910561.650.471051
5000up1090.210.92108
total64689
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
YrSinceDiagnosisPercentDeadThisYearOf25NMMyelomabinBinFrequencyNMdisease xbinBinFrequencydisease x+outginBinFrequencytables altered110.610.6110.610.6110.6110.2
1298110.2117110.61110.6111to show221.221.2221.221.2221.2220.7
2164220.7225221.22241.2224even and odd331.631.6331.631.6331.6330.8
3123330.8333331.63361.6336441.941.9441.941.9441.9440.9
492440.9443441.94461.9446551.551.5551.551.5551.5551.0
582551552551.55551.5555662.162.1662.162.1662.1661.0
661661662662.16622.1662772.372.3772.372.3772.3771.0
751771771772.3More12.3771882.382.3882.382.3812.3881.1
830881.1880882.32.3880992.592.5992.592.5922.5991.2
931991.2990992.52.599010102.8102.810102.8102.81032.810101.5
102010101.51010110102.8average3.42916666672.81010011112.9112.911112.9112.91142.911111.6
111111111.61111011112.9median3.42.91111012123.3123.3123.3123.31253.312121.8
121112121.81212012123.43.412122133.4133.4133.4133.4133.4132.5
1310132.513130133.43.4More01413.6143.61413.6143.61413.61412.8
14101412.8141411413.63.61523.7153.71523.7153.71523.71522.8
15101522.815More01523.73.7average3.95925925931633.8163.81633.8163.81633.81633.5
16111633.51633.83.8median3.61743.9173.91743.9173.91743.91743.8
17101743.81743.93.91854.1184.11854.1184.11854.11854.0
2518541854.14.11964.2194.21964.2194.21964.21965.0
1965average3.3521964.24.22074.5204.52074.5204.52074.52075.0
2075median2.52074.54.52184.7214.72184.7214.72114.72185.1
2185.12184.74.72294.9224.92294.9224.92224.92295.5
2295.52294.94.923105.3235.323105.3235.32335.323107.0
2310723105.35.324115.6245.624115.6245.62445.6241110.0
24111024115.65.625126.1256.12556.1251214.0
25121425126.16.1
12
12
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Individuals
Years until death
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Individuals
Years until death
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with multiple myeloma
M = median = 3.4Q3= third quartile = 4.35Q1= first quartile = 2.2Largest = max = 6.1Smallest = min = 0.6Five-number summary:min Q1 M Q3 maxFive-number summary and boxplotBOXPLOT
Ht,flowers
heightbinclassFrequencystemleavesheightheightwomanheightwomanheight58.264.04 big outliersFrequencyNumberNamePOSFeetInchesTotalBinUCI womenClassUCI mentotal=allflowerscompressNamefeetinchestotalinchesweightMenFrequencyMenAloneFrequencyclass,women, men, no outliersBinFrequencyBinwhitepinkred
58.25757058258.258.264.0158.21464.059.564.55701Lisa FaulknerG55655901012Jesse Obrand627418557074158.257059100
59.55858059559.559.564.5259.51564.560.764.15803Ashley BigginsC64766001014Jerry Green637519058075459.558060100
60.759591607960.760.764.1360.71664.160.964.85914Katie SturgeonF62746102028Jeff Gloger637517559076260.759161200
60.960601619960.960.964.8460.91764.861.965.260110Lisa WoznickG511716202028DeVaughn Peace637518060077260.960162200
61.96161262224961.961.965.2561.91865.261.965.761211Jana CiperovaG60726304045Aras Baskauskas637518761078061.961263400
61.9626226319961.961.965.7661.91965.762.266.262212Wendy GabbeF511716404043Mike Hood647619062079161.962264400
62.26363464015862.262.266.2762.22066.262.266.763413Courtney FergusonG56666513047Jeff Hufford647617563080062.263465310
62.264644652762.262.266.7862.22166.762.467.164414Kristen GreenG58686622043Jordan Harris657721764081162.264466220
62.465653662762.462.467.1962.42267.162.967.865320Kimberly MartinF60726702027Ross Schraeder657718565082162.465467200
62.966662671862.962.967.81062.92367.863.968.966221Erin TomlinsonG59696812036Matt Okoro677922266083162.966468210
63.16767268963.163.968.91163.92468.963.169.667222Chanda McLeodF511716911021J.R. Christ698124567084163.167269110
63.96868269663.963.169.61263.12569.663.974.068224Nina HanG56667001012Stanislav Zuzak6108222068085063.968370100
63.96969163.963.91363.9n=2569134Brandy HudsonF61737140042Ryan Codi6118321069086163.969271040
64.07070164.070142Cindy OparahF511717230031Adam Parada7084240mean700More064.070172030
64.171More064.171044Christina CallawayF607270.4666666667731001Dave Korfman728627578.3333333333710this is for red plants64.171473010
64.57264.5720741012median72064.572374011
64.87364.87307500447773064.873175004
65.27465.274076102374165.274276012
65.77565.775077002275465.775477002
66.27666.276078040476266.276378000
66.77766.777079001177266.777279001
67.1median7867.178480000078067.178080000
67.863.97967.8More081001179167.879181001
68.9mean8068.982001180068.980082001
69.663.98169.683001181169.681183001
78.0821598.3sum8400118216582184001
78.08363.9average8500008317683185000
78.0mean c out848600118417484186001
78.065.98563.985071850
median with out8686172861
64.1More071More0
66
68average69.6
72
69
71
66
73
71
72
74
75
75
75
75
76
76
77
77
79
81
82
83
84
86
Ht,flowers
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height
birthweight
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height + 4
myeloma
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Bin
Frequency
Height of women on on UCI basketball team
Sheet1
01
01
02
02
04
04
13
22
02
12
11
01
40
30
10
10
00
10
00
04
00
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
UCI women
Class
UCI men
Height in inches
Number of Individuals
Height of Class, Women's and Men's Teams
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
UCI women
Class
Height in inches
Number of Women
Women's Height: Class and Team
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Size of Red-flowered Plants
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Height of All Plants
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
white
pink
red
Height in centimeters
Number of Plants
Height of Plants by Color
lowerlimitupperlimitNpercentdiepercent deadhttp://www.dhs.vic.gov.au/phb/hce/peri/pn/tables/t20.html
5009993250.517854.77147
100014993840.65113.28333
150019997551.2567.42699
2000249924333.8572.342376
25002999983215.2620.639770
300034992373936.7540.2323685
350039991973630.5350.1819701
4000449963209.8140.226306
4500499910561.650.471051
5000up1090.210.92108
total64689
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
YrSinceDiagnosisPercentDeadThisYearOf25NMMyelomabinBinFrequencyNMdisease xbinBinFrequencydisease x+outginBinFrequencytables altered110.610.6110.610.62566.1110.2
1298110.2117110.61110.6111to show221.221.2221.221.22455.6220.7
2164220.7225221.22241.2224even and odd331.631.6331.631.62345.3330.8
3123330.8333331.63361.6336441.941.9441.941.92234.9440.9
492440.9443441.94461.9446551.551.5551.551.52124.7551.0
582551552551.55551.5555662.162.1662.162.12014.5661.0
661661662662.16622.1662772.372.3772.372.31964.2771.0
751771771772.3More12.3771882.382.3882.382.31854.1881.1
830881.1880882.32.3880992.592.5992.592.51743.9991.2
931991.2990992.52.599010102.8102.810102.8102.81633.810101.5
102010101.51010110102.8average3.42916666672.81010011112.9112.911112.9112.91523.711111.6
111111111.61111011112.9median3.42.91111012123.3123.3123.3123.31413.612121.8
121112121.81212012123.43.412122133.4133.4133.4133.4133.4132.5
1310132.513130133.43.4More01413.6143.61413.6143.61263.31412.8
14101412.8141411413.63.61523.7153.71523.7153.71152.91522.8
15101522.815More01523.73.7average3.95925925931633.8163.81633.8163.81042.81633.5
16111633.51633.83.8median3.61743.9173.91743.9173.9932.51743.8
17101743.81743.93.91854.1184.11854.1184.1822.31854.0
2518541854.14.11964.2194.21964.2194.2712.31965.0
1965average3.3521964.24.22074.5204.52074.5204.5662.12075.0
2075median2.52074.54.52184.7214.72184.7214.7551.52185.1
2185.12184.74.72294.9224.92294.9224.9441.92295.5
2295.52294.94.923105.3235.323105.3235.3331.623107.0
2310723105.35.324115.6245.624115.6245.6221.2241110.0
24111024115.65.625126.1256.1110.6251214.0
25121425126.16.1
12
12
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Individuals
Years until death
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Individuals
Years until death
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with multiple myeloma
Comparing box plots for a normal and a right-skewed distributionBoxplots for skewed dataBoxplots remain true to the data and depict clearly symmetry or skew.
Chart7
0.2
1
2.5
4.5
14
0.6
2.2
3.4
4.35
6.1
Disease X Multiple Myeloma
Years until death
Ht,flowers
heightbinclassFrequencystemleavesheightheightwomanheightwomanheight58.264.04 big outliersFrequencyNumberNamePOSFeetInchesTotalBinUCI womenClassUCI mentotal=allflowerscompressNamefeetinchestotalinchesweightMenFrequencyMenAloneFrequencyclass,women, men, no outliersBinFrequencyBinwhitepinkred
58.25757058258.258.264.0158.21464.059.564.55701Lisa FaulknerG55655901012Jesse Obrand627418557074158.257059100
59.55858059559.559.564.5259.51564.560.764.15803Ashley BigginsC64766001014Jerry Green637519058075459.558060100
60.759591607960.760.764.1360.71664.160.964.85914Katie SturgeonF62746102028Jeff Gloger637517559076260.759161200
60.960601619960.960.964.8460.91764.861.965.260110Lisa WoznickG511716202028DeVaughn Peace637518060077260.960162200
61.96161262224961.961.965.2561.91865.261.965.761211Jana CiperovaG60726304045Aras Baskauskas637518761078061.961263400
61.9626226319961.961.965.7661.91965.762.266.262212Wendy GabbeF511716404043Mike Hood647619062079161.962264400
62.26363464015862.262.266.2762.22066.262.266.763413Courtney FergusonG56666513047Jeff Hufford647617563080062.263465310
62.264644652762.262.266.7862.22166.762.467.164414Kristen GreenG58686622043Jordan Harris657721764081162.264466220
62.465653662762.462.467.1962.42267.162.967.865320Kimberly MartinF60726702027Ross Schraeder657718565082162.465467200
62.966662671862.962.967.81062.92367.863.968.966221Erin TomlinsonG59696812036Matt Okoro677922266083162.966468210
63.16767268963.163.968.91163.92468.963.169.667222Chanda McLeodF511716911021J.R. Christ698124567084163.167269110
63.96868269663.963.169.61263.12569.663.974.068224Nina HanG56667001012Stanislav Zuzak6108222068085063.968370100
63.96969163.963.91363.9n=2569134Brandy HudsonF61737140042Ryan Codi6118321069086163.969271040
64.07070164.070142Cindy OparahF511717230031Adam Parada7084240mean700More064.070172030
64.171More064.171044Christina CallawayF607270.4666666667731001Dave Korfman728627578.3333333333710this is for red plants64.171473010
64.57264.5720741012median72064.572374011
64.87364.87307500447773064.873175004
65.27465.274076102374165.274276012
65.77565.775077002275465.775477002
66.27666.276078040476266.276378000
66.77766.777079001177266.777279001
67.1median7867.178480000078067.178080000
67.863.97967.8More081001179167.879181001
68.9mean8068.982001180068.980082001
69.663.98169.683001181169.681183001
78.0821598.3sum8400118216582184001
78.08363.9average8500008317683185000
78.0mean c out848600118417484186001
78.065.98563.985071850
median with out8686172861
64.1More071More0
66
68average69.6
72
69
71
66
73
71
72
74
75
75
75
75
76
76
77
77
79
81
82
83
84
86
Ht,flowers
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height
birthweight
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height + 4
myeloma
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Bin
Frequency
Height of women on on UCI basketball team
Sheet1
01
01
02
02
04
04
13
22
02
12
11
01
40
30
10
10
00
10
00
04
00
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
UCI women
Class
UCI men
Height in inches
Number of Individuals
Height of Class, Women's and Men's Teams
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
UCI women
Class
Height in inches
Number of Women
Women's Height: Class and Team
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Size of Red-flowered Plants
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Height of All Plants
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
white
pink
red
Height in centimeters
Number of Plants
Height of Plants by Color
lowerlimitupperlimitNpercentdiepercent deadhttp://www.dhs.vic.gov.au/phb/hce/peri/pn/tables/t20.html
5009993250.517854.77147
100014993840.65113.28333
150019997551.2567.42699
2000249924333.8572.342376
25002999983215.2620.639770
300034992373936.7540.2323685
350039991973630.5350.1819701
4000449963209.8140.226306
4500499910561.650.471051
5000up1090.210.92108
total64689
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
YrSinceDiagnosisPercentDeadThisYearOf25NMMyelomabinBinFrequencyNMdisease xbinBinFrequencydisease x+outginBinFrequencytables alteredXXXXdisease X QMM
1298110.2117110.61110.6111to show110.610.6110.610.610.620.2
2164220.7225221.22241.2224even and odd221.221.2221.221.212.221
3123330.8333331.63361.6336for median551.551.5551.551.513.422.5
492440.9443441.94461.9446331.631.6331.631.614.3524.5
582551552551.55551.5555441.941.9441.941.916.1214
661661662662.16622.1662662.162.1662.162.110.6
751771771772.3More12.3771772.372.3772.372.312.2
830881.1880882.32.3880882.382.3882.382.313.4
931991.2990992.52.5990992.592.5992.592.514.35
102010101.51010110102.8average3.42916666672.81010010102.8102.810102.8102.816.1
111111111.61111011112.9median3.42.91111011112.9112.911112.9112.9
121112121.81212012123.43.41212212123.3123.3123.3123.3
1310132.513130133.43.4More0133.4133.4133.4133.4
14101412.8141411413.63.61413.6143.61413.6143.6
15101522.815More01523.73.7average3.95925925931523.7153.71523.7153.7
16111633.51633.83.8median3.61633.8163.81633.8163.8
17101743.81743.93.91743.9173.91743.9173.9
2518541854.14.11854.1184.11854.1184.1
1965average3.3521964.24.21964.2194.21964.2194.2
2075median2.52074.54.52074.5204.52074.5204.5
2185.12184.74.72184.7214.72184.7214.7
2295.52294.94.92294.9224.92294.9224.9
2310723105.35.323105.3235.323105.3235.3
24111024115.65.624115.6245.624115.6245.6
25121425126.16.125126.1256.1
12
12
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Individuals
Years until death
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Individuals
Years until death
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with disease X
7
5
3
3
2
2
1
0
0
1
0
0
0
1
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with multiple myeloma
0
0
0
0
0
Years until death
Disease X
0
0
0
0
0
0
0
0
0
0
Disease X Multiple Myeloma
Years until death
Comparison Using Box Plots
Suspected outliersOutliers are troublesome data points, and it is important to be able to identify them.
One way to raise the flag for a suspected outlier is to compare the distance from the suspicious data point to the nearest quartile (Q1 or Q3). We then compare this distance to the interquartile range (distance between Q1 and Q3).
We call an observation a suspected outlier if it falls more than 1.5 times the size of the interquartile range (IQR) above the first quartile or below the third quartile. This is called the 1.5 * IQR rule for outliers.
Q3 = 4.35Q1 = 2.2Interquartile rangeQ3 Q14.35 2.2 = 2.15Distance to Q37.9 4.35 = 3.55Individual #25 has a value of 7.9 years, which is 3.55 years above the third quartile. This is more than 3.225 years, 1.5 * IQR. Thus, individual #25 is a suspected outlier.
Ht,flowers
heightbinclassFrequencystemleavesheightheightwomanheightwomanheight58.264.04 big outliersFrequencyNumberNamePOSFeetInchesTotalBinUCI womenClassUCI mentotal=allflowerscompressNamefeetinchestotalinchesweightMenFrequencyMenAloneFrequencyclass,women, men, no outliersBinFrequencyBinwhitepinkred
58.25757058258.258.264.0158.21464.059.564.55701Lisa FaulknerG55655901012Jesse Obrand627418557074158.257059100
59.55858059559.559.564.5259.51564.560.764.15803Ashley BigginsC64766001014Jerry Green637519058075459.558060100
60.759591607960.760.764.1360.71664.160.964.85914Katie SturgeonF62746102028Jeff Gloger637517559076260.759161200
60.960601619960.960.964.8460.91764.861.965.260110Lisa WoznickG511716202028DeVaughn Peace637518060077260.960162200
61.96161262224961.961.965.2561.91865.261.965.761211Jana CiperovaG60726304045Aras Baskauskas637518761078061.961263400
61.9626226319961.961.965.7661.91965.762.266.262212Wendy GabbeF511716404043Mike Hood647619062079161.962264400
62.26363464015862.262.266.2762.22066.262.266.763413Courtney FergusonG56666513047Jeff Hufford647617563080062.263465310
62.264644652762.262.266.7862.22166.762.467.164414Kristen GreenG58686622043Jordan Harris657721764081162.264466220
62.465653662762.462.467.1962.42267.162.967.865320Kimberly MartinF60726702027Ross Schraeder657718565082162.465467200
62.966662671862.962.967.81062.92367.863.968.966221Erin TomlinsonG59696812036Matt Okoro677922266083162.966468210
63.16767268963.163.968.91163.92468.963.169.667222Chanda McLeodF511716911021J.R. Christ698124567084163.167269110
63.96868269663.963.169.61263.12569.663.974.068224Nina HanG56667001012Stanislav Zuzak6108222068085063.968370100
63.96969163.963.91363.9n=2569134Brandy HudsonF61737140042Ryan Codi6118321069086163.969271040
64.07070164.070142Cindy OparahF511717230031Adam Parada7084240mean700More064.070172030
64.171More064.171044Christina CallawayF607270.4666666667731001Dave Korfman728627578.3333333333710this is for red plants64.171473010
64.57264.5720741012median72064.572374011
64.87364.87307500447773064.873175004
65.27465.274076102374165.274276012
65.77565.775077002275465.775477002
66.27666.276078040476266.276378000
66.77766.777079001177266.777279001
67.1median7867.178480000078067.178080000
67.863.97967.8More081001179167.879181001
68.9mean8068.982001180068.980082001
69.663.98169.683001181169.681183001
78.0821598.3sum8400118216582184001
78.08363.9average8500008317683185000
78.0mean c out848600118417484186001
78.065.98563.985071850
median with out8686172861
64.1More071More0
66
68average69.6
72
69
71
66
73
71
72
74
75
75
75
75
76
76
77
77
79
81
82
83
84
86
Ht,flowers
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height
birthweight
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in Inches
Number of Women
Women's Height + 4
myeloma
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Bin
Frequency
Height of women on on UCI basketball team
Sheet1
01
01
02
02
04
04
13
22
02
12
11
01
40
30
10
10
00
10
00
04
00
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
UCI women
Class
UCI men
Height in inches
Number of Individuals
Height of Class, Women's and Men's Teams
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
UCI women
Class
Height in inches
Number of Women
Women's Height: Class and Team
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Size of Red-flowered Plants
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Height in centimeters
Number of Plants
Height of All Plants
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
white
pink
red
Height in centimeters
Number of Plants
Height of Plants by Color
lowerlimitupperlimitNpercentdiepercent deadhttp://www.dhs.vic.gov.au/phb/hce/peri/pn/tables/t20.html
5009993250.517854.77147
100014993840.65113.28333
150019997551.2567.42699
2000249924333.8572.342376
25002999983215.2620.639770
300034992373936.7540.2323685
350039991973630.5350.1819701
4000449963209.8140.226306
4500499910561.650.471051
5000up1090.210.92108
total64689
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
YrSinceDiagnosisPercentDeadThisYearOf25NMMyelomabinBinFrequencyNMdisease xbinBinFrequencydisease x+outginBinFrequencytables altered110.610.6110.610.62567.9110.2
1298110.2117110.61110.6111to show221.221.2221.221.22456.1220.7
2164220.7225221.22241.2224even and odd331.631.6331.631.62345.3330.8
3123330.8333331.63361.6336441.941.9441.941.92234.9440.9
492440.9443441.94461.9446551.551.5551.551.52124.7551.0
582551552551.55551.5555662.162.1662.162.12014.5661.0
661661662662.16622.1662772.372.3772.372.31964.2771.0
751771771772.3More12.3771882.382.3882.382.31854.1881.1
830881.1880882.32.3880992.592.5992.592.51743.9991.2
931991.2990992.52.599010102.8102.810102.8102.81633.810101.5
102010101.51010110102.8average3.42916666672.81010011112.9112.911112.9112.91523.711111.6
111111111.61111011112.9median3.42.91111012123.3123.3123.3123.31413.612121.8
121112121.81212012123.43.412122133.4133.4133.4133.4133.4132.5
1310132.513130133.43.4More01413.6143.61413.6143.61263.31412.8
14101412.8141411413.63.61523.7153.71523.7153.71152.91522.8
15101522.815More01523.73.7average3.95925925931633.8163.81633.8163.81042.81633.5
16111633.51633.83.8median3.61743.9173.91743.9173.9932.51743.8
17101743.81743.93.91854.1184.11854.1184.1822.31854.0
2518541854.14.11964.2194.21964.2194.2712.31965.0
1965average3.3521964.24.22074.5204.52074.5204.5662.12075.0
2075median2.52074.54.52184.7214.72184.7214.7551.52185.1
2185.12184.74.72294.9224.92294.9224.9441.92295.5
2295.52294.94.923105.3235.323105.3235.3331.623107.0
2310723105.35.324115.6245.624115.6245.6221.2241110.0
24111024115.65.625126.1256.1110.6251214.0
25121425126.16.1
12
12
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Individuals
Years until death
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Individuals
Years until death
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with disease X
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Frequency
Years until death
Number of Individuals
Years until death after diagnosis with multiple myeloma
The standard deviation s is used to describe the variation around the mean. Like the mean, it is not resistant to skew or outliers.Measure of spread: the standard deviationMean 1 s.d.
Calculations Well never calculate these by hand, so make sure to know how to get the standard deviation using your calculator or software.Mean = 63.4Sum of squared deviations from mean = 85.2Degrees freedom (df) = (n 1) = 13s2 = variance = 85.2/13 = 6.55 inches squareds = standard deviation = 6.55 = 2.56 inchesWomen height (inches)
Properties of Standard Deviations measures spread about the mean and should be used only when the mean is the measure of center.
s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
s is not resistant to outliers.
s has the same units of measurement as the original observations.
Software output for summary statistics:
Excel - From Menu: Tools/Data Analysis/Descriptive Statistics
Give commonstatistics of your sample data.Minitab
Choosing among summary statisticsBecause the mean is not resistant to outliers or skew, use it to describe distributions that are fairly symmetrical and dont have outliers. Plot the mean and use the standard deviation for error bars.
Otherwise use the median in the five number summary which can be plotted as a boxplot. Boxplot Mean SD
What should you use, when, and why?Arithmetic mean or median?
Middletown is considering imposing an income tax on citizens. City hall wants a numerical summary of its citizens income to estimate the total tax base.
In a study of standard of living of typical families in Middletown, a sociologist makes a numerical summary of family income in that city.Mean: Although income is likely to be right-skewed, the city government wants to know about the total tax base.
Median: The sociologist is interested in a typical family and wants to lessen the impact of extreme incomes.
Changing the unit of measurementVariables can be recorded in different units of measurement. Most often, one measurement unit is a linear transformation of another measurement unit: xnew = a + bx.
Temperatures can be expressed in degrees Fahrenheit or degrees Celsius. TemperatureFahrenheit = 32 + (9/5)* TemperatureCelsius a + bx.
Linear transformations do not change the basic shape of a distribution (skew, symmetry, multimodal). But they do change the measures of center and spread: Multiplying each observation by a positive number b multiplies both measures of center (mean, median) and spread (IQR, s) by b. Adding the same number a (positive or negative) to each observation adds a to measures of center and to quartiles but it does not change measures of spread (IQR, s).
Looking at Data - DistributionsDensity Curves and Normal DistributionsIPS Chapter 1.3 2009 W.H. Freeman and Company
Objectives (IPS Chapter 1.3)Density curves and Normal distributions Density curves Measuring center and spread for density curvesNormal distributionsThe 68-95-99.7 ruleStandardizing observationsUsing the standard Normal Table Inverse Normal calculationsNormal quantile plots
Density curvesA density curve is a mathematical model of a distribution.The total area under the curve, by definition, is equal to 1, or 100%.The area under the curve for a range of values is the proportion of all observations for that range.Histogram of a sample with the smoothed, density curve describing theoretically the population.
Density curves come in any imaginable shape.
Some are well known mathematically and others arent.
Median and mean of a density curveThe median of a density curve is the equal-areas point: the point that divides the area under the curve in half.
The mean of a density curve is the balance point, at which the curve would balance if it were made of solid material.The median and mean are the same for a symmetric density curve. The mean of a skewed curve is pulled in the direction of the long tail.
Normal or Gaussian distributions are a family of symmetrical, bell-shaped density curves defined by a mean m (mu) and a standard deviation s (sigma) : N(m,s). Normal distributionse = 2.71828 The base of the natural logarithm
= pi = 3.14159xx
A family of density curvesHere, means are different (m = 10, 15, and 20) while standard deviations are the same (s = 3)Here, means are the same (m = 15) while standard deviations are different (s = 2, 4, and 6).
Chart1
0.0005140930.00000049560
0.0037986620.00001112360.000000002
0.00874062970.00004461010.0000000142
0.01799698880.00016009020.0000000885
0.03315904630.0005140930.0000004956
0.05467002490.00147728280.000002482
0.08065690820.0037986620.0000111236
0.10648266850.00874062970.0000446101
0.12579440920.01799698880.0001600902
0.13298076010.03315904630.000514093
0.12579440920.05467002490.0014772828
0.10648266850.08065690820.003798662
0.08065690820.10648266850.0087406297
0.05467002490.12579440920.0179969888
0.03315904630.13298076010.0331590463
0.01799698880.12579440920.0546700249
0.00874062970.10648266850.0806569082
0.0037986620.08065690820.1064826685
0.00147728280.05467002490.1257944092
0.0005140930.03315904630.1329807601
0.00016009020.01799698880.1257944092
0.00001112360.0037986620.0806569082
0.00000049560.0005140930.0331590463
0.0000000020.00001112360.003798662
00.00000049560.000514093
Species A
Species B
Species C
PineNeedles
populationmu1015
sigma46
lengthSpecies ASpecies B
10.0079349130.0043703148
20.01349774160.0063587706
30.02156932970.0089984944
40.03237939890.0123851939
50.04566227130.0165795231
60.06049268110.0215862659
70.0752843580.0273350124
80.08801633170.0336664476
90.09666702920.0403284541
100.09973557010.0469853126
110.09666702920.0532413343
120.08801633170.0586775545
130.0752843580.0628972046
140.06049268110.065573286
150.04566227130.0664903801
160.03237939890.065573286
170.02156932970.0628972046
180.01349774160.0586775545
190.0079349130.0532413343
200.00438207510.0469853126
210.00227339060.0403284541
220.00110796210.0336664476
230.0005072620.0273350124
240.00021817070.0215862659
280.00000399590.0063587706
samplinglengthSpecies ASpecies B
distribution10.00000799190.000002482
n=420.00006691510.0000111236
30.00043634130.0000446101
40.00221592420.0001600902
50.00876415020.000514093
60.02699548330.0014772828
70.06475879780.003798662
80.12098536230.0087406297
90.17603266340.0179969888
100.19947114020.0331590463
110.17603266340.0546700249
120.12098536230.0806569082
130.06475879780.1064826685
140.02699548330.1257944092
150.00876415020.1329807601
160.00221592420.1257944092
170.00043634130.1064826685
180.00006691510.0806569082
190.00000799190.0546700249
200.00000074340.0331590463
210.00000005380.0179969888
220.0000000030.0087406297
230.00000000010.003798662
2400.0014772828
2800.0000111236
samplinglengthSpecies ASpecies B
distribution100
n=25200
300
400
50.00000000160
60.00000185840
70.00044074460.0000000001
80.02191037560.0000000136
90.22831135670.0000012389
100.49867785050.0000564692
110.22831135670.0012852325
120.02191037560.0146069171
130.00044074460.0828976157
140.00000185840.2349265628
150.00000000160.3324519003
1600.2349265628
1700.0828976157
1800.0146069171
1900.0012852325
2000.0000564692
2100.0000012389
2200.0000000136
2300.0000000001
2400
2800
PineNeedles
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
000
Species A
Species B
Needle length (inches)
Population distributions
GestationTime
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Species A
Species B
Mean needle length (inches)
Sampling distributions (n=4)
mu and sigma
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Species A
Species B
Needle length (inches)
Sampling distributions (n=25)
Sheet2
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Species A
Species B
Needle length (inches)
Population distributions
Sheet3
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Species A
Species B
Mean needle length (inches)
Sampling distributions (n=4)
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Species A
Species B
Mean needle length (inches)
Sampling distributions (n=25)
populationmu250266
sigma2015
incrementlengthVitamins onlyVitamins and better food
101400.00000000540
1500.00000007430
1600.00000079920
1700.00000669150
1800.00004363410.0000000019
1900.00022159240.0000000709
2000.0008764150.0000016628
2100.00269954830.0000250189
2200.00647587980.0002413624
2300.01209853620.0014929687
2400.01760326630.0059212307
2500.0199471140.0150575218
2600.01760326630.0245513427
2700.01209853620.025667125
2800.00647587980.0172051884
2900.00269954830.0073947223
3000.0008764150.002037814
3100.00022159240.0003600704
3200.00004363410.0000407935
3300.00000669150.0000029633
3400.00000079920.000000138
3500.00000007430.0000000041
3600.00000000540.0000000001
3700.00000000030
38000
samplinglengthSpecies ASpecies B
distribution100
n=4200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
1600
1700
1800
1900
2000
2100
2200
2300
2400
2600
samplinglengthSpecies ASpecies B
distribution100
n=25700
800
8.500
900
9.500
1000
10.500
1100
11.500
1200
12.500
1300
13.500
1400
14.500
1500
15.500
1600
16.500
1700
17.500
1800
18.500
2600
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Vitamins only
Vitamins and better food
Gestation time (days)
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Species A
Species B
Mean needle length (inches)
Sampling distributions (n=4)
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Species A
Species B
Needle length (inches)
Sampling distributions (n=25)
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Vitamins only
Gestation time (days)
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
Species A
Species B
Mean needle length (inches