Describing Quantitative Data with Numbers
Summarizing distributions of univariate data
1. Measuring center: median, mean
2. Measuring spread: range, interquartile range, standard deviation
3. Measuring position: quartiles, percentiles, standardized scores (z-scores)
4. Using boxplots
5. The effect of changing units on summary measures
Measuring Center
When describing the “center” of a set of data, we can use the mean or the median.
Mean: “Average” value
Median: “Center” value (Q2)
Where is the Center of the Distribution?
If you had to pick a single number to describe all the data, what would you pick?
It’s easy to find the center when a histogram is unimodal and symmetric—it’s right in the middle.
On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode.
Mean
To find the mean of a set of observations, add their values and divide by the number of observations.
\[ \bar{x} = \frac{\sum x_i}{n} \]
Find the mean of:
2 3 4 6 8 12
\[ \bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 12}{6} = \frac{35}{6} \approx 5.833 \]
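The same calculation can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `mean` and the variable names are chosen here for illustration.

```python
# Data set from the worked example above.
values = [2, 3, 4, 6, 8, 12]


def mean(data):
    """Add the values and divide by the number of observations."""
    return sum(data) / len(data)


x_bar = mean(values)
print(round(x_bar, 3))  # 5.833
```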
Although the mean is the most popular measure of center, it is not always the most appropriate.
The mean is very sensitive to extreme observations (outliers).
Because outliers affect the mean, we say that the mean is NOT a resistant measure of center.
So if the mean is not a resistant measure of center, what is? Median
Median
The median is the value with exactly half the data values below it and half above it.
It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas.
It has the same units as the data.
The median is not influenced by extreme observations, so we say that the median is a resistant measure of center.
Finding the Median
First sort the values (arrange them in order), then follow one of these:
1. If the number of data values is even, the median is found by computing the mean of the two middle numbers.
2. If the number of data values is odd, the median is the number located in the exact middle of the list.
5.40 1.10 0.42 0.73 0.48 1.10
0.42 0.48 0.73 1.10 1.10 5.40
(in order; even number of values, so there is no exact middle and the center is shared by two numbers)
\[ \text{median} = \frac{0.73 + 1.10}{2} = 0.915 \]
5.40 1.10 0.42 0.73 0.48 1.10 0.66
0.42 0.48 0.66 0.73 1.10 1.10 5.40
(in order; odd number of values)
The exact middle value is 0.73, so the median is 0.73.
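The two-case rule above can be sketched in Python as follows. The function name `median` and the variable names are illustrative choices, not something the slides define; the data are the two example sets from this section.

```python
def median(data):
    """Sort the values, then take the middle one (odd count)
    or the mean of the two middle ones (even count)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


even_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
odd_case = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]

print(round(median(even_case), 3))  # 0.915
print(median(odd_case))             # 0.73
```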
Mean vs Median
Mean | Median
Average value of the variable | Typical value of the variable
Not resistant to outliers | Resistant to outliers
A good measure when the data is symmetric | A reliable measure regardless of the shape of the distribution
Farther out in the long tail than the median when the data is skewed | Close to the center even when the data is skewed
Easy to find | Less prone to mistakes
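To see the difference in resistance, the sketch below compares both measures on the six-value median example, with and without its largest value. It uses Python's standard `statistics` module; treating 5.40 as the extreme value is an assumption made here for illustration.

```python
from statistics import mean, median

# Values from the median example in this section; 5.40 sits far from the rest.
with_outlier = [0.42, 0.48, 0.73, 1.10, 1.10, 5.40]
without_outlier = [v for v in with_outlier if v != 5.40]

for label, data in [("with 5.40", with_outlier), ("without 5.40", without_outlier)]:
    print(label, "mean:", round(mean(data), 3), "median:", round(median(data), 3))
```

Dropping the single extreme value moves the mean from about 1.538 to 0.766, while the median only moves from 0.915 to 0.73.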
Check For Understanding
Measuring Spread
Range
Interquartile Range (IQR)
Standard Deviation
Range
Distance between largest and smallest values.
Range = Maximum – Minimum
Range is useful if there are no outliers.
Interquartile Range
How to find the IQR:
1. Find the median.
2. Find the median of each half of the data: the lower median is the 1st quartile (Q1) and the upper median is the 3rd quartile (Q3).
3. Subtract the two quartiles: IQR = Q3 – Q1.
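The steps above can be sketched in Python. One detail is an assumption here, since the slides do not state it: when the number of values is odd, the overall median is left out of both halves. The function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for an odd count, the overall median is excluded
    from both the lower and the upper half.
    """
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2:]
    return median(lower), median(ordered), median(upper)


# Seven-value data set from the median example.
values = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]
q1, med, q3 = quartiles(values)
iqr = q3 - q1
print(q1, med, q3, round(iqr, 2))  # 0.48 0.73 1.1 0.62
```

Other quartile conventions exist (calculators and software packages do not all agree), so results on small data sets can differ slightly from this rule.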
Outliers
One general rule of thumb for identifying outliers is to flag any data points that lie:
More than 1.5 * IQR below Q1 (less than Q1 – 1.5 * IQR)
OR
More than 1.5 * IQR above Q3 (greater than Q3 + 1.5 * IQR)
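A minimal Python sketch of this rule of thumb follows. The function names `outlier_fences` and `find_outliers` are hypothetical, and the quartiles are passed in rather than computed, so any quartile convention can be used with it.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in data if v < low or v > high]


# Six-value data set from the median example. Q1 = 0.48 and Q3 = 1.10
# are the medians of its lower and upper halves.
values = [0.42, 0.48, 0.73, 1.10, 1.10, 5.40]
low, high = outlier_fences(0.48, 1.10)
print(round(low, 2), round(high, 2))      # -0.45 2.03
print(find_outliers(values, 0.48, 1.10))  # [5.4]
```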
Check For Understanding
• The “Descriptive Statistics” of test grades for a certain class are listed below.
Mean = 74.71
Median = 76
Standard Deviation = 12.61
Minimum = 35
Maximum = 94
Q1 = 68
Q3 = 84
• (a) Determine the IQR for this data.
• (b) Using the answer from part (a), determine whether the lowest and highest values in the data are outliers.
Standard Deviation
A standard deviation is a measure of the average deviation from the mean.
\[ s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2} \]
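The formula can be followed step by step in Python. This is a sketch using the data set from the mean example; the function name `sample_std` is an illustrative choice, and the result is compared against `statistics.stdev` from the standard library, which also divides by n – 1.

```python
import math
import statistics


def sample_std(data):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(data)
    x_bar = sum(data) / n
    squared_devs = sum((x - x_bar) ** 2 for x in data)
    return math.sqrt(squared_devs / (n - 1))


values = [2, 3, 4, 6, 8, 12]
s_x = sample_std(values)
print(round(s_x, 2))  # 3.71
```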
If the data is uniform or symmetric, use:
Center: Mean
Spread: Standard deviation
If the data is skewed, use:
Center: Median
Spread: Five-number summary, range, IQR
Distributions with Outliers
Since outliers affect the mean and standard deviation, it is usually better to use the median and IQR.
However, if the distribution is unimodal, use the mean and median and report the outliers separately.
If you find a simple reason for an outlier, eliminate it and use the mean and standard deviation, provided the distribution is symmetric.
Measuring Position
Quartiles
Percentiles
Z-scores
• We can use either z-scores or percentiles to describe the location of an observation in a distribution.
• z-scores use the mean and standard deviation.
• Percentiles use the observation's position in the ordered data, counted from the lowest value.
Percentiles/Quartiles
• P_k is the notation for the kth percentile
• Q_n is the notation for the nth quartile
P_25 = Q_1
P_50 = Q_2 = median
P_75 = Q_3
Finding Percentiles
If you are trying to find the percentile corresponding to a certain score x:
\[ \text{Percentile of } x = \frac{\text{number of scores} < x}{\text{total number of scores}} \times 100 \]
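The percentile formula above translates directly into Python. In this sketch the function name `percentile_of_score` and the list of scores are hypothetical, invented only to illustrate the calculation.

```python
def percentile_of_score(scores, x):
    """Percent of scores strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)


# Hypothetical scores, invented for illustration only.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_of_score(scores, 85))  # 60.0
```

Six of the ten scores are below 85, so a score of 85 sits at the 60th percentile of this list.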
• Percentiles are used often when reporting academic scores such as SAT scores. Let’s say you get a 620 on the math portion of the SAT. It might also indicate that you are in the “78th percentile”. That means that you scored better than 78% of all students taking that particular SAT.
Measuring Relative Standing With Standardized Values (z-Scores)
• One way to compare an individual to the whole distribution is to describe its location in the distribution relative to the mean.
• Let’s do this by describing how many standard deviations an individual is away from the mean value.
• We call this the “standardized value,” or the “z-score.”
Here is how to interpret z-scores:
A z-score less than 0 represents an element less than the mean.
A z-score greater than 0 represents an element greater than the mean.
A z-score equal to 0 represents an element equal to the mean.
A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.
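The slides describe the z-score in words; the usual formula is z = (x – mean) / standard deviation. The sketch below applies it in Python. The function name `z_score` and the numbers (a mean of 150, a standard deviation of 15, and three individual scores) are illustrative assumptions.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Illustrative values: mean 150, standard deviation 15.
print(z_score(180, 150, 15))  # 2.0  (2 standard deviations above the mean)
print(z_score(135, 150, 15))  # -1.0 (1 standard deviation below the mean)
print(z_score(150, 150, 15))  # 0.0  (equal to the mean)
```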
Five-Number Summary
The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
Minimum Q1 Median Q3 Maximum
Boxplots
The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.
How to make a boxplot:
1. Draw and label a number line that includes the range of the distribution.
2. Draw a central box from Q1 to Q3.
3. Note the median M inside the box.
4. Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.
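The numbers needed to draw a boxplot by these steps can be computed as in the Python sketch below. It assumes the median-of-halves quartile rule (leaving the overall median out of both halves for an odd count) and the 1.5 * IQR outlier rule; the function name `boxplot_stats` and the dictionary keys are hypothetical.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_stats(data):
    """Five-number summary, whisker ends and outliers for a boxplot."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2:])
    low_fence = q1 - 1.5 * (q3 - q1)
    high_fence = q3 + 1.5 * (q3 - q1)
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "five_number_summary": (ordered[0], q1, median(ordered), q3, ordered[-1]),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }


# Seven-value data set from the median example.
stats = boxplot_stats([5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66])
print(stats["five_number_summary"])  # (0.42, 0.48, 0.73, 1.1, 5.4)
print(stats["whiskers"])             # (0.42, 1.1)
print(stats["outliers"])             # [5.4]
```

Here the upper whisker stops at 1.10 rather than at the maximum, because 5.40 lies beyond the upper fence and would be plotted as a separate point.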
Comparing Boxplots
Check For Understanding
Effect of Changing Units
Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of central tendency are affected when we change units:
If you add a constant to every value, the mean and median increase by the same constant.
Example: Suppose you have a set of scores with a mean equal to 5 and a median equal to 6. If you add 10 to every score, the new mean will be 5 + 10 = 15, and the new median will be 6 + 10 = 16.
If you multiply every value by a constant, the mean and the median are also multiplied by that constant.
Example: Assume that a set of scores has a mean of 5 and a median of 6. If you multiply each of these scores by 10, the new mean will be 5 * 10 = 50, and the new median will be 6 * 10 = 60.
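Both rules for the mean and median can be checked numerically. The Python sketch below uses a hypothetical set of five scores, chosen here so that the mean is 5 and the median is 6 as in the examples; the slides do not give the underlying scores.

```python
from statistics import median


def mean(data):
    return sum(data) / len(data)


# Hypothetical scores chosen so that the mean is 5 and the median is 6.
scores = [1, 2, 6, 7, 9]

shifted = [x + 10 for x in scores]  # add a constant to every value
scaled = [x * 10 for x in scores]   # multiply every value by a constant

print(mean(scores), median(scores))    # 5.0 6
print(mean(shifted), median(shifted))  # 15.0 16
print(mean(scaled), median(scaled))    # 50.0 60
```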
Check For Understanding
The average score on a test is 150 with a standard deviation of 15. Each score is then increased by 25. What are the new mean and standard deviation?
Check For Understanding
The test grades from a college statistics class are shown below.
85 72 64 65 98 78 75 76 82 80 61 92 72 58 65 74 92 85 74 76 77 77 62 68 68 54 62 76 73 85 88 91 99 82 80 74 76 77 70 60
(a) Construct two different graphs of these data.
(b) Calculate the five-number summary and the mean and standard deviation of the data.
(c) Describe the distribution of the data, citing both the plots and the summary statistics found in questions (a) and (b).