noreen-jordan
Chapter 2
Psy B07
DESCRIBING AND EXPLORING DATA
Outline
Plotting data
Grouping data
Terminology
Notation
Measures of Central Tendency
Measures of Variability
Properties of a Statistic
Plotting Data
Once a bunch of data has been collected, the raw numbers must be manipulated in some fashion to make them more informative.
Several options are available, including plotting the data or calculating descriptive statistics.
Plotting Data
Raw data of typical age and weight in a second year course (made-up data)
Age:    18 26 21 21 25 18 20 21 18 21 21 21 20 21 20 23 22 20 21 22 24 26 19 19
Weight: 107 115 108 111 163 119 119 200 178 135 143 113 103 166 112 151 192 135 117 138 137 161 117 142
Age:    20 21 20 19 19 21 22 19 20 20 19 19 19 20 20 19 20 20 20 22 22 19 23 20
Weight: 108 110 109 127 143 121 112 136 161 131 144 123 101 193 127 158 149 138 129 138 137 156 122 132
Plotting Data
Often, the first thing one does with a set of raw data is to plot frequency distributions.
Usually this is done by first creating a table of the frequencies broken down by values of the relevant variable; the frequencies in the table are then plotted in a histogram.
Plotting Data
Example: Typical age in a second year course
Age   Frequency
18    3
19    10
20    14
21    10
22    5
23    2
24    1
25    1
26    2
Note: The frequencies in the table above were calculated by simply counting the number of subjects having the specified value for the age variable.
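The counting step described in the note can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the list `ages` is transcribed from the raw-data slide, and the text histogram is a rough stand-in for the plotted one.

```python
from collections import Counter

# Ages of the 48 students, transcribed from the raw-data slide.
ages = [18, 26, 21, 21, 25, 18, 20, 21, 18, 21, 21, 21,
        20, 21, 20, 23, 22, 20, 21, 22, 24, 26, 19, 19,
        20, 21, 20, 19, 19, 21, 22, 19, 20, 20, 19, 19,
        19, 20, 20, 19, 20, 20, 20, 22, 22, 19, 23, 20]

# Count how many subjects have each age value.
age_freq = Counter(ages)

# Print the frequency table with one star per subject as a crude histogram.
for age in sorted(age_freq):
    print(f"{age:>3} {age_freq[age]:>3} {'*' * age_freq[age]}")
```

Each printed row should match the corresponding row of the frequency table.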
Plotting Data
[Histogram of the age frequency table: Age (18 to 26) on the x-axis, Frequency (0 to 16) on the y-axis]
Grouping Data
Plotting is easy when the variable of interest has a relatively small number of values (like our age variable did).
However, the values of a variable are sometimes more continuous, resulting in uninformative frequency plots if done in the above manner.
Grouping Data
For example, our weight variable ranges from 100 lb. to 200 lb. If we used the previously described technique, we would end up with 100 bars, most of which would have a frequency less than 2 or 3 (and many a frequency of zero).
We can get around this problem by grouping our values into bins. Try for around 10 bins with natural splits.
Grouping Data
Weight Bin   Midpoint   Frequency
100 - 109    104.5      6
110 - 119    114.5      10
120 - 129    124.5      6
130 - 139    134.5      10
140 - 149    144.5      5
150 - 159    154.5      3
160 - 169    164.5      4
170 - 179    174.5      1
180 - 189    184.5      0
190 - 199    194.5      2
200 - 209    204.5      1
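As a rough sketch of the grouping step (not from the slides), the weights can be assigned to 10 lb. bins by integer division. The list `weights` is transcribed from the raw-data slide; the names `bin_width` and `bin_freq` are made up for this illustration.

```python
from collections import Counter

# Weights (lb.) of the 48 students, transcribed from the raw-data slide.
weights = [107, 115, 108, 111, 163, 119, 119, 200, 178, 135, 143, 113,
           103, 166, 112, 151, 192, 135, 117, 138, 137, 161, 117, 142,
           108, 110, 109, 127, 143, 121, 112, 136, 161, 131, 144, 123,
           101, 193, 127, 158, 149, 138, 129, 138, 137, 156, 122, 132]

bin_width = 10

# Map each weight to the lower edge of its bin, e.g. 117 -> 110.
bin_freq = Counter((w // bin_width) * bin_width for w in weights)

# Print bin, midpoint, and frequency; empty bins show a frequency of 0.
for low in range(100, 210, bin_width):
    high = low + bin_width - 1
    midpoint = (low + high) / 2
    print(f"{low} - {high}   {midpoint:5.1f}   {bin_freq[low]}")
```

Changing `bin_width` is an easy way to see how the choice of bins changes the look of the table.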
Grouping Data
[Histogram of the grouped weight data: Weight (lbs), bin midpoints 104.5 to 204.5, on the x-axis; Frequency (0 to 12) on the y-axis]
Check out this demo, which shows how the bin width that you select can affect the “look” of the data.
Here is another, similar demonstration of the effects of bin width.
See the section in the text on cumulative frequency distributions.
Terminology
Often, frequency histograms have a roughly symmetrical bell shape; such distributions are called normal or Gaussian.
Height (Inches)   Frequency
60.5              3
62.5              8
64.5              7
66.5              12
68.5              7
70.5              6
72.5              4
74.5              0
76.5              1
[Histogram of the height data: Height (Inches), bin midpoints 60.5 to 76.5, on the x-axis; Frequency (0 to 14) on the y-axis]
Terminology
Sometimes, the bell shape is not symmetrical.
The term positive skew refers to the situation where the “tail” of the distribution is to the right; negative skew is when the “tail” is to the left.
Terminology
Midpoint   Frequency   Midpoint   Frequency
60.5       3           0.75       7
62.5       8           2.75       13
64.5       7           4.75       12
66.5       12          6.75       5
68.5       7           8.75       5
70.5       6           10.75      2
72.5       4           12.75      0
74.5       0           14.75      1
76.5       1           16.75      1
                       18.75      0
                       20.75      1
[Histogram of the second distribution: bin midpoints 0.75 to 20.75 on the x-axis, Frequency (0 to 14) on the y-axis]
Notation
Variables
When we describe a set of data corresponding to the values of some variable, we will refer to that set using a letter such as X or Y.
When we want to talk about specific data points within that set, we specify those points by adding a subscript to the letter, like $X_1$.
Notation
Value:   5      8      12     3      6      8      7
Label:   $X_1$  $X_2$  $X_3$  $X_4$  $X_5$  $X_6$  $X_7$
Notation
The Greek letter sigma, which looks like $\Sigma$, means “add up” or “sum” whatever follows it.
Thus, $\sum X_i$ means “add up all the $X_i$s”.
If we use the $X_i$s from the previous example, $\sum X_i = 49$ (or just $\sum X$).
Nasty Example
Student   Midterm Mark (X)   Real Mark (Y)
1         82                 84
2         66                 51
3         70                 72
4         81                 56
5         61                 73
Nasty Example
$\sum X = 360$
$\sum Y = 336$
$\sum (X - Y) = 24$
$\sum X^2 = 26262$
$(\sum X)^2 = 129600$
Your turn
$\sum (XY) = 24283$
$(\sum (X - Y))^2 = 576$
$\sum (X^2 - Y^2) = 2956$
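The summation expressions in the Nasty Example can be checked with a short Python sketch. This is only an illustration: the lists `X` and `Y` hold the midterm and real marks from the table, and the variable names are invented here. The main point is the order of operations, that is, whether you square before or after adding.

```python
X = [82, 66, 70, 81, 61]  # midterm marks
Y = [84, 51, 72, 56, 73]  # real marks

sum_x = sum(X)                                  # sum of X
sum_y = sum(Y)                                  # sum of Y
sum_diff = sum(x - y for x, y in zip(X, Y))     # sum of (X - Y)
sum_x_sq = sum(x ** 2 for x in X)               # square each X, then add
sq_sum_x = sum(X) ** 2                          # add the Xs, then square
sum_xy = sum(x * y for x, y in zip(X, Y))       # sum of the products XY
sq_sum_diff = sum_diff ** 2                     # square of the summed differences
sum_sq_diff = sum(x ** 2 - y ** 2 for x, y in zip(X, Y))  # sum of (X^2 - Y^2)

print(sum_x, sum_y, sum_diff, sum_x_sq, sq_sum_x)
print(sum_xy, sq_sum_diff, sum_sq_diff)
```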
Notation
Sometimes things are made more complicated because letters (e.g., X) are sometimes used to refer to entire data sets (as opposed to single variables), and multiple subscripts are used to identify specific data points.
Notation
          Week
Student   1   2   3   4   5
1         7   6   4   2   2
2         3   4   4   3   4
3         3   4   5   4   6
$X_{24} = 3$
$\sum X$ or $\sum X_{ij} = 61$
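A minimal sketch of double-subscript notation in Python, assuming the first subscript picks the student (row) and the second picks the week (column), as in the table. The nested list `X` is an illustrative representation, not something from the slides.

```python
# Rows are students 1 to 3, columns are weeks 1 to 5.
X = [
    [7, 6, 4, 2, 2],
    [3, 4, 4, 3, 4],
    [3, 4, 5, 4, 6],
]

# Python counts from 0, so X_{24} (student 2, week 4) is X[1][3].
x_24 = X[2 - 1][4 - 1]

# Summing over both subscripts adds up every entry in the table.
total = sum(sum(row) for row in X)

print(x_24, total)
```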
Measures of Central Tendency
While distributions provide an overall picture of some data set, it is sometimes desirable to represent the entire data set using descriptive statistics.
The first descriptive statistics we will discuss are those used to indicate where the centre of the distribution lies.
Measures of Central Tendency
[Histogram of the height data: Height (Inches), bin midpoints 60.5 to 76.5, on the x-axis; Frequency (0 to 14) on the y-axis]
Measures of Central Tendency
There are, in fact, three different measures of central tendency.
The first of these is called the mode.
The mode is simply the value of the relevant variable that occurs most often (i.e., has the highest frequency) in the sample.
Measures of Central Tendency
Note that if you have done a frequency histogram, you can often identify the mode simply by finding the value with the highest bar.
However, that will not work when grouping was performed prior to plotting the histogram (although you can still use the histogram to identify the modal group, just not the modal value).
Measures of Central Tendency
Example: Class height.
Create a non-grouped frequency table as described previously, then identify the value with the greatest frequency.
Value   Freq   Value   Freq
61      3      69      3
62      4      70      2
63      4      71      4
64      4      72      4
65      3      73      0
66      7      74      0
67      5      75      0
68      4      76      1
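Finding the mode from a non-grouped frequency table amounts to picking the value with the largest count. The sketch below assumes the class-height table is stored in a dictionary called `height_freq` (a name chosen here for illustration).

```python
# Class-height frequency table: height in inches -> number of students.
height_freq = {61: 3, 62: 4, 63: 4, 64: 4, 65: 3, 66: 7, 67: 5, 68: 4,
               69: 3, 70: 2, 71: 4, 72: 4, 73: 0, 74: 0, 75: 0, 76: 1}

# The mode is the value whose frequency is highest.
mode = max(height_freq, key=height_freq.get)

print(mode, height_freq[mode])
```

If two values tied for the highest frequency, `max` would return only the first one it meets, so a real analysis would want to check for ties.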
Measures of Central Tendency
A second measure of central tendency is called the median.
The median is the point corresponding to the score that lies in the middle of the distribution (i.e., there are as many data points above the median as there are below the median).
Measures of Central Tendency
To find the median, the data points must first be sorted into either ascending or descending numerical order.
The position of the median value can then be calculated using the following formula:
$\text{Median Location} = \frac{N + 1}{2}$
Measures of Central Tendency
1) If there is an odd number of data points:
(1, 3, 3, 4, 4, 5, 6, 7, 12)
$\text{Median Location} = \frac{9 + 1}{2} = 5$
The median is the item in the fifth position of the ordered data set; therefore the median is 4.
Measures of Central Tendency
2) If there is an even number of data points:
(1, 3, 3, 3, 5, 5, 6, 7)
$\text{Median Location} = \frac{8 + 1}{2} = 4.5$
We take the average of the two values adjacent to that location (the 4th and 5th values, 3 and 5), in this case giving us 4.
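The median-location rule can be written as a small Python function. This is a sketch of the procedure described on the slides; the function name `median` and its structure are choices made for this illustration (Python's standard library also provides `statistics.median`).

```python
def median(values):
    """Median via the (N + 1) / 2 median-location rule."""
    ordered = sorted(values)          # sort into ascending order first
    n = len(ordered)
    location = (n + 1) / 2            # 1-based position of the median
    if location.is_integer():
        # Odd N: the median is the value at that position.
        return ordered[int(location) - 1]
    # Even N: average the two values on either side of the location.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2


print(median([1, 3, 3, 4, 4, 5, 6, 7, 12]))  # odd-N example from the slides
print(median([1, 3, 3, 3, 5, 5, 6, 7]))      # even-N example from the slides
```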
Measures of Central Tendency
Finally, the most commonly used measure of central tendency is called the mean (denoted $\bar{X}$ for a sample and $\mu$ for a population).
The mean is the same as what most of us call the average, and it is calculated in the following manner:
$\bar{X} = \frac{\sum X}{N}$
Measures of Central Tendency
For example, given the data set that we used to calculate the median (odd number example), the corresponding mean would be:
$\bar{X} = \frac{\sum X}{N} = \frac{45}{9} = 5$
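The same calculation in Python, as a quick check. The variable names are illustrative only.

```python
data = [1, 3, 3, 4, 4, 5, 6, 7, 12]  # odd-N example used for the median

total = sum(data)        # sum of X
n = len(data)            # N
mean = total / n         # the sample mean

print(total, n, mean)
```

Note that the mean (5) is larger than the median of the same data (4), because the single large value, 12, pulls the mean upward.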
Measures of Central Tendency
When a distribution is fairly symmetrical, the mean, median, and mode will be quite similar.
However, when the underlying distribution is not symmetrical, the three measures of central tendency can be quite different.
Measures of Central Tendency
Example: Pizza Eating
Value   Freq   Value   Freq
0       4      8       5
1       2      10      2
2       8      15      1
3       6      16      1
4       6      20      1
5       6      40      1
6       5
Mode = 2 slices per week
Median = 4 slices per week
Mean = 5.7 slices per week
This raises the issue of which measure is best.
Note that if you were calculating these values, you would show all your steps (it’s good to be a prof!).
Measures of Central Tendency
Here is a demonstration that allows you to change a frequency histogram while simultaneously noting the effects of those changes on the mean versus the median.
As you use the demo, you should easily be able to think about how these changes are also affecting the mode, right?
Measures of Variability
In addition to knowing where the centre of the distribution is, it is often helpful to know the degree to which individual values cluster around the centre.
This is known as variability.
Measures of Variability
There are various measures of variability, the most straightforward being the range of the sample:
Highest value minus lowest value
While the range provides a good first pass at variability, it is not the best measure because of its sensitivity to extreme scores (see text).
Measures of Variability
One approach to estimating variability is to directly measure the degree to which individual data points differ from the mean and then average those deviations.
This is known as the average deviation:
$\frac{\sum (X - \bar{X})}{N}$
Measures of Variability
However, if we try to do this with real data, the result will always be zero, because the deviations above and below the mean cancel out:
Example: (2, 3, 3, 4, 4, 6, 6, 12), with mean 5
$\frac{\sum (X - \bar{X})}{N} = \frac{(-3) + (-2) + (-2) + (-1) + (-1) + 1 + 1 + 7}{8} = \frac{0}{8} = 0$
Measures of Variability
One way to get around the problem with the average deviation is to use the absolute value of the differences, instead of the differences themselves.
The absolute value of some number is just the number without any sign:
For example: |-3| = 3
And: |+3| = 3
Measures of Variability
Thus, we could re-write and solve our average deviation question as follows:
$MAD = \frac{\sum |X - \bar{X}|}{N} = \frac{3 + 2 + 2 + 1 + 1 + 1 + 1 + 7}{8} = \frac{18}{8} = 2.25$
Therefore, this data set has a mean of 5 and a MAD of 2.25.
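Both calculations, the average deviation that collapses to zero and the mean absolute deviation (MAD), can be reproduced with a few lines of Python. This is a sketch using the example data from the slides; the variable names are made up here.

```python
data = [2, 3, 3, 4, 4, 6, 6, 12]
n = len(data)
mean = sum(data) / n

# Signed deviations from the mean: positive and negative values cancel.
deviations = [x - mean for x in data]
average_deviation = sum(deviations) / n

# Taking absolute values removes the signs, so nothing cancels.
mad = sum(abs(d) for d in deviations) / n

print(mean, average_deviation, mad)
```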
Measures of Variability
Although the MAD is an acceptable measure of variability, the most commonly used measure is variance (denoted $s^2$ for a sample and $\sigma^2$ for a population) and its square root, termed the standard deviation (denoted $s$ for a sample and $\sigma$ for a population).
Measures of Variability
The computation of variance is also based on the basic notion of the average deviation; however, instead of getting around the “zero problem” by using absolute deviations (as in MAD), the “zero problem” is eliminated by squaring the differences from the mean:
$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$
Measures of Variability
Example: (2, 3, 4, 4, 4, 5, 6, 12)
$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{9 + 4 + 1 + 1 + 1 + 0 + 1 + 49}{8} = 8.25$
Measures of Variability
To convert the variance into SD, we simply take its square root:
$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{9 + 4 + 1 + 1 + 1 + 0 + 1 + 49}{8}} = \sqrt{8.25} = 2.87$
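As a check on the worked example, the variance and standard deviation (with N in the denominator) can be computed directly. This is an illustrative sketch; the variable names are not from the slides.

```python
import math

data = [2, 3, 4, 4, 4, 5, 6, 12]
n = len(data)
mean = sum(data) / n

# Squared deviations from the mean: squaring removes the signs.
squared_deviations = [(x - mean) ** 2 for x in data]

variance = sum(squared_deviations) / n   # N in the denominator
sd = math.sqrt(variance)                 # standard deviation

print(squared_deviations)
print(variance, round(sd, 2))
```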
Measures of Variability
This demonstration allows you to play with the mean and standard deviation of a distribution. Note that changing the mean of the distribution simply moves the entire distribution to the left or right without changing its shape. In contrast, changing the standard deviation alters the spread of the data but does not affect where the distribution is “centered”.
Measures of Variability
Population vs. Sample
As mentioned, we usually deal with statistics, not parameters. $\sigma^2$ and $\sigma$ are parameters. Their counterparts, when dealing with samples, are $s^2$ and $s$. The formulae are slightly different:
$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
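Python's standard `statistics` module offers both versions, which makes the difference between the two denominators easy to see: `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by N - 1. The sketch below reuses the variance example data and is only an illustration.

```python
import statistics

data = [2, 3, 4, 4, 4, 5, 6, 12]
n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

# Population formulas: N in the denominator.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample formulas: N - 1 in the denominator.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(pop_var, sum_sq / n)           # both use N
print(sample_var, sum_sq / (n - 1))  # both use N - 1
```

The sample versions are always a little larger than the population versions for the same data, since the same sum is divided by a smaller number.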
Properties of a Statistic
So, the mean ($\bar{X}$) and variance ($s^2$) are the descriptive statistics that are most commonly used to represent the data points of some sample.
The real reason that they are the preferred measures of central tendency and variability is because of certain properties they have as estimators of their corresponding population parameters, $\mu$ and $\sigma^2$.
Properties of a Statistic
Four properties are considered desirable in a population estimator: sufficiency, unbiasedness, efficiency, and resistance.
Both the mean and the variance are the best estimators in their class in terms of the first three of these four properties.
To understand these properties, you first need to understand a concept in statistics called the sampling distribution.
Properties of a Statistic
We will discuss sampling distributions off and on throughout the course, and I only want to touch on the notion now.
Basically, the idea is this – in order to examine the properties of a statistic we often want to take repeated samples from some population of data and calculate the relevant statistic on each sample. We can then look at the distribution of the statistic across these samples and ask a variety of questions about it.
Check out this demonstration which I hope makes the concept of sampling distributions more clear.
Properties of a Statistic
1) Sufficiency
A sufficient statistic is one that makes use of all of the information in the sample to estimate its corresponding parameter.
Properties of a Statistic
2) Unbiasedness
A statistic is said to be an unbiased estimator if its expected value (i.e., the mean of that statistic over many samples, such as the mean of a number of sample means) is equal to the population parameter it is estimating.
Explanation of N-1 in the $s^2$ formula.
Properties of a Statistic
Using this procedure, the mean can be shown to be an unbiased estimator (see p 47).
However, if the $\sigma^2$ formula is used to calculate $s^2$, it turns out to underestimate $\sigma^2$.
Properties of a Statistic
The reason for this bias is that, when we calculate $s^2$, we use $\bar{X}$, an estimator of the population mean.
The chances of $\bar{X}$ being EXACTLY the same as $\mu$ are virtually nil, which results in the bias.
To compensate, we use N-1.
Note that this is only true when calculating $s^2$; if you have a measurable population and you want to calculate $\sigma^2$, you use N in the denominator, not N-1.
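The bias can be seen in a small simulation of a sampling distribution. Everything in this sketch is an assumption made for illustration: the population is a fair six-sided die, the sample size is 5, and 20,000 samples are drawn with a fixed random seed. For each sample, the variance is computed once with N and once with N - 1 in the denominator, and the two sets of estimates are averaged.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
sigma_sq = statistics.pvariance(population)  # true population variance

n = 5           # size of each sample
trials = 20000  # number of repeated samples

biased, unbiased = [], []
for _ in range(trials):
    sample = random.choices(population, k=n)   # sample with replacement
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    biased.append(sum_sq / n)          # sigma^2 formula applied to a sample
    unbiased.append(sum_sq / (n - 1))  # s^2 formula

mean_biased = sum(biased) / trials
mean_unbiased = sum(unbiased) / trials

print(f"true variance:          {sigma_sq:.3f}")
print(f"average with N:         {mean_biased:.3f}")
print(f"average with N - 1:     {mean_unbiased:.3f}")
```

On average the N version should come out too small, while the N - 1 version should land close to the true variance.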
Properties of a Statistic
Degrees of Freedom
The mean of 6, 8, & 10 is 8.
If I allow you to change as many of these numbers as you want BUT the mean must stay 8, how many of the numbers are you free to vary?
Properties of a Statistic
The point of this exercise is that when the mean is fixed, it removes a degree of freedom from your sample; this is like actually subtracting 1 from the number of observations in your sample.
It is for exactly this reason that we use N-1 in the denominator when we calculate $s^2$ (i.e., the calculation requires that the mean be fixed first, which effectively fixes one of the data points).
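The degrees-of-freedom exercise can be played out in code. In this sketch (illustrative names and numbers), two of the three values are chosen freely and the third is then forced by the requirement that the mean stay at 8.

```python
n = 3            # number of observations
fixed_mean = 8   # the mean must stay at this value

# Pick any two numbers you like; these are free to vary.
free_values = [1, 20]

# The last value is not free: it must bring the total to n * fixed_mean.
last_value = n * fixed_mean - sum(free_values)

values = free_values + [last_value]
print(values, sum(values) / n)
```

Whatever two numbers are chosen, the third is determined, so only N - 1 = 2 of the values are free to vary.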
Properties of a Statistic
3) Efficiency
The efficiency of a statistic is reflected in the variance that is observed when one examines the means of a bunch of independently chosen samples. The smaller the variance, the more efficient the statistic is said to be.
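One way to get a feel for efficiency is to compare two statistics across many independent samples and see which one varies less. The simulation below is a sketch under assumptions chosen for illustration: samples of 25 scores from a normal population with mean 100 and standard deviation 15, 5,000 samples, and a fixed seed. It compares the sample mean with the sample median as estimators of the centre.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

trials = 5000   # number of independent samples
n = 25          # size of each sample

means, medians = [], []
for _ in range(trials):
    # Hypothetical population: normal with mean 100 and SD 15.
    sample = [random.gauss(100, 15) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# Variance of each statistic across the repeated samples.
var_means = statistics.pvariance(means)
var_medians = statistics.pvariance(medians)

print(f"variance of sample means:   {var_means:.2f}")
print(f"variance of sample medians: {var_medians:.2f}")
```

For a normal population like this one, the means should vary less from sample to sample than the medians, which is what it means for the mean to be the more efficient of the two.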
Properties of a Statistic
4) Resistance
The resistance of an estimator refers to the degree to which that estimate is affected by extreme values.
As mentioned previously, both $\bar{X}$ and $s^2$ are highly sensitive to extreme values.
Properties of a Statistic
4) Resistance
Despite this, they are still the most commonly used estimates of the corresponding population parameters, mostly because of their superiority over other measures in terms of sufficiency, unbiasedness, & efficiency.
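The lack of resistance of the mean and variance is easy to demonstrate. In this sketch, the second data set is a made-up variation on the earlier variance example in which the largest value, 12, is replaced by an extreme value of 120; the median is included as a contrast.

```python
import statistics

data = [2, 3, 4, 4, 4, 5, 6, 12]
extreme = [2, 3, 4, 4, 4, 5, 6, 120]  # hypothetical: 12 replaced by 120

for label, values in (("original", data), ("extreme", extreme)):
    print(label,
          "mean =", statistics.mean(values),
          "median =", statistics.median(values),
          "s^2 =", round(statistics.variance(values), 2))
```

The single extreme value moves the mean and inflates the variance considerably, while the median does not change at all.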