8/3/2019 2 Slides for Chapters 1-4
A Statistical Journey: Taming of the Skew
A Tutorial of Chapters 1-4
© 2009 by Dr. Donald F. DeMoulin and Dr. William Allen Kritsonis
These Slides May Not Be Altered or Modified
Topics Of This Lesson
1) Introduction
2) Review of the Basics
3) Research Rules
4) Statistical Symbols
5) Statistical Terms
6) Data Strength
7) Measures of Central Tendency
   1) Mean
   2) Median
   3) Mode
8) Measures of Variability
   1) Range
   2) Variance
   3) Standard Deviation
9) Distribution Types
10) Putting It All Together
Introduction
Many wonder why a statistical concept is so hard to grasp.
Have you ever tried to understand a native speaker from Italy, France, Russia, China, or Japan?
It is hard because they speak a foreign language, one that is unfamiliar to your vocabulary.
In this realm, statisticians, most of the time, speak in a foreign language; a language we will call Statonese (stat-n-eaze).
Introduction
For example, have you ever heard:
"Four out of five dentists recommend Brand X toothpaste to help fight cavities"
"Nine out of 10 doctors stranded on a desert island recommend Brand Y aspirin for headache pain"
"Five out of six farmers reported significant increases in yield from using Brand Z fertilizer"
These are just a few of the many thousands of examples of statistical applications.
Introduction
But have you ever thought:
1) What makes up the four out of five dentists? Are they the four employed by Brand X toothpaste?
2) Who are the nine out of ten doctors, and why are they stranded on a desert island?
3) What constitutes a significant increase in yield?
Introduction
Statistics is all about deciphering what the numbers are and gaining an understanding of statistical procedures and concepts in order to make a somewhat accurate, independent judgment of reports, statements, and claims.
Our discussions will minimize the guesswork about statistics and maximize the WHEN, the WHY, and the HOW: the basis for statistical applications.
Introduction
The goal of these PowerPoint slides is to bring statistical concepts, applications, and explanations to you in a language that can be understood, showing how statistical procedures are developed, analyzed, and interpreted.
So, Let Us Begin!
Review of the Basics
A variable is usually represented by the symbol X.
If you have two variables, they are usually associated with the symbols X and Y, although this is not set in stone; certainly using A, B, C, or D is also acceptable.
A subscript after each variable denotes the numbered score of that variable (X1, X2, X3, X4, X5).
For example, let us have two variables, X and Y, that represent two different turtle races.
Review of the Basics
A hypothetical data set may look something like this:
* Time for a turtle to complete a one-inch race
* Number = five turtles for each race
* Variable X = participants in race one
* Variable Y = participants in race two

Participant in race one    X1   X2   X3   X4   X5
Time (in seconds)          45   41   30   28   59

Participant in race two    Y1   Y2   Y3   Y4   Y5
Time (in seconds)          25   36   56   51   43
Review of the Basics
Participant in race one    X1   X2   X3   X4   X5
Time (in seconds)          45   41   30   28   59

Participant in race two    Y1   Y2   Y3   Y4   Y5
Time (in seconds)          25   36   56   51   43

What is the time logged for participating turtle X4?
28 seconds
Which participating turtle in race two logged a whopping 51 seconds for completing the race?
Turtle Y4
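The two look-ups above can also be done in code. This is a minimal Python sketch and not part of the original slides; the names `race_one`, `race_two`, `score`, and `subscript_of` are illustrative assumptions. It mainly shows that the subscripts on the slide start at 1, while Python list positions start at 0.

```python
# Race times in seconds, in the order X1..X5 and Y1..Y5 from the slide.
race_one = [45, 41, 30, 28, 59]
race_two = [25, 36, 56, 51, 43]


def score(times, subscript):
    """Return the time for a 1-based subscript, such as the 4 in X4."""
    return times[subscript - 1]


def subscript_of(times, value):
    """Return the 1-based subscript of the first turtle with this time."""
    return times.index(value) + 1


print("X4 =", score(race_one, 4))
print("51 seconds was logged by Y" + str(subscript_of(race_two, 51)))
```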
Review of the Basics
The Greek symbol Σ (sigma) means "sum", or to sum the set of numbers following it.
So, ΣX would simply mean to sum all values of the variable X:
X1 = 45
X2 = 41
X3 = 30
X4 = 28
X5 = 59
ΣX = 203
Review of the Basics
The notation ΣX² simply sums all the squared values of X.
For example, the hypothetical data set for race one would be:
X      X²
45     45 x 45 = 2,025
41     41 x 41 = 1,681
30     30 x 30 = 900
28     28 x 28 = 784
59     59 x 59 = 3,481
ΣX = 203     ΣX² = 8,871

ΣX = 45 + 41 + 30 + 28 + 59 = 203
ΣX² is equivalent to (45)² + (41)² + (30)² + (28)² + (59)²
ΣX² = 2,025 + 1,681 + 900 + 784 + 3,481 = 8,871
Review of the Basics
How about (ΣX)²?
This identifier means to sum all the values of X and square the answer.
X
45
41
30
28
59
ΣX = 203, so (ΣX)² = (203)² = 41,209

ΣX would be 45 + 41 + 30 + 28 + 59 = 203
Now, square the value 203 and you get 41,209
Review of the Basics
Please remember that ΣX² does not equal (ΣX)²
ΣX² = 8,871
(ΣX)² = 41,209

X      X²
45     45 x 45 = 2,025
41     41 x 41 = 1,681
30     30 x 30 = 900
28     28 x 28 = 784
59     59 x 59 = 3,481
ΣX = 203     ΣX² = 8,871     (ΣX)² = (203)² = 41,209
Review of the Basics
Finally, we have ΣXY
This notation signifies a summing of the products of corresponding values of X and Y (cross products)
X      Y      XY
45     25     1,125
41     36     1,476
30     56     1,680
28     51     1,428
59     43     2,537
ΣX = 203     ΣY = 211     ΣXY = 8,246

All we do is multiply each X by its corresponding Y, put the product in the column labeled XY, and then add up the XY column.
Algebraically, it would be (45)(25) + (41)(36) + (30)(56) + (28)(51) + (59)(43) = 1,125 + 1,476 + 1,680 + 1,428 + 2,537 = 8,246
Summary of the Basics
A variable is usually represented by the symbol X
The Greek symbol Σ means to sum the set of numbers following it
ΣX would simply mean to sum all values of the variable X
ΣX² simply sums all the squared values of X
(ΣX)² means to sum all the values of X and square the answer
ΣXY signifies a summing of the products of corresponding values of X and Y (cross products)
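The four sums in this summary can be checked with a few lines of code. This is a minimal Python sketch, not part of the original slides; the variable names are illustrative assumptions, and the data are the turtle race times used earlier.

```python
# Turtle race times (seconds) for race one (X) and race two (Y).
x = [45, 41, 30, 28, 59]
y = [25, 36, 56, 51, 43]

sum_x = sum(x)                              # sum of X
sum_x_squared = sum(v * v for v in x)       # sum of the squared X values
square_of_sum_x = sum_x ** 2                # (sum of X), then squared
sum_xy = sum(a * b for a, b in zip(x, y))   # sum of the cross products

print("sum of X          =", sum_x)
print("sum of X squared  =", sum_x_squared)
print("(sum of X) squared =", square_of_sum_x)
print("sum of XY         =", sum_xy)
```

Squaring before summing and summing before squaring give different answers, which is the point of the reminder that the two notations are not interchangeable.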
Now we continue with the Rules of Research
Research Rules
There are two rules in research.
Rule #1 is that credibility and believability are vital components in research.
In essence, the researcher must be credible by conducting his/her research with integrity, honesty, and within proper research etiquette.
This leads to the next component of Rule #1: the results (data input, data analysis, and statistical interpretation) must be believable, which involves proper coding and the use of the appropriate statistical procedure for analysis.
Credibility and believability are the two critical aspects of any research, for without them the entire research process becomes an insignificant exercise.
Research Rules
Now, Rule #2 is simply:
First learn Rule #1
Cowboy Proverb
These are critical rules in research because if you do not have credibility as a researcher, the results that are produced lack believability.
And, as the ole Cowboy Proverb goes:
"Don't dig for water under the outhouse"
Research Rules
In other words, don't expect believability and credibility with data that are polluted, tainted, or contaminated.
By following the rules of research, you maximize your credibility as a researcher and the believability of your results.
Statistical Symbols
Σ (sigma): mathematical notation meaning to add up
X: variable that can represent any score in the distribution
μ (mu): symbol for the mean of a population
X̄ (X-bar): symbol for the mean of a sample
σ² or S²: symbol for the variance of a population
σ or S: symbol for the standard deviation of a population
σ̂² or s²: symbol for the variance of a sample
σ̂ or s: symbol for the standard deviation of a sample
... (three dots): symbolic representation that literally means "and so on"
The caret (^) on top of a symbol denotes a sample
Statistical Terms
Descriptive statistics is taking raw data and describing it in a meaningful way (to make sense out of data), generating a profile of that data set with graphs, charts, and other pictorial techniques to help display and interpret the data.
Inferential statistics is taking the results from descriptive procedures of the raw data and subjecting them to a higher-order statistical procedure to reasonably infer results to a corresponding population by following certain rules and assumptions.
Statistical Terms
Parametric statistics are concerned with a parameter of a given population; hence inferences can be made from the resulting analysis to the population of concern.
Non-parametric statistics, on the other hand, do not conform to any stringent assumptions and therefore have more latitude in procedures. Because stringent assumptions are not strictly adhered to, we cannot confidently generalize the results to a population.
Statistical Terms
A variable is defined as a property of an event or item that can be changed or can take on different values.
A dependent variable is called the measured, outcome, or criterion variable.
An independent variable is the variable that is changed, altered, or manipulated by the experimenter during research.
Statistical Terms
A qualitative variable refers to non-numerical qualities, attributes, or items such as gender, eye color, etc.
A quantitative variable is concerned with numerical qualities, such as the number of items falling into various categories, or measurable data.
Data Strength
Data are considered nominal strength if the assignment of numbers to objects does no more than identify the objects.
An example of this would be a football jersey number used to identify a player on the field.
Data considered ordinal strength contain the elements of the nominal scale of measurement plus an ordering of objects, thereby implying magnitude: the objects are labeled, but they are also ranked in accordance with importance.
Military rank would be an example of ordinal data, as would lining up people according to height, with 1 being the shortest and 10 being the tallest.
Data Strength
Data considered interval strength contain all the elements of nominal and ordinal scales (labeling and ordering) plus equal intervals between each item.
A thermometer would be an example of interval strength data, since the distance between 20 and 30 degrees is the same as the distance between 50 and 60 degrees. However, 60 degrees is not twice as warm as 30 degrees, since the zero point is arbitrary and we can have minus degrees in temperature.
Data considered ratio strength contain all elements of nominal, ordinal, and interval strength (labeling, ordering, equal distance between items) plus the inclusion of an absolute zero.
Height and weight are examples of ratio strength data, since there is no negative weight or height.
Measures of Central Tendency
How Data Gather Around the Center of a Data Set
Measures of Central Tendency
Mean: the arithmetic average; the signed deviations from the mean must sum to zero, Σ(X - X̄) = 0
Median: exact middle of a data set; the data must be ranked from high to low or low to high
Mode: most frequently occurring data point
Mean, median, and mode are located at the exact same place on a normal distribution
Measures of Central Tendency
Group 1 (X)                      Group 2 (Y)
Score    (X - X̄)                 Score    (Y - Ȳ)
72       72 - 75 = -3            67       67 - 75 = -8
73       73 - 75 = -2            72       72 - 75 = -3
76       76 - 75 = 1             76       76 - 75 = 1
76       76 - 75 = 1             76       76 - 75 = 1
78       78 - 75 = 3             84       84 - 75 = 9
ΣX = 375                         ΣY = 375
Σ(X - X̄) = 0                     Σ(Y - Ȳ) = 0
N = 5                            N = 5
X̄ = 75 (375/5)                   Ȳ = 75 (375/5)
Median = 76                      Median = 76
Mode = 76 (two scores of 76)     Mode = 76 (two scores of 76)
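The values in this table can be reproduced with Python's standard `statistics` module. This is a minimal sketch and not part of the original slides; the function name `central_tendency` and the dictionary keys are illustrative assumptions.

```python
import statistics

group_1 = [72, 73, 76, 76, 78]
group_2 = [67, 72, 76, 76, 84]


def central_tendency(scores):
    """Return the mean, median, mode, and the sum of deviations from the mean."""
    mean = statistics.mean(scores)
    return {
        "mean": mean,
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        # Signed deviations cancel out, so this sum is zero.
        "sum_of_deviations": sum(s - mean for s in scores),
    }


for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    print(name, central_tendency(scores))
```

Both groups share the same mean, median, and mode even though Group 2 is visibly more spread out, which is why measures of variability are needed as well.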
Measures of Variability
How Data Are Dispersed Throughout a Data Set
Measures of Variability
Range: difference between the highest and lowest numbers in a data set
Variance:
$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}$$
Standard Deviation:
$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}}$$
Variance is the square of the standard deviation
Standard deviation is the square root of the variance
Measures of Variability
Group 1                    Group 2
X      X²                  Y      Y²
72     5,184               67     4,489
73     5,329               72     5,184
76     5,776               76     5,776
76     5,776               76     5,776
78     6,084               84     7,056
ΣX = 375                   ΣY = 375
ΣX² = 28,149               ΣY² = 28,281
N = 5                      N = 5
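Measures of Variability
Group 1
$$\begin{aligned}
\sigma^2 &= \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}
          = \frac{28{,}149 - \frac{(375)^2}{5}}{5}
          = \frac{28{,}149 - \frac{140{,}625}{5}}{5} \\
         &= \frac{28{,}149 - 28{,}125}{5}
          = \frac{24}{5}
          = 4.8 \\
\sigma   &= \sqrt{4.8} = 2.19
\end{aligned}$$
Group 2
$$\begin{aligned}
\sigma^2 &= \frac{\sum Y^2 - \frac{(\sum Y)^2}{n}}{n}
          = \frac{28{,}281 - \frac{(375)^2}{5}}{5}
          = \frac{28{,}281 - \frac{140{,}625}{5}}{5} \\
         &= \frac{28{,}281 - 28{,}125}{5}
          = \frac{156}{5}
          = 31.2 \\
\sigma   &= \sqrt{31.2} = 5.59
\end{aligned}$$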
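The hand calculation above follows the computational formula step by step, and the same steps can be sketched in Python. This is an illustrative sketch, not part of the original slides; the function names are assumptions. It divides by n, as the slide's formula does.

```python
import math

group_1 = [72, 73, 76, 76, 78]
group_2 = [67, 72, 76, 76, 84]


def data_range(scores):
    """Difference between the highest and lowest scores."""
    return max(scores) - min(scores)


def population_variance(scores):
    """Variance from the computational formula, dividing by n."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(s * s for s in scores)
    return (sum_x_squared - sum_x ** 2 / n) / n


def population_sd(scores):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(population_variance(scores))


for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    print(name,
          "range =", data_range(scores),
          "variance =", round(population_variance(scores), 2),
          "SD =", round(population_sd(scores), 2))
```

The standard library's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities and can serve as a cross-check.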
Distribution Types
Normal
Skewed
Distribution Types
Normal Distribution
Also known as symmetrical, standard normal, and z-normal distributions
The right half is the mirror image of the left half
Kurtosis is how a distribution is peaked:
  Leptokurtic: high-peaked
  Mesokurtic: middle-peaked
  Platykurtic: low-peaked
If a distribution is not symmetrical, then it is asymmetrical, or skewed: the right half is not the mirror image of the left half
  Negative skew: tail points to the negative end of the number line
  Positive skew: tail points to the positive end of the number line
Distribution Types
Roughly 68% of all scores fall within one standard deviation plus or minus the mean
Roughly 95% of all scores fall within two standard deviations plus or minus the mean
Roughly 99.7% of all scores fall within three standard deviations plus or minus the mean
The remaining 0.3 percent (a proportion of .003) is considered outliers that do not conform to the standard normal population distribution, lying more than 3 standard deviations above or below the mean
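The rounded percentages above can be checked against the normal curve itself. This is a minimal Python sketch, not part of the original slides; the function name `proportion_within` is an illustrative assumption. It uses the standard identity that the proportion of a normal distribution within k standard deviations of the mean is erf(k / √2).

```python
import math


def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = proportion_within(k)
    print(f"within {k} SD: {inside:.4f}   outside: {1 - inside:.4f}")
```

The proportion left outside three standard deviations is about .0027, which rounds to the 0.3 percent quoted on the slide.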
Relation of Mean and Standard Deviation
[Figure: three normal curves]
μ = 0, σ = 1
μ = 50, σ = 10: axis marked 20, 30, 40, 50, 60, 70, 80; Range = 60 (80 - 20 = 60); moving more towards a platykurtic shape
μ = 50, σ = 2: axis marked 44, 46, 48, 50, 52, 54, 56; Range = 12 (56 - 44 = 12); moving more towards a leptokurtic shape
The mean locates the center of the curve, and the standard deviation helps determine its height (how peaked or flat it looks) through the variability of scores dispersed throughout the data set
99.7% of scores fall within 3 standard deviations plus or minus the mean, that is, between 20 and 80 and between 44 and 56 in our examples
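The two example curves can be compared numerically with `statistics.NormalDist` from the Python standard library. This is an illustrative sketch, not part of the original slides; the names `wide`, `narrow`, and `three_sd_interval` are assumptions. It prints the 3-standard-deviation interval and the height of each curve at its mean.

```python
from statistics import NormalDist

# The two example curves: same mean, different standard deviations.
wide = NormalDist(mu=50, sigma=10)
narrow = NormalDist(mu=50, sigma=2)


def three_sd_interval(dist):
    """Return (mean - 3 SD, mean + 3 SD) for a normal distribution."""
    return (dist.mean - 3 * dist.stdev, dist.mean + 3 * dist.stdev)


for label, dist in (("sigma = 10", wide), ("sigma = 2", narrow)):
    low, high = three_sd_interval(dist)
    peak = dist.pdf(dist.mean)  # height of the curve at the mean
    print(f"{label}: 3-SD interval {low:g} to {high:g}, peak height {peak:.4f}")
```

With the same mean, the curve with the smaller standard deviation is drawn taller and narrower, and the one with the larger standard deviation is drawn lower and wider.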
Putting it all Together
Putting It All Together
Data Type: Parametric
  Data Strength: Ratio, Interval
  Central Tendency: Mean
  Data Tests: One-Sample z-test, One-Sample t-test, Independent t-test, Dependent t-test, ANOVA, Pearson Correlation, Repeated Measures ANOVA
Data Type: Non-Parametric
  Data Strength: Ordinal
  Central Tendency: Median
  Data Tests: Mann-Whitney U, Wilcoxon T, Spearman Rho, Kruskal-Wallis H, Friedman ANOVA (ranks)
Data Type: Non-Parametric
  Data Strength: Nominal
  Central Tendency: Mode
  Data Tests: Chi-Square Goodness-of-Fit, Chi-Square Test of Independence
Assumptions:
  Normally distributed variable
  Homogeneity of variance
  Null hypothesis is true
  At least interval strength data
Descriptive Statistics
Computer-Generated Analysis
(X)    (Y)
75     62
76     75
76     76
77     76
78     85
Descriptive Statistics
Computer-Generated Results
                 Column 1      Column 2
Mean               76.400        74.800
Std. Dev.           1.140         8.228
Std. Error           .510         3.680
Count                   5             5
Minimum            75.000        62.000
Maximum            78.000        85.000
# Missing               0             0
Variance            1.300        67.700
Coef. Var.           .015          .110
Range               3.000        23.000
Sum               382.000       374.000
Sum Squares     29190.000     28246.000
Geom. Mean         76.393        74.422
Harm. Mean         76.386        74.027
Skewness             .272         -.518
Kurtosis           -1.044         -.431
Median             76.000        76.000
IQR                 1.500         6.500
Mode               76.000        76.000
10% Tr. Mean       76.400        74.800
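Most rows of this output can be reproduced with Python's standard library. This is a hedged sketch and not part of the original slides: the slides do not say which software produced the table, so the formulas below are assumptions chosen because they match the printed values. The standard deviation and variance appear to use the sample (n - 1) divisor rather than the n divisor of the earlier hand calculation, skewness appears to be computed from population moments, and the Kurtosis row appears to be excess kurtosis (a normal distribution scores 0 rather than 3). IQR and the trimmed mean are left out because their definitions vary between packages.

```python
import math
import statistics

column_1 = [75, 76, 76, 77, 78]
column_2 = [62, 75, 76, 76, 85]


def describe(scores):
    """Return a dictionary of descriptive statistics for a list of scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 divisor)
    # Population central moments, used for skewness and kurtosis.
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    m4 = sum((s - mean) ** 4 for s in scores) / n
    return {
        "mean": mean,
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),
        "variance": statistics.variance(scores),
        "coef_var": sd / mean,
        "range": max(scores) - min(scores),
        "sum": sum(scores),
        "sum_squares": sum(s * s for s in scores),
        "geom_mean": statistics.geometric_mean(scores),
        "harm_mean": statistics.harmonic_mean(scores),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
    }


for name, scores in (("Column 1", column_1), ("Column 2", column_2)):
    print(name)
    for label, value in describe(scores).items():
        print(f"  {label:12s} {value:12.3f}")
```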
Skewness measures how asymmetrical a distribution is. If skewness is positive (negative), the data are skewed to the right (left); the larger the number, the greater the skew. Notice that the Mean, Median, and Mode are almost identical, giving an almost perfect normal distribution.
Kurtosis refers to how peaked the distribution is. When kurtosis = 3, it is a normal-height distribution (Mesokurtic). Kurtosis > 3 is a high-peaked distribution (Leptokurtic). Kurtosis < 3 is a low