2 Slides for Chapters 1-4

8/3/2019 2 Slides for Chapters 1-4


A Statistical Journey: Taming of the Skew

A Tutorial of Chapters 1-4

© 2009 by Dr. Donald F. DeMoulin and Dr. William Allen Kritsonis

These Slides May Not Be Altered or Modified



Topics Of This Lesson

1) Introduction
2) Review of the Basics
3) Research Rules
4) Statistical Symbols
5) Statistical Terms
6) Data Strength
7) Measures of Central Tendency
   1) Mean
   2) Median
   3) Mode
8) Measures of Variability
   1) Range
   2) Variance
   3) Standard Deviation
9) Distribution Types
10) Putting it all together



Introduction

Many wonder why statistical concepts are so hard to grasp.

Have you ever tried to understand a native of Italy, France, Russia, China, or Japan?

It is hard because they speak a foreign language, one that is unfamiliar to your vocabulary.

In this realm, statisticians, most of the time, also speak a foreign language: a language we will call Statonese (stat-n-eaze).



Introduction

For example, have you ever heard:

Four out of five dentists recommend Brand X toothpaste to help fight cavities

Nine out of ten doctors stranded on a desert island recommend Brand Y aspirin for headache pain

Five out of six farmers reported significant increases in yield from using Brand Z fertilizer

These are just a few of the many thousands of examples of statistical applications.



Introduction

But have you ever wondered:

1) Who makes up the four out of five dentists? Are they the four employed by Brand X toothpaste?

2) Who are the nine out of ten doctors, and why are they stranded on a desert island?

3) What constitutes a significant increase in yield?



Introduction

Deciphering what the numbers mean, and understanding statistical procedures and concepts well enough to make a reasonably accurate, independent judgment of reports, statements, and claims, is what statistics is all about.

Our discussions will minimize the guesswork about statistics and maximize the WHEN, the WHY, and the HOW: the basis for statistical applications.



Introduction

The goal of these PowerPoint slides is to bring statistical concepts, applications, and explanations to you in an understandable language, showing how statistical procedures are developed, analyzed, and interpreted.

So, Let Us Begin!



Review of the Basics

A variable is usually represented by the symbol X.

If you have two variables, they are usually associated with the symbols X and Y, although this is not set in stone; using A, B, C, or D is certainly also acceptable.

A subscript after a variable denotes which numbered value of that variable is meant (X1, X2, X3, X4, X5).

For example, let us have two variables, X and Y, that represent two different turtle races.



Review of the Basics

A hypothetical data set may look something like this:

* Time for a turtle to complete a one-inch race
* Number = five turtles for each race
* Variable X = participants in race one
* Variable Y = participants in race two

Participant in Race one    X1   X2   X3   X4   X5
Time (in seconds)          45   41   30   28   59

Participant in Race two    Y1   Y2   Y3   Y4   Y5
Time (in seconds)          25   36   56   51   43



Review of the Basics

Participant in Race one    X1   X2   X3   X4   X5
Time (in seconds)          45   41   30   28   59

Participant in Race two    Y1   Y2   Y3   Y4   Y5
Time (in seconds)          25   36   56   51   43

What is the time logged for participating turtle X4?

28 seconds

Which participating turtle in race two logged a whopping 51 seconds for completing the race?

Turtle Y4
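For readers who want to check these answers with a computer, here is a minimal Python sketch. The helper names value_at and subscript_of are hypothetical, chosen only for illustration; the sketch simply maps the 1-based subscripts used on the slides (X1, X2, ...) onto Python's 0-based list positions.

```python
# Race times in seconds, copied from the turtle table.
# Python lists start at index 0, so subscript X1 is stored at index 0.
race_one = [45, 41, 30, 28, 59]  # X1 .. X5
race_two = [25, 36, 56, 51, 43]  # Y1 .. Y5


def value_at(times, subscript):
    """Return the time for a 1-based subscript (X4 -> subscript 4)."""
    return times[subscript - 1]


def subscript_of(times, value):
    """Return the 1-based subscript of the first participant with this time."""
    return times.index(value) + 1


print("X4 =", value_at(race_one, 4))                   # time logged by turtle X4
print("51 seconds: Y%d" % subscript_of(race_two, 51))  # which Y turtle logged 51 s
```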



Review of the Basics

The Greek symbol Σ (sigma) means sum, or to sum the set of numbers following it.

So, ΣX would simply mean to sum all values of the variable X.

X
X1 = 45
X2 = 41
X3 = 30
X4 = 28
X5 = 59
ΣX = 203
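The summation sign is just a loop that adds one term at a time. A minimal Python sketch using the race-one times from the table:

```python
# ΣX: add up every value of X (race-one times, in seconds).
race_one = [45, 41, 30, 28, 59]

total = 0
for x in race_one:  # X1 + X2 + X3 + X4 + X5, one term at a time
    total += x

print(total)          # 203
print(sum(race_one))  # Python's built-in sum() gives the same answer
```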



Review of the Basics

The notation ΣX² simply sums all the squared values of X.

For example, the hypothetical data set for race one would be:

X      X²
45     45 x 45 = 2,025
41     41 x 41 = 1,681
30     30 x 30 = 900
28     28 x 28 = 784
59     59 x 59 = 3,481
ΣX = 203     ΣX² = 8,871

ΣX = 45 + 41 + 30 + 28 + 59 = 203

ΣX² is equivalent to (45)² + (41)² + (30)² + (28)² + (59)²

ΣX² = 2,025 + 1,681 + 900 + 784 + 3,481 = 8,871
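The same two steps (square each value, then add the squares) can be sketched in Python with the race-one data:

```python
# ΣX²: square each X first, then add the squares.
race_one = [45, 41, 30, 28, 59]

squares = [x * x for x in race_one]  # 45 x 45, 41 x 41, ...
sum_x_squared = sum(squares)         # ΣX²

print(squares)        # [2025, 1681, 900, 784, 3481]
print(sum_x_squared)  # 8871
```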



Review of the Basics

How about (ΣX)²?

This identifier means to sum all the values of X and square the answer.

X
45
41
30
28
59
ΣX = 203, so (ΣX)² = (203)² = 41,209

First, sum the values: 45 + 41 + 30 + 28 + 59 = 203.

Now, square the value 203 and you get 41,209.



Review of the Basics

Please remember that ΣX² does not equal (ΣX)².

ΣX² = 8,871

(ΣX)² = 41,209

X      X²
45     45 x 45 = 2,025
41     41 x 41 = 1,681
30     30 x 30 = 900
28     28 x 28 = 784
59     59 x 59 = 3,481
ΣX = 203     ΣX² = 8,871     (ΣX)² = (203)² = 41,209
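The only difference between the two expressions is the order of operations. A minimal Python sketch with the race-one data makes this concrete:

```python
race_one = [45, 41, 30, 28, 59]

sum_of_squares = sum(x ** 2 for x in race_one)  # ΣX²: square each X, then add
square_of_sum = sum(race_one) ** 2              # (ΣX)²: add the X values, then square

print(sum_of_squares)                   # 8871
print(square_of_sum)                    # 41209
print(sum_of_squares == square_of_sum)  # False: the order of operations matters
```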



Review of the Basics

Finally, we have ΣXY.

This notation signifies a summing of products of corresponding values of X and Y (cross products).

X      Y      XY
45     25     1,125
41     36     1,476
30     56     1,680
28     51     1,428
59     43     2,537
ΣX = 203     ΣY = 211     ΣXY = 8,246

All we do is multiply each X by its corresponding Y and put the product in the column labeled XY.

Fill in the missing spaces and add the column XY. Algebraically, it would be (45)(25) + (41)(36) + (30)(56) + (28)(51) + (59)(43) = 1,125 + 1,476 + 1,680 + 1,428 + 2,537 = 8,246.
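Pairing each X with its corresponding Y is exactly what Python's built-in zip() does. A minimal sketch using the two turtle races:

```python
race_one = [45, 41, 30, 28, 59]  # X values
race_two = [25, 36, 56, 51, 43]  # Y values

# Multiply corresponding values: X1*Y1, X2*Y2, ..., X5*Y5
cross_products = [x * y for x, y in zip(race_one, race_two)]
sum_xy = sum(cross_products)  # ΣXY

print(cross_products)  # [1125, 1476, 1680, 1428, 2537]
print(sum_xy)          # 8246
```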



Summary of the Basics

A variable is usually represented by the symbol X.

The Greek symbol Σ (sigma) means to sum the set of numbers following it.

ΣX would simply mean to sum all values of the variable X.

ΣX² simply sums all the squared values of X.

(ΣX)² means to sum all the values of X and square the answer.

ΣXY signifies a summing of products of corresponding values of X and Y (cross products).

Now we continue with the Rules of Research.



Research Rules

There are two rules in research.

Rule #1 is that credibility and believability are vital components in research.

In essence, the researcher must be credible by conducting his/her research with integrity, honesty, and proper research etiquette.

This leads to the next component of Rule #1: the results (data input, data analysis, and statistical interpretation) must be believable, which involves proper coding and the use of the appropriate statistical procedure for analysis.

Credibility and believability are the two critical aspects of any research, for without them the entire research process becomes an insignificant exercise.



Research Rules

Now, Rule #2 is simply:


First learn Rule #1



Cowboy Proverb

These are critical rules in research because if you do not have credibility as a researcher, the results that are produced lack believability.

And, as the ole Cowboy Proverb goes:



Cowboy Proverb

Don't dig for water under the outhouse.



Research Rules

In other words, don't expect believability and credibility with data that is polluted, tainted, or contaminated.

By following the rules of research, you maximize your credibility as a researcher and the believability of your results.



Statistical Symbols

Σ (sigma) ---- mathematical notation meaning to add up

X ---- variable that can represent any score in the distribution

μ (mu) ---- symbol for the mean of a population

X-bar ---- symbol for the mean of a sample

σ² or S² ---- symbol for the variance of a population

σ or S ---- symbol for the standard deviation of a population

σ̂² (sigma-hat squared) or s² ---- symbol for the variance of a sample

σ̂ (sigma-hat) or s ---- symbol for the standard deviation of a sample

three dots (...) ---- symbolic representation which literally means "and so on"

The caret (^) on top denotes a sample.



Statistical Terms

Descriptive statistics is taking raw data and describing it in a meaningful way (to make sense out of data), generating a profile of that data set with graphs, charts, and other picturesque techniques to help display and interpret the data.

Inferential statistics is taking the results from descriptive procedures of the raw data and subjecting them to a higher order statistical procedure to reasonably infer results to a corresponding population, by following certain rules and assumptions.



Statistical Terms

Parametric statistics are concerned with a parameter of a given population; hence, inferences can be made from the resulting analysis to the population of concern.

Non-parametric statistics, on the other hand, do not conform to any stringent assumptions and therefore have more latitude in procedures; because stringent assumptions are not strictly adhered to, we cannot confidently generalize the results to a population.



Statistical Terms

A variable is defined as a property of an event or item that can be changed or can take on different values.

A dependent variable is called the measured, outcome, or criterion variable.

An independent variable is the variable that is changed, altered, or manipulated by the experimenter during research.



Statistical Terms

A qualitative variable refers to non-numerical qualities, attributes, or items, such as gender, eye color, etc.

A quantitative variable is concerned with numerical qualities, such as the number of items falling into various categories, or with measurable data.



Data Strength

Data are considered nominal strength if the assignment of numbers to objects does no more than identify the objects.

An example of this would be the number on a football jersey, which identifies a player on the field.

Data considered ordinal strength contain the elements of the nominal scale of measurement plus an ordering of objects, thereby implying magnitude: the objects are not only labeled but also ranked according to importance.

Military rank would be an example of ordinal data, as would lining up people according to height, with 1 being the shortest and 10 being the tallest.



Data Strength

Data considered interval strength contain all the elements of the nominal and ordinal scales (labeling and ordering) plus equal intervals between each item.

A thermometer would be an example of interval strength data, since the distance between 20 and 30 degrees is the same as the distance between 50 and 60 degrees; however, 60 degrees is not twice as warm as 30 degrees, because the zero point of the scale is arbitrary (we can have minus degrees in temperature).

Data considered ratio strength contain all the elements of nominal, ordinal, and interval strength (labeling, ordering, equal distance between items) plus the inclusion of an absolute zero.

Height and weight are examples of ratio strength data, since zero means a complete absence of height or weight and there is no negative weight or height; thus 100 pounds is twice as heavy as 50 pounds.
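To see why 60 degrees is not twice as warm as 30 degrees, the sketch below converts both readings to Kelvin, a scale whose zero is absolute zero. It assumes the readings are in Fahrenheit (the slide does not name a scale), and fahrenheit_to_kelvin is a helper written for this illustration.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (0 K is absolute zero)."""
    return (f - 32) * 5 / 9 + 273.15


low, high = 30, 60
naive_ratio = high / low  # 2.0, but the Fahrenheit zero point is arbitrary
true_ratio = fahrenheit_to_kelvin(high) / fahrenheit_to_kelvin(low)

print(naive_ratio)           # 2.0
print(round(true_ratio, 3))  # 1.061, nowhere near "twice as warm"

# Equal intervals still hold: a 10-degree F gap converts to the same
# number of kelvins wherever it sits on the scale.
print(round(fahrenheit_to_kelvin(30) - fahrenheit_to_kelvin(20), 4))
print(round(fahrenheit_to_kelvin(60) - fahrenheit_to_kelvin(50), 4))
```

Ratios are only meaningful on a scale with an absolute zero, which is why this comparison needs Kelvin, while differences work on either scale.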



Measures of Central Tendency

How Data Gathers Around the Center of a Data Set




Mean
  mean absolute deviation
  [Σ(X - X-bar) = 0]: the sum of the deviations from the mean must equal zero

Median
  Exact middle of a data set; data must be ranked from high to low or low to high

Mode
  Most frequently occurring data point

Mean, median, and mode are located at the exact same place on a normal distribution.



Group 1 (X)                        Group 2 (Y)
Score   (X - X-bar)                Score   (Y - Y-bar)
72      72 - 75 = -3               67      67 - 75 = -8
73      73 - 75 = -2               72      72 - 75 = -3
76      76 - 75 = 1                76      76 - 75 = 1
76      76 - 75 = 1                76      76 - 75 = 1
78      78 - 75 = 3                84      84 - 75 = 9

ΣX = 375    [Σ(X - X-bar) = 0]     ΣY = 375    [Σ(Y - Y-bar) = 0]
N = 5                              N = 5
X-bar = 75 (375/5)                 Y-bar = 75 (375/5)
Median = 76                        Median = 76
Mode = 76 (two scores of 76)       Mode = 76 (two scores of 76)
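Python's standard statistics module provides mean, median, and mode functions, so the table above can be checked in a few lines. This sketch also confirms that the deviations from the mean sum to zero for both groups:

```python
from statistics import mean, median, mode

group_1 = [72, 73, 76, 76, 78]  # X scores
group_2 = [67, 72, 76, 76, 84]  # Y scores

for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    center = mean(scores)                       # ΣX / N
    deviations = [x - center for x in scores]   # each score minus the mean
    print(name, "mean:", center, "median:", median(scores), "mode:", mode(scores))
    print("  deviations:", deviations, "sum:", sum(deviations))
```

Both groups share the same mean, median, and mode, even though Group 2's deviations are much larger; the measures of variability that follow capture that difference.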




Measures of Variability

How Data is Dispersed Throughout a Data Set




Range: difference between the highest and lowest numbers in a data set

Variance:

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}$$

Standard Deviation:

$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}}$$

Variance is the square of the standard deviation.

Standard deviation is the square root of the variance.




Group 1                          Group 2
X      X²                        Y      Y²
72     5,184                     67     4,489
73     5,329                     72     5,184
76     5,776                     76     5,776
76     5,776                     76     5,776
78     6,084                     84     7,056
ΣX = 375    ΣX² = 28,149         ΣY = 375    ΣY² = 28,281
N = 5                            N = 5




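Group 1

$$\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n} = \frac{28{,}149 - \frac{(375)^2}{5}}{5} = \frac{28{,}149 - \frac{140{,}625}{5}}{5} = \frac{28{,}149 - 28{,}125}{5} = \frac{24}{5} = 4.8$$

$$\sigma = \sqrt{4.8} = 2.19$$

Group 2

$$\sigma^2 = \frac{\sum Y^2 - \frac{(\sum Y)^2}{n}}{n} = \frac{28{,}281 - \frac{(375)^2}{5}}{5} = \frac{28{,}281 - \frac{140{,}625}{5}}{5} = \frac{28{,}281 - 28{,}125}{5} = \frac{156}{5} = 31.2$$

$$\sigma = \sqrt{31.2} = 5.59$$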
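The computational formula translates directly into code. A minimal Python sketch, where population_variance is a helper written for this illustration that follows the slide's formula (dividing by n, as for a population):

```python
import math


def population_variance(scores):
    """σ² = (ΣX² - (ΣX)²/n) / n, the computational formula from the slide."""
    n = len(scores)
    sum_x = sum(scores)                    # ΣX
    sum_x_sq = sum(x * x for x in scores)  # ΣX²
    return (sum_x_sq - sum_x ** 2 / n) / n


group_1 = [72, 73, 76, 76, 78]
group_2 = [67, 72, 76, 76, 84]

for name, scores in (("Group 1", group_1), ("Group 2", group_2)):
    variance = population_variance(scores)
    sd = math.sqrt(variance)  # standard deviation = square root of the variance
    print(name, "variance:", round(variance, 2), "SD:", round(sd, 2))
```

Python's statistics.pvariance and statistics.pstdev compute the same population quantities and can serve as a cross-check.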



Distribution Types

Normal

Skewed



Distribution Types

Normal Distribution
* Also known as symmetrical, standard normal, and z-normal distributions
* The right half is the mirror image of the left half
* Kurtosis is how a distribution is peaked:
  * Mesokurtic: middle-peaked
  * Leptokurtic: high-peaked
  * Platykurtic: low-peaked

If a distribution is not symmetrical, then it is asymmetrical or skewed, where the right half is not the mirror image of the left half.
* Negatively skewed: the tail points to the negative end of the number line
* Positively skewed: the tail points to the positive end of the number line



Distribution Types

Roughly 68% of all scores fall within one standard deviation above or below the mean.

Roughly 95% of all scores fall within two standard deviations above or below the mean.

Roughly 99.7% of all scores fall within three standard deviations above or below the mean.

The remaining 0.3 percent (a proportion of about .003) are considered outliers that do not conform to the standard normal population distribution, lying more than 3 standard deviations above or below the mean.

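These percentages come from the normal curve itself: the share of a normal distribution within k standard deviations of the mean equals erf(k/√2). A minimal sketch using math.erf from Python's standard library (share_within is a helper named for this illustration):

```python
import math


def share_within(k):
    """Proportion of a normal distribution within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = share_within(k)
    print(f"±{k} SD: {inside:.4f} inside, {1 - inside:.4f} outside")
```

For k = 3 the share outside is about .0027, which rounds to the .003 (0.3 percent) quoted above.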



Relation of Mean and Standard Deviation

[Figure: three normal curves: μ = 0, σ = 1; μ = 50, σ = 10 (axis 20 to 80); μ = 50, σ = 2 (axis 44 to 56)]

The mean and standard deviation help determine the height (kurtosis) of a distribution through

the variability of scores dispersed throughout the data set

Range = 12 (56 - 44 = 12)

Moving more towards leptokurtic shape

Range = 60 (80 - 20 = 60)

Moving more towards platykurtic shape

99.7% of scores fall within 3 standard deviations above or below the mean: between 20 and 80 (μ = 50, σ = 10) and between 44 and 56 (μ = 50, σ = 2) in our examples.



Putting it all Together



Putting It All Together

Data Type        Data Strength       Central Tendency   Data Tests
Parametric       Ratio, Interval     Mean               One Sample z-test, One Sample t-test, Independent t-test, Dependent t-test, ANOVA, Pearson Correlation, Repeated Measures ANOVA
Non Parametric   Ordinal             Median             Mann-Whitney U, Wilcoxon T, Spearman Rho, Kruskal-Wallis H, Friedman ANOVA (ranks)
Non Parametric   Nominal             Mode               Chi-Square Goodness-of-Fit, Chi-Square Test of Independence

Assumptions (parametric tests): normally distributed variable; homogeneity of variance; null hypothesis is true; at least interval strength data



Descriptive Statistics

Computer-Generated Analysis

(X)    (Y)
75     62
76     75
76     76
77     76
78     85



Computer-Generated Results: Descriptive Statistics

Statistic        Column 1 (X)   Column 2 (Y)
Mean                   76.400         74.800
Std. Dev.               1.140          8.228
Std. Error               .510          3.680
Count                       5              5
Minimum                75.000         62.000
Maximum                78.000         85.000
# Missing                   0              0
Variance                1.300         67.700
Coef. Var.               .015           .110
Range                   3.000         23.000
Sum                   382.000        374.000
Sum Squares         29190.000      28246.000
Geom. Mean             76.393         74.422
Harm. Mean             76.386         74.027
Skewness                 .272          -.518
Kurtosis               -1.044          -.431
Median                 76.000         76.000
IQR                     1.500          6.500
Mode                   76.000         76.000
10% Tr. Mean           76.400         74.800

Skewness describes an asymmetrical distribution: if skewness is positive (negative), the data are skewed to the right (left), and the larger the number in absolute value, the greater the skew. Notice that in Column 1 the mean, median, and mode are almost identical (76.4, 76, and 76), giving a nearly symmetrical distribution.

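Kurtosis refers to how peaked the distribution is: when kurtosis = 3, it is a normal-height distribution (mesokurtic); kurtosis > 3 is a high-peaked distribution (leptokurtic); kurtosis < 3 is a low-peaked distribution (platykurtic). Many programs, including the output above, report excess kurtosis (kurtosis minus 3), on which a normal distribution scores 0; Column 1's -1.044 and Column 2's -.431 therefore both indicate low-peaked (platykurtic) shapes.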
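The skewness and kurtosis values in the output can be reproduced by hand. The software does not name its formulas, so this sketch assumes the moment-based versions g1 = m3 / m2^1.5 and g2 = m4 / m2² - 3 (where mk is the k-th central moment, averaged over n); these assumed formulas do reproduce the reported .272, -1.044, -.518, and -.431. The helper describe is hypothetical, written for this illustration. Note that the reported variance divides by n - 1 (a sample variance), unlike the population formula used earlier.

```python
column_1 = [75, 76, 76, 77, 78]  # X values from the computer-generated analysis
column_2 = [62, 75, 76, 76, 85]  # Y values


def describe(scores):
    """Mean, sample variance, moment skewness (g1), and excess kurtosis (g2)."""
    n = len(scores)
    m = sum(scores) / n
    dev = [x - m for x in scores]
    m2 = sum(d ** 2 for d in dev) / n  # second central moment
    m3 = sum(d ** 3 for d in dev) / n  # third central moment
    m4 = sum(d ** 4 for d in dev) / n  # fourth central moment
    return {
        "mean": m,
        "sample variance": sum(d ** 2 for d in dev) / (n - 1),
        "skewness": m3 / m2 ** 1.5,
        "excess kurtosis": m4 / m2 ** 2 - 3,
    }


for name, scores in (("Column 1", column_1), ("Column 2", column_2)):
    summary = describe(scores)
    print(name, {k: round(v, 3) for k, v in summary.items()})
```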
