2 Slides for Chapters 1-4


  • 8/3/2019 2 Slides for Chapters 1-4


    A Statistical Journey: Taming of the Skew

    A Tutorial of Chapters 1-4

    © 2009 by Dr. Donald F. DeMoulin and Dr. William Allen Kritsonis

    These Slides May Not Be Altered or Modified


    Topics of This Lesson

    1) Introduction
    2) Review of the Basics
    3) Research Rules
    4) Statistical Symbols
    5) Statistical Terms
    6) Data Strength
    7) Measures of Central Tendency
        1) Mean
        2) Median
        3) Mode
    8) Measures of Variability
        1) Range
        2) Variance
        3) Standard Deviation
    9) Distribution Types
    10) Putting It All Together


    Introduction

    Many wonder why statistical concepts are so hard to grasp.

    Have you ever tried to understand a native speaker from Italy, France, Russia, China, or Japan?

    It is hard because they speak a foreign language, one that is unfamiliar to your vocabulary.

    In this realm, statisticians, most of the time, speak a foreign language: a language we will call Statonese (stat-n-eaze).


    Introduction

    For example, have you ever heard:

    "Four out of five dentists recommend Brand X toothpaste to help fight cavities."

    "Nine out of ten doctors stranded on a desert island recommend Brand Y aspirin for headache pain."

    "Five out of six farmers reported significant increases in yield from using Brand Z fertilizer."

    These are just a few of the many thousands of examples of statistical applications.


    Introduction

    But have you ever thought:

    1) Who makes up the four out of five dentists? Are they the four employed by Brand X toothpaste?

    2) Who are the nine out of ten doctors, and why are they stranded on a desert island?

    3) What constitutes a significant increase in yield?


    Introduction

    Deciphering what the numbers are, and understanding statistical procedures and concepts well enough to make a reasonably accurate, independent judgment of reports, statements, and claims, is what statistics is all about.

    Our discussions will minimize the guesswork about statistics and maximize the WHEN, the WHY, and the HOW: the basis for statistical applications.


    Introduction

    The goal of these PowerPoint slides is to bring statistical concepts, applications, and explanations to you in a language that helps you understand how statistical procedures are developed, analyzed, and interpreted.

    So, Let Us Begin!


    Review of the Basics

    A variable is usually represented by the symbol X.

    If you have two variables, they are usually associated with the symbols X and Y, although this is not set in stone; using A, B, C, or D is also acceptable.

    A subscript after the variable denotes a numbered value of that variable (X1, X2, X3, X4, X5).

    For example, let us have two variables, X and Y, that represent two different turtle races.


    Review of the Basics

    A hypothetical data set may look something like this:

    * Time for a turtle to complete a one-inch race
    * Number = five turtles for each race
    * Variable X = participants in race one
    * Variable Y = participants in race two

    Participant in race one    X1    X2    X3    X4    X5
    Time (in seconds)          45    41    30    28    59

    Participant in race two    Y1    Y2    Y3    Y4    Y5
    Time (in seconds)          25    36    56    51    43


    Review of the Basics

    Participant in race one    X1    X2    X3    X4    X5
    Time (in seconds)          45    41    30    28    59

    Participant in race two    Y1    Y2    Y3    Y4    Y5
    Time (in seconds)          25    36    56    51    43

    What is the time logged for participating turtle X4?

    28 seconds

    Which participating turtle in race two logged a whopping 51 seconds to complete the race?

    Turtle Y4
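As a small illustration of subscript notation, the Python sketch below stores the hypothetical race times in two lists. The names `race_one`, `race_two`, `score`, and `subscript_of` are illustrative choices, not part of the slides; because Python lists start at position 0, the sketch converts the 1-based subscripts (X1 to X5) to list positions.

```python
# Hypothetical turtle race times (seconds) from the slides.
race_one = [45, 41, 30, 28, 59]  # X1..X5
race_two = [25, 36, 56, 51, 43]  # Y1..Y5

def score(values, subscript):
    """Return the value for a 1-based subscript, e.g. subscript 4 for X4."""
    return values[subscript - 1]

def subscript_of(values, target):
    """Return the 1-based subscript of the first value equal to target."""
    return values.index(target) + 1

x4 = score(race_one, 4)
y_for_51 = subscript_of(race_two, 51)
print(f"X4 = {x4} seconds")                      # X4 = 28 seconds
print(f"51 seconds was logged by Y{y_for_51}")   # 51 seconds was logged by Y4
```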


    Review of the Basics

    The Greek symbol $\Sigma$ (sigma) means sum, or to sum the set of numbers following it.

    So, $\sum X$ would simply mean to sum all values of the variable X:

    X1 = 45
    X2 = 41
    X3 = 30
    X4 = 28
    X5 = 59
    $\sum X = 203$
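The summation sign maps directly onto a loop that keeps a running total. A minimal Python sketch, assuming the race-one times above; the helper name `sigma` is hypothetical, and Python's built-in `sum` does the same job.

```python
# Hypothetical race-one times (seconds) from the slide: X1..X5.
race_one = [45, 41, 30, 28, 59]

def sigma(values):
    """Add up every value that follows the summation sign."""
    total = 0
    for value in values:
        total += value
    return total

sum_x = sigma(race_one)  # corresponds to ΣX
print(sum_x)  # 203
```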


    Review of the Basics

    The notation $\sum X^2$ simply sums all the squared values of X.

    For example, for the hypothetical data set for race one:

    X     $X^2$
    45    45 × 45 = 2,025
    41    41 × 41 = 1,681
    30    30 × 30 = 900
    28    28 × 28 = 784
    59    59 × 59 = 3,481
    $\sum X = 203$    $\sum X^2 = 8{,}871$

    $\sum X = 45 + 41 + 30 + 28 + 59 = 203$

    $\sum X^2$ is equivalent to $(45)^2 + (41)^2 + (30)^2 + (28)^2 + (59)^2$

    $\sum X^2 = 2{,}025 + 1{,}681 + 900 + 784 + 3{,}481 = 8{,}871$


    Review of the Basics

    How about $(\sum X)^2$?

    This identifier means to sum all the values of X and square the answer:

    X
    45
    41
    30
    28
    59
    $\sum X = 203$, so $(\sum X)^2 = (203)^2 = 41{,}209$

    First, $\sum X = 45 + 41 + 30 + 28 + 59 = 203$.

    Now, square the value 203 and you get 41,209.


    Review of the Basics

    Please remember that $\sum X^2$ does not equal $(\sum X)^2$:

    $\sum X^2 = 8{,}871$

    $(\sum X)^2 = 41{,}209$

    X     $X^2$
    45    45 × 45 = 2,025
    41    41 × 41 = 1,681
    30    30 × 30 = 900
    28    28 × 28 = 784
    59    59 × 59 = 3,481
    $\sum X = 203$, $(203)^2 = 41{,}209$    $\sum X^2 = 8{,}871$
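The difference between $\sum X^2$ and $(\sum X)^2$ is purely the order of operations. This Python sketch, using the race-one times from the slides, computes both so the two results can be compared side by side:

```python
race_one = [45, 41, 30, 28, 59]  # X1..X5

squared_column = [x ** 2 for x in race_one]  # the X² column
sum_of_squares = sum(squared_column)         # ΣX²: square each value, then add
square_of_sum = sum(race_one) ** 2           # (ΣX)²: add the values, then square

print(squared_column)                 # [2025, 1681, 900, 784, 3481]
print(sum_of_squares, square_of_sum)  # 8871 41209
```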


    Review of the Basics

    Finally, we have $\sum XY$.

    This notation signifies summing the products of corresponding values of X and Y (cross products).

    X     Y     XY
    45    25    1,125
    41    36    1,476
    30    56    1,680
    28    51    1,428
    59    43    2,537
    $\sum X = 203$    $\sum Y = 211$    $\sum XY = 8{,}246$

    All we do is multiply each value of X by its corresponding value of Y and put the product in the column labeled XY.

    Fill in the missing spaces and add the XY column. Algebraically, it would be $(45)(25) + (41)(36) + (30)(56) + (28)(51) + (59)(43) = 1{,}125 + 1{,}476 + 1{,}680 + 1{,}428 + 2{,}537 = 8{,}246$.
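The cross-product sum pairs each X with the Y in the same position. A minimal Python sketch, assuming the two race-time lists from the slides are aligned by subscript (X1 with Y1, and so on):

```python
x = [45, 41, 30, 28, 59]  # race one: X1..X5
y = [25, 36, 56, 51, 43]  # race two: Y1..Y5

# zip pairs X1 with Y1, X2 with Y2, ... to build the XY column.
cross_products = [xi * yi for xi, yi in zip(x, y)]
sum_xy = sum(cross_products)  # ΣXY

print(cross_products)  # [1125, 1476, 1680, 1428, 2537]
print(sum(x), sum(y), sum_xy)  # 203 211 8246
```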


    Summary of the Basics

    A variable is usually represented by the symbol X.

    The Greek symbol $\Sigma$ means to sum the set of numbers following it.

    $\sum X$ simply means to sum all values of the variable X.

    $\sum X^2$ simply sums all the squared values of X.

    $(\sum X)^2$ means to sum all the values of X and square the answer.

    $\sum XY$ signifies summing the products of corresponding values of X and Y (cross products).

    Now we continue with the Rules of Research.


    Research Rules

    There are two rules in research.

    Rule #1 is that credibility and believability are vital components of research.

    In essence, the researcher must be credible by conducting his/her research with integrity, with honesty, and within proper research etiquette.

    This leads to the next component of Rule #1: the results (data input, data analysis, and statistical interpretation) must be believable, which involves proper coding and the use of the appropriate statistical procedure for analysis.

    Credibility and believability are the two critical aspects of any research; without them, the entire research process becomes an insignificant exercise.


    Research Rules

    Now, Rule #2 is simply:


    First, learn Rule #1


    Cowboy Proverb

    These are critical rules in research because if you do not have credibility as a researcher, the results you produce lack believability.

    And, as the ol' Cowboy Proverb goes:


    Cowboy Proverb

    Don't dig for water

    under the outhouse.


    Research Rules

    In other words, don't expect believability and credibility from data that are polluted, tainted, or contaminated.

    By following the rules of research, you maximize your credibility as a researcher and the believability of your results.


    Statistical Symbols

    $\Sigma$ (sigma): mathematical notation meaning to add up
    X: variable that can represent any score in the distribution
    $\mu$ (mu): symbol for the mean of a population
    $\bar{X}$ (X-bar): symbol for the mean of a sample
    $\sigma^2$ or $S^2$: symbol for the variance of a population
    $\sigma$ or $S$: symbol for the standard deviation of a population
    $\hat{\sigma}^2$ or $s^2$: symbol for the variance of a sample
    $\hat{\sigma}$ or $s$: symbol for the standard deviation of a sample
    ... (three dots): symbolic representation that literally means "and so on"

    The caret (^) on top denotes a sample.


    Statistical Terms

    Descriptive statistics is taking raw data and describing it in a meaningful way (making sense out of data): generating a profile of that data set with graphs, charts, and other pictorial techniques to help display and interpret the data.

    Inferential statistics is taking the results from descriptive procedures on the raw data and subjecting them to a higher-order statistical procedure to reasonably infer results to a corresponding population by following certain rules and assumptions.


    Statistical Terms

    Parametric statistics are concerned with a parameter of a given population; hence, inferences can be made from the resulting analysis to the population of concern.

    Non-parametric statistics, on the other hand, do not conform to any stringent assumptions and therefore have more latitude in procedures. Because stringent assumptions are not strictly adhered to, we cannot confidently generalize the results to a population.


    Statistical Terms

    A variable is defined as a property of an event or item that can change or take on different values.

    A dependent variable is also called the measured, outcome, or criterion variable.

    An independent variable is the variable that is changed, altered, or manipulated by the experimenter during research.


    Statistical Terms

    A qualitative variable refers to non-numerical qualities, attributes, or items, such as gender, eye color, etc.

    A quantitative variable is concerned with numerical qualities, such as the number of items falling into various categories, or with measurable data.


    Data Strength

    Data are considered nominal strength if the assignment of numbers to objects does no more than identify the objects.

    An example would be the number on a football jersey, which identifies a player on the field.

    Data considered ordinal strength contain the elements of the nominal scale of measurement plus an ordering of objects, thereby implying magnitude: the objects are labeled, but they are also ranked according to importance.

    Military rank would be an example of ordinal data, as would lining people up according to height, with 1 being the shortest and 10 being the tallest.


    Data Strength

    Data considered interval strength contain all the elements of nominal and ordinal scales (labeling and ordering) plus equal intervals between each item.

    A thermometer would be an example of interval-strength data, since the distance between 20 and 30 degrees is the same as the distance between 50 and 60 degrees; however, 60 degrees is not twice as warm as 30 degrees, because zero degrees is not an absolute zero (temperatures can go below zero).

    Data considered ratio strength contain all elements of nominal, ordinal, and interval strength (labeling, ordering, equal distance between items) plus an absolute zero.

    Height and weight are examples of ratio-strength data, since there is no negative weight or height.
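One way to see why temperature is interval rather than ratio data is to convert the same readings to another scale. The sketch below assumes the slide's temperatures are in degrees Fahrenheit (the slide does not say) and uses kilograms and pounds as a ratio-scale comparison; the helper names are hypothetical. Differences stay equal after conversion, but the 2-to-1 ratio survives only for the ratio-scale measure.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def kg_to_lb(kg):
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462

# Interval property: equal 10-degree gaps stay equal on the Celsius scale.
gap_20_30 = fahrenheit_to_celsius(30) - fahrenheit_to_celsius(20)
gap_50_60 = fahrenheit_to_celsius(60) - fahrenheit_to_celsius(50)
print(round(gap_20_30, 3), round(gap_50_60, 3))  # 5.556 5.556

# No absolute zero: "60 is twice 30" in Fahrenheit does not survive conversion.
print(60 / 30)  # 2.0
print(round(fahrenheit_to_celsius(60) / fahrenheit_to_celsius(30), 3))  # -14.0

# Ratio data keep their ratios under a change of units.
print(round(kg_to_lb(60) / kg_to_lb(30), 3))  # 2.0
```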


    Measures of Central Tendency

    How Data Gathers Around the Center of a Data Set


    Measures of Central Tendency

    Mean: the arithmetic average. The sum of the deviations from the mean must equal zero, $\sum (X - \bar{X}) = 0$; this is why the mean absolute deviation averages the absolute values of the deviations.

    Median: the exact middle of a data set; the data must be ranked from high to low or from low to high.

    Mode: the most frequently occurring data point.

    Mean, median, and mode are located at exactly the same place on a normal distribution.


    Measures of Central Tendency

    Group 1 (X)                          Group 2 (Y)
    Score    $X - \bar{X}$               Score    $Y - \bar{Y}$
    72       72 - 75 = -3                67       67 - 75 = -8
    73       73 - 75 = -2                72       72 - 75 = -3
    76       76 - 75 = 1                 76       76 - 75 = 1
    76       76 - 75 = 1                 76       76 - 75 = 1
    78       78 - 75 = 3                 84       84 - 75 = 9
    $\sum X = 375$                       $\sum Y = 375$
    $\sum (X - \bar{X}) = 0$             $\sum (Y - \bar{Y}) = 0$
    N = 5                                N = 5
    $\bar{X} = 75$ (375/5)               $\bar{Y} = 75$ (375/5)
    Median = 76                          Median = 76
    Mode = 76 (two scores of 76)         Mode = 76 (two scores of 76)
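The three measures can be reproduced with Python's standard `statistics` module. This sketch recomputes the mean, median, and mode for both groups and confirms that the deviations from the mean add up to zero; the dictionary layout is just one convenient choice, not something the slides prescribe.

```python
from statistics import mean, median, mode

groups = {
    "Group 1": [72, 73, 76, 76, 78],  # X scores
    "Group 2": [67, 72, 76, 76, 84],  # Y scores
}

results = {}
for name, scores in groups.items():
    center = mean(scores)
    deviations = [score - center for score in scores]
    results[name] = {
        "mean": center,
        "median": median(scores),  # middle value after sorting
        "mode": mode(scores),      # most frequently occurring value
        "deviations": deviations,
        # Deviations from the mean always cancel out (up to float rounding).
        "sum_of_deviations": sum(deviations),
    }
    print(name, results[name])
```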


    Measures of Variability

    How Data is Dispersed Throughout a Data Set


    Measures of Variability

    Range: the difference between the highest and lowest numbers in a data set

    Variance: $\sigma^2 = \dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}$

    Standard deviation: $\sigma = \sqrt{\dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n}}$

    Variance is the square of the standard deviation.

    Standard deviation is the square root of the variance.
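The computational formula translates almost line for line into code. A minimal Python sketch, assuming the population form shown on the slide (dividing by n); the function names are hypothetical.

```python
import math

def population_variance(values):
    """Computational formula: (ΣX² - (ΣX)²/n) / n."""
    n = len(values)
    sum_x = sum(values)                    # ΣX
    sum_x2 = sum(v * v for v in values)    # ΣX²
    return (sum_x2 - sum_x ** 2 / n) / n

def population_sd(values):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(population_variance(values))

def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

group_1 = [72, 73, 76, 76, 78]
print(population_variance(group_1))      # 4.8
print(round(population_sd(group_1), 2))  # 2.19
print(value_range(group_1))              # 6
```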


    Measures of Variability

    Group 1                        Group 2
    X      $X^2$                   Y      $Y^2$
    72     5,184                   67     4,489
    73     5,329                   72     5,184
    76     5,776                   76     5,776
    76     5,776                   76     5,776
    78     6,084                   84     7,056
    $\sum X = 375$                 $\sum Y = 375$
    $\sum X^2 = 28{,}149$          $\sum Y^2 = 28{,}281$
    N = 5                          N = 5


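    Measures of Variability

    Group 1

    $\sigma^2 = \dfrac{\sum X^2 - \frac{(\sum X)^2}{n}}{n} = \dfrac{28{,}149 - \frac{(375)^2}{5}}{5} = \dfrac{28{,}149 - \frac{140{,}625}{5}}{5} = \dfrac{28{,}149 - 28{,}125}{5} = \dfrac{24}{5} = 4.8$

    $\sigma = \sqrt{4.8} \approx 2.19$

    Group 2

    $\sigma^2 = \dfrac{\sum Y^2 - \frac{(\sum Y)^2}{n}}{n} = \dfrac{28{,}281 - \frac{(375)^2}{5}}{5} = \dfrac{28{,}281 - \frac{140{,}625}{5}}{5} = \dfrac{28{,}281 - 28{,}125}{5} = \dfrac{156}{5} = 31.2$

    $\sigma = \sqrt{31.2} \approx 5.59$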
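As a cross-check on the hand calculations, Python's `statistics` module provides `pvariance` and `pstdev` (population formulas, dividing by n) as well as `variance` (sample formula, dividing by n - 1). The sketch below applies them to both groups; the sample versions give larger values, which matters when reading software output that uses the n - 1 convention.

```python
import statistics

groups = {
    "Group 1": [72, 73, 76, 76, 78],
    "Group 2": [67, 72, 76, 76, 84],
}

checks = {}
for name, scores in groups.items():
    checks[name] = {
        "pvariance": statistics.pvariance(scores),  # divides by n, as on the slide
        "pstdev": statistics.pstdev(scores),        # square root of pvariance
        "variance": statistics.variance(scores),    # divides by n - 1 (sample)
    }
    print(name,
          checks[name]["pvariance"],
          round(checks[name]["pstdev"], 2),
          checks[name]["variance"])
```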


    Distribution Types

    Normal

    Skewed


    Distribution Types

    Normal Distribution: also known as the symmetrical, standard normal, and z-normal distribution. The right half is the mirror image of the left half.

    Kurtosis is how peaked a distribution is:
    Mesokurtic: middle-peaked
    Leptokurtic: high-peaked
    Platykurtic: low-peaked

    If a distribution is not symmetrical, then it is asymmetrical, or skewed, where the right half is not the mirror image of the left half.

    Negatively skewed: the tail points to the negative end of the number line.
    Positively skewed: the tail points to the positive end of the number line.


    Distribution Types

    Roughly 68% of all scores fall within one standard deviation above or below the mean.

    Roughly 95% of all scores fall within two standard deviations above or below the mean.

    Roughly 99.7% of all scores fall within three standard deviations above or below the mean.

    The remaining 0.3 percent, lying more than three standard deviations above or below the mean, are considered outliers that do not conform to the standard normal population distribution.
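The 68%, 95%, and 99.7% figures can be checked from the normal distribution itself: the share of scores within k standard deviations of the mean equals erf(k / √2). A short Python sketch using `math.erf`; the helper name `proportion_within` is hypothetical.

```python
import math

def proportion_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%

print(f"beyond 3 SD: {1 - proportion_within(3):.2%}")  # beyond 3 SD: 0.27%
```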


    Relation of Mean and Standard Deviation

    [Figure: normal curves labeled $\mu = 0$, $\sigma = 1$; $\mu = 50$, $\sigma = 10$ (axis from 20 to 80); and $\mu = 50$, $\sigma = 2$ (axis from 44 to 56).]

    The mean and standard deviation help determine the height (kurtosis) of a distribution through the variability of scores dispersed throughout the data set.

    $\mu = 50$, $\sigma = 10$: range = 60 (80 - 20 = 60), moving more toward a platykurtic shape

    $\mu = 50$, $\sigma = 2$: range = 12 (56 - 44 = 12), moving more toward a leptokurtic shape

    99.7% of scores fall within 3 standard deviations above or below the mean: between 20 and 80, and between 44 and 56, in our examples.
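The link between the standard deviation and the height of the curve can be made concrete: the peak of a normal density is 1 / (σ√(2π)), so a smaller σ gives a taller, narrower curve. This Python sketch, assuming the two example curves with a mean of 50, prints the ±3σ limits, the resulting range, and the peak height for each; the helper name is hypothetical.

```python
import math

def normal_peak_height(sigma):
    """Height of the normal density at its mean: 1 / (sigma * sqrt(2 * pi))."""
    return 1 / (sigma * math.sqrt(2 * math.pi))

mu = 50
limits = {}
for sigma in (10, 2):
    low, high = mu - 3 * sigma, mu + 3 * sigma  # about 99.7% of scores fall here
    limits[sigma] = (low, high)
    print(f"sigma = {sigma}: {low} to {high}, range {high - low}, "
          f"peak height {normal_peak_height(sigma):.4f}")
# sigma = 10: 20 to 80, range 60, peak height 0.0399
# sigma = 2: 44 to 56, range 12, peak height 0.1995
```

Cutting σ from 10 to 2 makes the peak five times taller, which is the visual shift toward the high-peaked shape described above.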


    Putting it all Together


    Putting It All Together

    Parametric; ratio or interval data strength; mean:
    One-sample z-test, one-sample t-test, independent t-test, dependent t-test, ANOVA, Pearson correlation, repeated-measures ANOVA
    Assumptions: normally distributed variable; homogeneity of variance; null hypothesis is true; at least interval-strength data

    Non-parametric; ordinal data strength; median:
    Mann-Whitney U, Wilcoxon T, Spearman rho, Kruskal-Wallis H, Friedman ANOVA (ranks)

    Non-parametric; nominal data strength; mode:
    Chi-square goodness-of-fit, chi-square test of independence


    Descriptive Statistics

    Computer-Generated Analysis

    X     Y
    75    62
    76    75
    76    76
    77    76
    78    85


    Descriptive Statistics

    Computer-Generated Results

    Statistic        Column 1     Column 2
    Mean             76.400       74.800
    Std. Dev.        1.140        8.228
    Std. Error       .510         3.680
    Count            5            5
    Minimum          75.000       62.000
    Maximum          78.000       85.000
    # Missing        0            0
    Variance         1.300        67.700
    Coef. Var.       .015         .110
    Range            3.000        23.000
    Sum              382.000      374.000
    Sum Squares      29190.000    28246.000
    Geom. Mean       76.393       74.422
    Harm. Mean       76.386       74.027
    Skewness         .272         -.518
    Kurtosis         -1.044       -.431
    Median           76.000       76.000
    IQR              1.500        6.500
    Mode             76.000       76.000
    10% Tr. Mean     76.400       74.800
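Several of the values above can be reproduced with Python's `statistics` module (Python 3.8 or later, for `geometric_mean`). The output appears to use the sample (n - 1) convention: for Column 1 the squared deviations from 76.4 add up to 5.2, and 5.2 / 4 = 1.300. The sketch below recomputes a subset of the rows under that assumption; skewness, kurtosis, IQR, and the trimmed mean are left out because software packages compute them in different ways. The function name `describe` is hypothetical.

```python
import math
import statistics

# Data from the "Computer-Generated Analysis" slide.
column_1 = [75, 76, 76, 77, 78]  # X
column_2 = [62, 75, 76, 76, 85]  # Y

def describe(values):
    """Recompute a subset of the summary rows (sample, n - 1, convention)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return {
        "mean": mean,
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),
        "count": n,
        "minimum": min(values),
        "maximum": max(values),
        "variance": statistics.variance(values),
        "coef_var": sd / mean,
        "range": max(values) - min(values),
        "sum": sum(values),
        "sum_squares": sum(v * v for v in values),
        "geom_mean": statistics.geometric_mean(values),
        "harm_mean": statistics.harmonic_mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }

for label, values in (("Column 1", column_1), ("Column 2", column_2)):
    summary = describe(values)
    print(label, {key: round(value, 3) for key, value in summary.items()})
```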

    Skewness measures how asymmetrical a distribution is: if skewness is positive (negative), the data are skewed to the right (left), and the larger the absolute value, the greater the skew. Notice that the mean, median, and mode are almost identical, giving an almost perfect normal distribution.

    Kurtosis refers to how peaked the distribution is: when kurtosis = 3, it is a normal-height distribution (mesokurtic); kurtosis > 3 is a high-peaked distribution (leptokurtic); kurtosis < 3 is a low