STATISTICS For Research. 1. Quantitatively describe and summarize data A Researcher Can:

Preview:

Citation preview

STATISTICSSTATISTICSFor ResearchFor Research

1.1. QuantitativelyQuantitatively describe and describe and

summarize datasummarize data

A Researcher Can:A Researcher Can:

A Researcher Can:A Researcher Can:2.2. Draw conclusionsDraw conclusions about large sets about large sets

of data by sampling only of data by sampling only small small portions of themportions of them

3.3. ObjectivelyObjectively measure differencesmeasure differences and and relationships between sets of data.relationships between sets of data.

A Researcher Can:A Researcher Can:

• Samples should be taken at Samples should be taken at randomrandom

• Each measurement has an Each measurement has an equal equal opportunityopportunity of being selected of being selected

• Otherwise, sampling Otherwise, sampling procedures may be procedures may be biasedbiased

Random SamplingRandom Sampling

• A characteristic CANNOT be A characteristic CANNOT be estimated from a single data estimated from a single data pointpoint

• ReplicatedReplicated measurements should measurements should be taken, at least be taken, at least 1010..

Sampling ReplicationSampling Replication

MechanicsMechanics

1.1. Write down a Write down a formulaformula

2.2. Substitute numbers Substitute numbers into the into the

formulaformula

3.3. Solve Solve for the for the

unknownunknown..

The Null HypothesisThe Null Hypothesis• HHoo = There is no difference = There is no difference

between 2 or more sets of databetween 2 or more sets of data– any difference is due to chance any difference is due to chance

alonealone

– Commonly set at a probability of Commonly set at a probability of

95% (P 95% (P .05) .05)

The Alternative HypothesisThe Alternative Hypothesis• HHAA = There = There isis a difference a difference

between 2 or more sets of databetween 2 or more sets of data– the difference is due to more than the difference is due to more than

just chancejust chance

– Commonly set at a probability of Commonly set at a probability of

95% (P 95% (P .05) .05)

AveragesAverages• Population Average = mean ( Population Average = mean ( x x ))

• a a PopulationPopulation mean mean = ( = ( ))– take the mean of a take the mean of a random samplerandom sample

from the population ( from the population ( n n ))

Population MeansPopulation Means

To find the population mean ( To find the population mean ( ),),• add up (add up (Σ) the values ) the values

((x x = grasshopper mass, tree = grasshopper mass, tree height)height)

• divide by the number of values (divide by the number of values (nn):):

= = xx — —

nn

Measures of VariabilityMeasures of Variability• Calculating a mean gives only a Calculating a mean gives only a partialpartial

description of a set of data description of a set of data

– Set A = 1, 6, 11, 16, 21Set A = 1, 6, 11, 16, 21– Set B = 10, 11, 11, 11, 12Set B = 10, 11, 11, 11, 12

• Means for A & B Means for A & B ????????????

• Need a measure of how variable Need a measure of how variable the data are.the data are.

RangeRange• DifferenceDifference between the largest and between the largest and

smallest valuessmallest values

– Set ASet A = 1, 6, 11, 16, 21 = 1, 6, 11, 16, 21

• Range = Range = ??????

– Set BSet B = 10, 11, 11, 11, 12 = 10, 11, 11, 11, 12

• Range = Range = ??????

Standard Standard DeviationDeviation

Standard DeviationStandard Deviation• A measure of the deviation of A measure of the deviation of

data data from their mean.from their mean.

The FormulaThe Formula

__________SDSD = = NN ∑XX2 - 2 - ((∑XX))22

________ ________

NN ( (NN-1)-1)

SD SymbolsSD SymbolsSDSD = Standard Dev = Standard Dev

= Square Root= Square Root

∑XX2 2 = Sum of x= Sum of x22’’dd

∑((XX))2 2 = Sum of x= Sum of x’’s, then squareds, then squared

NN = # of samples = # of samples

The FormulaThe Formula

__________SDSD = = NN ∑XX2 - 2 - ((∑XX))22

________ ________

NN ( (NN-1)-1)

XX XX22

297 297 88,209 88,209301 301 90,601 90,601306 306 93,636 93,636312 312 97,344 97,344314 314 98,596 98,596317 317 100,489 100,489325 325 105,625 105,625329 329 108,241 108,241334 334 111,556 111,556350 350 122,500122,500XX = 3,185 = 3,185 XX22 = 1,016,797 = 1,016,797

You can use your You can use your calculatorcalculator to find SD! to find SD!

Once YouOnce You’’ve got the Idea:ve got the Idea:

The Normal The Normal CurveCurve

The Normal The Normal CurveCurve

SD & the Bell CurveSD & the Bell Curve

% Increments% Increments

Skewed CurvesSkewed Curves

medianmedian

Critical ValuesCritical Values

Standard Deviations Standard Deviations 2 2 SD above SD above

or below the mean or below the mean ==

““due todue to more than chance alone.” more than chance alone.”

THIS MEANSTHIS MEANS: The data lies : The data lies outsideoutside the the 95%95% confidence limits for confidence limits for probability. probability. Your research shows there is Your research shows there is something significant going on...something significant going on...

Chi-SquareChi-Square

22

Chi-Square Test Chi-Square Test RequirementsRequirements

• QuantiQuantitative datatative data• Simple Simple randomrandom sample sample• One or more categoriesOne or more categories• Data in frequency (Data in frequency (%%) form) form

• Independent observationsIndependent observations

• AllAll observations must be used observations must be used

• Adequate sample size (Adequate sample size (10)10)

ExampleExample

Chi-Square SymbolsChi-Square Symbols 22 = = ΣΣ (O - E)(O - E) 22

EE

OO = = Observed FrequencyObserved Frequency

EE = = Expected FrequencyExpected Frequency

ΣΣ = = sum ofsum of

dfdf = = degrees of freedomdegrees of freedom ( (n n -1) -1)

22 = = Chi SquareChi Square

Chi-Square WorksheetChi-Square Worksheet

Chi-Square AnalysisChi-Square AnalysisTable value for Chi Square = Table value for Chi Square = 9.499.49 44 dfdf P = .05P = .05 level of significancelevel of significance

Is there a significant difference in car preference????

SD & the Bell CurveSD & the Bell Curve

T-TestsT-Tests

T-TestsT-TestsFor populations that For populations that do do follow a follow a

normalnormal distribution distribution

T-TestsT-Tests• To draw conclusions about To draw conclusions about

similarities or differences between similarities or differences between population means population means ( ( ))

• Is average plant biomass the same in Is average plant biomass the same in – two different geographical two different geographical areas areas ??????

– two different two different seasons seasons ??? ???

T-TestsT-Tests• To be COMPLETELYTo be COMPLETELY confident you confident you

would have to measure would have to measure allall plant plant biomass in each biomass in each area.area.– Is this

PRACTICAL?????

Instead:Instead:• Take one sample from Take one sample from eacheach

population.population.

• InferInfer from the sample means and from the sample means and standard deviation (SD) whether the standard deviation (SD) whether the populations have the populations have the samesame or or differentdifferent means. means.

AnalysisAnalysis• SMALLSMALL tt values = values = high high probability probability

that the two population means are that the two population means are the the samesame

• LARGE LARGE tt values = values = low low probability probability (the means are different)(the means are different)

AnalysisAnalysis TTcalculatedcalculated > > ttcritical critical = reject = reject HHoo

ttcriticalcritical ttcriticalcritical

We will be using computer We will be using computer analysis to perform the analysis to perform the

tt-test -test

SimpsonSimpson’’s s Diversity IndexDiversity Index

Nonparametric TestingNonparametric Testing• For populations that For populations that do NOTdo NOT follow follow

a a normalnormal distribution distribution

– includes includes most wild populationsmost wild populations

Answers the QuestionAnswers the Question• If 2 indiv are taken at RANDOM from If 2 indiv are taken at RANDOM from

a community, what is the probability a community, what is the probability that they will be the that they will be the SAME SAME speciesspecies????????

The FormulaThe Formula

DD = 1 - = 1 - nni i (n(ni i - 1)- 1) ————— —————

N (N-1)N (N-1)

ExampleExample

ExampleExample

D = D = 1- 50(49)+25(24)+10(9)1- 50(49)+25(24)+10(9)

——————————————————————

85(84)85(84)

DD = 0.56(medium diversity)

AnalysisAnalysis• Closer to Closer to 1.01.0 = =

– more more HomoHomogeneousgeneous community community (low diversity)(low diversity)

• Farther away from Farther away from 1.01.0 = = – more more HeteroHeterogeneous geneous community community (high (high

diversity)diversity)

• You can You can calculate by hand calculate by hand to to find find ““DD””

• School Stats package School Stats package MAYMAY calculate it.calculate it.

Designed by Anne F. Maben

Former AP Science Coach, LACOE

for the Los Angeles County Science FairLos Angeles County Science Fair

© 2013 All rights reserved

This presentation is for viewing only and may not be published in any form

Recommended