STA 213 Statistical Methods

A brief introduction to some of the basics of biostatistics

Outline

1. What is biostatistics?

2. Units of analysis

3. What is a variable?

4. Levels of Measurement

1. What is biostatistics?

According to Zar:

Statistics is the analysis and interpretation of data with a view toward objective evaluation of the reliability of the conclusions based on the data.

2. Units of Analysis

One of the most basic concepts in scientific research, and one we must know about before getting into anything else, is the “unit of analysis”, i.e., the “what” or “who” being studied, or the “level” at which the data are collected.

Definitions

Neuman (glossary): “The kind of empirical case or unit that a researcher observes, measures, and analyses in a study”

Babbie: “Those units that we initially describe for the ultimate purpose of aggregating their characteristics in order to describe some larger group”

Units of Analysis

Each unit of analysis is equivalent to a case, i.e., all the cases in a study are the individual units of analysis.

We make observations about (usually) many aspects or attributes of these cases.

For example, the individuals of a species are the units of analysis; each has various attributes, which can be aggregated to make statements about the species as a whole.

Units of Analysis

It is important to note that the species is NOT the unit of analysis.

Also, once we have aggregated upwards, we cannot disaggregate downwards (except in probabilistic terms) unless we have retained the original observations.

Univariate analysis

There are basically three types of analysis:
• Univariate
• Bivariate
• Multivariate

Univariate analysis

Univariate analysis represents the examination of just one variable (X1), and is concerned solely with description (but may include aggregation as a first step in identifying and understanding patterns).

Descriptions can be in:
• numerical form: averages (means, modes, medians) or frequency distributions.
• graphic form: visual representation of the numerical data.

Mean: the arithmetic average of the data points (their sum divided by their number).

Median: the point at which half of the data have a higher value and half a lower value.

Mode: the most frequent data value.
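
As a rough illustration of these three averages, the sketch below uses Python's standard statistics module on a small made-up data set (the clutch sizes are hypothetical, not course data). The single large value pulls the mean above the median, while the mode is unaffected.

```python
import statistics

# Hypothetical clutch sizes (number of eggs) for nine nests; illustrative only.
clutch_sizes = [3, 4, 4, 5, 4, 6, 3, 4, 12]

mean_value = statistics.mean(clutch_sizes)      # sum of values / number of values
median_value = statistics.median(clutch_sizes)  # middle value of the sorted data
mode_value = statistics.mode(clutch_sizes)      # most frequent value

print(mean_value, median_value, mode_value)  # 5 4 4
```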

Univariate analysis

Descriptions of a single variable can be useful and interesting in themselves, because:

• they may identify previously unexpected patterns, or confirm what was expected (e.g. adherence to environmental views) -- graphs are particularly useful for this purpose.

• they provide us with the basic information on which further analysis can proceed.

Univariate analysis

The emphasis in univariate analysis is still descriptive, i.e. “what”, not “why”.

It is often necessary, and usually desirable, to go beyond this and ask “why?”, i.e. to find regularities and explanations by introducing other variables into the analysis and looking for associations.

Bivariate analysis

Represents the simultaneous examination of two variables (X1, Y1) and the relationship between them (types of relationships are discussed in detail later in this section of the course).

Thus, one variable is seen to influence, explain, or be related to or associated with another variable.

Represents a major step towards understanding (e.g. influences, causes, conditions under which occurrences happen).

Bivariate analysis

Depending on the form in which the data exist, relationships between two variables can be expressed in several ways:

• t-test
• ANOVA
• linear regression
• chi-square
• etc.
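
As one illustration of expressing a relationship between two variables, here is a minimal sketch of simple linear regression (ordinary least squares) in plain Python. The data (age and number of eggs) and the function name fit_line are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must take at least two different values")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: X1 = age in years (experimental), Y1 = eggs laid (dependent).
age_years = [1, 2, 3, 4, 5]
eggs_laid = [3, 5, 7, 9, 11]

intercept, slope = fit_line(age_years, eggs_laid)
print(intercept, slope)  # 1.0 2.0
```

Here the slope estimates how much the dependent variable changes per unit change in the experimental variable.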

Multivariate analysis

The limitation of bivariate analysis is that everything else other than that single relationship which might affect a phenomenon is excluded.

In the real world there is rarely a single cause of, influence on, or explanation of something else. Rather, something occurs because of multiple causes or factors, which:

• act simultaneously or together.
• are themselves often interrelated with one another.

Multivariate analysis

For example, ecosystem type may be a function of, among many other things:

• soil type

• soil nutrient level

• soil drainage

• insolation

• temperature

• slope

• aspect

• wind

In multivariate analysis, we wish to look at the influence of two or more of these factors (X1, X2, …, Xn) simultaneously.

Multivariate analysis

Simply stated, multivariate analysis represents a group of statistical and analytical techniques to:

• examine the combined effect of two or more variables.
• assess their relative strength.
• discard less important factors.
• provide more accurate predictions.

Multivariate analysis

While much analysis in biological and environmental science is conducted at the bivariate level, we should always strive, both in design and analysis, for at least the opportunity to conduct multivariate analysis.

• may require large samples.
• may require that data are collected in a format that is amenable to the use of particular kinds of multivariate statistics.

3. What is a variable?

It is a logical grouping of attributes or characteristics of individuals (or some other unit of analysis).

It is the range of categories or values of an attribute.

Examples:
• male/female are attributes; sex is the variable.
• cold/warm are attributes of certain places; temperature is the variable.

Variable characteristics

A variable has two important characteristics:
• it should usually be exhaustive, i.e., theoretically, it should include all possible categories.
• it should usually be mutually exclusive, i.e., it should be concerned with only one attribute, so that each case falls into only one category.
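
The two characteristics can be checked mechanically for a proposed set of categories. The sketch below is only an illustration: the function name check_variable, the distance categories, and the sample values are all hypothetical, and "mutually exclusive" is read here as "no case fits more than one category".

```python
def check_variable(categories, observations):
    """Return (exhaustive, mutually_exclusive) for the given observations.

    categories: dict mapping a category name to a function that returns
                True when a raw value belongs to that category.
    observations: iterable of raw values to classify.
    """
    exhaustive = True
    mutually_exclusive = True
    for value in observations:
        matches = [name for name, belongs in categories.items() if belongs(value)]
        if len(matches) == 0:
            exhaustive = False          # this case fits no category
        elif len(matches) > 1:
            mutually_exclusive = False  # this case fits several categories
    return exhaustive, mutually_exclusive


# Hypothetical distance categories (km).
good = {
    "near": lambda d: d <= 5,
    "intermediate": lambda d: 5 < d <= 15,
    "far": lambda d: d > 15,
}
# Overlaps at exactly 5 km and has no category above 15 km.
bad = {
    "near": lambda d: d <= 5,
    "intermediate": lambda d: 5 <= d <= 15,
}

distances_km = [2, 5, 9, 20]
print(check_variable(good, distances_km))  # (True, True)
print(check_variable(bad, distances_km))   # (False, False)
```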

It is possible to develop composite variables by combining two or more basic variables.

Different variable types

Knowing the definitions of, and differences among, types of variables is crucial because failure to grasp this can lead to:

• improper definition of relationships.
• incorrect specification of statistical descriptions and analyses.
• faulty interpretations and explanations.

Variable types

There are four types of variables:

A. experimental (independent).

B. dependent.

C. controlled.

D. uncontrolled.

A. Experimental variables

Also referred to as independent, causal, predictor, or explanatory variables.

We are interested in the effects of these variables on others, i.e. how does this variable affect something else?

• How does pH of water influence spawning success in fish?

Experimental variables

As the independent or experimental variable changes, we can expect some form of change in the affected variable.

Examples:
• effect of age on fecundity.
• effect of size on use of a habitat type.
• effect of weather on flight distance of migratory birds.

Experimental variables

Experimental variables are used to predict changes in affected variables.

Alternatively, we can understand or explain variations in the affected variable with reference to the experimental variable.

B. Dependent variables

These are the “affected” variables, i.e. the ones we wish to understand, explain, predict ‑‑ they are usually the focus of the investigation.

They are called “dependent” because attributes for Y depend on the conditions of the experimental (independent) variable X.

Dependent variables

They are usually the focus of the investigation.

E.g., we might begin by noting a pattern of variation in a variable of interest, and then ask “Why?” by introducing experimental variables.

Dependent vs. Experimental

The relationship between the Dependent and the Experimental (Independent) variable can be expressed in the form of flow diagrams.

Dependent variables

Sometimes it is hard to identify which is the dependent and which is the independent variable.

Two useful rules-of-thumb for distinguishing between them:

• time order (the later one is usually dependent).
• alterability (the more alterable one is usually dependent).

Variable status

Important to remember:
• the status of a particular variable is not always fixed, but depends on the research question being asked.
• a variable that is dependent in one analysis can become the experimental variable in another analysis, and vice versa.

Complexity with variables

When the level of complexity regarding the number of variables under consideration increases, or we want to answer complex questions (“How to account for the variability with two independent variables?”), then we need to use multivariate analysis methods.

C. Control variable

These are variables over which we exert some control in a variety of possible ways. Methods of control include:

• study design.
• statistical analysis.

D. Uncontrolled variable

These are all the variables over which we do not wish to, or cannot, exert control for the purpose of a particular analysis, i.e., they are excluded from data collection or from a particular data analysis.

4. Levels of measurement

There are 4 levels of measurement of variables:
• Nominal.
• Ordinal.
• Interval.
• Ratio.

The level/type of measurement you choose will affect the types of statistical analysis which are most appropriate and applicable, and thus, which inferences you can make.

Designing questions

When designing a question, it is advisable to:
• look ahead to the desired type of analysis (descriptive vs. explanatory vs. predictive study).
• choose an appropriate level of measurement for this analysis.

Designing questions

Knowing about levels of measurement can help to understand some of:

• the opportunities and limitations of manipulating data.

• the ability to collapse data into more general categories.

Nominal measures

Nominal measures are distinct categories that distinguish individuals; they do not imply a rank order, e.g.:

• sex.
• habitat type.
• region.
• herbivore, carnivore, omnivore.

This is the most basic level of categorization.

Ordinal measures

Reflect a rank‑order between categories but the intervals between categories are usually not equal; thus ordinal data cannot be averaged. E.g.:

• frequency of feeding (seldom, often).
• adult vs. young-of-year.
• categories of distance (near, intermediate, far).

Identifies a difference among categories, allowing them to be ranked.

Interval measures

Identifies differences among variable attributes, ranks and categories.

Measures distance between categories, without being able to use zero as a reference point.

Common examples:
• animal behaviour as a function of temperature.

Interval measures

Same as for ordinal measures (a rank order between categories), but intervals between categories are assumed to be equal.

However, the data:
• are not independent of scale (e.g. temperature).
• can be averaged, but cannot be expressed as ratios.

Ratio measures

The most precise (“highest”) level of measurement:
• variable attributes can be rank ordered.
• the distance between attributes can be precisely measured.
• an absolute zero exists (reference point).

Ratio measures

Have four characteristics:
• real measurements or numbers.
• with a true zero (i.e., absolute absence of the attribute).
• can be expressed as fractions or ratios.
• ratios are independent of the scale.

Examples: age, size, distance travelled, number of eggs, etc.
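
A small numerical sketch may help show why ratios are meaningful for ratio measures but not for interval measures. The values below are made up for illustration: a temperature ratio changes when the scale changes from Celsius to Fahrenheit, whereas a distance ratio stays the same in metres and in feet.

```python
import math


def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def metres_to_feet(metres):
    return metres / 0.3048


# Interval measure: temperature has no true zero on these scales.
t1_c, t2_c = 10.0, 20.0
ratio_celsius = t2_c / t1_c
ratio_fahrenheit = celsius_to_fahrenheit(t2_c) / celsius_to_fahrenheit(t1_c)
print(ratio_celsius, ratio_fahrenheit)  # the two ratios differ

# Ratio measure: distance has a true zero, so the ratio survives a change of unit.
d1_m, d2_m = 10.0, 20.0
ratio_metres = d2_m / d1_m
ratio_feet = metres_to_feet(d2_m) / metres_to_feet(d1_m)
print(math.isclose(ratio_metres, ratio_feet))  # True
```

In Celsius the second temperature looks "twice" the first, but in Fahrenheit (50 and 68 degrees) it does not, which is why interval data should not be expressed as ratios.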

Measure     Level of quantitative precision     Analytical flexibility
Nominal     Low                                 Lowest
Ordinal     Low                                 Low
Interval    Medium                              High
Ratio       High                                Highest

Implications

The choice of data measurement has several implications with regard to the types of analyses you can and cannot do with the data.

An important principle in the manipulation of data is that:

• ordinal, interval, & ratio measures can be converted “downwards”.

• nominal, ordinal, and interval measures cannot be converted “upwards”.

Examples of data aggregation

Real distance can be converted to an ordinal measure, but not vice versa.
• 1-5 km = Near category
• 6-15 km = Intermediate category
• 16-30 km = Far category

Number of visits can be converted to frequency of visit categories, and subsequently to visitor/non‑visitor, but not vice-versa.
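
The two conversions above can be sketched in a few lines of Python. The function names and sample values are hypothetical. One assumption is made loudly: the slide's category bounds are treated as upper limits (up to 5, up to 15, up to 30 km) so that in-between values such as 5.5 km, and distances under 1 km, are still classified.

```python
def distance_category(distance_km):
    """Convert a ratio-level distance (km) to an ordinal category."""
    if distance_km < 0 or distance_km > 30:
        raise ValueError("distance outside the 0-30 km range covered by the categories")
    # Assumption: bounds are upper limits, so 5.5 km counts as Intermediate.
    if distance_km <= 5:
        return "Near"
    if distance_km <= 15:
        return "Intermediate"
    return "Far"


def visit_status(n_visits):
    """Convert a ratio-level count of visits to a nominal visitor/non-visitor label."""
    if n_visits < 0:
        raise ValueError("number of visits cannot be negative")
    return "visitor" if n_visits > 0 else "non-visitor"


distances_km = [1.2, 4.8, 7.5, 22.0]
categories = [distance_category(d) for d in distances_km]
print(categories)  # ['Near', 'Near', 'Intermediate', 'Far']

visits = [0, 1, 7]
print([visit_status(n) for n in visits])  # ['non-visitor', 'visitor', 'visitor']
```

Both 1.2 km and 4.8 km end up as "Near", so the original distances cannot be recovered from the categories alone; this is why the conversion only works downwards.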

Data aggregation

Useful for putting data “into shape” to facilitate analysis, e.g., contingency (chi-square) tables.

This process moves from greater detail to more generality. The process does not work in the other direction.

Level affects analysis

The level of measurement will affect the type of analysis that is possible:

• if both variables are interval or ratio: correlation analysis.
• if only one variable is interval or ratio: ANOVA.
• if neither variable is interval or ratio: chi-square.
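
The rule of thumb on this slide can be written as a small lookup. The function name suggest_analysis is hypothetical, and the sketch encodes only the three cases listed above; it is not a full guide to choosing a statistical test.

```python
VALID_LEVELS = {"nominal", "ordinal", "interval", "ratio"}
NUMERIC_LEVELS = {"interval", "ratio"}


def suggest_analysis(level_x, level_y):
    """Suggest a bivariate analysis from the levels of measurement of two variables."""
    for level in (level_x, level_y):
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown level of measurement: {level!r}")
    # Count how many of the two variables are measured at interval or ratio level.
    n_numeric = sum(level in NUMERIC_LEVELS for level in (level_x, level_y))
    if n_numeric == 2:
        return "correlation analysis"
    if n_numeric == 1:
        return "ANOVA"
    return "chi-square"


print(suggest_analysis("ratio", "interval"))   # correlation analysis
print(suggest_analysis("nominal", "ratio"))    # ANOVA
print(suggest_analysis("nominal", "ordinal"))  # chi-square
```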

Level affects analysis

The higher the level of measurement, the greater your choice of analytical methods. E.g.:

• interval and ratio measures can be converted downwards to allow methods other than correlation.

Conversely, low levels of measurement dictate a much narrower range of appropriate analyses.

Remember…

To maximize your chances of producing a successful project, data should always be collected at the highest possible level of measurement.

There are 2 caveats to this principle.

Caveats - 1

Is this level of detail really necessary?

Will we use the data in this form?

Will the data we collect enhance the project results?

In other words, do the expected returns merit the effort and expense?

Caveats - 2

You may not be able to obtain detailed information (e.g., exact size of branches used for feeding by woodpeckers), whereas broad categories (small, medium, large branch) would pose no problem.

Insisting on the detailed measure may result in missing values, thus reducing N (the sample size) for analysis, and may even force exclusion of a variable if N is insufficient.
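
To make the effect on N concrete, the following sketch counts usable observations for the same six hypothetical feeding records, once measured in detail and once in broad categories. The data, variable names, and the helper usable_n are invented for illustration.

```python
def usable_n(values):
    """Count the observations that are not missing (None marks a missing value)."""
    return sum(value is not None for value in values)


# Hypothetical records for six feeding observations.
# Exact diameters could not be measured for three of them.
branch_diameter_cm = [2.1, None, 3.4, None, None, 1.8]
# Broad size classes could be recorded for all six.
branch_size_class = ["small", "medium", "medium", "large", "small", "small"]

print(usable_n(branch_diameter_cm))  # 3
print(usable_n(branch_size_class))   # 6
```

Here the detailed measure halves the sample available for analysis, while the coarser measure keeps every observation.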

Next Class

Samples, populations, and frequency distributions