
An Introduction to Statistical Problem Solving in Geography - 2nd Edition

Chapter 2 - Summary

Cathy Walker
February 13, 2010
GEOG: 3000 - Advanced Geographic Statistics
Winter Qrtr. 2010; P. Sutton

Definitions

• Individual Level Data Sets - each data value represents an individual element or unit of the phenomenon under study.

• Spatially Aggregated Data Sets - each value entered into the statistical analysis is a summary or spatial aggregation of individual units of information for a particular place or area.

• Ecological Fallacy - the invalid transfer of conclusions from spatially aggregated analysis to smaller areas or to the individual.

• Discrete Variable – a variable that has some restrictions placed on the values the variable can assume.

• Continuous Variable - a variable that has an infinite number of possible values along some interval of a real number line.

• In general, discrete data are the result of counting or tabulating the number of items, and potential values are limited to whole integers.

• Continuous data are the result of measurements, and values can be expressed as decimals.

• Quantitative - observations or responses are expressed numerically; units of data are assigned numerical values.

• Qualitative – each observation or response is assigned to one of two or more categories.

FOUR LEVELS OF MEASUREMENT

#1 Nominal Scale

• Each category is given a name or title, but no assumptions are made about any relationships between categories.

• Problems based on a nominal scale are considered categorical (qualitative).

• Two Necessary Conditions for Nominal Scale Classifications:

1. Categories are exhaustive; every value or unit of data can be assigned to a category.

2. Categories are mutually exclusive; it is not possible to assign a value to more than one category because the categories do not overlap.

• Examples:

• Religious Affiliation Classifications – Baptist, Catholic, Methodist, Presbyterian, Mormon, Jewish, etc.

• Political Party Affiliation – Democrat, Republican, Independent
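The two necessary conditions for a nominal classification can be checked mechanically. The sketch below is only an illustration: the function name `check_nominal_scheme`, the respondent IDs, and the category assignments are all hypothetical.

```python
def check_nominal_scheme(categories, units):
    """Return (exhaustive, mutually_exclusive) for a nominal classification.

    categories: dict mapping a category name to the set of units assigned to it.
    units: every unit of data that must be classified.
    """
    assigned = set().union(*categories.values())
    # Exhaustive: every unit appears in at least one category.
    exhaustive = set(units) <= assigned
    # Mutually exclusive: no unit is counted in more than one category.
    total_assignments = sum(len(members) for members in categories.values())
    mutually_exclusive = total_assignments == len(assigned)
    return exhaustive, mutually_exclusive


# Hypothetical respondents classified by political party affiliation.
respondents = ["r1", "r2", "r3", "r4"]
valid_scheme = {
    "Democrat": {"r1"},
    "Republican": {"r2", "r3"},
    "Independent": {"r4"},
}
# r3 appears twice and r4 is never assigned, so both conditions fail.
flawed_scheme = {
    "Democrat": {"r1", "r3"},
    "Republican": {"r2", "r3"},
    "Independent": set(),
}

print(check_nominal_scheme(valid_scheme, respondents))
print(check_nominal_scheme(flawed_scheme, respondents))
```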

#2 Ordinal Scale

• Values are placed in rank order.

• More quantitative distinctions are possible than with the nominal scale variables.

• Strongly Ordered - each value or unit of data is given a particular position in a rank-order sequence.

• Weakly Ordered - the values are placed in categories, and the categories themselves are rank-ordered.

• Example: Top 10 best places to live in the U.S.

No. 10: Des Moines, Iowa
No. 9: Charlotte, N.C.
No. 8: Austin, Texas
No. 7: San Antonio, Texas
No. 6: Fort Collins, Colorado
No. 5: Omaha, Neb.
No. 4: Houston, Texas
No. 3: Colorado Springs, Colorado
No. 2: Boise, Idaho
No. 1: Raleigh, N.C.

#3 Interval Scale

• Each value or unit is based on a measurement scale, and the interval between any two units of data on this scale can be measured.

• The origin or zero starting point is assigned arbitrarily (i.e., the origin does not have a “natural” or “real” meaning).

• Example: The placement of the zero degree point on these temperature scales is arbitrary; zero does not mean a complete lack of heat.

#4 Ratio Scale

• Each value or unit is based on a measurement scale, and the interval between any two units of data on this scale can be measured.

• The origin or zero starting point is “natural” or non-arbitrary, making it possible to determine the ratio between values.

• Example: The measurement of precipitation from a rain gauge; the ratio between 10 inches of rain and 5 inches of rain is precisely 2.
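The difference between an arbitrary and a natural zero can be shown with a short calculation. This is a minimal sketch: the slides do not name the temperature scales, so the use of Celsius and Kelvin here is an assumption, and `celsius_to_kelvin` is an illustrative helper.

```python
def celsius_to_kelvin(temp_c: float) -> float:
    """Convert Celsius (arbitrary zero) to Kelvin (natural zero)."""
    return temp_c + 273.15


# Ratio scale: precipitation has a natural zero, so the ratio is meaningful.
rain_ratio = 10.0 / 5.0

# Interval scale: the Celsius zero is arbitrary, so this ratio does not
# mean that 20 degrees C is "twice as hot" as 10 degrees C.
naive_temp_ratio = 20.0 / 10.0

# Re-expressed in Kelvin, the same two temperatures differ by about 3.5 percent.
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print(rain_ratio, naive_temp_ratio, round(kelvin_ratio, 3))
```

Differences remain meaningful on both scales: a 10-degree interval is the same size in Celsius and in Kelvin.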

MEASUREMENT CONCEPTS

Precision & Accuracy

Precision – refers to the level of exactness associated with measurement.

Accuracy – refers to the extent of system-wide bias in the measurement process.

It is possible for a measurement to be very precise yet inaccurate.

The Target Analogy

Case 1: Precise, Accurate Case 2: Precise, Inaccurate

Case 3: Imprecise, Accurate Case 4: Imprecise, Inaccurate
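One way to put numbers on the target analogy is to treat the average offset from a known value as the bias (accuracy) and the scatter of repeated readings as the spread (precision). This is a sketch under that assumed reading; the function name, the reference value, and the readings are hypothetical.

```python
from statistics import mean, stdev


def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias: mean offset from the true value (small bias = accurate).
    spread: sample standard deviation (small spread = precise).
    """
    bias = mean(measurements) - true_value
    spread = stdev(measurements)
    return bias, spread


true_length = 100.0  # hypothetical known reference value

# Hypothetical repeated readings from two instruments.
precise_but_inaccurate = [104.9, 105.0, 105.1, 105.0]  # like Case 2
imprecise_but_accurate = [96.0, 104.0, 99.0, 101.0]    # like Case 3

print(bias_and_spread(precise_but_inaccurate, true_length))
print(bias_and_spread(imprecise_but_accurate, true_length))
```

The first instrument clusters tightly but sits about 5 units off the reference; the second is centered on the reference but scatters widely.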

Validity

Addresses the measurement issues on the nature, meaning, or definition of a concept or variable.

To express the true meaning of multi-faceted concepts is often too difficult, so geographers often find it necessary to create operational definitions that can serve as indirect or surrogate measures for these variables.

Reliability

• Reliability problems often occur when using international data, since fully comparable and totally consistent methods of collecting data rarely exist from country to country.

• One way to assess the degree of reliability of a measurement instrument is to compare at least two applications of the data collection method used at different times.

When data are collected over time or when changes in spatial pattern are analyzed over time, the geographer must question the consistency and stability of the data.

BASIC CLASSIFICATION METHODS

Equal Intervals Based on Range

To determine class breaks, the range is divided into the desired number of equal-width class intervals.

The range is simply the difference in magnitude between the smallest and largest values in an interval/ratio set of data.
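The calculation can be sketched in a few lines. The function name `equal_interval_breaks` and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
def equal_interval_breaks(values, num_classes):
    """Return class boundaries (both ends included) for equal-width classes."""
    lowest, highest = min(values), max(values)
    # Class width = range divided by the desired number of classes.
    width = (highest - lowest) / num_classes
    return [lowest + i * width for i in range(num_classes + 1)]


sample = [2, 5, 7, 11, 14, 18, 22]  # hypothetical interval/ratio data
breaks = equal_interval_breaks(sample, 4)
print(breaks)  # [2.0, 7.0, 12.0, 17.0, 22.0]
```

Here the range is 22 - 2 = 20, so four classes are each 5 units wide.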

Equal Intervals Not Based on Range

This classification method also designates class breaks to create equal-interval classes, but the exact range is not used to select the class breaks.

A convenient and practical interval width is selected arbitrarily, based on rounded-off class-break values.

This method of classification is preferred for constructing a frequency distribution, histogram, or ogive to represent the data graphically.
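A minimal sketch of this approach, assuming the analyst has already picked a convenient width: the breaks start at the multiple of the width at or below the minimum and stop at the multiple at or above the maximum. The function name and sample values are hypothetical.

```python
import math


def rounded_interval_breaks(values, width):
    """Return class boundaries at round multiples of a chosen interval width."""
    # Snap the first break down and the last break up to multiples of width.
    start = math.floor(min(values) / width) * width
    stop = math.ceil(max(values) / width) * width
    count = round((stop - start) / width)
    return [start + i * width for i in range(count + 1)]


sample = [2, 5, 7, 11, 14, 18, 22]  # hypothetical data
rounded_breaks = rounded_interval_breaks(sample, 5)
print(rounded_breaks)  # [0, 5, 10, 15, 20, 25]
```

The breaks fall on rounded values that are easy to read on a histogram axis, at the cost of classes that may extend beyond the observed minimum and maximum.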

Quantile Breaks

The total number of values is divided as equally as possible into the desired number of classes.

The allocation of an equal number of values to each category is often an advantage in choropleth mapping, particularly if an approximately equal area on the map is desired for each category.

The possible disadvantages of quantile breaks should also be evaluated before deciding to use this method.
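The allocation of values to quantile classes can be sketched as follows. This is an illustration only: the function name and sample values are hypothetical, and the sketch does not handle tied values that straddle a class boundary.

```python
def quantile_classes(values, num_classes):
    """Split sorted values into classes whose sizes differ by at most one."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), num_classes)
    classes, start = [], 0
    for i in range(num_classes):
        # The first `extra` classes each take one additional value.
        size = base + (1 if i < extra else 0)
        classes.append(ordered[start:start + size])
        start += size
    return classes


sample = [22, 5, 14, 2, 18, 7, 11]  # hypothetical data, unsorted
terciles = quantile_classes(sample, 3)
print(terciles)  # [[2, 5, 7], [11, 14], [18, 22]]
```

Note that the class widths are unequal: the middle class spans 3 units while the first spans 5.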

Natural Breaks

The most elementary natural-breaks method is known as the single-linkage approach.

The logic is to identify natural breaks in the data and separate values into different classes based on these breaks.

Similar values are kept together in the same category, dissimilar values are separated into different categories, and the gaps in the data are incorporated directly in the grouping procedure.

This method will highlight extreme values, placing unusual outliers of data into their own unique categories.
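One simple reading of this logic is to sort the values, measure the gap between each pair of neighbors, and place the class breaks at the widest gaps. The sketch below follows that assumed interpretation; the function name and sample values are hypothetical, and the textbook procedure may differ in detail.

```python
def natural_breaks_by_gaps(values, num_classes):
    """Group sorted values by cutting at the (num_classes - 1) widest gaps."""
    ordered = sorted(values)
    # Pair each gap with the index of the value on its left side.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    widest = sorted(gaps, reverse=True)[:num_classes - 1]
    cut_positions = sorted(index for _, index in widest)
    classes, start = [], 0
    for position in cut_positions:
        classes.append(ordered[start:position + 1])
        start = position + 1
    classes.append(ordered[start:])
    return classes


sample = [1, 2, 3, 10, 11, 12, 40]  # hypothetical data with one outlier
groups = natural_breaks_by_gaps(sample, 3)
print(groups)  # [[1, 2, 3], [10, 11, 12], [40]]
```

The outlier 40 ends up alone in its own class, which illustrates how this method highlights extreme values.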

WHAT CAN BE CONCLUDED ABOUT THE DISPARITIES AMONG CLASSIFICATION METHODS?

Depending on the classification method used, outcomes can be quite different, even though the same data are used and the same number of classes is created.

The logical conclusion is to recognize that any observed spatial pattern (map) is a function of the specific classification method applied and that using a different method of classification will likely result in a visually distinctive map.

GRAPHIC PROCEDURES

Definitions

• Histogram - the frequency of values is shown as a series of vertical bars, one for each value or class of values.

• When using categories instead of actual values along the horizontal scale of a histogram, classification by equal intervals not based on range is usually the best technique.

• Frequency Polygon - very similar to a histogram, except that the vertical position of each data value or class is shown as a point rather than a bar.

• Cumulative Frequency Diagram (or Ogive) - instead of showing actual frequencies for each value or class, this graphic aggregates frequencies from value to value or class to class and displays the cumulative frequencies at each position.

• The cumulative absolute frequencies can be divided by the sum of all frequencies to obtain cumulative relative values or proportions.

• Scattergram (or Scatterplot) - shows the pattern of association or relationship between two variables (a bivariate relationship).

• If a set of observations is plotted, analysis of the scatter of points suggests the amount and nature of association or relationship that exists between the two graphed variables.
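The values plotted on a cumulative frequency diagram (ogive), as defined above, can be computed directly from class frequencies. This is a minimal sketch; the function name and the class counts are hypothetical.

```python
from itertools import accumulate


def cumulative_frequencies(class_counts):
    """Return (cumulative absolute, cumulative relative) frequency lists."""
    cumulative = list(accumulate(class_counts))
    total = cumulative[-1]  # sum of all frequencies
    relative = [count / total for count in cumulative]
    return cumulative, relative


counts = [2, 5, 8, 4, 1]  # hypothetical frequencies for five classes
absolute, relative = cumulative_frequencies(counts)
print(absolute)  # [2, 7, 15, 19, 20]
print(relative)
```

The last cumulative absolute value always equals the number of observations, and the last cumulative relative value always equals 1.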

[Figures: Histogram; Frequency Polygon; Cumulative Frequency Diagram; Scattergram]

?? Questions ??