An Introduction to Statistical Problem Solving in Geography - 2nd Edition
Chapter 2 - Summary
Cathy Walker
February 13, 2010
GEOG 3000 - Advanced Geographic Statistics
Winter Quarter 2010; P. Sutton
Definitions
• Individual Level Data Sets - each data value represents an individual element or unit of the phenomenon under study.
• Spatially Aggregated Data Sets - each value entered into the statistical analysis is a summary or spatial aggregation of individual units of information for a particular place or area.
• Ecological Fallacy - the invalid transfer of conclusions from spatially aggregated analysis to smaller areas or to the individual.
• Discrete Variable – a variable that has some restrictions placed on the values the variable can assume.
• Continuous Variable - a variable that has an infinite number of possible values along some interval of a real number line.
• In general, discrete data are the result of counting or tabulating the number of items, and potential values are limited to whole numbers (integers).
• Continuous data are the result of measurements, and values can be expressed as decimals.
• Quantitative - observations or responses are expressed numerically; units of data are assigned numerical values.
• Qualitative – each observation or response is assigned to one of two or more categories.
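The ecological fallacy defined above can be made concrete with a small numerical sketch. The data below are invented for three hypothetical areas (A, B, C): within each area the two variables are perfectly negatively related, yet the area averages (a spatially aggregated data set) are perfectly positively related. The helper `pearson_r` is written out here for illustration rather than taken from a library.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical individual-level data: (x values, y values) for people in each area.
areas = {
    "A": ([1, 2, 3], [3, 2, 1]),
    "B": ([4, 5, 6], [6, 5, 4]),
    "C": ([7, 8, 9], [9, 8, 7]),
}

# Spatially aggregated data set: one mean x and one mean y per area.
area_x = [mean(xs) for xs, _ in areas.values()]
area_y = [mean(ys) for _, ys in areas.values()]

aggregate_r = pearson_r(area_x, area_y)
within_r = {name: pearson_r(xs, ys) for name, (xs, ys) in areas.items()}

print(f"aggregated (area means): r = {aggregate_r:+.2f}")
for name, r in within_r.items():
    print(f"individuals in area {name}: r = {r:+.2f}")
```

Concluding from the aggregated result that individuals with higher x tend to have higher y would be exactly the invalid transfer of conclusions that the definition warns against.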
FOUR LEVELS OF MEASUREMENT
#1 Nominal Scale
• Each category is given a name or title, but no assumptions are made about any relationships between categories.
• Problems based on a nominal scale are considered categorical (qualitative).
• Two Necessary Conditions for Nominal Scale Classifications:
1. Categories are exhaustive; every value or unit of data can be assigned to a category.
2. Categories are mutually exclusive; it is not possible to assign a value to more than one category because the categories do not overlap.
• Examples:
• Religious Affiliation Classifications – Baptist, Catholic, Methodist, Presbyterian, Mormon, Jewish, etc.
• Political Party Affiliation – Democrat, Republican, Independent
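The two necessary conditions can be checked mechanically: every value must match at least one category (exhaustive) and no value may match more than one (mutually exclusive). In this sketch the function `check_nominal_scheme`, the membership tests, and the survey responses are all hypothetical, made up for illustration.

```python
def check_nominal_scheme(values, categories):
    """Test the two conditions for a nominal classification.

    `categories` maps each category name to a membership test that
    returns True when a value belongs to that category.
    Returns a pair (exhaustive, mutually_exclusive).
    """
    exhaustive = True
    mutually_exclusive = True
    for value in values:
        matches = [name for name, belongs in categories.items() if belongs(value)]
        if not matches:
            exhaustive = False          # value cannot be assigned anywhere
        elif len(matches) > 1:
            mutually_exclusive = False  # value fits more than one category
    return exhaustive, mutually_exclusive


# Hypothetical survey responses on political party affiliation.
responses = ["Democrat", "Republican", "Independent", "Green", "Democrat"]

parties = {
    "Democrat": lambda v: v == "Democrat",
    "Republican": lambda v: v == "Republican",
    "Independent": lambda v: v == "Independent",
}
print(check_nominal_scheme(responses, parties))  # "Green" fits no category

# Adding a residual "Other" category makes the scheme exhaustive.
parties_with_other = {**parties, "Other": lambda v: v not in parties}
print(check_nominal_scheme(responses, parties_with_other))
```

A residual "Other" category is a common way to satisfy the exhaustive condition without listing every possible response.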
#2 Ordinal Scale
• Values are placed in rank order.
• More quantitative distinctions are possible than with nominal scale variables.
• Strongly Ordered – each value or unit of data is given a particular position in a rank-order sequence.
• Weakly Ordered – the values are placed in categories, and the categories themselves are rank ordered.
• Example: Top 10 best places to live in the U.S.
No. 10: Des Moines, Iowa
No. 9: Charlotte, N.C.
No. 8: Austin, Texas
No. 7: San Antonio, Texas
No. 6: Fort Collins, Colorado
No. 5: Omaha, Neb.
No. 4: Houston, Texas
No. 3: Colorado Springs, Colorado
No. 2: Boise, Idaho
No. 1: Raleigh, N.C.
#3 Interval Scale
• Each value or unit is based on a measurement scale, and the interval between any two units of data on this scale can be measured.
• The origin or zero starting point is assigned arbitrarily (i.e., the origin does not have a “natural” or “real” meaning).
• Example: The placement of the zero-degree point on the Fahrenheit and Celsius temperature scales is arbitrary; zero does not mean a complete lack of heat.
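Because the zero point of an interval scale is arbitrary, ratios of values are not meaningful, but ratios of intervals (differences) are. The sketch below converts three hypothetical Fahrenheit readings to Celsius: the ratio 80 °F / 40 °F changes completely after conversion, while the ratio of two temperature differences stays the same.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


temps_f = [40.0, 60.0, 80.0]  # three hypothetical readings
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]

# Ratio of two values: depends on where zero was placed, so it changes with the scale.
value_ratio_f = temps_f[2] / temps_f[0]
value_ratio_c = temps_c[2] / temps_c[0]

# Ratio of two intervals (differences): the arbitrary zero cancels out.
interval_ratio_f = (temps_f[2] - temps_f[1]) / (temps_f[1] - temps_f[0])
interval_ratio_c = (temps_c[2] - temps_c[1]) / (temps_c[1] - temps_c[0])

print(f"80 F / 40 F = {value_ratio_f:.2f}, but in Celsius the same ratio is {value_ratio_c:.2f}")
print(f"ratio of intervals: {interval_ratio_f:.2f} (F) vs {interval_ratio_c:.2f} (C)")
```

This is why 80 °F cannot sensibly be called "twice as warm" as 40 °F, even though the difference between readings can be compared freely.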
#4 Ratio Scale
• Each value or unit is based on a measurement scale, and the interval between any two units of data on this scale can be measured.
• The origin or zero starting point is “natural” or non-arbitrary, making it possible to determine the ratio between values.
• Example: The measurement of precipitation from a rain gauge; the ratio between 10 inches of rain and 5 inches of rain is precisely 2.
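On a ratio scale the natural zero means that ratios survive a change of units. This small sketch uses the rain gauge example with two hypothetical gauges and converts inches to millimetres (one inch is defined as exactly 25.4 mm); the ratio of 2 is unchanged.

```python
MM_PER_INCH = 25.4  # exact definition of the inch in millimetres

rain_in = {"gauge_1": 10.0, "gauge_2": 5.0}  # hypothetical gauge readings, inches
rain_mm = {gauge: depth * MM_PER_INCH for gauge, depth in rain_in.items()}

ratio_in = rain_in["gauge_1"] / rain_in["gauge_2"]
ratio_mm = rain_mm["gauge_1"] / rain_mm["gauge_2"]

# Zero rainfall means no rain in either unit, so the ratio survives the conversion.
print(f"ratio in inches: {ratio_in:.2f}, ratio in millimetres: {ratio_mm:.2f}")
```

Contrast this with the Fahrenheit to Celsius conversion on the interval scale, where the ratio of two readings does change.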
MEASUREMENT CONCEPTS
Precision & Accuracy
Precision – refers to the level of exactness associated with measurement.
Accuracy – refers to the extent of system-wide bias in the measurement process.
It is possible for a measurement to be very precise yet inaccurate.
The Target Analogy
Case 1: Precise, Accurate
Case 2: Precise, Inaccurate
Case 3: Imprecise, Accurate
Case 4: Imprecise, Inaccurate
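The four cases of the target analogy can be quantified with two simple summaries, under the assumption (ours, not the text's) that accuracy is indexed by bias, the distance from the average hit to the bull's-eye, and precision by spread, the mean distance of the hits from their own average. The hit coordinates and the function `bias_and_spread` are hypothetical.

```python
from math import hypot
from statistics import mean


def bias_and_spread(hits, target=(0.0, 0.0)):
    """Summarise (x, y) hits on a target.

    bias   = distance from the average hit to the target centre (small = accurate)
    spread = mean distance of the hits from their own average (small = precise)
    """
    cx = mean(x for x, _ in hits)
    cy = mean(y for _, y in hits)
    bias = hypot(cx - target[0], cy - target[1])
    spread = mean(hypot(x - cx, y - cy) for x, y in hits)
    return bias, spread


# Hypothetical hits for the four cases of the target analogy.
cases = {
    "Case 1: precise, accurate": [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)],
    "Case 2: precise, inaccurate": [(3.1, 3.0), (2.9, 3.0), (3.0, 3.1), (3.0, 2.9)],
    "Case 3: imprecise, accurate": [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)],
    "Case 4: imprecise, inaccurate": [(5.0, 3.0), (1.0, 3.0), (3.0, 5.0), (3.0, 1.0)],
}

results = {name: bias_and_spread(hits) for name, hits in cases.items()}
for name, (bias, spread) in results.items():
    print(f"{name}: bias = {bias:.2f}, spread = {spread:.2f}")
```

Case 2 shows the point made above: a tight cluster (small spread) can still sit far from the centre (large bias), so a measurement can be very precise yet inaccurate.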
Validity
Validity addresses measurement issues concerning the nature, meaning, or definition of a concept or variable.
Expressing the true meaning of multi-faceted concepts is often too difficult, so geographers often find it necessary to create operational definitions that serve as indirect or surrogate measures for these variables.
Reliability
• Reliability problems often occur when using international data, since fully comparable and totally consistent methods of collecting data rarely exist from country to country.
• One way to assess the degree of reliability of a measurement instrument is to compare at least two applications of the same data collection method carried out at different times.
When data are collected over time or when changes in spatial pattern are analyzed over time, the geographer must question the consistency and stability of the data.
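One common way to summarise the comparison of two applications of the same method is a test-retest correlation; the text does not prescribe a particular statistic, so this is an assumption. The sketch computes a Pearson correlation between two hypothetical rounds of measurements for the same five districts; values close to 1 suggest the instrument gives consistent results.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical values for the same five districts, collected twice
# with the same data collection method at different times.
first_round = [12.0, 15.5, 9.8, 20.1, 17.3]
second_round = [12.4, 15.1, 10.2, 19.6, 17.9]

r = pearson_r(first_round, second_round)
print(f"test-retest correlation r = {r:.3f}")  # values near 1 suggest a consistent instrument
```

A high correlation shows only that the two rounds agree with each other; it says nothing about whether the measure is valid.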
BASIC CLASSIFICATION METHODS
Equal Intervals Based on Range
To determine class breaks, the range is divided into the desired number of equal-width class intervals.
The range is simply the difference in magnitude between the smallest and largest values in an interval/ratio set of data.
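The range-based method can be sketched in a few lines: compute the range, divide it by the number of classes to get the class width, and step from the minimum to the maximum. The data values and function names are hypothetical; the convention that the maximum value falls in the top class is an assumption of this sketch.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """Class breaks for k equal-width classes spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # range divided by the number of classes
    return [lo + i * width for i in range(k + 1)]


def class_counts(values, breaks):
    """Count values per class; the top break is included in the last class."""
    k = len(breaks) - 1
    counts = [0] * k
    for v in values:
        counts[min(bisect_right(breaks, v) - 1, k - 1)] += 1
    return counts


values = [2, 4, 7, 11, 12, 18, 26, 31, 40, 42]  # hypothetical data
breaks = equal_interval_breaks(values, 4)
print(breaks)                        # [2.0, 12.0, 22.0, 32.0, 42.0]
print(class_counts(values, breaks))  # [4, 2, 2, 2]
```

The class limits (2, 12, 22, ...) follow directly from the data range, which is exact but not always convenient to read.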
Equal Intervals Not Based on Range
This classification method also designates class breaks to create equal-interval classes, but the exact range is not used to select the class breaks.
A convenient and practical interval width is selected arbitrarily, based on rounded-off class-break values.
This method of classification is preferred for constructing a frequency distribution, histogram, or ogive to represent the data graphically.
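Equal intervals not based on range can be sketched by choosing a convenient width by hand and placing breaks at multiples of it, starting just below the minimum and ending just above the maximum. The width of 10, the data, and the function names are hypothetical choices for illustration.

```python
import math
from bisect import bisect_right


def rounded_equal_breaks(values, width):
    """Equal-width class breaks at multiples of a hand-picked, convenient width."""
    start = math.floor(min(values) / width) * width
    stop = math.ceil(max(values) / width) * width
    if stop == start:  # all values sit on one break; still make one class
        stop = start + width
    n_classes = round((stop - start) / width)
    return [start + i * width for i in range(n_classes + 1)]


def class_counts(values, breaks):
    """Count values per class; the top break is included in the last class."""
    k = len(breaks) - 1
    counts = [0] * k
    for v in values:
        counts[min(bisect_right(breaks, v) - 1, k - 1)] += 1
    return counts


values = [2, 4, 7, 11, 12, 18, 26, 31, 40, 42]  # hypothetical data
breaks = rounded_equal_breaks(values, 10)
print(breaks)                        # [0, 10, 20, 30, 40, 50]
print(class_counts(values, breaks))  # [3, 3, 1, 1, 2]
```

The rounded limits (0, 10, 20, ...) are easy to label along the horizontal axis of a histogram, which fits the text's preference for this method in graphic displays.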
Quantile Breaks
The total number of values is divided as equally as possible into the desired number of classes.
The allocation of an equal number of values to each category is often an advantage in choropleth mapping, particularly if an approximately equal area on the map is desired for each category.
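The possible disadvantages of quantile breaks should also be evaluated before deciding to use this method: nearly identical values may be split into different classes, while very different values may be grouped into the same class.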
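Quantile classification can be sketched by sorting the values and cutting the sorted list into k pieces whose sizes differ by at most one. The data and the function `quantile_classes` are hypothetical; how tied values at a boundary are handled varies between software packages, and this sketch simply cuts by position.

```python
def quantile_classes(values, k):
    """Split sorted values into k classes whose sizes differ by at most one."""
    ordered = sorted(values)
    n = len(ordered)
    # Class i takes positions i*n//k up to (but not including) (i+1)*n//k.
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


values = [2, 4, 7, 11, 12, 18, 26, 31, 40, 42]  # hypothetical data
classes = quantile_classes(values, 4)
for i, members in enumerate(classes, start=1):
    print(f"class {i}: {members}")
```

With these data, class 1 covers only 2 to 4 while class 4 stretches from 31 to 42, a small illustration of the disadvantage noted above: equal counts do not mean equal or homogeneous value ranges.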
Natural Breaks
The most elementary natural-breaks method is known as the single-linkage approach.
The logic is to identify natural breaks in the data and separate values into different classes based on these breaks.
Similar values are kept together in the same category, dissimilar values are separated into different categories, and the gaps in the data are incorporated directly in the grouping procedure.
This method will highlight extreme values, placing unusual outliers of data into their own unique categories.
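For one-dimensional data, stopping single-linkage clustering at k groups is equivalent to cutting the sorted values at the k - 1 widest gaps, because the widest gaps are the last ones to be merged. The sketch below uses that equivalence; the data and the function name are hypothetical, and ties between equal gaps are broken arbitrarily.

```python
def single_linkage_classes(values, k):
    """Split sorted values into k classes by cutting at the k - 1 widest gaps.

    In one dimension, single-linkage clustering stopped at k clusters gives
    this grouping: adjacent values are merged in order of increasing gap.
    """
    ordered = sorted(values)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    widest = sorted(gaps, reverse=True)[:k - 1]
    cut_after = sorted(i for _, i in widest)  # cut positions, left to right
    classes, start = [], 0
    for i in cut_after:
        classes.append(ordered[start:i + 1])
        start = i + 1
    classes.append(ordered[start:])
    return classes


values = [2, 4, 7, 11, 12, 18, 26, 31, 40, 42]  # hypothetical data
for i, members in enumerate(single_linkage_classes(values, 4), start=1):
    print(f"class {i}: {members}")
```

Here the value 18 is separated from its neighbours by wide gaps on both sides, so it ends up in a class of its own, which illustrates how the method isolates unusual values.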
WHAT CAN BE CONCLUDED ABOUT THE DISPARITIES AMONG CLASSIFICATION METHODS?
Depending on the classification method used, outcomes can be quite different, even though the same data are used and the same number of classes is created.
The logical conclusion is to recognize that any observed spatial pattern (map) is a function of the specific classification method applied and that using a different method of classification will likely result in a visually distinctive map.
GRAPHIC PROCEDURES
Definitions
• Histogram - the frequency of values is shown as a series of vertical bars, one for each value or class of values.
• When using categories instead of actual values along the horizontal scale of a histogram, classification by equal intervals not based on range is usually the best technique.
• Frequency Polygon - very similar to a histogram, except that the vertical position of each data value or class is shown as a point rather than a bar.
• Cumulative Frequency Diagram (or Ogive) - instead of showing actual frequencies for each value or class, this graphic aggregates frequencies from value to value or class to class and displays the cumulative frequencies at each position.
• The cumulative absolute frequencies can be divided by the sum of all frequencies to obtain cumulative relative values or proportions.
• Scattergram (or Scatterplot) - shows the pattern of association or relationship between two variables (a bivariate relationship).
• If a set of observations is plotted, analysis of the scatter of points suggests the amount and nature of association or relationship that exists between the two graphed variables.
[Figures: Histogram, Frequency Polygon, Cumulative Frequency Diagram, Scattergram]
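The numbers behind a histogram, a frequency polygon, and an ogive can all be derived from one frequency table. This sketch uses hypothetical data with equal-interval classes not based on range: the class counts give the histogram bar heights, the class midpoints give the x positions of the frequency polygon (plotting each class at its midpoint is a common convention, assumed here), and running totals divided by the sum of all frequencies give the cumulative relative values of the ogive.

```python
from bisect import bisect_right
from itertools import accumulate


def frequency_table(values, breaks):
    """Count values per class; the top break is included in the last class."""
    if min(values) < breaks[0] or max(values) > breaks[-1]:
        raise ValueError("all values must lie within the class breaks")
    k = len(breaks) - 1
    counts = [0] * k
    for v in values:
        counts[min(bisect_right(breaks, v) - 1, k - 1)] += 1
    return counts


values = [2, 4, 7, 11, 12, 18, 26, 31, 40, 42]  # hypothetical data
breaks = [0, 10, 20, 30, 40, 50]                # equal intervals not based on range

freq = frequency_table(values, breaks)                         # histogram bar heights
midpoints = [(a + b) / 2 for a, b in zip(breaks, breaks[1:])]  # frequency polygon x positions
cum_freq = list(accumulate(freq))                              # ogive, absolute
cum_rel = [c / cum_freq[-1] for c in cum_freq]                 # ogive, relative (proportions)

for (a, b), f, c, p in zip(zip(breaks, breaks[1:]), freq, cum_freq, cum_rel):
    print(f"{a:>3}-{b:<3} freq={f}  cumulative={c}  proportion={p:.1f}")
```

The last cumulative frequency always equals the total number of observations, so the last relative value is always 1.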