RESEARCH METHODS
EnP. Angelica N. Francisco
April 9, 2016
CHE Multi Purpose Hall
Short Course on Environmental Planning
DCERP & HUMEIN Phils. Inc.
Statistical Data
Statistical data are usually obtained by counting or
measuring items. Most data can be put into the
following categories:
• Qualitative - data are measurements that each fall
into one of several categories (hair color, ethnic
groups and other attributes of the population).
• Quantitative - data are observations that are
measured on a numerical scale (distance traveled
to college, number of children in a family, etc.).
Types of Data
• Discrete or Scale Data
– numeric values on an interval or ratio scale that can
take only separate, countable values: whole numbers that
cannot be fractioned or divided (age in years, number of
houses, vehicles, scores of a game, etc.)
• Continuous
– based on precision measurements that can take any
value within a range: height, weight, IQ, temperature,
strength, endurance, track-and-field times, etc.
• Ordinal
– data values represent categories with some
intrinsic order or property of magnitude (vote yes,
no, or conditional, agree, partially agree, disagree,
letter grades A to F, etc.).
• Nominal
– data values represent categories with no intrinsic
order (male/female, Tagalog/Visayan, blood type,
color of hair, parcel numbers, ID numbers, license
plate numbers, etc.).
Types of Data
Primary data are collected specifically for the
analysis desired
Secondary data have already been compiled
and are available for statistical analysis
A variable refers to any characteristic of an
individual or entity. A variable can take different
values for different individuals. Variables can be
categorical or quantitative.
A constant has a fixed numerical value: a concept
that has only a single, never-changing value.
Statistical Data
Qualitative and quantitative variables may be further
subdivided:
Variable
• Qualitative
– Nominal
– Ordinal
• Quantitative
– Discrete
– Continuous
Two kinds of variables:
Qualitative, or Attribute, or Categorical, Variable: A variable
that categorizes or describes an element of a population.
Quantitative, or Numerical, Variable: A variable that
quantifies an element of a population.
Experiment: The investigator controls or modifies the
environment and observes the effect on the variable
under study.
Survey: Data are obtained by sampling some of the
population of interest. The investigator does not modify
the environment.
Census: A 100% survey. A census collects information
about every member of the population; every element of
the population is listed. Seldom used because it is difficult
and time-consuming to compile, and expensive.
Methods to collect data
Consolidation of Data
• Time Series Data – ordered data values observed over time
• Cross Section Data – data values observed at a fixed point in
time
STATISTICS
Statistics
a set of scientific tools used to collect, organize
and interpret numeric and non-numeric data and to
convert raw data into processed information
helpful to decision makers.
STATISTICS
• Science of data collection, summarization,
analysis and interpretation
• Descriptive versus Inferential Statistics:
– Descriptive statistics: data description
(summarization), such as center, variability and
shape.
– Inferential statistics: drawing conclusions
beyond the sample studied, allowing for prediction.
TYPES OF STATISTICS
• Descriptive statistics – Methods of organizing,
summarizing, and presenting data in an
informative way
• Inferential statistics – The methods used to
determine something about a population on the
basis of a sample
– Population –The entire set of individuals or objects of
interest or the measurements obtained from all
individuals or objects of interest
– Sample – A portion, or part, of the population of
interest
Descriptive Statistics
• Descriptive Statistics consists of procedures
used to present and summarize the information in
a set of measurements to describe the
characteristics of the whole set (whether a sample
or the population).
• Commonly used techniques are graphical
description, tabular description, and summary
statistics
• Examples: Demographics
• Frequency distribution is one way to display data.
Frequency Distribution
• Frequency distribution – shows the frequency, or number of
occurrences, in each of several categories. Frequency
distributions are used to summarize large volumes of data
values.
• Consider a data set of 26 children aged 1-6 years. The
frequency distribution of the variable 'age' can be tabulated
as follows:
Frequency Distribution of Age:
Age          1   2   3   4   5   6
Frequency    5   3   7   5   4   2
Grouped Frequency Distribution of Age:
Age Group    1-2   3-4   5-6
Frequency     8    12     6
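The tabulation above can be reproduced with a short Python sketch. The list of 26 ages is rebuilt from the frequency table itself (it is not raw survey data), and the class intervals 1-2, 3-4 and 5-6 are chosen to match the grouped table.

```python
from collections import Counter

# Ages of the 26 children, rebuilt from the frequency table
# (5 one-year-olds, 3 two-year-olds, and so on).
ages = [1] * 5 + [2] * 3 + [3] * 7 + [4] * 5 + [5] * 4 + [6] * 2

# Ungrouped frequency distribution: how often each age occurs.
frequency = dict(sorted(Counter(ages).items()))

# Grouped frequency distribution using the class intervals 1-2, 3-4, 5-6.
class_intervals = [(1, 2), (3, 4), (5, 6)]
grouped = {
    f"{low}-{high}": sum(1 for age in ages if low <= age <= high)
    for low, high in class_intervals
}

print(frequency)  # {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}
print(grouped)    # {'1-2': 8, '3-4': 12, '5-6': 6}
```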
Statistical Description of Data
• Statistics describes a categorical set of data by the
– Frequency, percentage or proportion of each
category
• Statistics describes a numeric set of data by its
– Center (mean, median, mode, etc.)
– Variability (standard deviation, range, etc.)
– Shape (skewness, kurtosis, etc.)
Measure of Central Tendency:
MEAN
• Mean is what most people call the average
• Derived by adding all the numbers together and
then dividing by the total number of data values
• A single extreme value can distort the mean,
which can be a problem
• However, it is the most commonly used measure because
it can be used for further mathematical processing
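A minimal sketch of the mean, using the data set from the mode and median exercises on this topic. The second list is hypothetical: one value is replaced by an extreme value to show how a single outlier distorts the mean.

```python
def mean(values):
    """Add all the values together, then divide by how many there are."""
    return sum(values) / len(values)

data = [3, 4, 4, 4, 6, 9]           # data set from the mode and median exercises
with_outlier = [3, 4, 4, 4, 6, 90]  # hypothetical: 9 replaced by one extreme value

print(mean(data))          # 5.0
print(mean(with_outlier))  # 18.5, pulled far above most of the values
```

The median of both lists is 4, which illustrates why the median is described as unaffected by extreme values.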
Measure of Central Tendency:
MODE
• Mode is simply the most frequently occurring value
• If we are using simple numbers, then the mode is the
most frequently occurring number
• If we are looking at data on the nominal scale
(grouped into categories), the mode is the most
common category.
• The mode is very quick to calculate, but it cannot be
used for further mathematical processing.
• It is not affected by extreme values.
Measure of Central Tendency:
MODE
Find the mode of this data set: 3,4,4,4,6,9
Find the mode of this nominal data:
Land use      Hectares
clover        10
rye           12
vegetables    15
fruit          3
wheat         29
barley        18
pasture       17
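Both exercises can be checked with the sketch below. For the land-use table, it assumes the "most common" category is the one that covers the largest area in hectares. This is one reasonable reading of the exercise, not an answer stated on the slide.

```python
from collections import Counter

def mode(values):
    """Most frequently occurring value (on ties, the first one encountered)."""
    return Counter(values).most_common(1)[0][0]

print(mode([3, 4, 4, 4, 6, 9]))  # 4

# Land use in hectares, from the exercise.
land_use = {"clover": 10, "rye": 12, "vegetables": 15, "fruit": 3,
            "wheat": 29, "barley": 18, "pasture": 17}

# Assumption: the modal category is the one covering the most hectares.
modal_category = max(land_use, key=land_use.get)
print(modal_category)  # wheat
```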
Measure of Central Tendency:
MEDIAN
• Median is the central value in a series of ranked
values.
• If there is an even number of values, the median
is the midpoint between the two centrally placed
values.
• The median is not affected by extreme values,
but it cannot be used for further mathematical
processing.
Measure of Central Tendency:
MEDIAN
Find the median of this data set: 3,4,4,4,6,9
(answer = 4)
Find the median of this data set: 3,4,4,6,6,9
(answer = 5)
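The two median rules on these slides apply to an odd count (take the central ranked value) and an even count (take the midpoint of the two central values). The following small sketch implements both and reproduces the exercise answers.

```python
def median(values):
    """Central ranked value; midpoint of the two central values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

print(median([3, 4, 4, 4, 6, 9]))  # 4.0
print(median([3, 4, 4, 6, 6, 9]))  # 5.0
```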
Symmetrical and Asymmetrical Data
• It has been observed that the natural variation of many variables tends to follow a bell-shaped distribution, with most values clustered symmetrically near the mean and few values falling in the tails. This is referred to as the normal distribution.
With a normally distributed bell curve, the mean, median and mode all fall on the same value.
A data set is asymmetrical (skewed) when its values are not spread symmetrically about the center; the mean, median and mode then generally differ.
Shape of Data
• Shape of data is measured by
– Skewness
– Kurtosis
Skewness
• Measures asymmetry of data
– Positive or right skewed: longer right tail
– Negative or left skewed: longer left tail
Let x_1, x_2, ..., x_n be n observations. Then,
$$\text{Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left(\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^{3/2}}$$
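The following is a direct translation of the skewness formula on this slide into Python, assuming the version without small-sample correction shown here. The two inputs are illustrative: one list is symmetric, and the other is hypothetical data with a long right tail.

```python
import math

def skewness(values):
    """sqrt(n) * sum(d^3) / (sum(d^2))^(3/2), where d = x - mean."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    sum_sq = sum(d ** 2 for d in deviations)
    sum_cubed = sum(d ** 3 for d in deviations)
    return math.sqrt(n) * sum_cubed / sum_sq ** 1.5

print(skewness([1, 2, 3, 4, 5]))   # 0.0 (symmetric)
print(skewness([1, 1, 1, 2, 10]))  # positive, about 1.46 (long right tail)
```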
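Kurtosis
• Measures the peakedness (tail heaviness) of the
distribution of data. The formula below subtracts 3
so that the kurtosis of a normal distribution is 0
(this is often called excess kurtosis).
Let x_1, x_2, ..., x_n be n observations. Then,
$$\text{Kurtosis} = \frac{n\sum_{i=1}^{n}(x_i-\bar{x})^4}{\left(\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^{2}} - 3$$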
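The following sketch implements the kurtosis formula on this slide, including the subtraction of 3. Both example lists are illustrative. The first is flat and evenly spread. The second is hypothetical, with most values at the center and two far out in the tails.

```python
def excess_kurtosis(values):
    """n * sum(d^4) / (sum(d^2))^2 - 3, where d = x - mean; about 0 for normal data."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    sum_sq = sum(d ** 2 for d in deviations)
    sum_fourth = sum(d ** 4 for d in deviations)
    return n * sum_fourth / sum_sq ** 2 - 3

print(excess_kurtosis([1, 2, 3, 4, 5]))        # about -1.3 (flatter than normal)
print(excess_kurtosis([0] * 8 + [-10, 10]))   # 2.0 (heavier tails than normal)
```

If SciPy is available, `scipy.stats.skew` and `scipy.stats.kurtosis` with their default settings (biased estimates, Fisher's definition for kurtosis) should agree with these two hand-written functions.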
B. Measure of Variability or Dispersion:
- Variability (or dispersion) measures the amount
of scatter in a dataset.
- These statistics concern the degree to which the
scores in a distribution differ from or are
similar to each other.
- Concerned with the spread of data
• Range, Standard Deviation, Variance,
Interquartile Range, Coefficient of Variation
B. Measure of Variability or Dispersion:
RANGE
• The difference between the largest and the
smallest observations.
• The distance between the highest score and the
lowest score in a distribution
• The range of 10, 5, 2, 100 is (100-2)=98. It's a
crude measure of variability because it depends
only on the two most extreme values.
B. Measure of Variability or Dispersion:
STANDARD DEVIATION
• If we want to obtain
some measure of the
spread of our data
about its mean, we
calculate its standard
deviation.
• Two sets of figures can
have the same mean
but very different
standard deviations.
B. Measure of Variability or Dispersion:
STANDARD DEVIATION
• Standard Deviation - the most
commonly used measure of
variability; it indicates roughly how far,
on average, the scores deviate from the
mean.
• The higher the standard deviation,
the greater the spread of data
around the mean.
• The standard deviation is generally
preferred over the other measures of spread
because it takes into account all of the values
under consideration.
B. Measure of Variability or Dispersion:
VARIANCE
Variance: The (sample) variance of a set of observations is the sum of the
squares of the deviations of the observations from their mean, divided by
n - 1 (roughly, their average squared deviation). In symbols, the variance of
the n observations x1, x2, …, xn is
$$S^2 = \frac{(x_1-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}$$
Variance of 5, 7, 3? The mean is (5+7+3)/3 = 5 and the variance is
$$S^2 = \frac{(5-5)^2 + (7-5)^2 + (3-5)^2}{3-1} = \frac{8}{2} = 4$$
Standard Deviation: Square root of the variance. The standard deviation
of the above example is √4 = 2.
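The worked example (5, 7, 3) can be checked with a short sketch. It computes the n - 1 sample variance by hand and compares it with Python's `statistics.variance` and `statistics.stdev`, which use the same n - 1 definition. `statistics.pvariance` divides by n instead.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [5, 7, 3]
variance = sample_variance(data)
std_dev = math.sqrt(variance)
print(variance, std_dev)  # 4.0 2.0

# The standard library's sample variance and standard deviation agree.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))
```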
B. Measure of Variability or Dispersion:
QUARTILES
Quartiles: Data can be divided into four regions, each
holding about 25% of the observations, that cover the total
range of observed values. The cut points between these
regions are known as quartiles.
The first quartile (Q1) is the value below which the first 25%
of the data lie. The second quartile (Q2) marks the 50% point
and is the median. The third quartile (Q3) is the 75% cut point,
so 25% of the data lie between the median and Q3.
B. Measure of Variability or Dispersion:
QUARTILES
Step 1: Put the numbers in order.
1,2,5,6,7,9,12,15,18,19,27
Step 2: Find the median.
1,2,5,6,7,9,12,15,18,19,27 (median = 9)
Step 3: Place parentheses around the numbers
above and below the median.
This is not necessary statistically, but it makes Q1 and Q3
easier to spot.
(1,2,5,6,7),9,(12,15,18,19,27)
Step 4: Find Q1 and Q3.
Q1 can be thought of as the median of the lower half of
the data. Q3 can be thought of as the median of the
upper half of the data.
(1,2,5,6,7),9,(12,15,18,19,27): Q1 = 5 and Q3 = 18.
Step 5: Subtract Q1 from Q3 to find the interquartile
range (IQR).
18 - 5 = 13.
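The five steps can be sketched in Python using the median-of-halves method from the slides, where the median itself is left out of both halves when n is odd. Other software may use a different quartile convention. For this data set, `statistics.quantiles(data, n=4)` with its default "exclusive" method happens to give the same cut points, but the two can disagree on other data.

```python
import statistics

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (needs at least 2 values).

    For an odd count, the median itself is excluded from both halves.
    """
    ranked = sorted(values)                   # Step 1: put the numbers in order
    n = len(ranked)
    lower = ranked[: n // 2]                  # values below the median
    upper = ranked[(n + 1) // 2 :]            # values above the median
    return (statistics.median(lower),         # Q1: median of the lower half
            statistics.median(ranked),        # Q2: the median
            statistics.median(upper))         # Q3: median of the upper half

data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)  # 5 9 18 13
```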
B. Measure of Variability or Dispersion:
DECILES AND PERCENTILES
Percentiles: If data are ordered and divided into
100 parts, the cut points are called percentiles.
The 25th percentile is Q1, the 50th percentile is the
median (Q2) and the 75th percentile of the data is
Q3.
Deciles: If data are ordered and divided into 10
parts, the cut points are called deciles.
In notation, the pth percentile of the data is the ((n+1)/100)p-th
observation of the ordered data, where p is the desired
percentile and n is the number of observations.
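The following sketch applies the ((n+1)/100)p position rule to the data set from the quartile example. The rule alone does not say what to do when the position is fractional. This sketch assumes linear interpolation between the two neighboring ranked observations and clamps positions outside 1 to n. Both choices are common conventions, not part of the slide.

```python
def percentile(values, p):
    """p-th percentile using the ((n + 1) / 100) * p position rule.

    Assumptions: fractional positions are linearly interpolated between
    neighboring ranked values; positions outside 1..n are clamped.
    """
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) * p / 100          # 1-based rank in the ordered data
    if position <= 1:
        return ranked[0]
    if position >= n:
        return ranked[-1]
    lower = int(position)                 # whole-number part of the rank
    fraction = position - lower
    return ranked[lower - 1] + fraction * (ranked[lower] - ranked[lower - 1])

data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 5.0 9.0 18.0
print(percentile(data, 90))  # about 25.4, between the 10th and 11th values
```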
Coefficient of Variation: The standard deviation of
the data divided by its mean. It is usually expressed in
percent:
$$\text{Coefficient of Variation} = \frac{S}{\bar{x}} \times 100\%$$
where S is the standard deviation and \bar{x} is the mean.
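The variance example (5, 7, 3) has mean 5 and standard deviation 2, so its coefficient of variation should be 2/5 × 100 = 40%. The following minimal sketch confirms this using the sample standard deviation from Python's `statistics` module.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100

print(coefficient_of_variation([5, 7, 3]))  # approximately 40.0
```

Because it is unit-free, the coefficient of variation can compare spread across variables measured in different units, for example hectares and household counts.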
Descriptive Statistics:
Data Presentation
• Scatter plots
• Diagrams
• Histograms
• Venn Diagrams
• Bar charts
• Line graphs
• Trend charts
• Pie Charts
• Flow Charts
Inferential Statistics
• Inferential statistics consists of procedures used
to infer (draw conclusions, make statements,
predict, decide) about certain characteristics of
one or more populations by examining
information contained in a sample from these
populations
• Inferential statistics help in reaching conclusions
that extend beyond the immediate data alone.
• The use of statistical tests, either to test for
significant relationships among variables or to
find statistical support for the hypotheses.
• This is based on the laws of probability.
Inferential Statistics
Population vs. Sample
Population is the entire collection of things under
consideration.
• A parameter is a summary measure computed
to describe a characteristic of the population
Population vs. Sample
Sample is a portion of the population selected for
analysis.
• A statistic is a summary measure computed to
describe a characteristic of the sample
Sampling Techniques
SIMPLE RANDOM SAMPLING
• Simple random sample (each sample of the same size
has an equal chance of being selected)
• Simple random sampling – using a table of random
numbers
• Simple random: units are randomly chosen from the
sampling frame
• Sampling frame is a list of all the individuals (units) in the
population from which the sample is taken.
Types of Probability Sampling
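In software, Python's `random.sample` takes the place of a table of random numbers. It draws distinct units from a sampling frame so that every subset of the requested size is equally likely. The frame of 100 numbered households below is hypothetical. The seed is fixed only so that the draw can be repeated.

```python
import random

def simple_random_sample(frame, size, seed=None):
    """Draw `size` distinct units; every subset of that size is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, size)

# Hypothetical sampling frame: a list of 100 numbered households.
frame = [f"household-{i:03d}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```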
SYSTEMATIC RANDOM SAMPLING
• A sample in which every kth item of the sampling frame
is selected, starting from an element that is
randomly selected from the first k elements.
• (randomly select a starting point and take every kth piece of data from a listing of the population)
• Number the units within the sampling frame and select every 5th, 10th, etc.
• Systematic random sampling – e.g., take every 10th name
Types of Probability Sampling
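The following sketch implements the systematic rule. It picks a random start among the first k units and then takes every kth unit after it. The frame of 100 numbered units is hypothetical, and k = 10 matches the "every 10th name" example.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Random start among the first k units, then every k-th unit after it."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # 0-based index within the first k units
    return frame[start::k]

frame = list(range(1, 101))       # hypothetical frame of 100 numbered units
sample = systematic_sample(frame, k=10, seed=7)
print(sample)                     # ten units, each 10 apart
```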
STRATIFIED RANDOM SAMPLING
• A sample obtained by stratifying the sampling frame and then selecting a fixed number of items from each of the strata by means of a simple random sampling technique.
• (divide the population into groups called strata and then take a sample from each stratum)
• Stratified random: random sampling of units within categories (strata) that are assumed to exist within a population
• Stratified random sampling – e.g., divide or stratify by gender and sample within each group
Types of Probability Sampling
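The following sketch stratifies by gender, as in the example. It groups the sampling frame into strata and then takes a fixed number of units from each stratum by simple random sampling. The frame of 40 respondents and the quota of 5 per stratum are hypothetical.

```python
import random

def stratified_sample(frame, stratum_of, per_stratum, seed=None):
    """Simple random sample of `per_stratum` units from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:                                  # build the strata
        strata.setdefault(stratum_of(unit), []).append(unit)
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

# Hypothetical frame: (respondent id, gender) pairs.
frame = [(i, "female" if i % 2 else "male") for i in range(1, 41)]
sample = stratified_sample(frame, stratum_of=lambda unit: unit[1], per_stratum=5, seed=1)
for gender, units in sample.items():
    print(gender, [uid for uid, _ in units])
```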
CLUSTER SAMPLING
• A sample obtained by stratifying the sampling frame and then selecting some or all of the items from some, but not all, of the strata.
• (divide the population into strata and then randomly select some of the strata. All the members from these strata are in the cluster sample.)
• Clusters (each with multiple units) within a sampling frame are randomly selected.
• Cluster sampling – select units (clusters) in order to access the patients or nurses within them
Types of Probability Sampling
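In cluster sampling, whole groups are chosen at random and every member of a chosen group enters the sample. This is the key difference from stratified sampling, which samples within every stratum. The six villages of five households each below are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; every unit in a chosen cluster is kept."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # pick cluster names at random
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: 6 villages (clusters), each listing 5 households.
clusters = {
    f"village-{v}": [f"v{v}-household-{h}" for h in range(1, 6)]
    for v in range(1, 7)
}
sample = cluster_sample(clusters, n_clusters=2, seed=3)
households = [unit for members in sample.values() for unit in members]
print(sorted(sample), len(households))   # two villages, 10 households
```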
Types of Non-Probability
Sampling
• Convenience sampling: selection based on
availability or ease of inclusion (first persons to walk in
the door)
• Purposive sampling: deliberate selection of individuals
judged likely to provide the most relevant data (patients
living with an illness)
• Quota sampling: selection on the basis of categories
that are assumed to exist within a population; the
sampling frame is stratified and a number of items in
proportion to the size of each stratum (the quota) is
taken from each stratum. Unlike stratified random
sampling, the items that fill each quota are not
selected at random.
Quota – e.g., equal numbers of men & women
• Estimation
– e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g., Test the claim that the
population mean weight is 70
kg
Inference is the process of drawing conclusions or making decisions about a population based on sample results
Inferential Statistics
Hypothesis Testing
• A hypothesis is an educated guess or predictive
statement, expressed in falsifiable form, that links
variables derived from theory and implies some
cause-and-effect relationship between them.
• The null hypothesis is the hypothesis of no
difference, or of no relationship between
variables.
ERRORS
• Type I error
– Claiming a difference between two samples when in fact there is none.
– Also called the α (alpha) error.
– α is typically set at 0.05
• Type II error
– Claiming there is no difference between two samples when in fact there is.
– Also called the β (beta) error.
ERRORS
                           Test result: H0     Test result: H1
Truth: H0 (null)           No error            Type I error
Truth: H1 (alternative)    Type II error       No error
Hypothesis Testing
• Null hypothesis and Alternative hypothesis
Decision \ Real situation    H0 is true                  H0 is false
Reject H0                    Type I error (α)            Correct decision (1 - β)
Accept H0                    Correct decision (1 - α)    Type II error (β)
Level of Significance
• An important factor in determining how representative
the sample is of the population and the degree to
which chance affects the findings.
• The level of significance is a numerical value
selected by the researcher before data collection
to indicate the probability of erroneous findings
being accepted as true (rejecting a null hypothesis
that is actually true). This value is typically set
at 0.01 or 0.05.
Level of Significance
• The probability of making a type I error
• If the researcher wants to assume a smaller risk, the level
is set at 0.01,
• meaning the researcher is willing to be wrong only
once in 100 trials
• The decision to use an alpha level of 0.05 or 0.01
depends on the significance (importance) of the study.
• Decreasing the risk of making a type I error
increases the risk of making a type II error.
Inferential Statistics
Uses of Inferential Analysis :
1. T-test - is used to examine the difference
between the means of two independent groups.
2. Analysis of Variance (ANOVA) / F-tests - is
used to test the significance of differences
between the means of two or more groups.
3. Chi-square - this is used to test hypotheses
about the proportion of elements that fall into
various cells of a contingency table
STATISTICAL TESTS : T-test
Comparison of 2 Sample Means
– Assumes normally distributed continuous data.
T value = (difference between means) / (standard error of the difference)
• The t value is then looked up in a t table to determine significance
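The t value is the difference between means divided by the standard error of that difference. This sketch computes it for two independent groups and assumes equal variances, so the standard error comes from the pooled sample variance with n1 + n2 - 2 degrees of freedom. That assumption is one common form of the two-sample t-test, not the only one. The two groups of three values are hypothetical.

```python
import math

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances.

    t = (mean_a - mean_b) / SE, where SE is the standard error of the
    difference, computed from the pooled sample variance.
    """
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se_diff, n_a + n_b - 2

t, df = pooled_t_statistic([5, 7, 3], [8, 10, 6])   # hypothetical groups
print(round(t, 3), df)   # -1.837 4
```

With 4 degrees of freedom, the two-tailed table value at α = 0.05 is about 2.776. Because |t| = 1.837 is smaller, this difference would not be called significant. If SciPy is available, `scipy.stats.ttest_ind(a, b)` assumes equal variances by default and should report the same statistic along with a p-value.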
STATISTICAL TESTS :
Analysis of Variance (ANOVA)
• Used to determine whether two or more samples are
from the same population (the null hypothesis).
• Usually used for 3 or more samples.
• If it appears they are not from the same population,
ANOVA alone can't tell which sample is different.
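The following sketch shows the one-way ANOVA F statistic. It divides the between-group mean square by the within-group mean square. The three groups are hypothetical. The result also shows the limitation noted above: the test can say the means differ, but not which group is responsible.

```python
def one_way_anova_f(*groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

f, dfs = one_way_anova_f([5, 7, 3], [8, 10, 6], [2, 4, 0])   # hypothetical groups
print(f, dfs)   # 6.75 (2, 6)
```

For (2, 6) degrees of freedom, the table value at α = 0.05 is about 5.14. An F of 6.75 would therefore lead to rejecting the null hypothesis of equal means, but a separate post hoc comparison would be needed to find which group differs. `scipy.stats.f_oneway` computes the same F statistic.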
STATISTICAL TESTS :
(Pearson's) Chi-Squared (χ²) Test
• Used to compare the observed proportions of an
event with the expected proportions.
• Used with nominal data (better/worse;
dead/alive)
• If there is a substantial difference between
observed and expected, then the null hypothesis
is likely to be rejected.
• Often presented as a 2 X 2 table
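• Chi-Squared (χ²) Formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency in each cell
• Not reliable in small samples (a common rule of thumb is an expected count of at least 5 in each cell)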
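The following sketch computes the Pearson chi-squared statistic for a contingency table. For each cell, the expected count E is the row total times the column total divided by the grand total, and the statistic sums (O - E)²/E over all cells. The 2 X 2 counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square: sum over cells of (O - E)^2 / E.

    Expected count E = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table: rows = treated / untreated, columns = better / worse.
chi2, df = chi_square_statistic([[20, 30], [30, 20]])
print(chi2, df)   # 4.0 1
```

With 1 degree of freedom, the table value at α = 0.05 is about 3.841, so 4.0 would be judged significant. Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 X 2 tables by default. Pass `correction=False` to reproduce the uncorrected value shown here.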
CORRELATION ANALYSIS
• Assesses the linear relationship between two variables – Example: height and weight
• Strength of the association is described by a correlation coefficient, r
• r = 0 - .2 low, probably meaningless
• r = .2 - .4 low, possible importance
• r = .4 - .6 moderate correlation
• r = .6 - .8 high correlation
• r = .8 - 1 very high correlation
• Can be positive or negative
• Pearson's and Spearman's correlation coefficients
• Tells nothing about causation
• Finding the relationship between two quantitative
variables without being able to infer causal relationships
• Correlation is a statistical technique used to determine
the degree to which two variables are related
• Correlation measures to what extent two (or more)
variables are related
– Correlation expresses a relationship that is not necessarily
precise (e.g. height and weight)
– Positive correlation indicates that the two variables move in the
same direction
– Negative correlation indicates that they move in opposite
directions
CORRELATION ANALYSIS
CORRELATION COEFFICIENT
• The correlation coefficient is also called Pearson's correlation
• It measures the direction and strength of the relationship between two variables
of the quantitative type.
• Statistic showing the degree of relation between two variables
• The correlation coefficient r gives a measure (in the range –1, +1) of the relationship between two variables
– r=0 means no correlation
– r=+1 means perfect positive correlation
– r=-1 means perfect negative correlation
• Perfect correlation (r = ±1) means that all points lie exactly on a straight line, so a given change in x always corresponds to a proportional change in y
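The following is a minimal sketch of Pearson's r, computed as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The three y lists are illustrative. They show perfect direct, perfect indirect and weaker direct relationships with the same x.

```python
import math

def pearson_r(x, y):
    """r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0 (perfect direct)
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0 (perfect indirect)
print(pearson_r(x, [2, 1, 4, 3, 5]))    # 0.8 (strong direct, not perfect)
```

On Python 3.10 or later, `statistics.correlation(x, y)` should return the same Pearson value.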
If the sign is POSITIVE this means the relation is direct (an increase in one variable is associated with an increase in the other variable and a decrease in one variable is associated with a decrease in the other variable).
While if the sign is NEGATIVE this means an inverse or indirect relationship (which means an increase in one variable is associated with a decrease in the other).
The value of r ranges between -1 and +1.
The value of r denotes the strength of the
association, as illustrated
by the following scale:
r = -1              perfect indirect correlation
-1 to -0.75         strong (indirect)
-0.75 to -0.25      intermediate (indirect)
-0.25 to 0          weak (indirect)
r = 0               no relation
0 to 0.25           weak (direct)
0.25 to 0.75        intermediate (direct)
0.75 to 1           strong (direct)
r = +1              perfect direct correlation
SIGNIFICANCE LEVEL IN CORRELATION
• Level of correlation (value of the correlation coefficient): indicates to what extent the two variables “move together”
• Significance of correlation (p value): given that the correlation coefficient is computed on a sample, indicates whether the relationship appears to be statistically significant
• Examples
– Correlation is 0.50, but not significant: the sampling error is so high that the actual correlation could even be 0
– Correlation is 0.10 and highly significant: the level of correlation is very low, but we can be confident in the value of such a correlation
P-Value versus the Confidence Interval
• Two main ways to assess study precision and the role
of chance in a study:
– P value: measures (as a probability) the evidence
against the null hypothesis.
– Confidence interval (CI): an interval within which the
value of the parameter is expected to lie, with a
specified level of confidence
– E.g., a 95% CI implies that if one repeated a study 100
times, the true measure of association would lie inside
the CI in about 95 of the 100 repetitions
REGRESSION ANALYSIS
• Based on fitting a line to data – provides a regression coefficient, which is the slope
of the line
• Y = ax + b (a is the slope, b is the intercept)
– Used to predict a dependent variable's value based on the value of an independent variable.
• Very helpful: in an analysis of height and weight, for a known height, one can predict weight.
• Much more useful than correlation – allows prediction of values of Y rather than just
showing whether there is a relationship between two variables.
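Regression Analyses
Linear regression analysis
$$y_i = \alpha + \beta x_i + \varepsilon_i$$
where y_i is the dependent variable, α is the intercept, β is the regression coefficient, x_i is the independent variable (explanatory variable, regressor…) and ε_i is the error.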
• The process of predicting variable Y using variable X
• Uses a variable (x) to predict some outcome
variable (y)
• Tells you how values in y change as a function of
changes in values of x
Linear Regression
[Figure: scatter plot of Cholesterol (mg/100 ml) against Age, with the fitted line Cholesterol (mg/100 ml) = 140.36 + 4.58 * age; R-Square = 0.65]
The objective is to identify the line (i.e., the a and b coefficients) that minimises the (squared vertical) distances between the actual points and the fitted line.
What regression analysis does
• Determines whether a relationship exists between the dependent and explanatory variables
• Determines how much of the variation in the dependent variable is explained by the independent variable (goodness of fit)
• Allows prediction of the values of the dependent variable
Correlation and Regression
• Correlation: there is no causal relationship
assumed
• Regression: we assume that the
explanatory variables “cause” the
dependent variable
Correlation and Regression
• Correlation describes the strength of a linear
relationship between two variables
• Linear means “straight line”
• Regression tells us how to draw the straight line
described by the correlation
Regression
Calculates the “best-fit” line for a given set of data.
The regression line makes the sum of the squares of
the residuals smaller than for any other line.
[Figure: "Regression minimizes residuals", SBP (mmHg) plotted against Wt (kg)]
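The following least-squares sketch finds the line that makes the sum of squared residuals smallest, and it reports R-Square as the share of the variation in y explained by the line. The toy x and y lists are hypothetical. The final lines only plug an illustrative age of 50 into the cholesterol equation quoted on the earlier slide. They do not refit that model, since its data are not given.

```python
def least_squares_line(x, y):
    """Fit y = intercept + slope * x by minimising the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R-Square: share of the variation in y explained by the fitted line.
    residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    total_ss = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - residual_ss / total_ss

intercept, slope, r2 = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope, r2)   # 1.0 2.0 1.0

# Prediction with the cholesterol equation from the slide (illustrative age).
age = 50
predicted = 140.36 + 4.58 * age
print(predicted)              # about 369.36 mg/100 ml
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept.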
Qualitative Data Analysis
• Comparative
Analysis
• Gender Analysis
• Cause and Effect
Diagrams / Impact
Chain Analysis
• Problem Tree
Analysis
• Fault-Tree Analysis
• Flow Charts
• Affinity Grouping
Qualitative Research Methods
Interviews
• Ethnographic interviews (Spradley, 1979)
• Contextual interviews (Holtzblatt and Jones, 1995)
Ethnographic observation (Spradley, 1980)
Participatory design sessions (Sanders, 2005)
Field deployments
Qualitative Research Methods
• Windshield Survey and Field Surveys
• Rapid Rural Appraisal
• Transect Mapping
• Problem Identification through imagery
• Experts' opinion – Delphi technique
• Key Informant Interviews – local experts, sectoral leaders
• Focus Group Discussions
• Brainstorming with key stakeholders
• Case Studies
THANK YOU!