
RESEARCH METHODS

EnP. Angelica N. Francisco

April 9, 2016

CHE Multi Purpose Hall

Short Course on Environmental Planning

DCERP & HUMEIN Phils. Inc.

Statistical data are usually obtained by counting or

measuring items. Most data can be put into the

following categories:

• Qualitative - data are measurements that each fall into one of several categories (hair color, ethnic groups and other attributes of the population)

• Quantitative - data are observations that are

measured on a numerical scale (distance traveled

to college, number of children in a family, etc.)

Statistical Data

Types of Data

• Discrete or Scale Data

– values are numeric values on an interval or ratio

scale. Whole numbers that cannot be fractioned or

divided. (Age, number of houses, vehicles, scores

of game, etc.)

• Continuous

– based on precision measurements: height, weight, IQ, temperature, strength, endurance, track and field times, etc.

• Ordinal

– data values represent categories with some

intrinsic order or property of magnitude (vote yes,

no, or conditional, agree, partially agree, disagree,

letter grades – ABCDEF, etc).

• Nominal

– data values represent categories with no intrinsic order (male/female, Tagalog/Visayan, blood type, color of hair, parcel numbers, ID numbers, license plate number, etc.)

Types of Data

Primary data are collected specifically for the

analysis desired

Secondary data have already been compiled

and are available for statistical analysis

A variable refers to any characteristic of an

individual or entity. A variable can take different

values for different individuals. Variables can be

categorical or quantitative.

A constant is a concept that has only a single, never-changing numerical value.

Statistical Data

Qualitative and quantitative variables may be further subdivided:

Variable
  Qualitative
    Nominal
    Ordinal
  Quantitative
    Discrete
    Continuous

Two kinds of variables:

Qualitative, or Attribute, or Categorical, Variable: A variable

that categorizes or describes an element of a population.

Quantitative, or Numerical, Variable: A variable that

quantifies an element of a population.

Experiment: The investigator controls or modifies the

environment and observes the effect on the variable

under study.

Survey: Data are obtained by sampling some of the

population of interest. The investigator does not modify

the environment.

Census: A 100% survey. A census collects information about every member of the population. Every element of the population is listed. Seldom used: difficult and time-consuming to compile, and expensive.

Methods to collect data

Consolidation of Data

• Time Series Data – ordered data values observed over time

• Cross Section Data – data values observed at a fixed point in time

STATISTICS

Statistics

a set of scientific tools used to collect, organize

and interpret numeric and non-numeric data and to

convert raw data into processed information

helpful to decision makers.

STATISTICS

• Science of data collection, summarization,

analysis and interpretation

• Descriptive versus Inferential Statistics:

– Descriptive Statistic: Data description

(summarization) such as center, variability and

shape.

– Inferential Statistic : Drawing conclusion

beyond the sample studied, allowing for prediction.

TYPES OF STATISTICS

• Descriptive statistics – Methods of organizing,

summarizing, and presenting data in an

informative way

• Inferential statistics – The methods used to

determine something about a population on the

basis of a sample

– Population –The entire set of individuals or objects of

interest or the measurements obtained from all

individuals or objects of interest

– Sample – A portion, or part, of the population of interest

Descriptive Statistics

• Descriptive Statistics consists of procedures

used to present and summarize the information in

a set of measurements to describe the

characteristics of the whole set (whether a sample

or the population).

• Commonly used techniques are graphical

description, tabular description, and summary

statistics

• Examples: Demographics

• Frequency distribution is one way to display data.

Frequency Distribution of Age

Age         1   2   3   4   5   6
Frequency   5   3   7   5   4   2

Grouped Frequency Distribution of Age

Age Group   1-2   3-4   5-6
Frequency    8    12     6
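As a minimal sketch, the two tables above can be rebuilt in Python from raw values. The list `ages` is a hypothetical raw data set, constructed only so that it matches the frequencies shown; the grouping rule (class width 2, starting at age 1) is likewise an assumption read off the table.

```python
from collections import Counter

# Hypothetical raw data: 26 ages chosen to match the frequency table above.
ages = [1] * 5 + [2] * 3 + [3] * 7 + [4] * 5 + [5] * 4 + [6] * 2

# Ungrouped frequency distribution: age -> number of children.
frequency = Counter(ages)

# Grouped frequency distribution with class width 2 (1-2, 3-4, 5-6).
grouped = Counter()
for age in ages:
    lower = age if age % 2 == 1 else age - 1   # lower limit of the class
    grouped[f"{lower}-{lower + 1}"] += 1

for age in sorted(frequency):
    print(age, frequency[age])
for group in sorted(grouped):
    print(group, grouped[group])
```

Grouping trades detail for compactness: the grouped table is shorter, but the single-year counts can no longer be recovered from it.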

• Frequency distribution – shows the frequency, or number of occurrences, in each of several categories. Frequency distributions are used to summarize large volumes of data values.

• Consider a data set of 26 children of ages 1-6 years. The frequency distribution of the variable 'age' can then be tabulated by single year of age or by age group.

• Statistics describes a categorical set of data by

• Frequency, percentage or proportion of each

category

• Statistics describes a numeric set of data by its

• Center (mean, median, mode etc)

• Variability (standard deviation, range etc)

• Shape (skewness, kurtosis etc)

Statistical Description of Data

Measure of Central Tendency:

• Mean is what most people call the average

• Derived by adding all the numbers together and then dividing by the total number of data values

• The mean is distorted if you have just one

extreme value which can be a problem

• However, it is the most commonly used as it can

be used for further mathematical processing
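A short sketch of the points above, using Python's standard `statistics` module. The household sizes are hypothetical values invented for illustration; the second list adds one extreme value to show how the mean is pulled toward it.

```python
from statistics import mean

# Hypothetical household sizes (illustrative values only).
household_sizes = [3, 4, 4, 5, 4]
with_outlier = household_sizes + [40]   # one extreme value added

print(mean(household_sizes))   # sum of the values / number of values
print(mean(with_outlier))      # pulled upward by the single extreme value
```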

• Mode is simply the most frequently occurring event

• If we are using simple numbers then the mode is the

most frequently occurring number

• If we are looking at data on the nominal scale

(grouped into categories) the mode is the most

common category.

• The mode is very quick to calculate, but it cannot be

used for further mathematical processing.

• It is not affected by extreme values.

Find the mode of this data set: 3,4,4,4,6,9

Find the mode of this nominal data:

land use -hectares

clover - 10

rye - 12

vegetables – 15

fruit - 3

wheat - 29

barley - 18

pasture - 17
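The two mode exercises above can be checked with a small sketch. For the land-use data, the code assumes the hectares act as the frequency of each category, so the modal category is the one with the largest area; that reading of the exercise is an assumption.

```python
from statistics import mode

# Numeric data from the first exercise.
numbers = [3, 4, 4, 4, 6, 9]
print(mode(numbers))            # most frequently occurring number

# Nominal data from the second exercise: land use -> hectares.
land_use = {"clover": 10, "rye": 12, "vegetables": 15, "fruit": 3,
            "wheat": 29, "barley": 18, "pasture": 17}

# Assumption: hectares are treated as the frequency of each category.
modal_category = max(land_use, key=land_use.get)
print(modal_category)
```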

• Median is the central value in a series of ranked

values.

• If there is an even number of values, the median

is the mid point between the two centrally placed

values.

• The median is not affected by extreme values

but it cannot be used for further mathematical

processing.

MEDIAN

Find the median of this data set: 3,4,4,4,6,9

(answer = 4)

Find the median of this data set: 3,4,4,6,6,9

(answer = 5)
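The two median exercises can be reproduced with a minimal sketch. The helper name `median_by_ranking` is hypothetical; it follows the rule stated above (rank the values, take the central one, or the midpoint of the two central ones when the count is even).

```python
def median_by_ranking(values):
    """Median as the central value of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd count: single central value
    return (ranked[mid - 1] + ranked[mid]) / 2   # even count: midpoint of the two

first = median_by_ranking([3, 4, 4, 4, 6, 9])
second = median_by_ranking([3, 4, 4, 6, 6, 9])
print(first, second)
```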

MEDIAN

Symmetrical and Asymmetrical Data

• It has been observed that the natural variation of many variables tends to follow a bell-shaped distribution, with most values clustered symmetrically near the mean and few values falling out on the tails. This is referred to as the normal distribution.

With a normally

distributed bell curve,

the mean, median

and mode all fall on

the same value.

A data set is asymmetrical (skewed) when its values are not spread evenly on both sides of the center; the mean, median and mode then generally differ.

Symmetrical and Asymmetrical Data

• Shape of data is measured by

– Skewness

– Kurtosis

Shape of Data

• Measures asymmetry

of data

– Positive or right

skewed: Longer right tail

– Negative or left

skewed: Longer left tail

Skewness

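Let $x_1, \dots, x_n$ be $n$ observations with mean $\bar{x}$ and standard deviation $S$. Then, in the common moment-based form, $\text{Skewness} = \dfrac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{S^3}$.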

Skewness

• Measures

peakedness of the

distribution of data.

The kurtosis of a normal distribution is 3 (equivalently, an excess kurtosis of 0).

Kurtosis

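Let $x_1, \dots, x_n$ be $n$ observations with mean $\bar{x}$ and standard deviation $S$. Then, in the common moment-based form, $\text{Kurtosis} = \dfrac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{S^4}$.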

Kurtosis
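A hedged sketch of the two shape measures, assuming the common moment-based definitions with the divisor n (population moments). The function names are hypothetical, and other textbooks and software apply small-sample corrections that give slightly different numbers.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (an assumption of this sketch)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** k for x in values) / n

def skewness(values):
    # Third central moment scaled by the cube of the standard deviation.
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # Fourth central moment scaled by the squared variance, minus 3 so that
    # a normal distribution scores 0.
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]     # longer right tail
left_tailed = [1, 10, 10, 10]    # longer left tail

print(skewness(symmetric), skewness(right_tailed), skewness(left_tailed))
print(excess_kurtosis(symmetric))
```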

- Variability (or dispersion) measures the amount

of scatter in a dataset.

- statistics that concern the degree to which the

scores in a distribution are different from or

similar to each other.

- Concerned with the spread of data

• Range, Standard Deviation, Variance,

Interquartile Range, Coefficient of Variation

B. Measure of Variability or

Central Dispersion:

• The difference between the largest and the

smallest observations.

• The distance between the highest score and the

lowest score in a distribution

• The range of 10, 5, 2, 100 is (100-2)=98. It is a crude measure of variability.

B. Measure of Variability or Dispersion:

• If we want to obtain

some measure of the

spread of our data

about its mean we

calculate its standard

deviation.

• Two sets of figures can

have the same mean

but very different

standard deviations

STANDARD DEVIATION

• Standard Deviation - the most commonly used measure of variability; it indicates the average amount by which the scores deviate from the mean.

• The higher the standard deviation,

the greater the spread of data

around the mean.

• The standard deviation is the best

of the measures of spread as it

takes into account all of the values

under consideration.

STANDARD DEVIATION

Variance: The variance of a set of observations is the average of the squares of the deviations of the observations from their mean; for a sample, the sum of squared deviations is divided by n - 1. In symbols, the variance of the n observations $x_1, x_2, \dots, x_n$ is

$S^2 = \dfrac{(x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$

Variance of 5, 7, 3? Mean is (5+7+3)/3 = 5 and the variance is

$\dfrac{(5-5)^2 + (3-5)^2 + (7-5)^2}{3 - 1} = \dfrac{0 + 4 + 4}{2} = 4$

Standard Deviation: Square root of the variance. The standard deviation of the above example is 2.
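The worked example (5, 7, 3) can be checked step by step. This sketch assumes the sample form of the variance, with n - 1 in the denominator, since that is the form that yields the standard deviation of 2 quoted above; `statistics.variance` and `statistics.stdev` use the same convention.

```python
from statistics import mean, stdev, variance

data = [5, 7, 3]

x_bar = mean(data)                                    # (5 + 7 + 3) / 3
squared_deviations = [(x - x_bar) ** 2 for x in data]
s2 = sum(squared_deviations) / (len(data) - 1)        # sample variance
s = s2 ** 0.5                                         # standard deviation

print(x_bar, squared_deviations, s2, s)
print(variance(data), stdev(data))                    # library cross-check
```

Dividing by n instead of n - 1 (as `statistics.pvariance` does) would give 8/3 rather than 4, so it matters which convention is meant.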

VARIANCE

Quartiles: Data can be divided into four regions that cover the total range of observed values. Cut points for these regions are known as quartiles.

The first quartile (Q1) is the cut point below which the first 25% of the data lie. The second quartile (Q2) closes the region between the 25th and 50th percentage points in the data; it is the median. The third quartile (Q3) closes the 25% of the data lying between the median and the 75% cut point in the data.

QUARTILES

Step 1: Put the numbers in order.

1,2,5,6,7,9,12,15,18,19,27

Step 2: Find the median.

1,2,5,6,7,9,12,15,18,19,27

Step 3: Place parentheses around the numbers

above and below the median.

Not necessary statistically, but it makes Q1 and Q3

easier to spot.

(1,2,5,6,7),9,(12,15,18,19,27)

QUARTILES

Step 4: Find Q1 and Q3

Q1 can be thought of as a median in the lower half of

the data. Q3 can be thought of as a median for the

upper half of data.

(1,2,5,6,7), 9, ( 12,15,18,19,27). Q1=5 and Q3=18.

Step 5: Subtract Q1 from Q3 to find the interquartile

range.

18-5=13.
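Steps 1 to 5 can be sketched as follows. The function name `quartiles_by_halves` is hypothetical; it implements the median-of-halves method described in the steps, leaving the median itself out of both halves when the number of values is odd.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)                     # Step 1: put the numbers in order
    n = len(data)
    half = n // 2
    lower = data[:half]                       # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]   # values above it
    return median(lower), median(data), median(upper)

numbers = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
q1, q2, q3 = quartiles_by_halves(numbers)
iqr = q3 - q1                                 # Step 5: interquartile range
print(q1, q2, q3, iqr)
```

Other quartile conventions exist and can give slightly different cut points on small data sets, so results from statistical software may not match this method exactly.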

Percentiles: If data are ordered and divided into 100 parts, then cut points are called Percentiles. The 25th percentile is Q1, the 50th percentile is the Median (Q2) and the 75th percentile of the data is Q3.

Deciles: If data is ordered and divided into 10

parts, then cut points are called Deciles

DECILES AND PERCENTILES

In notation, the $p$-th percentile of a data set is the $\frac{(n+1)}{100}\,p$-th observation of the ordered data, where p is the desired percentile and n is the number of observations of data.

Coefficient of Variation: The standard deviation of data divided by its mean. It is usually expressed in percent.

$\text{Coefficient of Variation} = \dfrac{S}{\bar{x}} \times 100$
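A minimal sketch of both formulas. The percentile helper places the p-th percentile at position (n + 1)p/100 in the ordered data; clamping the position to the data range and interpolating linearly when the position is not a whole number are assumptions of this sketch, not something the slide specifies. The function names are hypothetical.

```python
from statistics import mean, stdev

def percentile(values, p):
    """p-th percentile at position (n + 1) * p / 100 of the ordered data."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) * p / 100           # 1-based position in the ordering
    position = min(max(position, 1), n)    # assumption: clamp to the data range
    lower = int(position)
    fraction = position - lower
    if lower == n:
        return data[-1]
    # Assumption: linear interpolation between neighbouring observations.
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return stdev(values) / mean(values) * 100

ordered = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
print(percentile(ordered, 25), percentile(ordered, 50), percentile(ordered, 75))
print(coefficient_of_variation([5, 7, 3]))
```

Because the coefficient of variation is unit-free, it allows the spread of variables measured on different scales to be compared.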

DECILES AND PERCENTILES

Descriptive Statistics:

Data Presentation

• Scatter plots

• Diagrams

• Histograms

• Venn Diagrams

• Bar charts

• Line graphs

• Trend charts

• Pie Charts

• Flow Charts

Inferential Statistics

• Inferential statistics consists of procedures used

to infer (draw conclusions, make statements,

predict, decide) about certain characteristics of

one or more populations by examining

information contained in a sample from these

populations

• Inferential statistics help in reaching conclusions

that extend beyond the immediate data alone.

• The use of statistical tests, either to test for

significant relationships among variables or to

find statistical support for the hypotheses.

• This is based on the laws of probability.

Population vs. Sample

Population is the entire collection of things under

consideration.

• A parameter is a summary measure computed

to describe a characteristic of the population

Population vs. Sample

Sample is a portion of the population selected for

analysis.

• A statistic is a summary measure computed to

describe a characteristic of the sample

Sampling Techniques

SIMPLE RANDOM SAMPLING

• simple random sample (each sample of the same size

has an equal chance of being selected)

• Simple random sampling –using a random table of

numbers

• Simple random: units are randomly chosen from the

sampling frame

• Sampling frame is a list of all the individuals (units) in the

population from which the sample is taken.

Types of Probability Sampling

SYSTEMATIC RANDOM SAMPLING

• A sample in which every kth item of the sampling frame

is selected, starting from the first element which is

randomly selected from the first k elements.

• (randomly select a starting point and take every n-th piece of data from a listing of the population)

• number units within the sampling frame and select every 5th, 10th, etc.

• Systematic random sampling –take every 10th name
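Simple random and systematic random sampling can be sketched with Python's `random` module. The sampling frame of 100 household labels, the seed and the function names are all hypothetical and serve only to make the example repeatable.

```python
import random

def simple_random_sample(frame, size, rng):
    """Every unit in the frame has the same chance of selection."""
    return rng.sample(frame, size)

def systematic_sample(frame, k, rng):
    """Random start among the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]

rng = random.Random(2016)                                 # fixed seed (assumption)
frame = [f"household-{i:03d}" for i in range(1, 101)]     # hypothetical frame

srs = simple_random_sample(frame, 10, rng)
every_tenth = systematic_sample(frame, 10, rng)
print(srs)
print(every_tenth)
```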

STRATIFIED RANDOM SAMPLING

• A sample obtained by stratifying the sampling frame and then selecting a fixed number of items from each of the strata by means of a simple random sampling technique.

• (divide the population into groups called strata and then take a sample from each stratum)

• Stratified random: random sampling of units within categories (strata) that are assumed to exist within a population

• Stratified random sampling –divide or stratify by gender and sample within group
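A sketch of stratified random sampling, following the gender example above. The population of 60 numbered individuals, the seed and the function name `stratified_sample` are hypothetical; a fixed number of units is drawn at random from each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Split units into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

rng = random.Random(2016)                       # fixed seed (assumption)
# Hypothetical population of (id, gender) pairs.
population = [(i, "female" if i % 2 else "male") for i in range(1, 61)]

sample = stratified_sample(population, lambda unit: unit[1], 5, rng)
for gender, members in sample.items():
    print(gender, members)
```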

CLUSTER SAMPLING

• A sample obtained by stratifying the sampling frame and then selecting some or all of the items from some of, but not all, the strata.

• (divide the population into strata and then randomly select some of the strata. All the members from these strata are in the cluster sample.)

• Clusters (each with multiple units) within a sampling frame are randomly selected.

• Cluster sampling – select units (clusters) in order to access patients or nurses
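Cluster sampling differs in that whole groups are selected at random and every member of a selected group is included. The sketch below uses hypothetical clusters of five units each; the names, seed and function are illustrative assumptions.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep all units inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(2016)                       # fixed seed (assumption)
# Hypothetical clusters: cluster name -> list of units in that cluster.
clusters = {f"cluster-{c}": [f"c{c}-unit-{u}" for u in range(1, 6)]
            for c in range(1, 9)}

selected = cluster_sample(clusters, 3, rng)
for name, units in selected.items():
    print(name, units)
```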

• Convenience sampling: selection based on

availability or ease of inclusion (first persons to walk in

the door)

• Purposive sampling: selection of individuals from

whom you may be inclined to get more data (patients

living with an illness)

• Quota sampling: selection on the basis of categories that are assumed to exist within a population; a sample obtained by dividing the population into categories and then selecting a number of items in proportion to the size of each category (or by a fixed quota), without random selection within the category.

Quota – equal numbers of men & women

Types of Non-Probability

Sampling

• Estimation

– e.g., Estimate the population

mean weight using the sample

mean weight

• Hypothesis testing

– e.g., Test the claim that the

population mean weight is 70

Inference is the process of drawing conclusions or making decisions about a population based on sample results

Hypothesis Testing

• A hypothesis is an educated guess or predictive

statement expressed in falsifiable form that links

variables derived from theory and implies some

relationship of cause and effect.

• Null hypothesis is the hypothesis of no

difference or where relationship between

variables cannot be found

• Type I error

– Claiming a difference between two samples when in fact there is none.

– Also called the α error.

– Typically 0.05 is used

ERRORS

• Type II error

– Claiming there is no difference between two samples when in fact there is.

– Also called a β error.

                                     Test Result
Truth                      Null Hypothesis    Alternative Hypothesis
Null Hypothesis            No Error           Type I
Alternative Hypothesis     Type II            No Error

ERRORS

• Null hypothesis and Alternative hypothesis

Real Situation

              Ho is true                 Ho is false
Reject Ho     Type I error (α)           Correct Decision
Accept Ho     Correct Decision (1- α)    Type II Error (β)

Hypothesis Testing

Level of Significance

• An important factor in determining the

representativeness of the sample population and

the degree to which the chance affects the

findings.

• The level of significance is a numerical value

selected by the researcher before data collection

to indicate the probability of erroneous findings

being accepted as true. This value is

represented typically as 0.01 or 0.05

• The probability of making a type I error

• If the researcher wants to assume a smaller risk, the level is set at 0.01

• This means the researcher is willing to be wrong only once in 100 trials

• The decision to use alpha level 0.05 or 0.01 depends on the significance of the study.

• Decreasing the risk of making a type I error

increases the risk of making a type II error.
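The meaning of the significance level can be illustrated with a small simulation, offered as a sketch under stated assumptions: samples are drawn from a population whose mean really is 70 (so the null hypothesis is true), the standard deviation of 10 is assumed known, and a two-sided z-test is applied. The share of samples that are wrongly rejected should then be close to the chosen alpha. All numbers other than 70, 0.05 and 0.01 are hypothetical.

```python
import random
from statistics import NormalDist, mean

def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided z-test with known sigma; True when H0 (mean == mu0) is rejected."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def type_i_error_rate(alpha, trials=4000, n=30, seed=1):
    """Share of true-null samples that the test rejects anyway."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the population mean really is 70.
        sample = [rng.gauss(70, 10) for _ in range(n)]
        if z_test_rejects(sample, mu0=70, sigma=10, alpha=alpha):
            rejections += 1
    return rejections / trials

rate_05 = type_i_error_rate(0.05)
rate_01 = type_i_error_rate(0.01)
print(rate_05, rate_01)
```

Lowering alpha from 0.05 to 0.01 makes false rejections rarer, at the price of a stricter threshold that real differences must also clear, which is the Type I versus Type II trade-off noted above.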

Uses of Inferential Analysis :

1. T-test - is used to examine the difference

between the means of two independent groups.

2. Analysis of Variance (ANOVA) / F-tests- is

used to test the significance of differences

between means of two or more groups.

3. Chi-square - this is used to test hypotheses

about the proportion of elements that fall into

various cells of a contingency table

Comparison of 2 Sample Means

– Assumes normally distributed continuous data.

T value = (difference between means) / (standard error of the difference)

• The T value is then looked up in a table to determine significance
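A hedged sketch of the T value for two independent groups. It assumes the pooled-variance (equal variances) form of the standard error, which is one common choice and not necessarily the one intended on the slide; the two groups are hypothetical values and the function name is illustrative.

```python
from statistics import mean, variance

def two_sample_t(group_a, group_b):
    """t = difference between means / standard error of the difference."""
    n_a, n_b = len(group_a), len(group_b)
    # Assumption: both groups share one variance, estimated by pooling.
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    t_value = (mean(group_a) - mean(group_b)) / standard_error
    degrees_of_freedom = n_a + n_b - 2
    return t_value, degrees_of_freedom

# Hypothetical measurements for two independent groups.
t_value, df = two_sample_t([5, 7, 3], [8, 10, 12])
print(t_value, df)
```

The resulting T value and degrees of freedom are what would be looked up in a t table at the chosen significance level.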

STATISTICAL TESTS : T-test

• Used to determine if two or more samples are

from the same population- the null hypothesis.

• Usually used for 3 or more samples.

• If it appears they are not from the same population, the test cannot tell which sample is different.
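As a sketch of what the F-test in a one-way ANOVA computes, the function below compares the variation between group means with the variation within groups. The three groups are hypothetical values and the function name is illustrative; the resulting F would still need to be compared with an F table.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical measurements for three groups.
f_value, df_between, df_within = one_way_anova_f([5, 7, 3], [8, 10, 12], [6, 8, 10])
print(f_value, df_between, df_within)
```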

STATISTICAL TESTS :

Analysis of Variance (ANOVA)

• Used to compare observed proportions of an

event compared to expected.

• Used with nominal data (better/ worse;

dead/alive)

• If there is a substantial difference between

observed and expected, then it is likely that the

null hypothesis is rejected.

• Often presented graphically as a 2 X 2 Table

STATISTICAL TESTS :

(Pearson's) Chi-Squared (χ²) Test

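• Chi-Squared (χ²) Formula: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O is an observed count and E the corresponding expected count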

• Not applicable in small samples
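A minimal sketch of Pearson's chi-squared statistic for a contingency table, assuming the usual rule that each expected count is (row total × column total) / grand total. The 2 X 2 counts are hypothetical and the function name is illustrative.

```python
def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    return statistic

# Hypothetical 2 X 2 table: rows = groups, columns = better / worse.
table = [[10, 20],
         [20, 10]]
chi2 = chi_squared(table)
print(chi2)
```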

STATISTICAL TESTS :

(Pearson's) Chi-Squared (χ²) Test

• Assesses the linear relationship between two variables – Example: height and weight

• Strength of the association is described by a correlation coefficient- r

• r = 0 - .2 low, probably meaningless

• r = .2 - .4 low, possible importance

• r = .4 - .6 moderate correlation

• r = .6 - .8 high correlation

• r = .8 - 1 very high correlation

• Can be positive or negative

• Pearson's, Spearman correlation coefficient

• Tells nothing about causation

CORRELATION ANALYSIS

• Finding the relationship between two quantitative

variables without being able to infer causal relationships

• Correlation is a statistical technique used to determine

the degree to which two variables are related

• Correlation measures to what extent two (or more)

variables are related

– Correlation expresses a relationship that is not necessarily

precise (e.g. height and weight)

– Positive correlation indicates that the two variables move in the

same direction

– Negative correlation indicates that they move in opposite

directions

CORRELATION ANALYSIS

• It is also called Pearson's correlation

• It measures the nature and strength between two variables

of the quantitative type.

• Statistic showing the degree of relation between two variables

• The correlation coefficient r gives a measure (in the range –1, +1) of the relationship between two variables

– r=0 means no correlation

– r=+1 means perfect positive correlation

– r=-1 means perfect negative correlation

• Perfect correlation indicates that all points lie exactly on a straight line, so a change in x always corresponds to a proportional change in y
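The correlation coefficient r can be computed directly from its definition. This is a sketch with a hypothetical function name and made-up data, chosen so that the two perfect cases are easy to recognise.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and +1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])    # y rises exactly with x
r_negative = pearson_r(x, [10, 8, 6, 4, 2])    # y falls exactly as x rises
r_mixed = pearson_r(x, [2, 1, 4, 3, 5])        # imperfect positive relation
print(r_positive, r_negative, r_mixed)
```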

CORRELATION COEFFICIENT

If the sign is POSITIVE this means the relation is direct (an increase in one variable is associated with an increase in the other variable and a decrease in one variable is associated with a decrease in the other variable).

While if the sign is NEGATIVE this means an inverse or indirect relationship (which means an increase in one variable is associated with a decrease in the other).

The value of r ranges between ( -1) and ( +1)

The value of r denotes the strength of the

association as illustrated

by the following diagram.

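r = -1: perfect correlation (indirect)
-1 to -0.75: strong (indirect)
-0.75 to -0.25: intermediate (indirect)
-0.25 to 0: weak (indirect)
r = 0: no relation
0 to 0.25: weak (direct)
0.25 to 0.75: intermediate (direct)
0.75 to 1: strong (direct)
r = 1: perfect correlation (direct)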

• Level of correlation (value of the correlation coefficient): indicates to what extent the two variables “move together”

• Significance of correlation (p value): given that the correlation coefficient is computed on a sample, indicates whether the relationship appears to be statistically significant

• Examples

– Correlation is 0.50, but not significant: the sampling error is so high that the actual correlation could even be 0

– Correlation is 0.10 and highly significant: the level of correlation is very low, but we can be confident in the value of such correlation

SIGNIFICANCE LEVEL IN CORRELATION

• Two main ways to assess study precision and the role of chance in a study.

– P value: measures (as a probability) the evidence against the null hypothesis.

– Confidence interval (CI): an interval within which the value of the parameter lies with a specified probability

– E.g. 95% CI implies that if one repeats a study 100 times, the true measure of association will lie inside the CI in about 95 out of the 100 repetitions

P-Value versus the Confidence Interval

• Based on fitting a line to data – Provides a regression coefficient, which is the slope

of the line

• Y = ax + b

– Used to predict a dependent variable's value based on the value of an independent variable.

• Very helpful- In analysis of height and weight, for a known height, one can predict weight.

• Much more useful than correlation – Allows prediction of values of Y rather than just whether there is a relationship between two variables.

REGRESSION ANALYSIS

Linear regression analysis

$y_i = \alpha + \beta x_i$

where $y_i$ is the dependent variable, $\alpha$ the intercept, $\beta$ the regression coefficient and $x_i$ the independent variable (explanatory variable, regressor…)

Regression Analyses

• The process of predicting variable Y using variable X

• Uses a variable (x) to predict some outcome

variable (y)

• Tells you how values in y change as a function of

changes in values of x

Linear Regression

[Figure: scatter plot with fitted regression line. Cholesterol (mg/100 ml) = 140.36 + 4.58 * age; R-Square = 0.65]

The objective is to

identify the line (i.e.

the a and b

coefficients) that

minimise the distance

between the actual

points and the fit line

What regression analysis does

• Determines whether a relationship exists between the dependent and explanatory variables

• Determines how much of the variation in the dependent variable is explained by the independent variable (goodness of fit)

• Allows prediction of the values of the dependent variable
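The three tasks listed above can be sketched with an ordinary least-squares fit of a straight line. The function name `fit_line` and the data are hypothetical; the example points lie exactly on the line y = 3 + 2x, so the fit recovers those coefficients and the goodness of fit (R-Square) equals 1.

```python
def fit_line(x, y):
    """Least-squares intercept, slope and R-Square for one explanatory variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_total = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_residual / ss_total     # share of variation explained
    return intercept, slope, r_squared

# Hypothetical data lying exactly on y = 3 + 2x.
x = [1, 2, 3, 4, 5]
y = [5, 7, 9, 11, 13]
intercept, slope, r_squared = fit_line(x, y)

# Prediction for a new value of the explanatory variable.
prediction = intercept + slope * 6
print(intercept, slope, r_squared, prediction)
```

With real, scattered data the residuals are not zero and R-Square falls below 1, as in the cholesterol example where it is 0.65.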

Correlation and Regression

• Correlation: there is no causal relationship

assumed

• Regression: we assume that the

explanatory variables “cause” the

dependent variable

Correlation and Regression

• Correlation describes the strength of a linear

relationship between two variables

• Linear means “straight line”

• Regression tells us how to draw the straight line

described by the correlation

Regression

Calculates the “best-fit” line for a certain set of data

The regression line makes the sum of the squares of the residuals smaller than for any other line

Regression minimizes residuals

[Figure: scatter plot of SBP (mmHg) against Wt (kg) with the fitted regression line]

Qualitative Data Analysis

• Comparative

Analysis

• Gender Analysis

• Cause and Effect

Diagrams / Impact

Chain Analysis

• Problem Tree

Analysis

• Fault-Tree Analysis

• Flow Charts

• Affinity Grouping

Qualitative Research Methods

Interviews

• Ethnographic interviews (Spradley, 1979)

• Contextual interviews (Holtzblatt and Jones, 1995)

Ethnographic observation (Spradley, 1980)

Participatory design sessions (Sanders, 2005)

Field deployments

Qualitative Research Methods

• Windshield Survey and Field Surveys

• Rapid Rural Appraisal

• Transect Mapping

• Problem Identification through imagery

• Experts' opinion – Delphi technique

• Key Informant Interviews – local experts, sectoral leaders

• Focus Group Discussions

• Brainstorming with key stakeholders

• Case Studies

THANK YOU!
