DATA ANALYSIS IN RESEARCH
Dr. N. NAGESHA, Professor
Dept. of Industrial & Production Engineering
University BDT College of Engineering
(A Constituent College of VTU, Belagavi)
DAVANAGERE – 577 004
18 March 2015
Presentation Plan
Introduction (Definition of Research)
Classification of Research
Research Methodology Vs Methods
Process of Research
Research Design
Data analysis
(Univariate, Bivariate, and Multivariate)
Dr.NN, VTU_3DWSRML_ROB
What is Research?
“Research is the systematic approach to obtaining and confirming new and reliable knowledge”
– Systematic and orderly (following a series of steps)
– Purpose is new knowledge, which must be reliable
This is a general definition which applies to all disciplines
Research is not ……
Accidental Discovery
1. Accidental discovery may occur in a structured research process
2. It usually takes the form of a phenomenon not previously noticed
Mere Data Collection
1. Data collection is an intermediate step toward gaining reliable knowledge
2. Collecting reliable data is part of the research process
Searching Out Published Research Results in Libraries/Internet
1. This is an important early step of research
2. The research process always includes synthesis and analysis
3. But merely reviewing the literature is not research
But research is…
Searching for Explanation of Events, Phenomena, Relationships and Causes:
1. What, how, and why things occur
2. Whether there are interactions
A Process:
1. Planned and managed – to make the information generated credible
2. The process is creative
3. All well-designed and well-conducted research has potential application
4. Researchers are also responsible for helping users understand research implications
Types of Research
Basic Vs Applied Research
• Basic – to determine or establish fundamental facts and relationships within a discipline or field of study. It develops theories.
• Applied – undertaken specifically for the purpose of obtaining information to help resolve a particular problem.
• The distinction between them lies in application: basic research has little direct application to real-world policy and management, but it can be done to guide applied research.
Classification of Research
• Descriptive Research – the attempt to determine, describe, or identify something
– The intent is often synthesis, which pulls knowledge or information together
• Analytic – the attempt to establish why something occurs or how it came to be
All disciplines generally engage in both
Methodology Vs Methods of Research
Methodology and method are often (incorrectly) used interchangeably
•Methodology – the study of the general approach to inquiry in a given field.
•Method – the specific techniques, tools or procedures applied to achieve a given objective
Research methods include regression analysis, mathematical analysis, operations research (OR), design of experiments (DOE), surveys, data gathering, etc.
The Process of Research
1. Selection of topic
2. Reviewing the literature
3. Development of theoretical and conceptual frameworks
4. Clarification of research question/hypothesis
5. Research design
6. Data collection
7. Data analysis
8. Drawing conclusions
[Figure: The research cycle. Theory; examine a social relationship and study the relevant literature; asking the research question; formulating the hypotheses; develop a research design; collecting data; analyzing data; evaluating the hypotheses; contribute new evidence to the literature and begin again.]
The Process of Research
• Research is a creative process
• “…research includes far more than mere logic … It includes insight, genius, groping, pondering – ‘sense’ … The logic we can teach; the art we cannot”
• Research requires (or at least works best with) imagination, initiative, intuition, and curiosity.
• There are different types of creativity, characteristic of different situations; “applied” and “theoretical” creativity are most closely associated with research.
Fostering Creativity (Ladd 1987)
A. Gather and use previously developed knowledge
B. Exchange ideas
C. Apply deductive logic
D. Look at things in alternate ways
E. Question or challenge assumptions
F. Search for patterns or relationships
G. Take risks
H. Cultivate tolerance for uncertainty
Creativity may provide the difference between satisfactory and outstanding research
Research Design
• Decisions regarding what, where, when, and how much
• A research design is the arrangement of conditions for collection and analysis of data in a manner that aims to combine relevance to the research purpose with economy in procedure.
Research Design
• The design decisions concern:
– (i) What is the study about?
– (ii) Why is the study being made?
– (iii) Where will the study be carried out?
– (iv) What type of data is required?
– (v) Where can the required data be found?
– (vi) What periods of time will the study include?
– (vii) What will be the sample design?
– (viii) What techniques of data collection will be used?
– (ix) How will the data be analyzed?
– (x) In what style will the report be prepared?
Research Design
• Research design must, at least, contain:
– (a) a clear statement of the research problem;
– (b) procedures and techniques to be used for gathering information;
– (c) the population to be studied;
– (d) methods to be used in processing and analyzing data
Data Collection
Researchers must decide three things:
– How to measure the variables of interest
– How to select the cases for the research
– What kind of data collection techniques to use
Levels of Measurements
Not every statistical operation can be used with every variable. The type of statistical operations one can employ depends on how the variables are measured.
Measurement Scales:
Nominal, Ordinal, Interval, Ratio
Measurement Scales
Nominal - Numbers or other symbols are assigned to a set of categories for the purpose of naming, labeling, or classifying the observations. (Examples: place, name, religion)
Ordinal - Nominal variables that can also be ranked from low to high. (Statistics: median, percentiles)
Interval - Variables where measurements for all cases are expressed in the same units, with equal intervals between values but no true zero point. Likert scales are often treated as interval. (Statistics: mean, median, SD)
Ratio - Variables with a natural zero point, such as height and weight, are on a ratio scale. (Statistics: AM, GM, HM, SD, variance, CV, etc.)
Variable Types
• Discrete variables: Variables that have a minimum-sized unit of measurement, which cannot be sub-divided. Example: the number of refrigerators sold per month.
• Continuous variables: Variables that, in theory, can take on all possible numerical values in a given interval. Example: length, volume, pressure, etc.
• Dependent and independent variables
– If one variable depends upon or is a consequence of another variable, it is called a dependent variable
– The variable that is antecedent to the dependent variable is called an independent variable
A concept which can take on different quantitative values is a variable
Variable Types
• Extraneous variable:
– Independent variables that are not related to the purpose of the study, but may affect the dependent variable.
– Whatever effect is noticed on the dependent variable as a result of extraneous variable(s) is technically described as ‘experimental error’.
– One important characteristic of a good research design is that it minimizes the influence or effect of extraneous variable(s).
• Confounded relationship
– When the dependent variable is not free from the influence of extraneous variable(s), the relationship between the dependent and independent variables is said to be confounded by the extraneous variable(s).
Sampling
• Population: The total set of individuals, objects, groups, or events in which the researcher is interested.
• Sample: A relatively small subset selected from a population.
[Figure: a sample drawn from a population]
Sampling
• Two basic types: probability and non-probability
• Probability sampling can include random sampling, stratified random sampling, and cluster sampling
• Non-probability sampling can include quota sampling, haphazard sampling, and convenience sampling
Random Sampling/Stratified Random Sampling
Random Sampling
•Every unit has an equal chance of selection
•Although it is relatively simple, members of specific subgroups may not be included in appropriate proportions
Stratified Random Sampling
•The population is grouped according to meaningful characteristics or strata
•This method is more likely to reflect the general population, and subgroup analysis is possible
•However, it can be time consuming and costly
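The two probability designs above can be sketched with Python's standard `random` module. This is a minimal illustration, not a prescribed procedure: the function names, the shift-labelled population, and the use of proportional allocation (each stratum contributes in proportion to its size) are assumptions made for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(units, n, seed=None):
    # Every unit has an equal chance of selection (sampling without replacement).
    rng = random.Random(seed)
    return rng.sample(units, n)

def stratified_sample(units, stratum_of, n, seed=None):
    # Group units into strata, then sample each stratum in proportion
    # to its share of the population (proportional allocation).
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    total = len(units)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)  # rounding can shift the total by a unit or two
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 day-shift and 400 night-shift workers.
population = [("day", i) for i in range(600)] + [("night", i) for i in range(400)]
srs = simple_random_sample(population, 50, seed=1)
strat = stratified_sample(population, lambda u: u[0], 50, seed=1)
print(sum(1 for u in strat if u[0] == "day"), sum(1 for u in strat if u[0] == "night"))  # 30 20
```

The stratified sample always contains 30 day-shift and 20 night-shift workers, whereas the simple random sample's split varies from draw to draw; this is the "appropriate proportions" advantage noted above.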
Systematic Sampling
• Every nth unit is selected (e.g., every 10th piece in a stock of flats may be selected)
• The method is convenient and close to random sampling if the starting point is randomly chosen
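A minimal sketch of systematic sampling with a random starting point, assuming the sampling interval is taken as k = N // n (the population size divided by the sample size, rounded down). The numbered stock of 1000 pieces and the function name are hypothetical.

```python
import random

def systematic_sample(units, n, seed=None):
    # Sampling interval k = N // n; choose a random start in [0, k),
    # then take every k-th unit from there.
    k = len(units) // n
    if k < 1:
        raise ValueError("sample size exceeds population size")
    start = random.Random(seed).randrange(k)
    return units[start::k][:n]

stock = list(range(1, 1001))  # hypothetical stock of 1000 numbered pieces
picked = systematic_sample(stock, 100, seed=7)
print(picked[:3])
```

Randomizing only the start keeps the method as convenient as a fixed rule while avoiding a deliberately chosen first unit. If the list has a periodic pattern that matches k, however, the sample can still be biased.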
Cluster/Multi-Stage Sampling
•Natural groups are sampled and then their members are sampled
•This method is convenient and can use existing units
Convenience Sampling
• This method uses readily available groups or units of individuals
• It is practical and easy to use
• However, it may produce a biased sample
• Convenience sampling is acceptable if the purpose of the research is to test a hypothesis that certain variables are related to one another
Snowball Sampling
•Previously identified members identify others
•This method is useful when a list of potential names is difficult to obtain
•However, it may produce a biased sample
Quota Sampling
•The population is divided into subgroups and the sample is selected based on the proportions of the subgroups necessary to represent the population
•This method depends on reliable data about the proportions in the population
How Large Should a Sample Be?
Sample size can be determined using a formula that statisticians have developed. The required size depends on:
@ How confident the researcher wants to be (usually 95%),
@ How much error can be tolerated (usually between 2-5%),
@ An estimate of how much variability (or heterogeneity) exists in the population,
@ The size of the actual population to be studied.
A. Required sample size may be estimated using the following formula for a finite population (Kothari, 2001):
$$n = \frac{Z^2 \cdot N \cdot p^2}{(N-1)\,e^2 + Z^2 \cdot p^2}$$
where: n = Size of the sample required for a given precision and confidence level
N = Finite population size;
Z = Std. variate at a given confidence level (1.96 for 95% & 2.57 for 99% confidence level)
e = Acceptable error or the precision required (About 5% of mean value)
p = Std. deviation of the population (estimated through pilot study or past experience)
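Formula A can be evaluated directly. The sketch below is an illustration only: the function name and the inputs (N = 1000, a population standard deviation of 10, an acceptable error of 1 unit, Z = 1.96 for 95% confidence) are hypothetical. The result is rounded up because a fractional respondent is not possible.

```python
import math

def sample_size_finite(N, z, sigma, e):
    # n = Z^2 * N * sigma^2 / ((N - 1) * e^2 + Z^2 * sigma^2)
    # sigma is the population standard deviation (written p in formula A).
    numerator = z ** 2 * N * sigma ** 2
    denominator = (N - 1) * e ** 2 + z ** 2 * sigma ** 2
    return math.ceil(numerator / denominator)

print(sample_size_finite(N=1000, z=1.96, sigma=10, e=1))     # 278
print(sample_size_finite(N=10**6, z=1.96, sigma=10, e=1))    # 385
```

As N grows, the (N − 1)e² term dominates the denominator and n approaches Z²σ²/e² (about 384.2 here, so 385 after rounding up). For populations that are small relative to that value, the finite-population form noticeably reduces the required sample.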
How Large Should a Sample Be?
B. The most conservative sample size for a confidence level of 90% and an error of 5% follows from z·σp = 0.05, where σp = standard error of the proportion = √(pq/n), n = sample size, p = binomial probability of success, and q = 1 − p.
For a 90% confidence level z = 1.64, so 1.64·√(pq/n) = 0.05, i.e. n = (1.64)²·pq/(0.05)². Taking p = q = 0.5, which maximizes pq and therefore gives the largest (most conservative) n: n = 2.6896 × 0.25/0.0025 ≈ 268.96, so n = 269.
Another strategy considers the number of variables being examined and the number of hypotheses being tested. A good rule of thumb is that the more variables and hypotheses the study involves, the larger the sample needs to be.
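The proportion-based calculation in B can be reproduced in a few lines. This is a minimal sketch and the function name is hypothetical. It solves z·√(pq/n) = e for n and rounds up; p defaults to 0.5, the most conservative choice.

```python
import math

def sample_size_proportion(z, e, p=0.5):
    # From z * sqrt(p*q/n) = e  =>  n = z^2 * p * q / e^2, with q = 1 - p.
    # p = 0.5 maximizes p*q, so it gives the largest (most conservative) n.
    q = 1 - p
    return math.ceil(z ** 2 * p * q / e ** 2)

print(sample_size_proportion(z=1.64, e=0.05))  # 269 (90% confidence)
print(sample_size_proportion(z=1.96, e=0.05))  # 385 (95% confidence)
```

Any other value of p lowers pq and hence n. For example, p = 0.2 at 90% confidence gives 1.64² × 0.16 / 0.0025 ≈ 172.1, i.e. 173.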
Analysis of Data
• The data may be collected by surveys, interviews, literature review, participant observation, experiments, simulation studies, etc.
• The measurements obtained in a research study are called the data.
• The goal of statistics is to help researchers organize and interpret the data.
Analysis of Data
• Descriptive statistics: Procedures that help us organize and describe data collected from either a sample or a population.
• Inferential statistics: The logic and procedures concerned with making predictions or inferences about a population from observations and analyses of a sample.
Overview: Quantitative Analysis
Univariate Data Analysis comprises:
1. Frequency Tables
2. Diagrams
3. Measures of Central Tendency
4. Measures of Dispersion
5. Measures of Skewness
6. Measures of Kurtosis
Data Analysis in Research may be Univariate, Bivariate, or Multivariate
Frequency Tables
A frequency table provides the number (frequency) of people (respondents) and the percentage belonging to each category of the variable in question. It can be used with all types of variable. An example of a frequency table is shown below:
School   Students (N)   Students (%)   Staff (N)   Staff (%)
A        50             16.4           34          19.7
B        52             17.1           40          23.1
C        27             8.9            31          17.9
D        54             17.8           38          22.0
E        47             15.5           30          17.3
TOTAL    304            100            173         100
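A frequency table like the one above can be built from raw responses by counting each category and dividing by the total. This minimal sketch uses `collections.Counter`. The raw staff responses are reconstructed from the Staff (N) counts in the table, one school label per respondent; the function name is hypothetical.

```python
from collections import Counter

def frequency_table(responses):
    # Count each category, then express each count as a percentage of the total.
    counts = Counter(responses)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 1)) for cat, n in sorted(counts.items())}

# Staff responses rebuilt from the Staff (N) column (one label per respondent).
staff = ["A"] * 34 + ["B"] * 40 + ["C"] * 31 + ["D"] * 38 + ["E"] * 30
table = frequency_table(staff)
for school, (n, pct) in table.items():
    print(f"{school:<8}{n:>5}{pct:>8.1f}")
print(f"{'TOTAL':<8}{sum(n for n, _ in table.values()):>5}")
```

The printed percentages reproduce the Staff (%) column (19.7, 23.1, 17.9, 22.0, 17.3).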
Diagrams
Measures of Central Tendency
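• Arithmetic Mean: $AM = \frac{\sum f_i X_i}{\sum f_i}$. This is the average as we understand it in everyday use.
• Geometric Mean: $GM = (X_1 X_2 X_3 \cdots X_n)^{1/n}$; for frequency data, $\log G = \frac{1}{N}\sum f_i \log X_i$ with $N = \sum f_i$. It is suitable when the data are ratios or represent rates of change. Examples: rates of growth of population, rates of growth of industrial production.
• Harmonic Mean: $HM = \frac{\sum f_i}{\sum (f_i / X_i)}$. This is most suitable for averaging rates, e.g. the average speed of a vehicle over equal distances.
• Median: the midpoint in a distribution of values.
• Mode: the value that occurs most frequently in a distribution.
• Partition Values (Quartiles, Deciles, Percentiles) for grouped data: $Q = l + \frac{(iN/k) - F}{f}\,h$, where $l$ is the lower limit of the class containing the partition value, $N$ the total frequency, $F$ the cumulative frequency before that class, $f$ the frequency of that class, $h$ the class width, and $i/k$ the required fraction ($k = 4$ for quartiles, $10$ for deciles, $100$ for percentiles).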
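The frequency-weighted means and the grouped-data partition value can be sketched in plain Python. The function names, the two-value example, the 60/40 km/h trip, and the grouped classes are all hypothetical. `partition_value` assumes contiguous classes given by their boundaries.

```python
import math

def weighted_am(x, f):
    # AM = sum(f_i * X_i) / sum(f_i)
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)

def weighted_gm(x, f):
    # log G = (1/N) * sum(f_i * log X_i), N = sum(f_i); every X_i must be positive
    return math.exp(sum(fi * math.log(xi) for xi, fi in zip(x, f)) / sum(f))

def weighted_hm(x, f):
    # HM = sum(f_i) / sum(f_i / X_i); every X_i must be non-zero
    return sum(f) / sum(fi / xi for xi, fi in zip(x, f))

def partition_value(edges, freqs, i, k):
    # Grouped data: l + ((i*N/k - F) / f) * h, evaluated in the class that
    # contains the (i*N/k)-th observation; edges[j], edges[j+1] bound class j.
    N = sum(freqs)
    target = i * N / k
    F = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in zip(edges, edges[1:], freqs):
        if f > 0 and F + f >= target:
            return lower + (target - F) / f * (upper - lower)
        F += f
    raise ValueError("i/k must lie between 0 and 1")

print(weighted_am([2, 8], [1, 1]), weighted_gm([2, 8], [1, 1]), weighted_hm([2, 8], [1, 1]))
# Average speed over two equal distances driven at 60 km/h and 40 km/h (about 48 km/h):
print(weighted_hm([60, 40], [1, 1]))
# Hypothetical grouped data with classes 0-10, 10-20, 20-30, 30-40:
edges, freqs = [0, 10, 20, 30, 40], [5, 10, 20, 5]
print([partition_value(edges, freqs, i, 4) for i in (1, 2, 3)])  # [15.0, 22.5, 27.5]
```

For the values 2 and 8, the three means come out near 5, 4 and 3.2, which shows the usual ordering AM ≥ GM ≥ HM. The speed example gives 48 km/h rather than the arithmetic 50 km/h, because more time is spent at the lower speed.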
Measures of Dispersion
• Range: $X_h - X_l$. This is simply the difference between the maximum and minimum values in a distribution of an interval- or ratio-scale variable.
• Quartile Deviation: $(Q_3 - Q_1)/2$. This is half the distance between the third and first quartiles (half the interquartile range).
• Mean Deviation: $\frac{\sum f_i |X_i - \bar X|}{\sum f_i}$. This is the arithmetic average of the absolute deviations around the mean.
• Standard Deviation: $\sqrt{\frac{\sum f_i (X_i - \bar X)^2}{\sum f_i}}$. This is essentially the average amount of variation around the mean.
• Variance: $\frac{\sum f_i (X_i - \bar X)^2}{\sum f_i}$. This is the average of the squared deviations around the mean.
• Coefficient of Variation: $SD/AM$ (often expressed as a percentage). Because it is unit-free, it allows variability to be compared across data measured in different units.
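The dispersion measures can be computed together for frequency data. This is a minimal sketch and the data are hypothetical. It uses the population divisor Σfᵢ, as in the formulas, and takes the quartiles from `statistics.quantiles` (Python 3.8+, default "exclusive" method). Other quartile conventions give slightly different quartile deviations.

```python
import math
import statistics

def dispersion(x, f):
    # Frequency-weighted measures with divisor sum(f_i).
    N = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / N
    variance = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f)) / N
    sd = math.sqrt(variance)
    data = [xi for xi, fi in zip(x, f) for _ in range(fi)]  # expand to raw observations
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "range": max(data) - min(data),
        "quartile_deviation": (q3 - q1) / 2,
        "mean_deviation": sum(fi * abs(xi - mean) for xi, fi in zip(x, f)) / N,
        "variance": variance,
        "sd": sd,
        "cv": sd / mean,
    }

# Hypothetical data: values 2, 4, 5, 7, 9 observed 1, 3, 2, 1, 1 times
print(dispersion([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))
# range 7, quartile deviation 1.25, mean deviation 1.5, variance 4.0, SD 2.0, CV 0.4
```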
Measures of Skewness
Skewness is a measure of asymmetry in a distribution.
Karl Pearson’s coefficient of skewness = (A.M − Mode)/SD
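A minimal sketch of Karl Pearson's coefficient of skewness using the standard `statistics` module. The data are hypothetical and are assumed to have a single mode; `statistics.pstdev` gives the population SD.

```python
import statistics

def pearson_skewness(data):
    # Karl Pearson's coefficient: (mean - mode) / standard deviation
    return (statistics.mean(data) - statistics.mode(data)) / statistics.pstdev(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(pearson_skewness(data))  # 0.5
```

Here the mean (5) lies above the mode (4), so the coefficient is positive, indicating a longer right tail. A negative value would indicate a longer left tail, and 0 a symmetric distribution.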
Measures of Kurtosis
Kurtosis is a measure of the “peakedness” of a distribution.
$\mu_4 = \frac{\sum (X_i - \bar X)^4}{n}$, $\beta_2 = \frac{\mu_4}{\sigma^4}$
$\beta_2 = 3$: normal curve (mesokurtic)
$\beta_2 > 3$: leptokurtic
$\beta_2 < 3$: platykurtic
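The moment coefficient of kurtosis β₂ = μ₄/σ⁴ can be computed directly from the fourth and second central moments, both with divisor n. This is a sketch with hypothetical data and function names. The tolerance used for "equal to 3" is an arbitrary illustrative choice.

```python
def beta2(data):
    # beta_2 = mu_4 / sigma^4, where mu_4 = sum((x - mean)^4) / n
    # and sigma^2 = mu_2 = sum((x - mean)^2) / n
    n = len(data)
    mean = sum(data) / n
    mu2 = sum((x - mean) ** 2 for x in data) / n
    mu4 = sum((x - mean) ** 4 for x in data) / n
    return mu4 / mu2 ** 2

def kurtosis_type(b2, tol=1e-9):
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

b2 = beta2([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical observations
print(b2, kurtosis_type(b2))  # 2.78125 platykurtic
```

Sample values are rarely exactly 3, so in practice β₂ is judged as "close to" or "clearly above/below" 3 rather than tested for exact equality.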
Interquartile Range
A quartile is the value that marks one of the divisions that breaks a series of values into four equal parts.
The median is a quartile and divides the cases in half.
The 25th percentile is a quartile that divides the first ¼ of cases from the latter ¾.
The 75th percentile is a quartile that divides the first ¾ of cases from the latter ¼.
The interquartile range is the distance between the 25th percentile and the 75th percentile, as shown below.
[Figure: a scale from 0 to 1000 (marks at 250, 500, 750) divided into four parts, each holding 25% of cases.]
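The quartiles and interquartile range are also what a box plot is built from. The sketch below is an illustration under stated assumptions, not the construction shown on the slides. It takes quartiles from `statistics.quantiles` with the "inclusive" method, and it flags outliers with Tukey's common 1.5 × IQR fence rule. Both are conventions chosen for the example, and the data are hypothetical.

```python
import statistics

def box_summary(data, whisker=1.5):
    # Quartiles via the 'inclusive' method; other quartile conventions differ slightly.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}

print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]))
```

For these values Q1 = 3.5, the median is 6 and Q3 = 8.5, so the IQR is 5. The fences fall at −4 and 16, which leaves 50 plotted as an outlier and the whiskers spanning 1 to 10.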
Boxplot Construction
Bi-variate Data Analysis
Bivariate analysis is concerned with the analysis of two variables at a time in order to uncover whether or not the two variables are related.
• Contingency Tables or Cross Tabulations
• Correlation Analysis: 1. Karl Pearson’s correlation coefficient, 2. Spearman’s rank correlation coefficient, and 3. Kendall’s rank correlation coefficient
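1. $r = \frac{\frac{1}{n}\sum (X - \bar X)(Y - \bar Y)}{\sigma_X \sigma_Y}$
2. $r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$; {n: no. of pairs, d: difference between ranks}
3. $\tau = \frac{2S}{n(n-1)}$; {S: sum of scores, +1 for each concordant pair and −1 for each discordant pair; n: no. of pairs}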
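The three coefficients can be computed directly from their formulas. This is a minimal sketch with hypothetical paired data. It assumes there are no tied values: the Spearman formula is exact only for untied ranks, and the Kendall function is the plain τ without any tie correction.

```python
import math

def pearson_r(x, y):
    # r = (1/n) * sum((X - Xbar)(Y - Ybar)) / (sigma_X * sigma_Y), population SDs
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def ranks(values):
    # Rank 1 = smallest value; assumes there are no ties
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(x, y):
    # r_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = difference between ranks
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_tau(x, y):
    # tau = 2S / (n(n - 1)); S adds +1 per concordant pair and -1 per discordant pair
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            s += (prod > 0) - (prod < 0)
    return 2 * s / (n * (n - 1))

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y), spearman_rs(x, y), kendall_tau(x, y))  # about 0.8, 0.8, 0.6
```

Of the 10 pairs, 8 are concordant and 2 are discordant, so S = 6 and τ = 12/20 = 0.6. Kendall's τ is typically smaller in magnitude than Spearman's r_s for the same data.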
Correlation Analysis
• The goal of a correlational study is to determine whether there is a relationship between two variables and to describe the relationship.
• A correlational study simply observes the two variables as they exist naturally.
Correlation Analysis
• Assesses the relationship between two variables (Pearson’s r measures linear association); example: height and weight
• Strength of the association is described by a correlation coefficient, r:
|r| = 0.0–0.2: low, probably meaningless
|r| = 0.2–0.4: low, possible importance
|r| = 0.4–0.6: moderate correlation
|r| = 0.6–0.8: high correlation
|r| = 0.8–1.0: very high correlation
• Can be positive or negative
• Pearson’s, Spearman’s, and Kendall’s correlation coefficients
• Correlation tells nothing about causation
Beware of Spurious correlation
Multi-Variate Analysis
Multivariate analysis entails the simultaneous analysis of 3 or more variables
– Multiple Linear Regression
– Factor Analysis
– Cluster Analysis
– Discriminant Analysis
– Multi-criteria Analysis