8/4/2019 MB0050 Complete
1/25
Name : J. H. PATEL
Roll No. : 511034080
COURSE : MBA
SEMESTER : 3 (THIRD)
SUBJECT : RESEARCH METHODOLOGY
SUB. CODE : MB0050
ASSIGNMENT SET : 1
Q 1. Why should a manager know about research when the job entails
managing people, products, events, environments, and the like? [10 Marks]
The manager, while managing people, products, events, and environments, will invariably face problems, big and small, and will have to seek ways to find long-lasting, effective solutions. This
can be achieved only through knowledge of research even if consultants are engaged to solve
problems. Managers are responsible for the final outcome by making the right decisions at work.
This is greatly facilitated by research knowledge. Knowledge of research heightens the
sensitivity of managers to the innumerable internal and external factors of a varied nature
operating in their work and organizational environment. It also helps to facilitate effective
interactions with consultants and comprehension of the nuances of the research process.
Sophisticated technology such as simulation and model building is now available and may lend
itself to profitable application in certain business areas. The recommendations of the external
consultant who is proficient in this technology and urges its application in a particular situation
may make no sense to, and might create misgivings in, the manager not acquainted with
research. Even a superficial knowledge of these techniques helps the manager to deal with the
researcher in a mature and confident manner, so that dealing with experts does not result in
discomfort. As the manager, you will be the one to make the final decision on the
implementation of the recommendations made by the research team. Remaining objective,
focusing on problem solutions, fully understanding the recommendations made, and why and
how they are arrived at, make for good managerial decision making. Although company
traditions are to be respected, there may be occasions when today's rapidly changing, turbulent
environment would demand the substitution or re-adaptation of some of these traditions, based
on research findings. Thus, knowledge of research greatly enhances the decision-making skills of
the manager.
Managers with knowledge of research have an advantage over those without. Though you
yourself may not be doing any major research as manager, you will have to understand, predict
and control events that are dysfunctional to the organization. For example: a new product
developed may not be taking off, or a financial investment may not be paying off as
anticipated. Such disturbing phenomena have to be understood and explained. Unless this is
done, it will not be possible to predict the future of that product or the prospects of that
investment, and how future catastrophic outcomes can be controlled. A grasp of research
methods will enable managers to understand, predict and control their environment. With the
ever-increasing complexity of modern organizations, and the uncertainty of the environment they
face, the management of organizational systems has become one of constant troubleshooting in
the workplace. It would help if managers could sense, spot and deal with problems before they
get out of hand. Knowledge of research and problem-solving processes helps managers to identify
problem situations before they get out of control. Although minor problems can be fixed by the
manager, major problems would warrant the hiring of outside researchers or consultants. The
manager who is knowledgeable about research can interact effectively with them. Knowledge
about research processes, design and interpretation of data also helps managers to become
discriminating recipients of the research findings presented, and to determine whether or not the
recommended solutions are appropriate for implementation.
Another reason why professional managers today need to know about research methods is that
they will become more discriminating while sifting through the information disseminated in
business journals. Some journal articles are more scientific and objective than others. Unless the
manager is able to grasp fully what the published empirical research really conveys, she or he is
likely to err in incorporating some of the suggestions such publications offer.
There are several other reasons why professional managers should be knowledgeable about
research and research methods in business. First, such knowledge sharpens the sensitivity of
managers to the myriad variables operating in a situation and reminds them frequently of the
multicausality and multifinality of phenomena, thus avoiding inappropriate, simplistic notions of
one variable causing another. Second, when managers understand the research reports about their
organizations handed to them by professionals, they will be equipped to take intelligent,
educated, calculated risks with known probabilities attached to the success or failure of their
decisions. Research then becomes a useful decision-making tool rather than a mass of
incomprehensible statistical information. Third, because managers become knowledgeable
about scientific investigations, vested interests inside or outside the organization will not prevail.
For instance, an internal research group within the organization will not be able to distort
information or manipulate the findings to their advantage if managers are aware of the biases that
could creep into research and know how data are analyzed and interpreted.
In summary, being knowledgeable about research and research methods helps professional
managers to:
1) Identify and effectively solve minor problems in the work setting.
2) Know how to discriminate good from bad research.
3) Appreciate and be constantly aware of the multiple influences and multiple effects of factors impinging on a situation.
4) Take calculated risks in decision making, knowing full well the probabilities associated with the different possible outcomes.
5) Prevent possible vested interests from exercising their influence in a situation.
6) Relate to hired researchers and consultants more effectively.
7) Combine experience with scientific knowledge while making decisions.
Q 2. a. How do you evolve research design for exploratory research? Briefly
analyze
Exploratory research is a type of research conducted for a problem that has not been clearly
defined. Exploratory research helps determine the best research design, data collection method
and selection of subjects. It should draw definitive conclusions only with extreme caution. Given
its fundamental nature, exploratory research often concludes that a perceived problem does not
actually exist. Exploratory research often relies on secondary research such as reviewing
available literature and/or data, or qualitative approaches such as informal discussions with
consumers, employees, management or competitors, and more formal approaches through
in-depth interviews, focus groups, projective methods, case studies or pilot studies. The Internet allows for
research methods that are more interactive in nature. For example, RSS feeds efficiently supply
researchers with up-to-date information; major search engine search results may be sent by email
to researchers by services such as Google Alerts; comprehensive search results are tracked over
lengthy periods of time by services such as Google; and websites may be created to attract
worldwide feedback on any subject. The results of exploratory research are not usually useful for
decision-making by themselves, but they can provide significant insight into a given situation.
Although the results of qualitative research can give some indication as to the "why", "how" and
"when" something occurs, they cannot tell us "how often" or "how many". Exploratory research is
not typically generalizable to the population at large. Exploratory research is conducted when the
researcher does not know how and why a certain phenomenon occurs, for example, how does the
customer evaluate the quality of a bank, hotel or airline? While in the case of a manufactured
product, quality is assessed on the basis of tangible features, replacement policy, warranty and so
on, in the case of services there are few tangible features to go by. To understand this phenomenon, several
researchers have conducted focus group discussions to identify these quality parameters. For
example, Zeithaml, Parasuraman and Berry identified variables which they grouped under five
dimensions. In doing so, they used focus groups. Since the prime goal of exploratory research is to
know the unknown, this research is unstructured. Focus groups, interviews with key customer groups
and experts, and searches for printed or published information are some common techniques.
Objective: To provide insights and understanding.
Characteristics: Information needed is defined only loosely. Research process is flexible and
unstructured. Sample is small and non-representative. Analysis of primary data is qualitative.
Methods: Expert surveys, pilot surveys, secondary data: qualitative analysis.
b. Briefly explain independent, dependent and extraneous variables in a
research design. [5 Marks]
A variable is something that can be changed, such as a characteristic or value. Variables are generally used in psychology experiments to determine if changes to one thing result in changes to another.
Independent Variable:
The independent variable is the variable that is controlled and manipulated by the experimenter.
For example, in an experiment on the impact of sleep deprivation on test performance, sleep
deprivation would be the independent variable. It is the factor that is measured, manipulated, or
selected by the experimenter to determine its relationship to an observed phenomenon. "In a
research study, independent variables are antecedent conditions that are presumed to affect a
dependent variable. They are either manipulated by the researcher or are observed by the
researcher so that their values can be related to that of the dependent variable. For example, in a
research study on the relationship between mosquitoes and mosquito bites, the number of
mosquitoes per acre of ground would be an independent variable." While the independent
variable is often manipulated by the researcher, it can also be a classification where subjects are
assigned to groups. In a study where one variable causes the other, the independent variable is
the cause. In a study where groups are being compared, the independent variable is the group
classification.
Dependent Variable:
The dependent variable is the variable that is measured by the experimenter. In our previous
example, the scores on the test performance measure would be the dependent variable. It is the
factor that is observed and measured to determine the effect of the independent variable, i.e.,
the factor that appears, disappears, or varies as the experimenter introduces, removes, or varies
the independent variable. "In a research study, the dependent variable defines a principal focus
of research interest. It is the consequent variable that is presumably affected by one or more
independent variables that are either manipulated by the researcher or observed by the researcher
and regarded as antecedent conditions that determine the value of the dependent variable. For
example, in a study of the relationship between mosquitoes and mosquito bites, the number of
mosquito bites per hour would be the dependent variable" (Jaeger, 1990, p. 370). The dependent
variable is the participant's response.
The dependent variable is the outcome. In an experiment, it may be what was caused or what
changed as a result of the study. In a comparison of groups, it is what they differ on.
For example, we might change the type of information (e.g. organised or random) given to
participants to see what effect this might have on the amount of information remembered. In this
particular example the type of information is the independent variable (because it changes) and
the amount of information remembered is the dependent variable (because this is being
measured).
and temperament to ensure that these factors do not interfere with the results. If, however, a
variable cannot be controlled for, it becomes what is known as a confounding variable. This type
of variable can have an impact on the dependent variable, which can make it difficult to
determine if the results are due to the influence of the independent variable, the confounding
variable or an interaction of the two.
Suppose I wanted to measure the effects of alcohol (IV) on driving ability (DV). I would have to
try to ensure that extraneous variables did not affect the results. These variables could include:
Familiarity with the car: Some people may drive better because they have driven this make of car before.
Familiarity with the test: Some people may do better than others because they know what to
expect in the test.
Used to drinking: The effects of alcohol on some people may be less than on others because they are used to drinking.
Full stomach: The effect of alcohol on some subjects may be less than on others because they
have just had a big meal.
If these extraneous variables are not controlled, they may become confounding variables, because they could go on to affect the results of the experiment.
Q 3. A. Differentiate between Census survey and Sample Survey [5 Marks]
Practically every country in the world conducts censuses and sampling surveys on a regular basis
in order to get valuable data from and about their populations. This data is used by the federal
and state governments in making numerous decisions with regard to various health care, housing,
and educational issues, among others. While both of these data-gathering methods essentially
serve the same purpose, they have a number of differences with regard to approach and
methodology, as well as scope. These two methods may also differ in terms of the variance in the
data gathered, as you will see later.
Scope
A census involves the gathering of information from every person in a certain group. This may
include information on age, sex and language, among others. A sample survey, on the other hand,
commonly involves gathering data from only a certain section of a particular group.
Sampling Variance
The main advantage of a census is a virtually zero sampling variance, mainly because the data
used is drawn from the whole population. In addition, more precise detail can generally be
gathered about smaller groups of the population.
As for sampling, there is a possibility of sampling variance, since the data used is drawn from
only a small section of the population. This makes a sample survey less precise than a census, at least
as far as sampling error is concerned. In addition, the sample may be too small to provide an accurate picture
of the population.
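The difference in sampling variance can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population of 10,000 incomes is synthetic (randomly generated, not real data), and the sample size of 100 and the 500 repetitions are arbitrary choices made only for illustration.

```python
import random
import statistics

# Assumption: a synthetic population of 10,000 incomes (not real data).
rng = random.Random(7)
population = [rng.gauss(50_000, 15_000) for _ in range(10_000)]

# A census uses every member, so it always yields the same population mean:
# there is no sampling variance.
census_mean = statistics.mean(population)

# Repeated sample surveys of 100 people give means that differ from one
# sample to the next; this spread is the sampling variance in action.
sample_means = [statistics.mean(rng.sample(population, 100)) for _ in range(500)]
spread = statistics.stdev(sample_means)

print(f"census mean: {census_mean:,.0f}")
print(f"std. dev. of 500 sample means: {spread:,.0f}")
```

With samples of 100, the spread of the sample means should come out roughly near 15,000 / √100 ≈ 1,500; larger samples would shrink it, while the census figure never varies.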
Cost and Timetable
A census can be quite expensive to conduct, particularly for large populations. In most cases,
censuses are also much more time-consuming than sample surveys. Adding considerably
to the timetable is the necessity of gathering data from every single member of the population.
The huge scope of a census also makes it harder to maintain control of the quality of the data.
For instance, anyone who does not complete a census form will be visited by a government
representative whose only job is to gather census data.
A sample survey for its part costs quite a bit less than a census, since data is gathered from a
much smaller group of people. In addition, sample surveys generally take a much shorter time to
conduct, again given the smaller scope. This also makes data collection easier to monitor
and its quality easier to control.
Survey cycle and costs
The development of most surveys follows the same cycle. Unless the survey has been conducted
before, the survey cycle starts with the identification of the need for information by one or more
clients or data users. This step is the most important since, without clear identification of user
need, the purposes of the survey will be unclear and the development process will be flawed
from the beginning.
This starts the ongoing development process for data collection. User needs guide the
planning process that leads into development and design of the survey. When data is collected
and processed, estimation of prevalence and other analyses begin. These are disseminated and
evaluated by, among others, the data user. So the process comes full circle. The diagram below
illustrates the survey cycle.
[Diagram: The Survey Cycle]
Several of these phases, in particular those that raise issues of special concern to disability
data collection, are discussed in this and the following chapters.
Considerations of cost are always relevant to the development of surveys. National statistical
offices must always be aware of whether the potential benefits of the survey compensate for the
costs of developing a survey, putting it into the field, and collecting and analyzing the data.
National statistical agencies regularly record not only the total costs of data collection but also the costs
of each phase. Cost information is essential for budgeting each phase of survey development
within the organization, and may also be used to compare costs with those of other national
statistical organizations.
The most significant costs in any interviewer-administered data collection are the actual field
costs to administer the survey and collect the data. The planning and development costs are
significant and of course necessary to ensure high quality disability data. Costs of analysis and
data dissemination are also major expenses. Usually, total development costs are roughly equal
to total output-related costs.
Census cycle
The cycle of phases of censuses is similar to that of surveys, as the diagram below illustrates. In
particular, the census cycle begins and ends with evaluation of previously collected data and
user consultation. The major difference between the two is that far more time is required for all
phases of census collection, especially for additional consultation, design and testing
procedures, including topic selection, government endorsement of final design, and quality
assurance.
[Diagram: The Census Cycle]
b. Analyze multi-stage and sequential sampling. [5 Marks]
Multi-Stage Sampling
Multi-stage sampling is a kind of complex sample design in which two or more levels of units
are embedded one in the other. For example: geographic areas (primary units), factories
(secondary units), employees (tertiary units). At each stage, a sample of the corresponding units
is selected. At first, a sample of primary units is selected, then, in each of those selected, a
sample of secondary units is selected, and so on. All ultimate units (individuals, for instance)
selected at the last step of this procedure are then surveyed.
The reasons for adopting such a design may be reducing costs, for example, when interviewers
are assigned to persons located in a restricted area, or reducing the sample error. Multi-stage sampling is sometimes used when no general sample frame exists. In this case, a first step is to
select, at random, a sample of areas, collective units, or villages from a list where they are all
registered (primary units). Then, for each selected primary unit, a comprehensive enumeration of
all units of lower rank is made, thus obtaining a local sample frame among which a sample of
secondary units will be selected.
For example, for each village of the primary sample, a list of all housing units is established,
allowing for a selection of a sample of households. Different probabilities can be used at each
stage, as well as within one particular stage, for the different units to be selected. Probabilities at
the successive stages multiply, so that the resulting probability for selecting one final unit is the
product of the probabilities used at each step. The corresponding answers need to be weighted by
the inverse of that final probability in order to obtain unbiased estimates. A cluster sample can be
seen as a two-stage sample where the secondary probability is 100 percent.
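The selection and weighting steps described above can be sketched in Python. This is a minimal sketch assuming a hypothetical frame of 20 villages with 30 households each and simple random sampling (equal probabilities) at both stages; the function name `two_stage_sample` and all unit names are illustrative, not taken from any survey package.

```python
import random

def two_stage_sample(frame, n_primary, n_secondary, rng):
    """Two-stage sample with simple random sampling at each stage.

    frame maps each primary unit (hypothetical village) to the list of its
    secondary units (hypothetical households). Returns a list of
    (household, final_probability, weight) tuples.
    """
    p1 = n_primary / len(frame)                   # stage-1 selection probability
    sample = []
    for village in rng.sample(sorted(frame), n_primary):
        households = frame[village]               # local sample frame
        k = min(n_secondary, len(households))
        p2 = k / len(households)                  # stage-2 selection probability
        for household in rng.sample(households, k):
            p = p1 * p2                           # probabilities multiply across stages
            sample.append((household, p, 1 / p))  # weight = inverse of final probability
    return sample

# Hypothetical frame: 20 villages with 30 households each (600 households).
frame = {f"village{v}": [f"v{v}-hh{h}" for h in range(30)] for v in range(20)}
sample = two_stage_sample(frame, n_primary=4, n_secondary=5, rng=random.Random(42))

# The weighted count of sampled households estimates the population total.
estimated_total = sum(weight for _, _, weight in sample)
print(len(sample), round(estimated_total))    # 20 600
```

Setting `n_secondary` to the full number of households in each village turns this into a cluster sample: the second-stage probability becomes 1 (100 percent), as noted above.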
Sequential Sampling:
Sequential sampling is a non-probability sampling technique wherein the researcher picks a
single subject or a group of subjects in a given time interval, conducts the study, analyzes the results,
then picks another group of subjects if needed, and so on.
In the sequential sampling technique, there is a third option at each step. The researcher can
accept the null hypothesis, accept the alternative hypothesis, or select another pool of subjects
and conduct the experiment once again. This means that the researcher can, in principle, keep obtaining
subjects before finally deciding whether to accept the null or the alternative
hypothesis.
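The three-way decision described above (accept the null hypothesis, accept the alternative, or sample more) is formalized in Wald's sequential probability ratio test (SPRT). The sketch below swaps in that standard test purely to illustrate the decision rule; it assumes each subject gives a yes/no outcome, and the success rates `p0`, `p1` and error rates `alpha`, `beta` are illustrative assumptions. It says nothing about how the subjects themselves are selected.

```python
import math

def sprt_bernoulli(outcomes, p0, p1, alpha=0.05, beta=0.10):
    """Wald's SPRT for a success rate: H0: p = p0 versus H1: p = p1 (p1 > p0).

    outcomes is a sequence of 1 (success) / 0 (failure) results, examined one
    at a time. Returns (decision, number_of_outcomes_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing this boundary -> accept H1
    lower = math.log(beta / (1 - alpha))   # crossing this boundary -> accept H0
    llr = 0.0                              # running log-likelihood ratio
    for n, x in enumerate(outcomes, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", len(outcomes)   # no decision yet: draw more subjects

print(sprt_bernoulli([1] * 10, p0=0.5, p1=0.8))   # ('accept H1', 7)
print(sprt_bernoulli([0] * 10, p0=0.5, p1=0.8))   # ('accept H0', 3)
print(sprt_bernoulli([1, 0, 1], p0=0.5, p1=0.8))  # ('continue sampling', 3)
```

With these settings, seven consecutive successes are enough to accept H1 and three consecutive failures are enough to accept H0; mixed results leave the test in the "continue sampling" state, which is the third option the text describes.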
ADVANTAGES OF SEQUENTIAL SAMPLING
The researcher has limitless options when it comes to sample size and sampling schedule. The sample size can be relatively small or excessively large depending on the decision making of the researcher. The sampling schedule is also completely dependent on the researcher, since a second group of samples can only be obtained after conducting the experiment on the initial group of samples.
As mentioned above, this sampling technique enables the researcher to fine-tune the research methods and results analysis. Due to the repetitive nature of this sampling method, minor changes and adjustments can be made during the initial parts of the study to correct and hone the research method.
There is very little effort on the part of the researcher when performing this sampling technique. It is not expensive, not time-consuming and not labor-intensive.
DISADVANTAGES OF SEQUENTIAL SAMPLING
This sampling method is hardly representative of the entire population. Its only hope of approaching representativeness is when the researcher chooses to use a sample large enough to cover a big fraction of the entire population.
The sampling technique is also hardly randomized. This contributes to its low degree of representativeness.
Due to the aforementioned disadvantages, results from this sampling technique cannot be used to create conclusions and interpretations pertaining to the entire population.
Q 4. List down various measures of central tendency and explain the
difference between them? [10 marks].
Statisticians use summary measures to describe patterns of data.
Measures of central tendency refer to the summary measures used to describe the most
"typical" value in a set of values.
The Mean and the Median
The two most common measures of central tendency are the median and the mean, which can be
illustrated with an example. Suppose we draw a sample of five women and measure their weights. They weigh 100 pounds, 100 pounds, 130 pounds, 140 pounds, and 150 pounds.
To find the median, we arrange the observations in order from smallest to largest value. If there
is an odd number of observations, the median is the middle value. If there is an even number of
observations, the median is the average of the two middle values. Thus, in the sample of five
women, the median value would be 130 pounds, since 130 pounds is the middle weight.
The mean of a sample or a population is computed by adding all of the observations and dividing by the number of observations. Returning to the example of the five women, the mean
weight would equal (100 + 100 + 130 + 140 + 150)/5 = 620/5 = 124 pounds. In the general case,
the mean can be calculated using one of the following equations:
Population mean = $\mu = \frac{\sum X}{N}$   OR   Sample mean = $\bar{x} = \frac{\sum x}{n}$
where $\sum X$ is the sum of all the population observations, $N$ is the number of population
observations, $\sum x$ is the sum of all the sample observations, and $n$ is the number of sample
observations.
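The median and mean calculations for the five women can be checked with Python's standard `statistics` module. A minimal sketch using the weights given above:

```python
import statistics

weights = [100, 100, 130, 140, 150]   # the five women's weights, in pounds

median = statistics.median(weights)   # middle value once the data are sorted
mean = sum(weights) / len(weights)     # sum of observations / number of observations

print(median, mean)                    # 130 124.0
```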
When statisticians talk about the mean of a population, they use the Greek letter $\mu$ to refer to the
mean score. When they talk about the mean of a sample, statisticians use the symbol $\bar{x}$ to refer to
the mean score.
The Mean vs. the Median
As measures of central tendency, the mean and the median each have advantages and
disadvantages. Some pros and cons of each measure are summarized below.
The median may be a better indicator of the most typical value if a set of scores has an outlier.
An outlier is an extreme value that differs greatly from other values.
However, when the sample size is large and does not include outliers, the mean score usually
provides a better measure of central tendency.
To illustrate these points, consider the following example. Suppose we examine a sample of 10
households to estimate the typical family income. Nine of the households have incomes between $20,000 and $100,000, but the tenth household has an annual income of $1,000,000,000. That
tenth household is an outlier. If we choose a measure to estimate the income of a typical
household, the mean will greatly over-estimate the income of a typical family (because of the
outlier); while the median will not.
Effect of Changing Units
Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of central tendency are affected when we change units.
If you add a constant to every value, the mean and median increase by the same constant. For
example, suppose you have a set of scores with a mean equal to 5 and a median equal to 6. If you
add 10 to every score, the new mean will be 5 + 10 = 15; and the new median will be 6 + 10 =
16.
Suppose you multiply every value by a constant. Then, the mean and the median will also be
multiplied by that constant. For example, assume that a set of scores has a mean of 5 and a
median of 6. If you multiply each of these scores by 10, the new mean will be 5 * 10 = 50; and
the new median will be 6 * 10 = 60.
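The effect of changing units can be checked numerically. This sketch assumes a hypothetical set of three scores, [1, 6, 8], chosen only because its mean is 5 and its median is 6, matching the example above.

```python
import statistics

scores = [1, 6, 8]                    # hypothetical scores: mean 5, median 6
shifted = [x + 10 for x in scores]    # add a constant (10) to every value
scaled = [x * 10 for x in scores]     # multiply every value by a constant (10)

for label, data in [("original", scores), ("add 10", shifted), ("times 10", scaled)]:
    print(label, statistics.mean(data), statistics.median(data))
# original 5 6
# add 10 15 16
# times 10 50 60
```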
OR
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central tendency are
sometimes called measures of central location. They are also classed as summary statistics. The
mean (often called the average) is most likely the measure of central tendency that you are most
familiar with, but there are others, such as, the median and the mode.
The mean, median and mode are all valid measures of central tendency but, under different
conditions, some measures of central tendency become more appropriate to use than others. In
the following sections we will look at the mean, mode and median and learn how to calculate
them and under what conditions they are most appropriate to be used.
Mean (Arithmetic)
The mean (or average) is the most popular and well known measure of central tendency. It can
be used with both discrete and continuous data, although its use is most often with continuous
data. The mean is equal to the sum of all the
values in the data set divided by the number of values in the data set. So, if we have $n$ values in a
data set and they have values $x_1, x_2, \ldots, x_n$, then the sample mean, usually denoted by $\bar{x}$
(pronounced "x bar"), is:
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
This formula is usually written in a slightly different manner using the Greek capital letter $\Sigma$,
pronounced "sigma", which means "sum of...":
$\bar{x} = \frac{\sum x}{n}$
You may have noticed that the above formula refers to the sample mean. So, why have we
called it a sample mean? This is because, in statistics, samples and populations have very
different meanings and these differences are very important, even if, in the case of the mean,
they are calculated in the same way. To acknowledge that we are calculating the population
mean and not the sample mean, we use the Greek lower case letter "mu", denoted as $\mu$:
$\mu = \frac{\sum X}{N}$
The mean is essentially a model of your data set. You will
notice, however, that the mean is often not one of the actual values that you have observed in
your data set. Nevertheless, one of its important properties is that it minimises error in the prediction of any one value in your data set. That is, it is the value for which the sum of squared deviations
from all the values in the data set is smallest.
An important property of the mean is that it includes every value in your data set as part of the
calculation. In addition, the mean is the only measure of central tendency where the sum of the
deviations of each value from the mean is always zero.
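Both properties of the mean mentioned above can be verified numerically. This sketch reuses the five weights from the earlier example and, as an illustrative assumption, searches only whole-number candidate values between 90 and 160.

```python
weights = [100, 100, 130, 140, 150]     # the five weights from the earlier example
mean = sum(weights) / len(weights)      # 124.0

# Property 1: the deviations from the mean sum to zero.
deviations = [x - mean for x in weights]
print(sum(deviations))                   # 0.0

def sum_of_squares(c):
    """Total squared error when every value is 'predicted' by c."""
    return sum((x - c) ** 2 for x in weights)

# Property 2: no candidate value gives a smaller squared error than the mean.
best = min(range(90, 161), key=sum_of_squares)
print(best, sum_of_squares(mean))        # 124 2120.0
```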
When not to use the mean
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or
large in numerical value. For example, consider the wages of staff at a factory below:
Staff 1 2 3 4 5 6 7 8 9 10
Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this
mean value might not be the best way to accurately reflect the typical salary of a worker, as most
workers have salaries in the $12k to $18k range. The mean is being skewed by the two large
salaries. Therefore, in this situation we would like to have a better measure of central tendency.
As we will find out later, taking the median would be a better measure of central tendency in this
situation.
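A quick check of the factory example with Python's standard-library `statistics` module, using the salaries from the table above expressed in thousands of dollars. It also shows the mean of the eight salaries that exclude the two outliers, for comparison.

```python
from statistics import mean, median

# Salaries of the ten staff from the table above, in thousands of dollars.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print(mean(salaries))       # 30.7  (pulled up by the 90k and 95k salaries)
print(median(salaries))     # 15.5  (average of the 5th and 6th sorted values)
print(mean(salaries[:8]))   # 15.25 (the eight salaries without the two outliers)
```

The median stays close to the typical $12k to $18k salaries, while the mean does not.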
Another time when we usually prefer the median over the mean (or mode) is when our data is
skewed (i.e. the frequency distribution for our data is skewed). Consider the normal
distribution, as this is the distribution most frequently assessed in statistics: when the data is
perfectly normal, the mean, median and mode are identical, and they all represent the most
typical value in the data set. However, as the data becomes skewed, the mean loses its ability to
provide the best central location for the data, because the skewed data drags it away from the
typical value. The median best retains this position, as it is not as strongly influenced by the
skewed values. This is explained in more detail in the section on skewed distributions later in
this answer.
Median
The median is the middle score for a set of data that has been arranged in order of magnitude.
The median is less affected by outliers and skewed data. In order to calculate the median,
suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark, in this case 56. It is the middle mark
because there are 5 scores before it and 5 scores after it. This works fine when you have an odd
number of scores but what happens when you have an even number of scores? What if you had
only 10 scores? Well, you simply have to take the middle two scores and average the result. So,
if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
This time we have to take the 5th and 6th scores in our data set (55 and 56) and average them
to get a median of 55.5.
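The procedure just described (sort, then take the middle score, or average the two middle scores when the count is even) can be sketched in a few lines of Python. This hand-written function is only an illustration; Python's standard library already provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Median: sort, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]          # e.g. the 6th of 11 scores
    return (ordered[mid - 1] + ordered[mid]) / 2  # e.g. 5th and 6th of 10


odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

print(median(odd_scores))   # 56
print(median(even_scores))  # 55.5
```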
Mode
The mode is the most frequent score in our data set. It is represented by the highest bar in a
bar chart or histogram. You can, therefore, sometimes consider the mode as being the most
popular option. An example of a mode is presented below:
Normally, the mode is used for categorical data, where we wish to know which is the most
common category, as illustrated below:
We can see above that the most common form of transport, in this particular data set, is the bus.
However, one of the problems with the mode is that it is not necessarily unique, which leaves us
with problems when two or more values share the highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data. This is
particularly problematic when we have continuous data, as we may not have any one value that
is more frequent than the others. For example, consider measuring the weights of 30 people (to
the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same
weight, e.g. 67.4 kg? Many people might be close, but with such a small sample (30 people) and
a large range of possible weights, there may be no repeated value at all, and any value that does
repeat is likely to do so by chance rather than because it is typical of the data.
This is why the mode is very rarely used with continuous data.
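The problems with the mode described above can be seen with Python's `statistics.multimode` (available from Python 3.8), which returns every value that shares the highest frequency. The transport and weight lists below are hypothetical, made up for illustration, since the original figures' data is not reproduced here.

```python
from statistics import multimode

# Hypothetical categorical data: a single clear mode.
transport = ["bus", "car", "bus", "walk", "bus", "car", "train"]
print(multimode(transport))  # ['bus']

# Hypothetical tie: two categories share the highest frequency.
tied = ["bus", "car", "bus", "car", "walk"]
print(multimode(tied))       # ['bus', 'car']

# Hypothetical continuous data: no weight repeats, so every value
# "ties" for most frequent and the mode tells us nothing useful.
weights = [67.4, 71.2, 58.9, 80.3, 64.0]
print(multimode(weights))    # [67.4, 71.2, 58.9, 80.3, 64.0]
```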
Another problem with the mode is that it will not provide us with a very good measure of central
tendency when the most common mark is far away from the rest of the data in the data set, as
depicted in the diagram below:
In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is
not representative of the data, which is mostly concentrated around the 20 to 30 value range. To
use the mode to describe the central tendency of this data set would be misleading.
Skewed Distributions and the Mean and Median
We often test whether our data is normally distributed as this is a common assumption
underlying many statistical tests. An example of a normally distributed set of data is presented
below:
When you have a normally distributed sample, you can legitimately use either the mean or the
median as your measure of central tendency. In fact, in any symmetrical, unimodal distribution the
mean, median and mode are equal. However, in this situation, the mean is widely preferred as the
best measure of central tendency because it includes all the values in the data set in its
calculation, so any change in any of the scores will affect the value of the mean. This is not the
case with the median or mode.
However, when our data is skewed, as with the right-skewed data set below:
we find that the mean is being dragged in the direction of the skew. In these situations, the
median is generally considered to be the best representative of the central location of the data.
The more skewed the distribution, the greater the difference between the median and mean, and
the greater the emphasis that should be placed on using the median as opposed to the mean. A
classic example of a right-skewed distribution is income (salary), where a minority of high
earners pulls the mean upward, so a mean gives a misleading picture of typical income compared
with the median.
If your analysis assumes a normal distribution, but tests of normality show that the data is
non-normal, then it is customary to use the median instead of the mean. This is more a rule of
thumb than a strict guideline, however. Sometimes researchers wish to report the mean of a skewed
distribution if the median and mean are not appreciably different (a subjective assessment) and if
it allows easier comparisons with previous research.
Summary of when to use the mean, median and mode
The following summary table shows the best measure of central tendency for each type of
variable:
Type of Variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
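The summary table can also be read as a simple decision rule. The sketch below encodes it as a Python function. The function name and the string labels for variable types are hypothetical, and deciding whether interval/ratio data is "skewed" is left to the analyst's own judgement or tests.

```python
def best_central_tendency(variable_type: str, skewed: bool = False) -> str:
    """Return the measure recommended by the summary table.

    variable_type: one of "nominal", "ordinal", "interval", "ratio"
    (a hypothetical encoding of the table's rows).
    skewed: only consulted for interval/ratio variables.
    """
    kind = variable_type.lower()
    if kind == "nominal":
        return "mode"
    if kind == "ordinal":
        return "median"
    if kind in ("interval", "ratio"):
        return "median" if skewed else "mean"
    raise ValueError(f"unknown variable type: {variable_type!r}")


print(best_central_tendency("Nominal"))              # mode
print(best_central_tendency("ratio", skewed=True))   # median
```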