
8/4/2019 MB0050 Complete


Name : J. H. PATEL

Roll No. : 511034080

COURSE : MBA

SEMESTER : 3 (THIRD)

SUBJECT : RESEARCH METHODOLOGY

SUB. CODE : MB0050

ASSIGNMENT SET : 1



Q 1. Why should a manager know about research when the job entails

managing people, products, events, environments, and the like? [10 Marks]

The manager, while managing people, products, events, and environments, will invariably face problems, big and small, and will have to seek ways to find long lasting effective solutions. This

can be achieved only through knowledge of research even if consultants are engaged to solve

problems. Managers are responsible for the final outcome by making the right decisions at work.

This is greatly facilitated by research knowledge. Knowledge of research heightens the

sensitivity of managers to the innumerable internal and external factors of a varied nature

operating in their work and organizational environment. It also helps to facilitate effective

interactions with consultants and comprehension of the nuances of the research process.

Sophisticated technology such as simulation and model building is now available and may lend

itself to profitable application in certain business areas. The recommendations of the external

consultant who is proficient in this technology and urges its application in a particular situation

may make no sense to, and might create some misgivings in, the manager not acquainted with

research. Even a superficial knowledge of these techniques helps the manager to deal with the

researcher in a mature and confident manner, so that dealing with experts does not result in

discomfort. As the manager, you will be the one to make the final decision on the

implementation of the recommendations made by the research team. Remaining objective,

focusing on problem solutions, fully understanding the recommendations made, and why and

how they are arrived at, make for good managerial decision making. Although company

traditions are to be respected, there may be occasions where today's rapidly changing, turbulent

environment would demand the substitution or re-adaptation of some of these traditions, based

on research findings. Thus, knowledge of research greatly enhances the decision making skills of

the manager.

Managers with knowledge of research have an advantage over those without. Though you

yourself may not be doing any major research as manager, you will have to understand, predict

and control events that are dysfunctional to the organization. For example: a new product

developed may not be taking off, or a financial investment may not be paying off as

anticipated. Such disturbing phenomena have to be understood and explained. Unless this is



done, it will not be possible to predict the future of that product or the prospects of that

investment, and how future catastrophic outcomes can be controlled. A grasp of research

methods will enable managers to understand, predict and control their environment. With the

ever increasing complexity of modern organizations, and the uncertainty of the environment they

face, the management of organizational systems has become one of constant trouble shooting in

the workplace. It would help if managers could sense, spot and deal with problems before they

get out of hand. Knowledge of research and problem solving processes helps managers to identify

problem situations before they get out of control. Although minor problems can be fixed by the

manager, major problems would warrant the hiring of outside researchers or consultants. The

manager who is knowledgeable about research can interact effectively with them. Knowledge

about research processes, design and interpretation of data also helps managers to become

discriminating recipients of the research findings presented, and to determine whether or not the

recommended solutions are appropriate for implementation.

Another reason why professional managers today need to know about research methods is that

they will become more discriminating while sifting through the information disseminated in

business journals. Some journal articles are more scientific and objective than others. Unless the

manager is able to grasp fully what the published empirical research really conveys, she or he is

likely to err in incorporating some of the suggestions such publications offer.

There are several other reasons why professional managers should be knowledgeable about

research and research methods in business. First, such knowledge sharpens the sensitivity of

managers to the myriad variables operating in a situation and reminds them frequently of the

multicausality and multifinality of phenomena, thus avoiding inappropriate, simplistic notions of

one variable causing another. Second, when managers understand the research reports about their

organizations handed to them by professionals, they will be equipped to take intelligent,

educated, calculated risks with known probabilities attached to the success or failure of their

decisions. Research then becomes a useful decision making tool rather than a mass of

incomprehensible statistical information. Third, because managers become knowledgeable

about scientific investigations, vested interests inside or outside the organization will not prevail.

For instance, an internal research group within the organization will not be able to distort



information or manipulate the findings to their advantage if managers are aware of the biases that

could creep into research and know how data are analyzed and interpreted.

In summary, being knowledgeable about research and research methods helps professional

managers to:

1) Identify and effectively solve minor problems in the work setting.

2) Know how to discriminate good from bad research.

3) Appreciate and be constantly aware of the multiple influences and multiple effects of factors impinging on a situation.

4) Take calculated risks in decision making, knowing full well the probabilities associated with the different possible outcomes.

5) Prevent possible vested interests from exercising their influence in a situation.

6) Relate to hired researchers and consultants more effectively.

7) Combine experience with scientific knowledge while making decisions.

Q 2. a. How do you evolve research design for exploratory research? Briefly

analyze

Exploratory research is a type of research conducted for a problem that has not been clearly

defined. Exploratory research helps determine the best research design, data collection method

and selection of subjects. It should draw definitive conclusions only with extreme caution. Given

its fundamental nature, exploratory research often concludes that a perceived problem does not

actually exist. Exploratory research often relies on secondary research such as reviewing

available literature and/or data, or qualitative approaches such as informal discussions with

consumers, employees, management or competitors, and more formal approaches through in-depth

interviews, focus groups, projective methods, case studies or pilot studies. The Internet allows for

research methods that are more interactive in nature. For example, RSS feeds efficiently supply

researchers with up-to-date information; major search engine search results may be sent by email

to researchers by services such as Google Alerts; comprehensive search results are tracked over

lengthy periods of time by services such as Google; and websites may be created to attract

worldwide feedback on any subject. The results of exploratory research are not usually useful for

decision-making by themselves, but they can provide significant insight into a given situation.

Although the results of qualitative research can give some indication as to the "why", "how" and

"when" something occurs, it cannot tell us "how often" or "how many". Exploratory research is

not typically generalizable to the population at large. Exploratory research is conducted when the

researcher does not know how and why a certain phenomenon occurs, for example, how does the

customer evaluate the quality of a bank, hotel or airline? While in the case of a manufactured

product, quality is assessed on the basis of tangible features, replacement policy, warranty and so

on, in the case of services there are no tangibles. To understand this phenomenon, several

researchers have conducted focus group discussions to identify these quality parameters. For

example, Zeithaml, Parasuraman and Berry identified variables which they clubbed under five

groups. In doing so, they used focus groups. Since the prime goal of exploratory research is to

know the unknown, this research is unstructured. Focus groups, interviews with key customer groups and

experts, and even searches of printed or published information are some common techniques.

Objective: To provide insights and understanding.

Characteristics: Information needed is defined only loosely. Research process is flexible and

unstructured. Sample is small and non-representative. Analysis of primary data is qualitative.

Methods: Expert surveys, Pilot surveys, Secondary data: qualitative analysis

b. Briefly explain independent, dependent and extraneous variables in a

research design. [5 Marks]

A variable is something that can be changed, such as a characteristic or value. Variables are generally used in psychology experiments to determine if changes to one thing result in changes to another.

Independent Variable:

The independent variable is the variable that is controlled and manipulated by the experimenter.

For example, in an experiment on the impact of sleep deprivation on test performance, sleep

deprivation would be the independent variable. It is the factor that is measured, manipulated, or

selected by the experimenter to determine its relationship to an observed phenomenon. "In a

research study, independent variables are antecedent conditions that are presumed to affect a

dependent variable. They are either manipulated by the researcher or are observed by the

researcher so that their values can be related to that of the dependent variable. For example, in a

research study on the relationship between mosquitoes and mosquito bites, the number of

mosquitoes per acre of ground would be an independent variable." While the independent

variable is often manipulated by the researcher, it can also be a classification where subjects are

assigned to groups. In a study where one variable causes the other, the independent variable is

the cause. In a study where groups are being compared, the independent variable is the group

classification.

Dependent Variable:

The dependent variable is the variable that is measured by the experimenter. In our previous

example, the scores on the test performance measure would be the dependent variable. It is the

factor that is observed and measured to determine the effect of the independent variable, i.e.,

that factor that appears, disappears, or varies as the experimenter introduces, removes, or varies

the independent variable. "In a research study, the dependent variable defines a principal focus

of research interest. It is the consequent variable that is presumably affected by one or more

independent variables that are either manipulated by the researcher or observed by the researcher

and regarded as antecedent conditions that determine the value of the dependent variable. For

example, in a study of the relationship between mosquitoes and mosquito bites, the number of

mosquito bites per hour would be the dependent variable" (Jaeger, 1990, p. 370). The dependent

variable is the participant's response.

The dependent variable is the outcome. In an experiment, it may be what was caused or what

changed as a result of the study. In a comparison of groups, it is what they differ on.

For example, we might change the type of information (e.g. organised or random) given to

participants to see what effect this might have on the amount of information remembered. In this

particular example the type of information is the independent variable (because it changes) and

the amount of information remembered is the dependent variable (because this is being

measured).

and temperament to ensure that these factors do not interfere with the results. If, however, a

variable cannot be controlled for, it becomes what is known as a confounding variable. This type

of variable can have an impact on the dependent variable, which can make it difficult to

determine if the results are due to the influence of the independent variable, the confounding

variable or an interaction of the two.

Suppose I wanted to measure the effects of alcohol (IV) on driving ability (DV). I would have to

try to ensure that extraneous variables did not affect the results. These variables could include:

Familiarity with the car: Some people may drive better because they have driven this make of car before.

Familiarity with the test: Some people may do better than others because they know what to

expect in the test.

Used to drinking. The effects of alcohol on some people may be less than on others because they are used to drinking.

Full stomach. The effect of alcohol on some subjects may be less than on others because they

have just had a big meal.

If these extraneous variables are not controlled they may become confounding variables, because they could go on to affect the results of the experiment.




Q 3. A. Differentiate between Census survey and Sample Survey [5 Marks]

Practically every country in the world conducts censuses and sampling surveys on a regular basis

in order to get valuable data from and about their populations. This data is used by the federal

and state governments in making numerous decisions with regard to various health care, housing,

and educational issues, among others. While these two data-gathering methods essentially

serve the same purpose, they have a number of differences with regard to approach and

methodology, as well as scope. These two methods may also differ in terms of the variance in the

data gathered, as you will see later.

Scope

A census involves the gathering of information from every person in a certain group. This may

include information on age, sex and language among others. A sample survey on the other hand

commonly involves gathering data from only a certain section of a particular group.

Sampling Variance

The main advantage of a census is a virtually zero sampling variance, mainly because the data

used is drawn from the whole population. In addition, more precise detail can generally be

gathered about smaller groups of the population.

As for sampling, there is a possibility of sampling variance, since the data used is drawn from

only a small section of the population. This makes sampling a much less accurate form of data

collection than a census. In addition, the sample may be too small to provide an accurate picture

of the population.
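The contrast in sampling variance can be sketched with a small simulation. This is only an illustration under stated assumptions: the population of ages is made up, and the names `census_mean` and `sample_mean` are hypothetical helpers, not part of any survey toolkit.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 1,000 people (made-up data).
population = [random.randint(18, 90) for _ in range(1000)]

def census_mean(pop):
    """A census measures every unit, so repeating it gives the same answer."""
    return statistics.mean(pop)

def sample_mean(pop, n):
    """A sample survey measures only n randomly chosen units."""
    return statistics.mean(random.sample(pop, n))

# Repeat each method 200 times and look at how much the result moves around.
census_results = [census_mean(population) for _ in range(200)]
sample_results = [sample_mean(population, 50) for _ in range(200)]

census_variance = statistics.pvariance(census_results)
sample_variance = statistics.pvariance(sample_results)
print("census variance:", census_variance)
print("sample variance:", sample_variance)
```

The census results never change from one repetition to the next, while the sample results scatter around the population mean; a larger sample size would narrow that scatter.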

Cost and Timetable

A census can be quite expensive to conduct, particularly for large populations. In most cases,

censuses are also a lot more time-consuming than sample surveys. Adding considerably

to the timetable is the necessity of gathering data from every single member of the population.

The huge scope of a census also makes it harder to maintain control of the quality of the data.



For instance, anyone who does not complete a census form will be visited by a government

representative whose only job is to gather census data.

A sample survey for its part costs quite a bit less than a census, since data is gathered from a

much smaller group of people. In addition, sample surveys generally take a much shorter time to

conduct, again given the smaller scope. This also means reduced requirements for respondents,

which in turn leads to better data monitoring and quality control.

Survey cycle and costs

The development of most surveys follows the same cycle. Unless the survey has been conducted

before, the survey cycle starts with the identification of the need for information by one or more

clients or data user. This step is the most important since, without clear identification of user

need, the purposes of the survey will be unclear and the development process will be flawed

from the beginning.

This starts the ongoing development process for data collection. User needs guide the

planning process that leads into development and design of the survey. When data is collected

and processed, estimation of prevalence and other analyses begin. These are disseminated and

evaluated by, among others, the data user. So the process comes full circle.

[Diagram: The Survey Cycle]

Several of these phases, and in particular, those that raise issues of special concern to disability

data collection, are discussed in this and the following chapters.

Considerations of cost are always relevant to the development of surveys. National statistical

offices must always be aware of whether the potential benefits of the survey compensate for the

costs of developing a survey, putting it into the field, and collecting and analyzing the data.

National statistical agencies regularly record, not only total costs of data collection, but the costs

of each phase. Cost information is essential for budgeting each phase of survey development

within the organization, and may also be used to compare costs with those of other national

statistical organizations.

The most significant costs in any interviewer-administered data collection are the actual field

costs to administer the survey and collect the data. The planning and development costs are

significant and of course necessary to ensure high quality disability data. Costs of analysis and

data dissemination are also major expenses. Usually, total development costs are roughly equal

to total output related costs.

Census cycle

The cycle of phases of censuses is similar to that of surveys, as the census cycle diagram illustrates. In

particular, the census cycle begins and ends with evaluation of previously collected data and

user consultation. The major difference between the two is that far more time is required for all

phases of census collection, especially for additional consultation, design and testing

procedures, including topic selection, government endorsement of final design, and quality

assurance.
[Diagram: The Census Cycle]

b. Analyze multi-stage and sequential sampling. [5 Marks]

Multi-Stage Sampling

Multi-stage sampling is a kind of complex sample design in which two or more levels of units

are embedded one in the other. For example: geographic areas (primary units), factories

(secondary units), employees (tertiary units). At each stage, a sample of the corresponding units

is selected. At first, a sample of primary units is selected, then, in each of those selected, a

sample of secondary units is selected, and so on. All ultimate units (individuals, for instance)

selected at the last step of this procedure are then surveyed.

The reasons for adopting such a design may be reducing costs, for example, when interviewers

are assigned to persons located in a restricted area, or reducing the sampling error. Multi-stage sampling is sometimes used when no general sample frame exists. In this case, a first step is to

select, at random, a sample of areas, collective units, or villages from a list where they are all

registered (primary units). Then, for each selected primary unit, a comprehensive enumeration of

all units of lower rank is made, thus obtaining a local sample frame among which a sample of

secondary units will be selected.



For example, for each village of the primary sample, a list of all housing units is established,

allowing for a selection of a sample of households. Different probabilities can be used at each

stage, as well as within one particular stage, for the different units to be selected. Probabilities at

the successive stages multiply, so that the resulting probability for selecting one final unit is the

product of the probabilities used at each step. The corresponding answers need to be weighted by

the inverse of that final probability in order to obtain unbiased estimates. A cluster sample can be

seen as a two-stage sample where the secondary probability is 100 percent.
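The way stage probabilities multiply, and the way answers are weighted by the inverse of the final probability, can be sketched as follows. The village frame, its sizes, and the function name `two_stage_sample` are all hypothetical, and equal-probability selection at each stage is an assumption made to keep the example short.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical frame: 10 villages, each with 20 to 40 households (made-up sizes).
villages = {
    f"village_{i}": [f"v{i}_hh{j}" for j in range(random.randint(20, 40))]
    for i in range(10)
}

def two_stage_sample(frame, n_primary, n_secondary):
    """Select villages first, then households within each selected village."""
    selected = []
    p_primary = n_primary / len(frame)               # stage-1 selection probability
    for village in random.sample(sorted(frame), n_primary):
        households = frame[village]
        p_secondary = n_secondary / len(households)  # stage-2 selection probability
        p_final = p_primary * p_secondary            # probabilities multiply
        weight = 1 / p_final                         # inverse-probability weight
        for household in random.sample(households, n_secondary):
            selected.append((household, p_final, weight))
    return selected

sample = two_stage_sample(villages, n_primary=3, n_secondary=5)

# Summing the weights gives an estimate of the total number of households.
estimated_total = sum(weight for _, _, weight in sample)
print(len(sample), "households selected; estimated total:", round(estimated_total))
```

Households in larger villages receive larger weights here, because each of them had a smaller chance of being picked at the second stage.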

Sequential Sampling:

Sequential sampling is a non-probability sampling technique wherein the researcher picks a

single or a group of subjects in a given time interval, conducts his study, analyzes the results then

picks another group of subjects if needed and so on.

In sequential sampling technique, there exists another step, a third option. The researcher can

accept the null hypothesis, accept his alternative hypothesis, or select another pool of subjects

and conduct the experiment once again. This entails that the researcher can obtain a limitless

number of subjects before finally making a decision whether to accept his null or alternative

hypothesis.
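One formal version of this three-way choice is Wald's sequential probability ratio test (SPRT). The sketch below is a swapped-in illustration and not necessarily the procedure the text has in mind: the function name `sprt`, the success rates `p0` and `p1`, and the error rates `alpha` and `beta` are assumptions chosen for the example.

```python
import math

def sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a success rate.

    Tests H0: rate = p0 against H1: rate = p1, one observation at a time.
    Returns (decision, number of observations used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing this accepts the alternative
    lower = math.log(beta / (1 - alpha))   # crossing this accepts the null
    llr = 0.0                              # running log-likelihood ratio
    n = 0
    for n, success in enumerate(observations, start=1):
        if success:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept alternative", n
        if llr <= lower:
            return "accept null", n
    # Neither boundary reached: the third option is to draw more subjects.
    return "keep sampling", n

print(sprt([1] * 20, p0=0.5, p1=0.8))
print(sprt([0] * 20, p0=0.5, p1=0.8))
print(sprt([1, 0], p0=0.5, p1=0.8))
```

The third return value corresponds to selecting another pool of subjects; the sample size is not fixed in advance but is determined by when the evidence crosses one of the two boundaries.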

ADVANTAGES OF SEQUENTIAL SAMPLING

The researcher has a limitless option when it comes to sample size and sampling schedule. The sample size can be relatively small or excessively large depending on the

decision making of the researcher. The sampling schedule is also completely dependent on the

researcher, since a second group of samples can only be obtained after conducting the

experiment on the initial group of samples.

As mentioned above, this sampling technique enables the researcher to fine-tune his research methods and results analysis. Due to the repetitive nature of this sampling

method, minor changes and adjustments can be done during the initial parts of the study

to correct and hone the research method.

There is very little effort on the part of the researcher when performing this sampling technique. It is not expensive, not time consuming and not workforce intensive.

DISADVANTAGES OF SEQUENTIAL SAMPLING

This sampling method is hardly representative of the entire population. Its only hope of approaching representativeness is when the researcher chooses to use a very large sample

size significant enough to represent a big fraction of the entire population.

The sampling technique is also hardly randomized. This contributes to the very low degree of representativeness of the sampling technique.

Due to the aforementioned disadvantages, results from this sampling technique cannot be used to create conclusions and interpretations pertaining to the entire population.

Q 4. List down various measures of central tendency and explain the

difference between them? [10 marks].

Statisticians use summary measures to describe patterns of data.

Measures of central tendency refer to the summary measures used to describe the most

"typical" value in a set of values.

The Mean and the Median

The two most common measures of central tendency are the median and the mean, which can be

illustrated with an example. Suppose we draw a sample of five women and measure their weights. They weigh 100 pounds, 100 pounds, 130 pounds, 140 pounds, and 150 pounds.

To find the median, we arrange the observations in order from smallest to largest value. If there

is an odd number of observations, the median is the middle value. If there is an even number of

observations, the median is the average of the two middle values. Thus, in the sample of five

women, the median value would be 130 pounds; since 130 pounds is the middle weight.
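The odd/even rule for the median can be written out directly. This is a minimal sketch: the function name `median` is chosen for illustration, and the four-value case simply drops the heaviest weight from the example to show the even branch.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of middle pair

weights = [100, 100, 130, 140, 150]   # the five women from the example
print(median(weights))                # odd number of observations
print(median(weights[:4]))            # even number: average of 100 and 130
```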

The mean of a sample or a population is computed by adding all of the observations and dividing by the number of observations. Returning to the example of the five women, the mean

weight would equal (100 + 100 + 130 + 140 + 150)/5 = 620/5 = 124 pounds. In the general case,

the mean can be calculated, using one of the following equations:

Population mean = \(\mu = \sum X / N\) OR Sample mean = \(\bar{x} = \sum x / n\)

where \(\sum X\) is the sum of all the population observations, N is the number of population

observations, \(\sum x\) is the sum of all the sample observations, and n is the number of sample

observations.

When statisticians talk about the mean of a population, they use the Greek letter \(\mu\) to refer to the

mean score. When they talk about the mean of a sample, statisticians use the symbol \(\bar{x}\) to refer to

the mean score.

The Mean vs. the Median

As measures of central tendency, the mean and the median each have advantages and

disadvantages. Some pros and cons of each measure are summarized below.

The median may be a better indicator of the most typical value if a set of scores has an outlier.

An outlier is an extreme value that differs greatly from other values.

However, when the sample size is large and does not include outliers, the mean score usually

provides a better measure of central tendency.

To illustrate these points, consider the following example. Suppose we examine a sample of 10

households to estimate the typical family income. Nine of the households have incomes between $20,000 and $100,000; but the tenth household has an annual income of $1,000,000,000. That

tenth household is an outlier. If we choose a measure to estimate the income of a typical

household, the mean will greatly over-estimate the income of a typical family (because of the

outlier); while the median will not.
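The household example can be checked numerically. The text gives only the range for the first nine incomes, so the nine values below are assumed for illustration; only the tenth income comes from the example itself.

```python
import statistics

# Assumed incomes for the nine ordinary households (between $20,000 and $100,000).
incomes = [20_000, 30_000, 40_000, 50_000, 60_000,
           70_000, 80_000, 90_000, 100_000]

# The tenth household is the outlier described in the text.
incomes.append(1_000_000_000)

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print("mean:  ", mean_income)    # pulled far above every ordinary household
print("median:", median_income)  # stays within the ordinary range
```

With these assumed values the mean exceeds $100 million, while the median remains between the fifth and sixth households.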

Effect of Changing Units

Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of central tendency are affected when we change units.

If you add a constant to every value, the mean and median increase by the same constant. For

example, suppose you have a set of scores with a mean equal to 5 and a median equal to 6. If you

add 10 to every score, the new mean will be 5 + 10 = 15; and the new median will be 6 + 10 =

16.

Suppose you multiply every value by a constant. Then, the mean and the median will also be

multiplied by that constant. For example, assume that a set of scores has a mean of 5 and a

median of 6. If you multiply each of these scores by 10, the new mean will be 5 * 10 = 50; and

the new median will be 6 * 10 = 60.
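Both rules can be verified on a small data set. The three scores below are an assumed example, chosen only because they have a mean of 5 and a median of 6 as in the text.

```python
import statistics

scores = [1, 6, 8]   # assumed data: mean 5, median 6

shifted = [x + 10 for x in scores]   # add a constant to every value
scaled = [x * 10 for x in scores]    # multiply every value by a constant

print(statistics.mean(scores), statistics.median(scores))
print(statistics.mean(shifted), statistics.median(shifted))
print(statistics.mean(scaled), statistics.median(scaled))
```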

OR

A measure of central tendency is a single value that attempts to describe a set of data by

identifying the central position within that set of data. As such, measures of central tendency are

sometimes called measures of central location. They are also classed as summary statistics. The

mean (often called the average) is most likely the measure of central tendency that you are most

familiar with, but there are others, such as, the median and the mode.

The mean, median and mode are all valid measures of central tendency but, under different

conditions, some measures of central tendency become more appropriate to use than others. In

the following sections we will look at the mean, mode and median and learn how to calculate

them and under what conditions they are most appropriate to be used.

Mean (Arithmetic)

The mean (or average) is the most popular and well known measure of central tendency. It can

be used with both discrete and continuous data, although its use is most often with continuous

data. The mean is equal to the sum of all the

values in the data set divided by the number of values in the data set. So, if we have n values in a

data set and they have values \(x_1, x_2, \ldots, x_n\), then the sample mean, usually denoted by \(\bar{x}\)

(pronounced x bar), is:

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

This formula is usually written in a slightly different manner using the Greek capital letter \(\Sigma\),

pronounced "sigma", which means "sum of...":

\[ \bar{x} = \frac{\sum x}{n} \]

You may have noticed that the above formula refers to the sample mean. So, why have we

called it a sample mean? This is because, in statistics, samples and populations have very

different meanings and these differences are very important, even if, in the case of the mean,

they are calculated in the same way. To acknowledge that we are calculating the population

mean and not the sample mean, we use the Greek lower case letter "mu", denoted as \(\mu\):

\[ \mu = \frac{\sum x}{n} \]

The mean is essentially a model of your data set. It is a single value chosen to represent all the values in the set. You will

notice, however, that the mean is not often one of the actual values that you have observed in

your data set. However, one of its important properties is that it minimises error in the prediction of any one value in your data set. That is, it is the value that produces the lowest amount of error

from all other values in the data set.

An important property of the mean is that it includes every value in your data set as part of the

calculation. In addition, the mean is the only measure of central tendency where the sum of the

deviations of each value from the mean is always zero.
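The zero-sum property follows in one line from the definition of the mean. A short derivation, using the sample-mean notation \(\bar{x}\) and assuming n values \(x_1, \ldots, x_n\):

```latex
\[
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\]
```

The positive deviations of values above the mean are exactly cancelled by the negative deviations of values below it.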

When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or

large in numerical value. For example, consider the wages of staff at a factory below:

Staff    1    2    3    4    5    6    7    8    9    10

Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this

mean value might not be the best way to accurately reflect the typical salary of a worker, as most

workers have salaries in the $12k to $18k range. The mean is being skewed by the two large

salaries. Therefore, in this situation we would like to have a better measure of central tendency.

As we will find out later, taking the median would be a better measure of central tendency in this

situation.
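The salary example can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are assumptions made for illustration, and the figures are the ten salaries from the table above, in thousands of dollars.

```python
from statistics import mean, median

# Salaries of the ten staff, in thousands of dollars.
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries_k)
median_salary = median(salaries_k)

# Mean of the eight typical salaries, leaving out the two large ones.
typical_only = [s for s in salaries_k if s < 50]
mean_without_outliers = mean(typical_only)

print(f"mean: {mean_salary}k")
print(f"median: {median_salary}k")
print(f"mean without the two large salaries: {mean_without_outliers}k")
```

The median falls inside the $12k to $18k range where most salaries sit, while the mean is pulled well above it by the two large values.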



Another time when we usually prefer the median over the mean (or mode) is when our data is

skewed (i.e. the frequency distribution for our data is skewed). If we consider the normal

distribution - as this is the most frequently assessed in statistics - when the data is perfectly

normal then the mean, median and mode are identical. Moreover, they all represent the most

typical value in the data set. However, as the data becomes skewed the mean loses its ability to

provide the best central location for the data as the skewed data is dragging it away from the

typical value. The median, by contrast, retains this position better and is not as strongly influenced by

the skewed values. This is explained in more detail in the skewed distribution section later in this

guide.

Median

The median is the middle score for a set of data that has been arranged in order of magnitude.

The median is less affected by outliers and skewed data. In order to calculate the median,

suppose we have the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark - in this case 56 (the sixth of the eleven ordered marks). It is the middle mark

because there are 5 scores before it and 5 scores after it. This works fine when you have an odd

number of scores but what happens when you have an even number of scores? What if you had

only 10 scores? Well, you simply have to take the middle two scores and average the result. So,

if we look at the example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Only now we have to take the 5th and 6th score in our data set and average them to get a median

of 55.5.
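The procedure described above (sort, then take the middle score, or average the middle two) can be sketched as a short Python function. The function name `median_of` is an assumption for illustration; Python's built-in `statistics.median` does the same job.

```python
def median_of(scores):
    """Return the median: the middle score, or the average of the middle two."""
    if not scores:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(scores)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle score
        return ordered[mid]
    # even count: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]   # 11 scores
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]      # 10 scores

print(median_of(odd_scores))
print(median_of(even_scores))
```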



Mode

The mode is the most frequent score in our data set. It corresponds to the highest bar

in a bar chart or histogram. You can, therefore, sometimes consider the mode as being the most

popular option. An example of a mode is presented below:

Normally, the mode is used for categorical data where we wish to know which is the most

common category as illustrated below:



We can see above that the most common form of transport, in this particular data set, is the bus.

However, one of the problems with the mode is that it is not unique, so it leaves us with problems when we have two or more values that share the highest frequency, such as below:



We are now stuck as to which mode best describes the central tendency of the data. This is

particularly problematic when we have continuous data, as we are more likely not to have any

one value that is more frequent than the others. For example, consider measuring 30 people's

weights (to the nearest 0.1 kg). How likely is it that we will find two or more people

with exactly the same weight, e.g. 67.4 kg? The answer is that it is very unlikely: many people

might be close, but with such a small sample (30 people) and a large range of possible weights

you are unlikely to find several people sharing any one weight to the nearest 0.1 kg.

This is why the mode is very rarely used with continuous data.
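Both problems with the mode can be illustrated with a small Python sketch. The helper `modes` and both data sets below are hypothetical, invented for illustration; they are not the data behind the charts in this guide.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Hypothetical categorical data: two categories tie for most frequent.
transport = ["bus", "car", "bus", "train", "car", "walk"]

# Hypothetical continuous data (kg, to the nearest 0.1): no value repeats.
weights = [67.4, 70.1, 58.9, 81.3, 64.0]

print(modes(transport))   # more than one mode
print(modes(weights))     # every value is a "mode", which tells us nothing
```

When no value repeats, every observation ties for most frequent, so the mode carries no information about where the centre of the data lies.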

Another problem with the mode is that it will not provide us with a very good measure of central

tendency when the most common mark is far away from the rest of the data in the data set, as

depicted in the diagram below:



In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is

not representative of the data, which is mostly concentrated around the 20 to 30 value range. To

use the mode to describe the central tendency of this data set would be misleading.

Skewed Distributions and the Mean and Median

We often test whether our data is normally distributed as this is a common assumption

underlying many statistical tests. An example of a normally distributed set of data is presented

below:



When you have a normally distributed sample you can legitimately use either the mean or the

median as your measure of central tendency. In fact, in any symmetrical, unimodal distribution the mean,

median and mode are equal. However, in this situation, the mean is widely preferred as the best measure of central tendency as it is the measure that includes all the values in the data set for its

calculation, and any change in any of the scores will affect the value of the mean. This is not the

case with the median or mode.
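As a minimal sketch of this sensitivity, the Python snippet below changes a single score and compares the three measures before and after. The scores are the eleven marks from the median example; raising the top mark from 92 to 100 is a hypothetical change made purely for illustration. A change that moved a score across the middle of the data could still shift the median.

```python
from statistics import mean, median, mode

scores = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]

# Hypothetical change: raise the highest score from 92 to 100.
changed = scores.copy()
changed[-1] = 100

print("mean:  ", mean(scores), "->", mean(changed))
print("median:", median(scores), "->", median(changed))
print("mode:  ", mode(scores), "->", mode(changed))
```

Here the mean moves while the median stays put. Note that `statistics.mode` returns only the first of several tied modes, and this data set has two (55 and 56).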

However, when our data is skewed, for example, as with the right-skewed data set below:



we find that the mean is being dragged in the direction of the skew. In these situations, the median is

generally considered to be the best representative of the central location of the data. The more

skewed the distribution the greater the difference between the median and mean, and the greater

emphasis should be placed on using the median as opposed to the mean. A classic example of the

above right-skewed distribution is income (salary), where higher-earners provide a false

representation of the typical income if expressed as a mean and not a median.

If we expected a normal distribution but tests of normality show that the data is non-normal,

then it is customary to use the median instead of the mean. This is more a rule of thumb than a

strict guideline, however. Sometimes, researchers wish to report the mean of a skewed

distribution if the median and mean are not appreciably different (a subjective assessment) and if

it allows easier comparisons to previous research to be made.
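To see the mean being pulled in the direction of the skew, the sketch below simulates a right-skewed set of incomes in Python. The log-normal distribution, its parameters, the seed and the sample size are all assumptions chosen for illustration; they are not taken from any real income data.

```python
import random
from statistics import mean, median

# Simulated right-skewed incomes (assumed log-normal; parameters are illustrative).
rng = random.Random(0)  # fixed seed so the run is repeatable
incomes = [rng.lognormvariate(10, 0.8) for _ in range(10_000)]

mean_income = mean(incomes)
median_income = median(incomes)

# Share of people earning less than the mean: well over half when right-skewed.
share_below_mean = sum(x < mean_income for x in incomes) / len(incomes)

print(f"mean:   {mean_income:,.0f}")
print(f"median: {median_income:,.0f}")
print(f"share earning below the mean: {share_below_mean:.0%}")
```

In a right-skewed sample like this one, the mean sits above the median, so most of the sample lies below the "average" income.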

Summary of when to use the mean, median and mode

Please use the following summary table to know what the best measure of central tendency is

with respect to the different types of variable.

Type of Variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median