of 135 /135

# 11th Social-economics-statistics for Economics

Embed Size (px)

Citation preview

CONTENTS

Foreword iii

Chapter 1 : Introduction 1

Chapter 2 : Collection of Data 9

Chapter 3 : Organisation of Data 22

Chapter 4 : Presentation of Data 40

Chapter 5 : Measures of Central Tendency 58

Chapter 6 : Measures of Dispersion 74

Chapter 7 : Correlation 91

Chapter 8 : Index Numbers 107

Chapter 9 : Use of Statistical Tools 121

APPENDIX A : GLOSSARY OF STATISTICAL TERMS 131

APPENDIX B : TABLE OF TWO-DIGIT RANDOM NUMBERS 134

told this subject is mainly around

what Alfred Marshall (one of the

founders of modern economics) called“the study of man in the ordinary

business of life”. Let us understand

what that means.

When you buy goods (you may

want to satisfy your own personal

needs or those of your family or thoseof any other person to whom you want

to make a gift) you are called

a consumer.

When you sell goods to make

a profit for yourself (you may be

a shopkeeper), you are called a seller.When you produce goods (you may

be a farmer or a manufacturer), you

are called a producer.

Introduction

1. WHY ECONOMICS?

You have, perhaps, already hadEconomics as a subject for your earlierclasses at school. You might have been

Studying this chapter shouldenable you to:• know what the subject of

economics is about;• understand how economics is

linked with the study of economicactivities in consumption,production and distribution;

• understand why knowledge ofstatistics can help in describingconsumption, production anddistribution;

• learn about some uses ofstatistics in the understanding ofeconomic activities.

CHA P T E R

1

2 STATISTICS FOR ECONOMICS

When you are in a job, working forsome other person, and you get paidfor it (you may be employed bysomebody who pays you wages or asalary), you are called a service-holder.

When you provide some kind ofservice to others for a payment (youmay be a lawyer or a doctor or abanker or a taxi driver or a transporterof goods), you are called a service-provider.

In all these cases you will be calledgainfully employed in an economicactivity. Economic activities are onesthat are undertaken for a monetarygain. This is what economists meanby ordinary business of life.

Activities

• List different activities of themembers of your family. Wouldyou call them economicactivities? Give reasons.

• Do you consider yourself aconsumer? Why?

We cannot get something fornothing

If you ever heard the story of Aladdinand his Magic Lamp, you would agreethat Aladdin was a lucky guy.Whenever and whatever he wanted, hejust had to rub his magic lamp onwhen a genie appeared to fulfill hiswish. When he wanted a palace to livein, the genie instantly made one forhim. When he wanted expensive giftsto bring to the king when asking forhis daughter’s hand, he got them atthe bat of an eyelid.

In real life we cannot be as luckyas Aladdin. Though, like him we haveunlimited wants, we do not have amagic lamp. Take, for example, thepocket money that you get to spend.If you had more of it then you couldhave purchased almost all the thingsyou wanted. But since your pocketmoney is limited, you have to chooseonly those things that you want themost. This is a basic teaching ofEconomics.

Activities

• Can you think for yourself ofsome other examples where aperson with a given income hasto choose which things and inwhat quantities he or she canbuy at the prices that are beingcharged (called the currentprices)?

• What will happen if the currentprices go up?

Scarcity is the root of all economicproblems. Had there been no scarcity,there would have been no economicproblem. And you would not havestudied Economics either. In our dailylife, we face various forms of scarcity.The long queues at railway bookingcounters, crowded buses and trains,shortage of essential commodities, therush to get a ticket to watch a newfilm, etc., are all manifestations ofscarcity. We face scarcity because thethings that satisfy our wants arelimited in availability. Can you thinkof some more instances of scarcity?

The resources which the producershave are limited and also have

INTRODUCTION 3

alternative uses. Take the case of foodthat you eat every day. It satisfies yourwant of nourishment. Farmersemployed in agriculture raise cropsthat produce your food. At any pointof time, the resources in agriculturelike land, labour, water, fertiliser, etc.,are given. All these resources havealternative uses. The same resourcescan be used in the production of non-food crops such as rubber, cotton, juteetc. Thus alternative uses of resourcesgive rise to the problem of choicebetween different commodities thatcan be produced by those resources.

Activities

• Identify your wants. How manyof them can you fulfill? Howmany of them are unfulfilled?Why you are unable to fulfillthem?

• What are the different kinds ofscarcity that you face in yourdaily life? Identify their causes.

Consumption, Production andDistribution

If you thought about it, you mighthave realised that Economics involvesthe study of man engaged in economic

activities of various kinds. For this,you need to know reliable facts aboutall the diverse economic activities likeproduction, consumption anddistribution. Economics is oftendiscussed in three parts: consum-ption, production and distribution.

We want to know how theconsumer decides, given his incomeand many alternative goods to choosefrom, what to buy when he knows theprices. This is the study of Consum-ption.

We also want to know how theproducer, similarly, chooses what toproduce for the market when heknows the costs and prices. This is thestudy of Production.

Finally, we want to know how thenational income or the total incomearising from what has been producedin the country (called the GrossDomestic Product or GDP) isdistributed through wages (andsalaries), profits and interest (We willleave aside here income frominternational trade and investment).This is study of Distribution.

Besides these three conventionaldivisions of the study of Economicsabout which we want to know all thefacts, modern economics has toinclude some of the basic problemsfacing the country for special studies.

For example, you might want toknow why or to what extent somehouseholds in our society have thecapacity to earn much more thanothers. You may want to know howmany people in the country are really

4 STATISTICS FOR ECONOMICS

Obviously, if you think along theselines, you will also appreciate why weneeded Statistics (which is the study

of numbers relating to selected factsin a systematic form) to be added toall modern courses of moderneconomics.

Would you now agree with thefollowing definition of economics thatmany economists use?

“Economics is the study of howpeople and society choose toemploy scarce resources that couldhave alternative uses in order toproduce various commodities thatsatisfy their wants and todistribute them for consumptionamong various persons and groupsin society.”

Activity

• Would you say, in the light of thediscussion above, that thisdefinition used to be given seemsa little inadequate now? Whatdoes it miss out?

2. STATISTICS IN ECONOMICS

In the previous section you were toldabout certain special studies thatconcern the basic problems facing acountry. These studies required thatwe know more about economic factsin terms of numbers. Such economicfacts are also known as data.

The purpose of collecting dataabout these economic problems is tounderstand and explain theseproblems in terms of the variouscauses behind them. In other words,we try to analyse them. For example,when we analyse the hardships ofpoverty, we try to explain it in termsof the various factors such as

INTRODUCTION 5

unemployment, low productivity ofpeople, backward technology, etc.

But, what purpose does theanalysis of poverty serve unless we areable to find ways to mitigate it. Wemay, therefore, also try to find thosemeasures that help solve an economicproblem. In Economics, suchmeasures are known as policies.

So, do you realise, then, that noanalysis of a problem would bepossible without the availability ofdata on various factors underlying aneconomic problem? And, that, in sucha situation, no policies can beformulated to solve it. If yes, then youhave, to a large extent, understood thebasic relationship between Economicsand Statistics.

3. WHAT IS STATISTICS?

At this stage you are probably readyto know more about Statistics. Youmight very well want to know what thesubject “Statistics” is all about. Whatare its specific uses in Economics?Does it have any other meaning? Letus see how we can answer thesequestions to get closer to the subject.

In our daily language the word‘Statistics’ is used in two distinctsenses: singular and plural. In theplural sense, ‘statistics’ means‘numerical facts systematicallycollected’ as described by OxfordDictionary. Thus, the simple meaningof statistics in plural sense is data.

Do you know that the term statistics

in singular means the ‘science ofcollecting, classifying and usingstatistics’ or a ‘statistical fact’.

By data or statistics, we mean bothquantitative and qualitative facts thatare used in Economics. For example,a statement in Economics like “theproduction of rice in India hasincreased from 39.58 million tonnesin 1974–75 to 58.64 million tonnes in1984–85”, is a quantitative fact. Thenumerical figures such as ‘39.58million tonnes’ and ‘58.64 milliontonnes’ are statistics of theproduction of rice in India for1974–75 and 1984–85 respectively.

In addition to the quantitativedata, Economics also uses qualitativedata. The chief characteristic of suchinformation is that they describeattributes of a single person or a groupof persons that is important to recordas accurately as possible even thoughthey cannot be measured inquantitative terms. Take, for example,“gender” that distinguishes a personas man/woman or boy/girl. It is oftenpossible (and useful) to state theinformation about an attribute of aperson in terms of degrees (like better/worse; sick/ healthy/ more healthy;unskilled/ skilled/ highly skilled etc.).Such qualitative information orstatistics is often used in Economicsand other social sciences andcollected and stored systematicallylike quantitative information (onprices, incomes, taxes paid etc.),whether for a single person or a groupof persons.

You will study in the subsequentchapters that statistics involvescollection and organisation of data. Thenext step is to present the data in

6 STATISTICS FOR ECONOMICS

tabular, diagrammatic and graphicforms. The data, then, is summarisedby calculating various numericalindices such as mean, variance,standard deviation etc. that representthe broad characteristics of thecollected set of information.

Activities

• Think of two examples ofqualitative and quantitative data.

• Which of the following would giveyou qualitative data; beauty,intelligence, income earned,marks in a subject, ability to

sing, learning skills?

4. WHAT STATISTICS DOES?

By now, you know that Statistics isan indispensable tool for an economistthat helps him to understand aneconomic problem. Using its variousmethods, effort is made to find thecauses behind it with the help of thequalitative and the quantitative factsof the economic problem. Once thecauses of the problem are identified,it is easier to formulate certain policiesto tackle it.

But there is more to Statistics. Itenables an economist to presenteconomic facts in a precise anddefinite form that helps in propercomprehension of what is stated.When economic facts are expressed instatistical terms, they become exact.Exact facts are more convincing thanvague statements. For instance,saying that with precise figures, 310people died in the recent earthquakein Kashmir, is more factual and, thus,

a statistical data. Whereas, sayinghundreds of people died, is not.

Statistics also helps in condensingthe mass of data into a few numericalmeasures (such as mean, varianceetc., about which you will learn later).These numerical measures helpsummarise data. For example, itwould be impossible for you toremember the incomes of all thepeople in a data if the number ofpeople is very large. Yet, one canremember easily a summary figure likethe average income that is obtainedstatistically. In this way, Statisticssummarises and presents ameaningful overall information abouta mass of data.

Quite often, Statistics is used infinding relationships between differenteconomic factors. An economist maybe interested in finding out whathappens to the demand for acommodity when its price increasesor decreases? Or, would the supply ofa commodity be affected by thechanges in its own price? Or, wouldthe consumption expenditure increasewhen the average income increases?Or, what happens to the general pricelevel when the governmentexpenditure increases? Such ques-tions can only be answered if anyrelationship exists between thevarious economic factors that havebeen stated above. Whether suchrelationships exist or not can be easilyverified by applying statisticalmethods to their data. In some casesthe economist might assume certainrelationships between them and like

INTRODUCTION 7

to test whether the assumption she/he made about the relationship is validor not. The economist can do this onlyby using statistical techniques.

In another instance, the economistmight be interested in predicting thechanges in one economic factor dueto the changes in another factor. Forexample, she/he might be interestedin knowing the impact of today’sinvestment on the national income infuture. Such an exercise cannot beundertaken without the knowledge ofStatistics.

Sometimes, formulation of plansand policies requires the knowledgeof future trends. For example, an

consumption of past years or of recentyears obtained by surveys. Thus,statistical methods help formulateappropriate economic policies thatsolve economic problems.

5. CONCLUSION

Today, we increasingly use Statisticsto analyse serious economic problemssuch as rising prices, growingpopulation, unemployment, povertyetc., to find measures that can solvesuch problems. Further it also helpsevaluate the impact of such policiesin solving the economic problems. Forexample, it can be ascertained easily

economic planner has to decide in2005 how much the economy shouldproduce in 2010. In other words, onemust know what could be theexpected level of consumption in 2010in order to decide the production planof the economy for 2010. In thissituation, one might make subjectivejudgement based on the guess aboutconsumption in 2010. Alternatively,one might use statistical tools topredict consumption in 2010. Thatcould be based on the data of

using statistical techniques whetherthe policy of family planning iseffective in checking the problem ofever-growing population.

In economic policies, Statisticsplays a vital role in decision making.For example, in the present time ofrising global oil prices, it might benecessary to decide how much oilIndia should import in 2010. Thedecision to import would depend onthe expected domestic production ofoil and the likely demand for oil in

Statistical methods are no substitute for common sense!

There is an interesting story which is told to make fun of statistics. It is saidthat a family of four persons (husband, wife and two children) once set outto cross a river. The father knew the average depth of the river. So hecalculated the average height of his family members. Since the average heightof his family members was greater than the average depth of the river, hethought they could cross safely. Consequently some members of the family(children) drowned while crossing the river.

Does the fault lie with the statistical method of calculating averages orwith the misuse of the averages?

8 STATISTICS FOR ECONOMICS

Recap

• Our wants are unlimited but the resources used in the productionof goods that satisfy our wants are limited and scarce. Scarcity isthe root of all economic problems.

• Resources have alternative uses.• Purchase of goods by consumers to satisfy their various needs is

Consumption.• Manufacture of goods by producers for the market is Production.• Division of the national income into wages, profits, rents and interests

is Distribution.• Statistics finds economic relationships using data and verifies them.• Statistical tools are used in prediction of future trends.• Statistical methods help analyse economic problems and

formulate policies to solve them.

EXERCISES

1. Mark the following statements as true or false.(i) Statistics can only deal with quantitative data.(ii) Statistics solves economic problems.(iii) Statistics is of no use to Economics without data.

2. Make a list of activities that constitute the ordinary business of life. Arethese economic activities?

3. ‘The Government and policy makers use statistical data to formulatesuitable policies of economic development’. Illustrate with two examples.

4. You have unlimited wants and limited resources to satisfy them. Explainby giving two examples.

5. How will you choose the wants to be satisfied?

6. What are your reasons for studying Economics?

7. Statistical methods are no substitute for common sense. Comment.

2010. Without the use of Statistics, itcannot be determined what theexpected domestic production of oiland the likely demand for oil wouldbe. Thus, the decision to import oil

cannot be made unless we know theactual requirement of oil. This vitalinformation that help make thedecision to import oil can only beobtained statistically.

Collection of Data

1. INTRODUCTION

In the previous chapter, you have readabout what is economics. You alsostudied about the role and importanceof statistics in economics. In this

Studying this chapter should enableyou to:• understand the meaning and

purpose of data collection;• distinguish between primary and

secondary sources;• know the mode of collection of data;• distinguish between Census and

Sample Surveys;• be familiar with the techniques of

sources of secondary data.

chapter, you will study the sources ofdata and the mode of data collection.The purpose of collection of data is tocollect evidence for reaching a soundand clear solution to a problem.

In economics, you often comeacross a statement like,

“After many fluctuations the outputof food grains rose to 176 million tonnesin 1990–91 and 199 million tonnes in1996–97, but fell to 194 million tonnesin 1997–98. Production of food grainsthen rose continuously and touched212 million tonnes in 2001–02.”

In this statement, you can observethat the food grains production indifferent years does not remain thesame. It varies from year to year andfrom crop to crop. As these values

C H A P T E R

2

1 0 STATISTICS FOR ECONOMICS

vary, they are called variable. Thevariables are generally represented bythe letters X, Y or Z. The values ofthese variables are the observation.For example, suppose the food grainproduction in India varies between100 million tonnes in 1970–71 to 220million tonnes in 2001–02 as shownin the following table. The years arerepresented by variable X and theproduction of food grain in India (inmillion tonnes) is represented byvariable Y:

TABLE 2.1

Production of Food Grain in India

(Million Tonnes)

X Y

1970–71 108

1978–79 132

1979–80 108

1990–91 176

1996–97 199

1997–98 194

2001–02 212

Here, these values of the variablesX and Y are the ‘data’, from which wecan obtain information about thetrend of the production of food grainsin India. To know the fluctuations inthe output of food grains, we need the‘data’ on the production of food grainsin India. ‘Data’ is a tool, which helpsin understanding problems byproviding information.

You must be wondering where do‘data’ come from and how do we collectthese? In the following sections we willdiscuss the types of data, method andinstruments of data collection andsources of obtaining data.

2. WHAT ARE THE SOURCES OF DATA?

Statistical data can be obtained fromtwo sources. The enumerator (personwho collects the data) may collect thedata by conducting an enquiry or aninvestigation. Such data are calledPrimary Data, as they are based onfirst hand information. Suppose, youwant to know about the popularity ofa film star among school students. Forthis, you will have to enquire from alarge number of school students, byasking questions from them to collectthe desired information. The data youget, is an example of primary data.

If the data have been collected andprocessed (scrutinised and tabulated)by some other agency, they are calledSecondary Data. Generally, thepublished data are secondary data.They can be obtained either frompublished sources or from any othersource, for example, a web site. Thus,the data are primary to the source thatcollects and processes them for thefirst time and secondary for all sourcesthat later use such data. Use ofsecondary data saves time and cost.For example, after collecting the dataon the popularity of the film staramong students, you publish a report.If somebody uses the data collectedby you for a similar study, it becomessecondary data.

3. HOW DO WE COLLECT THE DATA?

Do you know how a manufacturerdecides about a product or how apolitical party decides about acandidate? They conduct a survey by

COLLECTION OF DATA 1 1

asking questions about a particularproduct or candidate from a largegroup of people. The purpose ofsurveys is to describe somecharacteristics like price, quality,usefulness (in case of the product) andpopularity, honesty, loyalty (in caseof the candidate). The purpose of thesurvey is to collect data. Survey is amethod of gathering information fromindividuals.

Preparation of Instrument

The most common type of instrumentused in surveys is questionnaire/interview schedule. The questionnaireis either self administered by therespondent or administered by theresearcher (enumerator) or trainedinvestigator. While preparing thequestionnaire/interview schedule, youshould keep in mind the followingpoints;

• The questionnaire should not be toolong. The number of questionsshould be as minimum as possible.Long questionnaires discouragepeople from completing them.

• The series of questions should movefrom general to specific. Thequestionnaire should start fromgeneral questions and proceed tomore specific ones. This helps therespondents feel comfortable. Forexample:

Poor Q(i) Is increase in electricity charges

justified?(ii) Is the electricity supply in your

locality regular?

Good Q(i) Is the electricity supply in your

locality regular?(ii) Is increase in electricity charges

justified?

• The questions should be preciseand clear. For example,

Poor QWhat percentage of your income doyou spend on clothing in order to lookpresentable?Good QWhat percentage of your income doyou spend on clothing?

• The questions should not beambiguous, to enable the respon-dents to answer quickly, correctlyand clearly. For example:

Poor QDo you spend a lot of money on booksin a month?Good QHow much do you spend on books ina month?(i) Less than Rs 200(ii) Between Rs 200–300(iii)Between Rs 300–400(iv) More than Rs 400

• The question should not use doublenegatives. The questions startingwith “Wouldn’t you” or “Don’t you”should be avoided, as they maylead to biased responses. Forexample:

Poor QDon’t you think smoking should beprohibited?Good QDo you think smoking should beprohibited?

1 2 STATISTICS FOR ECONOMICS

• The question should not be a

Poor QHow do you like the flavour of thishigh-quality tea?Good QHow do you like the flavour of this tea?

• The question should not indicate

Poor QWould you like to do a job after collegeor be a housewife?Good QWould you like to do a job, if possible?

The questionnaire may consist ofclosed ended (or structured) questionsor open ended (or unstructured)questions.

Closed ended or structuredquestions can either be a two-wayquestion or a multiple choice question.When there are only two possibleanswers, ‘yes’ or ‘no’, it is called a two-way question.

When there is a possibility of morethan two options of answers, multiplechoice questions are more appropriate.Example,Q. Why did you sell your land?

(i) To pay off the debts.(ii) To finance children’s educa-

tion.(iii) To invest in another property.(iv) Any other (please specify).

Closed -ended questions are easyto use, score and code for analysis,

because all the respondents respondfrom the given options. But they aredifficult to write as the alternativesshould be clearly written to representboth sides of the issue. There is alsoa possibility that the individual’s trueresponse is not present among theoptions given. For this, the choice of‘Any Other’ is provided, where therespondent can write a response,which was not anticipated by theresearcher. Moreover, anotherlimitation of multiple-choice questionsis that they tend to restrict theanswers by providing alternatives,without which the respondents mayhave answered differently.

Open-ended questions allow formore individualised responses, butthey are difficult to interpret and hardto score, since there are a lot ofvariations in the responses. Example,Q. What is your view about

globalisation?

Mode of Data Collection

Have you ever come across a televisionshow in which reporters ask questionsfrom children, housewives or generalpublic regarding their examinationperformance or a brand of soap or apolitical party? The purpose of askingquestions is to do a survey forcollection of data. There are threebasic ways of collecting data: (i)Personal Interviews, (ii) Mailing(questionnaire) Surveys, and (iii)Telephone Interviews.

COLLECTION OF DATA 1 3

Personal Interviews

This method is usedwhen the researcherhas access to all themembers. The resea-rcher (or investigator)conducts face to face interviews withthe respondents.

Personal interviews are preferreddue to various reasons. Personalcontact is made between therespondent and the interviewer. Theinterviewer has the opportunity ofexplaining the study and answeringany query of the respondents. Theinterviewer can request the respon-dent to expand on answers that areparticularly important. Misinterpre-tation and misunderstanding can beavoided. Watching the reactions of therespondents can provide supplemen-tary information.

Personal interview has somedemerits too. It is expensive, as itrequires trained interviewers. It takeslonger time to complete the survey.Presence of the researcher may inhibitrespondents from saying what theyreally think.

Mailing Questionnaire

When the data in a survey arecollected by mail, the questionnaire issent to each individualby mail with a requestto complete and returnit by a given date. Theadvantages of thismethod are that, it is

less expensive. It allows the researcherto have access to people in remoteareas too, who might be difficult toreach in person or by telephone. Itdoes not allow influencing of therespondents by the interviewer. It alsopermits the respondents to takesufficient time to give thoughtfulanswers to the questions. These daysonline surveys or surveys throughshort messaging service i.e. SMS havebecome popular. Do you know how anonline survey is conducted?

The disadvantages of mail surveyare that, there is less opportunity toprovide assistance in clarifyinginstructions, so there is a possibilityof misinterpretation of questions.Mailing is also likely to produce lowresponse rates due to certain factorssuch as returning the questionnairewithout completing it, not returningthe questionnaire at all, loss ofquestionnaire in the mail itself, etc.

Telephone Interviews

In a telephone interview, theinvestigator asks questions over the

telephone. The advan-tages of telephoneinterviews are that theyare cheaper thanpersonal interviews and

can be conducted in a shorter time.They allow the researcher to assist therespondent by clarifying thequestions. Telephone interview isbetter in the cases where therespondents are reluctant to answercertain questions in personalinterviews.

1 4 STATISTICS FOR ECONOMICS

Activities

• You have to collect informationfrom a person, who lives in aremote village of India. Whichmode of data collection will bethe most appropriate forcollecting information from him?

• You have to interview the parentsabout the quality of teaching ina school. If the principal of theschool is present there, whattypes of problems can arise?

The disadvantage of this methodis access to people, as many peoplemay not own telephones. TelephoneInterviews also obstruct visualreactions of the respondents, whichbecomes helpful in obtaininginformation on sensitive issues.

Pilot Survey

Once the questionnaire is ready, it isadvisable to conduct a try-out with a

small group which is known as PilotSurvey or Pre-Testing of thequestionnaire. The pilot survey helpsin providing a preliminary idea aboutthe survey. It helps in pre-testing ofthe questionnaire, so as to know theshortcomings and drawbacks of thequestions. Pilot survey also helps inassessing the suitability of questions,clarity of instructions, performance ofenumerators and the cost and timeinvolved in the actual survey.

4. CENSUS AND SAMPLE SURVEYS

Census or Complete Enumeration

A survey, which includes everyelement of the population, is knownas Census or the Method of CompleteEnumeration. If certain agencies areinterested in studying the totalpopulation in India, they have toobtain information from all thehouseholds in rural and urban India.

• Most expensive• Possibility of influencing

respondents• More time taking.

• Cannot be used by illiterates• Long response time• Does not allow explanation of

unambiguous questions• Reactions cannot be watched.

• Limited use• Reactions cannot be watched• Possibility of influencing respon-

dents.

• Highest Response Rate• Allows use of all types of questions• Better for using open-endedquestions

• Allows clarification of ambiguousquestions.

• Least expensive• Only method to reach remote areas• No influence on respondents• Maintains anonymity of respondents• Best for sensitive questions.

• Relatively low cost• Relatively less influence onrespondents

• Relatively high response rate.

COLLECTION OF DATA 1 5

The essential feature of this methodis that this covers every individual unitin the entire population. You cannotselect some and leave out others. Youmay be familiar with the Census ofIndia, which is carried out every tenyears. A house-to-house enquiry iscarried out, covering all householdsin India. Demographic data on birthand death rates, literacy, workforce,life expectancy, size and compositionof population, etc. are collected andpublished by the Registrar General ofIndia. The last Census of India washeld in February 2001.

According to the Census 2001,population of India is 102.70 crore. Itwas 23.83 crore according to Census1901. In a period of hundred years,the population of our countryincreased by 78.87 crore. Census

1981 indicated that the rate ofpopulation growth during 1960s and1970s remained almost same. 1991Census indicated that the annualgrowth rate of population during1980s was 2.14 per cent, which camedown to 1.93 per cent during 1990saccording to Census 2001.

“At 00.00 hours of first March,2001 the population of India stoodat 1027,015,247 comprising of531,277,078 males and495,738,169 females. Thus, Indiabecomes the second country in theworld after China to cross the onebillion mark.”

Source: Census of India, 2001.

Sample Survey

Population or the Universe in statisticsmeans totality of the items understudy. Thus, the Population or theUniverse is a group to which theresults of the study are intended toapply. A population is always all theindividuals/items who possess certaincharacteristics (or a set of characteris-

Source: Census of India, 2001.

1 6 STATISTICS FOR ECONOMICS

• Sample: Ten per cent of theagricultural labourers in Chura-chandpur district.

Most of the surveys are samplesurveys. These are preferred instatistics because of a number ofreasons. A sample can providereasonably reliable and accurateinformation at a lower cost andshorter time. As samples are smallerthan population, more detailedinformation can be collected byconducting intensive enquiries. As weneed a smaller team of enumerators,it is easier to train them and supervisetheir work more effectively.

Now the question is how do youdo the sampling? There are two maintypes of sampling, random and non-random. The following description willmake their distinction clear.

Activities

• In which years will the nextCensus be held in India andChina?

• If you have to study the opinionof students about the neweconomics textbook of class XI,what will be your population andsample?

• If a researcher wants to estimatethe average yield of wheat inPunjab, what will be her/hispopulation and sample?

Random Sampling

As the name suggests, randomsampling is one where the individualunits from the population (samples)are selected at random. Thegovernment wants to determine the

tics), according to the purpose of thesurvey. The first task in selecting asample is to identify the population.Once the population is identified, theresearcher selects a RepresentativeSample, as it is difficult to study theentire population. A sample refers toa group or section of the populationfrom which information is to beobtained. A good sample (represen-tative sample) is generally smaller thanthe population and is capable ofproviding reasonably accurateinformation about the population ata much lower cost and shorter time.

Suppose you want to study theaverage income of people in a certainregion. According to the Censusmethod, you would be required to findout the income of every individual inthe region, add them up and divideby number of individuals to get theaverage income of people in the region.This method would require hugeexpenditure, as a large number ofenumerators have to be employed.Alternatively, you select a represent-ative sample, of a few individuals, fromthe region and find out their income.The average income of the selectedgroup of individuals is used as anestimate of average income of theindividuals of the entire region.

Example

• Research problem: To study theeconomic condition of agriculturallabourers in Churachandpur districtof Manipur.• Population: All agriculturallabourers in Churachandpur district.

COLLECTION OF DATA 1 7

impact of the rise in petrol price onthe household budget of a particularlocality. For this, a representative(random) sample of 30 households hasto be taken and studied. The namesof all the 300 households of that areaare written on pieces of paper andmixed well, then 30 names to beinterviewed are selected one by one.

In the random sampling, everyindividual has an equal chance of beingselected and the individuals who areselected are just like the ones who arenot selected. In the above example, allthe 300 sampling units (also calledsampling frame) of the population gotan equal chance of being included inthe sample of 30 units and hence thesample, such drawn, is a randomsample. This is also called lotterymethod. The same could be done usinga Random Number Table also.

How to use the Random NumberTables?

Do you know what are the RandomNumber Tables? Random number

tables have been generated toguarantee equal probability ofselection of every individual unit (bytheir listed serial number in thesampling frame) in the population.They are available either in apublished form or can be generatedby using appropriate softwarepackages (See Appendix B).You canstart using the table from anywhere,i.e., from any page, column, row orpoint. In the above example, you needto select a sample of 30 householdsout of 300 total households. Here, thelargest serial number is 300, a threedigit number and therefore we consultthree digit random numbers insequence. We will skip the randomnumbers greater than 300 since thereis no household number greater than300. Thus, the 30 selected householdsare with serial numbers: 149, 219,111, 165, 230, 007, 089, 212, 051,244, 300, 051, 244, 155, 300, 051,152, 156, 205, 070, 015, 157, 040,243, 479, 116, 122, 081, 160, 162.

Exit Polls

You must have seen that when an

election takes place, the television

networks provide election coverage.

They also try to predict the results.

This is done through exit polls,

wherein a random sample of voters

who exit the polling booths are asked

whom they voted for. From the data

of the sample of voters, the

A non RepresentativeSample

A RepresentativeSample

A Population of 20Kuchha and 20Pucca Houses

1 8 STATISTICS FOR ECONOMICS

Activity

• You have to analyse the trend offoodgrains production in Indiafor the last fifty years. As it isdifficult to include all the years,you have to select a sample ofproduction of ten years. Usingthe Random Number Tables,how will you select your sample?

Non-Random Sampling

There may be a situation that youhave to select 10 out of 100households in a locality. You have todecide which household to select andwhich to reject. You may select thehouseholds conveniently situated orthe households known to you or yourfriend. In this case, you are using yourjudgement (bias) in selecting 10households. This way of selecting 10out of 100 households is not a randomselection. In a non-random samplingmethod all the units of the populationdo not have an equal chance of beingselected and convenience or judgementof the investigator plays an importantrole in selection of the sample. They aremainly selected on the basis ofjudgment, purpose, convenience orquota and are non-random samples.

5. SAMPLING AND NON-SAMPLINGE R R O R S

Sampling Errors

The purpose of the sample is to takean estimate of the population.Sampling error refers to thedifferences between the sampleestimate and the actual value of a

characteristic of the population (thatmay be the average income, etc.). It isthe error that occurs when you makean observation from the sample takenfrom the population. Thus, thedifference between the actual value ofa parameter of the population (whichis not known) and its estimate (fromthe sample) is the sampling error. It ispossible to reduce the magnitude ofsampling error by taking a largersample.

Example

Consider a case of incomes of 5farmers of Manipur. The variable x(income of farmers) has measure-ments 500, 550, 600, 650, 700. Wenote that the population average of( 5 0 0 + 5 5 0 + 6 0 0 + 6 5 0 + 7 0 0 )÷ 5 = 3000 ÷ 5 = 600.

Now, suppose we select a sampleof two individuals where x hasmeasurements of 500 and 600. Thesample average is (500 + 600) ÷ 2= 1100 ÷ 2 = 550.Here, the sampling error of theestimate = 600 (true value) – 550(estimate) = 50.

Non-Sampling Errors

Non-sampling errors are more seriousthan sampling errors because asampling error can be minimised bytaking a larger sample. It is difficultto minimise non-sampling error, evenby taking a large sample. Even aCensus can contain non-samplingerrors. Some of the non-samplingerrors are:

COLLECTION OF DATA 1 9

Errors in Data Acquisition

This type of error arises from recordingof incorrect responses. Suppose, theteacher asks the students to measurethe length of the teacher’s table in theclassroom. The measurement by thestudents may differ. The differencesmay occur due to differences inmeasuring tape, carelessness of thestudents etc. Similarly, suppose wewant to collect data on prices oforanges. We know that prices varyfrom shop to shop and from marketto market. Prices also vary accordingto the quality. Therefore, we can onlyconsider the average prices. Recordingmistakes can also take place as theenumerators or the respondents maycommit errors in recording or trans-scripting the data, for example, he/she may record 13 instead of 31.

Non-Response Errors

Non-response occurs if an intervieweris unable to contact a person listed inthe sample or a person from thesample refuses to respond. In thiscase, the sample observation may notbe representative.

Sampling Bias

Sampling bias occurs when thesampling plan is such that somemembers of the target populationcould not possibly be included in thesample.

6. CENSUS OF INDIA AND NSSO

There are some agencies both at thenational and state level, which collect,

process and tabulate the statisticaldata. Some of the major agencies atthe national level are Census of India,National Sample Survey Organisation(NSSO), Central Statistical Organisa-tion (CSO), Registrar General of India(RGI), Directorate General ofCommercial Intelligence and Statistics(DGCIS), Labour Bureau etc.

The Census of India provides themost complete and continuousdemographic record of population. TheCensus is being regularly conductedevery ten years since 1881. The firstCensus after Independence was heldin 1951. The Census collectsinformation on various aspects ofpopulation such as the size, density,sex ratio, literacy, migration, rural-urban distribution etc. Census inIndia is not merely a statisticaloperation, the data is interpreted andanalysed in an interesting manner.

The NSSO was established by thegovernment of India to conductnation-wide surveys on socio-economic issues. The NSSO doescontinuous surveys in successiverounds. The data collected by NSSOsurveys, on different socio economicsubjects, are released through reportsand its quarterly journalSarvekshana. NSSO provides periodicestimates of literacy, schoolenrolment, utilisation of educationalservices, employment, unemployment,manufacturing and service sectorenterprises, morbidity, maternity,child care, utilisation of the publicdistribution system etc. The NSS 59thround survey (January–December

2 0 STATISTICS FOR ECONOMICS

Recap

• Data is a tool which helps in reaching a sound conclusion on anyproblem by providing information.

• Primary data is based on first hand information.• Survey can be done by personal interviews, mailing questionnaires

and telephone interviews.• Census covers every individual/unit belonging to the population.• Sample is a smaller group selected from the population from which

the relevant information would be sought.• In a random sampling, every individual is given an equal chance of

being selected for providing information.• Sampling error arises due to the difference between the actual

population and the estimate.• Non-sampling errors can arise in data acquisition, by non-response

or by bias in selection.• Census of India and National Sample Survey Organisation

are two important agencies at the national level, which collect,process and tabulate data.

EXERCISES

1. Frame at least four appropriate multiple-choice options for followingquestions:(i) Which of the following is the most important when you buy a new

dress?

2003) was on land and livestockholdings, debt and investment. TheNSS 60th round survey (January–June 2004) was on morbidity andhealth care. The NSSO alsoundertakes the fieldwork of Annualsurvey of industries, conducts cropestimation surveys, collects rural andurban retail prices for compilation ofconsumer price index numbers.

7. CONCLUSION

Economic facts, expressed in terms ofnumbers, are called data. The purpose

of data collection is to understand,

explain and analyse a problem and

causes behind it. Primary data is

obtained by conducting a survey.Survey includes various steps, which

need to be planned carefully. There are

various agencies which collect,

process, tabulate and publish

statistical data. These can be used as

secondary data. However, the choiceof source of data and mode of data

collection depends on the objective of

the study.

COLLECTION OF DATA 2 1

(ii) How often do you use computers?(iii) Which of the newspapers do you read regularly?(iv) Rise in the price of petrol is justified.(v) What is the monthly income of your family?

2. Frame five two-way questions (with ‘Yes’ or ‘No’).

3. (i) There are many sources of data (true/false).(ii) Telephone survey is the most suitable method of collecting data, when

the population is literate and spread over a large area (true/false).(iii) Data collected by investigator is called the secondary data (true/false).(iv) There is a certain bias involved in the non-random selection of samples

(true/false).(v) Non-sampling errors can be minimised by taking large samples (true/

false).

4. What do you think about the following questions. Do you find any problemwith these questions? If yes, how?(i) How far do you live from the closest market?(ii) If plastic bags are only 5 percent of our garbage, should it be banned?(iii) Wouldn’t you be opposed to increase in price of petrol?(iv) (a) Do you agree with the use of chemical fertilizers?

(b) Do you use fertilizers in your fields?(c) What is the yield per hectare in your field?

5. You want to research on the popularity of Vegetable Atta Noodles amongchildren. Design a suitable questionnaire for collecting this information.

6. In a village of 200 farms, a study was conducted to find the croppingpattern. Out of the 50 farms surveyed, 50% grew only wheat. Identify thepopulation and the sample here.

7. Give two examples each of sample, population and variable.

8. Which of the following methods give better results and why?(a) Census (b) Sample

9. Which of the following errors is more serious and why?

(a) Sampling error (b) Non-Sampling error

10. Suppose there are 10 students in your class. You want to select three outof them. How many samples are possible?

11. Discuss how you would use the lottery method to select 3 students out of10 in your class?

12. Does the lottery method always give you a random sample? Explain.

13. Explain the procedure of selecting a random sample of 3 students out of10 in your class, by using random number tables.

14. Do samples provide better results than surveys? Give reasons for youranswer.

between census and sampling. In thischapter, you will know how the data,that you collected, are to be classified.The purpose of classifying raw data isto bring order in them so that theycan be subjected to further statisticalanalysis easily.

Have you ever observed your localjunk dealer or kabadiwallah to whomyou sell old newspapers, brokenhousehold items, empty glass bottles,plastics etc. He purchases thesethings from you and sells them tothose who recycle them. But with somuch junk in his shop it would be verydifficult for him to manage his trade,if he had not organised them properly.To ease his situation he suitablygroups or “classifies” various junk.He puts old newspapers together and

Organisation of Data

1. INTRODUCTION

In the previous chapter you havelearnt about how data is collected. Youalso came to know the difference

Studying this chapter should enableyou to:• classify the data for further

statistical analysis;• distinguish between quantitative

and qualitative classification;• prepare a frequency distribution

table;• know the technique of forming

classes;• be familiar with the method of tally

marking;• differentiate between univariate

and bivariate frequency distribu-tions.

CHAPTER

ORGANISATION OF DATA 2 3

ties them with a rope. Then collectsall empty glass bottles in a sack. Heheaps the articles of metals in onecorner of his shop and sorts them intogroups like “iron”, “copper”,“aluminium”, “brass” etc., and so on.In this way he groups his junk intodifferent classes — “newspapers,“plastics”, “glass”, “metals” etc. — andbrings order in them. Once his junkis arranged and classified, it becomeseasier for him to find a particular itemthat a buyer may demand.

Likewise when you arrange yourschoolbooks in a certain order, itbecomes easier for you to handlethem. You may classify them

according to subjects where eachsubject becomes a group or a class.So, when you need a particular bookon history, for instance, all you needto do is to search that book in thegroup “History”. Otherwise, youwould have to search through yourentire collection to find the particularbook you are looking for.

While classification of objects orthings saves our valuable time andeffort, it is not done in an arbitrary

manner. The kabadiwallah groups hisjunk in such a way that each groupconsists of similar items. For example,under the group “Glass” he would putempty bottles, broken mirrors andwindowpanes etc. Similarly when youclassify your history books under thegroup “History” you would not put abook of a different subject in thatgroup. Otherwise the entire purposeof grouping would be lost.Classification, therefore, is arrangingor organising similar things into groupsor classes.

Activity

• Visit your local post-office to findout how letters are sorted. Doyou know what the pin-code in aletter indicates? Ask yourpostman.

2. RAW DATA

Like the kabadiwallah’s junk, theunclassified data or raw data arehighly disorganised. They are oftenvery large and cumbersome to handle.To draw meaningful conclusions fromthem is a tedious task because theydo not yield to statistical methodseasily. Therefore proper organisationand presentation of such data isneeded before any systematicstatistical analysis is undertaken.Hence after collecting data the nextstep is to organise and present themin a classified form.

Suppose you want to know theperformance of students inmathematics and you have collecteddata on marks in mathematics of 100

2 4 STATISTICS FOR ECONOMICS

students of your school. If you presentthem as a table, they may appearsomething like Table 3.1.

TABLE 3.1Marks in Mathematics Obtained by 100

Students in an Examination

47 45 10 60 51 56 66 100 49 4060 59 56 55 62 48 59 55 51 4142 69 64 66 50 59 57 65 62 5064 30 37 75 17 56 20 14 55 9062 51 55 14 25 34 90 49 56 5470 47 49 82 40 82 60 85 65 6649 44 64 69 70 48 12 28 55 6549 40 25 41 71 80 0 56 14 2266 53 46 70 43 61 59 12 30 3545 44 57 76 82 39 32 14 90 25

Or you could have collected dataon the monthly expenditure on foodof 50 households in yourneighbourhood to know their averageexpenditure on food. The datacollected, in that case, had you

presented as a table, would haveresembled Table 3.2. Both Tables 3.1and 3.2 are raw or unclassified data.In both the tables you find thatnumbers are not arranged in anyorder. Now if you are asked what arethe highest marks in mathematics

TABLE 3.2Monthly Household Expenditure (inRupees) on Food of 50 Households

1904 1559 3473 1735 27602041 1612 1753 1855 44395090 1085 1823 2346 15231211 1360 1110 2152 11831218 1315 1105 2628 27124248 1812 1264 1183 11711007 1180 1953 1137 20482025 1583 1324 2621 36761397 1832 1962 2177 25751293 1365 1146 3222 1396

from Table 3.1 then you have to firstarrange the marks of 100 studentseither in ascending or in descendingorder. That is a tedious task. Itbecomes more tedious, if instead of100 you have the marks of a 1,000students to handle. Similarly in Table3.2, you would note that it is difficultfor you to ascertain the averagemonthly expenditure of 50households. And this difficulty will goup manifold if the number was larger— say, 5,000 households. Like ourkabadiwallah, who would bedistressed to find a particular itemwhen his junk becomes large anddisarranged, you would face a similarsituation when you try to get anyinformation from raw data that arelarge. In one word, therefore, it is atedious task to pull information fromlarge unclassified data.

The raw data are summarised, andmade comprehensible by classifi-cation. When facts of similarcharacteristics are placed in the sameclass, it enables one to locate themeasily, make comparison, and drawinferences without any difficulty. You

ORGANISATION OF DATA 2 5

have studied in Chapter 2 that theGovernment of India conducts Censusof population every ten years. The rawdata of census are so large andfragmented that it appears an almostimpossible task to draw anymeaningful conclusion from them.But when the data of Census areclassified according to gender,education, marital status, occupation,etc., the structure and nature ofpopulation of India is, then, easilyunderstood.

The raw data consist ofobservations on variables. Each unitof raw data is an observation. In Table3.1 an observation shows a particularvalue of the variable “marks of astudent in mathematics”. The rawdata contain 100 observations on“marks of a student” since there are100 students. In Table 3.2 it shows aparticular value of the variable“monthly expenditure of a householdon food”. The raw data in it contain50 observations on “monthlyexpenditure on food of a household”because there are 50 households.

Activity

• Collect data of total weeklyexpenditure of your family for ayear and arrange it in a table.See how many observations youhave. Arrange the data monthlyand find the number ofobservations.

3. CLASSIFICATION OF DATA

The groups or classes of aclassification can be done in various

ways. Instead of classifying your booksaccording to subjects — “History”,“Geography”, “Mathematics”, “Science”etc. — you could have classified themauthor-wise in an alphabetical order.Or, you could have also classified themaccording to the year of publication.The way you want to classify themwould depend on your requirement.

Likewise the raw data could beclassified in various ways dependingon the purpose in hand. They can begrouped according to time. Such aclassification is known as aChronological Classification. Insuch a classification, data areclassified either in ascending or indescending order with reference totime such as years, quarters, months,weeks, etc. The following exampleshows the population of Indiaclassified in terms of years. Thevariable ‘population’ is a Time Seriesas it depicts a series of values fordifferent years.

Example 1

Population of India (in crores)

Year Population (Crores)

1951 35.7

1961 43.8

1971 54.6

1981 68.4

1991 81.8

2001 102.7

In Spatial Classification the dataare classified with reference togeographical locations such ascountries, states, cities, districts, etc.Example 2 shows the yield of wheat indifferent countries.

2 6 STATISTICS FOR ECONOMICS

Example 2

Yield of Wheat for Different Countries

Country Yield of wheat (kg/acre)

America 1925Brazil 127China 893Denmark 225France 439India 862

Activities

• In the time-series of Example 1,in which year do you find thepopulation of India to be theminimum. Find the year when itis the maximum.

• In Example 2, find the countrywhose yield of wheat is slightlymore than that of India’s. Howmuch would that be in terms ofpercentage?

• Arrange the countries ofExample 2 in the ascendingorder of yield. Do the sameexercise for the descending orderof yield.

Sometimes you come acrosscharacteristics that cannot beexpressed quantitatively. Suchcharacteristics are called Qualities orAttributes. For example, nationality,literacy, religion, gender, maritalstatus, etc. They cannot be measured.Yet these attributes can be classified

on the basis of either the presence orthe absence of a qualitativecharacteristic. Such a classification ofdata on attributes is called aQualitative Classification. In thefollowing example, we find populationof a country is grouped on the basisof the qualitative variable “gender”. Anobservation could either be a male ora female. These two characteristicscould be further classified on the basisof marital status (a qualitativevariable) as given below:

Example 3

Population

Male Female

Married Unmarried Married Unmarried

The classification at the first stageis based on the presence and absenceof an attribute i.e. male or not male(female). At the second stage, eachclass — male and female, is further subdivided on the basis of the presence orabsence of another attribute i.e.whether married or unmarried. On the

Activity

• The objects around can begrouped as either living or non-living. Is it a quantitativeclassification?

other hand, characteristics like height,weight, age, income, marks ofstudents, etc. are quantitative innature. When the collected data ofsuch characteristics are grouped into

ORGANISATION OF DATA 2 7

classes, the classification is aQuantitative Classification.

Example 4

Frequency Distribution of Marks inMathematics of 100 Students

Marks Frequency

0–10 110–20 820–30 630–40 740–50 2150–60 2360–70 1970–80 680–90 590–100 4

Total 100

Example 4 shows quantitativeclassification of the data of marks inmathematics of 100 students given inTable 3.1 as a Frequency Distribution.

Activity

• Express the values of frequencyof Example 4 as proportion orpercentage of total frequency.Note that frequency expressed inthis way is known as relativefrequency.

• In Example 4, which class hasthe maximum concentration ofdata? Express it as percentageof total observations. Which classhas the minimum concentrationof data?

4. VARIABLES: CONTINUOUS AND

DISCRETE

A simple definition of variable,which you have read in the last

chapter, does not tell you how it varies.Different variables vary differently anddepending on the way they vary, theyare broadly classified into two types:

(i) Continuous and(ii) Discrete.

A continuous variable can take anynumerical value. It may take integralvalues (1, 2, 3, 4, ...), fractional values(1/2, 2/3, 3/4, ...), and values thatare not exact fractions ( 2 =1.414,

3 =1.732, … , 7 =2.645). Forexample, the height of a student, ashe/she grows say from 90 cm to 150cm, would take all the values inbetween them. It can take values thatare whole numbers like 90cm, 100cm,108cm, 150cm. It can also takefractional values like 90.85 cm, 102.34cm, 149.99cm etc. that are not wholenumbers. Thus the variable “height”

is capable ofmanifesting inevery conceivablevalue and itsvalues can also

be broken down into infinitegradations. Other examples of acontinuous variable are weight, time,distance, etc.

Unlike a continuous variable, adiscrete variable can take only certainvalues. Its value changes only by finite“jumps”. It “jumps” from one value toanother but does not take anyintermediate value between them. Forexample, a variable like the “numberof students in a class”, for differentclasses, would assume values that areonly whole numbers. It cannot take

2 8 STATISTICS FOR ECONOMICS

any fractional value like0.5 because “half of astudent” is absurd.Therefore it cannot take avalue like 25.5 between 25and 26. Instead its valuecould have been either 25or 26. What we observe isthat as its value changesfrom 25 to 26, the valuesin between them — the fractions arenot taken by it. But do not have theimpression that a discrete variablecannot take any fractional value.Suppose X is a variable that takesvalues like 1/8, 1/16, 1/32, 1/64, ...Is it a discrete variable? Yes, becausethough X takes fractional values itcannot take any value between twoadjacent fractional values. It changesor “jumps” from 1/8 to 1/16 and from1/16 to 1/32. But cannot take a valuein between 1/8 and 1/16 or between1/16 and 1/32

Activity

• Distinguish the followingvariables as continuous anddiscrete:Area, volume, temperature,number appearing on a dice,crop yield, population, rainfall,number of cars on road, age.

Earlier we have mentioned thatexample 4 is the frequencydistribution of marks in mathematicsof 100 students as shown in Table 3.1.It shows how the marks of 100students are grouped into classes. Youwill be wondering as to how we got itfrom the raw data of Table 3.1. But,

before we address this question, youmust know what a frequencydistribution is.

5. WHAT IS A FREQUENCY DISTRIBUTION?

A frequency distribution is acomprehensive way to classify rawdata of a quantitative variable. Itshows how the different values of avariable (here, the marks inmathematics scored by a student) aredistributed in different classes alongwith their corresponding classfrequencies. In this case we have tenclasses of marks: 0–10, 10–20, … , 90–100. The term Class Frequency meansthe number of values in a particularclass. For example, in the class 30–40 we find 7 values of marks from rawdata in Table 3.1. They are 30, 37, 34,30, 35, 39, 32. The frequency of theclass: 30–40 is thus 7. But you mightbe wondering why 40–which isoccurring twice in the raw data – isnot included in the class 30–40. Hadit been included the class frequencyof 30–40 would have been 9 insteadof 7. The puzzle would be clear to youif you are patient enough to read thischapter carefully. So carry on. You willfind the answer yourself.

Each class in a frequencydistribution table is bounded by ClassLimits. Class limits are the two endsof a class. The lowest value is calledthe Lower Class Limit and the highestvalue the Upper Class Limit. Forexample, the class limits for the class:60–70 are 60 and 70. Its lower classlimit is 60 and its upper class limit is70. Class Interval or Class Width is

ORGANISATION OF DATA 2 9

the difference between the upper classlimit and the lower class limit. For theclass 60–70, the class interval is 10(upper class limit minus lower classlimit).

The Class Mid-Point or Class Markis the middle value of a class. It lieshalfway between the lower class limitand the upper class limit of a classand can be ascertained in thefollowing manner:

Class Mid-Point or Class Mark =(Upper Class Limit + Lower ClassLimit) / 2 .....................................(1)

The class mark or mid-value ofeach class is used to represent theclass. Once raw data are grouped intoclasses, individual observations arenot used in further calculations.Instead, the class mark is used.

TABLE 3.3The Lower Class Limits, the Upper Class

Limits and the Class Mark

Class Frequency Lower Upper ClassClass Class MarksLimit Limit

0–10 1 0 10 510–20 8 10 20 1520–30 6 20 30 2530–40 7 30 40 3540–50 21 40 50 4550–60 23 50 60 5560–70 19 60 70 6570–80 6 70 80 7580–90 5 80 90 8590–100 4 90 100 95

Frequency Curve is a graphicrepresentation of a frequencydistribution. Fig. 3.1 shows thediagrammatic presentation of the

frequency distribution of the data inour example above. To obtain thefrequency curve we plot the classmarks on the X-axis and frequency onthe Y-axis.

Fig. 3.1: Diagrammatic Presentation ofFrequency Distribution of Data.

How to prepare a FrequencyDistribution?

While preparing a frequencydistribution from the raw data of Table3.1, the following four questions needto be addressed:1. How many classes should we

have?2. What should be the size of each

class?3. How should we determine the class

limits?4. How should we get the frequency

for each class?

How many classes should we have?

Before we determine the numberof classes, we first find out as to whatextent the variable in hand changesin value. Such variations in variable’svalue are captured by its range. TheRange is the difference between thelargest and the smallest values of the

3 0 STATISTICS FOR ECONOMICS

variable. A large range indicates thatthe values of the variable are widelyspread. On the other hand, a smallrange indicates that the values of thevariable are spread narrowly. In ourexample the range of the variable“marks of a student” are 100 becausethe minimum marks are 0 and themaximum marks 100. It indicates thatthe variable has a large variation.

After obtaining the value of range,it becomes easier to determine thenumber of classes once we decide theclass interval. Note that range is thesum of all class intervals. If the classintervals are equal then range is theproduct of the number of classes andclass interval of a single class.

Range = Number of Classes × ClassInterval ........................................(2)

Activities

Find the range of the following:• population of India in Example 1,• yield of wheat in Example 2.

Given the value of range, thenumber of classes would be large ifwe choose small class intervals. Afrequency distribution with too manyclasses would look too large. Such adistribution is not easy to handle. Sowe want to have a reasonably compactset of data. On the other hand, giventhe value of range if we choose a classinterval that is too large then thenumber of classes becomes too small.The data set then may be too compactand we may not like the loss ofinformation about its diversity. For

example, suppose the range is 100and the class interval is 50. Then thenumber of classes would be just 2(i.e.100/50 = 2). Though there is nohard-and-fast rule to determine thenumber of classes, the rule of thumboften used is that the number ofclasses should be between 5 and 15.In our example we have chosen tohave 10 classes. Since the range is 100and the class interval is 10, thenumber of classes is 100/10 =10.

What should be the size of eachclass?

The answer to this question dependson the answer to the previousquestion. The equality (2) shows thatgiven the range of the variable, we candetermine the number of classes oncewe decide the class interval. Similarly,we can determine the class intervalonce we decide the number of classes.Thus we find that these two decisionsare inter-linked with one another. Wecannot decide on one without decidingon the other.

In Example 4, we have the numberof classes as 10. Given the value ofrange as 100, the class intervals areautomatically 10 by the equality (2).Note that in the present context wehave chosen class intervals that areequal in magnitude. However we couldhave chosen class intervals that arenot of equal magnitude. In that case,the classes would have been ofunequal width.

ORGANISATION OF DATA 3 1

How should we determine the classlimits?

When we classify raw data of acontinuous variable as a frequencydistribution, we in effect, group theindividual observations into classes.The value of the upper class limit of aclass is obtained by adding the classinterval with the value of the lowerclass limit of that class. For example,the upper class limit of the class 20–30 is 20 + 10 = 30 where 20 is thelower class limit and 10 is the classinterval. This method is repeated forother classes as well.

But how do we decide the lowerclass limit of the first class? That is tosay, why 0 is the lower class limit ofthe first class: 0–10? It is because wechose the minimum value of thevariable as the lower limit of the firstclass. In fact, we could have chosen avalue less than the minimum value ofthe variable as the lower limit of thefirst class. Similarly, for the upperclass limit for the last class we couldhave chosen a value greater than themaximum value of the variable. It isimportant to note that, when afrequency distribution is beingconstructed, the class limits shouldbe so chosen that the mid-point orclass mark of each class coincide, asfar as possible, with any value aroundwhich the data tend to beconcentrated.

In our example on marks of 100students, we chose 0 as the lower limitof the first class: 0–10 because theminimum marks were 0. And that iswhy, we could not have chosen 1 as

the lower class limit of that class. Hadwe done that we would have excludedthe observation 0. The upper classlimit of the first class: 0–10 is thenobtained by adding class interval withlower class limit of the class. Thus theupper class limit of the first classbecomes 0 + 10 = 10. And this proce-dure is followed for the other classesas well.

Have you noticed that the upperclass limit of the first class is equal tothe lower class limit of the secondclass? And both are equal to 10. Thisis observed for other classes as well.Why? The reason is that we have usedthe Exclusive Method of classificationof raw data. Under the method weform classes in such a way that thelower limit of a class coincides withthe upper class limit of the previousclass.

The problem, we would face next,is how do we classify an observationthat is not only equal to the upperclass limit of a particular class but isalso equal to the lower class limit ofthe next class. For example, we findobservation 30 to be equal to theupper class limit of the class 20–30and it is equal to the lower class limitof class 30–40. Then, in which of thetwo classes: 20–30 or 30–40 shouldwe put the observation 30? We can putit either in class 20–30 or in class 30–40. It is a dilemma that one commonlyfaces while classifying data inoverlapping classes. This problem issolved by the rule of classification inthe Exclusive Method.

3 2 STATISTICS FOR ECONOMICS

Exclusive Method

The classes, by this method, areformed in such a way that the upperclass limit of one class equals thelower class limit of the next class. Inthis way the continuity of the data ismaintained. That is why this methodof classification is most suitable incase of data of a continuous variable.Under the method, the upper class limitis excluded but the lower class limit ofa class is included in the interval. Thusan observation that is exactly equalto the upper class limit, according tothe method, would not be included inthat class but would be included inthe next class. On the other hand, ifit were equal to the lower class limitthen it would be included in that class.In our example on marks of students,the observation 40, that occurs twice,in the raw data of Table 3.1 is notincluded in the class: 30–40. It isincluded in the next class: 40–50. Thatis why we find the frequency corres-ponding to the class 30–40 to be 7instead of 9.

There is another method of formingclasses and it is known as theInclusive Method of classification.

Inclusive Method

In comparison to the exclusive method,the Inclusive Method does not excludethe upper class limit in a classinterval. It includes the upper classin a class. Thus both class limits areparts of the class interval.

For example, in the frequencydistribution of Table 3.4 we include

TABLE 3.4Frequency Distribution of Incomes of 550

Employees of a Company

Income (Rs) Number of Employees

800–899 50900–999 1001000–1099 2001100–1199 1501200–1299 401300–1399 10

Total 550

in the class: 800–899 those employeeswhose income is either Rs 800, orbetween Rs 800 and Rs 899, or Rs899. If the income of an employee isexactly Rs 900 then he is put in thenext class: 900–999.

A close observation of the InclusiveMethod in Table 3.4 would show thatthough the variable “income” is acontinuous variable, no suchcontinuity is maintained when theclasses are made. We find “gap” ordiscontinuity between the upper limitof a class and the lower limit of thenext class. For example, between theupper limit of the first class: 899 andthe lower limit of the second class:900, we find a “gap” of 1. Then howdo we ensure the continuity of thevariable while classifying data? Thisis achieved by making an adjustmentin the class interval. The adjustmentis done in the following way:1. Find the difference between the

lower limit of the second class andthe upper limit of the first class.For example, in Table 3.4 the lowerlimit of the second class is 900 and

ORGANISATION OF DATA 3 3

the upper limit of the first class is899. The difference between themis 1, i.e. (900 – 899 = 1)

2. Divide the difference obtained in(1) by two i.e. (1/2 = 0.5)

3. Subtract the value obtained in (2)from lower limits of all classes(lower class limit – 0.5)

4. Add the value obtained in (2) toupper limits of all classes (upperclass limit + 0.5).After the adjustment that restores

continuity of data in the frequencydistribution, the Table 3.4 is modifiedinto Table 3.5

After the adjustments in classlimits, the equality (1) that determinesthe value of class-mark would bemodified as the following:

TABLE 3.5Frequency Distribution of Incomes of 550

Employees of a Company

Income (Rs) Number of Employees

799.5–899.5 50899.5–999.5 100999.5–1099.5 2001099.5–1199.5 1501199.5–1299.5 401299.5–1399.5 10

Total 550

How should we get the frequencyfor each class?

In simple terms, frequency of anobservation means how many timesthat observation occurs in the rawdata. In our Table 3.1, we observe thatthe value 40 occurs thrice; 0 and 10occur only once; 49 occurs five timesand so on. Thus the frequency of 40is 3, 0 is 1, 10 is 1, 49 is 5 and so on.But when the data are grouped into

TABLE 3.6Tally Marking of Marks of 100 Students in Mathematics

Class Observations Tally Frequency ClassMark Mark

0–10 0 / 1 510–20 10, 14, 17, 12, 14, 12, 14, 14 //// /// 8 1520–30 25, 25, 20, 22, 25, 28 //// / 6 2530–40 30, 37, 34, 39, 32, 30, 35, //// // 7 3540–50 47, 42, 49, 49, 45, 45, 47, 44, 40, 44, //// //// ////

49, 46, 41, 40, 43, 48, 48, 49, 49, 40, //// /41 21 45

50–60 59, 51, 53, 56, 55, 57, 55, 51, 50, 56, //// //// ////59, 56, 59, 57, 59, 55, 56, 51, 55, 56, //// ///55, 50, 54 23 55

60–70 60, 64, 62, 66, 69, 64, 64, 60, 66, 69, //// //// ////62, 61, 66, 60, 65, 62, 65, 66, 65 //// 19 65

70–80 70, 75, 70, 76, 70, 71 ///// 6 7580–90 82, 82, 82, 80, 85 //// 5 8590–100 90, 100, 90, 90 //// 4 95

Total 100

3 4 STATISTICS FOR ECONOMICS

classes as in example 3, the ClassFrequency refers to the number ofvalues in a particular class. Thecounting of class frequency is done bytally marks against the particularclass.

Finding class frequency by tallymarking

A tally (/) is put against a class foreach student whose marks areincluded in that class. For example, ifthe marks obtained by a student are57, we put a tally (/) against class 50–60. If the marks are 71, a tally is putagainst the class 70–80. If someoneobtains 40 marks, a tally is putagainst the class 40–50. Table 3.6shows the tally marking of marks of100 students in mathematics fromTable 3.1.

The counting of tally is made easierwhen four of them are put as ////and the fifth tally is placed acrossthem as . Tallies are then countedas groups of five. So if there are 16tallies in a class, we put them as

/ for the sake ofconvenience. Thus frequency in aclass is equal to the number of talliesagainst that class.

Loss of Information

The classification of data as afrequency distribution has aninherent shortcoming. While itsummarises the raw data making itconcise and comprehensible, it doesnot show the details that are found inraw data. There is a loss of information

in classifying raw data though muchis gained by summarising it as aclassified data. Once the data aregrouped into classes, an individualobservation has no significance infurther statistical calculations. InExample 4, the class 20–30 contains6 observations: 25, 25, 20, 22, 25 and28. So when these data are groupedas a class 20–30 in the frequencydistribution, the latter provides onlythe number of records in that class(i.e. frequency = 6) but not their actualvalues. All values in this class areassumed to be equal to the middlevalue of the class interval or classmark (i.e. 25). Further statisticalcalculations are based only on thevalues of class mark and not on thevalues of the observations in thatclass. This is true for other classes aswell. Thus the use of class markinstead of the actual values of theobservations in statistical methodsinvolves considerable loss ofinformation.

Frequency distribution withunequal classes

By now you are familiar withfrequency distributions of equal classintervals. You know how they areconstructed out of raw data. But insome cases frequency distributionswith unequal class intervals are moreappropriate. If you observe thefrequency distribution of Example 4,as in Table 3.6, you will notice thatmost of the observations areconcentrated in classes 40–50, 50–60and 60–70. Their respective frequen-

ORGANISATION OF DATA 3 5

cies are 21, 23 and 19. It means thatout of 100 observations, 63(21+23+19) observations areconcentrated in these classes. Theseclasses are densely populated withobservations. Thus, 63 percent of datalie between 40 and 70. The remaining37 percent of data are in classes0–10, 10–20, 20–30, 30–40, 70–80,80–90 and 90–100. These classes aresparsely populated with observations.Further you will also notice thatobservations in these classes deviatemore from their respective class marksthan in comparison to those in otherclasses. But if classes are to be formedin such a way that class markscoincide, as far as possible, to a valuearound which the observations in aclass tend to concentrate, then in thatcase unequal class interval is moreappropriate.

Table 3.7 shows the samefrequency distribution of Table 3.6 in

terms of unequal classes. Each of theclasses 40–50, 50–60 and 60–70 aresplit into two classes. The class 40–50 is divided into 40–45 and 45–50.The class 50–60 is divided into 50– 55and 55–60. And class 60–70 is dividedinto 60–65 and 65–70. The newclasses 40–45, 45–50, 50–55, 55–60,60–65 and 65–70 have class intervalof 5. The other classes: 0–10, 10–20,20–30, 30–40, 70–80, 80–90 and 90–100 retain their old class interval of10. The last column of this table showsthe new values of class marks forthese classes. Compare them with theold values of class marks in Table 3.6.Notice that the observations in theseclasses deviated more from their oldclass mark values than their new classmark values. Thus the new class markvalues are more representative of thedata in these classes than the oldvalues.

TABLE 3.7Frequency Distribution of Unequal Classes

Class Observations Frequency ClassMark

0–10 0 1 510–20 10, 14, 17, 12, 14, 12, 14, 14 8 1520–30 25, 25, 20, 22, 25, 28 6 2530–40 30, 37, 34, 39, 32, 30, 35, 7 3540–45 42, 44, 40, 44, 41, 40, 43, 40, 41 9 42.545–50 47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49 12 47.550–55 51, 53, 51, 50, 51, 50, 54 7 52.555–60 59, 56, 55, 57, 55, 56, 59, 56, 59, 57, 59, 55,

56, 55, 56, 55 16 57.560–65 60, 64, 62, 64, 64, 60, 62, 61, 60, 62, 10 62.565–70 66, 69, 66, 69, 66, 65, 65, 66, 65 9 67.570–80 70, 75, 70, 76, 70, 71 6 7580–90 82, 82, 82, 80, 85 5 8590–100 90, 100, 90, 90 4 95

Total 100

3 6 STATISTICS FOR ECONOMICS

Figure 3.2 shows the frequencycurve of the distribution in Table 3.7.The class marks of the table areplotted on X-axis and the frequenciesare plotted on Y-axis.

Fig. 3.2: Frequency Curve

Activity

• If you compare Figure 3.2 withFigure 3.1, what do you observe?Do you find any differencebetween them? Can you explainthe difference?

Frequency array

So far we have discussed theclassification of data for a continuousvariable using the example ofpercentage marks of 100 students inmathematics. For a discrete variable,the classification of its data is knownas a Frequency Array. Since a discretevariable takes values and notintermediate fractional valuesbetween two integral values, we havefrequencies that correspond to eachof its integral values.

The example in Table 3.8illustrates a Frequency Array.

TABLE 3.8Frequency Array of the Size of Households

Size of the Number ofHousehold Households

1 52 153 254 355 106 57 38 2

Total 100

The variable “size of thehousehold” is a discrete variable thatonly takes integral values as shownin the table. Since it does not take anyfractional value between two adjacentintegral values, there are no classesin this frequency array. Since thereare no classes in a frequency arraythere would be no class intervals. Asthe classes are absent in a discretefrequency distribution, there is noclass mark as well.

6. BIVARIATE FREQUENCY DISTRIBUTION

The frequency distribution of a singlevariable is called a UnivariateDistribution. The example 3.3 showsthe univariate distribution of thesingle variable “marks of a student”.A Bivariate Frequency Distribution isthe frequency distribution of twovariables.

Table 3.9 shows the frequencydistribution of two variable sales andadvertisement expenditure (in Rs.lakhs) of 20 companies. The values ofsales are classed in different columns

ORGANISATION OF DATA 3 7

and the values of advertisementexpenditure are classed in differentrows. Each cell shows the frequencyof the corresponding row and columnvalues. For example, there are 3 firmswhose sales are between Rs 135–145lakhs and their advertisementexpenditures are between Rs 64–66thousands. The use of a bivariatedistribution would be taken up inChapter 8 on correlation.

7. CONCLUSION

The data collected from primary andsecondary sources are raw or

unclassified. Once the data iscollected, the next step is to classifythem for further statistical analysis.Classification brings order in thedata.The chapter enables you to know howdata can be classified through afrequency distribution in acomprehensive manner. Once youknow the techniques of classification,it will be easy for you to construct afrequency distribution, both forcontinuous and discrete variables.

Recap

• Classification brings order to raw data.• A Frequency Distribution shows how the different values of a variable

are distributed in different classes along with their correspondingclass frequencies.

• The upper class limit is excluded but lower class limit is included inthe Exclusive Method.

• Both the upper and the lower class limits are included in the InclusiveMethod.

• In a Frequency Distribution, further statistical calculations are basedonly on the class mark values, instead of values of the observations.

• The classes should be formed in such a way that the class markof each class comes as close as possible, to a value aroundwhich the observations in a class tend to concentrate.

TABLE 3.9Bivariate Frequency Distribution of Sales (in Lakh Rs) and Advertisement Expenditure

(in Thousand Rs) of 20 Firms

115–125 125–135 135–145 145–155 155–165 165–175 Total

62–64 2 1 364–66 1 3 466–68 1 1 2 1 568–70 2 2 470–72 1 1 1 1 4

Total 4 5 6 3 1 1 20

3 8 STATISTICS FOR ECONOMICS

EXERCISES

1. Which of the following alternatives is true?(i) The class midpoint is equal to:

(a) The average of the upper class limit and the lower class limit.(b) The product of upper class limit and the lower class limit.(c) The ratio of the upper class limit and the lower class limit.(d) None of the above.

(ii) The frequency distribution of two variables is known as(a) Univariate Distribution(b) Bivariate Distribution(c) Multivariate Distribution(d) None of the above

(iii) Statistical calculations in classified data are based on(a) the actual values of observations(b) the upper class limits(c) the lower class limits(d) the class midpoints

(iv) Under Exclusive method,(a) the upper class limit of a class is excluded in the class interval(b) the upper class limit of a class is included in the class interval(c) the lower class limit of a class is excluded in the class interval(d) the lower class limit of a class is included in the class interval

(v) Range is the(a) difference between the largest and the smallest observations(b) difference between the smallest and the largest observations(c) average of the largest and the smallest observations(d) ratio of the largest to the smallest observation

2. Can there be any advantage in classifying things? Explain with an examplefrom your daily life.

3. What is a variable? Distinguish between a discrete and a continuousvariable.

4. Explain the ‘exclusive’ and ‘inclusive’ methods used in classification ofdata.

5. Use the data in Table 3.2 that relate to monthly household expenditure(in Rs) on food of 50 households and(i) Obtain the range of monthly household expenditure on food.(ii) Divide the range into appropriate number of class intervals and obtain

the frequency distribution of expenditure.(iii) Find the number of households whose monthly expenditure on food is

(a) less than Rs 2000(b) more than Rs 3000

ORGANISATION OF DATA 3 9

(c) between Rs 1500 and Rs 2500

6. In a city 45 families were surveyed for the number of domestic appliancesthey used. Prepare a frequency array based on their replies as recordedbelow.

1 3 2 2 2 2 1 2 1 2 2 3 3 3 33 3 2 3 2 2 6 1 6 2 1 5 1 5 32 4 2 7 4 2 4 3 4 2 0 3 1 4 3

7. What is ‘loss of information’ in classified data?

8. Do you agree that classified data is better than raw data?

9. Distinguish between univariate and bivariate frequency distribution.

10. Prepare a frequency distribution by inclusive method taking class intervalof 7 from the following data:

28 17 15 22 29 21 23 27 18 12 7 2 9 4 61 8 3 10 5 20 16 12 8 4 33 27 21 15 93 36 27 18 9 2 4 6 32 31 29 18 14 1315 11 9 7 1 5 37 32 28 26 24 20 19 2519 20

Suggested Activity

• From your old mark-sheets find the marks that you obtained inmathematics in the previous classes. Arrange them year-wise. Checkwhether the marks you have secured in the subject is a variable ornot. Also see, if over the years, you have improved in mathematics.

Presentation of Data

1. INTRODUCTION

You have already learnt in previouschapters how data are collected andorganised. As data are generallyvoluminous, they need to be put in acompact and presentable form. Thischapter deals with presentation of dataprecisely so that the voluminous datacollected could be made usable readilyand are easily comprehended. There aregenerally three forms of presentation ofdata:

• Textual or Descriptive presentation• Tabular presentation• Diagrammatic presentation.

2. TEXTUAL PRESENTATION OF DATA

In textual presentation, data aredescribed within the text. When thequantity of data is not too large this formof presentation is more suitable. Lookat the following cases:

Case 1

In a bandh call given on 08 September2005 protesting the hike in prices ofpetrol and diesel, 5 petrol pumps werefound open and 17 were closed whereas2 schools were closed and remaining 9schools were found open in a town ofBihar.

Studying this chapter shouldenable you to:• present data using tables;• represent data using appropriate

diagrams.

CHAPTER

PRESENTATION OF DATA 4 1

Case 2

Census of India 2001 reported thatIndian population had risen to 102crore of which only 49 crore werefemales against 53 crore males. 74 crorepeople resided in rural India and only28 crore lived in towns or cities. Whilethere were 62 crore non-workerpopulation against 40 crore workers inthe entire country, urban populationhad an even higher share of non-workers (19 crores) against the workers(9 crores) as compared to the ruralpopulation where there were 31 croreworkers out of a 74 crore population....

In both the cases data have beenpresented only in the text. A seriousdrawback of this method of presentationis that one has to go through thecomplete text of presentation forcomprehension but at the same time, itenables one to emphasise certain pointsof the presentation.

3. TABULAR PRESENTATION OF DATA

In a tabular presentation, data arepresented in rows (read horizontally)and columns (read vertically). Forexample see Table 4.1 below tabulatinginformation about literacy rates. It has

3 rows (for male, female and total) and3 columns (for urban, rural and total).It is called a 3 × 3 Table giving 9 itemsof information in 9 boxes called the"cells" of the Table. Each cell givesinformation that relates an attribute ofgender ("male", "female" or total) with anumber (literacy percentages of ruralpeople, urban people and total). Themost important advantage of tabulationis that it organises data for furtherstatistical treatment and decision-making. Classification used intabulation is of four kinds:

• Qualitative• Quantitative• Temporal and• Spatial

Qualitative classification

When classification is done accordingto qualitative characteristics like socialstatus, physical status, nationality, etc.,it is called qualitative classification. Forexample, in Table 4.1 the characteris-tics for classification are sex andlocation which are qualitative in nature.

TABLE 4.1Literacy in Bihar by sex and location (per cent)

Location TotalSex Rural Urban

Male 57.70 80.80 60.32Female 30.03 63.30 33.57

Total 44.42 72.71 47.53

Source: Census of India 2001, ProvisionalPopulation Totals.

Quantitative classification

In quantitative classification, the dataare classified on the basis of

4 2 STATISTICS FOR ECONOMICS

characteristics which are quantitativein nature. In other words thesecharacteristics can be measuredquantitatively. For example, age, height,production, income, etc are quantitativecharacteristics. Classes are formed byassigning limits called class limits forthe values of the characteristic underconsideration. An example ofquantitative classification is Table 4.2.

TABLE 4.2Distribution of 542 respondents by

their age in an election study in Bihar

Age group No. of(yrs) respondents Per cent

20–30 3 0.5530–40 61 11.2540–50 132 24.3550–60 153 28.2460–70 140 25.8370–80 51 9.4180–90 2 0.37

All 542 100.00

Source: Assembly election Patna centralconstituency 2005, A.N. Sinha Institute of SocialStudies, Patna.

Here classifying characteristic is age

in years and is quantifiable.

Activities

• Construct a table presentingdata on preferential liking of thestudents of your class for Star

News, Zee News, BBC World,CNN, Aaj Tak and DD News.

• Prepare a table of(i) heights (in cm) and(ii) weights (in kg) of students

Temporal classification

In this classification time becomes theclassifying variable and data arecategorised according to time. Timemay be in hours, days, weeks, months,years, etc. For example, see Table 4.3.

TABLE 4.3Yearly sales of a tea shop

from 1995 to 2000

Years Sale (Rs in lakhs)

1995 79.21996 81.31997 82.41998 80.51999 100.22000 91.2

Data Source: Unpublished data.

In this table the classifyingcharacteristic is year and takes valuesin the scale of time.

Activity

• Go to your library and collectdata on the number of books ineconomics, the library had atthe end of the year for the lastten years and present the datain a table.

Spatial classification

When classification is done in such away that place becomes the classifyingvariable, it is called spatialclassification. The place may be avillage/town, block, district, state,country, etc.

Here the classifying characteristic iscountry of the world. Table 4.4 is anexample of spatial classification.

PRESENTATION OF DATA 4 3

TABLE 4.4Export from India to rest of the world inone year as share of total export (per cent)

Destination Export share

USA 21.8Germany 5.6Other EU 14.7U K 5.7Japan 4.9Russia 2.1Other East Europe 0.6OPEC 10.5Asia 19.0Other LDCs 5.6Others 9.5

All 100.0

(Total Exports: US \$ 33658.5 million)

Activity

• Construct a table presentingdata collected from students ofyour class according to theirnative states/residentiallocality.

4. TABULATION OF DATA AND PARTS OF

A TABLE

To construct a table it is important tolearn first what are the parts of a goodstatistical table. When put together ina systematically ordered manner theseparts form a table. The most simple wayof conceptualising a table may be datapresented in rows and columnsalongwith some explanatory notes.Tabulation can be done using one-way, two-way or three-wayclassification depending upon thenumber of characteristics involved. Agood table should essentially have thefollowing:

(i) Table Number

Table number is assigned to a table foridentification purpose. If more than onetable is presented, it is the tablenumber that distinguishes one tablefrom another. It is given at the top orat the beginning of the title of the table.Generally, table numbers are wholenumbers in ascending order if there aremany tables in a book. Subscriptednumbers like 1.2, 3.1, etc. are also inuse for identifying the table accordingto its location. For example, Tablenumber 4.5 may read as fifth tableof the fourth chapter and so on.(See Table 4.5)

(ii) Title

The title of a table narrates about thecontents of the table. It has to be veryclear, brief and carefully worded so thatthe interpretations made from the tableare clear and free from any ambiguity.It finds place at the head of the tablesucceeding the table number or justbelow it. (See Table 4.5).

At the top of each column in a table acolumn designation is given to explainfigures of the column. This iscalled caption or column heading.(See Table 4.5)

Like a caption or column heading eachrow of the table has to be given aheading. The designations of the rowsare also called stubs or stub items, andthe complete left column is known as

4 4 STATISTICS FOR ECONOMICS

stub column. A brief description of therow headings may also be given at theleft hand top in the table. (See Table4.5).

(v) Body of the Table

Body of a table is the main part and itcontains the actual data. Location ofany one figure/data in the table is fixedand determined by the row and columnof the table. For example, data in thesecond row and fourth column indicatethat 25 crore females in rural India

were non-workers in 2001. (See Table4.5).

(vi) Unit of Measurement

The unit of measurement of the figuresin the table (actual data) should alwaysbe stated alongwith the title if the unitdoes not change throughout the table.If different units are there for rows orcolumns of the table, these units mustbe stated alongwith ‘stubs’ or‘captions’. If figures are large, theyshould be rounded up and the method

(Note : Table 4.5 presents the same data in tabular form already presented through case 2 intextual presentation of data)

Table 4.5 Population of India according to workers and non-workers by gender and location

Location Gender Workers Non-worker TotalMain Marginal Total

Male 17 3 20 18 38Female 6 5 11 25 36Total 23 8 31 43 74

Male 7 1 8 7 15Female 1 0 1 12 13Total 8 1 9 19 28

Male 24 4 28 25 53Female 7 5 12 37 49Total 31 9 40 62 102

Source : Census of India 2001

Foot note : Figures are rounded to nearest crore

All

Urb

anRu

ral

Bod

y of

the

tabl

e←

Footnote↑

Source note↑

Row

Hea

ding

s/st

ubs

↑Units

(Crore)

↓Title

↓Table Number

PRESENTATION OF DATA 4 5

of rounding should be indicated (SeeTable 4.5).

(vii) Source Note

It is a brief statement or phraseindicating the source of data presentedin the table. If more than one source isthere, all the sources are to be writtenin the source note. Source note isgenerally written at the bottom of thetable. (See Table 4.5).

(viii)Footnote

Footnote is the last part of the table.Footnote explains the specific featureof the data content of the table which isnot self explanatory and has not beenexplained earlier.

Activities

• How many rows and columnsare essentially required to forma table?

• Can the column/row headingsof a table be quantitative?

5. DIAGRAMMATIC PRESENTATION OF

D A T A

This is the third method of presentingdata. This method provides thequickest understanding of the actualsituation to be explained by data incomparison to tabular or textualpresentations. Diagrammatic presenta-tion of data translates quite effectivelythe highly abstract ideas contained innumbers into more concrete and easilycomprehensible form.

Diagrams may be less accurate butare much more effective than tables inpresenting the data.

There are various kinds of diagramsin common use. Amongst them theimportant ones are the following:

(i) Geometric diagram(ii) Frequency diagram(iii) Arithmetic line graph

Geometric Diagram

Bar diagram and pie diagram come inthe category of geometric diagram forpresentation of data. The bar diagramsare of three types – simple, multiple andcomponent bar diagrams.

Bar Diagram

Simple Bar Diagram

Bar diagram comprises a group ofequispaced and equiwidth rectangularbars for each class or category of data.Height or length of the bar reads themagnitude of data. The lower end of thebar touches the base line such that theheight of a bar starts from the zero unit.Bars of a bar diagram can be visuallycompared by their relative height andaccordingly data are comprehendedquickly. Data for this can be offrequency or non-frequency type. Innon-frequency type data a particularcharacteristic, say production, yield,population, etc. at various points oftime or of different states are noted andcorresponding bars are made of therespective heights according to thevalues of the characteristic to constructthe diagram. The values of thecharacteristics (measured or counted)

4 6 STATISTICS FOR ECONOMICS

retain the identity of each value. Figure4.1 is an example of a bar diagram.

Activity

• You had constructed a tablepresenting the data about thestudents of your class. Draw abar diagram for the same table.

Different types of data may requiredifferent modes of diagrammaticalrepresentation. Bar diagrams aresuitable both for frequency type andnon-frequency type variables andattributes. Discrete variables like familysize, spots on a dice, grades in anexamination, etc. and attributes suchas gender, religion, caste, country, etc.can be represented by bar diagrams.Bar diagrams are more convenient fornon-frequency data such as income-

expenditure profile, export/importsover the years, etc.

A category that has a longer bar(literacy of Kerala) than anothercategory (literacy of West Bengal), hasmore of the measured (or enumerated)characteristics than the other. Bars(also called columns) are usually usedin time series data (food grainproduced between 1980–2000,decadal variation in work participation

TABLE 4.6Literacy Rates of Major States of India

2001 1991

Major Indian States Person Male Female Person Male Female

Andhra Pradesh (AP) 60.5 70.3 50.4 44.1 55.1 32.7Assam (AS) 63.3 71.3 54.6 52.9 61.9 43.0Bihar (BR) 47.0 59.7 33.1 37.5 51.4 22.0Jharkhand (JH) 53.6 67.3 38.9 41.4 55.8 31.0Gujarat (GJ) 69.1 79.7 57.8 61.3 73.1 48.6Haryana (HR) 67.9 78.5 55.7 55.8 69.1 40.4Karnataka (KA) 66.6 76.1 56.9 56.0 67.3 44.3Kerala (KE) 90.9 94.2 87.7 89.8 93.6 86.2Madhya Pradesh (MP) 63.7 76.1 50.3 44.7 58.5 29.4Chhattisgarh (CH) 64.7 77.4 51.9 42.9 58.1 27.5Maharashtra (MR) 76.9 86.0 67.0 64.9 76.6 52.3Orissa (OR) 63.1 75.3 50.5 49.1 63.1 34.7Punjab (PB) 69.7 75.2 63.4 58.5 65.7 50.4Rajasthan (RJ) 60.4 75.7 43.9 38.6 55.0 20.4Tamil Nadu (TN) 73.5 82.4 64.4 62.7 73.7 51.3Uttar Pradesh (UP) 56.3 68.8 42.2 40.7 54.8 24.4Uttaranchal (UT) 71.6 83.3 59.6 57.8 72.9 41.7West Bengal (WB) 68.6 77.0 59.6 57.7 67.8 46.6India 64.8 75.3 53.7 52.2 64.1 39.3

PRESENTATION OF DATA 4 7

Fig. 4.1: Bar diagram showing literacy rates (person) of major states of India, 2001.

rate, registered unemployed over theyears, literacy rates, etc.) (Fig 4.2).

Bar diagrams can have differentforms such as multiple bar diagramand component bar diagram.

Activities

• How many states (among themajor states of India) hadhigher female literacy rate thanthe national average in 2001?

• Has the gap between maximumand minimum female literacyrates over the states in twoconsecutive census years 2001and 1991 declined?

Multiple Bar Diagram

Multiple bar diagrams (Fig.4.2) areused for comparing two or more sets ofdata, for example income andexpenditure or import and export for

different years, marks obtained indifferent subjects in different classes,etc.

Component Bar Diagram

Component bar diagrams or charts(Fig.4.3), also called sub-diagrams, arevery useful in comparing the sizes ofdifferent component parts (the elementsor parts which a thing is made up of)and also for throwing light on therelationship among these integral parts.For example, sales proceeds fromdifferent products, expenditure patternin a typical Indian family (componentsbeing food, rent, medicine, education,power, etc.), budget outlay for receiptsand expenditures, components oflabour force, population etc.Component bar diagrams are usuallyshaded or coloured suitably.

4 8 STATISTICS FOR ECONOMICS

TABLE 4.7Enrolment by gender at schools (per cent)of children aged 6–14 years in a district of

Bihar

Enrolled Out of schoolGender (per cent) (per cent)

Boy 91.5 8.5Girl 58.6 41.4All 78.0 22.0

Data Source: Unpublished data

A component bar diagram showsthe bar and its sub-divisions into twoor more components. For example, thebar might show the total population ofchildren in the age-group of 6–14 years.The components show the proportionof those who are enrolled and thosewho are not. A component bar diagrammight also contain different componentbars for boys, girls and the total ofchildren in the given age group range,as shown in Figure 4.3. To construct acomponent bar diagram, first of all, abar is constructed on the x-axis with

its height equivalent to the total valueof the bar [for per cent data the barheight is of 100 units (Figure 4.3)].Otherwise the height is equated to totalvalue of the bar and proportionalheights of the components are workedout using unitary method. Smallercomponents are given priority inparting the bar.

Pie Diagram

A pie diagram is also a component

Fig. 4.2: Multiple bar (column) diagram showing female literacy rates over two census years 1991and 2001 by major states of India.

Interpretation: It can be very easily derived from Figure 4.2 that female literacy rate over the yearswas on increase throughout the country. Similar other interpretations can be made from the figurelike the state of Rajasthan experienced the sharpest rise in female literacy, etc.

Fig. 4.3: Enrolment at primary level in a districtof Bihar (Component Bar Diagram)

PRESENTATION OF DATA 4 9

diagram, but unlike a component bardiagram, a circle whose area isproportionally divided among thecomponents (Fig.4.4) it represents. It

is also called a pie chart. The circle isdivided into as many parts as there arecomponents by drawing straight linesfrom the centre to the circumference.

Pie charts usually are not drawnwith absolute values of a category. Thevalues of each category are firstexpressed as percentage of the totalvalue of all the categories. A circle in apie chart, irrespective of its value ofradius, is thought of having 100 equalparts of 3.6° (360°/100) each. To findout the angle, the component shallsubtend at the centre of the circle, eachpercentage figure of every componentis multiplied by 3.6°. An example of thisconversion of percentages ofcomponents into angular componentsof the circle is shown in Table 4.8.

It may be interesting to note thatdata represented by a component bardiagram can also be representedequally well by a pie chart, the onlyrequirement being that absolute values

of the components have to be convertedinto percentages before they can beused for a pie diagram.

TABLE 4.8 Distribution of Indian population by their

working status (crore)

Status Population Per cent AngularComponent

Marginal Worker 9 8.8 32°Main Worker 31 30.4 109°Non-Worker 62 60.8 219°

All 102 100.0 360°

Fig. 4.4: Pie diagram for different categories ofIndian population according to working status2001.

Activities

• Represent data presentedthrough Figure 4.4 by acomponent bar diagram.

• Does the area of a pie have anybearing on total value of thedata to be represented by thepie diagram?

Frequency Diagram

Data in the form of grouped frequencydistributions are generally representedby frequency diagrams like histogram,frequency polygon, frequency curveand ogive.

5 0 STATISTICS FOR ECONOMICS

Histogram

A histogram is a two dimensionaldiagram. It is a set of rectangles withbases as the intervals between classboundaries (along X-axis) and withareas proportional to the classfrequency (Fig.4.5). If the class intervalsare of equal width, which they generallyare, the area of the rectangles areproportional to their respectivefrequencies. However, in some type ofdata, it is convenient, at timesnecessary, to use varying width of classintervals. For example, when tabulatingdeaths by age at death, it would be verymeaningful as well as useful too to havevery short age intervals (0, 1, 2, ..., yrs/0, 7, 28, ..., days) at the beginningwhen death rates are very highcompared to deaths at most otherhigher age segments of the population.For graphical representation of suchdata, height for area of a rectangle isthe quotient of height (here frequency)and base (here width of the classinterval). When intervals are equal, thatis, when all rectangles have the samebase, area can conveniently berepresented by the frequency of anyinterval for purposes of comparison.When bases vary in their width, theheights of rectangles are to be adjustedto yield comparable measurements.The answer in such a situation isfrequency density (class frequencydivided by width of the class interval)

TABLE 4.9Distribution of daily wage earners in a

locality of a town

Daily No. Cumulative Frequenceyearning of wage 'Less than' 'More than'(Rs) earners (f)

45–49 2 2 8550–54 3 5 8355–59 5 10 8060–64 3 13 7565–69 6 19 7270–74 7 26 6675–79 12 38 5980–84 13 51 4785–89 9 60 3490–94 7 67 2595–99 6 73 18100–104 4 77 12105–109 2 79 8110–114 3 82 6115–119 3 85 3

Source: Unpublished data

Since histograms are rectangles, a lineparallel to the base line and of the samemagnitude is to be drawn at a verticaldistance equal to frequency (orfrequency density) of the class interval.A histogram is never drawn for adiscrete variable/data. Since in aninterval or ratio scale the lower classboundary of a class interval fuses withthe upper class boundary of theprevious interval, equal or unequal, therectangles are all adjacent and there isno open space between two consecutiverectangles. If the classes are notcontinuous they are first converted intocontinuous classes as discussed inChapter 3. Sometimes the commonportion between two adjacentrectangles (Fig.4.6) is omitted giving abetter impression of continuity. Theresulting figure gives the impression ofa double staircase.

PRESENTATION OF DATA 5 1

A histogram looks similar to a bardiagram. But there are more differencesthan similarities between the two thanit may appear at the first impression.The spacing and the width or the areaof bars are all arbitrary. It is the heightand not the width or the area of the barthat really matters. A single vertical linecould have served the same purposeas a bar of same width. Moreover, inhistogram no space is left in betweentwo rectangles, but in a bar diagramsome space must be left betweenconsecutive bars (except in multiplebar or component bar diagram).Although the bars have the samewidth, the width of a bar is unimportantfor the purpose of comparison. Thewidth in a histogram is as importantas its height. We can have a bardiagram both for discrete and

continuous variables, but histogram isdrawn only for a continuous variable.Histogram also gives value of mode ofthe frequency distribution graphicallyas shown in Figure 4.5 and the x-coordinate of the dotted vertical linegives the mode.

Frequency Polygon

A frequency polygon is a planebounded by straight lines, usually fouror more lines. Frequency polygon is analternative to histogram and is alsoderived from histogram itself. Afrequency polygon can be fitted to ahistogram for studying the shape of thecurve. The simplest method of drawinga frequency polygon is to join themidpoints of the topside of theconsecutive rectangles of thehistogram. It leaves us with the two

Fig. 4.5: Histogram for the distribution of 85 daily wage earners in a locality of a town.

5 2 STATISTICS FOR ECONOMICS

ends away from the base line, denyingthe calculation of the area under thecurve. The solution is to join the twoend-points thus obtained to the baseline at the mid-values of the two classeswith zero frequency immediately ateach end of the distribution. Brokenlines or dots may join the two ends withthe base line. Now the total area underthe curve, like the area in thehistogram, represents the totalfrequency or sample size.

Frequency polygon is the mostcommon method of presenting groupedfrequency distribution. Both classboundaries and class-marks can beused along the X-axis, the distancesbetween two consecutive class marksbeing proportional/equal to the widthof the class intervals. Plotting of databecomes easier if the class-marks fallon the heavy lines of the graph paper.

No matter whether class boundaries ormidpoints are used in the X-axis,frequencies (as ordinates) are alwaysplotted against the mid-point of classintervals. When all the points have beenplotted in the graph, they are carefullyjoined by a series of short straight lines.Broken lines join midpoints of twointervals, one in the beginning and theother at the end, with the two ends ofthe plotted curve (Fig.4.6). Whencomparing two or more distributionsplotted on the same axes, frequencypolygon is likely to be more useful sincethe vertical and horizontal lines of twoor more distributions may coincide ina histogram.

Frequency Curve

The frequency curve is obtained bydrawing a smooth freehand curvepassing through the points of the

Fig. 4.6: Frequency polygon drawn for the data given in Table 4.9

PRESENTATION OF DATA 5 3

frequency polygon as closely aspossible. It may not necessarily passthrough all the points of the frequencypolygon but it passes through them asclosely as possible (Fig. 4.7).

Ogive

Ogive is also called cumulativefrequency curve. As there are two typesof cumulative frequencies, for exampleless than type and more than type,accordingly there are two ogives for anygrouped frequency distribution data.Here in place of simple frequencies asin the case of frequency polygon,cumulative frequencies are plottedalong y-axis against class limits of thefrequency distribution. For less thanogive the cumulative frequencies areplotted against the respective upperlimits of the class intervals whereas formore than ogives the cumulative

frequencies are plotted against therespective lower limits of the classinterval. An interesting feature of thetwo ogives together is that theirintersection point gives the medianFig. 4.8 (b) of the frequency distribu-tion. As the shapes of the two ogivessuggest, less than ogive is neverdecreasing and more than ogive isnever increasing.

TABLE 4.10Frequency distribution of marks

obtained in mathematics

Marks Number of ‘Less than’ ‘More than’students cumulative cumulative

x f frequency frequency

0–20 6 6 6420–40 5 11 5840–60 33 44 5360–80 14 58 2080–100 6 64 6

Total 64

Fig. 4.7: Frequency curve for Table 4.9

5 4 STATISTICS FOR ECONOMICS

Arithmetic Line Graph

An arithmetic line graph is also calledtime series graph and is a method ofdiagrammatic presentation of data. Init, time (hour, day/date, week, month,year, etc.) is plotted along x-axis andthe value of the variable (time seriesdata) along y-axis. A line graph byjoining these plotted points, thus,obtained is called arithmetic line graph(time series graph). It helps inunderstanding the trend, periodicity,etc. in a long term time series data.

Activity

• Can the ogive be helpful inlocating the partition values ofthe distribution it represents?Fig. 4.8(b): ‘Less than’ and ‘More than’ ogive

for data given in Table 4.10

Fig. 4.8(a): 'Less than' and 'More than' ogive for data given in Table 4.10

PRESENTATION OF DATA 5 5

Fig. 4.9: Arithmetic line graph for time series data given in Table 4.11

TABLE 4.11Value of Exports and Imports of India

(Rs in 100 crores)

Year Exports Imports

1977–78 54 601978–79 57 681979–80 64 911980–81 67 1251982–83 88 1431983–84 98 1581984–85 117 1711985–86 109 1971986–87 125 2011987–88 157 2221988–89 202 2821989–90 277 3531990–91 326 4321991–92 440 4791992–93 532 6341993–94 698 7311994–95 827 9001995–96 1064 12271996–97 1186 13691997–98 1301 15421998–99 1416 1761

Here you can see from Fig. 4.9 thatfor the period 1978 to 1999, althoughthe imports were more than the exportsall through, the rate of accelerationwent on increasing after 1988–89 andthe gap between the two (imports andexports) was widened after 1995.

6. CONCLUSION

By now you must have been able tolearn how collected data could bepresented using various forms ofpresentation — textual, tabular anddiagrammatic. You are now also ableto make an appropriate choice of theform of data presentation as well as thetype of diagram to be used for a givenset of data. Thus you can makepresentation of data meaningful,comprehensive and purposeful.

1978

1979

1980

1981

1982

1983

1984

1985

1986

1987

1988

1989

1990

1991

1992

1993

1994

1995

1996

1997

1998

1999

0

200

400

600

800

1000

1200

1400

1600

1800

2000

Val

ues

(in

Rs

100 C

rore

s)

Scale: 1cm=200 crores on Y-axis

ExportsImports

Year

5 6 STATISTICS FOR ECONOMICS

Recap• Data (even voluminous data) speak meaningfully through

presentation.• For small data (quantity) textual presentation serves the purpose

better.• For large quantity of data tabular presentation helps in

accommodating any volume of data for one or more variables.• Tabulated data can be presented through diagrams which enable

quicker comprehension of the facts presented otherwise.

EXERCISES

Answer the following questions, 1 to 10, choosing the correct answer1. Bar diagram is a

(i) one-dimensional diagram(ii) two-dimensional diagram(iii) diagram with no dimension(iv) none of the above

2. Data represented through a histogram can help in finding graphically the(i) mean(ii) mode(iii) median(iv) all the above

3. Ogives can be helpful in locating graphically the(i) mode(ii) mean(iii) median(iv) none of the above

4. Data represented through arithmetic line graph help in understanding(i) long term trend(ii) cyclicity in data(iii) seasonality in data(iv) all the above

5. Width of bars in a bar diagram need not be equal (True/False).

6. Width of rectangles in a histogram should essentially be equal (True/False).

7. Histogram can only be formed with continuous classification of data(True/False).

PRESENTATION OF DATA 5 7

8. Histogram and column diagram are the same method of presentation ofdata. (True/False).

9. Mode of a frequency distribution can be known graphically with thehelp of histogram. (True/False).

10. Median of a frequency distribution cannot be known from the ogives.(True/False).

11. What kind of diagrams are more effective in representing the following?(i) Monthly rainfall in a year(ii) Composition of the population of Delhi by religion(iii) Components of cost in a factory

12. Suppose you want to emphasise the increase in the share of urbannon-workers and lower level of urbanisation in India as shown inExample 4.2. How would you do it in the tabular form?

13. How does the procedure of drawing a histogram differ when classintervals are unequal in comparison to equal class intervals in afrequency table?

14. The Indian Sugar Mills Association reported that, ‘Sugar productionduring the first fortnight of December 2001 was about 3,87,000 tonnes,as against 3,78,000 tonnes during the same fortnight last year (2000).The off-take of sugar from factories during the first fortnight of December2001 was 2,83,000 tonnes for internal consumption and 41,000 tonnesfor exports as against 1,54,000 tonnes for internal consumption andnil for exports during the same fortnight last season.’(i) Present the data in tabular form.(ii) Suppose you were to present these data in diagrammatic form which

of the diagrams would you use and why?(iii) Present these data diagrammatically.

15. The following table shows the estimated sectoral real growth rates(percentage change over the previous year) in GDP at factor cost.

Year Agriculture and allied sectors Industry Services(1) (2) (3) (4)

1994–95 5.0 9.2 7.01995–96 –0.9 11.8 10.31996–97 9.6 6.0 7.11997–98 –1.9 5.9 9.01998–99 7.2 4.0 8.31999–2000 0.8 6.9 8.2

Represent the data as multiple time series graphs.

Measures of Central Tendency

Studying this chapter shouldenable you to:• understand the need for

summarising a set of data by onesingle number;

• recognise and distinguishbetween the different types ofaverages;

• learn to compute different typesof averages;

• draw meaningful conclusionsfrom a set of data;

• develop an understanding ofwhich type of average would bemost useful in a particularsituation.

1. INTRODUCTION

In the previous chapter, you have readthe tabular and graphic representation

of the data. In this chapter, you willstudy the measures of centraltendency which is a numerical methodto explain the data in brief. You cansee examples of summarising a largeset of data in day to day life likeaverage marks obtained by studentsof a class in a test, average rainfall inan area, average production in afactory, average income of personsliving in a locality or working in a firmetc.

Baiju is a farmer. He grows foodgrains in his land in a village calledBalapur in Buxar district of Bihar. Thevillage consists of 50 small farmers.Baiju has 1 acre of land. You areinterested in knowing the economiccondition of small farmers of Balapur.You want to compare the economic

CHAPTER

MEASURES OF CENTRAL TENDENCY 5 9

condition of Baiju in Balapur village.For this, you may have to evaluate thesize of his land holding, by comparingwith the size of land holdings of otherfarmers of Balapur. You may like tosee if the land owned by Baiju is –1. above average in ordinary sense

(see the Arithmetic Mean below)2. above the size of what half the

farmers own (see the Medianbelow)

3. above what most of the farmersown (see the Mode below)In order to evaluate Baiju’s relative

economic condition, you will have tosummarise the whole set of data ofland holdings of the farmers ofBalapur. This can be done by use ofcentral tendency, which summarisesthe data in a single value in such away that this single value canrepresent the entire data. Themeasuring of central tendency is away of summarising the data in theform of a typical or representativevalue.

There are several statisticalmeasures of central tendency or“averages”. The three most commonlyused averages are:• Arithmetic Mean• Median• Mode

You should note that there are twomore types of averages i.e. GeometricMean and Harmonic Mean, which aresuitable in certain situations.However, the present discussion willbe limited to the three types ofaverages mentioned above.

2. ARITHMETIC MEAN

Suppose the monthly income (in Rs)of six families is given as:1600, 1500, 1400, 1525, 1625, 1630.

The mean family income isobtained by adding up the incomesand dividing by the number offamilies.

Rs1600 1500 1400 1525 1625 1630

6+ + + + +

= Rs 1,547

It implies that on an average, afamily earns Rs 1,547.

Arithmetic mean is the mostcommonly used measure of centraltendency. It is defined as the sum ofthe values of all observations dividedby the number of observations and isusually denoted by x . In general, ifthere are N observations as X

1, X

2, X3,

..., XN, then the Arithmetic Mean is

given by

xX X X X

NXN

N=+ + + +

=

1 2 3 ...

S

Where, SX = sum of all observa-tions and N = total number of obser-vations.

How Arithmetic Mean is Calculated

The calculation of arithmetic meancan be studied under two broadcategories:1. Arithmetic Mean for Ungrouped

Data.2. Arithmetic Mean for Grouped Data.

6 0 STATISTICS FOR ECONOMICS

Arithmetic Mean for Series ofUngrouped Data

Direct Method

Arithmetic mean by direct method isthe sum of all observations in a seriesdivided by the total number ofobservations.

Example 1

Calculate Arithmetic Mean from thedata showing marks of students in aclass in an economics test: 40, 50, 55,78, 58.

XXN

=S

=+ + + +

=40 50 55 78 58

556 2.

The average marks of students inthe economics test are 56.2.

Assumed Mean Method

If the number of observations in thedata is more and/or figures are large,it is difficult to compute arithmetic

mean by direct method. Thecomputation can be made easier byusing assumed mean method.

In order to save time of calculationof mean from a data set containing alarge number of observations as wellas large numerical figures, you canuse assumed mean method. Here youassume a particular figure in the dataas the arithmetic mean on the basisof logic/experience. Then you maytake deviations of the said assumedmean from each of the observation.You can, then, take the summation ofthese deviations and divide it by thenumber of observations in the data.The actual arithmetic mean isestimated by taking the sum of theassumed mean and the ratio of sumof deviations to number of observa-tions. Symbolically,Let, A = assumed mean

X = individual observationsN = total numbers of observa-

tionsd = deviation of assumed mean

from individual observation,i.e. d = X – A

(HEIGHT IN INCHES)

MEASURES OF CENTRAL TENDENCY 6 1

Then sum of all deviations is taken

as S Sd X A= -( )

Then find SdN

Then add A and SdN to get X

N= +

S

You should remember that anyvalue, whether existing in the data ornot, can be taken as assumed mean.However, in order to simplify thecalculation, centrally located value inthe data can be selected as assumedmean.

Example 2

The following data shows the weeklyincome of 10 families.Family

A B C D E F G HI J

Weekly Income (in Rs)850 700 100 750 5000 80 420 2500400 360

Compute mean family income.

TABLE 5.1Computation of Arithmetic Mean by

Assumed Mean Method

Families Income d = X – 850 d'(X) = (X – 850)/10

A 850 0 0B 700 –150 –15C 100 –750 –75D 750 –100 –10E 5000 +4150 +415F 80 –770 –77G 420 –430 –43H 2500 +1650 +165I 400 –450 –45J 360 –490 –49

11160 +2660 +266

Arithmetic Mean using assumed meanmethod

NRs

= + = +

=

S850 2 660 10

1116

( , )/

, .

Thus, the average weekly incomeof a family by both methods isRs 1,116. You can check this by usingthe direct method.

Step Deviation Method

The calculations can be furthersimplified by dividing all the deviationstaken from assumed mean by thecommon factor ‘c’. The objective is toavoid large numerical figures, i.e., ifd = X – A is very large, then find d'.This can be done as follows:

dc

X AC

=-

.

The formula is given below:

c= +¢·

S

Where d' = (X – A)/c, c = commonfactor, N = number of observations,A= Assumed mean.

Thus, you can calculate thearithmetic mean in the example 2, bythe step deviation method,

X = 850 + (266)/10 × 10 = Rs 1,116.

Calculation of arithmetic mean forGrouped data

Discrete Series

Direct Method

In case of discrete series, frequencyagainst each of the observations is

6 2 STATISTICS FOR ECONOMICS

multiplied by the value of theobservation. The values, so obtained,are summed up and divided by thetotal number of frequencies.Symbolically,

XfXf

=S

S

Where, S fX = sum of product ofvariables and frequencies.S f = sum of frequencies.

Example 3

Calculate mean farm size ofcultivating households in a village forthe following data.Farm Size (in acres):

64 63 62 61 60 59

No. of Cultivating Households:

8 18 12 9 7 6

TABLE 5.2 Computation of Arithmetic Mean by

Direct Method

Farm Size No. of X d fd(X) cultivating (1 × 2) (X - 62)(2 × 4)in acres households(f)(1) (2) (3) (4) (5)

64 8 512 +2 +1663 18 1134 +1 +1862 12 744 0 061 9 549 –1 –960 7 420 –2 –1459 6 354 –3 –18

60 3713 –3 –7

Arithmetic mean using direct method,

XfXf

acres= = =S

S

371760

61 88.

Therefore, the mean farm size in avillage is 61.88 acres.

Assumed Mean Method

As in case of individual series thecalculations can be simplified by usingassumed mean method, as describedearlier, with a simple modification.Since frequency (f) of each item isgiven here, we multiply each deviation(d) by the frequency to get fd. Then we

get S fd. The next step is to get the

total of all frequencies i.e. S f. Then

find out S fd/S f. Finally thearithmetic mean is calculated by

X Afdf

= +S

S using assumed mean

method.

Step Deviation Method

In this case the deviations are dividedby the common factor ‘c’ whichsimplifies the calculation. Here we

estimate d' = dc

X AC

=-

in order to

reduce the size of numerical figuresfor easier calculation. Then get fd' and

S fd'. Finally the formula for stepdeviation method is given as,

X Afdf

c= +¢·

S

S

Activity

• Find the mean farm size for thedata given in example 3, by usingstep deviation and assumedmean methods.

Continuous Series

Here, class intervals are given. Theprocess of calculating arithmetic mean

MEASURES OF CENTRAL TENDENCY 6 3

in case of continuous series is sameas that of a discrete series. The onlydifference is that the mid-points ofvarious class intervals are taken. Youshould note that class intervals maybe exclusive or inclusive or of unequalsize. Example of exclusive classinterval is, say, 0–10, 10–20 and soon. Example of inclusive class intervalis, say, 0–9, 10–19 and so on. Exampleof unequal class interval is, say,0–20, 20–50 and so on. In all thesecases, calculation of arithmetic meanis done in a similar way.

Example 4

Calculate average marks of thefollowing students using (a) Directmethod (b) Step deviation method.

Direct Method

Marks0–10 10–20 20–30 30–40 40–5050–60 60–70

No. of Students5 12 15 25 83 2

TABLE 5.3Computation of Average Marks for

Exclusive Class Interval by Direct Method

Mark No. of mid fm d'=(m-35) fd'(x) students value (2)×(3) 10

(f) (m)(1) (2) (3) (4) (5) (6)0–10 5 5 25 –3 –1510–20 12 15 180 –2 –2420–30 15 25 375 –1 –1530–40 25 35 875 0 040–50 8 45 360 1 850–60 3 55 165 2 660–70 2 65 130 3 6

70 2110 –34

Steps:

1. Obtain mid values for each classdenoted by m.

2. Obtain S fm and apply the directmethod formula:

X =fmf

=211070

= 30.14 marksS

S

Step deviation method

1. Obtain d' = m� A

c2. Take A = 35, (any arbitrary figure),

c = common factor.

X = A+fd’f

� c = 35 + (� 34)

70 � 10

= 30.14 marks

££

An interesting property of A.M.

It is interesting to know anduseful for checking your calculationthat the sum of deviations of itemsabout arithmetic mean is always equal

to zero. Symbolically, S ( X – X ) = 0.However, arithmetic mean is

affected by extreme values. Any largevalue, on either end, can push it upor down.

Weighted Arithmetic Mean

Sometimes it is important to assignweights to various items according totheir importance, when you calculatethe arithmetic mean. For example,there are two commodities, mangoesand potatoes. You are interested infinding the average price of mangoes(p

1) and potatoes (p

2). The arithmetic

6 4 STATISTICS FOR ECONOMICS

mean will bep p1 2

2+

. However, you

might want to give more importance

to the rise in price of potatoes (p2). To

do this, you may use as ‘weights’ the

quantity of mangoes (q1) and the

quantity of potatoes (q2). Now the

arithmetic mean weighted by the

quantities would be q p q p

q q1 1 2 2

1 2

+

+ .

In general the weighted arithmetic

mean is given by,

w x + w x +...+ w xw + w +...+ w

=wxw

1 1 2 2 n n

1 2 n

££

When the prices rise, you may be

interested in the rise in the price of

the commodities that are more

important to you. You will read more

about it in the discussion of Index

Numbers in Chapter 8.

Activities

• Check this property of thearithmetic mean for the followingexample:X: 4 6 8 10 12

• In the above example if mean isincreased by 2, then whathappens to the individualobservations, if all are equallyaffected.

• If first three items increase by2, then what should be thevalues of the last two items, sothat mean remains the same.

• Replace the value 12 by 96. Whathappens to the arithmetic mean.Comment.

3. MEDIAN

The arithmetic mean is affected by thepresence of extreme values in the data.If you take a measure of centraltendency which is based on middleposition of the data, it is not affectedby extreme items. Median is thatpositional value of the variable whichdivides the distribution into two equalparts, one part comprises all valuesgreater than or equal to the medianvalue and the other comprises allvalues less than or equal to it. TheMedian is the “middle” element whenthe data set is arranged in order of themagnitude.

Computation of median

The median can be easily computedby sorting the data from smallest tolargest and counting the middle value.

Example 5

Suppose we have the followingobservation in a data set: 5, 7, 6, 1, 8,10, 12, 4, and 3.Arranging the data, in ascending orderyou have:

1, 3, 4, 5, 6, 7, 8, 10, 12.

The “middle score” is 6, so themedian is 6. Half of the scores arelarger than 6 and half of the scoresare smaller.

If there are even numbers in thedata, there will be two observationswhich fall in the middle. The medianin this case is computed as the

MEASURES OF CENTRAL TENDENCY 6 5

arithmetic mean of the two middlevalues.

Example 6

The following data provides marks of20 students. You are required tocalculate the median marks.25, 72, 28, 65, 29, 60, 30, 54, 32, 53,33, 52, 35, 51, 42, 48, 45, 47, 46, 33.

Arranging the data in an ascendingorder, you get

25, 28, 29, 30, 32, 33, 33, 35, 42,45, 46, 47, 48, 51, 52, 53, 54, 60,

65, 72.You can see that there are two

observations in the middle, namely 45and 46. The median can be obtainedby taking the mean of the twoobservations:

Median = 45+46

2= 45.5 marks

In order to calculate median it isimportant to know the position of themedian i.e. item/items at which themedian lies. The position of themedian can be calculated by thefollowing formula:

Position of median = (N+1)

2item

th

Where N = number of items.You may note that the above

formula gives you the position of themedian in an ordered array, not themedian itself. Median is computed bythe formula:

Median = size of (N+1)

2item

th

Discrete Series

In case of discrete series the positionof median i.e. (N+1)/2th item can belocated through cumulative freque-ncy. The corresponding value at thisposition is the value of median.

Example 7

The frequency distribution of thenumber of persons and theirrespective incomes (in Rs) are givenbelow. Calculate the median income.

Income (in Rs): 10 20 30 40Number of persons: 2 4 10 4

In order to calculate the medianincome, you may prepare thefrequency distribution as given below.

TABLE 5.4Computation of Median for Discrete Series

Income No of Cumulative(in Rs) persons(f) frequency(cf)

10 2 220 4 630 10 1640 4 20

The median is located in the (N+1)/2 = (20+1)/2 = 10.5th observation.This can be easily located throughcumulative frequency. The 10.5thobservation lies in the c.f. of 16. Theincome corresponding to this is Rs 30,so the median income is Rs 30.

Continuous Series

In case of continuous series you haveto locate the median class where

6 6 STATISTICS FOR ECONOMICS

N/2th item [not (N+1)/2th item] lies.The median can then be obtained asfollows:

Median = L +(N/2 � c.f.)

f� h

Where, L = lower limit of the medianclass,c.f. = cumulative frequency of the classpreceding the median class,f = frequency of the median class,h = magnitude of the median classinterval.

No adjustment is required iffrequency is of unequal size ormagnitude.

Example 8

Following data relates to daily wagesof persons working in a factory.Compute the median daily wage.

Daily wages (in Rs):55–60 50–55 45–50 40–45 35–40 30–3525–30 20–25Number of workers:

7 13 15 20 30 3328 14

The data is arranged in ascendingorder here.

In the above illustration medianclass is the value of (N/2)th item(i.e.160/2) = 80th item of the series,which lies in 35–40 class interval.Applying the formula of the medianas:

TABLE 5.5Computation of Median for Continuous

Series

Daily wages No. of Cumulative(in Rs) Workers (f) Frequency

20–25 14 1425–30 28 4230–35 33 7535–40 30 10540–45 20 12545–50 15 14050–55 13 15355–60 7 160

Median =L + (N/2 � c.f.)

f� h

= 35+(80 � 75)

30� (40 � 35)

=Rs 35.83

Thus, the median daily wage isRs 35.83. This means that 50% of the

MEASURES OF CENTRAL TENDENCY 6 7

workers are getting less than or equalto Rs 35.83 and 50% of the workersare getting more than or equal to thiswage.

You should remember thatmedian, as a measure of centraltendency,is not sensitive to all thevalues in the series. It concentrateson the values of the central items ofthe data.

Activities

• Find mean and median for allfour values of the series. Whatdo you observe?

TABLE 5.6Mean and Median of different series

Series X (Variable Mean MedianValues)

A 1, 2, 3 ? ?B 1, 2, 30 ? ?C 1, 2, 300 ? ?D 1, 2, 3000 ? ?

• Is median affected by extremevalues? What are outliers?

• Is median a better method thanmean?

Quartiles

Quartiles are the measures whichdivide the data into four equal parts,each portion contains equal numberof observations. Thus, there are threequartiles. The first Quartile (denotedby Q

1) or lower quartile has 25% of

the items of the distribution below itand 75% of the items are greater thanit. The second Quartile (denoted by Q

2)

or median has 50% of items below itand 50% of the observations above it.

The third Quartile (denoted by Q3) or

upper Quartile has 75% of the itemsof the distribution below it and 25%of the items above it. Thus, Q

1 and Q

3

denote the two limits within whichcentral 50% of the data lies.

Percentiles

Percentiles divide the distribution intohundred equal parts, so you can get99 dividing positions denoted by P

1,

P2, P

3, ..., P

99. P50 is the median value.

If you have secured 82 percentile in amanagement entrance examination, itmeans that your position is below 18percent of total candidates appearedin the examination. If a total of onelakh students appeared, where do youstand?

Calculation of Quartiles

The method for locating the Quartile

is same as that of the median in case

of individual and discrete series. The

value of Q1 and Q

3 of an ordered series

can be obtained by the following

formula where N is the number of

observations.

Q1= size of

(N + 1)th4

item

6 8 STATISTICS FOR ECONOMICS

Q3 = size of

3(N +1)th4

item.

Example 9

Calculate the value of lower quartilefrom the data of the marks obtainedby ten students in an examination.22, 26, 14, 30, 18, 11, 35, 41, 12, 32.

Arranging the data in an ascendingorder,11, 12, 14, 18, 22, 26, 30, 32, 35, 41.

Q1 = size of

(N +1)th4

item = size of

(10+1)th4

item = size of 2.75th item

= 2nd item + .75 (3rd item – 2nd item)= 12 + .75(14 –12) = 13.5 marks.

Activity

• Find out Q3 yourself.

5. MODE

Sometimes, you may be interested inknowing the most typical value of aseries or the value around whichmaximum concentration of itemsoccurs. For example, a manufacturerwould like to know the size of shoesthat has maximum demand or styleof the shirt that is more frequentlydemanded. Here, Mode is the mostappropriate measure. The word modehas been derived from the Frenchword “la Mode” which signifies themost fashionable values of adistribution, because it is repeated thehighest number of times in the series.

Mode is the most frequently observeddata value. It is denoted by M

o.

Computation of Mode

Discrete Series

Consider the data set 1, 2, 3, 4, 4, 5.The mode for this data is 4 because 4occurs most frequently (twice) in thedata.

Example 10

Look at the following discrete series:

Variable 10 20 30 40 50Frequency 2 8 20 10 5

Here, as you can see the maximumfrequency is 20, the value of mode is30. In this case, as there is a uniquevalue of mode, the data is unimodal.But, the mode is not necessarilyunique, unlike arithmetic mean andmedian. You can have data with twomodes (bi-modal) or more than twomodes (multi-modal). It may bepossible that there may be no mode ifno value appears more frequent thanany other value in the distribution. Forexample, in a series 1, 1, 2, 2, 3, 3, 4,4, there is no mode.

Unimodal Data Bimodal Data

Continuous Series

In case of continuous frequencydistribution, modal class is the classwith largest frequency. Mode can becalculated by using the formula:

MEASURES OF CENTRAL TENDENCY 6 9

TABLE 5.7Grouping Table

Income (in’000 Rs) Group Frequency

I II III IV V VI

45–50 97 – 95 = 240–45 95 – 90 = 5 7 1735–40 90 – 80 = 10 1530–35 80 – 60 = 20 30 3525–30 60 – 30 = 30 50 6020–25 30 – 12 = 18 48 6815–20 12 – 4 = 8 26 5610–15 4 12 30

TABLE 5.8Analysis Table

Columns Class Intervals45–50 40–45 35–40 30–35 25–30 20–25 15–20 10–15

I ×II × ×III × ×IV × × ×V × × ×VI × × ×

Total – – 1 3 6 3 1 –

M LD

D DhO = +

+�1

1 2

Where L = lower limit of the modalclassD

1= difference between the frequency

of the modal class and the frequencyof the class preceding the modal class(ignoring signs).D2 = difference between the frequency

of the modal class and the frequencyof the class succeeding the modalclass (ignoring signs).h = class interval of the distribution.

You may note that in case ofcontinuous series, class intervalsshould be equal and series should be

exclusive to calculate the mode. If midpoints are given, class intervals areto be obtained.

Example 11

Calculate the value of modal workerfamily’s monthly income from thefollowing data:

Income per month (in ’000 Rs)Below 50 Below 45 Below 40 Below 35Below 30 Below 25 Below 20 Below 15

Number of families97 95 90 8060 30 12 4

As you can see this is a case ofcumulative frequency distribution. Inorder to calculate mode, you will haveto covert it into an exclusive series. In

7 0 STATISTICS FOR ECONOMICS

this example, the series is in thedescending order. Grouping andAnalysis table would be made todetermine the modal class.

The value of the mode lies in25–30 class interval. By inspectionalso, it can be seen that this is a modalclass.Now L = 25, D

1 = (30 – 18) = 12, D

2

= (30 – 20) = 10, h = 5Using the formula, you can obtain

the value of the mode as:MO (in ’000 Rs)

M=D

D +D� h1

1 2

= 25+12

10+12� 5=Rs27,273

Thus the modal worker family’smonthly income is Rs 27,273.

Activities

• A shoe company, making shoesfor adults only, wants to knowthe most popular size of shoes.Which average will be mostappropriate for it?

• Take a small survey in your classto know the student’s preferencefor Chinese food usingappropriate measure of centraltendency.

• Can mode be locatedgraphically?

6. RELATIVE POSITION OF ARITHMETICMEAN, MEDIAN AND MODE

Suppose we express,Arithmetic Mean = M

e

Median = Mi

Mode = Mo

so that e, i and o are the suffixes.The relative magnitude of the three areMe>M

i>M

o or M

e<M

i<M

o (suffixes

occurring in alphabetical order). Themedian is always between thearithmetic mean and the mode.

7. CONCLUSION

Measures of central tendency oraverages are used to summarise thedata. It specifies a single mostrepresentative value to describe thedata set. Arithmetic mean is the mostcommonly used average. It is simple

MEASURES OF CENTRAL TENDENCY 7 1

to calculate and is based on all theobservations. But it is unduly affectedby the presence of extreme items.Median is a better summary for suchdata. Mode is generally used todescribe the qualitative data. Medianand mode can be easily computed

EXERCISES

1. Which average would be suitable in the following cases?(i) Average size of readymade garments.(ii) Average intelligence of students in a class.(iii) Average production in a factory per shift.(iv) Average wages in an industrial concern.(v) When the sum of absolute deviations from average is least.(vi) When quantities of the variable are in ratios.(vii)In case of open-ended frequency distribution.

2. Indicate the most appropriate alternative from the multiple choicesprovided against each question.

(i) The most suitable average for qualitative measurement is(a) arithmetic mean(b) median(c) mode

Recap• The measure of central tendency summarises the data with a single

value, which can represent the entire data.• Arithmetic mean is defined as the sum of the values of all observations

divided by the number of observations.• The sum of deviations of items from the arithmetic mean is always

equal to zero.• Sometimes, it is important to assign weights to various items

according to their importance.• Median is the central value of the distribution in the sense that the

number of values less than the median is equal to the number greaterthan the median.

• Quartiles divide the total set of values into four equal parts.• Mode is the value which occurs most frequently.

graphically. In case of open-endeddistribution they can also be easilycomputed. Thus, it is important toselect an appropriate averagedepending upon the purpose ofanalysis and the nature of thedistribution.

7 2 STATISTICS FOR ECONOMICS

(d) geometric mean(e) none of the above

(ii) Which average is affected most by the presence of extreme items?(a) median(b) mode(c) arithmetic mean(d) geometric mean(e) harmonic mean

(iii) The algebraic sum of deviation of a set of n values from A.M. is(a) n(b) 0(c) 1(d) none of the above[Ans. (i) b (ii) c (iii) b]

3. Comment whether the following statements are true or false.(i) The sum of deviation of items from median is zero.(ii) An average alone is not enough to compare series.(iii) Arithmetic mean is a positional value.(iv) Upper quartile is the lowest value of top 25% of items.(v) Median is unduly affected by extreme observations.[Ans. (i) False (ii) True (iii) False (iv) True (v) False]

4. If the arithmetic mean of the data given below is 28, find (a) the missingfrequency, and (b) the median of the series:Profit per retail shop (in Rs) 0-10 10-20 20-30 30-40 40-50 50-60Number of retail shops 12 18 27 - 17 6(Ans. The value of missing frequency is 20 and value of the median isRs 27.41)

5. The following table gives the daily income of ten workers in a factory.Find the arithmetic mean.Workers A B C D E F G H I JDaily Income (in Rs) 120 150 180 200 250 300 220 350 370 260(Ans. Rs 240)

6. Following information pertains to the daily income of 150 families.Calculate the arithmetic mean.Income (in Rs) Number of familiesMore than 75 150,, 85 140,, 95 115,, 105 95,, 115 70,, 125 60,, 135 40,, 145 25(Ans. Rs 116.3)

MEASURES OF CENTRAL TENDENCY 7 3

7. The size of land holdings of 380 families in a village is given below. Findthe median size of land holdings.Size of Land Holdings (in acres)

Less than 100 100–200 200 – 300 300–400 400 and above. –Number of families

40 89 148 64 39(Ans. 241.22 acres)

8. The following series relates to the daily income of workers employed in afirm. Compute (a) highest income of lowest 50% workers (b) minimumincome earned by the top 25% workers and (c) maximum income earnedby lowest 25% workers.Daily Income (in Rs) 10–14 15–19 20–24 25–29 30–34 35–39Number of workers 5 10 15 20 10 5(Hint: compute median, lower quartile and upper quartile.)[Ans. (a) Rs 25.11 (b) Rs 19.92 (c) Rs 29.19]

9. The following table gives production yield in kg. per hectare of wheat of150 farms in a village. Calculate the mean, median and mode productionyield.

Production yield (kg. per hectare)50–53 53–56 56–59 59–62 62–65 65–68 68–71 71–74 74–77

Number of farms3 8 14 30 36 28 16 10 5

(Ans. mean = 63.82 kg. per hectare, median = 63.67 kg. per hectare,mode = 63.29 kg. per hectare)

As the summer heat rises, hillstations, are crowded with more andmore visitors. Ice-cream sales becomemore brisk. Thus, the temperature isrelated to number of visitors and saleof ice-creams. Similarly, as the supplyof tomatoes increases in your localmandi, its price drops. When the localharvest starts reaching the market,the price of tomatoes drops from aprincely Rs 40 per kg to Rs 4 per kg oreven less. Thus supply is related toprice. Correlation analysis is a meansfor examining such relationshipssystematically. It deals with questionssuch as:• Is there any relationship between

two variables?

Correlation7

1. INTRODUCTION

In previous chapters you have learnthow to construct summary measuresout of a mass of data and changesamong similar variables. Now you willlearn how to examine the relationshipbetween two variables.

Studying this chapter shouldenable you to:• understand the meaning of the

term correlation;• understand the nature of

relationship between twovariables;

• calculate the different measuresof correlation;

• analyse the degree and directionof the relationships.

CHAPTER

92 STATISTICS FOR ECONOMICS

• If the value of one variablechanges, does the value of theother also change?

• Do both the variables move in thesame direction?

• How strong is the relationship?

2. TYPES OF RELATIONSHIP

Let us look at various types ofrelationship. The relation betweenmovements in quantity demandedand the price of a commodity is an

integral part of the theory of demand,which you will read in class XII. Lowrainfall is related to low agriculturalproductivity. Such examples ofrelationship may be given a cause andeffect interpretation. Others may bejust coincidence. The relation betweenthe arrival of migratory birds in asanctuary and the birth rates in thelocality can not be given any causeand ef fect interpretation. Therelationships are simple coincidence.The relationship between size of theshoes and money in your pocket isanother such example. Even ifrelationship exist, they are difficult toexplain it.

In another instance a thirdvariable’s impact on two variables maygive rise to a relation between the twovariables. Brisk sale of ice-creams maybe related to higher number of deathsdue to drowning. The victims are notdrowned due to eating of ice-creams.Rising temperature leads to brisk saleof ice-creams. Moreover, large numberof people start going to swimmingpools to beat the heat. This might haveraised the number of deaths bydrowning. Thus temperature is behindthe high correlation between the saleof ice-creams and deaths due todrowning.

What Does Correlation Measure?

Correlation studies and measures thedirection and intensity of relationshipamong variables. Correlationmeasures covariation, not causation.Correlation should never be

CORRELATION 93

interpreted as implying cause andeffect relation. The presence ofcorrelation between two variables Xand Y simply means that when thevalue of one variable is found tochange in one direction, the value ofthe other variable is found to changeeither in the same direction (i.e.positive change) or in the oppositedirection (i.e. negative change), but ina definite way. For simplicity weassume here that the correlation, ifit exists, is linear, i.e. the relativemovement of the two variables can berepresented by drawing a straight lineon graph paper.

Types of Correlation

Correlation is commonly classifiedinto negative and positive correlation.The correlation is said to be positivewhen the variables move together inthe same direction. When the incomerises, consumption also rises. Whenincome falls, consumption also falls.Sale of ice-cream and temperaturemove in the same direction. Thecorrelation is negative when they movein opposite directions. When the priceof apples falls its demand increases.When the prices rise its demanddecreases. When you spend more timein studying, chances of your failingdecline. When you spend less hoursin study, chances of your failingincrease. These are instances ofnegative correlation. The variablesmove in opposite direction.

3. TECHNIQUES FOR MEASURING

CORRELATION

Widely used techniques for the studyof correlation are scatter diagrams,Karl Pearson’s coef ficient ofcorrelation and Spearman’s rankcorrelation.

A scatter diagram visually presentsthe nature of association withoutgiving any specific numerical value. Anumerical measure of linearrelationship between two variables isgiven by Karl Pearson’s coefficient ofcorrelation. A relationship is said tobe linear if it can be represented by astraight line. Another measure isSpearman’s coefficient of correlation,which measures the linear associationbetween ranks assigned to indiviualitems according to their attributes.Attributes are those variables whichcannot be numerically measured suchas intelligence of people, physicalappearance, honesty etc.

Scatter Diagram

A scatter diagram is a usefultechnique for visually examining theform of relationship, withoutcalculating any numerical value. Inthis technique, the values of the twovariables are plotted as points on agraph paper. The cluster of points, soplotted, is referred to as a scatterdiagram. From a scatter diagram, onecan get a fairly good idea of the natureof relationship. In a scatter diagramthe degree of closeness of the scatterpoints and their overall directionenable us to examine the relation-

94 STATISTICS FOR ECONOMICS

ship. If all the points lie on a line, thecorrelation is perfect and is said to beunity. If the scatter points are widelydispersed around the line, thecorrelation is low. The correlation issaid to be linear if the scatter pointslie near a line or on a line.

Scatter diagrams spanning overFig. 7.1 to Fig. 7.5 give us an idea ofthe relationship between twovariables. Fig. 7.1 shows a scatteraround an upward rising lineindicating the movement of thevariables in the same direction. WhenX rises Y will also rise. This is positivecorrelation. In Fig. 7.2 the points arefound to be scattered around adownward sloping line. This time thevariables move in opposite directions.When X rises Y falls and vice versa.This is negative correlation. In Fig.7.3there is no upward rising or downwardsloping line around which the pointsare scattered. This is an example ofno correlation. In Fig. 7.4 and Fig. 7.5the points are no longer scatteredaround an upward rising or downwardfalling line. The points themselves areon the lines. This is referred to asperfect positive correlation and perfectnegative correlation respectively.

Activity

• Collect data on height, weightand marks scored by studentsin your class in any two subjectsin class X. Draw the scatterdiagram of these variables takingtwo at a time. What type ofrelationship do you find?

Inspection of the scatter diagramgives an idea of the nature andintensity of the relationship.

Karl Pearson’s Coefficient ofCorrelation

This is also known as product momentcorrelation and simple correlationcoefficient. It gives a precise numericalvalue of the degree of linearrelationship between two variables Xand Y. The linear relationship may begiven by

Y = a + bXThis type of relation may be

described by a straight line. Theintercept that the line makes on theY-axis is given by a and the slope ofthe line is given by b. It gives thechange in the value of Y for very smallchange in the value of X. On the otherhand, if the relation cannot berepresented by a straight line as in

Y = X2

the value of the coefficient will be zero.It clearly shows that zero correlationneed not mean absence of any typeof relation between the two variables.

Let X1, X

2, ..., X

N be N values of X

and Y1, Y

2 ,..., Y

N be the corresponding

values of Y. In the subsequentpresentations the subscriptsindicating the unit are dropped for thesake of simplicity. The arithmeticmeans of X and Y are defined as

XXN

YY

N= =Σ Σ

;

and their variances are as follows

s22 2

2xX X

NXN

X= - = -Σ Σ( )

CORRELATION 95

96 STATISTICS FOR ECONOMICS

and s22 2

2yY Y

NYN

Y= - = -Σ Σ( )

The standard deviations of X andY respectively are the positive squareroots of their variances. Covariance ofX and Y is defined as

Cov(X,Y) =- - =Σ Σ( )( )X X Y Y

NxyN

Where x X X= - and y X Y= -are the deviations of the ith value of Xand Y from their mean valuesrespectively.

The sign of covariance between Xand Y determines the sign of thecorrelation coefficient. The standarddeviations are always positive. If thecovariance is zero, the correlationcoefficient is always zero. The productmoment correlation or the KarlPearson’s measure of correlation isgiven by

r xyN x y

= Σs s ...(1)

or

rX X Y Y

X X Y Y=

- -

- -

ΣΣ Σ

( ) ( )

( ) ( )2 2...(2)

or

rXY

X YN

XXN

YY

N

=-

- -

Σ Σ Σ

Σ Σ Σ Σ

( )( )

( ) ( )2

2

2

2 ...(3)

or

rN XY X Y

N X X N Y Y= Σ Σ Σ

Σ Σ Σ Σ( )( )

( ) ( )2 2 2 2 ...(4)

Properties of Correlation Coefficient

Let us now discuss the properties ofthe correlation coefficient• r has no unit. It is a pure number.

It means units of measurement arenot part of r. r between height infeet and weight in kilograms, forinstance, is 0.7.

• A negative value of r indicates aninverse relation. A change in onevariable is associated with changein the other variable in theopposite direction. When price ofa commodity rises, its demandfalls. When the rate of interestrises the demand for funds alsofalls. It is because now funds havebecome costlier.

• If r is positive the two variablesmove in the same direction. Whenthe price of coffee, a substitute oftea, rises the demand for tea alsorises. Improvement in irrigationfacilities is associated with higheryield. When temperature rises thesale of ice-creams becomes brisk.

CORRELATION 97

• If r = 0 the two variables areuncorrelated. There is no linearrelation between them. Howeverother types of relation may bethere.

• If r = 1 or r = –1 the correlation isperfect. The relation between themis exact.

• A high value of r indicates stronglinear relationship. Its value issaid to be high when it is close to+1 or –1.

• A low value of r indicates a weaklinear relation. Its value is said tobe low when it is close to zero.

• The value of the correlationcoefficient lies between minus oneand plus one, –1 ≤ r ≤ 1. If, inany exercise, the value of r isoutside this range it indicates errorin calculation.

• The value of r is unaffected by thechange of origin and change ofscale. Given two variables X and Ylet us define two new variables.

U =X A

B; V =

Y CD

where A and C are assumed means ofX and Y respectively. B and D arecommon factors. Then

rxy = ruv

This. property is used to calculate

correlation coefficient in a highlysimplified manner, as in the stepdeviation method.

As you have read in chapter 1, thestatistical methods are no substitutefor common sense. Here, is anotherexample, which highlights the need forunderstanding the data properly

before correlation is calculated. Anepidemic spreads in some villages andthe government sends a team ofdoctors to the affected villages. Thecorrelation between the number ofdeaths and the number of doctors sentto the villages is found to be positive.Normally the health care facilitiesprovided by the doctors are expectedto reduce the number of deathsshowing a negative correlation. Thishappened due to other reasons. Thedata relate to a specific time period.Many of the reported deaths could beterminal cases where the doctorscould do little. Moreover, the benefitof the presence of doctors becomesvisible after some time. It is alsopossible that the reported deaths arenot due to the epidemic. A tsunamisuddenly hits the state and death tollrises.

Let us illustrate the calculation ofr by examining the relationshipbetween years of schooling of thefarmer and the annual yield per acre.

Example 1

No. of years Annual yield perof schooling acre in ’000 (Rs)of farmers

0 42 44 66 108 10

10 812 7

Formula 1 needs the value of

Σxy x y, ,s s

98 STATISTICS FOR ECONOMICS

From Table 7.1 we get,

Σ

Σ

xy

X XNx

=

=-

=

42

1127

2

,

( ),s

sy

Y YN

= - =Σ( )2 387

Substituting these values informula (1)

r = =42

7112

7387

0 644.

The same value can be obtainedfrom formula (2) also.

rX X Y Y

X X Y Y=

- -

- -

Σ

Σ Σ

( )( )

( ) ( )2 2 ...(2)

r = =42

112 380 644.

Thus years of education of thefarmers and annual yield per acre arepositively correlated. The value of r isalso large. It implies that more thenumber of years farmers invest in

education, higher will be the yield peracre. It underlines the importance offarmers’ education.

To use formula (3)

rXY

X YN

XXN

YY

N

=-

- -

Σ Σ Σ

Σ Σ Σ Σ

( )( )

( ) ( )2

2

2

2 ...(3)

the value of the following expressionshave to be calculated i.e.Σ Σ ΣXY X Y, , .2 2

Now apply formula (3) to get thevalue of r.

Let us know the interpretation ofdifferent values of r. The correlationcoefficient between marks secured inEnglish and Statistics is, say, 0.1. Itmeans that though the marks securedin the two subjects are positivelycorrelated, the strength of therelationship is weak. Students with highmarks in English may be gettingrelatively low marks in statistics. Hadthe value of r been, say, 0.9, studentswith high marks in English willinvariably get high marks in Statistics.

TABLE 7.1Calculation of r between years of schooling of farmers and annual yield

Years of (X– X ) (X– X )2 Annual yield (Y– Y ) (Y– Y )2 (X– X )(Y– Y )Education per acre in ’000 Rs(X) (Y)

0 –6 36 4 –3 9 182 –4 16 4 –3 9 124 –2 4 6 –1 1 26 0 0 10 3 9 08 2 4 10 3 9 6

10 4 16 8 1 1 412 6 36 7 0 0 0

Σ X=42 Σ (X– X )2=112 Σ Y=49 Σ (Y– Y )2=38 Σ (X– X )(Y– Y )=42

CORRELATION 99

An example of negative correlationis the relation between arrival ofvegetables in the local mandi and priceof vegetables. If r is –0.9, vegetablesupply in the local mandi will beaccompanied by lower price ofvegetables. Had it been –0.1 largevegetable supply will be accompaniedby lower price, not as low as the price,when r is –0.9. The extent of price falldepends on the absolute value of r.Had it been zero there would havebeen no fall in price, even after largesupplies in the market. This is also apossibility if the increase in supply istaken care of by a good transportnetwork transferring it to othermarkets.

Activity

• Look at the following table.Calculate r between annualgrowth of national income atcurrent price and the GrossDomestic Saving as percentageof GDP.

Step deviation method to calculatecorrelation coefficient.

When the values of the variablesare large, the burden of calculationcan be considerably reduced by usinga property of r. It is that r isindependent of change in origin andscale. It is also known as stepdeviation method. It involves thetransformation of the variables X andY as follows:

TABLE 7.2

Year Annual growth Gross Domesticof National Saving as

Income percentage of GDP

1992–93 14 241993–94 17 231994–95 18 261995–96 17 271996–97 16 251997–98 12 251998–99 16 231999–00 11 252000–01 8 242001–02 10 23

Source: Economic Survey, (2004–05) Pg. 8,9

a property of r. It is that r isindependent of change in origin andscale. It is also known as stepdeviation method. It involves thetransformation of the variables X andY as follows:

U X Ah

V Y Bk

= =� ; �

where A and B are assumed means, hand k are common factors.Then r

UV = r

XY

This can be illustrated with theexercise of analysing the correlationbetween price index and moneysupply.

Example 2

Price 120 150 190 220 230index (X)Money 1800 2000 2500 2700 3000supplyin Rs crores (Y)

The simplification, using stepdeviation method is illustrated below.Let A = 100; h = 10; B = 1700 andk = 100

100 STATISTICS FOR ECONOMICS

The table of transformed variablesis as follows:

Calculation of r between priceindex and money supply using stepdeviation method

TABLE 7.3

U V

X -ÊËÁ

ˆ¯̃

10010

Y -ÊËÁ

ˆ¯̃

1700100

U2 V2 UV

2 1 4 1 2

5 3 25 9 15

9 8 81 64 72

12 10 144 100 120

13 13 169 169 169

Σ Σ ΣΣ Σ

U V U

V UV

= 41; = 35; = 423;

= 343; = 378

2

2

Substituting these values in formula(3)

rUV

U VN

UUN

VVN

=-

- -

Σ Σ Σ

Σ Σ Σ Σ

( )( )

( ) ( )2

2

22 (3)

=

- ¥

- -

37841 35

5

423415

343355

2 2( ) ( )

= 0.98

This strong positive correlationbetween price index and moneysupply is an important premise ofmonetary policy. When the moneysupply grows the price index alsorises.

Activity

• Take some examples of India’spopulation and national income.Calculate the correlationbetween them using stepdeviation method and see thesimplification.

Spearman’s rank correlation

Spearman’s rank correlation wasdeveloped by the British psychologistC.E. Spearman. It is used when thevariables cannot be measuredmeaningfully as in the case of price,income, weight etc. Ranking may bemore meaningful when themeasurements of the variables aresuspect. Consider the situation wherewe are required to calculate thecorrelation between height and weightof students in a remote village. Neithermeasuring rods nor weighing scalesare available. The students can beeasily ranked in terms of height andweight without using measuring rodsand weighing scales.

There are also situations when youare required to quantify qualities suchas fairness, honesty etc. Ranking maybe a better alternative to quantifica-tion of qualities. Moreover, sometimesthe correlation coefficient between twovariables with extreme values may bequite different from the coefficientwithout the extreme values. Underthese circumstances rank correlationprovides a better alternative to simplecorrelation.

Rank correlation coefficient andsimple correlation coefficient have thesame interpretation. Its formula has

CORRELATION 101

been derived from simple correlationcoefficient where individual valueshave been replaced by ranks. Theseranks are used for the calculation ofcorrelation. This coefficient providesa measure of linear associationbetween ranks assigned to theseunits, not their values. It is theProduct Moment Correlation betweenthe ranks. Its formula is

rD

n nk = 163

2Σ...(4)

where n is the number of observationsand D the deviation of ranks assignedto a variable from those assigned tothe other variable. When the ranks arerepeated the formula isrk = 1–

612 12

1

231 1

32 2

2

ΣDm m m m

n n

+ - + - +ÈÎÍ

˘˚̇

-

( ) ( )...

( )

where m1, m

2, ..., are the number of

repetitions of ranks and m m31 1

12 ...,

their corresponding correctionfactors. This correction is needed forevery repeated value of both variables.If three values are repeated, there willbe a correction for each value. Everytime m

1 indicates the number of times

a value is repeated.All the properties of the simple

correlation coefficient are applicablehere. Like the Pearsonian Coefficientof correlation it lies between 1 and–1. However, generally it is not asaccurate as the ordinary method. Thisis due the fact that all the information

concerning the data is not utilised.The first differences of the values ofthe items in the series, arranged inorder of magnitude, are almost neverconstant. Usually the data clusteraround the central values with smallerdifferences in the middle of the array.If the first differences were constantthen r and r

k would give identical

results. The first difference is thedifference of consecutive values.Rank correlation is preferred toPearsonian coefficient when extremevalues are present. In generalr

k is less than or equal to r.

The calculation of rank correlationwill be illustrated under threesituations.1. The ranks are given.2. The ranks are not given. They have

to be worked out from the data.3. Ranks are repeated.

Case 1: When the ranks are given

Example 3

Five persons are assessed by threejudges in a beauty contest. We haveto find out which pair of judges hasthe nearest approach to commonperception of beauty.

Competitors

Judge 1 2 3 4 5

A 1 2 3 4 5B 2 4 1 5 3C 1 3 5 2 4

There are 3 pairs of judgesnecessitating calculation of rankcorrelation thrice. Formula (4) will beused —

102 STATISTICS FOR ECONOMICS

rD

n ns = --

16 2

3

Σ...(4)

The rank correlation between Aand B is calculated as follows:

A B D D2

1 2 –1 12 4 –2 43 1 2 44 5 –1 15 3 2 4

Total 14

Substituting these values informula (4)

rD

n ns = --

16 2

3

Σ...(4)

= -¥-

= - = - =16 145 5

184

1201 0 7 0 3

3. .

The rank correlation between Aand C is calculated as follows:

A C D D2

1 1 0 02 3 –1 13 5 –2 44 2 2 45 4 1 1

Total 10

Substituting these values informula (4) the rank correlation is 0.5.Similarly, the rank correlationbetween the rankings of judges B andC is 0.9. Thus, the perceptions ofjudges A and C are the closest. JudgesB and C have very different tastes.

Case 2: When the ranks are not given

Example 4

We are given the percentage of marks,secured by 5 students in Economicsand Statistics. Then the ranking hasto be worked out and the rankcorrelation is to be calculated.

Student Marks in Marks inStatistics Economics

(X) (Y)

A 85 60B 60 48C 55 49D 65 50E 75 55

Student Ranking in Ranking in Statistics Economics

(Rx) (R

Y)

A 1 1B 4 5C 5 4D 3 3E 2 2

Once the ranking is completeformula (4) is used to calculate rankcorrelation.

Case 3: When the ranks are repeated

Example 5

The values of X and Y are given as

X 25 45 35 40 15 19 35 42Y 55 60 30 35 40 42 36 48

In order to work out the rankcorrelation, the ranks of the valuesare worked out. Common ranks aregiven to the repeated items. The

CORRELATION 103

common rank is the mean of the rankswhich those items would haveassumed if they were slightly differentfrom each other. The next item will beassigned the rank next to the rankalready assumed. The formula ofSpearman’s rank correlationcoef ficient when the ranks arerepeated is as follows

r

Dm m m m

n n

s = -

+ - + - +ÈÎÍ

˘˚̇

-

1

612 12

1

231 1

32 2

2

Σ ( ) ( )...

( )

where m1, m

2, ..., are the number

of repetitions of ranks and

m m31 1

12-

..., their corresponding

correction factors.X has the value 35 both at the

4th and 5th rank. Hence both aregiven the average rank i.e.,

4 52

4 5+

=th . th rank

X Y Rank of Rank of Deviation in D2

RankingXR' YR'' D=R'–R''

25 55 6 2 4 1645 80 1 1 0 035 30 4.5 8 3.5 12.2540 35 3 7 –4 1615 40 8 5 3 919 42 7 4 3 935 36 4.5 6 –1.5 2.2542 48 2 3 –1 1

Total ΣD = 65 5.

The necessary correction thus is

m m3 3

122 2

1212

- = - =

Using this equation

rD

m m

n ns = -+ -È

ÎÍ˘˚̇

-1

612

23

3

Σ ( )

...(5)

Substituting the values of theseexpressions

rs = - +-

= -

= - =

16 65 5 0 5

8 81

396504

1 0 786 0 214

3

( . . )

. .

Thus there is positive rank correlationbetween X and Y. Both X and Y movein the same direction. However, therelationship cannot be described asstrong.

Activity

• Collect data on marks scored by10 of your classmates in classIX and X examinations. Calculatethe rank correlation coefficientbetween them. If your data do nothave any repetition, repeat theexercise by taking a data sethaving repeated ranks. What arethe circumstances in which rankcorrelation coef ficient ispreferred to simple correlationcoefficient? If data are preciselymeasured will you still preferrank correlation coefficient tosimple correlation? When canyou be indifferent to the choice?Discuss in class.

4. CONCLUSION

We have discussed some techniquesfor studying the relationship between

104 STATISTICS FOR ECONOMICS

EXERCISES

1. The unit of correlation coefficient between height in feet and weight inkgs is(i) kg/feet(ii) percentage(iii) non-existent

2. The range of simple correlation coefficient is(i) 0 to infinity(ii) minus one to plus one(iii) minus infinity to infinity

3. If rxy is positive the relation between X and Y is of the type

(i) When Y increases X increases(ii) When Y decreases X increases(iii) When Y increases X does not change

two variables, particularly the linearrelationship. The scatter diagram givesa visual presentation of therelationship and is not confined tolinear relations. Measures ofcorrelation such as Karl Pearson’scoefficient of correlation andSpearman’s rank correlation arestrictly the measures of linear

Recap

• Correlation analysis studies the relation between two variables.• Scatter diagrams give a visual presentation of the nature of

relationship between two variables.• Karl Pearson’s coefficient of correlation r measures numerically only

linear relationship between two variables. r lies between –1 and 1.• When the variables cannot be measured precisely Spearman’s rank

correlation can be used to measure the linear relationshipnumerically.

• Repeated ranks need correction factors.• Correlation does not mean causation. It only means

covariation.

relationship. When the variablescannot be measured precisely, rankcorrelation can meaningfully be used.These measures however do not implycausation. The knowledge ofcorrelation gives us an idea of thedirection and intensity of change in avariable when the correlated variablechanges.

CORRELATION 105

4. If rxy = 0 the variable X and Y are

(i) linearly related(ii) not linearly related(iii) independent

5. Of the following three measures which can measure any type of relationship(i) Karl Pearson’s coefficient of correlation(ii) Spearman’s rank correlation(iii) Scatter diagram

6. If precisely measured data are available the simple correlation coefficientis(i) more accurate than rank correlation coefficient(ii) less accurate than rank correlation coefficient(iii) as accurate as the rank correlation coefficient

7. Why is r preferred to covariance as a measure of association?

8. Can r lie outside the –1 and 1 range depending on the type of data?

9. Does correlation imply causation?

10. When is rank correlation more precise than simple correlation coefficient?

11. Does zero correlation mean independence?

12. Can simple correlation coefficient measure any type of relationship?

13. Collect the price of five vegetables from your local market every day for aweek. Calculate their correlation coefficients. Interpret the result.

14. Measure the height of your classmates. Ask them the height of theirbenchmate. Calculate the correlation coefficient of these two variables.Interpret the result.

15. List some variables where accurate measurement is difficult.

16. Interpret the values of r as 1, –1 and 0.

17. Why does rank correlation coefficient differ from Pearsonian correlationcoefficient?

18. Calculate the correlation coefficient between the heights of fathers in inches(X) and their sons (Y)X 65 66 57 67 68 69 70 72Y 67 56 65 68 72 72 69 71(Ans. r = 0.603)

19. Calculate the correlation coefficient between X and Y and comment ontheir relationship:

X –3 –2 –1 1 2 3Y 9 4 1 1 4 9(Ans. r = 0)

106 STATISTICS FOR ECONOMICS

Activity

• Use all the formulae discussed here to calculate r betweenIndia’s national income and export taking at least tenobservations.

20. Calculate the correlation coefficient between X and Y and comment ontheir relationship

X 1 3 4 5 7 8Y 2 6 8 10 14 16(Ans. r = 1)

1. INTRODUCTION

In the previous chapter, you havestudied how to sum up the data intoa single representative value. However,that value does not reveal thevariability present in the data. In thischapter you will study those

measures, which seek to quantifyvariability of the data.

Three friends, Ram, Rahim andMaria are chatting over a cup of tea.During the course of theirconversation, they start talking abouttheir family incomes. Ram tells themthat there are four members in hisfamily and the average income permember is Rs 15,000. Rahim says thatthe average income is the same in hisfamily, though the number of membersis six. Maria says that there are fivemembers in her family, out of whichone is not working. She calculates thatthe average income in her family too,is Rs 15,000. They are a little surprisedsince they know that Maria’s father isearning a huge salary. They go intodetails and gather the following data:

Measures of Dispersion

Studying this chapter shouldenable you to:• know the limitations of averages;• appreciate the need of measures

of dispersion;• enumerate various measures of

dispersion;• calculate the measures and

compare them;• distinguish between absolute

and relative measures.

CHAPTER

MEASURES OF DISPERSION 75

Family Incomes

Sl. No. Ram Rahim Maria

1. 12,000 7,000 02. 14,000 10,000 7,0003. 16,000 14,000 8,0004. 18,000 17,000 10,0005. ----- 20,000 50,0006. ----- 22,000 ------

Total income 60,000 90,000 75,000Average income 15,000 15,000 15,000

Do you notice that although theaverage is the same, there areconsiderable differences in individualincomes?

It is quite obvious that averagestry to tell only one aspect of adistribution i.e. a representative sizeof the values. To understand it better,you need to know the spread of valuesalso.

You can see that in Ram’s family.,dif ferences in incomes arecomparatively lower. In Rahim’sfamily, differences are higher and inMaria’s family are the highest.Knowledge of only average isinsufficient. If you have another valuewhich reflects the quantum of

variation in values, your understan-ding of a distribution improvesconsiderably. For example, per capitaincome gives only the average income.A measure of dispersion can tell youabout income inequalities, therebyimproving the understanding of therelative standards of living enjoyed bydifferent strata of society.

Dispersion is the extent to whichvalues in a distribution differ from theaverage of the distribution.

To quantify the extent of thevariation, there are certain measuresnamely:(i) Range(ii) Quartile Deviation(iii)Mean Deviation(iv) Standard Deviation

Apart from these measures whichgive a numerical value, there is agraphic method for estimatingdispersion.

Range and Quartile Deviationmeasure the dispersion by calculatingthe spread within which the values lie.Mean Deviation and StandardDeviation calculate the extent towhich the values differ from theaverage.

2. MEASURES BASED UPON SPREAD OF

VALUES

Range

Range (R) is the difference between thelargest (L) and the smallest value (S)in a distribution. Thus,R = L – S

Higher value of Range implieshigher dispersion and vice-versa.

76 STATISTICS FOR ECONOMICS

Activities

Look at the following values:20, 30, 40, 50, 200• Calculate the Range.• What is the Range if the value

200 is not present in the dataset?

• If 50 is replaced by 150, whatwill be the Range?

Range: CommentsRange is unduly affected by extremevalues. It is not based on all thevalues. As long as the minimum andmaximum values remain unaltered,any change in other values does notaffect range. It can not be calculatedfor open-ended frequency distri-bution.

Notwithstanding some limitations,Range is understood and usedfrequently because of its simplicity.For example, we see the maximumand minimum temperatures ofdifferent cities almost daily on our TVscreens and form judgments about thetemperature variations in them.

Open-ended distributions are thosein which either the lower limit of thelowest class or the upper limit of thehighest class or both are notspecified.

Activity

• Collect data about 52-weekhigh/low of 10 shares from anewspaper. Calculate the rangeof share prices. Which stock ismost volatile and which is themost stable?

Quartile Deviation

The presence of even one extremelyhigh or low value in a distribution canreduce the utility of range as ameasure of dispersion. Thus, you mayneed a measure which is not undulyaffected by the outliers.

In such a situation, if the entiredata is divided into four equal parts,each containing 25% of the values, weget the values of Quartiles andMedian. (You have already read aboutthese in Chapter 5).

The upper and lower quartiles (Q3

and Q1, respectively) are used tocalculate Inter Quartile Range whichis Q3 – Q1.

Inter-Quartile Range is basedupon middle 50% of the values in adistribution and is, therefore, notaffected by extreme values. Half ofthe Inter-Quartile Range is calledQuartile Deviation. Thus:

Q .D . = Q - Q

23 1

Q.D. is therefore also called Semi-Inter Quartile Range.

Calculation of Range and Q.D. forungrouped data

Example 1

Calculate Range and Q.D. of thefollowing observations:

20, 25, 29, 30, 35, 39, 41,48, 51, 60 and 70

Range is clearly 70 – 20 = 50For Q.D., we need to calculate

values of Q3 and Q1.

MEASURES OF DISPERSION 77

Q1 is the size of n

th+14

value.

n being 11, Q1 is the size of 3rdvalue.

As the values are already arrangedin ascending order, it can be seen thatQ1, the 3rd value is 29. [What will youdo if these values are not in an order?]

Similarly, Q3 is size of 3 1

4( )n

th+

value; i.e. 9th value which is 51. HenceQ3 = 51

Q .D . = Q - Q

23 1 =

51 292

11-

=

Do you notice that Q.D. is theaverage difference of the Quartilesfrom the median.

Activity

• Calculate the median and checkwhether the above statement iscorrect.

Calculation of Range and Q.D. for afrequency distribution.

Example 2

For the following distribution of marksscored by a class of 40 students,calculate the Range and Q.D.

TABLE 6.1

Class intervals No. of students C I (f)

0–10 510–20 820–40 1640–60 760–90 4

40

Range is just the dif ferencebetween the upper limit of the highestclass and the lower limit of the lowestclass. So Range is 90 – 0 = 90. ForQ.D., first calculate cumulativefrequencies as follows:

Class- Frequencies CumulativeIntervals FrequenciesCI f c. f.

0–10 5 0510–20 8 1320–40 16 2940–60 7 3660–90 4 40

n = 40

Q1 is the size of n th

4 value in a

continuous series. Thus it is the sizeof the 10th value. The class containingthe 10th value is 10–20. Hence Q1 liesin class 10–20. Now, to calculate theexact value of Q1, the followingformula is used:

Q L

ncf

fi1

4= + ·

Where L = 10 (lower limit of therelevant Quartile class)

c.f. = 5 (Value of c.f. for the classpreceding the Quartile class)i = 10 (interval of the Quartile

class), andf = 8 (frequency of the Quartile

class) Thus,

Q1 1010 5

810 16 25= +

-· = .

Similarly, Q3 is the size of 3

4n th

78 STATISTICS FOR ECONOMICS

value; i.e., 30th value, which lies inclass 40–60. Now using the formulafor Q3, its value can be calculated asfollows:

Q = L +

3n4

- c.f.

f i3

Q = 40 + 30 - 29

7 203

Q = 42.87

Q.D. = 42.87 - 16.25

2 = 13.31

3

In individual and discrete series, Q1

is the size of n th+1

4value, but in a

continuous distribution, it is the size

of n th

4 value. Similarly, for Q3 and

median also, n is used in place ofn+1.

If the entire group is divided intotwo equal halves and the mediancalculated for each half, you will havethe median of better students and themedian of weak students. Thesemedians differ from the median of theentire group by 13.31 on an average.Similarly, suppose you have dataabout incomes of people of a town.Median income of all people can becalculated. Now if all people aredivided into two equal groups of richand poor, medians of both groups canbe calculated. Quartile Deviation willtell you the average difference betweenmedians of these two groups belonging

to rich and poor, from the median ofthe entire group.

Quartile Deviation can generally becalculated for open-ended distribu-tions and is not unduly affected byextreme values.

3. MEASURES OF DISPERSION FROM

AVERAGE

Recall that dispersion was defined asthe extent to which values differ fromtheir average. Range and QuartileDeviation do not attempt to calculate,how far the values are, from theiraverage. Yet, by calculating the spreadof values, they do give a good ideaabout the dispersion. Two measureswhich are based upon deviation of thevalues from their average are MeanDeviation and Standard Deviation.

Since the average is a centralvalue, some deviations are positiveand some are negative. If these areadded as they are, the sum will notreveal anything. In fact, the sum ofdeviations from Arithmetic Mean isalways zero. Look at the following twosets of values.

Set A : 5, 9, 16Set B : 1, 9, 20

You can see that values in Set Bare farther from the average and hencemore dispersed than values in Set A.Calculate the deviations fromArithmetic Mean amd sum them up.What do you notice? Repeat the samewith Median. Can you comment uponthe quantum of variation from thecalculated values?

MEASURES OF DISPERSION 79

Mean Deviation tries to overcomethis problem by ignoring the signs ofdeviations, i.e., it considers alldeviations positive. For standarddeviation, the deviations are firstsquared and averaged and thensquare root of the average is found.We shall now discuss them separatelyin detail.

Mean Deviation

Suppose a college is proposed forstudents of five towns A, B, C, D andE which lie in that order along a road.Distances of towns in kilometres fromtown A and number of students inthese towns are given below:

Town Distance No.from town A of Students

A 0 90B 2 150C 6 100D 14 200E 18 80

620

Now, if the college is situated intown A, 150 students from town B willhave to travel 2 kilometers each (atotal of 300 kilometres) to reach thecollege. The objective is to find alocation so that the average distancetravelled by students is minimum.

You may observe that the studentswill have to travel more, on an average,if the college is situated at town A orE. If on the other hand, it issomewhere in the middle, they arelikely to travel less. The averagedistance travelled is calculated by

Mean Deviation which is simply thearithmetic mean of the differences ofthe values from their average. Theaverage used is either the arithmeticmean or median.

(Since the mode is not a stableaverage, it is not used to calculateMean Deviation.)

Activities

• Calculate the total distance to betravelled by students if thecollege is situated at town A, attown C, or town E and also if itis exactly half way between A andE.

• Decide where, in you opinion,the college should be establi-shed, if there is only one studentin each town. Does it changeyour answer?

Calculation of Mean Deviation fromArithmetic Mean for ungroupeddata.

Direct Method

Steps:

(i) The A.M. of the values is calculated(ii) Difference between each value and

the A.M. is calculated. Alldif ferences are consideredpositive. These are denoted as |d|

(iii)The A.M. of these dif ferences(called deviations) is the MeanDeviation.

i.e. M Dd

n. .

| |=

S

Example 3

Calculate the Mean Deviation of thefollowing values; 2, 4, 7, 8 and 9.

80 STATISTICS FOR ECONOMICS

The A MXn

. . = =S

6

X |d|

2 44 27 18 29 3

12

M D X. . .( ) = =125

2 4

Assumed Mean Method

Mean Deviation can also be calculatedby calculating deviations from anassumed mean. This method isadopted especially when the actualmean is a fractional number. (Takecare that the assumed mean is closeto the true mean).

For the values in example 3,suppose value 7 is taken as assumedmean, M.D. can be calculated asunder:

Example 4

X |d|

2 54 37 08 19 2

11

In such cases, the followingformula is used,

M Dd x Ax f f

nxB A. .

| | ( )( )( ) =

+ - -S S S

Where Σ |d| is the sum of absolutedeviations taken from the assumedmean.

x is the actual mean.A x is the assumed mean used tocalculate deviations.

Σ fB is the number of values below theactual mean including the actualmean.

Σ fA is the number of values above theactual mean.

Substituting the values in theabove formula:

M D x. .( )( )

.( ) =+ - -

= = 11 6 7 2 3

5125

2 4

Mean Deviation from median forungrouped data.

Direct Method

Using the values in example 3, M.D.from the Median can be calculated asfollows,(i) Calculate the median which is 7.(ii) Calculate the absolute deviations

from median, denote them as |d|.(iii)Find the average of these absolute

deviations. It is the MeanDeviation.

Example 5

[X-Median]X |d|

2 54 37 08 19 2

11

MEASURES OF DISPERSION 81

M. D. from Median is thus,

M Dd

nmedian. .| |

.( ) = = =S 11

52 2

Short-cut method

To calculate Mean Deviation by shortcut method a value (A) is used tocalculate the deviations and thefollowing formula is applied.

M D

d Median A f fn

Median

B A

. .

| | ( )( )( )

=+ - -S S S

where, A = the constant from whichdeviations are calculated. (Othernotations are the same as given in theassumed mean method).

Mean Deviation from Mean forContinuous distribution

TABLE 6.2

Profits of Number ofcompanies Companies(Rs in lakhs) frequenciesClass-intervals

10–20 520–30 830–50 1650–70 870–80 3

40

Steps:

(i) Calculate the mean of thedistribution.

(ii) Calculate the absolute deviations|d| of the class midpoints from themean.

(iii)Multiply each |d| value with itscorresponding frequency to getf|d| values. Sum them up to get

Σ f|d|.

(iv) Apply the following formula,

M Df d

fx. .

| |( )

=S

S

Mean Deviation of the distributionin Table 6.2 can be calculated asfollows:

Example 6

C.I. f m.p. |d| f|d|

10–20 5 15 25.5 127.520–30 8 25 15.5 124.030–50 16 40 0.5 8.050–70 8 60 19.5 156.070–80 3 75 34.5 103.5

40 519.0

M Df d

fx. .

| |.

( )= = =

S

S

51940

12 975

Mean Deviation from Median

TABLE 6.3

Class intervals Frequencies

20–30 530–40 1040–60 2060–80 980–90 6

50

The procedure to calculate MeanDeviation from the median is thesame as it is in case of M.D. fromMean, except that deviations are tobe taken from the median as givenbelow:

82 STATISTICS FOR ECONOMICS

Example 7

C.I. f m.p. |d| f|d|

20–30 5 25 25 12530–40 10 35 15 15040–60 20 50 0 060–80 9 70 20 18080–90 6 85 35 210

50 665

M Df d

fMedian. .| |

( ) =S

S

= =66550

13 3.

Mean Deviation: CommentsMean Deviation is based on allvalues. A change in even one valuewill affect it. It is the least whencalculated from the median i.e., itwill be higher if calculated from themean. However it ignores the signsof deviations and cannot becalculated for open-ended distribu-tions.

Standard Deviation

Standard Deviation is the positivesquare root of the mean of squareddeviations from mean. So if there arefive values x1, x2, x3, x4 and x5, firsttheir mean is calculated. Thendeviations of the values from mean arecalculated. These deviations are thensquared. The mean of these squareddeviations is the variance. Positivesquare root of the variance is thestandard deviation.(Note that Standard Deviation iscalculated on the basis of the meanonly).

Calculation of Standard Deviationfor ungrouped data

Four alternative methods are availablefor the calculation of standarddeviation of individual values. Allthese methods result in the samevalue of standard deviation. These are:

(i) Actual Mean Method(ii) Assumed Mean Method(iii)Direct Method(iv) Step-Deviation Method

Actual Mean Method:

Suppose you have to calculate thestandard deviation of the followingvalues:

5, 10, 25, 30, 50

Example 8

X d d2

5 –19 36110 –14 19625 +1 130 +6 3650 +26 676

0 1270

Following formula is used:

s =Sdn

2

s = = =1270

5254 15 937.

Do you notice the value from whichdeviations have been calculated in theabove example? Is it the Actual Mean?

Assumed Mean Method

For the same values, deviations maybe calculated from any arbitrary value

MEASURES OF DISPERSION 83

A x such that d = X – A x . Taking A x= 25, the computation of the standarddeviation is shown below:

Example 9

X d d2

5 –20 40010 –15 22525 0 030 +5 2550 +25 625

–5 1275

Formula for Standard Deviation

s = -�

Ł���

ł����

S Sdn

dn

2 2

s = --�

Ł���

ł���� = =

12755

55

254 15 9372

.

The sum of deviations from a valueother than actul mean is not equalto zero

Direct Method

Standard Deviation can also becalculated from the values directly,i.e., without taking deviations, asshown below:

Example 10

X x2

5 2510 10025 62530 90050 2500

120 4150

(This amounts to taking deviationsfrom zero)

Following formula is used.

s = -Sxn

x2

2( )

or s = -4150

524 2( )

or s = =254 15 937.

Standard Deviation is not affectedby the value of the constant fromwhich deviations are calculated. Thevalue of the constant does not figurein the standard deviation formula.Thus, Standard Deviation isIndependent of Origin.

Step-deviation Method

If the values are divisible by a commonfactor, they can be so divided andstandard deviation can be calculatedfrom the resultant values as follows:

Example 11

Since all the five values are divisibleby a common factor 5, we divide andget the following values:

x x' d d2

5 1 –3.8 14.4410 2 –2.8 7.8425 5 +0.2 0.0430 6 +1.2 1.4450 10 +5.2 27.04

0 50.80

(Steps in the calculation are sameas in actual mean method).

The following formula is used tocalculate standard deviation:

84 STATISTICS FOR ECONOMICS

s = ·Sdn

c2

xxc

’ =

c = common factorSubstituting the values,

s = 50.80

5 5

s = ·10 16 5.

s = 15 937.

Alternatively, instead of dividingthe values by a common factor, thedeviations can be divided by acommon factor. Standard Deviationcan be calculated as shown below:

Example 12

x d d' d2

5 –20 –4 1610 –15 –3 925 0 0 030 +5 +1 150 +25 +5 25

–1 51

Deviations have been calculatedfrom an arbitrary value 25. Commonfactor of 5 has been used to dividedeviations.

s =S Sd

ndn

c’ ’2 2

�Ł�

�ł�

·

s = --51

51

55

2�Ł�

�ł�

·

s = =10 16 5 15 937. .·

Standard Deviation is notindependent of scale. Thus, if thevalues or deviations are divided bya common factor, the value of thecommon factor is used in theformula to get the value of StandardDeviation.

Standard Deviation in Continuousfrequency distribution:

Like ungrouped data, S.D. can becalculated for grouped data by any ofthe following methods:(i) Actual Mean Method(ii) Assumed Mean Method(iii)Step-Deviation Method

Actual Mean Method

For the values in Table 6.2, StandardDeviation can be calculated as follows:

Example 13

(1) (2) (3) (4) (5) (6) (7)CI f m fm d fd fd2

10–20 5 15 75 –25.5 –127.5 3251.2520–30 8 25 200 –15.5 –124.0 1922.0030–50 16 40 640 –0.5 –8.0 4.0050–70 8 60 480 +19.5 +156.0 3042.0070–80 3 75 225 +34.5 +103.5 3570.75

40 1620 0 11790.00

Following steps are required:1. Calculate the mean of the

distribution.

xfmf

= = =S

S

162040

40 5.

2. Calculate deviations of mid-valuesfrom the mean so that

d m x= - (Col. 5)3. Multiply the deviations with their

MEASURES OF DISPERSION 85

corresponding frequencies to get‘fd’ values (col. 6) [Note that Σ fd= 0]

4. Calculate ‘fd2’ values bymultiplying ‘fd’ values with ‘d’values. (Col. 7). Sum up these toget Σ fd2.

5. Apply the formula as under:

s = = =Sfd

n

2 1179040

17 168.

Assumed Mean Method

For the values in example 13,standard deviation can be calculatedby taking deviations from an assumedmean (say 40) as follows:

Example 14

(1) (2) (3) (4) (5) (6)CI f m d fd fd2

10–20 5 15 -25 –125 312520–30 8 25 -15 –120 180030–50 16 40 0 0 050–70 8 60 +20 160 320070–80 3 75 +35 105 3675

40 +20 11800

The following steps are required:1. Calculate mid-points of classes

(Col. 3)2. Calculate deviations of mid-points

from an assumed mean such thatd = m – A x (Col. 4). AssumedMean = 40.

3. Multiply values of ‘d’ withcorresponding frequencies to get‘fd’ values (Col. 5). (note that thetotal of this column is not zerosince deviations have been takenfrom assumed mean).

4. Multiply ‘fd’ values (Col. 5) with ‘d’values (col. 4) to get fd2 values (col.6). Find Σ fd2.

5. Standard Deviation can becalculated by the followingformula.

s = -�Ł�

�ł�

S Sfdn

fdn

2 2

or s = -�Ł�

�ł�

1180040

2040

2

or s = =294 75 17 168. .

Step-deviation Method

In case the values of deviations aredivisible by a common factor, thecalculations can be simplified by thestep-deviation method as in thefollowing example.

Example 15

(1) (2) (3) (4) (5) (6) (7)CI f m d d' fd' fd'2

10–20 5 15 –25 –5 –25 12520–30 8 25 –15 –3 –24 7230–50 16 40 0 0 0 050–70 8 60 +20 +4 +32 12870–80 3 75 +35 +7 +21 147

40 +4 472

Steps required:

1. Calculate class mid-points (Col. 3)and deviations from an arbitrarilychosen value, just like in theassumed mean method. In thisexample, deviations have beentaken from the value 40. (Col. 4)

2. Divide the deviations by a commonfactor denoted as ‘C’. C = 5 in the

86 STATISTICS FOR ECONOMICS

above example. The values soobtained are ‘d'’ values (Col. 5).

3. Multiply ‘d'’ values withcorresponding ‘f'’ values (Col. 2) toobtain ‘fd'’ values (Col. 6).

4. Multiply ‘fd'’ values with ‘d'’ valuesto get ‘fd'2’ values (Col. 7)

5. Sum up values in Col. 6 and Col.7 to get Σ fd' and Σ fd'2 values.

6. Apply the following formula.

s =¢

-¢�

Ł��ł�

·S

S

S

S

fdf

fdf

c2 2

or s = -�Ł�

�ł�

·47240

440

52

or s = - ·11 8 01 5. .

ors

s

= ·

=

11 79 5

17 168

.

.

Standard Deviation, the most widelyused measure of dispersion, is basedon all values. Therefore a change ineven one value affects the value ofstandard deviation. It is independentof origin but not of scale. It is alsouseful in certain advanced statisticalproblems.

5. ABSOLUTE AND RELATIVE MEASURES

OF DISPERSION

All the measures, described so far, areabsolute measures of dispersion. Theycalculate a value which, at times, isdifficult to interpret. For example,consider the following two data sets:

Set A 500 700 1000Set B 100000 120000 130000

Suppose the values in Set A arethe daily sales recorded by an ice-cream vendor, while Set B has thedaily sales of a big departmental store.Range for Set A is 500 whereas for SetB, it is 30,000. The value of Range ismuch higher in Set B. Can you saythat the variation in sales is higherfor the departmental store? It can beeasily observed that the highest valuein Set A is double the smallest value,whereas for the Set B, it is only 30%higher. Thus absolute measures maygive misleading ideas about the extentof variation specially when theaverages differ significantly.

Another weakness of absolutemeasures is that they give the answerin the units in which original valuesare expressed. Consequently, if thevalues are expressed in kilometers, thedispersion will also be in kilometers.However, if the same values areexpressed in meters, an absolutemeasure will give the answer in metersand the value of dispersion will appearto be 1000 times.

To overcome these problems,relative measures of dispersion can beused. Each absolute measure has arelative counterpart. Thus, for Range,there is Coefficient of Range which iscalculated as follows:

Coefficient of Range =-

+

L SL S

where L = Largest valueS = Smallest value

Similarly, for Quartile Deviation, it

MEASURES OF DISPERSION 87

be compared even across differentgroups having different units ofmeasurement.

7. LORENZ CURVE

The measures of dispersiondiscussed so far give a numericalvalue of dispersion. A graphicalmeasure called Lorenz Curve isavailable for estimating dispersion.You may have heard of statements like‘top 10% of the people of a countryearn 50% of the national income whiletop 20% account for 80%’. An ideaabout income disparities is given bysuch figures. Lorenz Curve uses theinformation expressed in a cumulativemanner to indicate the degree ofvariability. It is specially useful incomparing the variability of two ormore distributions.

Given below are the monthlyincomes of employees of a company.

TABLE 6.4

Incomes Number of employees

0–5,000 55,000–10,000 1010,000–20,000 1820,000–40,000 1040,000–50,000 7

is Coefficient of Quartile Deviationwhich can be calculated as follows:

Coefficient of Quartile Deviation

=-

+

Q QQ Q

3

3

1

1 where Q3=3rd Quartile

Q1 = 1st QuartileFor Mean Deviation, it is

Coefficient of Mean Deviation.Coefficient of Mean Deviation =

M D x

xor

M D MedianMedian

. .( ) . .( )

Thus if Mean Deviation iscalculated on the basis of the Mean,it is divided by the Mean. If Median isused to calculate Mean Deviation, itis divided by the Median.

For Standard Deviation, therelative measure is called Coefficientof Variation, calculated as below:

Coefficient of Variation

= Standard Deviation

Arithmetic Mean· 100

It is usually expressed inpercentage terms and is the mostcommonly used relative measure ofdispersion. Since relative measuresare free from the units in which thevalues have been expressed, they can

Example 16

Income Mid-points Cumulative Cumulative No. of Comulative Comulativelimits mid-points mid-points as employees frequencies frequencies as

percentages frequencies percentages(1) (2) (3) (4) (5) (6) (7)

0–5000 2500 2500 2.5 5 5 105000–10000 7500 10000 10.0 10 15 3010000–20000 15000 25000 25.0 18 33 6620000–40000 30000 55000 55.0 10 43 8640000–50000 45000 100000 100.0 7 50 100

88 STATISTICS FOR ECONOMICS

Construction of the Lorenz Curve

Following steps are required.

1. Calculate class mid-points andfind cumulative totals as in Col. 3in the example 16, given above.

2. Calculate cumulative frequenciesas in Col. 6.

3. Express the grand totals of Col. 3and 6 as 100, and convert thecumulative totals in these columnsinto percentages, as in Col. 4 and 7.

4. Now, on the graph paper, take thecumulative percentages of thevariable (incomes) on Y axis andcumulative percentages offrequencies (number of employees)on X-axis, as in figure 6.1. Thuseach axis will have values from ‘0’to ‘100’.

5. Draw a line joining Co-ordinate(0, 0) with (100,100). This is calledthe line of equal distributionshown as line ‘OC’ in figure 6.1.

6. Plot the cumulative percentages ofthe variable with correspondingcumulative percentages offrequency. Join these points to getthe curve OAC.

Studying the Lorenz Curve

OC is called the line of equaldistribution, since it would imply asituation like, top 20% people earn20% of total income and top 60% earn60% of the total income. The fartherthe curve OAC from this line, thegreater is the variability present in thedistribution. If there are two or morecurves, the one which is the farthest

from line OC has the highestdispersion.

8. CONCLUSION

Although Range is the simplest tocalculate and understand, it is undulyaffected by extreme values. QD is notaffected by extreme values as it isbased on only middle 50% of the data.However, it is more dif ficult tointerpret M.D. and S.D. both are basedupon deviations of values from theiraverage. M.D. calculates average ofdeviations from the average butignores signs of deviations andtherefore appears to be unmathema-tical. Standard Deviation attempts tocalculate average deviation frommean. Like M.D., it is based on allvalues and is also applied in moreadvanced statistical problems. It isthe most widely used measure ofdispersion.

MEASURES OF DISPERSION 89

EXERCISES

1. A measure of dispersion is a good supplement to the central value inunderstanding a frequency distribution. Comment.

2. Which measure of dispersion is the best and how?

3. Some measures of dispersion depend upon the spread of values whereassome calculate the variation of values from a central value. Do you agree?

4. In a town, 25% of the persons earned more than Rs 45,000 whereas 75%earned more than 18,000. Calculate the absolute and relative values ofdispersion.

5. The yield of wheat and rice per acre for 10 districts of a state is as under:District 1 2 3 4 5 6 7 8 9 10Wheat 12 10 15 19 21 16 18 9 25 10Rice 22 29 12 23 18 15 12 34 18 12Calculate for each crop,(i) Range(ii) Q.D.(iii) Mean Deviation about Mean(iv) Mean Deviation about Median(v) Standard Deviation(vi) Which crop has greater variation?(vii)Compare the values of different measures for each crop.

6. In the previous question, calculate the relative measures of variation andindicate the value which, in your opinion, is more reliable.

7. A batsman is to be selected for a cricket team. The choice is between Xand Y on the basis of their five previous scores which are:

Recap

• A measure of dispersion improves our understanding about thebehaviour of an economic variable.

• Range and Quartile Deviation are based upon the spread of values.• M.D. and S.D. are based upon deviations of values from the average.• Measures of dispersion could be Absolute or Relative.• Absolute measures give the answer in the units in which data are

expressed.• Relative smeasures are free from these units, and consequently can

be used to compare different variables.• A graphic method, which estimates the dispersion from shape

of a curve, is called Lorenz Curve.

90 STATISTICS FOR ECONOMICS

X 25 85 40 80 120Y 50 70 65 45 80Which batsman should be selected if we want, (i) a higher run getter, or(ii) a more reliable batsman in the team?

8. To check the quality of two brands of lightbulbs, their life in burninghours was estimated as under for 100 bulbs of each brand.

Life No. of bulbs(in hrs) Brand A Brand B

0–50 15 250–100 20 8100–150 18 60150–200 25 25200–250 22 5

100 100

(i) Which brand gives higher life?(ii) Which brand is more dependable?

9. Averge daily wage of 50 workers of a factory was Rs 200 with a StandardDeviation of Rs 40. Each worker is given a raise of Rs 20. What is thenew average daily wage and standard deviation? Have the wages becomemore or less uniform?

10. If in the previous question, each worker is given a hike of 10 % in wages,how are the Mean and Standard Deviation values affected?

11. Calculate the Mean Deviation about Mean and Standard Deviation for thefollowing distribution.

Classes Frequencies

20–40 340–80 680–100 20100–120 12120–140 9

50

12. The sum of 10 values is 100 and the sum of their squares is 1090. Findthe Coefficient of Variation.

Index Numbers

Studying this chapter shouldenable you to:• understand the meaning of the

term index number;• become familiar with the use of

some widely used indexnumbers;

• calculate an index number;• appreciate its limitations.

1. INTRODUCTION

You have learnt in the previouschapters how summary measures canbe obtained from a mass of data. Nowyou will learn how to obtain summarymeasures of change in a group ofrelated variables.

Rabi goes to the market after a longgap. He finds that the prices of most

commodities have changed. Someitems have become costlier, whileothers have become cheaper. On hisreturn from the market, he tells hisfather about the change in price of theeach and every item, he bought. It isbewildering to both. The industrialsector consists of many subsectors.Each of them is changing. The outputof some subsectors are rising, while itis falling in some subsectors. Thechanges are not uniform. Descriptionof the individual rates of change willbe difficult to understand. Can asingle figure summarise thesechanges? Look at the following cases:

Case 1

An industrial worker was earning asalary of Rs 1,000 in 1982. Today, he

CHAPTER

108 STATISTICS FOR ECONOMICS

earns Rs 12,000. Can his standard ofliving be said to have risen 12 timesduring this period? By how muchshould his salary be raised so thathe is as well off as before?

Case 2

You must be reading about the sensexin the newspapers. The sensexcrossing 8000 points is, indeed,greeted with euphoria. When, sensexdipped 600 points recently, it erodedinvestors’ wealth by Rs 1,53,690crores. What exactly is sensex?

Case 3

The government says inflation rate willnot accelerate due to the rise in theprice of petroleum products. Howdoes one measure inflation?

These are a sample of questionsyou confront in your daily life. A studyof the index number helps inanalysing these questions.

2. WHAT IS AN INDEX NUMBER

An index number is a statistical devicefor measuring changes in themagnitude of a group of relatedvariables. It represents the generaltrend of diverging ratios, from whichit is calculated. It is a measure of theaverage change in a group of relatedvariables over two different situations.The comparison may be between likecategories such as persons, schools,hospitals etc. An index number alsomeasures changes in the value of thevariables such as prices of specifiedlist of commodities, volume of

production in different sectors of anindustry, production of variousagricultural crops, cost of living etc.

Conventionally, index numbers areexpressed in terms of percentage. Ofthe two periods, the period with whichthe comparison is to be made, isknown as the base period. The valuein the base period is given the indexnumber 100. If you want to know howmuch the price has changed in 2005from the level in 1990, then 1990becomes the base. The index numberof any period is in proportion with it.Thus an index number of 250indicates that the value is two and halftimes that of the base period.

Price index numbers measure andpermit comparison of the prices ofcertain goods. Quantity indexnumbers measure the changes in thephysical volume of production,construction or employment. Thoughprice index numbers are more widelyused, a production index is also animportant indicator of the level of theoutput in the economy.

INDEX NUMBERS 109

3. CONSTRUCTION OF AN INDEX NUMBER

In the following sections, theprinciples of constructing an indexnumber will be illustrated throughprice index numbers.Let us look at the following example:

Example 1

Calculation of simple aggregative priceindex

TABLE 8.1

Commodity Base Current Percentageperiod period change

price (Rs) price (Rs)

A 2 4 100B 5 6 20C 4 5 25D 2 3 50

As you observe in this example, thepercentage changes are different forevery commodity. If the percentagechanges were the same for all fouritems, a single measure would havebeen sufficient to describe the change.However, the percentage changesdiffer and reporting the percentagechange for every item will beconfusing. It happens when thenumber of commodities is large, whichis common in any real marketsituation. A price index representsthese changes by a single numericalmeasure.

There are two methods ofconstructing an index number. It canbe computed by the aggregativemethod and by the method ofaveraging relatives.

The Aggregative Method

The formula for a simple aggregativeprice index is

PPP01

1

0

100= ¥ΣΣ

Where P1 and P

0 indicate the price

of the commodity in the currentperiod and base period respectively.Using the data from example 1, thesimple aggregative price index is

P01

4 6 5 32 5 4 2

100 138 5= + + ++ + +

¥ = .

Here, price is said to have risen by38.5 percent.

Do you know that such an indexis of limited use? The reason is thatthe units of measurement of prices ofvarious commodities are not thesame. It is unweighted, because therelative importance of the items hasnot been properly reflected. The itemsare treated as having equalimportance or weight. But whathappens in reality? In reality the itemspurchased dif fer in order ofimportance. Food items occupy alarge proportion of our expenditure.In that case an equal rise in the priceof an item with large weight and thatof an item with low weight will havedifferent implications for the overallchange in the price index.

The formula for a weightedaggregative price index is

PP qP q01

1 1

0 1

100= ¥ΣΣ

An index number becomes aweighted index when the relative

110 STATISTICS FOR ECONOMICS

importance of items is taken care of.Here weights are quantity weights. Toconstruct a weighted aggregativeindex, a well specified basket ofcommodities is taken and its wortheach year is calculated. It thusmeasures the changing value of a fixedaggregate of goods. Since the totalvalue changes with a fixed basket, thechange is due to price change.Various methods of calculating aweighted aggregative index usedifferent baskets with respect to time.

Example 2

Calculation of weighted aggregativeprice index

TABLE 8.2

Base period Current periodCommodity Price Quantity Price Quality

P0

q0

p1

q1

A 2 10 4 5B 5 12 6 10C 4 20 5 15D 2 15 3 10

PP qP q01

1 1

0 1

100= ¥ΣΣ

=¥ + ¥ + ¥ + ¥¥ + ¥ + ¥ + ¥

¥4 10 6 12 5 20 3 152 10 5 12 4 20 2 15

100

= ¥ =257190

100 135 3.

This method uses the base periodquantities as weights. A weightedaggregative price index using baseperiod quantities as weights, is alsoknown as Laspeyre’s price index. Itprovides an explanation to thequestion that if the expenditure onbase period basket of commoditieswas Rs 100, how much should be theexpenditure in the current period onthe same basket of commodities? Asyou can see here, the value of baseperiod quantities has risen by 35.3 percent due to price rise. Using baseperiod quantities as weights, the priceis said to have risen by 35.3 percent.

Since the current period quantitiesdiffer from the base period quantities,the index number using current periodweights gives a different value of theindex number.

PP qP q01

1 1

0 1

100= ¥ΣΣ

= ¥ + ¥ + ¥ + ¥¥ + ¥ + ¥ + ¥

¥4 5 6 10 5 15 3 102 5 5 10 4 15 2 15

100

= ¥ =185140

100 132 1.

It uses the current periodquantities as weights. A weightedaggregative price index using currentperiod quantities as weights is knownas Paasche’s price index. It helps inanswering the question that, if the

INDEX NUMBERS 111

the current period basket ofcommodities was consumed in thebase period and if we were spendingRs 100 on it, how much should be theexpenditure in current period on thesame basket of commodities. APaasche’s price index of 132.1 isinterpreted as a price rise of 32.1percent. Using current period weights,the price is said to have risen by 32.1per cent.

Method of Averaging relatives

When there is only one commodity, theprice index is the ratio of the price ofthe commodity in the current periodto that in the base period, usuallyexpressed in percentage terms. Themethod of averaging relatives takesthe average of these relatives whenthere are many commodities. Theprice index number using pricerelatives is defined as

Pn

pp01

1

0

1100= ¥Σ

where P1 and Po indicate the price ofthe ith commodity in the currentperiod and base period respectively.The ratio (P

1/P

0) × 100 is also referred

to as price relative of the commodity.n stands for the number ofcommodities. In the currentexample

P01

14

42

65

54

32

100 149= + + +ÊËÁ

ˆ¯̃¥ =

Thus the prices of the commoditieshave risen by 49 percent.

The weighted index of pricerelatives is the weighted arithmeticmean of price relatives defined as

P

WPP

W01

1

0

100

ÊËÁ

ˆ¯̃

Σ

Σwhere W = Weight.

In a weighted price relative indexweights may be determined by theproportion or percentage ofexpenditure on them in totalexpenditure during the base period.It can also refer to the current perioddepending on the formula used. Theseare, essentially, the value shares ofdifferent commodities in the totalexpenditure. In general the baseperiod weight is preferred to thecurrent period weight. It is becausecalculating the weight every year isinconvenient. It also refers to thechanging values of different baskets.They are strictly not comparable.Example 3 shows the type ofinformation one needs for calculatingweighted price index.

Example 3

Calculation of weighted price relativesindex

TABLE 8.3

Commodity Base Current Price Weightyear year price relative in %price (in Rs)

(in Rs.)

A 2 4 200 40B 5 6 120 30C 4 5 125 20D 2 3 150 10

112 STATISTICS FOR ECONOMICS

The weighted price index is

P

WPP

W01

1

0

100

ÊËÁ

ˆ¯̃

Σ

Σ

=¥ + ¥ + ¥ + ¥40 200 30 120 20 125 10 150

100

= 156The weighted price index is 156.

The price index has risen by 56percent. The values of the unweightedprice index and the weighted priceindex differ, as they should. The higherrise in the weighted index is due tothe doubling of the most importantitem A in example 3.

Activity

• Interchange the current periodvalues with the base periodvalues, in the data given inexample 2. Calculate the priceindex using Laspeyre’s, andPaasche’s formula. Whatdifference do you observe fromthe earlier illustration?

4. SOME IMPORTANT INDEX NUMBERS

Consumer price index

Consumer price index (CPI), alsoknown as the cost of living index,measures the average change in retailprices. The CPI for industrial workersis increasingly considered theappropriate indicator of generalinflation, which shows the mostaccurate impact of price rise on thecost of living of common people.Consider the statement that the CPI

Consumer Price Index

In India three CPI’s are constructed.They are CPI for industrial workers(1982 as base), CPI for urban nonmanual employees (1984–85 asbase) and CPI for agriculturallabourers (base 1986–87). They areroutinely calculated every month toanalyse the impact of changes in theretail price on the cost of living ofthese three broad categories ofconsumers. The CPI for industrialworkers and agricultural labourersare published by Labour Bureau,Shimla. The Central StatisticalOrganisation publishes the CPInumber of urban non manualemployees. This is necessarybecause their typical consumptionbaskets contain many dissimilaritems.The weight scheme in CPI for

industrial workers (1982=100) bymajor commodity groups is givenin the following table. In this schemefood has the largest weight. Foodbeing the most important category,any rise in the food price will have asignificant impact on CPI. This alsoexplains the government’s frequentstatement that oil price hike will notbe inflationary.

Major Group Weight in %

Food 57.00Pan, supari, tobacco etc. 3.15Fuel & light 6.28Housing 8.67Clothing, bedding & footwear 8.54Misc. group 16.36General 100.00

Source: Economic Survey, Government ofIndia.

INDEX NUMBERS 113

for industrial workers(1982=100) is526 in January 2005. What does thisstatement mean? It means that if theindustrial worker was spending Rs100 in 1982 for a typical basket ofcommodities, he needs Rs 526 inJanuary 2005 to be able to buy anidentical basket of commodities. It isnot necessary that he/she buys thebasket. What is important is whetherhe has the capability to buy it.

Example 4

Construction of consumer price indexnumber.

Wholesale price index

The wholesale price index numberindicates the change in the generalprice level. Unlike the CPI, it does nothave any reference consumercategory. It does not include itemspertaining to services like barbercharges, repairing etc.

What does the statement “WPI with1993-94 as base is 189.1 in March,2005” mean? It means that thegeneral price level has risen by 89.1percent during this period.

CPIWRW

= = =ΣΣ

9786 85100

97 86.

.

This exercise shows that the costof living has declined by 2.14 per cent.What does an index larger than 100indicate? It means a higher cost ofliving necessitating an upwardadjustment in wages and salaries. Therise is equal to the amount, it exceeds100. If the index is 150, 50 percentupward adjustment is required. Thesalaries of the employees have to beraised by 50 per cent.

Industrial production index

The index number of industrialproduction measures changes in thelevel of industrial productioncomprising many industries. Itincludes the production of the publicand the private sector. It is a weightedaverage of quantity relatives. Theformula for the index is

IIPq W

W011 100=¥

¥Σ

ΣIn India, it is currently calculated

every month with 1993–94 as thebase. In table 8.6, you can see theindex number of some industrialgroupings along with their weights.

TABLE 8.4

Item Weight in % Base period Current period R=P1/P

0 × 100 WR

W price (Rs) price (Rs) (in%)

Food 35 150 145 96.67 3883.45Fuel 10 25 23 92.00 920.00Cloth 20 75 65 86.67 1733.40Rent 15 30 30 100.00 1500.00Misc. 20 40 45 112.50 2250.00

9786.85

114 STATISTICS FOR ECONOMICS

Wholesale Price Index

The commodity weights in the WPIare determined by the estimates ofthe commodity value of domesticproduction and the value of importsinclusive of import duty during thebase year. It is available on a weeklybasis. Commodities are broadlyclassified into three categories vizprimary articles, fuel, power, lightand lubricants and manufacturedproducts. The weight scheme isgiven below. The low weight offuel,power,light and lubricantsexplains how the government canget away with such a statement thatthe oil price hike will not beinflationary at least in the short run.

TABLE 8.5

Category Weight in % No. of items

Primary articles 22.0 98Fuel, power,light & lubricants 14.2 19Manufacturedproducts 63.8 318

Source: Economic Survey 2004–2005,Govt. of India, p–89

TABLE 8.6Broad industrial grouping and their

weights

Broad groupings Weight in % Index no. inMay, 2005

Mining and quarrying 10.47 155.2Manufacturing 79.36 222.7Electricity 10.17 196.7General index 213.0

As the table shows, the growthperformances of the broad industrialcategories differ. The general indexrepresents the average performance of

these categories. Why does a compa-ratively lower performance of miningand quarrying not pull down thegeneral index?

Index number of agriculturalproduction

Index number of agricultural productionis a weighted average of quantityrelatives. Its base period is thetriennium ending 1981-82. In 2003–04 the index number of agriculturalproduction was 179.5. It means thatagricultural production has increasedby 79.5 percent over the average ofthe three years 1979–80, 1980–81 and1981–82. Foodgrains have a weight of62.92 percent in this index.

SENSEX

You ofen come across a news item ina newspaper,

“Sensex breaches 8700 mark. BSEcloses at 8650 points. Investor wealthrises by Rs 9,000 crore. The sensexbroke the 8700 mark for the first timein its history but ended off the markat 8650, also a new record closinglevel”.

The rise in sensex was at thehighest level till date, which reflectsthe good health of the economy ingeneral. As the share prices increase,reflected by the rise in sensex, thevalue of wealth of the shareholdersalso rises.Look at another news item,

“Sensex dips 600 in 30 days flat.Rs 1,53,690 crore investor wealtheroded. While the sensex has lost 338

INDEX NUMBERS 115

points in two consecutive days, it haseroded 6.8% or 598 points sinceOctober 4 when it hit an all time highat 8800 points. Investor wealth erodedby a staggering Rs 1,53,690 crore or6.7% during the period.”

It shows that all is not well withthe health of the economy. Theinvestors may find it hard to decidewhether to invest or not.

Another useful index in recentyears is the human developmentindex. Very soon producers price

index number will replace wholesaleprice index.

Producer Price Index

Producer price index numbermeasures price changes from theproducers’ perspective. It uses onlybasic prices including taxes, trademargins and transport costs. AWorking Group on Revision ofWholesale Price Index (1993–94=100) is inter alia examining thefeasibility of switching over from WPIto a PPI in India as in manycountries.

5. ISSUES IN THE CONSTRUCTION OF AN

INDEX NUMBER

You should keep certain importantissues in mind, while constructing anindex number.• You need to be clear about thepurpose of the index. Calculation of avolume index will be inappropriate,when one needs a value index.

Bombay Stock Exchange

Sensex is the short form ofBombay Stock Exchange SensitiveIndex with 1978–79 as base. Thevalue of the sensex is withreference to this period. It is thebenchmark index for the Indianstock market. It consists of 30stocks which represent 13 sectorsof the economy and the companieslisted are leaders in theirrespective industries. If the sensexrises, it indicates that the marketis doing well and investors expectbetter earnings from companies.It also indicates a growingconfidence of investors in the basic health of the economy.

116 STATISTICS FOR ECONOMICS

• Besides this, the items are notequally important for different groupsof consumers when a consumer priceindex is constructed. The rise in petrolprice may not directly impact the livingcondition of the poor agriculturallabourers. Thus the items to beincluded in any index have to beselected carefully to be asrepresentative as possible. Only thenyou will get a meaningful picture ofthe change.• Every index should have a base.This base should be as normal aspossible. Extreme values should notbe selected as base period. The periodshould also not belong to too far inthe past. The comparison between1993 and 2005 is much moremeaningful than a comparisonbetween 1960 and 2005. Many itemsin a 1960 typical consumption baskethave disappeared at present.Therefore, the base year for any indexnumber is routinely updated.• Another issue is the choice of theformula, which depends on the natureof question to be studied. The onlydifference between the Laspeyres’index and Paasche’s index is theweights used in these formulae.• Besides, there are many sourcesof data with different degrees ofreliability. Data of poor reliability willgive misleading results. Hence, duecare should be taken in the collectionof data. If primary data are not beingused, then the most reliable source ofsecondary data should be chosen.

Activity

• Collect data from the localvegetable market over a week for,at least 10 items. Try toconstruct the daily price indexfor the week. What problems doyou encounter in applying bothmethods for the construction ofa price index?

6. INDEX NUMBER IN ECONOMICS

Why do we need to use the indexnumbers? Wholesale price indexnumber (WPI), consumer price indexnumber (CPI) and industrialproduction index (IIP) are widely usedin policy making.• Consumer index number (CPI) orcost of living index numbers arehelpful in wage negotiation,formulation of income policy, pricepolicy, rent control, taxation andgeneral economic policy formulation.• The wholesale price index (WPI) isused to eliminate the effect of changesin prices on aggregates such asnational income, capital formation etc.• The WPI is widely used to measurethe rate of inflation. Inflation is ageneral and continuing increase inprices. If inflation becomes sufficientlylarge, money may lose its traditionalfunction as a medium of exchange andas a unit of account. Its primaryimpact lies in lowering the value ofmoney. The weekly inflation rate isgiven by

X XX

t t

t

1

1

100-

¥ where Xt and X

t-1

INDEX NUMBERS 117

refer to the WPI for the t th and (t-1)th weeks.• CPI are used in calculating thepurchasing power of money and realwage:(i) Purchasing power of money = 1/Cost of living index(ii) Real wage = (Money wage/Cost ofliving index) × 100

If the CPI (1982=100) is 526 inJanuary 2005 the equivalent of arupee in January, 2005 is given by

Rs100526

0 19= . . It means that it is

worth 19 paise in 1982. If the moneywage of the consumer is Rs 10,000,his real wage will be

Rs Rs10 000100526

1 901, ,¥ =

It means Rs 1,901 in 1982 hasthe same purchasing power as Rs10,000 in January, 2005. If he/shewas getting Rs 3,000 in 1982, he/she is worse off due to the rise in price.To maintain the 1982 standard ofliving the salary should be raised toRs 15,780 obtained by multiplying thebase period salary by the factor 526/100.• Index of industrial productiongives us a quantitative figure aboutthe change in production in theindustrial sector.• Agricultural production indexprovides us a ready reckoner of theperformane of agricultural sector.

• Sensex is a useful guide forinvestors in the stock market. If thesensex is rising, investors areoptimistic of the future performanceof the economy. It is an appropriatetime for investment.

Where can we get these indexnumbers?

Some of the widely used indexnumbers are routinely published inthe Economic Survey, an annualpublication of the Government of Indiaare WPI, CPI, Index Number of Yieldof Principal Crops, Index of IndustrialProduction, Index of Foreign Trade.

Activity

• Check from the newspapers andconstruct a time series of sensexwith 10 observations. Whathappens when the base of theconsumer price index is shiftedfrom 1982 to 2000?

7. CONCLUSION

Thus, the method of the index numberenables you to calculate a singlemeasure of change of a large numberof items. Index numbers can becalculated for price, quantity, volumeetc.

It is also clear from the formulaethat the index numbers need to beinterpreted carefully. The items to beincluded and the choice of the baseperiod are important. Index numbersare extremely important in policymaking as is evident by their varioususes.

118 STATISTICS FOR ECONOMICS

Recap

• An index number is a statistical device for measuring relative changein a large number of items.

• There are several formulae for working out an index number andevery formula needs to be interpreted carefully.

• The choice of formula largely depends on the question of interest.• Widely used index numbers are wholesale price index, consumer

price index, index of industrial production, agricultural productionindex and sensex.

• The index numbers are indispensable in economic policymaking.

EXERCISES

1. An index number which accounts for the relative importance of the itemsis known as(i) weighted index(ii) simple aggregative index(iii) simple average of relatives

2. In most of the weighted index numbers the weight pertains to(i) base year(ii) current year(iii) both base and current year

3. The impact of change in the price of a commodity with little weight in theindex will be(i) small(ii) large(iii) uncertain

4. A consumer price index measures changes in(i) retail prices(ii) wholesale prices(iii) producers prices

5. The item having the highest weight in consumer price index for industrialworkers is(i) Food(ii) Housing(iii) Clothing

6. In general, inflation is calculated by using(i) wholesale price index(ii) consumer price index(iii) producers’ price index

INDEX NUMBERS 119

7. Why do we need an index number?

8. What are the desirable properties of the base period?

9. Why is it essential to have different CPI for different categories ofconsumers?

10. What does a consumer price index for industrial workers measure?

11. What is the difference between a price index and a quantity index?

12. Is the change in any price reflected in a price index number?

13. Can the CPI number for urban non-manual employees represent thechanges in the cost of living of the President of India?

14. The monthly per capita expenditure incurred by workers for an industrialcentre during 1980 and 2005 on the following items are given below. Theweights of these items are 75,10, 5, 6 and 4 respectively. Prepare aweighted index number for cost of living for 2005 with 1980 as the base.

Items Price in 1980 Price in 2005

Food 100 200Clothing 20 25Fuel & lighting 15 20House rent 30 40Misc 35 65

INDEX OF INDUSTRIAL PRODUCTION BASE 1993–94

Industry Weight in % 1996–97 2003–2004

General index 100 130.8 189.0Mining and quarrying 10.73 118.2 146.9Manufacturing 79.58 133.6 196.6Electricity 10.69 122.0 172.6

16. Try to list the important items of consumption in your family.

17. If the salary of a person in the base year is Rs 4,000 per annum and thecurrent year salary is Rs 6,000, by how much should his salary rise tomaintain the same standard of living if the CPI is 400?

18. The consumer price index for June, 2005 was 125. The food index was120 and that of other items 135. What is the percentage of the totalweight given to food?

19. An enquiry into the budgets of the middle class families in a certain citygave the following information;

120 STATISTICS FOR ECONOMICS

Expenses on items Food Fuel Clothing Rent Misc.35% 10% 20% 15% 20%

Price (in Rs) in 2004 1500 250 750 300 400Price (in Rs) in 1995 1400 200 500 200 250

What is the cost of living index of 2004 as compared with 1995?

20. Record the daily expenditure, quantities bought and prices paid per unitof the dailypurchases of your family for two weeks. How has the pricechange affected your family?

21. Given the following data-

Year CPI of industrial CPI of urban CPI of agricultural WPI workers non-manual labourers (1993–94=100)

(1982 =100) employees (1986–87 = 100)(1984–85 = 100)

1995–96 313 257 234 121.61996–97 342 283 256 127.21997–98 366 302 264 132.81998–99 414 337 293 140.71999–00 428 352 306 145.32000–01 444 352 306 155.72001–02 463 390 309 161.32002–03 482 405 319 166.82003–04 500 420 331 175.9

Source: Economic Survey, Government of India.2004–2005

(i) Calculate the inflation rates using different index numbers.(ii) Comment on the relative values of the index numbers.(iii) Are they comparable?

Activity

• Consult your class teacher to make a list of widely used indexnumbers. Get the most recent data indicating the source. Can youtell what the unit of an index number is?

• Make a table of consumer price index for industrial workers in thelast 10 years and calculate the purchasing power of money. How is itchanging?

���78%8-78-'7�*36�)'3231-'7

SJ�]SYV�SFNIGXMZI��]SY�[MPP�TVSGIIH�[MXL

XLI�GSPPIGXMSR�ERH�TVSGIWWMRK�SJ�XLIHEXE��*SV�I\EQTPI��TVSHYGXMSR�SV�WEPI

SJ�E�TVSHYGX�PMOI�GEV��QSFMPI�TLSRI�

WLSI�TSPMWL��FEXLMRK�WSET�SV�E

HIXIVKIRX��QE]�FI�ER�EVIE�SJ�MRXIVIWX�XS

]SY��=SY�QE]�PMOI�XS�EHHVIWW�GIVXEMR

[EXIV�SV�IPIGXVMGMX]�TVSFPIQW�VIPEXMRK�XSLSYWILSPHW�SJ�E�TEVXMGYPEV�EVIE��=SY

QE]�PMOI�XS�WXYH]�EFSYX�GSRWYQIV

E[EVIRIWW�EQSRK�LSYWILSPHW��M�I��

E[EVIRIWW�EFSYX�VMKLXW�SJ�GSRWYQIVW�

'LSMGI�SJ�8EVKIX�+VSYT

8LI�GLSMGI�SV�MHIRXMJMGEXMSR�SJ�XLI�XEVKIX

KVSYT�MW�MQTSVXERX�JSV�JVEQMRKETTVSTVMEXI�UYIWXMSRW�JSV�]SYV

UYIWXMSRREMVI��-J�]SYV�TVSNIGX�VIPEXIW�XS

GEVW��XLIR�]SYV�XEVKIX�KVSYT�[MPP�QEMRP]

FI�XLI�QMHHPI�MRGSQI�ERH�XLI�LMKLIVMRGSQI�KVSYTW��*SV�XLI�TVSNIGX�WXYHMIW

VIPEXMRK�XS�GSRWYQIV�TVSHYGXW�PMOI

WSET��]SY�[MPP�XEVKIX�EPP�VYVEP�ERH�YVFER

GSRWYQIVW��*SV�XLI�EZEMPEFMPMX]�SJ�WEJI

HVMROMRK�[EXIV�]SYV�XEVKIX�KVSYT�GERFI�FSXL�YVFER�ERH�VYVEP�TSTYPEXMSR�

8LIVIJSVI��XLI�GLSMGI�SJ�XEVKIX�KVSYTW�

XS�MHIRXMJ]�XLSWI�TIVWSRW�SR�[LSQ�]SY

JSGYW�]SYV�EXXIRXMSR��MW�ZIV]�MQTSVXERX

[LMPI�TVITEVMRK�XLI�TVSNIGX�VITSVX�

'SPPIGXMSR�SJ�(EXE

8LI�SFNIGXMZI�SJ�XLI�WYVZI]�[MPP�LIPT�]SY

XS�HIXIVQMRI�[LIXLIV�XLI�HEXEGSPPIGXMSR�WLSYPH�FI�YRHIVXEOIR�F]

YWMRK�TVMQEV]�QIXLSH��WIGSRHEV]

QIXLSH�SV�FSXL�XLI�QIXLSHW��%W�]SY

LEZI�VIEH�MR�'LETXIV����E�JMVWX�LERH

GSPPIGXMSR�SJ�HEXE�F]�YWMRK�TVMQEV]QIXLSH�GER�FI�HSRI�F]�YWMRK�E

UYIWXMSRREMVI�SV�ER�MRXIVZMI[�WGLIHYPI�

[LMGL�QE]�FI�SFXEMRIH�F]�TIVWSREPMRXIVZMI[W��QEMPMRK�TSWXEP�WYVZI]W�

TLSRI��IQEMP��IXG��4SWXEP�UYIWXMSRREMVI

QYWX�LEZI�E�GSZIVMRK��PIXXIV�KMZMRK

HIXEMPW�EFSYX�XLI�TYVTSWI�SJ�MRUYMV]�

=SYV�SFNIGXMZI�[MPP�FI�XS�HIXIVQMRI�XLI

WM^I�ERH�GLEVEGXIVMWXMGW�SJ�]SYV�XEVKIXKVSYT��*SV�I\EQTPI��MR�E�WYVZI]

TIVXEMRMRK�XS�XLI�TVMQEV]�ERH

WIGSRHEV]�PIZIP�JIQEPI�PMXIVEG]�SV

GSRWYQTXMSR�SJ�E�TEVXMGYPEV�FVERH�SVWSET��]SY�[MPP�LEZI�XS�KS�XS�IEGL�ERH

IZIV]�JEQMP]�SV�LSYWILSPH�XS�GSPPIGX�XLI

MRJSVQEXMSR�

7IGSRHEV]�HEXE�[MPP�TVSZMHIMRJSVQEXMSR�XLVSYKL�TYFPMWLIH�SV

YRTYFPMWLIH�WSYVGIW��MRXIVREP�VIGSVH

SJ�ER]�SVKERMWEXMSR ��TVSZMHIH�MX�WYMXW

]SYV�VIUYMVIQIRX��7IGSRHEV]�WSYVGIW

SJ�HEXE�EVI�YWYEPP]�YWIH�[LIR�XLIVI�MWTEYGMX]�SJ�XMQI��QSRI]�ERH�QERTS[IV

VIWSYVGIW�ERH�XLI�MRJSVQEXMSR�MW�IEWMP]

EZEMPEFPI���-J�WEQTPMRK�MW�YWIH�MR�]SYV

QIXLSH�SJ�HEXE�GSPPIGXMSR��XLIR�XLI�GEVILEW�XS�FI�XEOIR�EFSYX�XLI�WYMXEFMPMX]�SJ

XLI�QIXLSH�SJ�WEQTPMRK�

3VKERMWEXMSR�ERH�4VIWIRXEXMSR�SJ�(EXE

%JXIV�GSPPIGXMRK�XLI�HEXE��]SY�RIIH�XSTVSGIWW�XLI�MRJSVQEXMSR�WS�VIGIMZIH��F]

SVKERMWMRK�ERH�TVIWIRXMRK�[MXL�XLI�LIPT

SJ�XEFYPEXMSR�ERH�WYMXEFPI�HMEKVEQW�

I�K��FEV�HMEKVEQW��TMI�HMEKVEQW��IXG�EFSYX�[LMGL�]SY�LEZI�WXYHMIH�MR

GLETXIV���ERH���

%REP]WMW�ERH�-RXIVTVIXEXMSR

1IEWYVIW�SJ�'IRXVEP�8IRHIRG]��I�K�QIER ��1IEWYVIW�SJ�(MWTIVWMSR��I�K�

97)�3*�78%8-78-'%0�83307���

7XERHEVH�HIZMEXMSR ��ERH�'SVVIPEXMSR

[MPP�IREFPI�]SY�XS�GEPGYPEXI�XLI�EZIVEKI�ZEVMEFMPMX]�ERH�VIPEXMSRWLMT��MJ�MX�I\MWXW

EQSRK�XLI�ZEVMEFPIW��=SY�LEZI�EGUYMVIH

XLI�ORS[PIHKI�VIPEXIH�XS�EFSZI�

QIRXMSRIH�QIEWYVIW�MR�GLETXIVW�����

ERH���

'SRGPYWMSR

8LI�PEWX�WXIT�[MPP�FI�XS�HVE[�QIERMRKJYP

GSRGPYWMSRW�EJXIV�%REP]WMRK�ERH-RXIVTVIXMRK�XLI�VIWYPXW��-J�TSWWMFPI�]SY

QYWX�XV]�XS�TVIHMGX�XLI� JYXYVI

TVSWTIGXW �ERH�WYKKIWXMSRW�VIPEXMRK�XS

KVS[XL�ERH�KSZIVRQIRX�TSPMGMIW��IXG��SR

XLI�FEWMW�SJ�XLI�MRJSVQEXMSR�GSPPIGXIH�

&MFPMSKVETL]

-R�XLMW�WIGXMSR��]SY�RIIH�XS�QIRXMSR�XLI

HIXEMPW�SJ�EPP�XLI�WIGSRHEV]�WSYVGIW��M�I��QEKE^MRIW��RI[WTETIVW��VIWIEVGL

VITSVXW�YWIH�JSV�HIZIPSTMRK�XLI�TVSNIGX�

79++)78)(�0 -78 3*�4 63.)'87

8LIWI�EVI�E�JI[�WYKKIWXIH�TVSNIGXW��=SY

EVI�JVII�XS�GLSSWI�ER]�XSTMG�XLEX�HIEPW

[MXL�ER�IGSRSQMG�MWWYI���'SRWMHIV�]SYVWIPJ�EW�ER�EHZMWSV�XS

8VERWTSVX�1MRMWXIV�[LS�EMQW�XS

FVMRK�EFSYX�E�FIXXIV�ERH

GSSVHMREXIH�W]WXIQ�SJXVERWTSVXEXMSR��4VITEVI�E�TVSNIGX

VITSVX�

��=SY�QE]�FI�[SVOMRK�MR�E�ZMPPEKI

GSXXEKI�MRHYWXV]��-X�GSYPH�FI�E�YRMXQERYJEGXYVMRK HLSST��EKEVFEXXM�

GERHPIW��NYXI�TVSHYGXW��IXG��=SY

[ERX�XS�WXEVX�E�RI[�YRMX�SJ�]SYV

S[R��4VITEVI�E�TVSNIGX�TVSTSWEP�JSV

KIXXMRK�E�FERO�PSER�

��7YTTSWI�]SY�EVI�E�QEVOIXMRK

QEREKIV�MR�E�GSQTER]�ERH�VIGIRXP]]SY�LEZI�TYX�YT�EHZIVXMWIQIRXW

EFSYX�]SYV�GSRWYQIV�TVSHYGX�

4VITEVI�E�VITSVX�SR�XLI�IJJIGX�SJ

EHZIVXMWIQIRXW�SR�XLI�WEPI�SJ�]SYV

TVSHYGX�

��=SY�EVI�E�(MWXVMGX�)HYGEXMSR�3JJMGIV�[LS�[ERXW�XS�EWWIWW�XLI�PMXIVEG]

PIZIPW�ERH�XLI�VIEWSRW�JSV�HVSTTMRK

SYX�SJ�WGLSSP�GLMPHVIR��4VITEVI�E

VITSVX���7YTTSWI�]SY�EVI�E�:MKMPERGI�3JJMGIV

SJ�ER�EVIE�ERH�]SY�VIGIMZI

GSQTPEMRXW�EFSYX�SZIVGLEVKMRK�SJ

KSSHW�F]�XVEHIVW�M�I���GLEVKMRK�ELMKLIV�TVMGI�XLER�XLI�1E\MQYQ

6IXEMP�4VMGI��164 ��:MWMX�E�JI[�WLSTW

ERH�TVITEVI�E�VITSVX�SR�XLI

GSQTPEMRX�

��'SRWMHIV�]SYVWIPJ�XS�FI�E�1YOLM]E�LIEH�SJ�+VEQ�4ERGLE]EX �SJ�E

TEVXMGYPEV�ZMPPEKI�[LS�[ERXW�XS

MQTVSZI�EQIRMXMIW�PMOI�WEJI

HVMROMRK�[EXIV�XS�]SYV�TISTPI�%HHVIWW�]SYV�MWWYIW�MR�E�VITSVX

JSVQ�

��%W�E�VITVIWIRXEXMZI�SJ�E�PSGEP

KSZIVRQIRX��]SY�[ERX�XS�EWWIWW�XLI

TEVXMGMTEXMSR�SJ�[SQIR�MR�ZEVMSYWIQTPS]QIRX�WGLIQIW�MR�]SYV�EVIE�

4VITEVI�E�TVSNIGX�VITSVX�

��=SY�EVI�XLI�'LMIJ�,IEPXL�3JJMGIV�SJ�E

VYVEP�FPSGO��-HIRXMJ]�XLI�MWWYIW�XS

FI�EHHVIWWIH�XLVSYKL�E�TVSNIGXWXYH]��8LMW�QE]�MRGPYHI�LIEPXL�ERH

WERMXEXMSR�TVSFPIQW�MR�XLI�EVIE�

��%W�XLI�'LMIJ�-RWTIGXSV�SJ�*SSH�ERH

'MZMP�7YTTPMIW�HITEVXQIRX��]SYLEZI�VIGIMZIH�E�GSQTPEMRX�EFSYX

JSSH�EHYPXIVEXMSR�MR�XLI�EVIE�SJ

���78%8-78-'7�*36�)'3231-'7

]SYV�HYX]��'SRHYGX�E�WYVZI]�XS�JMRH

XLI�QEKRMXYHI�SJ�XLI�TVSFPIQ����4VITEVI�E�VITSVX�SR�4SPMS

MQQYRMWEXMSR�TVSKVEQQI�MR�E

TEVXMGYPEV�EVIE�

���=SY�EVI�E�&ERO�3JJMGIV�ERH�[ERX�XS

WYVZI]�XLI�WEZMRK�LEFMXW�SJ�XLI

TISTPI�F]�XEOMRK�MRXS�GSRWMHIVEXMSRMRGSQI�ERH�I\TIRHMXYVI�SJ�XLI

TISTPI��4VITEVI�E�VITSVX�

���7YTTSWI�]SY�EVI�E�TEVX�SJ�E�KVSYT

SJ�WXYHIRXW�[LS�[ERXW�XS�WXYH]JEVQMRK�TVEGXMGIW�ERH�XLI�TVSFPIQW

JEGMRK�JEVQIVW�MR�E�ZMPPEKI��4VITEVI

E�TVSNIGX�VITSVX�

7%140)�4 63.)'8

8LMW�MW�E�WEQTPI�TVSNIGX�JSV�]SYVKYMHERGI��8LI�UYIWXMSR�GER�ZEV]

HITIRHMRK�YTSR�XLI�WYFNIGX�SJ�XLI

WXYH]�

=SY�EVI�E�]SYRK�IRXVITVIRIYV�[LS

[ERXW�XS�WIXYT�E�RI[�VIXEMP�WLST�ERH

[ERX�XS�GLSSWI�E�ZEVMIX]�SJ�XSSXLTEWXIFVERHW�XS�WIPP��%�WEQTPI�TVSNIGX�FEWIH

SR�TVMQEV]�WSYVGI�SJ�HEXE�GSPPIGXMSR

GSYPH�FI�TVITEVIH�JSV�XSSXLTEWXI�

=SY�LEZI�XS�WXEVX�F]�EWWYVMRK�XLIGSRGIVRIH�TIVWSR�SV�TEVX]��XLEX�XLI

MRJSVQEXMSR�VIUYMVIH�MW�JSV�WYVZI]�ERH

[MPP�RSX�FI�YWIH�JSV�ER]�SXLIV�TYVTSWI�

8LMW�MW�HSRI�XLVSYKL�E�GSZIVMRK�PIXXIV�%PP�XLI�MRJSVQEXMSR�WLEPP�FI�OITX

GSRJMHIRXMEP�

(EXE�%REP]WMW�ERH�-RXIVTVIXEXMSR

%JXIV�GSPPIGXMRK�XLI�IRXMVI�MRJSVQEXMSR]SY�RS[�LEZI�XS�SVKERMWI�ERH�GPEWWMJ]

HEXE�JSV�XLI�TYVTSWI�SJ�GLSSWMRK

FVERHW�SJ�XSSXLTEWXI�[LMGL�]SY�[ERX

XS�WIPP��,]TSXLIXMGEP�HEXE�MW�KMZIR�FIPS[

JSV�]SYV�VIJIVIRGI�[LIVI�]SY�[MPP�RS[

YWI�XLI�WXEXMWXMGEP�XSSPW�WYGL�EW�TMI

HMEKVEQW��FEV�HMEKVEQW��QIER�

WXERHEVH�HIZMEXMSR��IXG�

%VIE�(MWXVMFYXMSR

9VFER�YWIVW��

6YVEP�YWIVW��

3FWIVZEXMSR� �1ENSVMX]�SJ�YWIVW

FIPSRKIH�XS�YVFER�EVIE�

%KI�HMWXVMFYXMSR

%KI�MR�]IEVW2S��SJ�4IVWSRW

&IPS[�����

��z����

��z������z�����

��z����%FSZI�����

8SXEP���

97)�3*�78%8-78-'%0�83307���

59)78-322%-6)

����2EQI

%KI��MR�]IEVW 2S��SJ�TIVWSRW

�E &IPS[����F ��z���G ��z���H ��z��

�I ��z���J %FSZI���

����+IRHIV��1EPI�*IQEPI

����2YQFIV�SJ�QIQFIVW�MR�XLI�JEQMP]��E �z�

�F �z��G �z��H %FSZI��

����,S[�QER]�IEVRMRK�QIQFIVW�EVI�XLIVI�MR�]SYV�JEQMP]#

����1SRXLP]�JEQMP]�MRGSQI��E &IPS[��������F ������z�������G ������z������

�H %FSZI�������

����6IWMHIRX�SJ��9VFER�6YVEP�EVIE

����1ENSV�SGGYTEXMSR�SJ�XLI�QEMR�FVIEH�IEVRIV�

�E 7IVZMGI�F 4VSJIWWMSREP�G 1ERYJEGXYVIV�H 8VEHIV�I %R]�SXLIV��TPIEWI�WTIGMJ]

����;LEX�HS�]SY�YWI�XS�GPIER�]SYV�XIIXL��E 8SSXLTEWXI

�F 8SSXLTS[HIV�G %R]SXLIV

���;LMGL�FVERH�SJ�XSSXLTEWXI�HS�]SY�YWI#

�E %UYEJVIWL �F %RGLSV�G 'MFEGE �H &EFSSP�I 'PSWI�YT �J 4VSQMWI�K 'SPKEXI �L *SVLERW

���78%8-78-'7�*36�)'3231-'7

�M 1IW[EO �N 8IE�8VII�3MP��2IIQ�O 4ITWSHIRX �P 3VEP�&�Q 4IEVP��� �R 8VYI�(IRX�S ,SQISHIRX �T 7IRWSH]RI

�U %R]�SXLIV

���8LI�TVMGI�TEMH�JSV�IEGL�����KVEQ�TEGO�SJ�XLI�XSSXLTEWXI�

���(S�]SY�JMRH�XLI�TVSHYGX�GSWXP]#= IW�2S

���(S�]SY�I\EQMRI�XLI�HEXI�SJ�QERYJEGXYVMRK�ERH�I\TMV]�SJ�XLI�TVSHYGX#= IW�2S

���(S�]SY�GLIGO�XLI�WXERHEVHMWEXMSR�QEVO��PMOI�z�-7- #= IW�2S

���(S�]SY�GLIGO�XLI�MRKVIHMIRXW�YWIH#= IW�2S

���%VI�]SY�WEXMWJMIH�[MXL�XLI�UYEPMX]�SJ�XLI�TVSHYGX#= IW�2S

���(S�]SY�GSQTPEMR�XS�XLI�WLSTOIITIV�MR�GEWI�SJ�HMWWEXMWJEGXMSR#= IW�2S

���,EW�]SYV�GSQTPEMRX�FIIR�XMQIP]�EXXIRHIH#= IW�2S

���(MH�]SY�IZIV�KS�XS�E�GSRWYQIV�GSYVX�MR�GEWI�SJ�HMWWEXMWJEGXMSRVIKEVHMRK�XLI�TVSHYGX#= IW�2S

���;EW�]SYV�GSQTPEMRX�EXXIRHIH�XS�]SYV�WEXMWJEGXMSR#= IW�2S

���,S[�HMH�]SY�GSQI�XS�ORS[�EFSYX�XLI�TVSHYGX#%HZIVXMWIQIRX *EQMPMIW�-RJPYIRGIH8IPIZMWMSR

2I[WTETIV1EKE^MRI'MRIQE7EPIW�VITVIWIRXEXMZI)\LMFMXW���WXEPP6EHMS

���-W�XLI�EHZIVXMWIQIRX�SJ�XLI�TVSHYGX�TIVWYEWMZI#= IW�2S

���;IVI�]SY�EXXVEGXIH�F]�TVSQSXMSREP�SJJIVW�PMOI�VIFEXIW�JVII�XSSXL�FVYWL��FY]�SRI�KIX�SRI�JVII��IXG�#= IW�2S

���(S�XLI�GLMPHVIR�MRJPYIRGI�TYVGLEWI�SJ�TEVXMGYPEV�XSSXLTEWXI#= IW�2S

���-J�E�RI[�XSSXLTEWXI�MW�PEYRGLIH�MR�XLI�QEVOIX�[MPP�]SY�FY]�MX#= IW�2S-J�]IW��XLIR�[MXL�[LEX�GSRWMHIVEXMSRW#�/MRHP]�QIRXMSR�

97)�3*�78%8-78-'%0�83307���

*MK������ &EV�HMEKVEQ

3FWIVZEXMSR� �1ENSVMX]�SJ�XLI�TIVWSRW

WYVZI]IH�FIPSRKIH�XS�EKI�KVSYT���z���

*EQMP]�7M^I

*EQMP]�WM^I2S��SJ�JEQMPMIW

�z���

�z����z���

%FSZI����

8SXEP���

*MK������ �&EV�HMEKVEQ

3FWIVZEXMSR� �1ENSVMX]�SJ�XLI�JEQMPMIW

WYVZI]IH�LEZI��z��QIQFIVW�

*EQMP]�QSRXLP]�-RGSQI�WXEXYW

-RGSQI2S��SJ�,SYWILSPHW

&IPS[���������

�������z���������������z��������

%FSZI���������

&EV�(MEKVEQ�ERH�,MWXSKVEQ

VIWTIGXMZIP]�EVI�MRHMGEXMRK�XLI�PIZIP�SJJEQMPMIW�MRGSQI�

*MK������ �&EV�HMEKVEQ

*MK������ �,MWXSKVEQ

3FWIVZEXMSR� 1ENSVMX]�SJ�XLI�JEQMPMIW

WYVZI]IH�LEZI�QSRXLP]�MRGSQI�FIX[IIR

�������XS��������

1SRXLP]�*EQMP]�FYHKIX�SRXSMPIXVMIW

-XIQW )\TIRWI�MR�6W

8SSXLTEWXI��7SET��

7LEQTSS���7LEZMRK�GVIEQ��

���78%8-78-'7�*36�)'3231-'7

*MK������ �4MI�HMEKVEQ

3FWIVZEXMSR� 8SSXLTEWXI�EGGSYRXIH

JSV�WMKRMJMGERX�I\TIRHMXYVI�MR�JEQMP]FYHKIX�EQSRKWX�XSMPIXVMIW�

1ENSV�3GGYTEXMSREP�WXEXYW

*EQMP]�3GGYTEXMSR2S��SJ�JEQMPMIW

7IVZMGI��

4VSJIWWMSREP�1ERYJEGXYVIV��

8VEHIV��%R]�SXLIV��TPIEWI�WTIGMJ] ��

*MK������ �4MI�&EV�HMEKVEQ

3FWIVZEXMSR� 1ENSVMX]�SJ�XLI�JEQMPMIW

WYVZI]IH�[IVI�IMXLIV�WIVZMGI�GPEWW�SV

XVEHIVW�

4VIJIVVIH�YWI�SJ�XSSXLTEWXI

&VERH 9RMXW &VERH 9RMXW

%UYEJVIWL� %RGLSV�'MFEGE��&EFSSP�

'PSWI�YT�� 4VSQMWI��'SPKEXI��*SVLERW�

1IW[EO�8IE�XVII�

SMP��2IIQ4ITWSHIRX��3VEP�&��

4IEVP����8VYI�(IRX��

,SQISHIRX�7IRWSH]RI�%R]�SXLIV�

3FWIVZEXMSR� 4ITWSHIRX��'SPKEXI�ERH'PSWI�YT�[IVI�XLI�QSWX�TVIJIVVIH

FVERHW�

4VMGI�SJ�XLI�XSSXLTEWXI

4VMGIW�SJ��8SSXLTEWXI2S��SJ

*SV�����KVEQ�TEGO��6W ,S YWILSPHW

��z����

��z������z����

��z����

8SXEP���

'EPGYPEXI�XLI�QIER�ERH�HMWTIVWMSR

SR�XLI�FEWMW�SJ�XLI�EFSZI�MRJSVQEXMSR�

'EPGYPEXMSR�SJ�1IER�

4VMGI�SJ2S��SJ1MH8SSXLTEWXI, SYWILSPHW 4SMRXW

*SV�����KQ J QJQTEGO��6W

��z�����������

��z����������z�����������

��z�����������

8SXEP�������

Xf m

f

97)�3*�78%8-78-'%0�83307���

3FWIVZEXMSR� 8LI�EZIVEKI�TVMGI�SJ

XSSXLTEWXI�EGVSWW�EPP�FVERHW�MW�6W����

9WI�SJ�SXLIV�WXEXMWXMGEP�XSSPW�

4VMGIW�SJ 2S��SJ1MHH�!JH�JH�8SSXLTEWXI,SYWILSPHW4SMRXW�Q z

*SV�����KQJQ��

TEGO��6W�

��z��������z�z����

��z�������������z�������������

��z�������������

8SXEP�������

%TTP]MRK�XLI�JSVQYPE�SJ�7(

3FWIVZEXMSR� 4VMGI�SJ�XLI�QSWX

XSSXLTEWXI�VERKIH�FIX[IIR�6W���z��

&EWMW�SJ�WIPIGXMSR

*IEXYVIW *EQMP]�QIQFIVW

0MOIH�XLI�EHZIVXMWIQIRX��4IVWYEHIH�F]�XLI�(IRXMWX�

4VMGI��5YEPMX]��

8EWXI��-RKVIHMIRXW��

7XERHEVHMWIH�QEVOMRK��

8VMIH�RI[�TVSHYGX��'SQTER]�W�FVERH�REQI��

3FWIVZEXMSR� 1ENSVMX]�SJ�XLI�TISTPI

GLSSWI�XS�FY]�XLI�XSSXLTEWXI�JSV7XERHEVHMWIH�QEVOMRKW��UYEPMX]��TVMGI

ERH�GSQTER]vW�FVERH�REQI�

8EWXI�ERH�4VIJIVIRGIW

&VERH 7EXMWJMIH9RWEXMWJMIH

%UYEJVIWL���'MFEGE���

'PSWI�YT����'SPKEXI����

1IW[EO���

4ITWSHIRX���%RGLSV���

&EFSSP��

4VSQMWI����*SVLERW��

8IE�XVII�SMP�ERH�2IIQ���3VEP�&����

8VYI�(IRX���7IRWSH]RI��

4IEVP�����

,SQISHIRX��

3FWIVZEXMSR� %QSRKWX�XLI�QSWX�YWIH

XSSXLTEWXIW�XLI�TIVGIRXEKI�SJ

HMWWEXMWJEGXMSR�[EW�VIPEXMZIP]�PIWW�

-RKVIHMIRXW�4VIJIVIRGI

4PEMR�XSSXLTEWXI��

+IP�XSSXLTEWXI�%RXMWITXMG�XSSXLTEWXI��

*PEZSYVIH�XSSXLTEWXI��

'EVMIW�TVSXIGXMZI�XSSXLTEWXI��+YQ�XSSXLTEWXI��

3FWIVZEXMSR� �1ENSVMX]�SJ�XLI�TISTPI

TVIJIVVIH�GEVMIW�TVSXIGXMZI�ERHERXMWITXMG�FEWIH�XSSXLTEWXIW�SZIV�XLI

SXLIVW�

1IHME�-RJPYIRGI

%HZIVXMWIQIRX*EQMPMIW�-RJPYIRGIH

8IPIZMWMSR��2I[W�TETIV��

1EKE^MRI��'MRIQE��

7EPIW�VITVIWIRXEXMZI��)\LMFMXW���WXEPP��

6EHMS��

���78%8-78-'7�*36�)'3231-'7

6SPI�SJ�1IHME

6SPI�SJ�1IHME

8: 2I[WTETIV

1EKE 'MRIQE 7EPIW�VITVI

)\LMFMXWWXEPP

6EHMS�^MRI

�WIRXEXMZI

*MK������ �&EV�HMEKVEQ

3FWIVZEXMSR� 1ENSVMX]�SJ�TISTPI�GEQI

XS�ORS[�EFSYX�XLI�TVSHYGX�IMXLIVXLVSYKL�XIPIZMWMSR�SV�XLVSYKL

RI[WTETIV�

' 32'097-32�4 63.)'8�6 )4368

1ENSVMX]�SJ�XLI�YWIVW�FIPSRKIH�XS�YVFER

EVIE��1SWX�SJ�XLI�TISTPI�[LS�[IVI

WYVZI]IH�FIPSRKIH�XS�EKI�KVSYT���

6IGET

e8LI�SFNIGXMZI�SJ�XLI�WXYH]�WLSYPH�FI�GPIEVP]�MHIRXMJMIH�e8LI�TSTYPEXMSR�ERH�WEQTPI�LEW�XS�FI�GLSWIR�GEVIJYPP]�e8LI�SFNIGXMZI�SJ�WYVZI]�[MPP�MRHMGEXI�XLI�X]TI�SJ�HEXE�XS�FI�YWIH�

e%�UYIWXMSRREMVI�MRXIVZMI[�WGLIHYPI�MW�TVITEVIH�e'SPPIGXIH�HEXE�GER�FI�EREP]WIH�F]�YWMRK�ZEVMSYW�WXEXMWXMGEP�XSSPW�e6IWYPXW�EVI�MRXIVTVIXIH�XS�HVE[�E�QIERMRKJYP�GSRGPYWMSR�

]IEVW�XS����]IEVW�ERH�LEH�ER�EZIVEKI

�z��QIQFIVW�MR�E�JEQMP]��8LI�QSRXLP]MRGSQI�SJ�XLIWI�JEQMPMIW�VERKIH�FIX[IIR

6W��������XS�6W���������ERH�XLIMV

QEMR�SGGYTEXMSRW�[IVI�WIVZMGIW�ERH

XVEHMRK��)\TIRHMXYVI�SR�XSSXLTEWXI

EGGSYRXIH�JSV�E�QENSV�WLEVI�MR�XLIMV

JEQMP]�FYHKIX�EQSRKWX�XSMPIXVMIW�

4ITWSHIRX��'SPKEXI�ERH�'PSWI�YT�[IVI

XLI�QSWX�TVIJIVVIH�FVERHW�MR�XLI

LSYWILSPHW�WYVZI]IH��&]�GEPGYPEXMRK

XLI�QIER�MX�[EW�JSYRH�XLEX�XLI�TVMGI�SJ

ER�EZIVEKI�XSSXLTEWXI�[SYPH�FI�6W���

ETTVS\MQEXIP]�JSV�����KVEQW��4ISTPI

TVIJIVVIH�XLSWI�FVERHW�SJ�XSSXLTEWXI

[LMGL�LEW�IMXLIV�E�GEVMIW�TVSXIGXMSR�SV

ERXMWITXMG�FEWI��%�PSX�SJ�TISTPI�KIX

MRJPYIRGIH�[MXL�EHZIVXMWIQIRX�ERH�XLI

QSWX�TSTYPEV�QIHMYQ�XS�KIX�EGVSWW

XLVSYKL�TISTPI�MW�XIPIZMWMSR�

���78%8-78-'7�*36�)'3231-'7

XLEX�WEXMWJ]�XLIMV�[ERXW�ERH�XS�HMWXVMFYXI�XLIQ�JSV�GSRWYQTXMSR�EQSRK�ZEVMSYWTIVWSRW�ERH�KVSYTW�MR�WSGMIX]�

)RYQIVEXSV %�TIVWSR�[LS�GSPPIGXW�XLI�HEXE�

)\GPYWMZI�1IXLSH� %�QIXLSH�SJ�GPEWWMJ]MRK�SFWIVZEXMSRW�MR�[LMGL�ERSFWIVZEXMSR�IUYEP�XS�XLI�YTTIV�GPEWW�PMQMX�SJ�E�GPEWW�MW�RSX�TYX�MR�XLEX�GPEWWFYX�MW�TYX�MR�XLI�RI\X�GPEWW�

*VIUYIRG] 8LI�RYQFIV�SJ�XMQIW�ER�SFWIVZEXMSR�SGGYVW�MR�VE[�HEXE��-R�EJVIUYIRG]�HMWXVMFYXMSR�MX�QIERW�XLI�RYQFIV�SJ�SFWIVZEXMSRW�MR�E�GPEWW�

*VIUYIRG]�%VVE] ��%�GPEWWMJMGEXMSR�SJ�E�HMWGVIXI�ZEVMEFPI�XLEX�WLS[W�HMJJIVIRXZEPYIW�SJ�XLI�ZEVMEFPI�EPSRK�[MXL�XLIMV�GSVVIWTSRHMRK�JVIUYIRGMIW�

*VIUYIRG]�'YVZI �8LI�KVETL�SJ�E�JVIUYIRG]�HMWXVMFYXMSR�MR�[LMGL�GPEWWJVIUYIRGMIW�SR�=�E\MW�EVI�TPSXXIH�EKEMRWX�XLI�ZEPYIW�SJ�GPEWW�QEVOW�SR�<�E\MW�

*VIUYIRG]�(MWXVMFYXMSR� %�GPEWWMJMGEXMSR�SJ�E�UYERXMXEXMZI�ZEVMEFPI�XLEX�WLS[WLS[�HMJJIVIRX�ZEPYIW�SJ�XLI�ZEVMEFPI�EVI�HMWXVMFYXIH�MR�HMJJIVIRX�GPEWWIW�EPSRK

[MXL�XLIMV�GSVVIWTSRHMRK�GPEWW�JVIUYIRGMIW�

-RGPYWMZI�1IXLSH� %�QIXLSH�SJ�GPEWWMJ]MRK�SFWIVZEXMSRW�MR�[LMGL�ER�SFWIVZEXMSR

IUYEP�XS�XLI�YTTIV�GPEWW�PMQMX�SJ�E�GPEWW�MW�TYX�MR�XLEX�GPEWW�

-RJSVQERX -RHMZMHYEP�YRMX JVSQ�[LSQ�XLI�HIWMVIH�MRJSVQEXMSR�MW�SFXEMRIH�

1YPXM�1SHEP�(MWXVMFYXMSR �8LI�HMWXVMFYXMSR�XLEX�LEW�QSVI�XLER�X[S�QSHIW�

2EXMSREP�-RGSQI �8SXEP�MRGSQI�EVMWMRK�JVSQ�[LEX�LEW�FIIR�TVSHYGIH�MR�XLIGSYRXV]��-X�MW�EPWS�GEPPIH�+VSWW�2EXMSREP�4VSHYGX�

2SR�7EQTPMRK�)VVSV -X�EVMWIW�MR�HEXE�GSPPIGXMSR�HYI�XS��M �IVVSVW�MRQIEWYVIQIRX���MM �VIGSVHMRK�QMWXEOIW���MMM �RSR�VIWTSRWI�

3FWIVZEXMSR �%�YRMX�SJ�VE[�HEXE�

4IVGIRXMPIW �%�ZEPYI�[LMGL�HMZMHIW�XLI�HEXE�MRXS�LYRHVIH�IUYEP�TEVXW�WS�XLIVIEVI����TIVGIRXMPIW�MR�XLI�HEXE�

4SPMG] �8LI�QIEWYVI�XS�WSPZI�ER�IGSRSQMG�TVSFPIQ�

4STYPEXMSR �4STYPEXMSR�QIERW� EPP XLI�MRHMZMHYEPW�YRMXW�JSV�[LSQ�XLIMRJSVQEXMSR�LEW�XS�FI�WSYKLX�

4VSHYGIV �3RI�[LS�QERYJEGXYVIW�KSSHW�JSV�XLI�QEVOIX�

4VSHYGXMSR �8LI�EGX�SJ�QERYJEGXYVMRK�KSSHW�

5YEPMXEXMZI�'PEWWMJMGEXMSR� 'PEWWMJMGEXMSR�FEWIH�SR�UYEPMX]��*SV�I\EQTPI

GPEWWMJMGEXMSR�SJ�TISTPI�EGGSVHMRK�XS�KIRHIV��QEVMXEP�WXEXYW�IXG�

5YEPMXEXMZI�*EGXW �)GSRSQMG�MRJSVQEXMSR�SV�HEXE�I\TVIWWIH�MR�XIVQW�SJUYEPMXMIW�

5YERXMXEXMZI�*EGXW� )GSRSQMG�MRJSVQEXMSR�SV�HEXE�I\TVIWWIH�MR�RYQFIVW�

%44)2(-<�%

5YIWXMSRREMVI %�PMWX�SJ�UYIWXMSRW�TVITEVIH�F]�ER�MRZIWXMKEXSV�SR�XLI�WYFNIGXSJ�IRUYMV]��8LI�VIWTSRHIRX�MW�VIUYMVIH�XS�ERW[IV�XLI�UYIWXMSRW�

6ERHSQ�7EQTPMRK -X�MW�E�QIXLSH�SJ�WEQTPMRK�MR�[LMGL�XLI�VITVIWIRXEXMZIWIX�SJ�MRJSVQERXW�MW�WIPIGXIH�MR�E�[E]�XLEX�IZIV]�MRHMZMHYEP�MW�KMZIR�IUYEPGLERGI�SJ�FIMRK�WIPIGXIH�EW�ER�MRJSVQERX�

6ERKI �(MJJIVIRGI�FIX[IIR�XLI�QE\MQYQ�ERH�XLI�QMRMQYQ�ZEPYIW�SJ�EZEVMEFPI�

6IPEXMZI�*VIUYIRG] �*VIUYIRG]�SJ�E�GPEWW�EW�TVSTSVXMSR�SV�TIVGIRXEKI�SJ�XSXEPJVIUYIRG]

7EQTPI�7YVZI]�1IXLSH %�QIXLSH��[LMGL�VIUYMVIW�XLEX�SFWIVZEXMSRW�EVISFXEMRIH�SR�E�VITVIWIRXEXMZI�WIX�SJ�MRHMZMHYEPW��XLI�WEQTPI ��WIPIGXIH�JVSQ

XLI�TSTYPEXMSR�

7EQTPMRK�)VVSV -X�MW�XLI�RYQIVMGEP�HMJJIVIRGI�FIX[IIR�XLI�IWXMQEXI�ERH�XLIXVYI�ZEPYI�SJ�XLI�TEVEQIXIV�

7GEVGMX] �-X�QIERW�XLI�PEGO�SJ�EZEMPEFMPMX]�

7IEWSREPMX] 4IVMSHMGMX]�MR�HEXE�ZEVMEXMSR�[MXL�XMQI�TIVMSH�PIWW�XLER�SRI�]IEV�

7IPPIV �3RI�[LS�WIPPW�KSSHW�JSV�TVSJMX�

7IVZMGI�LSPHIV 3RI�[LS�KIXW�TEMH�JSV�E�NSF�SV�JSV�[SVOMRK�JSV�ERSXLIV�TIVWSR�

7IVZMGI�4VSZMHIV �3RI�[LS�TVSZMHIW�E�WIVZMGI�XS�SXLIVW�JSV�E�TE]QIRX�

7TEXMEP�'PEWWMJMGEXMSR� 'PEWWMJMGEXMSR�FEWIH�SR�KISKVETLMGEP�PSGEXMSR�

7XEXMWXMGW 8LI�QIXLSH�SJ�GSPPIGXMRK��SVKERMWMRK��TVIWIRXMRK�ERH�EREP]WMRKHEXE�XS�HVE[�QIERMRKJYP�GSRGPYWMSR��*YVXLIV��MX�EPWS�QIERW�HEXE�

7XVYGXYVIH�5YIWXMSRREMVI 7XVYGXYVIH�5YIWXMSRREMVI�GSRWMWXW�SJ�wGPSWIH�

IRHIHx�UYIWXMSRW��JSV�[LMGL�EPXIVREXMZI�TSWWMFPI�ERW[IVW�XS�GLSSWI�JVSQ�EVITVSZMHIH�

8EPP]�1EVOMRK� 8LI�GSYRXMRK�SJ�SFWIVZEXMSRW�MR�E�GPEWW�YWMRK�XEPP]��� �QEVOW�8EPPMIW�EVI�KVSYTIH�MR�JMZIW�

8MQI�7IVMIW �(EXE�EVVERKIH�MR�GLVSRSPSKMGEP�SVHIV�SV�X[S�ZEVMEFPI�HEXE�[LIVISRI�SJ�XLI�ZEVMEFPIW�MW�XMQI�

9RMZEVMEXI�(MWXVMFYXMSR �8LI�JVIUYIRG]�HMWXVMFYXMSR�SJ�SRI�ZEVMEFPI�

:EVMEFPI %�ZEVMEFPI�MW�E�UYERXMX]�YWIH�XS�QIEWYVI�ER�wEXXVMFYXIx��WYGL�EWLIMKLX��[IMKLX��RYQFIV�IXG� �SJ�WSQI�XLMRK�SV�WSQI�TIVWSRW��[LMGL�GER�XEOIHMJJIVIRX�ZEPYIW�MR�HMJJIVIRX�WMXYEXMSRW�

;IMKLXIH�%ZIVEKI ���8LI�EZIVEKI�MW�GEPGYPEXIH�F]�TVSZMHMRK�XLI�HMJJIVIRX�HEXETSMRXW�[MXL�HMJJIVIRX�[IMKLXW�

%44)2(-<�&���'SRX�

;,%8�8,)= �7%=

7XEXMWXMGW�EVI�RS�WYFWXMXYXI�JSV�NYHKIQIRX�

,IRV]�'PE]

-�EFLSV�EZIVEKIW��-�PMOI�XLI�MRHMZMHYEP�GEWI��%�QER�QE]�LEZI�WM\

QIEPW�SRI�HE]�ERH�RSRI�XLI�RI\X��QEOMRK�ER�EZIVEKI�SJ�XLVII�QIEPW

TIV�HE]��FYX�XLEX�MW�RSX�E�KSSH�[E]�XS�PMZI�

0SYMW�(��&VERHMIW

8LI�[IEXLIV�QER�MW�RIZIV�[VSRK��7YTTSWI�LI�WE]W�XLEX�XLIVIvW�ER

���GLERGI�SJ�VEMR��-J�MX�VEMRW��XLI����GLERGI�GEQI�YT��MJ�MX�HSIWRvX�

XLI����GLERGI�GSQI�YT�

7EYP�&EVVSR

8LI�HIEXL�SJ�SRI�QER�MW�E�XVEKIH]��8LI�HIEXL�SJ�QMPPMSRW�MW�E�WXEXMWXMG�

.SWITL�7XEPMR

;LIR�WLI�XSPH�QI�-�[EW�EZIVEKI��WLI�[EW�NYWX�FIMRK�QIER�

1MOI�&IGOQER

;L]�MW�E�TL]WMGMER�LIPH�MR�QYGL�LMKLIV�IWXIIQ�XLER�E�WXEXMWXMGMER#

%�TL]WMGMER�QEOIW�ER�EREP]WMW�SJ�E�GSQTPI\�MPPRIWW�[LIVIEW�E

WXEXMWXMGMER�QEOIW�]SY�MPP�[MXL�E�GSQTPI\�EREP]WMW�

+EV]�'��6EQWI]IV