Introduction to Statistics.docx


  • 8/11/2019 Introduction to Statistics.docx


    Introduction to Statistics

    The term statistics refers both to numerical statements of facts (statistical data) and to statistical methodology.

    When it is used in the sense of statistical data, it refers to the quantitative aspects of things and is a numerical description.

    Examples: the income of a family, the production of the automobile industry, the sales of cars, etc. These quantities are numerical. But there are some quantities which are not in themselves numerical but can be made so by counting. The sex of a baby is not a number, but by counting the number of boys we can associate a numerical description with the sex of all newborn babies, for example, when saying that 60% of all live-born babies are boys. This information then comes within the realm of statistics.

    Definition

    The word statistics can be used in two senses, viz. singular and plural. In the narrow and plural sense, statistics denotes numerical data (statistical data). In the wide and singular sense, statistics refers to statistical methods. The definitions have therefore been grouped under two heads: statistics as data and statistics as a method.

    Statistics as Data

    Some definitions of statistics as data are:

    a) Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.

    - A.L. Bowley

    b) By statistics we mean quantitative data affected to a marked extent by a multiplicity of causes.

    - Yule and Kendall

    c) By statistics we mean aggregates of facts affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other.

    - H. Secrist

    This definition is more comprehensive and exhaustive. It throws more light on the characteristics of statistics and covers their different aspects.

    The characteristics that statistics should possess, according to H. Secrist, can be listed as follows:


    1. Statistics are aggregates of facts.

    2. Statistics are affected to a marked extent by a multiplicity of causes.

    3. Statistics are numerically expressed.

    4. Statistics should be enumerated or estimated.

    5. Statistics should be collected with a reasonable standard of accuracy.

    6. Statistics should be placed in relation to each other.

    Statistics as a Method

    Definitions

    a) Statistics may be called the science of counting.

    - A.L. Bowley

    b) Statistics is the science of estimates and probabilities.

    - Boddington

    c) Croxton and Cowden have given a clear and concise definition:

    Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data.

    The definition of Croxton and Cowden names four stages. With the organisation of data listed separately between collection and presentation, a statistical investigation passes through the following stages.

    a) Collection of Data

    The structure of a statistical investigation rests on the systematic collection of data. The data are classified into two groups:

    i) Internal data and

    ii) External data

    Internal data are obtained from the internal records of a business organisation relating to its operations, such as production, sources of income and expenditure, inventory, purchases and accounts.

    External data are collected and published by external agencies. External data may be either primary or secondary. Primary data are collected for the first time and are original, while secondary data have already been collected and published by some agency.

    b) Organisation of Data

    The collected data form a large mass of figures that needs to be organised. The collected data must first be edited to correct omissions, irrelevant answers and wrong computations. The edited data must then be classified and tabulated to suit further analysis.

    c) Presentation of Data


    A large body of collected data cannot be understood and analysed easily and quickly. The data therefore need to be presented in tabular or graphic form. Such systematic and graphical presentation helps in further analysis.

    d) Analysis of Data

    Analysis requires establishing the relationship between two or more variables. Analysis of data includes condensation, abstraction, summarisation, drawing conclusions, etc. It is carried out with the help of statistical tools and techniques such as measures of central tendency, measures of dispersion, correlation and analysis of variance.

    e) Interpretation of Data

    Interpretation requires deep insight into the subject. It involves drawing valid conclusions on the basis of the analysis of data. This work requires good experience and skill. The process is very important because the conclusions of the study are based on the interpretation.

    Seligman defines statistics as follows:

    Statistics is the science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw light on any sphere of enquiry.

    Importance of Statistics

    In today's context, statistics is indispensable. As the use of statistics has extended to various fields of experimentation to draw valid conclusions, its importance and usage have increased. Research investigations in economics and commerce are largely statistical. The importance of statistics in various fields is listed below.

    a) State Affairs: In state affairs, statistics is useful in the following ways:

    1. To collect information and study the economic condition of the people in the state.

    2. To assess the resources available in the state.

    3. To help the state decide, on the basis of statistics, whether to accept or reject a policy.

    4. To provide information and analysis on various factors of the state, such as wealth, crime, agriculture, exports and education.

    b) Economics: In economics, statistics is useful in the following ways:

    1. Helps in formulation of economic laws and policies

    2. Helps in studying economic problems

    3. Helps in compiling the national income accounts

    4. Helps in economic planning

    c) Business: In business, statistics is useful in the following ways:

    1. Helps to take decisions on the location and size of a business

    2. Helps to study demand and supply

    3. Helps in forecasting and planning


    4. Helps in controlling the quality of the product or process

    5. Helps in making marketing decisions

    6. Helps in production planning and inventory management

    7. Helps in business risk analysis

    8. Helps in estimating long-term resource requirements and consumer preferences, and in business research

    d) Education: Statistics is necessary to formulate policies regarding the start of new courses and to consider the facilities available for proposed courses.

    e) Accounts and Audits:

    1. Helps to study the correlation between profits and dividends, which helps in knowing the trend of future profits.

    2. In auditing, sampling techniques are used.

    Functions of Statistics

    Some important functions of statistics are as follows:

    1. To collect and present facts in a systematic manner.

    2. Helps in formulating and testing hypotheses.

    3. Helps in facilitating the comparison of data.

    4. Helps in predicting future trends.

    5. Helps to find the relationship between variables.

    6. Simplifies the mass of complex data.

    7. Helps to formulate policies.

    8. Helps Government to take decisions.

    Limitations of Statistics

    1. Statistics does not study qualitative phenomena.

    2. Statistics does not deal with individual items.

    3. Statistical results are true only on an average.

    4. Statistical data should be uniform and homogeneous.

    5. Statistical results depend on the accuracy of the data.

    6. Statistical conclusions are not universally true.

    7. Statistical results can be interpreted only by a person who has sound knowledge of statistics.


    Distrust of Statistics

    Distrust of statistics arises from a lack of knowledge of statistics and of the limitations of its use, not from any defect in statistical science itself.

    Distrust of statistics arises for the following reasons:

    a) Figures are manipulated or incomplete.

    b) Figures are quoted without their context.

    c) Inconsistent definitions.

    d) Selection of non-representative statistical units.

    e) Inappropriate comparison

    f) Wrong inference drawn.

    g) Errors in data collection.

    Statistical Data

    A statistical investigation is a long and comprehensive process and requires the systematic collection of a large amount of data. The validity and accuracy of the conclusions or results of the study depend upon how well the data were gathered. Since the quality of the data greatly influences the conclusions of the study, due importance must be given to the data collection process.

    Statistical data may be classified as primary data and secondary data, based on the sources of data collection.

    Primary data

    Primary data are those which are collected for the first time by the investigator or researcher and are thus original in character. They are collected for the specific purpose or study at hand. Primary data are usually in the shape of raw materials to which statistical methods are applied for the purpose of analysis and interpretation.

    Secondary data

    Secondary data have already been collected for a purpose other than the problem at hand. They are data which have already been collected by some other person and which have passed through statistical analysis at least once. Secondary data are usually in the shape of finished products, since they have already been treated statistically in one form or another. After statistical treatment, primary data lose their original shape and become secondary data. The primary data of the organisation that first collects and publishes them become secondary data for any other organisation that uses them.

    Primary Vs Secondary Data


    Primary data are originated by the researcher for the specific purpose or study at hand, while secondary data have already been collected for a purpose other than the research work at hand.

    Primary data collection requires considerably more time and is relatively expensive, while secondary data are easily accessible, inexpensive and quickly obtained.

    Table: A comparison of primary and secondary data

    Basis                Primary data                      Secondary data
    Collection purpose   For the problem at hand           For other problems
    Collection process   Very involved                     Rapid and easy
    Collection cost      High                              Relatively low
    Collection time      Long                              Short
    Suitability          Suits the object of the survey    May or may not suit the object of the survey
    Originality          Original                          Not original
    Precautions          No extra precautions required     Should be used with extra care

    Limitations of secondary data

    a) Since secondary data are collected for some other purpose, their usefulness for the current problem may be limited in several important ways, including relevance and accuracy.

    b) The objectives, nature and methods used to collect secondary data may not be appropriate to the present situation.

    c) Secondary data may not be accurate, or they may not be completely current or dependable.

    Criteria for evaluating secondary data

    Before using secondary data, it is important to evaluate them on the following factors:

    a) Specifications and methodology used to collect the data

    b) Errors and accuracy of the data

    c) Currency (how up to date the data are)

    d) Objective (the purpose for which the data were collected)

    e) Nature and content of the data

    f) Dependability

    Sources of data


    Primary sources: methods of collecting primary data

    When data are neither available internally nor exist as a secondary source, primary sources of data would be appropriate.

    The various methods of collecting primary data are as follows:

    a) Direct personal investigation
       - Interview
       - Observation

    b) Indirect oral investigation

    c) Information from local agents and correspondents

    d) Mailed questionnaires and schedules

    e) Through enumerators

    Secondary sources: methods of collecting secondary data

    i) Published statistics

    a) Official publications of the Central Government

    Ex:
    - Central Statistical Organisation (CSO), Ministry of Planning
    - National Sample Survey Organisation (NSSO)
    - Office of the Registrar General and Census Commissioner, GOI
    - Directorate of Economics and Statistics, Ministry of Agriculture
    - Labour Bureau, Ministry of Labour, etc.

    ii) Publications of semi-government organisations

    Ex:
    - Indian Institute of Foreign Trade, New Delhi
    - Institute of Economic Growth, New Delhi

    iii) Publications of research institutes

    Ex:
    - Indian Statistical Institute
    - Indian Agricultural Statistics Research Institute
    - NCERT publications
    - Indian Standards Institute, etc.

    iv) Publications of business and financial institutions

    Ex :


    - Trade association publications, such as those of sugar factories, textile mills and the Indian Chamber of Industry and Commerce

    - Stock exchange reports, co-operative society reports, etc.

    v) Newspapers and periodicals

    Ex: The Financial Express, Eastern Economist, The Economic Times, Indian Finance, etc.

    vi) Reports of various committees and commissions

    Ex :

    - Kothari commission report on education

    - Pay commission reports

    - Land reform committee reports, etc.

    vii) Unpublished statistics

    - Internal and administrative data, such as periodic profit and loss statements, sales, production rates, balance sheets, labour turnover, budgets, etc.

    Classification and Tabulation

    The data collected for a statistical inquiry sometimes consist of a few fairly simple figures which can be easily understood without any special treatment. More often, however, there is an overwhelming mass of raw data without any structure. Such an unwieldy, unorganised and shapeless mass of data cannot be rapidly or easily grasped or interpreted, and it is not fit for further analysis. To make the data simple and easily understandable, the first task is to condense and simplify them in such a way that irrelevant data are removed and their significant features stand out prominently. The procedure adopted for this purpose is known as classification and tabulation. Classification helps proper tabulation.

    Classified and arranged facts speak for themselves; unarranged and unorganised, they are as dead as mutton.

    - Prof. J.R. Hicks

    Meaning of Classification

    Classification is a process of arranging things or data in groups or classes according to their resemblances and affinities, and it gives expression to the unity of attributes that may subsist among a diversity of individuals.

    Definition of Classification

    Classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts.


    - Secrist

    The process of grouping a large number of individual facts and observations on the basis of similarity among the items is called classification.

    - Stockton & Clark

    Characteristics of classification

    a) Classification performs homogeneous grouping of data.

    b) It brings out points of similarity and dissimilarity.

    c) The classification may be either real or imaginary.

    d) Classification is flexible enough to accommodate adjustments.

    Objectives / purposes of classification

    i) To simplify and condense large data

    ii) To present the facts in an easily understandable form

    iii) To allow comparisons

    iv) To help to draw valid inferences

    v) To relate the variables among the data

    vi) To help further analysis

    vii) To eliminate unwanted data

    viii) To prepare tabulation

    Guiding principles (rules) of classification

    The following are the general guiding principles for a good classification:

    a) Exhaustive: The classification should be exhaustive. Each and every item in the data must belong to one of the classes. Introducing a residual class (e.g. "others" or "miscellaneous") should be avoided.

    b) Mutually exclusive: Each item should be placed in only one class.

    c) Suitability: The classification should conform to the object of the inquiry.

    d) Stability: Only one principle must be maintained throughout the classification and analysis.

    e) Homogeneity: The items included in each class must be homogeneous.

    f) Flexibility: A good classification should be flexible enough to accommodate new or changed situations.

    Modes / Types of Classification


    Modes / types of classification refer to the class categories into which the data could be sorted and tabulated. These categories depend on the nature of the data and the purpose for which the data are being sought.

    Important types of classification

    a) Geographical (i.e. on the basis of area or region)

    b) Chronological (On the basis of Temporal / Historical, i.e. with respect to time)

    c) Qualitative (on the basis of character / attributes)

    d) Numerical, quantitative (on the basis of magnitude)

    a) Geographical Classification

    In geographical classification, the classification is based on the geographical regions.

    Ex: Sales of the company, region-wise (in million rupees)

    Region    Sales

    North 285

    South 300

    East 185

    West 235

    b) Chronological Classification

    If statistical data are classified according to the time of their occurrence, the classification is called chronological classification.

    Sales reported by a departmental store

    Month       Sales (Rs. in lakhs)

    January 22

    February 26

    March 32

    April 25

    May 27

    June 29


    July 30

    August 30

    c) Qualitative Classification

    In qualitative classification, the data are classified according to the presence or absence of attributes in the given units. Thus, the classification is based on some quality characteristics or attributes.

    Ex: Sex, literacy, education, class grade, etc.

    Further, qualitative classification may be of two kinds: i) simple classification and ii) manifold classification.

    i) Simple classification: If the classification is done into only two classes, it is known as simple classification.

    Ex: a) Population into Male / Female

    b) Population into Educated / Uneducated

    ii) Manifold classification: In this classification, the classification is based on more than one attribute at a time.

    Ex:

    Population
      Smokers
        Illiterate: Male, Female
        Literate: Male, Female
      Non-smokers
        Literate: Male, Female
        Illiterate: Male, Female

    d) Quantitative Classification: In quantitative classification, the classification is based on quantitative measurements of some characteristic, such as age, marks, income, production or sales. The quantitative phenomenon under study is known as a variable, and hence this classification is also called classification by variable.

    Ex:

    For a 50-mark test, the marks obtained by students are classified as follows:

    Marks    No. of students
    0-10     5
    10-20    7
    20-30    10
    30-40    25
    40-50    3
    Total    50

    In this classification, the marks obtained by students form the variable, and the number of students in each class represents the frequency.

    Meaning and Definition of Tabulation

    Tabulation may be defined as the systematic arrangement of data in columns and rows. It is designed to simplify the presentation of data for the purpose of analysis and statistical inference.

    Major Objectives of Tabulation

    1. To simplify complex data

    2. To facilitate comparison

    3. To economise space

    4. To draw valid inferences / conclusions

    5. To help in further analysis

    Differences between Classification and Tabulation

    1. Data are first classified and then presented in tables; classification is the basis for tabulation.

    2. Tabulation is a mechanical function following classification, because in tabulation the classified data are placed in rows and columns.

    3. Classification is a process of statistical analysis, while tabulation is a process of presenting data in a suitable structure.

    Classification of Tables

    Tables are classified on the basis of:

    1. Coverage (simple and complex tables)

    2. Objective / purpose (general purpose or reference tables, and special purpose or summary tables)

    3. Nature of inquiry (original or primary tables, and derived tables)

    Ex:

    a) Simple table: Data are classified based on only one characteristic.

    Distribution of marks

    Class (marks)   No. of students
    30-40           20
    40-50           20
    50-60           10
    Total           50

    b) Two-way table: Classification is based on two characteristics.

    Class (marks)   No. of students
                    Boys   Girls   Total
    30-40           10     10      20
    40-50           15     5       20
    50-60           3      7       10
    Total           28     22      50
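    The totals in a two-way table can be checked by adding the cell counts along rows and down columns; both ways must give the same grand total. The Python sketch below does this for the table above. The dictionary layout and variable names are illustrative assumptions, not part of the original notes.

```python
# Cell counts from the two-way table: boys and girls in each marks class
two_way = {
    "30-40": {"Boys": 10, "Girls": 10},
    "40-50": {"Boys": 15, "Girls": 5},
    "50-60": {"Boys": 3, "Girls": 7},
}

# Row totals: all students in each marks class
row_totals = {cls: sum(cells.values()) for cls, cells in two_way.items()}

# Column totals: all boys and all girls across the classes
col_totals = {
    sex: sum(cells[sex] for cells in two_way.values())
    for sex in ("Boys", "Girls")
}

# The grand total must agree whether we add the rows or the columns
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())

print(row_totals)   # {'30-40': 20, '40-50': 20, '50-60': 10}
print(col_totals)   # {'Boys': 28, 'Girls': 22}
print(grand_total)  # 50
```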

    Frequency Distribution

    A frequency distribution is a table used to organise data. The left column (called classes or groups) contains numerical intervals of the variable under study. The right column contains the frequencies, i.e. the number of occurrences in each class or group. Intervals are normally of equal size and cover the range of the sample observations.

    It is simply a table in which the gathered data are grouped into classes and the number of occurrences falling in each class is recorded.

    Definition

    A frequency distribution is a statistical table which shows the set of all distinct values of the variable arranged in order of magnitude, either individually or in groups, with their corresponding frequencies.

    - Croxton and Cowden

    A frequency distribution can be classified as:

    a) Series of individual observations

    b) Discrete frequency distribution

    c) Continuous frequency distribution

    a) Series of Individual Observations

    A series of individual observations is a series in which the items are listed one after another, each as a separate observation. For statistical calculations, these observations can be arranged in either ascending or descending order. This is called an array.

    Ex:

    Roll No.   Marks obtained in statistics paper
    1          83
    2          80
    3          75
    4          92
    5          65

    The above list is raw data. Presented in this form, the data do not reveal much information. If the data are arranged in ascending or descending order of magnitude, which gives a better presentation, this is called arraying of data.
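    Arraying amounts to sorting the observations by magnitude. A minimal Python sketch using the marks above; the variable names are illustrative only.

```python
# Marks in the statistics paper, keyed by roll number (raw data from the example)
marks_by_roll = {1: 83, 2: 80, 3: 75, 4: 92, 5: 65}

# Arraying: the same observations arranged in order of magnitude
ascending = sorted(marks_by_roll.values())
descending = sorted(marks_by_roll.values(), reverse=True)

print(ascending)   # [65, 75, 80, 83, 92]
print(descending)  # [92, 83, 80, 75, 65]
```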

    Discrete (ungrouped) Frequency Distribution

    If the data are presented in such a way that the exact measurement of each unit is shown, the series is called a discrete frequency distribution. A discrete variable is one whose values (variates) differ from each other by definite amounts.

    Ex:

    Assume that a survey of 10 families chosen at random has been made to find the number of post-graduates in each family. The resulting raw data could be as follows:

    0, 1, 3, 1, 0, 2, 2, 2, 2, 4


    These data can be classified into an ungrouped frequency distribution. The number of post-graduates becomes the variable (x), for which we can list the frequency of occurrence (f) in tabular form as follows:

    Number of post-graduates (x)   Frequency (f)
    0                              2
    1                              2
    2                              4
    3                              1
    4                              1

    The above example shows a discrete frequency distribution, where the variable takes discrete numerical values.
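    Building a discrete frequency distribution is a matter of counting how often each distinct value occurs. A minimal Python sketch using `collections.Counter` on the survey data above; the variable names are illustrative only.

```python
from collections import Counter

# Number of post-graduates in each of the 10 families (raw data above)
raw = [0, 1, 3, 1, 0, 2, 2, 2, 2, 4]

# Counter maps each distinct value x to its frequency f
freq = Counter(raw)

print("x  f")
for x in sorted(freq):
    print(f"{x}  {freq[x]}")

# The frequencies must add up to the number of families surveyed
assert sum(freq.values()) == len(raw)
```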

    Continuous frequency distribution (grouped frequency distribution)

    A continuous data series is one where the measurements are only approximations and are expressed in class intervals within certain limits. In a continuous frequency distribution, the class intervals theoretically run continuously from the start of the distribution to the end without a break. According to Boddington, a variable which can take every intermediate value between the smallest and largest values in the distribution is a continuous variable.

    Ex:

    The marks obtained by 20 students in an exam of 50 marks are given below. Convert the data into a continuous frequency distribution.

    18 23 28 29 44 28 48 33 32 43

    24 29 32 39 49 42 27 33 28 29

    By grouping the marks into class intervals of 5, the following frequency distribution table can be formed:

    Marks    No. of students
    0-5      0
    5-10     0
    10-15    0
    15-20    1
    20-25    2
    25-30    7
    30-35    4
    35-40    1
    40-45    3
    45-50    2
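    The grouping above can be sketched in Python. This assumes the exclusive method (each class includes its lower limit and excludes its upper limit), so a mark of exactly 50 would fall outside the last class; none of the 20 marks is 50. The function name `continuous_frequency` is hypothetical.

```python
# Marks of the 20 students (out of 50) from the example above
marks = [18, 23, 28, 29, 44, 28, 48, 33, 32, 43,
         24, 29, 32, 39, 49, 42, 27, 33, 28, 29]

def continuous_frequency(data, start, stop, width):
    """Group data into classes [lower, lower + width).

    Each class includes its lower limit and excludes its upper limit
    (the exclusive method), so no observation is counted twice.
    """
    table = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, count))
    return table

table = continuous_frequency(marks, start=0, stop=50, width=5)
for lower, upper, count in table:
    print(f"{lower}-{upper}  {count}")
```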

    Technical terms used in forming a frequency distribution

    a) Class limits:

    The class limits are the smallest and largest values in the class.

    Ex:

    In the class 0-10, the lowest value is 0 and the highest value is 10. These two boundaries of the class are called the lower and upper limits of the class. Class limits are also called class boundaries.

    b) Class intervals

    The difference between upper and lower limit of class is known as class interval.

    Ex :

    In the class 0-10, the class interval is (10 - 0) = 10.

    The class interval can be found using the formula

    $i = \dfrac{L - S}{K}$

    where L = largest value, S = smallest value and K = number of classes.

    Ex:

    If the marks of 60 students in a class vary between 40 and 100 and we want to form 6 classes, then with L = 100, S = 40 and K = 6 the class interval would be

    $i = \dfrac{L - S}{K} = \dfrac{100 - 40}{6} = \dfrac{60}{6} = 10$

    Therefore, the class intervals would be 40-50, 50-60, 60-70, 70-80, 80-90 and 90-100.
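    The same calculation, together with the resulting class limits, can be sketched in Python. The function names `class_interval` and `class_limits` are hypothetical helpers for illustration.

```python
def class_interval(largest, smallest, k):
    """Width i = (L - S) / K for K classes of equal width."""
    return (largest - smallest) / k

def class_limits(smallest, width, k):
    """Lower and upper limits of K consecutive classes starting at S."""
    return [(smallest + j * width, smallest + (j + 1) * width) for j in range(k)]

i = class_interval(largest=100, smallest=40, k=6)
print(i)  # 10.0
print(class_limits(40, i, 6))
```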


    Methods of forming class intervals

    a) Exclusive method (overlapping)

    In this method, the upper limit of one class interval is the lower limit of the next class. This method maintains the continuity of the data.

    Ex:

    Marks    No. of students
    20-30    5
    30-40    15
    40-50    25

    A student whose mark is anywhere from 20 up to 29.9 is included in the 20-30 class. A better way of expressing this is:

    Marks                  No. of students
    20 to less than 30     5
    30 to less than 40     15
    40 to less than 50     25
    Total students         45

    (20 to less than 30 means 20 or more but less than 30.)

    b) Inclusive method (non-overlapping)

    Ex:

    Marks    No. of students
    20-29    5
    30-39    15
    40-49    25

    A student whose mark is 29 is included in the 20-29 class interval, and a student whose mark is 39 is included in the 30-39 class interval.
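    The two methods differ only in whether the upper limit belongs to the class. The Python sketch below, using the class limits from the two examples, looks up the class for a given mark under each method. The helper names are hypothetical. It also shows why the inclusive method suits whole-number data: a fractional mark such as 29.5 falls in the gap between 20-29 and 30-39.

```python
# Class limits from the two examples above
EXCLUSIVE = [(20, 30), (30, 40), (40, 50)]  # 20-30 means 20 <= mark < 30
INCLUSIVE = [(20, 29), (30, 39), (40, 49)]  # 20-29 means 20 <= mark <= 29

def exclusive_class(mark):
    """Return the class whose lower limit is included and upper limit excluded."""
    for lower, upper in EXCLUSIVE:
        if lower <= mark < upper:
            return (lower, upper)
    return None  # mark lies outside all classes

def inclusive_class(mark):
    """Return the class in which both limits are included."""
    for lower, upper in INCLUSIVE:
        if lower <= mark <= upper:
            return (lower, upper)
    return None  # mark lies outside all classes, or in a gap between them

print(exclusive_class(29), exclusive_class(30))  # (20, 30) (30, 40)
print(inclusive_class(29), inclusive_class(39))  # (20, 29) (30, 39)
print(inclusive_class(29.5))                     # None
```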

    Class Frequency


    The number of observations falling within class-interval is called its class frequency.

    Ex: If the class frequency of 90-100 is 5, then 5 students scored between 90 and 100. If we add the frequencies of all the individual classes, the total frequency represents the total number of items studied.

    Magnitude of class interval

    The magnitude of the class interval depends on the range and the number of classes. The range is the difference between the highest and smallest values in the data series. A class interval is generally taken in multiples of 5, such as 5, 10, 15 or 20.

    Sturges' formula for the number of classes is given below:

    K = 1 + 3.322 log N

    where K = number of classes and log N = logarithm (base 10) of the total number of observations.

    Ex: If the total number of observations is 100, the number of classes could be

    K = 1 + 3.322 log 100

    K = 1 + 3.322 x 2

    K = 1 + 6.644

    K = 7.644 ≈ 8 (rounded off)

    NOTE: When applying this formula, the number of classes should be kept not less than 4 and not greater than 20.
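    Sturges' formula is easy to evaluate directly. A minimal Python sketch; rounding to the nearest whole number follows the worked example, and the function name `sturges_classes` is hypothetical.

```python
import math

def sturges_classes(n):
    """Sturges' formula K = 1 + 3.322 log10 N; returns the exact and rounded K."""
    k = 1 + 3.322 * math.log10(n)
    return k, round(k)

k_exact, k = sturges_classes(100)
print(f"K = {k_exact:.3f}, rounded to {k}")  # K = 7.644, rounded to 8
```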

    Class mid point or class marks

    The mid value or central value of the class interval is called mid point.

    $\text{Mid-point of a class} = \dfrac{\text{lower limit of class} + \text{upper limit of class}}{2}$
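    Applying the mid-point formula to the six classes 40-50, ..., 90-100 from the class interval example gives the class marks below. A minimal Python sketch; the helper name `mid_point` is hypothetical.

```python
def mid_point(lower, upper):
    """Class mark = (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

classes = [(40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
marks = [mid_point(lo, hi) for lo, hi in classes]
print(marks)  # [45.0, 55.0, 65.0, 75.0, 85.0, 95.0]
```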

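    Sturges' formula for the size of the class interval:

    $h = \dfrac{\text{Range}}{1 + 3.322 \log N}$

    Ex: In a group of 50 workers, the highest wage is Rs. 250 and the lowest wage is Rs. 100 per day. Find the size of the class interval.

    $h = \dfrac{\text{Range}}{1 + 3.322 \log N} = \dfrac{250 - 100}{1 + 3.322 \log 50} = \dfrac{150}{1 + 3.322 \times 1.699} = \dfrac{150}{6.644} \approx 22.58 \approx 23$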
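    The same computation in Python, as a check on the arithmetic. A minimal sketch; the function name `sturges_width` is hypothetical, and `math.log10` supplies the base-10 logarithm the formula assumes.

```python
import math

def sturges_width(largest, smallest, n):
    """Size of class interval h = Range / (1 + 3.322 log10 N)."""
    return (largest - smallest) / (1 + 3.322 * math.log10(n))

h = sturges_width(largest=250, smallest=100, n=50)
print(round(h, 2))  # 22.58
```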

    Constructing a frequency distribution


    The following guidelines may be considered for the construction of a frequency distribution:

    a) The classes should be clearly defined, and each observation must belong to one and only one class interval. Classes must be exhaustive and non-overlapping.

    b) The number of classes should be neither too large nor too small.

    Too few classes result in wide intervals and a loss of accuracy; too many class intervals result in complexity.

    c) All intervals should be of the same width. This is preferred for easy computation.

    $\text{Width of interval} = \dfrac{\text{Range}}{\text{Number of classes}}$

    d) Open-end classes should be avoided, since they create difficulty in analysis and interpretation.

    e) Intervals should be continuous throughout the distribution. This is important for a continuous distribution.

    f) The lower limits of the class intervals should be simple multiples of the interval.

    Ex: The weights of a sample of 30 students of a particular class are as follows. Construct a frequency distribution for the given data.

    62 58 58 52 48 53 54 63 69 63

    57 56 46 48 53 56 57 59 58 53

    52 56 57 52 52 53 54 58 61 63

    Steps of construction

    Step 1

    Find the range of the data.

    Highest value (H) = 69

    Lowest value (L) = 46

    Range = H - L = 69 - 46 = 23

    Step 2

    Find the number of class intervals using Sturges' formula:

    K = 1 + 3.322 log N

    K = 1 + 3.322 log 30 ≈ 5.91, say K = 6

    No. of classes = 6

    Step 3

    Find the width of the class interval:

    $\text{Width of class interval} = \dfrac{\text{Range}}{\text{Number of classes}} = \dfrac{23}{6} \approx 3.83 \approx 4$

    Step 4

    Count the observations belonging to each class interval (using tally bars) and assign this total frequency to the corresponding class interval, as follows:

    Class interval   Tally bars    Frequency
    46-50            |||           3
    50-54            ||||| |||     8
    54-58            ||||| |||     8
    58-62            ||||| |       6
    62-66            ||||          4
    66-70            |             1
    Total                          30
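    The four steps can be sketched end to end in Python. Two choices here are assumptions rather than part of the original procedure: the width is rounded up with `math.ceil` so that the K classes are guaranteed to cover the whole range, and each class is taken as [lower, lower + width), i.e. the exclusive method.

```python
import math

weights = [62, 58, 58, 52, 48, 53, 54, 63, 69, 63,
           57, 56, 46, 48, 53, 56, 57, 59, 58, 53,
           52, 56, 57, 52, 52, 53, 54, 58, 61, 63]

# Step 1: range
highest, lowest = max(weights), min(weights)
data_range = highest - lowest                    # 69 - 46 = 23

# Step 2: number of classes by Sturges' formula
k = round(1 + 3.322 * math.log10(len(weights)))  # 5.907 -> 6

# Step 3: width, rounded up so that k classes cover the whole range
width = math.ceil(data_range / k)                # 3.83 -> 4

# Step 4: count each observation into [lower, lower + width)
table = []
for j in range(k):
    lower = lowest + j * width
    upper = lower + width
    f = sum(1 for w in weights if lower <= w < upper)
    table.append((lower, upper, f))

for lower, upper, f in table:
    print(f"{lower}-{upper}  {f}")
```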

    Cumulative frequency distribution

    A cumulative frequency distribution directly indicates the number of units that lie above or below specified values of the class intervals. When the investigator is interested in the number of cases below a specified value, that value is the upper limit of the class interval; this is known as a less than cumulative frequency distribution. When the interest lies in finding the number of cases above a specified value, that value is taken as the lower limit of the class interval; this is known as a more than cumulative frequency distribution.

    Cumulative frequency is obtained simply by summing up the consecutive frequencies.

    Ex:

    Marks    No. of students    Less than cumulative frequency
    0-10     5                  5
    10-20    3                  8
    20-30    10                 18
    30-40    20                 38
    40-50    12                 50


    In the above less than cumulative frequency distribution, there are 5 students less than10, 3 less than 20 and 10 less than 30 and so on.

    Similarly, following table shows greater than cumulative frequency distribution.

    Ex :

    Marks    No. of students    More than cumulative frequency
    0-10     5                  50
    10-20    3                  45
    20-30    10                 42
    30-40    20                 32
    40-50    12                 12

    In the above greater than cumulative frequency distribution, 50 students scored more than 0, 45 more than 10, 42 more than 20, and so on.
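    Both cumulative columns are running sums, taken from the top for "less than" and from the bottom for "more than". A minimal Python sketch using the standard library's itertools.accumulate (the function names are illustrative):

```python
from itertools import accumulate


def less_than_cf(freqs):
    """Running totals from the first class downwards."""
    return list(accumulate(freqs))


def more_than_cf(freqs):
    """Running totals from the last class upwards, reported in class order."""
    return list(accumulate(reversed(freqs)))[::-1]


students = [5, 3, 10, 20, 12]   # marks 0-10, 10-20, 20-30, 30-40, 40-50
print(less_than_cf(students))   # [5, 8, 18, 38, 50]
print(more_than_cf(students))   # [50, 45, 42, 32, 12]
```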

    Diagrammatic and Graphic Representation

    The data collected can be presented graphically or pictorially for easy understanding and quick interpretation. Diagrams and graphs give visual indications of magnitudes, groupings, trends and patterns in the data, and these features can be presented more simply in graphical form. Diagrams and graphs also help in comparing variables.

    Diagrammatic presentation

    A diagram is a visual form for the presentation of statistical data. Diagrams include various types of devices such as bars, circles, maps, pictograms and cartograms.

    Importance of Diagrams

    1. They are simple, attractive and easy to understand.

    2. They give information quickly.

    3. They help in comparing variables.

    4. Diagrams are more suitable for illustrating discrete data.

    5. They have a more lasting effect on the reader's mind.

    Limitations of diagrams

    1. Diagrams show only approximate values.

    2. Diagrams are not suitable for further analysis.

    3. Some diagrams (multidimensional ones) can be understood only by experts.

    4. Details cannot be provided fully.

    5. They are useful only for comparison.

    General Rules for drawing the diagrams

    i) Each diagram should have a suitable title, at the top or bottom, indicating the theme the diagram is intended to convey.

    ii) The size of the diagram should emphasize the important characteristics of the data.

    iii) Appropriate proportion should be maintained between the length and breadth of the diagram.

    iv) A proper, suitable scale should be adopted for the diagram.

    v) Selecting an appropriate diagram is important; a wrong selection may mislead the reader.

    vi) The source of the data should be mentioned at the bottom.

    vii) A diagram should be simple and attractive.

    viii) A diagram should be effective rather than complex.

    Some important types of diagrams

    a) One-dimensional diagrams (line and bar)

    b) Two-dimensional diagrams (rectangle, square, circle)

    c) Three-dimensional diagrams (cube, sphere, cylinder, etc.)

    d) Pictograms

    e) Cartograms

    a) One-dimensional diagrams (line and bar)

    In one-dimensional diagrams, only the length of the bars or lines is taken into account; the width of the bars is not considered. One-dimensional diagrams are classified mainly as follows.

    i) Line diagram

    ii) Bar diagram

    - Vertical bar diagram

    - Horizontal bar diagram

    - Multiple (compound) bar diagram

    - Sub-divided (component) bar diagram

    - Percentage subdivided bar diagram

    i) Line diagram

    This is the simplest type of one-dimensional diagram. Bars or lines are drawn with heights proportional to the size of the figures, and the distance between them is kept uniform. Its limitations are that it is not attractive and cannot convey more than one kind of information.

    Ex : Draw the line diagram for the following data.

    Year                                                      2001  2002  2003  2004  2005  2006
    No. of students passed in first class with distinction       5     7    12     5    13    15

    [Figure: Line diagram of the number of students passed in first class with distinction (FCD), by year; x-axis: Year, y-axis: No. of students passed in FCD]

    Interpretation of the diagram: The number of FCD is highest in 2006 and lowest in 2001 and 2004.

    ii) Simple bar diagram

    A simple bar diagram can be drawn using horizontal or vertical bars. It is a very common diagram in business and economics.

    Vertical bar diagram

    The annual expenses of maintaining cars of various makes are given below. Draw the vertical bar diagram. The annual maintenance expenses include fuel, maintenance, repair, assistance and insurance.

    Type of car       Expense (Rs./year)
    Maruthi Udyog     47533
    Hyundai           59230
    Tata Motors       63270

    Source : 2005 TNS TCS Study

    Published at : Vijaya Karnataka, dated: 03.08.2006

    [Figure: Vertical bar diagram of annual car maintenance expenses (Rs./year) for Maruthi Udyog, Hyundai and Tata Motors]

    Interpretation of the diagram

    a) The annual expenses of a Maruthi Udyog car are comparatively low compared with the other brands depicted.

    b) The high annual expenses of the Tata Motors brand can be seen from the diagram.

    Horizontal bar diagram

    Data on the world's top 10 steel makers are given below. Draw a horizontal bar diagram.

    Steelmaker        Production (million tonnes)
    ArcelorMittal     110
    Nippon            32
    POSCO             31
    JFE               30
    BAOSteel          24
    US Steel          20
    NUCOR             18
    RIVA              18
    Thyssen-krupp     17
    Tangshan          16

    [Figure: Horizontal bar diagram of the top 10 steel makers; x-axis: Production of Steel (Million Tonnes)]

    Source: ISSB Published by India Today

    Compound bar diagram (Multiple bar diagram)

    Multiple bar diagrams are used to provide more information than a simple bar diagram. A multiple bar diagram shows more than one phenomenon and is highly useful for direct comparison. The bars are drawn side by side, and different colours, shades or hatchings can be used to indicate each variable.

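    Ex : Draw the bar diagram for the following data. The resale values of the cars (Rs. '000) are as follows.

    Year (Model)    Santro    Zen    WagonR
    2003            208       252    248
    2004            240       278    274
    2005            261       296    302

    [Figure: Multiple bar diagram of resale value (Rs. '000) by model year for Santro, Zen and WagonR]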


    Source : True value used car purchase data

    Published by : Vijaya Karnataka, dated: 03.08.2006

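    Ex : Represent the following data in a suitable diagram.

    Class     A       B       C
    Male      1000    1500    1500
    Female    500     800     1000
    Total     1500    2300    2500

    [Figure: Sub-divided bar diagram of population (in Nos.) by class, showing male and female components]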


    Ex : Draw a suitable diagram for the following data.

    Mode of investment    Investment in 2004      Investment in 2005
                          Rs.       %age          Rs.       %age
    NSC                   25000     43.10         30000     45.45
    MIS                   15000     25.86         10000     15.15
    Mutual Fund           15000     25.86         25000     37.88
    LIC                   3000      5.17          1000      1.52
    Total                 58000     100           66000     100
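    The example does not name the diagram; a percentage sub-divided bar diagram is one natural choice, because each year's bar is then divided in proportion to the shares shown in the %age columns. A minimal sketch of that percentage calculation (the function name is illustrative); note that individually rounded percentages need not add up to exactly 100.

```python
def percentages(values, decimals=2):
    """Share of each value in the total, as a percentage rounded to `decimals` places."""
    total = sum(values)
    return [round(100 * v / total, decimals) for v in values]


modes = ["NSC", "MIS", "Mutual Fund", "LIC"]
inv_2004 = [25000, 15000, 15000, 3000]
inv_2005 = [30000, 10000, 25000, 1000]

for year, values in (("2004", inv_2004), ("2005", inv_2005)):
    print(year, dict(zip(modes, percentages(values))))
```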


    Electricity       300     700
    House Rent        1500    2800
    Vehicle Fuel      450     1000
    Total             3500    7000

    Total expenditure is taken as 100, and the expenditure on individual items is expressed as a percentage. The widths of the two rectangles are in proportion to the total expenses of the two families, i.e. 3500 : 7000 or 1 : 2. Each rectangle has a height of 100 and is sub-divided according to the percentage of expenses on each item.

    Item of expenditure    Monthly expenditure
                           Family A (Rs. 3500)    Family B (Rs. 7000)
                           Rs.      %age          Rs.      %age
    Provisional stores     1000     28.57         2000     28.57
    Education              250      7.14          500      7.14
    Electricity            300      8.57          700      10.00
    House Rent             1500     42.86         2800     40.00
    Vehicle Fuel           450      12.86         1000     14.29
    Total                  3500     100           7000     100

    [Figure: Percentage sub-divided rectangle diagram of the monthly expenditure of Families A and B; y-axis: % of expenditure; components: Provisional stores, Education, Electricity, House Rent, Vehicle Fuel]
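    The rectangle construction above can be sketched as follows: each rectangle's width is its family's total relative to the smaller total, and its height of 100 is split by item percentages. The function name is hypothetical, and Family A's items are assumed to sum to the stated total of Rs. 3500 (vehicle fuel taken as Rs. 450, the amount consistent with that total and the 12.86 % share).

```python
def subdivided_rectangles(families):
    """For each family: rectangle width relative to the smallest total, and item heights in %."""
    totals = {name: sum(items.values()) for name, items in families.items()}
    base = min(totals.values())
    layout = {}
    for name, items in families.items():
        total = totals[name]
        heights = {item: round(100 * amount / total, 2) for item, amount in items.items()}
        layout[name] = (total / base, heights)
    return layout


families = {
    # Family A: assumed to sum to the stated Rs. 3500 (vehicle fuel = 450)
    "A": {"Provisional stores": 1000, "Education": 250, "Electricity": 300,
          "House Rent": 1500, "Vehicle Fuel": 450},
    "B": {"Provisional stores": 2000, "Education": 500, "Electricity": 700,
          "House Rent": 2800, "Vehicle Fuel": 1000},
}

for name, (width, heights) in subdivided_rectangles(families).items():
    print(f"Family {name}: width {width:g}, heights {heights}")
```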

    b) Square diagram

    To draw a square diagram, the square root of the value of each item to be shown is taken, and squares with sides proportional to these square roots are drawn, so that the areas of the squares are in the ratio of the values. A suitable scale may be used to depict the diagram.

    Ex : Draw the square diagram for the following data.

    4900    2500    1600

    Solution : The square roots of the items are 70, 50 and 40; dividing by 10 gives sides of 7, 5 and 4 units.

    [Figure: Square diagram with squares of side 7, 5 and 4 units representing 4900, 2500 and 1600]
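    The square-diagram calculation is a square root followed by a scale factor. A minimal sketch, with the divisor of 10 from the solution as the (illustrative) default scale:

```python
import math


def square_sides(values, scale=10):
    """Side of each square = sqrt(value) / scale, so square areas stay proportional to the values."""
    return [math.sqrt(v) / scale for v in values]


print(square_sides([4900, 2500, 1600]))  # [7.0, 5.0, 4.0]
```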

    Pie diagram

    A pie diagram shows the partitioning of a total into its component parts. It is used to show classes or groups of data in proportion to the whole data set. The entire pie represents all the data, while each slice represents a different class or group within the whole. The following illustration shows the construction of a pie diagram.

    Ex : Revenue collections by the government for the year 2005-2006 from petroleum products, in Rs. crore, are as follows. Draw the pie diagram.

    Customs                       9600
    Excise                        49300
    Corporate Tax and Dividend    18900
    States taking                 48800
    Total                         126600

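    Solution:

    The angle of each sector is

    $$\text{Angle} = \frac{\text{Value of item}}{\text{Total value}} \times 360^\circ$$

    Item / Source                 Value (Rs. crore)    Angle of sector                    %age
    Customs                       9600                 9600/126600 × 360 = 27.30°         7.58
    Excise                        49300                49300/126600 × 360 = 140.19°       38.94
    Corporate Tax and Dividend    18900                18900/126600 × 360 = 53.74°        14.93
    States taking                 48800                48800/126600 × 360 = 138.77°       38.55
    Total                         126600               360°                               100

    [Figure: Pie diagram of revenue collections from petroleum products: Customs, Excise, Corporate Tax and Dividend, States taking]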


    Source : India Today 19 June, 2006
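    The sector angles and percentages follow directly from the formula in the solution. A minimal sketch (the function name is illustrative) that returns both for each component:

```python
def pie_sectors(values, decimals=2):
    """Return (angle in degrees, percentage) for each component of the total."""
    total = sum(values)
    return [(round(360 * v / total, decimals), round(100 * v / total, decimals))
            for v in values]


sources = ["Customs", "Excise", "Corporate Tax and Dividend", "States taking"]
revenue = [9600, 49300, 18900, 48800]  # Rs. crore, 2005-2006

for name, (angle, pct) in zip(sources, pie_sectors(revenue)):
    print(f"{name:<28} {angle:>7.2f} deg {pct:>6.2f} %")
```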

    Choice or selection of diagram

    There are many methods of depicting statistical data through diagrams, and no single diagram is suited for all purposes. Choosing a diagram to suit a given set of data requires skill, knowledge and experience. Primarily, the choice depends upon the nature of the data and the purpose of the presentation, including the audience for whom it is meant. The nature of the data helps in deciding between a one-dimensional, two-dimensional or three-dimensional diagram.

    The following points are to be kept in mind when choosing a diagram.

    1. For a layperson with little knowledge of statistics, cartograms and pictograms are suitable.

    2. To present the components in addition to the magnitude of the values, a sub-divided bar diagram can be used.

    3. When a large number of components are to be shown, a pie diagram is suitable.

    Graphic presentation

    Graphic presentation is a visual form of presentation in which graphs are drawn on a special type of paper known as graph paper.

    Common graphic representations are

    a) Histogram

    b) Frequency polygon

    c) Cumulative frequency curve (ogive)


    Advantages of graphic presentation

    1. It provides an attractive and impressive view.

    2. It simplifies complex data.

    3. It helps in direct comparison.

    4. It helps in further statistical analysis.

    5. It is the simplest method of presenting data.

    6. It shows the trend and pattern of the data.

    Difference between graph and diagram

    Diagram                                                  Graph
    1. Ordinary paper can be used.                           1. Graph paper is required.
    2. It is attractive and easily understandable.           2. It needs some effort to understand.
    3. It is appropriate and effective for many variables.   3. Representing many variables creates problems.
    4. It cannot be used for further analysis.               4. It can be used for further analysis.
    5. It gives comparison.                                  5. It shows the relationship between variables.
    6. Data are represented by bars and rectangles.          6. Points and lines are used to represent data.

    Frequency Histogram

    In this type of representation, the given data are plotted as a series of rectangles. Class intervals are marked along the x-axis and frequencies along the y-axis according to a suitable scale. Unlike the bar chart, which is one-dimensional, a histogram is two-dimensional: both the length and the width of the rectangles are important. A histogram is constructed from a frequency distribution of grouped data, where the height of each rectangle is proportional to the respective frequency and the width represents the class interval. Adjacent rectangles are joined to each other; a blank space between rectangles means that the class interval is empty, i.e. there are no values in it.

    Ex : Construct a histogram for the following data.

    Marks obtained (x)    No. of students (f)    Mid-point
    15-25                 5                      20
    25-35                 3                      30
    35-45                 7                      40
    45-55                 5                      50
    55-65                 3                      60
    65-75                 7                      70
    Total                 30

    For convenience, the frequency distribution is presented along with the mid-point of each class interval, where the mid-point is simply the average of the lower and upper boundaries of the class interval.

    [Figure: Histogram; x-axis: Class Interval (Marks), y-axis: Frequency (No. of students)]
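    Each histogram rectangle is fully described by its left edge, its width (the class interval) and its height (the frequency). A minimal sketch with illustrative names that also computes the mid-points, checks that adjacent rectangles touch, and computes the total area:

```python
def histogram_bars(classes, freqs):
    """One rectangle per class: left edge, width (class interval), height (frequency), mid-point."""
    return [
        {"left": lower, "width": upper - lower, "height": f, "mid": (lower + upper) / 2}
        for (lower, upper), f in zip(classes, freqs)
    ]


classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
freqs = [5, 3, 7, 5, 3, 7]

bars = histogram_bars(classes, freqs)
print([b["mid"] for b in bars])                      # [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
print(sum(b["width"] * b["height"] for b in bars))   # 300
# Adjacent rectangles touch: each right edge is the next left edge.
print(all(b["left"] + b["width"] == n["left"] for b, n in zip(bars, bars[1:])))  # True
```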

    Frequency polygon

    A frequency polygon is a line chart of a frequency distribution in which either the values of a discrete variable or the mid-points of the class intervals are plotted against the frequencies, and the plotted points are joined by straight lines. Since the frequencies do not start or end at zero, the diagram as such would not touch the horizontal axis. However, the area under the entire curve should be the same as that of the histogram (the whole distribution, i.e. 100%), so the curve must be closed. The first mid-point is therefore joined to a fictitious preceding mid-point whose frequency is zero, so that the curve begins on the horizontal axis, and the last mid-point is joined to a fictitious succeeding mid-point, also with frequency zero, so that the curve ends on the horizontal axis. This closed diagram is known as a frequency polygon.

    Ex : Construct a frequency polygon for the following data.

    Marks (CI)    Frequency (f)    Mid-point
    15-25         5                20
    25-35         3                30
    35-45         7                40
    45-55         5                50
    55-65         3                60
    65-75         7                70

    [Figure: A frequency polygon; x-axis: Mid point (x), y-axis: Frequency]
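    The closing rule above can be checked numerically. This sketch, with illustrative names and assuming equal class widths, adds the fictitious zero-frequency mid-points at both ends and computes the area under the polygon with the trapezoidal rule; it comes out equal to the histogram area (class width 10 × total frequency 30).

```python
def frequency_polygon(mids, freqs):
    """Polygon vertices, closed with fictitious zero-frequency mid-points at each end.

    Assumes equal class widths, so the gap between consecutive mid-points is constant.
    """
    step = mids[1] - mids[0]
    xs = [mids[0] - step, *mids, mids[-1] + step]
    ys = [0, *freqs, 0]
    return list(zip(xs, ys))


def area_under(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


mids = [20, 30, 40, 50, 60, 70]
freqs = [5, 3, 7, 5, 3, 7]
polygon = frequency_polygon(mids, freqs)
print(polygon)               # starts at (10, 0) and ends at (80, 0)
print(area_under(polygon))   # 300.0, the same as the histogram area 10 * 30
```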

    Cumulative frequency curve (ogive)

    Ogives are graphic representations of a cumulative frequency distribution. They are classified as less than and more than ogives. In a less than ogive, the cumulative frequencies are plotted against the upper boundaries of their respective class intervals. In a more than ogive, the cumulative frequencies are plotted against the lower boundaries of their respective class intervals. Ogives are used for comparison: several ogives can be drawn on the same grid in different colours for easier visualisation and differentiation.

    Ex :

    Marks (CI)    Frequency (f)    Mid-point    Cum. freq. (less than)    Cum. freq. (more than)
    15-25         5                20           5                         30
    25-35         3                30           8                         25
    35-45         7                40           15                        22
    45-55         5                50           20                        15
    55-65         3                60           23                        10
    65-75         7                70           30                        7

    [Figure: 'Less than' ogive; x-axis: Upper Boundary (CI), y-axis: Less than Cumulative Frequency]

    [Figure: 'More than' ogive; x-axis: Lower Boundary (CI), y-axis: More than Cumulative Frequency]
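    The points plotted on the two ogives can be generated from the class limits and frequencies: less than cumulative frequencies pair with upper boundaries, more than cumulative frequencies with lower boundaries. A minimal sketch with an illustrative function name:

```python
from itertools import accumulate


def ogive_points(classes, freqs):
    """Return (less-than points, more-than points) for plotting the two ogives."""
    upper = [hi for _, hi in classes]
    lower = [lo for lo, _ in classes]
    less_than = list(zip(upper, accumulate(freqs)))
    more_than = list(zip(lower, list(accumulate(reversed(freqs)))[::-1]))
    return less_than, more_than


classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
freqs = [5, 3, 7, 5, 3, 7]
less, more = ogive_points(classes, freqs)
print(less)  # [(25, 5), (35, 8), (45, 15), (55, 20), (65, 23), (75, 30)]
print(more)  # [(15, 30), (25, 25), (35, 22), (45, 15), (55, 10), (65, 7)]
```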