Final Assignment on Stats

  • 8/8/2019 Final Assignment on Stats

    1/36

    STATISTICS FOR MANAGEMENT

    Question 1: What do you mean by sample survey? What are the different sampling methods? Briefly describe them.

    Answer

    Introduction:

    In statistics, survey sampling describes the process of selecting a sample of elements from a target population in order to conduct a survey.

    A survey may refer to many different types or techniques of observation, but in the context of survey sampling it most often refers to a questionnaire used to measure the characteristics and/or attitudes of people. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census.

    Probability Sampling:

    In a probability sample (also called a "scientific" or "random" sample) each member of the target population has a known and non-zero probability of inclusion in the sample. A survey based on a probability sample can in theory produce statistical measurements of the target population that:

    are unbiased, meaning the expected value of the sample mean is equal to the population mean, $E(\bar{x}) = \mu$, and

    have a measurable sampling error, which can be expressed as a confidence interval or margin of error.
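    As a rough illustration of a measurable sampling error, the sketch below draws a simple random sample from a made-up population and reports the sample mean with a 95% margin of error. The population values, the sample size of 400, the normal-approximation multiplier 1.96 and the function name mean_with_margin are all assumptions chosen for illustration; none of them comes from the assignment.

```python
import math
import random
import statistics


def mean_with_margin(sample, z=1.96):
    """Return (sample mean, margin of error) under a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return mean, z * std_err


rng = random.Random(42)
# Hypothetical target population of 10,000 measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]
sample = rng.sample(population, 400)  # simple random sample without replacement

mean, margin = mean_with_margin(sample)
print(f"estimate: {mean:.2f} +/- {margin:.2f}")
```

    Quadrupling the sample size roughly halves the margin of error, which is the usual trade-off between survey cost and precision.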

    A probability-based survey sample is created by constructing a list of the target population, called the sample frame; a randomized process for selecting units from the sample frame, called a selection procedure; and a method of contacting selected units and enabling them to complete the survey, called a data collection method or mode. For some target populations this process may be easy, for example, sampling the employees of a company by using a payroll list. However, in large, disorganized populations simply constructing a suitable sample frame is often a complex and expensive task. Common methods of conducting a probability sample of the household population in the United States are Area Probability Sampling, Random Digit Dial telephone sampling, and more recently Address Based Sampling. Within probability sampling there are specialized techniques such as stratified sampling and cluster sampling that improve the precision or efficiency of the sampling process without altering the fundamental principles of probability sampling.

    RAHUL GUPTA, MBAHCS (1ST SEM), SUBJECT CODE-MB0024, SET-2

    Bias in Probability Sampling:

    Bias in surveys is undesirable, but often unavoidable. The major types of bias that may occur in the sampling process are:

    Non-response bias: When individuals or households selected in the survey sample cannot or will not complete the survey, there is the potential for bias to result from this non-response. Non-response bias occurs when the observed value deviates from the population parameter due to differences between respondents and non-respondents.

    Coverage bias: Coverage bias can occur when population members do not appear in the sample frame (undercoverage). Coverage bias occurs when the observed value deviates from the population parameter due to differences between covered and non-covered units. Telephone surveys suffer from a well-known source of coverage bias because they cannot include households without telephones.

    Selection bias: Selection bias occurs when some units have a differing probability of selection that is unaccounted for by the researcher. For example, some households have multiple phone numbers, making them more likely to be selected in a telephone survey than households with only one phone number.

    Non-Probability Sampling:

    Many surveys are not based on probability samples, but instead rely on finding a suitable collection of respondents to complete the survey. Some common examples of non-probability sampling are:

    Judgment Samples: A researcher decides which population members to include in the sample based on his or her judgment. The researcher may provide some alternative justification for the representativeness of the sample.

    Snowball Samples: Often used when a target population is rare; members of the target population recruit other members of the population for the survey.

    Quota Samples: The sample is designed to include a designated number of people with certain specified characteristics, for example, 100 coffee drinkers. This type of sampling is common in non-probability market research surveys.

    Convenience Samples: The sample is composed of whatever persons can be most easily accessed to fill out the survey.

    In non-probability samples the relationship between the target population and the survey sample is immeasurable and potential bias is unknowable. Sophisticated users of non-probability survey samples tend to view the survey as an experimental condition, rather than a tool for population measurement, and examine the results for internally consistent relationships.

    Sampling Methods:

    Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected. When there are very large populations, it is often difficult or impossible to identify every member of the population, so the pool of available subjects becomes biased.

    Systematic sampling is often used instead of random sampling. It is also called an Nth name selection technique. After the required sample size has been calculated, every Nth record is selected from a list of population members. As long as the list does not contain any hidden order, this sampling method is as good as the random sampling method. Its only advantage over the random sampling technique is simplicity.

    Stratified sampling is a commonly used probability method that is superior to random sampling because it reduces sampling error. A stratum is a subset of the population that shares at least one common characteristic. Examples of strata might be males and females, or managers and non-managers. The researcher first identifies the relevant strata and their actual representation in the population. Random sampling is then used to select a sufficient number of subjects from each stratum. "Sufficient" refers to a sample size large enough for us to be reasonably confident that the stratum represents the population.
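    The three probability methods just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the staff list of 20 managers and 80 non-managers is hypothetical, the function names are invented here, and the stratified version uses proportional allocation, which is one common choice rather than the only one.

```python
import random


def simple_random_sample(population, n, rng):
    """Every unit has the same chance of being selected."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th record from the list."""
    k = len(population) // n      # sampling interval (the "N" in "every Nth record")
    start = rng.randrange(k)      # random start inside the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(population, stratum_of, n, rng):
    """Draw a simple random sample within each stratum, proportional to its size."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        share = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, share))
    return sample


rng = random.Random(0)
# Hypothetical sample frame: 20 managers followed by 80 non-managers.
staff = [("manager", i) for i in range(20)] + [("non-manager", i) for i in range(80)]

srs = simple_random_sample(staff, 10, rng)
sys_sample = systematic_sample(staff, 10, rng)
strat = stratified_sample(staff, lambda unit: unit[0], 10, rng)
print(len(srs), len(sys_sample), len(strat))
```

    With proportional allocation the stratified sample always contains 2 managers and 8 non-managers, whereas the simple random sample can over- or under-represent managers by chance.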

    Convenience sampling is used in exploratory research where the researcher is interested in getting an inexpensive approximation of the truth. As the name implies, the sample is selected because its members are convenient to reach. This non-probability method is often used during preliminary research efforts to get a gross estimate of the results, without incurring the cost or time required to select a random sample.

    Judgment sampling is a common non-probability method. The researcher selects the sample based on judgment. This is usually an extension of convenience sampling. For example, a researcher may decide to draw the entire sample from one "representative" city, even though the population includes all cities. When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.

    Quota sampling is the non-probability equivalent of stratified sampling. As in stratified sampling, the researcher first identifies the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum. This differs from stratified sampling, where the strata are filled by random sampling.

    Snowball sampling is a special non-probability method used when the desired sample characteristic is rare. It may be extremely difficult or cost prohibitive to locate respondents in these situations. Snowball sampling relies on referrals from initial subjects to generate additional subjects.

    Question 2: What is the difference between correlation and regression? What do you understand by Rank Correlation? When do we use rank correlation and when do we use the Pearsonian Correlation Coefficient? Fit a linear regression line to the following data

    X 12 15 18 20 27 34 28 48

    Y 123 150 158 170 180 184 176 130

    Answer

    Correlation:

    [Figure: several sets of (x, y) points, with the correlation coefficient of x and y for each set. The correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle row), nor many aspects of nonlinear relationships (bottom row). The figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.]

    In statistics, correlation (often measured as a correlation coefficient, ρ) indicates the strength and direction of a relationship between two random variables. The commonest use refers to a linear relationship, but the concept of nonlinear correlation is also used. In general statistical usage, correlation or co-relation refers to the departure of two random variables from independence. In this broad sense there are several coefficients, measuring the degree of correlation, adapted to the nature of the data.

    Pearson's product-moment coefficient:

    A number of different coefficients are used for different situations. The best known is the Pearson product-moment correlation coefficient, which is obtained by dividing the covariance of the two variables by the product of their standard deviations. Karl Pearson developed the coefficient from a similar but slightly different idea by Francis Galton.

    Regression analysis:

    In statistics, regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps us understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter, of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

    Regression analysis is widely used for prediction (including forecasting of time-series data). Use of regression analysis for prediction has substantial overlap with the field of machine learning. Regression analysis is also used to understand which

    are uncorrelated. However, in the special case when X and Y are jointly normal, uncorrelatedness is equivalent to independence.

    A correlation between two variables is diluted in the presence of measurement error around estimates of one or both variables, in which case disattenuation provides a more accurate coefficient.

    Sample correlation:

    If we have a series of n measurements of X and Y written as $x_i$ and $y_i$, where i = 1, 2, ..., n, then the Pearson product-moment correlation coefficient can be used to estimate the correlation of X and Y. The Pearson coefficient is also known as the "sample correlation coefficient". The Pearson correlation coefficient is then the best estimate of the correlation of X and Y. The Pearson correlation coefficient is written:

    $$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$

    where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y, $s_x$ and $s_y$ are the sample standard deviations of X and Y, and the sum is from i = 1 to n. As with the population correlation, we may rewrite this as

    $$r_{xy} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\, \sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$

    Again, as is true with the population correlation, the absolute value of the sample correlation must be less than or equal to 1. The above formula conveniently suggests a single-pass algorithm for calculating sample correlations, but, depending on the numbers involved, it can sometimes be numerically unstable.
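    The two ways of computing the sample correlation can be compared in a short sketch. The deviation-based version makes two passes over the data, while the running-sums version needs only one; both function names and the five data pairs are illustrative assumptions, not part of the assignment.

```python
import math


def pearson_two_pass(xs, ys):
    """Sample correlation from deviations about the sample means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_single_pass(xs, ys):
    """Sample correlation from running sums, touching each pair once."""
    n = sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0
    for x, y in zip(xs, ys):
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


# Hypothetical data, chosen only to exercise both functions.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_two = pearson_two_pass(xs, ys)
r_one = pearson_single_pass(xs, ys)
print(round(r_two, 4), round(r_one, 4))
```

    On small, well-scaled data the two versions agree closely. The single-pass form subtracts two large, nearly equal quantities when the values are large relative to their spread, which is where the numerical instability mentioned above can appear.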

    The square of the sample correlation coefficient, which is also known as the coefficient of determination, is the fraction of the variance in $y_i$ that is accounted for by a linear fit of $x_i$ to $y_i$. This is written

    $$r_{xy}^2 = 1 - \frac{s_{y|x}^2}{s_y^2}$$

    where $s_{y|x}^2$ is the square of the error of a linear regression of $y_i$ on $x_i$ by the equation y = a + bx:

    $$s_{y|x}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - a - b x_i)^2$$

    and $s_y^2$ is just the variance of y:

    $$s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

    Note that since the sample correlation coefficient is symmetric in $x_i$ and $y_i$, we will get the same value for a fit of $y_i$ to $x_i$:

    $$r_{xy}^2 = 1 - \frac{s_{x|y}^2}{s_x^2}$$

    This equation also gives an intuitive idea of the correlation coefficient for higher dimensions. Just as the above described sample correlation coefficient is the fraction of variance accounted for by the fit of a 1-dimensional linear submanifold to a set of 2-dimensional vectors $(x_i, y_i)$, so we can define a correlation coefficient for a fit of an m-dimensional linear submanifold to a set of n-dimensional vectors. For example, if we fit a plane z = a + bx + cy to a set of data $(x_i, y_i, z_i)$ then the correlation coefficient of z to x and y is

    $$r^2 = 1 - \frac{s_{z|xy}^2}{s_z^2}$$

    The distribution of the correlation coefficient has been examined by R. A. Fisher and A. K. Gayen.

    Geometric interpretation:

    For centered data (i.e., data which have been shifted by the sample mean so as to have an average of zero), the correlation coefficient can also be viewed as the cosine of the angle between the two vectors of samples drawn from the two random variables.

    Some practitioners prefer an uncentered (non-Pearson-compliant) correlation coefficient. See the example below for a comparison.

    As an example, suppose five countries are found to have gross national products of 1, 2, 3, 5, and 8 billion dollars, respectively. Suppose these same five countries (in the same order) are found to have 11%, 12%, 13%, 15%, and 18% poverty. Then let x and y be ordered 5-element vectors containing the above data: x = (1, 2, 3, 5, 8) and y = (0.11, 0.12, 0.13, 0.15, 0.18).

    By the usual procedure for finding the angle between two vectors (see dot product), the uncentered correlation coefficient is:

    $$\cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|} = \frac{2.93}{\sqrt{103}\,\sqrt{0.0983}} \approx 0.9208$$

    Note that the above data were deliberately chosen to be perfectly correlated: y = 0.10 + 0.01x. The Pearson correlation coefficient must therefore be exactly one. Centering the data (shifting x by E(x) = 3.8 and y by E(y) = 0.138) yields x = (-2.8, -1.8, -0.8, 1.2, 4.2) and y = (-0.028, -0.018, -0.008, 0.012, 0.042), from which

    $$\cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|} = \frac{0.308}{\sqrt{30.8}\,\sqrt{0.00308}} = 1 = \rho_{xy}$$

    as expected.
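    The comparison between the uncentered and centered coefficients can be reproduced with a short sketch that uses the five-country data from the example. The helper names cosine and center are illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def center(v):
    """Shift a vector by its mean so that it averages to zero."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


x = [1, 2, 3, 5, 8]                   # gross national product, billions of dollars
y = [0.11, 0.12, 0.13, 0.15, 0.18]    # poverty rate

uncentered = cosine(x, y)
centered = cosine(center(x), center(y))   # this is the Pearson coefficient
print(round(uncentered, 4), round(centered, 4))
```

    Centering is what turns the plain cosine into the Pearson coefficient: the uncentered value stays below one only because both vectors sit away from the origin.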

    Motivation for the form of the coefficient of correlation:

    Another motivation for correlation comes from inspecting the method of simple linear regression. As above, X is the vector of independent variables, $x_i$, and Y of the dependent variables, $y_i$, and a simple linear relationship between X and Y is sought, through a least-squares method on the estimate of Y:

    $$\hat{y}_i = a + b x_i$$

    Then, the equation of the least-squares line can be derived to be of the form:

    $$(y - \bar{y}) = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}\,(x - \bar{x})$$

    which can be rearranged in the form:

    $$(y - \bar{y}) = \frac{r\, s_y}{s_x}\,(x - \bar{x})$$

    where r has the familiar form mentioned above.

    Rank correlation coefficients:

    Rank correlation coefficients, such as Spearman's rank correlation coefficient (ρ) and Kendall's rank correlation coefficient (τ), measure the extent to which, as one variable increases, the other variable tends to increase, without requiring that increase to be represented by a linear relationship. If, as the one variable increases, the other decreases, the rank correlation coefficients will be negative. It is common to regard these rank correlation coefficients as alternatives to Pearson's coefficient, used either to reduce the amount of calculation or to make the coefficient less sensitive to non-normality in distributions. However, this view has little mathematical basis, as rank correlation coefficients measure a different type of relationship than the product-moment correlation coefficient, and are best seen as measures of a different type of association, rather than as alternative measures of the population correlation coefficient. To illustrate the nature of rank correlation, and its difference from linear correlation, consider the following four pairs of numbers (x, y): (0, 1), (100, 10), (101, 500), (102, 2000).

    As we go from each pair to the next pair, x increases, and so does y. This relationship is perfect, in the sense that an increase in x is always accompanied by an increase in y. This means that we have a perfect rank correlation, and both Spearman's and Kendall's correlation coefficients are 1, whereas in this example Pearson's product-moment correlation coefficient is 0.456, indicating that the points are far from lying on a straight line. In the same way, if y always decreases when x increases, the rank correlation coefficients will be -1, while the product-moment correlation coefficient may or may not be close to -1, depending on how close the points are to a straight line. Although in the extreme cases of perfect rank correlation the two coefficients are equal (being both +1 or both -1), this is not in general so, and values of the two coefficients cannot meaningfully be compared. For example, for the three pairs (1, 1), (2, 3), (3, 2) Spearman's coefficient is 1/2, while Kendall's coefficient is 1/3.
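    The figures quoted for the two small data sets can be checked with a minimal sketch. It assumes there are no tied values, so the simple textbook formulas apply (Spearman via squared rank differences, Kendall via concordant and discordant pairs); the function names are illustrative.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendall(xs, ys):
    """Kendall's coefficient: (concordant - discordant) / number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def pearson(xs, ys):
    """Product-moment coefficient, for comparison with the rank measures."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


four_x, four_y = [0, 100, 101, 102], [1, 10, 500, 2000]
three_x, three_y = [1, 2, 3], [1, 3, 2]

print(spearman(four_x, four_y), kendall(four_x, four_y), round(pearson(four_x, four_y), 3))
print(spearman(three_x, three_y), kendall(three_x, three_y))
```

    Because the rank measures look only at ordering, stretching the y values further apart would leave Spearman's and Kendall's coefficients unchanged while Pearson's coefficient moved.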

    Correlation and linearity

    [Figure: Anscombe's quartet, four sets of data with the same correlation of 0.816]

    The Pearson correlation coefficient indicates the strength of a linear relationship between two variables, but its value generally does not completely characterize their relationship. In particular, if the conditional mean of Y given X, denoted E(Y|X), is not linear in X, the correlation coefficient will not fully determine the form of E(Y|X).

    The figure shows scatter plots of Anscombe's quartet, a set of four different pairs of variables created by Francis Anscombe. The four y variables have the same mean (7.5), variance (4.12), correlation (0.816) and regression line (y = 3 + 0.5x). However, as can be seen on the plots, the distribution of the variables is very different. The first one (top left) seems to be distributed normally, and corresponds to what one would expect when considering two variables correlated and following the assumption of normality. The second one (top right) is not distributed normally; while an obvious relationship between the two variables can be observed, it is not linear, and the Pearson correlation coefficient is not relevant. In the third case (bottom left), the linear relationship is perfect, except for one outlier which exerts enough influence to lower the correlation coefficient from 1 to 0.816. Finally, the fourth example (bottom right) shows another example when one outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear.

    If a pair (X, Y) of random variables follows a bivariate normal distribution, the conditional mean E(X|Y) is a linear function of Y, and the conditional mean E(Y|X) is a linear function of X. The correlation coefficient r between X and Y, along with the marginal means and variances of X and Y, determines this linear relationship:

    $$E(Y \mid X) = E(Y) + r\,\sigma_y\,\frac{X - E(X)}{\sigma_x}$$

    where E(X) and E(Y) are the expected values of X and Y, respectively, and $\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y, respectively.

    a) Fit a linear regression line in the following data

    X 12 15 18 20 27 34 28 48

    Y 123 150 158 170 180 184 176 130

    Answer:

    Assumed mean of X is 26.

    Assumed mean of Y is 158.

    X       dx = X-26   dx2     Y       dy = Y-158   dy2     dxdy
    12      -14         196     123     -35          1225     490
    15      -11         121     150      -8            64      88
    18       -8          64     158       0             0       0
    20       -6          36     170      12           144     -72
    27        1           1     180      22           484      22
    34        8          64     184      26           676     208
    28        2           4     176      18           324      36
    48       22         484     130     -28           784    -616
    Total   
    202      -6         970     1271      7          3701     156

    Mean of X = 202/8 = 25.25, Mean of Y = 1271/8 = 158.875

    Regression equation of Y on X

    Y - 158.875 = byx (X - 25.25), where byx = (N*Σdxdy - Σdx*Σdy) / (N*Σdx2 - (Σdx)2)

    byx = (8*156 - (-6)(7)) / (8*970 - (-6)2)

    byx = (1248 + 42) / (7760 - 36)

    byx = 1290/7724

    byx = 0.167

    Y - 158.875 = 0.167 (X - 25.25)

    Y - 158.875 = 0.167X - 4.217

    Y = 0.167X + 154.658

    Regression equation of X on Y

    X - 25.25 = bxy (Y - 158.875), where bxy = (N*Σdxdy - Σdx*Σdy) / (N*Σdy2 - (Σdy)2)

    bxy = (8*156 - (-6)(7)) / (8*3701 - (7)2)

    bxy = (1248 + 42) / (29608 - 49)

    bxy = 1290/29559

    bxy = 0.04364

    X - 25.25 = 0.04364 (Y - 158.875)

    X - 25.25 = 0.04364Y - 6.933

    X = 0.04364Y + 18.317

    Regression equation of Y on X:

    Y = 0.167X + 154.658

    Regression equation of X on Y:

    X = 0.04364Y + 18.317
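    As a cross-check on the hand calculation, the sketch below recomputes both regression lines directly from the raw X and Y values, using deviations from the actual means rather than assumed means. The function name fit_line is an illustrative choice and is not part of the assignment.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


X = [12, 15, 18, 20, 27, 34, 28, 48]
Y = [123, 150, 158, 170, 180, 184, 176, 130]

b_yx, a_yx = fit_line(X, Y)   # regression of Y on X
b_xy, a_xy = fit_line(Y, X)   # regression of X on Y

print(f"Y = {b_yx:.4f} X + {a_yx:.3f}")
print(f"X = {b_xy:.5f} Y + {a_xy:.3f}")
```

    The product of the two slopes equals the squared correlation coefficient, so the small slopes here indicate that a straight line explains little of the variation in these data; the last observation (48, 130) pulls strongly against the upward trend of the other seven.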

    Question 3: What do you mean by business forecasting? What are the different methods of business forecasting? Describe the effectiveness of time-series analysis as a mode of business forecasting. Describe the method of moving averages.

    Answer

    Introduction:

    Business forecasting has always been one component of running an enterprise. However, forecasting traditionally was based less on concrete and comprehensive data than on face-to-face meetings and common sense. In recent years, business forecasting has developed into a much more scientific endeavor, with a host of theories, methods, and techniques designed for forecasting certain types of data. The development of information technologies and the Internet propelled this development into overdrive, as companies adopted such technologies not only into their business practices, but into forecasting schemes as well. In the 2000s, projecting the optimal levels of goods to buy or products to produce involved sophisticated software and electronic networks that incorporate mounds of data and advanced mathematical algorithms tailored to a company's particular market conditions and line of business. Business forecasting involves a wide range of tools, including simple electronic spreadsheets, enterprise resource planning (ERP) and electronic data interchange (EDI) networks, advanced supply chain management systems, and other Web-enabled technologies. The practice attempts to pinpoint key factors in business production and extrapolate from given data sets to produce accurate projections for future costs, revenues, and opportunities. This normally is done with an eye toward adjusting current and near-future business practices to take maximum advantage of expectations.

    In the Internet age, the field of business forecasting was propelled by three interrelated phenomena. First, the Internet provided a new series of tools to aid the science of business forecasting. Second, business forecasting had to take the Internet itself into account in trying to construct viable models and make predictions. Finally, the Internet fostered vastly accelerated transformations in all areas of business that made the job of business forecasters that much more exacting. By the 2000s, as the Internet and its myriad functions highlighted the central importance of information in economic activity, more and more companies came to recognize the value, and often the necessity, of business forecasting techniques and systems. Business forecasting is indeed big business, with companies investing tremendous resources in systems, time, and employees aimed at bringing useful projections into the planning process. According to a survey by the Hudson, Ohio-based AnswerThink Consulting Group, which specializes in studies of business planning, the average U.S. company spends more than 25,000 person-days on business forecasting and related activities for every billion dollars of revenue.

    Forecasting systems draw on several sources for their forecasting input, including databases, e-mails, documents, and Web sites. After processing data from various sources, sophisticated forecasting systems integrate all the necessary data into a single spreadsheet, which the company can then manipulate by entering various projections (such as different estimates of future sales) that the system will incorporate into a new readout.

    A flexible and sound architecture is crucial, particularly in the fast-paced, rapidly developing Internet economy. If a system's base is rigid or inadequate, it can be impossible to reconfigure to adjust to changing market conditions. Along the same lines, according to the Journal of Business Forecasting Methods & Systems, it's important to invest in systems that will remain useful over the long term, weathering alterations in the business climate.

    One of the distinguishing characteristics of forecasting systems is the mathematical algorithms they use to take various factors into account. For example, most forecasting systems arrange relevant data into hierarchies, such as a consumer hierarchy, a supply hierarchy, a geography hierarchy, and so on. To return a useful forecast, the system can't simply allocate down each hierarchy separately, but must account for the ways in which those dimensions interact with each other. Moreover, the degree of this interaction varies according to the type of business in which a company is engaged. Thus, businesses need to fine-tune their allocation algorithms in order to receive useful forecasts.


    The second forecasting model is cause-and-effect. In this model, one assumes a cause, or driver of activity, that determines an outcome. For instance, a company may assume that, for a particular data set, the cause is an investment in information technology, and the effect is sales. This model requires the historical data not only of the factor with which one is concerned (in this case, sales), but also of that factor's determined cause (here, information technology expenditures). It is assumed, of course, that the cause-and-effect relationship is relatively stable and easily quantifiable.
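
As a rough illustration of the cause-and-effect model, the sketch below fits a straight line to paired historical data by ordinary least squares and uses it to project sales. The figures, the variable names, and the choice of a simple linear relationship are all assumptions made for illustration; they do not come from the text.

```python
# Hypothetical yearly figures (illustrative only, not from the source).
it_spend = [10.0, 12.0, 15.0, 18.0, 20.0]     # assumed cause: IT expenditure
sales = [110.0, 118.0, 131.0, 142.0, 150.0]   # assumed effect: sales


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(it_spend, sales)

# Project sales for a planned (hypothetical) IT expenditure of 22.
planned_spend = 22.0
projected_sales = intercept + slope * planned_spend
print(f"sales = {intercept:.2f} + {slope:.2f} * it_spend")
print(f"projected sales at spend {planned_spend}: {projected_sales:.2f}")
```

The projection is only as good as the assumption that the historical relationship stays stable, which is the caveat the paragraph above already makes.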

    The third primary forecasting model is known as the judgmental model. In this case, one attempts to produce a forecast where there is no useful historical data. A company might choose to use the judgmental model when it attempts to project sales for a brand new product, or when market conditions have qualitatively changed, rendering previous data obsolete. In addition, according to the Journal of Business Forecasting Methods & Systems, this model is useful when the bulk of sales derive only from a relative handful of customers. To proceed in the absence of historical data, alternative data is collected by way of experts in the field, prospective customers, trade groups, business partners, or any other relevant source of information. Business forecasting systems often work hand-in-hand with supply chain management systems. In such systems, all partners in the supply chain can electronically oversee all movement of components within that supply chain and gear the chain toward maximum efficiency.

    The Internet has proven to be a panacea in this field, and business forecasting systems allow partners to project the optimal flow of components into the future so that companies can try to meet optimal levels rather than continually catch up to them.

    Time series methods:

    Time series methods use historical data as the basis of estimating future outcomes.

    Rolling forecast is a projection into the future based on past performances, routinely updated on a regular schedule to incorporate data.[1]

    Moving average
    Exponential smoothing
    Extrapolation
    Linear prediction
    Trend estimation
    Growth curve

    Causal / Econometric methods:


    Some forecasting methods use the assumption that it is possible to identify the underlying factors that might influence the variable that is being forecast. For example, sales of umbrellas might be associated with weather conditions. If the causes are understood, projections of the influencing variables can be made and used in the forecast.

    Regression analysis using linear regression or non-linear regression
    Autoregressive moving average (ARMA)
    Autoregressive integrated moving average (ARIMA), e.g. Box-Jenkins
    Econometrics

    Judgmental methods:

    Judgmental forecasting methods incorporate intuitive judgments, opinions and subjective probability estimates.

    Composite forecasts
    Surveys
    Delphi method
    Scenario building
    Technology forecasting
    Forecast by analogy

    Other methods:

    Simulation
    Prediction market
    Probabilistic forecasting and Ensemble forecasting
    Reference class forecasting

    Forecasting accuracy:

    The forecast error is the difference between the actual value and the forecast value for the corresponding period:

    E_t = Y_t - F_t

    where E_t is the forecast error at period t, Y_t is the actual value at period t, and F_t is the forecast for period t.

    Measures of aggregate error:


    Mean Absolute Error (MAE)

    Mean Absolute Percentage Error (MAPE)

    Percent Mean Absolute Deviation (PMAD)

    Mean Squared Error (MSE)

    Root Mean Squared Error (RMSE)

    Forecast skill (SS)
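
The measures listed above can be sketched in a few lines. The definitions used here are the commonly quoted ones (the source names the measures but does not show their formulas), the function names are chosen for illustration, and the sample numbers are hypothetical. The forecast error is taken as actual minus forecast.

```python
import math


def errors(actual, forecast):
    """Forecast error per period: actual value minus forecast value."""
    return [y - f for y, f in zip(actual, forecast)]


def mae(actual, forecast):
    """Mean Absolute Error."""
    e = errors(actual, forecast)
    return sum(abs(x) for x in e) / len(e)


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (assumes no actual value is zero)."""
    e = errors(actual, forecast)
    return 100.0 * sum(abs(x / y) for x, y in zip(e, actual)) / len(e)


def pmad(actual, forecast):
    """Percent Mean Absolute Deviation: total absolute error over total absolute actuals."""
    e = errors(actual, forecast)
    return sum(abs(x) for x in e) / sum(abs(y) for y in actual)


def mse(actual, forecast):
    """Mean Squared Error."""
    e = errors(actual, forecast)
    return sum(x * x for x in e) / len(e)


def rmse(actual, forecast):
    """Root Mean Squared Error."""
    return math.sqrt(mse(actual, forecast))


def skill_score(actual, forecast, reference):
    """Forecast skill: 1 - MSE(forecast) / MSE(reference forecast)."""
    return 1.0 - mse(actual, forecast) / mse(actual, reference)


# Hypothetical actuals and forecasts, for illustration only.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(mae(actual, forecast), mape(actual, forecast), rmse(actual, forecast))
```

MAPE and PMAD are scale-free, which makes them convenient for comparing series of different sizes; MAPE breaks down when an actual value is zero, which is why the sketch states that assumption.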

    Time-Critical Decision Modeling and Analysis:

    The ability to model and perform decision modeling and analysis is an essential feature of many real-world applications ranging from emergency medical treatment in intensive care units to military command and control systems. Existing formalisms and methods of inference have not been effective in real-time applications where tradeoffs between decision quality and computational tractability are essential. In practice, an effective approach to time-critical dynamic decision modeling should provide explicit support for the modeling of temporal processes and for dealing with time-critical situations.

    One of the most essential elements of being a high-performing manager is the ability to lead effectively one's own life, then to model those leadership skills for employees in the organization. This site comprehensively covers theory and practice of most topics in forecasting and economics. I believe such a comprehensive approach is necessary to fully understand the subject. A central objective of the site is to unify the various forms of business topics to link them closely to each other and to the supporting fields of statistics and economics. Nevertheless, the topics and coverage do reflect choices about what is important to understand for business decision making. Almost all managerial decisions are based on forecasts. Every decision becomes operational at some point in the future, so it should be based on forecasts of future conditions. Forecasts are needed throughout an organization, and they should certainly not be produced by an isolated group of forecasters.


    Neither is forecasting ever "finished". Forecasts are needed continually, and as time moves on, the impact of the forecasts on actual performance is measured; original forecasts are updated; and decisions are modified, and so on.

    For example, many inventory systems cater for uncertain demand. The inventory parameters in these systems require estimates of the demand and forecast error distributions. The two stages of these systems, forecasting and inventory control, are often examined independently. Most studies tend to look at demand forecasting as if this were an end in itself or at stock control models as if there were no preceding stages of computation. Nevertheless, it is important to understand the interaction between demand forecasting and inventory control since this influences the performance of the inventory system. This integrated process is shown in the following figure:

    The decision-maker uses forecasting models to assist him or her in the decision-making process. The decision-maker often uses the modeling process to investigate the impact of different courses of action retrospectively; that is, "as if" the decision has already been made under a course of action. That is why the sequence of steps in the modeling process, in the above figure, must be considered in reverse order. For example, the output (which is the result of the action) must be considered first.

    It is helpful to break the components of decision making into three groups: Uncontrollable, Controllable, and Resources (that defines the problem situation). As indicated in the above activity chart, the decision-making process has the following components:

    1. Performance measure (or indicator, or objective): Measuring business performance is the top priority for managers. Management by objective works if you know the objectives. Unfortunately, most business managers do not know explicitly what they are. The development of effective performance measures is seen as increasingly important in almost all organizations. However, the challenges of achieving this in the public and non-profit sectors are arguably considerable. A performance measure provides the desirable level of outcome, i.e., the objective of your decision. The objective is important in identifying the forecasting activity. The following table provides a few examples of performance measures for different levels of management:

    Level         Performance Measure
    Strategic     Return on Investment, Growth, and Innovations
    Tactical      Cost, Quantity, and Customer satisfaction
    Operational   Target setting, and Conformance with standard

    2. Clearly, if you are seeking to improve a system's performance, an operational view is really what you are after. Such a view gets at how a forecasting system really works; for example, by what correlation its past output behaviors have generated. It is essential to understand how a forecast system currently is working if you want to change how it will work in the future. Forecasting activity is an iterative process. It starts with effective and efficient planning and ends in compensation of other forecasts for their performance.

    3. What is a System? Systems are formed with parts put together in a particular manner in order to pursue an objective. The relationship between the parts determines what the system does and how it functions as a whole. Therefore, the relationships in a system are often more important than the individual parts. In general, systems that are building blocks for other systems are called subsystems.

    4. The Dynamics of a System: A system that does not change is a static system. Many business systems are dynamic systems, which means their states change over time. We refer to the way a system changes over time as the system's behavior. And when the system's development follows a typical pattern, we say the system has a behavior pattern. Whether a system is static or dynamic depends on which time horizon you choose and on which variables you concentrate. The time horizon is the time period within which you study the system. The variables are changeable values on the system.

    5. Resources: Resources are the constant elements that do not change during the time horizon of the forecast. Resources are the factors that define the decision problem. Strategic decisions usually have longer time horizons than both the Tactical and the Operational decisions.

    6. Forecasts: Forecast inputs come from the decision maker's environment. Uncontrollable inputs must be forecasted or predicted.


    7. Decisions: Decision inputs are the known collection of all possible courses of action you might take.

    8. Interaction: Interactions among the above decision components are the logical, mathematical functions representing the cause-and-effect relationships among inputs, resources, forecasts, and the outcome.

    Interactions are the most important type of relationship involved in the decision-making process. When the outcome of a decision depends on the course of action, we change one or more aspects of the problematic situation with the intention of bringing about a desirable change in some other aspect of it. We succeed if we have knowledge about the interaction among the components of the problem.

    There may also be sets of constraints which apply to each of these components. Therefore, they do not need to be treated separately.

    9. Actions: Action is the ultimate decision and is the best course of strategy to achieve the desirable goal.

    Simple Moving Averages:

    The best-known forecasting method is the moving average, which simply takes a certain number of past periods, adds them together, and divides by the number of periods. The Simple Moving Average (MA) is an effective and efficient approach provided the time series is stationary in both mean and variance. The following formula is used to find the moving average of order n, MA(n), for period t+1:

    MA_{t+1} = [D_t + D_{t-1} + ... + D_{t-n+1}] / n

    Where n is the number of observations used in the calculation.

    The forecast for time period t + 1 is the forecast for all future time periods. However, this forecast is revised only when new data becomes available. You may like using Forecasting by Smoothing JavaScript, and then performing some numerical experimentation for a deeper understanding of these concepts.
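
A minimal sketch of the simple moving average forecast described above: the forecast for the next period is the mean of the last n observations. The function name and the demand figures are hypothetical and used only for illustration.

```python
def moving_average_forecast(history, n):
    """Forecast for the next period: mean of the last n observations."""
    if n <= 0 or len(history) < n:
        raise ValueError("need at least n observations and n > 0")
    return sum(history[-n:]) / n


# Hypothetical demand history, oldest first.
demand = [20, 22, 21, 23, 24]

# Forecast for the next period with a moving average of order 3.
print(moving_average_forecast(demand, 3))

# When a new observation arrives, the forecast is revised.
demand.append(26)
print(moving_average_forecast(demand, 3))
```

Until a new observation is appended, the same value serves as the forecast for every future period, which is the behaviour the paragraph above describes.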


    Weighted Moving Average:

    Weighted moving averages are powerful and economical. They are widely used where repeated forecasts are required, using methods such as sum-of-the-digits and trend adjustment. As an example, a weighted moving average of order 3 is:

    Weighted MA(3) = w_1 D_t + w_2 D_{t-1} + w_3 D_{t-2}

    where the weights are any positive numbers such that w_1 + w_2 + w_3 = 1. Typical weights for this example are w_1 = 3/(1 + 2 + 3) = 3/6, w_2 = 2/6, and w_3 = 1/6.

    You may like using Forecasting by Smoothing JavaScript, and then performing some numerical experimentation for a deeper understanding of the concepts.

    An illustrative numerical example: The moving average and weighted moving average of order five are calculated in the following table.

    Week   Sales ($1000)   MA(5)   WMA(5)
    1      105             -       -
    2      100             -       -
    3      105             -       -
    4      95              -       -
    5      100             101     100
    6      95              99      98
    7      105             100     100
    8      120             103     107
    9      115             107     111
    10     125             112     117
    11     120             117     119
    12     120             120     120
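
The MA(5) and WMA(5) columns can be recomputed from the Sales column with a short script. The sketch assumes sum-of-the-digits weights for order five (5/15 for the most recent week down to 1/15 for the oldest), extending the order-3 weights given in the text; the function names are chosen for illustration.

```python
# Weekly sales in $1000, weeks 1 to 12, taken from the Sales column.
sales = [105, 100, 105, 95, 100, 95, 105, 120, 115, 125, 120, 120]


def sum_of_digits_weights(n):
    """Weights n/S, (n-1)/S, ..., 1/S with S = 1 + 2 + ... + n, newest first."""
    total = n * (n + 1) // 2
    return [(n - i) / total for i in range(n)]


def weighted_moving_average(history, weights):
    """Weighted average of the most recent observations; weights[0] applies to the newest."""
    recent_newest_first = history[-len(weights):][::-1]
    return sum(w * d for w, d in zip(weights, recent_newest_first))


order = 5
weights = sum_of_digits_weights(order)

print("Week  Sales  MA(5)  WMA(5)")
for week in range(order, len(sales) + 1):
    window = sales[:week]              # data available through this week
    ma = sum(window[-order:]) / order  # simple moving average
    wma = weighted_moving_average(window, weights)
    print(f"{week:4d}  {sales[week - 1]:5d}  {round(ma):5d}  {round(wma):6d}")
```

Because the weighted average leans on the most recent weeks, it reacts faster than the simple average when sales shift to a new level.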

    Moving Averages with Trends: Any method of time series analysis involves a different degree of model complexity and presumes a different level of comprehension about the underlying trend of the time series. In many business time series, the trend in the smoothed series using the usual moving average method indicates evolving changes in the series level to be highly nonlinear.


    In order to capture the trend, we may use the Moving-Average with Trend (MAT) method. The MAT method uses an adaptive linearization of the trend by means of incorporating a combination of the local slopes of both the original and the smoothed time series.

    In making a forecast, it is also important to provide a measure of how accurate one can expect the forecast to be. The statistical analysis of the error terms, known as the residual time series, provides a measurement tool and a decision process for model selection. In applying the MAT method, sensitivity analysis is needed to determine the optimal value of the moving average parameter n, i.e., the optimal number of periods. The error time series allows us to study many of its statistical properties for goodness-of-fit decisions. Therefore it is important to evaluate the nature of the forecast error by using the appropriate statistical tests. The forecast error must be a random variable distributed normally with mean close to zero and a constant variance across time.
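
The sensitivity analysis mentioned above can be sketched as a search over candidate values of n, scoring each by the mean squared one-step-ahead error. For simplicity the sketch uses the plain moving average rather than the MAT method, whose formula the text does not give; the series and the candidate range are hypothetical.

```python
def one_step_errors(history, n):
    """Residuals of one-step-ahead moving-average forecasts of order n."""
    return [history[t] - sum(history[t - n:t]) / n for t in range(n, len(history))]


def mean_squared_error(residuals):
    """Average of the squared residuals."""
    return sum(e * e for e in residuals) / len(residuals)


# Hypothetical series with an upward drift, oldest first.
series = [20, 22, 21, 25, 24, 27, 26, 30, 29, 32]

# Try several orders and keep the one with the smallest mean squared error.
candidates = range(2, 6)
best_n = min(candidates, key=lambda n: mean_squared_error(one_step_errors(series, n)))

# Inspect the residuals of the chosen order: their mean should be close to zero.
residuals = one_step_errors(series, best_n)
mean_residual = sum(residuals) / len(residuals)
print("best n:", best_n)
print("mean residual:", round(mean_residual, 2))
```

A mean residual well away from zero indicates that the forecasts are systematically lagging the series, which is the situation trend-adjusted methods are meant to address.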

    For computer implementation of the Moving Average with Trend (MAT) method one may use the forecasting (FC) module of WinQSB, which is commercial-grade stand-alone software. WinQSB's approach is to first select the model and then enter the parameters and the data. With the Help features in WinQSB there is no learning curve; one just needs a few minutes to master its useful features.

    Exponential Smoothing Techniques: Among the most successful forecasting methods are the exponential smoothing (ES) techniques. Moreover, they can be modified efficiently for use with time series that have seasonal patterns. They are also easy to adjust for past errors, make it easy to prepare follow-on forecasts, and are ideal for situations where many forecasts must be prepared. Several different forms are used depending on the presence of trend or cyclical variations. In short, ES is an averaging technique that uses unequal weights; the weights applied to past observations decline in an exponential manner.
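
A minimal sketch of single exponential smoothing, assuming the standard recursion F_{t+1} = alpha * D_t + (1 - alpha) * F_t with the first forecast set equal to the first observation. The smoothing constant alpha, the initialization, and the data are assumptions for illustration; the text does not specify them.

```python
def exponential_smoothing_forecast(history, alpha):
    """Next-period forecast using F_{t+1} = alpha * D_t + (1 - alpha) * F_t."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    forecast = history[0]  # assumed start: forecast for period 2 equals D_1
    for observation in history[1:]:
        forecast = alpha * observation + (1 - alpha) * forecast
    return forecast


def implied_weights(alpha, count):
    """Weight placed on the observation k periods back: alpha * (1 - alpha) ** k."""
    return [alpha * (1 - alpha) ** k for k in range(count)]


# Hypothetical demand history, oldest first.
demand = [20, 22, 21, 23, 24]
print(exponential_smoothing_forecast(demand, alpha=0.3))

# The weights on past observations shrink geometrically.
print(implied_weights(0.5, 4))
```

A larger alpha makes the forecast follow recent observations more closely, while a smaller alpha gives a smoother forecast that is slower to react.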


    Question 4: What is the definition of Statistics? What

    are the different characteristics of statistics? What

    are the different functions of Statistics? What are

    the limitations of Statistics?

    Answer

    Introduction:

    Statistics is considered by some to be a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data, while others consider it to be a branch of mathematics concerned with collecting and interpreting data. Statisticians improve the quality of data with the design of experiments and survey sampling. Statistics also provides tools for prediction and forecasting using data and statistical models. Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government, and business.

    Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. This is useful in research, when communicating the results of experiments. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and are then used to draw inferences about the process or population being studied; this is called inferential statistics. Inference is a vital element of scientific advance, since it provides a prediction (based on data) for where a theory logically leads. To further prove the guiding theory, these predictions are tested as well, as part of the scientific method. If the inference holds true, then the descriptive statistics of the new data increase the soundness of that hypothesis. Descriptive statistics and inferential statistics (a.k.a. predictive statistics) together comprise applied statistics. There is also a discipline called mathematical statistics, which is concerned with the theoretical basis of the subject. The word statistics can either be singular or plural. In its singular form, statistics refers to the mathematical science discussed in this article. In its plural form, statistics is the plural of the word statistic, which refers to a quantity (such as a mean) calculated from a set of data.


    Experimental and observational studies:

    A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on dependent variables or response. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types lies in how the study is actually conducted. Each can be very effective.

    An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and response are investigated. An example of an observational study is one that explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a case-control study, and then look for the number of cases of lung cancer in each group.

    The basic steps of an experiment are:

    1. Planning the research, including determining information sources, research subject selection, and ethical considerations for the proposed research and method.

    2. Design of experiments, concentrating on the system model and the interaction of independent and dependent variables.

    3. Summarizing a collection of observations to feature their commonality by suppressing details. (Descriptive statistics)

    4. Reaching consensus about what the observations tell about the world being observed. (Statistical inference)

    5. Documenting / presenting the results of the study.

    Levels of measurement:

    There are four types of measurements or levels of measurement or measurement scales used in statistics:


    Nominal. Ordinal. Interval. Ratio.

    Characteristics of Statistics:

    Some of its important characteristics are given below:

    Statistics are aggregates of facts.
    Statistics are numerically expressed.
    Statistics are affected to a marked extent by multiplicity of causes.
    Statistics are enumerated or estimated according to a reasonable standard of accuracy.
    Statistics are collected for a predetermined purpose.
    Statistics are collected in a systematic manner.
    Statistics must be comparable to each other.

    Functions of Statistics:

    (1) Statistics helps in providing a better understanding and exact description of a phenomenon of nature.

    (2) Statistics helps in proper and efficient planning of a statistical inquiry in any field of study.

    (3) Statistics helps in collecting appropriate quantitative data.

    (4) Statistics helps in presenting complex data in a suitable tabular, diagrammatic and graphic form for an easy and clear comprehension of the data.

    (5) Statistics helps in understanding the nature and pattern of variability of a phenomenon through quantitative observations.

    (6) Statistics helps in drawing valid inferences, along with a measure of their reliability, about the population parameters from the sample data.

    Limitations of Statistics:

    The important limitations of statistics are:


    (1) Statistical laws are true on average. Statistics are aggregates of facts, so a single observation is not a statistic; statistics deals with groups and aggregates only.

    (2) Statistical methods are best applicable to quantitative data.

    (3) Statistical methods cannot be applied to heterogeneous data.

    (4) If sufficient care is not exercised in collecting, analyzing and interpreting the data, statistical results might be misleading.

    (5) Only a person who has an expert knowledge of statistics can handle statistical data efficiently.

    (6) Some errors are possible in statistical decisions. In particular, inferential statistics involves certain errors. We do not know whether an error has been committed or not.

    Question 5: What are the different stages of

    planning a statistical survey? Describe the various

    methods for collecting data in a statistical survey?

    Answer

    Introduction:

    Statistical surveys are used to collect quantitative information about items in a population. Surveys of human populations and institutions are common in political polling and government, health, social science and marketing research. A survey may focus on opinions or factual information depending on its purpose, and many surveys involve administering questions to individuals. When the questions are administered by a researcher, the survey is called a structured interview or a researcher-administered survey. When the questions are administered by the respondent, the survey is referred to as a questionnaire or a self-administered survey.

    Structure and standardization:


    The questions are usually structured and standardized. The structure is intended to reduce bias (see questionnaire construction). For example, questions should be ordered in such a way that a question does not influence the response to subsequent questions. Surveys are standardized to ensure reliability, generalizability, and validity (see quantitative marketing research). Every respondent should be presented with the same questions and in the same order as other respondents. In organizational development (OD), carefully constructed survey instruments are often used as the basis for data gathering, organizational diagnosis, and subsequent action planning. Some OD practitioners (e.g. Fred Nickols) even consider survey guided development as the sine qua non of OD.

    Serial surveys:

    Serial surveys are those which repeat the same questions at different points in time, producing time-series data. They typically fall into two types:

    Cross-sectional surveys, which draw a new sample each time. In a sense any one-off survey will also be cross-sectional.

    Longitudinal surveys, where the sample from the initial survey is re-contacted at a later date to be asked the same questions.
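
    The difference between the two types of serial survey can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the population is a hypothetical list of respondent IDs, samples are simple random samples, and the function names are made up for this example.

```python
import random


def cross_sectional_waves(population, sample_size, n_waves, seed=0):
    """Draw a fresh random sample of respondents for every wave."""
    rng = random.Random(seed)
    return [rng.sample(population, sample_size) for _ in range(n_waves)]


def longitudinal_waves(population, sample_size, n_waves, seed=0):
    """Draw one sample at the first wave and re-contact it at every later wave."""
    rng = random.Random(seed)
    panel = rng.sample(population, sample_size)
    return [list(panel) for _ in range(n_waves)]


population = list(range(1000))  # hypothetical respondent IDs
cross = cross_sectional_waves(population, sample_size=50, n_waves=3)
panel = longitudinal_waves(population, sample_size=50, n_waves=3)
```

    In the cross-sectional design each wave contains different respondents, while in the longitudinal design every wave contains the same respondents, so changes in an individual's answers can be followed over time.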

    Advantages:

    It is an efficient way of collecting information from a large number of respondents. Very large samples are possible. Statistical techniques can be used to determine validity, reliability, and statistical significance.

    Surveys are flexible in the sense that a wide range of information can be collected. They can be used to study attitudes, values, beliefs, and past behaviors.

    Because they are standardized, they are relatively free from several types of errors.

    They are relatively easy to administer.

    There is an economy in data collection due to the focus provided by standardized questions. Only questions of interest to the researcher are asked, recorded, codified, and analyzed. Time and money are not spent on tangential questions.

    They are relatively cheap to run.

    Disadvantages:

    They depend on subjects' motivation, honesty, memory, and ability to respond. Subjects may not be aware of their reasons for any given action. They may have forgotten their reasons. They may not be motivated to give accurate answers; in fact, they may be motivated to give answers that present themselves in a favorable light.

    Structured surveys, particularly those with closed ended questions, may have low validity when researching affective variables.

    Although the individuals chosen for a survey are often a random sample, errors due to non-response may exist. That is, people who choose to respond to the survey may be different from those who do not respond, thus biasing the estimates.

    Survey question answer-choices could lead to vague data sets because at times they are relative only to a personal abstract notion concerning "strength of choice". For instance, the choice "moderately agree" may mean different things to different subjects, and to anyone interpreting the data for correlation. Even yes or no answers are problematic because subjects may, for instance, put "no" if the choice "only once" is not available.
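
    The effect of non-response on the estimates can be illustrated with a small simulation. The sketch below is not drawn from any real survey: the satisfaction scores, the response probabilities and the function name are all assumptions chosen to show how the bias arises.

```python
import random


def simulate_nonresponse(n=10_000, seed=1):
    """Compare the true population mean with the mean among respondents only."""
    rng = random.Random(seed)
    # Hypothetical population: satisfaction scores from 1 (low) to 5 (high).
    population = [rng.randint(1, 5) for _ in range(n)]
    # Assumption: the chance of responding rises with the score,
    # from 0.2 for a score of 1 up to 0.6 for a score of 5.
    respondents = [s for s in population if rng.random() < 0.1 + 0.1 * s]
    true_mean = sum(population) / len(population)
    observed_mean = sum(respondents) / len(respondents)
    return true_mean, observed_mean


true_mean, observed_mean = simulate_nonresponse()
print(f"population mean: {true_mean:.2f}, respondent mean: {observed_mean:.2f}")
```

    Because satisfied people are assumed to respond more often, the mean among respondents overstates the population mean even though the original sample was drawn at random.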

    Stages of planning a statistical survey:

    1. The nature of the problem to be investigated should be clearly defined in an unambiguous manner.

    2. The objectives of the investigation should be stated at the outset. Objectives could be to obtain certain estimates, to establish a theory, to verify an existing statement, or to find relationships between characteristics.

    3. The scope of the investigation has to be made clear. It refers to the area to be covered, the identification of units to be studied, the nature of characteristics to be observed, the accuracy of measurements, the analytical methods, and the time, cost and other resources required.

    4. Whether to use data collected from primary or secondary sources should be determined in advance.

    5. The organization of the investigation is the final step in the process. It encompasses determining the number of investigators required, their training, the supervision needed, and the funds required.

    Modes of Data Collection:

    There are several ways of administering a survey, including:

    Telephone:

    Use of interviewers encourages sample persons to respond, leading to higher response rates.

    Interviewers can increase comprehension of questions by answering respondents' questions.

    Fairly cost efficient, depending on local call charge structure.

    Good for large national (or international) sampling frames.

    Some potential for interviewer bias (e.g. some people may be more willing to discuss a sensitive issue with a female interviewer than with a male one).

    Cannot be used for non-audio information (graphics, demonstrations, taste/smell samples).

    Unreliable for consumer surveys in rural areas where telephone penetration is low.

    Three types:

    o traditional telephone interviews

    o computer assisted telephone dialing

    o computer assisted telephone interviewing (CATI)

    Mail:

    The questionnaire may be handed to the respondents or mailed to them, but in all cases they are returned to the researcher via mail.

    Cost is very low, since bulk postage is cheap in most countries.

    Long time delays, often several months, before the surveys are returned and statistical analysis can begin.

    Not suitable for issues that may require clarification.

    Respondents can answer at their own convenience (allowing them to break up long surveys; also useful if they need to check records to answer a question).

    No interviewer bias introduced.

    Large amount of information can be obtained: some mail surveys are as long as 50 pages.

    Response rates can be improved by using mail panels:

    o Members of the panel have agreed to participate.

    o Panels can be used in longitudinal designs where the same respondents are surveyed several times.

    Online surveys:

    Can use web or e-mail.

    Web is preferred over e-mail because interactive HTML forms can be used.

    Often inexpensive to administer.

    Very fast results.

    Easy to modify.

    Response rates can be improved by using online panels - members of the panel have agreed to participate.

    If not password-protected, easy to manipulate by completing multiple times to skew results.

    Data creation, manipulation and reporting can be automated and/or easily exported into a format which can be read by PSPP, DAP or other statistical analysis software.

    Data sets created in real time.

    Some are incentive based (such as Survey Vault or YouGov).

    May skew sample towards a younger demographic compared with CATI.

    Often difficult to determine/control selection probabilities, hindering quantitative analysis of data.

    Used in large-scale industries.

    Personal in-home survey:

    Respondents are interviewed in person, in their homes (or at the front door).

    Very high cost.

    Suitable when graphic representations, smells, or demonstrations are involved.

    Often suitable for long surveys (but some respondents object to allowing strangers into their home for extended periods).

    Suitable for locations where telephone or mail services are not well developed.

    Skilled interviewers can persuade respondents to cooperate, improving response rates.

    Potential for interviewer bias.

    Personal mall intercept survey:

    Shoppers at malls are intercepted - they are interviewed on the spot, taken to a room and interviewed, or taken to a room and given a self-administered questionnaire.

    Socially acceptable - people feel that a mall is a more appropriate place to do research than their home.

    Potential for interviewer bias.

    Fast.

    Easy to manipulate by completing multiple times to skew results.

    Methods used to increase response rates:

    Brevity - single page if possible.

    Financial incentives

    o Paid in advance.

    o Paid at completion.

    Non-monetary incentives

    o Commodity giveaways (pens, notepads).

    o Entry into a lottery, draw or contest.

    o Discount coupons.

    o Promise of contribution to charity.

    Preliminary notification.

    Foot-in-the-door techniques - start with a small inconsequential request.

    Personalization of the request - address specific individuals.

    Follow-up requests - multiple requests.

    Claimed affiliation with universities, research institutions, or charities.

    Emotional appeals.

    Bids for sympathy.

    Convince respondent that they can make a difference.

    Guarantee anonymity.

    Legal compulsion (certain government-run surveys).

    Question 6: What are the functions of classification? What are the requisites of a good classification? What is a table, and how is a table useful as a mode of presenting data?

    Answer

    Collected data in raw form would be voluminous and incomprehensible. Therefore it should be condensed and simplified for better understanding and usefulness. Classification is the first stage in simplification. It can be defined as a systematic grouping of units according to their common characteristics. Each group is called a class. For example, in a survey of industrial workers of a particular industry, workers can be classified as unskilled, semiskilled and skilled, each of which forms a class.

    Types of classification:

    The most important types are:

    1) Geographical classification: Data are classified according to region.

    2) Chronological classification: Data are classified according to the time of their occurrence.

    3) Conditional classification: Data are classified according to certain conditions.

    4) Qualitative classification: Classification of data that are not measurable, e.g. sex of a person, marital status, color etc.

    5) Quantitative classification: Classification of data that are measurable either in discrete or continuous form.

    6) Statistical series: Data arranged logically according to size, time of occurrence, or some other measurable or non-measurable characteristic.

    Methods of Classification:

    Classification done according to a single attribute or variable is known as one-way classification.

    Classification done according to two attributes or variables is known as two-way classification.

    Classification done according to more than two attributes or variables is known as manifold classification.

    Examples:

    One-way classification: Number of students who secured more than 60% in various sections of the same course.

    Two-way classification: Classification of students who secured more than 60% according to sex.

    Manifold classification: Classification of employees according to skill, sex and education.
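
    The three methods differ only in how many attributes define a class, which a short Python sketch can make concrete. The employee records and the helper name `classify` are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical employee records, for illustration only.
employees = [
    {"skill": "skilled", "sex": "F", "education": "graduate"},
    {"skill": "skilled", "sex": "M", "education": "graduate"},
    {"skill": "unskilled", "sex": "M", "education": "secondary"},
    {"skill": "semiskilled", "sex": "F", "education": "secondary"},
    {"skill": "skilled", "sex": "F", "education": "secondary"},
]


def classify(records, *attributes):
    """Count the records falling into each class defined by the given attributes."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


one_way = classify(employees, "skill")                      # one attribute
two_way = classify(employees, "skill", "sex")               # two attributes
manifold = classify(employees, "skill", "sex", "education")  # more than two

for label, result in [("one-way", one_way), ("two-way", two_way), ("manifold", manifold)]:
    print(label, dict(result))
```

    Each key of the resulting counter is one class, and its value is the number of units in that class. Adding an attribute splits the existing classes further without changing the total count.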

    Statistical classification is a supervised machine learning procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc.) and based on a training set of previously labeled items.

    Formally, the problem can be stated as follows: given training data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, produce a classifier $h: \mathcal{X} \to \mathcal{Y}$ that maps any object $x \in \mathcal{X}$ to its true classification label $y \in \mathcal{Y}$ defined by some unknown mapping $g: \mathcal{X} \to \mathcal{Y}$ (ground truth). For example, if the problem is filtering spam, then $x_i$ is some representation of an email and $y$ is either "Spam" or "Non-Spam".

    The second problem is to consider classification as an estimation problem, where the goal is to estimate a function of the form

    $$P(\text{class} \mid \vec{x}) = f(\vec{x}; \vec{\theta})$$

    where the feature vector input is $\vec{x}$, and the function $f$ is typically parameterized by some parameters $\vec{\theta}$. In the Bayesian approach to this problem, instead of choosing a single parameter vector $\vec{\theta}$, the result is integrated over all possible thetas, with the thetas weighted by how likely they are given the training data $D$:

    $$P(\text{class} \mid \vec{x}) = \int f(\vec{x}; \vec{\theta}) \, P(\vec{\theta} \mid D) \, d\vec{\theta}$$

    The third problem is related to the second, but the problem is to estimate the class-conditional probabilities $P(\vec{x} \mid \text{class})$ and then use Bayes' rule to produce the class probability as in the second problem.
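
    The use of Bayes' rule in the third problem can be sketched as follows. The prior and class-conditional probabilities below are invented numbers for the spam example, and `class_posterior` is a hypothetical helper, not part of any library.

```python
def class_posterior(priors, likelihoods):
    """Bayes' rule: P(class | x) = P(x | class) * P(class) / P(x)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(x), summed over all classes
    return {c: joint[c] / evidence for c in joint}


# Hypothetical values: P(class), and P(x | class) for one observed email x.
priors = {"Spam": 0.4, "Non-Spam": 0.6}
likelihoods = {"Spam": 0.30, "Non-Spam": 0.02}

posterior = class_posterior(priors, likelihoods)
print(posterior)
```

    The email would be assigned to the class with the larger posterior probability. The quality of the result depends entirely on how well the class-conditional probabilities were estimated from the training data.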

    Table:

    In relational databases and flat file databases, a table is a set of data elements (values) that is organized using a model of vertical columns (which are identified by their name) and horizontal rows. A table has a specified number of columns, but can have any number of rows. Each row is identified by the values appearing in a particular column subset which has been identified as a candidate key. Table is another term for relation, although a table is usually a multiset (bag) of rows, whereas a relation is a set and does not allow duplicates. Besides the actual data rows, tables generally have associated with them some meta-information, such as constraints on the table or on the values within particular columns. The data in a table do not have to be physically stored in the database. Views are also relational tables, but their data are calculated at query time. Another example is nicknames, which represent a pointer to a table in another database.
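
    These ideas (named columns, a key that identifies each row, constraints on column values, and a view computed at query time) can be tried out with Python's built-in `sqlite3` module. The table name, column names and rows below are hypothetical, and SQLite is used only as a convenient example of a relational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The schema fixes the columns, their types, and constraints on the values.
# worker_id is the key chosen to identify each row.
conn.execute(
    """
    CREATE TABLE worker (
        worker_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        skill     TEXT NOT NULL
                  CHECK (skill IN ('unskilled', 'semiskilled', 'skilled'))
    )
    """
)
conn.executemany(
    "INSERT INTO worker (worker_id, name, skill) VALUES (?, ?, ?)",
    [(1, "Asha", "skilled"), (2, "Ravi", "unskilled"),
     (3, "Meena", "skilled"), (4, "John", "semiskilled")],
)

# A view stores no rows of its own; its contents are computed at query time.
conn.execute(
    "CREATE VIEW skill_counts AS "
    "SELECT skill, COUNT(*) AS n FROM worker GROUP BY skill"
)
counts = conn.execute("SELECT skill, n FROM skill_counts ORDER BY skill").fetchall()

# Reusing an existing key value violates the key constraint and is rejected.
try:
    conn.execute("INSERT INTO worker VALUES (1, 'Duplicate', 'skilled')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

    The number of columns is fixed when the table is created, while rows can be added freely as long as they satisfy the constraints.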

    Comparisons with other data structures

    In non-relational systems such as hierarchical databases, the distant counterpart of a table is a structured file, in which each row of a table corresponds to a record of the file and each column to a field within a record.

    Unlike in a spreadsheet, the data type of a field is ordinarily defined by the schema describing the table. Some relational systems are less strict about field data type definitions.

    Tabulation:

    Tabulation follows classification. It is a logical listing of related data in rows and columns. Objectives of tabulation are:

    To simplify complex data.

    To highlight important characteristics.

    To present data in minimum space.

    To facilitate comparison.

    To bring out trends and tendencies.

    To facilitate further analysis.
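
    As an illustration of tabulation following classification, the sketch below arranges a two-way classification of hypothetical worker records into rows and columns with totals. The records and the function names `tabulate` and `render` are assumptions made for this example.

```python
from collections import Counter


def tabulate(records, row_attr, col_attr):
    """Build a two-way table (list of rows) with row and column totals."""
    counts = Counter((r[row_attr], r[col_attr]) for r in records)
    rows = sorted({r[row_attr] for r in records})
    cols = sorted({r[col_attr] for r in records})
    table = [[row_attr] + cols + ["Total"]]
    for row in rows:
        cells = [counts[(row, col)] for col in cols]
        table.append([row] + cells + [sum(cells)])
    col_totals = [sum(counts[(row, col)] for row in rows) for col in cols]
    table.append(["Total"] + col_totals + [sum(col_totals)])
    return table


def render(table):
    """Format the table as aligned plain text."""
    widths = [max(len(str(row[i])) for row in table) for i in range(len(table[0]))]
    return "\n".join(
        "  ".join(str(cell).ljust(w) for cell, w in zip(row, widths))
        for row in table
    )


# Hypothetical worker records, for illustration only.
workers = [
    {"skill": "skilled", "sex": "F"},
    {"skill": "skilled", "sex": "M"},
    {"skill": "unskilled", "sex": "M"},
    {"skill": "semiskilled", "sex": "F"},
    {"skill": "skilled", "sex": "F"},
]

table = tabulate(workers, "skill", "sex")
print(render(table))
```

    The row and column totals make comparison across classes immediate, which is the main reason for presenting classified data in tabular form.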
