Statistics Both Set


  • 8/3/2019 Statistics Both Set


    MBA SEMESTER 1

    MB0040 STATISTICS FOR MANAGEMENT

    Assignment Set - 1 (60 Marks)

    Ques1: (a) Statistics is the backbone of decision-making. Comment.

    (b) Statistics is as good as the user. Comment.

    Ans (a) Statistics is the backbone of decision-making:

    With the proper application of statistics and statistical software packages to the collected data, managers can take effective decisions that can increase the profits of a business. The word "decision" suggests a deliberate choice made out of several possible alternative courses of action after carefully considering them. The act of choice signifying the solution to an economic problem is economic decision making. It involves choices among a set of alternative courses of action. Decision making is essentially a process of selecting the best out of many alternative opportunities or courses of action that are open to a management. The choices made by business executives are difficult and crucial, and they have far-reaching consequences. The basic aim of taking a decision is to select the best course of action, one which maximizes the economic benefits and minimizes the use of the scarce resources of a firm. Hence, each decision involves cost-benefit analysis. Any slight error or delay in decision making may cause considerable economic and financial damage to a firm. It is for this reason that management experts are of the opinion that right decision making at the right time is the secret of a successful manager.

    Because of advanced communication networks, rapid changes in consumer behaviour, the varied expectations of different consumers and new market openings, modern managers face the difficult task of making quick and appropriate decisions. Therefore, they need to depend more on quantitative techniques such as mathematical models, statistics, operations research and econometrics.

    Decision making is a key part of our day-to-day life. Even when we wish to purchase a television, we like to know the price, quality, durability and maintainability of various brands and models before buying one. In this scenario we are collecting data and making an optimum decision; in other words, we are using statistics. Similarly, if a company wishes to introduce a new product, it has to collect data on market potential, consumer preferences, availability of raw materials and the feasibility of producing the product. Hence, data collection is the backbone of any decision-making process.

    Many organisations find themselves rich in data but poor at drawing information from it. It is therefore important to develop the ability to extract meaningful information from raw data in order to make better decisions, and statistics plays an important role here. Statistics is broadly divided into two main categories: descriptive statistics and inferential statistics.

    Descriptive Statistics: Descriptive statistics is used to present a general description of data, summarised quantitatively. It is useful, for example, in clinical research when communicating the results of experiments.

    Inferential Statistics: Inferential statistics is used to draw valid inferences from data, which help managers and professionals make effective decisions. Statistical methods such as estimation, prediction and hypothesis testing belong to inferential statistics. Researchers draw conclusions from the collected data samples about the characteristics of the larger population from which the samples are taken. So, we can say that statistics is the backbone of decision making.
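    The difference between the two categories can be sketched in a few lines of Python. The sales figures are hypothetical, and the standard error is used only as one simple example of an inferential quantity; it assumes the stores form a random sample from a larger population.

```python
import statistics

# Hypothetical sample: daily sales (units) at 8 randomly chosen stores
sales = [120, 135, 128, 150, 142, 131, 125, 139]

# Descriptive statistics: summarise the data actually collected
summary = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev": statistics.stdev(sales),      # sample standard deviation
    "range": max(sales) - min(sales),
}
print(summary)

# Inferential statistics: treat the sample mean as an estimate of the
# population mean; its standard error measures how much that estimate
# would vary from one random sample to another
std_error = summary["stdev"] / len(sales) ** 0.5
print(f"Estimated population mean: {summary['mean']:.2f} "
      f"(standard error {std_error:.2f})")
```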

    Ans (b) Statistics is as good as the user:

    Statistics is used for various purposes. It simplifies mass data and makes comparisons easier. It also brings out trends and tendencies in the data, as well as hidden relations between variables. All of this makes decision making much easier. Let us look at each function of statistics in detail:

    1. Statistics simplifies mass data
    The use of statistical concepts helps to simplify complex data, so managers can make decisions more easily. Statistical methods reduce the complexity of the data and consequently aid the understanding of any huge mass of data.

    2. Statistics makes comparison easier
    Without statistical methods and concepts, data cannot easily be collected and compared. Statistics helps us compare data collected from different sources. Grand totals, measures of central tendency, measures of dispersion, graphs and diagrams, and the coefficient of correlation all provide ample scope for comparison.

    3. Statistics brings out trends and tendencies in the data
    Once data has been collected, the various concepts of statistics make it easy to analyse its trends and tendencies.

    4. Statistics brings out the hidden relations between variables
    Statistical analysis helps in drawing inferences from data and brings out hidden relations between variables.

    5. Decision making becomes easier
    With the proper application of statistics and statistical software packages to the collected data, managers can take effective decisions, which can increase the profits of a business.

    However, each of these functions is realised only when the methods are chosen, applied and interpreted correctly; the same tools used carelessly or with bias can lead to misleading conclusions. Seeing all these functions, we can say that statistics is only as good as its user.

    Ques2: Distinguish between the following with example.

    (a) Inclusive and Exclusive limits:

    Exclusive series is the one in which the upper limit of a class is not included in that class. For example:
    00-10, 10-20, 20-30, 30-40, 40-50
    In the first class (00-10), we consider values from 00 up to, but not including, 10; a value of exactly 10 is counted in the class 10-20. Because the upper limit is excluded, this is known as an exclusive series.

    Inclusive series is the one in which both limits are included in the class. For example:
    00-09, 10-19, 20-29, 30-39, 40-49
    Here, both 00 and 09 fall in the first class (00-09), and 10 falls in the next class (10-19).
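    The rule for placing a value in a class under each method can be written as two small Python functions. The function names and the default class width of 10 starting at 0 are assumptions chosen to match the examples; the inclusive version assumes whole-number data, as in the classes 00-09, 10-19.

```python
def exclusive_class(value, width=10, start=0):
    """Exclusive method (0-10, 10-20, ...): the upper limit is excluded,
    so the class covers lower <= value < lower + width."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width)

def inclusive_class(value, width=10, start=0):
    """Inclusive method (0-9, 10-19, ...): both limits belong to the class,
    so for whole numbers it covers lower <= value <= lower + width - 1."""
    lower = start + ((value - start) // width) * width
    return (lower, lower + width - 1)

for v in [0, 9, 10, 49]:
    print(v, "exclusive:", exclusive_class(v), "inclusive:", inclusive_class(v))
```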

    (b) Continuous and discrete data:

    The numerical data that we will use in this course falls into one of two categories: discrete and continuous.

    Data is discrete if only a finite number of values is possible, or if there is a space on the number line between any two possible values.

    Ex. A 5-question quiz is given in a Math class. The number of correct answers on a student's quiz is an example of discrete data. The number of correct answers has to be one of the following: 0, 1, 2, 3, 4 or 5. There is not an infinite number of values, so this data is discrete. Also, if we were to draw a number line and place each possible value on it, we would see a space between each pair of values.

    Ex. To obtain a taxi license in Las Vegas, a person must pass a written exam about different locations in the city. The number of attempts a person needs to pass this test is also an example of discrete data. A person could take it once, twice, 3 times, 4 times, and so on, so the possible values are 1, 2, 3, ... There are infinitely many possible values, but if we were to put them on a number line, we would see a space between each pair of values.

    Discrete data usually occurs when there is only a certain number of possible values, or when we are counting something (using whole numbers).

    Continuous data makes up the rest of numerical data. This type of data is usually associated with some sort of physical measurement.

    Ex. The height of trees at a nursery is an example of continuous data. Is it possible for a tree to be 76.2" tall? Sure. How about 76.29"? Yes. How about 76.2914563782"? You betcha! The possible recorded values depend upon the accuracy of our measuring device.

    One general way to tell whether data is continuous is to ask yourself whether the data can take on values that are fractions or decimals. If the answer is yes, the data is usually continuous.


    Ex. The length of time it takes for a light bulb to burn out is an example of continuous data. Could it take 800 hours? How about 800.7? 800.7354? The answer to all 3 is yes.

    (c) Qualitative and Quantitative data:

    Qualitative data is a categorical measurement expressed not in terms of numbers, but by means of a natural-language description. In statistics, the term is often used interchangeably with "categorical" data.

    For example:
    favorite color = "blue"
    height = "tall"

    Although we may have categories, the categories may have a structure. When there is no natural ordering of the categories, we call them nominal categories. Examples might be gender, race, religion or sport.

    When the categories can be ordered, they are called ordinal variables. Categorical variables that judge size (small, medium, large, etc.) are ordinal variables. Attitudes (strongly disagree, disagree, neutral, agree, strongly agree) are also ordinal variables, although we may not know which value is the best or worst for a given issue. Note that the distance between these categories is not something we can measure.

    Quantitative data is a numerical measurement expressed not by means of a natural-language description, but in terms of numbers. However, not all numbers are continuous and measurable. For example, a social security number is a number, but not something that one can meaningfully add or subtract.

    For example:
    favorite color = "450 nm" (the wavelength of blue light)
    height = "1.8 m"

    Quantitative data are always associated with a scale of measurement.
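    The practical consequence of the nominal/ordinal distinction can be sketched in Python: nominal data supports only a count-based summary such as the mode, while ordinal data also has a median category once a ranking is supplied. The colours, the shirt sizes and the ranking list `SIZE_ORDER` are hypothetical.

```python
import statistics

# Nominal data: no natural order, so the mode is the natural summary
favorite_colors = ["blue", "red", "blue", "green", "blue", "red"]
modal_color = statistics.mode(favorite_colors)

# Ordinal data: categories can be ranked, so a median category exists,
# although the distance between categories cannot be measured
SIZE_ORDER = ["small", "medium", "large"]          # assumed ranking, lowest first
shirt_sizes = ["medium", "small", "large", "medium", "large"]
ranks = sorted(SIZE_ORDER.index(size) for size in shirt_sizes)
median_size = SIZE_ORDER[ranks[len(ranks) // 2]]   # middle rank (odd number of items)

print("Modal colour:", modal_color)
print("Median size:", median_size)
```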

    (d) Class limits and class intervals:

    Class limits

    If we divide a set of data into classes, there will clearly be values that form dividing lines between the classes. These values are called class limits. Class limits must be chosen with considerable care, paying attention both to the form of the data and to the use to which it is to be put.

    Consider a grouped distribution of heights. Why could we not simply state the first two classes as 160-165 cm and 165-170 cm, rather than 160 to under 165 cm, etc.? The reason is that it would not be clear into which class a measurement of exactly 165 cm should be put. We could not put it into both, as this would produce double counting, which must be avoided at all costs. Is one possible solution to state the classes as 160-164 cm, 165-169 cm? This would appear to solve the problem as far as our data is concerned. But what would we do with a value of 164.5 cm? This immediately raises a question about how the raw data was recorded, for example whether it was rounded to the nearest centimetre.

    Class Intervals

    The width of a class is the difference between its two class limits and is known as the class interval. It is essential that the class interval can be used in calculations, and for this reason we need to make a slight approximation in the case of continuous variables.

    If the data has been rounded to the nearest centimetre, the true class limits of the first class (160-164 cm) in our distribution of heights are 159.5 cm and 164.4999... cm. Therefore the class interval is 4.999... cm. However, for calculation purposes we approximate slightly and say that, because the lower limit of the first class is 159.5 cm and that of the next class is 164.5 cm, the class interval of the first class is the difference between the two, i.e. 5 cm.
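    The approximation can be sketched in Python for data rounded to the nearest unit: each stated limit is widened by half a unit, and the class interval is taken as the difference between successive lower boundaries. The function name `true_limits` and the rounding unit of 1 cm are assumptions for this sketch.

```python
def true_limits(lower, upper, unit=1.0):
    """True class boundaries for data rounded to the nearest `unit`:
    half a unit below the stated lower limit and half a unit above the
    stated upper limit (164.5 stands in for 164.4999...)."""
    return lower - unit / 2, upper + unit / 2

# Stated classes 160-164 cm and 165-169 cm, heights rounded to the nearest cm
first = true_limits(160, 164)     # (159.5, 164.5)
second = true_limits(165, 169)    # (164.5, 169.5)

# Class interval used in calculations: difference of successive lower boundaries
class_interval = second[0] - first[0]
print(first, second, class_interval)
```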

    Ques 3: In a management class of 70 students, three languages are offered as an additional subject, viz. Hindi, English and Kannada. There are 28 students taking Hindi, 26 taking Kannada and 16 taking English. There are 12 students taking both Hindi and Kannada, 4 taking Hindi and English, and 6 taking English and Kannada. In addition, we know that 2 students are taking all three languages.

    i) If a student is chosen randomly, what is the probability that he/she is taking exactly one language?
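    The counts behind this question can be checked with a short Python script that applies inclusion-exclusion to the given numbers. A student taking only Hindi, for instance, is counted as H - HK - HE + HKE, because the 2 students taking all three languages are removed by both overlaps and must be added back once. The variable names are chosen for this sketch.

```python
from fractions import Fraction

total = 70
H, K, E = 28, 26, 16          # Hindi, Kannada, English
HK, HE, EK = 12, 4, 6         # students taking each pair of languages
HKE = 2                       # students taking all three

# Inclusion-exclusion: students taking at least one language
at_least_one = H + K + E - HK - HE - EK + HKE

# Students taking only one language
only_H = H - HK - HE + HKE
only_K = K - HK - EK + HKE
only_E = E - HE - EK + HKE
exactly_one = only_H + only_K + only_E

print("P(at least one) =", Fraction(at_least_one, total))
print("P(exactly one)  =", Fraction(exactly_one, total))
```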

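    Ans :- $P(H) = 28/70$, $P(E) = 16/70$, $P(K) = 26/70$

    $P(H \cap K) = 12/70$, $P(E \cap K) = 6/70$, $P(H \cap E) = 4/70$

    $P(H \cap K \cap E) = 2/70$

    The probability that a student takes at least one language is

    $P(H \cup K \cup E) = P(H) + P(E) + P(K) - P(H \cap K) - P(H \cap E) - P(E \cap K) + P(H \cap K \cap E)$

    $= 28/70 + 16/70 + 26/70 - 12/70 - 4/70 - 6/70 + 2/70 = 50/70 = 5/7$

    This counts everyone taking one, two or three languages. To count students taking exactly one language, remove those taking two or more (the 2 students taking all three are subtracted twice, so they are added back once):

    Only Hindi $= 28 - 12 - 4 + 2 = 14$, only Kannada $= 26 - 12 - 6 + 2 = 10$, only English $= 16 - 4 - 6 + 2 = 8$

    $P(\text{exactly one}) = (14 + 10 + 8)/70 = 32/70 = 16/35$

    Ques 4: List the various measures of central tendency and explain the differences between them.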

    Ans: Condensation of data is necessary for a proper statistical analysis. A large mass of numbers is not only confusing but also difficult to analyse. After a thorough scrutiny of the collected data, classification (the process of arranging data into homogeneous classes according to resemblances and similarities) is carried out first, followed by tabulation. Besides removing complexity, the classification and tabulation of the collected data make condensation and comparison possible.

    An average is defined as a value that represents the whole mass of data. It is a typical or central value summarizing the whole data. It is also called a measure of central tendency because the individual values in the data show some tendency to centre about this average. It lies between the minimum and the maximum of the values in the data. There are five types of average: the Arithmetic Mean, Median, Mode, Geometric Mean and Harmonic Mean.

    Arithmetic Mean: The arithmetic mean, or simply the mean, is the best known, most easily understood and most frequently used average in any statistical analysis. It is defined as the sum of all the values in the data divided by the number of values.

    Median: The median is another widely known and frequently used average. It is defined as the most central, or middle-most, value of the data arranged in the form of an array. By an array, we mean an arrangement of the data in either ascending or descending order of magnitude. For ungrouped data, one has to form the array first and then locate the middle-most value, which is the median. For ungrouped data with n values, the median is the $\left(\frac{n+1}{2}\right)$th value in the array.

    Mode: The word mode seems to have been derived from the French 'à la mode', which means 'that which is in fashion'. It is defined as the value that occurs most frequently in the data. For ungrouped data, we form the array and then fix the mode as the value that occurs most frequently. If all the values are distinct from each other, the mode cannot be fixed. For a frequency distribution with just one highest frequency (such data are called unimodal; data with two highest frequencies are called bimodal), the mode is found by using the formula

    $\text{Mode} = l + \frac{c \, f_2}{f_1 + f_2}$

    where $l$ is the lower limit of the modal class, $c$ is its class interval, $f_1$ is the frequency of the class preceding the modal class and $f_2$ is the frequency of the class succeeding it.
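    The five averages, the median position and the grouped-data mode formula can be checked with Python's standard `statistics` module. The data set and the grouped-distribution figures are hypothetical, `grouped_mode` is a helper written for this sketch (not a library function) that implements the formula given in the text, and `geometric_mean` needs Python 3.8 or later.

```python
import statistics

data = [4, 8, 5, 8, 10, 6, 8, 3]              # hypothetical ungrouped data

mean = statistics.mean(data)                  # sum of values / number of values
median = statistics.median(data)              # middle of the sorted array
mode = statistics.mode(data)                  # most frequent value
geometric = statistics.geometric_mean(data)   # Python 3.8+
harmonic = statistics.harmonic_mean(data)

# Position of the median in the array: the (n + 1)/2-th value.
# With n = 8 this is 4.5, i.e. halfway between the 4th and 5th values.
position = (len(data) + 1) / 2

def grouped_mode(l, c, f1, f2):
    """Mode of a frequency distribution by the formula in the text:
    l + c * f2 / (f1 + f2), where l is the lower limit of the modal class,
    c its class interval, and f1, f2 the frequencies of the classes just
    before and just after the modal class."""
    return l + c * f2 / (f1 + f2)

print(mean, median, mode, round(geometric, 2), round(harmonic, 2), position)

# Hypothetical distribution: modal class 20-30, neighbouring frequencies 7 and 5
print(round(grouped_mode(20, 10, 7, 5), 2))
```

    For positive data that are not all equal, the three means always satisfy harmonic < geometric < arithmetic, which the sketch also shows.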

    Relative merits and demerits of Mean, Median and Mode:

    Mean: The mean is the most commonly and frequently used average. It is a simple average, understandable even to a layman. It is based on all the values in the given data. It is easy to calculate and is basic to the calculation of further statistical measures of dispersion, correlation, etc. Of all the averages, it is the most stable. However, it has some demerits. It gives undue weight to extreme values; in other words, it is greatly influenced by them. Moreover, it cannot be calculated for data with open-ended classes at the extremes, and, unlike the median or the mode, it cannot be fixed graphically. It is the most useful average when the analysis is made with full reference to the nature of the individual values in the data. In spite of a few shortcomings, it is the most satisfactory average.

    Median: The median is another well-known and widely used average. It has a well-defined formula and is easily understood. It is advantageously used as a representative value of factors or qualities that cannot be measured. Unlike the mean, the median can be located graphically. It is also possible to find the median for data with open-ended classes at the extremes. However, it is not based on all the values of the given data, and it is not amenable to further algebraic treatment. It is not as stable as the mean, and it has only limited use in practice.

    Mode: The mode is a useful measure of central tendency, as a representative of the majority of values in the data. It is a practical average, easily understood even by laymen, and its calculation is not difficult. It can be ascertained even for data with open-ended classes at the extremes, and it can be located graphically using a frequency curve. However, the mode is not based on all the values in the data, and it becomes less useful when the distribution is not unimodal. Of all the averages, it is the most unstable.

    Ques5: Define population and sampling unit for selecting a random sample in each of the following cases.
    a) Hundred voters from a constituency
    b) Twenty stocks of National Stock Exchange
    c) Fifty account holders of State Bank of India
    d) Twenty employees of Tata Motors.

    Ans: Population: A population is a collection of data whose properties are analysed. The population is the complete collection to be studied; it contains all subjects of interest. A population can be defined as including all people or items with the characteristic one wishes to understand. Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.

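    A sample is a part of the population of interest, a sub-collection selected from a population. A sampling unit is the individual element (or group of elements) that is selected into the sample. For the four cases in the question:
    a) Population: all eligible voters of the constituency; sampling unit: an individual voter.
    b) Population: all stocks listed on the National Stock Exchange; sampling unit: an individual stock.
    c) Population: all account holders of State Bank of India; sampling unit: an individual account holder.
    d) Population: all employees of Tata Motors; sampling unit: an individual employee.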

    Random sampling: A sample is a subset chosen from a population for investigation. A random sample is one chosen by a method involving an unpredictable component. Random sampling can also refer to taking a number of independent observations from the same probability distribution, without involving any real population. A sample is usually not perfectly representative of the population from which it was drawn; this random variation in the results is known as sampling error. In the case of random samples, mathematical theory is available to assess the sampling error. Thus, estimates obtained from random samples can be accompanied by measures of the uncertainty associated with the estimate. This can take the form of a standard error or, if the sample is large enough for the central limit theorem to take effect, of confidence intervals.

    Types of random sample

    A simple random sample is selected so that all samples of the same size have an equal chance of being selected from the population.

    A self-weighting sample, also known as an EPSEM (Equal Probability of Selection Method) sample, is one in which every individual, or object, in the population of interest has an equal opportunity of being selected for the sample. Simple random samples are self-weighting.

    Stratified sampling involves selecting independent samples from a number of subpopulations, groups or strata within the population. Great gains in efficiency are sometimes possible from judicious stratification.

    Cluster sampling involves selecting the sample units in groups. For example, a sample of telephone calls may be collected by first taking a collection of telephone lines and then collecting all the calls on the sampled lines. The analysis of cluster samples must take into account the intra-cluster correlation, which reflects the fact that units in the same cluster are likely to be more similar than two units picked at random.
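    Stratified and cluster selection can be sketched with Python's `random` module. The population of 1,000 account holders, its three strata and the clusters of 20 consecutive units are all hypothetical, and proportional allocation across strata is one common choice assumed here, not the only one.

```python
import random

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population of 1,000 account holders in three strata
population = (
    [("urban", i) for i in range(600)]
    + [("semi-urban", i) for i in range(300)]
    + [("rural", i) for i in range(100)]
)

def stratified_sample(units, key, n):
    """Draw an independent simple random sample from each stratum, with
    sizes proportional to the strata (rounding can make the total differ
    slightly from n for other population sizes)."""
    strata = {}
    for unit in units:
        strata.setdefault(key(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))
        sample.extend(random.sample(members, k))
    return sample

def cluster_sample(clusters, m):
    """Pick m whole clusters at random and keep every unit inside them."""
    chosen = random.sample(clusters, m)
    return [unit for cluster in chosen for unit in cluster]

strat = stratified_sample(population, key=lambda u: u[0], n=50)

# Hypothetical clusters: 50 groups of 20 consecutive units (like telephone lines)
clusters = [population[i:i + 20] for i in range(0, len(population), 20)]
clust = cluster_sample(clusters, m=3)

print(len(strat), len(clust))
```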

    The most widely known type of random sample is the simple random sample (SRS). It is characterized by the fact that the probability of selection is the same for every case in the population. Simple random sampling is a method of selecting n units from a population of size N such that every possible sample of size n has an equal chance of being drawn.

    An example may make this easier to understand. Imagine you want to carry out a survey of 100 voters from a constituency with a population of 1,000 eligible voters. There are "old-fashioned" ways to draw such a sample. For example, we could write the name of each voter on a piece of paper, put all the pieces of paper into a box and draw 100 of them at random: shake the box, draw a piece of paper and set it aside, shake again, draw another, set it aside, and so on until we have 100 slips of paper. These 100 form our sample. This sample would be drawn through a simple random sampling procedure, since at each draw every name in the box has the same probability of being chosen.
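    The slips-in-a-box procedure corresponds to sampling without replacement, which Python's `random.sample` performs. The voter names are hypothetical placeholders, and the fixed seed only makes the sketch reproducible.

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical electoral roll of 1,000 eligible voters in a constituency
voters = [f"Voter-{i:04d}" for i in range(1, 1001)]

# Drawing slips from the box without replacement: random.sample gives every
# possible set of 100 voters the same chance of being selected
sample = random.sample(voters, 100)
print(len(sample), sample[:5])
```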

    If you are collecting data on a large group of people (called a "population"), you might want to minimize the impact that the survey will have on the group you are surveying. It is often not necessary to survey the entire population. Instead, you can select a random sample of people from the population and survey just them. You can then draw conclusions about how the entire population would respond, based on the responses from this randomly selected group. This is exactly what political pollsters do: they ask a group of people a list of questions and, based on the results, draw conclusions about the population as a whole, with the often-heard disclaimer of "plus or minus 5%."

    If your population consists of just a few hundred people, you might find that you need to survey almost all of them to achieve the level of accuracy you desire. As the population size increases, the percentage of people needed to achieve a high level of accuracy decreases rapidly. In other words, to achieve the same level of accuracy:

    Larger population = Smaller percentage of people surveyed
    Smaller population = Larger percentage of people surveyed
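    One standard way to make this relationship concrete is Cochran's sample-size formula with the finite population correction. The formula is not taken from this document, and the sketch assumes simple random sampling, a 95% confidence level (z = 1.96), a margin of error of plus or minus 5% and the most conservative proportion p = 0.5.

```python
import math

def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's sample-size formula with the finite population correction.
    margin: desired margin of error (0.05 = plus or minus 5%)
    z: standard normal value for the confidence level (1.96 for 95%)
    p: assumed proportion; 0.5 gives the largest (safest) sample size."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2      # about 384 for a very large population
    n = n0 / (1 + (n0 - 1) / population)         # finite population correction
    return math.ceil(n)

for N in [500, 5_000, 100_000]:
    n = required_sample_size(N)
    print(f"N = {N:>7,}: survey {n} people ({100 * n / N:.1f}% of the population)")
```

    Under these assumptions the required sample grows only slowly with N (about 218 people for N = 500, 357 for N = 5,000 and 383 for N = 100,000), so the percentage surveyed falls sharply as the population grows.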

    Ques6: What is a confidence interval, and why is it useful? What is a confidence level?

    Ans: Confidence Interval: A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data. The confidence interval is the plus-or-minus figure usually reported in newspaper or television opinion poll results. For example, if you use a confidence interval of 4 and 47% of your sample picks an answer, you can be "sure" that, had you asked the question of the entire relevant population, between 43% (47 - 4) and 51% (47 + 4) would have picked that answer. In statistics, a confidence interval (CI) is a particular kind of interval estimate of a population parameter and is used to indicate the reliability of an estimate. It is an observed interval (i.e. it is calculated from the observations), in principle different from sample to sample, that frequently includes the parameter of interest if the experiment is repeated. How frequently the observed interval contains the parameter is determined by the confidence level or confidence coefficient.

    A confidence interval with a particular confidence level is intended to give the assurance that, if the statistical model is correct, then, taken over all the data that might have been obtained, the procedure for constructing the interval would deliver a confidence interval that includes the true value of the parameter in the proportion of cases set by the confidence level. More specifically, the meaning of the term "confidence level" is that, if confidence intervals are constructed across many separate data analyses of repeated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will approximately match the confidence level; this is guaranteed by the reasoning underlying the construction of confidence intervals.
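    This repeated-sampling meaning can be checked by simulation. The sketch assumes a normal population with a known standard deviation (the population mean 100, standard deviation 15 and sample size 25 are hypothetical), builds a 95% interval from each of 10,000 simulated samples and counts how often the interval contains the true mean; the proportion should come out close to 0.95.

```python
import random
import statistics

random.seed(0)

TRUE_MEAN, SIGMA, N = 100.0, 15.0, 25          # hypothetical population and sample size
Z95 = statistics.NormalDist().inv_cdf(0.975)   # about 1.96

def z_interval(sample, sigma):
    """95% confidence interval for the mean when sigma is known."""
    m = statistics.fmean(sample)
    half_width = Z95 * sigma / len(sample) ** 0.5
    return m - half_width, m + half_width

trials = 10_000
hits = 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    low, high = z_interval(sample, SIGMA)
    hits += low <= TRUE_MEAN <= high           # True counts as 1

coverage = hits / trials
print(f"{coverage:.3f} of the intervals contained the true mean")
```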

    Example: Suppose a student measuring the boiling temperature of a certain liquid observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5 and 102.2 on 6 different samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the standard deviation for this procedure is 1.2 degrees, what is the confidence interval for the population mean at a 95% confidence level?

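    In other words, the student wishes to estimate the true mean boiling temperature of the liquid using the results of his measurements. If the measurements follow a normal distribution, then the sample mean will have the distribution $N(\mu, \sigma/\sqrt{n})$, that is, mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Since the sample size is 6, the standard deviation of the sample mean is equal to $1.2/\sqrt{6} = 0.49$. For a 95% confidence level the critical value of the standard normal distribution is 1.96, so the interval is $101.82 \pm 1.96 \times 0.49 = 101.82 \pm 0.96$, that is, approximately (100.86, 102.78) degrees Celsius.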
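    The same calculation can be reproduced in Python using only the standard library. `NormalDist().inv_cdf(0.975)` supplies the 1.96 critical value, and the known standard deviation of 1.2 degrees is taken from the example.

```python
import statistics

readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]   # degrees Celsius
sigma = 1.2                                  # known standard deviation of the procedure
z = statistics.NormalDist().inv_cdf(0.975)   # critical value, about 1.96

mean = statistics.fmean(readings)            # about 101.82
std_error = sigma / len(readings) ** 0.5     # 1.2 / sqrt(6), about 0.49
margin = z * std_error                       # about 0.96
low, high = mean - margin, mean + margin
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```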

    Confidence level: The confidence level tells you how sure you can be. It is expressed as a percentage and represents how often the true percentage of the population who would pick an answer lies within the confidence interval. Informally, the 95% confidence level means you can be 95% certain, and the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.

    Confidence level is a percentage of confidence in a finding. For example, if an insurance company's total loss reserves should be $10,000,000 in order to attain an 80% confidence level that enough money will be available to pay anticipated claims, then, in 8 times out of 10, after all claims have been settled, the total claims paid out will be less than $10,000,000. Conversely, in 2 times out of 10, the total claims paid out will be greater than $10,000,000. In another example, a 70% confidence level of one's house burning would mean that the house would burn approximately once every 3.33 years [1 / (1 - 0.70) = 3.33].

    When you put the confidence level and the confidence interval together, you can say that you are 95% sure that the true percentage of the population is between 43% and 51%.

    A confidence level can also be described as a statistical measure of the number of times out of 100 that test results can be expected to be within a specified range. For example, a confidence level of 95% means that the result of an action will probably meet expectations 95% of the time. Most analyses of variance or correlation are described in terms of some level of confidence.

    The wider the confidence interval you are willing to accept, the more certain you can be that the answers of the whole population would be within that range. For example, if you asked a sample of 1000 people in a city which brand of cola they preferred, and 60% said Brand A, you could be very certain that between 40% and 80% of all the people in the city actually prefer that brand, but you could not be so sure that between 59% and 61% of the people in the city prefer it.
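    The trade-off between width and confidence can be sketched for the cola example with the usual normal-approximation (Wald) interval for a proportion. That interval formula is an assumption of this sketch rather than something stated in the text, and it is reasonable only for large samples such as n = 1000.

```python
import statistics

def proportion_interval(p_hat, n, level):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)   # e.g. 1.96 for 0.95
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

# Cola example: 60% of 1,000 respondents preferred Brand A
intervals = {}
for level in (0.90, 0.95, 0.99):
    low, high = proportion_interval(0.60, 1000, level)
    intervals[level] = (low, high)
    print(f"{level:.0%} confidence: {low:.1%} to {high:.1%}")
```

    With these figures the 95% interval runs from roughly 57% to 63%; asking for 99% confidence widens it, and settling for 90% narrows it.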

    -*-*-*-*-


    MBA SEMESTER 1

    MB0040 STATISTICS FOR MANAGEMENT - 4 Credits (Book ID: B1129)

    Assignment Set - 2 (60 Marks)

    Note: Each question carries 10 Marks. Answer all the questions

    Ques1: What are the characteristics of a good measure of central tendency?

    Ans: Central tendency is a statistical measure that determines a single score defining the center of the distribution. The goal of central tendency is to find the single score that is most typical or most representative of the entire group.

    The three most common measures of central tendency are the mean, the median and the mode.

    The mean for a distribution is the sum of the scores divided by the number of scores.

    Sample mean = Sum of the scores / Number of scores: $M = \bar{X} = \frac{\sum X}{n}$

    Population mean = Sum of the scores / Number of scores: $\mu = \frac{\sum X}{N}$

    Sample of Psychology Faculty Salaries Population of Journalism Faculty Salaries

    (in thousands of dollars) (in thousands of dollars)

    35 4933 5142 4555 5965 5146 6046 325042

    Sample Mean = Population Mean =Interpretation Interpretation

    Some characteristics of the mean include:

    1. Every score influences the mean. Changing a score changes the mean, and adding or removing a score changes the mean (unless the score equals the mean).
    2. If a constant value is added to every score, the same constant is added to the mean. If a constant value is subtracted from every score, the same constant is subtracted from the mean. If every score is multiplied or divided by a constant, the mean changes in the same way.
    3. It is inappropriate to use the mean to summarize nominal and ordinal data; it is appropriate to use the mean to summarize interval and ratio data.
    4. If the distribution is skewed or has some outliers, the mean will be distorted.
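    These properties are easy to verify numerically. The five scores below are hypothetical, and the constants 10 and 2 and the outlier 500 are arbitrary choices for the sketch.

```python
import statistics

scores = [35, 33, 42, 55, 65]       # hypothetical scores
m = statistics.mean(scores)         # 230 / 5 = 46

# Adding a constant to every score adds the same constant to the mean
shifted = statistics.mean([x + 10 for x in scores])
# Multiplying every score by a constant multiplies the mean by that constant
scaled = statistics.mean([x * 2 for x in scores])
# Adding a score equal to the mean leaves the mean unchanged
same = statistics.mean(scores + [m])
# A single extreme score (outlier) pulls the mean towards it
with_outlier = statistics.mean(scores + [500])

print(m, shifted, scaled, same, with_outlier)
```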

    Median: If the scores in a distribution are listed in order, the median is the midpoint of the list. Half of the scores are below the median; half of the scores are above the median.

    1. Place the data in descending order. (Ascending would have worked too.)
    2. Find the score that cuts the sample into two halves.

    Age Soda Pops Consumed Today

    19 618 521 735 440 256 20

    Median = Median =Interpretation Interpretation

    Characteristics of the Median include:
    1. It is inappropriate to use the median to summarize nominal data; it is appropriate to use the median to summarize ordinal, interval and ratio data.
    2. The median depends on the number and order of the scores, not on the actual values of the scores away from the middle.
    3. The median is not distorted by outliers or extreme scores.
    4. The median is the preferred measure of central tendency when the distribution is skewed or distorted by outliers.
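    The two-step procedure for finding the median can be written as a short Python function. It handles an even number of scores by averaging the two middle scores, a standard convention that the worksheet does not state. The ages are hypothetical, and the second call illustrates characteristic 3 by replacing the largest score with an extreme value.

```python
def median(values):
    """Order the data, then take the score that cuts it into two halves
    (the average of the two middle scores when n is even)."""
    ordered = sorted(values)              # step 1: put the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # odd n: the single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n

ages = [19, 18, 21, 35, 40]               # hypothetical ages
print(median(ages))                        # middle of 18, 19, 21, 35, 40
print(median([19, 18, 21, 35, 400]))       # largest score made extreme
```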

    Mode: In a frequency distribution, the mode is the score or category that has the greatest frequency.

    Restaurant Meals During Past Week    Favorite Restaurant
    6                                    Chilis
    4                                    Charleys
    6                                    La Siesta
    7                                    La Siesta
    6                                    O Charleys
    5                                    La Siesta
    13                                   La Siesta
    8                                    Outback Steakhouse
    8                                    Taco Bell

    Mode =        Interpretation        Mode =        Interpretation

    Characteristics of the Mode include:
    1. The mode may be used to summarize nominal, ordinal, interval and ratio data.
    2. There may be more than one mode.
    3. The mode may not exist.
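    Python's `statistics.multimode` (Python 3.8+) returns every value that has the highest frequency, which fits these characteristics. The lists below re-type the two columns of the restaurant worksheet in the order they appear; the last call uses an arbitrary list of distinct values.

```python
import statistics

# Columns of the restaurant worksheet, typed in the order they appear
meals = [6, 4, 6, 7, 6, 5, 13, 8, 8]
favorites = ["Chilis", "Charleys", "La Siesta", "La Siesta", "O Charleys",
             "La Siesta", "La Siesta", "Outback Steakhouse", "Taco Bell"]

# multimode returns every value that has the highest frequency (Python 3.8+)
meal_modes = statistics.multimode(meals)            # interval/ratio data
restaurant_modes = statistics.multimode(favorites)  # nominal data
# When every value is distinct, all values tie, so there is no single mode
all_tied = statistics.multimode([12, 15, 19])

print(meal_modes, restaurant_modes, all_tied)
```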

    Relationships among the Mean, Median and Mode: The mean and median are equal if the distribution is symmetric. The mean, median and mode are equal if the distribution is unimodal and symmetric. Otherwise, they generally give different values.

    (b) What are the uses of averages?

    Ans: Definition of Average: n mathematics, an average, or central tendencyof a data set is ameasure of the "middle" value of the data set.

    There are many different descriptive statistics that can be chosen as a measurement of the central tendency of the data items. These include the arithmetic mean, the median and the mode. Other statistical measures, such as the standard deviation and the range, are called measures of spread and describe how spread out the data are.

    An average is a single value that is meant to typify a list of values. If all the numbers in the list are the same, then this number should be used. If the numbers are not the same, the average is calculated by combining the values from the set in a specific way and computing a single number as the average of the set.

    The most common method is the arithmetic mean, but there are many other measures of central tendency, such as the median, which is used most often when the distribution of the values is skewed by a small number of very high values, as with house prices or incomes.

    The mean of a list of numbers is obtained by adding up all the terms and dividing by the total number of terms in the list. In statistics, the mean is what is usually meant by the average. The formula is,

    Mean = (Sum of the given numbers) / (Total number of given numbers)


    Example 1:

    Find the mean of the numbers, 60,47,55,39,18,22.

    Solution:

    Given numbers are, 60,47,55,39,18,22.

    The formula for mean is,

    Mean = (Sum of the given numbers) / (Total number of given numbers)

    Here, the sum of all the numbers = 60 + 47 + 55 + 39 + 18 + 22

    = 241

    Total number of given numbers = 6

    Therefore, Mean = 241 / 6

    = 40.17

    Answer: Mean = 40.17
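    The arithmetic in Example 1 can be checked with a few lines of Python (a minimal sketch; the variable names are our own).

```python
numbers = [60, 47, 55, 39, 18, 22]  # the given numbers from Example 1

total = sum(numbers)   # sum of the given numbers: 241
count = len(numbers)   # total number of given numbers: 6
mean = total / count   # Mean = sum / count

print(f"Mean = {total} / {count} = {mean:.2f}")  # Mean = 241 / 6 = 40.17
```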

    Ques2: Your company has launched a new product. Your company is a reputed company with a 50% market share in a similar range of products. Your competitors also enter with their new products equivalent to your new product. Based on your earlier experience, you initially estimated that your market share of the new product would be 50%. You carry out random sampling of 25 customers who have purchased the new product and realize that only eight of them have actually purchased your product. Plan a hypothesis test to check whether you are likely to have half of the market share.

    Ans :-

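    The null hypothesis H0 : P = 0.5, tested against the one-tailed alternative H1 : P < 0.5

    Sample proportion Ps = 8/25 = 0.32

    Using the Z statistic, we have

    X = (PQ/n)^(1/2), where Q = 1 - P and n = 25 is the sample size

    Z = |P - Ps| / X

    X = (0.5 * (1 - 0.5) / 25)^(1/2) = 0.1

    Z = |0.5 - 0.32| / 0.1 = 1.8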


    Zcal = 1.8

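    At the 1% (0.01) level of significance, the one-tailed table value is Ztab = 2.33

    Since Zcal < Ztab, H0 is accepted (it cannot be rejected). Hence, at the 1% significance level, the sample is consistent with our estimate that we are likely to have half of the market share.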
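    The test can be reproduced with Python's standard library. This sketch assumes, as the worked answer does, the large-sample normal approximation and a one-tailed alternative H1 : P < 0.5 at the 1% level; it uses `statistics.NormalDist` (Python 3.8+) to compute the critical value instead of reading it from a printed table.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.5  # hypothesised market share under H0
n = 25    # customers sampled
x = 8     # customers who bought our product

ps = x / n                            # sample proportion Ps = 0.32
se = sqrt(p0 * (1 - p0) / n)          # standard error X under H0 = 0.1
z = abs(p0 - ps) / se                 # Zcal = 1.8
z_crit = NormalDist().inv_cdf(0.99)   # one-tailed 1% critical value, about 2.326

print(f"Ps={ps:.2f}  X={se:.2f}  Zcal={z:.2f}  Ztab={z_crit:.2f}")
print("reject H0" if z > z_crit else "do not reject H0")
```

    The conclusion depends on the chosen significance level: the one-tailed 5% critical value, `NormalDist().inv_cdf(0.95)`, is about 1.645, which a Zcal of 1.8 would exceed.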

    Ques3: The upper and the lower quartile incomes of a group of workers are Rs 8 and Rs 3 per day respectively. Calculate the quartile deviation and its coefficient.

    Ans: Unlike the range, the quartile deviation does not involve the extreme values. It is defined as:

    Q.D. = (Q3 - Q1) / 2

    Here, Q3 = Rs 8 and Q1 = Rs 3. Therefore,

    Q.D. = (8 - 3) / 2 = 5 / 2

    Q.D. = Rs 2.5

    To find the coefficient of quartile deviation, the formula is:

    Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1)

    Coefficient of Q.D. = (8 - 3) / (8 + 3) = 5 / 11

    Therefore, the Coefficient of Quartile Deviation is 0.45
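    A quick check of the two formulas in Python (a minimal sketch; the variable names are our own).

```python
q1 = 3  # lower quartile income, Rs per day
q3 = 8  # upper quartile income, Rs per day

quartile_deviation = (q3 - q1) / 2        # Q.D. = (Q3 - Q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)    # (Q3 - Q1) / (Q3 + Q1)

print(f"Q.D. = Rs {quartile_deviation}")              # Q.D. = Rs 2.5
print(f"Coefficient of Q.D. = {coefficient_qd:.2f}")  # Coefficient of Q.D. = 0.45
```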

    Ques4: The cost of living index number on a certain date was 200. From the base period, the percentage increases in prices were: Rent 60%, Clothing 250%, Fuel and Light 150% and Miscellaneous 120%. The weights for the different groups were Food 60, Rent 16, Clothing 12, Fuel and Light 8 and Miscellaneous 4. Calculate the percentage change in the price of food.


    Ans :-

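    Group          % increase in price   Current index (P)   Weight (w)   P * w
    Food           X                     100 + X             60           60(100 + X)
    Rent           60                    160                 16           2560
    Clothing       250                   350                 12           4200
    Fuel & light   150                   250                 8            2000
    Misc           120                   220                 4            880

    Cost of living index of current year = 200

    ΣPw / Σw = 200, where Σw = 60 + 16 + 12 + 8 + 4 = 100

    (6000 + 60X + 9640) / 100 = 200, i.e. (60X + 15640) / 100 = 200

    60X = 20000 - 15640 = 4360

    X = 72.67

    Hence, the price of food increased by about 72.67% over the base period.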
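    The weighted-index equation can be solved for X directly. This is a minimal Python sketch; the dictionary layout and names are ours, and the figures come from the question, with each current index taken as 100 plus the percentage increase (so Miscellaneous is 100 + 120 = 220).

```python
# (percentage increase in price, weight) for each non-food group, from the question
groups = {
    "Rent": (60, 16),
    "Clothing": (250, 12),
    "Fuel & light": (150, 8),
    "Misc": (120, 4),
}
food_weight = 60
target_index = 200  # cost of living index of the current year

# Sum of P * w over the non-food groups, with P = 100 + percentage increase
known_pw = sum((100 + pct) * w for pct, w in groups.values())   # 9640
total_w = food_weight + sum(w for _, w in groups.values())      # 100

# Solve (food_weight * (100 + X) + known_pw) / total_w = target_index for X
x = (target_index * total_w - known_pw) / food_weight - 100
print(f"Food prices rose by about {x:.2f}%")  # Food prices rose by about 72.67%
```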

    Ques5: Education seems to be a difficult field in which to use quality techniques. One possible outcome measure for colleges is the graduation rate (the percentage of matriculating students who graduate on time). Would you recommend using P or R charts to examine graduation rates at a school? Would this be a good measure of quality?

    Ans: The p-chart is a type of control chart used to monitor the proportion of nonconforming units in a sample, where the sample proportion nonconforming is defined as the ratio of the number of nonconforming units to the sample size, n.

    The p-chart only accommodates "pass"/"fail"-type inspection as determined by one or more go/no-go gauges or tests, effectively applying the specifications to the data before they are plotted on the chart. Because its control limits rest on binomial-distribution assumptions, p-charts are often implemented incorrectly, with control limits that are either too wide or too narrow, leading to incorrect decisions regarding process stability. A p-chart is a form of the Individuals chart.
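    A p-chart for graduation rates could be sketched as below. The cohort figures are hypothetical, and the sketch assumes the usual 3-sigma limits p̄ ± 3 * sqrt(p̄(1 - p̄)/n), recomputed for each cohort size n and clipped to [0, 1]. It tracks the proportion graduating on time; tracking the proportion not graduating on time (the "nonconforming" units) gives the mirror-image chart.

```python
from math import sqrt

# Hypothetical cohorts: (students matriculating, students graduating on time)
cohorts = [(400, 248), (420, 265), (380, 230), (410, 262), (395, 240)]


def p_chart_limits(p_bar, n):
    """3-sigma control limits for a proportion based on sample size n, clipped to [0, 1]."""
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)


# Centre line: overall proportion graduating on time across all cohorts
p_bar = sum(g for _, g in cohorts) / sum(n for n, _ in cohorts)

for n, grads in cohorts:
    p = grads / n
    lcl, ucl = p_chart_limits(p_bar, n)
    status = "in control" if lcl <= p <= ucl else "out of control"
    print(f"n={n}  p={p:.3f}  LCL={lcl:.3f}  UCL={ucl:.3f}  {status}")
```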

    In statistical quality control, the R chart is a type of control chart used to monitor variables data when samples are collected at regular intervals from a business or industrial process.

    The chart is advantageous in the following situations:

    1. The sample size is relatively small.
    2. The sample size is constant.
    3. Humans must perform the calculations for the chart.


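    The "chart" actually consists of a pair of charts: one to monitor the process standard deviation (as approximated by the sample range) and another to monitor the process mean.

    Since graduating on time is a pass/fail outcome and the graduation rate is a proportion, the p-chart, rather than the R chart (which needs measured variables data), is the appropriate chart for examining graduation rates.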
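    For comparison, the mechanics of an X-bar and R chart can be sketched as below. The subgroups of five pin diameters are hypothetical, and the control-chart constants A2 = 0.577, D3 = 0 and D4 = 2.114 are the standard tabulated values for subgroups of size 5; other subgroup sizes need other constants.

```python
# Standard control-chart constants for subgroups of size n = 5
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups):
    """(LCL, centre line, UCL) for the R chart and the X-bar chart (subgroups of 5)."""
    if any(len(s) != 5 for s in subgroups):
        raise ValueError("the constants above are only valid for subgroups of size 5")
    means = [sum(s) / len(s) for s in subgroups]
    ranges = [max(s) - min(s) for s in subgroups]  # sample range of each subgroup
    x_bar_bar = sum(means) / len(means)            # centre line of the X-bar chart
    r_bar = sum(ranges) / len(ranges)              # centre line of the R chart
    return {
        "R": (D3 * r_bar, r_bar, D4 * r_bar),
        "X-bar": (x_bar_bar - A2 * r_bar, x_bar_bar, x_bar_bar + A2 * r_bar),
    }


# Hypothetical pin-diameter subgroups (inches), five pins per subgroup
pins = [
    [0.125, 0.127, 0.124, 0.126, 0.128],
    [0.126, 0.125, 0.127, 0.124, 0.126],
    [0.124, 0.126, 0.125, 0.127, 0.125],
]
for chart, (lcl, centre, ucl) in xbar_r_limits(pins).items():
    print(f"{chart}: LCL={lcl:.4f} CL={centre:.4f} UCL={ucl:.4f}")
```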

    Ques6: (a) Why do we use a chi-square test?

    Ans: Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. A chi-square test (also chi-squared test or χ² test) is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough.

    For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test always tests what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result.

    The formula for calculating chi-square is χ² = Σ (O - E)^2 / E, where O is an observed frequency, E is the corresponding expected frequency, and the sum runs over all categories.


    The chi-square is one of the most popular statistics because it is easy to calculate and interpret. There are two kinds of chi-square tests. The first is called a one-way analysis, and the second is called a two-way analysis. The purpose of both is to determine whether the observed frequencies (counts) markedly differ from the frequencies that we would expect by chance.

    The chi-square test statistic can be used to evaluate whether there is an association between the rows and columns in a contingency table. More specifically, this statistic can be used to determine whether there is any difference between the study groups in the proportions of the risk factor of interest. For example, the chi-square statistic could be used to test whether the proportion of individuals who smoke differs by asthmatic status.


    The chi-square test statistic is designed to test the null hypothesis that there is no association between the rows and columns of a contingency table.

    Example:

    A year group in school chooses between drama and history as below. Is there any difference between boys' and girls' choices?

    Observed

                 Chose drama    Chose history    Total
    Boys         43             55               98
    Girls        52             54               106
    Total        95             109              204

    Expected = (row total * column total) / overall total

                 Chose drama    Chose history    Total
    Boys         45.6           52.4             98
    Girls        49.4           56.6             106
    Total        95             109              204


    (observed - expected)^2 / expected

                 Chose drama    Chose history
    Boys         0.15           0.13
    Girls        0.14           0.12

    Total (chi-square) = 0.55

    Chi-square is 0.55. There are (2 - 1) * (2 - 1) = 1 degree of freedom. The chi-square table value for 1 degree of freedom at the 5% level of significance is 3.84. Since 0.55 < 3.84, the null hypothesis of no association is not rejected: the data show no significant difference between boys' and girls' choices.
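    The calculation in this example can be reproduced in Python. This is a minimal standard-library sketch; the table value 3.84 (1 degree of freedom, 5% level) is typed in rather than looked up, so it is only valid for a 2 x 2 table.

```python
observed = [
    [43, 55],  # boys: chose drama, chose history
    [52, 54],  # girls: chose drama, chose history
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
overall = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / overall  # (row total * column total) / overall total
        chi_square += (obs - expected) ** 2 / expected      # (observed - expected)^2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_5pct = 3.84  # chi-square table value for 1 degree of freedom at the 5% level

print(f"chi-square = {chi_square:.2f}, df = {df}")  # chi-square = 0.55, df = 1
print("reject H0" if chi_square > critical_5pct else "do not reject H0")  # do not reject H0
```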

    (b) Why do we use analysis of variance?

    Ans: Analysis of variance (ANOVA) is a statistical technique that can be used to evaluate whether there are differences between the average value, or mean, across several population groups. With this model, the response variable is continuous in nature, whereas the predictor variables are categorical.

    For example, in a clinical trial of hypertensive patients, ANOVA methods could be used to compare the effectiveness of three different drugs in lowering blood pressure. Alternatively, ANOVA could be used to determine whether infant birth weight differs significantly between mothers who smoked during pregnancy and those who did not. In the simplest case, where two population means are being compared, ANOVA is equivalent to the independent two-sample t-test.

    Analysis of variance is the process of resolving the total variation into its separate components that measure different sources of variance. If we have to test the equality of means of more than two populations, analysis of variance is used.

    To test the equality of two population means, we use the t-test. If we have more than two populations, the t-test would have to be applied pairwise to every pair of populations. Such pairwise comparison is impractical and time consuming, so we use analysis of variance instead.

    In analysis of variance, all the populations of interest must be normally distributed. We assume that all the normal populations have equal variances. The populations from which the samples are taken are assumed to be independent.
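    A one-way ANOVA for a design like the three-drug example can be computed by hand in Python. This is a sketch: the blood-pressure reductions are hypothetical, the function name `one_way_anova` is our own, and the F statistic it returns still has to be compared with an F table value for (df_between, df_within) degrees of freedom, under the normality, equal-variance and independence assumptions just described.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical reductions in blood pressure (mmHg) under three drugs
drug_a = [10, 12, 9, 11, 13]
drug_b = [14, 15, 13, 16, 14]
drug_c = [8, 7, 9, 10, 8]

f_stat, df_b, df_w = one_way_anova([drug_a, drug_b, drug_c])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

    Because the between-group and within-group sums of squares add up to the total sum of squares about the grand mean, this is the resolution of total variation into separate components described earlier, done in one pass instead of three pairwise t-tests.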

    Three common designs for analysis of variance are the following. The completely randomized design is used when one variable is involved. When two variables are involved, the randomized complete block design is used. The Latin square design is a very effective method for three variables. In short, analysis of variance is an analysis of the variation between all of the variables used in an experiment.

    Analysis of variance is used in finance in several different ways, such as forecasting the movements of security prices by first determining which factors influence stock fluctuations. This analysis can provide valuable insight into the behavior of a security or market index under various conditions.

    The easiest way to understand ANOVA is through a concept known as value splitting. ANOVA splits the observed data values into components that are attributable to the different levels of the factors. Value splitting is best explained by example:

    The simplest example of value splitting is when we have just one level of one factor. Suppose we have a turning operation in a machine shop where we are turning pins to a diameter of .125 +/- .005 inches. Throughout the course of a day we take five samples of pins and obtain the following measurements: .125, .127, .124, .126, .128. We can split these data values into a common value (mean) and residuals (what's left over) as follows:

    .125 .127 .124 .126 .128

    =

    .126 .126 .126 .126 .126

    +

    -.001 .001 -.002 .000 .002

    From these tables, also called overlays, we can easily calculate the location and spread of the data as follows:

    mean = .126

    std. deviation (sample, dividing by n - 1) = .0016
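    The overlays can be reproduced with Python's standard `statistics` module. This sketch uses `Decimal` so that the residuals come out exactly as in the tables, and it assumes the spread is reported as the sample standard deviation (dividing by n - 1), which matches the .0016 figure.

```python
import statistics
from decimal import Decimal

# Pin diameters (inches) from the example; Decimal keeps the arithmetic exact
diameters = [Decimal(s) for s in ("0.125", "0.127", "0.124", "0.126", "0.128")]

common = statistics.mean(diameters)          # common-value overlay (the mean)
residuals = [d - common for d in diameters]  # residual overlay: what is left over
spread = statistics.stdev(diameters)         # sample standard deviation (divides by n - 1)

# Each data value is exactly the common value plus its residual
assert all(common + r == d for d, r in zip(diameters, residuals))

print("mean =", common)                                   # mean = 0.126
print("residuals =", " ".join(str(r) for r in residuals))
print("std. deviation =", round(spread, 4))               # std. deviation = 0.0016
```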

    -*-*-*-*-