Arya Statistics 11


  • 7/31/2019 Arya Statistics 11

    Quantitative Techniques 2012

    Project Report

    On Quantitative Statistics

    Submitted to:
    Prof. Venkatesh Shekhar
    Course In-charge, Managerial Statistics

    Roll No: P301311CMG216

    NIIT University, Neemrana

    Rajasthan

    Submitted by:

    Arya Pradhan
    MBA (F&B), Batch III, Term I


    Table of Contents

    Objectives
    Introduction
    Measure of Central Tendency
    Frequency Distribution
    Probability on the Curve
    Sample Probability
    Hypothesis Testing for Single Population
    Single Factor Anova
    Hypothesis Testing for 2 Population
    F-test
    Conclusion


    Objective

    This project uses statistics as a tool to explore data collected on the time spent on
    Moodle during May and June 2012. It aims to summarise and interpret the data in the
    correct perspective using statistical models and formulae.

    The inferences drawn from the statistical results are used to understand concepts such as:

    Measure of Central Tendency

    Measure of Dispersion

    Concept of Outliers

    Frequency Distribution

    Probability on the Curve

    Sample Probability

    Hypothesis Testing for Single Population

    Hypothesis Testing for 2 Population

    Single Factor Anova

    F-test


    Introduction

    Statistics is the study of the collection, organization, analysis, and interpretation of data. It

    deals with all aspects of this, including the planning of data collection in terms of the design

    of surveys and experiments.

    A statistician is someone who is particularly well versed in the ways of thinking necessary for

    the successful application of statistical analysis. Such people have often gained this

    experience through working in any of a wide number of fields. There is also a discipline

    called mathematical statistics that studies statistics mathematically.

    Statistical methods can be used for summarizing or describing a collection of data; this is

    called descriptive statistics. This is useful in research, when communicating the results of

    experiments. In addition, patterns in the data may be modelled in a way that accounts for

    randomness and uncertainty in the observations, and are then used for drawing inferences

    about the process or population being studied; this is called inferential statistics. Inference is

    a vital element of scientific advance, since it provides a means for drawing conclusions from

    data that are subject to random variation.


    POPULATION DATA SET

    The population data consists of observations of the time spent on Moodle on each day, with
    each observation serving as a separate member of the overall group. In short, it is the
    complete set of data on which the statistical analysis is conducted.

    The table below shows the data collected over the last two months.

    Date      Time spent on Moodle
    01-May    0.0
    02-May    0.0
    03-May    0.0
    04-May    0.0
    05-May    2.5
    06-May    5.0
    07-May    7.5
    08-May    2.7
    09-May    6.9
    10-May    7.0
    11-May    0.0
    12-May    15.5
    13-May    2.5
    14-May    2.0
    15-May    2.7
    16-May    8.6
    17-May    0.0
    18-May    0.0
    19-May    2.5
    20-May    15.0
    21-May    5.3
    22-May    4.4
    23-May    16.8
    24-May    6.7
    25-May    0.0
    26-May    7.1
    27-May    17.5
    28-May    0.0
    29-May    10.5
    30-May    4.9
    31-May    6.9
    01-Jun    8.1
    02-Jun    11.2
    03-Jun    6.9
    04-Jun    7.3
    05-Jun    0.0
    06-Jun    16.8
    07-Jun    8.1
    08-Jun    19.9
    09-Jun    8.1
    10-Jun    6.9
    11-Jun    16.8
    12-Jun    7.0
    13-Jun    5.3
    14-Jun    10.6
    15-Jun    0.0
    16-Jun    6.7
    17-Jun    0.0
    18-Jun    3.9
    19-Jun    0.0

    DESCRIPTIVE STATISTICS:

    Descriptive statistics summarize the population data by describing what was observed,
    numerically or graphically. Numerical descriptors include the mean and standard deviation
    for continuous data (like heights or weights), while frequency and percentage are more
    useful for describing categorical data. The table below presents the descriptive analysis
    of the population data; the inferences follow.

    Mode 0.000

    Median 6.003

    Mean 6.086

    Qmin 0.000

    Q1 0.508

    Q2 6.003

    Q3 8.090

    Qmax 19.926

    Variance 30.627

    Standard Deviation 5.534

    Mean Absolute Deviation 4.337

    Coefficient Of Variation 91%

    Skewness 0.817625
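The central-tendency and quartile figures in the table can be approximately reproduced from the daily values listed earlier. The following is a minimal sketch, not the report's own method: it assumes the values as printed (rounded to one decimal place) and uses the "inclusive" quantile method from Python's standard library, so results may differ from the table in the third decimal place.

```python
import statistics

# Daily time spent on Moodle, 01-May to 19-Jun 2012, as printed in the
# data table (rounded to one decimal place).
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9, 8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1,
    6.9, 16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

mean = statistics.mean(times)
median = statistics.median(times)   # average of the 25th and 26th sorted values
mode = statistics.mode(times)       # most frequent value
# "inclusive" interpolates between data points, treating the data as the
# whole population (an assumption about how the quartiles were computed).
q1, q2, q3 = statistics.quantiles(times, n=4, method="inclusive")
data_range = max(times) - min(times)

print(f"mean={mean:.3f} median={median:.3f} mode={mode:.3f}")
print(f"Q1={q1:.3f} Q2={q2:.3f} Q3={q3:.3f} range={data_range:.1f}")
```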

    INFERENCES

    Range - In descriptive statistics, the range is the length of the smallest interval that
    contains all the data. It is calculated by subtracting the smallest observation (the
    minimum) from the greatest (the maximum) and provides an indication of statistical
    dispersion. In our case the range of time spent on Moodle is 19.9 - 0.0 = 19.9.

    Mean - For a data set, the mean is the sum of the values divided by the number of values.
    The mean of a set of numbers x1, x2, ..., xn is typically denoted by x̄, pronounced "x bar".
    This mean is a type of arithmetic mean. If the data set were based on a series of
    observations obtained by sampling a statistical population, this mean is termed the
    "sample mean" to distinguish it from the "population mean". In our case the population
    mean is 6.086, which is the average daily time spent on Moodle.

    Median - The median of a set of data values is the middle value of the data set when it has
    been arranged in ascending order, that is, from the smallest value to the highest. With an
    even number of values, it is the average of the two middle values. In our case the median
    is 6.003.

    Outlier - An outlying observation, or outlier, is one that appears to deviate markedly from

    other members of the sample in which it occurs.

    Outliers can occur by chance in any distribution, but they are often indicative either of

    measurement error or that the population has a heavy-tailed distribution. In the former case

    one wishes to discard them or use statistics that are robust to outliers, while in the latter

    case they indicate that the distribution has high kurtosis and that one should be very

    cautious in using tools or intuitions that assume a normal distribution.

                Q min    Q1       Q2       Q3       Q max
    Quartile    0.0      0.508    6.003    8.090    19.926
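The report does not say which rule it used to identify outliers. As an illustration only, the sketch below applies one common convention, Tukey's 1.5 × IQR fences, to the daily values as printed in the data table; the fence multiplier and the "inclusive" quartile method are assumptions, not taken from the report.

```python
import statistics

# Daily time spent on Moodle as printed in the data table (rounded values).
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9, 8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1,
    6.9, 16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

q1, _, q3 = statistics.quantiles(times, n=4, method="inclusive")
iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # assumed Tukey multiplier of 1.5
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences is flagged as a potential outlier.
outliers = [x for x in times if x < lower_fence or x > upper_fence]
print(f"fences: [{lower_fence:.2f}, {upper_fence:.2f}], outliers: {outliers}")
```

Under this rule only the maximum value is flagged; a different multiplier or quartile method could flag more or fewer points.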

    Standard Deviation - In statistics, the standard deviation (represented by the symbol σ)
    shows how much variation or "dispersion" exists from the average (mean, or expected value).
    A low standard deviation indicates that the data points tend to be very close to the mean,
    whereas a high standard deviation indicates that the data points are spread out over a
    large range of values. Here the standard deviation is 5.534, which is 91% of the mean
    (the coefficient of variation), so the daily times are widely dispersed.

    Skewness - It is a measure of the asymmetry of the probability distribution of a
    real-valued random variable. The skewness value can be positive or negative, or even
    undefined. Qualitatively, a negative skew indicates that the tail on the left side of the
    probability density function is longer than the right side and the bulk of the values lie
    to the right of the mean. A positive skew indicates that the tail on the right side is
    longer than the left side and the bulk of the values lie to the left of the mean. A zero
    value indicates that the values are relatively evenly distributed on both sides of the
    mean, typically but not necessarily implying a symmetric distribution. In this case the
    data is positively skewed (skewness 0.818).
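The dispersion and skewness figures can be checked in the same way. This is a sketch under stated assumptions: it uses the rounded daily values from the data table, the sample (n - 1) variance, and the adjusted Fisher-Pearson skewness formula that spreadsheet tools commonly use. The report does not state which formulas it applied, although the n - 1 versions come closest to its figures.

```python
import math
import statistics

# Daily time spent on Moodle as printed in the data table (rounded values).
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9, 8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1,
    6.9, 16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

n = len(times)
mean = statistics.mean(times)
sample_var = statistics.variance(times)   # divides by n - 1
sample_sd = math.sqrt(sample_var)
pop_sd = statistics.pstdev(times)         # divides by n, shown for comparison
cv = sample_sd / mean                     # coefficient of variation
mad = sum(abs(x - mean) for x in times) / n   # mean absolute deviation

# Adjusted Fisher-Pearson skewness (assumed formula, see lead-in).
skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sample_sd) ** 3 for x in times)

print(f"variance={sample_var:.3f} sd={sample_sd:.3f} (population sd={pop_sd:.3f})")
print(f"CV={cv:.0%} MAD={mad:.3f} skewness={skew:.3f}")
```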

    Frequency Distribution - In statistics, a frequency distribution is an arrangement of the
    values that one or more variables take in a sample. Each entry in the table contains the
    frequency or count of the occurrences of values within a particular group or interval, and
    in this way, the table summarizes the distribution of values in the sample.

    Frequency distributions are used for both qualitative and quantitative data. From the
    histogram we can infer that most of the daily values of time spent on Moodle lie within
    the 0 - 0.5 minutes bucket. From this we can infer that, most of the time, Moodle was used
    only for downloading the study material.
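A frequency distribution of this kind can be built by counting how many daily values fall into each bucket. The sketch below assumes buckets of width 0.5 (inferred from the "0 - 0.5" bucket mentioned above) that include their lower edge, and uses the rounded values from the data table; the report's actual bucket boundaries are not given.

```python
from collections import Counter

# Daily time spent on Moodle as printed in the data table (rounded values).
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9, 8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1,
    6.9, 16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

BIN_WIDTH = 0.5  # assumed bucket width

# Bucket index k covers the interval [k * BIN_WIDTH, (k + 1) * BIN_WIDTH).
bin_counts = Counter(int(x // BIN_WIDTH) for x in times)
busiest_bin, busiest_count = bin_counts.most_common(1)[0]

for k in sorted(bin_counts):
    low, high = k * BIN_WIDTH, (k + 1) * BIN_WIDTH
    print(f"{low:4.1f} - {high:4.1f}: {'#' * bin_counts[k]}")
```

The tallest bar is the first bucket, which holds the days with no recorded time.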

    Random Sampling - A random sample is one chosen by a method involving an

    unpredictable component. Random sampling can also refer to taking a number of

    independent observations from the same probability distribution, without involving any real

    population.

    The random sample drawn in this case is:

    No.    Random Sample
    1      6.71
    2      2.50
    3      2.70
    4      7.04
    5      7.14
    6      7.14
    7      2.70
    8      4.93
    9      0.00
    10     0.00
    11     0.00
    12     6.71
    13     7.30
    14     16.81
    15     8.09
    16     8.09
    17     0.00
    18     17.50
    19     7.14
    20     0.00
    21     0.00
    22     0.00
    23     6.89
    24     11.19
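The report does not describe how its sample was drawn. Since some values that occur only once in the population appear several times in the sample, sampling with replacement is a plausible reading. The sketch below shows one way to draw such a sample of 24 values; the seed is arbitrary, so the draw will not match the table above.

```python
import random

# Daily time spent on Moodle as printed in the data table (rounded values).
times = [
    0.0, 0.0, 0.0, 0.0, 2.5, 5.0, 7.5, 2.7, 6.9, 7.0,
    0.0, 15.5, 2.5, 2.0, 2.7, 8.6, 0.0, 0.0, 2.5, 15.0,
    5.3, 4.4, 16.8, 6.7, 0.0, 7.1, 17.5, 0.0, 10.5, 4.9,
    6.9, 8.1, 11.2, 6.9, 7.3, 0.0, 16.8, 8.1, 19.9, 8.1,
    6.9, 16.8, 7.0, 5.3, 10.6, 0.0, 6.7, 0.0, 3.9, 0.0,
]

SAMPLE_SIZE = 24
rng = random.Random(2012)   # arbitrary seed, only for repeatability

# choices() samples with replacement, so a day can be picked more than once.
sample = rng.choices(times, k=SAMPLE_SIZE)
sample_mean = sum(sample) / len(sample)

print(sample)
print(f"sample mean = {sample_mean:.2f}")
```

Using `rng.sample(times, SAMPLE_SIZE)` instead would draw without replacement.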

    [Figure: histogram titled "Frequency Distribution"; vertical axis: Frequency (0 to 14); horizontal axis ticks: 1 to 39]


    Probability for the Population and Interval Estimates:

    Let us consider the time spent on Moodle as an example. The probability of spending less
    than 4 minutes is 43.15%. We now estimate the population mean from the sample mean using
    the confidence interval approach; the details are as follows.

    ESTIMATING POPULATION MEAN FROM SAMPLE

    Confidence Interval 95%

    t Value 2.07

    Sample Mean 5.44

    Sample Size 24

    Point Estimator 5.44

    Interval Estimate (Upper value) 7.66

    Interval Estimate (Lower value) 3.22

    Population Mean 6.09
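The interval estimate has the form: sample mean ± t × (s / √n). The sketch below applies it to the 24 sample values listed in the random sample table, taking the critical t value of 2.069 (95%, 23 degrees of freedom) from the report. It assumes the sample standard deviation s is used; the report does not say which standard deviation it used, so the bounds computed here need not match the table exactly.

```python
import math
import statistics

# The 24 values from the random sample table.
sample = [
    6.71, 2.50, 2.70, 7.04, 7.14, 7.14, 2.70, 4.93,
    0.00, 0.00, 0.00, 6.71, 7.30, 16.81, 8.09, 8.09,
    0.00, 17.50, 7.14, 0.00, 0.00, 0.00, 6.89, 11.19,
]

T_CRIT = 2.069            # two-tailed, 95%, df = 23 (value quoted in the report)
POPULATION_MEAN = 6.09    # from the descriptive statistics

n = len(sample)
sample_mean = statistics.mean(sample)              # the point estimate
std_error = statistics.stdev(sample) / math.sqrt(n)
margin = T_CRIT * std_error

lower, upper = sample_mean - margin, sample_mean + margin
print(f"point estimate = {sample_mean:.2f}")
print(f"95% interval   = ({lower:.2f}, {upper:.2f})")
print("population mean inside interval:", lower <= POPULATION_MEAN <= upper)
```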

    HYPOTHESIS TESTING ABOUT SINGLE POPULATION

    H0: μ = 4.80
    H1: μ ≠ 4.80
    t calculated                          -0.60
    Confidence level                      95%
    t critical value (two-tailed test)    ±2.069

    The null hypothesis cannot be rejected.

    The population mean is within the confidence interval of 2.60 minutes to 7.23 minutes.

    Hypothesis Test (Single Population)

    Let the null hypothesis be μ = 4.80 and the alternate hypothesis be μ ≠ 4.80.

    Since the sample size is less than 30, we have used the t-distribution. We took a random
    sample of 24 values and found the sample mean to be 4.92 minutes. Using the
    t-distribution, the calculated t value is 0.11, whose magnitude is less than the critical
    value of 2.069 for a two-tailed test (confidence level = 95%, degrees of freedom = 23).
    Since the calculated t lies within the acceptance region, we do not reject the null
    hypothesis (μ = 4.80).
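The decision rule can be sketched as follows: compute t = (x̄ - μ0) / (s / √n) and reject the null hypothesis only if |t| exceeds the critical value. This example uses the 24 values listed in the random sample table and the critical value 2.069 quoted in the report; because it works from that listed sample, its t value need not equal the figures quoted above, but the decision is the same.

```python
import math
import statistics

# The 24 values from the random sample table.
sample = [
    6.71, 2.50, 2.70, 7.04, 7.14, 7.14, 2.70, 4.93,
    0.00, 0.00, 0.00, 6.71, 7.30, 16.81, 8.09, 8.09,
    0.00, 17.50, 7.14, 0.00, 0.00, 0.00, 6.89, 11.19,
]

MU_0 = 4.80      # hypothesized population mean
T_CRIT = 2.069   # two-tailed, 95%, df = 23 (value quoted in the report)

n = len(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)
t_stat = (statistics.mean(sample) - MU_0) / std_error

reject_null = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, critical = ±{T_CRIT}")
print("reject H0" if reject_null else "fail to reject H0")
```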

    2 Population Tests:

    A Z-test is any statistical test for which the distribution of the test statistic under the null
    hypothesis can be approximated by a normal distribution. Because of the central limit

    theorem, many test statistics are approximately normally distributed for large samples. For

    each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two

    tailed) which makes it more convenient than the Student's t-test which has separate critical

    values for each sample size. Therefore, many statistical tests can be conveniently performed

    as approximate Z-tests if the sample size is large or the population variance known. If the

    population variance is unknown (and therefore has to be estimated from the sample itself)

    and the sample size is not large, the Student t-test may be more appropriate.

    We now consider the two-sample test, taking samples from the populations of time spent by
    Devesh and Dennis. The Z-distribution is used to find whether there is any difference in
    the time spent on Moodle by the two persons.

    Let the null hypothesis be H0: μ1 = μ2 and the alternate hypothesis be Ha: μ1 ≠ μ2.

    z-Test: Two Sample for Means

                                    Dennis    Devesh
    Mean                            5.09      3.52
    Known Variance                  26.56     21.32
    Observations                    31.00     31.00
    Hypothesized Mean Difference    0.00
    z                               1.26
    P(Z
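The z statistic in the table follows from z = (x̄1 - x̄2) / √(σ1²/n1 + σ2²/n2). The sketch below recomputes it from the summary figures in the table for the two persons compared in the text (Dennis and Devesh); the two-tailed p-value and the 5% significance level are this example's own additions, since that part of the report's table is cut off.

```python
import math
from statistics import NormalDist

# Summary figures from the z-test table: (mean, known variance, observations).
mean_1, var_1, n_1 = 5.09, 26.56, 31   # Dennis
mean_2, var_2, n_2 = 3.52, 21.32, 31   # Devesh
HYPOTHESIZED_DIFF = 0.0
ALPHA = 0.05                           # assumed significance level

std_error = math.sqrt(var_1 / n_1 + var_2 / n_2)
z = (mean_1 - mean_2 - HYPOTHESIZED_DIFF) / std_error

std_normal = NormalDist()              # mean 0, standard deviation 1
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
z_crit = std_normal.inv_cdf(1 - ALPHA / 2)

print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_two_tailed:.3f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```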


    chance of committing a type I error. For this reason, ANOVAs are useful in comparing two,
    three, or more means. Based on the above samples, we test the following hypotheses.

    H0: μ1 = μ2 = μ3
    H1: At least one of the population means differs from the others.

    Anova: Single Factor

    SUMMARY
    Groups    Count    Sum       Average    Variance
    Dennis    31       157.90    5.09       28.26
    Devesh    31       109.18    3.52       26.30
    Arya      31       236.39    7.62       28.80

    ANOVA
    Source of Variation    SS         df    MS        F       P-value    F crit
    Between Groups         265.77     2     132.88    4.78    0.0106     3.098
    Within Groups          2501.42    90    27.79
    Total                  2767.20    92

    Since F calculated (4.78) is greater than F critical (3.098), we reject the null
    hypothesis. This means that the mean time spent on Moodle differs for at least one of the
    three persons.
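The F statistic can be rebuilt from the group summary alone: the between-group sum of squares is Σ n_i (x̄_i - grand mean)², the within-group sum of squares is Σ (n_i - 1) s_i², and F is the ratio of the two mean squares. The sketch below works from the rounded averages and variances in the SUMMARY table, so its sums of squares differ slightly from the ANOVA table; the critical value 3.098 is taken from the report.

```python
# Group summary from the ANOVA table: name -> (count, average, variance).
groups = {
    "Dennis": (31, 5.09, 28.26),
    "Devesh": (31, 3.52, 26.30),
    "Arya": (31, 7.62, 28.80),
}
F_CRIT = 3.098   # critical value quoted in the report

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * avg for n, avg, _ in groups.values()) / total_n

# Between-group variation: how far each group average is from the grand mean.
ss_between = sum(n * (avg - grand_mean) ** 2 for n, avg, _ in groups.values())
# Within-group variation: recovered from each group's sample variance.
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F = {f_stat:.2f} (df {df_between}, {df_within}), F crit = {F_CRIT}")
print("reject H0" if f_stat > F_CRIT else "fail to reject H0")
```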

    F-Test:

    An F-test is any statistical test in which the test statistic has an F-distribution under
    the null hypothesis. It is most often used when comparing statistical models that have been
    fit to a data set, in order to identify the model that best fits the population from which
    the data were sampled. Exact F-tests mainly arise when the models have been fit to the
    data using least squares.

    F-TEST TWO-SAMPLE FOR VARIANCES

                    Devesh    Arya
    Mean            3.52      7.63
    Variance        26.31     28.81
    Observations    31        31
    df              30        30
    F               0.91
    P(F

    The F-test does not reject the null hypothesis in the above case, as F calculated < F
    critical. Hence the null hypothesis H0: σ1² = σ2² is accepted.
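For the two-sample variance test, the statistic is simply the ratio of the two sample variances, with n - 1 degrees of freedom for each sample. A minimal sketch using the figures from the table above (the variable names are illustrative, and the critical value is not computed here because the report's own value is cut off):

```python
# Sample variances and observation counts from the F-test table.
var_1, n_1 = 26.31, 31
var_2, n_2 = 28.81, 31

f_stat = var_1 / var_2            # ratio of the two sample variances
df_1, df_2 = n_1 - 1, n_2 - 1     # degrees of freedom for each sample

print(f"F = {f_stat:.2f} with ({df_1}, {df_2}) degrees of freedom")
```

A ratio close to 1 means the two sample variances are similar, which is consistent with not rejecting the hypothesis of equal variances.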

    CONCLUSION:

    We have carried out a statistical analysis of the time spent on Moodle by the members of
    the group. The time-spending patterns of the members are out of sync, as the measures of
    dispersion are wide.