2. Basic_statistics 30 Sep 2013.pdf


  • 8/14/2019 2. Basic_statistics 30 Sep 2013.pdf


    Basic Statistics - Concepts and Examples


    Elementary Concepts

    Variables: Variables are things that we measure, control, or manipulate in research. They differ in many respects, most notably in the role they are given in our research and in the type of measures that can be applied to them.

    Observational vs. experimental research. Most empirical research belongs clearly to one of those two general categories. In observational research we do not (or at least try not to) influence any variables but only measure them and look for relations (correlations) between some set of variables. In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables.

    Dependent vs. independent variables. Independent variables are those that are manipulated, whereas dependent variables are only measured or registered.


    Systematic and Random Errors

    Error: Defined as the difference between a calculated or observed value and the true value.

    Blunders: Usually apparent either as obviously incorrect data points or as results that are not reasonably close to the expected value. Easy to detect.

    Systematic Errors: Errors that occur reproducibly from faulty calibration of equipment or observer bias. Statistical analysis is generally not useful; rather, corrections must be made based on experimental conditions.

    Random Errors: Errors that result from the fluctuations in observations. Experiments must be repeated a sufficient number of times to establish the precision of measurement.
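The distinction between systematic and random errors can be illustrated with a small simulation. This is only a sketch under assumed numbers: the true value, the constant bias, the noise level, and the helper name `simulate_readings` are all hypothetical and chosen for illustration.

```python
import random
import statistics


def simulate_readings(true_value, bias, noise_sd, n, seed=0):
    """Return n readings: true value + constant bias + Gaussian noise (all assumed)."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


true_value = 10.0  # hypothetical true value
readings = simulate_readings(true_value, bias=0.3, noise_sd=0.5, n=10_000)

mean_reading = statistics.fmean(readings)
scatter = statistics.stdev(readings)   # reflects the random error
offset = mean_reading - true_value     # reflects the systematic error

print(f"scatter of single readings: {scatter:.3f}")
print(f"offset of the average from the true value: {offset:.3f}")
```

Repeating the measurement many times pins down the scatter, but the offset of the average stays close to the assumed bias; no amount of averaging removes it.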


    Uncertainties

    In most cases we cannot know what the true value is unless there is an independent determination (i.e., a different measurement technique).

    We can only consider estimates of the error.

    Discrepancy is the difference between two or more observations. This gives rise to uncertainty.

    Probable Error: Indicates the magnitude of the error we estimate to have made in the measurements. It means that if we make a measurement, we probably won't be wrong by more than that amount.


    some univariate statistical terms:

    mode: value that occurs most frequently in a distribution

    (usually the highest point of curve)

    may have more than one mode in a dataset

    median: value midway in the frequency distribution

    half the area of curve is to right and other to left

    mean: arithmetic average

    sum of all observations divided by # of observations

    poor measure of central tendency in skewed distributions

    range: measure of dispersion about the mean (maximum minus minimum)

    when max and min are unusual values, range may be

    a misleading measure of dispersion
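A small sketch of these four terms, using Python's standard `statistics` module on a made-up dataset (the numbers are invented for illustration, with one unusually large value to show the effect of skew):

```python
import statistics

# Hypothetical observations; 40 is an unusual value that skews the data.
data = [2, 3, 3, 4, 5, 5, 5, 6, 40]

modes = statistics.multimode(data)     # most frequent value(s); may be more than one
median = statistics.median(data)       # middle value of the sorted data
mean = statistics.fmean(data)          # sum of observations / number of observations
data_range = max(data) - min(data)     # maximum minus minimum

print("mode(s):", modes)
print("median:", median)
print("mean:", round(mean, 3))
print("range:", data_range)
```

In this example the single large value pulls the mean well above the median and inflates the range, which is the behaviour the slide warns about.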


    Distribution vs. Sample Size


    A histogram is a useful graphic representation of the information content of a sample or parent population.

    Many statistical tests assume values are normally distributed. This is not always the case! Examine data prior to processing.

    from: Jensen, 1996


    Deviations

    The deviation, $d_i$, of any measurement $x_i$ from the mean $\mu$ of the parent distribution is defined as the difference between $x_i$ and $\mu$:

    $$d_i = x_i - \mu$$

    The average deviation, $\alpha$, is defined as the average of the magnitudes (absolute values) of the deviations:

    $$\alpha = \lim_{N \to \infty} \left[ \frac{1}{N} \sum_{i=1}^{N} \left| x_i - \mu \right| \right]$$
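As a rough illustration of the average deviation, the sketch below evaluates the finite-sample version of the formula (without the limit of infinitely many observations). The measurements and the assumed parent mean are invented, and `average_deviation` is a hypothetical helper name.

```python
def average_deviation(values, center):
    """Average of the absolute deviations of the values from a given center."""
    return sum(abs(x - center) for x in values) / len(values)


# Hypothetical measurements and an assumed parent mean.
values = [9.8, 10.1, 10.0, 10.3, 9.8]
mu = 10.0

deviations = [x - mu for x in values]
signed_average = sum(deviations) / len(deviations)
alpha = average_deviation(values, mu)

print("average of signed deviations:", round(signed_average, 6))
print("average deviation:", round(alpha, 6))
```

The signed deviations cancel here because the assumed mean equals the average of the data, which is why the magnitudes are averaged instead.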


    variance: average squared deviation of all possible observations from the mean (calculated from the sum of squares)

    $$\sigma^2 = \lim_{N \to \infty} \left[ \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \right]$$

    $$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

    where: $\mu$ is the mean ($\bar{x}$ when it is estimated from the sample), $x_i$ is the observed value, and $N$ is the number of observations

    The divisor is decreased from $N$ to $N - 1$ for the sample variance because the mean, itself estimated from the data, is used in the calculation.

    standard deviation: positive square root of the variance

    small std dev: observations are clustered tightly around a central value

    large std dev: observations are scattered widely about the mean
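The difference between dividing by N and by N - 1 can be checked on a small made-up dataset. This sketch computes both by hand and compares them with `statistics.pvariance` and `statistics.variance` from the Python standard library; the data values are invented.

```python
import math
import statistics

# Hypothetical observations.
data = [4.0, 7.0, 13.0, 16.0]
n = len(data)
mean = sum(data) / n

sum_of_squares = sum((x - mean) ** 2 for x in data)
variance_n = sum_of_squares / n               # divisor N
variance_n_minus_1 = sum_of_squares / (n - 1)  # divisor N - 1 (sample variance)
sample_std = math.sqrt(variance_n_minus_1)     # positive square root of the variance

print("divisor N:    ", variance_n, statistics.pvariance(data))
print("divisor N - 1:", variance_n_minus_1, statistics.variance(data))
print("sample std dev:", round(sample_std, 4))
```

For small N the two divisors give noticeably different results; the gap shrinks as the number of observations grows.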


    Sample Mean and Standard Deviation

    For a series of $N$ observations, the most probable estimate of the mean $\mu$ is the average $\bar{x}$ of the observations. We refer to this as the sample mean $\bar{x}$ to distinguish it from the parent mean $\mu$.

    Sample Mean:

    $$\mu \simeq \bar{x} = \frac{1}{N} \sum x_i$$

    Our best estimate of the standard deviation $\sigma$ would be from:

    $$\sigma^2 \simeq \frac{1}{N} \sum (x_i - \mu)^2 \simeq \frac{1}{N} \sum x_i^2 - \mu^2$$

    But we cannot know the true parent mean $\mu$, so the best estimate of the sample variance and standard deviation would be:

    Sample Variance:

    $$\sigma^2 \simeq s^2 = \frac{1}{N - 1} \sum (x_i - \bar{x})^2$$
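As a short worked step (added for illustration, not taken from the slides): when the sample mean $\bar{x}$ is used in place of the parent mean, the sum of squared deviations expands exactly into a "mean of squares minus square of the mean" form, using only $\bar{x} = \frac{1}{N}\sum x_i$.

```latex
\begin{aligned}
\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2
  &= \frac{1}{N}\sum_{i=1}^{N} x_i^2
     - 2\bar{x}\,\frac{1}{N}\sum_{i=1}^{N} x_i
     + \bar{x}^2 \\
  &= \frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\bar{x}^2 + \bar{x}^2 \\
  &= \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2
\end{aligned}
```

Multiplying this result by $N/(N-1)$ gives the sample variance with the $N - 1$ divisor.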


    Distributions

    Binomial Distribution: Allows us to define the probability, p, of observing a specific combination, x, of n items, which is derived from the fundamental formulas for the permutations and combinations.

    Permutations: Enumerate the number of permutations, Pm(n,x), of coin flips, when we pick up the coins one at a time from a collection of n coins and put x of them into the heads box.

    $$Pm(n, x) = \frac{n!}{(n - x)!}$$

    $$n! = n(n-1)(n-2) \cdots (3)(2)(1), \qquad 1! = 1, \qquad 0! = 1$$


    Distributions - cont.

    Combinations: Relates to the number of ways we can combine the various permutations enumerated above from our coin flip experiment. Thus the number of combinations is equal to the number of permutations divided by the degeneracy factor x! of the permutations.

    $$C(n, x) = \frac{Pm(n, x)}{x!} = \frac{n!}{x!\,(n - x)!} = \binom{n}{x}$$
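The permutation and combination counts can be checked numerically with `math.perm` and `math.comb` from the Python standard library (available from Python 3.8). The values n = 5 and x = 2 are arbitrary choices for illustration.

```python
from math import comb, factorial, perm

n, x = 5, 2  # hypothetical: 5 coins, 2 of them placed in the heads box

permutations = perm(n, x)   # n! / (n - x)!
combinations = comb(n, x)   # n! / (x! (n - x)!)

# The same numbers written out from the factorial definitions.
permutations_by_hand = factorial(n) // factorial(n - x)
combinations_by_hand = permutations_by_hand // factorial(x)

print("Pm(n, x) =", permutations, permutations_by_hand)
print("C(n, x)  =", combinations, combinations_by_hand)
```

Dividing the permutation count by x! removes the orderings of the same x coins, which is the degeneracy factor described above.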


    Probability and the Binomial Distribution

    Coin Toss Experiment: The probability p of success (landing heads up) is not necessarily equal to the probability q = 1 - p of failure (landing tails up), because the coins may be lopsided!

    The probability for each of the combinations of x coins heads up and n - x coins tails up is equal to $p^x q^{n-x}$. The binomial distribution can be used to calculate the probability:

    $$P_B(x, n, p) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n - x)!} p^x (1 - p)^{n-x}$$

    The coefficients $P_B(x, n, p)$ are closely related to the binomial theorem for the expansion of a power of a sum:

    $$(p + q)^n = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x}$$
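A minimal sketch of the binomial probability for a coin toss experiment follows. The function name `binomial_pmf` and the choice of 10 fair coins are assumptions made for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of x successes among n items, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5  # hypothetical: 10 fair coins
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

five_heads = probabilities[5]
total = sum(probabilities)  # binomial theorem: (p + q)^n = 1 when q = 1 - p

print("P(5 heads of 10):", five_heads)
print("sum over all x:  ", total)
```

Because q = 1 - p, the sum over all x is (p + q)^n = 1, so the probabilities are properly normalized.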


    Mean and Variance: Binomial Distribution

    The mean of the binomial distribution is evaluated by combining the definition of $\mu$ with the function that defines the probability, yielding:

    $$\mu = \sum_{x=0}^{n} \left[ x \, \frac{n!}{x!\,(n - x)!} p^x (1 - p)^{n-x} \right] = np$$

    The average of the number of successes will approach a mean value given by the probability for success of each item, p, times the number of items. For the coin toss experiment p = 1/2, so half the coins should land heads up on average.

    $$\sigma^2 = \sum_{x=0}^{n} \left[ (x - \mu)^2 \, \frac{n!}{x!\,(n - x)!} p^x (1 - p)^{n-x} \right] = np(1 - p)$$

    If the probability for a single success p is equal to the probability for failure, p = q = 1/2, the final distribution is symmetric about the mean, and the mode and median equal the mean. The variance is then $\sigma^2 = \mu/2$.
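The closed forms np and np(1 - p) can be checked by evaluating the sums directly. This is a sketch with arbitrary example values (n = 20, p = 0.3 and p = 0.5); `binomial_pmf` and `binomial_moments` are hypothetical helper names.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of x successes among n items, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_moments(n, p):
    """Mean and variance obtained by summing over all possible x."""
    xs = range(n + 1)
    mean = sum(x * binomial_pmf(x, n, p) for x in xs)
    variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)
    return mean, variance


mean, variance = binomial_moments(20, 0.3)            # lopsided coins
fair_mean, fair_variance = binomial_moments(20, 0.5)  # fair coins

print("p = 0.3:", round(mean, 6), round(variance, 6))
print("p = 0.5:", round(fair_mean, 6), round(fair_variance, 6))
```

For the fair-coin case the computed variance is half the mean, matching the special case p = q = 1/2.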


    Other Probability Distributions: Special Cases

    Poisson Distribution: An approximation to the binomial distribution for the special case when the average number of successes is very much smaller than the possible number i.e.


    Gaussian or Normal Error Distribution Details

    Gaussian Distribution: The most important probability distribution in the statistical analysis of experimental data. Its functional form is relatively simple and the resultant distribution is reasonable. Again, this is a special limiting case of the binomial distribution, where the number of possible different observations, n, becomes infinitely large, yielding np >> 1.

    The most probable estimate of the mean $\mu$ from a random sample of observations is the average of those observations!

    Full width at half maximum: $\Gamma = 2.354\sigma$

    Tangents along the steepest portion of the probability curve intersect the curve at $e^{-1/2}$ of its peak height and intersect the x axis at the points $x = \mu \pm 2\sigma$.

    Probable Error (P.E.) is defined as the absolute value of the deviation such that the probability $P_G$ that the deviation of any random observation is smaller than it equals 1/2:

    $$\text{P.E.} = 0.6745\sigma = 0.2865\,\Gamma$$
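The numerical constants for the Gaussian distribution can be reproduced with `statistics.NormalDist` from the Python standard library. This sketch assumes that the width quoted as 2.354 standard deviations is the full width at half maximum, and that the probable error is the deviation enclosing half of the observations; a standard deviation of 1 is used for convenience.

```python
import math
from statistics import NormalDist

sigma = 1.0  # assumed standard deviation, for illustration
dist = NormalDist(mu=0.0, sigma=sigma)

# Full width at half maximum: the pdf falls to half its peak at +/- fwhm / 2.
fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma
half_height_ratio = dist.pdf(fwhm / 2.0) / dist.pdf(0.0)

# Probable error: half of all deviations are smaller than it,
# so the cumulative probability at +P.E. is 0.75.
probable_error = dist.inv_cdf(0.75)
inside = dist.cdf(probable_error) - dist.cdf(-probable_error)

print("FWHM / sigma:", round(fwhm / sigma, 4))
print("P.E. / sigma:", round(probable_error / sigma, 4))
print("P.E. / FWHM: ", round(probable_error / fwhm, 4))
```

The ratio P.E. / FWHM comes out slightly below 0.2865 when computed from unrounded values; the quoted figure agrees with dividing the rounded constants 0.6745 by 2.354.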


    For Gaussian or normal error distributions:

    Total area underneath curve is 1.00 (100%)

    68.27% of observations lie within 1 std dev of mean

    95.45% of observations lie within 2 std dev of mean

    99.73% of observations lie within 3 std dev of mean
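The fractions of observations within 1, 2, and 3 standard deviations can be computed from the normal cumulative distribution. A minimal sketch using `statistics.NormalDist` follows; `coverage` is a hypothetical helper name.

```python
from statistics import NormalDist


def coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within = {k: coverage(k) for k in (1, 2, 3)}

for k, fraction in within.items():
    print(f"within {k} std dev: {100 * fraction:.2f}%")
```

To four decimal places the fractions are 0.6827, 0.9545, and 0.9973, which are often rounded to 68%, 95%, and 99.7%.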

    Variance, standard deviation, probable error, mean, and

    weighted root mean square error are commonly used

    statistical terms in geodesy.

    compare (rather than attach significance to numerical value)
