Skyline Technologies presents

StatPad™

Quick and Easy Data Analysis Using Excel®

[Cover chart: Amazon Revenue ($billions) plotted against Time, 2004 to 2014, with Revenue, Trend, and Forecast series]

Copyright © 1988, 1997, 2000, 2003, 2011 by Skyline Technologies, Inc.
StatPad is a trademark of Skyline Technologies, Inc.
Excel is a registered trademark of Microsoft Corporation.

StatPad Installation and User Guide



  • Table of Contents

    What is StatPad?
    How to Install StatPad
    How to Use StatPad
    Overview of StatPad Features
    One-Sample Analysis
        Summaries
        Histogram
        Histogram (With Customized Bin Width and Landmark)
        Box Plot
        Cumulative Distribution
        Confidence Interval
        Confidence Interval (One-Sided, 99%)
        Hypothesis Test
        Hypothesis Test (One-Sided)
        Percentile
        Percentile Ranking
    Sampling
        Random Sample Without Replacement
        Random Sample With Replacement
        Uniform Distribution
        Normal Distribution
        Binomial Distribution
        Binomial Percentages
    Probability Calculations
        Normal Probability (Greater Than)
        Normal Probability (Between)
        Binomial Probability (Equal to)
        Binomial Probability (This or Less)
        Binomial Percent (Equal to)
        Binomial Percent (Between)
        Poisson Probability (Equal to)
        Poisson Probability (This or Less)
        Exponential Probability (This or More)
        Exponential Probability (Between)
        Discrete Probability
    Two-Sample Analysis
        Summaries
        Histograms
        Box Plots
        Confidence Interval
        Hypothesis Test
    Many-Sample Analysis
        Summaries
        Histograms
        Box Plots
        F Test for One-Way ANOVA
        Mean Differences
    Bivariate Analysis
        Scatterplot
        Scatterplot with Least-Squares Line
        Correlation
        Correlation with Test
        Regression
        Predicted and Residuals
        Univariate Summaries
        Histograms
        Box Plots
    Multivariate Analysis and Multiple Regression
        Scatterplots
        Correlations
        Multiple Regression
        Predicted and Residuals
        Diagnostic Plot
        Univariate Summaries
        Histograms
        Box Plots
    Time-Series Analysis
        Trend-Seasonal
        Forecast with Series
        Moving Average (Smooth)
        Seasonal Index
        Seasonally Adjusted Series
        Long-Term Trend
        Seasonalized Trend
        A Combination: Data Series With Long-Term Trend and Forecast
        Numeric Output
    Quality Control
        X-Bar, R Charts (No Standard Given)
        X-Bar, R Charts (Standard Given)
        Percentage or Count Chart (No Standard Given)
        Percentage or Count Chart (Standard Given)

  • What is StatPad?

    Welcome to StatPad¹, a software system designed for people who wish to perform statistical
    analysis within their Microsoft Excel² computer spreadsheets. StatPad was designed to make
    statistical analysis as accessible, painless, and easy to understand as possible by bringing basic
    statistical analysis and its interpretation into the environment where business and other data are
    often found: namely, within an Excel spreadsheet. Whenever possible, the analysis is guided by
    choices from a dialog box that adapts itself automatically to your situation. The results,
    consisting of charts, explanatory text, and computations, then become part of your worksheet.

    StatPad will perform all aspects of basic statistics: design using a random sample, exploration
    through graphic representations of data, estimation with summaries and confidence intervals
    (both one- and two-sided at various confidence levels), hypothesis testing, normal and binomial
    probability calculations, multiple regression analysis, trend-seasonal time series analysis, and
    statistical quality control charts.

    Here's how to get started if you are in a hurry: after you open the file STATPAD.XLA, you will
    find StatPad listed under Excel's Add-Ins Ribbon (or Tools menu for older versions of Excel),
    ready for you to select. When selected, StatPad greets you with its main dialog box, ready for
    analysis.

    ¹ StatPad is a trademark of Skyline Technologies, Inc.
    ² Excel is a registered trademark of Microsoft Corporation.

  • How to Install StatPad

    All you need in order to run StatPad is a computer running Microsoft Excel for Windows. There
    are two ways to install StatPad, depending upon whether or not you want StatPad to be there
    automatically whenever you work in Excel. Please begin by copying the file STATPAD.XLA to a
    folder on your computer.

    If you wish StatPad to be available automatically when you run Excel:

    1. In Excel, choose File/Options, select Add-Ins at the left, wait a moment, then choose
       "Go" near the bottom to manage Excel Add-Ins. (Excel 2007 users start by clicking the
       Office Button at the top left and choosing Excel Options at the bottom, then continue
       by selecting Add-Ins at the left and choosing "Go".)
    2. Browse to the folder where you put the file STATPAD.XLA, select the file, and click OK.
    3. Be sure the StatPad entry is checked in the list of add-ins, then choose OK.
    4. StatPad will be available in the Add-Ins Ribbon near the top (or Tools menu for older
       versions of Excel).

    If you wish to load StatPad manually each time you open Excel:

    Either double-click the file STATPAD.XLA or use Excel's File Open menu commands to
    open this file from its folder on your computer. Choose Enable Macros if necessary.
    The StatPad choice will then be available under Excel's Add-Ins Ribbon (or Tools menu
    for older versions of Excel). StatPad will remain available until you close Excel.

    If you need to change Excel's macro security level, you will find this at File / Options /
    Trust Center / Trust Center Settings / Add-Ins.

  • How to Use StatPad

    Here's how to use StatPad:

    1. Start Excel and bring your data (if any) into the worksheet.

    2. If StatPad has already been installed, simply select StatPad from Excel's Add-Ins Ribbon
       near the top of the screen to begin statistical analysis.
       If StatPad has not yet been installed, either open the file STATPAD.XLA using Excel's File
       Open menu command near the top of the screen or read the previous section, How to
       Install StatPad, to see how to make StatPad available whenever you are in Excel.

    3. You will see StatPad's main dialog box, ready to guide you through the analysis:

    4. Select a situation from the list near the top left (One Sample, Sampling, Probability, Two
       Sample, Many Sample, Bivariate, Multivariate, Time Series, or Quality Control).

    5. Select the analysis you want from the list near the top right. Note that this analysis list
       changes automatically for you, depending on the situation you choose. For a One Sample
       situation, the analysis choices are Summaries, Histogram, etc. But if you select
       Probability instead, the analysis choices instantly change to Normal Probability, Binomial
       Probability, Binomial Percent, etc.

    6. Give StatPad the additional information it needs. StatPad will automatically change to
       show you what is needed, so you may fill in the blanks as they appear. For One Sample,
       Summaries, you need to give StatPad a data set name and an output range. For One
       Sample, Confidence Interval, an edit box appears automatically so that you can tell
       StatPad which confidence level you wish (you may also decide to choose a one-sided
       interval). Here's how the main dialog box changes:

       For a multiple regression analysis, StatPad's main dialog changes again (automatically!),
       allowing you to select the X variables (for example, income, percent male, and
       readership) to use to explain the Y variable (for example, the cost of a full-page color
       magazine ad).

    7. Here's how to select your data set(s) from the list(s). StatPad puts into its lists each Excel
       range name that identifies a single column of numbers.³ When you name your data with
       StatPad, the name also becomes an Excel range name.

       a. If just one data set is needed (e.g., for one-sample analysis), you may choose one of
          the following:
          i. Click on its name in the list.
          or
          ii. Type its name into the edit box just above the list.
          or
          iii. Click on the edit box just above the list, and then drag in the worksheet with the
               mouse to identify your column of numbers. This is useful for a quick analysis
               when you do not care to use a name to identify the data.

       b. If more than one data set can be specified (e.g., many-sample analysis, or the X
          variables for a multiple regression), you may choose one of the following:
          i. Click on each name that you wish to select, scrolling up and down as needed. If
             you click again on a selected name, it is unselected. Be careful not to click quickly
             twice on the same name: Excel will interpret this as a double-click and StatPad
             will immediately begin the analysis.
          or
          ii. Move through the list using the cursor (arrow) keys, selecting and unselecting by
              hitting the spacebar.

    8. If your data are in the worksheet, but are not offered to you as a choice⁴ in StatPad's lists,
       here's how to proceed:

       a. Click on StatPad's Add Data button (at the right, just above the middle of StatPad's
          main dialog box) to put a data set name into the list. You then see the following
          dialog box, and you may drag with the mouse to select the data (one column of
          numbers) and specify the name you want. This name will then appear in StatPad's
          lists along with the other data sets.

          Here's how the screen might look after you (1) click in the Range box of the above
          dialog box, (2) highlight your data in the worksheet, (3) click in the Name box of the
          dialog box, and (4) type in the name ("Prices" for this example, but please don't use
          spaces or special characters):

       b. Alternatively, you may feel free to type a name for the data set into the edit box in
          StatPad's main dialog box, even if that name is not proposed for you. This can be
          done whenever only one data set can be used for the chosen situation (but please don't
          use spaces or special characters in the name). Once you hit Enter, click on Do It, or
          double-click to begin the analysis, StatPad will ask you to select the column of
          numbers you want, using the following dialog box. After this, the name will
          automatically show up in StatPad's data set lists.

    ³ Here's a quick way to find out the name (if any) associated with a list of numbers. Highlight
      the list (drag with the mouse), then look for the name in Excel's Name Box near the top left
      corner of the worksheet. StatPad limits the size of each list to a maximum of 65,000 numbers.

    ⁴ If you have used Excel to name a column of numbers (e.g., with Excel's Insert / Name / Define
      menu items), this name will appear automatically in StatPad's list. When you name a column of
      numbers within StatPad, this name also becomes an Excel range name for your data. Names can be
      deleted using Excel's Insert / Name / Define / Delete menu items.

    9. Use the Output Range box at the lower right of the main dialog box to tell StatPad where
       to put the results.

       a. If you've asked for a chart:
          i. If you provide a single cell as the Output Range, then StatPad will place a chart of
             the default size with its upper-left corner at this cell.
          ii. If you provide a rectangular range of cells as the Output Range, then StatPad will
              make the chart the same size as your range.

       b. If you've asked for numbers and text:
          i. If there is enough room without erasing any of your data, StatPad will place the
             upper-left cell of the output at the Output Range you specified.
          ii. If your results would overwrite any of your data, StatPad will give you the option
              of either specifying a different Output Range or (use caution!) going ahead and
              erasing some of your data to make room for the results.

    10. After StatPad performs the analysis you requested (or asks for clarification, if needed),
        you will again find the StatPad main dialog box on your screen, ready for further analysis.
        You may either continue your analysis with StatPad, or leave StatPad (select Cancel or
        hit the Esc key) to return control to Excel and your worksheet.

    11. Because StatPad's results are part of your Excel spreadsheet, you can format them after
        leaving the StatPad dialog box (hit the Esc key or select Cancel).

        a. You can select and format numbers in individual cells as you ordinarily would in
           Excel (for example, using the Number Group of the Home Ribbon). For example, you
           can format with dollar signs, set the number of decimal places, format as percentages,
           etc.

        b. You can customize StatPad's charts as you would any Excel chart. For example,
           you might select the chart and then use the Chart Tools Ribbons (Design, Layout, and
           Format) at the top of the Excel window. Another method would be to double-click the
           part of the chart you wish to change, for example the x axis, to bring up the relevant
           formatting options. You might then choose to set the scale under Axis Options (e.g., to
           change the minimum and/or maximum) or select Number (e.g., to change the number
           formatting).

    12. You can copy StatPad's results to your word processor after leaving the StatPad dialog
        box (hit the Esc key or select Cancel).

        a. To copy text and numbers to your word processor, proceed as follows:
           i. Highlight your cell(s) and choose Copy from the Clipboard Group of the Home
              Ribbon.
           ii. Activate your word processor and move the cursor to where you want the results
               to go.
           iii. Depending upon your word processor, you may wish to paste as unformatted text.
                The text then becomes part of the text document and you may format it as you
                like. For example, with Microsoft Word 2010, you might click on the word
                "Paste" in the Clipboard Group of the Home Ribbon, then choose Paste Special
                from the Paste Options, to obtain the Unformatted Text choice.

        b. To copy charts to your word processor, proceed as follows:
           i. Click on the edge of a chart (just one at a time) to select it, then choose Copy from
              the Clipboard Group of Excel's Home Ribbon.
           ii. Activate your word processor and move the cursor to where you want the chart to
               go.
           iii. Depending upon your word processor, you may wish to paste as a Picture
                (instead of as an Excel object). The chart then becomes part of the text document
                and you can place and size it using your word processor's commands. For
                example, with Microsoft Word 2010, you might click on the word "Paste" in the
                Clipboard Group of the Home Ribbon, then choose Paste Special from the Paste
                Options, to obtain the Picture (Enhanced Metafile) choice.

    13. For more information about statistical analysis, its applications and interpretation, please
        consult a book such as Practical Business Statistics by Andrew F. Siegel (Elsevier /
        Academic Press, sixth edition, 2012).

  • Overview of StatPad Features

    StatPad's statistical analyses are grouped into the following situations:

    One Sample
    Sampling
    Probability
    Two Sample
    Many Sample
    Bivariate
    Multivariate
    Time Series
    Quality Control

    These situations are presented in a list at the left in StatPad's main dialog box. When you select a
    situation, the appropriate analyses are available in a list to the right in this dialog box. When you
    select a situation and analysis, an explanation also appears in the dialog box and the dialog box
    changes to allow you to specify what is needed for the analysis (e.g., a confidence level). Here is
    a list of the situations, analyses, and explanations available within StatPad. More details about
    each one, with an example, are given on the pages that follow.

    One Sample

    Summaries
        Compute statistical summaries for the data: count, average or mean, median,
        smallest, largest, quartiles, standard deviation, and standard error.

    Histogram
        Draw a histogram to explore the data, showing the shape of the distribution,
        typical values, variability, and outliers. Data are concentrated where the
        histogram bars are high. Check 'Customize' to specify optional bin width and
        landmark point.

    Box Plot
        Draw a box plot to explore the data, showing the 5-number summary (smallest,
        lower quartile, median, upper quartile, and largest). In the ordinary box plot, a
        line extends from the box on each side to the most extreme value. Check
        'Detailed box plot' to indicate outliers separately and have the lines extend from
        the box on each side to the most extreme value (adjacent value) that is not an
        outlier.

    Cumulative Distribution
        Draw a cumulative distribution function for the data, showing the percentage of
        data values less than each given number. This shows you the percentiles.

    Confidence Interval
        Compute a confidence interval for the population mean. This is statistical
        inference about the population, based on random sampling. Two-sided or
        one-sided interval, with your chosen confidence level.

    Hypothesis Test
        Test the null hypothesis that the population mean is equal to a given reference
        value. This is statistical inference about the population, based on random
        sampling. Two-sided or one-sided testing (Student's t test) is used.

    Percentile
        Given a percentage, find the percentile value. This data value has approximately
        this percentage of the data values smaller than it.

    Percentile Ranking
        Find the percentage ranking for a given value. This is the approximate
        percentage of data values that are less than the given value.
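As a rough illustration of what the One Sample summaries and the percentile ranking compute, here is a minimal Python sketch. It is not StatPad's own code: the function names and the `prices` data are made up for this example, and the quartile convention is Python's default, which may differ from the one StatPad uses.

```python
import math
import statistics


def one_sample_summary(data):
    """Return the basic summaries for one column of numbers."""
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    # Quartile conventions differ between packages; this is Python's default.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "count": n,
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "smallest": min(data),
        "largest": max(data),
        "lower_quartile": q1,
        "upper_quartile": q3,
        "std_dev": sd,
        "std_error": sd / math.sqrt(n),  # standard error of the mean
    }


def percentile_ranking(data, value):
    """Approximate percentage of data values that are less than `value`."""
    below = sum(1 for x in data if x < value)
    return 100.0 * below / len(data)


# Hypothetical data set, standing in for a named column such as "Prices".
prices = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0, 13.0, 17.0]
summary = one_sample_summary(prices)
ranking = percentile_ranking(prices, 15.0)

for name, value in summary.items():
    print(f"{name}: {value}")
print(f"percentile ranking of 15.0: {ranking}")
```

The standard error here is the standard deviation divided by the square root of the count, which is the quantity a confidence interval or t test for the mean is built on.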

    Sampling

    Sample Without Replacement
        Select a random sample from a larger population, without replacement so that
        no item can be selected more than once. All population items are equally likely
        to appear in the sample, and they are chosen independently of one another.

    Sample With Replacement
        Select a random sample from a larger population, with replacement so that an
        item may be selected more than once. All population items are equally likely to
        appear in the sample, and they are chosen independently of one another.

    Uniform Distribution
        Select a random sample from a uniform distribution, where all values are
        equally likely between the smallest and largest possible value. By specifying a
        name, you will be able to easily use the result later.

    Normal Distribution
        Select a random sample from a normal distribution, given the mean and
        standard deviation. By specifying a name, you will be able to easily use the
        result later.

    Binomial Distribution
        Select a random sample from a binomial distribution (the number of
        occurrences) given the number of trials and the probability of occurrence. By
        specifying a name, you will be able to easily use the result later.

    Binomial Percentages
        Select a random sample of binomial percentages, given the number of trials and
        the probability of occurrence. By specifying a name, you will be able to easily
        use the result later.
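The sampling choices above can be imitated with Python's standard `random` module. This is only a sketch under assumptions: the population, the seed, and the distribution parameters are invented for the example, and nothing here reflects how StatPad generates its random numbers.

```python
import random

rng = random.Random(2011)  # arbitrary fixed seed so the draws are repeatable

# Hypothetical population: items numbered 1 through 100.
population = list(range(1, 101))

# Without replacement: no item can be selected more than once.
without_replacement = rng.sample(population, k=10)

# With replacement: an item may be selected more than once.
with_replacement = rng.choices(population, k=10)

# Uniform distribution between a smallest and largest possible value.
uniform_draws = [rng.uniform(0.0, 1.0) for _ in range(10)]

# Normal distribution with a given mean and standard deviation.
normal_draws = [rng.gauss(100.0, 15.0) for _ in range(10)]


def binomial_draw(trials, probability):
    """Count occurrences in `trials` independent trials."""
    return sum(1 for _ in range(trials) if rng.random() < probability)


# Binomial counts, and the same results expressed as percentages of the trials.
TRIALS = 20
binomial_counts = [binomial_draw(TRIALS, 0.3) for _ in range(10)]
binomial_percentages = [100.0 * count / TRIALS for count in binomial_counts]

print(without_replacement)
print(with_replacement)
print(binomial_counts)
```

Keeping the results in named lists mirrors the advice above about naming a generated sample so that it can be reused in a later analysis.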

    Probability

    Normal Probability
        Probabilities for a normal distribution: the symmetric bell-shaped curve, given a
        mean and a positive standard deviation.

    Binomial Probability
        Probabilities for a binomial distribution: the number of occurrences out of a
        given number of independent trials with a given probability.

    Binomial Percent
        Probabilities for a binomial percentage, given the number of independent trials
        and the probability for each trial.

    Poisson Probability
        Probabilities for a Poisson distribution: the number of random occurrences
        where the rate is fixed, given the mean number. For example, the number of
        orders you will receive next week, if orders occur at a constant rate with an
        average of 5 per week.

    Exponential Probability
        Probabilities for an exponential distribution: a highly skewed distribution with
        no memory, given the mean. For example, the length of a telephone call or the
        time until the next customer arrives where the mean is 9 minutes.

    Discrete Probability
        Mean (expected value) and standard deviation for a discrete random variable,
        given a set of values and their associated probabilities.
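The probability calculations listed above follow standard textbook formulas, which can be sketched in Python as follows. The function names are invented for this illustration and are not StatPad's; the two examples at the end reuse the figures mentioned above (an average of 5 orders per week, and a mean of 9 minutes).

```python
import math
from statistics import NormalDist


def normal_greater_than(x, mean, sd):
    """P(X > x) for a normal distribution."""
    return 1.0 - NormalDist(mean, sd).cdf(x)


def normal_between(a, b, mean, sd):
    """P(a < X < b) for a normal distribution."""
    dist = NormalDist(mean, sd)
    return dist.cdf(b) - dist.cdf(a)


def binomial_equal(k, n, p):
    """P(exactly k occurrences in n independent trials)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binomial_this_or_less(k, n, p):
    """P(k or fewer occurrences in n independent trials)."""
    return sum(binomial_equal(i, n, p) for i in range(k + 1))


def poisson_equal(k, mean):
    """P(exactly k occurrences) when the mean number is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def exponential_this_or_more(t, mean):
    """P(T >= t) for an exponential distribution with the given mean."""
    return math.exp(-t / mean)


def exponential_between(a, b, mean):
    """P(a <= T <= b) for an exponential distribution with the given mean."""
    return math.exp(-a / mean) - math.exp(-b / mean)


def discrete_mean_sd(values, probabilities):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(v * p for v, p in zip(values, probabilities))
    variance = sum(p * (v - mu) ** 2 for v, p in zip(values, probabilities))
    return mu, math.sqrt(variance)


# Exactly 5 orders next week when orders average 5 per week.
five_orders = poisson_equal(5, 5.0)
# A wait of 9 minutes or more when the mean is 9 minutes.
long_wait = exponential_this_or_more(9.0, 9.0)
print(five_orders, long_wait)
```

A binomial percent probability can be obtained from the same binomial functions by converting the percentage to a count (percentage times the number of trials, divided by 100) first.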

    Two Samples

    Summaries Compute univariate summaries for each data set. Also find the average

    difference and its standard error. If sample sizes are identical, you may indicate

    that a pair of measurements was made on each item.

    Histograms Draw a histogram for each data set, for data exploration.

    Box Plots Draw a box plot for each data set, for data exploration, using the same scale for

    comparison. In the ordinary box plot, a line extends from the box on each side

    to the most extreme value. Check Detailed box plot to indicate outliers

    separately and have the lines extend from the box on each side to the most

    extreme value (adjacent value) that is not an outlier.

    Confidence

    Interval

    Compute a confidence interval for the population mean difference. This is

    statistical inference. Two-sided interval, with chosen confidence level. If

    sample sizes are identical, you may indicate that a pair of measurements was

    made on each item.

    Hypothesis Test

    Test the null hypothesis that the population mean difference is zero. This is

    statistical inference. Two-sided testing using Student's t test. If sample sizes are

    identical, you may indicate that a pair of measurements was made on each item.


    Many Samples

    Summaries Select as many data sets as you wish. Compute univariate summaries for each.

    Histograms Draw a histogram for each sample, for data exploration.

    Box Plots Draw a box plot for each sample, for data exploration, using the same scale for

    comparison. In the ordinary box plot, a line extends from the box on each side

    to the most extreme value. Check Detailed box plot to indicate outliers

    separately and have the lines extend from the box on each side to the most

    extreme value (adjacent value) that is not an outlier.

    F Test One-way analysis of variance (ANOVA). Test the null hypothesis that the

    population means are all identical. This is statistical inference.

    Mean Differences

    Confidence intervals and hypothesis tests for the difference of each pair of

    population means (least-significant-difference test). This is statistical inference.

    Bivariate

    Scatterplot Draw a scatterplot to explore the relationship between two variables.

    Scatterplot with Line

    Draw a scatterplot with least-squares line to explore the relationship between

    two variables.

    Correlation Find the strength of the relationship between two variables as a pure number

    where 1 indicates a perfect increasing relationship, -1 a perfect decreasing

    relationship, and 0 suggests no relationship.

    Correlation with Test

    Find and test the strength of the relationship between two variables. This is

    statistical inference.

    Regression Predict the dependent Y variable from the independent X variable using a

    straight-line relationship.

    Predicted and Residuals

    Predicted values of Y based on X, the residual differences (Actual Y - Predicted Y),

    and the standardized residuals.

    Univariate Summaries

    Compute univariate summaries for each variable.

    Histograms Draw a histogram for each variable, for data exploration.


    Box Plots Draw a box plot for each variable, for data exploration. In the ordinary box plot,

    a line extends from the box on each side to the most extreme value. Check

    Detailed box plot to indicate outliers separately and have the lines extend from

    the box on each side to the most extreme value (adjacent value) that is not an

    outlier.

    Multivariate

    Scatterplots Select as many X variables as you wish, but just one Y variable. Draw

    scatterplots for all pairs of variables to explore their relationships.

    Correlations Find the strength of the relationship between pairs of variables as a matrix of

    correlation coefficients (1 is perfect positive correlation, -1 is perfect negative

    correlation, and 0 suggests no relationship).

    Regression Prediction of the dependent Y variable from the independent X variables using a

    linear relationship.

    Predicted and Residuals

    Predicted values of Y based on the X variables, the residual differences (Actual

    Y - Predicted Y), and the standardized residuals.

    Diagnostic Plot

    Look for problems in the linear regression model, such as unequal variability or

    nonlinearity.

    Univariate Summaries

    Compute univariate summaries for each variable.

    Histograms Draw a histogram of each variable, for data exploration.

    Box Plots Draw a box plot of each variable, for data exploration. In the ordinary box plot,

    a line extends from the box on each side to the most extreme value. Check

    Detailed box plot to indicate outliers separately and have the lines extend from

    the box on each side to the most extreme value (adjacent value) that is not an

    outlier.

    Time Series

    Trend-Seasonal

    A decomposition into (1) long-term trend (linear or exponential), (2) repeating

    seasonal component (monthly or quarterly), (3) wandering cyclic component,

    and (4) irregular component. Includes seasonal adjustment and forecasting. Time must

    increase down the data column.


    Quality Control

    X-Bar, R Charts

    Chart the averages and the ranges of your data to see if this process is in or out

    of control. Choose a subgroup size from 2 to 25. You may specify a standard if

    one is available.

    Pct, Count Chart

    Chart the percents or counts to see if this process is in or out of control. Your

    data may be either counts or percentages (counts divided by the sample size).

    You may specify a standard if one is available.


    One-Sample Analysis

    Summaries

    Summaries are used to give you selected

    numbers that represent and describe your

    data set.

    StatPad's summaries (below) for Quality

    scores show how many data values there are

    ($n = 50$), typically how high the scores are

    ($\bar{X} = \sum_{i=1}^{n} X_i / n = 90.78$), and about how

    far individual scores are

    ($S = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)} = 7.56$) from

    the mean. The quartiles are about 1/4 of the

    way in from each end (highest and lowest)

    while the median is 1/2 way in. The standard

    error of the average is $S_{\bar{X}} = S / \sqrt{n}$.

    To compute the summaries using StatPad's main dialog box, select One Sample as the

    situation and Summaries as the analysis. Select your data from the list (or use Add Data if your

    column of numbers is in the worksheet but is not in the list), check the Output Range to be sure

    that is where you want the results to appear, and then select Do It (or hit the Enter key).

    Quality Summaries

    50 Count n

    90.78 Mean or average

    7.56 Standard deviation (variability of individuals)

    72 Smallest

    86 Lower quartile

    93 Median

    97 Upper quartile

    99 Largest

    1.069 Standard error (variability of sample average, if random sample)
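    The summaries above can be reproduced outside StatPad. The following is a minimal Python sketch, not StatPad's own code. The list `scores` is made-up data (the Quality values are not listed in this guide), the function names are hypothetical, and the "inclusive" quartile rule is an assumption that may differ from the rule StatPad uses.

```python
import math
import statistics


def standard_error(sd, n):
    """Standard error of the average: S / sqrt(n)."""
    return sd / math.sqrt(n)


def summaries(data):
    """Return the one-sample summaries described above as a dictionary."""
    n = len(data)
    sd = statistics.stdev(data)  # divides by n - 1, as in the formula for S
    # Quartile conventions vary; the "inclusive" method is an assumption.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "n": n,
        "mean": statistics.mean(data),
        "sd": sd,
        "smallest": min(data),
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "largest": max(data),
        "std_error": standard_error(sd, n),
    }


scores = [72, 86, 93, 97, 99]  # hypothetical data, for illustration only
result = summaries(scores)

# Check against the Quality table: S = 7.56 and n = 50 give about 1.069.
quality_se = standard_error(7.56, 50)
```

    Rounding `quality_se` to three decimals gives the 1.069 shown in the Quality table.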


    Histogram

    The histogram is used to visually explore

    a data set. The data axis is horizontal, and

    the bars show how many data values are

    within each interval. Data are concentrated

    where bars are tall. You can see typical

    value, variability, and distribution shape.

    StatPad's histogram (below) shows that

    the Quality scores fall within the interval

    from about 70 to 100. They are skewed with a

    long tail towards lower values, being more

    concentrated in the higher end of the range.

    To create a histogram using StatPad's

    main dialog box, select One Sample as the

    situation and Histogram as the analysis. Select your data from the list (or use Add Data if your

    column of numbers is in the worksheet but is not in the list), check the Output Range, and then

    select Do It.

    StatPad chooses a default bin width and landmark (which could be a left or right endpoint

    of the histogram, or any bin boundary) for the histogram bars. These can be changed using the

    Customize check-box (see next item). Note that Excel (not StatPad) chooses the minimum and

    maximum of the horizontal scale. To change these (as was done for the chart below), leave

    StatPad by hitting the Esc key or selecting Cancel, then double-click on the axis to find

    Minimum and Maximum under Axis Options.

    [Figure: histogram of Quality. Horizontal axis: Quality (60 to 100). Vertical axis: Frequency (0 to 20).]


    Histogram (With Customized Bin Width and Landmark)

    There are often several reasonable

    choices for how wide to make the histogram

    bars and where to place them left-to-right.

    StatPad can choose a default bin width and

    landmark for the histogram bars, or you can

    specify customized values.

    In the customized histogram below, the

    bin width has been decreased to 1 to show

    more detail (StatPad's default bin width for

    this data set was 5).

    To create a customized histogram using

    StatPad's main dialog box, select One

    Sample as the situation and Histogram as the

    analysis. Select your data from the list (or use Add Data if your column of numbers is in the

    worksheet but is not in the list). When you click on Customize, two edit-boxes appear: for Bin

    Width and for the optional Landmark. You may then click on each and type the value you wish.

    The Landmark setting would shift the bars left or right to align on the specified value. Then

    check the Output Range, and then select Do It.

    [Figure: customized histogram of Quality with bin width 1. Horizontal axis: Quality (60 to 100). Vertical axis: Frequency (0 to 10).]
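    To illustrate how a bin width and a landmark together determine the bars, here is a minimal Python sketch. It is not StatPad's code: the function name is hypothetical, the data are made up, and treating each bin as closed on the left and open on the right is an assumption.

```python
import math
from collections import Counter


def histogram_counts(data, bin_width, landmark=0):
    """Count data values per bin, with bin boundaries aligned on the landmark.

    Every boundary has the form landmark + k * bin_width for a whole number k,
    so changing the landmark shifts all bars left or right together.
    Assumption: each bin includes its left boundary but not its right one.
    """
    counts = Counter()
    for x in data:
        k = math.floor((x - landmark) / bin_width)
        counts[landmark + k * bin_width] += 1
    return dict(sorted(counts.items()))


scores = [72, 86, 93, 97, 99]  # hypothetical data
wide_bins = histogram_counts(scores, bin_width=5, landmark=70)
narrow_bins = histogram_counts(scores, bin_width=1, landmark=70)
```

    Each key in the result is the left boundary of a bin and each value is the height of its bar. A smaller bin width shows more detail at the cost of a more jagged picture.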


    Box Plot

    The box plot is used to quickly and

    visually explore a data set; it shows you a

    central box defined by the quartiles, with the

    median indicated within the box. In the

    ordinary box plot, a line extends from the box

    on each side to the most extreme value. In the

    detailed box plot, outliers are indicated

    separately and these lines extend from the

    box on each side to the most extreme value

    (adjacent value) that is not an outlier.

    In StatPad's box plot for Sensitivity

    (below left) you see that the middle half of the

    data extends from about 60 to 100, with the

    median at about 80. The line at the right

    extends to the largest at about 180.

    StatPad's detailed box plot (below right) shows outliers separately, revealing that the

    largest value, at about 180, is an outlier.

    To display a box plot using StatPad's main dialog box, select One Sample as the situation

    and Box Plot as the analysis. Select your data from the list (or use Add Data if your column of

    numbers is in the worksheet but is not in the list). Click on Detailed box plot if you wish outliers

    to be displayed separately. Then check the Output Range, and then select Do It.

    Outliers are defined as data values more than 1.5 times the interquartile range away from

    either quartile.

    [Figure: ordinary box plot (left) and detailed box plot (right) of Sensitivity. Horizontal axis: Sensitivity (0 to 200).]
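    The outlier rule stated above (more than 1.5 times the interquartile range away from either quartile) can be sketched in Python as follows. This is an illustration only: the data are made up, the function name is hypothetical, and the "inclusive" quartile rule is an assumption that may not match StatPad's quartiles.

```python
import statistics


def box_plot_outliers(data):
    """Return (outliers, adjacent values) for a detailed box plot."""
    xs = sorted(data)
    # Assumption: quartiles computed with the "inclusive" interpolation rule.
    q1, _median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inside = [x for x in xs if low_fence <= x <= high_fence]
    # Adjacent values: the most extreme values that are not outliers.
    return outliers, (inside[0], inside[-1])


values = [60, 70, 75, 80, 85, 95, 100, 180]  # hypothetical data
outliers, adjacent = box_plot_outliers(values)
```

    In this made-up data set the value 180 is flagged as an outlier, so the lines of the detailed box plot would stop at the adjacent values 60 and 100.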


    Cumulative Distribution

    The cumulative distribution function is

    used to show you the percentiles of the data.

    Percentages are shown vertically (from 0 to

    100%) and data values are horizontal. The

    chart shows the percentage of the data values

    (vertical scale) that are less than or equal to the

    given value (horizontal scale).

    In StatPad's cumulative distribution

    function for Quality (below) you can see that

    about 10% of the Quality scores are less than

    or equal to 80, about 25% of the Quality

    scores are less than or equal to 85, and that

    about a third are scores of 90 or less.

    To compute a cumulative distribution function using StatPad's main dialog box, select One

    Sample as the situation and Cumulative Distribution as the analysis. Select your data from the

    list (or use Add Data if your column of numbers is in the worksheet but is not in the list), check

    the Output Range, and then select Do It.

    [Figure: cumulative distribution function of Quality. Horizontal axis: Quality (60 to 100). Vertical axis: Cumulative Percent (0% to 100%).]
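    The quantity plotted by the cumulative distribution function can be sketched in a few lines of Python. The function names and the data are hypothetical; this shows the definition given above (percentage of values less than or equal to a given value), not StatPad's implementation.

```python
def cumulative_percent(data, value):
    """Percentage of the data values that are less than or equal to value."""
    return 100.0 * sum(1 for x in data if x <= value) / len(data)


def cumulative_distribution(data):
    """Points (data value, cumulative percent) in increasing order of value."""
    return [(x, cumulative_percent(data, x)) for x in sorted(set(data))]


scores = [72, 86, 93, 97, 99]  # hypothetical data
points = cumulative_distribution(scores)
share_at_most_90 = cumulative_percent(scores, 90)
```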


    Confidence Interval

    A confidence interval for the mean

    includes the unknown population mean with

    known confidence, e.g., 95%. Random

    sampling from a normal population is

    assumed.

    StatPad's two-sided 95% confidence

    interval results for Quality (below) tell you

    that the bounds of the interval are 88.63 and

    92.93.

    To compute a confidence interval using

    StatPad's main dialog box, select One

    Sample as the situation and Confidence

    Interval as the analysis. Select your data

    from the list (or use Add Data if your column of numbers is in the worksheet but is not in the

    list), check the Output Range, and then select Do It. You may also change the Confidence level

    (from the default 95%) or select a one-sided interval instead of a two-sided interval (see next

    item).

    Confidence interval for Quality:

    We are 95% sure that the

    population mean for Quality

    is somewhere between

    88.63 and 92.93

    (assuming a random sample from a normal population).
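    For readers who want to check the interval by hand, the sketch below computes a t-based confidence interval from the summaries alone (n = 50, mean 90.78, standard deviation 7.56). It is a minimal Python illustration under stated assumptions, not StatPad's code: the function names are hypothetical, and the t distribution is evaluated by simple numerical integration (Simpson's rule) and bisection rather than by a statistics library.

```python
import math


def t_cdf(t, df, steps=1000):
    """P(T <= t) for Student's t, by Simpson's rule on [0, |t|] plus symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(p, df):
    """Value t with P(T <= t) = p, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_confidence_interval(n, mean, sd, level=0.95, sides="two"):
    """Confidence interval for the population mean from summary statistics.

    sides="two" gives the usual interval, sides="upper" gives an "at least"
    bound, and sides="lower" gives a "no more than" bound.
    """
    se = sd / math.sqrt(n)
    df = n - 1
    if sides == "two":
        t_crit = t_quantile(1 - (1 - level) / 2, df)
        return mean - t_crit * se, mean + t_crit * se
    t_crit = t_quantile(level, df)
    if sides == "upper":
        return mean - t_crit * se, math.inf
    return -math.inf, mean + t_crit * se


low, high = mean_confidence_interval(50, 90.78, 7.56)
at_least, _ = mean_confidence_interval(50, 90.78, 7.56, level=0.99, sides="upper")
```

    With these inputs the two-sided bounds round to 88.63 and 92.93, matching the output above, and the one-sided upper 99% bound rounds to 88.21.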


    Confidence Interval (One-Sided, 99%)

    The one-sided interval says, with

    specified confidence, that the unknown

    population mean is either "at least ..." (for

    an upper confidence interval) or "no more

    than ..." (for a lower confidence interval).

    You should decide whether to use a one-sided

    or two-sided confidence interval before you

    look at the data. You should not use both

    upper and lower one-sided confidence

    intervals on the same data set; either use a

    two-sided interval, or choose just one side for

    a one-sided confidence interval. If in doubt,

    use a two-sided confidence interval.

    StatPad's one-sided upper 99%

    confidence interval for Quality (below) shows you that the bound is 88.21.

    To compute a one-sided 99% confidence interval using StatPad's main dialog box, select

    One Sample as the situation and Confidence Interval as the analysis. Select your data from the

    list (or use Add Data if your column of numbers is in the worksheet but is not in the list), click on

    the 1-sided box of your choice, set the Confidence Level to 99%, check the Output Range, and

    then select Do It.

    One-sided upper confidence interval for Quality:

    We are 99% sure that the

    population mean for Quality

    is at least

    88.21

    (assuming a random sample from a normal population).


    Hypothesis Test

    A hypothesis test is used to decide, based

    on data, whether or not the unobservable

    population mean could reasonably be equal

    to a given reference value. Because the

    sample average represents (with statistical

    error) the unknown population mean, the

    result is often stated in terms of a significant

    (or nonsignificant) difference between the

    sample average and the reference value, both

    of which are known. Random sampling from

    a normal population is assumed.

    StatPad's hypothesis test results for

    Quality (below) show a highly

    significant difference between the reference

    value (given here as 87.5) and the observed average Quality score of 90.78. Results include the t

    value, the p value, the practical interpretation of the results, and a formal statement of the null

    hypothesis being tested.

    To perform a hypothesis test using StatPad's main dialog box, select One Sample as the

    situation and Hypothesis Test as the analysis. Select your data from the list (or use Add Data if

    your column of numbers is in the worksheet but is not in the list), specify the Reference Value,

    check the Output Range, and then select Do It. Optionally, you may specify a one-sided test

    (upper or lower); see next item.

    The p value says that, if the population mean had been equal to the reference value, then p is

    the probability of observing such a large (or larger) difference between the sample average and

    the reference value. Smaller p values indicate significance because such a difference would be a rare event if the null hypothesis were true.
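    The t value and p value can be checked from the summaries (n = 50, mean 90.78, standard deviation 7.56) and the reference value 87.5. The following Python sketch is an illustration under stated assumptions, not StatPad's code: the function names are hypothetical, and the t distribution is evaluated by numerical integration (Simpson's rule) instead of a statistics library.

```python
import math


def t_cdf(t, df, steps=1000):
    """P(T <= t) for Student's t, by Simpson's rule on [0, |t|] plus symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t_test(n, mean, sd, reference, sides="two"):
    """Return (t, p) for testing whether the population mean equals reference.

    sides may be "two", "upper" (is the mean larger?) or "lower" (is it smaller?).
    """
    t = (mean - reference) / (sd / math.sqrt(n))
    upper_tail = 1 - t_cdf(t, n - 1)
    if sides == "two":
        p = 2 * min(upper_tail, 1 - upper_tail)
    elif sides == "upper":
        p = upper_tail
    else:
        p = 1 - upper_tail
    return t, p


t_two, p_two = one_sample_t_test(50, 90.78, 7.56, 87.5)
t_one, p_one = one_sample_t_test(50, 90.78, 7.56, 87.5, sides="upper")
```

    The t value is the same for both tests. When the sample average lies on the side being tested, the one-sided p value is half of the two-sided p value.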


    Hypothesis test for Quality:

    t = 3.07

    p = 0.00350

    The sample average

    90.78

    is highly significantly different (p


    Hypothesis Test (One-Sided)

    A one-sided upper hypothesis test can

    decide only whether the sample average is

    significantly larger than the reference value.

    A one-sided lower hypothesis test can decide

    only whether the sample average is

    significantly less than the reference value.

    You should decide whether to use a one-sided

    or two-sided test before you look at the data.

    You should not use both upper and lower

    one-sided hypothesis tests on the same data

    set; either use a two-sided test, or choose

    just one side for a one-sided hypothesis test.

    If in doubt, use a two-sided test.

    StatPad's one-sided upper hypothesis

    test results for Quality (below) show that the scores are significantly larger, on average, than the

    reference value (given here as 87.5). Results include the t value, the p value, the practical

    interpretation of the results, and a formal statement of the null hypothesis being tested.

    To perform a one-sided hypothesis test using StatPad's main dialog box, select One Sample

    as the situation and Hypothesis Test as the analysis. Select your data from the list (or use Add

    Data if your column of numbers is in the worksheet but is not in the list), click on the 1-sided box

    of your choice, specify the Reference Value, check the Output Range, and then select Do It.

    One-sided hypothesis test for Quality:

    t = 3.07

    p = 0.00175

    The sample average

    90.78

    is highly significantly larger (p


    Percentile

    Percentiles are landmarks in the data

    that are a known percentage (of the data

    values) from smallest to largest. The smallest

    data value is the 0th percentile, the largest is

    the 100th percentile, the median is the 50th

    percentile, and so forth.

    In StatPad's percentile calculation

    (below) the 85th percentile for the Quality

    scores is found to be a score of 98. That is,

    the score 98 is about 85% of the way (in the

    ordered list of scores) from the smallest to the

    largest score.

    To find a percentile using StatPad's main

    dialog box, select One Sample as the situation and Percentile as the analysis. Select your data

    from the list (or use Add Data if your column of numbers is in the worksheet but is not in the

    list), provide the Percentage for which you would like the percentile, check the Output Range,

    and then select Do It.

    For Quality:

    85th percentile

    is 98


    Percentile Ranking

    The percentile ranking of a given data

    value gives you the percentage of the way

    along in the list of data values (from smallest

    to largest) that this given data value is.

    In StatPad's percentile ranking calculation

    (below) the Quality score 87.5 is found to be

    30% of the way from smallest to largest.

    To find a percentile ranking using

    StatPad's main dialog box, select One

    Sample as the situation and Percentile

    Ranking as the analysis. Select your data

    from the list (or use Add Data if your column

    of numbers is in the worksheet but is not in

    the list), provide the data Value for which you would like the percentile ranking, check the

    Output Range, and then select Do It.

    For Quality:

    87.5 is the

    30th percentile
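    Percentiles and percentile rankings are inverse calculations, which the Python sketch below illustrates. The guide does not state StatPad's interpolation rule, so the rule used here is an assumption: positions run linearly from the smallest value (0th percentile) to the largest (100th percentile), which agrees with the landmarks named above. The function names and data are hypothetical.

```python
def percentile(data, pct):
    """Value that lies pct percent of the way from smallest to largest.

    Assumption: linear interpolation between neighboring ordered values.
    """
    xs = sorted(data)
    pos = pct / 100 * (len(xs) - 1)
    lower = int(pos)
    if lower + 1 >= len(xs):
        return float(xs[-1])
    frac = pos - lower
    return xs[lower] + frac * (xs[lower + 1] - xs[lower])


def percentile_rank(data, value):
    """Percentage of the way along the ordered data at which value falls."""
    xs = sorted(data)
    if value <= xs[0]:
        return 0.0
    if value >= xs[-1]:
        return 100.0
    for i in range(len(xs) - 1):
        if xs[i] <= value < xs[i + 1]:
            frac = (value - xs[i]) / (xs[i + 1] - xs[i])
            return 100.0 * (i + frac) / (len(xs) - 1)


scores = [72, 86, 93, 97, 99]  # hypothetical data
median = percentile(scores, 50)
rank_of_86 = percentile_rank(scores, 86)
```

    Under this rule, feeding the result of `percentile` back into `percentile_rank` returns the original percentage whenever the data values are all different.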


    Sampling

    Random Sample Without Replacement

    A random sample without replacement

    is chosen from a population so that (1) all

    population units are equally likely to be

    chosen, (2) units are selected independently

    of one another, and (3) once a unit is chosen,

    it cannot be chosen again. All sampled units

    are different when sampling without

    replacement.

    StatPad's results (below) show a sample

    of 5 selected at random (without replacement)

    from a population of size 100. The selected

    items (in order) are 19, 25, 59, 67, and 89.

    This list of five numbers has also been given a

    name (firstSample was chosen here) which

    will appear in StatPad's lists of data sets.

    To select a random sample without replacement using StatPad's main dialog box, select

    Sampling as the situation and Sample Without Replacement as the analysis. Specify a

    Population Size and a Sample Size. Provide an optional name for the resulting data in case you

    plan to refer to it later, check the Output Range, and then select Do It.

    Random sample of size 5

    from population numbered from 1 to 100

    chosen without replacement:

    firstSample

    19

    25

    59

    67

    89
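    A comparable sample can be drawn in Python with the standard library. This is a sketch, not StatPad's random number generator, so the numbers drawn will differ from those above. The function name and the `seed` parameter are hypothetical.

```python
import random


def sample_without_replacement(population_size, sample_size, seed=None):
    """Pick distinct unit numbers from 1..population_size, returned in order."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


first_sample = sample_without_replacement(100, 5, seed=1)
```

    Because `random.sample` never returns the same unit twice, all five selected numbers are different.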


    Random Sample With Replacement

    A random sample with replacement is

    chosen from a population so that (1) all

    population units are equally likely to be

    chosen, (2) units are selected independently

    of one another, and (3) once a unit is chosen,

    it is replaced so that it may be chosen again.

    The sampled units may or may not all be

    different when sampling with replacement.

    The StatPad results below show a sample

    of 5 selected at random (with replacement)

    from a population of size 100. The selected

    items (in order) are 43, 51, 55, 55, and 82. Note that an item (55) was chosen twice. This can

    happen when sampling with replacement. This

    list of five numbers has also been given a name (secondSample was chosen here) which will

    appear in StatPad's lists of data sets.

    To select a random sample with replacement using StatPad's main dialog box, select

    Sampling as the situation and Sample With Replacement as the analysis. Specify a Population

    Size and a Sample Size. Provide an optional name for the resulting data in case you plan to

    refer to it later, check the Output Range, and then select Do It.

    Random sample of size 5

    from population numbered from 1 to 100

    chosen with replacement:

    secondSample

    43

    51

    55

    55

    82
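    Sampling with replacement differs only in that each draw is made from the full population. A minimal Python sketch follows; the function name and `seed` parameter are hypothetical, and the numbers drawn will not match StatPad's.

```python
import random


def sample_with_replacement(population_size, sample_size, seed=None):
    """Pick unit numbers from 1..population_size; repeats are allowed."""
    rng = random.Random(seed)
    # Each draw is independent of the others, so a unit can appear twice.
    return sorted(rng.randint(1, population_size) for _ in range(sample_size))


second_sample = sample_with_replacement(100, 5, seed=2)
```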


    Uniform Distribution

    A uniform distribution generates

    numbers, chosen independently of one

    another, that are equally likely to fall

    anywhere within a specified interval.

    In StatPad's results (below) five numbers

    were selected uniformly from 35 to 45. This

    list of five numbers has also been given a

    name (uniformSample was chosen here)

    which will appear in StatPad's lists of data

    sets.

    To select a uniform sample using

    StatPad's main dialog box, select Sampling

    as the situation and Uniform Distribution as

    the analysis. Specify the Smallest and Largest values of the distribution. Specify the Sample

    Size. Provide an optional name for the resulting data in case you plan to refer to it later, check

    the Output Range, and then select Do It.

    Random sample of size 5

    selected from a uniform distribution from 35 to 45:

    uniformSample

    37.28

    41.19

    39.90

    41.81

    43.87


    Normal Distribution

    A normal distribution generates

    numbers, chosen independently of one

    another, that follow a bell-shaped

    distribution, with values most likely to fall

    near the mean and the width of the bell

    defined by the standard deviation (Std dev).

    Observations fall within one standard

    deviation of the mean about 68% of the time.

    In StatPad's results (below) five numbers

    were selected from a normal distribution with

    mean 65 and standard deviation 20. This list

    of five numbers has also been given a name

    (simulatedScores was chosen here) which

    will appear in StatPad's lists of data sets.

    To select a normal sample using StatPad's main dialog box, select Sampling as the

    situation and Normal Distribution as the analysis. Specify the Mean and Standard Deviation

    (Std dev) values of the distribution. Specify the Sample Size. Provide an optional name for the

    resulting data in case you plan to refer to it later, check the Output Range, and then select Do It.

    Random sample of size 5

    selected from a normal distribution with mean 65 and standard deviation 20:

    simulatedScores

    49.78

    69.27

    58.02

    88.10

    63.90
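    Samples from a uniform or a normal distribution can be simulated the same way in Python. The sketch below is illustrative only: the function names and `seed` parameter are hypothetical, and the values drawn will differ from StatPad's.

```python
import random


def uniform_sample(smallest, largest, sample_size, seed=None):
    """Numbers equally likely to fall anywhere between smallest and largest."""
    rng = random.Random(seed)
    return [rng.uniform(smallest, largest) for _ in range(sample_size)]


def normal_sample(mean, std_dev, sample_size, seed=None):
    """Numbers from a bell-shaped distribution with the given mean and Std dev."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std_dev) for _ in range(sample_size)]


uniform_values = uniform_sample(35, 45, 5, seed=3)
simulated_scores = normal_sample(65, 20, 5, seed=3)

# The "about 68%" rule can be checked on a large simulated sample.
big = normal_sample(65, 20, 20000, seed=4)
share_within_one_sd = sum(1 for x in big if 45 <= x <= 85) / len(big)
```

    With a large sample, `share_within_one_sd` comes out close to 0.68, in line with the statement that observations fall within one standard deviation of the mean about 68% of the time.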


    Binomial Distribution

    A binomial distribution is used to

    describe the number of times an event

    happens out of n trials, where each trial was

    performed independently with a fixed

    probability.

    In StatPad's results (below) five numbers

    are selected from a binomial distribution with

    10 trials each having probability 0.5 of

    success. In the first of the five samples, there

    were 4 out of 10 successes. In the second

    sample, 6 of 10 were successful.

    To select a binomial sample using

    StatPad's main dialog box, select Sampling

    as the situation and Binomial Distribution as the analysis. Specify the Number n of trials and

    the Probability of each trial. Specify the Sample Size. Provide an optional name for the resulting

    data in case you plan to refer to it later, check the Output Range, and then select Do It.

    Random sample of size 5 selected from a binomial distribution

    representing the number of successes in 10 trials, each with probability 0.5:

    4

    6

    6

    5

    6


    Binomial Percentages

    Binomial percentages describe the

    percent or proportion of the time an event

    happens out of n trials, where each trial was

    performed independently with a fixed

    probability.

    In StatPad's results (below) five binomial

    percents were selected from a distribution

    with 10 trials each having probability 0.5 of

    success. In the first of the five samples, 0.3 or

    30% of the 10 trials were successful. In the

    second sample, 60% of the 10 were

    successful.

    To select a sample of binomial

    percentages using StatPad's main dialog box, select Sampling as the situation and Binomial

    Percentages as the analysis. Specify the Number n of trials and the Probability of each trial.

    Specify the Sample Size. Provide an optional name for the resulting data in case you plan to

    refer to it later, check the Output Range, and then select Do It.

    Random sample of size 5 selected from a binomial distribution

    representing the percentage of successes in 10 trials, each with probability 0.5:

    0.3

    0.6

    0.4

    0.8

    0.3
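    Binomial counts and binomial percentages come from the same experiment: a percentage is simply the count divided by the number of trials. The Python sketch below simulates each trial directly. It is an illustration with hypothetical function names, not StatPad's method, and its random values will differ from those shown above.

```python
import random


def binomial_sample(n_trials, probability, sample_size, seed=None):
    """Each value is the number of successes in n_trials independent trials."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_trials) if rng.random() < probability)
        for _ in range(sample_size)
    ]


def binomial_percent_sample(n_trials, probability, sample_size, seed=None):
    """Each value is the proportion of successes: the count divided by n_trials."""
    counts = binomial_sample(n_trials, probability, sample_size, seed)
    return [count / n_trials for count in counts]


counts = binomial_sample(10, 0.5, 5, seed=5)
proportions = binomial_percent_sample(10, 0.5, 5, seed=5)
```

    Using the same seed for both calls shows the link: each proportion is the corresponding count divided by 10.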


    Probability Calculations

    Normal Probability (Greater Than)

    A normal distribution generates numbers

    according to a bell-shaped distribution, with

    values most likely to fall near the mean and

    the width of the bell defined by the standard

    deviation. Observations fall within one

    standard deviation of the mean about 68% of

    the time. Probabilities for a normal

    distribution are given by the area under the

    bell-shaped curve.

    StatPad's result (below) shows the

    probability (0.401) that the specified normal

    distribution (with mean 75 and standard

    deviation 20) is greater than the given value

    (80).

    To find a normal probability using StatPad's main dialog box, select Probability as the

    situation and Normal Probability as the analysis. Choose the type of probability you want

    (Greater than, Less than, Between, or Not between), then give the Value(s) requested. Specify the

    Mean and Standard Deviation of the normal distribution. Check the Output Range, and then

    select Do It.

    The probability that a normal random variable

    with mean 75 and standard deviation 20

    is greater than 80 is:

    0.401


    Normal Probability (Between)

    When you ask StatPad to find the

    probability of being between (or not

    between), the dialog box changes to allow

    you to specify the two values, lower and

    upper.

    StatPad's result (below) shows the

    probability (0.175) that the specified normal

    distribution (with mean 75 and standard

    deviation 20) is between the two given values

    (80 and 90).

    To find a normal probability using

    StatPad's main dialog box, select Probability

    as the situation and Normal Probability as

    the analysis. Choose the type of probability you want (Greater than, Less than, Between, or Not

    between), then give the Value(s) requested. Specify the Mean and Standard Deviation of the

    normal distribution. Check the Output Range, and then select Do It.

    The probability that a normal random variable

    with mean 75 and standard deviation 20

    is between 80 and 90 is:

    0.175
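    The four types of normal probability can be checked with Python's standard library, as sketched below. The function name and the `kind` labels are hypothetical stand-ins for the dialog box choices; `statistics.NormalDist` supplies the area under the bell-shaped curve.

```python
from statistics import NormalDist


def normal_probability(kind, mean, std_dev, value, upper=None):
    """Probability for a normal random variable.

    kind is "greater", "less", "between" or "not between"; the last two
    use value as the lower limit and upper as the upper limit.
    """
    dist = NormalDist(mean, std_dev)
    if kind == "greater":
        return 1 - dist.cdf(value)
    if kind == "less":
        return dist.cdf(value)
    if kind in ("between", "not between"):
        inside = dist.cdf(upper) - dist.cdf(value)
        return inside if kind == "between" else 1 - inside
    raise ValueError(f"unknown kind: {kind}")


p_greater = normal_probability("greater", 75, 20, 80)
p_between = normal_probability("between", 75, 20, 80, 90)
```

    Rounded to three decimals, these reproduce the two results above: 0.401 for greater than 80, and 0.175 for between 80 and 90.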


    Binomial Probability (Equal to)

    A binomial distribution describes the

    number of times an event happens out of n

    trials, where each trial was performed

    independently with a fixed probability.

    StatPad's result (below) shows the

    probability (0.205) that a specified binomial

    distribution is exactly equal to 4. That is, the

    probability is 0.205 of observing exactly 4

    successes out of 10 independent trials with

    probability 0.5 for each trial.

    To find a binomial probability using

    StatPad's main dialog box, select Probability

    as the situation and Binomial Probability as

    the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or

    Between), then give the Value(s) requested. Specify the Number n of trials and the Probability

    for each trial of the binomial distribution. Check the Output Range, and then select Do It.

    If you specify an Equal to value that is not a whole number, StatPad correctly reports the

    resulting probability as zero because a binomial random variable gives the number of successes

    (which must be a whole number).

    The probability that a binomial random variable

    with 10 repeated trials, each with probability 0.5

    is equal to 4 is:

    0.205


    Binomial Probability (This or Less)

    You can also ask StatPad to find the

    probability that a binomial random variable

    is "This value or more", "This value or

    less", or "Between" two values.

    StatPad's result (below) shows the

    probability (0.377) that a specified binomial

    distribution is 4 or less. That is, the

    probability is 0.377 of observing exactly 0, 1,

    2, 3, or 4 successes out of 10 independent

    trials with probability 0.5 for each trial.

    To find a binomial probability using

    StatPad's main dialog box, select Probability

    as the situation and Binomial Probability as

    the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or

    Between), then give the Value(s) requested. Specify the Number n of trials and the Probability

    for each trial of the binomial distribution. Check the Output Range, and then select Do It.

    The probability that a binomial random variable

    with 10 repeated trials, each with probability 0.5

    is 4 or less is:

    0.377
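    Binomial probabilities are sums of terms of the form C(n, k) p^k (1 - p)^(n - k). The Python sketch below works through the two examples above. The function names and `kind` labels are hypothetical, and treating "between" as including both end values is an assumption.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials); zero unless k is a whole number in 0..n."""
    if k != int(k) or k < 0 or k > n:
        return 0.0
    k = int(k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_probability(kind, n, p, value, upper=None):
    """kind is "equal", "this or more", "this or less" or "between"."""
    if kind == "equal":
        return binomial_pmf(value, n, p)
    if kind == "this or more":
        ks = range(math.ceil(value), n + 1)
    elif kind == "this or less":
        ks = range(0, math.floor(value) + 1)
    elif kind == "between":  # assumption: both end values are included
        ks = range(math.ceil(value), math.floor(upper) + 1)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return sum(binomial_pmf(k, n, p) for k in ks)


p_equal_4 = binomial_probability("equal", 10, 0.5, 4)
p_4_or_less = binomial_probability("this or less", 10, 0.5, 4)
```

    Here `p_equal_4` is 210/1024 and `p_4_or_less` is 386/1024, which round to the 0.205 and 0.377 reported above.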


    Binomial Percent (Equal to)

    A binomial percentage describes the

    percent or proportion of the time an event

    happens out of n trials, where each trial was

    performed independently with a fixed

    probability.

    StatPad's result (below) shows the

    probability (0.117) that a specified binomial

    percentage distribution is exactly equal to

    70%. That is, the probability is 0.117 of

    observing exactly 70% successes out of 10

    independent trials (this would be 7 successes

    out of the 10 trials) with probability 0.5 for

    each trial.

    To find a probability for a binomial percent using StatPad's main dialog box, select

    Probability as the situation and Binomial Percent as the analysis. Choose the type of probability

    you want (Equal to, This or more, This or less, or Between), then give the Value(s) requested as

    a percentage. Specify the Number n of trials and the Probability for each trial of the binomial

    distribution. Check the Output Range, and then select Do It.

    The probability that a binomial percentage

    with 10 repeated trials, each with probability 0.5

    is equal to 70% is:

    0.117

  • 40 Overview of StatPad Features

    Binomial Percent (Between)

    You can also ask StatPad to find the

    probability that a binomial percent is This

    value or more, This value or less, or

    Between two values.

    StatPad's result (below) shows the

    probability (0.171) that a specified binomial

    percentage distribution is between 70% and

    90%. That is, the probability is 0.171 of

    observing between 70% and 90% successes

    out of 10 independent trials with probability

    0.5 for each trial (which, in this case,

    corresponds to 7, 8, or 9 occurrences

    representing 70%, 80%, or 90% successes).

    To find a probability for a binomial percent using StatPad's main dialog box, select

    Probability as the situation and Binomial Percent as the analysis. Choose the type of probability

    you want (Equal to, This or more, This or less, or Between), then give the Value(s) requested as

    a percentage. Specify the Number n of trials and the Probability for each trial of the binomial

    distribution. Check the Output Range, and then select Do It.

    The probability that a binomial percentage

    with 10 repeated trials, each with probability 0.5

    is between 70% and 90% is:

    0.171
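A binomial percent probability can be sketched by converting each percentage into a count of successes and then summing binomial probabilities. The function name and the small rounding tolerance `eps` are assumptions for illustration; the inclusive endpoints follow the 7, 8, or 9 interpretation given above.

```python
from math import comb, ceil, floor

def binomial_percent_between(low_pct, high_pct, n, p):
    """P(low_pct% <= percent of successes <= high_pct%), endpoints included."""
    eps = 1e-9  # guards against floating-point error when converting percents to counts
    lo = max(0, ceil(n * low_pct / 100 - eps))
    hi = min(n, floor(n * high_pct / 100 + eps))
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

print(round(binomial_percent_between(70, 90, 10, 0.5), 3))  # 0.171
print(round(binomial_percent_between(70, 70, 10, 0.5), 3))  # 0.117 (the Equal to 70% case)
```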

  • Overview of StatPad Features 41

    Poisson Probability (Equal to)

    A Poisson distribution describes the

    number of times an event happens, where the

    event happens independently at a fixed mean

    rate.

    StatPad's result (below) shows the

    probability (0.0337) that a specified Poisson

    distribution is exactly equal to 1. That is, the

    probability is 0.0337 of observing exactly 1

    occurrence of the event when we expect on

    average to see 5 occurrences. The probability

    is small because we expect many more (5

    occurrences), on average, but may

    occasionally (about 3% of the time) see just

    one.

    To find a Poisson probability using StatPad's main dialog box, select Probability as the

    situation and Poisson Probability as the analysis. Choose the type of probability you want

    (Equal to, This or more, This or less, or Between), then specify the whole-number Value(s) and

    the Mean rate of occurrence (which is not required to be a whole number), check the Output

    Range, and then select Do It.

    The probability that a Poisson random variable

    with mean 5

    is equal to 1 is:

    0.0337
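The 0.0337 figure can be reproduced with the standard Poisson formula, sketched below. The function name `poisson_equal_to` is hypothetical and is not taken from StatPad.

```python
from math import exp, factorial

def poisson_equal_to(k, mean):
    """P(X == k) for a Poisson random variable with the given mean rate."""
    # k must be a whole number of occurrences; the mean may be any positive number.
    return exp(-mean) * mean**k / factorial(k)

print(round(poisson_equal_to(1, 5), 4))  # 0.0337
```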

  • 42 Overview of StatPad Features

    Poisson Probability (This or Less)

    You can also ask StatPad to find the

    probability that a Poisson random variable is

    This value or more, This value or less,

    or Between two values.

    StatPad's result (below) shows the

    probability (0.265) that the specified Poisson

    distribution is 3 or less. That is, the

    probability is 0.265 of observing exactly 0, 1,

    2, or 3 occurrences when we expect on

    average to see 5 occurrences.

    To find a Poisson probability using

    StatPad's main dialog box, select Probability

    as the situation and Poisson Probability as

    the analysis. Choose the type of probability you want (Equal to, This or more, This or less, or

    Between), then give the whole-number Value(s) requested. Specify the Mean rate of occurrence

    of the Poisson distribution (which is not required to be a whole number), check the Output

    Range, and then select Do It.

    The probability that a Poisson random variable

    with mean 5

    is 3 or less is:

    0.265
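For a Poisson This or less probability, one simple approach is to add up the Equal to probabilities for 0 through the given value; This or more is then the complement. This is a minimal sketch with hypothetical function names, not StatPad's own code.

```python
from math import exp, factorial

def poisson_this_or_less(value, mean):
    """P(X <= value) for a Poisson random variable, value a whole number >= 0."""
    return sum(exp(-mean) * mean**k / factorial(k) for k in range(0, value + 1))

def poisson_this_or_more(value, mean):
    # P(X >= value) = 1 - P(X <= value - 1)
    return 1.0 - poisson_this_or_less(value - 1, mean)

print(round(poisson_this_or_less(3, 5), 3))  # 0.265
print(round(poisson_this_or_more(4, 5), 3))  # 0.735
```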

  • Overview of StatPad Features 43

    Exponential Probability (This or More)

    The exponential distribution is a skewed

    continuous distribution that is often used to

    model the amount of time until a task is

    completed or until an event happens. The

    distribution is specified by giving its mean,

    which is not required to be a whole number.

    StatPad's result (below) shows the

    probability (0.768) that the specified

    exponential distribution is 2.38 or more. That

    is, the probability is 0.768 of observing 2.38

    or more when we expect 9 on average.

    To find an exponential probability using

    StatPad's main dialog box, select Probability

    as the situation and Exponential Probability as the analysis. Choose the type of probability you

    want (This or more, This or less, Between, or Not between), then specify the Value(s) and the

    Mean, check the Output Range, and then select Do It.

    The probability that an exponential random variable

    with mean 9

    is 2.38 or more is:

    0.768
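For an exponential distribution with a given mean, the probability of observing a value of x or more is exp(-x / mean). The sketch below applies that standard formula to the example; the function name is hypothetical.

```python
from math import exp

def exponential_this_or_more(x, mean):
    """P(X >= x) for an exponential random variable with the given mean."""
    if x <= 0:
        return 1.0  # an exponential random variable is never negative
    return exp(-x / mean)

print(round(exponential_this_or_more(2.38, 9), 3))  # 0.768
```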

  • 44 Overview of StatPad Features

    Exponential Probability (Between)

    You can also ask StatPad to find the

    probability that an exponential random

    variable is This value or less, Between

    two values, or Not between two values.

    StatPad's result (below) shows the

    probability (0.243) that the specified

    exponential distribution is between 5.2 and

    10.3. That is, the probability is 0.243 of

    observing a value between 5.2 and 10.3 when

    we expect 9 on average.

    To find an exponential probability using

    StatPad's main dialog box, select Probability

    as the situation and Exponential Probability

    as the analysis. Choose the type of probability you want (This or more, This or less, Between, or

    Not between), then specify the Value(s) and the Mean, check the Output Range, and then select

    Do It.

    The probability that an exponential random variable

    with mean 9

    is between 5.2 and 10.3 is:

    0.243
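A Between probability for an exponential distribution is the difference of two This or more probabilities, and Not between is its complement. The following is a sketch under the standard exponential formula, with hypothetical function names.

```python
from math import exp

def exponential_between(low, high, mean):
    """P(low <= X <= high) for an exponential random variable with the given mean."""
    low = max(low, 0.0)  # no probability lies below zero
    if high <= low:
        return 0.0
    return exp(-low / mean) - exp(-high / mean)

def exponential_not_between(low, high, mean):
    return 1.0 - exponential_between(low, high, mean)

print(round(exponential_between(5.2, 10.3, 9), 3))      # 0.243
print(round(exponential_not_between(5.2, 10.3, 9), 3))  # 0.757
```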

  • Overview of StatPad Features 45

    Discrete Probability

    A discrete probability distribution is

    characterized by two lists: a list of values and

    a list of probabilities (where the probabilities

    must add up to 1). StatPad computes the

    Expected Value (also called the Mean) as the

    weighted average of the values (using

    probabilities as the weights) and also

    computes the standard deviation, once you

    specify these two columns of numbers.

    StatPad's results for a situation with

    three possibilities are shown below, where the

    probability is 0.2 that profit is 3

    ($thousands), the probability is 0.5 that profit

    is 5, and probability is 0.3 that profit is 8.

    These are specified as two separate columns of numbers, each with its name (Profit is a

    column containing 3, 5, and 8, while ProbabilityOfProfit is a column containing 0.2, 0.5, and

    0.3 which properly add up to 1). We see from the results below that the expected value is $5.5

    thousand and the standard deviation (measuring the risk of this situation) is $1.8 thousand for

    this discrete random variable.

    To compute the mean and standard deviation for a discrete random variable using StatPad's

    main dialog box, select Probability as the situation and Discrete Probability as the analysis.

    Select one from each of the two lists (or use Add Data if your columns of numbers are in the

    worksheet but are not in the lists) being sure to correctly specify which one contains the values

    and which one contains the probabilities. Next check the Output Range and then select Do It.

    For the discrete random variable with values in Profit

    and probabilities in ProbabilityOfProfit, we have:

    5.50 Mean (or expected value)

    1.80 Standard Deviation
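The expected value and standard deviation of the Profit example can be checked with a probability-weighted average, as in this sketch. The function name is hypothetical, and the check that the probabilities add up to 1 is an illustration of the requirement stated above rather than StatPad's actual validation.

```python
from math import sqrt, isclose

def discrete_mean_sd(values, probabilities):
    """Mean and standard deviation of a discrete random variable."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if not isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must add up to 1")
    # Expected value: weighted average of the values, using probabilities as weights.
    mean = sum(v * p for v, p in zip(values, probabilities))
    # Variance: probability-weighted average squared distance from the mean.
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probabilities))
    return mean, sqrt(variance)

mean, sd = discrete_mean_sd([3, 5, 8], [0.2, 0.5, 0.3])
print(f"{mean:.2f} {sd:.2f}")  # 5.50 1.80
```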

  • 46 Overview of StatPad Features

    Two-Sample Analysis

    Summaries

    Summaries are used to give you selected

    numbers that represent and describe your

    data sets. When you have two samples,

    StatPad first reports summaries for each

    sample separately, then gives the average

    difference and the standard error of the

    average difference, indicating the sampling

    variability of the average difference. Note

    that the two samples are assumed to have the

    same measurement units (e.g., dollars).

    StatPad's two-sample summaries (below)

    are shown for the results of a survey sent to

    customers in the East and to those in the

    West.

    To compute summaries for two samples using StatPad's main dialog box, select Two

    Samples as the situation and Summaries as the analysis. Select a data set from each list (or use

    Add Data if your columns of numbers are in the worksheet but are not in the lists). You may

    (optionally) click on Paired to specify that the data sets have a natural pairing if the counts are

    equal for the two data sets. Next check the Output Range and then select Do It.

    The Paired check-box affects only the standard error of the difference. For a paired
    situation, StatPad gives the ordinary standard error for the paired differences. For an unpaired
    situation, StatPad uses the large-sample formula $\sqrt{S_1^2/n_1 + S_2^2/n_2}$ if both counts are at least 30.
    Otherwise, StatPad uses the small-sample formula (assuming equal population variabilities)
    $\sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$.

    East     West     Summaries
    17       19       Count n
    1,834    2,390    Mean or average
    661      761      Standard deviation (variability of individuals)
    752      836      Smallest
    1,295    2,004    Lower quartile
    1,931    2,294    Median
    2,426    2,853    Upper quartile
    2,975    4,085    Largest
    160      175      Standard error (variability of sample average, if random sample)

    557      Average difference, West minus East
    239      Standard error of the difference
             using the small-sample unpaired formula,
             which assumes equal population variabilities.
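The unpaired standard error of the difference can be sketched from the counts and standard deviations alone. The function below follows the rule described above (large-sample formula when both counts are at least 30, otherwise the pooled small-sample formula); its name is hypothetical, and the inputs are the rounded East and West summaries, so the result matches the reported 239 only after rounding.

```python
from math import sqrt

def se_difference_unpaired(n1, s1, n2, s2):
    """Standard error of the difference between two unpaired sample averages."""
    if n1 >= 30 and n2 >= 30:
        # Large-sample formula: no assumption of equal population variabilities.
        return sqrt(s1**2 / n1 + s2**2 / n2)
    # Small-sample formula: pool the two variances, assuming equal population variabilities.
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return sqrt(pooled_variance * (1 / n1 + 1 / n2))

se = se_difference_unpaired(17, 661, 19, 761)  # East, then West
print(round(se))  # 239
```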

  • Overview of StatPad Features 47

    Histograms

    Histograms are used to visually explore

    data sets. The data axis is horizontal, and the

    bars show how many data values are within

    each interval. Data are concentrated where

    bars are tall. You can see typical value,

    variability, and distribution shape.

    StatPad's histograms are shown below

    for the East and West survey data, one

    histogram for each data set.

    To create histograms for two samples

    using StatPad's main dialog box, select Two

    Samples as the situation and Histograms as

    the analysis. Select a data set from each list

    (or use Add Data if your columns of numbers are in the worksheet but are not in the lists). Next

    check the Output Range and then select Do It.

    StatPad chooses a default bin width and landmark for the histogram bars. If you wish to
    change these, use the Customize check-box found under One Sample, Histogram. Note that
    Excel (not StatPad) chooses the minimum and maximum horizontal scale. To change the scale,
    leave StatPad by pressing the Esc key or selecting Cancel, then double-click the axis to
    find Minimum and Maximum under Axis Options.

    [Figure: two histograms, one for East and one for West; the horizontal axis shows the data values and the vertical axis shows Frequency.]

  • 48 Overview of StatPad Features

    Box Plots

    Box plots are used to visually explore

    and compare data sets; they show you a

    central box defined by the quartiles, with the

    median indicated within the box. In the

    ordinary box plot, a line extends from the box

    on each side to the most extreme value. In the

    detailed box plot, outliers are indicated

    separately and these lines extend from the

    box on each side to the most extreme value

    (adjacent value) that is not an outlier.

    StatPad's detailed box plots are shown

    below, on the same scale, for the East and

    West survey data. Note that the western

    values are generally somewhat higher,

    although there is considerable overlap. There are no outliers.

    To create box plots for two samples using StatPad's main dialog box, select Two Samples

    as the situation and Box Plots as the analysis. Select a data set from each list (or use Add Data

    if your columns of numbers are in the worksheet but are not in the lists). Click on Detailed box

    plot if you wish outliers to be displayed separately. Next check the Output Range and then select

    Do It.

    Outliers are defined as data values more than 1.5 times the interquartile range away from

    either quartile.
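The outlier rule can be illustrated with the East and West quartiles from the two-sample summaries. This sketch takes the quartiles as given (how StatPad computes quartiles is not stated here) and uses hypothetical function names.

```python
def outlier_fences(lower_quartile, upper_quartile):
    """Limits beyond which a data value counts as an outlier."""
    iqr = upper_quartile - lower_quartile  # interquartile range
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

def has_outliers(smallest, largest, lower_quartile, upper_quartile):
    low_fence, high_fence = outlier_fences(lower_quartile, upper_quartile)
    # Checking the two extremes is enough: if they are inside the fences, every value is.
    return smallest < low_fence or largest > high_fence

print(outlier_fences(1295, 2426))           # (-401.5, 4122.5) for East
print(has_outliers(752, 2975, 1295, 2426))  # False (East)
print(has_outliers(836, 4085, 2004, 2853))  # False (West)
```

Both checks return False, which agrees with the statement that there are no outliers in the East and West data.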

    [Figure: box plots on a common horizontal scale from 0 to 5000; East (bottom), West (top).]

  • Overview of StatPad Features 49

    Confidence Interval

    A two-sample confidence interval for the

    population mean difference includes this

    unknown population mean difference with

    known confidence, e.g., 95%, when random

    sampling is used and normal distributions are

    assumed.

    StatPad's 95% confidence interval

    results (below) for the mean difference, West

    minus East, tell you that the bounds of the

    interval are 71.16 and 1,042.42.

    To compute a two-sample confidence

    interval using StatPad's main dialog box,

    select Two Samples as the situation and Confidence Interval as the analysis. Select a data set

    from each list (or use Add Data if your columns of numbers are in the worksheet but are not in

    the lists). You may (optionally) change the Confidence level (from the default 95%). You may

    also (optionally) click on Paired to specify that the data sets have a natural pairing if the counts

    are equal for the two data sets. Next check the Output Range and then select Do It.

    The two-sample confidence interval is based on the standard error of the difference,

    described previously under Two Sample, Summaries. If unpaired, random sampling from each of

    two normal populations is assumed (also assuming equal population variabilities if the small-

    sample standard error is used). If paired, random sampling from a normal population is

    assumed for the differences formed from the two measurements on each unit sampled.

    Confidence interval for the difference:

    West minus East: We are 95% sure that the

    population mean difference is between

    71.16 and 1,042.42

    using the small-sample unpaired standard error,

    which assumes equal population variabilities, and also

    assuming random samples from normal populations.
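A confidence interval of this kind is the average difference plus or minus a critical t value times the standard error of the difference. The sketch below uses the rounded values from the two-sample summaries (557 and 239). The critical value 2.0322 is an assumption: it is the two-sided 95% t table value for 17 + 19 - 2 = 34 degrees of freedom, which is the usual choice for the small-sample unpaired case.

```python
def confidence_interval(difference, standard_error, t_critical):
    """Lower and upper bounds: difference plus or minus t_critical standard errors."""
    margin = t_critical * standard_error
    return difference - margin, difference + margin

# Assumed two-sided 95% t table value for 34 degrees of freedom.
T_CRITICAL_95_DF34 = 2.0322

low, high = confidence_interval(557, 239, T_CRITICAL_95_DF34)
print(f"{low:.1f} to {high:.1f}")
```

The bounds come out near 71 and 1,043. They differ slightly from the reported 71.16 and 1,042.42 because the inputs here are rounded summaries rather than the raw survey data.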

  • 50 Overview of StatPad Features

    Hypothesis Test

    A two-sample hypothesis test is used to

    decide, based on data, whether or not the

    unobservable population means could

    reasonably be equal to each other. Because

    the sample averages represent (with

    statistical error) their respective unknown

    population means, the result is often stated in

    terms of a significant (or nonsignificant)

    difference between the sample averages, both

    of which are known.

    StatPad's two-sample hypothesis test

    results for the East and West survey (below)

    show a significant difference between the two

    regions (East and West) on average. Results

    include the t value, the p value, the practical interpretation of the results, and a formal statement

    of the null hypothesis being tested.

    To perform a two-sample hypothesis test using StatPad's main dialog box, select Two

    Samples as the situation and Hypothesis Test as the analysis. Select a data set from each list (or

    use Add Data if your columns of numbers are in the worksheet but are not in the lists). You may

    (optionally) click on Paired to specify that the data sets have a natural pairing if the counts are

    equal for the two data sets. Next check the Output Range and then select Do It.

    The two-sample hypothesis test is based on the standard error of the difference, described

    previously under Two Sample, Summaries. If unpaired, random sampling from each of two

    normal populations is assumed (also assuming equal population variabilities if the small-sample

    standard error is used). If paired, random sampling from a normal population is assumed for the

    differences formed from the two measurements on each unit sampled.

    The p value says that, if the population means had been equal to each other, then p is the

    probability of observing such a large (or larger) difference between the sample averages.

    Smaller p values indicate significance because rare events are unlikely.
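The t value for the East and West example can be reproduced from the summaries as the difference between the sample averages divided by the standard error of the difference. This is a sketch with hypothetical function names; the critical value 2.0322 (two-sided 5% level, 34 degrees of freedom) is an assumed t table value used here in place of computing the p value.

```python
from math import sqrt

def pooled_standard_error(n1, s1, n2, s2):
    """Small-sample unpaired standard error, assuming equal population variabilities."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return sqrt(pooled_variance * (1 / n1 + 1 / n2))

def t_statistic(mean1, mean2, standard_error):
    return (mean2 - mean1) / standard_error

se = pooled_standard_error(17, 661, 19, 761)   # East, then West
t = t_statistic(1834, 2390, se)                # West minus East
T_CRITICAL_5PCT_DF34 = 2.0322                  # assumed t table value

print(round(t, 2))                    # 2.33
print(abs(t) > T_CRITICAL_5PCT_DF34)  # True: significant at the 5% level
```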

    Hypothesis test for East and West:

    t = 2.33

    p = 0.026

    The sample averages

    1,834 and 2,390

    are significantly different (p

  • Overview of StatPad Features 51

    Many-Sample Analysis

    Summaries

    Summaries are used to give you selected

    numbers that represent and describe your

    data sets. When you have many samples,

    StatPad reports summaries for each sample

    separately.

    StatPad's many-sample summaries

    (below) are shown for the quality scores of

    four suppliers (defining four samples). For

    example, supplier B had 35 scores listed, with

    an average of 85.14.

    To compute summaries for many samples

    using StatPad's main dialog box, select Many

    Samples as the situation and Summaries as

    the analysis. Select your data sets from the list (or use Add Data if your columns of numbers are

    in the worksheet but are not in the list). Next check the Output Range and then select Do It.

    SupplierA  SupplierB  SupplierC  SupplierD  Summaries
    20         35         15         25         Count n
    90.97      85.14      76.00      89.35      Mean or average
    4.68       4.84       3.73       5.05       Standard deviation (variability of individuals)
    82.41      74.31      71.11      75.97      Smallest
    88.57      81.61      73.10      86.77      Lower quartile
    90.13      84.97      75.22      88.80      Median
    93.00      89.05      79.56      93.24      Upper quartile
    99.63      94.61      82.44      98.54      Largest
    1.05       0.82       0.96       1.01       Standard error (variability of sample average, if random sample)
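The standard error row can be checked from the other summaries, since the standard error of a sample average is the standard deviation divided by the square root of the count. A minimal sketch using the rounded supplier values, with a hypothetical function name:

```python
from math import sqrt

def standard_error(std_dev, n):
    """Standard error of a sample average: variability of the average, not of individuals."""
    return std_dev / sqrt(n)

# (count, standard deviation) for each supplier, taken from the summaries.
suppliers = {
    "SupplierA": (20, 4.68),
    "SupplierB": (35, 4.84),
    "SupplierC": (15, 3.73),
    "SupplierD": (25, 5.05),
}

for name, (n, s) in suppliers.items():
    print(name, round(standard_error(s, n), 2))
# SupplierA 1.05
# SupplierB 0.82
# SupplierC 0.96
# SupplierD 1.01
```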

  • 52 Overview of StatPad Features

    Histograms

    Histograms are used to visually explore

    data sets. The data axis is horizontal, and the

    bars show how many data values are within

    each interval. Data are concentrated where

    bars are tall. You can see typical value,

    variability, and distribution shape.

    StatPad's many-sample histograms

    (below) are shown for the quality scores of

    the four suppliers. Some of the horizontal

    scales have been changed using Excel chart

    commands (see below) because Excel's

    choice did not show enough detail.

    To create histograms for many samples

    using StatPad's main dialog box, select Many Samples as the situation and Histograms as the

    analysis. Select your data sets from the list (or use Add Data if your columns of numbers are in

    the worksheet but are not in the list). Next check the Output Range and then select Do It.

    StatPad chooses a default bin width and landmark for the histogram bars. If you wish to
    change these, use the Customize check-box found under One Sample, Histogram. Note that
    Excel (not StatPad) chooses the minimum and maximum horizontal scale. To change the scale,
    leave StatPad by pressing the Esc key or selecting Cancel, then double-click the axis to
    find Minimum and Maximum under Axis Options.

    [Figure: four histograms, one each for SupplierA, SupplierB, SupplierC, and SupplierD; the horizontal axis shows the quality scores and the vertical axis shows Frequency.]

  • Overview of StatPad Features 53

    Box Plots

    Box plots are used to visually explore

    and quickly compare data sets; they show you

    a central box defined by the quartiles, with

    the median indicated within the box. In the

    ordinary box plot, a line extends from the box

    on each side to the most extreme value. In the

    detailed box plot, outliers are indicated

    separately and these lines extend from the

    box on each side to the most extreme value

    (adjacent value) that is not an outlier.

    StatPad's many-sample detailed box

    plots (below) are shown for the quality scores

    of the four suppliers, arranged on the same

    scale for easy comparison. There is one box

    plot for each supplier. Suppliers A and D seem to have the highest scores overall, while supplier

    C has the lowest. Supplier D has a low outlier score. The horizontal scale was changed using

    Excel chart commands (see below) because Excel's choice did not show enough detail.

    To create box plots for many samples using StatPad's main dialog box, select Many

    Samples as the situation and Box Plots as the analysis. Select your data sets from the list (or use

    Add Data if your columns of numbers are in the worksheet but are not in the list). Click on

    Detailed box plot if you wish outliers to be displayed separately. Next check the Output Range

    and then select Do It.

    Outliers are defined as data values more than 1.5 times the interquartile range away from
    either quartile. Note that Excel (not StatPad) chooses the minimum and maximum horizontal
    scale. To change the scale, leave StatPad by pressing the Esc key or selecting Cancel, then
    double-click the axis to find Minimum and Maximum under Axis Options.
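Applying the outlier definition to the supplier summaries gives a quick check of which box plots should show outliers. The sketch below uses only the smallest value, quartiles, and largest value reported for each supplier; the function name and data layout are hypothetical.

```python
# (smallest, lower quartile, upper quartile, largest) from the many-sample summaries.
summaries = {
    "SupplierA": (82.41, 88.57, 93.00, 99.63),
    "SupplierB": (74.31, 81.61, 89.05, 94.61),
    "SupplierC": (71.11, 73.10, 79.56, 82.44),
    "SupplierD": (75.97, 86.77, 93.24, 98.54),
}

def outlier_sides(smallest, q1, q3, largest):
    """Report whether the extremes fall more than 1.5 IQR beyond either quartile."""
    iqr = q3 - q1
    sides = []
    if smallest < q1 - 1.5 * iqr:
        sides.append("low")
    if largest > q3 + 1.5 * iqr:
        sides.append("high")
    return sides

flagged = {name: outlier_sides(*row) for name, row in summaries.items() if outlier_sides(*row)}
print(flagged)  # {'SupplierD': ['low']}
```

Only SupplierD is flagged, on the low side, which is consistent with the low outlier score noted for Supplier D.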

    [Figure: box plots on a common horizontal scale from 70 to 100; bottom to top: SupplierA, SupplierB, SupplierC, SupplierD.]

  • 54 Overview of StatPad Features

    F Test for One-Way ANOVA

    The F test for one-way ANOVA (analysis

    of variance) is used to decide, based on data,

    whether or not the unobservable population

    means could reasonably be equal to each

    other. Because the sample averages represent

    (with statistical error) their respective

    unknown population means, the result is often

    stated in terms of a