Intro To Statistics

Introduction to basic statistics.

Basic Statistics: Descriptive Analysis (Graphical)

Dealing with Uncertainty

Everyday decisions are based on incomplete information

Consider:

Will the job market be strong when I graduate?

Will the price of HELP's stock be higher in six months than it is now?

Dealing with Uncertainty (continued)

Numbers and data are used to assist decision making

Statistics is a tool to help process, summarize, analyze, and interpret data

Population vs. Sample

[Figure: diagram labeled "Population"]

Population vs. Sample (continued)

Values calculated using population data are called parameters

Values computed from sample data are called statistics
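The parameter/statistic distinction can be sketched in a few lines of Python. The weights below are hypothetical values invented for illustration (they do not come from the slides), and the sample size of 4 is an arbitrary choice.

```python
import random
from statistics import mean

# Hypothetical population: weights (in pounds) of 10 people.
# These numbers are invented for illustration only.
population = [128, 135, 140, 142, 147, 150, 133, 139, 144, 142]

# Parameter: a value calculated from the entire population.
population_mean = mean(population)

# Statistic: a value computed from a sample drawn from that population.
rng = random.Random(0)               # fixed seed so the draw is repeatable
sample = rng.sample(population, 4)   # sample of 4 without replacement
sample_mean = mean(sample)

print("population mean (parameter):", population_mean)
print("sample mean (statistic):    ", sample_mean)
```

The sample mean generally differs from the population mean, and it changes from sample to sample.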

Symbols

Descriptive and Inferential Statistics

Two branches of statistics:

Descriptive statistics: using graphical and numerical procedures to summarize and process data

Inferential statistics: using data to make predictions, forecasts, and estimates to assist decision making

Descriptive Statistics

Collect data, e.g., survey

Present data, e.g., tables and graphs

Summarize data, e.g., sample mean $\bar{x} = \frac{\sum x_i}{n}$

Inferential Statistics

Estimation, e.g., estimate the population mean weight using the sample mean weight

Hypothesis testing, e.g., test the claim that the population mean weight is 140 pounds

Inference is the process of drawing conclusions or making decisions about a population based on sample results

Types of Data

Categorical (defined categories or groups)
Examples: marital status, "Are you registered to vote?", eye color

Numerical, discrete (counted items)
Examples: number of children, defects per hour

Numerical, continuous (measured characteristics)
Examples: weight, voltage

Measurement Levels

Quantitative data

Ratio data: differences between measurements, true zero exists

Interval data: differences between measurements but no true zero

Qualitative data

Ordinal data: ordered categories (rankings, order, or scaling)

Nominal data: categories (no ordering or direction)

Measurement Levels

Variables can be split into categorical and continuous, and within these types there are different levels of measurement:

Categorical

Nominal scale: the lowest scale; numbers are assigned to identify attributes; no order or sequence

Ordinal scale: the same as a nominal variable, but the categories have a logical order; arranged from lowest to highest or vice versa

Continuous

Interval scale: equal intervals on the variable represent equal differences in the property being measured; arbitrary zero

Ratio scale: the same as an interval variable (equal intervals on the variable represent equal differences in the property being measured), but with a true zero

What are these variables' measurement scales?

1. Speed (km/h)
2. Motivation scores
3. Number of SMS received
4. Nationality
5. Perception scores
6. Quality of work life scores
7. Income categories
8. Musical ability
9. Favorite food
10. Speaking ability

What are these variables' measurement scales? (answers)

1. Speed (km/h): ratio
2. Motivation scores: interval
3. Number of SMS received: ratio
4. Nationality: nominal
5. Perception scores: interval
6. Quality of work life scores: interval
7. Income categories: ordinal
8. Musical ability: ordinal
9. Favorite food: nominal
10. Speaking ability: ordinal
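As a rough sketch, the answers can be recorded in a lookup table and grouped the way this deck groups the scales (nominal and ordinal as categorical, interval and ratio as continuous). The names `scale_of`, `kind_of_scale`, and `by_kind` are illustrative choices, not part of the slides.

```python
# Scale assigned to each variable, copied from the exercise answers.
scale_of = {
    "Speed (km/h)": "ratio",
    "Motivation scores": "interval",
    "Number of SMS received": "ratio",
    "Nationality": "nominal",
    "Perception scores": "interval",
    "Quality of work life scores": "interval",
    "Income categories": "ordinal",
    "Musical ability": "ordinal",
    "Favorite food": "nominal",
    "Speaking ability": "ordinal",
}

# Grouping used in this deck: two categorical scales, two continuous scales.
kind_of_scale = {
    "nominal": "categorical",
    "ordinal": "categorical",
    "interval": "continuous",
    "ratio": "continuous",
}

# Collect the variables under each kind.
by_kind = {}
for variable, scale in scale_of.items():
    by_kind.setdefault(kind_of_scale[scale], []).append(variable)

for kind, variables in by_kind.items():
    print(f"{kind}: {', '.join(variables)}")
```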

Descriptive Statistics: Graphical Procedures

Data in raw form are usually not easy to use for decision making

Some type of organization is needed: a table or a graph

The type of graph to use depends on the variable being summarized

Descriptive Statistics: Graphical Procedures (continued)

Categorical variables: frequency distribution, bar chart, pie chart

Numerical variables: frequency distribution, histogram and ogive, stem-and-leaf display, scatter plot

Tables and Graphs for Categorical Variables

Categorical data

Tabulating data: frequency distribution table

Graphing data: bar chart, pie chart

The Frequency Distribution Table

Example: Hospital Patients by Unit

Hospital Unit      Number of Patients
Cardiac Care              1,052
Emergency                 2,245
Intensive Care              340
Maternity                   552
Surgery                   4,630

(Variables are categorical)

Bar and Pie Charts

Bar charts and pie charts are often used for qualitative (category) data

Height of bar or size of pie slice shows the frequency or percentage for each category

Bar Chart Example

Hospital Unit      Number of Patients
Cardiac Care              1,052
Emergency                 2,245
Intensive Care              340
Maternity                   552
Surgery                   4,630

[Figure: bar chart "Hospital Patients by Unit"; vertical axis: number of patients per year]

Pie Chart Example

(Percentages are rounded to the nearest percent)

Hospital Unit      Number of Patients   % of Total
Cardiac Care              1,052            11.93
Emergency                 2,245            25.46
Intensive Care              340             3.86
Maternity                   552             6.26
Surgery                   4,630            52.50

[Figure: pie chart "Hospital Patients by Unit"]
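The "% of Total" column can be reproduced from the patient counts. The sketch below is one way to do it in Python; the variable names are illustrative, and rounding to two decimals is chosen only to match the table.

```python
# Patient counts per hospital unit, taken from the table.
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

total = sum(patients.values())

# Each unit's share of the total, as a percentage rounded to two decimals.
percent_of_total = {
    unit: round(100 * count / total, 2) for unit, count in patients.items()
}

for unit, pct in percent_of_total.items():
    print(f"{unit:<15} {patients[unit]:>6,} {pct:>7.2f}")
```

Each pie slice is sized in proportion to these percentages, just as each bar height is proportional to the raw count.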

Graphs to Describe Numerical Variables

Numerical data

Frequency distributions and cumulative distributions: histogram, ogive

Stem-and-leaf display

Frequency Distributions

What is a Frequency Distribution?

A frequency distribution is a list or a table containing class groupings (categories or ranges within which the data fall) and the corresponding frequencies with which data fall within each class or category

Why Use Frequency Distributions?

A frequency distribution is a way to summarize data

The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data

Frequency Distribution Example

Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Interval               Frequency   Relative Frequency   Percentage
10 but less than 20        3             0.15               15
20 but less than 30        6             0.30               30
30 but less than 40        5             0.25               25
40 but less than 50        4             0.20               20
50 but less than 60        2             0.10               10
Total                     20             1.00              100

Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
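The table can be rebuilt from the raw temperatures. This is a minimal sketch that assumes equal-width classes of 10 starting at 10, as in the example; the variable names are illustrative.

```python
# Daily high temperatures for the 20 sampled winter days (from the example).
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

width = 10
lower_bounds = list(range(10, 60, width))   # 10, 20, 30, 40, 50

# Count how many observations fall in each class "lo but less than lo + width".
frequency = {lo: 0 for lo in lower_bounds}
for t in temps:
    frequency[(t // width) * width] += 1

n = len(temps)
relative = {lo: frequency[lo] / n for lo in lower_bounds}
percentage = {lo: 100 * frequency[lo] / n for lo in lower_bounds}

for lo in lower_bounds:
    print(f"{lo} but less than {lo + width}: "
          f"{frequency[lo]:2d}  {relative[lo]:.2f}  {percentage[lo]:.0f}")
```

Integer division assigns a value on a class boundary (30, for instance) to the class that starts there, which matches the "but less than" wording of the intervals.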

Histogram

A graph of the data in a frequency distribution is called a histogram

The interval endpoints are shown on the horizontal axis

The vertical axis is either frequency, relative frequency, or percentage

Bars of the appropriate heights are used to represent the number of observations within each class

Histogram Example

Interval               Frequency
10 but less than 20        3
20 but less than 30        6
30 but less than 40        5
40 but less than 50        4
50 but less than 60        2

[Figure: histogram "Daily High Temperature"; horizontal axis: temperature in degrees; vertical axis: frequency; no gaps between bars]
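Since the chart itself did not survive, here is a rough text-only stand-in: one row per class, with a bar whose length equals the class frequency. It is a sketch of the idea rather than the original chart, and it draws the bars horizontally for simplicity.

```python
# Class intervals and frequencies from the histogram example.
intervals = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [3, 6, 5, 4, 2]

# Build one text row per class; bar length is the frequency.
rows = []
for (lo, hi), f in zip(intervals, frequencies):
    label = f"{lo} but less than {hi}"
    rows.append(f"{label:<20} {'#' * f} ({f})")

print("\n".join(rows))
```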

The Cumulative Frequency Distribution

Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                3                      15
20 but less than 30        6           30                9                      45
30 but less than 40        5           25               14                      70
40 but less than 50        4           20               18                      90
50 but less than 60        2           10               20                     100
Total                     20          100
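The cumulative columns are running totals of the frequency column. A short sketch using `itertools.accumulate` from the Python standard library (variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies from the table, in class order.
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)

# Running total of the frequencies, then the same totals as percentages of n.
cumulative_frequency = list(accumulate(frequencies))
cumulative_percentage = [100 * c / n for c in cumulative_frequency]

print(cumulative_frequency)
print(cumulative_percentage)
```

The last cumulative frequency always equals the number of observations, so the last cumulative percentage is always 100.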

The Ogive Example

Interval               Upper interval endpoint   Cumulative Percentage
Less than 10                     10                        0
10 but less than 20              20                       15
20 but less than 30              30                       45
30 but less than 40              40                       70
40 but less than 50              50                       90
50 but less than 60              60                      100

[Figure: ogive "Daily High Temperature"; horizontal axis: interval endpoints; vertical axis: cumulative percentage]
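The points plotted on the ogive pair each upper interval endpoint with the cumulative percentage reached at that endpoint. The sketch below builds that list of points; the starting point at 10 with 0 percent follows the "Less than 10" row, and the names are illustrative.

```python
from itertools import accumulate

# Upper endpoints and frequencies of the five classes in the example.
upper_endpoints = [20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)

cumulative_percentage = [100 * c / n for c in accumulate(frequencies)]

# The ogive starts at 0% at the first endpoint, because no observation is below 10.
ogive_points = [(10, 0.0)] + list(zip(upper_endpoints, cumulative_percentage))

for endpoint, pct in ogive_points:
    print(f"{endpoint:>3} {pct:>6.0f}")
```

Passing these (x, y) pairs to any line-plotting tool would draw the curve.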

Stem-and-Leaf Diagram

A simple way to see distribution details in a data set

METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)

Example

Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Here, use the 10s digit for the stem unit:

                   Stem   Leaf
21 is shown as       2      1
38 is shown as       3      8

Example (continued)

Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Completed stem-and-leaf diagram:

Stem   Leaves
2      1 4 4 6 7 7
3      0 2 8
4      1
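The method can be sketched in Python with `divmod`, which splits each value into its 10s digit (the stem) and its units digit (the leaf). The sketch assumes non-negative two-digit integers, as in the example.

```python
from collections import defaultdict

# Data from the example.
data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

# Map each stem (10s digit) to its list of leaves (units digits).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

print("Stem   Leaves")
for stem in sorted(stems):
    print(f"{stem:<6} {' '.join(str(leaf) for leaf in stems[stem])}")
```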

Using other stem units

Using the 100s digit as the stem: round each value to the nearest 10 and use the 10s digit as the leaf

                     Stem   Leaf
613 would become       6      1
776 would become       7      8
. . .
1224 becomes          12      2

Using other stem units (continued)

Using the 100s digit as the stem:

Data: 613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891, 894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224

The completed stem-and-leaf display:

Stem   Leaves
6      1 3 6
7      2 2 5 8
8      3 4 6 6 9 9
9      1 3 3 6 8
10     3 5 6
11     4 7
12     2
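The same idea works with the 100s digit as the stem, once each value is rounded to the nearest 10. The sketch below rounds halves up with integer arithmetic, an assumption that reproduces the display (955 gets leaf 6). Python's built-in `round` is avoided because it rounds halves to the nearest even number.

```python
from collections import defaultdict

# Data from the example.
data = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
        894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]

stems = defaultdict(list)
for value in sorted(data):
    tens = (value + 5) // 10        # value rounded to the nearest 10, in units of 10
    stem, leaf = divmod(tens, 10)   # stem = 100s part, leaf = rounded 10s digit
    stems[stem].append(leaf)

print("Stem   Leaves")
for stem in sorted(stems):
    print(f"{stem:<6} {' '.join(str(leaf) for leaf in stems[stem])}")
```

Rounding discards the units digit, so this display trades some detail for a more compact picture.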

Relationships Between Variables

Graphs illustrated so far have involved only a single variable

When two variables exist, other techniques are used:

Categorical (qualitative) variables: cross tables

Numerical (quantitative) variables: scatter plots

Scatter Diagrams

Scatter diagrams are used for paired observations taken from two numerical variables

The scatter diagram: one variable is measured on the vertical axis and the other variable is measured on the horizontal axis

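Scatter Diagram Example

Volume per day   Cost per day
      23             125
      26             140
      29             146
      33             160
      38             167
      42             170
      50             188
      55             195
      60             200

[Figure: scatter diagram "Cost per Day vs. Production Volume"; horizontal axis: volume per day; vertical axis: cost per day]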
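A scatter diagram plots each paired observation as one point. As a rough sketch, the pairs can be assembled with `zip`, and a simple check (an illustrative addition, not part of the slides) confirms what the plot would show: cost rises with every increase in volume.

```python
# Paired observations from the example table.
volume_per_day = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost_per_day = [125, 140, 146, 160, 167, 170, 188, 195, 200]

# One (x, y) point per day: volume on the horizontal axis, cost on the vertical.
points = list(zip(volume_per_day, cost_per_day))

# Does cost increase from each observation to the next (data sorted by volume)?
cost_rises_with_volume = all(
    later[1] > earlier[1] for earlier, later in zip(points, points[1:])
)

for x, y in points:
    print(f"volume={x:>2}  cost={y}")
print("cost rises with volume:", cost_rises_with_volume)
```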

Cross Table and Side by Side Bar Chart

Sales by quarter for three sales territories:
