Introduction to Applied Statistics CHAPTER 1 BUM 2413 / BPF 3313


Page 1: Chapter 1 Introduction to Applied Statistics 111

Introduction to Applied Statistics

CHAPTER 1

BUM 2413 / BPF 3313

Page 2: Chapter 1 Introduction to Applied Statistics 111

CONTENT

1.1 Overview

1.2 Statistical Problem-Solving Methodology

1.3 Review of Descriptive Statistics

1.3.1 Measures of Central Tendency

1.3.2 Measures of Variation

Page 3: Chapter 1 Introduction to Applied Statistics 111

OBJECTIVE

By the end of this chapter, you should be able to:

• Define statistics, population, sample, parameter, statistic, descriptive statistics and inferential statistics.

• Understand and explain why a knowledge of statistics is needed.

• Outline the 6 basic steps in the statistical problem-solving methodology.

• Identify various methods of obtaining samples.

• Discuss the role of computers and data analysis software in statistical work.

• Summarize data using measures of central tendency, such as the mean, median, mode, and midrange.

• Describe data using measures of variation, such as the range, variance, and standard deviation.

Page 4: Chapter 1 Introduction to Applied Statistics 111

1.1 OVERVIEW

Page 5: Chapter 1 Introduction to Applied Statistics 111

What is Statistics?

Most people become familiar with probability and statistics through radio, television, newspapers, and magazines. For example, the following statements were found in newspapers:

• Tens of thousands of parents in Malaysia have chosen StemLife as their trusted stem cell bank.

• The average annual salary for a professional football player for the year 2001 was $1,100,500.

• The average cost of a wedding is nearly RM10,000.

• In the USA, the median salary for men with a bachelor’s degree is $49,982, while the median salary for women with a bachelor’s degree is $35,408.

• Globally, an estimated 500,000 children under the age of 15 live with Type 1 diabetes.

• Women who eat fish once a week are 29% less likely to develop heart disease.

Page 6: Chapter 1 Introduction to Applied Statistics 111

Why Statistics?

• Deal with uncertainty in repeated scientific measurements

• Draw conclusions from data

• Design valid experiments and draw reliable conclusions

• Be a well-informed member of society

Page 7: Chapter 1 Introduction to Applied Statistics 111

Statistics is the science of conducting studies to collect, organize, summarize, analyze, present, interpret and draw conclusions from data.

Data are any values (observations or measurements) that have been collected.

Page 8: Chapter 1 Introduction to Applied Statistics 111

The basic idea behind all statistical methods of data analysis is to make inferences about a population by studying a small sample chosen from it.

Population: The complete collection of measurements, outcomes, objects or individuals under study.

Sample: A subset of a population, containing the objects or outcomes that are actually observed.

Parameter: A number that describes a population characteristic.

Statistic: A number that describes a sample characteristic.

Tangible population: Always finite. The total number of members is fixed and could be listed. After one member is sampled, the population size decreases by 1.

Conceptual population: A population that consists of all the values that might possibly have been observed. It has an unlimited number of members.

Page 9: Chapter 1 Introduction to Applied Statistics 111

Example 1

Consider a machine that makes steel rods for use in optical storage devices. The specification for the diameter of the rods is 0.45 ± 0.02 cm. During the last hour, the machine has made 1000 rods. The quality engineer wants to know approximately how many of these rods meet the specification. He does not have time to measure all 1000 rods.

So he draws a random sample of 50 rods, measures them, and finds that 46 of them (92%) meet the diameter specification. Now, it is unlikely that the sample of 50 rods represents the population of 1000 perfectly.

Page 10: Chapter 1 Introduction to Applied Statistics 111

Example 1

The engineer might need to answer several questions based on the sample data. For example:

1. How large is a typical difference for this kind of sample?

2. What interval gives a good estimate of the percentage of acceptable rods in the population with reasonable certainty?

3. How certain can the engineer be that at least 90% of the rods are good?

Statistics can help us to address questions like these.
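Question 2 can be sketched numerically. The snippet below is only a rough illustration under stated assumptions: it uses the common large-sample (normal approximation) interval for a proportion with a multiplier of 1.96 for roughly 95% confidence. Neither the formula nor the multiplier is introduced in this chapter, and with only 4 unacceptable rods in the sample the approximation is crude.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Large-sample interval for a population proportion.

    Assumption: normal approximation; z=1.96 corresponds to about 95% confidence.
    """
    p_hat = successes / n                          # sample proportion (a statistic)
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)   # typical sampling error of p_hat
    return p_hat - z * std_err, p_hat + z * std_err

# 46 of the 50 sampled rods met the specification
low, high = proportion_interval(46, 50)
print(f"Sample proportion: {46 / 50:.2f}")
print(f"Approximate 95% interval: {low:.3f} to {high:.3f}")
```

The sample proportion 0.92 is a statistic; the interval is a statement about the unknown population parameter (the proportion of good rods among all 1000).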

Page 11: Chapter 1 Introduction to Applied Statistics 111

Descriptive & Inferential Statistics

Inferential statistics consists of generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.

• Used to describe, infer, estimate or approximate the characteristics of the target population

• Used when we want to draw a conclusion from the data obtained from the sample

Descriptive statistics consists of the collection, organization, classification, summarization, and presentation of data obtained from the sample.

• Used to describe the characteristics of the sample

• Used to determine whether the sample represents the target population by comparing the sample statistic and the population parameter

Page 12: Chapter 1 Introduction to Applied Statistics 111

Example 2

• Tens of thousands of parents in Malaysia have chosen StemLife as their trusted stem cell bank. (Descriptive)

• The death rate from lung cancer was 10 times as high for smokers as for nonsmokers. (Inferential)

• The average cost of a wedding is nearly RM10,000. (Descriptive)

• In the USA, the median salary for men with a bachelor’s degree is $49,982, while the median salary for women with a bachelor’s degree is $35,408. (Descriptive)

• Globally, an estimated 500,000 children under the age of 15 live with Type 1 diabetes. (Inferential)

• A researcher claims that a new drug will reduce the number of heart attacks in men over 70 years of age. (Inferential)

Page 13: Chapter 1 Introduction to Applied Statistics 111

An overview of descriptive statistics and statistical inference (flowchart)

START

1. Gathering of data

2. Classification, summarization, and processing of data

3. Presentation and communication of summarized information

4. Is the information from a sample?

No: Use census data to analyze the population characteristic under study, then STOP.

Yes: Use sample information to make inferences about the population, draw conclusions about the population characteristic (parameter) under study, then STOP.

Steps 1 to 3 make up descriptive statistics; the "Yes" branch is statistical inference.

Page 14: Chapter 1 Introduction to Applied Statistics 111

Need for Statistics

It is a fact that you need a knowledge of statistics to help you:

1. Describe and understand numerical relationships between variables. There are a lot of data in this world, so we need to identify the right variables.

2. Make better decisions. Statistical methods allow people to make better decisions in the face of uncertainty.

Page 15: Chapter 1 Introduction to Applied Statistics 111

Describing relationships between variables

1. A management consultant wants to compare a client’s investment return for this year with related figures from last year. He summarizes masses of revenue and cost data from both periods and, based on his findings, presents his recommendations to his client.

2. A college admission director needs to find an effective way of selecting student applicants. He designs a statistical study to see if there is a significant relationship between SPM results and the GPA achieved by freshmen at his school. If there is a strong relationship, a high SPM result will become an important criterion for acceptance.

Page 16: Chapter 1 Introduction to Applied Statistics 111

Aiding in Decision Making

1. Suppose that the manager of “Big-Wig Executive Hair Stylist”, Alvin Tang, has advertised that 90% of the firm’s customers are satisfied with the company’s services. If Pamela, a consumer activist, feels that this is an exaggerated statement that might require legal action, she can use statistical inference techniques to decide whether or not to sue Alvin.

2. Students and professional people can also use the knowledge gained from studying statistics to become better consumers and citizens. For example, they can make intelligent decisions about what products to purchase based on consumer studies, about government spending based on utilization studies, and so on.

Page 17: Chapter 1 Introduction to Applied Statistics 111

1.2 STATISTICAL PROBLEM SOLVING METHODOLOGY

Page 18: Chapter 1 Introduction to Applied Statistics 111

STATISTICAL PROBLEM SOLVING METHODOLOGY

6 Basic Steps

1. Identifying the problem or opportunity

2. Deciding on the method of data collection

3. Collecting the data

4. Classifying and summarizing the data

5. Presenting and analyzing the data

6. Making the decision

Page 19: Chapter 1 Introduction to Applied Statistics 111

STEP 1 Identifying the problem or opportunity

• Must clearly understand and correctly define the objective/goal of the study. If not, time and effort are wasted.

• Is the goal to study some population?

• Is it to impose some treatment on the group and then test the response?

• Can the study goal be achieved through simple counts or measurements of the group?

• Must an experiment be performed on the group?

• If samples are needed, how large should they be and how should they be taken? The larger the better (more than 30).

Page 20: Chapter 1 Introduction to Applied Statistics 111

Characteristics of sample size

• The larger the sample, the smaller the magnitude of sampling errors.

• Survey studies need large samples because returning the survey is voluntary.

• Large samples are easy to divide into subgroups.

• In mail surveys the response rate may be as low as 20%-30%, so a larger sample is required.

• Subject availability and cost factors are legitimate considerations in determining an appropriate sample size.

Page 21: Chapter 1 Introduction to Applied Statistics 111

STEP 2 Deciding on the Method of Data Collection

Data must be gathered that are accurate, as complete as possible and relevant to the problem.

Data can be obtained in 3 ways:

1. Data that are made available by others (internal, external, primary or secondary data)

2. Data resulting from an experiment (experimental study)

3. Data collected in an observational study (observation, survey, questionnaire, interview)

Page 22: Chapter 1 Introduction to Applied Statistics 111

STEP 3 Collecting the data

Nonprobability data

• A nonprobability sample is one in which the judgment of the experimenter, the method in which the data are collected or other factors could affect the results of the sample.

• 3 basic methods: judgment samples, voluntary samples and convenience samples

Probability data

• A probability sample is one in which the chance of selection of each item in the population is known before the sample is picked.

• 4 basic methods: random, systematic, stratified, and cluster

Page 23: Chapter 1 Introduction to Applied Statistics 111

Nonprobability data samples

1. Judgment samples

• Based on the opinion of one or more experts

• Ex: A political campaign manager intuitively picks certain voting districts as reliable places to measure the public opinion of his candidate.

2. Voluntary samples

• Questions are posed to the public by publishing them over radio or TV (responses by phone or SMS).

3. Convenience samples

• Take an ‘easy sample’ (the most conveniently available).

• Ex: A surveyor stands in one location and asks passersby their questions.

Page 24: Chapter 1 Introduction to Applied Statistics 111

Probability data samples

1. Random samples

Selected using chance methods or random methods.

Example: A lecturer wants to study the physical fitness levels of students at her university. There are 5,000 students enrolled at the university, and she wants to draw a sample of size 100 to take a physical fitness test. She obtains a list of all 5,000 students, numbers them from 1 to 5,000, randomly selects 100 of those numbers, and invites the corresponding students to participate in the study.
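The random-sample example can be sketched in a few lines. This is a minimal illustration, assuming the students are identified only by their list numbers 1 to 5,000; the function name and the seed value are hypothetical.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct list numbers from 1..population_size at random."""
    rng = random.Random(seed)                    # seed only makes the run repeatable
    student_numbers = range(1, population_size + 1)
    return sorted(rng.sample(student_numbers, sample_size))

invited = simple_random_sample(5000, 100, seed=2413)
print(len(invited), "students invited; first few list numbers:", invited[:5])
```

Every set of 100 students has the same chance of being chosen, which is what makes this a simple random sample.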

Page 25: Chapter 1 Introduction to Applied Statistics 111

Probability data samples

2. Systematic samples

Each subject of the population is numbered, and every kth subject is selected.

Example: A lecturer wants to study the physical fitness levels of students at her university. There are 5,000 students enrolled at the university, and she wants to draw a sample of size 100 to take a physical fitness test. She obtains a list of all 5,000 students, numbers them from 1 to 5,000 and randomly picks one of the first 50 students (5000/100 = 50) on the list. If the number picked is 30, then the 30th student on the list is invited first. She then invites every 50th name on the list after this random start (the 80th student, the 130th student, etc.) to produce a sample of 100 students to participate in the study.
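The systematic procedure above can be sketched as follows. It is a minimal illustration; the function name is hypothetical, and the start value 30 is the one used in the example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th list number after a random start among the first k."""
    k = population_size // sample_size           # sampling interval, 5000/100 = 50
    if start is None:
        start = random.Random(seed).randint(1, k)
    return list(range(start, population_size + 1, k))

invited = systematic_sample(5000, 100, start=30)
print(invited[:3], "...", invited[-1])   # [30, 80, 130] ... 4980
print(len(invited))                      # 100
```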

Page 26: Chapter 1 Introduction to Applied Statistics 111

Probability data samples

3. Stratified samples

The population is divided into groups according to some characteristic that is important to the study, and a sample is then taken from each group.

Example: A lecturer wants to study the physical fitness levels of students at her university. There are 5,000 students enrolled at the university, and she wants to draw a sample of size 100 to take a physical fitness test. Assume that, because of different lifestyles, the level of physical fitness is different between male and female students. To account for this variation in lifestyle, the population of students can easily be stratified into male and female students. She can then use either the random method or the systematic method to select the participants. For example, she can use random sampling to choose 50 male students and systematic sampling to choose 50 female students, or vice versa.
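A minimal sketch of stratified sampling, assuming simple random sampling inside each stratum. The roster below is invented for illustration: the source does not say how many of the 5,000 students are male or female, so the 2,600/2,400 split and the ID labels are hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of the requested size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum[name])
            for name, members in strata.items()}

# Hypothetical roster: 2,600 female and 2,400 male students (assumed split)
strata = {
    "female": [f"F{i}" for i in range(1, 2601)],
    "male": [f"M{i}" for i in range(1, 2401)],
}
sample = stratified_sample(strata, {"female": 50, "male": 50}, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
```

Unlike a simple random sample of 100, this guarantees that both strata are represented with the chosen sizes.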

Page 27: Chapter 1 Introduction to Applied Statistics 111

Probability data samples

4. Cluster samples

The population is divided into sections/clusters, some of those clusters are randomly selected, and all members of the selected clusters are chosen.

Using cluster sampling can reduce cost and time.

Example: A lecturer wants to study the physical fitness levels of students at her university. There are 5,000 students enrolled at the university, and she wants to draw a sample to take a physical fitness test. Assume that, because of different lifestyles, the level of physical fitness is different between freshmen, sophomores, juniors and seniors. To account for this variation in lifestyle, the population of students can easily be clustered into freshmen, sophomores, juniors and seniors. She can then randomly select one cluster, such as the freshmen, and take all the freshmen as participants.
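A minimal sketch of cluster sampling: whole clusters are drawn at random and every member of a drawn cluster is included. The cluster sizes below are hypothetical, chosen only so that they add up to 5,000.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical class sizes (assumed, they only need to total 5,000)
sizes = {"freshmen": 1500, "sophomores": 1300, "juniors": 1200, "seniors": 1000}
clusters = {name: [f"{name}-{i}" for i in range(1, size + 1)]
            for name, size in sizes.items()}

sample = cluster_sample(clusters, 1, seed=7)
for name, members in sample.items():
    print(name, len(members))
```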

Page 28: Chapter 1 Introduction to Applied Statistics 111

Identify the type of sample obtained

Example 1

A physical education professor wants to study the physical fitness levels of students at her university. There are 20,000 students enrolled at the university, and she wants to draw a sample of size 100 to take a physical fitness test. She obtains a list of all 20,000 students, numbers them from 1 to 20,000, randomly selects 100 of those numbers, and then invites the 100 students corresponding to those numbers to participate in the study.

Example 2

A quality engineer wants to inspect rolls of wallpaper in order to obtain information on the rate at which flaws in the printing are occurring. She decides to draw a sample of 50 rolls of wallpaper from a day’s production. Each hour for 5 hours, she takes the 10 most recently produced rolls and counts the number of flaws on each. Is this a simple random sample?

Page 29: Chapter 1 Introduction to Applied Statistics 111

Example 3

Suppose we have a list of 1000 registered voters in a community and we want to pick a probability sample of 50. We can use a random number table to pick one of the first 20 voters (1000/50 = 20) on our list. If the table gave us the number 16, the 16th voter on the list would be the first to be selected. We would then pick every 20th name after this random start (the 36th voter, the 56th voter, etc.) to produce a sample.

Example 4

Consumer surveys of large cities often employ cluster sampling. The usual procedure is to divide a map of the city into small blocks, each block containing a cluster of households. A number of clusters are selected for the sample, and all the households in a cluster are surveyed. Using cluster sampling can reduce cost and time. Less energy and money are expended if an interviewer stays within a specific area rather than traveling across stretches of the city.

Page 30: Chapter 1 Introduction to Applied Statistics 111

Example 5

Suppose our population is a university student body. We want to estimate the average annual expenditures of a college student for non-school items. Assume we know that, because of different lifestyles, juniors and seniors spend more than freshmen and sophomores, but there are fewer students in the upper classes than in the lower classes because of some dropout factor. To account for this variation in lifestyle and group size, the population of students can easily be stratified into freshmen, sophomores, juniors and seniors. A sample can be taken from each stratum and each result weighted to provide an overall estimate of average non-school expenditures.

Example 6

A researcher wanted to survey students in 100 homerooms in secondary schools in a large school district. The researcher could first randomly select 10 schools from all the secondary schools in the district. Then, from a list of homerooms in the 10 schools, the researcher could randomly select 100.

Page 31: Chapter 1 Introduction to Applied Statistics 111

STEP 4 Classifying and Summarizing the data

• Organize or group the facts/raw sample data for study and investigation.

• Classifying: identifying items with like characteristics and arranging them into groups or classes. Ex: production data (product make, location, production process, etc.)

• Data can be classified as qualitative (categorical/attribute) data and quantitative (numerical) data.

• Summarization: graphical and descriptive statistics (tables, charts, measures of central tendency, measures of variation, measures of position)

Page 32: Chapter 1 Introduction to Applied Statistics 111

Data Classification

Variables can be classified

• by how they are categorized, counted or measured (level of measurement of data)

• as quantitative and qualitative

Data are the values that variables can assume.

A variable is a characteristic or attribute that can assume different values.

Variables whose values are determined by chance are called random variables.

Page 33: Chapter 1 Introduction to Applied Statistics 111

Types of Data

Qualitative (categorical/attribute) data

1. Data that refer only to name classification (may be coded using numbers 1, 2, …)

2. Can be placed into distinct categories according to some characteristic or attribute.

• Nominal data (cannot be ranked): gender, race, citizenship, etc.

• Ordinal data (can be ranked): feeling (dislike – like), color (dark – bright), etc.

Quantitative (numerical) data

1. Data that represent counts or measurements (can be counted or measured)

2. Are numerical in nature and can be ordered or ranked.

• Discrete variables: assume values that can be counted and are finite. Ex: number of something

• Continuous variables: (1) can assume all values between any two specific values and are obtained by measuring; (2) have boundaries and must be rounded because of the limits of the measuring device. Ex: weight, age, salary, height, temperature, etc.

Page 34: Chapter 1 Introduction to Applied Statistics 111

Example

The Lemon Marketing Corporation has asked you for information about the car you drive. For each question, identify the type of data requested as either attribute data or numeric data. When numeric data is requested, identify the variable as discrete or continuous.

1. What is the weight of your car?

2. In what city was your car made?

3. How many people can be seated in your car?

4. What’s the distance traveled from your home to your school?

5. What’s the color of your car?

6. How many cars are in your household?

7. What’s the length of your car?

8. What’s the normal operating temperature (in degrees Fahrenheit) of your car’s engine?

9. What gas mileage (miles per gallon) do you get in city driving?

10. Who made your car?

11. How many cylinders are there in your car’s engine?

12. How many miles have you put on your car’s current set of tyres?

Page 35: Chapter 1 Introduction to Applied Statistics 111

Level of Measurements of Data

Nominal-level data: classifies data into mutually exclusive (nonoverlapping), exhaustive categories in which no order or ranking can be imposed on the data.

Ordinal-level data: classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.

Interval-level data: ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.

Ratio-level data: possesses all the characteristics of interval measurement, and there exists a true zero.

Page 36: Chapter 1 Introduction to Applied Statistics 111

STEP 5 Presenting and Analyzing the data

• Summarize and analyze the information given by the graphical and descriptive statistics.

• Identify the relationships in the information.

• Make any relevant statistical inferences (hypothesis testing, confidence intervals, ANOVA, control charts, etc.)

Page 37: Chapter 1 Introduction to Applied Statistics 111

Types of Graph & Chart

Page 38: Chapter 1 Introduction to Applied Statistics 111

Distribution Shapes for Histogram

Bell shaped

• Has a single peak and tapers off at either end

• Approximately symmetric: it is roughly the same on both sides of a line running through the center

J-shaped

• Has a few data values on the left side, and the number increases as one moves to the right

Uniform

• Basically flat/rectangular

Reverse J-shaped

• Opposite of J-shaped: has a few data values on the right side, and the number increases as one moves to the left

Page 39: Chapter 1 Introduction to Applied Statistics 111

Distribution Shapes for Histogram

Right skewed

• The peak is to the left

• The data values taper off to the right

Bimodal

• Has 2 peaks of the same height

Left skewed

• The peak is to the right

• The data values taper off to the left

U-shaped

• The shape is a U

Page 40: Chapter 1 Introduction to Applied Statistics 111

STEP 6 Making the decision

The researchers can make a list of all the options and decisions that can achieve the objective and goal of the research, weigh the options, and choose the option that represents the ‘best’ solution to the problem.

The correctness of this choice depends on the analytical skill of the researchers and the quality of the information.

Page 41: Chapter 1 Introduction to Applied Statistics 111

Statistical Problem Solving Methodology (flowchart)

START

1. Identify the problem or opportunity.

2. Gather available internal and external facts relevant to the problem.

3. Are the available facts sufficient?

No: Gather new data from populations and samples using instruments, interviews, questionnaires, etc., then continue with step 4.

Yes: Continue with step 4.

4. Classify, summarize, and process data using tables, charts, and numerical descriptive measures.

5. Present and communicate summarized information in the form of tables, charts and descriptive measures.

6. Is the information from a sample?

No: Use census information to evaluate alternative courses of action and make decisions, then STOP.

Yes: Use sample information to (1) estimate the value of a parameter and (2) test assumptions about a parameter. Then interpret the results, draw conclusions, and make decisions, then STOP.

Page 42: Chapter 1 Introduction to Applied Statistics 111

Role of the Computer in Statistics

Two software tools commonly used for data analysis:

1. Spreadsheets: Microsoft Excel & Lotus 1-2-3

2. Statistical packages: MINITAB, SAS, SPSS and SPlus

Page 43: Chapter 1 Introduction to Applied Statistics 111

1.3 REVIEW OF DESCRIPTIVE STATISTICS

Page 44: Chapter 1 Introduction to Applied Statistics 111

Summary Statistics (Data Description)

Statistical methods can be used to summarize data.

Measures of average are also called measures of central tendency and include the mean, median, mode, and midrange.

Measures that determine the spread of data values are called measures of variation or measures of dispersion and include the range, variance, and standard deviation.

Measures of position tell where a specific data value falls within the data set or its relative position in comparison with other data values. The most common measures of position are percentiles, deciles, and quartiles.

The measures of central tendency, variation, and position are part of what is called traditional statistics. This type of data is typically used to confirm conjectures about the data

Page 45: Chapter 1 Introduction to Applied Statistics 111

1.3.1 Measures of Central Tendency

Mean

The sum of the values divided by the total number of values.

Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$, where $N$ = population size

Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where $n$ = sample size

Example: 9 2 1 4 3 3 7 5 8 6
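As a quick check of the example, the mean of the ten values can be computed directly from its definition (sum of the values divided by the number of values). The variable names are only illustrative.

```python
data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]

# Sum of the values divided by the number of values
mean = sum(data) / len(data)
print(mean)  # 4.8
```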

Page 46: Chapter 1 Introduction to Applied Statistics 111

Properties of Mean

• The mean is computed by using all the values of the data.

• The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples.

• The mean is used in computing other statistics, such as the variance.

• The mean for the data set is unique, and not necessarily one of the data values.

• The mean cannot be computed for an open-ended frequency distribution.

• The mean is affected by extremely high or low values and may not be the appropriate average to use in these situations.

Page 47: Chapter 1 Introduction to Applied Statistics 111

1.3.1 Measures of Central Tendency

Median

The middle number of n ordered data values (smallest to largest).

If n is odd: $\text{Median} = x_{\frac{n+1}{2}}$

If n is even: $\text{Median} = \dfrac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$

Example:

9 2 1 4 3 3 7 5 8 6

Example:

9 2 1 3 3 7 5 8 6
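The two cases (n even and n odd) can be checked with a short sketch. The function name is illustrative; positions in the formulas are 1-based, so the code subtracts 1 when indexing the sorted list.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]                  # position (n+1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2    # positions n/2 and n/2 + 1

print(median([9, 2, 1, 4, 3, 3, 7, 5, 8, 6]))  # n = 10 (even): 4.5
print(median([9, 2, 1, 3, 3, 7, 5, 8, 6]))     # n = 9 (odd): 5
```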

Page 48: Chapter 1 Introduction to Applied Statistics 111

Properties of Median

The median is used when one must find the center or middle value of a data set.

The median is used when one must determine whether the data values fall into the upper half or lower half of the distribution.

The median is used to find the average of an open-ended distribution.

The median is affected less than the mean by extremely high or extremely low values.

Page 49: Chapter 1 Introduction to Applied Statistics 111

1.3.1 Measures of Central Tendency

Mode

the most commonly occurring value in a data series

The mode is used when the most typical case is desired.

The mode is the easiest average to compute.

The mode can be used when the data are nominal, such as religious preference, gender, or political affiliation.

The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.

Example: 9 2 1 4 3 3 7 5 8 6
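A minimal sketch for finding the mode, including the cases of more than one mode and no mode. It assumes the convention that a data set in which every value occurs exactly once has no mode; the function name is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values, or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:          # assumed convention: every value occurs once, so no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([9, 2, 1, 4, 3, 3, 7, 5, 8, 6]))  # [3]
print(modes([1, 1, 2, 2, 3]))                 # [1, 2]  (two modes)
print(modes([1, 2, 3]))                       # []      (no mode)
```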

Page 50: Chapter 1 Introduction to Applied Statistics 111

1.3.1 Measures of Central Tendency

Midrange

A rough estimate of the middle and also a very rough estimate of the average. It can be affected by one extremely high or low value.

$\text{MR} = \dfrac{\text{lowest value} + \text{highest value}}{2}$

Example: 9 2 1 4 3 3 7 5 8 6
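For the example data the lowest value is 1 and the highest value is 9, so the midrange (half the sum of the lowest and highest values) works out as:

```latex
\[
\text{MR} = \frac{\text{lowest value} + \text{highest value}}{2} = \frac{1 + 9}{2} = 5
\]
```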

Page 51: Chapter 1 Introduction to Applied Statistics 111

Types of Distribution

Symmetric

Positively skewed or right-skewed Negatively skewed or left-skewed

Page 52: Chapter 1 Introduction to Applied Statistics 111

1.3.2 Measures of Variation / Dispersion

• Used when the measure of central tendency alone is not informative or not needed (ex: the means are the same for two sets of data)

• Measures the variability that exists in a data set

• To form a judgment about how well the average value depicts the data

• To learn the extent of the scatter so that steps may be taken to control the existing variation

Page 53: Chapter 1 Introduction to Applied Statistics 111

1.3.2 Measures of Variation / Dispersion

Range

The difference between the highest value and the lowest value in a data set.

The symbol R is used for the range.

R = highest value - lowest value

Example: 9 2 1 4 3 3 7 5 8 6

Page 54: Chapter 1 Introduction to Applied Statistics 111

1.3.2 Measures of Variation / Dispersion

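Variance

The average of the squared distances of each value from the mean.

Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $N$ = population size

Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $n$ = sample size

Standard Deviation

The square root of the variance.

Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$, where $N$ = population size

Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$, where $n$ = sample size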

Example:

9 2 1 4 3 3 7 5 8 6
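The example can be worked through with a short sketch that follows the definitions directly: sum the squared deviations from the mean, then divide by n − 1 for a sample or by N for a population. The function name and the `sample` flag are illustrative choices.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n-1 (sample) or N (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
s2 = variance(data)                    # treating the data as a sample
sigma2 = variance(data, sample=False)  # treating the data as a whole population

print(f"sample:     s^2 = {s2:.4f}, s = {math.sqrt(s2):.4f}")
print(f"population: sigma^2 = {sigma2:.4f}, sigma = {math.sqrt(sigma2):.4f}")
```

Here the mean is 4.8 and the squared deviations sum to 63.6, so the sample variance is 63.6/9 and the population variance is 63.6/10.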

Page 55: Chapter 1 Introduction to Applied Statistics 111

Properties of Variance & Standard Deviation

Variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. The information is useful in comparing two or more data sets to determine which is more variable.

The measures of variance and standard deviation are used to determine the consistency of a variable.

The variance and standard deviation are used to determine the number of data values that fall within a specified interval in a distribution.

The variance and standard deviation are used quite often in inferential statistics.

The standard deviation is used to estimate amount of spread in the population from which the sample was drawn.

Page 56: Chapter 1 Introduction to Applied Statistics 111

Chebychev Theorem

If $\bar{x} = 11.27$, $n = 15$, and $s = 4.12$, illustrate the Chebychev Theorem for $k = 1$, $2$, and $3$.
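A minimal sketch of the exercise, under two assumptions: the three numbers are read as a mean of 11.27, a sample size of 15 and a standard deviation of 4.12 (the symbols are missing from the slide), and the theorem is used in its usual form, that at least 1 − 1/k² of the data values lie within k standard deviations of the mean.

```python
def chebyshev_bound(mean, sd, k):
    """Minimum fraction of values within k standard deviations, and that interval."""
    fraction = 1 - 1 / k**2
    return fraction, mean - k * sd, mean + k * sd

for k in (1, 2, 3):
    fraction, low, high = chebyshev_bound(11.27, 4.12, k)
    print(f"k = {k}: at least {fraction:.1%} of the values lie between {low:.2f} and {high:.2f}")
```

For k = 1 the bound is 0%, which says nothing; this is why the theorem is normally quoted for k greater than 1.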

Page 57: Chapter 1 Introduction to Applied Statistics 111

1.3.3 Measures of Position

Describing the position of a data value (data in increasing order)

Percentiles: split data into 100 equal parts. $P_i = x_c$, where $c = \dfrac{i \times n}{100}$

Deciles: split data into 10 equal parts. $D_i = x_c$, where $c = \dfrac{i \times n}{10}$

Quartiles: split data into 4 equal parts. $Q_i = x_c$, where $c = \dfrac{i \times n}{4}$

TIPS: If $c$ is not a whole number, round it up to the next whole number. If $c$ is a whole number, then use $\dfrac{x_c + x_{c+1}}{2}$.

Example: 9 2 1 4 3 3 7 5 8 6 gives $P_{15} = 2$, $D_3 = 3$, $Q_2 = 4.5$
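The rule in the TIPS can be sketched as one function that serves percentiles (100 parts), deciles (10 parts) and quartiles (4 parts). The function name is illustrative, and integer arithmetic is used to decide whether c is a whole number. Other textbooks and software use different conventions, so results may differ slightly elsewhere.

```python
def position_value(values, i, parts):
    """Value at the i-th of `parts` equal divisions (parts = 100, 10 or 4)."""
    if not 0 < i < parts:
        raise ValueError("i must be between 1 and parts - 1")
    ordered = sorted(values)
    c, remainder = divmod(i * len(ordered), parts)   # c = i*n/parts
    if remainder:                                    # c not whole: round up to position c+1
        return ordered[c]                            # 0-based index c is 1-based position c+1
    return (ordered[c - 1] + ordered[c]) / 2         # c whole: average x_c and x_(c+1)

data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
print(position_value(data, 15, 100))  # P15 = 2
print(position_value(data, 3, 10))    # D3  = 3.0
print(position_value(data, 2, 4))     # Q2  = 4.5
```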

Page 58: Chapter 1 Introduction to Applied Statistics 111

EXERCISE

1. Given 9 2 1 4 3 7 5 4 6.

a) What percentile is the value of 8?

b) Find the value corresponding to the 4th decile.

c) Find the value corresponding to the 3rd quartile.

2. Given 9 22 11 14 13 3 7 15 18 16

a) Find the value corresponding to the 20th percentile.

b) What percentile is the value of 20?

c) Find the value corresponding to the 7th decile.

TIPS: The percentile corresponding to a given value of x is computed by:

$\text{Percentile} = \dfrac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100\%$
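The percentile formula in the TIPS can be sketched as follows. To avoid giving away the exercise answers, the illustration uses the chapter's running data set (9 2 1 4 3 3 7 5 8 6) and the value 7; the function name is illustrative.

```python
def percentile_rank(values, x):
    """(number of values below x + 0.5) / (total number of values) * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100

data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
print(percentile_rank(data, 7))  # 7 values lie below 7, so (7 + 0.5)/10 * 100 = 75.0
```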

Page 59: Chapter 1 Introduction to Applied Statistics 111

Outliers

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

Outliers can be the result of measurement or observational error.

When a distribution is normal or bell-shaped, data values that are beyond three standard deviations of the mean can be considered suspected outliers.

Example: 9 22 11 14 13 3 7 15 18 16 no outliers

Page 60: Chapter 1 Introduction to Applied Statistics 111

TIPS: Calculate mean and variance

by using Scientific Calculator

Casio fx-570MS Insert data

MODE SD data M+ Shift 1 Shift 2 Clear data

Shift CLR 1

Casio fx-570W Insert data

MODE SD data M+ Shift 1 Shift 2 Shift 3 Shift 4 Clear data

Shift AC/ON =

Page 61: Chapter 1 Introduction to Applied Statistics 111

1.4 EXPLORATORY DATA ANALYSIS

Page 62: Chapter 1 Introduction to Applied Statistics 111

1.4.1 Stem and Leaf Plot

• A simple way to summarize a data set.

• Each item in the sample is divided into two parts: a stem, consisting of the leftmost one or two digits, and the leaf, which consists of the next digit.

• It is a compact way to represent the data.

• It also gives us some indication of the shape of our data.

Page 63: Chapter 1 Introduction to Applied Statistics 111

EXAMPLE 1

Duration of dormant periods of the geyser Old Faithful in minutes

Stem-and-leaf plot:

4 | 259
5 | 0111133556678
6 | 067789
7 | 01233455556666699
8 | 000012223344456668
9 | 013

Let’s look at the first line of the stem-and-leaf plot. This represents measurements of 42, 45, and 49 minutes.

A good feature of these plots is that they display all the sample values. One can reconstruct the data in its entirety from a stem-and-leaf plot.
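A minimal sketch of how such a plot is built for two-digit values: the tens digit is the stem and the units digit is the leaf. Only the six values that can be read off the first and last rows of the plot are used here, and the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); units digits are the leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

durations = [42, 45, 49, 90, 91, 93]   # values from the first and last rows of the plot
for stem, leaves in stem_and_leaf(durations).items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```

Reading each row back as stem followed by leaf recovers the original values, which is why no information is lost.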

Page 64: Chapter 1 Introduction to Applied Statistics 111

1.4.2 Box Plots

A boxplot is a graphic that presents the median, the first and third quartiles, and any outliers present in the sample.

The interquartile range (IQR) is the difference between the third quartile and the first quartile. This is the distance needed to span the middle half of the data.

Page 65: Chapter 1 Introduction to Applied Statistics 111

STEPS to Construct a Boxplot

STEP 1: Arrange the data in order.

STEP 2: Find the median.

STEP 3: Find Q1 and Q3.

STEP 4: Find outliers: points lying more than 1.5 times the interquartile range above Q3 or below Q1, that is, $x < Q_1 - 1.5(Q_3 - Q_1)$ or $x > Q_3 + 1.5(Q_3 - Q_1)$.

STEP 5: Draw a scale for the data on the x axis.

STEP 6: Locate the lowest value, Q1, the median, Q3, the highest value and outliers on the scale.

STEP 7: Draw a box around Q1 and Q3, draw a vertical line through the median, and connect the upper and lower values.
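Steps 1 to 4 can be sketched numerically. The quartiles below follow the position rule from the TIPS in section 1.3.3 (round c up if it is not whole, otherwise average two neighbours); other textbooks and software use slightly different quartile conventions. The function names are illustrative.

```python
def quartile(ordered, i):
    """i-th quartile of already sorted data, using c = i*n/4 and the rounding rule."""
    c, remainder = divmod(i * len(ordered), 4)
    if remainder:                              # c not whole: round up
        return ordered[c]
    return (ordered[c - 1] + ordered[c]) / 2   # c whole: average x_c and x_(c+1)

def boxplot_summary(values):
    ordered = sorted(values)                                   # STEP 1
    q1, median, q3 = (quartile(ordered, i) for i in (1, 2, 3)) # STEPS 2 and 3
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # STEP 4
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {"Q1": q1, "median": median, "Q3": q3, "IQR": iqr,
            "fences": (lower_fence, upper_fence), "outliers": outliers}

print(boxplot_summary([19, 2, 1, 7, 5, 8, 6]))
```

For these seven values Q1 = 2, the median is 6 and Q3 = 8, so the fences are −7 and 17 and the value 19 is flagged as an outlier.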

Page 66: Chapter 1 Introduction to Applied Statistics 111

EXAMPLE

1. Plot a boxplot for the following data. Then describe the data.

a) 9 22 11 14 13 3 7 15 18 16

b) 19 2 1 7 5 8 6

2. A dietician is interested in comparing the sodium content of real cheese with the sodium content of a cheese substitute. The data for two random samples are shown. Compare the distributions using boxplots.

Real Cheese: 310, 420, 45, 40, 220, 240, 180, 90

Cheese Substitute: 270, 180, 250, 290, 130, 260, 340, 310

Page 67: Chapter 1 Introduction to Applied Statistics 111

Summarize the Boxplots

EXTRA INFO:

1. If the boxplots for two or more data sets are graphed on the same axis, the distributions can be compared.

2. To compare the averages, use the location of the medians.

3. To compare the variability, use the interquartile range (the length of the box).

Page 68: Chapter 1 Introduction to Applied Statistics 111

Anatomy of a Boxplot

Page 69: Chapter 1 Introduction to Applied Statistics 111

Conclusion

The applications of statistics are many and varied. People encounter them in everyday life, such as in reading newspapers or magazines, listening to the radio, or watching television.

By combining all of the descriptive statistics techniques discussed in this chapter, the student is now able to collect, organize, summarize and present data.

Page 70: Chapter 1 Introduction to Applied Statistics 111

Thank You

See You in CHAPTER 2

Commonly used Probability Distribution

- DO YOUR TUTORIAL!!!