Chapter 1 The Role of Statistics and the Data Analysis Process

Page 1: Mazda Presentation Topic

Chapter 1

The Role of Statistics and the Data Analysis Process

Page 2: Mazda Presentation Topic

1.1 Three Reasons to Study Statistics

Reason 1. Being Informed

• You should be able to:

– Extract information from tables, charts, and graphs;

– Follow numerical arguments;

– Understand the basics of how data should be gathered, summarized, and analyzed.

Page 3: Mazda Presentation Topic

1.1 Three Reasons to Study Statistics

1. Examples of Being Informed

• An analysis of data from the University of Utah concluded that drivers engaged in cell phone conversations missed twice as many simulated signals as drivers who were not talking on the phone.

• An article in the Journal of the American Medical Association concluded that surgery patients at hospitals with a severe shortage of nurses had a 31% greater risk of dying while in the hospital.

• Based on interviews with 24,000 women in 10 different countries, the WHO found that the percentage of women who had been abused by a partner varied widely, from 15% in Japan to 71% in Ethiopia.

Page 4: Mazda Presentation Topic

1.1 Three Reasons to Study Statistics

Reason 2. Making Informed Judgments

To make informed decisions, you must be able to take the following steps:

1. Decide whether existing information is adequate or whether additional information is required.

2. If necessary, collect more information in a reasonable and thoughtful way.

3. Summarize the available data in a useful and informative manner.

4. Analyze the available data.

5. Draw conclusions, make decisions, and assess the risk of an incorrect decision.

Page 5: Mazda Presentation Topic

1.1 Three Reasons to Study Statistics

Examples of Making Informed Decisions

• Almost all industries, as well as government and nonprofit organizations, use market research tools, such as consumer surveys, that are designed to provide information about who uses their products or services.

• Modern science and its applied fields rely on statistical methods for analyzing data and deciding whether various conjectures are supported by observed data.

• In law, a class-action lawsuit can depend on a statistical analysis of whether one kind of injury or illness is more common in a particular group than in the general public.

• We also use the five steps to make everyday decisions: Should we go out for a sport that involves the risk of injury? If we choose a particular major, what are our chances of finding a job when we graduate?

Page 6: Mazda Presentation Topic

1.1 Three Reasons to Study Statistics

Reason 3. Evaluating Decisions That Affect Your Life

Other people use statistical methods to make decisions that affect you. An understanding of statistical techniques will allow you to question and evaluate decisions that affect your well-being.

• Insurance companies use statistical techniques to set auto insurance rates.

• University financial aid offices collect data on family incomes and savings, and use the data to set criteria for deciding who receives financial aid.

• Medical researchers use statistical methods to make recommendations regarding the choice between surgical and nonsurgical treatment of such diseases as coronary heart disease and cancer.

Page 7: Mazda Presentation Topic

1.2 The Nature and Role of Variability

• Variability is almost universal.

• Imagine an unrealistic situation: In a university, every student takes the same courses, spends exactly the same amount of money on textbooks, and has the same GPA.

• Populations with no variability almost never exist.

• We need to understand variability to be able to collect, analyze, and draw conclusions from data in a sensible way.

Page 8: Mazda Presentation Topic

1.3 Statistics and Data Analysis

• Statistics is the science of collecting, analyzing, and drawing conclusions from data.

• The Population: the entire collection of individuals or objects about which information is desired.

• A Sample: A subset of the population, selected for study in some prescribed manner.

• Descriptive statistics includes methods for organizing and summarizing data.

• Inferential statistics involves generalizing from a sample to the population from which it was selected, and assessing the reliability of such generalizations.

Page 9: Mazda Presentation Topic

The Data Analysis Process

1. Understand the nature of the problem.

2. Decide what to measure and how to measure it.

3. Collect data with a carefully developed plan.

4. Summarize the data and start preliminary analysis.

5. Apply the appropriate inferential statistical method for formal data analysis.

6. Interpret the results.

Page 10: Mazda Presentation Topic

1.3 Statistics and Data Analysis

• Example: A consumer group conducts crash tests of new model cars. To determine the severity of damage to 2003 Mazda 626s resulting from a 10-mph crash into a concrete wall, the research group tests six cars of this type and assesses the amount of damage. Describe the population and sample for this problem.

Population: All 2003 Mazda 626s

Sample: The six Mazda 626s being tested

Page 11: Mazda Presentation Topic

1.3 Statistics and Data Analysis

• Example: The supervisors of a rural county are interested in the proportion of property owners who support the construction of a sewer system. Because it is too costly to contact all 7000 property owners, a survey of 500 owners (selected at random) is undertaken. Describe the population and sample for this problem.

Population: All 7000 property owners in the county

Sample: The 500 property owners being surveyed
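
The "selected at random" step in this example can be sketched in Python. This is only an illustration under stated assumptions: the owners are represented by hypothetical ID numbers 1 to 7000, and the seed value is arbitrary and is there only to make the run repeatable.

```python
import random

# Hypothetical IDs 1..7000 stand in for the county's property owners (the population).
population = range(1, 7001)

# Arbitrary fixed seed so the sketch gives the same sample on every run.
rng = random.Random(2024)

# Simple random sample without replacement: no owner can be picked twice.
sample = rng.sample(population, 500)

print(len(population), len(sample))
```

The population is the full list of 7000 IDs; the sample is the 500 IDs drawn from it, and any conclusion about the county rests on generalizing from those 500.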

Page 12: Mazda Presentation Topic

Example: A Proposed New Treatment for Alzheimer’s Disease

• Doctors at Stanford Medical Center were interested in determining if a new surgical approach to treating Alzheimer’s disease results in improved memory functioning. (The surgical procedure involves implanting a thin tube, called a shunt.)

• Eleven patients had shunts implanted and were followed for a year, receiving quarterly tests for memory function.

• Another sample of Alzheimer’s patients received standard care, and was used as a comparison group.

• After analyzing the data from this study, the researchers concluded that the treated patients essentially held their own in the cognitive test while the patients in the comparison group steadily declined.

Page 13: Mazda Presentation Topic

1.3 Statistics and Data Analysis

• In the example of a proposed new treatment for Alzheimer’s disease, what are the population and the sample?

• Do you think the sample is good enough to produce conclusive statistical evidence?

• The limitations of the study: the result is from a small sample. They need a larger, more sophisticated study, and a new data analysis cycle begins.

• A much larger 18-month study was planned. The study was to include 256 patients at 25 medical centers around the country.

Page 14: Mazda Presentation Topic

1.4 Types of Data and Some Simple Graphical Displays

Definitions

1. A variable is a characteristic whose value may change from one individual or object to another in a population. For example, if the population is the set of all students in our stats class, the brand of calculator owned by each student is a variable, and the distance to UHD from each student’s home is also a variable.

2. A data set consisting of observations on a single variable (attribute) is a univariate data set.

3. A univariate data set is categorical (or qualitative) if the individual observations are categorical responses. (e.g. the brand of calculator)

4. A univariate data set is numerical (or quantitative) if each observation is a number. (e.g. the distance to UHD)

Page 15: Mazda Presentation Topic

1.4 Types of Data and Some Simple Graphical Displays

Discrete and Continuous Data:

1. Numerical data are discrete if the possible values are isolated points on the number line.

2. Numerical data are continuous if the set of possible values forms an entire interval on the number line.

Page 16: Mazda Presentation Topic

1.4 Types of Data and Some Simple Graphical Displays

1. Example: Airline Safety Violations

The FAA monitors airlines and can take administrative actions for safety violations: Security (S), Maintenance (M), Flight Operations (F), Hazardous Materials (H), or Other (O).

Data for 20 administrative actions are given below.

S S M H M O S M S S

F S O M S M S M S M

Classify the attribute as categorical or numerical.

Answer: categorical
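
Because each observation is a category code rather than a number, the natural first summary is a tally of how often each code occurs. A minimal Python sketch, using only the 20 codes listed above:

```python
from collections import Counter

# The 20 administrative actions from the example, in the order listed.
actions = "S S M H M O S M S S F S O M S M S M S M".split()

# Tally how many times each category code appears.
counts = Counter(actions)

for code, n in counts.most_common():
    print(code, n)
```

The tallies add up to the 20 observations, which is a quick check that no code was missed.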

Page 17: Mazda Presentation Topic

An Example of Numerical Data

2. Example: Revisiting Airline Safety Violations

• The following data present the number of violations and the average fine per violation for the period 1985-1998 for 10 major airlines:

Airline No. of Violations Average Fine per Violation ($)

Alaska 258 5038.760

America West 257 3112.840

American 1745 2693.410

Continental 973 5755.390

Delta 1280 3828.125

Northwest 1097 2643.573

Southwest 535 3925.234

TWA 642 2803.738

United 1110 2612.613

US Airways 891 3479.237

Page 18: Mazda Presentation Topic

1.4 Types of Data and Some Simple Graphical Displays: Frequency Distributions

• A frequency distribution for categorical data is a table that displays the possible categories along with the associated frequencies and/or relative frequencies.

• The frequency for a particular category is the number of times the category appears in the data set.

• The relative frequency for a particular category is the fraction or proportion of the observations resulting in the category.

• If the table includes relative frequency, it is sometimes referred to as a relative frequency distribution.

$$\text{relative frequency} = \frac{\text{frequency}}{\text{number of observations in the data set}}$$

Page 19: Mazda Presentation Topic

Frequency Distributions

• Example: To ensure safety, a motorcycle helmet should reach the bottom of the motorcyclist’s ears, according to the standards set by the U.S. Department of Transportation. Data were collected by observing 1700 motorcyclists nationwide at selected roadway locations. There were 731 riders who wore no helmet, 153 who wore a noncompliant helmet, and 816 who wore a compliant helmet. Determine the frequency distribution and relative frequency distribution. Use the code:

N = no helmet, NH = noncompliant helmet, and CH = compliant helmet

• Frequency distribution for helmet use

Helmet Use Category Frequency Relative Frequency

N 731 .430 (731/1700)

NH 153 .090 (153/1700)

CH 816 .480 (816/1700)

Total 1700 1.00
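
The relative frequency column can be reproduced with a few lines of Python. This is a minimal sketch that uses only the three counts from the example; the variable names are illustrative.

```python
# Helmet-use counts from the example (N, NH, CH as coded above).
frequencies = {"N": 731, "NH": 153, "CH": 816}

# Number of observations in the data set.
total = sum(frequencies.values())

# relative frequency = frequency / number of observations in the data set
relative = {category: freq / total for category, freq in frequencies.items()}

for category, freq in frequencies.items():
    print(f"{category:<5} {freq:>5} {relative[category]:.3f}")
print(f"{'Total':<5} {total:>5} {sum(relative.values()):.2f}")
```

The relative frequencies should always sum to 1 (up to rounding), which makes the Total row a useful check on the arithmetic.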

Page 20: Mazda Presentation Topic

Some Simple Graphical Displays: Bar Charts

• When to use a bar chart: Categorical data

• How to construct a bar chart:

1. Draw a horizontal line, and write the category names or labels below the line at regularly spaced intervals.

2. Draw a vertical line, and label the scale using either frequency or relative frequency.

3. Place a rectangular bar above each category label. The height is determined by the category’s frequency or relative frequency, and all bars should have the same width. With the same width, both the height and the area of the bar are proportional to frequency and relative frequency.

• Construct a bar chart for the helmet data.
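
As a rough check on the hand-drawn chart, the same idea can be sketched as a text bar chart in Python. Note the differences from the construction steps: the bars here run horizontally rather than vertically, and the scale of one `#` per percentage point is an assumption made for display. The function name `text_bar_chart` is hypothetical.

```python
# Helmet-use counts from the example (N, NH, CH as coded above).
frequencies = {"N": 731, "NH": 153, "CH": 816}

def text_bar_chart(freqs):
    """Return one line per category; bar length is proportional to relative frequency."""
    total = sum(freqs.values())
    lines = []
    for label, freq in freqs.items():
        percent = round(100 * freq / total)  # one '#' per percentage point (assumed scale)
        lines.append(f"{label:<3}| {'#' * percent} {percent}%")
    return lines

chart = text_bar_chart(frequencies)
print("\n".join(chart))
```

Every bar has the same thickness (one text row), so bar length alone carries the comparison, just as equal-width bars make height proportional to frequency in the drawn chart.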

Page 21: Mazda Presentation Topic

Create a Bar Chart Using Excel

Input and highlight data

Click Insert

Click Bar

Choose Bar type

Click OK

Page 22: Mazda Presentation Topic

Excel generates the bar chart. You can use “Chart Layout” to add a title, add explanations, and make other modifications.

Page 23: Mazda Presentation Topic

1.4 Types of Data and Some Simple Graphical Displays: Dotplots for Numerical Data

• When to use a dotplot: Small numerical data sets

• How to construct a dotplot :

1. Draw a horizontal line and mark it with an appropriate measurement scale

2. Locate each value in the data set along the measurement scale, and represent it by a dot. If there are two or more observations with the same value, stack the dots vertically.

• What to Look For:

1. A representative or typical value in the data set.

2. The extent to which the data values spread out.

3. The nature of the distribution of values along the number line.

4. The presence of unusual values in the data set.

Page 24: Mazda Presentation Topic

1.4 Types of Data and Some Simple Graphical Displays: Dotplots for Numerical Data

Example: The Chronicle of Higher Education reported graduation rates for NCAA Division I schools. The rates reported are the percentages of full-time freshmen in fall 1993 who had earned a bachelor’s degree by August 1999. Data from 20 schools in California and 19 schools in Texas are as follows:

California:

64 41 44 31 37 73 72 68 35 37

81 90 82 74 79 67 66 66 70 63

Texas:

67 21 32 88 35 71 39 35 71 63

12 46 35 39 28 65 25 24 22

Construct (1) a dotplot of graduation rates for all schools combined; (2) separate dotplots of graduation rates for California and Texas.
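
The dotplot construction can be sketched in Python as a text plot. This assumes that the first 20 values listed are the California schools and the remaining 19 are the Texas schools, which matches the stated counts. The helper name `text_dotplot` and the common scale of 10 to 90 are illustrative choices, not part of the example.

```python
from collections import Counter

# Assumed split: first 20 values are California, remaining 19 are Texas.
california = [64, 41, 44, 31, 37, 73, 72, 68, 35, 37,
              81, 90, 82, 74, 79, 67, 66, 66, 70, 63]
texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]

def text_dotplot(values, low, high):
    """Return text rows: one column per integer value, dots stacked for repeated values."""
    counts = Counter(values)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        rows.append("".join("." if counts[v] >= level else " "
                            for v in range(low, high + 1)))
    rows.append("-" * (high - low + 1))  # the measurement scale
    return rows

# The same scale is used for both plots so the two states can be compared directly.
for name, rates in [("California", california), ("Texas", texas)]:
    print(name)
    print("\n".join(text_dotplot(rates, 10, 90)))
```

Passing `california + texas` to the same function gives the combined dotplot asked for in part (1).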

Page 25: Mazda Presentation Topic

Dotplot of graduation rates (California and Texas together)

Separate dotplots of graduation rates for Texas and California