BEA140 Module 2 Quantitative Methods
Summer Semester 2009
By Leon Jiang, University of Tasmania



Why this unit?

In particular, why does this module require us to study statistics?


Thinking about a case!

Suppose you work in the R&D department of IBM. As you know, IBM competes with a number of very strong rivals.

Can you design a laptop computer for IBM just as you like?


It is not as simple as ‘as you like’! Knowing clearly what types of laptop computers customers want should be the only way forward, and the direction, for a successful R&D department.

But can we know how many customers there are and which types of laptop computers they would actually buy?


Of course…

The best situation would be to know exactly how many customers in the world are willing to buy, exactly which types of laptop computers they want, and, not to forget, exactly how many of these customers can afford your products.


However, can we know this information? In this case, it is nearly impossible to know!

But can we estimate this information?


We can, in fact, know this information to some extent!

This is what we are going to learn in this module!


* What is statistics?

The word 'statistics' is derived from the Latin word 'status', which means a "political state".

Clearly, statistics is closely linked with the administrative affairs of a state, such as facts and figures regarding the defense force, population, housing, food, financial resources, etc.

What is true of a government is also true of industrial administration units, and even of one's personal life.


* The meaning of statistics!

* The word statistics, as a singular noun, is used to describe a branch of applied mathematics, whose purpose is to provide methods of dealing with a collection of data and extracting information from them in compact form by tabulating, summarizing and analyzing the numerical data or a set of observations.

* The various methods used are termed statistical methods, and the person using them is known as a statistician. A statistician is concerned with analyzing and interpreting the data and drawing valid, worthwhile conclusions from them.


* Some important terms!

Population (or universe) - the entire collection of objects (individuals or members) to be considered.

The number of members in a population is known as the size of the population, which may be finite or infinite.

The population can refer to things as well as people.


* Examples of a “population”!

All members of the student union of UTAS.

All students of the QM unit in your class.

All the people who drank beer at least twice a week over the past year.

Heights of teaching staff at UTAS.

Weights of all the citizens of Hobart above 20 years of age.


* Population - finite and infinite

A population is finite if it contains a finite number of individuals. For example, the students in your class.

A population is infinite if it contains an infinite number of individuals. For example, the pressures at various points in the atmosphere.


* Infinite population!

Statisticians often want to know things about a population, but in almost every case they cannot, because data for every individual in the population are not available.

In the case above, can we know how many people are willing to buy, and can actually afford to buy, your products?

Thus, whenever we want to study the characteristics of a certain population, it is difficult to study the whole population. It is often expensive and time-consuming, and we often lack the resources to study the whole population. In any science, we usually cannot study more than a part of the population. A part or small section selected from the population is called a sample.


This can be the starting point for the case: take a sample in order to learn something about the population!


* Sample!

A sample is a finite set of objects drawn from the population for a particular purpose.


* Even in everyday life we make many of our decisions based on samples, though we are not aware of it.

We take a little rice from a gunny bag, judge its quality, and then purchase the whole bag.

If we want to taste milk, we just take a glassful of milk from the can and taste it.

Note that taking a sample is easy in many cases where the population is uniform, or homogeneous.

When the population is heterogeneous (not uniform), selecting a sample is not so easy.


* Parameter & statistic *

• The word 'parameter' is associated with the population. It is understood as a measure of a characteristic of the population, such as the mean or standard deviation.

• The word 'statistic' is used for a random sample. It is understood as a measure of a characteristic of the random sample, such as the mean or standard deviation.


* Different symbols are used to denote parameters and statistics *


* For instance:

Aim: to know the average (mean) income of families living in the "Salamanca" area in the year 2007-2008, e.g. $50,000. This is the population parameter (in A$).

Method: draw a random sample of 200 families and compute their average income, e.g. the sample statistic is $52,000.

Conclusion: the population mean (parameter) is close to the sample mean (statistic).
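The relationship between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the population below is synthetic (normally distributed incomes with an assumed mean of A$50,000 and an assumed spread), not real Salamanca data, and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

# Hypothetical population: 5,000 family incomes in A$ (synthetic, not real data).
rng = random.Random(2009)
population = [rng.gauss(50_000, 12_000) for _ in range(5_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a random sample of 200 families.
sample = rng.sample(population, 200)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean:,.0f}")
print(f"sample mean (statistic):     {sample_mean:,.0f}")
```

In practice only the sample mean is observable; the simulation can print both because it builds the whole population itself.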


* The process of doing statistics: 3 steps *

1. Design – gathering data.

2. Description – summarizing the data, studying their features and characteristics, and providing useful and effective information (using graphical tools, tables and summary measures).

3. Inference (conclusion) – requires the application of probability concepts.


Importance of statistics

‘hard data’

Scientific evidence

Postgraduate study for masters or PhD.

Necessary for high-quality essays.


* Sources of data *

Primary source – the original collector of the data, e.g. the National Population Census Bureau.

Secondary source – a subsequent user of the data.


* How to collect primary data?

• Survey – interview, questionnaire, etc.

- designing a survey requires skill.

• Observation – observing and recording behaviors.

• Experiment – use of experimental and control groups.

- appropriate design is important.


* Where to find secondary data? *

Published sources of data, e.g. trade journals and other relevant media.

Secondary data collection is usually cheaper and less time-consuming than primary data collection.


* Survey errors *
- Usually four kinds of error to be aware of beforehand

1. Coverage error → selection bias
   Some groups are not covered, or are excluded, because the population frame is not clear. (The random probability sample selected will provide an estimate of the characteristics of the frame, not of the actual population.)

2. Non-response error → non-response bias
   Those who do not respond might have very different views.

3. Sampling error
   Samples are not perfectly representative of the population.

4. Measurement error
   Poor questionnaire design or interviewing skills.


* Sampling methods *
- 2 basic kinds!

1. Probability sampling (random sampling)

* Only random sampling is valid for statistical inference.

2. Non-probability sampling
- Two broad types: accidental or purposive.


* Convenience sample *

A convenience sample is a sample where the subjects are selected, in part or in whole, at the convenience of the researcher. The researcher makes no attempt, or only a limited attempt, to ensure that this sample is an accurate representation of some larger group or population.

The classic example of a convenience sample is standing at a shopping mall and selecting shoppers as they walk by to fill out a survey.


- More about convenience sample -

In general, the Statistics community frowns on convenience samples. You will often have great difficulty in generalizing the results of a convenience sample to any population that has practical relevance.

Still, convenience samples can provide you with useful information, especially in a pilot study. To interpret the findings from a convenience sample properly, you have to characterize (usually in a qualitative sense) how your sample would differ from an ideal sample that was randomly selected. In particular, pay attention to who might be left out of your convenience sample or who might be underrepresented in your sample.


* Random sample *

In contrast, a random sample is one where the researcher ensures (usually through the use of random numbers applied to a list of the entire population) that each member of that population has an equal probability of being selected.

Random samples are an important foundation of Statistics. Almost all of the mathematical theory upon which Statistics is based relies on assumptions which are consistent with a random sample. This theory is inconsistent with data collected from a convenience sample.


* Judgment sample *

- A non-probability sample that is often called a purposive sample because the sample elements are handpicked and because they are expected to serve the research purpose.


* Simple random sampling *

A sampling procedure that assures that each element in the population has an equal chance of being selected is referred to as simple random sampling.

Let us assume you had a school with 1,000 students, divided equally into boys and girls, and you wanted to select 100 of them for further study. You might put all their names in a drum and then pull 100 names out. Not only does each person have an equal chance of being selected, we can also easily calculate the probability of a given person being chosen, since we know the sample size (n) and the population size (N), and it becomes a simple matter of division:

n/N x 100 = 100/1000 x 100 = 10%

This means that every student in the school has a 10%, or 1 in 10, chance of being selected using this method.
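The drum of names can be imitated in a few lines of code. The sketch below assumes the students are identified by hypothetical ID numbers 1 to 1,000; the fixed seed is there only to make the run repeatable.

```python
import random

N = 1000  # population size (students in the school)
n = 100   # sample size

# Hypothetical student IDs 1..N stand in for the names in the drum.
students = list(range(1, N + 1))

rng = random.Random(42)            # fixed seed only so the sketch is repeatable
sample = rng.sample(students, n)   # draws n distinct students, each equally likely

selection_probability = n / N * 100  # chance of selection, in percent
print(f"chance of selection: {selection_probability:.0f}%")
```

Because `random.sample` draws without replacement, no student can be chosen twice, just as a name pulled from the drum does not go back in.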


Systematic Sampling

At first sight this is very different. Suppose that the N units in the population are numbered 1 to N in some order. To select a systematic sample of n units, let k = N/n; then every k-th unit is selected, commencing with a randomly chosen number between 1 and k. Hence the selection of the first unit determines the whole sample. For example, with N = 5,000 and n = 250, k = 5000/250 = 20, so we select every 20th item commencing with (say) 6.
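A minimal sketch of the procedure, using the slide's figures (N = 5,000, n = 250). The function name `systematic_sample` is made up for this illustration, and the sketch assumes that n divides N exactly.

```python
import random

def systematic_sample(N, n, rng):
    """Return the unit numbers (1..N) chosen by systematic sampling."""
    k = N // n                   # sampling interval; assumes n divides N
    start = rng.randint(1, k)    # random start between 1 and k inclusive
    return list(range(start, N + 1, k))

rng = random.Random(0)
units = systematic_sample(5000, 250, rng)
print(units[:5])
```

Only one random number is drawn; every later unit follows from the start value, which is why the first unit determines the whole sample.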


* Samples from a subdivided population *

* Quota sampling usually refers to the process whereby a researcher attempts to match in a sample the exact makeup of the population with regard to certain demographic characteristics deemed important (such as gender, age, race, income, etc ).

* Quota sampling is non-probability.


* Stratified random sampling*

Stratified sampling is used if the population being sampled (for example, an area or volume) is heterogeneous.

The whole population is first divided into mutually exclusive subgroups, or strata, and then units are selected randomly from each stratum.
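As a rough sketch, stratified random sampling can be written as "one simple random sample per stratum". The strata below (500 boys and 500 girls, identified by hypothetical ID strings) and the choice of drawing the same 10% fraction from each stratum are assumptions made for the illustration; other allocation rules are possible.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum.

    `strata` maps a stratum label to the list of its units.
    """
    return {
        label: rng.sample(units, round(len(units) * fraction))
        for label, units in strata.items()
    }

# Hypothetical strata: 500 boys and 500 girls, identified by ID strings.
strata = {
    "boys": [f"B{i}" for i in range(500)],
    "girls": [f"G{i}" for i in range(500)],
}
sample = stratified_sample(strata, 0.10, random.Random(1))
print({label: len(chosen) for label, chosen in sample.items()})
```

Sampling within each stratum guarantees that both groups appear in the sample in the same proportions as in the population, which a single simple random sample does not guarantee.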


* Cluster sampling *

Cluster sampling is used when "natural" groupings are evident in the population. The total population is divided into groups, or clusters, and a random sample of these clusters is then selected.
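The following sketch shows one common variant, one-stage cluster sampling, in which whole clusters are chosen at random and every unit in a chosen cluster is included. The clusters (20 classes of 50 students) and the number of clusters drawn (4) are hypothetical values chosen for the illustration.

```python
import random

# Hypothetical clusters: 20 classes of 50 students each.
clusters = {
    f"class_{c}": [f"class_{c}_student_{s}" for s in range(50)]
    for c in range(20)
}

rng = random.Random(7)
chosen_clusters = rng.sample(sorted(clusters), 4)   # pick whole clusters at random

# One-stage cluster sample: every unit in each chosen cluster is included.
sample = [unit for name in chosen_clusters for unit in clusters[name]]
print(chosen_clusters, len(sample))
```

Compare this with stratified sampling: there, every group contributes some units; here, only the selected groups contribute, but they contribute all of their units.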


* Properties of data *

The phenomena or characteristics observed are random variables.

Variables have a range of values and are random, for example: eye color, height, weight, income per month, car accidents per day…


* Two types of variables!

Categorical – describing a quality (category) of the items observed.

Numerical – describing a quantity.


* Categorical variables!
- Yielding categorical responses -

* Nominal scale and ordinal scale *

- Nominal scale: values have no order and can only be distinguished by their names.
- Arithmetic is limited to counting.
- Example: Degree - law, commerce, economics, arts, science, etc.

- Ordinal scale: values are also names, but there is an ordering or ranking among them.
- Example: house numbers in a street: 121, 122, 123, 124, 125, etc.
- Arithmetic: as for nominal, plus positional measures, the median in particular.


* Numerical variables!
- Yielding numerical responses -

* Interval scale and ratio scale *

- Interval scale: values have an order, and the difference between values is a meaningful quantity.
- The zero value here is arbitrary.
- Example: temperature – the difference between 4 °C and 6 °C is the same as between 6 °C and 8 °C, but 8 °C is not twice as hot as 4 °C.
- Or, the degree of your eyesight: is 1.5 two times 0.75?

- Ratio scale: as with the interval scale, values have an order or ranking, but there is a true zero here.
- Example: 100 kg is twice as heavy as 50 kg.


* Numerical or quantitative variables can be further subdivided into continuous and discrete ones.

Continuous variables - such as time.

Discrete variables – such as family size.


* Example - variable types & scales of measurement

variable            example value    type / scale
country of birth    Australia        categorical, nominal
judo belt           Blue             categorical, ordinal
mortgage            $125,000         (continuous) numerical, ratio
class size          302              (discrete) numerical, ratio


* Two more terms!

Raw data: collected but unsorted.

Array : ordered data, increasing or decreasing.


* Describing and presenting data!

Data can be described and communicated in three main ways:

Tabular (in the form of tables) – frequency tables, contingency tables and super tables, etc.

Graphical- various forms of charts.

Summary (descriptive)- mean, standard deviation, median, etc.


Stem-and-leaf display

Stem   Leaf                                      Frq   Cum.
  1    .8                                          1      1
  2    .0 .2 .4 .5 .5 .5 .7 .8 .9 .9              10     11
  3    .1 .1 .2 .2 .3 .4 .4 .4 .6 .8 .8 .9 .9     13     24
  4    .0 .2 .3 .5 .6 .6                           6     30
  5    .0 .0 .1 .9                                 4     34
  6    .0 .1 .2 .5 .7 .7                           6     40
  7    .0 .2 .5 .6 .6                              5     45
  8    .0 .1 .5 .9                                 4     49
  9    .2                                          1     50
 10    .1                                          1     51
 11                                                0     51
 12    .4                                          1     52
 13    .6                                          1     53
 14                                                0     53
 15                                                0     53
 16                                                0     53
 17    .7                                          1     54
Total                                             54


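Frequency table!

Time (x_i)        Number of Calls (f_j)   Class Mark (x_j)   Cum. Freq (Σ f_j)   Cum. % (Σ f_j / n)
-1 to under 1              0                     0                   0                 0.00%
 1 to under 3             11                     2                  11                20.37%
 3 to under 5             19                     4                  30                55.56%
 5 to under 7             10                     6                  40                74.07%
 7 to under 9              9                     8                  49                90.74%
 9 to under 11             2                    10                  51                94.44%
11 to under 13             1                    12                  52                96.30%
13 to under 15             1                    14                  53                98.15%
15 to under 17             0                    16                  53                98.15%
17 to under 19             1                    18                  54               100.00%
19 to under 21             0                    20                  54               100.00%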


Histogram

[Figure: Histogram of Call Durations. Horizontal axis: Duration in Minutes (classes from -1 to under 1 up to 19 to under 21). Vertical axis: Number of Calls (0 to 25).]


Frequency Polygon

[Figure: Frequency Polygon of Call Durations. Horizontal axis: Duration in Minutes (0 to 20). Vertical axis: Number of Calls (0 to 25).]


Ascending Ogive

[Figure: Ogive of Call Durations. Horizontal axis: Duration in Minutes (0 to 22). Vertical axis: Proportion of Calls (0% to 100%).]


Bar Chart

[Figure: Number of Calls Handled on an Average Weekday. Horizontal axis: Shift (Morning, Day, Evening, Night). Vertical axis: Number of Calls (0 to 500).]


Pareto Diagram

[Figure: Pareto Diagram of Causes of Dissatisfaction with Consultants. Horizontal axis: Cause (Rude, Poor Knowledge, Didn't Listen, Poor Grammar, Too Formal, Too Familiar, Others). Vertical axis: % of Responses (0% to 100%).]


* Summary measures!

Central tendency: typical or representative value – a measure of location.

Dispersion: indicating the variation or spread in the data.

Shape of the grouped data.


* Presenting data in tables and charts

* Summary Measures


Univariate Data

Single variable


* Learning objectives *

1. Organize numerical data

2. Develop tables and charts for numerical data

3. Develop tables and charts for categorical data

4. Understand the principles of proper graphical presentation


* Two ways to organize numerical data

The ordered array

The stem-and-leaf display


* The ordered array !

An ordered array puts the raw data in rank order, from the smallest to the largest.

The advantage of an ordered array is that it makes it easier to pick out extremes, typical values, and the area where the majority of the values are concentrated.


* The stem-and-leaf display !

This valuable data-organizing tool helps show how the values distribute and cluster in the data set.

As its name suggests, the stem-and-leaf display is constructed from two parts:

- the stem
- the leaf


* Construct a stem-and-leaf display *

Example: 12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10


Stem   Leaf         Frequency   Cumulative
 0                      0            0
 1     0 1 2 3 5        5            5
 2     0 6              2            7
 3     2 6              2            9
 4     4 5              2           11
 5     6                1           12
 6     7                1           13
 7                      0           13
 8     9                1           14
Total                  14
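The display for the example values can also be built programmatically. In this sketch the stem is taken to be the tens digit and the leaf the units digit, which suits two-digit data like the example; the helper name `stem_and_leaf` is made up for the illustration.

```python
from collections import defaultdict

data = [12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Include empty stems so gaps in the data stay visible.
    return {stem: leaves.get(stem, []) for stem in range(0, max(leaves) + 1)}

display = stem_and_leaf(data)
cumulative = 0
for stem, leaf_list in display.items():
    cumulative += len(leaf_list)
    leaf_text = " ".join(map(str, leaf_list))
    print(f"{stem} | {leaf_text:<10} {len(leaf_list):>2} {cumulative:>3}")
```

Sorting the values first means the leaves on each stem come out in increasing order, so the display doubles as an ordered array.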


A stem-and-leaf display makes the data more informative. It is useful for indicating the range, concentration and structure of the data.


* Tables and charts for numerical data *

1. The frequency distribution

2. The histogram

3. The polygon


* The frequency distribution table *

For large data sets, it is not convenient to analyze the observations using an ordered array or a stem-and-leaf display. Instead, we can arrange the observations into different groups (class groupings) to provide a more effective presentation.

This arrangement of data in tabular form is called a frequency distribution.


* A frequency distribution table *
- when the counts are expressed as proportions of the total, this is called “the relative frequency distribution”

------------------------------------------------------------
5-year annualized percentage return      number of funds
------------------------------------------------------------
-10.0 < -5.0                                     1
 -5.0 <  0.0                                     3
  0.0 <  5.0                                    14
  5.0 < 10.0                                    58
 10.0 < 15.0                                    61
 15.0 < 20.0                                    17
 20.0 < 25.0                                     3
 25.0 < 30.0                                     1
Total                                          158
------------------------------------------------------------


* The procedures of establishing a frequency distribution table *

1. Selecting the number of classes
2. Deciding the class interval (width of interval)
3. Deciding the boundaries of the classes

Then, establishing frequency distribution table.


* Selecting the number of classes

Usually, at least 5 classes and at most 15 classes.

This means we can choose the number of classes ourselves, anywhere between 5 and 15.

Of course, larger data sets usually need more classes than smaller ones.


* Deciding class interval? *

Find the range of the data set. What is the range?

Range = the largest value – the smallest value

Width of interval = range / (number of desired class groupings)
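A short worked example of the formula, as a sketch only. It reuses the example values from the stem-and-leaf construction purely for illustration; the choice of 8 classes and the decision to round the width up to a whole number are assumptions, not rules from the slides.

```python
import math

def class_width(values, n_classes):
    """Range divided by the desired number of classes."""
    data_range = max(values) - min(values)
    return data_range / n_classes

# Example values reused for illustration.
values = [12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10]
width = class_width(values, 8)      # (89 - 10) / 8
rounded_width = math.ceil(width)    # round up so the classes cover the whole range
print(width, rounded_width)
```

Rounding up to a convenient number keeps the class limits easy to read while still covering every observation.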


* Deciding the boundaries *

Boundaries mean the lower and upper ends of the classes in the frequency distribution table.

The basic rule for deciding the boundaries is that the classes must include the entire range of the data but must not overlap.


* Some highlights here *

1. Of course, you can choose 10 classes, or 6, or any number between 5 and 15.

2. Of course, you can also use 4 as the width of interval, or even 6.

3. But remember, the purpose of statistics is to make things simpler, and this is why we can subjectively choose 5 or 10 as the width of interval.


* The relative frequency distribution, the percentage distribution, and the cumulative distribution *

--------------------------------------------------------------------------------------------
5-year annualized   number     percentage   cumulative percentage (percentage of funds
return              of funds   of funds     less than lower boundary of class interval)
--------------------------------------------------------------------------------------------
-10.0 < -5.0            1         0.6         0.0
 -5.0 <  0.0            3         1.9         0.6
  0.0 <  5.0           14         8.9         2.5 = 0.6+1.9
  5.0 < 10.0           58        36.7        11.4 = 0.6+1.9+8.9
 10.0 < 15.0           61        38.6        48.1 = 0.6+1.9+8.9+36.7
 15.0 < 20.0           17        10.8        86.7 = 0.6+1.9+8.9+36.7+38.6
 20.0 < 25.0            3         1.9        97.5 = 0.6+1.9+8.9+36.7+38.6+10.8
 25.0 < 30.0            1         0.6        99.4 = 0.6+1.9+8.9+36.7+38.6+10.8+1.9
 Total                158       100.0       100.0 = 0.6+1.9+8.9+36.7+38.6+10.8+1.9+0.6
--------------------------------------------------------------------------------------------
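The percentage and cumulative percentage columns can be reproduced from the class counts alone. The sketch below uses the counts from the table; the function name `distribution` is an illustrative choice. One assumption differs from the slide: it accumulates the counts rather than adding the already-rounded percentages, which gives the same one-decimal figures here.

```python
# Class intervals and counts taken from the table of 158 growth funds.
classes = ["-10.0 < -5.0", "-5.0 < 0.0", "0.0 < 5.0", "5.0 < 10.0",
           "10.0 < 15.0", "15.0 < 20.0", "20.0 < 25.0", "25.0 < 30.0"]
counts = [1, 3, 14, 58, 61, 17, 3, 1]


def distribution(class_counts):
    """Return (percentage, cumulative percentage below the lower boundary) per class."""
    total = sum(class_counts)
    rows = []
    below = 0  # number of funds below the lower boundary of the current class
    for count in class_counts:
        percentage = round(100 * count / total, 1)
        cumulative = round(100 * below / total, 1)
        rows.append((percentage, cumulative))
        below += count
    return rows


for label, count, (pct, cum) in zip(classes, counts, distribution(counts)):
    print(f"{label:>13} {count:>4} {pct:>6.1f} {cum:>6.1f}")
```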


* Histogram *

Although tables such as the stem-and-leaf display, the ordered array, and the frequency distribution table are effective for describing a large set of data, graphs (pictures) present its features more vividly.

A picture is worth 1,000 words!


* What is a histogram? *

A histogram is used to describe numerical data that have been grouped into frequency, relative frequency, or percentage distributions.

This means a histogram is drawn only after the frequency distribution has been established.
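To illustrate the order of work (group first, then draw), here is a rough sketch that groups values into classes and prints a text histogram. The `durations` data, the class width of 2, and the helper name `frequency_distribution` are assumptions for illustration; they are not the data behind the lecture's chart.

```python
# Hypothetical call durations in minutes; NOT the lecture's data.
durations = [0.5, 1.2, 2.8, 3.1, 3.9, 4.4, 5.0, 5.7, 6.3, 6.8, 7.5, 8.1, 9.9, 12.4]


def frequency_distribution(values, start, width, n_classes):
    """Count values in classes [lower, lower + width), beginning at `start`."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


start, width, n_classes = 0, 2, 7
counts = frequency_distribution(durations, start, width, n_classes)

# One row per class; the bar length equals the class frequency.
for i, count in enumerate(counts):
    lower = start + i * width
    print(f"{lower:>2} to under {lower + width:<2} | {'#' * count} ({count})")
```

Because each class includes its lower boundary and excludes its upper boundary, no value can fall into two classes.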


Histogram

[Figure: Histogram of Call Durations. Horizontal axis: Duration in Minutes, in classes of width 2 from -1 to 21; vertical axis: Number of Calls, 0 to 25.]


* Frequency polygon *

A frequency polygon connects the midpoints of all the classes in the frequency distribution.
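A small sketch of the points a frequency polygon would join: each point pairs a class midpoint with the class frequency. The class boundaries and frequencies below are made up for illustration.

```python
# Hypothetical classes of width 2 starting at 0, with made-up frequencies.
lower_bounds = [0, 2, 4, 6, 8]
width = 2
frequencies = [2, 3, 3, 3, 2]

# Each point of the polygon is (class midpoint, class frequency).
points = [(lower + width / 2, freq) for lower, freq in zip(lower_bounds, frequencies)]

for midpoint, freq in points:
    print(f"midpoint {midpoint:>4.1f} -> frequency {freq}")
```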


Frequency Polygon

[Figure: Frequency Polygon of Call Durations. Horizontal axis: Duration in Minutes, 0 to 20; vertical axis: Number of Calls, 0 to 25.]


Ascending Ogive – based on cumulative percentage

[Figure: Ogive of Call Durations. Horizontal axis: Duration in Minutes, 0 to 22; vertical axis: Proportion of Calls, 0% to 100%.]


Tables and charts for categorical data

- The summary table
- Bar chart
- Pareto chart
- Pie chart
- Run chart


* The summary table *

A summary table is very similar to a frequency distribution table, since both are the basis for building other graphs (or pictures).

However, the summary table is for categorical data and the frequency distribution is for numerical data.


* Constructing a summary table *

• In the “Funds” example, there are altogether 259 mutual funds: 158 of them are growth funds and the other 101 are value funds.

• Previously, we sorted only the 158 growth funds. The data for these 158 growth funds are numerical.

• Now, we classify all 259 funds into 5 groups by risk level: very low, low, average, high, and very high.

• These five groups give us a categorical set of data to analyze.


Now, do it!

-------------------------------------------------
fund risk level   number of funds   percentage
-------------------------------------------------
very low                  6             2.32
low                      76            29.34
average                  82            31.66
high                     80            30.89
very high                15             5.79
-------------------------------------------------
Total                   259           100.0
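The percentage column of the summary table follows directly from the counts. A minimal sketch, using the counts from the table; the variable names are illustrative:

```python
# Counts from the summary table of 259 mutual funds by risk level.
risk_counts = {"very low": 6, "low": 76, "average": 82, "high": 80, "very high": 15}

total = sum(risk_counts.values())
percentages = {level: round(100 * n / total, 2) for level, n in risk_counts.items()}

for level, n in risk_counts.items():
    print(f"{level:<10} {n:>4} {percentages[level]:>6.2f}")
print(f"{'Total':<10} {total:>4} {100.0:>6.2f}")
```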


* The bar chart *

Based on the previous summary table, we can build a bar chart using Microsoft Excel.

A bar chart presents the number of funds in each risk category.


Bar Chart – used for categorical data

[Figure: Number of Calls Handled on an Average Weekday. Horizontal axis: Shift (Morning, Day, Evening, Night); vertical axis: Number of Calls, 0 to 500.]


* The pie chart *

Likewise, the pie chart is built from the summary table.

A pie chart represents the percentage column of the summary table.


[Figure: pie chart with slices labelled Singles, Married / No kids, Full Nest 1, Full Nest 2, Full Nest 3, and Empty Nest.]


* The properties of a Pareto diagram *

1. When there are many groupings, a Pareto diagram is preferred.

2. A Pareto diagram presents the most significant grouping first, followed by the others in descending order.

3. For the cumulative percentage polygon, the points are plotted at the midpoint of each category.


* The Pareto diagram *

The Pareto diagram, also based on the summary table, is similar to the bar chart.

The differences are:

1. The Pareto diagram arranges the categories in descending rank order of their frequencies.

2. The Pareto diagram also includes a cumulative polygon on the same graph.

3. Left side – percentage; right side – cumulative percentage.
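The two steps that set a Pareto diagram apart from a bar chart, ranking and accumulating, can be sketched as follows. The counts are reused from the fund risk summary table purely as an example; the lecture's own Pareto diagram uses different data.

```python
# Counts reused from the fund risk summary table (259 mutual funds).
risk_counts = {"very low": 6, "low": 76, "average": 82, "high": 80, "very high": 15}

total = sum(risk_counts.values())

# Step 1: rank the categories in descending order of frequency.
ranked = sorted(risk_counts.items(), key=lambda item: item[1], reverse=True)

# Step 2: accumulate the percentages for the cumulative polygon.
pareto_rows = []
running = 0
for level, n in ranked:
    running += n
    pareto_rows.append((level, round(100 * n / total, 2), round(100 * running / total, 2)))

for level, pct, cum in pareto_rows:
    print(f"{level:<10} {pct:>6.2f} {cum:>7.2f}")
```

The second column would drive the bars and the third column the cumulative polygon.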


[Figure: Pareto Diagram of Causes of Dissatisfaction with Consultants. Horizontal axis: Cause (Rude, Poor Knowledge, Didn't Listen, Poor Grammar, Too Formal, Too Familiar, Others); vertical axis: % of Responses, 0% to 100%.]


[Figure: History of Chocolate Sales. Horizontal axis: Month, Jan-95 to Jul-99; vertical axis: Sales ($'000), 0 to 120.]


Run Chart

A run chart is a good tool for illustrating one or more (numerical) variables over time.

A run chart allows trends and periodicity to be identified.


* Summary measures *

To describe characteristics of a set of data by “Numbers + words”!


* Measuring from ungrouped (raw) data *

1. Central tendency (location)

2. Variation

3. Shape


Central tendency

Most sets of data show a central point around which the data cluster.

This central point is a typical or representative value for the whole set of data.

Three measures here:

1. The arithmetic mean

2. Median

3. Mode


* The arithmetic mean *

• Easy to calculate!

• Caution: the arithmetic mean is greatly affected by any extreme value or values.

• Therefore, when reporting an arithmetic mean for data with extreme values, the median and mode should be reported alongside it.


Mean for population - “N” is the population size.

$$\mu_X = \left(\sum_{i=1}^{N} X_i\right) / N$$


Mean for sample - “n” is the sample size.

$$\bar{X} = \left(\sum_{i=1}^{n} X_i\right) / n$$
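The arithmetic is the same for a population and a sample: add the observations and divide by how many there are. The sketch below uses made-up values and also shows how one extreme value pulls the mean; `with_extreme` is a hypothetical data set built for that purpose.

```python
from statistics import mean

# Hypothetical observations; the last value of `with_extreme` is a deliberate outlier.
values = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 50]

# Sum of the observations divided by the number of observations.
plain_mean = sum(values) / len(values)

# The standard library gives the same kind of result.
pulled_mean = mean(with_extreme)

print("mean without extreme value:", plain_mean)
print("mean with extreme value:", pulled_mean)
```

Four of the five observations are identical in both sets, yet the mean moves from 3 to 12 once a single extreme value is present.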


The median

The median is the value for which 50% of the observations are smaller and the other 50% are larger.

Caution for an even number of observations: the median in this case is the average of the two middle values.

Median = value of the $\frac{n+1}{2}$th ordered observation


Example

1, 2, 3, 4, 5 – odd number of data

Median is 3.

1, 2, 3, 4, 5, 6 – even number of data

Median is 3.5.
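The two cases in the example can be sketched in a few lines. The function name `median` is an illustrative choice; the standard library's `statistics.median` applies the same rule.

```python
def median(values):
    """Median via the (n + 1) / 2 position in the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a single middle value.
        return ordered[middle]
    # Even n: position (n + 1) / 2 falls between two values, so average them.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 2, 3, 4, 5]))
print(median([1, 2, 3, 4, 5, 6]))
```

Replacing the 5 in the first list with 500 leaves the median unchanged, which is the robustness to extreme values that motivates using it.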


* Significance of median *

Whenever a set of data includes large extreme values, the median is used, since extremes seriously distort the mean.

The median is not affected by extreme values in a set of data.


The mode

Easy! No calculation at all!

Definition: the value in a set of data that appears most frequently.

Caution: when reporting the mode, distinguish between different types of data:

1. Data with a mode

2. Data with no mode

3. Data that are bimodal or multimodal
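The three cases in the caution can be told apart by counting. In this sketch the function name `modes` is hypothetical, and it assumes one common convention: when no value repeats, the data are reported as having no mode. It also assumes the input is not empty.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: every value appears once, so no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 2, 3]))      # data with a mode
print(modes([1, 2, 3, 4]))      # data with no mode
print(modes([1, 1, 2, 3, 3]))   # bimodal data
```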


Midrange

Numerical data only!

$$\text{Midrange} = \frac{X_{\text{largest}} + X_{\text{smallest}}}{2}$$


* Dispersion or spread *

1. Range

2. Variance

3. Standard deviation


Dispersion, or spread

After noting the central value (central tendency), we should pay attention to how, and how much, a set of data spreads around it.

The amount of variation measures the dispersion (spread) of a set of data.

Five measures of variation are often used: range, interquartile range, variance, standard deviation, and coefficient of variation.


Range

Range = largest value – smallest value
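Both the range and the midrange depend only on the two extreme observations. A brief sketch with made-up values, where 50 is a deliberate extreme:

```python
# Hypothetical observations; 50 is a deliberate extreme value.
values = [1, 2, 3, 4, 50]

value_range = max(values) - min(values)      # largest value - smallest value
midrange = (max(values) + min(values)) / 2   # average of the two extremes

print("range:", value_range)
print("midrange:", midrange)
```

Because only the largest and smallest values enter the calculation, a single extreme observation changes both results sharply.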


Variance and standard deviation

• The range is a measure of the total spread.

• Variance and standard deviation, in contrast, consider how the values of the data are distributed.


Variance

• The variance is roughly the average of the squared differences between each of the values in a set of data and the mean.


For population

$$\sigma^2 = \sum_{i=1}^{N} (X_i - \mu_X)^2 / N$$


For sample

$$s^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)$$
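The population and sample variances share the same numerator, the sum of squared differences from the mean, and differ only in the divisor (N versus n - 1). The sketch below uses made-up values and checks the hand calculation against the standard library's `pvariance` and `variance`; the helper name `sum_squared_deviations` is illustrative.

```python
from statistics import pvariance, variance

# Hypothetical observations for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]


def sum_squared_deviations(data):
    """Sum of squared differences between each value and the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)


ss = sum_squared_deviations(values)
population_variance = ss / len(values)      # divide by N
sample_variance = ss / (len(values) - 1)    # divide by n - 1

print("population:", population_variance, pvariance(values))
print("sample:", sample_variance, variance(values))
```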


* Standard deviation *

Standard deviation is the square root of the variance.

By rule of thumb, for roughly bell-shaped data, about 95% of values lie within two standard deviations of the mean.
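As a rough check of the two-standard-deviation rule of thumb, the sketch below computes a population standard deviation and the share of observations within two standard deviations of the mean. The values are made up, and with so few observations the share should not be expected to equal 95%; the rule is only an approximation for roughly bell-shaped data.

```python
import math

# Hypothetical observations for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)
population_sd = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

# Interval of two standard deviations on either side of the mean.
lower, upper = mean - 2 * population_sd, mean + 2 * population_sd
share_within = sum(lower <= x <= upper for x in values) / len(values)

print("standard deviation:", population_sd)
print("interval:", (lower, upper))
print("share within two standard deviations:", share_within)
```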


This is the end of today’s lecture!