BEA140 Leon Jiang, University of Tasmania 1
Module 2 Quantitative Methods
Summer Semester 2009
By Leon Jiang, University of Tasmania
Why this unit?
In particular, why does this module require us to study statistics?
Think about a case!
Suppose you work in R&D at IBM. As you know, IBM is competing with a number of very strong rivals.
Can you create whatever laptop computer you like for IBM?
It is not as simple as 'whatever you like'! Knowing clearly what types of laptop computers customers want should set the direction for a successful R&D department.
But can we know how many customers there are, and what types of laptop computers they would actually buy?
Of course…
The best situation would be to know exactly how many customers in the world are willing to buy, exactly what types of laptop computers they want, and, don't forget, exactly how many of these customers can afford your products.
However, can we know this information? In this case, it is nearly impossible!
But can we estimate it?
We can actually know this information, to some extent!
This is what we are going to learn in this module!
* What is statistics?
The word 'Statistics' is derived from the Latin word 'status', which means a "political state".
Clearly, statistics is closely linked with the administrative affairs of a state, such as facts and figures regarding the defence force, population, housing, food, financial resources, etc.
What is true of a government is also true of industrial administrative units, and even of one's personal life.
* The meaning of statistics!
* The word statistics, as a singular noun, is used to describe a branch of applied mathematics, whose purpose is to provide methods of dealing with a collection of data and extracting information from them in compact form by tabulating, summarizing and analyzing the numerical data or a set of observations.
* The various methods used are termed statistical methods, and the person using them is known as a statistician. A statistician is concerned with the analysis and interpretation of the data and with drawing valid, worthwhile conclusions from them.
* Some important terms~!
Population (or universe) - the entire collection of objects (individuals or members) under consideration.
The total number of objects in a population is known as the size of the population, which may be finite or infinite.
The population can refer to things as well as people.
* Examples of "populations"!
All members of the student union of UTAS.
All students of the QM unit in your class.
All the people who have drunk beer at least twice a week over the past year.
Heights of teaching staff at UTAS.
Weights of all the citizens of Hobart above 20 years of age.
* Population - finite and infinite
A population is finite if it contains a finite number of individuals, for example, the students in your class.
A population is infinite if it contains an infinite number of individuals, for example, the pressures at various points in the atmosphere.
* Infinite population~!
Statisticians often want to know things about a population but cannot, because in almost every case data for every individual in the population are not available.
In the laptop case above, for example, can we know how many people are willing to buy, and can actually afford, your products?
Thus, whenever we want to study the characteristics of a certain population, it is difficult to study the whole population: it is often expensive and time-consuming, and we often lack the resources to do so. In practice, in any science we can study only part of a population. A part or small section selected from the population is called a sample.
This is the starting point for our case: take a sample in order to learn something about the population!
* Sample~!
A finite set of objects drawn from the population for a particular purpose is called a sample.
* Even in everyday life we make many of our decisions based on samples, though we are not aware of it. We take a little rice from a gunny bag, judge its quality, and then purchase the whole bag.
If we want to taste milk, we just take a glassful of milk from the can and taste it.
Note that taking a sample is easy in many cases where the population is uniform (homogeneous).
When the population is heterogeneous (not uniform), the selection of a sample is not very easy.
* Parameter & statistic *
• The word 'parameter' is associated with the population and is understood as a measure of a characteristic of the population, such as its mean or standard deviation.
• The word 'statistic' is used for a random sample and is understood as a measure of a characteristic of the random sample, such as its mean or standard deviation.
* Different symbols are used to denote parameters and statistics *
* For instance, say:
Aim: to know the average (mean) income of families living in the Salamanca area in 2007-2008, say A$50,000. This is the population parameter.
Method: draw a random sample of 200 families and compute their average income, say A$52,000. This is the sample statistic.
Conclusion: the sample mean (statistic) is close to, and can be used to estimate, the population mean (parameter).
* The process of doing statistics * - 3 steps ~!
1. Design – gathering data !
2. Description – summarizing, studying features and characteristics of data, providing useful and effective information. (including graphical tools, tables, summary measures.)
3. Inference (drawing conclusions) – requires the application of probability concepts.
Importance of statistics
‘hard data’
Scientific evidence
Postgraduate study for masters or PhD.
Necessary for high-quality essays.
* Sources of data *
Primary source – the original collector of the data, e.g. the National Population Census Bureau.
Secondary source – a subsequent user of the data.
* How to collect primary data?
• Survey – interview, questionnaire, etc. Designing a survey requires skill.
• Observation – observing and recording behaviors.
• Experiment – use of experimental and control groups.
- appropriate design is important.
* Where to find secondary data? *
Published sources of data, e.g. trade journals and other relevant media.
Secondary data collection is usually cheaper and less time-consuming than primary data collection.
* Survey errors * - four kinds of error to be aware of beforehand
1. Coverage error → selection bias: the frame does not cover the whole population, or excludes some groups, because the population frame is not clearly defined (a random probability sample then estimates the characteristics of the frame, not of the actual population).
2. Non-response error → bias: those who do not respond might have very different views.
3. Sampling error: a sample is never perfectly representative, so results vary from sample to sample by chance.
4. Measurement error: poor questionnaire design or interviewing skills.
* Sampling methods * - 2 basic kinds!
1. Probability sampling (random sampling)
* Only random sampling is valid for statistical inference.
2. Non-probability sampling – two broad types: accidental (convenience) or purposive (judgment).
* Convenience sample *
A convenience sample is a sample where the subjects are selected, in part or in whole, at the convenience of the researcher. The researcher makes no attempt, or only a limited attempt, to ensure that this sample is an accurate representation of some larger group or population.
The classic example of a convenience sample is standing at a shopping mall and selecting shoppers as they walk by to fill out a survey.
- More about convenience sample -
In general, the Statistics community frowns on convenience samples. You will often have great difficulty in generalizing the results of a convenience sample to any population that has practical relevance.
Still, convenience samples can provide you with useful information, especially in a pilot study. To interpret the findings from a convenience sample properly, you have to characterize (usually in a qualitative sense) how your sample would differ from an ideal sample that was randomly selected. In particular, pay attention to who might be left out of your convenience sample or who might be underrepresented in your sample.
* Random sample *
In contrast, a random sample is one where the researcher ensures (usually by applying random numbers to a list of the entire population) that each member of the population has an equal probability of being selected.
Random samples are an important foundation of statistics. Almost all of the mathematical theory on which statistics is based relies on assumptions that are consistent with a random sample; this theory is not consistent with data collected from a convenience sample.
* Judgment sample *
- A non-probability sample, often called a purposive sample, because the sample elements are handpicked in the expectation that they will serve the research purpose.
* Simple random sampling *
A sampling procedure that ensures each element in the population has an equal chance of being selected is referred to as simple random sampling.
Suppose you have a school with 1,000 students, divided equally into boys and girls, and you want to select 100 of them for further study. You might put all their names in a drum and then pull out 100 names. Not only does each person have an equal chance of being selected, we can also easily calculate the probability of a given person being chosen, since we know the sample size (n) and the population size (N); it becomes a simple matter of division:
(n/N) × 100 = (100/1000) × 100 = 10%
This means that every student in the school has a 10% or 1 in 10 chance of being selected using this method.
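The drum example can be sketched in Python as follows. This is a minimal illustration, assuming the 1,000 students are identified by hypothetical ID numbers 1 to 1,000; `random.sample` from the standard library draws without replacement, so every student has the same chance, n/N, of being picked.

```python
import random

N = 1000  # population size: students in the school
n = 100   # sample size

population = list(range(1, N + 1))  # hypothetical student ID numbers

# Draw n distinct students; every student is equally likely to be chosen
sample = random.sample(population, n)

# Probability that any given student is selected
prob_selected = n / N
print(f"Selected {len(sample)} students; each had a {prob_selected:.0%} chance")
```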
Systematic Sampling
At first sight this is very different. Suppose that the N units in the population are numbered 1 to N in some order. To select a systematic sample of n units, let k = N/n; then every k-th unit is selected, starting with a randomly chosen number between 1 and k. Hence the selection of the first unit determines the whole sample. For example, if N = 5,000 and n = 250, then k = 5,000/250 = 20, so we select every 20th item starting with (say) item 6.
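A minimal Python sketch of systematic sampling, assuming the units are numbered 1 to N and that N is an exact multiple of n (as in the N = 5,000, n = 250 example). `systematic_sample` is a hypothetical helper name, not a library function.

```python
import random

def systematic_sample(N, n, start=None):
    """Return the unit numbers (1..N) chosen by systematic sampling."""
    k = N // n  # sampling interval; assumes N is a multiple of n
    if start is None:
        start = random.randint(1, k)  # random start between 1 and k (inclusive)
    return list(range(start, N + 1, k))  # start, start + k, start + 2k, ...

units = systematic_sample(5000, 250, start=6)
print(units[:4], len(units))
```

Passing `start=6` reproduces the example above; leaving it out picks the random start.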
* Samples from a subdivided population *
* Quota sampling usually refers to the process whereby a researcher attempts to match in a sample the exact makeup of the population with regard to certain demographic characteristics deemed important (such as gender, age, race, income, etc.).
* Quota sampling is a non-probability method.
* Stratified random sampling*
Stratified sampling is used when the population (or sampled area or volume) is heterogeneous.
The whole population is first divided into mutually exclusive subgroups, or strata, and then units are selected randomly from each stratum.
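A sketch of stratified random sampling in Python. The notes do not say how many units to take from each stratum, so this example assumes proportional allocation (each stratum's share of the sample equals its share of the population) and reuses the school of 500 boys and 500 girls with hypothetical labels; `stratified_sample` is an illustrative helper.

```python
import random

def stratified_sample(strata, n):
    """Proportional stratified random sampling.

    strata: dict mapping stratum name -> list of units in that stratum
    n: total sample size
    """
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)           # proportional allocation (assumed)
        sample[name] = random.sample(units, n_h)  # simple random sample within the stratum
    return sample

# Hypothetical school: 500 boys and 500 girls, labelled separately
strata = {
    "boys": [f"B{i}" for i in range(1, 501)],
    "girls": [f"G{i}" for i in range(1, 501)],
}
sample = stratified_sample(strata, 100)
print({name: len(units) for name, units in sample.items()})
```

With awkward stratum sizes the rounded allocations may not add up exactly to n; a real study would adjust for that.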
* Cluster sampling *
Cluster sampling is used when "natural" groupings are evident in the population. The total population is divided into groups, or clusters, and a random sample of clusters is selected; the units in the chosen clusters are then studied.
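A sketch of one-stage cluster sampling, assuming a hypothetical population of 20 classrooms (the clusters) with 25 students each: whole clusters are chosen at random and every unit in a chosen cluster is kept. `cluster_sample` is an illustrative helper, not a library function.

```python
import random

def cluster_sample(clusters, m):
    """Randomly pick m whole clusters and keep every unit in them."""
    chosen = random.sample(list(clusters), m)  # random choice of cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical population: 20 classrooms of 25 students each
clusters = {
    f"room{r}": [f"room{r}-s{s}" for s in range(1, 26)]
    for r in range(1, 21)
}
picked = cluster_sample(clusters, 4)
print(sorted(picked), sum(len(units) for units in picked.values()))
```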
* Properties of data *
The phenomena or characteristics observed are random variables.
Variables have a range of values and are random, for example: eye color, height, weight, income per month, car accidents per day…
* Two types of variables!
Categorical – describes a quality (attribute) of the items observed.
Numerical – describes a quantity.
* Categorical variables! - yielding categorical responses -
* Nominal scale and ordinal scale *
- Nominal scale: the values have no order and can only be analyzed by their names.
- Arithmetic is limited to counting.
- Example: degree – law, commerce, economics, arts, science, etc.
- Ordinal scale: the values are also nominal, but there is an ordering or ranking among them.
- Example: house numbers in a street: 121, 122, 123, 124, 125, etc.
- Analysis: everything allowed for nominal data, plus positional measures, in particular the median.
* Numerical variables! - yielding numerical responses -
* Interval scale and ratio scale *
- Interval scale: the values have an order, and the difference between values is a meaningful quantity.
- The zero point here is arbitrary.
- Example: temperature – the difference between 4 °C and 6 °C is the same as between 6 °C and 8 °C, but 8 °C is not twice as hot as 4 °C.
- Or, the grading of your eyesight: is 1.5 two times 0.75?
- Ratio scale: like the interval scale, values on this scale have an order or ranking, but there is a true zero.
- Example: 100kg is twice as heavy as 50kg.
* Numerical (quantitative) variables can be further subdivided into continuous and discrete variables.
Continuous variables - such as time.
Discrete variables – such as family size.
* Example: variable types & scales of measurement
Variable | Example value | Type / scale
country of birth | Australia | categorical, nominal
judo belt | Blue | categorical, ordinal
mortgage | $125,000 | numerical (continuous), ratio
class size | 302 | numerical (discrete), ratio
* Two more terms!
Raw data: collected but unsorted.
Array : ordered data, increasing or decreasing.
* Describing and presenting data!
Data can be described and communicated in three main ways:
Tabular (in the form of tables) – frequency tables, contingency tables and super tables, etc.
Graphical- various forms of charts.
Summary (descriptive)- mean, standard deviation, median, etc.
Stem-and-leaf display (1 | .8 means 1.8)
Stem | Leaf | Freq. | Cum.
1 | .8 | 1 | 1
2 | .0 .2 .4 .5 .5 .5 .7 .8 .9 .9 | 10 | 11
3 | .1 .1 .2 .2 .3 .4 .4 .4 .6 .8 .8 .9 .9 | 13 | 24
4 | .0 .2 .3 .5 .6 .6 | 6 | 30
5 | .0 .0 .1 .9 | 4 | 34
6 | .0 .1 .2 .5 .7 .7 | 6 | 40
7 | .0 .2 .5 .6 .6 | 5 | 45
8 | .0 .1 .5 .9 | 4 | 49
9 | .2 | 1 | 50
10 | .1 | 1 | 51
11 | | 0 | 51
12 | .4 | 1 | 52
13 | .6 | 1 | 53
14 | | 0 | 53
15 | | 0 | 53
16 | | 0 | 53
17 | .7 | 1 | 54
Total | | 54 |
Frequency table!
Class mark x_j (minutes) | Number of calls f_j | Cum. freq. | Cum. % (cum. freq./n)
0 | 0 | 0 | 0.00%
2 | 11 | 11 | 20.37%
4 | 19 | 30 | 55.56%
6 | 10 | 40 | 74.07%
8 | 9 | 49 | 90.74%
10 | 2 | 51 | 94.44%
12 | 1 | 52 | 96.30%
14 | 1 | 53 | 98.15%
16 | 0 | 53 | 98.15%
18 | 1 | 54 | 100.00%
20 | 0 | 54 | 100.00%
Histogram
[Figure: Histogram of Call Durations. x-axis: Duration in Minutes, classes -1 & under 1, 1 & under 3, ..., 19 & under 21; y-axis: Number of Calls (0 to 25).]
Frequency Polygon
[Figure: Frequency Polygon of Call Durations. x-axis: Duration in Minutes (0 to 20); y-axis: Number of Calls (0 to 25).]
Ascending Ogive
[Figure: Ogive of Call Durations. x-axis: Duration in Minutes (0 to 22); y-axis: Proportion of Calls (0% to 100%).]
Bar Chart
[Figure: Number of Calls Handled on an Average Weekday. x-axis: Shift (Morning, Day, Evening, Night); y-axis: Number of Calls (0 to 500).]
Pareto
[Figure: Pareto Diagram of Causes of Dissatisfaction with Consultants. x-axis: Cause (Rude, Poor Knowledge, Didn't Listen, Poor Grammar, Too Formal, Too Familiar, Others); y-axis: % of Responses (0% to 100%).]
* Summary measures~!
Central tendency: typical or representative value – a measure of location.
Dispersion: indicating the variation or spread in the data.
Shape: the pattern in which the data are distributed.
* Presenting data in tables and charts
* Summary Measures
Univariate Data
Single variable
* Learning objectives *
1. Organize numerical data
2. Develop tables and charts for numerical data
3. Develop tables and charts for categorical data
4. Understand the principles of proper graphical presentation
* Two ways to organize numerical data
The ordered array
The stem-and-leaf display
* The ordered array !
An ordered array arranges the raw data in rank order from the smallest to the largest value.
An ordered array makes it easier to pick out the extremes, typical values, and the area where most of the values are concentrated.
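A minimal Python sketch of an ordered array, using the example data from the stem-and-leaf slide in these notes (12, 45, 67, ...). Sorting places the extremes at the two ends of the list.

```python
data = [12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10]  # raw data

ordered = sorted(data)  # ordered array, smallest to largest
print(ordered)
print("smallest:", ordered[0], "largest:", ordered[-1])
```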
* The stem-and-leaf display !
This valuable data-organizing tool shows how the values are distributed and where they cluster in the data set.
As its name suggests, the stem-and-leaf display is constructed from two parts:
- the stem (the leading digit or digits)
- the leaf (the trailing digit)
* Construct a stem-and-leaf display *
Example: 12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10
Stem | Leaf | Frequency | Cumulative
1 | 0 1 2 3 5 | 5 | 5
2 | 0 6 | 2 | 7
3 | 2 6 | 2 | 9
4 | 4 5 | 2 | 11
5 | 6 | 1 | 12
6 | 7 | 1 | 13
7 | | 0 | 13
8 | 9 | 1 | 14
Total | | 14 |
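The display above can be reproduced with a short Python sketch. It assumes non-negative whole numbers, with the tens digit as the stem and the units digit as the leaf; `stem_and_leaf` is a hypothetical helper that prints each stem with its leaves, frequency and cumulative frequency, including empty stems.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Print a stem-and-leaf display (stem = tens digit, leaf = units digit).

    Assumes non-negative whole numbers. Returns (stem, leaves, freq, cum) rows.
    """
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting keeps each row's leaves in order
        leaves[v // 10].append(v % 10)

    rows = []
    cumulative = 0
    for stem in range(min(leaves), max(leaves) + 1):  # include empty stems
        leaf_list = leaves.get(stem, [])
        cumulative += len(leaf_list)
        rows.append((stem, leaf_list, len(leaf_list), cumulative))
        leaf_text = " ".join(str(leaf) for leaf in leaf_list)
        print(f"{stem:>2} | {leaf_text:<12} {len(leaf_list):>3} {cumulative:>4}")
    return rows

data = [12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10]
rows = stem_and_leaf(data)
```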
A stem-and-leaf display conveys more information than a plain list: it indicates the range, concentration and structure of the data.
* Tables and charts for numerical data *
1. The frequency distribution
2. The histogram
3. The polygon
* The frequency distribution table *
For large data sets, it is not convenient to analyze the observations with an ordered array or a stem-and-leaf display; instead, we can arrange the observations into groups (classes) to present them more effectively.
This arrangement of data in tabular form is called a frequency distribution.
* A frequency distribution table *
--------------------------------------------------------
5-year annualized percentage return | Number of funds
--------------------------------------------------------
-10.0 to under -5.0 | 1
-5.0 to under 0.0 | 3
0.0 to under 5.0 | 14
5.0 to under 10.0 | 58
10.0 to under 15.0 | 61
15.0 to under 20.0 | 17
20.0 to under 25.0 | 3
25.0 to under 30.0 | 1
Total | 158
--------------------------------------------------------
* The procedures of establishing a frequency distribution table *
1. Select the number of classes.
2. Decide the class interval (width of interval).
3. Decide the boundaries of the classes.
Then establish the frequency distribution table.
* Selecting the number of classes *
Usually use at least 5 and at most 15 classes.
This means we can decide the number of classes ourselves, anywhere between 5 and 15.
Larger data sets usually need more classes than smaller ones.
* Deciding class interval? *
Find the range of the data set: range = largest value - smallest value.
$$\text{Width of interval} = \frac{\text{range}}{\text{number of desired class groupings}}$$
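A minimal sketch of the class-width calculation in Python, using the smallest (1.8) and largest (17.7) call durations from the stem-and-leaf display of call durations. The choice of 8 classes is an assumption for illustration; rounding the raw width up gives 2 minutes, the width used in the call-duration frequency table.

```python
import math

smallest, largest = 1.8, 17.7  # extremes of the call-duration data (minutes)
num_classes = 8                # assumed number of class groupings (between 5 and 15)

data_range = largest - smallest       # range = largest - smallest
raw_width = data_range / num_classes  # width of interval = range / number of classes
width = math.ceil(raw_width)          # round up to a convenient width

print(f"range = {data_range:.1f}, raw width = {raw_width:.4f}, chosen width = {width}")
```

Rounding up rather than down ensures that the chosen number of classes still covers the whole range.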
* Deciding the boundaries *
Class boundaries are the lower and upper ends of each class.
The basic rule for deciding the boundaries is that the classes must together include the entire range of the data, but must not overlap.
* Some highlights here *
1. You can choose 10 classes, or 6, or any number between 5 and 15.
2. You can also use 4 as the width of interval, or even 6.
3. But remember, the purpose of statistics is to make things simpler, which is why we may subjectively choose a convenient width such as 5 or 10.
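Putting the steps together, here is a Python sketch that builds a frequency distribution with cumulative frequencies and cumulative percentages. It assumes the 54 call durations read off the stem-and-leaf display of call durations, a lower boundary of -1 and a width of 2, matching the classes "-1 & under 1" to "19 & under 21" in the call-duration histogram; `frequency_distribution` is an illustrative helper. Each class includes its lower boundary and excludes its upper one, so the classes do not overlap.

```python
# Call durations (minutes) read off the stem-and-leaf display of call data
durations = [
    1.8, 2.0, 2.2, 2.4, 2.5, 2.5, 2.5, 2.7, 2.8, 2.9, 2.9,
    3.1, 3.1, 3.2, 3.2, 3.3, 3.4, 3.4, 3.4, 3.6, 3.8, 3.8, 3.9, 3.9,
    4.0, 4.2, 4.3, 4.5, 4.6, 4.6, 5.0, 5.0, 5.1, 5.9,
    6.0, 6.1, 6.2, 6.5, 6.7, 6.7, 7.0, 7.2, 7.5, 7.6, 7.6,
    8.0, 8.1, 8.5, 8.9, 9.2, 10.1, 12.4, 13.6, 17.7,
]

def frequency_distribution(values, lower, width, num_classes):
    """Count values in classes [lower + i*width, lower + (i+1)*width)."""
    table = []
    cumulative = 0
    for i in range(num_classes):
        lo = lower + i * width
        hi = lo + width
        freq = sum(1 for v in values if lo <= v < hi)  # "lo & under hi"
        cumulative += freq
        table.append((lo, hi, freq, cumulative, 100 * cumulative / len(values)))
    return table

table = frequency_distribution(durations, lower=-1, width=2, num_classes=11)
for lo, hi, freq, cum, cum_pct in table:
    print(f"{lo:>3} & under {hi:<3} {freq:>3} {cum:>4} {cum_pct:7.2f}%")
```

The counts it produces (0, 11, 19, 10, 9, 2, 1, 1, 0, 1, 0) agree with the call-duration frequency table.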
* The relative frequency distribution, the percentage distribution, and the cumulative distribution *
--------------------------------------------------------
5-year annualized return (%) | Number of funds | Percentage of funds | Cumulative percentage (% of funds less than lower boundary of class)
--------------------------------------------------------
-10.0 to under -5.0 | 1 | 0.6 | 0.0
-5.0 to under 0.0 | 3 | 1.9 | 0.6
0.0 to under 5.0 | 14 | 8.9 | 2.5
5.0 to under 10.0 | 58 | 36.7 | 11.4
10.0 to under 15.0 | 61 | 38.6 | 48.1
15.0 to under 20.0 | 17 | 10.8 | 86.7
20.0 to under 25.0 | 3 | 1.9 | 97.5
25.0 to under 30.0 | 1 | 0.6 | 99.4
Total | 158 | 100.0 | 100.0
--------------------------------------------------------
Each cumulative percentage is the sum of the percentages of all lower classes, e.g. 11.4 = 0.6 + 1.9 + 8.9.
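The percentage and cumulative percentage columns can be recomputed from the fund counts with a short Python sketch. The cumulative percentage follows the table's definition (percentage of funds below the lower boundary of each class) and is computed from the counts rather than by adding rounded percentages, so in other data sets the last digit could differ slightly; for these counts the rounded results match the table.

```python
# Counts of growth funds per class of 5-year annualized return (%)
classes = [
    "-10.0 to under -5.0", "-5.0 to under 0.0", "0.0 to under 5.0", "5.0 to under 10.0",
    "10.0 to under 15.0", "15.0 to under 20.0", "20.0 to under 25.0", "25.0 to under 30.0",
]
counts = [1, 3, 14, 58, 61, 17, 3, 1]

total = sum(counts)  # 158 funds
below = 0            # funds below the lower boundary of the current class
rows = []
for label, f in zip(classes, counts):
    pct = 100 * f / total          # percentage of funds in this class
    cum_pct = 100 * below / total  # percentage of funds less than the lower boundary
    rows.append((label, f, round(pct, 1), round(cum_pct, 1)))
    below += f

for label, f, pct, cum_pct in rows:
    print(f"{label:<22}{f:>5}{pct:>8.1f}{cum_pct:>8.1f}")
```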
* Histogram *
Although the ordered array, the stem-and-leaf display and the frequency distribution table are effective ways to describe a large set of data, graphs (pictures) present its features more vividly.
A picture is worth 1,000 words!
* What is a histogram?
A histogram is used to describe numerical data that have been grouped into a frequency, relative frequency, or percentage distribution.
This means the frequency distribution is established first; the histogram is then drawn from it.
Histogram
[Figure: Histogram of Call Durations. x-axis: Duration in Minutes, classes -1 & under 1, 1 & under 3, ..., 19 & under 21; y-axis: Number of Calls (0 to 25).]
* Frequency polygon *
A frequency polygon connects the midpoints of all the classes in the frequency distribution, each plotted at its class frequency!
Frequency Polygon
[Figure: Frequency Polygon of Call Durations. x-axis: Duration in Minutes (0 to 20); y-axis: Number of Calls (0 to 25).]
Ascending Ogive – based on cumulative percentage
[Figure: Ogive of Call Durations. x-axis: Duration in Minutes (0 to 22); y-axis: Proportion of Calls (0% to 100%).]
Tables and charts for categorical data
- The summary table
- Bar chart
- Pareto chart
- Pie chart
- Run chart
* The summary table *
A summary table is very similar to a frequency distribution table, since both are the basis for building the other graphs (or pictures).
However, the summary table is for categorical data and the frequency distribution is for numerical data.
* Constructing a summary table *
• In the "funds" example there are altogether 259 mutual funds: 158 of them are growth funds and the other 101 are value funds.
• Previously, we sorted out only the 158 growth funds; their returns are numerical data.
• Now we classify all 259 funds into 5 groups by risk: very low, low, average, high and very high.
• These five groups now present us a categorical set of data to analyze.
Now, do it !
--------------------------------------------------------
Fund risk level | Number of funds | Percentage
--------------------------------------------------------
very low | 6 | 2.32
low | 76 | 29.34
average | 82 | 31.66
high | 80 | 30.89
very high | 15 | 5.79
--------------------------------------------------------
Total | 259 | 100.0
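A minimal Python sketch that turns the fund risk counts into the percentage column of the summary table, rounded to two decimals.

```python
risk_counts = {"very low": 6, "low": 76, "average": 82, "high": 80, "very high": 15}

total = sum(risk_counts.values())  # 259 funds
summary = {level: round(100 * count / total, 2) for level, count in risk_counts.items()}

for level, count in risk_counts.items():
    print(f"{level:<10}{count:>5}{summary[level]:>8.2f}")
print(f"{'Total':<10}{total:>5}{100:>8.2f}")
```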
* The bar chart *
Using Microsoft Excel, we can build a bar chart from the summary table.
The bar chart presents the number of funds in each risk category.
Bar Chart – used for categorical data
[Figure: Number of Calls Handled on an Average Weekday. x-axis: Shift (Morning, Day, Evening, Night); y-axis: Number of Calls (0 to 500).]
* The pie chart *
Likewise, a pie chart is built from the summary table.
A pie chart represents the percentage column of the summary table: each slice's share of the circle equals its category's percentage.
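Since each slice of a pie chart is proportional to its category's percentage, the slice angles can be sketched in Python by scaling each share to 360 degrees. The counts are the fund risk counts from the summary table; `angles` is just an illustrative name.

```python
risk_counts = {"very low": 6, "low": 76, "average": 82, "high": 80, "very high": 15}
total = sum(risk_counts.values())

# Each slice's angle is its share of the 360 degrees in a circle
angles = {level: 360 * count / total for level, count in risk_counts.items()}
for level, angle in angles.items():
    print(f"{level:<10}{angle:7.1f} degrees")
```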
[Figure: Pie chart with slices Singles, Married / No kids, Full Nest 1, Full Nest 2, Full Nest 3, Empty Nest.]
* The properties of a Pareto diagram *
1. When there are many categories, a Pareto diagram is preferred.
2. A Pareto diagram shows the most significant category first, followed by the others in decreasing order.
3. In the cumulative percentage polygon, each point is plotted at the midpoint of its category.
* The pareto diagram *
A Pareto diagram, also based on the summary table, is similar to a bar chart.
The differences are:
1. The Pareto diagram arranges the categories in descending order of frequency.
2. The Pareto diagram also includes a cumulative percentage polygon on the same graph.
3. Left axis – percentage; right axis – cumulative percentage.
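The two Pareto steps (sort in descending order, then accumulate percentages) can be sketched in Python. For lack of numeric data behind the consultant chart, this sketch reuses the fund risk counts from the summary table purely to show the mechanics; in practice a Pareto diagram is most useful for unordered categories such as causes of complaints.

```python
risk_counts = {"very low": 6, "low": 76, "average": 82, "high": 80, "very high": 15}
total = sum(risk_counts.values())

# 1. Sort the categories in descending order of frequency
ordered = sorted(risk_counts.items(), key=lambda item: item[1], reverse=True)

# 2. Percentage (left axis) and cumulative percentage (right axis)
cumulative = 0.0
pareto = []
for level, count in ordered:
    pct = 100 * count / total
    cumulative += pct
    pareto.append((level, round(pct, 2), round(cumulative, 2)))

for row in pareto:
    print(row)
```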
[Figure: Pareto Diagram of Causes of Dissatisfaction with Consultants. x-axis: Cause (Rude, Poor Knowledge, Didn't Listen, Poor Grammar, Too Formal, Too Familiar, Others); y-axis: % of Responses (0% to 100%).]
[Figure: History of Chocolate Sales. x-axis: Month (Jan-95 to Jul-99); y-axis: Sales ($'000), 0 to 120.]
Run Chart
A run chart is a good tool for illustrating one or more numerical variables over time.
A run chart makes it possible to identify trends and periodicity.
* Summary measures *
To describe characteristics of a set of data by “Numbers + words”!
* Measuring from ungrouped (raw) data *
1. Central tendency (location)
2. Variation
3. Shape
Central tendency
Most sets of data show a central point around which the data cluster.
This central point is a typical or representative value for the whole set of data.
Three measures here:
1. The arithmetic mean
2. The median
3. The mode
* The arithmetic mean *
• Easy to calculate!
• Caution: arithmetic mean is greatly affected by any extreme value or values.
• Therefore, when a data set contains extreme values, the median and mode should be reported along with the mean.
Mean for a population (N is the population size):
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
Mean for a sample (n is the sample size):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
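The sample-mean formula translates directly into Python. This sketch uses the example data from the stem-and-leaf slide and treats it as a sample; the population mean is computed the same way, with N in place of n.

```python
data = [12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10]

def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

x_bar = mean(data)  # treated here as a sample, so this is the sample mean
print(x_bar)  # 34.0
```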
The median
The median is the value in the ordered array for which 50% of the observations are smaller and the other 50% are larger.
Caution with an even number of observations: the median is then the average of the two middle values.
$$\text{Median position} = \frac{n + 1}{2} \text{ (in the ordered array)}$$
Example
1, 2, 3, 4, 5 – odd number of values: the median is the 3rd value, 3.
1, 2, 3, 4, 5, 6 – even number of values: the median is the average of the 3rd and 4th values, (3 + 4)/2 = 3.5.
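A Python sketch of the median rule, checked against the two example lists: for odd n it returns the middle value, and for even n the average of the two middle values. `median` here is a hand-written helper, not the standard-library function.

```python
def median(values):
    """Median of a list of numbers, using the (n + 1)/2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values

print(median([1, 2, 3, 4, 5]))     # 3
print(median([1, 2, 3, 4, 5, 6]))  # 3.5
```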
* Significance of median *
Whenever a set of data includes large extreme values, the median is used, because extremes seriously distort the mean.
The median is not affected by the size of extreme values in a set of data.
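To see why the median is preferred when there are extremes, here is a small Python sketch using the standard-library `statistics.mean` and `statistics.median`. The second list is hypothetical: it replaces the 5 in 1, 2, 3, 4, 5 with an extreme value of 50.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 50]  # hypothetical: largest value replaced by an extreme

print(mean(data), median(data))
print(mean(with_extreme), median(with_extreme))
```

The mean jumps from 3 to 12, while the median stays at 3.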
The mode
Easy~! No calculation at all~!
Definition: the value in a set of data that appears most frequently!
Caution: when reporting the mode, note that a set of data may:
1. have a single mode;
2. have no mode;
3. be bimodal or multimodal.
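A Python sketch that reports the mode while handling all three cases. It uses `collections.Counter` and assumes the common convention that a data set has no mode when every value occurs exactly once; `modes` is an illustrative helper.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:  # every value occurs once: no mode (assumed convention)
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3, 4]))     # [2]     one mode
print(modes([1, 2, 3, 4, 5]))     # []      no mode
print(modes([1, 1, 2, 3, 3, 4]))  # [1, 3]  bimodal
```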
Midrange
Numerical data only!
$$\text{Midrange} = \frac{X_{\text{largest}} + X_{\text{smallest}}}{2}$$
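A one-line Python sketch of the midrange, using the example data from the stem-and-leaf slide (smallest 10, largest 89).

```python
data = [12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10]

midrange = (max(data) + min(data)) / 2  # (X_largest + X_smallest) / 2
print(midrange)  # 49.5
```

Because it depends only on the two extremes, the midrange (49.5) is far from the mean of this data (34).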
* Dispersion or spread *
1. Range
2. Variance
3. Standard deviation
Dispersion, or spread
After noting the central value (central tendency), we should pay attention to how, and by how much, a set of data spreads around it.
The amount of variation measures the dispersion (spread) of a set of data.
Five measures of variation (dispersion) are commonly used: range, interquartile range, variance, standard deviation and coefficient of variation.
Range
Range = largest value – smallest value
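A minimal Python sketch of the range. The two hypothetical data sets have the same range even though their values are spread quite differently, which is why the range is only a measure of total spread.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

a = [1, 5, 5, 5, 9]  # hypothetical data, clustered in the middle
b = [1, 1, 5, 9, 9]  # hypothetical data, pushed out to the ends

print(value_range(a), value_range(b))  # 8 8
```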
Variance and standard deviation
• Range is a measure of the total spread only.
• Variance and standard deviation, by contrast, take into account how all the values of the data are distributed around the mean.
Variance
• The variance is roughly the average of the squared differences between each of the values in a set of data and the mean.
For a population
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
For a sample
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
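The two variance formulas can be sketched in Python as follows: divisor N for a population, n - 1 for a sample. The data are two hypothetical sets that share the same range (8) yet have clearly different variances.

```python
def population_variance(values):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """S^2: sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

a = [1, 5, 5, 5, 9]  # hypothetical data, range 8
b = [1, 1, 5, 9, 9]  # hypothetical data, range 8

print(sample_variance(a), sample_variance(b))  # 8.0 16.0
```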
* Standard deviation *
Standard deviation is the square root of the variance.
As a rule of thumb, for roughly bell-shaped data about 95% of the values lie within two standard deviations of the mean.
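A Python sketch of the sample standard deviation, together with a helper that checks the two-standard-deviation rule of thumb on actual data. It uses the example data from the stem-and-leaf slide; `sample_std` and `share_within` are illustrative names.

```python
import math

def sample_std(values):
    """S: square root of the sample variance (divisor n - 1)."""
    x_bar = sum(values) / len(values)
    variance = sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)
    return math.sqrt(variance)

def share_within(values, k=2):
    """Fraction of values lying within k standard deviations of the mean."""
    x_bar = sum(values) / len(values)
    s = sample_std(values)
    return sum(1 for x in values if abs(x - x_bar) <= k * s) / len(values)

data = [12, 45, 67, 26, 89, 56, 13, 15, 44, 36, 32, 20, 11, 10]
print(round(sample_std(data), 2), share_within(data))
```

For this small data set, 13 of the 14 values (about 93%) lie within two standard deviations of the mean; the rule of thumb is only approximate and works best for roughly bell-shaped data.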
This is the end of today’s lecture!