Chapter 18
Comparing categorical data
Contents:
A Categorical data
B Examining categorical data
C Comparing and reporting categorical data
D Data collection
E Misleading graphs
IB MYP_3
Y:\HAESE\IB_MYP3\IB_MYP3_18\369IB_MYP3_18.CDR Wednesday, 28 May 2008 11:52:44 AM PETER
HISTORICAL NOTE: FLORENCE NIGHTINGALE
Florence Nightingale (1820 - 1910) was born into an upper class English family. Her
father believed that women should have an
education, and she learnt Italian, Latin, Greek
and history, and had an excellent early preparation in
mathematics.
She served as a nurse during the Crimean War, and became
known as "the lady with the lamp". During this time she
collected data and kept systematic records.
After the war she came to believe that most of the soldiers
in hospital were killed by insanitary living conditions rather
than dying from their wounds.
She wrote detailed statistical reports and represented her statistical data graphically.
She demonstrated that statistics provided an organised way of learning and this led to
improvements in medical and surgical practices.
OPENING PROBLEM
A construction company is building a new high-rise apartment building in
Tokyo. It will be 24 floors high with 8 apartments on each floor.
The company needs to know some information about the people who will
be buying the apartments. They prepare a form which is published in all
local papers and on-line:
HANAKO CONSTRUCTIONS NEW APARTMENTS
¥70 to 400 million
Please respond only if you have some interest in owning your own residence in this prestigious new block.
Name: ..........
Current address: ..........
Phone number: ..........
Marital status: married / single
Age group: 18 to 35 / 36 to 59 / 60+
Desired number of bedrooms: 1 / 2 / 3
The statistical officer receives 272 responses and these are typed in coded form.
Marital status: Married (M), Single (S)
Age group: 18 to 35 (Y), 36 to 59 (I), 60+ (O)
Apartment size: 1, 2 or 3 bedrooms
The results are:
MY1 MI3 MI2 MO2 MY2 MO2 MO2 MY2 MO2 MI2
SY1 MO2 MY1 MI3 MO2 SO1 MI3 SO2 MO2 MO2
MI3 SO3 SO2 MI3 MI1 MO3 SI3 MO2 SO2 SO1
SO1 MI3 MO2 SO1 SY2 MO1 MY1 MI2 MO1 MO1
MO2 SO1 SO2 MI3 MO1 MI3 SI1 SI2 MO2 MO1
SO1 MO2 MI3 MI3 MO1 MI2 MO2 MO2 MO1 MO1
MO2 MI3 SY2 MO3 MO1 MI3 MI3 MI3 MO1 SO3
SO1 MO2 SI2 SO1 MO3 MI3 SI2 MO1 MI3 MO1
MO2 MO1 MI3 MY2 MY3 MI3 MI1 MY1 SY2 MI3
SO1 MY2 MI3 MO1 SI3 SI1 SY3 MO1 MO1 SO1
MY1 MI3 MI3 MI3 MY2 MO3 MO2 SO2 MI3 MO1
MO1 MI1 SI2 MO3 MI1 MI3 MI3 MY3 MO2 MO1
MO2 MY2 SO2 MY2 SO1 SI2 SO3 MO3 MI3 MI3
SO2 MI3 MI3 SO1 MY2 MI3 SY2 MO1 MI2 MI3
SO1 SO2 MI3 MO3 SO2 SY1 SO2 SI1 MY2 SI1
MI2 MI3 MI3 MY2 MY2 MI3 MO2 MO3 MO1 MI3
MO1 SO1 MO1 MO2 MO2 SO2 MI3 SO1 MI3 SI1
MI2 MY2 MI3 SI1 MI3 MO2 MI3 MI3 MO1 MO2
MI3 SI1 MI3 MI3 SY2 SO2 MO1 SI2 SO2 SO1
SO1 MI2 MO2 MO2 MO1 MI3 MI3 MI3 MO3 MO2
MI2 MI3 MO1 MI3 SO1 SO2 SI2 SO1 SI2
SO1 MI3 MI3 MO3 MO2 MY1 MO2 MI3 MO3
MI1 SY2 MO3 SO1 MY2 SI2 MI2 MI3 SI1
MO1 MO2 MO3 MI3 MO1 SO1 MI2 MI3 MO2
MI3 MI3 MI3 SO1 MI3 MI3 SY2 SI3 MO2
MI1 SO1 MI3 MY2 SY3 MI3 MI2 SO2 MO2
SO1 MI3 MI3 MY1 MI1 MO2 MY1 MI2 MO3
MI1 MI3 MI3 SI1 MO3 MO1 SI1 SO1 SI1
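One way to start making sense of coded responses like these is to split each code into its three characters and tally each variable separately. The sketch below is only an illustration: it uses the first two rows of the results rather than all 272 responses, and the names `MARITAL`, `AGE` and the three `*_counts` variables are assumptions made for this example.

```python
from collections import Counter

# A small sample of the coded responses: the first two rows of the results.
codes = ("MY1 MI3 MI2 MO2 MY2 MO2 MO2 MY2 MO2 MI2 "
         "SY1 MO2 MY1 MI3 MO2 SO1 MI3 SO2 MO2 MO2").split()

# Meaning of each letter, taken from the coding scheme on the form.
MARITAL = {"M": "married", "S": "single"}
AGE = {"Y": "18 to 35", "I": "36 to 59", "O": "60+"}

# Each code has three characters: marital status, age group, bedrooms.
marital_counts = Counter(MARITAL[code[0]] for code in codes)
age_counts = Counter(AGE[code[1]] for code in codes)
bedroom_counts = Counter(int(code[2]) for code in codes)

print("Marital status:", dict(marital_counts))
print("Age group:", dict(age_counts))
print("Bedrooms:", dict(bedroom_counts))
```

The same tallying could be applied to the full list of responses to build a frequency table for each variable.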
Things to think about:
What problems is the construction company trying to solve?
Is the company's investigation a census or a survey?
What are the variables?
Are the variables categorical or quantitative?
What are the categories of the categorical variables?
Can you explain why the construction company is interested in these categories?
Is the data being collected in an unbiased way?
Why were the names, addresses and phone numbers of respondents asked for?
Can you make sense of the data in its present form?
How could you reorganise the data so that it can be summarised and displayed?
What methods of display are appropriate here?
Can you make a conclusion regarding the data and write a report of your findings?
A CATEGORICAL DATA
Statistics is the art of solving problems and answering questions by collecting
and analysing data.
The facts or pieces of information we collect are called data.
One piece of information is known as a datum (singular), whereas lots of pieces
of information are known as data (plural).
A list of information is called a data set. If it is not in organised form it is called raw data.
VARIABLES
There are two types of variables that we commonly deal with:
A categorical variable describes a particular quality or characteristic. The data is divided into categories, and the information collected is called categorical data.
Examples of categorical variables are:
Getting to school: the categories could be train, bus, car and walking.
Colour of eyes: the categories could be blue, brown, hazel, green, grey.
A quantitative variable has a numerical value and is often called a numerical variable. The information collected is called numerical data.
Quantitative variables can be either discrete or continuous.
A quantitative discrete variable takes exact number values and is often a result of
counting.
Examples of quantitative discrete variables are:
The number of people in a household: the variable could take the values 1, 2, 3, ...
The score out of 30 for a test: the variable could take the values 0, 1, 2, 3, ..., 30.
A quantitative continuous variable takes numerical values within a certain
continuous range. It is usually a result of measuring.
Examples of quantitative continuous variables are:
The weight of newborn babies: the variable could take any positive value on the number line but is likely to be in the range 0.5 kg to 8 kg.
The heights of 14 year old students: the variable would be measured in centimetres. A student whose height is recorded as 145 cm could have exact height anywhere between 144.5 cm and 145.5 cm.
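The idea that a recorded measurement stands for a range of exact values can be sketched in a few lines. The function name `exact_range` and its `precision` parameter are assumptions made for this illustration. The sketch assumes the measurement was rounded to the nearest whole unit unless told otherwise.

```python
def exact_range(recorded, precision=1.0):
    """Return the (lower, upper) bounds of the exact value behind a
    measurement that was rounded to the nearest `precision`."""
    half = precision / 2
    return recorded - half, recorded + half

# A height recorded as 145 cm, measured to the nearest centimetre.
low, high = exact_range(145)
print(f"Recorded 145 cm: exact height is between {low} cm and {high} cm")
```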
CENSUS OR SAMPLE
The two methods of data collection are census and sample.
A census involves collecting data about every individual in a whole population.
The individuals in a population may be people or objects. A census is detailed and accurate
but is expensive, time consuming, and often impractical.
A sample involves collecting data about a part of the population only.
A sample is cheaper and quicker than a census but
is not as detailed or as accurate. Conclusions drawn
from samples always involve some error.
A sample must truly reflect the characteristics of the
whole population. To ensure this it must be unbiased
and large enough.
Just how large a sample needs to be is discussed in
future courses.
In a biased sample, the data has been unfairly influenced by the collection process.
It is not truly representative of the whole population.
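The difference between a census and a sample can be seen in a small simulation. The population below is entirely hypothetical (1000 students and how they get to school), and choosing individuals at random is just one simple way of trying to avoid bias. It is a sketch of the idea, not a recommended survey design.

```python
import random
from collections import Counter

random.seed(18)  # fixed seed so the sketch gives the same sample each run

# Hypothetical population of 1000 students and how each gets to school.
population = (["bus"] * 400 + ["car"] * 300
              + ["walking"] * 200 + ["train"] * 100)

# Census: every individual is counted, so the proportion is exact.
census_share = Counter(population)["bus"] / len(population)

# Sample: 100 individuals, each equally likely to be chosen.
sample = random.sample(population, 100)
sample_share = Counter(sample)["bus"] / len(sample)

print(f"Census: {census_share:.2f} of students take the bus")
print(f"Sample: {sample_share:.2f} of students take the bus")
```

The sample proportion will usually be close to the census proportion but rarely equal to it, which is the error that conclusions drawn from samples involve.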
STATISTICAL GRAPHS
Two variables under consideration are usually linked by one being dependent on the other.
For example: The total cost of a dinner depends on the number of guests present.
The total cost of a dinner is the dependent variable.
The number of guests present is the independent variable.
When drawing graphs involving two variables,
the independent variable is usually placed on the
horizontal axis and the dependent variable is
placed on the vertical axis. An exception to this
is when we draw a horizontal bar chart.
Acceptable graphs which display categorical data are:
Vertical column graph
Horizontal bar chart
Pie chart
Segment bar chart
The mode of a set of categorical data is the category which occurs most frequently.
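The mode of a categorical data set can be found by counting how often each category occurs. The sketch below uses made-up eye colour data for 12 students and prints a rough text version of a horizontal bar chart. The variable names are assumptions chosen for this example.

```python
from collections import Counter

# Hypothetical eye colours recorded for 12 students (illustrative data only).
eye_colours = ["brown", "blue", "brown", "hazel", "green", "brown",
               "blue", "grey", "brown", "blue", "hazel", "brown"]

# Frequency of each category.
frequencies = Counter(eye_colours)

# The mode is the category with the highest frequency.
mode_category, mode_count = frequencies.most_common(1)[0]

# Text version of a horizontal bar chart: one '#' per observation.
for category, count in frequencies.most_common():
    print(f"{category:<6} {'#' * count} {count}")

print("Mode:", mode_category)
```

If two categories tie for the highest frequency, `most_common(1)` reports only one of them, so a tie would need to be checked separately.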