Transcript
Page 1: Comparing categorical data - Haese Mathematics categorical data Chapter18 Contents: A Categorical data B Examining categorical data C Comparing and reporting categorical data D Data

Chapter 18

Comparing categorical data

Contents:

A Categorical data

B Examining categorical data

C Comparing and reporting categorical data

D Data collection

E Misleading graphs

IB MYP_3


Y:\HAESE\IB_MYP3\IB_MYP3_18\369IB_MYP3_18.CDR Wednesday, 28 May 2008 11:52:44 AM PETER


HISTORICAL NOTE: FLORENCE NIGHTINGALE

Florence Nightingale (1820 - 1910) was born into an upper class English family. Her father believed that women should have an education, and she learnt Italian, Latin, Greek and history, and had an excellent early preparation in mathematics.

She served as a nurse during the Crimean War, and became known as ‘the lady with the lamp’. During this time she collected data and kept systematic records.

After the war she came to believe that most of the soldiers in hospital were killed by insanitary living conditions rather than dying from their wounds.

She wrote detailed statistical reports and represented her statistical data graphically. She demonstrated that statistics provided an organised way of learning and this led to improvements in medical and surgical practices.

OPENING PROBLEM

A construction company is building a new high-rise apartment building in Tokyo. It will be 24 floors high with 8 apartments on each floor.

The company needs to know some information about the people who will be buying the apartments. They prepare a form which is published in all local papers and on-line:

HANAKO CONSTRUCTIONS

NEW APARTMENTS: ¥70 to 400 million

Please respond only if you have some interest in owning your own residence in this prestigious new block.

Name: ....................

Current address: ....................

Phone number: ....................

Marital status: [ ] married [ ] single

Age group: [ ] 18 to 35 [ ] 36 to 59 [ ] 60+

Desired number of bedrooms: [ ] 1 [ ] 2 [ ] 3

The statistical officer receives 272 responses and these are typed in coded form.

Marital status: Married (M), Single (S)

Age group: 18 to 35 (Y), 36 to 59 (I), 60+ (O)

Apartment size: 1, 2 or 3 bedrooms


The results are:

MY1 MI3 MI2 MO2 MY2 MO2 MO2 MY2 MO2 MI2

SY1 MO2 MY1 MI3 MO2 SO1 MI3 SO2 MO2 MO2

MI3 SO3 SO2 MI3 MI1 MO3 SI3 MO2 SO2 SO1

SO1 MI3 MO2 SO1 SY2 MO1 MY1 MI2 MO1 MO1

MO2 SO1 SO2 MI3 MO1 MI3 SI1 SI2 MO2 MO1

SO1 MO2 MI3 MI3 MO1 MI2 MO2 MO2 MO1 MO1

MO2 MI3 SY2 MO3 MO1 MI3 MI3 MI3 MO1 SO3

SO1 MO2 SI2 SO1 MO3 MI3 SI2 MO1 MI3 MO1

MO2 MO1 MI3 MY2 MY3 MI3 MI1 MY1 SY2 MI3

SO1 MY2 MI3 MO1 SI3 SI1 SY3 MO1 MO1 SO1

MY1 MI3 MI3 MI3 MY2 MO3 MO2 SO2 MI3 MO1

MO1 MI1 SI2 MO3 MI1 MI3 MI3 MY3 MO2 MO1

MO2 MY2 SO2 MY2 SO1 SI2 SO3 MO3 MI3 MI3

SO2 MI3 MI3 SO1 MY2 MI3 SY2 MO1 MI2 MI3

SO1 SO2 MI3 MO3 SO2 SY1 SO2 SI1 MY2 SI1

MI2 MI3 MI3 MY2 MY2 MI3 MO2 MO3 MO1 MI3

MO1 SO1 MO1 MO2 MO2 SO2 MI3 SO1 MI3 SI1

MI2 MY2 MI3 SI1 MI3 MO2 MI3 MI3 MO1 MO2

MI3 SI1 MI3 MI3 SY2 SO2 MO1 SI2 SO2 SO1

SO1 MI2 MO2 MO2 MO1 MI3 MI3 MI3 MO3 MO2

MI2 MI3 MO1 MI3 SO1 SO2 SI2 SO1 SI2

SO1 MI3 MI3 MO3 MO2 MY1 MO2 MI3 MO3

MI1 SY2 MO3 SO1 MY2 SI2 MI2 MI3 SI1

MO1 MO2 MO3 MI3 MO1 SO1 MI2 MI3 MO2

MI3 MI3 MI3 SO1 MI3 MI3 SY2 SI3 MO2

MI1 SO1 MI3 MY2 SY3 MI3 MI2 SO2 MO2

SO1 MI3 MI3 MY1 MI1 MO2 MY1 MI2 MO3

MI1 MI3 MI3 SI1 MO3 MO1 SI1 SO1 SI1

Things to think about:

• What problems are the construction company trying to solve?

• Is the company’s investigation a census or a survey?

• What are the variables?

• Are the variables categorical or quantitative?

• What are the categories of the categorical variables?

• Can you explain why the construction company is interested in these categories?

• Is the data being collected in an unbiased way?

• Why were the names, addresses and phone numbers of respondents asked for?

• Can you make sense of the data in its present form?

• How could you reorganise the data so that it can be summarised and displayed?

• What methods of display are appropriate here?

• Can you make a conclusion regarding the data and write a report of your findings?


A CATEGORICAL DATA

Statistics is the art of solving problems and answering questions by collecting and analysing data.

The facts or pieces of information we collect are called data.

One piece of information is known as a datum (singular), whereas lots of pieces of information are known as data (plural).

A list of information is called a data set. If it is not in organised form it is called raw data.

VARIABLES

There are two types of variables that we commonly deal with:

• A categorical variable describes a particular quality or characteristic. The data is divided into categories, and the information collected is called categorical data.

Examples of categorical variables are:

Getting to school: the categories could be train, bus, car and walking.

Colour of eyes: the categories could be blue, brown, hazel, green, grey.

• A quantitative variable has a numerical value and is often called a numerical variable. The information collected is called numerical data.

Quantitative variables can be either discrete or continuous.

A quantitative discrete variable takes exact number values and is often a result of counting.

Examples of discrete quantitative variables are:

The number of people in a household: the variable could take the values 1, 2, 3, .....

The score out of 30 for a test: the variable could take the values 0, 1, 2, 3, ......, 30.

A quantitative continuous variable takes numerical values within a certain continuous range. It is usually a result of measuring.

Examples of quantitative continuous variables are:

The weight of newborn babies: the variable could take any positive value on the number line but is likely to be in the range 0.5 kg to 8 kg.

The heights of 14 year old students: the variable would be measured in centimetres. A student whose height is recorded as 145 cm could have exact height anywhere between 144.5 cm and 145.5 cm.


CENSUS OR SAMPLE

The two methods of data collection are by census or sample.

A census involves collecting data about every individual in a whole population.

The individuals in a population may be people or objects. A census is detailed and accurate

but is expensive, time consuming, and often impractical.

A sample involves collecting data about a part of the population only.

A sample is cheaper and quicker than a census but

is not as detailed or as accurate. Conclusions drawn

from samples always involve some error.

A sample must truly reflect the characteristics of the

whole population. To ensure this it must be unbiased

and large enough.

Just how large a sample needs to be is discussed in

future courses.

In a biased sample, the data has been unfairly influenced by the collection process.

It is not truly representative of the whole population.

STATISTICAL GRAPHS

Two variables under consideration are usually linked by one being dependent on the other.

For example: The total cost of a dinner depends on the number of guests present.

The total cost of a dinner is the dependent variable.

The number of guests present is the independent variable.

When drawing graphs involving two variables,

the independent variable is usually placed on the

horizontal axis and the dependent variable is

placed on the vertical axis. An exception to this

is when we draw a horizontal bar chart.

Acceptable graphs which display categorical data are:

• vertical column graph

• horizontal bar chart

• pie chart

• segment bar chart

[Figures: a pair of axes with the dependent variable on the vertical axis and the independent variable on the horizontal axis, and one example of each of the four graph types]

The mode of a set of categorical data is the category which occurs most frequently.
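As a small illustration of the mode, the sketch below counts how often each category occurs and picks the most frequent one. The eye colour responses and the function name `mode_of` are made up for illustration and are not data from this chapter.

```python
from collections import Counter

# Hypothetical responses to "What colour are your eyes?" (illustrative only).
responses = ["brown", "blue", "brown", "hazel", "green", "brown", "blue", "grey"]

def mode_of(categories):
    """Return the category that occurs most frequently."""
    counts = Counter(categories)
    category, _frequency = counts.most_common(1)[0]
    return category

print(mode_of(responses))
```

If two categories tie for the highest frequency, this sketch reports only one of them, so a tie should be checked by looking at the full tally.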


THE STATISTICAL METHOD

The process of statistical enquiry or investigation includes the following steps:

Step 1: Examine the problem which may be solved using data. Pose the correct

questions.

Step 2: Collect unbiased data.

Step 3: Organise the data.

Step 4: Summarise and display the data.

Step 5: Analyse the data and make a conclusion in the form of a conjecture.

Step 6: Write a report.

GRAPHING USING A COMPUTER PACKAGE

Click on the icon to obtain a computer package which can be used to draw:

• graphs of a single set of categorical data using:

  - a vertical column graph
  - a horizontal bar chart
  - a pie chart
  - a segment bar chart

• comparative graphs of categorical data using:

  - a side-by-side column graph
  - a back-to-back bar chart.

EXERCISE 18A

1 Classify the following variables as either categorical or numerical:

a the number of text messages you send in a day

b the places where you access the internet

c the brands of breakfast cereal

d the heights of students in your class

e the daily maximum temperature for your city

f the number of road fatalities each day

g the breeds of horses

h the number of hours you sleep each night.

2 Write down possible categories for the following categorical variables:

a brands of cars b methods of transport

c types of instruments in a band d methods of advertising.

3 For each of the following possible investigations, classify the variable as categorical,

quantitative discrete, or quantitative continuous:

a the types of flowers available from a florist


b the numbers on playing cards in a pack of cards

c the heights of trees that were planted one year ago

d the masses of oranges in a 5 kg bag

e the times for runners in a 400 metre race

f the number of oranges in the 5 kg bags at a

supermarket

g the varieties of peaches

h the amount of rain each day for a month

i the speeds of cars passing through an intersection

j the types of fiction

k the pulse rates of horses at the end of a race

l the weekly cost of groceries for your family

m the number of passengers for a taxi driver each day for a month

n the number of students absent from school each day for a term.

4 State whether a census or a sample would be used for these investigations:

a the country of origin of the parents of students in your class

b the number of people in your country who are concerned about global warming

c people’s opinions about the public transport system in your capital city

d the favourite desserts in your local restaurant

e the most popular candidate for the next election in your state or county.

5 Comment on any possible bias in the following situations:

a Members of a dog club are asked if dogs make the best pets.

b School students are asked about the benefits of homework.

c Commuters at peak hour are asked about crowding in buses.

6 Guests of a hotel in Paris were asked which country they lived in. The results are shown in the vertical column graph.

[Vertical column graph: frequency (scale 0 to 10) against country (Canada, England, France, Germany, Spain, United States, Australia)]

a What are the variables in this investigation?

b What is the dependent variable?

c What is the sample size?

d Find the mode of the data.

e Construct a pie chart for the data. If possible, use a spreadsheet.


7 Fifty households of one street were asked which brand of television they owned. The data alongside was collected.

TV Brand   Frequency
A          9
B          4
C          12
D          8
E          7
F          10

a What are the dependent and independent variables in this investigation?

b If we are trying to determine the buying patterns of a whole city, is the sample unbiased? Explain your answer.

c Find the mode of the data.

d Construct a horizontal bar chart for the data.

8 Find the sector angle of a pie chart if the frequency of the category is:

a 23 in a sample of 180 b 128 in a sample of 720 c 238 in a sample of 1250.

9 A sample of many people was taken, asking them their favourite fruit. On a pie chart a sector angle of 68° represented 277 people whose favourite fruit was an orange.

Find, to the nearest 10, the size of the sample used.

B EXAMINING CATEGORICAL DATA

For the Opening Problem data, the statistical officer first extracts the data for single people responding to the survey. Her findings are:

SY1   2     SI1  11     SO1  26     39
SY2   7     SI2   9     SO2  15     31
SY3   2     SI3   3     SO3   3      8
     11          23          44     78

In order to make a report to the construction company, she displays this data in the form of a two way table:

Number of     Age of single respondent
bedrooms      18 to 35   36 to 59   60+   Totals
1             2          11         26    39
2             7          9          15    31
3             2          3          3     8
Totals        11         23         44    78

She then uses a spreadsheet to create a series of graphs. Here are two of them:

[Side-by-side column graph "Housing by age group": age groups 18 to 35, 36 to 59 and 60+ on the horizontal axis; columns for 1 bedroom, 2 bedrooms and 3 bedrooms; vertical scale 0 to 30]
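The row and column totals of the singles' two way table can be checked with a short Python sketch. The counts are copied from the statistical officer's findings; the dictionary layout and the function name `two_way_totals` are assumptions made for this illustration, not part of the textbook's spreadsheet.

```python
# Counts for single respondents, copied from the statistical officer's findings.
# Keys are (age code, bedrooms): Y = 18 to 35, I = 36 to 59, O = 60+.
single_counts = {
    ("Y", 1): 2, ("I", 1): 11, ("O", 1): 26,
    ("Y", 2): 7, ("I", 2): 9,  ("O", 2): 15,
    ("Y", 3): 2, ("I", 3): 3,  ("O", 3): 3,
}

def two_way_totals(counts):
    """Return (row totals by bedrooms, column totals by age code, grand total)."""
    row_totals, col_totals = {}, {}
    for (age, bedrooms), n in counts.items():
        row_totals[bedrooms] = row_totals.get(bedrooms, 0) + n
        col_totals[age] = col_totals.get(age, 0) + n
    return row_totals, col_totals, sum(counts.values())

rows, cols, grand = two_way_totals(single_counts)
print(rows)   # totals for 1, 2 and 3 bedrooms
print(cols)   # totals for each age group
print(grand)  # total number of single respondents
```

The printed totals should agree with the Totals row and column of the table: the row totals and the column totals each add up to the same grand total.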


If we transpose the data by interchanging the rows and columns, we also get an interesting

comparison.

From the table of counts and from graphs, various questions can be answered. In many cases

tables containing percentages may be more appropriate to use.

Example 1

A survey was conducted to determine willingness to be an organ donor. The results are shown alongside:

Organ donor   Marital status
              Single   Married   Totals
Yes           63       79
No            25       27
Totals

a Complete the table to find the total of each row and column.

b How many people surveyed were married but not willing to be an organ donor?

c What percentage of single people surveyed were willing to be organ donors?

d What percentage of people surveyed were married?

a
Organ donor   Marital status
              Single   Married   Totals
Yes           63       79        142
No            25       27        52
Totals        88       106       194

b From the table, 27 people were married but were not willing to be an organ donor.

c Percentage of single people who were willing to be organ donors = 63/88 × 100% ≈ 71.6%

d Percentage of people surveyed who were married = 106/194 × 100% ≈ 54.6%
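The two percentage calculations in Example 1 can be reproduced with a few lines of Python. This is only a sketch: the helper name `percentage` and the variable names are assumptions for illustration, and the counts are taken from the completed organ donor table.

```python
def percentage(part, whole):
    """Return part as a percentage of whole, rounded to 1 decimal place."""
    return round(part / whole * 100, 1)

# Counts from the completed organ donor table in Example 1.
single_yes, single_total = 63, 88
married_total, grand_total = 106, 194

print(percentage(single_yes, single_total))    # part c
print(percentage(married_total, grand_total))  # part d
```

Note that the denominator changes with the question: part c divides by the number of single people only, while part d divides by everyone surveyed.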

[Side-by-side column graph "Housing by dwelling type" for the transposed singles' data: 1 bedroom, 2 bedrooms and 3 bedrooms on the horizontal axis; columns for 18 to 35, 36 to 59 and 60+; vertical scale 0 to 30]

Spreadsheet tip: Find out how to transpose your table without having to retype the data into new cells.

EXERCISE 18B

1 Residents of a suburb were sent a survey in the mail. It asked them to indicate their gender and whether they would prefer a tennis court or a basketball court built in their suburb. The results are given alongside.

Preference    Gender
              Male   Female   Totals
Basketball    21     20
Tennis        9      35
Totals

a Complete the table to find the total of each row and column.


b Did more males or females respond to the survey?

c What percentage of people responding to the survey preferred a basketball court?

d What percentage of people responding to the survey who preferred a tennis court

were female?

2 To determine where the need for public transport is greatest, residents of a city were asked to indicate whether they lived to the north or south of the city centre, and the method of transport they used to go to work.

Transport     Position
method        North   South   Totals
Public        85      64      149
Car           71      57      128
Totals        156     121     277

a What percentage of people living north of the city centre take public transport to work?

b What percentage of people surveyed live south of the city centre and drive their car to work?

c Is the percentage of people using public transport greater to the north or the south of the city centre?

3 A survey was conducted in 1997, and another in 2007, to investigate how many people had a computer in their home. The results are given in the table alongside:

Home          Year
computer      1997   2007   Totals
Yes           113    281
No            159    23
Totals

a Complete the table to find the total of each row and column.

b What percentage of people surveyed in 1997 had a computer in their home?

c What percentage of people surveyed in 2007 did not have a computer in their home?

d Find the increase in computer ownership percentage from 1997 to 2007, according to the survey.

4 A country town in England is organising a food festival. Residents interested in attending were asked to indicate their age group and whether they preferred Italian or Greek food.

Food          Age group
preference    Under 30   Over 30   Totals
Italian       29         38        67
Greek         55         24        79
Totals        84         62        146

a How many people responding to the survey expressed a preference for Greek food?

b What percentage of people who indicated they preferred Italian food were under 30?

c What percentage of people over 30 preferred Greek food?

d Which age group showed the most interest in the festival, and which type of food did that group prefer?

5 For the Singles’ data from the Opening Problem:

a create your own tally on a spreadsheet and obtain the side-by-side column graph

b transpose the data on the spreadsheet and draw the new side-by-side column graph

c answer the following questions:

i How many singles responded to the survey?

ii What percentage of the total respondents were single?


iii Which singles’ age group showed the most interest in the new development and what form of housing interested them most?

iv What percentage of the 36 to 59 singles age group were interested in 3 bedroom apartments?

v What percentage of the single respondents were interested in buying a 2 bedroom apartment?

d True or false?

i The 18 to 35 singles group has shown little interest in the new apartments.

ii There is much interest amongst the single respondents in one bedroom apartments.

iii The 60+ singles age group has shown most interest in the new apartments and the vast majority of them want them with 2 or 3 bedrooms.

INVESTIGATION: THE OPENING PROBLEM

Your task now is to organise the data from the married respondents for the Opening Problem. You could do this without using a spreadsheet and simply count from the raw data originally given. However, you could use the spreadsheet found on your CD. Click on the icon to find it.

It contains all 272 responses so the singles data should first be eliminated.

What to do:

1 Follow these steps to analyse the apartment size by age group:

Step 1: Open the spreadsheet by clicking on the icon.

Step 2: Enter the formula =COUNTIF($A:$A,"M"&D$3&$C4) into cell D4. In this formula, COUNTIF counts the number of times "MY1" appears in the data, $A:$A is the data in column A, and "M"&D$3&$C4 constructs "MY1" from the table row and column headings.

Step 3: Fill the formula in D4 down and across to cell F6.

Step 4: Highlight the cell range D4:G7 and click the sum button on the toolbar.

Step 5: Highlight the cell range C3:F6 and click on the Chart Wizard button on the toolbar. Choose the first type of column graph and click the Finish button. A column graph of your tabulated data should appear.


2 The next task is to suitably graph the married respondents’ data so that it can be compared:

• in categories

• with the singles’ data to find similarities and differences.

Produce similar graphs for the singles’ data.

3 Construct your own report for the construction company. It should include:

• tabulation of the data in summarised form

• graphical representation of the data

• discussion and conclusion.

4 What conclusions can you draw from the data?
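The counting done by the COUNTIF formula in Step 2 of the investigation can also be sketched in Python. The example uses only the first row of coded responses from the Opening Problem, and the function name `tally_married` is an assumption for illustration; it is not the spreadsheet supplied with the book.

```python
from collections import Counter

# The first row of coded responses from the Opening Problem.
codes = "MY1 MI3 MI2 MO2 MY2 MO2 MO2 MY2 MO2 MI2".split()

def tally_married(codes):
    """Count married responses by (age code, bedrooms), like the COUNTIF table."""
    married = [c for c in codes if c.startswith("M")]
    return Counter((c[1], int(c[2])) for c in married)

table = tally_married(codes)
for bedrooms in (1, 2, 3):
    # One row per apartment size, one entry per age code Y, I, O.
    print(bedrooms, [table[(age, bedrooms)] for age in "YIO"])
```

Filtering on the first letter plays the role of eliminating the singles data, and a `Counter` returns 0 for any combination that never appears, just as COUNTIF does.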

C COMPARING AND REPORTING CATEGORICAL DATA

DISPLAY

To display categorical data sets for comparison we could use:

• a side-by-side column graph

• a back-to-back bar chart.

[Figures: a side-by-side column graph (frequency, scale 0 to 12, against values A to G) and a back-to-back bar chart (values A to G against frequency)]

Example 2

In order to compare the popularity of car colours in London and Paris, a sample of 120 people from each city were asked their favourite car colour.

City      Colour
          Red   Blue   Black   White
London    35    26     38      21
Paris     27    19     34      40

a Draw a side-by-side column graph comparing the data from London and Paris.

b Which colour is most popular in London?

c In which city are blue cars more popular?

d Which colour shows the largest difference in popularity between the two cities?


a See the side-by-side column graph titled "Car colours".

b Black is the most popular colour in London, with 38 people preferring black.

c From the graph, blue cars are more popular in London.

d The largest difference in popularity occurs in white cars, with 21 people in London compared to 40 people in Paris.
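The answers to parts b and d can be checked directly from the table. The sketch below is one way to do it in Python; the dictionary names are assumptions for illustration, and the counts are the ones given in Example 2.

```python
# Favourite car colour counts from Example 2 (120 people surveyed in each city).
london = {"red": 35, "blue": 26, "black": 38, "white": 21}
paris = {"red": 27, "blue": 19, "black": 34, "white": 40}

# Part b: the colour with the highest count in London.
most_popular_london = max(london, key=london.get)

# Part d: the size of the gap between the two cities for each colour.
differences = {colour: abs(london[colour] - paris[colour]) for colour in london}
largest_gap = max(differences, key=differences.get)

print(most_popular_london)
print(differences)
print(largest_gap)
```

Comparing raw counts is reasonable here because both samples have the same size; with unequal samples, percentages would be the fairer comparison.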

EXERCISE 18C

1 100 boys and girls were asked to indicate their favourite subjects. The results were:

Gender    Subject
          Maths   Science   Geography   Art
Boys      30      26        21          22
Girls     20      15        40          25

a Draw a back-to-back bar chart comparing the data for boys and girls.

b What is the most popular subject for girls?

c Do boys or girls show less variation in their subject preferences?

d In which subject were the boys’ and girls’ preferences closest?

2 People from two different age groups were surveyed about their favourite type of movie.

Age         Movie type
            Action   Comedy   Drama   Horror
Under 30    38       42       29      31
Over 30     17       21       25      7

a Draw a side-by-side column graph comparing movie preferences of the under 30s and over 30s.

b Is this a sensible way to compare the groups? Give a reason for your answer.

c Draw a side-by-side column graph again, this time using relative frequencies. Is this

a more sensible way of comparing the groups?

d Which age group likes drama movies more?

e Which type of movie is preferred equally by each age group?

3 In the lead up to an election, 100 people from the electorates of Arton and Burnley were

polled to see how they intended to vote on election day. The results are shown in the

following table:

Electorate

Party

Labor Liberal Independent Undecided

Arton 37 28 14 21

Burnley 33 36 15 16

a Draw a back-to-back bar chart comparing the data from Arton and Burnley.

b Which party is the most popular in Arton?

c In which electorate is the Liberal Party performing the best?

d In which electorate is the intended voting closest?

[Graph for Example 2 part a, "Car colours": number of votes (scale 0 to 50) against colour (red, blue, black, white), with side-by-side columns for London and Paris]


D DATA COLLECTION

There are a number of ways in which data can be misleading. It is always a good idea to

check the source and method of collection of a set of data before making major decisions

based on statistics.

Before data is collected the following decisions need to be made:

1 Should data for the investigation be collected from the whole population or a sample

of the population?

In most statistical investigations surveying the whole population is impractical; it is either

too time consuming or too costly. In these cases a sample is chosen to represent the

population. Conclusions for an investigation from a sample will not provide the same

degree of accuracy as conclusions made from the population.

In this context, population means all the people or things that the conclusions of a

statistical investigation would apply to.

For example, if you want to investigate a theory related only to your school then the

population is all the students who attend your school. The population in this case would

be accessible although the investigation could also be done with a carefully selected

sample.

2 How should the sample be collected?

If data is to be collected from a sample then a sample that represents the population

must be chosen so that reliable conclusions about the population can be made.

Samples must be chosen so that the results will not show bias towards a particular

outcome. For example, if the purpose of a survey is to get an accurate indication of how

the population of a city is going to vote at the next election then surveying a sample

of voters from only one suburb would not provide information that represents all of the

city. It would give a biased sample.

One way of choosing an unbiased sample is to use simple random sampling where

every member of the population has an equal chance of being chosen.

3 What should the sample size be?

The sample size is an important feature to be considered if conclusions about the population are to be made from the sample.

For example, measuring a group of three fifteen-year-olds would be insufficient to give a very reliable estimate of the height of fifteen-year-olds.
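Simple random sampling, mentioned in point 2 above, can be sketched with Python's standard `random` module. The population of 500 numbered students, the sample size of 25 and the function name `simple_random_sample` are all hypothetical choices for this illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Choose sample_size distinct members, each equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical population: students numbered 1 to 500 on a school roll.
students = list(range(1, 501))
sample = simple_random_sample(students, 25, seed=1)
print(sorted(sample))
```

Fixing the seed only makes the sketch repeatable; in a real survey the seed would be left unset so the choice cannot be predicted.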


PROJECT

Decide on a worthwhile project of a statistical nature.

The data collected should be categorical in nature.

• Discuss the project with your teacher who will judge if it is appropriate.

• Make sure that your sample is sufficiently large to accurately reflect the population.

• Make sure that your sample is not biased.

• Write a detailed, factual report including your data and your conclusions.

EXERCISE 18D

1 A school has 820 students. An investigation concerning the school uniform is being

conducted. 40 students from the school are randomly selected to complete the survey on

their school uniform.

a What is the population size?

b What is the size of the sample?

c Explain why data collected in the following ways would not produce a sample

representative of the population.

i The surveyor’s ten best friends are asked to complete the survey.

ii All the students in one class are surveyed.

iii Volunteers are asked to complete the survey.

2 A research company wants to know people’s opinions on whether smoking should be banned in all public places. They ask people standing outside buildings in the city during office hours. Explain why the data collected is likely to be biased.

3 A polling agency is employed to survey the voting intention of residents of a particular electorate in the coming election. From the data collected they are to predict the election result in that electorate.

Explain why each of the following situations would produce a biased sample:

a A random selection of people in the local large shopping complex is surveyed between 1 pm and 3 pm on a weekday.

b All the members of the local golf club are surveyed.

c A random sample of people at the local train station between 7 am and 9 am is surveyed.

d A doorknock is undertaken, surveying every voter in a particular street.


Page 16: Comparing categorical data - Haese Mathematics categorical data Chapter18 Contents: A Categorical data B Examining categorical data C Comparing and reporting categorical data D Data

E MISLEADING GRAPHS

Graphs can also be misleading. There are two ways this usually happens:

USING A ‘CUT-OFF’ SCALE ON THE VERTICAL AXIS

For example, consider the graph shown:

A close look at the graph reveals that the vertical

scale does not start at zero and so has exaggerated

the increase in profits.

The graph should look like that on the left,

which gives a better picture of the profit

increases. It probably should be labelled

‘A slow but steady increase in profits’.
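The effect of a cut-off scale can be checked with a little arithmetic. The sketch below compares how tall two columns are drawn relative to each other when the vertical axis starts at zero and when it starts higher. The profit values and the axis start of 14 are hypothetical, chosen only for illustration; the text does not give the actual values.

```python
def apparent_height_ratio(first_value, last_value, axis_start):
    """Ratio of drawn column heights when the vertical axis begins at axis_start."""
    return (last_value - axis_start) / (first_value - axis_start)


# Hypothetical profits in $1000's (not taken from the source graph).
jan_profit, apr_profit = 15.0, 17.0

true_ratio = apparent_height_ratio(jan_profit, apr_profit, axis_start=0)
cut_off_ratio = apparent_height_ratio(jan_profit, apr_profit, axis_start=14)

print(f"Axis from 0:  April column is {true_ratio:.2f} times the January column")
print(f"Axis from 14: April column is {cut_off_ratio:.2f} times the January column")
```

With these assumed values the real increase is about 13%, yet on the cut-off axis the April column is drawn three times as tall as the January column.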

MISREPRESENTING THE ‘BARS’ ON A BAR CHART OR COLUMN GRAPH

Sometimes the ‘bars’ on a bar chart or column graph are shown with misleading area or

volume.

For example, consider the graph below comparing sales of different flavours of drink.

By giving the ‘bars’ the appearance of

volume, the sales of lemon drinks look

to be about eight times the sales of lime

drinks.
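One way to see where a figure like "eight times" can come from: if a picture is enlarged by the same factor in every direction, its area grows with the square of that factor and its apparent volume with the cube. The sketch below assumes uniform scaling and a sales ratio of exactly 2; both are simplifying assumptions made for illustration.

```python
def apparent_ratios(height_ratio):
    """Return (height, area, volume) ratios for a shape scaled uniformly."""
    return height_ratio, height_ratio ** 2, height_ratio ** 3


# Assumed: one flavour sells twice as much as another.
height, area, volume = apparent_ratios(2)
print(height, area, volume)  # 2 4 8
```

A reader who judges the drawing by its volume sees a ratio of 8, although the quantity being graphed has only doubled.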

However, on a bar chart the frequency is

proportional to the height of the bar only, and so

the graph should look like this:

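Figures for section E (Misleading graphs):

[Figure: graph titled ‘Profits skyrocket!’ showing profit ($1000’s) by month (Jan to Apr), with the vertical axis marked from 14 to 17.]

[Figure: the profit graph redrawn, profit ($1000’s) by month (Jan to Apr), with the vertical axis marked in steps of 3 up to 18.]

[Figure: sales ($m’s) by flavour of drink (Lime, Lemon, Orange), with the ‘bars’ given the appearance of volume.]

[Figure: sales ($m’s) by flavour of drink (Lime, Lemon, Orange), redrawn so that only the height of each bar varies. Callout: The sales of lemon are just over twice the sales of lime.]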


Page 17: Comparing categorical data - Haese Mathematics categorical data Chapter18 Contents: A Categorical data B Examining categorical data C Comparing and reporting categorical data D Data

EXERCISE 18E

1 Describe the misleading or poor features of each of the following graphs:

a b c d

[Figures: four graphs. Recoverable titles and axes: ‘Fish sold at markets’, cases (100’s) by week (1, 2), axis values 0 to 30; ‘Milk production’, (’000s L) by factory (A, B), axis values 0 to 20; ‘Exports’, tonnes (millions) by year (2005, 2006, 2007), axis values 0 to 40; ‘Interstate bus fares’, dollars by year (2004 to 2007), axis values between 45 and 70.]

2 [Figures: Graph A, sales of chocolates by year (03 to 09), vertical axis marked from 2000 to 2600; Graph B, sales of chocolates by year (03 to 09), vertical axis marked from 0 to 2500.]

a Which graph gives the impression of rapidly increasing sales?

b Have sales in fact rapidly increased over this 6 year period?

c According to graph A, the sales for 2006 appear to be double those of 2005. Is this true?

REVIEW SET 18A

1 For each of the following investigations, classify the variable as categorical, quantitative discrete, or quantitative continuous:

a the favourite television programs watched by class members

b the number of visitors to an art gallery each week.


Page 18: Comparing categorical data - Haese Mathematics categorical data Chapter18 Contents: A Categorical data B Examining categorical data C Comparing and reporting categorical data D Data

2 To find out which foods at a school canteen the students eat most, 80 year 12 students were asked to nominate the item they purchased most often. The following data was collected:

Type of food   Frequency
Pie            20
Hot dog        16
Pasta          9
Sandwich       17
Apple          13
Chips          5

a What are the variables in the investigation?

b What are the dependent and independent variables?

c In what way is the sample biased?

d Construct a vertical column graph to illustrate the data.

3 A sample of people were asked whether they owned an analogue or a digital radio. The results were sorted by the age of the people surveyed, and are shown in the table alongside:

                  Radio type
Age          Analogue   Digital   Totals
Under 30s    32         23
Over 30s     38         21
Totals

a Complete the table to find the total of each row and column.

b How many people surveyed were over 30?

c What percentage of under 30s surveyed own an analogue radio?

d What percentage of people surveyed were over 30s who have a digital radio?

4 The ticket sales for a movie theatre over a two day period are given in the table alongside:

                  Ticket Group
            Adult   Concession   Children
Friday      121     71           63
Saturday    139     34           82

a Draw a side-by-side column graph comparing the ticket sales from Friday and Saturday for each ticket group.

b Which day is more popular with adults?

c Which ticket group was least popular on Friday?

d Which ticket group was most influenced by the day of the week?

5 To compare the sporting preferences of Hillsvale and Greensdale High Schools, 100 students from each school were asked to indicate their favourite sports:

                                   Sport
             Gridiron   Ice Hockey   Basketball   Baseball   Football
Hillsvale    24         28           17           22         9
Greensdale   18         30           14           12         26


Page 19: Comparing categorical data - Haese Mathematics categorical data Chapter18 Contents: A Categorical data B Examining categorical data C Comparing and reporting categorical data D Data

a Draw a back-to-back bar graph comparing the results for Hillsvale and Greensdale High Schools.

b Which is the most favoured sport at Hillsvale?

c Basketball is more popular in which school?

d Which sport is best described as “a mainly Greensdale School Sport”?

6 To investigate the public’s opinion on whether money should be spent upgrading the local library, a questionnaire form is placed in the library for members of the public to fill out. Explain why this sample is likely to be biased.

REVIEW SET 18B

1 For each of the following investigations, classify the variable as categorical, quantitative discrete, or quantitative continuous:

a the area of each block of land on a street

b the marital status of people in a particular suburb.

2 State whether a census or a sample would be used for these investigations:

a finding the percentage of people who own a dog

b finding the number of pets that students in a particular class own

c finding the amount of rain a city receives each month.

3 To investigate whether being left or right handed has any effect on eyesight, a sample of people were asked whether they were left or right handed and whether they required glasses for driving. The results are given in the table alongside:

                Require glasses
                Yes    No
Left handed     39     56
Right handed    229    311

a How many people were surveyed?

b What percentage of people surveyed were left handed?

c What percentage of right handed people required glasses for driving?

d Was there a significant difference between the percentages of right handed and left handed people who required glasses for driving?

4 A newspaper surveys 150 people aged under 30 and 150 people over 30 about an upcoming election. They want to find out which issues are most important to each age group.

                         Election Issues
            Unemployment   Inflation   Health   Education
Under 30    68             20          23       39
Over 30     52             27          50       21


Page 20: Comparing categorical data - Haese Mathematics categorical data Chapter18 Contents: A Categorical data B Examining categorical data C Comparing and reporting categorical data D Data

a Draw a back-to-back bar graph comparing the results for the under 30s and the over 30s.

b Which issue is most important to the people aged under 30?

c Which issue is the least important to the people aged over 30?

d Which issue is mainly an “over 30s issue”?

5 A council is considering whether to occupy a block of land with a hotel, a shopping centre, or a park. To find the local residents’ opinions, the council surveys 200 spectators at the local football match. The results are shown alongside:

               Preference
          Hotel   Shopping   Park
Male      83      39         48
Female    4       16         10

a Which option was most preferred by the people surveyed?

b Would it be reasonable for the council to make a decision based on the answer given in a? Give a reason for your answer.

c Draw a side-by-side column graph comparing the preferences of males and females, using relative frequencies.

d Is the option of a park preferred more by males or females?

6 The Government releases the following graph showing the increase in employment in the tourism industry over recent years.

[Figure: employment (’000) by year (2005 to 2008), with the vertical axis marked from 30 to 35.]

a Explain why the graph is misleading.

b Redraw the graph in a way that more accurately indicates the increase in employment.
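Question 5c asks for relative frequencies, which matter when the groups being compared differ in size. The sketch below shows one way to compute them. The group names, option names and counts are made up for illustration and are not taken from any exercise.

```python
def relative_frequencies(counts):
    """Convert {category: count} into {category: share of the group total}."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Hypothetical survey counts (not from the exercises).
group_a = {"Option X": 30, "Option Y": 50, "Option Z": 20}  # 100 respondents
group_b = {"Option X": 10, "Option Y": 5, "Option Z": 10}   # 25 respondents

for name, counts in (("Group A", group_a), ("Group B", group_b)):
    shares = relative_frequencies(counts)
    print(name, {category: f"{share:.0%}" for category, share in shares.items()})
```

In this made-up data more people in Group A chose Option X (30 against 10), but Option X is relatively more popular in Group B (40% against 30%). Comparing raw counts alone would hide that.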
