Students Tutorial Answers Week2


BES Tutorial Sample Solutions, S2/10. These solutions will be posted gradually on the BES website, with a one-week delay and without the tutor notes (denoted by TN).

WEEK 2 TUTORIAL EXERCISES

Guide to solutions

1. What is meant by a variable in a statistical sense? Distinguish between qualitative and quantitative statistical variables, and between continuous and discrete variables. Give examples.

A variable in a statistical sense is simply some characteristic of an object that may take different values. Data on a quantitative variable can be expressed numerically in a meaningful way (e.g. height of an individual, number of children in a family). Data on a qualitative variable cannot be expressed numerically in a meaningful way (e.g. sex of an individual, hair colour). A discrete quantitative variable can assume only certain distinct numerical values on the number line (there may be a finite or an infinite number of these values); e.g. number of children in a family. A continuous quantitative variable can assume any value in a specific range or interval; e.g. length of a pipe.


2. Distinguish between (a) a statistical population and a sample; (b) a parameter and a statistic. Give examples.

A statistical population is the set of measurements or observations of a characteristic of interest for all elementary units in a frame; e.g. the shoe sizes of all men in Australia. A statistical sample is a subset of a population; e.g. the shoe sizes of the men in this class are a sample from the population of shoe sizes of all men in Australia. A parameter is a numerical description of a population. For example, the average shoe size of all Australian men is a parameter (of the population of shoe sizes of all Australian men). A statistic is a numerical description of a sample. For example, the average shoe size of the men in this classroom is a statistic (calculated from the sample of shoe sizes of the men in this classroom).
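To make the parameter/statistic distinction concrete, the minimal Python sketch below treats the 25 sick-leave figures from Question 4 as a population (this assumes interest is confined to that one branch, as discussed in 4(d)) and draws a simple random sample of 5 employees from it. The sample size, the random seed and all variable names are illustrative assumptions, not part of the original solution.

```python
import random
import statistics

# Days of sick leave for all 25 employees of the branch (data from Question 4),
# treated here as the whole population of interest.
population = [0, 10, 9, 5, 0, 0, 5, 10, 0, 0, 10, 1, 0, 0, 0, 0,
              10, 5, 10, 45, 0, 2, 1, 0, 5]

# Parameter: a numerical description of the population (fixed).
parameter = statistics.mean(population)

# Statistic: the same description computed from a sample. Here the sample is a
# simple random sample of 5 employees; the size and seed are arbitrary choices.
random.seed(1)
sample = random.sample(population, 5)
statistic = statistics.mean(sample)

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample {sample}, sample mean (statistic): {statistic:.2f}")
```

The parameter is always 128/25 = 5.12 days, whereas the statistic changes whenever a different sample is drawn (try another seed).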


3. In order to know the market better, the second-hand car dealership Anzac Garage wants to analyse the age of second-hand cars being sold. A sample of 20 advertisements for passenger cars is selected from the second-hand car advertising/listing website www.drive.com.au. The ages (in years) of the vehicles at the time of advertisement are listed below: 5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11

(a) Calculate frequency, cumulative frequency and relative frequency distributions for the age data using the following bin classes: more than 0 to less than or equal to 8 years; more than 8 to less than or equal to 16 years; more than 16 to less than or equal to 24 years.

Bin (age in years)   Relative frequency   Frequency   Cumulative frequency
0 < Age ≤ 8          0.70                 14          14
8 < Age ≤ 16         0.25                 5           19
16 < Age ≤ 24        0.05                 1           20
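The counts in the table can be checked with a short script. This is a minimal Python sketch, not part of the original solution (which may have used Excel); it assumes right-closed bins, lower < Age ≤ upper, so a car aged exactly 8 years falls in the first bin, matching the bin definitions in (a). The function name frequency_table is hypothetical.

```python
# Ages (years) of the 20 advertised cars in Question 3.
ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]


def frequency_table(data, edges):
    """Return (lower, upper, frequency, relative frequency, cumulative frequency)
    for right-closed bins lower < x <= upper built from consecutive edges."""
    rows = []
    cumulative = 0
    for lower, upper in zip(edges[:-1], edges[1:]):
        freq = sum(1 for x in data if lower < x <= upper)
        cumulative += freq
        rows.append((lower, upper, freq, freq / len(data), cumulative))
    return rows


table = frequency_table(ages, [0, 8, 16, 24])
for lower, upper, freq, rel, cum in table:
    print(f"{lower:>2} < Age <= {upper:<2}  rel={rel:.2f}  freq={freq:>2}  cum={cum:>2}")
```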

(b) Sketch a frequency histogram using the calculations in part (a). What can you say about the distribution of the age of these second-hand cars? Is there anything wrong with the frequency table and histogram? Specifically, is the choice of bin classes appropriate? What needs to be done?


[Figure: Relative frequency histogram for Age (x-axis: Bin, with upper limits 8, 16 and 24 years; y-axis: relative frequency, 0 to 0.8)]

From this graph (it was not necessary to use Excel, although it is good practice), the Age distribution appears to be skewed to the right: 70% of observations have an age of more than 0 and at most 8 years. However, this histogram provides only limited information about the Age distribution because there are too few bins and they are very wide. The choice of bin classes is therefore not appropriate; narrower bins are needed to reveal more of the shape of the distribution.

(c) Halve the width of the bins (0 to 4, 4 to 8, etc.) and recalculate the frequency, cumulative frequency and relative frequency distributions. Using the new distributions and histogram, what can you now say about the distribution of the age of second-hand cars?


Bin (age in years)   Relative frequency   Frequency   Cumulative frequency
0 < Age ≤ 4          0.25                 5           5
4 < Age ≤ 8          0.45                 9           14
8 < Age ≤ 12         0.20                 4           18
12 < Age ≤ 16        0.05                 1           19
16 < Age ≤ 20        0.00                 0           19
20 < Age ≤ 24        0.05                 1           20
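The halved-width table can be reproduced the same way. This minimal Python sketch (a hypothetical helper, not the original method) again assumes right-closed bins and adds a crude text histogram, one `*` per car, as a quick stand-in for the revised histogram.

```python
# Ages (years) of the 20 advertised cars in Question 3.
ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]


def frequency_table(data, edges):
    """Return (lower, upper, frequency, relative frequency, cumulative frequency)
    for right-closed bins lower < x <= upper built from consecutive edges."""
    rows = []
    cumulative = 0
    for lower, upper in zip(edges[:-1], edges[1:]):
        freq = sum(1 for x in data if lower < x <= upper)
        cumulative += freq
        rows.append((lower, upper, freq, freq / len(data), cumulative))
    return rows


# Halved bin width: edges 0, 4, 8, ..., 24.
table = frequency_table(ages, list(range(0, 25, 4)))
for lower, upper, freq, rel, cum in table:
    # One '*' per car gives a rough text version of the histogram.
    print(f"{lower:>2} < Age <= {upper:<2} | {'*' * freq:<9} freq={freq} rel={rel:.2f} cum={cum}")
```

The empty 16 < Age ≤ 20 row followed by a single car in the last bin is what makes the 24-year-old car stand out as an outlier.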

There still appears to be a skew to the right, but now we can also see that there is an outlier in the 20 < Age ≤ 24 category (the 24-year-old car). Ages of 5 to 8 years (the 4 < Age ≤ 8 bin, 45% of cars) are the most frequently observed. A sizable proportion of the second-hand cars are relatively new (25% are 4 years old or less).

Figure 3.1: Revised histogram for age of cars (x-axis: Age, labelled 2 to 22 in steps of 4; y-axis: Frequency)


4. Management of a major bank has asked the Human Resources Department to provide an analysis of sick leave taken by the staff of one of their branches. The days taken as sick leave in the last calendar year for all 25 branch employees were: 0, 10, 9, 5, 0, 0, 5, 10, 0, 0, 10, 1, 0, 0, 0, 0, 10, 5, 10, 45, 0, 2, 1, 0, 5

a. What are the key features of these data?
It's a bit difficult to tell from the raw list; even ordering the data would help. But clearly 45 is an outlying observation (large relative to the others). At the other extreme, there are several employees who haven't taken any sick days.

b. Calculate the frequency distribution and sketch the frequency histogram. Does this provide any extra information to summarize these data relative to what you observed in (a)?


Bin (days)   Frequency
0            11
1            2
2            1
3            0
4            0
5            4
6            0
7            0
8            0
9            1
10           5
More         1
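The ordering suggested in (a) and the frequency table in (b) can both be produced with a few lines of Python. This is a minimal sketch using the standard library's collections.Counter; it assumes whole-day data, so each bin 0 to 10 counts the employees with exactly that many days, and anything above 10 is placed in a "More" bin as in the table. The variable names are illustrative.

```python
from collections import Counter

# Days of sick leave taken last calendar year by the 25 branch employees.
days = [0, 10, 9, 5, 0, 0, 5, 10, 0, 0, 10, 1, 0, 0, 0, 0,
        10, 5, 10, 45, 0, 2, 1, 0, 5]

# Ordering the data already reveals the clusters and the outlier.
print(sorted(days))

# Count each whole number of days 0..10; anything above 10 goes into "More".
counts = Counter(days)
freq = {str(b): counts[b] for b in range(11)}
freq["More"] = sum(c for d, c in counts.items() if d > 10)

# Text histogram: one '*' per employee.
for bin_label, f in freq.items():
    print(f"{bin_label:>4} {f:>3} {'*' * f}")
```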

The clustering of the data is now very clear. Over half of the employees (14 of 25) took very few sick days (0 to 2). A few took 5 days, which may represent a single one-week break because of a more serious illness, and finally quite a few (5 employees) took two weeks (10 days) off. The single observation in the More bin is the 45-day outlier.

[Figure: Histogram of sick days (x-axis: bin, 0 to 10 and More; y-axis: Frequency, 0 to 12)]


c. What do you think would be the best way to summarize the data for Management?
While the histogram is a reasonable representation, the verbal description in (b) is also a good summary in this case.

d. Is this an analysis of a population or a sample?
It depends on what the problem is. It could be a population if interest is confined to this branch in this year. Alternatively, it could be a sample if management is interested in all branches, or in this particular branch over time.


5. SIA: Health expenditure. A recent report by Access Economics provides a comparison of Australian expenditure on health with that of comparable OECD countries. Data from that report relating to 2005 have been used to reproduce its Figure 2.2 (denoted below as Figure 2.1).

(a) What are the key features of these data?
There is a strong positive association: higher GDP per capita is associated with higher health expenditure per capita. There are (at least) 2 outliers: the observation with the largest health expenditure (USA) and the observation with the highest GDP (Luxembourg). Without these 2, the relationship is approximately linear; with them, there is a suggestion of a non-linear relationship. There is also an indication of more variability in health expenditure when GDP is larger.

(b) While this is a bivariate scatter plot, there are three variables involved: health expenditure, GDP and population. Why account for population by expressing health expenditure and GDP in per capita terms?


This recognises that there may be factors other than GDP associated with health expenditure, and population size is one obvious factor. Expressing everything in per capita terms is one way to control for differences in population size and hence isolate the relationship between GDP and health expenditure.
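A tiny numerical sketch of why per capita terms help. The two countries and every figure below are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the Access Economics report. Totals are assumed to be in US$ billion and populations in millions, so dividing one by the other gives US$000 per person, the unit used in Figure 2.1.

```python
# Hypothetical totals (US$ billion) and populations (millions); NOT real data.
countries = {
    "Country A": {"gdp": 2000.0, "health": 200.0, "population": 50.0},
    "Country B": {"gdp": 400.0, "health": 40.0, "population": 10.0},
}


def per_capita(total_billion, population_million):
    """Convert a national total in US$ billion to US$000 per person."""
    # (total * 1e9 dollars) / (population * 1e6 people) = total / population
    # thousand dollars per person.
    return total_billion / population_million


for name, c in countries.items():
    gdp_pc = per_capita(c["gdp"], c["population"])
    health_pc = per_capita(c["health"], c["population"])
    print(f"{name}: GDP per capita {gdp_pc:.1f}, health per capita {health_pc:.1f} (US$000)")
```

In totals, Country A looks five times larger on both axes purely because it has five times the population; in per capita terms the two hypothetical countries coincide, so any remaining spread in a plot like Figure 2.1 reflects something other than population size.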

Figure 2.1: OECD Health Expenditure and GDP (x-axis: GDP per capita, US$000, 0 to 70; y-axis: Health expenditure per capita, US$000, 0 to 7)
