UNIT 13: HANDLING WITH DATA - miblogcolegioherma


Statistics is the study of the collection, organization, analysis, interpretation and presentation of data.


What is Data?

Data is a collection of facts, such as values or measurements.

It can be numbers, words, measurements, observations or even just descriptions of things.


Qualitative vs Quantitative

Qualitative data is descriptive information (it describes something: colour, shape, etc.).

Quantitative data is numerical information (numbers).


Quantitative data can also be Discrete or Continuous:

Discrete data can only take certain values (like whole numbers): the number of pupils in a class.

Continuous data can take any value (within a range): a person's height.


What do we know about the dog?

Qualitative:

He is brown and white

He has short hair

He has lots of energy

Quantitative:

Discrete:

He has 4 legs

He has 2 brothers

Continuous:

He weighs 12,5 kg

He is 80 cm tall


Put simply:

Discrete data is counted.

Continuous data is measured.


When we need to analyse data they must be collected and organized in a table.

Data can be categorical or numerical.

Examples of categorical data are: colour, kind of music, foods, our favourite subject…

Numerical data can be discrete or continuous.

Examples of numerical discrete data are: shoe size, number of brothers and sisters…

Examples of numerical continuous data are: height, length measurements…


The main terms used in statistics

Population: the set made up of all the elements that we want to study. When you collect data for every member of the group (the whole "population"), it is called a census.


Sample: a part of the population that we study. Taking a sample means collecting data only for selected members of the group.

A census is accurate, but hard to do. A sample is not as accurate, but may be good enough, and is a lot easier.


Example: there are 120 people in your local football club.

You can ask everyone (all 120) what their age is. That is a census.

Or you could just choose the people that are there this afternoon. That is a sample.


Variable: the characteristic we want to study in the population or sample.

Remember that variables can be quantitative or qualitative.

- Qualitative variable: your friend's favourite colour.

- Quantitative variable:

Discrete: your friend's number of siblings.

Continuous: the dog's weight.

Individual: Each element of the population or sample.

Size: Number of elements in the sample or population.


Organise and analyse data

When we need to analyse data they must be collected and organized in a table.

It is advisable to follow these steps.

1. Collect data.

2. Organise data and display them in a frequency table.

3. Draw a graph.


Example: 30 students were asked how many brothers and sisters there are in their families.

These are the answers:

1. Collect data: 1, 2, 1, 3, 6, 3, 2, 1, 1, 1, 2, 2, 3, 2, 3, 2, 2, 4, 2, 3, 3, 2, 2, 3, 4, 2, 2, 3, 1, and 2.

2. Organise data into a frequency table:

Brothers and sisters   Tally              Frequency
1                      IIIII I            6
2                      IIIII IIIII III    13
3                      IIIII III          8
4                      II                 2
5                                         0
6                      I                  1
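The tally above can be reproduced with a short Python sketch (an illustration, not part of the original unit): `collections.Counter` counts how often each answer occurs, and a small helper, `tally_marks` (a name chosen here for illustration), writes each count as tally marks in groups of five. Values that nobody gave, such as 5, still get a row with frequency 0.

```python
from collections import Counter

def tally_marks(n: int) -> str:
    """Write n as tally marks in groups of five, e.g. 7 -> 'IIIII II'."""
    groups = ["IIIII"] * (n // 5)
    if n % 5:
        groups.append("I" * (n % 5))
    return " ".join(groups)

# Answers given by the 30 students
answers = [1, 2, 1, 3, 6, 3, 2, 1, 1, 1, 2, 2, 3, 2, 3,
           2, 2, 4, 2, 3, 3, 2, 2, 3, 4, 2, 2, 3, 1, 2]

counts = Counter(answers)  # value -> how many students gave it

# Go through every value from the smallest to the largest answer,
# so a value nobody gave (here 5) still appears with frequency 0.
for value in range(min(answers), max(answers) + 1):
    print(f"{value}  {tally_marks(counts[value]):<16} {counts[value]}")

print("Total:", sum(counts.values()))  # Total: 30
```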


Frequency is how often something occurs.

We count things or different situations (variables) and say how often they occur.

By counting frequencies we can make a Frequency Distribution table.

Example:

We count the number of books pupils from 1º ESO read during the summer, and we get the tally and total:

Data:

1 0 1 3 4 2
2 3 2 2 0 0
1 0 2 2 3 1
1 0 2 1 2 1

Tally:

Books   Tally       Total
0       IIIII       5
1       IIIII II    7
2       IIIII III   8
3       III         3
4       I           1


From the table some conclusions are drawn:

Only one of the pupils from 1º ESO has read 4 books.

The most common answers are 2 books (8 pupils) and 1 book (7 pupils).

The number of times each value occurs is called its ABSOLUTE FREQUENCY, represented by the symbol fi.

The absolute frequencies of all the values add up to the total number of data, N (k is the number of different values):

$f_1 + f_2 + f_3 + \dots + f_k = N$

The RELATIVE FREQUENCY is the absolute frequency divided by the total number of data, $h_i = f_i / N$, and its symbol is hi. The relative frequencies of all the values always add up to 1:

$h_1 + h_2 + h_3 + \dots + h_k = 1$


The frequency and the data from a distribution can be organized in a frequency table.

Example: Build a frequency table for the tally from the last example.

Data   Absolute frequency (fi)   Relative frequency (hi)
0      5                         5/24
1      7                         7/24
2      8                         8/24
3      3                         3/24
4      1                         1/24
       N = 24                    Total = 1
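As a sketch, using Python's standard `fractions.Fraction` to keep the relative frequencies exact, the same table can be built from the 24 answers listed in the books example. The variable names `f`, `h` and `N` are chosen here only to mirror the symbols fi, hi and N.

```python
from collections import Counter
from fractions import Fraction

# Books read by each of the 24 pupils (the four rows of data in the example)
books = [1, 0, 1, 3, 4, 2,
         2, 3, 2, 2, 0, 0,
         1, 0, 2, 2, 3, 1,
         1, 0, 2, 1, 2, 1]

N = len(books)                                  # total number of data
f = Counter(books)                              # absolute frequencies fi
h = {x: Fraction(f[x], N) for x in sorted(f)}   # relative frequencies hi = fi / N

print("Data  fi  hi")
for x in sorted(f):
    # Fraction would simplify 8/24 to 1/3, so print fi/N to match the table
    print(f"{x:<5} {f[x]:<3} {f[x]}/{N}")

print("N =", N, " Total =", sum(h.values()))    # the hi always add up to 1
```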


Cumulative frequencies

To get cumulative frequencies, add up the frequencies as you go down the table.

You can only obtain cumulative frequencies for quantitative variables, because the data must be put in order.

Cumulative absolute frequency is represented by Fi

Cumulative relative frequency is represented by Hi


Obtain the cumulative frequencies for the distribution below:

The shoe size for 20 students in a class is:

43, 42, 41, 39, 41, 37, 40, 43, 44, 40, 39, 36, 38, 41, 40, 39, 38, 39, 39, 40

xi     fi   hi     Fi   Hi
36     1    1/20   1    1/20
37     1    1/20   2    2/20
38     2    2/20   4    4/20
39     5    5/20   9    9/20
40     4    4/20   13   13/20
41     3    3/20   16   16/20
42     1    1/20   17   17/20
43     2    2/20   19   19/20
44     1    1/20   20   20/20
Total  20   1
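A minimal Python sketch of the same procedure (an illustration, not the unit's own method): sort the distinct sizes, count them, then keep a running total with `itertools.accumulate`. Because the table is computed from the list itself, every size that appears in the list, including the single 36, gets its own row.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

sizes = [43, 42, 41, 39, 41, 37, 40, 43, 44, 40,
         39, 36, 38, 41, 40, 39, 38, 39, 39, 40]

N = len(sizes)
f = Counter(sizes)
xs = sorted(f)                      # cumulative frequencies need the values in order
fi = [f[x] for x in xs]             # absolute frequencies
Fi = list(accumulate(fi))           # running totals: cumulative absolute frequencies
Hi = [Fraction(F, N) for F in Fi]   # cumulative relative frequencies

print("xi  fi  hi     Fi  Hi")
for x, a, F in zip(xs, fi, Fi):
    print(f"{x:<3} {a:<3} {a}/{N:<4} {F:<3} {F}/{N}")
```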


Now do exercises 8, 11 and 14 on pages 248, 249 and 250.


Finding a central value for a data distribution

We can find the central value for a data distribution in different ways:

Calculating the MEAN ($\bar{x}$)

Calculating the MEDIAN (Me)

Calculating the MODE (Mo)


1. Calculating the MEAN

The mean is the average of the numbers: a calculated "central" value of a set of numbers.

To find it, add up the numbers and divide by how many numbers there are.

Example: Tom wants to know the mean number of hours that his children spend playing video games per week.

Tom: 4 h/week    Paul: 5 h/week
Ana: 2 h/week    Peter: 3 h/week
Bob: 5 h/week    Betty: 4 h/week

Add up all the hours and divide by 6 (because there are 6 numbers):

(4 + 2 + 5 + 5 + 3 + 4) / 6 = 23 / 6 = 3,83...
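The same calculation as a short Python sketch; storing the example data in a dictionary is just one convenient choice made here.

```python
hours = {"Tom": 4, "Paul": 5, "Ana": 2, "Peter": 3, "Bob": 5, "Betty": 4}

# Add up the numbers and divide by how many numbers there are
mean = sum(hours.values()) / len(hours)
print(round(mean, 2))  # 3.83
```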


It is advisable to follow these steps:

1. Build a frequency table.

2. Multiply each value of the variable by its absolute frequency.

3. Add up all the products.

4. Divide the result by N (the total number of data).

You can apply the formula:

$\bar{x} = \dfrac{\sum x_i \cdot f_i}{N}$

Hours per week (xi)   Absolute frequency (fi)   xi · fi
2                     1                         2
3                     1                         3
4                     2                         8
5                     2                         10
                      N = 6                     Sum = 23

$\bar{x} = \dfrac{\sum x_i \cdot f_i}{N} = \dfrac{23}{6} \approx 3{,}83$
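A sketch of the four steps in Python, with the frequency table stored as a dictionary that maps each value xi to its absolute frequency fi (a layout assumed here for illustration). N is the sum of the frequencies, that is, the number of data, not the sum of the values xi.

```python
# Frequency table: number of hours xi -> absolute frequency fi
table = {2: 1, 3: 1, 4: 2, 5: 2}

N = sum(table.values())                          # N = number of data (6)
products = {x: x * f for x, f in table.items()}  # step 2: xi * fi for each row
total = sum(products.values())                   # step 3: add up the products (23)
mean = total / N                                 # step 4: divide by N
print(N, total, round(mean, 2))  # 6 23 3.83
```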


2. Calculating the MODE

The Mode is the value that occurs most often.

In the example before, 4 and 5 each occur twice (more often than any other value), so this distribution has two modes: 4 and 5.

Tom: 4 h/week    Paul: 5 h/week
Ana: 2 h/week    Peter: 3 h/week
Bob: 5 h/week    Betty: 4 h/week

This shows that the Mode can be tricky: sometimes there is more than one Mode.

Example: What is the Mode of 3, 4, 4, 5, 6, 6, 7?

Well ... 4 occurs twice but 6 also occurs twice.

So both 4 and 6 are modes.

When there are two modes it is called "bimodal", when there are three or more modes we call it "multimodal".
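A minimal Python sketch that returns every mode, so bimodal and multimodal data are handled too; `modes` is a hypothetical helper name. Python's standard library also offers `statistics.multimode` (Python 3.8 and later), which likewise returns all the most common values.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs most often, in increasing order."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(x for x, c in counts.items() if c == highest)

print(modes([4, 5, 2, 3, 5, 4]))      # [4, 5]: the hours example has two modes
print(modes([3, 4, 4, 5, 6, 6, 7]))   # [4, 6]: bimodal
```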


3. Calculating the MEDIAN

The Median is the middle value: list all the numbers in order and choose the middle one.

At a birthday party there are 11 kids of different ages:

5 kids aged 10, 4 kids aged 8 and 2 kids aged 7.

We obtain the median by listing the data in order and choosing the middle number:

10, 10, 10, 10, 10, 8, 8, 8, 8, 7, 7

The Median age is 8 ... so let's go to the cinema!


Sometimes there are two middle numbers. Just average them:

Example: What is the Median of 3, 4, 7, 9, 12, 15?

There are two numbers in the middle: 7 and 9.

So we average them:

(7+9) / 2 = 16/2 = 8

The Median is 8
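A short Python sketch of the median rule, covering both the odd case (one middle number) and the even case (average of the two middle numbers). The function name `median` is chosen here for illustration; Python's `statistics.median` follows the same rule.

```python
def median(data):
    """List the numbers in order and take the middle one (or the average of the two middle ones)."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [10] * 5 + [8] * 4 + [7] * 2   # the 11 ages at the birthday party
print(median(ages))                    # 8
print(median([3, 4, 7, 9, 12, 15]))    # 8.0
```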


Finding the position values for a data distribution: Quartiles

A position value gives us the place of a variable in the ordered data distribution.

Quartiles are the values that divide a list of numbers into quarters (Q1, Q2 and Q3):

- First put the list of numbers in order.

- Then cut the list into four equal parts.

- The Quartiles are at the "cuts".

Each of the four parts contains 25% of the data: Q1 is at the first cut, Q2 at the second and Q3 at the third.


Example: 4, 7, 6, 5, 5, 2, 4, 7, 5, 8, 8, 8

- Put the list of numbers in order:

2, 4, 4, 5, 5, 5, 6, 7, 7, 8, 8, 8

- Then cut the list into four equal parts (three numbers each).

- The Quartiles are at the "cuts":

2, 4, 4 | 5, 5, 5 | 6, 7, 7 | 8, 8, 8

Q1 (lower quartile) = (4 + 5) / 2 = 4,5
Q2 (middle quartile, the median) = (5 + 6) / 2 = 5,5
Q3 (upper quartile) = (7 + 8) / 2 = 7,5


Sometimes a "cut" is between two numbers ... the Quartile is the average of the two numbers.

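3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 8, 8, 8

The numbers are already in order.

Now, divide the list into quarters (cut the list). There are 13 numbers, so the middle (7th) number is the median. Q1 and Q3 are the middles of the six numbers on each side of it:

3, 3, 4 | 5, 5, 5 | 6 | 6, 7, 8 | 8, 8, 8

Q1 (lower quartile) is halfway between 4 and 5: Q1 = (4 + 5) / 2 = 4,5
Q2 (middle quartile, the median) is the 7th number: Q2 = 6
Q3 (upper quartile) = (8 + 8) / 2 = 8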
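A Python sketch of the "cut the list" method used in these examples: Q2 is the median of the whole ordered list, and Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd. This is one common convention, assumed here; other textbooks and software (for example, interpolation-based percentile functions) can give slightly different values for Q1 and Q3.

```python
def median(values):
    """Middle value of an already ordered list (average of the two middle ones if even)."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def quartiles(data):
    """Q1, Q2, Q3 as the medians of the lower half, the whole list and the upper half."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]   # skip the middle value when n is odd
    return median(lower), median(ordered), median(upper)

print(quartiles([4, 7, 6, 5, 5, 2, 4, 7, 5, 8, 8, 8]))       # (4.5, 5.5, 7.5)
print(quartiles([3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 8, 8, 8]))    # (4.5, 6, 8.0)
```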


In other words, the quartiles can be read from the cumulative frequencies:

Q1 corresponds to 25% of the data, so:

Q1 is the first value whose cumulative frequency Fi is greater than 25% · N.

Q2 corresponds to 50% of the data, so:

Q2 is the first value whose cumulative frequency Fi is greater than 50% · N. This value is the same as the Median.

Q3 corresponds to 75% of the data, so:

Q3 is the first value whose cumulative frequency Fi is greater than 75% · N.
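A Python sketch of this cumulative-frequency rule, applied to the 20 shoe sizes from the cumulative frequency example. The function name `quartile_from_cumulative` is hypothetical. It follows the rule literally; when some Fi equals p · N exactly, some textbooks take the average of that value and the next one instead, which this sketch does not do.

```python
from collections import Counter
from itertools import accumulate

def quartile_from_cumulative(data, p):
    """Return the first value whose cumulative frequency Fi is greater than p * N."""
    counts = Counter(data)
    xs = sorted(counts)                          # values in order
    Fi = accumulate(counts[x] for x in xs)       # cumulative absolute frequencies
    N = len(data)
    for x, F in zip(xs, Fi):
        if F > p * N:
            return x

sizes = [43, 42, 41, 39, 41, 37, 40, 43, 44, 40,
         39, 36, 38, 41, 40, 39, 38, 39, 39, 40]

# prints Q1 39, Q2 40, Q3 41 (one per line)
for name, p in [("Q1", 0.25), ("Q2", 0.50), ("Q3", 0.75)]:
    print(name, quartile_from_cumulative(sizes, p))
```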


Finding Dispersion measurements for a data distribution

Dispersion in statistics is a way of describing how spread out a set of data is.

The spread of a data set can be described by:

- Range (R)

- Mean deviation (DM)

- Variance (σ²)

- Standard deviation (σ)

- Variation coefficient (CV)


The Range (R) is the difference between the highest and lowest values.

Mean deviation (DM) is the mean of the distances of each value from the mean.

Step 1: Find the mean: $\bar{x}$

Step 2: Find the distance of each value from that mean: $|x_i - \bar{x}|$

Step 3: Find the mean of those distances:

$DM = \dfrac{\sum f_i \cdot |x_i - \bar{x}|}{N}$

It tells us how far, on average, all values are from the middle.
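A Python sketch of the range and the mean deviation for the hours-per-week example, stored as a frequency table (value xi mapped to fi, a layout assumed here for illustration). `Fraction` keeps the arithmetic exact, so the mean is 23/6.

```python
from fractions import Fraction

# Hours-per-week example as a frequency table: xi -> fi
table = {2: 1, 3: 1, 4: 2, 5: 2}
N = sum(table.values())

R = max(table) - min(table)                                   # range: highest - lowest value

mean = Fraction(sum(x * f for x, f in table.items()), N)      # step 1: the mean (23/6)
distances = {x: abs(x - mean) for x in table}                 # step 2: |xi - mean|
DM = sum(f * distances[x] for x, f in table.items()) / N      # step 3: mean of the distances

print("R =", R)                                   # R = 3
print("DM =", DM, "≈", round(float(DM), 2))       # DM = 8/9 ≈ 0.89
```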


Variance (s² or σ²) is the average of the squared differences from the Mean.

To calculate the variance follow these steps:

- Work out the Mean (the simple average of the numbers).

- Then for each number: subtract the Mean and square the result (the squared difference).

- Then work out the average of those squared differences.
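Following the three steps above, a Python sketch of the variance for the same hours-per-week frequency table (xi mapped to fi, an assumed layout). Like the definition here, it divides by N (the population variance); Python's `statistics.pvariance` does the same, while `statistics.variance` divides by n − 1.

```python
from fractions import Fraction

table = {2: 1, 3: 1, 4: 2, 5: 2}   # hours per week xi -> fi
N = sum(table.values())

mean = Fraction(sum(x * f for x, f in table.items()), N)        # work out the mean
squared = {x: (x - mean) ** 2 for x in table}                   # squared difference for each value
variance = sum(f * squared[x] for x, f in table.items()) / N    # average of the squared differences

print(variance, "≈", round(float(variance), 2))   # 41/36 ≈ 1.14
```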


Standard Deviation (σ) is just the square root of the Variance, so:

$\sigma = \sqrt{\sigma^2}$

Variation coefficient (CV) is the quotient of the standard deviation and the mean:

$CV = \dfrac{\sigma}{\bar{x}}$
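A Python sketch of the standard deviation and the variation coefficient for the same hours-per-week data; it repeats the variance calculation so the block runs on its own.

```python
import math
from fractions import Fraction

table = {2: 1, 3: 1, 4: 2, 5: 2}   # hours per week xi -> fi
N = sum(table.values())
mean = Fraction(sum(x * f for x, f in table.items()), N)
variance = sum(f * (x - mean) ** 2 for x, f in table.items()) / N

sigma = math.sqrt(variance)    # standard deviation: square root of the variance
CV = sigma / float(mean)       # variation coefficient: standard deviation divided by the mean

print(round(sigma, 2), round(CV, 2))   # 1.07 0.28
```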


How to show data: Graphs

Besides tables, graphs make data very easy to organize and read.

Different kinds of graphs can be built from data.

Bar Graph:

A Bar Graph (also called Bar Chart) is a graphical display of data using bars of different heights.

Example: Imagine you just did a survey of your friends to find which kind of sport they liked best:

Sport   Football   Basketball   Tennis   Athletics   Handball
fi      8          12           6        10          4


We can show that on a bar graph like this:

[Bar graph of the sports data: frequency (0 to 14) on the vertical axis, sports on the horizontal axis]


Frequency polygon:

If we join the midpoints of the tops of the bars of a bar graph, we obtain the frequency polygon (in red):

[Bar graph of the sports data, frequency 0 to 14, with the frequency polygon drawn over it]


Histograms:

When the data are grouped into intervals, we use a histogram.

It is a graphical display of data using bars of different heights.

It is similar to a Bar chart, but a histogram groups numbers into ranges.

See example 8 on page 252.


Pie Chart:

A Pie chart is a special chart that uses "pie slices" to show relative sizes of data.

Example: Imagine you survey your friends to find the kind of books they like best:

Topic   Adventure   Sci Fi   Drama   Biography   History
fi      8           12       6       10          4

You can show the data in a Pie Chart like this:

[Pie chart with one slice for each topic: Adventure, Sci Fi, Drama, Biography, History]