Page 1: measures of centrality

MEASURES OF CENTRALITY

Page 2: measures of centrality

Last lecture summary

• Which graphs did we meet?
  • scatter plot (bodový graf)
  • bar chart (sloupcový graf)
  • histogram
  • pie chart (koláčový graf)

• How do they work, what are their advantages and/or disadvantages?

Page 3: measures of centrality

SDA women – histogram of heights 2014

n = 48 or N = 48

bin size = 3.8

Page 4: measures of centrality

Distributions

negatively skewed (skewed to the left), e.g., life expectancy

symmetric, e.g., body height

positively skewed (skewed to the right), e.g., income

http://turnthewheel.org/free-textbooks/street-smart-stats/

Page 5: measures of centrality

STATISTICS IS BEAUTIFUL

new stuff

Page 6: measures of centrality

Life expectancy data

• Watch the TED talk by Hans Rosling, Gapminder Foundation:

http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

Page 7: measures of centrality

STATISTICS IS DEEP

Page 8: measures of centrality

Simpson’s paradox

UC Berkeley

Though the data are fake, the paradox is the same.

www.udacity.com – Introduction to statistics

Page 9: measures of centrality

Male

          Applied   Admitted   Rate [%]
MAJOR A       900        450
MAJOR B       100         10

www.udacity.com – Introduction to statistics

Page 10: measures of centrality

Male

          Applied   Admitted   Rate [%]
MAJOR A       900        450         50
MAJOR B       100         10         10

www.udacity.com – Introduction to statistics

Page 11: measures of centrality

Female

          Applied   Admitted   Rate [%]
MAJOR A       100         80
MAJOR B       900        180

www.udacity.com – Introduction to statistics

Page 12: measures of centrality

Female

          Applied   Admitted   Rate [%]
MAJOR A       100         80         80
MAJOR B       900        180         20

www.udacity.com – Introduction to statistics

Page 13: measures of centrality

Gender bias

What do you think, is there a gender bias?

Who do you think is favored? Male or female?

male

          Applied   Admitted   Rate [%]
MAJOR A       900        450         50
MAJOR B       100         10         10

female

          Applied   Admitted   Rate [%]
MAJOR A       100         80         80
MAJOR B       900        180         20

www.udacity.com – Introduction to statistics

Page 14: measures of centrality

Gender bias

male

          Applied   Admitted   Rate [%]
MAJOR A       900        450         50
MAJOR B       100         10         10
Both         1000        460         46

female

          Applied   Admitted   Rate [%]
MAJOR A       100         80         80
MAJOR B       900        180         20
Both         1000        260         26

www.udacity.com – Introduction to statistics

Page 15: measures of centrality

Gender bias

male

          Rate [%]
MAJOR A         50
MAJOR B         10
Both            46

female

          Rate [%]
MAJOR A         80
MAJOR B         20
Both            26

www.udacity.com – Introduction to statistics
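The rates in the tables can be recomputed from the raw counts. The sketch below is only an illustration: the names `admissions`, `rate`, `rates_by_major` and `overall_rate` are made up for this example, and the counts are the fake ones from the slides. It shows that women have the higher admission rate in each major, yet the lower rate when both majors are pooled.

```python
# Admission counts from the slides: (applied, admitted) per major.
admissions = {
    "male": {"MAJOR A": (900, 450), "MAJOR B": (100, 10)},
    "female": {"MAJOR A": (100, 80), "MAJOR B": (900, 180)},
}


def rate(applied, admitted):
    """Admission rate in percent."""
    return 100 * admitted / applied


def rates_by_major(group):
    """Admission rate of one group, computed separately for each major."""
    return {major: rate(*counts) for major, counts in admissions[group].items()}


def overall_rate(group):
    """Admission rate of one group with both majors pooled together."""
    applied = sum(a for a, _ in admissions[group].values())
    admitted = sum(ad for _, ad in admissions[group].values())
    return rate(applied, admitted)


for group in admissions:
    print(group, rates_by_major(group), "both:", overall_rate(group))
```

The reversal comes from the unequal group sizes: most men applied to the major that admits many applicants, most women to the major that admits few.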

Page 16: measures of centrality

Statistics is ambiguous

• This example illustrates how ambiguous statistics can be.

• How you choose to present your data can strongly influence what people believe to be the case.

“I never believe in statistics I didn’t doctor myself.”
“Nikdy nevěřím statistice, kterou si sám nezfalšuji.”

Who said that?

Attributed to Winston Churchill

www.udacity.com – Introduction to statistics

Page 17: measures of centrality

What is statistics?

• Statistics – the science of collecting, organizing, summarizing, analyzing and interpreting data

• Goal – use imperfect information (our data) to infer facts, make predictions, and make decisions

• Descriptive statistics – describing and summarizing data with numbers or pictures

• Inferential statistics – drawing conclusions or making decisions based on data

Page 18: measures of centrality

Variables

• variable – a value or characteristic that can vary from individual to individual
  • example: favorite color, age

• How are variables classified?

• quantitative variable – numerical values, often with units of measurement, arising from a "how much/how many" question, example: age, annual income, number of children
  • continuous (spojitá proměnná), example: height, weight
  • discrete (diskrétní proměnná), example: number of children

• continuous variables can be discretized

Page 19: measures of centrality

Variables

• categorical (qualitative) variables
  • categories that have no particular order
  • example: favorite color, gender, nationality

• ordinal
  • they are not numerical but their values have a natural order
  • example: temperature low/medium/high

Page 20: measures of centrality

Variables

variable (proměnná)

quantitative (kvantitativní)

categorical (kategorická)

continuous (spojitá)

discrete (diskrétní)

ordinal (ordinální)

Page 21: measures of centrality

Choosing a profession

Chemistry          Geography
50 000 – 60 000    40 000 – 55 000

www.udacity.com – Statistics

Page 22: measures of centrality

Choosing a profession

• We made an interval estimate.

• But ideally we want one number that describes the entire dataset. This allows us to quickly summarize all our data.

www.udacity.com – Statistics

Page 23: measures of centrality

Choosing a profession

1. The value at which frequency is highest.

2. The value where frequency is lowest.

3. The value in the middle.

4. The biggest value on the x-axis.

5. The mean.

Chemistry Geography

www.udacity.com – Statistics

Page 24: measures of centrality

Three big M’s

• The value at which the frequency is highest is called the mode, i.e. the mode is the most common value.

• The value in the middle of the distribution is called the median.

• The mean is the average (the two words are synonyms).

Chemistry Geography

www.udacity.com – Statistics

Page 25: measures of centrality

Quick quiz

• What is the mode in our data?

2 5 6 5 2 6 9 8 5 2 3 5

www.udacity.com – Statistics
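One way to check the quiz answer is to count how often each value occurs. This is a minimal sketch using Python's standard library; the variable names are illustrative only.

```python
from collections import Counter

# Data from the quick quiz.
data = [2, 5, 6, 5, 2, 6, 9, 8, 5, 2, 3, 5]

# Frequency table: value -> number of occurrences.
counts = Counter(data)

# The mode is the value with the highest frequency.
mode_value, mode_count = counts.most_common(1)[0]

print(sorted(counts.items()))
print("mode:", mode_value, "occurs", mode_count, "times")
```

The same counting works for categorical data, because it only needs equality between values, not arithmetic.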

Page 26: measures of centrality

Mode in negatively skewed distribution

www.udacity.com – Statistics

Page 27: measures of centrality

Mode in uniform distribution

www.udacity.com – Statistics

Page 28: measures of centrality

Multimodal distribution

www.udacity.com – Statistics

Page 29: measures of centrality

Mode in categorical data

www.udacity.com – Statistics

Page 30: measures of centrality

More on mode

True or False?

1. The mode can be used to describe any type of data we have, whether it’s numerical or categorical.

2. All scores in the dataset affect the mode.

3. If we take a lot of samples from the same population, the mode will be the same in each sample.

4. There is an equation for the mode.

• Ad 3.
  • http://onlinestatbook.com/stat_sim/sampling_dist/
  • http://www.shodor.org/interactivate/activities/Histogram/ - the mode changes as you change the bin size.

• Because 3. is not true, we can’t use the mode to learn something about our population. The mode depends on how you present the data.

www.udacity.com – Statistics
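The dependence of the mode on the bin size can be tried on a small example. The sketch below uses made-up values (`sample`) and a hypothetical helper `modal_bin`; it is not taken from the linked applets. It groups the values into bins of a given width and reports the fullest bin.

```python
from collections import Counter


def modal_bin(values, bin_width):
    """Return (lower_edge, count) of the fullest bin.

    Bins are [0, w), [w, 2w), ... where w is bin_width.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return counts.most_common(1)[0]


# Made-up data, chosen only to illustrate the effect.
sample = [1, 2, 2, 3, 4, 4, 5, 5, 5, 9]

for width in (1, 3, 4):
    lower_edge, count = modal_bin(sample, width)
    print(f"bin width {width}: modal bin starts at {lower_edge} ({count} values)")
```

With the same data, the modal bin starts at 5, 3 or 4 depending on the bin width, so the reported mode is partly a product of the presentation.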

Page 31: measures of centrality

Life expectancy data

www.coursera.org – Statistics: Making Sense of Data

Page 32: measures of centrality

Minimum

Sierra Leone

minimum = 47.8

www.coursera.org – Statistics: Making Sense of Data

Page 33: measures of centrality

Maximum

Japan

maximum = 83.4

www.coursera.org – Statistics: Making Sense of Data

Page 34: measures of centrality

Life expectancy data

all countries

www.coursera.org – Statistics: Making Sense of Data

Page 35: measures of centrality

Life expectancy data

countries ranked 1 to 197 by life expectancy

rank 99 (the middle): Egypt, 73.2

half smaller, half larger

www.coursera.org – Statistics: Making Sense of Data

Page 36: measures of centrality

Life expectancy data

Minimum = 47.8

Maximum = 83.4

Median = 73.2

www.coursera.org – Statistics: Making Sense of Data

Page 37: measures of centrality

Q1

countries ranked 1 to 197 by life expectancy

rank 50 (¼ way): Sao Tomé & Príncipe

1st quartile = 64.7

www.coursera.org – Statistics: Making Sense of Data

Page 38: measures of centrality

Q1

¼ smaller, ¾ larger

1st quartile = 64.7

www.coursera.org – Statistics: Making Sense of Data

Page 39: measures of centrality

Q3

countries ranked 1 to 197 by life expectancy

rank 148 (¾ way): Netherlands Antilles

3rd quartile = 76.7

www.coursera.org – Statistics: Making Sense of Data

Page 40: measures of centrality

Q3

3rd quartile = 76.7

¾ smaller ¼ larger

www.coursera.org – Statistics: Making Sense of Data

Page 41: measures of centrality

Life expectancy data

Minimum = 47.8

Maximum = 83.4

Median = 73.2

1st quartile = 64.7

3rd quartile = 76.7

www.coursera.org – Statistics: Making Sense of Data

Page 42: measures of centrality

Box Plot

www.coursera.org – Statistics: Making Sense of Data

Page 43: measures of centrality

Box plot

minimum

1st quartile

median

3rd quartile

maximum

Page 44: measures of centrality

Modified box plot

IQR … interquartile range, IQR = Q3 − Q1

whiskers reach at most 1.5 × IQR beyond the box

points farther out are drawn individually as outliers
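The 1.5 × IQR rule can be tried on the life expectancy quartiles from the earlier slides. This is a minimal sketch; the function name `outlier_fences` and the parameter `k` are illustrative, and the numbers are the ones quoted above (Q1 = 64.7, Q3 = 76.7, minimum = 47.8, maximum = 83.4).

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) limits; values outside them count as outliers."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


# Life expectancy summary from the slides.
q1, q3 = 64.7, 76.7
minimum, maximum = 47.8, 83.4

lower, upper = outlier_fences(q1, q3)
has_low_outliers = minimum < lower
has_high_outliers = maximum > upper

print(f"fences: {lower:.1f} to {upper:.1f}")
print("low outliers:", has_low_outliers, "high outliers:", has_high_outliers)
```

Under this rule the fences are about 46.7 and 94.7, so even the minimum of 47.8 stays inside and the modified box plot of these data would show no outliers.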

Page 45: measures of centrality

Quartiles, median – how to do it?

79, 68, 88, 69, 90, 74, 87, 93, 76

Find min, max, median, Q1, Q3 in these data. Then, draw the box plot.

www.coursera.org – Statistics: Making Sense of Data

Page 46: measures of centrality
Page 47: measures of centrality

Another example

78, 93, 68, 84, 90, 74

   Min.  1st Qu.   Median  3rd Qu.     Max.
  68.00    75.00    81.00    88.50    93.00
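The summary above can be reproduced in a few lines. The sketch below assumes quartiles are found by linear interpolation between the sorted data points (the `"inclusive"` method of Python's `statistics.quantiles`), which gives the values shown for this example; other textbook conventions can give slightly different quartiles. The helper name `five_number_summary` is made up for illustration.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles use linear interpolation between sorted data points.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


another = [78, 93, 68, 84, 90, 74]                 # data from this slide
exercise = [79, 68, 88, 69, 90, 74, 87, 93, 76]    # data from the exercise

print(five_number_summary(another))
print(five_number_summary(exercise))
```

For the exercise data this convention gives Q1 = 74 and Q3 = 88; a "median of each half" rule would give 71.5 and 89, so it is worth stating which rule is used.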

Page 48: measures of centrality

Percentiles

age [years]

http://www.rustovyhormon.cz/on-line-rustove-grafy

Page 49: measures of centrality

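3rd M – Mean

• Another measure of the center of the data: mean (average)

• Data values: $x_1, x_2, \ldots, x_n$

• Mathematical notation: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • $\Sigma$ … Greek letter capital sigma
  • means SUM in mathematics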

Page 50: measures of centrality

Salaries of the 25 players of the New York Red Bulls (American soccer, MLS) in 2012.

33 750, 33 750, 33 750, 33 750, 44 000,
44 000, 44 000, 44 000, 45 566, 65 000,
95 000, 103 500, 112 495, 138 188, 141 666,
181 500, 185 000, 190 000, 194 375, 195 000,
205 000, 292 500, 301 999, 4 600 000, 5 600 000

median = 112 495

mean = 518 311

Mean is not a robust statistic.

Median is a robust statistic.

Robust statistic
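What "robust" means can be seen by dropping the two highest salaries and recomputing both statistics. This is an illustrative sketch using the salary list from this slide; the variable names are made up.

```python
import statistics

# Salaries from the slide.
salaries = [
    33750, 33750, 33750, 33750, 44000,
    44000, 44000, 44000, 45566, 65000,
    95000, 103500, 112495, 138188, 141666,
    181500, 185000, 190000, 194375, 195000,
    205000, 292500, 301999, 4600000, 5600000,
]

mean_all = statistics.mean(salaries)
median_all = statistics.median(salaries)

# Same data without the two extreme earners.
without_top_two = sorted(salaries)[:-2]
mean_rest = statistics.mean(without_top_two)
median_rest = statistics.median(without_top_two)

print(f"all 25 players:  mean = {mean_all:.0f}, median = {median_all}")
print(f"without top two: mean = {mean_rest:.0f}, median = {median_rest}")
```

Two values out of 25 pull the mean from about 120 000 up to about 518 000, while the median only moves from 103 500 to 112 495.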

Page 51: measures of centrality

Trimmed mean

10% trimmed mean … drop the lowest 10% and the highest 10% of the values (here 2 of the 25 at each end), then average the rest

The trimmed mean is more robust than the mean.

33 750, 33 750, 33 750, 33 750, 44 000,
44 000, 44 000, 44 000, 45 566, 65 000,
95 000, 103 500, 112 495, 138 188, 141 666,
181 500, 185 000, 190 000, 194 375, 195 000,
205 000, 292 500, 301 999, 4 600 000, 5 600 000

median = 112 495

mean = 518 311

10% trimmed mean = 128 109
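The trimmed mean can be computed directly from its definition. The sketch below assumes the number of values cut from each end is rounded down (10% of 25 is 2.5, so 2 values are dropped at each end), which reproduces the value on the slide; the function name `trimmed_mean` is illustrative.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end.

    The number of dropped values per end is rounded down.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # values to drop at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Salaries from the slide.
salaries = [
    33750, 33750, 33750, 33750, 44000,
    44000, 44000, 44000, 45566, 65000,
    95000, 103500, 112495, 138188, 141666,
    181500, 185000, 190000, 194375, 195000,
    205000, 292500, 301999, 4600000, 5600000,
]

print("10% trimmed mean:", trimmed_mean(salaries, 0.10))
print("plain mean:      ", trimmed_mean(salaries, 0.0))
```

With a proportion of 0 the function returns the ordinary mean, so the trimmed mean can be seen as a compromise between the mean and the median.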