Body size distribution of European Collembola Lecture 9 Moments of distributions



Body size distribution of European Collembola

Species | Body weight [mg] | ln weight
Tetrodontophora bielanensis (Waga 1842) | 13.471729 | 2.6006
Orchesella chiantica Frati & Szeptycki 1990 | 13.471729 | 2.6006
Disparrhopalites tergestinus Fanciulli, Colla, Dallai 2005 | 12.924837 | 2.5592
Orchesella dallaii Frati & Szeptycki 1990 | 9.4503028 | 2.246
Seira pini Jordana & Arbea 1989 | 9.4503028 | 2.246
Isotomurus pentodon (Kos,1937) | 7.1044808 | 1.9607
Heteromurus (V.) longicornis (Absolon 1900) | 7.1044808 | 1.9607
Pogonognathellus flavescens (Tullberg 1871) | 6.9512714 | 1.9389
Orchesella hoffmanni Stomp 1968 | 6.9512714 | 1.9389
Heteromurus (H) constantinellus Lučić, Ćurčić & Mitić 2007 | 6.3862223 | 1.8541
Pogonognathellus longicornis (Müller 1776) | 6.2133935 | 1.8267
Orchesella devergens Handschin 1924 | 6.2133935 | 1.8267
Orchesella flavescens (Bourlet 1839) | 6.2133935 | 1.8267
Orchesella quinquefasciata (Bourlet 1841) | 6.2133935 | 1.8267

ln body weight [mg] class means | Number of species
-4.71511 | 7
-4.018377 | 53
-3.321643 | 133
-2.624909 | 224
-1.928176 | 353
-1.231442 | 395
-0.534708 | 325
0.162025 | 126
0.858759 | 45
1.555493 | 24
2.252226 | 9

[Figure: Collembola. Histogram of the raw data: number of species (0 to 500) per ln body weight class (-4.72 to 2.25); the mode is marked.]

Weighted mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} \frac{n_i}{n}\, x_i = \sum_{i=1}^{k} x_i f(x_i)$$

Class 1 | Class 2 | Class 3
N = 25 | N = 31 | N = 43
Mean = 1.8169079 | Mean = 1.032923 | Mean = 0.531059
2.6005933 | 1.313477 | 0.651808
2.5591508 | 1.313477 | 0.651808
2.2460468 | 1.313477 | 0.651808
2.2460468 | 1.313477 | 0.651808
1.9607257 | 1.313477 | 0.651808
1.9607257 | 1.301948 | 0.651808
1.9389246 | 1.225568 | 0.651808
1.9389246 | 1.165038 | 0.651808
1.8541429 | 1.165038 | 0.651808
1.8267072 | 1.165038 | 0.651808
1.8267072 | 1.165038 | 0.651808
1.8267072 | 1.006355 | 0.651808
1.8267072 | 1.006355 | 0.651808
1.8267072 | 1.006355 | 0.651808
1.584378 | 1.006355 | 0.651808
1.584378 | 1.006355 | 0.651808
1.584378 | 1.006355 | 0.651808
1.584378 | 1.006355 | 0.613152
1.584378 | 1.006355 | 0.573835
1.584378 | 1.006355 | 0.573835
1.5326904 | 1.006355 | 0.533834
1.5326904 | 0.939683 | 0.493125
1.5064044 | 0.871022 | 0.493125
1.4529137 | 0.871022 | 0.493125
1.4529137 | 0.835906 | 0.493125
– | 0.835906 | 0.493125
– | 0.800247 | 0.489014
– | 0.800247 | 0.451682
– | 0.764026 | 0.451682
– | 0.756712 | 0.451682
– | 0.727225 | 0.451682
– | – | 0.409479

(Only the first 32 of the 43 class 3 values are listed.)

Three Collembolan weight classes

What is the average body weight?

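$$\bar{x} = \frac{25}{99}\cdot 1.817 + \frac{31}{99}\cdot 1.033 + \frac{43}{99}\cdot 0.531 = 1.013$$

Population mean:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Sample mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$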
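The pooled mean of the three classes can be checked with a minimal Python sketch. It weights each class mean by its share n_i/n, using the N and Mean rows of the three-class table rounded to three decimals; the variable names are illustrative.

```python
# Pooled (weighted) mean of three classes: x_bar = sum(n_i * x_i) / n
class_sizes = [25, 31, 43]             # N of classes 1-3
class_means = [1.817, 1.033, 0.531]    # class means, rounded to 3 decimals

n = sum(class_sizes)                   # 99 values in total
pooled_mean = sum(n_i * x_i for n_i, x_i in zip(class_sizes, class_means)) / n

# For contrast: the plain mean of the three class means ignores the class sizes
unweighted_mean = sum(class_means) / len(class_means)

print(f"n = {n}")
print(f"pooled mean     = {pooled_mean:.3f}")
print(f"unweighted mean = {unweighted_mean:.3f}")
```

The unweighted value is printed only for contrast: it treats class 1 (25 values) and class 3 (43 values) as equally important.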

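ln body weight [mg] class means | Number of species | Frequency | Arithmetic mean | Variance
-4.72 | 7 | =B2/B14 | =A2*C2 | =(A2-D14)^2*C2
-4.02 | 53 | 0.031286895 | -0.125723 | 0.202268085
-3.32 | 133 | 0.078512397 | -0.26079 | 0.267516588
-2.62 | 224 | 0.132231405 | -0.347095 | 0.174619987
-1.93 | 353 | 0.208382527 | -0.401798 | 0.042653444
-1.23 | 395 | 0.233175915 | -0.287143 | 0.013917567
-0.53 | 325 | 0.191853601 | -0.102586 | 0.169898317
0.16 | 126 | 0.074380165 | 0.0120514 | 0.199510727
0.86 | 45 | 0.026564345 | 0.0228124 | 0.144774029
1.56 | 24 | 0.014167651 | 0.0220377 | 0.130178627
2.25 | 9 | 0.005312869 | 0.0119658 | 0.073837264
Sum | 1694 | | -1.475751 | 1.462535979
StDev | | | | 1.209353538

The first data row shows the spreadsheet formulas behind each column: frequency = number of species / total, arithmetic-mean term = x_i f(x_i), variance term = (x_i - x̄)² f(x_i), where D14 holds the sum of the arithmetic-mean column, i.e. the mean.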

[Figure: Collembola. Frequency of species (0 to 0.25) per ln body weight class (-4.72 to 2.25).]

$$f(x_i) = \frac{n_i}{n}$$

Weighted mean

Discrete distributions:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} x_i f(x_i)$$

Continuous distributions:

$$\bar{x} = \int_{\min}^{\max} x\, f(x)\,dx$$

The average European springtail has a body weight of $e^{-1.476} = 0.23$ mg.

The most frequently encountered weight is around $e^{-1.23} = 0.29$ mg.
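The spreadsheet columns above (frequency, x_i f(x_i) and (x_i − x̄)² f(x_i)) can be sketched in Python as follows, assuming the full-precision ln class means from the first Collembola table. The last line back-transforms the mean ln weight to milligrams.

```python
import math

# ln body weight class means and number of species per class
ln_class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
                  -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
species_counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]

n = sum(species_counts)                          # total number of species
freqs = [c / n for c in species_counts]          # f(x_i) = n_i / n

# Weighted mean: sum of x_i * f(x_i)
mean_ln = sum(x * f for x, f in zip(ln_class_means, freqs))
# Weighted variance: sum of (x_i - mean)^2 * f(x_i); SD is its square root
var_ln = sum((x - mean_ln) ** 2 * f for x, f in zip(ln_class_means, freqs))
sd_ln = math.sqrt(var_ln)

print(f"n = {n}")
print(f"mean ln weight = {mean_ln:.6f}")
print(f"variance = {var_ln:.6f}, SD = {sd_ln:.6f}")
print(f"back-transformed mean: e^mean = {math.exp(mean_ln):.3f} mg")
```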

Why did we use log-transformed values?

Species | Average body length [mm] | Body weight [mg]
Tetrodontophora bielanensis (Waga 1842) | 7 | 13.472
Orchesella chiantica Frati & Szeptycki 1990 | 7 | 13.472
Disparrhopalites tergestinus Fanciulli, Colla, Dallai 2005 | 6.875 | 12.925
Orchesella dallaii Frati & Szeptycki 1990 | 6 | 9.4503
Seira pini Jordana & Arbea 1989 | 6 | 9.4503
Isotomurus pentodon (Kos,1937) | 5.3 | 7.1045
Heteromurus (V.) longicornis (Absolon 1900) | 5.3 | 7.1045
Pogonognathellus flavescens (Tullberg 1871) | 5.25 | 6.9513
Orchesella hoffmanni Stomp 1968 | 5.25 | 6.9513
Heteromurus (H) constantinellus Lučić, Ćurčić & Mitić 2007 | 5.06 | 6.3862
Pogonognathellus longicornis (Müller 1776) | 5 | 6.2134
Orchesella devergens Handschin 1924 | 5 | 6.2134
Orchesella flavescens (Bourlet 1839) | 5 | 6.2134
Orchesella quinquefasciata (Bourlet 1841) | 5 | 6.2134

Spreadsheet formula of the weight column, shown for a row with length 5 (JEŻELI is the Polish-locale name of IF):

=JEŻELI(B86=0;0;EXP(-1.875+LN(B86)*2.3)), i.e. =IF(B86=0;0;EXP(-1.875+LN(B86)*2.3))

$$W[\mathrm{mg}] = e^{-1.875}\, L[\mathrm{mm}]^{2.3}$$
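A minimal Python version of the length-weight conversion, assuming the coefficients −1.875 and 2.3 from the spreadsheet formula; `weight_mg` is a hypothetical helper name. Like the spreadsheet IF, it returns 0 for an empty (zero) length instead of failing on ln(0).

```python
import math

def weight_mg(length_mm: float) -> float:
    """Body weight [mg] from body length [mm]: W = e^-1.875 * L^2.3.

    Mirrors IF(L=0; 0; EXP(-1.875 + LN(L)*2.3)): a length of 0 gives 0.
    """
    if length_mm == 0:
        return 0.0
    return math.exp(-1.875 + math.log(length_mm) * 2.3)

# Body lengths from the species table
for length in [7, 6.875, 6, 5.3, 5.25, 5.06, 5]:
    print(f"{length:>6} mm -> {weight_mg(length):8.4f} mg")
```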

[Figure: Collembola, number of species (0 to 500) per weight class. Left panel, log-transformed data: ln body weight class (-6 to 4). Right panel, linear data: body weight class (0 to 10).]

The distribution is skewed

Body weight [mg] class means | Number of species | Frequency | Arithmetic mean | Geometric mean
0.01 | 7 | 0.004132231 | 3.702E-05 | -0.019483926
0.02 | 53 | 0.031286895 | 0.0005626 | -0.125722539
0.04 | 133 | 0.078512397 | 0.0028338 | -0.260790153
0.07 | 224 | 0.132231405 | 0.0095797 | -0.347095405
0.15 | 353 | 0.208382527 | 0.0303016 | -0.401798187
0.29 | 395 | 0.233175915 | 0.0680574 | -0.287142615
0.59 | 325 | 0.191853601 | 0.1123956 | -0.102585655
1.18 | 126 | 0.074380165 | 0.0874629 | 0.012051446
2.36 | 45 | 0.026564345 | 0.062698 | 0.02281237
4.74 | 24 | 0.014167651 | 0.0671181 | 0.022037681
9.51 | 9 | 0.005312869 | 0.0505194 | 0.011965782
Sum | 1694 | | 0.491566 | -1.4757512
Exp() | | | | 0.228606933

[Figure: Collembola, number of species (0 to 500) per body weight class (0 to 10).]

$$W = W_0\, L^{z} \quad\Longrightarrow\quad \ln W = \ln W_0 + z \ln L$$

$$W[\mathrm{mg}] = e^{-1.875}\, L[\mathrm{mm}]^{2.3}$$

In the case of exponentially distributed data we have to use the geometric mean. To make things easier, we first log-transform our data.

Geometric mean

$$\bar{x}_{g} = \sqrt[n]{\prod_{i=1}^{n} x_i} = e^{\frac{1}{n}\sum_{i=1}^{n}\ln x_i}$$

The average European springtail has a body weight of $e^{-1.476} = 0.23$ mg.
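To see why the two means differ, this Python sketch recomputes both from the weight-class table, assuming the linear class means are e raised to the ln class means (which reproduces the 0.01 to 9.51 mg classes). The geometric mean is the back-transformed weighted mean of ln(w).

```python
import math

# ln body weight class means and number of species per class
ln_class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
                  -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
species_counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]

n = sum(species_counts)
freqs = [c / n for c in species_counts]                 # f(w_i) = n_i / n
weights_mg = [math.exp(x) for x in ln_class_means]      # linear class means [mg]

# Arithmetic mean of the linear weights: sum of w_i * f(w_i)
arithmetic_mean = sum(w * f for w, f in zip(weights_mg, freqs))
# Geometric mean: exp of the weighted mean of ln(w_i)
geometric_mean = math.exp(sum(math.log(w) * f for w, f in zip(weights_mg, freqs)))

print(f"arithmetic mean = {arithmetic_mean:.3f} mg")
print(f"geometric mean  = {geometric_mean:.3f} mg")
```

On this right-skewed distribution the few heavy species pull the arithmetic mean up to roughly twice the geometric mean.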

Weight classes are lb-scaled (binary logarithm, log₂): each class mean is about twice the previous one.

How to use geometric means

A tropical forest is logged over three years: 0.1% of the remaining area in the first year, 1% in the second year and 10% in the third year.

Hence the forest area remaining after three years is

$$A_3 = (1-0.001)(1-0.01)(1-0.1)\,A_0 = 0.890\,A_0$$

About 11% of the area has been logged during the three years. What is the mean logging rate per year?

Arithmetic mean

$$\frac{0.999 + 0.99 + 0.9}{3} = 0.963 \qquad A_3 = 0.963^3\, A_0 = 0.893\,A_0$$

Geometric mean

$$(0.999 \cdot 0.99 \cdot 0.9)^{1/3} = 0.962 \qquad A_3 = 0.962^3\, A_0 = 0.890\,A_0$$

$$\bar{x}_{g} = \sqrt[n]{\prod_{i=1}^{n} x_i} = e^{\frac{1}{n}\sum_{i=1}^{n}\ln x_i}$$

In multiplicative processes we should use the geometric mean.
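The logging example in Python, as a minimal sketch: `math.prod` (Python 3.8+) gives the exact surviving share, and only the geometric mean reproduces it when raised to the third power.

```python
import math

# Share of forest area that survives each year's logging (0.1 %, 1 %, 10 %)
remaining = [1 - 0.001, 1 - 0.01, 1 - 0.10]
years = len(remaining)

total = math.prod(remaining)               # area left after three years
arith = sum(remaining) / years             # arithmetic mean yearly factor
geom = total ** (1 / years)                # geometric mean yearly factor
# Same geometric mean via logarithms: exp(mean of ln values)
geom_via_logs = math.exp(sum(math.log(r) for r in remaining) / years)

print(f"true:       A3 = {total:.4f} A0")
print(f"arithmetic: A3 = {arith ** years:.4f} A0")
print(f"geometric:  A3 = {geom ** years:.4f} A0")
```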


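Variance

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$$

Discrete distributions:

$$s^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 f(x_i)$$

Continuous distributions:

$$s^2 = \int_{\min}^{\max}(x - \bar{x})^2 f(x)\,dx$$

Standard deviation: $s = \sqrt{s^2}$

[Figure: a distribution with its mean and the ± 1 SD range marked.]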

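The standard deviation is a measure of the width of the statistical distribution that has the same dimension as the mean.

Degrees of freedom: the sample variance s² divides by n - 1 rather than n, because one degree of freedom is used up by estimating the mean from the same data.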

The standard deviation as a measure of errors

Environmental pollution

Station | NOx [ppm]
1 | 8.49
2 | 1.12
3 | 9.11
4 | 7.75
5 | 0.75
6 | 8.23
7 | 0.97
8 | 6.06
9 | 8.48
10 | 5.88
11 | 8.51
12 | 9.62
13 | 3.35
14 | 7.74
15 | 2.03
16 | 5.06
17 | 7.61
18 | 0.99
19 | 2.55
20 | 8.91
Mean | 5.66
Variance | 10.45
Standard deviation | 3.23

Distance [km] | Average NOx concentration | Standard deviation
1 | 9.53 | 1.70
2 | 7.37 | 1.18
3 | 5.24 | 0.86
4 | 3.15 | 0.26
5 | 2.17 | 0.18
6 | 1.05 | 0.09
7 | 0.84 | 0.14
8 | 0.63 | 0.10
9 | 0.32 | 0.03
10 | 0.21 | 0.02
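The summary values for the 20 stations can be reproduced with Python's standard `statistics` module. `statistics.variance` and `statistics.stdev` divide by n − 1, which matches the tabulated 10.45 and 3.23; `pvariance` divides by n and is shown only for comparison.

```python
import statistics

# NOx concentrations [ppm] at stations 1-20
nox_ppm = [8.49, 1.12, 9.11, 7.75, 0.75, 8.23, 0.97, 6.06, 8.48, 5.88,
           8.51, 9.62, 3.35, 7.74, 2.03, 5.06, 7.61, 0.99, 2.55, 8.91]

mean = statistics.mean(nox_ppm)
var = statistics.variance(nox_ppm)     # sample variance s^2, divides by n - 1
sd = statistics.stdev(nox_ppm)         # s = sqrt(s^2)
pvar = statistics.pvariance(nox_ppm)   # population form, divides by n

print(f"mean = {mean:.4f}, variance = {var:.4f}, SD = {sd:.4f}")
print(f"dividing by n instead: variance = {pvar:.4f}")
```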

The precision of derived metrics should always match the precision of the raw data

[Figure: average NOx concentration (0 to 14) against distance [km] (1 to 10).]

± 1 standard deviation is the most often used estimator of error. For normally distributed data, about 68% of the observations lie within ± 1 standard deviation of the mean, and about 95% lie within ± 2 standard deviations.

[Error bars: ± 1 standard deviation]

Mean | Standard deviation
5.44 | 4.15
4.49 | 5.29
5.55 | 3.39
5.56 | 3.13

Standard deviation and standard error

Environmental pollution


The standard deviation stays roughly constant irrespective of sample size.

The precision of the estimate of the mean should increase with sample size n.

The standard error is a measure of this precision.

$$SE = \frac{SD}{\sqrt{n}}$$

Distance [km] | Average NOx concentration | Standard deviation | Standard error (n = 20)
1 | 9.53 | 3.32 | 0.74
2 | 7.37 | 2.45 | 0.55
3 | 5.24 | 1.24 | 0.28
4 | 3.15 | 0.67 | 0.15
5 | 2.17 | 0.87 | 0.19
6 | 1.05 | 0.34 | 0.08
7 | 0.84 | 0.14 | 0.03
8 | 0.63 | 0.10 | 0.02
9 | 0.32 | 0.03 | 0.01
10 | 0.21 | 0.02 | 0.01
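A short sketch of SE = SD/√n applied to the standard-deviation column, assuming n = 20 for every distance as stated in the table header.

```python
import math

n = 20   # sample size given in the table header
# Standard deviations of NOx concentration at 1-10 km
sd_by_distance = [3.32, 2.45, 1.24, 0.67, 0.87, 0.34, 0.14, 0.10, 0.03, 0.02]

# SE = SD / sqrt(n): precision of each mean, shrinking as n grows
se_by_distance = [sd / math.sqrt(n) for sd in sd_by_distance]

for km, (sd, se) in enumerate(zip(sd_by_distance, se_by_distance), start=1):
    print(f"{km:>2} km: SD = {sd:.2f}, SE = {se:.3f}")
```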

[Figure: average NOx concentration (0 to 12) against distance [km] (1 to 10).]

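$$s^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 f(x_i) = \sum_{i=1}^{n}x_i^2 f(x_i) - 2\bar{x}\sum_{i=1}^{n}x_i f(x_i) + \bar{x}^2\sum_{i=1}^{n} f(x_i)$$

$$s^2 = \sum_{i=1}^{n}x_i^2 f(x_i) - 2\bar{x}\,\bar{x} + \bar{x}^2 \cdot 1 = \sum_{i=1}^{n}x_i^2 f(x_i) - \bar{x}^2$$

$$s^2 = E(x^2) - [E(x)]^2$$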

The variance is the difference between the mean of the squared values and the squared mean
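A numerical check of the identity s² = E(x²) − [E(x)]², using the Collembola ln weight classes with full-precision class means. Both routes should agree up to floating-point rounding.

```python
# ln body weight class means and species counts (Collembola)
ln_class_means = [-4.71511, -4.018377, -3.321643, -2.624909, -1.928176,
                  -1.231442, -0.534708, 0.162025, 0.858759, 1.555493, 2.252226]
species_counts = [7, 53, 133, 224, 353, 395, 325, 126, 45, 24, 9]

n = sum(species_counts)
freqs = [c / n for c in species_counts]                        # f(x_i) = n_i / n

mean = sum(x * f for x, f in zip(ln_class_means, freqs))       # E(x)
e_x2 = sum(x * x * f for x, f in zip(ln_class_means, freqs))   # E(x^2)

# Definition: squared deviations weighted by f(x_i)
var_direct = sum((x - mean) ** 2 * f for x, f in zip(ln_class_means, freqs))
# Shortcut: mean of the squares minus the square of the mean
var_shortcut = e_x2 - mean ** 2

print(f"direct:   {var_direct:.6f}")
print(f"shortcut: {var_shortcut:.6f}")
```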

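Mathematical expectation (k-th moment):

$$E(X^k) = \sum_{i=1}^{n} x_i^k f(x_i) \qquad E(X^k) = \int x^k f(x)\,dx$$

Central moments: the k-th central moment is $E\big((X-\mu)^k\big)$. The first moment $E(X) = \mu$ describes the central tendency; the variance is the second central moment:

$$\sigma^2 = \sum_{i=1}^{n}(x_i-\mu)^2 f(x_i) = E\big((X-\mu)^2\big)$$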

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n}x_i\right)^2$$
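The moment definitions can be written as two small Python helpers. As a hypothetical example they are applied to a fair die (f(x_i) = 1/6), using `fractions.Fraction` so the results are exact; with equal weights the shortcut E(X²) − [E(X)]² is the same formula as s² above.

```python
from fractions import Fraction

def raw_moment(values, probs, k):
    """k-th moment E(X^k) = sum of x_i^k f(x_i)."""
    return sum(p * x ** k for x, p in zip(values, probs))

def central_moment(values, probs, k):
    """k-th central moment E((X - mu)^k), with mu = E(X)."""
    mu = raw_moment(values, probs, 1)
    return sum(p * (x - mu) ** k for x, p in zip(values, probs))

# Hypothetical example: a fair die, every face has f(x_i) = 1/6
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * len(faces)

mean = raw_moment(faces, probs, 1)
variance = central_moment(faces, probs, 2)
shortcut = raw_moment(faces, probs, 2) - mean ** 2
third = central_moment(faces, probs, 3)     # 0 for a symmetric distribution

print(f"E(X) = {mean}, variance = {variance}, shortcut = {shortcut}, 3rd central = {third}")
```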

Frequency distributions of resource use or wealth in a population can be described by a power law (the famous Pareto-Zipf law) with exponents that often have values around -5/2. What are the mean and the variance of such a power function distribution?

$$\int_{\min}^{\max} f(x)\,dx = 1 \qquad f(x) = a\,x^{-z}$$

z | Mean | Variance | StDev
2.5 | 1.50942 | 18.9121 | 4.34881

x | f(x) | f(x)/sum | x f(x) | (x-m)² f(x)
1 | 1 | 0.756475 | 0.756475 | 0.566929
2 | 0.176777 | 0.133727 | 0.267454 | 1.542484
3 | 0.06415 | 0.048528 | 0.145584 | 1.860055
4 | 0.03125 | 0.02364 | 0.094559 | 2.001836
5 | 0.017889 | 0.013532 | 0.067661 | 2.078674
6 | 0.01134 | 0.008579 | 0.051472 | 2.12562
7 | 0.007714 | 0.005835 | 0.040846 | 2.156716
8 | 0.005524 | 0.004179 | 0.033432 | 2.178548
9 | 0.004115 | 0.003113 | 0.028018 | 2.194559
10 | 0.003162 | 0.002392 | 0.023922 | 2.206711
Sum | 1.32192 | 1 | 1.50942 | 18.9121

[Figure: frequency (log scale, 0.001 to 1) against wealth class (1 to 10), showing $f(x) = x^{-5/2}$ and the normalized $f(x) = x^{-5/2}/1.32$.]

Discrete distribution

Most people are in the lowest income class, and the average lies halfway between the first and the second class.
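A minimal Python sketch of the discrete calculation, assuming f(x) ∝ x^(-5/2) on the wealth classes x = 1 to 10 as in the table. It normalizes by the sum of x^(-5/2) and then takes the weighted mean and the weighted sum of squared deviations. The mean matches the tabulated 1.50942, while the variance comes out near 1.52 (SD about 1.23); the tabulated (x−m)²f(x) column does not follow from the listed f(x)/sum values, so its total of 18.9121 is worth re-checking.

```python
xs = list(range(1, 11))                 # wealth classes 1..10
raw = [x ** -2.5 for x in xs]           # unnormalized f(x) = x^(-5/2)
total = sum(raw)                        # normalizing sum
f = [r / total for r in raw]            # f(x)/sum, adds up to 1

mean = sum(x * fx for x, fx in zip(xs, f))
variance = sum((x - mean) ** 2 * fx for x, fx in zip(xs, f))
sd = variance ** 0.5

print(f"sum = {total:.5f}, mean = {mean:.5f}, variance = {variance:.4f}, SD = {sd:.4f}")
```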

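$$\int_{0.5}^{10.5} a\,x^{-5/2}\,dx = \left[-\tfrac{2}{3}\,a\,x^{-3/2}\right]_{0.5}^{10.5} = \tfrac{2}{3}\,a\left(0.5^{-3/2} - 10.5^{-3/2}\right) = 1.87\,a = 1 \;\Rightarrow\; a = 0.536$$

$$\bar{x} = \int_{0.5}^{10.5} x \cdot a\,x^{-5/2}\,dx = \left[-2a\,x^{-1/2}\right]_{0.5}^{10.5} = 2 \cdot 0.536\left(0.5^{-1/2} - 10.5^{-1/2}\right) = 1.18$$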

Continuous approximation

[Figure: continuous approximation; frequency (log scale, 0.001 to 1) against wealth class (0 to 10).]

An upper bound of 10 would cover only half of the last column, so the integral runs from 0.5 to 10.5.

Note that the y-axis is on a log scale.

$$a = \frac{1}{1.322} = 0.756 \qquad \bar{x} = \int_{0.5}^{10.5} x \cdot 0.756\,x^{-5/2}\,dx = 2 \cdot 0.756\left(0.5^{-1/2} - 10.5^{-1/2}\right) = 1.67$$

The estimate of a is imprecise
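The continuous approximation can be checked numerically. This sketch integrates with a small hand-written composite Simpson rule (an illustrative helper, not a library function) over [0.5, 10.5], so that each wealth class x covers x ± 0.5. It compares the continuous normalization a = 1/∫x^(-5/2)dx (about 0.536) with a taken from the discrete sum (about 0.756), and the mean each one gives (about 1.18 and 1.67).

```python
def simpson(fn, lo, hi, steps=100_000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * fn(lo + i * h)
    return total * h / 3

LO, HI = 0.5, 10.5   # wealth classes 1..10, each covering x +/- 0.5

# Continuous normalization: a = 1 / integral of x^(-5/2)
a_cont = 1 / simpson(lambda x: x ** -2.5, LO, HI)
# Closed form of the same integral: (2/3)(0.5^(-3/2) - 10.5^(-3/2))
a_closed = 1 / ((2 / 3) * (LO ** -1.5 - HI ** -1.5))

# First moment: integral of x * x^(-5/2) = x^(-3/2)
first = simpson(lambda x: x ** -1.5, LO, HI)
mean_cont = a_cont * first

# Alternative: a taken from the discrete sum of x^(-5/2) over x = 1..10
a_discrete = 1 / sum(k ** -2.5 for k in range(1, 11))
mean_discrete_a = a_discrete * first

print(f"continuous a = {a_cont:.4f} (closed form {a_closed:.4f}), mean = {mean_cont:.3f}")
print(f"discrete a   = {a_discrete:.4f}, mean = {mean_discrete_a:.3f}")
```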

The Arrhenius probability model assumes the same probability of an event irrespective of the time that has elapsed since the start.

What are the mean and the variance of such a distribution?

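$$\int_{\min}^{\max} f(x)\,dx = 1$$

$$f(t) = a\,e^{-t}$$

$$\int_0^\infty a\,e^{-t}\,dt = \left[-a\,e^{-t}\right]_0^\infty = a = 1$$

Cumulative distribution function: $F(t) = 1 - e^{-t}$

Mean:

$$E(t) = \int_0^\infty t\,e^{-t}\,dt = \left[-(t+1)\,e^{-t}\right]_0^\infty = 1$$

Variance:

$$E(t^2) = \int_0^\infty t^2 e^{-t}\,dt = \left[-(t^2+2t+2)\,e^{-t}\right]_0^\infty = 2 \qquad \sigma^2 = E(t^2) - [E(t)]^2 = 2 - 1^2 = 1$$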

[Figure: f(x) (0 to 1) against x (0 to 8).]
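A quick numerical check of the exponential model f(t) = e^(-t) (a = 1), using a small hand-written Simpson rule. The infinite upper limit is replaced by t = 50, an assumption that is harmless here because e^(-50) is negligible. The last lines illustrate the "same probability irrespective of elapsed time" property: the chance of surviving a further 0.5 time units is the same after 2 units as at the start.

```python
import math

def simpson(fn, lo, hi, steps=20_000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * fn(lo + i * h)
    return total * h / 3

def pdf(t):
    return math.exp(-t)            # f(t) = a e^-t with a = 1

def survival(t):
    return math.exp(-t)            # P(T > t) = 1 - F(t)

UPPER = 50.0                       # stands in for infinity

area = simpson(pdf, 0.0, UPPER)
mean = simpson(lambda t: t * pdf(t), 0.0, UPPER)
second_moment = simpson(lambda t: t * t * pdf(t), 0.0, UPPER)
variance = second_moment - mean ** 2

# Memorylessness: P(T > s + t | T > s) equals P(T > t)
s, t = 2.0, 0.5
conditional = survival(s + t) / survival(s)

print(f"area = {area:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
print(f"P(T > 2.5 | T > 2) = {conditional:.4f}, P(T > 0.5) = {survival(t):.4f}")
```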

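Skewness

$$\gamma = \frac{E\big((X-\mu)^3\big)}{\sigma^3}$$

Third central moment:

$$E\big((X-\mu)^3\big) = E(X^3) - 3\mu E(X^2) + 3\mu^2 E(X) - \mu^3 = E(X^3) - 3\mu E(X^2) + 2\mu^3$$

Kurtosis

$$\delta = \frac{E\big((X-\mu)^4\big)}{\sigma^4} - 3$$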

[Figure: example densities f(x). Skewness: γ = 0 symmetric distribution, γ > 0 right-skewed distribution, γ < 0 left-skewed distribution. Kurtosis: δ = 0 and δ > 0.]
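A sketch of the skewness and kurtosis formulas. `central_moments_from_raw` expands the central moments into raw moments (the third line is the expansion above; the fourth is the analogous one for k = 4). For the exponential density e^(-x), whose raw moments are E(X^k) = k!, this gives γ = 2 and δ = 6; a symmetric set of values gives γ = 0. The helper names are illustrative.

```python
import math

def central_moments_from_raw(m1, m2, m3, m4):
    """Variance and 3rd/4th central moments from raw moments E(X^k)."""
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return var, mu3, mu4

def skewness_kurtosis(var, mu3, mu4):
    """gamma = mu3 / sigma^3; delta = mu4 / sigma^4 - 3 (0 for a normal distribution)."""
    return mu3 / math.sqrt(var) ** 3, mu4 / var ** 2 - 3

def sample_skewness_kurtosis(xs):
    """Plug-in (divide-by-n) estimates from a list of values."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    mu3 = sum((x - m) ** 3 for x in xs) / n
    mu4 = sum((x - m) ** 4 for x in xs) / n
    return skewness_kurtosis(var, mu3, mu4)

# Exponential density e^-x: raw moments E(X^k) = k! = 1, 2, 6, 24
gamma_exp, delta_exp = skewness_kurtosis(*central_moments_from_raw(1, 2, 6, 24))

# A symmetric set of values
gamma_sym, delta_sym = sample_skewness_kurtosis([1, 2, 3, 4, 5])

print(f"exponential: gamma = {gamma_exp}, delta = {delta_exp}")
print(f"symmetric:   gamma = {gamma_sym}, delta = {delta_sym:.2f}")
```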

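How to get the mode?

We need the maximum of the pdf.

[Figure: f(x) (0 to 1) against x (0 to 6), with the mode marked.]

$$y = x\,e^{-x}$$

Mode:

$$\frac{dy}{dx} = e^{-x} - x\,e^{-x} = e^{-x}(1 - x) = 0 \;\Rightarrow\; x = 1$$

The function is a probability distribution if its integral equals 1:

$$\int_0^\infty x\,e^{-x}\,dx = 1$$

Arithmetic mean:

$$E(x) = \int_0^\infty x \cdot x\,e^{-x}\,dx = \int_0^\infty x^2 e^{-x}\,dx = \left[-(x^2+2x+2)\,e^{-x}\right]_0^\infty = 2$$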
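The mode and mean of y = x e^(-x) can be confirmed numerically. This sketch finds the maximum of the pdf by a fine grid search (an assumption-free but brute-force stand-in for setting the derivative to zero) and integrates with a small hand-written Simpson rule, replacing the infinite upper limit by 60.

```python
import math

def pdf(x):
    return x * math.exp(-x)        # y = x e^-x

def simpson(fn, lo, hi, steps=20_000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * fn(lo + i * h)
    return total * h / 3

# Mode: maximum of the pdf on a grid over [0, 10] with spacing 0.0001
grid = [i / 10_000 for i in range(100_001)]
mode = max(grid, key=pdf)

# Normalization and mean; [0, 60] stands in for [0, infinity)
area = simpson(pdf, 0.0, 60.0)
mean = simpson(lambda x: x * pdf(x), 0.0, 60.0)

print(f"mode = {mode:.4f}, area = {area:.6f}, mean = {mean:.6f}")
```

Mode (1) and mean (2) differ because the distribution is right-skewed.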

Body volumes are estimated from measurements of height × length × width. Assume you estimated the thorax volume of insects and used this volume to infer the body weight.

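$$V = c \cdot \text{Length} \cdot \text{Height} \cdot \text{Width}$$

$$W[\mathrm{mg}] = a\,(\text{Length} \cdot \text{Height} \cdot \text{Width})^{z}$$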

How to get the parameters a and z?

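Standard deviation is a measure of accuracy (error)

Independent measurements:

$$\sigma_{total}^2 = \sum_{i=1}^{n}\sigma_i^2; \qquad \sigma_{total} = \sqrt{\sum_{i=1}^{n}\sigma_i^2}$$

$$\sigma_{total}^2 = 0.02^2 + 0.02^2 + 0.02^2 = 0.0012 \qquad \sigma_{total} = \sqrt{0.0012} = 0.035$$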
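The quadrature sum for independent errors, as a minimal Python sketch. For a product such as Length · Height · Width the σ_i are relative errors (here ±2 % each), which combine this way to first order when the measurements are independent; `combined_error` is a hypothetical helper name.

```python
import math

def combined_error(*errors):
    """Independent errors combine in quadrature: sigma_total = sqrt(sum sigma_i^2)."""
    return math.sqrt(sum(e ** 2 for e in errors))

# Relative errors of length, height and width: +/-2 % each
sigma_total = combined_error(0.02, 0.02, 0.02)

print(f"sigma_total^2 = {sigma_total ** 2:.4f}, sigma_total = {sigma_total:.3f}")
```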

Body weights are estimated from a regression of species weights against thorax volume.

$$y = 1.7754\,x^{0.6072}$$

[Figure: dry weight (0 to 3) against thorax volume (0 to 2), with the fitted power function.]

The body weight of a new species is estimated from the regression function

Height, length and width could be measured with an accuracy of ± 2%.

The error of the thorax estimate is 3.5%.

$$W[\mathrm{mg}] = a\,(\text{Length} \cdot \text{Height} \cdot \text{Width})^{z}$$
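The step from the 3.5 % volume error to the weight error can be sketched as follows. This uses an assumption rather than a step stated here: for W = a V^z, relative errors scale by the exponent to first order, so the weight error is roughly z times the volume error, with the fitted coefficients 1.7754 and 0.6072 treated as exact. The thorax volume value is hypothetical.

```python
import math

# Fitted power function from the plot: dry weight = 1.7754 * (thorax volume)^0.6072
A, Z = 1.7754, 0.6072

def dry_weight(thorax_volume):
    """Dry weight predicted by the fitted regression (units as in the plot)."""
    return A * thorax_volume ** Z

# Relative error of the volume: three independent +/-2 % measurements in quadrature
rel_volume_error = math.sqrt(3 * 0.02 ** 2)

# First-order propagation through a power law: dW/W = Z * dV/V
# (assumption: A and Z are exact, so the regression's own uncertainty is ignored)
rel_weight_error = Z * rel_volume_error

volume = 1.2   # hypothetical thorax volume of a new species, for illustration only
w = dry_weight(volume)
print(f"relative volume error = {rel_volume_error:.3f}")
print(f"relative weight error = {rel_weight_error:.3f}")
print(f"W = {w:.3f} +/- {w * rel_weight_error:.3f}")
```

Because z < 1, the weight estimate is less sensitive to volume errors than the volume itself; a full analysis would also include the scatter around the regression line.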

Homework and literature

Refresh:

• Arithmetic, geometric, harmonic mean
• Cauchy inequality
• Statistical distribution
• Probability distribution
• Moments of distributions
• Error law of Gauß
• Bootstrap

Prepare for the next lecture:

• Binomial distribution
• Mean and variance of the binomial distribution
• Poisson distribution
• Mean and variance of the Poisson distribution
• Moments of distributions
• DNA mutations
• Transition matrix

Literature: Łomnicki: Statystyka dla biologów (Statistics for biologists)

Binomial distribution: http://www.stat.yale.edu/Courses/1997-98/101/binom.htm
Poisson distribution: http://en.wikipedia.org/wiki/Poisson_distribution