Statistical model for count data Speaker : Tzu-Chun Lo Advisor : Yao-Ting Huang

Page 1

Statistical model for count data
Speaker: Tzu-Chun Lo
Advisor: Yao-Ting Huang

Page 2

Outline

• Why use a statistical model
• Target
  ▫ Gene expression
• Binomial distribution
  ▫ Poisson distribution
• Overdispersion
• Negative binomial
  ▫ Chi-square approximation
• Conclusion

Page 3

Statistical model

• A statistical model is a probability distribution constructed to enable inferences to be drawn or decisions to be made from data.

[Diagram: a sample is drawn from a population (e.g. height, weight); information from the sample (mean, variance, size) feeds inference, which leads to a decision by hypothesis testing. Other labels: designer, consumer.]

We have to choose a statistical model for the sample (mean, variance).

Page 4

Target

• Gene expression
  ▫ We would like to use a statistical model to test whether an observed difference in read counts is significant.

[Figure annotations: "Looks like a significant region"; "How about this one? Can we be sure?"; "Noise or not?"]

Page 5

Count data

• A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking.
• An individual piece of count data is often termed a count variable.

Binomial, Poisson and negative binomial: all of them are distributions for this type of data.

Page 6

Binomial distribution

• The number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
• Notation:
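The binomial notation can be written in the standard textbook form below. The symbols X (the count of successes) and k (an observed value) are assumptions introduced here for illustration; n and p are as defined above.

```latex
X \sim \mathrm{B}(n, p), \qquad
f(k) = P(X = k) = \binom{n}{k}\, p^{k} (1 - p)^{n - k}, \quad k = 0, 1, \dots, n
```

For instance, n = 3, p = 0.8 and k = 2 give 3 · 0.8² · 0.2 = 0.384.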

 

Page 7

Binomial distribution

Ex: p = 0.8, (1 - p) = 0.2, trials: 3, successes: 2
Possible outcomes: (1 1 0) (1 0 1) (0 1 1)
f(2) = 0.384

33 goals in 110 shots this season
Success: 0.3, Fail: 0.7

What is the probability that he scores 6 goals in 10 shots?

Page 8

Binomial distribution

• Exactly six goals

• At most three goals

[Figure: binomial(n=10, p=0.3); x-axis: goals (0 to 10); y-axis: probability (0.00 to 0.25)]
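The two questions (exactly six goals, at most three goals) can be worked through numerically. The following is a minimal sketch using only the Python standard library; the function names `binomial_pmf` and `binomial_cdf` are illustrative choices, not part of the original slides.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Small example: two successes in three trials with p = 0.8.
f2 = binomial_pmf(2, 3, 0.8)

# Striker example: 33 goals in 110 shots gives a success probability of 0.3.
p_goal = 33 / 110
exactly_six = binomial_pmf(6, 10, p_goal)
at_most_three = binomial_cdf(3, 10, p_goal)

print(f"f(2)      = {f2:.3f}")             # 0.384
print(f"P(X = 6)  = {exactly_six:.4f}")    # 0.0368
print(f"P(X <= 3) = {at_most_three:.4f}")  # 0.6496
```

Under this model, six goals in ten shots would be a rare event (under 4%), while three or fewer goals is the most likely range.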

Page 9

Poisson distribution

• Expresses the probability of a given number of events occurring in a fixed interval.
• Notation:
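The Poisson notation can be written in the standard textbook form below. The symbol λ (the average number of events per interval) and the variable names X and k are assumptions introduced here for illustration.

```latex
X \sim \mathrm{Pois}(\lambda), \qquad
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots
\qquad \mathrm{E}[X] = \mathrm{Var}(X) = \lambda
```

The equality of mean and variance is the property that real count data often violate.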

Page 10

Page 11

Poisson distribution

• Suppose interval: goals per game

e = 2.718281828…

[Figure: panel title "binomial(n=10,p=0.3)"; x-axis: goals (0 to 10); y-axis: probability (0.00 to 0.25)]

Page 12

Poisson

[Figure: number of games by goals per game, Poisson vs. raw data; x-axis: goals (0 to 8); y-axis: games (0 to 3)]

• Total: 11 games
• Score: 33 goals
• (33/11) = 3 goals per game
• Poisson:
• Raw data:
• In this case, a test based on the Poisson model could be inaccurate.

Number of games by goals per game:

Goals per game   0    1    2    3    4    5    6    7
Poisson          0.5  1.6  2.5  2.5  1.8  1.1  0.6  0.2
Raw data         1    2    2    2    2    0    1    1
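The Poisson row of the table can be reproduced by multiplying the Poisson probability of each goal count by the number of games. This is a minimal sketch with the standard library; the helper name `poisson_pmf` and the variable names are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean rate is lam."""
    return lam**k * exp(-lam) / factorial(k)


games = 11
goals = 33
lam = goals / games  # 3 goals per game

# Observed number of games with 0, 1, ..., 7 goals (raw data row of the table).
raw_games = [1, 2, 2, 2, 2, 0, 1, 1]

# Expected number of games with k goals under Poisson(lam), rounded to one decimal.
expected_games = [round(games * poisson_pmf(k, lam), 1) for k in range(8)]

for k, (expected, observed) in enumerate(zip(expected_games, raw_games)):
    print(f"{k} goals: Poisson {expected:.1f} games, raw data {observed} games")
```

The expected counts peak at 2 to 3 goals, while the raw data are flatter and have games with 6 and 7 goals, which the Poisson model considers unlikely.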

Page 13

Overdispersion

• The presence of greater variability (statistical dispersion) in a data set than would be expected based on a given simple statistical model.
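A simple way to check for overdispersion relative to a Poisson model is to compare the sample variance with the sample mean, since a Poisson variable has variance equal to its mean. The sketch below applies this to the goals-per-game raw data from the Poisson slide; the use of the sample variance (n - 1 denominator) and the name `dispersion_index` are assumptions made for illustration.

```python
from statistics import mean, variance

# Number of games with 0, 1, ..., 7 goals (raw data from the Poisson slide).
raw_games = [1, 2, 2, 2, 2, 0, 1, 1]

# Expand the frequency table into one observation per game.
observations = [k for k, n_games in enumerate(raw_games) for _ in range(n_games)]

sample_mean = mean(observations)     # 3 goals per game
sample_var = variance(observations)  # sample variance, n - 1 denominator

# For Poisson data this ratio should be close to 1.
dispersion_index = sample_var / sample_mean

print(f"mean = {sample_mean}, variance = {sample_var:.2f}, "
      f"variance / mean = {dispersion_index:.2f}")
```

Here the variance (4.6) exceeds the mean (3), so the data are more variable than a Poisson model with the same mean would predict.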

Page 14

Negative binomial

• Gamma-Poisson (mixture) distribution
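The mixture construction can be sketched as follows. The parameterization by a mean μ and a dispersion parameter k is an assumption chosen to match the symbols used in the chi-square approximation later in the talk; other parameterizations are equally common.

```latex
X \mid \lambda \sim \mathrm{Pois}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}\!\left(\text{shape } k,\ \text{scale } \tfrac{\mu}{k}\right)

P(X = x) = \int_{0}^{\infty} \frac{\lambda^{x} e^{-\lambda}}{x!}\,
           \frac{\lambda^{k-1} e^{-\lambda k / \mu}}{\Gamma(k)\,(\mu/k)^{k}}\, d\lambda
         = \frac{\Gamma(x + k)}{\Gamma(k)\, x!}
           \left(\frac{k}{k + \mu}\right)^{k}
           \left(\frac{\mu}{k + \mu}\right)^{x}

\mathrm{E}[X] = \mu, \qquad \mathrm{Var}(X) = \mu + \frac{\mu^{2}}{k}
```

The extra term μ²/k is what lets the negative binomial absorb overdispersion; as k grows large it vanishes and the Poisson case is recovered.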

Page 15

Negative binomial

Page 16

Parameter estimation
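One common way to estimate negative binomial parameters is the method of moments. This is an assumption for illustration; another estimator, such as maximum likelihood, may have been intended. The sketch assumes the mean/dispersion parameterization with variance μ + μ²/k, and the function name `nb_method_of_moments` is hypothetical.

```python
from statistics import mean, variance


def nb_method_of_moments(counts):
    """Estimate (mu, k) by matching the sample mean and variance.

    Assumes Var(X) = mu + mu**2 / k, so k = mu**2 / (s2 - mu).
    """
    mu_hat = mean(counts)
    s2 = variance(counts)  # sample variance, n - 1 denominator
    if s2 <= mu_hat:
        raise ValueError("sample variance does not exceed the mean; "
                         "no overdispersion to model")
    k_hat = mu_hat**2 / (s2 - mu_hat)
    return mu_hat, k_hat


# Goals per game for the 11 games in the raw data table.
goals_per_game = [0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 7]
mu_hat, k_hat = nb_method_of_moments(goals_per_game)

print(f"mu_hat = {mu_hat}, k_hat = {k_hat:.3f}")
```

For these data the estimates are a mean of 3 and a dispersion parameter of about 5.6; a smaller k would indicate stronger overdispersion.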

Page 17

Approximate control limits

• Chi-square approximation

$v = \dfrac{2\mu}{1 + \mu/k}$
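Reading the degrees-of-freedom expression as v = 2μ / (1 + μ/k), it is what moment matching gives when a negative binomial count with mean μ and variance μ + μ²/k is approximated by a scaled chi-square variable c·χ²(v). The scale factor c = (1 + μ/k) / 2 and the example values of μ and k below are assumptions introduced for illustration, not values from the slides.

```python
def chi_square_match(mu: float, k: float):
    """Return (c, v) so that c * chi2(v) has the negative binomial mean and variance."""
    inflation = 1 + mu / k  # variance-to-mean ratio of the negative binomial
    v = 2 * mu / inflation  # degrees of freedom
    c = inflation / 2       # scale factor (assumed, from moment matching)
    return c, v


# Hypothetical parameter values, e.g. from a method-of-moments fit.
mu, k = 3.0, 5.625
c, v = chi_square_match(mu, k)

# A chi-square variable with v degrees of freedom has mean v and variance 2v.
approx_mean = c * v
approx_var = 2 * c**2 * v

print(f"v = {v:.3f}, c = {c:.3f}")
print(f"mean: {approx_mean:.3f} vs {mu:.3f}")
print(f"variance: {approx_var:.3f} vs {mu + mu**2 / k:.3f}")
```

Control limits would then come from chi-square quantiles with v degrees of freedom multiplied by c; computing those quantiles needs a statistics library and is left out of this sketch.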

Page 18

Example

= 67.0

Page 19

Page 20

Page 21

Conclusion

• Conclusion

• Thanks for your attention

Page 22

Statistical model

• Suitable type
  ▫ Which distribution should we use?
• Parameters
  ▫ Get some information from the data
• Inference
  ▫ What do we want to know?
  ▫ How could we make a decision? Hypothesis testing

Page 23

Statistical model

• Suitable type
  ▫ Binomial distribution
• Parameters
  ▫ n = 10, p = 0.7
• Inference
  ▫ 2 successes

[Figure: dbinom(0:10, n=10, p=0.3); x-axis: goals (0 to 10); y-axis: probability (0.00 to 0.25)]

Page 24

Multinomial distribution

• The analog of the Bernoulli distribution is the categorical distribution, where each trial results in exactly one of some fixed finite number k of possible outcomes.
• http://en.wikipedia.org/wiki/Multinomial_distribution

Page 25

Trinomial distribution
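The trinomial distribution is the multinomial case with three possible outcomes per trial. The sketch below computes a multinomial probability with the standard library; the function name `multinomial_pmf` and the example probabilities (0.5, 0.3, 0.2) are hypothetical values chosen for illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing the given counts for each outcome category."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    # Multinomial coefficient n! / (c1! * c2! * ... * ck!), computed exactly.
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(p**c for p, c in zip(probs, counts))


# Trinomial example: 3 trials, one outcome of each kind.
trinomial = multinomial_pmf((1, 1, 1), (0.5, 0.3, 0.2))

# With two categories the multinomial reduces to the binomial:
# 2 successes in 3 trials with p = 0.8.
binomial_case = multinomial_pmf((2, 1), (0.8, 0.2))

print(f"trinomial example: {trinomial:.2f}")      # 0.18
print(f"binomial special case: {binomial_case:.3f}")  # 0.384
```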

Page 26

Count data

• A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking.
• We tend to use fixed fractions of genes.

The probability that reads appear in this region (Binomial distribution)
The number of read counts in this interval (Poisson distribution)

Page 27

Page 28

Poisson example

[Figure: Poisson vs. raw data; x-axis: 0 to 9; y-axis: 0 to 3]

Page 29

Negative binomial