Cladistic Clustering of Haplotypes in Association Analysis, Jung-Ying Tzeng, Aug 27, 2004, Department of Statistics & Bioinformatics Research Center, North Carolina State University


Cladistic Clustering of Haplotypes

in Association Analysis

Jung-Ying Tzeng

Aug 27, 2004

Department of Statistics & Bioinformatics Research Center

North Carolina State University


Simple Disorder vs. Complex Disorder

Peltonen and McKusick (2001). Science


Complex Disorders

Liability genes = genes containing variants that increase disease liability

Goal: look for such genes

• Rely more on epidemiological evidence

Association analysis

• Case-control studies

• Detect liability genes by searching for association between disease status and genetic variants


Genetic Markers

Instead of studying whole DNA sequences, we look at a subset of them: genetic markers

SNP: Single Nucleotide Polymorphism

• Pro: dense; 100-300bp

• Con: binary variants

Resolved by considering adjacent SNPs jointly


Haplotype-based Association Analysis

Haplotype = marker sequence (e.g. TCTC, CACA)

Haplotype-based association analysis

[Table: rows Hap 1, Hap 2, Hap 3, ..., Hap k; columns Case, Control]
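A case-control table of haplotype counts is commonly compared with a Pearson chi-square test of homogeneity. The sketch below is one plain-Python way to compute that statistic for a k x 2 table; the function name and the counts are hypothetical and not taken from the slides. It also shows why the number of haplotypes matters: the degrees of freedom grow as k - 1.

```python
def homogeneity_chi2(case_counts, control_counts):
    """Pearson chi-square statistic and degrees of freedom for a k x 2 table."""
    if len(case_counts) != len(control_counts):
        raise ValueError("both groups need one count per haplotype")
    n_case = sum(case_counts)
    n_control = sum(control_counts)
    n_total = n_case + n_control
    stat = 0.0
    observed_rows = 0
    for a, b in zip(case_counts, control_counts):
        row = a + b
        if row == 0:
            continue  # a haplotype unseen in both groups contributes nothing
        observed_rows += 1
        exp_case = row * n_case / n_total
        exp_control = row * n_control / n_total
        stat += (a - exp_case) ** 2 / exp_case
        stat += (b - exp_control) ** 2 / exp_control
    return stat, observed_rows - 1


# Hypothetical counts for three haplotypes in 200 cases and 200 controls.
CASES = [60, 90, 50]
CONTROLS = [80, 80, 40]
STAT, DF = homogeneity_chi2(CASES, CONTROLS)
```

With many rare haplotypes the same statistic is spread over many degrees of freedom, which is the dimensionality issue the following slides address.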


Haplotype-based Association Analysis

Problem: findings are not replicable

• Under-powered (Lohmueller et al. 2003; Neale and Sham 2004)

Solution:

1. Use large samples (Lohmueller et al. 2003)

2. Reduce the dimension of the parameter space


Dimensionality

Haplotype distribution within a block

Daly et al. (2001) Nature Genetics

Method I: Truncating (tag SNPs)


Method II: Clustering (Molitor et al. 2003; Durrant et al. 2004)

Evolutionary tree of haplotypes

Minimize the haplotype distance within clusters

[Figure: evolutionary tree over the haplotypes 000000, 100000, 100001, 100011, 100101, 101001, 110000, 010000, 011001, 000100, 011000, 111000]
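For binary SNP haplotypes, a natural notion of haplotype distance is the number of sites at which two haplotypes differ (Hamming distance). The sketch below assumes that distance; the slide itself does not say which distance the cited methods use, and the helper names and the example clusters are hypothetical.

```python
from itertools import combinations

# The twelve haplotypes shown in the tree on the slide.
HAPLOTYPES = [
    "000000", "100000", "100001", "100011", "100101", "101001",
    "110000", "010000", "011001", "000100", "011000", "111000",
]


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def one_step_neighbours(hap, pool):
    """Haplotypes in pool that differ from hap at exactly one site."""
    return [h for h in pool if hamming(hap, h) == 1]


def within_cluster_distance(clusters):
    """Sum of pairwise distances inside each cluster (smaller is tighter)."""
    return sum(
        hamming(a, b)
        for cluster in clusters
        for a, b in combinations(cluster, 2)
    )


# A hypothetical grouping, only to show how the criterion is evaluated.
EXAMPLE_CLUSTERS = [["100000", "100001", "100011"], ["010000", "011000"]]
EXAMPLE_COST = within_cluster_distance(EXAMPLE_CLUSTERS)
```

A clustering method would search over groupings for a low value of this cost; the search itself is left out here.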


Method III: Cladistic Grouping (Templeton 1995; Seltman et al. 2003)

Observed Hap = { 000, 001, 010, 100, 110, 101, 011, 111 }

[Figure: cladogram over the eight observed haplotypes]


Desired Features

Include all samples

Incorporate both haplotype distance and age

• High frequency: ancient (Crandall & Templeton 1995)

• Low frequency: young

Allow uncertainty in inferring the underlying evolutionary relationship


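Proposed Approach: Cladistic Clustering

Possible Hap = { 000, 001, 010, 100, 110, 101, 011, 111 }

Levels:

(2) { 110 }

(1) { 000, 010, 111, 100 }

(0) { 001, 011, 101 }

[Figure: cladogram in which 110 is allocated to level (1) through B(2) and the level (1) haplotypes are allocated to level (0) through B(1); edges carry allocation probabilities p, 1-p and q1, q2, 1-q1-q2]

$\pi^{*t}_{(i)} = \pi^{t}_{(i)} + \pi^{*t}_{(i+1)} B_{(i+1)}$, with $\pi^{*}_{(2)} = \pi_{(2)}$ at the top level

$\pi^{*t} = \pi^{t} B = (\pi^{t}_{(0)}, \pi^{t}_{(1)}, \pi^{t}_{(2)}) \begin{pmatrix} I \\ B_{(1)} \\ B_{(2)} B_{(1)} \end{pmatrix}$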


Issues

1. Determine the major nodes $\pi_{(0)}$

2. Construct the conditional allocating matrix B(i)


Conditional Allocating Matrix B(i)

Levels: (2) { 110 }; (1) { 000, 010, 100, 111 }; (0) { 001, 011, 101 }

$B_{(2)} = C = (c \;\; c \;\; c \;\; c)$, with row 110 and columns 000, 010, 100, 111

Each entry $c \in [0, 1]$ is the likelihood of a one-step movement

$\pi^{*t}_{(1)} = \pi^{t}_{(2)} B_{(2)} + \pi^{t}_{(1)}$

[Figure: cladogram fragment with 110, 111, 010, 100 and 000]


Conditional Allocating Matrix B(i)

$B_{(1)}$ = [matrix with rows 100, 111, 010, 000 and columns 101, 011, 001]

$\pi^{*t} = \pi^{t}_{(0)} + \pi^{t}_{(1)} B_{(1)} + \pi^{t}_{(2)} B_{(2)} B_{(1)}$

[Figure: cladogram of the eight haplotypes]
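The regrouping can be illustrated numerically on the three-SNP example. The sketch below makes one assumption that the slides do not spell out: each row of B(i) spreads a haplotype over its one-step neighbours in the next level down, with weights proportional to those neighbours' frequencies. The frequencies and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of sites at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def allocation_matrix(upper, lower, freq):
    """Rows: haplotypes in `upper`; columns: haplotypes in `lower`.

    ASSUMPTION: weights are proportional to the frequency of each
    one-step neighbour, then normalised so every row sums to 1.
    """
    rows = []
    for h in upper:
        weights = [freq[g] if hamming(h, g) == 1 else 0.0 for g in lower]
        total = sum(weights)
        if total == 0:
            raise ValueError(f"{h} has no one-step neighbour in the next level")
        rows.append([w / total for w in weights])
    return rows


def vec_mat(v, m):
    """Row vector times matrix, both as plain lists."""
    return [sum(v[j] * m[j][k] for j in range(len(v))) for k in range(len(m[0]))]


def regroup(levels, freq):
    """Fold frequencies from the top level down into level 0.

    Applies pi*_(i) = pi_(i) + pi*_(i+1) B_(i+1), starting from the top.
    """
    star = [freq[h] for h in levels[-1]]
    for i in range(len(levels) - 1, 0, -1):
        b_i = allocation_matrix(levels[i], levels[i - 1], freq)
        moved = vec_mat(star, b_i)
        star = [freq[h] + m for h, m in zip(levels[i - 1], moved)]
    return dict(zip(levels[0], star))


# Levels (0), (1), (2) from the slide; frequencies are made up and sum to 1.
LEVELS = [["001", "011", "101"], ["000", "010", "100", "111"], ["110"]]
FREQ = {"001": 0.30, "011": 0.25, "101": 0.20, "000": 0.08,
        "010": 0.06, "100": 0.05, "111": 0.04, "110": 0.02}
PI_STAR = regroup(LEVELS, FREQ)
```

Because every row of each B(i) sums to 1, the regrouped frequencies over the level (0) haplotypes still sum to 1, so no sample is discarded.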


Determine $\pi_{(0)}$

Information criteria

• Net Information (Shannon’s Information content)

$\sum_{i=1}^{k} \pi_i \log_2(1/\pi_i)$, minus a penalty that depends on $k$ and $n$
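The Shannon part of the criterion is straightforward to compute for the k most frequent haplotypes. In the sketch below the penalty is passed in by the caller, because its exact form is not recoverable from the slide; `example_penalty` is a hypothetical stand-in, as is the sample size n = 400. The frequencies are the Freq(%) column of the haplotype table later in this deck, rescaled to proportions.

```python
from math import log2


def shannon_content(freqs):
    """Sum of p * log2(1/p) over the given frequencies."""
    return sum(p * log2(1.0 / p) for p in freqs if p > 0)


def net_information(freqs, n, penalty):
    """Shannon content of the k most frequent haplotypes minus penalty(k, n),
    returned for k = 1, ..., len(freqs)."""
    ordered = sorted(freqs, reverse=True)
    return [
        shannon_content(ordered[:k]) - penalty(k, n)
        for k in range(1, len(ordered) + 1)
    ]


def example_penalty(k, n):
    # HYPOTHETICAL penalty, used only to make the example runnable.
    return k * log2(n) / n


FREQS = [0.45, 0.20, 0.1325, 0.1125, 0.0375, 0.035, 0.015, 0.005, 0.005]
NET = net_information(FREQS, n=400, penalty=example_penalty)
BEST_K = max(range(1, len(NET) + 1), key=lambda k: NET[k - 1])
```

Under this reading, the haplotypes ranked 1 to `BEST_K` would be kept as the major nodes; a different penalty can change that cut-off.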


Net Information and $\pi_{(0)}$


Association Analysis Based on $\pi^*$

Coalescent simulation (Hudson 2002):

• Prevalence = 0.01

• Relative Risk = 2

• Frequencies of liability allele = (0.1, 0.3, 0.5)

• Location of liability allele = (hot spot, blocky, very blocky)

• Draw 200 cases and 200 controls

Test of homogeneity based on $\pi^*_{cs}$ and $\pi^*_{cn}$


Power and Type I error

[Figure panels: Gene Pelc; Gene IL01RB]


Summary

Provide a mechanism of cladistic clustering via $\pi^*$ and $B$

• Combines the ideas of truncating and clustering

• Based on evolutionary relationships without reconstructing the cladogram

• Incorporates haplotype frequencies and distance in cluster assignment

• One-step conditional regrouping can accommodate multiple-step regrouping: self-repeating, algebraically multiplicative

• Reserves $\pi_{(0)}$ based on information criteria

$\pi^*$ increases test efficiency

• Increased power even for large samples and haplotypes in block regions


End of Slides


Approach

Two stages:

• Stage I: (Where)

Identify the susceptible regions across the genome

(multiple testing problem)

Approaches based on haplotype similarity

• Stage II: (Which)

Determine and pinpoint the specific liability variants

Study individual effects of groups of haplotypes


Strategies of Reducing Degrees of Freedom

I. Haplotype Similarity

• Van der Meulen and te Meerman 1997; Bourgain et al. 2000-2002; Tzeng et al. 2003a,b

• Search for extra haplotype sharing among cases

• Pro: 1 degree of freedom

• Con: does not study individual haplotype effects

• Usage: good for genome screening


Strategies of Reducing Degrees of Freedom

II. Haplotype Tagging (Johnson et al. 2001)

Hap  SNPs 1-14                      Freq(%)
1    A C A C C C C C G G G C C G    45
2    . . . . . . . . . . . A . .    20
3    C T T G . T A T T A . . . .    13.25
4    . . . . . . . . . . . . . A    11.25
5    C . T . T . A . . . A A . .    3.75
6    . . . . . . . . . . . . T .    3.50
7    C . . . . . . . . . . . . .    1.50
8    C . T . T . A . . . . . . .    0.50
9    . T T G . T A T T A . . . .    0.50

(. = same allele as haplotype 1)

Hap  tag SNPs
1    A C G
2    . A .
3    T . .
4    . . A
5    T A .
(1)  . . .
(1)  . . .
6    T . .
(6)  T . .

• Pro: efficiently capture the major diversity

• Con: discard rare haplotypes
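The cost of discarding rare haplotypes can be read off the frequency column. The sketch below keeps haplotypes at or above a frequency cut-off and reports how much of the sample they cover; the cut-off of 3.75% and the function name are hypothetical choices for illustration, not part of the tagging method cited on the slide.

```python
# Freq(%) column of the nine-haplotype table above.
HAP_FREQ_PCT = {
    1: 45.0, 2: 20.0, 3: 13.25, 4: 11.25, 5: 3.75,
    6: 3.50, 7: 1.50, 8: 0.50, 9: 0.50,
}


def truncate(freq_pct, min_pct):
    """Keep haplotypes with frequency >= min_pct; return them and their coverage."""
    kept = {h: f for h, f in freq_pct.items() if f >= min_pct}
    coverage = sum(kept.values())
    return kept, coverage


KEPT, COVERAGE = truncate(HAP_FREQ_PCT, min_pct=3.75)  # hypothetical cut-off
DF_BEFORE = len(HAP_FREQ_PCT) - 1  # degrees of freedom with all haplotypes
DF_AFTER = len(KEPT) - 1           # degrees of freedom after truncation
```

Here the degrees of freedom drop from `DF_BEFORE` to `DF_AFTER`, while the haplotypes outside `KEPT` no longer enter the analysis.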


Strategies of Reducing Degrees of Freedom

III. Haplotype Clustering

• Molitor et al. 2003; Seltman et al. 2001, 2003; Durrant et al. 2004

• Similar haplotypes induce similar liability effects

• Cluster haplotypes and perform analysis based on clusters of haplotypes

• Pro: incorporates all data

• Con: may cluster two major haplotypes in the same group


Haplotype Grouping

Focus on Stage II

Combine the pros of haplotype tagging and clustering
