Principles of genetic epidemiology April 2008 course

Page 1: Principles of genetic epidemiology April 2008 course

Principles of genetic epidemiology

April 2008 course

Page 2: Principles of genetic epidemiology April 2008 course

The post-genomic era

• Now that the full human genome sequence has been published, we have access to genetic information in an unprecedented manner:
– 3 billion base pairs in the human genome
– c. 22,000 genes
– Tens of thousands of RNAs
– Hundreds of thousands of proteins

• Thus, developments in molecular genetic analysis now make it possible to attempt identification of liability genes in complex, multifactorial traits, and to dissect out with new precision the role of genetic predisposition and environment/lifestyle factors in these disorders.

• New technologies and statistical tools are continuously being introduced

• Nonetheless, there is often much hype and little real progress

Page 3: Principles of genetic epidemiology April 2008 course

Genes, developmental history and environment as determinants of health

In complex disease a person's susceptibility genotype and environmental history combine to establish present health status, and the genotype's norm of reaction determines future health trajectory

Page 4: Principles of genetic epidemiology April 2008 course

Characteristics of complex traits

Trait values are determined by complex interactions among numerous metabolic and physiological systems, as well as demographic and lifestyle factors

Variation in a large number of genes can potentially influence interindividual variation of trait values

The impact of any one gene is likely to be small to moderate in size

For diseases: Monogenic diseases that mimic complex diseases typically account for a small fraction of disease cases (examples in breast cancer, obesity, hypertension, osteoarthritis)

Example: Ala-Kokko L et al. Single base mutation in the type II procollagen gene (COL2A1) as a cause of primary osteoarthritis associated with a mild chondrodysplasia. PNAS 1990;87:6565-8. One large family; mutation not found otherwise.

Page 5: Principles of genetic epidemiology April 2008 course

Steps in gene discovery, tracing and evaluation

Phenotype:
– Clinical definition
– Define genetic component
– Identify data sets and data sources

Study design:
– Case-control (association)
– Family-based (multiplex families / sib pairs / trio-based)
– Whole genome analysis

Analysis:
– Genotyping
– Statistical analysis
– Bioinformatics
– Variation detection

Follow-up (gene tracing & evaluation):
– Replication
– Functional studies
– Interactions (gene-gene & gene-environment)

Adapted from Fig 16.2 in Genetic Analysis of Complex Disease, Haines & Pericak-Vance, 2nd edition, 2006

Page 6: Principles of genetic epidemiology April 2008 course

Strategies for family studies:

• Does disease or behavior aggregate in families?

• What are the causes of familial aggregation?

• What is the model of genetic inheritance and which genes are responsible?

• How do genes interact with the environment?

Page 7: Principles of genetic epidemiology April 2008 course

Families are the basic unit

Page 8: Principles of genetic epidemiology April 2008 course

How to detect genetic effects and find genes?

Family studies:
– provide estimates of heritability
– information on mode of inheritance
– adoption and twin studies as special cases

Molecular genetic studies:
– candidate genes, genome-wide scans
– association studies & linkage
– animal studies (e.g. ‘knockouts’)

[Figure: chromosome 1 marker map, with markers running from D1S1597 to ATA29C07]

Page 9: Principles of genetic epidemiology April 2008 course

What is heritability?

Heritability is an estimate of the proportion of the total variance of a trait, or of liability to a disease, that is accounted for by genetic variance, i.e. by interindividual genetic differences.

Genetic variance may arise from additive effects, due to different alleles at a locus, or may be due to dominance, the interaction between alleles at the same locus

Heritability is a characteristic of populations, not of individuals or families, and it is affected by both genetic and environmental effects

Page 10: Principles of genetic epidemiology April 2008 course

Conceptual model of an individual’s phenotype

$Y = \mu + G + Env$

where $Env = C + E$

Hence, variance can be decomposed:

$\sigma^2 = \sigma^2_G + \sigma^2_C + \sigma^2_E$

Heritability is $\sigma^2_G / \sigma^2$ and genetic variance has several components:

$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_I$
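
The two usual heritability summaries follow directly from the decomposition above. The statement below is a sketch that assumes the conventional reading of the subscripts (A additive, D dominance, I epistatic interaction), which the slide does not spell out, and it ignores gene-environment covariance and interaction, as the slide's model does.

```latex
\begin{align}
H^2 &= \frac{\sigma^2_G}{\sigma^2}
     = \frac{\sigma^2_A + \sigma^2_D + \sigma^2_I}{\sigma^2_G + \sigma^2_C + \sigma^2_E}
     && \text{(broad-sense heritability)} \\
h^2 &= \frac{\sigma^2_A}{\sigma^2}
     && \text{(narrow-sense heritability)}
\end{align}
```

Since $\sigma^2_A \le \sigma^2_G$, the narrow-sense value can never exceed the broad-sense value.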

Page 11: Principles of genetic epidemiology April 2008 course

FAMILY STUDY

• Provides estimates of the degree of family aggregation

• Risks to siblings, parents, offspring as well as to other relatives can be estimated

• Similarity of different types of relatives can permit modelling of genetic versus non-genetic familial influences
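
One common way to summarise familial aggregation is the recurrence risk ratio: the risk to a given type of relative of an affected person divided by the population prevalence. The sketch below is only an illustration; the function name and all counts are hypothetical and are not taken from the course material.

```python
def recurrence_risk_ratio(affected_relatives: int,
                          total_relatives: int,
                          population_prevalence: float) -> float:
    """Risk in relatives of affected probands divided by population prevalence."""
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0.0 < population_prevalence <= 1.0:
        raise ValueError("population_prevalence must be in (0, 1]")
    risk_in_relatives = affected_relatives / total_relatives
    return risk_in_relatives / population_prevalence


# Hypothetical numbers: 30 affected among 600 siblings of probands,
# and a population prevalence of 1%.
lambda_sib = recurrence_risk_ratio(30, 600, 0.01)
print(f"sibling recurrence risk ratio: {lambda_sib:.1f}")
```

A ratio well above 1 indicates familial aggregation, but by itself it does not separate genetic from shared environmental causes.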

Page 12: Principles of genetic epidemiology April 2008 course

• To disentangle genes and experience, we study special family groups:

• Either family members sharing experiences but differing in shared genes, e.g. twin studies, or

• family members sharing genes, but differing in their shared experience, e.g. adoption studies

Page 13: Principles of genetic epidemiology April 2008 course

ADOPTION DESIGN

Test for association between trait in adoptees and trait in biological parents (genetic correlation) & test for association between trait in adoptees and trait in adoptive parents.

STRENGTHS: relatively powerful

WEAKNESSES:
(1) poor generalizability
(2) adoptive parents likely to provide ‘good homes’
(3) biological parents of adopted children may have had multiple forms of psychopathology - selection
(4) poor characterization of phenotypes of biological parents

Page 14: Principles of genetic epidemiology April 2008 course

The Classical Twin Study

• Monozygotic (MZ) pairs are genetically alike

• Dizygotic (DZ) pairs, like siblings, share on average half of their segregating genes

• DZ pairs can be same-sexed or opposite-sex (male-female)

• Increased similarity of twin pairs compared to unrelated subjects suggests familial factors

• Increased similarity of MZ pairs compared to DZ pairs provides evidence for genetic factors
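
The MZ versus DZ comparison can be turned into rough variance shares with Falconer's classical formulas. This is a minimal sketch, not the model-fitting approach used in the course; it assumes additive genetic effects only, equal shared environments for MZ and DZ pairs, and random mating. The function name and the example correlations are hypothetical.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict:
    """Rough variance shares from MZ and DZ twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared (family) environment share
    e2 = 1.0 - r_mz            # unique environment share, incl. measurement error
    return {"a2": a2, "c2": c2, "e2": e2}


# Hypothetical twin correlations, chosen only for illustration
est = falconer_estimates(r_mz=0.60, r_dz=0.35)
for name, value in est.items():
    print(f"{name} = {value:.2f}")
```

The three shares sum to 1 by construction. A negative `c2` (MZ correlation more than twice the DZ correlation) hints at non-additive genetic effects rather than shared environment.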

Page 15: Principles of genetic epidemiology April 2008 course

The classical twin study modelling

• Model contribution of additive (A) and non-additive (D) genetic effects, environmental effects shared by family members (C) and unshared effects (E) (i.e. unique to each family member)

• Competing models, e.g. E, AE, ACE can be statistically compared and tested against actual data

• Mx – statistical program created by Mike Neale most commonly used in genetic modelling: http://views.vcu.edu/mx/
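
The models being compared differ in which components they allow in the expected twin covariances. The equations below are a sketch under the standard assumptions (MZ pairs share all additive and dominance effects, DZ pairs share half and a quarter respectively, shared environment is equal for both zygosities); the notation is an assumption and is not taken from Mx.

```latex
\begin{align}
\operatorname{Var}(Y)   &= \sigma^2_A + \sigma^2_D + \sigma^2_C + \sigma^2_E \\
\operatorname{Cov}_{MZ} &= \sigma^2_A + \sigma^2_D + \sigma^2_C \\
\operatorname{Cov}_{DZ} &= \tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D + \sigma^2_C
\end{align}
```

With only MZ and DZ pairs reared together there are three observed quantities, so C and D cannot be estimated in the same model; in practice an ACE or an ADE model is fitted and compared with its submodels (AE, E).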

Page 16: Principles of genetic epidemiology April 2008 course

Different phenotypes, different effects of genes

Phenotype                             Genetic effects  Non-genetic family effects
Experimentation (age 12)              11%              73%
Initiation/ever smoker (adolescents)  20-36%           18-59%
Initiation/ever smoker (adults)       28-80%           4-50%
Persistence/cessation                 58-71%           None
Nicotine dependence (FTND or DSM-IV)  60-75%           None

Page 17: Principles of genetic epidemiology April 2008 course
Page 18: Principles of genetic epidemiology April 2008 course

Extensions of the classical twin study I

• Effect modification by age, sex and environmental factors, e.g. smoking or obesity

• Assess genetic covariance over time through longitudinal models

• Assess sex effects by comparison of same-sexed and opposite-sex DZ pairs

• Assess social interaction effects

Page 19: Principles of genetic epidemiology April 2008 course

Genetic Influences on Change in BMI

A longitudinal study of Finnish twins

J.v.B. Hjelmborg, C. Fagnani, K. Silventoinen, M. McGue, M. Korkeila, K. Christensen, A. Rissanen, J. Kaprio

Page 20: Principles of genetic epidemiology April 2008 course

Finnish Twin Cohort

• Twins born 1930-1955 participating in three surveys in 1975, 1981 and 1990

• Weight and height asked in each questionnaire

• 10556 twins answered all questionnaires

• Same sex pairs

• Age at baseline 20-45 y

Page 21: Principles of genetic epidemiology April 2008 course

Latent growth model for weight change in adults 1975-1990

Page 22: Principles of genetic epidemiology April 2008 course

Genetic modeling results for latent growth curve model of BMI, Finnish Twin Cohort 1975 – 1990

Measure                                                                  Males (95% CI)           Females (95% CI)
N of pairs                                                               499 MZ, 1013 DZ          735 MZ, 1265 DZ
MZ correlation of BMI level                                              0.79 (0.79, 0.80)        0.83 (0.82, 0.83)
DZ correlation of BMI level                                              0.44 (0.44, 0.45)        0.39 (0.38, 0.39)
MZ correlation of weight gain                                            0.60 (0.56, 0.68)        0.65 (0.61, 0.71)
DZ correlation of weight gain                                            0.26 (0.24, 0.32)        0.30 (0.28, 0.32)
Heritability of BMI level                                                0.80 (0.79, 0.80)        0.82 (0.81, 0.84)
Heritability of rate of weight gain                                      0.58 (0.50, 0.69)        0.64 (0.58, 0.69)
Add. genetic correlation of BMI levels with rate of weight gain          -0.070 (-0.13, -0.068)   0.041 (0.00, 0.076)
Unique environmental correlation of BMI levels with rate of weight gain  0.0094 (-0.020, 0.091)   0.24 (0.14, 0.34)

Page 23: Principles of genetic epidemiology April 2008 course

Summary of findings

• A longitudinal growth curve model provides better estimates of heritability
– c. 80% for adult BMI
– c. 60% for rate of weight gain over a 15-year period in young to middle-aged adults

• Genetic influences on baseline BMI and on rate of weight gain are weakly, if at all, correlated

• Genes regulating weight gain and loss are likely to be different from those affecting BMI

• Environmental effects on weight change appear to be larger than on BMI

Page 24: Principles of genetic epidemiology April 2008 course

Extensions of the classical twin study II

• Define phenotypes by assessing the combination of signs and symptoms with highest heritability
– for example, broad vs. narrow definitions of LBP

• Define natural history of disease by assessing genetic communality of different stages
– for example, initiation, persistence, and dependence in smoking

• Common genetic pathways across phenotypes
– for example, hip, knee and hand OA; bone density in weight-bearing & non-weight-bearing bones

Page 25: Principles of genetic epidemiology April 2008 course

Steps in gene discovery, tracing and evaluation

Phenotype:
– Clinical definition
– Define genetic component
– Identify data sets and data sources

Study design:
– Case-control (association)
– Family-based (multiplex families / sib pairs / trio-based)
– Whole genome analysis

Analysis:
– Genotyping
– Statistical analysis
– Bioinformatics
– Variation detection

Follow-up (gene tracing & evaluation):
– Replication
– Functional studies
– Interactions (gene-gene & gene-environment)

Adapted from Fig 16.2 in Genetic Analysis of Complex Disease, Haines & Pericak-Vance, 2nd edition, 2006

Page 26: Principles of genetic epidemiology April 2008 course

How to detect genetic effects and genes?

Molecular genetic studies:
– candidate genes, genome-wide scans
– association studies & linkage
– animal studies (e.g. ‘knockouts’, ‘knock-ins’)

Family studies:
– provide estimates of heritability
– information on mode of inheritance
– adoption and twin studies as special cases

[Figure: chromosome 1 marker map, with markers running from D1S1597 to ATA29C07]

Page 27: Principles of genetic epidemiology April 2008 course

Increasing the genetic signal in the data...

• ascertain pedigree units that are likely to segregate genes of relevance
– Ex: pedigrees with quasi-Mendelian disease transmission
– affected sib pair approach of linkage analysis

• ascertain families on the basis of individuals with extreme or remarkable phenotypes
– Ex: extremely discordant sib pairs
– ascertain young individuals with the disease

• ascertain individuals from isolated populations:
– more homogeneous genetically, and culturally as well

• ascertain intermediate phenotypes
– physiologic phenotype is “closer” to sequence variants

... At the cost of representativeness and ability to evaluate population risk

Page 28: Principles of genetic epidemiology April 2008 course

ISOLATED POPULATION

• Wonderfully isolated Finnish population

– Small number of founders

– Subsequent isolation

– Rapid expansion

– Major bottlenecks

→ Genetic drift has moulded the gene pool

• Genetic homogeneity, longer LD blocks

• Valuable for genetic studies, especially of monogenic diseases

Page 29: Principles of genetic epidemiology April 2008 course

Two basic Analysis Strategies

1. candidate gene analysis
motto: study a few good genes

2. whole-genome searches (genome scans)
motto: cast out a net that catches all the big fish

Page 30: Principles of genetic epidemiology April 2008 course

Candidate Gene Studies

• statistically straightforward: test the association between genotypes and phenotype with contingency tables, chi-square test, regression

• principle: if an allele is more frequent in affecteds than in unaffecteds, the gene may be, or may lie close to, a disease gene

• candidacy of a gene can come from a number of different sources:
– biological insights (e.g. gene expressed in a certain tissue)
– homology to other genes
– functional studies in model organisms
– member of a relevant gene family

• Challenge: greater biological understanding of the genes is needed
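
The contingency-table test mentioned above can be sketched in a few lines. This is an illustrative example only: it uses a Pearson chi-square test without continuity correction on a 2x2 table of allele counts, the function name is hypothetical, and the counts are made up.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square (1 df) for a 2x2 table of allele counts.

                 allele A1   other allele
    cases            a            b
    controls         c            d

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical allele counts in 100 cases and 100 controls (200 alleles each)
stat, p = chi_square_2x2(a=60, b=140, c=40, d=160)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Counting alleles rather than genotypes assumes Hardy-Weinberg equilibrium; a genotype-based test or logistic regression avoids that assumption.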

Page 31: Principles of genetic epidemiology April 2008 course

Allelic association studies test whether alleles are associated with the trait

• 2 types of association tests
– population-based association test
   • cases and controls are unrelated
   • cross-classify by genotype
   • use χ² test, ANOVA or logistic regression
– family-based association tests (e.g. TDT)
   • cases and controls are related: parents, sibs etc.
   • often based on allele transmission rates

• Multivariate/data reduction approaches
– Multiple regression of all SNPs in gene
– Haplotype analyses
– False discovery rate and replication rather than p-values

• Pathway analyses
– Combination of individual SNPs/genes and pathway constraints
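
As an illustration of a test based on allele transmission rates, the transmission disequilibrium test (TDT) compares how often heterozygous parents transmit a given allele to an affected child with how often they do not. The sketch below uses the usual McNemar-type statistic; the function name and the counts are hypothetical.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple:
    """TDT statistic and p-value (1 df) for one allele.

    Both counts refer to heterozygous parents of affected children only.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    stat = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times
stat, p = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi-square = {stat:.2f}, p = {p:.4f}")
```

Because the comparison is made within families, the untransmitted parental alleles act as ancestry-matched controls.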

Page 32: Principles of genetic epidemiology April 2008 course
Page 33: Principles of genetic epidemiology April 2008 course

The 3 possible causes for association

• best: allele increases disease susceptibility
– candidate gene studies

• good: some subjects share common ancestor
– linkage disequilibrium studies

• bad: association due to population stratification
– family-based designs offer protection

[Figure: diagram of alleles and loci, labelled d, A1, M and K]

Slide by Steve Horvath, 2003

Page 34: Principles of genetic epidemiology April 2008 course

POPULATION STRATIFICATION
Hypothetical Example (by Andrew Heath)

NORTHERN EUROPEAN ANCESTRY (N=200): NO ASSOCIATION

                NOT A1 allele  A1 allele  % of group
NON-MED DIET    162            18         90%
MED DIET        18             2          10%
% of group      90%            10%

SOUTHERN EUROPEAN ANCESTRY (N=200): NO ASSOCIATION

                NOT A1 allele  A1 allele  % of group
NON-MED DIET    35             15         25%
MED DIET        105            45         75%
% of group      70%            30%

MINGLED IN AUSTRALIAN POPULATION (N=400): OR = 2.28, 95% CI 1.39 - 3.73

                NOT A1 allele  A1 allele
NON-MED DIET    197            33
MED DIET        123            47

Falsely infer that the A1 allele is a risk factor for following a traditional Mediterranean diet.
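
The arithmetic behind this example can be checked directly. The sketch below uses the cell counts as read from the slide (Northern: 162/18 and 18/2; Southern: 35/15 and 105/45, not-A1/A1 by diet); the helper name is hypothetical. It reproduces an odds ratio of 1 within each ancestry group and about 2.28 in the pooled sample.

```python
def odds_ratio(a1_med: int, a1_non: int, other_med: int, other_non: int) -> float:
    """Odds of a Mediterranean diet in A1 carriers relative to non-carriers."""
    return (a1_med * other_non) / (a1_non * other_med)


# Cell counts as read from the slide's hypothetical example
north = {"a1_med": 2, "a1_non": 18, "other_med": 18, "other_non": 162}
south = {"a1_med": 45, "a1_non": 15, "other_med": 105, "other_non": 35}

# Pooling ignores ancestry, which is what creates the spurious association.
pooled = {key: north[key] + south[key] for key in north}

or_north = odds_ratio(**north)
or_south = odds_ratio(**south)
or_pooled = odds_ratio(**pooled)

print(f"Northern OR = {or_north:.2f}")
print(f"Southern OR = {or_south:.2f}")
print(f"Pooled OR   = {or_pooled:.2f}")
```

The pooled association arises only because both the A1 allele and the Mediterranean diet are more common in the Southern European group.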

Page 35: Principles of genetic epidemiology April 2008 course

Population-based versus family-based association tests

• Family-based association tests avoid confounding due to ethnic stratification
– These designs automatically match "controls" to cases on ethnic ancestry.

• Conventional wisdom:
– family-based designs are generally less efficient than designs based on unrelated control subjects
– population admixture effects are negligible

• Non-conventional wisdom:
– family controls are better matched for environmental exposures
– cryptic relatedness may be an important issue in isolate populations

Page 36: Principles of genetic epidemiology April 2008 course

Pathway approach

Hung et al. Cancer Epid Biomarker Prev 2004 & Conti et al. Human Heredity 2003

Page 37: Principles of genetic epidemiology April 2008 course

Family-based Genome Scans

• involve anonymous markers, no candidate genes

• hundreds of evenly spaced genetic markers in the genome

• often hundreds of related individuals in small to large families

• linkage analysis is a statistical method to draw inferences about the co-transmission of marker locus alleles and trait-influencing alleles

• Identifies chromosomal regions harboring the genes predisposing to trait (such as nicotine dependence)
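
The evidence for co-transmission is usually summarised as a LOD score. The sketch below covers only the simplest textbook case, phase-known and fully informative meioses, which is far simpler than the pedigree likelihoods used in real genome scans. The function name and the counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 likelihood ratio of linkage at recombination fraction theta
    versus no linkage (theta = 0.5), for phase-known meioses."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 2 recombinants observed in 20 informative meioses
thetas = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50]
lods = {theta: lod_score(2, 20, theta) for theta in thetas}
best_theta = max(lods, key=lods.get)

for theta in thetas:
    print(f"theta = {theta:.2f}  LOD = {lods[theta]:6.2f}")
print(f"maximum on this grid at theta = {best_theta:.2f}")
```

On this grid the LOD peaks at the observed recombination fraction (2/20 = 0.10) and is zero at theta = 0.5, where the two hypotheses coincide.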

Page 38: Principles of genetic epidemiology April 2008 course

Co-transmission of disease and alleles

[Figure: pedigree with family members labelled by genotype (Aa or aa)]

Page 39: Principles of genetic epidemiology April 2008 course

Chromosome  Phenotype  LOD ≥2 / p-value  Author and year        Country      Number of families and individuals
2           FTQ        2.61              Straub et al. 1999     New Zealand  130 families, 343 individuals
2           FTQ        2.53              Sullivan et al. 2004   New Zealand  129 families
5           FTND       3.04              Gelernter et al. 2007  US           634 small nuclear families
6           FTND       2.70              Swan et al. 2006       US           158 nuclear families, 607 individuals
7           FTND       2.70              Swan et al. 2006       US           158 nuclear families, 607 individuals
7           FTND       2.73              Gelernter et al. 2007  US           634 small nuclear families
7           FTND       2.50              Loukola et al. 2007    Finland      153 families, 505 individuals
8           FTND       2.7               Swan et al. 2006       US           158 nuclear families, 607 individuals
10          HSI        4.17              Li et al. 2006         US (AA)      402 nuclear families, 1261 individuals
10          FTQ        2.43              Straub et al. 1999     New Zealand  130 families, 343 individuals
10          FTQ        2.02              Sullivan et al. 2004   New Zealand  129 families
11          FTND       2.31              Li et al. 2006         US (AA)      402 nuclear families, 1261 individuals
11          HSI        2.15              Li et al. 2006         US (AA)      402 nuclear families, 1261 individuals
17          FTND       0.009*            Lou et al. 2007        US (EA)      200 families, 671 individuals

AA = African-American sample, EA = European-American sample. * Lou et al. 2007 reported a p-value.
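
Because the table mixes LOD scores with one p-value, a rough conversion can help with comparison. The sketch below uses a common large-sample approximation: 2 ln(10) × LOD is treated as a chi-square statistic with 1 degree of freedom and the test is one-sided. This is an assumption that does not hold exactly for every linkage statistic, and the result is a pointwise p-value, not a genome-wide one. The function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise p-value for a LOD score."""
    if lod < 0:
        raise ValueError("lod must be non-negative")
    chi_square = 2.0 * math.log(10.0) * lod
    # One-sided test: half the upper tail of a 1-df chi-square distribution.
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))


# LOD values taken from the table above
for lod in (2.02, 2.70, 3.04, 4.17):
    print(f"LOD {lod:.2f} -> approximate pointwise p = {lod_to_pointwise_p(lod):.1e}")
```

Under this approximation a LOD of 3 corresponds to a pointwise p-value of about 0.0001.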

Page 40: Principles of genetic epidemiology April 2008 course

SAMPLE COLLECTION

Index cases are twins from pairs concordant for heavy smoking based on earlier questionnaires from the Finnish Twin Cohorts

1293 families (twin pairs) invited

762 families recruited with 2412 family members (1278 men, 1134 women)

Data collection complete for 2143 persons

Interview, blood sample, informed consent

Page 41: Principles of genetic epidemiology April 2008 course

STUDY SAMPLE

– Identified Finnish families with DZ smoking twins
   • Invited also siblings and parents to participate
– 153 affected twin-pair families, 505 individuals
– On average 3 individuals per family (range 2-9)

Phenotype definitions

1. Smoker (smoked ≥100 cigarettes during lifetime)
2. Nicotine dependent (Fagerström, FTND)
3. Nicotine dependent (DSM-IV)
4. Alcohol use (aiming for intoxication)
5. Co-morbid phenotype of FTND and alcohol use

Page 42: Principles of genetic epidemiology April 2008 course

Chromosome 11 - Candidate Genes for Nicotine withdrawal in Finnish and Australian families

[Figure: Chromosome 11 - Nicotine Withdrawal. LOD score plotted against cM position for the Finnish and Australian samples, with the positions of candidate genes 1-5 marked]

1. DRD4
2. TH
3. CHRNA10
4. TPH1
5. ANKK1/DRD2, HTR3A, HTR3B

Page 43: Principles of genetic epidemiology April 2008 course

Genome-wide Case-Control Analyses

• involve anonymous markers, no candidate genes

• chips of 300,000 to 1,000,000 SNPs on a single array (Illumina, Affymetrix)

• Hundreds to thousands of cases and unrelated controls

• High-throughput genotyping of common SNPs such as those identified from the HapMap project

• Over the past two years many new genes in common diseases have been identified

• Two recent GWAs on nicotine dependence (Uhl et al, 2007, Bierut et al, 2007)

• New GWA on smoking cessation (Uhl G, et al, Arch Gen Psychiatr, in press) finds genes with very little overlap to earlier GWAs on nicotine dependence
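
Testing this many SNPs raises a multiple-testing problem that the slide does not address. As a rough illustration only, the sketch below computes Bonferroni-corrected per-SNP significance thresholds for the chip sizes mentioned above. Bonferroni is an assumption made here for simplicity; it is conservative, because neighbouring SNPs are correlated through linkage disequilibrium. The function name is hypothetical.

```python
def bonferroni_threshold(n_tests: int, family_wise_alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


# Chip sizes mentioned on the slide
thresholds = {n_snps: bonferroni_threshold(n_snps) for n_snps in (300_000, 1_000_000)}
for n_snps, threshold in thresholds.items():
    print(f"{n_snps:>9,d} SNPs -> per-SNP threshold {threshold:.2e}")
```

For one million SNPs this gives a per-SNP threshold of 5e-8.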

Page 44: Principles of genetic epidemiology April 2008 course

Li C-Y et al, PLoS Comput Biol 2008

Bioinformatics processing of existing information to discover biological pathways

Page 45: Principles of genetic epidemiology April 2008 course

Li C-Y et al, PLoS Comput Biol 2008

Page 46: Principles of genetic epidemiology April 2008 course

Steps in gene discovery, tracing and evaluation

Phenotype:
– Clinical definition
– Define genetic component
– Identify data sets and data sources

Study design:
– Case-control (association)
– Family-based (multiplex families / sib pairs / trio-based)
– Whole genome analysis

Analysis:
– Genotyping
– Statistical analysis
– Bioinformatics
– Variation detection

Follow-up (gene tracing & evaluation):
– Replication
– Functional studies
– Interactions (gene-gene & gene-environment)

Adapted from Fig 16.2 in Genetic Analysis of Complex Disease, Haines & Pericak-Vance, 2nd edition, 2006

Page 47: Principles of genetic epidemiology April 2008 course

Integration of information at different levels

Developments in molecular genetics now make it possible to attempt identification of liability genes in complex, multifactorial traits, and to dissect out with new precision the role of genetic predisposition and environment/lifestyle factors in these disorders. However, an integrative framework is needed.

Complex picture

Gottesman I, Science 1997

Page 48: Principles of genetic epidemiology April 2008 course

[Figure: diagram relating measured genotypes (G1-G4), measured environments (E1-E4), their interaction (GE), endophenotypes (P1-P5, with G’ and E’ terms) and the outcome phenotype (P) over time]

Eaves et al., 2005

Page 49: Principles of genetic epidemiology April 2008 course

Information of genetic data will increase

genetic map
– past: microsatellites
– present, future: millions of SNPs, bi-allelic

candidate genes
– past: incomplete knowledge of variants, function barely known
– present, future: all common genetic variants known, common function known

statistical methods
– past: linkage analysis
– present, future: linkage disequilibrium tests

new technology
– fast genotyping, sequencing, mutation detection

Slide from Steve Horvath

Page 50: Principles of genetic epidemiology April 2008 course

To summarize

• Complex disease gene mapping is starting to fulfil its promise

• The distinction between candidate gene studies and whole genome scans diminishes as genotyping costs decrease

• When collecting pedigrees enriched with affecteds, always collect the DNA of good controls as well

• Put effort into high-quality and detailed phenotyping
– multiple, longitudinal measures
– use intermediate, physiological phenotypes as traits
– imaging, metabolomics
– gene expression and protein array measurements

Page 51: Principles of genetic epidemiology April 2008 course

Useful reading

• JL Haines, MA Pericak-Vance. Genetic analysis of Complex Disease. Wiley, 2006

• DC Thomas. Statistical Methods in Genetic Epidemiology, Oxford 2004

• MJ Khoury. Human Genome Epidemiology, Oxford, 2003