Genes and Diseases


Genetic Epidemiological Strategies in the Search for Genes

Tuan V. Nguyen

University of New South Wales, Faculty of Medicine

Genes and Diseases

• Many diseases have their roots in both genes and environment.

• Currently, >4000 diseases, including sickle cell anemia and cystic fibrosis, are known to be genetic and are passed on in families.

Genes and Medical Sciences

The central question for the medical sciences is the extent to which it will be possible to relate events at the molecular level with the clinical findings or phenotypes of patients with particular diseases.

Contents

• Genes and DNA

• Detection of genetic effects

• Search for specific genes

Chromosomes

Each human cell contains 23 pairs of chromosomes, distinguished by size and banding pattern. The set shown is for a male (XY sex chromosomes); females have two X chromosomes.

DNA and Genes

• DNA carries the instructions that allow cells to make proteins.

• DNA is made up of 4 chemical bases (A, T, G, C).

• The bases make “words”: AGT CTC GAA TAA

• Words make “sentences”, which are genes:

<AGT CTC GAA TAA>

Genes, Alleles, and Genotypes

• The location of a gene is called its locus.

• Alleles are alternate forms of a gene. Example: A, a

• Genotype: the maternal and paternal alleles of an individual at a locus define the genotype of the individual at that locus. Example: AA, Aa, aa.

How Do Genes Work?

• Genes tell cells how to make molecules called proteins.

• Proteins allow cells to perform specific functions.

• If the instructions are intact, things will be normal. If the instructions are changed (mutated), an abnormality will result.

Inheritance

• The passing of genes from parents to child is the basis of inheritance.

• We are not identical to our parents: half of our genes are from our mothers and half from our fathers.

• Each brother and sister inherits a different combination of chromosomes. Each parent can pass on N = 2^23 = 8,388,608 combinations.

• Identical twins receive exactly the same combination of genes from their parents.
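The count quoted above can be checked with a short sketch. It assumes, as the 2^23 figure itself does, that each of the 23 chromosome pairs contributes one of its two homologues independently and that crossing-over is ignored; the variable names are illustrative only.

```python
# Each of the 23 chromosome pairs passes on one of its two homologues,
# so one parent can produce 2**23 distinct chromosome combinations
# (crossing-over is ignored in this simplified count).
N_PAIRS = 23

combinations_per_parent = 2 ** N_PAIRS
print(f"{combinations_per_parent:,}")  # 8,388,608
```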

Genetic effects

• Three types of gene action: additive, dominant, and epistasis.

• Additive effect.
  – AA = 9, Aa = 7, aa = 5.

• Dominant effect.
  – AA = 9, Aa = 9, aa = 5.

• Epistasis: interaction of alleles at 2 loci.
  – For locus 1: AA = 9, Aa = 7, aa = 5.
  – For locus 2: AA = 5, Aa = 5, aa = 9.
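One way to read the additive and dominant examples is to split the three genotypic values at a locus into an additive effect and a dominance deviation. The sketch below uses the usual quantitative-genetics decomposition around the midpoint of the two homozygotes; the function name `gene_action` is hypothetical and not part of the slides.

```python
def gene_action(value_AA, value_Aa, value_aa):
    """Return (additive effect, dominance deviation) for one locus."""
    midpoint = (value_AA + value_aa) / 2
    additive = (value_AA - value_aa) / 2   # half the homozygote difference
    dominance = value_Aa - midpoint        # heterozygote's departure from the midpoint
    return additive, dominance

# Genotypic values taken from the two examples in the slide
additive_case = gene_action(9, 7, 5)   # heterozygote sits exactly at the midpoint
dominant_case = gene_action(9, 9, 5)   # heterozygote equals the AA homozygote

print(additive_case)  # (2.0, 0.0)
print(dominant_case)  # (2.0, 2.0)
```

A dominance deviation of zero corresponds to purely additive action; a deviation equal to the additive effect corresponds to complete dominance of A.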

How to detect genetic effects?

Clues to Genetics and Environment

Epidemiological characteristics    Genetics    Environment
Geographic variation               +           +
Ethnic variation                   +           +
Temporal variation                 -           +
Epidemics                          +/-         +
Social class variation             -           +
Gender variation                   +           +
Age                                +/-         +
Family variables
  History of disease               +           +
  Birth order                      +/-         +
  Birth interval                   -           +
  Co-habitation                    -           +

Methods of Investigation of Genetic Traits

• Family studies. Examine phenotypes (diseases) in the relatives of affected subjects (probands).

• Twin studies. Compare the intraclass correlation in MZ twins (who share 100% of their genes) with that in DZ twins (who share, on average, 50% of their genes).

• Adoption studies. Seek to distinguish genetic from environmental effects by testing whether children's phenotypes more closely resemble those of their biological parents than those of their adoptive parents.

• Offspring of discordant MZ twins. Controls for environmental effects; tests for a large genetic contribution to etiology.

Basic Genetic-Environmental Model

Phenotype (P) = Genetics + Environment

Genetics = Additive (A) + Dominant (D)

Environment = Common (C) + Specific (E)

=> P = A + D + C + E

Statistical Genetic Model

$$\mathrm{Cov}(Y_i, Y_j) = 2\phi_{ij}\,\sigma^2(a) + \Delta_{ij}\,\sigma^2(d) + \gamma_{ij}\,\sigma^2(c) + \delta_{ij}\,\sigma^2(e)$$

$\phi_{ij}$ : kinship coefficient

$\Delta_{ij}$ : Jacquard’s coefficient of identical-by-descent

$\gamma_{ij}$ : probability of sharing environmental factors

$\delta_{ij}$ : residual coefficient

$$V_P = V_A + V_D + V_C + V_E$$

V = variance; P = phenotype; A, D, C, E = as defined

Kinship coefficients

                      Expected coefficient for
Relative              $\sigma^2(a)$    $\sigma^2(d)$    $\sigma^2(c)$
Spouse-spouse         0                0                1
Parent-offspring      1/2              0                1
Full sibs             1/2              1/4              1
Half-sibs             1/4              0                1
Aunt-niece            1/4              0                1
First cousins         1/8              0                0
Dizygotic twins       1/2              1/4              1
Monozygotic twins     1                1                1
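The table of expected coefficients can be turned into expected covariances between relatives once values for the variance components are supplied. The following is a minimal sketch: the coefficients are copied from the table, the variance components are hypothetical numbers chosen for illustration, and the residual term is assumed to contribute nothing to the covariance between two different individuals.

```python
# Expected coefficients for sigma^2(a), sigma^2(d), sigma^2(c), copied from the table
COEFFICIENTS = {
    "spouse-spouse":     (0.0,   0.0,  1.0),
    "parent-offspring":  (0.5,   0.0,  1.0),
    "full sibs":         (0.5,   0.25, 1.0),
    "half-sibs":         (0.25,  0.0,  1.0),
    "aunt-niece":        (0.25,  0.0,  1.0),
    "first cousins":     (0.125, 0.0,  0.0),
    "dizygotic twins":   (0.5,   0.25, 1.0),
    "monozygotic twins": (1.0,   1.0,  1.0),
}

def expected_covariance(relative, var_a, var_d, var_c):
    """Expected phenotypic covariance for one type of relative pair.

    Assumption: the specific-environment (residual) component is not
    shared between different individuals, so it is left out.
    """
    coef_a, coef_d, coef_c = COEFFICIENTS[relative]
    return coef_a * var_a + coef_d * var_d + coef_c * var_c

# Hypothetical variance components (not taken from the slides)
VAR_A, VAR_D, VAR_C = 0.5, 0.1, 0.2

cov_mz = expected_covariance("monozygotic twins", VAR_A, VAR_D, VAR_C)
cov_dz = expected_covariance("dizygotic twins", VAR_A, VAR_D, VAR_C)
cov_cousins = expected_covariance("first cousins", VAR_A, VAR_D, VAR_C)
```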

Heritability ($H^2$)

Broad-sense heritability: $H^2 = (V_A + V_D) / V_P$

Narrow-sense heritability: $H^2 = V_A / V_P$

Statistical Methods for Estimating Heritability

• Simple linear regression of offspring on parent: $Y_{\text{offspring}} = \alpha + \beta\,Y_{\text{parent}} + e$
  – $H^2 = 2\beta$

• Twin concordance. Intraclass correlation: $r_{MZ}$ and $r_{DZ}$
  – $H^2 = 2(r_{MZ} - r_{DZ})$

• Path analysis and variance component model
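The first two estimators in this list are simple enough to sketch in a few lines. The example below assumes NumPy is available; the function names and all data values are hypothetical and serve only to show the arithmetic (twice the regression slope, and twice the difference between the MZ and DZ correlations).

```python
import numpy as np

def heritability_from_regression(parent, offspring):
    """Twice the slope of the offspring-on-parent regression."""
    slope, _intercept = np.polyfit(parent, offspring, 1)
    return 2.0 * slope

def heritability_from_twins(r_mz, r_dz):
    """Twice the difference between the MZ and DZ intraclass correlations."""
    return 2.0 * (r_mz - r_dz)

# Hypothetical trait values for five parent-offspring pairs
parent = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
offspring = np.array([2.1, 2.4, 2.9, 3.2, 3.4])

h2_regression = heritability_from_regression(parent, offspring)

# Hypothetical intraclass correlations
h2_twins = heritability_from_twins(r_mz=0.8, r_dz=0.5)
```

With these made-up inputs the regression slope is 0.34, giving an estimate of 0.68, and the twin formula gives 0.6. Estimates from path analysis or variance component models can differ from both because they fit all components jointly.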

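Path Model for Twin Data

[Figure: path diagram for a twin pair. For each twin, latent factors A, D, C and E (A1, D1, C1, E1 for Twin 1; A2, D2, C2, E2 for Twin 2) point to the phenotype with path coefficients a, d, c and e. The correlations drawn between the twins' latent factors are labelled r = 1, r = .5 / .25 and r = 1 / .5.]

A=additive; D=dominant; C=common environment; E=specific environment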

Intraclass Correlation: Femoral neck bone mass

[Figure: two scatter plots of Twin 2 against Twin 1 values (both axes 0.5 to 1.4), one for MZ pairs (rMZ = 0.73) and one for DZ pairs (rDZ = 0.47).]

Genetic Determination of Lean, Fat and Bone Mass

                      rMZ            rDZ            $H^2$ (%)
Lumbar spine BMD      0.74 (0.06)    0.48 (0.10)    77.8
Femoral neck BMD      0.73 (0.06)    0.47 (0.11)    76.4
Total body BMD        0.80 (0.05)    0.48 (0.10)    78.6
Lean mass             0.72 (0.06)    0.32 (0.12)    83.5
Fat mass              0.62 (0.08)    0.30 (0.12)    64.8

rMZ, rDZ: intraclass correlation for MZ and DZ twins
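An intraclass correlation for twin pairs can be computed in several ways, and the slides do not say which estimator was used. The sketch below shows one common option, the double-entry (pairwise) intraclass correlation, in which every pair is entered twice with the twin order swapped so that the labels "Twin 1" and "Twin 2" do not matter. It assumes NumPy; the function name and data are hypothetical.

```python
import numpy as np

def intraclass_correlation(twin1, twin2):
    """Double-entry intraclass correlation for paired twin data."""
    t1 = np.asarray(twin1, dtype=float)
    t2 = np.asarray(twin2, dtype=float)
    # Enter each pair in both orders, then take the ordinary Pearson correlation
    x = np.concatenate([t1, t2])
    y = np.concatenate([t2, t1])
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical bone density values (g/cm2) for four twin pairs
twin1 = [0.80, 0.95, 1.10, 1.25]
twin2 = [0.85, 0.90, 1.05, 1.30]

r = intraclass_correlation(twin1, twin2)
r_swapped = intraclass_correlation(twin2, twin1)  # same value: twin order is irrelevant
```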

Multivariate Analysis: The Cholesky Decomposition Model

[Figure: path diagram of the Cholesky decomposition model. Five observed traits (lean mass, fat mass, LS BMD, FN BMD, TB BMD) are linked to genetic factors G1 to G5 and environmental factors E1 to E5.]

LS=lumbar spine, FN=femoral neck, TB=total body, BMD = bone mineral density
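The name of the model comes from the Cholesky factorisation of a covariance matrix into a lower-triangular matrix times its transpose, so that the first factor loads on all traits, the second on all but the first, and so on. The sketch below illustrates only that factorisation with NumPy on a hypothetical 3-trait covariance matrix; it is not the twin model fitted in the study, which estimates separate genetic and environmental factors.

```python
import numpy as np

# Hypothetical covariance matrix for three traits (not data from the slides)
cov = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.6],
    [0.3, 0.6, 1.0],
])

# Lower-triangular factor: row i holds the loadings of trait i on factors 1..i
loadings = np.linalg.cholesky(cov)

# Multiplying the factor by its transpose recovers the covariance matrix
reconstructed = loadings @ loadings.T
```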

Genetic and Environmental Correlation between Lean, Fat and Bone Mass

                          LM      FM      LS      FN      TB
Lean mass (LM)            -       0.52    0.39    0.23    0.51
Fat mass (FM)             0.16    -       0.41    0.36    0.70
Lumbar spine BMD (LS)     0.08    0.02    -       0.57    0.70
Femoral neck BMD (FN)     0.16    0.05    0.64    -       0.61
Total body BMD (TB)       0.09    0.31    0.75    0.58    -

Strategies for finding genes

How many genes?

• Initial estimate: 120,000.

• DNA sequence: 60,000 - 70,000.

• HGP: 32,000 - 39,000 (including non-functional genes = inactive genes).

Distribution of the number of genes

[Figure: number of genes (y-axis) plotted against effect size (x-axis), with regions labelled polygenes, oligogenes and major genes.]

Finding genes: a challenge

One of the most difficult challenges ahead is to find genes involved in diseases that have a complex pattern of inheritance, such as those that contribute to osteoporosis, diabetes, asthma, cancer and mental illness.

Why Search for Genes?

• Scientific value
  – Study genes’ actions at the molecular level

• Therapeutic value
  – Gene product and development of new drugs
  – Gene therapy

• Public health
  – Identification of “high-risk” individuals
  – Interaction between genes and environment

Genomewide screening vs Candidate gene approach

• Genomewide screening
  – No physiological assumption
  – Systematic screening for chromosomal regions of interest in the entire genome

• Candidate gene
  – Proven or hypothetical physiological mechanism
  – Direct test for individual genes

Linkage vs Association

• Linkage
  – Transmission of genes within pedigrees

• Association
  – Difference in allele frequencies between cases and unrelated controls

Statistical models

• Linkage analysis traces cosegregation and recombination between observed markers and an unobserved putative trait locus. Significance is expressed as a LOD (log-odds) score.

• Association analysis compares the frequencies of alleles between unrelated cases (diseased) and controls.

• Transmission disequilibrium test (TDT) examines the transmission of alleles from heterozygous parents to those children exhibiting the phenotype of interest.
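The TDT described in the last bullet is usually computed as a McNemar-type chi-square on the number of times heterozygous parents do and do not transmit the allele of interest to affected children. The sketch below shows that statistic with a 1-degree-of-freedom p-value; the function name and the counts are hypothetical, not data from the slides.

```python
import math

def tdt_statistic(transmitted, not_transmitted):
    """McNemar-type TDT chi-square (1 df) and its p-value.

    transmitted / not_transmitted: how often heterozygous parents did or
    did not pass the allele of interest to an affected child.
    """
    chi_square = (transmitted - not_transmitted) ** 2 / (transmitted + not_transmitted)
    # Upper tail of a chi-square with 1 df, written with the complementary error function
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times
chi_square, p_value = tdt_statistic(60, 40)
print(chi_square)          # 4.0
print(round(p_value, 4))   # 0.0455
```

Under the null hypothesis of no linkage or association, each heterozygous parent transmits either allele with probability 1/2, so the two counts should be roughly equal.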

Two-point linkage analysis: an example

[Figure: three-generation pedigree with marker genotypes. Grandparents: ?? and 138/142. Parents: 134/142 and 146/154. Haplotype labels shown on the pedigree: D142, D142, d134.]

Offspring    Marker genotype    Status
1            142/146            Non
2            142/154            Rec
3            134/146            Non
4            142/154            Non
5            134/146            Non
6            134/154            Non
7            134/146            Rec
8            134/154            Non

Non = non-recombination; Rec = recombination

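Probability of each marker-trait haplotype transmitted by the parent carrying D with 142 and d with 134:

No linkage
         D            d
134      1/4          1/4
142      1/4          1/4

Complete linkage
         D            d
134      0            1/2
142      1/2          0

Incomplete linkage
         D            d
134      θ/2          (1-θ)/2
142      (1-θ)/2      θ/2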

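$$\mathrm{LOD}(\theta) = \log_{10} \frac{\left(\frac{\theta}{2}\right)^{2} \left(\frac{1-\theta}{2}\right)^{6}}{\left(\frac{1}{4}\right)^{8}}$$

Estimation of θ

[Figure: LOD score (y-axis, -6 to +6) plotted against the estimated value of θ (x-axis, 0 to 0.5); the peak of the curve is marked as the max LOD score.]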

Basic linkage model

LR: likelihood ratio

$$\mathrm{LR}(\theta) = \frac{L(\text{data} \mid \theta)}{L(\text{data} \mid \theta = 0.5)}$$

$$\mathrm{LOD} = \log_{10} \max_{\theta} \left[\mathrm{LR}(\theta)\right]$$
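The likelihood ratio and its maximisation can be sketched for the two-point example, in which 2 of the 8 offspring are recombinants. The code below assumes the parental phase is known, so the likelihood is simply binomial in the recombination fraction, and it searches a grid of values between 0.01 and 0.5; the function name and the grid are illustrative choices.

```python
import math

def lod_score(theta, n_recombinant, n_nonrecombinant):
    """log10 of L(data | theta) / L(data | theta = 0.5), phase assumed known."""
    n_total = n_recombinant + n_nonrecombinant
    log_l_theta = (n_recombinant * math.log10(theta)
                   + n_nonrecombinant * math.log10(1 - theta))
    log_l_null = n_total * math.log10(0.5)
    return log_l_theta - log_l_null

# Counts from the pedigree example: 2 recombinants, 6 non-recombinants
N_REC, N_NON = 2, 6

# Grid search over theta = 0.01, 0.02, ..., 0.50
grid = [i / 100 for i in range(1, 51)]
theta_hat = max(grid, key=lambda t: lod_score(t, N_REC, N_NON))
max_lod = lod_score(theta_hat, N_REC, N_NON)

print(theta_hat)           # 0.25
print(round(max_lod, 3))   # 0.455
```

The maximum sits at the observed recombinant proportion, 2/8. A single small family like this gives a LOD far below the values of 3 or more conventionally required to declare linkage, which is why LOD scores are summed across families.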

Haseman-Elston model (allele sharing method)

$X_{i1}$ = value of sib 1; $X_{i2}$ = value of sib 2

$D_i = |X_{i1} - X_{i2}|^2$

$\pi_i$ = probability of genes shared identical-by-descent

$$E(D_i \mid \pi_i) = \alpha + \beta\,\pi_i$$

If $\beta = 0$ => $\sigma^2(g) = 0$; $\theta = 0.5$, i.e. no linkage

If $\beta < 0$ => $\sigma^2(g) > 0$; $\theta \ne 0.5$, i.e. linkage

Behav Genet 1972; 2:3-19
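The regression at the heart of the Haseman-Elston method can be sketched on simulated data: squared sib-pair differences are regressed on the proportion of alleles shared IBD, and a negative slope points to linkage. Everything below is an assumption made for illustration (NumPy, the simulated sample of 2000 pairs, the true intercept and slope); it is not the analysis from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 2000

# Proportion of alleles shared IBD by each sib pair: 0, 1/2 or 1
ibd_share = rng.choice([0.0, 0.5, 1.0], size=n_pairs, p=[0.25, 0.5, 0.25])

# Hypothetical true model: expected squared difference falls as sharing rises
ALPHA_TRUE, BETA_TRUE = 2.0, -1.0
expected_sq_diff = ALPHA_TRUE + BETA_TRUE * ibd_share

# A squared difference of normal variables is its expectation times a chi-square(1)
sq_diff = expected_sq_diff * rng.chisquare(df=1, size=n_pairs)

# Ordinary least squares of squared difference on IBD sharing
beta_hat, alpha_hat = np.polyfit(ibd_share, sq_diff, 1)
linkage_suggested = beta_hat < 0
```

In practice the slope would be tested one-sidedly against zero, since only a negative value is consistent with linkage.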

Identical-by-descent (IBD)

Parents:    126 / 130    134 / 138

Offspring:  A 126 / 134    B 126 / 138    C 130 / 134    D 130 / 138    E 126 / 138

• A and D share no alleles
• A, B and E share 1 allele (126) ibd; so do C and D (130), A and C (134), and B, D and E (138)
• B and E share 2 alleles (126 and 138) ibd

Alleles are ibd if they are identical and descended from the same ancestral allele
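For a fully informative mating like the one in this slide, where the four parental alleles are all different, counting alleles shared IBD reduces to counting matching alleles between two sibs. The sketch below relies on that assumption (it would overcount if the parents shared alleles, as in the IBS case); the function name is hypothetical and the genotypes are those of sibs A to E.

```python
from collections import Counter

def shared_alleles(genotype1, genotype2):
    """Number of alleles (0, 1 or 2) two genotypes have in common.

    Equals the IBD count only when all four parental alleles are distinct.
    """
    overlap = Counter(genotype1) & Counter(genotype2)
    return sum(overlap.values())

# Offspring genotypes from the slide (parents: 126/130 and 134/138)
sibs = {
    "A": (126, 134),
    "B": (126, 138),
    "C": (130, 134),
    "D": (130, 138),
    "E": (126, 138),
}

print(shared_alleles(sibs["A"], sibs["D"]))  # 0
print(shared_alleles(sibs["A"], sibs["B"]))  # 1
print(shared_alleles(sibs["B"], sibs["E"]))  # 2
```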

Identical-by-state (IBS)

Parents:    126 / 126    126 / 138

Offspring:  A 126 / 126    B 126 / 138    C 126 / 138    D 126 / 126

• A and D share 1 allele (126) ibs
• B and C share 126 ibs, 138 ibd

Alleles are ibs if they are identical, but their ancestral derivation is unclear

Sibpair linkage analysis: allele-sharing method

[Figure: schematic scatter plot of squared difference in BMD among siblings (y-axis) against number of alleles shared IBD (x-axis: 0, 1, 2).]

[Figure: intrapair difference (%) (y-axis, 0 to 25) by number of alleles shared IBD (x-axis: 0, 1, 2).]

Linkage between VDR gene and lumbar spine bone mineral density in a sample of 78 DZ twin pairs. Nature 1994; 367:284-287

Association analysis

• Presence/absence of an allele in a phenotype.

Genotype    Fx     No Fx
BB          50     10
Bb          30     30
bb          20     60
Total       100    100

Frequency of allele B among fx: (50x2 + 30) / (100x2) = 0.65

Frequency of allele B among no fx: (10x2 + 30) / (100x2) = 0.25
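The allele-frequency arithmetic above counts two B alleles for every BB individual and one for every Bb individual, then divides by the total number of alleles. A minimal sketch with the genotype counts from the table follows; the function name is hypothetical.

```python
def allele_b_frequency(n_bb_hom, n_het, n_other_hom):
    """Frequency of allele B given counts of BB, Bb and bb genotypes."""
    total_alleles = 2 * (n_bb_hom + n_het + n_other_hom)
    b_alleles = 2 * n_bb_hom + n_het   # BB carries two copies, Bb carries one
    return b_alleles / total_alleles

# Genotype counts from the table: (BB, Bb, bb)
freq_fx = allele_b_frequency(50, 30, 20)
freq_no_fx = allele_b_frequency(10, 30, 60)

print(freq_fx)     # 0.65
print(freq_no_fx)  # 0.25
```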

Association analysis: an example

[Figure: bone mineral density in g/cm2 (y-axis, 0.8 to 1.1) by VDR genotype (BB, Bb, bb).]

Association between vitamin D receptor gene and bone mineral density

Association analysis

• Three conditions of association
  – The genetic marker is the putative gene
  – The marker is in linkage disequilibrium (association) with the putative gene or with a nearby locus
  – The association is a random artefact or the result of population admixture

Linkage and association

• Linkage without association
  – Many trait-causing loci
  – Association between a marker and a locus can be weak or absent

• Association without linkage
  – A minor effect of the genetic marker
  – Poor discriminant power for phenotype within a pedigree

Statistical issues

Diagnostic reasoning

                 Disease is really
Test             Present         Absent
+ve              True +ve        False +ve
-ve              False -ve       True -ve

Statistical reasoning

                 Null hypothesis (Ho) is
Stat test        Not true              True
Reject Ho        No error              Type I ($\alpha$)
Accept Ho        Type II ($\beta$)     No error

Study design: minimize type I and type II errors

No. of sibpairs required to establish linkage for a single gene and recombination fraction $\theta = 0$

$\lambda$    LOD = 3    LOD = 4
1.1          7460       8931
1.2          2048       2566
1.3          1033       1299
1.5          489        615
2.0          199        242
1.5          191        154
3.0          88         115

$\lambda$ = familial relative risk

Strategies for improvement of power

• Population and sampling

• Phenotypes

• Statistical analysis

Population and sampling

• Population
  – Homogeneous populations

• Sampling units
  – Related members
  – Large, multigenerational families (rather than sibpairs)

• Phenotypes
  – Low-level, intermediate
  – Well-defined and highly reproducible

Statistical analyses

• Multivariate analysis vs. univariate analysis

• Variance component model

• Power
  – Locus-specific power: probability of detecting an individual locus associated with the trait, e.g. $1 - \beta_i$
  – Genomewide power: probability of detecting any of the k loci, e.g. $1 - \beta_1 \times \beta_2 \times \beta_3 \times \dots \times \beta_k$
  – Studywise power: probability of detecting all k loci, e.g. $(1-\beta_1) \times (1-\beta_2) \times (1-\beta_3) \times \dots \times (1-\beta_k)$
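The three definitions of power differ only in how the per-locus type II error rates are combined. The sketch below writes each locus's type II error as a number between 0 and 1 and assumes the loci are detected independently, which the product formulas imply; the function names and the error rates are hypothetical.

```python
import math

def locus_power(type2_error):
    """Probability of detecting one particular locus."""
    return 1 - type2_error

def genomewide_power(type2_errors):
    """Probability of detecting at least one of the loci."""
    return 1 - math.prod(type2_errors)

def studywise_power(type2_errors):
    """Probability of detecting every one of the loci."""
    return math.prod(1 - b for b in type2_errors)

# Hypothetical type II error rates for k = 3 loci
type2_errors = [0.2, 0.2, 0.2]

single = locus_power(type2_errors[0])       # about 0.8
any_locus = genomewide_power(type2_errors)  # about 0.992
all_loci = studywise_power(type2_errors)    # about 0.512
```

Even with 80% power at each locus, the chance of finding all three falls to about one half, while the chance of finding at least one is close to certainty.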

Summary

• Most diseases are regulated by genes and environment.

• Genetic dissection of multifactorial diseases is a challenge.

• Gene-hunting is a major endeavour in epidemiological research.

• Substantial progress in statistical models.

Perspective

• Can genes be found?

• The Human Genome Project

• Influences of biotechnology

• Should “epidemiology” become “genetic epidemiology”?

Further readings

• BMJ 2001; 322: 28 April. Special issue on genetics.

• Nguyen TV, Eisman JA. Genetics of fracture: challenges and opportunities. J Bone Miner Res 2000; 15:1253-1256.

• Nguyen TV, Blangero J, Eisman JA. Genetic epidemiological approaches to the search for osteoporosis genes. J Bone Miner Res 2000; 15:392-401.

• Nguyen TV, et al. Bone mass, lean mass and fat mass: same genes or same environment. Amer J Epidemiol 1998; 147:3-16.