Genetic Epidemiological Strategies in the Search for Genes
Tuan V. Nguyen
University of New South Wales, Faculty of Medicine
Genes and Diseases
• Many diseases have their roots in both genes and environment.
• Currently, >4000 diseases, including sickle cell anemia and cystic fibrosis, are known to be genetic and are passed on in families.
Genes and Medical Sciences
The central question for the medical sciences is the extent to which it will be possible to relate events at the molecular level with the clinical findings or phenotypes of patients with particular diseases.
Contents
• Genes and DNA
• Detection of genetic effects
• Search for specific genes
Chromosomes
Each human cell contains 23 pairs of chromosomes (distinguished by size and banding pattern). The karyotype shown is male (XY); females have two X chromosomes.
DNA and Genes
• DNA carries the instructions that allow cells to make proteins.
• DNA is made up of 4 chemical bases (A, T, G, C).
• The bases make “words”: AGT CTC GAA TAA
• Words make “sentence” = genes:
< AGT CTC GAA TAA>
Genes, Alleles, and Genotypes
• The location of a gene is called its locus.
• Alleles are alternate forms of a gene. Example: A, a
• Genotype: the maternal and paternal alleles of an individual at a locus define the genotype of the individual at that locus. Example: AA, Aa, aa.
How Do Genes Work?
• Genes tell the cell how to make molecules called proteins.
• Proteins allow cells to perform specific functions.
• If the instructions are intact, the cell functions normally. If the instructions are changed (mutated), an abnormality can result.
Inheritance
• The passing of genes from parents to child is the basis of inheritance.
• We are not identical to our parents: half of our genes are from our mothers and half from our fathers.
• Each brother and sister inherits a different combination of chromosomes. Each parent can pass on N = 2^23 = 8,388,608 combinations.
• Identical twins receive exactly the same combination of genes from their parents.
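The 2^23 figure comes from each of the 23 chromosome pairs contributing one of its two homologues to a gamete. The arithmetic can be checked with a short Python sketch; it assumes no recombination (crossing-over would make the real number of distinct gametes far larger), and the variable names are illustrative only.

```python
# Each of the 23 chromosome pairs passes on one of its two homologues,
# so one parent can produce 2**23 chromosome combinations.
# Assumption: crossing-over is ignored.
N_PAIRS = 23
combinations_per_parent = 2 ** N_PAIRS

# Any maternal combination can meet any paternal combination.
combinations_per_couple = combinations_per_parent ** 2

print(combinations_per_parent)   # 8388608
print(combinations_per_couple)   # 70368744177664
```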
Genetic effects
• Three types of gene action: additive, dominant, and epistatic.
• Additive effect.
– AA = 9, Aa = 7, aa = 5.
• Dominant effect.
– AA = 9, Aa = 9, aa = 5.
• Epistasis: interaction of alleles at 2 loci.
– For locus 1: AA = 9, Aa = 7, aa = 5.
– For locus 2: AA = 5, Aa = 5, aa = 9.
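One way to read the genotype values above is through the usual quantitative-genetic decomposition: the midpoint of the two homozygotes, the additive effect a (half the difference between homozygotes), and the dominance deviation d (how far the heterozygote sits from the midpoint). The sketch below assumes that decomposition; the function name is hypothetical.

```python
def gene_action(value_AA, value_Aa, value_aa):
    """Return (midpoint, additive effect a, dominance deviation d)."""
    midpoint = (value_AA + value_aa) / 2   # mean of the two homozygotes
    a = (value_AA - value_aa) / 2          # additive effect
    d = value_Aa - midpoint                # dominance deviation
    return midpoint, a, d

additive = gene_action(9, 7, 5)   # heterozygote exactly at the midpoint, d = 0
dominant = gene_action(9, 9, 5)   # heterozygote equals AA, d = a

print(additive)   # (7.0, 2.0, 0.0)
print(dominant)   # (7.0, 2.0, 2.0)
```

With d = 0 the locus is purely additive; with d = a the A allele is completely dominant.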
How to detect genetic effects?
Clues to Genetics and Environment
Epidemiological characteristic   Genetics   Environment
Geographic variation             +          +
Ethnic variation                 +          +
Temporal variation               -          +
Epidemics                        +/-        +
Social class variation           -          +
Gender variation                 +          +
Age                              +/-        +
Family variables
  History of disease             +          +
  Birth order                    +/-        +
  Birth interval                 -          +
  Co-habitation                  -          +
Methods of Investigation of Genetic Traits
• Family studies. Examine phenotypes (diseases) in the relatives of affected subjects (probands).
• Twin studies. Compare the intraclass correlation in MZ twins (who share 100% of their genes) with that in DZ twins (who share, on average, 50% of their genes).
• Adoption studies. Seek to distinguish genetic from environmental effects by testing whether adopted children's phenotypes more closely resemble those of their biological parents than those of their adoptive parents.
• Offspring of discordant MZ twins. Control for environmental effect; test for large genetic contribution to etiology.
Basic Genetic-Environmental Model
Phenotype (P) = Genetics + Environment
Genetics = Additive (A) + Dominant (D)
Environment = Common (C) + Specific (E)
=> P = A + D + C + E
Statistical Genetic Model
$\mathrm{Cov}(Y_i, Y_j) = 2\phi_{ij}\,\sigma^2(a) + \Delta_{ij}\,\sigma^2(d) + \gamma_{ij}\,\sigma^2(c) + \delta_{ij}\,\sigma^2(e)$
$\phi_{ij}$: kinship coefficient
$\Delta_{ij}$: Jacquard's coefficient of identity by descent
$\gamma_{ij}$: probability of sharing environmental factors
$\delta_{ij}$: residual coefficient
$V_P = V_A + V_D + V_C + V_E$
V = variance; P = phenotype; A, D, C, E = as defined above
Kinship coefficients
                     Expected coefficient for
Relative             $\sigma^2(a)$   $\sigma^2(d)$   $\sigma^2(c)$
Spouse-spouse        0               0               1
Parent-offspring     1/2             0               1
Full sibs            1/2             1/4             1
Half-sibs            1/4             0               1
Aunt-niece           1/4             0               1
First cousins        1/8             0               0
Dizygotic twins      1/2             1/4             1
Monozygotic twins    1               1               1
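The coefficients in this table can be plugged straight into the covariance model to give the expected phenotypic covariance for a pair of relatives. The following is a minimal sketch: the variance components are made-up illustrative values (not estimates from any study), the function and dictionary names are hypothetical, and the specific-environment term is left out on the assumption that it is uncorrelated between two different individuals.

```python
# Expected coefficients for the additive, dominance and common-environment
# variances, copied from a few rows of the kinship table.
COEFFICIENTS = {
    "parent-offspring": (0.5, 0.0, 1.0),
    "full sibs": (0.5, 0.25, 1.0),
    "first cousins": (0.125, 0.0, 0.0),
    "monozygotic twins": (1.0, 1.0, 1.0),
}

def expected_covariance(relationship, var_a, var_d, var_c):
    """Expected covariance between two relatives of the given type."""
    coef_a, coef_d, coef_c = COEFFICIENTS[relationship]
    return coef_a * var_a + coef_d * var_d + coef_c * var_c

# Hypothetical variance components (total phenotypic variance set to 1).
VAR_A, VAR_D, VAR_C = 0.5, 0.1, 0.2

for relationship in COEFFICIENTS:
    cov = expected_covariance(relationship, VAR_A, VAR_D, VAR_C)
    print(f"{relationship}: {cov:.4f}")
```

Because the total variance is set to 1 here, each printed covariance can also be read as an expected correlation.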
Heritability ($H^2$)
$\mathrm{Cov}(Y_i, Y_j) = 2\phi_{ij}\,\sigma^2(a) + \Delta_{ij}\,\sigma^2(d) + \gamma_{ij}\,\sigma^2(c) + \delta_{ij}\,\sigma^2(e)$
$V_P = V_A + V_D + V_C + V_E$
Broad-sense heritability: $H^2 = (V_A + V_D) / V_P$
Narrow-sense heritability: $h^2 = V_A / V_P$
Statistical Methods for Estimating Heritability
• Simple linear regression: $Y_{\mathrm{offspring}} = \alpha + \beta\,Y_{\mathrm{parent}} + e$
  $H^2 = 2\beta$
• Twin concordance. Intraclass correlation: $r_{MZ}$ and $r_{DZ}$
  $H^2 = 2(r_{MZ} - r_{DZ})$
• Path analysis and variance component model
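The two simple estimators above are easy to compute by hand. The sketch below assumes a regression of offspring on a single parent (so heritability is taken as twice the slope) and uses the twin formula as written. The twin correlations are the femoral neck values reported in this deck; the parent and offspring measurements are invented for illustration, and the function names are hypothetical.

```python
def regression_slope(parent, offspring):
    """Ordinary least-squares slope of offspring values on parent values."""
    n = len(parent)
    mean_p = sum(parent) / n
    mean_o = sum(offspring) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(parent, offspring))
    sxx = sum((p - mean_p) ** 2 for p in parent)
    return sxy / sxx

def heritability_from_regression(parent, offspring):
    """Twice the single-parent regression slope."""
    return 2 * regression_slope(parent, offspring)

def heritability_from_twins(r_mz, r_dz):
    """Twice the difference between MZ and DZ intraclass correlations."""
    return 2 * (r_mz - r_dz)

# Invented parent/offspring values, for illustration only.
parent = [0.8, 0.9, 1.0, 1.1, 1.2]
offspring = [0.94, 0.96, 1.00, 1.03, 1.07]

print(round(heritability_from_regression(parent, offspring), 2))
print(round(heritability_from_twins(0.73, 0.47), 2))
```

These quick estimates need not agree with values obtained from path analysis or a variance component model, which use more of the information in the data.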
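Path Model for Twin Data
[Figure: path diagram for Twin 1 and Twin 2. Latent factors E1, C1, D1, A1 act on Twin 1 and A2, D2, C2, E2 on Twin 2 through paths a, c, d, e. Correlations between the twins' factors are labelled r = 1, r = .5 / .25 and r = 1 / .5.]
A = additive; D = dominant; C = common environment; E = specific environment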
Intraclass Correlation: Femoral neck bone mass
[Figure: two scatter plots of Twin 2 against Twin 1, one for MZ pairs and one for DZ pairs; both axes run from 0.5 to 1.4.]
MZ: rMZ = 0.73; DZ: rDZ = 0.47
Genetic Determination of Lean, Fat and Bone Mass
Trait               rMZ           rDZ           $H^2$ (%)
Lumbar spine BMD    0.74 (0.06)   0.48 (0.10)   77.8
Femoral neck BMD    0.73 (0.06)   0.47 (0.11)   76.4
Total body BMD      0.80 (0.05)   0.48 (0.10)   78.6
Lean mass           0.72 (0.06)   0.32 (0.12)   83.5
Fat mass            0.62 (0.08)   0.30 (0.12)   64.8
rMZ, rDZ: intraclass correlations for MZ and DZ twins
Multivariate Analysis: The Cholesky Decomposition Model
[Figure: path diagram in which genetic factors G1 to G5 and environmental factors E1 to E5 act on five traits: lean mass, fat mass, LS BMD, FN BMD and TB BMD.]
LS = lumbar spine, FN = femoral neck, TB = total body, BMD = bone mineral density
Genetic and Environmental Correlation between Lean, Fat and Bone Mass
                         LM     FM     LS     FN     TB
Lean mass (LM)           -      0.52   0.39   0.23   0.51
Fat mass (FM)            0.16   -      0.41   0.36   0.70
Lumbar spine BMD (LS)    0.08   0.02   -      0.57   0.70
Femoral neck BMD (FN)    0.16   0.05   0.64   -      0.61
Total body BMD (TB)      0.09   0.31   0.75   0.58   -
Strategies for finding genes
How many genes?
• Initial estimate: 120,000.
• DNA sequence: 60,000 - 70,000.
• HGP: 32,000 - 39,000 (including non-functional genes = inactive genes).
Distribution of the number of genes
[Figure: number of genes plotted against effect size, with regions labelled major genes, oligogenes and polygenes.]
Finding genes: a challenge
One of the most difficult challenges ahead is to find genes involved in diseases that have a complex pattern of inheritance, such as those that contribute to osteoporosis, diabetes, asthma, cancer and mental illness.
Why Search for Genes?
• Scientific value
– Study genes' actions at the molecular level
• Therapeutic value
– Gene product and development of new drugs
– Gene therapy
• Public health
– Identification of “high-risk” individuals
– Interaction between genes and environment
Genomewide screening vs Candidate gene approach
• Genomewide screening
– No physiological assumption
– Systematic screening for chromosomal regions of interest in the entire genome
• Candidate gene
– Proven or hypothetical physiological mechanism
– Direct test for individual genes
Linkage vs Association
• Linkage
– Transmission of genes within pedigrees
• Association
– Difference in allele frequencies between cases and unrelated controls
Statistical models
• Linkage analysis traces cosegregation and recombination between observed markers and an unobserved putative trait locus. Significance is expressed as a LOD (log-odds) score.
• Association analysis compares the frequencies of alleles between unrelated cases (diseased) and controls.
• Transmission disequilibrium test (TDT) examines the transmission of alleles from heterozygous parents to those children exhibiting the phenotype of interest.
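For a marker with two alleles, the TDT is commonly computed as a McNemar-type chi-square with 1 degree of freedom: among heterozygous parents, count how often the allele of interest was transmitted to an affected child and how often it was not. The sketch below assumes that form; the counts are hypothetical and the function name is illustrative.

```python
def tdt_statistic(transmitted, not_transmitted):
    """McNemar-type TDT chi-square (1 df) for a biallelic marker.

    transmitted / not_transmitted: number of heterozygous parents who did /
    did not pass the allele of interest to an affected child.
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - not_transmitted) ** 2 / total

# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi_square = tdt_statistic(60, 40)
print(chi_square)  # 4.0
```

A value above 3.84 corresponds to p < 0.05 for a chi-square with 1 degree of freedom.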
Two-point linkage analysis: an example
[Figure: three-generation pedigree with marker genotypes. Grandparents: ?? and 138/142. Parents: 134/142 and 146/154. Haplotypes shown: D–142, D–142, d–134.]
Offspring genotypes:   142/146  142/154  134/146  142/154  134/146  134/154  134/146  134/154
Recombination status:  Non      Rec      Non      Non      Non      Non      Rec      Non
Non = non-recombination; Rec = recombination
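Expected gamete frequencies for marker alleles (134, 142) and trait alleles (D, d):
                 No linkage      Complete linkage   Incomplete linkage
Marker allele    D      d        D      d           D                 d
134              1/4    1/4      0      1/2         $\theta/2$        $(1-\theta)/2$
142              1/4    1/4      1/2    0           $(1-\theta)/2$    $\theta/2$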
Estimation of $\theta$
[Figure: LOD score (vertical axis, -6 to +6) plotted against the estimated value of $\theta$ (0 to 0.5), with the maximum LOD score marked.]
Basic linkage model
LR: likelihood ratio
$\mathrm{LR}(\theta) = L(\mathrm{data} \mid \theta) \, / \, L(\mathrm{data} \mid \theta = 0.5)$
$\mathrm{LOD} = \log_{10} \max_{\theta} \, [\mathrm{LR}(\theta)]$
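For the simplest case, the likelihood ratio can be written out explicitly. The sketch below assumes phase-known, fully informative meioses, so that the likelihood of r recombinants among n meioses is proportional to θ^r (1 − θ)^(n − r), and maximizes the LOD score over a grid of recombination fractions θ. It uses the two recombinants among eight offspring shown in the example pedigree; the function names and the grid search are illustrative choices, not the method of any particular linkage program.

```python
import math

def lod_score(theta, n_recombinant, n_total):
    """LOD score at a given recombination fraction (phase-known meioses)."""
    n_non = n_total - n_recombinant
    likelihood = theta ** n_recombinant * (1 - theta) ** n_non
    null_likelihood = 0.5 ** n_total          # theta = 0.5, i.e. no linkage
    return math.log10(likelihood / null_likelihood)

def max_lod(n_recombinant, n_total, grid_size=500):
    """Grid search over 0 < theta <= 0.5 for the maximum LOD score."""
    thetas = [0.5 * k / grid_size for k in range(1, grid_size + 1)]
    best = max(thetas, key=lambda t: lod_score(t, n_recombinant, n_total))
    return best, lod_score(best, n_recombinant, n_total)

theta_hat, lod = max_lod(n_recombinant=2, n_total=8)
print(f"theta = {theta_hat:.2f}, max LOD = {lod:.2f}")
```

With these counts the maximum falls at θ = 2/8 = 0.25, and the LOD score is well below the conventional threshold of 3, as expected for a single small family.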
Haseman-Elston model (allele sharing method)
$X_{i1}$ = value of sib 1; $X_{i2}$ = value of sib 2
$\Delta_i = (X_{i1} - X_{i2})^2$
$\pi_i$ = proportion of alleles shared identical-by-descent (IBD)
$E(\Delta_i \mid \pi_i) = \alpha + \beta\,\pi_i$
If $\beta = 0$: $\sigma^2(g) = 0$ or $\theta = 0.5$, i.e. no linkage
If $\beta < 0$: $\sigma^2(g) > 0$ and $\theta \neq 0.5$, i.e. linkage
Behav Genet 1972; 2:3-19
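The regression at the heart of this method is an ordinary least-squares fit of the squared sib-pair difference on the proportion of alleles shared IBD. The sketch below shows only that fit; the six sib pairs are invented for illustration, the function name is hypothetical, and no significance test for the slope is included.

```python
def haseman_elston(sib1, sib2, ibd_proportion):
    """Regress squared sib-pair differences on IBD sharing.

    Returns (alpha, beta), the intercept and slope of the fitted line.
    """
    sq_diff = [(x1 - x2) ** 2 for x1, x2 in zip(sib1, sib2)]
    n = len(sq_diff)
    mean_pi = sum(ibd_proportion) / n
    mean_d = sum(sq_diff) / n
    sxy = sum((p - mean_pi) * (d - mean_d)
              for p, d in zip(ibd_proportion, sq_diff))
    sxx = sum((p - mean_pi) ** 2 for p in ibd_proportion)
    beta = sxy / sxx
    alpha = mean_d - beta * mean_pi
    return alpha, beta

# Invented trait values for six sib pairs sharing 0, 1 or 2 alleles IBD
# (proportions 0, 0.5 and 1).
sib1 = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
sib2 = [1.4, 1.5, 1.3, 1.2, 1.1, 1.0]
ibd = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]

alpha, beta = haseman_elston(sib1, sib2, ibd)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

In this toy data, pairs that share more alleles IBD are more alike, so the slope is negative, which is the pattern the method reads as evidence for linkage.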
Identical-by-descent (IBD)
Parents: 126 / 130 and 134 / 138
Offspring: A = 126 / 134, B = 126 / 138, C = 130 / 134, D = 130 / 138, E = 126 / 138
• A and D share no alleles
• A shares 1 allele (126) ibd with B and with E. Other pairs sharing 1 allele: C vs D (130); A vs C (134); D vs B and D vs E (138)
• B and E share 2 alleles (126 and 138) ibd
Alleles are ibd if they are identical and descended from the same ancestral allele
Identical-by-state (IBS)
Parents: 126 / 126 and 126 / 138
Offspring: A = 126 / 126, B = 126 / 138, C = 126 / 138, D = 126 / 126
• A and D share 1 allele (126) ibs
• B and C share 126 ibs, 138 ibd
Alleles are ibs if they are identical, but their ancestral derivation is unclear
Sibpair linkage analysis: allele-sharing method
[Figure: scatter plot of the squared difference in BMD among siblings against the number of alleles shared IBD (0, 1, 2).]
[Figure: intrapair difference (%), on an axis from 0 to 25, plotted against the number of alleles shared IBD (0, 1, 2).]
Linkage between VDR gene and lumbar spine bone mineral density in a sample of 78 DZ twin pairs. Nature 1994; 367:284-287
Association analysis
• Presence/absence of an allele in a phenotype.
Genotype   Fx    No Fx
BB         50    10
Bb         30    30
bb         20    60
Total      100   100
Frequency of allele B among Fx: (50×2 + 30) / (100×2) = 0.65
Frequency of allele B among no Fx: (10×2 + 30) / (100×2) = 0.25
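The allele-frequency arithmetic can be reproduced directly from the genotype counts in the table. The sketch below does that; the allelic odds ratio at the end is an extra summary that is not part of the slide, and the function name is hypothetical.

```python
def allele_frequency(n_BB, n_Bb, n_bb):
    """Frequency of allele B from genotype counts.

    Each BB subject carries two B alleles and each Bb subject one.
    """
    total_alleles = 2 * (n_BB + n_Bb + n_bb)
    return (2 * n_BB + n_Bb) / total_alleles

freq_fx = allele_frequency(50, 30, 20)       # subjects with fracture
freq_no_fx = allele_frequency(10, 30, 60)    # subjects without fracture

# Allelic odds ratio (an addition for illustration, not shown in the slide).
odds_ratio = (freq_fx / (1 - freq_fx)) / (freq_no_fx / (1 - freq_no_fx))

print(freq_fx, freq_no_fx)   # 0.65 0.25
print(round(odds_ratio, 2))  # 5.57
```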
Association analysis: an example
[Figure: bone mineral density (g/cm², axis from 0.8 to 1.1) by VDR genotype (BB, Bb, bb).]
Association between vitamin D receptor gene and bone mineral density
Association analysis
• Three conditions of association
– The genetic marker is the putative gene
– The marker is in linkage disequilibrium (association) with the putative gene or with a nearby locus
– Random artefact, population admixture
Linkage and association
• Linkage without association
– Many trait-causing loci
– Association between a marker and a locus can be weak or absent
• Association without linkage
– A minor effect of the genetic marker
– Poor discriminant power for phenotype within a pedigree
Statistical issues
Diagnostic reasoning
              Disease is really
Test          Present       Absent
+ve           True +ve      False +ve
-ve           False -ve     True -ve

Statistical reasoning
              Null hypothesis (Ho) is
Stat test     Not true            True
Reject Ho     No error            Type I ($\alpha$)
Accept Ho     Type II ($\beta$)   No error

Study design: minimize type I and type II errors
No. of sibpairs required to establish linkage for a single gene and recombination $\theta = 0$
$\lambda$    LOD = 3   LOD = 4
1.1    7460      8931
1.2    2048      2566
1.3    1033      1299
1.5    489       615
2.0    199       242
2.5    191       154
3.0    88        115
$\lambda$ = familial relative risk
Strategies for improvement of power
• Population and sampling
• Phenotypes
• Statistical analysis
Population and sampling
• Population
– Homogeneous populations
• Sampling units
– Related members
– Large, multigenerational families (rather than sibpairs)
• Phenotypes
– Low-level, intermediate
– Well-defined and highly reproducible
Statistical analyses
• Multivariate analysis vs. univariate analysis
• Variance component model
• Power
– Locus-specific power: probability of detecting an individual locus associated with the trait, e.g. $1 - \beta_i$
– Genomewide power: probability of detecting any of the k loci, e.g. $1 - \beta_1 \times \beta_2 \times \beta_3 \times \dots \times \beta_k$
– Studywise power: probability of detecting all k loci, e.g. $(1 - \beta_1) \times (1 - \beta_2) \times (1 - \beta_3) \times \dots \times (1 - \beta_k)$
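The three notions of power differ only in how the per-locus type II error rates are combined. The sketch below assumes the k loci are detected independently of one another; the error rates are hypothetical values chosen for illustration, and the function names are not from any package.

```python
import math

def locus_power(beta):
    """Power to detect one locus with type II error rate beta."""
    return 1 - beta

def genomewide_power(betas):
    """Probability of detecting at least one of the loci."""
    return 1 - math.prod(betas)

def studywise_power(betas):
    """Probability of detecting every locus."""
    return math.prod(1 - b for b in betas)

# Hypothetical type II error rates for three loci.
betas = [0.2, 0.5, 0.5]

print([locus_power(b) for b in betas])
print(round(genomewide_power(betas), 3))
print(round(studywise_power(betas), 3))
```

Even with moderate power at each locus, the chance of finding at least one locus is high while the chance of finding all of them is low.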
Summary
• Most diseases are regulated by genes and environment.
• Genetic dissection of multifactorial diseases is a challenge.
• Gene-hunting is a major endeavour in epidemiological research.
• Substantial progress in statistical models.
Perspective
• Can genes be found?
• The Human Genome Project
• Influences of biotechnology
• Should “epidemiology” become “genetic epidemiology”?
Further readings
• BMJ 2001; 322: 28 April. Special issue on genetics.
• Nguyen TV, Eisman JA. Genetics of fracture: challenges and opportunities. J Bone Miner Res 2000; 15:1253-1256.
• Nguyen TV, Blangero J, Eisman JA. Genetic epidemiological approaches to the search for osteoporosis genes. J Bone Miner Res 2000; 15:392-401.
• Nguyen TV, et al. Bone mass, lean mass and fat mass: same genes or same environment. Amer J Epidemiol 1998; 147:3-16.