Human non-synonymous SNP: molecular function, evolution and disease
Shamil Sunyaev
Genetics Division, Brigham & Women’s Hospital
Harvard Medical School
Harvard-M.I.T. Division of HST
Effect on molecular function
Phenotype
Natural selection
Medical Genetics
Structural Biology Biochemistry
Evolutionary Genetics
Why is this useful?
Understanding variation in molecular function and structure
Evolutionary genetics: comparison of polymorphism and divergence rates between different functional categories is a robust way to detect selection
Why is this useful?
Rare human developmental disorders / mouse mutagenesis screens: linkage studies are impossible
Genetics of complex disease: SNP prioritization
Genetics of complex disease: Rare variants
Common disease / Common variant
Trade-off (antagonistic pleiotropy)
Balancing selection
Recent positive selection
Reverse in direction of selection
Examples
APOE: Alzheimer’s disease
AGT: Hypertension
CYP3A: Hypertension
CAPN10: Type 2 diabetes
An individual human genome is a target for deleterious mutations!
~40% of human Mendelian diseases are due to hypermutable sites
Frequency of deleterious variants is directly proportional to mutation rate (q = μ/s)
Multiple mostly rare variants
Many deleterious alleles in mutation-selection balance
Examples
Plasma level of HDL-C
Plasma level of LDL-C
Colorectal adenomas
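The proportionality between equilibrium frequency and mutation rate, q = μ/s, can be illustrated numerically. The following is a minimal sketch, not taken from the slides: the function name and all numeric values are hypothetical, and it assumes selection strong enough for the simple ratio to hold.

```python
def equilibrium_frequency(mu: float, s: float) -> float:
    """Equilibrium allele frequency q = mu / s under mutation-selection balance."""
    if s <= 0:
        raise ValueError("s must be positive for a deleterious allele")
    return mu / s

# Illustrative values only (assumptions, not data from the slides).
baseline = equilibrium_frequency(mu=1e-8, s=0.01)
hypermutable = equilibrium_frequency(mu=1e-7, s=0.01)  # 10x higher mutation rate

# A tenfold higher mutation rate gives a tenfold higher equilibrium frequency.
print(baseline, hypermutable, hypermutable / baseline)
```

With the same selection coefficient, a hypermutable site carries proportionally more deleterious variants, which is the point made about hypermutable sites above.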
Harmful mutations
Function: damaging
Evolution: deleterious
Phenotype: detrimental
Advantageous pseudogenization (Zhang et al. 2006)
Gain of function disease mutations
Sickle Cell Anemia
N E L V T L T C L A R G F S - P K D V L V R W L
R E S A T I T C L V T G F S - P A D V F V Q W M
G G S L R L S C V A S G I T - F S G Y D M Q W V
T P G L T L T C T V S G F S - L S S Y D M G W V
G Q K A K M R C I P E - - - - K G H P V V F W Y
G Q E A T L W C E P I - - - - S G H S A V F W Y
G Q Q V T L S C F P I - - - - S G H L S L Y W Y
R K D V S L T C L V V G F N - P G D I S V E W T
G Q K L T L K C Q Q N - - - - F N H D T M Y W Y
R D K A T F T C F V V G S D - L K D A H L T W E
S K S A T L T C R V S N M V N A D G L E V S W W
G A R T S L N C T F S D - - - S A S Q Y F W W Y
G A S L Q L R C K Y S Y - - - S A T P Y L F W Y
N G A P K L T C L V V D L E S E K N V N V T W N
E A T V T L T C V V S N - - A P Y G V N V S W T
Profile
Ala  -1.2   1.1  -0.6  -0.8   0.3  ...  ...
Arg   0.6  -0.3  -0.3  -0.5   0.6  ...  ...
Asn  -1.1  -0.5  -0.5  -0.7   0.4  ...  ...
Asp  -0.9  -0.3  -0.3  -0.5   0.6  ...  ...
Cys   0.4  -0.5   0.6   0.8  -0.3  ...  ...
Gln   ...   ...   ...   ...   ...  ...  ...
...   ...   ...   ...   ...   ...  ...  ...
protein
multiple alignment
profile
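The step from a multiple alignment to a profile can be sketched as follows. This is a simplified illustration, not the scoring scheme behind the numbers shown above: it assumes a uniform background frequency, a flat pseudocount and no sequence weighting, and the function name and toy alignment fragment are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Return one dict per column mapping residue -> ln(frequency / background)."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in zip(*alignment):
        # Count residues in this column, skipping gap characters.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile

# Toy alignment fragment (three sequences, four columns).
toy_alignment = ["TLTC", "TITC", "RLSC"]
profile = build_profile(toy_alignment)

# The invariant cysteine column scores Cys positively and unseen residues negatively.
print(round(profile[3]["C"], 2), round(profile[3]["A"], 2))
```

A substitution is then judged by the difference between the profile scores of the two residues at the affected position.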
Prediction rate of damaging substitutions (PolyPhen)
                     possibly   probably
Disease mutations    82%        57%
Divergence           9%         3%
Polymorphism         27%        15%
Phylogenetic measures
PAM-120 -5.32 -8.35* -12.76*
BLOSUM-45 -8.41* -3.96 -13.39*
BLOSUM-62 -8.41* -4.09 -12.75*
BLOSUM-80 -8.46* -4.49 -13.52*
Site-specific structural/phylogenetic measures
-23.602* -6.072* -11.732*
Estimate of selection coefficient
Williamson et al., PNAS 2005
de novo mutation effect spectrum
NO DELETERIOUS POLYMORPHISM
LOTS OF DELETERIOUS POLYMORPHISM
Effect of new mutation may range from lethal, to neutral, to slightly beneficial
Neutral mutation model
Human       ACCTTGCAAAT
Chimpanzee  ACCTTACAAAT
Baboon      ACCTTACAAAT
Prob(TAC->TGC) Prob(TGC->TAC)
Prob(XY_1Z -> XY_2Z): 64 x 3 matrix
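One way to tabulate context-dependent changes into a 64 x 3 layout is sketched below, using the three aligned sequences from this slide. It is an assumption-laden illustration: changes are polarised by simple parsimony (the two other species must agree), gaps are not handled, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"

def count_context_substitutions(derived, sister, outgroup):
    """Count XY1Z -> XY2Z changes on the `derived` lineage.

    A change is counted only where sister and outgroup agree (parsimony
    polarisation, an assumption of this sketch) and both flanking bases
    are identical in all three sequences.
    """
    counts = defaultdict(int)
    for i in range(1, len(derived) - 1):
        ancestral = sister[i]
        if derived[i] == ancestral or ancestral != outgroup[i]:
            continue
        flanks_match = all(
            derived[j] == sister[j] == outgroup[j] for j in (i - 1, i + 1)
        )
        if flanks_match:
            context = sister[i - 1:i + 2]
            counts[(context, derived[i])] += 1
    return counts

def as_matrix(counts):
    """Lay counts out as 64 rows (trinucleotides) x 3 columns (new middle base)."""
    matrix = {}
    for triplet in map("".join, product(BASES, repeat=3)):
        alternatives = [b for b in BASES if b != triplet[1]]
        matrix[triplet] = [counts.get((triplet, b), 0) for b in alternatives]
    return matrix

human = "ACCTTGCAAAT"
chimp = "ACCTTACAAAT"
baboon = "ACCTTACAAAT"
counts = count_context_substitutions(human, chimp, baboon)
matrix = as_matrix(counts)

# Row "TAC" lists changes of the middle A to C, G, T respectively.
print(len(matrix), matrix["TAC"])
```

Turning these counts into probabilities would additionally require dividing each row by the number of times that trinucleotide occurs in the aligned sequence, which the sketch leaves out.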
Mildly deleterious mutations
54 genes, 757 individuals
Inflammatory response: 236 genes, 46-47 individuals
DNA repair and cell cycle pathways: 518 genes, 90-95 individuals
Set             Number of sequenced    Percent of deleterious SNPs
                individuals            among missense “singlets”
McPherson set   757                    70%
NIEHS-EGP       90-95                  63%
SeattleSNPs     46-47                  54%
The majority of missense mutations observed at frequency below 1% are deleterious
Frequency itself is a reliable predictor of function!
Fitness and selection coefficient
             Wild type    New mutation
             N1 = 4       N2 = 3
Fitness      1            N2/N1 = 1 - s
Selection coefficient: s = 1 - N2/N1 = 1 - 3/4 = 0.25
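The arithmetic on this slide can be checked in a few lines. A minimal sketch, assuming N1 and N2 are the relative contributions of the wild type and the new mutation to the next generation; the function name is hypothetical.

```python
def selection_coefficient(n_wild_type: float, n_mutant: float) -> float:
    """s = 1 - N2/N1, with the wild-type fitness normalised to 1."""
    if n_wild_type <= 0:
        raise ValueError("wild-type count must be positive")
    return 1.0 - n_mutant / n_wild_type

s = selection_coefficient(4, 3)
print(s)  # 0.25
```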
Human effective population size
[Figure: effective population size over time, from past to present]
Estimation of selection coefficient - simulation
[Figure: SNP probability to be observed versus selection coefficient, -log(s) from 0 to 6; curves F_singl(s) and F_MAF>25%(s)]
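The slides do not give the details of the simulation, so the following is only a rough sketch of the idea: follow a new mutation forward in time and ask how often it becomes common for a given selection coefficient. It assumes a haploid Wright-Fisher model with constant population size (it ignores the change in effective population size indicated above), and all names and parameter values are hypothetical.

```python
import random

def reaches_frequency(pop_size, s, threshold, rng):
    """Follow one new mutation (haploid Wright-Fisher, constant size).

    Returns True if it reaches `threshold` frequency before being lost.
    The mutant has relative fitness 1 - s.
    """
    copies = 1
    while 0 < copies < threshold * pop_size:
        freq = copies / pop_size
        # Expected mutant frequency after selection.
        p = freq * (1 - s) / (freq * (1 - s) + (1 - freq))
        # Binomial sampling of the next generation.
        copies = sum(rng.random() < p for _ in range(pop_size))
    return copies > 0

def probability_common(pop_size, s, threshold=0.25, replicates=2000, seed=1):
    """Fraction of new mutations that reach `threshold` frequency."""
    rng = random.Random(seed)
    hits = sum(
        reaches_frequency(pop_size, s, threshold, rng) for _ in range(replicates)
    )
    return hits / replicates

for s in (0.0, 0.01, 0.05):
    print(s, probability_common(pop_size=100, s=s))
```

Under these assumptions, stronger selection against a variant makes it much less likely to be seen at high frequency, which is why rare variants are enriched for deleterious alleles.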
Rare missense variants in the NPC1L1 gene contribute to variability in cholesterol absorption and plasma levels of low-density lipoproteins (LDLs)
Cohen J et al., PNAS 2006 in press
Nonsynonymous sequence variants in the ABCA1 gene were significantly more common in individuals with low HDL-C (below the 5th percentile) than in those with high HDL-C (above the 95th percentile).
Cohen J et al., Science 2004
Multiple rare variants in different genes account for multifactorial inherited susceptibility to colorectal adenomas
Fearnhead NS et al., PNAS 2004
“Mutation enrichment” association studies
What about common alleles of smaller effect?
Population of 3500 individuals with known plasma levels of HDL-C
Population includes both genders and three ethnic groups
839 SNPs genotyped
Independent population of 800 individuals for validation
What about common alleles of smaller effect?
Introduce a linear model (ANCOVA)
Subsequently add SNPs to the linear model
Include SNPs based on the likelihood ratio test
Prioritizing SNPs based on conservation did not help
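The stepwise procedure outlined on this slide can be sketched as a greedy forward selection. This is not the analysis actually run on the HDL-C data: the data below are synthetic, the function names are hypothetical, and the sketch assumes Gaussian errors, a single covariate, and an uncorrected 5% threshold (3.84, the chi-square critical value for 1 degree of freedom).

```python
import math
import random

def residual_sum_of_squares(columns, y):
    """Fit y ~ columns by least squares (normal equations) and return the RSS."""
    k, n = len(columns), len(y)
    # Augmented normal-equation system [X'X | X'y].
    a = [[sum(columns[i][r] * columns[j][r] for r in range(n)) for j in range(k)]
         + [sum(columns[i][r] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((y[r] - sum(b * c[r] for b, c in zip(beta, columns))) ** 2
               for r in range(n))

def forward_select(covariates, snps, y, critical=3.84):
    """Add SNPs one at a time while the likelihood ratio test stays significant.

    For Gaussian linear models the statistic is n * ln(RSS_old / RSS_new).
    """
    n = len(y)
    model = [[1.0] * n] + list(covariates)  # intercept plus covariates
    chosen, remaining = [], dict(snps)
    rss = residual_sum_of_squares(model, y)
    while remaining:
        scores = {name: n * math.log(rss / residual_sum_of_squares(model + [g], y))
                  for name, g in remaining.items()}
        best = max(scores, key=scores.get)
        if scores[best] < critical:
            break
        model.append(remaining.pop(best))
        chosen.append(best)
        rss = residual_sum_of_squares(model, y)
    return chosen

# Synthetic data: only "snp2" truly affects the trait.
rng = random.Random(0)
n = 300
sex = [float(rng.randint(0, 1)) for _ in range(n)]
snps = {f"snp{i}": [float(rng.choice((0, 1, 2))) for _ in range(n)] for i in range(5)}
y = [50 + 5 * sex[r] + 4 * snps["snp2"][r] + rng.gauss(0, 3) for r in range(n)]
selected = forward_select([sex], snps, y)
print(selected)
```

With hundreds of candidate SNPs, as in the study described above, the inclusion threshold would need to account for multiple testing, and the selected model should be checked in the independent validation population.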