Human non-synonymous SNP: molecular function, evolution and disease Shamil Sunyaev Genetics Division, Brigham & Women’s Hospital Harvard Medical School Harvard-M.I.T. Division of HST

Human non-synonymous SNP: molecular function, evolution and disease

Shamil Sunyaev

Genetics Division, Brigham & Women’s Hospital

Harvard Medical School

Harvard-M.I.T. Division of HST

Effect on molecular function

Phenotype

Natural selection

Medical Genetics

Structural Biology Biochemistry

Evolutionary Genetics

Predicting the effect of mutations in proteins

Why is this useful?

Understanding variation in molecular function and structure

Evolutionary genetics: comparison of polymorphism and divergence rates between different functional categories is a robust way to detect selection
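One common way to make this comparison concrete is a McDonald-Kreitman-style ratio of polymorphism to divergence in two functional categories. The sketch below is only an illustration of that idea, not necessarily the analysis used in the talk; the function name and all counts are hypothetical.

```python
def neutrality_index(poly_nonsyn, poly_syn, div_nonsyn, div_syn):
    """Ratio of (Pn/Ps) to (Dn/Ds).

    Values above 1 indicate an excess of nonsynonymous polymorphism
    relative to divergence, as expected under purifying selection.
    """
    return (poly_nonsyn / poly_syn) / (div_nonsyn / div_syn)


# Hypothetical counts, chosen only for illustration.
ni = neutrality_index(poly_nonsyn=60, poly_syn=100, div_nonsyn=30, div_syn=100)
print(f"neutrality index = {ni:.2f}")
```

Synonymous sites serve here as the putatively neutral reference category; any other pair of functional categories could be substituted.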

Linkage analysis: rare

Classical association studies (control vs. disease): common

Why is this useful?

Rare human developmental disorders / mouse mutagenesis screens: linkage studies are impossible

Genetics of complex disease: SNP prioritization

Genetics of complex disease: Rare variants

Technically, polymorphism should not exist!

Quantitative trait

Mendelians vs. Biometricians

Forces to maintain variation:

Selection

Mutation

Common disease / Common variant

Trade off (antagonistic pleiotropy)
Balancing selection
Recent positive selection
Reverse in direction of selection

Examples
APOE: Alzheimer’s disease
AGT: Hypertension
CYP3A: Hypertension
CAPN10: Type 2 diabetes

An individual human genome is a target for deleterious mutations!

~40% of human Mendelian diseases are due to hypermutable sites

Frequency of deleterious variants is directly proportional to mutation rate (q = μ/s)

Multiple mostly rare variants

Many deleterious alleles in mutation-selection balance
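The mutation-selection balance relation, equilibrium frequency q equal to mutation rate μ divided by selection coefficient s, can be checked with a few lines of arithmetic. This is a minimal sketch; the mutation rates and the value of s are hypothetical, and the formula is the simple approximation that holds when s is much larger than μ.

```python
def equilibrium_frequency(mu, s):
    """Mutation-selection balance: q = mu / s (approximation for s >> mu)."""
    if s <= 0:
        raise ValueError("s must be positive for a deleterious allele")
    return mu / s


# Hypothetical per-site mutation rates: an ordinary site and a hypermutable one.
for mu in (1e-8, 1e-7):
    print(mu, equilibrium_frequency(mu, s=0.01))
```

With s held fixed, a tenfold higher mutation rate gives a tenfold higher equilibrium frequency, which is why hypermutable sites contribute disproportionately.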

Examples

Plasma level of HDL-C
Plasma level of LDL-C
Colorectal adenomas

What about late onset phenotypes?

Harmful mutations

Function: damaging

Evolution: deleterious

Phenotype: detrimental

Advantageous pseudogenization (Zhang et al. 2006)

Gain of function disease mutations

Sickle Cell Anemia

N E L V T L T C L A R G F S - P K D V L V R W L
R E S A T I T C L V T G F S - P A D V F V Q W M
G G S L R L S C V A S G I T - F S G Y D M Q W V
T P G L T L T C T V S G F S - L S S Y D M G W V
G Q K A K M R C I P E - - - - K G H P V V F W Y
G Q E A T L W C E P I - - - - S G H S A V F W Y
G Q Q V T L S C F P I - - - - S G H L S L Y W Y
R K D V S L T C L V V G F N - P G D I S V E W T
G Q K L T L K C Q Q N - - - - F N H D T M Y W Y
R D K A T F T C F V V G S D - L K D A H L T W E
S K S A T L T C R V S N M V N A D G L E V S W W
G A R T S L N C T F S D - - - S A S Q Y F W W Y
G A S L Q L R C K Y S Y - - - S A T P Y L F W Y
N G A P K L T C L V V D L E S E K N V N V T W N
E A T V T L T C V V S N - - A P Y G V N V S W T

Profile

Ala   -1.2   1.1  -0.6  -0.8   0.3   ...   ...
Arg    0.6  -0.3  -0.3  -0.5   0.6   ...   ...
Asn   -1.1  -0.5  -0.5  -0.7   0.4   ...   ...
Asp   -0.9  -0.3  -0.3  -0.5   0.6   ...   ...
Cys    0.4  -0.5   0.6   0.8  -0.3   ...   ...
Gln    ...   ...   ...   ...   ...   ...   ...
...    ...   ...   ...   ...   ...   ...   ...

protein

multiple alignment

profile
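The step from multiple alignment to profile can be sketched as a per-column log-odds table. The code below is a simplified illustration, not PolyPhen's actual scoring scheme: it assumes a uniform background distribution, a flat pseudocount, and no sequence weighting, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(column, pseudocount=1.0):
    """Log-odds score of each amino acid in one alignment column,
    relative to a uniform background (a simplifying assumption)."""
    counts = Counter(residue for residue in column if residue != "-")
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    background = 1.0 / len(AMINO_ACIDS)
    return {
        aa: math.log(((counts[aa] + pseudocount) / total) / background)
        for aa in AMINO_ACIDS
    }


def build_profile(alignment):
    """One score dictionary per alignment column."""
    return [column_profile(col) for col in zip(*alignment)]


# The first three rows of the alignment shown on the slide.
alignment = [
    "NELVTLTCLARGFS-PKDVLVRWL",
    "RESATITCLVTGFS-PADVFVQWM",
    "GGSLRLSCVASGIT-FSGYDMQWV",
]
profile = build_profile(alignment)

# Column 8 (index 7) is a conserved cysteine, so Cys outscores Ala there.
print(profile[7]["C"] > profile[7]["A"])
```

A substitution can then be scored by the difference between the profile values of the two alleles at its position: a large drop suggests that the new residue is rarely tolerated at that site.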

PolyPhen

Prediction rate of damaging substitutions

                     possibly   probably
Disease mutations    82%        57%
Divergence           9%         3%
Polymorphism         27%        15%

10% of PolyPhen false-positives are due to compensatory substitutions

PolyPhen

Phylogenetic measures

    PAM-120      -5.32     -8.35*   -12.76*
    BLOSUM-45    -8.41*    -3.96    -13.39*
    BLOSUM-62    -8.41*    -4.09    -12.75*
    BLOSUM-80    -8.46*    -4.49    -13.52*

Site-specific structural/phylogenetic measures

    -23.602*     -6.072*   -11.732*

Estimate of selection coefficient

Williamson et al., PNAS 2005

de novo mutation effect spectrum

NO DELETERIOUS POLYMORPHISM

LOTS OF DELETERIOUS POLYMORPHISM

Effect of new mutation may range from lethal, to neutral, to slightly beneficial

Mutation effect spectrum

NO DELETERIOUS POLYMORPHISM

LOTS OF DELETERIOUS POLYMORPHISM

?

Neutral mutation model

Human       ACCTTGCAAAT
Chimpanzee  ACCTTACAAAT
Baboon      ACCTTACAAAT

Prob(TAC->TGC) Prob(TGC->TAC)

Prob(X Y1 Z -> X Y2 Z): 64x3 matrix
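The three-species alignment above suggests how such context-dependent probabilities could be estimated: where chimpanzee and baboon agree and human differs, the change is assigned to the human lineage and tallied by trinucleotide context. The sketch below shows only that counting step, under the assumption that agreement of the two outgroups identifies the ancestral state; the function name is hypothetical and this is not presented as the exact procedure used in the talk.

```python
from collections import Counter


def count_context_changes(human, chimp, baboon):
    """Count human-lineage substitutions by trinucleotide context.

    A site is used only when chimpanzee and baboon agree (taken as the
    ancestral state) and human differs; the flanking bases must be
    identical in all three species.
    """
    counts = Counter()
    for i in range(1, len(human) - 1):
        flanks_agree = all(
            human[j] == chimp[j] == baboon[j] for j in (i - 1, i + 1)
        )
        if flanks_agree and chimp[i] == baboon[i] != human[i]:
            ancestral = chimp[i - 1 : i + 2]
            counts[(ancestral, human[i])] += 1
    return counts


counts = count_context_changes("ACCTTGCAAAT", "ACCTTACAAAT", "ACCTTACAAAT")
print(counts)  # one TAC -> TGC change on the human lineage
```

Dividing each count by the number of times its ancestral trinucleotide occurs would give one entry of the 64x3 matrix (64 contexts, 3 possible replacements for the middle base).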

Strongly detrimental mutations

Effectively neutral mutations

Mildly deleterious mutations

Mildly deleterious mutations

54 genes, 757 individuals

Inflammatory response: 236 genes, 46-47 individuals

DNA repair and cell cycle pathways: 518 genes, 90-95 individuals

Set              Number of sequenced individuals   Percent of deleterious SNPs among missense “singlets”
McPherson set    757                               70%
NIEHS-EGP        90-95                             63%
SeattleSNPs      46-47                             54%

The majority of missense mutations observed at frequency below 1% are deleterious

Frequency itself is a reliable predictor of function!

Fitness and selection coefficient

Wild type: N1 = 4, fitness 1
New mutation: N2 = 3, fitness N2/N1 = 1 - s

s: selection coefficient
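As a worked check of the definition, the relative fitness of the new mutation is N2/N1, so s = 1 - N2/N1. A minimal sketch using the slide's numbers (the function name is hypothetical):

```python
def selection_coefficient(n_wild_type, n_mutant):
    """s such that the mutant's relative fitness, N2/N1, equals 1 - s."""
    return 1.0 - n_mutant / n_wild_type


print(selection_coefficient(4, 3))  # 0.25
```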

Mildly deleterious mutations

54 genes, 757 individuals

Inflammatory response: 236 genes, 46-47 individuals

DNA repair and cell cycle pathways: 518 genes, 90-95 individuals

Fraction of detectable polymorphism

Estimation of selection coefficient - simulation

[Figure: human effective population size, from past to present]

[Figure: SNP probability to be observed as a function of selection coefficient; x-axis: -log(s), 0 to 6; y-axis: 0 to 1.2; curves: Fsingl(s) and FMAF>25%(s)]
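The dependence of detectable polymorphism on the selection coefficient can be illustrated with a toy forward simulation. The sketch below is not the simulation from the talk: it assumes a haploid Wright-Fisher population of constant size (the talk's model uses a changing human effective population size), and the population size, number of generations, and replicate count are arbitrary illustrative values.

```python
import random


def final_count(pop_size, s, generations, rng):
    """Copies of one new mutation after some generations in a haploid
    Wright-Fisher population; carriers have relative fitness 1 - s."""
    count = 1
    for _ in range(generations):
        if count == 0 or count == pop_size:
            break  # lost or fixed: no longer polymorphic
        weighted = count * (1.0 - s)
        p = weighted / (weighted + (pop_size - count))
        # Binomial sampling of the next generation, one individual at a time.
        count = sum(rng.random() < p for _ in range(pop_size))
    return count


def fraction_segregating(s, replicates=2000, pop_size=100, generations=20, seed=1):
    """Share of new mutations still polymorphic at the end of the run."""
    rng = random.Random(seed)
    alive = sum(
        0 < final_count(pop_size, s, generations, rng) < pop_size
        for _ in range(replicates)
    )
    return alive / replicates


for s in (0.0, 0.01, 0.1, 0.3):
    print(s, fraction_segregating(s))
```

Strongly selected mutations are lost almost immediately, so they rarely reach a sample at all, while nearly neutral ones persist long enough to be seen. Sampling a fixed number of chromosomes from the final population would extend this to separate curves for singletons and common variants.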

Classical association studies (control vs. disease): common

“Mutation enrichment” association studies (control vs. disease): rare

Rare missense variants in the NPC1L1 gene contribute to variability in cholesterol absorption and plasma levels of low-density lipoproteins (LDLs)

Cohen J et al., PNAS 2006 in press

Nonsynonymous sequence variants in ABCA1 gene were significantly more common in individuals with low HDL-C (<fifth percentile) than in those with high HDL-C (>95th percentile).

Cohen J et al., Science 2004

Multiple rare variants in different genes account for multifactorial inherited susceptibility to colorectal adenomas

Fearnhead NS et al., PNAS 2004
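A "mutation enrichment" comparison of this kind reduces to a 2x2 table: carriers and non-carriers of rare nonsynonymous variants in each of two groups. One simple way to test such a table is a one-sided Fisher's exact test, sketched below with the standard library only. The carrier counts are hypothetical and are not taken from the cited studies, which may have used a different test.

```python
from math import comb


def fisher_one_sided(carriers_a, noncarriers_a, carriers_b, noncarriers_b):
    """P(at least carriers_a carriers in group A) under the hypergeometric
    null with all margins of the 2x2 table fixed."""
    n_a = carriers_a + noncarriers_a
    n_b = carriers_b + noncarriers_b
    total_carriers = carriers_a + carriers_b
    denom = comb(n_a + n_b, total_carriers)
    upper = min(n_a, total_carriers)
    return sum(
        comb(n_a, k) * comb(n_b, total_carriers - k)
        for k in range(carriers_a, upper + 1)
    ) / denom


# Hypothetical example: 12 of 100 individuals in one tail of the trait
# distribution carry a rare variant, against 2 of 100 in the other tail.
p_value = fisher_one_sided(12, 88, 2, 98)
print(f"one-sided p = {p_value:.4f}")
```

Pooling all rare variants in a gene into a single carrier count is what gives the approach power: no individual variant is frequent enough to be tested on its own.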

“Mutation enrichment” association studies

Cholesterol

Adapted from Brewer et al., 2003

Effect of rare nsSNPs on HDL-C

What about common alleles of smaller effect?

Population of 3500 individuals with known plasma levels of HDL-C

Population includes both genders and three ethnic groups

839 SNPs genotyped

Independent population of 800 individuals for validation

What about common alleles of smaller effect?

Introduce a linear model (ANCOVA)

Subsequently add SNPs to the linear model

Include SNPs based on the likelihood ratio test

Prioritizing SNPs based on conservation did not help
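The procedure listed above (start from a linear model with covariates, add SNPs one at a time, keep a SNP only if a likelihood ratio test supports it) can be sketched as forward selection. The code below is an illustrative sketch and not the analysis pipeline from the talk: it uses ordinary least squares via the normal equations, a chi-square test with one degree of freedom, and a small made-up data set with hypothetical SNP names.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum(
        (y[i] - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
        for i, r in enumerate(design)
    )


def lrt_p_value(rss_null, rss_alt, n):
    """Likelihood ratio test for one added term in a Gaussian linear model."""
    stat = max(n * math.log(rss_null / rss_alt), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 d.o.f.


def forward_select(covariates, snps, y, alpha=0.05):
    """Repeatedly add the SNP with the smallest LRT p-value below alpha."""
    chosen, current = [], list(covariates)
    while True:
        base = rss(current, y)
        best = None
        for name, genotypes in snps.items():
            if name in chosen:
                continue
            p = lrt_p_value(base, rss(current + [genotypes], y), len(y))
            if p < alpha and (best is None or p < best[1]):
                best = (name, p)
        if best is None:
            return chosen
        chosen.append(best[0])
        current.append(snps[best[0]])


# Made-up data: HDL-C depends on sex and on snp_a; snp_b has no effect.
sex = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
snps = {
    "snp_a": [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2],
    "snp_b": [0, 1, 2, 0, 1, 2, 2, 1, 0, 2, 1, 0],
}
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.2, -0.2, 0.0]
hdl = [50 + 5 * g + 4 * a + e for g, a, e in zip(sex, snps["snp_a"], noise)]

selected = forward_select([sex], snps, hdl)
print(selected)
```

Gender and ethnic group enter as covariates before any SNP is tested, so a SNP is retained only for the variation it explains beyond them. A SNP whose genotypes are perfectly collinear with terms already in the model would make the normal equations singular; this sketch does not guard against that case.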

Effect of common SNPs on HDL-C

[Figure: HDL]

And a different population…

[Figure: HDL]

Acknowledgements

The lab:

Gregory Kryukov, Steffen Schmidt, Saurabh Asthana, Victor Spirin, Ivan Adzhubey

Bioinformatics: Vasily Ramensky

Human genetics: Jonathan Cohen