A comparative study of adaptive molecular evolution in different HIV clades
Marc Choisy
CEPM, UMR CNRS-IRD 9926, Montpellier, France
[Figure: HIV phylogeny showing HIV-2 and HIV-1, with HIV-1 split into groups M, N and O]
Adaptive evolution?
Selection: Darwinian evolution (mutation + selection), either purifying selection or adaptive evolution
No selection: neutral evolution (mutation alone)
AAA TGG ACT   Lys Trp Thr
AAG TGG ACT   Lys Trp Thr   (synonymous change, counted in dS)
AAC TGG ACT   Asn Trp Thr   (nonsynonymous change, counted in dN)
Molecular evolution
dN/dS > 1: adaptive evolution
dN/dS < 1: purifying selection
dN/dS = 1: neutral evolution
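The distinction between synonymous and nonsynonymous changes can be illustrated with a minimal sketch using the standard genetic code. The function names `translate` and `classify_change` are hypothetical, and the codons AAA, AAG and AAC (Lys, Lys, Asn) are used only as an illustration; this is not the counting method used to estimate dN and dS in the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_change(codon_before, codon_after):
    """Label a codon substitution as identical, synonymous or nonsynonymous."""
    if codon_before == codon_after:
        return "identical"
    same_residue = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "synonymous" if same_residue else "nonsynonymous"


print(translate("AAATGGACT"))         # Lys Trp Thr in one-letter code
print(classify_change("AAA", "AAG"))  # Lys to Lys
print(classify_change("AAA", "AAC"))  # Lys to Asn
```

Real dN and dS estimates also normalise by the number of synonymous and nonsynonymous sites, which this sketch does not attempt.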
[Figure: HIV virion and genome map (positions 0 to 10000): gag (capsid), pol (enzymes), env (envelope proteins gp41 and gp120, in the lipid membrane), RNA genome]
[Figure: dN/dS (axis from 0 to 1) for HIV-2 A, HIV-1 M A, HIV-1 M B and HIV-1 M C]
$w = d_N/d_S$
$w = d_N/d_S = p_0 w_0 + p_1 w_1 + p_2 w_2 + p_3 w_3$
[Figure: site classes w0 to w3 with their proportions p0 to p3]
$w = d_N/d_S = p_0 w_0 + p_1 w_1 + p_2 w_2$
LRT (likelihood ratio test)
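The weighted average over site classes and the likelihood ratio test can be sketched as follows. All numbers are hypothetical, the function names are illustrative, and the 2 degrees of freedom are an assumption (two extra parameters in the richer model); the slides do not state which nested models are compared.

```python
import math


def mean_omega(proportions, omegas):
    """Average dN/dS over site classes: sum of p_k * w_k."""
    if len(proportions) != len(omegas):
        raise ValueError("need one proportion per site class")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(p * w for p, w in zip(proportions, omegas))


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test assuming 2 degrees of freedom.

    For a chi-square distribution with 2 df the survival function
    has the closed form exp(-x / 2), so no external library is needed.
    """
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical three-class model: p0, p1, p2 and w0, w1, w2.
w_bar = mean_omega([0.6, 0.3, 0.1], [0.05, 1.0, 3.5])

# Hypothetical log-likelihoods of the null and alternative models.
statistic, p_value = lrt_two_df(-1000.0, -990.0)
print(w_bar, statistic, p_value)
```

A small p-value favours the model that allows a class of sites with w > 1.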
[Figure: prior probabilities p0, p1 and p2 (axis from 0.0 to 1.0)]
[Figure: posterior probabilities $f_0^i$, $f_1^i$ and $f_2^i$ at site $i$ (axis from 0.0 to 1.0)]
$w_i = (d_N/d_S)_i = f_0^i w_0 + f_1^i w_1 + f_2^i w_2$
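The step from prior to posterior probabilities at a site follows Bayes' rule, and the per-site dN/dS is then the posterior-weighted mean of the class values. The sketch below assumes hypothetical per-class likelihoods for one site; in practice these come from the fitted codon model, and the function names are illustrative.

```python
def posterior_site_classes(priors, site_likelihoods):
    """Bayes' rule: posterior of class k is proportional to p_k * P(site data | class k)."""
    joint = [p * lik for p, lik in zip(priors, site_likelihoods)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("site likelihoods must not all be zero")
    return [j / total for j in joint]


def posterior_mean_omega(posteriors, omegas):
    """Per-site dN/dS: sum over classes of posterior probability times class value."""
    return sum(f * w for f, w in zip(posteriors, omegas))


priors = [0.6, 0.3, 0.1]              # p0, p1, p2 (hypothetical)
omegas = [0.05, 1.0, 3.5]             # w0, w1, w2 (hypothetical)
site_likelihoods = [1e-4, 2e-4, 9e-4]  # hypothetical likelihood of site i under each class

posteriors = posterior_site_classes(priors, site_likelihoods)
w_i = posterior_mean_omega(posteriors, omegas)
print(posteriors, w_i)
```

A site is usually reported as positively selected when the posterior probability of the class with w > 1 exceeds a threshold such as 95%.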
[Figure: per-site $w_i$ (axis marked at 1 and 2) and posterior probabilities (axis from 0.0 to 1.0, 95% level marked) along the sequence from 5' to 3', for HIV-1 M A, HIV-1 M B, HIV-1 M D, HIV-1 M C, HIV-1 O and HIV-2 A]
HIV-1 M A1 MRAMGIQR-NCQ-SL-WRWG------IMILGMIIICSAA----GNLWVTVYYGVPVWKDAET--TLFCASDAKAYTTEVH 80
HIV-1 M B  MTVKGTRK-NYR-HL-WTWG------TMILGILMICSAA----NNFWVTVYYGVPVWREATT--TLFCASDAKAYDEEVH 80
HIV-1 M C  MRATGIQR-NCQ-QW-WIWG------ILGFWMLMICNVM----GNLWVTVYYGVPVWKEATT--TLFCASDAKAYDTEAH 80
HIV-1 M D  MRVKGIKR-NYQ-PL-WKWG------IMLLGMLMMTYSAA---DNLWVTVYYGVPVWKEATT--TLFCASDAKSYKTEAH 80
HIV-1 O    MKA----MEKR---NKKLW-----TLYLAMALITPCLSL----RQLYATVYAGVPVWEDATP--VLFCASDANLTSTEKH 80
SIVcpz     MR-----KPIH-----IIW---GLALLIQFIE----KGT----NEDYVTVFYGVPVWRNATP--TLFCATNASMTSTEVH 80

HIV-1 M A1 NVWATHACVPTDPNPQEINLENVTEKFNMWKNNMVEQMHTDIISLWDQSLQPCVKLTPLCVTLNCSDVNVNAT------- 160
HIV-1 M B  NVWATHACVPTDPNPQEVELINVTENFNMWKNNMVEQMHEDIISLWDQSLKPCVKLTPLCVSLNCTDAKNTTN------- 160
HIV-1 M C  NVWATHACVPTDPNPQEMVLKNVTEDFNMWKNDMVDQMHQDIISLWDQSLKPCVKLTPLCVTLDCKNANATHN------- 160
HIV-1 M D  NIWATHACVPTDPNPQEIELKNVTENFNMWKNNMVDQMHEDIISLWDQSLKPCVKLTPRCVTLNCTDASRNST------- 160
HIV-1 O    NIWASQACVPTDPTPYEYPLHNVTDDFNIWKNYMVEQMQEDIISLWDQSLKPCVQMTFLCVQMECTNIAGTTN------- 160
SIVcpz     NVWATTSCVPIDPDPIVVRL-NTSVWFNAYKNYMVESMTEDM?QLFQQSHKPCVKLTPMCIKMNCTGYNGTPT------- 160

HIV-1 M A1 -------------NYNVNDTINMKEEIRNCSFNMTTE-LRDRKQKVYSLFYRLDVVQMNN-------------------S 240
HIV-1 M B  -----SNTNSSSSTNSSSLEQGKAGEIKNCSFNITTN-MRDKVQKQYALFYSLDIVPIDD-------------------K 240
HIV-1 M C  ---------------GTIDNRTMGGEIKNCSFNITTE-LKDKKQRAHALFYSLDIVQLDG-------------------- 240
HIV-1 M D  -------------DNNSTLPTVKPGEMKNCSFNITTV-VTDKRKQVHALFYRLDVVQIDN-------------EGKNEIN 240
HIV-1 O    -----------------------ENLMKKCEFNVTTV-IKDKKEKKQALFYVSDLMELNE--------------TSSTNK 240
SIVcpz     ----TPSTTTSTVTPKTTTPIVDGMKLQECNFNQSTG-FKDKKQKMKAIFYKGDLMKCQD-------------------N 240

HIV-1 M A1 NNSNQYRLINCNTSAI-TQACPKVSFEPIPIHYCAPAGYAILKCKDKEFNGT--GLCKNVST-VQCTHGIKPVVSTQLLL 320
HIV-1 M B  GNDTSYRLISCNTSVI-TQACPKISFEPIPIHYCAPAGFAILKCNEKGFNGK--GPCKNVST-VQCTHGIRPVVSTQLLL 320
HIV-1 M C  --GGSYRLISCNTSAI-TQACPKVSFDPIPIHYCAPAGYAILKCNNKTFNGT--GQCNNVST-VQCTHGIKPVISTQLLL 320
HIV-1 M D  DTYGTYRLINCNTSAI-TQACPKVSFEPIPIHYCAPAGFAILKCNDKRFNGT--GPCKNVSS-VQCTHGIRPVVSTQLLL 320
HIV-1 O    TNSKMYTLTNCNSTTI-TQACPKVSFEPIPIHYCAPAGYAIFKCNSTEFNGT--GTCRNITV-VTCTHGIRPTVSTQLIL 320
SIVcpz     NETNCYYLWHCNTTTI-TQSCEKSTFEPIPIHYCAPAGYAILRCEDEDFTGV--GMCKNVSV-VHCTHGISPMVATWLLL 320

HIV-1 M A1 -NGSLAE-SKVMIRSENIT-NNAKNILVQLTSPVNISCIRPNNNT--RKSVRI---GPGQAFYATGE----IIGNIRQAY 400
HIV-1 M B  -NGSLAE-EEVVIKSDNFT-NNAKTIIVQLNTSVEITCVRPNNNT--RRSIPI---GPGRAFYTT-E----IIGDIRQAY 400
HIV-1 M C  -NGSLAE-EEIIIRSENLT-NNAKTIIVHLKDPVEIECTRPNNNT--RKSIRI---GPGQILYATGD----IIGDIRQAH 400
HIV-1 M D  -NGSLAE-EEIVIRSENLT-NNAKIIIVHLNQSVEINCTRPYKKE--RQRTPI---GQGQALYTTRY----TTRIIGQAY 400
HIV-1 O    -NGTLSK-GKIRMMAKDIL-EGGKNIIVTLNSTLNMTCERPQI-D--IQEMRI---GPMAWYSMGIG--GTAGNSSRAAY 400
SIVcpz     -NGTYQT-NTSVVMNGRKN-ESVLVRFGKEFENLTITCIRPGNRT--VRNLQI---GPGMTFYNVEI----ATGDTRKAF 400

HIV-1 M A1 CNVNRSEWNEALREVVKQLR--TYF------NKTIIFDNSS-GGDLEITTHSFNCGGEFFYCNTSRLFNSTW-----NDT 480
HIV-1 M B  CNITKANWTDTLQKVAIKLR--EQF------NKTIAFKPSS-GGDPEIVTHSFNCGGEFFYCNSTQLFNGTW-----NGT 480
HIV-1 M C  CNINETKWNKTLQDVSEKLA--KYFP-----NKTINFAQPS-GGDLEIVTHSFNCRGEFFYCNTSKLFNSTD-------- 480
HIV-1 M D  CNISGVKWNNTLRQVARKLG--NLLN-----QTKIIFKPSS-GGDPEITTHSFNCGGEFFYCNTSGLFNSAW------NI 480
HIV-1 O    CKYNATDWGKILKQTAERYL-ELVNNT---GSINMTFNHSS-GGDLEVTHLHFNCHGEFFYCNTAKMFNYTF----SCNG 480
SIVcpz     CTVNKTLWEQARNKTEHVLA--EHWKKVDNKTNAKTIWTF-QDGDPEVKVHWFNCQGEFFYCDITPWFNATY------TG 480

HIV-1 M A1 TSMLN--DTKPN-------DTITLPCRIKQIINM-WQR-AGQAIYAPPIQ-GVIRCESNITGLILTRDGGGN-------S 560
HIV-1 M B  WINGTWKSSYGNDT-----TNITLPCRIKQIINM-WQE-VGKAMYAPPIR-GQIKCTSNITGLLLTRDGGNS-----NTT 560
HIV-1 M C  -RSS---STES--------ANITIPCRIKQIINM-WQG-VGRAMYAPPIK-GKITCKSNITGLLLTRDGGTT-------- 560
HIV-1 M D  SGHS---TGLND-------TIITIPCRIKQIINM-WQE-VGKAMYAPPIE-GQINCSSNITGLLLTRDGGAN-------- 560
HIV-1 O    TTCS----VSNVSQG-NN-GT--LPCKLRQVVRS-WIR-GQSGLYAPPIK-GNLTCMSNITGMILQMDNTWN-------S 560
SIVcpz     NLI-------TN-------GALIAHCRIKQIVNH-WGI-VSKGIYLAPRR-GNVSCTSSITGIMLEGQIYNE-------- 560

HIV-1 M A1 SVSSETFRPGGGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVV------EREKRAV-G-LGAV-FIG-FLGAAGSTMG 640
HIV-1 M B  DNSTETFRPGGGDMRDNWRSELYRYKVVQVEPLGIAPTRAKRRVV------QREKRAV-G-IGAM-FLG-FLGAAGSTMG 640
HIV-1 M C  NDSTEAFRPAGGDMKDNWRSELYKYKVVEIKPLGVAPTKAKRRVV------EREKRAV-G-IGAV-FLR-FLGAAGSTMG 640
HIV-1 M D  NTQNDTFRPGGGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVV------EREKRAI-G-LGAM-FLG-FLGAAGSTMG 640
HIV-1 O    SNNNVTFRPIGGDMKDIWRTELFNYKVVRVKPFSVAPTRIARPVISTRT--HREKRAV-G-LGML-FLG-VLSAAGSTMG 640
SIVcpz     ---TVKVSP-AARVADQWRAELSRYQVVEI?PLSVAPTT?KRPEIKQH---SRQKRGI-G-IGLF-FLG-LLSAAGSTMG 640

HIV-1 M A1 AASITLTVQARQLLSGIVQQQ-SNLLRAIEAQQHLLKLTVWGIKQLQARVLALERYLRD-QQILGIWGCSGKLICTTNVP 720
HIV-1 M B  AASVTLTVQARQLLSGIVQQQ-NNLLRAIEAQQHMLQLTVWGIKQLQARVLAVERYLRD-QQLLGLWGCSGKLICTTTVP 720
HIV-1 M C  AASITLTVQARQLLSGIVQQQ-SNLLRAIETQQHMLQLTVWGIKQLQTRVLAIERYLKD-QQLLGIWGCSGKLICTTAVP 720
HIV-1 M D  AASMTLTVQARQVLSGIVQQQ-NNLLRAIEAQQHLLQLTVWGIKQLQARILAVERYLKD-QQLLGIWGCSGKHICTTTVP 720
HIV-1 O    AAATTLAVQTHTLLKGIVQQQ-DNLLRAIQAQQQLLRLS?WGIRQLRARLLALETLLQN-QQLLSLWGCKGKLVCYTSVK 720
SIVcpz     AASIALTAQTRNL?HGIVQQQ-ANLLQAIETQQHLLQLSVWGVKQLQARMLAVEKYLRD-QQLLSLWGCADKVTCHTTVP 720

HIV-1 M A1 WNSSWS-N---------KSQSEIW--ENMTWLQWDKEISNYTHIIYTLLEESQIQQEKNEQDLLALDKWANLWNWFDITN 800
HIV-1 M B  WNTSWS-N---------KTLEKIW--DNMTWMQWETEINNYTNIIYTLLEESQNQQEKNEKDLLELDQWANLWNWFTISN 800
HIV-1 M C  WNDSWSNN---------KTLGDIW--NSTTWMQWDREISNYTNTIYRLLEDSQNQQEQNEKDLLALDKWQNLWSWFDITN 800
HIV-1 M D  WNSSWS-N---------RSVEYIW--GNMTWMQWEREIDNYTGLIYNLIEESQIQQEKNEKELLELDKWASLWNWFSITQ 800
HIV-1 O    WNRTWI-G-----------NESIW--DTLTWQEWDRQISNISSTIYEEIQKAQVQQEQNEKKLLELDEWASIWNWLDITK 800
SIVcpz     WNNSWV-NFTQTCAKNSSDIQCIW--ENMTWQEWDRLVQNSTGQIYNILQIAHEQQERNKKELYELDKWSSLWNWFDITQ 800

HIV-1 M A1 WLWYIKIFIM-IVGGLIGLRIVFTVLSIINRVRQGYSPLSFQTHTP-NPGG-LDRPRRIEEEGGEQDRDRSIRLVGGFLT 880
HIV-1 M B  WLWYIKIFIM-IVGGLIGLRIVFTVFSIVNRVRQGYSPLSFQTHLP-TPRG-PDRPEGIEEEGGERGRGSSTRLVHGFLA 880
HIV-1 M C  WLWYIKIFIM-IVGGLIGLRIIFAILSIVNRVRQGYSPLSFQTLTP-SPRG-PDRLGRIEEEGGEQDRDRSIRLVSGFLA 880
HIV-1 M D  WLWYIKIFIM-IVGGLIGLRIVFAVLSIVNRVRQGYSPLSFQTLLP-APRG-PDRPEGIEEEGGEQDRGTSIRLVNGFSA 880
HIV-1 O    WLWYIKIAII-IVGALVGVRVIMIVLNIVKNIRQGYQPLSLQIPNH-HQEE-AGTPGRTGGGGGEEGRPRWIPSPQGFLP 880
SIVcpz     WLWYIKIFIM-IVGAIVGLRILLVLVSCLRKVRQGYHPLSFQIPTQ-NQQD-PEQPEEIREEGGRKDRIRWRALQHGFFA 880

HIV-1 M A1 LVWDDLRSLCLFSYHRLRDFTLIAARTVELLGHSSLKGLRLGWEGLKYLGNLLLYWGRELRISAINLLDTIAIIIAGWTD 960
HIV-1 M B  LFWDDLRSLCLFSYHRLRDLLLIVTRIVELLGRG-------GWEALKYWWNLLQYWRQELQKSAVSLFNATAIAVAEGTD 960
HIV-1 M C  LAWDDLRSLCLFSYHRLRDLILIATRVVELLGRSSLRGLQRGWEILKYLGSLVQYWGLELKKSAINLLNITAIAVAEGTD 960
HIV-1 M D  LIWDDLRNLCLFSYHRLRELILIAARIVELLGRR-------GWEALKYLWNLLQYWSRELKNSAISLVDATAIAVAEGTD 960
HIV-1 O    LLYTDLRTIILWTYHLLSNLASGIQKVISYLRLGLWILGQKIINVCRICAAVTQYWLQELQNSATSLLDTLAVAVANWTD 960
SIVcpz     LLWVDLTSIIQWIYQICRTCLLNLWAVLQHL------CRITFRLCNHLENNLSTLW-TIIRTEIIKNIDRLAIWVGEKTD 960
Test 1:
H0: no match between positively selected sites
H1: match between positively selected sites
Positively selected sites tend to occur at the same position in different clades, except for HIV-2
E: expected, O: observed, P: P-value (number of positively selected sites shared by the two data sets)

HIV data set      HIV-1 M:A  HIV-1 M:B  HIV-1 M:C  HIV-1 M:D  HIV-1 O
HIV-1 M:B    E    2.018
             O    13
             P    0.001
HIV-1 M:C    E    1.990      2.015
             O    16         15
             P    0.001      0.001
HIV-1 M:D    E    1.760      1.793      1.741
             O    10         14         15
             P    0.001      0.001      0.001
HIV-1 O      E    1.016      0.996      0.889      0.848
             O    7          7          8          6
             P    0.001      0.001      0.001      0.001
HIV-2 A      E    0.633      0.722      0.571      0.511      0.729
             O    3          1          2          2          1
             P    0.024      0.535      0.104      0.091      0.539
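One way to obtain an expected overlap and a P-value for Test 1 is a randomisation test: place the positively selected sites of each data set at random along the alignment and count how often the simulated overlap reaches the observed one. The slides do not say how E and P were computed, so this is only a sketch under that assumption; the site counts and alignment length below are hypothetical.

```python
import random


def overlap_test(n_a, n_b, n_sites, observed, n_reps=2000, seed=0):
    """Randomisation test for shared positively selected sites.

    n_a, n_b : number of selected sites in each data set
    n_sites  : alignment length (number of codon positions)
    observed : observed number of shared sites
    Returns the simulated expected overlap and a one-sided P-value.
    """
    rng = random.Random(seed)
    positions = range(n_sites)
    overlaps = []
    for _ in range(n_reps):
        sites_a = set(rng.sample(positions, n_a))
        sites_b = set(rng.sample(positions, n_b))
        overlaps.append(len(sites_a & sites_b))
    expected = sum(overlaps) / n_reps
    # Add-one correction so the estimated P-value is never exactly zero.
    p_value = (1 + sum(o >= observed for o in overlaps)) / (n_reps + 1)
    return expected, p_value


# Hypothetical inputs: 35 and 33 selected sites among 578 positions, 13 shared.
expected, p_value = overlap_test(35, 33, 578, observed=13)
print(expected, p_value)
```

With the add-one correction the smallest reportable P-value is 1 / (n_reps + 1), so the resolution of the test is set by the number of replicates.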
Test 2:
H0: random distribution of positively selected sites with respect to epitopes
H1: no match between positively selected sites and epitopes
H2: match between positively selected sites and epitopes
Positively selected sites do not tend to be related to epitopic sites
Epitopes     N_IN  O_IN  E_IN   P_IN   N_OUT  O_OUT  E_OUT  P_OUT
Ab           370   18    22.17  0.946  208    17     12.53  0.072
CTL          394   19    23.82  0.976  184    16     11.12  0.053
Th           499   26    30.16  0.989  79     9      4.78   0.028
Ab and CTL   507   30    30.64  0.726  71     5      4.17   0.401
Ab and Th    537   33    32.40  0.496  41     2      1.90   0.612
CTL and Th   524   27    31.67  0.999  54     8      2.92   0.018
Comparisons: H1 vs H0, H2 vs H0
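A simple reference point for Test 2 is the expectation under proportional allocation: if selected sites fall at random, the expected number inside epitopes is the total number of selected sites times the fraction of sites that are epitopic. The sketch below applies this to the Ab row (370 sites inside, 208 outside, 18 + 17 selected sites) and adds a hypergeometric upper tail. It is an assumption that this resembles the original procedure; the expected values on the slide differ slightly, so the authors presumably used another method.

```python
from math import comb


def expected_counts(n_in, n_out, n_selected):
    """Expected selected sites inside and outside epitopes under random placement."""
    total = n_in + n_out
    return n_selected * n_in / total, n_selected * n_out / total


def upper_tail(n_in, n_out, n_selected, observed_in):
    """Hypergeometric P(X >= observed_in) for selected sites falling inside epitopes."""
    total = n_in + n_out
    denominator = comb(total, n_selected)
    largest = min(n_in, n_selected)
    numerator = sum(
        comb(n_in, k) * comb(n_out, n_selected - k)
        for k in range(observed_in, largest + 1)
    )
    return numerator / denominator


# Ab row of the table: N_IN = 370, N_OUT = 208, O_IN = 18, O_OUT = 17.
e_in, e_out = expected_counts(370, 208, 18 + 17)
p_in = upper_tail(370, 208, 18 + 17, 18)
print(e_in, e_out, p_in)
```

A large upper-tail probability means the observed count inside epitopes is not higher than expected by chance.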
Test 3:
H0: no match between positively selected sites and glycosylation sites
H1: match between positively selected sites and glycosylation sites
HIV-1 data set  Sites of N-glyc  Conserved N-glyc  Observed  Expected  P-value
HIV-1 M:A       28               5                 11        1.64      0.001
HIV-1 M:B       27               2                 5         1.62      0.019
HIV-1 M:C       30               2                 7         1.69      0.002
HIV-1 M:D       22               5                 4         1.17      0.023
HIV-1 O         39               6                 13        2.46      0.001
Positively selected sites tend to occur on N-glycosylation sites
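Potential N-glycosylation sites are conventionally located by the sequon N-X-S/T, where X is any residue except proline. The sketch below scans an ungapped protein string for that motif; it is an illustration of the motif only, not the prediction program used in the study, and the function name is hypothetical. The test fragment is taken from the HIV-1 M A1 row of the alignment.

```python
import re

# Lookahead so that overlapping sequons (for example "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 0-based positions of the Asn in each N-X-S/T sequon (X is not Pro).

    Assumes an ungapped sequence; alignment gap characters should be removed first.
    """
    return [match.start() for match in SEQUON.finditer(protein)]


fragment = "NVTEKFNMWKNNM"
print(find_sequons(fragment))
```

Positions returned this way can then be compared with the list of positively selected sites.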
Test 4:
H0: positively selected sites have the same strength of selection
H1: positively selected sites do not have the same strength of selection
HIV data set      HIV-1 M:A  HIV-1 M:B  HIV-1 M:C  HIV-1 M:D  HIV-1 O
HIV-1 M:B    Z    3.8266
             N    69
             P    0.0001
HIV-1 M:C    Z    1.2150     -2.1945
             N    67         62
             P    0.2244     0.0282
HIV-1 M:D    Z    0.6009     -1.1021    -0.4077
             N    46         54         51
             P    0.5479     0.2704     0.6835
HIV-1 O      Z    0.3652     -1.853     -1.0934    0.0000
             N    23         22         26         18
             P    0.7149     0.0639     0.2742     1.0000
HIV-2 A      Z    -0.4001    -1.0193    1.8347     0.8293     2.2819
             N    11         10         10         9          7
             P    0.6891     0.3081     0.0665     0.4069     0.0225
Little difference between clades
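The Z and P values reported for Test 4 are consistent with a two-sided P-value from the standard normal distribution. The slides do not name the test that produced Z, so the sketch below only shows the conversion from Z to P, using values from the table as a check.

```python
import math


def two_sided_p(z):
    """Two-sided P-value for a standard normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z values taken from the table (HIV-1 M:C vs M:A, HIV-1 M:C vs M:B, HIV-1 O vs M:D).
for z in (1.2150, -2.1945, 0.0):
    print(z, round(two_sided_p(z), 4))
```

The sign of Z indicates the direction of the difference between the two data sets; the two-sided P-value ignores it.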
CONCLUSIONS
Positively selected sites tend to occur at the same position in different clades, except for HIV-2
Positively selected sites do not tend to be related to epitopic sites
Positively selected sites tend to occur on N-glycosylation sites
Similar intensity of selection across clades
Moderates the conclusions of Gaschen et al. 2002
Confirms the glycan shield model of Kwong et al. 2002
Glycan shield model (Kwong et al. 2002)
ACKNOWLEDGEMENTS
Contributors:
C. H. Woelk, University of California San Diego, USA.
D. L. Robertson, University of Manchester, UK.
J. F. Guégan, CEPM, UMR CNRS-IRD 9926, Montpellier, France.
Data:
Kuiken, C., et al. (2000). HIV Sequence Compendium. Los Alamos, NM, USA: Theoretical Biology and Biophysics Group, Los Alamos National Laboratory.
Korber, B., et al. (2000). HIV Molecular Immunology. Los Alamos, NM, USA: Theoretical Biology and Biophysics Group, Los Alamos National Laboratory.
Programs:
Yang, Z. H. (1997). PAML: a program package for the phylogenetic analysis by maximum likelihood. Computer Applications in the Biosciences 13, 555-556.
Hansen, J. E., et al. (1998). NetOglyc: prediction of mucin type O-glycosylation sites based on sequence context and surface accessibility. Glycoconjugate Journal 15, 115-130.
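[Figure: PAUP* phylogenetic tree; tips A.6, A.7, A.2, A.11, A.3, A.17, A1.14, A.16, A.12, A1.8, A.10, A.4, A.9, A1.5, A1.15, A.1, A.13; scale bar: 50 changes]
Maximum likelihood (ML) on a model of codon substitution
$d_N/d_S = w_0 p_0 + w_1 p_1 + w_2 p_2$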