Protein Mutational Analysis Using Statistical Geometry Methods
Majid [email protected]
http://mason.gmu.edu/~mmasso
Bioinformatics and Computational Biology
George Mason University
Protein Basics
proteins are formed by linearly linking amino acid residues (amino acids, or aa’s, are the building blocks of proteins)
20 distinct aa types: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
[Figure: general structure of an amino acid. The backbone (+H3N, Cα with its H, and the COO- group) is identical for all amino acids; the side chain (R group) is unique for each amino acid. Example shown: Leucine (Leu or L).]
[Figure: two amino acids with side chains R1 and R2 are joined by a peptide bond, releasing H2O.]
Amino Acid Groups
Branden/Tooze (affinity for water)
hydrophobic aa’s: A, V, L, I, M, P, F
hydrophilic aa’s:
polar: N, Q, W, S, T, G, C, H, Y
charged: D, E, R, K
Dayhoff (similar with respect to structure or function): (A,S,T,G,P), (V,L,I,M), (R,K,H), (D,E,N,Q), (F,Y,W), (C)
conservative substitution: replacement with an amino acid from within the same class
non-conservative substitution: interclass replacement
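The conservative / non-conservative distinction can be made concrete with a small lookup. The following is a minimal sketch, assuming the six Dayhoff classes exactly as listed above; the names `DAYHOFF_CLASSES`, `CLASS_OF`, and `is_conservative` are illustrative and do not come from the original work.

```python
# Dayhoff classes as listed on the slide; every residue type is in exactly one class.
DAYHOFF_CLASSES = ["ASTGP", "VLIM", "RKH", "DENQ", "FYW", "C"]

# Map each one-letter residue code to the index of its class.
CLASS_OF = {aa: idx for idx, members in enumerate(DAYHOFF_CLASSES) for aa in members}


def is_conservative(wild_type: str, mutant: str) -> bool:
    """Return True if the substitution stays within one Dayhoff class."""
    if wild_type == mutant:
        raise ValueError("wild-type and mutant residues are identical")
    return CLASS_OF[wild_type] == CLASS_OF[mutant]


print(is_conservative("L", "I"))  # both in (V,L,I,M)
print(is_conservative("F", "A"))  # (F,Y,W) versus (A,S,T,G,P)
```

Note that C forms a class of its own, so under this grouping every substitution at a cysteine position is non-conservative.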
Protein Basics
genes: code, or “blueprint”
proteins: product, or “building”
protein structure gives rise to function
why do “things go wrong”? mistakes in the “blueprint” lead to incorrectly built, or nonexistent, “buildings”
Protein Data Bank (PDB): repository of protein structural data, including 3D coordinates of all atoms (www.rcsb.org/pdb/)
PDB ID: 1REZ
Structure reference: Muraki M., Harata K., Sugita N., Sato K., Origin of carbohydrate recognition specificity of human lysozyme revealed by affinity labeling, Biochemistry 35 (1996)
Computational Geometry Approach to Protein Structure Prediction
Tessellation
protein structure represented as a set of points in 3D, using Cα coordinates
Voronoi tessellation: convex polyhedra, each containing one Cα, with all interior points closer to this Cα than to any other
Delaunay tessellation: connect the four Cα atoms whose Voronoi polyhedra meet at a common vertex
vertices of Delaunay simplices objectively define a set of four nearest-neighbor residues (quadruplets)
5 classes of Delaunay simplices
Quickhull algorithm (qhull program), Barber et al., UMN Geometry Center
[Figure: Voronoi/Delaunay tessellation in 2D space. Voronoi tessellation: dashed line; Delaunay tessellation: solid line. (Adapted from Singh R.K., et al. J. Comput. Biol., 1996, 3, 213-222.)]
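[Figure: Five classes of Delaunay simplices, distinguished by how many of the four residues are consecutive in the sequence: {1-1-1-1} (i, j, k, l), {2-1-1} (i, i+1, j, k), {2-2} (i, i+1, j, j+1), {3-1} (i, i+1, i+2, j), {4} (i, i+1, i+2, i+3). (Adapted from Singh R.K., et al. J. Comput. Biol., 1996, 3, 213-222.)]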
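The five simplex classes can be assigned mechanically from residue numbering. The sketch below assumes that the four vertices lie on one chain and that sequence positions are plain integers; `simplex_class` is an illustrative name, not a function from the qhull program or the original software.

```python
def simplex_class(positions):
    """Classify four residue sequence positions by their runs of consecutive numbers."""
    pos = sorted(positions)
    if len(pos) != 4 or len(set(pos)) != 4:
        raise ValueError("a simplex needs four distinct residue positions")
    runs = [1]  # lengths of maximal runs of sequence-consecutive residues
    for prev, cur in zip(pos, pos[1:]):
        if cur == prev + 1:
            runs[-1] += 1
        else:
            runs.append(1)
    runs.sort(reverse=True)
    return "{" + "-".join(str(r) for r in runs) + "}"


for quad in ([10, 40, 70, 90], [10, 11, 40, 70], [10, 11, 40, 41],
             [10, 11, 12, 40], [10, 11, 12, 13]):
    print(quad, simplex_class(quad))
```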
Counting Quadruplets
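assuming order independence among the residues comprising Delaunay simplices, the maximum number of distinct quadruplets that can form such simplices is 8855:
D F E C (four distinct residue types): $\binom{20}{4} = 4845$
C C D E (one pair plus two other distinct types): $20\binom{19}{2} = 3420$
C C D D (two pairs): $\binom{20}{2} = 190$
C C C D (three of one type plus one other): $20 \cdot 19 = 380$
C C C C (four of one type): $20$
total: $4845 + 3420 + 190 + 380 + 20 = 8855$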
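The count of 8855 can be checked in two independent ways: by summing the five composition patterns, and by directly enumerating unordered quadruplets with repetition. This is a small verification sketch; the variable names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Number of unordered quadruplets for each composition pattern.
by_pattern = {
    "four distinct types (D F E C)": comb(20, 4),
    "one pair + two other types (C C D E)": 20 * comb(19, 2),
    "two pairs (C C D D)": comb(20, 2),
    "three + one (C C C D)": 20 * 19,
    "four identical (C C C C)": 20,
}
total = sum(by_pattern.values())

# Brute-force check: multisets of size 4 drawn from 20 residue types.
enumerated = sum(1 for _ in combinations_with_replacement(AMINO_ACIDS, 4))

print(total, enumerated)
```

Both values also equal $\binom{23}{4}$, the standard count of size-4 multisets over 20 types.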
Residue Environment Scores
log-likelihood: $q_{ijkl} = \log\left(f_{ijkl} / p_{ijkl}\right)$
$f_{ijkl}$ = normalized frequency of quadruplets containing residues i,j,k,l in a representative training set of high-resolution protein structures with low primary sequence identity, i.e., the total number of quadruplets in the dataset containing only residues i,j,k,l divided by the total number of observed quadruplets
$p_{ijkl} = c\,a_i a_j a_k a_l$ = frequency of random occurrence of the quadruplet (multinomial)
$a_i$ = total number of occurrences of residue i divided by the total number of residues in the dataset
$c = \dfrac{4!}{\prod_{i=1}^{n} t_i!}$, where n = number of distinct residue types in the quadruplet, and $t_i$ is the number of residues of type i
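The log-likelihood score can be sketched in a few lines. This is not the original Java implementation: the function and argument names are hypothetical, quadruplets are assumed to be keyed by their sorted one-letter codes, and a base-10 logarithm is assumed because the slide only says "log".

```python
from collections import Counter
from math import factorial, log10


def multinomial_coefficient(quad):
    """c = 4! / prod(t_i!), with t_i the multiplicity of each residue type in the quadruplet."""
    c = factorial(4)
    for t in Counter(quad).values():
        c //= factorial(t)
    return c


def log_likelihood(quad, quad_counts, residue_counts):
    """q = log(f / p) for one quadruplet (base 10 is an assumption)."""
    key = "".join(sorted(quad))  # order independence among the four residues
    f = quad_counts[key] / sum(quad_counts.values())  # observed frequency
    n_residues = sum(residue_counts.values())
    p = multinomial_coefficient(quad)  # expected frequency, built up below
    for aa in quad:
        p *= residue_counts[aa] / n_residues
    return log10(f / p)
```

A quadruplet that never occurs in the training set gives f = 0 and an undefined logarithm; how such cases are handled is not stated in the source, so the sketch leaves them as an error.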
Residue Environment Scores
total statistical potential (topological score) of a protein: sum the log-likelihoods of all quadruplets forming the Delaunay simplices
individual residue potentials: sum the log-likelihoods of all quadruplets in which the residue participates (yields a 3D-1D potential profile)
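Both quantities follow from one pass over the simplices. The sketch below assumes the tessellation is already available as 4-tuples of 0-based residue indices and that log-likelihood scores are stored in a dictionary keyed by the sorted residue letters; `score_structure` and its arguments are hypothetical names.

```python
from collections import defaultdict


def score_structure(sequence, simplices, q):
    """Return (total potential, per-residue potential profile).

    sequence  -- one-letter residue codes, one per tessellated C-alpha
    simplices -- iterable of 4-tuples of 0-based residue indices
    q         -- dict mapping sorted quadruplet strings to log-likelihood scores
    """
    total = 0.0
    profile = defaultdict(float)
    for simplex in simplices:
        key = "".join(sorted(sequence[i] for i in simplex))
        score = q[key]
        total += score           # topological score of the whole protein
        for i in simplex:
            profile[i] += score  # each vertex residue receives the simplex score
    return total, [profile[i] for i in range(len(sequence))]
```

Because every simplex contributes its score to four residues, the profile values sum to four times the total potential.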
[Figure: 3phv Potential Profile. x-axis: Residue Number (0 to 100); y-axis: Potential (-2 to 12).]
PDB ID: 3phv, HIV-1 Protease Monomer, 99 amino acids (total potential 27.93)
Structure reference: R. Lapatto, T. Blundell, A. Hemmings, et al., X-ray analysis of HIV-1 proteinase at 2.7 Å resolution confirms structural homology among retroviral enzymes, Nature 342 (1989) 299-302.
Properties of HIV-1 Protease
functional as a homodimer, 99 residues per subunit
monomers form an intermolecular two-fold axis of symmetry
approximate intramolecular two-fold axis of symmetry
dimer interface: N and C termini (P1-T4 & C95-F99, respectively) form a four-stranded beta sheet
active site triad: D25-T26-G27
h-phobic flaps (M46-V56) are also G-rich, providing the flexibility to accommodate and interact with the substrate molecule
Figure adapted from URL: http://mcl1.ncifcrf.gov/hivdb/Informative/Facts/facts.html
HIV-1 Protease Comprehensive Mutational Profile (CMP)
replace the residue present at each of the 99 positions in the primary sequence with each of the 19 alternative amino acids
get the total potential and potential profile of each artificially created mutant protein
create a 20x99 matrix containing the total potentials of all the single-residue mutants: columns labeled with the residues in the primary sequence of the wild-type (WT) HIV-1 protease monomer, and rows labeled with the 20 naturally occurring amino acids
subtract the WT total potential (TP) from each cell, then average the columns to get the CMP:
$\mathrm{CMP}_j = \frac{1}{20}\sum_{i=1}^{20}\left[(\text{mutant TP})_{ij} - (\text{WT TP})\right] = \frac{1}{20}\sum_{i=1}^{20}\left[(\text{mutant TP})_{ij} - 27.93\right], \quad j = 1,\ldots,99$
[Figure: 3phv Comprehensive Mutational Profile. x-axis: Residue Number (0 to 100); y-axis: Mean Change in Total Protein Potential (-8 to 4).]
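The CMP calculation can be sketched as a loop over positions and residue types. Here `total_potential` is a hypothetical callable that returns the total potential of a sequence; it is assumed to rescore each mutant on the unchanged wild-type tessellation, which the slides do not state explicitly. The row matching the wild-type residue contributes a change of zero, so the column average is taken over all 20 rows.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def comprehensive_mutational_profile(sequence, total_potential):
    """Mean change in total potential over all 20 residue types at each position."""
    wt_tp = total_potential(sequence)
    cmp_values = []
    for j in range(len(sequence)):
        changes = []
        for aa in AMINO_ACIDS:
            mutant = sequence[:j] + aa + sequence[j + 1:]
            changes.append(total_potential(mutant) - wt_tp)  # 0.0 when aa is the WT residue
        cmp_values.append(sum(changes) / len(AMINO_ACIDS))
    return cmp_values
```

For the 99-residue protease monomer this scores 99 x 20 sequences, which is inexpensive as long as the tessellation itself is computed only once.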
[Figure: 3phv Clustered Comprehensive Mutational Profiles, two panels. x-axis: Residue (labeled P1 to L5, E21 to D30, A71 to T80, C95 to F99); y-axis: Mean Change in Total Protein Potential. First panel series: C, NC, ALL. Second panel series: H-phobic, Charged, Polar, Total.]
[Figure: 3phv Comprehensive Mutational Profile vs. Potential Profile. x-axis: Individual Residue Potentials of Wild-Type Protein (potential of residue j in WT HIV-1 protease), -2 to 12; y-axis: Mean Change in Total Protein Potential (CMP j), -8 to 4; one labeled point per residue, P1 through F99.]
[Figure: 3phv Comprehensive Non-Conservative Mutational Profile vs. Potential Profile. x-axis: Individual Residue Potentials of Wild-Type Protein, -2 to 12; y-axis: Mean Change in Overall Protein Potential, -10 to 4; one labeled point per residue, P1 through F99.]
[Figure: 3phv Comprehensive Conservative Mutational Profile vs. Potential Profile. x-axis: Individual Residue Potentials of Wild-Type Protein, -2 to 12; y-axis: Mean Change in Overall Protein Potential, -3 to 1; one labeled point per residue, P1 through F99.]
Experimental Data
536 single point missense mutations
336 published mutants: Loeb D.D., Swanstrom R., Everitt L., Manchester M., Stamper S.E., Hutchison III C.A. Complete mutagenesis of the HIV-1 protease. Nature, 1989, 340, 397-400
200 mutants provided by R. Swanstrom (UNC)
each mutant placed in one of 3 phenotypic categories (positive, negative, or intermediate) based on activity
mutant activity to be compared with the change in sequence-structure compatibility elucidated by potential data
Experimental Data
[Figure: 3phv Structure-Function Correlations. Labels: HIV-1 Protease Assay, HIV-1 Protease Mutagenesis Data; y-axis: Average Change in Potential (-1.80 to 0.00); categories: Positive, Intermediate, Negative; series: ALL, C, NC.]
Average change in potential by phenotype:
        Positive   Intermediate   Negative
ALL     -0.23      -0.74          -1.39
C       -0.14      -0.75          -0.23
NC      -0.29      -0.73          -1.65
Observations
the set of mutants with unaffected protease activity exhibits a minimal (negative) change in potential
the set of mutants that inactivate protease exhibits a large negative change in potential, weighted heavily by NC
the set of mutants with intermediate phenotypes exhibits a moderate negative change in potential (similar among C and NC); wide range for intermediate phenotype in the experiments
Evolutionarily Conserved Residue Positions
Apply the chi-square test to the tables above, with the null hypothesis being no association between residue position conservation and level of sensitivity to mutation:
LHS table (1 df): χ2 = 10.44, reject null with p < 0.01
RHS table (2 df): χ2 = 75.49, reject null with p < 0.001
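The two rejection thresholds can be checked from the reported statistics alone. The sketch below uses the closed-form chi-square upper-tail probabilities that exist for 1 and 2 degrees of freedom, so it needs only the standard library; `chi2_pvalue` is an illustrative helper, and the contingency tables themselves are not reproduced here.

```python
from math import erfc, exp, sqrt


def chi2_pvalue(statistic, df):
    """Upper-tail probability of a chi-square statistic for df = 1 or df = 2."""
    if df == 1:
        # chi-square with 1 df is the square of a standard normal variable
        return erfc(sqrt(statistic / 2.0))
    if df == 2:
        # chi-square with 2 df is exponential with mean 2
        return exp(-statistic / 2.0)
    raise ValueError("closed form implemented only for 1 or 2 degrees of freedom")


print(chi2_pvalue(10.44, 1))  # LHS table
print(chi2_pvalue(75.49, 2))  # RHS table
```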
Mutagenesis at the Dimer Interface
Q2, T4, T96, and N98 are polar, with side chains directed outward; P1, I3, L97, and F99 are hydrophobic, with side chains directed toward the body
F99 in one subunit makes extensive contacts with I3, V11, L24, I66, C67, I93, C95, and H96 in the complementary chain
[Figure: Impact of the F99A Mutation in One Chain of the HIV-1 Protease on Contacts in the Complementary Subunit. x-axis: Residue Number (0 to 100); y-axis: Difference in Residue Potential (F99A - WT), -1.0 to 0.2.]
Mutagenesis at the Dimer Interface
Alanine scan conducted on interface residues individually and in pairs, in one subunit and in both chains; activity of mutants measured by % cleavage of β-galactosidase containing a protease cleavage site
S. Choudhury, L. Everitt, S.C. Pettit, A.H. Kaplan, Mutagenesis of the dimer interface residues of tethered and untethered HIV-1 protease result in differential activity and suggest multiple mechanisms of compensation, Virology 307 (2003) 204-212.
Results: Good correlation between % cleavage (protease activity) and topological scores (protease sequence-structure compatibility)
[Figure: Structure-Function Correlation Based on Mutations in Both Subunits of HIV-1 Protease. x-axis: % Cleavage (0 to 1); y-axis: Difference in Topological Scores (Mutant - WT), -6 to 3; labeled points: WT, P1A, Q2A, I3A, T4A, T96A, L97A, N98A, N98D, F99A, Q2A+I3A, I3A+T4A, T96A+L97A, L97A+N98A; R2 = 0.61.]
[Figure: Structure-Function Correlation Based on Mutations in One Subunit of HIV-1 Protease. x-axis: % Cleavage (0 to 1); y-axis: Difference in Topological Scores (Mutant - WT), -3.5 to 0.5; labeled points: WT, P1A, Q2A, I3A, T4A, T96A, L97A, N98A, N98D, F99A, Q2A+I3A, I3A+T4A, T96A+L97A, L97A+N98A; R2 = 0.57.]
Conformational Changes Due to Dimerization and/or Ligand Binding
PDB ID: 1g35 HIV-1 Protease Dimer with Inhibitor aha024
monomer in a dimeric configuration with an inhibitor: obtain the profile for 1g35, plot the 3D-1D profile only for 1g35A
isolated monomer: eliminate all PDB coordinate lines in 1g35 except those for 1g35A, obtain the profile, plot the 3D-1D profile
plot interface: difference between the 1g35A 3D-1D profiles in the dimer and monomer configurations
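The isolation step and the interface plot can be sketched as two small helpers. This assumes standard fixed-column PDB records, in which the chain identifier sits in column 22, and it keeps only ATOM records on the assumption that the Cα-based tessellation does not need HETATM lines. Both function names are hypothetical, and the potential profiles are assumed to be computed elsewhere.

```python
def keep_chain(pdb_lines, chain_id):
    """Keep only the ATOM records of one chain (chain ID is column 22, index 21)."""
    return [
        line for line in pdb_lines
        if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id
    ]


def interface_profile(profile_in_dimer, profile_as_monomer):
    """Per-residue difference between the dimer-context and isolated-monomer profiles."""
    if len(profile_in_dimer) != len(profile_as_monomer):
        raise ValueError("profiles must cover the same residues")
    return [d - m for d, m in zip(profile_in_dimer, profile_as_monomer)]
```

With this sign convention (dimer minus monomer), positive values mark residues whose potential is higher in the dimeric, inhibitor-bound context.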
Structure reference: W. Schaal, A. Karlsson, G. Ahlsen, et al., Synthesis and comparative molecular field analysis (CoMFA) of symmetric and nonsymmetric cyclic sulfamide HIV-1 protease inhibitors, J. Med. Chem. 44 (2001) 155-169
[Figure: 1g35A Interface. x-axis: Residue Number (0 to 100); y-axis: Difference in Potential Profiles (-2 to 5).]
Observations
the majority of residues forming both the dimer interface and the flap region exhibit an increase in stability following dimerization: Q2, T4, I47-I54, T96, L97, and F99
all h-phobic except Q2
increase in stability due to inhibitor binding is evident for the active site residues D25, T26, and G27; also true for the surrounding h-phobic residues L24 and A28
Significance of Hydrophobic Residues in HIV-1 Protease
35/99 amino acids have scores exceeding 1.0
27 of these are hydrophobic
altogether, 44/99 amino acids in protease are hydrophobic
assuming h-phobic residues are no more likely than others (polar/charged) to have score > 1.0, expect (35/99)x44, i.e. 15 or 16, h-phobics > 1.0
$P(27\ \text{h-phobics} > 1.0) = \dfrac{44!}{27!\,17!}\left(\dfrac{35}{99}\right)^{27}\left(\dfrac{64}{99}\right)^{17} = 2.7\times10^{-4} < 0.001$, yet this is exactly what we observe!
What about other cut-off scores, and other proteins?
applied a similar test to all 996 proteins in the training set: while varying the cut-off between 0.0 and 5.0 in 0.25 increments, binomial probabilities were calculated for each protein. For a given p-value, the number of proteins with a lower significance level at each cut-off score was tabulated
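The binomial figure for 3phv can be reproduced with the standard library. The sketch below evaluates the probability of exactly 27 high-scoring hydrophobic residues, which is the quantity in the formula, and also the upper tail (27 or more), which is added here only for comparison and is not reported in the source. Variable names are illustrative.

```python
from math import comb

n_hydrophobic = 44   # hydrophobic residues in the protease monomer
p_high = 35 / 99     # fraction of all residues with score > 1.0
k_observed = 27      # hydrophobic residues with score > 1.0


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


expected = n_hydrophobic * p_high
p_exact = binomial_pmf(k_observed, n_hydrophobic, p_high)
p_tail = sum(binomial_pmf(k, n_hydrophobic, p_high)
             for k in range(k_observed, n_hydrophobic + 1))

print(f"expected count: {expected:.2f}")
print(f"P(exactly 27):  {p_exact:.2e}")
print(f"P(27 or more):  {p_tail:.2e}")
```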
Significance of Hydrophobic Residues
the optimal cut-off score for rejection of the null is clearly distinct for each of the individual proteins. Ex.: 827 proteins reject the null with a 2.0 cut-off score at p = 0.05, but 918 proteins reject the null at the same significance level if all cut-off scores are considered.
alternate approach: 92,343 h-phobic amino acids and 136,329 others (polar/charged), for a total of 228,672 residues in the 996 proteins; assuming no difference in the mean of the scores in both groups, apply a t-test.
Result: t = 126.48, with 228,670 df => reject null!
Acknowledgements
Iosif Vaisman (Ph.D. advisor, first to apply Delaunay to protein structure)
Zhibin Lu (Java programs for calculating statistical potentials from tessellations)
Ronald Swanstrom (experimental HIV-1 protease mutants and activity measure)