Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Page 1: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Predicting ligand binding sites on protein surface

Zengming Zhang

2010-5-12

Page 2: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

What is the binding site?

A (concave, cleft, hole)-shaped region on the protein surface

A key into a lock!

Key: ligand

Lock: protein

Lock hole: binding site

Page 3: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Why do we need to find binding sites?

First step in many structure analyses:

Functional/catalytic site prediction

Comparisons of protein atomic configurations

Docking calculations

Structure-based drug design

…

Page 4: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithms for finding binding sites

Grid-based

The protein is placed on a 3D grid. Empty grid points are then defined as pockets if they satisfy a number of geometric or energetic conditions.

Sphere-based

A set of probe spheres is placed on the protein surface. Pocket spheres are those generated probe spheres that satisfy a number of geometric conditions.

α-shape based

An α-shape is defined as a subset of the Delaunay tessellation of the protein atoms, omitting edges longer than the sum of the radii of the two atoms.

Page 5: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithms for finding binding sites

Grid-based: POCKET, LIGSITE, LIGSITEcs, LIGSITEcsc, ConCavity, PocketPicker and GHECOM

Sphere-based: SURFNET, PASS, Q-SiteFinder, PHECOM

α-shape based: CAST, Fpocket

Page 6: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

α-shape

The shape surrounded by the black line

The edge of Delaunay tessellations

Page 7: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Delaunay tessellations

No edge is longer than the sum of the radii of its two atoms

Page 8: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

α-shape based: CAST

Computes a triangulation of the protein’s surface atoms using α-shapes, then triangles are grouped by letting small triangles flow toward neighboring larger triangles, which act as sinks!

Page 9: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Grid-based

The protein is projected onto a 3D grid. The method focuses on PSP (protein-solvent-protein) events at the grid points.

When a straight line drawn through a grid point is enclosed on both sides by protein atoms, the arrangement of the line for that grid point is termed a PSP event.

Grid points having more than a threshold number of PSP events are defined as pockets.
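The PSP counting described above can be sketched on a toy 2D grid. This is a minimal illustration under stated assumptions, not the implementation of any published program: the grid, the four scan directions, and the names `hits_protein`, `psp_events` and `find_pockets` are all hypothetical choices made for this example.

```python
# Toy 2D grid (hypothetical data): '#' is protein, '.' is solvent.
GRID = [
    ".......",
    ".##.##.",
    ".#...#.",
    ".#####.",
    ".......",
]

# Assumed scan directions: horizontal, vertical and the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def hits_protein(grid, r, c, dr, dc):
    """Walk from (r, c) along (dr, dc); True if a protein cell is reached."""
    r, c = r + dr, c + dc
    while 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        if grid[r][c] == "#":
            return True
        r, c = r + dr, c + dc
    return False


def psp_events(grid, r, c):
    """Count the directions in which the point is enclosed on both sides."""
    return sum(
        1
        for dr, dc in DIRECTIONS
        if hits_protein(grid, r, c, dr, dc) and hits_protein(grid, r, c, -dr, -dc)
    )


def find_pockets(grid, threshold):
    """Solvent points with more than `threshold` PSP events."""
    return {
        (r, c)
        for r, row in enumerate(grid)
        for c, cell in enumerate(row)
        if cell == "." and psp_events(grid, r, c) > threshold
    }


if __name__ == "__main__":
    print(sorted(find_pockets(GRID, threshold=2)))
```

In this toy grid, only the three solvent points at the bottom of the cleft exceed a threshold of 2; the point at the cleft mouth is enclosed along one direction only.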

Page 10: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Sphere-based: SURFNET

SURFNET places a sphere (called a gap sphere) between two protein atoms. If the sphere contains any other atoms, its radius is reduced until it just touches one protein atom.

A set of these gap spheres is defined as a pocket.
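The gap-sphere construction can be sketched for a single pair of atoms. This is an illustration under assumptions, not SURFNET's code: atoms are modelled as `(center, radius)` tuples, the initial sphere is assumed to sit midway between the two atomic surfaces, and the function name `gap_sphere` and the `min_radius` cutoff of 1.0 Å are hypothetical.

```python
import math


def gap_sphere(atom_a, atom_b, others, min_radius=1.0):
    """Return (center, radius) of the gap sphere between two atoms, or None.

    Each atom is a (center, radius) pair, with center an (x, y, z) tuple.
    """
    (center_a, radius_a), (center_b, radius_b) = atom_a, atom_b
    distance = math.dist(center_a, center_b)
    gap = distance - radius_a - radius_b
    if gap <= 0:
        return None  # the two atoms touch or overlap: no room for a sphere

    # Place the sphere midway between the two atomic surfaces.
    t = (radius_a + gap / 2) / distance
    center = tuple(a + t * (b - a) for a, b in zip(center_a, center_b))
    radius = gap / 2

    # Shrink the sphere until it no longer contains any other atom.
    for center_o, radius_o in others:
        radius = min(radius, math.dist(center, center_o) - radius_o)

    # Assumed cutoff: spheres that become too small are discarded.
    if radius < min_radius:
        return None
    return center, radius


if __name__ == "__main__":
    a = ((0.0, 0.0, 0.0), 1.5)
    b = ((10.0, 0.0, 0.0), 1.5)
    c = ((5.0, 4.0, 0.0), 1.5)
    print(gap_sphere(a, b, [c]))
```

Running this for every pair of nearby atoms and collecting the surviving spheres would give the set of gap spheres that the slide calls a pocket.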

Page 11: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Grid-based: GHECOM

By Takeshi Kawabata

Kawabata T. (2010) Detection of multi-scale pockets on protein surfaces using mathematical morphology. Proteins, 78, 1195-1211

To define pocket regions on the protein surface

Page 12: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Primary points:

1. A new definition of pockets by using the basic operations of mathematical morphology

2. Proposed an algorithm for finding pockets

3. Constructed a useful dataset for algorithm testing

4. Introduced a new method for evaluating binding site predictions

5. Some useful discoveries about how ligands bind to binding sites

Page 13: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Some Background:

Multiscale pockets: calculate deep and shallow pockets simultaneously

“Multiscale pockets” need “multiscale probes”: many probes of different sizes are used to define pockets.

“Size” and “depth” of pockets: two properties of pockets

A definition of pockets using small and large spherical probes comes from his previous work: PHECOM

A pocket region: a space into which a small spherical probe can enter but a large spherical probe cannot.

Page 14: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Pocket definition

Mathematical morphology is a theory used in the analysis of geometric features of digital images, based on rigorous set theory.

Morphology can provide boundaries of objects, their skeletons, and their convex hulls. It is also useful for many pre- and post-processing techniques, especially in edge thinning and pruning.

Page 15: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

mathematical morphology (con.)

Four operations: dilation, erosion, opening, closing

a: Molecular shape

b: The shape of the probe

c: X ⊕ P: Operation dilation of X by P

d: X ⊖ P: Operation erosion of X by P

e: X ○ P: Operation opening of X by P

f: X • P: Operation closing of X by P

The shape X is the vdW volume of a protein
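The four operations can be sketched on sets of integer grid points. This is a minimal illustration, assuming a probe that is symmetric about the origin (as a sphere is); the function names and the toy shapes are hypothetical, and a real implementation would work on a 3D grid.

```python
def dilation(shape, probe):
    """X (+) P: every point reachable by adding a probe offset to a shape point."""
    return {(x + px, y + py) for x, y in shape for px, py in probe}


def erosion(shape, probe):
    """X (-) P: positions where the whole translated probe fits inside the shape."""
    px0, py0 = next(iter(probe))
    # Any valid position c must satisfy c + p0 in shape, so these are all candidates.
    candidates = {(x - px0, y - py0) for x, y in shape}
    return {
        (cx, cy)
        for cx, cy in candidates
        if all((cx + px, cy + py) in shape for px, py in probe)
    }


def opening(shape, probe):
    """X o P: erosion followed by dilation (removes parts thinner than the probe)."""
    return dilation(erosion(shape, probe), probe)


def closing(shape, probe):
    """X . P: dilation followed by erosion (fills gaps the probe cannot enter)."""
    return erosion(dilation(shape, probe), probe)


if __name__ == "__main__":
    # Toy shape: two blocks on a line, separated by a one-point gap at x = 2.
    X = {(0, 0), (1, 0), (3, 0), (4, 0)}
    # Toy probe: three points on a line, symmetric about the origin.
    P = {(-1, 0), (0, 0), (1, 0)}
    print(sorted(closing(X, P)))  # the gap at x = 2 is filled
    print(sorted(opening(X, P)))  # no block is wide enough to hold the probe
```

The closing fills the gap because the probe cannot be placed there without overlapping X, which is the sense in which closing gives the "molecular volume".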

Page 16: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

mathematical morphology (con.)

Mathematical morphology language:

The translation of the shape X by the vector p (p-translated X) is denoted by (X)p and is defined by:
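In standard set notation, the translation and the four operations are usually written as follows. This is a sketch using the common textbook definitions, assuming a probe P that is symmetric about the origin; it does not reproduce the equation numbering of the original paper.

```latex
\begin{align*}
(X)_p &= \{\, x + p \mid x \in X \,\} && \text{translation of } X \text{ by } p \\
X \oplus P &= \bigcup_{p \in P} (X)_p && \text{dilation} \\
X \ominus P &= \bigcap_{p \in P} (X)_{-p} && \text{erosion} \\
X \circ P &= (X \ominus P) \oplus P && \text{opening} \\
X \bullet P &= (X \oplus P) \ominus P = \left( X^{c} \circ P \right)^{c} && \text{closing}
\end{align*}
```

The last identity says that the closing is the complement of the region the probe can sweep without overlapping X.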

Page 17: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

mathematical morphology (con.)

where Xc is the complement of the shape X: Xc = E3 – X

In other words, the closing of X by P is defined as the space that the probe P cannot enter when any overlap between X and P is prohibited.

The closing of X by P is called the “molecular volume” of molecule X defined by probe P.

Page 18: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Pocket definition (con.)

Eq. (12) was introduced by Masuya and Doi using mathematical morphological operations:
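The verbal definition of a pocket (a space that a small probe can enter but a large probe cannot) can be illustrated on a one-dimensional grid. This is a sketch under assumptions rather than a transcription of Eq. (12): it takes the pocket to be the part of the molecular volume X • P lying outside X, opened by the small probe S. The helper names and the toy "protein" are hypothetical.

```python
def segment(radius):
    """A 1D 'sphere': all integer offsets within `radius` of the origin."""
    return set(range(-radius, radius + 1))


def dilation(shape, probe):
    return {x + p for x in shape for p in probe}


def erosion(shape, probe):
    p0 = next(iter(probe))
    return {x - p0 for x in shape if all(x - p0 + p in shape for p in probe)}


def opening(shape, probe):
    return dilation(erosion(shape, probe), probe)


def closing(shape, probe):
    return erosion(dilation(shape, probe), probe)


def pocket(protein, large_probe, small_probe):
    """Assumed definition: (closing by large probe, minus protein), opened by small probe."""
    inaccessible_to_large = closing(protein, large_probe) - protein
    return opening(inaccessible_to_large, small_probe)


if __name__ == "__main__":
    # Toy 1D protein with a wide gap (points 3..7) and a narrow gap (point 11).
    X = set(range(0, 3)) | set(range(8, 11)) | set(range(12, 15))
    P = segment(3)  # large probe, 7 points wide: fits in neither gap
    S = segment(1)  # small probe, 3 points wide: fits only in the wide gap
    print(sorted(pocket(X, P, S)))
```

The narrow gap at point 11 is excluded from the pocket because even the small probe cannot enter it.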

Page 19: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Pocket definition (con.)

Page 20: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm: multiscale closing, or multiscale molecular volume

K types of large probe spheres P1, P2, …, PK and one small probe S are used. The probes must satisfy:

The opening condition means that a large probe Pj can be reconstructed by a set of translated smaller probes Pi.
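The opening condition can be checked directly for digitized probes: opening the larger probe by the smaller one must give back the larger probe. The sketch below does this for digitized discs on a 2D grid; the 2D setting, the radii in grid units, and the function names are assumptions made for illustration.

```python
def digitized_disc(radius):
    """Grid points within `radius` (in grid units) of the origin."""
    n = int(radius)
    return {
        (x, y)
        for x in range(-n, n + 1)
        for y in range(-n, n + 1)
        if x * x + y * y <= radius * radius
    }


def dilation(shape, probe):
    return {(x + px, y + py) for x, y in shape for px, py in probe}


def erosion(shape, probe):
    px0, py0 = next(iter(probe))
    candidates = {(x - px0, y - py0) for x, y in shape}
    return {
        (cx, cy)
        for cx, cy in candidates
        if all((cx + px, cy + py) in shape for px, py in probe)
    }


def opening(shape, probe):
    return dilation(erosion(shape, probe), probe)


def opening_condition_holds(small_probe, large_probe):
    """True if the large probe is a union of translated copies of the small probe."""
    return opening(large_probe, small_probe) == large_probe


if __name__ == "__main__":
    print(opening_condition_holds(digitized_disc(1.0), digitized_disc(2.0)))
    print(opening_condition_holds(digitized_disc(1.5), digitized_disc(2.0)))
```

On this coarse grid the first pair satisfies the condition and the second does not, which shows that a digitized pseudo-sphere does not automatically inherit this property of a real sphere.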

Page 21: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

If the opening condition [Eq. 16] is satisfied for all the probes {Pi}, then the following relation will hold:

But …

Page 22: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Eq. (16) is not satisfied

Page 23: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Is the assumption wrong? No!

The assumption of Eq. (16) is still safe, because they use digitized pseudo-spheres as approximations of real spheres in continuous space, and therefore the digitized pseudo-spheres should have the properties of real spheres.

Page 24: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Only one index for the 3D grid I(x) is necessary to store K types of dilations, molecular volumes and pockets:

x is a 3D point, ID(x), IC(x) and IP(x) are integers determined by a 3D point x.

Multiscale dilation

Multiscale closing, or multiscale molecular volume

Multiscale pocket

Page 25: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12
Page 26: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Rinaccess: the minimum inaccessible radius, i.e. the minimum radius of the probe spheres that cannot touch the point x. It serves as a measure of shallowness for probes on the protein surface.

Rpocket: the minimum pocket radius, i.e. the minimum radius of the probe spheres for which the point x is within the pocket.
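Both measures can be read as "the smallest probe radius for which the point belongs to the molecular volume (or to the pocket)", so a single value per grid point is enough. The sketch below computes them on a 1D toy grid by looping over probes from small to large and keeping the first radius that covers each point. It is an illustration under assumptions: the pocket is taken as the part of the closing outside the protein, opened by the small probe, and all names and data are hypothetical.

```python
def segment(radius):
    """A 1D 'sphere': all integer offsets within `radius` of the origin."""
    return set(range(-radius, radius + 1))


def dilation(shape, probe):
    return {x + p for x in shape for p in probe}


def erosion(shape, probe):
    p0 = next(iter(probe))
    return {x - p0 for x in shape if all(x - p0 + p in shape for p in probe)}


def opening(shape, probe):
    return dilation(erosion(shape, probe), probe)


def closing(shape, probe):
    return erosion(dilation(shape, probe), probe)


def min_radius_maps(protein, radii, small_radius):
    """Return two dicts: point -> R_inaccess and point -> R_pocket."""
    small = segment(small_radius)
    r_inaccess, r_pocket = {}, {}
    for radius in sorted(radii):
        volume = closing(protein, segment(radius))  # where this probe cannot enter
        pocket = opening(volume - protein, small)   # assumed pocket definition
        for x in volume:
            r_inaccess.setdefault(x, radius)        # keep the smallest radius only
        for x in pocket:
            r_pocket.setdefault(x, radius)
    return r_inaccess, r_pocket


if __name__ == "__main__":
    # Toy 1D protein with a wide gap (3..7) and a narrower gap (11..13).
    X = set(range(0, 3)) | set(range(8, 11)) | set(range(14, 17))
    r_in, r_pk = min_radius_maps(X, radii=[1, 2, 3], small_radius=1)
    print(sorted(r_pk.items()))
```

Points in the narrower gap get the smaller value, because a smaller probe is already excluded there; points in the wide gap are only inaccessible to the largest probe.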

Page 27: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Eq. (17-19) suggest an efficient algorithm for calculating multiscale dilations, molecular volumes and pockets.

To implement an efficient algorithm, a shell Hk is defined as the difference between the kth and (k-1)th probes as follows:

Page 28: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

A general strategy for an efficient algorithm is to process a shape X using a series of shells, progressing in size from the smallest to the largest shell (H1, H2, …, Hk).

The algorithm is shown in Figure 4.

In this study, the grid width was set to 0.8 Å, the radius of the probe S was set to 1.87 Å, and 17 different large probes Pk were used, with radii of 2.0, 2.5, 3.0, 3.5, …, 10 Å.
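The probe series and the shells can be sketched as follows. The grid width and the radii are taken from the slide; everything else is an assumption for illustration: the digitization rule (a grid point belongs to a probe if its distance from the origin is at most the radius), the choice of treating the probe before the first one as empty so that H1 equals P1, and the function names.

```python
import math

GRID_WIDTH = 0.8                                          # Å, from the slide
SMALL_PROBE_RADIUS = 1.87                                 # Å, from the slide
LARGE_PROBE_RADII = [2.0 + 0.5 * k for k in range(17)]    # 2.0, 2.5, ..., 10.0 Å


def digitized_sphere(radius, width=GRID_WIDTH):
    """Grid offsets (i, j, k) whose distance from the origin is <= radius."""
    n = math.ceil(radius / width)
    limit = radius * radius
    return {
        (i, j, k)
        for i in range(-n, n + 1)
        for j in range(-n, n + 1)
        for k in range(-n, n + 1)
        if (i * i + j * j + k * k) * width * width <= limit
    }


def probe_shells(radii):
    """Shell H_k = P_k minus P_(k-1); the probe before the first is taken as empty."""
    shells = []
    previous = set()
    for radius in radii:
        probe = digitized_sphere(radius)
        shells.append(probe - previous)
        previous = probe
    return shells


if __name__ == "__main__":
    shells = probe_shells(LARGE_PROBE_RADII)
    print(len(LARGE_PROBE_RADII), "probes")
    print([len(h) for h in shells[:4]], "grid points in the first four shells")
```

Because the shells are disjoint and together make up the largest probe, each grid offset is visited once when the probes are processed from smallest to largest.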

Page 29: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Calculation of Rinaccess for ligand atoms

A measure of pocket shallowness for probes or atoms of binding ligands is useful for characterizing binding pockets.

|L| is the number of points in the shape L of the ligand.

A: 1/((1/3 + 1/4 + 1/4)/3) = 3.6 Å

B: 1/((1/6 + 1/5 + 1/5)/3) = 5.3 Å
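The two worked values above are harmonic means of the per-point Rinaccess values, i.e. the reciprocal of the mean of 1/Rinaccess(x) over the |L| points of the ligand. A minimal sketch that reproduces them (the function name is hypothetical):

```python
def ligand_r_inaccess(point_values):
    """Harmonic mean of the R_inaccess values of the points in a ligand shape."""
    return len(point_values) / sum(1.0 / r for r in point_values)


if __name__ == "__main__":
    print(round(ligand_r_inaccess([3, 4, 4]), 1))  # ligand A
    print(round(ligand_r_inaccess([6, 5, 5]), 1))  # ligand B
```

A harmonic mean is pulled toward the small values, so a ligand with even a few deeply buried points gets a low overall Rinaccess.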

Page 30: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Calculation of Rinaccess and pocketness for protein atoms and residues

A measure for characterizing the depth of a protein atom or residue is useful for analyzing the relationship between ligand types and surrounding protein atom types.

For characterizing the depth of protein atoms, they introduced the concept of “accessible shell volume” around a part of protein Y:

where Y is a part of the protein shape X (Y ⊂ X), and S is a spherical probe.

Page 31: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12
Page 32: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Pocketness is a measure for a protein atom or residue that indicates how much it contributes to binding ligands.

Generally speaking, deep and large pockets tend to bind ligands.

Here, the pocketness measure indicates both the size and the depth of a pocket:

A residue in a deeper and larger pocket has a larger value of pocketness.

Page 33: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Algorithm (con.)

Clustering grids and filtering out small clusters

Most ligands are bound in the largest pockets.

The procedure of clustering pockets and extracting only large pocket clusters has been widely used by researchers.

In this study, using multiscale boundaries of pockets requires a threshold value of the Rpocket measure for the boundary between the pocket and the open outer space (shown in the “Results” section).
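Clustering pocket grid points and discarding small clusters can be sketched as a connected-component search. This is an illustration under assumptions: grid points are treated as neighbours when they share a face (6-connectivity), and the function names and the `min_size` cutoff are hypothetical rather than values from the study.

```python
from collections import deque

# Assumed connectivity: the six face-sharing neighbours of a 3D grid point.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cluster_points(points):
    """Group grid points into connected clusters, largest first."""
    remaining = set(points)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)


def keep_large_clusters(points, min_size):
    """Drop clusters with fewer than `min_size` grid points."""
    return [c for c in cluster_points(points) if len(c) >= min_size]


if __name__ == "__main__":
    pocket_points = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5)}
    print(keep_large_clusters(pocket_points, min_size=2))
```

In practice the input would be the grid points whose Rpocket value is below the chosen threshold.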

Page 34: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Dataset

Prepared from the SCOP database, version 1.73

Included protein chains with mutual sequence identities of 40% or less.

Excluded:

Small proteins with fewer than 40 residues

Protein chains with domains of class f, h, i, j, k

Total: 7375 chains

Extracted the chains bound to “proper” small molecules, excluding:

Tiny molecules

Unnatural precipitants: BOG, DTT, EPE, GOL, MES, MPD, MRD, PG4 and TRS

DNA, RNA (>= 3 nucleotides) and proteins (>= 10 aa)

Chains with more than 10,000 heavy atoms

As a result, 1817 chains were included, each of which contacted at least one proper small molecule. Only bound chains were used.

Page 35: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Evaluation of binding site predictions using recall-precision plots

For the purpose of comparison, calculated pockets and binding ligands were represented on grids with 0.8 Å width; each grid point was checked to determine whether it was inside the pockets or the binding ligands.

NP is the number of grid points in pockets, NL is the number of grid points overlapping with ligands, and NPL is the number of grid points in pockets that overlapped with ligands.
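With these three counts, recall and precision follow their usual definitions: recall = NPL / NL (the fraction of ligand grid points covered by pockets) and precision = NPL / NP (the fraction of pocket grid points that overlap ligands). A minimal sketch, assuming pockets and ligands are given as sets of grid points; the function name and the toy data are hypothetical.

```python
def recall_precision(pocket_points, ligand_points):
    """Return (recall, precision) from two sets of grid points."""
    n_p = len(pocket_points)                    # NP
    n_l = len(ligand_points)                    # NL
    n_pl = len(pocket_points & ligand_points)   # NPL
    recall = n_pl / n_l if n_l else 0.0
    precision = n_pl / n_p if n_p else 0.0
    return recall, precision


if __name__ == "__main__":
    pockets = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
    ligand = {(2, 0, 0), (3, 0, 0), (4, 0, 0)}
    print(recall_precision(pockets, ligand))
```

Sweeping the Rpocket threshold and recomputing the pair at each value would trace out one recall-precision curve.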

Page 36: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Results

1dwd

Page 37: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Results

Page 38: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Results

Page 39: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Results

Page 40: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Useful discoveries

The majority of molecules binding in deep pockets were coenzymes

In contrast, adenine and guanine mononucleotides tend to bind in medium-to-shallow pockets

Macromolecules tend to bind in shallow pockets or protruded regions

Page 41: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Useful discoveries

In the typical binding pose of the HEM molecule in the dataset, the aromatic atoms CBB and CMC face the protein, whereas the carboxyl atoms O1A and O2A face water.

In the ADP molecule, the atom N6 in the adenine ring and the atoms O1B, O2B and O3B of the phosphate group favored deep pockets, whereas the sugar atoms, such as O2’ and O3’, favored shallow pockets. The N6 side of the adenine ring and the phosphate terminus face the protein, while the sugar atoms face water.

Page 42: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Summary:

1. A new definition of pockets by using the basic operations of mathematical morphology

2. Proposed an efficient algorithm for finding pockets

3. Constructed a useful dataset for algorithm testing

4. Introduced a new method for evaluating binding site predictions with precision and recall.

5. Some useful discoveries

Page 43: Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Thanks!

Any questions? Please feel free to ask me!