Predicting ligand binding sites on protein surface Zengming Zhang 2010-5-12

Predicting ligand binding sites on protein surface

Zengming Zhang

2010-5-12

What is a binding site?

A concave, cleft-shaped or hole-shaped region on the protein surface

A key into a lock!
Key: ligand
Lock: protein
Lock hole: binding site

Why do we need to find binding sites?

First step in many structure analyses:
Functional/catalytic site prediction
Comparisons of protein atomic configurations
Docking calculations
Structure-based drug design
…

Algorithms for finding binding sites

Grid-based: The protein is mapped onto a 3D grid. Empty grid points are then defined as pockets if they satisfy a number of geometric or energetic conditions.

Sphere-based: A set of probe spheres is placed on the protein surface. Pocket spheres are those generated probe spheres that satisfy a number of geometric conditions.

α-shape based: The α-shape is defined as a subset of the Delaunay tessellation of the protein atoms, omitting edges longer than the sum of the radii of the two atoms.

Algorithms for finding binding sites

Grid-based: POCKET, LIGSITE, LIGSITEcs, LIGSITEcsc, ConCavity, PocketPicker and GHECOM

Sphere-based: SURFNET, PASS, Q-SiteFinder, PHECOM

α-shape based: CAST, Fpocket

α-shape

Figure labels: the α-shape is the shape surrounded by the black line; the edges of the Delaunay tessellation.

The α-shape keeps no edge that is longer than the sum of the radii of its two atoms.
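The edge rule above can be sketched in a few lines. This is only an illustration: the function name `alpha_edges` and the input layout are assumptions, and the Delaunay tessellation itself is taken as given (computed elsewhere).

```python
import math

def alpha_edges(atoms, edges):
    """Keep Delaunay edges no longer than the sum of the two atomic radii.

    atoms: list of (x, y, z, radius) tuples (hypothetical layout).
    edges: iterable of (i, j) index pairs, assumed to come from a
           Delaunay tessellation computed elsewhere.
    """
    kept = []
    for i, j in edges:
        xi, yi, zi, ri = atoms[i]
        xj, yj, zj, rj = atoms[j]
        length = math.dist((xi, yi, zi), (xj, yj, zj))
        if length <= ri + rj:  # the edge rule stated on the slide
            kept.append((i, j))
    return kept
```

Edges that survive the filter connect atoms whose spheres touch or overlap; the removed edges span the empty space where pockets may lie.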

α-shape based: CAST

CAST computes a triangulation of the protein’s surface atoms using α-shapes. Triangles are then grouped by letting small triangles flow toward neighboring larger triangles, which act as sinks.

Grid-based

The protein is projected onto a 3D grid. The method focuses on PSP (protein-solvent-protein) events at the grid points.

When a straight line drawn through a grid point is enclosed on both sides by protein atoms, that line is termed a PSP event for the grid point.

Grid points having more than a threshold number of PSP events are defined as pockets.
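A minimal sketch of PSP counting on an integer grid follows. The set of scan directions (three axes plus four cube diagonals), the cubic grid of side `size`, and all function names are assumptions made for illustration, not details taken from the slides.

```python
from itertools import product

# Assumed scan directions: the 3 axes plus the 4 cube diagonals.
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]

def hits_protein(occupied, start, step, size):
    """Walk from start along step; True if a protein point is met inside the grid."""
    x, y, z = start
    dx, dy, dz = step
    while True:
        x, y, z = x + dx, y + dy, z + dz
        if not (0 <= x < size and 0 <= y < size and 0 <= z < size):
            return False
        if (x, y, z) in occupied:
            return True

def psp_events(occupied, point, size):
    """Count scan lines through point that meet protein on both sides."""
    count = 0
    for d in DIRECTIONS:
        opposite = tuple(-c for c in d)
        if (hits_protein(occupied, point, d, size)
                and hits_protein(occupied, point, opposite, size)):
            count += 1
    return count

def find_pockets(occupied, size, threshold):
    """Empty grid points with more than `threshold` PSP events."""
    return {p for p in product(range(size), repeat=3)
            if p not in occupied and psp_events(occupied, p, size) > threshold}

# Toy protein: a 5x5x5 block with a narrow channel open at the top.
size = 7
occupied = set(product(range(1, 6), repeat=3))
occupied -= {(3, 3, z) for z in (3, 4, 5)}
pockets = find_pockets(occupied, size, threshold=4)
```

In this toy case the two buried channel points are reported, while the channel mouth, which is open along every direction with an upward component, is not.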

Sphere-based: SURFNET

A sphere (called a gap sphere) is placed between two protein atoms. If the sphere contains any other atoms, its radius is reduced until it no longer contains them and just touches a protein atom.

The set of these gap spheres defines the pockets.
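The gap-sphere construction can be sketched as below. The atom layout `(x, y, z, radius)`, the function name, and the `min_radius` cutoff for discarding tiny spheres are assumptions for illustration only.

```python
import math

def gap_sphere(a, b, atoms, min_radius=1.0):
    """Place a sphere midway between the surfaces of atoms a and b, then shrink it.

    a, b and every entry of atoms are (x, y, z, radius) tuples (assumed layout).
    Returns (center, radius), or None if the sphere becomes smaller than
    min_radius (an assumed cutoff).
    """
    d = math.dist(a[:3], b[:3])
    radius = (d - a[3] - b[3]) / 2.0
    if radius <= 0:
        return None  # the two atoms overlap or touch: no gap
    # Center lies on the a-b axis, one sphere radius beyond the surface of a.
    t = (a[3] + radius) / d
    center = tuple(pa + t * (pb - pa) for pa, pb in zip(a[:3], b[:3]))
    for atom in atoms:
        if atom in (a, b):
            continue
        # Largest radius at which the sphere only just touches this atom.
        clearance = math.dist(center, atom[:3]) - atom[3]
        radius = min(radius, clearance)
    if radius < min_radius:
        return None
    return center, radius
```

Running this over all atom pairs and collecting the surviving spheres gives the pocket description used by this family of methods.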

Grid-based: GHECOM

By Takeshi Kawabata

Kawabata T. (2010) Detection of multi-scale pockets on protein surfaces using mathematical morphology. Proteins, 78, 1195-1211.

Goal: to define pocket regions on the protein surface

Primary points:

1. A new definition of pockets using the basic operations of mathematical morphology

2. Proposed an algorithm for finding pockets

3. Constructed a useful dataset for algorithm testing

4. Introduced a new method for evaluating binding site predictions

5. Some useful discoveries about how ligands bind to binding sites

Some Background:

Multiscale pockets: deep and shallow pockets are calculated simultaneously. “Multiscale pockets” need “multiscale probes”: many probes of different sizes are used to define pockets.

“Size” and “depth” of pockets: two properties of pockets.

A definition of pockets using small and large spherical probes, from the author's previous work (PHECOM): a pocket region is a space into which a small spherical probe can enter but a large spherical probe cannot.

Pocket definition

Mathematical morphology is a theory, based on rigorous set theory, for analyzing the geometric features of digital images.

Morphology can provide boundaries of objects, their skeletons, and their convex hulls. It is also useful for many pre- and post-processing techniques, especially in edge thinning and pruning.

mathematical morphology (con.)

Four operations: dilation, erosion, opening, closing

a: Molecular shape
b: The shape of the probe
c: X ⊕ P: dilation of X by P
d: X ⊖ P: erosion of X by P
e: X ○ P: opening of X by P
f: X • P: closing of X by P

The shape X is the vdW volume of a protein

mathematical morphology (con.)

Mathematical morphology language:

The translation of the shape X by the vector p (p-translated X) is denoted by (X)p and is defined by:
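The equations on these slides did not survive as text. The block below gives the textbook set-theoretic definitions, which are assumed (not confirmed) to match the ones shown; the last identity holds for a symmetric probe such as a sphere.

```latex
\begin{align}
(X)_p &= \{\, x + p \mid x \in X \,\} \\
X \oplus P &= \bigcup_{p \in P} (X)_p \\
X \ominus P &= \bigcap_{p \in P} (X)_{-p} \\
X \circ P &= (X \ominus P) \oplus P \\
X \bullet P &= (X \oplus P) \ominus P = \left( X^{c} \circ P \right)^{c}
\end{align}
```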

mathematical morphology (con.)

where X^c is the complement of shape X: X^c = E^3 − X.

In other words, the closing of X by P is defined as the space that the probe P cannot enter when any overlap between X and P is prohibited.

The closing of X by P is called the “molecular volume” of molecule X defined by probe P.

Pocket definition (con.)

Eq. (12) was introduced by Masuya and Doi using mathematical morphological operations:
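The equation itself is not preserved in this transcript. As a rough illustration of the verbal definition (a space that a small probe can enter but a large probe cannot), the sketch below intersects the closing of X by a large probe with the region swept by a small probe that never overlaps X. It is written in 2D with digitized discs to stay short; the function names, the bounding `box`, and the use of discs instead of spheres are all assumptions.

```python
from itertools import product

def disc(radius):
    """Digitized 2D disc: integer offsets within `radius` of the origin."""
    r = int(radius)
    return {(dx, dy) for dx, dy in product(range(-r, r + 1), repeat=2)
            if dx * dx + dy * dy <= radius * radius}

def dilate(X, P):
    return {(x + dx, y + dy) for x, y in X for dx, dy in P}

def erode(X, P):
    # Valid as written because every disc contains the origin.
    return {(x, y) for x, y in X if all((x + dx, y + dy) in X for dx, dy in P)}

def pocket(X, small, large, box):
    """Points inside the closing by `large` that the `small` probe can still reach."""
    closing = erode(dilate(X, large), large)   # where the large probe cannot enter
    centres = box - dilate(X, small)           # small-probe centres that avoid X
    reachable = dilate(centres, small) & box   # space swept by the small probe
    return closing & reachable

# Toy shape: an 11 x 7 block with a groove 3 wide and 4 deep, open at the top.
X = set(product(range(0, 11), range(0, 7))) - set(product(range(4, 7), range(3, 7)))
box = set(product(range(-5, 16), range(-5, 14)))
groove_pocket = pocket(X, disc(1), disc(3), box)
```

For this toy shape the result is the lower part of the groove minus its two bottom corners, which the small probe cannot reach; the groove mouth is excluded because the large probe can dip into it.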

Pocket definition (con.)

Algorithm: multiscale closing, or multiscale molecular volume

Using K types of large probe spheres P1, P2, …, PK and one small probe S, the probes must satisfy:

The opening condition means that a large probe Pj can be reconstructed by a set of translated smaller probes Pi (that is, opening Pj by Pi returns Pj).

Algorithm (con.)

If the opening condition [Eq. 16] is satisfied for all the probes {Pi}, then the following relation will hold:

But …

Algorithm (con.)

Eq. (16) is not satisfied

Algorithm (con.)

Is the assumption wrong? No! The assumption of Eq. (16) is still safe: the digitized pseudo-spheres are used as approximations of real spheres in continuous space, and should therefore have the properties of real spheres.
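Whether a given pair of digitized probes satisfies the opening condition can be checked directly. The sketch below does this in 2D with digitized discs; the helper names are hypothetical, and the 2D simplification is an assumption made to keep the example small.

```python
from itertools import product

def disc(radius):
    """Digitized 2D disc: integer offsets within `radius` of the origin."""
    r = int(radius)
    return {(dx, dy) for dx, dy in product(range(-r, r + 1), repeat=2)
            if dx * dx + dy * dy <= radius * radius}

def dilate(X, P):
    return {(x + dx, y + dy) for x, y in X for dx, dy in P}

def erode(X, P):
    # Valid as written because every disc contains the origin.
    return {(x, y) for x, y in X if all((x + dx, y + dy) in X for dx, dy in P)}

def opening(X, P):
    return dilate(erode(X, P), P)

def opening_condition_holds(large, small):
    """True if `large` can be rebuilt from translated copies of `small`."""
    return opening(large, small) == large

holds_2_1 = opening_condition_holds(disc(2), disc(1))
```

A check like this, run over every pair of probe radii on the actual grid, would show which digitized probes break the condition.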

Algorithm (con.)

Only one index for the 3D grid I(x) is necessary to store K types of dilations, molecular volumes and pockets:

x is a 3D point; ID(x), IC(x) and IP(x) are integers determined by x.

Multiscale dilation

Multiscale closing, or multiscale molecular volume

Multiscale pocket

Algorithm (con.)

Rinaccess: the minimum inaccessible radius, i.e. the minimum radius of the spheres that cannot touch the point x.

It serves as a measure of shallowness for probes on the protein surface.

Rpocket: the minimum pocket radius, i.e. the minimum radius of the spheres for which the point x is within the pocket.

Algorithm (con.)

Eqs. (17-19) suggest an efficient algorithm for calculating multiscale dilations, molecular volumes and pockets.

To implement an efficient algorithm, a shell Hk is defined as the difference between the kth and (k-1)th probes, Hk = Pk − Pk-1, as follows:

Algorithm (con.)

A general strategy for an efficient algorithm is to process a shape X using a series of shells, progressing in size from the smallest to the largest shell (H1, H2, …, HK).

The algorithm is shown in Figure 4.

In this study, the grid width was set to 0.8 Å, the radius of the probe S was set to 1.87 Å, and 17 different large probes Pk were used, with radii of 2.0, 2.5, 3.0, 3.5, …, and 10 Å.
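The shell idea can be illustrated for the multiscale dilation index alone. Because the probes are nested, the dilation by the kth probe is the union of the dilations by shells 1 to k, so each grid point only needs the first shell index that reaches it. This is a sketch under assumptions, not the algorithm of Figure 4: radii are given in grid units, and the function names are hypothetical.

```python
from itertools import product

def ball(radius):
    """Digitized sphere: integer offsets within `radius` (in grid units)."""
    r = int(radius)
    return {o for o in product(range(-r, r + 1), repeat=3)
            if sum(c * c for c in o) <= radius * radius}

def shells(radii):
    """H_1 = P_1 and H_k = P_k - P_(k-1), for radii given in increasing order."""
    result, previous = [], set()
    for radius in radii:
        probe = ball(radius)
        result.append(probe - previous)
        previous = probe
    return result

def multiscale_dilation_index(X, radii):
    """Map each reached grid point to the smallest k with the point in X dilated by P_k."""
    index = {}
    for k, shell in enumerate(shells(radii), start=1):
        for x, y, z in X:
            for dx, dy, dz in shell:
                # setdefault keeps the first (smallest) k that reaches the point.
                index.setdefault((x + dx, y + dy, z + dz), k)
    return index

index = multiscale_dilation_index({(0, 0, 0)}, [1, 2, 3])
```

Each offset is visited once per shape point over the whole run, instead of once per probe, which is where the saving over K independent dilations comes from.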

Algorithm (con.)

Calculation of Rinaccess for ligand atoms

A measure of pocket shallowness for probes or atoms of binding ligands is useful for characterizing binding pockets.

|L| is the number of points in the shape L of the ligand.

A: 1/((1/3 + 1/4 + 1/4)/3) = 3.6 Å
B: 1/((1/6 + 1/5 + 1/5)/3) = 5.3 Å
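The two worked examples are consistent with a harmonic mean of the per-point Rinaccess values over the |L| ligand points. A minimal sketch under that reading (the function name is an assumption):

```python
def r_inaccess_ligand(point_values):
    """Harmonic mean of per-point Rinaccess values (in Å) over the ligand points."""
    return len(point_values) / sum(1.0 / v for v in point_values)

example_a = r_inaccess_ligand([3.0, 4.0, 4.0])
example_b = r_inaccess_ligand([6.0, 5.0, 5.0])
```

Rounded to one decimal, the two calls reproduce the values for A and B above. A harmonic mean weights the small (deep) values more heavily than an arithmetic mean would.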

Algorithm (con.)

Calculation of Rinaccess and pocketness for protein atoms and residues

A measure for characterizing the depth of a protein atom or residue is useful for analyzing the relationship between ligand types and surrounding protein atom types.

For characterizing the depth of protein atoms, the author introduced the concept of the “accessible shell volume” around a part Y of the protein:

where Y is a part of the protein shape X (Y ⊂ X), and S is a spherical probe.

Algorithm (con.)

The pocketness of a protein atom or residue indicates how much it contributes to binding ligands.

Generally speaking, deep and large pockets tend to bind ligands. Pocketness is a measure that reflects both the size and the depth of a pocket:

A residue in a deeper and larger pocket has a larger value of pocketness.

Algorithm (con.)

Clustering grids and filtering out small clusters

Most ligands are bound in the largest pockets.

The procedure of clustering pockets and extracting only large pocket clusters has been widely used by researchers.

In this study, the use of multiscale pocket boundaries requires a threshold value of the Rpocket measure for the boundary between the pocket and the open outer space (shown in the “Results” section).
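Clustering pocket grid points and dropping small clusters can be sketched as a connected-component search. The 6-neighbor (face-adjacent) connectivity, the `min_size` cutoff, and the function names are assumptions for illustration.

```python
from collections import deque

def cluster_points(points):
    """Group grid points into face-connected clusters, largest first."""
    remaining = set(points)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for n in ((x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                      (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)):
                if n in remaining:
                    remaining.remove(n)
                    cluster.add(n)
                    queue.append(n)
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)

def keep_large(clusters, min_size):
    """Filter out clusters with fewer than min_size grid points."""
    return [c for c in clusters if len(c) >= min_size]

clusters = cluster_points({(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5)})
large = keep_large(clusters, min_size=2)
```

The points fed to `cluster_points` would be the grid points whose Rpocket value falls below the chosen threshold.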

Dataset

Prepared from the SCOP database, v1.73.

Included protein chains with mutual sequence identities of 40% or less.

Excluded:
  Small proteins with fewer than 40 residues
  Protein chains with domains of class f, h, i, j, k
Total: 7375 chains

Extracted the chains bound to “proper” small molecules, excluding:
  Tiny molecules
  Unnatural precipitants: BOG, DTT, EPE, GOL, MES, MPD, MRD, PG4 and TRS
  DNA, RNA (>= 3 ntd) and proteins (>= 10 aa)
  Chains with more than 10,000 heavy atoms

As a result, 1817 chains were included, each of which contacted at least one proper small molecule. Only bound chains were used.

Evaluation of binding site predictions using recall-precision plots

For the purpose of comparison, calculated pockets and binding ligands were represented on grids with 0.8 Å width; each grid point was checked to determine whether it was inside the pockets or the binding ligands.

NP is the number of grid points in pockets, NL is the number of grid points overlapping with ligands, and NPL is the number of grid points in pockets that overlap with ligands.
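With these counts, the usual definitions are recall = NPL / NL and precision = NPL / NP; they are assumed here, since the slide's equations are not preserved. A small sketch on sets of grid points (names hypothetical):

```python
def recall_precision(pocket_points, ligand_points):
    """Return (recall, precision) from two sets of grid points."""
    n_p = len(pocket_points)                    # NP
    n_l = len(ligand_points)                    # NL
    n_pl = len(pocket_points & ligand_points)   # NPL
    recall = n_pl / n_l if n_l else 0.0
    precision = n_pl / n_p if n_p else 0.0
    return recall, precision

pockets = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)}
ligand = {(2, 0, 0), (3, 0, 0), (4, 0, 0)}
recall, precision = recall_precision(pockets, ligand)
```

Sweeping the Rpocket threshold and recording one (recall, precision) pair per threshold would trace out a recall-precision plot of the kind named in the slide title.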

Results

1dwd

Useful discoveries

The majority of molecules binding in deep pockets were coenzymes

In contrast, adenine and guanine mononucleotides tend to bind in medium-to-shallow pockets

Macromolecules tend to bind in shallow pockets or protruded regions

Useful discoveries

In the typical binding pose of the HEM molecule in the dataset, the aromatic atoms CBB and CMC face the protein, whereas the carboxyl atoms O1A and O2A face water.

In the ADP molecule, the atom N6 in the adenine ring and the atoms O1B, O2B and O3B of the phosphate group favored deep pockets, whereas the sugar atoms, such as O2’ and O3’, favored shallow pockets. The N6 side of the adenine ring and the phosphate terminus face the protein, while the sugar atoms face water.

Summary:

1. A new definition of pockets by using the basic operations of mathematical morphology

2. Proposed an efficient algorithm for finding pockets

3. Constructed a useful dataset for algorithm testing

4. Introduced a new method for evaluating binding site predictions with precision and recall

5. Some useful discoveries

Thanks!

Any questions? Please feel free to ask me!
