Towards Low Resolution Refinement

Garib N Murshudov

York Structural Laboratory, Chemistry Department, University of York

Contents

1) Some of the projects

2) TWIN refinement in REFMAC and its extension

3) Problems of low resolution refinement

4) Some tools for low resolution: NCS, external restraints, B value restraints and “jelly” body

5) Map sharpening: General approach and some applications

6) Some future plans

7) Conclusions

BALBES: automatic molecular replacement pipeline

People involved: Alexei Vagin, Fei Long

Uses a redesigned PDB, organised into domains and multimeric assemblies, to solve molecular replacement problems automatically.

Jligand

People involved: Andrey Lebedev and Paul Young (and CCP4)

A GUI for designing links and covalent link descriptions.

Conformation-invariant alignment and restraint generation

(Figure: 2cex, sialic acid binding protein, aligned with 1mpd, maltose binding protein; average fragment score = 1.2 Å; colour scale from green = 0 to red = 2.)

People involved: Rob Nicholls

Resolution of space group uncertainty after refinement

Andrey Lebedev

Available refinement programs

• SHELXL

• CNS

• REFMAC5

• TNT

• BUSTER/TNT

• Phenix.refine

• RESTRAINT

• MOPRO

• MAIN

What can REFMAC do?

• Simple maximum likelihood restrained refinement

• Twin refinement

• Phased refinement (with Hendrickson-Lattmann coefficients)

• SAD/SIRAS refinement

• Structure idealisation

• Library for more than 8000 ligands (from the next version)

• Covalent links between ligands, and between ligands and protein

• Rigid body refinement

• Local NCS restraints and restraints to external structures

• TLS refinement

• Map sharpening

• etc

Twinning

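Merohedral and pseudo-merohedral twinning

• Crystal symmetry P3, no additional cell constraint: lattice symmetry* P622; merohedral twinning is possible.

• Crystal symmetry P2 with β = 90º: lattice symmetry* P222; pseudo-merohedral twinning is possible.

• Crystal symmetry P2 with no special cell constraint: lattice symmetry* P2; no (pseudo-)merohedral twinning is possible.

* Lattice symmetry counts rotations only.

(Figure: Domain 1 and Domain 2 related by the twinning operator.)

The crystal lattice is invariant with respect to the twinning operator.

The crystal is NOT invariant with respect to the twinning operator.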

Twin refinement in REFMAC

Twin refinement in REFMAC (version 5.5 or later) is automatic:

– Identify “twin” operators

– Calculate “Rmerge” (Σ|Ih − <I>twin| / ΣIh) for each operator. If Rmerge < 0.50, keep it. Twin plus crystal symmetry operators should form a group

– Refine twin fractions. Keep only “significant” domains (default threshold is 5%). Twin plus symmetry operators should form a group

Intensities can be used.

If phases are available, they can be used.

Maximum likelihood refinement is used.
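The requirement that twin and crystal symmetry operators form a group can be checked mechanically. Below is a minimal sketch, not REFMAC's code, assuming operators are written as 3×3 integer matrices acting on (h, k, l); the P3 matrices and their names are illustrative choices.

```python
from itertools import product

# 3x3 integer matrices acting on (h, k, l), stored as tuples of rows (illustrative names)
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
THREE_FOLD = ((0, 1, 0), (-1, -1, 0), (0, 0, 1))  # (h,k,l) -> (k, -h-k, l): P3 symmetry
TWIN_KH_L = ((0, 1, 0), (1, 0, 0), (0, 0, -1))    # (h,k,l) -> (k, h, -l): twin operator
TWIN_HK_L = ((-1, 0, 0), (0, -1, 0), (0, 0, 1))   # (h,k,l) -> (-h, -k, l): twin operator


def matmul(a, b):
    """Product of two 3x3 integer matrices."""
    return tuple(
        tuple(sum(a[i][m] * b[m][j] for m in range(3)) for j in range(3))
        for i in range(3)
    )


def is_group(ops):
    """A finite set of invertible matrices is a group iff it contains I and is closed."""
    ops = set(ops)
    return IDENTITY in ops and all(matmul(a, b) in ops for a, b in product(ops, repeat=2))


def generate_group(generators):
    """Close a set of operators (symmetry plus twin) under multiplication."""
    group = {IDENTITY, *generators}
    while True:
        products = {matmul(a, b) for a, b in product(group, repeat=2)}
        if products <= group:
            return group
        group |= products


if __name__ == "__main__":
    print(len(generate_group([THREE_FOLD])))                        # 3: point group 3
    print(len(generate_group([THREE_FOLD, TWIN_KH_L])))             # 6
    print(len(generate_group([THREE_FOLD, TWIN_KH_L, TWIN_HK_L])))  # 12: lattice group 622
    print(is_group({IDENTITY, THREE_FOLD, TWIN_KH_L}))              # False: not closed
```

Generating the closure, rather than only testing it, shows which extra operators a chosen twin operator implies.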

Likelihood

The dimension of integration is, in general, twice the number of twin-related domains (each domain contributes a complex structure factor, i.e. two real variables). Since the phases do not contribute to the first part of the integrand, the second part becomes a Rice distribution.
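For a single untwinned acentric reflection, the Rice distribution can be written down directly. This is a minimal sketch assuming the common parameterisation with D·|Fc| as the centre and a variance parameter Σ (`sigma_sq` here); the function names are hypothetical, and this is not REFMAC's implementation.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import i0e


def rice_acentric(f_obs, d_fcalc, sigma_sq):
    """Acentric Rice density of |Fo| given D*|Fc| and variance parameter sigma_sq (assumed form).

    p(F) = (2F / S) * exp(-(F^2 + (D|Fc|)^2) / S) * I0(2 F D|Fc| / S), with S = sigma_sq.
    i0e(x) = exp(-x) * I0(x) keeps large Bessel arguments from overflowing.
    """
    x = 2.0 * f_obs * d_fcalc / sigma_sq
    return (2.0 * f_obs / sigma_sq) * np.exp(-(f_obs - d_fcalc) ** 2 / sigma_sq) * i0e(x)


def minus_log_likelihood(f_obs, d_fcalc, sigma_sq):
    """Sum of -log p over reflections (arrays of equal length, f_obs > 0)."""
    return float(-np.sum(np.log(rice_acentric(f_obs, d_fcalc, sigma_sq))))


if __name__ == "__main__":
    total, _ = quad(rice_acentric, 0.0, np.inf, args=(3.0, 2.0))
    print(total)  # the density integrates to 1
```

With `d_fcalc = 0` the density reduces to the acentric Wilson distribution, which is a convenient sanity check.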

The integration is carried out using Laplace approximation.
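Laplace's method replaces the integral of exp(f) by a Gaussian integral around the maximum of f. The sketch below is one-dimensional only and shows the approximation itself. REFMAC integrates over several twin-domain structure factors, which this toy does not attempt. It is checked on ∫xⁿe⁻ˣdx = n!, where the method reproduces Stirling's formula.

```python
import math


def laplace_integral(log_f, x0, h=1e-3, max_steps=100, tol=1e-10):
    """Approximate the integral of exp(log_f(x)) by Laplace's method.

    Newton iterations with finite-difference derivatives locate the maximum x*;
    the integral is then exp(log_f(x*)) * sqrt(2*pi / -log_f''(x*)).
    """
    def derivs(x):
        f0, fp, fm = log_f(x), log_f(x + h), log_f(x - h)
        return (fp - fm) / (2 * h), (fp - 2 * f0 + fm) / h**2

    x = x0
    for _ in range(max_steps):
        d1, d2 = derivs(x)
        step = d1 / d2
        x -= step
        if abs(step) < tol:
            break
    _, d2 = derivs(x)
    if d2 >= 0:
        raise ValueError("no maximum found: second derivative is not negative")
    return math.exp(log_f(x)) * math.sqrt(2 * math.pi / -d2)


if __name__ == "__main__":
    n = 20
    approx = laplace_integral(lambda x: n * math.log(x) - x, x0=10.0)
    print(approx / math.factorial(n))  # slightly below 1 (Stirling's approximation)
```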

These equations are general enough to account for non-merohedral twinning (including allawtwin) and unmerged data. A little modification should allow simultaneous handling of twinning and SAD/MAD phasing, and of radiation damage.

Electron density: likelihood based

Map coefficients

It seems to work reasonably well. For an unbiased map it would be necessary to integrate over errors in all parameters (observations as well as refined parameters).

Electron density: 1jrg

Warning: twin refinement usually reduces R factors, but the electron density does not improve much.

(Figure: “non twin” map and “twin” map.)

Effect of twin on electron density: Data provided by Ivan Campeotto

Space group: P21

Cell parameters: 54.63 142.77 84.37 90.00 108.76 90.00

Resolution: 1.8 Å

Twin operators: -H, -K, H+L (or H, -K, -H-L)

Twin fractions: 0.46, 0.54

“Rmerge”: 0.065

Rmerge for calculated intensities: 0.36

R/Rfree without twin: 0.30/0.34

R/Rfree with twin: 0.21/0.26

R/Rfree are final statistics after refinement and rebuilding. Note: reindexing may be needed; in such cases REFMAC warns that you may want to reconsider the indexing.
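Why this P21 cell can twin can be seen from a simple metric test. An index operator is compatible with the lattice if it leaves |s|² = hᵀG*h unchanged. This is a minimal sketch under that assumption, using the cell and operators from the slide. It is not how REFMAC or Zanuda detect twin operators, and the function names are hypothetical.

```python
import numpy as np


def reciprocal_metric(a, b, c, alpha, beta, gamma):
    """Reciprocal metric tensor G* (inverse of the direct metric tensor); angles in degrees."""
    al, be, ga = np.radians([alpha, beta, gamma])
    g = np.array([
        [a * a, a * b * np.cos(ga), a * c * np.cos(be)],
        [a * b * np.cos(ga), b * b, b * c * np.cos(al)],
        [a * c * np.cos(be), b * c * np.cos(al), c * c],
    ])
    return np.linalg.inv(g)


def metric_deviation(cell, op):
    """Relative change of G* under an index operator h' = op @ h.

    A lattice-compatible operator satisfies op^T G* op = G*: zero means exact
    (merohedral) compatibility, a small value means pseudo-merohedry.
    """
    gs = reciprocal_metric(*cell)
    op = np.asarray(op, dtype=float)
    return float(np.linalg.norm(op.T @ gs @ op - gs) / np.linalg.norm(gs))


if __name__ == "__main__":
    cell = (54.63, 142.77, 84.37, 90.0, 108.76, 90.0)  # P21 cell from the slide (Å, degrees)
    twin_op = [[-1, 0, 0], [0, -1, 0], [1, 0, 1]]      # (h,k,l) -> (-h, -k, h+l)
    two_fold = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]     # P21 symmetry: (h,k,l) -> (-h, k, -l)
    print(metric_deviation(cell, two_fold))  # ~0: exact lattice symmetry
    print(metric_deviation(cell, twin_op))   # small: pseudo-merohedral
```

For this cell the twin operator changes G* by well under 1%, because a is close to −2c·cosβ.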

Effect of twin on electron density: Data provided by Ivan Campeotto

(Figure: twin off, difficult rebuilding; twin on, final model.)

Twinning: Warning

R factors

Random R factor: in the presence of perfect twinning, with twinning modelled, it is around 40%; if twinning is not modelled, it is around 50%. Be careful after molecular replacement.

Small twin fractions: be careful with small twin fractions. REFMAC removes twin domains with fractions below 5%.

High symmetry:

If twin fractions refine towards perfect twinning, the true space group may have higher symmetry. The program Zanuda, from the YSBL website, may sort out some space group uncertainty problems:

www.ysbl.york.ac.uk/YSBLPrograms/index.jsp

Twin: a few warnings about R factors (acentric case only)

For a random structure:

Crystallographic R factors

• No twinning: 58%

• Perfect twinning, twin modelled: 40%

• Perfect twinning, twin not modelled: 50%

Rmerge without experimental error

• No twinning: 50%

• Perfect twinning, along an axis other than the twin axis: 37.5%

(Figure: twin and non-twin cases.)
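Several of these numbers can be reproduced by simulation. This is a minimal Monte Carlo sketch assuming Wilson statistics: acentric intensities are exponential with mean 1, and the four intensities involved are mutually independent. It covers the untwinned R factor and the Rmerge values. The twinned R factors are not simulated here.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400_000  # number of simulated reflection pairs


def wilson_acentric(n):
    """Acentric intensities of a random structure: exponentially distributed, mean 1."""
    return rng.exponential(size=n)


def twin(i1, i2, alpha):
    """Observed intensities of a twin-related pair with twin fraction alpha."""
    return (1 - alpha) * i1 + alpha * i2, alpha * i1 + (1 - alpha) * i2


def rmerge(j1, j2):
    """Sum |I - <I>| / Sum I over pairs of intensities related by an operator."""
    mean = 0.5 * (j1 + j2)
    return float(np.sum(np.abs(j1 - mean) + np.abs(j2 - mean)) / np.sum(j1 + j2))


def r_factor(f_obs, f_calc):
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))


if __name__ == "__main__":
    i1, i2, i3, i4 = (wilson_acentric(N) for _ in range(4))
    for alpha in (0.0, 0.2, 0.5):
        print(alpha, rmerge(*twin(i1, i2, alpha)))  # close to 0.5 * |1 - 2*alpha|
    # Perfect twin, operator unrelated to the twin operator (assumed independent pairs)
    print(rmerge(twin(i1, i2, 0.5)[0], twin(i3, i4, 0.5)[0]))  # about 0.375
    # Random model against untwinned data
    print(r_factor(np.sqrt(i1), np.sqrt(i3)))  # about 2 - sqrt(2) = 0.586
```

The same function also shows that a twin fraction α gives an Rmerge of about 0.5·|1 − 2α|. This is why a small Rmerge for an operator signals twinning.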

Effect of twinning on electron density

Using twinning in refinement programs is straightforward. It improves statistics substantially (sometimes R factors drop by 10%). However, the improvement in electron density is not very dramatic (just as with TLS). It may improve electron density in weak parts, but in general do not expect miracles, especially when the twin and NCS operators are close; then improvements are marginal.

Further applications of “twin” likelihood

1) Reticular (or non-merohedral) twinning

2) Split crystals

3) Overlapping spots that have been integrated as single observation

Problems of low resolution refinement

1) Function to describe the fit of the model to the experiment: likelihood or similar

a) Data may come from very peculiar “crystals”: twin, OD, multiple cell

b) Radiation damage

c) Converting intensities to |F| may not be valid

2) Limited and noisy data: use of available knowledge

a) Known structures

b) Internal patterns: NCS, secondary structure

3) Smeared electron density with vanishing side chains, secondary structures and domains: high B values and series termination

a) Filtering methods: solve the inverse problem with a regulariser

b) Missing data problem: data augmentation, bootstrap

Use of available knowledge

1) Local NCS

2) Restraints to known structure(s)

3) Restraints to current interatomic distances (implicit normal modes or “jelly” body)

4) Better restraints on B values

These are available from version 5.6.

Note: BUSTER/TNT has local NCS and restraints to known structures. CNS has restraints to known structures (called the deformable elastic network). Phenix has B value restraints on non-bonded atom pairs and automatic global NCS. Local NCS (only for torsion angle related atom pairs) has been available in SHELXL since the beginning of time.

Auto NCS: local and global

1. Align all chains with all chains using the Needleman-Wunsch method.

2. If the alignment score is higher than a predefined value (e.g. 80%), consider the chains similar.

3. Find the local RMS; if the average local RMS is less than a predefined value, consider the chains aligned.

4. Find the correspondence between atoms.

5. If global restraints are used (i.e. restraints based on the RMS between atoms of aligned chains), identify domains.

6. For local NCS, make the list of corresponding interatomic distances (removing bond- and angle-related atom pairs).

7. Design weights.

The list of interatomic distance pairs is recalculated at every cycle.
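Steps 1 and 2 can be sketched with a textbook Needleman-Wunsch global alignment followed by a sequence-identity threshold. Both the scoring scheme (match +1, mismatch −1, linear gap −1) and the use of identity relative to the shorter chain as the "score" are assumptions. REFMAC's actual scoring is not given here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b; returns (score, aligned index pairs)."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(i, j), s[i - 1][j] + gap, s[i][j - 1] + gap)

    # Trace back from the bottom-right corner, collecting aligned (i, j) pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if s[i][j] == s[i - 1][j - 1] + sub(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return s[n][m], pairs[::-1]


def sequence_identity(a, b):
    """Fraction of identical aligned residues, relative to the shorter chain (assumed score)."""
    _, pairs = needleman_wunsch(a, b)
    return sum(a[i] == b[j] for i, j in pairs) / min(len(a), len(b))


if __name__ == "__main__":
    chain_a = "MKTAYIAKQRQISFVKSHFSRQ"
    chain_b = "MKTAYIAKQRQISFVKSHFSRQLEE"  # hypothetical copy with three extra residues
    identity = sequence_identity(chain_a, chain_b)
    print(identity, identity > 0.8)  # 1.0 True: treated as NCS-related copies
```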

Auto NCS

Global RMS is calculated using all aligned atoms.

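Local RMS is calculated over sliding windows of k residues (default k = 5), and the results are then averaged:

(Figure: aligned regions of chain A and chain B, with a sliding window of k = 5 residues.)

$$\mathrm{Ave}(\mathrm{RmsLoc})_k = \frac{1}{N-k+1}\sum_{i=1}^{N-k+1}\mathrm{RmsLoc}_i$$

$$\mathrm{RMS} = \mathrm{Ave}(\mathrm{RmsLoc})_N$$

where N is the number of aligned residues, so a single window spanning all N residues gives the global RMS.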
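The sketch below computes both quantities, assuming each window is superposed independently (Kabsch) before its RMS is taken. The slide does not say how REFMAC superposes windows. The two-domain test case is hypothetical: one domain is rotated about a hinge, so the global RMS is large while the local RMS stays small.

```python
import numpy as np


def superposed_rms(p, q):
    """RMS deviation of (n, 3) coordinates p and q after optimal superposition (Kabsch)."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(u @ vt))      # avoid an improper rotation (reflection)
    rot = u @ np.diag([1.0, 1.0, d]) @ vt   # rotation applied to the row vectors of p
    return float(np.sqrt(np.mean(np.sum((p @ rot - q) ** 2, axis=1))))


def average_local_rms(p, q, k=5):
    """Ave(RmsLoc)_k: mean superposed RMS over all k-residue sliding windows."""
    n = len(p)
    return float(np.mean([superposed_rms(p[i:i + k], q[i:i + k]) for i in range(n - k + 1)]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical 40-residue CA trace made of two 20-residue domains along x
    chain_a = np.column_stack([3.8 * np.arange(40), rng.normal(size=40), rng.normal(size=40)])
    t = np.radians(60.0)
    rz = np.array([[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0], [0.0, 0.0, 1.0]])
    chain_b = chain_a.copy()
    hinge = chain_a[19]
    chain_b[20:] = (chain_a[20:] - hinge) @ rz.T + hinge  # domain 2 rotated about the hinge
    print(superposed_rms(chain_a, chain_b))     # global RMS: large
    print(average_local_rms(chain_a, chain_b))  # average local RMS: small
```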

Auto NCS: Neighbours

After alignment, neighbours are analysed:

1) Each water and ligand is assigned to the chain it is closest to.

2) Neighbours are included in restraints when possible.

(Figure: chains A and B, each with its assigned waters or ligands, in neighbour shells 1 and 2.)

Auto NCS: Iterative alignment

********* Alignment results *********

N : Chain 1 : Chain 2 : No of aligned : Score : RMS : Ave(RmsLoc)

1 : J( 131 - 256 ) : J( 3 - 128 ) : 126 : 1.0000 : 5.2409 : 1.6608

2 : J( 1 - 257 ) : L( 1 - 257 ) : 257 : 1.0000 : 4.8200 : 1.6694

3 : J( 131 - 256 ) : L( 3 - 128 ) : 126 : 1.0000 : 5.2092 : 1.6820

4 : J( 3 - 128 ) : L( 131 - 256 ) : 126 : 1.0000 : 3.0316 : 1.5414

5 : L( 131 - 256 ) : L( 3 - 128 ) : 126 : 1.0000 : 0.4515 : 0.0464

Example of alignment: 2vtu. There are two chains similar to each other, and within each chain residues 3-128 and 131-256 also align, which suggests gene duplication.

RMS: all aligned atoms. Ave(RmsLoc): local RMS.

Auto NCS: Conformational changes

In many cases two or more copies of the same molecule can be expected to have (slightly) different conformations. For example, if there is a domain movement, the internal structures of the domains will be the same, but the distances between domains will differ between the copies of the molecule.

(Figure: two copies of a molecule, each made of domain 1 and domain 2.)

Robust estimators

One class of estimators that are robust to outliers is the M-estimators (maximum-likelihood-like estimators). One of the popular functions is Geman-McClure.

Essentially, when distances are similar they should be kept similar, and when they are too different they should be allowed to differ.

This function is used for local NCS restraints as well as for restraints to external structures.

(Figure: red line, $x^2$; black line, $x^2/(1 + w x^2)$, where $x = (d_1 - d_2)/\sigma$ and $w = 0.1$.)
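The two curves can be compared numerically. This minimal sketch implements the function exactly as written on the slide, together with its derivative. The derivative is what makes the function robust: the restoring force vanishes for large differences instead of growing linearly. The function names are illustrative only.

```python
import numpy as np


def geman_mcclure(d1, d2, sigma, w=0.1):
    """Geman-McClure restraint x^2 / (1 + w x^2), with x = (d1 - d2) / sigma."""
    x = (np.asarray(d1) - np.asarray(d2)) / sigma
    return x**2 / (1.0 + w * x**2)


def geman_mcclure_grad(d1, d2, sigma, w=0.1):
    """Derivative with respect to d1: 2x / (1 + w x^2)^2 / sigma."""
    x = (np.asarray(d1) - np.asarray(d2)) / sigma
    return 2.0 * x / (1.0 + w * x**2) ** 2 / sigma


if __name__ == "__main__":
    diffs = np.array([0.0, 0.5, 1.0, 3.0, 10.0, 100.0])
    print(diffs**2)                                   # quadratic restraint keeps growing
    print(geman_mcclure(diffs, 0.0, sigma=1.0))       # bounded above by 1 / w = 10
    print(geman_mcclure_grad(diffs, 0.0, sigma=1.0))  # force decays for large differences
```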

Restraints to external structures

This work is done by Rob Nicholls.

ProSmart

Compares two protein chains:

• Conformation-invariant structural comparison

• Residue-residue alignment

• Superimposition

• Residue-based and global similarity scores

Produces local atomic distance restraints:

• Based on one or more aligned chains

• Possibility of multi-crystal refinement

ProSmart Restrain

(Figure: the structure to be refined and a known similar structure used as the prior.)

ProSmart Restrain

(Figure: the structure to be refined and a known similar structure used as the prior.)

Bond- and angle-related pairs are removed.

To allow conformational changes, Geman-McClure type robust estimator functions are used

Restraints to current distances

The following term is added to the target function:

$$\sum_{\text{pairs}} w\,(|d| - |d_{\text{current}}|)^2$$

The summation is over all pairs in the same chain within a given distance (default 4.2 Å). $d_{\text{current}}$ is recalculated at every cycle, so at the start of each cycle $d = d_{\text{current}}$ and this term does not contribute to the gradient; it contributes only to the second-derivative matrix.

It is equivalent to adding springs between atom pairs: during refinement, interatomic distances do not change very much. If all pairs were used and the weights were very large, it would be equivalent to rigid-body refinement.

It could be called “implicit normal modes”, “soft” body or “jelly” body refinement.
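The sketch below computes what this term adds to the second-derivative (normal) matrix. It assumes Cartesian coordinates, a single chain and one weight w for all pairs within the cutoff; the function name is hypothetical. At d = d_current the Hessian of w(|d| − |d_current|)² is 2w·uuᵀ, where u is the unit vector along d. Rigid-body translations and rotations therefore lie in its null space. This is consistent with the slide's remark that very large weights lead towards rigid-body refinement.

```python
import numpy as np


def jelly_body_hessian(coords, weight=1.0, cutoff=4.2):
    """Second-derivative contribution of sum_pairs w (|d| - |d_current|)^2 at d = d_current.

    For one pair with unit vector u along d, the blocks are +2 w u u^T on the
    diagonal atoms and -2 w u u^T off the diagonal; the gradient is zero.
    """
    n = len(coords)
    h = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist = np.linalg.norm(d)
            if dist == 0.0 or dist > cutoff:
                continue
            u = d / dist
            block = 2.0 * weight * np.outer(u, u)
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            h[si, si] += block
            h[sj, sj] += block
            h[si, sj] -= block
            h[sj, si] -= block
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    xyz = rng.normal(scale=1.5, size=(8, 3))  # hypothetical atom positions (Å)
    h = jelly_body_hessian(xyz, weight=10.0)
    translation = np.tile([1.0, 0.0, 0.0], len(xyz))
    rotation = np.cross([0.3, -0.2, 0.5], xyz).ravel()  # infinitesimal rigid rotation
    print(np.abs(h @ translation).max(), np.abs(h @ rotation).max())  # both ~0
```

Added to the data Hessian, this matrix damps shifts that change interatomic distances and leaves rigid-body shifts free.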

B value restraints and TLS

Designing restraints on B values is much more difficult. Options currently available for dealing with B values at low resolution:

1) Group B, as implemented in CNS

2) TLS group refinement, as implemented in REFMAC and phenix.refine

Both have their applications. TLS seems to work for a wide range of cases, but unfortunately it is very often misused. One of the problems is the discontinuity of B values: neighbouring atoms may end up with wildly different B values.

In an ideal world, anisotropic U with good restraints should be used, but that world is still far away. Only in some cases does full anisotropic refinement at 3 Å give better R/Rfree than TLS refinement; these are cases with extremely anisotropic data.

(Figure: two TLS groups, TLS1 and TLS2, connected by a loop.)

Parameters: B value restraints and TLS

Restraints on B values

1) The projections of the anisotropic U of bonded atoms onto the bond should be similar (rigid bond)

2) The Kullback-Leibler divergence (relative entropy) should be small. For isotropic atoms (both bonded and non-bonded):

$$\frac{B_1}{B_2} + \frac{B_2}{B_1} - 2$$

3) Local TLS: neighbouring atoms should be related as in TLS groups (not available yet)

Kullback-Leibler divergence

If p(x) and q(x) are two probability densities, the symmetrised Kullback-Leibler divergence between them (a measure of distance between distributions) is defined as

$$D(p, q) = \frac{1}{2}\left(\int_{-\infty}^{\infty} p(x)\log\frac{p(x)}{q(x)}\,dx + \int_{-\infty}^{\infty} q(x)\log\frac{q(x)}{p(x)}\,dx\right)$$

If both distributions are Gaussian with the same mean and covariance matrices $U_1$ and $U_2$, this distance becomes, up to a constant factor,

$$\mathrm{tr}\left(U_1U_2^{-1} + U_2U_1^{-1} - 2I\right)$$

and for the isotropic case it becomes

$$3\left(\frac{B_1}{B_2} + \frac{B_2}{B_1} - 2\right) = \frac{3(B_1 - B_2)^2}{B_1B_2}$$

Restraints for bonded pairs have larger weights than those for non-bonded pairs. For non-bonded atoms, the weights depend on the distance between the atoms.

This type of restraint is also used for rigid-bond restraints in anisotropic refinement.
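The trace expression and its isotropic form can be checked numerically. Below is a minimal sketch with hypothetical function names. It assumes U = B/(8π²)·I for isotropic atoms. With the ½ in the definition of the symmetrised divergence, the constant factor relative to the trace form works out to ¼ for zero-mean Gaussians.

```python
import numpy as np


def adp_kl_restraint(u1, u2):
    """tr(U1 U2^-1 + U2 U1^-1 - 2I) for two 3x3 anisotropic displacement matrices."""
    return float(np.trace(u1 @ np.linalg.inv(u2) + u2 @ np.linalg.inv(u1) - 2.0 * np.eye(3)))


def isotropic_kl_restraint(b1, b2):
    """Isotropic case: 3 (B1/B2 + B2/B1 - 2) = 3 (B1 - B2)^2 / (B1 B2)."""
    return 3.0 * (b1 - b2) ** 2 / (b1 * b2)


def symmetrised_kl(u1, u2):
    """0.5 * [KL(p||q) + KL(q||p)] for zero-mean Gaussians with covariances u1, u2."""
    d = u1.shape[0]
    logdet_ratio = np.log(np.linalg.det(u2) / np.linalg.det(u1))
    kl_pq = 0.5 * (np.trace(np.linalg.solve(u2, u1)) - d + logdet_ratio)
    kl_qp = 0.5 * (np.trace(np.linalg.solve(u1, u2)) - d - logdet_ratio)
    return float(0.5 * (kl_pq + kl_qp))


if __name__ == "__main__":
    b1, b2 = 20.0, 35.0
    u1, u2 = (b / (8 * np.pi**2) * np.eye(3) for b in (b1, b2))  # U = B / (8 pi^2)
    print(adp_kl_restraint(u1, u2), isotropic_kl_restraint(b1, b2))  # equal
    print(symmetrised_kl(u1, u2) / adp_kl_restraint(u1, u2))         # 0.25
```

The log-determinant terms cancel in the symmetrised sum, which is why only the traces survive.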

Example: after molecular replacement, 3 Å resolution, data completeness 71%

(Figure: R factors vs cycle. Black: simple refinement; red: global NCS; blue: local NCS; green: “jelly” body. Solid lines: R factor; dashed lines: Rfree.)

Example: 4 Å resolution, data from PDB 2r6c

(Figure: R factors vs cycle. Black: simple refinement; red: external restraints; blue: “jelly” body. Solid lines: R factor; dashed lines: Rfree.)

Example: 5 Å resolution, data from PDB 2w6h

(Figure: R factors vs cycle. Black: simple refinement; red: external restraints; blue: “jelly” body plus local NCS. Solid lines: R factor; dashed lines: Rfree.)

MAP SHARPENING: INVERSE PROBLEM

We want to observe $\rho_0(x)$, but we observe $\rho(x)$. These two entities are related by

$$\int K(x, y)\,\rho_0(y)\,dy = \rho(x)$$

If K is known, calculate $\rho_0$. In principle the problem is easy: discretise and solve the linear equation. However, such problems are ill-posed (a small perturbation in the input causes a large deviation in the output). In practice, sharpening amplifies noise as well as signal. Regularisation may help:

$$\|K\rho_0 - \rho\|^2 + \alpha f(\rho_0) \Longrightarrow \min$$

$$\rho_0 = (K^TK + \alpha L)^{-1}K^T\rho$$

L is related to the regularisation function. For the L2 norm (values of $\rho_0$ should be small) L is the identity, and for the first-order Sobolev norm ($\rho_0$ should be smooth) it is the (negative) Laplace operator.
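A one-dimensional discretisation makes the regularised solution concrete. This is a minimal sketch assuming a Gaussian blurring kernel on a uniform grid. L is either the identity (L2 norm) or DᵀD with D the first-difference matrix, a discrete stand-in for the negative Laplacian in the first-order Sobolev case. All names and the test signal are hypothetical.

```python
import numpy as np


def blur_matrix(n, width):
    """Discretised Gaussian blurring kernel K(x, y) on n grid points (rows sum to 1)."""
    x = np.arange(n)
    k = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
    return k / k.sum(axis=1, keepdims=True)


def smoothness_operator(n):
    """L = D^T D with D the first-difference matrix (first-order Sobolev regulariser)."""
    d = np.eye(n, k=1)[:-1] - np.eye(n)[:-1]
    return d.T @ d


def tikhonov(kmat, rho, alpha, lmat):
    """rho0 = (K^T K + alpha L)^-1 K^T rho."""
    return np.linalg.solve(kmat.T @ kmat + alpha * lmat, kmat.T @ rho)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n = 100
    true = np.zeros(n)
    true[[30, 36, 70]] = 1.0  # hypothetical sharp "atoms"
    kmat = blur_matrix(n, width=4.0)
    observed = kmat @ true + rng.normal(scale=0.01, size=n)
    for alpha in (1e-4, 1e-2):
        for name, lmat in (("L2", np.eye(n)), ("Sobolev", smoothness_operator(n))):
            est = tikhonov(kmat, observed, alpha, lmat)
            print(alpha, name, np.linalg.norm(est - true))
```

Increasing α trades resolution for noise suppression: with L = I the norm of the solution shrinks monotonically as α grows.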

MAP SHARPENING: INVERSE PROBLEM

In general, K represents effects such as TLS or a smoothly varying blurring function. Noise in the electron density comes from series termination, errors in phases and noise in the experimental data.

A very simple case: K is an overall B value. Then the problem is solved using the FFT: the Fourier transform of a Gaussian is a Gaussian, and the Fourier transform of the Laplace operator is (up to a constant) the squared length of the reciprocal-space vector:

$$F_{\text{deblurred}} = \frac{e^{-B|s|^2/4}}{e^{-2B|s|^2/4} + \alpha|s|^2}\,F = K_\alpha(s, B)\,F$$
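Applied per reflection, the regularised factor is a one-liner. This is a minimal sketch assuming |s| = 1/d in Å⁻¹ and B in Å², so that e^{−B|s|²/4} is the usual Debye-Waller factor. The function names are hypothetical. With α = 0 it reduces to plain sharpening by e^{+B|s|²/4}; with α > 0 the amplification at high resolution is damped.

```python
import numpy as np


def k_alpha(s, b, alpha):
    """K_alpha(s, B) = e^{-B s^2/4} / (e^{-2B s^2/4} + alpha s^2).

    s: |s| = 1/d in 1/Angstrom; b: blurring B value in Angstrom^2.
    """
    blur = np.exp(-b * s**2 / 4.0)
    return blur / (blur**2 + alpha * s**2)


def sharpen(f, s, b, alpha):
    """Multiply map coefficients F(s) by K_alpha(s, B)."""
    return k_alpha(s, b, alpha) * f


if __name__ == "__main__":
    s = np.array([0.0, 0.1, 0.2, 0.25])     # 1/d up to 4 Angstrom resolution
    print(k_alpha(s, b=100.0, alpha=0.0))   # grows as e^{B s^2 / 4}
    print(k_alpha(s, b=100.0, alpha=0.05))  # damped amplification
```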

MAP SHARPENING: INVERSE PROBLEM. REGULARISATION PARAMETER

One way of selecting the regularisation parameter is to minimise the predicted error (PE):

$$PE = \sum_{\text{observed}} (A_\alpha - 1)^2|F_{\text{map}}|^2 + 2\sum_{\text{observed}} A_\alpha\langle|F - F_{\text{map}}|^2\rangle + 2\sum_{\text{unobserved}} A_\alpha\langle|F|^2\rangle$$

where $A_\alpha = K K_\alpha$.

If we restore unobserved data with their expected values $F_e$, then the last term is replaced by

$$\sum_{\text{unobserved}} (A_\alpha - 1)^2|F_e|^2 + 2\sum_{\text{unobserved}} A_\alpha\langle|F - F_e|^2\rangle$$

Restoring seems to give a smaller predicted error (the problem of bias towards errors in phases remains).

The “best” regularisation parameter is the one that minimises PE.
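The selection of α can be sketched as a grid search over the first PE expression (without restoration of unobserved data). Assumptions: the expectations ⟨|F − F_map|²⟩ and ⟨|F|²⟩ are supplied per reflection as arrays (`var_obs`, `mean_f2_unobs`), α comes from a fixed grid, and all names are hypothetical. How REFMAC estimates these quantities is not covered here.

```python
import numpy as np


def a_alpha(s, b, alpha):
    """A_alpha = K * K_alpha = e^{-2B s^2/4} / (e^{-2B s^2/4} + alpha s^2)."""
    k2 = np.exp(-2.0 * b * s**2 / 4.0)
    return k2 / (k2 + alpha * s**2)


def predicted_error(alpha, b, s_obs, f_map, var_obs, s_unobs, mean_f2_unobs):
    """PE for one alpha; var_obs is <|F - Fmap|^2>, mean_f2_unobs is <|F|^2> (assumed inputs)."""
    a_obs = a_alpha(s_obs, b, alpha)
    a_un = a_alpha(s_unobs, b, alpha)
    return float(
        np.sum((a_obs - 1.0) ** 2 * np.abs(f_map) ** 2)
        + 2.0 * np.sum(a_obs * var_obs)
        + 2.0 * np.sum(a_un * mean_f2_unobs)
    )


def best_alpha(alphas, *args):
    """Grid search: return the alpha with the smallest PE and all PE values."""
    pe = [predicted_error(a, *args) for a in alphas]
    return alphas[int(np.argmin(pe))], pe


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    s_obs = rng.uniform(0.02, 0.25, 2000)  # hypothetical observed reflections to 4 Å
    s_un = rng.uniform(0.25, 0.35, 1000)   # hypothetical unobserved shell
    f_map = rng.normal(size=2000) + 1j * rng.normal(size=2000)
    var = np.full(2000, 0.5)
    mean_f2 = np.full(1000, 0.3)
    alpha, pe = best_alpha([0.0, 1e-3, 1e-2, 1e-1, 1.0], 80.0, s_obs, f_map, var, s_un, mean_f2)
    print(alpha, pe)
```

At α = 0, A_α is 1 everywhere, so PE reduces to twice the summed error and unobserved-power terms. This gives a simple sanity check.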

MAP SHARPENING: 2R6C, 4Å RESOLUTION

(Figure panels: original, no sharpening; sharpening with median B, α = 0; sharpening with median B, α optimised. Top left and bottom: after local NCS refinement.)

Some of the other new features in REFMAC

SAD refinement available from version 5.5

SIRAS refinement available from version 5.6

New and complete dictionary available from version 5.6

Improved solvent mask available from version 5.6

Jligand for ligand dictionaries and link descriptions

Future

Electron density calculation: Bayesian filters. How to combine two conflicting ideas: sharpen the electron density to get “better” defined atoms, and integrate over errors to smooth the electron density and thus reduce noise

Local TLS restraints: need to be tested

Restraints on secondary structures and other internal patterns

Reticular twin: Almost there

Radiation damage

Error estimation

Conclusion

• Twin refinement improves statistics and occasionally electron density

• Use of similar structures should improve reliability of the derived model: Especially at low resolution

• NCS restraints should be generated automatically, but conformational flexibility must be accounted for

• “Jelly” body works better than I thought it should

• Regularised map sharpening looks promising. More work should be done on series termination and general sharpening operators

Acknowledgment

York: Alexei Vagin, Andrey Lebedev, Rob Nicholls, Fei Long

Leiden: Pavol Skubak, Raj Pannu

CCP4, YSBL people

REFMAC is available from CCP4 or from York’s ftp site:

www.ysbl.york.ac.uk/refmac/latest_refmac.html

This and other presentations can be found on:

www.ysbl.york.ac.uk/refmac/Presentations/