Direct Methods and Many Site Se-Met MAD Problems using BnP

W. Furey



Classical Direct Methods

Main method for “small molecule” structure determination

Highly automated (almost totally “black box”)

Solves structures containing up to a few hundred non-hydrogen atoms in the asymmetric unit

Direct Methods Assumptions and Requirements

Non-negativity of electron density

Atoms are “resolved”, i.e. “atomic resolution” data are available

Unit cell, symmetry and contents are known

Important Concepts - 1

Normalized structure factors $E_H$ are given by $E_H = F_H / \langle |F_H|^2 \rangle^{1/2}$, with averaging in resolution shells

The phase $\phi_H$ of $E_H$ is the same as for $F_H$

$\langle |E_H|^2 \rangle = 1$, hence “normalized”
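The normalization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each reflection is a hypothetical (d-spacing, |F|) pair, shells hold equal numbers of reflections, and the symmetry-dependent ε factors that production programs apply are ignored. The function name `normalize_in_shells` is invented for this example.

```python
import math


def normalize_in_shells(reflections, n_shells=10):
    """Return |E| for each (d_spacing, |F|) pair, in input order.

    Assumption: equal-count resolution shells, no epsilon correction.
    """
    if not reflections:
        return []
    # Sort reflection indices from low resolution (large d) to high resolution.
    order = sorted(range(len(reflections)),
                   key=lambda i: reflections[i][0], reverse=True)
    n_shells = max(1, min(n_shells, len(order)))
    shell_size = math.ceil(len(order) / n_shells)
    e_values = [0.0] * len(reflections)
    for start in range(0, len(order), shell_size):
        shell = order[start:start + shell_size]
        # <|F|^2> averaged over the reflections of this shell only.
        mean_f2 = sum(reflections[i][1] ** 2 for i in shell) / len(shell)
        scale = math.sqrt(mean_f2) if mean_f2 > 0.0 else 1.0
        for i in shell:
            e_values[i] = reflections[i][1] / scale
    return e_values
```

Because each shell is scaled by its own mean intensity, the mean of |E|² is 1 within every shell, which is the property the slide calls “normalized”.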

Important Concepts - 2

Structure invariant - structural quantity independent of choice of unit cell origin

Probabilistic estimates can be made for the values of structure invariants given the associated E magnitudes and cell contents
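A small numerical check can make the origin independence concrete. The sketch below assumes equal point atoms with unit scattering factors and uses the three-phase (triplet) invariant as the example; the coordinates, indices and function names are all made up for illustration.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(hkl) for equal point atoms with unit scattering factor (an assumption)."""
    return sum(cmath.exp(2j * math.pi * sum(h * x for h, x in zip(hkl, xyz)))
               for xyz in atoms)


def triplet_product(h, k, atoms):
    """F_H * F_K * F_(-H-K); its phase is the triplet invariant."""
    minus_hk = tuple(-(a + b) for a, b in zip(h, k))
    return (structure_factor(h, atoms) * structure_factor(k, atoms)
            * structure_factor(minus_hk, atoms))


def shift_origin(atoms, t):
    """Move every atom by the same vector t, i.e. choose a different origin."""
    return [tuple(x + dx for x, dx in zip(xyz, t)) for xyz in atoms]


atoms = [(0.10, 0.20, 0.30), (0.40, 0.15, 0.70), (0.85, 0.60, 0.25)]
moved = shift_origin(atoms, (0.10, 0.20, 0.05))
h, k = (1, 2, 0), (0, 1, 3)

# Individual phases depend on the origin, the triplet phase does not.
phase_h_before = cmath.phase(structure_factor(h, atoms))
phase_h_after = cmath.phase(structure_factor(h, moved))
triplet_before = triplet_product(h, k, atoms)
triplet_after = triplet_product(h, k, moved)
```

An origin shift t multiplies F_H by exp(2πi H·t); in the triple product the three factors multiply to exp(2πi (H + K − H − K)·t) = 1, so the product and its phase are unchanged.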

Fundamental formulas involving individual triplets

$P(\Phi_{HK}) = [2\pi I_0(A_{HK})]^{-1} \exp(A_{HK} \cos \Phi_{HK})$, where $P(\Phi_{HK})$ is the probability of the structure invariant having the value $\Phi_{HK}$

$A_{HK} = 2\,|E_H E_K E_{-H-K}| / N^{1/2}$, where $N$ is the number of atoms in the cell and the E’s are normalized structure factors

Note that the probability of $\Phi_{HK}$ lying near zero increases as $A_{HK}$ increases, and that $A_{HK}$ is proportional to the product of the E’s and inversely proportional to $N^{1/2}$

The expected value of $\cos \Phi_{HK}$ is given by

$\langle \cos \Phi_{HK} \rangle = I_1(A_{HK}) / I_0(A_{HK})$
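The triplet formulas can be evaluated directly. The following is a minimal sketch using only the Python standard library; the modified Bessel functions are summed from their power series (adequate for the moderate A values typical of triplets), and all function names are chosen for this example.

```python
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function I_order(x) from its power series (order 0 or 1)."""
    half = x / 2.0
    return sum(half ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))


def triplet_a(e_h, e_k, e_hk, n_atoms):
    """A_HK = 2 |E_H E_K E_-H-K| / sqrt(N)."""
    return 2.0 * abs(e_h * e_k * e_hk) / math.sqrt(n_atoms)


def cochran_pdf(phi, a):
    """Probability density of the triplet invariant having the value phi."""
    return math.exp(a * math.cos(phi)) / (2.0 * math.pi * bessel_i(0, a))


def expected_cos(a):
    """Expected value of cos(phi) for a triplet with parameter A."""
    return bessel_i(1, a) / bessel_i(0, a)


# Three strong reflections (|E| = 2) in a cell of 100 atoms versus 10,000 atoms.
a_small_cell = triplet_a(2.0, 2.0, 2.0, 100)
a_large_cell = triplet_a(2.0, 2.0, 2.0, 10000)
```

Comparing `expected_cos(a_small_cell)` with `expected_cos(a_large_cell)` shows why individual triplets become much less informative as the number of atoms grows.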

Figure: Cochran distribution for various K’s, with an accompanying plot vs K, where $\Phi_3 = \Phi_{HK}$ and $K = A_{HK}$

Classical Direct Methods Applications for Proteins

Used for phase extension to very high resolution

Used with moderate success to locate heavy atom sites in isomorphous derivatives

E values used in molecular replacement calculations

Current Direct Methods Applications for Proteins

Shake n Bake (based on minimum function) used to solve complete protein structures with over 1,000 atoms (rubredoxin, lysozyme, calmodulin etc.), provided data to 1.1Å or better is available

Used to locate anomalous scatterer sites from MAD or SAS data

General Shake n Bake Concept

Use a multi-solution method starting with random phases (or randomly positioned atoms) in each trial.

For each trial phase set, use a “dual space” procedure iterating between real and reciprocal space optimization/constraints.

Reciprocal space optimization based on shifting phases to reduce the “minimum function” $R(\phi)$

Real space optimization and constraints based on computing new phases only from the largest peaks in map based on previous cycle phases

Each trial phase set ranked by value of $R(\phi)$
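The slides do not write out the minimum function. In the published SnB literature its triplet-only form is $R(\phi) = \sum A_{HK} [\cos \Phi_{HK} - I_1(A_{HK})/I_0(A_{HK})]^2 / \sum A_{HK}$, and the sketch below assumes that form (quartet terms, which the method can also include, are left out). Function names are illustrative only.

```python
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function I_order(x) from its power series."""
    half = x / 2.0
    return sum(half ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))


def expected_cos(a):
    """Expected cosine of a triplet invariant with parameter A."""
    return bessel_i(1, a) / bessel_i(0, a)


def minimal_function(triplets):
    """Triplet-only R(phi) for a list of (A_HK, Phi_HK) pairs.

    Phi_HK is the triplet phase sum computed from the current trial phases.
    """
    numerator = sum(a * (math.cos(phi) - expected_cos(a)) ** 2
                    for a, phi in triplets)
    denominator = sum(a for a, _ in triplets)
    return numerator / denominator
```

R is zero when every triplet cosine equals its expected value and grows as the trial phases disagree with the statistical expectation, which is why lower values rank a trial higher.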

SnB inner loop for trial structure

Generate random trial structure

Compute phases from structure

Shift phases to reduce $R(\phi)$

Compute map from new phases

Select “structure” from largest peaks, then return to computing phases from the structure

Stop after N iterations
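The control flow of the loop can be written down as a skeleton. This is not the SnB implementation: every crystallographic step is passed in as a hypothetical callable (`random_structure`, `phases_from_structure`, `shift_phases`, `map_from_phases`, `pick_peaks`, `score`), so the sketch shows only the order of operations and the ranking of trials by R.

```python
def snb_trial(*, random_structure, phases_from_structure, shift_phases,
              map_from_phases, pick_peaks, score, n_cycles):
    """Run one dual-space trial and return (structure, R value).

    All step functions are placeholders supplied by the caller.
    """
    structure = random_structure()                 # random starting atoms
    phases = phases_from_structure(structure)      # structure factor calculation
    for _ in range(n_cycles):
        phases = shift_phases(phases)              # reciprocal space: reduce R
        density = map_from_phases(phases)          # Fourier synthesis
        structure = pick_peaks(density)            # real space: keep largest peaks
        phases = phases_from_structure(structure)  # new phases from the peaks
    return structure, score(phases)


def run_trials(n_trials, **steps):
    """Run independent trials and rank them by R, lowest first."""
    results = [snb_trial(**steps) for _ in range(n_trials)]
    return sorted(results, key=lambda result: result[1])
```

Keeping the trials independent is what makes this a multi-solution method: each one starts from different random atoms and only the final R values are compared.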

Choice of data for Se determination

Use $|\,|F_H|^{+} - |F_H|^{-}\,|$ (anomalous) difference at a single $\lambda$

Use $|\,|F_H|_{\lambda_i} - |F_H|_{\lambda_j}\,|$ (dispersive) difference between two $\lambda$’s

Use $F_A$ values (derived from data at all $\lambda$’s)

Use $F_{HLE}$ values based on max anomalous and max dispersive differences
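The first two choices are simple differences of amplitudes and can be sketched as below. The dictionary layout (Miller index tuple as key) and the function names are assumptions made for this example; the $F_A$ and $F_{HLE}$ estimates need more than a subtraction and are not sketched.

```python
def anomalous_differences(bijvoet_pairs):
    """| |F+| - |F-| | per reflection at one wavelength.

    bijvoet_pairs: dict mapping hkl -> (|F+|, |F-|).
    """
    return {hkl: abs(f_plus - f_minus)
            for hkl, (f_plus, f_minus) in bijvoet_pairs.items()}


def dispersive_differences(f_i, f_j):
    """| |F|_i - |F|_j | per reflection measured at both wavelengths.

    f_i, f_j: dicts mapping hkl -> Bijvoet-averaged |F| at wavelengths i and j.
    """
    common = f_i.keys() & f_j.keys()
    return {hkl: abs(f_i[hkl] - f_j[hkl]) for hkl in common}


# Made-up amplitudes, for illustration only.
pairs_peak = {(1, 0, 0): (105.0, 100.0), (0, 2, 1): (48.0, 51.5)}
f_edge = {(1, 0, 0): 98.0, (0, 2, 1): 50.0}
f_remote = {(1, 0, 0): 103.0, (1, 1, 1): 20.0}
ano = anomalous_differences(pairs_peak)
disp = dispersive_differences(f_edge, f_remote)
```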

MAD Phasing

For data collected at $\lambda_1$, $\lambda_2$ etc., choose a wavelength $\lambda_n$ as “native” data, and “reduce” that data set by averaging Bijvoet pairs.

For other “derivative” wavelengths $\lambda_d$, reduce both by averaging Bijvoet pairs to form “isomorphous” data sets, and without averaging to form “anomalous” data sets.

MAD Phasing

For “isomorphous” and “derivative anomalous” data sets, scale “derivative” to “native” and use scattering factors of

$f_0 = 0$, $f' = f'(\lambda_d) - f'(\lambda_n)$, $f'' = f''(\lambda_d)$

For “native anomalous” data use original native Bijvoet pairs and scattering factors of

$f_0 = 0$, $f' = 0$, $f'' = f''(\lambda_n)$
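The bookkeeping for these effective scattering factors is easy to get wrong by hand, so here is a minimal sketch. The wavelength labels and the f′/f″ numbers are made-up placeholders, not measured selenium values, and the function name is invented for this example.

```python
def mad_scattering_factors(f_prime, f_double_prime, native):
    """Effective scattering factors when MAD data are treated like MIR data.

    f_prime, f_double_prime: dicts mapping wavelength label -> f' or f''.
    native: label of the wavelength chosen as "native".
    """
    derivatives = {}
    for label in f_prime:
        if label == native:
            continue
        derivatives[label] = {
            "f0": 0.0,
            "f_prime": f_prime[label] - f_prime[native],  # dispersive part
            "f_double_prime": f_double_prime[label],      # anomalous part
        }
    native_anomalous = {
        "f0": 0.0,
        "f_prime": 0.0,
        "f_double_prime": f_double_prime[native],
    }
    return {"derivative": derivatives, "native_anomalous": native_anomalous}


# Placeholder values for illustration only.
f_prime = {"edge": -10.0, "peak": -8.0, "remote": -4.0}
f_double_prime = {"edge": 3.0, "peak": 6.0, "remote": 3.5}
factors = mad_scattering_factors(f_prime, f_double_prime, native="remote")
```

With the remote wavelength as native, the edge data set carries the largest dispersive term, since f′ is most negative at the edge.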

Phase Refinement Minimizing

$$\sum_h W_h \sum_{\phi_P} P(\phi_P)\,\bigl(|F_{PH}^{obs}|_h - |F_{PH}^{calc}(\phi_P)|_h\bigr)^2$$

where

$$|F_{PH}^{calc}(\phi_P)|_h^2 = |F_P^{obs}|_h^2 + |F_H^{calc}|_h^2 + 2\,|F_P^{obs}|_h\,|F_H^{calc}|_h \cos(\phi_P - \phi_H)_h$$

Phase Refinement Options

“Classical” - $\phi_P$ = centroid, $W_h = 1/E^2$, $1/\langle E^2 \rangle$ or unity, $P(\phi_P) = 1$, use reflections with FOM > 0.4-0.6

“Maximum Likelihood” - $\phi_P$ stepped over allowed phases, $P(\phi_P)$ = corresponding probability, $W_h = 1/E^2$, $1/\langle E^2 \rangle$ or unity, use reflections with FOM > 0.2

$\phi_P$, $P(\phi_P)$ can also come from an external source, i.e. solvent flattened or NC-symmetry averaged maps.

$$\sum_h W_h \sum_{\phi_P} P(\phi_P)\,\bigl(|F_{PH}^{obs}|_h - |F_{PH}^{calc}(\phi_P)|_h\bigr)^2$$
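The refinement target can be evaluated with a few lines of Python. This sketch only computes the residual; it does not refine anything, and the dictionary keys used to describe a reflection are assumptions made for the example. The “classical” option corresponds to a single (centroid phase, 1.0) entry per reflection, the “maximum likelihood” option to a list of stepped phases with their probabilities.

```python
import math


def fph_calc(fp_obs, fh_calc, phi_p, phi_h):
    """|FPH_calc| for a trial protein phase, from the phase-triangle cosine law."""
    squared = (fp_obs ** 2 + fh_calc ** 2
               + 2.0 * fp_obs * fh_calc * math.cos(phi_p - phi_h))
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off


def lack_of_closure_residual(reflections):
    """Sum over h of W_h * sum over phi_P of P(phi_P) * (|FPH_obs| - |FPH_calc|)^2.

    Each reflection is a dict with keys fp_obs, fph_obs, fh_calc, phi_h,
    weight and phase_probs, where phase_probs is a list of (phi_P, P(phi_P)).
    """
    total = 0.0
    for refl in reflections:
        for phi_p, prob in refl["phase_probs"]:
            calc = fph_calc(refl["fp_obs"], refl["fh_calc"], phi_p, refl["phi_h"])
            total += refl["weight"] * prob * (refl["fph_obs"] - calc) ** 2
    return total
```

Angles are in radians. Adjusting heavy atom parameters changes `fh_calc` and `phi_h`, and the refinement seeks the parameters that make this sum smallest.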

Projection of peaks down NC twofold

Flowchart: processing path from the MAD $\lambda_1$, $\lambda_2$, $\lambda_3$ data (Scalepack files) to the final map.

Programs shown: CMBISO, CMBANO, PHASIT, MISSNG, FSFOUR, BNDRY, MAPINV, EXTRMP, MAPAVG, BLDCEL

Files shown: “iso” and “ano” scaled files, “phase” file, “submap” file, “averaging” mask file, “extension” file, all “native” ($\lambda_3$) data

MAD Phasing/Averaging Statistics

Wavelength   Type  dmin (Å)  No. refl  Rano   Riso   dmin (Å) (phasing)  Rc    Phasing Power  <FOM>
λ1, edge     ano   2.3       72,632    0.063  -      2.6                 -     3.47           0.380
λ2, peak     ano   2.3       72,996    0.060  -      2.6                 -     3.45           0.447
λ3, remote   ano   2.3       72,650    0.048  -      2.6                 -     2.09           0.389
λ1-λ3        iso   2.3       74,407    -      0.039  2.6                 0.55  1.89           0.393
λ2-λ3        iso   2.3       74,774    -      0.035  2.6                 0.61  1.59           0.357

Mean FOM (combined) = 0.759 for 48,632 reflections (2.6 Å)

Correlation coefficient between monomer density prior to NCS averaging = 0.764

Correlation coefficient between monomer density after NCS averaging/phase combination = 0.906

Peak anomalous ($\lambda_2$) difference Patterson


With SnB it’s possible to automatically locate the anomalous scatterer substructure with data from any one of the dispersive combinations or anomalous pair sets

As expected, sets with the maximum dispersive or anomalous signal typically yield a greater frequency of success

Automated Applications of BnP: Methodology

W. Furey (1), L. Pasupulati (1), S. Potter (2), H. Xu (2), R. Miller (3) & C. Weeks (2)

(1) University of Pittsburgh School of Medicine and VA Medical Center

(2) Hauptman-Woodward Medical Research Institute

(3) Center for Computational Research, SUNY at Buffalo

SnB Strengths

1. Powerful, state-of-the-art direct methods for automatically locating heavy atom sites

2. Friendly graphical user interface

SnB Weaknesses

1. Stops after finding sites, i.e. no protein phasing

2. No software interface

PHASES Strengths

1. Proven protein phasing (MAD, MIRAS, etc.), solvent flattening, NCS averaging, external program interfacing

2. Interactive graphics

PHASES Weaknesses

1. Doesn’t automatically find heavy atom sites

2. Script based, i.e. no GUI

Goal: Provide user-friendly software for automatic determination of protein crystal structures

Adopted Strategy

Combine the SnB program with the “PHASES” package, putting everything under GUI control

Establish default parameters and procedures allowing all aspects of the structure determination to be fully automated

Also provide a manual mode allowing experienced users more control, and to facilitate development

Provide graphical feedback when possible

Facilitate coupling with popular external software

Main Developments Required for Automated Structure Determination

Automatic substructure solution detection

Automatic substructure validation

Automatic hand determination (including space group changes, when needed)

Automatic Substructure Solution Detection

Original Method: Based on histogram (manual, time consuming, requires user interaction)

Current Method: Based on Rmin and Rcryst statistics (automatic, fast, no user interaction)

Automatic Substructure Validation

Original Method: Left up to user to decide which peaks correspond to true sites (manual)

Current Method (auto mode): Based on occupancy refinement against Bijvoet differences (automatic, fast, requires no coordinate refinement, hand insensitive)

Current Method (manual mode): As in auto, but can also compare peaks from different solutions (manual)

Automatic Substructure Validation

Automatic Hand Determination

Original Method: Visual inspection of map projections (manual, requires user interaction)

Current Method (MAD, SIRAS or MIRAS): Based on variance differences in protein and solvent regions (automatic, fast since it requires no refinement, also requires no user interaction)

Automatic Hand Determination

Current Method (SAS data only): Comparative analysis of R, FOM and CC after solvent flattening/phase combination (automatic, fast, requires no refinement)

Current Method (SIR, MIR data only): Both hands tried, map examination needed (requires user interaction)

No man (or program) is an island

Importing data files: Scalepack files, D*Trek files, MTZ files$, Free format files

Exporting control files: O, RESOLVE 2.08, Arp/wARP 6.1.1

Exporting data files: Free format files, CNS files, MTZ files$, O files, CHAIN files, PDB files

Job submission from GUI: RESOLVE$ 2.08, Arp/wARP$ 6.1.1

$ RESOLVE, Arp/wARP and/or CCP4 must be obtained from their respective authors/distributors for these options to work

Results for 1jc4

a = 43.6, b = 78.6, c = 89.4 Å, β = 91.95°, P21

4 molecules (592 residues) in asu

2.1 Å data, 3λ MAD data

Substructure: Found 24 of 24 Se

Phasing: mean PP = 2.95; mean FOM = 0.661

Time to map: ~41 min on G4 (1.5 GHz) Powerbook, ~13 min on G5 (2.7 GHz) Desktop

Auto Traceability:

Resolve - 87% main chain, 68% side chain

Arp/wARP - 82% main chain, 73% side chain

SeMet ASU Size & Data Resolution

PDB Code  No. Sites  No. Residues  NCS  d (Å)
1QC2      4          169           1    1.5
1BX4      7          345           1    2.25
1CB0      8          283           1    2.2
1T5H      10         504           1    2.5
2JXH      12         576           2    3.1
1GSO      13         431           1    2.22
2TPS      15         454           2    2.7
1DBT      19         717           3    2.49
1JEN      22         668           2    2.25
1JC4      24         592           4    2.1
1CLI      28         1380          4    3.0
1A7A      30         864           2    2.8
1L8A      40         1772          2    2.6
1E3M      45         1600          2    3.0
1HI8      50         1328          2    2.8
1GKP      54         2748          6    2.5
1DQ8      60         1868          4    2.33
1E2Y      60         1880          10   3.2
1M32      66         2196          6    2.55
1EQ2      70         3100          10   2.9


Phasing Flexibility (Manual Mode)

Conclusion

BnP is a user-friendly, efficient package for the automated determination of protein structures from x-ray diffraction data

BnP downloads for Linux, Apple G4, G5, & Intel, and SGI’s are available (academic & non-profit institutions) at

http://www.hwi.buffalo.edu/BnP/