RECOORD: REcalculated COORdinates Database


Page 1: RECOORD: REcalculated COORdinates Database

Aart Nederveen, Bijvoet Center for Biomolecular Research, Utrecht University

Jurgen Doreleijers, Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison

Wim Vranken, Macromolecular Structure Database, European Bioinformatics Institute

Page 2: Aim

• Recalculation of protein structures from deposited NMR restraints using state-of-the-art methods

• Goals:
  • decrease user- and software-dependent biases
  • allow a better comparison between structures
  • compare different structure calculation programs
  • provide a database for the development and assessment of validation tools and calculation protocols

Page 3: Overview of the recalculation project

Design of RECOORD (workflow diagram; step numbers as shown on the slide):

• Restraint manipulation
  • Steps 1-2: PDB (coordinates, restraints) and BMRB (STAR files; Doreleijers et al. 2003)
  • Step 3: EBI/UU (generation of consistent STAR files)

• Recalculation
  • Steps 4-5: CYANA (sequence, MD SA, …) and CNS (topology, MD SA, refinement)

• Analysis
  • Step 6: improvement? correlations? …

Page 4: Databases now publicly available

• DOCR/FRED (BMRB): databases containing converted and filtered restraints
  http://www.bmrb.wisc.edu/servlets/MRGridServlet

• RECOORD (EBI): database containing recalculated coordinates
  http://www.ebi.ac.uk/msd/recoord

Page 5: Selection

(Workflow steps 1-2: PDB coordinates and restraints; BMRB STAR files, Doreleijers et al. 2003)

• Restraint formats considered (if distance restraints are available):
  • CNS/XPLOR
  • DIANA/DYANA/CYANA
  • DISCOVER/MSI

• PDB entries selected:
  • only proteins
  • no HET atoms
  • multimers allowed (not yet recalculated)
  • at least 20 residues

• In the end, 545 monomers were selected
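
The selection criteria above can be read as a simple filter over PDB entries. The sketch below is only an illustration under stated assumptions: the `Entry` record, its field names and the two helper functions are hypothetical and are not taken from the RECOORD scripts.

```python
from dataclasses import dataclass

# Hypothetical record describing one PDB entry; field names are illustrative.
@dataclass
class Entry:
    pdb_id: str
    only_protein: bool
    has_het_atoms: bool
    n_chains: int
    n_residues: int
    restraint_format: str
    has_distance_restraints: bool

# Restraint formats listed on the slide.
ACCEPTED_FORMATS = {"CNS/XPLOR", "DIANA/DYANA/CYANA", "DISCOVER/MSI"}
MIN_RESIDUES = 20


def is_selected(entry: Entry) -> bool:
    """Apply the selection criteria from the slide (multimers are allowed)."""
    return (
        entry.has_distance_restraints
        and entry.restraint_format in ACCEPTED_FORMATS
        and entry.only_protein
        and not entry.has_het_atoms
        and entry.n_residues >= MIN_RESIDUES
    )


def is_recalculated(entry: Entry) -> bool:
    """Multimers are selected but not yet recalculated, so keep monomers only."""
    return is_selected(entry) and entry.n_chains == 1
```

A selected dimer would pass `is_selected` but fail `is_recalculated`, which mirrors the "multimers allowed (not yet recalculated)" note.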

Page 6: Conversion issues

(Workflow step 3: EBI/UU, generation of consistent STAR files)

• Data are converted to formats readable by the calculation software (e.g. XPLOR/CNS and CYANA) with the FormatConverter available within the CCPN software (Wim Vranken, EBI).

Problems:

• Differences between coordinate and restraint data:
  • e.g. 1 chain in the PDB entry, 2 chains in the restraint list
  • residue numbering can differ between the PDB entry and the restraint list
  • restraints for residues not present in the PDB entry…

• Nomenclature in the restraint list
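
Two of the problems listed above (a different number of chains, and restraints that refer to residues absent from the coordinates) can be detected with plain set operations. This is a minimal sketch, not the FormatConverter logic; the function name and the `(chain_id, residue_number)` representation are assumptions made for illustration.

```python
def find_inconsistencies(coord_residues, restraint_residues):
    """Compare residues in the coordinates with residues used by restraints.

    Both arguments are sets of (chain_id, residue_number) tuples
    (a hypothetical representation chosen for this sketch).
    """
    coord_chains = {chain for chain, _ in coord_residues}
    restraint_chains = {chain for chain, _ in restraint_residues}
    return {
        # e.g. 1 chain in the PDB entry but 2 chains in the restraint list
        "chain_count_mismatch": len(coord_chains) != len(restraint_chains),
        # restraints that point at residues with no coordinates
        "residues_missing_from_coordinates": sorted(
            restraint_residues - coord_residues
        ),
    }
```

A systematic numbering offset would show up here as a long list of missing residues; resolving it needs sequence information, which this sketch does not attempt.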

Page 7: Building topology

(Workflow steps 4-5: CYANA and CNS)

• Starting script: generate_easy.inp from CNS

• Automated detection in the original ensemble of:
  • disulfide bridges (S-S distance < 3 Å in the original first models)
  • cis peptides (if |ω| < 25° in the original first models)
  • protonation state of histidines (using the CNS patches HISD, HISE)

• CYANA: sequence based on the CNS topology
  • CYSS, HIST, HIST+, cPRO added to the sequence

• Automated generation of disulfide restraints
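
The two geometric checks above (S-S distance below 3 Å, |ω| below 25°) can be sketched as follows. This is an illustration, not the project's script: the function names are hypothetical, coordinates are assumed to be `(x, y, z)` tuples in Å, and ω is taken as the dihedral CA(i)-C(i)-N(i+1)-CA(i+1).

```python
import math

DISULFIDE_MAX_SS_DISTANCE = 3.0  # Å, threshold quoted on the slide
CIS_MAX_ABS_OMEGA = 25.0         # degrees, threshold quoted on the slide


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))  # part of b0 perpendicular to b1
    w = _sub(b2, _scale(b1, _dot(b2, b1)))  # part of b2 perpendicular to b1
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def is_disulfide(sg_a, sg_b):
    """True if two cysteine SG atoms are closer than the 3 Å threshold."""
    return math.dist(sg_a, sg_b) < DISULFIDE_MAX_SS_DISTANCE


def is_cis_peptide(ca_i, c_i, n_next, ca_next):
    """True if |omega| is below the 25 degree threshold."""
    return abs(dihedral_deg(ca_i, c_i, n_next, ca_next)) < CIS_MAX_ABS_OMEGA
```

Only the absolute value of ω is used, so the sign convention of the dihedral routine does not affect the cis test.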

Page 8: CONDOR computer cluster (CS, University of Wisconsin-Madison)

(Workflow steps 4-5: CYANA and CNS)

• More than 800 processors used

• Total CPU time: 31,169 hours (about 3.5 years on a single workstation)

• Example: 2EZM, calculation of 1 model (101 a.a., 2.2 GHz P4 computer):
  • CYANA: 31 seconds
  • CNS: 340 seconds
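
The figures above can be checked with a few lines of arithmetic. The only assumption in this sketch is a 365-day year of uninterrupted computing on the single workstation.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, machine busy around the clock

total_cpu_hours = 31_169
years_single_workstation = total_cpu_hours / HOURS_PER_YEAR

# Per-model timings for the 2EZM example quoted on the slide.
cyana_seconds = 31
cns_seconds = 340
cns_to_cyana_ratio = cns_seconds / cyana_seconds

print(f"{years_single_workstation:.2f} years on a single workstation")
print(f"CNS takes {cns_to_cyana_ratio:.1f} times as long as CYANA per model")
```

The result is about 3.56 years, in line with the 3.5 years quoted on the slide, and CNS needs roughly 11 times as long as CYANA for one model of this example.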

Page 9: Evaluation of structure quality

(Workflow step 6: analysis; improvement? correlations? …)

• Agreement with experimental restraints

• Improvement?

• Comparison of CNS and CYANA

• Relation between NMR data quality and structural quality

Page 10: Distance restraint violations

(Workflow step 6: analysis)

[Figure: histogram of RMS distance restraint violations (Å) against frequency]

• ORG (original entries): 0.08 Å (0.14 Å)

• CNW (recalculated in CNS and refined in water): 0.04 Å (0.05 Å)
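
The slides do not state how the RMS violation is computed. The sketch below shows one common convention, stated here as an assumption: each restraint contributes the amount by which the model distance falls outside its bounds (zero if satisfied), and the RMS is taken over all restraints. The function name and arguments are hypothetical.

```python
import math


def rms_distance_violation(distances, upper_bounds, lower_bounds=None):
    """RMS violation in Å over all restraints (satisfied restraints count as 0).

    Assumed convention for this sketch; other definitions, such as
    averaging over violated restraints only, give different numbers.
    """
    if not distances:
        raise ValueError("at least one restraint is required")
    if lower_bounds is None:
        lower_bounds = [0.0] * len(distances)
    squared_sum = 0.0
    for d, upper, lower in zip(distances, upper_bounds, lower_bounds):
        violation = max(0.0, d - upper, lower - d)
        squared_sum += violation * violation
    return math.sqrt(squared_sum / len(distances))
```

For example, three restraints of which one is exceeded by 0.2 Å give an RMS violation of about 0.12 Å under this convention.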

Page 11: Dihedral restraint violations

(Workflow step 6: analysis)

[Figure: histogram of RMS dihedral restraint violations (degrees) against frequency]

• ORG (original entries): 1.6° (4.6°)

• CNW (recalculated in CNS and refined in water): 0.5° (0.5°)

Page 12: Results: quality indicators, performance of CNS vs. CYANA (no water refinement yet)

(Workflow step 6: analysis)

Average values over 545 entries:

| Indicator | Original PDB | CNS recalculation | CYANA recalculation |
|---|---|---|---|
| RMS distance restraint violations (Å) | 0.08 ± 0.14 | 0.04 ± 0.06 | 0.04 ± 0.05 |
| RMS dihedral restraint violations (degrees) | 1.6 ± 4.6 | 0.5 ± 0.7 | 0.5 ± 0.7 |
| Packing quality (Z-score), WHATCHECK | -3.5 ± 1.9 | -4.1 ± 1.9 | -4.3 ± 1.8 |
| Bumps per 100 residues | 73 ± 63 | 11 ± 9 | 86 ± 37 |
| % most favoured, PROCHECK | 69 ± 14 | 69 ± 13 | 61 ± 14 |

Page 13: Results: quality indicators, performance of CNS before and after water refinement

(Workflow step 6: analysis)

Average values over 545 entries:

| Indicator | Original PDB | CNS recalculation | CNS + water refinement |
|---|---|---|---|
| RMS distance restraint violations (Å) | 0.08 ± 0.14 | 0.04 ± 0.06 | 0.04 ± 0.05 |
| RMS dihedral restraint violations (degrees) | 1.6 ± 4.6 | 0.5 ± 0.7 | 0.5 ± 0.5 |
| Packing quality (Z-score), WHATCHECK | -3.5 ± 1.9 | -4.1 ± 1.9 | -2.5 ± 2.0 |
| Bumps per 100 residues | 73 ± 63 | 11 ± 9 | 10 ± 7 |
| % most favoured, PROCHECK | 69 ± 14 | 69 ± 13 | 76 ± 11 |

Page 14: Improvement: packing and Ramachandran Z-scores

(Workflow step 6: analysis)

[Figure: scatter plot of improvement in packing (x-axis) against improvement in Ramachandran (y-axis); one group of entries is labelled "missing data"]

• For about 5% of entries no improvement is possible because the available NMR data are incomplete compared with the data the authors had

• Improvement in Z-score: $\Delta Z = Z_{\text{refined}} - Z_{\text{original}}$
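
As a worked example of the Z-score improvement, the sketch below applies the definition to the average WHATCHECK packing Z-scores reported for the 545 entries (-3.5 for the original PDB entries, -2.5 after CNS recalculation with water refinement). The plot itself shows per-entry values; using the averages here is a simplification, and the function name is hypothetical.

```python
def z_improvement(z_refined: float, z_original: float) -> float:
    """Improvement in Z-score: refined minus original (positive = better)."""
    return z_refined - z_original


# Average packing Z-scores taken from the results slides.
packing_original = -3.5
packing_cns_water = -2.5

delta_packing = z_improvement(packing_cns_water, packing_original)
print(f"Average packing improvement: {delta_packing:+.1f}")
```

The average improvement of +1.0 corresponds to the "1 standard deviation closer" statement in the conclusions.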

Page 15: In search of correlations (Pearson coefficient)

(Workflow step 6: analysis)

Below the diagonal: original entries (correlations lower). Above the diagonal: refined entries (correlations higher).

| | data density | RMS violations | circular variance | packing (Z-score) | Ramachandran (Z-score) | bumps |
|---|---|---|---|---|---|---|
| data density | - | -0.23 | -0.46 | 0.35 | 0.31 | -0.03 |
| RMS violations | -0.11 | - | 0.22 | -0.25 | -0.37 | 0.58 |
| circular variance | -0.32 | 0.00 | - | -0.60 | -0.67 | 0.25 |
| packing (Z-score) | 0.32 | -0.06 | -0.49 | - | 0.69 | -0.39 |
| Ramachandran (Z-score) | 0.16 | -0.11 | -0.48 | 0.48 | - | -0.51 |
| bumps | 0.04 | 0.04 | 0.07 | -0.21 | -0.47 | - |
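
For reference, the Pearson coefficient used throughout these slides can be computed as in the sketch below. It is a plain standard-library implementation for two equally long lists of per-entry values; the function name is chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 or -1 indicates a strong linear relation, and a value near 0 indicates none.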

Page 16: In search of correlations (Bumps)

(Workflow step 6: analysis)

Correlation matrix repeated from Page 15 (original and refined entries).

Page 17: In search of correlations (NMR data density)

(Workflow step 6: analysis)

Correlation matrix repeated from Page 15 (original and refined entries).

Page 18: Correlation between NMR data density and Ramachandran Z-score

(Workflow step 6: analysis)

[Figure: scatter plot of NMR data density (x-axis) against Ramachandran Z-score (y-axis); r = 0.31]

Page 19: Correlation between NOE completeness and packing Z-score

(Workflow step 6: analysis)

[Figure: scatter plot of NOE completeness (x-axis) against packing Z-score (y-axis); r = 0.20]

NMR data-based indicators give no indication of the normality of the structures.

Page 20: In search of correlations (Precision)

(Workflow step 6: analysis)

Correlation matrix repeated from Page 15 (original and refined entries).

Page 21: Correlation between precision and data density

(Workflow step 6: analysis)

[Figure: scatter plot of NMR data density (x-axis) against circular variance (y-axis); r = -0.46]

Page 22: Correlation between precision and Ramachandran

(Workflow step 6: analysis)

[Figure: scatter plot of Ramachandran plot appearance (Z-score, x-axis) against circular variance (y-axis); r = -0.67; the entry 1SUT is labelled]

Proteins with high Ramachandran normality have a small circular variance.
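
The slides use circular variance as the measure of precision but do not define it. A widely used definition for a set of angles is one minus the length of the mean unit vector, which runs from 0 (all angles identical) to 1 (angles spread evenly around the circle). The sketch below assumes this definition, applied to one dihedral angle across the models of an ensemble; how the project combines the values per residue or per protein is not stated and is not modelled here.

```python
import math


def circular_variance(angles_deg):
    """Circular variance of angles in degrees: 1 - |mean unit vector|.

    Assumed definition for this sketch: 0 means identical angles,
    values near 1 mean the angles are spread around the circle.
    """
    if not angles_deg:
        raise ValueError("at least one angle is required")
    n = len(angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    return 1.0 - math.hypot(cos_sum, sin_sum) / n
```

Unlike an ordinary variance, this measure treats -179° and 179° as close together, which is why it suits dihedral angles.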

Page 23: Correlation between RMSD and structural uncertainty (QUEEN)

(Workflow step 6: analysis)

[Figure: scatter plot of structural uncertainty (x-axis) against backbone RMSD (y-axis); r = -0.69]

Structural uncertainty imposes a lower limit on the RMSD.

Page 24: Conclusions I

• NMR-STAR files made consistent for 545 out of about 1700 entries

• Protocols and scripts available for recalculation in CYANA and CNS

• Validation database available for testing new protocols

• Improvement compared to the original data: 1 standard deviation closer to the X-ray database
  • violations in the original data do not limit the recalculation effort
  • refinement in water is required
  • 5% show no improvement: data missing

Page 25: Conclusions II

• Correlations are higher after recalculation and refinement, though most of them are still weak

• Highest correlation: precision vs. Ramachandran score and structural uncertainty (QUEEN)

Page 26: Acknowledgements

• Utrecht University: Alexandre Bonvin, Rob Kaptein

• EBI Cambridge: Wim Vranken

• CESG/BMRB: Jurgen Doreleijers, Zachary Miller, Eldon Ulrich, John Markley

• Radboud University Nijmegen: Chris Spronk, Sander Nabuurs

• RIKEN Japan: Peter Güntert

• Institut Pasteur Paris: Michael Nilges