Swiss-Prot - 20 Year Celebration
www.pdb.org • [email protected]
The RCSB Protein Data Bank: Teaching an Old Dog New Tricks
Philip E. Bourne
A Tribute
From the guardian of a resource (institution) to all those men and women who make biology possible – may we never take you for granted.
Biocurator Perspectives
Agenda
• The old dog
• New tricks
  – Thinking differently about proteins
  – Virtual Communities
    • Internal (wwPDB)
    • External
• What will the resource look like in 2-5 years?
History of the Old Dog
1970s
• Community discussions about how to establish an archive of protein structures
• Cold Spring Harbor meeting in protein crystallography
• PDB established at Brookhaven (October 1971; 7 structures)
1980s
• Number of structures increases as technology improves
• Community discussions about requiring depositions
• IUCr guidelines established
• Number of structures deposited increases
1990s
• Ontology defined
• Structural genomics begins
• PDB moves to RCSB
2000s
• wwPDB formed
Unchanging Core Mission
• Create and maintain a well-curated database of macromolecular structure data derived using experimental methods
that will…
• Facilitate and support scientific research and education
that is…
• Always accessible to a diverse user community worldwide
• Developed in collaboration with that community
Challenges - Scientific
• More complex structures – molecular machines, complexes
• New methods (e.g., EM)
• Lack of a vocabulary to provide reductionism in complex structures
• Partially solved problems in analyzing structures – structure alignments, domain definitions, functional site determination and characterization, pathway relationships, interaction partners
• Integrating microscopic and macroscopic views
• Disease relationships
Growth and Complexity
[Figure: number of released entries per year]
Data Integration
[Figure: the Structure entry at the hub, linked to external resources, with the actions those links support]
Resources linked to each structure
• SWISS-PROT/GenBank IDs
• Gene Ontology
• Enzyme Commission
• Source Organism
• OMIM/Disease
• Genomes (NCBI Gene)
• Structural Genomics Targets
• PubMed
• NCBI Taxonomy
• Domains/Families (SCOP, CATH, PFAM)
• Primary References, Derived References
• Human Proteome & Homology Models
Some Actions
• Source Organism Browser
• GO Browsers
• Find Structures by GO ID
• Enzyme Browser
• Reactome
• Genome Browser
• SNPs Mapped to Structure
• Find Structures by SP ID
• Disease Browser
• CATH Browser
• SCOP Browser
• PFAM Display
• Abstract Search
• Target Search
• Function Coverage
• Target Selection
NAR 2005, 33: D233-D237
Challenges - Technical
• Sheer numbers
• Efficient visualization
• Improved annotation
• Demands from a more diverse user base
• Centralization versus decentralization
• Web V2
Diverse User Community (180,000 individuals per month) and Diversifying Further
• Structural biologists
• Computational biologists
• Experimental biologists
• Educators
• Students
• Lay public
New Tricks – Protein Representation
The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but is it time for a new view? After all, it is not how one protein sees another.
Limitations of a Cartesian Viewpoint
• A local viewpoint – does not capture the global properties of the protein
• Limited to a single-scale descriptor
• Limits comparative analysis
Superfamily Members – The Same But Different
Protein kinase-like superfamily. Left: rmsd distance matrix. Right: number of violations of the triangle inequality at each pair of proteins.
Alignment Violates the Triangle Inequality
$$|d(i,j) - d(j,k)| \le d(i,k) \le d(i,j) + d(j,k)$$
Many of the features in the distance matrix may be due to “distortions” induced by the failure to satisfy the TI.
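The violation count in the right-hand panel can be sketched as follows. This is a minimal illustration, assuming a symmetric matrix of pairwise distances (for example rmsd values from pairwise alignments) and assuming that pair (i, j) counts one violation for every third protein k that offers a shorter detour, d(i,k) + d(k,j) < d(i,j); the slide does not state the exact counting rule. The function name `triangle_violations` and the tolerance are hypothetical.

```python
def triangle_violations(d, tol=1e-9):
    """Count triangle-inequality violations for every pair in a distance matrix.

    Assumed (hypothetical) rule: pair (i, j) is violated by a third item k when
    d[i][j] exceeds the detour d[i][k] + d[k][j] by more than tol. The lower
    bound |d(i,k) - d(k,j)| <= d(i,j) is an upper-bound test for another pair,
    so it is covered when that pair is visited.
    """
    n = len(d)
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            violations = 0
            for k in range(n):
                if k in (i, j):
                    continue
                if d[i][j] > d[i][k] + d[k][j] + tol:
                    violations += 1
            counts[i][j] = counts[j][i] = violations
    return counts


# Toy matrix: the direct distance 0-2 (5.0) is longer than the detour via 1 (1.0 + 1.0).
toy = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]
print(triangle_violations(toy))  # prints [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
```

A true metric (for instance distances between points on a line) gives an all-zero count matrix, so any non-zero entry flags a pair whose distance cannot come from a single consistent geometry.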
An Alternative Approach: Multipolar Representation
• Roots in spherical harmonics
• Parameter space and boundary conditions can be a variety of properties
• Order of the multipoles defines the granularity of the descriptors
• Bottom line – interpreted as shape descriptors
Gramada & Bourne 2006 BMC Bioinformatics 7:242
Results – Protein Kinase Like Superfamily Alignment
Scheeff & Bourne 2005 PLoS Comp. Biol., 1(5) e49
Clear distinction between families.
Some clustering is seen within the TPKs that resembles various groups, even though there is little shape discrimination at this level.
Possibilities – Structure Based Phylogenetic Analysis
[Figure panels: Scheeff & Bourne; Multipoles]
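The slide does not say how the trees were built. As one common option, a structural distance matrix (alignment rmsd, or a distance between multipole descriptors) can be turned into a rooted tree with UPGMA, that is, average-linkage clustering. The sketch below is plain UPGMA, not necessarily the method used for the figure; the function name and the toy distances are hypothetical, and branch lengths are omitted for brevity.

```python
def upgma(labels, d):
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.

    Returns a Newick-style topology string without branch lengths.
    """
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}  # id -> (subtree, size)
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair; keys are stored as (low, high)
        (ti, si), (tj, sj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            # Size-weighted average distance from the merged cluster to cluster k.
            dist[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        del dist[(i, j)]
        clusters[next_id] = (f"({ti},{tj})", si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical structural distances between four proteins.
names = ["A", "B", "C", "D"]
dmat = [
    [0.0, 2.0, 6.0, 10.0],
    [2.0, 0.0, 6.0, 10.0],
    [6.0, 6.0, 0.0, 10.0],
    [10.0, 10.0, 10.0, 0.0],
]
print(upgma(names, dmat) + ";")  # prints (D,(C,(A,B)));
```

UPGMA assumes a roughly constant rate of change along all branches; neighbor joining is the usual alternative when that assumption is doubtful.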
New Tricks – Protein Motion
[Figure panels: Ordered Structures; Disordered Structures]
Structures exist in a spectrum from order to disorder
Obtaining Protein Dynamic Information: Protein Structures Treated as a 3-D Elastic Network
Bahar, I., A.R. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181.
Gaussian Network Model
• Each Cα atom is a node in the network.
• Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å).
• Decompose protein fluctuations into a sum of different modes.
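The three bullets above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes Cα coordinates given as (x, y, z) tuples in Å, uses the 7 Å cutoff from the slide, diagonalizes the Kirchhoff matrix with a simple cyclic Jacobi routine (a stand-in for a library eigensolver), drops the zero mode, and sums the remaining modes as u_i u_j / λ. Results are in units of 3kT/γ, which are omitted. All function names are hypothetical.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """Kirchhoff matrix: -1 for each C-alpha pair within cutoff, contact count on the diagonal."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def gnm_modes(coords, cutoff=7.0, eps=1e-8):
    """Non-zero GNM modes as (eigenvalue, eigenvector) pairs, slowest first."""
    eigvals, v = jacobi_eigh(kirchhoff(coords, cutoff))
    n = len(coords)
    modes = [(lam, [v[i][k] for i in range(n)]) for k, lam in enumerate(eigvals) if lam > eps]
    return sorted(modes, key=lambda mode: mode[0])


def gnm_covariance(coords, cutoff=7.0):
    """Sum over non-zero modes of u_i * u_j / lambda (the pseudo-inverse of the Kirchhoff matrix)."""
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for lam, u in gnm_modes(coords, cutoff):
        for i in range(n):
            for j in range(n):
                cov[i][j] += u[i] * u[j] / lam
    return cov


# Toy chain: four C-alpha atoms 3.8 A apart, so only sequence neighbours are in contact.
chain = [(3.8 * i, 0.0, 0.0) for i in range(4)]
cov = gnm_covariance(chain)
print([round(cov[i][i], 3) for i in range(4)])  # prints [0.875, 0.375, 0.375, 0.875]
```

The diagonal gives the mean-square fluctuation of each residue (the chain ends move most), and the off-diagonal entries are the cross-correlations used on the next slide. Because the model only needs a contact map and one eigen-decomposition, it is cheap enough for high-throughput use.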
Functional Flexibility Score
• Use correlated movements to help identify flexible regions of functional importance.
For each residue:
1. Find the maximum and minimum correlation.
2. Use these to scale the normalized fluctuation and so determine functional importance.
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
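Steps 1 and 2 name their inputs but not the exact scaling rule, which is given in Gu, Gribskov & Bourne 2006 and is not reproduced here. The sketch below computes only those inputs from a residue covariance matrix (for example a GNM covariance): normalized cross-correlations C_ij / sqrt(C_ii C_jj), each residue's maximum and minimum correlation with every other residue, and its fluctuation normalized by the largest one. Normalizing by the maximum is an assumption, and `ffs_inputs` is a hypothetical name.

```python
import math


def correlation_matrix(cov):
    """Normalized cross-correlations C_ij / sqrt(C_ii * C_jj), each in [-1, 1]."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)] for i in range(n)]


def ffs_inputs(cov):
    """Per residue: (normalized fluctuation, max correlation, min correlation).

    Fluctuations are scaled by the largest one (an assumption); correlations
    exclude the residue's trivial self-correlation of 1.
    """
    n = len(cov)
    corr = correlation_matrix(cov)
    largest = max(cov[i][i] for i in range(n))
    rows = []
    for i in range(n):
        others = [corr[i][j] for j in range(n) if j != i]
        rows.append((cov[i][i] / largest, max(others), min(others)))
    return rows


# Hypothetical 3-residue covariance matrix (symmetric, positive definite).
cov = [
    [2.0, 1.0, -1.0],
    [1.0, 2.0, 0.0],
    [-1.0, 0.0, 1.0],
]
for fluct, c_max, c_min in ffs_inputs(cov):
    print(f"{fluct:.2f} {c_max:+.2f} {c_min:+.2f}")
```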
Identifying Functionally Flexible Regions (FFRs) in HIV Protease
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
Other Examples: BPTI and Calmodulin
Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
Side Note: Gaussian Network Model vs Molecular Dynamics
• GNM is relatively coarse grained
• GNM is fast to compute compared with MD
  – Can look over larger time scales
  – Suitable for high throughput
An Active Research Program Around the Resource is Good for the Resource
Virtual Communities - Internal
Single worldwide archive of macromolecular structural data
• Ensures that the PDB remains a single & uniform archive publicly available to the worldwide community
• 3 founding members: RCSB PDB, PDBj, MSD-EBI
wwPDB Activities
• Collaborative projects
  – Remediation
    • taxonomy, ligands, literature
  – Single data processing system
Agenda
• The old dog
• New tricks
  – Thinking differently about proteins
  – Virtual Communities
    • Internal (wwPDB)
    • External (modeling, other…)
• What will the resource look like in 2-5 years?
Virtual Communities - External
Consider the PDB a gathering point through which virtual and real communities interact around a common interest.
• PDB-in-a-CAVE
• NJ Science Olympiad Science Expo
• Traveling art exhibit for lay audiences
• Website Tutorials/Feedback
• Molecule of the Month
[Figure labels: Real; Virtual]
Virtual Communities - Modelers
• Recommendations of Workshop
  – PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules
  – A central, publicly available archive (or technical equivalent thereof) or portal should be established for models
  – It was unanimously agreed that methods for assessing model quality are essential
Structure 2006 (to be published)
What Will the Resource Look Like in the Next 2-5 Years?
• Upwards of 75,000 structures
• Consensus (and different) views at the micro and macro scale – domains, SNPs, gene structure, cell localization, pathways, interactions, post-translational modification…
• Community annotation (cf. Wikipedia)
• Distributed subsets – External Reference Files (XML)
• MyPDB
• PDB-in-a-box
• Specialized visualization tools (mbt.sdsc.edu)
The Knowledge and Data Cycle
Is a database really different from a biological journal?
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
PLoS Comp Biol 2005 1(3) e34
Now assigning DOIs to structures
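As an illustration of a structure DOI: wwPDB entries today carry DOIs with the prefix 10.2210 built from the four-character PDB ID, for example 10.2210/pdb4HHB/pdb. The helper below is a hypothetical sketch of that pattern for classic four-character IDs only; it does not contact any registry, and the slide does not state the exact format used when DOI assignment began.

```python
import re

# Classic PDB IDs: a digit 1-9 followed by three alphanumerics (case-insensitive).
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Build a structure DOI of the form 10.2210/pdbXXXX/pdb (hypothetical helper)."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.upper()}/pdb"


print(pdb_doi("4hhb"))  # prints 10.2210/pdb4HHB/pdb
```

DOI names are case-insensitive, so upper-casing the ID only matches how the identifiers are usually displayed.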
Acknowledgements
The RCSB PDB
Jenny Gu: Protein Motions
NIH, NSF, DOE
Apostol Gramada: Multipole Analysis
A Protein is More than the Union of its Parts
• Breaking the protein into parts changes the object of the comparison
• This is interpreted in many cases to imply that the rmsd measure is inadequate.
• The reality is that it is the alignment of structures that breaks the triangle inequality, not the measure per se. The failure arises because we effectively compare different objects than we say we do.
From Røgen & Fain (2003), PNAS 100:119-124
An Alternative Approach: Multipolar Representation
Roots in Spherical Harmonics
• Parameterization + boundary conditions:
$$\{\text{charge distribution (i.e. structure)}\} \;\leftrightarrow\; \{q_{lm}^{\text{out}};\ M_{lm}^{\text{in}};\ q_{lm}^{i};\ M_{lm}^{i}\}$$
[Diagram label: Scalar potential]
Gramada & Bourne 2006 BMC Bioinformatics 7:242
Spatial distribution of a scalar quantity
• “Out” multipoles
$$q_{lm} = \sum_{i=1}^{N} r_i^{l}\, Y_{lm}^{*}(\theta_i, \phi_i), \qquad l = 0, \dots, \infty;\quad m = -l, \dots, l$$
For a given rank l, they form a (2l+1)-dimensional vector under 3D rotations:
$$\mathbf{q}_l = \{q_{l,m}\}_{m=-l,\dots,l}$$
Vector algebra applies ⇒ metric properties
Gramada & Bourne 2006 BMC Bioinformatics 7:242
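The definition of the “out” multipoles can be sketched numerically. This is a minimal illustration, not the code of Gramada & Bourne: it assumes unit weight per atom (as in the formula as written), places the origin at the centroid (a choice made here so the descriptors are translation invariant), computes Y_lm from the standard associated-Legendre recurrence with the Condon-Shortley phase, and uses the norm of each rank-l vector, ||q_l||, as a rotation-invariant descriptor. Comparing descriptor vectors with a Euclidean distance is one simple way to exploit the metric properties. All names are hypothetical.

```python
import cmath
import math


def assoc_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l, including the Condon-Shortley phase."""
    somx2 = math.sqrt(max(0.0, (1.0 - x) * (1.0 + x)))
    pmm, fact = 1.0, 1.0
    for _ in range(m):  # P_m^m = (-1)^m (2m-1)!! (1-x^2)^(m/2)
        pmm *= -fact * somx2
        fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm  # P_{m+1}^m
    for ll in range(m + 2, l + 1):  # upward recurrence in l
        pmm, pmmp1 = pmmp1, ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1


def sph_harm(l, m, theta, phi):
    """Orthonormal complex spherical harmonic Y_lm(theta, phi)."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi) * math.factorial(l - am) / math.factorial(l + am))
    y = norm * assoc_legendre(l, am, math.cos(theta)) * cmath.exp(1j * am * phi)
    return (-1) ** am * y.conjugate() if m < 0 else y


def out_multipoles(coords, lmax):
    """q_lm = sum_i r_i^l conj(Y_lm(theta_i, phi_i)) about the centroid, unit weight per point."""
    n = len(coords)
    centre = [sum(p[k] for p in coords) / n for k in range(3)]
    q = {(l, m): 0j for l in range(lmax + 1) for m in range(-l, l + 1)}
    for p in coords:
        x, y, z = (p[k] - centre[k] for k in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        theta = math.acos(max(-1.0, min(1.0, z / r))) if r > 0 else 0.0
        phi = math.atan2(y, x)
        for (l, m) in q:
            q[(l, m)] += r ** l * sph_harm(l, m, theta, phi).conjugate()
    return q


def rank_norms(q, lmax):
    """||q_l|| for each rank l; invariant because q_l transforms unitarily under rotation."""
    return [math.sqrt(sum(abs(q[(l, m)]) ** 2 for m in range(-l, l + 1))) for l in range(lmax + 1)]


def descriptor_distance(a, b):
    """Euclidean distance between descriptor vectors (a true metric)."""
    return math.dist(a, b)


# Usage: the descriptors do not change when the points are rotated and translated.
pts = [(1.0, 0.2, -0.5), (0.3, 2.0, 0.1), (-1.2, 0.4, 0.9), (0.5, -0.7, 1.5), (2.0, 1.0, 0.0)]
t = 0.7  # rotation angle about the x axis
moved = [(x + 3.0, y * math.cos(t) - z * math.sin(t) - 1.0, y * math.sin(t) + z * math.cos(t) + 2.0)
         for x, y, z in pts]
d1 = rank_norms(out_multipoles(pts, 3), 3)
d2 = rank_norms(out_multipoles(moved, 3), 3)
print(descriptor_distance(d1, d2))  # zero up to rounding error
```

With unit weights and centroid-centred coordinates the rank-0 term only counts the atoms and the rank-1 term vanishes, so in this sketch the shape information is carried by l ≥ 2; raising `lmax` adds finer detail, which mirrors the statement that the order of the multipoles sets the granularity.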
The multipoles can be interpreted as shape descriptors
In principle, the entire series of multipoles can be used to reconstruct the scalar field and therefore the density, i.e., the entire set of Cartesian coordinates of the structure, at a geometric level of detail
Partitioning the multipole series according to the various representations of the rotation group allows a multi-scale description of the structure
Gramada & Bourne 2006 BMC Bioinformatics 7:242