What constitutes structural bioinformatics, with two example areas from our own work: studying evolution using structure, and what really happens when we take a drug. Presented to UCSD medical students in years 1-3.
Structural Bioinformatics with Examples Drawn from Our Own Work
Philip E. Bourne Professor of Pharmacology UCSD
Associate Vice Chancellor for Innovation & Industry Alliances
MED264 10/24/13
How I Got Excited
Some Things Stay with You Your Whole Life
Drivers: Numbers & Complexity
[Figure: number of released entries]
Courtesy of the RCSB Protein Data Bank
Putting Structural Bioinformatics in Perspective
Pharmacy Informatics: drug dosing, pharmacokinetics, pharmacy information systems
Biomedical Informatics: EHR, decision support systems, hospital information systems
Bioinformatics: algorithms, genomics, proteomics, biological networks, systems biology
Note: These are only representative examples
Putting Structural Bioinformatics in Perspective (continued)
Pharmacy Informatics, Biomedical Informatics, Bioinformatics
Controlled vocabularies, ontologies, literature searching, data management, pharmacogenomics, personalized medicine
Structural Bioinformatics
Structural Bioinformatics – Example Topics
• Structure prediction
• Evolution
• Drug discovery
• Sequence-structure-function relationships…
Video: http://www.scivee.tv/node/11616
Determining 3D Structures – X-ray Crystallography
Basic Steps
• Target Selection
• Crystallomics: isolation, expression, purification, crystallization
• Data Collection
• Structure Solution
• Structure Refinement
• Functional Annotation
• Publish
Structural biology moves from being functionally driven to genomically driven
Figure annotations: fill in protein fold space; robotics; -ve data; software engineering; functional prediction; not necessarily
Enough background. Let's look at two fundamental questions where structural bioinformatics is critical:
1. Is structure useful in studying evolution and what can we learn?
2. What really happens when we take a drug?
Nature’s Reductionism
There are ~20^300 possible proteins >>>> all the atoms in the Universe
~45M protein sequences from UniProt
~90,000 protein structures
Yield ~1500 folds, ~2000 superfamilies, ~4000 families (SCOP 1.75)
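The size of that sequence space can be checked with a line of arithmetic. The sketch below assumes the figure means 20 amino acids at each of 300 positions, and it uses the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe. Both numbers are assumptions for illustration and are not stated on the slide.

```python
import math

AMINO_ACIDS = 20      # standard amino acid alphabet
LENGTH = 300          # assumed chain length behind the 20^300 figure
ATOMS_EXPONENT = 80   # assumed: roughly 10^80 atoms in the observable universe

# Work in log10 so the numbers stay readable.
sequence_exponent = LENGTH * math.log10(AMINO_ACIDS)

print(f"20^300 is about 10^{sequence_exponent:.0f}")
print(f"that exceeds the atom count by a factor of about 10^{sequence_exponent - ATOMS_EXPONENT:.0f}")
```

Against that space, the ~1500 folds observed so far are a very small set, which is the sense of "reductionism" in the slide title.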
Structure Provides an Evolutionary Fingerprint
Distribution among the three kingdoms as taken from SUPERFAMILY
• Superfamily distributions would seem to be related to the complexity of life
[Figure: Venn diagram of SCOP folds (765 total) across Eukaryota (650), Archaea (416) and Bacteria (564); region counts 2, 42, 10, 135, 118, 387, 17]
[Figure: region counts given as any genome / all genomes: 153/14, 9/1, 21/2, 310/0, 645/49, 29/0, 68/0]
Method – Distance Determination
Presence/Absence Data Matrix (rows: SCOP superfamilies (FSF) from SUPERFAMILY; columns: organisms)
FSF        C. intestinalis   C. briggsae   F. rubripes
a.1.1      1                 1             1
a.1.2      1                 1             1
a.10.1     0                 0             1
a.100.1    1                 1             1
a.101.1    0                 0             0
a.102.1    0                 1             1
a.102.2    1                 1             1

Distance Matrix
                   C. intestinalis   C. briggsae   F. rubripes
C. intestinalis    0                 101           109
C. briggsae                          0             144
F. rubripes                                        0
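As a minimal sketch of how a presence/absence matrix can be turned into a distance matrix, the code below counts the superfamilies present in one organism but not the other (a Hamming distance). The choice of Hamming distance is an assumption; the slide does not state which measure was used. Only the seven rows shown in the excerpt are included, so the output will not reproduce the full-matrix values 101, 109 and 144.

```python
from itertools import combinations

ORGANISMS = ["C. intestinalis", "C. briggsae", "F. rubripes"]

# Rows copied from the excerpt of the presence/absence matrix.
ROWS = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}

def distance_matrix(rows, n_organisms):
    """For each pair of organisms, count superfamilies present in only one of them."""
    dist = [[0] * n_organisms for _ in range(n_organisms)]
    for a, b in combinations(range(n_organisms), 2):
        d = sum(1 for row in rows.values() if row[a] != row[b])
        dist[a][b] = dist[b][a] = d
    return dist

dist = distance_matrix(ROWS, len(ORGANISMS))
for name, row in zip(ORGANISMS, dist):
    print(f"{name:16s}", row)
```

A matrix of this form is the usual input to distance-based tree-building methods.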
If Structure is so Conserved, is it a Useful Tool in the Study of Evolution?
The Answer Would Appear to be Yes
• It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies (FSFs) within a given proteome
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
The Influence of Environment on Life
Chris Dupont, Scripps Institution of Oceanography, UCSD
Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827
Consider the Distribution of Disulfide Bonds among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~ 2.0 billion years ago
• Logical deduction – disulfides more prevalent in folds (organisms) that evolved later
• This would seem to hold true
• Can we take this further?
[Figure: Venn diagram of SCOP folds (708 total) across Eukaryota, Archaea and Bacteria, giving the share of folds in each region that contain disulfide bonds: 0% (0/2), 16.7% (7/42), 0% (0/10), 31.9% (43/135), 14.4% (17/118), 4.7% (18/387), 5.9% (1/17)]
Evolution of the Earth
• 4.5 billion years of change
• 300 ± 50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological changes
• Life has evolved in this time
• The ocean was the “cradle” for 90% of evolution
Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History
[Figure: concentration (O2 in arbitrary units, Zn and Fe in moles L-1) against billions of years before present (4.5 to 0), with curves for oxygen, zinc, iron, cobalt and manganese, and markers for Bacteria, Archaea and Eukarya]
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean: solid lines, euxinic ocean: dashed lines).
• The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.
Replotted from Saito et al. 2003 Inorganica Chimica Acta 356: 308-318
The Gaia Hypothesis
Gaia - a complex entity involving the Earth's biosphere, atmosphere, oceans, and soil; the totality constituting a feedback system which seeks an optimal physical and chemical environment for life on this planet.
James Lovelock
Gaia (pronounced /'geɪ.ə/ or /'gaɪ.ə/) "land" or "earth", from the Greek Γαῖα; is a Greek goddess personifying the Earth
The Question
• Have the emergent properties of an organism as judged by its protein content been influenced by the environment?
• Will do this by consideration of the metallomes of a broad range of species
• The metallomes can only be deduced by consideration of the protein structures to which the metal is covalently bound
• Will hypothesize that these emergent properties in turn influenced the environment
Bacteria Fe superfamilies
a.1.1 a.1.2
a.104.1 a.110.1
a.119.1 a.138.1
a.2.11 a.24.3
a.24.4 a.25.1
a.3.1 a.39.3
a.56.1 a.93.1
b.1.13 b.2.6
b.3.6 b.33.1
b.70.2 b.82.2
c.56.6 c.83.1
c.96.1 d.134.1
d.15.4 d.174.1
d.178.1 d.35.1
d.44.1 d.58.1
e.18.1 e.19.1
e.26.1 e.5.1
f.21.1 f.21.2
f.24.1 f.26.1
g.35.1 g.36.1
g.41.5
Eukaryotic Fe superfamilies
a.1.1 a.1.2
a.104.1 a.110.1
a.119.1 a.138.1
a.2.11 a.24.3
a.24.4 a.25.1
a.3.1 a.39.3
a.56.1 a.93.1
b.1.13 b.2.6
b.3.6 b.33.1
b.70.2 b.82.2
c.56.6 c.83.1
c.96.1 d.134.1
d.15.4 d.174.1
d.178.1 d.35.1
d.44.1 d.58.1
e.18.1 e.19.1
e.26.1 e.5.1
f.21.1 f.21.2
f.24.1 f.26.1
g.35.1 g.36.1
g.41.5
Superfamily Distribution As Well As Overall Content Has Changed
Metal Binding Proteins are Not Consistent Across Superkingdoms
[Figure: (A) total Zn-binding domains in a proteome against total domains in a proteome, on logarithmic axes, for Archaea, Bacteria and Eukarya; (B) slope of the fitted power law (0 to 2) for Zn, Fe, Mn and Co in each Superkingdom]
Since these data are derived from current species they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis
Power Laws: Fundamental Constants in the Evolution of Proteomes
A slope of 1 indicates that a group of structural domains is in equilibrium with genome
growth, while a slope > 1 indicates that the group of domains is being preferentially
duplicated (or retained in the case of genome reductions).
van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP, (Ed.). Power laws, scale-free networks, and genome biology
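To make the slope interpretation concrete, here is a minimal sketch that fits a power law by least squares on log-transformed counts. The proteome sizes and domain counts are synthetic, hypothetical values chosen so that one group grows in proportion to the proteome and the other grows faster; they are not data from the study.

```python
import math

def power_law_slope(total_domains, group_domains):
    """Least-squares slope of log10(group count) against log10(total count)."""
    xs = [math.log10(v) for v in total_domains]
    ys = [math.log10(v) for v in group_domains]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic proteome sizes (hypothetical, for illustration only).
totals = [1000, 2000, 4000, 8000, 16000]
proportional = [0.05 * t for t in totals]        # grows in step with the proteome
preferred = [0.001 * t ** 1.5 for t in totals]   # grows faster than the proteome

slope_proportional = power_law_slope(totals, proportional)
slope_preferred = power_law_slope(totals, preferred)
print(round(slope_proportional, 2), round(slope_preferred, 2))
```

The first group should give a slope close to 1 and the second a slope close to 1.5, matching the "equilibrium" and "preferentially duplicated" readings described above.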
Why are the Power Laws Different for Each Superkingdom?
• Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are correlated to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen
• We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each Superkingdom
• This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments
Do the Metallomes Contain Further Support for this Hypothesis?
Superkingdom  Fold family                   % Fe-binding   Fe bound by  O2
Eukarya       Cytochrome P450               0.44 ± 0.48    heme         yes
Eukarya       Cytochrome c3-like            0.13 ± 0.3     heme         no
Eukarya       Cytochrome b5                 0.12 ± 0.09    heme         no
Eukarya       Purple acid phosphatase       0.11 ± 0.08    amino        no
Eukarya       Penicillin synthase-like      0.07 ± 0.1     amino        yes
Eukarya       Hypoxia-inducible factor      0.07 ± 0.04    amino        yes
Eukarya       Di-heme elbow motif           0.06 ± 0.01    heme         no
Archaea       4Fe-4S ferredoxins            1.80 ± 0.7     Fe-S         no
Archaea       MoCo biosynthesis proteins    1.60 ± 0.3     Fe-S         no
Archaea       Heme-binding PAS domain       1.10 ± 1.0     heme         no
Archaea       HemN                          0.80 ± 0.20    Fe-S         1
Archaea       α-helical ferredoxin          0.60 ± 0.16    Fe-S         no
Archaea       Biotin synthase               0.55 ± 0.1     Fe-S         no
Archaea       ROO N-terminal domain-like    0.5 ± 0.1      amino        2
Bacteria      High potential iron protein   0.38 ± 0.25    Fe-S         no
Bacteria      Heme-binding PAS domain       0.3 ± 0.4      heme         1
Bacteria      MoCo biosynthesis proteins    0.21 ± 0.15    Fe-S         no
Bacteria      HemN                          0.2 ± 0.15     Fe-S         no
Bacteria      4Fe-4S ferredoxins            0.2 ± 0.2      Fe-S         no
Bacteria      Cytochrome c                  0.14 ± 0.2     heme         no
Bacteria      α-helical ferredoxin          0.12 ± 0.09    Fe-S         no

Overall percent of Fe bound by:
Superkingdom  Fe-S       heme       amino
Eukarya       21 ± 9     47 ± 19    32 ± 12
Archaea       68 ± 12    13 ± 14    19 ± 6
Bacteria      47 ± 11    22 ± 12    31 ± 16

1. Some, but not all, PAS domains actually sense oxygen
2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway
e- Transfer Proteins
Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
Fe-S clusters
• Fe bound by S
• Cluster held in place by Cys
• Generally negative reduction potentials
• Very susceptible to oxidation
Cytochromes
• Fe bound by heme (and amino acids)
• Generally positive reduction potentials
• Less susceptible to oxidation
Hypothesis
• Emergence of cyanobacteria changed oxygen concentrations
• Impacted relative metal ion concentrations in the ocean
• Organisms evolved to use these metals in new ways, enabling new biological processes, e.g. complex signaling
• This in turn further impacted the environment
• Only protein structures could reveal such dependencies
What really happens when we take a drug?
Our Motivation
• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive
Collins and Workman 2006 Nature Chemical Biology 2 689-700
A Reverse Engineering Approach to Drug Discovery Across Gene Families
1. Characterize ligand binding site of primary target (Geometric Potential)
2. Identify off-targets by ligand binding site similarity (sequence order independent profile-profile alignment)
3. Extract known drugs or inhibitors of the primary and/or off-targets
4. Search for similar small molecules
5. Dock molecules to both primary and off-targets
6. Statistical analysis of docking score correlations
…
Xie and Bourne 2009 Bioinformatics 25(12) 305-312
Characterization of the Ligand Binding Site - The Geometric Potential
• Initially assign each Cα atom a value that is the distance to the environmental boundary
• Update the value with those of surrounding Cα atoms, dependent on distances and orientation; atoms within a 10 Å radius define the neighbors i

$$GP = P + \sum_{i \in \text{neighbors}} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}$$

Here P is the initial value of the atom, P_i the value of neighbor i, D_i the distance to that neighbor and α_i the orientation angle.
Conceptually similar to hydrophobicity or electrostatic potential that is dependent on both global and local environments
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
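A minimal sketch of the update step, assuming the formula reads GP = P + Σ P_i / (D_i + 1.0) × (cos(α_i) + 1.0) / 2.0 over the neighbors, as reconstructed from the slide. The function name, the input layout and the numeric values are hypothetical, and the angle is assumed to be supplied in radians.

```python
import math

def geometric_potential(p, neighbors):
    """GP = P + sum over neighbors of P_i / (D_i + 1.0) * (cos(alpha_i) + 1.0) / 2.0.

    p         -- initial value of the atom (distance to the environmental boundary)
    neighbors -- iterable of (p_i, d_i, alpha_i), with alpha_i in radians (assumed)
    """
    total = p
    for p_i, d_i, alpha_i in neighbors:
        total += p_i / (d_i + 1.0) * (math.cos(alpha_i) + 1.0) / 2.0
    return total

# Hypothetical values: one atom with three neighbors inside the 10 A radius.
neighbors = [
    (4.0, 3.0, 0.0),          # angle 0: orientation factor is 1
    (6.0, 5.0, math.pi / 2),  # angle 90 degrees: orientation factor is 0.5
    (8.0, 7.0, math.pi),      # angle 180 degrees: orientation factor is 0
]
gp = geometric_potential(2.0, neighbors)
print(round(gp, 3))
```

The distance term down-weights far neighbors and the cosine term down-weights neighbors at large angles, so the value blends the local and global environment as the slide describes.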
Discrimination Power of the Geometric Potential
[Figure: distribution of the geometric potential (x-axis, 0 to 99) for binding site and non-binding site; y-axis 0 to 4]
• Geometric potential can distinguish binding and non-binding sites
[Figure: Geometric Potential Scale, 100 to 0]
Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
For Residue Clusters
Local Sequence-order Independent Alignment with Maximum-Weight Sub-Graph Algorithm
[Figure: residues L, E, R, V, K, D, L shown in Structure A and in Structure B]
• Build an associated graph from the graph representations of the two structures being compared. Each node is assigned a weight from the similarity matrix
• The maximum-weight clique corresponds to the optimum alignment of the two structures
Xie and Bourne 2008 PNAS, 105(14) 5441
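The idea can be sketched on a toy example. This is not the authors' implementation: the edge rule (two residue pairs are compatible when their inter-residue distances agree within a tolerance), the weights 2 / 1 / 0, the coordinates and the exhaustive clique search are all assumptions made for illustration. Only the four chemical groups are taken from the slides.

```python
import math
from itertools import combinations

# Chemical groups from the slides; the weights 2 / 1 / 0 are hypothetical.
GROUPS = ["LVIMC", "AGSTP", "FYW", "EDNQKRH"]

def similarity(res_a, res_b):
    if res_a == res_b:
        return 2
    return 1 if any(res_a in g and res_b in g for g in GROUPS) else 0

def association_graph(struct_a, struct_b, tolerance=1.0):
    """Nodes pair residue i of A with residue j of B; an edge joins two nodes when
    the two residue-residue distances agree within `tolerance` (assumed rule)."""
    nodes = {}
    for i, (res_a, _) in enumerate(struct_a):
        for j, (res_b, _) in enumerate(struct_b):
            weight = similarity(res_a, res_b)
            if weight > 0:
                nodes[(i, j)] = weight
    edges = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # each residue may be matched at most once
        d_a = math.dist(struct_a[i][1], struct_a[k][1])
        d_b = math.dist(struct_b[j][1], struct_b[l][1])
        if abs(d_a - d_b) <= tolerance:
            edges[(i, j)].add((k, l))
            edges[(k, l)].add((i, j))
    return nodes, edges

def max_weight_clique(nodes, edges):
    """Exhaustive search; fine for toy inputs, exponential in general."""
    best_weight, best_clique = 0, []

    def extend(clique, weight, candidates):
        nonlocal best_weight, best_clique
        if weight > best_weight:
            best_weight, best_clique = weight, clique
        for idx, node in enumerate(candidates):
            remaining = [c for c in candidates[idx + 1:] if c in edges[node]]
            extend(clique + [node], weight + nodes[node], remaining)

    extend([], 0, list(nodes))
    return best_weight, sorted(best_clique)

# Structure B has the same spatial arrangement as A, shifted and in a different order.
struct_a = [("L", (0, 0, 0)), ("E", (3, 0, 0)), ("R", (0, 4, 0)), ("V", (0, 0, 5))]
struct_b = [("V", (10, 0, 5)), ("L", (10, 0, 0)), ("E", (13, 0, 0)), ("R", (10, 4, 0))]

nodes, edges = association_graph(struct_a, struct_b)
weight, alignment = max_weight_clique(nodes, edges)
print(weight, alignment)
```

Because the match depends only on pairwise geometry and residue similarity, the alignment is recovered even though the residues appear in a different order in the two structures, which is what "sequence-order independent" refers to.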
Similarity Matrix of Alignment
Chemical Similarity
• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)
• Amino acid chemical similarity matrix
Evolutionary Correlation
• Amino acid substitution matrix such as BLOSUM45
• Similarity score between two sequence profiles
$$d = \sum_{i=1}^{20} f_a^i S_b^i + \sum_{i=1}^{20} f_b^i S_a^i$$

f_a, f_b are the 20 amino acid target frequencies of profiles a and b, respectively
S_a, S_b are the PSSMs of profiles a and b, respectively
Xie and Bourne 2008 PNAS, 105(14) 5441
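A minimal sketch of the profile-profile score for one pair of aligned positions, assuming it is the sum of each profile's target frequencies weighted by the other profile's PSSM scores: d = Σ f_a^i S_b^i + Σ f_b^i S_a^i. The function name and the numbers are hypothetical, and a three-letter alphabet stands in for the 20 amino acids to keep the example short.

```python
def profile_score(f_a, s_a, f_b, s_b):
    """d = sum_i f_a[i] * s_b[i] + sum_i f_b[i] * s_a[i] for one aligned position pair."""
    if not (len(f_a) == len(s_a) == len(f_b) == len(s_b)):
        raise ValueError("all four vectors must cover the same amino acids")
    return (sum(fa * sb for fa, sb in zip(f_a, s_b))
            + sum(fb * sa for fb, sa in zip(f_b, s_a)))

# Hypothetical three-letter alphabet; real profiles have 20 entries per position.
f_a, s_a = [0.7, 0.2, 0.1], [3, -1, -2]   # target frequencies and PSSM column of profile a
f_b, s_b = [0.6, 0.3, 0.1], [2, 0, -3]    # target frequencies and PSSM column of profile b

d = profile_score(f_a, s_a, f_b, s_b)
print(round(d, 2))
```

The score is symmetric in the two profiles, so swapping a and b gives the same value.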
We are particularly interested in applying these techniques to neglected diseases
The Problem with Tuberculosis
• One third of global population infected
• 1.7 million deaths per year
• 95% of deaths in developing countries
• Anti-TB drugs hardly changed in 40 years
• MDR-TB and XDR-TB pose a threat to human health worldwide
• Development of novel, effective and inexpensive drugs is an urgent priority
The TB-Drugome
1. Determine the TB structural proteome
2. Determine all known drug binding sites from the PDB
3. Determine which of the sites found in 2 exist in 1
4. Call the result the TB-drugome
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
1. Determine the TB Structural Proteome
[Figure: TB proteome of 3,996 proteins: 284 with solved structures, 1,446 with homology models, 2,266 remaining]
• High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
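The coverage percentages follow from the counts on the slide. The sketch below assumes the figure is read as 3,996 proteins in the TB proteome, 284 with solved structures and 1,446 more covered by homology models; that assignment of numbers to labels is an inference from the extracted figure text.

```python
TB_PROTEOME = 3996       # proteins in the TB proteome (assumed reading of the figure)
SOLVED = 284             # proteins with solved structures
HOMOLOGY_MODELS = 1446   # additional proteins covered by homology models

def coverage(covered, total):
    """Percentage of the proteome with structural coverage, to one decimal place."""
    return round(100.0 * covered / total, 1)

solved_only = coverage(SOLVED, TB_PROTEOME)
with_models = coverage(SOLVED + HOMOLOGY_MODELS, TB_PROTEOME)
uncovered = TB_PROTEOME - SOLVED - HOMOLOGY_MODELS

print(solved_only, with_models, uncovered)
```

Under that reading the numbers reproduce the quoted 7.1% and 43.3%, and the remainder matches the fourth figure value, 2,266.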
2. Determine all Known Drug Binding Sites in the PDB
• Searched the PDB for protein crystal structures bound with FDA-approved drugs
• 268 drugs bound in a total of 931 binding sites
[Figure: number of drug binding sites per drug; labeled drugs include methotrexate, chenodiol, alitretinoin, conjugated estrogens, darunavir and acarbose]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
Map 2 onto 1 – The TB-Drugome
http://funsite.sdsc.edu/drugome/TB/
Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).
From a Drug Repositioning Perspective
• Similarities between drug binding sites and TB proteins are found for 61/268 drugs
• 41 of these drugs could potentially inhibit more than one TB protein
[Figure: number of potential TB targets per drug; labeled drugs include raloxifene, alitretinoin, conjugated estrogens, methotrexate, ritonavir, testosterone, levothyroxine and chenodiol]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
Top 5 Most Highly Connected Drugs
Drug: levothyroxine
Intended targets: transthyretin, thyroid hormone receptor α & β-1, thyroxine-binding globulin, mu-crystallin homolog, serum albumin
Indications: hypothyroidism, goiter, chronic lymphocytic thyroiditis, myxedema coma, stupor
No. of connections: 14
TB proteins: adenylyl cyclase, argR, bioD, CRP/FNR trans. reg., ethR, glbN, glbO, kasB, lrpA, nusA, prrA, secA1, thyX, trans. reg. protein

Drug: alitretinoin
Intended targets: retinoic acid receptor RXR-α, β & γ, retinoic acid receptor α, β & γ-1&2, cellular retinoic acid-binding protein 1&2
Indications: cutaneous lesions in patients with Kaposi's sarcoma
No. of connections: 13
TB proteins: adenylyl cyclase, aroG, bioD, bpoC, CRP/FNR trans. reg., cyp125, embR, glbN, inhA, lppX, nusA, pknE, purN

Drug: conjugated estrogens
Intended targets: estrogen receptor
Indications: menopausal vasomotor symptoms, osteoporosis, hypoestrogenism, primary ovarian failure
No. of connections: 10
TB proteins: acetylglutamate kinase, adenylyl cyclase, bphD, CRP/FNR trans. reg., cyp121, cysM, inhA, mscL, pknB, sigC

Drug: methotrexate
Intended targets: dihydrofolate reductase, serum albumin
Indications: gestational choriocarcinoma, chorioadenoma destruens, hydatidiform mole, severe psoriasis, rheumatoid arthritis
No. of connections: 10
TB proteins: acetylglutamate kinase, aroF, cmaA2, CRP/FNR trans. reg., cyp121, cyp51, lpd, mmaA4, panC, usp

Drug: raloxifene
Intended targets: estrogen receptor, estrogen receptor β
Indications: osteoporosis in post-menopausal women
No. of connections: 9
TB proteins: adenylyl cyclase, CRP/FNR trans. reg., deoD, inhA, pknB, pknE, Rv1347c, secA1, sigC
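The connection counts in the table can be treated as a small drug-to-protein network. The sketch below, with protein names copied from the table, counts the connections per drug and asks which TB proteins are shared by the most drugs. The variable names are illustrative, and this is a simple tally rather than the analysis used in the paper.

```python
from collections import Counter

# TB proteins listed for each of the five drugs in the table.
connections = {
    "levothyroxine": ["adenylyl cyclase", "argR", "bioD", "CRP/FNR trans. reg.", "ethR",
                      "glbN", "glbO", "kasB", "lrpA", "nusA", "prrA", "secA1", "thyX",
                      "trans. reg. protein"],
    "alitretinoin": ["adenylyl cyclase", "aroG", "bioD", "bpoC", "CRP/FNR trans. reg.",
                     "cyp125", "embR", "glbN", "inhA", "lppX", "nusA", "pknE", "purN"],
    "conjugated estrogens": ["acetylglutamate kinase", "adenylyl cyclase", "bphD",
                             "CRP/FNR trans. reg.", "cyp121", "cysM", "inhA", "mscL",
                             "pknB", "sigC"],
    "methotrexate": ["acetylglutamate kinase", "aroF", "cmaA2", "CRP/FNR trans. reg.",
                     "cyp121", "cyp51", "lpd", "mmaA4", "panC", "usp"],
    "raloxifene": ["adenylyl cyclase", "CRP/FNR trans. reg.", "deoD", "inhA", "pknB",
                   "pknE", "Rv1347c", "secA1", "sigC"],
}

# Number of TB proteins connected to each drug.
degree = {drug: len(proteins) for drug, proteins in connections.items()}

# Number of these five drugs connected to each TB protein.
shared = Counter(p for proteins in connections.values() for p in proteins)

print(degree)
print(shared.most_common(3))
```

Among these five drugs, the CRP/FNR transcriptional regulator is connected to all of them, followed by adenylyl cyclase (four drugs) and inhA (three drugs).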
Chang et al. 2010 PLoS Comp. Biol. 6(9): e1000938 & Chang et al. 2013 BMC Systems Biology 7:102
Systems Pharmacology
A closing note…
Your Social Responsibility
Josh Sommer and Chordoma Disease
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation#fullprogram