Feb 1, 2008 Professional Development Series 1. 2 Metallomes are very diverse Ubiquitous metal binding folds? Very few folds are found in all or most (>90%)

Feb 1, 2008 Professional Development Series 1


Metallomes are very diverse

Ubiquitous metal binding folds? Very few folds are found in all or most (>90%) proteomes. These include the tRNA synthases (Zn), Enolases (Mn), HemN (O 2 independent coproporphyrin oxigenase), and HighPotentialIronProteins (HIPIP)

IntroductionChemical speciation modeling shows that Fe, Zn, Mn, and Co concentrations in an Archaean anoxic ocean, a Proterozoic euxinic ocean, and a Modern oxic ocean would have been quite different (Fig. 1). R.J.P. Williams and J.J.R. Frausto da Silva have long contended that these changes have had an indelible effect upon the evolution of life, particularly in the selection of elements for biological usage. Their theories further posit that this selective force will have left imprints in the genomes of organisms, though this has not been tested. Here we present the metal-binding structural contents of modern proteomes, as they are inferred from bioinformatics analysis of fully sequenced genomes. These results are reconciled with the theorized changes in global trace metal geochemistry.

Modern proteomes and putative “metallomic” imprints of ancient changes in geochemistry

Christopher L. Dupont1, Song Yang2, Brian Palenik1, Philip E. Bourne3

1.Scripps Institution of Oceanography, University of California, San Diego2.Department of Chemistry and Biochemistry, University of California, San Diego3. San Diego Supercomputer Center and the Department of Pharmacology, University of California, San DiegoContact: [email protected]

Methodology: Making the metallome

Figure 2: Pathway of metallome construction. The results of steps 1, 2, and 3 are contained in the Superfamily database Step 4 is done using a manual annotation from the SCOP database.

1. Genome Sequence (actg)

2. Proteome Sequence(amino acid)

3. HMM-based classification into structural fold families

4. A “metallome” for each proteome is constructed using a manually curated annotation of the SCOP database. Includes structural and functional information

Feheme bound

oxidative defense

ZnHis bound

carbon assimilation

FeHis bound

vitaminmetabolism

Power Laws: fundamental constants in the evolution of proteomesThe power law is described by the function y = mxb. A slope of 1 indicates that a group of structural domains is in equilibrium with

genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).

The number of metal binding structural domains (nm) in a proteome of size p at any given time (t) are described by the generic equation:

nm (t) = (nm (0) / p(0)<am>/<a> ) p(t)<am>/<a>

where <am> and <a> are time averages of the growth of a category and the entire proteome, respectively.

What does this mean?1.The first term (blue) is defined by a common ancestor (time zero), and thus is the same for all proteomes in a given

Superkingdom2.The second term is the slope of our observed power laws, indicating that the abundances of Zn, Fe, Mn, and Co binding

domains conform to Superkingdom-specific evolutionary constants, regardless of the evolutionary history of the organism.

Therefore:1.The proteomes of the Prokarya have preferentially retained or recruited Fe and Co binding

domains during increases or decreases in proteome size, respectively, while excluding Zn binding domains

2.Visa versa in the proteomes of Eukarya

References and Acknowledgements

1. Any work by JJR Frausto Da Silva and RJP Williams, Saito et al. 2003 Inorganica Chimica Acta 356: 308-318. Anbar and Knoll 2002 Science 297: 1137-1142, Van Nimwegen in Koonin et al. Power Laws. C.L.D. would like to thank the Princeton Center for Environmental Bioinorganic Chemistry and the ASEE (NDSEG fellowship) for funding; PEB is funded by NIH.

0

1

2

Zn Fe Mn Co

Archaea Bacteria Eukarya

Total domains in a proteomeT

ota

l Z

n-bin

din

g d

om

ain

s in

a p

rote

om

e

10

1

04

102.5 105

Slo

pe

of

fitt

ed p

ow

er l

aw

A B

The abundance of metal binding structures in a proteome adheres to a power law

Why are are the power laws different for each Superkingdom?Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are similar to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen

We hypothesize that they are the result of the environment of the last common ancestor in each Superkingdom

This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic enironmonts

Do the metallomes contain further support this hypothesis?

Figure 3: A quantile plot showing the percent of Bacterial proteomes each Fe-binding fold family occurs in (x). This plot also shows the average copy number of that fold family in the proteomes where it occurs (♦). Essentially, few Fe-binding folds are in most proteomes. Further, the widespread Fe-binding folds are not necessarily abundant. Similar trends are observed for Zn, Mn, and Co in all three Superkingdoms.

0

2

4

6

8

10

12

14

010

20304050

607080

90100

Unique Fe-binding fold families (108 total)

(x)

Per

cen

t of

Bac

teri

al p

rote

omes

wh

ich

a f

old

fam

ily

occu

rs i

n

(♦)Average cop

y nu

mb

er

Fe binding folds: Oxygen and redox shifts

Table 1: The seven most abundant Fe binding folds in each Superkingdom, along with the mode of Fe binding. Also shown is if O2 is present in reactions catalyzed by that fold. Essentially, Eukaryotic Fe binding folds are more likely to bind Fe by hemes or amino acids and also show an increased usage of oxygen.This is consistent with the hypothesis that Eukarya evolved in an oxic environment.

Overall percent of Fe bound bySuperkingdom Fold Family % Fe-binding O2 Fe-S heme amino

Cytochrome P450 0.44 + 0.48 heme yesCytochrome c3-like 0.13 + 0.3 heme noCytochrome b5 0.12 + 0.09 heme no

Eukarya Purple acid phosphatase 0.11 + 0.08 amino no 21 + 9 47 + 19 32 + 12Penicillin synthase-like 0.07 + 0.1 amino yesHypoxia-inducible factor 0.07 + 0.04 amino yesDi-heme elbow motif 0.06 + 0.01 heme no

4Fe-4S ferredoxins 1.80 + 0.7 Fe-S noMoCo biosynthesis proteins 1.60 + 0.3 Fe-S noHeme-binding PAS domain 1.10 + 1.0 heme no

Archaea HemN 0.80 + 0.20 Fe-S 1 68 + 12 13 + 14 19 + 6a helical ferrodoxin 0.60 + 0.16 Fe-S nobiotin synthase 0.55 + 0.1 Fe-S noROO N-terminal domain-like 0.5 + 0.1 amino 2

High potential iron protein 0.38 + 0.25 Fe-S noHeme-binding PAS domain 0.3 + 0.4 heme 1MoCo biosynthesis proteins 0.21 + 0.15 Fe-S no

Bacteria HemN 0.2 + 0.15 Fe-S no 47 + 11 22 + 12 31 + 164Fe-4S ferredoxins 0.2 + 0.2 Fe-S nocytochrome c 0.14 + 0.2 heme noa helical ferrodoxin 0.12 + 0.09 Fe-S no

1. Some, but not all, PAS domains actually sense oxygen2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway

The importance of “small class” Zn folds to Eukarya

Figure 5: A: Log-log plot of the abundance of “small class” Zn binding folds in the proteomes for each Superkingdom. B: Venn diagram showing the distribution of the 53 unique small class Zn folds in each Superkingdom. The bottom set of numbers describe the distribution of small class Zn folds that occur in at least 50% of the proteomes in a given Superkingdom. Small class Zn folds are exemplified by Zn fingers and RING domains. They are believed to have originally evolved in Archaea.It seems unlikely that the observed diversification of Zn structures could occur in an environment low in Zn (Fig. 1).

1

10

100

1000

10000

100 1000 10000 100000

Total number of domainsin a proteomes

To

tal “

sma

ll c

lass”

Zn

b

ind

ing

dom

ain

s

A B

Archaea0/531/28

Eukarya30/5318/28

Bacteria0/530/28

5/530/28

11/539/28

7/530/28

0/530/28

Archaea0/531/28

Eukarya30/5318/28

Bacteria0/530/28

5/530/28

11/539/28

7/530/28

0/530/28

Potential methodological biases1.Unknown folds: The results from the Protein Structure Initiative suggest

that there will be few novel metalloproteins of widespread distribution and high abundance

2.Genome Bias: Principal component analysis shows oxygen tolerance and environment have little effect upon the trends observed in Fig. 4. Phylogeny groupings are apparent however.

Conclusions1.Metallomes have diverse compositions, yet the total abundances

conform to evolutionary constants2.These constants exhibit Superkingdom-specific differences

consistent with ancient changes in geochemistry, a hypothesis further supported by the roles of Zn and Fe

3. These results provide genomic-based evidence for the theory of Anbar and Knoll that Eukaryotic diversification and oxygen-related changes in trace metal chemistry are linked

Figure 1: Theoretical levels of trace metals and oxygen in the deep ocean through Earth’s history. Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean-solid lines, euxinic ocean-dashed lines). The trace metal concentrations are replotted from Saito et al, 2003. The phylogenetic tree symbols at the top of the figure show the theoretical periods of diversification for each Superkingdom.

0

0.5

1

1.00E-20

1.00E-16

1.00E-12

1.00E-08

1.00E-15

1.00E-12

1.00E-09

1.00E-06

1.00E-11

1.00E-09

1.00E-07

00.511.522.533.544.5

Billions of years before present

Co

ncen

tration

(O

2in

arbitrary

un

its, Zn

and

Fe in

mo

les L-1

BacteriaArchaea

Eukarya

Oxygen

Zinc

Iron

CobaltManganese

Figure 3: Panel A: Power law scaling for the abundance of metal binding domains. Each point is a discrete proteome of an Archaea ( ■), Bacteria (+), or Eukarya (o), with the number of Zn binding proteins on the Y-axis plotted against the total number of structural domains in a proteome, which is linear with genome size. Panel B: The slopes of the fitted power laws for Zn, Fe, Mn, and Co for each Superkingdom, which are evolutionary constants of proteome evolution (see below).


• Song Yang• Beijing University, B.S. Chemistry• Department of Chemistry and Biochemistry• Graduated with PhD

Genome-wide Study of the Evolution of Protein Domains

I am interested in the evolutionary aspect of protein structures. Protein domain, the basic three-dimensional structural element of proteins, is stabilized by its intrinsic physical and chemical properties. Each domain has its own specific functions and occupies a particular sequence space thus resulting in its own evolutionary history.

The study of the evolution of protein domains is not only an interesting topic, but further enhances our understanding of the sequence-structure-function relationship of proteins. Utilizing protein domains to address evolutionary problems and to study the evolution of protein domains themselves are two facets of the topic I am working on. The right hand side figure is a phylogenetic tree of 174 species across all three major kingdoms generated using protein domain content.

Phylogeny Determined by Protein Domain Content.S. Yang, R.F. Doolittle, and P.E. Bourne. 2005 PNAS 102: 373-378

• Ruben Valas• Carnegie Mellon, BS Computer Science 2005• Bioinformatics Graduate Program• 3rd year PhD student

Rethinking proteasome evolution: Two novel bacterial proteasomes

The proteasome is a multi subunit structure that degrades proteins. Protein degradation is an essential component of regulation because proteins can become misfolded, damaged, or unnecessary. Proteasomes and their homologs vary greatly in complexity.

We searched 238 complete bacterial genomes for structures related to the proteasome, and found evidence of two novel groups of bacterial proteasomes.

The first, which we name Anbu, is sparsely distributed among cyanobacteria and proteobacteria. We hypothesize that Anbu is an ancient proteasome. We also present evidence for a fourth type of bacterial proteasome found in a few β-proteobacteria, which we name β-proteobacteria proteasome homolog (BPH).

Sequence and structural analysis show that Anbu and BPH are both distinct from known bacterial proteasomes, but have homologous structures. Anbu is encoded by one gene, so we postulate a duplication of Anbu created the 20s proteasome. We have found different combinations of Anbu, BPH, and HsIV within these bacterial genomes which raises questions about specialized protein degradation systems.

The PDB contains a significant number of major pharmaceuticals bound to their receptors. Lei Xie with Sarah Kinnings and Jian Wang, have developed a methodology for finding equivalent binding sites across what we define as the druggable proteome. At this time, we estimate this covers about 40% of all druggable targets. An equivalent binding site for a major pharmaceutical holds promise for either (a) explaining the side effects of existing drugs, or (b) using an existing drug (already approved) to treat a different condition. Thus far we have one example of each.

Selective Estrogen Receptor Modulators (SERMs) are a class of drugs that include tamoxifen which are used in the treatment of breast cancer. This drug has significant side effects attributed to disruption in calcium homeostasis. We believe we have found the target of this epidemilogy, namely a Sacroplasmic Reticulum Ca2+ ion channel ATPase protein (SERCA). The challenge now is to design a modified SERM that has equal or better binding to estrogen receptors but less binding to SERCA. In a second experiment, we have established a Parkinson’s Disease drug which we believe will be very effective in the treatment of drug resistant tuberculosis.

• Lei Xie, PhD• Researcher

Repositioning Existing Pharmaceuticals

Proteome-wide Elucidation of the Molecular Mechanism Defining the Adverse Effect of Selective Esterogen Receptor Modulators.L. Xie and P.E. Bourne 2007 PLoS Comp. Biol., Submitted.

Repurposing safe pharmaceuticals to treat multi-drug and extensively drug resistant tuberculosis using an in silico cross-gene-family approach.S. Kinnings, L. Xie and P.E. Bourne 2007 JACS, Submitted.

This project utilizes the EOL pipeline to identify new human kinases with its automated annotation tool, iGAP. In addition to traditional sequence alignment, the more conserved structural elements are considered when searching for remote homologs. This is achieved by comparing proteins to a comprehensive fold library to predict function and structure.

A novel protein kinase function for an Acyl-CoA dehydrogenase protein has been discovered with this process. This is potentially significant because kinases have been implicated in many diseases, including some forms of cancer, thus providing a new pharmaceutical target for therapy. We are interested in collaborations to further explore the role of this putative kinase. Email [email protected] for more information.

IGAP by EOL, an integrative annotation pipeline

PDB

PDP

SCOP

FoldLib

WU-BLAST

PSI-Blast

123D

Structural assignments

Prediction of structural components

Reliability scoring

This work is supported by NIH GM63208.

• Kristine Briedis• Iowa State University, B.S. Genetics• Bioinformatics Graduate Program• 6th year PhD student

Using Structure Similarity to Search for New Human Protein Kinases

Analysis of the Human Kinome Using Methods Including Fold Recognition Reveals Two Novel KinasesK.M. Briedis and P.E. Bourne PLoS ONE, Submitted.

Our laboratory works in the general area of bioinformatics, with an emphasis on structural bioinformatics – the use of the complete corpus of macromolecular structure – proteins, DNA, RNA and complexes thereof to further our understanding of living systems. We believe that when studying living systems the devil is in the details, and in many cases structure affords those details.

Our raw data are the Protein Data Bank (PDB) which we maintain for the worldwide community and is used by 10,000 scientists every day. Using these data we develop algorithms and methods in an attempt to improve our understanding of biology through computation. Here you will find the work of some of our students who study, for example, species differentiation based on protein fold content, prediction of sites of protein-protein interaction, prediction of binding sites across the druggable proteome, and the discovery of novel protein kinases within the human genome. We are committed to the free distribution of software and to open access to all our findings.

The Bourne Laboratoryhttp://www.sdsc.edu/pb

Our laboratory is very interested in scientific dissemination in the Web 2.0 era. To this end we have two major projects.

(1) BioLit is the work of Dr. Lynn Fink and involves the integration of biological database content with the biological literature. We are using the complete corpus of the Public Library of Science journals (PLoS; www.plos.org) and the Protein Data Bank (PDB; www.pdb.org) as our prototype system. So for example, if you access a PLoS paper online describing a structure-function relationship, you can click on a figure in the paper and by accessing the associated structural data in the PDB bring up a view of the molecule that maps directly to that presented in the paper, rotate it, annotate it, and use it to further query the PDB and the associated literature.

(2) SciVee is led by Apryl Bailey and involves Lynn Fink, John Matherly, Alex Ramos, Willy Suwanto and Ben Wilson. We refer to it as a YouTube for scientists. Check it out at http://scivee.tv

Scientific Dissemination and Communication

My project identifies where and how proteins interact with each other using protein sequences and structures. We focus on exploiting the information extracted from 3D structures, which are expected to be very useful with the growing number of structures determined by structural genomics efforts.

Structurally conserved residues, derived from multiple structure alignments, are combined with sequence profile and accessible surface area to predict protein-protein binding sites. The incorporation of structure conservation significantly improves the prediction performance.

We are currently developing a prediction method to detect if two binding sites are interacting with each other. The ultimate goal of this project is to identify the binding sites of a protein and the corresponding binding site on the interacting protein partner.

Exploiting Sequence and Structure Homologs to Identify Protein-Protein Binding Sites.J.L. Chung, W. Wang, and P.E. Bourne 2006 Proteins: Structure, Function and Bioinformatics 62(3) 630-640.

This work is supported by the NIH grant 1P01GM63208-01A1 and 2T32 GM08326.

• Jo-Lan Chung • National Taiwan University, B.S. - Chemistry 1999• Department of Chemistry and Biochemistry• Graduated with PhD

Exploiting Sequence and Structure Homologs to Identify Protein-protein Binding Sites

Rethinking proteasome evolution: Two novel bacterial proteasomes.R. Valas and P.E. Bourne 2007 J. Mol. Evol., Submitted.


Documents

Feb 1, 2008 Professional Development Series 1. 2 Metallomes are very diverse Ubiquitous metal binding folds? Very few folds are found in all or most (>90%)