114

Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,
Page 2: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Genomes to Life Program

Gary JohnsonU.S. Department of Energy (SC-30)Office of Advanced Scientific Computing Research301/903-5800, Fax: 301/[email protected]

Marvin FrazierU.S. Department of Energy (SC-72)Office of Biological and Environmental Research301/903-5468, Fax: 301/[email protected]

A limited number of print copies are available. Contact:

Sheryl MartinOak Ridge National Laboratory1060 Commerce Park, MS 6480Oak Ridge, TN 37830865/576-6669, Fax: 865/574-9888, [email protected]

An electronic version of this document became available on February 4, 2003, at theGenomes to Life Web site:

• http://doegenomestolife.org/pubs/2003abstracts/

Abstracts for this publication were submitted via the Web.

Page 3: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Contractor-Grantee Workshop I

Arlington, Virginia

February 9–12, 2003

Prepared for the

U.S. Department of Energy

Office of Science

Office of Biological and Environmental Research

Office of Advanced Scientific Computing Research

Germantown, MD 20874-1290

Prepared by

Human Genome Management Information System

Oak Ridge National Laboratory

Oak Ridge, TN 37830

Managed by UT-Battelle, LLC

For the U.S. Department of Energy

Under contract DE-AC05-00OR22725

DOE/SC-0072

Page 4: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Contents

Welcome to Genomes to Life Contractor-Grantee Workshop I . . . . . . . . . . . ix

Genomes to Life: Realizing the Potential of the Genome Revolution . . . . 1

GTL Program Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Harvard Medical School

A2 Microbial Ecology, Proteogenomics, and Computational Optima . . . . . . . . . . . . . . 7

George Church, Sallie Chisholm, Martin Polz, Roberto Kolter, Fred Ausubel, Raju Kucherlapati,Steve Lory, Mike Laub, Robert Steen, Martin Steffen, Kyriacos Leptos, Matt Wright, Daniel Segre,Allegra Petti, Jake Jaffe, David Young, Eliana Drenkard, Debbie Lindell, Eric Zinser, and Andrew Tolonen

Lawrence Berkeley National Laboratory

A4 Rapid Deduction of Stress Response Pathways in Metal/RadionuclideReducing Bacteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Adam Arkin, Alex Beliaev, Inna Dubchak, Matthew Fields, Terry Hazen, Jay Keasling, Martin Keller,Vincent Martin , Frank Olken, Anup Singh, David Stahl, Dorothea Thompson, Judy Wall,and Jizhong Zhou

Oak Ridge National Laboratory

A6 Bioinformatics and Computing in the Genomes to Life Center for Molecularand Cellular Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

D. A. Payne, E. S. Mendoza, G. A. Anderson, D. K. Gracio, W. R. Cannon, T. P. Straatsma, H. J. Sofia, D.A. Dixon, M. Shah, D. Xu, D. Schmoyer, S. Passovets, I. Vokler, J. Razumovskaya, T. Fridman,V. Olman, A. Gorin, E. Uberbacher, F. Larimer, and Y. Xu

A8 Mass Spectrometry in the Genomes to Life Center for Molecularand Cellular Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Gregory B. Hurst, Robert L. Hettich, Nathan C. Verberkmoes, Gary J. Van Berkel, Frank W. Larimer,Trish K. Lankford, Steven J. Kennel, Dale Pelletier, Jane Razumovskaya, Richard D. Smith, Mary Lipton,Michael Giddings, Ray Gesteland, Malin Young, and Carol Giometti

Genomes to Life I i

Session and poster board numbers are indicated in the gray boxes.

Page 5: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

A10 Genomes to Life Center for Molecular and Cellular Systems: A ResearchProgram for Identification and Characterization of Protein Complexes . . . . . . . . . 11

Joshua N. Adkins, Deanna Auberry, Baowei Chen, James R. Coleman, Priscilla A. Garza,Jane M. Weaver Feldhaus, Michael J. Feldhaus, Yuri A. Gorby, Eric A. Hill, Brian S. Hooker, Chian-Tso Lin,Mary S. Lipton, L. Meng Markillie, M. Uljana Mayer, Keith D. Miller, Sewite Negash, Margaret F. Romine,Liang Shi, Robert W. Siegel, Richard D. Smith, David L. Springer, Thomas C. Squier, H. Steven Wiley,Linda J. Foote, Trish K. Lankford, Frank W. Larimer, T-Y. S. Lu, Dale Pelletier, Stephen J. Kennel, andYisong Wang

A12 New Approaches for High-Throughput Identification and Characterizationof Protein Complexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Michelle Buchanan, Frank Larimer, Steven Wiley, Steven Kennel, Thomas Squier, Michael Ramsey,Karin Rodland, Gregory Hurst, Richard Smith, Ying Xu, David Dixon, Mitchel Doktycz, Steve Colson,Carol Giometti, Raymond Gesteland, Malin Young, and Michael Giddings

A14 Automation of Protein Complex Analyses in Rhodopseudomonas palustris andShewanella oneidensis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

P. R. Hoyt, C. J. Bruckner-Lea, S. J. Kennel, P. K. Lankford, M. S. Lipton, R. S. Foote, J. M. Ramsey,K. D. Rodland, and M. J. Doktycz

Sandia National Laboratories

A16 Analysis of Protein Complexes from a Fundamental Understanding of ProteinBinding Domains and Protein-Protein Interactions in Synechococcus WH8102 . . . . 16

Anthony Martino, Andrey Gorin, Todd Lane, Steven Plimpton, Nagiza Samatova, Ying Xu,Hashim Al-Hashimi, Charlie Strauss, Byung-Hoon Park, George Ostrouchov, Al Geist, William Hart,and Diana Roe

A18 Carbon Sequestration in Synechococcus: Microarray Approaches . . . . . . . . . . . . . . . 18

Brian Palenik, Anthony Martino, Jerilyn A. Timlin, David M. Haaland, Michael B. Sinclair,Edward V. Thomas, Vijaya Natarajan, Arie Shoshani, Ying Xu, Dong Xu, Phuongan Dam,Bianca Brahamsha, Eric Allen, and Ian Paulsen

A20 Carbon Sequestration in Synechococcus sp.: From Molecular Machinesto Hierarchical Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Grant S. Heffelfinger, Anthony Martino, Andrey Gorin, Ying Xu, Mark D. Rintoul III, Al Geist,Hashim M. Al-Hashimi, George S. Davidson, Jean Loup Faulon, Laurie J. Frink, David M. Haaland,William E. Hart, Erik Jakobsson, Todd Lane, Ming Li, Phil Locascio, Frank Olken, Victor Olman,Brian Palenik, Steven J. Plimpton, Diana C. Roe, Nagiza F. Samatova, Manesh Shah, Arie Shoshani,Charlie E. M. Strauss, Edward V. Thomas, Jerilyn A. Timlin, and Dong Xu

A22 Systems Biology Models for Synechococcus sp. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Mark D. Rintoul, Damian Gessler, Jean-Loup Faulon, Shawn Means, Steve Plimpton, Tony Martino,and Ying Xu

ii Genomes to Life I

Session and poster board numbers are indicated in the gray boxes.

Page 6: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

University of Massachusetts, Amherst

A24 Analysis of the Genetic Potential and Gene Expression of MicrobialCommunities Involved in the in situ Bioremediation of Uranium andHarvesting Electrical Energy from Organic Matter . . . . . . . . . . . . . . . . . . . . . . . . 20

Derek Lovley, Stacy Ciufo, Zhenya Shebolina, Abraham Esteve-Nunez, Cinthia Nunez, Richard Glaven,Regina Tarallo, Daniel Bond, Maddalena Coppi, Pablo Pomposiello, Steve Sandler, Barbara Methé,Carol Giometti, and Julia Krushkal

GTL Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

B63 Communicating Genomes to Life . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Anne E. Adamson, Jennifer L. Bownas, Denise K. Casey, Sherry A. Estes, Sheryl A. Martin,Marissa D. Mills, Kim Nylander, Judy M. Wyrick, Laura N. Yust, and Betty K. Mansfield

Modeling/Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

A26 Hierarchical Organization of Modularity in Metabolic Networks . . . . . . . . . . . . . . 25

Albert-László Barabási, Zoltán N. Oltvai, A. L. Somera, D. A. Mongru, G. Balazsi, Erzsebet Ravasz,S. Y. Gerdes, J. W. Campbell, and A. L. Osterman

A30 SimPheny: A Computational Infrastructure Bringing Genomes to Life . . . . . . . . . 26

Christophe H. Schilling, Radhakrishnan Mahadevan, Sung Park, Evelyn Travnik, Bernhard O. Palsson,Costas Maranas, Derek Lovley, and Daniel Bond

A32 Parallel Scaling in Amber Molecular Dynamics Simulations . . . . . . . . . . . . . . . . . . 27

Michael Crowley, Scott Brozell, and David A. Case

A34 Microbial Cell Model of G. sulfurreducens: Integration of in Silico Modelsand Functional Genomic Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Derek Lovley, Maddalena Coppi, Daniel Bond, Jessica Butler, Susan Childers, Teena Metha, Ching Leang,Barbara Methé, Carol Giometti, R. Mahadevan, C. H. Schilling, and B. Palsson

A36 Towards a Self-Organizing and Self-Correcting Prokaryotic Taxonomy . . . . . . . . . 30

George M. Garrity and Timothy G. Lilburn

A38 Computational Framework for Microbial Cell Simulations . . . . . . . . . . . . . . . . . . 31

Haluk Resat , Heidi Sofia, Harold Trease, Joseph Oliveira, Samuel Kaplan, and Christopher Mackenzie

A40 Characterization of Genetic Regulatory Circuitry Controlling AdaptiveMetabolic Pathways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Harley McAdams, Lucy Shapiro, and Mike Laub

Genomes to Life I iii

Session and poster board numbers are indicated in the gray boxes.

Page 7: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

A28 Computational Elucidation of Metabolic Pathways . . . . . . . . . . . . . . . . . . . . . . . . 33

Imran Shah

A42 Data Exchange and Programmatic Resource Sharing: The Systems BiologyWorkbench, BioSPICE and the Systems Biology Markup Language (SBML) . . . . 34

Herbert M Sauro

A44 A Web-Based Laboratory Information Management System (LIMS)for Laboratory Microplate Data Generated by High-ThroughputGenomic Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

James R. Cole, Joel A. Klappenbach, Paul R. Saxman, Qiong Wang, Siddique A. Kulam,Alison E. Murray, Liyou Wu, Jizhong Zhou, and James M. Tiedje

A46 BioSketchpad: An Interactive Tool for Modeling Biomolecularand Cellular Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Jonathan Webb, Lois Welber, Arch Owen, Jonathan Delatizky, Calin Belta, Mark Goulian,Franjo Ivancic, Vijay Kumar, Harvey Rubin, Jonathan Schug, and Oleg Sokolsky

A48 Molecular Docking with Adaptive Mesh Solutions to thePoisson-Boltzmann Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Julie C. Mitchell, Lynn F. Ten Eyck, J. Ben Rosen, Michael J. Holst, Victoria A. Roberts,J. Andrew McCammon, Susan D. Lindsey, and Roummel Marcia

A50 Functional Analysis and Discovery of Microbial Genes Transforming Metallicand Organic Pollutants: Database and Experimental Tools . . . . . . . . . . . . . . . . . . 37

Lawrence P. Wackett and Lynda B.M. Ellis

A52 Comparative Genomics Approaches to Elucidate TranscriptionRegulatory Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Lee Ann McCue, William Thompson, C. Steven Carmack, Zhaohui S. Qin, Jun S. Liu,and Charles E. Lawrence

A54 Predicting Genes from Prokaryotic Genomes: Are “Atypical” GenesDerived from Lateral Gene Transfer? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

John Besemer, Yuan Tian, John Logsdon, and Mark Borodovsky

A56 Advanced Molecular Simulations of E. coli Polymerase III . . . . . . . . . . . . . . . . . . . 39

Michael Colvin, Felice Lightstone, Ed Lau, Ceslovas Venclovas, Daniel Barsky, Michael Thelen, Giulia Galli,Eric Schwegler, and Francois Gygi

A58 Karyote®: Automated Physico-Chemical Cell Model DevelopmentThrough Information Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Peter J. Ortoleva, Abdalla Sayyed-Ahmad, Ali Navid, Kagan Tuncay, and Elizabeth Weitzke

iv Genomes to Life I

Session and poster board numbers are indicated in the gray boxes.

Page 8: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

A60 The Commercial Viability of EXCAVATOR™: A Software Tool ForGene Expression Data Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Robin D. Zimmer, Morey Parang, Dong Xu, and Ying Xu

A62 Modeling Electron Transfer in Flavocytochrome c3 Fumarate Reductase . . . . . . . . 43

Dayle M. Smith, Michel Dupuis, Erich R. Vorpagel, and T. P. Straatsma

Environmental Genomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

B1 Identification and Isolation of Active, Non-Cultured Bacteria fromRadionuclide and Metal Contaminated Environments for Genome Analysis . . . . . 45

Cheryl R. Kuske, Susan M. Barns, and Leslie E. Sommerville

B3 A Metagenomic Library of Bacterial DNA Isolated from the Delaware River . . . . 46

David L. Kirchman, Matthew T. Cottrell, and Lisa Waidner

B5 Approaches for Obtaining Genome Sequence from ContaminatedSediments Beneath a Leaking High-Level Radioactive Waste Tank . . . . . . . . . . . . 47

Fred Brockman, Margaret Romine, Kristin Kadner, Paul Richardson, Karsten Zengler, Martin Keller,and Cheryl Kuske

B7 Ecological and Evolutionary Analyses of a Spatially and GeochemicallyConfined Acid Mine Drainage Ecosystem Enabled by Community Genomics . . . . 48

Gene W. Tyson, Philip Hugenholtz, and Jillian F. Banfield

Microbial Genomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

B11 Strategies to Harness the Metabolic Diversity of Rhodopseudomonas palustris . . . . . . 51

Caroline S. Harwood, Jizhong Zhou, F. Robert Tabita, Frank Larimer, Liyou Wu, Yasuhiro Oda,Federico Rey, and Sudip Samanta

B13 Gene Expression Profiles in Nitrosomonas europaea, an ObligateChemolithoautotroph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Dan Arp, Xueming Wei, Luis Sayavedra-Soto, Martin G. Klotz, Jizhong Zhou, and Tingfen Yan

B15 Genomics of Thermobifida fusca Plant Cell Wall Degradating Proteins . . . . . . . . . . 52

David B. Wilson, Yuan-Man Hsu, and Diana Irwin

B17 The Rhodopseudomonas palustris Microbial Cell Project . . . . . . . . . . . . . . . . . . . . . . 53

F. Robert Tabita, Janet L. Gibson, Caroline S. Harwood, Frank Larimer, Thomas Beatty, James C. Liao,Jizhong (Joe) Zhou, and Richard Smith

Genomes to Life I v

Session and poster board numbers are indicated in the gray boxes.

Page 9: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

B19 Lateral Gene Transfer and the History of Bacterial Genomes . . . . . . . . . . . . . . . . . 53

Scott R. Santos and Howard Ochman

B21 Environmental Sensing, Metabolic Response, and Regulatory Networksin the Respiratory Versatile Bacterium Shewanella oneidensis MR-1 . . . . . . . . . . . . . 54

James K. Fredrickson, Margie F. Romine, William Cannon, Yuri A. Gorby, Mary S. Weir-Lipton,H. Peter Lu, Richard D. Smith, Harold E. Trease, and Shimon Weiss

A64 Interdisciplinary Study of Shewanella oneidensis MR-1’s Metabolismand Metal Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Eugene Kolker

B23 Integrated Analysis of Protein Complexes and Regulatory Networks Involvedin Anaerobic Energy Metabolism of Shewanella oneidensis MR-1 . . . . . . . . . . . . . . 56

Jizhong Zhou, Dorothea K. Thompson, Matthew W. Fields, Adam Leaphart, Dawn Stanek,Timothy Palzkill, Frank Larimer, James M. Tiedje, Kenneth H. Nealson, Alex S. Beliaev, Richard Smith,Bernhard O. Palsson, Carol Giometti, Dong Xu, Ying Xu, Mary Lipton, James R. Cole,and Joel Klappenbach

B25 Global Regulation in the Methanogenic Archaeon Methanococcus maripaludis . . . . 57

John Leigh, Murray Hackett, Roger Bumgarner, Ram Samudrala, William Whitman, Jon Amster,and Dieter Söll

B27 Identification of Regions of Lateral Gene Transfer Across the Thermotogales . . . . 58

Karen E. Nelson, Emmanuel Mongodin, Ioana Hance, and Steven R. Gill

B29 The Dynamics of Cellular Stress Responses in Deinococcus radiodurans . . . . . . . . . . 59

Michael J. Daly, Jizhong Zhou, James K. Fredrickson, Richard D. Smith, Mary S. Lipton,and Eugene Koonin

B9 Uncovering the Regulatory Networks Associated with IonizingRadiation-Induced Gene Expression in D. radiodurans R1 . . . . . . . . . . . . . . . . . . 60

John R. Battista, Ashlee M. Earl, Heather A. Howell, and Scott N. Peterson

B31 Analysis of Proteins Encoded on the S. oneidensis MR-1 Chromosome,Their Metabolic Associations, and Paralogous Relationships . . . . . . . . . . . . . . . . . 60

Margrethe H. Serres, Maria C. Murray, and Monica Riley

B33 Finishing and Analysis of the Nostoc punctiforme Genome . . . . . . . . . . . . . . . . . . . 61

S. Malfatti, L. Vergez, N. Doggett, J. Longmire, R. Atlas, J. Elhai, J. Meeks,and P. Chain

vi Genomes to Life I

Session and poster board numbers are indicated in the gray boxes.

Page 10: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

B35 In Search of Diversity: Understanding How Post-Genomic Diversity isIntroduced to the Proteome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Barry Moore, Chad Nelson, Norma Wills, John Atkins, and Raymond Gesteland

B37 The Microbial Proteome Project: A Database of Microbial Protein Expressionin the Context of Genome Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Carol S. Giometti, Gyorgy Babnigg, Sandra L. Tollaksen, Tripti Khare, Derek R. Lovley,James K. Fredrickson, Kenneth H. Nealson, Claudia I. Reich, Gary J. Olsen, Michael W. W. Adams,and John R. Yates III

B39 Analysis of the Shewanella oneidensis Proteome in Cells Grown inContinuous Culture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Carol S. Giometti, Mary S. Lipton, Gyorgy Babnigg, Sandra L. Tollaksen, Tripti Khare,James K. Fredrickson, Richard D. Smith, Yuri A. Gorby, and John R. Yates III

B41 The Molecular Basis for Metabolic and Energetic Diversity . . . . . . . . . . . . . . . . . . 64

Timothy Donohue, Jeremy Edwards, Mark Gomelsky, Jonathan Hosler, Samuel Kaplan,and William Margolin

B43 Integrative Studies of Carbon Generation and Utilization in theCyanobacterium Synechocystis sp. PCC 6803 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Wim Vermaas, Robert Roberson, Julian Whitelegge, Kym Faull, Ross Overbeek, and Svetlana Gerdes

Technology Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

B45 Comparative Optical Mapping: A New Approach for MicrobialComparative Genomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Shiguo Zhou, Thomas S. Anantharaman, Erika Kvikstad, Andrew Kile, Mike Bechner, Wen Deng,Jun Wei, Valerie Burland, Frederick R. Blattner, Chris Mackenzie, Timothy Donohue, Samuel Kaplan,and David C. Schwartz

B47 Optical Mapping of Multiple Microbial Genomes . . . . . . . . . . . . . . . . . . . . . . . . . 67

Shiguo Zhou, Michael Bechner, Erika Kvikstad, Andrew Kile, Susan Reslewic, Aaron Anderson,Rod Runnheim, Jessica Severin, Dan Forrest, Chris Churas, Casey Lamers, Samuel Kaplan,Chris Mackenzie, Timothy J. Donohue, and David C. Schwartz

B49 Identification of ATP Binding Proteins within Sequenced Bacterial GenomesUtilizing Phage Display Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Suneeta Mandava, Lee Makowski, and Diane J. Rodi

B51 Development of Vectors for Detecting Protein-Protein Interactions in Bacteria . . . 69

Peter Agron and Gary Andersen

Genomes to Life I vii

Session and poster board numbers are indicated in the gray boxes.

Page 11: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

B53 Development and Use of Microarray-Based Integrated Genomic Technologiesfor Functional Analysis of Environmentally Important Microorganisms . . . . . . . . 70

Jizhong Zhou, Liyou Wu, Xiudan Liu, Tingfen Yan, Yongqing Liu, Steve Brown, Matthew W. Fields,Dorothea K. Thompson, Dong Xu, Joel Klappenbach, James M. Tiedje, Caroline Harwood, Daniel Arp,and Michael Daly

B55 Electron Tomography of Whole Bacterial Cells . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Ken Downing

B57 Single Cell Proteomics—D. radiodurans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Norman J. Dovichi

B59 Genomes to Proteomes to Life: Application of New Technologies forComprehensive, Quantitative and High Throughput Microbial Proteomics . . . . . . 72

Richard D. Smith, James K. Fredrickson, Mary S. Lipton, David Camp, Gordon A. Anderson,Ljiljana Pasa-Tolic, Ronald J. Moore, Margie F. Romine, Yufeng Shen, Yuri A. Gorby, and Harold R. Udseth

B61 New Developments in Statistically Based Methods for Peptide Identificationvia Tandem Mass Spectrometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Kenneth D. Jarman, Kristin H. Jarman, Alejandro Heredia-Langner, and William R. Cannon

Appendix 1: Attendees List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Appendix 2: Poster Presenters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Appendix 3: GTL Web Sites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

Institution Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

Meeting Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inside Back Cover

Total Number of Abstracts: 64

viii Genomes to Life I

Session and poster board numbers are indicated in the gray boxes.

Page 12: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Welcome to Genomes to LifeContractor-Grantee Workshop I

Welcome to the first of what we hope will be many Genomes to Life (GTL) contractor-grantee workshops.Although only in its second official year of funding, GTL already is attracting broad and enthusiastic interestand support from scientists at universities, national laboratories, and industry; colleagues at other federalagencies; Department of Energy leadership; and Congress.

You are part of an exciting era in biology as we begin to systematically leverage the knowledge and capabili-ties brought to us by DNA sequencing projects into an understanding of the functioning and control ofentire biological systems. GTL certainly is not the first, nor will it be the last, to conduct �systems biology�research, but we believe the program offers a roadmap for these new explorations. GTL research is, of neces-sity, at the interface of the physical, computational, and biological sciences.

GTL will require the development of technologies that will enable us to �see� biology happen at finer scalesof resolution. It also will require a substantial integration of our broad capabilities in mathematics and com-putation with our new knowledge of biology. Only with this integration can we achieve GTL�s fundamentalgoal: to understand biological systems so well that we can accurately predict their behavior with sophisti-cated computational models.

To enable this goal, GTL aims to develop these new technological, analytical, biological, and computationalcapabilities into cost-effective, widely accessible, high-throughput capabilities analogous to today�s DNAsequencing factories.

Microbes are GTL�s principal biological focus. In the complex �simplicity� of microbes, we find capabilitiesneeded by DOE�indeed by our entire nation�for clean energy, cleanup of environmental contamination,and sequestration of atmospheric carbon dioxide that contributes to global warming. In addition, the funda-mental knowledge and technologies developed in GTL will be broadly usable in all areas of biologicalresearch.

This first GTL program workshop is an opportunity for all of us to discuss, listen, and learn about the excit-ing science, identify research needs and opportunities, form research partnerships, and share the excitementof this program with the broader scientific community.

We look forward to a stimulating and productive meeting and offer our sincere thanks to all the organizersand to you, the scientists, whose vision and efforts will help us all to realize the promise of this excitingresearch program.

Ari PatrinosAssociate Director forBiological and Environmental ResearchOffice of ScienceU.S. Department of Energy

Ed OliverAssociate Director forAdvanced Scientific Computing ResearchOffice of ScienceU.S. Department of Energy

Genomes to Life I xi

Page 13: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Genomes to Life: From DNA Sequence to Living Systems

he remarkable successes of theHuman Genome Project and spin-offsrevealing the details of numerous

genomes—from microbes to plants tomice—provide the richest resource in thehistory of biology. These achievements nowempower scientists to address the ultimategoal of modern biology: to obtain a funda-mental, comprehensive, and systematicunderstanding of life. This goal is founded,as is life itself, on the genome, whose genesencode the proteins that carry out mostcellular activities via a labyrinth of path-ways and networks that make the cell“come alive” (see figure below).

Catalyzing Systems BiologyThe Department of Energy’s (DOE) Ge-nomes to Life (GTL) program is combininghigh-throughput advanced technologiesand computation with the informationfound in microbial genomes to establish afoundation for achieving an understandingof living systems (see “Microbes for DOEMissions,” p. 2). GTL is designed to helplaunch biology onto a new trajectory tocomprehensively understand cellularprocesses in a realistic context. This newlevel of exploration, known as systemsbiology, will empower scientists to pursuecompletely new approaches to discovery,transforming biology to a more quantita-tive and predictive science. GTL scientificgoals target the fundamental processes ofliving systems by studying them on threelevels:

1. Proteins and multicomponent molecularmachines that form all of the cell’sstructures and perform most of thecell’s work.

2. Gene regulatory networks and pathwaysthat control cellular processes.

3. Microbial communities in which groupsof cells carry out complex processes innature.

Molecular machines carry out chemicalreactions, generate mechanical forces,transport metabolites and ions, andmake possible every action of a biologi-cal system. A cell does not generate itsentire repertoire of molecular machinesat once. Genomic regulatory elementsdictate the particular set producedaccording to the organism’s life strategyand in response to environmental cues,including other microbial populations inthe larger ecological community.

A comprehensive approach to under-standing biological systems thus extendsfrom individual cells to many cellsfunctioning in communities. Such studiesmust encompass proteins, molecularmachines, pathways, networks, cells,and, ultimately, their regulatory elements,cellular systems, and environments. Thisnext generation of biology is viable onlywith vastly increased computational andinformational capabilities to master thefull complexities of biological systems.

T Understanding lifeprocesses at themolecular level is a“national sciencepriority.”—OSTP-OMB FY 2004budget guidance memo;see p. 6.

Genes are made up of DNA and contain the information used by othercellular components (e.g., RNA and ribosomes, not shown here) tocreate proteins. A working cell is tightly packed with tens of thou-sands of proteins and other molecules, often working together asmultimolecular “machines” to perform essential cellular activities(see also cell figure, p. 5).

Genomes to Life:Realizing the Potential ofthe Genome Revolution

Catalyzing systemsbiology ......................... 1

Transforming biology withlarge-scale technologiesand computing............... 2

Emerging technologies andcomputing for systemsbiology ...................... 3–4

Integrated user facilitiesdemocratizing accessto systems biologyresources ...................... 5

A growing mandate formolecular studies ........... 6

Understanding life at themolecular level

January 2003 DOEGenomesToLife.org

DOE/SC-0069

Page 14: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Why Study Microbes?The ability of this planet to sustain life is largelydependent on microbes. They are the foundation of thebiosphere, controlling earth’s biogeochemical cyclesand affecting the productivity of the soil, quality ofwater, and global climate. As one of the most excitingfrontiers in biology today, microbial research is reveal-ing the hidden architecture of life and the dynamic,life-sustaining processes on earth. The diversity andrange of their environmental adaptations mean thatmicrobes long ago “solved” many problems for whichscientists are seeking solutions today (see examples atright). The incomprehensible number of microbes isan untapped but valuable resource that ultimately maybe used to generate new energy sources (e.g., hydro-gen for a new energy economy), new cleanup andindustrial processes, and new ways of using biology toaddress DOE missions.

The ChallengeMicrobes have become masters at living in almostevery environment, harvesting energy in almost anyform. Their sophisticated biochemical capabilities canbe utilized for transforming wastes and organic mat-ter, cycling nutrients, and, as part of the photosyn-thetic process, converting sunlight into energy and“fixing” (storing) CO2 from the atmosphere. The analyti-cal complexity involved in understanding these proc-esses is enormous. Thousands of microbes have capa-bilities of interest. Moreover, each microbial genomecontains thousands of genes capable of producing aneven-greater number of proteins. These proteins

Large-Scale Technologies and Computing:Transforming Biology

Microbes for DOE Missions: Energy Security, Cleanup, Climate Change

J

2

combine to form innumerable molecular machines inmyriad pathways and networks, many of which carryout biological processes useful for DOE missions (see“Potential Applications of GTL Science,” p. 3 ).

Thalassiosira pseudonana:Ocean diatom that is majorparticipant in biologicalpumping of carbon toocean depths and haspotential for mitigatingglobal climate change.

Methanococcus jannaschii:Produces methane, animportant energy source;contains enzymes thatwithstand high tempera-tures and pressures;possibly useful for indus-trial processes.

Deinococcus radiodurans:Survives extremely highlevels of radiation and hashigh potential for radioac-tive waste cleanup.

ust as DNA sequencing capability was com-pletely inadequate at the beginning of the Hu-man Genome Project (HGP), the quantity and

complexity of data that must be collected and analyzedfor systems biology research far exceed current capa-bilities and capacities. The HGP taught that aspects ofbiological research can be made high-throughput andcost-effective (see graph, p. 5). Collecting and usingsuch data and reagents will require a new organiza-tional model that coordinates and integrates dozens ofhigh-throughput technologies and approaches, somenot yet refined or even developed. This is the centralprinciple of GTL and indeed of all systems biologyresearch.

Analysis of living systems will require a new genera-tion of experimentation and the computational meth-ods and capabilities to assimilate, understand, andmodel the data on the scale and complexity of realliving systems. Computing must guide the researchquestions and interpretation at every step.

The knowledge base resulting from the GTL programwill provide the entire research community with data,models, and simulations of gene expression, path-ways, and network systems; molecular machines; andcell and community processes. These new capabilitiesand resources will inspire revolutionary solutions toDOE mission needs and transform the entire lifesciences landscape, from agriculture to human health.

Page 15: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Emerging Technologies and Computing for Systems Biology:Establishing a Firm Foundation for Genomes to Life

Office of Science—At the Forefront of the Biological RevolutionIn 1986 the DOE Office of Science launched the HumanGenome Project to understand, at the DNA level, theeffects of energy production on human health. The HGP’sinnovative operational model proved highly successful,and benefits far exceeded the original goal. Today, DOE ispoised to take the next vital steps—translating the geneticcode in DNA into a new understanding of how life worksand applying those biological processes to serve itschallenging missions. Effective use of microbial and otherbiological systems and components will generate newbiotechnological industries involving fuels, biochemicalprocessing, nanomaterials, and broader environmentaland biomedical applications.

N

3

Learning about the inner workings of microbes and theirdiverse inventory of molecular machines can lead to discoveryof ways to isolate and use these components to develop new,synthetic nanostructures that carry out some of the functionsof living cells. In this figure, the enzyme organophosphorushydrolase (OPH) has been embedded in a synthetic nano-membrane (mesoporous silica) that enhances its activity andstability [J. Am. Chem. Soc. 124, 11242–43 (2002)]. The OPHtransforms toxic substances (purple molecule at left of OPH) toharmless byproducts (yellow and red molecules at right).Applications such as this could enable development of efficientenzyme-based ways to produce energy, remove or inactivatecontaminants, and sequester carbon to mitigate global climatechange. The knowledge gained from GTL research also couldbe highly useful in food processing, pharmaceuticals, separa-tions, and the production of industrial chemicals.

computing sciences together at the scale and complexityrequired for success. Its academic affiliations, nationallaboratories, and other resources include major facili-ties for DNA sequencing and molecular-structure char-acterization, the high-performance computing resourcesof the Office of Advanced Scientific Computing Research(OASCR), the expertise and infrastructure for technologydevelopment, and a tradition of productive multidisci-plinary research essential for such an ambitious re-search program. In the effort to understand biologicalsystems, these strong assets and the GTL program willcomplement and extend the capabilities and researchefforts supported by the National Institutes of Health,National Science Foundation, other agencies andinstitutions, and industry.

Potential Applications of GTL Science

umerous projects funded by the Office of Biologicaland Environmental Research (BER) over the past5 years have established a strong foundation for the

GTL program. These projects underscore the need forhigh-throughput biological research and novel computa-tional approaches. They are also demonstrating the powerof mass spectrometric analyses of whole microbialproteomes, developing new imaging methods, advancingthe use of microarrays for expression analyses, exploringscalable ways to generate microbial proteins, and develop-ing computational tools for second-generation genomeanalysis and annotation.

Several collaborative groups are integrating technolo-gies and computational modeling to gain a systemsunderstanding of specific microbes in their naturalenvironments. For example, the Shewanella Federation,

consisting of teams from academia, national laboratories,and other organizations is making progress in prelimi-nary proteome and expression analyses of this remark-ably versatile organism that can immobilize toxic ura-nium from ground water. By focusing multiple technolo-gies on a single organism, the federation is integratingdiverse experimental results into a multidimensionalperspective of the biology of this key microbe. Thus far,the group has identified >77% of the predicted proteomeof Shewanella (3782 of 4855 predicted genes) usingultrahigh-resolution mass spectrometry techniques. Thisand other groundbreaking BER projects (e.g., onDeinococcus radiodurans) have elucidated the highestpercentages of the proteomes of organisms studied todate. These projects have also set the stage and identifiedthe need for developing high-throughput user facilitiesaccessible to the biological research community (see p. 5).

The Office of Science has the capabilities and institu-tional traditions to bring the biological, physical, and

Page 16: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

enomes to Life continues to build its R&D portfolio,having made awards in July 2002 that totaled$103 million for FY 2002–FY 2007. These projects are

focusing on isolating and characterizing protein machines,understanding complex biological communities, modelingcellular metabolic and regulatory processes, and modelingcarbon sequestration processes in marine microbes.

Projects were chosen to test the concept of systemsbiology applications and to demonstrate advancedtechnologies (see picture at right), computation, andpotential high-throughput methods in areas havingpossible impact on DOE missions. These awards repre-sent the culmination of nearly 3 years of planning bythe DOE Office of Science and hundreds of scientistsat universities, national laboratories, and industry. The

Other Participating Institutions

FY 2003 Call for Proposals: www.er.doe.gov/production/grants/Fr03-05.html

Emerging Technologies and Computing for Systems Biology:Genomes to Life FY 2002 Awards

• Oak Ridge National Laboratory: Developing technolo-gies needed to identify and characterize the com-plete set of multiprotein complexes within a microbeinvolved in the carbon cycle (important for carbonsequestration) and another microbe that has theability to clean up metals in contaminated soil(www.ornl.gov/GenomestoLife/).

• Lawrence Berkeley National Laboratory: Developingcomputational models that describe and predict thebehavior of gene regulatory networks in microbesin response to environmental conditions found atDOE waste sites (vimss.lbl.gov/).

• Sandia National Laboratories: Developing experimen-tal and computational methods to understand theproteins, protein-protein interactions, and generegulatory networks in a marine microbe that playsa significant role in earth’s carbon cycle; importantfor carbon sequestration (www.genomes-to-life.org/).

• University of Massachusetts, Amherst: Developingcomputational models to predict the activity of naturalcommunities of microbes having potential for uraniumbioremediation and production of electricity throughtheir ability to transfer electrons to electrodes(DOEGenomesToLife.org/research/umass.html).

• Harvard Medical School: Studying the proteins, protein-protein interactions, gene regulatory networks, andcommunity behavior of microbes active in the carboncycle (with capabilities relevant to carbon sequestra-tion); important for bioremediation strategies. Devel-oping computational methods to understand thecomplex biology of these microbes at a systems level(arep.med.harvard.edu/DOEGTL/).

G

4

microbesstudied in thepilot projects,as well as the2002 awards,have potentialfor bioremedi-ating metalsand radionu-clides, degrad-ing organicpollutants, producing energy feedstocksincluding biomass and hydrogen,sequestering carbon, and demonstrat-ing importance in ocean carbon cycling.

Institutions and Projects Awarded in 2002

The 7-Tesla Fouriertransform ioncyclotron resonancemass spectrometer atthe William R. WileyEnvironmentalMolecular SciencesLaboratory. Massspectrometry is themost sensitivemethod for identify-ing proteins.

Argonne NationalLaboratory

Brigham and Women’sHospital

Diversa Corporation

Los Alamos NationalLaboratory

Massachusetts GeneralHospital

Massachusetts Instituteof Technology

National Center forGenome Resources

Pacific Northwest NationalLaboratory

The Institute forGenomic Research

The Molecular ScienceInstitute

University of California(Berkeley, San Diego,Santa Barbara)

University of Illinois

University of Michigan

University of Missouri

University of NorthCarolina

University of Tennessee(Knoxville, Memphis)

University of Utah

University ofWashington

Direct Web Access• doegenomestolife.org/pubs.html

• doegenomestolife.org/gallery/images.html

• doegenomestolife.org/research/index.html

• www.ornl.gov/microbialgenomes

• www.ornl.gov/hgmis/education/education.html

Microbes studied in GTL havehad their genetic sequences

determined under DOE’sMicrobial Genome Program.

MicrobialGenomeProgram

Page 17: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Facility I for Production and Characterization ofProteins would use highly automated processes tomass-produce and characterize proteins directly frommicrobial genome data and create affinity reagents(“tags”) to identify, capture, and monitor the proteinsfrom living systems.

Facility II for Whole Proteome Analysis wouldcharacterize the expressed proteomes of diverse mi-

crobes under different environmental condi-tions as an essential step toward determiningthe functions and interactions of individualproteins and sets of proteins.

Facility III for Characterization andImaging of Molecular Machines wouldisolate, identify, and characterize thousands ofmolecular machines from microbes and developthe ability to image component proteins withincomplexes and to validate the presence of thecomplexes within cells.

Facility IV for Analysis and Modeling ofCellular Systems would combine advancedcomputational, analytical, and experimentalcapabilities for the integrated observation,measurement, and analysis of spatial andtemporal variations in the structures andfunctions of cellular systems—from individualmicrobial cells to complex communities andmulticellular organisms.

Integrated, Large-Scale User Facilities:A Plan to Democratize Access to Systems Biology Resources

nalyzing whole microbial systems requires econo-mies of scale. Traditionally, scientists have tried tounderstand the functions of individual proteins or

small groups of proteins. In the new era of systemsbiology, researchers will study the behavior of the cell’sentire working complements of proteins (proteomes),their regulatory pathways, and their interactions asthey perform functions. These activities must be carriedout on a scale that far exceeds today’s capacities.

To meet this challenge, BER and OASCR have planneda set of four core research facilities. Building on eachother, these facilities are intricately linked in their long-term goals, targets, technologies, capabilities, andcapacities. They will provide scientists with an enduringcomprehensive ability to understand and, ultimately,reap enormous benefit from the biochemical functional-ity of microbial systems. Making the most advancedtechnologies and computing resources available to

5

A Plan for GTL User Facilities

Ascientists in large orsmall laboratorieswill democratizeaccess to the toolsneeded for systemsbiology. They thusopen new avenues ofinquiry, fundamen-tally changing thecourse of biologicalresearch and greatlyaccelerating the paceof discovery. Hall-marks of thesefacilities includehigh-throughputadvanced technolo-gies, automation,and tools for datamanagement and analysis, simulation, and an inte-grated knowledge base.

Cro

wd

ed c

ell

by

D.

Go

od

sell

19

99

) Large-scalefacilities arerequired foridentifying,characterizing,and modelingthe activity ofthe tens ofthousands ofinteractingcomponentspresent at anygiven momentin a cell.

Large-Scale Facilities Spur Cost,Productivity Improvements

The dramatically increased produc-tivity and reduced costs achieved inthe HGP via high-throughputproduction environments (e.g., theDOE Joint Genome Institute) providethe paradigm for the dedicatedindustrial-scale facilities envisionedfor Genomes to Life.

GTL User Facility HallmarksOpen Access to Data and Facilities

These facilities would serve as focal points for the life sciences researchcommunity, providing a national venue to pursue multidisciplinary systemsbiology and promote cross-disciplinary education.

Page 18: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Achieving a molecular-level understanding of lifeprocesses is a national science priority, according tothe Office of Science and Technology Policy (OSTP)and Office of Management and Budget (OMB)FY 2004 Interagency Research and DevelopmentPriorities. The view in this guidance reflects thatfound throughout much of the biological researchcommunity.

OSTP, OMB: “National Science Priority”

Scientific Community, OSTP, OMB, and BERAC:A Growing Mandate for Molecular Studies

AAM: “Develop New Technologies”Specific recommendations made by the AmericanAcademy of Microbiology (AAM) in its 2001 colloquiumreport, Microbial Ecology and Genomics: A Crossroadsof Opportunity include the following:

• “Develop new technologies including methods formeasuring the activity of microorganisms (at the levelof populations and single cells), approaches to culti-vating currently uncultivable species, and methodsfor rapid determination of key physiological traitsand activities.

• “Establish mechanisms to encourage the necessaryinstrument development.

• “Encourage instrumentation development throughcollaboration with device engineers, chemists, physi-cists, and computational scientists, since uncoveringthe diversity and activities of themicrobial world is dependent on such advances.

U.S. Department of EnergyOffice of ScienceMarvin FrazierOffice of Biological and Environmental Research (SC-72)301/903-5468, Fax: 301/[email protected]

Gary JohnsonOffice of Advanced Scientific Computing Research (SC-30)301/903-5800, Fax: 301/[email protected]

Web site for this document: • DOEGenomesToLife.org/pubs/overview.pdf

6

BERAC Subcommittee: “CreateUnique, High-ThroughputResearch Facilities”The BER program of DOE, having played a critical,catalytic role in bringing about the genomic revolution,is now poised to make equally seminal contributions tothe next, transforming phase of biology. A subcommit-tee report approved by the BER Advisory Committee(BERAC) in April 2002 stated: “DOE should now createunique, high-throughput research facilities and resourcesto translate the new biology, embodied in the Genomesto Life (GTL) program, into a reality for the nation. . . .[GTL] is designed to build on the major accomplish-ments of the past decade and move from this vision toreality—to a new and comprehensive systems approachfrom which we will understand the functioning of cellsand organisms and their interactions with their envi-ronments. Since the science has changed so profoundly,to accomplish these challenging goals in a timely andcost-effective fashion, new facilities and new scientificresources are needed.”

GTL Program DevelopmentGenomes to Life is a joint program of the Office of Biological andEnvironmental Research and the Office of Advanced ScientificComputing Research in the Office of Science of the U.S.Department of Energy.

To solicit guidance from the scientific community in thedevelopment of the GTL program, in the past 2 years DOE hassponsored 15 workshops, which were attended by scientists fromindustry, national laboratories, and academia. A strategic plan fordeveloping advanced and high-throughput facilities to serve GTLand the entire community was approved by the BER AdvisoryCommittee (BERAC) in April 2002, and BERAC voiced approval ofthe subsequent facilities plan in December 2002.

GTL was developed in response to a 1999 charge by the DOEOffice of Science to BERAC to define DOE’s potential roles inpost-HGP science. The resulting report, Bringing the Genome toLife (August 2000), set forth recommendations that led to theGenomes to Life roadmap (April 2001). Funding for FY 2002 was$21.7 million. The FY 2003 budget for the program proposed inthe President’s Request to Congress is $42.4 million.

• “Develop technology and analysis capability tostudy microbial communities and symbioses holisti-cally, measuring system-wide expression patterns(mRNA and protein) and activity measurements atthe level of populations and single cells.”

Page 19: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

GTL Program Projects

Harvard Medical School

Microbial Ecology, Proteogenomics and Computational Optima

A2Microbial Ecology, Proteogenomics,and Computational Optima

George Church ([email protected]),Sallie Chisholm, Martin Polz, Roberto Kolter, FredAusubel, Raju Kucherlapati, Steve Lory, Mike Laub,Robert Steen, Martin Steffen, Kyriacos Leptos,Matt Wright, Daniel Segre, Allegra Petti, Jake Jaffe,David Young, Eliana Drenkard, Debbie Lindell, EricZinser, and Andrew Tolonen

Harvard Medical School and Massachusetts Institute ofTechnology

Understanding microbial cells and communitiesrequires system models, not just subsystems, butcomprehensive, genome-wide analyses. Genotype+ environment yields phenotype. New methodsallow us to cost-effectively �overdetermine� eachof these three components enabling studies ofmechanism, optimality, and bioengineering. Thekey to this will be integration of measures of mol-ecules per cell of RNA, proteins and metabolites.

Beyond concentrations, we need to image andmodel 4D structures of cells and of communitiesof cells. New technologies include single-moleculesequencing with polymerase colonies (polonies)to assess RNA and DNA states. New geneticselections allow phenotypes of genome-wide setsof mutations using a microarray read-out. Newcomputational approaches include �expressioncoherence� for combinations of transcription ele-ments and �Minimization of Metabolic Adjust-ment� (MoMA) to model proliferation ofmutants. We are applying these methods toProchlorococcus, responsible for a major fraction ofthe earth�s microbial carbon fixation, Caulobacter,relevant to dilute scavenging and bioremediationas well as cell division, Pseudomonas, displaying abroad range of metabolic pathways includingchemical/biological toxins and well-studiedbiofilms, and to other species in their communi-ties including �uncultivated isolates.�

For more complete descriptions & updates seehttp://arep.med.harvard.edu/DOEGTL.

Genomes to Life I 7

Page 20: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Lawrence Berkeley National Laboratory

Rapid Deduction of Stress Response Pathways in Metal/Radionuclide Reducing Bacteria

A4Rapid Deduction of Stress ResponsePathways in Metal/RadionuclideReducing Bacteria

Adam Arkin2,3 ([email protected]), Alex Beliaev8,Inna Dubchak2, Matthew Fields1, Terry Hazen2, JayKeasling2, Martin Keller4, Vincent Martin2, 3, FrankOlken2, Anup Singh5, David Stahl7, DorotheaThompson1, Judy Wall6, and Jizhong Zhou1

1Oak Ridge National Laboratory; 2Lawrence BerkeleyNational Laboratory; 3University of California, Berkeley;4Diversa, Inc.; 5Sandia National Laboratories; 6University ofMissouri, Columbia; 7University of Washington, Seattle; and8Pacific Northwest National Laboratory

The focus of our research is the characterizationof regulatory networks in microorganisms, andthe creation of data-driven, validated mathemati-cal models of stress response to conditions com-monly found in U.S. Department of Energy(DOE) metal and radionuclide contaminatedsites. We have created an integrated program ofapplied environmental microbiology, functionalgenomic measurement, and computational analy-sis and modeling that seeks to understand thebasic biology involved in a microorganism�s abil-ity to survive in the relevant contaminated envi-ronments while reducing metals andradionuclides. Our main focus is Desulfovibriovulgaris because of its metabolic versatility, itsability to reduce metals of interest to DOE, andits relatively easy culturability and molecular biol-ogy. However, because achieving our program-matic goals requires a comparative analysis ofregulation among multiple bacteria in the envi-ronment, we are also studying Shewanellaoneidensis and Geobacter metallireducens, which fol-low different lifestyles than Desulfovibrio. Becausea strong research community is already studyingthese former two microbes� behavior under theauspices of DOE�s Microbial Cell program, we arecoordinating with those teams to jumpstart theinitial research of this program. Our overarchinggoal is to develop criteria for monitoring theintegrity (health) and altering the trajectory of anenvironmental biological system (process con-

trol). To achieve this requires a more completeunderstanding of how the biological �units� com-prising the system are organized, regulated, andlinked in time and space (genes, genomes, cells,populations, communities, and ultimately, ecosys-tems). Key to these objectives is a more completeunderstanding of stress response systems and theirenvironmental context.

During the first few months of this project, wehave established our three research cores inApplied Environmental Microbiology (AEMC),Function Genomics (FGC), and Computation(CC). Each core has established a work plan withspecific tasks. The tasks and more detailed accom-plishments of each core will be presented in sepa-rate posters. A Web page (http://vimss.lbl.gov)was established immediately for communicationto the public, scientific community and the pro-ject teams. As part of the web page, we haveestablished bulletin boards for discussion and aninterface with the project database (Biofiles).Investigators have uploaded protocols for sam-pling and analysis, and data of various types tothe Biofiles database that the Computational Coreis developing for the project. The CC hasobtained sequences for all three bacteria andbegun analysis. The initial annotations have beencurated, operon, regulon and cis-regulatorysequence prediction have been made and the visu-alization tools are now being built. The FGC hasacquired new instrumentation (eg., Mass spec-trometers) and begun testing on Shewanellastrains. Standard culture conditions for theDesulfovibrio strains have also been tested at allsites and preliminary proteomics data has beenobtained. The AEMC has documented availabledata from the Field Research Center at Oak Ridgefrom various investigators and begun rigorousanalysis of samples for sulfate reducers and in par-ticular Desulfovibrio strains. The AEMC has alsoacquired anaerobic chambers and sediment sam-ples from contaminated areas at the FRC andbegun analysis of stressors to determine the mostappropriate initial simulations and directions forthe project. The initial focus is on pH, N, P, andO.

GTL Program Projects

8 Genomes to Life I

Page 21: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Oak Ridge National Laboratory

Genomes to Life Center for Molecular and Cellular SystemsA Research Program for Identification and Characterization of Protein Complexes

A6Bioinformatics and Computing in theGenomes to Life Center for Molecularand Cellular Systems

D. A. Payne*1 ([email protected]), E. S.Mendoza1, G. A. Anderson1, D. K. Gracio*1, W. R.Cannon1, T. P. Straatsma1, H. J. Sofia1, D. A.Dixon*1, M. Shah2, D. Xu2, D. Schmoyer2, S.Passovets2, I. Vokler2, J. Razumovskaya2, T.Fridman2, V. Olman2, A. Gorin2, E. Uberbacher2, F.Larimer2, and Y. Xu2

*Presenters

1Pacific Northwest National Laboratory; and 2Oak RidgeNational Laboratory

Scientists will generate large amounts of experi-mental and computational data at theORNL/PNNL Genomes to Life (GTL) Centerfor Molecular and Cellular Systems. Data will begenerated at several collaborating facilities andwill need to be shared among the collaboratorsand, ultimately, with the wider research commu-nity. The processing, analysis, management, andstorage of this data will require a flexible, robust,and scalable information system. As the GTL pro-ject ramps up, many of the data and sample track-ing and analysis functions will need to beautomated and integrated to keep up with thehigh-throughput processes. Since the start of theproject, our bioinformatics work has been focus-ing on three areas: 1) laboratory informationmanagement system (LIMS) in support of theCenter�s data management and storage, 2) massspectrometry proteomics analysis, and 3)bioinformatic analysis tools.

LIMS System

We have purchased a commercially available andproven LIMS system, Nautilus� (from ThermoLab Systems) to serve as the backbone for inte-grating data management and analysis. Nautilus,once configured, will provide comprehensive sam-ple tracking from planning through experimenta-tion, data analysis, reporting, and final archival ordisposal. Nautilus will be interfaced with labora-

tory instruments and data analysis tools and ser-vices to enable automation and standardization ofdata processing. Data will be archived throughintegration with the Environmental MolecularSciences Laboratory Northwest File Systemarchive.

A key to the success of this project will be theability for users to have ubiquitous, seamlessaccess to LIMS data at both ORNL and PNNL.To accomplish this data sharing, a schema will bedefined for components and workflow that arecommon to both facilities, and software will bewritten to access data from both instances of theLIMS system. Current activities include definingthe overall system, defining the data managementschema for the respective facilities at ORNL andPNNL, gathering requirements, and identifyingcommon data structures.

Mass Spectrometry Proteomic Data Analysis

Before the GTL program started, PNNL devel-oped the Proteomics Research Information Sys-tem and Management (PRISM) system thatstores, tracks pedigree of, and provides automatedanalyses of proteomic data. PRISM will be usedboth at PNNL and at ORNL for mass spectrome-try data analysis. It is composed of distributedsoftware components that operate cooperativelyon several commercially available computer sys-tems that communicate over standard networkconnections. PRISM collects data files directlyfrom all mass spectrometers in the laboratory andmanages storage and tracking of these data files aswell as automates the processing into both inter-mediate results and final products.

PRISM will be installed at ORNL to provide acommon proteomic data analysis capability. Addi-tionally, a mass spectrometry data analysis pipelinefor automated processing of large-scale mass spec-trometry data of proteins and protein complexeshas been designed and is in the early stages ofimplementation. The pipeline is designed to pro-cess data generated using both bottom-up andtop-down approaches and to combine informa-tion derived from both approaches for identifyingproteins and protein complexes. The pipeline

GTL Program Projects

Genomes to Life I 9

Page 22: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

builds a data interpretation capability based onthree existing mass spectrometry data analysissoftware: SEQUEST, MASCOT, and COMET.These tools have been evaluated with systematiccomparison using experimental data. Throughthese analyses, computational techniques havebeen developed for assessing the reliability ofthese identification tools. For example, in the caseof SEQUEST, a neural network and a statis-tics-based method has been developed for suchreliability assessment. Such a capability can signif-icantly remove the need of human involvement inlarge-scale MS data interpretation. New methodsfor de novo sequencing that can complementdatabase search-based methods for protein identi-fication are also under development.

Bioinformatic Analysis Tools

In the area of bioinformatics, our project isfocused in many areas: computational inferencingof protein complexes, including membrane-associ-ated complexes, dynamic simulation of pro-tein-protein interaction, and functionalmechanism studies of protein complexes; charac-terizations of amino acids and peptide transportpathways; and identification of operons andregulons. Interactive analysis and visualizationtools are being developed to support these goals.

This research is supported by the Office of Biological and Envi-ronmental Research of the U.S. Department of Energy. PacificNorthwest National Laboratory is operated for the U.S. Depart-ment of Energy by Battelle Memorial Institute through ContractNo. DE-AC06-76RLO 1830.

A8Mass Spectrometry in the Genomes toLife Center for Molecular and CellularSystems

Gregory B. Hurst1 ([email protected]), Robert L.Hettich1, Nathan C. Verberkmoes1, Gary J. VanBerkel1, Frank W. Larimer1, Trish K. Lankford1,Steven J. Kennel1, Dale Pelletier1, JaneRazumovskaya1, Richard D. Smith2, Mary Lipton2,Michael Giddings5, Ray Gesteland4, Malin Young3,and Carol Giometti6

1Oak Ridge National Laboratory; 2Pacific NorthwestNational Laboratory; 3Sandia National Laboratories;4University of Utah; 5University of North Carolina; and6Argonne National Laboratory

Mass spectrometry is a significant contributor tothe Center for Molecular and Cellular Systemsdue to its capability for high-throughput identifi-cation of proteins and, by extension, protein com-plexes. From the outset of the Genomes To Life(GTL) Program, therefore, mass spectrometry hasan important role to play in the pursuit of Goal 1of the GTL�the identification of the �machinesof life.� The potential utility of mass spectrometryto GTL, however, extends far beyond currentcapabilities. In addition to incorporation ofstate-of-the- art mass spectrometry as a resource,we have also included a mass spectrometryresearch component as part of the Center forMolecular and Cellular Systems. The aim of thisresearch component is to improve on existingmass spectrometry tools for protein complex char-acterization, as well as to produce new tools thatwill further the goals of the GTL program. Key tothe success of this research component is closeinteraction with the protein expression, complexisolation, computational and imaging compo-nents of the Center.

Currently, mass spectrometry is contributingheavily to the process of identifying target pro-teins that are likely to be members of complexesin Rhodopseudomonas palustris. These target pro-teins will be evaluated for expression as fusionswith affinity labels to facilitate isolation of com-plexes. This identification process is based onmass spectrometric detection, in pelleted frac-tions, of proteins that one would normally expectto find in soluble fractions, indicating possiblemembrane association or membership in a largecomplex. From MS analysis of proteins from twodifferent growth conditions of R. palustris, an ini-tial list of target proteins has been assembled. The

GTL Program Projects

10 Genomes to Life I

Page 23: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

MS analysis strategy at ORNL measures bothintact molecular masses (�top-down�) and tandemmass spectra of tryptic digests of proteins (�bot-tom-up�). The �bottom-up� approach allowsmore comprehensive identification of proteins ina sample, while the �top-down� approach, whichexploits the high-performance characteristics ofFourier transform mass spectrometry, providesinformation on post-translational modifications.The accurate mass tag (AMT) approach at PNNLis aimed at increasing throughput, sensitivity, anddynamic range for enhancing the detection oflow-copy-number proteins and complexes.

We have also obtained initial mass spectrometryresults from affinity purifications of fusions of R.palustris genes with GST and 6-HIS affinity tags,expressed in E. coli, verifying correct expression ofthe fusion proteins. Two strategies are being com-pared for this measurement. The first strategy isto elute affinity-captured proteins from the resin,separate by 1D SDS-PAGE, excise bands, digest,and analyze by reverse-phase nanoscale liquidchromatography on line with nano-electrospray/tandem mass spectrometry. The second strategy isto eliminate the gel separation, and simply digestthe entire mixture eluted from the affinity resin.The latter strategy will improve throughput con-siderably. �Top-down� measurements of affin-ity-captured fusion proteins are also underway.Current experiments directed toward expressionof affinity-labeled proteins in R. palustris will pro-vide our first opportunity for mass spectrometricidentification of proteins that associate with theselabeled targets-an important first step for Goal 1of GTL.

Combined mass spectrometric and computationalmethods for characterizing crosslinked proteincomplexes are also under development.Crosslinking offers the opportunity to stabilize�fragile� complexes. Furthermore, it provides analternative method to introduce an affinity taginto a protein complex, potentially increasing thethroughput of analysis of complexes. Technicalissues to be solved include increasing the robust-ness of crosslinking protocols, mass spectrometricdetection of crosslinks, and computational meth-ods for data interpretation. We have made prog-ress in optimizing an affinity purificationprocedure based on peptides that have beencrosslinked using a biotinylated reagent. Com-puter programs for interpretation of mass spectraof crosslinked samples have been initiated. Dem-onstration of integrating these various compo-nents on a model protein complex is underway.

Although not all funded by GTL, other massspectrometric techniques relevant to the goals ofthe GTL are also under development. At ORNL,these include a method for characterizing surfacesof proteins and protein complexes via oxidativechemistry combined with mass spectometry, andsampling by electrospray mass spectrometry ofproteins captured on surfaces displaying arrays ofaffinity-capture reagents surfaces. PNNL is devel-oping hardware improvements for increasing thespeed, sensitivity, and dynamic range of measure-ments, as well as informatic methods for incorpo-rating chromatography elution information inprotein identification techniques.

This research sponsored by Office of Biological and Environ-mental Research, U.S. Department of Energy. Oak RidgeNational Laboratory (ORNL) is managed by UT-Battelle, LLC,for the U. S. Department of Energy under Contract No.DE-AC05-00OR22725.

A10Genomes to Life Center for Molecularand Cellular Systems: A Research Pro-gram for Identification and Character-ization of Protein Complexes

Joshua N. Adkins1, Deanna Auberry1, BaoweiChen1, James R. Coleman1, Priscilla A. Garza1, JaneM. Weaver Feldhaus1, Michael J. Feldhaus1, Yuri A.Gorby1, Eric A. Hill1, Brian S. Hooker1, Chian-TsoLin1, Mary S. Lipton1, L. Meng Markillie1, M. UljanaMayer1, Keith D. Miller1, Sewite Negash1, MargaretF. Romine1, Liang Shi1, Robert W. Siegel1, RichardD. Smith1, David L. Springer1, Thomas C. Squier1,H. Steven Wiley1 ([email protected]), Linda J.Foote2, Trish K. Lankford2, Frank W. Larimer2, T-Y.S. Lu2, Dale Pelletier2, Stephen J. Kennel2, andYisong Wang2

1Pacific Northwest National Laboratory; and 2Oak RidgeNational Laboratory

Summary: We have developed methodologies forisolating and identifying multiprotein complexesin Shewanella oneidensis MR-1 (PNNL) andRhodopseudomonas palustris (ORNL), whosemetabolisms are important in both understandingmicrobial energy production and environmentalremediation. We are comparing complementarymethods involving the isolation and identificationof transient and stable protein complexes, with acurrent focus on validating the physiological rele-vance of isolated protein complexes.

GTL Program Projects

Genomes to Life I 11

Page 24: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Cloning, Expression, and Purification: To date,23 S. oneidensis genes have been cloned into theGATEWAY� expression vector pDEST� con-taining a His6-tag for purification. Initial screen-ing tests indicate that ~73% of cloned genes wereexpressed. Among those expressed proteins, 8were purified to homogeneity using a Ni-NTAcolumn under nondenaturing conditions. Theyields of purified proteins obtained from 1 L ofculture varied from 5 to 29 mg. We have also con-structed new GATEWAY™-compatible vectorsthat will permit the expression of His6-taggedproteins in both S. oneidensis and R. palustris andthe subsequent isolation of preformed complexesfrom microbes. Using four modified pDEST vec-tors, 7 R. palustris, genes have been cloned andexpressed in E. coli. We are testing both N andC-terminal 6-his and GST tags for efficiency ofexpression and purification . Western blots of pro-teins and MS spectra of tryptic digests (see MSposter) of the GST-tagged nitrite reductase verifythe expression and purification of polyproteins athigh yield. The modified vector containing theGroEL gene has been inserted into R. palustrisand it appears to be retained and convey drugresistance to the bacteria. Pull down experimentsare in progress to isolate complexes from this tar-get organism.

Affinity Reagent Generation: Purified proteinsfrom S. oneidensis are currently being screenedagainst a cell surface display of single-chain frag-ment variable (scFv) antibodies on the yeastSaccharomyces cerevisiae developed at PNNL,allowing rapid generation of affinity reagents thatwill permit the capture of protein complexesformed in vivo. We expect that these affinityreagents will cross-react with homologous proteincomplexes in different microbes, permitting therapid isolation of protein complexes in a general-ized manner.

Tagging and Cross-Linking Approaches forComplex Isolation: In addition to the His6-tag,additional epitope tags are being assessed for theirutility in enhancing the specificity of complex iso-lation under milder isolation conditions that willretain low-affinity binding partners in proteincomplexes. To date, we have demonstrated theutility of the CCXXCC epitope sequence for pro-tein purification. Likewise, commercially availablelight-activated cross-linking reagents have beenused to stabilize protein complexes in cellularhomogenates from Shewanella, permitting theaffinity purification of protein complexes undermore stringent conditions that removenonspecifically associated proteins. Under these

conditions a limited range of cross-linked prod-ucts are observed that are readily characterized bymass spectrometry.

Complex Isolation and Identification: Criticalto the development of robust methods to rapidlyisolate protein complexes is the assessment ofstandard protocols to isolate and identify differentclasses of protein complexes. We have thereforedeveloped parallel methods focusing on the isola-tion and identification of membrane and solubleprotein complexes that are known to form eitherstable or transient protein-protein interactions.Initial measurements have focused on the identifi-cation of stable and soluble protein complexes(e.g., RNA polymerase A), which has permittedthe validation of protein isolation and cross-link-ing methods and the development of conditionsthat minimize nonspecific protein associations.However, because dynamic changes in proteincomplexes are expected to provide importantinsights into the metabolic regulatory strategiesused by these organisms to adapt to environmen-tal changes, we have extended these methods toassess transient protein interactions associatedwith signal transduction proteins(phosphotyrosine phosphatase A ) and stress-reg-ulated proteins (e.g., methionine sulfoxide reduc-tases A and B). In the latter cases, these proteinsare known to interact and reduce oxidized sub-strates on a time scale of minutes. The develop-ment of immunoprecipitation methods thatpermit the isolation of transient complexes involv-ing these proteins suggests that generalizablestrategies to rapidly isolate protein complexes canbe used to identify the formation of transient pro-tein complexes. Surprisingly, the catalytic activityof methionine sulfoxide reductases fromShewanella has additional catalytic activities rela-tive to those found in either E. coli or vertebrates,consistent with Shewanella’s known ability tothrive under harsh environmental conditions. Weexpect that identifying binding partners betweenthis critical antioxidant protein will, furthermore,provide important information regarding oxida-tively sensitive proteins and associated regulatorystrategies that these organisms implement to sur-vive.

Of the 7 R. palustris proteins expressed in themodified pDEST vector, we are concentrating onthe GroEL chaparonin protein to validate com-plex formation. The tagged protein expressed inE. coli can be used to complex with GroES fromR. palustris to document complex formation andpull-down efficiency. R. palustris has two differentgenes for GroEL type proteins and we will test if

GTL Program Projects

12 Genomes to Life I

Page 25: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

each is expressed and if they form co-complexesor if they are used separately for different func-tions. Dissimilatory nitrite reductases are capableof generating a membrane potential, as well asproviding an electron sink for maintenance of bal-anced photosynthetic growth in the presence ofhighly reduced C-sources. In addition, there is areport that cells engaged in denitrification have analtered chemotactic response. Other systems beingexpressed include subunits of the uptakehydrogenase and components of sulfite oxidation,i.e., sulfite dehydrogenase, and sulfite oxidase.

This research is supported by the Office of Biological and Envi-ronmental Research of the U.S. Department of Energy. PacificNorthwest National Laboratory is operated for the U.S. Depart-ment of Energy by Battelle Memorial Institute through ContractNo. DE-AC06-76RLO 1830. Oak Ridge National Laboratoryis managed by UT-Battelle, LLC, for the U. S. Department ofEnergy under Contract No. DE-AC05-00OR22725.

A12New Approaches for High-ThroughputIdentification and Characterization ofProtein Complexes

Michelle Buchanan1 ([email protected]),Frank Larimer1, Steven Wiley2, Steven Kennel1,Thomas Squier2, Michael Ramsey1, Karin Rodland2,Gregory Hurst1, Richard Smith2, Ying Xu1, DavidDixon2, Mitchel Doktycz1, Steve Colson2, CarolGiometti3, Raymond Gesteland4, Malin Young5, andMichael Giddings6

1Oak Ridge National Laboratory; 2Pacific NorthwestNational Laboratory; 3Argonne National Laboratory;4University of Utah; 5Sandia National Laboratories; and6University of North Carolina

The Center for Molecular and Cellular Systems(CMCS) is a recently established project thatfocuses specifically on Goal 1 of the GTL pro-gram. Its aim is to identify and characterize thecomplete set of protein complexes within a cell toprovide a mechanistic basis of biochemical func-tions. Achieving this Goal would provide the abil-ity to understand cells and their components insufficient detail to allow the creation of networkmaps of cells that could be used in building mod-els to predict, test and understand the responsesof a biological system to its environment. Further,Goal 1 forms the foundation necessary to accom-plish all of the other objectives of the GTL pro-gram, which are focused on gene regulatory

networks and molecular level characterization ofinteractions in microbial communities.

A stated goal of the GTL program is to identifygreater than 80% of the protein complexes in anorganism per year within the first five years of theprogram. Ultimately, the GTL program willrequire the analysis of thousands of protein com-plexes from hundreds of microbes each year. Thecentral task of the CMCS (Core Project) is tointegrate biological, analytical, and computationaltools to allow identification and characterizationof protein complexes in a robust, high-throughputmanner. The Core includes systems for growth ofmicrobial cells under well-characterized condi-tions, isolation of protein complexes from cells,and their analysis by mass spectrometry (MS), fol-lowed by verification and characterization byimaging techniques. Several approaches for theisolation of the complexes are currently beingexamined and compared, including affinity tags(e.g., GST and 6-HIS affinity tags) and singlechain antibodies. Computational tools are beingintegrated into this process to track samples,interpret the data, and to archive and disseminatedata. Automated, parallel sample handling pro-cesses will be incorporated to maximize through-put and minimize amount of sample required.

The CMCS is initially focused on the identifica-tion and characterization of protein complexes intwo microbial systems, Shewanella oneidensis andRhodopseudomonas palustrus. The aim is to obtain aknowledge base that can provide insight into therelationship between the complement of proteincomplexes in these microbes and their biologicalfunction. Early activities within the Core havefocused on setting up isolation, purification andanalysis techniques and obtaining data on specificcomplexes in these two microbes. For R. palustris,we have performed baseline growth studies in twoimportant metabolic states, anaerobicphotohetero-trophic and dark aerobicheterotrophic. Wild-type cultivations at up to 2-Lhave generated samples for proteome analysis andfor isolation of protein complexes. Data has beenobtained from affinity purification of fusion pro-teins between several R. palustris genes and GSTand 6-HIS affinity tags have been expressed in E.coli. We have verified correct expression of thefusion proteins and affinity-labeled proteins in R.palustrus. Various forms of chaperonin60, nitritereductase, hydrogenase subunits, sulfitedehydrogenase, and thiosulfite oxidase are cur-rently being examined. Work with Shewanella hasfocused on an initial set of tagged proteinsexpressed in E. coli; 20 proteins are in progress,

GTL Program Projects

Genomes to Life I 13

Page 26: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

among them, phosphotyrosine phosphatase,methionine sulfoxide reductase and RNA poly-merase�alpha subunit have been purified and car-ried forward to use as bait with Shewanellaextracts, with MS-MS analysis proceeding.

The Core of the CMCS will generate largeamounts of experimental data at different sitesand these data will need to be shared among thecollaborators and, ultimately, with the widerresearch community. The management and stor-age of this data requires a flexible, robust and scal-able information system. After a comprehensiveanalysis and evaluation of the CMCS�s processand data flow information need, we selected aLaboratory Information Systems (LIMS) that willserve as the backbone for integrating data man-agement and analysis. Concurrent with evaluationof LIMS systems, we have also examined the pro-cesses within the Core that can be readily auto-mated and incorporated into parallel processes(e.g., 96 well plate format), such as cell lysis,complex isolation, and final purification prior toMS analysis.

As initial data are generated within the Core, weare also evaluating the technologies to identifybottlenecks and needs for technology improve-ment. Current technologies for the identificationand characterization of protein complexes will notbe sufficient to meet the long-term goals of theGTL program. Therefore, a number of researchtasks have been devised to address specificrequirements of the Core, including newapproaches for high throughput complex process-ing. For example, as part of the efforts to improvesample processing, we are evaluating microfluidicdevices for microbial cell lysis and protein/peptideseparation. We are also examining novelapproaches for optimizing molecular characteriza-tion by MS, such as improving sensitivity anddynamic range. Combined MS and computationalmethods for characterizing crosslinked proteincomplexes area also under development.Crosslinking offers the opportunity to stabilize�fragile� complexes, and is an alternative to intro-ducing an affinity tag into the complex, poten-tially increasing analysis throughput. Initialinvestigations have included optimization of anaffinity purification procedure based oncrosslinked biotinylated peptides, and the identifi-cation of putative cross-links in model proteincomplexes. In addition, imaging techniques arebeing developed to validate the presence of com-plexes in cells and to provide physical character-ization of the complexes. Finally, bioinformaticstools for data tracking, acquisition, interpretation,

and dissemination, along with computationaltools for modeling and simulation of proteincomplexes are being developed.

This research sponsored by Office of Biological and Environ-mental Research, U.S. Department of Energy. Oak RidgeNational Laboratory (ORNL) is managed by UT-Battelle, LLC,for the U. S. Department of Energy under Contract No.DE-AC05-00OR22725.

A14Automation of Protein Complex Anal-yses in Rhodopseudomonas palustris andShewanella oneidensis

P. R. Hoyt1 ([email protected]), C. J.Bruckner-Lea2, S. J. Kennel1, P. K. Lankford1, M. S.Lipton2, R. S. Foote1, J. M. Ramsey1, K. D.Rodland2, and M. J. Doktycz1

1Oak Ridge National Laboratory; and 2Pacific NorthwestNational Laboratory

High-throughput analyses afforded by mass spec-troscopy require sample preparation processesthat can keep pace. Standardization and automa-tion of protein �pulldowns�, and related reagentsare being developed. The processes are designedto provide a straightforward material flow inhigh-throughput format for the pulldown of pro-tein complexes from the Rhodopseudomonaspalustris and Shewanella oneidensis genomes.Existing techniques are well developed; however,some processes in clone library, antibody, and pro-tein complex production have never been auto-mated and few established protocols are available.In order to provide the highest level of biologicalsignificance and protein interaction coverage, theprotein complex pulldowns from the differentorganisms will use different strategies. Subse-quently, automation is designed to use flexible,compatible processes of varied scale during theprogram such that advances in technology can beevolved into innovative high-throughput tech-niques for sample preparation. The result will be aunique and robust system for protein expressionand complex pulldown in bacterial systems.

The process for production of native tagged pro-teins for complex pulldown experiments uses con-ventional fluidics scale of 96-well format andliquid handling robotics. It is subdivided into themolecular preparation of a complete genomiclibrary of expression clones for in vivo expressionof R. palustris genes, followed by the production

GTL Program Projects

14 Genomes to Life I

Page 27: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

of proteins and �pull-downs� of protein com-plexes for analyses by mass spectrometry. Thegene library and protein production schemeinvolves a suite of high-throughput molecularbiology techniques based on the Gateway� tech-nology cloning strategy supplied by InvitrogenCorporation. This process requires two rounds ofrecombination between purified DNAs to pro-duce protein expression vectors suitable forpull-down experiments in RP. At this time, allPCR setup, PCR purification, plasmid isolation,and redistribution steps, have been fully auto-mated and integrated into an information man-agement system for sample tracking.Recombination reactions should be fully auto-mated in the near future using existing instrumen-tation. High-throughput automation of theelectroporation steps, as well as colony pickingcan be automated using commercially availableproducts, which are currently being evaluated.This leaves only the plating of bacteria on selec-tive media to rely on manual processes.

Because detergents are not compatible with massspectroscopic analyses, manual disruption pro-cesses were required. We were able to adapt ahigh-throughput, closed container non-detergentbead-milling technology (used originally forhigh-throughput isolation of RNA from animaltissues), to disrupt the R. paulutris cell walls. Thisprocess results in comparable protein profiles gen-erated using other physical disruption techniques.Bead milling has been found to be most compati-ble with downstream MS analyses. Additionally, itreduces cross-contamination, and provides anextraordinary level of automation to the produc-tion process.

An heterologous-tagged protein pulldown system,for S. oneidensis using single-chain antibodies (Ab)to specific expressed proteins is also under auto-mation development. This process uses amicrofluidics platform combined withfunctionalized microbeads for the purification ofprotein complexes. A renewable microcolumn sys-tem with optical detection has been assembledand automated procedures developed. The renew-able microcolumn consists of small volumes(microliters) of microbeads that are automaticallypacked, perfused with cell lysates, and wash solu-tions, and proteins eluted using a solution that issuitable for mass spectrometry analysis. After eachpurification, the small volume of microbeads isautomatically flushed from the microcolumn anda new microcolumn is automatically packed. Themicrobeads are functionalized for the capture of aspecific protein, for example by derivatization

with an antibody for the protein of interest.Optical monitoring of the microcolumn duringprocessing provides information about theamount of material on the column during eachbinding and washing step. The current automatedprocedure can process a cell lysate volume rangingfrom 10 microliters to 1 millilter, and the purifiedproteins are eluted into 150 microliters of a lowsalt buffer solution. Automated procedures arecurrently being tested for the capture ofShewanella proteins tagged with yellow fluores-cent protein (YFP), along with the proteins thatassociate with the YPF-tagged protein. As newreagents for protein capture such as single chainantibodies for Shewanella proteins of interest aredeveloped, they will be linked to microbeads andrenewable column protocols will be developed forautomated purification of the protein complexesfor mass spectrometry. In the next stage of thiswork, the eluted protein complexes will be ana-lyzed by mass spectrometry and the automatedprotocols will be optimized.

For the ultimate in throughput and sensitivity, alab-on-a-chip complex isolation and identificationprogram is also under development. Many of theindividual steps involved in sample processing andanalysis, including cell lysis, protein/peptide sepa-rations and enzyme digestions, have been imple-mented in microfluidic devices that can beinterfaced with mass spectrometry for on-lineanalysis. (We have previously demonstrated elec-trically induced lysis of mammalian cells inmicrofluidic devices and will apply this techniqueto bacterial protoplasts). The integration of thesefunctions with a pull-down step would providehigh-throughput analyses of protein complexes inextremely small numbers of cells.

In summary, protein complex analysis by massspectroscopy will require a high-throughputreagent production scheme. Because the com-plexes isolated are different for the differentorganisms, different schemes for complex isola-tion have been implemented. At scales rangingfrom macro to micro we are automating the pro-duction of reagents and samples to produce thesedifferent complexes, and the processes are beingoptimized to feed into mass spectroscopic analy-ses. The automation development is concomitantwith establishment of sample tracking and infor-mation management processes so that integrationof these systems will be seamless.

This research sponsored by Office of Biological and Environ-mental Research, U.S. Department of Energy. Oak RidgeNational Laboratory (ORNL) is managed by UT-Battelle, LLC,

GTL Program Projects

Genomes to Life I 15

Page 28: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

for the U. S. Department of Energy under Contract No.DE-AC05-00OR22725.

Sandia National Laboratories

Carbon Sequestration in Synechococcus

From Molecular Machines to Hierarchical Modeling

A16Analysis of Protein Complexes from aFundamental Understanding of Pro-tein Binding Domains and Pro-tein-Protein Interactions inSynechococcus WH8102

Anthony Martino1 ([email protected]), AndreyGorin2, Todd Lane1, Steven Plimpton1, NagizaSamatova2, Ying Xu2, Hashim Al-Hashimi3, CharlieStrauss4, Byung-Hoon Park2, George Ostrouchov2,Al Geist2, William Hart2, and Diana Roe1

1Sandia National Laboratories, P.O. Box 969, MS9951,Livermore, CA 94551; 2Oak Ridge National Laboratory,P.O. Box 2008, MS6367, Oak Ridge, TN 37831;3University of Michigan, Department of Chemistry, 930 N.University, Ann Arbor, MI 48109; and 4Los AlamosNational Laboratories, P.O. Box 1663, Los Alamos, NM87545

The goal of this work is to characterize proteincomplexes in SynechococcusWH8102 by studyingprotein-protein interaction domains. We arefocused on two efforts, one on the protein com-position and cognate binding partners in thecarboxysome, and another on characterization ofknown protein binding domains throughout thegenome. An experimental design is chosen tointegrate a number of computational techniquesin order to develop a fundamental understandingof how protein complexes form.

Experimental Elucidation of Protein Com-plexes

Initial efforts will focus on the carboxysome, apolyhedral inclusion body that consists of a pro-tein shell surrounding ribulose 1,5-bisphosphatecarboxylase/oxygenase (RuBisCO). WhileRuBisCO regulates photosynthetic carbon reduc-tion, the function of the carboxysome is unclear.The carboxysome may either actively promotecarbon fixation by concentrating CO2 or passivelyplay a role by regulating RuBisCO turnover. Thepresence of carbonic anhydrase, an enzyme thatregulates the equilibrium between inorganic car-

bon species, in the carboxysome would suggest anactive role in carbon concentration, but experi-mental results are mixed. No clear biochemicalevidence for a link between carbonic anhydraseand the carboxysome exists in WH8102.

We are developing synergistic techniques includ-ing protein identification mass spectrometry, yeast2-hybrid, phage display, and NMR to characterizethe composition, cognate binding partners, andprotein interaction domains in the carboxysome.Established techniques are in progress to purifycarboxysomes. Earlier literature indicate thecarboxysome is composed of 5-15 peptides. Inseveral organisms, a number of proteins withincarboxysomes are known, and in SynechococcusWH8102, a number are inferred by homology.Results are dependent on sometimes difficultcarboxysome preparations. We hope to report onprogress in this area specific to SynechococcusWH8102. After SDS-PAGE separation and in-gelenzymatic digests, comprehensive protein identifi-cation will be determined using quadrupoletime-of-flight mass spectrometry with anelectrospray ionization source. Cognate bindingpairs between known proteins will be determinedby systematic yeast 2-hybrid experiments. Theresults will be verified and explored further usingphage display to determine potential proteinbinding domains. Both genomic and random pep-tide libraries will be employed. Finally, we willpursue the development of automatedRDC-NMR methods for high throughput assign-ments and characterization of relative domainalignments in two sub-units in RuBisCO (52KDa) and organization of the carboxysome.NMR methods for characterizing protein-proteininteractions are also being developed that rely onprobing interactions between proteins and pep-tide moieties that are attached to field orientedphage particles. Such an approach would enjoyhigh sensitivity to molecular interactions, provid-ing an effective complement to phage displaymethods.

GTL Program Projects

16 Genomes to Life I

Page 29: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

In a broader effort, proteins in SynechococcusWH8102 containing known binding domains willbe explored using phage display. Eight TPR, fourPDZ, and four CBS domains are indicated bypfam analysis in ORFs of SynechococcusWH8102.Three SH3-homologous domains have beendescribed in other cyanobacteria. Determinationof consensus binding sites within the genome willcharacterize possible fundamental interactiondomains in complexes and provide insight forcomputing theoretical protein interaction maps.

Computational Elucidation of Protein Com-plexes

Investigations of protein-protein interactions areconducted on many levels and with different ques-tions in mind�ranging from the reconstruction ofgenome-wide protein-protein interaction net-works and to detailed studies of the geome-try/affinity in a particular complex. Yet as thequestions asked at the different levels are oftenintricately related and interconnected, we areapproaching the problem from several directions,developing computational methods involvingsequence analysis approaches, low resolution pre-diction of protein folds and detailed atom-atomsimulations.

The sequencing of complete genomes has createdunique opportunities to fuse the knowledgeextracted from genomic contexts for prediction ofthe functional interactions between genes. Herewe demonstrate that unusual protein-profile pairscan be �learned� from the database of experimen-tally determined interacting proteins. Distribu-tions of protein-profile counts are calculated forrandom and interacting protein pairs. A pair ofprotein-profiles is considered unusual if its fre-quency distribution is significantly different com-pared to what is expected at random. Wedemonstrate that statistically significant patternscan be identified among protein-profiles charac-terized by the PFAM domains, Blocks proteinfamilies, or InterPro signatures but not by thePROSITE and TIGRFAM. Such patterns can beused for predicting putative pairs of interactingproteins beyond original �learning database�.

In addition to �sequence-based� protein signa-tures one of our main aims is the development ofstructure-based algorithms for the inference ofprotein-protein interactions. At the initial stagewe will apply structure prediction methods todetermine protein fold families with ourROSETTA and PROSPECT programs and useinferred structural similarities to create hypotheses

about their interacting partners. The necessarystep in this process is a creation of structure pre-diction pipeline for high throughput characteriza-tion of the protein folds. The computationalpipeline merges several bioinformatics and model-ing tools including algorithms for protein domaindivision, secondary structure prediction, fragmentlibrary assembly, and structure comparison. Sincethe protein folding algorithms deliver not uniqueanswers but rather ensembles of predictions wewill also construct database system to store andcurate the accumulated inferences.

Finally, we are developing tools for full atommodeling of protein-protein interactions. Tem-pering capability is being integrated into our par-allel molecular dynamics code (LAMMPS). Intempering, multiple copies of a system are simu-lated simultaneously. Temperature exchanges areperformed between copies to more efficientlysample conformational space. We are using tem-pering to generate conformations of short peptidechains in solution, similar to the peptide frag-ments that bind to proteins in the phage displayexperiments our team is performing. These con-formations will be used in peptide docking calcu-lations against protein binding domains fromSynechococcus. We are extending our docking codePDOCK with genetic-algorithm optimizers toenable peptide flexibility in this step. The com-puted conformations of docked complexes will befurther relaxed and solvated with molecular tools(MD and classical DFT) to estimate relative bind-ing affinities, converting experimental phage dis-play output into quantitative protein/proteinnetwork data.

GTL Program Projects

Genomes to Life I 17

Page 30: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

A18Carbon Sequestration in Synechococcus:Microarray Approaches

Brian Palenik4, Anthony Martino2, Jerilyn A.Timlin2 ([email protected]), David M. Haaland2,Michael B. Sinclair2, Edward V. Thomas2, VijayaNatarajan3, Arie Shoshani3, Ying Xu1, Dong Xu1,Phuongan Dam1, Bianca Brahamsha4, Eric Allen4,and Ian Paulsen5

1Oak Ridge National Laboratory; 2Sandia NationalLaboratories; 3Lawrence Berkeley National Laboratory;4Scripps Institute of Oceanography; University of SouthernCalifornia, San Diego; and 5The Institute for GenomicResearch

Synechococcus sp. are major primary producers inthe marine environment. Their carbon fixationrates are likely affected by physical and chemicalfactors such as temperature, light, and the avail-ability of nutrients such as nitrate and phosphate.In our GTL, microarray analysis is being devel-oped as a collaborative multidisciplinary projectto characterize Synechococcus gene expressionunder different environmental stresses. We areconstructing a whole genome microarray. We aredeveloping microarray experiments using statisti-cal considerations as input to the process. We areanalyzing the arrays with a unique hyperspectralscanner and associated analysis algorithms. Themicroarray data will be archived using state of theart database management techniques. Themicroarray data will then be analyzed using ourrecently developed techniques for cluster, datamining, and incorporated in pathway analyses.The result will be biological insights intoSynechococcus and marine primary productivity notachievable by a single investigator approach.

A20Carbon Sequestration in Synechococcussp.: From Molecular Machines to Hier-archical Modeling

Grant S. Heffelfinger1([email protected]),Anthony Martino2, Andrey Gorin3, Ying Xu3, MarkD. Rintoul III1, Al Geist3, Hashim M. Al-Hashimi8,George S. Davidson1, Jean Loup Faulon1, Laurie J.Frink1, David M. Haaland1, William E. Hart1, ErikJakobsson7, Todd Lane2, Ming Li9, Phil Locascio2,Frank Olken4, Victor Olman2, Brian Palenik6, StevenJ. Plimpton1, Diana C. Roe2, Nagiza F. Samatova3,Manesh Shah2, Arie Shoshani4, Charlie E. M.Strauss5, Edward V. Thomas1, Jerilyn A. Timlin1,and Dong Xu2

1Sandia National Laboratories, Albuquerque, NM; 2SandiaNational Laboratories, Livermore, CA; 3Oak Ridge NationalLaboratory, Oak Ridge, TN; 4Lawrence Berkeley NationalLaboratory, Berkeley, CA; 5Los Alamos National Laboratory,Los Alamos, NM; 6University of California, San Diego;7University of Illinois, Urbana/Champaign; 8University ofMichigan; and 9University of California, Santa Barbara

This talk will discuss the Sandia-led Genomes toLife (GTL) project: �Carbon Sequestration inSynechococcus sp.: From Molecular Machines toHierarchical Modeling.� This project is focusedon developing, prototyping, and applying newcomputational tools and methods to ellucidate thebiochemical mechanisms of the carbon sequestra-tion of Synechococcus sp., an abundant marinecyanobacteria known to play an important role inthe global carbon cycle. Our effort includes fivesubprojects: an experimental investigation, threecomputational biology efforts, and a fifth whichdeals with addressing computational infrastruc-ture challenges of relevance to this project and theGenomes to Life program as a whole. Some detailwill be provided in this talk about each of oursubprojects, starting with our experimental effortwhich is designed to provide biology and data todrive the computational efforts and includes sig-nificant investment in developing new experimen-tal methods for uncovering protein partners,characterizing protein complexes, identifying newbinding domains. Discussion of our computa-tional efforts will include coupling molecular sim-ulation methods with knowledge discovery fromdiverse biological data sets for high-throughputdiscovery and characterization of protein-proteincomplexes and developing a set of novel capabili-ties for inference of regulatory pathways in micro-bial genomes across multiple sources ofinformation through the integration of computa-

GTL Program Projects

18 Genomes to Life I

Page 31: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

tional and experimental technologies. We are alsoinvestigating methods for combining experimen-tal and computational results with visualizationand natural language tools to accelerate discoveryof regulatory pathways and developing set ofcomputational tools for capturing the carbon fixa-tion behavior of complex of Synechococcus at differ-ent levels of resolution. Finally, because theexplosion of data being produced byhigh-throughput experiments requires data analy-sis and models which are more computationallycomplex, more heterogeneous, and require cou-pling to ever increasing amounts of experimen-tally obtained data in varying formats, we havealso established a companion computational infra-structure to support this effort. This element ofour project will be discussed in the larger GTLprogram context as well.

A22Systems Biology Models forSynechococcus sp.

Mark D. Rintoul1 ([email protected]), DamianGessler2, Jean-Loup Faulon1, Shawn Means1, StevePlimpton1, Tony Martino2, and Ying Xu3

1Sandia National Laboratories; 2National Center for GenomeResources; and 3Oak Ridge National Laboratory

Ultimately, all of the data that is generated fromexperiment must be interpreted in the context of amodel system. Individual measurements can berelated to a very specific pathway within a cell,but the real goal is a systems understanding of thecell. Given the complexity and volume of experi-mental data as well as the physical and chemicalmodels that can be brought to bear on subcellularprocesses, systems biology or cell models hold thebest hope for relating a large and varied numberof measurements to explain and predict cellularresponse. Clearly, cells fit the working scientificdefinition of a complex system: a system where anumber of simple parts combine to form a largersystem whose behavior is much harder to under-stand. The primary goal of this subproject is tointegrate the genomic data generated from theoverall project�s experiments and lower level simu-

lations, along with data from the existing body ofliterature, into a whole cell model that capturesthe interactions between all of the individualparts. It is important to note here that all of theinformation that is obtained from other efforts inthis project is vital to the work here. In a sense,this is the �Life� of the �Genomes to Life� themeof this project.

The precise mechanism of carbon sequestration inSynechococcus sp. is poorly understood. There ismuch unknown about the complicated pathwayby which inorganic carbon is transferred into thecytoplasm and then converted to organic carbon.While work has been carried out on many of theindividual steps of this process, the finer pointsare lacking, as is an understanding of the relation-ships between the different steps and processes.Understanding the response of Synechococcus sp. todifferent levels of CO2 in the atmosphere willrequire a detailed understanding of how the car-bon concentrating mechanisms in Synechococcus sp.work together. This will require looking thesepathways as a system.

The aims of this part of the project are to developand apply a set of tools for capturing the behaviorof complex systems at different levels of resolutionfor the carbon fixation behavior of Synechococcussp. The first aim is focused on protein networkinference and deals with the mathematical prob-lems associated with the reconstruction of poten-tial protein-protein interaction networks fromexperimental work such as phage display experi-ments and simulation results such as pro-tein-ligand binding affinities. Once thesenetworks have been constructed, Aim 2 and Aim3 describe how the dynamics can be simulatedusing either discrete component simulation (forthe case of a manageably small number of objects)or continuum simulation (for the case where theconcentration of a species is a more relevant mea-sure than the actual number). Finally, in thefourth aim we present a comprehensive hierarchi-cal systems model that is capable of tying resultsfrom many length and time scales together, rang-ing from gene mutation and expression to meta-bolic pathways and external environmentalresponse.

GTL Program Projects

Genomes to Life I 19

Page 32: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

University of Massachusetts, Amherst

Analysis of the Genetic Potential and Gene Expression of MicrobialCommunities Involved in the in situ Bioremediation of Uraniumand Harvesting Electrical Energy from Organic Matter

A24Analysis of the Genetic Potential andGene Expression of Microbial Commu-nities Involved in the in situBioremediation of Uranium and Har-vesting Electrical Energy from OrganicMatter

Derek Lovley1 ([email protected]),Stacy Ciufo1, Zhenya Shebolina1, AbrahamEsteve-Nunez1, Cinthia Nunez1, Richard Glaven1,Regina Tarallo1, Daniel Bond1, Maddalena Coppi1,Pablo Pomposiello1, Steve Sandler1, BarbaraMethé2, Carol Giometti3, and Julia Krushkal4

1University of Massachusetts; 2The Institute for GenomicResearch; 3Argonne National Laboratory; and 4University ofTennessee

The goal of this research is to develop models thatcan describe the functioning of the microbialcommunities involved in the in situbioremediation of uranium-contaminated ground-water and harvesting electricity from wasteorganic matter. Previous studies have demon-strated that the microbial communities involvedin uranium bioremediation and energy harvestingare both dominated by microorganisms in thefamily Geobacteraceae and that theseGeobacteraceae are responsible for the uraniumbioremediation and electron transfer to electrodes.The research plan is diagrammed below.

Progress to Date: Although the physiology ofpure cultures of Geobacters are being studied andmodeled in detail, the degree of similarity in thegenetic potential of the Geobacters in culture andthose that predominate during uraniumbioremediation or electrical energy harvesting isunknown. The environmental component of thestudies in the first four months of this projecthave focused a NABIR-program site, located inRifle, Colorado in which the addition of acetateto the subsurface stimulated the growth ofGeobacter species and the removal of uraniumfrom the groundwater. In order to evaluate thegenetic potential of the Geobacter species involved

in uranium bioremediation, which at timesaccounted for over 80% of the total microbialcommunity in the groundwater, genomic DNAwas extracted from sediments undergoing activeuranium reduction and is now being sequenced atthe Joint Genome Institute. Some of this datashould be available by the time of the meetingand novel methods for assembling complete ornearly complete Geobacter genomes from thisenvironmental genomic DNA will be presented.An additional strategy to learning more about thegenetic potential of Geobacters living in thesubsurface was to isolate the predominantGeobacters and sequence their genomes. Using anovel technique, we were able to isolate aGeobacter from the study site whose 16S rDNAsequenced matched a 16S rDNA sequence thatwas prevalent in clone libraries from the uraniumreduction zone at the study site. The genome ofthis organism will be studied in detail in the nextyear.

In order for information on the genetic potentialof Geobacters to be useful in predicting the activityof Geobacters during bioremediation or energyharvesting, it is important to understand howgene expression is regulated. Although Geobactershave previously been considered to be metaboli-cally simple organisms with little regulation,sequencing the genomes of several Geobacters has

GTL Program Projects

20 Genomes to Life I

Page 33: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

revealed that they have multiple complex regula-tory systems. Therefore, a major goal of this pro-ject is to investigate regulatory mechanisms inGeobacters. For example, analysis of the G.sulfurreducens genome revealed that it is highlyattuned to its environment with the largest num-ber of signal transduction proteins of any fullysequenced bacterium. Investigation of these regu-latory systems as well as other fur-like, fnr-like,and sigma factor systems are currently underway.A novel regulatory system, discovered in ourGenomes-to-Life research, in which Fe(III) serves

as a repressor signal controlling the expression ofthe fumarate reductase genes will also bedescribed.

Details on other key components of this projectwhich include: additional environmental studieson energy-harvesting electrodes; functional analy-sis of genomes of multiple species in the familyGeobacteraceae; and gene expression andproteomics studies to be conducted on sedimentswill also be presented.

GTL Program Projects

Genomes to Life I 21

Page 34: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

GTL Communication

B63Communicating Genomes to Life

Anne E. Adamson, Jennifer L. Bownas, Denise K.Casey, Sherry A. Estes, Sheryl A. Martin, MarissaD. Mills, Kim Nylander, Judy M. Wyrick, Laura N.Yust, and Betty K. Mansfield ([email protected])

Life Sciences Division, Oak Ridge National Laboratory,1060 Commerce Park, MS 6480; Oak Ridge, TN 37830

For the past 14 years, the Human Genome Man-agement Information System (HGMIS) hasfocused on presenting Human Genome Projectinformation and imparting knowledge to a widevariety of audiences. Our goal has been to helpensure that scientists could participate in and reapthe scientific bounty of this revolution, that newgenerations of students could be trained in thescience, and the public could make informed deci-sions regarding complicated genetics issues.Building on that experience, for the past 2 yearsHGMIS also has been involved in communicatingabout the DOE Office of Science Genomes toLife program, sponsored jointly by the Office ofBiological and Environmental Research (BER)and the Office of Advanced Scientific ComputingResearch (OASCR).

The Genomes to Life systems biology program isa departure into a new territory of complexity andopportunity requiring contributions from teamsof interdisciplinary scientists from the life, physi-cal, and computing sciences, necessitating anunprecedented integrative approach to both thescience and to science communication strategies.Because each discipline has its own perspectiveand language, effective communication, in addi-tion to technical achievement, is highly critical toGTL's overall scientific coordination and success.Part of the challenge is to help groups speak thesame language from the team-building and strat-egy-development phases through program imple-mentation and the reporting of results to scientificand public audiences. Our mission is to informand foster participation by the greater scientificcommunity, science administrators, educators,students, and the general public. Specifically, GTLcommunications goals include the following:

• Facilitate science by fostering information shar-ing, strategy development, and communicationamong scientists and across disciplines toaccomplish synergies, innovation, and increasedintegration of scientific knowledge.

• Help reduce duplication of scientific effort.

• Increase public awareness of the importance ofunderstanding microbial systems and theircapabilities.

In our work with interdisciplinary teams assem-bled by BER to hold discussions and develop sci-entific and programmatic strategies to accelerateGTL science, we create internal documentationWeb sites that organize draft texts, presentations,graphics, and supplementary materials and links.From such team activities arose a number ofimportant documents including more than 20texts and presentations since October 2000:• Roadmap and Web site, April 2001.

• Handouts for several BER and OASCR advi-sory committee meetings.

• Workshop reports.

• Numerous overview documents, includingabstracts and flyers.

• Contractor-grantee workshop research abstractsbook.

All GTL publications are on the public Web site.The GTL site also includes an image gallery,research abstracts, and links to program fundingannouncements and individual researcher Websites. Site enhancements are under way.

In addition to the GTL Web site, we producesuch related sites as Human Genome ProjectInformation, Microbial Genome Program, Micro-bial Genomics Gateway, Gene Gateway, Chromo-some Launchpad, and the CERN Library onGenetics. Collectively, HGMIS Web sites receivemore than 10 million hits per month; one milliontext file hits from more than 270,000 user ses-sions that last an average of more than 12 min-utes-well over the average time for Web visits. Weare leveraging this Web activity to increase visibil-ity for the GTL program.

Genomes to Life I 23

Page 35: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

HGMIS also identifies venues for special GTLsymposia or presentations by program managersand grantees. We present the GTL program viaour exhibit at meetings of such organizations asthe American Association for the Advancement ofScience, American Society for Microbiology,American Chemical Society, and the Biotechnol-ogy Industry Organization, as well as the G8energy ministers' conference hosted by DOE Sec-retary Abraham.

As HGMIS anticipates communications needsand new avenues to more comprehensively

represent GTL science, we continually seek ideasfor extending and improving communications andprogram integration efforts. We welcomesuggestions and input.DOEGenomesToLife.org

865-576-6669

This research sponsored by Office of Biological and Environ-mental Research, U.S. Department of Energy. Oak RidgeNational Laboratory (ORNL) is managed by UT-Battelle, LLC,for the U. S. Department of Energy under Contract No.DE-AC05-00OR22725.

GTL Communication

24 Genomes to Life I

Page 36: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Modeling/Computation

A26Hierarchical Organization of Modular-ity in Metabolic Networks

Albert-László Barabási1 ([email protected]), Zoltán N.Oltvai2 ([email protected]), A. L. Somera3, D. A.Mongru3, G. Balazsi3, Erzsebet Ravasz1, S. Y.Gerdes4, J. W. Campbell4, and A. L. Osterman4

1University of Notre Dame, Department of Physics, 225Nieuwland Science Hall, Notre Dame, IN 46556,574-631-5767, Fax: 574-631-5952; 2Department ofPathology, Northwestern University Medical School, WardBldg. 6-204, W127, 303 E. Chicago Ave., Chicago, IL60611, 312-503-1175, Fax: 312-503-8240; 3NorthwesternUniversity; and 4Integrated Genomics, Inc.

The identification and characterization of sys-tem-level features of biological organization is akey issue of post-genomic biology. An elegantproposal addressing the cell�s functional architec-ture is offered by the concept of modularity,assuming that the cell can be partitioned into acollection of modules. Each module, a discreteentity of several elementary components, per-forms an identifiable biological task, separablefrom the functions of other modules. Yet, it isnow widely recognized that the thousands ofcomponents of the metabolism are dynamicallyconnected to one another, such that the cell�sfunctional properties are ultimately encoded into acomplex metabolic web of molecular interactions.Within this network, however, modular organiza-tion and clear boundaries between sub-networksare not immediately apparent. Indeed, recentstudies have demonstrated that metabolic net-works have a scale-free topology. A distinguishingfeature of such scale-free networks is the existenceof a few hubs, highly connected metabolites suchas pyruvate or CoA, which participate in a verylarge number of metabolic reactions. With a largenumber of links, these hubs integrate all sub-strates into a single, integrated web in which theexistence of fully separated modules is prohibited.

To resolve the apparent contradiction, we nowprovided evidence that the metabolism has a hier-archical organization, an architecture thatseamlessly integrates a scale-free topology with aninherent modular structure. For this purpose we

have shown that the degree of clustering presentin the network can be used as a distinguishing fea-ture of a hierarchical structure, and offered directevidence that the metabolism of 43 organismshave such a hierarchical architecture.

To turn this new conceptual framework into apractical tool we developed a method to directlyidentify and visualize the topological modulespresent in the E. coli metabolism and identifiedthe function of these modules based on the pre-dominant biochemical class of the substrates theybelong to, using the standard, small molecule bio-chemistry based classification of metabolism. Wefind that most substrates of a given small mole-cule class are distributed within the same identi-fied module and correspond to relativelywell-delimited regions of the metabolic network,demonstrating strong correlations between sharedbiochemical classification of metabolites and the

Genomes to Life I 25

Barabási— Fig. 1. The E. coli metabolic network color-coded

based on the biochemical classification of the individual substrates.

Each node corresponds to a metabolite, and links represent

biochemical reactions between them.

Page 37: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

global topological organization of E. coli. Theseresults and the systematic experimental corrobora-tion of this framework by global transposonmutagenesis will be discussed.

Supported by the DOE grant “The Organization of ComplexMetabolic Networks.” Principal Investigator: Albert-LászlóBarabási, University of Notre Dame.

Background Literature

Hierarchical Organization of Modularity in Metabolic Net-works, E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai,and A.-L. Barabási, Science Aug 30 2002: 1551-1555.

Experimental and System-Level Analysis of Essential andDispensable Genes in E. coli MG1655, S.Y. Gerdes et al, inpreparation.

A30SimPheny: A Computational Infra-structure Bringing Genomes to Life

Christophe H. Schilling1 ([email protected]), Radhakrishnan Mahadevan1, Sung Park1,Evelyn Travnik1, Bernhard O. Palsson2, CostasMaranas3, Derek Lovley4, and Daniel Bond4

1Genomatica, Inc., 5405 Morehouse Drive , Suite 210, SanDiego, CA 92121, 858-824-1771, Fax: 858-824-1772;2University of California, San Diego; 3Penn State University;and 4University of Massachusetts, Amherst

The Genomes to Life (GtL) program has clearlystated a number of overall goals that will only beachieved if we develop �a computational infra-structure for systems biology that enables thedevelopment of computational models for com-plex biological systems that can predict the behav-ior of these complex systems and their responsesto the environment.� At Genomatica we havedeveloped the SimPheny� (for Simulating Phe-notypes) platform as the computational infra-structure to support a model-driven systemsbiology research paradigm. SimPheny enables theefficient development of genome-scale metabolicmodels of microbial organisms and their simula-tion using a constraints-based modeling approach.

We are currently utilizing this platform for a num-ber of DOE-related projects including:

1. Developing the next generation ofgenome-scale models: In collaboration withProf. Costas Maranas at Penn State Univer-sity and Prof. Bernhard Palsson at the Univ.California, San Diego, we are integrating

methods to incorporate regulation and signaltransduction mechanisms into metabolicmodels and enable advance simulation algo-rithms that utilize mixed-integer linear pro-gramming (MILP).

2. Geobacter sulfurreducens Modeling: As partof the Microbial Cell Project led by Prof.Derek Lovley at the Univ. of Massachusettswe have completed the development of afirst draft genome scale model for G.sulfurreducens within SimPheny. We are nowbeginning the process of performing simula-tions with the model to providemodel-driven analysis of experimental data,and providing data integration solutionsthrough the development of a model centricdatabase

3. Pseudomonas fluorescens Model Development:As part of a Phase I Small Business Innova-tive Research (SBIR) grant we are construct-ing a genome scale model of P. fluorescensthat will be used to drive metabolic engineer-ing research on this organism for industrialbioprocessing applications.

Modeling/Computation

26 Genomes to Life I

Page 38: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Herein we will highlight the capabilities of theSimPheny platform as an infrastructure for sup-porting model driven systems biology researchwith special emphasis on its application to thedevelopment of the G. sulfurreducens metabolicmodel.

A32Parallel Scaling in Amber MolecularDynamics Simulations

Michael Crowley, Scott Brozell, and David A. Case([email protected])

Dept. of Molecular Biology, The Scripps Research Institute,La Jolla, CA 92037

Large-scale biomolecular simulations form anincreasingly important part of research in struc-tural genomics, proteomics, and drug design.Popular modeling tools such as Amber andCHARMM are limited by both state-of-the-arthardware capabilities and by software algorithmlimitations. Current macro-molecular systems ofinterest range in size to several hundred thousandatoms, and current simulations generally simulateone to tens of nanoseconds. With a 2 fs timestep,and each force evaluation involving millions ofinteractions to be calculated, a simulation requiresmany gigaflops to finish in a reasonable period oftime. A parallel implementation of the calculation

can provide the required performance by usingthe power of many processors simultaneously.However, communication speed between nodeshas not progressed as rapidly as CPU processingpower in recent years. Here, we address someweakness of the current parallel molecular dynam-ics implementation in Amber (and in a compara-ble program such as CHARMM). The work isaimed at making affordable a new generation ofincreasingly sophisticated biomolecular simula-tions.

Atom-Based Decomposition in Amber

Of the many ways to distribute the work of aforce calculation in parallel [1,2], the method ofreplicated data (or �atom decomposition�) hastraditionally been used in Amber and CHARMM.This sort of parallel implementation is based ondividing each portion of the force calculationevenly among the processors, while keeping a fullset of coordinates on all processors. This is veryflexible, and relatively straightforward to pro-gram. Each processor is assigned an equal numberof bonds, angles, dihedrals, and nonbond interac-tions. In this way, the work is balanced in eachpart of the force calculation, and the computationtime scales well as the number of processorsincreases. However, in each part of the force cal-culation a node computes forces for different sub-sets of atoms. For this reason, each processorrequires a complete set of up-to-date coordinatesand is assumed to have components of forces forall atoms. At each step, the forces computed forall atoms on each node must be summed and dis-tributed, and updated coordinates must be col-lected from each node and sent complete to allnodes. There are hence two all-to-all communica-tions at each step. Even with binary tree algo-rithms for distributed sums and redistribution,the communication time becomes a significantfraction of the total time by 32 processors, evenon the most sophisticated parallel machines. Thislimitation eliminates the possibility of efficientparallel runs at large numbers of processors, andputs a restriction on the size and length of simula-tions that a researcher can attempt even whenlarge parallel computational resources are avail-able. Still, for systems up to about 32 processors,these codes are more efficient for typical solvatedsimulations than are popular alternatives such asCHARMM or NAMD.

Spatial Decomposition in Amber

The second-generation parallel Amber, now underdevelopment, implements a �spatial decomposi-

Modeling/Computation

Genomes to Life I 27

Page 39: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

tion� method [1,2] in which the molecular systemis divided into regions of space where approxi-mately equal amount of force computation isrequired. The method works when contributionsto the force on an atom come primarily frominteractions with other atoms that are relativelyclose and are neglected for atoms that are beyonda fixed cutoff. (This condition is valid in modernMD simulations except for long-rangeelectrostatics, which use Ewald-based methodsdiscussed below.) In this approach, a processor isassigned the atoms located in a slice of space andit is responsible for the coordinates, forces, veloci-ties, and energetic contributions of those atoms.In order to compute the forces for its ownedatoms, the processor must be able to compute thecontributions from interactions with atoms thatare within the cutoff, including any that areassigned to other processors. A processor keeps acopy of all such needed atom coordinates andforces as well as its owned atom coordinates andforces. At each step, a processor determines theforce contributions due to all interactions in itsowned and needed atoms. It sends all force con-tributions on needed atoms to the processors thatown those atoms and receives any force contribu-tions for its owned atoms that were calculated byother processors. When the force communicationsare complete, the coordinate integration is per-formed on the owned atoms. Each message in allthe above communications is at most the size ofthe owned atom partition and will often be consid-erably smaller. Inventories of message sizes showsa reduction in overall data transferred to less thanhalf of that for the replicated-data method, forttypical solvated protein or nucleic acid systems.Timing for communication is reduced by nearlyidentical ratios.

This conversion of the Amber codes is complex,since there are complications inherent in spatialdecomposition that do not arise in the replicateddata method; these are mainly in the treatment ofbonded interactions, constraints, long-rangeelectrostatics, and bookkeeping. The first twocomplications arise when molecules (chemicalbonds) or distance constraints span the spatialboundaries. Most bonds, angles, dihedrals,restraints, and constraints can be assigned accord-ing to ownership of atoms. When the atomsinvolved are owned by distinct processors, analgorithm must be implemented to insure that theinteractions are considered but only once, andthat the coordinates necessary are current and cor-rect. Bond-length constraints (using the so-called

�SHAKE� approach) are more complicated, sincethey redefine the positions of atoms after thecomputed forces have been applied to ownedatoms. In this case, the updated positions of allatoms involved in a constraint must be known inorder to adjust positions of owned atoms regard-less of whether they are owned or not. Besidesthese complications lie the bookkeeping needed tokeep track of which forces and coordinates arebeing sent and received. One of the challenges isto keep the bookkeeping to a minimum, and tomake it as efficient as possible, so that it does notsimply replace the time saved in communications.Finally, we must optimize scaling of the Ewaldmethod of treating long-range electrostatics inperiodic system, and in particular, the PMEimplementation of Ewald sums. We are currentlyexploring several methods of reducing the com-munications costs of PME in highly parallel sys-tems.

Running Dynamics on Pentium Clusters

In molecular dynamics simulations the calcula-tions of the nonbonded interactions are a compu-tational bottleneck. These interactions dependupon the interparticle distances. On Intel 32 bitarchitectures (IA32) the fastest methods to evalu-ate reciprocal square roots utilize the StreamingSIMD (Single-instruction multiple-data) Exten-sions for double precision (SSE2) operations.Tuning the AMBER source code to enable auto-matic vectorization, i.e., generation of SSE2instructions by compilers, introduces a memorycache penalty as well as additional loop overhead.On IA32 platforms the SSE2 tuned AMBER isapproximately eight percent faster than the origi-nal AMBER as measured by execution times of atypical explicit solvent protein simulation. TheIA32 SIMD vector length is four for single preci-sion data and two for double precision. Conver-sion of AMBER to single precision yields a fortypercent performance improvement on IA32 plat-forms, as measured by execution times of a typicalGeneralized Born implicit solvent protein simula-tion. We are exploring what how to make best useof these sorts of gains without sacrificing anyessential precision in the results.

[1] S. Plimpton. Fast parallel algorithms for short-range moleculardynamics. J. Comput. Phys. 117, 1-19 (1995).

[2] L. Kalé, R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N.Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan, and K.Schulten. NAMD2: Greater scalability for parallel moleculardynamics. J. Comput. Phys. 151, 283-312 (1999).

Modeling/Computation

28 Genomes to Life I

Page 40: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

A34Microbial Cell Model of G.sulfurreducens: Integration of in SilicoModels and Functional GenomicStudies

Derek Lovley1 ([email protected]),Maddalena Coppi1, Daniel Bond1, Jessica Butler1,Susan Childers1, Teena Metha1, Ching Leang1,Barbara Methé2, Carol Giometti3, R. Mahadevan4,C. H. Schilling4, and B. Palsson4

1Department of Microbiology, University of Massachusetts,Amherst, Amherst, MA; 2The Institute for GenomicResearch, Rockville, MD; 3Biosciences Division, ArgonneNational Laboratory, Argonne, IL; and 4Genomatica, Inc.,San Diego, CA 92121

Molecular analyses have demonstrated thatGeobacter species are the predominantdissimilatory metal-reducing microorganisms in avariety of subsurface environments in which metalreduction is an important process, including ura-nium-contaminated aquifers undergoingbioremediation. The long-term objective of thisstudy is to develop comprehensive conceptual andmathematical models of Geobacter physiology andits interactions with its physical-chemical environ-ment, in order to predict the behavior ofGeobacter in a diversity of subsurface environ-ments. Prior to sequencing and initiating thefunctional analysis of the genomes of Geobactersulfurreducens and Geobacter metallireducens, it wasconsidered that these organisms were non-motile,strict anaerobes with a simple metabolism thatrequired little regulation. It is now clear that eachof these basic characterizations is incorrect. Theseand other surprises from the analysis of theGeobacter genomes are significantly influencingour design of strategies for in situ metalsbioremediation.

A preliminary genome-scale metabolic model ofG. sulfurreducens central metabolism has beendeveloped with a constraints-based modelingapproach using the annotated genome sequenceand scientific literature. A detailed description ofthis model is provided in the companion posterpresented by Genomatica.

Further experimental investigation is required torefine and expand this preliminary in silico model.For example, G. sulfurreducens requires a fumaratereductase in order to grow with fumarate as thesole electron acceptor, but when growing with

Fe(III) as an electron acceptor, it also requires asuccinate dehydrogenase in order to complete theoxidation of acetate to carbon dioxide via theTCA cycle. Although the genome of G.sulfurreducens contains a cluster of three genes,frdA, frdB, and frdC, resembling the three sub-units of theWolinella succinogenes fumaratereductase, genes for a separate succinatedehydrogenase are not apparent. Studies with afrdA knock-out mutant revealed that this complexfunctioned in vivo, not only as a fumaratereductase, but also as a succinate dehydrogenase.Elucidation of this novel function significantlyimproves the in silico model.

A genetic study on putative hydrogenase genesrevealed that although G. sulfurreducens has atleast four possible hydrogenases, only one isrequired for growth with hydrogen as the soleelectron acceptor. Interestingly, this is the onehydrogenase gene that is missing from the closelyrelated Geobacter metallireducens, which unlike G.sulfurreducens can not grow with hydrogen as anelectron donor. Genetic and metabolomic studieshave demonstrated that a novel cytoplasmicenzyme, which had previously been identified asNADPH-dependent Fe(III) reductase, is in factinvolved in electron transport from NADPH to avariety of electron acceptors and plays a key rolein intracellular NADPH homeostasis. In additionto providing valuable information on Geobactermetabolism, this result underscores the impor-tance of investigating the function of enzymes invivo, especially when dealing with electron trans-port to Fe(III), which many enzymes cannonspecifically reduce in vitro.

The Geobacter genomes contain a much higherpercentage of genes devoted to electron transportfunction than found in other organisms. Geneexpression and proteomics studies have revealedspecific genes that are associated with Fe(III)reduction. For example, genes with highhomology to flagellar and type(IV) pilus geneswere specifically expressed during growth on theinsoluble Fe(III) and Mn(IV) oxides. This find-ing, coupled with the discovery that Geobacter ischemotactic to Fe(II), has suggested thatGeobacter produce extracellular appendages formotility only when they need to search for insolu-ble Fe(III) or Mn(IV) oxides (Childers, S. E., S.Ciufo, and D. R. Lovley. 2002. Geobactermetallireducens access Fe(III) oxide by chemotaxis.Nature 416:767� 769). Studies on mutants miss-ing key pilin genes have further demonstrated theimportant role of pili in Fe(III) oxide reduction.In a similar manner, genes that are homologs of

Modeling/Computation

Genomes to Life I 29

Page 41: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

components of the type II secretion systems ofother bacteria are specifically expressed duringgrowth on Fe(III) or Mn(IV) oxide and disrupt-ing these genes inhibits the reduction of the insol-uble metal oxides, but not soluble electron accep-tors, including Fe(III) citrate. Mutants withdeletions in one of several c-type cytochromegenes that are more highly expressed duringgrowth on Fe(III) no longer had the capacity forFe(III) reduction. These studies clearly demon-strate that Geobacter possess a highly regulatedsystem of genes that are specifically involved inthe reduction of Fe(III) and Mn(IV).

Additional examples, of ongoing iterative model-ing and experimental elucidation of metabolic andelectron transport pathways will be presented.These include novel enzymes for carbon metabo-lism as well as the surprising finding, originatingfrom genome-based modeling, that Geobacter spe-cies that were previously classified as �strictanaerobes� have the ability to use oxygen. Thishas important implications for their survivalunder the fluctuating redox regimes common inthe subsurface. Genome analysis has also sug-gested additional bioremediation capabilities suchas the ability to degrade TNT and related contam-inants and the ability to reduce mercury. Theseearly results under the Microbial Cell Project dem-onstrate the power of genomic analysis to signifi-cantly influence bioremediation research.

A36Towards a Self-Organizing andSelf-Correcting Prokaryotic Taxonomy

George M. Garrity1 ([email protected]) andTimothy G. Lilburn2

1The Bergey’s Manual Trust, Michigan State University, EastLansing, MI 48224; and 2The American Type CultureCollection, Manassas, VA 20110

A longstanding goal of microbiologists has beenthe creation of a taxonomy that reflects the natu-ral history of prokaryotes and provides a stableand reliable scheme that is predictive and work-able in a wide variety of applications. Forgenomic comparisons the establishment of such aframework is essential if we hope to be able torecognize homologous, paralogous andxenologous sequences. Over the past 15 yrs, a rea-sonably good picture of the evolutionary relation-ships among the Bacteria and Archaea has

emerged, largely as a result of the widespreadadoption of 16S rRNA as the molecule of choicefor phylogenetic studies. At present, twolarge-scale phylogenetic trees of the prokaryotesexist in varying states of completeness and serve asthe foundation of the comprehensive taxonomyused in the Second Edition of Bergey’s Manual ofSystematic Bacteriology. However, limitations ofthe underlying phylogenetic models precludeincorporation of much of the available sequencedata into an all-inclusive taxonomy that can beeasily maintained and modified in an automaticfashion as new taxa are described and existing taxaemended. Confounding problems include a highnumber of annotation errors in the sequence dataand a lack of clear criteria for defining taxonboundaries.

We recently described the application of tech-niques drawn from the field of exploratory dataanalysis that take advantage of the large numberof SSU rRNA sequences available and that pro-duce results which are reconcilable with ourknowledge of both the phylogenetic models andavailable phenotypic data. We found that principalcomponents analysis (PCA) of large matrices ofevolutionary distance data (> 2×106 data points)yielded 2D and 3D maps of evolutionary space inwhich the high level groups that were formedproved consistent with those found in the largescale consensus trees. While PCA maps proveduseful in establishing a comprehensive prokaryotictaxonomy and greatly aided in the identificationof misclassified sequences, the method proved lessuseful in establishing group membership ofmisclassified or unknown sequences. In order toeliminate some of these problems, we have turnedto a second visualization technique, heat maps.Heat maps are a type of graph in which signalintensity is displayed as a gradation of colorwithin the confines of a precisely ordered grid.Heat maps have recently found widespread appli-cation in the field of microarray analysis and pro-vide a useful means of visualizing differential geneexpression. Heat maps, in an earlier variation,have also found application in the distant past inprokaryotic taxonomy.

Recently, we have begun using heat maps as agraphical tool for comparing alternativephylogenies and models at intermediate taxo-nomic levels (Order-Genus). In this paper, wedescribe how heat maps were used as a graphicaltool to demonstrate significant improvements inthe current classification of theGammaproteobacteria, brought about by the appli-cation of a newly developed supervised-clustering

Modeling/Computation

30 Genomes to Life I

Page 42: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

algorithm. Using a set of simple statistical criteriafor group membership, the algorithm iterativelyreorganizes the distance matrix on ataxon-by-taxon basis, excludes outliers and mis-identified sequences, and subsequently reinsertssuch sequences into the matrix at the location ofits most-probable identity. Dynamically reorderedheat maps of user-selected submatrices serve as anaid to the inspection and modification of theautomatically generated classification, the identifi-cation of possible classification errors, ad-hoc test-ing of alternative classifications/hypotheses anddirect extraction of the underlying distance data.

Our results demonstrate that significant improve-ments to prokaryotic taxonomy can be readilyobtained using statistical approaches to the evalu-ation of sequence-based evolutionary distances.Errors in curation, classification, and identifica-tion can be easily spotted and their effects cor-rected, and the classification itself can be modifiedso that the information content of the taxonomyis enhanced. Furthermore, evolutionary analysesbased on other molecules can be viewed in termsof this rRNA-based phylogeny and used toimprove the classification in taxa where the infor-mation content and resolving power of the 16SrRNA molecules proves too low (e.g., Bacillus).Obviously, a more robust classification has greaterpredictive power and serves as a more reliableevolutionary framework for genome exploration.The visualization supplies an intuitive approach,in that persons with no taxonomic training can,by looking at the heat maps, see how the classifi-cation might be improved; our algorithm formal-izes and automates the means used to achievesuch improvements. The chief drawback of theapproach is that groups formed from taxa that aresparsely represented in the SSU rRNA sequencedata set may not be as robust or stable as groupsfrom more richly populated taxa. This is especiallytrue in instances where such taxa are equidistantto two or more otherwise unrelated taxa. How-ever, such problems should prove transitory as thedata set grows daily and the tools we are develop-ing will allow us to maintain and expand a com-prehensive prokaryotic taxonomy.

It is worthwhile noting that the usefulness of thetechniques described here is not limited to bacte-rial taxonomy. These methods can be used todevelop and improve classifications of all types.For example, functional assignment of newsequences benefits from a reliable protein classifi-cation. Data from gene expression microarraysmight also be usefully classified using these tech-niques.

A pseudocode description of the methods usedhere will be available at the poster.

A38Computational Framework for Micro-bial Cell Simulations

Haluk Resat1 ([email protected]), Heidi Sofia1,Harold Trease1, Joseph Oliveira1, Samuel Kaplan2,and Christopher Mackenzie2

1Pacific Northwest National Laboratory and 2University ofTexas Medical School at Houston

The complex nature of data characterizing cellularprocesses makes mathematical and computationalmethods essential for interpreting experimentalresults and in designing new experiments.Although the need to develop comprehensiveapproaches is widely recognized, significantimprovements are still necessary to bridge experi-mental and computational approaches. Suchadvances require the development of integratedsets of computational tools to achieve the level ofsophistication necessary for understanding thecomplex processes that occur in cells. As part ofthis project, we have been developing a set of pro-totype network analysis tools and methods, andemploying them to investigate the flux and regu-lation of fundamental energy and material path-ways in Rhodobacter sphaeroides.

Our tool development efforts span a wide spec-trum. Current prototype components aredesigned in such a way that, when combined later,they will form the backbone of a comprehensivemicrobial cell simulation environment. In particu-lar, we have been working on developing a frame-work for the following areas:

Qualitative network analysis: This analysis algo-rithm uses the connection diagram of the cellularnetworks to classify and rank them according totheir linkage characteristics. We have shown thatthe Petri net representation can be a powerful wayto extract the topologies and the universal featuresof cellular networks. This allows for the compari-son of networks to decipher the common mod-ules.

Gene regulatory networks: We have developed anobject oriented stochastic simulation software thatcan be used to study the expression levels ofgenes. Given a regulatory network and using auser defined multistate representation for gene

Modeling/Computation

Genomes to Life I 31

Page 43: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

expression levels, our software can be used to sim-ulate the mean gene expression levels and theirfluctuations. This simulation software was appliedto derive the network parameters of a recentlyreported synthetic genetic network.

Stochastic kinetic simulation software: It has beenwell established that the number of molecules ofcertain species in cells can be very low. This makesthe stochastic representation more appropriate tostudy the dynamics of cellular systems. We haveshown that the kinetic simulations are not onlylimited to biochemical reactions but any physicalevent such as vesicle formation can be included inkinetic simulations. Our kinetic simulation algo-rithm was devised and implemented in such a waythat it can be used to simulate hybrid models thatcombine biochemical and physical events.

Imaging of bacterial cells and image recontruction:We are in the process of obtaining images of R.sphaeroides using electron tomography. R.sphaeroides cells will be chemically processed forTEM and imaged using the high resolution elec-tron microscope at EMSL/PNNL. A differentmethod of electron tomography will be utilizedfor 3-dimensional reconstruction of a bacterial cellto visualize the spatial distribution of intracellularvesicles throughout the bacterium. This will beperformed using the remove operation capabilitiesof the TEM imaging facility at UCSD that allowsus to obtain a series of tilted digital images thatcan later be computationally reconstructed.Reconstructed 3-D geometry of surface featuresand of the internal structures will later be used inspatially resolved simulation studies.

Mesh grid based simulation framework: Explicitincorporation of geometric and structural infor-mation into cellular models is important to studythe effect of the local environment on transportand kinetic properties. NWGrid and NWPhyscodes that are part of the large scale computa-tional simulation framework used at PNNL havebeen adapted to biological systems. Obtainedimages of R. sphaeroides will define a structureupon which we can map spatially dependentquantities and will provide the basis for buildingspatial computational models of this microbialcell.

Similarity analysis of genomes and prediction ofsuperfamilies: We have developed analysis andvisualization software based on clustering anddata integration to enable biologists to navigatethrough large quantities of genome sequence dataand operon information for the purpose of classi-

fying genomic sequences and assigning proteinfunctions rapidly and efficiently. We are applyingthe Similarity Box analysis to R. sphaeroidesgenome sequence data. To illustrate the usefulnessof our software, we show a comparative genomicsanalysis of the FNR superfamily, an importantgroup of transcriptional regulator proteins indiverse bacterial species that respond to signalssuch as redox, nitrogen status, and temperature.We also show a genomic comparison of FNR pro-teins in the two ?-Proteobacterial species, R.sphaeroides and Rhodopseudomonas palustris. Thesetwo closely related but divergent prokaryotes havea partially overlapping complement of FNR pro-teins.

Molecular part list and network derivation:Although most of the key genes have been identi-fied, the network describing the energy metabo-lism of R. sphaeroides is minimally understood. Toimprove the energy metabolism network of R.sphaeroides that was initially built using biochemi-cal data, we are using the recent genome (DOEsponsored JGI) and microarray (UT-Houston)data to build a more complete molecular parts list.We are making use of the genome data by apply-ing our similarity analysis approach to assignfunctionality to improve the annotation of thegenome. For similar purposes, we are analyzingthe microarray data using clustering methodscombined with promoter sequence information.

A40Characterization of Genetic RegulatoryCircuitry Controlling Adaptive Meta-bolic Pathways

Harley McAdams*, Lucy Shapiro*, and MikeLaub*

*Presenters

Stanford University

In this project, an interdisciplinary team of scien-tists from Stanford, Harvard, and SRI Interna-tional is characterizing genetic regulatory circuitryand metabolic pathways in the bacteriumCaulobacter crescentus. The Caulobacter areoligotrophic organisms adapted to low-nutrientenvironments such as clear streams, lakes, and theopen ocean and some species are found in thedeep subsurface environment. The genetic circuitcontrolling the Caulobacter crescentus cell cycle is

Modeling/Computation

32 Genomes to Life I

Page 44: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

one of the best-characterized bacterial regulatorynetworks. Caulobacter is a particularly usefulmodel system for study of cell-cycle regulationbecause cell populations can be synchronized withminimum perturbation of the normal physiologyof the cell. This feature has permitted a detailedmolecular analysis of the Caulobacter life cycle andhas revealed a complex regulatory network gov-erning the cell cycle progression andmorphogenesis. The results of the current projectwill provide a powerful base for eventually engi-neering situation-specific regulatory �cassettes�into Caulobacter cells for targeted remediationapplications.

In the first year of the project, we have developeda computational method to predict Caulobacteroperon organization, established a baseline profileof gene expression levels for the complete genomeover the cell cycle using microarrays, and devel-oped and optimized a protocol for high through-put creation of gene deletion mutants for theentire Caulobacter genome. We have succeeded inobtaining and visualizing unique biofilms ofCaulobacter wild-type and mutant cells. In theirnatural habitat, Caulobacter commonly grow inbiofilms. We are particularly interested in deter-mining whether genes are expressed in biofilmsthat are not active under laboratory conditions.As with most newly sequenced genomes, aboutforty percent of Caulobacter�s genes are ofunknown function. Determining what these genesdo is an important objective of the project.

We have published the first version of theCauloCyc online database (www.biocyc.org) forbrowsing and analysis of the Caulobacter genomeusing SRI�s Pathway Tools software. This com-bined database and software environment haspowerful and unique capabilities for modeling,visualizing, and analyzing biochemical and geneticnetworks. Using SRI�s PathoLogic program wehave computationally predicted Caulobacter�s met-abolic pathways from the genome. The Patho-Logic program accepts two inputs: a fullyannotated genome sequence, and the MetaCycmetabolic pathway database. The process of path-way prediction involves evaluating the evidencefor the presence of each pathway in the referenceDB for the organism being analyzed. A pathwayconsists of a sequence of enzyme-catalyzed reac-tions. The more enzymes we find within a path-way that are encoded by the genome, the moreevidence we say there is for the presence of thatpathway. The resulting metabolic pathway predic-tion assigned 617 Caulobacter enzymes to theircorresponding metabolic reactions in 130 pre-

dicted metabolic pathways. Now we are investi-gating the �holes� within these predicted meta-bolic pathways, and we are using microarraystudies of colonies growing in diverse media plusfocused Blast studies to identify the missingenzymes within the Caulobacter genome.

A28Computational Elucidation of Meta-bolic Pathways

Imran Shah ([email protected])

School of Medicine, University of Coloradohttp://shah-lab.uchsc.edu

Elucidating the metabolic network of a living sys-tem is an important requirement for modeling itsphysiological behaviour and for engineering itspathways. With the availability of whole genomesit is theoretically possible to infer the presence ofputative enzymes and transporters in an organism.However, piecing this information into a com-plete picture is still mostly a daunting manual taskfor at least two reasons. First, we do not haveaccurate and sufficient annotation of enzymaticfunction from sequence. Consequently, many pro-teins in completely sequenced microbes remainfunctionally uncharacterized. Second, inferringthe causal biochemical connections within a meta-bolic network is not straightforward. We aredeveloping a computational infrastructure toaddress these challenges. In earlier work we havedeveloped a machine learning (ML) approach toimprove the assignment of enzymatic functionfrom sequence. More recently, we have developedan artificial intelligence (AI) approach for the pre-diction of metabolic pathways and their interac-tive visualization, called PathMiner. This posterwill present an overview of our system and discusssome relevant results for the radiation resistantmicrobe, D. radiodurans, and the metal-reducingbacterium, S. oneidensis MR-1.

Modeling/Computation

Genomes to Life I 33

Page 45: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

A42Data Exchange and ProgrammaticResource Sharing: The Systems Biol-ogy Workbench, BioSPICE and theSystems Biology Markup Language(SBML)

Herbert M Sauro ([email protected])

Keck Graduate Institute, 535 Watson Drive, Claremont, CA91711

Standards

There is now a wide variety of modeling toolsavailable to the budding systems biologist, butuntil recently there has been no agreed way toexchange models between different tools. TheSystems Biology Markup Language is one of twoemerging standards to allow systems biologymodeling and analysis packages to exchange mod-els. SBML is an open XML-based format devel-oped to facilitate the exchange of models ofbiochemical reaction networks between softwarepackages. Currently SBML Level 1 is supportedby a growing number of tools, including, Jarnac,JDesigner, Gepasi, VCell, jigCell, Cellerator, andBioSPICE (under development). Level 2 is cur-rently under discussion with the BioSPICE group,with a final release sometime in the second quar-ter of 2003.

There are in addition a small number of growingrepositories of SBML models now available onthe web, including, http://www.sbw-sbml.org,http://www.symbio.jst.go.jp/~funa/kegg/mge.html, http://www.sys-bio.org,http://www-aig.jpl.nasa.gov/public/mls/cellerator/nb.html, http://www.gepasi.org/

Programmatic Resource Sharing

The ERATO Systems Biology Workbench (SBW)is an open source, portable (Windows, Linux,Mac OS X) framework for allowing both legacyand new application resources to share data andalgorithmic capabilities. Our target audience isthe computational biology community whoseinterest lies in simulation and numerical analysisof biological systems. SBW allows communicationbetween processes potentially located across a net-work on different hardware and operating sys-tems. SBW currently has bindings to C, C++,Java, Delphi, Python, Perl and BioSPICE, withmore planned for in the future. SBW uses a sim-

ple messaging system across sockets as a meansfor applications to communicate at high speed.

Software components that implement differentfunctions (such as GUIs, model simulation meth-ods, analysis methods, database interfaces, etc.)can be connected to each other through SBWusing a straightforward application programminginterface (API). The figure illustrates the visualdesign tool, JDesigner, interacting with the com-putational engine, Jarnac, via SBW (Jarnac is act-ing as a server and is thus invisible to the user).This setup permitted us to avoid writing,�yet-another-simulator�, and allowed JDesigner,which had no inherent simulation capabilities todispatch simulation requests to another resource.Under SBW, models are exchanged betweenresource applications using SBML.

We are also working closely with the BioSPICEDARPA funded program to enhance SBML toLevel 2 and to establish a set of agreed resourceAPIs to enable plug and play resources for bothBioSPICE and SBW.

There is a growing list of modules which cancommunicate with each other via SBW and theBioSPICE software, including time course simula-tors, stochastic simulators (at least three kinds),basic optimizer, structural analysis tool viaMETATOOL, graphing tools, system browsingtools, etc.

Collaborators

The development of SBML/SBW was primarilyconducted at Caltech by Hiroaki Kitano, JohnDoyle, Hamid Bolouri, Mike Hucka and AndrewFinney and Herbert Sauro in collaboration with

Modeling/Computation

34 Genomes to Life I

Sauro - Fig. 1. JDesigner simulating a MAPK Pathway

Page 46: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

many other groups, including (in alphabeticalorder): A. Arkin, B. Bornstein, D. Bray, A.Cornish-Bowden, A. A. Cuellar, S. Dronov, D.Fell, E. D. Gilles, M. Ginkel, V. Gor, I. I.Goryanin, W. J. Hedley, T. C. Hodgman, J.-H.Hofmeyr, P. J. Hunter, N. Juty, J. Kasberger, A.Kremling, U. Kummer, N. Le Novere, L. Loew,D. Lucio, P. Mendes, E. Mjolsness, Y. Nakayama,M. Nelson, P. Nielsen, T. Sakurada, J. Schaff, B.Shapiro, T. Shimizu, H. Spence, J. Stelling, K.Takahashi, M. Tomita, J. Wagner, J. Wang, theBioSPICE program project investigators.

Web sites for further information on SBW andSBML:• www.sbw-sbml.org• www.sys-bio.org

Funding was provided by ERATO, the Keck Graduate Institute,DARPA, and a U.S. Air Force Grant.

A44A Web-Based Laboratory InformationManagement System (LIMS) for Labo-ratory Microplate Data Generated byHigh-Throughput Genomic Applica-tions

James R. Cole1 ([email protected]), Joel A.Klappenbach1 ([email protected]), Paul R.Saxman1, Qiong Wang1, Siddique A. Kulam1, AlisonE. Murray2, Liyou Wu3, Jizhong Zhou3, and JamesM. Tiedje1

1Center for Microbial Ecology, Michigan State University,East Lansing, MI; 2Earth and Ecosystem Sciences, DesertResearch Institute, Reno, NV; and 3Environmental SciencesDivision, Oak Ridge National Laboratory, Oak Ridge, TN

The use of high-throughput genomic-level meth-odologies such as microarray fabrication,proteomics, and whole-genome clone librariesnecessitate computer database management toolsfor tracking and archiving massive quantities ofdata. We have developed a laboratory informationmanagement system (LIMS)�named theMicrobeArrayDB�for handing high-throughputdata created during the fabrication of DNAmicroarrays. The laboratory microplate (96 or384 well) functions as the primary data unit ofthe LIMS. This flexible database structure permitscreation of different plate types, with associated

data fields, extending the functionality of theLIMS to many different applications. Project-spe-cific customization of the LIMS is controlledthrough a set of easily modified meta-data tablescontaining information on microplate types, con-tents, and how contents of microplates are com-bined and stored during laboratory procedures.The contents of microplates (�reagents�) serve as�reactants� that are combined by the user duringin silico �reactions� to create new product plateswithin the database. Users load microplate-spe-cific information using cut-and-paste operationsfrom tab-delimited file formats such as those cre-ated during oligonucleotide primer/probe designand from manufacturer supplied files. Data fromthese files is inherited during subsequent �reac-tions� that create new microplates. LIMS tools areinterfaced through any internet browser and dataaccess is controlled through group and user levelpermissions. User name and time stamps arerecorded for each entry into the LIMS creating apermanent record and detailed audit trail. TheMicrobeArrayDB is built on a multi-tier cli-ent/server architecture model using a pub-licly-available relational DBMS for our back-endtier (PostgreSQL) and Java web technologies formiddle and presentation tiers.

In our initial project configuration, the LIMS iscustomized to track the production of aPCR-based DNA microarray from primer designto product deposition on coated slides. Currentplate types for the microarray configurationinclude: Primer, Template, Primer Pair, PCRProduct, and Printing plates. Contents of theseplates are combined during �reactions��such asthe creation of a PCR Product plate from existingPrimer and Template plates�as they are physi-cally combined in the laboratory. Data inheritanceis structured with microplate data loaded fromtext files produced during primer design and syn-thesis that include information such as geneID/name, primer sequences, design criteria, quan-tity, length, batch control numbers, etc. Processinformation such as PCR reactions, product puri-fication, and quality control data are added by theuser during microarray construction. Internal test-ing has been conducted to track the fabricationprocess of whole-genome expression arrays forShewanella oneidensis MR-1 at Oak Ridge NationalLaboratory. This implementation of the LIMS istargeted for whole-genome microarrays for bacte-ria including Deinococcus radiodurans R1,Desulfovibrio vulgaris, Geobacter metallireducens,Nitrosomonas europaea, Rhodopseudomonas palustris.

Modeling/Computation

Genomes to Life I 35

Page 47: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

A46BioSketchpad: An Interactive Tool forModeling Biomolecular and CellularNetworks

Jonathan Webb1, Lois Welber1, Arch Owen1,Jonathan Delatizky1 ([email protected]), CalinBelta2, Mark Goulian2, Franjo Ivancic2, VijayKumar2, Harvey Rubin2, Jonathan Schug2, and OlegSokolsky2

1BBN Technologies (http://bio.bbn.com/) and 2University ofPennsylvania

The Bio SketchPad (BSP) is an interactive tool formodeling and designing biomolecular and cellularnetworks. It features a simple, easy to use, graphi-cal front end. Descriptive models can be built andparameterized, and converted to forms supportedby external simulation and analysis tools. The cur-rent version of BSP supports CHARON, a highlevel language and toolset for simulating hybridsystems developed at the University of Pennsylva-nia, and is being enhanced to communicate withother simulators, such as Virginia Tech�s JigCell.

BSP was designed with biologists for biologists.It provides an intuitive graphical interface whichallows experimentalists to easily generate workingmodels of networks. Biomolecular reactions sup-ported include transcription, translation, regula-tion, and general protein-protein interactions.BSP supports typical editing operations for grapheditors as well as some specialized operations spe-cific to the biomolecular modeling domain.Chemical species, reactions, and regulations ofreactions are drawn as nodes in the graph. Theuser can control some of the rendering propertiesof specie elements such as the text label, color,and shape of the drawn node. Syntactic con-straints are imposed on the model construction.Node highlighting is used to assist users duringmodel construction. Specialized commands areimplemented for constructing reaction geometriescommon to biochemical systems with specifiednumbers of inputs and outputs. Model parametersare specified in parameter dialogs accessedthrough the graphical presentation of the model.The set of parameters can change depending onwhich types of rate laws or regulation functionsare used.

BSP is under active development. Recent andupcoming enhancements include the use ofSBML Level II to exchange model informationwith other in silico tools, modularization of the

simulator interface to utilize the standards beingdeveloped in the DARPA IPTO BioComp pro-gram, and the ability to represent models hierar-chically.

BSP development has been funded by theDARPA IPTO BioComputation Program, PMDr. Sri Kumar. The BSP application together withthe CHARON simulation environment are avail-able for download.

A48Molecular Docking with AdaptiveMesh Solutions to the Pois-son-Boltzmann Equation

Julie C. Mitchell1 ([email protected]), Lynn F.Ten Eyck1, J. Ben Rosen2, Michael J. Holst3,Victoria A. Roberts5, J. Andrew McCammon4,Susan D. Lindsey1, and Roummel Marcia1

1San Diego Supercomputer Center, 2Department ofComputer Science and Engineering, 3Department ofMathematics, 4Department of Chemistry and Biochemistryand Department of Pharmacology, University of CaliforniaSan Diego; and 5Department of Molecular Biology, TheScripps Research Institute

The Docking Mesh Evaluator (DoME) uses adap-tive mesh solutions to the Poisson-Boltzmannequation to quickly evaluate and optimize dock-ing energies. This is accomplished by interpola-tion of potential functions over an irregular meshthat is dense in high gradient regions. The resultis a method capable of performing detailed energycalculations very quickly. The initial version ofDoME offers many useful tools for computationalstudy of molecular interactions. DoME isintended to bridge the gap between methods thatuse a coarse interaction model for computationalefficiency and those having detailed but expensivecalculations. The software is fully parallel and canrun on supercomputers, clusters and linked inde-pendent workstations.

The Critical Assessment of PRedicted Interactions(CAPRI) is a CASP-inspired exercise in whichthe goal is to predict bound protein-protein struc-tures given their individual crystal structures.Using the Fourier transform-based molecular pro-gram DOT [1], predictions were made for sevensystems in CAPRI Rounds 1 and 2. DOT�s per-formance is an important part of DoME�s devel-opment, since DoME�s energy model is meant tobe a higher precision, continuous version of that

Modeling/Computation

36 Genomes to Life I

Page 48: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

used by DOT. Ours was one of four teams (out ofnineteen) submitting good predictions for threeof the seven systems (the most achieved by anygroup.) This work will appear in a special editionof Proteins [2].

As a useful starting point for the development ofDoME, we are running extensive analysis on theCAPRI systems. Part of the goal in developingDoME is to achieve accurate docking solutionswithout requiring as extensive a search as DOTperforms. We hope to determine how fineness ofthe sampling affects the quality of the results, inparticular at what sampling level DoME returnsresults with a high fraction of residue-residue con-tacts. The effect of local optimization of the bestDoME solutions generated in the global scan isbeing considered, as well as local optimization(using DoME) of solutions generated by DOT�smore exhaustive search. It appears that optimiza-tion can disrupt correct residue-residue contacts,but it is not yet clear whether the reason for this isbiological or algorithmic. We are also implement-ing and testing schemes for global optimization.The aim is to find a consistent �recipe� of scan-ning, local and global optimization that will pro-duce useful results for most protein-proteinsystems. Global optimization for docking presentssome difficulties, as the variables used toparameterize the system are non-homogeneousand in some cases cyclic.

[1] J.G. Mandell, V. A. Roberts, M. E. Pique, V. Kotlovyi, J. C.Mitchell, E. Nelson, I. Tsigelny and L.F. Ten Eyck (2001),“Protein docking using continuum electrostatics and geo-metric fit,” Protein Engineering 14(2): 103—115.

[2] D.H. Law, L.F. Ten Eyck, O. Katzenelson, I. Tsigelny, V.A.Roberts, M.E. Pique and J.C. Mitchell (2002), “Findingneedles in haystacks: Re-ranking DOT results using shapecomplementarity, cluster analysis and biological informa-tion,” Prot. Struct. Fun. Gen. In press.

A50Functional Analysis and Discovery ofMicrobial Genes Transforming Metallicand Organic Pollutants: Database andExperimental Tools

Lawrence P. Wackett([email protected]) and Lynda B.M.Ellis ([email protected])

Center for Microbial and Plant Genomics, University ofMinnesota

It is the major thesis of the current project thatmuch of the breadth of microbial metabolismremains uncatalogued and uncharacterized. Char-acterizing this metabolism represents a major taskof microbial functional genomics. Moreover, thereis a general inability to predict metabolic path-ways when all of the necessary reactions are notfound in databases. The research described hereseeks to better assemble existing metabolic data,discover new microbial metabolism, and predictmetabolic pathways for compounds not yet indatabases.

Approximately half the chemical elements aremetallic and metalloid. Microbial metabolism ofmany of these elements, and compounds contain-ing them, have been poorly studied relative tocommon intermediary metabolism, the typicalfocus of functional genomic analysis. Yet, recentstudies suggest that many microbes have broadabilities to transform metals, metalloid elements,and compounds containing those elements. In thecurrent project, the web-based University of Min-nesota Biocatalysis/Biodegradation Database(UM-BBD) has been expanded to include infor-mation on microbiological interactions with 52chemical elements. For each element, there is nowa webpage with annotation on the major micro-bial interactions with that element, links toMedline, and access to further UM-BBD informa-tion. The information above has been coordinatedwith several depictions of the periodic table of theelements, one a classical table with columns androws, and secondly a depiction of the elements ina spiral. The latter serves to cluster elementsbetter with respect to their interactions withmicrobiological systems. The URLs for the rele-vant pages are given below:• Biochemical Periodic Tables (overview

website): http://umbbd.ahc.umn.edu/periodic/

Modeling/Computation

Genomes to Life I 37

Page 49: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

• Traditional Periodic Table:http://umbbd.ahc.umn.edu/periodic/periodic.html

• Spiral Periodic Table:http://umbbd.ahc.umn.edu/periodic/spiral.html

The pages above have added hundreds of newlinkages to UM-BBD compound pages. Forexample, the mercury element page has 4 links,arsenic has 9 links, and chlorine has 129 links toUM-BBD compounds, respectively.

An important facet of the current project is to dis-cover new metabolism and functionally analyzethe novel microbial enzymes and genes involved.A bioinformatic analysis has shown that on theorder of one hundred chemical functional groupsare found in natural products, yet only fifty arecurrently known to undergo microbial transfor-mation. In this context, complete genomesequence annotation will require the identificationof genes and enzymes that metabolize the fulldiversity of elements and functional groups thatmicrobes act on in the environment. In the cur-rent project, we are uncovering the molecularbasis of microbial metabolism, some of which hasbeen previously unknown and uninvestigated. Forexample, we have investigated the metabolism ofbismuth compounds, boronic acids, azetidine ringcompounds, and novel organonitrogen com-pounds.

Another goal of the project has been to develop atool to predict microbial catabolism, using theUM-BBD as a knowledge base. The objective isto propose one or more plausible biodegradationschemes for compounds whose metabolism is notyet known. To begin, a system for substructuresearching was added to the UM-BBD, which bothimproved the database and was a necessary com-ponent for developing a metabolism predictionsoftware. The metabolism prediction software isbased on rules that describe fundamental micro-bial reactions. At present, there are 88 rules in thebiotransformation rule database, each specifyingthe atoms and their positions in a functionalgroup and the biotransformation reaction thatthey undergo. The software can now predictbiodegradation pathways for a significant numberof aliphatic and aromatic compounds. The proto-type system has been offered to UM-BBD usersto use and critique. The system will be expandedwith input from our Scientific Advisory Boardand the broader scientific community in the com-ing year.

A52Comparative Genomics Approaches toElucidate Transcription RegulatoryNetworks

Lee Ann McCue1* ([email protected]),William Thompson1, C. Steven Carmack1, ZhaohuiS. Qin2, Jun S. Liu2, and Charles E. Lawrence1

*Presenter

1The Wadsworth Center, New York State Department ofHealth, Albany, NY 12201; and 2Department of Statistics,Harvard University, Cambridge, MA 02138

The ultimate goal of this research is to delineatethe core transcription regulatory network of aprokaryote. Toward that end, we are developingcomparative genomics approaches that aredesigned to identify complete sets of transcriptionfactor (TF) binding sites and infer regulons with-out evidence of co-expression. Using Escherichiacoli as our model system, we have developed aphylogenetic footprinting technique to identifyTF binding sites upstream of every operon in theE. coli genome. This method requires the genomesequences of several closely related species, andemploys an extended Gibbs sampling algorithmto analyze orthologous promoter data. Using thepromoters of 166 E. coli operons and a databaseof experimentally verified TF binding sites for val-idation, we have evaluated our ability to predictregulatory sites with this method, and addressedthe questions of which species are most useful andhow many genomes are sufficient for comparison.Orthologous promoter data from just three spe-cies were sufficient for ~75% of predicted sites tooverlap the experimentally verified sites by 10 bpor more. A genome-scale phylogeneticfootprinting study of E. coli identified 741 predic-tions above a threshold for statistical significance(p < 0.05) determined using randomized datasimulations. We have also developed a novelBayesian clustering algorithm to cluster these pre-dictions thereby identifying 181 putativeregulons, most of which are orphans�that is, thecognate TF is not known. This strategy to inferregulons utilizes only genome sequence informa-tion and is complimentary to and confirmative ofgene expression data generated by microarrayexperiments. We are now applying these technolo-gies to the Synechocystis PCC6803 genome.

Modeling/Computation

38 Genomes to Life I

Page 50: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

A54Predicting Genes from ProkaryoticGenomes: Are “Atypical” GenesDerived from Lateral Gene Transfer?

John Besemer1, Yuan Tian2, John Logsdon1, andMark Borodovsky2 ([email protected])

1Department of Biology, Emory University, Atlanta, GA; and2School of Biology, Georgia Technical Institute, Atlanta, GA

Algorithmic methods for gene prediction havebeen developed and successfully applied to manydifferent prokaryotic genome sequences. As theset of genes in a particular genome is not homo-geneous with respect to DNA sequence composi-tion features, the GeneMark.hmm programutilizes two Markov models representing distinctclasses of protein coding genes denoted �typical�and �atypical.� Atypical genes are those whoseDNA features deviate significantly from thoseclassified as typical and they represent approxi-mately 10% of any given genome. In addition tothe inherent interest of more accurately predictinggenes, the atypical status of these genes may alsoreflect their separate evolutionary ancestry fromother genes in that genome. We hypothesize thatatypical genes are largely comprised of thosegenes that have been relatively recently acquiredthrough lateral gene transfer (LGT). If so, whatfraction of atypical genes are such bona fide LGTs?We have made atypical gene predictions for allfully completed prokaryotic genomes; we havebeen able to compare these results to other �sur-rogate� methods of LGT prediction. In order tovalidate the use of atypical genes for LGT detec-tion, we are building a bioinformatic analysispipeline to rigorously test each of the gene candi-dates within an explicit phylogenetic framework.This process starts with gene predictions and endswith a phylogenetic reconstruction of each candi-date. From the set of bona fide LGTs that we haveidentified, we will be able to determine the LGTparameters to which our gene finding programsare most sensitive (i.e. time scale of transfers,phylogenetic distance from transfer source, etc.).We are developing this pipeline using fourcyanobacterial genomes as our test set:Prochlorococcus marinus str. MIT 9313,Prochlorococcus marinus subsp. Pastoris str.CCMP1378, Synechococcus sp.WH 8102,Synechocystis sp. PCC 6803, the first three of whichare nearly complete genomes from DOE. Fromthis initial analysis, we are estimating the extent

and pattern of LGT in each of these genomes. Wewill then extend our studies to include all availablegenomes, both complete and nearly complete.

A56Advanced Molecular Simulations of E.coli Polymerase III

Michael Colvin1 ([email protected]), FeliceLightstone1, Ed Lau1, Ceslovas Venclovas1, DanielBarsky1, Michael Thelen1, Giulia Galli2, EricSchwegler2, and Francois Gygi3

1Biology and Biotechnology Research Program, 2Physics andAdvanced Technology Directorate, and 3Center for AppliedScientific Computing, Lawrence Livermore NationalLaboratory, Livermore, CA 94552

The goal of this project is to use advanced molec-ular simulation methods to improve our under-standing of several key mechanisms in the E. coliDNA polymerase III (Pol III). Pol III is the pri-mary replicating polymerase in E. coli and theholoenzyme consists of at least 10 protein sub-units that carry out different functions, includingtemplate-driven DNA replication (subunit á),replicative error correction (subunit e), tetheringof the Pol III complex on the DNA by a �clamp�(subunit â), and the clamp loading complex (ã, ä,ä’, ø and ÷ subunits). In this poster we will pres-ent several key results from this study involvingsimulation methods ranging from homology-based structure prediction to first principlesmolecular dynamics simulations.

Although the three-dimensional structure of E.coli polymerase III â-clamp is known, and thestructure of the clamp-loading complex has beensolved recently, the mechanism of loading theâ-clamp onto primed DNA sites remains unclear.One of the unanswered questions is what forcesact to direct DNA into the â-clamp, once itbecomes opened by the clamp-loader. The crystalstructures of the â-clamp, clamp-loader and themonomer of the â-clamp complexed with one ofthe subunits (ä) of the clamp-loader makes it clearthat there are no flexible domains/subdomains ofany kind that could push DNA inside thering-shaped â-clamp. On the other hand, theexperimental evidence suggests that theATP-dependent clamp-loading reaction is veryefficient, arguing against a random diffu-sion-based mechanism.

Modeling/Computation

Genomes to Life I 39

Page 51: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

We hypothesized that the lack of the structural�helper� motifs might be compensated by otherproperties of the â-clamp itself, such aselectrostatics. The crystal structure of a mutantâ-clamp provided us with a model of the clampopened at a single interface. Although the open-ing at the interface is too small for DNA to passinto the ring, we considered it a feasible model forthe estimation of the effect of the electromagneticfield on the negative charges (DNA) in the vicin-ity of the opening. Using combination of DelPhi(a program to solve the non-linear Pois-son-Boltzmann equation) and GRASP, we ana-lyzed the electrostatic properties of the openâ-clamp. The computational study revealed thatthe vectors of the electromagnetic field near theopened â-clamp interface are directed such thatthe negative electric charge (like DNA) would bedrawn into the opening (see Figure). Interest-ingly, if the interface is opened very widely, theelectrostatic guiding effect largely disappears. Weare also performing classical molecular dynamicssimulations of the â-subunit and its interactionswith DNA.

The epsilon (å) subunit is the 3'�5' exonuclease inthis DNA polymerase and interacts with the á(polymerase unit) and è (unknown function) sub-units. Epsilon is able to hydrolyze thephosphodiester backbone of either single or dou-ble stranded DNA. This enzyme requires Mn2+ orMg2+ for activity. The exonuclease activity residesin the N-terminus (residues 1-186). The C-termi-nus (residues 187-242) binds to the á subunit.The crystal structure of å (residues 1-186)complexed with the p-nitrophenyl ester of TMP atpH 5.8 and 8.5 has recently been solved. Thesetwo structures have been provided by our externalcollaborator to serve as the starting point for thisstudy. In this classical molecular dynamics studythe interactions formed between a trinucleotide,modeled into the active site, and the å subunitwere investigated. Four molecular dynamics simu-lations were performed using explicit solvent mol-ecules. In three separate simulations, the likelygeneral base amino acid (His162) was modeled asa neutral residue (protonated at ND1 or NE2)and an ionized residue. The fourth simulationcontained the quantum chemical gas-phase transi-tion state docked into the active site and His162was modeled as an ionized residue.

Our results show that the phosphodiester back-bone interacts with the two Mg2+ ions and å, thenucleobases form surprisingly few interactionswith å. G1 of the trinucleotide is almost com-pletely solvent exposed. Only Met18, Asn99, and

Phe102 consistently interact with the bases in allsimulations. Glu61 and Ala62 interacted with thebases in the majority of the simulations. In con-trast, the phosphate undergoing hydrolysis ishighly stabilized in the active site. There is a mini-mal amount of motion in this group relative tothe rest of the nucleotide. Only in the TS simula-tion does His162 stay close to the phosphatethroughout the simulation. His162 is situated in amobile loop which exhibits some of the highestfluctuations in the crystal structure.

In our efforts to understand the basic chemistry ofthe epsilon subunit (exonuclease), we have usedfirst principles molecular dynamics simulations(FPMD) to simulate a model nuclease substrate,dimethyl phosphate (DMP) and the hydrox-ide-induced hydrolysis of DMP. These simulationswere run on a massively parallel computer and the14 ps DMP simulation is the longest FPMD everreported. The results of this simulation show thatthe proximity of the solvated magnesium ioninduces conformational changes in DMP. Thesesubsequent conformations are higher energy con-formations and could be a factor as how theenzyme cleaves the DNA backbone so easily.Hydroxide attack on DMP was simulated by fix-ing the distances over the reaction coordinate andthen sampling for 3 ps for each constrained dis-tance. The results show a small shoulder in the

Modeling/Computation

40 Genomes to Life I

Colvin—Fig. 1. Electrostatic surface of the slightly opened

â-clamp. Lines correspond to the lines of the electric field. Arrows

indicate directionality of the electric field, and their size is

proportional to the calculated force. Calculations were made in

solution, considering dielectric constant 2.0 for the protein interior

and 80.0 for the solvent and the physiological ionic strength (0.15

M).

Page 52: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

reaction free energy profile after the initial transi-tion state as hydroxide attacks DMP. These resultsprovide a reference system for subsequent simula-tions for the exonuclease-catalyzed reaction, andfurther are providing clues to answering the40-year debate whether phosphate hydrolysis hasa single transition state or has a stable intermedi-ate.

This work was performed under the auspices of the U. S.Department of Energy by the University of California, LawrenceLivermore National Laboratory under Contract No.W-7405-Eng-48.

A58Karyote®: Automated Physico-Chemi-cal Cell Model Development ThroughInformation Theory

Peter J. Ortoleva ([email protected]), AbdallaSayyed-Ahmad, Ali Navid, Kagan Tuncay, andElizabeth Weitzke

Center for Cell and Virus Theory, Indiana University

The dynamics of a cell are modeled using a reac-tion, transport, and genetic simulator, Karyote®,in order to predict cell behavior in response tochanges in its surroundings or to modifications ofits genetic code. Our methodology accounts forthe organelles of eukaryotes and the specializedzones in prokaryotes by dividing the volume ofthe cell into discrete compartments. Each com-partment exchanges mass with other compart-ments either through membrane transport or witha time delay effect associated with molecularmigration (e.g. as for the nucleoid inprokaryotes). In each compartment multiple met-abolic, proteomic and genomic reactions takeplace. All couplings among processes areaccounted for. A multiple scale technique allowsfor the computation of processes that occur on awide range of time scales.

Karyote® allows for the investigation of variouscell behaviors that arise due to gene mutation,presence of external chemical agents or other fac-tors. The underlying equations integrate the met-abolic, proteomic and genetic networks (see Fig.1). Catalyzed polymerization kinetics transcribesmRNA from an input DNA sequence while theresulting mRNA is used via ribosome-mediatedpolymerization kinetics to accomplish translation.Feedback associated with the creation of speciesnecessary for metabolism by the genomic/

proteomic network modifies the rates of produc-tion of factors (e.g. nucleotides and amino acids)that affect the genome/proteome dynamic. Hence,the effect of genetic mutations on overall cellbehavior is accounted for. The concentration andsequence of the predicted proteins can be com-pared with experimental data via the constructionof synthetic tryptic digests and associated massspectra.

The complex network of biochemical reac-tion/transport processes and their biochemicalspatial organization make the development of apredictive model of a living cell a grand challengefor the 21st century. However, advances in reac-tion/transport modeling and the exponentiallygrowing databases of genomic, proteomic, meta-

Modeling/Computation

Genomes to Life I 41

Ortoleva—Fig. 1a and b. 1 (a) Karyote predicted time dependence of

nucleotide concentrations during transcription for the gene

TACTTTTAGGGG. As nucleotides are depleted, mRNA synthesis slows down,

illustrating an important feature of coupled genome, proteome,

metabolome dynamics captured in Karyote. (b) Karyote predicted evolution

of concentration over time of DNA, RNA Polymerase II, all of their

complexes involved in transcription/translation and mRNA created under

the same condition as seen in Fig 1(a) (Weitzke and Ortoleva 2003).

1a

1b

Page 53: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

bolic and bioelectric data make cell modeling fea-sible if these two elements can be automaticallyintegrated in an unbiased fashion. We have devel-oped a procedure to integrate data with Karyote®using information theory (see Fig. 2).

Our procedure provides an objective approach forintegrating a variety of types and qualities ofexperimental data. Data that can be used in thisapproach include NMR, spectroscopy, microscopyand cellular bioelectric information. The approachis demonstrated on the well-studied Trypanosomabrucei system.

A major obstacle for the development of a predic-tive cell model is that the complexity of these sys-tems makes it unlikely that any model presentlyavailable will soon account for a complete set ofprocesses. Thus, not only is the model-buildingendeavor labor intensive, but also at any stage oneis faced with the challenge of calibrating and run-ning an incomplete model. We present a probabil-ity functional method that allows the integrationof quantitative and qualitative experimental dataand physically motivated regularization to delin-eate the time course of the concentration of com-ponents (e.g. an enzyme) whose role may be keyto the dynamics of the processes already incorpo-rated in the model, but the reaction creating ordestroying it are not yet understood.

A60The Commercial Viability of EXCAVA-TOR™: A Software Tool For GeneExpression Data Clustering

Robin D. Zimmer1* ([email protected]),Morey Parang2*, Dong Xu3, and Ying Xu3

*Presenters

1ApoCom Genomics, 11020 Solway School Road,Knoxville, TN 37931; and 2Oak Ridge National Laboratory

ApoCom Genomics, in collaboration with OakRidge National Laboratory, is being funded undera DOE Phase I SBIR Grant (DE-FG02-02ER83365) to assess the commercial viability ofa novel data clustering tool developed by Drs.Ying Xu, Victor Olman and Dong Xu (Xu, et.al.,2001). As we enter into an era of advancedexpression studies and concomitant voluminousdatabases, there is a growing need to rapidly ana-lyze and cluster data into common expression andfunctionality groupings. To date, the most preva-lent approaches for gene and/or protein clusteringhave been hierarchical clustering (Eisen et.al.,1998), K�means clustering (Herwig et al., 1999),and clustering through Self-Organizing Maps(SOMs) (Tamayo et al.,1999). While theseapproaches have all clearly demonstrated theirusefulness, they all have inherent weaknesses.First, none of these algorithms can, in general,rigorously guarantee to produce globally optimalclustering for any non-trivial objective function.Moreover K-means and SOMs heavily dependupon the �regularity� of the geometric shape ofcluster boundaries, and they generally do notwork well when the clusters cannot be containedin some non-overlapping convex sets.

For cases where boundaries between clusters maynot be clear, an objective function addressingmore global properties of a cluster is needed.Three clustering algorithms, along with a mini-mum spanning tree (MST) representation, havebeen implemented within a computer programcalled EXpression data Clustering Analysis andVisualizATiOn Resource (EXCAVATOR�). Ourresearch team has conducted a comparisonbetween the EXCAVATOR� clustering algorithmand the widely used K-means clustering algorithmusing rat central nervous system (CNS) data. Twocriteria were employed for the comparison. Thefirst was based on the jackknife approach to assessthe predictive power of the clustering algorithm,

Modeling/Computation

42 Genomes to Life I

Page 54: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

and the second was based on the separability qual-ity of clusters. All three of the EXCAVATOR�algorithms (MST-hierarchical, MST-iterative, andMST-global optimal) outperformed the K-meansalgorithm relative to predictive power and separa-bility quality.

In addition to comparative studies to assess theusefulness of EXCAVATOR�, the team hasdeveloped an advanced graphical user interface(GUI). The GUI has been designed to affordmaximum flexibility incorporating the multi-clus-tering data visualization, as well as user drivencomparison and editing capabilities. EXCAVA-TOR�s� data visualization component is based ona modular/flexible approach so as to extend itscapability to other clustering/classification areas,such as phylogeny, sequence motif recognition,and protein family recognition.

References

Eisen, M.B., Spellman, P.T., Brown, P.O. and Botstein, D.(1998) Cluster analysis and display of genome-wide expression pat-terns. Proc. Natl Acad. Sci. USA, 95, 14 863-14 868.

Herwig, R., Poustka, A.J., Müller, C., Bull, C., Lehrach, H. andO’Brien, J. (1999) Large-scale clustering of cDNA-fingerprintingdata. Genome Res., 9, 1093-1105.

Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S.,Dmitrovsky, E., Lander, E.S. and Golub, T.R. (1999) Inter-preting patterns of gene expression with self-organizing maps: methodsand application to hematopoietic differentiation. Proc. Natl Acad.Sci. USA, 96, 2907-2912.

Xu, Y., Olman, V. and Xu, D. (2001) Clustering Gene ExpressionData Using A Graph-Theoretic Appraoch: An Application of Mini-mum Spanning. Bioinformatics. Vol.18, no. 2002.

A62Modeling Electron Transfer inFlavocytochrome c3 FumarateReductase

Dayle M. Smith, Michel Dupuis, Erich R. Vorpagel,and T. P. Straatsma ([email protected])

Computational BioSciences Group, Biological SciencesDivision, Pacific Northwest National Laboratory, P.O. Box999, MS K1-83, Richland, WA 99352, (509) 375-2802, Fax(509) 375-6631

Ferric and ferrous hemes, such as those present inelectron transfer proteins, often have low-lyingspin states that are very close in energy. In orderto explore the relationship between spin state,geometry and cytochrome electron transfer in theflavocytochrome c3 fumarate reductase ofShewanella frigidimarina, we investigate, usingdensity functional theory, the relative energies,electronic structure, and optimized geometries fora high- and low-spin ferric and ferrous hememodel complex. Our model consists of aniron-porphyrin axially ligated by two imidazoles,which model the interaction of a heme withhistidine residues. Using the B3LYP hybrid func-tional, we found that, in the ferric model hemecomplex, the doublet is lower in energy than thesextet by 8.4 kcal/mol, and the singlet ferrousheme is 6.7 kcal/mol more stable than the quintet.The difference between the high-spin ferric andferrous model heme energies yields an adiabaticelectron affinity (AEA) of 5.24 eV, and thelow-spin AEA is 5.17 eV. Both values are largeenough to ensure electron trapping, and electronicstructure analysis indicates that the iron dð orbitalis involved in the electron transfer betweenhemes. Mössbauer parameters calculated to verifythe B3LYP electronic structure correlate very wellwith experimental values. Isotropic hyperfine cou-pling constants for the ligand nitrogen atomswere also evaluated. The optimized geometries ofthe ferric and ferrous hemes are consistent withstructures from X-ray crystallography and revealthat the iron-imidazole distances are significantlylonger in the high-spin hemes, which suggeststhat the protein environment, modeled here bythe imidazoles, plays an important role in regulat-ing the spin state. Iron-imidazole dissociationenergies, force constants and harmonic frequen-cies were calculated for the ferric and ferrouslow-spin and high-spin hemes. In both the ferricand ferrous cases, a single imidazole ligand ismore easily dissociated from the high-spin hemes.

Modeling/Computation

Genomes to Life I 43

Page 55: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Environmental Genomics

B1Identification and Isolation of Active,Non-Cultured Bacteria fromRadionuclide and Metal ContaminatedEnvironments for Genome Analysis

Cheryl R. Kuske1 ([email protected]), Susan M.Barns1, and Leslie E. Sommerville1,2

1Los Alamos National Laboratory; and 2Ft. Lewis College,Durango, Colorado

The overall goals of this project are to identifyactive, non-cultured soil bacteria present inradionuclide and metal contaminated environ-ments, and to collect bacterial cells representingabundant, non-cultured populations for genomeanalysis. Our studies have focused on non-cul-tured members of the Acidobacterium division.Members of this division are widespread in con-taminated and pristine soils having vastly differentphysical and chemical characteristics, and theyhave been found to represent a major fraction ofthe non-cultured bacteria in several soils (by 16SrRNA clone library analysis). Toward accomplish-ing the above goals we have made progress onfive specific objectives.

Objective 1: Analysis of soil bacterial communityrRNA. Historically, rRNA gene-based surveys ofbacteria have been conducted on the pool ofDNA encoding the rRNA gene. This provides asnapshot of the total composition of a sample, butdoes not indicate which members may be active atthe time of sampling. We have conducted parallelDNA- and RNA-based analyses from a surfacesoil and a bacterial cell preparation extracted fromthat soil. By comparing the DNA-based composi-tion to a parallel rRNA-based analysis, we hopeto identify which components of the total com-munity are active (ie. contain abundant ribo-somes).

Objective 2: Survey of Acidobacterium divisionmembers in contaminated soils. Our secondobjective is to conduct a survey of DOE sites con-taminated with radionuclides and metals to deter-mine the diversity of Acidobacterium divisionbacteria and identify active members of this divi-

sion in contaminated soils. Collections are inprogress from the NABIR FRC (Oak Ridge,TN), PNNL, an UMPTRA site in Rifle, CO, andthe Nevada Test Site. Through, we are assessingthe presence and diversity of Acidobacterium divi-sion members using simultaneous DNA andRNA extraction, followed by 16S rRNA PCRand RT-PCR for rDNA and rRNA, respectively.Analysis of contaminated and background sitesfrom the FRC indicate (a) subsurface biomass wasvery low and only DNA was recoverable in suffi-cient quantities for analysis from this site. (b) Thecomposition of Acidobacterium division bacteria issimilar across replicates of the background site,but is very different from the contaminated sites(saturated zone near wells TPB15, PTB16, DB13in Area 2). (c) The contaminated sites containvery diverse Acidobacterium division members.Two new subgroups comprise the majority of 16SrRNA clones from the sample taken near wellTPB16. A significant number of clones for one ofthe subgroups were also present in the other twocontaminated sites. Clones for these new sub-groups have not been found in samples from thebackground site.We are continuing to collect sur-face and subsurface samples from the other fieldsites for RNA- and DNA-based composition anal-yses.

Objective 3: Collection of Acidobacterium divisionmembers from soil for genome analysis. We havecontinued efforts to develop sensitive, specifichybridization methods for detection ofAcidobacterium division members in environmen-tal samples using riboprobes, and to collecthybridized cells from soil bacteria using flowcytometry cell sorting. We have developed a seriesof specific hybridization probes for the divisionand some of its major subgroups. In collaborationwith Hong Cai (LANL), we determined thatalthough cells could be specifically labeled andobserved microscopically, the fluorescence inten-sity of each cell was too low to allow cell sortingon the LANL instrument. We are currently work-ing with Diversa Corp. and PNNL to conduct cellsorting experiments on a Diversa instrument withbetter calibration for bacterial cells. At LANL, weare using gradient centrifugation to obtain

Genomes to Life I 45

Page 56: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Acidobacterium division-enriched bacterial cellpreparations for mixed genome libraries.

Objective 4: Culture of Acidobacterium divisionspecies from soils. In collaboration with MartinKeller (Diversa Corp.) and Fred Brockman(PNNL) we are using Diversa’s Gel Microdroplet(GMD) technology in attempts to cultureAcidobacterium division cells from our soil sam-ples. Diversa has attempted culture in a matrix ofdifferent media and aeration conditions. To datewe have several candidate cultures representingsubgroups 1 and 6. DNA extracted from culturedcells will be used to generate genomic libraries atLANL.

Objective 5: Analysis of contaminated soil micro-cosm RNAs. We will use ribosomal RNA (todetermine species composition) and messengerRNA (to determine active functions) to examinebacterial community response to radionuclidecontamination in soil microcosm experiments. Incollaboration with Mary Neu (LANL), we are set-ting up preliminary soil microcosms to supportmethod development and for preliminary analysisof functional response of the natural soil bacterialcommunity to different forms of Pu and U.

B3A Metagenomic Library of BacterialDNA Isolated from the DelawareRiver

David L. Kirchman, ([email protected]),Matthew T. Cottrell, and Lisa Waidner

College of Marine Studies, University of Delaware, Lewes,DE 19958

Most bacteria and archaea in natural environ-ments still cannot be isolated and cultivated aspure cultures in the laboratory, and the microbesthat can be cultured appear to be quite differentfrom uncultured ones. Consequently, the phylo-genetic composition, physiological capacity andgenetic properties of natural microbes have to bededuced from fluorescence in situ hybridization(FISH) assays and bulk properties of microbialassemblages, and from a variety of PCR-basedmethods applied to DNA isolated directly fromnatural samples. Another culture-independentapproach is to clone this DNA directly intoappropriate vectors and to screen the resulting“metagenomic library”, which theoretically con-

sists of all possible genes from the microbialassemblage. We applied this general approach to asample from the freshwater end of the DelawareEstuary as part of our efforts to understand car-bon and nitrogen cycling in environments likeestuaries with large environmental gradients.Metagenomic libraries have been constructed forsoils and marine samples, but not for freshwaters.High molecular weight DNA from the bacterialsize fraction was isolated and cloned into thefosmid vector pEpiFOS-5 (Epicentre). Ourlibrary consisted of 4608 clones with an averageinsert size of 40 kB, representing about 90genomes, if we assume a genome size of 2 mB.

Screening the library revealed several surprises,including genes found previously in metagenomiclibraries of oceanic samples. Our library appearsto be dominated by Cytophaga-like bacteriaaccording to the 16S rRNA data collected byDGGE analysis of PCR amplified 16S rRNAgenes. Of the 80 clones bearing 16S rRNA genes,about 50% appear to be from theCytophaga-Flavobacteria, a complex cluster in theBacteroidetes division. FISH analysis of the origi-nal microbial assemblage indicated thatCytophaga-like bacteria were only about 15% ofthe community. The next most abundant 16SrRNA genes in the library are from G+Actinobacteria, which others have shown to beabundant in freshwater lakes. Butbeta-proteobacteria usually dominate freshwatersystems and were the most abundant group in oursample according to the FISH analysis, yetbeta-proteobacteria accounted for only about15% of the 16S rRNA genes in the metagenomiclibrary, much less than the 25% found by FISH.

The library was also screened for hydrolysis of thefluorescent analog of cellulose, MUF-beta-1,4-glucoside. Twenty-four of the 2,400 clonesscreened had cellulase activity, which was inferredfrom rates of analog hydrolysis 2.5-fold greaterthan the control with vector alone. The activity ofseven clones was 3-fold greater than the control,while 40 additional clones had activities between2 and 2.5-fold higher than the control. The varia-tion of activities observed suggests that the librarycontains genes encoding variety of glycosyl hydro-lases capable of cleaving this fluorgenic analogue.One of the cellulase-active clones was determinedto harbor a 16S rRNA gene from a Cytophaga-likebacterium. This clone is now being completelysequenced by John Heidelburg (TIGR).

We also screened the library for genes indicativeof two newly-discovered photoheterotrophic

Environmental Genomics

46 Genomes to Life I

Page 57: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

metabolisms. Our fosmid library does not appearto contain the proteorhodopsin gene, which hadbeen found by Beja et al. (Science (1999) 289:1902-1906) in a marine metagenomic library.However, we found two clones that contain pufMand pufL, which code for reaction center proteinsin bacteria carrying out anoxygenic photosynthe-sis. Although well known to be present in anoxicenvironments, the biophysical evidence of Kolberet al. (Science (2001) 292: 2492-2495) indicatesthat aerobic anoxygenic photosynthesizing bacte-ria are also present and perhaps arebiogeochemically important in the oxic habitats,such as the oceans. One 33 kb clone from ourlibrary has a pufL sequence most similar to theprotein sequence of the gamma-proteobacteriumAllochromatium vinosum, whereas pufM of thisclone is most similar to the freshwaterbeta-proteobacterium Rhodoferax fermentans. Thesecond fosmid clone contains a 35 kb insert withthe PufL-M reaction center complex ofalpha-proteobacterial origin. The PufL sequence ismost similar to that of alpha-4 proteobacteriasubgroup, whereas the PufM protein sequence isrelated to Monterey Bay environmental BACclones and Monterey Bay isolates in thealpha-proteobacterial Roseobacter clade.

The diversity of rRNA genes, enzyme activitiesand puf genes in this freshwater library is consis-tent with our expectation that freshwater environ-ments harbor diverse assemblages of microbes.The large number of clones with Cytophaga-like16S rRNA sequences and with apparent glycosylhydrolase activity is encouraging. Additionalscreening of hydrolase-active clones byfosmid-end sequencing may provide additionallinks to clones representing Cytophaga-like bacte-ria, which would support our efforts to explorethe hydrolytic capabilities and DOM cycling bythis important group of aquatic bacteria.

B5Approaches for Obtaining GenomeSequence from Contaminated Sedi-ments Beneath a Leaking High-LevelRadioactive Waste Tank

Fred Brockman1 ([email protected]),Margaret Romine1, Kristin Kadner2, PaulRichardson2, Karsten Zengler3, Martin Keller3, andCheryl Kuske4

1Pacific Northwest National Laboratory; 2DOE JointGenome Institute; 3Diversa Corporation; and 4Los AlamosNational Laboratory

The SX Tank Farm at the U.S. Department ofEnergy�s Hanford Site was built in 1953 toreceive high level radioactive waste, and consistsof one million gallon enclosed tanks. The wasteresulted from recovery of purified plutonium anduranium from irradiated production fuels usingmethyl isobutyl ketone, aluminum nitrate, nitricacid, and sodium dichromate. Between 1962 and1969, tens of thousands of gallons of radioactiveliquid leaked from tank SX-108. An extreme envi-ronment formed in the vadose zone from incur-sion of radioactive, caustic, and toxiccontaminants and heating from the self-boilingcontents of the tank. Samples contain up to 50microCuries of Cesium-137 per gram sediment,nitrate at 1% to 5% of sediment mass, pH’s to9.8, and were heated to 50 to 70 degrees C.These samples are the most radioactive sedimentsstudied at the DOE Hanford Site in Washingtonstate, and we hypothesized selection would resultin a relatively nondiverse community. Previouswork demonstrated the presence of a low numberof cultured organisms from these sediments(Fredrickson et al, to be submitted).

The vast majority of microbial diversity in envi-ronmental samples has proved refractory to culti-vation and therefore genome, proteome, andmetabol- omic analysis. New strategies are neededto access the repertoire of genes, proteins andmetabolic capabilities embodied in the commu-nity�s genomic sequences. The project goal is todemonstrate an approach to obtain geneti-cally-linked genome sequence from members ofthis low-biomass community. Our initial approachwas to screen sediments and enrichments for thepresence of communities dominated by a very fewmicrobes representing uncultured or poorly cultureddivisions (Hugenholtz et al, 1998). The purposeof the screening was to identify an appropriate

Environmental Genomics

Genomes to Life I 47

Page 58: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

sample for constructing a community BAC libraryand then performing high throughput sequencingof BAC ends to assemble >1,000 Kbp of geneti-cally linked sequence.

To evaluate the utility of this approach we firstcharacterized the environmental DNA from 8 sed-iment samples and 91 enrichments by 16S ampli-fication with conserved primers followed bysequencing of clones at the DOE ProductionGenomics Facility. We also performed PCR�s onthe samples using primers targeting 13 divisionsof uncultured or poorly cultured bacteria, andcloned and sequenced successful PCR’s. Theresults showed (1) only one sequence out of over9,000 clones showed a BLAST hit to unculturedor poorly cultured bacteria, (2) Archaea 16Ssequences could not be amplified, (3) biomasswas about 10^5 viable cells per gram sediment,and (4) the depth of community penetration inthe sediments was poor because the detectionlevel (80,000 copies per gram sediment) was only2 to 3 times higher than the indigenous templateconcentrations (determined by competitive PCR).Never the less, sequencing showed between 3 and15 putative genera per sediment, and two of thehighly contaminated sediments were dominatedby alpha and/or beta proteobacteria.Proteobacteria are not primary inhabitants ofHanford Site deep vadose zone sediments, sug-gesting in situ microbial growth on componentsof the waste.

Because the results showed that our initialapproach for studying genomes of uncultured andpoorly cultured microbes at this site was not feasi-ble, we pursued a novel culturing approach devel-oped by Diversa Corporation. SX-108 sedimentswere grouped into high rad-low nitrate; highrad-high nitrate; low rad-high nitrate; and lowrad-low nitrate groups. Nearby sediments recov-ered from similar depths were pooled to providean uncontaminated control sample. As a positivecontrol, a soil sample from Los Alamos Nat’l Lab.with a natural population known to be comprisedof >25% Acidobacterium cells was prepared incollaboration with Cheryl Kuske. Cells were puri-fied and concentrated from each sample usingmultiple nycodenz gradient centrifugations. Sin-gle cells were encapsulated in individual gelmicrodroplets (gmd’s) and the community recon-stituted by placing gel microdroplets into a col-umn. The community was grown under lownutrient flux conditions and gmd’s sorted by flowcytometry using intrinsic forward and side scatterto detect those containing microcolonies of, onaverage, 50 to 200 cells. Key aspects of this tech-

nology are that it enables propagation of singleorganisms with extremely slow growth rates, andpreserves some of the community interactions andother specific requirements needed for successfulcultivation.

For the positive control sample, sorted gmd�swere screened with primers specific forAcidobacterium division. 16S sequences wereamplified from positive gmd�s and a number ofthe sequences group in a phylogenetic tree withsubgroup 6 of the Acidobacterium division (seeposter by Cheryl Kuske and Sue Barnes). Thisrepresents the first known culturing of these bac-teria. Our next steps are to extract DNA by eitherstandard techniques or employing whole genomeamplification from small numbers of cells, andperform partial genome sequencing.

A total of 14,000 gmd�s with putativemicrocolonies have been sorted from the four setsof pooled SX-108 sediments and from the uncon-taminated control. In the previous study, less than50 isolates were obtained from plating these samesamples. A subset of these gmd�s will be ampli-fied, cloned, and sequenced to compare the com-munity structure in these samples to one another,to the previously obtained isolates, and to the 16Ssequences obtained by direct extraction of DNAfrom the sediments. Our hypothesis is that thegmd microcolonies will represent microbes from anumber of poorly cultured or uncultured divi-sions. One or more of the most unique microbeswill be used for partial genome sequencing as out-lined above.

B7Ecological and Evolutionary Analysesof a Spatially and Geochemically Con-fined Acid Mine Drainage EcosystemEnabled by Community Genomics

Gene W. Tyson, Philip Hugenholtz, and Jillian F.Banfield ([email protected])

Department of Environmental Science, Policy andManagement, Earth and Planetary Sciences, University ofCalifornia, Berkeley

Subsurface acid mine drainage (AMD) ecosystemsare ideal models for the genome-enabled study ofmicrobial ecology and evolution because they arephysically isolated from other ecosystems and arerelatively geochemically and biologically simple.

Environmental Genomics

48 Genomes to Life I

Page 59: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

We are using culture-independent genomesequencing of an AMD community from theRichmond mine, Iron Mountain, CA, to evaluatethe extent and character of lateral gene transfer(LGT) within the community and to resolvemicrobial community function at the molecularlevel.

Microbial communities exist in several distincthabitats within the Richmond mine, includingbiofilms (subaqueous slime streamers andsubaerial slimes) and cells attached directly topyrite granules. All communities investigated todate by 16S rDNA clone libraries comprise only ahandful of phylogenetically distinct organisms,typically dominated by the iron-oxidizing generaLeptospirillum and Ferroplasma. ALeptospirillum-dominated biofilm community waschosen for detailed analysis. 16S rDNA clonelibraries and fluorescence in situ hybridization(FISH) using group-specific oligonucleotideprobes indicated that the community is made upof only 6 prokaryotic populations (see Figure; thesize of colored circles indicates 16S rDNA diver-gence within populations).

We analyzed initial community genome sequencedata from a 3 Kb shotgun library of the biofilm toestimate the community genome size. This analy-sis used an implementation of the Lander-Water-man equation that took into consideration speciesabundances determined from the data and byFISH. Results indicate that each population isdominated by a single genome type. The conclu-sion was robust, even when assembly criteria werevaried to improbable extremes and large uncer-tainties in population structure were included.However, more sequence data are needed to sta-tistically validate the finding. Furthermore, theanalysis is insensitive to genome types that occur

in low abundance. The results suggest thatreassembly of the dominant genomes of AMDcommunity members will be tractable with amodest sequencing effort. The apparent popula-tion homogeneity may arise due to the specificcharacteristics of the AMD habitat or may be awidespread phenomenon in microbial ecosystems.

Similarity searches of the initial communitygenome sequence data revealed many genes con-sistent with the chemoautotrophic lifestyle of thecommunity, including CO2 fixation genes andnitrogen fixation genes. None of these key func-tional genes had close matches to genes fromFerroplasma acidarmanus, isolated from the Rich-mond Mine, the only relevant organism for whicha complete genome is available. Thus, most ofthese genes likely come from Leptospirillum.Organism-resolved metabolic pathway informa-tion will be used to develop methods to monitormicrobial activity in the environment.

LGT is thought to play a crucial role in the ecol-ogy and evolution of prokaryotes. The extremeconditions (pH < 1.0, molar concentrations ofiron sulfate and mM concentrations of arsenic,copper and zinc, and elevated temperatures of upto 50° C) largely isolate the AMD communityfrom most potential gene donors. Naked DNA,phage and prokaryotes native to neutral pH habi-tats do not persist at pH <1.0, precluding influxof genes by transformation, transduction and con-jugation, respectively. However, prophage havebeen recognized in the Ferroplasma genomesequence and acidophilic phage have beendetected in the biofilm community. Phage may beimportant vectors for gene exchange. We have ini-tiated a collaboration to sequence the phage com-munity to assess their diversity and enhance ourability to detect prophage in the prokaryote com-munity genome data.

Comparative genome analyses indicate that F.acidarmanus and the ancestor of two acidophilicThermoplasma species belonging to theEuryarchaeota have traded many genes withphylogenetically remote acidophilic Sulfolobus spe-cies (Crenarchaeota). The putatively transferredsets of Sulfolobus genes in Ferroplasma and theThermoplasma ancestor are distinct, suggestingindependent LGT events between organisms liv-ing in the same, and adjacent habitats. In bothcases, however, the majority of transferred genesare involved in metabolism, particularly energyproduction/conversion and amino acid trans-port/metabolism. The lack of genes transferredfrom the (sequenced) genomes of other

Environmental Genomics

Genomes to Life I 49

Page 60: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

prokaryotes is consistent with the hypothesis thatextreme acidophiles have limited access to genesfrom organisms outside their ecotype. Interest-ingly, Sulfolobus, Ferroplasma and Thermoplasma areall bounded by a single tetraether-dominatedmembrane, which may facilitate conjugation. Todate, no Sulfolobus species have been detected atIron Mountain, suggesting two possibilities toexplain the observed pattern of putatively trans-ferred genes to Ferroplasma from Sulfolobus: 1)Sulfolobus is present at Iron Mountain but inregions currently inaccessible to sampling and/or

2) the transfers occurred prior to introduction ofFerroplasma into the current geological setting.Comparative analyses of the community genomedata should improve the resolution of LGT in thecommunity.

Ultimately, our goal is to develop an understand-ing of how acidophilic organisms evolved andfunction as communities to control acid minedrainage generation. The community genomicsdata are essential for this effort.

Environmental Genomics

50 Genomes to Life I

Page 61: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Microbial Genomics

B11Strategies to Harness the MetabolicDiversity of Rhodopseudomonas palustris

Caroline S. Harwood1 ([email protected]), Jizhong Zhou2, F. Robert Tabita3,Frank Larimer2, Liyou Wu2, Yasuhiro Oda1, FedericoRey1, and Sudip Samanta1

1The University of Iowa; 2Oak Ridge National Laboratory;and 3Ohio State University

Rhodopseudomonas palustris is an extremely success-ful bacterium that can be found in virtually anytemperate soil or water sample on earth. It cangrow by adopting one of the four major meta-bolic modes: photoautotrophic growth (energyfrom light and carbon from C1 compounds),photoheterotrophic growth (energy from lightand carbon from organic compounds),chemohetero- trophic growth (carbon and energyfrom organic compounds) and chemoautotrophicgrowth (energy from inorganic compounds andcarbon from C1 compounds). Rhodopseudomonasenjoys exceptional versatility within each of thesegrowth modes. It can grow with or without oxy-gen and can use many alternative forms of carbon,nitrogen and inorganic electron donors. Itdegrades plant biomass and chlorinated pollutantsand shows promise as a catalyst for biofuel pro-duction. Rhodopseudomonas has thus become amodel to probe how the web of metabolic reac-tions that operates in the confines of a single celladjusts in response to subtle changes in environ-mental conditions. Genes for key metabolicenzymes and regulatory proteins are easily identi-fied in the 5.49 Mb Rhodopseudomonas genome. Ithas a large cluster of photosynthesis genes and acollection of additional genes that encodelight-responsive proteins. It has genes for themetabolism of diverse kinds of carbon sources,including lignin monomers, complex fatty acidsand dicarboxylic acids. It encodes two differentcarbon dioxide fixation enzymes and three differ-ent nitrogen fixation enzymes, each with a differ-ent transition metal at its active site. Each of thethree nitrogenases is active in Rhodopseudomonasand each catalyzes the conversion of nitrogen gasto ammonia and hydrogen, a biofuel.

Rhodopseudomonas can convert sunlight to ATPand derive electrons by biodegrading plant mate-rial. ATP and electrons so generated can, in turn,be used to fix nitrogen, with accompanyinghydrogen production. Because multiple systemsare involved, hydrogen production is a good start-ing point for studies or integrative metabolism byRhodopseudomonas. To achieve efficient hydrogenproduction it is important to understand howexpression levels of genes involved in photosyn-thesis, carbon dioxide fixation, lignin monomerdegradation and nitrogen fixation fluctuate inresponse to variations in conditions. It is alsoimportant to identify regulatory bottlenecks thatrestrict the flow of energy and electrons fromplant biomass to hydrogen production. Towardsthis end we have constructed a whole genomeDNA microarray of Rhodopseudomonas and we arenow using the array to analyze global patterns ofgene expression.

B13Gene Expression Profiles inNitrosomonas europaea, an ObligateChemolithoautotroph

Dan Arp1 ([email protected]), Xueming Wei1,Luis Sayavedra-Soto1, Martin G. Klotz2, JizhongZhou3, and Tingfen Yan3

1Oregon State University; 2University of Louisville; and3Oak Ridge National Laboratory

Nitrosomonas europaea derives energy for growthfrom the oxidation of ammonia to nitrite. Thisprocess contributes to nitrification in soils andwaters, often with detrimental effects incroplands, and with beneficial effects inwastewater treatment. Our long-range goal is tounderstand the molecular underpinnings for theoxidation of ammonia and other cellular processescarried out by these organisms. Towards this goal,the genome of this bacterium was sequenced atthe Lawrence Livermore National Laboratory(Jane Lamerdin, Patrick S. Chain). Through a col-laborative effort with at Oak Ridge National Lab-

Genomes to Life I 51

Page 62: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

oratory, we are developing microarrays in order toexamine whole genome expression profiles forthis organism. We are interested in understandingthe effects of nutrient shifts, starvation, and otherenvironmental changes on gene expression. Thearrays are now constructed and preliminary resultswill be presented.

Prior to the initiation of this project, the expres-sion of the genes coding for the enzymes involvedin ammonia oxidation were characterized. Thesegenes are amoCAB, coding for ammonia mono-oxygenase, and hao, coding for hydroxylamineoxidoreductase. The amo genes in particular showa strong response to ammonia. In preparation forthe microarray experiments, we were interested inlearning more about the expression of the genescoding for Rubisco. This autotroph assimilatesCO2 via this enzyme and the Calvin Cycle.Sequence data reveals that N. europaea has a type IRubisco. The Rubisco operon in N. europaea likelyconsists of five open reading frames. AlthoughRubisco is essential for the autotrophy of thisorganism, studies on its expression are scant. Weanalyzed mRNA levels of Rubisco and some othergenes by Northern hybridization with specificprobes. Rubisco large and small subunits(encoded by cbbL and cbbS) were highly expressedin growing cells. The message levels appearedhigher than those of amo and hao. In ammo-nia-deprived and stationary phase cells, RubiscomRNAs were undetectable. Particularly interest-ing is that the expression of Rubisco genes wasinversely proportional to the carbon levels in themedium. Rubisco mRNAs in cells grown in amedium with only atmospheric CO2 were severaltimes higher than those in cells in carbonate-con-taining medium. The higher the carbonate level inthe medium, the lower the Rubisco mRNA levels.This result was in contrast to house keeping genessuch as hao and carbonic anhydrase genes.

We also investigated the induction of the cbboperon and its message depletion patterns. After30 min induction in normal medium, abundantcbbL and cbbS messages were detected and theywere expressed fully after one hr. The estimatedhalflives of cbbL and cbbS were 0.5 and 0.75 hr,which were shorter than the half lives of amo andhao.

The three remaining cbb genes in the operon wereexpressed at much lower levels than cbbL andcbbS. This result may be explained by the presenceof a transcription terminator downstream of cbbS,as revealed in the DNA sequence data. Theseresults indicated that Rubisco gene expression was

dependent on ammonia, and that carbon had anegative control on Rubisco transcription.

B15Genomics of Thermobifida fusca PlantCell Wall Degradating Proteins

David B. Wilson ([email protected]), Yuan-ManHsu, and Diana Irwin

Department of Molecular Biology & Genetics, CornellUniversity, Ithaca, NY 14853

Plasmids have been successfully introduced intoThermobifida fusca YX by mating to E. coli cellscontaining a transfer plasmid and selecting forthiostrepton resistance. We have constructed a sui-cide plasmid with a defective celR gene containingan insert and are trying to knockout the celR genein T. fusca. One puzzling result is that mating doesnot work in T. fusca strain ER1, which was iso-lated from T. fusca YX by mutagenizing sporeswith methyl sulfate and selecting for a colonylacking the major extracellular protease. The twostrains should only differ by a few point muta-tions and it seems unlikely that inactivating theprotease would interfere with mating or plasmidtransfer.

We have continued our study of T. fusca XG74,the main T. fusca xyloglucanase. Surprisingly, T.fusca does not grow on xyloglucan and thisappears to be due to the lack of a transport systemfor taking up the products of xyloglucan hydroly-sis since they accumulate in the media of T. fuscacells incubated with xyloglucan. We have shownthat XG74 does allow a mixture of T. fuscacellulases to hydrolyze cellulose coated withxyloglucan, which is not hydrolyzed by the mix-ture containing only cellulases. The mixture con-taining XG74 cannot degrade the cellulose intomato cell walls, but T. fusca crude cellulase willit. We will try to identify the additional proteinsthat are required for hydrolysis. The level of Xg74in the culture supernatant of T. fusca grown ondifferent carbon sources was determined by West-ern blotting and it was low on glucose orcellobiose slightly higher on xylan and high oncorn fiber, Sulka Floc or xyloglucan.

Microbial Genomics

52 Genomes to Life I

Page 63: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

B17The Rhodopseudomonas palustris Micro-bial Cell Project

F. Robert Tabita1 ([email protected]), Janet L.Gibson1, Caroline S. Harwood2, Frank Larimer3,Thomas Beatty4, James C. Liao5, Jizhong (Joe)Zhou3, and Richard Smith6

1Ohio State University; 2University of Iowa; 3Oak RidgeNational Laboratory; 4University of British Columbia;5University of California at Los Angeles; and 6PacificNorthwest National Laboratory

The long-range objective of this interdisciplinarystudy is to examine how processes of global car-bon sequestration (CO2 fixation), nitrogen fixa-tion, sulfur oxidation, energy generation fromlight, biofuel (hydrogen) production, plus organiccarbon catabolism and metal reduction operate ina single microbial cell. The recently sequencedRhodopseudomonas palustris genome serves as theraw material for these studies since the metabolicversatility of this organism makes such studiesboth amenable and highly feasible. Bioinformaticsanalysis has allowed for a reasonable approxima-tion of many of the metabolic schemes that areutilized by this organism to catalyze the aboveprocesses. However, it appears from recent studiesthat several of the above processes arecoordinately controlled and interdependent; thusduring the first part of this project we havefocused at identifying key regulatory genes andproteins. From the genomic sequence a numberof likely target genes of opportunity were alsorevealed. These have been systematically knockedout to produce a battery of useful mutant strainsthat are employed in a variety of studies to exam-ine the regulation of metabolism. In addition, wehave taken advantage of a transposon library inwhich virtually all the open reading frames of thegenome have been interrupted. Screening thistransposon library under diverse growth condi-tions has enabled us to identify several additionalunique and previously unappreciated genetic locithat are important for the above processes. Thelatter mutant strains are currently being studied toreveal exactly how these newly identified genesand their products influence the processes understudy.

In addition to these more traditional approaches,we have also undertaken genomics, proteomics,and metabolomics oriented experiments to furtheranalyze the integrative control of metabolism,with the rationale that these approaches will help

direct our future investigations and focus ourefforts. For example, whole genome microarrayshave been prepared for R. palustris and initialstudies have commenced that are assisting ourefforts to identify additional genes that are up anddown regulated under growth conditions of inter-est, with examination of both wild-type andmutant strains underway or contemplated in thenear future. Furhermore, a Bioinformatics methodwas developed to deduce operon structure usingmicroarray data and gene distance. This methodallows the refinement of operon prediction basedon genomic information. In contrast to the theo-retical prediction based only on genomic informa-tion, this method incorporates microarray dataand considers the noise level in the microarrayexperiments. In parallel with microarray experi-ments, examination of the proteome has com-menced under selected growth conditions, usingboth whole cells and isolated intracytoplasmicmembranes (these are intracellular structures thatare employed for photochemical energy genera-tion by this and related organisms). Proteomicanalysis provides a real advantage as one canexamine the end result of transcription and trans-lation under different physiological growth condi-tions and use this protein data to relate back tothe regulation of gene expression and/or potentialposttranslational events.

The above experimental approaches will allow usto reach the eventual goal of this project; i.e., togenerate the knowledge base to model metabo-lism for the subsequent construction of strains inwhich carbon sequestration and hydrogen produc-tion are maximized.

B19Lateral Gene Transfer and the Historyof Bacterial Genomes

Scott R. Santos and Howard Ochman

Department of Biochemisty and Molecular Biophysics,University of Arizona, Tucson, AZ 85721

Comparative analyses of complete microbialsequences have brought many new insights intothe evolution of genomes and the genetic relation-ships among microorganisms. For the vast major-ity of microbial species, molecular phylogeneticrelationships have been on a single gene, i.e.,small subunit ribosomal DNA (16S rDNA).Because 16S rDNA is highly conserved � both in

Microbial Genomics

Genomes to Life I 53

Page 64: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

terms of its function and distribution in alllifeforms and its rate of change � it is particularlywell-suited for resolving the relationships amongvery divergent organisms. However, recent studiesprovide clear evidence that lateral gene transfer iscommon among bacteria with the result thatgenomes are chimeric, such that different regionswill have very different histories.

The long-range objectives of our reseach is toemploy nucleotide sequences of a large set of uni-versally distributed genes among eubacteria of dif-fering degrees of genetic relatedness in order toaddress questions relating to the role of genetransfer in shaping bacterial genomes. We areusing existing databases as well as newly deter-mined nucleotide sequences in order to designconserved primers for PCR amplification andsequencing of nucleotide sequences across taxa.These primer sets were initially derived fromalignments of 143 protein coding genes commonto the majority of eubacteria. Of these, conservedprimer sets, i.e., those of limited degeneracy thatyield products 400-2000 bp in length, could beobtained for 21 of these genes. Primer sets werethen tested against DNA templates from represen-tatives of diverse bacteria phyla, and the initialscreening identified nine genes that could beamplified reliably from over 50% of the isolates.

The target genes for which we have developedconserved primer sets are involved in a variety ofcellular functions, including translation, ribosomalstructure and biogenesis (fusA, ileS, leuS, rplB,valS), DNA replication, recombination and repair(gyrB), cell motility and secretion (lepA), nucleo-tide transport and metabolism (pyrG) and tran-scription (rpoB). This diversity makes them idealcandidates to test if genes subject to lateral trans-fer are functionally or genetically linked and, totest the phylogenetic limits to lateral gene trans-fer. We are also developing primer sets for pro-teins specific to members of the specific phyla(e.g., Proteobacteria, Spirochaetes) to explore theancestry of taxa-specific genes.

B21Environmental Sensing, MetabolicResponse, and Regulatory Networks inthe Respiratory Versatile BacteriumShewanella oneidensis MR-1

James K. Fredrickson1

([email protected]), Margie F. Romine2,William Cannon2, Yuri A. Gorby2, Mary S.Weir-Lipton2, H. Peter Lu2, Richard D. Smith2,Harold E. Trease2, and Shimon Weiss2

1Pacific Northwest National Laboratory; and 2University ofCalifornia at Los Angeles

Shewanella oneidensis MR-1 is a motile facultativebacterium with remarkable metabolic versatility inregards to electron acceptor utilization; it can uti-lize O2, nitrate, fumarate, Mn, Fe, and S0 as ter-minal electron acceptors during respiration. Thisversatility allows MR-1 to efficiently compete forresources in environments where electron acceptortype and concentration fluctuate in space andtime. The ability to effectively reduce polyvalentmetals and radionuclides, including solid phase Feand Mn oxides, has generated considerable inter-est in the potential role of this organism inbiogeochem- ical cycling and in thebioremediation of contaminant metals andradionuclides. In spite of considerable effort, thedetails of MR-1�s electron transport system andthe mechanisms by which it reduces metals andradionuclides remain unclear. Even less is knownregarding the molecular networks in this organ-ism that allow it to respond to compete efficientlyin a changing environment. The entire genomesequence of MR-1 has been determined and highthroughput methods for measuring gene expres-sion are being developed and applied.

DOE recognized that in order to achieve the goalsof the Genomes to Life Program, obtaining acomprehensive systems-level level understandingof the components and functions of the cell thatgive it life, a united effort integrating the capabili-ties and talents of many would be required. Tothis end, the Shewanella Federation was formed toprobe in detail the functions of Shewanellaoneidensis MR1 cells. The Federation consists ofteams of scientists from academia, national labora-tories, and private industry (shewanella.org) thatare working together in a collaborative, coordi-nated mode to jointly achieve a comprehensiveunderstanding of the biology of this remarkablyversatile organism. This project is contributing to

Microbial Genomics

54 Genomes to Life I

Page 65: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

the collaborative experiments of the Federation byproviding, among other things, characterizedchemostat cultures of MR-1 and high resolutionseparation and high mass accuracy and sensitivityFourier transform ion cyclotron resonance(FTICR) mass spectrometry for global proteomeanalyses of these cultures. To date, >50% of thepredicted 5000+ proteins in MR-1 have beenidentified by accurate mass tags (AMTs) with anaverage of 3 AMTs per protein.

To date, we have established procedures for grow-ing MR-1 in continuous culture under both aero-bic and anaerobic (with fumarate) conditions withlactate as the growth-limiting nutrient for the ini-tial collaborative Shewanella experiments. The ini-tial results from 2-D PAGE analysis of proteins byC. Giometti (ANL) revealed that there were sub-stantial variations in the proteome in the aerobicvs. anaerobic cultures and that biological repli-cates of chemostat samples were in excellentagreement. Microarray based analyses of mRNAexpression are underway in collaboration with J.Zhou (ORNL) as is MS-based proteome analysesat PNNL. An unanticipated result was the forma-tion of flocs in the aerobic chemostat. Floc forma-tion was due to the production of exopolymericsubstance (EPS) by MR-1 and was hypothesizedto be due to a defense mechanism against O2stress (i.e., induced by O radicals). In the absenceof Ca, flocs are unable to form likely due to a lackof cross- linking of the EPS. We are currently gen-erating additional MR-1 continuous cultures inthe absence of Ca under aerobic and anaerobicconditions as well as under O2 limiting condi-tions. The major goal of these experiments is toprovide insights into gene and protein expressionpatterns under these conditions as a baseline forfuture experiments with other types of electronacceptors, such as metals and radionuclides ornitrate, and for identifying genes that are poten-tial targets for mutagenesis.

Another approach for characterizing differentialgene expression can be provided by analysis ofreporter activity mediated by transcriptionalfusions. Because reporter activity can be measuredin living cells in real-time, the use of transcript-ional fusions is more amenable than aremicroarrays to dynamic measurements of geneexpression analysis under many growth conditionsand measurements can be made at the level ofindividual cells as opposed to a bulk average ofthe entire population. We describe the construc-tion of a small targeted reporter library (62 con-structs) in MR-1, whereby promoter-containingDNA sequences upstream to genes associated

with electron transport, adhesion, and cell signal-ing were cloned in a broad-host range plasmidupstream to the green fluorescent protein (GFP).Using MR-1 bearing GFP reporter constructs, wehave demonstrated that 1) the vector, pProbe-NT,utilized is stable after several passages in theabsence of antibiotic, 2) GFP is stable for at leasta week, and 3) very little time is needed for fluo-rescence development, even where cells are grownanaerobically prior to making fluorescence mea-surements. Initial testing using a crude assaydeveloped to measure the effect of growth in sus-pension versus on solid surfaces, suggest thatexpression of the promoter upstream to mtrDEFis significantly higher on surfaces than in suspen-sion. Similar, but not so dramatic, effects wereobserved for other promoter constructs. Proposedmethods for high throughput analyses with theseconstructs and the utility of these assays fordesign of complementary microarray analyses areunderway.

Outer membrane vesicles (MVs) are unique toGram-negative bacteria, are initiated by the for-mation of �blebs� in the outer membrane, and arereleased from the cell surface during growth, trap-ping some of the underlying periplasmic contentsin the process. Membrane vesicles provide anexcellent means to identity proteins that are local-ized to the outer portions of the MR-1 cell enve-lope without disturbing cellular integrity or theneed to further fractionate cells. Mass spectromet-ric analyses of vesicles isolated from MR-1 cellsgrown on LB supplemented with fumarate andlactate revealed the presence of 18 outer mem-brane and 12 periplasmic proteins. Proteins thatwere identified include electron transport pathwaycomponents (OmcA, OmcB, MtrB, CymA,fumarate reductase, and formate dehydrogenasealpha and Fe-S subunits), five putative porins,three proteases, proteins involved in protein mat-uration (PpiD and DsbA), and two transport pro-teins (long-chain fatty acids and tungstate). Inaddition, these samples contained FlaA flagellinproteins and the MshA pilin protein, head and tailproteins from prophage LambdaSo and MuSo2which, along with several other putative innermembrane and cytoplasmic proteins, probablyco-purified with vesicles. The presence of phagecoat proteins in these samples suggests that a frac-tion of cells within MR-1 cultures are undergoinglysis during culture and may explain why proteinspredicted to be associated with the inner mem-brane or cytoplasm were also detected in MVpreparations. The presence of electron transportproteins shown in vitro to be capable of reducingFe(III) is consistent with related findings in S.

Microbial Genomics

Genomes to Life I 55

Page 66: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

putrefaciens CN32, a close relative of MR-1,where vesicles have been shown to mediateFe(III), U(VI) and Tc(VII) reduction.

A64Interdisciplinary Study of Shewanellaoneidensis MR-1’s Metabolism andMetal Reduction

Eugene Kolker ([email protected])

BIATECH (www.biatech.org), 19310 N. Creek Parkway,Suite 115, Bothell, WA 98011, 425.481.7200 x100, Fax:425.481.5384

Since our project became part of the ShewanellaFederation, we focused our work mostly on analy-sis of different types of data produced by globalhigh-throughput technologies to characterizegene and protein expression as well as getting abetter understanding of the cellular metabolism.Specifically, first year activities include develop-ment of:

1. New labeling technique for quantitativeproteomics, so called methyl esterificationlabeling approach, complementary to cur-rently available methods;

2. New algorithm for de novo protein sequenc-ing;

3. New statistical model for spectral analysis ofarbitrary shape data;

4. One of the first analyses of the transcriptomeof the entire microorganism;

5. New approach to predict operon structuresand transcripts within untranslated regions;

6. The first control protein experimental mix-tures with known physico-chemical charac-teristics for high-throughput proteomicsexperiments;

7. The first statistical models for peptide andprotein identifications for high-throughputproteomics analysis;

8. Shewanella metabolic capability experimentswith minimal media on aerobically & anaer-obically grown cells and transformationexperiments.

Several collaborations have been establishedwithin the Shewanella Federation with PNNL,USC, ORNL, and MSU. The first year of thisproject, supported by DOE�s Offices of Biologicaland Environmental Research and Advanced Scien-tific Computing Research, also resulted in 6 pub-lished papers.

This is a joint work of A. Keller, A. Nesvizhskii, A. Picone, B.Tjaden, D. Goodlett, S. Purvine, S. Stolyar, and T. Cherny doneat BIATECH and ISB.

B23Integrated Analysis of Protein Com-plexes and Regulatory NetworksInvolved in Anaerobic Energy Metabo-lism of Shewanella oneidensis MR-1

Jizhong Zhou1 ([email protected]), Dorothea K.Thompson1, Matthew W. Fields1, Adam Leaphart1,Dawn Stanek1, Timothy Palzkill2, Frank Larimer1,James M. Tiedje3, Kenneth H. Nealson4, Alex S.Beliaev5, Richard Smith5, Bernhard O. Palsson6,Carol Giometti7, Dong Xu1, Ying Xu1, Mary Lipton5,James R. Cole3, and Joel Klappenbach3

1Oak Ridge National Laboratory; 2Baylor College ofMedicine; 3Michigan State University; 4University ofSouthern California; 5Pacific Northwest NationalLaboratory; 6University of California at San Diego; and7Argonne National Laboratory

Large-scale sequencing of entire genomes repre-sents a new age in biology, but the greatest chal-lenge is to define cellular responses, genefunctions, and regulatory networks at thewhole-genome/proteome level. The key goal ofthis project is to explore whole-genome sequenceinformation for understanding the genetic struc-ture, function, regulatory networks, and mecha-nisms of anaerobic energy metabolism in themetal-reducing bacterium Shewanella oneidensisMR-1. To define the repertoire of MR-1 genesresponding to different terminal electron accep-tors, transcriptome profiles were examined inbatch cultures grown with fumarate, nitrate,thiosulfate, DMSO, TMAO, ferric citrate, ferricoxide, manganese dioxide, colloidal manganese,and cobalt using DNA microarrays covering~99% of the total predicted protein-encodingopen reading frames in S. oneidensis. Total RNAwas isolated from cells exposed to different elec-tron acceptors for 3.5 h under anoxic conditionsand compared to RNA extracted from cells underfumarate-reducing conditions. Microarray analy-

Microbial Genomics

56 Genomes to Life I

Page 67: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

ses revealed significant differences in globalexpression patterns in response to different anaer-obic respiratory conditions. The data indicated anumber of genes that displayed preferential induc-tion in response to specific terminal electronacceptors. This work represents an important steptowards the goal of characterizing the anaerobicrespiratory system of S. oneidensis MR-1 on agenomic scale.

To understand the molecular basis of anaerobicenergy metabolism in MR-1, more than 20 geneswith putative functions in global gene regulation,energy metabolism and adaptive cellular responsesto stress were inactivated by deletion mutagenesis.Three double mutants defective in two global reg-ulatory genes were also obtained. Genetic, bio-chemical and physiological characterization ofthese mutants are currently underway. Also, a ran-dom Shewanella phage display library is beingconstructed, and this library will contain 10 mil-lion unique inserts with insert sizes ranging from300 bp to about 1.2 kb. Thus, the library shouldhave a fusion point for approximately every basepair in the genome. In addition, the conditionsfor cloning individual open reading frames fromShewanella were optimized, and now the designand synthesis of primers for amplifying individualgenes are underway.

B25Global Regulation in theMethanogenic Archaeon Methanococcusmaripaludis

John Leigh1 ([email protected]), MurrayHackett1, Roger Bumgarner1, Ram Samudrala1,William Whitman2, Jon Amster2, and Dieter Söll3

1University of Washington; 2University of Georgia; and 3YaleUniversity

Methanococcus maripaludis is a modelhydrogenotrophic methanogenic archaeon.Growth on hydrogen and carbon dioxide resultsin the production of methane as a waste product.M. maripaludis stands out among methanogenicarchaea as an ideal model species because of fastreproducible growth, a genome sequence, andeffective genetic tools. Genetic manipulations inM. maripaludis are increasingly facile. In our col-laboration under the Department of Energy�sMicrobial Cell Program we are studying globalregulation by hydrogen, amino acid starvation,

growth rate, and other conditions. Little is knownof these regulatory systems in Archaea, especiallyat the global level. Our approaches include con-tinuous culture ofM. maripaludis, expressionarrays, proteomics, measurement of metabolitelevels, determination of tRNA charging, andgenetic manipulation.

To date our team has installed dual chemostatsrunning with anaerobic gas sources and has estab-lished reproducible growth conditions over arange of growth rates. We are in the process ofcalibrating the continuous culture system forgrowth under various nutrient limitations includ-ing hydrogen. Besides having well controlledsteady-state conditions, a particular virtue of thecontinuous culture approach is that it will allowus to distinguish the specific effects of nutrientlimitation from the general effects of growth rate.As work preliminary to our global analysis ofhydrogen regulation we have produced lacZfusions to mtd (encoding the fourth step inmethanogenesis) and two fdh genes (encodingformate dehydrogenases) and have demonstrateddifferential regulation under high- and low hydro-gen regimes. In preparation for our study of theresponse to amino acid starvation we have con-structed two amino acid-auxotrophic mutants,including a mutant in the biosynthesis of leucine(isopropyl malate synthase, leuA) and a mutant inthe common pathway of aromatic amino acidsbiosynthesis (3-dehydroquinate dehydratase,aroD). The genotypes of these mutants have beenconfirmed, and the mutants possess the expectedauxotrophic phenotypes. We are also preparing tostudy other regulatory systems, and for this pur-pose we have constructed null mutations in genesencoding potential transcriptional regulatory pro-teins, including members of the AsnC, MerR andGntR families. For the analytical aspects of ourglobal regulation studies we are implementing anapproach to proteomics that in preliminary testsappears well suited to quantitative global analysisof the proteomes of prokaryotic organisms withrelatively small genomes. In this approach, poolsof proteolytic peptides are fractionated by severalstages of liquid chromatography, analyzed by tan-dem mass spectrometry, and computationallymatched to open reading frames in the genome.For expression studies at the mRNA level we havepurchased a DNA array forM. maripaludis andare in the process of spotting glass slides. Theexpression array center at the University of Wash-ington is continuing to develop data analysis toolsthat will facilitate our manipulation and integra-tion of expression array data, proteomic data, andannotation information.

Microbial Genomics

Genomes to Life I 57

Page 68: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

B27Identification of Regions of LateralGene Transfer Across theThermotogales

Karen E. Nelson ([email protected]), EmmanuelMongodin, Ioana Hance, and Steven R. Gill

The Institute for Genomic Research, 9712 Medical CenterDrive, Rockville, Maryland, Telephone: 301-838-3565, Fax:301-838-0208

The genome of Thermotoga maritima MSB8 wascompletely sequenced in 1999. Whole genomeanalysis of this bacterium suggested that 24% ofthe DNA sequence was most similar to that ofarchaeal species, primarily to Pyrococcus sp. Manyof these open reading frames (ORFs) that werearchaeal-like were clustered together in large con-tiguous pieces that stretched from 4 to 21 kb insize, were of atypical composition when comparedto the rest of the genome, and shared gene orderwith the archaeal species that they were most sim-ilar to. The analysis of the genome suggested thatthis organism had undergone extensive lateralgene transfer (LGT) with archaeal species. Inde-pendent biochemical analyses by Doolittle andworkers using degenerative PCR and subtractivehybridization techniques have also revealed genetransfer and extensive genomic diversity acrossdifferent strains of Thermotoga. Genes involved insugar transport, polysaccharide degradation aswell as subunits ATPases were found to be vari-able.

We have created a whole-genome microarraybased on the completed genome sequence that hasbeen used to do comparative genome hybridiza-tions (CGH) with 12 different Thermotogastrains/species that include Thermotoga sp. RQ2,Thermotoga neopolitana NS-ET and Thermotogathermarum LA3. PCR products representing the1879 total T. maritima MSB8 ORFs have beenspotted in duplicate on Corning UltraGap slides.Two flip-dye experiments have been conductedper strain. Genes were considered to be sharedbetween the 2 compared strains if the ratio(MSB8/experimental strain) was between 1 and 3,and considered to be absent if the ratio wasgreater than 10. Analysis of the resulting datademonstrates that there is a high level of variabil-ity in the presence and absence of genes across thedifferent Thermotoga strains/species.

Of the strains that have been compared to thesequenced MSB8, RQ2, PB1platt and S1-L12B

share the highest level of genome conservationwith MSB8. Only 129 ORFs in the MSB8genome (1866 ORFs in total) did not havehomologues in the RQ2 genome. These include45 hypothetical proteins and 13 conserved hypo-thetical proteins, as well as 23 (18% of total thatare absent) that are involved in transport. Ofthese 129, 18 occur as single ORFs, and theremaining correspond to islands that range in sizefrom 2 kb to 38 kb. For strain S1-L12B, 9.4 % ofthe ORFs in MSB8 do not have homologs in thisgenome. Of these 174, 48 occur as single ORFs,and there are a total of 22 islands larger than 2 kbthat are absent. Sixty-six ORFs correspond tohypothetical proteins, and 29 ORFs correspondto conserved hypothetical proteins. In addition,6.9% are devoted to transport. Ten percent (186)of the MSB8 ORFs do not have homologs in PB1(55 hypothetical proteins, 33 conservedhypotheticals), 16% of which are involved intransport. There are a total of 18 islands greaterthan 2kb in size that are absent from this strain. T.thermarum LA3 appears to be the most distantlyrelated to MSB8. Initial data analysis suggests thatlateral gene transfer across hyperthermophilesmay be mediated by repetitive sequences that canbe found in all these species. Interestingly, there isa high percentage of genes that are sharedbetween T. maritimaMSB8 and Thermotoga strainPB1platt that was isolated from an oil field inAlaska.

We have also designed, ordered and receivedprimers to create a Pyrococcus furiosus genomemicroarray, and are in the process of diluting theprimers and generating the PCR products thatrepresent the entire genome. It is anticipated thatwe will conduct experiments similar to theThermotoga comparative genome hybridizationstudy.

Microbial Genomics

58 Genomes to Life I

Page 69: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

B29The Dynamics of Cellular StressResponses in Deinococcus radiodurans

Michael J. Daly1, Jizhong Zhou2, James K.Fredrickson3, Richard D. Smith3, Mary S. Lipton3,and Eugene Koonin4

1Uniformed Services, University of the Health Sciences,4301 Jones Bridge Road, Bethesda, MD 20814; Tel:301-295-3750; 2Oak Ridge National Laboratory, OakRidge, TN; 3Pacific Northwest National Laboratory,Richland, WA; and 4National Center for BiotechnologyInformation, NIH, Bethesda, MD

Deinococcus radiodurans (DEIRA) is the mostcharacterized member of the radiation resistantbacterial family Deinococcaceae. It is non-patho-genic, amenable to genetic engineering, and his-torically best known for its extreme resistance togamma radiation [1]. The bacterium can growand functionally express cloned foreign genes inthe presence of 60 Gy /hour, and can surviveacute exposures that exceed 15,000 Gy withoutlethality [1, 2]. How this feat is accomplished isunknown, and a long-term goal of our GTL pro-ject is a detailed understanding of the molecularpathways underlying this phenotype. Based on itsremarkable robustness, DEIRA is also beingdeveloped for bioremediation of radioactivemixed waste sites containing radionuclides, heavymetals, and toxic organic compounds [2]. Using acombination of computational and whole-celltechnologies, we are analyzing expression net-works in DEIRA to map its cellular repair path-ways, and also using information gleaned fromthis work to facilitate its development forbioremediation.

Whole genome sequencing, annotation, and com-parative analyses for DEIRA [1, 3] have givenrise to the development of new experimentalwhole-cell technologies dedicated to this organ-ism. In collaboration with co-investigators atPNNL, our groups have developed proteomicmethodology that uses high-resolution liquidchromatography and Fourier transform ion cyclo-tron resonance mass spectrometry to characterizean organism�s dynamic proteome. Using this tech-nology, >61% of the predicted proteome ofDEIRA has been characterized with high confi-dence [4]. This represents the broadest proteomecoverage for any organism to date. And, in collab-oration with co-investigators at ORNL andNCBI, we have constructed a whole-genomemicroarray (WGM) for DEIRA and used it to

examine global RNA expression dynamics duringrecovery from high-dose irradiation [5].

Already, proteomic and WGM research hasrevealed an unprecedented view of the molecularsystems involved in the resistance phenotypes ofDEIRA, and has also facilitated the constructionof DEIRA strains capable of detoxifying highlyradioactive waste environments. For example,with respect to novel genes that may be involvedin radiation resistance, we have confirmed theinvolvement of several that show marked induc-tion following irradiation [5]. This work has alsoalerted us to how metabolic strategies, not gener-ally associated with DNA protection or repair,could enhance its resistance functions. DEIRAswitches its metabolic pathways in response toirradiation, minimizing oxidative stress produc-tion and optimizing recovery [6]. We now believethat a comprehensive knowledge of DIERAmetabolism is key to advancing our understand-ing of its extreme radiation resistance, as well asextending its intrinsic metabolic functions forbioremedia- tion. In the past, our goal of engi-neering DEIRA for complete mineralization oftoluene could not be reached because of uncer-tainties regarding engineering strategies relatingto its metabolic configuration. These uncertaintieswere overcome following proteomic and WGManalyses that untangled some of the complexitiesof DEIRA metabolism, and revealed how DEIRAintermediary metabolism could be integrated withcomplete toluene oxidation. As a result, we havesuccessfully engineered DEIRA strains that cancompletely mineralize toluene, as demonstrated innatural sediment and groundwater analogs ofDOE contaminated environments [7]. In sum-mary, the GTL program is providing us a uniqueopportunity to bring together state-of-the-sciencehigh-throughput whole-cell technologies toexplore fundamental and applied aspects ofDeinococcus radiodurans.

1. K. S. Makarova, et al. (2001) The genome of the extremelyradiation resistant bacterium Deinococcus radiodurans viewedfrom the perspective of comparative genomics. Microbiologyand Molecular Biology Reviews 65, 44–79.

2. H. Brim, et al. (2000) Engineering Deinococcus radioduransfor metal remediation in radioactive mixed waste environ-ments. Nature Biotechnology 18, 85–90.

3. O. White, et al. (1999) Sequencing and functional analysis ofthe Deinococcus radiodurans genome. Science 286,1571–1577.

4. M. S. Lipton, et al. (2002) Global analysis of the Deinococcusradiodurans R1 proteome using accurate mass tags. Proc.Natl. Acad. Sci. USA, 99, 11049–11054.

Microbial Genomics

Genomes to Life I 59

Page 70: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

5. Y. Liu, J. Zhou, A. Beliaev, J. Stair, L. Wu, D.K. Thompson,D. Xu, A. Venkateswaran, M. Omelehenko, M. Zhai, E. K.Gaidamakova, K. S. Makarova, E. Koonin, and M. J. Daly(2002) Transcriptome dynamics of Deinococcus radioduransrecovering from ionizing radiation. Submitted.

6. A.Vasilenko, A.Venkateswaran, H. Brim, Y. Liu, J. Zhou, K.S. Makarova, M. Omelchenko, D. Ghosal, and M. J. Daly(2002) Relationship between metabolism, oxidative stressand radiation resistance in the family Deinococcaceae. Sub-mitted.

7. H. Brim, J. P. Osborne, A Venkateswaran, M. Zhai, J. K.Fredrickson, L. P. Wackett, and M. J. Daly (2002). Facili-tated Cr(VI) Reduction by Deinococcus radiodurans Engi-neered for Complete Toluene Mineralization. Submitted.

B9Uncovering the Regulatory NetworksAssociated with Ionizing Radia-tion-Induced Gene Expression in D.radiodurans R1

John R. Battista1 ([email protected]), Ashlee M.Earl1, Heather A. Howell2, and Scott N. Peterson2

1Department of Biological Sciences, Louisiana StateUniversity and A & M College, Baton Rouge, LA 70803;and 2The Institute for Genomic Research, Rockville, MD20850

In an effort to determine which genes are LexAregulated in D. radiodurans a lexA defective strainof D. radiodurans R1, GY10912, was evaluatedusing microarray analysis. Under normal,unstressed conditions over 100 transcripts weremore abundant in GY10912 than in the wild-typestrain suggesting that a large fraction (3%) of D.radiodurans genome is regulated by this repressor.However, only 22 genes from the LexA con-trolled gene set overlapped with the 71 genes pre-viously determined to be stress induced followingexposure to ionizing radiation in wild type R1.There is absolutely no overlap between the �classi-cal� SOS regulon of E. coli and LexA controlledgenes in D. radiodurans. When a 3,000Gy dose ofionizing radiation is administered to GY10912only 7 additional genes are induced includingrecA. Since a LexA defect does not render D.radiodurans sensitive to ionizing radiation, it isassumed that the cell only needs to up-regulatethese 29 loci: the 22 LexA dependent loci and the7 LexA independent loci. The LexA independentinduction is in part controlled by the IrrE protein(DR0167). IrrE is a positive effector that wheninactivated results in loss of ionizing radiation

resistance. Loss of IrrE prevents the expression of13 loci that are normally induced in response toionizing radiation. All 13 of these loci overlapwith those genes thought to confer resistance toGY10912.

B31Analysis of Proteins Encoded on the S.oneidensis MR-1 Chromosome, TheirMetabolic Associations, and ParalogousRelationships

Margrethe H. Serres*, Maria C. Murray, andMonica Riley

*Presenter

Marine Biological Laboratory, Woods Hole, MA 02543

Proteins encoded by the Shewanella oneidensisMR-1 chromosome have been analyzed for theirsequence similarity to proteins encoded in 49completely sequenced microbial genomes. It isour goal to elucidate the metabolic pathways in S.oneidensis in order better to understand the meta-bolic capabilities of this versatile organism. Thegenome is also being analyzed for paralogous orsequence similar proteins within the chromosome.Such protein families have been found useful inassigning putative functions to gene products andhave provided insight into the evolution of thegenome and the functions encoded within.

Sequence similarity searches were done usingDarwin and an alignment requirement of at least83 amino acids. Sequence matches were found for82% of the encoded proteins at a similarity of<=250 PAM units. Vibrio cholerae, Pseudomonasaeruginosa, and Yersinia pestis showed the highestpercentage of best hits at 23%, 10%, and 7%,respectively.

To identify metabolic pathways we conferred withthe GenProtEC and EcoCyc/BioCyc databasesand with published literature. We used theMultiFun classification system consisting of thefollowing classes; metabolism, information trans-fer, regulation, transport, cell processes, cell struc-ture, location, extra-chromosomal origin, DNAsites and cryptic genes. Currently more than 3800cell function assignments have been made to 1560S. oneidensis proteins. Among these, 740 proteinswere assigned to metabolism, including 193 pro-teins classified as having a role in respiration. An

Microbial Genomics

60 Genomes to Life I

Page 71: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

overview of pathways, missing steps and similar-ity to other genomes will be presented.

In order to determine paralogous protein families,proteins encoded by fused or composite geneshave to be identified and their sequences sepa-rated into entities (modules) of independent evo-lutionary origin. The pair-wise alignments fromthe Darwin analysis described above have beenused to discern modules. Currently 170 (4%) ofthe chromosomally encoded S. oneidensis proteinshave been identified as composite. This number isslightly higher than for the E. coli genome. An ini-tial grouping of protein modules is in progressand will be presented.

B33Finishing and Analysis of the Nostocpunctiforme Genome

S. Malfatti1([email protected]), L. Vergez1, N.Doggett2, J. Longmire2, R. Atlas3, J. Elhai4, J.Meeks5, and P. Chain1

1Biology and Biotechnology Research Program, LawrenceLivermore National Laboratory, Livermore, CA; 2BioscienceDivision, Los Alamos National Laboratory, Los Alamos,NM; 3Department of Biology, University of Louisville,Louisville, KY; 4Department of Biology, University ofMissouri, St. Louis, MO; and 5Section of Microbiology,University of California, Davis, CA

In an effort to explore the role of diverse microor-ganisms in global carbon sequestration, theDOE�s JGI has drafted the genomic sequence ofseveral microorganisms that play unique roles insoil and ocean ecosystems. Elucidation of theirgenetic content will allow de novo identificationof metabolic pathways relevant to understandingthe physiological and genetic controls of photo-synthesis, nitrogen fixation and general carboncycling. A major contributor to the sequestrationof CO2 in organic compounds, Nostoc punctiformeis a unique nitrogen-fixing cyanobacterium thatcan differentiate into three types of specializedcells (N2-fixing, motile and spore-like), is capableof forming symbiotic associations with fungi andplants and has the ability to grow rapidly undercompletely dark heterotrophic conditions. Wehave undertook the task of finishing and analyz-ing the genome of N. punctiforme strain ATCC29133, which is likely to yield novel informationon global regulation of multiple developmentalpathways, symbiotic association, and phylogeneticrelation to the other cyanobacteria including the

now complete Prochlorococcus and Synechococcusstrains.

The genome of N. punctiforme is very large for aprokaryote, nearly 10Mb, which makes this anattractive genome to complete. There are cur-rently near 100 prokaryotic genomes that havebeen sequenced and closed, yet the average fin-ished genome size is only ~3Mb and fewer than10 have a genome size larger than 6Mb. Thus,there is quite possibly a bias in our prokaryoticgenome knowledge toward smaller genomes andthere may be some distinctive features in largerprokaryotic genomes that require finer resolution.

The genome was drafted by the JGI to 10-foldcoverage due to the large number of contigsobserved at 5-, 6- and 7-fold; however, the num-ber of contigs greater than 2kb with >10 readsremained essentially the same, near 230 contigs.Additional rounds of automated primer design forfinishing purposes has not significantly reducedthis number. The reason for the difficulties in fin-ishing are likely due to the prevalence of repeatedelements within the genome. This is supported bythe high coverage (25- to 50-fold) of entirecontigs of large size (up to 110 kb). The identifi-cation and resolution of these repeated elements isbeing performed with the use of a fosmid scaffoldalong with a suite of computational tools. Anoverview of the progress in finishing N.punctiforme will be presented.

This work was performed under the auspices of the U. S.Department of Energy by the University of California, LawrenceLivermore National Laboratory under Contract No.W-7405-Eng-48.

B35In Search of Diversity: UnderstandingHow Post-Genomic Diversity is Intro-duced to the Proteome

Barry Moore, Chad Nelson, Norma Wills, JohnAtkins, and Raymond Gesteland([email protected])

Department of Human Genetics, University of Utah, SaltLake City, UT

Analysis of an organism�s genome is the primarytool for understanding the diversity of proteinproducts present in an organism. However, asproteomics focuses more work on the proteinsthemselves, it is apparent that the diversity of the

Microbial Genomics

Genomes to Life I 61

Page 72: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

proteome goes well beyond that indicted in thegenome. Much of the energy and resources inproteome projects have been invested in variouspeptide based mapping strategies�principally by2-D gel electrophoresis followed by MALDI-MSanalysis. One shortcoming of peptide approaches,however, is that they do not allow for determina-tion of the full-length protein mass. This meansthat while it can be determined that a particularregion of the genome (the ORF) is translated intoprotein, no direct evidence is provided about thefull-length protein product, and there is no indica-tion of the diversity of different proteins productstranslated from that ORF.

A number of studies have indicated that regionsof sequenced genomes not currently identified asORFs are translated, or are likely to be translated.Furthermore, numerous mechanisms are knownby which diversity is introduced to an organism�sproteome post-genomically, and there are a widevariety of bioinformatics algorithms either avail-able or in development that attempt to predictthese events from genome sequence. Ultimately,we must go to the proteins themselves to fullyunderstand when and where post genomic diver-sity is arising and to refine our ability to predictfrom genome analyses the diversity of proteinsderived from an organism�s genome. We aredeveloping a dual affinity tagging system in theyeast Saccharomyces cerevisiae that allows us tobuild fusions to regions of the genome wherenovel ORFs are predicted, or where non-standardtranslational events are expected to occur. Fusionproteins are expressed in yeast and purified byaffinity chromatography to produce a proteinpure enough for analysis by electrospray massspectrometry. Analysis of a highly accurate fulllength mass in combination with tryptic peptidefragments and MS/MS analysis of those fragmentsallows a thorough understanding of the proteinsfull length covalent state, and the diversity ofproducts produced from a particular ORF.

B37The Microbial Proteome Project: ADatabase of Microbial Protein Expres-sion in the Context of Genome Analy-sis

Carol S. Giometti1 ([email protected]), GyorgyBabnigg1, Sandra L. Tollaksen1, Tripti Khare1,Derek R. Lovley2, James K. Fredrickson3, KennethH. Nealson4, Claudia I. Reich5, Gary J. Olsen5,Michael W. W. Adams6, and John R. Yates III7

1Argonne National Laboratory; 2University ofMassachusetts; 3Pacific Northwest National Laboratory;4University of Southern California; 5University of Illinois atUrbana; 6University of Georgia; and 7The Scripps ResearchInstitute

Although complete genome sequences can beused to predict the proteins expressed by a cell,such predictions do not provide an accurateassessment of the relative abundance of proteinsunder different environmental conditions. Inaddition, genome sequences do not define thesubcellular location, biomolecular and cofactorinteractions, or covalent modifications of proteinsthat are critical to their function. Analysis of theprotein components actually produced by cells(i.e., the proteome) in the context of genomesequence is, therefore, essential to understandingthe regulation of protein expression. As the num-ber of complete microbial genome sequencesincreases, vast amounts of genome and proteomeinformation are being generated.

In parallel with the proteome analysis of numer-ous microbial systems, researchers at ArgonneNational Laboratory are developing methods formanaging and interfacing the diverse data typesgenerated by both genome and proteome studiesas part of Argonne�s Microbial Proteome Project.The goal is to provide users with a highly interac-tive database that contains proteome informationin the context of genome sequence in formats thatare conducive to data interrogations that will pro-vide answers to biological questions. To achievethat goal, three World Wide Web-based databasesare currently being developed and maintained:Proteomes2, ProteomeWeb, and GelBank.

The Proteomes2 database (proteomes2.bio.anl.gov) serves as a Laboratory Information Manage-ment System for the management of sample dataand related two-dimensional gel electrophoresis(2DE) patterns. This password-protected site pro-vides DOE project collaborators with access to

Microbial Genomics

62 Genomes to Life I

Page 73: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

data access from multiple sites through theInternet. The database currently contains theexperimental details for approximately 1000 sam-ples from seven different microbes (Shewanellaoneidensis, Geobacter sulfurreducens, Prochlorococcusmarinus, Methanococcus jannaschii, Pyrococcusfuriosus, Rhodopseudomonas palustris, andDeinococcus radiodurans) and links each samplewith multiple protein patterns. Over 4000 proteinpattern images are currently accessible.

ProteomeWeb (http://ProteomeWeb.anl.gov) isan interactive public site that provides the identifi-cation of expressed microbial proteins, links togenome sequence information, tools for miningthe proteome data, and links to metabolic path-ways. Data from proteome analysis experimentsare included in the ProteomeWeb database whengenome sequences are deposited in GenBank.Currently, the results from experiments designedto alter protein expression inM. jannaschii areaccessible on this site, and results from S.oneidensis experiments are in the process of beingincorporated.

GelBank currently includes the complete genomesequences of approximately 90 microbes and isdesigned to allow queries of proteome informa-tion. The database is currently populated withprotein expression patterns from the ArgonneMicrobial Proteomics studies and will accept datainput from outside users interested in sharing andcomparing proteome experimental results.

Oracle9i RDBMS � the common foundation forall three Argonne proteomics databases � allowsthe integration of genome sequences, sampledescriptors, protein expression data (e.g., 2DEimages and numerical data), peptide masses, andannotation. The database management architec-ture being developed currently addresses proteinexpression data from 2DE analysis of protein mix-tures, but it is being designed to have the flexibil-ity to accommodate mass spectrometry data aswell.

This research is funded by the United States Department ofEnergy, Office of Biological and Environmental Research, underContract No. W-31-109-ENG-38.

B39Analysis of the Shewanella oneidensisProteome in Cells Grown in Continu-ous Culture

Carol S. Giometti1 ([email protected]), Mary S.

Lipton2 ([email protected]), Gyorgy Babnigg1,Sandra L. Tollaksen1, Tripti Khare1, James K.Fredrickson2, Richard D. Smith2, Yuri A. Gorby2,and John R. Yates III3

1Argonne National Laboratory; 2Pacific Northwest NationalLaboratory; and 3The Scripps Institute

We are using two complementary methods toidentify and quantify the proteins expressed byShewanella oneidensis MR-1 grown under differentconditions in continuous culture. Cells are grownin chemostats under aerobic, O2-limited, or anaer-obic conditions with fumarate provided as theelectron acceptor. Cells are harvested and aliquotsof the same samples are analyzed for protein byusing (1) two-dimensional gel electrophoresis(2DE) coupled with peptide mass analysis and (2)capillary liquid chromatography separations cou-pled with Fourier transform ion cyclotron reso-nance (FTICR) mass spectrometry. The 2DEpatterns readily reveal statistically significant dif-ferences in protein abundance and in proteinpost-translational modifications that result fromdifferent growth conditions. The proteins that dif-fer significantly in expression are then identifiedon the basis of their tryptic peptide masses. Inparallel with the 2DE analysis, we are using accu-rate mass tags (AMTs) to identify all of the S.oneidensis proteins expressed under specific growthconditions and to detect quantitative differencesby using stable isotope labeling methods. Byusing these two different protein separation anddetection methods, we are able to more compre-hensively analyze the proteome than by usingeither method alone. Specifically, 2DE provides arapid turnaround �snap shot� of the major proteindifferences (including post-translational modifica-tions) under different growth conditions, andAMTs provide a comprehensive inventory of theexpressed proteins in each sample.

This work is funded by the U.S. Department of Energy, Officeof Biological and Environmental Research, under contract No.W31-109-ENG-38 (Argonne National Laboratory) and contractNo. DE-AC06-76RLO1830 (Pacific Northwest National Labo-ratory).

Microbial Genomics

Genomes to Life I 63

Page 74: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

B41The Molecular Basis for Metabolic andEnergetic Diversity

Timothy Donohue1 ([email protected]),Jeremy Edwards2, Mark Gomelsky3, JonathanHosler4, Samuel Kaplan5, and William Margolin5

1Bacteriology Department, University ofWisconsin-Madison; 2Chemical Engineering Department,University of Delaware; 3Department of Molecular Biology,University of Wyoming; 4Department of Biochemistry,University of Mississippi Medical Center; and 5Departmentof Microbiology & Medical Genetics, University of TexasMedical School at Houston

Our long-term goal is to engineer microbial cellswith enhanced metabolic capabilities. As a firststep, this team of scientists and engineers seeks toacquire a thorough understanding of energy-gen-erating processes and genetic regulatory networksof the photosynthetic bacterium, Rhodobactersphaeroides. The ability to capitalize on the meta-bolic activities of this versatile bacterium wasincreased by the completion of the R. sphaeroidesgenome sequence at the DOE-supported JointGenome Institute. The R. sphaeroides Genomes toLife Consortium is deciphering importantenergy-generating activities of this bacterium andstudying the assembly and operation of energygenerating machines. The long term goals ofthese efforts are to acquire the informationneeded to design microbial machines that degradetoxic compounds, remove greenhouse gases, orsynthesize biodegradable polymers with increasedefficiency. At the February, 2003 workshop, wewill provide a progress report on our analysis ofthe metabolic capabilities of this facultative micro-organism, particularly on the identification ofproteins and regulators that are central to growthvia respiration and the utilization of solar energyby photosynthesis.

In one line of experiments, we have begun tocharacterize the multitude of aerobic respiratoryenzymes that this bacterium uses to generateenergy in the presence of O2. The goal of theseexperiments is to engineer strains that can moreefficiently extract energy from nutrients in thepresence of O2.

We have begun to visualize the assembly of thephotosynthetic apparatus that this bacterium usesto harvest solar energy. When this analysis is com-bined with modeling of solar energy utilization bythis well-studied microbe, we hope to identify

principles that will allow us to engineer microbeswith increased ability to generate energy fromlight.

The feasibility of generating strains with increasedcapacity for using solar radiation is enhanced byour identification of previously unrecognized pig-ment-binding proteins in the photosyntheticapparatus of R. sphaeroides. Efforts are currentlyunderway to understand the contribution of thesenew pigment-binding proteins to the utilizationof light energy that is critical for growth ofphotosynthetic bacteria and plants.

Simultaneously we have initiated a program toidentify new regulators of the solar lifestyle of thisbacterium and to use gene chips to identify addi-tional proteins that could be critical to the utiliza-tion of light energy. Both of these approacheshave provided new candidates that are currentlybeing analyzed for their role as regulators or con-tributors to the utilization of solar energy.

We hope to illustrate why this cross-disciplinary,systems approach to the analysis of energy genera-tion by this facultative bacterium can provide newinsights into fundamental aspects of energy gener-ation and the utilization of solar energy by thisand other photosynthetic organisms.

B43Integrative Studies of Carbon Genera-tion and Utilization in theCyanobacterium Synechocystis sp. PCC6803

Wim Vermaas1 ([email protected]), RobertRoberson1, Julian Whitelegge2, Kym Faull2, RossOverbeek3, and Svetlana Gerdes3

1Arizona State University; 2University of California LosAngeles; and 3Integrated Genomics, Inc.

The research focuses on photosynthetic and respi-ratory electron transport and on carbon utilizationin the cyanobacterium Synechocystis sp. PCC 6803.The research compares wild type with targetedmutants lack the photosystems, the terminal oxi-dases, succinate dehydrogenase, and/or othercomplexes that are important for photosynthesisand respiration.

Microbial Genomics

64 Genomes to Life I

Page 75: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Physiological Analysis

1. Carbon utilization: Mutants lacking terminaloxidases were found to accumulate large inclu-sions that resembled poly-â-hydroxybutyrate bod-ies. As these inclusions were by far most prevalentin mutants lacking terminal oxidases, they mostlikely result from fermentation reactions. The verysignificant accumulation ofpolyhydroxyalkanoates (up to about 25% of thetotal cell volume) in Synechocystis mutants lackingthe terminal oxidases suggests a potential applica-tion of this strain in light-driven production of�bioplastic� materials.

2. Membrane biogenesis: One major question thatremains in chloroplasts and cyanobacteria is howthylakoids are formed. A gene that apparently isinvolved with thylakoid biogenesis has been inter-rupted at various positions, and the interruptionthat has segregated (with the site of insertionclose to the end of the gene) showed a normalultrastructural phenotype. We are also inactivatinggenes involved with cell division, in the hope ofcreating a larger cyanobacterial cell that is moreeasily analyzed by light microscopy techniques.

3. Toward a system suitable for crystal structure anal-ysis: Synechocystis has proven to be an excellentmodel system from a standpoint of moleculargenetics, physiology, etc. However, for elucidationof structures of membrane proteins (another goalthat should be pursued in microbial cell projects)thermophilic cyanobacteria, such asThermosynechococcus elongatus, for which a genomesequence is known, are very much preferable. Forfull utility of this system, mutants that possiblyimpair the photosystems should be used andtherefore the organism should be able to growphotoheterotrophically, which it currently isunable to do. Therefore, we have initiated randommutagenesis experiments to see whether aThermosynechococcus mutant can be obtained thatcan grow under conditions other thanphotoautotrophic ones.

Structural Analysis

Much progress has been made in ultrastructuralpreservation and documentation of cytoplasmicorganization in Synechocystis sp. PCC 6803.

1. Three-dimensional reconstruction: Using thick(0.2-0.3 µm) sections of Synechocystis, we havecollected data on high- and intermediate-voltagetransmission electron microscopes (0.2-1 MeV)to three-dimensionally image the intracellular

organization of Synechocystis sp. PCC 6803 cells.In this procedure, thick sections are incrementallytilted from -60o to +60o and serial tilt views elec-tronically captured at 1.5o intervals. We have pro-duced the first high-resolution three-dimensionalimages of a cyanobacterium, and are now in theprocess to identify physical relationships betweenthylakoid and cytoplasmic membranes, the fate ofmembranes upon cell division, etc. Clearly,tomography using thick sections is an extremelypowerful technique that is providing new insightsinto the ultrastructural organization of the cell.

2. Membrane organization: We observe �thylakoidcenters� in dividing cells. These structures werediscovered about two decades ago in anothercyanobacterium, and apparently have been forgot-ten since. Thylakoid centers are located at thepoint of apparent origin of thylakoid membranes(i.e., the point of thylakoid membrane conver-gence) near the cytoplasmic membrane. At thesespecialized regions, membranes either terminateor turn 180° in close proximity to the cytoplasmicmembrane. The thylakoid center apparentlyextends as a tube-like structure for at least 200nm. We are interested in exploring the role andcomposition of these thylakoid centers.

Proteomics

With the complete genome sequence available,comparative proteome analysis is underway withthe goal of providing data that will integrate withthe ultrastructural work and in mutants of photo-synthesis and respiration. By using sub-fraction-ation of soluble or membrane preparations it ispossible to preserve native protein complexes pro-viding functional insights. Soluble fractions areexamined by size-exclusion chromatography and2D-gel electrophoresis for evidence of thecarboxysome seen in EM work to be more abun-dant in the SDH-less mutant. Membrane prepara-tions are sub-fractionated by sucrose-gradientcentrifugation to separate thylakoids from cyto-plasmic membranes, and to search for prepara-tions that may be enriched in thylakoid centers, inorder to better understand the specialization ofeach membrane system and the traffickingbetween them. Membrane protein complexes arebeing separated by size-exclusion chromatographyunder non-denaturing conditions aftersolubilization of membranes with detergents asthe first dimension of a 2D chromatography sys-tem. A second dimension denaturing chromatog-raphy system incorporates intact protein massmeasurement (LC-MS+). Fractions collected dur-ing LC-MS+ are used for protein identification

Microbial Genomics

Genomes to Life I 65

Page 76: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

by fingerprinting and sequencing and are alsoavailable for further analysis by ultra-high resolu-tion Fourier transform mass spectrometry(FT-MS). The successful analysis of intactphotosystem I reaction-center polypeptides PsaBand PsaA of mass 81,167 and 82,876 Da, respec-tively, demonstrates that intact mass proteomicscan be applied to large integral membrane pro-teins with resolution sufficient to detect singlemethionine oxidations at this size.

Bioinformatics

Integrated Genomics began work on the GTLProject in October 2002; hence progress in thisarea has been somewhat limited (but not insignifi-cant) so far. The initial objectives are:

1. To support the wet lab characterization of criti-cal genes by making specific predictions thatwill be tested experimentally by our partners.

2. To produce a detailed metabolic reconstructionto support a more complete understanding andmodeling of this organism.

Integrated Genomics is coordinating closely withother members of the group to establish prioritiesrelating to prediction of gene function. During

the first month, we have made two predictions,one addressing the lycopene cyclase, and onerelated to a gene associated with phycobilisomes.The lycopene cyclase is by far the more interestingof these predictions, and other members of thegroup have begun experiments to confirm orreject our candidate gene. We have now begun atabulation of predictions.

The second broad goal is to develop a detailedmetabolic reconstruction for Synechocystis sp. PCC6803. In particular, two aspects of this effort arepursued: (1) development of a detailed graphi-cal/web-based interface to a reconstruction con-necting functional components to genes in theorganism, and (2) development of a preciselyencoded version of the reaction network in a formsuitable for supporting modeling efforts.

Altogether, the GTL project on Synechocystis sp.PCC 6803 provides integrated func-tional-genomics information regarding the struc-ture of the organism and the function ofphotosynthesis and respiration related processes,based on physiological, ultrastructural,proteomics, and bioinformatics experimentationon the wild type and targeted mutants.

Microbial Genomics

66 Genomes to Life I

Page 77: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Genomes to Life I 67

Technology Development

B45Comparative Optical Mapping: A NewApproach for Microbial ComparativeGenomics

Shiguo Zhou1 ([email protected]), Thomas S.Anantharaman2, Erika Kvikstad1, Andrew Kile1,Mike Bechner1, Wen Deng3, Jun Wei3, ValerieBurland3, Frederick R. Blattner3, Chris Mackenzie6,Timothy Donohue4, Samuel Kaplan6, and David C.Schwartz1,5 (dcschwartz @facstaff.wisc.edu)

1Laboratory for Molecular and Computational Genomics,2Animal Health and Biomedical Sciences, 3Laboratory ofGenetics, 4Department of Bacteriology, and 5Department ofChemistry, University of Wisconsin-Madison, Madison, WI53706; and 6Department of Microbiology and MolecularGenetics, University of Texas-Houston Medical School,6431 Fannin St., Houston, TX 77030

The recent plethora of sequenced genomes hasjust ushered in a new era of genetics-basedresearch. Although an impressive number of spe-cies have been, or are planned to be sequenced,the full value of such efforts will be fully accruedwhen patterns of variation can be discerned andannotated for many strains or isolates within agiven species. As such, bacteria are an ideal placeto start investigations aimed at the discernment ofgenome rearrangements, chromosome deletions,and horizontal transfer of foreign DNA, sincethese events help drive bacterial evolution. Forexample, genome remodeling events may causeirreversible gene loss, or add novel functionalitiesto an organism. Unfortunately, currentapproaches do not adequately identify and charac-terize such large-scale genomic rearrangements.The Optical Mapping System, developed in ourlaboratory creates high resolution maps of entiregenomes, using DNA directly extracted fromcells�this approach obviates the need for librar-ies, PCR, and probe technologies. The systemuses a complex blend of single molecule technolo-gies to enable high throughput and the construc-tion of reliable maps. This capability has beenproven by the construction and sequence compar-ison of 13 bacterial optical maps, and 3 parasites.Recent advances in both throughput and resolu-tion of the Optical Mapping System has enabledgenomic comparisons amongst different strains of

the same species or closely related species, allow-ing for the pinpoint discernment of insertions,deletions and rearrangements. Comparisons ofoptical maps vs. in silico maps, and in silico vs. insilico maps constructed for two strains of Yersiniapestis (CO-92 biovar Orientalis and KIM), E. coliK12 and Shigella flexneri 2a, two strains of S.flexneri (2a and Y), and two strains of Rhodobactersphaeroides (2.4.1 and ATCC 17029) haverevealed regions of homology, insertion sites and apanoply of rearrangements. These results portendthe wide use of Optical Mapping to uniquely pro-vide genome structural details for a large numberof strains, isolates or even closely related species,in ways that would complement direct sequenceanalysis.

B47Optical Mapping of Multiple MicrobialGenomes

Shiguo Zhou1 ([email protected]), MichaelBechner1, Erika Kvikstad1, Andrew Kile1, SusanReslewic1, Aaron Anderson1, Rod Runnheim1,Jessica Severin1, Dan Forrest1, Chris Churas1,Casey Lamers1, Samuel Kaplan4, Chris Mackenzie4,Timothy J. Donohue2, and David C. Schwartz1, 3

(dcschwartz @facstaff.wisc.edu)

1Laboratory for Molecular and Computational Genomics,2Department of Bacteriology, and 3Department of Chemistry,University of Wisconsin-Madison, Madison, WI 53706; and4Department of Microbiology and Molecular Genetics,University of Texas-Houston Medical School, 6431 FanninSt., Houston, TX 77030

Our laboratory has developed Optical Mapping,which is a proven system for the construction ofordered restriction maps from individual DNAmolecules directly extracted from cells. Such mapshave utility in large-scale sequencing efforts byproviding scaffolds for assembly, and as an inde-pendent means for validation, since entiregenomes are mapped without the use of clones orPCR amplicons. Given the major increase in thethroughput of Optical Mapping, we have usedthis system to map multiple microbial genomes,

Page 78: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

which included; Thalassiosira pseudonanna (dia-tom), Enterococcus faecium, Pseudomonasfluorescens, Rhodobacter sphaeroides andRhodospirillum rubrum. These efforts were fundedby DOE to help expedite and validate parallelsequencing projects. Our optical mapping datashowed that T. pseudonanna has a haploid genomesize of 33.8 Mb, possessing 22 chromosomesranging from 658 kb to 3,322 kb, while othergenomes had the following features: E. faecium(2.8 Mb), P. fluorescens (6.9 Mb, which is 1.4 Mblarger than expected size of 5.5 Mb), R. rubrum(4.2 Mb instead of the expected 3.4 Mb), and R.sphaeroides (4.2 Mb). Overall the map resolutionvaried from a low resolution map of averagerestriction fragment size of 50 kb for R. rubrum(XbaI), to a high resolution map with an averagefragment size of 6.0 kb for the R. sphaeroidesHindIII map. Comparison of optical maps of R.sphaeroides with the nascent sequence contigs ofthe genome sequencing project detailed the utilityof optical maps in expediting sequencing projectsby the identification of misassemblies withinnascent sequence contigs, gap characteristics andthe orientation of sequence contigs.

B49Identification of ATP Binding Proteinswithin Sequenced Bacterial GenomesUtilizing Phage Display Technology

Suneeta Mandava, Lee Makowski, and Diane J.Rodi ([email protected])

Combinatorial Biology Unit, Biosciences Division, ArgonneNational Laboratory, Argonne, IL 60439

This project is applying a novel approach togenome-wide identification of small moleculebinding proteins. Our ability to rapidly sequenceprokaryotic genomes is generating an unprece-dented amount of DNA sequence data. In spite ofthe large number of functional genomics toolscurrently available, typically about 40% of pre-dicted ORFs remain unidentified in terms offunction. Our results demonstrate that the simi-larity between the sequence of a protein and thesequences of phage-displayed peptides affin-ity-selected against small molecules can be predic-tive for that protein binding to the smallmolecule. In this project, we utilize tagged deriva-tives of the common metabolite ATP fixed to asolid surface as bait to capture peptide-bearingphage particles from solution. Population analysis

of hundreds of these captured peptides demon-strates that subpopulations represent portions ofknown ATP-binding motifs.

To test our ability to predict ATP binding sitelocations within a protein the similarity betweenthe sequences of our affinity selected ATP pep-tides and the sequences of known ATP bindingproteins from the PDB were calculated. Shownhere is an example of this technique applied to theATP-binding protein phosphoenolpyruvatecarboxykinase: (Renderings were carried outusing RASMOL in which the segments of maxi-mum similarity are designated in red, and blue thelowest similarity.)

To determine whether or not this method can beused as a global approach to predicting whichproteins bind ATP within a sequenced genome, aswell as to identify the position of those ATP bind-ing sites, we have developed software which com-pares the ATP-binding peptide pool to entiregenome protein sequences. Comparison of theATP-binding peptide populations for affinity totwo sets of data, ATP-binding proteins in the Pro-tein Data Bank and the entire E. coli K12proteome confirmed that annotation of openreading frames for small molecule binding is pos-sible using this method. Successful identificationof residues within 7 Å of the ATP ligand withinPDB structures is accomplished in 75% of pro-teins tested. Alignment of all the proteins in theE. coli proteome by peptide-similarity score segre-gates ATP binders to the top of the list whencompared to control scores obtained with alter-nate ligand-selected pools of peptides.

The best characterized of the conserved sequencemotifs that bind ATP or GTP is a glycine-richregion called a P-loop or Walker A box, which

Technology Development

68 Genomes to Life I

Page 79: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

typically forms a flexible loop between abeta-strand and an alpha-helix. In general, thisloop has been shown to interact with one of thephosphate groups of the nucleotide within crystalstructures. However, in two recently publishedP-loop-containing crystal structures, the P-loopmotif is located far from the ATP-binding pocketof the protein. In these two cases maximum simi-larity with our ATP-binding peptide populationresides within the actual ATP-binding site asopposed to the P-loop site. This discrepancy maybe the result of crystal contacts that alter ATPbinding, or a confirmation of the binding funneltheory of Nussinov, which describes the process ofligand binding as a series of short-livedconformational ensembles along a decreasingenergy landscape. This interpretation of theseobservations predicts that although a P-loop motifis predictive of ATP binding, and is the initial siteof an early-lived molecular handshake with ATP,the ATP ligand may not remain in the proximityof the P-loop in the final global free energy mini-mum conformation. This scenario similarly pre-dicts that there may exist alternate primarysequence motifs predictive for ligand bindingwithin the same protein sequence, such as themotif identified with our ATP-binding peptidepool.

Distribution of the software, analysis methodsand data generated from this ongoing project isbeing accomplished by the construction of anORACLE web-based database named RELIC(for REceptor LIgand Contacts). The architectureof the database is currently in place, and is sched-uled to be made accessible to the public withinthe next three months, subsequent to optimiza-tion of software GUIs and a web-based usersmanual.

B51Development of Vectors for DetectingProtein-Protein Interactions in Bacte-ria

Peter Agron and Gary Andersen([email protected])

Biology and Biotechnology Research Program, LawrenceLivermore National Laboratory, Livermore, CA 94551

We are interested in developing better approachesfor mapping bacterial protein-protein interac-tions, particularly in Caulobacter crescentus, amodel system for studying cellular differentiationand the cell cycle. Because of the advantages forhigh-throughput screening, our focus has been onusing Escherichia coli as a host for two-hybridassays. Initially, the BacterioMatch system fromStratagene was tested with 11 pairs of Caulobactergenes known to encode interacting proteins.Interactions were detected with several pairs, butthe results were not found to be easily reproduc-ible. Also, no interaction was observed with FtsZ,which is known to dimerize with high affinity.Therefore, we have constructed new vectors basedon protein-fragment complementation withmouse dihydrofolate reductase (DHFR). In thisassay, fusions to two portions of DHFR willreconstitute enzyme activity if tethered by pro-tein-protein interactions, thus conferringtrimethoprim resistance to the host astrimethoprim specifically targets the endogenousDHFR. The vectors have compatible repliconswith different markers, thus allowing effectivescreening in E. coli. We are initially testing this

Technology Development

Genomes to Life I 69

Page 80: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

system with bacteriophage ë cI and CaulobacterftsZ, two genes encoding proteins that dimerizewith high affinity. Using this system, we are alsoplacing the interacting domains of additional C.crescentus gene pairs in either the 5� or 3� orienta-tion to each of the two portions of DHFR. Alibrary of randomly sheared C. crescentus DNAfragments is being placed in either orientation tothe carboxy-terminal DHFR protein fragments toscreen for known interactions. Based on theseresults we will test up to 200 additional genes ofinterest for protein-protein interactions with therandom-fragment C. crescentus library.

B53Development and Use of Microarray-Based Integrated Genomic Technol-ogies for Functional Analysis of Envi-ronmentally ImportantMicroorganisms

Jizhong Zhou1 ([email protected]), Liyou Wu1,Xiudan Liu1, Tingfen Yan1, Yongqing Liu1, SteveBrown1, Matthew W. Fields1, Dorothea K.Thompson1, Dong Xu1, Joel Klappenbach2, JamesM. Tiedje2, Caroline Harwood3, Daniel Arp4, andMichael Daly5

1Oak Ridge National Laboratory; 2Michigan StateUniversity; 3University of Iowa; 4Oregon State University;and 5Uniformed Services University of the Health Sciences

Microarrays constitute a powerful genomics tech-nology for assessing whole-genome expressionlevels and defining regulatory networks. Underthe support of the DOE Microbial Genome Pro-gram, whole genome microarrays containing indi-vidual open reading frames were constructed forShewanella oneidensis MR-1 (~4.9 Mb),Deinococcus radiodurans (3.2 Mb),Rhodopseudomonas palustris (4.8 Mb), andNitrosomonas europaea (2.7 Mb) at Oak RidgeNational Laboratory. DNA fragments having lessthan 75% similarity to other sequences in thegenome were selected as specific probes using ourautomatic primer design program,PRIMERGEN. The majority of the probes havethe size of less than 1 kb. To obtain sufficientPCR products for array fabrication, genes wereamplified 8 or 16 times in a total reaction volumeof 100 ul. The amplified products were thenpooled and purified using automated procedures.In total, approximately 4700, 3046, 4508, and2354 genes were amplified from the S. oneidensis,

D. radiodurans, R. palustris, and N. europaeagenomes, respectively. Additional sets of primerswere designed for genes that did not giveexpected amplification products or that gavelow-quality amplification. A 50-mer specificoligonucleotide was designed and synthesized forgenes that did not yield desired PCR productsafter two attempts with PCR primers. Thegenome coverage for all four bacteria ranged from95 to 99%. Evaluation of microarray quality bydirect scanning, PicoGreen staining andmicroarray hybridization indicated that themicroarray printing quality was very good interms of spot morphology, intensity and unifor-mity, and the constructed microarrays have beensent to many collaborators at different institu-tions.

To evaluate the performance of 50-meroligonucleotide arrays for gene expression, 96genes from S. oneidensis MR-1 were randomlyselected. Microarrays containing various lengthsof oligonucleotides (30-70 nt) and PCR productsfrom the same set of genes were constructed. Pre-liminary hybridization results indicated that thesensitivity of oligonucleotide probes is signifi-cantly lower than that of PCR products. Furtherin-depth comparisons are ongoing.

Genetic mutants are important to functionalgenomics analysis. However, mutant generation isvery time-consuming and labor-intensive. Thus, asimple, high-throughput single-strandoligonucleotide mutagenesis approach has beenevaluated in S. oneidensis. A new genetic vectorcontaining appropriate sets of genes was con-structed. Our preliminary results indicated thatthe developed genetic vector can replicate well inS. oneidensis and confer desired properties. Furtherwork includes the introduction of single-strandoligonucleotides for targeted gene deletion. Also,a protocol for efficient transformation of MR-1cells by electroporation was developed, which is acritical step for developing a high-throughput sin-gle-stranded oligonucleotide-based genetic sys-tem.

Technology Development

70 Genomes to Life I

Page 81: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

B55Electron Tomography of Whole Bacte-rial Cells

Ken Downing ([email protected])

Lawrence Berkeley National Laboratory

In our initial work to develop electron tomogra-phy of intact cells and explore its limits of applica-bility, we have established culture and preparationconditions for a number of small microbes thatmay be potential targets for this work. Theseinclude Magnetospirillum magnetotacticum,Caulobacter crescentis and Mesoplasma florum. Wehave shown that we can record 2-D projectionimages by electron microscopy of each of these infrozen-hydrated preparations. We thus retain thenative state with no stain or other contrastenhancements, but can see a wealth of internalstructures. The mesoplasma is of particular inter-est since it is among the smallest and simplest liv-ing organisms, while these bacteria are sufficientlythin that we can obtain good data with an elec-tron beam energy of 300-400 kV. We have col-lected several sets of preliminary tomographicdata using facilities at the Max Planck Institute inMartinsried, Germany. 3-D reconstructions com-puted from these data are far more informativethan the projection images. As expected, the 3-Dmaps show textures, representing distribution ofproteins and/or nucleic acids, that vary within theorganism, as well as some interesting and unex-pected internal membrane structures. A numberof steps need to be taken before we can begin torelate the densities seen in these reconstructions toindividual protein complexes, but the preliminary

data does suggest that this will indeed be feasible.Our own electron microscope, which will be espe-cially well suited for this type of tomography, ispresently being installed. Once it is in operationwe will be able to further optimize our datarecording protocols, including implementation ofdual-axis tomograms, to improve the resolutionand interpretability of the reconstructions.

B57Single Cell Proteomics—D. radiodurans

Norman J. Dovichi([email protected])

Department of Chemistry, University of Washington, Seattle,WA 98195-1700

We are developing technology to monitor changesprotein expression in single tetrads of D.radiodurans following exposure to ionizing radia-tion. We hypothesize that exposure to ionizingradiation will create a distribution in the amountof genomic damage and that protein expressionwill reflect the extent of radiation damage.

To test these hypotheses, we will develop the fol-lowing technologies:• Fluorescent markers for radiation exposure

• DNA/rRNA determination of each cell in a D.radiodurans tetrad

• Two-dimensional capillary electrophoresis analysisof the protein content of a single tetrad

Technology Development

Genomes to Life I 71

Downing— Fig. 2. Section from a tomographic reconstruction of a

cell undergoing division, with flagellum beginning to bud from left

end. Patches of the periodic surface layer protein and internal

membrane structures are visible in this section.

Downing— Fig. 1. Electron micrograph of frozen-hydrated

Caulobacter crescentis. The sample was rapidly frozen, with no

stain or other contrast enhancement, preserving the native

structure of the cell components.

Page 82: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

• Ultrasensitive laser-induced fluorescencedetection of proteins separated by capillaryelectrophoresis

These technologies will be combined to determineprotein expression in single tetrads of D.radiodurans, the extent of DNA damage followingexposure to Cs-137 radiation, and the amount ofchromosomal and rRNA per cell. This technologywill be a powerful tool for functional analysis ofthe microbial proteome and its response to ioniz-ing radiation.

We have generated a number of fully automatedtwo-dimensional capillary electrophoresis separa-tions of proteins extracted from D. radiodurans.The figure below presents an example, in whichthe proteins from D. radiodurans are first sub-jected to capillary SDS-DALT separation, whichis the capillary version of SDS-PAGE usingreplaceable polymers. Like SDS-PAGE,SDS-DALT separates proteins based on theirmolecular weight, with low molecular weight pro-teins migrating first from the capillary. Fractionsare successively transferred to a second capillary,where proteins are separated in a sub-micellarelectrophoresis buffer. Components are detectedwith an ultrasensitive laser-induced fluorescencedetector at the exit of that capillary. Over 150fractions are successively transferred from the firstcapillary to the second to generate a comprehen-sive analysis of the protein content of this bacte-rium. Data are stored in a computer and manipu-lated to form the pseudo-silver stain image offigure 1. We estimate that there are between 200and 300 components resolved in this separation.

B59Genomes to Proteomes to Life: Appli-cation of New Technologies for Com-prehensive, Quantitative and HighThroughput Microbial Proteomics

Richard D. Smith ([email protected]), James K.Fredrickson, Mary S. Lipton, David Camp, GordonA. Anderson, Ljiljana Pasa-Tolic, Ronald J. Moore,Margie F. Romine, Yufeng Shen, Yuri A. Gorby, andHarold R. Udseth

Biological Sciences Division, Pacific Northwest NationalLaboratory, Richland, WA 99352

Achieving the Genomes to Life (GtL) Programgoals will require obtaining a comprehensive sys-tems-level understanding of the components andfunctions that give a cell life. At present ourunderstanding of biological processes is substan-tially incomplete; e.g. we do not know with goodconfidence all the biomolecular players in even themost studied pathways and networks in microbialsystems. It is clear that many important signaltransduction proteins will be present only at verylow levels (~ hundreds of copies per cell) and willprovide extreme challenges for current character-ization methods. There is also a growing recogni-tion of the limitations associated with geneexpression (e.g. cDNA array) measurements.Increasing evidence indicates that the correlationbetween gene expression and protein abundancescan be low, and that the correlation between geneexpression and gene function is even lower. Thus,global protein characterization (proteomic) stud-ies actually complement gene expression measure-ments.

Successes in genome sequencing efforts haveincreased interest in proteomics and also providedan informatic foundation for high throughputmeasurements. As a result, a key capability envi-sioned for success of the GtL program is the abil-ity to broadly identify large numbers of proteinsand their modification states with high confi-dence, as well as to measure their abundances.The challenges associated with making usefulcomprehensive proteomic measurements includeidentifying and quantifying large sets of proteinsthat have relative abundances spanning more thansix orders of magnitude, that vary broadly inchemical and physical properties, that have tran-sient and low levels of modifications, and that aresubject to endogenous proteolytic processing.Additionally, proteomic measurements should not

Technology Development

72 Genomes to Life I

Page 83: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

be significantly biased against e.g. membrane,large or small proteins. A related need is the abil-ity to rapidly and reliably characterize proteininteractions with other biomolecules, particularlytheir multi-protein complexes. The combinedinformation on protein complexes and thechanges observed from global proteome measure-ments in response to a variety of perturbations isessential for the development of detailed compu-tational models for microbial systems and theeventual capability for predicting their responsee.g. to environmental changes and mutations.

We report on development and application of newtechnologies for global proteome measurementsthat are orders of magnitude more sensitive andfaster than existing technologies and that promiseto meet many of the needs of the GtL program.The approaches are based upon the combinationof nano-scale ultra-high pressure capillary liquidchromatography separations and high accuracymass measurements using Fourier transform ioncyclotron resonance (FTICR) mass spectrometry.Combined, these techniques enable the use ofhighly specific peptide �accurate mass and time�(AMT) tags. This new approach avoids thethroughput limitations associated with other massspectrometric technologies using tandem massspectrometry (MS/MS), and thus enables funda-mentally greater throughput and sensitivity forproteome measurements. Additional new develop-ments have also significantly extended thedynamic range of measurements to approximatelysix orders of magnitude and are now providingthe capability for proteomic studies from verysmall cell populations, and even single cells. A sig-nificant challenge for these studies is the immensequantities of data that must be managed andeffectively processed and analyzed in order to beuseful. Thus, a key component of our programinvolves the development of the informatic toolsnecessary to make the data more broadly availableand for extracting knowledge and new biologicalinsights from complex data sets.

The development of this new technology is pro-ceeding in concert with its applications to a num-ber of microbial systems (initially ShewanellaoneidensisMR1, Deinococcus radiodurans R1 andRhodopseudomonas palustris) in collaboration withleading experts on each organism. This research isproviding the first comprehensive information onthe nature of expressed proteins by these systemsand how they respond to mutations in the organ-ism or perturbations to its environment. Initialstudies applying these approaches have demon-strated the capability for automated high-confi-

dence protein identifications, broad and unbiasedproteome coverage, and the capability for exploit-ing stable-isotope (e.g. 15N) labeling methods toobtain high precision relative protein abundancemeasurements from microbial cultures. These ini-tial efforts have demonstrated the most completeprotein coverage yet obtained for a number ofmicroorganisms, and have begun revealing newbiological understandings.

Finally, it is projected that the AMT tag approachwill be applicable for making much faster andcomprehensive �metabolome� measurements, andcan also likely be extended to the characterizationof proteomes (and metabolomes) of much morecomplex microbial communities.

This research is supported by the Office of Biological and Envi-ronmental Research of the U.S. Department of Energy. PacificNorthwest National Laboratory is operated for the U.S. Depart-ment of Energy by Battelle Memorial Institute through ContractNo. DE-AC06-76RLO 1830.

B61New Developments in StatisticallyBased Methods for Peptide Identifica-tion via Tandem Mass Spectrometry

Kenneth D. Jarman, Kristin H. Jarman, AlejandroHeredia-Langner, and William R. Cannon([email protected])

Pacific Northwest National Laboratory, Richland, WA99352

High-throughput proteomic technologies seek tocharacterize the state of the proteome in a cellpopulation in much the same manner that DNAmicroarrays seek to characterize the state of geneexpression in a cell population. Characterizationof the proteins can be done using several differentmethods, one of which is to digest the proteinsfirst, typically using trypsin, into peptides whichare then analyzed using tandem mass spectrome-try (MS/MS). A typical procedure may involveextracting cellular proteins followed by trypticdigestion and then separating the peptides withliquid chromatography. The separated peptidesare then identified by MS/MS. Ideally, peptideswill subsequently be quantitated,post-translational modifications will be deter-mined and the information regarding the peptideswill be assembled into a picture of the proteomicstate of a cell population.

Technology Development

Genomes to Life I 73

Page 84: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Just as with DNA microarrays, quality assuranceof the high-throughput process is of paramountimportance in order for proteomics to be of valueto biologists. If peptides are initially identifiedpoorly, then this information and the informationon post-translational state and quantitation ofprotein expression is not of much value. For thisreason, there has been much work recently ondeveloping peptide identification methods forMS/MS spectra. This area of research has pro-ceeded on two fronts, the first of which seeks totake advantage of the wide availability of genomesequences. The database search methods try toidentify the peptide that resulted in the observedMS/MS spectrum by picking the best candidatefrom a list of peptides generated from the genomesequence. De novo methods on the other hand,seek to sequence and hence identify a peptide sim-ply from the observed MS/MS spectrum. Regard-less of which approach is used, it is essential tohave a method for scoring each peptide so thataccurate and reliable identifications can be made.

In this work, we present a statistically rigorousscoring algorithm for peptide identification thatcan be used alone, or incorporated into a databasesearch algorithm or a de novo peptide sequencingalgorithm. Our approach is based on a probabilis-tic model for the occurrence of spectral peaks cor-responding to key partial peptide ion types. Inparticular, the ion frequencies for the most fre-quently observed ion types are initially estimatedfrom a training dataset of known sequences.These frequencies are then used to construct a fin-gerprint for any candidate peptide of interest,

where the fingerprint consists of a list of spectralpeaks and their corresponding probabilities ofappearance. A spectrum is then scored against thecandidate fingerprints using a likelihood ratiobetween the hypothesis that the candidate peptideis not present and the hypothesis that the candi-date peptide is present. This likelihood ratio canbe used for peptide identification. In addition, aprobabilistic score that estimates the probabilityof a candidate peptide being present in the testsample can be constructed from the likelihoodratio. This approach is tested using a large datasetof over 2000 spectra for tryptic peptides of differ-ent lengths ranging from 6-mer to 30-mer aminoacids. Performance results indicate that thisapproach is accurate, and consistent across differ-ent peptide lengths and experimental conditions.False positive and false negative error rates forsequence length 10-mer and shorter are generallybelow 5%, while error rates for sequences longerthan 10-mers are typically below 3%.

In addition, we present a Genetic Algorithm(GA) for de novo peptide sequencing. Unlikeother de novo construction techniques, this meth-odology does not try to build amino acid chainsby piecing together a feasible path through agraph using the spectral information available butstarts with complete sequences and attempts togradually find one that matches the target spec-trum optimally. Due to its building approach, theGA is not immediately deterred by incompletespectra, peaks produced by unusually occurringpeptide fragments or background noise.

Technology Development

74 Genomes to Life I

Page 85: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Appendix 1: Attendees ListAttendees list as of January 31, 2003.

Eivind AlmaasUniversity of Notre Dame225 Nieuwland Science HallNotre Dame, IN 46556574-631-3368, Fax [email protected]

Gary AndersenLawrence Berkeley National Laboratory1 Cyclotron Road, Mail Stop 70A-3317Berkeley, CA 94720510-495-2795, Fax [email protected]

Carl AndersonBrookhaven National LaboratoryBiology DepartmentUpton, NY 11973631-344-3375, Fax [email protected]

Gordon AndersonPacific Northwest National LaboratoryP.O. Box 999 MS K8-91Richland, WA [email protected]

Adam ArkinLawrence Berkeley National Laboratory1 Cyclotron Road MS 3-144Berkeley, CA 94720510-495-2366, Fax [email protected]

Daniel ArpOregon State University2082 CordleyCorvallis, OR 97331541-737-1294, Fax [email protected]

Nitin BaligaInstitute for Systems Biology1441 North 34th StreetSeattle, WA 98118206-732-1266, Fax [email protected]

Jill BanfieldUniversity of California BerkeleyMcCone HallBerkeley, CA [email protected]

Albert-László BarabásUniversity of Notre DameDepartment of PhysicsNotre Dame, IN 46556574-631-5767, Fax [email protected]

John BattistaLouisiana State University and A & M CollegeDepartment of Biological SciencesBaton Rouge, LA 70803225-578-2810, Fax [email protected]

Paul BayerU.S. Department of EnergyBiological and Environmental ResearchSC-75, GTN,1000 Independence Ave., SWWashington, DC 20585-1290301-903-5324, Fax [email protected]

Alex BeliaevPacific Northwest National [email protected]

Genomes to Life I 75

Page 86: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Jeffrey BlanchardNational Center for Genome Resources2935 Rodeo Park Dr.Santa Fe, NM 87505505-995-4405, Fax [email protected]

Harvey Bolton, JrBattelle/Pacific Northwest National LaboratoryP.O. Box 999, Mailstop P7-50Richland, WA 99352509-376-3950, Fax [email protected]

Daniel BondUniversity of Massachusetts at [email protected]

Mark BorodovskyGeorgia TechAtlanta, GA [email protected]

Randy BrichU.S. Department of EnergyPacific Northwest National LaboratorySite Office P.O. Box 550Richland, WA 99352509-372-4617, Fax [email protected]

Fred BrockmanPacific Northwest National [email protected]

Cindy Bruckner-LeaPacific Northwest National LaboratoryP.O. Box 999, Mailstop K8-93Richland, WA 99352509-376-2175, Fax [email protected]

Michelle BuchananOak Ridge National LaboratoryOak Ridge, TN [email protected]

William CannonPacific Northwest National Laboratory902 Battelle Blvd.Richland, WA 99352509-375-6732, Fax [email protected]

David CaseThe Scripps Research InstituteDept. of Molecular BiologyTPC15 La Jolla, CA 92037858-784-9768, Fax [email protected]

Denise CaseyOak Ridge National LaboratoryHuman Genome ManagementInformation System1060 Commerce Park, MS 6480Oak Ridge, TN 37830865-574-0597, Fax [email protected]

Swapnil ChhabraSandia National LabsMO51, Rm122Biosystems Research DepartmentOakland, CA [email protected]

Sallie ChisholmMassachusetts Institute of Technology48-425 MITCambridge, MA 02139617-253-1771, Fax [email protected]

Parag ChitnisNational Science Foundation4201 Wilson BlvdArlington, VA 22203703-292-8443, Fax [email protected]

Linda ChriseyOffice of Naval Research800 N. Quincy StreetArlington, VA 22217-5660703-696-4504, Fax [email protected]

Appendix 1: Attendees List

76 Genomes to Life I

Page 87: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

George ChurchHarvard University200 Longwood Ave.Boston, MA 02115617-432-1278, Fax [email protected]

Stacy CiufoUniversity of MassachusettsMicrobiology/ Morrill4 North Amherst, MA 01003413-577-1392, Fax [email protected]

Dean ColeU.S. Department of EnergyMedical Science Div., SC-73Washington, DC 20505301-903-3268, Fax [email protected]

James ColeMichigan State UniversityRibosomal Database Project-II,2225A Biomedical Physical SciencesEast Lansing, MI 48824517-353-3843, Fax [email protected]

Steve ColsonPacific Northwest National LaboratoryP.O. Box 999, MS-IN K8-88Richland, WA 99352509-376-4598, Fax [email protected]

Michael ColvinLawrence Livermore National LaboratoryMailstop L-448Livermore, CA 94552925-423-9177, Fax [email protected]

Maddalena CoppiUniversity of Massachusetts at AmherstDepartment of Microbiology203N Morrill IVNAmherst, MA [email protected]

Bob CoyneNational Science Foundation703-292-8439, Fax [email protected]

Terence CritchlowLawrence Livermore National Laboratory7000 East Ave MS L-560Livermore, CA [email protected]

Claribel Cruz-GarciaMichigan State UniversityCenter for Microbial Ecology Plantand Soil ScienceBuilding Rm 540East Lansing, MI 48824517-353-7858, Fax [email protected]

Brian DavisonOak Ridge National LaboratoryP.O. Box 2009Oak Ridge, TN 37831-6226865-576-8522, Fax [email protected]

Jonathan DelatizkyBBN Technologies10 Moulton StreetCambridge, MA [email protected]

Ed DeLongMBARI7700 SandholdtMoss Landing, CA 95039831-775-1843, Fax [email protected]

David DixonPacific Northwest National LaboratoryP.O. Box 999, MS K9-90Richland, WA 99352509-372-4999, Fax [email protected]

Genomes to Life I 77

Appendix 1: Attendees List

Page 88: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Mitchel DoktyczOak Ridge National LaboratoryP.O. Box 2008, MS 6123Oak Ridge, TN 37831-6123865-574-6204, Fax [email protected]

Timothy DonohueUniversity of Wisconsin-Madison1550 Linden DriveMadison, WI 53562608-262-4663, Fax [email protected]

Janet DoriganU.S. [email protected]

Norman DovichiUniversity of WashingtonDepartment of ChemistrySeattle, WA 98195-1700206-543-7835, Fax [email protected]

Kenneth DowningLawrence Berkeley National LaboratoryDonner LabBerkeley, CA 94720510-486-5941, Fax [email protected]

Daniel DrellU.S. Department of EnergySC-72/GTN 1000 Independence Ave., SWWashington, DC 20585-1290301-903-4742, Fax [email protected]

Leland EllisUSDA/ARSNational Program Staff5601 Sunnyside AvenueBeltsville, MD 20705-5138301-504-4788, Fax [email protected]

Brendlyn FaisonU.S. Department of EnergyOffice of Biological and Environmental Research301-903-0042, Fax [email protected]

Jordan FeidlerThe MITRE Corporation7515 Colshire DriveMcLean, VA 22102-7508703-883-5624, Fax [email protected]

Brad FenwickUSDA Competitive [email protected]

Matthew FieldsOak Ridge National Laboratory1 Bethel Valley RoadOak Ridge, TN 37831-6038865-241-3775, Fax [email protected]

Marvin FrazierU.S. Department of EnergyOffice of Biological and Environmental Research19901 Germantown Rd.Germantown, MD 20874301-903-5468, Fax [email protected]

Jim FredricksonPacific Northwest National LaboratoryP.O. Box 999, Mailstop P7-50Richland, WA 99352509-376-7063, Fax [email protected]

George GarrityMichigan State University6162 Biomedical Physical Sciences Bldg.East Lansing, MI 48824-4320517-432-2459, Fax [email protected]

Al GeistOak Ridge National LaboratoryP.O. Box 2008Oak Ridge, TN 37831-6367865-574-3153, Fax [email protected]

Appendix 1: Attendees List

78 Genomes to Life I

Page 89: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Svetlana GerdesIntegrated Genomics, Inc.2201 W. Campbell Park DriveChicago, IL 60612312-491-0846, Fax [email protected]

Raymond GestelandUniversity of UtahDepartment of Human Genetics15 N. 2030 E., Rm. 7410Salt Lake City, UT 84112-5330801-581-5190, Fax [email protected]

Steven GillThe Institute for Genomic Research9712 Medical Center DriveRockville, MD [email protected]

Carol GiomettiArgonne National Laboratory9700 South Cass Avenue,Building 202Argonne, IL 60439630-252-3839, Fax [email protected]

Silvia Gonzalez-AcinasMassachusetts Institute of Technology15 Vassar St., 48-108Cambridge, MA [email protected]

Andrey GorinOak Ridge National Laboratory1 Bethel Valley RoadBld. 6012, MS 6367Oak Ridge, TN 37831865-241-3972, Fax [email protected]

Debbie GracioPacific Northwest National LaboratoryP.O. Box 999 MS K1-85Richland, WA 99352509-375-6362, Fax [email protected]

Yonatan GradChurch Lab, Harvard Medical School250 Longwood Avenue, Rm 221Seeley Mudd BuildingBoston, MA 02115617-432-0063, Fax [email protected]

Masood HadiSandia National LaboratoriesP.O. Box 969, MS 99517011 East AveLivermore, CA [email protected]

Bruce HamiltonNational Science Foundation4201 Wilson Boulevard, Suite 565Arlington, VA [email protected]

Frank HarrisOak Ridge National LaboratoryP.O. Box 2008Oak Ridge, TN 37831865-574-4333, Fax [email protected]

Caroline HarwoodUniversity of IowaDepartment of Microbiology3-432 BSBIowa City, IA 52242319-335-7783, Fax [email protected]

Terry HazenLawrence Berkeley National LaboratoryMS 70A-3317One Cyclotron Rd.Berkeley, CA 94720510-486-6223, Fax [email protected]

Grant HeffelfingerSandia National LaboratoriesP.O. Box 5800, MS-0885Albuquerque, NM 87185505-845-7801, Fax [email protected]

Appendix 1: Attendees List

Genomes to Life I 79

Page 90: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Diane HevehanU.S. Department of Defense3050 Defense Pentagon, Rm. 3C257Washington, DC 20301703-693-6835, Fax [email protected]

Ed HildebrandWhite House Office of Scienceand Technology Policy1801 Pennsylvania Ave. NWWashington, DC 20502202-456-7341, Fax [email protected]

Roland F. HirschU.S. Department of EnergyMedical Sciences Division,SC-73 GTN 1000 Independence Avenue SWWashington, DC [email protected]

Lynette HirschmanMITRE202 Burlington RdBedford, MA 01730781-271-7789, Fax [email protected]

Hoi-Ying HolmanLawrence Berkeley National LaboratoryOne Cyclotron RoadBerkeley, CA 94720510-486-5943, Fax [email protected]

John HoughtonU.S. Department of EnergyBiological and Environmental Research1000 Independence Avenue, SWWashington, DC 20585-1290301-903-8288, Fax [email protected]

Peter HoytOak Ridge National LaboratoryP.O. Box 2008, MS 6123Oak Ridge, TN 37831-6123865-574-6211, Fax [email protected]

Greg HurstOak Ridge National LaboratoryP.O. Box 2008, MS 6131Oak Ridge, TN 37831865-574-6142, Fax [email protected]

Barbara JasnySCIENCE/[email protected]

Gary JohnsonU.S. Department of EnergyOffice of [email protected]

Matthew KaneNational Science Foundation4201 Wilson Blvd.Arlington, VA [email protected]

Arthur KatzU.S. Department of EnergyOffice of Biological and Environmental Research1000 Independence Avenue, SWWashington, DC 20585-11290301-903-4932, Fax [email protected]

Jay KeaslingUniversity of California201 Gilman HallDept. of Chemical EngineeringBerkeley, CA 94720-1462510-642-4862, Fax [email protected]

Martin KellerDiversa Corporation4955 Directors PlaceSan Diego, CA [email protected]

Appendix 1: Attendees List

80 Genomes to Life I

Page 91: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Steve KennelOak Ridge National LaboratoryP.O. Box 2008, MS 6101Oak Ridge, TN 37831-6101865-574-0825, Fax [email protected]

David KirchmanUniversity of Delaware700 Pilottown RoadLewes, DE 19958302-645-4375, Fax [email protected]

Joel KlappenbachCenter for Microbial EcologyMichigan State University540 Plant & Soil SciencesEast Lansing, MI 48824517-353-9021, Fax [email protected]

Michael KnotekConsultant10127 N. Bighorn Butte DriveOro Valley, AZ 85737520-591-8108, Fax [email protected]

Eugene KolkerBIATECH19310 North Creek Parkway, Ste. 115Bothell, WA 98011425-481-7200 ext 100, Fax [email protected]

Julia KrushkalUniversity of Tennessee Health Science CenterDepartment of Preventive Medicine66 N. Pauline, Ste. 615Memphis, TN 38163901-448-1361, Fax [email protected]

Henrietta KulagaContractor-IPTO20 Firstfield Rd. Suite 125Gaithersburg, MD 20878301-990-9570, Fax [email protected]

Sri KumarDARPA3701 N. FairfaxArlington, VA [email protected]

Cheryl KuskeLos Alamos National LaboratoryBioscience Division, M888Los Alamos, NM 87545505-665-4800, Fax [email protected]

Todd LaneSandia National LaboratoriesP.O. Box 969 MS9951Livermore, CA 94550-969925-294-2057, Fax [email protected]

Frank LarimerOak Ridge National Laboratory1060 Commerce Park DriveOak Ridge, TN 37831865-574-1253, Fax [email protected]

Michael LaubHarvard UniversityBauer Center for Genomics Research7 Divinity AveCambridge, MA [email protected]

Charles LawrenceWadsworth CenterEmpire State PlazaAlbany, NY 12201518-473-3382, Fax [email protected]

John LeighUniversity of WashingtonCampus Box 357242Seattle, WA 98195-7242206-685-1390, Fax [email protected]

Appendix 1: Attendees List

Genomes to Life I 81

Page 92: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Jerry LiNational Institute of General Medical Sciences45 Center Dr., Rm 2As.19FBethesda, MD 20892301-594-0682, Fax [email protected]

Tim LilburnAmerican Type Culture Collection10801 University BoulevardManassas, VA 20110703-365-2700 x599, Fax [email protected]

Debbie LindellMassachusetts Institute of TechnologyMIT 48-336, 77 Massachusetts AveCambridge, MA 02139617-258-6835, Fax [email protected]

Mary LiptonPacific Northwest National Laboratory3335 Q. AvenueRichland, WA 99352509-373-9039, Fax [email protected]

Derek LovleyDepartment of MicrobiologyUniversity of MassachusettsAmherst, MA 01003413-545-9651, Fax [email protected]

Susan LucasDOE Joint Genome Institute2800 Mitchell DriveWalnut Creek, CA 94598925-296-5638, Fax [email protected]

Diane MakowskiArgonne National Laboratory9700 S. Cass AvenueArgonne, IL 60439630-252-3963, Fax [email protected]

Stephanie MalfattiLawrence Livermore National LaboratoryP.O. Box 808Livermore, CA 94550925-424-6274, Fax [email protected]

Julie MalicoatOak Ridge Associated [email protected]

Reinhold MannPacific Northwest National LaboratoryP.O. Box 999Richland, WA 99352509-376-6764, Fax [email protected]

Betty MansfieldOak Ridge National Laboratory1060 Commerce ParkOak Ridge, TN 37830865-576-6669, Fax [email protected]

Terence MarshMichigan State University41 Giltner HallEast Lansing, MI 48824517-432-1365, Fax [email protected]

Vincent MartinLawrence Berkeley National Laboratory201 Gilman HallBerkeley, CA [email protected]

Anthony MartinoSandia National LaboratoriesP.O. Box 969, MS9951Livermore, CA 94551925-294-2095, Fax [email protected]

Lee Ann McCueWadsworth CenterP.O. Box 509Albany, NY 12201518-473-3382, Fax [email protected]

Appendix 1: Attendees List

82 Genomes to Life I

Page 93: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Shawn McLaughlinNational Oceanic and AtmosphericAdministration904 S. Morris St.Oxford, MD 21654410-226-5193, Fax [email protected]

Barbara MethéThe Institute for Genomic Research9712 Medical Center DriveRockville, MD 20850301-838-0200, Fax [email protected]

Noelle MettingU.S. Department of EnergySC-72/GTN 1000 Independence Ave., SWWashington, DC [email protected]

Simon MinovitskyLawrence Berkeley [email protected]

Bud MishraCourant Institute andCold Spring Harbor [email protected]

Julie MitchellSan Diego Supercomputer CenterUniversity of California San Diego9500 Gilman Dr. MC0527La Jolla, CA 92103858-534-5126, Fax [email protected]

Emmanuel MongodinThe Institute for Genomic Research9712 Medical Center DriveRockville, MD 20850301-838-5859, Fax [email protected]

Sue MorssArgonne National Laboratory9700 South Cass AvenueArgonne, IL 60439630-252-6784, Fax [email protected]

Ali NavidIndiana UniversityChemistry Building800 E. Kirkwood AveBloomington, IN 47405812-855-2047, Fax [email protected]

Karen NelsonThe Institute for Genomic Research9712 Medical Center DriveRockville, MD 20850301-838-3565, Fax [email protected]

Camilla NesboDalhousie University5850 College StreetHalifax, Nova Scotia,Canada B3H 1X5902-494-2968, Fax [email protected]

Frank OlkenLawrence Berkeley National Laboratory1 Cyclotron Rd.Mailstop 50B-3238Berkeley, CA 94720-8147510-486-5891, Fax [email protected]

Peter OrtolevaIndiana UniversityChemistry Building800 E. KirkwoodBloomington, IN 47405812-855-2717, Fax [email protected]

Ross OverbeekIntegrated Genomics, Inc.2201 W. Campbell Park Dr.Chicago, IL 60612630-567-7677, Fax [email protected]

Appendix 1: Attendees List

Genomes to Life I 83

Page 94: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Himadri PakrasiWashington UniversityDepartment of BiologyBox 1137St. Louis, MO 63130314-935-6853, Fax [email protected]

Anna PalmisanoU.S. Department of Energy19901 Germantown Rd.Germantown, MD 20852301-903-9963, Fax [email protected]

Morey ParangApoCom Genomics11020 Solway School Rd., Suite 101Knoxville, TN 37931865-927-6120, Fax [email protected]

Bahram ParvinLawrence Berkeley National LaboratoryM.S. 50B-2239Berkeley, CA [email protected]

Ari PatrinosU.S. Department of EnergyOffice of Biological and Environmental ResearchSC-70/GTN 1000 Independence Ave., SWWashington, DC 20585-1290301-903-3251, Fax [email protected]

Deborah PaynePacific Northwest National Laboratory902 Battelle BoulevardRichland, WA 99352509-375-2904, Fax [email protected]

Dale PelletierOak Ridge National LaboratoryP.O. Box 2008, MS 6149Oak Ridge, TN [email protected]

Martin PolzMassachusetts Institute of Technology77 Mass. Ave, 48-421Cambridge, MA [email protected]

Valerie ReedU.S. Department of EnergyOffice of Biomass Programs1000 Independence Ave, SWWashington, DC [email protected]

Haluk ResatPacific Northwest National LaboratoryP.O. Box 999, Mail Stop K1-83Richland, WA 99352509-372-6340, Fax [email protected]

Paul RichardsonDOE Joint Genome Institute2800 Mitchell DriveWalnut Creek, CA 94598925-296-5851, Fax 925-296-5875pmricharsond

Mark RintoulSandia National [email protected]

Robert RobersonArizona State UniversityDepartment of Plant BiologyTempe, AZ [email protected]

Karin RodlandPacific Northwest National LaboratoryP.O. Box 999Richland, WA 99352509-376-7605, Fax [email protected]

Appendix 1: Attendees List

84 Genomes to Life I

Page 95: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Lars RohlinUniversity of California Los Angeles405 Hilgard Ave, BH5531Los Angeles, CA 90095310-825-5849, Fax [email protected]

Charles RomineASCR/MICSU.S. Department of Energy1000 Independence Ave., SWWashington, DC 20585-1290301-903-5152, Fax [email protected]

Margie RominePacific Northwest National LaboratoryP.O. Box 999, MS P7-50Richland, WA 99352509-376-8287, Fax [email protected]

Lucian RussellExpert Reasoning & Decisions LLC6012 Jewell CourtAlexandria, VA 22312703-916-0474, Fax [email protected]

Andrey RzhetskyColumbia University1150 St. Nicholas Ave.Russ Berrie PavilionNew York, NY 10032212-851-5150, Fax [email protected]

Herbert SauroKeck Graduate Institute535 Watson DriveClaremont, CA 91711909-607-0377, Fax [email protected]

Christophe SchillingGenomatica, Inc.5405 Morehouse DriveSan Diego, CA 92121858-362-8550, Fax [email protected]

David SchwartzUniversity of Wisconsin-MadisonGenetics/Biotechnology Center425 Henry MallMadison, WI [email protected]

Margrethe SerresMarine Biological Laboratory7 MBL StreetWoods Hole, MA 02543508-289-7388, Fax [email protected]

Imran ShahUniversity of Colorado4200 E 9th Ave, B-119Denver, CO 80262303-315-7222, Fax [email protected]

Henry ShawU.S. Department of EnergyOffice of Biological and Environmental [email protected]

Arie ShoshaniLawrence Berkeley National Laboratory1 Cyclotron Rd.Berkeley, CA 94720510-486-5171, Fax [email protected]

Robert SiegelPacific Northwest National Laboratory902 Battelle BlvdRichland, WA [email protected]

Richard SmithPacific Northwest National LaboratoryP.O. Box 999 (K8-98)Richland, WA 99352509-376-0723, Fax [email protected]

Appendix 1: Attendees List

Genomes to Life I 85

Page 96: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

David SpringerPacific Northwest National LaboratoryBiological Sciences Division, K4-12Richland, WA 99352509-372-6762, Fax [email protected]

Thomas SquierPacific Northwest National LaboratoryP.O. Box 999, MS P7-53Richland, WA 99352509-376-2218, Fax [email protected]

David A. StahlUniversity of WashingtonDept Civil & Environ. Eng.302 More Hall, Box 352700Seattle, WA 98195-2700206-685-3464, Fax [email protected]

Marvin StodolskyU.S. Department of EnergyOffice of Biological and Environmental Research1000 Independence Avenue, SWSC-72/GTNWashington, DC 20585-1290301-903-4475, Fax [email protected]

T.P. StraatsmaPacific Northwest National LaboratoryP.O. Box 999, MS K1-83Richland, WA 99352509-375-2802, Fax [email protected]

Ray StultsLos Alamos National Laboratory505-667-5535, Fax [email protected]

F. Robert TabitaDepartment of MicrobiologyOhio State University484 West 12th AvenueColumbus, OH 43210-1292614-292-4297, Fax [email protected]

David ThomassenU.S. Department of EnergyOffice of Biological and Environmental Research1000 Independence Avenue, SC-72Germantown BuildingWashington, DC 20585-1290301-903-9817, Fax [email protected]

Dorothea ThompsonOak Ridge National Laboratory1 Bethel Valley RoadOak Ridge, TN 37831-6038865-574-4815, Fax [email protected]

Jerilyn TimlinSandia National LaboratoriesP.O. Box 5800, M.S. 0886Albuquerque, NM [email protected]

Cary TuckfieldSavannah River Technology CenterBldg. 773-42AAiken, SC 29808803-725-8215, Fax [email protected]

Wim VermaasArizona State UniversityDepartment of Plant BiologyBox 871601Tempe, AZ 85287-1601480-965-3698, Fax [email protected]

Michael ViolaU.S. Department of EnergySC-70/GTN 1000 Independence Ave., SWWashington, DC 20585-1290301-903-5346, Fax [email protected]

Lawrence WackettUniversity of MinnesotaBiochemistry1479 Gortner AvenueSt. Paul, MN 55108612-625-3785, Fax [email protected]

Appendix 1: Attendees List

86 Genomes to Life I

Page 97: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Lisa WaidnerUniversity of DelawareCollege of Marine Studies700 Pilottown Rd.Lewes, DE 19958302-645-4030, Fax [email protected]

Elizabeth WeitzkeIndiana UniversityChemistry Building800 E. Kirkwood Ave.Bloomington, IN 47405812-855-2047, Fax [email protected]

Owen WhiteThe Institute for Genomic Research9712 Medical Center DriveRockville, MD 20850301-838-5824, Fax [email protected] please cc [email protected]

Julian WhiteleggeUniversity of California Los AngelesDept of Chemistry405 Hilgard AveLos Angeles, CA 90095310-794-5156, Fax [email protected]

William WhitmanUniversity of GeorgiaDept. of MicrobiologyAthens, GA 30602-2605706-542-4219, Fax [email protected]

Steven WileyPacific Northwest National Laboratory902 Battelle BoulevardRichland, WA 99352509-373-6218, Fax [email protected]

David WilsonCornell University458 Biotechnology Bldg.Ithaca, NY 14853607-255-5706, Fax [email protected]

Ying XuOak Ridge National Laboratory1060 Commerce Park Drive, MS 6480Life Sciences DivisionOak Ridge, TN 37830-6480865-574-7263, Fax [email protected]

Huei-Che Bill YenUniversity of Missouri-ColumbiaRm 117, Schweitzer HallColumbia, MO 65211573-882-9771, Fax [email protected]

Jizhong ZhouOak Ridge National Laboratory1 Bethel Valley RoadOak Ridge, TN 37831-6038865-576-7544, Fax [email protected]

Shiguo ZhouUniversity of Wisconsin-Madison425 Henry MallMadison, WI [email protected]

Robin ZimmerApoCom Genomics11020 Solway School Rd.Suite 101Knoxville, TN 37931865-927-6120, Fax [email protected]

Appendix 1: Attendees List

Genomes to Life I 87

Page 98: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Genomes to Life I 89

Appendix 2: Poster Presenters

Gary AndersenB51 Development of Vectors for Detecting

Protein-Protein Interactions in Bacteria

Adam ArkinA4 Rapid Deduction of Stress Response

Pathways in Metal/RadionuclideReducing Bacteria

Dan ArpB13 Gene Expression Profiles in

Nitrosomonas europaea, an ObligateChemolithoautotroph

Jillian F. BanfieldB7 Ecological and Evolutionary Analyses

of a Spatially and GeochemicallyConfined Acid Mine DrainageEcosystem Enabled by CommunityGenomics

Albert-László BarabásiA26 Hierarchical Organization of

Modularity in Metabolic Networks

John R. BattistaB9 Uncovering the Regulatory Networks

Associated with IonizingRadiation-Induced Gene Expression inD. radiodurans R1

Mark BorodovskyA54 Predicting Genes from Prokaryotic

Genomes: Are “Atypical” GenesDerived from Lateral Gene Transfer?

Fred BrockmanB5 Approaches for Obtaining Genome

Sequence from ContaminatedSediments Beneath a LeakingHigh-Level Radioactive Waste Tank

Michelle BuchananA12 New Approaches for High-Throughput

Identification and Characterization ofProtein Complexes

William R. CannonB61 New Developments in Statistically

Based Methods for PeptideIdentification via Tandem MassSpectrometry

David A. CaseA32 Parallel Scaling in Amber Molecular

Dynamics Simulations

Denise CaseyB63 Communicating Genomes to Life

George ChurchA2 Microbial Ecology, Proteogenomics,

and Computational Optima

James R. ColeA44 A Web-Based Laboratory Information

Management System (LIMS) forLaboratory Microplate Data Generatedby High-Throughput GenomicApplications

Michael ColvinA56 Advanced Molecular Simulations of E.

coli Polymerase III

Michael J. DalyB29 The Dynamics of Cellular Stress

Responses in Deinococcus radiodurans

Jonathan DelatizkyA46 BioSketchpad: An Interactive Tool for

Modeling Biomolecular and CellularNetworks

D. A. DixonA6 Bioinformatics and Computing in the

Genomes to Life Center for Molecularand Cellular Systems

Timothy DonohueB41 The Molecular Basis for Metabolic and

Energetic Diversity

Norman J. DovichiB57 Single Cell Proteomics—D. radiodurans

Page 99: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Ken DowningB55 Electron Tomography of Whole

Bacterial Cells

James K. FredricksonB21 Environmental Sensing, Metabolic

Response and Regulatory Networks inthe Respiratory Versatile BacteriumShewanella oneidensis MR-1

George M. GarrityA36 Towards a Self-Organizing and

Self-Correcting Prokaryotic Taxonomy

Raymond GestelandB35 In Search of Diversity: Understanding

How Post-Genomic Diversity isIntroduced to the Proteome

Carol S. GiomettiB37 The Microbial Proteome Project: A

Database of Microbial ProteinExpression in the Context of GenomeAnalysis

Carol S. GiomettiB39 Analysis of the Shewanella oneidensis

Proteome in Cells Grown inContinuous Culture

D. K. GracioA6 Bioinformatics and Computing in the

Genomes to Life Center for Molecularand Cellular Systems

Caroline S. HarwoodB11 Strategies to Harness the Metabolic

Diversity of Rhodopseudomonas palustris

Grant S. HeffelfingerA20 Carbon Sequestration in Synechococcus

sp.: From Molecular Machines toHierarchical Modeling

P. R. HoytA14 Automation of Protein Complex

Analyses in Rhodopseudomonas palustrisand Shewanella oneidensis

Gregory B. HurstA8 Mass Spectrometry in the Genomes to

Life Center for Molecular and CellularSystems

David L. KirchmanB3 A Metagenomic Library of Bacterial

DNA Isolated from the Delaware River

Joel A. KlappenbachA44 A Web-Based Laboratory Information

Management System (LIMS) forLaboratory Microplate Data Generatedby High-Throughput GenomicApplications

Eugene KolkerA64 Interdisciplinary Study of Shewanella

oneidensis MR-1’s Metabolism & MetalReduction

Cheryl R. KuskeB1 Identification and Isolation of Active,

Non-Cultured Bacteria fromRadionuclide and Metal ContaminatedEnvironments for Genome Analysis

John LeighB25 Global Regulation in the Methanogenic

Archaeon Methanococcus maripaludis

Mary S. LiptonB39 Analysis of the Shewanella oneidensis

Proteome in Cells Grown inContinuous Culture

Derek LovleyA24 Analysis of the Genetic Potential and

Gene Expression of MicrobialCommunities Involved in the in situBioremediation of Uranium andHarvesting Electrical Energy fromOrganic Matter

Derek LovleyA34 Microbial Cell Model of G.

sulfurreducens: Integration of in silicoModels and Functional GenomicStudies

S. MalfattiB33 Finishing and Analysis of the Nostoc

punctiforme Genome

Betty K. MansfieldB63 Communicating Genomes to Life

90 Genomes to Life I

Appendix 2: Poster Presenters

Page 100: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Anthony MartinoA16 Analysis of Protein Complexes from a

Fundamental Understanding of ProteinBinding Domains and Protein-ProteinInteractions in Synechococcus WH8102

Harley McAdamsA40 Characterization of Genetic Regulatory

Circuitry Controlling AdaptiveMetabolic Pathways

Lee Ann McCueA52 Comparative Genomics Approaches to

Elucidate Transcription RegulatoryNetworks

Julie C. MitchellA48 Molecular Docking with Adaptive Mesh

Solutions to the Poisson- BoltzmannEquation

Karen E. NelsonB27 Identification of Regions of Lateral

Gene Transfer Across theThermotogales

Howard OchmanB19 Lateral Gene Transfer and the History

of Bacterial Genomes

Peter J. OrtolevaA58 Karyote®: Automated

Physico-Chemical Cell ModelDevelopment Through InformationTheory

Brian PalenikA18 Carbon Sequestration in Synechococcus:

Microarray Approaches

Morey ParangA60 The Commercial Viability of

EXCAVATOR™: A Software Tool ForGene Expression Data Clustering

D. A. PayneA6 Bioinformatics and Computing in the

Genomes to Life Center for Molecularand Cellular Systems

Haluk ResatA38 Computational Framework for

Microbial Cell Simulations

Mark D. RintoulA22 Systems Biology Models for

Synechococcus sp.

Diane J. RodiB49 Identification of ATP Binding Proteins

within Sequenced Bacterial GenomesUtilizing Phage Display Technology

Herbert M SauroA42 Data Exchange and Programmatic

Resource Sharing: The Systems BiologyWorkbench, BioSPICE and the SystemsBiology Markup Language (SBML)

Margrethe H. SerresB31 Analysis of Proteins Encoded on the S.

oneidensis MR-1 Chromosome, TheirMetabolic Associations and ParalogousRelationships

Christophe H. SchillingA30 SimPheny: A Computational

Infrastructure Bringing Genomes toLife

David C. SchwartzB45 Comparative Optical Mapping: A New

Approach for Microbial ComparativeGenomics

David C. SchwartzB47 Optical Mapping of Multiple Microbial

Genomes

Imran ShahA28 Computational Elucidation of

Metabolic Pathways

Richard D. SmithB59 Genomes to Proteomes to Life:

Application of New Technologies forComprehensive, Quantitative and HighThroughput Microbial Proteomics

T. P. StraatsmaA62 Modeling Electron Transfer in

Flavocytochrome c3 FumarateReductase

F. Robert TabitaB17 The Rhodopseudomonas palustris

Microbial Cell Project

Appendix 2: Poster Presenters

Genomes to Life I 91

Page 101: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Jerilyn TimlinA18 Carbon Sequestration in Synechococcus:

Microarray Approaches

Wim VermaasB43 Integrative Studies of Carbon

Generation and Utilization in theCyanobacterium Synechocystis sp. PCC6803

Lawrence P. WackettA50 Functional Analysis and Discovery of

Microbial Genes Transforming Metallicand Organic Pollutants: Database andExperimental Tools

William WhitmanB25 Global Regulation in the Methanogenic

Archaeon Methanococcus maripaludis

H. Steven WileyA10 Genomes to Life Center for Molecular

and Cellular Systems: A ResearchProgram for Identification andCharacterization of Protein Complexes

David B. WilsonB15 Genomics of Thermobifida fusca Plant

Cell Wall Degradating Proteins

Jizhong ZhouB23 Integrated Analysis of Protein

Complexes and Regulatory NetworksInvolved in Anaerobic EnergyMetabolism of Shewanella oneidensisMR-1

Jizhong ZhouB53 Development and Use of

Microarray-Based Integrated GenomicTechnologies for Functional Analysis ofEnvironmentally ImportantMicroorganisms

Shiguo ZhouB45 Comparative Optical Mapping: A New

Approach for Microbial ComparativeGenomics

Shiguo ZhouB47 Optical Mapping of Multiple Microbial

Genomes

Robin D. ZimmerA60 The Commercial Viability of

EXCAVATOR™: A Software Tool ForGene Expression Data Clustering

Appendix 2: Poster Presenters

92 Genomes to Life I

Page 102: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Appendix 3: GTL Web Sites

Web Sites

Genomes to Life Web Site

• GTL Roadmap, April 2001:http://DOEGenomesToLife.org

• Image Galleryhttp://DOEGenomesToLife.org/gallery/images.html

• Payoffs for the Nationhttp://DOEGenomesToLife.org/payoffs.html

• Current Researchhttp://DOEGenomesToLife.org/research/index.html

• GTL Program FY 2003 Call for Applicationshttp://www.er.doe.gov/production/grants/Fr03-05.html

Current GTL Program Projects

• July 23, 2002, Press Release “Energy Depart-ment Awards $103 Million for Post-GenomicResearch”http:// DOEGenomesToLife.org/research/2002awards.html

• Oak Ridge National Laboratory for Genomesto Life Center for Molecular and Cellular Sys-tems: A Research Program for Identificationand Characterization of Protein Complexeshttp://www.ornl.gov/GenomestoLife/

• Lawrence Berkeley National Laboratory forRapid Deduction of Stress Response Pathways

in Metal/Radionuclide Reducing Bacteriahttp://vimss.lbl.gov/

• Sandia National Laboratories for CarbonSequestration in Synechococcus: From MolecularMachines to Hierarchical Modelinghttp://www.genomes-to-life.org/

• University of Massachusetts, Amherst, forAnalysis of the Genetic Potential and GeneExpression of Microbial Communities Involvedin the in situ Bioremediation of Uranium andHarvesting Electrical Energy from OrganicMatterhttp://www.geobacter.org

• Harvard Medical School for Microbial Ecology,Proteogenomics and Computational Optimahttp://arep.med.harvard.edu/DOEGTL/

Complementary Web Sites

• Human Genome Project Informationhttp://www.ornl.gov/hgmis/

• DOE Microbial Genome Programhttp://www.ornl.gov/microbialgenomes/

• Microbial Genomics Gatewayhttp://microbialgenome.org/

Genomes to Life I 93

Page 103: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Publications and Presentations

GTL Science and Administration

• Program Overview, Jan 2003http://DOEGenomesToLife.org/pubs/overview_screen.pdf

• GTL draft facilities strategy and plan submittedto the Biological and Environmental AdvisoryCommittee by the Life Sciences Division of theBiological and Environmental Research pro-gram for the Dec 3-4, 2002 meetinghttp://DOEGenomesToLife.org/pubs/GTLFac34BERAC45.pdf

• Resources and Technology Centers for Biologi-cal Discovery in the 21st Century brochure(Published version with images derived fromworking paper below; June 2002)http://DOEGenomesToLife.org/research/GTLFacilities18screen.pdf

• Working paper presented to the BERAC; April25, 2002http://DOEGenomesToLife.org/pubs/gtl_facilities.pdf

• Primer Pictorial; Oct 2001http://DOEGenomesToLife.org/primer.html

• Genomes to Life Roadmap; April 2001http://DOEGenomesToLife.org/roadmap/index.html

• Bringing the Genome to Life Report of a sub-committee of the Biological and EnvironmentalResearch Advisory Committee (BERAC);August 2000http://DOEGenomesToLife.org/history/genome-to-life-rpt.html

Workshop Reports on Computing

• Mathematics for GTL Workshop, Gaithersburg,Maryland; March 18-19, 2002http://DOEGenomesToLife.org/pubs/GTLMath-6.pdf

• Computer Science for GTL Workshop,Gaithersburg, Maryland; March 6-7, 2002http://doegenomestolife.org/pubs/computerscience/

• Computing Infrastructure and NetworkingWorkshop, Gaithersburg, Maryland; Jan 22-23,2002http://DOEGenomesToLife.org/compbio/mtg_1_22_02/infrastructure.html

• Visions for Computational and Systems Biol-ogy Workshop for the Genomes to Life Pro-gram; September 6-7, 2001 (Notes withExecutive Summary)http://DOEGenomesToLife.org/pubs/CompbioVisions-4.pdf

• First GTL Computational Biology Workshop;Aug 7-8, 2001http://DOEGenomesToLife.org/compbio/mtg8_8_01/titlepage.htm

Workshop Reports on Technologies

• Report on the Imaging Workshop for theGenomes to Life Program, April 16-18, 2002http://DOEGenomesToLife.org/technology/imaging

• Executive Summary of Workshop on ImagingTechnologies, April 16-17, 2002http://DOEGenomesToLife.org/pubs/imaging_summary.pdf

• Imaging Technology Additional Readinghttp://DOEGenomesToLife.org/technology/imaging/workshop2002/biblio.html

• Genomes to Life: Technology Assessment forMass Spectrometry, Dec 10-11, 2001http://DOEGenomesToLife.org/pubs/mass_spec.pdf

Appendix 3: GTL Web Sites

94 Genomes to Life I

Page 104: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Energy Security and Global Climate

Change

• “Energy Security and Climate Stabilization bro-chure”; August 2002http://DOEGenomesToLife.org/energy_carbon.pdf

• “Energy and Climate Change Payoffs posterpresented at the G8 Energy Conference”; April2002http://DOEGenomesToLife.org/warming/climateposter.pdf

• James Edmonds, Nov 27, 2001 to Biologicaland Environmental Research AdvisoryCommitteehttp://DOEGenomesToLife.org/nov27slides/edmonds.htm

• Robin Graham, Nov 27, 2001 to Biologicaland Environmental Research AdvisoryCommitteehttp://DOEGenomesToLife.org/nov27slides/graham.htm

• Workshop on The Role of Biotechnology inMitigating Greenhouse Gas Concentrations;June 23, 2001: A Workshop Summary by KenNealson and J. Craig Venter, WorkshopCochairshttp://DOEGenomesToLife.org/warming/GTLBiotechwksp.htm

Bioremediation

• “Innovative Approaches for Cleaning Up andTreating Hazardous Wastes at DOE Sites” bro-chure; June 2002http://DOEGenomesToLife.org/cleanup.pdf

• Presentation by Blaine Metting on November27, 2001 to Biological and EnvironmentalResearch Advisory Committee

http://DOEGenomesToLife.org/nov27slides/metting.htm

Video

• 3-D image of a Shewanella oneidensis cellhttp://DOEGenomesToLife.org/pubs/video.html#shewanella

Related Publications

• Microbial Ecology and Genomics: A Cross-roads of Opportunity. A Report from theAmerican Academy of Microbiology” (based ona colloquium held Feb 23-25, 2001)http://www.asmusa.org/acasrc/pdfs/MicroEcoreport.pdf

• “A Target Area of the Scientific SimulationInitiative”http://cbcg.lbl.gov/ssi-csb/HomeDetail.html

• “Computational Challenges in Structure andFunctional Genomics” (IBM Systems Journal,Vol 40, No 2, 2001)http://www.research.ibm.com/journal/sj/402/headgordon.pdf

• “The Biomedical Information Science andTechnology Initiative” (June 3, 1999)http://www.nih.gov/about/director/060399.htm

• “Advanced Computational StructuralGenomics”http://cbcg.lbl.gov/ssi-csb/Meso.html

• Presentation by Gary Johnson, Oct 26, 2001 tothe Office of Advanced Scientific ComputingResearch Advisory Committeehttp://DOEGenomesToLife.org/compbio/johnson.htm

Appendix 3: GTL Web Sites

Genomes to Life I 95

Page 105: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Author Index

AAdams, Michael W. W. . . . . . . . . . . . . . . . . . . . . . . . . 62

Adamson, Anne E.. . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Adkins, Joshua N. . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Agron, Peter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Al-Hashimi, Hashim M. . . . . . . . . . . . . . . . . . . . 16, 18

Allen, Eric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Amster, Jon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Anantharaman, Thomas S.. . . . . . . . . . . . . . . . . . . . . 67

Andersen, Gary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Anderson, Aaron . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Anderson, Gordon A. . . . . . . . . . . . . . . . . . . . . . . 9, 72

Arkin, Adam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Arp, Daniel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51, 70

Atkins, John. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Atlas, R. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Auberry, Deanna . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Ausubel, Fred. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

BBabnigg, Gyorgy . . . . . . . . . . . . . . . . . . . . . . . . . 62, 63

Balazsi, G. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Banfield, Jillian F.. . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Barabási, Albert-László . . . . . . . . . . . . . . . . . . . . . . 25

Barns, Susan M.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

Barsky, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Battista, John R. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Beatty, Thomas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Bechner, Michael . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Beliaev, Alex S.. . . . . . . . . . . . . . . . . . . . . . . . . . . . 8, 56

Belta, Calin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Besemer, John . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Blattner, Frederick R. . . . . . . . . . . . . . . . . . . . . . . . . . 67

Bond, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . 20, 26, 29

Borodovsky, Mark . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Bownas, Jennifer L. . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Brahamsha, Bianca. . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Brockman, Fred . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Brown, Steve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Brozell, Scott . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Bruckner-Lea, C. J. . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Buchanan, Michelle . . . . . . . . . . . . . . . . . . . . . . . . . 13

Bumgarner, Roger . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Burland, Valerie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Butler, Jessica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

CCamp, David . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

Campbell, J. W. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Cannon, William R.. . . . . . . . . . . . . . . . . . . . 9, 54, 73

Carmack, C. Steven . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Case, David A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Casey, Denise K. . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Chain, P. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Chen, Baowei . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Childers, Susan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Chisholm, Sallie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Churas, Chris. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Church, George . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Ciufo, Stacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Cole, James R. . . . . . . . . . . . . . . . . . . . . . . . . . . 35, 56

Coleman, James R. . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Colson, Steve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Colvin, Michael . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Coppi, Maddalena . . . . . . . . . . . . . . . . . . . . . . . . 20, 29

Cottrell, Matthew T.. . . . . . . . . . . . . . . . . . . . . . . . . . 46

Crowley, Michael . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

DDaly, Michael J. . . . . . . . . . . . . . . . . . . . . . . . . . 59, 70

Dam, Phuongan . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Davidson, George S. . . . . . . . . . . . . . . . . . . . . . . . . . 18

Delatizky, Jonathan . . . . . . . . . . . . . . . . . . . . . . . . . 36

Deng, Wen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Genomes to Life I 97

Page 106: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Dixon, David A. . . . . . . . . . . . . . . . . . . . . . . . . . . 9, 13

Doggett, N. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Doktycz, Mitchel J. . . . . . . . . . . . . . . . . . . . . . . . 13, 14

Donohue, Timothy J.. . . . . . . . . . . . . . . . . . . . . 64, 67

Dovichi, Norman J. . . . . . . . . . . . . . . . . . . . . . . . . . 71

Downing, Ken . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Drenkard, Eliana . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Dubchak, Inna. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Dupuis, Michel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

EEarl, Ashlee M. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Edwards, Jeremy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Elhai, J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Ellis, Lynda B.M. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Estes, Sherry A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Esteve-Nunez, Abraham . . . . . . . . . . . . . . . . . . . . . . 20

Eyck, Lynn F. Ten . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

FFaull, Kym . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Faulon, Jean-Loup . . . . . . . . . . . . . . . . . . . . . . . . 18, 19

Feldhaus, Jane M. Weaver . . . . . . . . . . . . . . . . . . . . . 11

Feldhaus, Michael J. . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Fields, Matthew W. . . . . . . . . . . . . . . . . . . . . . 8, 56, 70

Foote, Linda J.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Foote, R. S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Forrest, Dan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Fredrickson, James K. . . . . . . . . . . . 54, 59, 62, 63, 72

Fridman, T. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Frink, Laurie J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

GGalli, Giulia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Garrity, George M. . . . . . . . . . . . . . . . . . . . . . . . . . 30

Garza, Priscilla A. . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Geist, Al . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16, 18

Gerdes, Svetlana. . . . . . . . . . . . . . . . . . . . . . . . . . 25, 64

Gessler, Damian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Gesteland, Raymond . . . . . . . . . . . . . . . . . . 10, 13, 61

Gibson, Janet L.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Giddings, Michael . . . . . . . . . . . . . . . . . . . . . . . . 10, 13

Gill, Steven R. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Giometti, Carol S. . . . . . . . . . 10, 13, 20, 29, 56, 62, 63

Glaven, Richard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Gomelsky, Mark. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Gorby, Yuri A. . . . . . . . . . . . . . . . . . . . . . 11, 54, 63, 72

Gorin, Andrey . . . . . . . . . . . . . . . . . . . . . . . . . 9, 16, 18

Goulian, Mark . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Gracio, D. K. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Gygi, Francois . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

HHaaland, David M. . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Hackett, Murray . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Hance, Ioana . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Hart, William E. . . . . . . . . . . . . . . . . . . . . . . . . . 16, 18

Harwood, Caroline S. . . . . . . . . . . . . . . . . . 51, 53, 70

Hazen, Terry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Heffelfinger, Grant S. . . . . . . . . . . . . . . . . . . . . . . . 18

Heredia-Langner, Alejandro . . . . . . . . . . . . . . . . . . . . 73

Hettich, Robert L. . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Hill, Eric A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Holst, Michael J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Hooker, Brian S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Hosler, Jonathan . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Howell, Heather A. . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Hoyt, P. R. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Hsu, Yuan-Man . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Hugenholtz, Philip . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Hurst, Gregory B. . . . . . . . . . . . . . . . . . . . . . . . 10, 13

IIrwin, Diana . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Ivancic, Franjo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

JJaffe, Jake. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Jakobsson, Erik . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Jarman, Kenneth D.. . . . . . . . . . . . . . . . . . . . . . . . . . 73

Jarman, Kristin H. . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

KKadner, Kristin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Kaplan, Samuel . . . . . . . . . . . . . . . . . . . . . . . 31, 64, 67

98 Genomes to Life I

Page 107: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Keasling, Jay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Keller, Martin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8, 47

Kennel, Steven J. . . . . . . . . . . . . . . . . . . . 10, 11, 13, 14

Khare, Tripti . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62, 63

Kile, Andrew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Kirchman, David L. . . . . . . . . . . . . . . . . . . . . . . . . . 46

Klappenbach, Joel A. . . . . . . . . . . . . . . . . . . 35, 56, 70

Klotz, Martin G. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Kolker, Eugene. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Kolter, Roberto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Koonin, Eugene. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Krushkal, Julia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Kucherlapati, Raju . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Kulam, Siddique A. . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Kuman, Vijay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Kuske, Cheryl R. . . . . . . . . . . . . . . . . . . . . . . . . 45, 47

Kvikstad, Erika . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

LLamers, Casey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Lane, Todd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16, 18

Lankford, Trish K. . . . . . . . . . . . . . . . . . . . . . 10, 11, 14

Larimer, Frank W.. . . . . . . . . . . 9, 10, 11, 13, 51, 53, 56

Lau, Ed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Laub, Mike . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7, 32

Lawrence, Charles E.. . . . . . . . . . . . . . . . . . . . . . . . 38

Leang, Ching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Leaphart, Adam. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Leigh, John . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Leptos, Kyriacos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Li, Ming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Liao, James C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Lightstone, Felice. . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Lilburn, Timothy G. . . . . . . . . . . . . . . . . . . . . . . . . . 30

Lin, Chian-Tso . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Lindell, Debbie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Lindsey, Susan D. . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Lipton, Mary S. . . . . . . . . . . 10, 11, 14, 56, 59, 63, 72

Liu, Jun S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Liu, Xiudan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Liu, Yongqing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Locascio, Phil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Logsdon, John. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Longmire, J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Lory, Steve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Lovley, Derek . . . . . . . . . . . . . . . . . . . . . 20, 26, 29, 62

Lu, H. Peter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Lu, T-Y. S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

MMackenzie, Christopher . . . . . . . . . . . . . . . . . . . . 31, 67

Mahadevan, Radhakrishnan . . . . . . . . . . . . . . . . . 26, 29

Makowski, Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Malfatti, S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Mandava, Suneeta . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Mansfield, Betty K. . . . . . . . . . . . . . . . . . . . . . . . . . 23

Maranas, Costas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Marcia, Roummel . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Margolin, William . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Markillie, L. Meng . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Martin, Sheryl A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Martin, Vincent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Martino, Anthony . . . . . . . . . . . . . . . . . . . . 16, 18, 19

Maye, M. Uljana . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

McAdams, Harley . . . . . . . . . . . . . . . . . . . . . . . . . . 32

McCammon, J. Andrew . . . . . . . . . . . . . . . . . . . . . . . 36

McCue, Lee Ann . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Means, Shawn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Meeks, J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Mendoza, E. S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Metha, Teena . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Methé, Barbara . . . . . . . . . . . . . . . . . . . . . . . . . . 20, 29

Miller, Keith D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Mills, Marissa D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Mitchell, Julie C. . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Mongodin, Emmanuel. . . . . . . . . . . . . . . . . . . . . . . . 58

Mongru, D. A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Moore, Barry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Moore, Ronald J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

Murray, Alison E.. . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Murray, Maria C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

NNatarajan, Vijaya . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Navid, Ali . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Nealson, Kenneth H. . . . . . . . . . . . . . . . . . . . . . . 56, 62

Genomes to Life I 99

Page 108: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Negash, Sewite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Nelson, Chad. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Nelson, Karen E. . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Nunez, Cinthia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Nylander, Kim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

OOchman, Howard . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Oda, Yasuhiro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Oliveira, Joseph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Olken, Frank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8, 18

Olman, Victor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9, 18

Olsen, Gary J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Oltvai, Zoltán N. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Ortoleva, Peter J. . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Osterman, A. L.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Ostrouchov, George . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Overbeek, Ross . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Owen, Arch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

PPalenik, Brian. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Palsson, Bernhard O. . . . . . . . . . . . . . . . . . . . 26, 29, 56

Palzkill, Timothy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Parang, Morey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Park, Byung-Hoon. . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Park, Sung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Pasa-Tolic, Ljiljana . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

Passovets, S.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Paulsen, Ian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Payne, Debbie A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Pelletier, Dale . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10, 11

Peterson, Scott N. . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Petti, Allegra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Plimpton, Steven J. . . . . . . . . . . . . . . . . . . . . 16, 18, 19

Polz, Martin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Pomposiello, Pablo. . . . . . . . . . . . . . . . . . . . . . . . . . . 20

QQin, Zhaohui S.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

RRamsey, J. Michael . . . . . . . . . . . . . . . . . . . . . . . . 13, 14

Ravasz, Erzsebet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Razumovskaya, Jane . . . . . . . . . . . . . . . . . . . . . . . 9, 10

Reich, Claudia I. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Resat, Haluk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Reslewic, Susan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Rey, Federico . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Richardson, Paul . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Riley, Monica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Rintoul, Mark D. III . . . . . . . . . . . . . . . . . . . . . 18, 19

Roberson, Robert . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Roberts, Victoria A.. . . . . . . . . . . . . . . . . . . . . . . . . . 36

Rodi, Diane J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Rodland, Karin D. . . . . . . . . . . . . . . . . . . . . . . . . 13, 14

Roe, Diana C. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16, 18

Romine, Margaret F. . . . . . . . . . . . . . . . . 11, 47, 54, 72

Rosen, J. Ben. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Rubin, Harvey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Runnheim, Rod. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

SSamanta, Sudip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Samatova, Nagiza F.. . . . . . . . . . . . . . . . . . . . . . . 16, 18

Samudrala, Ram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Sandler, Steve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Santos, Scott R. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Sauro, Herbert M . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Saxman, Paul R.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Sayavedra-Soto, Luis . . . . . . . . . . . . . . . . . . . . . . . . . 51

Sayyed-Ahmad, Abdalla . . . . . . . . . . . . . . . . . . . . . . . 41

Schilling, Christophe H. . . . . . . . . . . . . . . . . . . 26, 29

Schmoyer, D.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Schug, Jonathan. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Schwartz, David C. . . . . . . . . . . . . . . . . . . . . . . . . . 67

Schwegler, Eric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Segre, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Serres, Margrethe H. . . . . . . . . . . . . . . . . . . . . . . . . . 60

Severin, Jessica. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Shah, Imran. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Shah, Manesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9, 18

Shapiro, Lucy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Shebolina, Zhenya . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

100 Genomes to Life I

Page 109: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Shen, Yufeng . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

Shi, Liang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Shoshani, Arie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Siegel, Robert W. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Sinclair, Michael B. . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Singh, Anup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Smith, Dayle M. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Smith, Richard D. . . 10, 11, 13, 53, 54, 56, 59, 63, 72

Sofia, Heidi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9, 31

Sokolsky, Oleg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Söll, Dieter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Somera, A. L. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Sommerville, Leslie E. . . . . . . . . . . . . . . . . . . . . . . . . 45

Springer, David L. . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Squier, Thomas C. . . . . . . . . . . . . . . . . . . . . . . . . 11, 13

Stahl, David. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Stanek, Dawn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Steen, Robert. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Steffen, Martin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Straatsma, T. P.. . . . . . . . . . . . . . . . . . . . . . . . . . . 9, 43

Strauss, Charlie . . . . . . . . . . . . . . . . . . . . . . . . . . 16, 18

TTabita, F. Robert . . . . . . . . . . . . . . . . . . . . . . . . 51, 53

Tarallo, Regina. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Thelen, Daniel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Thomas, Edward V. . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Thompson, Dorothea K. . . . . . . . . . . . . . . . . . 8, 56, 70

Thompson, William . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Tian, Yuan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Tiedje, James M. . . . . . . . . . . . . . . . . . . . . . . 35, 56, 70

Timlin, Jerilyn A. . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Tollaksen, Sandra L.. . . . . . . . . . . . . . . . . . . . . . . 62, 63

Tolonen, Andrew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Travnik, Evelyn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Trease, Harold E. . . . . . . . . . . . . . . . . . . . . . . . . . 31, 54

Tuncay, Kagan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Tyson, Gene W. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

UUberbacher, E.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Udseth, Harold R. . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

VVan Berkel, Gary J.. . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Venclovas, Ceslovas . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Verberkmoes, Nathan C. . . . . . . . . . . . . . . . . . . . . . . 10

Vergez, L. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Vermaas, Wim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Vokler, I. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Vorpagel, Erich R. . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

WWackett, Lawrence P. . . . . . . . . . . . . . . . . . . . . . . . . 37

Waidner, Lisa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Wall, Judy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Wang, Qiong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Wang, Yisong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Webb, Jonathan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Wei, Jun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Wei, Xueming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Weir-Lipton, Mary S. . . . . . . . . . . . . . . . . . . . . . . . . . 54

Weiss, Shimon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Weitzke, Elizabeth . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Welber, Lois. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Whitelegge, Julian . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Whitman, William . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Wiley, H. Steven . . . . . . . . . . . . . . . . . . . . . . . . 11, 13

Wills, Norma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Wilson, David B. . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Wright, Matt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Wu, Liyou . . . . . . . . . . . . . . . . . . . . . . . . . . . 35, 51, 70

Wyrick, Judy M.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

XXu, Dong. . . . . . . . . . . . . . . . . . . . . . . 9, 18, 42, 56, 70

Xu, Ying. . . . . . . . . . . . . . . . . . 9, 13, 16, 18, 19, 42, 56

YYan, Tingfen . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51, 70

Yates, John R. III . . . . . . . . . . . . . . . . . . . . . . . . . 62, 63

Young, David. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Young, Malin . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10, 13

Yust, Laura N. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Genomes to Life I 101

Page 110: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

ZZengler, Karsten . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Zhou, Jizhong . . . . . . . . . . . . 8, 35, 51, 53, 56, 59, 70

Zhou, Shiguo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Zimmer, Robin D. . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Zinser, Eric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

102 Genomes to Life I

Page 111: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

Institution Index

AAmerican Type Culture Collection . . . . . . . . . . . . . 30

ApoCom Genomics . . . . . . . . . . . . . . . . . . . . . . . . 42

Argonne National Laboratory . . . . . . . . . . . . . . 10, 13,20, 29, 56, 62–63, 68

Arizona State University. . . . . . . . . . . . . . . . . . . . . 64

BBaylor College of Medicine . . . . . . . . . . . . . . . . . . 56

BBN Technologies . . . . . . . . . . . . . . . . . . . . . . . . . 36

BIATECH. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

CCornell University . . . . . . . . . . . . . . . . . . . . . . . . . 52

DDesert Research Institute . . . . . . . . . . . . . . . . . . . . 35

Diversa Corporation . . . . . . . . . . . . . . . . . . . . . . . 47

Diversa, Inc. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

DOE Joint Genome Institute . . . . . . . . . . . . . . . . . 47

EEmory University . . . . . . . . . . . . . . . . . . . . . . . . . 39

FFt. Lewis College. . . . . . . . . . . . . . . . . . . . . . . . . . 45

GGenomatica, Inc. . . . . . . . . . . . . . . . . . . . . . . . 26, 29

Georgia Technical Institute . . . . . . . . . . . . . . . . . . . 39

HHarvard Medical School. . . . . . . . . . . . . . . . . . . . . . 7

Harvard University . . . . . . . . . . . . . . . . . . . . . . . . 38

IIndiana University . . . . . . . . . . . . . . . . . . . . . . . . . 41

Institute for Genomic Research . . . . 18, 20, 29, 58, 60

Integrated Genomics, Inc. . . . . . . . . . . . . . . . . 25, 64

KKeck Graduate Institute . . . . . . . . . . . . . . . . . . . . . 34

LLawrence Berkeley National Laboratory . . . . 8, 18, 71

Lawrence Livermore National Laboratory. . 39, 61, 69

Los Alamos National Laboratory . . 16, 18, 45, 47, 61

Louisiana State University and A & M College . . . . 60

MMarine Biological Laboratory . . . . . . . . . . . . . . . . . 60

Massachusetts Institute of Technology . . . . . . . . . . . 7

Michigan State University. . . . . . . . . . . 30, 35, 56, 70

NNational Center for Biotechnology Information . . . 59

National Center for Genome Resources . . . . . . . . . 19

Northwestern University . . . . . . . . . . . . . . . . . . . . 25

Northwestern University Medical School . . . . . . . . 25

OOak Ridge National Laboratory . . . . . . . 8–11, 13–14,

16, 18–19, 23, 35, 42, 51, 53, 56, 59, 70

Ohio State University. . . . . . . . . . . . . . . . . . . . 51, 53

Oregon State University . . . . . . . . . . . . . . . . . . 51, 70

PPacific Northwest National Laboratory. . 8–11, 13–14,

31, 43, 47, 53–54, 56, 59, 62–63, 72–73

Genomes to Life I 103

Page 112: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

SSandia National Laboratories . . . 8, 10, 13, 16, 18–19

Scripps Research Institute . . . . . . . . . . . 27, 36, 62–63

Stanford University . . . . . . . . . . . . . . . . . . . . . . . . 32

UUniversity of Arizona. . . . . . . . . . . . . . . . . . . . . . . 53

University of British Columbia . . . . . . . . . . . . . . . . 53

University of California, Berkeley . . . . . . . . . . . . 8, 48

University of California, Davis . . . . . . . . . . . . . . . . 61

University of California, Los Angeles . . . . . 53–54, 64

University of California, San Diego. . . . 18, 26, 36, 56

University of California, Santa Barbara . . . . . . . . . . 18

University of Colorado. . . . . . . . . . . . . . . . . . . . . . 33

University of Delaware. . . . . . . . . . . . . . . . . . . 46, 64

University of Georgia . . . . . . . . . . . . . . . . . . . . 57, 62

University of Illinois, Urbana . . . . . . . . . . . . . . 18, 62

University of Iowa . . . . . . . . . . . . . . . . . . . 51, 53, 70

University of Louisville . . . . . . . . . . . . . . . . . . 51, 61

University of Massachusetts . . . . . . . . . . . . . . . 20, 62

University of Massachusetts, Amherst . . . . . . . . 26, 29

University of Michigan. . . . . . . . . . . . . . . . . . . 16, 18

University of Minnesota . . . . . . . . . . . . . . . . . . . . . 37

University of Mississippi Medical Center. . . . . . . . . 64

University of Missouri, Columbia. . . . . . . . . . . . . . . 8

University of Missouri, St. Louis . . . . . . . . . . . . . . 61

University of North Carolina . . . . . . . . . . . . . . 10, 13

University of Notre Dame . . . . . . . . . . . . . . . . . . . 25

University of Pennsylvania . . . . . . . . . . . . . . . . . . . 36

University of Southern California . . . . . . . . . . . 56, 62

University of Southern California, San Diego . . . . . 18

University of Tennessee . . . . . . . . . . . . . . . . . . . . . 20

University of Texas Medical School, Houston 31, 64, 67

University of the Health Sciences . . . . . . . . . . . 59, 70

University of Utah . . . . . . . . . . . . . . . . . . . 10, 13, 61

University of Washington. . . . . . . . . . . . . . . . . . . . 57

University of Washington, Seattle . . . . . . . . . . . . 8, 71

University of Wisconsin, Madison. . . . . . . . . . . 64, 67

University of Wyoming . . . . . . . . . . . . . . . . . . . . . 64

WWadsworth Center . . . . . . . . . . . . . . . . . . . . . . . . . 38

YYale University . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

104 Genomes to Life I

Page 113: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,

AGENDAGenomes to Life Contractor-Grantee Workshop I

February 9–12, 2003Sheraton National Hotel, Arlington, Virginia

Sunday, February 97:00–9:00 p.m. No-host mixer, 16th floor, Galaxy Ballroom

Monday, February 108:30 a.m. Welcome, logistics, program overview,

introduction to breakout sessions

9:30 George Church, Harvard Univ.

10:00 Michelle Buchanan, ORNL

10:30 BREAK

11:00 Grant Heffelfinger, SandiaNational Lab.

11:30 Derek Lovley, Univ. of Mass. at Amherst

12:00 Adam Arkin, LBNL

12:30–2:00 LUNCH

2:00–4:00 Session A Breakouts

4:00–4:30 BREAK

4:30–6:30 Poster Session A. Poster boards at themeeting will be numbered; posterplacement will correspond to posternumber as indicated on abstract listing.

Tuesday, February 118:30 a.m. Lucy Shapiro, Stanford Univ.

9:00 Carrie Harwood, Univ. of Iowa

9:30 Bernhard Palsson, Univ. of Ca., San Diego

10:00 BREAK

10:30 Town Hall – Data Sharing

11:30 Production Genomics Facility – DNAsequencing capabilities, process, issues

12:00 Related programs at other federalagencies:

NIGMS (NIH), Jerry Li; NSF, Matt Kane;DARPA, Sri Kumar

12:30 LUNCH

2:00–4:30 Session B Breakouts

4:30–6:30 Poster Session B. Poster boards at themeeting will be numbered; posterplacement will correspond to posternumber as indicated on abstract listing.

Wednesday, February 128:30–10:00 a.m. Shewanella Federation

8:30–8:45 Shewanella Federation Overview –Jim Fredrickson, PNNL

8:40–8:55 Cultivation Techniques & InitialCollaborative Experiments –Yuri Gorby, PNNL

8:55–9:10 Microarray Analyses – Jizhong Zhou,ORNL

9:10–9:30 Proteome Analyses Accurate Mass Tags –Richard Smith, PNNL2D Gels – Carol Giometti, ANL

9:30–9:45 Data Analysis – Eugene Kolker, BIATECH

9:45–10:00 Future Directions using ContinuousCultivation – Ken Nealson, USC

10:00 BREAK

10:30–12:00 Town Hall – GTL Facilities Discussion –Marvin Frazier, DOE

12:00–12:45 Blue Skies Forum – Craig Venter, Institutefor Biological Energy Alternatives

12:45 Adjourn

Breakout SessionsBreakout sessions will focus on key topics of interest tothe Genomes to Life program. They are intended to beopen discussion sessions for all interested participants. Thesessions will begin with several short presentations to helpstimulate discussion and will be led by several co-chairs.Sessions may focus on several key questions or challenges.

1. Data standards, data sharing, software openness:Session A Co-chairs – Ed Uberbacher, ORNL;Mike Colvin, LLNL; Arie Shoshani, LBNL

2. Environmental genomics / microbial communities,including computational needs:Session A Co-chairs – Derek Lovley, Univ. of Massachu-setts, Amherst; Barbara Methe, TIGR

3. Proteomics, molecular machines, including computa-tional needs:Session A Co-chairs – Michelle Buchanan, ORNL;Grant Heffelfinger, Sandia

4. New insights from comparative genomics:Session B Co-chairs – Jim Fredrickson, PNNL;Susan Lucas, DOE Joint Genome Institute

5. Technology development / integration:Session B Chair – George Church, Harvard Univ.

6. Regulatory networks, including computational needs:Session B Co-chairs – Adam Arkin, LBNL;Bob Tabita, Ohio State Univ.

Page 114: Genomes to Life Program - Energy.gov · 2019. 9. 3. · Genomes to Life Program Gary Johnson U.S. Department of Energy (SC-30) Office of Advanced Scientific Computing Research 301/903-5800,