
Network Biology Data

Biological, Conceptual and Computational Issues around Network, System, and Pathway Data

The Abstract and the Concrete

Topic Outline

• Lessons from the Genome Program and abstract ideas for transforming data into information when looking at systems data.
• Two examples of concrete tools (ready for use): WebGestalt (for large sets of genes) and Ingenuity (for networks).
• A concrete thing: the Bioinformatics Resource Center (under development).
• Other tools under development.

Human Genome Project (HGP): Past Lessons and Future Directions in Data…

Phenotype and System Data

Individualized Genotype data within populations

Genome Data

Genome-encoded “parts list” as data integrator: common data elements of genes and gene products (transcripts and proteins), enabling integration and comparison of data in NEW ways…

GeneKeyDB and related work as an integrative foundation that can help merge with other data.

Genome Data

The HGP highlighted some ways to succeed or fail with large data sets. Are the lessons learned applicable to systems biology data sets from expression, proteomics, and genetics? Yes.

But are some new approaches needed to understand SYSTEM data? Yes.

Biggest lesson: a biological data item has two questions attached to it (after Mayr). The HGP showed the importance of the “why” questions in thinking about and organizing data.

Other genotype, phenotype, system data

Genome Data

How? Why?

A datum…

Genotype + Environment + DEVELOPMENT ==> Phenotype

1) Astounding results: the importance of network thinking in development and physiology for using data to explain phenotype (e.g., PAX6).

2) HGP data approaches have some relevance, but… we need new bioinformatics tools for network data and network thinking…

HGP results and future issues for new data…

Δ data in Regulatory networks

Δ data in Cellular signaling networks

Δ data in protein coding


A way of thinking about data…

Bioinformatics: finding the (genotypic, environmental data) difference that makes the (phenotypic data) difference.

(Many of the differences that make an interesting difference lie NOT in protein coding but in complex networks.)
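A minimal sketch of “finding the difference” at the network level rather than in protein coding: compare the edge sets of one regulatory network observed under two conditions. The function name `network_difference`, the condition labels, and the gene names are hypothetical placeholders, not data or code from this work.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # directed edge: (regulator, target)

def network_difference(a: Set[Edge], b: Set[Edge]) -> Tuple[Set[Edge], Set[Edge]]:
    """Return (edges present only in a, edges present only in b)."""
    return a - b, b - a

# Hypothetical regulatory edges in two conditions (placeholder gene names).
control = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")}
treated = {("TF1", "geneA"), ("TF2", "geneC"), ("TF2", "geneB")}

lost, gained = network_difference(control, treated)
print("lost:", sorted(lost))      # lost: [('TF1', 'geneB')]
print("gained:", sorted(gained))  # gained: [('TF2', 'geneB')]
```

Here no gene is gained or lost, yet the wiring changes: geneB switches regulator, which is the kind of difference a gene-by-gene comparison would miss.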

e.g., Alon U. 2003. Science 301: 1866; Barabási AL. 2003. Linked. Plume Books; Barabási AL, Oltvai ZN. 2004. Nat. Rev. Genet. 5: 101.

What is a “Network” way of viewing data…

A Biological network can be expressed and manipulated in terms of “graph theory.”

Combinatorial algorithms are needed to analyze graphs.
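As a small illustration of treating a network as a graph, the sketch below stores an undirected network as an adjacency list (node mapped to its set of neighbours) and runs breadth-first search, one of the simplest combinatorial graph algorithms, to find a shortest path. The node names and the helpers `add_edge` and `shortest_path` are hypothetical; this is a sketch, not a tool from the talk.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Undirected graph stored as an adjacency list: node -> set of neighbours.
Graph = Dict[str, Set[str]]

def add_edge(g: Graph, u: str, v: str) -> None:
    """Add an undirected edge between u and v."""
    g.setdefault(u, set()).add(v)
    g.setdefault(v, set()).add(u)

def shortest_path(g: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: return one shortest path, or None if unreachable."""
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in sorted(g.get(node, ())):  # sorted for deterministic output
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None

g: Graph = {}
for u, v in [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]:
    add_edge(g, u, v)
print(shortest_path(g, "E", "D"))  # ['E', 'A', 'B', 'C', 'D']
```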

Nodes or vertices may be:
• Genes
• Gene products
• Hormones, signals
• Metabolites
• Publications
• Functional sequence elements

Edges or lines may be:
• Undirected vs. directed
• Weighted vs. unweighted
• Experimental correlation (can be undirected) vs. mechanistic & directed

[Figure: example network with weighted edges]

Could be…
• Co-expression networks
• Gene regulatory networks
• Cell-cell communication and signal transduction networks
• Phylogenetic relationships among genes, species, networks: orthology, paralogy, etc. (trees, clades, etc.)
• Gene Ontology or other directed acyclic graphs
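To make “experimental correlation (can be undirected)” and “weighted” concrete, here is a minimal sketch of a co-expression network: each gene pair whose Pearson correlation magnitude passes a cutoff becomes an undirected edge weighted by the correlation. The expression values, gene names, the `coexpression_edges` helper, and the 0.8 cutoff are all illustrative assumptions; real analyses choose thresholds and correlation measures with much more care.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(expr: Dict[str, List[float]],
                       threshold: float = 0.8) -> Dict[Tuple[str, str], float]:
    """Undirected, weighted edges between genes whose |r| >= threshold."""
    edges: Dict[Tuple[str, str], float] = {}
    for g1, g2 in combinations(sorted(expr), 2):  # each unordered pair once
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = round(r, 3)  # sign kept: + co-induced, - opposed
    return edges

# Hypothetical expression profiles across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
print(coexpression_edges(expr))
```

Such correlation edges say nothing about direction or mechanism; a gene regulatory network would instead use directed (regulator, target) edges.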

Tightly connected modules might be found… These might be loosely analogous to a protein sequence module that is conserved, duplicated, and diverged. We might see similarity across different tissues, species, etc.
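One strict, simple notion of a “tightly connected module” is a maximal clique: a set of nodes that are all pairwise connected and cannot be extended. The sketch below enumerates maximal cliques with the classic Bron-Kerbosch algorithm (with pivoting). This is a swapped-in textbook technique, not necessarily how modules were found in this work; the edge list and the size-3 cutoff are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]

def maximal_cliques(g: Graph) -> List[Set[str]]:
    """Bron-Kerbosch with pivoting: all maximal cliques of an undirected graph."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the node with most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(g[u] & p))
        for v in list(p - g[pivot]):
            expand(r | {v}, p & g[v], x & g[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(g), set())
    return cliques

# Hypothetical interaction network: a 4-node module plus one pendant node.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("B", "D"), ("C", "D"), ("D", "E")]
g: Graph = {}
for u, v in edges:
    g.setdefault(u, set()).add(v)
    g.setdefault(v, set()).add(u)

modules = [c for c in maximal_cliques(g) if len(c) >= 3]
print([sorted(c) for c in modules])  # [['A', 'B', 'C', 'D']]
```

Real interaction data are noisy, so looser dense-subgraph definitions are often preferred; cliques are shown here only because they are the easiest module definition to state exactly.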

Data Storage & Collaborative Bioinformatics

Integrative Bioinformatics

Genotype & Phenotype Data Sets

Comparative Bioinformatics & Data Mining

Data Visualization & Stats

GeneKeyDB

Large Molecular data sets

Genetic Data

Existing Knowledge

Phenotype Data

Microarray data, proteome, etc.

MuTrack; WebQTL (Williams et al., UTHSC)

Gene-centered data integration (via GeneKeyDB, BioFoundation)

Comparative, Boolean, and other operations on gene sets & networks (WebGestalt and Ingenuity are two examples)

Comparative Cladistic Phylogenetic Analysis

Network Analysis

CS, Stats, Bio

Graph Algorithms

Sequence and Network Modularity

Network modules: duplicated, diverged, converged

We need to collaborate, integrate, and COMPARE to find differences in biological NETWORKS: Collaborative, Integrative, and Comparative Bioinformatics.

WebGestalt: Web-based Gene Set Analysis Toolkit, http://bioinfo.vanderbilt.edu/webgestalt

Bing Zhang

Can upload gene sets based on:
1) IDs (e.g., Affymetrix, LocusLink, or protein IDs from chip or proteome experiments)
2) Genome location, or…
3) Gene Ontology (common biological process, molecular function, cellular location)
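Building a gene set from a genome location amounts to an interval-overlap query against a gene annotation table. A minimal sketch, assuming a hypothetical table of (symbol, chromosome, start, end) rows with made-up coordinates; it illustrates the idea only and is not WebGestalt's implementation or input format.

```python
from typing import List, NamedTuple

class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int  # inclusive start coordinate (convention assumed)
    end: int    # inclusive end coordinate

def genes_in_region(genes: List[Gene], chrom: str, start: int, end: int) -> List[str]:
    """Symbols of genes overlapping the closed interval [start, end] on chrom."""
    return [g.symbol for g in genes
            if g.chrom == chrom and g.start <= end and g.end >= start]

# Hypothetical annotation table (coordinates are made up for illustration).
annotation = [
    Gene("geneA", "chr2", 1_000, 5_000),
    Gene("geneB", "chr2", 8_000, 12_000),
    Gene("geneC", "chr2", 20_000, 25_000),
    Gene("geneD", "chr7", 1_000, 3_000),
]
print(genes_in_region(annotation, "chr2", 4_000, 10_000))  # ['geneA', 'geneB']
```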

Manipulate data as a set of genes or gene products: RNA expression, proteome, genomics, statistical genetics, etc. all produce lists of genes that may function in a network.

1 of 3 things to do: Boolean operations on multiple sets, or retrieving orthologs.
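The Boolean operations on gene sets map directly onto set algebra (AND = intersection, OR = union, AND NOT = difference), and ortholog retrieval can be pictured as a lookup table. This sketch uses plain Python sets; the gene lists and the small mouse-to-human table are hypothetical illustrations, not WebGestalt's interface or data.

```python
# Hypothetical gene lists, e.g. from an expression study and a proteome study.
expression_hits = {"Pax6", "Sox2", "Otx2", "Six3"}
proteome_hits = {"Pax6", "Otx2", "Crx"}

both = expression_hits & proteome_hits             # AND
either = expression_hits | proteome_hits           # OR
expression_only = expression_hits - proteome_hits  # AND NOT

# Hypothetical mouse-to-human ortholog table (illustration only).
mouse_to_human = {"Pax6": "PAX6", "Sox2": "SOX2", "Otx2": "OTX2", "Crx": "CRX"}
human_orthologs = sorted(mouse_to_human[g] for g in both if g in mouse_to_human)

print(sorted(both))             # ['Otx2', 'Pax6']
print(sorted(expression_only))  # ['Six3', 'Sox2']
print(human_orthologs)          # ['OTX2', 'PAX6']
```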

2 of 3 things to do: retrieve data and other IDs.


3 of 3 things to do: “unusual” properties across the set.

e.g., which GO terms (biological processes, molecular functions, and cellular locations) are in the set? Are there any that seem to occur more often than expected…
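“More often than expected” needs a null model. One standard choice is the hypergeometric test: given N reference genes, K of them annotated with a GO term, and an uploaded set of n genes, how likely is it to see k or more annotated genes by chance? This is a sketch of that textbook test, not necessarily the exact statistic WebGestalt reports; it omits multiple-testing correction, and the counts below are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N reference genes, K annotated, n drawn).

    N: genes in the reference set (e.g. all genes on the array)
    K: reference genes annotated with the GO term
    n: genes in the uploaded set
    k: uploaded genes annotated with the GO term
    """
    total = comb(N, n)
    upper = min(K, n)  # cannot draw more annotated genes than exist or were drawn
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical counts: 20 of 200 uploaded genes carry a term that
# annotates 400 of 10,000 reference genes (n * K / N = 8 expected by chance).
print(hypergeom_enrichment_p(N=10_000, K=400, n=200, k=20))
```

Because `math.comb` works on exact integers, the ratio is computed without intermediate overflow even for large N.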

Co-occurrence of genes and publications (GRIF)
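Gene and publication co-occurrence can be computed from a publication-to-genes index by counting, for each gene pair, how many publications mention both. This turns a gene-publication bipartite graph into a weighted gene-gene network. The index below and the `cooccurrence` helper are hypothetical; GRIF's actual data source and scoring are not described here.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set

def cooccurrence(pubs: Dict[str, Set[str]]) -> Counter:
    """Count, for each gene pair, how many publications mention both genes."""
    counts: Counter = Counter()
    for genes in pubs.values():
        for pair in combinations(sorted(genes), 2):  # each pair once per paper
            counts[pair] += 1
    return counts

# Hypothetical publication -> genes index (IDs are placeholders).
pubs = {
    "pub1": {"geneA", "geneB"},
    "pub2": {"geneA", "geneB", "geneC"},
    "pub3": {"geneC", "geneD"},
}
print(cooccurrence(pubs).most_common(1))  # [(('geneA', 'geneB'), 2)]
```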

Protein Domains in set

Chromosome locations in set…

Pathways in set (1)

Pathways in set (2)

Ingenuity

A commercial tool for manipulating graphs (networks).

VU license: http://bioinfo.vanderbilt.edu/wiki/Ingenuity

(Also some open-source tools: Cytoscape, GeNetViz, etc.)

Use of the commercial tool Ingenuity by Dr. N. Deanne and Dr. Beauchamp

Pathways (3)

Bioinformatics Resource Center: developing a Bioinformatics Resource Center (BRC) that will consist of:

• Training infrastructure and applied workshops: support faculty using existing tools and databases (caBIG, custom statistical packages, NCBI genomics, imaging, molecular structure resources).
• Collaborative IT: establish accessible databases in shared cores and support faculty using these resources. …
• Integrative IT: Web sites that integrate information from disparate data sets.
• Comparative IT: systems biology, comparing data across multiple platforms to identify new patterns (tissues and cells, molecular pathways, model organisms, toxins, etc.).

(Taken from the VUMC Strategic Plan.)

Other systems…

Construction projects that can be further shaped by your needs…
• CollabCore and lab blogs
• Genepedia, GeneKeyDB, BioFoundation
• Extensions to WebGestalt
• TFCAT, GeneCAT, CladeCAT, Pazar

Acknowledgments: Bing Zhang, Stefan Kirov, Leslie Galloway, Barbara Jackson, Betty Lou Alspaugh, Oakley Crawford, Suzanne Baktash, Xinxia Peng, Harold Shanafield, Sam Wang, Adam Tebbe, Shawn Ericson, Jeff Horner

A few collaborators… Bonnie LaFleur, Shawn Levy, Phil Dexheimer, Michael Langston (CS collaborator), Wyeth Wasserman, Dan Goldowitz and the TMGC, Rob Williams et al. (WebQTL, etc.), Erich Baker, Dan Beauchamp, Natasha Deanne, Chad Johnson
