Analysis Environments For Functional Genomics

Bruce R. Schatz, Institute for Genomic Biology, University of Illinois at Urbana-Champaign. [email protected], www.beespace.uiuc.edu. Informatics Research, First Annual BeeSpace Workshop, June 6, 2005.


Page 1: Analysis Environments For Functional Genomics

Analysis Environments For Functional Genomics

Bruce R. Schatz
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
[email protected], www.beespace.uiuc.edu

Informatics Research
First Annual BeeSpace Workshop
June 6, 2005

Page 2: Analysis Environments For Functional Genomics

What are Analysis Environments

Functional Analysis
- Find the underlying Mechanisms
- Of Genes, Behaviors, Diseases

Comparative Analysis
- Top-down data mining (vs. Bottom-up)
- Multiple Sources, especially literature

Page 3: Analysis Environments For Functional Genomics

Building Analysis Environments

Manual, by Humans
- Interaction: user navigation
- Classification: collection indexing

Automatic, by Computers
- Federation: search bridges
- Integration: results links

Page 4: Analysis Environments For Functional Genomics

Needles and Haystacks

Genes
- Honey Bees have 13K genes
- Perhaps 100 have known functions

Paths
- Perhaps 30K protein families exist
- KEGG has 200 known pathways

Statistical Clustering for Interactive Discovery
Across Two Orders of Magnitude!

Page 5: Analysis Environments For Functional Genomics

Trends in Analysis Environments

Central versus Distributed Viewpoints

The 90s: Pre-Genome
- Entrez (NIH NCBI) versus WCS (NSF Arizona)

The 00s: Post-Genome
- GO (NIH curators) versus BeeSpace (NSF Illinois)

Page 6: Analysis Environments For Functional Genomics

Pre-Genome Environments

Focused on Syntax, pre-Web

WCS (Worm Community System)
- Search words across sources
- Follow links across sources
- Words automatic, Links manual

Towards Uniform Searching

Page 7: Analysis Environments For Functional Genomics

Post-Genome Environments

Focused on Semantics, post-Web

BeeSpace (Honey Bee Inter Space)
- Navigate concepts across sources
- Integrate data across sources
- Concepts automatic, Links automatic

Towards Question Answering

Page 8: Analysis Environments For Functional Genomics

Worm Community System

WCS Information:
- Literature: BIOSIS, MEDLINE, newsletters, meetings
- Data: Genes, Maps, Sequences, strains, cells

WCS Functionality:
- Browsing: search, navigation
- Filtering: selection, analysis
- Sharing: linking, publishing

WCS: 250 users at 50 labs across Internet (1991)

Page 9: Analysis Environments For Functional Genomics

WCS Molecular

Page 10: Analysis Environments For Functional Genomics

WCS Cellular

Page 11: Analysis Environments For Functional Genomics

WCS invokes gm

Page 12: Analysis Environments For Functional Genomics

WCS vis-à-vis acedb

Page 13: Analysis Environments For Functional Genomics

Towards the Interspace

from Objects to Concepts
from Syntax to Semantics

Infrastructure is Interaction with Abstraction
- Internet is packet transmission across computers
- Interspace is concept navigation across repositories

Page 14: Analysis Environments For Functional Genomics

THE THIRD WAVE OF NET EVOLUTION

PACKETS

OBJECTS

CONCEPTS

Page 15: Analysis Environments For Functional Genomics

LEVELS OF INDEXES

[Figure labels: Technology, Engineering, Electrical; FORMAL (manual), INFORMAL (automatic); IEEE, communities, groups, individuals]

Page 16: Analysis Environments For Functional Genomics

Navigation in MEDSPACE

For a patient with Rheumatoid Arthritis, find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding.

Choose Domain

Page 17: Analysis Environments For Functional Genomics

Concept Search

Page 18: Analysis Environments For Functional Genomics

Concept Navigation

Page 19: Analysis Environments For Functional Genomics

Retrieve Document

Page 20: Analysis Environments For Functional Genomics

Navigate Document

Page 21: Analysis Environments For Functional Genomics

Post-Genome Informatics I

Comparative Analysis within the Dry Lab of Biological Knowledge

Classical Organisms have Genetic Descriptions. There will be NO more classical organisms beyond Mice and Men, Worms and Flies, Yeasts and Weeds.

Must use comparative genomics on classical organisms, via sequence homologies and literature analysis.

Page 22: Analysis Environments For Functional Genomics

Post-Genome Informatics II

Functional Analysis within the Dry Lab of Biological Knowledge

Automatic annotation of genes to standard classifications, e.g. Gene Ontology via homology on computed protein sequences.

Automatic analysis of functions to scientific literature, e.g. concept spaces via text extractions. Thus must use functions in literature descriptions.

Page 23: Analysis Environments For Functional Genomics

Conceptual Navigation in BeeSpace

[Figure labels. Sources: Neuroscience Literature, Molecular Biology Literature, Bee Literature, FlyBase, WormBase, Bee Genome, Brain Region Localization, Brain Gene Expression Profiles. Users: Behavioral Biologist, Molecular Biologist, Neuroscientist]

Page 24: Analysis Environments For Functional Genomics

BeeSpace Analysis Environment

Build Concept Space of Biomedical Literature for Functional Analysis of Bee Genes
- Partition Literature into Community Collections
- Extract and Index Concepts within Collections
- Navigate Concepts within Documents
- Follow Links from Documents into Databases

Locate Candidate Genes in Related Literatures, then follow links into Genome Databases

Page 25: Analysis Environments For Functional Genomics

Question Answering

| Behaviour | Organism | Gene | Molecular Function | Reference |
|---|---|---|---|---|
| Foraging | | | | |
| Rover vs sitter phenotype | Drosophila melanogaster | for | Protein kinase G | 8 |
| Roamer vs dweller phenotype | C. elegans | egl-4 | Protein kinase G | 16 |
| Division of labour: age at onset of foraging | Apis mellifera | for | Protein kinase G | 9 |
| Division of labour: age at onset of foraging | Apis mellifera | mvl | Mn transporter | 19 |
| Division of labour: foraging-related? | Apis mellifera | per | Transcription cofactor | 68 |
| Division of labour: foraging-related? | Apis mellifera | ache | Acetylcholine esterase | 69 |
| Division of labour: foraging-related? | Apis mellifera | IP(3)K | Inositol signaling | 70 |
| Foraging specialization: nectar vs. pollen | Apis mellifera | pkc | Protein kinase C | 71 |
| Social feeding | Drosophila melanogaster | dpnf | Neuropeptide Y (NPY) homolog | 21 |
| Social feeding (aggregation) | C. elegans | npr-1 | Receptor for NPY | 22, 23 |

Page 26: Analysis Environments For Functional Genomics

Functional Phrases

<gene> encodes <chemical>
- Sokolowski and colleagues demonstrated in Drosophila melanogaster that the foraging gene (for) encodes a cGMP-dependent protein kinase (PKG).
- The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG).

<chemical> affects/causes <behavior>
- Thus, PKG levels affected food-search behavior.
- cGMP treatment elevated PKG activity and caused foraging behavior.

<gene> regulates <behavior>
- Amfor, an ortholog of the Drosophila for gene, is involved in the regulation of age at onset of foraging in honey bees.
- This idea is supported by results for malvolio (mvl), which encodes a manganese transporter and is involved in regulating Drosophila feeding and age at onset of foraging in honey bees.
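The three phrase templates above can be illustrated with a small pattern matcher. This is only a sketch under stated assumptions: the regular expressions, the `TEMPLATES` table, and the `extract` function are hypothetical names written for this example, they are deliberately narrow, and they are not the entity recognition or parsing used in the BeeSpace prototype.

```python
import re

# Hypothetical, deliberately narrow templates modeled on the slide's patterns.
TEMPLATES = {
    # "<gene> gene (abbr) encodes <chemical>."
    "gene encodes chemical": re.compile(
        r"(?P<gene>[\w-]+) gene(?: \([\w-]+\))? encodes (?P<chemical>.+?)\.?$"
    ),
    # "<chemical> levels/activity/treatment ... affected/caused <behavior>"
    "chemical affects behavior": re.compile(
        r"(?P<chemical>[\w-]+) (?:levels|activity|treatment)\b.*?"
        r"\b(?:affect(?:s|ed)?|caus(?:e|es|ed))\s+"
        r"(?P<behavior>[\w-]+(?: [\w-]+)*? behavior)"
    ),
    # "<gene>, ..., is involved in the regulation of <behavior>."
    "gene regulates behavior": re.compile(
        r"(?P<gene>[\w-]+),.*?\binvolved in the regulation of (?P<behavior>.+?)\.?$"
    ),
}


def extract(sentence):
    """Return a list of (template_name, slots) for every template that matches."""
    hits = []
    for name, pattern in TEMPLATES.items():
        match = pattern.search(sentence.strip())
        if match:
            hits.append((name, match.groupdict()))
    return hits


if __name__ == "__main__":
    for s in [
        "The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG).",
        "Thus, PKG levels affected food-search behavior.",
        "cGMP treatment elevated PKG activity and caused foraging behavior.",
    ]:
        print(extract(s))
```

Hand-written templates like these miss most paraphrases (for example "is involved in regulating"), which is one reason a real system would rely on parsing and entity recognition rather than surface patterns.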

Page 27: Analysis Environments For Functional Genomics

Data Integration (FlyBase Gene)

D. melanogaster gene foraging, abbreviated as for, is reported here. It has also been known in FlyBase as BcDNA:GM08338, CG10033 and l(2)06860. It encodes a product with cGMP-dependent protein kinase activity (EC:2.7.1.-) involved in protein amino acid phosphorylation which is a component of the cellular_component unknown. It has been sequenced and its amino acid sequence contains an eukaryotic protein kinase, a protein kinase C-terminal domain, a tyrosine kinase catalytic domain, a serine/threonine protein kinase family active site, a cAMP-dependent protein kinase and a cGMP-dependent protein kinase. It has been mapped by recombination to 2-10 and cytologically to 24A2--4. It interacts genetically with Csr. There are 27 recorded alleles: 1 in vitro construct (not available from the public stock centers), 25 classical mutants (3 available from the public stock centers) and 1 wild-type. Mutations have been isolated which affect the larval nerve terminal and are behavioral, pupal recessive lethal, hyperactive, larval neurophysiology defective and larval neuroanatomy defective. for is discussed in 80 references (excluding sequence accessions), dated between 1988 and 2003. These include at least 6 studies of mutant phenotypes, 2 studies of wild-type function, 3 studies of natural polymorphisms and 7 molecular studies. Among findings on for function, for activity levels influence adult olfactory trap response to a food medium attractant. Among findings on for polymorphisms, the frequency of for^R and for^s strains in three natural populations are studied to determine the contribution of the local parasitoid community to the differences in for^R and for^s frequencies.

Page 28: Analysis Environments For Functional Genomics

BeeSpace Information Sources

Biomedical Literature
- Medline (medicine)
- Biosis (biology)
- Agricola, CAB Abstracts, Agris (agriculture)

Model Organisms (heredity)
- Gene Descriptions (FlyBase, WormBase)

Natural Histories (environment)
- BeeKeeping Books (Cornell, Harvard)

Page 29: Analysis Environments For Functional Genomics

Medical Concept Spaces (1998)

Medical Literature (Medline, 10M abstracts)
- Partition with Medical Subject Headings (MeSH)

Community is all abstracts classified by core term
- 40M abstracts containing 280M concepts
- computation is 2 days on NCSA Origin 2000

Simulating World of Medical Communities
- 10K repositories with > 1K abstracts (1K with > 10K)
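As a rough illustration of the partition-then-index idea on this slide, the sketch below groups abstracts into communities by a subject heading and then counts term co-occurrence inside one community. Treating a concept space as a pairwise co-occurrence table is an assumption made for this example; the function names and the toy abstracts are hypothetical and are not taken from the 1998 computation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def partition_by_heading(abstracts):
    """Group abstracts into communities keyed by their subject heading."""
    communities = defaultdict(list)
    for heading, terms in abstracts:
        communities[heading].append(terms)
    return communities


def concept_space(community):
    """Count how often each unordered pair of terms shares an abstract."""
    pairs = Counter()
    for terms in community:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


def related(space, term):
    """Rank the terms that co-occur with `term`, strongest first."""
    scores = Counter()
    for (a, b), count in space.items():
        if a == term:
            scores[b] += count
        elif b == term:
            scores[a] += count
    return scores.most_common()


# Toy data (hypothetical): (heading, extracted terms) per abstract.
abstracts = [
    ("Arthritis, Rheumatoid", ["analgesic", "pain", "gastrointestinal bleeding"]),
    ("Arthritis, Rheumatoid", ["analgesic", "pain"]),
    ("Arthritis, Rheumatoid", ["pain", "inflammation"]),
    ("Feeding Behavior", ["protein kinase", "foraging"]),
]

communities = partition_by_heading(abstracts)
space = concept_space(communities["Arthritis, Rheumatoid"])
print(related(space, "pain"))
```

Because every pair of terms in an abstract is counted, the cost grows quadratically with the number of terms per abstract, which gives some intuition for why the full-scale computation needed a supercomputer.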

Page 30: Analysis Environments For Functional Genomics

Biological Concept Spaces (2006)

Compute concept spaces for All of Biology
- BioSpace across entire biomedical literature
- 50M abstracts across 50K repositories

Use Gene Ontology to partition literature into biological communities for functional analysis
- GO same scale as MeSH, but adequate coverage?
- GO light on social behavior (biological process)

Page 31: Analysis Environments For Functional Genomics

Concept Switching

In the Interspace…

each Community maintains its own repository

Switching is navigating Across repositories

use your specialty vocabulary to search another specialty

Page 32: Analysis Environments For Functional Genomics

CONCEPT SWITCHING

“Concept” versus “Term”
- a concept is a set of “semantically” equivalent terms

Concept switching
- region to region (set to set) match

[Figure labels: term, Semantic region, Concept Space]
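The set-to-set match can be sketched as follows. This example assumes each semantic region is simply a set of terms and scores candidate regions by Jaccard overlap; the Jaccard measure, the function names, and the toy regions are all assumptions chosen for illustration, not the matching method used in the Interspace prototypes.

```python
def jaccard(a, b):
    """Overlap between two term sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def switch_concept(source_region, target_space):
    """Return (region_name, score) for the best-matching region in target_space.

    Raises ValueError if target_space is empty.
    """
    return max(
        ((name, jaccard(source_region, region)) for name, region in target_space.items()),
        key=lambda pair: pair[1],
    )


# Toy data (hypothetical): a region from one community's vocabulary...
source_region = {"foraging", "food search", "feeding"}

# ...matched against the regions of another community's concept space.
target_space = {
    "locomotion": {"movement", "roaming", "dwelling"},
    "feeding behavior": {"feeding", "food search", "appetite"},
}

print(switch_concept(source_region, target_space))
```

Matching whole regions rather than single terms is what lets a query phrased in one specialty's vocabulary land on the closest concept in another, even when few individual terms are shared.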

Page 33: Analysis Environments For Functional Genomics

Biomedical Session

Page 34: Analysis Environments For Functional Genomics

Categories and Concepts

Page 35: Analysis Environments For Functional Genomics

Concept Switching

Page 36: Analysis Environments For Functional Genomics

Document Retrieval

Page 37: Analysis Environments For Functional Genomics

Interactive Functional Analysis

BeeSpace will enable users to navigate a uniform space of diverse databases and literature sources for hypothesis development and testing, with a software system beyond a searchable database, using literature analyses to discover functional relationships between genes and behavior.

- Genes to Behaviors
- Behaviors to Genes
- Concepts to Concepts
- Clusters to Clusters
- Navigation across Sources

Page 38: Analysis Environments For Functional Genomics

BeeSpace Information Sources

General for All Spaces:

Scientific Literature
- Medline, Biosis, Agricola, Agris, CAB Abstracts
- partitioned by organisms and by functions

Model Organisms
- Gene Descriptions (FlyBase, WormBase, MGI, OMIM, SGD, TAIR)

Special Sources for BeeSpace:
- Natural History Books (Cornell Library, Harvard Press)

Page 39: Analysis Environments For Functional Genomics

XSpace Information Sources

- Organize Genome Databases (XBase)
- Compute Gene Descriptions from Model Organisms
- Partition Scientific Literature for Organism X
- Compute XSpace using Semantic Indexing

Boost the Functional Analysis from Special Sources
- Collecting Useful Data about Natural Histories
- e.g. CowSpace Leverage in AIPL Databases

Page 40: Analysis Environments For Functional Genomics

Towards the Interspace

The Analysis Environment technology is GENERAL!

BirdSpace? BeeSpace?
PigSpace? CowSpace?
BehaviorSpace? BrainSpace?

BioSpace… Interspace

Page 41: Analysis Environments For Functional Genomics

Prototype System

Overall Architecture and Interface: Todd Littell

- Language Parsing and Entity Recognition: Jing Jiang
- Normalization and Theme Clustering: Qiaozhu Mei
- Concept Navigation and Switching: Azadeh Shakery
- Gene Summarization and Linking: Xu Ling
- Collection Development and Navigation: Xin He

Specialty Systems
- Question Answering: Eugene Grois
- Annotation Pipeline: Pouya Kheradpour