BRC 2011 Session #4 – “Omics” Data

BRC 2011 Session #4 – “Omics” Data. Session #4 - Outline Challenges and Opportunities pathogen datasets; host datasets; integrating pathogen-host datasets

Session #4 - Outline

Challenges and Opportunities: pathogen datasets; host datasets; integrating pathogen-host datasets
BRC approach to managing “omics” data: mRNAs, ncRNAs, RNAi, proteomics, metabolomics; systems-level analysis

Francis Ouellette – “Interesting Gene List” visualization and analysis & training approaches

Ideas from Systems Biology and DBP interactions
Talking Points
Open discussion

Session #4 – Opportunities

Andrew R. Joyce & Bernhard Ø. Palsson, Nature Reviews Molecular Cell Biology 7, 198-210 (March 2006)

Session #4 – Challenges

Approach to “omics” data is somewhat pathogen-specific:
Host “omics” data is relevant for bacteria, viruses and parasites; less so for vectors
Pathogen “omics” data is relevant for bacteria, parasites and vectors; less so for viruses

What kind of “omics” data should be supported by BRCs?
Pathogen vs host: mRNA, ncRNA, RNAi, proteomics, metabolomics, lipidomics, others
Raw, minimally processed or highly interpreted (status of NCBI SRA)
Results data and metadata

What should we do with the data?
Make available for download
Make available for browsing
Make available for visualization
Make available for analysis

Current infrastructure is focused largely on genomics:
Genome sequence and gene/protein annotations for the pathogens; no infrastructure for host genes (some progress on web services)
Analysis and visualization tools are focused on comparative genomics; few tools for “omics” data analysis and visualization

Standard nomenclature for naming our datasets so that they can be more easily identified and exchanged
How to acquire datasets of sufficient quality and quantity

Reliable sourcing of data, and acquisition from diverse off-site providers in real time
Availability of data and metadata in public resources: lack of standards; difficult to access

Data quality, reliability, and reproducibility
Technology/platform bias and lab-to-lab variation
Noise in data and false positives
Metadata-driven analysis requires manual curation to separate signal from noise

Projecting omics data and its interpretation onto closely related organisms
Using omics data to improve annotations
Moving from data integration to knowledge integration

Session #4 – Opportunities

There are currently no organized resources for viral pathogen host-response/host-factor data; such a resource would be very useful to the virology community

Many BRC groups have extensive experience with microarray data and network analysis that could be leveraged

Host data is becoming increasingly relevant for novel drug discovery
Using networks to relate different kinds of data
Asking system-level biological questions that cannot be answered by any one “omics” data type alone
Visualizing multiple layers of information simultaneously: how many tracks can one realistically add before a new approach is needed?
Using omics data to identify/validate/correct gene models and gene functions, regulatory elements, metabolic and signaling pathways, and phenotypes
Developing simple tools and pipelines to enable high-throughput (HT) processing of omics data other than sequencing and transcriptomics

Talking points

Approach to “omics” data management:
Raw vs minimally processed vs interpreted results
Facilitating relevant data capture from targeted projects
Capturing other high-value related data
Adoption and use of data standards, especially for metadata

Utility of visualization and analysis of IGLs
Support for re-analysis of primary “omics” data
What to do with non-gene/protein-centric “omics” data

“INTERESTING GENE LIST” VISUALIZATION AND ANALYSIS & TRAINING APPROACHES

Francis Ouellette

Overview of Systems Biology & DBP Projects

Four systems biology groups funded by NIAID, including:
Systems Virology (Michael Katze group, Univ. Washington): Influenza H1N1 and H5N1 and SARS coronavirus statistical models, algorithms and software, raw and processed gene expression data, and proteomics data
Systems Influenza (Alan Aderem group, Institute for Systems Biology): various influenza virus microarray, mass spectrometry, and lipidomics data

ViPR Driving Biological Projects:
Abraham Brass, Mass. General Hospital: Dengue virus host factor database from RNAi screen
Lynn Enquist / Moriah Szpara, Princeton University: deep sequencing and neuronal microarrays for functional genomic analysis of Herpes Simplex Virus

Proposal for “Omics” Data

1. “Omics” data management (host)

a) Project metadata

b) Assay/experiment metadata

c) Data analysis metadata

d) Primary results

e) Derived results (e.g. “interesting gene lists” (IGLs))

2. Add additional related datasets

3. Visualize IGLs in context of biological pathways and networks

4. Statistical analysis of pathway sub-network overrepresentation

5. Re-analysis of primary data using assembled pipeline tools

What level of data should be stored and made accessible

Primary results data: need to define what is considered “primary” data for each platform

Microarray example: raw image files (.tiff) vs probe intensity values (.cel)

Opportunity for re-processing leading to re-interpretation

Derived/processed results:
“Interesting gene lists” from microarray, RNAi, proteomics, and other experimental platforms
“Interesting metabolite lists”
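The slides do not say how an “interesting gene list” is derived from primary results. As a minimal sketch, assuming a processed expression table with a per-gene log2 fold change and an adjusted p-value, an IGL could be a simple two-threshold filter. The GeneResult type, the function name, and the default cutoffs (|log2 FC| ≥ 1, adjusted p ≤ 0.05) are hypothetical choices for illustration, and the example values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneResult:
    """One row of a processed expression table (hypothetical schema)."""
    gene: str
    log2_fold_change: float
    adj_p_value: float


def interesting_gene_list(results, min_abs_lfc=1.0, max_adj_p=0.05):
    """Return sorted gene IDs passing both cutoffs.

    The default cutoffs are illustrative assumptions, not values from the slides.
    Up- and down-regulated genes both qualify because the fold change is
    compared in absolute value.
    """
    return sorted(
        r.gene
        for r in results
        if abs(r.log2_fold_change) >= min_abs_lfc and r.adj_p_value <= max_adj_p
    )


if __name__ == "__main__":
    # Made-up values for illustration only.
    table = [
        GeneResult("GENE_A", 3.2, 0.001),
        GeneResult("GENE_B", 0.1, 0.90),
        GeneResult("GENE_C", 2.4, 0.01),
        GeneResult("GENE_D", -1.5, 0.20),
        GeneResult("GENE_E", -1.8, 0.003),
    ]
    print(interesting_gene_list(table))  # ['GENE_A', 'GENE_C', 'GENE_E']
```

Storing the cutoffs alongside the list is what the data analysis metadata level is meant to capture, so the IGL can be regenerated from the primary results.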

Metadata (MIBBI-compliant)

Project-Level Metadata:
Hypothesis, rationale, study design, etc.
Publications and links pertaining to the project
Data providers: PI, other key personnel, affiliations, contact information

Assay-Level Metadata:
Sample source and characteristics of source
Sample type
Source/sample treatment information
Assay details

Data Processing/Analysis-Level Metadata:
Algorithm(s) used for transforming primary to derived data
Configuration parameters
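One way to keep the three metadata levels distinct is a small nested record, sketched below. The class and field names are hypothetical and are not taken from any MIBBI checklist; a real implementation would map each field onto the relevant minimum-information standard (for example MIAME for microarrays). The helper simply lists empty fields, the kind of completeness check a submission pipeline might run.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ProjectMetadata:
    hypothesis: str
    rationale: str
    study_design: str
    publications: list = field(default_factory=list)    # citations / links
    data_providers: list = field(default_factory=list)  # PI, key personnel, affiliations, contacts


@dataclass
class AssayMetadata:
    sample_source: str   # source and its characteristics
    sample_type: str
    treatment: str       # source/sample treatment information
    assay_details: str


@dataclass
class AnalysisMetadata:
    algorithms: list     # algorithm(s) transforming primary to derived data
    parameters: dict     # configuration parameters


@dataclass
class DatasetRecord:
    project: ProjectMetadata
    assay: AssayMetadata
    analysis: AnalysisMetadata


def missing_fields(record):
    """Return 'level.field' names whose values are empty (a simple completeness check)."""
    missing = []
    for level_name in ("project", "assay", "analysis"):
        level = getattr(record, level_name)
        for f in fields(level):
            if not getattr(level, f.name):
                missing.append(f"{level_name}.{f.name}")
    return missing


if __name__ == "__main__":
    # Placeholder strings only; no real study is described.
    record = DatasetRecord(
        ProjectMetadata("hypothesis text", "rationale text", "", ["link placeholder"]),
        AssayMetadata("host cell line", "total RNA", "infected vs mock", "expression microarray"),
        AnalysisMetadata(["normalization step"], {}),
    )
    print(missing_fields(record))
    # ['project.study_design', 'project.data_providers', 'analysis.parameters']
```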

Interpretation of “Interesting Gene Lists”

Visualizing overrepresentation of interesting gene lists in protein-protein networks and/or biological pathways

Statistical assessment of enrichment
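The slides do not name the enrichment test. A common choice, assumed here, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). With N genes in the background, K of them annotated to a pathway, network module, or GO term, and an IGL of n genes of which k fall in that category, the p-value is the probability of observing k or more hits by chance; the expected count and fold enrichment follow from the same quantities:

```latex
P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad
\mathbb{E}[X] = \frac{nK}{N},
\qquad
\text{fold enrichment} = \frac{k}{nK/N}
```

When many pathways or modules are tested at once, these p-values are usually adjusted for multiple testing (for example with the Benjamini-Hochberg procedure).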

Visualizing Hits from Interesting Gene Lists

Select dataset(s) of interest
Choose all (or a subset) of genes on the list
Intersect/subtract between studies

Visualize selected genes as a biological network
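The intersect/subtract step amounts to set operations over the gene identifiers of each selected study. A minimal sketch, assuming every list has already been mapped to the same identifier namespace (e.g. human gene symbols); the function name and the mode strings are hypothetical.

```python
def combine_gene_lists(gene_lists, mode="intersect"):
    """Combine IGLs from several studies (hypothetical helper).

    mode="intersect": genes present in every list.
    mode="subtract":  genes in the first list that appear in none of the others.
    Assumes all lists already use the same identifier namespace.
    """
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return set()
    first, others = sets[0], sets[1:]
    if mode == "intersect":
        return first.intersection(*others)
    if mode == "subtract":
        return first.difference(*others)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    study_a = ["GENE1", "GENE2", "GENE3"]
    study_b = ["GENE2", "GENE3", "GENE4"]
    print(sorted(combine_gene_lists([study_a, study_b], "intersect")))  # ['GENE2', 'GENE3']
    print(sorted(combine_gene_lists([study_a, study_b], "subtract")))   # ['GENE1']
```

The resulting set is what would then be handed to a network viewer for display.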

“Quick & Dirty” Overrepresentation Visualization

Reactome SkyPainter:
Limited to reactions and interactions found in the Reactome database
Visualizes the “big picture” using pathway representations

Constructed using the gene list from an HCV study:
HCV host factors residing in the nucleus

Ribonucleoprotein complex, transcription factors, kinases, protein metabolism/modification, nucleic acid binding / metabolism

Visualizing Hits from Gene Lists (Cytoscape)

Statistical Enrichment Analysis

Gene Ontology biological process overrepresentation (CLASSIFI)

Protein interaction network module enrichment (PINME) analysis:
Obtain all known human protein-protein interactions from BioGRID
Determine module (sub-network) structures (e.g. using dMoNet)
Identify the function of modules (e.g. using CLASSIFI)
Determine overrepresentation statistics for IGLs
Visualize results
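The overrepresentation step of PINME can be sketched as a hypergeometric test per module, assuming the modules have already been computed. The module-finding step (e.g. dMoNet) is not reproduced here; modules are passed in as plain sets. The function names are hypothetical, and the Benjamini-Hochberg correction across modules is an added assumption, not something the slides specify. Only IGL genes present in the interaction network are counted, since genes outside the network cannot fall in any module.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in module, n IGL genes)."""
    # comb() returns 0 when the lower argument exceeds the upper, so impossible
    # terms contribute nothing and the sum over the full support equals comb(N, n).
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def module_enrichment(modules, network_genes, igl):
    """Score each module for overrepresentation of IGL genes.

    modules:       dict of module name -> iterable of gene IDs (hypothetical input)
    network_genes: all gene IDs in the interaction network (the background)
    igl:           the interesting gene list
    Returns (module, hits k, module size K, p-value, BH q-value) sorted by p-value.
    """
    background = set(network_genes)
    hits = set(igl) & background  # IGL genes outside the network cannot be hits
    N, n = len(background), len(hits)
    rows = []
    for name, members in modules.items():
        member_set = set(members) & background
        K = len(member_set)
        k = len(member_set & hits)
        rows.append((name, k, K, hypergeom_upper_tail(k, N, K, n)))
    qvals = benjamini_hochberg([row[3] for row in rows])
    scored = [row + (q,) for row, q in zip(rows, qvals)]
    return sorted(scored, key=lambda row: row[3])


if __name__ == "__main__":
    # Toy network of ten genes with two modules; identifiers are placeholders.
    network = [f"G{i}" for i in range(1, 11)]
    modules = {"M1": {"G1", "G2", "G3"}, "M2": {"G4", "G5", "G6", "G7", "G8"}}
    for row in module_enrichment(modules, network, ["G1", "G2", "G3", "G9", "X99"]):
        print(row)
```

In the toy run, all three members of M1 are hits out of four network-mapped IGL genes, giving p = 7/210, while M2 has no hits and p = 1.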

Modules in Networks
