PCBC Bioinformatics Core & Committee
PCBC Steering Committee Call
Nathan Salomonis, Division of Biomedical Informatics, Cincinnati Children’s (CCHMC)
Larsson Omberg, Sage Bionetworks
Bioinformatics Working Group
• Bruce Aronow
• Nathan Salomonis
• Phillip Dexheimer
• Carolyn Lutzko
• Alex Pico
• Larsson Omberg
• Kenny Daily
• Antonis Hatzopoulos
• Winston Hide
• Shanan Sui
• Joseph Huo
• Elias Zambidis
• Michael Kyba
• Jennifer Larkin
• Lynn Schriml
• Michael Terrin
• Ling Tang
• C4
• SAGE BIONETWORKS
• VANDERBILT U.
• HARVARD U.
• JOHNS HOPKINS U.
• UCSF
• STANFORD U.
• U. MINNESOTA
• NHLBI
• ADMINISTRATIVE CORE U. MARYLAND
PCBC Bioinformatics Committee & Core
• Create structured annotations for iPSC generation and derived products (metadata).
• Provide new tools and resources for access to analysis of C4 data.
• Provide education to the PCBC and beyond.
• Spearhead informatics efforts in the consortium.
Prior Progress on Primary Aims
• Developed metadata standards for cell lines.
• Developed an advanced online portal in Synapse for direct access to:
– PCBC omics datasets
– Integrated analysis results
– Metadata
– Protocols (experimental, software)
– Other datasets
• Created specialized tools for PCBC data access:
– ToppGene progenitor signatures (pathway analysis)
– Cytoscape tool for Synapse data visualization
– AltAnalyze for integrated omics analyses and progenitor cell-type prediction
• Collaboratively wrote papers for description of these resources and datasets.
Major Updates
• PCBC Omics Portal is live (stealth release).
• Resubmitting manuscript 1 following encouraging reviews from Cell Stem Cell.
• Multitude of new interfaces, automated workflows and result sets in Synapse.
• Significant progress on differentiation manuscript analyses.
• New software to help experimentalists analyze their own omics data.
• Recent and future bioinformatics workshops.
Synapse: Online repository for PCBC data access, annotations, sharing and analysis
Key Features of Synapse
• Download PCBC Omics data from the web or programmatically (R/Python/Java).
• Easily post new datasets, images or presentations.
• “Time Machine” of Data Files and Analyses.
• Access Control – Private Work Areas.
• DOI Annotation for Direct Data Access from Publications.
• Wiki Content – Editing
• Help Desk
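The programmatic download listed among the key features above could look roughly like the following in Python. This is a minimal sketch, assuming the `synapseclient` package is installed and the user has a free Synapse account; the helper name `download_file` and the ID `syn0000000` are placeholders for illustration, not real PCBC identifiers.

```python
from pathlib import Path


def download_file(syn, synapse_id, download_dir="."):
    """Download one Synapse file entity and return its local path.

    `syn` is expected to be a logged-in synapseclient.Synapse instance.
    `download_file` itself is a hypothetical helper, not part of the client.
    """
    entity = syn.get(synapse_id, downloadLocation=str(download_dir))
    return Path(entity.path)


# Typical use (needs network access and a Synapse account):
#
#   import synapseclient
#   syn = synapseclient.Synapse()
#   syn.login()                               # uses locally stored credentials
#   path = download_file(syn, "syn0000000")   # placeholder ID, not a real dataset
```

Passing the client in as an argument keeps the helper easy to exercise without network access, for example with a stand-in object during testing.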
Target Audiences of PCBC Database in Synapse
Investigators
• Explore Genes/Pathways
• Explore Processing Pipelines
• Genes/Pathways Search Engines
• Review Results of Previous Analyses
• Communicate Early Results
• Share Results
Bioinformaticians
• Process Own Data Using Defined Pipelines
• Download & Query Raw Data
• Access Directly from R/Python/Java
• Download Analysis Results
• Share Analysis
Usage
• Over 300 users outside of the bioinformatics core.
– 111 registered PCBC users accessing the site.
– 200 folks outside of the PCBC.
Brand New Features in Synapse
• PCBC Portal is open-access to anyone with a free Synapse account (March 2015, stealth release prior to publication).
• New and improved heatmap viewer with integrated RNA-Seq, DNA-methylation and microRNA data.
• Simple interactive metadata navigator for cell lines (to be updated with WiCell).
• Amazon hosted virtual computing environment with tools for sequence analysis of any data.
• Expanded bioinformatics best practices and protocol comparison (tutorial videos, algorithm comparisons, etc.).
• Improved attribution pages.
• New analysis methods and results associated with bioinformatics core papers being (re)submitted
PCBC Metadata Developed According to Global Vocabularies of Terms
Creating Metadata Standards for Sharing, Exchanging and Analyzing PCBC Data
How is the PCBC Metadata Standard Organized?
- categories of metadata describing cell line, host and classification methods
- including investigator, cell of origin, method of reprogramming, reprogramming gene combinations, donor gender, age, ethnicity and disease status
Metadata Collection Standards
- developed for the PCBC consortium
- defined through an iterative process
- relevant terms mapped to established community ontologies
- metadata collected for each cell line submitted to C4
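To make the idea of a per-cell-line metadata record concrete, the fields named above can be sketched as a small Python structure with a vocabulary check. This is an illustration only: the field names, the `ALLOWED` term lists and the `validate` helper are assumptions made for the example, and the actual PCBC standard maps terms to community ontologies rather than to the short lists shown here.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies, invented for this sketch.
# The real standard maps terms to established community ontologies.
ALLOWED = {
    "reprogramming_method": {"retrovirus", "lentivirus", "sendai virus", "episomal", "mRNA"},
    "donor_gender": {"female", "male", "unknown"},
}


@dataclass
class CellLineRecord:
    """One metadata record per cell line (field names are illustrative)."""
    cell_line: str
    investigator: str
    cell_of_origin: str
    reprogramming_method: str
    reprogramming_genes: list = field(default_factory=list)
    donor_gender: str = "unknown"
    donor_age: str = ""
    donor_ethnicity: str = ""
    disease_status: str = ""


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Required free-text fields must not be blank.
    for name in ("cell_line", "investigator", "cell_of_origin"):
        if not getattr(record, name).strip():
            problems.append(f"{name} is required")
    # Controlled fields must use a term from the vocabulary.
    for name, allowed in ALLOWED.items():
        value = getattr(record, name)
        if value not in allowed:
            problems.append(f"{name}={value!r} is not a controlled term")
    return problems
```

Checking each submission against a fixed term list at entry time is what keeps records comparable across contributing labs.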
Metadata Associated Data
• mRNA-Seq: 301 samples
• microRNA-Seq: 252 samples
• DNA-methylation: 131 samples
PCBC Metadata Developed According to Global Vocabularies of Terms
Ontologies: Disease Ontology, NCBI Taxonomy vocabulary, Cell Ontology, Cell Line Ontology, HsapDv (human developmental stage ontology), NCI Thesaurus (race, ethnicity), PATO (gender), Human Phenotype Ontology
PCBC Metadata Annotations as Exchange Format (ISA-Tab)
• Allow Global Data Sharing/Reuse
• Document Provenance/History of Data
Investigation-Study-Assay (ISA-Tab)
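As a rough illustration of a tab-delimited exchange format, the sketch below writes a sample table using the column-naming style of an ISA-Tab study file (`Source Name`, `Characteristics[...]`, `Sample Name`). The function name, the row keys and the example values are assumptions made for this example; the output is a simplified table, not a complete or validated ISA-Tab document, and it does not reproduce the PCBC annotation files.

```python
import csv
import io


def write_study_table(rows, characteristics):
    """Render rows as a tab-delimited table with ISA-Tab style headers.

    `rows` is a list of dicts with "source", "sample" and one key per
    characteristic; missing characteristics are written as empty cells.
    """
    header = (
        ["Source Name"]
        + [f"Characteristics[{name}]" for name in characteristics]
        + ["Sample Name"]
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        values = [row.get(name, "") for name in characteristics]
        writer.writerow([row["source"]] + values + [row["sample"]])
    return buffer.getvalue()


# Example with made-up values:
table = write_study_table(
    [{"source": "Donor1", "cell type": "fibroblast", "sex": "female", "sample": "Line1"}],
    ["cell type", "sex"],
)
print(table)
```

Keeping each annotation in its own named column is what lets downstream tools trace a sample back to its source material.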
New Software for PCBC Researchers
• We are in the final stages of releasing tools that allow researchers new to bioinformatics to analyze their own bulk and single-cell RNA-Seq datasets (AltAnalyze version 2.10).
• New cell-type prediction tools are automated within this toolkit.
• Used by over a dozen PCBC researchers at the Stanford Bioinformatics training course.
Manuscripts
1. Re-submission of the first C4 manuscript (Cell Reports):
– Integrated Genomic Analysis of Diverse Induced Pluripotent Stem Cell Lines Identifies Novel Molecular Determinants of Pluripotency
2. Data Descriptor manuscript (following manuscript 1 acceptance):
– Comprehensive Characterization of Diverse Pluripotent Stem Cells from the Progenitor Cell Biology Consortium
3. Expected submission in September:
– Multi-Lineage Characterization of Diverse Induced Pluripotent Stem Cells and their Derivatives (collaborative multicenter effort led by Sage)