GeneSpring
January 2009
• Agilent Bioinformatics & GeneSpring overview
• GX 10 Guided & Advanced Data Analysis
• Practice & Discussion
GeneSpring GX 10 for Gene Expression Analysis
February 2009
Antoni Wandycz, Elise Chang
Agilent Technologies
Agilent Bioinformatics Suite
• Transcriptome: ‘GX 10’ (miRNA, QPCR, Exon)
• Metabolome: ‘GX 11’
• Proteome: ‘GX 11’
• DNA: ‘DNA Analytics’ (ChIP, Methyl, CGH)
• GeneSpring Workgroup: Data storage & Computation; Share & Collaborate
Diagram labels: DNA, RNA, Protein, CH2OH
History and Future of GeneSpring
Timeline, 2004 to 2009:
• Agilent acquires Silicon Genetics
• GX 7.3.1 Released
• GX 9 development on avadis platform
• Agilent acquires Stratagene
• GX 9
• GX 10: GX 7.3 functions; miRNA, Exon, QPCR analysis; Pathway analysis; Support for eArray
• GX 11
GeneSpring GX: Multiple-Platform Compatibility
• Agilent Feature Extraction files (>FE v8.5)
• Affymetrix CEL, CHP
• Illumina BeadStudio (>v 3.1)
• ABI SDS, RQ Manager (for QPCR)
• Custom Formats (ALL 1 & 2-color microarrays)
• .GPR files from AXON Scanners (GenePix software)
GeneSpring GX 10 – Key features
• Guided Workflows
• New Applications - miRNA, QPCR, Exon & more in future
• Project-based organization & Translation-on-the-fly
• Biological Context - Pathway Analysis, GSEA, GO, IPA, etc.
• Customization - Scripting in Jython and R
GX 10 Key features: Guided Workflows
Pre-determined steps:
• Normalization
• QC
• Statistics
• GO
• Pathways
Project-based organization
GeneSpring GX 10: Translation (Chap 3 in GX 10 manual)
• Comparing Platforms, e.g. Affymetrix vs. Agilent vs. Spotted
• Comparing Species, e.g. Mouse vs. Human, using a Homology Table (NCBI’s HomoloGene)
• Comparing Applications, e.g. Gene Expression & QPCR or miRNA
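The idea behind cross-species translation can be sketched as a lookup through a homology table, followed by a Venn-style overlap. This is only an illustration: the table contents and the function name `translate_entities` are hypothetical, and GX 10 performs the translation internally.

```python
# Hypothetical homology table (mouse symbol -> human symbol).
# A real table would be derived from NCBI HomoloGene.
HOMOLOGY = {"Trp53": "TP53", "Myh6": "MYH6", "Nppa": "NPPA"}

def translate_entities(entities, homology):
    """Map each entity to its homolog; entities without a homolog are dropped."""
    return [homology[e] for e in entities if e in homology]

mouse_list = ["Trp53", "Nppa", "Gm1234"]   # illustrative mouse entity list
human_list = ["TP53", "MYH7", "NPPA"]      # illustrative human entity list

translated = translate_entities(mouse_list, HOMOLOGY)
# Overlap between the translated mouse list and the human list.
shared = sorted(set(translated) & set(human_list))
print(shared)
```

Entities with no entry in the table are silently dropped here; a real workflow would probably report them instead.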
GX 10 Key features: Translation
Compare platforms, applications, species
Homology table displayed
Venn Diagram
Compare experiments from different platforms, applications, & species
GX 10 Key features: Biological Context
GO Analysis (Function, Process, Location)
GSEA (Gene Set Enrichment Analysis)
Pathway Analysis
GeneSpring GX 10: Gene Ontology (GO) Analysis
Likelihood that your genes of interest fell into a GO category, just by chance
HELP always available
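The "just by chance" likelihood can be illustrated with a one-sided hypergeometric (Fisher's exact) tail probability; the deck later notes that GO enrichment uses Fisher's exact test. The sketch below is a generic version of that calculation, not GeneSpring's code, and the function and argument names are assumptions.

```python
from math import comb

def go_enrichment_pvalue(n_total, n_in_category, n_selected, n_overlap):
    """Probability of seeing at least n_overlap category genes in a random
    selection of n_selected genes out of n_total (hypergeometric upper tail)."""
    upper = min(n_in_category, n_selected)
    tail = sum(
        comb(n_in_category, k) * comb(n_total - n_in_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_selected)

# Illustrative numbers: 20000 genes on the array, 200 in the GO category,
# 500 genes of interest, 15 of which fall into the category.
p = go_enrichment_pvalue(20000, 200, 500, 15)
print(p)
```

A small p-value means the overlap is unlikely to arise by chance alone; with many GO categories tested, the p-values would still need a multiple-testing correction.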
Pathway Analysis in GX 10
Two types of Pathway Analysis in GX 10:
1. ‘Pathway Analysis’ Tool: building networks of related entities (8 Pathway Interaction Databases and NLP)
2. ‘Find Significant Pathways’ Tool: entity-list enrichment with known pathways (Step 8 in Guided Workflow), using BioPax format pathways (.owl)
Overlay Networks with Expression Data/Conditions
Cellular Location Overlay of Network
‘Find Similar Pathways’ Tool
Analysis performed on all pathways imported into GX 10
Significant enrichment of my genes in particular pathways?
Significant pathways are added to experiment
e-Seminars & Workshops www.genespring.com
Recorded Seminars:
1. Introduction to GX 10
2. Analysis of miRNA & GE data
3. Analysis of QPCR & GE data
4. Alternative Splicing
5. Pathway Analysis
Getting Started in GeneSpring GX 10
Advanced Workflow: Affymetrix Files (To Find Differentially Expressed Genes)
Cardiogenomics dataset: Affymetrix data
Congestive heart failure (CHF) is a degenerative condition in which the heart no longer functions effectively as a pump.
The most common cause of CHF is damage to the heart muscle from insufficient oxygen, usually due to narrowing of the coronary arteries that supply blood to the heart.
Idiopathic cardiomyopathy results in weakened hearts due to an unknown cause.
Ischemic cardiomyopathy is caused by a lack of oxygen to the heart due to coronary artery disease.
Cardiogenomics dataset: Affymetrix data
Experimental Goal:
To identify the molecular mechanisms underlying congestive heart failure, gene expression profiles were compared between male and female patients with idiopathic, ischemic or non-failing heart conditions.
              Male        Female
Non-failing   2 samples   2 samples
Idiopathic    2 samples   2 samples
Ischemic      2 samples   2 samples
CEL files generated by Affymetrix GCOS
SAMPLE   GENDER   CHF ETIOLOGY
1        Female   Idiopathic
2        Female   Idiopathic
3        Male     Idiopathic
4        Male     Idiopathic
5        Female   Ischemic
6        Female   Ischemic
7        Male     Ischemic
8        Male     Ischemic
9        Female   Non-failing
10       Female   Non-failing
11       Male     Non-failing
12       Male     Non-failing
Experimental Setup in GeneSpring
The selected Interpretation determines how the samples are displayed in the various views and the comparisons that are made in analyses such as statistics.
Gender Interpretation
Condition 1: Female (Samples 1, 2, 5, 6, 9, 10)
Condition 2: Male (Samples 3, 4, 7, 8, 11, 12)
CHF Etiology Interpretation
Condition 1: Idiopathic (Samples 1, 2, 3, 4)
Condition 2: Ischemic (Samples 5, 6, 7, 8)
Condition 3: Non-failing (Samples 9, 10, 11, 12)
Gender/CHF Etiology Interpretation
Condition 1: Female/Idiopathic (Samples 1, 2)
Condition 2: Male/Idiopathic (Samples 3, 4)
Condition 3: Female/Ischemic (Samples 5, 6)
Condition 4: Male/Ischemic (Samples 7, 8)
Condition 5: Female/Non-failing (Samples 9, 10)
Condition 6: Male/Non-failing (Samples 11, 12)
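The relationship between experiment parameters and interpretations can be sketched as a simple grouping of samples by the chosen parameter values. The sample annotations come from the table above; the function name `make_interpretation` and the data layout are illustrative assumptions, not GeneSpring's internal representation.

```python
from collections import defaultdict

PARAMETERS = ("Gender", "CHF Etiology")

# Sample annotations copied from the experiment table (sample -> parameter values).
SAMPLES = {
    1: ("Female", "Idiopathic"),   2: ("Female", "Idiopathic"),
    3: ("Male", "Idiopathic"),     4: ("Male", "Idiopathic"),
    5: ("Female", "Ischemic"),     6: ("Female", "Ischemic"),
    7: ("Male", "Ischemic"),       8: ("Male", "Ischemic"),
    9: ("Female", "Non-failing"),  10: ("Female", "Non-failing"),
    11: ("Male", "Non-failing"),   12: ("Male", "Non-failing"),
}

def make_interpretation(samples, parameters):
    """Group sample IDs into conditions defined by the chosen parameters."""
    idx = [PARAMETERS.index(p) for p in parameters]
    conditions = defaultdict(list)
    for sample_id, values in sorted(samples.items()):
        conditions["/".join(values[i] for i in idx)].append(sample_id)
    return dict(conditions)

print(make_interpretation(SAMPLES, ["Gender"]))
print(make_interpretation(SAMPLES, ["CHF Etiology"]))
print(make_interpretation(SAMPLES, ["Gender", "CHF Etiology"]))
```

Choosing one parameter yields two or three conditions; choosing both yields the six combined conditions listed in the Gender/CHF Etiology interpretation.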
GeneSpring GX 10 Vocabulary
Project – collection of experiments
Entity – gene, probe, probeset, exon, etc.
Interpretation – samples that are grouped together based on conditions.
Technology – A file containing information on array design and biological information (annotation) for all the entities on the array
Biological Genome – a collection of all major annotations (NCBI) for any organism; essential for Generic/Custom arrays lacking annotations
Getting Started in GeneSpring
Cardiogenomics Experiment: Transcriptional profiling to learn more about molecular mechanisms underlying Congestive Heart Failure (CHF)
Sample Data: Myocardial samples from patients with normal hearts and Ischemic & Idiopathic cardiomyopathies (3 Etiologies)
Variables: Gender (2) and Etiology (3)
Technology: Affymetrix U133Plus2 array
Getting Started: Create New Project
From Startup screen OR from File/New Project
Getting Started with Advanced Analysis
Experiment Type: Affymetrix Expression (3 Affy choices!)
Workflow Type: Advanced Analysis
Select Data for Experiment
Select ‘Choose Files’ to load data files found on your computer.
Note: ‘Choose Samples’ option is used when creating experiments with samples already loaded into GX 10
Sample Upload
Summarization Algorithms in GX 10 for CEL Files
Summarization of Affymetrix probes and baseline transformation of probeset values.
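Baseline transformation can be illustrated with one common variant, which subtracts each probeset's median across all samples from its log-scale values. That this matches the option chosen on the slide is an assumption; the probeset IDs and function name below are hypothetical.

```python
from statistics import median

def baseline_to_median(log_values):
    """Subtract each probeset's median (across samples) from its log values."""
    result = {}
    for probeset, values in log_values.items():
        center = median(values)
        result[probeset] = [v - center for v in values]
    return result

# Hypothetical log2 summarized values for two probesets across three samples.
data = {"ps1": [8.0, 10.0, 12.0], "ps2": [5.0, 5.0, 7.0]}
print(baseline_to_median(data))
```

After this step a value of 0 means "at the probeset's typical level", so positive and negative values read directly as up and down relative to the baseline.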
Summarization algorithms in GX 10
ALGORITHM   BACKGROUND SUBTRACTION   NORMALIZATION   PROBE SUMMARIZATION
RMA         PM based                 Quantile        Log (PM)
MAS5        PM-MM based              Scaling         One-step Tukey Biweight
PLIER       PM-MM based              Quantile        Log (PM)
LiWong      PM-MM based              Quantile        Linear (PM)
GCRMA       PM-MM based              Quantile        Log (PM)
In addition to different calculations, the algorithms differ in the order in which Normalization and Summarization are performed.
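Quantile normalization, which most of the algorithms in the table use, can be sketched in a few lines. This is a minimal textbook version covering the normalization step only, not the full RMA, PLIER, or GCRMA pipeline and not GeneSpring's implementation. It assumes every sample has the same number of probes and breaks ties by position.

```python
def quantile_normalize(samples):
    """Force every sample (a list of intensities) onto the same distribution.

    Ties are broken by position; production implementations usually
    average tied ranks instead.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Mean intensity at each rank across all samples.
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_probes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]  # replace value by the mean at its rank
        normalized.append(out)
    return normalized

arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(arrays))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization every array contains the same set of values, so differences between arrays reflect only the rank of each probe within its array.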
Preprocessing of Affymetrix Arrays
CEL files are the raw data files that contain signal values for individual probes.
CEL files are preprocessed to generate one value per probeset.
Preprocessing steps are:
1. Background subtraction
2. Normalization
3. Summarization of probeset values
Different preprocessing algorithms are available.
Diagram: Array → Hybridization & Scanning → DAT File → Image Analysis → CEL File + CDF File → CHP (software: GCOS, AGCC)
BoxWhisker plot: Summary of Normalized Intensities
Advanced Workflow Experiment Setup
Experiment Grouping
Specify parameters/conditions
Experiment Grouping
The experimental parameters are added in this window.
For each array, the particular parameter value (condition) is also specified.
Values can be added manually or loaded from a saved file (circled in Red).
Advanced Workflow Experiment Setup
Create Interpretation
In the Guided Workflow, only one interpretation is automatically provided.
Here, users can create multiple interpretations
Grouping and Interpretation
2 experimental variables: CHF Etiology and Gender
For this experiment, 3 interpretations could be created:
1) Gender
2) CHF Etiology (Ischemic, Idiopathic, non-failing)
3) CHF Etiology and Gender: This interpretation is automatically created in the Guided Workflow.
Example: Gender Only
Creating Interpretations: step 2 of 3
Creating Interpretations: step 3 of 3
Advanced Analysis Workflow: Quality Control
QC on Samples and Probes automatically performed in Guided Workflow
Users can specify settings beyond those available in Guided Workflow
Quality Control on Samples
Filter by Expression
Advanced Analysis Workflow: Analysis
Statistical Analysis
Filter on Volcano Plot (both Stats and Fold Change)
Fold Change
Clustering
Find Similar Entities
Filter on Parameters
PCA
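The "Filter on Volcano Plot" option combines a statistical cutoff with a fold-change cutoff. A minimal sketch of that combined filter follows, assuming p-values have already been computed (and corrected) by the statistical analysis step. The probeset names, default cutoffs, and function name are hypothetical.

```python
from math import log2

def volcano_filter(entities, p_cutoff=0.05, fc_cutoff=2.0):
    """Keep entities passing both a p-value and an absolute fold-change cutoff.

    `entities` maps a probeset ID to (p_value, mean_group1, mean_group2),
    with the group means on the linear (not log) scale.
    """
    passed = {}
    for name, (p_value, mean_1, mean_2) in entities.items():
        log_fc = log2(mean_1 / mean_2)
        if p_value <= p_cutoff and abs(log_fc) >= log2(fc_cutoff):
            passed[name] = log_fc
    return passed

# Hypothetical input: (p-value, mean in condition 1, mean in condition 2).
entities = {
    "probe_A": (0.001, 800.0, 200.0),  # significant, 4-fold up
    "probe_B": (0.20, 900.0, 100.0),   # large change, not significant
    "probe_C": (0.01, 110.0, 100.0),   # significant, small change
    "probe_D": (0.03, 50.0, 200.0),    # significant, 4-fold down
}
print(volcano_filter(entities))  # {'probe_A': 2.0, 'probe_D': -2.0}
```

Working on the log2 scale makes up- and down-regulation symmetric, which is why the fold-change cutoff is applied to the absolute log ratio.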
Getting Started with Guided Workflow
Experiment Type: Agilent Single-color
Workflow Type: Guided Workflow
Sample Upload
BoxWhisker plot: Summary of Normalized Intensities
GeneSpring GX 10: Important Menu options
Project: Import/Export project zip
Tools: Script Editor/R Editor; Import BioPAX pathways; GS7 data migration; Options…
Annotations: Update Technology Annotations; Create Biological Genome; Update Pathway Interactions
Help: License Manager; Update Product
Pathway Analysis
To use ‘Find Significant Pathways’ Tool:
1. Download BioPax format (.owl) pathways from www.biopax.org to your computer
2. Import .owl pathways into GX 10 from Tools and ‘Import BioPax pathways’ option
3. From Workflows menu (in the right margin of GX 10) select ‘Find Similar Pathways’ and choose your Entity List of interest
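Before importing, it can help to check which pathways a downloaded .owl file actually contains. The sketch below lists pathway names from a BioPAX file with Python's standard XML parser. It assumes BioPAX Level 3 naming (the `Pathway` class and `displayName` property); Level 2 files use a different namespace and naming, and the inline sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: BioPAX Level 3 and RDF.
BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def list_pathways(owl_text):
    """Return the display name (or RDF ID if unnamed) of each pathway element."""
    root = ET.fromstring(owl_text.strip())
    names = []
    for pathway in root.iter(f"{{{BP}}}Pathway"):
        name = pathway.find(f"{{{BP}}}displayName")
        names.append(name.text if name is not None else pathway.get(f"{{{RDF}}}ID"))
    return names

# Tiny made-up document standing in for a downloaded .owl file.
SAMPLE_OWL = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Pathway rdf:ID="Pathway1">
    <bp:displayName>Glycolysis</bp:displayName>
  </bp:Pathway>
  <bp:Pathway rdf:ID="Pathway2"/>
</rdf:RDF>
"""
print(list_pathways(SAMPLE_OWL))  # ['Glycolysis', 'Pathway2']
```

For a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the text to `list_pathways`.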
Performing Pathway Analysis in GX 10:
1. In the Annotations Menu, select ‘Update Pathway Interactions’ from Agilent Server
2. Before choosing an organism, GX 10 must first create a Pathway Database Infrastructure. May take >10 min
3. Once the Infrastructure database is complete, go back to Annotations/Update Pathway Interactions and choose your preferred organism. May take >20 minutes
4. From Workflows menu (in the right margin of GX 10) select ‘Pathway Analysis’ to begin building networks
Updating Annotations: Chap 3 in GX 10 pdf manual, pg. 51
Option 1: Update from Agilent Server
Option 2: Update from file
Option 3 is new in GX 10: Update directly from NCBI from GX (Biological Genome)
GeneSpring GX 10: Reference pages in Manual
Creating/Updating Technologies & Annotations: Chapter 3 in GX 10 pdf manual, pg. 51
From 1) Agilent server; 2) file; 3) NCBI (Biological Genome)
GS7 to GS10 Data Migration: Chapter 4 in GX 10 manual, pg. 71 and in Quick Start Guide
Translation: Chapter 3.3 in pdf manual (pg 63)
Thank you
www.genespring.com
Technical Support 24 hours/5 days per week [email protected]
1-800-227-9770 (option 6, 2)
[email protected]@agilent.com
[email protected] (Genomics)
Automated GX 7 Migration Tool (Chapter 4 in GX 10 manual)
Step 1: Prepare for GS7 Migration. The tool automatically prepares data for migration (for a large number of samples, this step takes time).
Step 2: Select GS7 genome to migrate to GS10. All experiments, samples, interpretations, gene lists, trees, parameter values, condition values, and classifications will be automatically migrated.
Step 3: Open the Project with the name corresponding to the GX 7 genome to see the migrated data. Note that if the genome was assigned a project in GX 7, that project name will be used in GX 10 instead of the name of the GX 7 genome.
GX 10: Biological Context
GO Analysis (Function, Process, Location)
GSEA (Gene Set Enrichment Analysis)
GSA (Gene Set Analysis)
Pathway Analysis (Interaction DB)
Find Similar Entity Lists
Find Significant Pathways (BioPax.org)
Link to Ingenuity’s IPA
NLP (mine literature)
GSEA
GSEA interrogates genome-wide expression profiles from samples belonging to two different classes (e.g. normal and tumor) and determines whether genes in an a priori defined gene set correlate with class distinction
Reference: Subramanian et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.
PNAS. September 30, 2005, 10.1073
GSEA Method
1. Rank genes based on the correlation between their expression intensities and class distinction
• Genes that differ most in their expression between the two classes will appear at the top and bottom of the list
• Assumption is that genes related to the phenotypic distinction of the classes will tend to be found at the top and bottom of the list
2. Calculate enrichment score (ES) to reflect the degree of overrepresentation of genes in a particular gene set at the top and bottom of the entire ranked list
3. Derive p-value for the ES to estimate its significance level
4. Adjust p-value for multiple testing
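The four steps above can be sketched in plain Python. This is a simplified illustration, not the Broad Institute or GeneSpring implementation. It ranks genes by the difference in class means (an assumption; other ranking metrics exist), compares absolute ES values in the permutation test, and uses a generic Benjamini-Hochberg adjustment for step 4. All names and the toy data are hypothetical.

```python
import random
from statistics import mean

def rank_genes(expr, labels):
    """Step 1: rank genes by difference in class means (class 1 minus class 0)."""
    scores = {}
    for gene, values in expr.items():
        in_1 = [v for v, lab in zip(values, labels) if lab == 1]
        in_0 = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = mean(in_1) - mean(in_0)
    return sorted(scores, key=scores.get, reverse=True), scores

def enrichment_score(ranked, scores, gene_set, weight=1.0):
    """Step 2: weighted running sum; ES is its largest deviation from zero."""
    total_hit = sum(abs(scores[g]) ** weight for g in ranked if g in gene_set)
    n_miss = sum(1 for g in ranked if g not in gene_set)
    if total_hit == 0 or n_miss == 0:
        return 0.0  # degenerate case: nothing to score
    running, best = 0.0, 0.0
    for g in ranked:
        if g in gene_set:
            running += abs(scores[g]) ** weight / total_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

def permutation_pvalue(expr, labels, gene_set, n_perm=200, seed=0):
    """Step 3: compare the observed ES with ES values from shuffled labels."""
    rng = random.Random(seed)
    ranked, scores = rank_genes(expr, labels)
    observed = enrichment_score(ranked, scores, gene_set)
    shuffled, extreme = list(labels), 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_ranked, null_scores = rank_genes(expr, shuffled)
        if abs(enrichment_score(null_ranked, null_scores, gene_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

def benjamini_hochberg(pvalues):
    """Step 4: adjust p-values for multiple testing (step-up FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted, running_min = [0.0] * m, 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy data: 6 genes, 4 samples (two per class).
EXPR = {"g1": [9, 9, 1, 1], "g2": [8, 8, 2, 2], "g3": [5, 5, 4, 4],
        "g4": [4, 4, 5, 5], "g5": [2, 2, 7, 7], "g6": [1, 1, 8, 8]}
LABELS = [1, 1, 0, 0]
es, p_value = permutation_pvalue(EXPR, LABELS, {"g1", "g2"})
print(es, p_value)
```

With only four samples there are very few distinct label permutations, so the p-value here is coarse; real datasets need more samples per class for sample permutation to be meaningful.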
Gene Set Enrichment Analyses
How is performing GSEA or GSA on GO gene sets different from doing GO Analysis on a list of differentially expressed genes?
• Statistical analysis can miss genes with small changes relative to noise that, as a group, can have significant impact on the observed difference in phenotype
– Use All Entities list as input for GSEA or GSA
• Instead of looking only at individual differentially expressed genes, take a genome-wide approach to see if gene sets are associated with the phenotypic class distinction
– Enrichment in GO Analysis is done with Fisher’s Exact test, while GSEA/GSA uses a type of running-sum statistic
• User can specify any Entity List as gene sets in GeneSpring GX
GeneSpring
January 2009
Identifiers Necessary for GSEA
Technology must contain Gene Symbol
Columns that must be marked in custom technology to perform GSEA:
• Annotation file must contain a column (Column X) containing Gene Symbol
– Column X must be marked “Gene Symbol”
– Select “Gene Symbol” mark from the drop-down menu while creating Custom technology.
Gene Sets
GSEA/GSA can use either Broad lists or any Entity Lists in GeneSpring
Broad Institute has defined five categories of gene sets:
• C1- Grouped based on cytogenetic location.
• C2- Functional lists. ~1000 gene lists corresponding to pathways or functional processes (e.g. genes that are both involved in inflammatory response can be in the same list)
• C3- Regulation lists. Grouped by promoter analysis. Genes are regulated by the same motif (the transcription factor may or may not be known). Includes cases where genes simply share the same binding motif and are therefore assumed to be co-regulated.
• C4- Proximity to known oncogenes and tumor suppressors. For example, all the neighbors of BRCA.
• C5- GO gene sets. Each category is represented as a gene set except for very broad categories such as Biological Process and categories with fewer than 10 genes
GeneSpring
January 2009
Key Differences Between GSEA and GSA
The two algorithms share the same idea, but differ in the way they determine what gene sets are significantly enriched
• GSA uses the "maxmean" statistic: this is the mean of the positive or negative part of gene scores in the gene set, whichever is larger in absolute value. Efron and Tibshirani show that the method used in GSA is often more powerful than the modified Kolmogorov-Smirnov statistic used in GSEA.
• GSA uses a somewhat different null distribution for estimation of false discovery rates: it does "restandardization" of the genes, in addition to the permutation of samples (done in GSEA)
• GSA also can handle more than two conditions (limitation in GSEA)
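The maxmean statistic described above is short enough to write out. The sketch below follows the slide's description: average the positive parts and the negative parts of the gene scores over the whole gene set, then keep whichever is larger in absolute value, with its sign. The function name and example scores are illustrative, and the restandardization step is not shown.

```python
def maxmean(scores):
    """Maxmean statistic for one gene set, given its per-gene scores."""
    n = len(scores)
    pos = sum(max(z, 0.0) for z in scores) / n   # mean of positive parts
    neg = sum(max(-z, 0.0) for z in scores) / n  # mean of negative parts
    return pos if pos >= neg else -neg

print(maxmean([2.0, 1.0, -0.5, 0.0]))  # 0.75
print(maxmean([-3.0, 1.0]))            # -1.5
```

Both averages divide by the full gene-set size, so a set with a few strongly changed genes in one direction is not diluted by small changes in the other direction.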