GeneSpring January 2009 Agilent Bioinformatics & GeneSpring overview GX 10 Guided & Advanced Data Analysis Practice & Discussion GeneSpring GX 10 for Gene Expression Analysis February 2009 Antoni Wandycz Elise Chang Agilent Technologies


GeneSpring

January 2009

• Agilent Bioinformatics & GeneSpring overview

• GX 10 Guided & Advanced Data Analysis

• Practice & Discussion

GeneSpring GX 10 for Gene Expression Analysis

February 2009

Antoni Wandycz, Elise Chang, Agilent Technologies


Agilent Bioinformatics Suite

• Transcriptome: ‘GX 10’ (miRNA, QPCR, Exon)

• Metabolome: ‘GX 11’

• Proteome: ‘GX 11’

• DNA: ‘DNA Analytics’ (ChIP, Methyl, CGH)

[Figure: DNA, RNA, Protein]

GeneSpring Workgroup: Data storage & Computation; Share & Collaborate


History and Future of GeneSpring

[Timeline figure, 2004 to 2009. Milestones shown:]

• Agilent acquires Silicon Genetics

• GX 7.3.1 released

• GX 9 development on avadis platform

• Agilent acquires Stratagene

• GX 9

• GX 10: GX 7.3 functions; miRNA, Exon, QPCR analysis; Pathway analysis; Support for eArray

• GX 11


GeneSpring GX: Multiple-Platform Compatibility

• Agilent Feature Extraction files (>FE v8.5)

• Affymetrix CEL, CHP

• Illumina BeadStudio (>v 3.1)

• ABI SDS, RQ Manager (for QPCR)

• Custom Formats (ALL 1 & 2-color microarrays)

• .GPR files from AXON Scanners (GenePix software)


GeneSpring GX 10 – Key features

• Guided Workflows

• New Applications - miRNA, QPCR, Exon & more in future

• Project-based organization & Translation-on-the-fly

• Biological Context - Pathway Analysis, GSEA, GO, IPA, etc.

• Customization - Scripting in Jython and R


GX 10 Key features: Guided Workflows

Pre-determined steps:

• Normalization

• QC

• Statistics

• GO

• Pathways


Project-based organization


GeneSpring GX 10: Translation (Chap 3 in GX 10 manual)

• Comparing Platforms, e.g. Affymetrix vs. Agilent vs. Spotted

• Comparing Species, e.g. Mouse vs. Human, using a Homology Table (NCBI’s HomoloGene)

• Comparing Applications, e.g. Gene Expression & QPCR or miRNA
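As a rough illustration of what cross-species translation involves, the sketch below maps probes from one technology to probes on another through a homology table. It is not GeneSpring's implementation: the function name `translate`, the table layout, and every identifier in the example data are hypothetical placeholders.

```python
# Hypothetical annotation tables: probe ID -> gene ID, one per technology.
mouse_probe_to_gene = {"mm_probe_1": "mouse_gene_A", "mm_probe_2": "mouse_gene_B"}
human_probe_to_gene = {
    "hs_probe_7": "human_gene_A",
    "hs_probe_8": "human_gene_C",
    "hs_probe_9": "human_gene_A",
}

# Hypothetical homology table: one row per homology group.
homology = [
    {"mouse": "mouse_gene_A", "human": "human_gene_A"},
    {"mouse": "mouse_gene_B", "human": "human_gene_B"},
]


def translate(probes, src_probe_to_gene, dst_probe_to_gene, homology, src, dst):
    """Map source probes to destination probes through homologous genes."""
    gene_map = {row[src]: row[dst] for row in homology}
    # Invert the destination annotation: gene -> list of probes.
    dst_gene_to_probes = {}
    for probe, gene in dst_probe_to_gene.items():
        dst_gene_to_probes.setdefault(gene, []).append(probe)
    result = {}
    for probe in probes:
        dst_gene = gene_map.get(src_probe_to_gene.get(probe))
        # Probes with no homolog on the destination array map to an empty list.
        result[probe] = sorted(dst_gene_to_probes.get(dst_gene, []))
    return result


translated = translate(
    ["mm_probe_1", "mm_probe_2"],
    mouse_probe_to_gene,
    human_probe_to_gene,
    homology,
    "mouse",
    "human",
)
```

Note that one source probe can map to several destination probes, or to none, which is why comparisons across technologies are made on translated lists rather than on raw probe identifiers.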


GX 10 Key features: Translation

Compare platforms, applications, species

Homology table displayed


Venn Diagram

Compare experiments from different platforms, applications, & species


GX 10 Key features: Biological Context

GO Analysis (Fx, Process, Location)

GSEA (Gene Set Enrichment Analysis)

Pathway Analysis


GeneSpring GX 10: Gene Ontology (GO) Analysis

Likelihood that your genes of interest fell into a GO category, just by chance

HELP always available


Pathway Analysis in GX 10

Two types of Pathway Analysis in GX 10:

1. ‘Pathway Analysis’ Tool: building networks of related entities (8 Pathway Interaction Databases and NLP)

2. ‘Find Significant Pathways’ Tool: entity-list enrichment with known pathways (Step 8 in Guided Workflow), using BioPax format pathways (.owl)


Overlay Networks with Expression Data/Conditions


Cellular Location Overlay of Network


‘Find Similar Pathways’ Tool

Analysis performed on all pathways imported into GX 10

Significant enrichment of my genes in particular pathways?

Significant pathways are added to experiment


e-Seminars & Workshops www.genespring.com

Recorded Seminars:

1. Introduction to GX 10

2. Analysis of miRNA & GE data

3. Analysis of QPCR & GE data

4. Alternative Splicing

5. Pathway Analysis


Getting Started in GeneSpring GX 10

Advanced Workflow (To Find Differentially Expressed Genes)

Affymetrix Files


Cardiogenomics dataset: Affymetrix data

Congestive heart failure (CHF) is a degenerative condition in which the heart no longer functions effectively as a pump.

The most common cause of CHF is damage to the heart muscle from insufficient oxygen, usually due to narrowing of the coronary arteries that supply blood to the heart.

Idiopathic cardiomyopathy results in weakened hearts due to an unknown cause.

Ischemic cardiomyopathy is caused by a lack of oxygen to the heart due to coronary artery disease.


Cardiogenomics dataset: Affymetrix data

Experimental Goal:

To identify the molecular mechanisms underlying congestive heart failure, gene expression profiles were compared between male and female patients with idiopathic, ischemic or non-failing heart conditions.

              Male        Female
Non-failing   2 samples   2 samples
Idiopathic    2 samples   2 samples
Ischemic      2 samples   2 samples

CEL files generated by Affymetrix GCOS


SAMPLE   GENDER   CHF ETIOLOGY
1        Female   Idiopathic
2        Female   Idiopathic
3        Male     Idiopathic
4        Male     Idiopathic
5        Female   Ischemic
6        Female   Ischemic
7        Male     Ischemic
8        Male     Ischemic
9        Female   Non-failing
10       Female   Non-failing
11       Male     Non-failing
12       Male     Non-failing

Experimental Setup in GeneSpring

The selected Interpretation determines how the samples are displayed in the various views and the comparisons that are made in analyses such as statistics.

Gender Interpretation

Condition 1: Female (Samples 1, 2, 5, 6, 9, 10)

Condition 2: Male (Samples 3, 4, 7, 8, 11, 12)

CHF Etiology Interpretation

Condition 1: Idiopathic (Samples 1, 2, 3, 4)

Condition 2: Ischemic (Samples 5, 6, 7, 8)

Condition 3: Non-failing (Samples 9, 10, 11, 12)

Gender/CHF Etiology Interpretation

Condition 1: Female/Idiopathic (Samples 1, 2)

Condition 2: Male/Idiopathic (Samples 3, 4)

Condition 3: Female/Ischemic (Samples 5, 6)

Condition 4: Male/Ischemic (Samples 7, 8)

Condition 5: Female/Non-failing (Samples 9, 10)

Condition 6: Male/Non-failing (Samples 11, 12)
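The relationship between parameters, conditions and interpretations can be sketched in a few lines of Python. This is only an illustration of the grouping logic using the twelve samples listed above; the function name `interpretation` and the data layout are assumptions, not part of GeneSpring.

```python
from collections import defaultdict

# Sample number -> (Gender, CHF Etiology), taken from the sample table.
samples = {
    1: ("Female", "Idiopathic"), 2: ("Female", "Idiopathic"),
    3: ("Male", "Idiopathic"), 4: ("Male", "Idiopathic"),
    5: ("Female", "Ischemic"), 6: ("Female", "Ischemic"),
    7: ("Male", "Ischemic"), 8: ("Male", "Ischemic"),
    9: ("Female", "Non-failing"), 10: ("Female", "Non-failing"),
    11: ("Male", "Non-failing"), 12: ("Male", "Non-failing"),
}

# Position of each parameter inside the tuples above.
PARAMETER_INDEX = {"Gender": 0, "CHF Etiology": 1}


def interpretation(samples, parameters):
    """Group sample numbers into conditions by the chosen parameter(s)."""
    groups = defaultdict(list)
    for sample_id in sorted(samples):
        values = samples[sample_id]
        condition = "/".join(values[PARAMETER_INDEX[p]] for p in parameters)
        groups[condition].append(sample_id)
    return dict(groups)


gender = interpretation(samples, ["Gender"])
etiology = interpretation(samples, ["CHF Etiology"])
combined = interpretation(samples, ["Gender", "CHF Etiology"])
```

The same twelve samples yield 2, 3 or 6 conditions depending on which parameters the interpretation uses, which is what changes the comparisons made in later statistics.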


GeneSpring GX 10 Vocabulary

Project – collection of experiments

Entity – gene, probe, probeset, exon, etc.

Interpretation – samples that are grouped together based on conditions.

Technology – A file containing information on array design and biological information (annotation) for all the entities on the array

Biological Genome – a collection of all major annotations (NCBI) for any organism; essential for Generic/Custom arrays lacking annotations


Getting Started in GeneSpring

Cardiogenomics Experiment: Transcriptional profiling to learn more about molecular mechanisms underlying Congestive Heart Failure (CHF)

Sample Data: Myocardial samples from patients with normal hearts and Ischemic & Idiopathic cardiomyopathies (3 Etiologies)

Variables: Gender (2) and Etiology (3)

Technology: Affymetrix U133Plus2 array


Getting Started: Create New Project

From Startup screen OR from File/New Project


Getting Started with Advanced Analysis

Experiment Type: Affymetrix Expression (3 Affy choices!)

Workflow Type: Advanced Analysis


Select Data for Experiment

Select ‘Choose Files’ to load data files found on your computer.

Note: ‘Choose Samples’ option is used when creating experiments with samples already loaded into GX 10


Sample Upload


Summarization Algorithms in GX 10 for CEL Files

Summarization of Affymetrix probes and baseline transformation of probeset values.


Summarization algorithms in GX 10

ALGORITHM   BACKGROUND SUBTRACTION   NORMALIZATION   PROBE SUMMARIZATION
RMA         PM based                 Quantile        Log (PM)
MAS5        PM-MM based              Scaling         One-step Tukey Biweight
PLIER       PM-MM based              Quantile        Log (PM)
LiWong      PM-MM based              Quantile        Linear (PM)
GCRMA       PM-MM based              Quantile        Log (PM)

In addition to different calculations, the algorithms differ in the order in which Normalization and Summarization are performed.
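Quantile normalization, listed above as the normalization step for most of these algorithms, forces every array to share the same intensity distribution. The following is a minimal pure-Python sketch of the general technique, not the code GeneSpring runs; the function name `quantile_normalize` is an assumption, and ties are handled naively (tied values receive adjacent rank means rather than an averaged one).

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list of values per array)."""
    n = len(columns[0])
    # Sort each array, then average across arrays rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)
    ]
    normalized = []
    for col in columns:
        # Indices of this array ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value by the mean of all values sharing its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy arrays with three probes each.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization both toy arrays contain the same set of values (1.5, 3.5, 5.5), and only the ordering of probes within each array is preserved.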


Preprocessing of Affymetrix Arrays

[Figure: Array → Hybridization & Scanning → DAT File → Image Analysis → CEL File + CDF File → CHP (GCOS, AGCC)]

CEL files are the raw data files that contain signal values for individual probes.

CEL files are preprocessed to generate one value per probeset.

Preprocessing steps are:

1. Background subtraction

2. Normalization

3. Summarization of probeset values

Different preprocessing algorithms are available.


BoxWhisker plot: Summary of Normalized Intensities


Advanced Workflow Experiment Setup

Experiment Grouping

Specify parameters/conditions


Experiment Grouping

The experimental parameters are added in this window.

For each array, the particular parameter value (condition) is also specified.

Values can be added manually or loaded from a saved file (circled in Red).


Advanced Workflow Experiment Setup

Create Interpretation

In the Guided Workflow, only one interpretation is automatically provided.

Here, users can create multiple interpretations


Grouping and Interpretation

2 experimental variables: CHF Etiology and Gender

For this experiment, 3 interpretations could be created:

1) Gender

2) CHF Etiology (Ischemic, Idiopathic, non-failing)

3) CHF Etiology and Gender: This interpretation is automatically created in the Guided Workflow.

Example: Gender Only


Creating Interpretations: step 2 of 3


Creating Interpretations: step 3 of 3


Advanced Analysis Workflow: Quality Control

QC on Samples and Probes automatically performed in Guided Workflow

Users can specify settings beyond those available in Guided Workflow


Quality Control on Samples


Filter by Expression


Advanced Analysis Workflow: Analysis

Statistical Analysis

Filter on Volcano Plot (both Stats and Fold Change)

Fold Change

Clustering

Find Similar Entities

Filter on Parameters

PCA
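A volcano-plot filter combines the statistics and fold-change criteria listed above: an entity passes only if it clears both cutoffs. The sketch below assumes p-values and log2 fold changes have already been computed elsewhere; the function name, the default cutoffs and the example values are illustrative assumptions, not GeneSpring defaults.

```python
import math


def volcano_filter(entities, p_cutoff=0.05, fold_cutoff=2.0):
    """Return names of entities passing both the p-value and fold-change cutoffs.

    `entities` maps a name to (p_value, log2_fold_change).
    """
    log_cutoff = math.log2(fold_cutoff)
    return [
        name
        for name, (p_value, log2_fc) in entities.items()
        if p_value <= p_cutoff and abs(log2_fc) >= log_cutoff
    ]


# Made-up values for illustration only.
entities = {
    "entity_1": (0.001, 2.3),   # significant and strongly up
    "entity_2": (0.001, 0.4),   # significant but small change
    "entity_3": (0.40, -3.0),   # large change but not significant
    "entity_4": (0.02, -1.0),   # significant and exactly 2-fold down
}
selected = volcano_filter(entities)
```

Working on the log2 scale treats up- and down-regulation symmetrically, so a 2-fold decrease passes the same cutoff as a 2-fold increase.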


Getting Started with Guided Workflow

Experiment Type: Agilent Single-color

Workflow Type: Guided Workflow


Sample Upload


BoxWhisker plot: Summary of Normalized Intensities


GeneSpring GX 10: Important Menu options: 

Project: Import/Export project zip

Tools: Script Editor/R Editor; Import BioPAX pathways; GS7 data migration; Options…

Annotations: Update Technology Annotations; Create Biological Genome; Update Pathway Interactions

Help: License Manager; Update Product


Pathway Analysis

To use ‘Find Significant Pathways’ Tool:

1. Download BioPax format (.owl) pathways from www.biopax.org to your computer

2. Import .owl pathways into GX 10 from Tools and ‘Import BioPax pathways’ option

3. From Workflows menu (in the right margin of GX 10) select ‘Find Similar Pathways’ and choose your Entity List of interest


Performing Pathway Analysis in GX 10:

1. In the Annotations Menu, select ‘Update Pathway Interactions’ from Agilent Server

2. Before choosing an organism, GX 10 must first create a Pathway Database Infrastructure. May take >10 min

3. Once the Infrastructure database is complete, go back to Annotations/Update Pathway Interactions and choose your preferred organism. May take >20 minutes

4. From Workflows menu (in the right margin of GX 10) select ‘Pathway Analysis’ to begin building networks


Updating Annotations: Chap 3 in GX 10 pdf manual, pg. 51

Option 1: Update from Agilent Server

Option 2: Update from file

Option 3 (new in GX 10): Update directly from NCBI from GX (Biological Genome)


GeneSpring GX 10: Reference pages in Manual

Creating/Updating Technologies & Annotations: Chapter 3 in GX 10 pdf manual, pg. 51. From 1) Agilent server; 2) file; 3) NCBI (Biological Genome)

GS7 to GS10 Data Migration: Chapter 4 in GX 10 manual, pg. 71 and in Quick Start Guide

Translation: Chapter 3.3 in pdf manual (pg 63)


Thank you

www.genespring.com

Technical Support 24 hours/5 days per week [email protected]

1-800-227-9770 (option 6, 2)

[email protected]@agilent.com

[email protected] (Genomics)


Automated GX 7 Migration Tool (Chapter 4 in GX 10 manual)

Step 1: Prepare for GS7 Migration. The tool automatically prepares data for migration (for a large number of samples, this step takes time).

Step 2: Select GS7 genome to migrate to GS10. All experiments, samples, interpretations, gene lists, trees, parameter values, condition values, and classifications will be automatically migrated.

Step 3: Open the Project with the name corresponding to the GX 7 genome to see the migrated data. Note that if the genome was assigned a project in GX 7, that name will be the name of the project in GX 10 instead of the name of the GX 7 genome.


GX 10: Biological Context

GO Analysis (Fx, Process, Location)

GSEA (Gene Set Enrichment Analysis)

GSA (Gene Set Analysis)

Pathway Analysis (Interaction DB)

Find Similar Entity Lists

Find Significant Pathways (BioPax.org)

Link to Ingenuity’s IPA

NLP (mine literature)


GSEA

GSEA interrogates genome-wide expression profiles from samples belonging to two different classes (e.g. normal and tumor) and determines whether genes in an a priori defined gene set correlate with class distinction

Reference: Subramanian et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.

PNAS. September 30, 2005, 10.1073


GSEA Method

1. Rank genes based on the correlation between their expression intensities and class distinction

• Genes that differ most in their expression between the two classes will appear at the top and bottom of the list

• Assumption is that genes related to the phenotypic distinction of the classes will tend to be found at the top and bottom of the list

2. Calculate enrichment score (ES) to reflect the degree of overrepresentation of genes in a particular gene set at the top and bottom of the entire ranked list

3. Derive p-value for the ES to estimate its significance level

4. Adjust p-value for multiple testing
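Steps 2 and 3 can be sketched as a running-sum statistic over the ranked list. This is a simplified illustration under stated assumptions, not the GeneSpring or Broad Institute implementation: hits are unweighted (a classic Kolmogorov-Smirnov style walk, whereas the published method weights hits by correlation), and the null distribution is built from random gene sets of the same size rather than from permuted sample labels. Function names and the toy gene names are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum ES: maximum deviation from zero along the list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up at genes in the set, step down at all other genes.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size (a simplification)."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    size = sum(gene in gene_set for gene in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Toy ranked list of 20 genes; the gene set sits entirely at the top.
ranked = ["g%d" % i for i in range(1, 21)]
top_set = {"g1", "g2", "g3", "g4"}
es = enrichment_score(ranked, top_set)
p = permutation_p_value(ranked, top_set)
```

A gene set concentrated at either end of the ranked list produces a large positive or negative ES, while a set scattered evenly keeps the running sum near zero. Step 4 (multiple-testing adjustment across gene sets) is not shown.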


Gene Set Enrichment Analyses


Gene Set Enrichment Analyses

How is performing GSEA or GSA on GO gene sets different from doing GO Analysis on a list of differentially expressed genes?

• Statistical analysis can miss genes with small changes relative to noise that, as a group, can have significant impact on the observed difference in phenotype

– Use All Entities list as input for GSEA or GSA

• Instead of looking only at individual differentially expressed genes, take a genome-wide approach to see if gene sets are associated with the phenotypic class distinction

– Enrichment in GO Analysis is done with Fisher’s Exact test, while GSEA/GSA is done with a type of running-sum statistic

• User can specify any Entity List as gene sets in GeneSpring GX
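For contrast with the running-sum approach, the Fisher's Exact style of enrichment used for a list of differentially expressed genes reduces to a hypergeometric tail probability: how likely is it to see at least this many category genes in the list by chance? The sketch below is a one-sided version with assumed parameter names and made-up counts; it is an illustration of the test, not GeneSpring's code.

```python
from math import comb


def enrichment_p_value(n_universe, n_category, n_selected, n_overlap):
    """One-sided Fisher's exact (hypergeometric tail) p-value.

    Probability of drawing at least `n_overlap` category genes when
    `n_selected` genes are picked at random from `n_universe` genes,
    of which `n_category` belong to the category.
    """
    total = comb(n_universe, n_selected)
    upper = min(n_category, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Made-up counts: 20 genes on the array, 5 in the category,
# 4 genes of interest, 3 of which fall in the category.
p = enrichment_p_value(n_universe=20, n_category=5, n_selected=4, n_overlap=3)
```

This test only sees which genes made the cutoff, not how strongly each gene changed, which is the limitation the GSEA/GSA approach is meant to address.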


Identifiers Necessary for GSEA

Technology must contain Gene Symbol

Columns that must be marked in custom technology to perform GSEA:

• Annotation file must contain a column (Column X) containing Gene Symbol

– Column X must be marked “Gene Symbol”

– Select “Gene Symbol” mark from the drop-down menu while creating Custom technology.


Gene Sets

GSEA/GSA can use either Broad lists or any Entity Lists in GeneSpring

Broad Institute has defined five categories of gene sets:

• C1- Grouped based on cytogenetic location.

• C2- Functional lists. ~1000 gene lists corresponding to pathways or functional processes (for example, genes that are both involved in inflammatory response can be in the same list)

• C3- Regulation lists. Grouped by promoter analysis. Genes are regulated by the same motif (the transcription factor may or may not be known), including cases where genes simply share the same binding motif and are therefore assumed to be co-regulated.

• C4- Proximity to known oncogenes and tumor suppressors. For example, all the neighbors of BRCA.

• C5- GO gene sets. Each category is represented as a gene set except for very broad categories such as Biological Process and categories with fewer than 10 genes


Key Differences Between GSEA and GSA

The two algorithms share the same idea, but differ in the way they determine what gene sets are significantly enriched

• The GSA "maxmean" statistic differs: it is the mean of the positive or negative part of gene scores in the gene set, whichever is larger in absolute value. Efron and Tibshirani show that the method used in GSA is often more powerful than the modified Kolmogorov-Smirnov statistic used in GSEA.

• GSA uses a somewhat different null distribution for estimation of false discovery rates: it does "restandardization" of the genes, in addition to the permutation of samples (done in GSEA)

• GSA also can handle more than two conditions (limitation in GSEA)
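The maxmean statistic described above can be written in a few lines. This sketch follows the definition given on the slide, with both the positive and the negative part averaged over all genes in the set; the function name and the example scores are assumptions, and restandardization is not shown.

```python
def maxmean(scores):
    """Maxmean of per-gene scores for one gene set.

    Returns the mean positive part or the (negated) mean negative part,
    whichever is larger in absolute value.
    """
    m = len(scores)
    positive_part = sum(max(z, 0.0) for z in scores) / m
    negative_part = sum(max(-z, 0.0) for z in scores) / m
    return positive_part if positive_part >= negative_part else -negative_part


# Made-up gene scores for two small gene sets.
mostly_up = maxmean([2.0, 1.0, -0.5, 0.0])
mostly_down = maxmean([-3.0, 1.0])
```

Because each part is averaged over the whole set, a single extreme gene in a large set contributes little, while a set whose genes shift consistently in one direction scores highly.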