Pooling Data Across Microarray Experiments Using Different Versions of Affymetrix Oligonucleotide Arrays Jeffrey S. Morris Li Zhang, Chunlei Wu, Keith Baggerly and Kevin Coombes UT MD Anderson Cancer Center Houston, TX, USA

Page 1: Jeffrey S. Morris Li Zhang, Chunlei Wu, Keith Baggerly and Kevin Coombes

Pooling Data Across Microarray Experiments Using Different Versions of Affymetrix Oligonucleotide Arrays

Jeffrey S. Morris, Li Zhang, Chunlei Wu, Keith Baggerly, and Kevin Coombes

UT MD Anderson Cancer Center, Houston, TX, USA

Page 2

Combining Information across Microarray Studies

Many microarray data sets are publicly available. We can combine information across studies to:
1. Validate results from individual studies
   • Find the intersection of differentially expressed genes
   • Build a model using one study, validate using another
2. Discover new biological insights through analyses that pool data across studies
   • Potential for increased statistical power
   • Important since many individual studies are underpowered

Page 3

Pooling Data across Studies

Challenge: In general, microarray data from different studies are not comparable
  • Clinical differences
      • Different study populations
  • Technical differences
      • Laboratory differences: sample collection and storage, microarray protocol
      • Different platforms: cDNA/oligo, different versions of same technology (e.g. Affy chips)

Page 4

Pooling Data across Studies

Approaches in Existing Literature:
1. Include study effects in model
   • Gene-specific study effects
   • SVD, Distance-weighted Discrimination
   Drawback: First-order corrections are not enough
2. Model unitless summary measures
   • Standardized log fold-change
   • t-statistics
   • Probabilities of +/0/- expression
   Drawback: Implicit assumptions about comparability of clinical populations across studies
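As a minimal sketch of the first approach (gene-specific study effects), the Python below subtracts each gene's within-study mean from its log-expression values. The function name and the list-of-rows data layout are assumptions made for illustration. This is only the kind of first-order correction that the slide describes as insufficient.

```python
from collections import defaultdict

def center_by_study(expr, study):
    """Subtract each gene's per-study mean (a first-order study correction).

    expr  -- list of rows, one per sample, each a list of log-expression values
    study -- list of study labels, one per sample (hypothetical layout)
    """
    rows_by_study = defaultdict(list)
    for row, label in zip(expr, study):
        rows_by_study[label].append(row)

    # Per-study, per-gene means
    means = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in rows_by_study.items()
    }
    return [
        [value - m for value, m in zip(row, means[label])]
        for row, label in zip(expr, study)
    ]

# Toy data: two samples per study, two genes
expr = [[8.0, 5.0], [10.0, 7.0], [3.0, 1.0], [5.0, 3.0]]
study = ["Harvard", "Harvard", "Michigan", "Michigan"]
centered = center_by_study(expr, study)
```

After centering, every gene has mean zero within each study, so any remaining differences in spread or shape between studies are left uncorrected.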

Page 5

Pooling Data across Studies

Sequence-related reasons for incomparability of raw expression levels across platforms:
  • Cross-hybridization
  • RNA degradation (near 5’ end)
  • Probe validity: map to RefSeq?
  • Alternative splicing

By taking these into account, it may be possible to obtain more comparable raw expression levels to use in pooled analyses

Our focus: combining information across different versions of Affymetrix GeneChips

Page 6

Overview of Affymetrix GeneChips

Probes: 25-base sequences from gene of interest

Probesets: set of probes corresponding to same gene

Obtained from current sequence information in GenBank, UniGene, RefSeq

Generations of human chips:
  • HuGeneFL: 5,600 genes, 20 probes/gene
  • U95Av2: 10,000 genes, 16 probes/gene
  • U133A: 14,500 genes, 11 probes/gene

Page 7

Example: CAMDA Lung Cancer Data

CAMDA (“Critical Assessment of Microarray Data Analysis”): annual conference at Duke University

CAMDA 2003: Two studies relating gene expression data to survival in lung cancer patients
1. Harvard (Bhattacharjee, et al. 2001): 124 lung adenocarcinoma samples
2. Michigan (Beer, et al. 2002): 86 lung adenocarcinoma samples

GOAL: Pool data across studies to identify prognostic genes for lung cancer.

Page 8

Pooling Information across Studies

Harvard patients – worse prognosis

Page 9

Pooling Information across Chip Types

  • Michigan: HuGeneFL chip, 6,633 probe sets, 20 probe pairs each
  • Harvard: HG_U95Av2 chip, 12,453 probe sets, 16 probe pairs each

Problems:
  • Different genes
  • Incomparable expression levels

Page 10

“Partial Probeset” Method

[Diagram: probes on HuGeneFL and HG_U95Av2, with the matching probes marked]

“Partial Probesets”
1. Identify “matching probes”
2. Recombine into new probesets based on UniGene clusters, which we refer to as “partial probesets”
3. Eliminate any probesets containing just one or two probes

Note: Any quantification method can subsequently be used (MAS, dChip, RMA, PDNN)
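The three steps of the “partial probeset” construction can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes that “matching” means an identical probe sequence on both chips, that each probe comes with a UniGene cluster label, and that probes arrive as (sequence, cluster) pairs. The function name and the toy sequences (real probes are 25 bases long) are hypothetical.

```python
from collections import defaultdict

def build_partial_probesets(probes_a, probes_b, min_probes=3):
    """Group probes shared by two chip types into partial probesets.

    probes_a, probes_b -- iterables of (probe_sequence, unigene_cluster) pairs
                          (a hypothetical layout chosen for this sketch)
    min_probes         -- smallest probeset kept; 3 drops sets of one or two probes
    """
    sequences_b = {seq for seq, _ in probes_b}

    # Steps 1-2: keep probes whose sequence appears on both chips,
    # grouped by UniGene cluster
    grouped = defaultdict(set)
    for seq, cluster in probes_a:
        if seq in sequences_b:
            grouped[cluster].add(seq)

    # Step 3: eliminate probesets with fewer than min_probes probes
    return {
        cluster: sorted(seqs)
        for cluster, seqs in grouped.items()
        if len(seqs) >= min_probes
    }

# Toy example with shortened sequences
chip_a = [("AAAC", "Hs.1"), ("AAAG", "Hs.1"), ("AAAT", "Hs.1"),
          ("CCCA", "Hs.2"), ("CCCG", "Hs.2")]
chip_b = [("AAAC", "Hs.1"), ("AAAG", "Hs.1"), ("AAAT", "Hs.1"),
          ("CCCA", "Hs.2"), ("GGGG", "Hs.3")]
partial = build_partial_probesets(chip_a, chip_b)
```

In the toy data only cluster Hs.1 survives, because Hs.2 has a single matching probe. The resulting probe lists could then be passed to whichever quantification method is preferred.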

Page 11

Pooling Information across Chip Types

Page 12

Quantification of Expression Levels

Gene expression levels quantified by applying Li’s PDNN model to our partial probesets
  • Uses probe sequence information to predict patterns of specific and nonspecific hybridization intensities
  • Allows borrowing of strength across probe sets
  • Model is not overparameterized: O(N probesets)

See Zhang, et al. (2003) Nature Biotech for further details on method and comparison

Page 13

Detecting Outliers

  • Log-scale plots to detect outliers
  • Large spot detected on 4 Michigan chips (L54, L88, L89, L90)
  • Other outliers: 6 from Michigan, 2 from Harvard
  • Other preprocessing (remove low expression/normalize)
  • Matching clinical/microarray data for 200 patients (124 Harvard, 76 Michigan)

Page 14

Assessing Our Method for Combining Information Across Chip Types

“Partial Probeset” method appears to give comparable expression levels across chip types.

Page 15

Assessing our Method for Combining Information across Chip Types

Median “partial probeset” size is 7, vs. 16 or 20. Loss of precision?

No evidence of significant precision loss

Page 19

Assessing “Partial Probeset” Method

  • Agreement in relative quantifications across samples
  • Agreement is worse for less variable genes
  • Eliminate genes with sd<0.20 or r<0.90
  • 1,036 genes remain
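The filtering rule (drop genes with sd<0.20 or r<0.90) can be sketched as below. The slides do not spell out how sd and r were computed, so this sketch assumes that sd is the across-sample standard deviation of a gene's partial-probeset log-expression and that r is the Pearson correlation, across samples, between the full-probeset and partial-probeset quantifications. All function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def sample_sd(x):
    """Sample standard deviation (n - 1 denominator)."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / (len(x) - 1))

def keep_gene(full, partial, min_sd=0.20, min_r=0.90):
    """Apply the slide's thresholds under the assumptions stated above."""
    return sample_sd(partial) >= min_sd and pearson(full, partial) >= min_r

# Toy log-expression values for one gene across four samples
full = [5.0, 6.0, 7.0, 8.0]
agreeing = [5.1, 6.0, 7.2, 7.9]      # variable and well correlated
flat = [6.0, 6.05, 5.95, 6.0]        # too little variation
scrambled = [7.0, 5.0, 8.0, 6.0]     # variable but uncorrelated
```

Here `keep_gene(full, agreeing)` passes both thresholds, while the flat and scrambled versions each fail one of them.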

Page 20

Identifying Prognostic Genes

1. Preprocess raw microarray data
   • Outlier detection, normalization, quantification, remove some genes?
   • Left with an “n-by-p” matrix of expression levels for p genes on n microarrays
2. Identify which genes are correlated with the outcome of interest
   • Perform a standard statistical test for each gene and obtain (permutation) p-values
   • Find a “cutpoint” on the p-values to declare significance that accounts for multiplicities

Page 21

Identifying Prognostic Genes

After preprocessing: 1,036 genes, 200 samples

Identify genes related to survival
  • After adjusting for known clinical predictors
  • Provide prognostic information on survival above and beyond clinical predictors

Page 22

Identifying Prognostic Genes: Cox Regression Modeling

Hazard: $\lambda(t) \sim \mathrm{Prob}(X < t + \Delta t \mid X > t)$

Cox model: $\lambda_i(t) = \lambda_0(t) \exp(X_i \beta)$
  • $X_i$ = vector of covariates for subject $i$
  • $\beta$ = vector of regression coefficients

Key assumption: proportional hazards
  • Hazard ratio between subjects with different covariates does not vary over time: $\lambda_i(t) / \lambda_k(t) = \exp\{(X_i - X_k) \beta\}$
  • $\exp(\beta)$ = multiplicative change in hazard per unit change in $X$

Page 23

Identifying Prognostic Genes: Cox Regression Modeling

Best Clinical Model:

Factor                                    β      Exp(β)   Z      p
Study (Michigan = 0, Harvard = 1)         0.67   1.95     2.73   0.0062
Age                                       0.03   1.03     2.60   0.0094
Stage (Early (1-2) = 0, Late (3-4) = 1)   1.53   4.61     6.61   <0.000000001

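As a worked check of the clinical model, the hazard ratios can be recomputed from the coefficients by exponentiation. The coefficients below are the rounded values shown on the slide, so the results agree with the slide only to about two decimal places. The 10-year age comparison is an added illustration, not a number from the talk.

```python
import math

# Coefficients (beta) as read from the clinical model table on this slide
clinical_model = {"Study": 0.67, "Age": 0.03, "Stage": 1.53}

# exp(beta) is the hazard ratio per unit change in the covariate
hazard_ratios = {factor: math.exp(beta) for factor, beta in clinical_model.items()}

# Age is continuous, so a 10-year difference multiplies the hazard by exp(10 * beta)
ten_year_age_ratio = math.exp(10 * clinical_model["Age"])

for factor, ratio in hazard_ratios.items():
    print(f"{factor}: hazard ratio = {ratio:.2f}")
print(f"Age, 10-year difference: hazard ratio = {ten_year_age_ratio:.2f}")
```

Under this model, late-stage patients have roughly 4.6 times the hazard of early-stage patients, with study and age held fixed.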
Page 24

Identifying Prognostic Genes

Series of 1,036 multivariable Cox models fit to identify prognostic genes. Each model contained:
  • Study (Michigan=-1, Harvard=1)
  • Age (continuous factor)
  • Stage (early=0/late=1)
  • Probeset (log intensity value as continuous factor)

Exact p-values for each probeset computed using permutation approach

By using multivariable modeling, we search for genes offering prognostic information beyond clinical predictors
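The permutation idea can be illustrated with a deliberately simplified sketch. It is not the authors' procedure: it uses the Cox score statistic for a single covariate and ignores the clinical adjustment (study, age, stage) described above. The slides do not say how permutations were carried out within the multivariable model, so shuffling the expression values across patients is an assumption, as are the function names.

```python
import math
import random

def cox_score_z(times, events, x):
    """Score-test z statistic for one covariate in a Cox model, evaluated at beta = 0.

    times  -- follow-up times
    events -- 1 if the event was observed, 0 if censored
    x      -- covariate values (e.g. log expression of one probeset)
    """
    u = v = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue
        # Risk set: subjects still under observation at time t_i
        risk = [x[j] for j in range(len(times)) if times[j] >= t_i]
        m = sum(risk) / len(risk)
        u += x[i] - m
        v += sum((r - m) ** 2 for r in risk) / len(risk)
    return u / math.sqrt(v) if v > 0 else 0.0

def permutation_p_value(times, events, x, n_perm=1000, seed=0):
    """Two-sided permutation p-value obtained by shuffling x across subjects."""
    rng = random.Random(seed)
    observed = abs(cox_score_z(times, events, x))
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(cox_score_z(times, events, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)

# Toy data: higher expression goes with earlier events
times = list(range(1, 21))
events = [1] * 20
expression = [21 - t for t in times]
p_value = permutation_p_value(times, events, expression, n_perm=500, seed=1)
```

Repeating this for every probeset yields one p-value per gene, which is the input the multiplicity step needs.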

Page 25

Identifying Prognostic Genes: BUM Method

  • No prognostic genes: p-values are Uniform
  • Prognostic genes: smaller p-values
  • Fit Beta-Uniform mixture to histogram of p-values: “BUM” method (Pounds and Morris, 2003 Bioinformatics)
  • Method can be used to identify prognostic genes while controlling FDR
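A minimal sketch of the beta-uniform mixture idea follows. It assumes the mixture density f(p) = λ + (1 − λ) a p^(a−1) with 0 < a < 1, fits it by a coarse grid search rather than a proper optimizer, and derives the p-value cutoff by setting the estimated FDR equal to the target level. The function names, the grid search, and the example parameter values are illustrative assumptions, not the published implementation.

```python
import math

def bum_log_likelihood(pvals, a, lam):
    """Log-likelihood of f(p) = lam + (1 - lam) * a * p**(a - 1)."""
    return sum(math.log(lam + (1.0 - lam) * a * p ** (a - 1.0)) for p in pvals)

def fit_bum(pvals, steps=40):
    """Coarse grid-search maximum likelihood over 0 < a < 1 and 0 <= lam < 1."""
    clipped = [min(max(p, 1e-12), 1.0) for p in pvals]  # avoid 0 ** negative
    best_ll, best_a, best_lam = -math.inf, None, None
    for i in range(1, steps):
        a = i / steps
        for j in range(steps):
            lam = j / steps
            ll = bum_log_likelihood(clipped, a, lam)
            if ll > best_ll:
                best_ll, best_a, best_lam = ll, a, lam
    return best_a, best_lam

def fdr_threshold(a, lam, alpha):
    """p-value cutoff tau at which the estimated FDR equals alpha.

    Uses pi_ub = lam + (1 - lam) * a (the fitted density at p = 1) as an upper
    bound on the null proportion, and solves
        pi_ub * tau / (lam * tau + (1 - lam) * tau**a) = alpha
    for tau.
    """
    pi_ub = lam + (1.0 - lam) * a
    ratio = (pi_ub - alpha * lam) / (alpha * (1.0 - lam))
    return min(1.0, ratio ** (1.0 / (a - 1.0)))

# Hypothetical fitted parameters, for illustration only
tau = fdr_threshold(a=0.5, lam=0.8, alpha=0.2)
```

Probesets with p-values below `tau` would be flagged. The fitted parameters for the lung cancer data are not given in the slides, so the example values here are not expected to reproduce the reported cutoff.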

Page 26

Results

  • Histogram suggests there are some significant probesets
  • FDR=0.20 corresponds to a p-value cutoff of 0.0024 (BUM, Pounds and Morris 2003)
  • 26 probesets flagged as significant

Page 27

Selected Flagged Genes

Rank  Gene    β      p         Function
1     FCGRT   -2.07  <0.00001  Induced by IF- in treating SCLC
2     ENO2     1.46  0.00001   Marker of NSCLC
4     RRM1     1.81  0.00002   Linked to survival in NSCLC
8     CHKL    -1.43  0.00010   Marker of NSCLC
11    CPE      0.72  0.00031   Marker of SCLC
12    ADRBK1  -2.20  0.00044   Co-expressed with Cox-2 in lung ADC
16    CLU     -0.52  0.00109   Marker of SCLC
20    SEPW1   -1.29  0.00145   H2O2 cytotox. in NSCLC cell lines
21    FSCN1    0.66  0.00150   Marker of invasiveness in Stg 1 NSCLC
25    BTG2    -0.75  0.00232   Induced by p53 in SCLC cell lines

Page 30

Selected Flagged Genes

Rank  Gene    β      p         pStage  Function
1     FCGRT   -2.07  <0.00001  0.154   Induced by IF- in treating SCLC
2     ENO2     1.46  0.00001   0.282   Marker of NSCLC
4     RRM1     1.81  0.00002   0.321   Linked to survival in NSCLC
8     CHKL    -1.43  0.00010   0.979   Marker of NSCLC
11    CPE      0.72  0.00031   0.088   Marker of SCLC
12    ADRBK1  -2.20  0.00044   0.484   Co-expressed with Cox-2 in PUC
16    CLU     -0.52  0.00109   0.014   Marker of SCLC
20    SEPW1   -1.29  0.00145   0.028   H2O2 cytotox. in NSCLC cell lines
21    FSCN1    0.66  0.00150   0.082   Marker of invasiveness in Stage 1 NSCLC

Page 34

Selected Flagged Genes

Rank  Gene   β      p        pStage  Function
3     NFRKB  -2.81  0.00001  0.058   Amplified in AML
7     ATIC    1.81  0.00009  0.771   Fusion partner of ALK which defines subtype of ALCL
13    BCL9   -1.64  0.00069  0.057   Over-expressed in ALL
15    TPS1   -0.64  0.00107  0.882   Associated with pulmonary inflammation
25    BTG2   -0.75  0.00232  0.726   Inhibits cell proliferation in primary mouse embryo fibroblasts lacking functional p53

Page 37

Results

Our gene list has almost no overlap with other publications of these data. Reasons:
1. We addressed a different research question
   • Us: Identify genes offering prognostic information beyond clinical predictors
   • Michigan: Univariate Cox models fit; results used to construct dichotomous “risk index”
   • Harvard: Cluster analysis done; clusters linked to survival; found genes driving the clustering
2. Pooling across studies yielded significant gains in statistical power
   • Most genes (17/26) in our study are not flagged if we analyze the 2 data sets separately (i.e. no pooling)

Page 38

CAMDA 2003 Results

We Won!

Page 39

Limitations of Partial Probeset Method

  • Worked well for combining across HuGeneFL/U95Av2
      • ~25% of probes from HuGeneFL are on U95Av2, with 4,101 probesets
  • Not enough matching probes for use with U95Av2/U133A
      • ~6% of probes from U95Av2 are also on U133A, with only 628 probesets
  • Requiring matching probes is a strong criterion; maybe a weaker criterion would suffice?

Page 40

Alternative Splicing

Diagram of C2GnT I gene organization and different mRNA variants of this gene that are differentially expressed across tissue types. From Falkenberg, et al. (2003) Glycobiology 13(6), 411-418.

Page 41

Full-Length Transcript Based Probesets

New probeset definition (FLTBP): probes match the same set of full-length mRNA sequences

Procedure
1. Construct comprehensive library of full-length mRNA transcript sequences from RefSeq and H-InvDB
2. For each probe, identify all matching full-length transcripts using the BLAST program
   • U95Av2: 15% matched no sequence, 33% matched multiple sequences
   • U133A: 18% matched no sequence, 38% matched multiple sequences
3. Group probes with same matched target lists (FLTBPs)
   • U95Av2: 23,972 probesets; U133A: 14,148 probesets
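Step 3 of the procedure, together with the cross-chip matching that follows from it, can be sketched as below. The sketch assumes the BLAST results are already available as a mapping from probe ID to matched transcript IDs, and that probes matching no transcript are left out. Both are assumptions, as are the function names and the toy identifiers.

```python
from collections import defaultdict

def group_probes_by_targets(probe_hits):
    """Group probes that match exactly the same set of full-length transcripts.

    probe_hits -- dict mapping probe ID to an iterable of matched transcript IDs
                  (hypothetical layout; in practice derived from BLAST output)
    """
    groups = defaultdict(list)
    for probe_id, transcripts in probe_hits.items():
        targets = frozenset(transcripts)
        if not targets:
            continue  # assumption: probes matching no sequence are excluded
        groups[targets].append(probe_id)
    return dict(groups)

def match_across_chips(groups_a, groups_b):
    """Pair probesets from two chip types that share an identical target list."""
    shared = groups_a.keys() & groups_b.keys()
    return {targets: (groups_a[targets], groups_b[targets]) for targets in shared}

# Toy BLAST-style results with made-up probe and transcript IDs
hits_u95 = {"p1": ["T1", "T2"], "p2": ["T2", "T1"], "p3": ["T3"], "p4": []}
hits_u133 = {"q1": ["T1", "T2"], "q2": ["T4"]}

groups_u95 = group_probes_by_targets(hits_u95)
groups_u133 = group_probes_by_targets(hits_u133)
matched = match_across_chips(groups_u95, groups_u133)
```

Using a `frozenset` as the key makes the order of transcript IDs irrelevant, so probes p1 and p2 fall into the same probeset.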

Page 42

Full-Length Transcript Based Probesets

Matching across chip types:
  • 9,642 FLTBPs match across U95Av2 and U133A
  • Affymetrix has its own method for mapping probesets across arrays: 9,480 pairs of probesets (only about ½ map the same way as FLTBPs)

Example: Lung cancer cell line data
  • 28 cell lines, each hybridized onto both U95Av2 and U133A arrays
  • Paired design suggests any differences between paired measurements are due to technical, not biological, sources
  • Different quantification methods (PDNN, RMA, MAS, dChip)
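With a paired design like this, agreement between chip types can be summarized gene by gene as the correlation, across cell lines, of the two chips' log-expression values. The sketch below assumes each chip's data is a mapping from gene to a list of values in the same sample order. The names and toy numbers are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def per_gene_correlations(chip_a, chip_b):
    """Chip-to-chip correlation for every gene measured on both chip types.

    chip_a, chip_b -- dicts mapping gene to log-expression values, with the
                      same sample (cell line) order on both chips
    """
    shared = chip_a.keys() & chip_b.keys()
    return {gene: pearson(chip_a[gene], chip_b[gene]) for gene in sorted(shared)}

# Toy data: four paired samples, two shared genes
u95 = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [1.0, 2.0, 3.0, 4.0]}
u133 = {"g1": [2.0, 4.0, 6.0, 8.0], "g2": [4.0, 3.0, 2.0, 1.0], "g3": [5.0, 5.0, 6.0, 7.0]}

correlations = per_gene_correlations(u95, u133)
r_squared = {gene: r ** 2 for gene, r in correlations.items()}
```

Computing these correlations once per probeset definition and once per quantification method gives the distributions that can then be compared.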

Page 43

Results

  • Density estimate of chip-to-chip correlations for each gene
  • Positive shift for FLTBP suggests better correlations
  • Improvement greatest for PDNN
  • Correlation still not perfect

[Figure: four density plots (x-axis: gene-gene correlation, y-axis: density). Panels: A, PDNN (P<0.00001); B, RMA (P<0.00031); C, MAS5 (P<0.00575); D, dChip (P<0.00005)]

Page 44

Example: Sample Gene 1

[Figure: two plots of Ln(probe signal) against sample, and a scatterplot of U95Av2 against U133A values annotated with R2=0.84 and R2=0.33]

  • Plot of probe signals for two chip types (Red=FLTBP)
  • Scatterplot of log-expression values for each sample across the two chip types (Black=all probes, Red=FLTBP)
  • Correlation across chips significantly improved with FLTBP

Page 45

Example: Sample Gene 2

[Figure: two plots of Ln(probe signal) against sample, and a scatterplot of U133A against U95Av2 values annotated with R2=0.87 and R2=0.25]

Again, significantly higher correlation using FLTBP than using Affymetrix’ definition

Page 46

Results

  • Boxplot of chip-to-chip correlations (over genes) for each sample
  • PDNN resulted in higher correlations

[Figure: boxplots of sample correlations (y-axis from 0.6 to 1.0) comparing FLTBP and Affy PS for each of PDNN, RMA, MAS5, and dChip]

Page 47

Conclusions

  • New method for pooling information across studies using different versions of Affymetrix chips
  • Recombine matched probes into new probesets using UniGene clusters
  • Method appears to obtain comparable expression levels across chips without sacrificing much precision or significantly altering the relative ordering of the samples
  • Worked well combining information across HuGeneFL/U95Av2, but not U95Av2/U133A

Page 48

Conclusions

  • Discussed new probeset definition based on full-length transcript sequences
      • Removes effect of known alternative splicing
      • Yields stronger between-chip correlations than Affymetrix standard definitions
  • Pooling information across studies is difficult, and there is still more work to be done, but it is worth the effort

Page 49

References

Morris JS, Yin G, Baggerly KA, Wu C, and Zhang L (2005). Pooling Information Across Different Studies and Oligonucleotide Microarray Chip Types to Identify Prognostic Genes for Lung Cancer. Methods of Microarray Data Analysis IV, eds. JS Shoemaker and SM Lin, pp. 51-66, New York: Springer-Verlag.

Wu C, Morris JS, Baggerly KA, Coombes KR, Minna JD, and Zhang L (2005). A probe-to-transcripts mapping method for cross-platform comparisons of microarray data taking into account the effects of alternative splicing. Under review.

Morris JS, Wu C, Coombes KR, Baggerly KA, Wang J, and Zhang L (2006). Alternative Probeset Definitions for Combining Microarray Data Across Studies Using Different Versions of Affymetrix Oligonucleotide Arrays. To appear in Meta-Analysis in Genetics, edited by Rudy Guerra and David Allison, Chapman-Hall.