Matching the Pemetrexed Heatmap
Keith A. Baggerly
September 24, 2009
Contents
1 Executive Summary 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Options and Libraries 2
3 Loading and Parsing Data 23.1 Earlier Rda Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
4 Matching the Reported Genes 2
5 Drawing Heatmaps of the Reported (and Offset) Genes 35.1 A Heatmap of the Reported Genes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35.2 A Heatmap of the Reported Genes, Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
6 Using Correlation to Guess Starting Cell Lines 3
7 Steepest Ascent with Binreg 37.1 Exporting the Novartis A NCI-60 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77.2 Exporting the Starting Guess . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77.3 Invoking Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
8 Loading Matlab Output and Comparing Gene Lists 168.1 Loading Scores from Each Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168.2 Extracting the Cell Lines Used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168.3 Loading the Heatmap Genes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178.4 Comparing Gene Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9 Save Rda File 17
10 Appendix 1810.1 File Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1810.2 Saves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1810.3 SessionInfo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1
matchPemetrexedHeatmap.Rnw 2
List of Figures
1 Heatmap of centered expression values from the Novartis A set expression data for the peme-trexed probesets reported by Hsu et al. [3] As these genes were chosen to separate sensitivefrom resistant lines, we expect to see fairly pervasive structure separating two subgroups. Wedo not see this structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Heatmap of centered expression values from the Novartis A set expression data for the probe-sets reported by Hsu et al. [3] after “offsetting” by one row. As genes were chosen to separatesensitive from resistant lines, we expect to see fairly pervasive structure separating two sub-groups. We see this structure here, suggesting that the reported list is incorrect due to anindexing error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Heatmap of correlations between samples using the offset probesets from Figure 2. The clearestgroups of cell lines are those in the upper right and lower left. . . . . . . . . . . . . . . . . . 6
4 First iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. The shift producing the maximum score (moving TK-10from Unused to Resistant) is indicated with an offset circle. This shift increases the score from58 to 66. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
5 Second iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. The shift producing the maximum score (moving SW-620from Resistant to Unused) is indicated with an offset circle. This shift increases the score from66 to 71. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6 Third iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. The shift producing the maximum score (moving SNB-75from Sensitive to Unused) is indicated with an offset circle. This shift increases the score from71 to 73. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
7 Fourth iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increasesor decreses as a given change is made. The shift producing the maximum score (movingKM12 from Resistant to Unused) is indicated with an offset circle. This shift leaves the scoreunchanged at 73. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
8 Fifth iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. The shift producing the maximum score (moving SF-258from Sensitive to Unused) is indicated with an offset circle. This shift increases the score from73 to 77. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
9 Sixth iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decresesas a given change is made. The shift producing the maximum score (moving CCRF-CEM fromResistant to Unused) is indicated with an offset circle. This shift increases the score from 77to 85. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
10 Seventh iteration to find an overall maximum for pemetrexed. White asterisks indicate thestatus of each cell line using the starting guess, and colors indicate how the score increases ordecreses as a given change is made. Any shift from the current guess decreases the score. Ourbest score is 85 (a perfect match). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
matchPemetrexedHeatmap.Rnw 3
11 Heatmap for the pemetrexed signature using the final set of cell lines obtained through oursearch procedure. This heatmap is a perfect match for that in Figure 1B of Hsu et al. [3],indicating that these are the cell lines and genes involved. . . . . . . . . . . . . . . . . . . . 15
List of Tables
1 Executive Summary
1.1 Introduction
Hsu et al. [3] construct a signature for pemetrexed using the NCI-60 cell lines. They note that pemetrexedand cisplatin sensitivity appear to be inversely correlated in NSCLC, suggesting that patients resistant toone would likely respond to the other. In this report, we outline our reconstruction of the heatmap provided,and our inferences about the specific genes and cell lines involved.
1.2 Methods
We loaded two previously constructed Rda files: novartisA and hsuReportedGeneLists. We extracted quan-tifications for the reported probesets, and examined heatmaps to see if there was clear separation of sensitiveand resistant cell lines. We repeated this procedure after “offsetting” the probesets by one row to accountfor possible problems with binreg. We then constructed pairwise sample correlation matrices to suggest thespecific cell lines used in each group. Starting from this initial guess, we then examined all other sets of celllines in a local “neighborhood” and assigned each set a “score” of the number of reported probesets (offset ornot) that were matched. We shifted to the set with the highest score in the neighborhood and repeated theprocess until a local maximum was reached. This scoring makes use of Matlab scripts which call the binregsoftware; the primary file is buildPemetrexedHeatmap.m.
1.3 Results
Using probesets“off-by-one” from those reported, we were able to perfectly reconstruct the heatmap reported,thus identifying the specific genes and cell lines involved. The cell lines areResistant (8): K-562, MOLT-4, HL-60(TB), MCF7, HCC-2998, HCT-116, NCI-H460, and TK-10,Sensitive (10): SNB-19, HS 578T, MDA-MB-231/ATCC, MDA-MB-435, NCI-H226, M14, MALME-3M,SKMEL-2, SK-MEL-28, and SN12C. We match all 85 of the genes reported after offsetting.
The various intermediate files and gene lists for pemetrexed are saved in RDataObjects as pemetrexedAll.Rda.
1.4 Conclusions
The reported signature for pemetrexed is incorrect due to an off-by-one indexing error.
2 Options and Libraries
> options(width = 80)
matchPemetrexedHeatmap.Rnw 4
3 Loading and Parsing Data
3.1 Earlier Rda Files
We begin by loading two Rda files assembled earlier: novartisA and hsuReportedGenelists.
> rdaList <- c("novartisA", "hsuReportedGenelists")
> for (rdaFile in rdaList) {
+ rdaFullFile <- file.path("RDataObjects", paste(rdaFile, "Rda",
+ sep = "."))
+ if (file.exists(rdaFullFile)) {
+ cat("loading ", rdaFullFile, " from cache\n")
+ load(rdaFullFile)
+ }
+ else {
+ cat("building ", rdaFullFile, " from raw data\n")
+ Stangle(file.path("RNowebSource", paste("buildRda", rdaFile,
+ "Rnw", sep = ".")))
+ source(paste("buildRda", rdaFile, "R", sep = "."))
+ }
+ }
loading RDataObjects/novartisA.Rda from cacheloading RDataObjects/hsuReportedGenelists.Rda from cache
4 Matching the Reported Genes
Hsu et al. [3] used 18 cell lines to form their pemetrexed signature: 8 resistant and 10 sensitive. Fromthese, the binreg software was used to select the 85 genes having the most extreme two-sample t-test valuesseparating resistant from sensitive.
We loaded the list of reported genes above. We now check that all of these genes reported are present inthe novartisA data matrix.
> unmatchedRows <- which(is.na(match(pemetrexedReportedProbesets[,
+ "probesetID"], rownames(novartisA))))
> unmatchedRows
integer(0)
> matchedPemetrexedProbesets <- rownames(pemetrexedReportedProbesets)
We match all 85 of the reported probesets.
5 Drawing Heatmaps of the Reported (and Offset) Genes
5.1 A Heatmap of the Reported Genes
We now draw a heatmap showing the expression levels of the reported genes across all of the samples.Since these genes were chosen specifically to separate one group of cell lines from another, we expect to see
matchPemetrexedHeatmap.Rnw 5
some clear differential structure. We center and scale the results for each row (probeset) before display (theheatmap function in R does this scaling by default). We also use an analog of the Matlab jet colormap. Theheatmap is shown in Figure 1. There is essentially no separating structure visible.
5.2 A Heatmap of the Reported Genes, Offset
Coombes et al. [1] noted that all of the gene lists initially reported by Potti et al. [4] had been “offset by one”due to an indexing error. We try introducing the same type of offset (e.g., replacing 1100 at with 1101 at)and redrawing the heatmap. This heatmap is shown in Figure 2. There is now clearly visible structureseparating groups of cell lines. This suggests that the probesets shown were part of the “true” signature, andthat the reported list is incorrect due to the same indexing error that affected the initial gene lists reportedby Potti et al. [4].
6 Using Correlation to Guess Starting Cell Lines
We know of no simple analytic way of determining which cell lines were involved, and the names are notgiven in Hsu et al. [3] or in their supplementary material. We rely instead on a combination of informedguessing and brute force. We begin by looking at correlations between samples using the expression valuesfor the offset genes identified above. A heatmap is shown in Figure 3. Looking at the correlation heatmapsuggests some clear candidates for the larger (sensitive) group: the 12 samples in the cluster at the topright. Reading from the top, these are SNB-75, HS 578T, SNB-19, SF-268, NCI-H226, SK-MEL-2, M14,MALME-3M, SK-MEL-28, MDA-MB-231/ATCC, SN12C, and MDA-MB-435. Likewise, the 10 samples inthe cluster at the bottom left are good candidates for the smaller (resistant) group. Reading from the top,these are SW-620, NCI-H460, KM12, HCT-116, HCC-2998, K-562, CCRF-CEM, HL-60(TB), MCF7, andMOLT-4.
> startingSensitiveLines <- c("SNB-75", "HS 578T", "SNB-19", "SF-268",
+ "NCI-H226", "SK-MEL-2", "M14", "MALME-3M", "SK-MEL-28", "MDA-MB-231/ATCC",
+ "SN12C", "MDA-MB-435")
> startingResistantLines <- c("SW-620", "NCI-H460", "KM12", "HCT-116",
+ "HCC-2998", "K-562", "CCRF-CEM", "HL-60(TB)", "MCF7", "MOLT-4")
The starting lists involve 12 and 10 cell lines, respectively, whereas Hsu et al. [3] use 10 and 8.
7 Steepest Ascent with Binreg
In order to improve our guess, we use a process of steepest ascent with the binreg software used by Potti etal. [4] Specifically, we give each set of cell lines a score corresponding to the number of target probesets wecan match. From our starting set, we score all sets that can be reached by changing the status of at mostone cell line (e.g., from sensitive to resistant, from sensitive to excluded, from excluded to resistant). Wethen shift our central location to the set in the neighborhood with the highest score and repeat until a localmaximum is reached.
7.1 Exporting the Novartis A NCI-60 Data
To conduct our search, we first export the novartisA NCI-60 in a format usable by binreg.
matchPemetrexedHeatmap.Rnw 6
> tempMat <- log2(novartisA[matchedPemetrexedProbesets, ])
> source(file.path("Scripts", "jet.colors.R"))
> heatmap(as.matrix(tempMat), col = jet.colors(64), margins = c(7,
+ 4), cexRow = 0.45, cexCol = 0.65)M
CF
7T
−47
DH
CT
−11
6K
M12
HC
T−
15R
PM
I−82
26K
−56
2H
L−60
(TB
)S
RC
CR
F−
CE
MM
OLT
−4
SW
−62
0O
VC
AR
−4
OV
CA
R−
3H
S 5
78T
SN
B−
75S
F−
295
SF
−53
9N
CI−
H32
2MO
VC
AR
−5
A49
8U
O−
31E
KV
XA
549/
ATC
CR
XF
393
786−
0IG
RO
V1
OV
CA
R−
8N
CI/A
DR
−R
ES
HT
29D
U−
145
SN
12C
AC
HN
TK
−10
CA
KI−
1N
CI−
H22
6LO
X IM
VI
NC
I−H
23M
DA
−M
B−
231/
ATC
CH
OP
−92
HO
P−
62S
F−
268
BT
−54
9P
C−
3S
K−
ME
L−5
UA
CC
−25
7U
AC
C−
62N
CI−
H52
2U
251
SN
B−
19C
OLO
205
HC
C−
2998
NC
I−H
460
SK
−O
V−
3M
DA
−M
B−
435
M14
SK
−M
EL−
2M
ALM
E−
3MS
K−
ME
L−28
31545_at37746_r_at32748_at241_g_at40864_at33451_s_at41833_at39247_at40393_at36584_at40683_at40952_at797_at31462_f_at32892_at41448_at33377_at1227_g_at37484_at34245_at41402_at41643_at39350_at41442_at36191_at41459_at31537_at33144_at1355_g_at41738_at38545_at39799_at40327_at36118_at590_at33918_s_at41853_at39797_at33613_at39543_at40212_at32317_s_at37374_at41127_at37344_at34318_at242_at41435_at37744_r_at32225_at35699_at38908_s_at39328_at33854_at35762_at40492_at32835_at32259_at38404_at38119_at39329_at35747_at32573_at39018_at34859_at39169_at1100_at40821_at32433_at39749_at35351_at38287_at31510_s_at1318_at40854_at32251_at33361_at35434_at39149_at41757_at36535_at41234_at36988_at34858_at38478_at
Figure 1: Heatmap of centered expression values from the Novartis A set expression data for the pemetrexedprobesets reported by Hsu et al. [3] As these genes were chosen to separate sensitive from resistant lines, weexpect to see fairly pervasive structure separating two subgroups. We do not see this structure.
matchPemetrexedHeatmap.Rnw 7
> offsetPemetrexedProbesets <- rownames(novartisA)[match(matchedPemetrexedProbesets,
+ rownames(novartisA)) + 1]
> tempMat <- log2(novartisA[offsetPemetrexedProbesets, ])
> heatmap(as.matrix(tempMat), col = jet.colors(64), margins = c(7,
+ 4), cexRow = 0.45, cexCol = 0.65)R
PM
I−82
26 SR
HT
29H
L−60
(TB
)K
−56
2H
CC
−29
98C
CR
F−
CE
MM
OLT
−4
MC
F7
HC
T−
116
KM
12N
CI−
H46
0T
−47
DN
CI−
H32
2MN
CI−
H23
NC
I−H
522
DU
−14
5IG
RO
V1
SK
−O
V−
3T
K−
10O
VC
AR
−4
OV
CA
R−
3H
CT
−15
SW
−62
0P
C−
3O
VC
AR
−5
A54
9/AT
CC
EK
VX
A49
8R
XF
393
AC
HN
786−
0C
OLO
205
NC
I/AD
R−
RE
SC
AK
I−1
OV
CA
R−
8S
K−
ME
L−5
UA
CC
−25
7U
AC
C−
62LO
X IM
VI
BT
−54
9S
N12
CH
OP
−92
MD
A−
MB
−23
1/AT
CC
UO
−31
HO
P−
62U
251
SN
B−
19S
K−
ME
L−2
M14
MA
LME
−3M
MD
A−
MB
−43
5S
K−
ME
L−28
NC
I−H
226
SF
−26
8H
S 5
78T
SF
−29
5S
F−
539
SN
B−
75
40328_at1319_at38288_at41739_s_at39544_at32252_at40394_at40855_at1101_at38405_at34319_at37485_at39248_at33145_at41854_at38909_at36536_at35352_at33452_at38546_at40213_at38120_at33362_at591_s_at242_at39750_at37375_at37745_s_at32893_s_at41443_at41128_at1228_s_at33855_at41460_at356_at35763_at40684_at41436_at33378_at798_at40822_at35435_s_at40865_at32318_s_at31511_at33614_at39798_at31538_at32749_s_at41235_at39800_s_at35748_at38479_at31463_s_at31546_at40953_at36119_at41449_at34246_at33919_at32434_at36192_at39150_at39351_at41644_at32574_at40493_at36989_at39170_at37345_at1356_at41403_at41834_g_at36585_at39019_at34860_g_at34859_at32836_at39330_s_at39329_at32260_at243_g_at32226_at41758_at37747_at
Figure 2: Heatmap of centered expression values from the Novartis A set expression data for the probesetsreported by Hsu et al. [3] after “offsetting” by one row. As genes were chosen to separate sensitive fromresistant lines, we expect to see fairly pervasive structure separating two subgroups. We see this structurehere, suggesting that the reported list is incorrect due to an indexing error.
matchPemetrexedHeatmap.Rnw 8
> pemetrexedCor <- cor(t(scale(t(tempMat))))
> heatmap(pemetrexedCor, scale = "none", margins = c(7, 7), cexCol = 0.7,
+ cexRow = 0.7, col = jet.colors(64))M
OLT
−4
MC
F7
HL−
60(T
B)
CC
RF
−C
EM
K−
562
HC
C−
2998
HC
T−
116
KM
12N
CI−
H46
0S
W−
620
RP
MI−
8226 SR
HC
T−
15T
K−
10IG
RO
V1
SK
−O
V−
3O
VC
AR
−3
OV
CA
R−
4D
U−
145
CO
LO 2
05P
C−
3T
−47
DN
CI−
H23
NC
I−H
522
HT
29N
CI−
H32
2MA
549/
ATC
CLO
X IM
VI
EK
VX
OV
CA
R−
5S
K−
ME
L−5
UA
CC
−25
7U
AC
C−
62C
AK
I−1
OV
CA
R−
8N
CI/A
DR
−R
ES
AC
HN
RX
F 3
93A
498
786−
0H
OP
−62
SF
−29
5S
F−
539
BT
−54
9H
OP
−92
UO
−31
U25
1M
DA
−M
B−
435
SN
12C
MD
A−
MB
−23
1/AT
CC
SK
−M
EL−
28M
ALM
E−
3M M14
SK
−M
EL−
2N
CI−
H22
6S
F−
268
SN
B−
19H
S 5
78T
SN
B−
75
MOLT−4MCF7HL−60(TB)CCRF−CEMK−562HCC−2998HCT−116KM12NCI−H460SW−620RPMI−8226SRHCT−15TK−10IGROV1SK−OV−3OVCAR−3OVCAR−4DU−145COLO 205PC−3T−47DNCI−H23NCI−H522HT29NCI−H322MA549/ATCCLOX IMVIEKVXOVCAR−5SK−MEL−5UACC−257UACC−62CAKI−1OVCAR−8NCI/ADR−RESACHNRXF 393A498786−0HOP−62SF−295SF−539BT−549HOP−92UO−31U251MDA−MB−435SN12CMDA−MB−231/ATCCSK−MEL−28MALME−3MM14SK−MEL−2NCI−H226SF−268SNB−19HS 578TSNB−75
Figure 3: Heatmap of correlations between samples using the offset probesets from Figure 2. The clearestgroups of cell lines are those in the upper right and lower left.
matchPemetrexedHeatmap.Rnw 9
> affyControlRows = grep("^AFFX", rownames(novartisA))
> write.table(novartisA[-affyControlRows, ], file = file.path("MatlabFiles",
+ "NCI60Data", "nci60_numbers.csv"), sep = ",", row.names = FALSE,
+ col.names = FALSE)
> write.table(rownames(novartisA)[-affyControlRows], file = file.path("MatlabFiles",
+ "NCI60Data", "nci60_probesets.csv"), sep = ",", row.names = FALSE,
+ col.names = FALSE, quote = FALSE)
> write.table(colnames(novartisA), file = file.path("MatlabFiles",
+ "NCI60Data", "nci60_samples.csv"), sep = ",", row.names = FALSE,
+ col.names = FALSE, quote = FALSE)
7.2 Exporting the Starting Guess
We likewise export our starting guesses for the resistant and sensitive cell lines, and the target probeset idsthat we want to match.
> write.table(startingResistantLines, file = file.path("MatlabFiles",
+ "Pemetrexed", "startingResistantLines.csv"), sep = ",", row.names = FALSE,
+ col.names = FALSE, quote = FALSE)
> write.table(startingSensitiveLines, file = file.path("MatlabFiles",
+ "Pemetrexed", "startingSensitiveLines.csv"), sep = ",", row.names = FALSE,
+ col.names = FALSE, quote = FALSE)
> write.table(offsetPemetrexedProbesets, file = file.path("MatlabFiles",
+ "Pemetrexed", "targetProbesets.csv"), sep = ",", row.names = FALSE,
+ col.names = FALSE, quote = FALSE)
7.3 Invoking Matlab
The main Matlab script is buildPemetrexedHeatmap.m. This script iteratively explores the neighborhoodto increase the score until a maximum is found. At each iteration, it produces a figure indicating how thecurrent guess should be changed. For pemetrexed, this search and change process takes seven steps, asillustrated in Figures 4-10. The baseline score is 58 in agreement. The biggest jump (to 66) is seenwhen we add TK-10 to the resistant group. The biggest jump in the second neighborhood (to 71) is seenwhen we drop SW-620 from the resistant group. The biggest jump in the third neighborhood (to 73) is seenwhen we drop SNB-75 from the sensitive group. There is no change in the fourth neighborhood that causesan increase, and only one alteration to the list that leaves things at 73 – dropping KM12 from the resistantgroup (we make this alteration). The biggest jump in the fifth neighborhood (and there is one, to 77) is seenwhen we drop SF-268 from the sensitive group. The biggest jump in the sixth neighborhood (to 85, a perfectscore) is seen when we drop CCRF-CEM from the resistant group. Exploring the seventh neighborhoodshows that we are indeed at a maximum; any further alteration of the list drops the score. The numbers ofresistant and sensitive lines now match those shown by Hsu et al. [3]
After reaching a local maximum, we produce a heatmap using the final set of cell lines selected. Thisheatmap, shown in Figure 11, perfectly matches the one shown in Figure 1B of Hsu et al. [3]
8 Loading Matlab Output and Comparing Gene Lists
8.1 Loading Scores from Each Iteration
We begin by loading the numerical scores and starting vectors for each iteration of the fitting procedure.
matchPemetrexedHeatmap.Rnw 10
Figure 4: First iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving TK-10 from Unused to Resistant) isindicated with an offset circle. This shift increases the score from 58 to 66.
matchPemetrexedHeatmap.Rnw 11
Figure 5: Second iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving SW-620 from Resistant to Unused) isindicated with an offset circle. This shift increases the score from 66 to 71.
matchPemetrexedHeatmap.Rnw 12
Figure 6: Third iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving SNB-75 from Sensitive to Unused) isindicated with an offset circle. This shift increases the score from 71 to 73.
matchPemetrexedHeatmap.Rnw 13
Figure 7: Fourth iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving KM12 from Resistant to Unused) isindicated with an offset circle. This shift leaves the score unchanged at 73.
matchPemetrexedHeatmap.Rnw 14
Figure 8: Fifth iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving SF-258 from Sensitive to Unused) isindicated with an offset circle. This shift increases the score from 73 to 77.
matchPemetrexedHeatmap.Rnw 15
Figure 9: Sixth iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. The shift producing the maximum score (moving CCRF-CEM from Resistant to Unused)is indicated with an offset circle. This shift increases the score from 77 to 85.
matchPemetrexedHeatmap.Rnw 16
Figure 10: Seventh iteration to find an overall maximum for pemetrexed. White asterisks indicate the statusof each cell line using the starting guess, and colors indicate how the score increases or decreses as a givenchange is made. Any shift from the current guess decreases the score. Our best score is 85 (a perfect match).
matchPemetrexedHeatmap.Rnw 17
Figure 11: Heatmap for the pemetrexed signature using the final set of cell lines obtained through our searchprocedure. This heatmap is a perfect match for that in Figure 1B of Hsu et al. [3], indicating that these arethe cell lines and genes involved.
matchPemetrexedHeatmap.Rnw 18
> pemetrexedIteration1 <- read.table(file.path("MatlabFiles", "Pemetrexed",
+ "pemetrexedIteration1.csv"), header = TRUE, sep = ",", row.names = 1)
> pemetrexedIteration2 <- read.table(file.path("MatlabFiles", "Pemetrexed",
+ "pemetrexedIteration2.csv"), header = TRUE, sep = ",", row.names = 1)
> pemetrexedIteration3 <- read.table(file.path("MatlabFiles", "Pemetrexed",
+ "pemetrexedIteration3.csv"), header = TRUE, sep = ",", row.names = 1)
> pemetrexedIteration4 <- read.table(file.path("MatlabFiles", "Pemetrexed",
+ "pemetrexedIteration4.csv"), header = TRUE, sep = ",", row.names = 1)
> pemetrexedIteration5 <- read.table(file.path("MatlabFiles", "Pemetrexed",
+ "pemetrexedIteration5.csv"), header = TRUE, sep = ",", row.names = 1)
> pemetrexedIteration6 <- read.table(file.path("MatlabFiles", "Pemetrexed",
+ "pemetrexedIteration6.csv"), header = TRUE, sep = ",", row.names = 1)
> pemetrexedIteration7 <- read.table(file.path("MatlabFiles", "Pemetrexed",
+ "pemetrexedIteration7.csv"), header = TRUE, sep = ",", row.names = 1)
> pemetrexedIteration7[1:5, ]
startStatus Resistant Unused SensitiveCCRF-CEM Unused 77 85 50K-562 Resistant 85 72 45MOLT-4 Resistant 85 71 31HL-60(TB) Resistant 85 76 37RPMI-8226 Unused 71 85 57
8.2 Extracting the Cell Lines Used
Next, we extract the cell lines used to obtain the perfectly matching heatmap.
> pemetrexedResistantLines <- rownames(pemetrexedIteration3)[pemetrexedIteration7[,
+ "startStatus"] == "Resistant"]
> pemetrexedSensitiveLines <- rownames(pemetrexedIteration3)[pemetrexedIteration7[,
+ "startStatus"] == "Sensitive"]
> pemetrexedResistantLines
[1] "K-562" "MOLT-4" "HL-60(TB)" "MCF7" "HCC-2998" "HCT-116"[7] "NCI-H460" "TK-10"
> pemetrexedSensitiveLines
[1] "SNB-19" "HS 578T" "MDA-MB-231/ATCC" "MDA-MB-435"[5] "NCI-H226" "M14" "MALME-3M" "SK-MEL-2"[9] "SK-MEL-28" "SN12C"
8.3 Loading the Heatmap Genes
Next, we load the probesets used in the final heatmap.
> softwarePemetrexedProbesets <- read.table(file.path("MatlabFiles",
+ "Pemetrexed", "topPemetrexedGenesInHeatmapOrder.txt"), header = FALSE,
+ sep = ",", strip.white = TRUE)
> softwarePemetrexedProbesets <- as.character(softwarePemetrexedProbesets[,
matchPemetrexedHeatmap.Rnw 19
+ 1])
> sort(softwarePemetrexedProbesets)
[1] "1101_at" "1228_s_at" "1319_at" "1356_at" "242_at"[6] "243_g_at" "31463_s_at" "31511_at" "31538_at" "31546_at"[11] "32226_at" "32252_at" "32260_at" "32318_s_at" "32434_at"[16] "32574_at" "32749_s_at" "32836_at" "32893_s_at" "33145_at"[21] "33362_at" "33378_at" "33452_at" "33614_at" "33855_at"[26] "33919_at" "34246_at" "34319_at" "34859_at" "34860_g_at"[31] "35352_at" "35435_s_at" "356_at" "35748_at" "35763_at"[36] "36119_at" "36192_at" "36536_at" "36585_at" "36989_at"[41] "37345_at" "37375_at" "37485_at" "37745_s_at" "37747_at"[46] "38120_at" "38288_at" "38405_at" "38479_at" "38546_at"[51] "38909_at" "39019_at" "39150_at" "39170_at" "39248_at"[56] "39329_at" "39330_s_at" "39351_at" "39544_at" "39750_at"[61] "39798_at" "39800_s_at" "40213_at" "40328_at" "40394_at"[66] "40493_at" "40684_at" "40822_at" "40855_at" "40865_at"[71] "40953_at" "41128_at" "41235_at" "41403_at" "41436_at"[76] "41443_at" "41449_at" "41460_at" "41644_at" "41739_s_at"[81] "41758_at" "41834_g_at" "41854_at" "591_s_at" "798_at"
8.4 Comparing Gene Lists
Now we see which genes we can and cannot match.
> sum(!is.na(match(offsetPemetrexedProbesets, softwarePemetrexedProbesets)))
[1] 85
> setdiff(softwarePemetrexedProbesets, offsetPemetrexedProbesets)
character(0)
As noted above, the software output matches the offset gene list perfectly.
9 Save Rda File
Finally, we save the data associated with our examination of the pemetrexed signature.
> save(pemetrexedCor, pemetrexedIteration1, pemetrexedIteration2,
+ pemetrexedIteration3, pemetrexedIteration4, pemetrexedIteration5,
+ pemetrexedIteration6, pemetrexedIteration7, pemetrexedReportedProbesets,
+ pemetrexedResistantLines, pemetrexedSensitiveLines, pemetrexedTable,
+ matchedPemetrexedProbesets, offsetPemetrexedProbesets, softwarePemetrexedProbesets,
+ file = file.path("RDataObjects", "pemetrexedAll.Rda"))
matchPemetrexedHeatmap.Rnw 20
10 Appendix
10.1 File Location
> getwd()
[1] "/Users/kabagg/ReproRsch/AnnAppStat"
10.2 Saves
10.3 SessionInfo
> sessionInfo()
R version 2.9.1 (2009-06-26)i386-apple-darwin8.11.1
locale:en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:[1] stats graphics grDevices utils datasets methods base
other attached packages:[1] geneplotter_1.22.0 lattice_0.17-25 annotate_1.22.0[4] AnnotationDbi_1.6.1 Biobase_2.4.1 XML_2.6-0
loaded via a namespace (and not attached):[1] DBI_0.2-4 grid_2.9.1 KernSmooth_2.23-2 RColorBrewer_1.0-2[5] RSQLite_0.7-1 xtable_1.5-5
References
[1] Coombes KR, Wang J, Baggerly KA: Microarrays: retracing steps. Nat Med, 13:1276-7, 2007. Authorreply, 1277-8.
[2] Gyorffy B, Surowiak P, Kiesslich O, et al.: Gene expression profiling of 30 cancer cell lines predictsresistance towards 11 anticancer drugs at clinically achieved concentrations. Int J Cancer, 118:1699-712, 2006.
[3] Hsu DS, Balakumaran BS, Acharya CR, et al.: Pharmacogenomic strategies provide a rational approachto the treatment of cisplatin-resistant patients with advanced cancer. J Clin Oncol, 25:4350-4357, 2007
[4] Potti A, Dressman HK, Bild A, et al: Genomic signatures to guide the use of chemotherapeutics. NatMed, 12:1294-1300, 2006.