Reanalysis of Petricoin et al. Ovarian Cancer Data Set 3
Russ Wolfinger and Geoff Mann
SAS Institute Inc.
NISS Proteomics Workshop
March 6, 2003
New Paper from MD Anderson
Baggerly, K.A., Morris, J.S., and Coombes, K.R. (2003). Cautions about Reproducibility in Mass Spectrometry Patterns: Joint Analysis of Several Proteomic Data Sets.
Email: [email protected]
• Reanalyses of all three ovarian cancer data sets
• For data set 3, they note that two pairs of m/z values provide perfect discrimination: 435.46 & 465.57, and 2.79 & 245.2. Easy to find with simple t-tests; genetic algorithm unnecessary.
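The "simple t-tests" screen mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used by Baggerly et al.: it assumes the Welch (unequal-variance) form of the t statistic, spectra stored as lists of intensities on a shared m/z grid, and made-up toy data. The names welch_t and rank_features are hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch two-sample t statistic for one m/z intensity column."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / se

def rank_features(cancer, normal):
    """Return feature indices ordered by |t|, largest first.

    cancer, normal: lists of spectra; each spectrum is a list of
    intensities aligned to the same m/z grid (an assumption here).
    """
    n_features = len(cancer[0])
    scores = []
    for j in range(n_features):
        t = welch_t([s[j] for s in cancer], [s[j] for s in normal])
        scores.append((abs(t), j))
    return [j for _, j in sorted(scores, reverse=True)]

# Toy data (made up): only feature 1 differs between the groups.
cancer = [[1.0, 5.1, 2.0], [1.2, 5.3, 2.1], [0.9, 4.9, 1.9], [1.1, 5.0, 2.2]]
normal = [[1.1, 2.0, 2.1], [1.0, 2.2, 1.9], [1.2, 1.9, 2.0], [0.9, 2.1, 2.2]]
ranking = rank_features(cancer, normal)
print(ranking[0])
```

Ranking every m/z value by |t| and inspecting the top few is the whole screen; no search heuristic is involved.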
First Pair: 435.46 and 465.57 Da
Green: Cancer, Red: Normal
Left: Green in Front, Right: Red in Front
Going Small: 435 Da
• At least 100 peptide fragments (including permutations) add up to 435, e.g. AFY, SMY, PPW, KNH, GGGAC, SSGGG
• 30 Hits from ChemFinder.com, including Sphingosyl-phosphocholine, a lipid molecule
• Similar kind of story for 465 Da
Going Large: Cross-Validated Stepwise Discriminant Analysis
1. Subtract baselines and determine 330 most prominent peak areas, all with m/z > 600.
2. Form 500 random partitions of the 253 spectra, with a 33% stratified holdout sample in each.
3. Perform stepwise discriminant analysis on each partition, using entry p = 0.05, exit p = 0.20, and max variables = 5.
4. Compute misclassification rate on each trial.
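Steps 2 and 4 above (repeated stratified holdout and per-trial misclassification) can be sketched as below. This is a sketch under stated assumptions, not the SAS code behind the slides: the stepwise discriminant analysis of step 3 is not implemented and is replaced by a hypothetical fit_and_score callback, here a majority-class baseline used only to make the example run. The 162/91 class split is also an assumption; the slides give only the total of 253 spectra.

```python
import random

def stratified_holdout(labels, holdout_frac, rng):
    """Split indices into (train, test), holding out holdout_frac of each class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        n_test = round(holdout_frac * len(shuffled))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)

def repeated_holdout(labels, fit_and_score, n_partitions=500,
                     holdout_frac=0.33, seed=0):
    """Run fit_and_score(train, test) on each random stratified partition."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_partitions):
        train, test = stratified_holdout(labels, holdout_frac, rng)
        rates.append(fit_and_score(train, test))
    return rates

def majority_baseline(labels):
    """Stand-in for the model step: always predict the training majority class."""
    def fit_and_score(train, test):
        train_labels = [labels[i] for i in train]
        majority = max(set(train_labels), key=train_labels.count)
        errors = sum(1 for i in test if labels[i] != majority)
        return errors / len(test)  # misclassification rate on the holdout
    return fit_and_score

# Assumed class split (not stated in the slides); total matches the 253 spectra.
labels = ["cancer"] * 162 + ["normal"] * 91
train, test = stratified_holdout(labels, 0.33, random.Random(0))
rates = repeated_holdout(labels, majority_baseline(labels))
mean_rate = sum(rates) / len(rates)
print(len(train), len(test), round(mean_rate, 3))
```

Because variable selection happens inside fit_and_score, it is redone on every training partition, so the holdout spectra never influence which peaks are chosen.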
Results of Cross-Validated Stepwise Discriminant Analysis
1. Always picked 5 variables
2. Misclassification rate = 5%.
3. Most common discriminators:
• 681, appeared in 100% of selected quintuples
• 7379, in 63%
• 869, in 54%
• 4004, in 44%
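The "appeared in X% of selected quintuples" figures are a tally over the 500 trials. A minimal sketch of that tally follows; the quintuples listed are made up for illustration (only the peak labels 681, 7379, 869, and 4004 come from the slides), and selection_frequency is a hypothetical helper name.

```python
from collections import Counter

def selection_frequency(selected_sets):
    """Fraction of trials in which each variable was selected."""
    counts = Counter()
    for chosen in selected_sets:
        counts.update(set(chosen))  # count each variable once per trial
    n_trials = len(selected_sets)
    return {var: c / n_trials for var, c in counts.most_common()}

# Made-up quintuples from four hypothetical trials.
trials = [
    (681, 7379, 869, 4004, 1200),
    (681, 7379, 869, 2500, 3100),
    (681, 4004, 900, 1500, 7379),
    (681, 869, 1000, 2000, 3000),
]
freq = selection_frequency(trials)
for var in (681, 7379, 869, 4004):
    print(var, f"{freq[var]:.0%}")
```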