Very Large Biomedical Data Sets (Trying To Do Thousands of Hypothesis Tests at the Same Time)
Bradley Efron, Stanford University
Reference: “Microarrays, Empirical Bayes and the Two-Groups Model”, http://www-stat.stanford.edu/~brad/papers/twogroups.pdf
What is “Statistics”?
• Learning from experience
  – That arrives a little bit at a time.
• Clinical Trial
  – No one patient’s response is conclusive, but information can be accrued across patients.
Gene 4124: Prostate Cancer Study
• 50 Healthy Men
  -1.05, 0.34, 1.16, -0.29, -0.40, …, 0.13, -0.81, 0.71, 0.80
  Mean -0.033
• 52 Prostate Cancer Patients
  0.07, 1.67, 1.58, -1.06, -1.04, …, -1.05, 0.83, 0.21, 0.50
  Mean 0.325
• Question: Is gene 4124 “overexpressed” in prostate cancer patients?
Hypothesis Test
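A minimal sketch of the single-gene test, assuming a pooled-variance two-sample t-statistic (cancer mean minus healthy mean) on n1 + n2 - 2 = 100 degrees of freedom, mapped to the z scale by z = Φ^{-1}(F_100(t)), where F_100 is the Student-t CDF. That transformation follows the referenced two-groups paper; the helper names (`two_sample_t`, `t_upper_tail`, `t_to_z`) are hypothetical, and the t tail is integrated numerically with Simpson's rule so that only the standard library is needed.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_t(healthy, cancer):
    """Pooled-variance two-sample t-statistic for mean(cancer) - mean(healthy)."""
    n1, n2 = len(healthy), len(cancer)
    df = n1 + n2 - 2
    # Pool the two sample variances (each computed with an n - 1 divisor).
    pooled_var = ((n1 - 1) * variance(healthy) + (n2 - 1) * variance(cancer)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(cancer) - mean(healthy)) / std_err, df


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the Student-t density over [0, t] (Simpson's rule)."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def density(u):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(u * u / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * density(k * h)
    return 0.5 - total * h / 3  # the density is symmetric, so P(T > 0) = 0.5


def t_to_z(t, df):
    """Map t to the z value that has the same tail probability under N(0, 1)."""
    tail = max(t_upper_tail(abs(t), df), 1e-300)  # guard against rounding below zero
    z = -NormalDist().inv_cdf(tail)
    return z if t >= 0 else -z
```

For gene 4124, the 50 healthy values would go in `healthy` and the 52 cancer values in `cancer`. The slide lists only a few of each, so no z-value is computed here.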
Prostate Cancer Study
• 6033 genes
• 6033 z-values, comparing cancer patients with healthy controls for each gene
• Is gene 4124 still “interesting”?
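The multiplicity problem behind “still interesting?” can be put in numbers. A sketch, assuming every gene were null with z ~ N(0, 1): the expected number of genes crossing a cutoff c by chance alone is N · P(|Z| > c). This needs no independence assumption because it follows from linearity of expectation. The function name is hypothetical.

```python
from statistics import NormalDist


def expected_null_exceedances(n_tests, cutoff):
    """Expected number of |z| > cutoff among n_tests z-values if all were N(0, 1) nulls."""
    two_sided_p = 2 * NormalDist().cdf(-cutoff)
    return n_tests * two_sided_p


# The usual single-test 5% cutoff, applied to all 6033 genes.
print(round(expected_null_exceedances(6033, 1.96), 1))  # 301.6
```

About 300 genes would clear the conventional 5% bar even if none had anything to do with prostate cancer. That is why a single small p-value for gene 4124 is not convincing by itself.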
PROSTATE CANCER DATA (Microarray) (Singh et al. 2002)
Columns pat1 to pat50: healthy controls; pat51 to pat102: prostate cancer patients; last column: test statistic “z”.
gene         pat1   pat2    ...  pat49  pat50  pat51  pat52    ... pat101 pat102    “z”
gene1       -0.93  -0.75    ...  -1.08  -0.99  -0.58  -1.09    ...   2.77   0.73   1.47
gene2       -0.84  -0.85    ...  -0.16  -0.75   0.25  -0.83    ...  -0.27  -0.82   3.57
gene3        0.06   0.10    ...   0.22  -1.16   0.11   4.04    ...   0.09  -1.10  -0.03
gene4       -0.36   2.42    ...  -0.10  -1.13  -0.13  -0.36    ...  -0.19   0.43  -1.13
gene5       -1.12   0.18    ...   1.05   1.70   0.94  -1.08    ...  -0.10  -0.14  -0.14
...
gene6031     0.35   0.10    ...  -0.79  -0.91  -0.92  -1.17    ...  -1.18  -0.82  -1.18
gene6032    -0.90   1.33    ...  -0.88  -0.91  -0.87  -0.88    ...  -0.89   0.09   0.10
gene6033    -0.25  -0.09    ...  -0.67  -0.70  -0.80  -0.80    ...   0.00  -0.79  -0.91
Question: Which genes, if any, are implicated in the development of prostate cancer?
Doing 6033 Hypothesis Tests at Once
False Discovery Rates (Benjamini and Hochberg 1995)
False Discovery Control Algorithm
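The slide's control algorithm is not reproduced in this text, so what follows is the standard Benjamini-Hochberg step-up rule as published, not a transcription of the slide. Sort the N p-values, find the largest rank k with p_(k) ≤ (k/N)·q, and reject the k hypotheses with the smallest p-values. Under independence this keeps the expected false discovery proportion at or below q. The conversion from z-values to two-sided p-values assumes the theoretical N(0, 1) null, and the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a z-value under the theoretical N(0, 1) null."""
    return 2 * NormalDist().cdf(-abs(z))


def benjamini_hochberg(p_values, q):
    """Indices (in input order) of hypotheses rejected by the BH step-up rule at level q."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # ranks 1..n by p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the LARGEST passing rank, even if a smaller rank failed.
        if p_values[idx] <= rank * q / n:
            k_max = rank
    return sorted(order[:k_max])


z_values = [4.1, -3.6, 0.4, 2.2, -0.9]  # toy values, not from the study
print(benjamini_hochberg([two_sided_p(z) for z in z_values], q=0.1))  # [0, 1, 3]
```

The step-up scan matters. With p-values 0.005, 0.04, 0.045 and q = 0.05, the second p-value misses its own threshold (0.033), but the third meets 0.05, so all three are rejected.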
A SNP Study (Quertermous et al.)
• 1000 subjects: 500 cardiovascular, 500 healthy
• Polymorphisms examined at 550,000 locations across the whole genome
• Look for correlation between polymorphisms and disease status
550,000 z-values, one for each SNP
• Mostly null!
• Fdr{|z| > 4} = 34.7/41 ≈ 85%
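One way to read the slide's ratio is (expected number of null SNPs with |z| > 4) / (number of SNPs observed with |z| > 4). The sketch below rests on two assumptions. It uses the theoretical N(0, 1) null, and it sets the null proportion p0 to 1 by default, in line with the slide's remark that the z-values are mostly null. With N = 550,000 this gives about 34.8 expected nulls, close to the slide's 34.7. The slide does not state the exact SNP count or p0 behind its figure. The function names are hypothetical.

```python
from statistics import NormalDist


def expected_null_count(n_tests, cutoff, p0=1.0):
    """Expected number of null z-values with |z| > cutoff, assuming a N(0, 1) null."""
    return p0 * n_tests * 2 * NormalDist().cdf(-cutoff)


def fdr_estimate(expected_nulls, observed):
    """Estimated Fdr for a region: expected null count divided by observed count."""
    return expected_nulls / observed


print(round(expected_null_count(550_000, 4.0), 1))  # 34.8
print(round(fdr_estimate(34.7, 41), 3))              # 0.846
```

Under this reading, roughly 85% of the SNPs beyond ±4 would be expected to be false discoveries.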
The Brain Data (Schwartzman et al. 2005)