Very Large Biomedical Data Sets (Trying To Do Thousands of Hypothesis Tests at the Same Time)

Bradley Efron, Stanford University

Reference: “Microarrays, Empirical Bayes and the Two-Groups Model”, http://www-stat.stanford.edu/~brad/papers/twogroups.pdf




What is “Statistics”?

• Learning from experience
  – That arrives a little bit at a time.

• Clinical Trial
  – No one patient’s response is conclusive, but information can be accrued across patients….


Gene 4124: Prostate Cancer Study

• 50 Healthy Men
  ‒1.05, 0.34, 1.16, ‒0.29, ‒0.40, … 0.13, ‒0.81, 0.71, 0.80
  Mean ‒0.033

• 52 Prostate Cancer Patients
  0.07, 1.67, 1.58, ‒1.06, ‒1.04, … ‒1.05, 0.83, 0.21, 0.50
  Mean 0.325

• Question: Is gene 4124 “overexpressed” in prostate cancer patients?


Hypothesis Test
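
As a rough illustration of the kind of test behind the gene 4124 question, the sketch below computes a pooled-variance two-sample t statistic (cancer minus healthy). The choice of a pooled t statistic and the name `two_sample_t` are assumptions rather than something stated on the slides, and the inputs are only the nine values per group visible on the gene 4124 slide, not the full 50 and 52 measurements, so the result will not reproduce the study's own statistic.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(control, treated):
    """Pooled-variance two-sample t statistic (treated minus control)."""
    n1, n2 = len(control), len(treated)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(control) + (n2 - 1) * variance(treated)
    ) / (n1 + n2 - 2)
    std_err = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(treated) - mean(control)) / std_err


# Only the values printed on the slide (9 of 50 and 9 of 52), NOT the full data.
healthy_shown = [-1.05, 0.34, 1.16, -0.29, -0.40, 0.13, -0.81, 0.71, 0.80]
cancer_shown = [0.07, 1.67, 1.58, -1.06, -1.04, -1.05, 0.83, 0.21, 0.50]

t_partial = two_sample_t(healthy_shown, cancer_shown)
print(f"t from the partial data: {t_partial:.2f}")
```

With the full data, a statistic like this would typically be rescaled so that it follows a standard normal distribution when the gene is null, which is what a “z”-value refers to in the later slides.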


Prostate Cancer Study

• 6033 genes

• 6033 z-values, comparing cancer patients with healthy controls for each gene

• Is gene 4124 still “interesting”?


PROSTATE CANCER DATA (Microarray) (Singh et al. 2002)

           HEALTHY                        PROSTATE CANCER                  TEST STATISTICS
            pat1   pat2  …   pat49  pat50   pat51  pat52  …  pat101 pat102     “z”
gene1      -0.93  -0.75  …   -1.08  -0.99   -0.58  -1.09  …    2.77   0.73    1.47
gene2      -0.84  -0.85  …   -0.16  -0.75    0.25  -0.83  …   -0.27  -0.82    3.57
gene3       0.06   0.10  …    0.22  -1.16    0.11   4.04  …    0.09  -1.10   -0.03
gene4      -0.36   2.42  …   -0.10  -1.13   -0.13  -0.36  …   -0.19   0.43   -1.13
gene5      -1.12   0.18  …    1.05   1.70    0.94  -1.08  …   -0.10  -0.14   -0.14
.
.
gene6031    0.35   0.10  …   -0.79  -0.91   -0.92  -1.17  …   -1.18  -0.82   -1.18
gene6032   -0.90   1.33  …   -0.88  -0.91   -0.87  -0.88  …   -0.89   0.09    0.10
gene6033   -0.25  -0.09  …   -0.67  -0.70   -0.80  -0.80  …    0.00  -0.79   -0.91

Question: Which genes, if any, are implicated in the development of prostate cancer?


Doing 6033 Hypothesis Tests at Once
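
To see why a single-test standard cannot simply be reused for 6033 genes, the short sketch below works out how many false positives to expect if every gene were null. The per-test level `alpha = 0.05` and the independence of the tests are illustrative assumptions, not figures from the slides; the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of null tests rejected at per-test level alpha."""
    return n_tests * alpha


def prob_any_false_positive(n_tests, alpha):
    """Chance of at least one false rejection, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_genes = 6033   # number of genes in the prostate cancer study
alpha = 0.05     # hypothetical per-test significance level

print(expected_false_positives(n_genes, alpha))  # about 302 genes by chance alone
print(prob_any_false_positive(n_genes, alpha))   # essentially 1
```

Under these assumptions roughly 300 genes would look “significant” even if none were truly related to prostate cancer, which is the motivation for the false discovery rate ideas that follow.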


False Discovery Rates (Benjamini and Hochberg 1995)


False Discovery Control Algorithm
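
A minimal sketch of the Benjamini and Hochberg (1995) step-up rule: order the N p-values, find the largest rank k with p_(k) ≤ (k/N)·q, and reject the k hypotheses with the smallest p-values. The names `benjamini_hochberg`, `two_sided_p` and `q`, and the example inputs, are illustrative assumptions; converting z-values to two-sided p-values under a N(0,1) null is also an assumption, not something the slide specifies.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a z-value, assuming a N(0, 1) null."""
    return 2.0 * NormalDist().cdf(-abs(z))


def benjamini_hochberg(p_values, q=0.1):
    """Return the sorted indices of hypotheses rejected at FDR level q."""
    n = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the LARGEST rank that passes its threshold.
        if p_values[idx] <= q * rank / n:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical z-values, for illustration only.
z_values = [0.3, -4.2, 1.1, 3.9, -0.7, 2.4]
p_values = [two_sided_p(z) for z in z_values]
print(benjamini_hochberg(p_values, q=0.1))
```

Because the rule keeps the largest passing rank, a hypothesis can be rejected even when its own p-value misses its threshold at a smaller rank, as long as a larger-ranked p-value passes.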


A SNP Study (Quertermous et al.)

• 1000 subjects: 500 cardiovascular, 500 healthy

• Polymorphisms examined at 550,000 locations on whole genome

• Look for correlation between polymorphisms and disease status

550,000 z-values, one for each SNP

• Mostly null!

• Fdr{|z| > 4} = 34.7/41 = 84%
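
The ratio in the last bullet can be read as (expected number of null z-values beyond the cutoff) divided by (observed number beyond the cutoff). The sketch below recomputes the numerator under two assumptions that the slide does not spell out: a theoretical N(0,1) null and a null proportion `pi0` of 1. The function name `tail_fdr` is hypothetical. Under those assumptions the expected null count comes out close to, but not exactly, the slide's 34.7; the sketch does not try to explain the small gap.

```python
from statistics import NormalDist


def tail_fdr(n_tests, n_exceed, cutoff, pi0=1.0):
    """Estimate Fdr{|z| > cutoff} as expected null count / observed count."""
    # Two-sided tail probability of a N(0, 1) null beyond the cutoff.
    null_tail = 2.0 * NormalDist().cdf(-cutoff)
    expected_null = pi0 * n_tests * null_tail
    return expected_null, expected_null / n_exceed


# 550,000 SNPs, 41 of them with |z| > 4 (both figures from the slide).
expected_null, fdr = tail_fdr(n_tests=550_000, n_exceed=41, cutoff=4.0)
print(f"expected nulls beyond cutoff: {expected_null:.1f}")
print(f"estimated Fdr: {fdr:.2f}")
```

An estimate this high says that most of the 41 SNPs with |z| > 4 are likely to be null, even though |z| > 4 would look overwhelming for a single test.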


The Brain Data (Schwartzman et al. 2005)
