Page 1: Microarray(dataanalysis( - UiO

Microarray data analysis

MBV-INFX410, 26th Nov, 2012

Ståle Nygård, Bioinformatics core facility, OUS/UiO, staaln@ifi.uio.no

Page 2: Microarray(dataanalysis( - UiO

Gene expression

Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product.

Page 3: Microarray(dataanalysis( - UiO

Microarrays

•  Measure the expression of several thousand genes simultaneously

•  Are often used to find differentially expressed genes
   – Between groups of individuals (with different phenotypes, e.g. diseased/healthy, long/short survival, etc.)
   – Over time (e.g. as a disease develops, as tissue develops)

Page 4: Microarray(dataanalysis( - UiO

Development of microarrays

•  1977: Multiple Northern blots
•  1987: Macroarrays
•  1995: cDNA microarrays
•  1996: Oligonucleotide microarrays
•  2003: Today's technology: High density arrays
•  2005: High throughput sequencing ("Next generation sequencing")
•  Future: High throughput sequencing

Page 5: Microarray(dataanalysis( - UiO

Development of microarrays

•  1977: Multiple Northern blots

Page 6: Microarray(dataanalysis( - UiO

Development of microarrays

•  1977: Multiple Northern blots
•  1987: Macroarrays (spotted cDNAs, nylon filter, ~1000 genes)

Page 7: Microarray(dataanalysis( - UiO

Development of microarrays

•  1977: Multiple Northern blots
•  1987: Macroarrays
•  1995: cDNA microarrays (cDNA probes >200 nt, PCR produced)

Page 8: Microarray(dataanalysis( - UiO

Development of microarrays

•  1977: Multiple Northern blots
•  1987: Macroarrays
•  1995: cDNA microarrays
•  1996: Oligonucleotide microarrays (oligos ~50-80 nt, more than 10 000 genes)

Page 9: Microarray(dataanalysis( - UiO

Development of microarrays

•  1977: Multiple Northern blots
•  1987: Macroarrays
•  1995: cDNA microarrays
•  1996: Oligonucleotide microarrays
•  2003: Today's technology: High density arrays (e.g. Illumina BeadArrays: 50 nt probes, millions of probes)

Page 10: Microarray(dataanalysis( - UiO

Development of microarrays

•  1977: Multiple Northern blots
•  1987: Macroarrays
•  1995: cDNA microarrays
•  1996: Oligonucleotide microarrays
•  2003: Today's technology: High density arrays
•  2005: Next generation sequencing (RNA-Seq)

Page 11: Microarray(dataanalysis( - UiO

Development of microarrays

•  1977: Multiple Northern blots
•  1987: Macroarrays
•  1995: cDNA microarrays
•  1996: Oligonucleotide microarrays
•  2003: Today's technology: High density arrays
•  2005: Next generation sequencing (RNA-Seq)
•  Future: Next-next generation sequencing: true single molecule sequencing, e.g. NanoPore technology (http://www.nanoporetech.com)

Page 12: Microarray(dataanalysis( - UiO

Microarray technology vs RNAseq

•  Main caveats of microarrays:
   –  Problem with alternative splicing: probes on the microarray might not represent all the (alternatively spliced) RNAs
   –  Problem with degradation (less of a problem for RNAseq)

•  Main caveats of RNA-Seq:
   –  Highly expressed genes can take up a large share of the space on the slide, giving low accuracy for lowly expressed genes
   –  RNAseq technology is still more expensive than microarrays (but the prices are dropping)

Page 13: Microarray(dataanalysis( - UiO

The experiment pipeline

1. Biological question
2. Experimental design
3. QC of samples
4. Microarray experiment
5, 6. Preprocessing of data and QC of data
7. Statistical analysis
8. Biological verification & interpretation

Page 14: Microarray(dataanalysis( - UiO

Microarray pipeline (simplified)

Sample → Nucleic acid purification → RNA/DNA → Amplification and labelling → Labeled RNA/DNA → Hybridisation, washing → Scan, quantitate → Raw data → Pre-processing → Bioinformatic analysis

Page 15: Microarray(dataanalysis( - UiO

The experiment pipeline

1. Biological question
2. Experimental design
3. QC of samples
4. Microarray experiment
5, 6. Preprocessing of data and QC of data
7. Statistical analysis
8. Biological verification & interpretation

Page 16: Microarray(dataanalysis( - UiO

Experimental design: general strategy

•  Ensure that you will not have any systematic biases:
   –  Distribute the biological groups in a balanced way.
   –  Divide into batches of the same size, limited by the capacity at each step.
   –  Tip: In Excel (or a similar program), color code sample names according to biological group, and in the next column color code by batch.

•  Randomize and balance according to the biology you are interested in.
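The balancing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_batches`, the 18 samples in groups A, B and C, and the choice of three batches are hypothetical, not taken from the slides.

```python
import random
from collections import Counter


def assign_batches(samples, n_batches, seed=0):
    """Randomly assign samples to batches, balanced by biological group.

    `samples` maps sample name -> biological group.
    Returns a dict mapping sample name -> batch index (0-based).
    """
    rng = random.Random(seed)
    groups = {}
    for name, group in samples.items():
        groups.setdefault(group, []).append(name)

    assignment = {}
    slot = 0
    for group in sorted(groups):
        members = groups[group]
        rng.shuffle(members)  # randomize the order within each group
        for name in members:
            # Deal samples out like cards, so every batch receives the
            # same number of samples from each group when sizes divide evenly.
            assignment[name] = slot % n_batches
            slot += 1
    return assignment


# Hypothetical design: groups A, B, C with six samples each, three batches.
samples = {f"{g}{i}": g for g in "ABC" for i in range(1, 7)}
batches = assign_batches(samples, n_batches=3)
per_batch = Counter((batch, samples[name]) for name, batch in batches.items())
```

With this design every batch holds six samples, two from each biological group, so batch and biology are not confounded.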

Page 17: Microarray(dataanalysis( - UiO

Experimental plan: an example

Biology:
A1  A2  A3  A4  A5  A6
B1  B2  B3  B4  B5  B6
C1  C2  C3  C4  C5  C6

Biology   Sample preparation order
A1        1
B4        2
C2        3
A3        4
B6        5
C4        6
A5        7
B2        8
C6        9
A2        10
B3        11
C1        12
A4        13
B5        14
C3        15
A6        16
B1        17
C5        18

Biology   Sample preparation order   Extraction order
A2        10                         1
B6        5                          2
C1        12                         3
A5        7                          4
B5        14                         5
C6        9                          6
A6        16                         7
B4        2                          8
C5        18                         9
A3        4                          10
C3        15                         11
B2        8                          12
A4        13                         13
C4        6                          14
B1        17                         15
A1        1                          16
B3        11                         17
C2        3                          18

Page 18: Microarray(dataanalysis( - UiO

Experimental design: Batch effect (1)

Samples color coded according to biology

Page 19: Microarray(dataanalysis( - UiO

Experimental design: Batch effect (2)

Samples color coded according to labeling date

Page 20: Microarray(dataanalysis( - UiO

Image analysis of microarray data

•  Main steps
   –  Address spots
   –  Separate foreground from background
   –  Quality check: localize and remove bad quality spots
   –  Quality check of the microarray as a whole

•  Automation
   ✓  Commercial platforms

•  Today, image analysis is basically an automated procedure performed by the software

•  Manual quality check is relevant for protein arrays and tailor-made microarrays

Page 21: Microarray(dataanalysis( - UiO

Normalization

•  Goal: remove technical artifacts, which can be due to
   – Different amounts of input material
   – Different degrees of degradation
   – Dust, scratches, etc. on the arrays
   – and more

•  Most normalization methods (e.g. quantile normalization) assume that the overall intensity is the same for different samples.

Page 22: Microarray(dataanalysis( - UiO

Quantile normalization

•  Enforce an equal distribution across the microarrays. Procedure:
   –  Sort the expression values for each microarray from highest to lowest
   –  Calculate the mean value for each rank
   –  For every array
      •  Let the highest ranked gene have the mean value of the highest ranked genes (of all arrays)
      •  Let the second highest ranked gene have the mean value of the second highest ranked genes (of all arrays)
      •  And so on for all ranks
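The procedure above can be sketched in plain Python. This is a minimal sketch, not the implementation of any particular package: the function name `quantile_normalize` and the toy values are assumptions, and ties are ranked in input order rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (one list of values per microarray).

    All arrays must contain the same genes in the same order.
    """
    n_genes = len(arrays[0])
    # Sort the values of each array from highest to lowest.
    sorted_arrays = [sorted(a, reverse=True) for a in arrays]
    # Mean value at each rank, taken across all arrays.
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_genes)
    ]
    normalized = []
    for a in arrays:
        # Gene indices ordered from the highest to the lowest value.
        order = sorted(range(n_genes), key=lambda g: a[g], reverse=True)
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]  # replace the value by its rank mean
        normalized.append(out)
    return normalized


# Two toy arrays with three genes each (hypothetical values).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
```

After normalization both arrays contain exactly the same set of values (here 5.5, 3.5 and 1.5); only the assignment of values to genes differs between arrays.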

Page 23: Microarray(dataanalysis( - UiO

Normalization using TMM (Trimmed Mean of M-values)

Highly expressed genes have a big influence on library size.

[Figure, three panels. (a) Density of log2(Kidney1/N_K1) − log2(Kidney2/N_K2). (b) Density of log2(Liver/N_L) − log2(Kidney/N_K). (c) Scatter plot of M = log2(Liver/N_L) − log2(Kidney/N_K) against A = log2(√(Liver/N_L · Kidney/N_K)), with housekeeping genes and genes unique to a sample marked.]

In TMM, the genes with the smallest and largest ratios (i.e. 40% of the genes) are not used in the normalization.
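The trimming idea can be illustrated with a deliberately simplified sketch. It is an assumption-laden toy, not the edgeR algorithm: the full method also trims on the A-values and uses precision weights, which are left out here. The function name `simple_tmm_factor`, the counts, and the reading of "40%" as 20% trimmed at each end are all hypothetical.

```python
import math


def simple_tmm_factor(sample, reference, trim=0.2):
    """Unweighted trimmed mean of M-values between two count vectors.

    `trim` is the fraction of genes dropped at EACH end of the sorted
    M-values (0.2 per end removes 40% in total; an assumed setting).
    Returns the scaling factor 2 ** (trimmed mean of M).
    """
    n_s, n_r = sum(sample), sum(reference)
    m_values = sorted(
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, reference)
        if s > 0 and r > 0  # log ratios are undefined for zero counts
    )
    k = int(len(m_values) * trim)
    kept = m_values[k:len(m_values) - k]
    return 2 ** (sum(kept) / len(kept))


# Toy data: ten genes; one gene is very highly expressed in the sample only.
reference = [100] * 10
sample = [100] * 9 + [1100]
factor = simple_tmm_factor(sample, reference)
```

In this toy case the single highly expressed gene doubles the library size of the sample, so the other nine genes look two-fold down after plain library-size scaling. The trimmed mean ignores the extreme gene and gives a factor of 0.5, which rescales the effective library size back to that of the reference.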

Page 24: Microarray(dataanalysis( - UiO

Distribution of microarray data

•  Ordinary scale: noise proportional to signal, data not normally distributed

•  Log2 scale: noise less proportional to signal, distribution closer to normal (a prerequisite for many tests)

[Figure panels: Normal scale; Log2 scale]

Page 25: Microarray(dataanalysis( - UiO

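Testing for differential expression – microarray data

Ordinary t-test:

$$ t_i = \frac{\bar{x}_i - \bar{y}_i}{\sigma_i} $$

Variance estimates can be improved by "borrowing strength" across genes, a technique called variance shrinkage:

$$ t'_i = \frac{\bar{x}_i - \bar{y}_i}{B\,\sigma_i + (1 - B)\,\sigma_{\mathrm{all}}} $$

Here $\bar{x}_i$ and $\bar{y}_i$ are the group means for gene i, $\sigma_i$ is the gene-specific estimate, $\sigma_{\mathrm{all}}$ is the estimate based on all genes, and B (between 0 and 1) sets the amount of shrinkage.

Many methods use this technique, e.g. SAM and limma. NB! This technique is relevant only for small sample sizes.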
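A small numeric illustration of the shrinkage statistic may help. The function name `shrunken_t` and all numbers are hypothetical, and the weight B is simply fixed by hand here, whereas methods such as SAM and limma estimate the amount of shrinkage from the data.

```python
def shrunken_t(diff, sigma_gene, sigma_all, b):
    """t-like statistic with the gene-specific sigma shrunk toward sigma_all.

    diff       : difference between the two group means for the gene
    sigma_gene : gene-specific estimate of variability
    sigma_all  : estimate of variability based on all genes
    b          : weight on the gene-specific estimate (b = 1 means no shrinkage)
    """
    return diff / (b * sigma_gene + (1.0 - b) * sigma_all)


# A gene whose sigma happens to be very small in a small sample (toy numbers).
ordinary = shrunken_t(diff=1.0, sigma_gene=0.1, sigma_all=0.5, b=1.0)
moderated = shrunken_t(diff=1.0, sigma_gene=0.1, sigma_all=0.5, b=0.5)
```

With these numbers the ordinary statistic is 10, while the shrunken one is about 3.3: a gene that looks extreme only because its variability was underestimated by chance is pulled back toward the bulk of the genes.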

Page 26: Microarray(dataanalysis( - UiO

Distribution of RNAseq data

•  What is the distribution of counts for a particular RNA?
   – Counts from technical replicates are approximately Poisson distributed.
   – Biological replicates exhibit more variance, for which the negative binomial distribution gives a better fit. (Ballard et al, 2010)
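The extra variance of the negative binomial can be checked numerically. The sketch below assumes the common mean/dispersion parameterization, in which the variance is mu + phi * mu^2; the function names and the truncation point `upper` are illustrative choices, not from the slides.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of count k with mean mu (computed on the log scale)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def negbin_pmf(k, mu, phi):
    """Negative binomial probability of count k with mean mu and dispersion phi."""
    r = 1.0 / phi            # "size" parameter
    p = r / (r + mu)         # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def variance(pmf, mu, upper=2000):
    """Variance of a count distribution, summing over counts 0..upper-1."""
    return sum((k - mu) ** 2 * pmf(k) for k in range(upper))


mu = 100.0
var_poisson = variance(lambda k: poisson_pmf(k, mu), mu)
var_negbin = variance(lambda k: negbin_pmf(k, mu, 0.1), mu)
```

For a mean of 100 the Poisson variance equals the mean (100), while a dispersion of 0.1 raises the variance to 100 + 0.1 * 100^2 = 1100.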

Page 27: Microarray(dataanalysis( - UiO

Poisson vs negative binomial distribution

[Figure, two panels (mean = 5 and mean = 100): probability against count for the Poisson distribution and for negative binomial distributions with phi = 0.01 and phi = 0.1]

Page 28: Microarray(dataanalysis( - UiO

The edgeR procedure

•  Counts are normalized using TMM (Trimmed Mean of M-values)

•  A negative binomial distribution is assumed and the extra dispersion parameter is estimated. The parameter can be common to all genes, gene-specific, or a combination of the two

Page 29: Microarray(dataanalysis( - UiO

Correction for multiple testing

In ordinary microarray studies (looking at all genes), use false discovery rates instead of ordinary p-values.
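One widely used way to obtain false discovery rates is the Benjamini-Hochberg procedure. The slides do not name a specific method, so its use here is an assumption; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)
```

A gene is then called differentially expressed when its adjusted value is below the chosen false discovery rate, e.g. 0.05; in this toy example only the first gene passes.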

Page 30: Microarray(dataanalysis( - UiO

Hierarchical clustering

•  Genes and samples can be clustered at the same time

•  Agglomerative: start with each element as its own cluster (bottom-up). Most common

•  Divisive: start with all elements in one large cluster (top-down)

•  Dendrogram: a cluster tree

•  Why cluster genes?
   ✓  Reduce complexity
   ✓  Generate hypotheses, e.g. hypothesize that a group of genes with similar expression profiles interact or are involved in the same process

•  Why cluster samples?
   ✓  Identify known subgroups
   ✓  Find new or more detailed subgroups
   ✓  Quality check (detect outliers)
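The agglomerative (bottom-up) strategy can be sketched as follows. Average linkage and Euclidean distance are assumptions made for the illustration, as are the function names and the toy profiles; in practice a library routine would be used, and it would also return the dendrogram.

```python
def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerative(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every element starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters


# Five toy profiles: two tight pairs and one outlier.
profiles = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9], [9.0, 0.0]]
clusters = agglomerative(profiles, n_clusters=3)
```

The outlier ends up in a cluster of its own, which is how clustering of samples can serve as a quality check.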

Page 31: Microarray(dataanalysis( - UiO

Distance measures

•  In clustering algorithms, two similar elements should be placed in the same cluster

What profiles are most similar?

-  It depends on the distance measure used

[Figure: expression profiles, with an X marking the most similar profile if the Euclidean distance measure is used, and another X marking the most similar profile if correlation is used as distance measure]
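A small example shows how the two distance measures can disagree. The profiles and function names below are made up for the illustration; correlation distance is taken here as 1 minus the Pearson correlation, which is one common convention.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite shapes."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


reference = [1.0, 2.0, 3.0, 4.0]
shifted = [6.0, 7.0, 8.0, 9.0]   # same shape, higher expression level
mirrored = [4.0, 3.0, 2.0, 1.0]  # same level, opposite shape
```

Measured by Euclidean distance the mirrored profile is closer to the reference (about 4.5 versus 10), whereas by correlation distance the shifted profile is closer (0 versus 2).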

Page 32: Microarray(dataanalysis( - UiO

Network construction based on microarray data

•  Network construction from genomic data is difficult: there are many possible combinations of interactions.
•  Network construction can be guided by including external information about interactions.
•  Seeded Bayesian Networks (Djebbari and Quackenbush, 2008) guide the network construction by including interactions reported in the literature and in protein-protein interaction databases.
•  The R package BioNet connects regulated genes using a protein-protein interaction database.

Page 33: Microarray(dataanalysis( - UiO

Patient focused analysis (prediction/classification)

Page 34: Microarray(dataanalysis( - UiO

Classification/prediction approach

•  Instead of looking at each gene's correlation to the phenotype one by one (gene focused analysis), the optimal classification/prediction rule looks at the effect of all genes simultaneously. We then answer the question: what is the effect of gene i when we account for the effect of all other genes?

•  The best prediction rule picks out genes with orthogonal (independent) information about the phenotype.

•  Methodological problem: how to fit a model with a much larger number (p) of explanatory variables (the genes) than the number of individuals (n). This is called the p > n (p larger than n) problem.

•  The solution is to reduce the number of dimensions.

Page 35: Microarray(dataanalysis( - UiO

Variance bias trade-off

•  Such methods are in fact biased, i.e. they underestimate the effect of each gene.

•  But they have reduced variance, leading to smaller prediction error.

•  Prediction error = bias^2 + variance
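The trade-off can be made concrete with a toy calculation for a single effect. As a simplifying assumption, the effect is estimated by a sample mean multiplied by a shrinkage constant c; the function name and all numbers are hypothetical, and the error shown is the estimation error of that one effect rather than the prediction error of a full model.

```python
def error_of_shrunken_mean(mu, sigma2, n, c):
    """Return (bias^2, variance, error) for the estimator c * sample_mean.

    mu     : true effect
    sigma2 : variance of a single observation
    n      : number of observations
    c      : shrinkage constant (c = 1 gives the unbiased sample mean)
    """
    bias = (c - 1.0) * mu             # E[c * mean] - mu
    variance = c ** 2 * sigma2 / n    # Var[c * mean]
    return bias ** 2, variance, bias ** 2 + variance


unbiased = error_of_shrunken_mean(mu=1.0, sigma2=4.0, n=4, c=1.0)
shrunken = error_of_shrunken_mean(mu=1.0, sigma2=4.0, n=4, c=0.5)
```

With these numbers the unbiased estimator has error 0 + 1 = 1, while the shrunken estimator has error 0.25 + 0.25 = 0.5: accepting some bias halves the total error because the variance drops by more.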

Page 36: Microarray(dataanalysis( - UiO
Page 37: Microarray(dataanalysis( - UiO

Dealing with survival data

•  Survival or time-to-event data have the problem of censoring: the event (e.g. death) does not always occur before the end of the study.

•  The Cox model is the most common model for dealing with censoring. In the Cox model the hazard rate, i.e. the instantaneous risk of failure at time t, is modeled by

$$ h(t \mid x) = h_0(t)\, e^{\beta x} $$

where t is time, x is the expression of a specific gene, β is the effect of the gene on survival, and h_0(t) is the baseline hazard.
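Under the usual proportional-hazards form, in which the hazard is a baseline hazard multiplied by exp(β x), the effect β translates directly into a hazard ratio. The sketch below only illustrates that arithmetic; the function name and the values of β and of the baseline hazard are made up, and fitting β from censored data requires a dedicated survival analysis routine.

```python
import math


def cox_hazard(baseline_hazard, beta, x):
    """Hazard for expression level x, given the baseline hazard at some time t."""
    return baseline_hazard * math.exp(beta * x)


beta = 0.7        # hypothetical effect of the gene on survival
baseline = 0.1    # hypothetical baseline hazard at a given time t

# Hazard ratio for a one-unit increase in expression (e.g. one log2 unit).
hazard_ratio = cox_hazard(baseline, beta, 2.0) / cox_hazard(baseline, beta, 1.0)
```

The baseline hazard cancels in the ratio, so a one-unit increase in expression multiplies the hazard by exp(β), roughly 2.0 in this example, at every time point.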