WiNGS 2014: Workshops in Next-Generation Science at UNC Charlotte
Workshop 2: R, RStudio, and reproducible research with knitr


Slides from Workshop 2 of Workshops in Next-Generation Science (WiNGS), held at the UNC Charlotte City Center Campus in May 2014


No programming experience necessary

"we wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming..." John Chambers, Stages in the Evolution of S

Why use R?

• Free & open source
• Has a lot of support
  – Popular in many domains (finance, business analytics, statistics, biology)
• Many libraries available for biological data analysis through the Bioconductor project
  – Such as edgeR (today)
• Now has an easy-to-use, free user interface called RStudio

RStudio

• A very nice graphical user interface for R
• It's free!
• Integrates well with knitr, a tool for writing statistical reports with R Markdown

R Markdown (".Rmd")

• Lets you write a report that combines commands and their results
• Sounds weird, but once you get used to it, it's very powerful
• Catch mistakes before publication
  – Ask a friend to run & review your data analysis

knitr & R Markdown enable literate programming

• A way to do "literate programming"
  – Developed by Donald Knuth, Stanford Computer Science professor
• Literate programming: write programs that explain what they are doing while they are doing it
• Practical application: data analysis reports

Plan for Today

• Introduce R and RStudio
  – Part 1: Functions & plots
  – Part 2: Markdown
  – Part 3: See how statistical testing works in R
• Differential expression analysis walk-through (may extend into Workshop 3)
• Goal: Get you started!
  – Lots of Web resources for further study

Let's get started!

Start RStudio

• RStudio has panes
  – with minimize and maximize buttons (top right)
• Panes have tabs

(Screenshot: the console, where you type commands; the Environment tab, which shows variables you've defined)

Make new project (Part 1)

• Select File > Project > New Project...
• Choose New Directory

Make new project (Part 2)

• Choose Empty Project

Make new project (Part 3)

• Choose Empty Project
• Enter "wings2014"
• Click Create Project

Project name in upper right corner

• Open the folder wings2014
• See the wings2014.Rproj file
• Tip: After quitting, double-click this file to start RStudio with the correct directory settings

Enter commands in Console

> symbol is the prompt

• Type commands or expressions at the prompt, then press ENTER
• R evaluates what you type and prints the result
• R then returns the prompt

Practice: Try arithmetic expressions

• Add +
• Subtract -
• Multiply *
• Raise to a power ^ (R also accepts **)
• Expressions return values as one-element vectors
• [1] is the index of the value next to it (element 1 of the vector)
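A few expressions to try at the prompt, as a sketch. The values in the comments are plain arithmetic; ^ is the power operator most R code uses, and R also accepts ** as an alias for it.

```r
# Arithmetic at the console; each result prints as a one-element vector
2 + 3        # [1] 5
10 - 4       # [1] 6
6 * 7        # [1] 42
2 ^ 10       # raise to a power: [1] 1024
2 ** 10      # R treats ** the same as ^: [1] 1024

# Saving a result and checking its length confirms it is a vector of length 1
r <- 2 + 3
length(r)    # [1] 1
```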

Practice: Save results to variables

• Use = (or the more common <-) to assign a result to a variable
  – Nothing is printed
• Type the variable name to see what's in it
• Use variables in expressions
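A minimal console sketch of assignment (x, y and total are only example names). Both = and <- work at the prompt; <- is the style most R code uses.

```r
x = 5          # assignment prints nothing
y <- 12        # same idea with the conventional operator

x              # typing a variable's name prints its value

total <- x + y # variables can be used in further expressions
total
```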

Variables refer to objects

• Environment tab shows the objects created so far
• Most of what you do in R involves manipulating objects saved to variable names
  – Use objects as inputs to functions
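The Environment tab has console counterparts. In this sketch (heights and label are made-up example objects), ls() lists the names of the objects defined so far and str() gives a compact description of one object:

```r
heights <- c(1.62, 1.75, 1.80)   # a numeric vector (example values)
label <- "control"               # a character string

ls()            # names of objects defined so far, as in the Environment tab
str(heights)    # compact description: type, length, values
mean(heights)   # objects become inputs to functions
```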

R functions

• R has many functions
  – math
  – plotting
  – statistical tests
• Functions take inputs called arguments
• Most functions have many possible arguments
  – Usually with reasonable defaults

How to use a function in 4 steps

1. Type the function name
2. Type "(" (open paren)
   – RStudio types the closing paren for you
3. Type the arguments; if there is more than one argument, separate them with "," (comma)
4. Press ENTER

Example: sqrt calculates the square root
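The four steps applied to sqrt, plus round as an example of a function that takes two arguments separated by a comma (a sketch; any numbers will do):

```r
# Function name, open paren, argument, close paren, then ENTER
sqrt(16)

# sqrt also works on a vector, one element at a time
sqrt(c(1, 4, 9))

# Two arguments, separated by a comma: the number and the digits to keep
round(3.14159, 2)
```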

Practice: rnorm function

• rnorm creates a vector of numbers randomly sampled from a normal distribution with a specified mean and standard deviation

rnorm(10,5,5)
  – rnorm: function name
  – 10, 5, 5: arguments (sample size, mean, standard deviation)

Practice: rnorm function (continued)

• Mean and standard deviation are optional
• If you don't specify them, they default to:
  – 0 for the mean
  – 1 for the sd
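A sketch of both forms of the call. set.seed only makes the random numbers repeatable, so your run matches a neighbor's; the seed value is arbitrary. With a large sample, the sample mean and standard deviation come out close to the defaults of 0 and 1.

```r
set.seed(1)              # repeatable random draws (optional)
a <- rnorm(10, 5, 5)     # 10 values, mean 5, sd 5
b <- rnorm(10)           # 10 values, default mean 0 and sd 1
length(a)

# A large sample drawn with the defaults
big <- rnorm(10000)
mean(big)                # close to 0
sd(big)                  # close to 1
```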

R tip!

• Use the UP arrow key to retrieve the previous command
  – Saves typing

Practice: R allows named arguments

• Order can vary

rnorm(10, mean=5, sd=2)
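Because named arguments can come in any order, the three calls below request the same thing. Resetting the seed (an arbitrary value here) before each call shows that they draw identical numbers:

```r
set.seed(42)
v1 <- rnorm(10, 5, 2)                  # positional: n, mean, sd
set.seed(42)
v2 <- rnorm(10, mean = 5, sd = 2)      # named
set.seed(42)
v3 <- rnorm(sd = 2, mean = 5, n = 10)  # named, in a different order

identical(v1, v2) && identical(v2, v3)
```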

help shows how to use a function

• Type help(rnorm) to list arguments and their defaults
• help is itself a function: its argument is the name of the function you want to learn about
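Besides the full help page, args() prints just the argument list with its defaults, which is handy for a quick check. A sketch:

```r
# help(rnorm), or the shortcut ?rnorm, opens the full page in the Help tab.
# For a quick look at only the arguments and their defaults:
args(rnorm)

# The defaults can also be read programmatically
formals(rnorm)$mean
formals(rnorm)$sd
```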

Now you know how to...

• Calculate values & see the result
• Save output to variables
• Use the Environment tab to view variables
• Use R functions

Next: plotting!!!

R plotting functions

• Many options
  – generic x-y plots, scatter plots
  – barplots
  – dendrograms
  – histograms
  ... and much more
• Highly configurable!
  – log or linear scale axes
  – different characters or colors for points
  ... and much more

Practice: Generic x-y plot (scatter plot)

• Named argument main determines the plot title
• Note: Enclose text in quotes

Practice: Try other options

• col - color of points (in quotes)
• pch - point character
  – numeric code
  – letter (in quotes)
... and many more
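A sketch of the options from the last two slides on made-up data (x and y are random example vectors). pch = 19 is a filled circle; a single character in quotes is drawn as-is.

```r
set.seed(3)
x <- rnorm(50)
y <- x + rnorm(50, sd = 0.5)

# main sets the title (text in quotes); col sets the point color
plot(x, y, main = "y versus x", col = "blue")

# pch as a numeric code (19 = filled circle) or as a character in quotes
plot(x, y, main = "y versus x", col = "red", pch = 19)
plot(x, y, main = "y versus x", pch = "+")
```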

Practice: Histogram (hist)

• main - plot title (in quotes)
• col - color of bars (in quotes)
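A sketch with made-up data. hist also returns the bin counts invisibly, so saving its result (h here) lets you check that every value landed in exactly one bar:

```r
set.seed(4)
values <- rnorm(200, mean = 10, sd = 2)

# main sets the title, col the bar color
h <- hist(values, main = "200 draws, mean 10, sd 2", col = "lightblue")

sum(h$counts)   # total of all bar heights equals the number of values
```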

Practice: Adding to a plot (1)

• abline ("a b line") adds a straight line
• Arguments:
  – v or h for the location of a vertical or horizontal line
  – a and b for the y intercept and slope
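In abline, a is the intercept and b is the slope. A sketch on simulated data (the intercept 2 and slope 0.5 are arbitrary choices); abline can also take a model fitted with lm and draw its regression line:

```r
set.seed(5)
x <- runif(30, 0, 10)
y <- 2 + 0.5 * x + rnorm(30, sd = 0.5)
plot(x, y, main = "abline examples")

abline(a = 2, b = 0.5, col = "red")   # intercept 2, slope 0.5
abline(h = mean(y), lty = 2)          # dashed horizontal line at the mean of y
abline(v = 5, col = "gray")           # vertical line at x = 5

# A fitted linear model: abline draws its regression line
fit <- lm(y ~ x)
abline(fit, col = "blue")
coef(fit)                             # estimated intercept and slope
```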

Practice: Adding to a plot (2)

• points - adds points to a plot
• Arguments:
  – x, y: x & y values for the points
  – other options, same as for plot
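A sketch of building a plot in two steps, using two made-up groups (control and treated are example names). ylim is set from both groups so the points added later are not cut off:

```r
set.seed(6)
control <- rnorm(20, mean = 5)
treated <- rnorm(20, mean = 7)

# First group: plot sets up the axes
plot(1:20, control, ylim = range(control, treated), pch = 19,
     main = "points adds to an existing plot", xlab = "sample", ylab = "value")

# Second group: points takes x, y and the same options as plot
points(1:20, treated, col = "red", pch = 17)
```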

Take-home: In R you can "script" a plot

• Using plotting commands like points, abline, and lines, you can add more data to a plot, element by element
• Most plotting commands accept the same options, like
  – pch - point character
  – col - color
• Learning one plotting command helps you learn many

Practice: Graphics demo

• Enter demo(graphics)
• Press ENTER to see the next plot

Part 2 - R Markdown

How to install knitr

• Go to the Packages tab
• Not checked?
  – Check it (this loads the package)
• Not installed?
  – Select Tools > Install Packages...
  – Enter knitr
  – Click Install
• May need to restart RStudio

Setup - to enable better coding!

Go to Tools > Global Preferences > Panes
• Top right: Console
• Lower right: Environment, History, Files, Plots, Help
• Top left: Source
• Lower left: everything else

Practice: Make R Markdown file

• Click the "new file" icon
• Choose R Markdown
  – Creates an example R Markdown document
• Take a moment to scan the document

R Markdown has plain text with formatting instructions

• A row of "===" under "Title" makes it a top-level heading

R Markdown has code chunks

• Code chunk: starts with three backticks followed by {r}, ends with three more backticks
• Shown with a gray background

knitr "knits" code & text

• Makes an HTML document (Web page) that combines
  – code
  – output from code
  – your text explanations

Practice: Knit HTML

• Save the file as "Example.Rmd"
• Click Knit HTML
  – A preview appears
  – An HTML file appears
• Click Example.html in the Files tab
  – Choose View in Web Browser
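Behind the Knit HTML button, knitr runs the code chunks and weaves their output into the text. As a rough console sketch, assuming knitr is installed, the code below writes a tiny R Markdown file (Tiny.Rmd, a made-up name, in a temporary folder) and calls knitr::knit(), which produces a Markdown file holding the code and its output; turning that Markdown into HTML is the extra step the button takes care of.

```r
fence <- strrep("`", 3)   # three backticks, built here so the example stays readable
rmd <- c(
  "Tiny example",
  "============",
  "",
  "Some text explaining the calculation.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

dir <- tempdir()
rmd_file <- file.path(dir, "Tiny.Rmd")
writeLines(rmd, rmd_file)

if (requireNamespace("knitr", quietly = TRUE)) {
  # knit() runs the chunk and writes Tiny.md with code and output interleaved
  md_file <- knitr::knit(rmd_file, output = file.path(dir, "Tiny.md"), quiet = TRUE)
  cat(readLines(md_file), sep = "\n")
}
```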

knitr makes an HTML document (a Web page)

• Images are embedded
• You can email it, save it in a Dropbox, etc.

Practice: Edit Example

• Edit plain text
• Edit code chunks

Practice: Run commands in Markdown

• Put the cursor inside a code chunk
• Press CTRL-ENTER
  – or click Run

Shortcut: Chunks menu (top right)

• Put the cursor in a chunk
• Use Run Current Chunk to run the entire chunk
• Or Run All

Practice: Edit Markdown, make plot look nicer

• Use col to add color
• Use las to change the orientation of the y-axis numbers
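The exact example in the new-file template depends on your RStudio version; this sketch uses R's built-in cars data set. las = 1 makes all axis numbers horizontal, which is what rotates the y-axis numbers:

```r
# cars ships with R: speed and stopping distance of 50 cars
plot(cars)                                # default look

# col adds color; las = 1 turns the axis numbers horizontal
plot(cars, col = "darkgreen", pch = 19, las = 1,
     main = "Stopping distance vs. speed")
```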

Practice: Run the new code

• Put the cursor inside the code chunk
• Press CTRL-ENTER
  – or click Run

Practice: knit your Markdown

Statistical tests in R

• Tests are implemented as functions
  – Usually return list objects
• A list is an object that can contain other objects of many types
• Previously, you saw vectors
  – Output of the rnorm command
  – Vectors are like lists that only contain one type of object (e.g., numbers only)
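A quick way to see the difference (v, w and l are example names): c() builds a vector and forces every element to one type, while list() keeps each element's own type.

```r
v <- c(1.5, 2, 3)          # numeric vector: every element is a number
w <- c(1, "two", TRUE)     # mixed input: everything becomes character
class(v)
class(w)

# A list holds objects of different types, even whole vectors
l <- list(count = 3, name = "pollen", values = v)
class(l$count)
class(l$name)
length(l)
```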

Practice: Start a new section

• Heading, smaller than the title heading
• Make a new code chunk
• Make new vectors
• Run t.test
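A sketch of the new chunk's code with made-up vectors (group1 and group2 are example names). Given two vectors and no other options, t.test runs a Welch two-sample t test:

```r
set.seed(7)
group1 <- rnorm(10, mean = 5)
group2 <- rnorm(10, mean = 6)

# Save the test result to a variable, then print its summary
result <- t.test(group1, group2)
result
```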

Tip: Markdown help

• "Using R Markdown" opens a Web page with more info
• "Markdown Quick Reference" shows Markdown codes in the Help tab

Practice: Run the code

• t.test output is saved in result
• result is a list
• Put the cursor inside the chunk
• Press CTRL-ENTER
  – or click Run

Practice: Type result (the variable name) in the console for a summary

Practice: result is a list with named components

• Use the names function to find what it contains
• Use $ to retrieve named components
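A self-contained sketch of looking inside the result (the simulated data are only for illustration). A t.test result is a list of class "htest", so names and $ work on it:

```r
set.seed(7)
result <- t.test(rnorm(10, mean = 5), rnorm(10, mean = 6))

is.list(result)      # TRUE
names(result)        # components such as "statistic", "p.value", "conf.int"
result$p.value       # retrieve one component by name with $
result$conf.int      # confidence interval for the difference in means
```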

Differential expression analysis walk-through

Effects of mild chronic heat stress on gene expression in tomato pollen

Goals

• Show you how to structure a data analysis
  – A useful framework you can use in many settings
• Give you an example differential gene expression analysis for RNA-Seq
  – Use it as a starting point for other projects
  – Tip: Review the edgeR user guide for other example data analyses

Structure of the data analysis

• Introduction
  – explain the experimental design
  – state questions (no more than 3, ideally 2)
• Analysis
  – describe steps of analysis, with results
  – explain judgment calls, like P value cutoffs
• Conclusion
  – answer the original questions
• State limitations of the analysis
• Session info, including software versions used

Adapted from Jeff Leek's Data Analysis course, Coursera
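The session info section can be a single chunk at the end of the report. A minimal sketch:

```r
# Usually the last chunk: records the R version, platform, and package versions
si <- sessionInfo()
si
si$R.version$version.string   # just the R version line
```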

Practice: Setup

• Go to https://bitbucket.org/lorainelab/tomatopollen

Download repository

Move to Desktop

• Subfolders correspond to analysis chunks
  – See README.md for details
• Open DifferentialExpression

(The downloaded folder's name has a suffix based on the repository version)

Double-click the ".Rproj" file in the DifferentialExpression folder

• Opens a new RStudio window

Review of the experiment

• Tomato plants subjected to chronic mild heat stress & control
  – Greenhouse C
  – Greenhouse B
• Mature pollen grains harvested in batches over eight weeks, ~10 plants per batch
  – One treatment sample, one control sample per collection
• RNA extracted, sent to UCLA for sequencing
  – 10 libraries: 5 treatment, 5 control; 69-base paired-end sequencing
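One way to write this design down in R, as a sketch rather than the walk-through's own code. It assumes the 10 libraries come from five collections, each with one control (C) and one heat-treated (T) sample; the sample labels are made up. A design matrix like this is the kind of input edgeR's GLM functions (such as glmFit) accept, with treatmentT as the coefficient for the heat effect.

```r
# Hypothetical layout: five collections (batches), one C and one T sample each
batch <- factor(rep(1:5, each = 2))
treatment <- factor(rep(c("C", "T"), times = 5), levels = c("C", "T"))
samples <- data.frame(sample = paste0(treatment, batch), batch, treatment)
samples

# Additive model: a batch effect for each collection plus a treatment effect
design <- model.matrix(~ batch + treatment)
dim(design)
colnames(design)
```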

Next: Step-by-step walk-through of R Markdown