Introduction to RHadoop
Master's Degree in Informatics Engineering
Master's Programme in ICT Innovation: Data Science (EIT ICT Labs Master School)
Academic Year 2015-2016




Page 2: IntroducontoRHadooplaurel.datsi.fi.upm.es/_media/docencia/asignaturas/... · IntroducontoRHadoop Master’s(Degree(in(Informacs(Engineering(Master’s(Programme(in(ICT(Innovaon:(DataScience((EIT(ICT(Labs(Master(School)(Academic(Year(2015G2106

Contents

• Introduction to…
• MapReduce
• HDFS
• Hadoop
• Data Analytics with RHadoop


MapReduce & DQ

• Divide and Conquer (DQ)
• General idea
  • Divide a problem into smaller sub-problems
  • Solve each sub-problem (independently)
  • Combine the solutions


DQ: pseudo-code

function DQ(X: problem data)
    if small(X) then
        S = easy(X)
    else
        divide(X) => (X1, ..., Xk)
        for i = 1 to k do
            Si = DQ(Xi)
        S = combine(S1, ..., Sk)
    return S
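As a concrete instance of the scheme above, here is a minimal merge sort (sketched in Python rather than R so it runs stand-alone; an added illustration, not from the slides): small(X) is a list of at most one element, divide splits the list in half, and combine merges the two sorted halves.

```python
def merge_sort(xs):
    # small(X): a list of 0 or 1 elements is already sorted (easy case)
    if len(xs) <= 1:
        return xs
    # divide(X) => (X1, X2): split the list in half
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    # combine(S1, S2): merge the two sorted halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))  # → [1, 2, 5, 5, 6, 9]
```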



DQ: efficiency

• Efficiency of this approach
  • An appropriate threshold must be selected for applying easy(X)
  • The decomposition and combination functions must be efficient
  • The sub-problems must be (approximately) of the same size
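The cost of uneven splitting can be seen with a small sketch (in Python, added here as an illustration): halving the problem at every step gives logarithmic recursion depth, while peeling off one element at a time gives linear depth.

```python
def depth_balanced(n):
    """Recursion depth when each step halves the problem (balanced split)."""
    depth = 0
    while n > 1:
        n //= 2
        depth += 1
    return depth

def depth_unbalanced(n):
    """Recursion depth when each step only peels off one element."""
    return n - 1 if n > 1 else 0

# For n = 1024: the balanced split recurses 10 levels deep,
# the unbalanced one 1023 levels deep.
print(depth_balanced(1024), depth_unbalanced(1024))  # → 10 1023
```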


DQ: Remarks

• It cannot be applied to every type of problem
• Sometimes it is not obvious how to divide a large problem into sub-problems
• If the division is uneven, the system will be unbalanced, which has an important impact on the overall performance of the algorithm
• The sub-problems must be significantly smaller than the original one, so that a massively parallel computer can be used and the communication overhead is compensated


MapReduce: general scheme

Source: www.academia.edu


MapReduce: more detail

Source: Hadoop Book


MapReduce: example

Source: MilanoR


Hadoop Distributed File System (HDFS)

• A distributed file system that evolved from Google's implementation (GFS)
• Fault-tolerant: files are divided into chunks, which are distributed and replicated across the cluster
  • Normally, the replication factor is 3
• A Master Node stores the meta-data: which files exist, into how many chunks they are divided, and where those chunks are stored
• Large block sizes are preferred (128 MB by default)
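As a quick worked example of these defaults (an added Python sketch; the 128 MB block size and replication factor 3 are the values quoted above): a 1 GB file is split into 8 blocks, and the cluster stores 24 block copies in total.

```python
import math

def hdfs_blocks(file_size_mb, block_size_mb=128, replication=3):
    """Number of HDFS blocks for a file, and total stored block copies."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    return blocks, blocks * replication

# A 1 GB (1024 MB) file: 8 blocks, 24 replicated copies across the cluster.
print(hdfs_blocks(1024))  # → (8, 24)
```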


Hadoop Distributed File System (HDFS)

Source: Hadoop tutorial


Hadoop Distributed File System (HDFS)

• In HDFS, blocks should be read from beginning to end (this favors the MapReduce approach)
• Files in HDFS ARE NOT stored alongside the host system's files
  • HDFS is normally an abstraction OVER an existing file system (ext3, ext4, etc.)
  • Thus, there are specific commands to manipulate the HDFS file system
• To open a file stored in HDFS, the client must contact the NameNode to retrieve the location of each block of the file (at the DataNodes)
  • Parallel reads are possible (and preferred)


Hadoop Distributed File System (HDFS)

• Data locality: normally, when a job is launched, it runs on the same node that stores the data it must manipulate
• The meta-data stored in the NameNode is not automatically replicated (this must be done manually or with an inactive NameNode)


HDFS from the command line

• Each user of HDFS has a personal directory
• No security directives are implemented, so users can write anywhere

• Access to HDFS is through the hdfs command:
  hdfs dfs command

• Important commands
  • -copyFromLocal vs. -copyToLocal
  • -mkdir
  • -cp, -mv

• Documentation on the Hadoop website


Hadoop MRv1 vs YARN (MRv2)

• Hadoop MRv1
  • Resource management and task scheduling and monitoring are done by a single process (a bottleneck): the JobTracker
  • Each sub-problem is run by an independent process: a TaskTracker

• Hadoop MRv2
  • Resource management and task scheduling and monitoring are split into different processes
    • Resource Manager (RM): overall resource management
    • Application Master (AM): per-job task scheduling and monitoring
  • A NodeManager runs the tasks at each computing node


Hadoop MRv1 vs YARN (MRv2)


Example: wordcount

• Input: a document made up of words
• Output: a set of (word, count(word)) pairs
• Two functions: map and reduce

map(k1, v1):
    for each word w in v1
        emit(w, 1)

reduce(k2, v2_list):
    result = 0
    for each v in v2_list
        result += v
    emit(k2, result)
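The two functions above can be simulated outside Hadoop in a few lines of Python (an added sketch; in real MapReduce the framework performs the shuffle step across the cluster between the two phases):

```python
from collections import defaultdict

def map_fn(v1):
    # emit (w, 1) for each word in the input value
    return [(w, 1) for w in v1.split()]

def shuffle(pairs):
    # group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_fn(k2, v2_list):
    # emit (k2, sum of the counts)
    return (k2, sum(v2_list))

doc = "to be or not to be"
pairs = map_fn(doc)
counts = dict(reduce_fn(k, vs) for k, vs in shuffle(pairs).items())
print(counts)  # → {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```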


Example: wordcount


Example: wordcount


RHadoop

• Developed by Revolution Analytics (acquired by Microsoft)
• Three main components
  • rhdfs: R + HDFS
  • rmr2: R + MapReduce
  • rhbase: R + HBase

• Can be downloaded from: https://github.com/RevolutionAnalytics/RHadoop/wiki/Downloads

• Already installed and configured in the VM provided…


RHadoop: interacting with HDFS

# Load the rhdfs library
library(rhdfs)

# Start rhdfs
hdfs.init()

# Basic "ls"; the path is mandatory
hdfs.ls("/user/hadoop")

# Create a directory
work.dir <- "/user/hadoop/aux/"
hdfs.mkdir(work.dir)

# And delete it
hdfs.delete(work.dir)

# Create it again
hdfs.mkdir(work.dir)


RHadoop: wordcount example

• Library loading and initialization

# Loading the RHadoop libraries
library('rhdfs')
library('rmr2')

# Initializing RHadoop
hdfs.init()


RHadoop: wordcount example

wordcount = function(input,
                     # The output can be an HDFS path, but if it is NULL
                     # a temporary file will be generated and wrapped in
                     # a big data object, like the ones generated by to.dfs
                     output = NULL,
                     pattern = " ") {

    # Defining the wordcount Map function
    wc.map = function(., lines) {
        keyval(unlist(strsplit(x = lines, split = pattern)), 1)
    }

    # Defining the wordcount Reduce function
    wc.reduce = function(word, counts) {
        keyval(word, sum(counts))
    }


RHadoop: wordcount example

    # Defining the MapReduce parameters by calling the mapreduce function
    mapreduce(input = input,
              output = output,
              # You can specify your own input and output formats
              # and produce binary formats with the functions
              # make.input.format and make.output.format
              input.format = "text",
              map = wc.map,
              reduce = wc.reduce,
              # With combiner
              combine = TRUE)
}


RHadoop: wordcount example

# Running the MapReduce job by passing the HDFS
# input directory location as a parameter
wordcount('/user/hadoop/wordcount/quijote.txt')

# Retrieving the RHadoop MapReduce output data
# by passing the output directory location as a parameter
from.dfs("/tmp/file1b0817a5bcd0")

• El Quijote can be downloaded from: http://www.gutenberg.org/cache/epub/996/pg996.txt


RHadoop: airline example

• We will analyze the commercial data of an airline
• The input data file is a CSV
• We will need a custom input formatter to ease the task of processing the file

• Data can be downloaded from: http://stat-computing.org/dataexpo/2009/1987.csv.bz2


RHadoop: airline example

library(rmr2)
library('rhdfs')

hdfs.init()

# Put the data in HDFS
hdfs.data.root = '/user/hadoop/rhadoop/airline'
hdfs.data = file.path(hdfs.data.root, 'data')
hdfs.mkdir(hdfs.data)

hdfs.put("/home/hadoop/Downloads/1987.csv", hdfs.data)

hdfs.out = file.path(hdfs.data.root, 'out')


RHadoop: airline example (input format)

#
# asa.csv.input.format() - read CSV data files and label field names
# for better code readability (especially in the mapper)
#
asa.csv.input.format = make.input.format(
    format = 'csv', mode = 'text', streaming.format = NULL, sep = ',',
    col.names = c('Year', 'Month', 'DayofMonth', 'DayOfWeek',
                  'DepTime', 'CRSDepTime', 'ArrTime', 'CRSArrTime',
                  'UniqueCarrier', 'FlightNum', 'TailNum',
                  'ActualElapsedTime', 'CRSElapsedTime', 'AirTime',
                  'ArrDelay', 'DepDelay', 'Origin', 'Dest', 'Distance',
                  'TaxiIn', 'TaxiOut', 'Cancelled', 'CancellationCode',
                  'Diverted', 'CarrierDelay', 'WeatherDelay',
                  'NASDelay', 'SecurityDelay', 'LateAircraftDelay'),
    stringsAsFactors = F)


RHadoop: airline example (mapper 1/2)

#
# the mapper gets keys and values from the input formatter
# in our case, the key is NULL and the value is a data.frame from read.table()
#
mapper.year.market.enroute_time = function(key, val.df) {

    # Remove header lines, cancellations, and diversions:
    val.df = subset(val.df, Year != 'Year' & Cancelled == 0 & Diverted == 0)

    # We don't care about the direction of travel, so construct a new 'market'
    # vector with airports ordered alphabetically (e.g., LAX to JFK becomes 'JFK-LAX')
    market = with(val.df, ifelse(Origin < Dest,
                                 paste(Origin, Dest, sep='-'),
                                 paste(Dest, Origin, sep='-')))


RHadoop: airline example (mapper 2/2)

    # the key consists of year and market
    output.key = data.frame(year = as.numeric(val.df$Year), market = market,
                            stringsAsFactors = F)

    # emit a data.frame of gate-to-gate elapsed times (CRS and actual) + time in air
    output.val = val.df[, c('CRSElapsedTime', 'ActualElapsedTime', 'AirTime')]
    colnames(output.val) = c('scheduled', 'actual', 'inflight')

    # and finally, make sure they're numeric while we're at it
    output.val = transform(output.val,
                           scheduled = as.numeric(scheduled),
                           actual = as.numeric(actual),
                           inflight = as.numeric(inflight))

    return(keyval(output.key, output.val))
}
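The 'market' construction in the mapper (direction-independent airport pairs) can be checked with a tiny Python sketch (added here as an illustration; the R code achieves the same effect with ifelse and paste):

```python
def market(origin, dest):
    # order the two airport codes alphabetically so that both
    # directions of travel map to the same market key
    return '-'.join(sorted([origin, dest]))

# LAX→JFK and JFK→LAX both become 'JFK-LAX'
print(market('LAX', 'JFK'), market('JFK', 'LAX'))  # → JFK-LAX JFK-LAX
```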


RHadoop: airline example (reducer)

#
# the reducer gets all the values for a given key
# the values (which may be multi-valued, as here) come in the form of a data.frame
#
reducer.year.market.enroute_time = function(key, val.df) {

    output.key = key
    output.val = data.frame(flights = nrow(val.df),
                            scheduled = mean(val.df$scheduled, na.rm=T),
                            actual = mean(val.df$actual, na.rm=T),
                            inflight = mean(val.df$inflight, na.rm=T))

    return(keyval(output.key, output.val))
}


RHadoop: final configuration and execution

mr.year.market.enroute_time = function(input, output) {
    mapreduce(input = input,
              output = output,
              input.format = asa.csv.input.format,
              map = mapper.year.market.enroute_time,
              reduce = reducer.year.market.enroute_time,
              backend.parameters = list(
                  hadoop = list(D = "mapred.reduce.tasks=2")
              ),
              verbose = TRUE)
}

out = mr.year.market.enroute_time(hdfs.data, hdfs.out)


RHadoop: gathering results

results = from.dfs(out)
results.df = as.data.frame(results, stringsAsFactors = F)
colnames(results.df) = c('year', 'market', 'flights', 'scheduled', 'actual', 'inflight')

print(head(results.df))

# save(results.df, file = "out/enroute.time.market.RData")