
How to Choose a Host for a Big Data Project

Page 1: How to Choose a Host for a Big Data Project

Presenter: John Johnson
Date: 01/01/14

PRESENTATION TITLE

Everything but your code®

Page 2: How to Choose a Host for a Big Data Project

Contents

• What does “Big Data” really mean?
• Big Data use cases
• Considerations when building your project/application
• Hosting options and Big Data challenges
• Operations-as-a-Service
• Customer close-up

Page 3: How to Choose a Host for a Big Data Project

“Big data is a collection of [unstructured] data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.”

- Lisa Arthur, Forbes

Page 4: How to Choose a Host for a Big Data Project

Big Data use cases

DATA

• Make unstructured info transparent and usable at much higher frequency
• Precisely tailor products/services for better analysis and segmentation
• Improve development of next gen products/services
• Create and store unstructured transactional data

Page 5: How to Choose a Host for a Big Data Project

Planning your build

When you’re building big data applications, you need a view of the complete stack

The Stack

Page 6: How to Choose a Host for a Big Data Project

Requirements of Big Data Applications

• Big Data is power hungry
• 10 or 40Gbps networks at a minimum
• Big Data is distributed
• Big Data is monitoring intensive
  – Requires accurate, specific and frequent diagnostics to run properly
• Big Data apps require tons of memory and storage
• Application Support Tools

   

Page 7: How to Choose a Host for a Big Data Project

What do you need for the back end?

Take a big task and divide it into smaller, discrete tasks that can be carried out in parallel

In the cloud, your data could be spread across multiple servers

Because of this complexity, the task needs to be divided into smaller tasks
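
The divide-and-run-in-parallel idea on this slide can be sketched in a few lines of Python. This is only an illustration, not anything from the presentation: the function names and the word-count task are assumptions chosen for the example, and a thread pool on one machine stands in for the multiple servers a real deployment would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Divide one big list of work into smaller, discrete pieces."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def count_words(lines):
    """The small task: count the words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, chunk_size=2, workers=4):
    """Run the small tasks concurrently, then merge the partial results."""
    chunks = split_into_chunks(lines, chunk_size)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is handled independently; results come back in order.
        for partial in pool.map(count_words, chunks):
            total += partial
    return total
```

The merge step is where distributed data costs show up: every partial result has to travel back to one place, which is one reason the network matters so much for this kind of workload.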

Page 8: How to Choose a Host for a Big Data Project

Choosing a hosting option for your project

In-house vs.

Cloud vs.

Colocation vs.

Dedicated Managed Hosting (Operations as a Service)

Page 9: How to Choose a Host for a Big Data Project

Management vs. Resources

[Chart: hosting options plotted by management level (Unmanaged to Fully Managed) against resources (Shared to Dedicated). Options shown: Cloud, Colocation, DIY, OaaS (Dedicated Managed Hosting).]

Page 10: How to Choose a Host for a Big Data Project

In-house – What you get

• Purpose-built system (custom design) = Fast!
• Minimal Packet Loss, Jitter and Latency
• Single Tenant
• Reduced/No Server or Data Sprawl
• Transparent Infrastructure
• 10 or 40Gbps Network

Page 11: How to Choose a Host for a Big Data Project

Challenges of Big Data w/ In-House hosting

• Do you have the experience and knowledge to design, build and maintain the network?
• Have you thought about the total costs?
  – Data center costs
  – Equipment costs
  – Staffing costs
  – Application Support costs
• Did you factor in application support tools?
• Do you want to be an internet plumber?

Page 12: How to Choose a Host for a Big Data Project

Cloud – what you get

• Quick spin-up time
• Lower equipment costs
• Lower personnel costs for infrastructure support

Page 13: How to Choose a Host for a Big Data Project

Challenges of Big Data in a Cloud environment

• Would your operations be adversely affected by packet loss, jitter and latency?
• Do you want to share resources with other companies on a system that’s designed to be big, but not fast?
• Does your data need to be “in one place”?
• Distributed data puts a stress on the network that most cloud environments were not designed for

Page 14: How to Choose a Host for a Big Data Project

Challenges of Big Data in a Cloud environment

• Is the cloud provider capable of providing the intensive monitoring needed by Big Data applications?
  – Requires accurate, specific and frequent diagnostics to run properly
  – The privacy of the cloud works against efficiency
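
As a rough idea of what "accurate, specific and frequent diagnostics" for packet loss, jitter and latency might involve, the sketch below summarizes a batch of round-trip-time probes. It is a hypothetical illustration, not a tool named in the presentation: the function name, the use of `None` for a lost probe, and the definition of jitter as the mean absolute change between consecutive replies are all assumptions made for this example.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results; None marks a probe that got no reply."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    latency_ms = mean(replies) if replies else None
    # Jitter here is the mean absolute change between consecutive replies,
    # one simple definition assumed for this sketch.
    deltas = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter_ms = mean(deltas) if deltas else None
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}
```

Running a summary like this often, per link and per server, is only possible when the host exposes the underlying network, which is the transparency question these slides raise about shared cloud infrastructure.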

Page 15: How to Choose a Host for a Big Data Project

Colocation Hosting – what you get

• Lower equipment costs
• Control over non-data center infrastructure (servers, network, etc.)
• Not responsible for data center design, build or maintenance
• No tech support for equipment
• Single-tenancy

Page 16: How to Choose a Host for a Big Data Project

Challenges of Big Data in a Colocation Environment

• Do you want to be responsible for all non-data center support?
• Are you comfortable with having no application support?
• Does the provider custom-design your architecture, or rely on a ‘one size fits most’ deployment?
• What hardware is single-tenant, and what is multi-tenant/shared, and would the shared elements impact your operations?

   

Page 17: How to Choose a Host for a Big Data Project

Operations-as-a-Service (Dedicated Managed Hosting)

Operations-as-a-Service

                                         In-House  Cloud  Colocation  OaaS via Peak Hosting
Minimal Packet Loss, Jitter and Latency  Yes       No     Maybe       Yes
Single Tenant                            Yes       No     Yes         Yes
Reduced/No Server or Data Sprawl         Yes       No     Maybe       Yes
DC Techs Supplied                        No        Yes    Yes         Yes
SysAdmin Supplied                        No        No     No          Yes
Transparent Infrastructure               Yes       No     Yes         Yes
Custom Design                            Yes       Yes    Maybe       Yes
10 or 40Gbps Network                     Yes       ?      ?           Yes
Application Support tools                No        No     No          Yes
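
One way to use the comparison on this slide is to treat it as data and filter it by the features a project cannot do without. The sketch below is a hypothetical helper, not part of the presentation: the values are transcribed from the slide (with "?" written as "unknown"), while the names `OPTIONS`, `MATRIX` and `options_meeting` are assumptions for this example.

```python
OPTIONS = ["In-House", "Cloud", "Colocation", "OaaS"]

# One row per feature, values in OPTIONS order, transcribed from the slide.
MATRIX = {
    "Minimal packet loss, jitter and latency": ["yes", "no", "maybe", "yes"],
    "Single tenant": ["yes", "no", "yes", "yes"],
    "Reduced/no server or data sprawl": ["yes", "no", "maybe", "yes"],
    "DC techs supplied": ["no", "yes", "yes", "yes"],
    "SysAdmin supplied": ["no", "no", "no", "yes"],
    "Transparent infrastructure": ["yes", "no", "yes", "yes"],
    "Custom design": ["yes", "yes", "maybe", "yes"],
    "10 or 40Gbps network": ["yes", "unknown", "unknown", "yes"],
    "Application support tools": ["no", "no", "no", "yes"],
}


def options_meeting(required, accept=("yes",)):
    """Return hosting options whose value is acceptable for every required feature."""
    result = []
    for col, option in enumerate(OPTIONS):
        if all(MATRIX[feature][col] in accept for feature in required):
            result.append(option)
    return result
```

For example, `options_meeting(["Single tenant", "DC techs supplied"])` keeps only the columns marked "yes" for both rows; passing `accept=("yes", "maybe")` loosens the filter to include the slide's "Maybe" entries.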

Page 18: How to Choose a Host for a Big Data Project

Peak Hosting Customer Close-up

Big social data analytics company, delivering advanced social intelligence and real-time threat detection across the consumer packaged goods, food and beverage, media and entertainment, and pharmaceutical industries.

Akuda Labs’ Pulsar real-time streaming classification engine is currently processing 5 billion SCOPS (was 500 million when they came to Peak Hosting) for their product, ListenLogic

Page 19: How to Choose a Host for a Big Data Project

- The search

• Needs:
  – At least 1 Billion SCOPS processing power to run Hadoop-level, deep dive questions
  – Answers in real-time

Build vs. Buy?

Cloud
• Not an option due to shared and distributed infrastructure in a cloud environment

DIY (Build)
• Total control
• EXPENSIVE $$$
  – HW
  – Staffing

Dedicated Managed Hosting
• Their best option
• Now, which provider?

Page 20: How to Choose a Host for a Big Data Project

- The choice

• Best performing hardware
• Fast network
• Customized Infrastructure – designed specifically for Akuda Labs
• Technical Support staff
• Operations-as-a-Service

Page 21: How to Choose a Host for a Big Data Project

- What we did

2012:
• Provided 40-50 servers – 24 & 34 core machines w/ 128GB RAM

2013:
• Akuda upgrades to 64-core servers w/ 512GB RAM
• Still only 40-50 servers
• Connected via dual 10Gbps networking

Pool servers for customers and simply add more servers to the pool as needed, rather than deploy a new cluster per customer

New Abilities

Process 100X the data they previously could

Easily process 500 million SCOPS, with the ability to process 50 billion if they had enough data

Page 22: How to Choose a Host for a Big Data Project

- The ROI

Better Efficiency
Better Service
Better Economics
More Productivity

• Trim server count by 20%
• Schedule tasks on-demand instead of waiting for resources
• Better performance, higher levels of customization and productivity, all while paying 30% less than with previous provider
• Worked together to design, build, maintain, and support current infrastructure

Page 23: How to Choose a Host for a Big Data Project

In conclusion