Cloudera Search: Embracing Apache Solr into Cloudera’s Platform for Big Data. Eva Andreasson, Sr. Product Manager, Cloudera. Steven Noels, Co-founder and SVP of Products, NGDATA.

Cloudera Search Webinar: Big Data Search, Bigger Insights

DESCRIPTION

Cloudera Search brings full-text, interactive search and scalable indexing to data in HDFS and Apache HBase. Powered by and adding to Apache Solr, Cloudera Search fully integrates with CDH to bring scale and reliability for next-generation open source search -- Big Data search.

Citation preview

Page 1: Cloudera Search Webinar: Big Data Search, Bigger Insights

Cloudera Search
Embracing Apache Solr into Cloudera’s Platform for Big Data

Eva Andreasson, Sr. Product Manager, Cloudera
Steven Noels, Co-founder and SVP of Products, NGDATA

Page 2: Cloudera Search Webinar: Big Data Search, Bigger Insights

Who is Cloudera?

What the Enterprise Requires

§ Only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability
§ Suite of system and data management software
§ Comprehensive support and consulting services
§ Broadest Hadoop training and certification programs

Extensive Partner Ecosystem

§ Over 600 partners across hardware, software and services

The Leader in Big Data Management

§ Deliver a revolutionary data management platform powered by Apache Hadoop
§ World’s leading commercial vendor of Apache Hadoop
§ Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data

Customers & Users Across Industries

§ More production deployments than all other vendors combined

Page 3: Cloudera Search Webinar: Big Data Search, Bigger Insights

   

Cloudera Enterprise

A revolutionary solution powered by Apache Hadoop

BRINGS STORAGE & COMPUTE TOGETHER

WORKS WITH EVERY TYPE OF DATA

CHANGES THE ECONOMICS OF DATA MANAGEMENT

[Diagram labels: INGEST, STORE, EXPLORE, PROCESS, ANALYZE, SERVE; CDH, CLOUDERA MANAGER, CLOUDERA NAVIGATOR, CLOUDERA SUPPORT]

Page 4: Cloudera Search Webinar: Big Data Search, Bigger Insights

About NGDATA

A Next Generation Customer Intelligence Company

NGDATA is the next generation Customer Intelligence company that enables actionable customer insights, personalized product offers and intimate customer experience with a unique combination of interactive Big Data management and machine learning technologies in one integrated solution.

VISION & EXPERTISE / SOLUTION

Business Expertise

Enterprise Architectures

Big Data Technology

Machine Learning, Algorithms, Analytics

Customer Intelligence

Customer Database

Enterprise Data

Reference Data

Customer Data

Customer Engagement

Governance and Risk Management

Insights, Trends and Analysis

Lily

Page 5: Cloudera Search Webinar: Big Data Search, Bigger Insights

Agenda

§ Why Search?
§ What is Cloudera Search?
§ Using Cloudera Search
§ Learn more

Page 6: Cloudera Search Webinar: Big Data Search, Bigger Insights

Why Search?

Page 7: Cloudera Search Webinar: Big Data Search, Bigger Insights

Cloudera’s Enterprise Strategy

An Integrated Part of the Hadoop System

One pool of data

One security framework

One set of system resources

One management interface

Page 8: Cloudera Search Webinar: Big Data Search, Bigger Insights

Search Simplifies Interaction

Explore

Navigate

Correlate

Experts know MapReduce. Savvy people know SQL. Everyone knows Search.

Page 9: Cloudera Search Webinar: Big Data Search, Bigger Insights

Benefits of Search

Improved Big Data ROI
• An interactive experience without technical knowledge
• Single data set for multiple computing frameworks

Faster time to insight
• Exploratory analysis, esp. unstructured data
• Broad range of indexing options to accommodate needs

Cost efficiency
• Single scalable platform; no incremental investment
• No need for separate systems, storage

Solid foundations and reliability
• Solr in production environments for years
• Hadoop-powered reliability and scalability

Page 10: Cloudera Search Webinar: Big Data Search, Bigger Insights

What is Cloudera Search?

Page 11: Cloudera Search Webinar: Big Data Search, Bigger Insights

Cloudera Search

Interactive search for Hadoop
• Full-text and faceted navigation
• Batch, near real-time, and on-demand indexing

Apache Solr integrated with CDH
• Established, mature search with vibrant community
• Separate runtime like MapReduce, Impala
• Incorporated as part of the Hadoop ecosystem

Open Source
• 100% Apache, 100% Solr
• Standard Solr APIs
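
Because Cloudera Search exposes the standard Solr APIs, a full-text query with faceted navigation can be expressed as an ordinary Solr select request. The following is a minimal sketch that only builds the request URL and sends nothing. The host `search-host.example.com`, the collection `logs`, and the fields `message` and `level` are made-up placeholders; `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr query parameters.

```python
from urllib.parse import urlencode
from typing import List


def build_select_url(base_url: str, collection: str, text: str,
                     facet_fields: List[str], rows: int = 10) -> str:
    """Build a Solr /select URL for a full-text query with optional facets."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        # Faceting is switched on once, then one facet.field per field.
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field names, for illustration only.
url = build_select_url("http://search-host.example.com:8983", "logs",
                       "message:timeout", ["level"])
print(url)
```

Any HTTP client or Solr client library could issue the resulting request; the point is that no Cloudera-specific query dialect is involved.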

Page 12: Cloudera Search Webinar: Big Data Search, Bigger Insights

Scalable and Robust Index Storage

Solr and HDFS
• Scalable, cost-efficient index storage
• Higher availability
• Search and process data in one platform

[Diagram labels: HDFS, Lucene, Extraction, Mapping, Solr, ZooKeeper, SolrCloud, Querying API, Indexing API]

Page 13: Cloudera Search Webinar: Big Data Search, Bigger Insights

Near Real Time Indexing at Ingest

Solr and Flume
• Data ingest at scale
• Flexible extraction and mapping
• Indexing at data ingest
• Document-level ACL

[Diagram labels: Log File, Other Log File, Flume Agent (x2), Indexer (x2), HDFS]

Page 14: Cloudera Search Webinar: Big Data Search, Bigger Insights

Streamlined Extraction and Mapping

Cloudera Morphlines
• Simple and flexible data transformation
• Reusable across multiple index workloads
• Over time, extend and re-use across platform workloads

[Diagram labels: syslog, Flume Agent, Solr sink, Command: readLine, Command: grok, Command: loadSolr, Solr; data passed along as Event, Record, Record, Record, Document]
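
The readLine, grok, loadSolr chain on this slide can be pictured as a pipeline of small commands that each take records and emit records. The sketch below is a plain-Python analogy, not the Morphlines API or its configuration format: the function names, the syslog regular expression, and the in-memory list standing in for Solr are all assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]

# Assumed syslog layout: "Mon DD HH:MM:SS host program[pid]: message".
SYSLOG = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s(?P<host>\S+)\s"
    r"(?P<program>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?:\s(?P<msg>.*)$"
)


def read_line(event_body: str) -> Iterator[Record]:
    """Split one ingested event into one record per non-empty line."""
    for line in event_body.splitlines():
        if line.strip():
            yield {"message": line}


def grok(records: Iterable[Record]) -> Iterator[Record]:
    """Extract named fields with a regex; drop records that do not match."""
    for record in records:
        match = SYSLOG.match(record["message"])
        if match:
            fields = {k: v for k, v in match.groupdict().items() if v is not None}
            yield {**record, **fields}


def load_solr(records: Iterable[Record], index: List[Record]) -> Iterator[Record]:
    """Stand-in for loading documents into Solr: append to an in-memory list."""
    for record in records:
        index.append(record)
        yield record


def run_pipeline(event_body: str, index: List[Record]) -> int:
    """Chain the three commands and return the number of documents loaded."""
    return sum(1 for _ in load_solr(grok(read_line(event_body)), index))
```

Keeping each command independent of the others is what makes a chain like this reusable across indexing workloads: the same extraction step can feed a different final command without changes.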

Page 15: Cloudera Search Webinar: Big Data Search, Bigger Insights

Scalable Batch Indexing

Solr and MapReduce
• Flexible, scalable batch indexing
• Start serving new indices with no downtime
• On-demand indexing, cost-efficient re-indexing

[Diagram labels: Files (x2), Indexer (x2), Index shard (x2), Solr server (x2), HDFS]

Page 16: Cloudera Search Webinar: Big Data Search, Bigger Insights

Scalable Batch Indexing

Mapper: Parse input into indexable document (x3)

Arbitrary reducing steps of indexing and merging

End-Reducer (shard 1): Index document

End-Reducer (shard 2): Index document

Index shard 1

Index shard 2
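
The mapper and end-reducer roles on this slide can be illustrated with a small in-memory analogy. This is a sketch under stated assumptions, not Hadoop or Cloudera Search code: the whitespace tokenizer, the CRC32-based shard assignment, and the fixed shard count of two are all choices made here for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set

NUM_SHARDS = 2  # mirrors the two index shards on the slide


def map_parse(doc_id: str, raw: str) -> Dict[str, object]:
    """Mapper role: parse raw input into an indexable document."""
    return {"id": doc_id, "terms": raw.lower().split()}


def shard_for(doc_id: str) -> int:
    """Assign a document to a shard (assumed scheme: CRC32 modulo shard count)."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def reduce_index(docs: List[Dict[str, object]]) -> Dict[str, Set[str]]:
    """End-reducer role: build one shard's inverted index (term -> doc ids)."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for term in doc["terms"]:
            index[term].add(doc["id"])
    return dict(index)


def batch_index(files: Dict[str, str]) -> List[Dict[str, Set[str]]]:
    """Run the map step, partition by shard, then reduce each partition."""
    partitions: List[List[Dict[str, object]]] = [[] for _ in range(NUM_SHARDS)]
    for doc_id, raw in files.items():
        partitions[shard_for(doc_id)].append(map_parse(doc_id, raw))
    return [reduce_index(partition) for partition in partitions]
```

Because each shard is built independently from its own partition, the shards can be produced in parallel and handed to the serving layer once complete.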

Page 17: Cloudera Search Webinar: Big Data Search, Bigger Insights

Searchable Real-Time Data

Indexing HBase

HBase + Search = BIG DATA DATA MANAGEMENT

• planet-sized tabular data
• immediate access & updates
• fast & flexible information discovery

[Diagram labels: interactive load, HBase, HDFS, Triggers on updates, Indexer(s), Solr server (x5), Search]

Page 18: Cloudera Search Webinar: Big Data Search, Bigger Insights

Searchable Real-Time Data: HBase & Search

HBase SEP Triggers & Indexer
• HBase replication mechanism for reliable indexing
• light-weight, zero impact on write performance
• easy to set up & integrate
• flexible, configuration-based mapping & content extraction

Many use cases
• indexes near-real-time HBase updates into Solr
• fielded search on HBase columns
• faceted search
• query by example
• datacube
• secondary indexes
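
The idea of configuration-based mapping driven by update triggers can be sketched as follows. This is a plain-Python illustration, not the HBase indexer's actual API or configuration format: the `RowUpdate` type, the `Indexer` class, and the column and field names in `FIELD_MAPPING` are hypothetical, and a dictionary stands in for the Solr index.

```python
from typing import Dict, NamedTuple


class RowUpdate(NamedTuple):
    """One update event as it might arrive from a replication-style trigger."""
    row_key: str
    column: str  # "family:qualifier"
    value: str


# Hypothetical mapping from HBase columns to search fields.
FIELD_MAPPING = {"info:firstname": "firstname_s", "info:city": "city_s"}


class Indexer:
    """Applies row updates to search documents according to a column mapping."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = mapping
        self.documents: Dict[str, Dict[str, str]] = {}  # stand-in for Solr

    def on_update(self, update: RowUpdate) -> None:
        field = self.mapping.get(update.column)
        if field is None:
            return  # unmapped columns are not indexed
        doc = self.documents.setdefault(update.row_key, {"id": update.row_key})
        doc[field] = update.value
```

Since the indexer only consumes update events after they are written, the write path of the table itself is untouched, which is the property the slide describes as zero impact on write performance.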

Page 19: Cloudera Search Webinar: Big Data Search, Bigger Insights

Simple, Customizable Search Interface

Hue
• Simple UI
• Navigated, faceted drill down
• Customizable display
• Full text search, standard Solr API and query language
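
Faceted drill-down in a search UI rests on the facet counts that Solr returns alongside the results. The sketch below parses such counts and builds a filter for the next drill-down step. It assumes Solr's default JSON layout, in which each facet field is a flat list alternating value and count; the sample response and the field name `level` are invented for illustration.

```python
import json
from typing import Dict

# Invented sample in the shape of a Solr JSON response with facet counts.
SAMPLE_RESPONSE = """
{
  "response": {"numFound": 3, "docs": []},
  "facet_counts": {
    "facet_fields": {"level": ["ERROR", 2, "WARN", 1]}
  }
}
"""


def facet_counts(response_text: str, field: str) -> Dict[str, int]:
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    body = json.loads(response_text)
    flat = body["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def drill_down_filter(field: str, value: str) -> str:
    """Build a filter expression usable as a Solr fq parameter."""
    return f'{field}:"{value}"'


counts = facet_counts(SAMPLE_RESPONSE, "level")
next_filter = drill_down_filter("level", "ERROR")
```

A UI would render `counts` as clickable entries and add `next_filter` to the following request when the user selects one.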

Page 20: Cloudera Search Webinar: Big Data Search, Bigger Insights

Simplified Management

Cloudera Manager
• Install, configure, deploy Solr services on the cluster
• Unified management and monitoring
• Resource management

Page 21: Cloudera Search Webinar: Big Data Search, Bigger Insights

Using Cloudera Search

Page 22: Cloudera Search Webinar: Big Data Search, Bigger Insights

Skybox

Scalable, efficient image search for analysis and process improvement

• Advanced parallel image processing on images stored in HDFS
• Before: difficult to interactively evaluate image quality and correlate with satellite logs
• Now: Index images and satellite logs at acquisition and on demand, interactively introspect image quality

Page 23: Cloudera Search Webinar: Big Data Search, Bigger Insights

Explorys Medical

Event, exploration, and data correlation to meet SLAs

"Hadoop has been Explorys' center of gravity for data management since the company's inception. The addition of Search to Cloudera's platform expands its usability by supporting more workloads and reducing data movement between infrastructure systems. Deploying Cloudera Search supports Explorys' mission to help healthcare providers deliver better, more cost efficient care through fast, flexible data analysis."

-- Michael Onders, SVP & CTO, Explorys

Page 24: Cloudera Search Webinar: Big Data Search, Bigger Insights

Patterns and Predictions

Proactive healthcare for returning military veterans

• Identify patterns in social media and perform analytics on term usage to improve suicide predictive capability
• Before: Social media data sets too large; traditional enterprise search
• Now: Near real-time correlation of medical records, notes, social media; access for doctors and non-tech staff

Page 25: Cloudera Search Webinar: Big Data Search, Bigger Insights

Questions

• Ask on the Q&A tab
• Recording will be available at cloudera.com
• After webinar, inquire at: [email protected]
• Presenters contact info: [email protected] [email protected]

Thank you for attending!

Download Cloudera Search
cloudera.com/downloads

Learn more about Cloudera Search, powered by Solr
cloudera.com/search

Learn more about NGDATA and Lily
www.ngdata.com
