Cloudera Search brings full-text, interactive search and scalable indexing to data in HDFS and Apache HBase. Powered by and adding to Apache Solr, Cloudera Search fully integrates with CDH to bring scale and reliability for next-generation open source search -- Big Data search.
Cloudera Search: Embracing Apache Solr into Cloudera’s Platform for Big Data
Eva Andreasson, Sr. Product Manager, Cloudera
Steven Noels, Co-founder and SVP of Products, NGDATA
Who is Cloudera?
What the Enterprise Requires
§ Only 100% open source Hadoop-based platform with both batch and real-time processing engines, enterprise-ready with native high availability
§ Suite of system and data management software
§ Comprehensive support and consulting services
§ Broadest Hadoop training and certification programs
Extensive Partner Ecosystem
§ Over 600 partners across hardware, software and services
The Leader in Big Data
Management
§ Deliver a revolutionary data management platform powered by Apache Hadoop
§ World’s leading commercial vendor of Apache Hadoop
§ Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
§ More production deployments than all other vendors combined
INGEST STORE EXPLORE PROCESS ANALYZE SERVE
CDH CLOUDERA MANAGER
CLOUDERA SUPPORT
Cloudera Enterprise
BRINGS STORAGE & COMPUTE TOGETHER
WORKS WITH EVERY TYPE OF DATA
CHANGES THE ECONOMICS OF DATA MANAGEMENT
A revolutionary solution powered by Apache Hadoop
CLOUDERA NAVIGATOR
About NGDATA
NGDATA is the next generation Customer Intelligence company that enables actionable customer insights, personalized product offers and intimate customer experience with a unique combination of interactive Big Data management and machine learning technologies in one integrated solution.
Business Expertise
Enterprise Architectures
Big Data Technology
Machine Learning,
Algorithms, Analytics
Customer Intelligence
VISION & EXPERTISE SOLUTION
Customer Database
Enterprise Data
Reference Data
Customer Data
Customer Engagement
Governance and Risk
Management
Insights, Trends and Analysis
lily
A Next Generation Customer Intelligence Company
Agenda
§ Why Search?
§ What is Cloudera Search?
§ Using Cloudera Search
§ Learn more
Why Search?
Cloudera’s Enterprise Strategy
An Integrated Part of the Hadoop System
One pool of data
One security framework
One set of system resources
One management interface
Search Simplifies Interaction
Explore
Navigate
Correlate
Experts know MapReduce. Savvy people know SQL.
Everyone knows Search.
Benefits of Search
Improved Big Data ROI
• An interactive experience without technical knowledge
• Single data set for multiple computing frameworks
Faster time to insight
• Exploratory analysis, esp. unstructured data
• Broad range of indexing options to accommodate needs
Cost efficiency
• Single scalable platform; no incremental investment
• No need for separate systems, storage
Solid foundations and reliability
• Solr in production environments for years
• Hadoop-powered reliability and scalability
What is Cloudera Search?
Cloudera Search
Interactive search for Hadoop
• Full-text and faceted navigation
• Batch, near real-time, and on-demand indexing
Apache Solr integrated with CDH
• Established, mature search with vibrant community
• Separate runtime like MapReduce, Impala
• Incorporated as part of the Hadoop ecosystem
Open Source
• 100% Apache, 100% Solr
• Standard Solr APIs
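Because the slide promises standard Solr APIs, a full-text query with faceted navigation can be sketched as an ordinary request to Solr's /select handler. The following is a minimal Python sketch, not taken from the deck: the host, the collection name "logs", and the field names are hypothetical, and the code only builds the URL without contacting a server.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, facet_fields=(), rows=10):
    """Build a Solr /select URL for a full-text query with optional field facets."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        # facet=true switches faceting on; facet.field is repeated once per field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr", "logs", "message:timeout", ["host", "severity"]
)
print(url)
```

The resulting URL can be passed to any HTTP client; each repeated facet.field parameter asks Solr for value counts on one field, which is what drives a faceted drill-down UI.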
Scalable and Robust Index Storage
HDFS
Lucene
Extraction Mapping
Solr
Zookeeper
SolrCloud
Querying API Indexing API
Solr and HDFS
• Scalable, cost-efficient index storage
• Higher availability
• Search and process data in one platform
Near Real Time Indexing at Ingest
Log File
Solr and Flume
• Data ingest at scale
• Flexible extraction and mapping
• Indexing at data ingest
• Document-level ACL
HDFS
Flume Agent
Indexer
Other Log File
Flume Agent
Indexer
Streamlined Extraction and Mapping
Cloudera Morphlines
• Simple and flexible data transformation
• Reusable across multiple index workloads
• Over time, extend and re-use across platform workloads
syslog Flume Agent
Solr sink
Command: readLine
Command: grok
Command: loadSolr
Solr
Event
Record
Record
Record
Document
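The readLine, grok, loadSolr chain in the diagram above can be pictured as a pipeline of small commands, each turning records into records. The sketch below is a conceptual Python analogue only: real morphlines are defined in a configuration file and run inside the JVM, and the regular expression, function names, and in-memory "sink" list here are assumptions made for illustration.

```python
import re

# Simplified syslog pattern (an assumption); real grok dictionaries are richer.
SYSLOG = re.compile(
    r"^<(?P<priority>\d+)>"
    r"(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<message>.*)$"
)


def read_line(event_body):
    """Like readLine: emit one record per non-empty line of the event body."""
    for line in event_body.splitlines():
        if line.strip():
            yield {"message": line}


def grok(records):
    """Like grok: extract named fields; records that do not match are dropped."""
    for record in records:
        match = SYSLOG.match(record["message"])
        if match:
            yield match.groupdict()


def load_solr(records, sink):
    """Stand-in for loadSolr: append documents to a list instead of calling Solr."""
    for record in records:
        sink.append(record)
    return sink


event = (
    "<164>Feb  4 10:46:14 syslog sshd[607]: listening on 0.0.0.0 port 22.\n"
    "not a syslog line"
)
docs = load_solr(grok(read_line(event)), [])
print(docs)
```

Chaining generators keeps each command independent, which mirrors the slide's point that the same transformation steps can be reused across indexing workloads.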
Scalable Batch Indexing
Index shard
Files
Index shard
Indexer
Files
Solr server
Indexer
Solr server
HDFS
Solr and MapReduce
• Flexible, scalable batch indexing
• Start serving new indices with no downtime
• On-demand indexing, cost-efficient re-indexing
Scalable Batch Indexing
Mapper: Parse input into
indexable document
Mapper: Parse input into
indexable document
Mapper: Parse input into
indexable document
Index shard 1
Index shard 2
Arbitrary reducing steps of indexing and merging
End-Reducer (shard 1): Index document
End-Reducer (shard 2): Index document
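The mapper and end-reducer flow above can be sketched in a few lines. This is a conceptual, single-process Python illustration under stated assumptions, not the actual MapReduce job: the file paths, the whitespace tokenizer, the CRC32 partitioner, and the dictionary-based inverted index are all hypothetical stand-ins for the real Lucene index shards.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 2  # matches the two index shards in the diagram


def map_parse(path, text):
    """Mapper: parse one input file into an indexable document."""
    return {"id": path, "terms": sorted(set(text.lower().split()))}


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Partitioner: a stable hash, so a document always lands on the same shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def reduce_index(docs):
    """End-reducer: build one shard's inverted index (term -> document ids)."""
    index = defaultdict(list)
    for doc in docs:
        for term in doc["terms"]:
            index[term].append(doc["id"])
    return dict(index)


def build_shards(files):
    """Run the map, partition, and reduce phases in-process."""
    partitions = defaultdict(list)
    for path, text in files.items():
        doc = map_parse(path, text)
        partitions[shard_for(doc["id"])].append(doc)
    return {shard: reduce_index(partitions[shard]) for shard in range(NUM_SHARDS)}


# Hypothetical input files, for illustration only.
files = {"/data/a.txt": "Hadoop search", "/data/b.txt": "Solr search on Hadoop"}
shards = build_shards(files)
hits = sorted(doc_id for index in shards.values() for doc_id in index.get("search", []))
print(hits)
```

A query has to consult every shard and merge the results, which is why the final lines collect hits for "search" across all shards.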
Searchable Real-Time Data Indexing HBase
HDFS
HBase
interactive load
Indexer(s)
Triggers on updates
Solr server
Solr server Solr server Solr server Solr server
Search
+ = planet-sized tabular data, immediate access & updates, fast & flexible information discovery
BIG DATA
DATA MANAGEMENT
Searchable Real-Time Data HBase & Search
HBase SEP Triggers & Indexer
• HBase replication mechanism for reliable indexing
• light-weight, zero impact on write performance
• easy to set up & integrate
• flexible, configuration-based mapping & content extraction
Many use cases
• indexes near-real-time HBase updates into Solr
• fielded search on HBase columns
• faceted search
• query by example
• datacube
• secondary indexes
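The configuration-based mapping from HBase updates to Solr documents can be pictured as a lookup table applied to each row mutation. The Python sketch below is an assumption-laden analogue rather than the indexer's real configuration format: the event dictionaries, the column names, and the Solr field names are hypothetical, and only the shape of the output follows Solr's JSON update commands ("add" and "delete").

```python
# Hypothetical mapping from HBase "family:qualifier" columns to Solr field names.
FIELD_MAPPING = {"info:name": "name_s", "info:email": "email_s"}


def to_solr_update(event, mapping=FIELD_MAPPING):
    """Turn one row-mutation event into a Solr-style update, or None if nothing maps."""
    if event["type"] == "delete":
        return {"delete": {"id": event["row"]}}
    # Keep only the cells whose column appears in the mapping.
    fields = {
        mapping[column]: value
        for column, value in event["cells"].items()
        if column in mapping
    }
    if not fields:
        return None
    return {"add": {"doc": {"id": event["row"], **fields}}}


# Hypothetical mutation events, as an indexer might receive them from replication.
put = {"type": "put", "row": "user42", "cells": {"info:name": "Ada", "stats:logins": "7"}}
delete = {"type": "delete", "row": "user42", "cells": {}}
print(to_solr_update(put))
print(to_solr_update(delete))
```

Because the mapping is plain data, changing which columns become searchable means editing the table, not the indexing code.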
Simple, Customizable Search Interface
Hue
• Simple UI
• Navigated, faceted drill down
• Customizable display
• Full text search, standard Solr API and query language
Simplified Management
Cloudera Manager
• Install, configure, deploy Solr services on the cluster
• Unified management and monitoring
• Resource management
Using Cloudera Search
Skybox
• Advanced parallel image processing on images stored in HDFS
• Before: difficult to interactively evaluate image quality and correlate with satellite logs
• Now: Index images and satellite logs at acquisition and on demand, interactively introspect image quality
Scalable, efficient image search for analysis and process improvement
Explorys Medical
"Hadoop has been Explorys' center of gravity for data management since the company's inception. The addition of Search to Cloudera's platform expands its usability by supporting more workloads and reducing data movement between infrastructure systems. Deploying Cloudera Search supports Explorys' mission to help healthcare providers deliver better, more cost efficient care through fast, flexible data analysis."
-- Michael Onders, SVP & CTO, Explorys
Event, exploration, and data correlation to meet SLAs
Patterns and Predictions
• Identify patterns in social media and perform analytics on term usage to improve suicide predictive capability
• Before: Social media data sets too large; traditional enterprise search
• Now: Near real-time correlation of medical records, notes, social media; access for doctors and non-tech staff
Proactive healthcare for returning military veterans
Questions
• Ask on the Q&A tab
• Recording will be available at cloudera.com
• After webinar, inquire at: [email protected]
• Presenters contact info: [email protected] [email protected]
Thank you for attending!
Download Cloudera Search cloudera.com/downloads
Learn more about Cloudera Search, powered by Solr
cloudera.com/search
Learn more about NGDATA and Lily
www.ngdata.com