theinevitablecloud
For more information: https://www.facebook.com/TheInevitableCloud Linkedin: The Inevitable Cloud Community
Who is Cloudera?
What the Enterprise Requires
The market-leading Hadoop-based platform with batch and real-time processing frameworks
A comprehensive suite of system and data management software
Training and certification programs
Comprehensive support and consulting services
Extensive Partner Ecosystem
Over 400 partners across hardware, software and services
The Leader in Big Data Management
Deliver a revolutionary data management platform based on Apache Hadoop
Enable organizations to improve operational efficiency and Ask Bigger Questions of all their data
Customers & Users Across Industries
More production deployments than all other vendors combined
©2013 Cloudera, Inc. All Rights Reserved.
Data Has Changed in the Last 30 Years
DATA GROWTH
END-USER APPLICATIONS
THE INTERNET
MOBILE DEVICES
SOPHISTICATED MACHINES
STRUCTURED DATA – 10%
1980 2012
UNSTRUCTURED DATA – 90%
What if you wanted to…
Data
Question
Speed
Usage
Type/Form
So what is Apache Hadoop?
Self-Healing, High-Bandwidth Clustered Storage
Byte Streams
Fault-Tolerant Distributed Processing
Schema-on-Read
HDFS storage distribution
Input File: blocks 1 2 3 4 5
Node A: blocks 2 4 5
Node B: blocks 1 2 5
Node C: blocks 1 3 4
Node D: blocks 2 3 5
Node E: blocks 1 3 4
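The storage figure shows each block of the input file held on three different nodes. A minimal sketch of that idea follows. The round-robin placement, the function names, and the node labels are assumptions made for illustration; real HDFS placement is rack-aware and does not follow this rule. The replication factor of 3 matches the HDFS default.

```python
from collections import defaultdict

def place_blocks(num_blocks, nodes, replication=3):
    """Assign every block to `replication` distinct nodes (round-robin, illustrative only)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = defaultdict(list)
    for block in range(1, num_blocks + 1):
        for r in range(replication):
            # Consecutive offsets modulo the cluster size guarantee distinct nodes.
            node = nodes[(block - 1 + r) % len(nodes)]
            placement[node].append(block)
    return dict(placement)

def surviving_blocks(placement, failed_node):
    """Blocks that are still readable after one node is lost."""
    return {
        block
        for node, blocks in placement.items()
        if node != failed_node
        for block in blocks
    }

nodes = ["Node A", "Node B", "Node C", "Node D", "Node E"]
placement = place_blocks(5, nodes)

for node in nodes:
    print(node, placement[node])

# Losing any single node leaves every block available on two other nodes.
print(sorted(surviving_blocks(placement, "Node C")))
```

Because every block lives on three nodes, the loss of one node removes only one copy of each block it held, which is what lets the cluster re-replicate and "self-heal".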
MapReduce compute distribution
Node A: blocks 2 4 5
Node B: blocks 1 2 5
Node C: blocks 1 3 4
Node D: blocks 2 3 5
Node E: blocks 1 3 4
Output File: blocks 1 2 3 4 5
Storage
Compute
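The compute figure pairs processing with the blocks already stored on each node. The map, shuffle, and reduce steps behind that can be sketched in plain Python as a word count. This is a single-process illustration, not Hadoop's API: the function names and the sample blocks are hypothetical, and in a real cluster each map call would run on a node holding that block.

```python
from collections import defaultdict

def map_phase(block):
    """Emit (word, 1) for every word in one block of input."""
    for word in block.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(blocks):
    # Each block is mapped independently, so this loop could run in parallel.
    mapped = [pair for block in blocks for pair in map_phase(block)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

blocks = [
    "big data big questions",
    "ask bigger questions",
    "data has changed",
]
counts = run_job(blocks)
print(counts)
```

Since the map step sees one block at a time, adding nodes and blocks changes where the work runs but not the code of `map_phase` or `reduce_phase`.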
Next-Gen Data Management
The Key Benefit: Agility/Flexibility
Schema-on-Write (RDBMS):
• Prescriptive Data Modeling:
• Create static DB schema
• Transform data into RDBMS
• Query data in RDBMS format
• New columns must be added explicitly before new data can propagate into the system.
• Good for Known Unknowns (Repetition)
Schema-on-Read (Hadoop):
• Descriptive Data Modeling:
• Copy data in its native format
• Create schema + parser
• Query data in its native format (does ETL on the fly)
• New data can start flowing any time and will appear retroactively once the schema/parser properly describes it.
• Good for Unknown Unknowns (Exploration)
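The schema-on-read bullets can be illustrated with a short sketch: raw lines are stored untouched, and a parser is applied only at query time. The log format, field names, and the two parser versions below are hypothetical and exist only to show how a field "appears retroactively" once the parser describes it.

```python
# Raw data is kept in its native format; nothing is transformed at load time.
RAW_LOG = [
    "2013-01-05 alice login ip=10.0.0.1",
    "2013-01-05 bob purchase ip=10.0.0.2 amount=19.99",
    "2013-01-06 alice purchase ip=10.0.0.1 amount=5.00",
]

def parse_v1(line):
    """First schema: only date, user and action are described."""
    date, user, action, *_ = line.split()
    return {"date": date, "user": user, "action": action}

def parse_v2(line):
    """Later schema: additionally exposes the optional amount field."""
    date, user, action, *extras = line.split()
    fields = dict(item.split("=", 1) for item in extras)
    amount = float(fields["amount"]) if "amount" in fields else None
    return {"date": date, "user": user, "action": action, "amount": amount}

def query(raw_lines, parser, predicate):
    """Parse on the fly and filter; the stored lines are never rewritten."""
    return [record for record in map(parser, raw_lines) if predicate(record)]

# With the first parser, purchases are visible but their amounts are not.
purchases_v1 = query(RAW_LOG, parse_v1, lambda r: r["action"] == "purchase")

# Swapping in the second parser exposes amounts for data that was already stored.
with_amount = query(RAW_LOG, parse_v2, lambda r: r["amount"] is not None)
total = sum(r["amount"] for r in with_amount)
print(len(purchases_v1), round(total, 2))
```

In a schema-on-write system the equivalent change would need a new column and a reload of the older rows before the amounts could be queried.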
Scalable Technology + Scalable Development
Grows without requiring developers to re-architect their algorithms/application
AUTO SCALE
Economics: Return on Byte
High ROB
Low ROB (but still a ton of aggregate value)
CDH: Cloudera Distribution incl. Apache Hadoop
Build/Test: APACHE BIGTOP
Cloud Deployment: APACHE WHIRR
Coordination: APACHE ZOOKEEPER
Data Integration: APACHE FLUME, APACHE SQOOP
Fast Read/Write Access: APACHE HBASE
Batch Processing Languages: APACHE PIG, APACHE HIVE
Web Console: HUE
Job Workflow: APACHE OOZIE
Metadata: APACHE HIVE MetaStore
Interactive SQL: Impala
Data Mining Lib: APACHE MAHOUT
Data Processing Lib: DataFu for Pig
Connectivity: ODBC/JDBC/FUSE/HTTPS
Hadoop Core Kernel: MapReduce, HDFS
Cloudera Manager Free Edition (Installation Wizard)
Cloudera Enterprise
The Cloudera Solution Stack
CLOUDERA UNIVERSITY
DEVELOPER TRAINING
ADMINISTRATOR TRAINING
DATA SCIENCE TRAINING
CERTIFICATION PROGRAMS
PROFESSIONAL SERVICES
USE CASE DISCOVERY
NEW HADOOP DEPLOYMENT
PROOF-OF-CONCEPT
DEPLOYMENT CERTIFICATION
PROCESS & TEAM DEVELOPMENT
PRODUCTION PILOTS
MANAGEMENT SOFTWARE & TECHNICAL SUPPORT (SUBSCRIPTION)
CDH
INGEST STORE EXPLORE PROCESS ANALYZE SERVE
CM CLOUDERA MANAGER
CS CLOUDERA SUPPORT
OSS APACHE HADOOP & OPEN SOURCE SOFTWARE
Powered by Cloudera Impala
BEFORE IMPALA
• With Impala: interactive ANSI-92 SQL queries; native distributed query engine; optimized for low latency
• Provides: answers as fast as you can ask; everyone can ask questions of all data; big data storage and analytics together
WITH IMPALA
• Unified storage: supports HDFS and HBase; flexible file formats and schemas
• Unified Metastore
• Unified Security
• Unified Client Interfaces: ODBC/JDBC; SQL syntax; Hue Beeswax Web UI
BATCH PROCESSING
USER INTERFACE
REAL-TIME ACCESS
Cloudera in the Enterprise Stack
Use Case: A Major Financial Institution
The Challenge:
• Current EDW at capacity; cannot support growing data depth and width
• Performance issues in business-critical apps; little room for innovation
New solution saves tens of millions by optimizing existing EDW for analytics & reducing data storage costs by 99%
The Solution:
• Cloudera Enterprise offloads data storage (S), processing (T) & some analytics (Q) from the EDW.
• EDW resources can now be focused on repeatable operational analytics.
• A month of data scanned in 4 secs vs. 4 hours
DATA WAREHOUSE: Operational (44%), ELT Processing (42%), Analytics (11%)
CLOUDERA: Analytics, Processing, Storage
DATA WAREHOUSE: Operational (50%), Analytics (50%)
Beyond Data Warehousing
COMMUNICATIONS: Location-based advertising
HEALTH CARE: Patient sensors, monitoring, EHRs; quality of care
LAW ENFORCEMENT & DEFENSE: Threat analysis, social media monitoring, photo analysis
EDUCATION & RESEARCH: Experiment sensor analysis
FINANCIAL SERVICES: Risk & portfolio analysis; new products
ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; website optimization
UTILITIES: Smart meter analysis for network capacity
CONSUMER PACKAGED GOODS: Sentiment analysis of what’s hot, customer service
MEDIA / ENTERTAINMENT: Viewers / advertising effectiveness
TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; customer sentiment
LIFE SCIENCES: Clinical trials; genomics
RETAIL: Consumer sentiment; optimized marketing
AUTOMOTIVE: Auto sensors reporting location, problems
HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; warranty analysis
OIL & GAS: Drilling exploration sensor analysis
The Road Ahead
2006-2012: Bringing Compute to Data
2013-???: Bringing Applications to Data
The Cloudera Platform for Big Data
Flexibility
• Store any data
• Run any analysis
• Keeps pace with the rate of change of incoming data
Scalability
• Proven growth to PBs/1,000s of nodes
• No need to rewrite queries; scales automatically
• Keeps pace with the rate of growth of incoming data
Economics
• Cost per TB at a fraction of other options
• Keep all of your data alive in an active archive
• Powering the "data beats algorithm" movement
Dr. Amr Awadallah CTO/Founder @awadallah [email protected]