1 © Copyright 2013 EMC Corporation. All rights reserved.
Gain Insights From Unstructured Data Using Pivotal HD
Traditional Enterprise Analytics Process
The Fundamental Paradigm Shift
Internet age and exploding data growth
Enterprises leverage new data sources to identify emerging trends and opportunities
Traditional database tools not able to cope
Hadoop: Platform for Big Data
Flexible
Scalable
Inexpensive
Fault-tolerant
Rapidly Adopted
Gain Insights from Unstructured Data
The Analytics Process with Hadoop
Economics Have Changed the Game
[Chart: Big Data Platform Price/TB, 2008 to 2013, price axis from $0 to $80,000; series: Big Data DB, Hadoop]
Big Data RDBMS pricing will ultimately converge with Hadoop pricing
Our Big Bets With Hadoop
1. HDFS becomes the data substrate for the next generation of data infrastructures
2. A set of integrated, enterprise-scale services will evolve on top of HDFS
3. Provisioning flexibility and elasticity become critical capabilities for this data infrastructure
Pivotal and Hadoop
Analytical Query Operational Intelligence
In-Memory DB
Run-Time Applications
In-Memory Objects
Enterprise Data Warehouse
RDBMS
Continues to serve as system of record
HDFS
Data Staging Platform
Data Mgmt. Services
Data Visualization
Compliance and financial reporting
Traditional BI/Reporting
Pivotal Data Fabric
Data Visualization
Stream Ingestion
Streaming Services
Flexible Deployment Model
Deploy to: Public Cloud, On Premise, Private Cloud
Pivotal HD: The World’s Most Powerful Hadoop Distribution
What Is Pivotal HD?
World’s first true SQL processing for enterprise-ready Hadoop
100% Apache Hadoop-based platform
Virtualization and cloud ready with VMware and Isilon
Available as a software-only or appliance-based solution
Pivotal Hadoop Distributions
100% Open Source Compatible
Current Release: Apache Hadoop 1.x
Upcoming Release: Apache Hadoop 2.x
Pivotal HD Architecture: Apache
HDFS
HBase
Pig, Hive, Mahout
MapReduce
Sqoop, Flume
Resource Management & Workflow
YARN
ZooKeeper
Apache
Pivotal HD Architecture: Enterprise
HDFS
HBase
Pig, Hive, Mahout
MapReduce
Sqoop, Flume
Resource Management & Workflow
YARN
ZooKeeper
Command Center
Hadoop Virtualization (HVE)
Data Loader
Pivotal HD Enterprise
Apache Pivotal HD Enterprise
Data Loader Architecture
Cloud Infrastructure Platform
Streams
Push
Pull
Connectors
Flume
HDFS
Data Loader
Data Source Registration
Copy Strategy
Optimization
Web GUI and CLI
Data Destination Registration
Data Copy
Job Management
Data Processing
REST APIs
Files
HDFS
NFS
HTTP
FTP
Local
Cluster Management With Command Center
Configure
Monitor
Manage
Analyze
Deploy
Pivotal HD Architecture: HAWQ
HDFS
HBase
Pig, Hive, Mahout
MapReduce
Sqoop, Flume
Resource Management & Workflow
YARN
ZooKeeper
Command Center
Data Loader
Pivotal HD Enterprise
Apache Pivotal HD Enterprise HAWQ
Xtension Framework
Catalog Services
Query Optimizer
Dynamic Pipelining
ANSI SQL + Analytics
HAWQ: Advanced Database Services
Hadoop Virtualization (HVE)
HAWQ: A True SQL Engine for Hadoop
Scale and Performance
Fault Tolerance
Transaction Support
Data Management and Analysis
Leveraging Greenplum DB On Top of Hadoop
HAWQ
Query Engine
Catalog Service
HDFS
Resource Management
GPXF
Planner
Optimizer
Executor
Transaction Manager
GPXF: Xtension Framework
Enable custom connector development for other data sources
HDFS, HBase, Hive
Xtension Framework
How HAWQ Works: Submit Query
Clients: JDBC/ODBC, SQL Console
HDFS Namenode, HAWQ Master Host: Query Parser, Query Optimizer
HDFS Datanode, HAWQ Segment Host (one per node): Query Executor
SELECT beer, price FROM Bars b, Sells s WHERE b.name = s.bar AND b.city = 'San Francisco'
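To make the example query on this slide concrete, here is a minimal sketch that runs the same statement against a toy in-memory SQLite database. It is not HAWQ. The table layouts (Bars with name and city, Sells with bar, beer and price) are inferred from the query text, and the rows and the run_query helper are invented for illustration. An ORDER BY is added only so the output order is deterministic.

```python
import sqlite3

# Hypothetical sample rows; the deck does not show any table contents.
BARS = [("Bar A", "San Francisco"), ("Bar B", "San Francisco"), ("Bar C", "Chicago")]
SELLS = [("Bar A", "Stout", 6.0), ("Bar B", "Lager", 5.0), ("Bar C", "Pilsner", 3.0)]


def run_query():
    """Run the slide's query on a throwaway SQLite database and return the rows."""
    conn = sqlite3.connect(":memory:")
    # Column names are assumptions inferred from the query on the slide.
    conn.execute("CREATE TABLE Bars (name TEXT, city TEXT)")
    conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
    conn.executemany("INSERT INTO Bars VALUES (?, ?)", BARS)
    conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", SELLS)
    rows = conn.execute(
        "SELECT beer, price FROM Bars b, Sells s "
        "WHERE b.name = s.bar AND b.city = 'San Francisco' "
        "ORDER BY beer"  # added for a stable result order
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_query())
```

Only the beers sold by bars located in San Francisco come back; the Chicago bar's row is filtered out by the city predicate.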
How HAWQ Works: Optimizer
Query Optimizer: Cost Model, Resources, Parse Tree, Metadata
HAWQ Query Plan
Scan Bars b
Filter b.city = 'San Francisco'
Motion Redist(b.name)
Scan Sells s
HashJoin b.name = s.bar
Project s.beer, s.price
Motion Gather
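The plan operators above can be read as a small data flow. The sketch below imitates that flow in plain Python under several assumptions that are not stated in the deck: three segments, Bars rows initially spread round-robin, Sells rows already placed by a hash of the bar name (which would explain why only Bars is redistributed), and a toy hash function. All names (run_plan, segment_for, scatter) and the sample rows are hypothetical; this is an illustration of the operator sequence, not HAWQ's executor.

```python
from collections import defaultdict

NUM_SEGMENTS = 3  # assumption: the slide shows three segment hosts

# Hypothetical sample rows: Bars(name, city), Sells(bar, beer, price).
BARS = [("Bar A", "San Francisco"), ("Bar B", "San Francisco"), ("Bar C", "Chicago")]
SELLS = [("Bar A", "Stout", 6.0), ("Bar B", "Lager", 5.0), ("Bar C", "Pilsner", 3.0)]


def segment_for(key, n):
    """Toy deterministic hash that maps a join key to a segment id."""
    return sum(key.encode("utf-8")) % n


def scatter(rows, n):
    """Spread rows round-robin, standing in for an arbitrary initial placement."""
    segments = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        segments[i % n].append(row)
    return segments


def run_plan(bars, sells, city, n=NUM_SEGMENTS):
    bars_by_segment = scatter(bars, n)
    # Assumption: Sells is already distributed by hash of its bar column.
    sells_by_segment = [[] for _ in range(n)]
    for row in sells:
        sells_by_segment[segment_for(row[0], n)].append(row)

    # Scan Bars + Filter b.city, done locally on every segment.
    filtered = [[b for b in seg if b[1] == city] for seg in bars_by_segment]

    # Motion Redist(b.name): ship each surviving Bars row to the segment
    # that owns its join key, so matching rows meet on the same segment.
    redistributed = [[] for _ in range(n)]
    for seg in filtered:
        for b in seg:
            redistributed[segment_for(b[0], n)].append(b)

    # Scan Sells + HashJoin b.name = s.bar + Project beer, price, per segment.
    partial_results = []
    for seg_id in range(n):
        build = defaultdict(list)
        for b in redistributed[seg_id]:
            build[b[0]].append(b)
        out = []
        for bar, beer, price in sells_by_segment[seg_id]:
            for _ in build.get(bar, []):
                out.append((beer, price))
        partial_results.append(out)

    # Motion Gather: collect every segment's output in one place.
    gathered = []
    for out in partial_results:
        gathered.extend(out)
    return gathered


if __name__ == "__main__":
    print(sorted(run_plan(BARS, SELLS, "San Francisco")))
```

Filtering before the redistribution step means only rows that can still match are moved between segments, which is the usual reason a filter sits below a motion operator in a plan like this.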
Query Plan Sent To HAWQ Segments
The same plan is shown at each HAWQ segment:
Scan Bars b
Filter b.city = 'San Francisco'
Motion Redist(b.name)
Scan Sells s
HashJoin b.name = s.bar
Project s.beer, s.price
Motion Gather
HAWQ Leverages Dynamic Pipelining
Dynamic Pipelining™
Aggregate Data: Sent To The Master & Client
HAWQ Deployment Model
Master Servers & Name Nodes: Query planning & dispatch
Segment Servers & Data Nodes: Query processing & data storage
External Sources: Loading, streaming, etc.
Dynamic Pipelining
HDFS
ODBC/JDBC Driver
HAWQ Benchmarks
User intelligence: 4.2 vs. 198 (47X)
Sales analysis: 8.7 vs. 161 (19X)
Click analysis: 2.0 vs. 415 (208X)
Data exploration: 2.7 vs. 1,285 (476X)
BI drill down: 2.8 vs. 1,815 (648X)
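The multipliers on this slide are consistent with dividing the second figure by the first and rounding to the nearest whole number. The short check below reproduces them. The text does not state units or which systems the two columns refer to, so the code assumes only that both figures use the same unit and that the first is the faster one; BENCHMARKS and speedups are names made up for this sketch.

```python
import math

# Figures copied from the slide as (first column, second column).
# Assumption: both columns use the same unit, which the slide text omits.
BENCHMARKS = {
    "User intelligence": (4.2, 198),
    "Sales analysis": (8.7, 161),
    "Click analysis": (2.0, 415),
    "Data exploration": (2.7, 1285),
    "BI drill down": (2.8, 1815),
}


def speedups(benchmarks=BENCHMARKS):
    """Return the ratio second/first for each workload, rounded half up."""
    return {
        name: math.floor(second / first + 0.5)
        for name, (first, second) in benchmarks.items()
    }


if __name__ == "__main__":
    for name, factor in speedups().items():
        print(f"{name}: {factor}X")
```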
HAWQ: The Foundation of Big Data
Analytical Query Operational Intelligence
In-Memory DB
Run-Time Applications
In-Memory Objects
HDFS
Data Staging Platform
Data Mgmt. Services
Pivotal Data Fabric
Stream Ingestion
Streaming Services