Apache Ambari BOF Meet Up @ Hadoop Summit 2013 OpenStack http://www.meetup.com/Apache-Ambari-User-Group/events/119184782/
© Hortonworks Inc. 2013
Hadoop + OpenStack integration Roadmap
Himanshu Bari, Sr. Product Manager
June 28th, 2013
Disclaimer
• This document may contain product features and technology directions that are under development or may be under development in the future.
• Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery.
• This document's description of these features and technology directions does not represent a contractual commitment from Hortonworks to deliver these features in any generally available product.
• Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Agenda
Why Hadoop on OpenStack
Use cases
A bit under the hood
Big Data & Cloud Intersection Point: 2013
Big Data & Cloud are top priorities for CIOs
OpenStack is an open source cloud management platform (Apache License)
• Horizon: Dashboard
• Nova: Compute
• Quantum: Networking
• Cinder: Block Store
• Swift: Object Store
• Glance: Image Service
• Keystone: Identity Service
• Ceilometer: Metering
• Heat: Orchestration
Integrated
Multi-hypervisor & guest OS support
OpenStack has overtaken Amazon AWS in market awareness…
Source: Google Trends
Maturing quickly with broad support
• Pushed by 150+ vendors
• Millions of dollars in venture capital
• Early adoption across all verticals
Why Hadoop & OpenStack?
Marries two of the largest open source movements
Hadoop provides a greenfield use case
• Net new workload
• Needs scale out infrastructure
• Shared platform
OpenStack provides the perfect cloud platform
• Operational agility
• Supports scale out architecture
• Deployment choice across public & private clouds
1. Open source communities provide the fastest path to innovation
2. Open source is changing the game as economics and accessibility serve to accelerate cloud & big data market trends
3. Both are attracting major ecosystem players: IBM, RHT, HP, RAX, etc.
Accelerate Adoption of Hadoop on OpenStack
The leading contributor to Apache Hadoop
The leading system integrator for OpenStack
The leading contributor to OpenStack
Apache Hadoop… The killer app for OpenStack
Collaborating on Project Savanna
Project Savanna: Automate deployment of Apache Hadoop on OpenStack
[Diagram: OpenStack Infrastructure hosting the Savanna Elastic Hadoop Controller, Ambari Hadoop management, Swift storage and a Hadoop Cluster whose nodes can be added (+) or removed (-); callouts 1 to 3 are listed below]
1. Cluster templates: deploy preconfigured Hadoop clusters in seconds from Horizon or Ambari
2. HDFS-Swift connectors: move data between HDFS and Swift object storage
3. Simplified elasticity
Project Savanna design goals
• Focus on API-driven tight integration
• Hide Hadoop complexity through APIs: "It Just Works" experience
• Fully leverage virtualization: scalability, reliability, performance
Problems driving use cases
Cluster requirements vary by business unit, data type & analytics use case
• Business units: Finance, Compliance, IT, Marketing
• Data types: Web, Mobile, Sensor
• Analytics use cases: Interactive, Batch
• Environments: Dev, QA, Prod
Resulting problems
• Operational nightmare of supporting multiple cluster flavors
• Lack of agility
• Underutilized resources
• Maintenance complications
• Can't migrate from public to private cloud
Provisioning related use cases
- Frequent dev/test/staging cluster provision requests
- Migrations from staging to prod and vice versa
- Reduce operator error in cluster provisioning
- Migrate away from Amazon EMR for ad hoc analytics requests to support experimentation
Simplified provisioning
Phase-1: Template based provisioning
• Cluster template, e.g. QA cluster
• Node template
  a. Resource based - node.Large
  b. Function based - node.NameNode
• Use as is: single click provisioning
• Modify: update VM resource allocation, service to VM mapping and service config
• Provision and/or save template
Phase-2: Hadoop as a service (job flow based provisioning)
• Pick job type (+ Cascading, streaming & custom jar)
• Upload data to Swift
• Get results in Swift
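The cluster and node templates on this slide can be pictured as plain data. The sketch below is illustrative only: the class names, field names and VM sizes are assumptions made for this example and are not taken from Savanna or Ambari. It shows a resource based and a function based node template, a "QA cluster" template used as is, and the "Modify" step producing a new template while the saved one stays unchanged.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass
class NodeTemplate:
    """Hypothetical node template: VM resources plus the services mapped to it."""
    name: str
    vcpus: int
    memory_mb: int
    services: Tuple[str, ...] = ()


@dataclass
class ClusterTemplate:
    """Hypothetical cluster template: node templates and how many VMs of each."""
    name: str
    node_templates: Dict[str, NodeTemplate] = field(default_factory=dict)
    node_counts: Dict[str, int] = field(default_factory=dict)

    def modify(self, new_name: str, node_counts: Dict[str, int]) -> "ClusterTemplate":
        """Return a modified copy; the saved template is left untouched."""
        unknown = set(node_counts) - set(self.node_templates)
        if unknown:
            raise KeyError(f"unknown node templates: {sorted(unknown)}")
        merged = {**self.node_counts, **node_counts}
        return replace(self, name=new_name, node_counts=merged)

    def total_vms(self) -> int:
        return sum(self.node_counts.values())


# Function based template (named after the role it plays).
name_node = NodeTemplate("node.NameNode", vcpus=4, memory_mb=16384,
                         services=("NameNode", "JobTracker"))
# Resource based template (named after its size).
large = NodeTemplate("node.Large", vcpus=8, memory_mb=32768,
                     services=("DataNode", "TaskTracker"))

qa = ClusterTemplate(
    "QA cluster",
    node_templates={t.name: t for t in (name_node, large)},
    node_counts={"node.NameNode": 1, "node.Large": 4},
)

# "Modify" path: derive a larger cluster and keep the original for reuse.
bigger_qa = qa.modify("QA cluster (8 workers)", {"node.Large": 8})
```

Returning a copy from `modify` keeps "use as is" and "modify, then save" as separate paths, which is one way to read the two branches on the slide.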
Ambari embedded in Horizon
Swift object store support
Phase-1: Read/write data from/to Swift object stores
• Option-1: Copy data from Swift to HDFS, run MapReduce and copy results back to Swift
• Option-2: Run MapReduce directly on top of Swift (output data still needs to be copied from HDFS to Swift)
Phase-2: Bug fixes & optimizations
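Option-1 amounts to three commands run in sequence. The sketch below only builds those command lines and does not execute anything. It assumes `hadoop distcp` is used for the copies and that Swift paths take the `swift://<container>.<service>/<path>` form; the container, service, paths and job command are hypothetical values chosen for this example.

```python
from typing import List


def swift_uri(container: str, service: str, path: str) -> str:
    """Build a Swift URI (assumed form: swift://<container>.<service>/<path>)."""
    return f"swift://{container}.{service}/{path.lstrip('/')}"


def option1_commands(container: str, service: str, input_path: str,
                     hdfs_work_dir: str, job_cmd: List[str]) -> List[List[str]]:
    """Return the three Option-1 steps as argument lists, in execution order."""
    src = swift_uri(container, service, input_path)
    dst = swift_uri(container, service, "results")
    hdfs_in = f"{hdfs_work_dir}/input"
    hdfs_out = f"{hdfs_work_dir}/output"
    return [
        ["hadoop", "distcp", src, hdfs_in],   # 1. copy input from Swift to HDFS
        job_cmd + [hdfs_in, hdfs_out],        # 2. run the MapReduce job on HDFS
        ["hadoop", "distcp", hdfs_out, dst],  # 3. copy results back to Swift
    ]


# Hypothetical values, for illustration only.
commands = option1_commands(
    container="logs", service="mycloud", input_path="/2013/06",
    hdfs_work_dir="/tmp/job1", job_cmd=["hadoop", "jar", "job.jar"],
)
for cmd in commands:
    print(" ".join(cmd))
```

Under Option-2 the first copy would be dropped and the job would read its input from the Swift URI directly, leaving only the final copy of the output.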
Elasticity related use cases
- Commission a new node or decommission a node for maintenance
- For dev/test/staging clusters: automatically vary cluster data & compute capacity based on tenant, workload, time of day, resource utilization, etc.
- Automatically vary compute capacity for production clusters
Elasticity
[Matrix: node elasticity (compute and/or data), manual or rule based, against cluster life, long lived or short lived (Swift or HDFS used for storage), divided into Phase-1 and Phase-2]
• Handle variable workloads, e.g. alter cluster compute node count for peak/off-peak hours
• Job flow based clusters for ad-hoc analysis
• Best for Dev/QA use
• Best for predictable workloads
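Rule based elasticity for peak and off-peak hours can be sketched as a lookup from the hour of day to a target compute node count. This is a minimal illustration under assumed names and numbers; `ScalingRule`, the hours and the node counts are invented for the example and do not describe how Savanna evaluates rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingRule:
    """Hypothetical rule: keep `compute_nodes` nodes between two hours of the day."""
    start_hour: int      # inclusive, 0-23
    end_hour: int        # exclusive
    compute_nodes: int


def target_compute_nodes(hour: int, rules: List[ScalingRule], default: int) -> int:
    """Return the node count of the first matching rule, else the default."""
    for rule in rules:
        if rule.start_hour <= hour < rule.end_hour:
            return rule.compute_nodes
    return default


def scaling_action(current: int, target: int) -> str:
    """Describe what has to happen to move from the current to the target count."""
    delta = target - current
    if delta > 0:
        return f"commission {delta} node(s)"
    if delta < 0:
        return f"decommission {-delta} node(s)"
    return "no change"


# Assumed schedule: 12 compute nodes during business hours, 4 otherwise.
rules = [ScalingRule(start_hour=8, end_hour=20, compute_nodes=12)]
peak = target_compute_nodes(10, rules, default=4)
off_peak = target_compute_nodes(23, rules, default=4)
```

The same lookup could be keyed on tenant, workload or resource utilization instead of the hour, matching the criteria named in the elasticity use cases.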
Multi-tenancy related use cases
- Improve server utilization by creating a common server pool for Hadoop and non-Hadoop workloads
- Simplify maintenance & upgrade testing with the ability to run multiple Hadoop clusters with different versions on the same server pool
- Support varying SLAs based on tenant and workload through resource isolation provided by VMs
- Simplify chargeback/showback
Multi-tenancy
Phase-1
• Access isolation
  - Single sign-on for Ambari & HUE through Keystone integration
  - Dedicated Ambari & HUE instance per cluster per tenant
• Resource isolation
  - CPU, memory isolation through VMs
  - Ability to pin a Hadoop VM to a given set of physical hosts to enable per tenant physical host isolation
• Version isolation
  - Choice of Hadoop versions for tenants
Phase-2
• Access isolation
  - Single Ambari instance per tenant (multi-cluster support with Ambari)
  - Keystone enhancements to support Hadoop job flow level RBAC to support Hadoop as a service
Savanna logical architecture
[Architecture diagram; components shown:]
• Horizon + Savanna UI
• Savanna Controller (API): configuration, elasticity, orchestration, plugin manager
• HDP Savanna plugin (API): Hadoop provisioning, Ambari template management
• Hadoop Cluster: Ambari + API
• OpenStack Infrastructure: network, storage, security, compute
Provisioning workflow overview
1. Horizon sends a cluster request to the Savanna Controller + HDP OpenStack Plugin
2. Nova provisions vanilla VMs from a Glance VM image (OS only, or preloaded with HDP bits)
3. HDP Plugin installs Ambari
4. HDP plugin passes cluster template to Ambari
5. Ambari configures all services and starts the cluster
Hadoop Cluster: Ambari Server, HUE, NN, JT, DN, DN, …
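The workflow on this slide can be traced with a small stand-in that records each step in order. Nothing here calls OpenStack or Ambari: the function names, the template layout and the choice of the first VM as the Ambari host are assumptions made for the illustration.

```python
from typing import Dict, List


def provision_vms(template: Dict, image: str) -> List[str]:
    """Stand-in for the Nova/Glance step: return one hypothetical VM name per node."""
    return [
        f"{group}-{index}"
        for group, count in template["host_groups"].items()
        for index in range(count)
    ]


def run_workflow(template: Dict, image: str = "os-only") -> List[str]:
    """Walk the provisioning steps in order and return a log of what happened."""
    events = []
    vms = provision_vms(template, image)
    events.append(f"provisioned {len(vms)} vanilla VMs from image '{image}'")
    ambari_host = vms[0]  # assumption: the first VM hosts the Ambari server
    events.append(f"installed Ambari on {ambari_host}")
    events.append(f"passed template '{template['name']}' to Ambari")
    events.append("Ambari configured all services and started the cluster")
    return events


# Hypothetical cluster request: one master VM and three worker VMs.
request = {"name": "QA cluster", "host_groups": {"master": 1, "worker": 3}}
events = run_workflow(request)
for event in events:
    print(event)
```

With an image preloaded with HDP bits, only the `image` argument would change; the order of the steps stays the same.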
Ambari based cluster templates
Preconfigured information across all clusters using this template
• HDP Stack Information
  - Services & Components & Packages
  - Description
  - Package Dependencies
• Hadoop Topology: Component / Host Group Mapping
• Hadoop Configuration: All Hadoop configuration for the cluster (hundreds of parameters and their values)
Per cluster pluggable data
- User names
- Passwords
- Host names
- Host VM flavors (CPU/Mem)
- Node count per host group
- …
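The split between shared template content and per cluster pluggable data can be sketched as a merge with validation. The dictionary layout and key names below are assumptions for illustration and are not Ambari's actual template format.

```python
from copy import deepcopy
from typing import Dict

# Per cluster values named on the slide; the key spellings are assumptions.
REQUIRED_PER_CLUSTER = ("user_names", "passwords", "host_names",
                        "vm_flavors", "node_counts")


def instantiate(template: Dict, per_cluster: Dict) -> Dict:
    """Combine a shared template with the data supplied for one cluster."""
    missing = [key for key in REQUIRED_PER_CLUSTER if key not in per_cluster]
    if missing:
        raise ValueError(f"missing per cluster data: {missing}")
    unknown = set(per_cluster["node_counts"]) - set(template["host_groups"])
    if unknown:
        raise ValueError(f"unknown host groups: {sorted(unknown)}")
    cluster = deepcopy(template)          # the shared template is never mutated
    cluster["per_cluster"] = deepcopy(per_cluster)
    return cluster


# Shared part: stack, topology and configuration (values are placeholders).
template = {
    "stack": {"name": "HDP", "services": ["HDFS", "MAPREDUCE"]},
    "host_groups": {"master": ["NAMENODE", "JOBTRACKER"],
                    "worker": ["DATANODE", "TASKTRACKER"]},
    "configuration": {"dfs.replication": "3"},
}

# Pluggable part: supplied separately for each cluster.
qa_cluster = instantiate(template, {
    "user_names": ["hdfs"],
    "passwords": ["<secret>"],
    "host_names": ["qa-master-0", "qa-worker-0", "qa-worker-1"],
    "vm_flavors": {"master": "m1.large", "worker": "m1.medium"},
    "node_counts": {"master": 1, "worker": 2},
})
```

Because the shared part is deep-copied, the same template can back any number of clusters without one cluster's data leaking into another.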
Swift object store support (HADOOP-8545)
[Diagram: MapReduce, Pig & Hive read input data from and write output results to Swift stores 1…n through the HDFS + Swift bridge, with Keystone in front of the stores. A directory Dir holding file1, file2 and file3 is stored as the objects Dir/file1, Dir/file2 and Dir/file3 in containers across the Swift stores.]
Supported operations: create, read, write, delete, mkdir, ls, mv & stat
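The Dir/file1 naming in the diagram reflects that an object store has a flat namespace, so directory operations have to be emulated over object names. The toy class below illustrates that idea with an in-memory dictionary. It is a teaching sketch under that assumption, not the HADOOP-8545 implementation, and it covers only a few of the listed operations.

```python
from typing import Dict, List


class FlatObjectStore:
    """Toy container: object names are full paths, directories are just prefixes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._objects[path.strip("/")] = data

    def read(self, path: str) -> bytes:
        return self._objects[path.strip("/")]

    def delete(self, path: str) -> None:
        del self._objects[path.strip("/")]

    def ls(self, directory: str) -> List[str]:
        """List immediate children by scanning object names for the prefix."""
        prefix = directory.strip("/")
        prefix = prefix + "/" if prefix else ""
        children = set()
        for name in self._objects:
            if name.startswith(prefix):
                children.add(name[len(prefix):].split("/", 1)[0])
        return sorted(children)

    def mv(self, src: str, dst: str) -> None:
        """Rename is a re-insert under the new name; a real store must copy the data."""
        self._objects[dst.strip("/")] = self._objects.pop(src.strip("/"))


store = FlatObjectStore()
for name in ("file1", "file2", "file3"):
    store.write(f"Dir/{name}", b"data")

listing_before = store.ls("Dir")
store.mv("Dir/file3", "Dir/sub/file3")
listing_after = store.ls("Dir")
```

The `mv` method hints at why renames cost more on an object store than on HDFS: there is no directory entry to update, only objects to rewrite under new names.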
Hadoop virtualization extensions (HVE)
• Account for the additional 'node group' layer so replicas do not end up on VMs on the same hypervisor
• Available in HDP 1.3. Work in progress to enable in HDP 2.0 (YARN & HDFS)
Topology: Data Center > Rack > Node group > VM
• Rack-1: Node group-1 (VM1, VM2), Node group-2 (VM1, VM2)
• Rack-2: Node group-1 (VM1, VM2), Node group-2 (VM1, VM2)
Policies extended
- Replica (place, choose & remove) policies
- Balancer policies
- Task placement & container allocation (YARN)
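The effect of the node group layer on replica placement can be shown with a simplified greedy picker over the topology from this slide. It is not the HDFS placement policy, which also considers the writer's location and other factors; the VM names and the tie-breaking order are assumptions for the example. The one property it demonstrates is that no two replicas share a node group, i.e. a hypervisor.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (rack, node group)


def place_replicas(vms: Dict[str, Location], count: int = 3) -> List[str]:
    """Pick VMs so that no node group is used twice, preferring unused racks."""
    chosen: List[str] = []
    used_groups = set()
    used_racks = set()
    while len(chosen) < count:
        eligible = [vm for vm in sorted(vms) if vms[vm] not in used_groups]
        if not eligible:
            raise ValueError("not enough node groups for the requested replicas")
        # False sorts before True, so VMs on racks not used yet come first.
        eligible.sort(key=lambda vm: (vms[vm][0] in used_racks, vm))
        pick = eligible[0]
        chosen.append(pick)
        used_groups.add(vms[pick])
        used_racks.add(vms[pick][0])
    return chosen


# Topology from the slide: 2 racks x 2 node groups x 2 VMs (names are made up).
vms = {
    f"r{rack}-g{group}-vm{vm}": (f"rack-{rack}", f"rack-{rack}/group-{group}")
    for rack in (1, 2) for group in (1, 2) for vm in (1, 2)
}

replicas = place_replicas(vms)
print(replicas)
```

Without the node group check, a rack-aware policy alone could put two replicas on VM1 and VM2 of the same node group, and a single hypervisor failure would then take out both copies.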