Hadoop Everywhere
Hortonworks. We do Hadoop.
$ whoami
Sean Roberts
Partner Solutions Engineer
London, EMEA & everywhere
@seano
linkedin.com/in/seanorama
MacGyver. Data Freak. Cook. Autodidact. Volunteer. Ancestral Health. Fito. Couchsurfer. Nomad
- HDP 2.3: http://hortonworks.com/
- Hadoop Summit recordings:
  - http://2015.hadoopsummit.org/san-jose/
  - http://2015.hadoopsummit.org/brussels/
- Past & future workshops: http://hortonworks.com/partners/learn/
What’s New!
Agenda
● Hadoop Everywhere
● Deployment challenges & requirements
● Cloudbreak & our Docker approach
● Workshop: Your own Cloudbreak
  ○ And auto-scaling with Periscope
● Cloud best practices

Reminder:
● Attendee phone lines are muted
● Please ask questions in the chat
Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DisclaimerThis document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
Hadoop Everywhere
Any application: batch, interactive, and real-time
Any data: existing and new datasets
Anywhere: complete range of deployment options (commodity, appliance, cloud)

(Diagram: existing applications, new analytics, and partner applications all running on YARN, the data operating system, with batch, interactive, and real-time data access.)
Hadoop Everywhere
Hybrid Deployment Choice
Windows, Linux, on-premises or cloud; data "gravity" guides the choice

Compatible Clusters
Run applications and data-processing workloads wherever and whenever needed

Replicated Datasets
Democratize Hadoop data access via automated sharing of datasets using Apache Falcon
Hadoop Up There, Down Here...Everywhere!
(Diagram: Dev / Test, BI / ML, and IoT apps "up there" in the cloud, alongside on-premises clusters "down here".)
Anywhere? Up There or Down Here?

Use cases and where they fit:
- Active Archive / Compliance Reporting: sensitive data = "down here"; "up there" valid for many scenarios
- ETL / Data Warehouse Optimization: usually has "down here" gravity; DW in the cloud is changing that
- Smart Meter Analysis: data typically flows "up there"
- Single View of Customer: may have "down here" gravity, unless you're using SaaS apps
- Supply Chain Optimization: may have heavy "down here" gravity
- New Data for Product Management: "up there" could be considered for many scenarios
- Vehicle Data for Transportation/Logistics: why not "up there"?
- Vehicle Data for Insurance: may have "down here" gravity (e.g. join with existing risk data)
Deployment Challenges & Requirements
Deployment challenges
● Infrastructure is different everywhere
  ○ e.g. each cloud provider has its own API
  ○ e.g. each provider has different networking methods
● OS/images are different everywhere
● How to do service discovery?
● How to dynamically scale/manage?
See prior operations workshops
- Infrastructure
- Operating system
- Environment prepared (see docs)
- Ambari agent/server installed & registered
- Deploy HDP cluster
  - Ambari Blueprints or Cluster Wizard
- Ongoing configuration/management
Deployment requirements
Options for Automation
- Many combinations of tools
  - e.g. Foreman, Ansible, Chef, Puppet, docker-ambari, shell scripts, CloudFormation, …
- Provider specific
  - Cisco UCS, Teradata, HP, Google’s bdutil, …
- Docker with Cloudbreak
Using Ambari with all of the above!
https://github.com/seanorama/ambari-bootstrap/
Demo: Basic script-based example
https://github.com/seanorama/ambari-bootstrap
Requirements:
● Infrastructure prepped (see HDP docs)
● Nodes with RedHat EL or CentOS 6 systems
● HDFS paths mounted (see HDP docs)
● sudo or root access
ambari-bootstrap
After Ambari deployment
● (optional) Configure local YUM/APT repos
● Deploy HDP with the Ambari Wizard or a Blueprint
● Ongoing configuration/management
Using Ansible
https://github.com/rackerlabs/ansible-hadoop
Build once. Deploy anywhere.
Docker
Docker is a "Shipping Container" System for Code

An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container

(Diagram: a multiplicity of stacks (static website, web frontend, user DB, queue, analytics DB) crossed with a multiplicity of hardware environments (development VM, QA server, public cloud, contributor's laptop, production cluster, customer data center).)
Docker
• Container-based virtualization
• Lightweight and portable
• Build once, run anywhere
• Ease of packaging applications
• Automated and scripted
• Isolated
Why Is Docker So Exciting?

For developers: build once… run anywhere
• A clean, safe, and portable runtime environment for your app
• No missing dependencies, packages, etc.
• Run each app in its own isolated container
• Automate testing, integration, packaging
• Reduce/eliminate concerns about compatibility on different platforms
• Cheap, zero-penalty containers to deploy services

For DevOps: configure once… run anything
• Make the entire lifecycle more efficient, consistent, and repeatable
• Eliminate inconsistencies between SDLC stages
• Support segregation of duties
• Significantly improve the speed and reliability of CI/CD
• Significantly more lightweight than VMs
More Technical Explanation

Why
• Run on any Linux
  • Regardless of kernel version (2.6.32+)
  • Regardless of host distro
  • Physical or virtual, cloud or not
  • Container and host architecture must match
• Run anything
  • If it can run on the host, it can run in the container
  • i.e. if it can run on a Linux kernel, it can run

What
• High level: it's a lightweight VM
  • Own process space
  • Own network interface
  • Can run stuff as root
• Low level: it's chroot on steroids
  • Container = isolated processes
  • Shares the kernel with the host
  • No device emulation (neither HVM nor PV)
Docker: How it works

(Diagram: with a Type 2 hypervisor, each app (App A, App A', App B) runs on its own guest OS with its own bins/libs, on top of the hypervisor, host OS, and server. With Docker, containers for App A and App B run directly on the Docker engine and the host OS kernel. Containers are isolated but share the OS and bins/libraries.)

…the result is significantly faster deployment, much less overhead, easier migration, and faster restarts.
Cloudbreak
A tool for provisioning and managing Hadoop clusters in the cloud
Cloudbreak
• Developed by SequenceIQ
• Open source under the Apache 2.0 license [Apache project soon]
• A cloud- and infrastructure-agnostic, cost-effective Hadoop-as-a-Service platform API
• Elastic: can spin up any number of nodes, add/remove on the fly
• Provides full cloud lifecycle management post-deployment
Key Features of Cloudbreak

Elastic
• Provision clusters with an arbitrary number of nodes
• Commission/decommission nodes from a cluster
• Policy- and time-based scaling of clusters

Flexible
• Declarative, flexible Hadoop cluster creation using blueprints
• Provision to multiple public cloud providers or OpenStack-based private clouds using the same common API
• Access all of this functionality through a rich UI, a secured REST API, or an automatable shell

Enterprise-ready
• Supports basic, token-based and OAuth2 authentication models
• Clusters are provisioned in a logically isolated network
• Tracks usage and cluster metrics
Launch HDP on Any Cloud for Any Application

Cloudbreak:
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!

Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)
Cloudbreak Approach
• Use Ambari for the heavy lifting
  • Provisioning of Hadoop services
  • Monitoring
• Use Ambari Blueprints
  • Assign host groups to physical instance types
• Public/private cloud provider APIs abstracted
  • Azure/Google/Amazon/OpenStack
• Run the Ambari agent/server in Docker containers
  • Networking: docker run --net=host
  • Service discovery: Consul (previously Serf)
Workshop: Your own Cloudbreak
cloudbreak-deployer
● https://github.com/sequenceiq/cloudbreak-deployer

Requirements:
● A Docker host (laptop, server or cloud infrastructure)
● Resources: very little; tested with 2GB of RAM
Workshop: Your Own Cloudbreak
Requirement: a Docker host
● OS X or Windows: http://boot2docker.io/
  ○ boot2docker init
  ○ boot2docker up
  ○ eval "$(boot2docker shellinit)"
  ○ boot2docker ssh
● Linux: install the Docker daemon
● Anywhere: docker-machine "lets you create Docker hosts on your computer, on cloud providers, and inside your own data center"
  ○ Example on Rackspace:
    ■ docker-machine create --driver rackspace \
        --rackspace-api-key $OS_PASSWORD \
        --rackspace-username $OS_USERNAME \
        --rackspace-region DFW docker-rax
    ■ docker-machine ssh docker-rax
Install cloudbreak-deployer
https://github.com/sequenceiq/cloudbreak-deployer

● curl https://raw.githubusercontent.com/sequenceiq/cloudbreak-deployer/master/install | sh && cbd --version
● cbd init
● cbd start

You'll then have your own Cloudbreak & Periscope server with an API and web UI
Done: Your own Cloudbreak
Deploy a cluster with your Cloudbreak
Documentation:
http://sequenceiq.com/cloudbreak/#cloudbreak-credentials
1. Add Credentials
2. Create Cluster
3. Use your cluster: Ambari is available as expected
To reach your Hadoop hosts:
● SSH to the Docker host
  ○ Hosts are listed in "Cloud stack description"
  ○ ssh cloudbreak@IPofHost
● Shell into the "ambari-agent" container
  ○ sudo docker ps | grep ambari-agent
    ■ note the CONTAINER ID
  ○ sudo docker exec -it CONTAINERID bash
● Use the hosts as usual, e.g.:
  ○ hadoop fs -ls /
Cloudbreak internals
Cloudbreak Internals

(Diagram: the browser and the Cloudbreak shell reach Uluwatu (the Cloudbreak UI) and Sultans (the user-management UI), authenticating via OAuth2 (UAA, backed by uaa-db on PostgreSQL). The Cloudbreak REST API (cb-db on PostgreSQL) and Periscope autoscaling (ps-db on PostgreSQL) run alongside consul, registrator and ambassador, all as Docker containers.)
Swarm
• Native clustering for Docker
• Distributed container orchestration
• Same API as Docker
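Because Swarm exposes the same API as Docker, an ordinary Docker client can drive a whole cluster just by pointing at the Swarm manager. A minimal sketch; the manager address and port are hypothetical:

```shell
# Point the Docker client at a (hypothetical) Swarm manager endpoint;
# every subsequent docker command is then scheduled across the cluster.
export DOCKER_HOST=tcp://swarm-manager.example.com:3376
echo "$DOCKER_HOST"

# With a real manager running:
#   docker info          # reports every node in the Swarm
#   docker run -d redis  # scheduled onto some node by the Swarm scheduler
```

This is what lets Cloudbreak start Ambari containers cluster-wide without addressing each host individually.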
Swarm: How it works
• Swarm managers/agents
• Discovery services
• Advanced scheduling
Consul
• Service discovery/registry
• Health checking
• Key/value store
• DNS
• Multi-datacenter aware
Consul: How it works
• Consul servers/agents
• Consistency through a quorum (RAFT)
• Scalability due to gossip based protocol (SWIM)
• Decentralized and fault tolerant
• Highly available
• Consistency over availability (CP)
• Multiple interfaces - HTTP and DNS
• Support for watches
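The two query interfaces can be sketched as follows. The service name ambari-8080 is a made-up example; the ports are Consul's defaults (8500 for HTTP, 8600 for DNS):

```shell
# A registered service is reachable two ways (service name is hypothetical)
service="ambari-8080"

# 1. HTTP interface: the catalog endpoint lists the nodes providing the service
http_query="http://localhost:8500/v1/catalog/service/${service}"

# 2. DNS interface: the same service resolves under the .consul domain
dns_name="${service}.service.consul"

echo "$http_query"
echo "$dns_name"

# Against a running agent:
#   curl -s "$http_query"
#   dig @localhost -p 8600 "$dns_name" +short
```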
Apache Ambari
• Easy Hadoop cluster provisioning
• Management and monitoring
• Key feature - Blueprints
• REST API, CLI shell
• Extensible
  • Stacks
  • Services
  • Views
Apache Ambari: How it works
• Ambari server/agents
• Define a blueprint (blueprint.json)
• Define a host mapping (hostmapping.json)
• Post the cluster create
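The flow above can be sketched with a minimal blueprint and host mapping. All names, hosts, and credentials here are hypothetical; the REST paths are Ambari's blueprint API:

```shell
# A minimal two-host-group blueprint (names are made up for illustration)
cat > blueprint.json <<'EOF'
{
  "Blueprints": { "blueprint_name": "hdp-small", "stack_name": "HDP", "stack_version": "2.3" },
  "host_groups": [
    { "name": "master", "cardinality": "1",
      "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" },
                      { "name": "ZOOKEEPER_SERVER" } ] },
    { "name": "worker", "cardinality": "3",
      "components": [ { "name": "DATANODE" }, { "name": "NODEMANAGER" } ] }
  ]
}
EOF

# The host mapping assigns concrete hosts to the blueprint's host groups
cat > hostmapping.json <<'EOF'
{
  "blueprint": "hdp-small",
  "host_groups": [
    { "name": "master", "hosts": [ { "fqdn": "master-1.example.com" } ] },
    { "name": "worker", "hosts": [ { "fqdn": "worker-1.example.com" },
                                   { "fqdn": "worker-2.example.com" },
                                   { "fqdn": "worker-3.example.com" } ] }
  ]
}
EOF

python3 -m json.tool blueprint.json > /dev/null && echo "blueprint.json is valid JSON"

# Against a live Ambari server, register the blueprint, then post the cluster create:
#   curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
#     -d @blueprint.json http://AMBARI_HOST:8080/api/v1/blueprints/hdp-small
#   curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
#     -d @hostmapping.json http://AMBARI_HOST:8080/api/v1/clusters/mycluster
```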
Run Hadoop as Docker containers
HDP as Docker Containers via Cloudbreak
• Fully automated Ambari cluster installation
• Avoid the GUI; use the REST API only (ambari-shell)
• Fully automated HDP installation with blueprints
• Quick installation (pre-pulled RPMs)
• Same process/images for dev/QA/prod
• Same process for single-node and multi-node
(Diagram: Cloudbreak provisions VMs from cloud providers or bare metal, installs Ambari in Docker containers on the VMs, then instructs Ambari to build the HDP cluster.)
Provisioning: How it works
1. Start VMs with a running Docker daemon
2. Cloudbreak bootstrap: start the Consul cluster, then the Swarm cluster (using Consul for discovery)
3. Start the Ambari servers/agents via the Swarm API
4. Ambari services are registered in Consul (Registrator)
5. Post the Blueprint
Run Hadoop as Docker Containers

(Diagram sequence: Cloudbreak starts a Docker daemon on each node; an Ambari server container (amb-ser) and Ambari agent containers (amb-agn) come up across the nodes; a Blueprint is posted; the agent containers then run the HDP services (HDFS, YARN, Hive, HBase, ZooKeeper, NameNode) according to the blueprint's host groups.)
Workshop: Auto-Scale Your Cluster with Periscope
Optimize Cloud Usage via Elastic HDP Clusters

Auto-scaling policies
• Policies based on any Ambari metric
• Dynamically scale to achieve physical elasticity
• Coordinate with YARN to achieve elasticity based on the policies
Scaling for Static and Dynamic Clusters

(Diagram: Ambari metrics and alerts feed Cloudbreak/Periscope; Periscope enforces the auto-scale policies and scales the cluster and YARN apps, while Cloudbreak handles provisioning, for both static and dynamic clusters.)
Scale by Ambari monitoring metric
1. Ambari: review the metric
2. Cloudbreak: set an alert
3. Cloudbreak: set a scaling policy

Scale up/down by time
1. Set a time-based alert
2. Set a scaling policy
3. Repeat with an alert and policy which scale down
Roadmap
Release Summary

Cloudbreak
● Its own project (separate from Ambari)
● Supported on Linux flavors which support Docker

Periscope
● A feature of Cloudbreak 1.0
● Will be embedded in Ambari later in 2015
Release Timeline
● Cloudbreak 1.0 GA: June/July 2015 (alongside Ambari 2.1.0 and HDP "Dal" / 2.3)
● Cloudbreak Incubator proposal: July/August 2015 (est)
● Cloudbreak 1.1: August 2015 (est) (alongside Ambari 2.1.1 and HDP "Dal-M10")
● Cloudbreak 2.0 GA: 2H 2015 (alongside Ambari 2.2 and HDP "Erie" / 2.4)
Supported Cloud Environments

Cloudbreak + HDP 2.3:
- Microsoft Azure: GA
- AWS: GA
- Google Compute: GA
- OpenStack Community: Tech Preview (also Tech Preview with Cloudbreak + HDP 2.4)
- Red Hat OSP: TBD
- HP Helion: GA (tentative)
- Mirantis OpenStack
HDP as a Service

Hortonworks Data Platform on Azure

Rackspace
Cloud Big Data Platform
● Rapidly spin up on-demand HDP clusters
● Integrated with Cloud Files (OpenStack Swift)
● Opt in to managed services by Rackspace

Managed Big Data Platform
● Fully managed HDP on dedicated and/or cloud infrastructure
● Leverage Fanatical Support and industry-leading SLAs
● Supported by Rackspace with escalation to Hortonworks

CSC
HDP on IaaS - Best Practices
Microsoft Azure
● Deployment
  ○ Deploy using Cloudbreak
  ○ Deploy using the HWX Azure Gallery image
● Integrated with Azure Blob Storage
● Supported directly by Hortonworks
● Other offerings
  ○ Microsoft HDInsight
  ○ HDP Sandbox
Azure Deployment Guidelines
● All in the same region
● Instance types
  ○ Typical: A7
  ○ Performance: D14
  ○ 8 x 1TB Standard LRS (x3 replicated) virtual hard disks per server
● Multiple storage accounts are recommended
  ○ Recommend no more than 40 virtual hard disks per storage account
Azure Blob Store (Object Storage)
● wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
● Can be used as a replacement for HDFS
● Thoroughly tested in the HDP release test suites
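For illustration, a wasb URL built from a hypothetical container and storage account; the account key must already be configured in core-site.xml for the commented commands to work:

```shell
# Hypothetical container and storage-account names
container="data"
account="myaccount"
wasb_path="wasb://${container}@${account}.blob.core.windows.net/raw/events"
echo "$wasb_path"

# With credentials in core-site.xml, the path works like any other Hadoop filesystem:
#   hadoop fs -ls "$wasb_path"
#   hadoop distcp /apps/hive/warehouse/sales "$wasb_path"
```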
Amazon Web Services
● Deploy using Cloudbreak
● Integrated with AWS S3 (object storage)
● Supported directly by Hortonworks
Amazon Deployment Guidelines
● All in the same region/AZ
● Instances with Enhanced Networking

Master nodes:
● Choose EBS-optimized
● Boot: 100GB on EBS
● Data: 4+ x 1TB on EBS

Worker nodes:
● Boot: 100GB on EBS
● Data: instance storage
  ○ EBS can be used, but local is preferred

Instance types:
● Typical: d2.*
● Performance: i2.*
https://aws.amazon.com/ec2/instance-types/
AWS RDS
● Some services rely on MySQL, Oracle or PostgreSQL:
  ○ Apache Ambari
  ○ Apache Hive
  ○ Apache Oozie
  ○ Apache Ranger
● Use RDS for these instead of managing the databases yourself
AWS S3 (Object Storage)
● s3n:// with HDP 2.2 (Hadoop 2.6)
● s3a:// with HDP 2.3 (Hadoop 2.7)
● Not currently a direct replacement for HDFS

Recommended: configure access with an IAM role/policy
● https://docs.aws.amazon.com/IAM/latest/UserGuide/policies_examples.html#iam-policy-example-s3
● Example: http://git.io/vLoGY
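As a sketch of the IAM-role approach, a minimal policy granting a cluster read/write access to a single bucket might look like this. The bucket and role names are hypothetical, and the action list would need review against your own security requirements:

```shell
# Write out a minimal S3 access policy (bucket name is hypothetical)
cat > hdp-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow",
      "Action": [ "s3:ListBucket", "s3:GetBucketLocation" ],
      "Resource": "arn:aws:s3:::my-hdp-bucket" },
    { "Effect": "Allow",
      "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject" ],
      "Resource": "arn:aws:s3:::my-hdp-bucket/*" }
  ]
}
EOF
python3 -m json.tool hdp-s3-policy.json > /dev/null && echo "policy is valid JSON"

# Attach it to the role assigned to your cluster's EC2 instances, e.g.:
#   aws iam put-role-policy --role-name hdp-cluster-role \
#     --policy-name hdp-s3-access --policy-document file://hdp-s3-policy.json
```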
Google Cloud
● Deploy using
  ○ Cloudbreak
  ○ Google bdutil with the Apache Ambari plug-in
● Integrated with Google Cloud Storage
● Supported directly by Hortonworks
Google Deployment Guidelines
● Instance types
  ○ Typical: n1-standard-4 with a single 1.5TB persistent disk
  ○ Performance: n1-standard-8 with a 1TB SSD
● Google GCS (object storage)
  ○ gs://<CONFIGBUCKET>/dir/file
  ○ Not currently a replacement for HDFS
S3 & GCS as Secondary Storage Systems
The connectors are currently eventually consistent, so they do not replace HDFS.

Backup
● Falcon, distcp, hadoop fs, HBase ExportSnapshot
● A Kafka+Storm bolt sends messages to S3/GCS, providing a backup & point-in-time recovery source

Input/output
● Convenient & broadly used upload/download method
  ○ As middleware to ease integration with Hadoop & limit access
● Publishing static content (optionally with CloudFront)
  ○ Removes the need to manage any web services
● Storage for temporary/ephemeral clusters
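A backup run along these lines can be sketched with distcp and a dated destination prefix. The bucket and source path are hypothetical; s3a:// assumes HDP 2.3 / Hadoop 2.7 with credentials or an IAM role configured:

```shell
# Build a dated destination prefix for a point-in-time copy (bucket name is made up)
backup_dest="s3a://my-hdp-backups/warehouse/$(date +%Y-%m-%d)"
echo "$backup_dest"

# On a live cluster, copy the dataset out as a backup:
#   hadoop distcp /apps/hive/warehouse/sales "$backup_dest"
# The same pattern works for GCS by swapping in a gs:// destination.
```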
Questions
$ shutdown -h now
- HDP 2.3: http://hortonworks.com/
- Hadoop Summit recordings:
  - http://2015.hadoopsummit.org/san-jose/
  - http://2015.hadoopsummit.org/brussels/
- Past & future workshops: http://hortonworks.com/partners/learn/