Cloudbreak – Technical Deep Dive
Janos Matyas & Krisztian Horvath
Hortonworks
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Presenters
Krisztian Horvath
Senior Member of Technical Staff, Cloudbreak
Former Co-Founder at SequenceIQ
Janos Matyas
Senior Director of Engineering, Cloudbreak
Former Co-Founder and CTO of SequenceIQ
Agenda
– Goals and Motivations
– Technology Stack + Deep Dive
– Lessons Learned + Best Practices
– Demo + Q & A
Goals and Motivations – What We Wanted to Do…
– Declarative, full Hadoop stack provisioning on all major cloud providers
– Automate and unify the process
– Zero-configuration approach
– Same process through the cluster lifecycle (Dev, QA, UAT, Prod)
– Provide tooling: UI, REST API and CLI/shell
– Secure and multi-tenant
– SLA policy based autoscaling
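The last goal above, SLA policy based autoscaling, can be sketched as a simple decision function: a policy watches one cluster metric and adjusts the node count within configured bounds. A minimal illustration only – the metric name `pendingContainers` and all field names here are assumptions, not Cloudbreak's actual API:

```python
# Illustrative sketch of an SLA based autoscaling policy (not Cloudbreak's code).
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    metric: str          # e.g. "pendingContainers" (hypothetical metric name)
    threshold: float     # fire when the metric exceeds this value
    adjustment: int      # nodes to add (positive) or remove (negative)
    min_nodes: int = 3
    max_nodes: int = 100

def evaluate(policy: ScalingPolicy, metric_value: float, current_nodes: int) -> int:
    """Return the desired node count for one evaluation round."""
    if metric_value <= policy.threshold:
        return current_nodes  # SLA is met, do nothing
    desired = current_nodes + policy.adjustment
    # never scale outside the configured bounds
    return max(policy.min_nodes, min(policy.max_nodes, desired))

policy = ScalingPolicy(metric="pendingContainers", threshold=10, adjustment=2)
print(evaluate(policy, metric_value=25, current_nodes=8))  # scales out to 10
print(evaluate(policy, metric_value=4, current_nodes=8))   # SLA met, stays at 8
```

In practice such a policy would be driven by Ambari metrics and a cooldown period, but the bounded scale-out/scale-in decision is the core of it.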
Goals and Motivations – What We Wanted to Do…
All cloud providers are fundamentally different: compute, network, security and performance all vary.
We want to share what we found, and how we made it work!
Technology Stack
– Apache Ambari
– Cloud provider APIs
– Salt
– Docker
– Packer
Deep Dive - Overview
Cloudbreak Deployer (CBD)
– Tool to deploy the Cloudbreak application
– Microservice architecture (using Docker)
– DevOps friendly
Cloudbreak Application
– Extensible; available through UI, CLI and REST API
– SLA auto-scaling policy management
Cluster deployed with Cloudbreak
Deep Dive – Cloudbreak Deployer
Installation
– Single binary, written in Go
– Requires Docker 1.9.1+
– DIY installation on any RHEL / CentOS / Oracle Linux 7 (64-bit) distro
– Or use one of the pre-built cloud images (AWS, Azure, GCP, OpenStack)
Operations
– Easy upgrades/downgrades, automatic schema migration
Cloud provider support
– AWS – generates IAM roles
– Azure – ARM and DASH configuration
Utilities
– Cloudbreak shell support: interactive, remote and automated execution, OAuth2 token generation
– Local development environment setup
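CBD is typically configured through a Profile file in the deployment directory, which every `cbd` command reads. An illustrative fragment only – the variable names are assumed from typical deployments and the values are placeholders, so check them against your CBD version:

```shell
# Profile – environment file read by the cbd binary (illustrative fragment)
export PUBLIC_IP=203.0.113.10          # address clients use to reach the UI/API
export UAA_DEFAULT_SECRET='<secret>'   # secret for the OAuth2 identity server
```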
Deep Dive – Cloudbreak Application
Installation
– Done with the Cloudbreak Deployer (CBD)
Operations
– Consistent feature set through UI, CLI and secure REST API
– Multi-tenancy, ACL setup, usage reports
– Custom stack repositories, failure actions
– Event history, cluster management
– SLA based auto-scaling policy configuration and enforcement
Cloud provider support
– Provider-agnostic API
– AWS, Azure, GCP, OpenStack, Mesos
– SPI interface – bring your own provider/stack under Cloudbreak management
Deep Dive – Cluster deployed with Cloudbreak
Installation
– Managed by Cloudbreak using the cloud provider API
– Default (optimized) configs, specific to each cloud provider
Operations
– Default and custom configs for stacks, services, network, storage and security
– Declarative Hadoop cluster definition
– Custom instance types (heterogeneous clusters)
– Different storage types
– Configurable network
– Security (access, Kerberos, SSSD, FreeIPA)
Utilities
– Ambari Views
– Metadata/shared clusters support
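The declarative cluster definition is driven by Apache Ambari blueprints (see the technology stack slide): host groups name the services, and Cloudbreak maps cloud instances onto them. A minimal blueprint sketch – the blueprint and host group names are made up, and HDP 2.4 is assumed:

```json
{
  "Blueprints": {
    "blueprint_name": "hdp-small",
    "stack_name": "HDP",
    "stack_version": "2.4"
  },
  "host_groups": [
    { "name": "master", "cardinality": "1",
      "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" } ] },
    { "name": "worker", "cardinality": "3",
      "components": [ { "name": "DATANODE" }, { "name": "NODEMANAGER" } ] }
  ]
}
```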
Lessons Learned
Not all cloud providers are the same
– Differences in performance, storage and functionality
(Capacity) planning
– Base it on workload type (batch / interactive and ad-hoc / long running)
– Use heterogeneous clusters
– Trial and error – mistakes are cheap; iterate until you find your best fit
– Leverage the cloud: scale your cluster on demand
Number one consideration – storage
– Multiple choices (ephemeral, block storage and BLOB store)
– Bringing compute to storage might not work (everywhere) – in the cloud, everything is a service
– Scale storage independently from compute; partition your data
Security
– Consider strict security rules (private subnets, access, etc.) and use edge nodes
Lessons Learned – AWS
Compute
– Find the right instance types for your workload; use heterogeneous clusters
– Different instance types for transient (e.g. C4, M4) and long running (e.g. H2, D2) clusters
– Dedicated instances (to avoid noisy neighbors; regulations, e.g. HIPAA)
Storage
– Use the latest version of Hadoop (Hortonworks contributed cloud-specific optimizations)
– Note that S3 gives you only eventual consistency
– Different driver implementations: S3n (native, JetS3t based), S3a (successor of S3n), S3 (block based)
Network
– Use enhanced networking (on by default with Amazon Linux; RHEL based – apply a patch)
– Use placement groups
– Not all instance types can use the 10 Gbit network (e.g. use 8xlarge sizes)
Security
– Use instance roles to access S3; deploy in a private subnet/VPC
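The S3a driver mentioned above is wired in through core-site.xml. An illustrative fragment – the bucket and key values are placeholders, and when the instance roles recommended above are in place the key properties can be omitted entirely:

```xml
<!-- core-site.xml fragment (illustrative) -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<!-- Only needed without IAM instance roles: -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```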
Lessons Learned – AWS
* Benchmark chart; d2.8xlarge used as instance type
Lessons Learned – Azure
Compute
– Find the right instance types for your workload; use heterogeneous clusters
– Different instance types for transient (e.g. A and D families) and long running (e.g. Dv2) clusters
– Use ARM instead of the old (classic) API
Storage
– Use the latest version of Hadoop (Hortonworks contributed cloud-specific optimizations)
– Storage account scaling limitations – use WASB, or WASB with DASH (default with Cloudbreak)
– Azure Data Lake Store – coming soon
– Ephemeral disk is faster than the root disk, but does not survive auto-updates
Network
– No PTR record/reverse lookup support
Security
– Integrate/sync with your corporate AD
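WASB is likewise configured through core-site.xml. An illustrative fragment – CONTAINER, ACCOUNT and the key value are placeholders:

```xml
<!-- core-site.xml fragment (illustrative) -->
<property>
  <name>fs.defaultFS</name>
  <value>wasb://CONTAINER@ACCOUNT.blob.core.windows.net</value>
</property>
<property>
  <name>fs.azure.account.key.ACCOUNT.blob.core.windows.net</name>
  <value>STORAGE_ACCOUNT_KEY</value>
</property>
```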
Lessons Learned – Azure
* Benchmark chart
Lessons Learned – GCP
Compute
– Find the right instance types for your workload; use heterogeneous clusters
– No template based provisioning
Storage
– Use the latest version of Hadoop (Hortonworks contributed cloud-specific optimizations)
– Use the Google Cloud Storage connector
Network
– Network isolation/DNS problems
Security
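The Google Cloud Storage connector is also configured via core-site.xml. An illustrative fragment – the project ID is a placeholder:

```xml
<!-- core-site.xml fragment (illustrative) -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value>YOUR_GCP_PROJECT</value>
</property>
```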
Lessons Learned – OpenStack
Compute
– Find the right instance types for your workload; use heterogeneous clusters
– Use Heat templates instead of raw API calls (we support both)
Storage
– Currently we support only Cinder volumes
– Swift and Ceph are planned
– Data locality through Cloudbreak – let us know your topology or rack/hypervisor mapping
Network
– Configure DNS properly
– Use multiple network (Neutron) nodes for large clusters
Security
– Use Keystone v3 (support for OAuth, federation; introduction of groups/domains)
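A Heat template describes the cluster resources declaratively, which is why it is preferred over individual API calls. A minimal sketch of a single worker node – the image, flavor and network names are placeholders, not Cloudbreak's generated template:

```yaml
# Minimal Heat template sketch (illustrative)
heat_template_version: 2015-04-30
description: Single Hadoop worker node
resources:
  worker:
    type: OS::Nova::Server
    properties:
      image: centos7-hdp      # placeholder image name
      flavor: m1.xlarge       # placeholder flavor
      networks:
        - network: hadoop-net # placeholder Neutron network
```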
Lessons Learned – Mesos
In Tech Preview
– Come and talk to us after the talk
– Or visit the Hortonworks booth
Thank You