
YARN Ready - Integrating to YARN using Slider Webinar


DESCRIPTION

YARN Ready webinar series: this is the second webinar in the series, showing developers how to integrate with YARN using Slider.


Page 1: YARN Ready - Integrating to YARN using Slider Webinar

© Hortonworks Inc. 2014

YARN Ready – Apache Slider

Provisioning, Managing, and Monitoring YARN Applications

Sumit Mohanty

@smohanty (@hortonworks)

Steve Loughran

@steveloughran (@hortonworks)


Page 2: YARN Ready - Integrating to YARN using Slider Webinar

Agenda

• Long running applications on YARN
• Introduction to Slider
• Writing a Slider Application
• Key Slider Features
• Conclusion
• Q/A

Page 3: YARN Ready - Integrating to YARN using Slider Webinar

Applications on YARN

Page 4: YARN Ready - Integrating to YARN using Slider Webinar

YARN runs code across the cluster

[Diagram: four HDFS hosts, three running a YARN Node Manager and one running the YARN Resource Manager ("the RM")]

• Servers run YARN Node Managers (NMs)
• NMs send heartbeats to the Resource Manager (RM)
• RM schedules work over the cluster
• RM allocates containers to apps
• NMs start containers
• NMs report container health

Page 5: YARN Ready - Integrating to YARN using Slider Webinar

Client creates App Master

[Diagram: the YARN Resource Manager ("the RM"), three hosts running HDFS and a YARN Node Manager, a Client, and the Application Master]

Page 6: YARN Ready - Integrating to YARN using Slider Webinar

AM asks for containers

[Diagram: the YARN Resource Manager, three hosts running HDFS and a YARN Node Manager, the Application Master, and three Containers]

Page 7: YARN Ready - Integrating to YARN using Slider Webinar

YARN notifies AM of failures

[Diagram: the YARN Resource Manager, three hosts running HDFS and a YARN Node Manager, the Application Master, and three Containers]

Page 8: YARN Ready - Integrating to YARN using Slider Webinar


Long Running Applications


Page 9: YARN Ready - Integrating to YARN using Slider Webinar

Management

• Application instance must be managed
  – Install/Configure/Start
  – Restart
  – Reconfigure/Rolling update
  – Stop/Graceful stop
  – Status
  – Activate/deactivate/rebalance
• Upgrade
  – Long-running applications need to provide upgrade support, preferably rolling upgrade

Page 10: YARN Ready - Integrating to YARN using Slider Webinar

Registration and Discovery

• Application must declare itself
  – URLs
  – Host/port
  – Config (client config)
• Application must be discoverable
  – Registry
  – Name-based lookups
  – Regularly updated
• Client support
  – Callback if “data” changes; thick clients
  – Configurable gateway; thin clients

Page 11: YARN Ready - Integrating to YARN using Slider Webinar

Monitoring

• Metrics
  – Instantaneous metrics (JMX)
  – Time-series metrics (Ganglia)
  – Configure Ganglia or other metrics stores
• Alerts
  – Based on JMX/port scan/container status
  – Configure Nagios or other alerting mechanism

Page 12: YARN Ready - Integrating to YARN using Slider Webinar

Logs and Events

• Logs
  – Continuous log gathering
  – Single view for logs across all containers
• Lifecycle Events
  – Integration with Application Timeline Server

Page 13: YARN Ready - Integrating to YARN using Slider Webinar

In addition to …

• Security
  – Configured for security
  – Token renewal
• High Availability
  – On a highly available cluster (NN, RM HA)
  – Itself highly available (multi-master)
• Packaging
• Configurability
• …

Page 14: YARN Ready - Integrating to YARN using Slider Webinar


Apache Slider


Page 15: YARN Ready - Integrating to YARN using Slider Webinar

Why?

• Many mature applications exist
• Full YARN integration takes effort
• Running under YARN delivers access to all the data in HDFS, and to the CPU power alongside it
• As the Hadoop stack evolves, there is more to integrate with
• Management tools (e.g. Ambari) exist to monitor applications in-cluster

Page 16: YARN Ready - Integrating to YARN using Slider Webinar

Slider is a project in Apache incubation with one goal:

Make it possible and easy to deploy and manage existing applications on a YARN cluster

Status: Currently in Tech Preview

GA with the next HDP release, tentatively November

Page 17: YARN Ready - Integrating to YARN using Slider Webinar

Slider view of an Application

• An application is a set of components
• A component is a daemon/launched executable, together with its
  – configuration
  – scripts, data files, etc.
• A component may have one or more instances
• Component instances are managed
  – By extension, so is the app instance
• Example: HBase application (3 components)
  – HBase Master
  – HBase RegionServer
  – HBase REST service
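The application/component/instance model above can be pictured as a small data structure. This is a minimal sketch under stated assumptions, not Slider's own classes: the `Application` and `Component` types, the component names, and the instance counts are all hypothetical and chosen only to mirror the HBase example.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One kind of daemon in the application (hypothetical model, not Slider's API)."""
    name: str
    instances: int = 1  # desired number of running instances


@dataclass
class Application:
    """An application is a named set of components."""
    name: str
    components: dict = field(default_factory=dict)

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def total_instances(self) -> int:
        # Sum the desired instance counts over all components.
        return sum(c.instances for c in self.components.values())


# The HBase example from the slide: three components.
# The instance counts below are illustrative assumptions.
hbase = Application("hbase")
hbase.add(Component("HBASE_MASTER", instances=1))
hbase.add(Component("HBASE_REGIONSERVER", instances=3))
hbase.add(Component("HBASE_REST", instances=1))
```

Managing the application instance then reduces to managing each component's instances, which is the "by extension" point on the slide.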

Page 18: YARN Ready - Integrating to YARN using Slider Webinar

YARN Containers with Slider

[Diagram: the Slider Client, the YARN Resource Manager, and YARN Node Manager hosts with HDFS. One container holds the Slider AppMaster; a component container holds a Slider Agent and the Application.]

Page 19: YARN Ready - Integrating to YARN using Slider Webinar

Application by Slider

[Diagram: the Slider CLI and the Slider App Package, the YARN Resource Manager ("the RM"), and YARN Node Manager hosts with HDFS, each running an Agent and a Component]

Similar to any YARN application

1. CLI starts an instance of the AM

2. AM requests containers

3. Containers activate with an Agent

4. Agent gets application definition

5. Agent registers with AM

6. AM issues commands

7. Agent reports back status, configuration, etc.

8. AM publishes endpoints, configurations

[Diagram labels: Application Registry; App Master/Agent Provider. The numbers 1 to 8 on the diagram correspond to the steps listed above.]
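The eight steps listed for this slide can be walked through with a toy, in-process simulation. It is only a sketch of the control flow: the `AppMaster` and `Agent` classes, the `START` command string, the host names, and the port are hypothetical, and nothing here talks to YARN or uses Slider's real interfaces.

```python
class Agent:
    """Toy agent running inside one container (hypothetical)."""

    def __init__(self, agent_id, host, port):
        self.agent_id = agent_id
        self.host = host
        self.port = port
        self.definition = None

    def load_definition(self, app_package):
        # Step 4: the agent gets the application definition.
        self.definition = app_package

    def execute(self, command):
        # Step 7: the agent reports status and configuration back.
        status = "STARTED" if command == "START" else "UNKNOWN"
        return {"status": status, "endpoint": f"{self.host}:{self.port}"}


class AppMaster:
    """Toy stand-in for the AppMaster (hypothetical)."""

    def __init__(self):
        self.status = {}     # agent id -> last reported status
        self.published = {}  # agent id -> published endpoint

    def register(self, agent):
        # Step 5: the agent registers with the AM.
        self.status[agent.agent_id] = "REGISTERED"

    def issue_command(self, agent, command):
        # Step 6: the AM issues a command and receives the report (step 7).
        report = agent.execute(command)
        self.status[agent.agent_id] = report["status"]
        # Step 8: the AM publishes the reported endpoint.
        self.published[agent.agent_id] = report["endpoint"]


def launch(app_package, container_hosts, port):
    am = AppMaster()  # Step 1: the CLI starts an instance of the AM.
    # Step 2: the AM requests containers; here they are simply given as hosts.
    for index, host in enumerate(container_hosts):
        agent = Agent(f"agent-{index}", host, port)  # Step 3: container activates with an agent.
        agent.load_definition(app_package)
        am.register(agent)
        am.issue_command(agent, "START")
    return am


# Illustrative values only: two containers, one assumed port.
am = launch({"name": "memcached"}, ["host1", "host2"], port=11211)
```

After the call, `am.published` holds one endpoint per agent, which is the information a registry entry would expose to clients.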

Page 20: YARN Ready - Integrating to YARN using Slider Webinar

Slider AppMaster/Agent/Client

• AppMaster
  – Common YARN interactions
  – Common *-client interactions
  – Publishing needs
• Agent
  – Configure and start
  – Re-configure and restart
  – Heartbeats & failure detection
  – Port allocations and publishing
  – Custom commands, if any (e.g. graceful-stop)
• Client
  – App life cycle commands (flex, status, …)

Page 21: YARN Ready - Integrating to YARN using Slider Webinar

Memcached on YARN: Sample Slider App

Page 22: YARN Ready - Integrating to YARN using Slider Webinar

Other Application Packages

• Reference doc for the Memcached application
  – http://slider.incubator.apache.org/docs/slider_specs/hello_world_slider_app.html
• The Slider GitHub repo has other app packages
  – Accumulo
  – HBase
  – Storm
  – Memcached-windows

Page 23: YARN Ready - Integrating to YARN using Slider Webinar


Other Capabilities


Page 24: YARN Ready - Integrating to YARN using Slider Webinar

App Packaging Capabilities

• Dynamic port allocation and sharing
• Inter-component dependency
  – Specify the start order of components
• Exports
  – Construct arbitrary name-value pairs
  – E.g. URLs (org.apache.slider.monitor: http://${HBASE_MASTER_HOST}:${site.hbase-site.hbase.master.info.port}/master-status)
• Default HDFS and ZK isolation
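To make the export example concrete, here is a minimal sketch of how `${...}` placeholders in an export template could be filled in from known values. The `resolve_export` helper, the host name, and the port number are assumptions for illustration; this is not the code Slider uses to evaluate exports.

```python
import re

# Matches ${name}, where name is any run of characters other than "}".
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_export(template, values):
    """Replace each ${name} in template with values[name].

    Unknown names are left untouched so that missing data stays visible.
    """
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


template = ("http://${HBASE_MASTER_HOST}:"
            "${site.hbase-site.hbase.master.info.port}/master-status")
values = {
    "HBASE_MASTER_HOST": "node1.example.com",         # assumed host
    "site.hbase-site.hbase.master.info.port": 60010,  # assumed port
}
url = resolve_export(template, values)
```

Leaving unresolved placeholders in place, rather than substituting an empty string, makes a missing host or port easy to spot in the published value.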

Page 25: YARN Ready - Integrating to YARN using Slider Webinar

Application Registry

• A common problem (not specific to Slider)
  – https://issues.apache.org/jira/browse/YARN-913
• Currently
  – Apache Curator based
  – Registers URLs pointing to actual data
  – AM doubles up as a web server for published data
• Plan
  – Registry should be stand-alone
  – Slider is a consumer as well as a publisher
  – Slider focuses on a declarative solution for applications to publish data
  – Allows integration of applications independent of how they are hosted

Page 26: YARN Ready - Integrating to YARN using Slider Webinar

Plan: YARN Service Registry

# YARN-wide registry in ZooKeeper
# Services listed by (user, service class, name)
/yarnRegistry/users/sumit/slider/cluster1

# Ephemeral liveness node
/yarnRegistry/users/sumit/slider/cluster1/live

# Service entry lists bindings: URLs, IPC (host, port), ZK

# Individual components have their own (ephemeral) entries & endpoints
/yarnRegistry/users/sumit/slider/cluster1/components/appmaster

# ZK R/W API, REST read-only API
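The path layout planned on this slide can be expressed as a few small helpers. This is a sketch of the naming scheme only, taken from the three example paths shown; the function names are hypothetical, the slide describes a plan rather than a finished design, and no ZooKeeper calls are made.

```python
REGISTRY_ROOT = "/yarnRegistry"


def service_path(user, service_class, name):
    """Path of a service entry, keyed by (user, service class, name)."""
    return f"{REGISTRY_ROOT}/users/{user}/{service_class}/{name}"


def liveness_path(user, service_class, name):
    """Path of the ephemeral liveness node under a service entry."""
    return service_path(user, service_class, name) + "/live"


def component_path(user, service_class, name, component):
    """Path of the (ephemeral) entry for one component of a service."""
    return f"{service_path(user, service_class, name)}/components/{component}"


# Reproduce the examples from the slide.
entry = service_path("sumit", "slider", "cluster1")
live = liveness_path("sumit", "slider", "cluster1")
appmaster = component_path("sumit", "slider", "cluster1", "appmaster")
```

A client that knows the user, service class, and instance name can compute the entry path directly, which is what makes name-based lookups possible.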

Page 27: YARN Ready - Integrating to YARN using Slider Webinar

Security

• Applications validated to work in a Kerberos-secured cluster
  – Secure cluster created and keytabs available to application components
  – Security parameters specified in application configuration
  – User obtains TGT (kinit) prior to Slider application creation
  – E.g. HBase 0.98.4
• Agent-AM SSL communication
  – One-way by default
  – Two-way can be enabled
• Work initiated on ticket renewal for long-running applications
  – YARN, HDFS

Page 28: YARN Ready - Integrating to YARN using Slider Webinar

Failure Handling

• Application component failure
  – Component instance restarted
• AppMaster failure
  – YARN restarts the AppMaster; Slider reconstructs state and registry
  – App lifecycle commands are temporarily unavailable
• NodeManager failure
  – App remains unaffected
• ResourceManager/NodeManager failures with HA
  – App remains unaffected
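The "component instance restarted" behaviour can be thought of as reconciling desired instance counts against the containers that are still running. The following is a hedged sketch of that idea, not Slider's actual recovery logic: `containers_to_request` and the component names and counts are assumptions made for illustration.

```python
from collections import Counter


def containers_to_request(desired, live):
    """Return how many new containers each component needs.

    desired: component name -> desired instance count
    live:    list of component names, one entry per running container
    """
    running = Counter(live)  # missing components count as zero
    return {
        name: count - running[name]
        for name, count in desired.items()
        if count > running[name]
    }


desired = {"HBASE_MASTER": 1, "HBASE_REGIONSERVER": 3}
live = ["HBASE_MASTER", "HBASE_REGIONSERVER",
        "HBASE_REGIONSERVER", "HBASE_REGIONSERVER"]

# One region server container fails and YARN notifies the AppMaster.
live.remove("HBASE_REGIONSERVER")
missing = containers_to_request(desired, live)
```

The same comparison would cover a flex operation: raising a desired count produces a non-empty request even though nothing has failed.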

Page 29: YARN Ready - Integrating to YARN using Slider Webinar

Windows and Linux Support

• Feature set parity on both platforms
• Similar packaging constructs
  – Typically, only the path spec needs to change
• Both Linux and Windows Server as a platform for
  – Client (hosts slider-client)
  – Cluster (hosts the Hadoop cluster)

Page 30: YARN Ready - Integrating to YARN using Slider Webinar


Join in: Bring your favorite Applications to YARN


Page 31: YARN Ready - Integrating to YARN using Slider Webinar

Slider-ifying a new application

1. Grab Slider: http://slider.incubator.apache.org/downloads/

2. Look at the App Package docs: http://slider.incubator.apache.org/docs/slider_specs/

3. Look at source code examples under app-packages

4. Start with memcached/memcached-windows

Page 32: YARN Ready - Integrating to YARN using Slider Webinar

YARN API vs. Slider

• Native YARN app
  – Your own AppMaster is in charge: container placement, fault handling
  – You can implement an IPC API for callers to manipulate the application
  – AppMaster can send out event notifications

Ideal for large-scale distributed algorithms with specific placement and scheduling needs

• Slider app
  – Slider AppMaster handles YARN integration with best-effort placement history, fault handling (recreate component instance)
  – Simple API/Web UI for cluster manipulation, endpoint listing
  – Lots of failure and security testing
  – You only need to write the App package (& test)

Ideal for long-lived applications where failures can be addressed by restarting elsewhere, with flexing decisions made by admins

Page 33: YARN Ready - Integrating to YARN using Slider Webinar

Everyone is welcome

• Useful Links
  – Website: http://slider.incubator.apache.org/
  – Dev Mailing Lists: [email protected]
  – JIRA: https://issues.apache.org/jira/browse/SLIDER
• Current and Upcoming Releases
  – Slider 0.30 (May)
  – Slider 0.40 (July)
  – Slider 0.50 (planned)

Page 34: YARN Ready - Integrating to YARN using Slider Webinar

Q/A

http://slider.incubator.apache.org/

Page 35: YARN Ready - Integrating to YARN using Slider Webinar

Next Steps

Page 36: YARN Ready - Integrating to YARN using Slider Webinar

5 Next Steps

1. Review YARN Slider Resources

2. Review webinar recording or attend the next webinar

3. Attend Office Hours

4. Sign up for a 2 day class

5. Attend the next YARN webinar

Page 37: YARN Ready - Integrating to YARN using Slider Webinar

Resources

Set up HDP 2.1 environment
• Leverage Sandbox: Hortonworks.com/Sandbox

Get Started with YARN
• http://hortonworks.com/get-started/YARN

Technical Preview
• http://hortonworks.com/blog/apache-slider-technical-preview-now-available/

Apache
• http://slider.incubator.apache.org/

Dev Mailing Lists
• [email protected]

JIRA
• https://issues.apache.org/jira/browse/SLIDER

Page 38: YARN Ready - Integrating to YARN using Slider Webinar

Hortonworks Office Hours

YARN Office Hours

Dial in and chat with YARN experts

Next Office Hour: Thursday August 14 @ 10-11am PDT. Register:

https://hortonworks.webex.com/hortonworks/onstage/g.php?t=a&d=628190636

We plan Office Hours for September 11th and October 9th @ 10am PT (2nd Thursdays)

Invitations will go out to those who attended or reviewed YARN webinars

Page 39: YARN Ready - Integrating to YARN using Slider Webinar

And from Hortonworks University

Hortonworks Course: Developing Custom YARN Applications

Format: Online

Duration: 2 Days

When: September – date tbd

Cost: No Charge to Hortonworks Partners

Space: Very Limited

Interested? Please contact Lisa

Page 41: YARN Ready - Integrating to YARN using Slider Webinar


Thank you.
