priyanka-gugale
Apache Apex Meetup
Agenda
● Understanding YARN
  ○ Why YARN
  ○ Introducing YARN
  ○ YARN architecture
  ○ Beyond batch
  ○ Application Lifecycle
● Building YARN application
Why YARN
Hadoop v1 (MR1) Architecture
● Job Tracker
  ○ Manages cluster resources
  ○ Job scheduling
● Task Tracker
  ○ Per-node agent
  ○ Manages tasks
[Diagram: Clients submit jobs to the JobTracker; each TaskTracker runs tasks. Arrows: job submission, MapReduce status.]
Why YARN (Cont…)
Limitations with MR1
• Scalability
  o Maximum cluster size: 4,000 nodes
  o Maximum concurrent tasks: 40,000
• Availability
• Resource Utilization
• Running non-MapReduce applications
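The resource-utilization limitation can be illustrated with a toy model. MR1 divides a node's capacity into fixed map slots and reduce slots, so an idle reduce slot cannot run a map task. The sketch below is not Hadoop code: the class name, method names, and the cluster and workload sizes are hypothetical, and the "shared pool" is a deliberately simplified stand-in for YARN's generic containers.

```java
// Toy model (not Hadoop code): fixed MR1 slots versus a shared pool of generic containers.
class SlotUtilization {
    // MR1 style: map tasks may only use map slots, reduce tasks only reduce slots.
    static int runnableWithFixedSlots(int mapSlots, int reduceSlots, int mapTasks, int reduceTasks) {
        return Math.min(mapSlots, mapTasks) + Math.min(reduceSlots, reduceTasks);
    }

    // YARN style (simplified): any task may use any free container.
    static int runnableWithSharedPool(int totalContainers, int mapTasks, int reduceTasks) {
        return Math.min(totalContainers, mapTasks + reduceTasks);
    }

    public static void main(String[] args) {
        int mapSlots = 6, reduceSlots = 4;   // hypothetical cluster capacity
        int mapTasks = 10, reduceTasks = 0;  // hypothetical map-heavy phase
        System.out.println("fixed slots: "
                + runnableWithFixedSlots(mapSlots, reduceSlots, mapTasks, reduceTasks));
        System.out.println("shared pool: "
                + runnableWithSharedPool(mapSlots + reduceSlots, mapTasks, reduceTasks));
    }
}
```

With these made-up numbers the fixed-slot model runs 6 tasks at once while 4 reduce slots sit idle, and the shared pool runs all 10.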
Introducing YARN
● YARN - Yet Another Resource Negotiator
● Framework that facilitates writing arbitrary distributed processing frameworks and applications.
● YARN applications/frameworks: e.g. MapReduce2, Apache Spark, Apache Giraph, Apache Apex, etc.
Hadoop beyond Batch
YARN for better resource utilization
More applications than MapReduce
Introducing YARN
[Diagram: approximate mapping (≈) between MapReduce 1 components and YARN components]
MapReduce 1: Job Tracker, Task Tracker, Map Slot, Reduce Slot
YARN: Resource Manager, Application Master, Timeline Server, Node Manager
Hadoop v2 (YARN) Architecture
• Resource Manager
  o Manages and allocates cluster resources
  o Application scheduling
  o Applications Manager
• Node Manager
  o Per-machine agent
  o Manages life-cycle of container
  o Monitors resources
• Application Master
  o Per-application
  o Manages application scheduling and task execution
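The division of labour between the Resource Manager and the Node Managers can be sketched with a small toy model. It is not the YARN API: `ToyCluster`, `Node`, and the first-fit placement rule are assumptions made for illustration, and real YARN schedulers are considerably more elaborate. The sketch only shows the idea that one central component tracks per-node capacity and decides where each container goes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not the YARN API): a resource manager placing containers on node managers.
class ToyCluster {
    // One entry per node manager, tracking the memory still available on that machine.
    static class Node {
        final String name;
        int freeMemoryMb;

        Node(String name, int freeMemoryMb) {
            this.name = name;
            this.freeMemoryMb = freeMemoryMb;
        }
    }

    private final List<Node> nodes = new ArrayList<Node>();

    void addNode(String name, int memoryMb) {
        nodes.add(new Node(name, memoryMb));
    }

    // First-fit placement (an assumption of this sketch).
    // Returns the name of the node that hosts the container, or null if nothing fits.
    String allocate(int memoryMb) {
        for (Node node : nodes) {
            if (node.freeMemoryMb >= memoryMb) {
                node.freeMemoryMb -= memoryMb;
                return node.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyCluster rm = new ToyCluster();
        rm.addNode("node1", 2048);
        rm.addNode("node2", 4096);
        System.out.println(rm.allocate(1024)); // fits on node1
        System.out.println(rm.allocate(2048)); // node1 has only 1024 MB left, so node2
        System.out.println(rm.allocate(8192)); // larger than any node, so null
    }
}
```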
[Diagram: Clients submit jobs to the ResourceManager; NodeManagers host containers (Cntr) and per-application AppMasters. Arrows: MapReduce status, job submission, node status, resource request.]
Application Submission workflow
[Diagram: YarnClient, an RM node (ApplicationsManager + Scheduler), and NM nodes hosting the Application Master and containers]
1) Submit application
2) Launch Application Master
3) AM registers with RM
4) AM negotiates for containers
5) Launch containers
Legend: RM = Resource Manager, NM = Node Manager, AM = Application Master; a separate arrow style marks heartbeats
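The numbered workflow can be written down as an ordered sequence in which each step is only valid once the previous one has happened. The following is a minimal sketch with illustrative names, not YARN classes. The sender and receiver attached to each step are a reading of the diagram: the RM asks an NM to launch the AM, and the AM asks NMs to launch containers.

```java
// Toy model of the five numbered steps (illustrative names, not YARN classes).
class SubmissionWorkflow {
    enum Step {
        SUBMIT_APPLICATION("YarnClient", "RM"),
        LAUNCH_APPLICATION_MASTER("RM", "NM"),
        AM_REGISTERS("AM", "RM"),
        AM_NEGOTIATES_CONTAINERS("AM", "RM"),
        LAUNCH_CONTAINERS("AM", "NM");

        final String from;
        final String to;

        Step(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    private int completed = 0;

    // Accepts a step only if it is the next one in the sequence.
    boolean perform(Step step) {
        if (step.ordinal() != completed) {
            return false;
        }
        completed++;
        return true;
    }

    boolean finished() {
        return completed == Step.values().length;
    }

    public static void main(String[] args) {
        SubmissionWorkflow workflow = new SubmissionWorkflow();
        for (Step step : Step.values()) {
            workflow.perform(step);
            System.out.println(step.from + " -> " + step.to + ": " + step);
        }
        System.out.println("finished: " + workflow.finished());
    }
}
```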
Sample YARN application - Client
1. Start the service - YarnClient
   - YarnClient.start()
2. Create Application object - YarnClientApplication
   - YarnClient.createApplication()
3. Set up App Context - ApplicationSubmissionContext
   - ApplicationSubmissionContext represents the information the ResourceManager needs to launch the ApplicationMaster (AppName, Priority, ContainerLaunchContext, …)
4. Submit application to Resource Manager
   - YarnClient.submitApplication(ApplicationSubmissionContext)
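The order of the four client steps can be sketched with a self-contained stand-in. This is deliberately not the real `org.apache.hadoop.yarn.client.api.YarnClient`: `ToyYarnClient`, its `Context` class, the launch command, and the returned id format are all hypothetical, chosen so the example runs without Hadoop on the classpath. It only mirrors the sequence start, create application, fill in the context, submit.

```java
// Toy stand-in for the client steps (not the real YarnClient API).
class ToyYarnClient {
    // Stands in for ApplicationSubmissionContext; only three illustrative fields.
    static class Context {
        String appName;
        int priority;
        String amCommand; // stands in for the ContainerLaunchContext
    }

    private boolean started;
    private int nextId = 1;

    // Step 1: start the client service.
    void start() {
        started = true;
    }

    // Step 2: create the application object, which carries an empty context.
    Context createApplication() {
        if (!started) {
            throw new IllegalStateException("client not started");
        }
        return new Context();
    }

    // Step 4: submit; returns a made-up application id once the context is complete.
    String submitApplication(Context ctx) {
        if (!started) {
            throw new IllegalStateException("client not started");
        }
        if (ctx.appName == null || ctx.amCommand == null) {
            throw new IllegalArgumentException("context needs a name and an AM launch command");
        }
        return "application_" + (nextId++);
    }

    public static void main(String[] args) {
        ToyYarnClient client = new ToyYarnClient();
        client.start();
        Context ctx = client.createApplication();
        ctx.appName = "simple-app";          // Step 3: set up the context
        ctx.priority = 0;
        ctx.amCommand = "java MyAppMaster";  // hypothetical launch command
        System.out.println(client.submitApplication(ctx));
    }
}
```

In a real client the same sequence is carried out with the YARN classes named in the steps above.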
Sample YARN Application - App Master
1. Register App Master with Resource Manager
   - AMRMClient.registerApplicationMaster
2. Negotiate containers from Resource Manager
   - Provides ContainerRequest - request for container resources
   - AMRMClient.addContainerRequest
3. Build ContainerLaunchContext
   - Uses container returned by Resource Manager
   - ContainerLaunchContext represents the information the Node Manager needs to launch a container (ContainerId, Commands, Environment, LocalResources, …)
Sample YARN Application - App Master (cont…)
4. Launch container using NMClient.startContainer
5. Wait till all containers are done
   - AllocateResponse.getCompletedContainersStatuses
6. Unregister application from Resource Manager
   - AMRMClient.unregisterApplicationMaster
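The App Master steps form a loop: ask for containers, receive them over successive allocate calls, launch them, and stop once every container has completed. Below is a minimal sketch of that loop under simplifying assumptions. `ToyAppMaster` and `ToyResourceManager` are hypothetical, the resource manager grants a fixed number of containers per call, and each launched container is treated as finishing immediately. The real AMRMClient and NMClient calls are not used.

```java
import java.util.ArrayList;
import java.util.List;

// Toy application-master loop (not the real AMRMClient/NMClient API).
class ToyAppMaster {
    // Hypothetical resource manager that grants at most perHeartbeat containers per allocate call.
    static class ToyResourceManager {
        private final int perHeartbeat;
        private int pending;

        ToyResourceManager(int perHeartbeat) {
            this.perHeartbeat = perHeartbeat;
        }

        void addContainerRequest() {
            pending++;
        }

        // Returns ids for the containers granted in this call.
        List<Integer> allocate(int firstId) {
            List<Integer> granted = new ArrayList<Integer>();
            while (granted.size() < perHeartbeat && pending > 0) {
                granted.add(firstId + granted.size());
                pending--;
            }
            return granted;
        }
    }

    // Runs the loop and returns the number of allocate calls that were needed.
    static int run(int containersNeeded, int perHeartbeat) {
        if (perHeartbeat <= 0) {
            throw new IllegalArgumentException("perHeartbeat must be positive");
        }
        ToyResourceManager rm = new ToyResourceManager(perHeartbeat); // step 1: register (implicit)
        for (int i = 0; i < containersNeeded; i++) {
            rm.addContainerRequest();                                 // step 2: request containers
        }
        int completed = 0;
        int heartbeats = 0;
        while (completed < containersNeeded) {                        // step 5: wait until all are done
            List<Integer> granted = rm.allocate(completed);
            heartbeats++;
            // Steps 3-4 would build a launch context and start each granted container;
            // in this sketch every granted container completes at once.
            completed += granted.size();
        }
        return heartbeats;                                            // step 6: unregister would follow
    }

    public static void main(String[] args) {
        System.out.println("allocate calls: " + run(5, 2));
    }
}
```

With 5 requested containers and 2 granted per call, the loop needs 3 allocate calls (2 + 2 + 1).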
References
● Simple YARN code example
  ○ https://github.com/hortonworks/simple-yarn-app
● Document references
  ○ https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
  ○ http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
  ○ http://www.slideshare.net/
Resources
Apache Apex Community Page
Apache Apex LinkedIn Group