Apache Hadoop YARN Overview
Table of Contents
1. Apache Hadoop YARN
2. YARN Daemons
3. Why is YARN needed?
4. Scheduling
5. Fault Tolerance
6. Q & A
APACHE HADOOP YARN
HADOOP 1 vs. 2 YARN
Apache Hadoop YARN
(Yet Another Resource Negotiator)
• YARN components
  • Resource Manager
    • Applications Manager
    • Scheduler
  • Application Master
  • Node Manager
  • Container
• Introduced in hadoop-0.23.0 (released November 11, 2011)
• Latest release at the time of this presentation: hadoop-2.4.0 (released April 7, 2014)
YARN DAEMONS
YARN Daemons
Resource Manager (RM)
• Runs on the master node
• Global resource scheduler
• Allocates and manages the resources requested by applications
Node Manager (NM)
• Runs on each slave node
• Manages the node's resources
• Allocates the node's resources to containers
YARN Daemons
Containers
• Allocated on an NM at the request of the RM
• Receive a share of the slave node's CPU cores and memory
• An application runs as a set of containers
Application Master (AM)
• One per application
• Defines the application's resource specification
• Runs inside a container itself
• Requests containers from the RM for the application's tasks
WHY IS YARN NEEDED?
Why is YARN needed?
• Cluster scalability
  • HADOOP 1
    • There is exactly one JobTracker regardless of cluster size (a bottleneck)
    • For example, in a cluster of 4,000 nodes a single JobTracker manages the jobs of every node
  • HADOOP 2 YARN
    • The JobTracker's functions are split between the Resource Manager and the Application Master
    • Multiple applications can run in the cluster at the same time
    • Each application has its own Application Master that manages all of its tasks
[Diagram] Hadoop 1 vs. Hadoop 2 YARN: division of responsibilities
• Hadoop 1: JobTracker — job & task management and resource scheduling; TaskTracker — task computation; the platform and the application framework form one combined layer
• Hadoop 2 YARN: ResourceManager — resource scheduling; NodeManager — resource monitoring & enforcement; Application Master — job & task management; the platform (YARN) is separated from the application framework
Why is YARN needed?
• Application compatibility
  • HADOOP 1
    • Applications other than MapReduce cannot share the cluster's resources
  • HADOOP 2 YARN
    • MapReduce and other applications can run side by side in the same cluster
Why is YARN needed?
• Flexible resource allocation
  • HADOOP 1
    • Resources (memory, CPU cores) are assigned to slots in advance, and slots are handed to jobs according to a fixed configuration
    • Resources are not returned until the entire job finishes
  • HADOOP 2 YARN
    • Whenever a request arrives, resources are allocated as containers sized to match the requested specification
    • Each container can have a different resource specification (see the property sketch below)
    • Every task runs in a container, and the resources are returned as soon as the task finishes
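To make the "different specification per container" point concrete, a YARN MapReduce job can request differently sized containers for its map tasks, its reduce tasks, and its own ApplicationMaster. A minimal sketch using the standard MRv2 memory properties; the sizes are arbitrary example values, not recommendations.

mapred-site.xml (example sizes only)
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>   <!-- each map task runs in a 1 GB container -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>   <!-- each reduce task runs in a 2 GB container -->
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1536</value>   <!-- the container for the MapReduce ApplicationMaster itself -->
</property>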
HADOOP 1: slot-based resource allocation
[Diagram sequence] A client submits a job to the JobTracker, which assigns map and reduce tasks to the fixed slots pre-configured on the TaskTrackers (flows shown: MapReduce status, job submission, resource allocation). The map tasks run and complete, then the reduce tasks run, but the occupied slots are only returned when the whole job finishes. A second client that submits a job in the meantime is left pending; only after the first job completes and its slots are released are the second job's map and reduce tasks assigned to slots and run.
Client
HADOOP 2 YARN
container base resource allocation
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
26
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
27
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
28
App
Master
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
29
App
Master
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
30
App
Master
Container Container
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
31
App
Master
Map
(ready)
Map
(ready)
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
32
App
Master
Map
(running)
Map
(running)
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
33
App
Master
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
34
App
Master
Container
Container
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
35
App
Master
Reduce
(ready)
Reduce
(ready)
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
36
App
Master
Reduce
(running)
Reduce
(running)
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
37
App
Master
Reduce
(running)
Reduce
(running)
Client
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
38
App
Master
Reduce
(running)
Reduce
(running)
Client
App
Master
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
39
App
Master
Reduce
(running)
Reduce
(running)
Client
App
Master
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
40
App
Master
Reduce
(running)
Reduce
(running)
Client
Container
Container
App
Master
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
41
App
Master
Reduce
(running)
Reduce
(running)
Client
Map
(ready)
Map
(ready)
App
Master
HADOOP 2 YARN
container base resource allocation
Client
Resource
Manager
Node
Manager
Node
Manager
Node
ManagerMapReduce Status
Job Submission
Resource Request
Resource Allocation
42
App
Master
Reduce
(running)
Reduce
(running)
Client
Map
(ready)
Map
(ready)
App
Master
SCHEDULING
FIFO Scheduler (Hadoop 1.x)
• The default scheduler in Hadoop 1.x
• Jobs can be assigned a priority, and each priority level can have its own FIFO queue
• Five priority levels exist (very high, high, normal, low, very low)
• Because higher-priority queues receive slots first, jobs in lower-priority queues can starve
• The Fair Scheduler is used to mitigate this starvation problem (see the sketch below)
• Not recommended outside small clusters used for experiments and testing
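As a rough sketch of that last point, in Hadoop 1.x the JobTracker's scheduler is swapped from the default FIFO implementation to the Fair Scheduler with a single mapred-site.xml property; the property and class names below match the stock Hadoop 1.x Fair Scheduler, but verify them against the distribution in use.

mapred-site.xml
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>   <!-- replaces the default FIFO scheduler -->
</property>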
Fair Scheduler (Hadoop 1.x)
Unlike FIFO, task slots are assigned to pools, and each user consumes the slots of the pool they use
• Pool
  • Each user gets a pool so that the cluster is shared fairly among users
  • Each pool can be configured with a guaranteed minimum slot capacity
  • While a single job runs alone it can use all of the cluster's resources
  • Even if one user (pool) submits more jobs, the limited number of task slots keeps that pool from being allocated a larger share of the cluster
  • When all running tasks of a job finish, its slots are handed to other pools
Fair Scheduler (Hadoop 1.x)
• Slot
  • Every slot gets the same share of CPU time
  • Pools with the same configuration get the same number of slots and the same CPU time
  • Within a pool, jobs run in FIFO order
• Preemption (see the allocation-file sketch below)
  • When a pool is not receiving its fair share of resources, it can preempt slots from other pools
  • Enabled with mapred.fairscheduler.preemption (default: false)
  • Triggered when a pool has gone without its share for longer than the configured preemption timeout
  • A job in a pool that holds more than its capacity is killed and its slots are preempted
  • The preempted job is later re-allocated resources and restarted from the beginning
  • The most recently started job is the one that gets preempted, to minimize the amount of already-completed work that is thrown away
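A sketch of how the pool minimums and preemption settings above might be configured in Hadoop 1.x: preemption is switched on with mapred.fairscheduler.preemption (named on this slide), and the per-pool guarantees and timeouts live in the allocation file referenced by mapred.fairscheduler.allocation.file. The pool name, slot counts, and timeout values are made-up examples, and the exact element names should be checked against the Hadoop 1.x Fair Scheduler documentation.

mapred-site.xml
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>
</property>

fair-scheduler.xml (allocation file)
<allocations>
  <pool name="analytics">
    <minMaps>10</minMaps>                                        <!-- guaranteed minimum map slots -->
    <minReduces>5</minReduces>                                   <!-- guaranteed minimum reduce slots -->
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>   <!-- seconds under the minimum share before preempting -->
  </pool>
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>   <!-- seconds under the fair share before preempting -->
</allocations>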
Capacity Scheduler (Hadoop 2.x)
• The default scheduler in Hadoop 2.x YARN
• yarn.resourcemanager.scheduler.class (set in yarn-site.xml; sketch below)
  • org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
• Queue hierarchy and capacities require a separate configuration file
  • capacity-scheduler.xml
• Queues can be organized into a hierarchy
  • A queue can be the child of another queue (a tree structure)
  • There is a root queue, and every queue in the cluster is a sub-queue of it
• Each queue has an assigned capacity, so each user effectively gets a private share of the cluster
• Preemption was dropped in hadoop 2.x with no plan to add it back (as of this presentation)
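The scheduler class listed above is selected through yarn-site.xml. The CapacityScheduler is already the default in Hadoop 2.x, so the setting below only makes the choice explicit (or switches to another scheduler).

yarn-site.xml
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>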
Capacity Scheduler Configuration
• yarn.scheduler.capacity.<queue-path>.queues
  • Defines the sub-queues of the queue, separated by commas
• yarn.scheduler.capacity.<queue-path>.capacity
  • The minimum share of the parent queue's capacity guaranteed to this queue (0-100)
• yarn.scheduler.capacity.<queue-path>.maximum-capacity
  • The maximum share of the parent queue's capacity this queue may grow to (0-100)
• yarn.scheduler.capacity.<queue-path>.state
  • The queue's run state <STOPPED, RUNNING>
• yarn.scheduler.capacity.<queue-path>.user-limit-factor
  • The limit on the resources a single user may occupy while the queue has capacity available (0-1)
(A capacity-scheduler.xml sketch for the worked example that follows is shown below.)
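Put together, the worked example on the following slides (queues A and B under root, with B split into b1 and b2) corresponds to a capacity-scheduler.xml along these lines; the capacity values come straight from the slides, and the surrounding boilerplate is the usual Hadoop property format.

capacity-scheduler.xml (sketch for the example below)
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>A,B</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.A.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.capacity</name>
  <value>40</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.queues</name>
  <value>b1,b2</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.b1.capacity</name>
  <value>20</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.B.b2.capacity</name>
  <value>80</value>
</property>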
Capacity Scheduler Configuration
Example cluster: 10 nodes with 10 GB of memory each (yarn.nodemanager.resource.memory-mb = 10240), for a total cluster capacity of 100 GB; the root queue therefore represents 100 GB.
Capacity Scheduler Configuration
• root.queues = A, B with root.A.capacity = 60 and root.B.capacity = 40
  → root: 100 GB, A: 60 GB, B: 40 GB
• root.B.queues = b1, b2 with root.B.b1.capacity = 20 and root.B.b2.capacity = 80
  → b1: 8 GB and b2: 32 GB (20% and 80% of B's 40 GB)
Capacity Scheduler Configuration
Worked example (A = 60 GB, B = 40 GB; b1 = 8 GB and b2 = 32 GB inside B):
• Assume queue A has consumed all of its capacity.
  → A: 60 GB used (100%); B, b1, b2 idle; root: 60 GB used (60%), 40 GB available.
• Two new jobs, each requesting 4 GB, are submitted to b1.
  → b1: 8 GB used (100%); B: 8 GB used (20%), 32 GB available; root: 68 GB used (68%), 32 GB available.
• With b1's capacity exhausted, another job requesting 8 GB is submitted to b1.
  → b1 grows past its guaranteed share to 16 GB used (200%); B: 16 GB used (40%), 24 GB available; root: 76 GB used (76%), 24 GB available.
• A job requesting 32 GB is submitted to b2, but the request exceeds the cluster's currently available resources (only 24 GB free), so it waits until other jobs release resources.
• When the job holding 8 GB in b1 finishes and returns its resources, the reclaimed capacity is allocated to the job waiting in b2.
  → A: 100%, b1: 100% (8 GB), b2: 100% (32 GB); root: 100 GB used (100%), 0 GB available.
FAULT TOLERANCE
Fault Tolerance
• Task (Container) — same behavior as before
  • The MRAppMaster retries a task that hangs or throws an exception
    • Default 4 attempts: mapred.map.max.attempts, mapred.reduce.max.attempts
  • If too many tasks fail, the application is considered to have failed
  • Tasks that are killed deliberately are not counted as failures
• Application Master
  • If the AM stops sending heartbeats, the RM tries to restart it
    • Default 2 attempts: yarn.resourcemanager.am.max-retries (config sketch below)
  • Optional MRAppMaster setting: job recovery
    • if false, all tasks are restarted
    • if true, only the failed tasks are rerun and the MRAppMaster recovers the state of the completed tasks
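The retry limits above are ordinary configuration properties; a sketch using the names from this slide (newer 2.x releases also expose yarn.resourcemanager.am.max-attempts for the same purpose).

mapred-site.xml
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>   <!-- attempts per map task before the job is marked failed -->
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>4</value>   <!-- attempts per reduce task -->
</property>

yarn-site.xml
<property>
  <name>yarn.resourcemanager.am.max-retries</name>
  <value>2</value>   <!-- how many times the RM restarts a failed ApplicationMaster -->
</property>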
Fault Tolerance
• Node Manager
  • If an NM stops sending heartbeats to the RM, the RM removes that node from the active node list
    • Default 10 minutes: yarn.resourcemanager.nm.liveness-monitor.expiry-interval-ms
  • The MRAppMaster treats the tasks on that node as failed
  • If the node running the AppMaster fails, the application is considered failed
• Resource Manager
  • If the RM goes down, no job or task can be placed in any container
  • Can be configured for HA (High Availability), as sketched below
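A minimal sketch of ResourceManager HA in yarn-site.xml, available from hadoop-2.4.0. The rm1/rm2 host names, the cluster id, and the ZooKeeper quorum are placeholder values; check the property names against the release in use.

yarn-site.xml (RM HA sketch)
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>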
Q & A
Thank you.