Hadoop MapReduce – A System’s View
By Niketan Pansare (np6@rice.edu), Rice University
Job Submission at the Client’s Side
Three nodes are involved: the Client Node, the JobTracker Node, and the TaskTracker Node.
On the Client Node, the client program (Client pgm) creates a Job; job.submit() hands off to a JobClient, which calls jobClient.submitJobInternal(). Inside submitJobInternal():

1. Get a new job ID through the client stub to the JobTracker: jobSubmissionClient.getNewJobID() is an RPC call to the JobTracker.
2. Check the output specification: jobConf.getOutputFormat().checkOutputSpecs().
3. Copy the job resources. JobSubmissionFiles supplies the destination paths:
   - Job staging area (getStagingArea())
   - Job submission area
   - Job config file path (getJobConfPath())
   - Job jar file path (getJobJar())
   - Information about splits: (a) split meta file (getJobSplitMetaFile()), (b) split file (getJobSplitFile())
4. Copy the jar file to the shared FS (HDFS) with replication = mapred.submit.replication (default: 10), so that many TaskTrackers can fetch it; see the sketch below.
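As a concrete example, here is a minimal driver sketch using the classic org.apache.hadoop.mapred API; calling submitJob() triggers exactly the chain above. The identity mapper/reducer keep it self-contained, and the input/output paths come from the command line:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SubmitDemo {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitDemo.class);
    conf.setJobName("submit-demo");
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class);    // TextInputFormat keys
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Replication factor used when staging the job jar in HDFS (default 10).
    conf.setInt("mapred.submit.replication", 10);

    // submitJob() drives everything described above: new job ID,
    // checkOutputSpecs(), staging of jar/splits/config, then RPC submitJob().
    RunningJob running = new JobClient(conf).submitJob(conf);
    running.waitForCompletion();
  }
}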
5. Copy the splits and the config to the shared FS (HDFS):
   a. Compute splits: jobConf.getInputFormat().getSplits()
   b. Sort the splits based on size, biggest first (to randomize instead, modify the Arrays.sort() call in writeSplits(); see the sketch after this list)
   c. Copy the split “meta” file to the JobTracker, into the path given by JobSubmissionFiles (JobSplit.SplitMetaInfo)
   d. Copy the split file to HDFS (replication = 10), into the path given by JobSubmissionFiles (JobSplit.TaskSplitIndex)
   e. Copy the job config file to the JobTracker path given by JobSubmissionFiles
6. After copying the job resources (jar, split files, config), the client stub makes the RPC submitJob() to the JobTracker.

Done with job submission at the client side ... Now let’s look at the JobTracker’s side.
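A small sketch of the split ordering in step 5b, using toy types rather than Hadoop’s real InputSplit; the shuffle line is the randomization swap the slide suggests:

import java.util.*;

public class SplitOrdering {
  record Split(String name, long length) {}

  public static void main(String[] args) {
    List<Split> splits = new ArrayList<>(List.of(
        new Split("s0", 64), new Split("s1", 128), new Split("s2", 32)));

    // Default ordering: biggest split first, so the longest tasks start early.
    splits.sort(Comparator.comparingLong(Split::length).reversed());
    System.out.println(splits);   // s1 (128), s0 (64), s2 (32)

    // Randomization alternative: replace the sort with a shuffle.
    Collections.shuffle(splits, new Random(42));
  }
}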
Job Submission at the JobTracker Node
The client stub invokes the RPC submitJob() on the JobTracker (Job tracker Node).

On the JobTracker node, submitJob():
1. Reads the job config file and creates a JobInProgress (job) object.
2. Calls job.initTasks() to initialize the job:
   - createSplits() reads the split meta file (JobSplit.SplitMetaInfo) and returns JobSplit.TaskSplitMetaInfo[] splits.
   - TaskInProgress[] maps: 1 map per split.
   - Map<Node, List<TIP>> nonRunningMapCache: non-running maps indexed by node, for locality-aware scheduling.
   - TaskInProgress[] reduces: the number of reduce tasks is given by mapred.reduce.tasks.
   - Set<TaskInProgress> nonRunningReduces.
   - Other bookkeeping structures: runningMapCache, nonLocalMaps, failedMaps, ... plus JobProfile and JobStatus.
   - TaskInProgress[2] setup and TaskInProgress[2] cleanup: run by a TaskTracker and used to set up and to clean up tasks; the array size 2 = one for the map side and the other for the reduce side.
     What code does such a TaskInProgress run? It is user-defined: for setup and cleanup it is specified by mapred.output.committer.class (default: FileOutputCommitter).
   Done initializing.
3. Asks the QueueManager (queueManager): does the queue exist, and does the user have permission to submit to it?
4. Calls addJob(), which notifies the listeners of the queue.

Done submitting the job !!!
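To fix ideas, here is a toy, self-contained model of the structures job.initTasks() builds; the stand-in classes below are simplifications for illustration, not Hadoop’s real types:

import java.util.*;

public class InitTasksModel {
  static class SplitMeta { String[] locations; long length; }
  static class TIP { int partition; boolean isMap;
    TIP(int p, boolean m) { partition = p; isMap = m; } }

  public static void main(String[] args) {
    SplitMeta[] splits = new SplitMeta[4];      // from the split meta file
    TIP[] maps = new TIP[splits.length];        // one map task per split
    for (int i = 0; i < splits.length; i++) maps[i] = new TIP(i, true);

    int numReduces = 2;                         // mapred.reduce.tasks
    TIP[] reduces = new TIP[numReduces];
    for (int i = 0; i < numReduces; i++) reduces[i] = new TIP(i, false);

    // Locality cache: node -> map tasks whose split has a replica there.
    Map<String, List<TIP>> nonRunningMapCache = new HashMap<>();
    Set<TIP> nonRunningReduces = new HashSet<>(Arrays.asList(reduces));

    // One map-side and one reduce-side TaskInProgress each.
    TIP[] setup   = { new TIP(maps.length, true),     new TIP(numReduces, false) };
    TIP[] cleanup = { new TIP(maps.length + 1, true), new TIP(numReduces + 1, false) };
    System.out.println(maps.length + " maps, " + reduces.length + " reduces");
  }
}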
TaskScheduler class
• Used by the JobTracker to schedule Tasks on TaskTrackers.
• Uses one or more JobInProgressListeners to receive notifications about jobs.
• Uses ClusterStatus to get information about the state of the cluster.
• Methods:
  • start(), terminate(), refresh()
  • Collection<JobInProgress> getJobs(String queueName)
  • List<Task> assignTasks(TaskTracker)
• Implementations (specified by mapred.jobtracker.taskScheduler):
  • Default: FIFO scheduler (o.a.h.mapred.JobQueueTaskScheduler)
    - Multiple queues, each with a different priority (VERY_HIGH, HIGH, ...)
    - The user specifies the job priority (mapred.job.priority)
    - Logic: first select the queue with the highest priority, then FIFO within that queue
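As a sketch of how a scheduler plugs in: these classes live in the org.apache.hadoop.mapred package and are package-private there, which is why contrib schedulers such as FairScheduler are declared in that same package. The class below is a minimal do-nothing skeleton, not a real scheduler:

package org.apache.hadoop.mapred;   // scheduler base classes are package-private

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Minimal skeleton of a pluggable scheduler. A real scheduler registers a
// JobInProgressListener in start() and consults the cluster state via
// taskTrackerManager inside assignTasks().
public class DoNothingScheduler extends TaskScheduler {
  @Override public void start() throws IOException { /* register listeners */ }
  @Override public void terminate() throws IOException { /* unregister */ }

  @Override
  public List<Task> assignTasks(TaskTracker tracker) throws IOException {
    return Collections.emptyList();   // never hands out work
  }

  @Override
  public Collection<JobInProgress> getJobs(String queueName) {
    return Collections.emptyList();
  }
}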
Task Scheduling
On the Job tracker Node, the JobQueueTaskScheduler registers a JIPListener with the JobTracker; when a job is added, the listener receives the callback jobAdded(JIP).

List<Task> assignTasks(TaskTracker):
1. Calculate availableMapSlots, using the TaskTrackerStatus, the ClusterStatus, and each JobInProgress (JIP) known to the JIPListener (a numeric example follows):

   availableMapSlots = trackerCurrentMapCapacity - trackerRunningMaps
                     = min(ceil(mapLoadFactor * trackerMapCapacity), trackerMapCapacity) - trackerRunningMaps

   where
     trackerMapCapacity = taskTrackerStatus.getMaxMapSlots()
     trackerRunningMaps = taskTrackerStatus.countMapTasks()
     mapLoadFactor = [ sum over all jobs of (JIP’s numMapTask - finishedMapTask) ] / clusterStatus.getMaxMapTasks()
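Plugging toy numbers into the formula (a self-contained sketch; the four input values below are made up):

public class SlotMath {
  public static void main(String[] args) {
    int trackerMapCapacity   = 4;    // taskTrackerStatus.getMaxMapSlots()
    int trackerRunningMaps   = 1;    // taskTrackerStatus.countMapTasks()
    int clusterMaxMapTasks   = 100;  // clusterStatus.getMaxMapTasks()
    int remainingMapsAllJobs = 60;   // sum over jobs of numMapTask - finishedMapTask

    double mapLoadFactor = (double) remainingMapsAllJobs / clusterMaxMapTasks;  // 0.6
    int trackerCurrentMapCapacity =
        Math.min((int) Math.ceil(mapLoadFactor * trackerMapCapacity),
                 trackerMapCapacity);                                           // min(3, 4) = 3
    int availableMapSlots = trackerCurrentMapCapacity - trackerRunningMaps;     // 3 - 1 = 2
    System.out.println("availableMapSlots = " + availableMapSlots);
  }
}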
2. Fill those slots by walking the job queue. JIPListener’s getJobQueue() uses a Map<JobSchedulingInfo, JIP> with the FIFO_JOB_QUEUE comparator, so jobs in a higher-priority queue are processed first:

   for (i = 1 to availableMapSlots) {
     for (JIP job : JIPListener.getJobQ()) {
       Task t = job.findNewMapTask();
       // findNewMapTask() returns, in order of preference:
       // - the task with the most failures, provided it has not failed on the
       //   given machine, without considering locality (JIP’s failedMaps)
       // - a non-running task chosen using locality info (JIP’s nonRunningMapCache)
       // - a speculative task
       assignedTasks.add(t);
       // Also, make sure there are free slots in the cluster for speculative tasks
     }
   }

3. Do the same thing for reducers.
4. return assignedTasks
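The double loop can be modeled with a small self-contained toy (plain Java collections standing in for JIPListener’s job queue; failure handling, locality, and speculation are omitted):

import java.util.*;

public class AssignToy {
  public static void main(String[] args) {
    Map<String, Deque<String>> jobQ = new LinkedHashMap<>();   // FIFO job order
    jobQ.put("jobA", new ArrayDeque<>(List.of("a0", "a1")));
    jobQ.put("jobB", new ArrayDeque<>(List.of("b0")));

    int availableMapSlots = 2;
    List<String> assignedTasks = new ArrayList<>();
    // Fill each free slot by walking the queue and taking the first job
    // that still has a runnable task.
    for (int i = 0; i < availableMapSlots; i++) {
      for (Deque<String> tasks : jobQ.values()) {
        if (!tasks.isEmpty()) { assignedTasks.add(tasks.poll()); break; }
      }
    }
    System.out.println(assignedTasks);   // [a0, a1]: jobA drains first (FIFO)
  }
}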
TaskScheduler implementations (continued)
• Default FIFO scheduler (o.a.h.mapred.JobQueueTaskScheduler)
  - Does not support preemption
  - Bad for a production cluster (high priority can be misused)
• Facebook’s FairScheduler
  - Goal: provide fast response time for small jobs and guaranteed service levels for production jobs.
  - Pools with minimum shares. Example: the cluster has 100 slots to allocate across three pools, two of which have min shares of 30 and 40 slots. Each pool ends up with its share (40, 30, and 30 slots), and two jobs inside one pool split its 30 slots 15/15.
  - Additional features: job weights for unequal sharing (based on priority or size); limits on the number of running jobs per user/pool.
  - Usage:
    cp build/contrib/fairscheduler/*.jar lib
    set mapred.jobtracker.taskScheduler to o.a.h.m.FairScheduler
    set mapred.fairscheduler.allocation.file to /path/pool.xml
• Yahoo’s CapacityScheduler
  - ~ FairScheduler, but with queues instead of pools. Each queue gets a share (%) of the cluster and can hold jobs of different priorities.
  - FIFO scheduling within each queue; scheduling is more deterministic than FairScheduler’s.
  - Also, unlike the other two, provides support for memory-based scheduling and preemption.
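For reference, a plausible pool.xml in the Hadoop 1.x FairScheduler allocation-file schema; the pool names and numbers here are hypothetical, and min shares are expressed per task type (minMaps/minReduces) rather than as a single slot count:

<?xml version="1.0"?>
<!-- Hypothetical pool.xml; tag names follow the 1.x FairScheduler schema. -->
<allocations>
  <pool name="production">
    <minMaps>40</minMaps>
    <minReduces>40</minReduces>
    <weight>2.0</weight>
  </pool>
  <pool name="research">
    <minMaps>30</minMaps>
    <minReduces>30</minReduces>
    <maxRunningJobs>5</maxRunningJobs>
  </pool>
  <userMaxJobsDefault>3</userMaxJobsDefault>
</allocations>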
Task creation

Job tracker Node                          Task tracker Node

JobTracker (TaskScheduler)                TaskTracker (jobClient)

On the TaskTracker, the jobClient field is an RPC stub to the JobTracker:

this.jobClient = (InterTrackerProtocol)
    UserGroupInformation.getLoginUser().doAs(
        new PrivilegedExceptionAction<Object>() {
          public Object run() throws IOException {
            return RPC.waitForProxy(InterTrackerProtocol.class,
                InterTrackerProtocol.versionID, jobTrackAddr, fConf);
          }
        });

The TaskTracker's main loop drives the heartbeat:

void run() { offerService(); }

offerService() {
  while (/* task tracker running flags */) {
    HeartbeatResponse heartbeatResponse = transmitHeartBeat(now);  // jobClient.heartbeat(…) under the hood
    TaskTrackerAction[] actions = heartbeatResponse.getActions();
    // type: LaunchTaskAction, CommitTaskAction, or explicit cleanup directive
    markUnresponsiveTasks();
    killOverflowingTasks();  // if low disk space: reduce first, then least progress
  }
}

On the JobTracker, each heartbeat is delegated to the TaskScheduler:

List<Task> assignTasks(TaskTracker)

Heartbeat protocol:
- Periodic
- Indicates health of the TaskTracker
- Failure detection
- Remote Procedure Call
- Piggybacks directives
  - Launch a task
  - Perform cleanup/commit
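To make the piggybacked directives concrete, here is a self-contained model of the dispatch step inside offerService(). The action class names follow the slide, but the stub types and print statements are stand-ins for the real TaskTracker logic:

import java.util.List;

public class HeartbeatDispatchSketch {
  // Stubs standing in for o.a.h.mapred.TaskTrackerAction and its subclasses.
  static abstract class TaskTrackerAction {}
  static class LaunchTaskAction extends TaskTrackerAction {}
  static class CommitTaskAction extends TaskTrackerAction {}
  static class CleanupAction extends TaskTrackerAction {}

  static void dispatch(List<TaskTrackerAction> actions) {
    for (TaskTrackerAction a : actions) {
      if (a instanceof LaunchTaskAction) {
        System.out.println("queue task for mapLauncher/reduceLauncher");
      } else if (a instanceof CommitTaskAction) {
        System.out.println("promote the task's output (speculative winner)");
      } else {
        System.out.println("explicit cleanup directive");
      }
    }
  }

  public static void main(String[] args) {
    dispatch(List.of(new LaunchTaskAction(), new CommitTaskAction(),
                     new CleanupAction()));
  }
}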
Task creation

Job tracker Node                          Task tracker Node

JobTracker (TaskScheduler)                TaskTracker (jobClient, offerService loop)

TaskTracker uses 2 internal classes to launch tasks:
- TaskLauncher (two instances: mapLauncher, reduceLauncher)
- TaskInProgress, whose launchTask() calls TaskRunner

TaskRunner: start()
Task creation

Job tracker Node                          Task tracker Node

JobTracker (TaskScheduler)                TaskTracker (jobClient, offerService loop)

LaunchTaskAction → TaskRunner: start() → void run() { … }

- Launches a new "child" JVM per task using class JvmManager.
  Why? Any bug in map/reduce doesn't affect the TaskTracker.
- Builds the child JVM options from property mapred.child.java.opts (heap size (max/initial), garbage-collection options). Default: -Xmx200m
- To limit additional processes spawned by the child JVM (e.g. Hadoop Streaming), use property mapred.child.ulimit (virtual-memory limit).
- For short-lived tasks, reuse JVMs via mapred.job.reuse.jvm.num.tasks (default: 1).
- Tasks within a given JVM run sequentially; tasks across JVMs run in parallel.
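A sketch of tuning these child-JVM knobs through the old mapred API; the values are illustrative, not recommendations:

import org.apache.hadoop.mapred.JobConf;

public class ChildJvmTuning {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Heap/GC options for each task's child JVM (default: -Xmx200m).
    conf.set("mapred.child.java.opts", "-Xmx512m -XX:+UseParallelGC");
    // Virtual-memory ceiling (in KB) for the child JVM and any process it
    // spawns, e.g. Hadoop Streaming binaries.
    conf.set("mapred.child.ulimit", "1048576");
    // Reuse each JVM for up to 10 tasks (mapred.job.reuse.jvm.num.tasks).
    conf.setNumTasksToExecutePerJvm(10);
  }
}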
Task creation in a little more detail

Task tracker Node

TaskTracker (jobClient, offerService loop)
LaunchTaskAction → TaskRunner: start() → void run() { … }

TaskRunner hands the launch to JvmManager:

JvmManager
  JvmRunner: runChild() { .. tracker.getTaskController().launchTask(...) .. }

- The TaskController is pluggable through mapred.task.tracker.task-controller (DefaultTaskController or LinuxTaskController).
- It creates the directories for the task (attempt, working, log).
- It passes the JVM args and OS-specific manipulations to TaskLog and then to o.a.h.util.Shell, which invokes the JVM through Java's ProcessBuilder.
  Note: the args for the JVM were already set by TaskRunner's getJVMArgs(...).
- Default main class: Child.java

Different JVM:
  Child: void main(..) { .... }  ←→ "umbilical" RPC back to the TaskTracker
  MapTask or ReduceTask: run(job, umbilical) { … }
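The "umbilical" arrow is itself just another RPC proxy, this time from the child JVM back to the TaskTracker that spawned it. A hedged sketch of the hookup Child's main(..) performs; in the real 1.x code the host/port come from the args built by TaskRunner, the call is wrapped in a doAs, and TaskUmbilicalProtocol is package-private in o.a.h.mapred (hence the package declaration here):

package org.apache.hadoop.mapred;  // TaskUmbilicalProtocol is package-private

import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

public class UmbilicalSketch {
  public static void main(String[] args) throws Exception {
    InetSocketAddress taskTrackerAddr =
        new InetSocketAddress(args[0], Integer.parseInt(args[1]));
    Configuration conf = new Configuration();
    TaskUmbilicalProtocol umbilical =
        (TaskUmbilicalProtocol) RPC.getProxy(TaskUmbilicalProtocol.class,
            TaskUmbilicalProtocol.versionID, taskTrackerAddr, conf);
    // Every status update, ping, and "done" notification from
    // MapTask/ReduceTask.run(job, umbilical) flows through this one proxy.
  }
}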
Task creation in a little more detail

Task tracker Node

TaskTracker → TaskRunner → JvmManager/JvmRunner → (different JVM) Child: void main(..) → MapTask or ReduceTask: run(job, umbilical) → TaskReporter

Inside run(job, umbilical):
- Create a TaskReporter, which also uses the umbilical object.
- Check whether this is a job/task setup/cleanup task.
  - If so, run its respective method and return.
  - Else, do the Map/Reduce-specific actions!
- Perform the commit operation if it is required.
  - If this is a speculative task, ensure only one of the duplicate tasks is committed.
Map-specific actions:

split 1 … split 5 → InputFormat → Mapper (map, map, map) → Sort/Spill → final output: one sorted, partitioned file

- Instantiate the mapper & input format using ReflectionUtils.newInstance(...).
- Build the split using MapTask's getSplitDetails(splitIndex, ...), using the FileSystem/Deserializer from JobConf.
- For each key-value pair read from the split (through context.nextKeyValue()), call the user-defined map.
- Store the output of map in an in-memory circular buffer (MapOutputBuffer).
  - If there is no reducer, use DirectMapOutputCollector instead, which writes immediately to disk.
  - When the buffer reaches a certain threshold, a background thread (MapOutputBuffer's inner class SpillThread) starts spilling the buffer to disk (mapred.local.dir).
  - If a combiner is specified, run it when there are at least 3 spill files (min.num.spills.for.combine).
  - Before writing to disk, compress if mapred.compress.map.output is true.
  - The sort uses the user-defined Comparator and Partitioner.
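A sketch of the map-output knobs from the list above, via the old mapred API. DefaultCodec is just one codec choice, and the combiner is whatever Reducer you use for local aggregation:

import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapred.JobConf;

public class MapOutputTuning {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setCompressMapOutput(true);               // mapred.compress.map.output
    conf.setMapOutputCompressorClass(DefaultCodec.class);
    conf.setInt("min.num.spills.for.combine", 3);  // combiner runs if >= 3 spills
    // conf.setCombinerClass(MyReducer.class);     // user-defined combiner
  }
}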
In-memory circular buffer

io.sort.mb (default: 100 MB = 104857600 bytes) = $1
  Data buffer (spill threshold): $1 * io.sort.spill.percent (default: 0.8)
  Record pointers: $1 * io.sort.record.percent (default: 0.05), split into
    - partition buffer kvoffsets (1 int per record)
    - index buffer kvindices (3 ints per record: <partition, key offset, value offset>)

Available data buffer: $1 * (1 - 0.05) * 0.8 = 79691776
Max #records w/o spill: $1 * 0.05 / (4 ints * 4 bytes) = 327680

INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680

2 common cases for spilling:
1. Lots of small records filling up the record buffer
   - Spills before the data buffer is full. Tweak io.sort.record.percent using the heuristic
     io.sort.record.percent = 16 / (16 + avgRecordSize)   (0.05 is optimal if avgRecordSize ~ 300 bytes)
   - See https://issues.apache.org/jira/browse/MAPREDUCE-64
   INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
2. Few but very large records filling up the data buffer
   - Increase the buffer size and also the spill percent (~ 1). Key: try to spill only once.
   - Trade-off: the buffer takes memory from the JVM (i.e. from mapred.child.java.opts). So if the max JVM heap = 1 GB and $1 = 128 MB, user code gets only 896 MB.
   INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full = true
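The numbers above can be reproduced with plain arithmetic; a self-contained sketch (no Hadoop needed):

public class SortBufferMath {
  public static void main(String[] args) {
    long ioSortMb = 100L * 1024 * 1024;             // io.sort.mb = 104857600
    double spillPct = 0.8;                          // io.sort.spill.percent
    double recordPct = 0.05;                        // io.sort.record.percent

    long recordBytes = (long) (ioSortMb * recordPct);   // 5242880
    long dataBuffer  = ioSortMb - recordBytes;          // 99614720
    long availData   = (long) (dataBuffer * spillPct);  // 79691776
    // 4 bytes/int * (1 int in kvoffsets + 3 ints in kvindices) = 16 bytes/record
    long maxRecords  = recordBytes / 16;                // 327680
    long softRecords = (long) (maxRecords * spillPct);  // 262144

    System.out.printf("data buffer = %d/%d%n", availData, dataBuffer);
    System.out.printf("record buffer = %d/%d%n", softRecords, maxRecords);

    // Heuristic for case 1 (many small records):
    double avgRecordSize = 300.0;
    System.out.printf("io.sort.record.percent = %.3f%n",
        16 / (16 + avgRecordSize));                     // ~0.05
  }
}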
Reduce-specific actions:

split 1 … split 5 → InputFormat → Mapper (map, map, map) → Sort/Spill

TaskTracker (map-side): mapping info → JobTracker → (thru heartbeat) → TaskTracker (reduce-side), TaskTracker (reduce-side)

Reducers know which machines to fetch data from.
reduce
reduce
Sort/Spill
Reduce-specific actions:
map
map
map
MapperInputFormat
split 1
split 2
split 3
split 4
split 5
TaskStatus.Phase.
Fetch
SHUFFLE
ReduceTaskif(mapred.job.tracker != local)
TaskTracker (map-side)
mapping info
ReduceCopierfetchOutput() {
}
MapOutputCopier
HttpServer
MapOutputServlet
- Get output using HTTP
- mapred.reduce.parallel.copies: #MapOutputCopier threads (i.e. # fetches in parallel on each reduce task)
- Default: 5
- tasktracker.http.threads: #clients HttpServer will service- Default: 40- Mapreduce2 will use Netty (2x #processors)
Wednesday, March 27, 13
reduce
reduce
Sort/Spill
Reduce-specific actions:
map
map
map
MapperInputFormat
split 1
split 2
split 3
split 4
split 5
TaskStatus.Phase.
Fetch
SHUFFLE
ReduceTaskif(mapred.job.tracker != local)
ReduceCopierfetchOutput() {
}
MapOutputCopier
TaskTracker (map-side)
mapping info
HttpServer
MapOutputServlet
Wednesday, March 27, 13
reduce
reduce
Sort/Spill
Reduce-specific actions:
map
map
map
MapperInputFormat
split 1
split 2
split 3
split 4
split 5
TaskStatus.Phase.
Fetch
SHUFFLE
ReduceTaskif(mapred.job.tracker != local)
ReduceCopierfetchOutput() {
}
MapOutputCopier
TaskTracker (map-side)
mapping info
HttpServer
MapOutputServlet
Wednesday, March 27, 13
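Both knobs are plain configuration. A minimal sketch of setting them (the values are illustrative, not recommendations); note that the copy parallelism is per-job, while tasktracker.http.threads is a daemon-side setting that normally lives in mapred-site.xml and is shown here only to name the key:

import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuning {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Per-job: number of MapOutputCopier threads per reduce task (default 5).
        conf.setInt("mapred.reduce.parallel.copies", 10);  // illustrative value
        // TaskTracker-side: threads its HttpServer uses to serve map output
        // (default 40); set in mapred-site.xml on the cluster, not per job.
        conf.setInt("tasktracker.http.threads", 80);       // illustrative value
    }
}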
Sort/Spill
Reduce-specific actions: shuffle memory management (still TaskStatus.Phase.SHUFFLE)
(Diagram: as before; each MapOutputCopier hands fetched map output to the ReduceCopier inside the ReduceTask.)
- Is the map output size < ShuffleRamManager's MaxSingleShuffleLimit?
  - Yes: keep the output in memory.
  - No: write it to disk.
- MaxSingleShuffleLimit = mapred.child.java.opts's -Xmx * mapred.job.shuffle.input.buffer.percent (default: 0.7) * 0.25f

INFO org.apache.hadoop.mapred.ReduceTask: Shuffling ? bytes (? raw bytes) into (RAM/Local-FS) from attempt_?

- Two background merger threads: InMemFSMergeThread and LocalFSMerger.
- InMemFSMergeThread performs an "in-memory merge" if:
  - used memory > (-Xmx * 0.7) * mapred.job.shuffle.merge.percent (default: 0.66), or
  - #map outputs > mapred.inmem.merge.threshold (default: 1000).
- LocalFSMerger performs an (interleaved) "on-disk merge" if #files on disk > 2 * io.sort.factor - 1 (fairly rare). E.g., with 50 files and io.sort.factor = 10: 5 rounds of merging, 10 files at a time*.
- Merge: the task then moves to TaskStatus.Phase.SORT.
- Finally, in-memory data is spilled to disk. Why? The framework assumes the user's reduce() needs all the RAM. If the reducer is simple, tweak mapred.job.reduce.input.buffer.percent (default: 0) up to ~ 0.7 to keep some of it in memory.
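The memory thresholds above follow directly from the reduce task's heap size. A minimal sketch of the arithmetic, assuming -Xmx1024m and the defaults quoted above (0.25f is the single-shuffle fraction from the slide):

public class ShuffleLimits {
    public static void main(String[] args) {
        long xmx = 1024L * 1024 * 1024;  // -Xmx from mapred.child.java.opts
        double inputBufferPct = 0.70;    // mapred.job.shuffle.input.buffer.percent
        double mergePct = 0.66;          // mapred.job.shuffle.merge.percent

        long shuffleBuffer = (long) (xmx * inputBufferPct);      // RAM available to the shuffle
        long maxSingleShuffle = (long) (shuffleBuffer * 0.25);   // largest map output kept in RAM
        long inMemMergeAt = (long) (shuffleBuffer * mergePct);   // in-memory merge trigger

        System.out.println("shuffle buffer        = " + shuffleBuffer);     // ~ 751619276
        System.out.println("MaxSingleShuffleLimit = " + maxSingleShuffle);  // ~ 187904819
        System.out.println("in-memory merge at    = " + inMemMergeAt);      // ~ 496068722
    }
}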
Sort/Spill
Reduce-specific actions: TaskStatus.Phase.SORT → REDUCE
(Diagram: as before; after the SORT-phase merge, the REDUCE phase runs each reduce task through the user's Reducer and the OutputFormat, producing part-0 and part-1.)
- Use a RawKeyValueIterator over the merged runs and call the user-defined Reducer class.
- Each reducer writes through the OutputFormat to its own output file (part-0, part-1).
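Schematically, "use RawKeyValueIterator and call the user-defined Reducer" amounts to the loop below. The types are simplified stand-ins, not Hadoop's real runner: the real ReduceTask streams groups off RAM/disk rather than materializing the whole input.

import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Schematic reduce-phase driver: group the sorted (key, value) pairs by key
// and hand each group to a user-defined reduce function.
class ReduceDriver {
    interface Reducer { void reduce(String key, Iterator<String> values); }

    // 'sorted' stands in for the merged, sorted runs behind RawKeyValueIterator.
    static void run(TreeMap<String, List<String>> sorted, Reducer reducer) {
        for (Map.Entry<String, List<String>> group : sorted.entrySet()) {
            reducer.reduce(group.getKey(), group.getValue().iterator());
        }
    }

    public static void main(String[] args) {
        TreeMap<String, List<String>> sorted = new TreeMap<>();
        sorted.put("hadoop", List.of("1", "1"));
        sorted.put("mapreduce", List.of("1"));
        run(sorted, (key, values) -> {
            int sum = 0;
            while (values.hasNext()) sum += Integer.parseInt(values.next());
            System.out.println(key + "\t" + sum);  // word-count style output
        });
    }
}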
References
- Hadoop: The Definitive Guide, 3rd edition, by Tom White.
- Hadoop Operations by Eric Sammer.
- Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer.
- Mining of Massive Datasets by Rajaraman et al.
- Online Aggregation for Large MapReduce Jobs by Pansare et al.
- Distributed and Cloud Computing by Hwang et al.
- http://developer.yahoo.com/hadoop/tutorial/
- http://www.slideshare.net/cloudera/mr-perf
- http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html
- http://www.cs.rice.edu/~fd2/pdf/hpdc106-dinu.pdf