Presented at Hadoop Summit
Oozie: Scheduling Workflows on the Grid
Mohammad K Islam
kamrul@yahoo-inc.com
Agenda
• Oozie Overview
• Oozie 3.x features:
  – Bundle
  – Scalability
  – Usability
• Challenges
• Future Plan
• Q&A
Overview: Workflow
• Oozie executes a workflow defined as a DAG of jobs.
• Job types include Map-Reduce, Pipes, Streaming, Pig, custom Java code, etc.
• Introduced in Oozie 1.x.
[Diagram: example workflow DAG. Nodes shown: start, M/R job, M/R streaming job, decision (branches labeled MORE and ENOUGH), fork, Pig job, M/R job, join, Java, FS job, end.]
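To make "a workflow defined as a DAG of jobs" concrete, here is a minimal Python sketch that groups the nodes of a small graph into waves that could run in parallel. The node names echo the diagram, but the edges are assumed for illustration, and this is not how Oozie itself schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key lists the nodes it depends on.
# Node names echo the slide; the edges are assumptions.
workflow = {
    "mr_job": {"start"},
    "fork": {"mr_job"},
    "pig_job": {"fork"},
    "streaming_job": {"fork"},
    "join": {"pig_job", "streaming_job"},
    "end": {"join"},
}

def execution_waves(dag):
    """Group nodes into waves; nodes in the same wave have no mutual dependency."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(workflow)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The fork/join pair shows up as a single wave that holds both branch jobs; the join node becomes ready only after both have finished.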
Overview: Coordinator
• Oozie executes a workflow based on:
  – Time dependency (frequency)
  – Data dependency
• Introduced in Oozie 2.x.
[Diagram: architecture. Components shown: Oozie Client, WS API, Oozie Server (Oozie Coordinator, Oozie Workflow), Check Data Availability, Hadoop.]
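The two coordinator triggers, frequency and data availability, can be sketched in a few lines of Python. This is an illustrative model only: the function names, the hourly frequency, the dates, and the `/data/%Y/%m/%d/%H` layout are all assumptions, not Oozie's API or configuration.

```python
from datetime import datetime, timedelta

def nominal_times(start, end, frequency):
    """Yield one nominal time per coordinator action in [start, end)."""
    current = start
    while current < end:
        yield current
        current += frequency

def ready_actions(times, available_dirs, path_template):
    """Return the nominal times whose input directory is available."""
    ready = []
    for t in times:
        input_dir = t.strftime(path_template)  # hypothetical directory layout
        if input_dir in available_dirs:
            ready.append(t)
    return ready

# Hypothetical example: hourly frequency over a three-hour window.
start = datetime(2011, 6, 1, 0, 0)
times = list(nominal_times(start, start + timedelta(hours=3), timedelta(hours=1)))
available = {"/data/2011/06/01/00", "/data/2011/06/01/02"}
ready = ready_actions(times, available, "/data/%Y/%m/%d/%H")
```

Here the action for hour 01 stays waiting because its input directory has not appeared, while hours 00 and 02 are ready to start their workflows.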
Oozie 3.x: Bundle
• Users can define and execute a set of coordinator applications.
• Users can start/stop/suspend/resume/rerun at the bundle level.
• Benefits: Makes large data pipeline applications easy to maintain and control for the Service Engineering team.
[Diagram: architecture with Bundle added. Components shown: Oozie Client, WS API, Oozie Server (Bundle, Coordinator, Workflow), Check Data Availability, Hadoop.]
Oozie Abstraction Layers
[Diagram: three abstraction layers.]
• Layer 1: Bundle
• Layer 2: Coord Job 1, Coord Job 2 (each with Coord Action 1 and Coord Action 2)
• Layer 3: WF Job 1, WF Job 2 (made up of M/R, Pig and FS jobs)
Enhanced Stability and Scalability
• Issue:
  – At very high load, Oozie becomes slow.
  – Accounts for 90% of all Oozie support incidents.
• Reason:
  – Many active but non-progressing jobs.
  – Oozie's internal queue is full.
• Resolution:
  – Throttle the number of active jobs per coordinator.
  – Put the job into a timeout state.
  – Enforce uniqueness of Oozie queue elements.
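Two of the resolutions above, throttling and queue-element uniqueness, can be illustrated with a small Python sketch. The class and function names are hypothetical and the logic is a simplified model, not Oozie's internal queue implementation.

```python
from collections import deque

class UniqueQueue:
    """FIFO queue that rejects an element whose key is already queued."""

    def __init__(self):
        self._items = deque()
        self._keys = set()

    def offer(self, key, item):
        """Queue the item unless an element with the same key is waiting."""
        if key in self._keys:
            return False
        self._keys.add(key)
        self._items.append((key, item))
        return True

    def poll(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        if not self._items:
            return None
        key, item = self._items.popleft()
        self._keys.discard(key)
        return item

    def __len__(self):
        return len(self._items)

def admit(active_count, throttle):
    """Allow a new active job only while the coordinator is under its limit."""
    return active_count < throttle

queue = UniqueQueue()
first = queue.offer("coord-1:check-input", "command A")
duplicate = queue.offer("coord-1:check-input", "command B")
```

Rejecting duplicates keeps repeated checks for the same job from filling the queue, and the throttle bounds how many jobs one coordinator can keep active at once.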
Improved Usability
• Issue:
  – The coordinator job's status is not intuitive and confuses Oozie users.
• Reason:
  – Status SUCCEEDED doesn't mean the job was successful!
  – Status PREMATER is for Oozie internal use only, but it was exposed to the user.
• Resolution:
  – Redesign the coordinator status.
Coordinator Status Redesign
[Diagram: coordinator status transitions, current vs. new.]
• Current: PREP, PREMATER, RUNNING, SUSPENDED, SUCCEEDED, KILLED, FAILED
• New: PREP, RUNNING, PAUSED, SUSPENDED, SUCCEEDED, DONE_WITH_ERROR, KILLED, FAILED
The Second Year...
• Number of releases:
  – Feature releases: 3
  – Patches: 9
• Backward compatibility is strongly maintained.
• No need to resubmit the job if Oozie is restarted.
• Code overhaul:
  – Redesigned the command pattern to avoid DB connection leaks and to improve DB connection usage.
Oozie Usages
• Y! internal usage:
  – Total number of users: 377
  – Total number of processed jobs ≈ 600K/month
• External downloads:
  – 1500+ in the last 8 months from GitHub
  – A large number of downloads through third-party packaging.
Oozie Usages Cont.
• User community:
  – Membership
    • Y! internal: 265
    • External: 163
  – Messages (approximate):
    • Y! internal: 9/day
    • External: 7/day
Challenges 1: Data Availability Check
• Issue:
  – Currently checks the directory every minute (polling based).
  – Increases NameNode (NN) overhead and does not scale well.
• Reason: No metadata system with an appropriate notification mechanism.
• Planned resolution: Integrate with the HCatalog metadata system.
Challenges 2: Adaptability to Hadoop
• Issues: If the Hadoop NameNode (NN) or JobTracker (JT) is down, Oozie still submits the job, which then fails. User intervention is required once the Hadoop server is back.
• Impact: Inconvenient for Oozie users. For example, if Hadoop is restarted on Friday night, the job will not run until the following Monday.
• Planned resolution: Graceful handling of Hadoop downtime:
  – If Hadoop is down, block submission.
  – When Hadoop becomes available:
    • Submit the blocked job.
    • Auto-resubmit the untraced job.
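The planned "block, then submit when Hadoop is back" behavior could look roughly like the Python sketch below. The `Submitter` class, its callbacks, and the job names are hypothetical; the sketch covers only blocked submissions and does not model auto-resubmission of untraced jobs.

```python
class Submitter:
    """Holds back job submissions while the cluster is reported as down."""

    def __init__(self, is_hadoop_up, submit):
        self._is_hadoop_up = is_hadoop_up  # callable returning True/False (assumed health check)
        self._submit = submit              # callable that actually submits a job
        self.blocked = []

    def request(self, job):
        """Submit right away if Hadoop is up; otherwise remember the job."""
        if self._is_hadoop_up():
            self._submit(job)
        else:
            self.blocked.append(job)

    def on_hadoop_available(self):
        """Submit every blocked job, in the order it was requested."""
        pending, self.blocked = self.blocked, []
        for job in pending:
            self._submit(job)

# Hypothetical run: two jobs arrive during downtime, one after recovery.
state = {"up": False}
submitted = []
submitter = Submitter(lambda: state["up"], submitted.append)
submitter.request("job-1")
submitter.request("job-2")
state["up"] = True
submitter.on_hadoop_available()
submitter.request("job-3")
```

With this shape, a Friday-night restart would delay jobs only until the cluster returns, with no manual resubmission needed for the blocked ones.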
Challenges 3: Horizontally Scalable
• Issues: One instance of Oozie cannot efficiently handle a very large number of jobs (say, 100K/hour). In addition, Oozie doesn't support load balancing.
• Reason: Oozie's internal task queue is not synchronized across multiple Oozie instances.
• Planned resolution: Use ZooKeeper for coordination.
• Benefits: As the load increases, add extra Oozie servers.
Future Plan
• Automatic failover: Using ZooKeeper.
• Monitoring: Rich WS API for application monitoring/alerting.
• Improved usability:
  – DistCp action
  – Hive action
• Asynchronous data processing.
• Incremental data processing.
• Apache migration: Work initiated.
Q&A
Mohammad K Islam
kamrul@yahoo-inc.com
• GitHub link: http://yahoo.github.com/oozie
• Mailing list: [email protected]
Backup Slides
Oozie Workflow Application
• Contents
  – A workflow.xml file
  – Resource files, config files and Pig scripts
  – All necessary JAR and native library files
• Parameters
  – The workflow.xml is parameterized; parameters can be propagated to map-reduce, pig & ssh jobs.
• Deployment
  – In a directory in the HDFS of the Hadoop cluster where the Hadoop & Pig jobs will run
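As a rough illustration of parameterization, the Python sketch below resolves `${name}` placeholders in a configuration fragment. It uses Python's `string.Template` as a stand-in, which happens to share the `${name}` syntax; the fragment is a hypothetical excerpt, and real workflow.xml files are resolved by Oozie itself with a richer expression language.

```python
from string import Template

# Hypothetical excerpt of a parameterized job configuration.
fragment = Template(
    "<property><name>mapred.input.dir</name><value>${input}</value></property>\n"
    "<property><name>mapred.output.dir</name><value>${output}</value></property>"
)

# Values supplied at submission time; these paths are example values.
resolved = fragment.substitute(
    input="/data/2008/input",
    output="/data/2008/output",
)
print(resolved)
```

`substitute` raises `KeyError` when a placeholder has no value, which mirrors the idea that a job should not start with unresolved parameters.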
Oozie cmd: Running a Workflow Job

Workflow Application Deployment

$ hadoop fs -mkdir hdfs://usr/tucu/wordcount-wf
$ hadoop fs -mkdir hdfs://usr/tucu/wordcount-wf/lib
$ hadoop fs -copyFromLocal workflow.xml wordcount.xml hdfs://usr/tucu/wordcount-wf
$ hadoop fs -copyFromLocal hadoop-examples.jar hdfs://usr/tucu/wordcount-wf/lib
$

Workflow Job Execution

$ oozie run -o http://foo.corp:8080/oozie \
    -a hdfs://bar.corp:9000/usr/tucu/wordcount-wf \
    input=/data/2008/input output=/data/2008/output
Workflow job id [1234567890-wordcount-wf]
$

Workflow Job Status

$ oozie status -o http://foo.corp:8080/oozie -j 1234567890-wordcount-wf
Workflow job status [RUNNING]
...
$