Apache Falcon
DevOps
Sanjeev Tripurari
Tech Lead Operations, InMobi
Falcon is a feed processing and feed management system aimed at making it easier for
end consumers to onboard their feed processing and feed management on Hadoop
clusters.
http://falcon.apache.org
What's on GRID
/user/sanjeev
/user/mohit
/user/Iliyas
/projects/meetup
/projects/support
/data/stream/click
/data/stream/beacon
Basic Components
Falcon
• Prism
• Server
• Client
ActiveMQ
Oozie
Hadoop
What's in it for DevOps
Cluster
NameNode, JT, Oozie, ActiveMQ, Colo
Feed
Data, DataPath, Lifetime, Retention, Owner, Replication
Process
Job, Queue, Priority, Parallelism, Input, Output, Workflow
Basic Environment Setup
Prism
UK: Server, Oozie, ActiveMQ, HDFS - MR1/2
US: Server, Oozie, ActiveMQ, HDFS - MR1/2
Logical Setup
prism
UK: uk-clusterAlpha, uk-clusterBeta
US: us-clusterGamma
Falcon Entity Operation
Command
falcon entity -submit -type [cluster|feed|process] -file cluster-definition.xml
falcon entity -list -type [cluster|feed|process]
Cluster
• submit
• delete
falcon entity -type [feed|process] -name [processname|feedname] -[OPTIONS]
Feed/Process OPTIONS
• schedule
• status
• list
• touch
• dependency
• definition
• update
• delete
Falcon Cluster
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
    <interface type="write" endpoint="hdfs://nn.cluster.my.com:8020" version="0.20.2-cdh3u3"/>
    <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
    <interface type="workflow" endpoint="http://oozie.cluster.my.com:11000/oozie" version="3.1.6"/>
    <interface type="messaging" endpoint="tcp://amq.cluster.my.com:61616?daemon=true" version="5.4.3"/>
  </interfaces>
  <locations>
    <location name="staging" path="/store/falcon/staging"/>
    <location name="temp" path="/tmp"/>
    <location name="working" path="/store/falcon/working"/>
  </locations>
  <properties>
    <property name="colo.name" value="uk"/>
  </properties>
</cluster>
Falcon Feed
<feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <groups>input</groups>
  <frequency>hours(1)</frequency>
  <late-arrival cut-off="hours(6)"/>
  <clusters>
    <cluster name="uk-clusterAlpha" type="source">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
      <retention limit="hours(24)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
  </locations>
  <ACL owner="sanjeev" group="users" permission="0x755"/>
  <schema location="/none" provider="none"/>
</feed>
Falcon Feed
<feed description="output feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <groups>output</groups>
  <frequency>hours(1)</frequency>
  <late-arrival cut-off="hours(6)"/>
  <clusters>
    <cluster name="uk-clusterAlpha" type="source">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
      <retention limit="hours(24)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
  </locations>
  <ACL owner="sanjeev" group="users" permission="0x755"/>
  <schema location="/none" provider="none"/>
</feed>
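The data path in each feed above is a template that resolves to one directory per feed instance. As a rough sketch of that resolution, the helper below substitutes the time variables for a given instance time. `expand_feed_path` is a hypothetical name, and the slash-separated layout of the template is an assumption about how the original paths were written.

```shell
#!/bin/sh
# Expand a Falcon-style feed path template for one instance time.
# expand_feed_path is a hypothetical helper; the slash-separated template
# layout is an assumption about how the feed paths are laid out.
expand_feed_path() {
  template=$1
  instance=$2                      # e.g. 2015-02-20T18:00Z
  year=$(printf '%s' "$instance" | cut -c1-4)
  month=$(printf '%s' "$instance" | cut -c6-7)
  day=$(printf '%s' "$instance" | cut -c9-10)
  hour=$(printf '%s' "$instance" | cut -c12-13)
  # Replace each literal ${VAR} token with the matching time component.
  printf '%s\n' "$template" | sed \
    -e 's|\${YEAR}|'"$year"'|g' \
    -e 's|\${MONTH}|'"$month"'|g' \
    -e 's|\${DAY}|'"$day"'|g' \
    -e 's|\${HOUR}|'"$hour"'|g'
}

expand_feed_path '/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}' 2015-02-20T18:00Z
```

The template is passed in single quotes so the shell does not try to expand `${YEAR}` itself.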
Falcon Process
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <clusters>
    <cluster name="uk-clusterAlpha">
      <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>1</parallel>
  <frequency>hours(1)</frequency>
  <timezone>UTC</timezone>
  <inputs>
    <input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input"/>
  </inputs>
  <outputs>
    <output instance="now(0,0)" feed="uk-outputfeed" name="output"/>
  </outputs>
  <properties>
    <property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
    <property name="user" value="${user()}"/>
    <property name="baseTime" value="${today(0,0)}"/>
  </properties>
  <workflow engine="oozie" path="/user/sanjeev/falcon/workflow"/>
  <retry policy="periodic" delay="minutes(10)" attempts="3"/>
</process>
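The retry element in the process above asks for a periodic policy with a 10-minute delay and 3 attempts. As a rough illustration of what that means for operations, the sketch below prints when each retry would fall, assuming each retry is scheduled a fixed delay after the previous failure and ignoring the time each attempt itself takes to run. `retry_offsets` is a hypothetical helper, not a Falcon command.

```shell
#!/bin/sh
# Print the offsets at which a periodic retry policy would re-attempt a
# failed instance. retry_offsets is a hypothetical helper for illustration.
retry_offsets() {
  delay_minutes=$1
  attempts=$2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Fixed spacing: attempt i starts i * delay after the first failure.
    echo "attempt $i: +$((i * delay_minutes)) min"
    i=$((i + 1))
  done
}

retry_offsets 10 3
```

With these values an instance that keeps failing is given up on roughly half an hour after its first failure, which is worth keeping in mind when setting the process SLA.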
Oozie Workflow
<workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
  <start to="fs-cmds"/>
  <action name="fs-cmds">
    <fs>
      <mkdir path="${output}"/>
    </fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
What's on HDFS
Input Feed /user/sanjeev/falcon/input/2015/02/20/00
Input Feed /user/sanjeev/falcon/input/2015/02/20/18
Output Feed /user/sanjeev/falcon/output
Workflow /user/sanjeev/falcon/workflow/workflow.xml
falcon entity -type cluster -submit -file uk-clusterAlpha.xml
falcon entity -type feed -submit -file uk-inputfeed.xml
falcon entity -type feed -submit -file uk-outputfeed.xml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
Typical Production Process and Workflow
(process) process-click-convert
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="uk-clusterAlpha">
      <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
    </cluster>
    <cluster name="us-clusterGamma">
      <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
    </cluster>
  </clusters>
  <parallel>2</parallel>
  <order>FIFO</order>
  <frequency>minutes(30)</frequency>
  <timezone>UTC</timezone>
  <inputs>
    <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
  </inputs>
  <outputs>
    <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
  </outputs>
  <properties>
    <property name="queueName" value="stream"/>
    <property name="jobPriority" value="NORMAL"/>
  </properties>
  <workflow path="/projects/support/clickconversion" lib="/projects/support/lib"/>
</process>
(workflow) /projects/support/clickconversion/workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.3" name="click-conversion">
  <start to="click-convert"/>
  <action name="click-convert">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${Output}"/>
        <delete path="${wf:conf('Output.stats')}"/>
        <delete path="${wf:conf('Output.tmp')}"/>
      </prepare>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>mapred.job.priority</name>
          <value>${jobPriority}</value>
        </property>
      </configuration>
      <main-class>com.my.cluster.io.Driver</main-class>
      <arg>-inputpath</arg><arg>${Input}</arg>
      <arg>-outputpath</arg><arg>${Output}</arg>
      <arg>-statspath</arg><arg>${wf:conf('Output.stats')}</arg>
      <arg>-stagingpath</arg><arg>${wf:conf('Output.tmp')}</arg>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
Falcon Instance Operation
Command
falcon instance -type [feed|process] -[status|list]
falcon instance -type [feed|process] -name [processname|feedname] -start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ -[OPTIONS]
Feed/Process OPTIONS
• status
• list
• logs
• kill
• rerun
• suspend
• resume
Monitoring
• Falcon CLI
• Oozie CLI
• ActiveMQ
• falcon entity -type process -name falcon-sanjeev-process -dependency
  (cluster) uk-clusterAlpha
  (feed) uk-inputfeed - [Input]
  (feed) uk-outputfeed - [Output]
• falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
  Consolidated Status: SUCCEEDED
  Instances:
  Instance           Cluster          SourceCluster  Status     Start              End                Details  Log
  -----------------------------------------------------------------------------------------------
  2015-02-20T18:00Z  uk-clusterAlpha  -              SUCCEEDED  2015-02-20T18:00Z  2015-02-20T18:01Z  -        http://oozie.cluster.my.com:11000/oozie?job=0229074-150205100814135-oozie-oozi-W
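For day-to-day monitoring it is usually the instances that did not succeed that matter. A minimal sketch of filtering status output like the sample above follows. It assumes the status sits in the fourth whitespace-separated column, as in that sample. `failed_instances` is a hypothetical helper, and the FAILED row in the sample input is made up for illustration.

```shell
#!/bin/sh
# List instances whose status is not SUCCEEDED from "falcon instance -status"
# style output. The column order (status in column 4) is assumed from the
# sample on the slide; failed_instances is a hypothetical helper.
failed_instances() {
  # Only rows that start with a timestamp are instance rows.
  awk '$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ && $4 != "SUCCEEDED" { print $1, $4 }'
}

# Sample input: the first data row mirrors the slide, the second is made up.
failed_instances <<'EOF'
Instance Cluster SourceCluster Status Start End Details Log
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - -
2015-02-20T19:00Z uk-clusterAlpha - FAILED 2015-02-20T19:00Z 2015-02-20T19:02Z - -
EOF
```

In practice the function would read from a pipe fed by the `falcon instance ... -status` command rather than from a here-document.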
Monitoring Dashboard
• https://github.com/ajayyadav/falcon-dashboard
OnBoarding Pipeline
• Group all processes
  • Minutely, hourly, daily, weekly, monthly
• Group related feeds
• Verify all process jars and workflows are pushed to the cluster
• Verify ownership of all feed and process directories
• Verify owners have job-scheduling access roles on the particular cluster
• Validate the feeds
• Submit and schedule the feeds so retention and replication are in place
• Dry-run the process schedule
• Submit and schedule the process
• Document the feed SLA, HDFS usage, and retention period for monitoring
• Document the process SLA to observe delays
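The ordering in the checklist above (feeds first, then the process) can be captured in a small script. The sketch below only prints the commands in that order. `onboard_pipeline` and the `<entity>.xml` file-naming convention are assumptions for illustration, and `-validate` stands in here for the validation and dry-run steps.

```shell
#!/bin/sh
# Print the Falcon commands for onboarding one pipeline in dependency order:
# validate and schedule feeds first, then validate and schedule the process.
# onboard_pipeline and the <entity>.xml naming convention are assumptions.
onboard_pipeline() {
  process=$1; shift
  for feed in "$@"; do
    echo "falcon entity -type feed -validate -file $feed.xml"
    echo "falcon entity -type feed -submitAndSchedule -file $feed.xml"
  done
  echo "falcon entity -type process -validate -file $process.xml"
  echo "falcon entity -type process -submitAndSchedule -file $process.xml"
}

onboard_pipeline falcon-sanjeev-process uk-inputfeed uk-outputfeed
```

Printing rather than executing keeps the sketch safe to run anywhere. The output can be reviewed and then piped to `sh` on a host with the Falcon client.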
Challenges
• Tightly integrated with Oozie
• Monitoring and onboarding need to be streamlined
• Real-time changes to schedule time and queues
Advantages
• Development is very active
• Industry adoption has been quick
• Once onboarded, focus is needed only on the set of critical processes
• Easy shutdown and upgrade, as all running jobs are managed by Oozie
• DevOps can easily set up and manage data
Thank You
Falcon is a feed processing and feed management system aimed at making it easier for
end consumers to onboard their feed processing and feed management on hadoop
clusters
httpfalconapacheorg
Whatrsquos on GRID
usersanjeev
usermohit
userIliyas
projectsmeetup
projectssupport
datastreamclick
datastreambeacon
Basic Components
Falcon
bull Prism
bull Server
bull Client
ActiveMQ
Oozie
Hadoop
Whatrsquos in for DevOps
Cluster
NameNode JT Oozie ActiveMQ Colo
Feed
Data DataPath Lifetime Retention OwnerReplication
Process
Job Queue Priority Parallelism Input Output Workflow
Basic Enviroment Setup
UK US
Prism
Server
Oozie ActiveMQ
HDFS - MR12
Server
Oozie ActiveMQ
HDFS - MR12
Logical Setup
UK US
uk-clusterAlpha
uk-clusterBeta
prism
us-clusterGamma
Falcon Entity Operation
Command
falcon entity -submit -type [clusterfeedprocess] -file cluster-definitionxml
falcon entity -list -type [clusterfeedprocess]
Cluster
bull Submit
bull Delete
falcon entity -list -type [feedprocess] -name [processnamefeedname] -[OPTIONS]
FeedProcess OPTIONS
bull schedule
bull Status
bull list
bull touch
bull depedency
bull definition
bull update
bull delete
Falcon Clusterltxml version=10 encoding=UTF-8 standalone=yesgt
ltcluster name=uk-clusterAlpha description= colo=uk xmlns=urifalconcluster01gt
ltinterfacesgt
ltinterface type=readonly endpoint=ldquohftpnnclustermycom50070rdquo version=0202-cdh3u3gt
ltinterface type=write endpoint=ldquohdfsnnclustermycom8020rdquo version=0202-cdh3u3gt
ltinterface type=execute endpoint=ldquojtclustermycom8021 version=0202-cdh3u3gt
ltinterface type=workflow endpoint=ldquohttpoozieclustermycom11000oozie version=316gt
ltinterface type=messaging endpoint=ldquotcpamqclustermycom61616daemon=truerdquo version=543gt
ltinterfacesgt
ltlocationsgt
ltlocation name=staging path=storefalconstaginggt
ltlocation name=temp path=tmpgt
ltlocation name=working path=storefalconworkinggt
ltlocationsgt
ltpropertiesgt
ltproperty name=coloname value=ukgt
ltpropertiesgt
ltclustergt
Falcon Feedltfeed description=input feed name=uk-inputfeed xmlns=urifalconfeed01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltgroupsgtinputltgroupsgt
ltfrequencygthours(1)ltfrequencygt
ltlate-arrival cut-off=hours(6) gt
ltclustersgt
ltcluster name=uk-clusterAlpha type=sourcegt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltretention limit=hours(24) action=delete gt
ltclustergt
ltclustersgt
ltlocationsgt
ltlocation type=data path=usersanjeevfalconinput$YEAR$MONTH$DAY$HOUR gt
ltlocationsgt
ltACL owner=sanjeev group=users permission=0x755 gt
ltschema location=none provider=none gt
ltfeedgt
Falcon Feedltfeed description=input feed name=uk-outputfeed xmlns=urifalconfeed01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltgroupsgtoutputltgroupsgt
ltfrequencygthours(1)ltfrequencygt
ltlate-arrival cut-off=hours(6) gt
ltclustersgt
ltcluster name=uk-clusterAlpha type=sourcegt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltretention limit=hours(24) action=delete gt
ltclustergt
ltclustersgt
ltlocationsgt
ltlocation type=data path=usersanjeevfalconoutput$YEAR$MONTH$DAY$HOUR gt
ltlocationsgt
ltACL owner=sanjeev group=users permission=0x755 gt
ltschema location=none provider=none gt
ltfeedgt
Falcon Processltxml version=10 encoding=UTF-8 standalone=yesgt
ltprocess name=falcon-sanjeev-process xmlns=urifalconprocess01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltclustersgt
ltcluster name=uk-clusterAlphagt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltclustergt
ltclustersgt
ltparallelgt1ltparallelgt
ltfrequencygthours(1)ltfrequencygt
lttimezonegtUTClttimezonegt
ltinputsgt
ltinput end=today(180) start=today(180) feed=uk-inputfeed name=input gt
ltinputsgt
ltoutputsgt
ltoutput instance=now(00) feed=uk-outputfeed name=output gt
ltoutputsgt
ltpropertiesgt
ltproperty name=fileTime value=$formatTime(dateOffset(instanceTime() 1 DAY) yyyy-MMM-dd)gt
ltproperty name=user value=$user()gt
ltproperty name=baseTime value=$today(00)gt
ltpropertiesgt
ltworkflow engine=oozie path=usersanjeevfalconworkflow gt
ltretry policy=periodic delay=minutes(10) attempts=3 gt
ltprocessgt
Oozie Workflow
ltworkflow-app xmlns=urioozieworkflow03 name=fs-workflowgt
ltstart to=fs-cmdsgt
ltaction name=fs-cmdsgt
ltfsgt
ltmkdir path=$outputgt
ltfsgt
ltok to=endgt
lterror to=failgt
ltactiongt
ltkill name=failgt
ltmessagegtWorkflow failed error message[$wferrorMessage(wflastErrorNode())]ltmessagegt
ltkillgt
ltend name=endgt
ltworkflow-appgt
Whatrsquos on HDFSInput Feed usersanjeevfalconinput2015022000
Input Feed usersanjeevtfalconinput2015022018
Output Feed usersanjeevtfalconoutput
Workflow usersanjeevtfalconworkflowworkflowxml
falcon entity -type cluster -submit -file uk-clusterAlphaxml
falcon entity -type feed -submit -file uk-inputfeedxml
falcon entity -type feed -submit -file uk-outputfeedxml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-processxml
Typical Production Process and Workflow(process) process-click-convert
ltxml version=10 encoding=UTF-8 standalone=yesgt
ltprocess name=process-click-convert xmlns=urifalconprocess01gt
ltclustersgt
ltcluster name=uk-clusterAlphagt
ltvalidity start=2015-01-15T0000Z end=2100-01-01T0000Zgt
ltclustergt
ltcluster name=us-clusterGammagt
ltvalidity start=2015-01-15T0030Z end=2100-01-01T0000Zgt
ltclustergt
ltclustersgt
ltparallelgt2ltparallelgt
ltordergtFIFOltordergt
ltfrequencygtminutes(30)ltfrequencygt
lttimezonegtUTClttimezonegt
ltinputsgt
ltinput name=Input feed=feed-click-stream start=now(0-30) end=now(0-1)gt
ltinputsgt
ltoutputsgt
ltoutput name=Output feed=feed-click-convert instance=now(0-30)gt
ltoutputsgt
ltpropertiesgt
ltproperty name=queueName value=streamgt
ltproperty name=jobPriority value=NORMALgt
ltpropertiesgt
ltworkflow path=projectssupportclickconversion lib=projectssupportlibgt
ltprocessgt
(workflow) projectssupportclickconversionworkflowxml
ltworkflow-app xmlns=urioozieworkflow03 name=click-conversiongt
ltstart to=click-convert gt
ltaction name=click-convertgt
ltjavagt
ltjob-trackergt$jobTrackerltjob-trackergt
ltname-nodegt$nameNodeltname-nodegt
ltpreparegt
ltdelete path=$Outputgt
ltdelete path=$wfconf(Outputstats)gt
ltdelete path=$wfconf(Outputtmp)gt
ltpreparegt
ltconfigurationgt
ltpropertygt
ltnamegtmapredjobqueuenameltnamegt
ltvaluegt$queueNameltvaluegt
ltpropertygt
ltpropertygt
ltnamegtmapredjobpriorityltnamegt
ltvaluegt$jobPriorityltvaluegt
ltpropertygt
ltconfigurationgt
nnclustermycom
ltmain-classgtcommyclusterioDriverltmain-classgt
ltarggt-inputpathltarggtltarggt$Inputltarggt
ltarggt-outputpathltarggtltarggt$Outputltarggt
ltarggt-statspathltarggtltarggt$wfconf(Outputstats)ltarggt
ltarggt-stagingpathltarggtltarggt$wfconf(Outputtmp)ltarggt
ltjavagt
ltok to=end gt
lterror to=fail gt
ltactiongt
ltkill name=failgt
ltmessagegtWorkflow failed error
message[$wferrorMessage(wflastErrorNode())]
ltmessagegt
ltkillgt
ltend name=end gt
ltworkflow-appgt
Falcon Instance
Operation
Command
falcon instance -type [feedprocess] -[statuslist]
falcon entity -list -type [feedprocess] -name [processnamefeedname] -start
YYYY-MM-DDTHHMMZ -end YYYY-MM-DDTHHMMZ [OPTIONS]
FeedProcess OPTIONS
bull status
bull list
bull logs
bull kill
bull rerun
bull suspend
bull resume
Monitoring
bull Falcon CLI
bull Oozie CLI
bull ActiveMQ
bull falcon entity type -type process -name falcon-sanjeev-process -dependency
(cluster) uk-clusterAlpha
(feed) uk-inputfeed - [Input]
(feed) uk-outputfeed - [Output]
bull falcon instance type -type process -name falcon-sanjeev-process -start 2015-02-20T1800Z - end
2015-02-23T0000Z -status
Consolidated Status SUCCEEDED
Instances
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------------------------------------------------------
2015-02-20T1800Z uk-clusterAlpha - SUCCEEDED 2015-02-20T1800Z 2015-02-20T1801Z - httpoozieclustermycom11000ooziejob=0229074-150205100814135-
oozie-oozi-W
MonitoringDashboardbull httpsgithubcomajayyadavfalcon-dashboard
OnBoarding Pipeline
bull Group All Process
bull Minutely Hourly Daily Weekly Monthly
bull Group Related Feeds
bull Verify All process jars workflows pushed to cluster
bull Verify ownerships of all feed and process directories
bull Verify owners have job scheduling access roles in particular cluster
bull Validate the feeds
bull Submit and schedule the feeds so retention and replication is in place
bull Dryrun the process schedule
bull Submit and schedule the process
bull Document the FEED SLA HDFS Usage retention period for
monitoring
bull Document the PROCESS SLA to observe delays
Challenges
bull Tightly Integrated with Oozie
bull Monitoring onboarding needs streamlined
bull Realtime change in Schedule Time Queues
Advantagesbull Development is very aggressive
bull Industry is adopted quickly
bull Once onboarded focus only needs to be on set of critical process
bull Easy shutdown and upgrade as all the running jobs are managed by oozie
bull DevOps can do easy setup and manage data
Thank You
Whatrsquos on GRID
usersanjeev
usermohit
userIliyas
projectsmeetup
projectssupport
datastreamclick
datastreambeacon
Basic Components
Falcon
bull Prism
bull Server
bull Client
ActiveMQ
Oozie
Hadoop
Whatrsquos in for DevOps
Cluster
NameNode JT Oozie ActiveMQ Colo
Feed
Data DataPath Lifetime Retention OwnerReplication
Process
Job Queue Priority Parallelism Input Output Workflow
Basic Enviroment Setup
UK US
Prism
Server
Oozie ActiveMQ
HDFS - MR12
Server
Oozie ActiveMQ
HDFS - MR12
Logical Setup
UK US
uk-clusterAlpha
uk-clusterBeta
prism
us-clusterGamma
Falcon Entity Operation
Command
falcon entity -submit -type [clusterfeedprocess] -file cluster-definitionxml
falcon entity -list -type [clusterfeedprocess]
Cluster
bull Submit
bull Delete
falcon entity -list -type [feedprocess] -name [processnamefeedname] -[OPTIONS]
FeedProcess OPTIONS
bull schedule
bull Status
bull list
bull touch
bull depedency
bull definition
bull update
bull delete
Falcon Clusterltxml version=10 encoding=UTF-8 standalone=yesgt
ltcluster name=uk-clusterAlpha description= colo=uk xmlns=urifalconcluster01gt
ltinterfacesgt
ltinterface type=readonly endpoint=ldquohftpnnclustermycom50070rdquo version=0202-cdh3u3gt
ltinterface type=write endpoint=ldquohdfsnnclustermycom8020rdquo version=0202-cdh3u3gt
ltinterface type=execute endpoint=ldquojtclustermycom8021 version=0202-cdh3u3gt
ltinterface type=workflow endpoint=ldquohttpoozieclustermycom11000oozie version=316gt
ltinterface type=messaging endpoint=ldquotcpamqclustermycom61616daemon=truerdquo version=543gt
ltinterfacesgt
ltlocationsgt
ltlocation name=staging path=storefalconstaginggt
ltlocation name=temp path=tmpgt
ltlocation name=working path=storefalconworkinggt
ltlocationsgt
ltpropertiesgt
ltproperty name=coloname value=ukgt
ltpropertiesgt
ltclustergt
Falcon Feedltfeed description=input feed name=uk-inputfeed xmlns=urifalconfeed01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltgroupsgtinputltgroupsgt
ltfrequencygthours(1)ltfrequencygt
ltlate-arrival cut-off=hours(6) gt
ltclustersgt
ltcluster name=uk-clusterAlpha type=sourcegt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltretention limit=hours(24) action=delete gt
ltclustergt
ltclustersgt
ltlocationsgt
ltlocation type=data path=usersanjeevfalconinput$YEAR$MONTH$DAY$HOUR gt
ltlocationsgt
ltACL owner=sanjeev group=users permission=0x755 gt
ltschema location=none provider=none gt
ltfeedgt
Falcon Feedltfeed description=input feed name=uk-outputfeed xmlns=urifalconfeed01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltgroupsgtoutputltgroupsgt
ltfrequencygthours(1)ltfrequencygt
ltlate-arrival cut-off=hours(6) gt
ltclustersgt
ltcluster name=uk-clusterAlpha type=sourcegt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltretention limit=hours(24) action=delete gt
ltclustergt
ltclustersgt
ltlocationsgt
ltlocation type=data path=usersanjeevfalconoutput$YEAR$MONTH$DAY$HOUR gt
ltlocationsgt
ltACL owner=sanjeev group=users permission=0x755 gt
ltschema location=none provider=none gt
ltfeedgt
Falcon Processltxml version=10 encoding=UTF-8 standalone=yesgt
ltprocess name=falcon-sanjeev-process xmlns=urifalconprocess01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltclustersgt
ltcluster name=uk-clusterAlphagt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltclustergt
ltclustersgt
ltparallelgt1ltparallelgt
ltfrequencygthours(1)ltfrequencygt
lttimezonegtUTClttimezonegt
ltinputsgt
ltinput end=today(180) start=today(180) feed=uk-inputfeed name=input gt
ltinputsgt
ltoutputsgt
ltoutput instance=now(00) feed=uk-outputfeed name=output gt
ltoutputsgt
ltpropertiesgt
ltproperty name=fileTime value=$formatTime(dateOffset(instanceTime() 1 DAY) yyyy-MMM-dd)gt
ltproperty name=user value=$user()gt
ltproperty name=baseTime value=$today(00)gt
ltpropertiesgt
ltworkflow engine=oozie path=usersanjeevfalconworkflow gt
ltretry policy=periodic delay=minutes(10) attempts=3 gt
ltprocessgt
Oozie Workflow
ltworkflow-app xmlns=urioozieworkflow03 name=fs-workflowgt
ltstart to=fs-cmdsgt
ltaction name=fs-cmdsgt
ltfsgt
ltmkdir path=$outputgt
ltfsgt
ltok to=endgt
lterror to=failgt
ltactiongt
ltkill name=failgt
ltmessagegtWorkflow failed error message[$wferrorMessage(wflastErrorNode())]ltmessagegt
ltkillgt
ltend name=endgt
ltworkflow-appgt
Whatrsquos on HDFSInput Feed usersanjeevfalconinput2015022000
Input Feed usersanjeevtfalconinput2015022018
Output Feed usersanjeevtfalconoutput
Workflow usersanjeevtfalconworkflowworkflowxml
falcon entity -type cluster -submit -file uk-clusterAlphaxml
falcon entity -type feed -submit -file uk-inputfeedxml
falcon entity -type feed -submit -file uk-outputfeedxml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-processxml
Typical Production Process and Workflow(process) process-click-convert
ltxml version=10 encoding=UTF-8 standalone=yesgt
ltprocess name=process-click-convert xmlns=urifalconprocess01gt
ltclustersgt
ltcluster name=uk-clusterAlphagt
ltvalidity start=2015-01-15T0000Z end=2100-01-01T0000Zgt
ltclustergt
ltcluster name=us-clusterGammagt
ltvalidity start=2015-01-15T0030Z end=2100-01-01T0000Zgt
ltclustergt
ltclustersgt
ltparallelgt2ltparallelgt
ltordergtFIFOltordergt
ltfrequencygtminutes(30)ltfrequencygt
lttimezonegtUTClttimezonegt
ltinputsgt
ltinput name=Input feed=feed-click-stream start=now(0-30) end=now(0-1)gt
ltinputsgt
ltoutputsgt
ltoutput name=Output feed=feed-click-convert instance=now(0-30)gt
ltoutputsgt
ltpropertiesgt
ltproperty name=queueName value=streamgt
ltproperty name=jobPriority value=NORMALgt
ltpropertiesgt
ltworkflow path=projectssupportclickconversion lib=projectssupportlibgt
ltprocessgt
(workflow) projectssupportclickconversionworkflowxml
ltworkflow-app xmlns=urioozieworkflow03 name=click-conversiongt
ltstart to=click-convert gt
ltaction name=click-convertgt
ltjavagt
ltjob-trackergt$jobTrackerltjob-trackergt
ltname-nodegt$nameNodeltname-nodegt
ltpreparegt
ltdelete path=$Outputgt
ltdelete path=$wfconf(Outputstats)gt
ltdelete path=$wfconf(Outputtmp)gt
ltpreparegt
ltconfigurationgt
ltpropertygt
ltnamegtmapredjobqueuenameltnamegt
ltvaluegt$queueNameltvaluegt
ltpropertygt
ltpropertygt
ltnamegtmapredjobpriorityltnamegt
ltvaluegt$jobPriorityltvaluegt
ltpropertygt
ltconfigurationgt
nnclustermycom
ltmain-classgtcommyclusterioDriverltmain-classgt
ltarggt-inputpathltarggtltarggt$Inputltarggt
ltarggt-outputpathltarggtltarggt$Outputltarggt
ltarggt-statspathltarggtltarggt$wfconf(Outputstats)ltarggt
ltarggt-stagingpathltarggtltarggt$wfconf(Outputtmp)ltarggt
ltjavagt
ltok to=end gt
lterror to=fail gt
ltactiongt
ltkill name=failgt
ltmessagegtWorkflow failed error
message[$wferrorMessage(wflastErrorNode())]
ltmessagegt
ltkillgt
ltend name=end gt
ltworkflow-appgt
Falcon Instance
Operation
Command
falcon instance -type [feedprocess] -[statuslist]
falcon entity -list -type [feedprocess] -name [processnamefeedname] -start
YYYY-MM-DDTHHMMZ -end YYYY-MM-DDTHHMMZ [OPTIONS]
FeedProcess OPTIONS
bull status
bull list
bull logs
bull kill
bull rerun
bull suspend
bull resume
Monitoring
bull Falcon CLI
bull Oozie CLI
bull ActiveMQ
bull falcon entity type -type process -name falcon-sanjeev-process -dependency
(cluster) uk-clusterAlpha
(feed) uk-inputfeed - [Input]
(feed) uk-outputfeed - [Output]
bull falcon instance type -type process -name falcon-sanjeev-process -start 2015-02-20T1800Z - end
2015-02-23T0000Z -status
Consolidated Status SUCCEEDED
Instances
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------------------------------------------------------
2015-02-20T1800Z uk-clusterAlpha - SUCCEEDED 2015-02-20T1800Z 2015-02-20T1801Z - httpoozieclustermycom11000ooziejob=0229074-150205100814135-
oozie-oozi-W
MonitoringDashboardbull httpsgithubcomajayyadavfalcon-dashboard
OnBoarding Pipeline
bull Group All Process
bull Minutely Hourly Daily Weekly Monthly
bull Group Related Feeds
bull Verify All process jars workflows pushed to cluster
bull Verify ownerships of all feed and process directories
bull Verify owners have job scheduling access roles in particular cluster
bull Validate the feeds
bull Submit and schedule the feeds so retention and replication is in place
bull Dryrun the process schedule
bull Submit and schedule the process
bull Document the FEED SLA HDFS Usage retention period for
monitoring
bull Document the PROCESS SLA to observe delays
Challenges
bull Tightly Integrated with Oozie
bull Monitoring onboarding needs streamlined
bull Realtime change in Schedule Time Queues
Advantagesbull Development is very aggressive
bull Industry is adopted quickly
bull Once onboarded focus only needs to be on set of critical process
bull Easy shutdown and upgrade as all the running jobs are managed by oozie
bull DevOps can do easy setup and manage data
Thank You
Basic Components
Falcon
bull Prism
bull Server
bull Client
ActiveMQ
Oozie
Hadoop
Whatrsquos in for DevOps
Cluster
NameNode JT Oozie ActiveMQ Colo
Feed
Data DataPath Lifetime Retention OwnerReplication
Process
Job Queue Priority Parallelism Input Output Workflow
Basic Enviroment Setup
UK US
Prism
Server
Oozie ActiveMQ
HDFS - MR12
Server
Oozie ActiveMQ
HDFS - MR12
Logical Setup
UK US
uk-clusterAlpha
uk-clusterBeta
prism
us-clusterGamma
Falcon Entity Operation
Command
falcon entity -submit -type [clusterfeedprocess] -file cluster-definitionxml
falcon entity -list -type [clusterfeedprocess]
Cluster
bull Submit
bull Delete
falcon entity -list -type [feedprocess] -name [processnamefeedname] -[OPTIONS]
FeedProcess OPTIONS
bull schedule
bull Status
bull list
bull touch
bull depedency
bull definition
bull update
bull delete
Falcon Clusterltxml version=10 encoding=UTF-8 standalone=yesgt
ltcluster name=uk-clusterAlpha description= colo=uk xmlns=urifalconcluster01gt
ltinterfacesgt
ltinterface type=readonly endpoint=ldquohftpnnclustermycom50070rdquo version=0202-cdh3u3gt
ltinterface type=write endpoint=ldquohdfsnnclustermycom8020rdquo version=0202-cdh3u3gt
ltinterface type=execute endpoint=ldquojtclustermycom8021 version=0202-cdh3u3gt
ltinterface type=workflow endpoint=ldquohttpoozieclustermycom11000oozie version=316gt
ltinterface type=messaging endpoint=ldquotcpamqclustermycom61616daemon=truerdquo version=543gt
ltinterfacesgt
ltlocationsgt
ltlocation name=staging path=storefalconstaginggt
ltlocation name=temp path=tmpgt
ltlocation name=working path=storefalconworkinggt
ltlocationsgt
ltpropertiesgt
ltproperty name=coloname value=ukgt
ltpropertiesgt
ltclustergt
Falcon Feedltfeed description=input feed name=uk-inputfeed xmlns=urifalconfeed01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltgroupsgtinputltgroupsgt
ltfrequencygthours(1)ltfrequencygt
ltlate-arrival cut-off=hours(6) gt
ltclustersgt
ltcluster name=uk-clusterAlpha type=sourcegt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltretention limit=hours(24) action=delete gt
ltclustergt
ltclustersgt
ltlocationsgt
ltlocation type=data path=usersanjeevfalconinput$YEAR$MONTH$DAY$HOUR gt
ltlocationsgt
ltACL owner=sanjeev group=users permission=0x755 gt
ltschema location=none provider=none gt
ltfeedgt
Falcon Feedltfeed description=input feed name=uk-outputfeed xmlns=urifalconfeed01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltgroupsgtoutputltgroupsgt
ltfrequencygthours(1)ltfrequencygt
ltlate-arrival cut-off=hours(6) gt
ltclustersgt
ltcluster name=uk-clusterAlpha type=sourcegt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltretention limit=hours(24) action=delete gt
ltclustergt
ltclustersgt
ltlocationsgt
ltlocation type=data path=usersanjeevfalconoutput$YEAR$MONTH$DAY$HOUR gt
ltlocationsgt
ltACL owner=sanjeev group=users permission=0x755 gt
ltschema location=none provider=none gt
ltfeedgt
Falcon Processltxml version=10 encoding=UTF-8 standalone=yesgt
ltprocess name=falcon-sanjeev-process xmlns=urifalconprocess01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltclustersgt
ltcluster name=uk-clusterAlphagt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltclustergt
ltclustersgt
ltparallelgt1ltparallelgt
ltfrequencygthours(1)ltfrequencygt
lttimezonegtUTClttimezonegt
ltinputsgt
ltinput end=today(180) start=today(180) feed=uk-inputfeed name=input gt
ltinputsgt
ltoutputsgt
ltoutput instance=now(00) feed=uk-outputfeed name=output gt
ltoutputsgt
ltpropertiesgt
ltproperty name=fileTime value=$formatTime(dateOffset(instanceTime() 1 DAY) yyyy-MMM-dd)gt
ltproperty name=user value=$user()gt
ltproperty name=baseTime value=$today(00)gt
ltpropertiesgt
ltworkflow engine=oozie path=usersanjeevfalconworkflow gt
ltretry policy=periodic delay=minutes(10) attempts=3 gt
ltprocessgt
Oozie Workflow
ltworkflow-app xmlns=urioozieworkflow03 name=fs-workflowgt
ltstart to=fs-cmdsgt
ltaction name=fs-cmdsgt
ltfsgt
ltmkdir path=$outputgt
ltfsgt
ltok to=endgt
lterror to=failgt
ltactiongt
ltkill name=failgt
ltmessagegtWorkflow failed error message[$wferrorMessage(wflastErrorNode())]ltmessagegt
ltkillgt
ltend name=endgt
ltworkflow-appgt
Whatrsquos on HDFSInput Feed usersanjeevfalconinput2015022000
Input Feed usersanjeevtfalconinput2015022018
Output Feed usersanjeevtfalconoutput
Workflow usersanjeevtfalconworkflowworkflowxml
falcon entity -type cluster -submit -file uk-clusterAlphaxml
falcon entity -type feed -submit -file uk-inputfeedxml
falcon entity -type feed -submit -file uk-outputfeedxml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-processxml
Typical Production Process and Workflow
(process) process-click-convert
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="uk-clusterAlpha">
            <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
        </cluster>
        <cluster name="us-clusterGamma">
            <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
        </cluster>
    </clusters>
    <parallel>2</parallel>
    <order>FIFO</order>
    <frequency>minutes(30)</frequency>
    <timezone>UTC</timezone>
    <inputs>
        <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
    </inputs>
    <outputs>
        <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
    </outputs>
    <properties>
        <property name="queueName" value="stream"/>
        <property name="jobPriority" value="NORMAL"/>
    </properties>
    <workflow path="/projects/support/clickconversion" lib="/projects/support/lib"/>
</process>
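The input window in this process is written with Falcon's now(hours, minutes) expression, and the earlier example uses today(hours, minutes). As a rough sketch of how those offsets resolve, assuming the semantics in the Falcon EL documentation (now is relative to the instance time, today is relative to the start of the instance's day), the functions below are hypothetical stand-ins and ignore alignment to the feed's own frequency.

```python
from datetime import datetime, timedelta


def now(instance_time, hours, minutes):
    """Offset relative to the process instance's nominal time."""
    return instance_time + timedelta(hours=hours, minutes=minutes)


def today(instance_time, hours, minutes):
    """Offset relative to 00:00 of the day the instance falls on."""
    midnight = instance_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=hours, minutes=minutes)


# Example instance of a 30-minute process (the instance time is an assumption).
instance = datetime(2015, 1, 15, 10, 0)

# start=now(0,-30) end=now(0,-1): the half hour leading up to the instance.
window = (now(instance, 0, -30), now(instance, 0, -1))
print(window[0].isoformat(), window[1].isoformat())

# today(18,0): 18:00 on the instance's day, whatever the instance time is.
print(today(instance, 18, 0).isoformat())
```

With a 30-minute frequency, each instance therefore reads the feed data from the 30 minutes before its nominal time.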
(workflow) /projects/support/clickconversion/workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.3" name="click-conversion">
    <start to="click-convert"/>
    <action name="click-convert">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${Output}"/>
                <delete path="${wf:conf('Output.stats')}"/>
                <delete path="${wf:conf('Output.tmp')}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.job.priority</name>
                    <value>${jobPriority}</value>
                </property>
            </configuration>
            <main-class>com.mycluster.io.Driver</main-class>
            <arg>-inputpath</arg><arg>${Input}</arg>
            <arg>-outputpath</arg><arg>${Output}</arg>
            <arg>-statspath</arg><arg>${wf:conf('Output.stats')}</arg>
            <arg>-stagingpath</arg><arg>${wf:conf('Output.tmp')}</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Falcon Instance Operation
Command
falcon instance -type [feed|process] -[status|list]
falcon instance -type [feed|process] -name [processname|feedname] -start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ -[OPTIONS]
Feed/Process OPTIONS
• status
• list
• logs
• kill
• rerun
• suspend
• resume
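When instance operations are scripted, it can help to assemble the command line in one place and reject malformed time ranges before calling the CLI. The following is a minimal sketch under that assumption. It uses only the flags named on this slide, and the helper instance_command is hypothetical, not part of Falcon.

```python
import re

# Timestamps in the form used on the slide: YYYY-MM-DDTHH:MMZ
TIME_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}Z$")
# Operations listed under Feed/Process OPTIONS above.
OPERATIONS = {"status", "list", "logs", "kill", "rerun", "suspend", "resume"}


def instance_command(entity_type, name, start, end, operation):
    """Build the argument list for one `falcon instance` invocation."""
    if entity_type not in ("feed", "process"):
        raise ValueError(f"unsupported entity type: {entity_type}")
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    for value in (start, end):
        if not TIME_RE.match(value):
            raise ValueError(f"expected YYYY-MM-DDTHH:MMZ, got: {value}")
    return [
        "falcon", "instance",
        "-type", entity_type,
        "-name", name,
        "-start", start,
        "-end", end,
        f"-{operation}",
    ]


command = instance_command(
    "process", "falcon-sanjeev-process",
    "2015-02-20T18:00Z", "2015-02-23T00:00Z", "status",
)
print(" ".join(command))
```

The returned list can be handed to subprocess.run on a host where the falcon client is installed.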
Monitoring
• Falcon CLI
• Oozie CLI
• ActiveMQ
• falcon entity -type process -name falcon-sanjeev-process -dependency
(cluster) uk-clusterAlpha
(feed) uk-inputfeed - [Input]
(feed) uk-outputfeed - [Output]
• falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
Consolidated Status: SUCCEEDED
Instances:
Instance           Cluster          SourceCluster  Status     Start              End                Details  Log
-----------------------------------------------------------------------------------------------
2015-02-20T18:00Z  uk-clusterAlpha  -              SUCCEEDED  2015-02-20T18:00Z  2015-02-20T18:01Z  -        http://oozie.cluster.my.com:11000/oozie?job=0229074-150205100814135-oozie-oozi-W
Monitoring Dashboard
• https://github.com/ajayyadav/falcon-dashboard
Onboarding Pipeline
• Group all processes
• Minutely, Hourly, Daily, Weekly, Monthly
• Group related feeds
• Verify that all process jars and workflows are pushed to the cluster
• Verify ownership of all feed and process directories
• Verify that owners have job scheduling access roles on the particular cluster
• Validate the feeds
• Submit and schedule the feeds so that retention and replication are in place
• Dry-run the process schedule
• Submit and schedule the process
• Document the FEED SLA, HDFS usage, and retention period for monitoring
• Document the PROCESS SLA to observe delays
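The first onboarding step, grouping processes by how often they run, can be sketched from the frequency strings in the process definitions. This is only an illustration: the bucket rules (for example, treating a multiple of seven days as weekly) are assumptions, and the process named weekly-report is hypothetical.

```python
import re
from collections import defaultdict

# Frequency strings as they appear in the process definitions, e.g. minutes(30).
FREQ_RE = re.compile(r"^(minutes|hours|days|months)\((\d+)\)$")
UNIT_BUCKETS = {
    "minutes": "minutely",
    "hours": "hourly",
    "days": "daily",
    "months": "monthly",
}


def bucket(frequency):
    """Map a frequency string to one of the onboarding groups."""
    match = FREQ_RE.match(frequency)
    if not match:
        raise ValueError(f"unrecognised frequency: {frequency}")
    unit, count = match.group(1), int(match.group(2))
    # Assumption: a weekly schedule is expressed as a multiple of seven days.
    if unit == "days" and count % 7 == 0:
        return "weekly"
    return UNIT_BUCKETS[unit]


def group_by_bucket(processes):
    """Group process names by schedule bucket, keeping input order."""
    groups = defaultdict(list)
    for name, frequency in processes.items():
        groups[bucket(frequency)].append(name)
    return dict(groups)


processes = {
    "process-click-convert": "minutes(30)",
    "falcon-sanjeev-process": "hours(1)",
    "weekly-report": "days(7)",  # hypothetical process, for illustration only
}
groups = group_by_bucket(processes)
print(groups)
```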
Challenges
• Tightly integrated with Oozie
• Monitoring and onboarding need to be streamlined
• Real-time changes to schedule time and queues
Advantages
• Development is very aggressive
• Industry is adopting it quickly
• Once onboarded, focus is needed only on the set of critical processes
• Easy shutdown and upgrade, as all running jobs are managed by Oozie
• DevOps can easily set up and manage data
Thank You
Falcon Entity Operation
Command
falcon entity -submit -type [clusterfeedprocess] -file cluster-definitionxml
falcon entity -list -type [clusterfeedprocess]
Cluster
bull Submit
bull Delete
falcon entity -list -type [feedprocess] -name [processnamefeedname] -[OPTIONS]
FeedProcess OPTIONS
bull schedule
bull Status
bull list
bull touch
bull depedency
bull definition
bull update
bull delete
Falcon Clusterltxml version=10 encoding=UTF-8 standalone=yesgt
ltcluster name=uk-clusterAlpha description= colo=uk xmlns=urifalconcluster01gt
ltinterfacesgt
ltinterface type=readonly endpoint=ldquohftpnnclustermycom50070rdquo version=0202-cdh3u3gt
ltinterface type=write endpoint=ldquohdfsnnclustermycom8020rdquo version=0202-cdh3u3gt
ltinterface type=execute endpoint=ldquojtclustermycom8021 version=0202-cdh3u3gt
ltinterface type=workflow endpoint=ldquohttpoozieclustermycom11000oozie version=316gt
ltinterface type=messaging endpoint=ldquotcpamqclustermycom61616daemon=truerdquo version=543gt
ltinterfacesgt
ltlocationsgt
ltlocation name=staging path=storefalconstaginggt
ltlocation name=temp path=tmpgt
ltlocation name=working path=storefalconworkinggt
ltlocationsgt
ltpropertiesgt
ltproperty name=coloname value=ukgt
ltpropertiesgt
ltclustergt
Falcon Feedltfeed description=input feed name=uk-inputfeed xmlns=urifalconfeed01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltgroupsgtinputltgroupsgt
ltfrequencygthours(1)ltfrequencygt
ltlate-arrival cut-off=hours(6) gt
ltclustersgt
ltcluster name=uk-clusterAlpha type=sourcegt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltretention limit=hours(24) action=delete gt
ltclustergt
ltclustersgt
ltlocationsgt
ltlocation type=data path=usersanjeevfalconinput$YEAR$MONTH$DAY$HOUR gt
ltlocationsgt
ltACL owner=sanjeev group=users permission=0x755 gt
ltschema location=none provider=none gt
ltfeedgt
Falcon Feedltfeed description=input feed name=uk-outputfeed xmlns=urifalconfeed01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltgroupsgtoutputltgroupsgt
ltfrequencygthours(1)ltfrequencygt
ltlate-arrival cut-off=hours(6) gt
ltclustersgt
ltcluster name=uk-clusterAlpha type=sourcegt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltretention limit=hours(24) action=delete gt
ltclustergt
ltclustersgt
ltlocationsgt
ltlocation type=data path=usersanjeevfalconoutput$YEAR$MONTH$DAY$HOUR gt
ltlocationsgt
ltACL owner=sanjeev group=users permission=0x755 gt
ltschema location=none provider=none gt
ltfeedgt
Falcon Processltxml version=10 encoding=UTF-8 standalone=yesgt
ltprocess name=falcon-sanjeev-process xmlns=urifalconprocess01 xmlnsxsi=httpwwww3org2001XMLSchema-instancegt
ltclustersgt
ltcluster name=uk-clusterAlphagt
ltvalidity start=2015-02-20T1800Z end=2015-02-23T0000Zgt
ltclustergt
ltclustersgt
ltparallelgt1ltparallelgt
ltfrequencygthours(1)ltfrequencygt
lttimezonegtUTClttimezonegt
ltinputsgt
ltinput end=today(180) start=today(180) feed=uk-inputfeed name=input gt
ltinputsgt
ltoutputsgt
ltoutput instance=now(00) feed=uk-outputfeed name=output gt
ltoutputsgt
ltpropertiesgt
ltproperty name=fileTime value=$formatTime(dateOffset(instanceTime() 1 DAY) yyyy-MMM-dd)gt
ltproperty name=user value=$user()gt
ltproperty name=baseTime value=$today(00)gt
ltpropertiesgt
ltworkflow engine=oozie path=usersanjeevfalconworkflow gt
ltretry policy=periodic delay=minutes(10) attempts=3 gt
ltprocessgt
Oozie Workflow
<workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
  <start to="fs-cmds" />
  <action name="fs-cmds">
    <fs>
      <mkdir path="${output}" />
    </fs>
    <ok to="end" />
    <error to="fail" />
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end" />
</workflow-app>
What's on HDFS
Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
Input Feed: /user/sanjeevt/falcon/input/2015/02/20/18
Output Feed: /user/sanjeevt/falcon/output
Workflow: /user/sanjeevt/falcon/workflow/workflow.xml
falcon entity -type cluster -submit -file uk-clusterAlpha.xml
falcon entity -type feed -submit -file uk-inputfeed.xml
falcon entity -type feed -submit -file uk-outputfeed.xml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
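The commands above follow the dependency order: the cluster first, then the feeds, then the process that uses them. A small sketch that builds the same command lines in that order is shown below. It only assembles argument lists from the flags already used above and does not run anything; the function name submission_commands is hypothetical.

```python
def submission_commands(cluster_file, feed_files, process_file):
    """Build falcon CLI argument lists in dependency order: cluster, feeds, process."""
    commands = [
        ["falcon", "entity", "-type", "cluster", "-submit", "-file", cluster_file]
    ]
    for feed_file in feed_files:
        commands.append(
            ["falcon", "entity", "-type", "feed", "-submit", "-file", feed_file]
        )
    # The process is submitted last because it refers to the cluster and the feeds.
    commands.append(
        ["falcon", "entity", "-type", "process", "-submitAndSchedule", "-file", process_file]
    )
    return commands


if __name__ == "__main__":
    for argv in submission_commands(
        "uk-clusterAlpha.xml",
        ["uk-inputfeed.xml", "uk-outputfeed.xml"],
        "falcon-sanjeev-process.xml",
    ):
        print(" ".join(argv))
```

Each list could be handed to subprocess.run on a host where the falcon client is installed; that step is left out here so the sketch stays side-effect free.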
Typical Production Process and Workflow
(process) process-click-convert
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
  <clusters>
    <cluster name="uk-clusterAlpha">
      <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z" />
    </cluster>
    <cluster name="us-clusterGamma">
      <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z" />
    </cluster>
  </clusters>
  <parallel>2</parallel>
  <order>FIFO</order>
  <frequency>minutes(30)</frequency>
  <timezone>UTC</timezone>
  <inputs>
    <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)" />
  </inputs>
  <outputs>
    <output name="Output" feed="feed-click-convert" instance="now(0,-30)" />
  </outputs>
  <properties>
    <property name="queueName" value="stream" />
    <property name="jobPriority" value="NORMAL" />
  </properties>
  <workflow path="/projects/support/clickconversion" lib="/projects/support/lib" />
</process>
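The process above runs every 30 minutes on two clusters whose validity windows start half an hour apart, and each run reads the input range now(0,-30) to now(0,-1). The sketch below lists the nominal instance times per cluster for a short window and the input range of one instance. It assumes instances fall at the validity start plus whole multiples of the frequency and that now() is an offset from the instance time; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def instance_times(start, frequency, window_start, window_end):
    """Nominal times start + k * frequency that fall in [window_start, window_end)."""
    times = []
    t = start
    while t < window_end:
        if t >= window_start:
            times.append(t)
        t += frequency
    return times


def input_window(instance):
    """Input range for start=now(0,-30), end=now(0,-1)."""
    return instance - timedelta(minutes=30), instance - timedelta(minutes=1)


if __name__ == "__main__":
    utc = timezone.utc
    frequency = timedelta(minutes=30)
    window = (datetime(2015, 1, 15, 0, 0, tzinfo=utc), datetime(2015, 1, 15, 2, 0, tzinfo=utc))
    starts = {
        "uk-clusterAlpha": datetime(2015, 1, 15, 0, 0, tzinfo=utc),
        "us-clusterGamma": datetime(2015, 1, 15, 0, 30, tzinfo=utc),
    }
    for cluster, start in starts.items():
        times = instance_times(start, frequency, *window)
        print(cluster, [t.strftime("%H:%M") for t in times])
    first, last = input_window(datetime(2015, 1, 15, 1, 0, tzinfo=utc))
    print("input range for 01:00:", first.strftime("%H:%M"), "to", last.strftime("%H:%M"))
```

In the first two hours this gives four instances on uk-clusterAlpha and three on us-clusterGamma, and the 01:00 instance reads input from 00:30 to 00:59.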
(workflow) /projects/support/clickconversion/workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.3" name="click-conversion">
  <start to="click-convert" />
  <action name="click-convert">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${Output}" />
        <delete path="${wf:conf('Output.stats')}" />
        <delete path="${wf:conf('Output.tmp')}" />
      </prepare>
      <configuration>
        <property>
          <name>mapred.job.queue.name</name>
          <value>${queueName}</value>
        </property>
        <property>
          <name>mapred.job.priority</name>
          <value>${jobPriority}</value>
        </property>
      </configuration>
      <main-class>com.my.cluster.io.Driver</main-class>
      <arg>-inputpath</arg><arg>${Input}</arg>
      <arg>-outputpath</arg><arg>${Output}</arg>
      <arg>-statspath</arg><arg>${wf:conf('Output.stats')}</arg>
      <arg>-stagingpath</arg><arg>${wf:conf('Output.tmp')}</arg>
    </java>
    <ok to="end" />
    <error to="fail" />
  </action>
  <kill name="fail">
    <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end" />
</workflow-app>
Falcon Instance Operation
Command
falcon instance -type [feed|process] -[status|list]
falcon instance -type [feed|process] -name [processname|feedname] -start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ -[OPTIONS]
Feed/Process OPTIONS
• status
• list
• logs
• kill
• rerun
• suspend
• resume
Monitoring
• Falcon CLI
• Oozie CLI
• ActiveMQ
• falcon entity -type process -name falcon-sanjeev-process -dependency
  (cluster) uk-clusterAlpha
  (feed) uk-inputfeed - [Input]
  (feed) uk-outputfeed - [Output]
• falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
  Consolidated Status: SUCCEEDED
  Instances:
  Instance  Cluster  SourceCluster  Status  Start  End  Details  Log
  -----------------------------------------------------------------------------------------------
  2015-02-20T18:00Z  uk-clusterAlpha  -  SUCCEEDED  2015-02-20T18:00Z  2015-02-20T18:01Z  -  http://oozie.cluster.my.com:11000/oozie?job=0229074-150205100814135-oozie-oozi-W
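For scripted monitoring, a row of the instance status listing above can be split into named fields. The sketch below assumes each row is a single line with exactly the eight whitespace-separated columns shown in the header (Instance, Cluster, SourceCluster, Status, Start, End, Details, Log). Real CLI output may differ between Falcon versions, and the host name in the sample row is a placeholder.

```python
from typing import NamedTuple


class InstanceStatus(NamedTuple):
    instance: str
    cluster: str
    source_cluster: str
    status: str
    start: str
    end: str
    details: str
    log: str


def parse_status_row(row: str) -> InstanceStatus:
    """Split one status row into its eight columns (assumes no spaces inside a field)."""
    fields = row.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    return InstanceStatus(*fields)


if __name__ == "__main__":
    # Sample row modelled on the listing above; oozie.example.com is a placeholder host.
    row = (
        "2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z "
        "2015-02-20T18:01Z - "
        "http://oozie.example.com:11000/oozie?job=0229074-150205100814135-oozie-oozi-W"
    )
    record = parse_status_row(row)
    print(record.instance, record.status)
    # prints 2015-02-20T18:00Z SUCCEEDED
```

Records in this form make it easy to filter for instances whose status is not SUCCEEDED before alerting.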
Monitoring Dashboard
• https://github.com/ajayyadav/falcon-dashboard
Onboarding Pipeline
• Group all processes
  • Minutely, hourly, daily, weekly, monthly
• Group related feeds
• Verify that all process JARs and workflows are pushed to the cluster
• Verify ownership of all feed and process directories
• Verify that owners have job-scheduling access roles on the particular cluster
• Validate the feeds
• Submit and schedule the feeds so that retention and replication are in place
• Dry-run the process schedule
• Submit and schedule the process
• Document the feed SLA, HDFS usage, and retention period for monitoring
• Document the process SLA to observe delays
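As one way to approach the "validate the feeds" step above before anything is submitted, a feed definition can be given a quick local sanity check. The sketch below is a hypothetical pre-check written in Python. It only confirms that the XML parses and that a few elements used in this deck's feed examples are present; the REQUIRED list is an assumption made for illustration, not the Falcon schema, and it does not replace Falcon's own validation.

```python
import xml.etree.ElementTree as ET

FEED_NAMESPACE = "uri:falcon:feed:0.1"
# Hypothetical checklist taken from the feed examples in this deck.
REQUIRED = ["frequency", "clusters/cluster", "locations/location", "ACL"]


def missing_feed_elements(xml_text: str) -> list:
    """Return the checklist paths that are absent from a feed definition."""
    root = ET.fromstring(xml_text)
    missing = []
    for path in REQUIRED:
        # Qualify every path step with the feed namespace.
        qualified = "/".join("{" + FEED_NAMESPACE + "}" + part for part in path.split("/"))
        if root.find(qualified) is None:
            missing.append(path)
    return missing


if __name__ == "__main__":
    feed = """
    <feed name="uk-inputfeed" xmlns="uri:falcon:feed:0.1">
      <frequency>hours(1)</frequency>
      <clusters>
        <cluster name="uk-clusterAlpha" type="source"/>
      </clusters>
      <ACL owner="sanjeev" group="users" permission="0x755"/>
    </feed>
    """
    print(missing_feed_elements(feed))
    # prints ['locations/location']
```

A malformed file raises xml.etree.ElementTree.ParseError, which is itself a useful early signal in an onboarding script.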
Challenges
• Tightly integrated with Oozie
• Monitoring and onboarding need to be streamlined
• Real-time changes to schedule time and queues
Advantages
• Development is very active
• The industry is adopting it quickly
• Once onboarded, attention is needed only on the set of critical processes
• Easy shutdown and upgrade, as all running jobs are managed by Oozie
• DevOps can easily set up and manage data
Thank You
Oozie Workflow
<workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
    <start to="fs-cmds"/>
    <action name="fs-cmds">
        <fs>
            <mkdir path="${output}"/>
        </fs>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
What's on HDFS
Input Feed   /user/sanjeev/falcon/input/2015/02/20/00
Input Feed   /user/sanjeevt/falcon/input/2015/02/20/18
Output Feed  /user/sanjeevt/falcon/output
Workflow     /user/sanjeevt/falcon/workflow/workflow.xml
falcon entity -type cluster -submit -file uk-clusterAlpha.xml
falcon entity -type feed -submit -file uk-inputfeed.xml
falcon entity -type feed -submit -file uk-outputfeed.xml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
Typical Production Process and Workflow
(process) process-click-convert
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
    <clusters>
        <cluster name="uk-clusterAlpha">
            <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
        </cluster>
        <cluster name="us-clusterGamma">
            <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
        </cluster>
    </clusters>
    <parallel>2</parallel>
    <order>FIFO</order>
    <frequency>minutes(30)</frequency>
    <timezone>UTC</timezone>
    <inputs>
        <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
    </inputs>
    <outputs>
        <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
    </outputs>
    <properties>
        <property name="queueName" value="stream"/>
        <property name="jobPriority" value="NORMAL"/>
    </properties>
    <workflow path="/projects/support/clickconversion" lib="/projects/support/lib"/>
</process>
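To make the input window concrete: in Falcon's expression language, now(hours, minutes) is an offset from the nominal time of the instance. The sketch below is plain Python rather than Falcon code, and its helper names (now_el, instances, fmt) are hypothetical. Assuming the uk-clusterAlpha validity start shown above, it lists the first few 30-minute instances and the input range each one would read.

```python
from datetime import datetime, timedelta, timezone

def now_el(nominal, hours, minutes):
    """Mimic Falcon's now(hours, minutes): an offset from the instance's nominal time."""
    return nominal + timedelta(hours=hours, minutes=minutes)

def instances(start, frequency_minutes, count):
    """First `count` nominal times for a process with frequency minutes(N)."""
    return [start + timedelta(minutes=frequency_minutes * i) for i in range(count)]

def fmt(ts):
    """Render a timestamp in the YYYY-MM-DDTHH:MMZ form used by the Falcon CLI."""
    return ts.strftime("%Y-%m-%dT%H:%MZ")

# Validity start of process-click-convert on uk-clusterAlpha.
start = datetime(2015, 1, 15, 0, 0, tzinfo=timezone.utc)

windows = []
for nominal in instances(start, 30, 3):
    # input start="now(0,-30)" end="now(0,-1)"
    window = (fmt(nominal), fmt(now_el(nominal, 0, -30)), fmt(now_el(nominal, 0, -1)))
    windows.append(window)
    print("instance %s reads input from %s to %s" % window)
```

Each instance therefore looks back over the half hour that ended just before its own nominal time.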
(workflow) /projects/support/clickconversion/workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.3" name="click-conversion">
    <start to="click-convert"/>
    <action name="click-convert">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${Output}"/>
                <delete path="${wf:conf('Output.stats')}"/>
                <delete path="${wf:conf('Output.tmp')}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.job.priority</name>
                    <value>${jobPriority}</value>
                </property>
            </configuration>
            <!-- nn.cluster.my.com -->
            <main-class>com.my.cluster.io.Driver</main-class>
            <arg>-inputpath</arg><arg>${Input}</arg>
            <arg>-outputpath</arg><arg>${Output}</arg>
            <arg>-statspath</arg><arg>${wf:conf('Output.stats')}</arg>
            <arg>-stagingpath</arg><arg>${wf:conf('Output.tmp')}</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
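Before pushing a workflow.xml to the cluster, it can help to confirm that every transition points at a node that exists. The following is a minimal sketch using only the Python standard library; it is not an Oozie tool, the function name dangling_transitions is hypothetical, and the embedded workflow is a trimmed copy of the click-conversion example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the click-conversion workflow, for illustration only.
WORKFLOW = """<workflow-app xmlns="uri:oozie:workflow:0.3" name="click-conversion">
    <start to="click-convert"/>
    <action name="click-convert">
        <java><main-class>com.my.cluster.io.Driver</main-class></java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail"><message>Workflow failed</message></kill>
    <end name="end"/>
</workflow-app>"""

def dangling_transitions(xml_text):
    """Return transition targets (to="...") that match no named top-level node."""
    root = ET.fromstring(xml_text)
    # Named nodes are direct children of workflow-app: action, kill, end, ...
    names = {node.get("name") for node in root if node.get("name")}
    # Transitions appear on start, ok and error elements at any depth.
    targets = {el.get("to") for el in root.iter() if el.get("to")}
    return sorted(targets - names)

print(dangling_transitions(WORKFLOW))
```

An empty result means every start, ok and error transition resolves to a defined node. It does not replace validation by Oozie itself.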
Falcon Instance Operation
Command
falcon instance -type [feed|process] -[status|list]
falcon instance -type [feed|process] -name [processname|feedname] -start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ -[OPTIONS]
Feed/Process OPTIONS
• status
• list
• logs
• kill
• rerun
• suspend
• resume
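The instance commands above follow one pattern, so they are easy to generate from a script. This is a minimal sketch, assuming the flag names listed on the slide; the helper instance_command is hypothetical, and the flags should be checked against the installed Falcon version before use.

```python
# Instance operations as listed on the slide.
INSTANCE_ACTIONS = ("status", "list", "logs", "kill", "rerun", "suspend", "resume")

def instance_command(entity_type, name, action, start, end):
    """Build the argv list for a `falcon instance` call over a time range."""
    if entity_type not in ("feed", "process"):
        raise ValueError("entity_type must be 'feed' or 'process'")
    if action not in INSTANCE_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    return ["falcon", "instance", "-type", entity_type, "-name", name,
            "-" + action, "-start", start, "-end", end]

cmd = instance_command("process", "falcon-sanjeev-process", "status",
                       "2015-02-20T18:00Z", "2015-02-23T00:00Z")
print(" ".join(cmd))
```

Returning a list rather than a string lets the result be passed directly to subprocess.run without shell quoting.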
Monitoring
• Falcon CLI
• Oozie CLI
• ActiveMQ
• falcon entity -type process -name falcon-sanjeev-process -dependency
(cluster) uk-clusterAlpha
(feed) uk-inputfeed - [Input]
(feed) uk-outputfeed - [Output]
• falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
Consolidated Status: SUCCEEDED
Instances:
Instance           Cluster          SourceCluster  Status     Start              End                Details  Log
-----------------------------------------------------------------------------------------------
2015-02-20T18:00Z  uk-clusterAlpha  -              SUCCEEDED  2015-02-20T18:00Z  2015-02-20T18:01Z  -        http://oozie.cluster.my.com:11000/oozie?job=0229074-150205100814135-oozie-oozi-W
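For routine monitoring, the status output can be scanned by a script instead of by eye. The sketch below assumes the whitespace-separated, eight-column layout shown above; the real CLI output may differ between Falcon versions, and parse_instances is a hypothetical helper, not part of Falcon.

```python
SAMPLE = """Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------------------
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - http://oozie.cluster.my.com:11000/oozie?job=0229074-150205100814135-oozie-oozi-W
"""

COLUMNS = ["Instance", "Cluster", "SourceCluster", "Status",
           "Start", "End", "Details", "Log"]

def parse_instances(text):
    """Turn instance rows into dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have one field per column and begin with a year.
        if len(fields) == len(COLUMNS) and fields[0][:4].isdigit():
            rows.append(dict(zip(COLUMNS, fields)))
    return rows

rows = parse_instances(SAMPLE)
failed = [row for row in rows if row["Status"] != "SUCCEEDED"]
print("%d instance(s), %d not succeeded" % (len(rows), len(failed)))
```

The Log column carries the Oozie job link, so a failed row can be followed straight to the Oozie console.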
Monitoring Dashboard
• https://github.com/ajayyadav/falcon-dashboard
OnBoarding Pipeline
• Group all processes
  • Minutely, Hourly, Daily, Weekly, Monthly
• Group related feeds
• Verify all process jars and workflows are pushed to the cluster
• Verify ownership of all feed and process directories
• Verify owners have job scheduling access roles on the particular cluster
• Validate the feeds
• Submit and schedule the feeds so retention and replication are in place
• Dry-run the process schedule
• Submit and schedule the process
• Document the feed SLA, HDFS usage and retention period for monitoring
• Document the process SLA to observe delays
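The first onboarding step, grouping processes by how often they run, can be sketched from the frequency element of each process definition. The process names other than process-click-convert are hypothetical, and treating a whole multiple of seven days as weekly is an assumption of this sketch, not a Falcon concept.

```python
import re
from collections import defaultdict

LABELS = {"minutes": "minutely", "hours": "hourly", "days": "daily", "months": "monthly"}

def frequency_group(frequency):
    """Map a Falcon frequency such as minutes(30) to an onboarding group."""
    match = re.fullmatch(r"(minutes|hours|days|months)\((\d+)\)", frequency)
    if not match:
        raise ValueError("unrecognised frequency: %s" % frequency)
    unit, count = match.group(1), int(match.group(2))
    # Assumption: a whole number of weeks expressed in days counts as weekly.
    if unit == "days" and count % 7 == 0:
        return "weekly"
    return LABELS[unit]

# Hypothetical inventory; only process-click-convert appears in the slides.
processes = {
    "process-click-convert": "minutes(30)",
    "process-daily-report": "days(1)",
    "process-weekly-rollup": "days(7)",
}

groups = defaultdict(list)
for name, frequency in sorted(processes.items()):
    groups[frequency_group(frequency)].append(name)

for label, names in sorted(groups.items()):
    print(label, names)
```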
Challenges
• Tightly integrated with Oozie
• Monitoring and onboarding need to be streamlined
• Real-time change in schedule, time, queues
Advantages
• Development is very aggressive
• Industry is adopting it quickly
• Once onboarded, focus only needs to be on the set of critical processes
• Easy shutdown and upgrade, as all the running jobs are managed by Oozie
• DevOps can easily set up and manage data
Thank You