Amazon Simple Workflow Service (SWF): How Beamr uses SWF for video optimization in the cloud
Dan Julius, VP R&D, Beamr
There are few times when something new comes along that makes me think the following:
- Really? Nahh...
- Hmm.. Maybe..
- OMG!! How the f*** does it do that?
- GIVE IT TO ME NOW!!
“Lens In The Face
- Optimized ±5B Photos in 2015
- Reduces JPEG file size by 20%-80%
- Community of tens of thousands of photographers
- Free trial available at jpegmini.com
- Enterprise and consumer products
SOURCE: 1080p @ 3.5 Mbps
BEAMR VIDEO: 1080p @ 2.1 Mbps
*Images are courtesy of Universal Studios
REDUCED BY 40%
BEAMR KEY FEATURES
©2016 Beamr Imaging Ltd. All Rights Reserved
- Bitrate reduction: Significant reduction in bitrate or file size
- Always safe: Retains original image or video quality
- Fully automatic: No additional QC steps required
- Standard compliant: No modification needed for existing players
Agenda
1. Why workflows?
2. SWF Concepts
3. Tips and Gotchas
4. Q&A
For Python → Ask me later
Java/Ruby → Check out the Flow Framework
No Code
Activity Workers
Responsible for doing the “work”
Typically a long-running process
Poll task-list → Do work
While True:
    Poll an SWF task-list
    Process the task
    Return result to SWF
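The loop above can be sketched in Python. This is a minimal sketch, not the code used in the talk: it assumes `client` behaves like `boto3.client("swf")`, and `run_activity_worker`, `handler` and `max_polls` are hypothetical names introduced only for illustration.

```python
import json


def run_activity_worker(client, domain, task_list, handler, max_polls=None):
    """Poll one task-list, run handler on each task, report the outcome.

    client is assumed to behave like boto3.client("swf"). handler is a
    hypothetical function that takes the task input string and returns a
    JSON-serializable result. max_polls only exists so the loop can be
    stopped in a test; a real worker would loop forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        # Long poll: SWF answers without a taskToken when nothing is queued.
        task = client.poll_for_activity_task(
            domain=domain, taskList={"name": task_list}
        )
        token = task.get("taskToken")
        if not token:
            continue
        try:
            result = handler(task.get("input", ""))
        except Exception as exc:  # report every failure back to SWF
            client.respond_activity_task_failed(
                taskToken=token, reason=type(exc).__name__, details=str(exc)
            )
        else:
            client.respond_activity_task_completed(
                taskToken=token, result=json.dumps(result)
            )
```

Passing the client in as a parameter keeps the loop free of credentials and makes it easy to exercise with a stand-in object.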
Deciders
Responsible for making workflow decisions
Typically a long-running process
While True:
    Poll task-list for a decision
    Analyze execution history
    Make a decision on next step(s)
    Return result to SWF
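A decider loop has the same shape. The sketch below assumes `client` behaves like `boto3.client("swf")`; `run_decider`, `decide` and `max_polls` are hypothetical names, and only the first page of history is passed to `decide`.

```python
def run_decider(client, domain, task_list, decide, max_polls=None):
    """Poll for decision tasks and answer each one with decide(events).

    decide is a hypothetical callable: it receives the list of history
    events and returns a list of SWF decision dicts. max_polls only exists
    so the loop can be stopped in a test.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        task = client.poll_for_decision_task(
            domain=domain, taskList={"name": task_list}
        )
        token = task.get("taskToken")
        if not token:
            continue  # the long poll timed out with nothing to decide
        decisions = decide(task.get("events", []))
        client.respond_decision_task_completed(
            taskToken=token, decisions=decisions
        )
```

Keeping `decide` a pure function of the history is one way to keep the decider process itself stateless.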
SWF Responsibilities
Coordinate system components
Schedule “decision tasks” (a queue?)
Schedule “activity tasks” (a queue?)
Maintain workflow state
Catch errors and timeouts
Provide an API to track workflow-execution progress
Does No work. Makes No decisions.
Example of Two-Task Workflow Life Cycle
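Participants: SWF, Decider, Worker A, Worker B
Flow: START → Task A → Task B → END
1. start execution (“starting...”)
2. SWF → Decider: New execution. What to do?
3. Decider → SWF: Schedule A
4. SWF → Worker A: A task for you...
5. Worker A → SWF: Completed. Result is...
6. SWF → Decider: New history. What now?
7. Decider → SWF: Schedule B
8. SWF → Worker B: A task for you...
9. Worker B → SWF: Completed. Result is...
10. SWF → Decider: New history. What now?
11. Decider → SWF: Close. Result is...
Along the way: “Are we done?” “No.” ... “Are we done?” “Yes.”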
Tips for Activity Workers
Do only one thing
Be Stateless
Catch all exceptions and return failure
Send heartbeats (and check responses)
While True:
    Poll an SWF task-list
    Process the task
    Return result to SWF
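The heartbeat tip can be sketched as follows. It assumes `client` behaves like `boto3.client("swf")`, whose `record_activity_task_heartbeat` reply carries a `cancelRequested` flag; `process_with_heartbeats`, `TaskCancelled` and the idea of splitting the work into `steps` are hypothetical.

```python
class TaskCancelled(Exception):
    """Raised when SWF reports that cancellation was requested."""


def process_with_heartbeats(client, task_token, steps):
    """Run steps (zero-argument callables), sending a heartbeat before each.

    The heartbeat response is checked every time, so the worker stops early
    instead of finishing work nobody is waiting for.
    """
    results = []
    for index, step in enumerate(steps):
        reply = client.record_activity_task_heartbeat(
            taskToken=task_token, details=f"starting step {index + 1}"
        )
        if reply.get("cancelRequested"):
            raise TaskCancelled(f"cancel requested before step {index + 1}")
        results.append(step())
    return results
```

A caller would typically catch `TaskCancelled` and report the cancellation back to SWF rather than a completion.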
Tips for Deciders
Focus on Decision Making. Avoid doing any work.
Be Stateless
Decide based on entire* execution history
While True:
    Poll SWF for an execution
    Analyze execution history
    Make a decision on next step(s)
    Return result to SWF
Parallelize tasks when appropriate by returning multiple decisions.
Catch all exceptions and always return a valid decision
Expect the unexpected... (e.g. a decision may be rejected)
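Two of the tips above, returning multiple decisions and always returning a valid decision, can be sketched together. The decision dicts follow the SWF `ScheduleActivityTask` and `FailWorkflowExecution` shapes; `schedule`, `fan_out` and `safe_decide` are hypothetical helpers, and failing the workflow on a decider bug is only one possible policy.

```python
def schedule(name, version, activity_id, task_input):
    """Build one ScheduleActivityTask decision.

    No taskList is set here, which assumes the activity type was registered
    with a default task list.
    """
    return {
        "decisionType": "ScheduleActivityTask",
        "scheduleActivityTaskDecisionAttributes": {
            "activityType": {"name": name, "version": version},
            "activityId": activity_id,
            "input": task_input,
        },
    }


def fan_out(names, task_input, version="1.0"):
    """Return one decision per activity so SWF can run them in parallel."""
    return [schedule(n, version, f"{n}-1", task_input) for n in names]


def safe_decide(decide, events):
    """Call decide(events); on any exception, fail the workflow explicitly."""
    try:
        return decide(events)
    except Exception as exc:
        return [{
            "decisionType": "FailWorkflowExecution",
            "failWorkflowExecutionDecisionAttributes": {
                "reason": "decider error",
                "details": str(exc)[:1000],
            },
        }]
```

For example, `fan_out(["Optimize", "Thumbnail"], "file-123")` yields two decisions that can be returned in a single response.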
The Execution History
[
  {
    "eventId": 11,
    "eventTimestamp": 1326671603.102,
    "eventType": "WorkflowExecutionTimedOut",
    ...
  },
  ...
  {
    "activityTaskScheduledEventAttributes": {
      "activityId": "verification-27",
      "activityType": { "name": "activityVerify", "version": "1.0" },
      ...
      "input": "5634-0056-4367-0923,12/12,437",
      ...
    },
    "eventId": 8,
    ...
    "eventType": "ActivityTaskScheduled"
  },
  ...
  {
    ...
    "eventId": 2,
    "eventTimestamp": 1326668003.094,
    "eventType": "DecisionTaskScheduled"
  }
]
The workflow-execution “state”
It’s just JSON
- Every event is noted
- Decisions are noted
- Scheduled tasks are noted
- Start and Finish are noted
- …
How to make a decision?
- Read the execution history
- Apply some “logic”
- Return one or more decisions
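The three steps above can be sketched for a two-task workflow. The event and decision field names follow the SWF history format; the activity names in `WORKFLOW_STEPS`, the version `"1.0"` and both functions are hypothetical, and failed or timed-out activities are deliberately not handled.

```python
WORKFLOW_STEPS = ("TaskA", "TaskB")  # hypothetical activity type names


def scan_history(events):
    """Return (names of scheduled activities, {activity name: result})."""
    scheduled = {}  # eventId of ActivityTaskScheduled -> activity name
    results = {}
    # Sort first: the history may arrive in reverse order.
    for event in sorted(events, key=lambda e: e["eventId"]):
        kind = event["eventType"]
        if kind == "ActivityTaskScheduled":
            attrs = event["activityTaskScheduledEventAttributes"]
            scheduled[event["eventId"]] = attrs["activityType"]["name"]
        elif kind == "ActivityTaskCompleted":
            attrs = event["activityTaskCompletedEventAttributes"]
            name = scheduled[attrs["scheduledEventId"]]
            results[name] = attrs.get("result", "")
    return set(scheduled.values()), results


def decide(events):
    """Schedule the next unfinished step, or close the workflow."""
    scheduled, results = scan_history(events)
    previous = ""
    for name in WORKFLOW_STEPS:
        if name in results:
            previous = results[name]
            continue
        if name in scheduled:
            return []  # already scheduled and still running
        return [{
            "decisionType": "ScheduleActivityTask",
            "scheduleActivityTaskDecisionAttributes": {
                "activityType": {"name": name, "version": "1.0"},
                "activityId": name,
                "input": previous,
            },
        }]
    return [{
        "decisionType": "CompleteWorkflowExecution",
        "completeWorkflowExecutionDecisionAttributes": {"result": previous},
    }]
```

Because the decision depends only on the history passed in, the same decider process can serve any number of executions.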
CLI Demo
A distributed “Image Processing” workflow
Prerequisites:
- AWS Account, IAM Credentials
- AWS CLI
- bash / jq / ImageMagick / JPEGmini
CLI Demo - Registration
Register Domain / Workflow / Activity Types
We use “code” to configure SWF. No CloudFormation available.
LOFT_Domain
demo (Domain)
Workflow: Process Media
Activities: Ingest, Optimize, Thumbnail, Cleanup
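The demo itself uses the AWS CLI. As a rough Python equivalent, registration could look like the sketch below. It assumes `client` behaves like `boto3.client("swf")`; the workflow name `ProcessMedia`, the version `1.0`, the one-day retention period and `register_all` itself are assumptions made for illustration.

```python
ACTIVITY_TYPES = ("Ingest", "Optimize", "Thumbnail", "Cleanup")


def register_all(client, domain="demo", workflow="ProcessMedia", version="1.0"):
    """Register the domain, one workflow type and the four activity types.

    Running this twice against the same account raises an "already exists"
    error from SWF, which a real script would need to handle.
    """
    client.register_domain(
        name=domain,
        # The API expects the retention period as a string of days.
        workflowExecutionRetentionPeriodInDays="1",
    )
    client.register_workflow_type(domain=domain, name=workflow, version=version)
    for name in ACTIVITY_TYPES:
        client.register_activity_type(domain=domain, name=name, version=version)
```

Default task lists and timeouts can also be supplied at registration time; they are left out here to keep the sketch short.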
CLI Demo - Process Media Workflow
START → Ingest → (Optimize, Thumbnail) → Cleanup → END
Ingest: Download from Web to EFS
Optimize, Thumbnail: Process files on EFS, upload results to S3
Cleanup: Remove files from EFS
Some Advanced Topics
Child Workflows
- Simplify large workflows
- Reuse child-workflows
Signals
- Sent from somewhere (external / activity)
- SWF will schedule a Decision-Task
[
  ...
  {
    "eventId": 153,
    "eventTimestamp": 1326671603.102,
    "eventType": "WorkflowExecutionSignaled",
    "workflowExecutionSignaledEventAttributes": { ... }
    ...
  },
  ...
]
High Availability
REST API
At least two deciders, and two workers of each type, across multiple zones
The Execution History
Beware of the “LastEvent” samples
Note the previousStartedEventId
When the history gets long, note the nextPageToken.
Pages are small (100 items). Beware of rate limits.
- Cache history pages
- Use ChildWorkflows or ContinueAsNewWorkflowExecution
- Use short names in input and results fields
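Following `nextPageToken` can be sketched as below. It assumes `client` behaves like `boto3.client("swf")`, where further pages of a decision task are fetched by calling `poll_for_decision_task` again with the token; `full_history` is a hypothetical helper, and rate-limit handling and caching are left out.

```python
def full_history(client, domain, task_list, first_page):
    """Collect every history event for one decision task.

    first_page is the response of the initial poll_for_decision_task call.
    Each follow-up call passes the previous page's nextPageToken until no
    token is returned.
    """
    events = list(first_page.get("events", []))
    token = first_page.get("nextPageToken")
    while token:
        page = client.poll_for_decision_task(
            domain=domain,
            taskList={"name": task_list},
            nextPageToken=token,
        )
        events.extend(page.get("events", []))
        token = page.get("nextPageToken")
    return events
```

Every extra page is another API call, which is why long histories run into the rate limits mentioned above.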
Monitoring and Scaling
Processes to Monitor
- Monitor each decider process
- Monitor each worker process
Metrics to Monitor
- Activity task list sizes
- Decider task list sizes
- DecisionTaskScheduleToStartTime
- ActivityTaskScheduleToStartTime
Scale Workers with Spot instances
- Use short timeouts
- Heartbeats
Things Will Break...
API calls may fail, therefore back off and retry
- Rate limiting
- Network errors
Expect the unexpected:
- Decisions or Responses may be rejected
- Closing a workflow execution could fail due to pending signals
- An Activity Worker response may fail due to cancelled execution
- Events may be aggregated, or delivered out of order
- Task lists are only mostly FIFO
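“Back off and retry” is commonly implemented as exponential backoff with random jitter. The sketch below is generic Python, not tied to any SWF library; `call_with_backoff` and its parameters are hypothetical, and in practice `retry_on` would be narrowed to throttling and network errors.

```python
import random
import time


def call_with_backoff(func, *, retries=5, base_delay=0.5, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on failure with capped exponential backoff.

    The wait before retry number n is drawn uniformly from
    [0, min(max_delay, base_delay * 2**n)] so that many workers do not
    retry in lockstep. The last failure is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Usage would wrap a single API call, for example `call_with_backoff(lambda: client.poll_for_activity_task(...))`, where `client` is whatever SWF client the worker already holds.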
Dan Julius
VP R&D, Beamr
Workflows vs. Messages
A complex workflow requires more than just passing
messages from task to task…
Workflows have “state”
Tasks might need to be synchronized
Tasks can fail or timeout
How SWF Works (maybe)
[Diagram: SWF holds the execution history and three task-lists, one each for the Decider, Worker A and Worker B. Each process polls its own task-list.]
The Manual Worker
The worker process:
While True:
    Poll an SWF task-list
    Send email to manual-worker
The employee:
When an email is received, process the task and submit the completion form
[Diagram: SWF → Worker Process → Manual Labour (email); Manual Labour → Submit Form → WebServer → Respond to SWF]
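What makes the manual worker possible is that the task token can leave the polling process and be used later by the web server. A minimal sketch, assuming `client` behaves like `boto3.client("swf")`; `forward_task_to_human`, `complete_from_form` and `send_email` are hypothetical names.

```python
def forward_task_to_human(client, domain, task_list, send_email):
    """Poll once and hand the task to a person instead of processing it.

    send_email is a hypothetical callable; it must deliver the task token
    (for example inside a form link) so the answer can be matched later.
    Returns the token, or None when the poll came back empty.
    """
    task = client.poll_for_activity_task(
        domain=domain, taskList={"name": task_list}
    )
    token = task.get("taskToken")
    if not token:
        return None
    send_email(task_token=token, task_input=task.get("input", ""))
    return token


def complete_from_form(client, task_token, form_result):
    """Called by the web server when the employee submits the form."""
    client.respond_activity_task_completed(
        taskToken=task_token, result=form_result
    )
```

The activity timeouts would need to be long enough to cover a human response time.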
More Advanced Topics
Markers
- Added by deciders
- Simplify understanding history
- “Milestones”
Timers
- Set by deciders
- SWF will schedule a Decision-Task when timer fires
Priorities
- SWF can re-order tasks based on priority
[
  ...
  {
    "eventId": 153,
    "eventTimestamp": 1326671603.102,
    "eventType": "TimerFired",
    "timerFiredEventAttributes": { ... }
    ...
  },
  ...
]
[
  ...
  {
    "eventId": 153,
    "eventTimestamp": 1326671603.102,
    "eventType": "MarkerRecorded",
    "markerRecordedEventAttributes": { ... }
    ...
  },
  ...
]
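Markers and timers are both requested by returning decisions. The sketch below builds the `StartTimer` and `RecordMarker` decision dicts and reads `TimerFired` events back out of a history; the three helper names are hypothetical.

```python
def start_timer(timer_id, seconds):
    """Decision asking SWF to fire a timer after the given delay."""
    return {
        "decisionType": "StartTimer",
        "startTimerDecisionAttributes": {
            "timerId": timer_id,
            # SWF expects the duration as a string of seconds.
            "startToFireTimeout": str(seconds),
        },
    }


def record_marker(name, details=""):
    """Decision that writes a named milestone into the history."""
    return {
        "decisionType": "RecordMarker",
        "recordMarkerDecisionAttributes": {
            "markerName": name,
            "details": details,
        },
    }


def fired_timers(events):
    """Return the timerId of every TimerFired event in the history."""
    return [
        event["timerFiredEventAttributes"]["timerId"]
        for event in events
        if event["eventType"] == "TimerFired"
    ]
```

A decider could, for instance, return `[record_marker("ingest-done"), start_timer("retry-1", 30)]` in one response and look for `"retry-1"` in `fired_timers(events)` on a later decision task.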
Some Concepts
Workflow, Activity, Task, State, Logical Flow
Activity Workers, Asynchronous processing, Distributed processing, Scalability
Domain, Activity Type, Activity Version, Activity Timeouts, Results
Decider, Input Data, Events, Workflow execution, Execution History
Polling, workflowId + runId, Workflow Starters, Decisions
Activity Task, Lambda Task, Decision Task, Task Lists