philip-watts
PROBLEM STATEMENT
“FLEXING BETWEEN THE CLOUDS”
▸ Goals of Virtualization seem universally applicable
▸ !(Vendor Lock-in)
▸ Not all workloads are valued equally
=> IT Magic Anywhere
SUCCESS CRITERIA
WIN CONDITIONS
‣ Availability of compute resources is independent of the cloud provider
‣ Batch jobs can be allocated based on point in time cost metrics
‣ Work segregation based on compliance qualifications
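The second and third win conditions can be sketched together in a few lines of Go. This is only an illustration under stated assumptions: the `Datacenter` type, its `HourlyCost` field, the compliance tags, and the prices are all hypothetical, and nothing here is part of Nomad itself.

```go
package main

import "fmt"

// Datacenter is a hypothetical record describing one provider location.
type Datacenter struct {
	Name       string
	HourlyCost float64         // assumed point-in-time price per node-hour
	Compliance map[string]bool // assumed tags, e.g. "pci_dss": true
}

// cheapestCompliant returns the lowest-cost datacenter that carries every
// required compliance tag. The bool is false when nothing qualifies.
func cheapestCompliant(dcs []Datacenter, required []string) (Datacenter, bool) {
	var best Datacenter
	found := false
	for _, dc := range dcs {
		ok := true
		for _, tag := range required {
			if !dc.Compliance[tag] {
				ok = false
				break
			}
		}
		if !ok {
			continue
		}
		if !found || dc.HourlyCost < best.HourlyCost {
			best, found = dc, true
		}
	}
	return best, found
}

func main() {
	// Made-up prices, for illustration only.
	dcs := []Datacenter{
		{Name: "aws-dc", HourlyCost: 0.12, Compliance: map[string]bool{"pci_dss": true}},
		{Name: "gce-dc", HourlyCost: 0.09, Compliance: map[string]bool{}},
		{Name: "azure-dc", HourlyCost: 0.10, Compliance: map[string]bool{"pci_dss": true}},
	}
	if dc, ok := cheapestCompliant(dcs, []string{"pci_dss"}); ok {
		fmt.Println("run batch job in", dc.Name)
	}
}
```

The cheapest location overall (`gce-dc`) is skipped here because it lacks the required tag, which is the "not all workloads are valued equally" point in miniature.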
TOOLCHAIN
MY CURRENT “FAVORITE” TOYS
Resources
Image Creation
Infrastructure Provisioning
Service Discovery
Scheduler
Driver
DEFINITIONS: RESOURCE CONTEXT
THE BANE OF TECHNICAL UNDERSTANDING (AKA WORDS):
▸ Region: The isolation boundary of a Nomad Cluster
▸ Datacenter: Low latency, high bandwidth, private network
▸ Resources: The available capacity provided by a node
Common / Comfortable Pattern
        Region        Datacenter
AWS     Continental   AWS_Region
GCE     Continental   GCE_Region
Azure   Location      Location

Ideal Pattern
        Region   Datacenter
AWS     Global   AWS_Region
GCE     Global   GCE_Region
Azure   Global   Sets of Locations
NOMAD ARCHITECTURE - SINGLE REGION VIEW
BDFL FOR WORKLOAD DECISIONS
‣ In Nomad, Datacenters can speak to Region-aware Servers
‣ Datacenters don’t need to be the same platform
‣ Default Region is “global”
ARCHITECTURE OF SOLUTION
▸ Nomad Clients potentially provide Resources for Jobs
▸ Communication between Datacenters may need to be secured
▸ Nodes run a Consul Agent and Nomad Client
▸ Nomad Servers “Bin Pack” tasks onto nodes
THREE PICTURES OF THE SAME THING
Single Region / Multi Datacenter (different Clouds)
DEFINITIONS: TASK CONTEXT
WORDS: THE SEQUEL
▸ Task: Desired state declaration of workload
▸ Constraints: Rules limiting where a job can run
▸ Evaluations: Queued request to compare desired and present state of work over the region
▸ Caused by a state change event
▸ Job Completion
▸ Node Addition/Subtraction
▸ Job Scheduled
▸ Allocations: Mapping of tasks to resources within constraints
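The idea behind an evaluation, comparing desired state with present state, can be sketched as a small diff. The `reconcile` function and the task names below are hypothetical and only illustrate the concept. Nomad's real evaluation logic is considerably richer.

```go
package main

import "fmt"

// reconcile compares desired and running instance counts per task name.
// A positive result means instances to start, a negative one instances to stop.
func reconcile(desired, running map[string]int) map[string]int {
	diff := make(map[string]int)
	for name, want := range desired {
		if d := want - running[name]; d != 0 {
			diff[name] = d
		}
	}
	// Anything running that is no longer declared should be stopped.
	for name, have := range running {
		if _, ok := desired[name]; !ok && have > 0 {
			diff[name] = -have
		}
	}
	return diff
}

func main() {
	desired := map[string]int{"web": 3, "cache": 1}
	running := map[string]int{"web": 1, "cache": 1, "old-batch": 2}
	fmt.Println(reconcile(desired, running))
}
```

Each state change event listed above (a job completing, a node joining or leaving, a job being scheduled) would trigger another pass of this kind of comparison.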
JOB TYPES: SERVICE
KEEPING THE SITE UP
▸ Long running jobs that should always be available
▸ Scheduling decisions favor QoS
▸ Example: Ensuring a front end web service is always available
JOB TYPES: BATCH
WHAT TO DO WITH ALL THIS DATA?
▸ A set of work spanning a few minutes to a few days
▸ Based on the Berkeley Sparrow Two Choices model
▸ http://people.eecs.berkeley.edu/~keo/publications/sosp13-final17.pdf
▸ Probes a set of nodes which meet constraints and sends work to the "least loaded" nodes
▸ Example: Tasks to manipulate a queue of data when present
JOB TYPES: SYSTEM
KEEPING THE LIGHTS ON
▸ A unique job type used to declare jobs which should run on every node which meets the job constraints
▸ Are re-evaluated whenever a node joins the cluster
▸ Example: distributing common tasks, which can benefit from rolling updates, job updates, service discovery
NOMAD SCHEDULING INTERNALS
GETTING FROM WORK AND RESOURCES TO ACCOMPLISHMENTS
▸ Evaluations read the Job Specification and find constraints
▸ Evaluation Brokers maintain the pending queue, priority, and at-least-once delivery
▸ Schedulers submit an Allocation Plan, which is checked for feasibility and then ordered by priority
▸ Allocations set jobs against resources
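The broker's responsibilities (a pending queue, priority ordering, at-least-once delivery) can be sketched with Go's `container/heap`. The `Broker`, `Enqueue`, `Dequeue`, `Ack`, and `Nack` names are hypothetical, and keying outstanding work by job ID is a simplification made for this sketch rather than a description of Nomad's internals.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Evaluation is a hypothetical queued request for one job.
type Evaluation struct {
	JobID    string
	Priority int
	seq      int // arrival order, used to break priority ties
}

// evalQueue implements heap.Interface: highest priority first, then FIFO.
type evalQueue []*Evaluation

func (q evalQueue) Len() int { return len(q) }
func (q evalQueue) Less(i, j int) bool {
	if q[i].Priority != q[j].Priority {
		return q[i].Priority > q[j].Priority
	}
	return q[i].seq < q[j].seq
}
func (q evalQueue) Swap(i, j int)       { q[i], q[j] = q[j], q[i] }
func (q *evalQueue) Push(x interface{}) { *q = append(*q, x.(*Evaluation)) }
func (q *evalQueue) Pop() interface{} {
	old := *q
	n := len(old)
	e := old[n-1]
	*q = old[:n-1]
	return e
}

// Broker keeps pending evaluations plus those handed out but not yet acked.
type Broker struct {
	pending evalQueue
	unacked map[string]*Evaluation
	nextSeq int
}

func NewBroker() *Broker { return &Broker{unacked: map[string]*Evaluation{}} }

func (b *Broker) Enqueue(jobID string, priority int) {
	heap.Push(&b.pending, &Evaluation{JobID: jobID, Priority: priority, seq: b.nextSeq})
	b.nextSeq++
}

// Dequeue hands out the highest-priority evaluation and tracks it until Ack.
func (b *Broker) Dequeue() *Evaluation {
	if b.pending.Len() == 0 {
		return nil
	}
	e := heap.Pop(&b.pending).(*Evaluation)
	b.unacked[e.JobID] = e
	return e
}

// Ack marks an evaluation as processed.
func (b *Broker) Ack(jobID string) { delete(b.unacked, jobID) }

// Nack puts an unacknowledged evaluation back on the queue for redelivery,
// which is what gives at-least-once semantics in this sketch.
func (b *Broker) Nack(jobID string) {
	if e, ok := b.unacked[jobID]; ok {
		delete(b.unacked, jobID)
		heap.Push(&b.pending, e)
	}
}

func main() {
	b := NewBroker()
	b.Enqueue("batch-report", 20)
	b.Enqueue("web-frontend", 80)
	e := b.Dequeue()
	fmt.Println("first delivery:", e.JobID)
	b.Nack(e.JobID) // scheduler failed, so the evaluation is redelivered
	fmt.Println("redelivered:", b.Dequeue().JobID)
}
```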
BIN PACKING
LIKE TETRIS FOR WORKLOADS
▸ Tasks require resources
▸ Nodes have “dimensions” of resources
▸ Allocation fits Tasks inside Nodes
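The Tetris analogy can be sketched as a best-fit placement over two resource dimensions. The `Resources` and `Node` types, the scoring formula, and the numbers are assumptions chosen for illustration (nodes are assumed to have non-zero capacity). Nomad's own scoring is not reproduced here.

```go
package main

import "fmt"

// Resources is a hypothetical two-dimensional resource vector.
type Resources struct {
	CPU      int // MHz
	MemoryMB int
}

// Node is a hypothetical node with total and already-used capacity.
type Node struct {
	Name  string
	Total Resources
	Used  Resources
}

// place picks the node where the task fits most tightly (least capacity
// left over), records the usage, and returns the node index or -1.
func place(nodes []Node, need Resources) int {
	best, bestScore := -1, 0.0
	for i, n := range nodes {
		freeCPU := n.Total.CPU - n.Used.CPU - need.CPU
		freeMem := n.Total.MemoryMB - n.Used.MemoryMB - need.MemoryMB
		if freeCPU < 0 || freeMem < 0 {
			continue // the task does not fit in at least one dimension
		}
		// Lower score means less leftover capacity, i.e. a tighter pack.
		score := float64(freeCPU)/float64(n.Total.CPU) +
			float64(freeMem)/float64(n.Total.MemoryMB)
		if best == -1 || score < bestScore {
			best, bestScore = i, score
		}
	}
	if best >= 0 {
		nodes[best].Used.CPU += need.CPU
		nodes[best].Used.MemoryMB += need.MemoryMB
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "nearly-full", Total: Resources{4000, 8192}, Used: Resources{3000, 6144}},
		{Name: "empty", Total: Resources{4000, 8192}},
	}
	i := place(nodes, Resources{CPU: 500, MemoryMB: 1024})
	fmt.Println("placed on", nodes[i].Name)
}
```

Filling the nearly full node first leaves the empty node free for a larger task later, or free to be shut down, which is where the cost benefit of bin packing comes from.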
TASK GROUPS
PREVENTING TASK SEPARATION ANXIETY
▸ Task Groups allow multiple Tasks to require that they are scheduled on the same node
▸ Are created implicitly for single tasks in isolation
▸ Can be used to enforce compliance elements required to run together
▸ Example: Requiring log shipping co-processes
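Because every task in a group lands on the same node, the scheduler has to find one node that can hold their combined ask. A minimal sketch, with hypothetical `Task` and `TaskGroup` types and made-up numbers:

```go
package main

import "fmt"

// Task is a hypothetical unit of work with its resource ask.
type Task struct {
	Name     string
	CPU      int // MHz
	MemoryMB int
}

// TaskGroup is a hypothetical set of tasks that must share one node.
type TaskGroup struct {
	Name  string
	Tasks []Task
}

// Ask returns the combined CPU and memory a single node must supply
// to host every task in the group.
func (g TaskGroup) Ask() (cpu, memoryMB int) {
	for _, t := range g.Tasks {
		cpu += t.CPU
		memoryMB += t.MemoryMB
	}
	return cpu, memoryMB
}

func main() {
	g := TaskGroup{
		Name: "webservice",
		Tasks: []Task{
			{Name: "frontend", CPU: 500, MemoryMB: 256},
			{Name: "log-shipper", CPU: 100, MemoryMB: 64},
		},
	}
	cpu, mem := g.Ask()
	fmt.Printf("group %q needs %d MHz and %d MB on one node\n", g.Name, cpu, mem)
}
```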
CONSTRAINTS
JUST BECAUSE YOU CAN, DOESN’T MEAN YOU SHOULD
▸ Job Constraints limit the resources available for a particular job group
▸ Constraints can map workloads directly to Customized Hardware such as AWS Placement Groups
CONSTRAINTS AND COMPLIANCE
SATISFYING COMPLIANCE REQUIREMENTS
▸ Constraints on datacenter can be used for Data Isolation inside National Boundaries.
▸ Healthcare workload that must stay within the EU
▸ Metadata attributes can allow for custom declarations.
▸ E.g. PCI DSS Compliance:
▸ Maintain network firewall
▸ Run Anti-Malware/Anti-Virus protection
▸ Monitor and log access
▸ Regularly test security systems and procedures
job "sample_service" {
  ...
  meta {
    pci_dss = true
  }
  group "webservice" {
    constraint {
      attribute = "${meta.pci_dss}"
      value     = "true"
    }
  }
}
Constraint Snippet
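What a constraint does during scheduling can be sketched as a filter over candidate nodes. The `Constraint` and `Node` types and the attribute keys below are hypothetical, and only equality matching is shown.

```go
package main

import "fmt"

// Constraint is a hypothetical equality rule on a node attribute.
type Constraint struct {
	Attribute string // e.g. "meta.pci_dss" or "datacenter"
	Value     string
}

// Node is a hypothetical node with a flat attribute map.
type Node struct {
	Name  string
	Attrs map[string]string
}

// feasible returns the nodes that satisfy every constraint.
func feasible(nodes []Node, constraints []Constraint) []Node {
	var out []Node
	for _, n := range nodes {
		ok := true
		for _, c := range constraints {
			if n.Attrs[c.Attribute] != c.Value {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "eu-pci", Attrs: map[string]string{"datacenter": "eu-dc", "meta.pci_dss": "true"}},
		{Name: "eu-plain", Attrs: map[string]string{"datacenter": "eu-dc"}},
		{Name: "us-pci", Attrs: map[string]string{"datacenter": "us-dc", "meta.pci_dss": "true"}},
	}
	rules := []Constraint{
		{Attribute: "datacenter", Value: "eu-dc"},
		{Attribute: "meta.pci_dss", Value: "true"},
	}
	for _, n := range feasible(nodes, rules) {
		fmt.Println("eligible:", n.Name)
	}
}
```

Combining a datacenter rule with a metadata rule is how the "healthcare workload that must stay within the EU" case and the PCI DSS case can both be expressed with the same mechanism.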
CONSTRAINTS: SATISFYING SPECIAL NEEDS
DIFFERENT THINGS ARE DIFFERENT
▸ Not all platforms are created equal
▸ Platform attributes for specifying Cloud Platforms
job "sample_service" {
  ...
  constraint {
    attribute = "${attr.platform}"
    value     = "aws"
  }
}
▸ ${attr.platform} = aws may be relevant if you need Float (GPU) processing, which AWS offers and GCE doesn’t
RAW EXECS
CHEKHOV’S TASK DRIVER
▸ Unconstrained, Un-isolated, Disabled by Default
▸ Runs as the user Nomad is running as
client {
  options = {
    "driver.raw_exec.enable" = "1"
  }
}
“IT SEEMS TO BE A DEEP INSTINCT IN HUMAN BEINGS FOR MAKING EVERYTHING COMPULSORY THAT ISN'T FORBIDDEN”
~Robert A. Heinlein
OPERATOR INTERACTION
RELIABLE MAGIC = OPERATIONS
$ nomad run -address=$nomad_server jobfile.nomad
‣ Operators schedule jobs against a server
‣ Nomad figures out how/where/when to run tasks
‣ Complex solution through iteration
Phil Watts DevOps Artificer @ REĀN Cloud
@pwattstbd github.com/marsupermammal
www.reancloud.com
import "os"
func presentation() { os.Exit(0) }