Art of Cloud Workload Translation



Vision | Traction | Results

Welcome

Stuart Stafford – Chief Technology Officer

The Art of Workload Translation

How it works in the real world!

Agenda

WHAT IS WORKLOAD TRANSLATION?

FIRST, KNOW YOURSELF

THINKING DIFFERENTLY

IDENTIFYING THE OPPORTUNITIES (AND THE LIMITATIONS)

TOP TIPS

Q&A

What is Workload Translation?

• The process of adapting a workload from one infrastructure paradigm to another.

• Involves revisiting architecture and sizing decisions from a different perspective using different criteria.

• Not solely a technical exercise.

• Can cover a whole environment or a single point solution.

• Art vs Science.

Paradigm Change (Airline vs Passengers)

Typical On-Premises | Cloud
Large upfront sunk costs | Fine-grained consumption costs
Manual processes | Highly automated
Full access (perceived freedom) | Operational control (perceived restrictions)
Enduring assets | Disposable resources
BAU (business as usual) focussed | Value focussed
Components | Services

The Wider View

• Technical differences.

• Financial model.

• Skills requirements.

• Vendor support.

• Licensing.

• Business and legislative requirements.

Art vs Science

• Tools and calculations are valuable – to a point.

• Hands-on experience and observation are crucial.

• Challenge the numbers.

• Business drivers, available skills and other “soft” factors can override technical considerations.

• An element of judgement is always required.

First, know yourself!

• How well do you know your existing environment?

• The right information reduces the risk and improves the outcome.

• “Pay for use” actually means “Pay for what you turn on even if you don’t use it”. What aren’t you using?

• Estimation is often required.

• Remember to think about business priorities and objectives.
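Because you pay for what is switched on rather than what is used, a first pass over existing monitoring data can flag instances that are running but doing little. The sketch below is illustrative only: the sample data, instance names and both thresholds are hypothetical assumptions, and checking the peak as well as the average stops a briefly busy batch server from being mislabelled as idle.

```python
from statistics import mean

# Hypothetical input: CPU utilisation samples (percent) per instance,
# e.g. exported from your existing monitoring tool.
samples = {
    "app-01": [55.0, 62.5, 71.0, 48.0],
    "test-07": [1.5, 0.8, 2.1, 1.2],
    # Mostly quiet, but with a short burst of real work.
    "batch-03": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 25.0],
}

def find_idle(samples, avg_threshold=5.0, peak_threshold=20.0):
    """Return instances whose average AND peak utilisation stay below the thresholds.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    idle = []
    for name, values in samples.items():
        if mean(values) < avg_threshold and max(values) < peak_threshold:
            idle.append(name)
    return sorted(idle)

print(find_idle(samples))
```

Here only `test-07` is flagged; `batch-03` has a low average but a real peak, so it needs observation before anyone turns it off.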

Instance Sizing

• At a minimum, you need the Peak, Average and “Profile” of the following for each instance:

  • CPU (MHz).

  • Disk IO per volume (IOPS).

  • Network throughput (MBPS).

• Understand the characteristics of each instance and the software running on it (background processing, real-time user responsiveness, memory hogs).

• Measure at the virtualisation layer.

• Know where the resource constraints are skewing the numbers.
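One way to turn raw samples into the Peak, Average and “Profile” figures above is sketched below, using a 95th percentile as the profile point. This is an assumption: the deck does not define “profile”, and the CPU samples and 98% saturation tolerance are hypothetical. The second function addresses constraints skewing the numbers: if many samples sit at the allocated capacity, the measured peak is probably lower than the real demand.

```python
import math
from statistics import mean

def profile(values, percentile=95):
    """Summarise a metric series as peak, average and a percentile 'profile' point."""
    ordered = sorted(values)
    # Nearest-rank percentile: smallest value with at least `percentile`% of samples <= it.
    rank = math.ceil(percentile / 100 * len(ordered))
    return {
        "peak": ordered[-1],
        "avg": mean(ordered),
        f"p{percentile}": ordered[max(rank, 1) - 1],
    }

def saturated_fraction(values, capacity, tolerance=0.98):
    """Fraction of samples at or near capacity. A high value suggests the
    measured peak understates real demand because the VM was constrained."""
    return sum(1 for v in values if v >= capacity * tolerance) / len(values)

# Hypothetical CPU samples in MHz for one VM with 4,000 MHz allocated.
cpu_mhz = [1200, 1500, 3990, 4000, 4000, 1800, 2200, 1600, 1400, 1300]

print(profile(cpu_mhz))
print(saturated_fraction(cpu_mhz, capacity=4000))
```

With 30% of samples pinned at the ceiling, this VM's 4,000 MHz peak should be treated as a lower bound rather than a sizing target.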

Metric Collection example

EC2 Instance selection
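Instance selection can be framed as “translate the measured demand into cloud units, then pick the cheapest type that satisfies every requirement”. The sketch below assumes a fixed MHz-per-vCPU conversion and a headroom multiplier, both of which you would tune. The catalogue is entirely hypothetical: the type names, sizes and prices are placeholders, not AWS data, and should be replaced with current figures for your region.

```python
import math
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    network_mbps: float
    hourly_cost: float

# HYPOTHETICAL catalogue: names and figures are placeholders, not AWS data.
CATALOGUE = [
    InstanceType("type-small", 2, 4, 500, 0.05),
    InstanceType("type-medium", 4, 16, 1000, 0.20),
    InstanceType("type-large", 8, 32, 2500, 0.40),
]

def required_vcpus(peak_mhz, mhz_per_vcpu, headroom=1.2):
    """Translate on-premises CPU demand (MHz) into a vCPU count.
    mhz_per_vcpu and headroom are assumptions to tune per workload."""
    return math.ceil(peak_mhz * headroom / mhz_per_vcpu)

def cheapest_fit(vcpus, memory_gib, network_mbps, catalogue=CATALOGUE):
    """Return the cheapest type meeting all requirements, or None if nothing fits."""
    fits = [t for t in catalogue
            if t.vcpus >= vcpus and t.memory_gib >= memory_gib
            and t.network_mbps >= network_mbps]
    return min(fits, key=lambda t: t.hourly_cost, default=None)

vcpus = required_vcpus(peak_mhz=4000, mhz_per_vcpu=2500)
choice = cheapest_fit(vcpus, memory_gib=12, network_mbps=800)
print(vcpus, choice.name if choice else "no fit")
```

In this example, memory rather than CPU drives the choice. That is a common outcome and a reason to check every dimension rather than only the CPU figure.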

Storage sizing

• IOPS is the starting point (Peak, Avg and Profile) but don’t ignore throughput.

• Cross-reference with latency to identify existing constraints.

• Beware the backups.

• Adjust the numbers if required.

• Understand the relationship between the storage performance of a volume and the workload delivered by it.
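The storage points above can be combined in a small sketch. Throughput is IOPS multiplied by IO size, so a modest IOPS figure with large IOs (typical of backups) can dominate the throughput peak. High-latency samples hint that the existing storage was the constraint, which means the recorded numbers may need adjusting upwards. The sample data, backup hours and 20 ms latency limit are all hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class IoSample:
    hour: int          # hour of day, 0-23
    iops: float
    avg_io_kib: float  # average IO size in KiB
    latency_ms: float

def throughput_mib_s(sample):
    """Throughput = IOPS x IO size; the same IOPS at larger IO sizes moves far more data."""
    return sample.iops * sample.avg_io_kib / 1024

def split_samples(samples, backup_hours, latency_limit_ms=20.0):
    """Drop backup-window samples and flag the remaining high-latency ones.

    backup_hours and latency_limit_ms are assumptions: use your own schedule
    and the latency your storage delivers when it is not constrained.
    """
    business = [s for s in samples if s.hour not in backup_hours]
    constrained = [s for s in business if s.latency_ms > latency_limit_ms]
    return business, constrained

# Hypothetical samples for a single volume.
samples = [
    IoSample(hour=2, iops=900, avg_io_kib=256, latency_ms=35.0),  # nightly backup
    IoSample(hour=10, iops=1200, avg_io_kib=8, latency_ms=4.0),
    IoSample(hour=14, iops=2500, avg_io_kib=8, latency_ms=28.0),  # likely constrained
]

business, constrained = split_samples(samples, backup_hours={1, 2, 3})
print(max(throughput_mib_s(s) for s in samples))   # peak driven by the backup
print(max(throughput_mib_s(s) for s in business))  # peak during normal use
print([s.hour for s in constrained])
```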

EBS Volume Selection
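For some EBS volume types, performance is tied to size, so a volume may need more capacity than the data requires just to reach the IOPS the workload needs. The sketch below assumes the gp2 rule of a 3 IOPS per GiB baseline with a 100 IOPS floor and a 16,000 IOPS ceiling. Verify these figures against the current AWS EBS documentation, because limits change and other volume types use different models.

```python
import math

# Assumed gp2 rules (verify against current AWS EBS documentation before use):
# baseline of 3 IOPS per GiB, with a floor of 100 IOPS and a ceiling of 16,000 IOPS.
IOPS_PER_GIB = 3
MIN_BASELINE_IOPS = 100
MAX_BASELINE_IOPS = 16_000

def gp2_baseline_iops(size_gib):
    """Baseline IOPS a gp2 volume of this size delivers under the assumed rules."""
    return min(max(IOPS_PER_GIB * size_gib, MIN_BASELINE_IOPS), MAX_BASELINE_IOPS)

def gp2_size_for(capacity_gib, sustained_iops):
    """Smallest size that satisfies both the capacity and the sustained IOPS need.
    Returns None when the IOPS target exceeds the assumed gp2 ceiling."""
    if sustained_iops > MAX_BASELINE_IOPS:
        return None  # consider a provisioned-IOPS volume type or spreading the load
    return max(capacity_gib, math.ceil(sustained_iops / IOPS_PER_GIB))

print(gp2_size_for(capacity_gib=200, sustained_iops=1500))
print(gp2_baseline_iops(500))
```

Here 200 GiB of data needs a 500 GiB volume to sustain 1,500 IOPS. This is the relationship between volume performance and delivered workload that the slide warns about.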

Network sizing

• vNIC stats are only the starting point.

• Whole-environment NetFlow-style data is nirvana.

• Know where it goes and what it does.

• Beware the backups.

• Think about EC2 instance type network performance.

• Consider latency requirements.
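Knowing where traffic goes and what it does comes down to aggregating flow data by conversation and converting byte counts to a rate. The sketch below works on hypothetical flow tuples rather than any specific NetFlow library's format. It assumes a 5-minute collection interval and that port 10000 belongs to the backup agent, which lets backup traffic be excluded before it inflates the sizing figures.

```python
from collections import defaultdict

# Hypothetical flow records: (source, destination, destination port, bytes)
# collected over one sampling interval from NetFlow-style data.
flows = [
    ("web-01", "db-01", 5432, 1_500_000_000),
    ("web-02", "db-01", 5432, 1_200_000_000),
    ("db-01", "backup-01", 10000, 9_000_000_000),
    ("web-01", "web-02", 8080, 30_000_000),
]
INTERVAL_SECONDS = 300  # assumed 5-minute collection interval

def mbps(byte_count, seconds=INTERVAL_SECONDS):
    """Average megabits per second over the interval (1 Mb = 1,000,000 bits)."""
    return byte_count * 8 / seconds / 1_000_000

def by_conversation(flows, exclude_ports=frozenset()):
    """Total bytes per (source, destination, port), optionally ignoring some ports."""
    totals = defaultdict(int)
    for src, dst, port, nbytes in flows:
        if port not in exclude_ports:
            totals[(src, dst, port)] += nbytes
    return dict(totals)

# Port 10000 is assumed to be the backup agent here; check what yours uses.
convo = by_conversation(flows, exclude_ports={10000})
for key, nbytes in sorted(convo.items(), key=lambda kv: kv[1], reverse=True):
    print(key, round(mbps(nbytes), 1))
```

The excluded backup flow alone averages 240 Mbps, several times the busiest application conversation. Look at it separately rather than letting it drive the instance type choice.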

Extra things to consider

• Vendor support / requirements.

• Backup / data protection.

• Network reliability / resilience.

• Changes to application architecture.

• Monitoring.

• Licensing.

Thinking Differently

• Logical vs physical.

• Designing for failure.

• Run it hot (What you need and no more) where you can.

• Services vs DIY.

• Micro cost modelling.

• Licensing by the hour.

Micro cost modelling
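Micro cost modelling prices each component by the hour and asks how many hours it actually needs to run. The sketch below compares a 24x7 schedule with a business-hours schedule. All rates are hypothetical placeholders, not AWS prices. It also assumes the software licence can be paid by the hour, which depends on the vendor's terms.

```python
# Hypothetical hourly rates: placeholders, not published AWS or vendor prices.
COMPONENTS = {
    "app-server": {"compute_per_hour": 0.20, "licence_per_hour": 0.10},
    "db-server": {"compute_per_hour": 0.50, "licence_per_hour": 0.40},
}
HOURS_ALWAYS_ON = 24 * 30  # 720 hours in an assumed 30-day month
HOURS_BUSINESS = 12 * 22   # 264 hours: 12 hours x 22 working days (assumption)

def monthly_cost(component, hours):
    """Compute plus hourly licence cost for a component running `hours` per month."""
    rates = COMPONENTS[component]
    # Licensing by the hour means licence cost scales with running hours too.
    return round((rates["compute_per_hour"] + rates["licence_per_hour"]) * hours, 2)

for name in COMPONENTS:
    always_on = monthly_cost(name, HOURS_ALWAYS_ON)
    scheduled = monthly_cost(name, HOURS_BUSINESS)
    print(f"{name}: 24x7 ${always_on:.2f}, business hours ${scheduled:.2f}, "
          f"saving ${always_on - scheduled:.2f}")
```

Modelling at this level makes the true cost of each system visible. It also shows that a schedule change can matter as much as a smaller instance type.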

Identifying the Opportunities

• Cost reduction (looking at the whole picture).

• Technical debt reduction (Offload the work).

• Making the intangible, tangible (true costs for systems).

• Flexibility, Disposability and Repeatability.

• Environment-wide variability (Performance / Resilience / Redundancy).

• Culture change (Shift from technology focus to business focus).

• Virtually unlimited infrastructure.

Identifying the Limitations / Traps

• Visibility of the hardware (Actually a benefit…).

• Lack of access to the hypervisor.

• Lack of VM console access to EC2 instances.

• No live migration of EC2 instances.

• Stuff still breaks.

• When it’s gone, it’s gone.

• Bill shock.
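A simple guard against bill shock is to project month-end spend from the spend so far and compare it with a budget. AWS Budgets provides alerting of this kind natively. The sketch below only shows the arithmetic behind a naive linear projection. The spend figures, budget and 80% warning ratio are hypothetical, and the projection ignores one-off charges and growth during the month.

```python
import calendar
from datetime import date

def projected_month_spend(spend_to_date, today):
    """Naive linear projection: assume the rest of the month runs at the
    average daily rate seen so far."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = spend_to_date / today.day
    return daily_rate * days_in_month

def budget_alert(spend_to_date, today, monthly_budget, warn_ratio=0.8):
    """Classify the projected spend as OK, WARN (above warn_ratio) or OVER budget."""
    projected = projected_month_spend(spend_to_date, today)
    if projected > monthly_budget:
        return "OVER", projected
    if projected > monthly_budget * warn_ratio:
        return "WARN", projected
    return "OK", projected

# Hypothetical figures: $1,200 spent by 10 June against a $3,000 budget.
status, projected = budget_alert(1200.0, date(2024, 6, 10), monthly_budget=3000.0)
print(status, round(projected, 2))
```

Spending $120 a day across a 30-day month projects to $3,600, so the alert fires with twenty days left to act rather than after the invoice arrives.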

Top Tips

• Put standards and discipline in place from day one.

• Tackle “easy” use cases first.

• Combine statistics with observations to really understand workloads.

• Stage architecture changes to manage risk.

• Read the detail on the AWS services.

• Keep focus on business drivers.

Q&A

Thank You!
