DESCRIPTION
High availability and scalability used to be solved in hardware, but that is quite expensive. This presentation shows how modern technologies like virtualization, cloud, NoSQL and new software architectures provide cheaper solutions that are probably even better than the traditional approaches.
Eberhard Wolff - @ewolff
High Availability and Scalability: Too Expensive!
Architectures for Future Enterprise Systems
Eberhard Wolff, Freelance Consultant / Trainer
Head of Technology Advisory Board, adesso AG
The Dream
Photo: http://www.vaxman.de/
Where Are We?
Non-functional Requirements
Availability
Performance
Performance
Availability
Availability: Traditional Approach
• Buy highly reliable hardware
• Build a small cluster
  • 2 machines
• Maybe add a stand-by data center
• Eventually system will fail
• …and you are in real trouble
True Story
• “Machine rebooted overnight.”
• “Several times.”
• “No idea how often.”
• “No idea why…”
Let’s look at an example
• Server fails
• Application fails
• No service to the customer
• Can we do better?
What You Have Just Seen
• Failing systems do not impact the user
• Failing systems are just restarted
• Restarts happen automatically
• Systems run in different data centers
  • e.g. eu-west-1a / b / c
[Diagram: Elastic Load Balancer in front of System EU West 1a, System EU West 1b and System EU West 1c]
What It Takes…
• Virtualization
• + API to start new servers
• Watchdog to detect failed servers
• Redundant data centers if needed
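The ingredients above can be sketched in a few lines. This is only an illustration, not the setup used in the demo: `is_healthy` and `start_server` are hypothetical callbacks standing in for a real health check and a real "start new server" API of whatever virtualization or cloud layer is in use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Watchdog:
    """Replaces failed servers with fresh ones (illustrative sketch)."""

    is_healthy: Callable[[str], bool]   # hypothetical health probe: server id -> bool
    start_server: Callable[[str], str]  # hypothetical API: zone -> id of the new server
    servers: Dict[str, str] = field(default_factory=dict)  # server id -> zone

    def check_once(self) -> List[Tuple[str, str]]:
        """Check every server once; return (failed id, replacement id) pairs."""
        replaced = []
        # Iterate over a copy because the dict is modified inside the loop.
        for server_id, zone in list(self.servers.items()):
            if not self.is_healthy(server_id):
                del self.servers[server_id]
                new_id = self.start_server(zone)  # restart in the same zone
                self.servers[new_id] = zone
                replaced.append((server_id, new_id))
        return replaced
```

Calling `check_once()` periodically is the whole watchdog. It only works because starting a server is a plain API call.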
Can be implemented in your datacenter!
I have none.
So I used the Amazon Cloud
Alternatives
Hardware
• As cheap as it gets
• Not highly available
• Availability in Software
Traditional Servers
Highly customized
Hard to reproduce
• Depends on details
• True story:
  • Order of patch installations matters
Stateful
Redundancy in Hardware
Traditional Servers
Phoenix Servers
Easy to create a new server
Reliably reproducible
Stateless
• No data is lost
• New server can take load immediately
Redundancy in Software
Implementations
• Might use a VM image
• …or a PaaS
• …or provisioning tools
Provisioning Tools
• Easy to create test environments
• …with other software versions
Chaos Monkey
• Tool by Netflix
  • Video streaming
  • #1 in Internet usage in the US
• Kill random machines
  • To ensure system survives hardware failures
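The idea can be illustrated with a toy experiment. This is not Netflix's tool or its API, just a sketch of the principle: `terminate` and `service_ok` are hypothetical callbacks for "kill this machine" and "does the system still answer requests?".

```python
import random
from typing import Callable, Set, Tuple


def chaos_round(
    servers: Set[str],
    terminate: Callable[[str], None],  # hypothetical: kills the given machine
    service_ok: Callable[[], bool],    # hypothetical: end-to-end health check
    rng: random.Random,
) -> Tuple[str, bool]:
    """Kill one randomly chosen server and report whether the service survived."""
    victim = rng.choice(sorted(servers))  # sorted() gives a reproducible order
    terminate(victim)
    return victim, service_ok()
```

Running such rounds regularly turns "the system survives a failed machine" from an assumption into something that is tested continuously.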
Would you rather rely on…
…highly available hardware
…or a Chaos Monkey tested system?
Resilience
Performance
Availability
Availability
Performance
Performance: Traditional Approach
• Estimate
  • #Users
  • Use Cases
  • Data volume
  • Etc.
• Add a little bit
• Order servers
Performance: Problems
Problem: Estimate & Scaling
• Performance hard to estimate
• Coarse grained scaling
• Backfires
True Story
• Initial estimate wrong
• Just need a little more
• Cluster: two servers
• Add one
• About 50% higher costs
• Order / install server takes time
• Bad performance until server delivered
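The "about 50%" follows directly from the cluster size. A minimal sketch of the arithmetic, assuming every server costs the same (an assumption made here, not a figure from the story):

```python
def relative_cost_increase(current_servers: int, added_servers: int = 1) -> float:
    """Relative hardware cost increase when adding servers of equal price."""
    if current_servers < 1:
        raise ValueError("need at least one existing server")
    return added_servers / current_servers
```

With two servers, one more adds 1/2 = 50% to the cost, even if only a little more capacity is needed. With twenty smaller servers the same step would add 5%, which is why fine grained scaling is cheaper to get wrong.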
Problem: Load Peak
• Business has load peaks
  • e.g. events that people register for
• Need to have enough hardware for load peaks
• Costly
Problem: Testing
• Testing
  • Need production-like infrastructure
• Prohibitive costs
  • Only needed during tests
[Diagram: Elastic Load Balancer in front of System EU West 1b and three instances of System EU West 1c]
What You Have Just Seen
• System tunes itself depending on load
• Same approach as for availability
• + Watchdog for load
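A watchdog for load boils down to one decision: how many servers should be running right now? The following is a hedged sketch of such a rule, not the policy used in the demo. The target utilization and the limits are made-up parameters.

```python
def desired_servers(
    current: int,
    avg_load_percent: int,
    target_percent: int = 60,  # assumed target utilization per server
    minimum: int = 2,          # assumed lower bound, keeps redundancy
    maximum: int = 20,         # assumed upper bound, caps the cost
) -> int:
    """Return the server count that brings average load back to the target."""
    if current < 1:
        raise ValueError("need at least one running server")
    total_load = current * avg_load_percent
    needed = -(-total_load // target_percent)  # integer ceiling division
    return max(minimum, min(maximum, needed))
```

Four servers at 90% load need six servers to get back to 60%. At 10% load the rule shrinks the cluster, but never below the minimum.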
✔ Easy to create a new server
✔ Reliably reproducible
✔ Redundancy in Software
? Stateless
Stateless
• Stateless web servers: best practice
• Some Java frameworks don’t follow the approach
• Can store HTTP session externally
  • e.g. RDBMS, NoSQL, Cache
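What "store the HTTP session externally" means can be sketched as follows. `SessionStore` is a hypothetical in-memory stand-in for an RDBMS, NoSQL database or cache, and `StatelessWebServer` is an invented class, not part of any framework.

```python
import copy
import uuid
from typing import Dict, List, Optional, Tuple


class SessionStore:
    """Stand-in for an external store shared by all web servers."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def load(self, session_id: str) -> dict:
        return copy.deepcopy(self._data.get(session_id, {}))

    def save(self, session_id: str, session: dict) -> None:
        self._data[session_id] = copy.deepcopy(session)


class StatelessWebServer:
    """Keeps no session data between requests; everything lives in the store."""

    def __init__(self, store: SessionStore) -> None:
        self.store = store

    def add_to_cart(self, session_id: Optional[str], item: str) -> Tuple[str, List[str]]:
        session_id = session_id or uuid.uuid4().hex  # new visitor gets a new id
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session_id, session["cart"]
```

Because the server object holds no session, any server can handle the next request of the same user, and a killed server loses nothing.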
What about Databases?
Databases
• Often assumed to be just “fast and scalable”
• Large scale doable, e.g. Data Warehouse
• Often use traditional approach
  • Cluster with two nodes
  • Highly available hardware
Database: Problems
• Availability
  • Highly available hardware
• Performance
  • Limited scaling
• Costly
Databases
• New approaches
  • Used by NoSQL databases
  • But also e.g. MySQL
  • …or in system architecture
Databases
• Replication
  • Read performance
  • Availability
• Sharding
  • Spread data across servers
  • Write performance
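How the two techniques combine can be shown with a toy in-memory store. It is a simplification and not how any particular database works: every write goes synchronously to all replicas of one shard, and the hash-modulo placement in `shard_for` is an assumption made for the example.

```python
import hashlib
from typing import Any, Dict, List


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard from a stable hash of the key (assumed placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy store: data is split across shards, each shard has several replicas."""

    def __init__(self, shard_count: int, replicas_per_shard: int) -> None:
        self.shards: List[List[Dict[str, Any]]] = [
            [{} for _ in range(replicas_per_shard)] for _ in range(shard_count)
        ]

    def write(self, key: str, value: Any) -> None:
        # Sharding: only one shard handles this write.
        for replica in self.shards[shard_for(key, len(self.shards))]:
            replica[key] = value  # Replication: every copy gets the value.

    def read(self, key: str, replica_index: int = 0) -> Any:
        # Any replica can answer, which spreads reads and survives a failed copy.
        return self.shards[shard_for(key, len(self.shards))][replica_index].get(key)
```

Writes for different keys land on different shards, so write load is spread. Reads can go to any replica of the right shard, so read load is spread and a lost replica costs no data.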
Scaling MongoDB
[Diagram: Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3]
Availability
[Diagram: Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3]
Scaling MongoDB
[Diagram: Shard 1, Shard 2 and Shard 3, each with Replica 1, Replica 2 and Replica 3]
Scaling MongoDB
[Diagram: Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3, followed by a question mark]
Replicas & Shards
• Easy to understand
• But: Coarse grained scaling
• Adding another shard means
  • Moving lots of data
  • Add quite some servers
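How much data moves depends on the placement rule. The sketch below measures it for the simplest possible rule, hash modulo shard count. That rule is an assumption chosen for illustration and not MongoDB's actual balancing mechanism.

```python
import hashlib
from typing import Iterable


def modulo_shard(key: str, shard_count: int) -> int:
    """Naive placement: stable hash of the key modulo the number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(keys: Iterable[str], old_count: int, new_count: int) -> float:
    """Fraction of keys that change shards when the shard count changes."""
    keys = list(keys)
    moved = sum(1 for k in keys if modulo_shard(k, old_count) != modulo_shard(k, new_count))
    return moved / len(keys)
```

Under this naive rule, going from two to three shards relocates about two thirds of all keys in expectation. On top of that, the new shard needs a full set of replicas, so a single scaling step means several new servers.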
Amazon Dynamo Model
[Diagram: Server A holds Shard1, Shard3, Shard4; Server B holds Shard2, Shard1, Shard4; Server C holds Shard3, Shard2, Shard1; Server D holds Shard4, Shard2, Shard3]
Amazon Dynamo Model
[Diagram: Server A holds Shard1, Shard3, Shard4; Server B holds Shard2, Shard1, Shard4; Server C holds Shard3, Shard2, Shard1; Server D holds Shard4, Shard2, Shard3; a New Server is added]
Amazon Dynamo Model
• Published in the Dynamo paper
• Implementations: Riak, Cassandra etc
• Fine grained scaling
• Can immediately write to new node
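Why this scales in fine grained steps can be sketched with a fixed set of small partitions that are handed to servers. This is a strong simplification of the model (no replication, no hash ring) and not the algorithm of Riak or Cassandra. `add_server` and its balancing rule are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List


def add_server(assignment: Dict[int, str], new_server: str) -> List[int]:
    """Give new_server its fair share of partitions; return the moved partition ids.

    assignment maps partition id -> owning server and is updated in place.
    """
    owned = defaultdict(list)
    for partition, server in sorted(assignment.items()):
        owned[server].append(partition)
    fair_share = len(assignment) // (len(owned) + 1)
    moved: List[int] = []
    while len(moved) < fair_share:
        # Take one partition from whichever server currently owns the most.
        donor = max(owned, key=lambda s: (len(owned[s]), s))
        partition = owned[donor].pop()
        assignment[partition] = new_server
        moved.append(partition)
    return moved
```

With 12 partitions on four servers, a fifth server takes over just 2 partitions. All other data stays where it is, and the cluster grows by a single machine.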
Hardware • Not highly reliable
• Scales by distributing load across servers
• No NAS, SAN, RAID…
• As cheap as it gets
Sum Up
• Virtualization
• + Phoenix server
• = Better availability
• = Better performance
• = Lower costs
• Stateless servers
• NoSQL
Thank You!