proidea
‘Tree’ component – detailed view
human factor
software
client library
ISP
protocol stack
network
load balancers
OS
power source
client
network connection
server
What is not a fault?
Service is not working
on our side*
* Caused by e.g. technical failures, outages, corrupted data, attacks
Delivering value without a working system
Bring your own wine, we’re waiting for a license.
Last election in Poland
What is fault tolerance not?
It’s NOT making sure your system
never goes down.
It (eventually) will.
What is fault tolerance?
It’s making sure that the system can
quickly recover and/or
the client is not impacted.
Solving – redundancy
Hot/warm replicas
Caches
Geographical distribution, CDNs
Hardware redundancy
Alternative systems and procedures
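The redundancy ideas above (replicas plus a cache) can be sketched as a read path that tries each replica in turn and falls back to the last cached value. This is a minimal sketch, not a production design: `fetch_with_failover`, `AllReplicasFailed`, and the callable-per-replica interface are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List


class AllReplicasFailed(Exception):
    """Raised when no replica answered and nothing is cached (hypothetical name)."""


def fetch_with_failover(
    key: str,
    replicas: List[Callable[[str], str]],
    cache: Dict[str, str],
) -> str:
    """Try replicas in order; fall back to a possibly stale cached value."""
    for replica in replicas:
        try:
            value = replica(key)
        except Exception:
            # This replica is down or misbehaving; try the next one.
            continue
        cache[key] = value  # remember the last good answer
        return value
    if key in cache:
        # Every replica failed: serve stale data rather than an error.
        return cache[key]
    raise AllReplicasFailed(key)
```

Serving a stale cached value trades consistency for availability, which is one of the tradeoffs listed at the end of the deck.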
Solving – design
Stateless
Auditing
Idempotent requests
Uniqueness / randomness
Asynchronous and decoupling
EIPs
Commands, not data
Break the rules
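Two of the design points above, idempotent requests and uniqueness, combine naturally: the client attaches a unique ID to each request, and the server remembers which IDs it has already handled, so a retry after a timeout does not apply the change twice. The sketch below assumes an in-memory store and a hypothetical `PaymentService` class; a real service would persist the processed IDs.

```python
import uuid
from typing import Dict


def new_request_id() -> str:
    """Generate a unique ID on the client side (random UUID)."""
    return str(uuid.uuid4())


class PaymentService:
    """Hypothetical service that deduplicates deposits by request ID."""

    def __init__(self) -> None:
        self.balance = 0
        self._processed: Dict[str, int] = {}  # request_id -> balance returned

    def deposit(self, request_id: str, amount: int) -> int:
        if request_id in self._processed:
            # Retry of a request we already applied: return the same result.
            return self._processed[request_id]
        self.balance += amount
        self._processed[request_id] = self.balance
        return self.balance
```

A client that times out can then resend the same `request_id` safely:

```python
service = PaymentService()
rid = new_request_id()
service.deposit(rid, 100)
service.deposit(rid, 100)  # retry, balance is still 100
```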
Solving – procedures
Backup creation, cleanup and restore
QA & potential problems
Continuous integration
Deployment
Solving – observe
Dive deep, post-mortems
Identify bottlenecks
Observe key metrics
Verify assumptions
Predict traffic
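As one possible illustration of "observe key metrics", the sketch below tracks the error rate over the most recent requests and flags the service as unhealthy when it exceeds a threshold. The class name `ErrorRateMonitor`, the window size, and the threshold are assumptions made for this example, not values from the talk.

```python
from collections import deque
from typing import Deque


class ErrorRateMonitor:
    """Rolling error rate over the last `window` requests (hypothetical helper)."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # deque with maxlen drops the oldest outcome automatically.
        self._outcomes: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self._outcomes.append(ok)

    def error_rate(self) -> float:
        if not self._outcomes:
            return 0.0
        failures = sum(1 for ok in self._outcomes if not ok)
        return failures / len(self._outcomes)

    def is_unhealthy(self) -> bool:
        return self.error_rate() > self.threshold
```

The threshold itself is an assumption worth verifying against real traffic, in the spirit of "verify assumptions" above.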
Tradeoffs – real
cost
durability
time
consistency
trust
audit (traceability)
complexity
security
scalability
functionality
stability
reliability
extensibility
performance
maintainability
manageability