Fault tolerant microservices - LJC Skills Matter 4thNov2014
Fault tolerant microservices
BSkyB, @chbatey
Who is this guy?
● Enthusiastic nerd
● Senior software engineer at BSkyB
● Builds a lot of distributed applications
● Apache Cassandra MVP
Agenda
1. Setting the scene
  ○ What do we mean by a fault?
  ○ What is a microservice?
  ○ Monolith application vs the micro(ish) service
2. A worked example
  ○ Identify an issue
  ○ Reproduce/test it
  ○ Show how to deal with the issue
So… what do applications look like?
So... what do systems look like now?
But different things go wrong...
● Down
● Slow network
● Slow app
● 2 second max
● Missing packets
● GC :(
Fault tolerance
1. Don’t take forever - Timeouts
2. Don’t try if you can’t succeed
3. Fail gracefully
4. Know if it’s your fault
5. Don’t whack a dead horse
6. Turn broken stuff off
Time for an example...
● All examples are on GitHub
● Technologies used:
  ○ Dropwizard
  ○ Spring Boot
  ○ Wiremock
  ○ Hystrix
  ○ Graphite
  ○ Saboteur
Example: Movie player service
[Diagram: several Shiny App instances, the Play Movie service, and the User, Device and Pin services]
Testing microservices
You don’t know a service is fault tolerant if you don’t test faults
Isolated service tests
[Diagram: the Play Movie acceptance test, Shiny App, and mocks of the User, Device and Pin services that the test primes]
1 - Don’t take forever
● If at first you don’t succeed, don’t take forever to tell someone
● Timeout and fail fast
Which timeouts?
● Socket connection timeout
● Socket read timeout
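To make the two socket-level timeouts concrete, here is a minimal sketch using only the JDK. The class and method names (`SocketTimeouts`, `probe`) and the timeout values are made up for illustration; the "dependency" is a local server socket that accepts connections but never writes a byte. Note that the read timeout applies to each blocking read, not to the whole response, so a peer that trickles data slowly would never trip it.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SocketTimeouts {

    // Connects with a connect timeout, then waits for one byte with a read timeout.
    // A connect timeout surfaces as a SocketTimeoutException thrown to the caller.
    static String probe(InetSocketAddress address, int connectMillis, int readMillis)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(address, connectMillis); // socket connection timeout
            socket.setSoTimeout(readMillis);        // socket read timeout (per read call)
            try {
                int firstByte = socket.getInputStream().read();
                return firstByte == -1 ? "closed" : "data";
            } catch (SocketTimeoutException e) {
                return "read-timeout";
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a hung dependency: the OS completes the connection,
        // but nothing is ever written back.
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            InetSocketAddress address =
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort());
            System.out.println(probe(address, 500, 200));
        }
    }
}
```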
Your service hung for 30 seconds :(
[Picture: Customer, You :(]
Which timeouts?
● Socket connection timeout
● Socket read timeout
● Resource acquisition
Your service hung for 10 minutes :(
Let’s think about this
A little more detail
Wiremock + Saboteur + Vagrant
● Vagrant - launches + provisions local VMs
● Saboteur - uses tc, iptables to simulate network issues
● Wiremock - used to mock HTTP dependencies
● Cucumber - acceptance tests
I can write an automated test for that?
[Diagram: the acceptance test drives the PlayMovie service; Wiremock (standing in for the User, Device and Pin services) and Saboteur run inside a Vagrant + VirtualBox VM; the test primes the VM to drop traffic and then resets it]
Implementing reliable timeouts
● Homemade: Worker Queue + Thread pool (executor)
● Hystrix
● Spring Cloud Netflix
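As a rough idea of what the homemade option could look like, the sketch below runs the risky call on a worker pool and bounds the caller's wait with `Future.get(timeout)`. The `TimeoutRunner` class, its `call` method and the pool sizes are hypothetical names and values chosen for this illustration, not code from the talk's repositories. The caller gets an answer within the deadline regardless of which socket timeout the dependency call is stuck on.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutRunner {
    private final ExecutorService pool;

    TimeoutRunner(ExecutorService pool) {
        this.pool = pool;
    }

    // Runs the task on the pool and waits at most timeoutMillis for its result.
    // A full queue makes submit() throw RejectedExecutionException, which is left to the caller.
    <T> T call(Callable<T> task, long timeoutMillis, T fallback) throws InterruptedException {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so the thread is freed
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the task itself failed
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Bounded pool and bounded queue (sizes are arbitrary for the demo).
        ExecutorService pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(10));
        TimeoutRunner runner = new TimeoutRunner(pool);
        System.out.println(runner.call(() -> "fast answer", 1000, "fallback"));
        System.out.println(runner.call(() -> {
            Thread.sleep(5000); // simulated hung dependency
            return "late answer";
        }, 100, "fallback"));
        pool.shutdownNow();
    }
}
```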
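A simple Spring RestController
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class Resource {
        private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);

        @Autowired
        private ScaryDependency scaryDependency;

        @RequestMapping("/scary")
        public String callTheScaryDependency() {
            LOGGER.info("RestController: I wonder which thread I am on!");
            return scaryDependency.getScaryString();
        }
    }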
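Scary dependency
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.stereotype.Component;

    @Component
    public class ScaryDependency {
        private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

        public String getScaryString() {
            LOGGER.info("Scary dependency: I wonder which thread I am on!");
            if (System.currentTimeMillis() % 2 == 0) {
                return "Scary String";
            } else {
                try {
                    Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the interrupt flag
                }
                return "Really slow scary string";
            }
        }
    }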
All on the tomcat thread
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [http-nio-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on!
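Seriously this simple now?
    import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.stereotype.Component;

    @Component
    public class ScaryDependency {
        private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

        @HystrixCommand
        public String getScaryString() {
            LOGGER.info("Scary dependency: I wonder which thread I am on!");
            if (System.currentTimeMillis() % 2 == 0) {
                return "Scary String";
            } else {
                try {
                    Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the interrupt flag
                }
                return "Really slow scary string";
            }
        }
    }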
What an annotation can do...
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on!
Timeouts take home
● You can’t use network level timeouts for SLAs
● Test your SLAs - if someone says you can’t, hit them with a stick
● Scary things happen without network issues
2 - Don’t try if you can’t succeed
Complexity
● When an application grows in complexity it will eventually start sending emails
Complexity
● When an application grows in complexity it will eventually contain queues and thread pools
Don’t try if you can’t succeed
● Executors factory methods are unbounded :(
  ○ newFixedThreadPool (unbounded queue)
  ○ newSingleThreadExecutor (unbounded queue)
  ○ newCachedThreadPool (unbounded number of threads)
● Bound your queues and threads
● Fail quickly when the queue / maxPoolSize is met
● Know your drivers
This is a functional requirement
● Set the timeout very high
● Use wiremock to add a large delay to the requests
● Set queue size and thread pool size to 1
● Send in 2 requests to use the thread and fill the queue
● What happens on the 3rd request?
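The same experiment can be tried without any HTTP at all. The sketch below (the class name `BoundedPoolDemo` and the latch standing in for a slow dependency are assumptions made for illustration) builds a `ThreadPoolExecutor` with one thread and a queue of one, then submits blocking tasks and counts rejections. With `AbortPolicy`, the third submission fails immediately instead of waiting.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {

    // Submits `requests` tasks that block until released and returns how many were rejected.
    static int countRejected(int requests) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,        // exactly one worker thread
                new ArrayBlockingQueue<>(1),            // queue holds exactly one waiting task
                new ThreadPoolExecutor.AbortPolicy());  // reject instead of queueing forever
        CountDownLatch release = new CountDownLatch(1); // stands in for the delayed dependency
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // fail quickly: no thread free and no queue space left
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected of 3 requests: " + countRejected(3));
    }
}
```

The first task occupies the only thread, the second fills the queue, and the third is rejected straight away, which is the behaviour a test for this requirement would assert.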
3 - Fail gracefully
Expect rubbish
● Expect invalid HTTP
● Expect malformed response bodies
● Expect connection failures
● Expect huge / tiny responses
Testing with Wiremock
    import static com.github.tomakehurst.wiremock.client.WireMock.*;
    import com.github.tomakehurst.wiremock.http.Fault;

    stubFor(get(urlEqualTo("/dependencyPath"))
            .willReturn(aResponse()
                    .withFault(Fault.MALFORMED_RESPONSE_CHUNK)));

    {
      "request": {
        "method": "GET",
        "url": "/fault"
      },
      "response": {
        "fault": "RANDOM_DATA_THEN_CLOSE"
      }
    }

    {
      "request": {
        "method": "GET",
        "url": "/fault"
      },
      "response": {
        "fault": "EMPTY_RESPONSE"
      }
    }
4 - Know if it’s your fault
What to record
● Metrics: Timings, errors, concurrent incoming requests, thread pool statistics, connection pool statistics
● Logging: Boundary logging, elasticsearch / logstash
● Request identifiers
Graphite + Codahale
Response times
Separate resource pools
● Don’t flood your dependencies
● Be able to answer the questions:
  ○ How many connections will you make to dependency X?
  ○ Are you getting close to your max connections?
So easy with Dropwizard + Hystrix
    @Override
    public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
        HystrixCodaHaleMetricsPublisher metricsPublisher =
                new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
        HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
    }

    metrics:
      reporters:
        - type: graphite
          host: 192.168.10.120
          port: 2003
          prefix: shiny_app
5 - Don’t whack a dead horse
[Diagram: the movie player example again: Shiny App instances, Play Movie, and the User, Device and Pin services]
What to do..
● Yes this will happen..
● Mandatory dependency - fail *really* fast
● Throttling
● Fallbacks
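Throttling and fallbacks can be combined in a few lines of plain Java. The sketch below is a hand-rolled illustration, not how any particular library does it: the `Throttle` class and its `call` method are hypothetical names. It caps concurrent calls to a dependency with a `Semaphore` and returns a fallback immediately when no permit is free or when the call throws.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class Throttle<T> {
    private final Semaphore permits;
    private final Supplier<T> fallback;

    Throttle(int maxConcurrentCalls, Supplier<T> fallback) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.fallback = fallback;
    }

    // Calls the dependency only if a permit is free; never blocks waiting for one.
    T call(Supplier<T> dependency) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // throttled: fail fast instead of piling up
        }
        try {
            return dependency.get();
        } catch (RuntimeException e) {
            return fallback.get(); // dependency failed: degrade gracefully
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        Throttle<String> throttle = new Throttle<>(1, () -> "fallback");
        System.out.println(throttle.call(() -> "user info"));
        // The inner call runs while the only permit is held, so it is throttled.
        System.out.println(throttle.call(() -> throttle.call(() -> "nested")));
    }
}
```

For a mandatory dependency the fallback could instead throw a fast, explicit error, which keeps the "fail *really* fast" behaviour without tying up threads.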
Circuit breaker pattern
Implementation with Hystrix
    @GET
    @Timed
    public String integrate() {
        LOGGER.info("I best do some integration!");
        String user = new UserServiceDependency(userService).execute();
        String device = new DeviceServiceDependency(deviceService).execute();
        Boolean pinCheck = new PinCheckDependency(pinService).execute();
        return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n",
                user, device, pinCheck);
    }
Implementation with Hystrix
    public class PinCheckDependency extends HystrixCommand<Boolean> {
        // Constructor and the httpClient field are not shown on the slide.

        @Override
        protected Boolean run() throws Exception {
            HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
            HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
            String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
            return Boolean.valueOf(pinCheckInfo);
        }
    }
Implementation with Hystrix
    public class PinCheckDependency extends HystrixCommand<Boolean> {
        // Constructor and the httpClient field are not shown on the slide.

        @Override
        protected Boolean run() throws Exception {
            HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
            HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
            String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
            return Boolean.valueOf(pinCheckInfo);
        }

        // Used when run() fails, times out, is rejected, or the circuit is open.
        @Override
        public Boolean getFallback() {
            return true;
        }
    }
Triggering the fallback
● Error threshold percentage
● Bucket of time for the percentage
● Minimum number of requests to trigger
● Time before trying a request again
● Disable
● Per instance statistics
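To show how those settings interact, here is a deliberately simplified, single-threaded circuit breaker sketch. It is not Hystrix's implementation (which is considerably more sophisticated); `SimpleCircuitBreaker` and every parameter name are invented for this illustration, and the injected clock exists only to make the behaviour easy to test.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Not thread-safe: a sketch of the idea only.
class SimpleCircuitBreaker {
    private final int errorThresholdPercent; // error threshold percentage
    private final int minimumRequests;       // minimum number of requests to trigger
    private final long windowMillis;         // bucket of time for the percentage
    private final long sleepMillis;          // time before trying a request again
    private final LongSupplier clock;

    private long windowStart;
    private int requests;
    private int errors;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int errorThresholdPercent, int minimumRequests,
                         long windowMillis, long sleepMillis, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.windowMillis = windowMillis;
        this.sleepMillis = sleepMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    <T> T call(Supplier<T> dependency, Supplier<T> fallback) {
        long now = clock.getAsLong();
        if (open && now - openedAt < sleepMillis) {
            return fallback.get(); // circuit open: do not touch the dependency
        }
        if (now - windowStart >= windowMillis) { // start a fresh statistics window
            windowStart = now;
            requests = 0;
            errors = 0;
        }
        requests++;
        try {
            T result = dependency.get();
            if (open) { // the trial request succeeded: close the circuit
                open = false;
                windowStart = now;
                requests = 0;
                errors = 0;
            }
            return result;
        } catch (RuntimeException e) {
            errors++;
            boolean overThreshold = requests >= minimumRequests
                    && errors * 100 >= errorThresholdPercent * requests;
            if (open || overThreshold) { // trip, or stay open after a failed trial
                open = true;
                openedAt = now;
            }
            return fallback.get();
        }
    }

    boolean isOpen() {
        return open;
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(50, 4, 10_000L, 5_000L, System::currentTimeMillis);
        for (int i = 0; i < 5; i++) {
            String pin = breaker.call(
                    () -> { throw new IllegalStateException("pin service down"); },
                    () -> "fallback");
            System.out.println(pin + " (open=" + breaker.isOpen() + ")");
        }
    }
}
```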
6 - Turn off broken stuff
● The kill switch
To recap
1. Don’t take forever - Timeouts
2. Don’t try if you can’t succeed
3. Fail gracefully
4. Know if it’s your fault
5. Don’t whack a dead horse
6. Turn broken stuff off
Links
● Examples:
  ○ https://github.com/chbatey/spring-cloud-example
  ○ https://github.com/chbatey/dropwizard-hystrix
  ○ https://github.com/chbatey/vagrant-wiremock-saboteur
● Tech:
  ○ https://github.com/Netflix/Hystrix
  ○ https://www.vagrantup.com/
  ○ http://wiremock.org/
  ○ https://github.com/tomakehurst/saboteur
Questions?
● Thanks for listening!
● http://christopher-batey.blogspot.co.uk/
Developer takeaways
● Learn about TCP
● Love vagrant, docker etc to enable testing
● Don’t trust libraries
Hystrix cost - do this yourself
Hystrix metrics
● Failure count
● Percentiles from Hystrix point of view
● Error percentages
How to test metric publishing?
● Stub out graphite and verify calls?
● Programmatically call graphite and verify numbers?
● Make metrics + logs part of the story demo