Microservices tracing with Spring Cloud and Zipkin
Marcin Grzejszczak
Marcin Grzejszczak @mgrzejszczak, 11-13 May 2016
About me
Developer at Pivotal
Part of Spring Cloud Team
Working with OSS:
● Accurest - Consumer Driven Contracts verifier for Java
● JSON Assert - fluent JSON assertions
● Spock Subjects Collaborators Extension
● Gradle Test Profiler
● Up To Date Gradle Plugin
TWITTER: @MGrzejszczak
BLOG: http://TOOMUCHCODING.COM
Marcin Grzejszczak @mgrzejszczak, Kraków, 11-13 May 2016
Agenda
● What is distributed tracing?
● How to correlate logs with Spring Cloud Sleuth?
● How to visualize latency with Spring Cloud Sleuth and Zipkin?
An ordinary system...
UI calls backend
UI -> BACKEND
Everything is awesome
CLICK 200
Until it’s not
CLICK 500
Time to debug
https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1
It doesn’t look like this
More like this
On which server / instance was the exception thrown?
SSH and grep for ERROR to find it?
Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
Span
The basic unit of work (e.g. sending RPC)
● Spans are started and stopped
● They keep track of their timing information
● Once you create a span, you must stop it at some point in the future
● A span has a parent (except for the root span) and can have multiple children
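The span lifecycle above can be sketched in plain Java. This is only an illustration: `SimpleSpan` and all of its methods are hypothetical names invented for this example and are not Sleuth's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical, simplified span. This is NOT Sleuth's Span class. */
final class SimpleSpan {
    private final String name;
    private final SimpleSpan parent;                  // null for the root span
    private final List<SimpleSpan> children = new ArrayList<>();
    private final long startNanos;
    private long stopNanos;
    private boolean stopped;

    private SimpleSpan(String name, SimpleSpan parent) {
        this.name = name;
        this.parent = parent;
        this.startNanos = System.nanoTime();          // timing starts on creation
    }

    /** Starts a span; pass null as the parent to start a root span. */
    static SimpleSpan start(String name, SimpleSpan parent) {
        SimpleSpan span = new SimpleSpan(name, parent);
        if (parent != null) {
            parent.children.add(span);
        }
        return span;
    }

    /** Every started span must be stopped exactly once. */
    void stop() {
        if (stopped) {
            throw new IllegalStateException("span already stopped: " + name);
        }
        stopNanos = System.nanoTime();
        stopped = true;
    }

    boolean isRunning() {
        return !stopped;
    }

    long durationNanos() {
        if (!stopped) {
            throw new IllegalStateException("span still running: " + name);
        }
        return stopNanos - startNanos;
    }

    SimpleSpan parent() {
        return parent;
    }

    List<SimpleSpan> children() {
        return Collections.unmodifiableList(children);
    }
}
```

In this sketch a span that is never stopped has no duration, which is why every started span has to be stopped at some point.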
Trace
A set of spans forming a tree-like structure.
● For example, if you are running a book store, then
○ A trace could be retrieving a list of available books
○ Assuming that to retrieve the books you have to send 3 requests to 3 services, you could have at least 3 spans (1 for each hop) forming 1 trace
[Diagram: a REQUEST with no Trace Id and no Span Id reaches SERVICE 1, which starts Trace Id = X, Span Id = A. Every later hop keeps Trace Id = X.]
● SERVICE 1 -> SERVICE 2: Span Id = B (Client Sent, Server Received, Server Sent, Client Received); SERVICE 2 does its work in Span Id = C
● SERVICE 2 -> SERVICE 3: Span Id = D (Client Sent, Server Received, Server Sent, Client Received); SERVICE 3 does its work in Span Id = E
● SERVICE 2 -> SERVICE 4: Span Id = F (Client Sent, Server Received, Server Sent, Client Received); SERVICE 4 does its work in Span Id = G
Span Id = A, Parent Id = null
Span Id = B, Parent Id = A
Span Id = C, Parent Id = B
Span Id = D, Parent Id = C
Span Id = E, Parent Id = D
Span Id = F, Parent Id = C
Span Id = G, Parent Id = F
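The Span Id / Parent Id pairs are enough to rebuild the tree that forms the trace. Below is a minimal sketch of that reconstruction; the `TraceTree` class is a hypothetical helper written for this example, not part of Sleuth or Zipkin.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that rebuilds a trace tree from spanId -> parentId pairs. */
final class TraceTree {

    /** Groups span IDs by their parent ID; the root (null parent) is skipped. */
    static Map<String, List<String>> childrenByParent(Map<String, String> parentOf) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : parentOf.entrySet()) {
            if (entry.getValue() != null) {
                children.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                        .add(entry.getKey());
            }
        }
        return children;
    }

    /** Renders the tree depth-first, two spaces of indentation per level. */
    static String render(Map<String, String> parentOf) {
        Map<String, List<String>> children = childrenByParent(parentOf);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : parentOf.entrySet()) {
            if (entry.getValue() == null) {           // a span without a parent is a root
                append(entry.getKey(), 0, children, out);
            }
        }
        return out.toString();
    }

    private static void append(String spanId, int depth,
                               Map<String, List<String>> children, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(spanId).append('\n');
        for (String child : children.getOrDefault(spanId, Collections.emptyList())) {
            append(child, depth + 1, children, out);
        }
    }

    public static void main(String[] args) {
        // The pairs from the slide; LinkedHashMap keeps insertion order and allows null values.
        Map<String, String> parentOf = new LinkedHashMap<>();
        parentOf.put("A", null);
        parentOf.put("B", "A");
        parentOf.put("C", "B");
        parentOf.put("D", "C");
        parentOf.put("E", "D");
        parentOf.put("F", "C");
        parentOf.put("G", "F");
        System.out.print(render(parentOf));
    }
}
```

For the pairs on the slide, span C ends up with two children (D and F), which matches SERVICE 2 calling two downstream services.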
Is it that simple?
How do you pass tracing information (incl. Trace ID) between:
● different libraries?
● thread pools?
● asynchronous communication?
● …?
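To see why thread pools are a problem, here is a minimal sketch that keeps the trace ID in a `ThreadLocal` and shows that a pooled worker thread does not see it unless the task is wrapped. The `TraceContext` class and its methods are hypothetical names for this illustration; the sketch does not show how Sleuth implements propagation.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical trace-ID holder; NOT Sleuth's implementation. */
final class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    static void set(String traceId) { TRACE_ID.set(traceId); }
    static String get() { return TRACE_ID.get(); }
    static void clear() { TRACE_ID.remove(); }

    /** Captures the caller's trace ID now and restores it inside the worker thread. */
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = get();
        return () -> {
            String previous = get();
            set(captured);
            try {
                return task.call();
            } finally {
                // Do not leak the trace ID into the next task run by this pooled thread.
                if (previous == null) {
                    clear();
                } else {
                    set(previous);
                }
            }
        };
    }

    /** Runs the task on the pool and waits for its result. */
    static <T> T callOn(ExecutorService pool, Callable<T> task) {
        try {
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            set("X");
            Callable<String> readTraceId = TraceContext::get;
            System.out.println(callOn(pool, readTraceId));        // prints null: the ID was lost
            System.out.println(callOn(pool, wrap(readTraceId)));  // prints X: the ID was propagated
        } finally {
            pool.shutdown();
            clear();
        }
    }
}
```

The same capture-and-restore idea has to be repeated for every library and every asynchronous boundary, which is the tedious part a tracing library can take over.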
Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for
● Hystrix
● RxJava
● Rest Template
● Feign
● Messaging with Spring Integration
● Zuul
● ...
If you don’t do anything unexpected there’s nothing you need to do to make Sleuth work. Check the docs for more info.
Now let’s aggregate the logs!
Instead of SSHing to the machines, aggregate the logs!
● With Cloud Foundry’s (CF) Loggregator the logs from different instances are streamed into a single place
● You can harvest your logs with Logstash Forwarder / FileBeat
● You can use the ELK stack to stream and visualize the logs
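Once the logs are in one place, correlating them comes down to selecting every line that carries the same trace ID. The sketch below assumes each log line contains a bracketed block of the form `[application,traceId,spanId,exportable]`; check the Sleuth documentation for the exact layout in your version. The `TraceLogFilter` class and the sample lines in the usage note are made up for this example.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper that picks log lines belonging to one trace. */
final class TraceLogFilter {
    // Assumed layout: [application,traceId,spanId,exportable]
    private static final Pattern TRACING_BLOCK =
            Pattern.compile("\\[([^,\\]]*),([0-9a-f]+),([0-9a-f]+),(true|false)\\]");

    /** Returns the trace ID found in the line, if the line has a tracing block. */
    static Optional<String> traceIdOf(String line) {
        Matcher matcher = TRACING_BLOCK.matcher(line);
        return matcher.find() ? Optional.of(matcher.group(2)) : Optional.empty();
    }

    /** Keeps only the lines whose trace ID equals the given one, in their original order. */
    static List<String> linesForTrace(List<String> lines, String traceId) {
        return lines.stream()
                .filter(line -> traceIdOf(line).map(traceId::equals).orElse(false))
                .collect(Collectors.toList());
    }
}
```

For example, given an aggregated list containing `INFO [service1,aaaa1111,aaaa1111,false] Hello` and `ERROR [service2,aaaa1111,bbbb2222,false] Boom`, calling `linesForTrace(lines, "aaaa1111")` returns both lines, even though they were written by different applications.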
Spring Cloud Sleuth with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
Spring Cloud Sleuth with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-sleuth"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
    }
}
Log correlation with Spring Cloud Sleuth
DEMO
Great! We’ve found the exception!
But meanwhile...
The system is slow...
CLICK 200
One of the services is slow?
Which one?
How to measure that?
Let’s log events!
● Client Sent (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing it
● Server Sent (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from the server side
Conclusions
CS 0 ms, SR 100 ms, SS 200 ms, CR 300 ms
● The request started at T=0 ms
● It took 300 ms for the client to receive a response
● Server side received the request at T=100 ms
● The request got processed on the server side in 100 ms
● Why is there a delay between sending and receiving messages?
https://blogs.oracle.com/jag/resource/Fallacies.html
Logs
Represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● Event should be the stable name of some notable moment in the lifetime of a span
● For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
Main logs
● Client Sent (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY
● Server Sent (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY
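The four timestamps can be turned into a latency breakdown with simple subtraction. The sketch below does that for the CS / SR / SS / CR values used earlier in the deck (0, 100, 200 and 300 ms). `LatencyBreakdown` is a hypothetical class for this example, and it assumes the client and server clocks are synchronized; the network latency on the way back is computed as CR minus SS so that it is not negative.

```java
/** Hypothetical value class; assumes client and server clocks are in sync. */
final class LatencyBreakdown {
    final long requestNetworkMs;    // SR - CS
    final long serverProcessingMs;  // SS - SR
    final long responseNetworkMs;   // CR - SS
    final long totalMs;             // CR - CS

    private LatencyBreakdown(long cs, long sr, long ss, long cr) {
        this.requestNetworkMs = sr - cs;
        this.serverProcessingMs = ss - sr;
        this.responseNetworkMs = cr - ss;
        this.totalMs = cr - cs;
    }

    /** All arguments are timestamps in milliseconds on a shared clock. */
    static LatencyBreakdown of(long cs, long sr, long ss, long cr) {
        if (cs > sr || sr > ss || ss > cr) {
            throw new IllegalArgumentException("expected CS <= SR <= SS <= CR");
        }
        return new LatencyBreakdown(cs, sr, ss, cr);
    }

    public static void main(String[] args) {
        LatencyBreakdown b = LatencyBreakdown.of(0, 100, 200, 300);
        System.out.println("network (request):  " + b.requestNetworkMs + " ms");
        System.out.println("server processing:  " + b.serverProcessingMs + " ms");
        System.out.println("network (response): " + b.responseNetworkMs + " ms");
        System.out.println("total:              " + b.totalMs + " ms");
    }
}
```

For the example values, the three parts are 100 ms each and add up to the 300 ms the client waited. With unsynchronized clocks the per-hop numbers can be skewed, while the client-side total (CR - CS) stays reliable because both timestamps come from the same machine.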
Tag
Key-value pair
● Every span may also have zero or more key/value Tags
● They do not have timestamps and simply annotate the spans.
● Example of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
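The difference between logs and tags can be shown with a small data structure: logs are timestamped events kept in order, while tags are plain key/value pairs without a timestamp. `AnnotatedSpan` and its methods are hypothetical names for this sketch and do not mirror Sleuth's or Zipkin's classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical span carrying logs (timestamped events) and tags (key/value pairs). */
final class AnnotatedSpan {

    /** A log: the name of a notable moment plus the time it happened. */
    static final class Log {
        final long timestampMs;
        final String event;

        Log(long timestampMs, String event) {
            this.timestampMs = timestampMs;
            this.event = event;
        }
    }

    private final List<Log> logs = new ArrayList<>();
    private final Map<String, String> tags = new LinkedHashMap<>();

    /** Adds a timestamped event; a span can have zero or more of them. */
    void logEvent(long timestampMs, String event) {
        logs.add(new Log(timestampMs, event));
    }

    /** Adds or overwrites a tag; tags have no timestamp. */
    void tag(String key, String value) {
        tags.put(key, value);
    }

    List<Log> logs() {
        return Collections.unmodifiableList(logs);
    }

    Map<String, String> tags() {
        return Collections.unmodifiableMap(tags);
    }
}
```

A client-side span could, for instance, call `logEvent(0, "cs")` and `logEvent(300, "cr")` and add `tag("http.method", "GET")`.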
How to visualise latency in a distributed system?
The answer is: Zipkin
● Zipkin is a distributed tracing system
● It runs as a separate process (you can run it as a Spring Boot application)
● It helps gather timing data needed to troubleshoot latency problems in microservice architectures
● The front end is a "waterfall" style graph of service calls showing call durations as horizontal bars
How does Zipkin work?
[Diagram: each APP sends its spans to the collectors, the collectors store them in a DB, and the UI queries for trace info via the API]
Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries / contexts
● Upon closing of a Span we will send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run the Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
○ you can add the dependency to Zipkin UI!
Spring Cloud Sleuth Zipkin with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.RELEASE</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
Spring Cloud Sleuth Zipkin with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-zipkin"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.RELEASE"
    }
}
[Diagram: a REQUEST with no Trace Id and no Span Id reaches SERVICE 1 (/start), which starts Trace Id = X, Span Id = A. Every later hop keeps Trace Id = X.]
● SERVICE 1 (/start) -> SERVICE 2 (/foo): Span Id = B (Client Sent, Server Received, Server Sent, Client Received); SERVICE 2 does its work in Span Id = C
● SERVICE 2 (/foo) -> SERVICE 3 (/bar): Span Id = D (Client Sent, Server Received, Server Sent, Client Received); SERVICE 3 does its work in Span Id = E
● SERVICE 2 (/foo) -> SERVICE 4 (/baz): Span Id = F (Client Sent, Server Received, Server Sent, Client Received); SERVICE 4 does its work in Span Id = G
DEMO
Zipkin for Brewery
● A test app for Spring Cloud end to end tests
● Source code: https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
Summary
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
THANK YOU
● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!)
● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud Foundry - point of entry to the app
● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
● http://brewery-zipkin-web.cfapps.io - Zipkin deployed to PCF for Brewery Sample app