NATS: Control Flow for Distributed Systems

Focus

© 2015 Bridgevine. All Rights reserved. December 9, 2015

The Transaction Engine

Engine speak

We refer to the outer circles as components; you’ll see that term later.

We recently started referring to the center as the “queue”. It’s a combination of NATS and Elasticsearch. More on this later, too.

Elasticsearch is also the store used by History Recorder and the Cache components.

Since everything communicates by messages, we wanted to use the same message struct format but retain flexibility for the different types of information the system will pass...

Find your center

The Msg Struct

We share by communicating via a single Msg struct.

We’ve evolved to this format, but expect more changes.

We could end up going with a “net/context” approach, but we must retain the compile-time type checking the compiler gives us.

Overusing the interface{} and []byte types is disadvantageous: it pushes type errors from compile time to run time.
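A minimal sketch of that trade-off, under stated assumptions: `LooseMsg`, `Msg`, and their fields are hypothetical (the slide’s actual struct is not reproduced here). With `interface{}`, a wrong guess about the payload’s shape only shows up at run time; concrete fields let the compiler catch it.

```go
package main

import "fmt"

// LooseMsg is a hypothetical "anything goes" message: the compiler cannot
// check what Payload holds, so every reader needs a runtime type assertion.
type LooseMsg struct {
	Payload interface{}
}

// Msg is a hypothetical typed message. Flexible data is still allowed,
// but it is constrained to string keys and values the compiler can check.
type Msg struct {
	ID        string
	Subject   string
	CacheKey  string            // reference to a payload held in the store
	CacheData []byte            // payload, filled in after loading it
	Meta      map[string]string // flexible, yet still typed
}

// orderIDLoose must guess the payload's shape; a wrong guess only fails at run time.
func orderIDLoose(m LooseMsg) (string, bool) {
	fields, ok := m.Payload.(map[string]interface{})
	if !ok {
		return "", false
	}
	id, ok := fields["order_id"].(string)
	return id, ok
}

// orderID reads a typed field; misspelling Meta or passing a LooseMsg fails to compile.
func orderID(m Msg) (string, bool) {
	id, ok := m.Meta["order_id"]
	return id, ok
}

func main() {
	loose := LooseMsg{Payload: map[string]string{"order_id": "42"}}
	_, ok := orderIDLoose(loose)
	fmt.Println(ok) // false: map[string]string is not map[string]interface{}

	typed := Msg{ID: "1", Subject: "orders.create", Meta: map[string]string{"order_id": "42"}}
	id, _ := orderID(typed)
	fmt.Println(id) // 42
}
```

The first lookup fails silently: nothing flags the mismatched map type until the code runs.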

Pub/Sub Queue - NATS http://nats.io/

Love the performance focus. Major reason for selection.

Love the simplicity.

Using standard messaging for config updates: we want all instances of an API to get the update.

Message queueing: we only want one instance of a component to process a message. Employed in request/reply processing. Avoids duplicate logging.

Central Storage Engine - Elasticsearch https://www.elastic.co/products/elasticsearch

Initially selected for components providing search functionality.

Flexibility allows for a variety of use cases.

Used as the key/value store companion to NATS for storing message payloads.

Want to keep the NATS messages small.
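Storing the payload in the central store and sending only a reference keeps NATS messages small; this is often called the claim-check pattern. A minimal sketch, assuming an in-memory map as a stand-in for Elasticsearch; `KVStore`, `Envelope`, `send`, `receive`, and the `payload-N` key format are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// KVStore is an in-memory stand-in for the central store (Elasticsearch in the talk).
type KVStore struct {
	mu   sync.Mutex
	data map[string][]byte
	next int
}

func NewKVStore() *KVStore { return &KVStore{data: map[string][]byte{}} }

// Store saves payload and returns the key under which it can be loaded later.
func (s *KVStore) Store(payload []byte) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next++
	key := fmt.Sprintf("payload-%d", s.next)
	s.data[key] = append([]byte(nil), payload...) // copy so callers can't mutate stored bytes
	return key
}

// Load returns the payload stored under key.
func (s *KVStore) Load(key string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p, ok := s.data[key]
	if !ok {
		return nil, errors.New("no payload for key " + key)
	}
	return p, nil
}

// Envelope is the small message that actually travels over the bus.
type Envelope struct {
	Subject  string
	CacheKey string
}

// send stores the large payload and returns only a small envelope to publish.
func send(s *KVStore, subject string, payload []byte) Envelope {
	return Envelope{Subject: subject, CacheKey: s.Store(payload)}
}

// receive resolves the envelope back into the full payload.
func receive(s *KVStore, e Envelope) ([]byte, error) {
	return s.Load(e.CacheKey)
}

func main() {
	store := NewKVStore()
	big := make([]byte, 1<<20) // a 1 MiB payload stays out of the message
	env := send(store, "orders.create", big)
	fmt.Println(env.CacheKey) // payload-1
	got, err := receive(store, env)
	fmt.Println(len(got), err) // 1048576 <nil>
}
```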

NATS Msg

NATS Msg struct

We will refer to the NATS Msg fields Subject and Reply in the next few slides. The struct is small: the Subject, an optional Reply subject, the Data payload as a []byte, and a pointer to the Subscription that received the message.

Our Msg struct ends up being encrypted and stored in the Data field of the NATS Msg.

We rarely deal with the NATS Msg directly. Client API methods handle constructing it, but you can build one yourself too. https://github.com/nats-io/nats/blob/master/nats.go#L1323
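The slides don’t say how our Msg is serialized or encrypted before it lands in Data. A minimal sketch, assuming JSON encoding and AES-256-GCM from Go’s standard library; `NATSMsg` only mirrors the client’s public `Subject`, `Reply`, and `Data` fields, and `AppMsg`, `sealMsg`, and `openMsg` are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// NATSMsg mirrors the public Subject, Reply, and Data fields of the Go client's nats.Msg.
type NATSMsg struct {
	Subject string
	Reply   string
	Data    []byte
}

// AppMsg is a hypothetical, cut-down stand-in for our own Msg struct.
type AppMsg struct {
	ID       string `json:"id"`
	CacheKey string `json:"cache_key"`
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealMsg JSON-encodes m and encrypts it; the random nonce is prepended to the output.
func sealMsg(key []byte, m AppMsg) ([]byte, error) {
	plain, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plain, nil), nil
}

// openMsg reverses sealMsg; GCM authentication rejects tampered data.
func openMsg(key, data []byte) (AppMsg, error) {
	var m AppMsg
	gcm, err := newGCM(key)
	if err != nil {
		return m, err
	}
	if len(data) < gcm.NonceSize() {
		return m, errors.New("ciphertext too short")
	}
	nonce, ct := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	plain, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return m, err
	}
	err = json.Unmarshal(plain, &m)
	return m, err
}

func main() {
	key := make([]byte, 32) // demo only: load a real secret key in practice
	data, err := sealMsg(key, AppMsg{ID: "42", CacheKey: "payload-42"})
	if err != nil {
		panic(err)
	}
	nm := NATSMsg{Subject: "orders.create", Reply: "_INBOX.example", Data: data}
	back, err := openMsg(key, nm.Data)
	fmt.Println(back.ID, back.CacheKey, err) // 42 payload-42 <nil>
}
```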

Request/Reply

Request/Reply Steps

The origin first subscribes to the reply subject it is about to ask for. Doing this first is important: NATS does not hold messages for later subscribers, so a reply published before the subscription exists is lost.

The origin publishes its message with that reply subject. The reply subject should be a unique string. https://github.com/nats-io/nats/blob/master/nats.go#L1357

The subscriber replies to the origin by using the origin’s msg.Reply as msg.Subject in the message it publishes.

The origin will receive the message. That’s it.

The Go client simplifies this with its Request method. https://github.com/nats-io/nats/blob/master/nats.go#L1337

Forwarding

[Diagram: Origin, Step 1, Step 2]

Forwarding Steps

The origin subscribes to the reply subject. Important to do this first.

The origin then publishes a Request/Reply message.

Step 1 receives the message and produces a result.

Step 1 publishes a message with a new subject, reusing the reply subject from the origin’s message.

Step 2 receives the message, processes it, and publishes the result using the reply from Step 1’s message as its subject.

The origin will receive the message from Step 2.

Subscribe/QueueSubscribe

Use Subscribe when all subscribers should receive the message. https://github.com/nats-io/nats/blob/master/nats.go#L1399

Configuration updates drive this use case.

Use QueueSubscribe when only one of the subscribers should receive the message. https://github.com/nats-io/nats/blob/master/nats.go#L1412

So far, we use it for everything else.

Limit processing to one instance of a component in a load-balanced environment.
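A minimal in-memory sketch of the two delivery modes, assuming a synchronous stand-in `Bus` rather than the real client (whose calls are `nc.Subscribe(subject, handler)` and `nc.QueueSubscribe(subject, queue, handler)`). The subjects and the queue name `order-workers` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Handler processes a message delivered on subject.
type Handler func(subject string, data []byte)

// Bus is an in-memory stand-in for NATS delivery semantics: plain subscribers
// all get every message; each queue group gets it once, via one member.
type Bus struct {
	mu     sync.Mutex
	plain  map[string][]Handler
	groups map[string]map[string][]Handler // subject -> queue name -> members
	next   map[string]int                  // per-group cursor for picking a member
}

func NewBus() *Bus {
	return &Bus{
		plain:  map[string][]Handler{},
		groups: map[string]map[string][]Handler{},
		next:   map[string]int{},
	}
}

func (b *Bus) Subscribe(subject string, h Handler) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.plain[subject] = append(b.plain[subject], h)
}

func (b *Bus) QueueSubscribe(subject, queue string, h Handler) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.groups[subject] == nil {
		b.groups[subject] = map[string][]Handler{}
	}
	b.groups[subject][queue] = append(b.groups[subject][queue], h)
}

// Publish fans out to every plain subscriber and to one member per queue group.
// The NATS server chooses the member itself; round-robin just keeps this deterministic.
func (b *Bus) Publish(subject string, data []byte) {
	b.mu.Lock()
	targets := append([]Handler(nil), b.plain[subject]...)
	for queue, members := range b.groups[subject] {
		key := subject + "/" + queue
		targets = append(targets, members[b.next[key]%len(members)])
		b.next[key]++
	}
	b.mu.Unlock()
	for _, h := range targets {
		h(subject, data)
	}
}

func main() {
	bus := NewBus()
	for i := 1; i <= 3; i++ {
		id := i
		bus.Subscribe("config.update", func(s string, _ []byte) { fmt.Println("instance", id, "applied", s) })
		bus.QueueSubscribe("orders.create", "order-workers", func(s string, _ []byte) { fmt.Println("instance", id, "handled", s) })
	}
	bus.Publish("config.update", nil) // all three instances apply the config
	bus.Publish("orders.create", nil) // exactly one instance handles the order
}
```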

Combo Time!

Combos are good!

Publish + Subscribe: send a configuration update to all instances of a component.

Request/Reply + QueueSubscribe: the publishing side can’t control how components subscribe.

Use QueueSubscribe to have only one instance of a component process the request.

Request/Reply + QueueSubscribe + Forwarding: start with the initial processing component, then forward the message to continue processing. Some components, like “Provider Interfaces”, always forward. Others, like “Rules Engine”, always reply. Some depend on the subject.

Timeouts

“NATS is a fire-and-forget messaging system. If you need higher levels of service, you build it into the client” - http://nats.io/documentation/concepts/nats-pub-sub/

We use multiple levels of timeouts to provide a higher level of service.

Originating request timeout: the overall time we will wait before responding to the requestor.

Requests involving multiple responses: the time after which we return, regardless of what percentage of responses has arrived. Must be less than the request timeout.

Processing timeouts: ensure we kill long-running processes. These timeouts are longer than transaction timeouts, which allows us to keep gathering data instead of hastily throwing information away.

We may need to adjust dynamically to external conditions. If a provider is experiencing latency issues, it may make more sense to wait a bit longer than to lose orders.

Queue

The rather obvious (now) ...

Wanted to do logging via NATS and started with a dedicated logging package. Quickly realized this could/should be simplified.

All components use NATS for communication already and wanted logging done via NATS. Was it as simple as adding a Log method to our NATS pub/sub code?

Wanted to log the interactions with the central data store: Store, Load, Delete.

Wanted to keep messages small.

Need to provide consumers with a stable API.

Would like to tune cache use without major refactoring efforts.

The birth of Queue

Interfaces are good. Queue should define the interfaces it needs implementations of to provide messaging and caching functionality.

An instance of Queue could then be created with references to concrete types satisfying those interfaces.

Concerns that were once combined got their own identity.

The Msg struct now lived in its own repo, which also defined the Msg Handler type. Things started making sense.

The NATS and Elasticsearch repos provided simple wrappers around the client libraries. We don’t want to expose the clients to components.

Queue interface definitions
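The actual Queue interface definitions are not reproduced here. A minimal sketch of the idea, assuming hypothetical names throughout (`Messenger`, `Cacher`, `MsgHandler`, `recorder`, `mapCache`): Queue declares what it needs, and any concrete type with the right methods can be plugged in, including simple test doubles.

```go
package main

import (
	"errors"
	"fmt"
)

// Msg is a cut-down, hypothetical version of the shared message struct.
type Msg struct {
	Subject  string
	CacheKey string
}

// MsgHandler is a hypothetical handler type defined alongside Msg.
type MsgHandler func(m *Msg)

// Messenger is what Queue needs from a messaging backend (e.g. a NATS wrapper).
type Messenger interface {
	Publish(m *Msg) error
	Subscribe(subject string, h MsgHandler) error
}

// Cacher is what Queue needs from a storage backend (e.g. an Elasticsearch wrapper).
type Cacher interface {
	Store(key string, data []byte) error
	Load(key string) ([]byte, error)
	Delete(key string) error
}

// Queue depends only on the interfaces, never on a concrete client library.
type Queue struct {
	msgr  Messenger
	cache Cacher
}

// New wires any concrete implementations into a Queue.
func New(m Messenger, c Cacher) *Queue { return &Queue{msgr: m, cache: c} }

// recorder is a test double: it records published messages instead of sending them.
type recorder struct{ sent []*Msg }

func (r *recorder) Publish(m *Msg) error {
	r.sent = append(r.sent, m)
	return nil
}

func (r *recorder) Subscribe(string, MsgHandler) error { return nil }

// mapCache is a non-concurrent in-memory Cacher, enough for a sketch or a test.
type mapCache map[string][]byte

func (c mapCache) Store(key string, data []byte) error {
	c[key] = data
	return nil
}

func (c mapCache) Load(key string) ([]byte, error) {
	d, ok := c[key]
	if !ok {
		return nil, errors.New("not found: " + key)
	}
	return d, nil
}

func (c mapCache) Delete(key string) error {
	delete(c, key)
	return nil
}

// Compile-time proof that the stand-ins satisfy the interfaces.
var (
	_ Messenger = (*recorder)(nil)
	_ Cacher    = mapCache(nil)
)

func main() {
	rec := &recorder{}
	q := New(rec, mapCache{})
	_ = q.cache.Store("payload-1", []byte("large payload"))
	_ = q.msgr.Publish(&Msg{Subject: "orders.create", CacheKey: "payload-1"})
	fmt.Println(len(rec.sent), rec.sent[0].CacheKey) // 1 payload-1
}
```

Swapping Elasticsearch for another store, or tuning cache use, then means writing a new Cacher rather than refactoring components.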

Queue API

Request, Publish, Subscribe, Load, Store, Delete, Log

Don’t force the consumers of the API to do what must always be done; the API does it for them:

Request, Publish: store the payload and set CacheKey on the Msg.

Request, Subscribe, QueueSubscribe: if CacheKey is present, retrieve the payload from the cache and add it to CacheData on the Msg.

Load, Store, Delete: log these events.

Log: use runtime.FuncForPC to get caller information.

Close down

Would like to ultimately open source Queue and other potentially useful packages. Have already started contributing back with some open source projects:

https://github.com/Bridgevine/soap
https://github.com/Bridgevine/http
https://github.com/Bridgevine/xml

Help us build this and more? More info on what we are building: Bridgevine Company Website, Open Positions

Thank you! If you have questions: [email protected] or @stonean