48
A snapshot, a stream, and a bunch of deltas Applying Lambda Architectures in a post-Microservice World Q-Con London March 6th 2018 Ade Trenaman, SVP Engineering, Raconteur, HBC Tech t: @adrian_trenaman http://tech.hbc.com t: @hbcdigital fa: @hbcdigital in: hbc_digital

t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

  • Upload
    others

  • View
    11

  • Download
    0

Embed Size (px)

Citation preview

Page 1: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

A snapshot, a stream, and a bunch of deltasApplying Lambda Architectures in a post-Microservice World

Q-Con London March 6th 2018

Ade Trenaman, SVP Engineering, Raconteur,HBC Techt: @adrian_trenaman

http://tech.hbc.com

t: @hbcdigital fa: @hbcdigital in: hbc_digital

Page 2: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant
Page 3: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant
Page 4: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant
Page 5: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

~$3.5Bnannual e-commerce revenue

Page 6: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

00’s of Stores

Page 7: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

What this talk is aboutSolving the problem of microservice dependencies with lambda architectures:

> performance, scalability, reliability

Lambda architecture examples:

> product catalog, search, real-time inventory, third-party integration

Lessons learnt:

> It’s not all rainbows and unicorns> Kinesis vs. Kafka

Page 8: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Some context: a minimalist abstraction of our architectural evolution

2007Monolith

2010 Service Oriented

2012µ-Services

λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ

λ λ λ λ λ λ λ

2016Rise of

Serverless λ

λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ

λ λ λ λ λ λ λ

2018+Multi-bannerMulti-tenant Multi-region

λ architecturesStreams

GraphQL In the seams

Page 9: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Slam on the breaks! Dublin Microservices Meetup, Feb 2015

Page 10: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Part 0In which we briefly describe lambda architecture, and the Hollywood Principle

Page 11: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Lambda architecture: making batch processing sexy again.

Some kind of data-source

A view of the data

Batch processing ‘the baseline’

Stream processing ‘real-time’

Provide low-latency, high-throughput, reliable, convenient access to the data.

Preserve the integrity and purpose of the source of truth.

Stream: an append-only, immutable log store of interesting events.

Page 12: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Lambda architecture: making batch processing sexy again.

Some kind of data-source

A view of the data

Batch processing ‘the baseline’

Stream processing ‘real-time’

Need to rebuild the view? Take latest snapshot, and replay all events with a greater timestamp.

T_7: po

T_6: dipsy

T_8: tinky

T_1: fooT_2: barT_3: pepeT_4: pipiT_5: lala

T_1: fooT_2: barT_3: pepeT_4: pipiT_5: lalaT_6: dipsy

...

Page 13: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

“Don’t call us, we’ll call you.”

Page 14: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Inversion of control: previously, we ask for data when we need it.

Some kind of data-source

A view of the data

Provide low-latency, high-throughput, reliable, convenient access to the data.

Preserve the integrity and purpose of the source of truth.

Page 15: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Inversion of control: now, when the data changes, we are informed.

Some kind of data-source

A view of the data

Provide low-latency, high-throughput, reliable, convenient access to the data.

Preserve the integrity and purpose of the source of truth.

Page 16: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Part IIn which we learn the perils of caching in a microservices architecture, and how

lambda architecture helped us out.

Page 17: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Gilt: we source luxury brands...

Page 18: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

… we shoot the product in our studios

Page 19: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

… we receive

Page 20: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

… we sell every day at noon

Page 21: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

… stampede!

Page 22: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant
Page 23: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

The Gilt ProblemMassive pulse of traffic, every day.

=> serve fast

Low inventory quantities of high value merchandise, changing rapidly=> can’t cache

Individually personalised landing experiences=> can’t cache

Page 24: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Caching“Just say no.”

“Until you have to say yes.”

“Then, just say maybe.”

Page 25: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

consumer (e.g. web-pdp)

A stateless, cache-free library, busted.

product-service inventory-service price-service

commons <<lib>>

Hmm, engineer adds a local brand cache to reduce network calls..

… and then later, another cache for product information.

Leads to (1) arbitrary caching policies, & (2) duplicated cache information.

product cache

brand cache

Page 26: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

A caching library. Worked well initially, but...

product-service inventory-service price-service

consumer (e.g. web-pdp)

commons <<lib>>

We changed the commons library to cache products with a consistent, timed refresh (20m).

Worked well, until the business changed its mind about one small thing: let’s make everything in the warehouse sellable.

Orders of magnitude more SKUs:

* JSON from product service > 1Gb* Startup time > 10m* JVM garbage collection every 20m on cache clear* ~1hr to propagate a change.* m4.xlarge, w/ 14Gb JVM Heap

product cache

brand cache

Page 27: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Near real-time caching at scale

Source of Truth - PG

admin

web-pdp

commons

L1

* Startup time ~1s* No more stop-the-world GC* ~seconds to propagate a change.* c4.xlarge (CPU!!!), w/ 6Gb JVM Heap

Next: replace JSON marshalling with binary OTW format (e.g. AVRO)

S3

Brands, products, sales, channels, ...

s

Elasticache

product-service

Kinesis

Calatrava λ

Page 28: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

https://github.com/gilt/calatrava - soon to be public

Page 29: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Part 2In which we learn how we’ve used Lambda architecture to implement a near

real-time search index, but needed an additional relational ‘view of truth’.

Page 30: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Problem: polling a polling service means changes to product data are not reflected in realtime.

Source of Truth - PG

admin

product-service

search-indexer

Page 31: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Source of Truth - PG

admin

View of Truth - PG

svc-search-feed

Kinesis

CalatravaVOT

* Changes are propagated in real-time to Solr* Rebuild of index (s + *) with zero down time* Same logic for batch & stream (thank you akka-streams)* V.O.T.: “We needed a relational DB to solve a relational problem”

S3

Brands, products, sales, channels, ...

s

Page 32: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Part 3In which we use a lambda architecture to facade an unscalable unreliable system

as a reliable R+W API… and benefit from always using the same flow.

Page 33: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Real-time inventory: bridging bricks’n’clicks

internet

OMS

warehousestores

Inventory SOT

?

Page 34: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Real-time inventory: bridging bricks’n’clicks

OMS

warehousestores

Inventory SOT RTAM

* Every sku inventory level every 24hrs* Threshold (O, LWM, HWM) inventory events.

λ

Elasticache

R+W

* Absolute inventory values

REST API

* APIBuilder.io

Page 35: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Making a web reservation

OMS

warehousestores

Inventory SOT RTAM

λ

Elasticache REST API

R+W

1. Is inventory >0 ? 2. Attempt a reservation with OMS. IF it fails, generate a random reservation ID. 3. Put the change on the RTAM stream4. Update the cache (and stream, not shown)5, 6, 7. Trigger a best effort to true-up inventory with ATP (available to purchase)

1.

2.3.

4.

λ

5.

6.

7.

Page 36: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

XTHERE IS ONLY ONE PATH

Page 37: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Part 4In which we learn that the paradigm generalises across third-party boundaries.

Page 38: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

International E-Commerce: Taxes, Shipping & Duty is HARD. Performance is critical!

Page 39: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Typical solution: cache for PDP & PA, go direct at checkout. Asymmetric, with chance of sticker-shock.

Third Party Shipping Partner

Intl Pricing Cache

Product Listing

Product Details

Checkout

pricingservice

Page 40: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Elasticache

Stream driven solution with flow.io

Pricing Service

Product Listing

Product Details

Checkout

Page 41: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

Part 5In which we consider Kafka vs. Kinesis

Page 42: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

∞Stream: an immutable, append-only log.

Except it isn’t.

Which makes us use snapshots, and complicates our architecture.

Page 43: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

LOG COMPACTION

Page 44: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

“Log compaction”: always remember the latest version of the same object.

Source of Truth

(1, janes bond)

(2, dr. who)

(1, james bond)

(3, fr. ted)

(1, janes bond)

(2, dr. who)

(1, james bond)

(3, fr. ted)

Page 45: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

TABLE STREAM DUALITY

Page 46: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

KTable & Kafka Streams Library

Page 47: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

K-Table & Kafka Streams...

Page 48: t: @adrian trenaman Ade Trenaman, SVP Engineering ... · Rise of Serverless λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ λ 2018+ Multi-banner Multi-tenant

#thanks @adrian_trenaman @gilttech @hbcdigital

(0) Apply lambda arch to create scalable, reliable offline systems.

(1) Replicate and transform the one source of truth

(2) It’s not all unicorns and rainbows: complex VOT, snapshots

(3) Kinesis is the gateway drug; Kafka is the destination.