54
world of microservices Flink in Zalando’s

Flink in Zalando's World of Microservices

Embed Size (px)

Citation preview

Page 1: Flink in Zalando's World of Microservices

world of microservices

Flink in Zalando’s

Page 2: Flink in Zalando's World of Microservices

Agenda

Zalando’s Microservices Architecture

Saiki - Data Integration and Distribution at Scale

Flink in a Microservices World

Stream Processing Use Cases:

Business Process Monitoring

Continuous ETL

Future Work

2

Page 3: Flink in Zalando's World of Microservices

About us

Mihail Vieru Big Data Engineer,

Business Intelligence

Javier López Big Data Engineer,

Business Intelligence

3

Page 4: Flink in Zalando's World of Microservices

4

Page 5: Flink in Zalando's World of Microservices

15 countries

4 fulfillment centers

~18 million active customers

2.9 billion € revenue 2015

150,000+ products

10,000+ employees

One of Europe's largest

online fashion retailers

Page 6: Flink in Zalando's World of Microservices

Zalando Technology

1100+ TECHNOLOGISTS

Rapidly growing

international team

http://tech.zalando.com

6

Page 7: Flink in Zalando's World of Microservices

VINTAGE ARCHITECTURE

Page 8: Flink in Zalando's World of Microservices

Classical ETL process

Vintage Business Intelligence

Business Logic

Data Warehouse (DWH)

Database DBA

BI

Business Logic

Database

Business Logic

Database

Business Logic

Database

Dev

8

Page 9: Flink in Zalando's World of Microservices

Vintage Business Intelligence

Very stable architecture that is still in use in the oldest

components

Classical ETL process

• Use-case specific

• Usually outputs data into a Data Warehouse

• well structured

• easy to use by the end user (SQL) 9

Page 10: Flink in Zalando's World of Microservices

RADICAL AGILITY

Page 11: Flink in Zalando's World of Microservices

AUTONOMY

MASTERY

PURPOSE

Radical Agility

11

Page 12: Flink in Zalando's World of Microservices

Autonomy

Autonomous teams

• can choose own technology stack

• including persistence layer

• are responsible for operations

• should use isolated AWS accounts 12

Page 13: Flink in Zalando's World of Microservices

Supporting autonomy — Microservices

Business

Logic

Database

RE

ST

AP

I

Business

Logic

Database

RE

ST

AP

I

Business

Logic

Database

RE

ST

AP

I

Business

Logic

Database

RE

ST

AP

I

Business

Logic

Database

RE

ST

AP

I

Business

Logic

Database

RE

ST

AP

I

Business

Logic

Database

RE

ST

AP

I

13

Page 14: Flink in Zalando's World of Microservices

Supporting autonomy — Microservices

Business Logic

Database

Team

A Business Logic

Database

Team

B

RE

ST

AP

I RE

ST

AP

I

Applications communicate using REST APIs

Databases hidden behind the walls of AWS VPC

public Internet

14

Page 15: Flink in Zalando's World of Microservices

Supporting autonomy — Microservices

Business Logic

Database

Team

A Business Logic

Database

Team

B

Classical ETL process is impossible!

RE

ST

AP

I

public Internet

RE

ST

AP

I

15

Page 16: Flink in Zalando's World of Microservices

Supporting autonomy — Microservices

Business

Logic

Database

RE

ST

AP

I

Ap

p A

Business

Logic

Database

RE

ST

AP

I

App B

Business

Logic

Database

RE

ST

AP

I

Ap

p C

Business

Logic

Database

RE

ST

AP

I

Ap

p D

16

Page 17: Flink in Zalando's World of Microservices

Supporting autonomy — Microservices

Business Intelligence

Business

Logic

Database

RE

ST

AP

I

Ap

p A

Business

Logic

Database

RE

ST

AP

I

App B

Business

Logic

Database

RE

ST

AP

I

Ap

p C

Business

Logic

Database

RE

ST

AP

I

Ap

p D

17

Page 18: Flink in Zalando's World of Microservices

Supporting autonomy — Microservices

App A App B App D App C BI

Data Warehouse

?

18

Page 19: Flink in Zalando's World of Microservices

SAIKI

Page 20: Flink in Zalando's World of Microservices

SAIKI

Saiki Data Platform

App A App B App D App C BI

Data Warehouse

20

Page 21: Flink in Zalando's World of Microservices

SAIKI

Saiki — Data Integration & Distribution

App A App B App D App C BI

Data Warehouse

Buku

21

Page 22: Flink in Zalando's World of Microservices

SAIKI

Saiki — Data Integration & Distribution

App A App B App D App C BI

Data Warehouse

Buku

AWS S3

Tukang

REST API

22

Page 23: Flink in Zalando's World of Microservices

SAIKI

Saiki — Data Integration & Distribution

App A App B App D App C BI

Data Warehouse

Buku Tukang

REST API

AWS S3

23

E.g. Forecast DB

Page 24: Flink in Zalando's World of Microservices

Saiki — Data Integration & Distribution

Old Load Process New Load Process

relied on Delta Loads relies on Event Stream

JDBC connection RESTful HTTPS connections

Data Quality could be controlled by BI

independently

trust for Correctness of Data in the

delivery teams

PostgreSQL dependent Independent of the source technology

stack

N to 1 data stream N to M streams, no single data sink

24

Page 25: Flink in Zalando's World of Microservices

BI

Data Warehouse E.g. Forecast DB

SAIKI

Saiki — Data Integration & Distribution

App A App B App D App C

Buku Tukang

REST API

AWS S3

25

Page 26: Flink in Zalando's World of Microservices

BI

Data Warehouse E.g. Forecast DB

SAIKI

Saiki — Data Integration & Distribution

App A App B App D App C

Buku Tukang

REST API

Stream Processing

via Apache Flink

AWS S3

26

Page 27: Flink in Zalando's World of Microservices

BI

Data Warehouse E.g. Forecast DB

SAIKI

Saiki — Data Integration & Distribution

App A App B App D App C

Buku Tukang

REST API

Stream Processing

via Apache Flink Data Lake .

AWS S3

27

Page 28: Flink in Zalando's World of Microservices

FLINK IN A

MICROSERVICES

WORLD

Page 29: Flink in Zalando's World of Microservices

Current state in BI

29

• Centralized ETL tools

(Pentaho DI/Kettle and Oracle Stored Procedures)

• Reporting data does not arrive in real time

• Data to data science & analytical teams is provided using

Exasol DB

• All data comes from SQL databases

Page 30: Flink in Zalando's World of Microservices

Opportunities for Next Gen BI

• Cloud Computing

• Distributed ETL

• Scale

• Access to real time data

• All teams publishing data to central bus (Kafka)

30

Page 31: Flink in Zalando's World of Microservices

Opportunities for Next Gen BI

• Hub for Data teams

• Data Lake provides distributed access and fine

grained security

• Data can be transformed (aggregated, joined, etc.)

before delivering it to data teams

• Non structured data

• “General-purpose data processing engines like Flink or

Spark let you define own data types and functions.” -

Fabian Hueske

31

Page 32: Flink in Zalando's World of Microservices

The right fit

Stream Processing

32

Page 33: Flink in Zalando's World of Microservices

The right fit - Data processing engine (I)

33

Feature Apache Spark 1.5.2 Apache Flink 0.10.1

processing mode micro-batching tuple at a time

temporal processing

support

processing time event time, ingestion time, processing

time

batch processing yes yes

latency seconds sub-second

back pressure handling through manual configuration implicit, through system architecture

state storage distributed dataset in memory distributed key/value store

state access full state scan for each microbatch value lookup by key

state checkpointing yes yes

high availability mode yes yes

Page 34: Flink in Zalando's World of Microservices

The right fit - Data processing engine (II)

34

Feature Apache Spark 1.5.2 Apache Flink 0.10.1

event processing guarantee exactly once exactly once

Apache Kafka connector yes yes

Amazon S3 connector yes yes

operator library neutral ++ (split, windowByCount..)

language support Java, Scala, Python Java, Scala, Python

deployment standalone, YARN, Mesos standalone, YARN

support neutral ++ (mailing list, direct contact & support

from data Artisans)

documentation + neutral

maturity ++ neutral

Page 36: Flink in Zalando's World of Microservices

Flink in AWS - Our appliance

36

MASTER ELB

EC2 Docker

Flink Master

EC2 Docker

Flink Shadow Master

WORKERS ELB

EC2 Docker

Flink Worker

EC2 Docker

Flink Worker

EC2 Docker

Flink Worker

Page 37: Flink in Zalando's World of Microservices

USE CASES

Page 38: Flink in Zalando's World of Microservices

BUSINESS PROCESS

MONITORING

Page 39: Flink in Zalando's World of Microservices

Business Process

A business process is in its simplest form a chain of

correlated events:

Business Events from the whole Zalando platform flow through

Saiki => opportunity to process those streams in near real-time

39

start event completion event

ORDER_CREATED ALL_SHIPMENTS_DONE

Page 40: Flink in Zalando's World of Microservices

Near Real-Time Business Process Monitoring

• Check if technically the Zalando platform works

• 1000+ Event Types

• Analyze data on the fly:

• Order velocities

• Delivery velocities

• Control SLAs of correlated events, e.g. parcel sent out

after order

40

Page 41: Flink in Zalando's World of Microservices

Architecture BPM

41

App A App B

Nakadi Event Bus

App C

Operational Systems

Kafka2Kafka

Unified Log

Stream Processing Tokoh

ZMON

Cfg Service

PU

BL

IC IN

TE

RN

ET

OA

UT

H

Saiki BPM

Elasticsearch

Page 42: Flink in Zalando's World of Microservices

How we use Flink in BPM

• 1 Event Type -> 1 Kafka topic

• Define a granularity level of 1 minute (TumblingWindows)

• Analyze processes with multiple event types (Join &

Union)

• Generation and processing of Complex Events (CEP &

State)

42

Page 43: Flink in Zalando's World of Microservices

CONTINUOUS ETL

Page 44: Flink in Zalando's World of Microservices

Extract Transform Load (ETL)

Traditional ETL process:

• Batch processing

• No real time

• ETL tools

• Heavy processing on the storage side

44

Page 45: Flink in Zalando's World of Microservices

What changed with Radical Agility?

• Data comes in a semi-structured format (JSON payload)

• Data is distributed in separate Kafka topics

• There would be peak times, meaning that the data flow

will increase by several factors

• Data sources number increased by several factors

45

Page 46: Flink in Zalando's World of Microservices

Architecture Continuous ETL

46

App A App B

Nakadi Event Bus

App C

Operational Systems

Kafka2Kafka

Unified Log

Stream Processing Saiki Continuous ETL

Tukang

Oracle DWH

Orang

Page 47: Flink in Zalando's World of Microservices

How we (would) use Flink in Continuous ETL

• Transformation of complex payloads into simple ones for

easier consumption in Oracle DWH

• Combine several topics based on Business Rules (Union,

Join)

• Pre-Aggregate data to improve performance in the

generation of reports (Windows, State)

• Data cleansing

• Data validation 47

Page 48: Flink in Zalando's World of Microservices

FUTURE WORK

Page 49: Flink in Zalando's World of Microservices

Complex Event Processing for BPM

Cont. example business process:

• Multiple PARCEL_SHIPPED events per order

• Generate complex event ALL_SHIPMENTS_DONE, when

all PARCEL_SHIPPED events received

(CEP lib, State)

49

Page 50: Flink in Zalando's World of Microservices

Deployments from other BI Teams

Flink Jobs from other BI Teams

Requirements:

• manage and control deployments

• isolation of data flows

• prevent different jobs from writing to the same sink

• resource management in Flink

• share cluster resources among concurrently running jobs

StreamSQL would significantly lower the entry barrier

50

Page 51: Flink in Zalando's World of Microservices

Replace Kafka2Kafka

• Python app

• extracts events from REST API Nakadi Event Bus

• writes them to our Kafka cluster

Idea: Replace Python app with Flink Jobs, which would write

directly to Saiki’s Kafka cluster (under discussion)

51

Page 52: Flink in Zalando's World of Microservices

Other Future Topics

• CD integration for Flink appliance

• Monitoring data flows via heartbeats

• Flink <-> Kafka

• Flink <-> Elasticsearch

• New use cases for Real Time Analytics/ BI

• Sales monitoring

52

Page 53: Flink in Zalando's World of Microservices

THANK YOU