FLINK IN ZALANDO’S WORLD OF MICROSERVICES JAVIER LOPEZ MIHAIL VIERU 12-09-2016

Javier Lopez_Mihail Vieru - Flink in Zalando's World of Microservices - Flink Forward

AGENDA

● Zalando’s Microservices Architecture

● Saiki - Data Integration and Distribution at Scale

● Flink in a Microservices World

● Stream Processing Use Cases:

o Business Process Monitoring

o Continuous ETL

● Future Work

ABOUT US

Mihail Vieru, Big Data Engineer, Business Intelligence

Javier López, Big Data Engineer, Business Intelligence

One of Europe's largest online fashion retailers

15 countries

~19 million active customers

~3 billion € revenue 2015

1,500 brands

150,000+ products

11,000+ employees in Europe

ZALANDO TECHNOLOGY

1300+ TECHNOLOGISTS

Rapidly growing international team

http://tech.zalando.com

VINTAGE ARCHITECTURE

VINTAGE BUSINESS INTELLIGENCE

Classical ETL process

[Diagram: four applications, each with Business Logic and a Database, feeding the Data Warehouse (DWH). Labels: Dev, DBA, BI.]

VINTAGE BUSINESS INTELLIGENCE

DWH Oracle

Exasol

RADICAL AGILITY

RADICAL AGILITY

AUTONOMY

MASTERY

PURPOSE

RADICAL AGILITY - AUTONOMY

Technologies Operations Teams

SUPPORTING AUTONOMY: MICROSERVICES

[Diagram: five microservices, each consisting of Business Logic, a Database, and a REST API.]

SUPPORTING AUTONOMY: MICROSERVICES

[Diagram: Team A and Team B each own a service with Business Logic, a Database, and a REST API; the services are connected over the public Internet.]

Applications communicate using REST APIs

Databases hidden behind the walls of AWS VPC

SUPPORTING AUTONOMY: MICROSERVICES

[Diagram: Team A and Team B services, each with Business Logic, a Database, and a REST API, connected over the public Internet.]

Classical ETL process is impossible!

SUPPORTING AUTONOMY: MICROSERVICES

[Diagram: App A, App B, App C, and App D, each with Business Logic, a Database, and a REST API, alongside Business Intelligence.]

SAIKI

SAIKI DATA PLATFORM

[Diagram: App A, App B, App C, App D, and the BI Data Warehouse, connected through SAIKI.]

SAIKI — DATA INTEGRATION & DISTRIBUTION

[Diagram: App A, App B, App C, and App D connect to SAIKI. Components shown for SAIKI: Exporter, REST API, Stream Processing via Apache Flink, and a Data Lake on AWS S3. Targets shown: the BI Data Warehouse and other stores, e.g. a Forecast DB.]

SAIKI — SUMMARY

[Diagram: BEFORE / AFTER comparison across four dimensions: data source technologies, data source connections, data source extraction, and data delivery. Other labels: JDBC, REST.]

FLINK IN A

MICROSERVICES WORLD

OPPORTUNITIES FOR NEXT GEN BI

Cloud Computing
- Distributed ETL
- Scale

Access to Real Time Data
- All teams publish data to central event bus

Hub for Data Teams
- Data Lake provides distributed access and fine grained security
- Data can be transformed (aggregated, joined, etc.) before delivering it to data teams

Semi-Structured Data
- “General-purpose data processing engines like Flink or Spark let you define own data types and functions.” - Fabian Hueske, data Artisans

THE RIGHT FIT

STREAM PROCESSING

THE RIGHT FIT — STREAM PROCESSING ENGINE

Candidates:

Storm & Samza ruled out because of batch processing requirement

SPARK VS. FLINK DIFFERENCES

Feature                     | Apache Spark 1.5.2                   | Apache Flink 0.10.1
Processing mode             | micro-batching                       | tuple at a time
Temporal processing support | processing time                      | event time, ingestion time, processing time
Latency                     | seconds                              | sub-second
Back pressure handling      | manual configuration                 | implicit, through system architecture
State access                | full state scan for each micro-batch | value lookup by key
Operator library            | neutral                              | ++ (split, windowByCount..)
Support                     | neutral                              | ++ (mailing list, direct contact & support from data Artisans)
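
The "temporal processing support" row is easier to see with a toy example. The following is a minimal plain-Python sketch, not Flink or Spark code; the field names `event_time` and `arrival_time` and the sample events are assumptions made for illustration. It counts the same events per 60-second tumbling window, once by the time the event happened and once by the time it reached the processor.

```python
from collections import Counter


def tumbling_window_counts(events, window_size_s, time_field):
    """Count events per tumbling window, bucketed by the chosen timestamp field."""
    counts = Counter()
    for event in events:
        # Align the timestamp to the start of its window.
        window_start = (event[time_field] // window_size_s) * window_size_s
        counts[window_start] += 1
    return dict(counts)


# Hypothetical events (seconds): 'event_time' is when the event happened,
# 'arrival_time' is when the processor saw it.
events = [
    {"event_time": 5, "arrival_time": 6},
    {"event_time": 58, "arrival_time": 61},  # arrives after the window boundary
    {"event_time": 62, "arrival_time": 63},
]

by_event_time = tumbling_window_counts(events, 60, "event_time")
by_processing_time = tumbling_window_counts(events, 60, "arrival_time")
```

With processing time, the delayed second event is counted in the wrong minute; with event time it lands in the window in which it actually occurred.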

APACHE FLINK

• true stream processing framework

• process events at a consistently high rate with low latency

• scalable

• great community and on-site support from Berlin/Europe

• university graduates with Flink skills

https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/

FLINK ON AWS - OUR APPLIANCE

MASTER ELB

• EC2 / Docker: Flink Master

• EC2 / Docker: Flink Shadow Master

WORKERS ELB

• EC2 / Docker: Flink Worker

• EC2 / Docker: Flink Worker

• EC2 / Docker: Flink Worker

USE CASES

BUSINESS PROCESS

MONITORING

BUSINESS PROCESS

A business process is, in its simplest form, a chain of correlated events:

start event: ORDER_CREATED

completion event: ALL_PARCELS_SHIPPED

Business events from the whole Zalando platform flow through Saiki => opportunity to process those streams in near real time

REAL-TIME BUSINESS PROCESS MONITORING

• Check if business processes in the Zalando platform work

• Analyze data on the fly:

o Order velocities

o Delivery velocities

o Control SLAs of correlated events, e.g. parcel sent out after order
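
As an illustration of the SLA check on correlated events, here is a minimal plain-Python sketch. It is not the production Flink job; the event fields (`type`, `order_id`, `ts`), the one-hour SLA, and the sample data are assumptions. It correlates ORDER_CREATED with PARCEL_SHIPPED by order id and reports the orders whose first parcel left too late or has not left at all.

```python
def find_sla_violations(events, sla_seconds, now):
    """Return ids of orders with no PARCEL_SHIPPED within sla_seconds of ORDER_CREATED."""
    created = {}
    first_shipped = {}
    for e in events:
        if e["type"] == "ORDER_CREATED":
            created[e["order_id"]] = e["ts"]
        elif e["type"] == "PARCEL_SHIPPED":
            # Keep the earliest shipment timestamp per order.
            previous = first_shipped.get(e["order_id"], e["ts"])
            first_shipped[e["order_id"]] = min(previous, e["ts"])

    violations = []
    for order_id, created_ts in created.items():
        shipped_ts = first_shipped.get(order_id)
        # An order without any shipment is measured against the current time.
        reference_ts = now if shipped_ts is None else shipped_ts
        if reference_ts - created_ts > sla_seconds:
            violations.append(order_id)
    return sorted(violations)


# Hypothetical sample data (timestamps in seconds).
sample_events = [
    {"type": "ORDER_CREATED", "order_id": "o1", "ts": 0},
    {"type": "PARCEL_SHIPPED", "order_id": "o1", "ts": 100},
    {"type": "ORDER_CREATED", "order_id": "o2", "ts": 0},
    {"type": "PARCEL_SHIPPED", "order_id": "o2", "ts": 5000},
    {"type": "ORDER_CREATED", "order_id": "o3", "ts": 0},
    {"type": "ORDER_CREATED", "order_id": "o4", "ts": 9000},
]
late_orders = find_sla_violations(sample_events, sla_seconds=3600, now=10000)
```

In a streaming job the two dictionaries would be keyed state and the "not shipped yet" case would typically be driven by a timer rather than by a batch pass over all orders.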

ARCHITECTURE BPM

[Diagram: Operational Systems (App A, App B, App C) and the Nakadi Event Bus on one side, Saiki BPM on the other, connected across the public Internet with OAuth. Components shown for Saiki BPM: Kafka2Kafka, Unified Log, Stream Processing, Elasticsearch, Cfg Service, Alert Svc, UI.]

HOW WE USE FLINK IN BPM

• 1000+ Event Types; 1 Event Type -> 1 Kafka topic

• Analyze processes with correlated event types (Join & Union)

• Enrich data based on business rules

• Sliding Windows (1min to 48hrs) for Platform Snapshots

• State for alert metadata

• Generation and processing of Complex Events (CEP lib)
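
To make the sliding-window bullet concrete, here is a minimal plain-Python sketch of how one event contributes to several overlapping windows. It is an illustration only, not Flink's windowing API; the function name and the 120-second size with a 60-second slide are assumptions (the slide mentions window sizes from 1 minute to 48 hours).

```python
def sliding_window_counts(timestamps, size_s, slide_s):
    """Count events in each sliding window [start, start + size_s), keyed by window start."""
    if not timestamps:
        return {}
    # Earliest window that can still contain the first event,
    # and latest window that starts at or before the last event.
    first_start = (min(timestamps) // slide_s) * slide_s - size_s + slide_s
    last_start = (max(timestamps) // slide_s) * slide_s

    counts = {}
    start = first_start
    while start <= last_start:
        n = sum(1 for t in timestamps if start <= t < start + size_s)
        if n:
            counts[start] = n
        start += slide_s
    return counts


# Three hypothetical event timestamps (seconds); windows of 2 minutes sliding every minute.
snapshot = sliding_window_counts([10, 70, 130], size_s=120, slide_s=60)
```

Because the window size is twice the slide, every event is counted in exactly two windows. A 48-hour window with a short slide multiplies that overlap, which is worth keeping in mind when sizing state.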

STREAMING ETL

Extract Transform Load (ETL)

Traditional ETL process:

• Batch processing

• No real time

• ETL tools

• Heavy processing on the storage side

WHAT CHANGED WITH RADICAL AGILITY?

• Data comes in a semi-structured format (JSON payload)

• Data is distributed in separate Kafka topics

• There will be peak times in which the data flow increases by several factors

• The number of data sources increased by several factors

ARCHITECTURE STREAMING ETL

[Diagram: Operational Systems (App A, App B, App C) and the Nakadi Event Bus. Components shown for Saiki Streaming ETL: Kafka2Kafka, Unified Log, Stream Processing, Exporter. Target: Oracle DWH, via an Importer.]

HOW WE (WOULD) USE FLINK IN STREAMING ETL

• Transformation of complex payloads into simple ones for easier consumption in Oracle DWH

• Combine several topics based on Business Rules (Union, Join)

• Pre-Aggregate data to improve performance in the generation of reports (Windows, State)

• Data cleansing

• Data validation
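
The first and last bullets above (simplifying complex payloads, validation) can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the actual Saiki transformation: the payload shape, the underscore naming of flattened columns, and the required fields `order_id` and `customer_country` are all hypothetical.

```python
import json


def flatten(payload, prefix=""):
    """Turn a nested JSON object into a single-level dict with underscore-joined keys."""
    flat = {}
    for key, value in payload.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Hypothetical validation rule: these columns must be present and non-empty.
REQUIRED_FIELDS = ("order_id", "customer_country")


def to_dwh_row(raw_json):
    """Parse, flatten, and validate one event; return None if it is invalid."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    row = flatten(payload)
    if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None
    return row


raw = '{"order_id": "o1", "customer": {"country": "DE", "address": {"zip": "10115"}}}'
row = to_dwh_row(raw)
```

In a streaming job, rejected records would more likely be routed to a separate output for inspection than silently dropped.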

FUTURE USE CASES

COMPLEX EVENT PROCESSING FOR BPM

Continuing the example business process:

• Multiple PARCEL_SHIPPED events per order

• Generate the complex event ALL_PARCELS_SHIPPED once all PARCEL_SHIPPED events have been received (CEP lib, State)
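
A minimal plain-Python sketch of the stateful part of this idea follows. It does not use Flink's CEP library. It also assumes that the ORDER_CREATED event carries the number of parcels to expect (the hypothetical field `parcel_count`) and that each PARCEL_SHIPPED event has a `parcel_id`; the slides do not say how the expected count is known.

```python
class AllParcelsShippedDetector:
    """Keeps per-order state and emits ALL_PARCELS_SHIPPED once per completed order."""

    def __init__(self):
        self.expected = {}  # order_id -> number of parcels announced
        self.shipped = {}   # order_id -> set of parcel ids seen so far

    def on_event(self, event):
        """Return an ALL_PARCELS_SHIPPED event when an order completes, else None."""
        order_id = event["order_id"]
        if event["type"] == "ORDER_CREATED":
            self.expected[order_id] = event["parcel_count"]
        elif event["type"] == "PARCEL_SHIPPED":
            # A set makes duplicate deliveries of the same event harmless.
            self.shipped.setdefault(order_id, set()).add(event["parcel_id"])
        else:
            return None

        expected = self.expected.get(order_id)
        if expected is not None and len(self.shipped.get(order_id, ())) >= expected:
            # Clear state so the complex event is emitted only once per order.
            del self.expected[order_id]
            self.shipped.pop(order_id, None)
            return {"type": "ALL_PARCELS_SHIPPED", "order_id": order_id}
        return None


detector = AllParcelsShippedDetector()
results = [
    detector.on_event({"type": "ORDER_CREATED", "order_id": "o1", "parcel_count": 2}),
    detector.on_event({"type": "PARCEL_SHIPPED", "order_id": "o1", "parcel_id": "p1"}),
    detector.on_event({"type": "PARCEL_SHIPPED", "order_id": "o1", "parcel_id": "p1"}),
    detector.on_event({"type": "PARCEL_SHIPPED", "order_id": "o1", "parcel_id": "p2"}),
]
```

The sketch also tolerates a PARCEL_SHIPPED event arriving before its ORDER_CREATED, since the completeness check runs after either event type.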

DEPLOYMENTS FROM OTHER BI TEAMS

Flink Jobs from other BI Teams

Requirements:

• manage and control deployments

• isolation of data flows

o prevent different jobs from writing to the same sink

• resource management in Flink

o share cluster resources among concurrently running jobs

StreamSQL would significantly lower the entry barrier
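
One way to approach the "isolation of data flows" requirement is a deployment-time check that each sink has a single owning job. The sketch below is a hypothetical illustration in plain Python, not an existing Flink feature or the team's tooling; the class name, job names, and sink identifiers are invented for the example.

```python
class SinkRegistry:
    """Tracks which job owns which sink and rejects deployments that would share one."""

    def __init__(self):
        self._owner_by_sink = {}

    def register(self, job_name, sinks):
        """Claim the given sinks for job_name, or raise if another job already owns one."""
        conflicts = sorted(
            sink for sink in sinks
            if self._owner_by_sink.get(sink, job_name) != job_name
        )
        if conflicts:
            raise ValueError(f"{job_name} cannot write to already claimed sinks: {conflicts}")
        for sink in sinks:
            self._owner_by_sink[sink] = job_name


registry = SinkRegistry()
registry.register("bpm-alerts", ["es:alerts"])
try:
    registry.register("team-x-report", ["es:alerts", "kafka:team-x-out"])
    rejected = False
except ValueError:
    rejected = True
```

The check is all-or-nothing: a rejected job claims none of its sinks, so a corrected deployment can be retried without cleanup.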

REPLACE KAFKA2KAFKA COMPONENT

• Python app

• extracts events from the Nakadi Event Bus REST API

• writes them to our Kafka cluster

Idea: Create a Nakadi consumer/producer to make stream processing with Flink available to other internal users (first POC done)

OTHER FUTURE TOPICS

• New use cases for Real Time Analytics/ BI

o Sales monitoring

o Price monitoring

• Fraud detection for payments (evaluation)

• Contact customer according to variable event pattern (evaluation)

CONCLUSION

Flink proved to be the right fit for our current stream processing use cases. It enables us to build Zalando’s Next Gen BI platform.

https://tech.zalando.de/blog/?tags=Saiki

THANK YOU
