
Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Page 1: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Big thanks to everyone!

Page 2: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

The convergence of real-time analytics and event-driven applications

@StephanEwen

Flink Forward San Francisco, April 11, 2017

Page 3: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

2016 was the year when streaming technologies became mainstream

2017 is the year to realize the full spectrum of streaming applications

Page 4: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Some large-scale streaming applications

Page 5: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Detecting fraud in real time

As fraudsters get better, models need to be updated without downtime

• Live 24/7 service
• Credit card transactions
• Notifications and alerts
• Evolving fraud models built by data scientists

Page 6: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Athena X
• SQL to define metrics
• Thresholds and actions to trigger
• Blends analytics and actions

Streams from Hadoop, Kafka, etc.

SQL, thresholds, actions

Analytics

Alerts

Derived streams

Page 7: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

• Route events to Kafka, ES, Hive
• Complex interaction sessions rules
• Mix of stateless / small state / large state

Stream Processing as a Service
• Launching, monitoring, scaling, updating
• DSL to define jobs

Page 8: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Blink based on Flink

A core system in Alibaba Search
• Machine learning, search, recommendations
• A/B testing of search algorithms
• Online feature updates to boost conversion rate

Alibaba is a major contributor to Flink

Contributing many changes back to open source

Page 9: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Complete social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation)
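
As a rough illustration of event sourcing with a separate query side, the sketch below keeps an append-only event log and derives a read view by folding over it. It is plain Scala and not taken from the talk; the event types `Followed` and `Unfollowed`, the `FollowerView` type, and the sample names are all hypothetical.

```scala
// Hypothetical events for a tiny social network (assumed names, not from the talk).
sealed trait SocialEvent
final case class Followed(follower: String, followee: String) extends SocialEvent
final case class Unfollowed(follower: String, followee: String) extends SocialEvent

object EventSourcingSketch {
  // Read side (the "query" half of CQRS): followee -> set of followers.
  type FollowerView = Map[String, Set[String]]

  // Applies one event to the view; the view never accepts writes directly.
  def applyEvent(view: FollowerView, event: SocialEvent): FollowerView = event match {
    case Followed(a, b) =>
      view.updated(b, view.getOrElse(b, Set.empty[String]) + a)
    case Unfollowed(a, b) =>
      view.updated(b, view.getOrElse(b, Set.empty[String]) - a)
  }

  // The view is a pure function of the log, so it can be rebuilt at any time.
  def buildView(log: Seq[SocialEvent]): FollowerView =
    log.foldLeft(Map.empty[String, Set[String]])(applyEvent)

  def main(args: Array[String]): Unit = {
    // Write side (the "command" half): commands only append events to the log.
    val log = Seq(
      Followed("ann", "bob"),
      Followed("cat", "bob"),
      Unfollowed("ann", "bob")
    )
    println(buildView(log).getOrElse("bob", Set.empty[String]))
  }
}
```

Because the view is derived only from the log, replaying the same log always yields the same view, which is the property a stream processor relies on when it restores or rebuilds state.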

Page 10: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

What can we learn from these?

All these applications run on Flink

Applications, not just analytics
• Not just finding out what the data means but acting on that at the same time

Workloads going beyond the traditional Hadoop realm
• Hadoop is one possible deployment target, source, and sink
• Container engines and other storage systems increasingly popular with Flink

Page 11: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

So, what is data streaming?

First wave for streaming was lambda architecture
• Aid batch systems to be more real-time

Second wave was analytics (real time and lag-time)
• Based on distributed collections, functions, and windows

The next wave is much broader: a new architecture for event-driven applications

Page 12: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Event-driven applications

Page 13: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Events, State, Time, and Snapshots

f(a,b)

Event-driven function executed distributedly

Page 14: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Events, State, Time, and Snapshots

f(a,b)

Maintain fault-tolerant local state similar to any normal application

Page 15: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Events, State, Time, and Snapshots

f(a,b)

wall clock

event time clock

Access and react to notions of time and progress, handle out-of-order events
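
One way to picture an event-time clock that tolerates out-of-order events is a bounded-delay buffer: the clock trails the largest timestamp seen so far, and events are released in timestamp order once the clock passes them. The sketch below is plain Scala rather than Flink API; `EventTimeBuffer`, `maxOutOfOrderness`, and the sample timestamps are assumptions made up for illustration.

```scala
// Hypothetical event type; only the timestamp matters for this sketch.
final case class TimedEvent(key: String, timestamp: Long)

// Buffers events and releases them in timestamp order once the
// event-time clock (the watermark) has passed their timestamp.
final class EventTimeBuffer(maxOutOfOrderness: Long) {
  private var pending = Vector.empty[TimedEvent]
  private var maxSeen = Long.MinValue

  // Event-time clock: trails the largest timestamp seen by a fixed bound.
  def watermark: Long =
    if (maxSeen == Long.MinValue) Long.MinValue else maxSeen - maxOutOfOrderness

  // Returns the events that became complete with this arrival, sorted by time.
  // An event arriving later than the bound allows is released immediately.
  def onEvent(e: TimedEvent): Vector[TimedEvent] = {
    pending = pending :+ e
    maxSeen = math.max(maxSeen, e.timestamp)
    val (ready, waiting) = pending.partition(_.timestamp <= watermark)
    pending = waiting
    ready.sortBy(_.timestamp)
  }
}

object EventTimeSketch {
  // Feeds arrivals through a fresh buffer and returns the emitted timestamps.
  def run(arrivals: Seq[TimedEvent], bound: Long): Seq[Long] = {
    val buffer = new EventTimeBuffer(bound)
    arrivals.flatMap(buffer.onEvent).map(_.timestamp)
  }

  def main(args: Array[String]): Unit = {
    val arrivals = Seq(10L, 7L, 16L, 12L, 30L).map(t => TimedEvent("a", t))
    println(run(arrivals, bound = 5L))
  }
}
```

With the arrival order 10, 7, 16, 12, 30 and a bound of 5, the buffer emits 7, 10, 12, 16; the event at 30 stays buffered until the clock advances past it.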

Page 16: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Events, State, Time, and Snapshots

f(a,b)

wall clock

event time clock

Snapshot point-in-time view for recovery, rollback, cloning, versioning, etc.
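
The idea of a point-in-time view can be sketched with immutable data: taking a snapshot is just keeping a reference to the state as it was, and rollback is switching back to it. This is a single-process simplification in plain Scala, not how Flink takes distributed snapshots; `CounterApp` and its methods are hypothetical names.

```scala
// A toy stateful application: counts events per key and can snapshot its state.
final class CounterApp {
  private var counts = Map.empty[String, Long]
  private var snapshots = Map.empty[Long, Map[String, Long]]

  // Normal processing: update local state for each event.
  def process(key: String): Unit =
    counts = counts.updated(key, counts.getOrElse(key, 0L) + 1L)

  // Point-in-time view: the immutable map is unaffected by later updates.
  def snapshot(id: Long): Unit =
    snapshots = snapshots.updated(id, counts)

  // Recovery or rollback: replace the live state with a stored view.
  def restore(id: Long): Unit =
    counts = snapshots(id)

  def count(key: String): Long = counts.getOrElse(key, 0L)
}

object SnapshotSketch {
  def main(args: Array[String]): Unit = {
    val app = new CounterApp
    app.process("a")
    app.process("a")
    app.snapshot(1L)   // state at this point: a -> 2
    app.process("a")   // live state moves on: a -> 3
    app.restore(1L)    // roll back to the snapshot
    println(app.count("a"))
  }
}
```

Cloning or versioning follows the same pattern: a second application instance can start from a stored snapshot instead of from empty state.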

Page 17: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Event-driven applications

Stateful, event-driven, event-time-aware processing

• Event-driven Applications (event sourcing, CQRS, …)
• Stream Processing (streams, windows, …)
• Batch Processing (data sets)

Page 18: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

The APIs

• Process Function (events, state, time)
• DataStream API (streams, windows)
• Table API (dynamic tables)
• Stream SQL

Stream- & Batch Processing

Analytics

Stateful Event-Driven Applications

Page 19: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Process Function

    class MyFunction extends ProcessFunction[MyEvent, Result] {

      // declare state to use in the program
      lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

      def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
        // work with event and state
        (event, state.value) match { … }

        out.collect(…)   // emit events
        state.update(…)  // modify state

        // schedule a timer callback
        ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
      }

      def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
        // handle callback when event-/processing-time instant is reached
      }
    }

Page 20: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

DataStream API

    // read raw lines from Kafka
    val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09[String](…))

    // parse each line into an event
    val events: DataStream[Event] = lines.map((line) => parse(line))

    // aggregate per sensor over 5-second windows
    val stats: DataStream[Statistic] = events
      .keyBy("sensor")
      .timeWindow(Time.seconds(5))
      .aggregate(new MyAggregationFunction())

    // write the results to rolling files
    stats.addSink(new RollingSink(path))

Page 21: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Table API & Stream SQL


Page 22: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Streaming Architecture for Event-driven Applications

Page 23: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Compute, State, and Storage

Classic tiered architecture
• compute layer
• database layer: application state + backup

Streaming architecture
• compute + application state
• stream storage and snapshot storage (backup)

Page 24: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Performance

Classic tiered architecture
• synchronous reads/writes across tier boundary

Streaming architecture
• all modifications are local
• asynchronous writes of large blobs

Page 25: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Consistency

Classic tiered architecture
• distributed transactions
• at scale typically at-most / at-least once

Streaming architecture
• exactly once per state
• snapshot consistency across states

Page 26: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Scaling a Service

Classic tiered architecture
• provision compute
• separately provision additional database capacity

Streaming architecture
• provision compute and state together

Page 27: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Rolling out a new Service

Classic tiered architecture
• provision a new database (or add capacity to an existing one)

Streaming architecture
• provision compute and state together
• simply occupies some additional backup space

Page 28: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Time, Completeness, Out-of-order

Classic tiered architecture
• ?

Streaming architecture
• event time clocks define data completeness
• event time timers handle actions for out-of-order data

Page 29: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Repair External State

Streaming architecture
• streams (let's say Kafka etc.)
• live application
• external state: wrong results
• backed up data (HDFS, S3, etc.)

Page 30: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Repair External State

Streaming architecture
• streams (let's say Kafka etc.)
• live application
• external state
• backed up data (HDFS, S3, etc.)
• application on backup input: overwrite with correct results

Page 31: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Repair External State

Streaming architecture
• streams (let's say Kafka etc.)
• live application
• external state
• backed up data (HDFS, S3, etc.)
• application on backup input: overwrite with correct results

Each service doubles as a batch job!
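
The repair flow can be sketched as running the same application logic over the backed-up input and overwriting the external state with idempotent upserts. The example below is plain Scala with in-memory stand-ins; `Payment`, `aggregate`, `overwrite`, and the sample values are hypothetical, and a real setup would read from storage such as HDFS or S3 and write to an actual external store.

```scala
import scala.collection.mutable

// Hypothetical input record.
final case class Payment(account: String, amount: Long)

object RepairSketch {
  // The application logic, shared by the live job and the repair run.
  def aggregate(input: Iterable[Payment]): Map[String, Long] =
    input.foldLeft(Map.empty[String, Long]) { (acc, p) =>
      acc.updated(p.account, acc.getOrElse(p.account, 0L) + p.amount)
    }

  // Idempotent overwrite: running the repair twice leaves the same result.
  def overwrite(external: mutable.Map[String, Long], results: Map[String, Long]): Unit =
    results.foreach { case (account, total) => external.update(account, total) }

  def main(args: Array[String]): Unit = {
    // Stand-in for backed-up input data.
    val backup = Seq(Payment("a", 10L), Payment("b", 5L), Payment("a", 7L))
    // Stand-in for external state; the total for "a" is wrong.
    val external = mutable.Map("a" -> 999L, "b" -> 5L)

    overwrite(external, aggregate(backup))
    println(external.toMap)
  }
}
```

Since the repair run uses the same function as the live application, the service and the batch job share one code path and differ only in where the input comes from.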

Page 32: Flink Forward SF 2017: Stephan Ewen - Convergence of real-time analytics and data-driven applications

Streaming has outgrown the Hadoop Stack

Event-driven applications and real-time analytics converge with Apache Flink

Event-driven applications become easier to manage, faster, and more powerful following a streaming architecture implemented with Flink