January 2016 Flink Community Update & Roadmap 2016


Robert Metzger

@rmetzger_


Berlin Apache Flink Meetup, January 26, 2016

January Community Update

What happened in the last month


What happened?


Google proposed the Dataflow API to the Apache Incubator

Proposal discussions on the mailing list:

• SQL / Stream SQL support

• CEP (Complex Event Processing) library

Flink Kinesis Connector

Chengxiang Li added as committer

Discussions for releasing 1.0.0

Now merged to master (1.0-SNAPSHOT)


Savepoints: Manual checkpoints for restarting jobs with state

Kafka 0.9.0.0 integration

Job submission through JobManager web interface

Checkpoint statistics in JobManager web interface

Streaming examples are now in the binary dist

Reading List

Benchmarking Streaming Computation Engines at Yahoo!

Receiving metrics from Apache Flink applications

Running Apache Flink on Amazon Elastic MapReduce


1. http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at

2. http://mnxfst.tumblr.com/post/136539620407/receiving-metrics-from-apache-flink-applications

3. http://themodernlife.github.io/scala/hadoop/hdfs/sclading/flink/streaming/realtime/emr/aws/2016/01/06/running-apache-flink-on-amazon-elastic-mapreduce/


Upcoming talks

FOSDEM Brussels (4 talks) (Jan 30-31)

Big Data Technology Summit Warsaw (Feb. 25-26)

QCon London (March 7-9)

Hadoop Summit Dublin (2 talks) (April 13-14)

Strata San Jose

Strata London


Global Meetup Community

Brazil-Sao Paulo Apache Flink Meetup

Apache Flink Taiwan User Group

Also new groups in Delhi, Phoenix and Dallas


GitHub stats


900 Stars

Roadmap 2016

What's next?


Overview


SQL / StreamSQL

CEP Library

Managed Operator State

Dynamic Scaling

Miscellaneous

SQL and StreamSQL


SQL / StreamSQL


Structured queries over data sets and streams

Add support for SQL

• Standard SQL queries over (batch) data sets

• Continuous StreamSQL queries over data streams

Keep and extend Table API as structured query API on data sets and streams

Proposed Architecture


[Architecture diagram]

APIs: Table API, (Batch) SQL Query, StreamSQL Query

Internals: Apache Calcite (Standard SQL parser, Customized StreamSQL parser, Optimizer), Logical Plan, DataSet Program, DataStream Program

SQL integration into APIs


// Define Kafka input stream
val stream: DataStream[(String, Double, Int)] = env.addSource(new FlinkKafkaConsumer(...))

// Define table environment
val tabEnv = new TableEnvironment(env)
tabEnv.registerStream(stream, "myStream", ("ID", "MEASURE", "COUNT"))

// SQL query
val sqlQuery = tabEnv.sql("SELECT ID, MEASURE FROM myStream WHERE COUNT > 17")

Complex Event Processing


CEP Library

Complex Event Processing: the analysis of complex patterns such as correlations and sequence detection from multiple sources

Most current systems are not distributed (beyond multi-threading)

Goal: provide an easy-to-use API for CEP, running on a distributed, high-throughput, low-latency engine.


CEP Example


Real-time stock prices (15.1, 15.3, 15.2, 15.5) → State Machine → Alerts

State machine: Start → price drop by at least $0.5 → Alert; otherwise Ignore
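The state machine on this slide can be sketched in a few lines of plain Java. This is only an illustration, not the proposed Flink CEP library: the class `PriceDropDetector`, its method names, the use of integer cents, and the choice to compare each price with the immediately preceding one are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical state machine: remembers the previous price and alerts on a large drop. */
final class PriceDropDetector {
    private final long thresholdCents;   // minimum drop that triggers an alert
    private Long previousCents = null;   // null means we are still in the "Start" state

    PriceDropDetector(long thresholdCents) {
        this.thresholdCents = thresholdCents;
    }

    /** Feeds one price; returns true (Alert) or false (Ignore). */
    boolean onPrice(long priceCents) {
        boolean alert = previousCents != null
                && previousCents - priceCents >= thresholdCents;
        previousCents = priceCents;
        return alert;
    }

    /** Runs a fresh detector over a sequence and collects the indices that alert. */
    static List<Integer> alertIndices(long[] pricesCents, long thresholdCents) {
        PriceDropDetector detector = new PriceDropDetector(thresholdCents);
        List<Integer> alerts = new ArrayList<>();
        for (int i = 0; i < pricesCents.length; i++) {
            if (detector.onPrice(pricesCents[i])) {
                alerts.add(i);
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        // Slide prices in cents, plus one hypothetical extra tick of 14.90.
        long[] prices = {1510, 1530, 1520, 1550, 1490};
        System.out.println(alertIndices(prices, 50)); // prints [4]
    }
}
```

With only the four prices from the slide no alert fires; the appended 14.90 tick is a drop of 0.60 from 15.50 and therefore triggers one.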

Programming API for CEP

// convert stream into CEPStream of Events
CEPStream<Event> cepStream = CEP.from(inputDataStream);

// grouping
GroupedCEPStream<Event> grouped = cepStream.groupBy("id");

// window events (time-based or, alternatively, count-based)
WindowedCEPStream windowed = grouped.timeWindow(Time.minutes(10), Time.minutes(1));
// WindowedCEPStream windowed = grouped.countWindow(10L, 1L);

// pattern matching: define a pattern to match
CEPStream<Result> resultStream = CEP.from(input).groupBy(0).pattern(
    Pattern.<Event>next("e1").where( (evt) -> evt.id == 42 )
        .followedBy("e2").where( (evt) -> evt.id == 1337 )
        .within(Time.minutes(10))
).select( (Map<String, Event> patternElements) ->
    new Result(patternElements.get("e2").timestamp - patternElements.get("e1").timestamp) );

DSL for CEP

select e1.id, e1.price from every e1 = Event(price > 10) → e2 = Event(date == 42) → e3 = Event(price == 10) within 10 seconds where e1.id == e2.id


No programming required

Potentially integrated with SQL

Managed Operator State


State in Flink


Operator: “count tweet impressions”

User function: retrieve/set count for tweet ID

State: impression counts

What happens if the job crashes?

Loss of data

Solution: Checkpoints

Periodic checkpoints of state to HDFS

Restore from HDFS in case of failure

This is the current state in Flink!
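The checkpoint-and-restore cycle can be illustrated with a small, self-contained Java sketch. It is not Flink's implementation: the class `CheckpointedCounter` and its methods are hypothetical, and an in-memory copy of the map stands in for the snapshot that would be written to HDFS.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical operator state holding impression counts per tweet ID. */
final class CheckpointedCounter {
    private Map<String, Long> counts = new HashMap<>();         // live operator state
    private Map<String, Long> lastCheckpoint = new HashMap<>(); // stand-in for a snapshot on HDFS

    /** User function: retrieve and update the count for one tweet ID. */
    void onImpression(String tweetId) {
        counts.merge(tweetId, 1L, Long::sum);
    }

    /** Periodic checkpoint: copy the current state to the (simulated) durable store. */
    void checkpoint() {
        lastCheckpoint = new HashMap<>(counts);
    }

    /** Failure: the live state is lost and replaced by the last checkpoint. */
    void crashAndRestore() {
        counts = new HashMap<>(lastCheckpoint);
    }

    long count(String tweetId) {
        return counts.getOrDefault(tweetId, 0L);
    }

    public static void main(String[] args) {
        CheckpointedCounter operator = new CheckpointedCounter();
        operator.onImpression("tweet-1");
        operator.onImpression("tweet-1");
        operator.checkpoint();
        operator.onImpression("tweet-1"); // arrives after the checkpoint
        operator.crashAndRestore();
        System.out.println(operator.count("tweet-1")); // prints 2
    }
}
```

The sketch deliberately omits input replay: after a restore, the impression that arrived after the checkpoint is missing until the source re-delivers it.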

State on Steroids

What if state grows too big?

• Spill to disk

What if state grows too big? Checkpointing stalls processing!

• Async/incremental snapshots

Dealing with Dynamic Resources


Streams with varying data rate


[Plot: events/second over time]

With static resources: provision for max. rate

Idle capacity

(1) Adjust Parallelism


Initial configuration

Scale Out (for load)

Scale In (save resources)


Adjusting parallelism without (significantly) interrupting the program

Initial version:

• Checkpoint -> stop -> restart-with-different-parallelism

Stateless operators: Trivial

Stateful operators: Repartition state

• Transparent for key/value state and windows

• Consistent hashing simplifies state reorganization
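To make the last bullet concrete, here is a minimal consistent-hashing sketch in plain Java that redistributes key/value state when the parallelism changes. It is an illustration under assumptions, not Flink's actual mechanism: the class `ConsistentHashRing`, the number of virtual nodes, the bit-mixing hash function, and the key naming are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical hash ring that maps state keys to parallel subtasks. */
final class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 64; // assumed number of ring points per subtask
    private final TreeMap<Integer, Integer> ring = new TreeMap<>(); // ring position -> subtask

    ConsistentHashRing(int parallelism) {
        for (int subtask = 0; subtask < parallelism; subtask++) {
            for (int v = 0; v < VIRTUAL_NODES; v++) {
                ring.put(hash("subtask-" + subtask + "#" + v), subtask);
            }
        }
    }

    /** Deterministic bit-mixing hash on top of String.hashCode (an assumption of this sketch). */
    static int hash(String s) {
        int h = s.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        h *= 0xc2b2ae35;
        h ^= (h >>> 16);
        return h;
    }

    /** A key belongs to the first ring point at or after its hash, wrapping around. */
    int subtaskFor(String key) {
        Map.Entry<Integer, Integer> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Splits checkpointed key/value state into one map per subtask. */
    static Map<Integer, Map<String, Long>> partition(Map<String, Long> state, int parallelism) {
        ConsistentHashRing ring = new ConsistentHashRing(parallelism);
        Map<Integer, Map<String, Long>> bySubtask = new HashMap<>();
        for (Map.Entry<String, Long> e : state.entrySet()) {
            bySubtask.computeIfAbsent(ring.subtaskFor(e.getKey()), k -> new HashMap<>())
                     .put(e.getKey(), e.getValue());
        }
        return bySubtask;
    }

    public static void main(String[] args) {
        ConsistentHashRing before = new ConsistentHashRing(4);
        ConsistentHashRing after = new ConsistentHashRing(5);
        int moved = 0;
        for (int i = 0; i < 1000; i++) {
            String key = "tweet-" + i;
            if (before.subtaskFor(key) != after.subtaskFor(key)) {
                moved++;
            }
        }
        System.out.println("moved " + moved + " of 1000 keys when scaling from 4 to 5");
    }
}
```

When a subtask is added, every key that changes owner moves to the new subtask; state held by the existing subtasks is never shuffled among them, which is what keeps the reorganization simple.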


(2) Dynamic Worker Pool


JobManager

ResourceManager

Pool of Cluster Resources (YARN/Mesos/…)

TaskManager

TaskManager

Miscellaneous

Support for Apache Mesos

Security

• Over-the-wire encryption of RPC (akka) and data transfers (netty)

More connectors

• Apache Cassandra

• Amazon Kinesis

Enhance metrics

• Throughput / Latencies

• Backpressure monitoring

• Spilling / Out of Core
