Community Update & Roadmap 2016, Robert Metzger (@rmetzger_), Berlin Apache Flink Meetup, January 26, 2016

January 2016 Flink Community Update & Roadmap 2016

Community Update & Roadmap 2016

Robert Metzger

@rmetzger_

Berlin Apache Flink Meetup, January 26, 2016

January Community Update

What happened in the last month

What happened?

Google proposed the Dataflow API to the Apache Incubator

Proposal discussions at the mailing list:

• SQL / Stream SQL support

• CEP (Complex Event Processing) library

Flink Kinesis Connector

Chengxiang Li added as committer

Discussions about releasing 1.0.0

Now merged to master (1.0-SNAPSHOT)

Savepoints: Manual checkpoints for restarting jobs with state

Kafka 0.9.0.0 integration

Job submission through JobManager web interface

Checkpoint statistics in JobManager web interface

Streaming examples are now in the binary distribution

Reading List

1. Benchmarking Streaming Computation Engines at Yahoo!: http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at

2. Receiving metrics from Apache Flink applications: http://mnxfst.tumblr.com/post/136539620407/receiving-metrics-from-apache-flink-applications

3. Running Apache Flink on Amazon Elastic MapReduce: http://themodernlife.github.io/scala/hadoop/hdfs/sclading/flink/streaming/realtime/emr/aws/2016/01/06/running-apache-flink-on-amazon-elastic-mapreduce/

Upcoming talks

FOSDEM Brussels (4 talks) (Jan 30-31)

Big Data Technology Summit Warsaw (Feb. 25-26)

QCon London (March 7-9)

Hadoop Summit Dublin (2 talks) (April 13-14)

Strata San Jose

Strata London

Global Meetup Community

Brazil-Sao Paulo Apache Flink Meetup

Apache Flink Taiwan User Group

Also new groups in Delhi, Phoenix, and Dallas

GitHub stats

900 stars

Roadmap 2016

What's next?

Overview

SQL / StreamSQL

CEP Library

Managed Operator State

Dynamic Scaling

Miscellaneous

SQL and StreamSQL

SQL / StreamSQL

Structured queries over data sets and streams

Add support for SQL

• Standard SQL queries over (batch) data sets

• Continuous StreamSQL queries over data streams

Keep and extend the Table API as a structured query API for data sets and streams

Proposed Architecture

Diagram: The APIs layer consists of the Table API, (batch) SQL queries, and StreamSQL queries. Queries are parsed with Apache Calcite (its standard SQL parser and a customized StreamSQL parser). The internals layer contains the optimizer and the logical plan, which is translated into either a DataSet program or a DataStream program.

SQL integration into APIs

// Define Kafka input stream
val stream: DataStream[(String, Double, Int)] = env.addSource(new FlinkKafkaConsumer(...))

// Define table environment
val tabEnv = new TableEnvironment(env)
tabEnv.registerStream(stream, "myStream", ("ID", "MEASURE", "COUNT"))

// SQL query
val sqlQuery = tabEnv.sql("SELECT ID, MEASURE FROM myStream WHERE COUNT > 17")
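The continuous query above keeps emitting results as rows arrive. As a plain-Java illustration of those semantics (not the proposed Table API, and without any Flink dependency), the sketch below evaluates `SELECT ID, MEASURE FROM myStream WHERE COUNT > 17` one record at a time. The classes `Measurement`, `IdAndMeasure`, and `ContinuousFilterQuery` are hypothetical names. Running the same per-record logic over a bounded list yields the batch result, which is the sense in which one query can cover both data sets and streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical record type standing in for the (ID, MEASURE, COUNT) rows of "myStream".
final class Measurement {
    final String id;
    final double measure;
    final int count;

    Measurement(String id, double measure, int count) {
        this.id = id;
        this.measure = measure;
        this.count = count;
    }
}

// Hypothetical result row holding the projected columns (ID, MEASURE).
final class IdAndMeasure {
    final String id;
    final double measure;

    IdAndMeasure(String id, double measure) {
        this.id = id;
        this.measure = measure;
    }
}

// Evaluates "SELECT ID, MEASURE FROM myStream WHERE COUNT > 17" record by record,
// emitting a result row as soon as a matching input row arrives.
final class ContinuousFilterQuery {
    private final Consumer<IdAndMeasure> sink;

    ContinuousFilterQuery(Consumer<IdAndMeasure> sink) {
        this.sink = sink;
    }

    void onRecord(Measurement m) {
        if (m.count > 17) {                                 // WHERE COUNT > 17
            sink.accept(new IdAndMeasure(m.id, m.measure)); // SELECT ID, MEASURE
        }
    }

    // Runs the same query over a bounded input (the "batch" case).
    static List<IdAndMeasure> runOver(List<Measurement> input) {
        List<IdAndMeasure> results = new ArrayList<>();
        ContinuousFilterQuery query = new ContinuousFilterQuery(results::add);
        for (Measurement m : input) {
            query.onRecord(m);
        }
        return results;
    }
}
```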

Complex Event Processing

CEP Library

Complex Event Processing: the analysis of complex patterns, such as correlations and sequence detection, across multiple sources

Most current systems are not distributed (beyond multi-threading)

Goal: provide an easy-to-use API for CEP that runs on a distributed, high-throughput, low-latency engine.

CEP Example

Diagram: real-time stock prices (15.1, 15.3, 15.2, 15.5) feed a state machine that emits alerts. From the Start state, a price drop of at least $0.50 raises an Alert; other events are ignored.
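A minimal plain-Java sketch of the price-drop state machine, under stated assumptions: prices are given in cents to avoid floating-point rounding, and "a drop of at least $0.50" is measured from the highest price seen since the machine last started or alerted (the slide does not say what the drop is relative to). `PriceDropDetector` is a hypothetical name, not part of the proposed CEP library.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the stock-price state machine. Assumptions (not from the slide):
//  - prices arrive in cents (long) to avoid floating-point rounding,
//  - a "price drop by at least $.5" is a drop of >= 50 cents from the highest
//    price seen since the machine last started or raised an alert.
final class PriceDropDetector {
    enum State { START, WATCHING }

    private static final long DROP_THRESHOLD_CENTS = 50;

    private State state = State.START;
    private long referencePrice; // highest price since the last (re)start

    /** Returns true if this price raises an alert. */
    boolean onPrice(long priceCents) {
        if (state == State.START) {
            referencePrice = priceCents;
            state = State.WATCHING;
            return false;
        }
        if (referencePrice - priceCents >= DROP_THRESHOLD_CENTS) {
            // Alert, then restart the comparison from the current price.
            referencePrice = priceCents;
            return true;
        }
        // Ignore: no qualifying drop, but a new high becomes the reference.
        referencePrice = Math.max(referencePrice, priceCents);
        return false;
    }

    /** Feeds a sequence of prices and returns the indices that raised alerts. */
    static List<Integer> alertIndices(long[] pricesCents) {
        PriceDropDetector detector = new PriceDropDetector();
        List<Integer> alerts = new ArrayList<>();
        for (int i = 0; i < pricesCents.length; i++) {
            if (detector.onPrice(pricesCents[i])) {
                alerts.add(i);
            }
        }
        return alerts;
    }
}
```

With the slide's sample prices (1510, 1530, 1520, 1550 cents) no alert fires, since no price falls 50 cents below the running high.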

Programming API for CEP

// convert stream into CEPStream of Events
CEPStream<Event> cepStream = CEP.from(inputDataStream);

// grouping
GroupedCEPStream<Event> grouped = cepStream.groupBy("id");

// windows: window events by time...
WindowedCEPStream windowed = grouped.timeWindow(Time.minutes(10), Time.minutes(1));
// ...or by count
// WindowedCEPStream windowed = grouped.countWindow(10L, 1L);

// pattern matching: define a pattern to match
CEPStream<Result> resultStream = CEP.from(input).groupBy(0).pattern(
    Pattern.<Event>next("e1").where((evt) -> evt.id == 42)
        .followedBy("e2").where((evt) -> evt.id == 1337)
        .within(Time.minutes(10))
).select((Map<String, Event> patternElements) ->
    new Result(patternElements.get("e2").timestamp - patternElements.get("e1").timestamp));
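The API above is a proposed design, so its classes cannot be run as-is. To show what the pattern is meant to compute, here is a hypothetical plain-Java matcher for `e1` (id 42) followed by `e2` (id 1337) within 10 minutes, selecting `e2.timestamp - e1.timestamp`. It assumes events arrive in timestamp order, that unrelated events may occur between `e1` and `e2`, and that an `e2` completes every still-open `e1` match; CEP libraries typically offer several such selection policies, and this sketch picks one. Grouping and windowing are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event type mirroring the fields used in the slide (id, timestamp in ms).
final class Event {
    final int id;
    final long timestamp;

    Event(int id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }
}

// Plain-Java sketch of: e1 where id == 42, followed by e2 where id == 1337, within
// 10 minutes; selects e2.timestamp - e1.timestamp. Assumptions: events arrive in
// timestamp order, unrelated events may occur in between, and each e2 completes
// every open e1 match, after which those partial matches are closed.
final class SequenceMatcher {
    private static final long WITHIN_MS = 10 * 60 * 1000L;

    private final List<Event> openStarts = new ArrayList<>(); // candidate e1 events
    private final List<Long> results = new ArrayList<>();     // selected durations

    void onEvent(Event evt) {
        // Drop partial matches that can no longer complete within the time bound.
        openStarts.removeIf(e1 -> evt.timestamp - e1.timestamp > WITHIN_MS);

        if (evt.id == 1337) {
            for (Event e1 : openStarts) {
                results.add(evt.timestamp - e1.timestamp);
            }
            openStarts.clear();
        }
        if (evt.id == 42) {
            openStarts.add(evt);
        }
    }

    List<Long> results() {
        return results;
    }
}
```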

DSL for CEP

select e1.id, e1.price from every e1 = Event(price > 10) → e2 = Event(date == 42) → e3 = Event(price == 10) within 10 seconds where e1.id == e2.id

No programming required

Potentially integrated with SQL

Managed Operator State

State in Flink

Operator: “count tweet impressions”

User function: retrieve/set count for tweet ID

State: impression counts

What happens if the job crashes? Loss of data
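A minimal sketch of the operator's state, assuming a plain in-memory `HashMap` keyed by tweet ID (`ImpressionCounter` is a hypothetical name, not a Flink class). Because the map lives only in the process's memory, a crash discards every count, which is the data loss described above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "count tweet impressions" operator. The user function retrieves
// the current count for a tweet ID from the state, increments it, and sets it back.
// Assumption: the state is a plain in-memory map, so it disappears if the process dies.
final class ImpressionCounter {
    private final Map<String, Long> impressionCounts = new HashMap<>();

    /** Processes one impression and returns the updated count for that tweet. */
    long onImpression(String tweetId) {
        long current = impressionCounts.getOrDefault(tweetId, 0L); // retrieve
        long updated = current + 1;
        impressionCounts.put(tweetId, updated);                     // set
        return updated;
    }

    long countFor(String tweetId) {
        return impressionCounts.getOrDefault(tweetId, 0L);
    }
}
```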

Solution: Checkpoints

Operator: “count tweet impressions”

User function: retrieve/set count for tweet ID

State: impression counts

Periodic checkpoints of state to HDFS

Restore from HDFS in case of failure

This is the current state in Flink!
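The sketch below illustrates why a checkpoint must capture both the state and the input position. Assumptions: the input is a replayable list of tweet IDs (like a log whose read offset can be rewound), the durable store is an in-memory copy standing in for HDFS, and a checkpoint is taken every `interval` records; `CheckpointingCounter` is hypothetical. After a simulated crash, the job restores the last checkpoint and re-reads only the records after the saved offset, so the final counts match a failure-free run.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of periodic checkpointing with restore-and-replay. Assumptions: the input is a
// replayable list, the "durable store" is an in-memory copy standing in for HDFS, and a
// checkpoint (state + input offset) is taken every `interval` records.
final class CheckpointingCounter {
    private final List<String> input; // replayable source of tweet IDs
    private final int interval;       // checkpoint every `interval` records

    private Map<String, Long> counts = new HashMap<>(); // live state
    private int position = 0;                           // next record to read

    // Last completed checkpoint: a copy of the state plus the matching input offset.
    private Map<String, Long> checkpointedCounts = new HashMap<>();
    private int checkpointedPosition = 0;

    CheckpointingCounter(List<String> input, int interval) {
        this.input = input;
        this.interval = interval;
    }

    /** Processes up to `n` records; returns false once the input is exhausted. */
    boolean process(int n) {
        for (int i = 0; i < n && position < input.size(); i++) {
            counts.merge(input.get(position), 1L, Long::sum);
            position++;
            if (position % interval == 0) {
                checkpointedCounts = new HashMap<>(counts); // snapshot the state...
                checkpointedPosition = position;            // ...and the input offset
            }
        }
        return position < input.size();
    }

    /** Simulates a crash: live state is replaced by the last checkpoint, source is rewound. */
    void failAndRestore() {
        counts = new HashMap<>(checkpointedCounts);
        position = checkpointedPosition;
    }

    Map<String, Long> counts() {
        return counts;
    }
}
```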

State on Steroids

Operator: “count tweet impressions”

User function: retrieve/set count for tweet ID

State: impression counts

What if state grows too big?

Checkpointing stalls processing!

Spill to disk

Async/incremental snapshots

Restore from HDFS in case of failure
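One way to keep checkpoints from stalling processing is sketched below, as an illustration only and not Flink's implementation: the operator tracks which keys changed since the previous snapshot, copies just that delta on the processing thread, and hands the slow write to a background executor. The in-memory list of deltas stands in for HDFS, and `IncrementalSnapshotState` is a hypothetical name. Restoring replays the deltas in order, with later values overwriting earlier ones.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of incremental, asynchronous snapshots (an illustration, not Flink's code).
// Assumptions: state is a map of counts; only keys modified since the last snapshot
// ("dirty" keys) are written; the durable store is an in-memory list standing in for HDFS.
final class IncrementalSnapshotState {
    private final Map<String, Long> counts = new HashMap<>();
    private final Set<String> dirtyKeys = new HashSet<>();
    private final List<Map<String, Long>> durableDeltas = new CopyOnWriteArrayList<>();

    // A single background writer keeps deltas in snapshot order.
    private final ExecutorService writer = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "snapshot-writer");
        t.setDaemon(true); // don't keep the JVM alive just for snapshot writes
        return t;
    });

    void increment(String key) {
        counts.merge(key, 1L, Long::sum);
        dirtyKeys.add(key);
    }

    /** Copies only the changed entries (cheap), then persists them in the background. */
    CompletableFuture<Void> snapshot() {
        Map<String, Long> delta = new HashMap<>();
        for (String key : dirtyKeys) {
            delta.put(key, counts.get(key));
        }
        dirtyKeys.clear();
        return CompletableFuture.runAsync(() -> durableDeltas.add(delta), writer);
    }

    /** Rebuilds state by applying all persisted deltas in order (later values win). */
    Map<String, Long> restore() {
        Map<String, Long> restored = new HashMap<>();
        for (Map<String, Long> delta : durableDeltas) {
            restored.putAll(delta);
        }
        return restored;
    }

    int deltaSize(int snapshotIndex) {
        return durableDeltas.get(snapshotIndex).size();
    }
}
```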

Dealing with Dynamic Resources

Streams with varying data rate

Diagram: event rate (events/second) over time

With static resources: provision for the maximum rate, which leaves idle capacity whenever the rate is below that peak
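The cost of static provisioning can be made concrete with simple arithmetic. Assuming capacity is sized to the peak rate, utilization is mean rate / peak rate and the idle fraction is 1 minus that; for per-interval rates of 100, 400, 200, and 300 events/second, the peak is 400, the mean is 250, and 37.5% of the capacity sits idle. The helper below (a hypothetical name) computes the same figure.

```java
// Sketch of the "provision for max. rate" arithmetic. Assumption: capacity is sized
// to the peak observed rate, and utilization is the mean rate divided by that peak.
final class StaticProvisioning {
    /** Fraction of provisioned capacity left idle, given per-interval event rates. */
    static double idleFraction(double[] eventsPerSecond) {
        double peak = 0;
        double sum = 0;
        for (double rate : eventsPerSecond) {
            peak = Math.max(peak, rate);
            sum += rate;
        }
        if (peak == 0) {
            return 0; // no load at all: nothing is provisioned, so nothing is idle
        }
        double mean = sum / eventsPerSecond.length;
        return 1 - mean / peak;
    }
}
```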

(1) Adjust Parallelism

Diagram: initial configuration; scale out (for load); scale in (save resources)

(1) Adjust Parallelism

Adjusting parallelism without (significantly) interrupting the program

Initial version:

• Checkpoint -> stop -> restart-with-different-parallelism

Stateless operators: Trivial

Stateful operators: Repartition state

• Transparent for key/value state and windows

• Consistent hashing simplifies state reorganization
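To see why consistent hashing simplifies state reorganization, consider the hypothetical ring below: each parallel instance owns several positions on a hash ring, and a key belongs to the first instance position at or after the key's hash. When an instance is added (scale out), every key either keeps its owner or moves to the new instance, so only the new instance's share of the key/value state has to be transferred; with `hash(key) % parallelism`, by contrast, changing the parallelism reassigns most keys. The FNV-1a hash and the 16 virtual nodes per instance are illustrative choices, not what Flink uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a consistent-hash ring assigning keys to parallel operator instances.
// Assumptions: each instance gets `virtualNodes` positions on a 32-bit ring, and the
// hash is FNV-1a over the UTF-8 bytes (chosen only because it is simple and deterministic).
final class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    static int hash(String s) {
        int h = 0x811C9DC5; // FNV-1a 32-bit offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193; // FNV-1a 32-bit prime
        }
        return h;
    }

    void addInstance(String instance) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(instance + "#" + i), instance);
        }
    }

    /** Owner = first ring position at or after the key's hash, wrapping around. */
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no instances on the ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        if (entry == null) {
            entry = ring.firstEntry(); // wrap around to the start of the ring
        }
        return entry.getValue();
    }
}
```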

(2) Dynamic Worker Pool

Diagram: JobManager and ResourceManager; TaskManagers are allocated from a pool of cluster resources (YARN/Mesos/…).
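As a hypothetical illustration of the sizing decision behind a dynamic worker pool (not Flink's ResourceManager logic), assume each TaskManager offers a fixed number of slots and the job needs a known number of slots. The pool then needs ceil(required / slotsPerTaskManager) TaskManagers, and the difference from the current count says how many to request from YARN/Mesos or release back to the pool.

```java
// Hypothetical sizing helper for a dynamic worker pool. Assumptions (not from the
// slides): each TaskManager offers a fixed number of slots, the job needs a known
// number of slots, and the pool should hold just enough TaskManagers to cover them.
final class WorkerPoolSizer {
    static int taskManagersNeeded(int requiredSlots, int slotsPerTaskManager) {
        if (slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("slotsPerTaskManager must be positive");
        }
        // Ceiling division: a partially used TaskManager still counts as one.
        return (requiredSlots + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    /** Positive: request that many TaskManagers; negative: release that many. */
    static int changeInTaskManagers(int currentTaskManagers, int requiredSlots, int slotsPerTaskManager) {
        return taskManagersNeeded(requiredSlots, slotsPerTaskManager) - currentTaskManagers;
    }
}
```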

Miscellaneous

Support for Apache Mesos

Security

• Over-the-wire encryption of RPC (akka) and data transfers (netty)

More connectors

• Apache Cassandra

• Amazon Kinesis

Enhance metrics

• Throughput / Latencies

• Backpressure monitoring

• Spilling / Out of Core