Perfect Norikra 2nd Season Stream Processing Casual Talks #2 2017/07/27 Satoshi Tagomori (@tagomoris)

Page 1: Perfect Norikra 2nd Season

Perfect Norikra 2nd Season

Stream Processing Casual Talks #2 2017/07/27

Satoshi Tagomori (@tagomoris)

Page 2: Perfect Norikra 2nd Season

Satoshi "Moris" Tagomori (@tagomoris)

Fluentd, MessagePack-Ruby, Norikra, ...

Treasure Data, Inc.

Page 3: Perfect Norikra 2nd Season
Page 4: Perfect Norikra 2nd Season

http://norikra.github.io/

Page 5: Perfect Norikra 2nd Season

Streaming +

SQL

Page 6: Perfect Norikra 2nd Season

Norikra: Schema-less Stream Processing using SQL

• Server software, written in JRuby, runs on JVM

• Open source software (GPLv2)

• http://norikra.github.io/

• https://github.com/norikra/norikra

Page 7: Perfect Norikra 2nd Season

SELECT user.age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins)

WHERE current="San Diego" AND attend.$0 AND attend.$1

GROUP BY user.age

{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}

{"user.age":35,"cnt":5}, {"user.age":36,"cnt":8}, ...
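
To make the query semantics concrete, here is a minimal Python sketch (not Norikra or Esper code) of what the query above computes for the events of one 5-minute batch window. The helpers `get_field` and `run_window` are hypothetical names, the sample events are made up, and reading `attend.$0` as "element 0 of the `attend` array" is an assumption based on the example event.

```python
from collections import Counter

def get_field(event, path):
    """Resolve a dotted path such as 'user.age' or 'attend.$0' in a nested event."""
    value = event
    for part in path.split("."):
        if part.startswith("$"):
            value = value[int(part[1:])]  # '$N' is read as index N of an array
        else:
            value = value[part]
    return value

def run_window(events):
    """Apply the WHERE clause, then count events per user.age for one window."""
    counts = Counter()
    for ev in events:
        if (get_field(ev, "current") == "San Diego"
                and get_field(ev, "attend.$0")
                and get_field(ev, "attend.$1")):
            counts[get_field(ev, "user.age")] += 1
    return [{"user.age": age, "cnt": cnt} for age, cnt in sorted(counts.items())]

# Made-up events standing in for one time_batch(5 mins) window.
window = [
    {"name": "tagomoris", "user": {"age": 35}, "current": "San Diego", "attend": [True, True, False]},
    {"name": "a", "user": {"age": 35}, "current": "San Diego", "attend": [True, True, True]},
    {"name": "b", "user": {"age": 36}, "current": "San Diego", "attend": [True, False, True]},
    {"name": "c", "user": {"age": 36}, "current": "Tokyo", "attend": [True, True, True]},
]
result = run_window(window)
print(result)
```

Only the first two events pass the WHERE clause, so the window emits a single row for age 35.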

Page 8: Perfect Norikra 2nd Season

How Norikra is Perfect

• Ultra fast bootstrap

• Schema on read

• Handling complex (nested) events

• Dynamic query registration/unregistration

• Simple Web UI

• Data connector: Fluentd

• Extensible: UDF/Listener plugins

• Performance: good enough for small/middle site

Page 9: Perfect Norikra 2nd Season

Schema on Read

• Query first, Data next

• Query must know what it requires
  • field names, types of fields, ...

• Platform can ingest any data into processor. Query can fetch events which match the required schema.

[Figure: events from the billing service and events from the API endpoint flow into one schema-less (mixed) data stream; query A and query B each receive only their own subset of fields.]
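
The schema-on-read idea can be sketched in a few lines of Python. This is an illustration only, not how Norikra implements it: the function names `matches` and `route`, the field names, and the sample events are all hypothetical. Each query declares the fields and types it requires, and an event is delivered to a query only when it carries all of them.

```python
def matches(event, required):
    """True if the event has every required field with the required type."""
    return all(name in event and isinstance(event[name], typ)
               for name, typ in required.items())

def route(stream, queries):
    """Deliver to each query only the events (and only the fields) it requires."""
    delivered = {name: [] for name in queries}
    for event in stream:
        for name, required in queries.items():
            if matches(event, required):
                delivered[name].append({k: event[k] for k in required})
    return delivered

# Hypothetical schemas: query A reads billing fields, query B reads API fields.
queries = {
    "query_a": {"user_id": int, "amount": int},
    "query_b": {"user_id": int, "path": str},
}

# One mixed stream: any event can be ingested, whatever its shape.
stream = [
    {"user_id": 1, "amount": 500, "currency": "JPY"},
    {"user_id": 2, "path": "/v1/items", "status": 200},
    {"message": "unrelated log line"},
]

delivered = route(stream, queries)
print(delivered)
```

The third event matches no query and is simply not delivered; nothing had to be declared about it before ingestion.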

Page 10: Perfect Norikra 2nd Season

Architecture

[Figure: Norikra Server (on JVM) contains the Norikra Engine (Esper Instance as query engine, Type Definition Manager, Output Event Pool) and the RPC Server (mizuno: Jetty + Rack) with a Rack RPC Handler. Norikra Client connects via msgpack-rpc-over-http.]

Page 11: Perfect Norikra 2nd Season

For details :)

• Norikra: Stream Processing with SQL
  http://www.slideshare.net/tagomoris/norikra-stream-processing-with-sql

• Norikra: SQL Stream Processing in Ruby
  http://www.slideshare.net/tagomoris/norikra-sql-stream-processing-in-ruby

• Norikra in Action
  http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring

• Landscape of Norikra Features
  http://www.slideshare.net/tagomoris/norikra-meetup-features

• Norikra Recent Updates
  http://www.slideshare.net/tagomoris/norikra-recent-updates

Page 12: Perfect Norikra 2nd Season

Recent Updates

• v1.4.0: Jul 19, 2016
  • Add support for "-D" and "-agentlib" of JVM
  • Update msgpack version

• Previous release v1.3.1: May 7, 2015
  • Explained in "Norikra Recent Updates" slide

Page 13: Perfect Norikra 2nd Season

User Companies

• LINE Corporation

• Kayac Inc.

• Mercari, Inc.

• (and some/many others)

Page 14: Perfect Norikra 2nd Season

https://www.slideshare.net/tagomoris/how-to-make-norikra-perfect

Page 15: Perfect Norikra 2nd Season

Perfect Norikra

• All features of Norikra
  • Including "Ultra fast bootstrap"

• Compatible RPC API w/ original Norikra

• Distributed execution on any scheduler
  • YARN? Mesos? or ...?

• Automatic failover & retry for failures (HA)

• Automated optimization for load balancing

• Dynamic scaling out from 1 to 100 nodes - without any restarts/retries

Page 16: Perfect Norikra 2nd Season

MAKE Norikra

PERFECT AGAIN

Page 17: Perfect Norikra 2nd Season

Features for More Perfection

• Loading operator internal states from Batch query engines

• Sharing operator internal states between queries

Page 18: Perfect Norikra 2nd Season

Stream Processing

• Monitoring, Reporting, Alerting

• Fast recommendation

• Matching behaviors

• and ...

Page 19: Perfect Norikra 2nd Season

Handling Long Term Data/History

timeline

Website audience data

Jul 24, 2014 Purchase a car

Jul 28, 2017 ....?

Start batch query to read 3~4 years of history

Offer a nice bonus to possible customer!

Browser session already expired......

Page 20: Perfect Norikra 2nd Season

Stream Processing on Long Term Data

timeline

Website audience data: processed continuously

Jul 24, 2014 Purchase a car

Jul 28, 2017 Got a nice bonus offer!

Jul 28, 2017 Got a wrong offer...

Rewrite the query & start it without past data... 3 more years required for testing?

Page 21: Perfect Norikra 2nd Season

Resume/Restart of Queries

• Queries may be stopped/killed for many reasons
  • cluster version upgrade / migration
  • troubles

• Queries may need to be modified at any time
  • wrong logic
  • data schema upgrade
  • new business requirement

Page 22: Perfect Norikra 2nd Season

What we want:

timeline

Website audience data: processed continuously

Jul 24, 2014 Purchase a car

Jul 28, 2017 Got a nice bonus offer!

Jul 28, 2017 Got a wrong offer...

Rewrite & start the query with past long history

Page 23: Perfect Norikra 2nd Season

Load "Running" Queries

Load "running" stream query from batch engines!

Submit a stream query

Query the history on batch engines & load the result as intermediate state of stream query

Start to process realtime data
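
The three steps above can be sketched as follows. This is a minimal illustration, not Norikra code: the `CountByKey` class, its `load_state` method, and the batch rows are hypothetical, and the batch query engine is represented only by the rows it would return.

```python
from collections import Counter

class CountByKey:
    """Streaming operator keeping a running count of events per key."""

    def __init__(self, key):
        self.key = key
        self.counts = Counter()

    def load_state(self, batch_rows):
        """Seed the internal state from batch result rows of the form (key, count)."""
        for k, n in batch_rows:
            self.counts[k] += n

    def process(self, event):
        """Handle one realtime event and return the updated count for its key."""
        k = event[self.key]
        self.counts[k] += 1
        return self.counts[k]

# 1. Submit a stream query (here: instantiate the operator).
op = CountByKey("user_id")

# 2. Query the history on a batch engine and load the result as state.
#    These rows stand in for something like
#    SELECT user_id, COUNT(*) FROM history GROUP BY user_id
batch_rows = [("u1", 120), ("u2", 3)]
op.load_state(batch_rows)

# 3. Start to process realtime data on top of the loaded state.
realtime = [{"user_id": "u1"}, {"user_id": "u3"}, {"user_id": "u1"}]
latest = [op.process(e) for e in realtime]
print(latest)
```

The operator answers as if it had been running through the whole history, although it only ever saw three realtime events.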

Page 25: Perfect Norikra 2nd Season

JOINs with Past Data

Submit a stream query w/ JOIN past data

Submit a query

Query past data from batch & load it

Start to process realtime data w/ JOIN
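
A minimal sketch of this flow, assuming the past data fits in a lookup table keyed by `user_id`. The function names, field names, and rows are hypothetical; the car purchase mirrors the timeline example used earlier in the talk.

```python
def load_past_data(batch_rows):
    """Build the lookup side of the JOIN from rows returned by a batch query."""
    return {row["user_id"]: row for row in batch_rows}

def join_stream(events, past):
    """Inner-join each realtime event with the preloaded past row of the same user."""
    for event in events:
        row = past.get(event["user_id"])
        if row is not None:
            yield {**event, "last_purchase": row["item"], "purchased_on": row["date"]}

# Past data queried once from the batch engine and loaded into the stream query.
batch_rows = [{"user_id": "u1", "item": "car", "date": "2014-07-24"}]
past = load_past_data(batch_rows)

# Realtime events arriving after the query has started.
events = [{"user_id": "u1", "page": "/cars"}, {"user_id": "u9", "page": "/top"}]
joined = list(join_stream(events, past))
print(joined)
```

Only the event from `u1` finds a match; the stream side never has to buffer years of history itself.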

Page 27: Perfect Norikra 2nd Season

True Lambda Architecture

• Use just one DSL on both Stream & Batch
  • SQL!

• Ingest data stream to both Stream & Storage

• Handle time window intelligently
  • Specify time window outside the DSL
  • Write once on batch, Run anywhere :D
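
One possible reading of "specify time window outside the DSL" is sketched below: the query logic is written once without any window, and each runtime supplies the window itself. All names here (`count_by_age`, `run_batch`, `run_stream`) and the sample events are hypothetical, and the stream side assumes events arrive in timestamp order.

```python
from collections import Counter

def count_by_age(events):
    """The single query logic, written once, with no time window inside it."""
    return dict(Counter(e["age"] for e in events))

def run_batch(stored_events, start, end):
    """Batch side: the window is a time range over stored events."""
    return count_by_age(e for e in stored_events if start <= e["ts"] < end)

def run_stream(event_iter, window_size):
    """Stream side: the window is a tumbling window over in-order events."""
    buffer, window_start = [], None
    for e in event_iter:
        if window_start is None:
            window_start = e["ts"] - e["ts"] % window_size
        # Close every window that ended before this event (empty ones included).
        while e["ts"] >= window_start + window_size:
            yield window_start, count_by_age(buffer)
            buffer, window_start = [], window_start + window_size
        buffer.append(e)
    if buffer:
        yield window_start, count_by_age(buffer)

events = [
    {"ts": 0, "age": 35},
    {"ts": 10, "age": 35},
    {"ts": 299, "age": 36},
    {"ts": 300, "age": 35},
]
stream_results = list(run_stream(iter(events), 300))
batch_result = run_batch(events, 0, 300)
print(stream_results, batch_result)
```

Both runtimes produce the same answer for the window starting at 0, because they share one query definition and differ only in how the window is applied.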

Page 28: Perfect Norikra 2nd Season

Idempotent Operator State

• As a stream operator with realtime data

• As a loaded stream operator with past data

• Serializable operator internal states
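
As a rough illustration of serializable operator state, the sketch below dumps an operator's state to JSON, restores it into a new operator, and checks that the restored operator ends in the same state as one that processed everything live. `SumOperator` and its methods are hypothetical names, and JSON is just one convenient serialization format for the example.

```python
import json

class SumOperator:
    """Running sum per key, with internal state that can be dumped and restored."""

    def __init__(self, state=None):
        self.totals = dict(state or {})

    def process(self, key, value):
        self.totals[key] = self.totals.get(key, 0) + value

    def dump(self):
        """Serialize the internal state."""
        return json.dumps(self.totals, sort_keys=True)

    @classmethod
    def load(cls, payload):
        """Rebuild an operator from serialized state."""
        return cls(json.loads(payload))

events = [("u1", 10), ("u2", 5), ("u1", 7), ("u2", 1)]

# Path 1: an operator that sees every event as realtime data.
live = SumOperator()
for key, value in events:
    live.process(key, value)

# Path 2: state built from the earlier events, serialized, loaded into
# a fresh operator, which then continues with the remaining events.
first = SumOperator()
for key, value in events[:2]:
    first.process(key, value)
restored = SumOperator.load(first.dump())
for key, value in events[2:]:
    restored.process(key, value)

print(live.dump(), restored.dump())
```

Both paths end with identical state, which is the property that lets a query be stopped, moved, or started from loaded past data.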

Page 29: Perfect Norikra 2nd Season

Sharing Operators between Queries

Query A

Query B

Page 30: Perfect Norikra 2nd Season

SHARED Operators

Sharing Operators between Queries

history(stream)

history(batch: 3 - 4 years ago)

JOIN

Query A: filter + projection

Query B: filter + projection

Page 31: Perfect Norikra 2nd Season

Sharing Operators during Updating Query

history(stream)

history(batch: 3 - 4 years ago)

JOIN

Query A: filter + projection

Oops, I found a mistake in Query A!

Page 32: Perfect Norikra 2nd Season

SHARED Operators

Sharing Operators during Updating Query

history(stream)

history(batch: 3 - 4 years ago)

JOIN

Query A: filter + projection

Query A': filter + projection

I've just added the updated query...

Page 33: Perfect Norikra 2nd Season

Sharing Operators during Updating Query

history(stream)

history(batch: 3 - 4 years ago)

JOIN

Query A': filter + projection

It works! I can remove the older one.
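
The update sequence on the last few slides can be sketched as follows. This is an illustration under simple assumptions, not a design for the engine: `SharedJoin` and its methods are hypothetical, and each query is reduced to a filter function plus a projection function attached to one shared JOIN operator.

```python
class SharedJoin:
    """Upstream JOIN operator shared by several queries; its state is held once."""

    def __init__(self, history):
        self.history = history      # loaded once, e.g. from a batch engine
        self.subscribers = {}       # query name -> (filter, projection)

    def add_query(self, name, predicate, projection):
        self.subscribers[name] = (predicate, projection)

    def remove_query(self, name):
        del self.subscribers[name]

    def process(self, event):
        """Join the event with history once, then feed every attached query."""
        joined = {**event, **self.history.get(event["user_id"], {})}
        return {name: proj(joined)
                for name, (pred, proj) in self.subscribers.items()
                if pred(joined)}

history = {"u1": {"bought": "car"}}
join = SharedJoin(history)

# Query A has a mistake in its filter and never matches.
join.add_query("A", lambda r: r.get("bought") == "bike",
               lambda r: {"user_id": r["user_id"]})
out1 = join.process({"user_id": "u1"})

# Add the fixed Query A' next to A; the shared state is not reloaded.
join.add_query("A'", lambda r: r.get("bought") == "car",
               lambda r: {"user_id": r["user_id"], "bought": r["bought"]})
out2 = join.process({"user_id": "u1"})

# It works, so the older query can be removed.
join.remove_query("A")
out3 = join.process({"user_id": "u1"})
print(out1, out2, out3)
```

Because the JOIN state lives in the shared operator, replacing a downstream query does not require rebuilding three or four years of history.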

Page 34: Perfect Norikra 2nd Season

Perfect Stream Processing Engine

• Just the same SQL on both Batch and Stream

• Stream processor which can resume queries using batch query engine results
  • reduces memory usage of JOINs
  • reduces memory usage for historical data

• Stream processor which can share operators between queries
  • reduces total amount of memory usage
  • makes it possible to restart/update queries anytime, casually

Page 35: Perfect Norikra 2nd Season

Perfect Norikra

Page 36: Perfect Norikra 2nd Season

Named

Page 37: Perfect Norikra 2nd Season

It still has 0 bytes. Stay tuned!

We are hiring! - Treasure Data