Concord: Simple & Flexible Stream Processing on Apache Mesos Shinji Kim Co-founder, Concord Systems @concord @databythebay #datagrid

Concord: Simple & Flexible Stream Processing on Apache Mesos: Data By The Bay May 2016

Overview

•  What is Stream Processing?

•  Today’s Stream Processing

•  Introducing Concord

1. Concepts & API

2. Job Topology Management

3. Operations, Tooling, Performance

4. Message Delivery Guarantees

•  Future Development Plans

What is stream processing?

•  Processing Data in motion

•  Sits between message queues and databases

•  Used for faster:

–  Data enrichment

–  Aggregation

–  Filtering / deduplication

Today’s Stream Processing

•  Faster MapReduce jobs → end up running core business logic on top

–  Fraudulent click detection

–  Real-time budget updates

–  Trigger-based trading

•  Your stream processing jobs are more like microservices

•  Need support for services / application management: Cluster mgmt, Monitoring, Debuggability

Introducing Concord

Concord is a distributed stream processing framework

built in C++ on top of Apache Mesos, designed for

high-performance, real-time applications that require

flexibility & control.

[Diagram: Data Sources and Data Sinks]

Pub / Sub Operator Model

•  Composable jobs by Metadata

A → words → B

A: Metadata(Name='A', istreams=[], ostreams=['words'])

B: Metadata(Name='B', istreams=['words', StreamGrouping.GROUP_BY], ostreams=[])

C: Metadata(Name='C', istreams=['words', StreamGrouping.SHUFFLE], ostreams=[])
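The difference between the two groupings can be sketched in a few lines of Python. This is a standalone illustration, not Concord code: the helper names `group_by_router` and `shuffle_router` are hypothetical, the CRC32 hash is an assumed choice, and round-robin is only one plausible way to implement a shuffle. The slides do not say how Concord routes records internally.

```python
import zlib
from itertools import cycle


def group_by_router(num_instances):
    """Return a routing function that sends equal keys to the same instance."""
    def route(key):
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        return zlib.crc32(key.encode("utf-8")) % num_instances
    return route


def shuffle_router(num_instances):
    """Return a routing function that ignores the key (round-robin here)."""
    targets = cycle(range(num_instances))

    def route(key):
        return next(targets)
    return route


words = ["corgi", "pug", "corgi", "pug", "corgi"]
by_key = group_by_router(3)
spread = shuffle_router(3)

# Instance index chosen for each record under each grouping.
grouped = [by_key(w) for w in words]
shuffled = [spread(w) for w in words]
```

With a key-based grouping, every "corgi" record lands on one instance, which is what a per-key count needs. With a shuffle, records are spread evenly and the key plays no part.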

Simple API in Multiple Languages

•  ProcessRecord, ProduceRecord, ProcessTimer

•  GetState, SetState backed by RocksDB

•  API available in Python, Ruby, Go, Java/Scala, C++

B: Metadata(Name='B', istreams=['words', StreamGrouping.GROUP_BY], ostreams=['wordcount'])

words → B → wordcount

Key          Value
Corgi        2
Chihuahua    4
Dachshund    5
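A word-count operator built on these calls might look like the sketch below. It is a minimal stand-in and does not import a Concord client: the class `WordCounter` and its snake_case method names are hypothetical, chosen to mirror the ProcessRecord, ProduceRecord, GetState and SetState names on the slide, and a plain dict stands in for the RocksDB-backed store.

```python
class WordCounter:
    """Hypothetical operator: consumes 'words', publishes to 'wordcount'."""

    def __init__(self):
        self.state = {}      # stand-in for the RocksDB-backed key/value store
        self.produced = []   # stand-in for the 'wordcount' output stream

    def get_state(self, key):
        # Unknown keys start at zero.
        return self.state.get(key, 0)

    def set_state(self, key, value):
        self.state[key] = value

    def produce_record(self, stream, key, value):
        self.produced.append((stream, key, value))

    def process_record(self, key):
        # Read the current count, increment it, store it, publish it.
        count = self.get_state(key) + 1
        self.set_state(key, count)
        self.produce_record("wordcount", key, count)


op = WordCounter()
for word in ["Corgi", "Pug", "Corgi"]:
    op.process_record(word)
```

After the loop, `op.state` holds one running count per key, in the same shape as the key/value table above.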

Useful for multiple teams to consume the same streaming data in real-time

Native Integration with Apache Mesos

•  Dynamic resource scheduling

•  Task Isolation

•  Task supervision

•  High Availability

Containerized Execution Environment

•  Horizontal scaling

•  Multi-tenancy

•  Hot code deployment & dynamic topology

[Diagram: Mesos Agent, RocksDB]

Concord is Flexible: Run-time deployment

Concord supports Distributed Tracing

Monitor all operator instances at a glance

Concord supports Transparent Debugging

[2015-11-02 15:36:44.770] [dispatcher_latencies] [info] 127.0.0.1:31000: traceId: -8816532120874703981, parentId: 0, id: -6816766813334129096, p50: 388179us, p95: 519668us, p99: 524812us, p999: 526425us

[2015-11-02 15:37:13.929] [principal_latencies] [info] 127.0.0.1:31001: traceId: -4811311467074699790, parentId: -7681059555040553620, id: -1899872683843643522, p50: 73355us, p95: 145626us, p99: 210345us, p999: 272018us

[2015-11-02 15:36:43.323] [incoming_throughput] [info] 12288 req in 1045515us. total: 367616 req

[2015-11-02 15:36:30.240] [outgoing_throughput] [info] 100000 req in 4804526us. total: 600000 req
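Log lines in this shape are regular enough to parse with two regular expressions. The sketch below is one possible approach, assuming the format seen in the sample lines above. The function name `parse_latency_line` is hypothetical and is not part of any Concord tooling.

```python
import re

# [timestamp] [metric] [level] rest-of-line
LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[(?P<metric>\w+)\] \[(?P<level>\w+)\] (?P<body>.*)"
)
# "name: integer" pairs, with an optional "us" (microseconds) suffix
FIELD = re.compile(r"(\w+): (-?\d+)(us)?")


def parse_latency_line(line):
    """Return a dict of the line's fields, or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    fields = {name: int(value) for name, value, _unit in FIELD.findall(m.group("body"))}
    return {"ts": m.group("ts"), "metric": m.group("metric"), **fields}


sample = (
    "[2015-11-02 15:36:44.770] [dispatcher_latencies] [info] 127.0.0.1:31000: "
    "traceId: -8816532120874703981, parentId: 0, id: -6816766813334129096, "
    "p50: 388179us, p95: 519668us, p99: 524812us, p999: 526425us"
)
record = parse_latency_line(sample)
```

The trace and parent ids come back as integers, so spans from different operators can be joined on `traceId` to reconstruct a request path.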

Concord performs well at scale

•  Word count benchmark (1.13B msgs)

–  Concord: 500K QPS/node at 10ms/event

–  Storm: 16K QPS/node at 100ms/event

–  Spark Streaming: 100K QPS/node at 1s batch window

•  Server log processing (29GB server log, ~260M msgs)

–  4 nodes, 8 vCPU, 32GB RAM each

–  Concord: 1M – 1.8M QPS

–  Spark Streaming: 72K – 2M QPS

•  Consistent performance
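To put the throughput figures in wall-clock terms, a rough back-of-the-envelope calculation helps. The sketch below assumes the quoted rates are sustained for the whole run, and treats the server log figures as totals for the 4-node cluster; the slide does not say whether they are per node or aggregate. The helper `seconds_to_drain` is hypothetical.

```python
def seconds_to_drain(total_msgs, qps):
    """Time to process total_msgs at a constant rate of qps messages/second."""
    return total_msgs / qps


# Word count: 1.13B messages on a single node at 500K QPS.
word_count_one_node = seconds_to_drain(1.13e9, 500_000)

# Server log: ~260M messages at the low and high ends of the quoted range
# (assumed here to be cluster-wide rates).
log_at_low_rate = seconds_to_drain(260e6, 1.0e6)
log_at_high_rate = seconds_to_drain(260e6, 1.8e6)
```

Under these assumptions the word count input takes about 2,260 seconds on one node, and the server log takes between roughly 144 and 260 seconds.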

Concord is designed for Predictability

•  As you scale, JVM reconfiguration and GC pauses are inevitable (Framework GC vs. Application GC)

•  Cluster abstracted as CPU, Memory, Disk numbers → cluster optimization & overall runtime

•  Fast Compile → Test → Deploy cycle without downtime

Message Delivery Guarantees

Today: Fast > Complete or Perfect

•  Best-effort / at-most-once processing

–  When operator or node crashes, the local cache goes away

–  Automatically retries the failed operator (number of retries is configurable)

–  Recommends implementing check mechanisms in operators (e.g., Concord Kafka consumer)
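The behaviour described above can be simulated with a small supervisor loop. This is an illustrative sketch and not Concord's scheduler: `run_with_retries`, the `Counter` class, and the "CRASH" marker record are all hypothetical, and the crash is modelled as a Python exception.

```python
def run_with_retries(make_operator, records, max_retries):
    """Restart the operator after a crash; nothing is replayed (at-most-once)."""
    operator = make_operator()
    retries = 0
    for record in records:
        try:
            operator.process(record)
        except RuntimeError:
            if retries == max_retries:
                raise  # retry budget exhausted: give up
            retries += 1
            operator = make_operator()  # fresh instance, local cache is gone
    return operator, retries


class Counter:
    """Toy operator with an in-memory cache that dies with the instance."""

    def __init__(self):
        self.cache = {}

    def process(self, record):
        if record == "CRASH":
            raise RuntimeError("simulated operator crash")
        self.cache[record] = self.cache.get(record, 0) + 1


op, retries = run_with_retries(Counter, ["a", "a", "CRASH", "a"], max_retries=3)
```

Three "a" records were sent, but the surviving cache counts only the one that arrived after the restart. That gap is why the slide recommends adding check mechanisms inside operators.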

Soon: Fast + Complete > Perfect

•  In development for at-least-once with Kafka

–  Kafka acts as a message bus between operators

–  Kafka replays data from checked offset (data duplication)
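Why replaying from a saved offset produces duplicates can be shown without a Kafka client. In the sketch below a Python list stands in for a Kafka partition, and `replay_from_checkpoint` with its parameters is a hypothetical helper. It models a single crash followed by one restart.

```python
def replay_from_checkpoint(log, checkpoint_every, crash_after):
    """Process `log`, crash once after `crash_after` records, then resume
    from the last checkpointed offset. Returns every record delivered."""
    delivered = []
    checkpoint = 0  # next offset to read after a restart

    # First attempt: runs until the simulated crash.
    for offset in range(crash_after):
        delivered.append(log[offset])
        if (offset + 1) % checkpoint_every == 0:
            checkpoint = offset + 1

    # Restart: resume from the checkpoint, not from the crash point.
    for offset in range(checkpoint, len(log)):
        delivered.append(log[offset])
    return delivered


log = ["a", "b", "c", "d", "e"]
delivered = replay_from_checkpoint(log, checkpoint_every=2, crash_after=3)
```

Record "c" was processed after the last checkpoint and before the crash, so it is delivered twice. No record is lost, which is the at-least-once trade-off: downstream operators need to tolerate or deduplicate repeats.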

Eventually: Fast + Complete + Perfect

•  Transactional datastore in design phase

Future plans

•  “At least once” guarantee support with Kafka

•  DC/OS integration

•  More data source / data sink connector support

•  Higher level DSL

Concord: Simple & Flexible streaming application framework on Apache Mesos

•  Operator model that works across multiple languages

→ Fast development and iteration time for multiple teams using the same data

•  Dynamic topology, run-time deployment and scaling

→ Decoupled development & DevOps work

•  High performance at scale

→ Predictable system for real-time applications

•  Low-latency / Real-time applications:

–  Real-time fraud detection

–  Financial market data processing for real-time risks and triggers

–  Real-time campaign management for real-time bidding (RTB)

Thank You!

Get Started: http://concord.io

@shinjikim
