Building A Massive Stream Computing Platform For Flexible Applications
Tianjian Chen
Zhengrui Man
Hao Li
Xin Sun
Raymond K. Wong
Zhiwei Yu
June 2014, IEEE BigData Congress
Highlights
• Applications Design the System Themselves
• Complete Modularization Strategy
• Extremely Simple Stream Model
[Diagram: LBS ads push topology. OP1 (User Filter) → OP2 (Ads Recalling) → OP3 (Ads Ranking) → OP4 (Push Controller) → Mobile Devices. Inputs: Location Logging API, User Preference Database, and Ads POI Database. The LBS Ads Service issues LBS Ads Queries over DRPC. Caption: "Can I do this?"]
• Location Based Ads Push System
• Co-Serving With Online Services
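The topology above can be sketched as a chain of stream operators. The sketch below is illustrative only: the `Operator` class, its `link`/`process` methods, and the event fields (`user`, `opt_in`, `ads`) are hypothetical stand-ins, not Vortex's real API.

```python
# Minimal sketch of the LBS ads push pipeline (operator names from the slide;
# the Operator API and event fields are assumptions for illustration).

class Operator:
    """A stream operator: applies a function to each tuple and forwards
    the result to its downstream operators."""
    def __init__(self, name, fn):
        self.name, self.fn, self.downstream = name, fn, []

    def link(self, other):
        """Wire this operator to a downstream operator; returns it for chaining."""
        self.downstream.append(other)
        return other

    def process(self, item):
        out = self.fn(item)
        if out is None:           # None means "filtered out" or "terminal"
            return
        for op in self.downstream:
            op.process(out)

pushed = []  # stands in for delivery to mobile devices

# OP1..OP4 as in the slide diagram.
user_filter     = Operator("UserFilter",
                           lambda loc: loc if loc["opt_in"] else None)
ads_recalling   = Operator("AdsRecalling",
                           lambda loc: {**loc, "ads": ["a1", "a2"]})
ads_ranking     = Operator("AdsRanking",
                           lambda rec: {**rec, "ads": sorted(rec["ads"])})
push_controller = Operator("PushController",
                           lambda top: pushed.append((top["user"], top["ads"][0])))

user_filter.link(ads_recalling).link(ads_ranking).link(push_controller)

# Feed one location event (hypothetical fields) from the Location Logging API.
user_filter.process({"user": "u1", "opt_in": True})
```

An opted-out user is dropped at OP1 and never reaches the push controller, which is the point of putting the filter first.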
[Diagram: user preference tracking topology with five operators (OP1–OP5): Log Filter, Data Join, Feature Extraction, Web Crawling, and Model Update. Inputs: Logging API and Web Page Cache; the User Model is kept in a Redis Cluster. Caption: "Can I do this?"]
• User Preference Tracking System
[Diagram: a Stream Application on the Vortex Platform running alongside a Map-Reduce Application on the Hadoop Platform, both serving Online Web Services. Caption: "Can I co-operate it with M/R?"]
5 Independent Sub-Systems
• Layer 1, Computing Resources: Universal Resource Manager
• Layer 2, Deployment Automation: Nukua Automation System
• Layer 3, Data Transmission: Spinal DMQ
• Layer 4, Topology Representation: Stream Computing Core
• Layer 5, Stream Application: DRPC Service Interface
Message Queuing Configuration
[Diagram: OP1's operator cluster writes over downlinks into the Spinal DMQ Cluster (the sub-links of OP1's downstream); OP2's operator cluster reads over uplinks from the sub-links of OP2's upstream. The operator clusters exchange messages only through the DMQ cluster.]
Message Passing Configuration
[Diagram: the same OP1 → OP2 topology with direct links: OP1's downlinks connect straight to the sub-links of OP2's upstream, with no DMQ cluster in between.]
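The two transmission configurations can be contrasted with a small sketch. Both classes below are illustrative assumptions: a real Spinal DMQ sub-link is a persistent, distributed queue, not an in-memory `deque`, and the method names `downlink`/`uplink` simply mirror the slide labels.

```python
# Sketch of the two link configurations from the slides (hypothetical classes).
from collections import deque

class QueuedLink:
    """Message-queuing configuration: the upstream operator's downlink
    writes into a DMQ sub-link; the downstream operator's uplink polls
    it. Producer and consumer are decoupled, and the queue absorbs
    rate mismatches."""
    def __init__(self):
        self.sublink = deque()   # stands in for a Spinal DMQ sub-link

    def downlink(self, msg):     # called by the upstream operator
        self.sublink.append(msg)

    def uplink(self):            # polled by the downstream operator
        return self.sublink.popleft() if self.sublink else None

class DirectLink:
    """Message-passing configuration: the upstream operator pushes each
    message straight into the downstream handler. Lower latency, but no
    buffering between operator clusters."""
    def __init__(self, handler):
        self.handler = handler

    def downlink(self, msg):
        self.handler(msg)        # delivered immediately, no queue

# Usage: a queued link buffers until polled; a direct link delivers at once.
q = QueuedLink()
q.downlink("m1")
q.downlink("m2")

received = []
d = DirectLink(received.append)
d.downlink("m3")
```

Making the configuration a per-link choice is what lets one topology mix buffered edges (for bursty sources) with direct edges (for latency-critical hops).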
Traditional Stream Model
[Diagram: a buffer of slots indexed 0, 1, 2, …, n, with every consumer tracking its own position in the stream.]
• Independent Consumer Status
• High Index Overhead
• High Snapshot Overhead
Vortex Stream Model
[Diagram: a buffer of slots 0 … n bounded only by a Head pointer and a Tail pointer.]
• Unified Status
• Minimal Index Overhead
• Minimal Snapshot Overhead
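The overhead difference between the two models can be sketched as follows. The data structures are illustrative assumptions, not Vortex internals; the point is only that the traditional model's snapshot grows with the number of consumers, while the head/tail model's snapshot is two integers regardless.

```python
# Sketch contrasting the two stream models (illustrative data structures).

class TraditionalStream:
    """Each consumer keeps an independent index into the stream, so a
    snapshot must record every consumer's position."""
    def __init__(self):
        self.buf, self.index = [], {}      # consumer name -> next offset

    def append(self, item):
        self.buf.append(item)

    def read(self, consumer):
        i = self.index.get(consumer, 0)
        if i >= len(self.buf):
            return None
        self.index[consumer] = i + 1
        return self.buf[i]

    def snapshot(self):
        return dict(self.index)            # one entry per consumer

class VortexStream:
    """A single Head/Tail pair is the entire status: slots before Head
    are consumed, slots up to Tail are produced. A snapshot is just
    the two pointers."""
    def __init__(self):
        self.buf, self.head, self.tail = [], 0, 0

    def append(self, item):
        self.buf.append(item)
        self.tail += 1

    def read(self):
        if self.head >= self.tail:
            return None
        item = self.buf[self.head]
        self.head += 1
        return item

    def snapshot(self):
        return (self.head, self.tail)      # constant size
```

With two consumers the traditional snapshot already holds two offsets; the Vortex snapshot stays `(head, tail)` no matter how many operators read the stream, which is why its index and snapshot overheads are minimal.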
Lessons Learned
• Highly Configurable System For Flexible Applications
• Big Data Requires Everything Simple & Reliable