Realtime Computation with Storm

DESCRIPTION

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use! We will talk about how Storm is architected, how to interoperate with Hadoop, and a few real-world use-cases.

Realtime Computation with Storm

Brad Anderson

banderson@maprtech.com

@boorad

Definition & Overview

Interoperability

Use Cases

Stream Processing

CEP

Distributed RPC

Source Data

•Social Media Feeds

•Network Sensors

•App/Web Logs

•Stock Tick Data

•Weather Data

•Auctions of Ad Impressions

•Payment Transactions

Before Storm

Queues and workers

Example

(simplified)

Storm

• Guaranteed data processing

• Horizontal scalability

• Fault-tolerance

• No intermediate message brokers!

• Higher level abstraction than message passing

• “Just works”

Concepts

streams

Unbounded sequence of tuples

Tuple Tuple Tuple Tuple Tuple Tuple Tuple

spouts

Source of streams

spouts

    public interface ISpout extends Serializable {
        void open(Map conf,
                  TopologyContext context,
                  SpoutOutputCollector collector);
        void close();
        void nextTuple();
        void ack(Object msgId);
        void fail(Object msgId);
    }
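The `ack` and `fail` methods on `ISpout` are the hooks behind guaranteed processing: Storm calls `ack(msgId)` once a tuple has been fully processed and `fail(msgId)` when processing fails or times out, and a reliable spout typically re-emits the failed message. A minimal plain-Java sketch of that bookkeeping follows. The class `ReplayBufferSketch` and its methods are hypothetical stand-ins for illustration; it does not implement `ISpout` or use any Storm classes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration of a reliable spout's bookkeeping; not Storm code.
class ReplayBufferSketch {
    private final Queue<String> ready = new ArrayDeque<>();     // messages waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>();  // emitted but not yet acked
    private long nextId = 0;

    // Stand-in for the external source delivering a message.
    void offer(String message) {
        ready.add(message);
    }

    // Plays the role of nextTuple(): emit one message and track it under a fresh id.
    // Returns the message id, or -1 when nothing is ready.
    long next() {
        String message = ready.poll();
        if (message == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, message);
        return id;
    }

    // Plays the role of ack(msgId): the message is done, so stop tracking it.
    void ack(long id) {
        pending.remove(id);
    }

    // Plays the role of fail(msgId): queue the message again so it is re-emitted.
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) {
            ready.add(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int readyCount() {
        return ready.size();
    }
}
```

In this sketch a failed message simply goes to the back of the queue; a real spout would usually delegate replay to its source, such as a queue or log that supports re-reading.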

bolts

Processes input streams and produces new streams

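bolts

    import java.util.Map;
    // Storm 1.0+ package names; releases before 1.0 used backtype.storm.* instead.
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class DoubleAndTripleBolt extends BaseRichBolt {
        private OutputCollector _collector;

        // Called once before any tuples arrive; keep the collector for emitting.
        public void prepare(Map conf,
                            TopologyContext context,
                            OutputCollector collector) {
            _collector = collector;
        }

        // Emit (2x, 3x) anchored to the input tuple, then ack the input.
        public void execute(Tuple input) {
            int val = input.getInteger(0);
            _collector.emit(input, new Values(val * 2, val * 3));
            _collector.ack(input);
        }

        // Name the two fields of every emitted tuple.
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("double", "triple"));
        }
    }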

topologies

Network of spouts and bolts

topologies

    TopologyBuilder builder = new TopologyBuilder();

    builder.setSpout("spout", new RandomSentenceSpout(), 5);

    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));
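The `fieldsGrouping("split", new Fields("word"))` call above partitions the stream by the `word` field, so every tuple carrying the same word reaches the same `count` task. That is what lets each task keep a correct local count. A minimal plain-Java sketch of the idea, assuming simple hash-modulo routing; the class and method names are hypothetical and this is not Storm's own implementation.

```java
// Hypothetical illustration of fields grouping; not part of the Storm API.
class FieldsGroupingSketch {

    // Choose a target task for a grouping key. Equal keys always map to the
    // same task index, and the index is always in [0, numTasks).
    static int taskFor(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 12; // same parallelism as the "count" bolt above
        for (String word : new String[] {"storm", "hadoop", "storm"}) {
            System.out.println(word + " -> task " + taskFor(word, numTasks));
        }
    }
}
```

By contrast, `shuffleGrouping` distributes tuples across tasks without regard to their contents, which suits the stateless `split` step.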

Trident

Cascading for Storm

Trident Facilities

• Joins

• Aggregations

• Grouping

• Functions

• Filters

• Consistent, Exactly-Once Semantics

    TridentTopology topology = new TridentTopology();

    TridentState wordCounts =
        topology.newStream("spout1", spout)
            .each(new Fields("sentence"), new Split(), new Fields("word"))
            .groupBy(new Fields("word"))
            .persistentAggregate(new MemoryMapState.Factory(),
                                 new Count(),
                                 new Fields("count"))
            .parallelismHint(6);
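The result the Trident pipeline above maintains is a running count per word. A rough plain-Java sketch of the same split, group, and count steps over a finite list of sentences is shown here for orientation. The class `WordCountSketch` and the sample sentences are made up for illustration; Trident itself runs these steps incrementally over an unbounded stream and persists the counts as state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical batch equivalent of the split -> groupBy -> count pipeline.
class WordCountSketch {

    static Map<String, Long> count(List<String> sentences) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys for stable output
        for (String sentence : sentences) {
            // "each ... Split": break the sentence into words
            for (String word : sentence.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // "groupBy word" + "Count": add one to this word's total
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {cow=1, jumped=1, moon=1, the=2}
        System.out.println(count(Arrays.asList("the cow jumped", "the moon")));
    }
}
```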

Interoperability

spouts

•Kafka (with transactions)

•Kestrel

•JMS

•AMQP

•Beanstalkd

bolts

• Functions

• Filters

• Aggregation

• Joins

• Talk to databases, Hadoop write-behind

[Diagram: Raw Data, Queue, Storm (realtime processes), Hadoop (batch processes), Apps, Business Value]

Parallel Cluster Ingest

[Diagram: the same components, with Franz and TailSpout added]

[Diagram: the same components without the Queue; Franz and TailSpout remain]

Use Cases

Twitter

[Diagram: URL → Tweeters → Followers → Distinct followers → Reach]
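The Twitter reach use case above reads as: look up everyone who tweeted a URL, collect their followers, de-duplicate, and count. A small plain-Java sketch of that computation follows, with in-memory maps standing in for the tweeter and follower lookups. All names and sample data are hypothetical; in a Storm topology each step would run in parallel rather than in a single loop.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-machine version of the reach computation.
class ReachSketch {

    // Reach = number of distinct followers of everyone who tweeted the URL.
    static int reach(String url,
                     Map<String, List<String>> tweetersByUrl,
                     Map<String, List<String>> followersByTweeter) {
        Set<String> distinctFollowers = new HashSet<>();
        List<String> tweeters = tweetersByUrl.getOrDefault(url, Collections.<String>emptyList());
        for (String tweeter : tweeters) {
            distinctFollowers.addAll(
                followersByTweeter.getOrDefault(tweeter, Collections.<String>emptyList()));
        }
        return distinctFollowers.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> tweetersByUrl = new HashMap<>();
        tweetersByUrl.put("http://example.com", Arrays.asList("alice", "bob"));

        Map<String, List<String>> followersByTweeter = new HashMap<>();
        followersByTweeter.put("alice", Arrays.asList("xavier", "yolanda"));
        followersByTweeter.put("bob", Arrays.asList("yolanda", "zed"));

        // yolanda follows both tweeters but is counted once: prints 3
        System.out.println(reach("http://example.com", tweetersByUrl, followersByTweeter));
    }
}
```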

Heartbyte

Fleet Logistics

Brad Anderson

banderson@maprtech.com

@boorad

http://github.com/{tdunning | boorad}/mapr-spout

Thank you.
