Realtime Computation with Storm Brad Anderson [email protected] @boorad

DESCRIPTION

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use! We will talk about how Storm is architected, how to interoperate with Hadoop, and a few real-world use-cases.


Page 1: Realtime Computation with Storm

Realtime Computation with Storm

Brad Anderson [email protected]

@boorad

Page 2: Realtime Computation with Storm
Page 3: Realtime Computation with Storm

Definition & Overview

Interoperability

Use Cases

Page 4: Realtime Computation with Storm

Stream Processing

CEP

Distributed RPC

Page 5: Realtime Computation with Storm

Source Data

• Social Media Feeds

• Network Sensors

• App/Web Logs

• Stock Tick Data

• Weather Data

• Auctions of Ad Impressions

• Payment Transactions

Page 6: Realtime Computation with Storm

Before Storm

Queues Workers

Page 7: Realtime Computation with Storm

Example

(simplified)

Page 8: Realtime Computation with Storm

Storm

• Guaranteed data processing

• Horizontal scalability

• Fault-tolerance

• No intermediate message brokers!

• Higher level abstraction than message passing

• “Just works”

Page 9: Realtime Computation with Storm

Concepts

Page 10: Realtime Computation with Storm

streams

Unbounded sequence of tuples

Tuple Tuple Tuple Tuple Tuple Tuple Tuple

Page 11: Realtime Computation with Storm

spouts

Source of streams

Page 12: Realtime Computation with Storm

spouts

    public interface ISpout extends Serializable {
        void open(Map conf,
                  TopologyContext context,
                  SpoutOutputCollector collector);
        void close();
        void nextTuple();
        void ack(Object msgId);
        void fail(Object msgId);
    }
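The ack and fail methods of ISpout are where guaranteed processing meets the data source: Storm reports the outcome for each message id, and the spout decides what to replay. The following is a minimal, Storm-free sketch of that bookkeeping. The class ReplayingSource and its methods are hypothetical names chosen for illustration, not part of Storm, and a real spout would emit through its SpoutOutputCollector rather than return an id.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, Storm-free model of the bookkeeping a reliable spout might do.
class ReplayingSource {
    private final Queue<String> ready = new ArrayDeque<>();    // waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>(); // emitted, not yet acked
    private long nextId = 0;

    void offer(String message) {
        ready.add(message);
    }

    // Analogue of nextTuple(): emit one message under a fresh id, or return -1 if idle.
    long nextTuple() {
        String message = ready.poll();
        if (message == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, message);
        return id;
    }

    // Analogue of ack(msgId): the message was fully processed, so forget it.
    void ack(long id) {
        pending.remove(id);
    }

    // Analogue of fail(msgId): queue the message again so it is re-emitted later.
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) {
            ready.add(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int readyCount() {
        return ready.size();
    }
}
```

In this sketch a failed message is re-emitted under a new id, so downstream processing may see it more than once; that matches at-least-once delivery rather than exactly-once.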

Page 13: Realtime Computation with Storm

bolts

Processes input streams and produces new streams

Page 14: Realtime Computation with Storm

bolts

    // Imports omitted as on the slide; the classes live under backtype.storm
    // in pre-1.0 releases and org.apache.storm from 1.0 onward.
    public class DoubleAndTripleBolt extends BaseRichBolt {
        private OutputCollector _collector;

        @Override
        public void prepare(Map conf,
                            TopologyContext context,
                            OutputCollector collector) {
            _collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            int val = input.getInteger(0);
            // Anchor the emitted tuple to the input, then ack the input.
            _collector.emit(input, new Values(val * 2, val * 3));
            _collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("double", "triple"));
        }
    }

Page 15: Realtime Computation with Storm

topologies

Network of spouts and bolts

Page 16: Realtime Computation with Storm

topologies

    TopologyBuilder builder = new TopologyBuilder();

    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));
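The two groupings in the topology above decide which task of the next bolt receives each tuple. As a rough, Storm-free model (not Storm's actual routing code), shuffle grouping can be thought of as picking a task at random, and fields grouping as hashing the named field so that equal values always reach the same task. GroupingSketch and its method names are hypothetical.

```java
import java.util.Random;

// Hypothetical model of how tuples might be routed to bolt tasks.
class GroupingSketch {
    // Fields-grouping analogue: equal keys always map to the same task index.
    static int fieldsGroupingTask(String key, int numTasks) {
        return Math.floorMod(key.hashCode(), numTasks);
    }

    // Shuffle-grouping analogue: pick any task, uniformly at random.
    static int shuffleGroupingTask(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    public static void main(String[] args) {
        int countTasks = 12; // parallelism of the "count" bolt in the slide
        for (String word : new String[] {"storm", "hadoop", "storm"}) {
            System.out.println(word + " -> count task "
                    + fieldsGroupingTask(word, countTasks));
        }
    }
}
```

Routing by the "word" field is what lets each WordCount task keep a correct local count: every occurrence of a given word lands on the same task.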

Page 17: Realtime Computation with Storm

Trident

Cascading for Storm

Page 18: Realtime Computation with Storm

Trident Facilities

• Joins

• Aggregations

• Grouping

• Functions

• Filters

• Consistent, Exactly-Once Semantics

Page 19: Realtime Computation with Storm

    TridentTopology topology = new TridentTopology();
    TridentState wordCounts =
        topology.newStream("spout1", spout)
            .each(new Fields("sentence"), new Split(), new Fields("word"))
            .groupBy(new Fields("word"))
            .persistentAggregate(new MemoryMapState.Factory(),
                                 new Count(),
                                 new Fields("count"))
            .parallelismHint(6);
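To make the shape of that pipeline concrete, here is a small batch analogue written with plain Java streams: split each sentence into words, group by word, and count. It only illustrates what the split, groupBy, and count steps compute; it is not Trident code, it runs over a finite list rather than an unbounded stream, and WordCountSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical batch analogue of the split -> groupBy -> count pipeline.
class WordCountSketch {
    static Map<String, Long> countWords(List<String> sentences) {
        return sentences.stream()
                // "each ... Split": one sentence becomes many words
                .flatMap(s -> Arrays.stream(s.trim().split("\\s+")))
                .filter(w -> !w.isEmpty())
                // "groupBy word" followed by "Count"
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countWords(Arrays.asList(
                "the cow jumped over the moon",
                "the man went to the store")));
    }
}
```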

Page 20: Realtime Computation with Storm

Interoperability

Page 21: Realtime Computation with Storm

spouts

• Kafka (with transactions)

• Kestrel

• JMS

• AMQP

• Beanstalkd

Page 22: Realtime Computation with Storm

bolts

• Functions

• Filters

• Aggregation

• Joins

• Talk to databases, Hadoop write-behind

Page 23: Realtime Computation with Storm

[Diagram labels: Raw Data, Queue, Hadoop (batch processes), Storm (realtime processes), Apps, Business Value]

Page 24: Realtime Computation with Storm

[Diagram labels: Raw Data, Queue, Hadoop (batch processes), Storm (realtime processes), Apps, Business Value; annotation: Parallel Cluster Ingest]

Page 25: Realtime Computation with Storm

[Diagram labels: Raw Data, Queue, Hadoop (batch processes), Storm (realtime processes), Apps, Business Value; added: Franz, TailSpout]

Page 26: Realtime Computation with Storm

[Diagram labels: Raw Data, Hadoop (batch processes), Storm (realtime processes), Apps, Business Value, Franz, TailSpout]

Page 27: Realtime Computation with Storm

Use Cases

Page 28: Realtime Computation with Storm

Twitter

[Diagram: URL → Tweeters → Followers → Distinct followers → Reach]
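The reach of a URL, as the diagram suggests, is the number of distinct followers across everyone who tweeted it. A single-machine sketch of that computation follows; the class ReachSketch and the two lookup maps are hypothetical stand-ins for whatever stores hold the tweeter and follower data, and the point of running this on Storm would be to parallelize the follower lookups and the de-duplication.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-machine version of the reach computation.
class ReachSketch {
    static int reach(String url,
                     Map<String, List<String>> tweetersByUrl,
                     Map<String, List<String>> followersByTweeter) {
        Set<String> distinctFollowers = new HashSet<>();
        // URL -> tweeters -> followers, de-duplicated by the set.
        for (String tweeter
                : tweetersByUrl.getOrDefault(url, Collections.<String>emptyList())) {
            distinctFollowers.addAll(
                    followersByTweeter.getOrDefault(tweeter, Collections.<String>emptyList()));
        }
        return distinctFollowers.size();
    }
}
```

A follower who follows two of the tweeters is counted once, which is why the distinct step sits between the follower lists and the final count.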

Page 29: Realtime Computation with Storm

Heartbyte

Page 30: Realtime Computation with Storm

Fleet Logistics