Definition & Overview
Interoperability
Use Cases
Stream Processing
CEP
Distributed RPC
Source Data
•Social Media Feeds
•Network Sensors
•App/Web Logs
•Stock Tick Data
•Weather Data
•Auctions of Ad Impressions
•Payment Transactions
Before Storm
Queues Workers
Example
(simplified)
Storm
• Guaranteed data processing
• Horizontal scalability
• Fault-tolerance
• No intermediate message brokers!
• Higher level abstraction than message passing
• “Just works”
Concepts
streams
Unbounded sequence of tuples
Tuple Tuple Tuple Tuple Tuple Tuple Tuple
spouts
Source of streams
spouts
public interface ISpout extends Serializable {
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    void close();
    void nextTuple();
    void ack(Object msgId);
    void fail(Object msgId);
}
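The ack and fail callbacks are what allow a spout to replay messages that were not fully processed. The following is a minimal plain-Java sketch of that idea, not Storm code: the class ReplaySpoutSketch and its integer message ids are hypothetical, and it only mimics the shape of nextTuple/ack/fail without extending any Storm class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustration only: mimics the nextTuple/ack/fail callbacks of ISpout
// with plain JDK classes. Names and id scheme are assumptions.
class ReplaySpoutSketch {
    private final Deque<String> toEmit = new ArrayDeque<>();       // messages waiting to be emitted
    private final Map<Integer, String> pending = new HashMap<>();  // emitted but not yet acked
    private int nextId = 0;

    ReplaySpoutSketch(String... messages) {
        for (String m : messages) {
            toEmit.add(m);
        }
    }

    // Emits the next queued message and returns its id, or -1 if nothing is queued.
    int nextTuple() {
        String msg = toEmit.poll();
        if (msg == null) {
            return -1;
        }
        int id = nextId++;
        pending.put(id, msg);
        return id;
    }

    // Fully processed: forget the message.
    void ack(int msgId) {
        pending.remove(msgId);
    }

    // Processing failed: put the message back in the queue for replay.
    void fail(int msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            toEmit.add(msg);
        }
    }

    int pendingCount() { return pending.size(); }
    int queuedCount() { return toEmit.size(); }

    public static void main(String[] args) {
        ReplaySpoutSketch spout = new ReplaySpoutSketch("a", "b");
        int first = spout.nextTuple();
        int second = spout.nextTuple();
        spout.ack(first);    // "a" is done
        spout.fail(second);  // "b" goes back in the queue
        System.out.println("pending=" + spout.pendingCount()
                + " queued=" + spout.queuedCount());
    }
}
```

A real spout would typically read from an external source such as a queue and pass the message id along with the emitted tuple; the in-memory queue here only stands in for that source.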
bolts
Processes input streams and produces new streams
bolts
public class DoubleAndTripleBolt extends BaseRichBolt {
    private OutputCollector _collector;

    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    public void execute(Tuple input) {
        int val = input.getInteger(0);
        // Anchor the new tuple to the input, then ack the input.
        _collector.emit(input, new Values(val * 2, val * 3));
        _collector.ack(input);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("double", "triple"));
    }
}
topologies
Network of spouts and bolts
topologies
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
       .shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12)
       .fieldsGrouping("split", new Fields("word"));
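The fieldsGrouping on "word" is what makes the count correct: every tuple carrying the same word must reach the same "count" task. A rough plain-Java sketch of that routing follows. It assumes simple hash-modulo partitioning, which may differ from what Storm does internally, and FieldsGroupingSketch and taskFor are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: routes a grouping key to one of numTasks tasks.
// Assumption: hash-modulo partitioning; Storm's real routing may differ.
class FieldsGroupingSketch {
    static int taskFor(String key, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cow", "the", "moon", "cow");
        for (String w : words) {
            // Equal words always map to the same task index.
            System.out.println(w + " -> count task " + taskFor(w, 12));
        }
    }
}
```

By contrast, shuffleGrouping spreads tuples across tasks without regard to their content, which is fine for the stateless "split" step.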
Trident
Cascading for Storm
Trident Facilities
• Joins
• Aggregations
• Grouping
• Functions
• Filters
• Consistent, Exactly-Once Semantics
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
            .each(new Fields("sentence"), new Split(), new Fields("word"))
            .groupBy(new Fields("word"))
            .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
            .parallelismHint(6);
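To make the split, group and count steps concrete, here is a small plain-Java sketch that computes the same word counts over a finite list of sentences. It is not Trident code: WordCountSketch and countWords are hypothetical names, whitespace splitting is an assumption about what Split does, and unlike the topology above it handles a fixed batch and keeps no persistent state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustration only: the split -> groupBy -> count steps on a finite input.
class WordCountSketch {
    static Map<String, Long> countWords(List<String> sentences) {
        return sentences.stream()
                // Split each sentence into words (assumed whitespace-delimited).
                .flatMap(s -> Arrays.stream(s.split("\\s+")))
                .filter(w -> !w.isEmpty())
                // Group by word and count occurrences; TreeMap gives sorted keys.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords(Arrays.asList(
                "the cow jumped over the moon",
                "the man went to the store"));
        System.out.println(counts);
    }
}
```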
Interoperability
spouts
•Kafka (with transactions)
•Kestrel
•JMS
•AMQP
•Beanstalkd
bolts
• Functions
• Filters
• Aggregation
• Joins
• Talk to databases, Hadoop write-behind
[Diagram: Raw Data, Queue, Hadoop (batch processes), Storm (realtime processes), Apps, Business Value]
Parallel Cluster Ingest
[Diagram: as above, with Franz and TailSpout added]
[Diagram: as above without the Queue; Franz and TailSpout remain]
Use Cases
[Diagram: URL → Tweeters → Followers → Distinct followers → Reach]
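Reading the diagram labels as a pipeline, reach is the number of distinct followers of everyone who tweeted a given URL. A minimal plain-Java sketch of that calculation is below. The in-memory maps are hypothetical stand-ins for whatever lookups a real system would use, the sample data is made up, and nothing here runs on Storm.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: URL -> tweeters -> followers -> distinct followers -> reach.
class ReachSketch {
    static int reach(String url,
                     Map<String, List<String>> tweetersByUrl,
                     Map<String, List<String>> followersByTweeter) {
        Set<String> distinctFollowers = new HashSet<>();
        for (String tweeter : tweetersByUrl.getOrDefault(url, Collections.emptyList())) {
            // The set removes followers shared between tweeters.
            distinctFollowers.addAll(
                    followersByTweeter.getOrDefault(tweeter, Collections.emptyList()));
        }
        return distinctFollowers.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> tweetersByUrl = new HashMap<>();
        tweetersByUrl.put("http://example.com/a", Arrays.asList("alice", "bob"));

        Map<String, List<String>> followersByTweeter = new HashMap<>();
        followersByTweeter.put("alice", Arrays.asList("carol", "dave"));
        followersByTweeter.put("bob", Arrays.asList("dave", "erin"));

        // carol, dave and erin are the distinct followers.
        System.out.println(reach("http://example.com/a", tweetersByUrl, followersByTweeter));
    }
}
```

The follower lookups for different tweeters are independent of each other, so they can be done in parallel and merged in a final distinct-and-count step.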
Heartbyte
Fleet Logistics