Apache Storm Real-Time Event Processing

Page 1: Apache storm

Apache Storm: Real-Time Event Processing

Page 2: Apache storm

Big Data Tools
• Data Processing
  • Perform calculations on datasets, e.g. Storm
• Data Transfer
  • Gather and ingest data into data processing systems, e.g. Kafka
• Data Storage
  • Store the datasets during the various data processing stages, e.g. Hadoop

Page 3: Apache storm

Apache Storm
A distributed, real-time computational framework used to process unbounded streams.
• It integrates with messaging and persistence frameworks.
• It consumes streams of data from different data sources.
• It processes and transforms the streams in different ways.

Page 4: Apache storm

Apache Storm Concepts
Topology
A Storm topology represents a graph of computations made up of:
• Nodes
  • Represent individual computations
• Edges
  • Represent the data being passed between nodes

A topology is driven by a continuous live feed of data and performs operations on that data as it arrives.
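The graph idea above can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions, not Storm's API: the `Topology` class, its `add_node`, `add_edge`, and `run` methods, and the node names `read`, `split`, and `count` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class Topology:
    """Toy model of a topology: named nodes joined by directed edges."""

    def __init__(self):
        self.nodes = {}                 # node name -> computation (a callable)
        self.edges = defaultdict(list)  # node name -> downstream node names

    def add_node(self, name, compute):
        self.nodes[name] = compute

    def add_edge(self, source, target):
        self.edges[source].append(target)

    def run(self, start, value):
        """Push one value through the graph, breadth-first from `start`."""
        results = {}
        pending = deque([(start, value)])
        while pending:
            name, data = pending.popleft()
            output = self.nodes[name](data)   # the node's computation
            results[name] = output
            for target in self.edges[name]:   # data travels along each edge
                pending.append((target, output))
        return results


# Hypothetical three-node pipeline: read -> split -> count
topology = Topology()
topology.add_node("read", lambda line: line.strip())
topology.add_node("split", lambda line: line.split())
topology.add_node("count", lambda words: len(words))
topology.add_edge("read", "split")
topology.add_edge("split", "count")

results = topology.run("read", "  storm processes streams  ")
print(results["count"])
```

A real topology runs continuously on a live feed; this sketch pushes a single value through the graph to show how nodes and edges relate.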

Page 5: Apache storm

Topology

[Diagram: Node → Edge → Node → Edge → Node]

Page 6: Apache storm

Apache Storm Concepts
• Tuple
  • Data is sent between nodes in the form of tuples.
• Stream
  • An unbounded sequence of tuples flowing between two nodes.
• Spout
  • The source of a stream in a topology.
• Bolt
  • A computational node that accepts an input stream and performs computations on it.
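One way to picture how these four concepts fit together is with Python generators, where each stream is a lazy sequence of tuples. This is a rough sketch rather than Storm code: the function names `sentence_spout`, `split_bolt`, and `upper_bolt` and the sample feed are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple


def sentence_spout(lines: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: the source of the stream; wraps each input line in a tuple."""
    for line in lines:
        yield (line,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def upper_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: transforms each word tuple to upper case."""
    for (word,) in stream:
        yield (word.upper(),)


# A finite list stands in for the unbounded live feed.
feed = ["storm is fast", "kafka is durable"]
stream = upper_bolt(split_bolt(sentence_spout(feed)))
words = [word for (word,) in stream]
print(words)
```

Because generators are lazy, each tuple flows through both bolts before the next one is read, which loosely resembles how a stream is processed tuple by tuple.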

Page 7: Apache storm

Topology

[Diagram: Messaging System → live feed of data → Spout → Stream → Bolt → Stream → Bolt]

Page 8: Apache storm

Apache Storm Concepts
• Spout
  • Receives data by
    • Listening to a message queue for incoming messages
    • Listening for database changes
    • Listening to other sources of data feeds
  • Acts as the source of a stream
    • Reads data from the data source
    • Emits tuples to the next type of node, called a Bolt.
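The "listen to a message queue, then emit" behaviour can be sketched with Python's standard `queue` module standing in for a real messaging system. The `QueueSpout` class, its `next_tuple` method, and the `emitted` list are hypothetical names for illustration; an actual Storm spout is written against Storm's own interfaces.

```python
import queue


class QueueSpout:
    """Toy spout that polls an in-memory queue and emits one tuple per message."""

    def __init__(self, source):
        self.source = source
        self.emitted = []  # stands in for the spout's output stream

    def next_tuple(self):
        """Poll once; emit a tuple if a message is waiting.

        Returns True when a tuple was emitted, False when the queue was empty.
        """
        try:
            message = self.source.get_nowait()
        except queue.Empty:
            return False
        self.emitted.append((message,))
        return True


# Hypothetical messages; a real feed would keep arriving.
messages = queue.Queue()
for text in ["order-1", "order-2"]:
    messages.put(text)

spout = QueueSpout(messages)
while spout.next_tuple():
    pass

print(spout.emitted)
```

Polling without blocking lets the caller decide what to do when no data is waiting, such as sleeping briefly before the next poll.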

Page 9: Apache storm

Apache Storm Concepts
• Bolt
  • Accepts tuples from its input stream
  • Performs computation/transformation
  • Performs filtering, aggregation, or joins
  • Emits new tuples to its output stream
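As an illustration of filtering and aggregation inside a bolt, the following plain-Python sketch drops short words and keeps a running count of the rest. The `WordCountBolt` class, its `execute` method, and the `min_length` parameter are assumptions made for this example, not part of Storm.

```python
from collections import Counter


class WordCountBolt:
    """Toy bolt: filters out short words and keeps a running count per word."""

    def __init__(self, min_length=1):
        self.min_length = min_length
        self.counts = Counter()

    def execute(self, tup):
        """Process one input tuple; return the emitted tuple, or None if filtered."""
        (word,) = tup
        if len(word) < self.min_length:
            return None                      # filtering: nothing is emitted
        self.counts[word] += 1               # aggregation: running total
        return (word, self.counts[word])     # new tuple for the output stream


bolt = WordCountBolt(min_length=3)
incoming = [("storm",), ("is",), ("storm",), ("kafka",)]
outgoing = [out for out in map(bolt.execute, incoming) if out is not None]
print(outgoing)
```

The bolt holds its counts in memory, so in this sketch the state is lost when the process stops; persisting it is outside the scope of the example.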

Page 10: Apache storm

Apache Kafka
Kafka is a distributed publish-subscribe messaging system designed to be fast, scalable, and durable.
• Kafka maintains feeds of messages in topics.
• Producers write data to topics and consumers read from topics.
• Topics are partitioned and replicated across multiple nodes.
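The topic and partition model can be pictured with a small in-memory sketch. It is a simplification under stated assumptions: the `Topic` class and its `produce` and `consume` methods are hypothetical, `zlib.crc32` is a stand-in for Kafka's real key hashing, and replication across nodes is not modelled at all.

```python
import zlib


class Topic:
    """Toy model of a topic split into append-only partitions."""

    def __init__(self, name, partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        """Append a value to the partition chosen by hashing the key."""
        # crc32 is only a stand-in hash for this sketch.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(value)
        return index

    def consume(self, partition, offset):
        """Read every message in a partition starting from `offset`."""
        return self.partitions[partition][offset:]


topic = Topic("page-views", partitions=3)
first = topic.produce("user-1", "home")
second = topic.produce("user-1", "cart")
topic.produce("user-2", "home")

# Messages with the same key land in the same partition, in order.
print(first == second)
print(topic.consume(first, 0)[:2])
```

Keeping each partition as an append-only list means a consumer only needs to remember an offset to resume reading where it left off.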

Page 11: Apache storm

Kafka Configuration
KAFKA_HOME\config\server.properties
# A comma separated list of directories under which to store log files
log.dirs=C:/Installers/kafka/kafka-logs

KAFKA_HOME\config\zookeeper.properties
# the directory where the snapshot is stored.
dataDir=C:/Installers/kafka/zookeeper-data
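To show how a setting such as `log.dirs` is structured, here is a small Python sketch that reads `key=value` lines and splits the comma-separated value into individual directories. The `parse_properties` helper is hypothetical and handles only the simple form shown on this slide, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Sample content copied from the server.properties fragment on this slide.
sample = """
# A comma separated list of directories under which to store log files
log.dirs=C:/Installers/kafka/kafka-logs
"""

settings = parse_properties(sample)
log_dirs = [path.strip() for path in settings["log.dirs"].split(",")]
print(log_dirs)
```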

Page 12: Apache storm

Kafka Commands
Start Zookeeper
$ bin/zookeeper-server-start.sh config/zookeeper.properties

$ bin\windows\zookeeper-server-start.bat config\zookeeper.properties

Start Kafka Broker
$ bin/kafka-server-start.sh config/server.properties

$ bin\windows\kafka-server-start.bat config\server.properties

List Topics
$ bin/kafka-topics.sh --list --zookeeper localhost:2181

$ bin\windows\kafka-topics.bat --list --zookeeper localhost:2181