kapil-kumar
Apache Storm: Real-Time Event Processing
Big Data Tools
• Data Processing
  • Perform calculations on datasets, e.g. Storm
• Data Transfer
  • Gather and ingest data into data processing systems, e.g. Kafka
• Data Storage
  • Store the datasets during various data processing stages, e.g. Hadoop
Apache Storm
A distributed, real-time computation framework used to process unbounded streams.
• It integrates with messaging and persistence frameworks.
• It consumes streams of data from different data sources.
• It processes and transforms the streams in different ways.
Apache Storm Concepts
Topology
A Storm topology represents a graph of computations using:
• Nodes
  • Represent individual computations
• Edges
  • Represent data being passed between nodes
A topology is driven by a continuous live feed of data and performs some operation on it as it arrives.
Topology (diagram): Node -- Edge --> Node -- Edge --> Node
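The graph idea above can be sketched in a few lines of plain Python. This is only a conceptual model, not Storm's API: the node names (`read_events`, `parse`, `count`) and the `processing_order` helper are hypothetical and chosen for illustration. It shows a topology as nodes plus directed edges, and derives an order in which data would flow through the nodes.

```python
from collections import deque

# Hypothetical three-node topology: each key is a node, and each value
# lists the nodes that its outgoing edges point to.
TOPOLOGY = {
    "read_events": ["parse"],
    "parse": ["count"],
    "count": [],
}

def processing_order(graph):
    """Return the nodes so that every node comes after its upstream nodes."""
    # Count the incoming edges of each node.
    incoming = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            incoming[target] += 1

    # Nodes without incoming edges are the sources of the graph.
    ready = deque(node for node, count in incoming.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in graph[node]:
            incoming[target] -= 1
            if incoming[target] == 0:
                ready.append(target)
    return order

order = processing_order(TOPOLOGY)
```

In this sketch the edges only describe where data goes next; in a running topology every node would be working at the same time on whatever data has reached it.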
Apache Storm Concepts
• Tuple
  • Data is sent between nodes in the form of tuples.
• Stream
  • An unbounded sequence of tuples between two nodes.
• Spout
  • The source of a stream in a topology.
• Bolt
  • A computational node that accepts an input stream and performs computations.
Topology (diagram): Messaging System (live feed of data) --> Spout -- Stream --> Bolt -- Stream --> Bolt
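To make "tuple" and "unbounded stream" concrete, here is a minimal Python sketch. It does not use Storm; the `SensorReading` schema and the `reading_stream` generator are assumptions made up for this example. A tuple is modeled as a named list of values, and a stream as a generator that never finishes.

```python
import itertools
from collections import namedtuple

# Hypothetical tuple schema: a named list of values passed between nodes.
SensorReading = namedtuple("SensorReading", ["sensor_id", "value"])

def reading_stream():
    """Unbounded stream: yields tuples forever, one at a time."""
    for n in itertools.count():
        yield SensorReading(sensor_id=n % 3, value=n * 10)

# A stream has no end, so a consumer handles tuples as they arrive.
# Here only the first four are taken for inspection.
first_four = list(itertools.islice(reading_stream(), 4))
```

Because the stream never ends, calling `list(reading_stream())` directly would not return; downstream code has to work tuple by tuple.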
Apache Storm Concepts
• Spout
  • Receives data by
    • Listening to a message queue for incoming messages
    • Listening to database changes
    • Listening to other sources of data feeds
  • Acts as the source of a stream
    • Reads data from a data source
    • Emits tuples to the next type of node, called a Bolt
Apache Storm Concepts
• Bolt
  • Accepts tuples from its input stream
  • Performs computation or transformation
  • Performs filtering, aggregation, or perhaps a join
  • Emits new tuples to its output stream
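The spout and bolt roles described above can be imitated with Python generators. This is a sketch under stated assumptions, not Storm code: the function names (`queue_spout`, `split_bolt`, `filter_bolt`, `count_bolt`), the in-memory queue, and the sample sentences are all hypothetical. It shows one spout reading from a message queue and three bolts that transform, filter, and aggregate tuples in turn.

```python
from collections import deque

def queue_spout(message_queue):
    """Spout: reads messages from a source and emits one tuple per message."""
    while message_queue:
        yield (message_queue.popleft(),)

def split_bolt(stream):
    """Bolt: transforms each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)

def filter_bolt(stream, stop_words):
    """Bolt: drops tuples whose word is in the stop-word set."""
    for (word,) in stream:
        if word not in stop_words:
            yield (word,)

def count_bolt(stream):
    """Bolt: aggregates per word and emits (word, running_count) tuples."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])

# Wire the nodes together: spout -> split -> filter -> count.
queue = deque(["Storm processes streams", "Kafka feeds Storm"])
pipeline = count_bolt(filter_bolt(split_bolt(queue_spout(queue)), {"feeds"}))
results = list(pipeline)
```

The spout here stops when the queue is empty so that the example terminates; a real spout would keep listening for new messages.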
Apache Kafka
Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable.
• Kafka maintains feeds of messages in topics.
• Producers write data to topics and consumers read from topics.
• Topics are partitioned and replicated across multiple nodes.
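The topic, producer, and consumer roles can be illustrated with a toy in-memory model. This is not the Kafka client API: the `Topic` class, its methods, and the CRC32-based partition choice are assumptions for illustration only (Kafka's own partitioner uses a different hash, and replication is not modeled here). The sketch shows messages being appended to partitions by key and read back from an offset.

```python
import zlib

class Topic:
    """Toy model of a partitioned topic: each partition is an append-only log."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        """Append a message; the same key always maps to the same partition."""
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(value)
        return index

    def consume(self, partition, offset):
        """Read every message in a partition from the given offset onward."""
        return self.partitions[partition][offset:]

topic = Topic("events", num_partitions=3)
p1 = topic.produce("user-1", "login")
p2 = topic.produce("user-1", "logout")
messages = topic.consume(p1, 0)
```

Keeping all messages for one key in a single partition is what lets a consumer see them in the order they were written.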
Kafka Configuration
KAFKA_HOME\config\server.properties
# A comma separated list of directories under which to store log files
log.dirs=C:/Installers/kafka/kafka-logs
KAFKA_HOME\config\zookeeper.properties
# the directory where the snapshot is stored.
dataDir=C:/Installers/kafka/zookeeper-data
Kafka Commands
Start Zookeeper
$ bin/zookeeper-server-start.sh config/zookeeper.properties
$ bin\windows\zookeeper-server-start.bat config\zookeeper.properties
Start Kafka Broker
$ bin/kafka-server-start.sh config/server.properties
$ bin\windows\kafka-server-start.bat config\server.properties
List Topics
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
$ bin\windows\kafka-topics.bat --list --zookeeper localhost:2181