StreamAnalytix 2.1.6 Emitters

STREAMANALYTIX 2.1.6 EMITTERS

Use drag and drop Operators to store data in any static store.




Introduction

Welcome to StreamAnalytix! The StreamAnalytix platform enables enterprises to analyze and respond to events in real time at Big Data scale. With its unique multi-engine architecture, StreamAnalytix provides an abstraction that offers the flexibility to execute data pipelines using the stream processing engine of choice, depending upon the application use case and taking into account the advantages of Storm or Spark Streaming based upon processing methodology (CEP, ESP) and latency.

About This Guide

StreamAnalytix Emitters are the drag and drop operators for storing processed data in Hadoop, NoSQL stores, databases, indexing stores, or third-party BI tools. This guide lists all the StreamAnalytix Emitters, describes their configuration properties, and explains their use in detail.

More Information

Please visit www.streamanalytix.com. To give us your feedback on your experience with the application and to report bugs or problems, mail us at [email protected]. To receive updated documentation in the future, please register yourself at www.streamanalytix.com. We welcome your feedback.


Terms & Conditions

This manual, the accompanying software, and other documentation are protected by U.S. and international copyright laws, and may be used only in accordance with the accompanying license agreement. Features of the software, and of other products and services of Impetus Technologies, may be covered by one or more patents. All rights reserved.

All other company, brand, and product names are registered trademarks or trademarks of their respective holders. Impetus Technologies disclaims any responsibility for specifying which marks are owned by which companies or organizations.

USA, Los Gatos
Impetus Technologies, Inc.
720 University Avenue, Suite 130
Los Gatos, CA 95032, USA
Ph.: 408.252.7111, 408.213.3310
Fax: 408.252.7114
© 2017 Impetus Technologies, Inc. All rights reserved.

If you have any comments or suggestions regarding this document, please send them via e-mail to [email protected]


Table of Contents

Introduction
About This Guide
Terms & Conditions
EMITTERS
HDFS/Native HDFS
Advanced HDFS
Cassandra
HBase
JDBC
ElasticSearch
Solr
Kafka
RabbitMQ
ActiveMQ
Foreach
Streaming
Router


EMITTERS

Emitters define the destination stage of a pipeline, which could be a NoSQL store, indexer, relational database, or third-party BI tool. For Spark pipelines, you can use the following emitters:

Native HDFS: Writes data into HDFS.
Advanced HDFS: Writes data into HDFS with advanced configuration.
Cassandra: Writes data to a Cassandra cluster.
HBase: Writes data to an HBase cluster.
Elasticsearch: Writes data to an Elasticsearch cluster.
Solr: Writes data to a Solr cluster.
Kafka: Writes data to Kafka.
RabbitMQ: Writes data to RabbitMQ.
MapRStreams: Writes data to MapR Streams.
JDBC: Writes data via JDBC.
Foreach: Passes data to a custom foreach implementation.
Streaming: Writes data to built-in real-time dashboards.
Router: Writes into another pipeline.
Custom: Writes data into any destination.

For Storm pipelines, you can use the following emitters:

HDFS: Writes data into HDFS.
Hive: Writes data into Hive tables.
Cassandra: Writes data to a Cassandra cluster.
HBase: Writes data to an HBase cluster.
Elasticsearch: Writes data to an Elasticsearch cluster.
Solr: Writes data to a Solr cluster.
Kafka: Writes data to Kafka.
RabbitMQ: Writes data to RabbitMQ.
MapRStreams: Writes data to MapR Streams.
JDBC: Writes data via JDBC.
ActiveMQ: Writes data to ActiveMQ.
Streaming: Writes data to built-in real-time dashboards.
Router: Writes into another pipeline.
Custom: Writes data into any destination.


HDFS/Native HDFS

The HDFS emitter can sink pipeline data into HDFS (Hadoop Distributed File System).

Configure HDFS Persister

To add an HDFS emitter to your pipeline, drag the emitter onto the canvas, connect it to a channel or processor, and right-click it to configure it.

Connection Name: Select an HDFS connection.

Add Configuration: Add custom HDFS properties.

HDFS Path attributes:

HDFS Path: Directory path on HDFS where data has to be written.

Fields: Message fields that will be persisted.

Control File: Enable control file creation, providing information about outputFileName, creationDate, numberOfRecords, dateOfProcessing, blockSize, replicationFactor, streamingSource, compressionFormat, and fieldDelimiter for the created HDFS file.
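As a rough illustration of the attributes listed above, the sketch below writes a control (metadata) file as JSON. The field names follow the list above, but the actual file format StreamAnalytix produces is not specified here, so treat this purely as an assumption:

```python
import json

# Hypothetical sketch: write a control file describing an HDFS output file.
# Field names mirror the attributes listed above; the real format used by
# StreamAnalytix may differ.
def write_control_file(path, **attrs):
    control = {
        "outputFileName": attrs.get("outputFileName"),
        "creationDate": attrs.get("creationDate"),
        "numberOfRecords": attrs.get("numberOfRecords"),
        "blockSize": attrs.get("blockSize"),
        "replicationFactor": attrs.get("replicationFactor"),
        "compressionFormat": attrs.get("compressionFormat"),
        "fieldDelimiter": attrs.get("fieldDelimiter"),
    }
    with open(path, "w") as f:
        json.dump(control, f)
    return control

meta = write_control_file(
    "/tmp/out.ctl",
    outputFileName="out.csv",
    creationDate="2017-01-01",
    numberOfRecords=100,
    blockSize=134217728,   # 128 MB, a common HDFS block size
    replicationFactor=3,
    compressionFormat="gzip",
    fieldDelimiter=",",
)
```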


HDFS Directory Prefix: Prefix added to each directory created on HDFS.

Block Size: Size of each block (in bytes) allocated in HDFS.

Replication: Replication factor used to make additional copies of data.

Output Type: Output format in which results will be processed.

Delimiter: Delimiter character used to separate two fields.

Compression Type: Algorithm used to compress the data.

Location: Location to which the HDFS file gets moved.

Native HDFS

The Native HDFS emitter is the same as the HDFS emitter but is used for Spark pipelines.


Advanced HDFS

The Advanced HDFS emitter can persist data into multiple HDFS directories. To configure an Advanced HDFS emitter, provide the HDFS directory paths along with the list of message fields. This list of fields is stored in an HDFS file, in the specified delimited format, inside the provided HDFS directory. The emitter also supports creation of a metadata file for the HDFS file and allows you to configure the following:

1. The prefix for the file.
2. The MVEL expression for the HDFS path, using the pipeline-scoped variables and incoming message fields. An example of a valid MVEL expression is /abc/@{messageName.field5}, where messageName is the name of the incoming message and field5 is one of the fields within the message.
3. The replication factor for the file.
4. The block size for the file.
5. The delimiter type for the record values.
6. The compression format for the file.
7. The maximum size for the file. Once the file size reaches the maximum value, a new file is created. A file of unlimited size can be created by providing the size as zero (0).
8. The number of records to synchronize at a time.

Configure Advanced HDFS

To add an Advanced HDFS emitter to your pipeline, drag the emitter onto the canvas, connect it to a channel or processor, and right-click it to configure it.


Connection Name: Select an HDFS connection.

Max Active Writers: Number of writers that can write data to HDFS.

Sync Interval: Time interval after which data will be synchronized with HDFS.

Add Configuration: Add additional custom HDFS properties.

Add New Path: Set multiple HDFS paths to write data.

HDFS Path: Directory path on HDFS where data has to be written.

File Output Format: Select the output format from the drop-down list.

Fields: Message fields that will be persisted.

Control File: Enable control file creation, providing information about outputFileName, creationDate, numberOfRecords, dateOfProcessing, blockSize, replicationFactor, streamingSource, compressionFormat, and fieldDelimiter for the created HDFS file.

HDFS File Prefix: Prefix added to each file created on HDFS.

Sync Size: Number of records after which data has to be synced to disk.

Block Size: Size of each block (in bytes) allocated in HDFS.

Rotation Policy: Rotation policy that rotates files depending on a specified size or time.

File Rotation Size: Trigger file rotation when data files reach a specified file size (in bytes).

Replication: Replication factor used to make additional copies of data.

Output Type: Output format in which results will be processed.

Delimiter: Delimiter character used to separate two fields.

Compression Type: Algorithm used to compress the data.
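The size-based rotation described by Rotation Policy and File Rotation Size can be sketched as follows. This is a hypothetical helper, not StreamAnalytix code; it only illustrates the behaviour of rolling to a new file when the size limit is reached, with zero meaning unlimited size:

```python
# Sketch of size-based file rotation: once a file would exceed the configured
# rotation size, writing continues in a new file. A size of 0 disables
# rotation (unlimited file size), matching the description above.
class SizeRotationPolicy:
    def __init__(self, rotation_size_bytes: int):
        self.rotation_size = rotation_size_bytes
        self.bytes_written = 0
        self.file_index = 0          # which output file we are on

    def write(self, record: bytes) -> int:
        """Record a write and return the index of the file it lands in."""
        if self.rotation_size > 0 and self.bytes_written + len(record) > self.rotation_size:
            self.file_index += 1     # rotate: start a new file
            self.bytes_written = 0
        self.bytes_written += len(record)
        return self.file_index

policy = SizeRotationPolicy(rotation_size_bytes=10)
files = [policy.write(b"x" * 4) for _ in range(5)]  # five 4-byte records
```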


Cassandra

The Cassandra emitter persists data into a Cassandra cluster. Cassandra is highly scalable and can manage huge amounts of structured data. It provides high availability with no single point of failure. You can specify the Cassandra storage configurations at group level (including the target table name, using an MVEL expression) and enable data compression.

Configure Cassandra Emitter for Spark

To add a Cassandra emitter to your pipeline, drag it onto the canvas, connect it to a channel or processor, and right-click it to provide configuration settings.

Connection Name: Select a Cassandra connection.

Key Space: Namespace that defines data replication on nodes. The keyspace is auto-created with the convention 'ns_' + tenantId, where tenantId is the workspace id.

Data Definition Source: Tells whether to use Message level configuration or Custom.

Message Configuration: During message creation, you can define a table name for each field; those table names are used by the persister during data insertion. If each user-defined column has a different table name, then persistId, saxMessageType, and IndexId become part of every table, and persistId is the PRIMARY key of each table. With this option you cannot define a secondary or composite index, nor select output fields; all columns whose 'store' property (under the persistence tab) is marked TRUE at the message level are persisted.

Custom: This option helps to override the configuration defined at the message level for fields.

If the option selected is Message Configuration, the following fields are displayed:

Consistency Level: How up-to-date and synchronized a row of Cassandra data is on all of its replicas.

Caching: Cassandra includes integrated caching and distributes cache data around the cluster. You can enable or disable caching by selecting ALL/NONE.

Compression: Enables compression on data.

Replication Strategy: Specifies the implementation class for determining the nodes where replicas are placed. Possible strategies are 'SimpleStrategy' and 'NetworkTopologyStrategy'.

Replication Factor: For a replication factor N, Cassandra creates N-1 additional replicas of the record.

Thrift Client Retries: Number of retries the Cassandra client makes to connect to the server.

Thrift Client Retries Interval in Ms: Interval, in milliseconds, between the Cassandra client's connection retries.

Ignore Null Values: If enabled, every update/insert validates the values of all fields. If a value is null or empty, that field is not part of the update. This helps you keep the previous data of the column.

If the Data Definition Source selected is Custom, the following fields are displayed:

Output Fields: Fields in the message that need to be part of the data.

Key Columns: Fields that are part of Output Fields can be used here. This helps define a primary or composite key for the table.

Secondary Index: Fields that are part of Output Fields can be used here. The selected fields are used to create the clustering key.

Table Name Expression: MVEL expression used to evaluate the table name. For example, consider the expression @{'ns_1_myindex' + Math.round(<MessageName>.timestamp / (3600*1000))}. Here a new table is created for each one-hour time range, and data is dynamically written on the basis of the field whose alias name is 'timestamp'.
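The hour-bucketing performed by an expression of this shape can be sketched in Python. The timestamps below are hypothetical values; in the product the evaluation is done by the MVEL engine, not Python:

```python
# Sketch of the time-bucketed table-name evaluation: a millisecond
# timestamp divided by 3600*1000 buckets records by hour, so all records
# in the same hour land in the same table.
def table_name(timestamp_ms: int, prefix: str = "ns_1_myindex") -> str:
    hour_bucket = round(timestamp_ms / (3600 * 1000))
    return prefix + str(hour_bucket)

# Two timestamps ten minutes apart fall in the same hourly bucket,
# so they map to the same table name.
t1 = table_name(1_500_000_000_000)
t2 = table_name(1_500_000_600_000)
```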


NOTE: Batching is not configured for Cassandra in the case of Spark, since Spark implicitly batches records based on a time window.

Configure Cassandra Emitter for Storm

Connection Name: Select a Cassandra connection.

Parallelism: Number of executors (threads) of the emitter.

Task Count: Number of instances of the emitter.

Is Batch Enable: If set to true, the data is inserted in batches; you then need to specify a batch size.

KeySpace: Defines a new or existing keyspace and its replica placement strategy.

Data Definition Source: Tells whether to use Message level configuration or Custom.

Message Configuration: During message creation, you can define a table name for each field; those table names are used by the persister during data insertion. If each user-defined column has a different table name, then persistId, saxMessageType, and IndexId become part of every table, and persistId is the PRIMARY key of each table. With this option you cannot define a secondary or composite index, nor select output fields; all columns whose 'store' property (under the persistence tab) is marked TRUE at the message level are persisted.

Custom: This option helps to override the configuration defined at the message level for fields.

If the option selected is Message Configuration, the following fields are displayed:

Consistency Level: How up-to-date and synchronized a row of Cassandra data is on all of its replicas.

Caching: Cassandra includes integrated caching and distributes cache data around the cluster. You can enable or disable caching by selecting ALL/NONE.

Compression: Enables compression on data.

Replication Strategy: Specifies the implementation class for determining the nodes where replicas are placed. Possible strategies are 'SimpleStrategy' and 'NetworkTopologyStrategy'.

Replication Factor: For a replication factor N, Cassandra creates N-1 additional replicas of the record.

Thrift Client Retries: Number of retries the Cassandra client makes to connect to the server.

Thrift Client Retries Interval in Ms: Interval, in milliseconds, between the Cassandra client's connection retries.

Ignore Null Values: If enabled, every update/insert validates the values of all fields. If a value is null or empty, that field is not part of the update. This helps you keep the previous data of the column.

If the Data Definition Source selected is Custom, the following fields are displayed:

Table Name Expression: MVEL expression used to evaluate the table name. For example, consider the expression @{'ns_1_myindex' + Math.round(<MessageName>.timestamp / (3600*1000))}. Here a new table is created for each one-hour time range, and data is dynamically written on the basis of the field whose alias name is 'timestamp'.

Output Fields: Fields in the message that need to be part of the data.

Key Columns: Fields that are part of Output Fields can be used here. This helps define a primary or composite key for the table.


Secondary Index: Fields that are part of Output Fields can be used here. The selected fields are used to create the clustering key.

HBase

The HBase emitter stores streaming data into HBase. HBase provides quick random access to huge amounts of structured data.

Configure HBase Emitter for Spark

To add an HBase emitter to your pipeline, drag it onto the canvas, connect it to a channel or processor, and right-click it to configure.

Connection Name: Select an HBase connection.

Operation Type: RDD: Spark native APIs are used to persist data in HBase. Record Based: Provides a custom implementation for data ingestion in HBase.

Output Message: Name of the message(s) to be used in the pipeline.

Table Name Expression: MVEL expression used to evaluate the persistence table name. The keyspace is created by StreamAnalytix with the convention 'ns_' + tenantId, where tenantId is your workspace id.

Compression: Enables compression on data if selected as True. It provides the facility to compress the message before storing it. The algorithm used is Snappy.

Region Splitting: Defines how the HBase tables should be pre-split. The default value is 'No pre-split'. The supported options are:

Default: No Pre-Split: Only one region is created initially.

Based on Region Boundaries: Regions are created on the basis of the given key boundaries. For example, if your key is a hexadecimal key and you provide the value '4, 8, d', four regions are created as follows:

• 1st region for keys less than 4
• 2nd region for keys greater than 4 and less than 8
• 3rd region for keys greater than 8 and less than d
• 4th region for keys greater than d
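The boundary behaviour described above can be sketched as a simple lookup. This is a hypothetical helper for illustration, not part of StreamAnalytix or the HBase API:

```python
from bisect import bisect_right

def region_for_key(key: str, split_points=("4", "8", "d")) -> int:
    """Return the (1-based) region index a row key falls into, given the
    pre-split boundaries. Keys compare lexicographically, as HBase row
    keys (byte strings) do."""
    return bisect_right(list(split_points), key) + 1

# '2' sorts before '4'  -> region 1
# '5' falls between '4' and '8' -> region 2
# 'f' sorts after 'd'   -> region 4
```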

Encoding: Data encoding type, either UTF-8 (base encoding) or BASE64 (64-bit encoding).

Exclude Empty Fields: Ignore null values or include null values.

Configure HBase Emitter for Storm


Connection Name: Select an HBase connection.

Parallelism: Number of executors (threads) of the emitter.

Task Count: Number of instances of the emitter.

Is Batch Enable: TRUE to insert data in batches (specify a batch size for the batches).

Table Name Expression: MVEL expression used to populate the table name. For example, consider the expression @{'ns_1_myindex' + Math.round(<MessageName>.timestamp / (3600*1000))}. Here a new table is created for each one-hour time range, and data is dynamically written on the basis of the field whose alias name is 'timestamp'.

Compression: TRUE to enable data compression.

Region Splitting Definition: Defines how the HBase tables should be pre-split. The default is 'No pre-split'; alternatively, regions can be created on the basis of given key boundaries.

Encoding: Data encoding type, either UTF-8 (base encoding) or BASE64 (64-bit encoding).

Exclude Empty Fields: Ignore null values or include null values.

Add Configuration: Add additional custom HBase properties.


JDBC

The JDBC emitter allows you to push data into the following relational databases: MySQL, PostgreSQL, and Oracle DB.

Configure JDBC Emitter for Spark

To add a JDBC emitter to your pipeline, drag the JDBC emitter onto the canvas and connect it to a channel or processor. The configuration settings of the JDBC emitter are as follows:

Connection Name: Select a database connection.

Message Name: Name of the message(s) to be used in the pipeline.

Is Batch Enable: Enable this parameter to batch multiple messages and improve write performance.

Batch Size: Number of messages to be batched together.

Table Name: Existing database table name whose schema is to be fetched.

Fetch Schema: Fetches the schema of the table mentioned in Table Name. The key symbol shown against a field indicates that it is the primary key of the table. Select the "ignore" checkbox to skip a field; data is not pushed into an ignored column. Incoming pipeline data can be mapped to these table fields. You can provide static or dynamic values in the table; for dynamic values, you can use an MVEL expression. Some predefined database functions can be mapped to table field values; for example, the now() and sysdate() functions are used for fetching the current date.

Add Configuration: Add additional custom properties.
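As a rough sketch of the batching behaviour the "Is Batch Enable" and "Batch Size" settings describe, the snippet below buffers records and flushes them together. It uses an in-memory SQLite database as a stand-in for the target JDBC store; the table and field names are hypothetical, and this is not StreamAnalytix code:

```python
import sqlite3

# Stand-in for a JDBC target: an in-memory SQLite table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

batch_size = 3
buffer = []

def emit(record):
    """Buffer a record and flush the whole batch once it is full."""
    buffer.append(record)
    if len(buffer) >= batch_size:
        conn.executemany("INSERT INTO events (payload) VALUES (?)", buffer)
        conn.commit()
        buffer.clear()

for msg in ["a", "b", "c", "d"]:
    emit((msg,))

# The first three records were flushed as one batch; "d" is still buffered
# awaiting the next flush.
```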

Configure JDBC Emitter for Storm

To add a JDBC emitter to your pipeline, drag the JDBC emitter onto the canvas and connect it to a channel or processor. The configuration settings of the JDBC emitter are as follows:

Connection Name: Select a database connection.

Message Name: Name of the message(s) to be used in the pipeline.

Is Batch Enable: Enable this parameter to batch multiple messages and improve write performance.

Batch Size: Number of messages to be batched together.

Table Name: Existing database table name whose schema is to be fetched.

Fetch Schema: Fetches the schema of the table mentioned in Table Name. The key symbol shown against a field indicates that it is the primary key of the table. Select the "ignore" checkbox to skip a field; data is not pushed into an ignored column. Incoming pipeline data can be mapped to these table fields. You can provide static or dynamic values in the table; for dynamic values, you can use an MVEL expression. Some predefined database functions can be mapped to table field values; for example, the now() and sysdate() functions are used for fetching the current date.

Add Configuration: Add additional properties.


ElasticSearch

The ElasticSearch emitter allows you to sink data into an Elasticsearch index store. While configuring an Elasticsearch emitter, you can specify the target index name using an MVEL expression and enable replication, shards, full-text search, and custom routing.

Configure Elasticsearch Emitter for Spark

To add an Elasticsearch emitter to your pipeline, drag the emitter onto the canvas and connect it to a channel or processor. The configuration settings are as follows:

Connection Name: Select an Elasticsearch connection.

Operation Type: Default operation type.

Output Message: Output message that needs to be indexed.

Across Field Search Enabled: Specifies whether full-text search is to be enabled across all fields.

Index Number of Shards: Specifies the number of shards to be created in the index store.


Index Replication Factor: Specifies the number of additional copies of data to be kept across nodes. Should be less than n-1, where n is the number of nodes in the cluster.

Index Expression: MVEL expression used to evaluate the index name. This helps you leverage field-based partitioning. For example: @{'ns_1_myindex' + Math.round(<MessageName>.timestamp / (3600*1000))}. A new index is created for each one-hour time range, and data is dynamically indexed on the basis of the field whose alias name is 'timestamp'.

Routing Required: Specifies whether custom dynamic routing is to be enabled. If enabled, a routing policy JSON needs to be defined.

Index Source: Stores the actual JSON in the index store and uses it to serve search requests.

Add Configuration: Add additional Elasticsearch properties.

NOTE: Batching is not configured for Elasticsearch/Solr in the case of Spark, since Spark implicitly batches records based on a time window.

Configure Elasticsearch Emitter for Storm

To add an Elasticsearch emitter to your pipeline, drag the emitter onto the canvas and connect it to a channel or processor. The configuration settings are as follows:


Connection Name: Select an Elasticsearch connection.

Parallelism: Number of executors (threads) of the emitter.

Task Count: Number of instances of the emitter.

Is Batch Enable: If set to true, the data is indexed in batches; you then need to specify a batch size.

Across Field Search Enabled: Specifies whether full-text search is to be enabled across all fields.

Index Number of Shards: Specifies the number of shards to be created in the index store.

Index Replication Factor: Specifies the number of additional copies of data to be kept across nodes. Should be less than n-1, where n is the number of nodes in the cluster.

Index Expression: MVEL expression used to evaluate the index name. For example: @{'ns_1_myindex' + Math.round(<MessageName>.timestamp / (3600*1000))}. A new index is created for each one-hour time range, and data is dynamically indexed on the basis of the field whose alias name is 'timestamp'.

Routing Required: Specifies whether custom dynamic routing is to be enabled. If enabled, a routing policy JSON needs to be defined.

Index Source: Stores the actual JSON in the index store and uses it to serve search requests.

Add Configuration: Add additional Elasticsearch properties.


Solr

The Solr emitter enables you to index data into a Solr index store. The index name is evaluated on the basis of an MVEL expression provided in the Index Expression field.

Configure Solr Emitter for Spark

To add a Solr emitter to your pipeline, drag it onto the canvas and connect it to a channel or processor. The configuration settings of the Solr emitter are as follows:

Field Description Connection Name Select a Solr connection.

Across Field Search Enabled

This specifies if full text search is to be enabled across all fields.

Index Number of Shards This specifies number of shards to be created in index store.

Index Replication Factor This specifies number of additional copies of data is to be kept across nodes. Should be less than n-1, where n is the number of nodes in the cluster.

Index Expression The MVEL expression used to evaluate the index name. This can help you leverage field-based partitioning. For example, consider the expression below:


@{'ns_1_myindex' + Math.round(<MessageName>.timestamp / (3600*1000))} Here a new index will be created for each one-hour time range, and data will be dynamically indexed on the basis of the field whose alias name is 'timestamp'.

Routing Required This specifies whether custom dynamic routing is to be enabled. If enabled, a routing policy JSON needs to be defined.

Configure Solr Emitter for Storm To add a Solr emitter into your pipeline, drag it on the canvas and connect it to a Channel or Processor. The configuration settings of the Solr emitter are as follows:


Field Description Connection Name Select a Solr connection.

Parallelism Number of executors (threads) of the emitter.

Task Count Number of instances of the emitter.

Is Batch Enable If set to true, the data will be indexed in batches; you must also specify a batch size.

Across Field Search Enabled

This specifies whether full-text search is to be enabled across all fields.

Index Number of Shards

This specifies the number of shards to be created in the index store.

Index Replication Factor

This specifies the number of additional copies of data to be kept across nodes. This should be less than n-1, where n is the number of nodes in the cluster.

Index Expression The MVEL expression used to evaluate the index name. This can help you leverage field-based partitioning. For example, consider the expression below: @{'ns_1_myindex' + Math.round(<MessageName>.timestamp / (3600*1000))} Here a new index will be created for each one-hour time range, and data will be dynamically indexed on the basis of the field whose alias name is 'timestamp'.

Routing Required This specifies whether custom dynamic routing is to be enabled. If enabled, a routing policy JSON needs to be defined.

Index Source It stores the actual JSON in the index store and uses it to serve search requests.

Add Configuration To add additional Solr properties.
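When Is Batch Enable is set to true, the emitter buffers incoming records and sends them in groups of the configured batch size. The sketch below illustrates that buffering pattern in Python; it assumes nothing about the product internals, and the function name is illustrative:

```python
def batches(records, batch_size):
    """Group a record stream into fixed-size batches, flushing the final
    partial batch — the kind of buffering a batch-enabled emitter does
    before each bulk index request (illustrative sketch only)."""
    buf = []
    for rec in records:
        buf.append(rec)
        if len(buf) == batch_size:
            yield buf
            buf = []
    if buf:  # flush any remaining records at end of stream
        yield buf

# Five records with a batch size of 2 yield batches of 2, 2, and 1.
out = list(batches(range(5), 2))
```

Batching trades a little latency for fewer, larger requests to the index store.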


Kafka

Kafka emitter sinks data into a Kafka cluster. Data formats supported are JSON, DELIMITER, and Jsonarray.

Configure Kafka Emitter for Spark

To add a Kafka Emitter into your pipeline, drag the Kafka Emitter on the canvas and connect it to a Channel or Processor. Right click on the emitter to configure it as explained below:

Field Description

Connection Name Select a Kafka connection.

Topic Name Kafka topic name where you want to emit data.

Partitions Number of partitions to create for a topic. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log.

Replication Factor For a topic with replication factor N, Kafka will tolerate up to N-1 failures without losing any messages committed to the log.

Producer Type Specifies whether the messages are sent asynchronously in a background thread. Valid values are async for asynchronous send and sync for synchronous send.

Output Format Data type format of the output message.

Output Fields Fields of the output message.

Add Configuration To add additional Kafka properties.
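The Output Format and Output Fields settings together determine what each emitted message looks like. As a rough sketch of the two most common formats (this is not the product API — the function, parameter, and option handling below are illustrative), JSON output serializes the selected fields as a JSON object, while delimited output joins their values with a separator:

```python
import json

def serialize(message, output_fields, output_format, delimiter=","):
    """Render only the configured output fields in the chosen format.
    Illustrative sketch of JSON vs. DELIMITER behaviour; names here
    are not the product's actual API."""
    selected = {f: message[f] for f in output_fields}
    if output_format == "JSON":
        return json.dumps(selected, sort_keys=True)
    if output_format == "DELIMITER":
        # Field order follows the configured output-fields list.
        return delimiter.join(str(selected[f]) for f in output_fields)
    raise ValueError("unsupported format: " + output_format)

msg = {"id": 7, "level": "WARN", "text": "disk low"}
j = serialize(msg, ["id", "level"], "JSON")       # '{"id": 7, "level": "WARN"}'
d = serialize(msg, ["id", "level"], "DELIMITER")  # '7,WARN'
```

Restricting the output to a field subset keeps downstream consumers decoupled from the pipeline's internal message schema.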


Configure Kafka Emitter for Storm To add a Kafka Emitter into your pipeline, drag the Kafka Emitter on the canvas and connect it to a Channel or Processor. Right click on the emitter to configure it as explained below:

Field Description Connection Name Select a Kafka connection.

Parallelism Number of executors (threads) of the emitter.

Task Count Number of instances of the emitter.

Topic Name Name of the topic.

Partitions Number of partitions to create for a topic. Each partition is an ordered, immutable sequence of messages that is continually appended to a commit log.

Replication Factor For a topic with replication factor N, Kafka will tolerate up to N-1 failures without losing any messages committed to the log.

Producer Type Specifies whether the messages are sent asynchronously in a background thread. Valid values are async for asynchronous send and sync for synchronous send.

Output Format Data type format of the output.

Output Fields Fields of the output message.

Add Configuration To add additional Kafka properties.


RabbitMQ

RabbitMQ emitter sinks data into a RabbitMQ cluster. Data formats supported are JSON, DELIMITER, and Jsonarray.

Configure RabbitMQ Emitter for Spark

To add a RabbitMQ Emitter into your pipeline, drag the RabbitMQ Emitter on the canvas and connect it to a Channel or Processor. Right click on the emitter to configure it as explained below:

Field Description Connection Name Select a RabbitMQ connection.

Exchange Name Exchange name for RabbitMQ.

Exchange Type Select Exchange Type for RabbitMQ; accepts three types - direct, topic, and fanout.

Exchange Durable TRUE: the exchange will not be deleted if you restart RabbitMQ. FALSE: the exchange will be deleted if you restart RabbitMQ.

Routing Key Select Routing Key for RabbitMQ.

Queue Name Select Queue Name for RabbitMQ.

Queue Durable TRUE: the queue will not be deleted if you restart RabbitMQ. FALSE: the queue will be deleted if you restart RabbitMQ.


Output Format Data type format of the output.

Output Fields Fields of the output message.

Add Configuration To add additional RabbitMQ properties.
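The three exchange types determine which bound queues receive each message. The sketch below simulates RabbitMQ's matching rules in simplified form (illustrative only; real topic matching has extra edge cases, e.g. '#' also matching zero words along with its neighbouring dot):

```python
import re

def matches(exchange_type, binding_key, routing_key):
    """Decide whether a binding receives a message, mimicking the three
    exchange types the emitter accepts (simplified sketch).
    direct: exact routing-key match; fanout: every binding matches;
    topic: '*' matches one dot-separated word, '#' matches the rest."""
    if exchange_type == "fanout":
        return True
    if exchange_type == "direct":
        return binding_key == routing_key
    if exchange_type == "topic":
        # Escape the binding key, then turn the wildcards back into regex.
        pattern = re.escape(binding_key)
        pattern = pattern.replace(r"\#", ".*").replace(r"\*", "[^.]+")
        return re.fullmatch(pattern, routing_key) is not None
    raise ValueError("unknown exchange type: " + exchange_type)
```

So with a topic exchange, a binding key of "logs.*" receives "logs.app" but not "logs.app.error", while "logs.#" receives both.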

Configure RabbitMQ Emitter for Storm To add a RabbitMQ Emitter into your pipeline, drag the RabbitMQ Emitter on the canvas and connect it to a Channel or Processor. Right click on the emitter to configure it as explained below:

Field Description Connection Name Select a RabbitMQ connection.

Parallelism Number of executors (threads) of the emitter.

Task Count Number of instances of the emitter.

Exchange Name Exchange name for RabbitMQ.

Exchange Type Select Exchange Type for RabbitMQ; accepts three types - direct, topic, and fanout.

Exchange Durable TRUE: the exchange will not be deleted if you restart RabbitMQ. FALSE: the exchange will be deleted if you restart RabbitMQ.

Routing Key Select Routing Key for RabbitMQ.

Queue Name Select Queue Name for RabbitMQ.

Queue Durable Accepts either true or false.

Output Format Data type format of the output.

Output Fields Fields of the output message.

Add Configuration To add additional RabbitMQ properties.


ActiveMQ

ActiveMQ emitter publishes data to a defined ActiveMQ topic.

Configure ActiveMQ Emitter for Storm

To add an ActiveMQ Emitter into your pipeline, drag the ActiveMQ Emitter on the canvas, connect it to a Channel or Processor, and right click on it to configure.

Field Description Connection Name Select an ActiveMQ connection.

Parallelism Number of executors (threads) of the emitter.

Task Count Number of instances of the emitter.

Topic Name Topic name where ActiveMQ producers can publish messages.

Routing Key Specifies a key for redirecting messages to a specific topic.

Output Format Data type format of the output.

Output Fields Fields in the message that need to be a part of the data.

Add Configuration Add additional custom properties.


Foreach

Foreach is an action that Spark provides. The Foreach emitter enables you to provide a custom implementation that is executed using the foreach or foreachPartition function. To use this emitter, write an implementation of the IteratorInterface and provide the implementation class name in the Executor Plugin text box.

Configure Foreach Emitter for Spark

To add a Foreach Emitter into your pipeline, drag the Foreach Emitter to the canvas, connect it to a Channel or Processor, and right click on it to configure.

Field Description

Function Type Foreach: with this function type, you implement the IteratorInterface and the input passed to your implementation is a JSONObject; this JSONObject is a single message of the DStream. Foreach Partition: with this function type, you implement the IteratorInterface and the input passed to your implementation is an Iterator of JSONObject; this Iterator of JSONObject is the list of JSON objects available inside a partition.

Executor Plugin Qualified name of the implementation class to which control is passed in order to process the incoming data. For example, a sample executor plugin for the Foreach function is com.execute.ForeachFunction, and a sample executor plugin for the Foreach Partition function is com.execute.ForeachPartitionFunction.

Add Configuration Add additional custom properties.
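The actual Executor Plugin is a Java class implementing the product's IteratorInterface. The Python sketch below (with hypothetical names) only illustrates the difference between the two function types: Foreach invokes your code once per message, while Foreach Partition invokes it once per partition with an iterator over that partition's messages:

```python
def run_foreach(partitions, process_record):
    """Foreach: the plugin is invoked once per message, receiving a
    single JSON object each time (illustrative sketch)."""
    for partition in partitions:
        for record in partition:
            process_record(record)

def run_foreach_partition(partitions, process_partition):
    """Foreach Partition: the plugin is invoked once per partition and
    receives an iterator over that partition's JSON objects — useful
    for sharing, say, one connection per partition (sketch only)."""
    for partition in partitions:
        process_partition(iter(partition))

# Two partitions holding three messages in total.
parts = [[{"id": 1}, {"id": 2}], [{"id": 3}]]

calls = []
run_foreach(parts, lambda r: calls.append(r["id"]))  # three invocations

batches_seen = []
run_foreach_partition(
    parts, lambda it: batches_seen.append([r["id"] for r in it])
)  # two invocations, one per partition
```

Foreach Partition is typically chosen when per-message setup (opening a connection, creating a client) would be too expensive to repeat for every record.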


Streaming

Streaming emitter enables you to visualize the data running in the pipeline in the StreamAnalytix built-in real-time dashboards.

Configure Streaming Emitter for Spark

To add a Streaming Emitter into your pipeline, drag the Streaming Emitter to the canvas, connect it to a Channel or Processor, and right click on it to configure.

Field Description Stream Id Exchange name on which streaming messages will be sent.

Add Configuration Add additional custom properties.

Configure Streaming Emitter for Storm

Field Description

Parallelism Number of executors (threads) of the emitter.

Task Count Number of instances of the emitter.

Stream Id Exchange name on which streaming messages will be sent.


Router

Router emitter integrates data pipelines. You can connect multiple pipelines and emit one pipeline's data into another. Both the source and the destination pipelines should use the Router emitter.

Configure Router Emitter for Storm

To add a Router Emitter into your pipeline, drag the Router Emitter on the canvas, connect it to a Channel or Processor, and right click on it to configure.

Field Description Parallelism Number of executors (threads) of the emitter.

Task Count Number of instances of the emitter.

Add Configuration Add additional custom properties.

To give us your feedback on your experience with the application and report bugs or problems, mail us at [email protected]