Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Integrating Apache NiFi and Apache Apex
Feb 25th 2016
Bryan Bende – Member of Technical Staff
Outline
• Introduction to NiFi
• NiFi Site-To-Site
• Apex + NiFi Integration
• Use Case Discussion
About Me
• Member of Technical Staff at Hortonworks
• Apache NiFi Committer & PMC Member
• Contributed NiFi + Apex Integration
• Twitter: @bbende / Blog: bryanbende.com
Introduction to Apache NiFi
Apache NiFi
• Powerful and reliable system to process and distribute data
• Directed graphs of data routing and transformation
• Web-based User Interface for creating, monitoring, & controlling data flows
• Highly configurable - modify data flow at runtime, dynamically prioritize data
• Data Provenance tracks data through entire system
• Easily extensible through development of custom components
More information: https://nifi.apache.org/
NiFi - Terminology
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
Process Group
• Set of processors and their connections
• Receives data via input ports, sends data via output ports
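To make the terminology concrete, here is a minimal Java sketch that models a FlowFile as content plus attributes, and a processor as something that takes one FlowFile and returns another. SimpleFlowFile, SimpleProcessor, and AddSizeAttribute are hypothetical names; the real NiFi API (ProcessSession, relationships, and so on) is considerably richer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified FlowFile: immutable content bytes plus key/value attributes.
// This is NOT NiFi's org.apache.nifi.flowfile.FlowFile interface.
final class SimpleFlowFile {
    private final byte[] content;
    private final Map<String, String> attributes;

    SimpleFlowFile(byte[] content, Map<String, String> attributes) {
        this.content = content.clone();
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    byte[] getContent() { return content.clone(); }

    Map<String, String> getAttributes() { return attributes; }

    // Returns a new FlowFile with one attribute added or replaced; the original is untouched.
    SimpleFlowFile withAttribute(String key, String value) {
        Map<String, String> copy = new HashMap<>(attributes);
        copy.put(key, value);
        return new SimpleFlowFile(content, copy);
    }
}

// Hypothetical processor contract: take a FlowFile, do some work, return the result.
interface SimpleProcessor {
    SimpleFlowFile process(SimpleFlowFile in);
}

// Example processor: tags each FlowFile with the size of its content in bytes.
final class AddSizeAttribute implements SimpleProcessor {
    @Override
    public SimpleFlowFile process(SimpleFlowFile in) {
        return in.withAttribute("content.size", String.valueOf(in.getContent().length));
    }
}
```

The processor never mutates its input; it returns an updated FlowFile, loosely mirroring how NiFi's session methods such as putAttribute return a new FlowFile object rather than changing the old one in place.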
NiFi - User Interface
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
NiFi - Provenance
• Tracks data at each point as it flows through the system
• Records, indexes, and makes events available for display
• Handles fan-in/fan-out, i.e. merging and splitting data
• View attributes and content at given points in time
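Fan-in and fan-out are what make provenance more than a flat log. The sketch below, a hypothetical in-memory ProvenanceLog, records events whose type names echo NiFi's (RECEIVE, FORK, JOIN) and walks parent links to recover a FlowFile's lineage. It only illustrates the idea; it is not how NiFi's provenance repository stores or indexes events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory provenance log (not NiFi's provenance repository).
final class ProvenanceLog {
    static final class Event {
        final String type;
        final String flowFileId;
        final List<String> parentIds;

        Event(String type, String flowFileId, List<String> parentIds) {
            this.type = type;
            this.flowFileId = flowFileId;
            this.parentIds = parentIds;
        }
    }

    private final List<Event> events = new ArrayList<>();
    // Child FlowFile id -> ids of the FlowFiles it was derived from.
    private final Map<String, List<String>> parents = new HashMap<>();

    void addEvent(String type, String flowFileId, String... parentIds) {
        List<String> p = Arrays.asList(parentIds);
        events.add(new Event(type, flowFileId, p));
        parents.computeIfAbsent(flowFileId, k -> new ArrayList<>()).addAll(p);
    }

    int eventCount() { return events.size(); }

    // Walks parent links back to the original inputs, covering both splits (fan-out) and merges (fan-in).
    Set<String> lineage(String flowFileId) {
        Set<String> seen = new LinkedHashSet<>();
        List<String> stack = new ArrayList<>();
        stack.add(flowFileId);
        while (!stack.isEmpty()) {
            String id = stack.remove(stack.size() - 1);
            if (seen.add(id)) {
                stack.addAll(parents.getOrDefault(id, new ArrayList<String>()));
            }
        }
        return seen;
    }
}
```

For example, after a FlowFile "a" is split into "a1" and "a2", and "a2" is merged with another input "b" into "m", the lineage of "m" contains "m", "a2", "a", and "b", but not the sibling "a1".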
NiFi - Queue Prioritization
• Configure a prioritizer per connection
• Determine what is important for your data – time based, arrival order, importance of a data set
• Funnel many connections down to a single connection to prioritize across data sets
• Develop your own prioritizer if needed
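A prioritizer is essentially an ordering over the FlowFiles waiting in a connection. This sketch expresses that ordering with java.util.PriorityQueue. The "priority" attribute, the rule that lower numbers go first, and the class names are assumptions for illustration; this is not one of NiFi's bundled prioritizers.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical item waiting in a connection queue.
final class QueuedItem {
    final long arrivalOrder;
    final Map<String, String> attributes;

    QueuedItem(long arrivalOrder, Map<String, String> attributes) {
        this.arrivalOrder = arrivalOrder;
        this.attributes = attributes;
    }
}

// Hypothetical prioritized connection.
final class PrioritizedConnection {
    // Custom prioritizer: lower numeric "priority" first; items without the
    // attribute go last; ties fall back to arrival order (oldest first).
    static final Comparator<QueuedItem> PRIORITIZER =
        Comparator.comparingLong((QueuedItem i) -> priorityOf(i))
                  .thenComparingLong(i -> i.arrivalOrder);

    private final PriorityQueue<QueuedItem> queue = new PriorityQueue<>(PRIORITIZER);
    private long counter = 0;

    private static long priorityOf(QueuedItem item) {
        String p = item.attributes.get("priority");
        return p == null ? Long.MAX_VALUE : Long.parseLong(p);
    }

    void offer(Map<String, String> attributes) {
        queue.add(new QueuedItem(counter++, attributes));
    }

    QueuedItem poll() { return queue.poll(); }
}
```

Funneling several connections into one, as the slide suggests, amounts to putting all their items into a single queue like this so that one comparator orders them across data sets.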
NiFi - Extensibility
Built from the ground up with extensions in mind
Service-loader pattern for…
• Processors
• Controller Services
• Reporting Tasks
• Prioritizers
Extensions packaged as NiFi Archives (NARs)
• Deploy to NiFi lib directory and restart
• Provides ClassLoader isolation
• Same model as standard components
NiFi - Architecture
[Diagram: each NiFi instance runs in a JVM on its own OS/host. Inside the JVM, a Web Server and a Flow Controller host the processors and extensions (Processor 1 … Extension N), backed by a FlowFile Repository, Content Repository, and Provenance Repository on local storage. In a cluster, a master NiFi Cluster Manager (NCM), with its own Web Server and a Request Replicator, coordinates the slave NiFi nodes, each laid out the same way.]
NiFi Site-To-Site
NiFi Site-To-Site
• Direct communication between two NiFi instances
• Push to Input Port on receiver, or Pull from Output Port on source
• Communicate between clusters, standalone instances, or both
• Handles load balancing and reliable delivery
• Secure connections using certificates (optional)
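Reliable delivery rests on Site-To-Site transfers being transactional. The in-memory class below is not the Site-To-Site client's API; it is a hypothetical illustration of the idea that data leaves the sender for good only once the receiver completes the transfer, and goes back to the queue if the transfer is cancelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory "output port" illustrating transactional delivery.
final class InMemoryOutputPort {
    private final Deque<String> queue = new ArrayDeque<>();

    void enqueue(String item) { queue.addLast(item); }

    int queued() { return queue.size(); }

    Transaction begin() { return new Transaction(); }

    final class Transaction {
        private final List<String> inFlight = new ArrayList<>();
        private boolean open = true;

        // Hands out the next item (or null); it is held in-flight, not yet
        // acknowledged, until complete() or cancel() is called.
        String receive() {
            ensureOpen();
            String next = queue.pollFirst();
            if (next != null) {
                inFlight.add(next);
            }
            return next;
        }

        // Receiver has safely stored everything: the source can forget it.
        void complete() {
            ensureOpen();
            inFlight.clear();
            open = false;
        }

        // Something went wrong: return in-flight items to the front of the queue, in order.
        void cancel() {
            ensureOpen();
            for (int i = inFlight.size() - 1; i >= 0; i--) {
                queue.addFirst(inFlight.get(i));
            }
            inFlight.clear();
            open = false;
        }

        private void ensureOpen() {
            if (!open) {
                throw new IllegalStateException("transaction already finished");
            }
        }
    }
}
```

If a receiver pulls two items and then fails, cancelling puts both back in their original order, so the next transaction sees exactly the same data and nothing is lost.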
Site-To-Site Push
• Source connects Remote Process Group to Input Port on destination
• Site-To-Site takes care of load balancing across the nodes in the cluster
[Diagram: the Remote Process Group (RPG) on a standalone NiFi pushes data to the Input Port on Node 1 and Node 2 of a cluster managed by an NCM.]
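The load-balancing bullet can be illustrated with a small, hypothetical balancer that sends each batch to whichever node currently has the fewest FlowFiles queued at its Input Port. This is a deliberately simple stand-in; NiFi's Site-To-Site client has its own peer-selection logic, which this sketch does not reproduce.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical least-loaded balancer for pushing to a cluster.
final class LeastLoadedBalancer {
    // Node name -> number of FlowFiles currently queued at its Input Port.
    private final Map<String, Integer> queued = new LinkedHashMap<>();

    LeastLoadedBalancer(List<String> nodes) {
        for (String node : nodes) {
            queued.put(node, 0);
        }
    }

    // Picks the least-loaded node (the first one wins on ties) and records the batch there.
    String send(int batchSize) {
        String target = null;
        for (Map.Entry<String, Integer> e : queued.entrySet()) {
            if (target == null || e.getValue() < queued.get(target)) {
                target = e.getKey();
            }
        }
        queued.put(target, queued.get(target) + batchSize);
        return target;
    }

    // Simulates a node processing (draining) part of its queue.
    void drain(String node, int count) {
        queued.put(node, Math.max(0, queued.get(node) - count));
    }

    int queuedAt(String node) { return queued.get(node); }
}
```

A node that falls behind stops receiving new batches until it catches up, which is the behavior the sender wants without having to configure per-node routing by hand.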
Site-To-Site Pull
• Destination connects Remote Process Group to Output Port on the source
• If the source were a cluster, each destination node would pull from every node in the source cluster
[Diagram: the RPG on each node of a cluster (NCM, Node 1, Node 2) pulls data from the Output Port of a standalone NiFi.]
Site-To-Site Client
• Code for Site-To-Site broken out into a reusable module: https://github.com/apache/nifi/tree/master/nifi-commons/nifi-site-to-site-client
• Can be used from any Java program to push/pull from NiFi
[Diagram: a Java program uses the Site-To-Site Client to pull from the Output Port on Node 1 and Node 2 of a cluster managed by an NCM.]
Apex + NiFi Integration
Apex + NiFi Integration
• Use Site-To-Site Client in Apex/Malhar Operators
• Input operators to pull data from NiFi Output Port
• Output operators to push data to NiFi Input Port
• NiFiDataPacket to represent data to/from NiFi (think FlowFile)
import java.util.Map;

public interface NiFiDataPacket {
    byte[] getContent();
    Map<String, String> getAttributes();
}
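For a concrete picture of what crosses the boundary between NiFi and Apex, the sketch below repeats the interface from the slide (so the block compiles on its own) and adds a hypothetical immutable implementation, SimpleNiFiDataPacket, with a helper for text payloads. It is not the implementation that ships with Malhar.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Interface as shown on the slide, repeated so this block is self-contained.
interface NiFiDataPacket {
    byte[] getContent();
    Map<String, String> getAttributes();
}

// Hypothetical immutable implementation holding a FlowFile's content and attributes.
final class SimpleNiFiDataPacket implements NiFiDataPacket {
    private final byte[] content;
    private final Map<String, String> attributes;

    SimpleNiFiDataPacket(byte[] content, Map<String, String> attributes) {
        // Defensive copies: later changes by the caller do not leak into the packet.
        this.content = content.clone();
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    // Convenience factory for text payloads (UTF-8), e.g. log lines.
    static SimpleNiFiDataPacket ofText(String text, Map<String, String> attributes) {
        return new SimpleNiFiDataPacket(text.getBytes(StandardCharsets.UTF_8), attributes);
    }

    @Override
    public byte[] getContent() { return content.clone(); }

    @Override
    public Map<String, String> getAttributes() { return attributes; }
}
```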
Apex NiFi Input Operators
AbstractNiFiInputOperator
• Base class for NiFi Input Operators
• Provides interaction with the Site-To-Site client, handles replaying of windows
• Delegates to subclasses for creating a tuple and emitting a list of tuples
AbstractNiFiSinglePortInputOperator
• Extends AbstractNiFiInputOperator and adds a single OutputPort<T>
• Emits a list of tuples to the provided output port
NiFiSinglePortInputOperator
• Extends AbstractNiFiSinglePortInputOperator
• Provides an implementation that produces NiFiDataPackets
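Replaying windows is the subtle part. In Apex, windows may be replayed after a failure, and an input operator that wants idempotent output must emit the same tuples for the same window id. The following is a minimal sketch of that idea, assuming an in-memory map in place of a durable store such as the WindowDataManager used in the examples; ReplayingInput and emitWindow are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of window replay in an input operator.
final class ReplayingInput {
    private final Iterator<String> source;   // stands in for pulling from NiFi
    private final int batchSize;
    // Window id -> tuples emitted in that window (a durable store in a real operator).
    private final Map<Long, List<String>> saved = new HashMap<>();

    ReplayingInput(Iterator<String> source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    List<String> emitWindow(long windowId) {
        List<String> replay = saved.get(windowId);
        if (replay != null) {
            return replay; // replayed window: re-emit exactly the same tuples
        }
        // First time this window is seen: pull new data and remember it.
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        List<String> frozen = Collections.unmodifiableList(batch);
        saved.put(windowId, frozen);
        return frozen;
    }
}
```

A real operator would keep this data in durable storage and discard saved windows once they are committed; the map here grows without bound.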
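NiFi Input Operator Example
// Site-To-Site settings: pull from the NiFi Output Port named "Apex", requesting batches of 5 FlowFiles
final SiteToSiteClient.Builder builder = new SiteToSiteClient.Builder()
        .url("http://localhost:8080/nifi")
        .portName("Apex")
        .requestBatchCount(5);

// No-op window data manager: per-window state is not persisted, so replayed windows may differ
WindowDataManager wdm = new WindowDataManager.NoopWindowDataManager();

NiFiSinglePortInputOperator nifi = dag.addOperator("nifi", new NiFiSinglePortInputOperator(builder, wdm));
ConsoleOutputOperator console = dag.addOperator("console", new ConsoleOutputOperator());

// Print every NiFiDataPacket pulled from NiFi to the console
dag.addStream("nifi_console", nifi.outputPort, console.input);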
Apex NiFi Output Operators
AbstractNiFiOutputOperator
• Base class for NiFi Output Operators
• Provides a method to process a list of tuples
• Uses NiFiDataPacketBuilder to convert incoming tuples to NiFiDataPackets
NiFiSinglePortOutputOperator
• Extends AbstractNiFiOutputOperator and adds a buffering Input Port
• Buffering Input Port flushes tuples (i.e. sends them to NiFi) when the batch size is reached
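The batching behavior of the buffering Input Port can be sketched as follows. BufferingPort is a hypothetical class: tuples accumulate until batchSize is reached, and the full batch is then handed to a callback that stands in for sending to NiFi.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffering port: collects tuples and flushes them in batches.
final class BufferingPort<T> {
    private final int batchSize;
    private final Consumer<List<T>> sender;   // stands in for a send to NiFi
    private final List<T> buffer = new ArrayList<>();

    BufferingPort(int batchSize, Consumer<List<T>> sender) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sender = sender;
    }

    void process(T tuple) {
        buffer.add(tuple);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered, even a partial batch.
    void flush() {
        if (!buffer.isEmpty()) {
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    int buffered() { return buffer.size(); }
}
```

With a batch size of 1, as passed in the output operator example, each tuple is sent on its own; larger batches trade some latency for fewer sends.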
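NiFi Output Operator Example
// Site-To-Site settings: push to the NiFi Input Port named "Apex"
final SiteToSiteClient.Builder builder = new SiteToSiteClient.Builder()
        .url("http://localhost:8080/nifi")
        .portName("Apex");

// Converts each String tuple into a NiFiDataPacket
NiFiDataPacketBuilder<String> dpb = new StringNiFiDataPacketBuilder();
WindowDataManager wdm = new WindowDataManager.NoopWindowDataManager();

// Last argument: batch size for the buffering Input Port (1 = send each tuple immediately)
NiFiSinglePortOutputOperator nifi = dag.addOperator("nifi", new NiFiSinglePortOutputOperator(builder, dpb, wdm, 1));
RandomEventGenerator rand = dag.addOperator("rand", new RandomEventGenerator());

// Random strings flow from the generator into the NiFi output operator
dag.addStream("rand_nifi", rand.string_data, nifi.inputPort);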
Use Case Discussion
Drive Data to Apex for Analysis
• Drive data from sources to a central data center for analysis
• Tiered collection approach at various locations, e.g. regional data centers
[Diagram: two Edge NiFi instances send data to a Core NiFi, which feeds Apex.]
Dynamically Adjusting Data Flow
• Push analytic results from Apex back to NiFi
• Push results back to edge locations/devices to change behavior
[Diagram: Apex sends results back to the Core NiFi, which passes them on to the Edge NiFi instances.]
Example: Dynamic Log Collection
1. Logs filtered by level and sent from Edge -> Core
2. Apex produces new filter levels based on rate & sends them back to Core
3. Edge polls Core for new filter levels & updates filtering
[Diagram: Edge NiFi filters logs and sends them to Core NiFi (Log Input), which forwards them (Log Output) to the Apex analytic. Apex returns new filters to Core NiFi (Result Input), which stores the result. Edge NiFi polls a Core NiFi service, which fetches the stored result, and applies the new filters.]
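On the Edge side, step 3 amounts to swapping the active filter at runtime. The sketch below assumes filter levels are expressed as a minimum severity over the ordering DEBUG < INFO < WARN < ERROR; both that representation and the LevelFilter class are hypothetical, since the slides do not show how the example encodes its filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical edge-side filter: keep a log line only if its level is at or above the minimum.
final class LevelFilter {
    // Severity order assumed for this sketch, lowest to highest.
    static final List<String> LEVELS = Arrays.asList("DEBUG", "INFO", "WARN", "ERROR");

    // volatile: a polling thread may update it while another thread filters.
    private volatile String minimum;

    LevelFilter(String minimum) { setMinimum(minimum); }

    // Called when new filter levels are fetched from the Core.
    void setMinimum(String level) {
        if (!LEVELS.contains(level)) {
            throw new IllegalArgumentException("unknown level: " + level);
        }
        this.minimum = level;
    }

    // Unknown levels are dropped.
    boolean accept(String level) {
        int idx = LEVELS.indexOf(level);
        return idx >= 0 && idx >= LEVELS.indexOf(minimum);
    }

    List<String> filter(List<String> levels) {
        List<String> kept = new ArrayList<>();
        for (String l : levels) {
            if (accept(l)) {
                kept.add(l);
            }
        }
        return kept;
    }
}
```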
Dynamic Log Collection – Edge NiFi
Dynamic Log Collection – Core NiFi
Dynamic Log Collection – Apex Application
NiFiSinglePortInputOperator nifiInput = ...;
dag.addOperator("nifi-in", nifiInput);

// Counts log levels per window; attributeName presumably names the FlowFile attribute holding the level
LogLevelWindowCount count = dag.addOperator("count", new LogLevelWindowCount(attributeName));
// Aggregate over an application window spanning several streaming windows
dag.setAttribute(count, OperatorContext.APPLICATION_WINDOW_COUNT, ...);

NiFiDataPacketBuilder<LogLevels> dataPacketBuilder = new DictionaryBuilder(...);
NiFiSinglePortOutputOperator nifiOutput = ...;
dag.addOperator("nifi-out", nifiOutput);

// nifi-in -> count -> nifi-out
dag.addStream("nifi-in-count", nifiInput.outputPort, count.input);
dag.addStream("count-nifi-out", count.output, nifiOutput.inputPort);
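The slide does not show what LogLevelWindowCount computes, so the following is only a hypothetical analytic in the same spirit: count the levels seen during one application window, then choose the most permissive minimum level whose forwarded volume stays within a per-window budget. LevelRateAnalytic, the budget rule, and the level ordering are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical windowed analytic producing a new minimum log level.
final class LevelRateAnalytic {
    static final List<String> LEVELS = Arrays.asList("DEBUG", "INFO", "WARN", "ERROR");

    private final long maxEventsPerWindow;
    private final Map<String, Long> counts = new HashMap<>();

    LevelRateAnalytic(long maxEventsPerWindow) { this.maxEventsPerWindow = maxEventsPerWindow; }

    // Called once per incoming tuple with its log level.
    void add(String level) { counts.merge(level, 1L, Long::sum); }

    // Called at the end of the window: returns the new minimum level and resets counts.
    String endWindow() {
        String chosen = LEVELS.get(LEVELS.size() - 1); // fall back to the most severe level
        long forwarded = 0;
        // Walk from most to least severe, widening the filter while under budget.
        for (int i = LEVELS.size() - 1; i >= 0; i--) {
            forwarded += counts.getOrDefault(LEVELS.get(i), 0L);
            if (forwarded > maxEventsPerWindow) {
                break;
            }
            chosen = LEVELS.get(i);
        }
        counts.clear();
        return chosen;
    }
}
```

With a budget of 10, a window containing 50 DEBUG, 5 INFO, 3 WARN, and 1 ERROR lines yields INFO: allowing DEBUG as well would forward 59 events. A quiet window relaxes the filter back to DEBUG, which is the feedback loop the use case describes.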
Dynamic Log Collection – Full Flow
[Diagram: Logs flow from the Edge NiFi instances through the Core NiFi to Apex, and New Filters flow back from Apex through the Core NiFi to each Edge.]
Summary
• Use NiFi to drive data from sources to Apex
• Leverage results from Apex to adjust your dataflows
• Dynamic Log Collection Example:
https://github.com/bbende/nifi-streaming-examples
Contact Info:
• Twitter: @bbende
Thank you